A survey of Vision-Language Pre-training (VLP) focused on the BLIP lineage. Covers key architectural shifts including the Q-Former, Perceiver Resampler, and the integration of Diffusion Transformers with Autoregressive models.
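The Q-Former's core idea is a small, fixed set of learned query vectors that cross-attend to a frozen image encoder's patch features and emit a fixed-size summary, regardless of how many patches the encoder produces. A minimal NumPy sketch of that cross-attention step follows; all dimensions, weight names, and the single-head formulation are illustrative assumptions, not BLIP-2's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qformer_cross_attention(queries, image_feats, Wq, Wk, Wv):
    """One single-head cross-attention step: learned queries attend
    to frozen image features (hypothetical, simplified sketch)."""
    Q = queries @ Wq       # (num_queries, d)
    K = image_feats @ Wk   # (num_patches, d)
    V = image_feats @ Wv   # (num_patches, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Output size depends only on num_queries, not num_patches.
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 64
num_queries, num_patches = 32, 257  # hypothetical sizes
queries = rng.standard_normal((num_queries, d))   # learned, trainable
feats = rng.standard_normal((num_patches, d))     # frozen encoder output
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

out = qformer_cross_attention(queries, feats, Wq, Wk, Wv)
print(out.shape)  # (32, 64): fixed-size summary of a variable-length input
```

The fixed-size output is what lets a frozen vision encoder feed a frozen language model: the LLM only ever sees `num_queries` soft tokens. The Perceiver Resampler in Flamingo applies the same learned-query principle.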
Stars: 0 · Forks: 0 · Watchers: 0 · Open Issues: 0
Commits: 9