Found 6 repositories (showing 6)
OpenHelix-Team
Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.
zinengtang
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
vendetta127
AerithVLM is a hybrid remote sensing VLM combining DINOv3 and CLIP through a learned alignment head. Using a dual-encoder architecture, Vision Perceiver, and LLaMA backbone, it supports robust visual grounding, captioning, and open-ended geospatial reasoning.
shade-archive
A ROS2 Wrapper for DeepMind's Vision Perceiver IO Model
habib-analyst
Hybrid Global Context Vision Transformer (GCViT) + Perceiver IO framework for medical image classification. Accepted in The Journal of Supercomputing.
violayhho
A survey of Vision-Language Pre-training (VLP) focused on the BLIP lineage. Covers key architectural shifts including the Q-Former, Perceiver Resampler, and the integration of Diffusion Transformers with Autoregressive models.