Found 8,730 repositories (showing 30)
huggingface
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
huggingface
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
lucidrains
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
microsoft
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
jacobgil
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
google-research
No description available
facebookresearch
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
cmhungsteve
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
dk-liang
Collect some papers about transformer with vision. Awesome Transformer with Computer Vision (CV)
google-research
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
jeonsworld
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
NVlabs
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
apple
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023
ViTAE-Transformer
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
microsoft
This is a collection of our NAS and Vision Transformer work.
dandelin
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
czczup
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
Yangzhangcst
A paper list of some recent Transformer-based CV works.
IDEA-Research
Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
DirtyHarryLYL
Recent Transformer-based CV and related works.
yitu-opensource
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
facebookresearch
A method to increase the speed and lower the memory footprint of existing vision transformers.
jacobgil
Explainability for Vision Transformers
facebookresearch
Hiera: A fast, powerful, and simple hierarchical vision transformer.
WangLibo1995
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery, ISPRS. Also, including other vision transformers and CNNs for satellite, aerial image and UAV image segmentation.
LeapLabTHU
Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
NVlabs
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
lukemelas
Vision Transformer (ViT) in PyTorch
xxxnell
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
kentaroy47
Let's train vision transformers (ViT) for cifar 10 / cifar 100!