Found 42 repositories (showing 30)
lim142857
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
ajayarunachalam
Designing, Implementing & Deploying Transformer Deep Learning Network Architecture for computer vision tasks
mobilint
No description available
wangyubo79
Vision Transformer is a new model that achieves SOTA in vision classification using transformer-style encoders. The demo is a sample implementation of Vision Transformer trained from scratch with TensorFlow on Amazon SageMaker.
tayo4christ
Real-time gesture recognition system using Vision Transformers, ONNX, and Gradio. Includes dataset preparation, training, evaluation, and a browser-based demo app.
AlexThunder01
Deep Learning pipeline for thyroid nodule diagnosis in ultrasound. Benchmarks CNNs vs. Vision Transformers (YOLOv12, DINOv3) on a 7k+ dataset. Features a two-stage approach (Detection + Classification), achieving SOTA results with Foundation Models. Includes explainability maps and a GUI demo.
chinmay-pardeshi
Implementation and experimentation with Vision Transformer (ViT) architecture for image classification tasks using PyTorch or TensorFlow.
WaterHorseOnStreet
A small demo using CUDA and a Vision Transformer to predict trajectories
asu-bridge93
K-Pop Idol Classification: Computer vision project using fine-tuned Vision Transformers (ViT) to identify K-pop idols from TOMORROW X TOGETHER. Features YOLOv8 face detection, grayscale preprocessing technique improving accuracy from 60% to 85%, and interactive Gradio demo interface.
frederikcodes
Unsupervised anomaly detection on the MVTec AD dataset using Vision Transformer embeddings (DINO/MAE). Includes feature extraction, kNN/Mahalanobis scoring, heatmaps, evaluation metrics (AUROC/PRO), and a Streamlit demo for interactive visualization.
Yashkatiyar24
Upload an image and ask natural-language questions about it — the model answers based on visual understanding. This demo uses ViLT (Vision-and-Language Transformer), a pretrained multimodal model fine-tuned for Visual Question Answering (VQA). The model jointly reasons over image and text inputs to generate accurate answers.
ZhuoxuanCao
A modular, easy-to-use framework for fine-tuning BLIP-1 on custom image captioning tasks using LoRA and Hugging Face Transformers. Includes data preprocessing, training scripts, and inference demos — with custom patching on the vision backbone. Ideal for researchers, engineers, and AI enthusiasts building lightweight captioning systems.
ph-phuc
Basic Vision Transformer implementation from scratch and fine-tuning of a pretrained ViT using TensorFlow 2.0 Keras.
DSML-march2025-luis
A project showcasing Vision Transformers (ViTs) with demos for image classification and object detection to illustrate how ViTs process visual data differently from CNNs.
m-parvaneh
A demo of using vision transformers with PyTorch and the Kinetics400 Dataset
wenyi999
Vision Transformer Attention Visualization Demo
ISHASHENDRE189
No description available
kjanik70
Vision Transformer + TensorRT demos and helpers
DelbyIntelligence
No description available
DelbyIntelligence
No description available
No description available
listar2000
Demo code for a basic Vision Transformer on the MNIST dataset
leecool9669
Vision Transformer (ViT) based NSFW image classification WebUI - Gradio demo for content moderation
R-Tatara
A simple zero-shot object detection demo using Google's OWL-ViT (Open-World Localization Vision Transformer).
ahirtonlopes
Repository with demos on fine-tuning Vision Transformers and using Gemini via Colab (Vertex AI), designed for non-scientists to explore Computer Vision techniques.
sudheesh4
A minimal PyTorch demo of Energy-Based model on MNIST/Fashion-MNIST using a frozen Vision Transformer backbone (DINOv2).
adrienmanciet-sys
Demo notebook for implementing a small Vision Transformer, visualizing attention maps, and testing other architectures.
dhaaivat
An end-to-end implementation of a Tiny Vision Transformer (TinyViT) trained from scratch on the CIFAR-10 dataset. This repository is meant to demystify Vision Transformers by breaking down their components clearly and providing a fully functional training + demo pipeline.
minhkhoango
A demo on achieving a 4x model size reduction for Vision Transformers on the edge by analyzing hardware-aware performance trade-offs.
cchandel-dev
This repo contains a demo of a custom-made object detector that uses vision transformers, shown at Western Universities Minds & Machines Lecture Series