Found 40 repositories (showing 30)
jmisilo
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
Lahdhirim
Image caption generation using a hybrid CLIP-GPT2 architecture. CLIP encodes the image while GPT-2 decodes it into a natural-language caption. Modular, configurable pipelines for training, inference, and evaluation on datasets like COCO.
An image captioning system that combines CLIP with GPT-2.
MjdMahasneh
No description available
heisenberg1804
End-to-end image captioning system using CLIP ViT-B/32 for visual encoding and GPT-2 with LoRA fine-tuning for caption generation. Trained on COCO Captions (Karpathy split) with a learned mapping network bridging vision and language embedding spaces.
Noob-Coder2
An AI-powered image captioning bot that generates descriptive captions for images using the Conceptual Captions (shortened version) dataset and a CLIP-GPT architecture.
kliu128
Multimodal image + text captioning for 416k figures from arXiv. Uses CLIP + SciBERT + GPT-2 in an encoder-decoder architecture. CS224N final project.
saksham-ops
Image Captioning with CLIP and GPT-2 — Multimodal deep learning model integrating a CLIP vision encoder and a GPT-2 text decoder for image-to-text generation. Trained on 30K+ Flickr images, achieving a BLEU-4 of 5.28% and a CIDEr of 37.76%, outperforming a CNN+LSTM baseline by 12%.
Developed an end-to-end AI system that generates relevant image captions based on user-provided keywords. Integrated CLIP for image-text similarity and GPT-2 for creative text generation to produce and rank captions.
lucasmbll
Pretrained a GPT‑2 (124M) on FineWeb‑Edu (~10B tokens) and fine‑tuned it for COCO image captioning using a frozen CLIP ViT‑B/32 encoder. Explores gated middle cross‑attention, BLIP‑2 Q‑Former prefixes, and lightweight linear prefixes.
AgriCLIP is a CLIP-based vision–language model for agriculture and livestock. Trained on ALive (600k image–text pairs) with GPT-4 captions and fine-grained DINO features, it achieves 48% zero-shot accuracy, outperforming CLIP in crop, livestock, and fish classification tasks.
udaykumar1307
Image Captioning System (CLIP + GPT-2)
Vignesh010101
Image Captioning Using CLIP & GPT Models
shrnik
No description available
koushik-mahamkali
Image captioning optimized to run on low-spec hardware.
jatinpsingh
Image captioning pipeline using CLIP vision encoder and GPT-2 decoder
manugaurdl
PyTorch implementation of the ClipCap paper.
lachlanchen
Video & image captioning with OpenAI CLIP embeddings + GPT decoder
baichuanzhou
No description available
Chantoone
No description available
yunusskeete
Automated Scalable 3D Captioning with Pretrained Models (Based on Cap3D)
No description available
sauravsoni6377
An image captioning system that combines CLIP for image feature extraction and GPT-2 for generating descriptive captions.
Syntax1on
AI tool: Upload video → Auto highlights via Whisper + GPT → Clip preview w/ captions 🎬
rafat-74
Image Caption Generator using CLIP + GPT-2. A model that generates image captions using CLIP and GPT-2; developed in VS Code, tested on trained datasets, with automatic English-to-Arabic translation.
ho-edwardd
Developed a Transformer Mapper architecture to link a pretrained CLIP image model and pretrained GPT-2 language model for robust image captioning.
Anandupy
AI-based Instagram Caption Generator using DeepFace, CLIP, and GPT-2 with Emotion Detection and NLP.
theophile-lt
From-scratch GPT-2 trained on Fineweb_edu (10B tokens), extended to image captioning with frozen CLIP features and lightweight multimodal bridges.
usha1310
Built an AI-powered image captioning tool by integrating CLIP and GPT-2 using prefix mapping, with real-time demos via Gradio and Streamlit.
jaypatelp001
AI Caption & Hashtag Generator is a **Streamlit web app** that generates creative captions and trending hashtags for your images using **CLIP** and **GPT-based models**.
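A recurring design across these repositories (ClipCap, the prefix-mapping and mapping-network entries) is a small learned bridge that projects a CLIP image embedding into a sequence of GPT-2-sized "prefix" embeddings, which are prepended to the caption tokens during decoding. A minimal sketch of that bridge, assuming CLIP ViT-B/32's 512-d image embedding, GPT-2 (124M)'s 768-d token embeddings, and a prefix length of 10; the single linear layer and all names here are illustrative stand-ins for the learned mapping network, not any specific repo's code:

```python
import numpy as np

CLIP_DIM = 512    # CLIP ViT-B/32 image embedding size
GPT2_DIM = 768    # GPT-2 (124M) token embedding size
PREFIX_LEN = 10   # number of prefix "tokens" fed to GPT-2 (illustrative)

rng = np.random.default_rng(0)

# A single random linear layer standing in for the trained mapping network
# (real implementations use an MLP or a small transformer here).
W = rng.normal(scale=0.02, size=(CLIP_DIM, PREFIX_LEN * GPT2_DIM))

def map_clip_to_prefix(clip_embedding: np.ndarray) -> np.ndarray:
    """Project one CLIP image embedding to PREFIX_LEN embeddings in
    GPT-2's token-embedding space; these would be prepended to the
    caption's token embeddings before autoregressive decoding."""
    flat = clip_embedding @ W                  # (PREFIX_LEN * GPT2_DIM,)
    return flat.reshape(PREFIX_LEN, GPT2_DIM)  # (PREFIX_LEN, GPT2_DIM)

image_embedding = rng.normal(size=(CLIP_DIM,))  # stand-in for CLIP output
prefix = map_clip_to_prefix(image_embedding)
print(prefix.shape)  # (10, 768)
```

In frameworks like Hugging Face `transformers`, such a prefix is typically passed to GPT-2 via `inputs_embeds`, concatenated in front of the caption token embeddings; only the mapping network (and optionally LoRA adapters on GPT-2, as in some entries above) is trained while CLIP stays frozen.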