Found 3,247 repositories (showing 30)
ashnkumar
Keras model to generate HTML code from hand-drawn website mockups. Applies an image captioning architecture to hand-drawn source images.
peteanderson80
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
rmokady
Simple image captioning model
lucidrains
Implementation of CoCa (Contrastive Captioners are Image-Text Foundation Models) in PyTorch
fpgaminer
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
LeDat98
Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini or local Ollama models.
1038lab
Joy Caption is a ComfyUI node using the LLaVA model to generate stylized image captions, supporting batch processing and GGUF models.
peteanderson80
Automatic image captioning model based on Caffe, using features from bottom-up attention.
201528014227051
Datasets for remote sensing images (Paper: Exploring Models and Data for Remote Sensing Image Caption Generation)
FennelFetish
An image viewer and AI-assisted editing/captioning/masking tool that helps with curating datasets for generative AI models, finetunes and LoRA.
1038lab
A custom ComfyUI node for MiniCPM vision-language models, supporting v4, v4.5, and v4 GGUF formats, enabling high-quality image captioning and visual analysis.
neural-nuts
[DEPRECATED] A neural-network-based generative model for captioning images using TensorFlow
snrazavi
Deep learning workshop including image classification, face recognition, object detection, language modelling, image captioning and neural machine translation.
paraschopra
Four-in-one deep network: image search, image captioning, similar words and similar images using a single model
jmisilo
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
IDEA-Research
Official DINO-X Model Context Protocol (MCP) server that empowers LLMs with real-world visual perception through image object detection, localization, and captioning APIs.
njchoma
Image Captioning based on Bottom-Up and Top-Down Attention model
HughKu
Ready-to-go image captioning inference: Show and Tell model compatible with TensorFlow r1.9
minwoosun
[CVPR 2025] BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
AkagawaTsurunaki
ZerolanCore aims to integrate a range of open-source, locally deployable AI models, such as large language models (LLM), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), and video captioning.
MiteshPuthran
An LSTM model generates captions for input images after extracting features with a pre-trained VGG-16 model. (Computer Vision, NLP, Deep Learning, Python)
IBM Code Model Asset Exchange: Show and Tell Image Caption Generator
ntrang086
Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in COntext (MS COCO) dataset
Chen-Yang-Liu
[IEEE GRSL 2024 🔥] RSCaMa: Remote Sensing Image Change Captioning with State Space Model
google-research-datasets
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.
bearcatt
A length-controllable and non-autoregressive image captioning model.
232525
Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]
zarzouram
PyTorch implementation of image captioning using a transformer-based model.
Sajid030
Deep learning-based image captioning with Flickr8k dataset. Code includes data prep, model training, and a Streamlit app.
RotsteinNoam
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions