Found 13,289 repositories (showing 30)
vladmandic
SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing
karpathy
Efficient Image Captioning code in Torch, runs on GPU
ashnkumar
Keras model to generate HTML code from hand-drawn website mockups. Applies an image captioning architecture to hand-drawn source images.
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
stephengpope
The NCA Toolkit API eliminates monthly subscription fees by consolidating common API functionalities into a single FREE API. Designed for businesses, creators, and developers, it streamlines advanced media processing, including video editing and captioning, image transformations, cloud storage, and Python code execution.
ttengwang
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
yawiii
The Prompt Assistant provides one-click access to LLM/VLM services such as Zhipu, SiliconFlow, Gemini, local Ollama, and Baidu for prompt translation, polishing and expansion, and image captioning (deriving a prompt from an image). It also supports one-click insertion of prompt presets and searching of prompt history, making it an all-in-one prompt plugin.
jcjohnson
Dense image captioning in Torch
ruotianluo
I decided to sync this repo with self-critical.pytorch. (The old master is kept in the old master branch as an archive.)
NVlabs
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
peteanderson80
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
rmokady
Simple image captioning model
fpgaminer
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
zhjohnchan
A curated list of image captioning and related area resources. :-)
ruotianluo
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
YehLi
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
DeepRNN
TensorFlow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
google-research-datasets
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
aimagelab
Meshed-Memory Transformer for Image Captioning. CVPR 2020
alpv95
Dank Learning codebase: generates a meme from any image using AI. Uses a modified version of the Show and Tell image captioning network.
LuoweiZhou
Vision-Language Pre-training for Image Captioning and Question Answering
forence
This repository focuses on image captioning, video captioning, seq-to-seq learning, and NLP.
husthuaan
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
jiasenlu
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
yashk2810
Image Captioning using InceptionV3 and beam search
krasserm
Transformer-based image captioning extension for pytorch/fairseq
JDAI-CV
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
saahiluppal
Image Captioning Using Transformer
LeDat98
Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini or local Ollama models.
ltguo19
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019