Found 18,907 repositories (showing 30)
salesforce
LAVIS - A One-stop Library for Language-Vision Intelligence
vladmandic
SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing
salesforce
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
karpathy
Efficient Image Captioning code in Torch, runs on GPU
ashnkumar
Keras model to generate HTML code from hand-drawn website mockups. Applies an image captioning architecture to hand-drawn source images.
OpenGVLab
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
OFA-Sys
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
stephengpope
The NCA Toolkit API eliminates monthly subscription fees by consolidating common API functionalities into a single FREE API. Designed for businesses, creators, and developers, it streamlines advanced media processing, including video editing and captioning, image transformations, cloud storage, and Python code execution.
ttengwang
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
yawiii
The Prompt Assistant provides one-click access to LLM/VLM services such as Zhipu, SiliconFlow, Gemini, local Ollama, and Baidu for prompt translation, polishing and expansion, and image captioning (reverse prompting from images). It also supports one-click preset insertion and historical prompt search, making it an all-in-one prompt plugin.
jcjohnson
Dense image captioning in Torch
ruotianluo
I decided to sync this repo with self-critical.pytorch. (The old master is archived in the old master branch.)
NVlabs
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
peteanderson80
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
rmokady
Simple image captioning model
imaginary-cloud
Simple Swift class to provide all the configurations you need to create custom camera view in your app
brh55
A pure JS react-native component to render a masonry~ish layout for images with support for dynamic columns, progressive image loading, device rotation, on-press handlers, and headers/captions.
NVlabs
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
jhc13
Tag manager and captioner for image datasets
lucidrains
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
W2GenAI-Lab
LucidFlux: Caption-Free Photo-Realistic Image Restoration via a Large-Scale Diffusion Transformer, ICLR 2026
fpgaminer
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
zhjohnchan
A curated list of image captioning and related area resources. :-)
microsoft
Oscar and VinVL
ruotianluo
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
YehLi
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
yunjey
TensorFlow Implementation of "Show, Attend and Tell"
cuixing158
A paper list of some recent Computer Vision (CV) works
DeepRNN
Tensorflow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"