Search Results

Found 18,907 repositories (showing 30)

LAVIS

salesforce

💛87

LAVIS - A One-stop Library for Language-Vision Intelligence

11.2k

1.1k

BSD-3-Clause

Jupyter Notebook

Updated 14 hours ago

deep-learning, deep-learning-library, image-captioning, +8

sdnext

vladmandic

💛83

SD.Next: All-in-one WebUI for AI generative image and video creation, captioning and processing

7.0k

557

Apache-2.0

Python

Updated 7 hours ago

ai-art, caption, diffusers, +7

BLIP

salesforce

💛78

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

5.7k

763

BSD-3-Clause

Jupyter Notebook

Updated 14 hours ago

image-captioning, image-text-retrieval, vision-and-language-pre-training, +4

neuraltalk2

karpathy

💛83

Efficient Image Captioning code in Torch, runs on GPU

5.6k

1.3k

Jupyter Notebook

Updated 4 days ago

sketch-code

ashnkumar

💛77

Keras model to generate HTML code from hand-drawn website mockups. Applies an image captioning architecture to hand-drawn source images.

5.2k

681

Python

Updated 2 days ago

augmentation, deep-learning, image-processing, +2

InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

3.2k

235

Apache-2.0

Python

Updated 22 hours ago

chatgpt, click, draggan, +17

a-PyTorch-Tutorial-to-Image-Captioning

sgrvinod

💛80

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

2.9k

726

MIT

Python

Updated 6 days ago

attention-mechanism, computer-vision, encoder-decoder, +5

OFA

OFA-Sys

💛75

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

2.6k

250

Apache-2.0

Python

Updated 35 minutes ago

chinese, image-captioning, multimodal, +8

no-code-architects-toolkit

stephengpope

💛77

The NCA Toolkit API eliminates monthly subscription fees by consolidating common API functionalities into a single FREE API. Designed for businesses, creators, and developers, it streamlines advanced media processing, including video editing and captioning, image transformations, cloud storage, and Python code execution.

2.3k

988

GPL-2.0

Python

Updated 5 hours ago

Caption-Anything

ttengwang

💛73

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

1.8k

104

BSD-3-Clause

Python

Updated 5 days ago

chatgpt, controllable-generation, controllable-image-captioning, +2

ComfyUI-Prompt-Assistant

yawiii

💛72

The Prompt Assistant provides one-click access to LLM/VLM services such as Zhipu, SiliconFlow, Gemini, local Ollama, and Baidu for prompt translation, polishing and expansion, and image captioning (reverse-prompting from images). It also supports prompt presets for one-click insertion and searching prompt history. An all-round prompt plugin.

1.8k

GPL-3.0

JavaScript

Updated 2 hours ago

comfyui, expand, prompt, +2

densecap

jcjohnson

🧡66

Dense image captioning in Torch

1.6k

427

MIT

Jupyter Notebook

Updated 1 week ago

ImageCaptioning.pytorch

ruotianluo

🧡61

I decided to sync this repo with self-critical.pytorch. (The old master is kept in the old master branch as an archive.)

1.5k

423

MIT

Python

Updated 1 week ago

describe-anything

NVlabs

💛72

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

1.5k

Apache-2.0

Python

Updated 1 day ago

describe-anything, detailed-localized-captioning, large-multimodal-models, +1

bottom-up-attention

peteanderson80

🧡50

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

1.5k

376

MIT

Jupyter Notebook

Updated 1 month ago

caffe, captioning-images, faster-rcnn, +5

CLIP_prefix_caption

rmokady

💛74

Simple image captioning model

1.4k

222

MIT

Jupyter Notebook

Updated 2 days ago

CameraManager

imaginary-cloud

🧡65

Simple Swift class to provide all the configurations you need to create custom camera view in your app

1.4k

326

MIT

Swift

Updated 1 week ago

camera, carthage, cocoapods, +7

react-native-masonry

brh55

💛73

A pure JS react-native component to render a masonry~ish layout for images with support for dynamic columns, progressive image loading, device rotation, on-press handlers, and headers/captions.

1.4k

157

MIT

JavaScript

Updated 3 days ago

masonry, masonry-grid, masonry-layout, +4

prismer

NVlabs

🧡67

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

1.3k

NOASSERTION

Python

Updated 5 days ago

image-captioning, language-model, multi-modal-learning, +4

taggui

jhc13

🧡62

Tag manager and captioner for image datasets

1.3k

GPL-3.0

Python

Updated 16 hours ago

cogvlm, florence-2, image-captioning, +5

CoCa-pytorch

lucidrains

💛72

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

1.2k

MIT

Python

Updated 5 days ago

artificial-intelligence, attention-mechanism, contrastive-learning, +4

LucidFlux

W2GenAI-Lab

🧡67

LucidFlux: Caption-Free Photo-Realistic Image Restoration via a Large-Scale Diffusion Transformer, ICLR 2026

1.2k

102

NOASSERTION

Python

Updated 2 days ago

joycaption

fpgaminer

💛72

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

1.1k

Apache-2.0

Jupyter Notebook

Updated 16 hours ago

captioning, joycaption, vlm

awesome-image-captioning

zhjohnchan

🧡68

A curated list of image captioning and related area resources. :-)

1.1k

182

Updated 5 hours ago

Oscar

microsoft

🧡64

Oscar and VinVL

1.1k

250

MIT

Python

Updated 4 days ago

image-captioning, image-text-search, oscar, +4

self-critical.pytorch

ruotianluo

🧡69

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

1.0k

276

MIT

Python

Updated 55 minutes ago

image-captioning

xmodaler

YehLi

💛72

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

969

105

NOASSERTION

Python

Updated 5 hours ago

cross-modal-retrieval, image-captioning, pretraining, +4

show-attend-and-tell

yunjey

❤️49

TensorFlow Implementation of "Show, Attend and Tell"

906

323

MIT

Jupyter Notebook

Updated 2 months ago

attention-mechanism, image-captioning, mscoco-image-dataset, +2

Awesome-CV-MasterHub

cuixing158

🧡61

A paper list of some recent Computer Vision (CV) works

906

Updated 45 minutes ago

awesome, image-captioning, image-classification, +17

image_captioning

DeepRNN

❤️49

Tensorflow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

796

352

MIT

Python

Updated 1 month ago
