Search Results

Found 7,156 repositories (showing 30)

UI-TARS-desktop

bytedance

💚95

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

29.3k

2.9k

Apache-2.0

TypeScript

Updated 8 minutes ago

agent, agent-tars, browser-use +11

haystack

deepset-ai

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

24.8k

2.7k

Apache-2.0

MDX

Updated 6 minutes ago

agent, agents, ai +17

serve

jina-ai

💚95

☁️ Build multimodal AI applications with a cloud-native stack

21.9k

2.2k

Apache-2.0

Python

Updated 12 hours ago

cloud-native, cncf, deep-learning +17

NeMo

NVIDIA-NeMo

💚100

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

17.1k

3.4k

Apache-2.0

Python

Updated 1 hour ago

asr, deeplearning, generative-ai +7

Duix-Avatar

duixcom

💚98

🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.

12.7k

2.1k

NOASSERTION

Updated 5 hours ago

ai-avatar, ai-avatars, cloning +5

pipecat

pipecat-ai

💚91

Open-source framework for voice and multimodal conversational AI

11.1k

1.9k

BSD-2-Clause

Python

Updated 12 minutes ago

ai, chatbot-framework, chatbots +3

lancedb

lancedb

💛83

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

9.9k

827

Apache-2.0

HTML

Updated 47 minutes ago

approximate-nearest-neighbor-search, image-search, nearest-neighbor-search +5

gorse

gorse-io

💛83

AI-powered open-source recommender system engine that supports classical/LLM rankers and multimodal content via embeddings

9.6k

887

Apache-2.0

Updated 4 hours ago

collaborative-filtering, go, knn +2

deeplake

activeloopai

💛86

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

9.1k

708

Apache-2.0

C++

Updated 5 hours ago

agent, agentic-rag, ai +16

mmagic

open-mmlab

💛88

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: generative AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration/enhancement, etc.

7.4k

1.1k

Apache-2.0

Jupyter Notebook

Updated 15 hours ago

aigc, computer-vision, deep-learning +16

lance

lance-format

💛73

Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

6.3k

623

Apache-2.0

Rust

Updated 48 minutes ago

apache-arrow, computer-vision, data-analysis +13

mmf

facebookresearch

💛80

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

5.6k

947

NOASSERTION

Python

Updated 2 days ago

captioning, deep-learning, dialog +7

Daft

Eventual-Inc

💛75

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

5.4k

439

Apache-2.0

Rust

Updated 58 minutes ago

ai-engineering, ai-pipeline, arrow +16

VILA

NVlabs

💛72

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

3.8k

319

Apache-2.0

Python

Updated 1 day ago

InternGPT

OpenGVLab

💛71

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

3.2k

235

Apache-2.0

Python

Updated 15 hours ago

chatgpt, click, draggan +17

Skywork-R1V

SkyworkAI

💛71

Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.

3.2k

278

MIT

Python

Updated 6 hours ago

deepseek-r1, grpo, llm +8

Generative-Media-Skills

SamurAIGPT

💛76

Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.

3.0k

330

MIT

Shell

Updated 12 hours ago

agent-tools, ai-agents, ai-art +17

awesome-embodied-vla-va-vln

jonyzhang2023

🧡69

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

2.9k

127

Updated 2 hours ago

clawpanel

qingchencloud

💛75

🦞 OpenClaw visual management panel with a built-in AI assistant (tool calling + image recognition + multimodal + i18n (11)) and one-click installation

2.3k

286

NOASSERTION

JavaScript

Updated 2 hours ago

admin-panel, ai-agent, ai-assistant +17

Magma

microsoft

💛73

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

1.9k

157

MIT

Python

Updated 9 hours ago

pixeltable

pixeltable

💛74

Data infrastructure providing a declarative, incremental approach for multimodal AI workloads.

1.6k

206

Apache-2.0

Python

Updated 1 hour ago

ai, artificial-intelligence, chatbot +12

OpenAdapt

OpenAdaptAI

💛74

Open-source generative process automation (i.e. generative RPA): AI-first process automation with large language models (LLMs), large action models (LAMs), large multimodal models (LMMs), and visual language models (VLMs)

1.5k

229

MIT

Python

Updated 56 minutes ago

agents, ai-agents, ai-agents-framework +17

parlor

fikrikarim

💛72

On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.

1.2k

123

Apache-2.0

HTML

Updated 1 hour ago

apple-silicon, gemma, kokoro +10

UForm

unum-cloud

🧡67

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

1.2k

Apache-2.0

Python

Updated 16 hours ago

bert, clip, clustering +17

xiaoyaosearch

dtsola

🧡67

XiaoyaoSearch: understands your words, reads your images, and finds any local file with AI, making search as easy as chatting.

1.1k

100

NOASSERTION

Python

Updated 10 minutes ago

agent-skills, ai-search, document-search +7

Pixelle-MCP

AIDC-AI

💛72

An Open-Source Multimodal AIGC Solution based on ComfyUI + MCP + LLM https://pixelle.ai

942

125

MIT

Python

Updated 12 hours ago

vectordb-recipes

lancedb

💛73

Resources, examples & tutorials for multimodal AI, RAG, and agents using vector search and LLMs

940

166

Apache-2.0

Jupyter Notebook

Updated 5 days ago

agents, ai, deep-learning +14

agents-js

livekit

🧡63

Build realtime multimodal AI agents with Node.js

795

265

Apache-2.0

TypeScript

Updated 59 minutes ago

vllm-mlx

waybarrios

🧡58

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

781

177

Python

Updated 6 hours ago

anthropic, apple-silicon, audio-processing +17

Generative-AI

fnzhan

🧡66

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

755

TeX

Updated 6 days ago

aigc, diffusion-model, gans +2
