Search Results

Found 678 repositories (showing 30)

aisearch-openai-rag-audio

Azure-Samples

🧡64

A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI experiences using RAG with Azure AI Search and Azure OpenAI's gpt-4o-realtime-preview model.

552

352

MIT

Python

Updated 6 hours ago

ai-azd-templates, azd-templates, azure, +9

whatsapp-chatgpt-bot

wassengerhq

🧡56

Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP 🤩

149

MIT

JavaScript

Updated 1 week ago

ai-bot, chatbot, chatgpt, +9

SmartRAG

SmartRAG is a privacy-first multimodal RAG system that lets you chat intelligently with your documents, images, and audio. Upload PDFs, Word files, or recordings and get accurate, context-aware answers, all processed locally on your device with no external APIs.

111

MIT

Python

Updated 16 hours ago

Awesome-Multimodal-RAG

JarvisUSTC

🧡60

A curated list of the latest advancements, papers, tools, and datasets for **Multimodal Retrieval-Augmented Generation (RAG)**. Multimodal RAG integrates information retrieval and generation across multiple data modalities (e.g., text, image, video, audio).

MIT

Updated 1 week ago

deepsearch

deepsearch-ai

❤️35

A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images

Apache-2.0

Python

Updated 1 month ago

nbmultirag

nannib

❤️35

An Italian and English framework that lets you chat with your documents using RAG, including multimedia documents (audio, video, images and OCR).

GPL-3.0

Python

Updated 1 month ago

ai, chatbot, customize, +10

Multimodal-RAG-Implementation

CornelliusYW

🧡55

This repository contains a Multimodal Retrieval-Augmented Generation (RAG) pipeline that integrates images, audio, and text for advanced multimodal querying and response generation.

Jupyter Notebook

Updated 1 week ago

chat_with_audios

karthikponna

❤️45

This project combines the power of Retrieval-Augmented Generation (RAG) with AssemblyAI's transcription capabilities, enabling you to interact with audio recordings as if they were conversational text. By leveraging Qwen3-32b for natural language understanding, this solution efficiently retrieves and answers queries based on your audio content.

Python

Updated 1 month ago

SmartRAG

byerlikaya

🧡65

Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language. Production-ready with multi-AI support, vector storage, and multi-database coordination.

MIT

Updated 1 day ago

csharp, document-processing, dotnet, +7

sample-nova-sonic-speech2speech-webrtc

aws-samples

🧡55

Sample voice agent application based on Amazon Nova 2 Sonic and Amazon Kinesis Video Streams WebRTC service. It demonstrates the real-time audio streaming interaction between user and speech-to-speech model via WebRTC connection. It also supports tool use like RAG with Bedrock Knowledge Base, MCP servers, and Strands agent.

MIT-0

Python

Updated 4 days ago

kinesis-video-streams, nova-sonic, webrtc

openai-rag-audio

navintkr

❤️35

No description available

MIT

TypeScript

Updated 2 months ago

RageAudioTool

CamxxCore

❤️35

Reverse engineering audio metadata file formats of the RAGE engine

MIT

Updated 1 year ago

SpeakingAI

awtestergit

❤️35

SpeakingAI is a demo of privately deployable 'GPT-4o like AI + RAG', a fully functional web AI server with audio query/answer in streaming, using LLM and RAG for backend knowledge.

Apache-2.0

Python

Updated 1 year ago

audio-to-audio, gguf, llm, +1

TreeHacks-ZoneOut

NxtGenLegend

🧡60

#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.

MIT

JavaScript

Updated 2 weeks ago

ai-assistant, artificial-intelligence, audio-processing, +17

Resonate

SartajBhuvaji

❤️40

Data Science Capstone Project based on RAG LLMs. The project aims to improve meetings by providing an interface to recollect information from audio/video meetings.

MIT

Python

Updated 7 months ago

huggingface, langchain, llms, +2

4sales-ai-RAG-Backend

David-patrick-chuks

❤️35

A production-ready AI Agent API with advanced RAG (Retrieval-Augmented Generation) capabilities, built with Node.js, TypeScript, MongoDB Atlas, Redis, and Google Gemini integration. This API provides intelligent question-answering based on trained knowledge from various sources including documents, websites, YouTube videos, audio, and video files.

TypeScript

Updated 5 months ago

RAGE-Audio-Toolkit

AndrewMulti

❤️30

GTA IV and EFLC audio and metadata editor

MIT

Pascal

Updated 11 months ago

RAG-over-Audio-Data

wittyicon29

❤️40

Python-based system designed to transcribe audio files, split the transcripts into manageable chunks, create text embeddings using HuggingFace models, and employ advanced question-answering models for retrieval-based QA.

MIT

Python

Updated 7 months ago

assemblyai, embeddings, langchain, +2

rag-services

xphoenix-ai

❤️35

Audio enabled end-to-end RAG pipeline

Apache-2.0

Python

Updated 1 month ago

agents, ai-agents, chatbot, +17

langchain-multimodal-rag-system

fufankeji

❤️35

A traceable multimodal RAG QA system built on LangChain 1.0, supporting OCR, VLM, and real-time audio transcription.

TypeScript

Updated 2 months ago

MMeRAG

zakahan

🧡55

MMeRAG is an open-source RAG (Retrieval-Augmented Generation) project that provides a parser for audio and video data to implement RAG over audio and video.

MIT

Python

Updated 3 weeks ago

whatsapp-chatgpt-bot-php

wassengerhq

🧡50

A featured, easy-to-use, multimodal WhatsApp AI GPT-4o Chatbot in PHP for your Business. Supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP Tools 🤩

MIT

PHP

Updated 2 months ago

chatbot, chatgpt, gpt4, +7

Build-Notebook-LM-Clone

mcrao

🧡55

A full-stack AI-powered application that replicates Google's NotebookLM functionality. Upload PDF documents, chat with them using advanced RAG (Retrieval-Augmented Generation), and automatically generate engaging podcast-style audio conversations with AI hosts discussing the document content.

Python

Updated 2 weeks ago

SRAG

CyrilDesch

❤️35

Highly flexible RAG system with advanced document parsing and audio processing.

GPL-3.0

Scala

Updated 1 month ago

ai, ai-search, deep-learning, +7

whatsapp-chatgpt-bot-csharp

wassengerhq

❤️40

Ready-to-use, customizable WhatsApp AI GPT-4o Chatbot in C# for your Business. Supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP Tools 🤩

MIT

Updated 5 months ago

agent-ai, chatbot, chatgpt, +11

TaleemAI

nameershah

🧡60

Built for HEC Generative AI Hackathon. Multi-agent AI tutor with smart LLM routing, PDF-grounded RAG, and audio lecture generation for accessible education.

Python

Updated 16 hours ago

agriculture-chatbot

DreamRealized

🧡50

The Agriculture Assistant is an LLM-powered system aiding farmers with agricultural advice. It has Frontend (React), Backend (Python), and RAG subsystems, supporting text/image/audio input, Mandarin audio output, and knowledge retrieval via vector databases.

TypeScript

Updated 1 week ago

agent, llm, mcp, +1

doc-ingestor

saleemh

❤️35

MCP server for intelligent document ingestion using Docling. Convert PDFs, DOCX, images, audio & more to clean Markdown for AI/RAG pipelines. Mac M2 optimized with MLX acceleration, VLM processing & queue management.

Python

Updated 3 months ago

AgnosticRag-Q

M-kadi

🧡65

AgnosticRag-Q is a cloud-agnostic, multi-modal Retrieval-Augmented Generation (RAG) platform designed with a clean, extensible architecture. It provides a powerful Core API and GUI for building, testing, and deploying RAG pipelines across multiple LLM providers (Ollama, OpenAI, Gemini, Transformers) and data sources (TXT, CSV, Images, Audio).

Updated 5 days ago

colpali, fastapi, gemini, +13

VoiceRAG

ramosv

🧡50

VoiceRAG is an open-source RAG pipeline for customer service calls. It uses a call center voice recordings dataset from Kaggle to transcribe audio, embed conversations, and retrieve context for LLM-based reasoning.

MIT

Updated 1 month ago
