Found 678 repositories (showing 30)
Azure-Samples
A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI experiences using RAG with Azure AI Search and Azure OpenAI's gpt-4o-realtime-preview model.
wassengerhq
Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP 🤩
itanishqshelar
SmartRAG is a privacy-first multimodal RAG system that lets you chat intelligently with your documents, images, and audio. Upload PDFs, Word files, or recordings and get accurate, context-aware answers all processed locally on your device with no external APIs.
JarvisUSTC
A curated list of the latest advancements, papers, tools, and datasets for **Multimodal Retrieval-Augmented Generation (RAG)**. Multimodal RAG integrates information retrieval and generation across multiple data modalities (e.g., text, image, video, audio).
deepsearch-ai
A multimodal RAG application that enables semantic search on multimedia sources like audio, video and images
nannib
An Italian and English framework that lets you chat with your documents using RAG, including multimedia sources (audio, video, images, and OCR).
CornelliusYW
This repository contains a Multimodal Retrieval-Augmented Generation (RAG) Pipeline that integrates images, audio, and text for advanced multimodal querying and response generation.
karthikponna
This project combines the power of Retrieval-Augmented Generation (RAG) with AssemblyAI's transcription capabilities, enabling you to interact with audio recordings as if they were conversational text. By leveraging Qwen3-32b for natural language understanding, this solution efficiently retrieves and answers queries based on your audio content.
byerlikaya
Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language. Production-ready with multi-AI support, vector storage, and multi-database coordination.
aws-samples
Sample voice agent application based on Amazon Nova 2 Sonic and Amazon Kinesis Video Streams WebRTC service. It demonstrates the real-time audio streaming interaction between user and speech-to-speech model via WebRTC connection. It also supports tool use like RAG with Bedrock Knowledge Base, MCP servers, and Strands agent.
navintkr
No description available
CamxxCore
Reverse engineering audio metadata file formats of the RAGE engine
awtestergit
SpeakingAI is a demo of a privately deployable 'GPT-4o-like AI + RAG': a fully functional web AI server with streaming audio queries and answers, using an LLM and RAG for backend knowledge.
NxtGenLegend
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.
SartajBhuvaji
Data Science Capstone Project based on RAG LLMs. The project aims to improve meetings by providing an interface to recollect information from audio/video meetings.
David-patrick-chuks
A production-ready AI Agent API with advanced RAG (Retrieval-Augmented Generation) capabilities, built with Node.js, TypeScript, MongoDB Atlas, Redis, and Google Gemini integration. This API provides intelligent question-answering based on trained knowledge from various sources including documents, websites, YouTube videos, audio, and video files.
AndrewMulti
GTA IV and EFLC audio and metadata editor
wittyicon29
Python-based system designed to transcribe audio files, split the transcripts into manageable chunks, create text embeddings using HuggingFace models, and employ advanced question-answering models for retrieval-based QA.
xphoenix-ai
Audio enabled end-to-end RAG pipeline
fufankeji
A traceable multimodal RAG QA system built on LangChain 1.0, supporting OCR, VLM, and real-time audio transcription.
zakahan
MMeRAG is an open-source RAG (Retrieval-Augmented Generation) project that provides a parser for audio and video data, enabling RAG over audio and video content.
wassengerhq
A featured, easy-to-use, multimodal WhatsApp AI GPT-4o Chatbot in PHP for your Business. Supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP Tools 🤩
A full-stack AI-powered application that replicates Google's NotebookLM functionality. Upload PDF documents, chat with them using advanced RAG (Retrieval-Augmented Generation), and automatically generate engaging podcast-style audio conversations with AI hosts discussing the document content.
CyrilDesch
Highly flexible RAG system with advanced document parsing and audio processing.
wassengerhq
Ready-to-use, customizable WhatsApp AI GPT-4o Chatbot in C# for your Business. Supports GPT-4o with text + audio + image input, audio responses, and improved RAG + MCP Tools 🤩
nameershah
Built for HEC Generative AI Hackathon. Multi-agent AI tutor with smart LLM routing, PDF-grounded RAG, and audio lecture generation for accessible education.
DreamRealized
The Agriculture Assistant is an LLM-powered system aiding farmers with agricultural advice. It has Frontend (React), Backend (Python), and RAG subsystems, supporting text/image/audio input, Mandarin audio output, and knowledge retrieval via vector databases.
saleemh
MCP server for intelligent document ingestion using Docling. Convert PDFs, DOCX, images, audio & more to clean Markdown for AI/RAG pipelines. Mac M2 optimized with MLX acceleration, VLM processing & queue management.
M-kadi
AgnosticRag-Q is a cloud-agnostic, multi-modal Retrieval-Augmented Generation (RAG) platform designed with a clean, extensible architecture. It provides a powerful Core API and GUI for building, testing, and deploying RAG pipelines across multiple LLM providers (Ollama, OpenAI, Gemini, Transformers) and data sources (TXT, CSV, Images, Audio).
ramosv
VoiceRAG is an open-source RAG pipeline for customer service calls. It uses a call center voice recordings dataset from Kaggle to transcribe audio, embed conversations, and retrieve context for LLM-based reasoning.