Found 26 repositories (showing 26)
aws-samples
Sample voice agent application based on Amazon Nova 2 Sonic and the Amazon Kinesis Video Streams WebRTC service. It demonstrates real-time audio streaming between the user and the speech-to-speech model over a WebRTC connection. It also supports tool use such as RAG with Bedrock Knowledge Base, MCP servers, and Strands agent.
LohithR22
ReclaimingVoice is an AI-powered speech therapy app that uses multi-agent LLMs and Retrieval-Augmented Generation (RAG) to deliver personalized, medically accurate therapy plans and real-time feedback, making expert speech care accessible anywhere.
krutika13
A Retrieval-Augmented Generation (RAG)-based AI agent for call centers with speech-to-speech interaction using Whisper, FAISS, HuggingFace LLMs, and Coqui TTS. The system enables users to speak naturally and receive context-aware voice responses grounded in custom knowledge bases.
mappfinity
A RAG-powered voice TTS agent that retrieves relevant context and generates natural speech responses. Built to demonstrate intelligent information retrieval, dynamic reasoning, and quality voice synthesis in real time.
rmohanlal3
PoSTA is a positive self-talk AI assistant. It uses Nvidia's open-source NeMo Agentic Toolkit for generative AI. RAG is used to retrieve information, and Nvidia Riva's text-to-speech (TTS) service is used for voice modeling. Our aim is to develop a highly personalized experience for self-growth, with the user's own voice and avatar.
manasa-26
Multimodal Voice RAG Agent using Speech-to-Text, FAISS Search, and Text-to-Speech
fiv3fingers
Open-source voice agent — speech-to-text, RAG, and multi-LLM (Llama, Phi-3 Vision, Granite). Text, voice, image, and code in one Streamlit app.
Warishayat
This project is a Voice-Driven Multi-Modal RAG system that allows users to interact with an intelligent agent using spoken input instead of typed text. It combines real-time speech recognition, retrieval-augmented generation, and text-to-speech synthesis to enable a fully voice-based conversational experience with powerful LLMs.
mmujtaba0085
Echo-Persona is a full-stack Digital Twin AI platform. It allows users to create and interact with customizable personas using Retrieval-Augmented Generation (RAG), local Whisper-based Speech-to-Text (English & Urdu), AI voice cloning, and specialized Agentic AI models for document research.
csperera
Buffett's Brain is an Agentic AI chatbot powered by Retrieval-Augmented Generation (RAG) that allows you to interactively explore the investment philosophy of Warren Buffett and Charlie Munger. Ask questions about value investing, business analysis, mental models, etc.—all grounded in decades of shareholder letters, speeches, and writings.
RatulPradhan
A real-time, voice-driven personal email assistant that listens to your spoken queries, retrieves relevant email context, and responds in your own cloned voice.
Fahad-Awan1
No description available
werzum
A sample project to test agentic orchestration with personal note search (RAG-based), speech-to-text (Whisper-based) and text-parsing agents
Thadeus-Cruz
An AI agent with a double RAG system that fetches real-time flight details and uses speech to interact with users.
Surya-Muthuraman
An AI-powered voice bot that handles customer calls using RAG (Retrieval-Augmented Generation) with Ollama and Pinecone. Features real-time speech recognition, text-to-speech, and seamless handover to human agents via LiveKit.
iqbal1201
A multimodal chatbot agent that supports text and audio input, using Speech-to-Text (STT) and Text-to-Speech (TTS) in Azure OpenAI Service. The chatbot is also built using a RAG pipeline to ground its responses in contextual information.
kaveeris
OmniSense AI is a real-time multimodal intelligent agent combining face detection, speech-to-text, RAG, memory, and agentic reasoning. It supports text and voice interaction, activates speech only when a face is detected, and runs fully locally using open-source tools and LLMs.
saroj-raj
Sophisticated AI-powered interview assistant providing real-time responses using Ollama LLMs, Whisper speech-to-text, and comprehensive Agentic AI expertise. Features multi-agent systems, LLM evaluation, and production-ready RAG pipelines.
Blcisse
Alfred AI Assistant is a production-grade, multimodal intelligent assistant designed for real-time reasoning, automation, and workflow orchestration. The system integrates speech-to-text and text-to-speech pipelines, vision-language reasoning, and RAG to deliver contextual, reliable, and predictable agent behavior across complex user workflows.
Mahir-Baig
This project implements an agent with a RAG-first workflow that prioritizes an internal knowledge base and grounding via the Perplexity API. It supports both text-based and speech-based (STT) user prompts and can read generated responses aloud using text-to-speech (TTS).
Wimukthi316
An Autonomous Multi-Modal AI Agent for Enterprise Intelligence. Features a full MLOps pipeline, RAG with Gemini API, Speech-to-Text (Whisper), and Document Intelligence (LayoutLM). Built with FastAPI, Next.js, and Azure.
Samad503
Built an interactive AI agent that answers questions from uploaded PDFs using RAG. Implemented text cleaning, PDF parsing, and embeddings with ChromaDB. Integrated ZhipuAI and LangChain tools with a Streamlit web interface and text-to-speech.
SwarnabhG07
Hackathon demo branch for Hack & Forge 2026. Contains the complete HireHub interview simulation platform — FastAPI backend, multi-agent RAG pipeline (Gemini + FAISS + sentence-transformers), proctored exam interface with speech-to-text, and candidate dashboard. Run uvicorn main:app --reload to start.
itsnaveenkroy
A multimodal AI math tutor that reliably solves JEE-style math problems using a multi-agent pipeline, RAG over a curated knowledge base, and memory-based self-learning. Accepts text, image (OCR), and audio (speech-to-text) input.
anishgillella
AI Relationship Mediator Voice Agent. A warm, empathetic AI therapist that helps couples understand each other better through real-time voice calls. Combines speech recognition, LLM reasoning, vector search (RAG), and voice synthesis to bridge emotional and logical communication.
23f3004092
AI-powered podcast generator built using Streamlit, LangChain Agents, Retrieval-Augmented Generation (RAG), and Text-to-Speech (TTS). Users can upload documents (PDF, CSV, TXT, etc.), and the system automatically analyzes the content and generates an engaging two-speaker podcast-style conversation.