Search Results

Found 26 repositories (showing 26)

sample-nova-sonic-speech2speech-webrtc

aws-samples

🧡55

Sample voice agent application based on Amazon Nova 2 Sonic and Amazon Kinesis Video Streams WebRTC service. It demonstrates the real-time audio streaming interaction between user and speech-to-speech model via WebRTC connection. It also supports tool use like RAG with Bedrock Knowledge Base, MCP servers, and Strands agent.

MIT-0

Python

Updated 3 days ago

kinesis-video-streams nova-sonic webrtc

ReclaimingVoice

LohithR22

🧡60

ReclaimingVoice is an AI-powered speech therapy app that uses multi-agent LLMs and Retrieval-Augmented Generation (RAG) to deliver personalized, medically accurate therapy plans and real-time feedback, making expert speech care accessible anywhere.

MIT

JavaScript

Updated 3 weeks ago

speech-to-speech-rag-agent

krutika13

🧡55

A Retrieval-Augmented Generation (RAG)-based AI agent for call centers with speech-to-speech interaction using Whisper, FAISS, HuggingFace LLMs, and Coqui TTS. The system enables users to speak naturally and receive context-aware voice responses grounded in custom knowledge bases.

Jupyter Notebook

Updated 3 weeks ago

voice-rag

mappfinity

❤️40

A RAG-powered voice TTS agent that retrieves relevant context and generates natural speech responses. Built to demonstrate intelligent information retrieval, dynamic reasoning, and quality voice synthesis in real time.

MIT

Python

Updated 3 months ago

PoSTAA

rmohanlal3

🧡55

PoSTA is a positive self-talk AI assistant. It uses Nvidia's open-source NeMo Agentic Toolkit for generative AI. RAG is used to retrieve information, and Nvidia Riva's text-to-speech (TTS) service is used for voice modeling. Our aim is to develop a highly personalized experience, with the user's own voice and avatar, for self-growth.

MIT

Python

Updated 3 weeks ago

VoiceAssist-RAG

manasa-26

❤️35

Multimodal Voice RAG Agent using Speech-to-Text, FAISS Search, and Text-to-Speech

Python

Updated 9 months ago

agentic-ai generative-ai llm +6

Voice-Interpreter-AI-Agent

fiv3fingers

🧡55

Open-source voice agent — speech-to-text, RAG, and multi-LLM (Llama, Phi-3 Vision, Granite). Text, voice, image, and code in one Streamlit app.

Jupyter Notebook

Updated 4 weeks ago

conversational-ai multimodal-ai python +6

Voice-Based-Rag

Warishayat

❤️40

This project is a Voice-Driven Multi-Modal RAG system that allows users to interact with an intelligent agent using spoken input instead of typed text. It combines real-time speech recognition, retrieval-augmented generation, and text-to-speech synthesis to enable a fully voice-based conversational experience with powerful LLMs.

Apache-2.0

Python

Updated 10 months ago

EchoPersona

mmujtaba0085

🧡55

Echo-Persona is a full-stack Digital Twin AI platform. It allows users to create and interact with customizable personas using Retrieval-Augmented Generation (RAG), local Whisper-based Speech-to-Text (English & Urdu), AI voice cloning, and specialized Agentic AI models for document research.

Python

Updated 2 weeks ago

buffetts_brain

csperera

🧡65

Buffett's Brain is an Agentic AI chatbot powered by Retrieval-Augmented Generation (RAG) that allows you to interactively explore the investment philosophy of Warren Buffett and Charlie Munger. Ask questions about value investing, business analysis, mental models, etc.—all grounded in decades of shareholder letters, speeches, and writings.

Python

Updated 1 day ago

RAG_Speech_to_Speech_Agent_Chaining

RatulPradhan

❤️35

A real-time, voice-driven personal email assistant that listens to your spoken queries, retrieves relevant email context, and responds in your own cloned voice.

JavaScript

Updated 11 months ago

Speech-to-speech-multi-agent-RAG-model

Fahad-Awan1

❤️25

No description available

Python

Updated 7 months ago

AgenticOrchestration

werzum

❤️35

A sample project to test agentic orchestration with personal note search (RAG-based), speech-to-text (Whisper-based) and text-parsing agents

Python

Updated 5 months ago

Flight-Information-AI-Agent

Thadeus-Cruz

❤️35

An AI agent with a double RAG system that fetches real-time flight details and uses speech to interact with users.

Python

Updated 5 months ago

Ai_VoiceBot

Surya-Muthuraman

🧡55

An AI-powered voice bot that handles customer calls using RAG (Retrieval-Augmented Generation) with Ollama and Pinecone. Features real-time speech recognition, text-to-speech, and seamless handover to human agents via LiveKit.

Python

Updated 3 weeks ago

multimodal_rag_chatbot

iqbal1201

❤️35

A multimodal chatbot agent that supports text and audio input through Speech-to-Text (STT) and Text-to-Speech (TTS) in Azure OpenAI Service. The chatbot is also built using a RAG pipeline to ground its responses in contextual information.

HTML

Updated 9 months ago

OmniSense-AI

kaveeris

❤️45

OmniSense AI is a real-time multimodal intelligent agent combining face detection, speech-to-text, RAG, memory, and agentic reasoning. It supports text and voice interaction, activates speech only when a face is detected, and runs fully locally using open-source tools and LLMs.

Python

Updated 2 months ago

Real-Time-AI-Interview-Assistant

saroj-raj

❤️35

Sophisticated AI-powered interview assistant providing real-time responses using Ollama LLMs, Whisper speech-to-text, and comprehensive Agentic AI expertise. Features multi-agent systems, LLM evaluation, and production-ready RAG pipelines.

MIT

Python

Updated 4 months ago

ALFRED-AIA-107

Blcisse

❤️45

Alfred AI Assistant is a production-grade, multimodal intelligent assistant designed for real-time reasoning, automation, and workflow orchestration. The system integrates speech-to-text and text-to-speech pipelines, vision-language reasoning, and RAG to deliver contextual, reliable, and predictable agent behavior across complex user workflows.

TypeScript

Updated 2 months ago

Agentx

Mahir-Baig

❤️45

This project implements an agent with a RAG-first workflow that prioritizes an internal knowledge base and grounding via the Perplexity API. It supports both text-based and speech-based (STT) user prompts and can read generated responses aloud using text-to-speech (TTS).

Python

Updated 2 months ago

SentinAI-Autonomous-Enterprise-Agent

Wimukthi316

❤️45

An Autonomous Multi-Modal AI Agent for Enterprise Intelligence. Features a full MLOps pipeline, RAG with Gemini API, Speech-to-Text (Whisper), and Document Intelligence (LayoutLM). Built with FastAPI, Next.js, and Azure.

Python

Updated 2 months ago

AI-Book-Agent

Samad503

❤️35

Built an interactive AI agent that answers questions from uploaded PDFs using RAG. Implemented text cleaning, PDF parsing, and embeddings with ChromaDB. Integrated ZhipuAI and LangChain tools with a Streamlit web interface and text-to-speech.

Python

Updated 6 months ago

Project-Rejection

SwarnabhG07

🧡55

Hackathon demo branch for Hack & Forge 2026. Contains the complete HireHub interview simulation platform — FastAPI backend, multi-agent RAG pipeline (Gemini + FAISS + sentence-transformers), proctored exam interface with speech-to-text, and candidate dashboard. Run uvicorn main:app --reload to start.

HTML

Updated 3 weeks ago

Math_Mentor

itsnaveenkroy

🧡55

A multimodal AI math tutor that reliably solves JEE-style math problems using a multi-agent pipeline, RAG over a curated knowledge base, and memory-based self-learning. Accepts text, image (OCR), and audio (speech-to-text) input.

Python

Updated 4 weeks ago

serene

anishgillella

❤️45

AI Relationship Mediator Voice Agent. A warm, empathetic AI therapist that helps couples understand each other better through real-time voice calls. Combines speech recognition, LLM reasoning, vector search (RAG), and voice synthesis to bridge emotional and logical communication.

JavaScript

Updated 1 month ago

Study-Assistant-Agent

23f3004092

❤️45

AI-powered podcast generator built using Streamlit, LangChain Agents, Retrieval-Augmented Generation (RAG), and Text-to-Speech (TTS). Users can upload documents (PDF, CSV, TXT, etc.), and the system automatically analyzes the content and generates an engaging two-speaker podcast-style conversation.

Python

Updated 1 month ago
