Found 73,132 repositories (showing 30)
ggml-org
Port of OpenAI's Whisper model in C/C++
coqui-ai
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
mozilla
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
SYSTRAN
Faster Whisper transcription with CTranslate2
m-bain
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
cjpais
A free, open source, and extensible speech-to-text application that works completely offline.
leon-ai
🧠 Leon is your open-source personal assistant.
NVIDIA-NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
jianchang512
Translates videos from one language to another and embeds dubbing and subtitles.
modelscope
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
kaldi-asr
kaldi-asr/kaldi is the official location of the Kaldi project.
alphacep
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
PaddlePaddle
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
speechbrain
A PyTorch-based Speech Toolkit
k2-fsa
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, and websocket server/client; supports 12 programming languages.
Zackriya-Solutions
Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization, built in Rust. 100% local processing; no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 self-hosted, open-source AI meeting note taker for macOS & Windows.
rhasspy
A fast, local neural text to speech system
rany2
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
mozilla
🤖💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
QuentinFuxa
Simultaneous speech-to-text models
KoljaB
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Uberi
Speech recognition module for Python, supporting several engines and APIs, online and offline.
jasonppy
Zero-Shot Speech Editing and Text-to-Speech in the Wild
nl8590687
A Deep-Learning-Based Chinese Speech Recognition System
FunAudioLLM
Multilingual Voice Understanding Model
jaywalnut310
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
jianchang512
A simple native web interface that uses ChatTTS to synthesize text into speech, along with support for external API interfaces.
myshell-ai
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
Zyphra
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with, or even surpassing, top TTS providers.