Found 72,989 repositories (showing 30)
unslothai
Unsloth Studio is a web UI for training and running open models like Qwen3.5, Gemma 4, DeepSeek, gpt-oss locally.
RVC-Boss
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
coqui-ai
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
2noise
A generative speech model for daily dialogue.
babysor
🚀 Clone a voice in 5 seconds to generate arbitrary speech in real time
myshell-ai
Instant voice cloning by MIT and MyShell. Audio foundation model.
mozilla
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
FunAudioLLM
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
cjpais
A free, open source, and extensible speech-to-text application that works completely offline.
nari-labs
A TTS model capable of generating ultra-realistic dialogue in one pass.
leon-ai
🧠 Leon is your open-source personal assistant.
NVIDIA-NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
jianchang512
Translate the video from one language to another and embed dubbing & subtitles.
modelscope
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
PaddlePaddle
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
k2-fsa
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, and a websocket server/client, with support for 12 programming languages.
rhasspy
A fast, local neural text to speech system
rany2
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
mozilla
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
QuentinFuxa
Simultaneous speech-to-text models
espnet
End-to-End Speech Processing Toolkit
open-mmlab
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
KoljaB
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
jasonppy
Zero-Shot Speech Editing and Text-to-Speech in the Wild
netease-youdao
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Plachtaa
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
jaywalnut310
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
jianchang512
A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an API for external use.
myshell-ai
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.