Found 16,855 repositories (showing 30)
modelscope
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
k2-fsa
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages
snakers4
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
smacke
Automagically synchronize subtitles with video.
cjhutto
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
CheshireCC
faster_whisper GUI with PySide6
TEN-framework
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
ricky0123
Voice activity detector (VAD) for the browser with a simple API
FluidInference
Frontier CoreML audio models in your apps: text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
k2-fsa
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.
meizhong986
ASR/STT subtitle generator. Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD. Noise-robust for JAV
hustvl
[ICCV 2023 & ICLR 2026] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
jtkim-kaist
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
amsehili
An audio/acoustic activity detection and audio segmentation tool
rapidaai
Rapida is an open-source, end-to-end voice AI orchestration platform for building real-time conversational voice agents with audio streaming, STT, TTS, VAD, multi-channel integration, agent state management, and observability.
junegunn
A simple Vimscript test framework
dpirch
Voice activity detection (VAD) library, based on WebRTC's VAD engine
soniqo
AI speech toolkit for Apple Silicon: ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML
shashikg
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
0x5446
API and websocket server for sensevoice. It has inherited some enhanced features, such as VAD detection, real-time streaming recognition, and speaker verification.
YihuaJerry
[MM 2025] EventVAD: Training-Free Event-Aware Video Anomaly Detection
DmitryRyumin
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
gkonovalov
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
marsbroshok
Voice Activity Detector in Python
hcmlab
Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks
FireRedTeam
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD supports speech/singing/music in 100+ langs. FireRedLID supports 100+ langs and 20+ zh dialects. FireRedPunc supports zh and en.
gtreshchev
Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.
charliewolfe
Manual mapper that uses PTE manipulation, Virtual Address Descriptor (VAD) manipulation, and forceful memory allocation to hide executable pages. (VAD hide / NX bit swapping)
Thinklab-SJTU
BEVFormer, UniAD, VAD in Closed-Loop CARLA Evaluation with World Model RL Expert Think2Drive
filippogiruzzi
Voice Activity Detection based on Deep Learning & TensorFlow