Found 428 repositories (showing 30)
mravanelli
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
facebookresearch
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
facebookresearch
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
szechyjs
Digital Speech Decoder
kensho-technologies
A fast and lightweight python-based CTC beam search decoder for speech recognition.
facebookresearch
Training and evaluation pipeline for MEG and EEG brain signal encoding and decoding using deep learning. Code for our paper "Decoding speech perception from non-invasive brain recordings" published in Nature Machine Intelligence, 2023.
daanzu
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
f4exb
Digital Speech Decoder (DSD) rewritten as a C++ library
vrenkens
Speech recognition software where the neural net is trained with TensorFlow, and GMM training and decoding are done in Kaldi
argilo
GNU Radio block for Digital Speech Decoder
flinkerlab
No description available
ictnlp
A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.
UCSF-Chang-Lab-BRAVO
Code associated with the paper titled "A high-performance neuroprosthesis for speech decoding and avatar control", published in Nature in 2023.
manyeyes
C# library for decoding Paraformer and SenseVoice models, used in speech recognition (ASR)
chrisenytc
A speech recognition API service to decode audio to text
ravising-h
Image Processing, Speech Processing, Encoder Decoder, Research Paper implementation
idiap
Juicer is a Weighted Finite State Transducer (WFST) based decoder for Automatic Speech Recognition (ASR).
menon92
Transformer based Bangla Speech Recognition | Encoder Decoder Architecture
jgmakin
Code for decoding speech as text from neural data
UFAL-DSG
Online decoder for Kaldi NNET2 and GMM speech recognition models with Python bindings.
alibabasglab
This repository contains the audio samples for "D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement", which is submitted to ICASSP 2023.
felixperfler
[Interspeech 2024] Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
mravanelli
This code implements a basic MLP for speech recognition. The MLP is trained with PyTorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
NeuSpeech
Decode neural signals as speech
mravanelli
THEANO-KALDI-RNNs is a project implementing various Recurrent Neural Networks (RNNs) for RNN-HMM speech recognition. The Theano code is coupled with the Kaldi decoder.
zhangzihan-is-good
We constructed an EEG dataset based on imagined speech and performed semantic decoding on it.
mjhydri
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. Then, it uses an HMM decoder to infer singing beats and tempo.
laysent
node-gyp version of the Silk Speech Codec, able to decode/encode audio from/to the Silk format (widely used by Tencent apps such as WeChat/WeiXin and QQ)
llm-jp
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequences of interleaved semantic and acoustic tokens.
lee-jhwn
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals (Interspeech 2024)