Found 10,619 repositories (showing 30)
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
RVC-Boss
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
huggingface
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
RVC-Project
Easily train a good VC model with voice data <= 10 mins!
fishaudio
SOTA Open Source TTS
svc-develop-team
SoftVC VITS Singing Voice Conversion
lukas-blecher
pix2tex: Using a ViT to convert images of equations into LaTeX code.
k2-fsa
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, websocket server/client, support 12 programming languages
open-mmlab
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
voicepaw
so-vits-svc fork with realtime support, improved interface and more features.
fishaudio
vits2 backbone with multilingual-bert
jaywalnut310
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Plachtaa
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
innnky
A singing voice timbre conversion model based on VITS and SoftVC
IAHispano
A simple, high-quality voice conversion tool focused on ease of use and performance.
PlayVoice
Core Engine of Singing Voice Conversion & Singing Voice Clone
CjangCjengh
Executable file for VITS inference
High-Logic
GPT-SoVITS ONNX Inference Engine & Model Converter
lightly-ai
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Voine
A mobile chat app for an anime-style AI waifu
innnky
An emotion-controllable speech synthesis model based on VITS that requires no emotion annotations
PlayVoice
Best-practice TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!
yitu-opensource
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
PriesiaMioShirakana
A C++ inference library for multiple SVC/TTS models
baofff
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
THU-MIG
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
Artrajz
A simple VITS HTTP API, developed by extending Moegoe with additional features.
luoyily
Speech synthesis model / inference GUI repo for galgame characters based on Tacotron2, HiFi-GAN, VITS and Diff-SVC
daixiangzi
A paper list of recent work on token compression for ViT and VLM
lukemelas
Vision Transformer (ViT) in PyTorch