Found 11,484 repositories (showing 30)
snorkel-team
A system for quickly generating training data with weak supervision
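Weak supervision replaces hand labels with many noisy heuristics ("labeling functions") whose votes are combined into training labels. The snippet below is a minimal sketch of that idea. It assumes a hypothetical spam/ham task with three made-up labeling functions and combines them with a plain majority vote. It is not Snorkel's API: Snorkel itself learns a label model that weights labeling functions by their estimated accuracies rather than counting votes equally.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = -1  # a labeling function returns this when it has no opinion
SPAM, HAM = 1, 0  # hypothetical label set for illustration


# Hypothetical labeling functions: each encodes one noisy heuristic.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_contains_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN


def majority_vote(text: str, lfs: List[Callable[[str], int]]) -> Optional[int]:
    """Combine non-abstaining votes; return None if every LF abstains or the top votes tie."""
    counts = Counter(v for v in (lf(text) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return None  # no heuristic fired: leave the example unlabeled
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between labels: leave the example unlabeled
    return ranked[0][0]


LFS = [lf_contains_link, lf_contains_free, lf_short_reply]
```

For example, `majority_vote("thanks, see you", LFS)` gets a single HAM vote from `lf_short_reply` and returns `HAM`, while a six-word sentence with no link and no "free" gets no votes and stays unlabeled.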
NVIDIA
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
facebookresearch
A data augmentations library for audio, image, text, and video.
makcedward
Data augmentation for NLP
ZhaoJ9014
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
QData
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
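WebDataset stores data as POSIX tar archives ("shards"). All files belonging to one sample share a basename key and differ only in extension, so a shard can be consumed with purely sequential reads. The sketch below illustrates that layout using only the standard-library `tarfile` module. `write_shard` and `read_shard` are hypothetical helpers, not the webdataset API, and keys are assumed to contain no dots.

```python
import io
import tarfile
from typing import Dict, Iterator, List, Optional, Tuple

Sample = Tuple[str, Dict[str, bytes]]  # (key, {extension: payload})


def write_shard(path: str, samples: List[Sample]) -> None:
    """Write each sample as consecutive tar members named '<key>.<ext>' (hypothetical helper)."""
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            for ext, data in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def read_shard(path: str) -> Iterator[Sample]:
    """Stream members in order and regroup consecutive files that share a key."""
    current_key: Optional[str] = None
    fields: Dict[str, bytes] = {}
    with tarfile.open(path, "r") as tar:
        for member in tar:  # sequential iteration; no random access needed
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")  # assumes keys contain no dots
            if current_key is not None and key != current_key:
                yield current_key, fields  # key changed: previous sample is complete
                fields = {}
            current_key = key
            fields[ext] = tar.extractfile(member).read()
    if current_key is not None:
        yield current_key, fields
```

Grouping by consecutive key, rather than building an index, is what allows a reader to stream shards from disk or over the network without seeking.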
TorchIO-project
Medical image processing for AI applications.
iver56
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
google-research
Unsupervised Data Augmentation (UDA)
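UDA trains on unlabeled data with a consistency term: the model's prediction on an augmented example is pulled toward its prediction on the original example, which acts as a fixed target. The pure-Python sketch below computes that term from logits, including the prediction sharpening (softmax temperature) and confidence-based masking described in the UDA paper. The default temperature (0.4) and threshold (0.8) are illustrative assumptions. Averaging over the whole batch, so that masked examples count as zero, is also a choice made here. In an autodiff framework the target would additionally be wrapped in a stop-gradient.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def uda_consistency_loss(
    orig_logits: List[List[float]],
    aug_logits: List[List[float]],
    temperature: float = 0.4,  # illustrative sharpening temperature
    confidence_threshold: float = 0.8,  # illustrative masking threshold
) -> float:
    """Mean over the batch of KL(sharpened p(y|x) || p(y|x_aug)); low-confidence rows contribute 0."""
    total = 0.0
    for orig, aug in zip(orig_logits, aug_logits):
        if max(softmax(orig)) < confidence_threshold:
            continue  # confidence masking: skip uncertain unlabeled examples
        target = softmax(orig, temperature)  # sharpened target, treated as a constant
        total += kl_divergence(target, softmax(aug))
    return total / len(orig_logits)
```

In training, this term is added (with a weight) to the ordinary supervised cross-entropy on the labeled batch.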
facebookresearch
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, that further improve the model's performance and generalization.
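Waveform-domain augmentation operates on raw samples rather than spectrograms. Below is a minimal pure-Python sketch of two such augmentations. It assumes each training example is available as separate clean and noise signals. `random_shift` crops a clean/noisy pair at the same random offset. `remix` builds new noisy inputs by pairing each clean signal with the noise from another example in the batch. Both functions are hypothetical illustrations, not code from the denoiser repository.

```python
import random
from typing import List, Sequence, Tuple

Signal = List[float]


def random_shift(
    clean: Sequence[float], noisy: Sequence[float], max_shift: int, rng: random.Random
) -> Tuple[Signal, Signal]:
    """Crop both signals at the same random offset; outputs have length len(clean) - max_shift."""
    assert len(clean) == len(noisy) and len(clean) > max_shift
    offset = rng.randint(0, max_shift)  # inclusive on both ends
    length = len(clean) - max_shift
    return list(clean[offset:offset + length]), list(noisy[offset:offset + length])


def remix(
    clean_batch: List[Signal], noise_batch: List[Signal], rng: random.Random
) -> List[Tuple[Signal, Signal]]:
    """Shuffle noise across the batch and return (noisy_input, clean_target) pairs."""
    perm = list(range(len(noise_batch)))
    rng.shuffle(perm)
    pairs = []
    for clean, j in zip(clean_batch, perm):
        noisy = [c + n for c, n in zip(clean, noise_batch[j])]  # additive noise model
        pairs.append((noisy, clean))
    return pairs
```

Shifting both signals by the same offset keeps input and target aligned sample-for-sample, which a waveform-domain loss requires.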
425776024
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
visual-layer
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.
jasonwei20
Data augmentation for NLP, presented at EMNLP 2019
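The EMNLP 2019 paper behind this repository (EDA: Easy Data Augmentation) uses four word-level operations: synonym replacement, random insertion, random swap, and random deletion. The sketch below implements only the two that need no thesaurus, random swap and random deletion. A hypothetical `augment` helper alternates between them, scaling the number of swaps with sentence length (n = α·l, as in the paper). The names `alpha`, `n_aug`, and `seed` are assumptions for illustration, not the repository's interface.

```python
import random
from typing import List


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word independently with probability p, keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen words, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def augment(sentence: str, alpha: float = 0.1, n_aug: int = 4, seed: int = 0) -> List[str]:
    """Hypothetical helper: produce n_aug variants, alternating swap and deletion."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of swaps scales with sentence length
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            variants.append(" ".join(random_swap(words, n, rng)))
        else:
            variants.append(" ".join(random_deletion(words, alpha, rng)))
    return variants
```

Both operations preserve most of the original tokens, which is why the label of the source sentence can usually be reused for each variant.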
AgaMiko
List of useful data augmentation resources. You will find here some uncommon techniques, libraries, links to GitHub repos, papers, and more.
yongzhuo
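Natural language processing (NLP): Xiaojiang Robot (a retrieval-based chit-chat chatbot), BERT sentence vectors and similarity (Sentence Similarity), XLNet sentence vectors and similarity (text xlnet embedding), text classification, entity extraction (NER, BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), synonymous-sentence and synonym generation, sentence-backbone extraction (mainpart), Chinese short-text similarity, text feature engineering, and keras-http-service calls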
Code for TKDE paper "Self-supervised learning on graphs: Contrastive, generative, or predictive"
zhanlaoban
An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese text; NLP data augmentation; paper reading notes.
mit-han-lab
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
MIC-DKFZ
A framework for data augmentation for 2D and 3D image classification and segmentation
Paperspace
Data Augmentation For Object Detection
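The main difference from classification augmentation is that every geometric transform must be applied to the bounding boxes as well as to the pixels. A minimal sketch for horizontal flipping follows. It assumes boxes are `(x_min, y_min, x_max, y_max)` with edges in continuous pixel coordinates `[0, W]` and images stored row-major as `image[y][x]`. The `hflip` function is a hypothetical helper, not the repository's API.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), edges in [0, W] x [0, H]
Image = List[List[int]]  # row-major: image[y][x]


def hflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror the image left-right and remap each box so it still covers the same pixels."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    # Pixel column x moves to width - 1 - x, so the edge interval [x_min, x_max)
    # becomes [width - x_max, width - x_min); y coordinates are unchanged.
    new_boxes = [(width - x_max, y_min, width - x_min, y_max) for x_min, y_min, x_max, y_max in boxes]
    return flipped, new_boxes
```

For a 1×4 image `[[1, 2, 3, 4]]` with the box `(0, 0, 1, 1)` around the pixel valued 1, the flip yields `[[4, 3, 2, 1]]` and the box `(3, 0, 4, 1)`, which still encloses that pixel.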
iver56
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
quqxui
Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)
NVIDIA-NeMo
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
goru001
Natural Language Toolkit for Indic Languages, which aims to provide out-of-the-box support for various NLP tasks that an application developer might need
styfeng
Collection of papers and resources for data augmentation for NLP.
JasonLiTW
Simple CNN-based solver for Taiwan Railway (TRA) ticket-booking CAPTCHAs, with a training-set generator that imitates the CAPTCHA style and applies data augmentation
CrazyVertigo
This is a list of awesome methods for data augmentation.
zhunzhong07
Random Erasing Data Augmentation. Experiments on CIFAR-10, CIFAR-100, and Fashion-MNIST.
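Random Erasing works as follows. With probability p, it picks a rectangle whose area is a random fraction of the image (between s_l and s_h) and whose aspect ratio lies between r_1 and 1/r_1. It then overwrites that rectangle's pixels with random values. A minimal single-channel sketch in pure Python follows. The defaults (p = 0.5, s_l = 0.02, s_h = 0.4, r_1 = 0.3) should be treated as assumptions. The `fill` argument (a constant instead of random values) and the `max_attempts` cap are additions made here for reproducibility, not part of the original description.

```python
import math
import random
from typing import List, Optional

Image = List[List[float]]  # single channel, image[y][x]


def random_erasing(
    image: Image,
    p: float = 0.5,  # probability of erasing at all
    sl: float = 0.02,  # min erased area as a fraction of the image
    sh: float = 0.4,  # max erased area as a fraction of the image
    r1: float = 0.3,  # aspect ratio sampled from [r1, 1 / r1]
    rng: Optional[random.Random] = None,
    fill: Optional[float] = None,  # constant fill value; None means uniform random in [0, 1)
    max_attempts: int = 100,
) -> Image:
    """Return a copy of image with one random rectangle erased (with probability p)."""
    rng = rng or random.Random()
    out = [row[:] for row in image]  # never modify the caller's image
    if rng.random() >= p:
        return out  # no erasing with probability 1 - p
    h, w = len(out), len(out[0])
    for _ in range(max_attempts):
        target_area = rng.uniform(sl, sh) * h * w
        ratio = rng.uniform(r1, 1 / r1)
        eh = int(round(math.sqrt(target_area * ratio)))  # rectangle height
        ew = int(round(math.sqrt(target_area / ratio)))  # rectangle width
        if 0 < eh < h and 0 < ew < w:  # resample until the rectangle fits
            y0 = rng.randint(0, h - eh)
            x0 = rng.randint(0, w - ew)
            for y in range(y0, y0 + eh):
                for x in range(x0, x0 + ew):
                    out[y][x] = rng.random() if fill is None else fill
            return out
    return out  # give up after max_attempts and return the unmodified copy
```

Because the rectangle's size and shape are resampled on every call, the model sees many differently occluded versions of each training image, which is the source of the regularization effect.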