Found 2,654 repositories (showing 30)
QData
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
ray-project
A suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray.
facebookresearch
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in combination with self-training and knowledge-distillation, or for retrieving paraphrases.
ogrisel
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
princeton-nlp
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
IDEA-CCNL
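GTS Engine: A powerful NLU training system. GTS-Engine is an out-of-the-box, high-performance natural language understanding engine that focuses on few-shot tasks and can automatically produce NLP models from only a small number of samples.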
Niger-Volta-LTI
Yorùbá language training text for NLP, ASR and TTS tasks
CLUEbenchmark
High-performance small model evaluation. Shared Tasks in NLPCC 2020. Task 1 - Light Pre-Training Chinese Language Model for NLP Task
This repository contains the code to reconstruct the training dataset from NLP/ML Papers in PDF format together with their corresponding slides.
chrisc36
Methods of training NLP models to ignore biased strategies
AmirhosseinHonardoust
A complete NLP and Machine Learning project to detect fake and real news using TF-IDF and Logistic Regression. Includes full training pipeline, evaluation charts, and an interactive Streamlit web app for real-time credibility analysis. Dataset adapted from Kaggle’s Fake and Real News Dataset.
MrinmoiHossain
The course covers knowledge that is useful for working on deep learning as an engineer. It includes simple neural networks and training, CNNs, autoencoders and feature extraction, transfer learning, RNNs, LSTMs, NLP, data augmentation, GANs, hyperparameter tuning, and model deployment and serving.
NLP experiments: new word discovery + continued pre-training of pre-trained models
Adversarial Training for NLP in Keras
GermanT5
Wikipedia text corpus for self-supervised NLP model training
stanfordnlp
Model training tutorials for the Stanza Python NLP Library
bothub-it
Bothub is an open platform for predicting, training and sharing NLP datasets in multiple languages
ChaitanyaK77
This repository provides a Jupyter Notebook for building a small language model from scratch using the 'TinyStories' dataset. Covers data preprocessing, BPE tokenization, binary storage, GPU memory management, and training a Transformer in PyTorch. Generate sample stories to test your model. Ideal for learning NLP and PyTorch.
georgebrock
Tools for training Stanford NLP's NER models
ksdkamesh99
A natural language processing model, trained on over 100,000 (1 lakh) names, that predicts a person's gender from their first name. The model is built with Long Short-Term Memory (LSTM), a variant of the recurrent neural network. It reaches a training accuracy of 99.35% and, when tested over 11,000 samples, a test accuracy of 89.08%, which is quite high in NLP for out-of-sample test cases.
rohanmistry231
A comprehensive collection of 50,000 prompts for AI model training and prompt engineering, designed to enhance NLP model performance and creativity. Includes categorized prompts and tools for generating, testing, and optimizing prompts for various AI applications.
QData
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)
Preprocessing of a dataset of 347 TV series subtitles (thanks to Taiga Corpus) for building a word2vec model or a JamSpell model, for neural network or chatbot training, or for any other NLP task.
dwhitena
Materials for the "Modern NLP: Pre-training, Fine-tuning, Prompt Engineering, and Human Feedback" workshop at ODSC East 2023
AaronGrainer
A simple project training 3 separate NLP tasks simultaneously using Multitask-Learning
Contains code for training NLP models that take in text and predict concepts and keywords from a list of standardized NASA keywords. Code for the API that uses models trained by this repo is in the `concept-tagging-api` repository.
ARBML
A simple strategy for training and finetuning NLP models for Arabic. Specify the parameters and just wait for the results. A simple design that makes use of the different tools in our NLP pipeline.
avacaondata
Python library for automatic training, optimization and comparison of Transformer models on most NLP tasks.
flozi00
An open-source NLP-as-a-service project focused on providing state-of-the-art systems with ease. Training and inference via simple Docker commands
Droidtown
NLP Training/Teaching Materials with Articut