Search Results

Found 3 repositories (showing 3)

data-augmentation-with-paraphrase-llms

YEnesK

❤️40

Implementation of an LLM-based text-augmentation pipeline that enlarges the IMDB and AG News datasets with PEGASUS/T5 paraphrasers, embeds all texts with all-MiniLM-L6-v2, trains an MLP classifier, and reports accuracy gains of up to about 10 percentage points from augmented train/test ensembles.

MIT

Jupyter Notebook

Updated 8 months ago

DAugSindhi

rajavavek

❤️40

DAugSindhi addresses the challenges that limited annotated datasets pose for Sindhi text classification in Natural Language Processing (NLP). The study uses data augmentation techniques such as Easy Data Augmentation (EDA), Back Translation, Paraphrasing, and Text Generation with Large Language Models (LLMs) to artificially expand the dataset.

Apache-2.0

Jupyter Notebook

Updated 9 months ago

HSOP

mojing122

❤️30

HSOP (Hidden State Optimized Prompt-tuning) is a new framework that combines soft prompt-tuning with hidden state-based data augmentation for more reliable detection. HSOP uses a small-scale LLM (≤4B parameters) to generate paraphrased versions of each input text as augmented samples.

Python

Updated 5 months ago
