Found 24 repositories (showing 24)
argilla-io
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
ZJU-REAL
This repository is the official implementation of TimeHC-RL (Distilabel (Data Generation) + TRL (SFT) + VeRL (GRPO)).
AIAnytime
Synthetic Data Generation using LLM via Argilla, Distilabel, ChatGPT, etc.
argilla-io
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
argilla-io
A working repository for experimental pipelines in distilabel
GURPREETKAURJETHRA
Synthetic Data Generation using LLM via Argilla, Distilabel, ChatGPT, etc.
lightonai
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
497429018
Distilabel DeepSeek-R1 model distillation in practice
A repo that implements Stanford CRFM's HELM Instruct with adaptable evaluation criteria
djellalmohamedaniss
A custom Step for LLM API cost calculation for the distilabel library.
johannhartmann
A simple distilabel generation pipeline to create a dataset for inclusivity training for language models.
younghosck
practice distilabel
VidyaPeddinti
No description available
ichikomunikation
No description available
johnmccabe
No description available
burtenshaw
No description available
conda-forge
A conda-smithy repository for distilabel.
burtenshaw
No description available
No description available
yych42
No description available
HammamWahab
No description available
alvaldes
A simplified Distilabel-inspired pipeline for synthetic dataset creation using pandas DataFrames and local Ollama models.
thibaud-perrin
Generate synthetic datasets for instruction tuning and preference alignment using tools like `distilabel` for efficient and scalable data creation.
AreebAhmad-02
A real-time RAG pipeline for research papers: it extracts RAG research papers from arXiv, applies semantic chunking, and fine-tunes an embedding model, with distilabel used to generate the synthetic dataset for fine-tuning.