Found 302 repositories (showing 30)
modelscope
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
lsdefine
A very simple GRPO implementation for reproducing r1-like LLM thinking.
ARahim3
Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.
datawhalechina
🎓 A systematic course on building large language models | 🛠️ Covers pre-training data engineering, tokenizers, Transformers, MoE, GPU programming (CUDA/Triton), distributed training, scaling laws, inference optimization, and alignment (SFT/RLHF/GRPO) | 🚀 6 progressive assignments, code-driven, building full-stack understanding of LLMs
Doriandarko
A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon.
TYH-labs
Zero-friction LLM fine-tuning skill for Claude Code, Gemini CLI & any ACP agent. Unsloth on NVIDIA · TRL+MPS/MLX on Apple Silicon. Automates env setup, LoRA training (SFT, DPO, GRPO, vision), post-hoc GRPO log diagnostics, evaluation, and export end-to-end. Part of the Gaslamp AI platform.
zht8506
Implement popular LLM post-training algorithms (SFT, DFT, DPO, GRPO, etc.) in PyTorch with easy code!
Oxen-AI
This repository contains code for fine-tuning LLMs with GRPO, specifically for Rust programming, using cargo as the feedback signal.
waltonfuture
[NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
Ruijian-Zha
🚀 A New DAPO Algorithm for Stock Trading (arXiv:2505.06408) Implementation of our IEEE IDS 2025 accepted algorithm combining Dynamic Sampling Policy Optimization (DAPO), Group Relative Policy Optimization (GRPO), and LLM-driven risk/sentiment signals for efficient and profitable stock trading on the NASDAQ-100 index.
mkurman
Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluations.
OpenMOSS
Official implementation of BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning. BandPO replaces canonical clipping (PPO/GRPO) with dynamic bounds to resolve exploration bottlenecks and prevent entropy collapse.
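For context on the canonical clipping that BandPO replaces, here is a minimal sketch of the PPO/GRPO per-token clipped surrogate in plain Python (illustrative values only; `eps` is the fixed clip range that BandPO's dynamic, probability-aware bounds would substitute):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Canonical PPO/GRPO per-token objective: take the pessimistic
    (minimum) of the unclipped and clipped policy-ratio terms."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

# ratio = pi_new(token) / pi_old(token); advantage from the reward signal.
obj = clipped_surrogate(ratio=1.5, advantage=1.0)  # ratio clipped to 1.2
```

With a positive advantage, the objective caps how far the ratio can push the update (here at 1.2); with a negative advantage, the min picks the more pessimistic clipped term. A fixed `eps` is exactly the bottleneck BandPO's dynamic bounds aim to relax.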
This repository contains a pipeline for fine-tuning Large Language Models (LLMs) for Text-to-SQL conversion using Group Relative Policy Optimization (GRPO).
zz1358m
Code for the SofT-GRPO algorithm on the LLM soft-thinking reasoning pattern.
axolotl-ai-cloud
A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.
yaochenzhu
(ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Infatoshi
Train an LLM to play Mafia via GRPO
iBacklight
PipelineLLM is a systematic LLM post-training learning project, covering the full stack from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.
EsmaeilNarimissa
No description available
JIA-Lab-research
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
Bader-CN
Notes on the complete basic LLM training pipeline (Tokenizer -> PreTraining -> SFT -> DPO -> GRPO)
A tutorial on fine-tuning LLMs with the GRPO algorithm
Minami-su
This repository, deepspeed-grpo-qlora-vllm, provides a complete framework for fine-tuning LLMs using Group Relative Policy Optimization (GRPO) on 4-bit quantized models (QLoRA). It uses DeepSpeed ZeRO-3 for scalable training and integrates with a vLLM server to dynamically serve the fine-tuned LoRA adapters.
Azzedde
Intelligent web discovery agent with LLM-powered planning, multi-source search, smart deduplication, and GRPO preference dataset collection. Autonomously searches, analyzes, and summarizes web content while building training data for model fine-tuning.
rraghavkaushik
A curated collection of NLP and LLM resources. Covers essential papers and blogs on Transformers, Reinforcement Learning (RLHF, DPO, GRPO), Mechanistic Interpretability, Scaling Laws, and MLSys.
SuienS
A toolkit for fine-tuning Large Language Models (LLMs) to generate Manim animation code using Supervised Fine-Tuning (SFT) and Visually Grounded Reinforcement Learning using Group Relative Policy Optimisation (GRPO/GSPO) techniques.
Thrillcrazyer
"Thinking is Process." Leverage Process Mining Technique for LLM Reinforcement Learning. Official Repository of "Reasoning-Aware GRPO using Process Mining"
kechirojp
GRPO (Group Relative Policy Optimization) implementation for Stable Baselines3. Drop-in PPO replacement with instant action comparison. Easy pip install, full API compatibility. GRPO is the algorithm DeepSeek used for LLM training.
RFT with GRPO: RFT adapts LLMs to complex reasoning tasks such as math and coding via RL, letting models develop their own strategies rather than mimicking examples as in SFT. GRPO, an RL algorithm tailored to this setting, excels on tasks with verifiable outcomes and works well with small datasets.
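The group-relative idea behind GRPO can be sketched in a few lines of plain Python (reward values and group size below are illustrative): for each prompt, sample a group of completions, score each with a verifiable reward (e.g. 1.0 if the final answer checks out), and normalize rewards within the group to get per-completion advantages, with no learned value network.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group.

    rewards: scalar rewards for G sampled completions of one prompt.
    Returns one advantage per completion; the group mean serves as the
    baseline instead of a critic/value model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions of one math prompt; reward 1.0 when the final
# answer verified as correct, else 0.0 (made-up values).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, purely from the within-group comparison, which is why the method pairs naturally with verifiable-outcome rewards and small datasets.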
XiaomingX
🚀 Project mission: bridging the gap between algorithm theory and engineering practice. This project is a full-stack deep learning and reinforcement learning algorithm lab designed for Chinese-speaking developers. Through modern PyTorch reimplementations of cutting-edge algorithms such as GPT-2, RLHF, MuZero, and alignment methods (GRPO, Weak-to-Strong), it aims to provide a "what you see is what you get" baseline for learning and research. Core differentiators: Full-stack rewrite: a clean break from unmaintained TensorFlow 1.x / JAX legacy code, fully embracing the PyTorch 2.x ecosystem. Closed loop from theory to practice: every line of core logic carries detailed Chinese comments mapped directly to the equations in the papers. Alignment-forward: early integration of key LLM alignment algorithms such as GRPO (DeepSeek) and Weak-to-Strong (OpenAI).