A multi-stage pipeline that enhances Qwen2.5 language models with DeepSeek Reasoner's chain-of-thought capabilities. Implements the DeepSeek-R1 methodology through cold-start SFT, reasoning-oriented RL, rejection sampling, and optional model distillation.
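The rejection-sampling stage can be illustrated with a minimal sketch. This is not the repository's code. It assumes a hypothetical `generate(prompt, n)` sampler and a hypothetical `extract_answer` parser. For each prompt it draws several chain-of-thought completions and keeps only those whose final answer matches a reference answer. The surviving pairs can then serve as data for a further SFT round.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    completion: str


def rejection_sample(
    prompts: List[str],
    references: List[str],
    generate: Callable[[str, int], List[str]],  # hypothetical sampler: (prompt, n) -> n completions
    extract_answer: Callable[[str], str],       # hypothetical parser for the final answer
    num_samples: int = 4,
) -> List[Sample]:
    """Keep only sampled completions whose extracted final answer matches the reference."""
    kept: List[Sample] = []
    for prompt, ref in zip(prompts, references):
        for completion in generate(prompt, num_samples):
            if extract_answer(completion).strip() == ref.strip():
                kept.append(Sample(prompt, completion))
    return kept


# Toy stand-ins for a real model and answer parser (illustration only).
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"<think>...</think> Answer: {i}" for i in range(n)]


def last_answer(text: str) -> str:
    return text.rsplit("Answer:", 1)[-1]


data = rejection_sample(["What is 1+1?"], ["2"], toy_generate, last_answer, num_samples=4)
```

With the toy sampler, only the completion ending in `Answer: 2` survives. In practice the correctness check would be a task-specific verifier rather than exact string matching.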
Stars: 11 | Forks: 3 | Watchers: 11 | Open issues: 1
17 commits. Most recent:
0528106 Merge pull request #4 from nschlaepfer/codex/implement-improvements-for-mlx-based-grpo-trainer
4f4c8f8 Merge pull request #3 from nschlaepfer/codex/refactor-pipeline-for-new-models-integration
0842102 Merge pull request #2 from nschlaepfer/codex/clean-up-repository-and-update-readme
31335a1 updated with superprompt system. (anthropic partial expansion to the CoT from r1)