Found 1 repositories(showing 1)
nschlaepfer
A multi-stage pipeline that enhances Qwen2.5 language models with DeepSeek Reasoner's chain-of-thought capabilities. Implements the DeepSeek-R1 methodology through cold-start SFT, reasoning-oriented RL, rejection sampling, and optional model distillation.
All 1 repositories loaded