This repository provides a production-grade implementation of the Reinforcement Learning from Human Feedback (RLHF) pipeline. It mirrors the post-training infrastructure used by major research labs, but is optimized for consumer hardware, including CPU-only environments with no GPU requirement.
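To make the pipeline's core idea concrete, here is a minimal sketch of the RLHF inner loop: sample candidate responses, score them with a reward model, and nudge the policy toward higher-reward outputs (a REINFORCE-style update). All names here (`reward_model`, `rlhf_step`) are illustrative stand-ins, not this repository's API; the actual `rlhf.py` may structure this very differently.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reward_model(response):
    # Stand-in for a learned reward model; here it simply prefers
    # longer responses so the example is self-contained.
    return float(len(response))

def rlhf_step(logits, responses, lr=0.1):
    """One policy-gradient step over a fixed set of candidate responses."""
    probs = softmax(logits)
    rewards = [reward_model(r) for r in responses]
    baseline = sum(p * r for p, r in zip(probs, rewards))  # variance reduction
    # d E[r] / d logit_i under softmax is p_i * (r_i - baseline)
    return [l + lr * p * (r - baseline)
            for l, p, r in zip(logits, probs, rewards)]

responses = ["ok", "a longer, more helpful answer", "short"]
logits = [0.0, 0.0, 0.0]
for _ in range(50):
    logits = rlhf_step(logits, responses)
probs = softmax(logits)
# After training, probability mass concentrates on the highest-reward response.
```

Note that this toy loop runs entirely on CPU with the standard library, which is the spirit of the repository's zero-GPU claim; real RLHF replaces the logit list with a language model's parameters and the toy reward with a trained preference model.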
Recent commits (20 total):
d48fc72  Experimental internal RLHF pipeline now pushed; docs updated. Adds MCTS, A*, hidden CoT, and final test-time compute paradigms.
9cc9e78  Inference optimizations are now RL-tunable at runtime for use with model merging into one set of base weights. Currently tested on, and to be released for, Qwen3-1.7B f16, full MaggiePie 300k, and rlhf.py.
1023259  Refactor checkpoint-saving logic to ensure final checkpoints are created for all training runs; add memory-safe configuration for Qwen3-1.7B VPS and update RLHFOrchestrator to support auto-saving final models.
f9e8f6a  Enhance PagedKVCache with sequence-length tracking and improve MCTSGenerator device handling; add SFT model testing script.
5e91001  Merge branch 'main' of https://github.com/calisweetleaf/Reinforcement-Learning-Full-Pipeline
899e3b5  Fix README badges and remove duplicate DOI badge.
69c5aea  Update README.md after new Zenodo edit fixing the license.
d9f72f3  Merge branch 'main' of https://github.com/calisweetleaf/Reinforcement-Learning-Full-Pipeline