Found 1 repository (showing 1)
YouliangYuan
Rubric Reward Model to reduce “miracle steps” and unfaithful chain-of-thought (CoT) in math; SFT + PPO training and verified evaluation.