Rubric Reward Model to reduce “miracle steps” and unfaithful CoT in math; SFT+PPO training and verified evaluation.
Stars: 9
Forks: 0
Watchers: 9
Open Issues: 1
Overall repository health assessment
No package.json found; this might not be a Node.js project.
24 commits