Official implementation of "TAPO: Dynamic Teacher and Perturbed Answer Injection for Policy Optimization", a fine-grained RL framework for reasoning alignment in LLMs.
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 0
Commits: 5