This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
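To make the idea concrete, here is a minimal sketch of the two pieces a GRPO setup on GSM8K typically needs: a correctness reward and the group-relative advantage. It is not taken from this repository. The function names (`extract_gsm8k_answer`, `correctness_reward`, `group_relative_advantages`) are hypothetical, and the sketch assumes that model completions are prompted to end with the same `#### <number>` marker that GSM8K reference solutions use, and that rewards are normalized with the population standard deviation.

```python
import re
from statistics import mean, pstdev


def extract_gsm8k_answer(text):
    """Return the number after the first '####' marker as a string, or None."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", text)
    if match is None:
        return None
    # GSM8K answers may contain thousands separators such as "1,000".
    return match.group(1).replace(",", "")


def correctness_reward(completion, reference):
    """Hypothetical reward: 1.0 if the extracted answers match exactly, else 0.0."""
    predicted = extract_gsm8k_answer(completion)
    target = extract_gsm8k_answer(reference)
    return 1.0 if predicted is not None and predicted == target else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    GRPO samples several completions per prompt and uses this within-group
    score in place of a learned value baseline. Using the population
    standard deviation here is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled completions (made-up strings).
reference = "She sold 48 + 24 = 72 clips. #### 72"
group = [
    "48 + 24 = 72 #### 72",
    "48 + 22 = 70 #### 70",
    "Half of 48 is 24, so 72 in total. #### 72",
    "I am not sure.",
]
rewards = [correctness_reward(c, reference) for c in group]
advantages = group_relative_advantages(rewards)
```

Correct completions in the group receive a positive advantage and incorrect ones a negative advantage. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.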
Stars: 2
Forks: 1
Watchers: 2
Open Issues: 0
20 commits
3 commits