Found 11 repositories (showing 11)
YangRui2015
Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"
WisdomShell
RewardAnything: Generalizable Principle-Following Reward Models
Akshint0407
This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
AlignRM
ICML'25: The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
zhuohaoyu
RewardAnything: Generalizable Principle-Following Reward Models
jiachenwestlake
Code for "Generalizing reward modeling for out-of-distribution preference learning" in ECML PKDD'2024.
Mikesterner87
This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
yinzhangyue
[ACL'25] The official code repository for DG-PRM: Dynamic and Generalizable Process Reward Modeling
lblaoke
Training generalizable reward models / aligned models
aneeshn
An implementation of the Reward predictive modelling approach introduced in "Reward-predictive representations generalize across tasks in reinforcement learning".
The course teaches how to fine-tune Large Language Models using Reinforcement Learning, specifically GRPO (Group Relative Policy Optimization), instead of supervised labels. Core idea: Don’t tell the model the “correct answer.”