Search Results

Found 11 repositories (showing 11)

Generalizable-Reward-Model

YangRui2015

🧡65

Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"

MIT

Python

Updated 47 minutes ago

RewardAnything

WisdomShell

❤️40

RewardAnything: Generalizable Principle-Following Reward Models

NOASSERTION

Python

Updated 3 months ago

alignment, evaluation, grpo, +4

Nano-R1

Akshint0407

🧡60

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.

Apache-2.0

Jupyter Notebook

Updated 3 weeks ago

adapters, grpo, huggingface, +7

Generalizable-MM-RM

AlignRM

❤️20

ICML'25: The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models

Python

Updated 9 months ago

RewardAnything

zhuohaoyu

❤️35

RewardAnything: Generalizable Principle-Following Reward Models

NOASSERTION

HTML

Updated 7 months ago

alignment, evaluation, llm, +5

OODPL

jiachenwestlake

❤️20

Code for "Generalizing reward modeling for out-of-distribution preference learning" in ECML PKDD'2024.

Updated 9 months ago

Nano-R1

Mikesterner87

💛70

This project demonstrates the process of fine-tuning the Qwen2.5-3B-Instruct model using GRPO (Generalized Reward Policy Optimization) on the GSM8K dataset.

Apache-2.0

Jupyter Notebook

Updated 1 hour ago

adapters, build, grpo, +11

DG-PRM

yinzhangyue

❤️40

[ACL' 25] The official code repository for DG-PRM: Dynamic and Generalizable Process Reward Modeling

Apache-2.0

Python

Updated 8 months ago

trl_training_toolkit

lblaoke

❤️35

Training generalizable reward models / aligned models

Python

Updated 8 months ago

reward_predictive_modelling

aneeshn

❤️35

An implementation of the Reward predictive modelling approach introduced in "Reward-predictive representations generalize across tasks in reinforcement learning".

Python

Updated 5 years ago

reinforcement-learning, reinforcement-learning-algorithms, reinforcement-learning-environments

Reinforcement-Finetuning-LLMs-with-GRPO-Reasoning-Models

sayedRaheel

❤️45

The course teaches how to fine-tune Large Language Models using Reinforcement Learning, specifically GRPO (Generalized Reward-Policy Optimization), instead of supervised labels. Core idea: don’t tell the model the “correct answer.”

Jupyter Notebook

Updated 2 months ago
