Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Stars: 1.6k
Forks: 132
Watchers: 1.6k
Open Issues: 18
2c1799f fix(algorithms/ppo_lag): update KL-penalty term coefficient (#173)
7ba1417 feat(models/score_model): add score model support for Gemma/Mistral/Phi/Qwen2 (#170)
acc00fc deps(transformers): pin `transformers` minimum version to 4.37 (#163)
b9a7b4d lint: appease warnings for DeepSpeed integration in `transformers`
e7aac24 fix(models/pretrained): fix resizing embeddings under ZeRO-3 (#158)
aaca045 chore(models/score_model): remove unused arguments in `ScoreModel.forward()`
4b56149 lint(models/score_model): fix type hints for `ScoreModel`s
cc17d62 docs(models/score_model): fix docstring for `ScoreModel`s
8af44bd refactor(trainers): improve end indices calculation (#157)