GitHub Explorer

by Alexey Ratnikov

safe-rlhf

PKU-Alignment•PUBLIC

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Topics: ai-safety, alpaca, beaver, datasets, deepspeed, gpt

Apache License 2.0

Created on May 15, 2023

Updated on Apr 7, 2026

Stars: 1.6k

Forks: 132

Watchers: 1.6k

Open Issues

Repository Health Score

68/100 (Fair)

Overall repository health assessment

Score Breakdown

Activity: 30/30 (100%)

Active development - updated this week

Issues Analytics

Total Issues

All time

Open

29% of total

Closed

Recent Commits

docs(README.md): release PKU-SafeRLHF datasets (#178) • Jiaming Ji • 1 year ago • e8cca16

docs(README.md): update model URLs • Xuehai Pan • 1 year ago • 82743cc

chore(pre-commit): update pre-commit hooks • Xuehai Pan • 1 year ago • 9c89721

fix(algorithms/ppo_lag): update KL-penalty term coefficient (#173) • Xuehai Pan • 1 year ago • 2c1799f

feat(models/score_model): add score model support for Gemma/Mistral/Phi/Qwen2 (#170) • Xuehai Pan • 2 years ago • 7ba1417

docs(README.md): update citations • Xuehai Pan • 2 years ago • 8e6c8ee

chore: update license header • Xuehai Pan • 2 years ago • 37267b8

chore(pre-commit): update pre-commit hooks • Xuehai Pan • 2 years ago • ae4727b

deps(transformers): pin `transformers` minimum version to 4.37 (#163) • Xuehai Pan • 2 years ago • acc00fc

lint: appease warnings for DeepSpeed integration in `transformers` • Xuehai Pan • 2 years ago • b9a7b4d

fix(models/pretrained): fix resizing embeddings under ZeRO-3 (#158) • Xuehai Pan • 2 years ago • e7aac24

chore(models/score_model): remove unused arguments in `ScoreModel.forward()` • Xuehai Pan • 2 years ago • aaca045

lint(models/score_model): fix type hints for `ScoreModel`s • Xuehai Pan • 2 years ago • 4b56149

docs(models/score_model): fix docstring for `ScoreModel`s • Xuehai Pan • 2 years ago • cc17d62

refactor(trainers): improve end indices calculation (#157) • Xuehai Pan • 2 years ago • 8af44bd

Issues Activity: Last 6 months

Top Labels

Hottest Issues