Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face's transformers library (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Stars: 566
Forks: 61
Watchers: 566
Open Issues: 4
5c98824 feat(TextRLActor): remove repetition_penalty parameter from TextRLActor and SoftmaxCategoricalHead since it may lead to NaN in training
a0019c6 feat(setup.py, actor.py): update package version and adjust actor parameters
a3488d6 feat(setup.py, environment.py): update package version and disable token type ids return in TextRLEnv
12eeab8 feat(setup.py, environment.py): Bump version to 0.2.19 and disable return of token type ids in TextRLEnv
acae958 Update environment to use self.model for OPTForCausalLM comparison
2a1fb18 add support for OPT models in actor.py, and environment.py and update version number and
6f063af update example with logging configuration and remove topK for stability