RL from zero pretrain, can it be done? Yes.
Stars: 291 · Forks: 21 · Watchers: 291 · Open Issues: 2
Recent commits:
- Clean up docs/ and issues/ directories from main branch
- d291d9b: Merge pull request #43 from tokenbender/full-vocab-softmax-gradient-flow
- 2ce1595: config: adjust training parameters for full vocab experiments
- 042c667: feat: compute softmax over full vocabulary for complete gradient flow
- fb40379: refactor: move hardcoded values to config for better flexibility
- 8cf8549: add optimizations with vectorized operations, action-space-based probability calculation, removal of entropy calculation over entire vocab
- 9d9df27: add basic timing bash file and corresponding small iteration config
- c46c34e: fix critic download path reading from config file in start.sh
- 227791f: Merge pull request #41 from tokenbender/feat/4bit-distributed-critic
- f894a60: feat: add 4-bit quantization support for critic model in AvataRL
- 0f92f2e: perf: optimize checkpoint loading to reduce memory usage
- 9d01979: feat: add 4-bit quantized critic loading for memory optimization
- 602455a
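Several of the commits above (042c667, 2ce1595, and PR #43) concern computing the softmax over the full vocabulary rather than a restricted action subset, so that gradients reach every logit. A minimal NumPy sketch of that idea follows; it is an illustration of the general technique, not the repository's actual implementation, and all names in it are hypothetical.

```python
import numpy as np

def full_vocab_log_softmax(logits):
    """Numerically stable log-softmax over the ENTIRE vocabulary axis.

    Normalizing over all logits means every vocabulary entry contributes
    to the partition function, so gradients flow to every logit, unlike a
    softmax restricted to a sampled subset of action tokens.
    """
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

# Toy example: one position, vocabulary of 5 tokens.
logits = np.array([1.0, 0.5, 2.0, -1.0, 0.0])
log_probs = full_vocab_log_softmax(logits)
probs = np.exp(log_probs)

# REINFORCE-style gradient of log p(action) w.r.t. the logits:
# one_hot(action) - softmax(logits). Because the softmax spans the full
# vocabulary, this gradient is nonzero for EVERY vocabulary entry.
action = 2
grad = -probs.copy()
grad[action] += 1.0
```

The point of the full-vocabulary normalization is visible in `grad`: every component is nonzero, whereas normalizing over only the sampled actions would leave the logits of unsampled tokens with zero gradient.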