Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Stars
10.4k
Forks
1.0k
Watchers
10.4k
Open Issues
58
Overall repository health assessment
No package.json found
This might not be a Node.js project
41
commits
3
commits
2
commits
1
commits
1
commits
1
commits
1
commits
1
commits
1
commits
1
commits
add gnp/minbpe-rs as community extensions in README.md
45a0c2fView on GitHubmore work on lecture hehe. this is hard work! not sure an LLM can do this tbh
d201df8View on GitHubadd small amount of code to write out the gpt4 vocab in the same format as the base class, taking into account the byte shuffle
1cc9d06View on GitHub