High-performance .NET BPE tokenizer — up to 618 MiB/s, competitive with Rust. Zero-allocation counting, multilingual cache, o200k/cl100k/r50k/p50k encodings + HuggingFace tokenizer.json support.
Stars
86
Forks
7
Watchers
86
Open Issues
1
Overall repository health assessment
No package.json found
This might not be a Node.js project
148
commits
95
commits
30
commits
2
commits
1
commits
fix: use correct System.CommandLine v2.0.5 API for CLI invocation
ae58e8dView on GitHubchore(deps): bump the github-actions group with 3 updates (#119)
2642c2bView on GitHubFix home directory crash + use ArrayPool in BPE heap path
b97ecb9View on GitHubFix stack overflow on large files + optimize FileScanner for huge directories
482b84fView on GitHubFix FileScanner crash on Parallel.ForEach AggregateException
38411dbView on GitHubAdd --follow-symlinks flag for opt-in symlink directory traversal
5dd9806View on GitHubSkip symlink directories to avoid traversal into virtual/mounted filesystems
744ee6eView on GitHubFix FileScanner crash on problematic paths (symlink loops, timeouts, long paths)
c6067f7View on GitHubAdd CLI section to root README, add --version to options table
da35c23View on GitHubUpgrade actions to v5, seal ByteArrayComparer (CA1033)
0e7efd2View on GitHubAd-hoc codesign macOS binaries + update README + cask version fix
56f50d0View on GitHub