A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Stars: 4.0k
Forks: 566
Watchers: 4.0k
Open Issues: 39
0e30e01 [Feature] Support for Qwen3 with TP=8 and optimized weight streaming speed (#101)
1bdd8d5 [Fix] Streaming weight loader to fix OOM with tensor parallelism (#93)
7c59f4d [Fix] Fix LM head all-gather output reshaping for bs > 1 (#98)
690f55a [Fix] Fix sampler inconsistency due to uninitialized random seed (#88)
64060c2 [Test] Add WildChat offline benchmark with real-world prompts (#81)