From-scratch PyTorch implementation of Google's TurboQuant (ICLR 2026) for LLM KV cache compression. 5x compression at 3-bit with 99.5% attention fidelity.
Stars: 804
Forks: 103
Watchers: 804
Open Issues: 17
6 commits
03e6112 Correct README: generation results were invalid, update with real numbers
2fdeff1 Update README for V3: generation results, QJL findings, community work
9f0d9b2 Add TurboQuant V3: MSE-only, asymmetric K/V, bit-packed storage