Open-source PyTorch implementation of Google TurboQuant (ICLR 2026) — extreme KV-cache quantization to ~3 bits with zero accuracy loss. 6x less memory, up to 8x faster inference.
Stars: 38
Forks: 4
Watchers: 38
Open Issues: 0
18 commits
74fea85 feat(search): add TurboQuant VectorIndex API with tests and docs
39c853c Add 1.5/2.5-bit TurboQuant KV support (outlier channel allocation)
9e8f86c docs(vllm): clarify TurboQuant KV vs FA3/Hopper and add smoke checklist