First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Stars: 20 · Forks: 5 · Watchers: 20 · Open Issues: 5
20 commits (most recent shown):

acef33b  feat: v0.3.0 — asymmetric K/V bits, layer-adaptive precision, deprecate QJL
fe86073  feat: 4-bit nibble packing — halves index storage, 1 GB saved at 4K context
efef4e2  marketing: add strategy, blog post, reddit posts, portfolio blog plan
47d1238  docs: fix last stale refs — PyPI 0.1.0→0.2.0, roadmap status update
78d8f34  feat: compressed index storage — real KV cache compression (v0.2.0)
5aab56b  docs: accuracy sweep — fix stale numbers, update references, add research index
94cbb3e  bench: add cross-architecture + long-context data (42 total data points)
c444342  docs: full doc system — architecture, reference, workflow, codebase map