KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
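The idea behind "block-diagonal rotation" compression can be shown with a minimal sketch. It makes several assumptions: 2×2 (Givens) rotation blocks with one random angle per block, followed by symmetric uniform quantization with one absmax scale per vector. The block size, the angle parameterization, and every function name here (`make_angles`, `rotate`, `quantize`, `compress`, `decompress`) are hypothetical. This is not this repository's implementation or its llama.cpp kernels. The sketch relies on one property: the rotation is orthogonal, so it can be undone exactly after dequantization.

```python
import math
import random


def make_angles(dim, seed=0):
    """One random angle per 2x2 block (hypothetical parameterization)."""
    if dim % 2:
        raise ValueError("dim must be even for 2x2 blocks")
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim // 2)]


def rotate(x, angles, inverse=False):
    """Block-diagonal rotation: pair (x[2i], x[2i+1]) is rotated by angles[i].

    With inverse=True each block is rotated by -angles[i], i.e. the transpose.
    """
    out = list(x)
    for i, theta in enumerate(angles):
        if inverse:
            theta = -theta
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = c * a - s * b
        out[2 * i + 1] = s * a + c * b
    return out


def quantize(v, bits):
    """Symmetric uniform quantization to integer codes in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(t) for t in v) / qmax or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(t / scale))) for t in v]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def compress(x, angles, bits=3):
    """Rotate, then quantize (e.g. one K or V head vector)."""
    return quantize(rotate(x, angles), bits)


def decompress(codes, scale, angles):
    """Dequantize, then undo the rotation."""
    return rotate(dequantize(codes, scale), angles, inverse=True)


# Usage example
angles = make_angles(8, seed=42)
x = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2, 0.0, -0.3]
codes, scale = compress(x, angles, bits=3)
x_hat = decompress(codes, scale, angles)
```

In this sketch each block touches only two coordinates, so applying the rotation costs O(d) work and stores d/2 angles. A dense d×d rotation costs O(d²): for d = 128, that is 64 angles versus 16,384 matrix entries. This describes the sketch only and does not reproduce the repository's reported parameter ratio.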
Stars: 247 | Forks: 24 | Watchers: 247 | Open issues: 5
61 commits
616cb93 Add RaBitQ module + comprehensive 1-bit/2-bit benchmark suite
1ba8989 Update CLAUDE.md with llama.cpp status, PPL results, and TODOs
7511721 Update README: speed benchmarks, architecture evolution, commit history
61154ae Update README: symmetric 3-bit PPL results beat TurboQuant
6ce8c03 Add Llama 3.1 8B benchmarks: 239 tok/s decode, PPL 8.44, 4% faster than FP16
fc29d06 Update README: delete Mac Metal numbers, add authoritative CUDA PPL
ec98f4b Add post-prefill PPL benchmarks: IsoQuant 4-bit 9.03, PlanarQuant 3-bit 10.12
0c98c28 Restore RotorQuant trivector centroids, add CUDA PPL to README
bfa6022 Add wikitext download step to PPL benchmark instructions
a195f9b Update README with PPL benchmarks: iso3 is 2.6-43x better than turbo3