TurboQuant KV cache compression for local LLM inference — 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).
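For orientation, here is a minimal, hypothetical sketch of the K4/V2 idea referenced in the commit history below (4-bit keys, 2-bit values), written in plain PyTorch. The function names and shapes are invented for illustration and are not this repository's API or TurboQuant's actual algorithm; note also that real memory savings require bit-packing the codes, since uint8 storage keeps each code at a full byte.

```python
# Toy K4/V2 KV-cache quantization sketch (illustrative only).
# Per-row affine quantization: each row gets its own scale/offset and
# integer codes in [0, 2**bits - 1].
import torch

def quantize(x: torch.Tensor, bits: int):
    qmax = 2 ** bits - 1
    lo = x.amin(dim=-1, keepdim=True)
    hi = x.amax(dim=-1, keepdim=True)
    scale = (hi - lo).clamp_min(1e-8) / qmax
    codes = torch.round((x - lo) / scale).to(torch.uint8)
    # NOTE: each code occupies a full byte here; a real implementation must
    # bit-pack (2 codes/byte at 4 bits, 4 codes/byte at 2 bits) to realize
    # the advertised savings.
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return codes.to(scale.dtype) * scale + lo

# One attention head's cache slice: [tokens, head_dim]
k = torch.randn(16, 128)
v = torch.randn(16, 128)
k_q = quantize(k, bits=4)  # K4: 4-bit keys, per the commit messages
v_q = quantize(v, bits=2)  # V2: 2-bit values
print("key MAE:  ", (dequantize(*k_q) - k).abs().mean().item())
print("value MAE:", (dequantize(*v_q) - v).abs().mean().item())
```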
Stars: 1 · Forks: 0 · Watchers: 1 · Open Issues: 0
41 commits
35110ad  Merge pull request #6 from AlphaWaveSystems/fix/release-workflow
cd9c346  fix(ci): make Release workflow idempotent + extract per-version changelog
e017f7e  Merge pull request #5 from AlphaWaveSystems/feature/fisher-calibration
faaae52  feat(fisher): Step #3 — offline gradient-based Fisher calibration
cef79cf  Merge pull request #4 from AlphaWaveSystems/feature/long-context-kv-memory
c1ff724  docs: correct K4/V2 memory framing across README, paper, where_tqai_shines
d2af811  docs(report): Step #4 finding — K4/V2 does NOT save peak runtime memory
d09776c  fix(benchmark_kv_memory): subprocess isolation + MLX-native memory metric
71b349d  feat(benchmarks): KV cache memory benchmark for long contexts (Step #4)
7f50938  Merge pull request #3 from AlphaWaveSystems/feature/distilled-video
6c9f2cc  docs(report): WAN 2.2 5B step sweep results — Step #1 finding
89af781  feat(dit): video preset system + step-sweep benchmark (Step #1)
0d6d757  docs(paper): update tqai_paper.md and .tex with v0.4 architecture
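Commits d09776c and d2af811 describe a long-context KV memory benchmark that runs each configuration in an isolated subprocess and reads an MLX-native memory metric. Below is a hedged sketch of that measurement pattern; the child payload, tensor shapes, and entry point are invented for illustration, not the repo's actual benchmark. `mx.metal.get_peak_memory()` reports the peak bytes used by MLX's allocator (newer MLX versions expose it as `mx.get_peak_memory()`).

```python
# Sketch: per-run subprocess isolation for peak-memory measurement.
# A fresh process starts with a zeroed allocator peak, so allocations from a
# previous configuration cannot inflate the reading.
import json
import subprocess
import sys

CHILD = r"""
import json, sys
import mlx.core as mx

n_tokens = int(sys.argv[1])
# Materialize a dummy KV cache and force evaluation so memory is allocated.
k = mx.random.normal((32, n_tokens, 128))  # [heads, tokens, head_dim]
v = mx.random.normal((32, n_tokens, 128))
mx.eval(k, v)
# MLX-native metric: peak allocator memory (bytes) for this process only.
print(json.dumps({"peak_bytes": mx.metal.get_peak_memory()}))
"""

def measure(n_tokens: int) -> int:
    out = subprocess.run(
        [sys.executable, "-c", CHILD, str(n_tokens)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout.strip())["peak_bytes"]

if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        print(f"{n:>6} tokens: {measure(n) / 2**20:.1f} MiB peak")
```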