TurboQuant for GGML: 4.57x KV Cache Compression with 72K+ Context for Llama-3.3-70B on Consumer GPUs.
Stars: 33, Forks: 8, Watchers: 33, Open issues: 1
Recent commits:
4381bdd  fix: accurately describe as PolarQuant 3-bit, not full TurboQuant with QJL
78d2629  readme: credit unixsysdev's foundational work in intro, list our extensions
0794f00  tq3_0 v2+v3: K+V compression with flash attention for 72K+ context
8fc1749  gguf-split : clarify operation of gguf-split (#19749)
69e0ece  webui: Fix editing assistant message without branching (#20944)