Practical guide: TurboQuant KV-cache quantization for llama.cpp. Run 122B models on consumer GPUs.
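As a point of reference for what KV-cache quantization looks like in practice, llama.cpp exposes per-cache type flags. A minimal invocation sketch — the model path, context size, and prompt are placeholders, and exact flag spellings (e.g. whether `-fa` takes an argument) can vary between llama.cpp versions:

```shell
# Quantize both the K and V caches to q4_0, shrinking KV-cache VRAM
# roughly 4x compared to the default f16 cache.
# Flash attention (-fa) is required by llama.cpp for a quantized V cache.
# ./models/model-Q4_K_M.gguf is an illustrative placeholder path.
./llama-cli \
  -m ./models/model-Q4_K_M.gguf \
  -c 8192 \
  -fa \
  --cache-type-k q4_0 \
  --cache-type-v q4_0 \
  -p "Hello"
```

The same `--cache-type-k` / `--cache-type-v` flags also apply to `llama-server`; larger context windows benefit most, since KV-cache size grows linearly with context length.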
Stars: 3 · Forks: 1 · Watchers: 3 · Open Issues: 0
Recent commits:

740999f  docs: overhaul README — architecture diagram, fixed anchors, VRAM visualization, bilingual
a99aedc  results: RTX 3090 consolidated — 4 runs, 15 measurements, avg -7.5% TPS
bf29ad1  results: consolidate 4070 Laptop — 2 independent sessions, avg -4.6% TPS
78e199b  results: add verified RTX 4070 Laptop benchmark + cross-GPU comparison table
3e856b1  results: add RTX 4070 Laptop 8GB benchmark (Llama-3.1 8B)
f0318bb  fix: add GitHub repo link prominently in HF Space README
571e9fb  results: add v2 verification run — confirms v1 within measurement variance
87efc66  Initial release: TurboQuant practical guide for consumer hardware