TurboQuant: an ultra-low-bit KV-cache compression layer built on top of llama.cpp for LLM inference. Reduces KV-cache VRAM overhead by ~75-80% using custom CUDA kernels.
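The repository's own kernels are CUDA and are not reproduced here, but the storage math behind the ~75-80% figure can be sketched with a toy per-block 2-bit quantizer. Everything below (function names, block size of 16, fp16 scale/zero-point metadata) is an illustrative assumption, not TurboQuant's actual scheme.

```python
import numpy as np

def quantize_kv_block(x, bits=2):
    # Hypothetical sketch: asymmetric per-block quantization of a
    # KV-cache block. Not the repo's real kernel, just the idea.
    levels = (1 << bits) - 1                      # 3 for 2-bit
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    # Metadata stored in fp16, as low-bit schemes commonly do.
    return q, np.float16(scale), np.float16(lo)

def dequantize_kv_block(q, scale, lo):
    return q.astype(np.float32) * np.float32(scale) + np.float32(lo)

# One 16-element block of keys, originally fp16.
rng = np.random.default_rng(0)
block = rng.standard_normal(16).astype(np.float16).astype(np.float32)
q, scale, lo = quantize_kv_block(block)

# Storage: 16 values * 2 bits = 4 bytes, plus two fp16 metadata
# values (scale, zero-point) = 4 bytes, versus 32 bytes in fp16.
packed_bytes = 16 * 2 // 8 + 2 * 2
fp16_bytes = 16 * 2
print(f"compression: {1 - packed_bytes / fp16_bytes:.1%}")  # prints compression: 75.0%
```

With larger blocks the metadata amortizes further and the ratio rises toward the upper end of the quoted range; real kernels also trade block size against reconstruction error.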
Stars: 1 | Forks: 0 | Watchers: 1 | Open Issues: 0
6 commits

0121e3d  Initial commit: Add TurboQuant with integrated llama.cpp and README
fbfef59  docs: Update README with Quick Start and Fused CUDA performance metrics