Found 7 repositories (showing 7)
scrya-com
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
anna-claudette
RAM-Backed MCP Memory Architecture for Consumer LLM Inference — 900K token context on 16GB VRAM
paulosbarros
Created by RotorQuant agent
kpalastro
No description available
windagency
Experiments on KV cache compression using RotorQuant / PlanarQuant / IsoQuant
ChiefBoyardee
Qwen Rotor Quantizer: RotorQuant-style KV cache compression experiments for Qwen hybrid attention models.
PNL-toshiyaishihara
RotorQuant: Clifford algebra vector quantization for LLM KV cache compression. 10-19x faster than TurboQuant, 44x fewer parameters.