Found 7 repositories (showing 7)
scrya-com
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
anna-claudette
RAM-Backed MCP Memory Architecture for Consumer LLM Inference — 900K token context on 16GB VRAM
paulosbarros
Created by RotorQuant agent
kpalastro
No description available
windagency
Experiments on KV cache compression using RotorQuant / PlanarQuant / IsoQuant
ChiefBoyardee
Qwen Rotor Quantizer: RotorQuant-style KV cache compression experiments for Qwen hybrid attention models.
PNL-toshiyaishihara
RotorQuant: Clifford algebra vector quantization for LLM KV cache compression. 10-19x faster than TurboQuant, 44x fewer parameters.