Found 112 repositories (showing 30)
alicankiraz1
TurboMLX v0.1 Research Preview public source tree for Qwen3.5-focused MLX TurboQuant experiments.
AmesianX
TurboQuant KV Cache Compression for llama.cpp – 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
dhawalc
No description available
RecursiveIntell
Rust implementation of TurboQuant, PolarQuant, and QJL – zero-overhead vector quantization for semantic search and KV cache compression (ICLR 2026)
ysnlly
No description available
Kubenew
TurboQuant v3 (INT4 + AWQ + Protected Channels + Low-Rank) – a notebook demonstrating a TurboQuant-like quantization algorithm: group-wise INT4 quantization, activation-aware scaling (AWQ-style), protected FP16 channels, and optional low-rank correction (SVD)
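The group-wise INT4 step listed above can be illustrated with a minimal sketch. This is generic group-wise symmetric quantization, not the notebook's actual code; the function names, the group size of 32, and the symmetric [-8, 7] range are assumptions for illustration.

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=32):
    """Group-wise symmetric INT4: each group of `group_size` weights
    shares one FP32 scale; values are rounded into [-8, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # per-group scale
    scale[scale == 0] = 1.0                             # guard empty/zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4_groupwise(q, scale):
    """Reconstruct an FP32 vector from INT4 codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

# usage: quantize a random weight vector and check reconstruction error
w = np.random.randn(128).astype(np.float32)
q, s = quantize_int4_groupwise(w)
err = np.abs(dequantize_int4_groupwise(q, s) - w).max()
```

With a per-group scale of max|w|/7, the worst-case rounding error is half a quantization step, i.e. at most max|w|/14 per group.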
Firmamento-Technologies
Near-optimal vector quantization from Google's ICLR 2026 paper – 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
Lucien2468
TurboQuant: Native 3-Bit Quantization for Ollama - Achieve 25-28% better compression than Q4_0 while maintaining high-speed CPU inference. Experimentally integrated into Ollama with custom GGML kernels for LLM efficiency.
mchintan
My implementation of the Google TurboQuant paper.
yzamari
TurboQuant (ICLR 2026) ported to Apple Silicon – KV cache compression with MLX Metal kernels + PyTorch CPU
varjoranta
No description available
MartinCrespoC
Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.
ray-ruisun
Reproduction of the core algorithms from TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate
outmatic
High-performance .NET implementation of Google's TurboQuant algorithm (ICLR 2026). Near-optimal vector quantization: compress embeddings to 2-4 bits with cosine > 0.995.
minchoCoin
Implementation and practice of TurboQuant
WaveboSF
Model Switcher & Benchmark Tool for llama-server with TurboQuant KV-Cache
jyunming
Embedded vector database in Rust with Python bindings – TurboQuant algorithm (arXiv:2504.19874), zero training, 2–4 bit compression, HNSW ANN search, WAL persistence
Scottcjn
TQ3 KV cache compression for ComfyUI. 4.6x VRAM savings for video generation. Enables LTX-2.3 22B on V100 32GB.
snuri00
TurboQuant iOS: Metal-accelerated KV cache compression for on-device LLM inference on iPhone/iPad/Mac
rrhoopes3
Google TurboQuant + local models
wanglinteng
TurboQuant
xiehuanyi
No description available
Alperen012
Ultra-Low Bit KV-Cache Compression optimization layer built on top of llama.cpp for LLM inference. Reduces VRAM overhead by ~75-80% using custom CUDA kernels.
dengls24
Reproduction of TurboQuant (ICLR 2026, arXiv:2504.19874): Online Vector Quantization with Near-optimal Distortion Rate
Iro96
Deeper research into TurboQuant algorithms
limitless235
A from-scratch MLX implementation of TurboQuant for near-optimal, 2.5-bit LLM KV cache compression on Apple Silicon.
gotrendwise-com
Run Large Language Models on CPU with up to 8× less RAM using advanced KV cache compression.
dev-sandhu-harsh
No description available
snuri00
TurboQuant: High-performance KV cache quantization for LLM inference. Implementation of Google's TurboQuant (arXiv:2504.19874), QJL, and PolarQuant.