Found 18 repositories (showing 18)
tonbistudio
From-scratch PyTorch implementation of Google's TurboQuant (ICLR 2026) for LLM KV cache compression. 5x compression at 3-bit with 99.5% attention fidelity.
hackimov
Open-source PyTorch implementation of Google TurboQuant (ICLR 2026) — extreme KV-cache quantization to ~3 bits with zero accuracy loss. 6x less memory, up to 8x faster inference.
mindtro
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
varjoranta
TurboQuant+ KV cache compression for vLLM. 3.8x smaller KV cache, same conversation quality. Fused CUDA kernels with automatic PyTorch fallback.
Arclabs001
Yet Another TurboQuant in PyTorch (YATQ) is a PyTorch implementation of TurboQuant for KV cache compression, following the paper TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (ICLR 2026). Includes a HuggingFace interface.
codepawl
PyTorch implementation of TurboQuant. Near-optimal vector quantization for KV cache compression and vector search. 3-bit with zero accuracy loss.
yzamari
TurboQuant (ICLR 2026) ported to Apple Silicon — KV cache compression with MLX Metal kernels + PyTorch CPU
lakshmana64
PyTorch toolkit for TurboQuant-based unbiased vector quantization, LLM KV-cache compression, and embedding retrieval.
sammyboi1801
A simple PyTorch implementation of TurboQuant for model comparison
az9713
TurboQuant PyTorch implementation + deep 9,500-word tutorial. Fork of tonbistudio/turboquant-pytorch enhanced with comprehensive educational materials covering theory, math, and code.
BFinn
PyTorch implementation of TurboQuant (ICLR 2026) — two-stage KV cache vector quantization for LLM inference. Suggested topics: kv-cache, llm, quantization, vllm, pytorch, transformer.
sridharnandigam
No description available
gduchidze
From-scratch PyTorch implementation of Google's TurboQuant for LLM KV cache compression. 5x compression at 3-bit with 99.5% fidelity.
gaetanX21
Simple PyTorch implementation of the TurboQuant quantization algorithm.
anchitgupt
TurboQuant for PyTorch — Near-optimal vector quantization for LLM KV cache compression
ZhuShuairong
JAX implementation of https://github.com/tonbistudio/turboquant-pytorch from https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
G26karthik
PyTorch implementation of TurboQuant (Google Research, 2026) for KV cache compression - 3.3× compression with only +6.1% perplexity degradation on GPT-2 Medium.
hammurabi-coder
TurboQuant KV-cache compression ported to Intel Arc B580 (XPU) via Triton — pure PyTorch fallback path. Triton kernel port pending fix for tl.gather materialization bug.