Found 469 repositories (showing 30)
TheTom
No description available
tonbistudio
From-scratch PyTorch implementation of Google's TurboQuant (ICLR 2026) for LLM KV cache compression. 5x compression at 3-bit with 99.5% attention fidelity.
0xSero
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
mitkox
vLLM TurboQuant
quantumaikr
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
scrya-com
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
nicedreamzapp
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app.
teamchong
TurboQuant WASM SIMD vector compression – 3 bits/dim with fast dot product. Requires relaxed SIMD (Chrome 114+, Firefox 128+, Safari 18+, Node 20+)
DevTechJr
No description available
cksac
No description available
alicankiraz1
TurboMLX v0.1 Research Preview public source tree for Qwen3.5-focused MLX TurboQuant experiments.
arozanov
TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
PacifAIst
Based on Google's TurboQuant (ICLR 2026), Quansloth brings elite KV cache compression to local LLM inference. Quansloth is a fully private, air-gapped AI server that runs massive-context models natively on consumer hardware.
OnlyTerp
First open-source implementation of Google TurboQuant (ICLR 2026) -- near-optimal KV cache compression for LLM inference. 5x compression with near-zero quality loss.
DevTechJr
turboquant-based compression engine for LLM KV cache
botirk38
Library for Google's Turboquant Algorithm
OmarHory
Open-source implementation of Google's TurboQuant (ICLR 2026) – KV cache compression to 2.5–4 bits with near-zero quality loss. 3.8–5.7x memory reduction on Mistral-7B, no training required.
Dynamis-Labs
3% Is All You Need: Breaking TurboQuant's Compression Limit via Spectral Structure
kumar045
No description available
hackimov
Open-source PyTorch implementation of Google TurboQuant (ICLR 2026) – extreme KV-cache quantization to ~3 bits with zero accuracy loss. 6x less memory, up to 8x faster inference.
ericcurtin
A TurboQuant inference server
animehacker
TurboQuant for GGML: 4.57x KV Cache Compression with 72K+ Context for Llama-3.3-70B on Consumer GPUs.
unixsysdev
No description available
dhawalc
No description available
helgklaizar
Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon (MLX). Features TurboQuant, asymmetric PolarQuant caching, and OpenAI server compatibility.
Alberto-Codes
TurboQuant KV cache compression plugin for vLLM – asymmetric K/V, 8 models validated, consumer GPUs
AmesianX
TurboQuant KV Cache Compression for llama.cpp – 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
onur-gokyildiz-bhi
Pure Rust implementation of Google's TurboQuant (ICLR 2026) – KV cache compression for LLMs
sharpner
A proof of concept of Google's TurboQuant paper: https://arxiv.org/abs/2504.19874