Found 469 repositories (showing 30)
TheTom
No description available
tonbistudio
From-scratch PyTorch implementation of Google's TurboQuant (ICLR 2026) for LLM KV cache compression. 5x compression at 3-bit with 99.5% attention fidelity.
0xSero
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
mitkox
vLLM TurboQuant
quantumaikr
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
scrya-com
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
nicedreamzapp
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app.
teamchong
TurboQuant WASM SIMD vector compression – 3 bits/dim with fast dot product. Requires relaxed SIMD (Chrome 114+, Firefox 128+, Safari 18+, Node 20+)
DevTechJr
No description available
cksac
No description available
alicankiraz1
TurboMLX v0.1 Research Preview public source tree for Qwen3.5-focused MLX TurboQuant experiments.
arozanov
TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% FP16 speed.
PacifAIst
Based on Google's TurboQuant (ICLR 2026), Quansloth brings elite KV cache compression to local LLM inference. Quansloth is a fully private, air-gapped AI server that runs massive-context models natively on consumer hardware.
OnlyTerp
First open-source implementation of Google TurboQuant (ICLR 2026) -- near-optimal KV cache compression for LLM inference. 5x compression with near-zero quality loss.
DevTechJr
turboquant-based compression engine for LLM KV cache
botirk38
Library for Google's Turboquant Algorithm
OmarHory
Open-source implementation of Google's TurboQuant (ICLR 2026) – KV cache compression to 2.5–4 bits with near-zero quality loss. 3.8–5.7x memory reduction on Mistral-7B, no training required.
Dynamis-Labs
3% Is All You Need: Breaking TurboQuant's Compression Limit via Spectral Structure
kumar045
No description available
hackimov
Open-source PyTorch implementation of Google TurboQuant (ICLR 2026) – extreme KV-cache quantization to ~3 bits with zero accuracy loss. 6x less memory, up to 8x faster inference.
ericcurtin
A TurboQuant inference server
animehacker
TurboQuant for GGML: 4.57x KV Cache Compression with 72K+ Context for Llama-3.3-70B on Consumer GPUs.
unixsysdev
No description available
dhawalc
No description available
helgklaizar
Extreme KV Cache Compression (1-3 bit) for LLMs natively on Apple Silicon (MLX). Features TurboQuant, asymmetric PolarQuant caching, and OpenAI server compatibility.
Alberto-Codes
TurboQuant KV cache compression plugin for vLLM – asymmetric K/V, 8 models validated, consumer GPUs
AmesianX
TurboQuant KV Cache Compression for llama.cpp – 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
onur-gokyildiz-bhi
Pure Rust implementation of Google's TurboQuant (ICLR 2026) – KV cache compression for LLMs
sharpner
A proof of concept of Google's TurboQuant paper: https://arxiv.org/abs/2504.19874