Found 112 repositories (showing 30)
alicankiraz1
TurboMLX v0.1 Research Preview public source tree for Qwen3.5-focused MLX TurboQuant experiments.
AmesianX
TurboQuant KV Cache Compression for llama.cpp – 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
dhawalc
No description available
RecursiveIntell
Rust implementation of TurboQuant, PolarQuant, and QJL – zero-overhead vector quantization for semantic search and KV cache compression (ICLR 2026)
ysnlly
No description available
Kubenew
TurboQuant v3 (INT4 + AWQ + Protected Channels + Low-Rank) – a notebook demonstrating a TurboQuant-like quantization algorithm: group-wise INT4 quantization, activation-aware scaling (AWQ-style), protected FP16 channels, and optional low-rank correction (SVD)
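The group-wise INT4 step listed above can be illustrated with a minimal sketch. This is generic group-wise symmetric quantization, not the notebook's actual code; the function names, the group size of 32, and the symmetric [-8, 7] range are assumptions for illustration.

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=32):
    """Group-wise symmetric INT4: each group of `group_size` weights
    shares one FP32 scale; values are rounded into [-8, 7]."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # per-group scale
    scale[scale == 0] = 1.0                             # guard empty/zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4_groupwise(q, scale):
    """Reconstruct an FP32 vector from INT4 codes and per-group scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

# usage: quantize a random weight vector and check reconstruction error
w = np.random.randn(128).astype(np.float32)
q, s = quantize_int4_groupwise(w)
err = np.abs(dequantize_int4_groupwise(q, s) - w).max()
```

With a per-group scale of max|w|/7, the worst-case rounding error is half a quantization step, i.e. at most max|w|/14 per group.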
Firmamento-Technologies
Near-optimal vector quantization from Google's ICLR 2026 paper – 95% recall, 5x compression, zero preprocessing, pure Python FAISS replacement
Lucien2468
TurboQuant: Native 3-Bit Quantization for Ollama - Achieve 25-28% better compression than Q4_0 while maintaining high-speed CPU inference. Experimentally integrated into Ollama with custom GGML kernels for LLM efficiency.
mchintan
My implementation of the Google TurboQuant paper.
yzamari
TurboQuant (ICLR 2026) ported to Apple Silicon – KV cache compression with MLX Metal kernels + PyTorch CPU
varjoranta
No description available
MartinCrespoC
Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.
ray-ruisun
Reproduction of the core algorithms from TurboQuant: Online Vector Quantization with Near-Optimal Distortion Rate
outmatic
High-performance .NET implementation of Google's TurboQuant algorithm (ICLR 2026). Near-optimal vector quantization: compress embeddings to 2-4 bits with cosine > 0.995.
minchoCoin
Implementation and practice of TurboQuant
WaveboSF
Model Switcher & Benchmark Tool for llama-server with TurboQuant KV-Cache
jyunming
Embedded vector database in Rust with Python bindings – TurboQuant algorithm (arXiv:2504.19874), zero training, 2–4 bit compression, HNSW ANN search, WAL persistence
Scottcjn
TQ3 KV cache compression for ComfyUI. 4.6x VRAM savings for video generation. Enables LTX-2.3 22B on V100 32GB.
snuri00
TurboQuant iOS: Metal-accelerated KV cache compression for on-device LLM inference on iPhone/iPad/Mac
rrhoopes3
Google TurboQuant + local models
wanglinteng
TurboQuant
xiehuanyi
No description available
Alperen012
Ultra-Low Bit KV-Cache Compression optimization layer built on top of llama.cpp for LLM inference. Reduces VRAM overhead by ~75-80% using custom CUDA kernels.
dengls24
Reproduction of TurboQuant (ICLR 2026, arXiv:2504.19874): Online Vector Quantization with Near-optimal Distortion Rate
Iro96
Deeper research into TurboQuant algorithms
limitless235
A from-scratch MLX implementation of TurboQuant for near-optimal, 2.5-bit LLM KV cache compression on Apple Silicon.
gotrendwise-com
Run Large Language Models on CPU with up to 8× less RAM using advanced KV cache compression.
dev-sandhu-harsh
No description available
snuri00
TurboQuant: High-performance KV cache quantization for LLM inference. Implementation of Google's TurboQuant (arXiv:2504.19874), QJL, and PolarQuant.