Found 68 repositories (showing 30)
scrya-com
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.
AmesianX
TurboQuant KV Cache Compression for llama.cpp – 5.2x memory reduction with near-lossless quality | Implementation of Google DeepMind's TurboQuant (ICLR 2026)
unixsysdev
No description available
animehacker
TurboQuant for GGML: 4.57x KV Cache Compression with 72K+ Context for Llama-3.3-70B on Consumer GPUs.
gamogestionweb
No description available
nisten
1bit llama.cpp gguf weights paired with turboquant 4 bit kv cache
domvox
TurboQuant KV cache compression for llama.cpp – HIP/ROCm port for AMD RDNA3 (gfx1100)
test1111111111111112
TurboQuant llama.cpp fork with optimized turbo4 kernels for Gemma 4 D=256/512 heads – lazy K/V, batch decode, warp-cooperative write. 120 t/s with 3.8x KV compression on RTX 3090.
jamesarslan
Complete local AI coding pipeline: Qwen3.5-35B-A3B + llama-server + TurboQuant + OpenCode + Context7 MCP + Chrome DevTools. 188 t/s on RTX 5090, zero cloud APIs.
M-Baraa-Mardini
No description available
md-exitcode0
One-click LLM server with TurboQuant Llama CPP engine
AI-Engineering-at
Practical guide: TurboQuant KV-cache quantization for llama.cpp. Run 122B models on consumer GPUs.
Argonaut790
Fused Triton kernels for TurboQuant KV cache compression – 2-4 bit quantization with RHT rotation. Drop-in HuggingFace & vLLM integration. Up to 4.9x KV cache compression for Llama, Qwen, Mistral, and more.
Simple all-in-one build script for llama-cpp-turboquant on Windows 11.
MartinCrespoC
Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.
jimliddle
A TurboQuant implementation with Llama.cpp for AMD with Vulkan runtime
jagsan-cyber
World's first TurboQuant KV cache compression for llama.cpp on AMD ROCm (RX 9070 / gfx1201)
pdscomp
Docker template for running llama.cpp llama-server in router mode with NVIDIA CUDA and AMD Vulkan GPU acceleration. Features TurboQuant KV cache optimization, long context support (up to 256K tokens), and optimized configurations for 24GB+ VRAM cards.
JoelHJames1
NEXUS: Production C++ inference engine for Apple Silicon. Run 400B+ LLMs on your Mac via layer streaming, Metal GPU compute, TurboQuant KV compression, NXF format, MoE routing, and Neural Engine speculative decoding. Faster than AirLLM, more capable than llama.cpp.
pp1840
Experimental TurboQuant implementation and llama.cpp-style integration path for long-context inference
AylaTheTanuki
Pre-compiled Windows binaries and CMake fixes for the experimental TurboQuant branch (with Gemma 4 support)
WaveboSF
Model Switcher & Benchmark Tool for llama-server with TurboQuant KV-Cache
CarapaceUDE
llama.cpp fork: Qwen 3.5 hybrid GGUF + loader fixes; syncs with ggml-org/llama.cpp
selmand
TurboQuant: run larger AI models with longer context on your GPU, powered by Google's TurboQuant KV cache compression.
atomicmilkshake
llama.cpp fork with TurboQuant quantization (turbo2/3/4) and TriAttention GPU-accelerated KV cache pruning. 75 tok/s on Qwen3-8B / RTX 3080.
gotrendwise-com
Run Large Language Models on CPU with up to 8× less RAM using advanced KV cache compression.
benardayim
A llama.cpp fork combining PrismML's 1-bit kernels with TurboQuant KV cache compression.
Clifford-Swartz
Pre-built llama-server with pmem-tier + TurboQuant KV cache compression for JAC
ahmaddarwesh
A lightweight desktop application for managing and interacting with llama.cpp models through a clean, modern interface, with TurboQuant support.
tsuyu122
TurboQuant Vulkan: 3-bit KV cache quantization for llama.cpp using Lloyd-Max Gaussian codebooks. 4.57x compression, Vulkan GPU support (AMD/Intel/NVIDIA). Hobby project.