Found 41 repositories (showing 30)
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, plus an iOS app.
alicankiraz1
Public source tree for the TurboMLX v0.1 research preview: Qwen3.5-focused MLX TurboQuant experiments.
arozanov
TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% of FP16 speed.
helgklaizar
Extreme KV cache compression (1–3 bit) for LLMs natively on Apple Silicon (MLX). Features TurboQuant, asymmetric PolarQuant caching, and an OpenAI-compatible server.
sharpner
A proof of concept of Google's TurboQuant paper: https://arxiv.org/abs/2504.19874
DeadByDawn101
First MLX implementation of TurboQuant KV cache compression for Apple Silicon
mindtro
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core; optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
Incept5
MLX benchmark: Gemma 4 + Qwen 3.5 on Apple Silicon with TurboQuant KV cache
rachittshah
TurboQuant KV cache compression for MLX (Apple Silicon)
yzamari
TurboQuant (ICLR 2026) ported to Apple Silicon: KV cache compression with MLX Metal kernels + PyTorch CPU
lingengyuan
First MLX / Apple Silicon native implementation of QJL and TurboQuant
yzamari
TurboQuant KV cache compression for MLX-LM: run longer contexts on Apple Silicon with 5x less memory
ananyasingh7
Based on https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/, implemented via MLX
matt-k-wong
No description available
Flovflo
Experimental TurboQuant-inspired KV-cache backend for MLX Qwen3.5 with baseline and MLX KV quantization benchmarks
Simple and fast inference server implementation for Prism ML 8B MLX 1bit
InnovativeCoder
PrismLabs Bonsai 8-bit model on MLX using TurboQuant; tested on an M2 Pro with phenomenal tokens/sec
alex-rentel
[ARCHIVED] Local AI agent framework for Apple Silicon. Superseded by Claude Code and Ollama MLX. See eden-models and mlx-turboquant for active work.
limitless235
A from-scratch MLX implementation of TurboQuant for near-optimal, 2.5-bit LLM KV cache compression on Apple Silicon.
jzh001
A lightweight, professional chat UI for running mlx-community vision-language models on Apple Silicon via mlx_vlm. Generates tokens significantly faster than Ollama or LM Studio on the same hardware, with a longer context window and support for modern compression techniques like TurboQuant.
AlphaWaveSystems
TurboQuant KV cache compression for local LLM inference: 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).
l0d0v1c
TurboQuant MLX implementation experiment
JoseLuna12
MLX proof of concept for TurboQuant-style KV-cache compression, with dense vs quantized inference comparison, batch experiments, and visualization tools for low-bit long-context testing on Apple Silicon.
alex-rentel
Apple Silicon MLX port of TurboQuant – near-optimal KV cache quantization for local LLMs. Original: vivekvar-dl/turboquant (MIT), paper: arXiv:2504.19874
Shubham-Rasal
MLX implementation of Google's TurboQuant KV cache compression
captainbotgit
3-bit KV cache compression for MLX on Apple Silicon (TurboQuant + PolarQuant)
Ikaikaalika
No description available
sladebot
No description available
manjunathshiva
Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)
imleooooo
No description available