Found 41 repositories (showing 30)
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, plus an iOS app.
alicankiraz1
Public source tree for the TurboMLX v0.1 research preview: Qwen3.5-focused MLX TurboQuant experiments.
arozanov
TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% of FP16 speed.
helgklaizar
Extreme KV cache compression (1–3 bit) for LLMs natively on Apple Silicon (MLX). Features TurboQuant, asymmetric PolarQuant caching, and an OpenAI-compatible server.
sharpner
A proof of concept of Google's TurboQuant paper: https://arxiv.org/abs/2504.19874
DeadByDawn101
First MLX implementation of TurboQuant KV cache compression for Apple Silicon
mindtro
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core; optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
Incept5
MLX benchmark: Gemma 4 + Qwen 3.5 on Apple Silicon with TurboQuant KV cache
rachittshah
TurboQuant KV cache compression for MLX (Apple Silicon)
yzamari
TurboQuant (ICLR 2026) ported to Apple Silicon: KV cache compression with MLX Metal kernels + PyTorch CPU
lingengyuan
First MLX / Apple Silicon native implementation of QJL and TurboQuant
yzamari
TurboQuant KV cache compression for MLX-LM: run longer contexts on Apple Silicon with 5x less memory
ananyasingh7
Based on https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/, implemented via MLX
matt-k-wong
No description available
Flovflo
Experimental TurboQuant-inspired KV-cache backend for MLX Qwen3.5 with baseline and MLX KV quantization benchmarks
Simple and fast inference server implementation for Prism ML 8B MLX 1bit
InnovativeCoder
PrismLabs Bonsai 8-bit model on MLX using TurboQuant; tested on an M2 Pro with phenomenal tokens/sec
alex-rentel
[ARCHIVED] Local AI agent framework for Apple Silicon. Superseded by Claude Code and Ollama MLX. See eden-models and mlx-turboquant for active work.
limitless235
A from-scratch MLX implementation of TurboQuant for near-optimal, 2.5-bit LLM KV cache compression on Apple Silicon.
jzh001
A lightweight, professional chat UI for running mlx-community vision-language models on Apple Silicon via mlx_vlm. Generates tokens significantly faster than Ollama or LM Studio on the same hardware, with a longer context window and support for modern compression techniques like TurboQuant.
AlphaWaveSystems
TurboQuant KV cache compression for local LLM inference: 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).
l0d0v1c
TurboQuant MLX implementation experiment
JoseLuna12
MLX proof of concept for TurboQuant-style KV-cache compression, with dense vs quantized inference comparison, batch experiments, and visualization tools for low-bit long-context testing on Apple Silicon.
alex-rentel
Apple Silicon MLX port of TurboQuant – near-optimal KV cache quantization for local LLMs. Original: vivekvar-dl/turboquant (MIT), paper: arXiv:2504.19874
Shubham-Rasal
MLX implementation of Google's TurboQuant KV cache compression
captainbotgit
3-bit KV cache compression for MLX on Apple Silicon (TurboQuant + PolarQuant)
Ikaikaalika
No description available
sladebot
No description available
manjunathshiva
Extreme weight + KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)
imleooooo
No description available