Found 39 repositories (showing 30)
nicedreamzapp
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, and an iOS app.
cksac
No description available
PacifAIst
Based on the implementation of Google's TurboQuant (ICLR 2026), Quansloth brings elite KV cache compression to local LLM inference: a fully private, air-gapped AI server that runs massive-context models natively on consumer hardware.
Alberto-Codes
TurboQuant KV cache compression plugin for vLLM: asymmetric K/V quantization, validated on 8 models, runs on consumer GPUs
AI-Engineering-at
Practical guide: TurboQuant KV-cache quantization for llama.cpp. Run 122B models on consumer GPUs.
Sggin1
Docker containers for AI models on NVIDIA DGX Spark (GB10, SM121, aarch64). TurboQuant KV cache compression + mamba-ssm aarch64 build.
cpahgw-rgb
Local LLM setup for Apple M4 Pro 48GB — TurboQuant KV cache compression + MoE models
savka777
your ai, your rules. — local AI desktop app with hardware-aware model matching, threaded conversations, and TurboQuant integration. no cloud, no subscription, no data leaving your device.
robin-ph
Extreme cache compression for video diffusion model inference — TurboQuant × TeaCache fusion, 10× VRAM reduction with <2% quality loss
audiohacking
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
sammyboi1801
A simple PyTorch implementation of TurboQuant for model comparison
WaveboSF
Model Switcher & Benchmark Tool for llama-server with TurboQuant KV-Cache
HariharanSuthan-A
No description available
selmand
Run larger AI models with longer context on your GPU, powered by Google's TurboQuant KV cache compression.
youngstunners88
Our own AI coding agent with free models, TurboQuant compression, and full tool system
InnovativeCoder
PrismLabs Bonsai 8-bit model on MLX using TurboQuant, tested on an M2 Pro with phenomenal tok/s
ahmaddarwesh
A lightweight desktop application for managing and interacting with llama.cpp models through a clean, modern interface. Supports TurboQuant.
RemizovDenis
Portable memory format for LLMs. Store and transfer compressed KV-caches between model architectures without re-computation. Built on TurboQuant-MoE.
JulCCrum
Step-by-step guide to setting up Hermes Agent with a local AI model, Telegram bot, TurboQuant acceleration, and Claude Code delegation on Mac
alex-rentel
[ARCHIVED] Local AI agent framework for Apple Silicon. Superseded by Claude Code and Ollama MLX. See eden-models and mlx-turboquant for active work.
AlphaWaveSystems
TurboQuant KV cache compression for local LLM inference — 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).
kavenmartinez1-collab
Local AI platform: WebGPU/WGSL browser inference engine + HuggingFace Transformers + Ollama. TurboQuant KV cache compression, GPTQ INT4 fused dequant, mixed-precision BF16/INT4 for hybrid SSM+attention models. 9B parameters in a browser, 8GB VRAM.
jzh001
A lightweight, professional chat UI for running mlx-community vision-language models on Apple Silicon via mlx_vlm. Generates tokens significantly faster than Ollama or LM Studio on the same hardware, with a longer context window and support for modern compression techniques like TurboQuant.
deharoalexandre-cyber
A generic, policy-driven, multi-model GGUF inference server. TurboQuant-native. CUDA + ROCm
kartikeyaagr
Small Language Model integrating Google's TurboQuant and Apple's Exclusive Self Attention.
YueHuLab
3-bit KV Cache quantization for ESM-2 protein language models via TurboQuant
gotrendwise-com
Run large language models on CPU with a 4-6x smaller KV cache using TurboQuant compression.
Peaky8linders
TurboQuant-powered local-first AI agent runtime — 35B+ models with 64K+ context on consumer hardware
claudlos
TurboQuant CUDA KV cache benchmarks on RTX 3050 Ti (4GB VRAM) — 4 models, tool tests, interactive charts