Found 39 repositories (showing 30)
nicedreamzapp
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
SharpAI
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, and an iOS app.
cksac
No description available
PacifAIst
Based on the implementation of Google's TurboQuant (ICLR 2026), Quansloth brings elite KV cache compression to local LLM inference: a fully private, air-gapped AI server that runs massive-context models natively on consumer hardware.
Alberto-Codes
TurboQuant KV cache compression plugin for vLLM: asymmetric K/V quantization, validated on 8 models, runs on consumer GPUs
AI-Engineering-at
Practical guide: TurboQuant KV-cache quantization for llama.cpp. Run 122B models on consumer GPUs.
Sggin1
Docker containers for AI models on NVIDIA DGX Spark (GB10, SM121, aarch64). TurboQuant KV cache compression + mamba-ssm aarch64 build.
cpahgw-rgb
Local LLM setup for Apple M4 Pro 48GB — TurboQuant KV cache compression + MoE models
savka777
your ai, your rules. — local AI desktop app with hardware-aware model matching, threaded conversations, and TurboQuant integration. no cloud, no subscription, no data leaving your device.
robin-ph
Extreme cache compression for video diffusion model inference — TurboQuant × TeaCache fusion, 10× VRAM reduction with <2% quality loss
audiohacking
Run Claude Code with local AI on Apple Silicon. 122B model at 41 tok/s with Google TurboQuant. No cloud, no API fees.
sammyboi1801
A simple PyTorch implementation of TurboQuant for model comparison
WaveboSF
Model Switcher & Benchmark Tool for llama-server with TurboQuant KV-Cache
HariharanSuthan-A
No description available
selmand
Run larger AI models with longer context on your GPU, powered by Google's TurboQuant KV cache compression.
youngstunners88
Our own AI coding agent with free models, TurboQuant compression, and full tool system
InnovativeCoder
PrismLabs Bonsai 8-bit model on MLX using TurboQuant, tested on an M2 Pro with phenomenal tok/s
ahmaddarwesh
A lightweight desktop application for managing and interacting with llama.cpp models through a clean, modern interface. Supports TurboQuant.
RemizovDenis
Portable memory format for LLMs. Store and transfer compressed KV-caches between model architectures without re-computation. Built on TurboQuant-MoE.
JulCCrum
Step-by-step guide to setting up Hermes Agent with a local AI model, Telegram bot, TurboQuant acceleration, and Claude Code delegation on Mac
alex-rentel
[ARCHIVED] Local AI agent framework for Apple Silicon. Superseded by Claude Code and Ollama MLX. See eden-models and mlx-turboquant for active work.
AlphaWaveSystems
TurboQuant KV cache compression for local LLM inference — 80% memory savings, near-zero quality loss on 8B+ models. PyTorch + MLX (Apple Silicon). Based on arXiv:2504.19874 (Google Research, ICLR 2026).
kavenmartinez1-collab
Local AI platform: WebGPU/WGSL browser inference engine + HuggingFace Transformers + Ollama. TurboQuant KV cache compression, GPTQ INT4 fused dequant, mixed-precision BF16/INT4 for hybrid SSM+attention models. 9B parameters in a browser, 8GB VRAM.
jzh001
A lightweight, professional chat UI for running mlx-community vision-language models on Apple Silicon via mlx_vlm. Generates tokens significantly faster than Ollama or LM Studio on the same hardware, with a longer context window and support for modern compression techniques like TurboQuant.
deharoalexandre-cyber
A generic, policy-driven, multi-model GGUF inference server. TurboQuant-native. CUDA + ROCm
kartikeyaagr
Small Language Model integrating Google's TurboQuant and Apple's Exclusive Self Attention.
YueHuLab
3-bit KV Cache quantization for ESM-2 protein language models via TurboQuant
gotrendwise-com
Run large language models on CPU with a 4-6x smaller KV cache using TurboQuant compression.
Peaky8linders
TurboQuant-powered local-first AI agent runtime — 35B+ models with 64K+ context on consumer hardware
claudlos
TurboQuant CUDA KV cache benchmarks on RTX 3050 Ti (4GB VRAM) — 4 models, tool tests, interactive charts