⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
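Because the server speaks the OpenAI chat-completions wire format, any standard OpenAI client can target it by pointing the base URL at the local instance. A minimal sketch of the request shape, assuming a locally running server; the port (11435) and model name here are illustrative assumptions, not taken from this page:

```python
import json
import urllib.request

# Standard OpenAI-style chat completion payload.
# The model name "phi-3-mini" is a hypothetical placeholder.
payload = {
    "model": "phi-3-mini",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

def build_request(base_url: str) -> urllib.request.Request:
    """Construct the HTTP request an OpenAI-compatible client would send.

    The /v1/chat/completions path follows the OpenAI API convention.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("http://localhost:11435")
print(req.full_url)
```

Sending `req` with `urllib.request.urlopen` (or swapping in the official `openai` client with a custom `base_url`) should work against any endpoint that implements this contract.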
Stars: 3.9k | Forks: 327 | Watchers: 3.9k | Open Issues: 36
fix: use native GGUF chat template instead of name-based inference (#188) (b98391e)
fix: raise default n_ctx from 4096 to 8192 and fix UTF-8 token boundary (#187) (0db9b4a)
Fix Gate 1: Remove GPU backends (not available on GitHub runners), keep vision (ebc163a)
Add vision feature to Gate 1 (Linux Kitchen Sink binary) (f2eaec4)
fix: remove duplicate Linux x86_64 build to prevent artifact conflict (a307fba)
fix: remove GPU backends from GitHub runner builds - CPU only (07e31e0)
fix: remove CUDA from GitHub runner builds (no CUDA toolkit available) (09587e1)
fix: disable git version check in llama.cpp CMake build (6e3c773)
fix: patch shimmy-llama-cpp-sys-2 to use git dependency for CI builds (20ab186)
fix: initialize git submodules for llama.cpp CUDA build (dbeb63c)
docs: create shimmy vision sales pipeline consolidation guide (69bc2e8)
feat: add critical testing for GPU backend robustness, vision performance, and license pipeline (aad5466)
chore: add build monitoring script for private repo (124fa68)
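The UTF-8 token boundary fix referenced above (#187) addresses a common streaming pitfall: a model token can end in the middle of a multi-byte UTF-8 character, so decoding each chunk independently corrupts output. A sketch of the general buffered-decoding technique (an illustration of the pattern, not the repository's actual code):

```python
import codecs

# An incremental decoder buffers incomplete multi-byte sequences
# instead of emitting replacement characters mid-stream.
decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is two bytes (0xC3 0xA9); simulate a token split between them.
chunks = [b"caf", b"\xc3", b"\xa9", b" ok"]
out = "".join(decoder.decode(chunk) for chunk in chunks)
print(out)  # café ok
```

Decoding `b"\xc3"` alone yields an empty string; the byte is held until its continuation arrives, so the stream reassembles cleanly.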