High-performance LLM inference engine — a drop-in replacement for Ollama with faster multi-turn inference, lower TTFT (time to first token), and higher throughput, achieved through prefix caching and continuous batching.
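The multi-turn speedup described above comes from prefix caching: when a new prompt shares a leading token sequence with an earlier request (e.g. the same system prompt plus prior chat turns), the engine reuses the already-computed KV state instead of re-running prefill. This repository's actual implementation is not shown here; the sketch below is a hypothetical, simplified illustration of the lookup idea, with an opaque handle standing in for real attention KV blocks.

```python
class PrefixCache:
    """Toy prefix cache: maps token-ID prefixes to (placeholder) KV handles.

    In a real inference engine the stored value would be cached attention
    KV blocks; here we only demonstrate how many leading tokens of a new
    prompt can be reused from an earlier request.
    """

    def __init__(self):
        self._cache = {}  # tuple of token IDs -> opaque KV handle

    def insert(self, tokens, kv_handle):
        # Store every prefix of the processed sequence so later
        # prompts can match partially (e.g. system prompt only).
        for i in range(1, len(tokens) + 1):
            self._cache[tuple(tokens[:i])] = kv_handle

    def longest_prefix(self, tokens):
        # Length of the longest cached prefix of `tokens`;
        # only the remaining suffix needs prefill.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._cache:
                return n
        return 0


cache = PrefixCache()
# Turn 1: system prompt + first user message, fully prefilled once.
cache.insert([1, 2, 3, 4], kv_handle="kv-turn1")
# Turn 2 extends the conversation: the first 4 tokens are reusable,
# so prefill only runs on the new suffix [5].
reused = cache.longest_prefix([1, 2, 3, 4, 5])
print(reused)  # → 4
```

Production engines typically key the cache on fixed-size token blocks (as in vLLM-style paged attention) rather than every prefix length, which keeps memory and lookup costs bounded.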
Stars: 123
Forks: 15
Watchers: 123
Open Issues: 1
Overall repository health assessment: no package.json found, so this is likely not a Node.js project.
Commits: 7