🚀 Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.
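Since the project advertises an Ollama-compatible API, here is a minimal sketch of querying it from C++ with libcurl. The endpoint path and port follow stock Ollama conventions (`POST /api/generate` on `localhost:11434`), and the model name `llama3` is a placeholder; none of these details are confirmed by this repository, so adjust them to whatever the server actually exposes.

```cpp
// Minimal sketch: query an Ollama-compatible /api/generate endpoint.
// ASSUMPTIONS: standard Ollama port 11434 and a model named "llama3";
// neither is documented by this repository.
#include <curl/curl.h>
#include <cstdio>

// Print each response chunk to stdout as it arrives.
static size_t on_body(char *data, size_t size, size_t nmemb, void *) {
    fwrite(data, size, nmemb, stdout);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    const char *body =
        R"({"model": "llama3", "prompt": "Why is the sky blue?", "stream": true})";

    struct curl_slist *headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);  // implies POST
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```

Build with `g++ demo.cpp -lcurl`. With `"stream": true`, an Ollama-style server replies with newline-delimited JSON chunks, which the write callback prints as they arrive; the commit below about fixing Ollama streaming suggests this server follows that convention, but verify against its actual behavior.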
Stars: 2 · Forks: 0 · Watchers: 2 · Open Issues: 2
Recent commits:
877d72f  fix: Add HIP/CUDA runtime headers to expert_compressor.cpp
84bc039  fix: Remove unused parameter warnings in benchmark_polar.cpp
8a853b5  fix: AVX-512 assembly compilation error and CMake option handling
9a420ff  feat: Add complete llama.cpp fork with ExpertFlow Phase 3 integration
49c012c  fix: Force clang++ for HIP builds and fix FP16 conversion for AMD GPUs
6e3426d  feat: Add workspace management system and fix Ollama streaming
afeab0b  fix: add missing use_mlock variable and fix setup.sh clone logic
c36b3ab  feat: QuantumLeap v0.4.0 - 801% faster LLM inference built on llama.cpp