MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
Stars: 14.8k
Forks: 2.3k
Watchers: 14.8k
Open Issues: 38
Overall repository health assessment: no package.json was found, so this is not a Node.js project (MNN is a C/C++ inference engine), and Node.js-specific checks do not apply.
Recent commits:
- 66fd356: [LLM:Feature] Add text-level prompt cache for multi-turn chat (#4330)
- ade3d6c: [CPU:Feature] Add RISC-V Vector extension (RVV) support and fix tokenizer header (#4331)
- c857fa2: [MNN:Feature] Add Moore Threads MUSA Backend Support (#4182)
- a351235: [LinearAttention:Feature] Support linear attention status load/store in disk
- 10d2ae9: [MNNChat:Bugfix] Reuse loaded runtime session for API start in 0.8.2.2 (#4319)
- ccd2dbf: [CPU:Feature] Add TurboQuant TQ3/TQ4 KV cache quantization
- 244f5d1: [LLM:Bugfix] Fix prefix disk cache not loaded after first response (#4316)
- 622b3fb: [LLM:Feature] Support Qwen3.5 smooth and omni export (#4336)

Top contributors by commit count (contributor names were not captured in this extract):
- 523 commits
- 471 commits
- 420 commits
- 225 commits
- 206 commits
- 34 commits
- 32 commits
- 32 commits
- 28 commits
- 23 commits