🚀 LLM inference engine in Swift/Metal. Loads GGUF and safetensors models directly: no conversion step, no C++, pure Swift.
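The direct-loading claim implies parsing the GGUF container in Swift. As a hedged illustration (not the repo's actual API; names `GGUFHeader` and `readGGUFHeader` are invented here), a minimal header probe following the public GGUF spec, which puts the magic bytes "GGUF", a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key-value count at the start of the file:

```swift
import Foundation

// Fixed 24-byte GGUF header, per the public GGUF spec.
// This is a standalone sketch, not code from the repo.
struct GGUFHeader {
    let version: UInt32
    let tensorCount: UInt64
    let metadataKVCount: UInt64
}

enum GGUFError: Error { case truncated, badMagic }

func readGGUFHeader(at url: URL) throws -> GGUFHeader {
    let data = try Data(contentsOf: url)
    guard data.count >= 24 else { throw GGUFError.truncated }
    // Magic bytes "GGUF" identify the container.
    guard data.prefix(4) == Data("GGUF".utf8) else { throw GGUFError.badMagic }

    // All header integers are little-endian; read them unaligned.
    func load<T: FixedWidthInteger>(at offset: Int, as _: T.Type) -> T {
        data.withUnsafeBytes { raw in
            T(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: T.self))
        }
    }
    return GGUFHeader(
        version: load(at: 4, as: UInt32.self),
        tensorCount: load(at: 8, as: UInt64.self),
        metadataKVCount: load(at: 16, as: UInt64.self)
    )
}
```

After the header, a real loader would walk the metadata key-value pairs and tensor descriptors; the sketch stops at the fixed-size prefix, which is enough to validate a file before mapping its tensors.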
Stars: 25 | Forks: 0 | Watchers: 25 | Open Issues: 0
158 commits (most recent first):
b889dcc  chore: clean up repo structure for public-facing release
f59aec4  feat: add simple EdgeRunner facade API with stream and generate methods
a69219e  fix: pass dense V buffer to decode kernel for 4k-16k contexts
bbbc0b1  fix: populate dense V cache during prefill for all context lengths
154fd75  fix: populate dense V cache during prefill for all context lengths
1d65194  fix: dense-V decode path for turboquant quality regression
edb1eac  fix: disable unimplemented turboquant hybrid-V paths
c9d8280  merge: turboquant KV-cache integration from perf2-turboquant-isolation
26fbb4c  chore: commit local perf2 changes before turboquant merge
b9df8ec  perf: use dense attention for long turboquant prefill — ttft 624029 → 504410 ms with 33.24 tok/s preserved
40d5994  perf: increase 16k row thinning again — 27.08 → 33.21 tok/s (+22.6%)
7b5fec6  perf: double 16k row thinning — 24.32 → 27.08 tok/s (+11.3%)
f4317c0  perf: keep only top tile value row at 16k — 22.87 → 24.32 tok/s (+6.3%)
eff8c12  perf: fix 16k sparse score scaling — 22.33 → 22.87 tok/s (+2.4%)
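One commit above adds an EdgeRunner facade with stream and generate methods. The commit log names only the type and the two methods, so everything else in this usage sketch (the initializer, parameter names, and async/throwing signatures) is an assumption:

```swift
// Hypothetical usage of the EdgeRunner facade; signatures are guessed,
// "model.gguf" is a placeholder path.
let runner = try EdgeRunner(modelPath: "model.gguf")

// One-shot generation: return the full completion as a String.
let reply = try await runner.generate("Summarize GGUF in one sentence.")
print(reply)

// Streaming: yield tokens incrementally, e.g. via AsyncThrowingStream.
for try await token in runner.stream("Write a haiku about Metal.") {
    print(token, terminator: "")
}
```

A facade like this would hide model loading, KV-cache setup, and the Metal kernel dispatch behind two calls, which matches the "simple facade" framing in the commit message.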