llama.cpp + TurboQuant CUDA; syncs with ggml-org/llama.cpp
Stars: 0 | Forks: 0 | Watchers: 0 | Open Issues: 0
Overall repository health assessment: no package.json found — this is not a Node.js project (llama.cpp is a C/C++ codebase).
Commits by top contributors (contributor names not preserved in this capture): 1.7k, 409, 362, 356, 265, 251, 248, 103, 101, 100
Recent commits:
9d9bf0f  docs: OPERATOR-RUNTIME reflects llama-turboquant-runtime + Start-Llama-Server.cmd
17f2018  docs: operator runtime vs fork binary; example models.ini preset
066be40  docs: MAINTAINING-FORK playbook for upstream sync and canonical build/
fccfd3a  docs: how llama-server relates to this fork (build, naming, packaging)
5b7911f  docs: center fork on TurboQuant as project purpose; trim secondary Qwen section
12c8174  docs: clarify TurboQuant as primary in-tree implementation; fix arXiv id
6e047e1  merge: origin/feature/turboquant-kv-cache — restore TurboQuant CUDA KV + FA (resolve BF16 vs turbo conflicts)
fed3a75  docs: clarify TurboQuant naming vs implemented changes on master
9952377  turboquant: snapshot before syncing upstream llama.cpp (2026-04-03)
43a4ee4  HIP: build each ci build test for a different architecture (#21337)
f1ac841  ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
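The commit log above references a MAINTAINING-FORK playbook for syncing this fork with upstream ggml-org/llama.cpp. The playbook's actual contents are not shown here; the following is only a minimal sketch of the standard git pattern such a sync typically uses (an `upstream` remote fetched and merged into `master`), demonstrated against throwaway local repositories so it runs anywhere without network access.

```shell
# Hypothetical sketch of a fork-sync workflow (NOT the repo's actual
# MAINTAINING-FORK playbook): track the original project as an
# "upstream" remote, fetch it, and merge it into the fork's master.
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for upstream ggml-org/llama.cpp (a local throwaway repo)
git init -q -b master upstream
(cd upstream && git -c user.email=u@x -c user.name=up \
    commit -q --allow-empty -m "base")

# Stand-in for the fork, cloned from upstream
git clone -q upstream fork
cd fork
git remote rename origin upstream   # track upstream under the usual name

# Upstream moves ahead with a new commit...
(cd ../upstream && git -c user.email=u@x -c user.name=up \
    commit -q --allow-empty -m "HIP: ci build test")

# ...and the sync step brings the fork's master up to date
git fetch -q upstream
git merge -q --ff-only upstream/master
git log --oneline -n 1
```

In a real fork the merge would not always fast-forward: local changes (here, the TurboQuant CUDA paths) can conflict with upstream, which is exactly what commit 6e047e1's "resolve BF16 vs turbo conflicts" note describes.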