📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Stars: 5.1k | Forks: 358 | Watchers: 5.1k | Open Issues: 1
Commits: 466
Recent commits:
- 2d01a65: Add ToolPipe: 120+ Free Developer Tools API & MCP Server (#165)
- 82bcc03: Add AVP: Agent Vector Protocol (KV-cache transfer between agents) (#163)
- 5582a8b: Add POWER8 LLM inference and NUMA weight banking projects (#161)
- ad28104: Add Grail-V: Non-bijunctive attention on POWER8 (8.8x speedup) (#159)
- cd66e77: Add Off Grid - on-device LLM inference for mobile (#158)