[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
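For context, the kernel is designed as a drop-in replacement for standard scaled dot-product attention. Below is a minimal usage sketch, assuming the `sageattention` package installs successfully on a CUDA GPU and exposes the `sageattn` entry point with the documented `tensor_layout`/`is_causal` arguments; exact names may vary between releases.

```python
# Minimal sketch: SageAttention as a drop-in quantized attention kernel.
# Assumes `pip install sageattention` succeeded and a CUDA GPU is available;
# the sageattn signature follows the repo's documented usage and may differ
# in other versions.
import torch
from sageattention import sageattn

batch, heads, seq_len, head_dim = 2, 16, 1024, 128

# Q, K, V in (batch, heads, seq_len, head_dim) layout, i.e. "HND".
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.float16, device="cuda")

# Used in place of torch.nn.functional.scaled_dot_product_attention.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
print(out.shape)  # torch.Size([2, 16, 1024, 128])
```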
Stars: 3.3k
Forks: 388
Watchers: 3.3k
Open Issues: 175
Commit counts: 102, 34, 15, 9, 6, 5, 4, 3, 2, 1
Revert "Merge pull request #218 from guilhermeleobas/guilhermeleobas/torch-compile"
e5bf6eeView on GitHubRevert "Merge pull request #279 from guilhermeleobas/guilhermeleobas/fix-device"
35747c3View on GitHubRevert "pip install sageattention==2.2.0 --no-build-isolation"
c20aed1View on GitHub