FlashMLA: Efficient Multi-head Latent Attention Kernels
Stars: 12.6k
Forks: 1.0k
Watchers: 12.6k
Open Issues: 95
71c7379  Change the order of grid dim in bwd convert kernel to avoid overlimit when sequence length is very large(>1M) (#173)
47c35a7  Add CUDAGuard and device id assignment in sm100 dense fmha (#160)
472477e  Add Deep-Dive Blog for the New Sparse Decoding Kernel on Hopper (#100)
fd249aa  Add Sparse Decoding Kernel and Sparse Prefill Kernel for Blackwell
3969f20  Merge remote-tracking branch 'github/main' into open-source-h
Commit counts: 17, 13, 4, 3, 2, 2, 1, 1, 1, 1