Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
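As a rough illustration of what the tutorial implements, here is a minimal NumPy sketch of the online-softmax tiling at the heart of flash attention, compared against a naive reference. This is an assumption-laden sketch for intuition only (single head, no masking, illustrative function names and block size), not code from the repository.

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: softmax(q @ k.T / sqrt(d)) @ v, materializing
    # the full N x N score matrix.
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention(q, k, v, block=4):
    # Tiled pass over K/V blocks keeping a running row max and a
    # running softmax denominator (the online-softmax rescaling
    # trick), so the full score matrix is never materialized.
    n, d = q.shape
    o = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row max
    l = np.zeros(n)           # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)             # scores for this block
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)             # rescale old accumulators
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        o = o * scale[:, None] + p @ vb
        m = m_new
    return o / l[:, None]
```

The CUDA, Triton, and CUTLASS implementations in the repo apply the same recurrence per thread block, with the K/V tiles staged through shared memory.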
Stars: 498
Forks: 53
Watchers:
Open Issues: 11
Commits: 32
dacda43 fast cpu attn archive
cfdaac7 add torch only example
e5e8ef5 Merge pull request #15 from KevinZeng08/fix-torch-2.6
22c5991 cuda: fix compatibility with torch 2.6
9c954cf add mit license
b4e48c1 upload cpu attention in pure cpp
8caf186 update README and pin cutlass version
3207b03 upload tutorial
36ac7c7
0a2a175 update roadmap
86d4a80 cuda: niave flash2 standalone impl
305a947 cutlass flash attn python binding
54e2968 cutlass impl build standalone
ec2f833 smem limit bug. cuda error check bug fix.
92e3eb3 cutlass flash attention bug fix