Flash Attention from scratch: a tiled CUDA forward kernel, online softmax with a running max and correction factor, the recomputation trick in the backward pass, and O(N) memory. Both the forward and backward passes are verified against PyTorch autograd to within 1e-6.
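The online softmax is what lets the forward kernel stream over tiles of the score matrix instead of materializing all N×N entries: a running max m and a running sum l are carried along, and whenever the max grows, the accumulated sum is rescaled by the correction factor exp(m_old - m_new). A minimal sketch of that recurrence, assuming a plain one-thread-per-row kernel (the names and layout here are illustrative, not the repo's actual tiled implementation):

```cuda
// Sketch of the online softmax recurrence (one thread per row, two
// passes over the data). Not the repo's kernel; just the core update.
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void online_softmax(const float* x, float* y, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    const float* xr = x + row * cols;

    float m = -INFINITY;  // running max seen so far
    float l = 0.0f;       // running sum of exp(x - m)
    for (int j = 0; j < cols; ++j) {
        float m_new = fmaxf(m, xr[j]);
        // Correction factor: rescale the old partial sum so it is
        // expressed relative to the new running max.
        l = l * expf(m - m_new) + expf(xr[j] - m_new);
        m = m_new;
    }
    // Second pass normalizes; Flash Attention instead folds this
    // rescaling into a running output accumulator so the full row of
    // scores never has to be stored.
    for (int j = 0; j < cols; ++j)
        y[row * cols + j] = expf(xr[j] - m) / l;
}

int main() {
    const int rows = 2, cols = 8;
    float hx[rows * cols], hy[rows * cols];
    for (int i = 0; i < rows * cols; ++i) hx[i] = 0.1f * i;

    float *dx, *dy;
    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dy, sizeof(hy));
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
    online_softmax<<<1, 32>>>(dx, dy, rows, cols);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);

    for (int r = 0; r < rows; ++r) {
        float s = 0.0f;
        for (int j = 0; j < cols; ++j) s += hy[r * cols + j];
        printf("row %d sums to %f (expect 1.0)\n", r, s);
    }
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

In the tiled kernel the same correction factor also rescales the running output accumulator as new tiles arrive, which is what keeps the memory footprint at O(N) instead of the O(N^2) needed to store the full attention matrix.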
Stars: 16 | Forks: 1 | Watchers: 16 | Open Issues: 0 | Commits: 4
Repository health assessment: no package.json found; this is not a Node.js project (it is a CUDA project).