Found 21 repositories (showing 21)
Relaxed-System-Lab
🚀🚀 Efficient implementations of Native Sparse Attention
HKUSTDial
Trainable fast and memory-efficient sparse attention
No description available
No description available
deep-spin
AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)
Danielohayon
No description available
alexdremov
Fast, flexible, and chill sparse flash attention kernel
mvideet
Developing a CUDA kernel for Adaptive Sparse Flash Attention (Goncalves et al.)
gabrielmaialva33
Pure Gleam tensor library with quantization (INT8, NF4, AWQ), Flash Attention, and 2:4 Sparsity - 7.5x memory multiplication
li-guohao
Adaptive Sparse Attention Module with Flash Attention - 5.45x speedup on consumer GPUs
raayandhar
Implementation of Sparse Flash (Splash) Attention in CUDA. FP32 only; nothing production-grade.
BhoumikPatidar
MLSys 2026 NVIDIA Track: FlashInfer-Bench Contest - DeepSeek Sparse Attention
pranay5255
No description available
raayandhar
Sparse Causal Flash Attention. QK-sparse and Hash-sparse attention kernels.
Anonymous44414
No description available
ykirpichev
No description available
benzhang0323
No description available
reachtarunhere
No description available
tranhohoangvu
Deep Learning coursework (2025): attention mechanisms (Self/Flash/Linear/Sparse) and OCR with ResNet + Transformer Decoder.
NguyenQuangTrung19
Deep Learning final project exploring advanced attention mechanisms in LLMs (self-attention, MQA, GQA, Flash/linear/sparse attention, RoPE) with PyTorch demos, plus a CNN + Transformer-Decoder OCR model for image-to-text with evaluation on test data.
MindIntels
⚡ Production-ready Flash Attention library unifying FlashAttention-2/3/4 + FFPA innovations, including polynomial exp2 emulation, conditional rescaling, ping-pong pipelining, GQA/MQA/MLA, paged KV-cache, block-sparse masking, and Triton auto-tuned GPU kernels
All 21 repositories loaded
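The common technique behind most of these repositories is block-sparse attention: the attention score matrix is partitioned into blocks, and only the blocks marked active in a block mask are computed. A minimal NumPy sketch of that idea follows; the function and variable names are illustrative (not taken from any repository above), and real kernels fuse this into a single Flash-Attention-style tiled GPU pass instead of materializing the full score matrix.

```python
# Minimal block-sparse attention sketch (illustrative names, not any
# specific repository's API). Inactive blocks are left at -inf so the
# softmax assigns them zero weight, exactly as a block mask would.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) becomes exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sparse_attention(q, k, v, block_mask, block=4):
    """q, k, v: (seq, dim) arrays; block_mask: (seq//block, seq//block)
    boolean grid. Scores are computed only for blocks where block_mask
    is True; all other positions stay -inf before the softmax."""
    seq, dim = q.shape
    scores = np.full((seq, seq), -np.inf)
    nblocks = seq // block
    for i in range(nblocks):
        for j in range(nblocks):
            if block_mask[i, j]:
                qi = q[i * block:(i + 1) * block]
                kj = k[j * block:(j + 1) * block]
                scores[i * block:(i + 1) * block,
                       j * block:(j + 1) * block] = qi @ kj.T / np.sqrt(dim)
    return softmax(scores, axis=-1) @ v
```

With an all-True mask this reduces to ordinary dense attention; with a block-diagonal mask, each query block attends only to its own key block, which is the source of the memory and speed savings the descriptions above advertise.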