LH-Tech-AI
Optimized nanoGPT that uses the Muon optimizer for 2x faster convergence. Features a modern architecture (RoPE, RMSNorm, QK-Norm, ReLU²) and is ready for bfloat16, torch.compile, and Flash Attention. Aimed at fast training of small-to-medium GPTs.
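As a hedged illustration only, and not this repository's code, the sketch below shows a minimal PyTorch version of three of the listed components: RMSNorm, the ReLU² feed-forward activation, and QK-Norm. All class and function names here are hypothetical.

```python
# Minimal sketch (assumed names, not LH-Tech-AI's actual modules).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class ReLUSquaredMLP(nn.Module):
    """Feed-forward block using the ReLU² activation: relu(x) ** 2."""
    def __init__(self, dim: int, hidden_mult: int = 4):
        super().__init__()
        self.up = nn.Linear(dim, hidden_mult * dim, bias=False)
        self.down = nn.Linear(hidden_mult * dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.relu(self.up(x)).square())

def qk_norm(q, k, eps: float = 1e-6):
    """QK-Norm: normalize queries and keys before the attention dot product."""
    q = q / q.norm(dim=-1, keepdim=True).clamp_min(eps)
    k = k / k.norm(dim=-1, keepdim=True).clamp_min(eps)
    return q, k

# Tiny smoke test in bfloat16, matching the precision the description mentions.
x = torch.randn(2, 16, 64, dtype=torch.bfloat16)
block = nn.Sequential(RMSNorm(64), ReLUSquaredMLP(64)).to(torch.bfloat16)
print(block(x).shape)  # torch.Size([2, 16, 64])
```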
MindIntels
⚡ Production-ready Flash Attention library unifying FlashAttention-2/3/4 and FFPA innovations, including polynomial exp2 emulation, conditional rescaling, ping-pong pipelining, GQA/MQA/MLA, paged KV-cache, block-sparse masking, and Triton auto-tuned GPU kernels.
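As a rough sketch of one listed feature, grouped-query attention (GQA), the example below expresses the shared-KV-head layout with PyTorch's built-in scaled_dot_product_attention, which dispatches to a Flash Attention kernel on supported GPUs. This is an assumed illustration, not MindIntels' API.

```python
# Hedged GQA sketch using stock PyTorch; function name and shapes are assumptions.
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v):
    """q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq a multiple of Hkv."""
    hq, hkv = q.shape[1], k.shape[1]
    assert hq % hkv == 0, "query heads must be a multiple of KV heads"
    # Expand each KV head so it is shared by hq // hkv query heads.
    repeat = hq // hkv
    k = k.repeat_interleave(repeat, dim=1)
    v = v.repeat_interleave(repeat, dim=1)
    # SDPA picks a Flash Attention backend on CUDA when shapes/dtypes allow.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32
q = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)
k = torch.randn(1, 2, 128, 64, device=device, dtype=dtype)
v = torch.randn(1, 2, 128, 64, device=device, dtype=dtype)
print(gqa_attention(q, k, v).shape)  # torch.Size([1, 8, 128, 64])
```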