Development repository for the Triton language and compiler
Stars: 18.8k
Forks: 2.7k
Watchers: 18.8k
Open issues: 1.1k
Top contributors by commit count (names not captured): 763, 539, 450, 289, 263, 222, 180, 153, 131, 123
Recent commits:
f97f66a Revert "[language] Skip f16 to f32 promotion in max/min reductions" (#9921)
8956d90 [AMD] fix AsyncTDMCopyLocalToGlobalOp::verify bug about multi-cta (#9918)
01d2b7e restrict prefetch pass to sm_80 target to fix performance regression (#9913)
eab0c65 [Standard] Skip f16 to f32 promotion in max/min reductions (#9903)
d42e028 [AMD][gfx1250] Pack f32 arith ops to use v_pk_* intrinsics (#9899)
bc79129 [AMD] Generalize in-thread tree reduction to support ternary grouping for max/min (#9897)
7da7311 [triton_kernels] Add other= to masked scale loads in _matmul.py to match _p_matmul.py (#9911)
798d24c [Gluon] Functional multi-CTA block scale MMA support (#9896)
4c69c79 [AMD] Make arith.select handling in canonicalize pointers more Robust (#8779)
e12b07a [PROTON][TEST] Add cupti graph replay heap growth repro (#9881)
f50c8df [AMD][Backend] Add OptimizeDescriptorEncoding pass for AMDGPU (#9792)