Found 5,676 repositories (showing 30)
triton-lang
Development repository for the Triton language and compiler
triton-inference-server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Efficient Triton Kernels for LLM Training
JonathanSalwan
Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
thu-ml
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
gpu-mode
Puzzles for learning Triton
TritonHo
A repository storing all slides used in Triton Ho's public presentations and courses.
ELS-RD
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
ByteDance-Seed
Distributed Compiler based on Triton for Parallel Systems
TritonDataCenter
Triton DataCenter: a cloud management platform with first-class support for containers.
TritonDataCenter
A service for autodiscovery and configuration of applications running in containers
fla-org
Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
flagos-ai
FlagGems is an operator library for large language models implemented in the Triton Language.
triton-inference-server
The Triton TensorRT-LLM Backend
rpcpool
Triton's Dragon's Mouth Yellowstone gRPC service for high-performance Solana streaming
RightNow-AI
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
JonathanSalwan
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
triton-inference-server
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
triton-inference-server
This repository contains tutorials and examples for Triton Inference Server
coderonion
A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI-Generated Content (AIGC), and related datasets and applications.
0xSero
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
triton-inference-server
Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
triton-inference-server
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
SiriusNEO
Puzzles for learning Triton, play it with minimal environment configuration!
BobMcDear
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
HKUSTDial
Trainable fast and memory-efficient sparse attention
JafarAkhondali
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
triton-inference-server
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
66RING
Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS
rkinas
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.