Found 25 repositories (showing 25)
ByteDance-Seed
Distributed Compiler based on Triton for Parallel Systems
Harly-1506
End-to-end MLOps architecture built for polyp segmentation — featuring distributed Ray training, MLflow experiment tracking, and automated CI/CD with Kubeflow Pipelines and KServe (Triton) deployment on Google Kubernetes Engine.
This system operates in a distributed environment using NVIDIA Triton.
caroline430
Everything-in-one-place GPU kernel programming for ML engineers. Covers CUDA, Triton, Flash Attention 2/3, paged attention, Mamba, speculative decoding, FP8/AWQ/GPTQ quantization, cuBLAS/cuDNN/CUTLASS, Nsight profiling, PyTorch custom ops, FSDP, tensor parallelism, and distributed training. SOTA 2026.
WaffleBits
Distributed Inference Benchmarking Tool for NVIDIA Triton Server
666keke
Production-ready distributed YOLO inference pipeline powered by NVIDIA Triton Inference Server. Supports Kubernetes orchestration and Docker deployment.
piotrm-nvidia
No description available
Irving1113
triton-distributed-tutorial
triton-inference
No description available
xuzhao9
Benchmark for Triton-Distributed
tongxili
No description available
Triton distributed inference SLA simulator
blockneural
C++-based distributed AI inference system using NVIDIA Triton with gateway, scheduler, and blockchain-based payment integration on Ethereum.
Echo13Bear
No description available
A simple YouTube-like video hosting platform made scalable with consistent-hash-based distributed storage. Built using Go, gRPC, and SQLite.
parth-shettiwar
Flash Attention 2 implemented in Triton, plus distributed data-parallel training.
sriramgkn
Code for my blog on distributed training (CUDA, ONNX, TensorRT, Triton)
theBeginner86
A distributed performance benchmark engine for ASR workloads on Triton Inference Servers
blockneural
Agent service for managing Triton inference containers, coordinating with gateway and scheduler for distributed AI workloads.
kennethvuongcode
Optimized Triton-based matrix multiplication kernel with ReLU and addition, plus MPI-based tensor and data parallel communication for distributed training.
sasi-chappidi
Built an end-to-end LLM infrastructure project with PyTorch, distributed training, FastAPI serving, benchmarking, ONNX export, and Triton-compatible deployment structure.
This project implements systems-level optimizations for transformer training, including custom Triton kernels, PyTorch distributed training, optimizer state sharding, and memory/latency benchmarking tools.
milasd
Implementation of the Byte-Pair Encoding Tokenizer, RoPE Embeddings, Transformer LLM distributed training & inference from scratch w/ PyTorch (and MLX), with a Flash Attention 2 Triton kernel.
nguyenhuyenkiohna
A high-performance, distributed video analytics framework for Smart City traffic monitoring. Optimized with YOLOv11, TensorRT, and ByteTrack. Architecture powered by Apache Kafka and Triton Inference Server for scalable, real-time vehicle Re-ID and analytics.
dagc-ai
Hands-on AI infrastructure from the ground up: GPU memory hierarchy, CUDA kernel optimization, Triton, distributed training, and inference serving. Real benchmarks across the full compute stack, from naive kernels to Groq LPUs, Tenstorrent, AMD MI300X, and Google TPU
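One entry above (Echo13Bear) routes video data with consistent-hash-based distributed storage. As background, a minimal sketch of that technique — a sorted hash ring with virtual nodes — is shown below; this is a generic illustration and makes no assumptions about that repository's actual implementation, which uses Go rather than Python.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring: each key maps to the first node
    clockwise from its hash; virtual nodes smooth the distribution so
    removing one node only remaps the keys it owned."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Place `vnodes` replicas of this node around the ring.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, (h, node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping at the end.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, chr(0x10FFFF)))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

The defining property: when a node leaves, only keys that hashed to that node move, while all other key-to-node assignments stay fixed — which is what makes the scheme attractive for distributed storage.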