Found 28,587 repositories (showing 30)
vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
ray-project
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
lm-sys
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
sgl-project
SGLang is a high-performance serving framework for large language models and multimodal models.
sigoden
A file server that supports static serving, uploading, searching, access control, webdav...
vercel
Static file serving and directory listing
Tiiny-AI
High-speed Large Language Model Serving for Local Deployment
TheAssassin
Helper application for Linux distributions serving as a kind of "entry point" for running and integrating AppImages
InternLM
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
clearml
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
ai-dynamo
A Datacenter Scale Distributed Inference Serving Framework
tensorflow
A flexible, high-performance serving system for machine learning models
knative
Kubernetes-based, scale-to-zero, request-driven compute
OpenBMB
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
volcano-sh
A Cloud Native Batch System (Project under CNCF)
flashinfer-ai
FlashInfer: Kernel Library for LLM Serving
brianfrankcooper
Yahoo! Cloud Serving Benchmark
kvcache-ai
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
microsoft
A distributed approximate nearest neighbor search (ANN) library which provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.
SylphAI-Inc
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
lm-sys
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
SeldonIO
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
SPLWare
esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.
ahkarami
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
pytorch
Serve, optimize and scale PyTorch models in production
skyzh
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
FedML-AI
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
ModelTC
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
sgl-project
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Lightning-AI
A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.