Found 28,587 repositories (showing 30)
vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
ray-project
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
lm-sys
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
sgl-project
SGLang is a high-performance serving framework for large language models and multimodal models.
sigoden
A file server that supports static serving, uploading, searching, access control, webdav...
vercel
Static file serving and directory listing
Tiiny-AI
High-speed Large Language Model Serving for Local Deployment
TheAssassin
Helper application for Linux distributions serving as a kind of "entry point" for running and integrating AppImages
InternLM
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
clearml
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
ai-dynamo
A Datacenter Scale Distributed Inference Serving Framework
tensorflow
A flexible, high-performance serving system for machine learning models
knative
Kubernetes-based, scale-to-zero, request-driven compute
OpenBMB
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
volcano-sh
A Cloud Native Batch System (Project under CNCF)
flashinfer-ai
FlashInfer: Kernel Library for LLM Serving
brianfrankcooper
Yahoo! Cloud Serving Benchmark
kvcache-ai
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
microsoft
A distributed approximate nearest neighbor search (ANN) library which provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.
SylphAI-Inc
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
lm-sys
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
SeldonIO
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
SPLWare
esProc SPL is a JVM-based programming language designed for structured data computation, serving as both a data analysis tool and an embedded computing engine.
ahkarami
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
pytorch
Serve, optimize and scale PyTorch models in production
skyzh
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
FedML-AI
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
ModelTC
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
sgl-project
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Lightning-AI
A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.