Search Results

Found 6,077 repositories (showing 30)

vllm

vllm-project

💚90

A high-throughput and memory-efficient inference and serving engine for LLMs

Stars: 75.2k

Forks: 15.2k

Apache-2.0

Python

Updated 4 minutes ago

Topics: amd, blackwell, cuda (+17 more)

litellm

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

Stars: 42.1k

Forks: 7.0k

NOASSERTION

Python

Updated 2 minutes ago

Topics: ai-gateway, anthropic, azure-openai (+11 more)

pi-mono

badlogic

💚100

AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods

Stars: 31.3k

Forks: 3.4k

MIT

TypeScript

Updated 3 minutes ago

llama-cookbook

meta-llama

💚95

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

Stars: 18.3k

Forks: 2.7k

MIT

Jupyter Notebook

Updated 6 hours ago

Topics: ai, finetuning, langchain (+7 more)

nano-vllm

GeeeekExplorer

💚93

Nano vLLM

Stars: 12.7k

Forks: 1.9k

MIT

Python

Updated 5 minutes ago

Topics: deep-learning, inference, llm (+3 more)

OpenRLHF

💛88

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Stars: 9.3k

Forks: 912

Apache-2.0

Python

Updated 3 hours ago

Topics: large-language-models, openai-o1, proximal-policy-optimization (+5 more)

inference

xorbitsai

💛87

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

Stars: 9.2k

Forks: 813

Apache-2.0

Python

Updated 15 minutes ago

Topics: artificial-intelligence, chatglm, deployment (+17 more)

ipex-llm

intel

💛83

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Stars: 8.7k

Forks: 1.4k

Apache-2.0

Python

Updated 2 hours ago

Topics: gpu, llm, pytorch (+1 more)

LMCache

💛89

Supercharge Your LLM with the Fastest KV Cache Layer

Stars: 7.9k

Forks: 1.1k

Apache-2.0

Python

Updated 1 hour ago

Topics: amd, cuda, fast (+7 more)

AI-Research-SKILLs

Orchestra-Research

💛81

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills, and your Claude Code/Codex/Gemini agent becomes a fully powered AI research agent. Maintained by Orchestra Research.

Stars: 6.1k

Forks: 477

MIT

TeX

Updated 6 minutes ago

Topics: ai, ai-research, claude (+11 more)

UltraRAG

OpenBMB

💛75

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

Stars: 5.5k

Forks: 411

Apache-2.0

Python

Updated 6 hours ago

Topics: deepseek, demo, easy (+14 more)

kserve

💛80

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Stars: 5.3k

Forks: 1.4k

Apache-2.0

Updated 6 hours ago

Topics: artificial-intelligence, cncf, genai (+17 more)

sparrow

katanaml

💛80

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Stars: 5.1k

Forks: 512

GPL-3.0

Python

Updated 15 hours ago

Topics: computer-vision, gpt, huggingface-transformers (+5 more)

Awesome-LLM-Inference

xlite-dev

💛79

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Stars: 5.1k

Forks: 358

GPL-3.0

Python

Updated 2 hours ago

Topics: awesome-llm, deepseek, deepseek-r1 (+11 more)

Mooncake

kvcache-ai

💛77

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

Stars: 5.0k

Forks: 651

Apache-2.0

C++

Updated 3 hours ago

Topics: disaggregation, inference, kvcache (+4 more)

cube-studio

tencentmusic

💛84

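cube studio: an open-source, cloud-native, one-stop machine learning/deep learning/large-model AI platform covering the full MLOps algorithm pipeline: compute rental, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services with vGPU virtualization, edge computing, a labeling platform with automated annotation, SFT fine-tuning/reward-model/reinforcement-learning training for large models such as DeepSeek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. Supports domestic Chinese CPUs/GPUs/NPUs and the Ascend ecosystem, RDMA, and distributed frameworks such as pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.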

Stars: 4.9k

Forks: 870

NOASSERTION

Python

Updated 19 hours ago

Topics: ai, aihub, argo (+14 more)

gpustack

💛70

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

Stars: 4.8k

Forks: 493

Apache-2.0

Python

Updated 10 hours ago

Topics: ascend, cuda, deepseek (+15 more)

vllm-omni

vllm-project

🧡66

A framework for efficient model inference with omni-modality models

Stars: 4.1k

Forks: 682

Apache-2.0

Python

Updated 49 minutes ago

Topics: audio-generation, diffusion, image-generation (+6 more)

tiny-llm

skyzh

💛77

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Stars: 4.1k

Forks: 297

Apache-2.0

Python

Updated 21 hours ago

Topics: course, large-language-model, llm (+5 more)

FastDeploy

PaddlePaddle

💛71

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

Stars: 3.7k

Forks: 735

Apache-2.0

Python

Updated 21 hours ago

Topics: ernie, ernie-45, ernie-45-vl (+6 more)

semantic-router

vllm-project

💛75

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Stars: 3.6k

Forks: 595

Apache-2.0

Updated 48 minutes ago

Topics: ai-gateway, bert-classification, fine-tuning (+15 more)

llama-swap

mostlygeek

💛75

Reliable model swapping for any local OpenAI/Anthropic-compatible server: llama.cpp, vllm, etc.

Stars: 3.1k

Forks: 225

MIT

Updated 4 minutes ago

Topics: golang, llama, llamacpp (+5 more)

llm-compressor

vllm-project

💛78

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Stars: 3.0k

Forks: 467

Apache-2.0

Python

Updated 14 hours ago

Topics: compression, quantization

ramalama

containers

💛76

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Stars: 2.7k

Forks: 321

MIT

Python

Updated 1 hour ago

Topics: ai, containers, cuda (+8 more)

Model-Optimizer

NVIDIA

🧡66

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Stars: 2.4k

Forks: 332

Apache-2.0

Python

Updated 1 hour ago

production-stack

vllm-project

💛71

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Stars: 2.3k

Forks: 382

Apache-2.0

Python

Updated 1 day ago

cube-studio

data-infra

💛74

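cube studio: an open-source, cloud-native, one-stop machine learning/deep learning/large-model AI platform, MaaS, MLOps, AI platform, and training/inference platform covering the full algorithm workflow: compute rental, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services, vGPU virtualization, cloud-edge-device collaboration, edge computing, an automated labeling platform, SFT fine-tuning/reward-model/reinforcement-learning training for large models such as DeepSeek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases and LLMOps agents, and an AI model marketplace. Supports scheduling of domestic Chinese heterogeneous compute (Ascend, Cambricon, Hygon, Moore Threads, MetaX, etc.), IB/RoCE/RDMA, and distributed frameworks such as pytorch/deepspeed/colossalai/ray.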

Stars: 2.0k

Forks: 156

NOASSERTION

Python

Updated 5 hours ago

Aix-DB

apconw

💛71

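Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to deliver end-to-end conversion from natural language to data insights.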

Stars: 2.0k

Forks: 386

JavaScript

Updated 1 day ago

Topics: bigdata, deepseek-r1, dify (+15 more)

vllm-ascend

vllm-project

🧡62

Community maintained hardware plugin for vLLM on Ascend

Stars: 1.9k

Forks: 1.0k

Apache-2.0

Python

Updated 12 hours ago

Topics: ascend, inference, llm (+6 more)

InfraTech

CalvinXKY

🧡68

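Sharing AI infra knowledge and coding exercises: PyTorch/vLLM/SGLang framework primers⚡️, performance acceleration🚀, LLM fundamentals🧠, AI hardware and software🔧, and more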

Stars: 1.5k

Forks: 121

Jupyter Notebook

Updated 2 hours ago

GitHub Explorer
