Found 6,077 repositories (showing 30)
vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
BerriAI
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]
badlogic
AI agent toolkit: coding agent CLI, unified LLM API, TUI & web UI libraries, Slack bot, vLLM pods
meta-llama
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.
GeeeekExplorer
Nano vLLM
OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
xorbitsai
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
intel
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
Orchestra-Research
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/gemini agent will be an AI research agent with full horsepower. Maintained by Orchestra Research.
OpenBMB
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
katanaml
Structured data extraction and instruction calling with ML, LLM and Vision LLM
xlite-dev
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
kvcache-ai
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
tencentmusic
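Cube Studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. Covers the full MLOps algorithm workflow, a compute rental platform, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, a labeling platform with automated annotation, SFT fine-tuning / reward model / reinforcement learning training for large models such as DeepSeek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases, and an AI model marketplace. Supports Chinese domestic CPU/GPU/NPU hardware and the Ascend ecosystem, RDMA, and distributed frameworks such as pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/ray/volcano.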
gpustack
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
vllm-project
A framework for efficient model inference with omni-modality models
skyzh
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
PaddlePaddle
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
vllm-project
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
mostlygeek
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc
vllm-project
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
containers
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
NVIDIA
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
vllm-project
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
data-infra
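Cube Studio: an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform / MaaS / MLOps / AI platform / training and inference platform. Covers the full algorithm workflow, a compute rental platform, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, vGPU virtualization, cloud-edge-device collaboration, edge computing, an automated labeling platform, SFT fine-tuning / reward model / reinforcement learning training for large models such as DeepSeek, multi-node large-model inference with vllm/ollama/mindie, private knowledge bases and LLMOps agents, and an AI model marketplace. Supports scheduling of Chinese domestic heterogeneous compute (Ascend, Cambricon, Hygon, Moore Threads, MetaX, and others), IB/RoCE/RDMA, and distributed frameworks such as pytorch/deepspeed/colossalai/ray.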
apconw
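Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to deliver end-to-end conversion from natural language to data insights.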
vllm-project
Community maintained hardware plugin for vLLM on Ascend
CalvinXKY
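Sharing AI Infra knowledge & code exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more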