Found 83 repositories (showing 30)
hellangleZ
A script that automatically switches Qwen3 between its thinking (reasoning) and non-thinking modes via an OpenAI-compatible API. The inference framework can be SGLang, or it can be adapted to use vLLM.
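The switch itself is just a request-level flag. A minimal sketch of the idea against an OpenAI-compatible endpoint, assuming Qwen3's `enable_thinking` chat-template flag is forwarded via `chat_template_kwargs` (the base URL and model name are placeholders, not taken from the repo):

```python
from openai import OpenAI

# Placeholder endpoint; works the same against SGLang or vLLM servers.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str, thinking: bool) -> str:
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": prompt}],
        # Extra body fields are passed through to the chat template;
        # Qwen3's template reads `enable_thinking` from there.
        extra_body={"chat_template_kwargs": {"enable_thinking": thinking}},
    )
    return resp.choices[0].message.content

print(ask("What is 17 * 23?", thinking=True))   # reasoning trace + answer
print(ask("What is 17 * 23?", thinking=False))  # direct answer
```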
igeniusai
Platform-agnostic toolkit to spin up vLLM endpoints and submit high-throughput jobs (DataFrame or scripts) across Slurm and DGX Cloud Lepton.
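The "high-throughput jobs" part typically amounts to fanning requests out concurrently and letting vLLM's continuous batching do the rest. A hedged sketch of that pattern, assuming an OpenAI-compatible vLLM endpoint (URL and model name are placeholders):

```python
import asyncio
from openai import AsyncOpenAI

# Placeholder endpoint, e.g. one spun up on Slurm or DGX Cloud Lepton.
client = AsyncOpenAI(base_url="http://vllm-endpoint:8000/v1", api_key="EMPTY")

async def complete(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def run_batch(prompts: list[str]) -> list[str]:
    # vLLM batches concurrent requests server-side (continuous batching),
    # so client-side concurrency is enough to keep the GPUs busy.
    return await asyncio.gather(*(complete(p) for p in prompts))

print(asyncio.run(run_batch(["prompt one", "prompt two"])))
```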
oteroantoniogom
Automated Bash script to set up a high-performance environment on Ubuntu Linux with an RTX 5090, including installation of PyTorch, Unsloth, vLLM, Triton, and Xformers. The script handles system dependencies, creates a Python virtual environment, compiles libraries from source, and verifies the installations to ensure an optimal AI and deep learning setup.
sasha0552
CI scripts designed to build a Pascal-compatible version of vLLM.
JetBrains-Hardware
DGX Spark setup and vLLM deployment scripts for Qwen, GPT-OSS, and Nemotron 3.
A Terraform-based bootstrap script for standing up vLLM + Ray for Apple Silicon workloads
DFKI-NLP
Scripts to run large language models for text generation using vLLM.
diabloneo
My dev scripts and documents about vLLM development
brendanmckeag
Benchmarking scripts to run on RunPod to compare/contrast vLLM vs SGLang on the same prompts.
stt-anth
A script for converting Pixtral from the HF Transformers version to the Mistral/vLLM version
Hal9000AIML
Ubuntu Server edition: automated setup script for Intel Arc Pro B70 GPU LLM inference server with vLLM tensor parallelism. 140 tok/s on 2x B70, 540 tok/s on 4x B70. For Windows, see arc-pro-b70-inference-setup-windows.
ashleykleynhans
OpenAI Compatible API scripts for RunPod vLLM Worker
SURF-ML
vLLM inference scripts with SLURM and Apptainer
vashkelis
vLLM configuration script. Easily find an optimal configuration for your vLLM server and GPU, and evaluate VRAM usage and token throughput.
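For context, a VRAM estimate of this kind is roughly model weights plus KV cache. A back-of-the-envelope sketch of that arithmetic (all model dimensions below are illustrative assumptions, not taken from the repo):

```python
def estimate_vram_gb(
    n_params_b: float,      # parameters, in billions
    bytes_per_param: int,   # 2 for fp16/bf16, 1 for fp8/int8
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: int = 2,      # fp16 KV cache
) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    return (weights + kv_per_token * context_len) / 1e9

# e.g. an 8B Llama-like model in bf16 with a 32k-token KV cache:
print(f"{estimate_vram_gb(8, 2, 32, 8, 128, 32_768):.1f} GB")  # ~20.3 GB
```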
AI-DarwinLabs
🚀 Automated installation script for vLLM on HPC systems with ROCm support, optimized for AMD MI300X GPUs.
belalyahouni
Containerised NVIDIA Dynamo with vLLM Backend. Ready-to-use Docker environment for running NVIDIA’s Dynamo inference framework with vLLM. Includes pre-installed dependencies, service setup (etcd, nats-server), and example scripts for running prompts via batch commands. Streamlines LLM inference without local setup overhead.
minkim26
A simple, configurable Bash script to benchmark and compare inference performance between Llama.cpp and vLLM using the OpenAI-compatible `/v1/chat/completions` API
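The comparison boils down to timing one endpoint against the other. A Python rendering of the same idea (the repo itself is Bash; the ports and `model` field below are placeholders, and vLLM expects the served model's actual name):

```python
import time
import requests

SERVERS = {
    "llama.cpp": "http://localhost:8080/v1/chat/completions",
    "vLLM": "http://localhost:8000/v1/chat/completions",
}

payload = {
    "model": "default",  # placeholder; set per server
    "messages": [{"role": "user", "content": "Explain KV caching briefly."}],
    "max_tokens": 256,
}

for name, url in SERVERS.items():
    start = time.perf_counter()
    resp = requests.post(url, json=payload, timeout=300).json()
    elapsed = time.perf_counter() - start
    # Both servers report token counts in the OpenAI-style `usage` field.
    tokens = resp["usage"]["completion_tokens"]
    print(f"{name}: {tokens} tokens in {elapsed:.2f}s -> {tokens/elapsed:.1f} tok/s")
```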
openshift-psap
Scripts for vllm-model-bash efforts
bosung
Scripts for serving vLLM on multiple nodes
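Multi-node serving in vLLM generally means sharding the model across GPUs and nodes with tensor/pipeline parallelism on top of a Ray cluster. A hedged sketch, assuming a Ray cluster already spans the nodes (`ray start` on each); the model name and parallel sizes are illustrative:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=8,    # shard each layer across 8 GPUs
    pipeline_parallel_size=2,  # split layers across 2 nodes (16 GPUs total)
    distributed_executor_backend="ray",
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```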
g-jaffe
Scripts for programmatic launching of vLLM backends at TACC
teja-rao
Simple test scripts for RL between vLLM and torchtitan
clawd-xsl
SERA (AI2 Open Coding Agent) setup scripts - vLLM deployment for GPU servers
kevinbazira
Standalone LLM inference benchmarking pipelines on AMD GPUs using ROCm, vLLM, MAD, and data visualization scripts.
kunjcr2
This repo contains a fine-tuned LLaMA 3.2B model served using vLLM and Docker. The project includes a custom OpenAI-style API endpoint, benchmarking scripts, performance metrics, and monitoring setup. Designed for low-latency inference and production-ready LLM deployment.
leideng
vLLM and vllm-ascend scripts
tolgaakar
Small repo for storing vLLM server-related scripts and files.
tomasruizt
No description available
XinyiQiao
No description available
galtay
No description available
manekiyong
No description available