Found 9 repositories (showing 9)
deepspeedai
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. (A minimal usage sketch follows this list.)
hzg0601
dev for deepspeed-mii
heiko-hotz
Experiments with DeepSpeed MII library
heiko-hotz
Testing DeepSpeed MII
PenguinQwQ
DeepSpeed-MII for UniSpar Project
sfc-gh-vichan
test
tonyzhao-jt
Modification of MII
slinusc
Launch your own high-performance DeepSpeed-MII server for seamless local LLM deployment. This repository provides a Dockerized solution to serve Hugging Face models (e.g., Mistral-7B) with an OpenAI-compatible API, enabling GPU-accelerated, low-latency inference out of the box. (A client sketch follows this list.)
henryekeocha
Reproducible benchmarking suite for LLM inference stacks (vLLM, TGI, llama.cpp, Ollama, DeepSpeed-MII) on Kubernetes. Measures throughput, latency, GPU utilisation, and cost-per-token under production-grade K8s conditions including auto-scaling and pod scheduling overhead. (The cost-per-token arithmetic is sketched after this list.)
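
For orientation, the deepspeedai entry above is the MII library itself. A minimal, non-persistent inference sketch, assuming the post-FastGen `mii.pipeline` API; the model choice and generation parameters are illustrative, not taken from any of the repositories listed:

```python
# Minimal sketch of non-persistent DeepSpeed-MII inference.
# Assumes the post-FastGen `mii.pipeline` API; model id is illustrative.
import mii

# Load a Hugging Face model into an MII inference pipeline.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run batched generation; max_new_tokens caps the generated length.
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
for r in responses:
    print(r.generated_text)
```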
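The slinusc entry advertises an OpenAI-compatible API, so a standard OpenAI client should be able to query such a server once it is running. A hedged sketch, assuming the server is reachable at localhost:8000 and serves the model id shown; the address, port, and model id are assumptions, not details from the repository:

```python
# Hedged sketch: querying an OpenAI-compatible DeepSpeed-MII server.
# The base_url, port, and model id are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",                 # local servers typically ignore the key
)

completion = client.completions.create(
    model="mistralai/Mistral-7B-v0.1",    # assumed model id
    prompt="Explain DeepSpeed-MII in one sentence.",
    max_tokens=64,
)
print(completion.choices[0].text)
```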
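Finally, the henryekeocha entry reports cost-per-token, which reduces to simple arithmetic over GPU price and measured throughput. A sketch with made-up example numbers:

```python
# Hedged sketch of the cost-per-token arithmetic such a benchmark might report.
# Both figures below are made-up example numbers, not measured results.
gpu_cost_per_hour = 2.50          # assumed on-demand price for one GPU, USD
throughput_tokens_per_sec = 1500  # assumed measured generation throughput

cost_per_token = gpu_cost_per_hour / 3600 / throughput_tokens_per_sec
print(f"${cost_per_token:.2e} per token")  # ~4.6e-07 USD/token with these numbers
```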