Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
Stars
9.2k
Forks
814
Watchers
9.2k
Open Issues
64
Overall repository health assessment
No package.json found
This might not be a Node.js project
267
commits
170
commits
167
commits
137
commits
105
commits
96
commits
92
commits
90
commits
61
commits
51
commits
Fix #4597: [Bug] v2.0.0 Docker image: ImportError (circular import) a... (#4757)
2418604View on GitHubENH: auto-detect PyTorch CUDA version for virtual environment setup (#4766)
b2f52eaView on GitHubENH: update 2 models JSON ("Qwen3-ASR-0.6B", "Qwen3-ASR-1.7B") (#4765)
d94c733View on GitHubfix: add variable to control template for Qwen3 Reranker Family (#4752)
69705c2View on GitHubbld: Fix the front-end UI access issue for aarch64 image (#4749)
0a336acView on GitHubbld: Fix the front-end UI access issue for aarch64 image (#4743)
0a7d4a4View on GitHubfix: use constant-time comparison for auth credentials (CWE-208) (#4734)
e6cfebaView on GitHub