Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
Stars: 8.7k · Forks: 1.4k · Watchers: 8.7k · Open Issues: 1.5k
Overall repository health assessment: no package.json found (this is likely not a Node.js project).
Top 10 contributors by commit count (contributor names not captured): 366, 335, 251, 228, 162, 134, 126, 126, 119, 110.
Recent commits:
- 13d05bd — docs: update recommended Ubuntu version from 24.10 to 25.04 with warning (#13313)
- 6d89c82 — Fix PSIRT Vulnerability - Dependency Confusion in oneccl_bind_pt package (#13305)
- 25e1709 — Avoid errors caused by a Transformers version that is too new (#13291)
- 891e1f5 — [Doc] Add note about avoiding sourcing oneAPI for flashmoe and llama.cpp portable zip (#13274)
- 951c237 — Update quickstart md related to llama.cpp/ollama (#13265)