LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
Stars: 1.1k
Forks: 174
Watchers: 1.1k
Open Issues: 51
2ae956c: Handle mtp `prefix/files` in `out_of_model_tensors` (#2677)
c8b0140: Extend OpenVINO's GPTQ patcher to understand GPTQModel new kernels. (#2675)
defffb0: Refactor input capture flow into BaseQModel and model-specific QModels (#2666)
Commits: 1.0k, 609, 409, 264, 161, 126, 93, 54, 48, 40