Found 4 repositories (showing 4)
microsoft
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLMs' inference, attention is computed with approximate, dynamic sparsity, which reduces pre-filling inference latency by up to 10x on an A100 while maintaining accuracy.
amanb2000
Minimal LLM inference servers for researchers
TomtheCodeBot
No description available
WZRP
Stable Diffusion Minimal Inference
All 4 repositories loaded