[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up inference for long-context LLMs, the attention is computed approximately with dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
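The description points at dynamic sparse attention for the pre-filling stage. As a rough, hypothetical sketch of the general idea only (not the repository's actual method or kernels; the function name and the `block_size` and `keep_ratio` parameters are illustrative, and causal masking is omitted for brevity), a block-sparse pattern can be estimated cheaply and then only the selected blocks computed exactly:

```python
# Hypothetical sketch of dynamic block-sparse attention -- NOT the
# repository's implementation, which uses optimized GPU kernels.
import torch
import torch.nn.functional as F

def dynamic_block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.1):
    """q, k, v: (seq_len, head_dim); assumes seq_len % block_size == 0."""
    seq_len, head_dim = q.shape
    nb = seq_len // block_size

    # 1) Cheap approximation: mean-pool each block, score block pairs.
    qb = q.view(nb, block_size, head_dim).mean(dim=1)  # (nb, head_dim)
    kb = k.view(nb, block_size, head_dim).mean(dim=1)  # (nb, head_dim)
    block_scores = qb @ kb.T / head_dim ** 0.5         # (nb, nb)

    # 2) Dynamically keep only the top-k key blocks per query block.
    k_keep = max(1, int(nb * keep_ratio))
    top_blocks = block_scores.topk(k_keep, dim=-1).indices  # (nb, k_keep)

    # 3) Exact attention restricted to the selected blocks.
    out = torch.zeros_like(q)
    for i in range(nb):
        q_blk = q[i * block_size:(i + 1) * block_size]
        idx = top_blocks[i].tolist()
        k_sel = torch.cat([k[j * block_size:(j + 1) * block_size] for j in idx])
        v_sel = torch.cat([v[j * block_size:(j + 1) * block_size] for j in idx])
        attn = F.softmax(q_blk @ k_sel.T / head_dim ** 0.5, dim=-1)
        out[i * block_size:(i + 1) * block_size] = attn @ v_sel
    return out

# Example: a 4k-token prefill for a single head.
q = torch.randn(4096, 64); k = torch.randn(4096, 64); v = torch.randn(4096, 64)
out = dynamic_block_sparse_attention(q, k, v)
```

With `keep_ratio=0.1`, each query block attends to roughly 10% of the key blocks, which is where this style of approach gets its pre-filling speedup.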
Stars: 1.2k
Forks: 78
Watchers: 1.2k
Open Issues: 90
Overall repository health assessment
No package.json found; this might not be a Node.js project.
Top contributors by commit count: 119, 37, 25, 5, 3, 2, 1, 1, 1, 1.