[EMNLP'23, ACL'24] To speed up LLM inference and sharpen an LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
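For illustration, here is a minimal usage sketch of the compression step, assuming the `PromptCompressor` class and the LLMLingua-2 checkpoint name documented in the repo; the prompt text, `rate`, and `force_tokens` values are illustrative assumptions, not a definitive quick start:

```python
# Minimal sketch, assuming the PromptCompressor API from the llmlingua
# package and an LLMLingua-2 model checkpoint (illustrative parameters).
from llmlingua import PromptCompressor

# LLMLingua-2 uses a small token-classification model to score token
# importance, then drops low-information tokens from the prompt.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

long_prompt = "..."  # illustrative: a long context to shrink before the LLM call

result = compressor.compress_prompt(
    long_prompt,
    rate=0.33,                 # keep roughly one third of the tokens
    force_tokens=["\n", "?"],  # tokens that must survive compression
)

print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```

The compressed string can then be passed to any downstream LLM call in place of the original prompt, trading a cheap compression pass for a much shorter input.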
Stars: 6.0k · Forks: 358 · Watchers: 6.0k · Open Issues: 110
Top contributors by commit count: 57, 5, 5, 3, 2, 2, 1, 1, 1, 1
Recent commits:
- e4e172a: Merge pull request #211 from microsoft/hjiang/support_bf16
- 9b357b5: Merge pull request #209 from jue-zhang/lingua2/tinybert_mobilebert
- b3e5cec: Feature(LLMLingua): add RetrievalAttention, SCBench (#205)
- 4411201: Feature(LLMLingua-2): update the meetingbank datasets (#160)