This project develops a high-performance KV-cache management framework for multi-document retrieval-augmented generation (RAG) tasks. It aims to reduce time-per-output-token (TPOT) and improve throughput through adaptive cache scheduling, GPU–CPU offloading, and reuse of cross-document attention states.
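To make the offloading and reuse ideas concrete, here is a minimal sketch of a two-tier cache keyed by document. It is not the project's implementation: the class `TwoTierKVCache`, its `get` method, the `compute_kv` callback, and the LRU eviction policy are all assumptions chosen for illustration, and plain Python dictionaries stand in for GPU and CPU memory. It models per-document KV reuse only, not cross-document attention states.

```python
from collections import OrderedDict


class TwoTierKVCache:
    """Toy KV cache: a small fast tier ("gpu") backed by a large slow tier ("cpu").

    Hypothetical illustration only; real systems hold tensors and copy them
    between devices instead of moving Python objects between dicts.
    """

    def __init__(self, gpu_capacity):
        if gpu_capacity < 1:
            raise ValueError("gpu_capacity must be at least 1")
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # doc_id -> KV state, least recently used first
        self.cpu = {}             # offloaded KV states
        self.stats = {"gpu_hits": 0, "cpu_hits": 0, "misses": 0}

    def _admit(self, doc_id, kv):
        # Place the entry in the fast tier, then offload LRU entries if over capacity.
        self.gpu[doc_id] = kv
        self.gpu.move_to_end(doc_id)
        while len(self.gpu) > self.gpu_capacity:
            victim_id, victim_kv = self.gpu.popitem(last=False)
            self.cpu[victim_id] = victim_kv

    def get(self, doc_id, compute_kv):
        # Fast-tier hit: reuse the state and mark it most recently used.
        if doc_id in self.gpu:
            self.stats["gpu_hits"] += 1
            self.gpu.move_to_end(doc_id)
            return self.gpu[doc_id]
        # Slow-tier hit: promote the state instead of recomputing it.
        if doc_id in self.cpu:
            self.stats["cpu_hits"] += 1
            kv = self.cpu.pop(doc_id)
        else:
            # Miss in both tiers: compute the state (prefill) once.
            self.stats["misses"] += 1
            kv = compute_kv(doc_id)
        self._admit(doc_id, kv)
        return kv


cache = TwoTierKVCache(gpu_capacity=2)
for doc in ["a", "b", "a", "c", "b"]:
    cache.get(doc, lambda d: f"kv({d})")
print(cache.stats)  # {'gpu_hits': 1, 'cpu_hits': 1, 'misses': 3}
```

In this trace, document "b" is offloaded when "c" arrives and is later promoted back from the slow tier rather than recomputed, which is the saving that offloading is meant to provide. An adaptive scheduler would replace the fixed LRU rule with a policy informed by the workload.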
Stars: 4
Forks: 3
Watchers: 4
Open Issues: 1