Kosmos-2.5 is a cutting-edge multimodal LLM (MLLM) specializing in image OCR. However, its strict software requirements and Python-script-based invocation make it difficult to use in application development. This repository containerizes the model and exposes it through an API, making it much easier to use.
Stars: 68
Forks: 6
Watchers: 68
Open Issues: 1
20 commits
bfb1461 Trimmed and cleaned up dockerfiles. Added more to the option-2 dockerfile in the README.
9ac1b02 Modified dockerfiles to clone the flash-attn repo and download the model checkpoint so the user need not manually do so. Amended and simplified README instructions accordingly.
9774593 Modified dockerfiles to clone the flash-attn repo and download the model checkpoint so the user need not manually do so. Amended and simplified README instructions accordingly.
dd765be Modified dockerfiles to clone the flash-attn repo and download the model checkpoint so the user need not manually do so. Amended and simplified README instructions accordingly.
c04bf13 Added instructions to restart WSL when making changes to .wslconfig for Topic 8 - Option 2
ad0d3fe Added clarification to clone the model checkpoint to Topic 5 - Building the Docker Image
4a3cd4f Added clickable Table of Contents and 'Back to ToC' links at section ends