Multimodal image + text captioning for 416k figures from arXiv. Uses CLIP + SciBERT + GPT-2 in an encoder-decoder architecture. CS224N final project.
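One way to picture the encoder-decoder wiring named above is the shape-level sketch below. It is not taken from the repository. It assumes the image encoder yields one 512-dimensional vector (the CLIP ViT-B/32 embedding size), the text encoder yields 768-dimensional token vectors (SciBERT's hidden size), and the decoder works in 768 dimensions (GPT-2 small). The function names, the random "projection" matrices, and the placeholder embeddings are all hypothetical stand-ins for learned layers and real encoder outputs.

```python
import random

# Assumed dimensions (not taken from the repository):
CLIP_DIM = 512      # CLIP ViT-B/32 image embedding size
SCIBERT_DIM = 768   # SciBERT (BERT-base) hidden size
GPT2_DIM = 768      # GPT-2 small hidden size


def make_projection(in_dim, out_dim, rng):
    """Random in_dim x out_dim matrix standing in for a learned linear layer."""
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]


def project(vector, matrix):
    """Multiply a vector (length in_dim) by a matrix (in_dim x out_dim)."""
    out_dim = len(matrix[0])
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(out_dim)
    ]


def build_decoder_prefix(image_embedding, text_token_embeddings, w_image, w_text):
    """Map both encoder outputs into the decoder's space and concatenate them
    into one sequence that a decoder could condition on."""
    prefix = [project(image_embedding, w_image)]
    prefix.extend(project(tok, w_text) for tok in text_token_embeddings)
    return prefix


rng = random.Random(0)
w_image = make_projection(CLIP_DIM, GPT2_DIM, rng)
w_text = make_projection(SCIBERT_DIM, GPT2_DIM, rng)

# Placeholder encoder outputs: one image vector and four text-token vectors.
image_embedding = [rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)]
text_token_embeddings = [
    [rng.gauss(0.0, 1.0) for _ in range(SCIBERT_DIM)] for _ in range(4)
]

prefix = build_decoder_prefix(image_embedding, text_token_embeddings, w_image, w_text)
print(len(prefix), len(prefix[0]))
```

The resulting sequence has one position for the figure plus one per text token, each in the decoder's hidden size. Whether the project feeds the decoder through a prefix like this or through cross-attention is not stated in the description.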
Stars: 1
Forks: 0
Watchers: 1
Open Issues: 0
56 commits
added more to loader.ipynb, specifically the inference of the model
a82895c