This PyTorch-based image captioning model uses a ResNet-50 encoder and a Transformer decoder to generate descriptive captions for Flickr8k images. Features include data augmentation (image transforms, synonym replacement), training with AdamW and early stopping, and inference via greedy decoding, beam search (k = 3, 5, or 10), or nucleus sampling.
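To illustrate how greedy decoding and nucleus sampling differ when choosing the next caption token, here is a minimal sketch in plain Python. It is not taken from the repository: the function names (`greedy_pick`, `nucleus_candidates`, `nucleus_pick`) and the `top_p` default are assumptions made for illustration, and the real model presumably works on PyTorch tensors rather than Python lists.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def nucleus_candidates(logits, top_p=0.9):
    """Return (token_id, renormalized_prob) pairs for the smallest set of
    tokens whose cumulative probability reaches top_p (hypothetical helper)."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return [(i, probs[i] / cumulative) for i in kept]


def nucleus_pick(logits, top_p=0.9, rng=random):
    """Nucleus (top-p) sampling: sample one token id from the kept set."""
    candidates = nucleus_candidates(logits, top_p)
    ids = [i for i, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; the scores are made up for the example.
logits = [2.0, 1.0, 0.1, -1.0]
print(greedy_pick(logits))
print(nucleus_candidates(logits, top_p=0.8))
print(nucleus_pick(logits, top_p=0.8, rng=random.Random(0)))
```

Greedy decoding is deterministic, while nucleus sampling trades some determinism for variety by sampling only from the high-probability head of the distribution. Beam search, the third option listed above, instead keeps the k best partial captions at each step and is not shown here.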
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 0
Commits: 14