Found 11 repositories (showing 11)
rom1504
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.
Veldrovive
Reorders the embeddings generated by CLIP to be in line with the webdatasets generated by img2dataset.
nopperl
Training CLIP models on Data from Scientific Papers
boomb0om
img2dataset with multiprocessing + asyncio
reonokiy
No description available
lucasjinreal
Customized Image2Dataset tool called `mllmdata` that quickly converts Parquet URL lists into a local dataset.
likelyzhao
No description available
marianna13
No description available
Crops images at raw resolution, then resizes them to the specified configurations.
man-8out
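Uses the multimodal capabilities of Doubao-Seed-2.0-pro to tell whether an image is a normal scene image or a single image of white paper. For a normal scene image, it runs image recognition, describes the image content in text, and saves the result as a .txt file. to_json then assembles the .txt files and their corresponding images into a JSONL dataset.
A simple toolkit for converting data produced by img2dataset from Parquet files into a Hugging Face dataset.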