The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Stars: 1.9k
Forks: 286
Watchers: 1.9k
Open Issues: 181
Reshuffle the data after reading from cache if shuffle_rows is true (#817) [3cae688]
Add support for pre-converting from pyarrow to numpy at the time of reading (#815) [ec31c47]
Multiple incremental improvements to local disk cache (#814) [64cc551]
Update unittest.yml to run tests only on the latest versions (#809) [f1328a2]
Make `make_spark_converter` supports creating converter from a saved dataframe path (#787) [d337fee]