High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.
Stars: 5.4k
Forks: 436
Watchers: 5.4k
Open Issues: 336
Commit counts: 797, 556, 450, 330, 230, 165, 147, 139, 139, 102
9a58f56 feat: add image_hash() for image deduplication (#6485)
96a908c fix(scan): skip getting bytes when range start equals end in daft async reader (#6602)
539ff3c ci(deps): fix CI failures from dependabot bump #6570 (#6596)
7086ea0 feat(dataframe): add var() method to DataFrame and GroupedDataFrame (#6584)
64fe99f feat: checkpoint based on distributed key-existence filter (#5931)
fef13df feat(distributed): make flotilla worker actor startup timeout configurable (#6592)
1053db3 chore!: Remove unused `max_task_backlog` parameter (#6591)
f404869 fix(io): retry transient errors on initial GET request (#6544)
d7c2fbb fix(dashboard): use smart per-node stats aggregation for distributed execution (#6574)
a1bff82 fix(io): handle schema-evolved Iceberg columns in Parquet predicate pushdown (#6551)
c5fc110 refactor(distributed): unify repartition exchange write flow across ray and flight (#6499)
99a8fbf chore(dashboard): add ?debug query param for SSE event console logging (#6577)
82068aa fix(dashboard): prevent flotilla workers from sending spurious lifecycle events (#6573)