Found 102,442 repositories (showing 30)
apache
Apache Superset is a Data Visualization and Data Exploration Platform
GokuMohandas
Learn how to develop, deploy and iterate on production-grade ML applications.
apache
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
DataExpert-io
This is a repo with links to everything you'd ever want to learn about data engineering
DataTalksClub
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
eugeneyan
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
mlflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
PrefectHQ
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
airbytehq
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Avaiga
Turns Data and AI algorithms into production-ready web applications in no time.
argoproj
Workflow Engine for Kubernetes
dagster-io
An orchestration platform for the development, production, and observation of data assets.
andkret
The Data Engineering Cookbook
datastacktv
Roadmap to becoming a data engineer in 2021
great-expectations
Always know what to expect from your data.
kedro-org
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
xonsh
🐚 Python-powered shell. Full-featured, cross-platform and AI-friendly.
risingwavelabs
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
redpanda-data
Fancy stream processing made operationally mundane
igorbarinov
A curated list of data engineering tools for software developers
growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
feast-dev
The Open Source Feature Store for AI/ML
cocoindex-io
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
cloudquery
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
evidence-dev
Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
Eventual-Inc
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
treeverse
lakeFS - Data version control for your data lake | Git for data
dlt-hub
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
rudderlabs
Privacy and Security focused Segment-alternative, in Golang and React