Found 4 repositories (showing 4)
CityOfLosAngeles
A shared pipeline for building ETLs and batch jobs that we run at the City of LA for data science projects. Built on Apache Airflow & Civis Platform.
velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docker Compose.
multijump
Production-style Airflow DAG that orchestrates a daily batch ETL job, using PySpark and COVID-19 data as an example.
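For context, a daily batch ETL of this shape chains extract, transform, and load steps in order. Below is a minimal, library-free Python sketch of that pattern; the function names, sample records, and in-memory "warehouse" dict are illustrative assumptions, not code from the repository:

```python
from datetime import date

def extract(day):
    # Stand-in for reading one day's raw records (e.g. a COVID-19 case feed).
    return [
        {"region": "A", "cases": 12},
        {"region": "B", "cases": 7},
        {"region": "A", "cases": 3},
    ]

def transform(records):
    # Batch transform step: aggregate case counts per region.
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0) + r["cases"]
    return totals

def load(totals, warehouse):
    # Stand-in for writing the aggregated batch into a warehouse table.
    warehouse.update(totals)
    return warehouse

def run_daily_job(day, warehouse):
    # One "DAG run": extract -> transform -> load for a single day.
    return load(transform(extract(day)), warehouse)

warehouse = {}
run_daily_job(date(2020, 4, 1), warehouse)
print(warehouse)  # {'A': 15, 'B': 7}
```

In an actual Airflow project these three steps would typically be separate tasks (with the transform delegated to a PySpark job), scheduled daily; the dependency chain is the same idea.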
paulopottermarchi
Showcases end-to-end data workflows (web scraping, batch and streaming ETL, orchestration with Airflow, data modeling in SQL and NoSQL, Spark jobs, and loading into data warehouses), implemented in Python and related tools.
All 4 repositories loaded