Found 118 repositories (showing 30)
This project focuses on building a robust data pipeline that uses Apache Airflow to automate ingesting weather data from the OpenWeather API and loading it into a data warehouse, specifically AWS Redshift.
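A minimal sketch of what such a DAG can look like, using the Airflow 2.x TaskFlow API; the city, API key, and print-only load step are placeholder assumptions, and a real load would typically stage the rows to S3 and COPY them into Redshift:

```python
from datetime import datetime

import requests
from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def weather_to_redshift():
    @task
    def extract() -> dict:
        # Current-weather endpoint; city and API key are placeholders.
        resp = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": "London", "appid": "YOUR_API_KEY", "units": "metric"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    @task
    def transform(payload: dict) -> dict:
        # Keep only the fields the warehouse table needs.
        return {
            "city": payload["name"],
            "temp_c": payload["main"]["temp"],
            "humidity": payload["main"]["humidity"],
            "observed_at": payload["dt"],
        }

    @task
    def load(row: dict) -> None:
        # Stub: a real pipeline would stage to S3 and COPY into Redshift.
        print(row)

    load(transform(extract()))


weather_to_redshift()
```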
CityOfLosAngeles
A shared pipeline for building ETLs and batch jobs that we run at the City of LA for Data Science Projects. Built on Apache Airflow & Civis Platform
No description available
Rafavermar
The SnowflakeAirflowDbtCosmo project demonstrates integrating Airflow, DBT, and Snowflake with Snowpark for advanced data analysis. Generated with astro dev init using the Astronomer CLI, it showcases how to run Apache Airflow locally and build both simple and advanced data pipelines involving Snowflake.
ManuelaMayorga
Workshop on building an ETL pipeline with Apache Airflow using data from Spotify (CSV) and Grammys (database). Airflow reads data from both sources, performs transformations, merges the datasets, and loads them into Google Drive. Visualization is done in Power BI.
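The transform-and-merge step of a workshop like this might look like the following pandas sketch; the file path, connection string, table name, and join key are all illustrative assumptions, and a local CSV stands in for the Google Drive upload:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical paths, table, and join key for illustration.
spotify = pd.read_csv("data/spotify_dataset.csv")
engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/grammys")
grammys = pd.read_sql_table("grammy_awards", engine)

# Light cleanup, then merge the two sources on artist name.
spotify["artist"] = spotify["artist"].str.strip().str.lower()
grammys["artist"] = grammys["artist"].str.strip().str.lower()
merged = spotify.merge(grammys, on="artist", how="left")

# The workshop uploads the result to Google Drive; a local CSV
# keeps this sketch self-contained.
merged.to_csv("output/merged_dataset.csv", index=False)
```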
tranhuy25
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline. It covers each stage from data ingestion to processing and finally to storage, utilizing a robust tech stack that includes Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra.
tuanit03
Building Data Pipelines with Apache Airflow and MongoDB
shubhwip
Data engineering projects for beginners, including PostgreSQL/Cassandra data modelling, building data warehouses with AWS Redshift, building data lakes with Apache Spark, and automating data pipelines with Apache Airflow.
Building Data Pipelines with Apache Airflow
mkmasudrana806
Building a simple data pipeline using PySpark and Apache Airflow with a PostgreSQL database: ETL with Airflow orchestration.
A data engineering project demonstrating the use of Apache Airflow for building an ETL pipeline. The project fetches top movie data from an API, transforms it for insights, and loads the data into a PostgreSQL database, all running within a Dockerized environment.
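The load stage of such a DAG could use Airflow's PostgresHook, as in this sketch; the connection id, table, and row fields are hypothetical and assume the table already exists:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook


def load_top_movies(movies: list[dict]) -> None:
    """Insert transformed movie rows into Postgres.

    Assumes a connection named 'movies_db' is configured in Airflow
    and a 'top_movies' table already exists in the target database.
    """
    hook = PostgresHook(postgres_conn_id="movies_db")
    hook.insert_rows(
        table="top_movies",
        rows=[(m["title"], m["rating"], m["year"]) for m in movies],
        target_fields=["title", "rating", "year"],
    )
```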
sjliew
Building data pipelines using Youtube API, Amazon EC2, Apache Airflow, Amazon S3
Building an ETL pipeline that scrapes financial data from a forex website, orchestrated with Apache Airflow.
supakunz
A ready-to-use Docker-based template for data engineering projects, featuring a complete stack with Apache Airflow, Spark, and MinIO for building scalable data pipelines.
AbdullahMahmoud23
End-to-end Data Engineering Capstone Project building a complete data pipeline. Features operational databases (MySQL), analytical data warehousing (PostgreSQL), Python-based ETL scripts, Apache Airflow orchestration, and BI dashboarding (Looker Studio).
Building a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. We'll utilize a powerful stack of tools and technologies, including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra—all neatly containerized using Docker.
aakashsyadav1999
Building a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. We'll utilize a powerful stack of tools and technologies, including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra—all neatly containerized using Docker.
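The ingestion-to-Kafka hop in pipelines like these might look like this kafka-python sketch; the broker address, topic name, and sample record are assumptions for a local Docker setup:

```python
import json

from kafka import KafkaProducer

# Broker address and topic are assumptions for a local Docker setup.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def publish(record: dict) -> None:
    # Spark Structured Streaming would consume this topic downstream
    # and write the results into Cassandra.
    producer.send("ingested_data", value=record)
    producer.flush()


publish({"id": 1, "event": "example"})
```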
satyam671
An End-to-End Goodreads ETL Pipeline: Leveraging Real-Time Data Collection, Transformation, and Integration with S3, Apache Spark, Airflow, and Redshift for Building a Robust Data Lake, Warehouse, and Analytics Platform.
cidraljunior
Indicium Code Challenge solution for building a data pipeline using Apache Airflow and Docker. The pipeline extracts data from Postgres and CSV files, processes and transforms it with Python, and loads the final results into a MongoDB database. The project demonstrates workflow orchestration and dependency management with Airflow DAGs.
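A condensed sketch of that extract-transform-load flow with pandas and pymongo; all connection strings, table, file, and collection names are illustrative:

```python
import pandas as pd
from pymongo import MongoClient
from sqlalchemy import create_engine

# Connection strings, table, and collection names are illustrative.
engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/source_db")
orders = pd.read_sql_table("orders", engine)
details = pd.read_csv("data/order_details.csv")

# Transform: join the two sources on a shared key.
final = orders.merge(details, on="order_id", how="inner")

# Load the result into MongoDB as one document per row.
client = MongoClient("mongodb://localhost:27017")
client["warehouse"]["final_results"].insert_many(final.to_dict("records"))
```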
caio-moliveira
This repository implements a fully automated data pipeline integrating AWS S3, Snowflake, DBT, Apache Airflow, and Streamlit. It handles data ingestion, transformation, and visualization, providing a streamlined solution for building and analyzing datasets.
YamanAlBochi
In this project, I'll be building a real-time data streaming pipeline, covering each phase from data ingestion to processing and finally storage. We'll utilize a powerful stack of tools and technologies, including Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra—all neatly containerised using Docker.
This project involves building a pipeline for a dataset: the data is transformed with scripts, then stored and queried in a Postgres database (via pgcli and pgAdmin). The pipeline is then Dockerized, and Apache Airflow is used to monitor the data workflow in AWS cloud storage.
noamelli
Building an ETL pipeline and scheduling it using Apache Airflow in Docker. The SQL scripts are written for PostgreSQL and the tables are inspected in DBeaver. The dashboard was developed in Tableau. The data is fake :)
Sahulblr
This project involves building an end-to-end ETL (Extract, Transform, Load) data pipeline on Google Cloud Platform (GCP) using Cloud Data Fusion and Apache Airflow. Learn how to craft a seamless process for extracting, transforming, and loading data into BigQuery, then visualize it effortlessly in Looker Studio.
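The Airflow-orchestrated BigQuery step of such a pipeline might resemble this sketch using the Google provider's BigQueryInsertJobOperator; the dataset and table names are placeholders, and the upstream Data Fusion stage is assumed to have already landed the raw data:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

# Dataset and table names are placeholders; the Data Fusion stage that
# produces raw_events is assumed to run upstream of this DAG.
with DAG(
    dag_id="bq_transform",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    aggregate = BigQueryInsertJobOperator(
        task_id="aggregate_events",
        configuration={
            "query": {
                "query": """
                    CREATE OR REPLACE TABLE analytics.daily_summary AS
                    SELECT event_date, COUNT(*) AS events
                    FROM analytics.raw_events
                    GROUP BY event_date
                """,
                "useLegacySql": False,
            }
        },
    )
```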
No description available
Nagamohan1971
Building Data Pipelines with Apache Airflow
MaulikDave9
Building a data pipeline using Apache Airflow
philbier
Building a data pipeline using Apache Airflow.
armandmutia
Building ETL and Data Pipelines with Apache Airflow