Found 118 repositories (showing 30)
AbhishekGit-hash
Batch and streaming data pipelines built on Databricks with PySpark, ingesting Formula 1 racing data from multiple data sources and APIs and modeling it into a star schema for analysis in Power BI.
PrathikVijaykumar
Real-Time Video Stream Analytics: A web application that processes vehicular traffic video frames in real time using PySpark on Databricks, counting vehicles and people while streaming. Uses Snowflake for data storage and Power BI for traffic insights, enhancing urban traffic management.
Databricks PySpark Certification Prep Lab: Build an e-commerce analytics pipeline covering Spark DataFrame API, Structured Streaming, data skew handling with salting, broadcast joins, and Pandas UDFs. Designed for the Databricks Certified Associate Developer for Apache Spark exam.
Azure End-to-End Data Engineering Project | Azure Data Factory | Azure Databricks | Azure SQL DB | PySpark | Big Data. An in-depth data engineering project using tools such as Azure Data Factory, Azure SQL DB, Azure Databricks, Unity Catalog, Delta Live Tables, Spark Streaming, PySpark, Databricks Asset Bundles, GitHub, and more.
Bayzid03
⚡ A curated collection of PySpark Streaming notebooks built in Databricks — designed to showcase real-time data skills in action. 🚀 Ideal for demonstrating hands-on experience with scalable Spark applications.
This Azure Databricks project uses Spark Structured Streaming for data ingestion and PySpark for large-scale transformations. It automates Slowly Changing Dimensions (SCDs) with Delta Live Tables and finalizes with dynamic dimensional modeling for curated, analytics-ready datasets.
DivineSamOfficial
An advanced ETL pipeline using Databricks and ADLS Gen 2 to process traffic and roads data. Features dynamic schema creation, incremental data ingestion with Spark Streaming, comprehensive transformations using PySpark, data governance with Unity Catalog, and automated workflows with CI/CD integration via Azure DevOps.
VickyAugust10
First encounter with PySpark Structured Streaming on Databricks.
This project simulates streaming data through micro-batching in near real time on a single-node cluster, using Databricks, Apache Spark, and Python.
GoncaloCanteiro
Real-time Seattle Fire Department 911 dispatch streaming pipeline using Databricks, PySpark, Delta Lake and medallion architecture.
kelsey-s
Databricks-provided programming challenges that use PySpark to perform ETL, build streaming pipelines, and apply machine learning on distributed datasets supported by Hadoop.
bittumaurya
Built a production-like Medallion Data Pipeline using Databricks, PySpark, and Delta Lake with streaming ingestion, DLT transformations, CDC processing, and dynamic incremental Fact/Dimension framework.
NirmitKhurana10
Production-grade Azure Databricks ETL pipeline processing retail data into a Star Schema. Demonstrates idempotent streaming, advanced PySpark OOP transformations, and declarative data pipelines via Delta Live Tables.
Mamiololo01
A guide to designing and implementing ETL on Azure with the Medallion architecture, using Azure Databricks, Azure Data Factory, PySpark, Spark Streaming, Delta Live Tables, SCDs, and dimensional data modelling.
Real-Time Order Processing Pipeline using Databricks Delta Lake. This project demonstrates an end-to-end real-time streaming ETL pipeline built on Databricks using Structured Streaming, Delta Lake, and PySpark. It ingests simulated order data, processes it in stages (Bronze → Silver → Gold), and writes to Delta tables for downstream analytics.
This project demonstrates a real-time data processing pipeline using PySpark RDD-based Streaming. It reads streaming fruit sales data from a source (CSV/JSON), processes it in Databricks, calculates metrics, and sends the results to Kafka.
rahulkujur11
I built a complete Data Engineering project leveraging powerful tools like Azure Databricks, Delta Live Tables, Spark Streaming, and PySpark. The project dives into real-world use cases, including Dimensional Data Modeling and managing Slowly Changing Dimensions (SCDs) within Databricks.
End-to-end Data Engineering project using Databricks, Azure Data Factory, Event Hubs, PySpark, and Spark Structured Streaming. Covers batch and real-time ingestion, transformations, and pipelines with Spark Declarative Pipelines.
This repository builds a complete Data Engineering project from scratch using Azure Data Factory, Azure SQL DB, Databricks, Unity Catalog, Delta Live Tables, Spark Streaming, PySpark, and Databricks Asset Bundles. It also covers dimensional modeling, SCDs, CI/CD, and real-world pipeline design.
parthani07
Production-grade Medallion Lakehouse using Azure Databricks, DLT, PySpark, Unity Catalog. Implements bronze (ingest), silver (process), and gold (analytics) layers using streaming, SCD, data quality, and star schema for scalable, real-time analytics.
parcheesime
Production-grade Databricks notebooks showcasing hands-on expertise with PySpark, SQL, Delta Lake, AWS S3, and MailChimp API. Projects include real-time data streaming, API data ingestion, and automated backup/recovery systems using boto3 and the Databricks API. Each notebook is fully commented and demonstrates practical data engineering workflows.
AkshatDev2002
Built an end-to-end Azure Databricks Lakehouse solution using PySpark, Delta Tables, and Spark Streaming. Implemented dimensional data models and Slowly Changing Dimensions (Type 1 & 2) to deliver scalable, reliable, and analytics-ready datasets for batch and real-time workloads.
Ratnesh-181998
Master Azure Data Engineering with this basic-to-advanced guide! Covers SQL, PySpark, Kafka, Databricks, Snowflake, and Airflow. Build 15+ industrial projects using Azure (ADF, Synapse, Event Hubs), GCP, and modern table formats (Delta Lake, Iceberg, Hudi). Learn real-time streaming, Medallion architecture, and cloud data warehousing with hands-on labs.
joekibz
May 2023 - PySpark | MLflow - Databricks | Spark Streaming ...
KonuTech
streaming, delta table, external table, json, databricks, PySpark
blackishgray
Mini-project tutorial on PySpark Streaming
Real-time streaming of e-commerce data using Kafka messages, PySpark transformations, and Databricks Delta tables
akshayhc412
No description available
No description available
kashishlalwani13
Real-time fraud detection system using PySpark Structured Streaming on Databricks