Found 513 repositories (showing 30)
DataWithBaraa
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
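A minimal sketch of the Bronze-to-Silver hop the Medallion pattern implies (the table names, source path, and columns below are hypothetical, not taken from the repo):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

    # Bronze: land the raw data as-is (hypothetical source path)
    raw = spark.read.json("/mnt/raw/orders/")
    raw.write.format("delta").mode("append").saveAsTable("bronze.orders")

    # Silver: deduplicated and typed
    (spark.table("bronze.orders")
        .dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .write.format("delta").mode("overwrite").saveAsTable("silver.orders"))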
KamilKolanowski
Data engineering project using Databricks PySpark and Spark SQL to analyse data from the Spotify API and present it as a Power BI report.
SimpleDataLabsInc
Prophecy-built-tool (PBT) allows you to quickly build projects generated by Prophecy (your standard Spark Scala and PySpark pipelines) to integrate them with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).
DataThirstLtd
A guide to building good data pipelines with Databricks Connect using best practices.
renardeinside
Writing PySpark logs in Apache Spark and Databricks
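One common way to do this (a sketch under assumptions, not necessarily the repo's approach) is to route Python-side messages through the driver's log4j logger so they land alongside Spark's own logs:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Reach the JVM's log4j through the Py4J gateway; "my_app" is an arbitrary logger name
    log4j = spark._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("my_app")
    logger.info("pipeline started")
    logger.warn("row count lower than expected")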
AnilSener
I developed this case study in only 7 days with PySpark (Spark 1.6.0), SQL & MLlib, using a Databricks cluster and AWS. 90% AUC is achieved (without involving the Trip Matching / Repeated Trips feature) with Random Forest; ensembles of RF, GBT, and Logistic Regression plus outlier elimination could improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset for the full-execution version. There is also a PPT presenting my outputs from the test execution. The full-data-execution code is a slightly different, more production-ready version; I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance in it.
pathfinder-analytics-uk
No description available
Used the Alternating Least Squares (ALS) method to build a recommender system in Spark [PySpark, Databricks, Python, Machine Learning]
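For reference, a minimal ALS sketch in pyspark.ml (the ratings DataFrame and its columns are hypothetical, not the repo's code):

    from pyspark.ml.recommendation import ALS
    from pyspark.ml.evaluation import RegressionEvaluator

    # Hypothetical ratings DataFrame with columns: userId, movieId, rating
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)

    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
              coldStartStrategy="drop")  # drop NaN predictions for unseen users/items
    model = als.fit(train)

    preds = model.transform(test)
    rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                               predictionCol="prediction").evaluate(preds)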
schammass-zz
No description available
ajinkyahk
SparkByExamples tutorial with Databricks workspace.
Hamidreza-Ramezani
A basic Spark project in the Databricks environment using PySpark
Databricks PySpark Certification Prep Lab: Build an e-commerce analytics pipeline covering Spark DataFrame API, Structured Streaming, data skew handling with salting, broadcast joins, and Pandas UDFs. Designed for the Databricks Certified Associate Developer for Apache Spark exam.
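Two of the techniques the lab names, sketched against hypothetical facts / dim_products DataFrames (an illustration, not the lab's actual code):

    from pyspark.sql import functions as F

    # Broadcast join: ship the small dimension table to every executor,
    # avoiding a shuffle of the large fact table
    enriched = facts.join(F.broadcast(dim_products), "product_id")

    # Salting: spread a hot key across N buckets, then aggregate in two stages
    N = 8
    totals = (facts
        .withColumn("salt", (F.rand() * N).cast("int"))
        .groupBy("customer_id", "salt").agg(F.sum("amount").alias("partial"))
        .groupBy("customer_id").agg(F.sum("partial").alias("total_amount")))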
Association mining for products using the FP-Growth algorithm in Spark [PySpark, Databricks, Association Mining, Frequent Pattern Mining tree algorithm]
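FP-Growth is available directly in pyspark.ml; a minimal sketch (the baskets DataFrame is hypothetical):

    from pyspark.ml.fpm import FPGrowth

    # Hypothetical baskets DataFrame: one row per transaction,
    # "items" is an array column of product ids
    fp = FPGrowth(itemsCol="items", minSupport=0.01, minConfidence=0.3)
    model = fp.fit(baskets)

    model.freqItemsets.show()      # frequent itemsets with counts
    model.associationRules.show()  # antecedent -> consequent rules with confidence and lift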
shreyashji
My Python, Spark, PySpark, and Scala notebook logic that I solve or encounter on a daily basis; includes optimization techniques for big data processing and real-time scenarios.
iBalajiShanmugam
The "Adventure Works - Spark" repository is a collection of code and resources for analyzing the Adventure Works dataset using Databricks, PySpark, Delta Lake, and Python. It provides examples and tools for ingesting, processing, and analyzing the data to gain insights
No description available
Naseer5196
Data Engineer job description (an Indeed posting from Vedhas Technology Solutions Pvt Ltd, Hyderabad; 2-4 years' experience, work from office). Requirements: experience with big data components such as Spark, Kafka, Scala/PySpark, SQL, DataFrames, and Airflow, preferably implemented on Databricks; Databricks integration with other cloud services, e.g. Azure (Data Lake, Data Factory, Synapse, Azure DevOps) or AWS (S3, Glue, Athena, Redshift, Lambda, CloudWatch); reading, processing, and writing data in various file formats using Spark and Databricks; knowledge of Databricks job-optimization processes and standards. Good to have: Databricks Delta Tables and MLflow; AWS/Azure/Databricks certifications; strong data-warehousing experience; a good understanding of database schema design, optimization, and scalability. Key skills: Databricks, Data Lake, Kafka, Azure DevOps, SQL. Full-time, permanent; B.Tech/B.E. in any specialization.
PabitraKumarGhorai
PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing. If you're already familiar with Python and libraries such as Pandas, then PySpark is a good language to learn for creating more scalable analyses and pipelines.
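A taste of what that looks like (the file path and column names below are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("intro").getOrCreate()

    df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)

    # The DataFrame API mirrors many Pandas ideas, but executes lazily and distributed
    (df.filter(F.col("amount") > 0)
       .groupBy("category")
       .agg(F.avg("amount").alias("avg_amount"))
       .show())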
Data Engineering with Databricks Study Materials
M0hamedIbrahim1
A sales analysis project using PySpark that provides insights into customer behavior, sales trends, and product performance. These insights can inform strategic decision-making to enhance customer satisfaction, optimize marketing efforts, and improve overall business performance.
Azure End-to-End Data Engineering Project | Azure Data Factory | Azure Databricks | Azure SQL DB | PySpark | Big Data. An in-depth data engineering project using tools such as Azure Data Factory, Azure SQL DB, Azure Databricks, Unity Catalog, Delta Live Tables, Spark Streaming, PySpark, Databricks Asset Bundles, GitHub, and more.
windi-wulandari
This project implements an end-to-end data pipeline designed to manage and analyze large-scale credit scoring data. Using AWS S3 as a scalable storage solution and Databricks for processing, the pipeline leverages the power of Apache Spark through PySpark and Spark SQL to handle data transformation and analysis efficiently.
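A sketch of the S3-to-Spark-SQL handoff such a pipeline implies (bucket, columns, and access setup are hypothetical; on Databricks, S3 access is typically configured via instance profiles or mounted paths):

    scores = spark.read.parquet("s3://my-bucket/credit-scoring/raw/")

    # Mix the DataFrame API and Spark SQL, as the description suggests
    scores.createOrReplaceTempView("scores")
    summary = spark.sql("""
        SELECT risk_band, COUNT(*) AS n, AVG(score) AS avg_score
        FROM scores
        GROUP BY risk_band
    """)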
ananyaSingh1305
Solutions to LeetCode SQL 50 challenges using PySpark and Spark SQL on Databricks.
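As an illustration of that PySpark-vs-Spark-SQL pairing, a classic employees-earning-more-than-their-managers problem (the employees DataFrame is hypothetical, not one of the repo's solutions):

    from pyspark.sql import functions as F

    employees.createOrReplaceTempView("Employee")

    # Spark SQL version
    sql_answer = spark.sql("""
        SELECT e.name AS Employee
        FROM Employee e JOIN Employee m ON e.managerId = m.id
        WHERE e.salary > m.salary
    """)

    # Equivalent DataFrame API version
    df_answer = (employees.alias("e")
        .join(employees.alias("m"), F.col("e.managerId") == F.col("m.id"))
        .where(F.col("e.salary") > F.col("m.salary"))
        .select(F.col("e.name").alias("Employee")))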
narencastellon
No description available
cleberzumba
Importing unstructured data in Databricks Spark with PySpark.
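The usual starting point for unstructured input (the path and parsing rule below are hypothetical):

    from pyspark.sql import functions as F

    # Read text line by line; the result has a single "value" column
    lines = spark.read.text("/mnt/raw/logs/app.log")

    # One simple structuring step: split each line on whitespace
    parsed = lines.select(F.split("value", r"\s+").alias("tokens"))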
Bayzid03
⚡ A curated collection of PySpark Streaming notebooks built in Databricks — designed to showcase real-time data skills in action. 🚀 Ideal for demonstrating hands-on experience with scalable Spark applications.
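A self-contained Structured Streaming sketch of the kind such notebooks cover (it uses Spark's built-in rate source, so it needs no external data; not the repo's code):

    from pyspark.sql import functions as F

    # The rate source emits rows continuously, which is handy for demos
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    counts = (stream
        .withWatermark("timestamp", "1 minute")
        .groupBy(F.window("timestamp", "30 seconds"))
        .count())

    query = (counts.writeStream
        .outputMode("update")
        .format("console")
        .start())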
Bayzid03
💡 Hands-on PySpark notebooks built in Databricks — covering real-world data transformations, cleaning, and analysis. 🚀 A practical showcase of Spark fundamentals applied to structured datasets.
shakshamchauhan
No description available
Play around with Databricks and PySpark
LeondraJames
Using tools such as Spark, Python (PySpark), SQL, and Databricks, performs logistic regression on customer data to predict those at higher risk of churning, then applies the model to an unseen "new customers" dataset.
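A minimal version of that workflow in pyspark.ml (the feature columns and the customers / new_customers DataFrames are hypothetical):

    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml import Pipeline

    # Assemble hypothetical numeric features into a single vector column
    assembler = VectorAssembler(inputCols=["age", "total_purchase", "years"],
                                outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="churn")

    model = Pipeline(stages=[assembler, lr]).fit(customers)
    scored_new = model.transform(new_customers)  # apply to the unseen customers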