Found 2,452 repositories (showing 30)
microsoft
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
DataWithBaraa
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
vivek-bombatkar
Databricks - Apache Spark™ - 2X Certified Developer
lamastex
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.
This repository contains code files, specifically IPython notebooks, for the assignments in the course "Introduction to Big Data with Apache Spark" by UC Berkeley and Databricks on edX.
darshilparmar
apache-spark-with-databricks-for-data-engineering
yokawasa
Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)
samelamin
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
LearningJournal
No description available
Guide for the Databricks Spark certification
dmatrix
Workshop for Spark and Databricks
Batch scoring Spark models on Azure Databricks: A predictive maintenance use case
KamilKolanowski
Data engineering project using Databricks PySpark & Spark SQL to analyse data from the Spotify API and present it as a Power BI report
airscholar
In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.
TomLous
No description available
Databricks Spark Knowledge Base (Simplified Chinese edition)
cloudboxacademy
Resources for the Udemy course "Azure Databricks & Spark Core for Data Engineers (Python/SQL)" by Ramesh Retnasamy
relferreira
No description available
ericbellet
Databricks Certified Associate Developer for Apache Spark 3.0
itversity
Databricks Certified Associate Spark Developer preparation toolkit to set up a single-node standalone Spark cluster, along with material in the form of Jupyter notebooks.
Monitoring Databricks using Prometheus, Grafana and Pyroscope
SimpleDataLabsInc
Prophecy-built-tool (PBT) allows you to quickly build projects generated by Prophecy (your standard Spark Scala and PySpark pipelines) to integrate them with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).
spetlr-org
A python SPark ETL libRary (SPETLR) for Databricks. https://discord.gg/p9bzqGybVW
DataThirstLtd
A guide of how to build good Data Pipelines with Databricks Connect using best practices
reisdebora
A curated list of awesome Databricks resources, including Spark
AdamPaternostro
Connect your Spark Databricks clusters Log4J output to the Application Insights Appender
renardeinside
Writing PySpark logs in Apache Spark and Databricks
mlverse
Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
Project in PySpark
AnilSener
I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL & MLlib, using a Databricks cluster on AWS. 90% AUC is achieved with Random Forest (without involving the Trip Matching / Repeated Trips feature); ensembles of RF, GBT, and Logistic Regression, together with outlier elimination, could improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset for the full-execution run. There is also a ppt that presents my outputs from the test execution. The full-execution code is more production-ready and is a slightly different version; I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance in the production-ready version.