Found 2,452 repositories (showing 30)
microsoft
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
DataWithBaraa
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
vivek-bombatkar
Databricks - Apache Spark™ - 2X Certified Developer
lamastex
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.
This repository contains code files, specifically IPython notebooks, for the assignments in the course "Introduction to Big Data with Apache Spark" by UC Berkeley and Databricks on edX.
darshilparmar
apache-spark-with-databricks-for-data-engineering
yokawasa
Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)
samelamin
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
LearningJournal
No description available
Guide for the Databricks Spark certification
dmatrix
Workshop for Spark and Databricks
Batch scoring Spark models on Azure Databricks: A predictive maintenance use case
KamilKolanowski
Data engineering project using Databricks PySpark & Spark SQL to analyse data from the Spotify API and present it as a Power BI report
airscholar
In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.
TomLous
No description available
Databricks Spark Knowledge Base (Simplified Chinese edition)
cloudboxacademy
Resources for the Udemy course "Azure Databricks & Spark Core for Data Engineers (Python/SQL)" by Ramesh Retnasamy
relferreira
No description available
ericbellet
Databricks Certified Associate Developer for Apache Spark 3.0
itversity
Databricks Certified Associate Spark Developer preparation toolkit to set up a single-node standalone Spark cluster, along with material in the form of Jupyter notebooks.
Monitoring Databricks using Prometheus, Grafana and Pyroscope
SimpleDataLabsInc
Prophecy-built-tool (PBT) allows you to quickly build projects generated by Prophecy (your standard Spark Scala and PySpark pipelines) to integrate them with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).
spetlr-org
A python SPark ETL libRary (SPETLR) for Databricks. https://discord.gg/p9bzqGybVW
DataThirstLtd
A guide of how to build good Data Pipelines with Databricks Connect using best practices
reisdebora
A curated list of awesome Databricks resources, including Spark
AdamPaternostro
Connect your Spark Databricks clusters Log4J output to the Application Insights Appender
renardeinside
Writing PySpark logs in Apache Spark and Databricks
mlverse
Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
Project in PySpark
AnilSener
I developed this case study in only 7 days with PySpark (Spark 1.6.0) SQL & MLlib, using a Databricks cluster on AWS. 90% AUC is achieved with Random Forest (without involving the Trip Matching / Repeated Trips feature); ensembles of RF, GBT, and Logistic Regression, together with outlier elimination, could improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset for the full-execution run. There is also a ppt that presents my outputs from the test execution. The full-execution code is more production-ready and is a slightly different version; I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance in the production-ready version.