Found 513 repositories (showing 30)
DataWithBaraa
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
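A minimal sketch of the Bronze-to-Silver hop the Medallion pattern implies (the table names, source path, and columns below are hypothetical, not taken from the repo):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

    # Bronze: land the raw data as-is (hypothetical source path)
    raw = spark.read.json("/mnt/raw/orders/")
    raw.write.format("delta").mode("append").saveAsTable("bronze.orders")

    # Silver: deduplicated and typed
    (spark.table("bronze.orders")
        .dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .write.format("delta").mode("overwrite").saveAsTable("silver.orders"))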
KamilKolanowski
Data engineering project using Databricks PySpark and Spark SQL to analyse data from the Spotify API and present it as a Power BI report.
SimpleDataLabsInc
Prophecy-built-tool (PBT) allows you to quickly build projects generated by Prophecy (your standard Spark Scala and PySpark pipelines) to integrate them with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).
DataThirstLtd
A guide to building good data pipelines with Databricks Connect using best practices.
renardeinside
Writing PySpark logs in Apache Spark and Databricks
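One common way to do this (a sketch under assumptions, not necessarily the repo's approach) is to route Python-side messages through the driver's log4j logger so they land alongside Spark's own logs:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Reach the JVM's log4j through the Py4J gateway; "my_app" is an arbitrary logger name
    log4j = spark._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("my_app")
    logger.info("pipeline started")
    logger.warn("row count lower than expected")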
AnilSener
I developed this case study in only 7 days with PySpark (Spark 1.6.0), SQL & MLlib, using a Databricks cluster and AWS. 90% AUC is achieved (without involving the Trip Matching / Repeated Trips feature) with Random Forest; ensembles of RF, GBT, and Logistic Regression plus outlier elimination could improve this result. There are two versions of my code (test and full execution). Since AWS costs exceeded my budget, I stopped training my model(s) on the full dataset for the full-execution version. There is also a PPT presenting my outputs from the test execution. The full-data-execution code is a slightly different, more production-ready version; I had to apply Databricks table caching to the TRAIN and TEST data tables to obtain acceptable performance in it.
pathfinder-analytics-uk
No description available
Used the Alternating Least Squares (ALS) method to build a recommender system in Spark [PySpark, Databricks, Python, Machine Learning]
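For reference, a minimal ALS sketch in pyspark.ml (the ratings DataFrame and its columns are hypothetical, not the repo's code):

    from pyspark.ml.recommendation import ALS
    from pyspark.ml.evaluation import RegressionEvaluator

    # Hypothetical ratings DataFrame with columns: userId, movieId, rating
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)

    als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
              coldStartStrategy="drop")  # drop NaN predictions for unseen users/items
    model = als.fit(train)

    preds = model.transform(test)
    rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                               predictionCol="prediction").evaluate(preds)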
schammass-zz
No description available
ajinkyahk
SparkByExamples tutorial with Databricks workspace.
Hamidreza-Ramezani
A basic Spark project in the Databricks environment using PySpark
Databricks PySpark Certification Prep Lab: Build an e-commerce analytics pipeline covering Spark DataFrame API, Structured Streaming, data skew handling with salting, broadcast joins, and Pandas UDFs. Designed for the Databricks Certified Associate Developer for Apache Spark exam.
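Two of the techniques the lab names, sketched against hypothetical facts / dim_products DataFrames (an illustration, not the lab's actual code):

    from pyspark.sql import functions as F

    # Broadcast join: ship the small dimension table to every executor,
    # avoiding a shuffle of the large fact table
    enriched = facts.join(F.broadcast(dim_products), "product_id")

    # Salting: spread a hot key across N buckets, then aggregate in two stages
    N = 8
    totals = (facts
        .withColumn("salt", (F.rand() * N).cast("int"))
        .groupBy("customer_id", "salt").agg(F.sum("amount").alias("partial"))
        .groupBy("customer_id").agg(F.sum("partial").alias("total_amount")))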
Association mining for products using the FP-Growth algorithm in Spark [PySpark, Databricks, Association Mining, Frequent Pattern Mining tree algorithm]
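FP-Growth is available directly in pyspark.ml; a minimal sketch (the baskets DataFrame is hypothetical):

    from pyspark.ml.fpm import FPGrowth

    # Hypothetical baskets DataFrame: one row per transaction,
    # "items" is an array column of product ids
    fp = FPGrowth(itemsCol="items", minSupport=0.01, minConfidence=0.3)
    model = fp.fit(baskets)

    model.freqItemsets.show()      # frequent itemsets with counts
    model.associationRules.show()  # antecedent -> consequent rules with confidence and lift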
shreyashji
My Python, Spark, PySpark, and Scala notebook logic that I solve or encounter on a daily basis; includes optimization techniques for big data processing and real-time scenarios.
iBalajiShanmugam
The "Adventure Works - Spark" repository is a collection of code and resources for analyzing the Adventure Works dataset using Databricks, PySpark, Delta Lake, and Python. It provides examples and tools for ingesting, processing, and analyzing the data to gain insights
No description available
Naseer5196
Data Engineer job description (an Indeed posting from Vedhas Technology Solutions Pvt Ltd, Hyderabad; 2-4 years' experience, work from office). Requirements: experience with big data components such as Spark, Kafka, Scala/PySpark, SQL, DataFrames, and Airflow, preferably implemented on Databricks; Databricks integration with other cloud services, e.g. Azure (Data Lake, Data Factory, Synapse, Azure DevOps) or AWS (S3, Glue, Athena, Redshift, Lambda, CloudWatch); reading, processing, and writing data in various file formats using Spark and Databricks; knowledge of Databricks job-optimization processes and standards. Good to have: Databricks Delta Tables and MLflow; AWS/Azure/Databricks certifications; strong data-warehousing experience; a good understanding of database schema design, optimization, and scalability. Key skills: Databricks, Data Lake, Kafka, Azure DevOps, SQL. Full-time, permanent; B.Tech/B.E. in any specialization.
PabitraKumarGhorai
PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing. If you're already familiar with Python and libraries such as Pandas, then PySpark is a good language to learn for creating more scalable analyses and pipelines.
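A taste of what that looks like (the file path and column names below are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("intro").getOrCreate()

    df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)

    # The DataFrame API mirrors many Pandas ideas, but executes lazily and distributed
    (df.filter(F.col("amount") > 0)
       .groupBy("category")
       .agg(F.avg("amount").alias("avg_amount"))
       .show())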
Data Engineering with Databricks Study Materials
M0hamedIbrahim1
A sales analysis project using PySpark that provides insights into customer behavior, sales trends, and product performance. These insights can inform strategic decision-making to enhance customer satisfaction, optimize marketing efforts, and improve overall business performance.
Azure End-to-End Data Engineering Project | Azure Data Factory | Azure Databricks | Azure SQL DB | PySpark | Big Data. An in-depth data engineering project using tools such as Azure Data Factory, Azure SQL DB, Azure Databricks, Unity Catalog, Delta Live Tables, Spark Streaming, PySpark, Databricks Asset Bundles, GitHub, and more.
windi-wulandari
This project implements an end-to-end data pipeline designed to manage and analyze large-scale credit scoring data. Using AWS S3 as a scalable storage solution and Databricks for processing, the pipeline leverages the power of Apache Spark through PySpark and Spark SQL to handle data transformation and analysis efficiently.
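A sketch of the S3-to-Spark-SQL handoff such a pipeline implies (bucket, columns, and access setup are hypothetical; on Databricks, S3 access is typically configured via instance profiles or mounted paths):

    scores = spark.read.parquet("s3://my-bucket/credit-scoring/raw/")

    # Mix the DataFrame API and Spark SQL, as the description suggests
    scores.createOrReplaceTempView("scores")
    summary = spark.sql("""
        SELECT risk_band, COUNT(*) AS n, AVG(score) AS avg_score
        FROM scores
        GROUP BY risk_band
    """)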
ananyaSingh1305
Solutions to LeetCode SQL 50 challenges using PySpark and Spark SQL on Databricks.
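As an illustration of that PySpark-vs-Spark-SQL pairing, a classic employees-earning-more-than-their-managers problem (the employees DataFrame is hypothetical, not one of the repo's solutions):

    from pyspark.sql import functions as F

    employees.createOrReplaceTempView("Employee")

    # Spark SQL version
    sql_answer = spark.sql("""
        SELECT e.name AS Employee
        FROM Employee e JOIN Employee m ON e.managerId = m.id
        WHERE e.salary > m.salary
    """)

    # Equivalent DataFrame API version
    df_answer = (employees.alias("e")
        .join(employees.alias("m"), F.col("e.managerId") == F.col("m.id"))
        .where(F.col("e.salary") > F.col("m.salary"))
        .select(F.col("e.name").alias("Employee")))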
narencastellon
No description available
cleberzumba
Importing unstructured data in Databricks Spark with PySpark.
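The usual starting point for unstructured input (the path and parsing rule below are hypothetical):

    from pyspark.sql import functions as F

    # Read text line by line; the result has a single "value" column
    lines = spark.read.text("/mnt/raw/logs/app.log")

    # One simple structuring step: split each line on whitespace
    parsed = lines.select(F.split("value", r"\s+").alias("tokens"))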
Bayzid03
⚡ A curated collection of PySpark Streaming notebooks built in Databricks — designed to showcase real-time data skills in action. 🚀 Ideal for demonstrating hands-on experience with scalable Spark applications.
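A self-contained Structured Streaming sketch of the kind such notebooks cover (it uses Spark's built-in rate source, so it needs no external data; not the repo's code):

    from pyspark.sql import functions as F

    # The rate source emits rows continuously, which is handy for demos
    stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    counts = (stream
        .withWatermark("timestamp", "1 minute")
        .groupBy(F.window("timestamp", "30 seconds"))
        .count())

    query = (counts.writeStream
        .outputMode("update")
        .format("console")
        .start())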
Bayzid03
💡 Hands-on PySpark notebooks built in Databricks — covering real-world data transformations, cleaning, and analysis. 🚀 A practical showcase of Spark fundamentals applied to structured datasets.
shakshamchauhan
No description available
Play around with Databricks and PySpark
LeondraJames
Using tools such as Spark, Python (PySpark), SQL, and Databricks, performs logistic regression on customer data to predict those at higher risk of churning, then applies the model to an unseen "new customers" dataset.
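A minimal version of that workflow in pyspark.ml (the feature columns and the customers / new_customers DataFrames are hypothetical):

    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml import Pipeline

    # Assemble hypothetical numeric features into a single vector column
    assembler = VectorAssembler(inputCols=["age", "total_purchase", "years"],
                                outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="churn")

    model = Pipeline(stages=[assembler, lr]).fit(customers)
    scored_new = model.transform(new_customers)  # apply to the unseen customers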