Found 39 repositories (showing 30)
onehouseinc
Monitoring and insights on your data lakehouse tables
This repository showcases a Medallion Architecture Data Lakehouse designed for both batch and real-time processing of e-commerce and marketing data. It supports comprehensive data analysis, reporting, and monitoring, providing a scalable solution for deriving insights from integrated datasets.
SivaPrasath26
Hands-on AWS data engineering pipelines: batch, streaming, lakehouse, monitoring, CI/CD.
CheeYuTan
End-to-end Data Quality Monitoring framework on Databricks: custom rules, profiling, thresholds, dashboards, and drilldowns. Covers validity, completeness, consistency, accuracy with reusable templates and visualizations.
DavidGitter
A Helm chart containing an on-premises, cloud-native lakehouse, extended with a use case for monitoring earthquake data.
EmmaSwan0729
End-to-end Lakehouse data pipeline built on Microsoft Fabric, featuring incremental processing, data quality framework with severity-based gating, and pipeline observability through structured run logging and monitoring views.
Babitha23
UK Energy Intelligence Lakehouse: A Modern Data Platform for Monitoring Consumption, Efficiency & Price Trends.
Przemyslaw11
A scalable data lakehouse for collecting, processing, and analyzing the ALICE experiment's infrastructure monitoring data at CERN, featuring version-controlled datasets, automated ETL pipelines, and a research-oriented query interface designed for both technical and non-technical scientists.
seanjw13
Code example that uses Databricks MLOps Stacks and Lakehouse Monitoring to set up an automated retraining pipeline
BygraveRyan
Orchestrating AI agents to build production-grade data engineering on GCP — lakehouse (Bronze→Silver→Gold), PySpark, BigQuery, dbt & Gemini AI monitoring. Where modern DE meets autonomous development.
thibaut7
End-to-end Data Lakehouse architecture (NiFi, Iceberg, Spark, Postgres) with a built-in Observability Hub. A production-ready local environment designed for managing and monitoring large-scale data streams.
thejasono
End-to-end Databricks Community Edition lakehouse demo for NYC Taxi data, showing batch medallion (Bronze/Silver/Gold) pipelines, Unity Catalog governance, SQL-based transformations, and lightweight MLflow monitoring without paid features.
Built a real-time IoT data pipeline using Apache Kafka, Spark Streaming, and Delta Lake to ingest, process, and analyze streaming sensor data. Enabled low-latency insights and scalable analytics within a unified lakehouse architecture for improved operational monitoring.
mik3lol
Demo data ingestion and Lakehouse Monitoring using data from DataSF
pavansri8886
PySpark-based monitoring and optimization framework for Delta Lakehouse platforms. Detects small-file fragmentation, parses Delta transaction logs, flags OPTIMIZE/VACUUM gaps, and estimates Azure storage costs — built on Databricks Serverless.
ThulasiramanBalaraman
No description available
data-engineer-yogesh
This project is a hands-on Delta Lake learning lab designed to master Delta Lake internals, performance optimization, and governance using a public-sector soil health & agriculture analytics use case.
This project builds a lakehouse to monitor and analyze renewable energy data using Azure Synapse, Delta Lake, and Azure Data Lake Storage for insights.
Saarthak8
A data lakehouse solution for a heartbeat monitoring device.
clickzetta
A Streamlit app for monitoring SQL queries in Clickzetta Lakehouse
Databricks Lakehouse Monitoring is a popular way of monitoring data drift. The proposal here is to use open-source technologies instead of Lakehouse Monitoring to achieve the same results.
rv-online
Python monitoring project for checking lakehouse datasets, surfacing quality regressions, and summarizing operational health.
suryasaitura-db
Pharmaceutical Manufacturing Digital Twin Platform - GSK-Inspired Bioreactor Monitoring & Quality Control powered by Databricks Lakehouse
PENE18
End-to-end e-commerce Data Lakehouse in Docker: Iceberg, Airflow, Spark, Dremio & real-time monitoring.
hansraj1108
This project implements a production-grade IoT Sensor Monitoring & Anomaly Detection Pipeline using the Databricks Lakehouse Platform.
Kritansh-Tank
A governed, scalable Lakehouse platform built on Databricks for real-time IoT monitoring, predictive maintenance, and production quality tracking.
suryasai87
Pharmaceutical Manufacturing Digital Twin Platform - Real-time monitoring, predictive analytics, 21 CFR Part 11 compliant | Databricks Lakehouse + MLflow + Dash
jcinterrante
A Databricks lakehouse application for explainable complaint-driven risk monitoring, triage, and lightweight issue management using public CFPB complaint data.
tejaswinikannan
End-to-end data engineering pipeline using Kafka, Parquet lakehouse (Bronze/Silver/Gold), and Airflow orchestration for glucose monitoring analytics.
🚕 Open-source Big Data Lakehouse built with PySpark, Delta Lake, Airflow & MinIO – automated monthly ingestion, data quality monitoring, and BI-ready analytics.