Found 5,723 repositories (showing 30)
ClickHouse
ClickHouse® is a real-time analytics database management system
airbytehq
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
prestodb
The official home of the Presto distributed SQL query engine for big data
apache
Apache Doris is an easy-to-use, high performance and unified analytics database.
StarRocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
databendlabs
Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified architecture on your S3.
delta-io
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
lance-format
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
apache
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
lakesoul-io
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
apache
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
apache
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
ByConity
ByConity is an open source cloud data warehouse
ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Mooncake-Labs
Real-time analytics on Postgres tables
apache
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
apache
Apache Fluss is a streaming storage built for real-time analytics.
datazip-inc
OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain Parquet. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supported sources: Postgres, MongoDB, MySQL, Oracle, MSSql, DB2, Kafka, S3.
lakekeeper
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
apache
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
apache
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
ClickHouse
ClickBench: a Benchmark For Analytical Databases
paradedb
DuckDB-powered data lake analytics from Postgres
A curated list of open source tools used in analytics platforms and data engineering ecosystem
nimtable
The observability platform for Iceberg lakehouses.
databricks-demos
Demos to implement your Databricks Lakehouse
Open Control Plane for Tables in Data Lakehouse
gigapi
GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐
kamu-data
Next-generation decentralized data lakehouse and a multi-party stream processing network
databricks
Examples of using Terraform to deploy Databricks resources