Found 60,645 repositories (showing 30)
apache
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbytehq
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Avaiga
Turns Data and AI algorithms into production-ready web applications in no time.
dagster-io
An orchestration platform for the development, production, and observation of data assets.
apache
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
ricklamers
Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.
mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
mark3labs
A Go implementation of the Model Context Protocol (MCP), enabling seamless integration between LLM applications and external data sources and tools.
pentaho
Pentaho Data Integration (ETL), a.k.a. Kettle
apache
Flink CDC is a streaming data integration tool
cloudquery
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
lance-format
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.
apache
Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
apache
Upserts, Deletes and Incremental Processing on Big Data.
fluvio-community
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
jitsucom
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
rudderlabs
Privacy- and security-focused Segment alternative, in Golang and React.
DTStack
A data integration framework
seandavi
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
bruin-data
ingestr is a CLI tool to seamlessly copy data between any databases with a single command.
django-import-export
Django application and library for importing and exporting data with admin integration.
SolaceLabs
An event-driven framework designed to build and orchestrate multi-agent AI systems. It enables seamless integration of AI agents with real-world data sources and systems, facilitating complex, multi-step workflows.
apache
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
microsoft
Catalog of official Microsoft MCP (Model Context Protocol) server implementations for AI-powered data access and tool integration
deepnote
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data integrations. Use Python, R, and SQL locally in your favorite IDE, then scale to Deepnote cloud for real-time collaboration, Deepnote agent, and deployable data apps. https://deepnote.com/
PhoebusSi
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use. We welcome open-source enthusiasts to open any meaningful PR on this repo and integrate as many LLM-related technologies as possible. We have built a fine-tuning platform for large models that is easy for researchers to get started with and use, and we welcome open-source enthusiasts to submit any meaningful PR!
corbindavenport
Remove AI features, telemetry data reporting, sponsored content, product integrations, and other annoyances from web browsers.
doctrine
Symfony integration for the doctrine/data-fixtures library
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
apache
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.