Found 4,294 repositories (showing 30)
supabase
Stream your Postgres data anywhere in real-time. Simple Rust building blocks for change data capture (CDC) pipelines.
yougov
MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
GoogleCloudPlatform
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
digitalocean
Golang framework for streaming ETL, observability data pipeline, and event processing apps
areed1192
Unofficial Python API client library for TD Ameritrade. This library provides easy access to the Standard API and lets users build data pipelines for the Streaming API.
victor369basu
In this repository, I have developed the entire server-side principal architecture for real-time stock market prediction with machine learning. I have used TensorFlow.js for constructing the ML model architecture, and Kafka for real-time data streaming and pipelining.
Stratio
Real Time Analytics and Data Pipelines based on Spark Streaming
shafiab
My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture that aggregates Twitter and US stock market data for user sentiment analysis using open source tools: Apache Kafka for data ingestion, Apache Spark & Spark Streaming for batch & real-time processing, Apache Cassandra for storage, and Flask, Bootstrap and HighCharts for the frontend.
RSKriegs
Stream processing pipeline from Finnhub websocket using Spark, Kafka, Kubernetes and more
airyhq
💬 Open Source App Framework to build streaming apps with real-time data - 💎 Build real-time data pipelines and make real-time data universally accessible - 🤖 Join historical and real-time data in the stream to create smarter ML and AI applications. - ⚡ Standardize complex data ingestion and stream data to apps with pre-built connectors
nessos
A lightweight F#/C# library for efficient functional-style pipelines on streams of data.
A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL and PostgreSQL databases (deployed on Heroku)
VedantC2307
Mobile Sensor Bridge for ROS2 transforms your Android smartphone into a plug‑and‑play sensor suite, streaming camera, spatial pose data, and bidirectional audio into ROS2 topics via rclnodejs. Whether you’re prototyping perception pipelines or building voice‑driven robots, the package lets you leverage your phone’s sensors without extra hardware.
apssouza22
A hybrid Big Data pipeline architecture that combines a real-time streaming layer with a batch layer to process large datasets (Lambda Architecture)
GoogleCloudPlatform
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines. This repository hosts a few example pipelines to get you started with Dataflow.
GoogleCloudPlatform
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Cactusinhand
🦀 A high-performance Rust implementation of git-filter-repo for efficiently rewriting Git repository history. Remove sensitive data, shrink repos, and restructure projects with streaming pipeline architecture.
plecto
Cloud ready pure-python streaming data pipeline library
nama1arpit
A real-time Reddit data streaming pipeline for sentiment analysis of various subreddits
Nonanti
High-performance ETL pipeline library for .NET. Process CSV, JSON, Excel, and SQL data with minimal memory usage through streaming operations.
GoogleCloudPlatform
Example Kubernetes app that shows how to build a 'pipeline' to stream data into BigQuery. Uses Redis or Google Cloud PubSub
colossus-lab
AI-powered analysis engine for Argentine government open data. Multi-agent pipeline (LangGraph) with 10 data connectors, NL2SQL, semantic caching, and real-time streaming. Built with FastAPI, PostgreSQL + pgvector, Celery, and Gemini/Claude LLMs.
hoangsonww
📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transformation, storage, monitoring, and AI/ML serving with CI/CD automation using Terraform & GitHub Actions.
koralium
High-performance streaming SQL query engine designed for real-time data processing. Use cases include event-driven architectures, ETL pipelines, and modern data-intensive applications.
typestreamio
Open Source streaming platform. Write and run typed data pipelines with a minimal, familiar syntax.
tikal-fuseday
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
abdkumar
Generate synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consumes and processes Kafka data, saving it to the Datalake. Airflow orchestrates the pipeline. dbt moves data to Snowflake, transforms it, and creates dashboards.
max-mapper
CLI tool for automating the use of docker containers in streaming data processing pipelines. Works on Windows, Mac and Linux.
TP-Lab
EOSIO Kafka Plugin is used for building real-time data pipelines and streaming apps. This plugin lets you use all of Kafka’s rich real-time features with the EOS blockchain.
oslabs-beta
An Apache Kafka monitoring tool to prototype and scale real-time streaming data pipelines, and test parallelization of multi-stage ML models before production, with metrics for potential automation, in preconfigured Docker containers.