Found 2,893 repositories (showing 30)
zhaoyachao
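Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more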
duoan
🏭 Mega Scale Multimodal DataPipeline for SOTA Foundation Models
coursera
DataPipeline for humans.
ErdemOzgen
Roadmap for Data Engineering
jamesdensmore
No description available
LiberCoders
Official Implementation of "CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion"
josephmachado
Simple stream processing pipeline
ContextData
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Smars-Bin-Hu
A cloud-native data pipeline and visualization project analyzing Formula 1 racing data using Azure, Databricks, Delta Lake, Tableau, and Python for insightful EDA and interactive dashboards.
Stability-AI
Iterable datapipelines for pytorch training.
kartik4949
High-performance TensorFlow data pipeline with state-of-the-art augmentations and low-level optimizations.
LuQQiu
Real-time stock data pipeline: play with Kafka, Cassandra, Spark, Redis, Node.js, ZooKeeper
josephmachado
Step by step instructions to create a production-ready data pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Alireza-Akhavan
Tensorflow 2 Tutorials (use tensorflow and keras in a better way!)
cloudposse
Terraform module designed to easily backup EFS filesystems to S3 using DataPipeline
KennethanCeyer
Awesome list for datapipeline
guimou
Various demos of data pipelines
covalenthq
Ethereum client written in Go, modified for full-hierarchy data exports and block specimen production
indix
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
WaylonWalker
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
behnamyazdan
This course is designed to provide learners with the fundamental skills needed for data engineering using Python. The objective is to introduce anyone interested in the topic to Python's data engineering-related features.
shazam
Domain-specific language to help build and maintain AWS Data Pipelines
kromozome2003
Building a JSON data pipeline within Snowflake using Streams and Tasks
josephmachado
Example repo to create end to end tests for data pipeline.
hieuimba
Spark-based pipeline to extract and parse monthly games from the Lichess database.
WaylonWalker
A GitHub Action to lint, test, build-docs, package, and run your kedro pipelines. Supports any Python version you'll give it (that is also supported by pyenv).
NorthConcepts
DataPipeline Examples
NVIDIA
Go library that provides easy-to-use interfaces and tools for TensorFlow users, in particular allowing them to train existing TF models on .tar and .tgz datasets
mehroosali
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.