Found 49 repositories (showing 30)
khushal2405
We build an ETL pipeline using Airflow that accomplishes the following: downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics.
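The transform step of that pipeline (isolating orders that missed their delivery deadline) can be sketched in plain Python; field names such as `deadline` and `delivered_at` are assumptions for illustration, not the repo's actual schema:

```python
from datetime import datetime

def missed_deadline(orders):
    """Keep only orders delivered after their deadline, or never delivered."""
    fmt = "%Y-%m-%d"
    late = []
    for o in orders:
        deadline = datetime.strptime(o["deadline"], fmt)
        delivered = o.get("delivered_at")
        if delivered is None or datetime.strptime(delivered, fmt) > deadline:
            late.append(o)
    return late

orders = [
    {"order_id": 1, "deadline": "2024-01-10", "delivered_at": "2024-01-09"},
    {"order_id": 2, "deadline": "2024-01-10", "delivered_at": "2024-01-12"},
    {"order_id": 3, "deadline": "2024-01-15", "delivered_at": None},
]
print([o["order_id"] for o in missed_deadline(orders)])  # → [2, 3]
```

In the actual repo this logic would run as a Spark SQL job inside an Airflow task, between the S3 download and upload steps.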
crussedev9
End-to-end ETL pipeline for job market analytics showcasing Python, SQL, dimensional modeling, and Power BI integration
Automated ETL pipeline using AWS Glue and Step Functions to process, enrich, and load airline flight delay data into Redshift for analytics and reporting. Includes crawler orchestration, schema transformation, and SNS-based job monitoring.
shakiroye
A GCP-based ETL pipeline that fetches football data from Football-Data.org, stores raw JSON in Cloud Storage, transforms it, loads analytics tables into BigQuery, and powers a Looker Studio dashboard via scheduled Cloud Run jobs.
Arvind1997
An AWS-powered pipeline loads stock data (AAPL, AMZN, BRKA, FB, GOOG, JNJ, MA, MSFT, V, WMT) into S3, with Glue for crawling and Athena for querying. Python scripts handle ETL, storing cleaned data in S3. A Glue ETL job transfers the data to Redshift for advanced analytics, ensuring seamless storage, processing, and visualization.
A PySpark supply chain pipeline for product demand analytics: data cleaning, monthly/summary aggregation, JDBC read-write, and visualization. Includes src ETL jobs/transforms/utils, YAML configs, exploration notebooks, output datasets, and test coverage. Ideal for reliable demand insights and revenue tracking.
karkakasadara-tharavu
💼 Complete career transformation path: BE Graduate → Data Engineer ($65K-$130K). Master SQL Server administration, T-SQL programming, SSIS ETL pipelines, Power BI analytics. 593KB content, 75+ files, AdventureWorks databases. Learn database design, normalization, backup strategies, security, CDC, dimensional modeling. Job-ready in 5 months.
JOB DESCRIPTION We are looking for data engineers motivated by cutting-edge technology and an environment with plenty of autonomy to try new things. We are a team that is always reinventing itself to architect solutions to process, store, and deliver increasingly relevant data for all of our products and also for our clients. You will join a team architecting distributed systems, building scalable and reliable pipelines, combining multiple data sources, thinking about scalable data architectures, and optimizing resources with our infrastructure's efficiency in mind. Our technology helps the largest brands and retailers in the market make strategic decisions about their sales in the digital channel (e-commerce) and brings them closer to shoppers through social media campaigns. Responsibilities: Get to know and interact with Lett's different areas in order to gain broad knowledge of the business and its databases; develop and deploy architectures and processes that support the other teams' solutions in a scalable way; govern, document, and provide metadata access to all teams; model Data Lakes and Data Warehouses; research and bring modern approaches and technologies to the company's Big Data solutions; create and manage data flows, processing clusters, and data storage in our cloud; propose improvements, low-level optimizations, and new architectures to the other teams; democratize data access using tools and interface development (such as APIs, ETLs, SQL); and work directly with product teams.
JOB REQUIREMENTS Requirements: Experience with Python; experience with Docker and docker-compose; being very comfortable with the Spark environment (PySpark on the AWS EMR service or on Kubernetes); AWS (Elastic Beanstalk, SQS, RDS, Lambda Functions, EC2, EMR, S3, SNS); extensive experience with Data Lakes on object storage (AWS S3); extensive experience with Google BigQuery (data modeling, ELTs, maintenance, and governance); experience with the ELT concept; experience with data governance and cataloging; Apache Airflow (implementing DAGs and deploying Airflow as distributed workers); data warehouses, data lakes, their interfaces (SQL engines, ETL processes, direct object access) and their organization (partitioning, data layout, cost, and performance); being comfortable with SQL, relational databases, document-oriented databases, and file storage; diverse processing architectures (queues, jobs, workers, functions, etc.); deployment tools, code versioning, and cloud infrastructure; being able to build parallel or concurrent processing flows, as well as distributed execution where applicable; knowing how to interact, both technically and non-technically, with other team members and with people from other areas. Nice to have: Dremio/Athena/AWS Glue Catalog; dbt (Data Build Tool, by Fishtown Analytics); Amundsen; data processes involving CDC (Change Data Capture); Terraform; Prometheus/Grafana; Kubernetes/AWS ECS/AWS EKS; PostgreSQL, MongoDB, ElasticSearch, DynamoDB; Java/Scala; Jenkins; English. JOB BENEFITS Health insurance; dental plan; meal voucher; food voucher; remote work.
deekshithgadi1203
This ETL pipeline performs job analytics by extracting LinkedIn job postings using the Apify API.
ghrjeon
ETL pipeline serving Crypto Jobs Analytics
No description available
tejokiran48-afk
ETL pipeline + job market analytics dashboard using Python and Streamlit.
hoangbui93
ETL pipeline for HR Analytics – Job Change of Data Scientists
ahmedtarek-mel
Real-time Job Market Analytics Dashboard using Python, Streamlit, and Automated ETL pipelines.
hazemtarek-mel
Real-time Job Market Analytics Dashboard using Python, Streamlit, and Automated ETL pipelines.
abreu-joao
Automated ETL pipeline and RESTful API for up-to-date tech job market analytics.
ProTos027
An ETL pipeline built to transform messy job data into data ready for advanced analytics
leonmwandiringa
Well-architected ETL jobs pipeline, data lake, ETL, and analytics. S3, Glue crawler, Glue Catalog, Spark, PySpark, Python, Docker, Kubernetes, EKS, CloudWatch
kirtishrestha
Airflow ETL pipeline to ingest, transform, stage (Postgres) and warehouse (Snowflake) job-listings data for analytics and reporting.
nguyentunhu
An end-to-end ETL pipeline that aggregates job descriptions from 3 job platforms, extracts key skills, and produces analytics-ready visualizations.
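The skill-extraction step of such a pipeline can be sketched as a simple keyword match against a taxonomy; the `SKILLS` set below is a hypothetical stand-in (real pipelines would use a curated skill list or NER):

```python
# Hypothetical skill taxonomy for illustration.
SKILLS = {"python", "sql", "spark", "airflow", "dbt", "snowflake"}

def extract_skills(description: str) -> set:
    """Return the known skills mentioned in a job description (case-insensitive)."""
    tokens = {t.strip(".,()") for t in description.lower().split()}
    return SKILLS & tokens

posting = "Seeking a data engineer with Python, SQL, and Airflow experience."
print(sorted(extract_skills(posting)))  # → ['airflow', 'python', 'sql']
```

Multi-word skills (e.g. "Power BI") would need phrase matching rather than per-token lookup, which is why real extractors move beyond this sketch.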
abhinav2105
End-to-end job market analytics pipeline — scraping, ETL, data warehouse (Snowflake), dbt, Prefect orchestration, and Streamlit dashboard
SohaliChandra
End-to-end data pipeline analyzing UK data engineering job trends using Adzuna API, Python ETL pipelines, SQL analytics, and Streamlit dashboard.
Madhusudhangupta
End-to-end IMDb data engineering platform with ETL pipelines, AWS Glue jobs, SQL analytics, and data quality checks
tarnowsky
End-to-end ETL pipeline that scrapes job offers from multiple job boards, processes and models the data, and prepares analytics-ready datasets for data engineering practice
Lautarocuello98
Python ETL pipeline that scrapes remote jobs from the RemoteOK API, cleans the dataset, and exports analytics-ready files (CSV, Excel, JSON).
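The clean-and-export pattern described there can be sketched with the standard library alone; the sample records and field names (`position`, `company`, `salary_min`) are assumptions standing in for a real RemoteOK API response:

```python
import csv
import json

# Sample records standing in for an API response (field names are assumptions).
raw = [
    {"position": " Data Engineer ", "company": "Acme", "salary_min": "90000"},
    {"position": "ML Engineer", "company": "", "salary_min": None},
]

def clean(rows):
    """Trim whitespace, drop rows missing a company, coerce salary to int."""
    out = []
    for r in rows:
        if not (r.get("company") or "").strip():
            continue
        out.append({
            "position": r["position"].strip(),
            "company": r["company"].strip(),
            "salary_min": int(r["salary_min"]) if r.get("salary_min") else None,
        })
    return out

rows = clean(raw)
with open("jobs.json", "w") as f:
    json.dump(rows, f, indent=2)
with open("jobs.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["position", "company", "salary_min"])
    w.writeheader()
    w.writerows(rows)
```

Excel export (the third format the repo mentions) would need a third-party library such as openpyxl, so it is omitted from this stdlib-only sketch.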
Mohit0135
A lightweight ETL pipeline built with Python, AWS (S3, Redshift), and cron jobs to automate data ingestion, transformation, and loading for analytics.
JamieChristian22
Job-ready data engineering portfolio showcasing real-world pipelines, ETL workflows, data modeling, cloud data architecture, SQL, Python, Snowflake, and analytics engineering projects.
yago-novaes
An end-to-end ETL pipeline to extract, transform, and analyze Analytics and Data Engineering job postings using Python, dbt, DuckDB, and Kubernetes.
raunaqkoppikar
Cloud-based ETL pipeline and analytics dashboard for tracking global remote job trends using the Remote OK API, Neon Postgres, and Google Data Studio.