Found 238 repositories (showing 30)
nchammas
A command-line tool for launching Apache Spark clusters.
amplab
Scripts used to set up a Spark cluster on EC2
DataSenseiAryan
Automated Real-Time Indian Railway Twitter Complaint Administration System. It uses Apache Kafka, Spark, MySQL, and PHP. The full project was deployed on AWS EC2 and RDS.
huntingzhu
Recommendation System, Collaborative Filtering, Spark, Hive, Flask, Web Crawler, AWS EC2, AWS RDS
BD2KGenomics
Image and VM management for Jenkins, Spark and Mesos clusters in EC2
shivaram
Scripts used to set up a Spark cluster on EC2
entropyltd
Spark-cloud is a set of scripts for starting Spark clusters on EC2
CloudComputingCourse
No description available
geotrellis
Scripts to deploy a GeoTrellis Spark cluster on EC2
felixgborrego
Sbt plugin to submit Spark jobs to AWS EMR Spark Clusters
pishen
SBT plugin for spark-ec2
anish749
Data Pipeline examples using Oozie, Spark and Hive on Cloudera VM and AWS EC2 (branch aws-ec2)
codeaucafe
Full-stack data science project (tech currently used: AWS/boto3/EMR/EC2/S3, Python, PySpark (Spark SQL and MLlib), and Flask/Flask-RESTPlus)
phamthuonghai
No description available
rochitasundar
Scraped tweets using the Twitter API (for the keyword ‘Netflix’) on an AWS EC2 instance and ingested the data into S3 via Kinesis Firehose. Used Spark ML on Databricks to build a pipeline for a sentiment classification model, and Athena and QuickSight to build a dashboard
Maggie1001
Spark, EMR, EC2, Redshift, Glue
bgrosjea
Setup procedure for working with Jupyter Notebook and PySpark on an AWS EC2 instance
MBtech
Ansible playbook to set up Apache Spark and HDFS on AWS EC2
richjdowney
This project demonstrates data engineering skills: it contains an efficient ETL process that uses AWS EC2, EMR, and S3 with Python and Spark, and orchestrates the data pipeline with Airflow
rajshah1
Graduate coursework for ITCS-6190 Cloud Computing for Data Analysis. Stack used: AWS, EC2 clusters, Spark, Spring Boot applications
ChahiriAbderrahmane
This project simulates a real-world enterprise data migration and modernization strategy. It extracts transactional data from a simulated "on-premise" environment (hosted on AWS EC2), performs heavy distributed processing on a Hadoop/Spark cluster, and ultimately serves the data through a cloud-native, serverless architecture to optimize costs.
anuragdogra2192
Spark and Python for Big Data with PySpark (SparkML, DataFrames) Udemy course projects
jinliangwei
No description available
Ting-DS
Spark ML, Spark SQL, Spark DataFrame, AWS EMR, AWS S3, AWS EC2, ML Classification
ShauryaManiTripathi
My work around AWS (includes NGINX, EC2, S3, Autoscaling, RDS, Beanstalk, Hadoop, Spark)
Overcome mislabeling errors in genomics training sets by utilizing machine learning on AWS EC2 and Apache Spark.
Build a recommendation system for Twitter hashtags using Neo4j graph database running on Spark GraphX on an EC2 cluster on AWS.
Implemented spatial hotspot analysis on the NYC Yellow Cab taxi trip records using a Spark cluster set up on AWS EC2 instances. The aim was to analyse a huge dataset using distributed cluster-computing frameworks such as Apache Spark and Apache Sedona.
longNguyen010203
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), with Terraform to set up the infrastructure and Airflow integration to automate workflows 🥊
Architectshwet
A project about building a stacked model, tuning its hyperparameters with grid search and Hyperopt, using PySpark to test the model's performance on Spark clusters in AWS EC2, and using ROSE to balance the target variable