Found 28 repositories (showing 28)
jadianes
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
minimaxir
R Code + R Notebook for analyzing millions of Amazon reviews using Apache Spark
marcgarnica13
Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*. We evaluated the main differential features of European male and female football players in match-action data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their in-game actions. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline comprised three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw out the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs.
You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship, and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Although we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- A separate folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persisted models for testing.
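The three-model comparison described above can be sketched in miniature. This is an illustrative stand-in only: the study's real pipeline runs on PySpark over Opta event data, whereas here synthetic data and scikit-learn substitute for both, and the feature names and cluster parameters are invented for the example.

```python
# Hypothetical sketch of the study's three-model gender-classification
# pipeline. Synthetic data and scikit-learn stand in for the real
# PySpark/Opta pipeline; feature names and distributions are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_female, n_male = 1511, 2700                # data-point counts from the study
features = ["passes", "tackles", "shots", "dribbles"]  # assumed variable names

# Synthetic event-based metrics: two overlapping Gaussian clusters.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_female, len(features))),
    rng.normal(0.5, 1.0, size=(n_male, len(features))),
])
y = np.array([0] * n_female + [1] * n_male)  # 0 = female, 1 = male
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)
    print(f"{name}: test accuracy {scores[name]:.3f}")

# Simple explainability probes: which features drive each model?
print("tree importances:   ", models["decision_tree"].feature_importances_)
print("logit coefficients: ", models["logistic_regression"].coef_[0])
```

The study extracts explanations with dedicated explainability methods; the feature importances and coefficients printed here are only the simplest built-in analogue of that step.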
vezir
How to do big data analysis and machine learning with R on Apache Spark (SparkR), using IPython / Jupyter notebooks?
lgautier
Docker container for off-the-shelf jupyter notebook + Python + R + Spark/pyspark + LLVM
hilljb
Dockerized Jupyter notebooks (Python, R, Spark)
AliyunContainerService
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
scigility
Jupyter Notebook with Scala, Python, R, Spark, including SBT, based on jupyter/all-spark-notebook
krantirk
R Code + R Notebook for analyzing millions of Amazon reviews using Apache Spark
alokkumar70
No description available
IMTorgDemo
Demo notebooks using a variety of data science and programming tools, such as: spark, python, r, node, scala, java
fuguixing
An all-in-one Docker image for data scientists in Jupyter Notebook. Contains Spark, Python 2, Python 3, R, TensorFlow, etc.
johnlak
Test Spark R jupyter notebooks
natg76
Contains data science notebooks for PySpark & SparkR
Run Jupyter Notebook Python, Scala, R, Spark
oscarperez11
My Repository and Notebooks in R or Python or Spark
idekernel
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
rosszhang
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
notiv
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
vishnuratheesh
Experiments with the Jupyter All Spark Notebook. https://hub.docker.com/r/jupyter/all-spark-notebook/
ananyabadkar
Practice notebooks in Google Colab using PySpark with sample datasets for learning Data Processing, queries and analysis.
rguezmoralaura
No description available
AnttiRask
An R version of the Microsoft Learn notebook "Analyze Data with Apache Spark"
No description available
hsci-r
OpenShift compatible Spark image capable of being run as master, worker or notebook driver, including Python 3.11, R and Scala 2.12 notebooks
charleside2001
Learning objectives: business scenarios for Apache Spark; setting up a cluster; using Python, R, and Scala notebooks; scaling Azure Databricks workflows; data pipelines with Azure Databricks; machine learning architectures; using Azure Databricks for data warehousing
A framework built on resources from the Swedish National Infrastructure for Computing (SNIC), using Apache Spark, SparkR, the R language & Jupyter Notebook to enable computation of highly parallel scientific applications.
EddyGiusepe
Databricks is a cloud platform created by the developers of Apache Spark to process large volumes of data. It integrates data engineering, data science, and AI in collaborative notebooks with support for Python, SQL, R, and Scala. Compatible with AWS, Azure, and Google Cloud.