Found 28 repositories (showing 28)
jadianes
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
minimaxir
R Code + R Notebook for analyzing millions of Amazon reviews using Apache Spark
marcgarnica13
Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*. We evaluated the main differential features of European male and female football players in match-action data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their in-game actions. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline comprised three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw out the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs.
You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship, and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Although we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- A separate folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persisted models for testing.
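The three-model comparison described above can be sketched in miniature. This is an illustrative stand-in only: the study's real pipeline runs on PySpark over Opta event data, whereas here synthetic data and scikit-learn substitute for both, and the feature names and cluster parameters are invented for the example.

```python
# Hypothetical sketch of the study's three-model gender-classification
# pipeline. Synthetic data and scikit-learn stand in for the real
# PySpark/Opta pipeline; feature names and distributions are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_female, n_male = 1511, 2700                # data-point counts from the study
features = ["passes", "tackles", "shots", "dribbles"]  # assumed variable names

# Synthetic event-based metrics: two overlapping Gaussian clusters.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_female, len(features))),
    rng.normal(0.5, 1.0, size=(n_male, len(features))),
])
y = np.array([0] * n_female + [1] * n_male)  # 0 = female, 1 = male
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)
    print(f"{name}: test accuracy {scores[name]:.3f}")

# Simple explainability probes: which features drive each model?
print("tree importances:   ", models["decision_tree"].feature_importances_)
print("logit coefficients: ", models["logistic_regression"].coef_[0])
```

The study extracts explanations with dedicated explainability methods; the feature importances and coefficients printed here are only the simplest built-in analogue of that step.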
vezir
How to do big data analysis and machine learning with R on Apache Spark (SparkR), using IPython / Jupyter notebooks?
lgautier
Docker container for off-the-shelf jupyter notebook + Python + R + Spark/pyspark + LLVM
hilljb
Dockerized Jupyter notebooks (Python, R, Spark)
AliyunContainerService
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
scigility
Jupyter Notebook with Scala, Python, R, Spark, including SBT, based on jupyter/all-spark-notebook
krantirk
R Code + R Notebook for analyzing millions of Amazon reviews using Apache Spark
alokkumar70
No description available
IMTorgDemo
Demo notebooks using a variety of data science and programming tools, such as: spark, python, r, node, scala, java
fuguixing
An all-in-one Docker image for data scientists in Jupyter Notebook. Contains Spark, Python 2, Python 3, R, TensorFlow, etc.
johnlak
Test Spark R jupyter notebooks
natg76
Contains data science notebooks for PySpark & SparkR
Run Jupyter Notebook Python, Scala, R, Spark
oscarperez11
My Repository and Notebooks in R or Python or Spark
idekernel
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
rosszhang
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
notiv
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
vishnuratheesh
Experiments with the Jupyter All Spark Notebook. https://hub.docker.com/r/jupyter/all-spark-notebook/
ananyabadkar
Practice notebooks in Google Colab using PySpark with sample datasets for learning Data Processing, queries and analysis.
rguezmoralaura
No description available
AnttiRask
An R version of the Microsoft Learn notebook "Analyze Data with Apache Spark"
No description available
hsci-r
OpenShift compatible Spark image capable of being run as master, worker or notebook driver, including Python 3.11, R and Scala 2.12 notebooks
charleside2001
Learning objectives: business scenarios for Apache Spark; setting up a cluster; using Python, R, and Scala notebooks; scaling Azure Databricks workflows; data pipelines with Azure Databricks; machine learning architectures; using Azure Databricks for data warehousing
A framework built on resources from the Swedish National Infrastructure for Computing (SNIC), using Apache Spark, SparkR, the R language & Jupyter Notebook to enable computation of highly parallel scientific applications.
EddyGiusepe
Databricks is a cloud platform created by the developers of Apache Spark to process large volumes of data. It integrates data engineering, data science, and AI in collaborative notebooks with support for Python, SQL, R, and Scala. Compatible with AWS, Azure, and Google Cloud.