Search Results

Found 43 repositories (showing 30)

spark-py-notebooks

jadianes

💛81

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

1.7k

908

NOASSERTION

Jupyter Notebook

Updated 2 days ago

big-data, bigdata, data-analysis +9

ml-interpretability-european-football

marcgarnica13

❤️40

Understanding gender differences in professional European football through Machine Learning interpretability and match actions data. This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*. We evaluated the main differential features of European male and female football players in match-action data, under the assumption that significant differences and established patterns between genders would be found. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression and a multilayer perceptron Neural Network. Each model tried to capture the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle data heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides a Python API for writing Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship, and a single file with the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- A separate folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persistent models for testing.

MIT

Jupyter Notebook

Updated 1 year ago

DataBricks-PySpark-Notebooks

MWFK

❤️35

Data Engineering with Databricks Study Materials

Jupyter Notebook

Updated 1 year ago

PySpark-Notebooks

yashkathe

❤️35

ipynb notebooks for PySpark

Jupyter Notebook

Updated 1 year ago

PySpark-Notebooks

kishanpython

❤️25

No description available

Jupyter Notebook

Updated 2 years ago

Apache_Spark_Pynq_Xilinx_FPGA_Notebooks

utri092

❤️35

Compendium and backup of working examples for final year project

Jupyter Notebook

Updated 4 years ago

https-github.com-jadianes-spark-py-notebooks

ramironeto

❤️30

No description available

NOASSERTION

Jupyter Notebook

Updated 8 years ago

PySpark-SparkR-Notebooks

natg76

❤️35

Contains DS notebooks for PySpark and SparkR

Updated 7 years ago

sparkPyNotebooks

josh26z

❤️25

No description available

Jupyter Notebook

Updated 9 months ago

Apache-Spark-PySpark-Notebooks

psifio

❤️25

No description available

Python

Updated 10 years ago

PySpark-Notebooks

Abhilash0708

❤️40

Here I upload all my Spark- and PySpark-related work

MIT

Jupyter Notebook

Updated 3 years ago

PySpark-Notebooks

aneezx

❤️25

No description available

Jupyter Notebook

Updated 10 months ago

PySparkNotebooks

lausandt

❤️25

No description available

Jupyter Notebook

Updated 1 year ago

PySpark-Notebooks

anushasuresh348

❤️35

Worked on two datasets: Titanic survival and News Headline Classification. The Titanic dataset is attached; the news classification dataset is available at https://www.kaggle.com/datasets/uciml/news-aggregator-dataset

Jupyter Notebook

Updated 3 years ago

PySpark-Notebooks

Matheendev

❤️25

No description available

HTML

Updated 1 year ago

PySpark-Notebooks

SonakshiA

❤️35

A couple of notebooks to cut my teeth with PySpark :)

Jupyter Notebook

Updated 7 months ago

PySpark-Notebooks

eldferns86

❤️25

No description available

Updated 4 years ago

pySparkNotebooks

ashiva99

❤️35

No description available

Jupyter Notebook

Updated 2 months ago

PySpark-Notebooks

ahmad-hamed

❤️25

No description available

Jupyter Notebook

Updated 3 years ago

PySpark-Notebooks

jehuhta

❤️25

No description available

Updated 10 months ago

PySpark_notebooks

AnshuData

❤️25

No description available

Jupyter Notebook

Updated 3 years ago

PySparkNotebooks

ankesh86

❤️35

Contains Google Colab notebooks using PySpark with Data Engineering and Data Science projects

Jupyter Notebook

Updated 1 year ago

PySparkNotebooks

HariniMlc

❤️25

No description available

Jupyter Notebook

Updated 1 year ago

PySpark-Notebooks

marckx0

❤️35

Notebooks developed in a Big Data course

Jupyter Notebook

Updated 4 years ago

pySparkNotebooks

ibrahim99977

❤️25

No description available

HTML

Updated 1 year ago

PySpark-Notebooks

harshitlikhar

❤️35

PySpark tutorial notebooks

Jupyter Notebook

Updated 3 years ago

PySparkNotebooks

sarthak221995

❤️25

No description available

Updated 5 years ago

PySpark-Notebooks

vidush5

❤️35

This repository contains a list of scenario-based PySpark technical questions and answers

Jupyter Notebook

Updated 2 years ago

PySpark_notebooks

eshraaqsaeed

❤️35

Developed both regression and classification models to benchmark different algorithms. Fake-post detection using NLP.

Jupyter Notebook

Updated 2 years ago

PySpark-ML-Notebooks

ruidbras

❤️30

No description available

MIT

Jupyter Notebook

Updated 6 years ago
