Search Results

Found 66 repositories (showing 30)

all-spark-notebook

whole-tale

❤️30

Jupyter Notebook with Spark support extracted from jupyter/docker-stack

Python

Updated 8 months ago

ml-interpretability-european-football

marcgarnica13

❤️40

Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*.

We evaluated the main differentiating features of European male and female football players in in-match actions data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance.

We set up a supervised classification pipeline to predict the gender of each player by looking at their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Good predictive accuracy was consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file calculating the event-based metrics from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- A separate folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment as well as some persistent models for testing.

MIT

Jupyter Notebook

Updated 1 year ago

public-notebooks

kensuio-oss

❤️30

A repo for all kinds of notebooks to demo the Spark Notebook, Spark, Scala, ...

Updated 5 years ago

Apache-Spark-Tutorials

amanjeetsahu

🧡50

This repo contains my learnings and practice notebooks on Spark using PySpark (the Python language API for Spark). All the notebooks in the repo can be used as template code for most ML algorithms and can be built upon for more complex problems.

CC0-1.0

Jupyter Notebook

Updated 1 month ago

big-data, bigdata, machine-learning, +5

all-spark-go-notebook

adamyordan

❤️35

Docker image jupyter/all-spark-notebook with added Go kernel from gophernotes

Dockerfile

Updated 6 years ago

Jupyterhub-spark-python-k8s

azfaraziz

❤️35

The installation of Jupyterhub + all-spark-notebook + Kubernetes locally

Jupyter Notebook

Updated 3 years ago

jupyter-notebook, jupyterhub, kubernetes, +1

ApacheSpark_with_Scala

saryamane

❤️35

This repository hosts all the notebooks for Apache Spark using Scala

Scala

Updated 5 years ago

docker-jupyter-spark2

Archethought

❤️35

Merge of Docker Stacks pyspark-notebook and all-spark-notebook updated to Spark 2.0.2

Jupyter Notebook

Updated 9 years ago

docker-jupyter-spark2

cownby

❤️35

Merge of Docker Stacks pyspark-notebook and all-spark-notebook updated to Spark 2.0.2

Updated 9 years ago

Modern-data-stack-in-an-hour

ekote

❤️35

Learn how to build an end-to-end lakehouse architecture all the way from ingestion to reporting. In this lab, learn how to ingest data into the lakehouse leveraging our Data Integration capabilities. Then, see how you can use notebooks and Spark to transform your data at scale.

Jupyter Notebook

Updated 3 months ago

all-spark-education-notebook

datainpoint

❤️35

all-spark-education-notebook is a community-maintained Jupyter Docker Stack image.

Python

Updated 5 years ago

all-in-one-notebook

scigility

❤️35

Jupyter Notebook with Scala, Python, R, Spark, including SBT, based on jupyter/all-spark-notebook

Dockerfile

Updated 4 years ago

sparkAllInOne

CharlesDLandau

❤️40

Provision, run and delete a Spark cluster all from your Jupyter notebook.

MIT

Jupyter Notebook

Updated 6 years ago

Azure-Data-Engineer-Stuffs

tanujit

❤️35

You can find all kinds of Azure data engineering material here, such as ADF pipelines, Databricks Spark notebooks, etc.

Updated 2 years ago

jupyter-docker-stack

fuguixing

❤️40

An all-in-one Docker image for data scientists using Jupyter Notebook. Contains Spark, Python 2, Python 3, R, TensorFlow, etc.

MIT

Dockerfile

Updated 4 years ago

docker, docker-compose, docker-container, +4

python_data_science_bootcamp

Sweeteally

❤️35

Are you ready to start your path to becoming a Data Scientist? This comprehensive course will be your guide to learning how to use the power of Python to analyze data, create beautiful visualizations, and use powerful machine learning algorithms! Data Scientist has been ranked the number one job on Glassdoor, and the average salary of a data scientist is over $120,000 in the United States according to Indeed! Data Science is a rewarding career that allows you to solve some of the world's most interesting problems!

This course is designed both for beginners with some programming experience and for experienced developers looking to make the jump to Data Science! This comprehensive course is comparable to other Data Science bootcamps that usually cost thousands of dollars, but now you can learn all that information at a fraction of the cost! With over 100 HD video lectures and detailed code notebooks for every lecture, this is one of the most comprehensive courses for data science and machine learning on Udemy! We'll teach you how to program with Python, how to create amazing data visualizations, and how to use Machine Learning with Python!

Here are just a few of the topics we will be learning:

- Programming with Python
- NumPy with Python
- Using pandas Data Frames to solve complex tasks
- Use pandas to handle Excel Files
- Web scraping with Python
- Connect Python to SQL
- Use matplotlib and seaborn for data visualizations
- Use plotly for interactive visualizations
- Machine Learning with SciKit Learn, including: Linear Regression, K Nearest Neighbors, K Means Clustering, Decision Trees, Random Forests, Natural Language Processing, Neural Nets and Deep Learning, Support Vector Machines
- and much, much more!

Enroll in the course and become a data scientist today!

What are the requirements?

- Some programming experience
- Admin permissions to download files

What will I learn in this course?

- Use Python for Data Science and Machine Learning
- Use Spark for Big Data Analysis
- Implement Machine Learning Algorithms
- Learn to use NumPy for Numerical Data
- Learn to use Pandas for Data Analysis
- Learn to use Matplotlib for Python Plotting
- Learn to use Seaborn for statistical plots
- Use Plotly for interactive dynamic visualizations
- Use SciKit-Learn for Machine Learning Tasks
- K-Means Clustering
- Logistic Regression
- Linear Regression
- Random Forest and Decision Trees
- Natural Language Processing and Spam Filters
- Neural Networks
- Support Vector Machines

Who is the target audience?

- This course is meant for people with at least some programming experience

Jupyter Notebook

Updated 1 year ago

docker_ds_tutorial

joshua-staples

❤️35

A tutorial for setting up and using a Docker all-spark-notebook image/container for data science work.

Jupyter Notebook

Updated 3 years ago

all-spark-notebook

psyoblade

❤️35

Docker image for Apache Spark hands-on practice (stored on AWS S3)

Jupyter Notebook

Updated 1 year ago

all-spark-notebook

idekernel

❤️35

Jupyter Notebook Python, Scala, R, Spark, Mesos Stack

Shell

Updated 8 years ago

all-spark-yarn-notebook

ultimoguerrero

❤️40

Contains files required to create a Jupyter Notebook container with Spark and Yarn support

Apache-2.0

Shell

Updated 8 years ago

all-spark-scrapy-notebook

amavzyutov

❤️25

No description available

Updated 8 years ago

customized-all-spark-notebook

stankiewicz

❤️25

No description available

Jupyter Notebook

Updated 8 years ago

custom-all-spark-notebook

ernane

❤️40

Jupyter Docker Stacks - https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html

MIT

Dockerfile

Updated 3 years ago

jupyter-all-spark-notebook

vishnuratheesh

❤️40

Experiments with the Jupyter All Spark Notebook. https://hub.docker.com/r/jupyter/all-spark-notebook/

MIT

Jupyter Notebook

Updated 5 years ago

docker-swarm, jupyter, jupyter-notebook, +1

all-spark-notebooks-custom

jiekebo

❤️35

Minor customizations to the jupyter all-spark-notebooks docker image

Python

Updated 8 years ago

jupyter-all-spark-notebook-plus

scr512

❤️35

Docker image that builds/extends jupyter/all-spark-notebook to include things I find useful.

Dockerfile

Updated 4 years ago

environment-jovyans-all-spark-notebook

jovyans

❤️20

Environment: Jovyans All Spark Notebook.

Dockerfile

Updated 3 years ago

docker-all-spark-notebook-custom

shadowandy

❤️25

No description available

Updated 8 years ago

jupyter_all_spark_notebook_docker

mrNicky

❤️35

Run Jupyter Notebook Python, Scala, R, Spark

Updated 6 years ago

bchwtz-all-spark-hadoop-notebook

bchwtz

❤️35

Docker Image for Data Engineering Examples

Dockerfile

Updated 5 years ago
