Found 66 repositories (showing 30)
whole-tale
Jupyter Notebook with Spark support extracted from jupyter/docker-stack
marcgarnica13
Understanding gender differences in professional European football through Machine Learning interpretability and match actions data.

This repository contains the full data pipeline implemented for the study *Understanding gender differences in professional European football through Machine Learning interpretability and match actions data*.

We evaluated the main differentiating features of European male and female football players in match actions data, under the assumption of finding significant differences and established patterns between genders. A methodology for unbiased feature extraction and objective analysis is presented, based on data integration and machine learning explainability algorithms. Female (1511) and male (2700) data points were collected from event data categorized by game period and player position. Each data point included the main tactical variables supported by research and industry to evaluate and classify football styles and performance. We set up a supervised classification pipeline to predict the gender of each player from their actions in the game. The comparison methodology did not include any qualitative enrichment or subjective analysis, to prevent biased data enhancement or gender-related processing. The pipeline had three representative binary classification models: a logic-based Decision Tree, a probabilistic Logistic Regression, and a multilayer perceptron Neural Network. Each model tried to draw out the differences between male and female data points, and we extracted the results using machine learning explainability methods to understand the underlying mechanics of the models implemented. Prediction accuracy was good and consistent across the different models deployed.

## Installation

Install the required Python packages:

```
pip install -r requirements.txt
```

To handle heterogeneity and performance efficiently, we use PySpark from [Apache Spark](https://spark.apache.org/). PySpark provides an end-user API for Spark jobs. You might want to check how to set up a local or remote Spark cluster in [their documentation](https://spark.apache.org/docs/latest/api/python/index.html).

## Repository structure

This repository is organized as follows:

- Preprocessed data from the two different data streams is collected in [the data folder](data/). For the Opta files, it contains the event-based metrics computed from each match of the 2017 Women's Championship and a single file with the event-based metrics computed from the 2016 Men's Championship published [here](https://figshare.com/collections/Soccer_match_event_dataset/4415000/5). Even though we cannot publish the original data source, the two Python scripts implemented to homogenize and integrate both data streams into event-based metrics are included in [the data gathering folder](data_gathering/).
- Another folder contains the graphical images and media used for the report.
- The [data cleaning folder](data_cleaning/) contains descriptor scripts for both data streams and [the final integration](data_cleaning/merger.py).
- [Classification](classification/) contains all the Jupyter notebooks for each model present in the experiment, as well as some persisted models for testing.
kensuio-oss
A repo for all kinds of notebooks to demo the Spark Notebook, Spark, Scala, ...
amanjeetsahu
This repo contains my learning and practice notebooks on Spark using PySpark (the Python language API for Spark). All the notebooks in the repo can be used as template code for most ML algorithms and can be built upon for more complex problems.
adamyordan
Docker image jupyter/all-spark-notebook with added Go kernel from gophernotes
azfaraziz
Local installation of JupyterHub + all-spark-notebook + Kubernetes
saryamane
This repository will host all the notebooks for Apache Spark using Scala
Archethought
Merge of Docker Stacks pyspark-notebook and all-spark-notebook updated to Spark 2.0.2
cownby
Merge of Docker Stacks pyspark-notebook and all-spark-notebook updated to Spark 2.0.2
Learn how to build an end-to-end lakehouse architecture all the way from ingestion to reporting. In this lab, learn how to ingest data into the lakehouse leveraging our Data Integration capabilities. Then, see how you can use notebooks and Spark to transform your data at scale.
datainpoint
all-spark-education-notebook is a community maintained Jupyter Docker Stack image.
scigility
Jupyter Notebook with Scala, Python, R, Spark, including SBT, based on jupyter/all-spark-notebook
CharlesDLandau
Provision, run and delete a Spark cluster all from your Jupyter notebook.
tanujit
You can find all kinds of Azure data engineering material here, such as ADF pipelines, Databricks Spark notebooks, etc.
fuguixing
An all-in-one Docker image for data scientists using Jupyter Notebook. Contains Spark, Python 2, Python 3, R, TensorFlow, etc.
Sweeteally
Are you ready to start your path to becoming a Data Scientist? This comprehensive course will be your guide to learning how to use the power of Python to analyze data, create beautiful visualizations, and use powerful machine learning algorithms! Data Scientist has been ranked the number one job on Glassdoor and the average salary of a data scientist is over $120,000 in the United States according to Indeed! Data Science is a rewarding career that allows you to solve some of the world's most interesting problems! This course is designed for both beginners with some programming experience and experienced developers looking to make the jump to Data Science! This comprehensive course is comparable to other Data Science bootcamps that usually cost thousands of dollars, but now you can learn all that information at a fraction of the cost! With over 100 HD video lectures and detailed code notebooks for every lecture, this is one of the most comprehensive courses for data science and machine learning on Udemy! We'll teach you how to program with Python, how to create amazing data visualizations, and how to use Machine Learning with Python!

Here are just a few of the topics we will be learning:

- Programming with Python
- NumPy with Python
- Using pandas Data Frames to solve complex tasks
- Use pandas to handle Excel Files
- Web scraping with Python
- Connect Python to SQL
- Use matplotlib and seaborn for data visualizations
- Use plotly for interactive visualizations
- Machine Learning with SciKit Learn, including:
  - Linear Regression
  - K Nearest Neighbors
  - K Means Clustering
  - Decision Trees
  - Random Forests
  - Natural Language Processing
  - Neural Nets and Deep Learning
  - Support Vector Machines
- and much, much more!

Enroll in the course and become a data scientist today!

What are the requirements?

- Some programming experience
- Admin permissions to download files

What will I learn in this course?

- Use Python for Data Science and Machine Learning
- Use Spark for Big Data Analysis
- Implement Machine Learning Algorithms
- Learn to use NumPy for Numerical Data
- Learn to use Pandas for Data Analysis
- Learn to use Matplotlib for Python Plotting
- Learn to use Seaborn for statistical plots
- Use Plotly for interactive dynamic visualizations
- Use SciKit-Learn for Machine Learning Tasks
- K-Means Clustering
- Logistic Regression
- Linear Regression
- Random Forest and Decision Trees
- Natural Language Processing and Spam Filters
- Neural Networks
- Support Vector Machines

Who is the target audience?

- This course is meant for people with at least some programming experience
joshua-staples
A tutorial for setting up and using a Docker all-spark-notebook image/container for data science work.
psyoblade
Docker image for hands-on Apache Spark practice (AWS S3 storage)
idekernel
Jupyter Notebook Python, Scala, R, Spark, Mesos Stack
ultimoguerrero
Contains files required to create a Jupyter Notebook container with Spark and Yarn support
amavzyutov
No description available
stankiewicz
No description available
Jupyter Docker Stacks - https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html
vishnuratheesh
Experiments with the Jupyter All Spark Notebook. https://hub.docker.com/r/jupyter/all-spark-notebook/
jiekebo
Minor customizations to the jupyter all-spark-notebooks docker image
Docker image that builds/extends jupyter/all-spark-notebook to include things I find useful.
Environment: Jovyans All Spark Notebook.
shadowandy
No description available
Run Jupyter Notebook Python, Scala, R, Spark
Docker Image for Data Engineering Examples