Find a data set from Awesome Public Datasets that interests you. The data set must have at least 30 rows and 8 or more attributes (columns). At least 5 of the attributes must be numerical as opposed to categorical (labels). Read through "https://www.itl.nist.gov/div898/handbook/eda/eda.htm". Produce a Jupyter notebook (which you will submit here) that does a thorough EDA. You should have a 1-dimensional histogram of each variable in the data; for categorical variables, these will be bar charts. In the accompanying text, report the max, min, mean, median, and outliers. Put all the variables together in a single box plot so we can see them side by side (in addition to the individual histograms). Next, show the pairwise relationships with a matrix of scatter plots for the numerical data only. How would you show categorical vs. categorical data? Aggregate one variable against another. How would you show the relationship between categorical and numerical data (hint: color)?
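
The summary statistics and the categorical vs. categorical aggregation asked for above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the helper names `summarize` and `crosstab`, the column names, and the sample rows are all hypothetical, and the 1.5 × IQR rule is one common convention for flagging outliers, not a requirement of the assignment.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return min, max, mean, median, and 1.5*IQR outliers for one numeric column."""
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: a value is an outlier if it falls outside the fences.
        "outliers": [v for v in values if v < low or v > high],
    }


def crosstab(rows, col_a, col_b):
    """Count how often each pair of categories occurs (categorical vs. categorical)."""
    return Counter((row[col_a], row[col_b]) for row in rows)


# Made-up rows purely for illustration; a real data set needs 30+ rows.
rows = [
    {"species": "a", "site": "north", "mass": 1},
    {"species": "a", "site": "south", "mass": 2},
    {"species": "b", "site": "north", "mass": 3},
    {"species": "b", "site": "north", "mass": 100},
]

stats = summarize([row["mass"] for row in rows])
counts = crosstab(rows, "species", "site")
print(stats)
print(counts)
```

In a notebook, pandas would likely replace both helpers (for example `DataFrame.describe()` for the statistics and `pandas.crosstab` for the counts), and the resulting count table could be drawn as a grouped bar chart or heat map.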