Search Results

Found 1,002 repositories (showing 30)

GlobalCoreTech-DataScience-Internship

FarhaKousar1601

🧡50

This repository contains resources, code, and projects related to the Global Core Tech Internship on Data Science with Python. Explore the world of data science with Python, learn NumPy, Pandas, Matplotlib, and EDA, and work on exciting data science projects. Elevate your skills and knowledge in the field of data analysis and visualization.

MIT

Jupyter Notebook

Updated 2 months ago

data eda intern +2

UK-Train-Rides-

AshnaJ4991

❤️35

Railway ticket records with Python and Pandas, covering purchase type, payment, journey times, delays, and refunds. Explores pricing patterns, customer behavior, and delay reasons. Includes visualizations and stats for EDA, time series insights, and improving rail service efficiency.

Python

Updated 6 months ago

Indian-Rainfall-Data-Analysis

ajaygangwar945

🧡55

Comprehensive Rainfall EDA & Machine Learning project. Built with Python (Pandas/Scikit-learn) and a premium web dashboard using Three.js and Chart.js.

Jupyter Notebook

Updated 3 weeks ago

data-science eda jupyter-notebook +4

chat-mining-whatsapp

avinashreddy1235

❤️45

EDA of my personal WhatsApp chat data to uncover insights like who sent the most messages, chat frequency, and usage trends. Built with Python, pandas, and matplotlib. A fun and educational project exploring communication patterns using real-world text data.

Jupyter Notebook

Updated 2 months ago

EDA-with-Pandas-Numpy-and-Python

vin725k

❤️35

In this repository, I perform exploratory data analysis on datasets.

Jupyter Notebook

Updated 5 years ago

TikTok-Claims-Classification-End-to-End-Analysis-and-Modeling

Cyberoctane29

❤️45

This project involves analyzing TikTok videos to classify claims vs. opinions using Python. It includes EDA, statistical tests, logistic regression, and ML models (Random Forest, XGBoost) to support content moderation. Built with pandas, scikit-learn, and Tableau, the solution helps TikTok automate content review and enhance moderation efficiency.

Jupyter Notebook

Updated 1 month ago

content-moderation data-analytics data-visualization +9

profiling

sjapanjots

❤️35

This web application is built with Python and Streamlit. The repository helps perform EDA (Exploratory Data Analysis) using the pandas-profiling library in Python. The application also helps analyze the target variable using its modelling function.

Python

Updated 5 months ago

data-science machine-learning numpy +6

EDA-on-Car-Features-and-Price

Prafulbhoyar45

❤️35

Exploratory data analysis (EDA) examines data to bring out insights. It's storytelling: the story the data is trying to tell. EDA analyzes data with the help of various tools and graphical techniques such as bar plots and histograms. Python libraries such as pandas, NumPy, matplotlib, and seaborn support this analysis and help bring out useful insights. I will be using Jupyter Notebook along with these libraries.

Jupyter Notebook

Updated 3 years ago

eda feature-engineering python

EDA-with-python-and-pandas

Mahima9861

❤️35

Performs Exploratory Data Analysis (EDA) on a supermarket sales dataset. This is accomplished by completing each task in the project: Task 1: Initial Data Exploration; Task 2: Univariate Analysis; Task 3: Bivariate Analysis; Task 4: Dealing With Duplicate Rows and Missing Values; Task 5: Correlation Analysis.
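Tasks 4 and 5 in the description above (duplicate rows, missing values, correlation) might look roughly like the sketch below. It is not taken from the repository: the column names and sample rows are hypothetical, and it uses only the Python standard library so that it runs without extra packages, whereas the project itself is described as using pandas.

```python
from math import sqrt

# Hypothetical supermarket sales rows; column names and values are made up.
ROWS = [
    {"branch": "A", "unit_price": 10.0, "quantity": 2, "total": 20.0},
    {"branch": "B", "unit_price": 20.0, "quantity": 1, "total": 20.0},
    {"branch": "A", "unit_price": 10.0, "quantity": 2, "total": 20.0},  # duplicate row
    {"branch": "C", "unit_price": 30.0, "quantity": None, "total": 90.0},  # missing value
    {"branch": "B", "unit_price": 40.0, "quantity": 4, "total": 160.0},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each row (Task 4)."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_counts(rows):
    """Count None values per column (Task 4)."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (Task 5)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


unique_rows = drop_duplicates(ROWS)
missing = missing_counts(unique_rows)
# Use only rows without missing values for the correlation.
complete = [r for r in unique_rows if None not in r.values()]
corr = pearson([r["unit_price"] for r in complete],
               [r["total"] for r in complete])
print(len(unique_rows), missing, round(corr, 3))
```

Dropping incomplete rows before computing the correlation is one simple choice; imputing the missing values would be another.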

Jupyter Notebook

Updated 2 years ago

Data-Analysis

prathgithub

❤️35

Performed exploratory data analysis (EDA) and visualization using Python libraries in Google Colab. Includes data cleaning, transformation, statistical insights, and interactive visualizations with Pandas, Matplotlib, and Seaborn.

Jupyter Notebook

Updated 7 months ago

Energy-Supply-and-Renewable-Electricity-Production-Project

ZiadAhmed10

❤️35

A Python project that prepares and preprocesses the data using the pandas and NumPy packages, performs EDA with pandas functions, identifies problems that may arise during the analysis, cleans the data, and extracts some initial insights.

Jupyter Notebook

Updated 3 years ago

data-analysis data-cleaning numpy +2

FUNDAMENTAL-STOCKS-ANALYSIS-EDA

manishhemnani06

❤️35

Scraping and fundamental analysis of a few stocks using different Python libraries: Selenium for scraping, Pandas for EDA, and Matplotlib and Seaborn for charts and presentation.

HTML

Updated 1 year ago

html matplotlib pandas +3

Co2-Emission-EDA-by-Aayush

aayushdhote

❤️35

This project analyzes CO₂ emissions data for the top 10 emitting countries from 2018 to 2020 with EDA techniques, using Python and libraries such as Pandas, Seaborn, and Matplotlib.

Jupyter Notebook

Updated 3 months ago

IPL-Data-Analysis-Case-Study

Pankaj-Str

❤️40

Explore IPL data using Python libraries like Pandas, NumPy, and Matplotlib. Analyze team and player performance, match outcomes, toss impact, and trends through visualizations and insights. A great project for data analysis and EDA practice with real-world sports data.

GPL-3.0

Jupyter Notebook

Updated 6 months ago

Machine-Learning-Portfolio-Project-Loan-Approval-Prediction-Credit-Risk-Analysis

24pwai0032-gif

🧡50

A machine learning project for automated loan approval prediction in banking. Built with Python, scikit-learn, and pandas. Features comprehensive EDA, 4 ML models (Random Forest, Decision Tree, KNN, Logistic Regression), achieving 92%+ ROC-AUC. Includes model interpretability, business insights, and production-ready code. Portfolio project.

MIT

Jupyter Notebook

Updated 1 month ago

Uber-case-study

vnsgamer

❤️35

Introduction: This data set is a masked data set similar to what data analysts at Uber handle. Solving this assignment will give you an idea of how problems are systematically solved using EDA and data visualisation.
Business Understanding: You may have some experience of travelling to and from the airport. Have you ever used Uber or any other cab service for this travel? Did you at any time face the problem of cancellation by the driver or non-availability of cars? If these are the problems faced by customers, these very issues also impact the business of Uber. If drivers cancel the request of riders or if cars are unavailable, Uber loses out on its revenue. As an analyst, you decide to address the problem Uber is facing: driver cancellation and non-availability of cars leading to loss of potential revenue.
Business Objectives: The aim of the analysis is to identify the root cause of the problem (i.e. cancellation and non-availability of cars) and recommend ways to improve the situation. As a result of your analysis, you should be able to present to the client the root cause(s) and possible hypotheses of the problem(s) and recommend ways to improve them.
There are six attributes associated with each request made by a customer:
1. Request id: A unique identifier of the request
2. Time of request: The date and time at which the customer made the trip request
3. Drop-off time: The drop-off date and time, in case the trip was completed
4. Pick-up point: The point from which the request was made
5. Driver id: The unique identification number of the driver
6. Status of the request: The final status of the trip, which can be either completed, cancelled by the driver, or no cars available
Note: For this assignment, only the trips to and from the airport are being considered.
Results Expected:
1. Visually identify the most pressing problems for Uber. Hint: Create plots to visualise the frequency of requests that get cancelled or show 'no cars available'; identify the most problematic types of requests (city to airport, airport to city, etc.) and the time slots (early mornings, late evenings, etc.) using plots.
2. Find out the gap between supply and demand and show the same using plots.
a. Find the time slots when the highest gap exists
b. Find the types of requests (city-airport or airport-city) for which the gap is the most severe in the identified time slots
3. What do you think is the reason for the supply-demand gap? Write the answer in less than 100 words. You may accompany the write-up with plot(s).
4. Recommend some ways to resolve the supply-demand gap.
IDE: Jupyter Notebook
Language: Python
Libraries: NumPy, Pandas, Matplotlib, Seaborn
Please do explore the dataset further on your own and see what other insights you can get across the other columns.
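The supply-demand gap asked for in item 2 above can be sketched as follows. This is only an illustration, not the assignment's solution: the sample rows, the time-slot boundaries, and the function names are assumptions, and supply is taken to mean completed trips. It uses only the Python standard library so that it runs without extra packages.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (time of request, pick-up point, status).
# The values are made up for illustration; they are not from the dataset.
REQUESTS = [
    ("2016-07-11 05:10", "City", "cancelled"),
    ("2016-07-11 05:40", "City", "completed"),
    ("2016-07-11 06:15", "City", "cancelled"),
    ("2016-07-11 13:00", "City", "completed"),
    ("2016-07-11 18:05", "Airport", "no cars available"),
    ("2016-07-11 18:30", "Airport", "no cars available"),
    ("2016-07-11 19:20", "Airport", "completed"),
    ("2016-07-11 20:45", "Airport", "no cars available"),
]


def time_slot(hour):
    """Map an hour (0-23) to a coarse slot; the boundaries are an assumption."""
    if 4 <= hour < 10:
        return "early morning"
    if 17 <= hour < 22:
        return "late evening"
    return "other"


def supply_demand_gap(requests):
    """Return {(slot, pick-up point): demand - supply}.

    Demand counts every request; supply counts completed trips only.
    """
    demand = Counter()
    supply = Counter()
    for requested_at, pickup, status in requests:
        hour = datetime.strptime(requested_at, "%Y-%m-%d %H:%M").hour
        key = (time_slot(hour), pickup)
        demand[key] += 1
        if status == "completed":
            supply[key] += 1
    return {key: demand[key] - supply[key] for key in demand}


gaps = supply_demand_gap(REQUESTS)
worst = max(gaps, key=gaps.get)  # slot and request type with the largest gap
print(gaps)
print(worst)
```

Grouping by both time slot and pick-up point covers sub-items a and b in one pass; the resulting counts could then be plotted as a bar chart.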

Jupyter Notebook

Updated 5 months ago

Predicting-car-accidents-report.lpynb

amimba09

❤️35

ANALYZING ROAD SAFETY & TRAFFIC DEMOGRAPHICS IN THE UK (Multi-class Classification)
SUMMARY
Here, I aim to analyze the Road Safety and Traffic Demographics dataset (UK), containing accidents reported by the police between 2004 and 2017.
PROJECT GOALS
Identify factors responsible for most of the reported accidents.
Build a machine learning model that is capable of accurately predicting the severity of an accident.
Provide recommendations to the Department of Transport (UK Government) to improve road safety policies and prevent recurrences of severe accidents where possible.
PACKAGES USED
Scikit-learn, NumPy, pandas, imblearn (imbalanced-learn), seaborn, Matplotlib
MOTIVATION
The World Health Organization (WHO) reported that more than 1.25 million people die each year, while 50 million are injured, as a result of road accidents worldwide. Road accidents are the 10th leading cause of death globally. On current trends, road traffic accidents are set to become the 7th leading cause of death by 2030, making them a major public health concern. Between 2005 and 2016, there were roughly 2 million road accidents reported in the United Kingdom (UK) alone, of which 16,000 were fatal. As a big data project, I wanted to explore the traffic demographics data in greater detail using machine learning.
CONTEXT
The UK government amassed traffic data from 2004 to 2017, recording over 2 million accidents in the process and making this one of the most comprehensive traffic data sets out there. It's a huge picture of a country undergoing change. Note that all the contained accident data comes from police reports, so this data does not include minor incidents.
For steps undertaken to pre-process and clean the data, please view the "Data Cleansing & Descriptive Analysis_UK Traffic Demographics.ipynb" file.
DESCRIPTIVE ANALYTICS (EDA)
Tools used include Python, Tableau, and MS PowerBI.
Percent (%) distribution of target classes
[Figure: Percent distribution of accident severity]
As seen above, the data is highly imbalanced. For detailed steps undertaken to deal with the imbalanced data, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file. This article provides some great tips on using the correct performance metrics when analyzing the performance of a model trained on an imbalanced dataset. This article describes several strategies that can help with a severely imbalanced dataset. Methods include:
Resampling strategies (under-sampling: Tomek Links, Cluster Centroids; over-sampling: SMOTE)
Using Decision Tree based models
Using Cost-Sensitive training (penalized algorithms)
Number of accidents by Year and Accident Severity
[Figure: Total accidents by year and severity]
It can be seen above that the trend seems to be increasing over the years. In addition, the spike between 2008 and 2009 was because of an enhancement in the reporting system introduced in the UK in 2009, where all accidents, including minor accidents, needed to be reported by the police so as to match the counts represented by hospitals, insurance claims, etc.
Accident density by Location
[Figure: geomap]
Most accidents took place in major cities: Birmingham, London, Leeds, Newcastle.
Accidents by Gender and Age
[Figure: Accidents by gender and age]
Accidents by Day of the week and Year
[Figure: Accidents by year and weekday]
Most accidents take place on a Friday.
Vehicle Manoeuvre at time of accident
[Figure: Vehicle manoeuvre at time of accident]
Most accidents take place as a result of overtaking.
For more findings, please go to the "Images" folder.
For steps undertaken to carry out some predictive modeling and hyper-parameter tuning, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file.
RECOMMENDATIONS TO THE DEPARTMENT OF TRANSPORT (UK)
Decrease emergency response times during afternoon rush hours (15-19), especially on Fridays.
Allocate resources to investigate high-density traffic points and identify new infrastructure needs to divert traffic from dual carriageways.
Explore conditions of vehicles and casualties, such as vehicle type, age of vehicles registered, pedestrian movements, etc., for policy makers.
Adopt comprehensive distracted driving laws that increase penalties for drivers who commit traffic violations like aggressive overtaking.
ACKNOWLEDGEMENTS
The license for this dataset is the Open Government Licence used by all data on data.gov.uk. The raw datasets are available from the UK Department of Transport website. I had a lot of fun working on this dataset and learned a lot in the process. I plan to further my research in the area of predictive modeling using imbalanced data and how to effectively build a highly robust model for future projects.
Topics: accident-rate accident-severity imbalanced-data imbalanced-learning road-accident reported-accidents road-safety uk-government transport traffic-demographics severe-accidents pca classification

Updated 1 year ago

EDA-With-Python-and-Pandas

Muhammad-Rebaal

❤️35

Explore, Analyze, and Visualize Data with Python and Pandas: Dive into the world of data analysis and visualization using Python and Pandas with this repository. Discover valuable insights, clean and preprocess your data, and create stunning visualizations to make data-driven decisions.

Jupyter Notebook

Updated 2 years ago

calmap matplotlib numpy +4

Data-Science-Learning

ggauravky

🧡60

A structured learning repository for Data Science using Python. Covers Data Cleaning, EDA, and visualization with Pandas, NumPy, Matplotlib, Seaborn, and more.

MIT

Jupyter Notebook

Updated 3 weeks ago

data-science jupyter-notebook matplotlib +4

Exploratory-Data-Analysis-on-Employee-Attrition

Ahmed-M-Fayad

❤️35

EDA on Employee Attrition Dataset: This repository includes data cleaning, feature engineering, visualizations, and analysis of key factors influencing employee turnover, with raw and cleaned datasets, a Jupyter notebook, and Python scripts. Tools used: Python, Pandas, Matplotlib, Seaborn.

Jupyter Notebook

Updated 1 year ago

US-accidents-eda

Mahii0107

❤️45

An interactive Streamlit dashboard analyzing US traffic accident patterns across time, weather, and severity levels. Features comprehensive EDA with visualizations exploring temporal trends, light conditions, and weather impact on accident severity. Built with Python, Pandas, and Seaborn.

Jupyter Notebook

Updated 2 months ago

StockForecastHub

HamsaVardhiniS

❤️45

This project forecasts stock prices using Yahoo Finance data and time-series models (ARIMA, SARIMA, XGBoost, LSTM). It features EDA, visualization, and an interactive Streamlit app. Built with Python, Pandas, Scikit-learn, and Plotly.

Python

Updated 2 months ago

IPL-Data-Analysis

868Rahul

💛70

End-to-end IPL analytics and machine learning project with data cleaning, EDA, feature engineering, Gradient Boosting model, and Streamlit app to predict 50+ scores. Built using Python, Pandas, Scikit-learn, Matplotlib, and Seaborn.

MIT

Jupyter Notebook

Updated 11 hours ago

Pristinizer-pyProject

harmanbajwa2954

🧡60

Pristinizer is a lightweight Python package for automatic data cleaning, exploratory data analysis (EDA), and missing data visualization for pandas DataFrames. It helps data scientists and ML engineers quickly clean and understand datasets with minimal effort.

MIT

Python

Updated 3 weeks ago

data data-science data-visualization +6

Project_EDA

IagoG7

❤️35

Exploratory Data Analysis (EDA) of fencing competitions. Using Python, pandas, and matplotlib to uncover patterns in performance, ranking, and event dynamics. This project merges data science skills with my personal experience as a national-level fencer.

Updated 7 months ago

footwear-analysis

AnakhaBiju7

❤️40

Analyze footwear e-commerce data with EDA, predictive modeling (XGBoost), clustering (K-Means), anomaly detection, fairness analysis, and association rule mining using Python. Utilizes Pandas, Scikit-learn, SHAP, and MLxtend to uncover pricing trends, product insights, and market patterns for retailers and analysts.

Apache-2.0

Python

Updated 4 months ago

Covid-19_System_and_Analysis

HarshKothari21

❤️35

Analysis of COVID-19 data using the Python libraries pandas, matplotlib, Beautiful Soup, requests, and seaborn. Also includes a real-time notification system. You can view/edit my notebook from Kaggle: https://www.kaggle.com/harshkothari21/eda-on-covid-19-with-python?scriptVersionId=38870107

Jupyter Notebook

Updated 5 years ago

analysis beautifulsoup eda +4

Student_Performance_Analysis

MariamMohamed20

❤️45

Statistical data analysis of 10,000 student records examining academic performance factors. Includes data cleaning, EDA, feature engineering, hypothesis testing, correlation analysis, and PCA. Demonstrates complete data science workflow with Python, Pandas, and statistical methods to provide actionable educational insights.

Jupyter Notebook

Updated 2 months ago

Zara-Sales-EDA

Atharvraj893

❤️35

Exploratory Data Analysis (EDA) of Zara's sales dataset to uncover trends, seasonality, and customer behavior. Includes data cleaning, visualization, and insights on product performance, store dynamics, and time-based patterns. Built using Python (pandas, matplotlib, seaborn) with a focus on reproducibility and clarity.

Jupyter Notebook

Updated 5 months ago

Fuel-Efficiency-Prediction-to-Reduce-Carbon-Emmision

Aaditya-Mishra1

❤️35

🚗 Fuel Efficiency Prediction 🚀 Predict vehicle fuel efficiency using machine learning! 📊 This project includes data preprocessing, EDA, model training, and evaluation with Python, Pandas, NumPy, Matplotlib, Seaborn, TensorFlow, and Keras. Clone, install dependencies, and run the Jupyter Notebook for insights.

Jupyter Notebook

Updated 7 months ago
