Found 1,002 repositories (showing 30)
FarhaKousar1601
This repository contains resources, code, and projects related to the Global Core Tech Internship on Data Science with Python. Explore the world of data science with Python, learn NumPy, Pandas, Matplotlib, and EDA, and work on exciting data science projects. Elevate your skills and knowledge in the field of data analysis and visualization.
AshnaJ4991
Railway ticket records with Python and Pandas, covering purchase type, payment, journey times, delays, and refunds. Explores pricing patterns, customer behavior, and delay reasons. Includes visualizations and stats for EDA, time series insights, and improving rail service efficiency.
ajaygangwar945
Comprehensive Rainfall EDA & Machine Learning project. Built with Python (Pandas/Scikit-learn) and a premium web dashboard using Three.js and Chart.js.
avinashreddy1235
EDA of my personal WhatsApp chat data to uncover insights like who sent the most messages, chat frequency, and usage trends. Built with Python, pandas, and matplotlib. A fun and educational project exploring communication patterns using real-world text data.
In this, I have performed exploratory data analysis on datasets.
This project involves analyzing TikTok videos to classify claims vs. opinions using Python. It includes EDA, statistical tests, logistic regression, and ML models (Random Forest, XGBoost) to support content moderation. Built with pandas, scikit-learn, and Tableau, the solution helps TikTok automate content review and enhance moderation efficiency.
sjapanjots
This web application is built with Python and Streamlit, and this repository helps perform EDA (Exploratory Data Analysis) using the pandas-profiling library in Python. The web application also helps analyze the target variable using its modelling function.
Prafulbhoyar45
Exploratory data analysis examines data to bring out insights. It's storytelling: the story that the data is trying to tell. EDA is an approach to analyzing data with the help of various tools and graphical techniques like bar plots, histograms, etc. Many Python libraries, such as pandas, NumPy, matplotlib, and seaborn, help with analyzing the data and bringing out useful insights. I will be using Jupyter Notebook along with these libraries.
Mahima9861
Performs Exploratory Data Analysis (EDA) on a supermarket sales dataset by completing each task in the project: Task 1: Initial Data Exploration; Task 2: Univariate Analysis; Task 3: Bivariate Analysis; Task 4: Dealing With Duplicate Rows and Missing Values; Task 5: Correlation Analysis.
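The five tasks listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual notebook: the `sales` table, its column names, and its values are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for a supermarket sales table; column names are assumptions.
sales = pd.DataFrame(
    {
        "branch": ["A", "B", "A", "C", "B", "B"],
        "unit_price": [10.0, 20.0, 10.0, 15.0, None, 20.0],
        "quantity": [1, 2, 1, 3, 4, 2],
    }
)

# Task 1: initial exploration (shape and column types).
print(sales.shape)
print(sales.dtypes)

# Task 2: univariate analysis of a single column.
branch_counts = sales["branch"].value_counts()

# Task 3: bivariate analysis, here the mean quantity per branch.
mean_qty_by_branch = sales.groupby("branch")["quantity"].mean()

# Task 4: drop exact duplicate rows, then rows with missing values.
cleaned = sales.drop_duplicates().dropna()

# Task 5: correlation between the numeric columns.
corr = cleaned[["unit_price", "quantity"]].corr()
print(corr)
```

Dropping rows with missing values is only one possible choice for Task 4; imputing them may suit a real dataset better.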
prathgithub
Performed exploratory data analysis (EDA) and visualization using Python libraries in Google Colab. Includes data cleaning, transformation, statistical insights, and interactive visualizations with Pandas, Matplotlib, and Seaborn.
A Python project that prepares and preprocesses data with the pandas and NumPy packages: doing EDA with pandas functions, identifying problems that may arise during the analysis, cleaning and preparing the data, and getting some initial insights.
manishhemnani06
Scraping and fundamental analysis of a few stocks using different Python libraries: Selenium for scraping, Pandas for EDA, and Matplotlib and Seaborn for charts and presentation.
aayushdhote
This project analyzes CO₂ emissions data for the top 10 emitting countries from 2018 to 2020 using EDA techniques with Python and libraries such as Pandas, Seaborn, and Matplotlib.
Pankaj-Str
Explore IPL data using Python libraries like Pandas, NumPy, and Matplotlib. Analyze team and player performance, match outcomes, toss impact, and trends through visualizations and insights. A great project for data analysis and EDA practice with real-world sports data.
A Machine learning project for automated loan approval prediction in banking. Built with Python, scikit-learn, and pandas. Features comprehensive EDA, 4 ML models (Random Forest, Decision Tree, KNN, Logistic Regression), achieving 92%+ ROC-AUC. Includes model interpretability, business insights, and production-ready code. Portfolio project.
vnsgamer
Introduction : This data set is a masked data set which is similar to what data analysts at Uber handle. Solving this assignment will give you an idea about how problems are systematically solved using EDA and data visualisation. Business Understanding : You may have some experience of travelling to and from the airport. Have you ever used Uber or any other cab service for this travel? Did you at any time face the problem of cancellation by the driver or non-availability of cars? Well, if these are the problems faced by customers, these very issues also impact the business of Uber. If drivers cancel the request of riders or if cars are unavailable, Uber loses out on its revenue. As an analyst, you decide to address the problem Uber is facing - driver cancellation and non-availability of cars leading to loss of potential revenue. Business Objectives : The aim of analysis is to identify the root cause of the problem (i.e. cancellation and non-availability of cars) and recommend ways to improve the situation. As a result of your analysis, you should be able to present to the client the root cause(s) and possible hypotheses of the problem(s) and recommend ways to improve them. There are six attributes associated with each request made by a customer: 1. Request id: A unique identifier of the request 2. Time of request: The date and time at which the customer made the trip request 3. Drop-off time: The drop-off date and time, in case the trip was completed 4. Pick-up point: The point from which the request was made 5. Driver id: The unique identification number of the driver 6. Status of the request: The final status of the trip, that can be either completed, cancelled by the driver or no cars available Note: For this assignment, only the trips to and from the airport are being considered. Results Expected : 1. Visually identify the most pressing problems for Uber. 
Hint: Create plots to visualise the frequency of requests that get cancelled or show 'no cars available'; identify the most problematic types of requests (city to airport / airport to city etc.) and the time slots (early mornings, late evenings etc.) using plots. 2. Find out the gap between supply and demand and show the same using plots. a. Find the time slots when the highest gap exists b. Find the types of requests (city-airport or airport-city) for which the gap is the most severe in the identified time slots 3. What do you think is the reason for the supply-demand gap? Write the answer in less than 100 words. You may accompany the write-up with plot(s). 4. Recommend some ways to resolve the supply-demand gap. IDE : Jupyter Notebook Language : Python Libraries : NumPy, Pandas, Matplotlib, Seaborn Please do explore the dataset further on your own and see what other insights you can get across the various other columns.
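One way to approach the supply-demand gap described above is to treat every request as demand and every completed trip as supply, then compare the two per time slot and pickup point. The sketch below assumes that definition; the sample rows, column names, status labels, and slot boundaries are all hypothetical.

```python
import pandas as pd

# Hypothetical sample of the request log; column names and values are assumptions.
requests = pd.DataFrame(
    {
        "request_time": pd.to_datetime(
            [
                "2016-07-11 05:10",
                "2016-07-11 05:40",
                "2016-07-11 06:15",
                "2016-07-11 18:05",
                "2016-07-11 18:30",
                "2016-07-11 19:45",
            ]
        ),
        "pickup_point": ["City", "City", "City", "Airport", "Airport", "Airport"],
        "status": [
            "Cancelled",
            "Trip Completed",
            "Cancelled",
            "No Cars Available",
            "No Cars Available",
            "Trip Completed",
        ],
    }
)


def time_slot(hour: int) -> str:
    """Bucket an hour of day into a coarse slot (the boundaries are arbitrary)."""
    if 4 <= hour < 10:
        return "early_morning"
    if 17 <= hour < 22:
        return "evening"
    return "other"


requests["slot"] = requests["request_time"].dt.hour.map(time_slot)
# Demand = every request; supply = requests that ended in a completed trip.
requests["served"] = requests["status"].eq("Trip Completed")

gap = requests.groupby(["slot", "pickup_point"])["served"].agg(
    demand="count", supply="sum"
)
gap["gap"] = gap["demand"] - gap["supply"]
print(gap.sort_values("gap", ascending=False))
```

The resulting `gap` table can be passed to a bar plot to show which slot and direction combination is worst.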
ANALYZING ROAD SAFETY & TRAFFIC DEMOGRAPHICS IN THE UK (Multi-class Classification) SUMMARY Here, I aim to analyze the Road Safety and Traffic Demographics dataset (UK), containing accidents reported by the police between the years of 2004 - 2017. PROJECT GOALS: Identify factors responsible for most of the reported accidents. Build a machine learning model that is capable of accurately predicting the severity of an accident. Provide recommendations to the Department of Transport (UK Government), to improve road safety policies and prevent recurrences of severe accidents where possible. PACKAGES USED: Scikit-learn, numpy, pandas, imblearn (imbalanced-learn), seaborn, Matplotlib MOTIVATION The World Health Organization (WHO) reported that more than 1.25 million people die each year while 50 million are injured as a result of road accidents worldwide. Road accidents are the 10th leading cause of death globally. On current trends, road traffic accidents are to become the 7th leading cause of death by 2030, making them a major public health concern. Between the years 2005 and 2016, there were roughly 2 million road accidents reported in the United Kingdom (UK) alone, of which 16,000 were fatal. As a big data project, I wanted to explore the traffic demographics data in greater detail using machine learning! CONTEXT The UK government amassed traffic data from 2004 to 2017, recording over 2 million accidents in the process and making this one of the most comprehensive traffic data sets out there. It's a huge picture of a country undergoing change. Note that all the contained accident data comes from police reports, so this data does not include minor incidents.
For steps undertaken to pre-process and clean the data, please view the "Data Cleansing & Descriptive Analysis_UK Traffic Demographics.ipynb" file. DESCRIPTIVE ANALYTICS (EDA) Tools used include Python, Tableau, MS PowerBI. Percent (%) distribution of target classes [figure: percent distribution of Accident Severity]: the data is highly imbalanced. For detailed steps undertaken to deal with the imbalanced data, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file. This article provides some great tips on utilizing the correct performance metrics when analyzing the performance of a model trained on an imbalanced dataset. This article describes several strategies that can help combat the case of a severely imbalanced dataset. Methods include: resampling strategies (under-sampling: Tomek Links, Cluster Centroids; over-sampling: SMOTE); using Decision Tree based models; using cost-sensitive training (penalized algorithms). Number of accidents by Year and Accident Severity [figure: total accidents by year and severity]: the trend seems to increase over the years. In addition, the spike between 2008 - 2009 was because of an enhancement in the reporting system introduced in the UK in 2009, where all accidents, including minor accidents, needed to be reported by the police so as to match the counts represented by hospitals, insurance claims etc. Accident density by Location [figure: geomap]: most accidents took place in major cities: Birmingham, London, Leeds, Newcastle. Accidents by Gender and Age [figure: accidents by gender and age]. Accidents by Day of the week and Year [figure: accidents by year and weekday]: most accidents take place on a Friday. Vehicle Manoeuvre at time of accident [figure: vehicle manoeuvre at time of accident]: most accidents take place as a result of overtaking. For more findings, please go to the "Images" folder.
For steps undertaken to carry out some predictive modeling and hyper-parameter tuning, please view the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file. RECOMMENDATIONS TO THE DEPARTMENT OF TRANSPORT (UK) Decrease emergency response times during afternoon rush-hours (15-19) especially on Fridays. Allocate resources to investigate high density traffic points and identify new infrastructure needs to divert traffic from dual-carriage ways. Explore conditions of vehicles and casualties such as vehicle type, age of vehicles registered, pedestrian movements, etc. for policy makers. Adopt comprehensive distracted driving laws that increase penalties for drivers who commit traffic violations like aggressive overtaking. ACKNOWLEDGEMENTS The license for this dataset is the Open Government Licence used by all data on data.gov.uk. The raw datasets are available from the UK Department of Transport website. I had a lot of fun working on this dataset and learned a lot in the process. I plan to further my research in the area of predictive modeling using imbalanced data and how to effectively build a highly robust model for future projects.
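The cost-sensitive training option mentioned in the road-safety description above usually starts from per-class weights that penalize mistakes on rare classes more heavily. The sketch below is not taken from that project: it computes weights with the common "balanced" heuristic, n_samples / (n_classes * class_count), which is the same formula scikit-learn documents for `class_weight="balanced"`. The severity labels and their proportions are invented for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical severity labels with a strong majority class.
severity = ["slight"] * 80 + ["serious"] * 15 + ["fatal"] * 5
weights = balanced_class_weights(severity)

# Print classes from the lowest weight (most common) to the highest (rarest).
for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

A dictionary like `weights` can typically be handed to a classifier that accepts per-class weights, so that errors on the rare classes cost more during training.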
Muhammad-Rebaal
Explore, Analyze, and Visualize Data with Python and Pandas: Dive into the world of data analysis and visualization using Python and Pandas with this repository. Discover valuable insights, clean and preprocess your data, and create stunning visualizations to make data-driven decisions.
ggauravky
A structured learning repository for Data Science using Python. Covers data cleaning, EDA, and visualization with Pandas, NumPy, Matplotlib, Seaborn, and more.
Ahmed-M-Fayad
EDA on Employee Attrition Dataset: This repository includes data cleaning, feature engineering, visualizations, and analysis of key factors influencing employee turnover, with raw and cleaned datasets, a Jupyter notebook, and Python scripts. Tools used: Python, Pandas, Matplotlib, Seaborn.
Mahii0107
An interactive Streamlit dashboard analyzing US traffic accident patterns across time, weather, and severity levels. Features comprehensive EDA with visualizations exploring temporal trends, light conditions, and weather impact on accident severity. Built with Python, Pandas, and Seaborn.
HamsaVardhiniS
This project forecasts stock prices using Yahoo Finance data and time-series models (ARIMA, SARIMA, XGBoost, LSTM). It features EDA, visualization, and an interactive Streamlit app. Built with Python, Pandas, Scikit-learn, and Plotly.
868Rahul
End-to-end IPL analytics and machine learning project with data cleaning, EDA, feature engineering, Gradient Boosting model, and Streamlit app to predict 50+ scores. Built using Python, Pandas, Scikit-learn, Matplotlib, and Seaborn.
harmanbajwa2954
Pristinizer is a lightweight Python package for automatic data cleaning, exploratory data analysis (EDA), and missing data visualization for pandas DataFrames. It helps data scientists and ML engineers quickly clean and understand datasets with minimal effort.
IagoG7
Exploratory Data Analysis (EDA) of fencing competitions. Using Python, pandas, and matplotlib to uncover patterns in performance, ranking, and event dynamics. This project merges data science skills with my personal experience as a national-level fencer.
AnakhaBiju7
Analyze footwear e-commerce data with EDA, predictive modeling (XGBoost), clustering (K-Means), anomaly detection, fairness analysis, and association rule mining using Python. Utilizes Pandas, Scikit-learn, SHAP, and MLxtend to uncover pricing trends, product insights, and market patterns for retailers and analysts.
HarshKothari21
Analysis of COVID-19 data using the Python libraries pandas, matplotlib, Beautiful Soup, requests, and seaborn. Built a real-time notification system. You can view/edit my notebook on Kaggle: https://www.kaggle.com/harshkothari21/eda-on-covid-19-with-python?scriptVersionId=38870107
MariamMohamed20
Statistical data analysis of 10,000 student records examining academic performance factors. Includes data cleaning, EDA, feature engineering, hypothesis testing, correlation analysis, and PCA. Demonstrates complete data science workflow with Python, Pandas, and statistical methods to provide actionable educational insights.
Atharvraj893
Exploratory Data Analysis (EDA) of Zara's sales dataset to uncover trends, seasonality, and customer behavior. Includes data cleaning, visualization, and insights on product performance, store dynamics, and time-based patterns. Built using Python (pandas, matplotlib, seaborn) with a focus on reproducibility and clarity.
Aaditya-Mishra1
🚗 Fuel Efficiency Prediction 🚀 Predict vehicle fuel efficiency using machine learning! 📊 This project includes data preprocessing, EDA, model training, and evaluation with Python, Pandas, NumPy, Matplotlib, Seaborn, TensorFlow, and Keras. Clone, install dependencies, and run the Jupyter Notebook for insights.