Found 185 repositories (showing 30)
saizhang1
There are several exploratory data analyses (EDA) in this file. The approach is more data-analytics and business-oriented than machine learning.
ANALYZING ROAD SAFETY & TRAFFIC DEMOGRAPHICS IN THE UK (Multi-class Classification)

SUMMARY
Here, I aim to analyze the Road Safety and Traffic Demographics dataset (UK), which contains accidents reported by the police between 2004 and 2017.

PROJECT GOALS:
- Identify the factors responsible for most of the reported accidents.
- Build a machine learning model capable of accurately predicting the severity of an accident.
- Provide recommendations to the Department for Transport (UK Government) to improve road safety policies and, where possible, prevent severe accidents from recurring.

PACKAGES USED: Scikit-learn, NumPy, pandas, imblearn (imbalanced-learn), seaborn, Matplotlib

MOTIVATION
The World Health Organization (WHO) reported that more than 1.25 million people die and 50 million are injured each year as a result of road accidents worldwide. Road accidents are the 10th leading cause of death globally, and on current trends they are expected to become the 7th leading cause of death by 2030, making them a major public health concern. Between 2005 and 2016, roughly 2 million road accidents were reported in the United Kingdom (UK) alone, of which 16,000 were fatal. As a big data project, I wanted to explore the traffic demographics data in greater detail using machine learning!

CONTEXT
The UK government collected traffic data from 2004 to 2017, recording over 2 million accidents in the process and making this one of the most comprehensive traffic datasets available. It offers a broad picture of a country undergoing change. Note that all of the accident data comes from police reports, so it does not include minor incidents.
For the steps taken to pre-process and clean the data, please see the "Data Cleansing & Descriptive Analysis_UK Traffic Demographics.ipynb" file.

DESCRIPTIVE ANALYTICS (EDA)
Tools used include Python, Tableau, and MS Power BI.

Percent (%) distribution of target classes
[Figure: Percent distribution of Accident Severity]
As seen above, the data is highly imbalanced. For the detailed steps taken to deal with the imbalanced data, please see the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file. This article provides some great tips on choosing the correct performance metrics when evaluating a model trained on an imbalanced dataset. This article describes several strategies that can help combat a severely imbalanced dataset. Methods include:
- Resampling strategies (undersampling: Tomek Links, Cluster Centroids; oversampling: SMOTE)
- Using decision-tree-based models
- Using cost-sensitive training (penalized algorithms)

Number of accidents by Year and Accident Severity
[Figure: Total accidents by year and severity]
The number of accidents appears to increase over the years. In addition, the spike between 2008 and 2009 was caused by an enhancement to the reporting system introduced in the UK in 2009, under which the police had to report all accidents, including minor ones, so that the counts matched those recorded by hospitals, insurance claims, etc.

Accident density by Location
[Figure: geomap]
Most accidents took place in major cities: Birmingham, London, Leeds, and Newcastle.

Accidents by Gender and Age
[Figure: Accidents by gender and age]

Accidents by Day of the week and Year
[Figure: Accidents by year and weekday]
Most accidents take place on a Friday.

Vehicle Manoeuvre at time of accident
[Figure: Vehicle Manoeuvre at time of accident]
Most accidents take place as a result of overtaking.

For more findings, please see the "Images" folder.
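As a minimal sketch of two of the strategies listed above (not the code from the project notebook), the snippet below computes the percent distribution of the target classes, derives "balanced" class weights for cost-sensitive training (the n_samples / (n_classes * class_count) rule that scikit-learn uses for `class_weight="balanced"`), and oversamples a minority class SMOTE-style by interpolating between a sample and one of its nearest minority neighbours. The severity labels and the helper names are hypothetical; in practice imbalanced-learn's `SMOTE` would be the usual choice.

```python
import numpy as np


def class_percentages(y):
    """Percent (%) share of each class label in y."""
    labels, counts = np.unique(y, return_counts=True)
    return {lab: 100.0 * c / len(y) for lab, c in zip(labels.tolist(), counts.tolist())}


def balanced_class_weights(y):
    """Cost-sensitive weights: n_samples / (n_classes * class_count)."""
    labels, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(labels) * counts)
    return dict(zip(labels.tolist(), weights.tolist()))


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Synthesize n_new minority rows by interpolating a random minority
    sample towards one of its k nearest minority neighbours."""
    X_min = np.asarray(X_min, dtype=float)
    if not 1 <= k < len(X_min):
        raise ValueError("k must be between 1 and len(X_min) - 1")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))          # random minority sample
        nn = neighbours[j, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                    # random point on the segment
        synthetic[i] = X_min[j] + gap * (X_min[nn] - X_min[j])
    return synthetic


# Hypothetical severity labels: 1 = Fatal, 2 = Serious, 3 = Slight.
y = np.array([3] * 85 + [2] * 13 + [1] * 2)
print(class_percentages(y))       # {1: 2.0, 2: 13.0, 3: 85.0}
print(balanced_class_weights(y))  # rarest class gets the largest weight
```

The synthetic rows always lie on segments between real minority samples, which is why SMOTE tends to overfit less than simply duplicating rare rows.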
For the steps taken to carry out predictive modeling and hyper-parameter tuning, please see the "Modelling_Predictive Analytics_UK Traffic Demographics.ipynb" file.

RECOMMENDATIONS TO THE DEPARTMENT FOR TRANSPORT (UK)
- Decrease emergency response times during the afternoon rush hours (15-19), especially on Fridays.
- Allocate resources to investigate high-density traffic points and identify new infrastructure needs to divert traffic from dual carriageways.
- Explore the conditions of vehicles and casualties, such as vehicle type, age of registered vehicles, and pedestrian movements, to inform policy makers.
- Adopt comprehensive distracted-driving laws that increase penalties for drivers who commit traffic violations such as aggressive overtaking.

ACKNOWLEDGEMENTS
The license for this dataset is the Open Government Licence, which is used by all data on data.gov.uk. The raw datasets are available from the UK Department for Transport website. I had a lot of fun working on this dataset and learned a lot in the process. I plan to further my research into predictive modeling with imbalanced data and into how to build highly robust models for future projects.

Topics: accident-rate accident-severity imbalanced-data imbalanced-learning road-accident reported-accidents road-safety uk-government transport traffic-demographics severe-accidents pca classification
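The rush-hour recommendation rests on counting accidents by weekday and hour of day. A minimal pandas sketch of that aggregation follows; the column names `Date` and `Time` and the five rows are hypothetical (the real dataset's schema may differ), and it is not taken from the project notebooks.

```python
import pandas as pd

# Hypothetical accident records; real data would come from the published CSV files.
accidents = pd.DataFrame({
    "Date": ["2016-01-08", "2016-01-08", "2016-01-09", "2016-01-15", "2016-01-11"],
    "Time": ["17:30", "08:10", "16:45", "18:05", "15:20"],
})

ts = pd.to_datetime(accidents["Date"] + " " + accidents["Time"])
accidents["weekday"] = ts.dt.day_name()
accidents["hour"] = ts.dt.hour
# Afternoon rush-hour window: hours 15 to 19 inclusive (15:00-19:59).
accidents["rush_hour"] = accidents["hour"].between(15, 19)

# Accidents per weekday, busiest first.
by_day = accidents.groupby("weekday").size().sort_values(ascending=False)
# Accidents per weekday restricted to the rush-hour window.
rush_by_day = accidents[accidents["rush_hour"]].groupby("weekday").size()
print(by_day)
print(rush_by_day)
```

The same two group-bys, run over the full dataset, are what a weekday-by-hour heatmap is usually built from.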
mascarenhasneil
This is the final capstone project for ALY6040 Data Mining (Fall 2021, CPS), undertaken primarily to learn data analytics, data mining, and Python. Residential and commercial properties in Boston were assessed. The Boston Globe reported in May 2021 that the competitive Boston housing market drives up costs. As the pandemic continues, people are demanding larger homes. Finding a home also became more difficult because most property managers and realtors could not show their properties to many people. This post was written to help individuals, realtors, and real estate brokers find a property at a reasonable price. We chose to use a few basic machine learning concepts to help determine the best selling price for a house based on the number of rooms, location, design, and other characteristics of the bathrooms and kitchen. We focused only on residential property because it was in demand. The study's goal was to build on the initial EDA work by constructing predictive models that addressed our business concerns, and finally to optimize the models' performance.
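A basic version of this pricing idea is an ordinary least-squares regression of sale price on property features. The sketch below uses plain NumPy least squares as a stand-in for whatever models the project actually used. The features (rooms, bathrooms, living area) and prices are hypothetical; the prices are generated from a known linear rule so the fit can be checked.

```python
import numpy as np

# Hypothetical listings: [rooms, bathrooms, living area in sq ft].
X = np.array([
    [5, 1, 1200],
    [6, 2, 1500],
    [7, 2, 1800],
    [8, 3, 2200],
    [6, 1, 1400],
], dtype=float)
# Synthetic prices from a known rule
# (50,000 + 20,000*rooms + 30,000*baths + 200*sqft) so the fit can be verified.
price = np.array([420_000, 530_000, 610_000, 740_000, 480_000], dtype=float)

# Prepend an intercept column and solve min ||A @ coef - price||^2.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)


def predict(features):
    """Predicted price for one [rooms, bathrooms, living_area] row."""
    return float(coef[0] + np.dot(coef[1:], features))


print(round(predict([7, 2, 1700])))
```

On real listings the fit would not be exact, and categorical features such as location or design would first need encoding (for example one-hot columns).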
I-AM-PRASHANT-VERMA
Data Analytics and Business Intelligence / Module 2: Numerical Programming in Python / EDA Capstone Project: Airbnb Bookings Analysis
mamathasri-22
Data Analytics Portfolio - Aspiring Data Analyst specializing in Python, SQL, and Data Visualization. Explore my projects in predictive modeling, EDA, and automation.
Marco-barthem
Customer segmentation using K-Means (Python) + Business Dashboard in Power BI. Full project with EDA, clustering, insights and real-world analytics.
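K-Means alternates two steps: assign each customer to the nearest centroid, then move each centroid to the mean of its assigned customers. Below is a minimal NumPy sketch of that loop (Lloyd's algorithm). The two-feature customer data is hypothetical, and scikit-learn's `KMeans` is the usual library choice in practice.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every row.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: mean of assigned rows (keep old centroid if a cluster empties).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return labels, centroids


# Hypothetical customers: [annual spend (k$), visits per month].
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 9.0], [8.5, 8.7], [7.8, 9.2]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

Because K-Means uses Euclidean distance, features on very different scales (spend vs. visits) are normally standardised before clustering.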
Kaiyang-Liu
A business analytics project on "business insight of Airbnb in Fenway, Boston", plus some basic Python exercise projects related to data ingestion, EDA, and data visualization.
srquieng
Projects from my Post Graduate Program in Data Science & Business Analytics at UT Austin (McCombs). Covers classification, regression, A/B testing, clustering, and EDA using Python.
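A/B tests on conversion rates are commonly analysed with a two-proportion z-test. The sketch below is a minimal standard-library version using the pooled proportion. The conversion counts are hypothetical, and the code is not taken from the program's material.

```python
from math import erf, sqrt


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled proportion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf, then a two-sided p-value.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical test: 200/2000 conversions in A vs. 250/2000 in B.
z, p = two_proportion_ztest(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Using the pooled proportion in the standard error reflects the null hypothesis that both variants share one conversion rate.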
Sak12345641
Ongoing Summer Training 2025 in Data Analytics. Learning data cleaning (Excel, Python), SQL, data visualization, and EDA. Projects and assignments will be updated as the training progresses. Gaining hands-on experience with real-world datasets and analytics tools.
priyankadatacodes
End-to-end data analytics project for Superstore retail sales. Includes data cleaning in Excel, SQL data warehousing, Python EDA, visualization, and actionable business insights for sales, profit, and discount optimization.
gudlaakanksha011
🚀 Data Analyst | Building Real-World Analytics Solutions I design end-to-end data analytics projects focused on solving business problems, from inventory management and expiry alerts to sales insights and recommendation systems. 📊 Skilled in Python, Pandas, ETL, EDA, and dashboarding (Streamlit, Tableau, Power BI, Excel)
POOJAKHAIRE-04
Python-based data analysis project on Superstore USA sales dataset. Includes data cleaning, EDA, and visualization to uncover trends in sales, profit, and customer segments. Demonstrates data analytics skills using Python libraries like Pandas, Matplotlib, and Seaborn for actionable business insights.
A comprehensive portfolio of diverse Machine Learning and Data Science projects. Demonstrates end-to-end proficiency in data acquisition, preprocessing, EDA, model building, and evaluation. Covers NLP, time-series, predictive analytics, and database querying using Python libraries and SQL.
SachinRamesh024
📊 Completed Data Science & Analytics Internship Project at DevelopersHub Corporation – Includes 5 hands-on tasks in data cleaning, EDA 📈, classification 🤖, regression 📉, and model evaluation ✅ using Python, pandas, seaborn, matplotlib & scikit-learn.
Maheshkolakar
Data analytics project using Pandas and NumPy to detect intrusions in CAN bus systems. Includes data preprocessing, EDA, and ML-based classification. Built as a fresher to showcase skills in Python, data analysis, and real-world automotive cybersecurity use cases.
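A common preprocessing step for CAN intrusion detection is to compute the time gap between consecutive frames for each arbitration ID, because message-injection attacks often raise an ID's frequency and so shrink that gap. The pandas sketch below illustrates this. The column names (`timestamp`, `can_id`), the toy log, and the "less than half the median gap" rule are hypothetical illustrations, not the repository's method.

```python
import pandas as pd

# Hypothetical CAN log: timestamp in seconds and arbitration ID per frame.
log = pd.DataFrame({
    "timestamp": [0.000, 0.005, 0.010, 0.020, 0.025, 0.030, 0.040, 0.041, 0.045, 0.050],
    "can_id": ["0x316", "0x2A0", "0x316", "0x316", "0x2A0",
               "0x316", "0x316", "0x316", "0x2A0", "0x316"],
})

log = log.sort_values("timestamp")
# Inter-arrival time: gap since the previous frame with the same ID (NaN for the first).
log["gap"] = log.groupby("can_id")["timestamp"].diff()
# Baseline: median gap per ID, broadcast back to every row of that ID.
log["median_gap"] = log.groupby("can_id")["gap"].transform("median")
# Illustrative rule: a frame arriving in under half the usual period is suspicious.
log["suspicious"] = log["gap"] < 0.5 * log["median_gap"]
print(log[log["suspicious"]])
```

In an ML pipeline, features like `gap` and `median_gap` would feed a classifier rather than a fixed threshold.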
sachanlabs
Built an E-Commerce Sales Analytics project in Python analyzing 1,000 orders across 7 categories and 15 Indian states. Performed EDA using Pandas (groupby, agg, corr), computed KPIs, and created an 8-chart Matplotlib dashboard.
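The groupby/agg/corr workflow described above can be sketched in a few lines of pandas. The columns (`category`, `state`, `quantity`, `unit_price`), the rows, and the KPI choices are hypothetical.

```python
import pandas as pd

# Hypothetical orders.
orders = pd.DataFrame({
    "category": ["Electronics", "Fashion", "Electronics", "Home", "Fashion", "Home"],
    "state": ["Maharashtra", "Karnataka", "Karnataka", "Gujarat", "Maharashtra", "Gujarat"],
    "quantity": [2, 1, 1, 3, 2, 1],
    "unit_price": [15000.0, 1200.0, 22000.0, 800.0, 1500.0, 2500.0],
})
orders["revenue"] = orders["quantity"] * orders["unit_price"]

# Headline KPIs.
kpis = {
    "total_revenue": orders["revenue"].sum(),
    "order_count": len(orders),
    "avg_order_value": orders["revenue"].mean(),
}

# Revenue and order count per category, largest revenue first (named aggregation).
by_category = (orders.groupby("category")
               .agg(revenue=("revenue", "sum"), orders=("revenue", "size"))
               .sort_values("revenue", ascending=False))

# Pairwise Pearson correlation of the numeric columns.
corr = orders[["quantity", "unit_price", "revenue"]].corr()
print(kpis, by_category, corr, sep="\n\n")
```

Swapping `"category"` for `"state"` in the group-by gives the per-state breakdown.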
DimpyBangoriya
Welcome to my Data Analytics repository! This repository showcases my projects in data analysis and visualization, leveraging tools like SQL, Python, Excel, Tableau, and Power BI. Each project focuses on extracting meaningful insights from raw data using exploratory data analysis (EDA), dashboard creation, and data modeling.
An end-to-end T20 World Cup cricket data analytics project using Python, Pandas, and Power BI. Web scrape ESPN Cricinfo, analyze with Python & pandas, visualize insights with Power BI. Gain hands-on experience in web scraping, data cleaning, EDA, stats analysis, and visualization.
AamirAhamed07
An end-to-end data analytics project that transforms raw customer data into meaningful business insights. It includes loading data in Python, performing EDA and cleaning, storing data in MySQL, running SQL queries, and building an interactive Power BI dashboard for data-driven decision-making.
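The load-then-query step can be sketched with Python's built-in `sqlite3` module as a local stand-in for MySQL. The repository itself uses MySQL. The `customers` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned customer data produced by the Python EDA step.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "gender": ["F", "M", "F", "M"],
    "purchase_amount": [120.0, 80.0, 200.0, 50.0],
})

# In-memory SQLite database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
customers.to_sql("customers", conn, index=False)

query = """
SELECT gender,
       COUNT(*)             AS n_customers,
       SUM(purchase_amount) AS revenue
FROM customers
GROUP BY gender
ORDER BY revenue DESC;
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

Against MySQL, the same `to_sql` / `read_sql_query` calls would typically go through an SQLAlchemy engine instead of a `sqlite3` connection.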
This project, completed for the Rochester Institute of Technology (RIT) and the Excelerate Early Internship, focuses on student engagement analytics using data preprocessing, exploratory data analysis (EDA), predictive modeling, and a recommendation system. Implemented in Python, with IDF vectorization and cosine similarity.
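A content-based recommender of this kind usually represents items as TF-IDF vectors and ranks them by cosine similarity. The standard-library sketch below assumes that "IDF vectorization" refers to TF-IDF weighting with the plain log(N / df) form. Scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its numbers would differ. The course titles are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One {term: weight} dict per doc, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of docs containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, query_index, top_n=2):
    """Indices of the top_n docs most similar to docs[query_index]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[query_index], v), i)
              for i, v in enumerate(vecs) if i != query_index]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]


courses = [
    "intro to python programming",
    "advanced python programming",
    "data visualization with tableau",
    "python for data analysis",
]
print(recommend(courses, query_index=0))
```

Terms that appear in every document get an IDF of zero, so they do not affect the ranking. The zero-norm guard in `cosine` handles documents left with no weighted terms.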
impratikpati-web
End-to-end data analytics project analyzing regional sales data (2021–2025) to identify revenue drivers, profit trends, seasonal patterns, and regional performance. Performed advanced EDA in Python and built an interactive Power BI dashboard to support pricing, channel mix, and growth strategy decisions.
Muhammad-Saad-Ali5491
This project provides standalone Python scripts for key data analytics tasks: Titanic EDA, RFM segmentation, survey cleaning, job scraping, retail time series, and e-commerce insights. Includes fallback datasets, requirements.txt, and is ready to run in VS Code for quick analysis and visualization.
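RFM segmentation scores each customer on Recency (days since the last purchase), Frequency (number of orders), and Monetary value (total spend). The pandas sketch below shows a minimal version. The column names, the snapshot date, and the rank-based 1-3 scoring are hypothetical choices; many RFM variants use quintiles (1-5) instead.

```python
import pandas as pd

# Hypothetical transactions.
tx = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C", "D"],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2023-11-20",
                                  "2024-02-10", "2024-02-25", "2024-03-10",
                                  "2023-12-15"]),
    "amount": [50.0, 70.0, 300.0, 20.0, 35.0, 40.0, 15.0],
})
snapshot = pd.Timestamp("2024-03-31")  # reference date for recency

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)


def score(series, higher_is_better=True, bins=3):
    """Rank-based 1..bins score; ties broken by order of appearance."""
    ranks = series.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, bins, labels=list(range(1, bins + 1))).astype(int)


rfm["R"] = score(rfm["recency"], higher_is_better=False)  # recent = high score
rfm["F"] = score(rfm["frequency"])
rfm["M"] = score(rfm["monetary"])
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```

Ranking before `qcut` avoids duplicate bin edges when many customers share the same frequency.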
samueleniola
A comprehensive time-series analysis of historical stock data for The Coca-Cola Company (1980–2026) using Python. The project explores long-term price trends, calculates returns, analyzes volatility, and visualizes financial performance to demonstrate strong data cleaning, EDA, and data visualization skills in real-world financial analytics.
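Returns and volatility of this kind are typically computed with pandas' `pct_change` and `rolling`. The sketch below uses hypothetical closing prices. The 252-trading-day annualisation factor is a common convention, not something stated in the project.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices on business days.
close = pd.Series(
    [60.0, 60.6, 59.9, 61.2, 61.0, 62.1, 61.8, 62.5],
    index=pd.date_range("2024-01-01", periods=8, freq="B"),
    name="close",
)

daily_ret = close.pct_change().dropna()             # simple daily returns
log_ret = np.log(close / close.shift(1)).dropna()   # log returns (additive over time)
cumulative = (1 + daily_ret).prod() - 1             # total return over the window
rolling_vol = daily_ret.rolling(window=5).std()     # 5-day rolling volatility
annual_vol = daily_ret.std() * np.sqrt(252)         # annualised volatility

print(f"cumulative return: {cumulative:.2%}")
print(f"annualised volatility: {annual_vol:.2%}")
```

Log returns are convenient for long histories because they sum across periods, whereas simple returns compound multiplicatively.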
YashCh05
End-to-end product analytics project using a real-world user events dataset. Performed data cleaning and EDA in Python, solved business questions using SQL, and built an interactive Power BI dashboard to analyze revenue trends, user behavior, traffic sources, and geographic insights for data-driven decision making.
This project demonstrates a complete Data Extraction and Processing pipeline using a Kaggle dataset on Payment Card Fraud Detection (2025). It includes data cleaning, preprocessing, EDA, and machine learning model building in Python, along with a Power BI dashboard for visual analytics and a detailed report summarizing insights and results.
Ayush3622
A complete HR analytics project using SQL for deep-dive analysis and Python (Machine Learning) to predict employee attrition, visualized in a 5-page interactive Power BI dashboard. End-to-end workforce analysis: from SQL querying and EDA to building and tuning a predictive classification model and performing K-Means clustering to segment employees.
AdhamAymanElsayed
There are several exploratory data analyses (EDA) in this file.
Divaaazhr
Portfolio of my data analytics projects, including EDA, data visualization, and data manipulation projects implemented in Python.
AbiWaqas08
Data Science & Analytics internship projects at DevelopersHub — covering EDA, classification, regression, and machine learning workflows in Python.