Found 88 repositories (showing 30)
sondosaabed
I acquired a full scholarship from Google Launchpad. Advanced data wrangling skills to work with messy, complex real-world datasets. Highly customized visualizations using the Matplotlib Python library.
JamilaHajAhmad
Second project in my Data Analyst Nanodegree from Udacity
seni1
Course Outline Data wrangling is a core skill that everyone who works with data should be familiar with since so much of the world's data isn't clean. Though this course is geared towards those who use Python to analyze data, the high-level concepts can be applied in all programming languages and software applications for data analysis. Lesson 1: The Walkthrough In the first lesson of this course, we'll walk through an example of data wrangling so you get a feel for the full process. We'll introduce gathering data, then download a file from the web and import it into a Jupyter Notebook. We'll then introduce assessing data and assess the dataset we just downloaded both visually and programmatically. We'll be looking for quality and structural issues. Finally, we'll introduce cleaning data and use code to clean a few of the issues we identified while assessing. The goal of this walkthrough is awareness rather than mastery, so you'll be able to start wrangling your own data even after just this first lesson. Lessons 2-4: Gathering, Assessing, and Cleaning Data (in Detail) In the following lessons, you'll master gathering, assessing, and cleaning data. We'll cover the full data wrangling process with real datasets too, so think of this course as a series of wrangling journeys. You'll learn by doing and leave each lesson with tangible skills. Your In
Real-world data rarely comes clean. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, then cleaned it. This is called the data wrangling process. The dataset was gathered from the Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. In this project, I gathered data from a variety of sources and in a variety of formats: - First, I manually downloaded a .csv file named ‘twitter_archive_enhanced.csv’ and stored it in the ‘archive’ table. - Then, I used the Requests Python library to programmatically download a ‘.tsv’ file named ‘tweet-image-predictions.tsv’ and stored it in the ‘images’ table. This file contains the results of a neural network's analysis, which predicts a dog's breed based on images. - After this, I created an API object that I used to programmatically download a JSON file, stored as the ‘twitter_counts’ table, which contains additional Twitter data. In the second section of the project, which is devoted to assessing data, I first looked for quality issues, which pertain to the content of the data, and identified ten of them. I then examined tidiness issues, which pertain to the structure of the data. In the last section of the wrangling process, I structured and cleaned the dirty data into the desired format for better analysis and visualization using Python and its libraries. For each identified issue, I defined the actions to take before translating those actions into lines of code. I also tested each piece of code to check the result of the cleaning.
Applying the data wrangling process to real-world data.
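The define, code, test loop described in this summary might look like the following minimal sketch. It is an illustration only: the column names (`rating_numerator`, `rating_denominator`), the sample rows, and the cleaning rule are hypothetical and not taken from the repository, and a plain list of dicts stands in for the ‘archive’ table.

```python
# Hypothetical sample rows standing in for the 'archive' table.
archive = [
    {"tweet_id": 1, "rating_numerator": 13, "rating_denominator": 10},
    {"tweet_id": 2, "rating_numerator": 24, "rating_denominator": 7},
    {"tweet_id": 3, "rating_numerator": 11, "rating_denominator": 10},
]


# Define: keep only ratings whose denominator is 10 (an assumed cleaning rule).
def keep_standard_denominator(rows, expected=10):
    """Return a new list with the rows whose denominator equals `expected`."""
    return [row for row in rows if row["rating_denominator"] == expected]


# Code: apply the cleaning step; a new list is built, so `archive` keeps all rows.
archive_clean = keep_standard_denominator(archive)

# Test: confirm programmatically that the issue is gone.
assert all(row["rating_denominator"] == 10 for row in archive_clean)
print(len(archive), "rows before,", len(archive_clean), "rows after")
```

Keeping the test as an explicit assertion next to each cleaning step makes it easy to rerun the notebook and see immediately whether a fix still holds.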
No description available
In this project, leverage Python and its libraries to collect, assess, and clean real-world data from various sources and formats. Document the entire process in a Jupyter Notebook and present analyses and visualizations using Python for a transparent showcase of the refined dataset.
Data wrangling and analysis of Santa Barbara bird populations and rainfall trends using Python
josiahuma
Several Python-based ETL projects, data wrangling, and analytics scripts with real-world examples. Visualizations using pandas and Power BI.
DenisMarcher
This repository serves as a collection of educational data analysis projects demonstrating: data wrangling skills, visualization techniques, real-world EDA. Data sets only. Clear, structured, using Python. All notebooks use real-world, non-synthetic datasets, in accordance with the course requirements.
suneelshivanioffical
In the Data Analysis with Python course by freeCodeCamp, gain hands-on experience with Python's core data analysis libraries, including Pandas, Matplotlib, and NumPy. Through real-world projects, you’ll learn to clean, manipulate, and visualize data effectively, developing skills in data wrangling, analysis, and visualization.
Collection of practical data science projects demonstrating end-to-end analytics workflow: from data wrangling and EDA to ML modeling and interactive visualization. Built with Python, SQL, Tableau, and QuickSight to solve real-world business problems.
prince-std
Empower your data-driven decision-making with this comprehensive repository of data analysis projects. Explore a variety of datasets, analyze trends, and visualize insights using Python, Power BI, and other tools. Enhance your data wrangling, analysis, and storytelling skills while gaining hands-on experience with real-world data challenges.
gowthamkumar9
I am a data science enthusiast skilled in turning raw data into meaningful insights and predictive solutions. With strong foundations in Python, Machine Learning, Statistics, and Data Wrangling, I enjoy solving real-world problems using data-driven approaches and building projects that strengthen my analytical thinking and technical expertise.
prast567
Real-world data rarely comes clean. Using Python and its libraries, I will gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it. This is called data wrangling. I will document my wrangling efforts in this Jupyter Notebook, plus showcase them through analyses and visualizations using Python (and its libraries) and/or SQL. The dataset that I will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog.
himanshusharmacu
Real-world data rarely comes clean. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, then cleaned it. This is called data wrangling. I documented my wrangling efforts in a Jupyter Notebook, and showcased them through analyses and visualizations using Python (and its libraries) and/or SQL. The dataset that I wrangled (and analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage. WeRateDogs downloaded their Twitter archive and sent it to Udacity via email exclusively to use in this project. This archive contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they stood on August 1, 2017. More on this soon.
joj19968
# Wrangle-and-Analyze-data

#### Introduction

Real-world data rarely comes clean. Using Python and its libraries, you will gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it. This is called data wrangling. You will document your wrangling efforts in a Jupyter Notebook, plus showcase them through analyses and visualizations using Python (and its libraries) and/or SQL.

The dataset that you will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.

WeRateDogs downloaded their Twitter archive and sent it to Udacity via email exclusively for you to use in this project. This archive contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they stood on August 1, 2017. More on this soon.

### Project Details

Your tasks in this project are as follows:

- Data wrangling, which consists of: gathering data, assessing data, and cleaning data.
- Storing, analyzing, and visualizing your wrangled data.
- Reporting on 1) your data wrangling efforts and 2) your data analyses and visualizations.

#### Gathering Data for this Project

Gather each of the three pieces of data as described below in a Jupyter Notebook titled wrangle_act.ipynb:

- The WeRateDogs Twitter archive. I am giving this file to you, so imagine it as a file on hand. Download this file manually by clicking the following link: twitter_archive_enhanced.csv
- The tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv
- Each tweet's retweet count and favorite ("like") count at minimum, and any additional data you find interesting. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count. Note: do not include your Twitter API keys, secrets, and tokens in your project submission.

#### Assessing Data for this Project

After gathering each of the above pieces of data, assess them visually and programmatically for quality and tidiness issues. Detect and document at least eight (8) quality issues and two (2) tidiness issues in your wrangle_act.ipynb Jupyter Notebook. To meet specifications, the issues that satisfy the Project Motivation (see the Key Points header on the previous page) must be assessed.

#### Cleaning Data for this Project

Clean each of the issues you documented while assessing. Perform this cleaning in wrangle_act.ipynb as well. The result should be a high quality and tidy master pandas DataFrame (or DataFrames, if appropriate). Again, the issues that satisfy the Project Motivation must be cleaned.

#### Storing, Analyzing, and Visualizing Data for this Project

Store the clean DataFrame(s) in a CSV file with the main one named twitter_archive_master.csv. If additional files exist because multiple tables are required for tidiness, name these files appropriately. Additionally, you may store the cleaned data in a SQLite database (which is to be submitted as well if you do).

Analyze and visualize your wrangled data in your wrangle_act.ipynb Jupyter Notebook. At least three (3) insights and one (1) visualization must be produced.

#### Reporting for this Project

Create a 300-600 word written report called wrangle_report.pdf or wrangle_report.html that briefly describes your wrangling efforts. This is to be framed as an internal document.

Create a 250-word-minimum written report called act_report.pdf or act_report.html that communicates the insights and displays the visualization(s) produced from your wrangled data. This is to be framed as an external document, like a blog post or magazine article, for example.
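The tweet_json.txt step described in the gathering section (one JSON object per line, read back line by line) could be sketched as follows. This is an illustration under stated assumptions, not the project's reference solution: the sample tweets are made up, the field names `id`, `retweet_count`, and `favorite_count` are assumed to be present in each tweet's JSON, the helper `read_tweet_json` is hypothetical, and a temporary file stands in for the real one. Only the standard library is used here; the resulting list of dicts could then be passed to `pandas.DataFrame`.

```python
import json
import os
import tempfile


def read_tweet_json(path):
    """Read a file holding one JSON object per line into a list of dicts."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            tweet = json.loads(line)
            # Keep only the fields the project asks for at minimum
            # (field names are assumptions about the tweet JSON).
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return rows


# Made-up tweets written to a temporary stand-in for tweet_json.txt,
# one JSON object per line.
sample = [
    {"id": 101, "retweet_count": 5, "favorite_count": 20, "text": "13/10"},
    {"id": 102, "retweet_count": 0, "favorite_count": 3, "text": "12/10"},
]
path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
with open(path, "w", encoding="utf-8") as handle:
    for tweet in sample:
        handle.write(json.dumps(tweet) + "\n")

rows = read_tweet_json(path)
print(rows)
```

Writing each tweet on its own line means the file can be processed incrementally, and a single malformed line can be located by its line number instead of invalidating the whole file.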
mmaayyss20
No description available
lmatos-803
Real World Data Wrangling with Python
MohammadHamo912
Real World Data Wrangling With Python
DataAnalytics-ISSS
Real World Data Wrangling with Python
thiago-grabe
Real World Data Wrangling with Python
Farha-Dahman
Real World Data Wrangling with Python
No description available
Raghad-Odwan
Udacity_Secned_Project_data_anayst
No description available
kelseyz1229
No description available
AbdalrhmanJuber
No description available
tareq-saymeh
No description available
sarashrouf
Data wrangling project exploring movie characteristics on Netflix vs. general movies dataset using Python and Pandas.