Found 327 repositories (showing 30)
Ebuka456
No description available
ndabdulsalaam
The primary focus of this project is proficient data wrangling. Leveraging the Requests library and Tweepy, I conducted comprehensive web-scraping operations. Additionally, I carried out succinct exploratory and explanatory analyses, extracting valuable insights and proposing strategies to improve retweet metrics.
jmlcode
Wrangling and analysis of Tweets from WeRateDogs (@dogrates) with Python in Jupyter Notebook. Project focuses on gathering, assessing and cleaning data. Various methods, including Python's Requests and Tweepy packages for performing a GET Request and querying Twitter API, were used to collect Tweets and relevant data available online.
Emmaxadel
No description available
YuehHanChen
Used the Twitter API and pandas to gather data and conduct data cleaning. Python libraries were used throughout.
Real-world data rarely comes clean. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, then cleaned it. This is called the data wrangling process. The dataset was gathered from Twitter user @dog_rates, also known as WeRateDogs, a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. In this project, I conducted the data wrangling process by gathering data from a variety of sources and in a variety of formats:
- First, I manually downloaded a .csv file named ‘twitter_archive_enhanced.csv’ and stored it in the ‘archive’ table.
- Then, I used the Requests Python library to programmatically download a .tsv file named ‘tweet-image-predictions.tsv’ and stored it in the ‘images’ table. This file contains the results of a neural network's analysis, which predicts a dog's breed based on images.
- After this, I created an API object that I used to programmatically download a JSON file, stored as the ‘twitter_counts’ table, which contains additional Twitter data.
For the second section of the project, which is devoted to assessing the data, I first looked for quality issues, which pertain to the content of the data; I identified ten quality issues. I then examined tidiness issues, which pertain to the structure of the data. In the last section of the wrangling process, I structured and cleaned the dirty data into the desired format for better analysis and visualization using Python and its libraries. For each identified issue, I defined the actions to undertake before translating those actions into lines of code. I also tested every piece of code to check the result of the cleaning.
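The programmatic-download step described above can be sketched as follows. This is a minimal sketch, not the project's actual code: the URL is a placeholder, and to keep the sketch self-contained the parsing is also demonstrated on an invented inline sample with the same tab-separated shape (the column names here are illustrative).

```python
# Minimal sketch of downloading a TSV file with the Requests library
# and loading it into a pandas DataFrame. The URL is a placeholder.
import io

import pandas as pd
import requests


def download_tsv(url: str) -> pd.DataFrame:
    """Download a tab-separated file and return it as a DataFrame."""
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors
    return pd.read_csv(io.BytesIO(response.content), sep="\t")


# The same parsing applied to an invented inline sample that stands in
# for the image-predictions file (tweet id, image URL, predicted breed):
sample = b"tweet_id\tjpg_url\tp1\n1\thttp://example/1.jpg\tlabrador\n"
images = pd.read_csv(io.BytesIO(sample), sep="\t")
```

Reading the raw bytes through `io.BytesIO` avoids writing an intermediate file to disk, though the projects above typically save the downloaded file first.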
abdullahmoustaf
WeRateDogs Project Report
Introduction: The report below, made for the Udacity Data Analyst Nanodegree Program project “WeRateDogs”, explains the process my report has gone through. The goal of this project is to practice the process of wrangling and cleaning data, carried out on this Twitter account's tweet data. The tweets went through a process in which I performed the following activities:
- Gathering Data
- Assessing Data
- Cleaning Data
Gathering Data: In this process, data is obtained from csv files and loaded into tables, in which it will go through the wrangling process.
- Twitter archive data was loaded into the `twitter_archive` table, which contains the WeRateDogs Twitter archive provided by the course; the data was imported into a dataframe.
- Image prediction data was imported from the image prediction file provided by the course and hosted on Udacity's servers, and added to the `predictions` table. The tweet image predictions basically predict whether the object in a given image is a dog or some other object.
- API data was provided through a file in the course material, as my Twitter developer account wasn't created when I started working on the project; from this file I was able to read the queried API data, in JSON format, into the `api_df_now` table.
Assessing Data: In this step, data is assessed visually and programmatically to detect quality and tidiness issues in the gathered data.
- `twitter_archive` has missing data in multiple columns, for example "in_reply_to_status_id", "in_reply_to_user_id", "retweeted_status_id", and "retweeted_status_user_id". Lower-case dog names were an issue too.
- Another issue is the dog stage names that can cause confusion: doggo, pupper, floofer, and puppo.
- The timestamp is another issue that needs attention, and the source of content needs to be organized.
- Rating values needed some changes.
- The image prediction columns were causing confusion.
- The ‘api_df_now’ file is separate from the Twitter archive data.
Cleaning Data: In this step, data is cleaned and added to new tables (twitter_archive_clean, prediction_clean and api_df_now_clean) according to the issues observed after assessing the data.
1- Fixing quality issues:
1. Dropped unnecessary columns containing missing data: "in_reply_to_status_id", "in_reply_to_user_id", "retweeted_status_id", "retweeted_status_user_id".
2. Replaced missing “None” values with “NaN”.
3. Joined the ‘api_df_now’ table with the ‘twitter_archive’ table, renaming the ‘tweet_id’ column.
4. Combined all dog stage names (doggo, pupper, floofer and puppo) under one column named ‘dog’.
5. Changed the timestamp to datetime.
6. Optimized the source of content: Twitter for iPhone, Vine - Make a Scene, Twitter Web Client and TweetDeck.
7. Made a default value for the numerator and denominator values.
8. Capitalized the first letters of dog names.
2- Tidiness:
1. Changed the image prediction columns p1, p2 and p3 to potential_dog1, potential_dog2 and potential_dog3.
2. Merged the cleaned data into the clean tables.
Conclusion: Through this project I've learned to express the data analysis process through code and the different tools offered by the JupyterLab application. Data wrangling is crucial in the data analysis process, as it's the only way to obtain reliable data for making the proper decisions in any organization. Using Python made the process much easier and more efficient, and the different libraries used allowed the data to be read and manipulated in a relatively easy way, which will facilitate the process when dealing with much larger amounts of data, such as Big Data. This proves that code is an efficient way to manipulate and alter data. I believe that through my learning process I'll be able to dig deeper into more processes and tools, which will make the process more fruitful and efficient.
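Several of the cleaning steps listed above (replacing the literal string "None" with NaN, collapsing the four dog-stage columns into one, converting the timestamp to datetime, and capitalizing dog names) can be sketched in pandas as follows. The frame is a tiny invented sample, not the project's data, and the merged column is named 'dog_stage' here for clarity.

```python
# A minimal pandas sketch of the cleaning steps described above,
# applied to a tiny invented sample of the archive's column layout.
import numpy as np
import pandas as pd

archive = pd.DataFrame({
    "tweet_id": [1, 2],
    "timestamp": ["2017-01-01 12:00:00", "2017-01-02 08:30:00"],
    "name": ["None", "charlie"],
    "doggo": ["doggo", "None"],
    "floofer": ["None", "None"],
    "pupper": ["None", "None"],
    "puppo": ["None", "puppo"],
})

# 1. Replace the literal string "None" with NaN
archive = archive.replace("None", np.nan)

# 2. Collapse the four stage columns into a single column by taking
#    the first non-null value in each row
stages = ["doggo", "floofer", "pupper", "puppo"]
archive["dog_stage"] = archive[stages].bfill(axis=1).iloc[:, 0]
archive = archive.drop(columns=stages)

# 3. Parse timestamp strings into datetime objects
archive["timestamp"] = pd.to_datetime(archive["timestamp"])

# 4. Capitalize the first letter of dog names (NaN stays NaN)
archive["name"] = archive["name"].str.capitalize()
```

A row tagged with more than one stage would need extra handling; the sketch assumes at most one stage per tweet, as the report's merged 'dog' column implies.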
divyanitin
No description available
iKhushPatel
WeRateDogs Twitter Analysis
douglasnavarro
🐶Analysis of the WeRateDogs twitter page
p-aguila
Data Wrangling Analysis from Twitter Account @WeRateDogs
saiprasadlaxmeshwar
Data Wrangling and Analysis of WeRateDogs Twitter archive.
LinChen1992
Using the Twitter API to analyze data from WeRateDogs
uminomuneaki
Analysis of a Twitter account called WeRateDogs using the Twitter API (Tweepy)
cynthia-obojememe
WeRateDog Twitter dog rating analysis
AkwasiTp
A Udacity twitter data analysis on tweets from WeRateDogs
nsalvine-prog
Data wrangling and analysis project using the WeRateDogs Twitter dataset
Abdulraqib20
This project involved wrangling and analyzing multiple datasets, which were consolidated into one cohesive dataset. The project also included collecting data from the Twitter API. The resulting dataset was used for further analysis and insights. Check out the code and documentation for more details!
Moamen-Abdelkawy
Explore WeRateDogs Twitter Archive Data
nidhim12
In this project we are analyzing a popular Twitter page, @dog_rates, which is provided as a .csv file from the Twitter archive. Tweepy is the library used as an interface to the Twitter API to download JSON data about retweet counts and favorite counts. We also need to download a file from Udacity's servers, using an HTTP request, which predicts whether the images are of dogs or not. We will be gathering, assessing, and cleaning the data.
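The retweet- and favorite-count data mentioned above is typically stored as one JSON object per line; a minimal stdlib sketch of extracting the counts from such lines (the sample lines are invented for illustration, not real API output):

```python
# Minimal sketch: parse line-delimited tweet JSON and pull out the
# retweet and favorite counts. The sample lines are invented.
import json

lines = [
    '{"id": 1, "retweet_count": 10, "favorite_count": 25}',
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}',
]

counts = []
for line in lines:
    tweet = json.loads(line)
    counts.append({
        "tweet_id": tweet["id"],
        "retweet_count": tweet["retweet_count"],
        "favorite_count": tweet["favorite_count"],
    })
```

In the actual projects the lines would come from a file written while querying the API, read with an ordinary `open(...)` loop.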
shanepatterson09
Data Wrangling Project to clean the WeRateDogs tweet data into a functional and effective data set
Boy-Davis
The data wrangling process carried out on @dog_rates (WeRateDogs) twitter account
evanchen13
Udacity Data Analyst Nanodegree Project 4 - WeRateDogs Twitter Data Wrangling & Analysis
JblIdeal
This project analyzes the WeRateDogs Twitter page. The dataset is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. The WeRateDogs Twitter archive contains basic tweet data for all 2356 of their tweets, but not everything. One column the archive does contain, though, is each tweet's text, which I used to extract the rating, dog name, and dog "stage" (i.e. doggo, floofer, pupper, and puppo). Additional data was gathered from Twitter's API for the tweet IDs in the Twitter archive dataset; this additional data includes the like counts, retweet counts, etc. Furthermore, a table of image predictions was obtained from the Udacity website; it contains the results of the neural network classifier that classifies the breeds of dogs, alongside each tweet ID, the image URL, and the image number that corresponded to the most confident prediction.
youshuo2008
Data wrangling and exploratory data analysis on the WeRateDogs twitter data
Suhaila-Ehab
WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage. In this report I describe my efforts to gather, assess, and clean the Twitter data to gain some useful insights. Which do you think is the most lovable dog?
sourojyotipaul
No description available
nietzische
To complete my Udacity Data Analyst Nanodegree Program, I was required to perform an analysis of @WeRateDogs twitter archive with a special emphasis on data wrangling.