Data-Wrangling-WeRateDogs

This project wrangles (and then analyzes and visualizes) the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about each dog. It has over 4 million followers and has received international media coverage.

The project consists of three major sections:
1. Data gathering
2. Data assessing
3. Data cleaning

The project involves three datasets:
1. Enhanced Twitter archive
2. Additional data obtained via the Twitter API
3. Image predictions file

The enhanced Twitter archive file was downloaded from the web. Retweet count and favorite count are two notable columns missing from this archive. Fortunately, this additional data can be gathered from Twitter's API by anyone, at least for the 3000 most recent tweets. Hence, Twitter's API was queried to gather this valuable data. Using the tweet IDs in the WeRateDogs Twitter archive, the Twitter API was queried for each tweet's JSON data using Python's Tweepy library, and each tweet's entire set of JSON data was stored in a file called tweet_json.txt.

The image predictions file is a TSV file (image_predictions.tsv) that records what is present in each tweet according to a neural network. It was hosted on a server and was downloaded programmatically using the Requests library from the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv.
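The gathering steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `download_file`, `save_tweet_json`, and `read_counts` are hypothetical, the one-JSON-object-per-line layout of tweet_json.txt is an assumption, and `api` is assumed to be an authenticated Tweepy API object (anything exposing `get_status` would work).

```python
import json
from pathlib import Path


def download_file(url, dest):
    """Download a file (e.g. the image predictions TSV) with Requests."""
    # Imported here so the other helpers run without the third-party package.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    Path(dest).write_bytes(response.content)


def save_tweet_json(api, tweet_ids, out_path="tweet_json.txt"):
    """Write each tweet's full JSON on its own line; return IDs that failed.

    `api` is assumed to behave like a Tweepy API object. A broad except is
    used because the Tweepy exception class name differs between versions.
    """
    failed = []
    with open(out_path, "w", encoding="utf-8") as out:
        for tweet_id in tweet_ids:
            try:
                status = api.get_status(tweet_id, tweet_mode="extended")
            except Exception:
                failed.append(tweet_id)  # e.g. deleted or protected tweet
                continue
            out.write(json.dumps(status._json) + "\n")
    return failed


def read_counts(path="tweet_json.txt"):
    """Extract the tweet ID, retweet count, and favorite count per line."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return rows
```

The rows returned by `read_counts` could then be joined to the archive on the tweet ID. Returning the failed IDs from `save_tweet_json` keeps a record of tweets that could not be fetched rather than silently dropping them.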

Created on Apr 23, 2022

Updated on Apr 23, 2022


Recent Commits

Feats: The twitter archive enhanced file

Timileyin Samuel Akintilo•3 years ago

1531bf3


Top Contributors

Timmtet (User): 1 commit