In this project, the dataset that was wrangled (and then analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. WeRateDogs has over 4 million followers and has received international media coverage.

The project consists of three major sections:
1. Data gathering
2. Data assessing
3. Data cleaning

The project involves three datasets:
1. Enhanced Twitter Archive
2. Additional Data obtained via the Twitter API
3. An Image Predictions File

The enhanced Twitter archive file was downloaded from the web. The retweet count and favorite count are two notable columns missing from this archive. Fortunately, this additional data can be gathered from Twitter's API, at least by anyone who has access to data for the 3000 most recent tweets. Hence, Twitter's API was queried to gather this valuable data. Using the tweet IDs in the WeRateDogs Twitter archive, the Twitter API was queried for each tweet's JSON data using Python's Tweepy library, and each tweet's entire set of JSON data was stored in a file called tweet_json.txt.

The Image Predictions File is a TSV file (image_predictions.tsv) containing predictions of what is present in each tweet according to a neural network. It was hosted on a server and was downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv.
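The gathering steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (download_file, save_tweet_json, read_tweet_counts) are hypothetical, `api` is assumed to be an already-authenticated tweepy.API object, and tweet_json.txt is assumed to hold one JSON object per line.

```python
import json


def download_file(url, dest_path):
    """Download `url` to `dest_path` with the Requests library."""
    # Third-party import kept local so the helpers below work without it.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open(dest_path, "wb") as fh:
        fh.write(response.content)


def save_tweet_json(api, tweet_ids, out_path):
    """Write each tweet's full JSON to `out_path`, one object per line.

    `api` is assumed to be an authenticated tweepy.API instance.
    Returns the IDs that could not be fetched (e.g. deleted tweets).
    """
    failed = []
    with open(out_path, "w", encoding="utf-8") as fh:
        for tweet_id in tweet_ids:
            try:
                status = api.get_status(tweet_id, tweet_mode="extended")
            except Exception:
                # The exact exception class differs between Tweepy versions,
                # so this sketch catches broadly and records the failure.
                failed.append(tweet_id)
                continue
            fh.write(json.dumps(status._json) + "\n")
    return failed


def read_tweet_counts(lines):
    """Extract the ID, retweet count and favorite count from JSON lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append(
            {
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            }
        )
    return rows
```

For example, download_file could be called with the image predictions URL given above and a local path such as image_predictions.tsv, and read_tweet_counts could be given the open tweet_json.txt file object; the resulting rows can then be loaded into a table for the assessing and cleaning stages.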