Wrangle and Analyze Data

WeRateDogs Project 

Image via Boston Magazine


This project, known as Wrangle and Analyze Data, involves wrangling the WeRateDogs Twitter archive covering the period from 2015 to 2017. The archive, which WeRateDogs made available to Udacity, serves as the dataset for this project. See archive here.

The goal of this project is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive, however, contains only very basic information. Additional data was therefore gathered using the Twitter API, and image prediction data was also collected here, before assessing and cleaning the data.
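As a rough illustration of the gathering step, the sketch below pulls the retweet and favorite counts out of saved API responses. It assumes the responses were stored one JSON object per line, which is an assumption about the storage layout rather than something stated here. The field names `id`, `retweet_count`, and `favorite_count` follow the Twitter API v1.1 tweet object, and `extract_counts` is a hypothetical helper name.

```python
import json


def extract_counts(json_lines):
    """Collect tweet ID, retweet count, and favorite count from JSON lines.

    Assumes each non-blank line holds one tweet object as returned by the
    Twitter API (v1.1 field names); this layout is an assumption.
    """
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return rows


# Made-up line for illustration only; not real WeRateDogs data.
sample_lines = ['{"id": 1, "retweet_count": 5, "favorite_count": 12}']
print(extract_counts(sample_lines))
```

The resulting rows can then be joined to the archive on the tweet ID before assessment and cleaning.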

After cleaning, assessing, and analyzing all the data collected from the different sources, we drew some insights and produced visualizations to showcase them. The following questions formed the basis of our analysis:

In answering the first question, we found that of the 1994 rows of image prediction data analyzed, only 1477 were images of dog breeds, while 517 were not. This means that between November 2015 and August 2017, only 74% of the images posted for rating on the WeRateDogs Twitter page were actual pictures of dogs; the remaining 26% were not.
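The percentages follow directly from the row counts quoted above. A minimal check, where `share` is a hypothetical helper name:

```python
def share(part, total):
    """Return part/total as a percentage rounded to the nearest whole number."""
    return round(100 * part / total)


total_rows = 1994                    # image prediction rows analyzed
dog_rows = 1477                      # rows identified as dog breeds
non_dog_rows = total_rows - dog_rows

print(non_dog_rows)                      # 517
print(share(dog_rows, total_rows))       # 74
print(share(non_dog_rows, total_rows))   # 26
```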

We also found that within the same period, images of only 374 dog breeds were rated on the WeRateDogs Twitter timeline, out of the 1477 dog images assessed. 12/10 was the most frequently given rating, and the mean dog rating was 10/10. The lowest dog rating was 0, as shown in the chart below. Only 35 dog images received a rating of 14/10, the highest rating given within the period.

P.S. Note that these figures do not include images that were not dog breeds.
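Summary figures of this kind (most frequent rating, mean, lowest, highest, and how many images received the highest) can be computed along the following lines. This is a sketch, not the notebook's actual code: `summarize_ratings` is a hypothetical helper, and the ratings in the example are toy values, not the project's data.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(numerators):
    """Summarize a list of rating numerators (each rating is out of 10)."""
    counts = Counter(numerators)
    most_common_rating, _ = counts.most_common(1)[0]
    highest = max(numerators)
    return {
        "most_common": most_common_rating,
        "mean": mean(numerators),
        "lowest": min(numerators),
        "highest": highest,
        "count_at_highest": counts[highest],
    }


# Toy values for illustration only; not the WeRateDogs ratings.
toy_ratings = [12, 12, 10, 0, 14, 11, 12]
print(summarize_ratings(toy_ratings))
```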

Lastly, our analysis showed that within the period in question, November 2015 to August 2017, the tweet with the highest retweet count and favorite count (likes) was posted on the 18th of June 2016 at 7:26 pm from an iPhone. Its text read "Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad)", followed by the tweet URL, and it had received 70,331 retweets and 144,247 likes.
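Finding the top tweet amounts to taking the maximum over a count column. A minimal sketch, assuming each tweet is a dictionary with `retweet_count` and `favorite_count` keys; `top_tweet` is a hypothetical helper and the records below are toy values, not the real tweets.

```python
def top_tweet(tweets, key="retweet_count"):
    """Return the tweet record with the largest value for the given key."""
    return max(tweets, key=lambda tweet: tweet[key])


# Toy records for illustration only; not real WeRateDogs tweets.
toy_tweets = [
    {"tweet_id": "a", "retweet_count": 10, "favorite_count": 40},
    {"tweet_id": "b", "retweet_count": 30, "favorite_count": 90},
    {"tweet_id": "c", "retweet_count": 20, "favorite_count": 55},
]

most_retweeted = top_tweet(toy_tweets, key="retweet_count")
most_liked = top_tweet(toy_tweets, key="favorite_count")

# Check whether the same tweet tops both measures.
print(most_retweeted["tweet_id"], most_liked["tweet_id"])
```

Running the same lookup on both columns confirms whether a single tweet leads on retweets and likes alike.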


Note that the results of this analysis are valid as of the 6th of September 2022, when the analysis was carried out.


Project documentation is available on GitHub.

See also the Jupyter notebook with the Python code here.