This week, I compared the number of Chipotle locations with the obesity rate in each U.S. state, because I was curious whether the two were correlated. I took data from two datasets (Chipotle Locations and Obesity Data) and merged them by state. Since the state names were stored differently (full names in the Chipotle Locations dataset, abbreviations like "CA" for California in the Obesity Data), I defined a function that maps each abbreviation to its full state name and applied it to the Obesity dataset. Taking the count of Chipotle locations and the obesity rate for each state, I plotted them against each other on x and y axes. Each blue dot on the scatterplot therefore represents one U.S. state.
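The merge step described above might look roughly like this in pandas. This is only a sketch under assumptions: the column names (`state`, `state_abbrev`, `obesity_rate`), the helper `to_full_name`, and the three-entry lookup table are hypothetical, and the rows are placeholders rather than values from the real datasets.

```python
import pandas as pd

# Hypothetical, truncated lookup table; a real one would cover every state.
ABBREV_TO_NAME = {"CA": "California", "TX": "Texas", "OH": "Ohio"}


def to_full_name(abbrev):
    """Return the full state name for a postal abbreviation, or None if unknown."""
    return ABBREV_TO_NAME.get(abbrev.strip().upper())


# Placeholder rows that only illustrate the assumed shape of each dataset.
chipotle = pd.DataFrame({"state": ["California", "California", "Texas", "Ohio"]})
obesity = pd.DataFrame(
    {"state_abbrev": ["CA", "TX", "OH"], "obesity_rate": [25.0, 30.0, 35.0]}
)

# One row per restaurant becomes one row per state with a location count.
counts = chipotle.groupby("state").size().reset_index(name="chipotle_count")

# Convert abbreviations to full names so both tables share the same key.
obesity["state"] = obesity["state_abbrev"].map(to_full_name)

# An inner join keeps only the states present in both tables.
merged = counts.merge(obesity[["state", "obesity_rate"]], on="state", how="inner")
print(merged)
```

An inner join silently drops any state whose abbreviation is missing from the lookup table, so comparing `len(merged)` with the number of rows in each input is a cheap sanity check.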
To make the correlation easier to see, I also added a line of best fit. It shows a very slight negative correlation between the number of Chipotle locations and the obesity rate, but it is hard to tell whether this is an accurate representation, since the one outlier state with 400+ Chipotle locations seems to have influenced the line significantly. It would be interesting to see whether the line changes if we remove this outlier, but I left it in for the purpose of this assignment.
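For anyone curious about the outlier comparison, here is a minimal sketch of how it could be checked: fit an ordinary least-squares line twice, once on all states and once without any state that has 400 or more locations. The `best_fit` helper, the cutoff constant, and the numbers are all hypothetical placeholders, not values from the notebook, so the slopes here say nothing about the real data.

```python
def best_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder (chipotle_count, obesity_rate) pairs; the last one plays the outlier.
points = [(10, 30.0), (20, 31.0), (30, 32.0), (40, 33.0), (400, 25.0)]

OUTLIER_CUTOFF = 400  # assumed threshold, based on the "400+ locations" state
trimmed = [(x, y) for x, y in points if x < OUTLIER_CUTOFF]

slope_all, intercept_all = best_fit(*zip(*points))
slope_trimmed, intercept_trimmed = best_fit(*zip(*trimmed))

print(f"slope with outlier:    {slope_all:.4f}")
print(f"slope without outlier: {slope_trimmed:.4f}")
```

With these placeholder points a single extreme value is enough to flip the sign of the slope, which is the kind of sensitivity worth checking before reading much into the fitted line.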
Overall, this is an interesting data visualization that shows how much obesity rates and Chipotle location counts vary across U.S. states. However, it doesn't support any firm conclusions or prove a causal relationship. In fact, it would probably be a bad idea to try to argue that a high number of Chipotle locations alone predicts the obesity rate of the people living in that state. Instead, a wide variety of factors most likely affect obesity rates, such as overall diet, stress levels, and levels of physical activity. Those factors could make for an interesting visualization in the future.
You can find the original datasets here (Chipotle, Obesity), and download the full Jupyter Notebook here.