BEES

Investigating What Affects Honey Bee Populations and What Honey Bee Populations Affect in the USA

THE GENERAL STATE OF THE BEES

How have bee populations changed over time?

The #savethebees campaign has been trending for almost a decade and a half now, and we often hear the emphatic claim that 'the bees are dying!', but how true is that actually?

Starting around 1992, honey bee populations decreased drastically, and this trend continued until around 2005. Since then, honey bee populations have been relatively stable.

Have bees been dying everywhere?

Interestingly enough, no! Honey bees have actually seen increases in some states.

The graph on the right shows the five states with the largest honey bee populations in the US. While California's honey bee population has been decreasing for the last few decades, North Dakota has actually seen an increase in colonies, and Florida, South Dakota, and Minnesota have been relatively unchanged.

In short, how well bees survive and thrive depends on a variety of factors, including geography, and varies from state to state.

THE BIGGEST KILLER: THE WINTER

How do colony losses differ by state?

Just looking at the maps on the right, we can tell that the 2011 and 2013 winters saw relatively light losses, while the 2012 and 2014-2016 winters were pretty bad.

Geographically, states in the Midwest and Mid-Atlantic experience the worst bee die-offs.

Below, I've created bar charts for states with the highest loss rates and lowest loss rates.

Which states are the best and worst for bees?

States with Highest Hive Loss Percentage

States with the highest hive loss % over the winter tend to be located in the Mid-Atlantic and Midwest.

States with Lowest Hive Loss Percentage

States with the lowest hive loss % over the winter tend to be located in the Western US.

How do colonies die?

The two main reasons for colony loss are deadouts and colony collapse disorder.

Deadouts are when all the bees in a colony die. Bees' immune systems weaken in the winter, making them more susceptible to infections and parasites (like varroa mites).

Colony Collapse Disorder, or CCD, is when the majority of bees in a colony permanently fly away from the hive leaving just the queen and a few worker bees. CCD is still being studied to determine what factors contribute to it.

Unfortunately, the USDA only began mandating data collection on CCD and deadout losses in 2015, so there isn't much data. Based on the graph, we can see that deadouts are far more common; in addition, both types of colony loss show seasonal trends.

Is there a correlation between cold temperatures and colony loss?

To answer this question, I used a dataset of temperature anomalies by year (dating back to the late 1800s). I then took the minimum temperature anomaly for each year and checked whether this value was correlated with the year-to-year percentage change in bee hives.
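The two series described above can be derived with a few lines of code. The following is a minimal Python sketch under stated assumptions, not the code used for the charts here: the function names, the (year, anomaly) record layout, and all the numbers are made up for illustration.

```python
def yearly_minimum(records):
    """Return {year: lowest anomaly seen that year} from (year, anomaly) pairs."""
    lowest = {}
    for year, anomaly in records:
        if year not in lowest or anomaly < lowest[year]:
            lowest[year] = anomaly
    return lowest


def pct_change(series):
    """Return {year: % change from the previous year} for a {year: value} dict."""
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        if curr - prev == 1:  # skip gaps so only consecutive years are compared
            changes[curr] = (series[curr] - series[prev]) / series[prev] * 100
    return changes


# Made-up illustrative values, NOT the real temperature or colony data.
anomalies = [(2001, -0.4), (2001, 0.3), (2002, -1.1), (2002, 0.2), (2003, -0.2)]
colonies = {2001: 2500, 2002: 2000, 2003: 2100}

min_anomaly = yearly_minimum(anomalies)
colony_change = pct_change(colonies)

# Keep only the years present in both series so they can be compared directly.
shared_years = sorted(set(min_anomaly) & set(colony_change))
print(shared_years)
```

Note that the first year of colony data has no percentage change, so the two series only line up from the second year onward.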

First, I wanted to look at the data to see if there was a general trend. On the left is the minimum temperature anomaly by year, and on the right is the percentage change in colonies from year to year. There doesn't appear to be any shared pattern in the data.

Now I wanted to see the direct influence of temperature on the number of colonies, so I plotted both on a graph (left) and looked at the correlation.

I then computed the correlation between the % change in colonies and the temperature anomaly. I was expecting to find a high positive correlation (more negative temperatures = harsher winter = more hives lost = more negative percentage change in colonies).

I found that the correlation was 0.23.

Although the correlation was much weaker than I expected, my intuition about it being positive was correct.
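For readers who want to see what goes into that number, here is a minimal Python sketch of the Pearson correlation coefficient, which is assumed here to be the measure behind the 0.23 (it is the usual default). The function name and the sample values are made up for illustration and are not the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and the root sum of squared deviations of each series.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return co_dev / (dev_x * dev_y)


# Made-up illustrative values, NOT the real data.
temp_anomaly = [-1.1, -0.2, -0.8, 0.1, -0.5]    # yearly minimum anomaly
colony_change = [-6.0, 1.5, 2.0, 0.5, -3.0]     # % change in colonies
r = pearson(temp_anomaly, colony_change)
print(round(r, 2))
```

For a sense of scale, squaring a correlation of 0.23 gives about 0.05, so a straight-line relationship with the temperature anomaly would account for only around 5% of the variation in yearly colony change.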

WHAT DO BEE POPULATIONS AFFECT?

How are colonies correlated with prices of consumer goods?

Correlation Table with Food CPIs [V1]

Using R, I scraped CSV files of different Consumer Price Indices from FRED (Federal Reserve Bank of St. Louis) for an assortment of food products.

I hypothesized that 'melons', 'fruits', and 'almonds' would have a strong negative correlation with 'colonies'. Because melons, fruits, and almonds are mainly pollinated by honey bees, I assumed that as the number of colonies decreased, the prices of these goods would increase.

Although there was some negative correlation between colonies and these categories, there was also negative correlation between colonies and 'meats/poultry/fish/eggs', 'alcohol', 'dairy', 'cereal', and 'sugar', which I would not have expected.

One flaw in my approach, I thought, was that I compared the total number of colonies against the level of each index, when it might make more sense to look at the correlation between the % change in colonies and the % change in these price indices.

Correlation Table between Colonies and Price Indices of Various Consumer Goods

(Last row 'Colonies' is the most relevant)

Correlation Table with Food CPIs [V2]

For this correlation chart, I looked at how the % change in colonies is correlated with the % change in price indices for a variety of consumer goods (as opposed to absolute numbers).

Again, I expected that 'melons', 'fruits', and 'almonds' would have a strong negative correlation with 'colonies', and that the remaining price indices would show no correlation. The results, however, looked relatively similar to the first table.

These correlation charts show that while prices of goods may depend on honey bee populations, there are likely a myriad of other factors that influence them.

*I also tried lagging the colony % change data by 1 year to see if prices would be affected after the fact, and actually saw correlations much closer to 0.

Correlation Table between % Change in Colonies and % Change in Price Indices of Various Consumer Goods

(Last row 'Colonies' is the most relevant)

Can we predict honey prices?

Average honey price is strongly negatively correlated with production of honey [-0.8] and moderately negatively correlated with pounds of honey per colony [-0.59] and number of colonies [-0.45].

I built a linear model to try predicting the price of honey from the other three variables. Although the overall model is statistically significant (p-value = 3.972e-07) and the r-squared is reasonably high (0.6818), none of the variables is individually a statistically significant predictor, so I don't have much faith in the quality of the model. Nonetheless, I thought it'd be interesting to look at.
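For reference, a linear model of this shape can be fit by ordinary least squares. The Python sketch below is an illustration under stated assumptions, not the model reported above: the function names are hypothetical, the predictor values and prices are made up and rescaled to small numbers, and it reports only coefficients and r-squared (not the p-values). One possible explanation for a significant overall model with no individually significant predictors is that the predictors overlap heavily, for instance if total production is roughly colonies times pounds per colony, which is an assumption about the dataset.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Fit y ≈ b0 + b1*x1 + ... by least squares; return (coefficients, r_squared)."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (XᵀX)·beta = Xᵀy
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Made-up, rescaled values: (production, pounds per colony, colonies), NOT real data.
predictors = [(1.0, 2.0, 0.5), (2.0, 1.0, 1.5), (3.0, 5.0, 1.0), (4.0, 3.0, 2.5),
              (5.0, 8.0, 2.0), (6.0, 4.0, 3.5), (7.0, 6.0, 3.0)]
prices = [4.1, 3.0, 2.6, 1.9, 1.2, 0.9, 0.4]

coefficients, r_squared = fit_ols(predictors, prices)
print([round(c, 3) for c in coefficients], round(r_squared, 3))
```

Solving the normal equations directly keeps the sketch dependency-free, but it is numerically fragile when predictors are on very different scales or nearly collinear, so real values would need rescaling or a dedicated statistics library.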