Data

Data Management:

The dataset used in this study is part of a larger dataset under active development in the Rueppell lab at the University of Alberta. As such, it had to be cleaned and summarized for analysis. Much of the data was stored in a central Excel file, though some auxiliary files contained data on variables such as honey production and queen longevity. Queen longevity was not exact (it was known only to the nearest month) and had to be inferred from various datasets related to different aspects of the project. Because of the uncertainty surrounding time of death for queens that perished over winter, and the resulting odd distribution of points in the longevity scatter plots (Figure 3), we opted to use overwintering success as the response variable corresponding to overall colony success. Winter is also the period in which colonies are most at risk of death, and overwintering failure is a relevant concern for Canadian beekeepers. However, we did include longevity in our analysis of differences among honey bee stocks. Our other metric of colony success is the reason that beekeepers manage colonies in the first place: honey production. We used 2023 honey output, measured in kilograms.

Figure 3. Example scatter plot of approximate queen longevity following field placement plotted against a predictor variable (here geotaxis score). Note the large gap in point distribution over winter months.

Figure 4.  An example line graph depicting the type of error prevalent in the temperature data.

Temperature and humidity readings were stored in individual files for each queen. We encountered considerable sensor error in these readings: the temperature for most queens stagnated at a single value for long periods of time, as can be seen in Figure 4. This tended to occur later in the recording period rather than earlier, which led us to focus on August 2022 for our summary statistics. Our analysis for this aspect of the project focused on temperature variance as a metric of a colony's ability to thermoregulate. After the data was summarized, it was entered into a master datasheet, along with the data from the central file, honey production, and longevity discussed above. A portion of this sheet can be seen below in Table 1.
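The summary step described above can be sketched in Python. This is a minimal illustration and not the code used in the study: the function names, the (timestamp, temperature) input layout, and the example values are all assumptions. It restricts one queen's log to August 2022, computes the sample variance of temperature, and reports the longest run of identical consecutive readings as a rough indicator of the sensor stagnation shown in Figure 4.

```python
from datetime import datetime
from statistics import variance


def longest_flat_run(values):
    """Return the length of the longest run of consecutive identical readings."""
    longest = 0
    current = 0
    previous = None
    for value in values:
        if current and value == previous:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous = value
    return longest


def summarize_august_2022(readings):
    """Summarize one queen's temperature log for August 2022.

    readings: iterable of (datetime, temperature) pairs (assumed layout).
    Returns (sample variance, longest flat run); the variance is None
    when fewer than two August readings are available.
    """
    temps = [temp for stamp, temp in readings
             if stamp.year == 2022 and stamp.month == 8]
    var = variance(temps) if len(temps) >= 2 else None
    return var, longest_flat_run(temps)


# Illustrative values only, not data from the study.
example = [
    (datetime(2022, 7, 31, 23), 10.0),  # outside August 2022, ignored
    (datetime(2022, 8, 1, 0), 34.0),
    (datetime(2022, 8, 1, 1), 35.0),
    (datetime(2022, 8, 1, 2), 36.0),
    (datetime(2022, 8, 1, 3), 36.0),
]
print(summarize_august_2022(example))
```

The flat-run length is returned as a diagnostic only. Whether a given run is long enough to indicate sensor error depends on the logging interval, which this sketch does not assume.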


Table 1. Sample of the data table used in analysis, containing predictor, response, and identifying variables. 

Figure 5. Box plots showing the distribution of A) head width, B) body weight, C) temperature variance, D) geotaxis score, E) number of turns, F) sperm count, G) sperm viability, H) ovary weight, and I) ovariole count by genetic stock.

There are many variables present in the dataset for this project, and we had to select a handful from each variable category for our analysis. Our predictor variables (and reasons for their selection) in this study include:

In addition to being evaluated as predictors of the colony success metrics (overwintering success and honey output), the above variables were also used as dependent variables in the analysis of differences in queen and colony attributes by stock. The following destructively sampled internal queen traits were also evaluated for differences by stock:

Figure 5 provides a preliminary look at the distribution of some of those variables within each genetic stock tested. It allows us to see outliers that may need to be accounted for in further analysis, such as queen #134, who is more than twice as heavy as any other queen (Figure 5B). She was removed when calculating statistics for body weight, as her recorded weight was likely the result of a data-entry error. Control colonies (sensors placed in empty brood boxes) also needed to be removed from the temperature data, as Figure 5C shows.
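As a rough illustration of this kind of screening, the Python sketch below flags any record whose value is more than twice the largest value among all other records, which is the pattern described for body weight. The function name, the 2x threshold parameter, the queen identifiers, and the weights are hypothetical; in the study, outliers were identified by inspecting the box plots.

```python
def flag_extreme_values(values_by_id, ratio=2.0):
    """Return ids whose value exceeds `ratio` times the largest other value.

    values_by_id: dict mapping an identifier to a numeric measurement.
    """
    flagged = []
    for item_id, value in values_by_id.items():
        others = [v for k, v in values_by_id.items() if k != item_id]
        if others and value > ratio * max(others):
            flagged.append(item_id)
    return flagged


# Hypothetical identifiers and weights, not data from the study.
example_weights = {"Q1": 0.20, "Q2": 0.22, "Q3": 0.50, "Q4": 0.21}
suspect = flag_extreme_values(example_weights)

# Drop flagged records before computing summary statistics.
cleaned = {k: v for k, v in example_weights.items() if k not in suspect}
print(suspect, cleaned)
```

A flagged record is only a candidate for removal. Checking it against the original data sheet is still needed to decide whether it reflects an entry error.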