When null hypotheses are not so obvious

I have received a number of questions about the null hypothesis, or null model, used in my recent publication on collective decision-making in mixed-species bird flocks (see link). So I thought I would take a moment to explain why the appropriate null expectation in some systems is not always immediately obvious.

Let's take a simple experimental setup like the one I used in that study. Here, there are four possible areas in a patch where birds can feed (designated as feeders 1 to 4). Each feeder has a given density of birds (given by the rho values), i.e. the proportion of the birds in the patch that are at that feeder. Let's say that, having recorded which feeder each bird chose to feed at, I got what seems to be perfectly random behaviour, where each bird appears to have chosen a feeder at random:

Figure 1: The observed probability of a bird choosing a feeder as a function of the density of birds on that feeder

Having gone to all this effort to collect these data, we could easily despair at the sight of them, as they look obviously random. If we had been collecting data on the feeder choices of birds arriving in the patch (as we did in a companion study, see link), then we might indeed have reason to despair. For example, three birds (i, j, k) flying into the patch and picking a site at random would each be equally likely to choose any of the four feeders:

Figure 2: Schematic of individuals arriving into the patch and choosing a feeder

As a result, if we plotted the null expectation against our data, we would not have much to write about:

Figure 3: The data perfectly fit the null expectation of random feeder choice on arrival into the patch
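For contrast with what follows, here is a minimal Python sketch of this arrival null: each bird arriving in the patch picks one of the four feeders uniformly at random, so every feeder's expected share is 1/4 whatever its current density. The function name `simulate_arrivals`, the number of simulated arrivals and the random seed are illustrative assumptions, not part of the original study.

```python
import random
from collections import Counter

N_FEEDERS = 4  # feeders 1 to 4, stored as indices 0 to 3


def simulate_arrivals(n_arrivals, n_feeders=N_FEEDERS, seed=1):
    """Hypothetical arrival null: each arriving bird picks a feeder uniformly at random.

    Returns the proportion of simulated arrivals landing at each feeder.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n_feeders) for _ in range(n_arrivals))
    return [counts[f] / n_arrivals for f in range(n_feeders)]


proportions = simulate_arrivals(100_000)
print([round(p, 3) for p in proportions])  # each value is close to 0.25
```

Note that the density of birds already on a feeder plays no role here, which is exactly why this null fits the flat data in Figure 3.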

However, let us go back to the original study in question. In this case, we were interested in the movements of individual birds within the patch, that is, where they decided to feed when we detected them moving from one feeder to another. When thinking about null models, it is easy to fall into the trap of treating all feeders as equal, as in the first example. However, because birds are moving within the patch, rather than into it, this is not the case! It is therefore important to frame the null hypothesis in terms of random movements by birds within the patch, rather than a random choice of feeder. Let's start with the decision about which feeder to leave from:

Figure 4: Individuals are unequally distributed across two feeders. What is the probability of leaving any given feeder?

If the data we'd collected (in Figure 1) represented the leaving decisions, we might easily have concluded that the decision was random, that is, that each feeder had an equal probability of being departed from, regardless of the distribution of birds across the patch. However, by working through a more appropriate null model, one where each bird is equally likely to leave its feeder at any given time, I'll show that the expectation of 0.25 is incorrect, and that the proper null hypothesis is density-dependent.

Let us start by randomly picking a bird in Figure 4. We have a 2/3 chance of picking a bird from feeder 1, and a 1/3 chance of picking a bird from feeder 3. So, if birds are departing at random (i.e. not paying any attention to others), then at any given time the probability that the next bird will leave feeder 1 is actually 2/3, the probability that it will leave feeder 3 is 1/3, and the probabilities that it leaves feeders 2 or 4 are 0. Hopefully it is clear that the null hypothesis for departures is therefore not a probability of 0.25 for each feeder; instead, it is equal to the density of individuals at that feeder. We can then plot how this probability compares to our data:

Figure 5: Data plotted against the null expectation of random movements away from a site as a function of its relative density.
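The departure null can be computed exactly. This is a minimal Python sketch, assuming a 2:1 split of birds between feeders 1 and 3 as in Figure 4 (the exact counts of two and one are an assumption; any 2:1 split gives the same result). The function name `departure_null` is hypothetical.

```python
from fractions import Fraction


def departure_null(counts):
    """Null probability that the next departure is from each feeder (hypothetical helper).

    If every bird is equally likely to be the next to leave, choosing the next
    leaver is the same as picking a bird uniformly at random, so each feeder's
    probability equals its share of the birds, i.e. its relative density.
    """
    total = sum(counts)
    return [Fraction(n, total) for n in counts]


# Assumed Figure 4 counts: two birds on feeder 1, one on feeder 3, none on 2 or 4
counts = [2, 0, 1, 0]
print([str(p) for p in departure_null(counts)])  # ['2/3', '0', '1/3', '0']
```

Exact fractions are used so that the result can be compared directly with the 2/3 and 1/3 worked out above.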

Thus, if individuals were leaving sites and behaving in a manner similar to the ideal free distribution, we should expect the points to lie on the y = x line. In our observed data, we instead find that individuals are more likely than expected to leave relatively empty sites (the points at low densities are above the line), and less likely to leave busy sites (the points at high densities are below the line). Note that in the above figure, I have removed the point at x = 0, since we could not have observed an individual leaving an empty site (and this is reflected in the null hypothesis). We can visualise this a bit better by plotting the log of the ratio between the points and the line, i.e. log(points/line):

Figure 6: Log of the ratio between the probability of leaving and the null expectation.

Figure 6 now clearly shows the non-random behaviour of individuals in these toy experimental data. In this plot, the null lies along y = 0; points above the line are higher-than-expected probabilities of leaving, and points below it are lower-than-expected probabilities. So we can conclude that, although our data initially looked like random behaviour, birds in fact moved towards each other more than expected by pure chance.
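The log-ratio transform used for Figure 6 is simple to compute. Below is a minimal sketch with toy numbers: the four densities are hypothetical values chosen for illustration, and the observed departure probabilities are all 0.25, mimicking the random-looking data of Figure 1. Since the departure null equals the density, the expected values are just the densities themselves.

```python
import math


def log_ratio(observed, expected):
    """Natural log of observed/expected; 0 means the data match the null exactly."""
    return [math.log(o / e) for o, e in zip(observed, expected)]


# Hypothetical relative densities at four feeders (they sum to 1).
# Empty feeders are left out, as in Figure 5, because the null there is 0.
densities = [0.1, 0.2, 0.3, 0.4]
observed = [0.25, 0.25, 0.25, 0.25]  # flat, random-looking departure data

ratios = log_ratio(observed, densities)
print([round(r, 3) for r in ratios])  # [0.916, 0.223, -0.182, -0.47]
```

Positive values at low densities and negative values at high densities reproduce the pattern described above: flat data look random on the raw scale but deviate from the density-dependent null.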

The next case, identifying whether the arrival decisions of birds that have moved differ from random, is slightly trickier. Here is a schematic of what I mean:

Figure 7: Individual k leaves feeder 1 and moves to feeder 3.

Hopefully it is obvious straight away that the probability of bird k landing on any given feeder is not 0.25, because if it landed on feeder 1 it would not have been counted as a movement. In this case, it has a 1/3 probability of picking each of the three remaining feeders (2, 3, 4). However, before we can simply conclude that the random expectation of picking a feeder is now 1/3, we also need to account for the fact that not all feeders are equally likely to have an arrival probability of 0 (i.e. not all feeders are equally likely to be the one a bird departs from). First, we have to quantify the probability of individuals leaving from any given feeder (see above), which is 2/3 for feeder 1, 0 for feeder 2, 1/3 for feeder 3, and 0 for feeder 4. So there is a 2/3 chance that feeder 1 will have an arrival probability of 0, because the moving bird departed from it, and a 1/3 chance that it will have an arrival probability of 1/3. Combining these, we get (2/3 * 0) + (1/3 * 1/3) = 1/9 ≈ 0.111. Similarly, for feeder 3, we get (1/3 * 0) + (2/3 * 1/3) = 2/9 ≈ 0.222. Feeders 2 and 4 can never be the feeder a bird departed from, so their arrival probability is simply 1/3. We can plot these values against the observed data:

Figure 8: Data plotted against the null expectation of random movements to a feeder from another.
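The same calculation can be written as an exact enumeration over possible departure feeders. This is a minimal Python sketch, assuming the configuration of Figure 7 is two birds on feeder 1 and one on feeder 3, and assuming the null in which the moving bird is picked at random and then lands on one of the other feeders with equal probability. The function name `arrival_null` is hypothetical.

```python
from fractions import Fraction


def arrival_null(counts):
    """Null probability that a moving bird lands on each feeder (hypothetical helper).

    Assumed null: the mover is a bird picked uniformly at random (so the
    departure probability equals density), and it lands on one of the
    *other* feeders with equal probability.
    """
    total = sum(counts)
    k = len(counts)
    probs = [Fraction(0)] * k
    for origin, n in enumerate(counts):
        p_depart = Fraction(n, total)  # departure null = relative density
        for dest in range(k):
            if dest != origin:  # landing back on the origin is not a movement
                probs[dest] += p_depart * Fraction(1, k - 1)
    return probs


# Assumed Figure 7 configuration: two birds on feeder 1, one on feeder 3
counts = [2, 0, 1, 0]
arrive = arrival_null(counts)
print([str(p) for p in arrive])  # ['1/9', '1/3', '2/9', '1/3']

# Agrees with the closed form P(arrive) = (1 - density)/3 for four feeders
densities = [Fraction(n, sum(counts)) for n in counts]
assert arrive == [(1 - d) / 3 for d in densities]
```

The enumeration reproduces the 0.111 and 0.222 values worked out above, plus 1/3 for the two empty feeders.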

Figure 8 shows a clear negative density-dependent relationship between the density of individuals at a feeder and the probability that a randomly moving bird will land on that feeder. This makes a lot of sense: if all the birds are on one feeder (density = 1), then the probability that a bird will move away from a feeder and land on that one is 0, because it must have left that feeder to start with (note that I've removed the data point at x = 1 because we could not have observed it). Similarly, all empty feeders (density = 0) have an equal probability of 1/3. Feeders with intermediate densities fall between these two extremes, and in general the probability is given by the function P(arrive) = (1 - density)/3 (shown as the solid line). Again, we can plot this as a log ratio to show more clearly how the original data deviate strongly from the true null hypothesis:

Figure 9: Log of the ratio between the probability of arriving and the null expectation. 

Figure 9 shows again that observed data lying along the 0.25 line are not necessarily random in such a system.
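For readers who want the general form, the (1 - density)/3 rule behind the solid line in Figure 8 can be derived in a few lines. This is a sketch in notation introduced here, assuming four feeders with relative densities ρ_g that sum to 1, a departure null equal to density as derived above, and a moving bird that lands on each of the other three feeders with equal probability:

```latex
\begin{aligned}
P(\text{arrive at } f)
  &= \sum_{g=1}^{4} P(\text{depart from } g)\,
     P(\text{land on } f \mid \text{depart from } g) \\
  &= \sum_{g \neq f} \rho_g \cdot \tfrac{1}{3}
     \qquad \text{(the term } g = f \text{ is 0: a bird cannot move to its own feeder)} \\
  &= \frac{1 - \rho_f}{3}
     \qquad \text{(using } \textstyle\sum_{g} \rho_g = 1\text{)}
\end{aligned}
```

For the configuration in Figure 7, this gives 1/9 for feeder 1 (ρ = 2/3), 2/9 for feeder 3 (ρ = 1/3) and 1/3 for each empty feeder.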

Hopefully, this highlights the importance of taking care when assigning null expectations to data. This issue is particularly common with social data, as individuals' decisions can change the state of the system. The key in this case was that the null hypothesis should be based on randomly picking individual birds to make decisions in each possible scenario, rather than assigning an equal probability to each feeder at the start.
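The idea of randomly picking individual birds can also be checked by simulation rather than by hand. Below is a minimal Monte Carlo sketch of that individual-based null, assuming two birds on feeder 1 and one on feeder 3 (as in Figure 7) and four feeders; the function name `simulate_moves`, the number of simulated moves and the seed are illustrative assumptions. Bird positions are not updated between moves, so every simulated move starts from the same snapshot, matching the reasoning above.

```python
import random
from collections import Counter


def simulate_moves(feeder_of_bird, n_feeders=4, n_moves=200_000, seed=42):
    """Monte Carlo estimate of the individual-based null (hypothetical helper).

    For each simulated movement: pick one bird uniformly at random (which
    fixes the departure feeder), then move it to one of the other feeders
    uniformly at random. Positions are never updated, so every move starts
    from the same configuration. Returns estimated departure and arrival
    probabilities for each feeder.
    """
    rng = random.Random(seed)
    departures, arrivals = Counter(), Counter()
    for _ in range(n_moves):
        origin = rng.choice(feeder_of_bird)  # random bird -> its current feeder
        dest = rng.choice([f for f in range(n_feeders) if f != origin])
        departures[origin] += 1
        arrivals[dest] += 1
    depart_p = [departures[f] / n_moves for f in range(n_feeders)]
    arrive_p = [arrivals[f] / n_moves for f in range(n_feeders)]
    return depart_p, arrive_p


# Feeder index of each bird: two birds on feeder 1 (index 0), one on feeder 3 (index 2)
depart_p, arrive_p = simulate_moves([0, 0, 2])
print([round(p, 2) for p in depart_p])  # approximately [0.67, 0.0, 0.33, 0.0]
print([round(p, 2) for p in arrive_p])  # approximately [0.11, 0.33, 0.22, 0.33]
```

The simulated frequencies converge on the same density-dependent nulls derived analytically: departures proportional to density, and arrivals following (1 - density)/3.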

Finally, thanks to Richard Mann for pointing this out to me in the first place.