There are two key questions to consider when thinking about how FamilySearch users might produce a Family Tree that is not representative of a population. First, what are the characteristics of the users themselves? Specific concerns here are that FamilySearch users must have access to a computer or smartphone, the internet, and free time, and that members of the Church of Jesus Christ of Latter-day Saints may be more prevalent among FamilySearch users than in the general population. Second, how might the behavior of users affect who ends up on the Tree and who does not? For example, are users more likely to look for or find information about successful relatives, which would lead to the over-representation of such relatives on the Tree?
Unfortunately, we do not have demographic information that would allow us to provide summary statistics for the more than 12 million FamilySearch users. However, we can compare the characteristics of records on the Family Tree to other population records to help us assess the representativeness of the Tree. We summarize the key findings here; see Price et al. (2021) for the full results.
When comparing the census profiles that are on the Family Tree to the full population, we see that those on the Tree are similar in terms of gender, age, household size, and the probability of being the household head. However, those on the Tree are more likely to be white, married, and literate, and are more likely to be living in their birth state. Interestingly, we find that those on the Tree have a lower occupation score, which suggests that users are not more likely to look for or find information on more successful relatives.
While these results suggest that there is selection into the Tree along some characteristics, we note that when using the Family Tree data (or any samples produced using it as training data), it is possible to re-weight the sample to be representative of the desired population by following the procedure outlined in Bailey et al. (2019). The fact that the Family Tree includes over 1.2 billion profiles means that even for under-represented groups, there will likely be sufficient support in the data for this approach.
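To make the idea of re-weighting concrete, the following is a minimal sketch of cell-based (post-stratification) weighting. It is a simplified illustration only, not a reproduction of the procedure in Bailey et al. (2019). The function name `poststratification_weights`, the choice of cells, and the toy records are all hypothetical and are not drawn from the Family Tree or census data.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_cells):
    """Return one weight per sample record.

    Each weight is (population share of the record's cell) divided by
    (sample share of that cell), so that weighted cell shares in the
    sample match the population. Assumes every sample cell also
    appears in the population.
    """
    sample_counts = Counter(sample_cells)
    population_counts = Counter(population_cells)
    n_sample = len(sample_cells)
    n_population = len(population_cells)

    weights = []
    for cell in sample_cells:
        population_share = population_counts[cell] / n_population
        sample_share = sample_counts[cell] / n_sample
        weights.append(population_share / sample_share)
    return weights


# Hypothetical toy data: each record is labeled by a (race, literacy) cell.
population = (
    [("white", "literate")] * 6
    + [("nonwhite", "literate")] * 2
    + [("nonwhite", "illiterate")] * 2
)
# The hypothetical sample over-represents the first cell, loosely
# mimicking the direction of selection described above.
tree_sample = (
    [("white", "literate")] * 7
    + [("nonwhite", "literate")] * 2
    + [("nonwhite", "illiterate")] * 1
)

weights = poststratification_weights(tree_sample, population)

# Weighted share of the over-represented cell after re-weighting.
white_literate_share = sum(
    w for w, cell in zip(weights, tree_sample) if cell == ("white", "literate")
) / sum(weights)
```

In this sketch, records from over-represented cells receive weights below one and records from under-represented cells receive weights above one. The approach only works for cells that contain at least one sample record, which is why the size of the Tree matters for under-represented groups.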