Modeling General Patterns of Digit Preference

Digit preference is the common tendency to report certain end-digits substantially more often than the overall shape of the distribution would suggest. The preferred digits are typically multiples of 5 and 10, sometimes combined with a tendency to avoid numbers regarded as unlucky, such as 13. This type of misreporting produces unusual heaps at the preferred digits and is frequently found as age heaping in demography and epidemiology, particularly in historical data. Digit preference can be viewed as an inverse problem in which the observed values are linear compositions of a latent sequence representing the true distribution. Estimating this sequence together with the composition pattern reveals the amount of misreporting.
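To make the inverse-problem view concrete, the following minimal Python sketch simulates the forward direction only: a smooth latent sequence is turned into heaped expected counts by a composition matrix. It is an illustration under stated assumptions, not the estimation procedure itself. The function name `composition_matrix`, the age range, the exponential latent curve, and the 20% transfer proportion are all hypothetical choices made for this example.

```python
import numpy as np


def composition_matrix(n, transfers):
    """Build an n x n composition matrix whose columns each sum to 1.

    `transfers` maps (source_index, target_index) to the proportion of the
    true count at `source_index` that is reported at `target_index`.
    (Hypothetical helper, written for this illustration only.)
    """
    C = np.eye(n)
    for (src, dst), p in transfers.items():
        C[dst, src] += p  # share of the source count reported at the target
        C[src, src] -= p  # the same share is missing at the source
    return C


# Hypothetical smooth latent sequence: true counts at ages 20..40.
ages = np.arange(20, 41)
latent = 1000.0 * np.exp(-0.03 * (ages - 20))

# Assumption: 20% of the counts at each age adjacent to a multiple of 5
# are reported at that multiple of 5.
transfers = {}
for i, age in enumerate(ages):
    if age % 5 == 0:
        if i - 1 >= 0:
            transfers[(i - 1, i)] = 0.2
        if i + 1 < len(ages):
            transfers[(i + 1, i)] = 0.2

C = composition_matrix(len(ages), transfers)

# Expected observed counts are linear compositions of the latent sequence.
expected_observed = C @ latent

for age, true_count, heaped in zip(ages, latent, expected_observed):
    print(f"{age:3d}  {true_count:8.1f}  {heaped:8.1f}")
```

Because every column of `C` sums to 1, the total count is preserved: misreporting only moves counts between digits. The estimation problem runs in the opposite direction, recovering both the latent sequence and the transfer proportions from the heaped counts alone.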

In this project, carried out in collaboration with Jutta Gampe of the MPIDR and Paul H.C. Eilers of the Erasmus Medical Center in Rotterdam, I proposed an approach based on the composite link model that estimates both the latent distribution and the proportions of counts transferred to neighboring digits. An article on this approach was published in 2008 in Statistical Modelling: Modelling general patterns of digit preference (link).

Afterward, we extended this model to a two-dimensional setting for latent distributions with digit preference, in which the strength of the misreporting pattern can vary over time. We published this generalization in the Proceedings of the 24th International Workshop on Statistical Modelling: Modelling trends in digit preference patterns (paper).

We also intend to extend this approach to better understand spatial variation in digit preference patterns.