with Jia Chen, Andrew M. Jones and Bin Peng
[Summary] Empirical and methodological work on using the generalised linear model (GLM) to model healthcare costs has concentrated mostly on selecting the correct link and variance functions. Another type of misspecification, namely misspecification of the functional form of key covariates, has been largely neglected. In many applications, continuous covariates enter the model linearly, so the relationship between these covariates and the response variable is determined entirely by the chosen link function. This can lead to biased results when the true relationship is more complicated. To address this problem, we propose a hybrid model that combines the extended estimating equations (EEE) model with partially linear additive functions. More specifically, we partition the index function of the EEE model into additive components: a linear combination of some covariates, plus unknown functions of the remaining covariates, which are believed to enter the index non-linearly. The estimator for the new model is developed within the EEE framework and is based on the method of sieves: each unknown function is approximated by a set of basis functions, which then enter the model just like the other predictors. This minimises the need for programming, as estimation can be carried out with existing EEE software. We illustrate the new model and its estimation procedure with an empirical example on how children's Body Mass Index (BMI) z-score, measured at age 4-5, relates to their accumulated healthcare costs over a 5-year period. The results suggest that the new model can reveal complex relationships between covariates and the response variable.
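The sieve step can be illustrated with a minimal sketch. Writing the index as g(E[y | x, z]) = x'beta + m1(z) (our notation), the unknown m1 is replaced by a linear combination of basis functions, sum_k gamma_k B_k(z), and the B_k(z) columns are simply appended to the design matrix. The sketch assumes a cubic truncated-power basis with hand-picked knots. To stay short, it fixes a log link and a variance proportional to mu^2, whereas a full EEE fit would also estimate the Box-Cox link parameter and the variance function. The function names (`sieve_basis`, `fit_power_variance_glm`) are hypothetical, and the data are simulated rather than taken from the study.

```python
import numpy as np


def sieve_basis(x, knots, degree=3):
    """Truncated power spline basis (no intercept column) used to approximate m1(x)."""
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def fit_power_variance_glm(X, y, var_power=2.0, max_iter=100, tol=1e-8):
    """IRLS for a GLM with log link and Var(y|x) proportional to mu**var_power.

    Simplification: the link and variance power are fixed, not estimated as in EEE.
    Column 0 of X is assumed to be the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response for the log link
        w = mu ** (2.0 - var_power)      # IRLS weight (dmu/deta)^2 / V(mu)
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def m1_true(x):
    """Hypothetical non-linear effect used only to simulate data."""
    return 0.5 * np.tanh(x) + 0.1 * x ** 2


# --- synthetic illustration (simulated data, not the study's) ---
rng = np.random.default_rng(0)
n = 20000
bmi_z = rng.uniform(-3.0, 3.0, n)   # covariate entering the index non-linearly
mage = rng.normal(30.0, 5.0, n)     # covariate entering the index linearly
eta_true = 5.0 + 0.02 * (mage - 30.0) + m1_true(bmi_z)
shape = 2.0
y = rng.gamma(shape, np.exp(eta_true) / shape)  # gamma draws with mean exp(eta_true)

knots = [-1.0, 0.0, 1.0]
X = np.column_stack([np.ones(n), mage - 30.0, sieve_basis(bmi_z, knots)])
beta = fit_power_variance_glm(X, y)

grid = np.linspace(-2.5, 2.5, 51)
m1_hat = sieve_basis(grid, knots) @ beta[2:]
# m1 is identified only up to a constant (absorbed by the intercept), so compare centred curves
err = np.max(np.abs((m1_hat - m1_hat.mean()) - (m1_true(grid) - m1_true(grid).mean())))
print(f"linear coefficient: {beta[1]:.4f}, max centred error of m1: {err:.3f}")
```

Because the basis columns are ordinary regressors, the same design matrix could in principle be passed to any GLM or EEE routine. The number and placement of knots control the bias-variance trade-off of the approximation.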
Top panel: m1(BMI z-score) estimated using the method of sieves, with confidence intervals. Bottom panel: the marginal effects of BMI z-score at age 4-5 on 5-year accumulated MBS costs (mother's age at birth is fixed at its mean of 30 and all dummy variables are set at their reference levels), with confidence intervals. The dashed lines indicate the approximate cut-points for the BMI categories, based on the age- and gender-specific cut-offs of Cole et al. (2000): the left line marks the cut-off between underweight and normal weight, and the right line marks the cut-off between normal weight and overweight.
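The marginal effect in the bottom panel is the derivative of the conditional mean with respect to BMI z-score, holding the other covariates fixed. Under a Box-Cox link of the kind used in EEE, mu = (lambda*eta + 1)^(1/lambda), with the log link as the lambda = 0 limit, so by the chain rule dmu/dz = (lambda*eta + 1)^(1/lambda - 1) * m1'(z). The following is a minimal sketch that assumes a truncated-power sieve basis. Here `eta_rest` is a hypothetical stand-in for the intercept plus the contributions of the fixed covariates (for example, mother's age at 30 and dummies at their reference levels, which contribute zero). The coefficient values and function names are made up for illustration.

```python
import numpy as np


def inverse_box_cox(eta, lam):
    """Inverse Box-Cox link mu = (lam*eta + 1)**(1/lam); log link when lam == 0.

    Assumes lam*eta + 1 > 0 so that the fractional power is defined.
    """
    if lam == 0.0:
        return np.exp(eta)
    return (lam * eta + 1.0) ** (1.0 / lam)


def dmu_deta(eta, lam):
    """Derivative of the inverse Box-Cox link with respect to the index eta."""
    if lam == 0.0:
        return np.exp(eta)
    return (lam * eta + 1.0) ** (1.0 / lam - 1.0)


def sieve_basis(x, knots, degree=3):
    """Truncated power spline basis (no intercept column)."""
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def sieve_basis_deriv(x, knots, degree=3):
    """Column-wise derivative of sieve_basis with respect to x."""
    cols = [d * x ** (d - 1) for d in range(1, degree + 1)]
    cols += [degree * np.clip(x - k, 0.0, None) ** (degree - 1) for k in knots]
    return np.column_stack(cols)


def marginal_effect(x, eta_rest, gamma, knots, lam):
    """dE[y|x]/dx = g^{-1}'(eta) * m1'(x), other covariates held fixed in eta_rest."""
    eta = eta_rest + sieve_basis(x, knots) @ gamma
    return dmu_deta(eta, lam) * (sieve_basis_deriv(x, knots) @ gamma)


# Usage with hypothetical values
knots = [-1.0, 0.0, 1.0]
gamma = np.array([0.1, 0.05, -0.01, 0.02, -0.03, 0.01])  # hypothetical sieve coefficients
eta_rest = 3.0  # hypothetical intercept; fixed covariates at mean/reference add nothing here
z = np.linspace(-2.0, 2.0, 5)
print(marginal_effect(z, eta_rest, gamma, knots, lam=0.5))
```

Confidence intervals for these marginal effects, as plotted in the figure, would additionally require the covariance matrix of the estimated coefficients (for example, via the delta method). The sketch omits that step.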