This page illustrates the use of the Heck Probit model, which is used in cases where selection bias may affect the results of a model.

Consider, for instance, a database of mortgage loans. The data set contains loans that were approved and loans that were denied. It also contains information on the mortgage loans that were sold in the secondary market. Not all approved loans were sold in the secondary market. Let's say we are interested in determining the characteristics of a loan that increase its propensity to be sold in the secondary market. At the same time, we must acknowledge that a sale in the secondary market is contingent upon the loan being approved in the first place. Now consider that racial discrimination may have played a role in the loan approval process, so that loan applications from African American clients were disproportionately rejected. We now have a selection bias that needs to be accounted for in the modelling process.

Mathematically, we have:

Response (sale in the secondary market) equation: y = x*b + u1
Selection (loan approval) equation: z*k + u2 > 0

where u1 ~ N(0, sigma), u2 ~ N(0, 1), and cov(u1, u2) = 0. The response y is observed only for cases that satisfy the selection equation.

When selection is not random, or when we cannot predict selection perfectly, we may have a case where cov(u1, u2) != 0. In the Heckprob case, the response is itself binary (here, sold or not sold), and the big difference lies in the assumptions about the error terms: both u1 and u2 are taken to be standard normal. We would therefore have to test the possibility that the response and selection equations are not independent and that the disturbances in the two equations are correlated. In such cases we use the Heck Probit model.

Drone warfare and Heck Probit

Review the paper [Fair drone support paper.pdf], which implements the Heck Probit model. See if you can work out why the model has not been specified properly and why the use of Heck Probit was not warranted. The real issue is that there are several sample selection biases in the data set, not just one. The authors have tried to address only one of the sample selection/response biases.
It is also erroneous to use Pearson correlation coefficients when tetrachoric correlations, which are designed for pairs of binary variables, are the more appropriate tool. I have provided the data set in Stata and SPSS formats so that you can see how best to specify the model in the aforementioned paper. The Stata code is made available below.
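To see why the choice between Pearson and tetrachoric correlations matters, here is a small Python sketch. It is separate from the Stata code and is not the method used in the paper. It assumes two binary variables that arise by cutting latent standard normal variables at thresholds. The latent correlation (0.6), the cut points, the sample size and all function names are made up for illustration. The tetrachoric estimate uses a simple two-step approach: thresholds come from the margins, then bisection on the correlation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bvn_cdf(h, k, rho, steps=200):
    """P(X <= h, Y <= k) for a standard bivariate normal with correlation rho.

    Uses the identity d/d(rho) Phi2(h, k; rho) = phi2(h, k; rho): start from
    the independent case and integrate the density over the correlation
    from 0 to rho with Simpson's rule.
    """
    def density(r):
        q = 1.0 - r * r
        expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)
        return math.exp(expo) / (2.0 * math.pi * math.sqrt(q))

    width = rho / steps
    total = density(0.0) + density(rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * width)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * width / 3.0

def tetrachoric(x, y):
    """Two-step tetrachoric correlation for two 0/1 sequences."""
    n = len(x)
    p_x0 = sum(1 for v in x if v == 0) / n
    p_y0 = sum(1 for v in y if v == 0) / n
    p_00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0) / n
    h = STD_NORMAL.inv_cdf(p_x0)  # threshold implied by the margin of x
    k = STD_NORMAL.inv_cdf(p_y0)  # threshold implied by the margin of y
    lo, hi = -0.999, 0.999
    for _ in range(40):  # bvn_cdf is increasing in rho, so bisection works
        mid = (lo + hi) / 2.0
        if bvn_cdf(h, k, mid) < p_00:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def pearson(x, y):
    """Ordinary Pearson correlation (the phi coefficient for 0/1 data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_binary_pairs(n, rho, cut_x, cut_y, seed):
    """Dichotomize correlated latent normals at the given cut points."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        latent_y = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        xs.append(1 if e1 > cut_x else 0)
        ys.append(1 if latent_y > cut_y else 0)
    return xs, ys

TRUE_RHO = 0.6  # assumed latent correlation, for illustration only
x, y = simulate_binary_pairs(20000, TRUE_RHO, 0.5, -0.3, seed=1)
pearson_r = pearson(x, y)
tetra_r = tetrachoric(x, y)
print(f"Pearson on the 0/1 data: {pearson_r:.3f}")
print(f"Tetrachoric estimate:    {tetra_r:.3f} (latent value {TRUE_RHO})")
```

Under these assumptions, the Pearson coefficient computed on the 0/1 variables understates the latent association, while the tetrachoric estimate should land close to the value used in the simulation.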

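The role of cov(u1, u2) in the selection model described earlier can also be made concrete with a short simulation. The sketch below is a Python illustration under simplifying assumptions, not a Heck Probit estimator. It treats z*k as a single constant index (0.3), gives both disturbances unit variance, and compares a correlation of 0 with a correlation of 0.7. All names and numbers are hypothetical. It checks that the average of u1 among the selected cases is about zero when the disturbances are independent, and about rho times the inverse Mills ratio when they are correlated.

```python
import math
import random
from statistics import NormalDist

def mean_u1_among_selected(rho, index, n, seed):
    """Average response disturbance u1 over cases with index + u2 > 0."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        u2 = rng.gauss(0, 1)
        # Build u1 with unit variance and correlation rho with u2.
        u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0, 1)
        if index + u2 > 0:  # selection rule: z*k + u2 > 0
            total += u1
            count += 1
    return total / count

def inverse_mills(index):
    """phi(index) / Phi(index), the mean of u2 given u2 > -index."""
    nd = NormalDist()
    return nd.pdf(index) / nd.cdf(index)

INDEX = 0.3  # assumed constant value of z*k, for illustration only
RHO = 0.7    # assumed correlation between u1 and u2

independent = mean_u1_among_selected(0.0, INDEX, 50000, seed=2)
correlated = mean_u1_among_selected(RHO, INDEX, 50000, seed=3)
expected = RHO * inverse_mills(INDEX)

print(f"mean u1 when selected, rho = 0:   {independent:.3f}")
print(f"mean u1 when selected, rho = {RHO}: {correlated:.3f}")
print(f"rho * inverse Mills ratio:        {expected:.3f}")
```

A nonzero mean of u1 in the selected sample is the reason a model fitted only to approved loans can be biased, and why the correlation between the two disturbances has to be tested rather than assumed away.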