Predicting Hospital Length of Stay 


This time, I want to explore the Negative Binomial Distribution (NBD). When someone is admitted to hospital, every day he or she hopes to be discharged, and keeps trying for n days until that happens. Data of this kind is well suited to being modeled with an NBD.

There is a dataset from Microsoft, which I got from Kaggle. It has 100k data points on patients admitted to hospital: indicators of their health condition and how long they stayed.

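The Google Sheet is here. Across the 100k patients, the length of stay has a mean of 4 and a variance of 5.57. Because the variance exceeds the mean (overdispersion), which a Poisson model cannot accommodate since it forces the two to be equal, the data is a good candidate for an NBD.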
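As a quick sanity check on the overdispersion argument, the quoted mean and variance can be turned into rough NBD parameters by matching moments. This is only a sketch: the function name is made up for illustration, the (size, prob) parameterisation is an assumption, and the numbers it prints are moment-based estimates, not the parameters from the R fit.

```python
def nb_moment_estimates(mean, variance):
    """Method-of-moments estimates (size r, probability p) for a negative binomial.

    Assumes the parameterisation with mean = r(1-p)/p and variance = r(1-p)/p^2.
    Only defined when variance > mean, i.e. when the data is overdispersed.
    """
    if variance <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    p = mean / variance
    r = mean ** 2 / (variance - mean)
    return r, p


# Summary figures quoted in the post (the mean is presumably rounded).
mean_los, var_los = 4.0, 5.57

dispersion = var_los / mean_los  # would be 1 for a Poisson distribution
r_hat, p_hat = nb_moment_estimates(mean_los, var_los)

print(f"variance / mean = {dispersion:.2f}")
print(f"moment estimates: size r = {r_hat:.2f}, prob p = {p_hat:.3f}")
```

A variance-to-mean ratio above 1 is what rules out the Poisson here. A maximum-likelihood fit on the raw data will generally give somewhat different values from these moment estimates.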

I used a few lines of R code to fit an NBD to the data and obtained these parameters:




Interpretation


The Google Sheet has the model-generated data plotted over the actual observations.
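One plausible way to produce model-generated counts like these is to multiply the number of patients by the NBD probability of each length of stay. The sketch below assumes that approach; the sheet may do it differently. The function names are hypothetical, and SIZE and PROB are placeholders obtained by matching the quoted mean of 4 and variance of 5.57, not the fitted parameters.

```python
import math


def nb_pmf(k, size, prob):
    """P(X = k) for a negative binomial, k = 0, 1, 2, ...

    Uses the same (size, prob) form as R's dnbinom; size may be non-integer,
    so the binomial coefficient is computed through log-gamma.
    """
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_p)


def expected_counts(n_patients, size, prob, max_days):
    """Expected number of patients for each length of stay 0..max_days."""
    return [n_patients * nb_pmf(k, size, prob) for k in range(max_days + 1)]


# Placeholder parameters (moment-matched to mean 4, variance 5.57).
SIZE, PROB = 10.19, 0.718
N_PATIENTS = 100_000  # dataset size quoted in the post

for days, count in enumerate(expected_counts(N_PATIENTS, SIZE, PROB, max_days=12)):
    print(f"{days:2d} days: {count:10.1f}")
```

Plotting these expected counts as a line over a bar chart of the observed counts gives the kind of comparison described here.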


The comparison of the actuals and the model is below. The bars are the data and the red line is the model output. The R code used is also given in the Google Sheet. This Google Doc has more analysis.


A Poisson distribution was also tried as an alternative model and compared with the NBD using a chi-squared test, which rejected the null hypothesis. Full details are in the Google Sheet and Doc.
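For readers who want to try a comparison of this kind, here is a minimal sketch of a Pearson chi-squared goodness-of-fit statistic that can be computed for either model. It is not the analysis from the sheet: the function names are hypothetical, and it assumes that observed[k] holds the number of patients who stayed k days, with the last entry pooling that many days or more.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k, size, prob):
    """P(X = k) for a negative binomial in the (size, prob) form."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_p)


def chi_squared_stat(observed, pmf):
    """Pearson statistic sum((O - E)^2 / E) over the bins of `observed`.

    observed[k] is the count for k days; the final bin is treated as a
    pooled tail (that many days or more) so the probabilities sum to 1.
    `pmf` is a one-argument function returning the model probability of k.
    """
    n = sum(observed)
    last = len(observed) - 1
    probs = [pmf(k) for k in range(last)]
    tail = 1.0 - sum(probs)
    if tail <= 0.0:
        raise ValueError("tail probability is not positive; use fewer bins")
    probs.append(tail)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
```

With a list of observed counts, chi_squared_stat(observed, lambda k: poisson_pmf(k, lam)) and chi_squared_stat(observed, lambda k: nb_pmf(k, size, prob)) give one statistic per model. Each is then compared against a chi-squared distribution with (number of bins - 1 - number of estimated parameters) degrees of freedom, for example using a table or scipy.stats.chi2.sf. A smaller statistic means the model's expected counts are closer to the data.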