A Double GLM with Shrinkage and Selection

Title of Paper:

Loss amount prediction from textual data using a double GLM with shrinkage and selection

(European Actuarial Journal)


Abstract:

The Gamma model has been widely utilized in a variety of fields, including actuarial science, where it has important applications in insurance loss predictions. Meanwhile, high-dimensional models and their applications have become more common in the statistics literature in recent years. The availability of such high-dimensional models has allowed the analysis of non-traditional data, including those containing textual descriptions of the response. In the models used in such applications, the dispersion may be designed to be related to a set of covariates, as opposed to being a single fixed value for the entire population. Following this approach, we incorporate a group Lasso type penalty in both the dispersion and the mean parameterization for a Gamma model, and illustrate its use in a predictive analytics application in actuarial science. In particular, we apply the method to an insurance claim prediction problem involving textual data analysis methods. Simulations are conducted to illustrate the variable selection and model fitting performance of our method.


Authors:

Scott Manski, Kaixu Yang, Gee Y. Lee (Corresponding author), Tapabrata Maiti


Copyright:

The source code published on this website is made publicly available under the GNU GPL 3.0 License.


Code and Data:

Note:

You will also need a word embedding matrix (https://nlp.stanford.edu/projects/glove/) to run the code for this project.

One option is glove.6B.zip, which can be downloaded from the GloVe project page.
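
As a rough illustration of what a word embedding matrix looks like once loaded, the sketch below parses the GloVe text format (one token per line, followed by its space-separated vector components) into a vocabulary index and a NumPy matrix. It is not taken from the project code: the function names load_glove and embed_text are hypothetical, Python is used only for illustration, and averaging word vectors to represent a claim description is an assumption that may differ from how the paper builds its text features.

```python
import numpy as np


def load_glove(lines):
    """Parse GloVe text-format lines into (word -> row index, matrix).

    `lines` is any iterable of strings, e.g. an open file handle.
    (Hypothetical helper, not part of the project code.)
    """
    words, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))
    matrix = np.vstack(vectors)
    index = {word: row for row, word in enumerate(words)}
    return index, matrix


def embed_text(text, index, matrix):
    """Average the vectors of the known tokens in `text`.

    Assumption: simple lowercase/whitespace tokenization and mean pooling.
    Returns a zero vector when no token is in the vocabulary.
    """
    rows = [index[tok] for tok in text.lower().split() if tok in index]
    if not rows:
        return np.zeros(matrix.shape[1], dtype=matrix.dtype)
    return matrix[rows].mean(axis=0)


# Example with a real file (assumes glove.6B.zip has been unzipped locally):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     index, matrix = load_glove(f)
# features = embed_text("water damage to basement", index, matrix)
```

The resulting fixed-length vectors could then serve as covariates for the mean and dispersion parts of the model, with the penalty deciding which of them are retained.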