S4 Real World Examples

The Gamma distribution is used to model many real-world events. An important application is to the size of insurance claims. Suppose that an insurance company has historical data on the claims which occurred last year. The claims are C(1), C(2), ..., C(N) -- these are the amounts of money that car drivers asked for as compensation for damages in an accident. In order to understand and predict what might happen next year, the insurance company wants to create a MODEL for these claims. The idea is to think of these claims as the outcomes of a random variable with a given distribution. A commonly used distribution for this purpose is the Gamma distribution. Given data on claims, we would like to find out WHICH Gamma distribution best fits this data. One way to do this is to draw the HISTOGRAM, or the fitted DENSITY, for the data, and then find the best-matching Gamma density.

An artificial, over-simplified EXAMPLE, to clarify the concepts:

Suppose that in January 2016, 20 patients receive health benefits from Pak Qatar Takaful (PQT). The amounts of these benefits are listed in the sheet named Sim(ulated) Claims Data and Density within the EXCEL file GammDemo attached below. For example, Patient 1 received 14.7 thousand rupees as benefits, while Patient 2 received 19.3 thousand rupees. The amount of the claim is listed for all twenty patients. In real data sets there would be thousands of claims, but we keep the data set down to 20 points for ease of understanding.

FIRST STEP: Create an estimated DATA DENSITY function. This is pictured in the graph on the same sheet. The graph is estimated using Kernel Density Estimation, with the window size listed in CELL $D$1. You can vary the window size to see how the graph changes: a smaller window gives a more jagged curve, a larger window a smoother one. We have chosen window size 3 as suitable for this data set. For details on how the Kernel Density Estimation is done, see the link below to Lecture 8 on Density Estimation in the earlier course Intro Stats for Muslims.
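For readers who prefer code to spreadsheets, the idea behind the FIRST STEP can be sketched in Python. This is only an illustration under stated assumptions: it assumes a Gaussian kernel (the notes do not say which kernel the EXCEL sheet uses), and apart from the first two values (14.7 and 19.3), the claims listed are HYPOTHETICAL placeholders, not the data in GammDemo. The names gaussian_kde, claims and grid are made up for this sketch.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel.

    Assumption: a Gaussian kernel; the spreadsheet may use a different one.
    """
    n = len(data)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        # Average of Gaussian bumps, one centred on each data point.
        return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in data) / norm

    return density

# Claims in thousands of rupees. Only 14.7 and 19.3 come from the text;
# the remaining values are hypothetical placeholders.
claims = [14.7, 19.3, 8.2, 11.5, 23.9, 16.4, 9.8, 12.6, 27.1, 6.9]

window = 3.0  # plays the role of the window size in CELL $D$1
kde = gaussian_kde(claims, bandwidth=window)

step = 0.5
grid = [step * i for i in range(81)]      # X values from 0 to 40
kde_values = [kde(x) for x in grid]       # estimated data density on the grid
```

Changing window and recomputing kde_values corresponds to varying the window size in the sheet and watching the graph change.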

SECOND STEP: Those who do not know about Kernel Density Estimates do not need to understand the FIRST STEP. All we need to know is that we have an ESTIMATED density function based on the data. Our goal is to find the BEST FITTING Gamma density function VISUALLY -- NOT computationally. For this purpose, we COPY the density data into the second column of the sheet labelled Fitting Gamma. The first column contains the X-coordinates. The third column computes a GAMMA density evaluated at the X-value in the first column, using an EXCEL built-in function. The Shape and Scale parameters are given in cells E1 and E2.
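The third column can be mimicked in Python as follows. This is a sketch, not the spreadsheet's own formula: the notes do not name the EXCEL function used (GAMMA.DIST with its cumulative argument set to FALSE would be one candidate), and the shape and scale values below are arbitrary illustrative choices standing in for cells E1 and E2. The name gamma_pdf is hypothetical.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, evaluated at x.

    Simplification: returns 0 for x <= 0, which is exact for shape > 1.
    """
    if x <= 0:
        return 0.0
    # Work on the log scale to avoid overflow for large shape values.
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

shape, scale = 4.0, 4.0   # illustrative stand-ins for cells E1 and E2

step = 0.5
xs = [step * i for i in range(81)]                       # first column: X values 0 to 40
gamma_column = [gamma_pdf(x, shape, scale) for x in xs]  # third column: Gamma density
```

Plotting gamma_column against xs, next to the data density, gives the same kind of picture as the graph in the Fitting Gamma sheet.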

GOAL: By VARYING cells E1 and E2, we generate different Gamma densities. Try to find values which create a reasonable match between the data density and the theoretical Gamma density. Remember that there will never be a PERFECT match, because the data is RANDOMLY GENERATED, so there is much random variation, especially in a sample as small as 20. ESTIMATION is the name of the process by which we try to find the Gamma density which best fits the data. This point is discussed further on the next slide.
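When searching by eye, it helps to begin from sensible values rather than arbitrary ones. One possible way (not part of the EXCEL demo, and offered only as a starting point) is to match the mean and variance of the data: a Gamma distribution has mean = shape x scale and variance = shape x scale^2, so shape = mean^2 / variance and scale = variance / mean. The sketch below uses the same HYPOTHETICAL claims as placeholders; only 14.7 and 19.3 come from the text.

```python
import statistics

# Claims in thousands of rupees. Only 14.7 and 19.3 come from the text;
# the remaining values are hypothetical placeholders.
claims = [14.7, 19.3, 8.2, 11.5, 23.9, 16.4, 9.8, 12.6, 27.1, 6.9]

mean_claim = statistics.mean(claims)
var_claim = statistics.variance(claims)   # sample variance (divides by n - 1)

# Moment matching: solve mean = shape * scale and variance = shape * scale**2.
shape0 = mean_claim ** 2 / var_claim
scale0 = var_claim / mean_claim

print(f"starting Shape: {shape0:.2f}")
print(f"starting Scale: {scale0:.2f}")
```

These values can be typed into cells E1 and E2 and then adjusted until the two curves look reasonably close.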

Lecture 8: Estimation of Data Density — Shows how to create Kernel Density estimates for data sets, using EXCEL methods, via Data Tables