S5 Understanding Models

We now discuss the insurance claims model of the previous slide in greater detail. The process of creating and using models is VASTLY MISUNDERSTOOD in conventional statistics textbooks.

Misconception 1: Inference is based on DATA (alone). In fact, statistical inference depends on BOTH the data and the model in a very important way. NEITHER part can be ignored. Data is objective but models are subjective, and inference MUST ALWAYS combine both objective and subjective elements. It can NEVER be done in a PURELY OBJECTIVE way.

The first misconception arises from the idea that statistical inference IS, or SHOULD BE, purely objective, and should not be based on subjective elements. That is why conventional treatments place the emphasis on DATA alone as the basis for inference.

Misconception 2: There exists a TRUE model for the data, and ESTIMATION is about trying to FIND the TRUE model. Again, this misconception arises from the same attempt to make everything objective. The idea that there is a TRUE model means that the model is out there, part of objective reality, and we are just trying to find out what that objective reality is. This HIDES the subjective nature of the process of modelling.

In fact, ALL MODELS are false: they are just convenient over-simplifications of a complex reality. There is no TRUE model. Estimation is the process of trying to find a suitable model which works for applications. The method of estimation taught in the previous slide, where we just try to VISUALLY match a gamma density to an estimated data density function, IS JUST AS GOOD as many more sophisticated methods we will study. All methods are just guesses at a suitable model. The real test of suitability lies in APPLICATIONS. If the model works within the APPLICATION, then it is good. If the model does NOT work well, then we should look for better models.

APPLICATION: So in order to understand whether or not a Gamma model is useful (not true), we need to understand HOW the model is used. The data by itself does not tell us anything about what might happen in the next month. However, the company PQT needs to make plans for the future: to decide how much to charge customers, how much to pay its employees and so on. To do this, it needs some projections about what might happen in the future. Under reasonable assumptions, the number of claims N in a given month is a Poisson random variable. Given N, the amount of each claim is a Gamma random variable. After making these modelling assumptions, it is possible for the insurance company to make predictions about the size of the total claims and how much these will vary. These predictions are then cross-checked with reality, as real claims come in every month. If the model is adequate, then it is accepted. If the model makes bad predictions, then the company will discard the model and try to get a better model.
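
To make the use of the model concrete, here is a minimal sketch of how predictions about total monthly claims could be generated by simulation. It is an illustration only, not the company's actual procedure. The parameter values (LAM, SHAPE, SCALE) are hypothetical and are NOT estimated from the PQT data, and the sketch adds the assumption that claim amounts are independent of each other and of N. The function names are also made up for this example.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_monthly_totals(lam, shape, scale, months, seed=0):
    """Simulate total claims per month: N ~ Poisson, each claim ~ Gamma."""
    rng = random.Random(seed)
    totals = []
    for _ in range(months):
        n_claims = sample_poisson(rng, lam)
        # Assumption: claim amounts are independent Gamma(shape, scale) draws.
        totals.append(sum(rng.gammavariate(shape, scale) for _ in range(n_claims)))
    return totals


# Hypothetical parameter values, chosen only for illustration.
LAM, SHAPE, SCALE = 20.0, 2.0, 500.0
totals = simulate_monthly_totals(LAM, SHAPE, SCALE, months=20000)

# Predictions from the simulation: typical size and variability of total claims.
sim_mean = statistics.fmean(totals)
sim_sd = statistics.stdev(totals)

# Standard compound-Poisson formulas, for comparison with the simulation:
# E[S] = lam * E[X] and Var[S] = lam * E[X^2], where X is one claim amount.
theory_mean = LAM * SHAPE * SCALE
theory_sd = math.sqrt(LAM * SHAPE * (SHAPE + 1) * SCALE**2)

print(f"simulated mean {sim_mean:.0f}, theoretical mean {theory_mean:.0f}")
print(f"simulated sd   {sim_sd:.0f}, theoretical sd   {theory_sd:.0f}")
```

The simulated mean and standard deviation are the kind of projections the company would compare against the real claims as they come in each month. If the real totals regularly fall far outside the range the model predicts, that is a signal to look for a better model.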