Goodness-of-fit Test

The goodness-of-fit test compares the observed values in the training data set with the expected values obtained from the model being tested.

A goodness-of-fit statistic tests the following hypotheses:

  H0 : the proposed model M0 fits

  HA : the model M0 does not fit (or, some other model MA fits, A for alternative)

Most often, the observed data represent the fit of the saturated model, which is the most complex model: it matches all the observed values exactly, with one parameter for each observed value (an overfitted model). The proposed model M0 is a simpler model with fewer parameters. The rationale behind model fitting is the assumption that the complex saturated model can be represented by a simpler model, and the goodness-of-fit test is applied to corroborate that assumption.

There are different statistics developed to measure goodness-of-fit: Pearson, deviance, Wald, Score, etc.

The Pearson goodness-of-fit statistic is: 

         X^2 = Sum[(O_j - E_j)^2 / E_j]

O_j is the jth observed value and E_j is the jth expected value.

For example, suppose we expect 500 females and 500 males in a group of 1000 people, but the observed counts in the two gender categories are 400 and 600. The Pearson goodness-of-fit statistic is then

(400 - 500)^2/500 + (600 - 500)^2/500 = 20 + 20 = 40

This is also called the Pearson Chi-Square test statistic.
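
As a minimal sketch of this calculation (using the hypothetical 400/600 counts above), SciPy's chisquare function returns the same statistic together with its p-value:

    # Pearson goodness-of-fit on the hypothetical gender counts above.
    from scipy.stats import chisquare

    observed = [400, 600]
    expected = [500, 500]

    # chisquare returns the Pearson X^2 statistic and its p-value
    # (chi-squared with k - 1 = 1 degree of freedom).
    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(stat)     # 40.0, matching the hand calculation
    print(p_value)  # ~2.5e-10, far below 0.05, so M0 is rejected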

The Deviance statistic is:

        G^2 = Sum[O_j log((O_j/E_j)^2)] = 2 Sum[O_j log(O_j/E_j)]

It is basically summing O_j times the log of the squared ratio O_j/E_j.
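
As a sketch for the same hypothetical counts, G^2 can be computed directly from the formula or with SciPy's power_divergence (passing lambda_="log-likelihood" selects the G-test):

    import numpy as np
    from scipy.stats import power_divergence

    observed = np.array([400, 600])
    expected = np.array([500, 500])

    # Direct formula: G^2 = 2 * Sum[O_j * log(O_j / E_j)]
    g2 = 2 * np.sum(observed * np.log(observed / expected))

    # SciPy computes the same statistic and its p-value.
    stat, p_value = power_divergence(observed, expected, lambda_="log-likelihood")
    print(g2, stat)  # both ~40.27, close to the Pearson X^2 of 40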

In some texts, G^2 is also called the likelihood-ratio test statistic for comparing the likelihoods of two models (the fitted model and the saturated model).

The likelihood-ratio test statistic compares the log-likelihood under H0 (the fitted model) with the log-likelihood under HA (the saturated model). The H0 model is the one we propose / estimate (with fewer parameters). By comparing the likelihood ratio we can decide whether or not to reject H0.

        G^2 = -2 log(l_0/l_1) = -2(L_0 - L_1)

l_0 and l_1 are the likelihoods of the two models, and L_0 and L_1 are the corresponding log-likelihoods. H0 is the null hypothesis and HA is the alternative hypothesis.
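
As a tiny sketch of this identity, with made-up log-likelihoods and degrees of freedom (all numbers below are illustrative assumptions):

    from scipy.stats import chi2

    L0 = -530.2  # hypothetical log-likelihood of the proposed model (H0)
    L1 = -512.7  # hypothetical log-likelihood of the saturated model (HA)

    g2 = -2 * (L0 - L1)           # 35.0
    p_value = chi2.sf(g2, df=10)  # df = assumed difference in parameter counts
    print(g2, p_value)            # small p-value -> reject H0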

According to Wilks' theorem, the likelihood-ratio statistic is asymptotically Chi-Squared distributed, and it is asymptotically equivalent to the Pearson goodness-of-fit statistic, so both are used to assess a model.

If every O_j equals E_j, then both X^2 and G^2 are zero, which means the model fits perfectly. A large value of X^2 or G^2 means that the data do not agree well with the assumed model M0 (under H0).

When the sample size is large enough, X^2 and G^2 tend to be approximately equal, so it is good practice to compute both and check that they are similar.

The distributions of X^2 and G^2 approach a Chi-Squared distribution with DF (degrees of freedom) = k - 1, where k is the number of predicted categories. E.g., predicting the numbers of females and males involves two categories, so DF = 2 - 1 = 1.

The idea of using the Chi-Squared distribution (its probability density) here is to obtain the probability of a deviation larger than the one observed. If the deviation is too big (that probability is too small), the model is rejected.
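
For instance, the tail probability of the Pearson statistic from the gender example above (X^2 = 40 with 1 degree of freedom) can be sketched as:

    from scipy.stats import chi2

    # Right-tail area beyond the observed statistic: the p-value.
    p_value = chi2.sf(40.0, df=1)
    print(p_value)  # ~2.5e-10, matching the chisquare result above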

Chi-Square distribution

is the probability distribution of the sum of the squares of k independent standard normal variables.

Chi-Square is the distribution of Sum(X_j^2), j = 1, ..., k

The k here is the degree of freedom. Plots of the probability density function for different values of k show how the distribution spreads out and shifts to the right as k grows.
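
A quick simulation sketch of this definition (k = 3 is an arbitrary choice):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    k = 3
    # Sum of k squared independent standard normal draws, many times over.
    samples = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)

    # The simulated mean and variance should match chi-squared(k): k and 2k.
    print(samples.mean(), samples.var())  # ~3 and ~6
    print(chi2.mean(k), chi2.var(k))      # exactly 3.0 and 6.0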

To define the confidence level by which we can reject or accept a model, we use the Alpha Level, which is also called the Significance Level. The Alpha Level is normally 0.05 or 0.01, or something else depending on the empirical study.

Consider the following example. If the Alpha Level is set to 0.05, the alpha region is the far right tail under the probability density curve, with area = 0.05; i.e., the area to the right of the critical value is 0.05.

Suppose the critical value is 12: there is then only a 0.05 probability that the deviation (X^2 or G^2) is larger than 12.

If the observed X^2 or G^2 is 10, the P Value is defined as the area to the right of 10.

As the deviation 10 is smaller than the critical value 12, the model is accepted (not rejected).

Normally we compare the P Value with the Alpha Level: if the P Value is bigger, accept the model; otherwise reject it.
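
A sketch of this decision rule, reusing the critical value 12 and the observed statistic 10 from the example (df = 6 is an assumed value for which the 0.05 critical value is near 12):

    from scipy.stats import chi2

    alpha = 0.05
    df = 6  # assumed df; chi2.ppf(0.95, 6) is close to the example's 12

    critical_value = chi2.ppf(1 - alpha, df)  # ~12.59
    p_value = chi2.sf(10, df)                 # ~0.125, area to the right of 10

    # Equivalent decisions: statistic < critical value <=> p-value > alpha.
    print(10 < critical_value, p_value > alpha)  # True True -> do not reject H0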

Sometimes, if X^2 or G^2 is too small, we may also reject the model as it fits too well, i.e., the data may have been fabricated.

Using the Hypothesis Test differently

The Hypothesis Test is a very basic idea in statistics, so one can have very different H0 and HA settings.

The Pearson and Deviance statistic examples above are based on comparing a proposed model with the observed data.

When fitting a model (e.g. logistic regression) in SAS, we may want to test whether a (predictor) variable significantly improves the goodness-of-fit.

To do that, we compare the likelihood score of the model without the variable and the likelihood score of the model with the variable.

The statistic -2 log(l_0/l_1), where l_0 is the likelihood without the variable and l_1 is the likelihood with it, measures how significantly the variable affects the model.

A model can be expressed as Y = a + b*x1 + c*x2 + ...

To exclude a variable xi from the model, simply set its coefficient (e.g. b or c) to zero. To include the variable, the coefficient is nonzero.

The comparison of the two models is referred to as Hypothesis Test in statistics.

For example, suppose we want to test the variable x1:

H0 (Null hypothesis): b = 0, so x1 would not be included in the model

HA (Alternative):     b != 0, so x1 would be included
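
A minimal sketch of this nested-model likelihood-ratio test, using statsmodels logistic regression on synthetic data (the data, sample size, and true coefficients below are all illustrative assumptions, not SAS output):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    # Synthetic data where x1 really matters.
    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1 + 0.3 * x2)))
    y = (rng.random(n) < p).astype(int)

    full = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
    reduced = sm.Logit(y, sm.add_constant(x2)).fit(disp=0)  # H0: b = 0, x1 dropped

    # G^2 = -2(L0 - L1); the models differ by one parameter, so df = 1.
    g2 = -2 * (reduced.llf - full.llf)
    print(g2, chi2.sf(g2, df=1))  # large G^2, tiny p-value -> reject H0, keep x1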

In SAS, the Wald Chi-Square test is reported for each variable as a way of comparing the H0 and HA models; it actually works from the fitted coefficient estimate and its variance (standard error) rather than from the two likelihoods.

Similar to the Pearson and Deviance Chi-Square statistics, the Wald test score follows the Chi-Square distribution.

If the Wald test score is larger than a particular value (e.g. the critical value determined by Alpha = 0.05), then the difference between H0 and HA is too big, so excluding the variable x1 would significantly reduce the goodness-of-fit (likelihood). Wald test score > the critical value is identical to p-value < 0.05.

So, if the Wald test score is high enough (i.e. p-value < 0.05, i.e. confidence level > 95%), we can reject H0 and conclude that x1 is a useful variable for the model.
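
As a self-contained sketch of the Wald computation (the synthetic data mirror the likelihood-ratio example above; W = (b_hat / SE(b_hat))^2 is compared to a Chi-Square with 1 degree of freedom):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1 + 0.3 * x2)))
    y = (rng.random(n) < p).astype(int)

    fit = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

    # statsmodels exposes the estimates (.params) and standard errors (.bse).
    wald = (fit.params / fit.bse) ** 2
    print(wald)                 # one Wald statistic per coefficient
    print(chi2.sf(wald, df=1))  # p < 0.05 -> reject H0: coefficient = 0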