Spurious Regression with Cross Sectional Data

Spurious regression occurs if

a. The regression between two variables X and Y indicates a significant relationship between them, whereas in fact there is no theoretical justification for such a relationship;

b. The regression between two variables X and Y indicates a very strong relationship between them, whereas in fact the relationship is weak.
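Case (a) can be illustrated with a small simulation. This is a minimal sketch under a hypothetical data-generating process, not the author's example: an unobserved factor z drives both x and y, while y does not depend on x at all. The helper `ols_slope_t` (a name introduced here) fits y on x by ordinary least squares with an intercept and returns the slope, its t statistic and the correlation coefficient r.

```python
import math
import random


def ols_slope_t(x, y):
    """Simple OLS of y on x with intercept; returns (slope, t_stat, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    r = sxy / math.sqrt(sxx * syy)
    return slope, slope / se, r


random.seed(1)
n = 200
# Hypothetical setup: z is unobserved and drives both x and y,
# but y has no dependence on x itself.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2 * zi + random.gauss(0, 1) for zi in z]

slope, t_stat, r = ols_slope_t(x, y)
# Population slope of y on x here is cov(x, y) / var(x) = 2 / 2 = 1.
print(f"slope={slope:.2f}, t={t_stat:.1f}, r={r:.2f}")
```

The t statistic comes out far above conventional critical values, so the regression "finds" a significant relation that has no causal or theoretical basis.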

Spurious regression can be observed with any kind of data; however, the chances of encountering it are greater with time series data. In fact, the term spurious regression was first used for cross-sectional data, but over time it came to be used mostly for time series data. Nowadays, standard textbooks present spurious regression as a purely time series phenomenon.
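Why time series are more prone can be seen in a quick Monte Carlo sketch. The setting here is an assumption chosen for illustration (independent random walks, the standard textbook time series case), not something taken from the text above: x and y are generated independently, y is regressed on x, and the code counts how often the slope looks significant at the 5% level (|t| > 1.96). All function names are hypothetical helpers.

```python
import math
import random


def slope_t_stat(x, y):
    """t statistic of the slope in an OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def random_walk(n):
    """Cumulative sum of independent N(0, 1) shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += random.gauss(0, 1)
        path.append(level)
    return path


def rejection_rate(make_series, n=100, reps=500):
    """Share of replications with |t| > 1.96 although x and y are independent."""
    hits = 0
    for _ in range(reps):
        x, y = make_series(n), make_series(n)
        if abs(slope_t_stat(x, y)) > 1.96:
            hits += 1
    return hits / reps


random.seed(0)
iid_rate = rejection_rate(lambda n: [random.gauss(0, 1) for _ in range(n)])
walk_rate = rejection_rate(random_walk)
print(f"independent i.i.d. series: {iid_rate:.2f}")  # expected near the nominal 0.05
print(f"independent random walks:  {walk_rate:.2f}")  # typically far above 0.05
```

For independent i.i.d. draws the test rejects about as often as its nominal size, whereas for independent random walks it rejects in a large majority of replications.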

Common Reasons for Spurious Regression in Cross Sectional Data

a. If several non-homogeneous groups are pooled together and a regression is estimated between two variables, the regression may be spurious.

Look at the figure given above: it shows the relation between variables X and Y. Three separate groups of data can be seen in the scatter diagram, and within each group the correlation between X and Y is very weak (a correlation coefficient below 0.15). But if all three groups are taken together, the correlation rises to about 0.85. The appropriate way to estimate the regression between the two variables is to treat the three groups separately; pooling these non-homogeneous groups gives misleading information about the strength of the relation between X and Y.
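The pattern described for the figure can be reproduced with simulated data. This is a minimal sketch with hypothetical numbers, not the data behind the figure: three groups have centers on a diagonal (0, 3 and 6), but inside each group X and Y are drawn independently, so the true within-group correlation is zero. With these choices the population correlation of the pooled data is 6 / 7, roughly 0.86.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)
# Hypothetical groups: centers lie on a diagonal, but within each group
# x and y are independent draws around the group center.
centers = [0.0, 3.0, 6.0]
groups = []
for c in centers:
    gx = [random.gauss(c, 1) for _ in range(500)]
    gy = [random.gauss(c, 1) for _ in range(500)]
    groups.append((gx, gy))

# Correlation computed separately for each group.
within = [correlation(gx, gy) for gx, gy in groups]

# Correlation after pooling all three groups into one sample.
pooled_x = [v for gx, _ in groups for v in gx]
pooled_y = [v for _, gy in groups for v in gy]
pooled = correlation(pooled_x, pooled_y)

print("within-group r:", [round(r, 2) for r in within])  # each close to 0
print("pooled r:", round(pooled, 2))  # close to 6 / 7
```

The strong pooled correlation comes entirely from differences between the group centers, not from any relation between X and Y inside a group.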

b. Spurious regression may also occur if one or more relevant variables are omitted from a regression equation. For example, if we take