
Regression Fallacy



In standard regression models, Y is a random variable but X is not.  The values of X (Xi, for i = 1 to n) are treated as fixed constants.

If both Y and X are random variables, then the relationship is better described by a bivariate distribution.  The bivariate description has some similarities to the standard regression model.
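
One point of similarity can be written out explicitly. As a sketch, assume the pair (X, Y) is bivariate normal; this assumption and the symbols below (means μ, standard deviations σ, correlation ρ) are introduced here for illustration and are not part of the original text. Under that assumption each conditional mean is a straight line in the other variable, but the two lines are different:

```latex
\begin{align}
  E[Y \mid X = x] &= \mu_Y + \rho \,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X) \\
  E[X \mid Y = y] &= \mu_X + \rho \,\frac{\sigma_X}{\sigma_Y}\,(y - \mu_Y)
\end{align}
```

Both lines pass through the point (μ_X, μ_Y), and they coincide only when |ρ| = 1.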


 
From The Six Sigma Practitioner's Guide to Data Analysis (Wheeler), chapter 8.


When X and Y are [both] Random Variables

"The regression fallacy occurs when someone tries to use the major axis [of an ellipse superimposed on a scatterplot] to relate the values of X to the values of Y.  While this may be visually appealing, it is wrong.  [...] the conditional distributions of Y given X are not centered on the major axis.  They are instead centered on the line connecting the two points on the ellipse having vertical tangents, which is the regression line of Y as a function of X.  Thus, the regression of Y upon X provides an unbiased estimate of mean values for Y that occur in conjunction with a particular value of X. 

Likewise, the regression of X as a function of Y is the line that connects the two points on the ellipse having horizontal tangents.  This line is an unbiased estimate of the mean values for X that occur in conjunction with a particular value of Y.  The conditional distributions of X given Y are not centered on the major axis of the ellipse.  The major axis does not describe the behavior of either variable as a function of the other variable."
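
The quoted claim can be checked with a small simulation. The sketch below is not from Wheeler's book. It assumes a bivariate normal population with unit variances and a correlation of 0.6 (values chosen only for illustration), and the helper name `line_slopes` is hypothetical. It computes the slope of the regression of Y on X, the slope of the major axis of the covariance ellipse, and the slope of the regression of X on Y redrawn in the same x-y plane, then compares the average Y observed in a narrow vertical strip with what each line predicts.

```python
import numpy as np


def line_slopes(x, y):
    """Return dy/dx slopes of: regression of Y on X, major axis, regression of X on Y."""
    cov = np.cov(x, y)  # 2x2 sample covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]

    slope_y_on_x = cov_xy / var_x
    # The X-on-Y fit is x = a + b*y with b = cov_xy / var_y;
    # drawn in the x-y plane its slope is 1 / b.
    slope_x_on_y = var_y / cov_xy

    # The major axis is the eigenvector of the covariance matrix
    # that belongs to the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]
    slope_major = major[1] / major[0]

    return slope_y_on_x, slope_major, slope_x_on_y


# Assumed illustration parameters (not from the source).
rng = np.random.default_rng(0)
rho = 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=100_000)
x, y = xy[:, 0], xy[:, 1]

b_yx, b_major, b_xy = line_slopes(x, y)

# Average Y among points whose X falls in a narrow strip around x0.
x0 = 1.5
in_strip = np.abs(x - x0) < 0.05
mean_y_near_x0 = y[in_strip].mean()

# All three lines pass through the centroid (x_bar, y_bar).
x_bar, y_bar = x.mean(), y.mean()
pred_regression = y_bar + b_yx * (x0 - x_bar)
pred_major_axis = y_bar + b_major * (x0 - x_bar)

print("slope, Y on X:           ", round(b_yx, 3))
print("slope, major axis:       ", round(b_major, 3))
print("slope, X on Y (as dy/dx):", round(b_xy, 3))
print("observed mean of Y near x0:", round(mean_y_near_x0, 3))
print("regression prediction:     ", round(pred_regression, 3))
print("major-axis prediction:     ", round(pred_major_axis, 3))
```

With these assumed parameters the population slopes are 0.6 for Y on X, 1 for the major axis, and 1/0.6 for X on Y, so the major axis lies between the two regression lines. The observed mean of Y in the strip should sit close to the regression prediction and well below the major-axis prediction, which is the regression fallacy in numerical form.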

 


