Exploratory Factor Analysis

Overview

    • Not a single technique but a family of methods for analyzing a set of observed variables (the data matrix X)

    • Two basic branches in the family tree: defined factors (aka principal components) and inferred factors (aka common factor analysis or classical factor analysis)

    • In principal components, we define new variables (factors) that are linear combinations of our observed variables and that summarize our input data, much as a stock market index summarizes the whole market.

        • the focus is on expressing the new variables (the principal components) as weighted averages of the observed variables

        • the factors (properly called factor scores) have the same length (number of values) as the original variables, i.e., one value per case.

    • In common factor analysis, we infer the existence of latent variables that explain the pattern of correlations among our observed variables

        • the focus is on expressing the observed variables as a linear combination of underlying factors.

    • In both approaches, the factors are defined as linear combinations of the variables, and the variables can in turn be expressed as linear combinations of the factors. Weird but true.

    • Two basic outputs from factor analysis: a set of column (variable) scores called factor loadings (each factor's loadings vector has as many values as there are variables in the data matrix), and a set of row (case) scores called factor scores (each factor's score vector has as many values as there are cases in the data matrix). See the sketch below for the shapes.
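
    • The following is a minimal numpy sketch (not tied to any particular factor-analysis package) showing the shapes of the two outputs for a made-up 50-case, 6-variable data matrix with 3 retained factors; the data and variable names are illustrative.

        # Shapes of the two outputs: loadings are variables-by-factors,
        # factor scores are cases-by-factors.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 6))                 # 50 cases, 6 observed variables
        Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize the columns
        n = Z.shape[0]

        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        k = 3                                        # keep 3 factors

        loadings = Vt[:k].T * (s[:k] / np.sqrt(n))   # one row per variable
        scores = U[:, :k] * np.sqrt(n)               # one row per case
        print(loadings.shape, scores.shape)          # (6, 3) (50, 3)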

Principal Components

    • Given as input a rectangular, 2-mode matrix X whose columns are seen as variables, the objective of principal components is to create a new variable (called a factor or principal component) that is a linear combination of the input variables, such that the sum of squared correlations between the principal component (factor) and each of the original variables is maximized.

    • More precisely, the objective is to create an ordered set of principal components such that the first principal component explains as much variance in the original variables as possible, the second component explains as much of the residual variance not explained by the first as possible, and so on until all variance is accounted for.

        • Since the observed variables may be highly intercorrelated (i.e., share variance), what we are doing is just "collecting together" the shared bits into "components". It doesn't really change anything; it just reallocates things. Think about 10 pieces of luggage of different sizes that are filled with stuff in such a way that each bag is approximately the same weight. Note that if you were to decide to take only 8 bags with you, you would leave behind about 20% of your stuff. Now reallocate the contents so that the biggest bags are stuffed to the maximum. This will mean that some of the other bags will be much emptier. Now if you took only 8 bags (the 8 fullest ones) you would leave behind much less than 20% of your stuff.

    • We solve this reallocation problem by "factoring" the correlation matrix of the variables. That is, we compute the correlation matrix and then use SVD (equivalently, an eigendecomposition, since the correlation matrix is symmetric) to extract the eigenvectors and eigenvalues. The code sketch after this list walks through these steps.

        • the eigenvectors (multiplied by the square roots of their eigenvalues) are called factor loadings, and these are the correlations of each variable with each factor (principal component)

        • The sum of the squared loadings of each variable with a given factor (the column sum of the squared loadings matrix) will equal the factor's eigenvalue. Hence the eigenvalue summarizes how well the factor correlates with (i.e., summarizes or can stand in for) each of the variables. It is literally the amount of variance accounted for (since correlation squared is variance accounted for).

        • The sum of the squared loadings for each variable across the factors (the row sums of the squared loadings matrix) is defined as the variable's communality. It tells you how much of the variable's variance is captured by the factors. In principal components, the communality of each variable should be 1.0 unless you have chosen to throw away some of the factors (as when we keep only the bigger factors).

        • The loadings can also be seen as a formula for rewriting the variables in terms of the factors: for example, if we write Z1 = b1F1 + b2F2 + b3F3 ..., i.e., expressing variable Z1 as a linear combination of factors, then the weights b1, b2, b3, etc. turn out to be the factor loadings. There is a different equation for each variable Z. In matrix form, we can express the collection of equations easily as X = FB, where X is the original (standardized) data whose columns are the various Z variables, F is the matrix of factor scores (which we haven't discussed yet), and B is the factor loadings matrix, which is just D^(1/2)V from the singular value decomposition of the correlation matrix (here the rows of V are the eigenvectors and D is the diagonal matrix of eigenvalues).

        • Since X = FB = FD^(1/2)V, we can do some simple matrix algebra to obtain a formula for constructing the factor scores. Post-multiplying both sides by V', we get XV' = FD^(1/2)VV' = FD^(1/2) (because VV' = I), and then post-multiplying both sides by D^(-1/2), we get XV'D^(-1/2) = F. This tells us that we can also use the eigenvectors and eigenvalues (i.e., the loadings) to compute the factor scores; see the code sketch below.
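
    • Below is a minimal numpy sketch of the whole sequence above: compute the correlation matrix, extract eigenvectors and eigenvalues, form the loadings, verify the eigenvalue and communality identities, and construct the factor scores. It is an illustration with simulated data under the conventions used here (standardizing by n, keeping all components), not the code of any particular package.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 4))                   # 100 cases, 4 observed variables
        X[:, 1] += X[:, 0]                              # build in some intercorrelation
        X[:, 3] += 0.5 * X[:, 2]

        # Standardize the columns so that Z'Z / n is the correlation matrix R
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        n, p = Z.shape
        R = (Z.T @ Z) / n

        # Eigendecomposition of R (equivalent to its SVD, since R is symmetric).
        # numpy returns the eigenvectors as the COLUMNS of `vecs`, so the V of the
        # text (eigenvectors as rows) corresponds to vecs.T here.
        eigvals, vecs = np.linalg.eigh(R)
        order = np.argsort(eigvals)[::-1]               # largest factor first
        eigvals, vecs = eigvals[order], vecs[:, order]

        # Loadings: eigenvectors scaled by the square roots of the eigenvalues.
        # loadings[i, j] is the correlation of variable i with factor j.
        loadings = vecs * np.sqrt(eigvals)

        # Column sums of the squared loadings recover the eigenvalues...
        print(np.allclose((loadings ** 2).sum(axis=0), eigvals))    # True
        # ...and the row sums (the communalities) are 1.0 when all factors are kept.
        print(np.allclose((loadings ** 2).sum(axis=1), 1.0))        # True

        # Factor scores via the formula in the text: F = X V' D^(-1/2)
        F = Z @ vecs / np.sqrt(eigvals)
        # The loadings matrix of the text, B = D^(1/2) V, reproduces the data: X = FB
        B = np.sqrt(eigvals)[:, None] * vecs.T
        print(np.allclose(Z, F @ B))                                # True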

Common Factor Analysis

    • Given as input a rectangular, 2-mode matrix X whose columns are seen as variables, the objective of common factor analysis is to decompose ("factor") the variables in terms of a set of underlying "latent" variables called factors that are inferred from the pattern of correlations among the variables.

    • The underlying factors are the "reason for" the observed correlations among the variables. That is, we assume that correlations among the variables are due to the fact that each variable is correlated with the underlying factors. So, if we simplify and assume there is just one underlying factor for a given set of variables, the idea is to see whether we can explain the observed correlation r(Y,Z) between two variables Y and Z as a function of the extent to which each is correlated with an unseen third variable, the factor F. That is, r(Y,Z) = r(Y,F)*r(Z,F). This is known as Spearman's fundamental theorem of factor analysis. If there are two underlying factors, then the correlation between two variables is the sum, across factors, of the products of their correlations with each latent factor, like this: r(Y,Z) = r(Y,F1)*r(Z,F1) + r(Y,F2)*r(Z,F2).
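
    • As a quick numerical illustration of the one-factor version of this idea, the sketch below simulates a single latent factor F and two observed variables built from it, then checks that r(Y,Z) is approximately r(Y,F)*r(Z,F). The loadings (0.8 and 0.6) and the simulation setup are made up for illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 200_000
        F = rng.normal(size=n)                          # the unobserved common factor

        # Each observed variable = its loading on F plus unique (unshared) noise,
        # scaled so each variable has variance 1.
        Y = 0.8 * F + np.sqrt(1 - 0.8 ** 2) * rng.normal(size=n)
        Z = 0.6 * F + np.sqrt(1 - 0.6 ** 2) * rng.normal(size=n)

        r_YZ = np.corrcoef(Y, Z)[0, 1]
        r_YF = np.corrcoef(Y, F)[0, 1]
        r_ZF = np.corrcoef(Z, F)[0, 1]
        print(round(r_YZ, 2), round(r_YF * r_ZF, 2))    # both approximately 0.48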