Overview
Not a single technique but a family of methods for analyzing a set of observed variables (the data matrix X)
This handout focuses on exploratory factor analysis. But you may also want to look up confirmatory factor analysis.
Two basic branches in the exploratory family tree: defined factors (aka principal components) and inferred factors (aka common factor analysis or classical factor analysis)
In principal components, we define new variables (factors) that are linear combinations of our observed variables and that summarize our input data, much like a stock market index summarizes the whole market.
the focus is on expressing the new variables (the principal components) as weighted averages of the observed variables
the factors (properly called factor scores) have the same length as the original variables: one value per case.
In common factor analysis, we infer the existence of latent variables that explain the pattern of correlations among our observed variables
the focus is on expressing the observed variables as a linear combination of underlying factors.
In both approaches, the factors are defined as linear combinations of the variables, and the variables are decomposed as linear combinations of the factors. Weird but true.
Two basic outputs from factor analysis: a set of column (variable) scores called factor loadings (each factor loading has as many values as there are variables in the data matrix), and a set of row (case) scores called factor scores (each factor score has as many values as there are cases in the data matrix).
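As a quick orientation, here is a minimal numpy sketch on made-up data (100 cases, 4 variables; the seed and sizes are arbitrary) showing the shape of each output. The details of how these matrices are computed are covered in the sections below.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))               # made-up data: 100 cases, 4 variables
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the columns

    R = np.corrcoef(Z, rowvar=False)            # 4 x 4 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)        # eigendecomposition of R

    loadings = eigvecs * np.sqrt(eigvals)       # column (variable) scores
    scores = Z @ (eigvecs / np.sqrt(eigvals))   # row (case) scores

    print(loadings.shape)                       # (4, 4): one value per variable, per factor
    print(scores.shape)                         # (100, 4): one value per case, per factor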
Principal Components
Given as input a rectangular, 2-mode matrix X whose columns are seen as variables, the objective of principal components is to create a new variable (called a factor score or principal component) that is a linear combination of the input variables, such that the sum of squared correlations between the principal component (factor) and each of the original variables is maximized.
Actually, it is to create an ordered set of principal components such that the first principal component explains as much variance in the original variables as possible, and then the second component explains as much of the residual variance not explained by the first as possible, and so on until all variance is accounted for.
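To make the "maximize the sum of squared correlations" objective concrete, here is a small numpy sketch on made-up, intercorrelated data (the shared-influence construction and seed are purely illustrative): the first principal component's sum of squared correlations with the variables equals the largest eigenvalue of the correlation matrix, and no random linear combination of the variables does better.

    import numpy as np

    rng = np.random.default_rng(1)
    shared = rng.normal(size=(200, 1))               # made-up shared influence
    X = shared + 0.8 * rng.normal(size=(200, 5))     # 200 cases, 5 intercorrelated variables
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    pc1 = Z @ eigvecs[:, np.argmax(eigvals)]         # first principal component

    def sum_sq_corr(component, Z):
        """Sum of squared correlations between a candidate component and each variable."""
        r = np.array([np.corrcoef(component, Z[:, j])[0, 1] for j in range(Z.shape[1])])
        return (r ** 2).sum()

    print(sum_sq_corr(pc1, Z), eigvals.max())        # these two numbers match
    for _ in range(5):
        w = rng.normal(size=5)                       # a random linear combination
        print(sum_sq_corr(Z @ w, Z))                 # always <= the value above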
Since the observed variables may be highly intercorrelated (i.e., share variance), what we are doing is just "collecting together" the shared bits into "components". It doesn't really change anything; it just reallocates things. Think about 10 pieces of luggage of different sizes that are filled with stuff in such a way that each bag is approximately the same weight. Note that if you were to decide to take only 8 bags with you, you would leave behind 20% of your stuff. Now reallocate the contents so that the biggest bags are stuffed to the maximum. This will mean that some of the other bags will be much emptier. Now if you took only 8 bags (the 8 fullest ones), you might well leave behind much less than 20% of your stuff. In fact, it might be that the 8 biggest bags were enough to hold all of your stuff.
We solve this reallocation problem by "factoring" the correlation matrix between the variables. That is, we compute the correlation matrix, and then use SVD or other means to extract the eigenvectors and eigenvalues.
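A minimal numpy sketch of this step, again on made-up data (the shared-influence construction and seed are illustrative): compute the correlation matrix, extract its eigenvalues and eigenvectors, and sort the factors from largest to smallest. The eigenvalues sum to the number of variables, which is the total variance being reallocated.

    import numpy as np

    rng = np.random.default_rng(2)
    shared = rng.normal(size=(200, 1))               # made-up shared influence
    X = shared + rng.normal(size=(200, 6))           # 200 cases, 6 intercorrelated variables

    R = np.corrcoef(X, rowvar=False)                 # 6 x 6 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)             # eigendecomposition (R is symmetric)
    order = np.argsort(eigvals)[::-1]                # biggest factor first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    print(eigvals)                                   # decreasing, like the repacked luggage
    print(eigvals.sum())                             # 6.0: total variance is just reallocated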
the eigenvectors (multiplied by the square roots of their eigenvalues) are called factor loadings, and these are the correlations of each variable with each factor (aka principal component)
The sum of the squared loadings of each variable with a given factor (the column sum of the squared loadings matrix) will equal the factor's eigenvalue. Hence the eigenvalue summarizes how well the factor correlates with (i.e., summarizes or can stand in for) each of the variables. It is literally the amount of variance accounted for (since correlation squared is variance accounted for).
The sum of the squared loadings for each variable across the factors (the row sums of the squared loadings matrix) is defined as the variable's communality. It tells you how much of the variable's variance is captured by the factors. In principal components, the communality of each variable should be 1.0 unless you have chosen to throw away some of the factors (as when we keep only the bigger factors).
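Continuing the same kind of made-up example, here is a sketch that checks these facts numerically: the column sums of the squared loadings reproduce the eigenvalues, and the row sums (the communalities) are 1.0 when all factors are kept but drop below 1.0 when only the first few are retained.

    import numpy as np

    rng = np.random.default_rng(2)
    shared = rng.normal(size=(200, 1))
    X = shared + rng.normal(size=(200, 6))

    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    loadings = eigvecs * np.sqrt(eigvals)            # correlations of variables with factors
    sq = loadings ** 2

    print(np.allclose(sq.sum(axis=0), eigvals))      # column sums = eigenvalues
    print(np.allclose(sq.sum(axis=1), 1.0))          # communalities = 1.0 with all factors kept
    print(sq[:, :2].sum(axis=1))                     # communalities if we keep only 2 factors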
In principal components (but not in common factor analysis), we can compute the actual factor scores as exact linear combinations of the original variables, where the weights are derived from the factor loadings. So F = XW, where F is the set of factor scores, X is the matrix of original variables (standardized), and W is the weight matrix formed by dividing each column of the loadings matrix by its eigenvalue (which gives standardized factor scores).
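A sketch of the score computation on the same kind of made-up data: dividing the loadings by their eigenvalues gives the weight matrix W, and F = XW produces standardized factor scores that are uncorrelated with each other and that correlate with the original variables exactly as the loadings say they should.

    import numpy as np

    rng = np.random.default_rng(3)
    shared = rng.normal(size=(200, 1))
    X = shared + rng.normal(size=(200, 6))
    Z = (X - X.mean(axis=0)) / X.std(axis=0)         # standardized variables

    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)

    W = loadings / eigvals                           # weights: loadings scaled by eigenvalues
    F = Z @ W                                        # factor scores: one row per case

    print(np.allclose(np.corrcoef(F, rowvar=False), np.eye(6)))   # factors are uncorrelated
    corr = np.corrcoef(np.hstack([Z, F]), rowvar=False)[:6, 6:]
    print(np.allclose(corr, loadings))               # variable-factor correlations = loadings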
Common Factor Analysis
Given as input a rectangular, 2-mode matrix X whose columns are seen as variables, the objective of common factor analysis is to decompose ("factor") the variables in terms of a set of underlying "latent" variables called factors that are inferred from the pattern of correlations among the variables.
The underlying factors are the "reason for" the observed correlations among the variables. That is, we assume that correlations among the variables are due to the fact that each variable is correlated with the underlying factors. So, if we simplify and assume there is just one underlying factor for a given set of variables, the idea is to see whether we can explain the observed correlation r(Y,Z) between two variables Y and Z as a function of the extent to which each is correlated with an unseen third variable, the factor F. That is, r(Y,Z) = r(Y,F)*r(Z,F). This is known as Spearman's fundamental theorem of factor analysis. If there are two underlying factors, then the correlation between two variables is due to the sum of their correlations with each of the latent factors: r(Y,Z) = r(Y,F1)*r(Z,F1) + r(Y,F2)*r(Z,F2), plus error.
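Here is a toy simulation of the one-factor case (the loadings 0.8 and 0.6, the sample size, and the seed are all made up): construct Y and Z so that each correlates with a latent factor F and their unique parts are independent, and the observed correlation r(Y,Z) comes out close to r(Y,F)*r(Z,F).

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    F = rng.normal(size=n)                                    # the latent factor
    Y = 0.8 * F + np.sqrt(1 - 0.8**2) * rng.normal(size=n)    # r(Y,F) = 0.8 by construction
    Z = 0.6 * F + np.sqrt(1 - 0.6**2) * rng.normal(size=n)    # r(Z,F) = 0.6 by construction

    print(np.corrcoef(Y, Z)[0, 1])                            # close to 0.8 * 0.6 = 0.48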