Lee - 1999 - Learning the parts of objects by non-negative matrix factorization

Citation

Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999 Oct;401:788-91. DOI

10 Word Summary

Decomposing data with NMF gives a physiologically plausible, parts-based representation.

Abstract

Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.

Notes

    • Also look at Lee and Seung - Algorithms for Non-negative Matrix Factorization
    • Vector quantization (VQ)
    • Principal components analysis (PCA)
    • Non-negative matrix factorization (NMF)
    • All three methods recreate structure in the data V using the same decomposition:
V_{i\mu} \approx \left(WH\right)_{i\mu} = \overset{}{\underset{a}{\sum}} W_{ia} H_{a\mu}
    • Constraints on W and H
      • VQ - each column of H must be a unary vector (a single entry equal to one, the rest zero). This means each object is reconstructed from exactly one column of W, i.e. a whole-object prototype.
      • PCA - Columns of W must be orthonormal and rows of H must be orthogonal. This gives an "eigenvector" (eigenface) representation in which both positive and negative combinations are allowed, so reconstructions can involve cancellations.
      • NMF - the only constraint is that the elements of W and H must be non-negative. This results in a sparse, parts-based coding of basis features.
    • VQ builds each object from a single whole-object prototype, whereas NMF builds objects up from additive components (see the first sketch at the end of these notes).
    • PCA - looks kind of like a frequency decomposition: the first component looks like a mean image, and later components look like successively higher frequencies.
    • NMF solves for W and H using the following multiplicative update rules (a NumPy sketch of the full procedure follows these notes):
W_{ia} \leftarrow W_{ia} \overset{}{\underset{\mu}{\sum}} \frac{V_{i\mu}}{\left(WH\right)_{i\mu}}H_{a\mu}
W_{ia} \leftarrow \frac{W_{ia}}{\overset{}{\underset{j}{\sum}}W_{ja}}
H_{a\mu} \leftarrow H_{a\mu} \overset{}{\underset{i}{\sum}} W_{ia} \frac{V_{i\mu}}{\left(WH\right)_{i\mu}}
      • Check that the seed values for W and H contain no zeros; under these multiplicative updates, any element that starts at zero remains zero.
    • These are optimized by maximizing the objective function:
F = \overset{n}{\underset{i=1}{\sum}} \overset{m}{\underset{\mu=1}{\sum}} \left[ V_{i\mu} \log \left(WH\right)_{i\mu} - \left(WH\right)_{i\mu} \right]
    • NMF in general models a set of visible variables V using a set of "hidden" variables H; the matrix W encodes how each hidden variable influences the visible ones.
      • VQ - exactly one hidden variable is active for each example V
      • PCA - allows a combination of multiple hidden variables for each V, but permits "unnatural" combinations due to negative-valued basis vectors.
      • NMF - similar to PCA, but constrains combinations to be purely additive.
        • Doesn't capture syntactic relationships (e.g. between words in the text example)
        • Complex datasets might require a hierarchical decomposition, which NMF does not provide
    • NMF performs "learning" and "inference" simultaneously.
      • Learning is the determination of the basis vectors (W) from data.
      • Inference is the setting of the hidden-variable values (H) for a given input, with W held fixed.
    • NMF suits neurological data "physiologically" because:
      • Synapses are either inhibitory or excitatory but do not change sign.
      • Firing rates cannot be negative
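
Code Sketches

To make the VQ vs. NMF constraint difference concrete, here is a tiny made-up NumPy example (the matrices and values are invented for illustration, not taken from the paper). With a unary column of H, the reconstruction WH picks out exactly one column of W, which is the VQ prototype behaviour; with a general non-negative column, NMF adds several columns (parts) together.

import numpy as np

# Toy basis: 4 "pixels", 3 basis columns (values are arbitrary).
W = np.array([[0.7, 0.0, 0.2],
              [0.3, 0.0, 0.1],
              [0.0, 0.6, 0.0],
              [0.0, 0.4, 0.7]])

# VQ-style encoding: H is a unary column, so WH is exactly one column of W.
h_vq = np.array([[0.0], [1.0], [0.0]])
print(W @ h_vq)   # equals W[:, 1:2], a single whole-object prototype

# NMF-style encoding: any non-negative combination of columns (additive parts).
h_nmf = np.array([[0.5], [0.2], [0.3]])
print(W @ h_nmf)  # a sum of parts; no cancellation is possible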
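The following is a minimal NumPy sketch of the multiplicative update rules and the objective F above. The function name, iteration count, seed handling, and the small epsilon guard are my own choices and are not specified in the paper.

import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Multiplicative-update NMF maximizing
    F = sum_{i,mu} [ V_{i,mu} * log((W H)_{i,mu}) - (W H)_{i,mu} ]."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive seeds: any element initialized to zero would stay zero
    # under the multiplicative updates.
    W = rng.uniform(0.1, 1.0, size=(n, r))
    H = rng.uniform(0.1, 1.0, size=(r, m))
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T                  # W_ia <- W_ia * sum_mu [V/(WH)]_imu H_amu
        W /= W.sum(axis=0, keepdims=True)    # W_ia <- W_ia / sum_j W_ja
        WH = W @ H + eps
        H *= W.T @ (V / WH)                  # H_amu <- H_amu * sum_i W_ia [V/(WH)]_imu
    F = np.sum(V * np.log(W @ H + eps) - W @ H)
    return W, H, F

# Usage on random non-negative data (shapes chosen only for illustration):
V = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
W, H, F = nmf(V, r=10)
print(W.shape, H.shape, F)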