Foundations of Manifold Learning

This course webpage has not been maintained since the end of the 107-2 semester, and most links to the lecture notes have been removed.

Manifold Learning

Manifold learning (ML), in the broad sense, covers all machine-learning theory related to geometry [1]. One of its main branches is low-dimensional embedding, whose goals include dimensionality reduction, data visualization, and clustering; this branch is closely related to multivariate analysis in statistics [2]. The most basic dimensionality-reduction method in statistics is principal component analysis (PCA), covered in most multivariate-analysis textbooks; see also the monograph [3]. Another method, commonly used for "relative" data given only through pairwise comparisons, is multidimensional scaling (MDS) [4]. Both methods reconstruct the original data distribution as faithfully as possible in a prescribed low-dimensional Euclidean space.
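As a minimal sketch (not part of the original course materials; the data and parameters below are made up for illustration), the following NumPy code shows both methods: PCA projects centered data onto the top eigenvectors of the sample covariance, while classical MDS recovers coordinates from a matrix of squared pairwise distances by double centering.

```python
import numpy as np

def pca(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / (len(X) - 1)             # sample covariance, d x d
    vals, vecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    return Xc @ vecs[:, ::-1][:, :k]         # n x k embedding

def classical_mds(D2, k):
    """Embed points in R^k from an n x n matrix of squared distances D2."""
    n = len(D2)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Usage: 200 points in R^3 lying near a 2-dimensional plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) \
    + 0.01 * rng.normal(size=(200, 3))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
Y_pca = pca(X, 2)                            # shape (200, 2)
Y_mds = classical_mds(D2, 2)                 # shape (200, 2)
```

For Euclidean distances, classical MDS and PCA yield the same embedding up to an orthogonal transformation, which is one way to see that both are linear methods.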

In 2000, two new ML algorithms appeared in the same issue of Science: Isomap [5] and Locally Linear Embedding (LLE) [6]. These were followed by Laplacian eigenmaps (LE) [7], diffusion maps (DM) [8], t-SNE [9], and others. The key difference from earlier approaches is the assumption that the data have the structure of a manifold; nonlinear embeddings are therefore used to study the shape of the data, that is, to "learn the manifold". See the introductory articles [10, 11, 12] and the extensive bibliography of [12].
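To illustrate the spectral flavor shared by several of these methods, here is a bare-bones sketch of a Laplacian eigenmap in NumPy (a simplified illustration, not the implementation from [7]; the function name, kernel bandwidth, and test data are made up): build a Gaussian-kernel affinity graph, form the graph Laplacian, and embed using the eigenvectors with the smallest nonzero eigenvalues.

```python
import numpy as np

def laplacian_eigenmap(X, k=2, sigma=1.0):
    """Sketch of a Laplacian eigenmap with a dense Gaussian kernel."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    W = np.exp(-D2 / (2 * sigma ** 2))       # affinity matrix
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    return vecs[:, 1:k + 1]                  # skip the constant eigenvector

# Usage: 100 points on a circle embedded in R^10.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros((100, 8))])
Y = laplacian_eigenmap(X, k=2, sigma=0.5)    # shape (100, 2)
```

Unlike PCA or MDS, the construction depends only on local affinities between nearby points, which is what makes the embedding nonlinear and sensitive to the manifold structure of the data.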

The goal of this course is to introduce the mathematical principles behind these methods, together with their theoretical difficulties and open problems, as preparation for further study.

We assume students have already taken calculus and linear algebra.


[1] Manifold Learning Theory and Applications, by Yunqian Ma (Editor), Yun Fu (Editor)

[2] Modern Multivariate Statistical Techniques - Regression, Classification, and Manifold Learning. Alan J. Izenman. Especially Chapters 7, 13, and 16.

[3] Principal component analysis. I. T. Jolliffe.

[4] Modern Multidimensional Scaling - Theory and Applications. I. Borg and P. J. F. Groenen, 2005. Or Chapter 13 of [2].

[5] http://web.mit.edu/cocosci/isomap/isomap.html

[6] https://cs.nyu.edu/~roweis/lle/

[7] Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, by  M. Belkin and P. Niyogi. 

[8] Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators, by Nadler, Lafon and Coifman.

[9] Visualizing Data using t-SNE, by van der Maaten and G. Hinton. 

[10] An introduction to diffusion maps, by J. de la Porte, B. M. Herbst, W. Hereman, S. J. van der Walt.

[11] Spectral methods for dimensionality reduction, by L. Saul, K. Weinberger, F. Sha, J. Ham, D. Lee.

[12] Dimensionality Reduction: A Comparative Review, by L. van der Maaten, E. Postma, J. van den Herik.

Course References

Besides the original papers, students may mainly consult the above

[2] Modern Multivariate Statistical Techniques - Regression, Classification, and Manifold Learning. Alan J. Izenman.

as well as

Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Jianzhong Wang. 2012.  

For matrix computations and Matlab, one may consult

Matrix Algebra Useful for Statistics. Shayle R. Searle and André I. Khuri. 2017

Course Content

This course covers three themes: the mathematical derivation of manifold-learning algorithms, the underlying geometry, and integrated exploration.

Spending the first three or four weeks on prerequisites before introducing any manifold-learning method would likely be dull, so the plan is to interleave the two: each time a manifold-learning algorithm is introduced, we look at the mathematics it requires. The actual progress is recorded below:


流形學習基礎課程介紹 (1).pptx
0218講義.pdf
PCA1.pdf
0225講義.pdf
JLin_Note.pdf

References

[1] 魏丞偉 MLLE: Modified Locally Linear Embedding Using Multiple Weights, Zhenyue Zhang and Jing Wang. 

[2] 林鋐儒 Global versus local methods in nonlinear dimensionality reduction, or Sparse multidimensional scaling using landmark points, both by Vin de Silva and Joshua B. Tenenbaum.

[3] 陳俊宇 MVU/SDE Algorithms for manifold learning  

[4] 葉永傑 Generalized Non-metric Multidimensional Scaling, S. Agarwal, J. Wills, L. Cayton, G. Lanckriet, D. Kriegman, S. Belongie (semidefinite programming). Or see Robust Euclidean Embedding, L. Cayton and S. Dasgupta (subgradient algorithm).

[5] 陳湘元 Visualizing Data using t-SNE, by van der Maaten and G. Hinton. 

[6] 陳彥伍 Locally Linear Embedding (LLE) for MRI based Alzheimer’s Disease Classification, X. Liu, D. Tosun, M. W. Weiner, N. Schuff, and ADNI.

[7] 蔡承翰 Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators, by Nadler, Lafon and Coifman.  

[8] 閻紹華 Perron-Frobenius Theorem in Matrix Analysis

[9] 翁偉倫 Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment, by Zhenyue Zhang and Hongyuan Zha.

[10] 蔡秉承 Taking all positive eigenvectors is suboptimal in classical MDS, J. Tsang and R. Pereira.

Other possible topics:

[11] HLLE: explain the output of Donoho and Grimes's implementation and compare it with van der Maaten's.

[12] Shamap: Shape-based Manifold Learning, Fenglei Fan, Hongming Shan and Ge Wang.

[13] Dimension detection

[14] LLE with a distance matrix

[15] Non-linear Dimensionality Reduction: Riemannian Metric Estimation and the Problem of Geometric Recovery, D. Perrault-Joncas and M. Meila

[16] Riemannian Manifold Learning for Nonlinear Dimensionality Reduction, Tony Lin, Hongbin Zha, and Sang Uk Lee

[17] Geometric Network Comparisons, Dena Asta and Cosma R. Shalizi

[18] A remark on global positioning from local distances, A. Singer

[19] Polynomial PCA and kernel PCA

[20] MDS and MVU from the kernel viewpoint

[21] Matlab: HLLE, comparing the previously used Donoho-Grimes code with van der Maaten's implementation.