Foundations of Manifold Learning

This course webpage has not been maintained since the end of the 107-2 semester, and most links to the lecture notes have been removed.

Manifold Learning

Manifold learning (ML), in the broad sense, covers all machine-learning theory related to geometry [1]. One of its main branches is low-dimensional embedding, whose goals include dimensionality reduction, data visualization, and clustering; this branch is closely related to multivariate analysis in statistics [2]. The most basic dimensionality-reduction method in statistics is principal component analysis (PCA), covered in most multivariate-analysis textbooks; see also the monograph [3]. Another method, commonly used for "relative" data given only through pairwise comparisons, is multidimensional scaling (MDS) [4]. Both methods reconstruct the original data distribution as faithfully as possible in a prescribed low-dimensional Euclidean space.
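As a minimal sketch (not part of the original course materials; the data and parameters below are made up for illustration), the following NumPy code shows both methods: PCA projects centered data onto the top eigenvectors of the sample covariance, while classical MDS recovers coordinates from a matrix of squared pairwise distances by double centering.

```python
import numpy as np

def pca(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / (len(X) - 1)             # sample covariance, d x d
    vals, vecs = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    return Xc @ vecs[:, ::-1][:, :k]         # n x k embedding

def classical_mds(D2, k):
    """Embed points in R^k from an n x n matrix of squared distances D2."""
    n = len(D2)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Usage: 200 points in R^3 lying near a 2-dimensional plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) \
    + 0.01 * rng.normal(size=(200, 3))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
Y_pca = pca(X, 2)                            # shape (200, 2)
Y_mds = classical_mds(D2, 2)                 # shape (200, 2)
```

For Euclidean distances, classical MDS and PCA yield the same embedding up to an orthogonal transformation, which is one way to see that both are linear methods.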

In 2000, two new ML algorithms appeared in the same issue of Science: Isomap [5] and Locally Linear Embedding (LLE) [6]. These were followed by Laplacian eigenmaps (LE) [7], diffusion maps (DM) [8], t-SNE [9], and others. The key difference from earlier approaches is the assumption that the data have the structure of a manifold; nonlinear embeddings are therefore used to study the shape of the data, that is, to "learn the manifold". See the introductory articles [10, 11, 12] and the extensive bibliography of [12].
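To illustrate the spectral flavor shared by several of these methods, here is a bare-bones sketch of a Laplacian eigenmap in NumPy (a simplified illustration, not the implementation from [7]; the function name, kernel bandwidth, and test data are made up): build a Gaussian-kernel affinity graph, form the graph Laplacian, and embed using the eigenvectors with the smallest nonzero eigenvalues.

```python
import numpy as np

def laplacian_eigenmap(X, k=2, sigma=1.0):
    """Sketch of a Laplacian eigenmap with a dense Gaussian kernel."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    W = np.exp(-D2 / (2 * sigma ** 2))       # affinity matrix
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    return vecs[:, 1:k + 1]                  # skip the constant eigenvector

# Usage: 100 points on a circle embedded in R^10.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros((100, 8))])
Y = laplacian_eigenmap(X, k=2, sigma=0.5)    # shape (100, 2)
```

Unlike PCA or MDS, the construction depends only on local affinities between nearby points, which is what makes the embedding nonlinear and sensitive to the manifold structure of the data.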

The goal of this course is to introduce the mathematical principles behind these methods, together with their theoretical difficulties and open problems, as preparation for further study.

We assume students have already taken calculus and linear algebra.


[1] Manifold Learning Theory and Applications, by Yunqian Ma (Editor), Yun Fu (Editor)

[2] Modern Multivariate Statistical Techniques - Regression, Classification, and Manifold Learning. Alan J. Izenman. Especially Chapters 7, 13, and 16.

[3] Principal component analysis. I. T. Jolliffe.

[4] Modern Multidimensional Scaling - Theory and Applications. I. Borg and P. J. F. Groenen, 2005. Or Chapter 13 of [2].

[5] http://web.mit.edu/cocosci/isomap/isomap.html

[6] https://cs.nyu.edu/~roweis/lle/

[7] Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, by  M. Belkin and P. Niyogi. 

[8] Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators, by Nadler, Lafon and Coifman.

[9] Visualizing Data using t-SNE, by van der Maaten and G. Hinton. 

[10] An introduction to diffusion maps, by J. de la Porte, B. M. Herbst, W. Hereman, S. J. van der Walt.

[11] Spectral methods for dimensionality reduction, by L. Saul, K. Weinberger, F. Sha, J. Ham, D. Lee.

[12] Dimensionality Reduction: A Comparative Review, by L. van der Maaten, E. Postma, J. van den Herik.

Course References

Besides the original papers, students may mainly consult the above

[2] Modern Multivariate Statistical Techniques - Regression, Classification, and Manifold Learning. Alan J. Izenman.

as well as

Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Jianzhong Wang. 2012.  

For matrix computations and Matlab, one may consult

Matrix Algebra Useful for Statistics. Shayle R. Searle and André I. Khuri. 2017

Course Content

This course covers three themes: the mathematical derivation of manifold-learning algorithms, the underlying geometry, and integrated exploration.

Spending the first three or four weeks on prerequisites before introducing any manifold-learning method would likely be dull, so the plan is to interleave the two: each time a manifold-learning algorithm is introduced, we look at the mathematics it requires. The actual progress is recorded below:


流形學習基礎課程介紹 (1).pptx
0218講義.pdf
PCA1.pdf
0225講義.pdf
JLin_Note.pdf

References

[1] 魏丞偉 MLLE: Modified Locally Linear Embedding Using Multiple Weights, Zhenyue Zhang and Jing Wang. 

[2] 林鋐儒 Global versus local methods in nonlinear dimensionality reduction, or Sparse multidimensional scaling using landmark points, both by Vin de Silva and Joshua B. Tenenbaum.

[3] 陳俊宇 MVU/SDE Algorithms for manifold learning  

[4] 葉永傑 Generalized Non-metric Multidimensional Scaling, S. Agarwal, J. Wills, L. Cayton, G. Lanckriet, D. Kriegman, S. Belongie (semidefinite programming). Or see Robust Euclidean Embedding, L. Cayton and S. Dasgupta (subgradient algorithm).

[5] 陳湘元 Visualizing Data using t-SNE, by van der Maaten and G. Hinton. 

[6] 陳彥伍 Locally Linear Embedding (LLE) for MRI based Alzheimer’s Disease Classification, X. Liu, D. Tosun, M. W. Weiner, N. Schuff, and ADNI.

[7] 蔡承翰 Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators, by Nadler, Lafon and Coifman.  

[8] 閻紹華 Perron-Frobenius Theorem in Matrix Analysis

[9] 翁偉倫 Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment, by Zhenyue Zhang and Hongyuan Zha.

[10] 蔡秉承 Taking all positive eigenvectors is suboptimal in classical MDS, J. Tsang and R. Pereira.

Other possible topics:

[11] HLLE: explain the output of Donoho and Grimes's implementation and compare it with van der Maaten's.

[12] Shamap: Shape-based Manifold Learning, Fenglei Fan, Hongming Shan and Ge Wang.

[13] Dimension detection

[14] LLE with a distance matrix

[15] Non-linear Dimensionality Reduction: Riemannian Metric Estimation and the Problem of Geometric Recovery, D. Perrault-Joncas and M. Meila

[16] Riemannian Manifold Learning for Nonlinear Dimensionality Reduction, Tony Lin, Hongbin Zha, and Sang Uk Lee

[17] Geometric Network Comparisons, Dena Asta and Cosma R. Shalizi

[18] A remark on global positioning from local distances, A. Singer

[19] Polynomial PCA and kernel PCA

[20] MDS and MVU from the kernel viewpoint

[21] Matlab: HLLE, comparing the previously used Donoho-Grimes code with van der Maaten's implementation.