Machine Learning

Post date: Nov 10, 2015 3:22:22 PM

MBA Course Notes for

MACHINE LEARNING

Dr. Dao NamAnh

LEARNING OUTCOMES

From robotics, speech recognition, and analytics to finance and social network analysis, machine learning has become one of the most useful sets of scientific tools of our age. With this course we want to bring interested students and researchers from a wide array of disciplines up to speed on the power and wide applicability of machine learning. The ultimate aim of the course is to equip you with all the modelling and optimization tools you will need to formulate and solve problems of interest in a machine learning framework. We hope to build these skills through lectures and reading materials that introduce machine learning in the context of its many applications, as well as by describing, in a detailed but user-friendly manner, the modern techniques from nonlinear optimization used to solve them.

ABOUT THE COURSE

Data acquisition devices, computers, and networks are generating data at an unprecedented rate, which calls for the development and use of powerful, robust, and adaptive learning solutions to accomplish challenging tasks such as pattern recognition, time series modeling, optimization, decision support, diagnosis, text mining, and multimedia search. In this course, advanced methods in the context of dimension reduction, feature extraction and selection, clustering, and classification will be explored. Along with conventional machine learning algorithms, artificial neural networks and computational intelligence methods will also be discussed. The interrelationship between these methods will be addressed, and the mixture-of-experts approach will be examined. We will also look at a few case studies on building powerful, intelligent data mining systems. The course is taught with formal and informal lectures, and in-class discussion is encouraged. Students will give presentations based on selected research papers of interest. A major assessment component is a project that aims at developing an intelligent data analysis system for real-world problem solving.

PREREQUISITES

A thorough understanding of Linear Algebra and Vector Calculus (e.g., students should be able to easily compute gradients/Hessians of a multivariate function), as well as a basic understanding of the Python or MATLAB/OCTAVE programming environments.
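As a quick self-check (an illustrative example of our own, not taken from the course materials), students should be able to write down, for a function such as f(x, y) = x²y + y³, the gradient and Hessian:

```latex
f(x,y) = x^2 y + y^3, \qquad
\nabla f(x,y) = \begin{pmatrix} 2xy \\ x^2 + 3y^2 \end{pmatrix}, \qquad
\nabla^2 f(x,y) = \begin{pmatrix} 2y & 2x \\ 2x & 6y \end{pmatrix}.
```

If computations like these feel unfamiliar, a short Vector Calculus review before the first lecture is advisable.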

TEACHING METHODS

20 hours of lectures, 30 hours of exercises, and 15 hours of homework

COURSE OUTLINE

    1. Introduction to Machine Learning
    2. Decision-Tree Learning
    3. Supervised Learning
    4. Linear Classifier
    5. Perceptron Learning
    6. Neuron Model
    7. Feature
    8. Multiclass
    9. Support Vector Machines
    10. Naïve Bayes
    11. Bayesian Learning
    12. Ensemble Methods
    13. Clustering
    14. Principal Component Analysis

GRADING

The final grade will be determined by regular homework assignments, one midterm exam, and a Semester Project:

Homework: 20%

Midterm Exam: 30%

Semester Project: 50%

Hands-on design projects are the key component of the course. Teamwork is required for the projects.

RECOMMENDED TEXTS

The lectures will follow, in part, Tom Mitchell, Machine Learning, McGraw Hill, 1997. The more advanced material will be based on resources the instructor will make available. Some interesting books for the advanced material include:

    • Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012
    • Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006
    • Hal Daumé III, A Course in Machine Learning.
    • Dave Kauchak, CS 451 - Machine Learning and Big Data, 2013
    • Pedro Domingos, CSE 446, Machine Learning, 2009
    • Raymond J. Mooney, CS 391: Machine Learning, 2007

INTERESTING LINKS

Major links from Clayton Scott:

Books

    • Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, 2001
    • Bishop, Pattern Recognition and Machine Learning, 2006
    • Ripley, Pattern Recognition and Neural Networks, 1996
    • Duda, Hart, and Stork, Pattern Classification, 2nd Ed., 2002
    • Tan, Steinbach, and Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
    • Schölkopf and Smola, Learning with Kernels, 2002
    • Mardia, Kent, and Bibby, Multivariate Analysis, 1979
    • Computational Statistics (online book)
    • Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
    • Bertsekas and Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

Other machine learning courses

Data repositories

Background

Matlab Software

    • CVX, a convex program solver by Michael Grant and Stephen Boyd
    • YALMIP, a high-level Matlab interface to a variety of convex program solvers, such as SeDuMi
    • SeDuMi, for solving second-order cone programs; most if not all tractable convex programs can be cast as such
    • LIBSVM, for support vector classification (including multiclass), regression, and one-class classification (novelty detection); a minimal Python sketch follows this list
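For a flavor of what these tools do, here is a minimal support vector classification sketch. It assumes Python with scikit-learn installed (whose sklearn.svm.SVC is built on LIBSVM); the dataset and parameter choices are illustrative only, not a recommendation.

```python
# Minimal SVM classification sketch, assuming scikit-learn (built on LIBSVM).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # a small 3-class dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM; multiclass problems are handled one-vs-one internally, as in LIBSVM.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

In practice the kernel, C, and gamma are chosen by cross-validation rather than fixed as above.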

Conferences/Publications

Nearest Neighbors

    • The primary research area relating to nearest neighbor methods is the problem of storage, data reduction, and rapid calculation of nearest neighbors. A search on "nearest neighbor search", "condensed nearest neighbors", or "edited nearest neighbors" will return a number of references; a minimal brute-force k-NN sketch in Python follows this list.
    • Theory: Devroye, Györfi, and Lugosi, A Probabilistic Theory of Pattern Recognition, 1996
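The sketch below, assuming only numpy, shows the brute-force baseline; the fast search structures in the references above exist precisely to avoid this all-pairs distance computation.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Brute-force k-nearest-neighbor classification by majority vote."""
    # All pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    neighbors = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest points
    votes = y_train[neighbors]                 # their class labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy usage: two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_predict(X, y, np.array([[0.0, 0.0], [5.0, 5.0]])))  # -> [0 1]
```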

Density Estimation

    • David Scott, Multivariate Density Estimation, 1992
    • Theory: Devroye and Lugosi, Combinatorial Methods in Density Estimation, 2001

Linear methods for classification

    • Hastie et al, Bishop, and Duda et al. all have chapters on LDA, logistic regression, and other linear classifiers.

Decision Trees

    • The first comprehensive treatment and still a standard reference: Breiman, Friedman, Olshen and Stone, Classification and Regression Trees, 1984
    • The other standard reference is Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
    • A somewhat recent survey of research on decision trees: Sreerama K. Murthy: Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Min. Knowl. Discov. 2(4): 345-389 (1998)
    • Ripley has a nice chapter on decision trees -- probably the best place to start.

Error estimation

Boosting

Support Vector Machines

Clustering

    • K-means, EM for Gaussian mixture models, and hierarchical clustering: see the recommended texts, especially Hastie et al., Duda et al., and Bishop (although Bishop doesn't discuss hierarchical clustering). K-means is also known as the Lloyd-Max algorithm in the context of vector quantization; a minimal sketch of the Lloyd iteration in Python follows this list.
    • EM was originally introduced by Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, 39, 1-38.
    • Spectral clustering: an excellent introduction is U. von Luxburg, "A Tutorial on Spectral Clustering," Statistics and Computing 17(4), 395-416 (2007).
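A minimal sketch of the Lloyd iteration for k-means, assuming only numpy; it uses random initialization and omits refinements such as k-means++ seeding and empty-cluster handling.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random data points
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels

# Toy usage: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(np.round(centers, 1))  # approximately [0, 0] and [4, 4]
```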

Dimensionality reduction

    • Principal components analysis: The book by Mardia, Kent and Bibby derives PCA for the "population" case (the sample case being analogous) for both the maximum orthogonal variance perspective and the least squares linear approximation perspective. Note that PCA is also known as the Karhunen-Loève transform (KLT). A minimal SVD-based PCA sketch in Python appears after this list.
    • Multidimensional scaling: The book by Mardia, Kent and Bibby has a clean and rigorous derivation of classical MDS, associated optimality properties, and connections to PCA. It also discusses nonmetric MDS methods.
    • The "majorization" approach to metric MDS via stress minimization is reviewed and analyzed by Jan de Leeuw, "Convergence of the Majorization Method for Multidimensional Scaling," Joumal of Classification 5:163-180 (1988)
    • Isomap
    • Local linear embedding (LLE)
    • Laplacian eigenmaps
    • Kernel PCA is covered in the book by Schölkopf and Smola, or see the original paper referenced therein.
    • Manifold learning resource page
    • Self-organizing maps, principal curves, and independent component analysis (ICA) may be reviewed in Hastie et al.
    • Factor analysis is treated in Mardia et al.
    • Guyon and Elisseeff, An Introduction to Variable and Feature Selection, an excellent survey of and introduction to methods of variable selection, which appeared in Journal of Machine Learning Research 3 (2003) 1157-1182.
    • The following article describes extensive simulations for various learning algorithms combined with different feature selection methods, and offers some good intuition: Hua, J., Xiong, Z., Lowey, J., Suh, E., and E. R. Dougherty, Optimal Number of Features as a Function of Sample Size for Various Classification Rules, Bioinformatics, 21, No. 8, 1509-1515, 2005.
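The PCA sketch promised above, assuming only numpy. It computes principal directions from the SVD of the centered data matrix, which is equivalent to an eigendecomposition of the sample covariance.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal directions (rows)
    scores = Xc @ components.T               # coordinates in the new basis
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return scores, components, explained_var

# Toy usage: 3-D data that is essentially 2-D plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3))
X += 0.01 * rng.normal(size=(200, 3))
scores, components, var = pca(X, n_components=3)
print(np.round(var, 3))  # the third variance is tiny: the data is nearly planar
```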

Nonlinear regression and Gaussian Processes