OMDS - Optimization Methods for Data Science
Academic year 2022-2023
Timetable:
Monday 8:30-12:00, Room A2, DIAG, via Ariosto 25
Thursday 13:00-15:00, Room A7, DIAG, via Ariosto 25
Midterm exam: 03/04/2023 at 9:00, Room A2
Classroom code of the course: 2blsla3
Link to the classroom course: https://classroom.google.com/c/NTkzNTk5Njc1ODEy?cjc=2blsla3
All slides and notes can be found on the Classroom course page.
Lesson diary:
20/02/2023 (4 hours) Introduction to the course. Machine learning and optimization. Data analytics. Applications of supervised learning. Empirical risk vs. expected risk. Overfitting and underfitting. The tradeoff between bias and variance. Linear regression and its extensions. Subset selection and regularization techniques.
23/02/2023 (2 hours) Vapnik's theory (the Vapnik-Chervonenkis bound). VC confidence and VC dimension. Structural risk minimization.
27/02/2023 (4 hours) Introduction to neural networks. Neurons and biological motivation. Linear threshold units. The perceptron algorithm with convergence proof. Classification of linearly separable patterns. Limits of the perceptron. Hyperparameters and parameter optimization. Feedforward neural networks. The training optimization problem: unconstrained nonconvex optimization.
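A minimal sketch of the perceptron update covered in this lesson, assuming labels in {-1, +1} and the bias absorbed into the weight vector; the toy data at the end is purely illustrative.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron algorithm for labels y in {-1, +1}.

    The bias is absorbed by appending a constant 1 to each pattern.
    On linearly separable data the loop stops after a finite number
    of updates (Novikoff's convergence theorem).
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:      # misclassified (or on the boundary)
                w += yi * xi            # move the hyperplane toward xi
                mistakes += 1
        if mistakes == 0:               # all patterns correctly classified
            break
    return w

# Illustrative usage on a toy separable set (AND-like labels):
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w = perceptron(X, y)
```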
2/03/2023 (2 hours) Convex functions and convex sets and their properties. Theorem on the properties of convex functions: any local minimum is a global minimum (unique if f is strictly convex) and the set of global minima is a convex set. Gradient and Hessian of a function. First- and second-order Taylor expansions and their use in optimization.
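For reference, the two expansions used throughout the course, written for a twice continuously differentiable f:

```latex
f(x + d) = f(x) + \nabla f(x)^\top d + o(\|d\|)
f(x + d) = f(x) + \nabla f(x)^\top d + \tfrac{1}{2}\, d^\top \nabla^2 f(x)\, d + o(\|d\|^2)
```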
06/03/2023 (4 hours) Existence conditions: the Weierstrass theorem. Level sets of a function. Coercive functions. Examples. Descent directions and negative curvature directions. Optimality conditions for unconstrained optimization: first- and second-order necessary optimality conditions. The convex case. Examples. The quadratic case.
13/03/2023 (4 hours) Algorithms for unconstrained optimization: general scheme, convergence properties. Examples of converging sequences. The gradient method. Exact line search. The fixed-stepsize gradient method: convergence theorem (the proof is not included in the program). Exercises.
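A minimal sketch of the fixed-stepsize gradient method from this lesson; the stopping rule, the stepsize value, and the quadratic test problem are illustrative choices (for an L-smooth objective, the convergence theorem requires 0 < alpha < 2/L).

```python
import numpy as np

def gradient_method(grad, x0, alpha=0.1, tol=1e-6, max_iter=10000):
    """Gradient method with fixed stepsize alpha."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # stationarity-based stopping rule
            break
        x = x - alpha * g              # step along the antigradient
    return x

# Illustrative quadratic: f(x) = 0.5 x^T Q x - b^T x, so grad f(x) = Q x - b.
# Here L = 3 (largest eigenvalue of Q), and alpha = 0.2 < 2/3.
Q = np.array([[3., 0.], [0., 1.]])
b = np.array([1., 1.])
x_star = gradient_method(lambda x: Q @ x - b, x0=np.zeros(2), alpha=0.2)
```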
16/03/2023 (2 hours) Multilayer perceptron (MLP) networks: choice of the architecture and training. The optimization viewpoint.
20/03/2023 (4 hours) The batch gradient method for MLP training. Backpropagation formulas. Online gradient methods: incremental gradient and stochastic gradient. The confusion region and practical behavior.
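A compact sketch of batch gradient training for a one-hidden-layer MLP with tanh activations and squared-error loss; the architecture, data, and learning rate are illustrative, but the gradient computation follows the backpropagation formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer MLP for regression, squared-error loss.
# Shapes (illustrative): X is (n, d), W1 is (d, h), W2 is (h, 1).
X = rng.normal(size=(100, 2))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(100, 1))
W1 = rng.normal(size=(2, 10)) * 0.5
W2 = rng.normal(size=(10, 1)) * 0.5

alpha = 0.05
for epoch in range(500):
    # Forward pass.
    A = np.tanh(X @ W1)                 # hidden activations
    out = A @ W2                        # network output
    err = out - y                       # residuals
    # Backward pass (batch gradient via the backpropagation formulas).
    gW2 = A.T @ err / len(X)
    delta = (err @ W2.T) * (1 - A**2)   # backpropagate through tanh
    gW1 = X.T @ delta / len(X)
    # Batch gradient step with constant stepsize.
    W1 -= alpha * gW1
    W2 -= alpha * gW2
```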
23/03/2023 (2 hours) Practical lesson: Python libraries and implementation of the gradient method for minimizing a nonlinear function.
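As a library counterpart to the hand-written gradient method, a short example using SciPy's minimize on the Rosenbrock test function; the choice of BFGS and of the test function is illustrative.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Minimize a standard nonlinear test function with a library routine,
# passing the analytic gradient (jac) to avoid finite differences.
res = minimize(rosen, x0=np.zeros(5), jac=rosen_der, method="BFGS")
print(res.x, res.fun)
```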
27/03/2023 (4 hours) Convergence of online methods: assumptions needed to prove convergence of IG and SGD. Practical choices: mini-batch size, early stopping, different implementations. Acceleration strategies: Nesterov momentum, AdaGrad, RMSProp, Adam. An application of MLP networks: representing gas separation through membranes using ML.
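A sketch of a mini-batch stochastic gradient loop with Nesterov momentum and a diminishing stepsize of the kind the convergence assumptions require; the signature grad(x, batch) and the rule alpha_k = alpha0/(1+k) are illustrative choices, not the only admissible ones.

```python
import numpy as np

def sgd_nesterov(grad, x0, data, alpha0=0.1, beta=0.9,
                 batch_size=16, epochs=50, rng=None):
    """Mini-batch SGD with Nesterov momentum and diminishing stepsize.

    grad(x, batch) returns the gradient of the loss on a mini-batch;
    data is an array of samples that can be indexed by a permutation.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    n = len(data)
    for k in range(epochs):
        alpha = alpha0 / (1 + k)            # diminishing stepsize
        idx = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = data[idx[start:start + batch_size]]
            g = grad(x + beta * v, batch)   # look-ahead gradient (Nesterov)
            v = beta * v - alpha * g
            x = x + v
    return x
```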
30/03/2023 (2 hours) RBF networks: theoretical properties and training algorithms.
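A minimal sketch of a two-phase training scheme for generalized RBF networks: centers selected unsupervised (here, by random sampling from the data) and output weights by regularized least squares; the kernel width, regularization, and center-selection rule are illustrative.

```python
import numpy as np

def train_rbf(X, y, n_centers=10, sigma=1.0, lam=1e-3, rng=None):
    """Generalized RBF network with Gaussian units.

    Phase 1: pick centers unsupervised (random subsample of the data).
    Phase 2: solve for output weights by regularized least squares.
    """
    rng = rng or np.random.default_rng(0)
    C = X[rng.choice(len(X), n_centers, replace=False)]   # centers
    # Design matrix Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2)).
    D2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-D2 / (2 * sigma**2))
    # Regularized normal equations for the output weights.
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_centers), Phi.T @ y)
    return C, w
```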
03/04/2023 (3 hours) Midterm exam.
13/04/2023 (2 hours) Practical lesson on unconstrained optimization.
17/04/2023 (4 hours) Definition of the training problem for SVMs. Constrained optimization: optimality conditions. Examples.
20/04/2023 (2 hours) Hints on the conditional gradient (Frank-Wolfe) method. Reformulation of the SVM problem.
27/04/2023 (2 hours) The SVM problem: existence and uniqueness of solutions, and optimality conditions. Introduction to Wolfe duality.
05/05/2023 (2 hours) Wolfe duality in the case of quadratic optimization.
08/05/2023 (4 hours) Application of Wolfe duality to SVMs: hard-margin and soft-margin SVMs. Exercises.
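For reference, the soft-margin linear SVM primal and the Wolfe dual derived in these lessons:

```latex
% Soft-margin linear SVM (primal):
\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0.

% Wolfe dual (a concave QP in the multipliers alpha):
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C,

% with the primal solution recovered as
w = \sum_{i=1}^{n}\alpha_i y_i x_i .
```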
11/05/2023 (2 hours) Nonlinear SVMs and the kernel trick. Examples.
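A short sketch of the kernel trick: the Gram matrix of a Gaussian kernel replaces the matrix of inner products in the dual, and the decision function needs only kernel evaluations (here alpha and b are assumed to come from a trained model; gamma is an illustrative width parameter).

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=0.5):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma ||x - z||^2).

    In the dual QP, this Gram matrix replaces the matrix of inner
    products x_i^T x_j: that substitution is the kernel trick.
    """
    D2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * D2)

def decision_function(alpha, y, X_train, X_test, b, gamma=0.5):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b on the test points."""
    K = gaussian_kernel(X_train, X_test, gamma)
    return (alpha * y) @ K + b
```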
15/05/2023 (4 hours) Optimality conditions for the dual problem. Decomposition methods: the SMO algorithm with most-violating-pair (MVP) selection.
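A sketch of most-violating-pair working-set selection for the dual QP, min f(alpha) s.t. y^T alpha = 0 and 0 <= alpha <= C; the tolerance and the dense index scan are illustrative simplifications, and both index sets are assumed nonempty.

```python
import numpy as np

def most_violating_pair(alpha, grad, y, C, tol=1e-6):
    """MVP selection for the SVM dual: return the pair (i, j) that most
    violates the KKT conditions, or None if the iterate is tol-optimal."""
    # Index sets of variables that can move up / down without leaving [0, C].
    I_up = [t for t in range(len(alpha))
            if (alpha[t] < C and y[t] == 1) or (alpha[t] > 0 and y[t] == -1)]
    I_low = [t for t in range(len(alpha))
             if (alpha[t] < C and y[t] == -1) or (alpha[t] > 0 and y[t] == 1)]
    F = -y * grad                       # quantities -y_t * grad_t f(alpha)
    i = max(I_up, key=lambda t: F[t])   # m(alpha) = max over I_up
    j = min(I_low, key=lambda t: F[t])  # M(alpha) = min over I_low
    if F[i] - F[j] <= tol:              # KKT conditions hold up to tol
        return None
    return i, j
```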
18/05/2023 (2 hours) Practical lesson on SVMs
22/05/2023 (4 hours) Summary of SVMs, nu-classification, the effect of the bias term, SVMs for regression, exercises.
25/05/2023 (2 hours) Final term exam
29/05/2023 (4 hours) Optional lesson on TensorFlow.
Syllabus of the course:
Introduction:
Definition of learning systems.
Goals and applications of machine learning (classification and regression).
Basics of statistical learning theory (the Vapnik-Chervonenkis bound). Definition of the learning optimization problem.
Trade-off between complexity and error: underfitting and overfitting.
Hyperparameters and parameter optimization.
Training set, test set, validation set.
The k-fold cross-validation procedure (see the sketch after this list).
Neurons and biological motivation. Linear threshold units. The perceptron algorithm. Classification of linearly separable patterns. Limits of the perceptron.
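A minimal sketch of the k-fold cross-validation procedure referenced above; fit and score are hypothetical callables standing in for any training and evaluation routine.

```python
import numpy as np

def k_fold_indices(n, k=5, rng=None):
    """Split indices 0..n-1 into k disjoint validation folds."""
    rng = rng or np.random.default_rng(0)
    return np.array_split(rng.permutation(n), k)

def cross_validate(fit, score, X, y, k=5):
    """Average validation score over k folds.

    fit(X, y) returns a model; score(model, X, y) returns a scalar.
    Both are placeholder names for the user's own routines.
    """
    scores = []
    for fold in k_fold_indices(len(X), k):
        mask = np.ones(len(X), dtype=bool)
        mask[fold] = False
        model = fit(X[mask], y[mask])                   # train on k-1 folds
        scores.append(score(model, X[fold], y[fold]))   # validate on held-out fold
    return float(np.mean(scores))
```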
Part 1: Neural Networks.
Multi-layer feedforward neural networks. Shallow and deep networks. The training optimization problem: unconstrained nonconvex optimization.
Necessary optimality conditions: stationary points. Iterative descent methods, stopping conditions.
Gradient method: basics. Backpropagation (BP) algorithm for gradient evaluation.
Batch and online (or mini-batch) gradient methods. The batch BP version (gradient with constant stepsize): convergence theorem and choice of the learning rate. The online (mini-batch) BP version: stochastic/incremental gradient with diminishing stepsize. Basics of convergence.
Momentum term.
Decomposition methods for shallow networks: extreme learning and two blocks decomposition.
Radial-basis-function (RBF) networks: regularized and generalized RBF networks. Their use in interpolation and approximation. Unsupervised selection of centers. Supervised selection of weights and centers: decomposition methods into two blocks and decomposition methods into more blocks. Basic convergence theory of decomposition methods.
The early stopping technique.
Part 2: Support Vector Machines (Kernel methods)
Hard and Soft Maximum Margin Classifiers using Linear functions.
The convex constrained optimization problem of the soft/hard-margin linear SVM.
Primal quadratic programming: the KKT optimality conditions; feasible and descent iterative methods.
Dual formulation of the primal QP problem. Wolfe duality theory for QP. KKT conditions.
Nonlinear SVMs and the use of kernels. The dual QP formulation of nonlinear SVMs.
The Frank-Wolfe method: basics. Decomposition methods: SMO-type algorithms, the MVP algorithm, SVMlight, cyclic methods. Convergence theory.
Multiclass SVM problems: one-against-one and one-against-all (see the sketch below).
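A sketch of the one-against-all scheme: one binary SVM per class, with prediction by the largest decision value; fit_binary and decision_function are placeholder names for any binary trainer and its real-valued classifier output.

```python
import numpy as np

def one_against_all(fit_binary, X, y):
    """Train one binary classifier per class (one-against-all).

    fit_binary(X, y_pm) takes labels in {-1, +1} and returns a model
    exposing a real-valued decision_function; both names are hypothetical.
    """
    classes = np.unique(y)
    models = {c: fit_binary(X, np.where(y == c, 1, -1)) for c in classes}

    def predict(X_new):
        # Pick the class whose binary classifier is most confident.
        scores = np.stack([models[c].decision_function(X_new) for c in classes])
        return classes[np.argmax(scores, axis=0)]

    return predict
```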
Practical use of learning algorithms:
Comparing learning algorithms from the optimization point of view.
Comparing learning algorithms from the learning point of view.
Exams:
The exam consists of one practical project and a written exam on the theory (possibly split into two parts, with a midterm exam).