Monday 8:00-12:00, Room A2, DIAG, via Ariosto 25
Wednesday 11:00-13:00, Room A4, DIAG, via Ariosto 25
Classroom code of the course: b2m76o2
Link to the classroom course: https://classroom.google.com/c/MjEzMTcyMzk5NDFa?cjc=b2m76o2
All the slides and notes can be found on the classroom webpage.
26/02/2025 (2h) Introduction to the course. Some interesting applications. Survey to assess the background of the students
03/03/2025 (4h) Applications of supervised learning. Empirical risk vs Expected risk. Overfitting and Underfitting. Tradeoff between bias and variance. Linear Regression and its extensions. Subset selection and Regularization techniques.
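As an illustration of the regularization techniques mentioned in this lecture, the following is a minimal numpy sketch of ridge regression via its closed-form solution; the synthetic data and the value of the regularization weight lam are arbitrary choices for the example, not course material.

```python
import numpy as np

# Synthetic regression data (arbitrary choices for the illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)

lam = 1.0  # regularization weight (a hyperparameter, e.g. tuned by validation)

# Ridge regression: w = (X^T X + lam * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w)
```

Setting lam = 0 recovers ordinary least squares; increasing it shrinks the coefficients, trading variance for bias.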
05/03/2025 (2h) Vapnik's theory (Vapnik-Chervonenkis bound). VC confidence and VC dimension. Structural Risk Minimization.
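For reference, a common statement of the Vapnik-Chervonenkis bound (holding with probability at least 1 - η over a sample of size N, for a hypothesis class of VC dimension h); the exact form and constants may differ from the version presented in class:

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \underbrace{\sqrt{\frac{h\left(\ln\frac{2N}{h}+1\right)-\ln\frac{\eta}{4}}{N}}}_{\text{VC confidence}}
```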
10/03/2025 (4h) Introduction to neural networks. Neurons and biological motivation. Linear threshold units. The Perceptron algorithm with convergence proof. Classification of linearly separable patterns. Limits of the perceptron. Hyperparameters and parameter optimization. Feedforward neural networks. The training optimization problem: unconstrained nonconvex optimization. Definitions of global minimum and local minimum. Convex optimization and its properties.
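A minimal sketch of the Perceptron algorithm listed above, assuming labels in {-1, +1}; variable names and the stopping rule are illustrative choices, not the exact formulation given in class.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron: update w <- w + y_i x_i (and b <- b + y_i) on misclassified points.
    X: (n, d) array of patterns, y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:   # misclassified (or on the separating hyperplane)
                w += y[i] * X[i]
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # all patterns correctly classified: stop
            break
    return w, b
```

When the training set is linearly separable, the classical convergence theorem bounds the number of updates by (R/γ)^2, with R the radius of the data and γ the separation margin; on non-separable data the loop simply stops after max_epochs passes.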
12/03/2025 (2h) Convex optimization and its properties. Existence conditions: Weierstrass theorem. Level sets of a function. Coercive functions.
17/03/2025 (4h) Gradient and Hessian of a function. Taylor expansion of 1st and 2nd order and its use in optimization. Examples. Exercises on coercive functions.
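For reference, the first- and second-order Taylor expansions used in these lectures, in the standard form:

```latex
f(x + d) = f(x) + \nabla f(x)^{\top} d + o(\|d\|),
\qquad
f(x + d) = f(x) + \nabla f(x)^{\top} d + \tfrac{1}{2}\, d^{\top} \nabla^{2} f(x)\, d + o(\|d\|^{2}).
```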
19/03/2025 (2h) Descent directions and their use. Optimality conditions for unconstrained optimization. Special cases: convex case, quadratic case. Examples.
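A compact summary of the notions in this lecture (standard definitions and conditions; in the convex case stationarity is also sufficient for global optimality):

```latex
\nabla f(x)^{\top} d < 0 \;\Longrightarrow\; d \ \text{is a descent direction at } x,
\qquad
\text{first-order necessary condition: } \nabla f(x^{*}) = 0,
\qquad
f \ \text{convex:}\ \ \nabla f(x^{*}) = 0 \;\Longleftrightarrow\; x^{*} \ \text{is a global minimum}.
```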
24/03/2025 (4h) Algorithms for unconstrained optimization: general scheme, convergence properties. Examples of converging sequences. Gradient method. Exact linesearch. Fixed stepsize gradient method: convergence theorem (the proof is not included in the program). Exercises.
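A minimal sketch of the fixed-stepsize gradient method covered in this lecture, applied to a convex quadratic; the test function, stepsize and tolerance are illustrative choices.

```python
import numpy as np

def gradient_method(grad, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Fixed-stepsize gradient method: x_{k+1} = x_k - alpha * grad(x_k),
    stopped when the gradient norm falls below tol."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        x = x - alpha * g
    return x, k

# Example: minimize f(x) = 0.5 * x^T Q x - b^T x  (convex quadratic)
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star, iters = gradient_method(lambda x: Q @ x - b, x0=np.zeros(2), alpha=0.1)
print(x_star, iters)
```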
26/03/2025 (2h) Multi-Layer Perceptron (MLP) networks: choice of the architecture and training. Optimization viewpoint. Batch gradient method for MLP training.
31/03/2025 (4h) Backpropagation formulas. Introduction to online methods. Online gradient method: incremental gradient and stochastic gradient. Confusion region and practical behavior. Practical lesson: Python libraries and implementation of the gradient method for minimizing a nonlinear function.
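A minimal numpy sketch tying together the last two lectures: batch gradient training of a one-hidden-layer MLP, with the gradients computed by backpropagation. The architecture, activation, loss and learning rate are arbitrary choices for the illustration, not the ones fixed in the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))

# One hidden layer with tanh activation, linear output, squared loss
n_hidden = 10
W1 = rng.normal(scale=0.5, size=(2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

lr = 0.05
for epoch in range(500):
    # Forward pass
    A = np.tanh(X @ W1 + b1)          # hidden activations
    out = A @ W2 + b2                 # network output
    err = out - y                     # residuals
    loss = 0.5 * np.mean(err ** 2)

    # Backpropagation (gradient of the batch mean squared loss)
    n = X.shape[0]
    dout = err / n                    # dL/dout
    dW2 = A.T @ dout; db2 = dout.sum(axis=0)
    dA = dout @ W2.T
    dZ = dA * (1 - A ** 2)            # derivative of tanh
    dW1 = X.T @ dZ; db1 = dZ.sum(axis=0)

    # Batch gradient step with fixed learning rate
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final training loss:", loss)
```

The online (stochastic/incremental) variant discussed in class would replace the full-batch gradient with the gradient computed on a single sample or a mini-batch, typically with a diminishing stepsize.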
07/04/2025 (3h) Exercises to prepare for the midterm exam.
09/04/2025 (2h) Introduction to SVMs: training problem definition. Introduction to constrained optimization: feasible directions and geometric optimality conditions when the feasible set is convex.
14/04/2025 (4h) Midterm exam
16/04/2025 (2h) Constrained optimization: more on optimality conditions. Special cases: linear constraints, examples. KKT conditions.
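For reference, the KKT conditions for a problem of the form min f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0, in the standard statement with multipliers λ_i and μ_j:

```latex
\nabla f(x^{*}) + \sum_{i} \lambda_{i} \nabla g_{i}(x^{*}) + \sum_{j} \mu_{j} \nabla h_{j}(x^{*}) = 0,
\qquad
g_{i}(x^{*}) \le 0,\ \ h_{j}(x^{*}) = 0,\ \ \lambda_{i} \ge 0,\ \ \lambda_{i}\, g_{i}(x^{*}) = 0 \ \ \forall i.
```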
28/04/2025 (4h) Practical lesson
30/04/2025 (2h) Project description
05/05/2025 (4h) Examples of applications of the KKT conditions. Conditional gradient method. Reformulation of the SVM training problem. Application of the KKT conditions to the SVM training problem. Example. Introduction to duality: weak and strong duality. Wolfe dual of a convex problem. Example.
07/05/2025 (2h) Wolfe duality theory: quadratic programming case. Example.
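For reference, the Wolfe dual discussed in these two lectures, for a convex problem min f(x) s.t. g(x) ≤ 0 with Lagrangian L(x, λ) = f(x) + λᵀg(x):

```latex
\max_{x,\lambda}\ L(x,\lambda) = f(x) + \lambda^{\top} g(x)
\quad \text{s.t.} \quad \nabla_{x} L(x,\lambda) = 0,\ \ \lambda \ge 0.
```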
12/05/2025 (4h) Derivation of the Wolfe dual for hard-margin SVMs. L1 soft-margin SVMs and derivation of their dual. Examples. Nonlinear SVMs: kernel trick, properties of kernel functions, polynomial kernel, Gaussian kernel. Examples.
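The resulting dual of the L1 soft-margin SVM with a kernel K, in the standard form (C is the regularization hyperparameter; the hard-margin dual is obtained by dropping the upper bound α_i ≤ C):

```latex
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_{i} - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_{i}\alpha_{j}\, y_{i} y_{j}\, K(x_{i}, x_{j})
\quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_{i} y_{i} = 0,\ \ 0 \le \alpha_{i} \le C \ \ \forall i.
```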
14/05/2025 (2h) Optimality conditions for the dual SVM problem. Introduction to decomposition methods.
19/05/2025 (4h) SMO with the most violating pairs. Examples and convergence properties. Practical lesson on constrained optimization.
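A sketch of the most-violating-pair selection rule used by this SMO variant, written for the dual in minimization form min f(α); R(α) and S(α) denote the usual "up" and "down" index sets of the decomposition literature (the notation may differ from the one used in class):

```latex
i \in \arg\max_{t \in R(\alpha)} \ -y_{t}\, \nabla f(\alpha)_{t},
\qquad
j \in \arg\min_{t \in S(\alpha)} \ -y_{t}\, \nabla f(\alpha)_{t}.
```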
21/05/2025 (2h) Exercises to prepare for the final term exam
26/05/2025 (4h) Final term exam
Introduction:
Definition of learning systems.
Goals and applications of machine learning (classification and regression).
Basics of statistical learning theory (Vapnik-Chervonenkis bound). Definition of the learning optimization problem.
Trade-off between complexity and errors: underfitting and overfitting.
Hyperparameters and parameter optimization.
Training set, test set, validation set.
The k-fold cross-validation procedure (a sketch follows at the end of this list).
Neurons and biological motivation. Linear threshold units. The Perceptron algorithm. Classification of linearly separable patterns. Limits of the perceptron.
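A minimal sketch of the k-fold cross-validation procedure mentioned in the list above, using ridge regression as an arbitrary stand-in for the learning algorithm being validated (the data, k and lam are illustrative choices, not course prescriptions):

```python
import numpy as np

def k_fold_cv(X, y, k=5, lam=1.0, seed=0):
    """Estimate the test error of ridge regression by k-fold cross-validation:
    split the data into k folds, train on k-1 folds, validate on the held-out one."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        # Ridge regression on the training folds (closed form)
        w = np.linalg.solve(X[tr].T @ X[tr] + lam * np.eye(X.shape[1]), X[tr].T @ y[tr])
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))  # validation MSE on fold i
    return np.mean(errors)

# Example on synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=100)
print(k_fold_cv(X, y, k=5, lam=0.1))
```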
Part 1: Neural Networks.
Multi-Layer Feedforward Neural Networks. Shallow and deep networks. The training optimization problem: unconstrained nonconvex optimization.
Necessary optimality conditions: stationary points. Iterative descent methods, stopping conditions.
Gradient method: basics. Backpropagation (BP) algorithm for gradient evaluation.
Batch and online (or mini-batch) gradient methods. BP batch version (gradient with constant stepsize): theorem of convergence and choice of the learning rate. BP on-line (mini-batch) version: stochastic/incremental gradient with diminishing stepsize. Basics of convergence.
Momentum term.
Decomposition methods for shallow networks: extreme learning and two blocks decomposition.
Early stopping technique.
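A minimal sketch of the early stopping technique listed above: gradient training of an (intentionally overparameterized) linear-in-features model is stopped when the validation error stops improving. The model, data split and patience value are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data split into training and validation sets (illustrative)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X) + 0.2 * rng.normal(size=(60, 1))
Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]

# Linear-in-features model: polynomial features of degree 8 (prone to overfitting)
def feats(X):
    return np.hstack([X ** k for k in range(9)])

Ftr, Fva = feats(Xtr), feats(Xva)
w = np.zeros((Ftr.shape[1], 1))

best_val, best_w, patience, wait, lr = np.inf, w.copy(), 20, 0, 0.05
for epoch in range(5000):
    grad = Ftr.T @ (Ftr @ w - ytr) / len(ytr)     # gradient of the training MSE/2
    w -= lr * grad                                # one gradient step on the training set
    val = np.mean((Fva @ w - yva) ** 2)           # validation error after the step
    if val < best_val - 1e-8:
        best_val, best_w, wait = val, w.copy(), 0
    else:
        wait += 1
        if wait >= patience:                      # no improvement for `patience` epochs: stop
            break
# best_w holds the weights at the smallest validation error
print("stopped at epoch", epoch, "best validation MSE", best_val)
```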
Part 2: Support Vector Machines (Kernel methods)
Hard and Soft Maximum Margin Classifiers using Linear functions.
The convex constrained optimization problem of the soft/hard linear SVM.
Primal Quadratic Programming: the KKT optimality conditions, feasible and descent iterative methods.
Dual formulation of the primal QP problem. Wolfe duality theory for QP. KKT conditions.
Nonlinear SVMs and the use of kernels. The dual QP formulation of nonlinear SVM.
Frank-Wolfe method: basics. Decomposition methods: SMO-type algorithms, MVP algorithm, SVMlight, cyclic methods. Convergence theory.
Multiclass SVM problems: one-against-one and one-against-all.
Unsupervised and semi-supervised learning: Minimum Sum-of-Squares Clustering and Semi-supervised SVMs (S3SVM).
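As an illustration of the minimum sum-of-squares clustering formulation above, a minimal numpy sketch of Lloyd's algorithm (k-means); the number of clusters and the synthetic data are arbitrary choices for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for minimum sum-of-squares clustering:
    alternately assign points to the nearest centroid and recompute centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid in Euclidean distance
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid is the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example on synthetic 2-D data (illustrative)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 2)) for m in ([0, 0], [3, 3], [0, 3])])
centers, labels = kmeans(X, k=3)
print(centers)
```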
Practical use of learning algorithms.
Comparing learning algorithms from the optimization point of view.
Comparing learning algorithms from the learning point of view.
The exam consists of one practical project and a written exam on the theory (possibly split into two parts, with a midterm exam).