Monday 8:00-12:00, Room A2, DIAG, via Ariosto 25
Wednesday 10:00-12:00, Room A7, DIAG, via Ariosto 25
Final term exam: 27th May 2024
Classroom code of the course: t4gg7io
Link to the classroom course: https://classroom.google.com/c/NjY0MTY4MjUyNTAy?cjc=t4gg7io
All the slides and notes can be found on the classroom webpage.
26/02/2024 (4 hours) Introduction to the course. Machine learning and Optimization. Data Analytics. Applications of supervised learning. Empirical risk vs Expected risk. Overfitting and Underfitting. Tradeoff between bias and variance. Linear Regression and its extensions. Subset selection and Regularization techniques.
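As a small illustration of the regularization techniques mentioned in this lecture, the following Python sketch compares ordinary least squares with ridge regression; the synthetic data, the choice alpha=1.0, and the use of scikit-learn are illustrative assumptions, not part of the lecture material.

    # Illustrative sketch: linear regression vs. ridge (L2-regularized) regression.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))            # 50 samples, 10 features (synthetic data)
    w_true = np.zeros(10)
    w_true[:3] = [2.0, -1.0, 0.5]            # only 3 features are truly relevant
    y = X @ w_true + 0.1 * rng.normal(size=50)

    ols = LinearRegression().fit(X, y)       # minimizes the empirical risk only
    ridge = Ridge(alpha=1.0).fit(X, y)       # adds an L2 penalty on the coefficients

    print("OLS coefficients:  ", np.round(ols.coef_, 2))
    print("Ridge coefficients:", np.round(ridge.coef_, 2))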
28/02/2024 (2 hours) Vapnik theory (Vapnik–Chervonenkis bound). VC confidence and VC dimension. Structural risk minimization.
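For reference, the Vapnik–Chervonenkis bound discussed in this lecture can be stated in its standard textbook form (the notation on the slides may differ slightly): with probability at least 1 − η over a training sample of size N, every hypothesis f from a class of VC dimension h satisfies

    R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) + \ln\frac{4}{\eta}}{N}}

where R is the expected risk, R_emp the empirical risk, and the square-root term is the VC confidence.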
04/03/2024 (4 hours) Introduction to Neural Networks. Neurons and biological motivation. Linear threshold units. The Perceptron algorithm with convergence proof. Classification of linearly separable patterns. Limits of the perceptron. Hyperparameters and parameter optimization. Feedforward Neural Networks. The training optimization problem: unconstrained nonconvex optimization. Definition of global minimum.
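A minimal sketch of the perceptron update rule covered in this lecture; the fixed number of epochs and the stopping test are illustrative choices, not the exact algorithm from the slides.

    # Illustrative sketch: the perceptron algorithm for linearly separable patterns.
    import numpy as np

    def perceptron(X, y, max_epochs=100):
        # X: (n_samples, n_features), y: labels in {-1, +1}
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * (w @ xi + b) <= 0:   # misclassified pattern
                    w += yi * xi             # move the separating hyperplane towards it
                    b += yi
                    mistakes += 1
            if mistakes == 0:                # all patterns correctly classified
                break
        return w, b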
06/03/2024 (2 hours) Definition of Local Minimum. Convex optimization and its properties. Existence conditions: Weierstrass theorem. Level sets of a function. Coercive functions.
11/03/2024 (4 hours) Gradient and Hessian of a function. Taylor expansions of 1st and 2nd order and their use in optimization. Examples. Exercises on coercive functions. Descent directions and their use.
13/03/2024 (2 hours) Optimality conditions for unconstrained optimization. Special cases: convex case, quadratic case. Examples.
18/03/2024 (4 hours) Algorithms for unconstrained optimization: general scheme, convergence properties. Examples of converging sequences. Gradient method. Exact line search. Fixed-stepsize gradient method: convergence theorem (the proof is not included in the program). Exercises.
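A possible implementation of the fixed-stepsize gradient method seen in this lecture; the quadratic test function, the stepsize, and the tolerance are illustrative assumptions.

    # Illustrative sketch: gradient method with a fixed stepsize.
    import numpy as np

    def gradient_method(grad, x0, alpha=0.1, tol=1e-6, max_iter=10_000):
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:     # stopping rule based on stationarity
                break
            x = x - alpha * g                # step along the steepest-descent direction
        return x, k

    # Example: minimize the strictly convex quadratic f(x) = 0.5 x^T Q x - b^T x
    Q = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    x_star, iters = gradient_method(lambda x: Q @ x - b, x0=[0.0, 0.0], alpha=0.1)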
20/03/2024 (2 hours) Multi-layer perceptron networks: choice of the architecture and training. Optimization viewpoint. Batch gradient method for MLP training.
25/03/2024 (4 hours) Backpropagation formulas. Introduction to online methods. Online gradient methods: incremental gradient and stochastic gradient. Confusion region and practical behavior. Practical lesson: Python libraries and implementation of the gradient method for minimizing a nonlinear function.
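A hedged sketch of the incremental/stochastic gradient idea from this lecture, applied to a least-squares loss; the synthetic data, batch size, and diminishing-stepsize rule are illustrative assumptions.

    # Illustrative sketch: mini-batch stochastic gradient on a least-squares objective.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.05 * rng.normal(size=200)

    w = np.zeros(5)
    batch_size = 16
    for epoch in range(50):
        alpha = 0.1 / (1 + epoch)                   # diminishing stepsize
        perm = rng.permutation(len(y))              # reshuffle the samples each epoch
        for start in range(0, len(y), batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)  # mini-batch gradient of the MSE term
            w -= alpha * grad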
03/04/2024 (2 hours) Convergence of online methods: assumptions needed for proving convergence of IG and SGD. Practical choices: mini-batch size, early stopping, different implementations. Acceleration strategies: Nesterov momentum, AdaGrad, RMSProp, Adam. An application of MLP networks: representation of gas separation through membranes using ML.
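To make the acceleration strategies concrete, here is a hedged sketch of a single heavy-ball/Nesterov-style momentum update; the coefficients are typical illustrative values, not those used in class.

    # Illustrative sketch: one gradient step with (Nesterov) momentum.
    import numpy as np

    def momentum_step(w, v, grad_fn, alpha=0.01, beta=0.9, nesterov=True):
        # v is the velocity, i.e. a running combination of past gradients
        lookahead = w + beta * v if nesterov else w
        g = grad_fn(lookahead)          # gradient at the (possibly look-ahead) point
        v_new = beta * v - alpha * g
        w_new = w + v_new
        return w_new, v_new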
08/04/2024 MidTerm Exam (4 hours)
10/04/2024 (2 hours) SVM training problem. Necessary optimality conditions for constrained optimization: the geometric point of view.
15/04/2024 (4 hours) Constrained Optimization: more on optimality conditions. Special cases: linear constraints, examples. KKT and Fritz John conditions. Hints on the conditional gradient method.
17/04/2024 (2 hours) SVM problem: reformulation; existence and uniqueness of the solution.
22/04/2024 (4 hours) Wolfe duality theory. General case and quadratic optimization case. Examples.
24/04/2024 (2 hours) Dual of the SVM problem. Soft margin SVMs.
06/05/2024 (4 hours) Dual of the soft-margin L1 SVM, exercises. Kernel trick and nonlinear SVMs. Examples of kernels.
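A hedged sketch of how the kernel trick from this lecture is typically used in practice, via scikit-learn's SVC; the dataset and the hyperparameters C and gamma are illustrative assumptions.

    # Illustrative sketch: nonlinear (RBF-kernel) soft-margin SVM with scikit-learn.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)   # not linearly separable
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, gamma=1.0)   # Gaussian kernel replaces an explicit feature map
    clf.fit(X_tr, y_tr)
    print("test accuracy:", clf.score(X_te, y_te))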
08/05/2024 (2 hours) Optimality conditions for the dual. Introduction to decomposition methods.
13/05/2024 (4 hours) SMO Maximum Violating Pair algorithm. Practical lesson on SVMs. Description of the final project.
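A hedged sketch of the maximal-violating-pair (MVP) working-set selection discussed here, for the dual SVM problem min ½ αᵀQα − eᵀα subject to yᵀα = 0 and 0 ≤ α ≤ C; the notation follows the standard literature and may differ from the slides, and both index sets are assumed nonempty.

    # Illustrative sketch: maximal violating pair selection for the SVM dual.
    import numpy as np

    def select_mvp(alpha, grad, y, C, tol=1e-3):
        # grad is the dual gradient Q @ alpha - e; I_up / I_low come from the KKT conditions
        I_up  = np.where(((alpha < C) & (y == +1)) | ((alpha > 0) & (y == -1)))[0]
        I_low = np.where(((alpha < C) & (y == -1)) | ((alpha > 0) & (y == +1)))[0]
        scores = -y * grad                       # -y_i * grad_i for every index i
        i = I_up[np.argmax(scores[I_up])]        # most violating index in I_up
        j = I_low[np.argmin(scores[I_low])]      # most violating index in I_low
        if scores[i] - scores[j] <= tol:         # approximate KKT conditions hold
            return None                          # stop: alpha is (tol-)optimal
        return i, j                              # working set for the next SMO step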
15/05/2024 (2 hours) Final details on SVMs. Hints on nu-classification. Exercises.
20/05/2024 (4 hours) Exercises. Hints on semi-supervised SVMs and on sum-of-squares (SOS) clustering.
27/05/2024 (4 hours) Final term
Introduction:
Definition of learning systems.
Goals and applications of machine learning (classification and regression).
Basics of statistical learning theory (the Vapnik–Chervonenkis bound). Definition of the learning optimization problem.
Trade-off between complexity and errors: underfitting and overfitting.
Hyperparameters and parameter optimization.
Training set, test set, validation set.
The k-fold cross-validation procedure (a short code sketch follows at the end of this introduction).
Neurons and biological motivation. Linear threshold units. The Perceptron algorithm. Classification of linearly separable patterns. Limits of the perceptron.
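A brief sketch of the k-fold cross-validation procedure listed above; the estimator, the dataset, and k = 5 are illustrative assumptions.

    # Illustrative sketch: k-fold cross-validation to estimate generalization performance.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = SVC(kernel="rbf", C=1.0).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))   # accuracy on the held-out fold
    print("mean CV accuracy:", np.mean(scores))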
Part 1: Neural Networks.
Multi-Layer Feedforward Neural Networks. Shallow and deep networks. The training optimization problem: unconstrained nonconvex optimization.
Necessary optimality conditions: stationary points. Iterative descent methods, stopping conditions.
Gradient method: basics. Backpropagation (BP) algorithm for gradient evaluation.
Batch and online (or mini-batch) gradient methods. BP batch version (gradient with constant stepsize): theorem of convergence and choice of the learning rate. BP on-line (mini-batch) version: stochastic/incremental gradient with diminishing stepsize. Basics of convergence.
Momentum term.
Decomposition methods for shallow networks: extreme learning and two-block decomposition.
Early stopping technique.
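A hedged sketch of the early-stopping technique listed above, monitoring a validation loss during gradient training; the patience value is an illustrative choice, and training_epoch / validation_loss are placeholder callables supplied by the user.

    # Illustrative sketch: early stopping driven by a validation-loss criterion.
    import numpy as np

    def train_with_early_stopping(w0, training_epoch, validation_loss,
                                  max_epochs=200, patience=10):
        # training_epoch(w) -> weights after one epoch of (mini-batch) gradient updates
        # validation_loss(w) -> loss on the held-out validation set
        w = np.asarray(w0, dtype=float)
        best_w, best_val, wait = w.copy(), float("inf"), 0
        for _ in range(max_epochs):
            w = training_epoch(w)
            val = validation_loss(w)
            if val < best_val:                   # improvement on the validation set
                best_val, best_w, wait = val, w.copy(), 0
            else:
                wait += 1
                if wait >= patience:             # stop before overfitting sets in
                    break
        return best_w                            # weights with the best validation loss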
Part 2: Support Vector Machines (Kernel methods)
Hard and Soft Maximum Margin Classifiers using Linear functions.
The convex constrained optimization problem of the soft/hard linear SVM.
Primal Quadratic Programming: the KKT optimality conditions, feasible and descent iterative methods.
Dual formulation of the primal QP problem. Wolfe duality theory for QP. KKT conditions.
Nonlinear SVMs via the use of kernels. The dual QP formulation of nonlinear SVM.
Frank–Wolfe method: basics. Decomposition methods: SMO-type algorithms, MVP algorithm, SVMlight, cyclic methods. Convergence theory.
Multiclass SVM problems: one-against-one and one-against-all.
Unsupervised and semi-supervised learning: Minimum Sum of Squares Clustering and semi-supervised SVMs (S3SVM).
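Since minimum-sum-of-squares clustering is only hinted at in the course, the following sketch shows the classical Lloyd (k-means) heuristic for it; the random initialization and the value of k are illustrative assumptions.

    # Illustrative sketch: Lloyd's k-means heuristic for minimum-sum-of-squares clustering.
    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centers
        for _ in range(n_iter):
            # assignment step: each point goes to its nearest center
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: each center becomes the mean of its assigned points
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):                # no change: local optimum
                break
            centers = new_centers
        return centers, labels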
Practical use of learning algorithms.
Comparing learning algorithms from the optimization point of view.
Comparing learning algorithms from the learning point of view.
The exam will consist of one practical project and a written exam on the theory (possibly split into two parts, with a mid-term exam).