Advanced Machine Learning for Physics (PhD 2024)

Course Information and Syllabus

Contacts: Stefano Giagu (stefano.giagu [at] uniroma1.it) and Andrea Ciardiello (andrea.ciardiello [at] gmail.com)

Program:

The general objective of the course is to become familiar with advanced deep learning techniques based on differentiable neural network models with different learning paradigms, to acquire skills in modeling complex problems through deep learning techniques, and to be able to apply them in different contexts in physics and in basic and applied scientific research.

Topics covered include: a general overview of differentiable artificial neural networks and use of the pytorch library for ANN design, training and testing. Basic architectures: MLPs, convolutional neural networks, neural networks for sequence analysis (RNN, LSTM/GRU). Bayesian NNs. Attention, self-attention, Transformers and Vision Transformers. Models for object detection and semantic segmentation, and applications. Graph Neural Networks and Geometric Deep Learning. Generative models based on VAEs, GANs, autoregressive models, invertible networks, diffusion models, normalising flows, and generative GNNs. Advanced learning techniques: transfer learning, domain adaptation, adversarial learning, self-supervised and contrastive learning, model distillation. Explainable and interpretable AI. Quantum Machine Learning on near-term quantum devices.

Approximately 50% of the course consists of lectures supplemented by slides, aimed at providing advanced knowledge of Deep Learning techniques. The remaining 50% consists of hands-on computational sessions that provide some of the practical skills needed to autonomously develop and implement advanced Deep Learning models for solving various problems in physics and scientific research in general.

Essential prerequisites: basic concepts in machine learning, python programming, standard python libraries (numpy, pandas, matplotlib, torch/pytorch)

a basic python course on YT (many others available on web): https://youtu.be/_uQrJ0TkZlc
tutorial on numpy, matplotlib, pandas: https://jakevdp.github.io/PythonDataScienceHandbook/
basic concepts of ML: Introduction + Part I (ch. 5: ML basics) of the book by I. Goodfellow et al.: https://www.deeplearningbook.org/
tutorials on pytorch web site: https://pytorch.org/
an introductory course on pytorch on YT (many others available on web): https://youtu.be/c36lUUr864M

Depending on the requirements of your specific PhD programme, each student can decide how many lectures/hands-on sessions to attend to fulfil the required hours: 20h, 40h, 60h (60h corresponds to the whole course).

Discussion group (telegram group):

https://t.me/+GPu9GM9QRtpiMWNk

Calendar: (in preparation)

Aula Marcello Conversi: Department of Physics, G. Marconi building (1st floor)
Aula Giustina Baroni: Department of Physics, E. Fermi building (2nd floor)
labSS: Department of Physics, G. Marconi building (1st floor)

Lectures2024

Students' e-mail addresses and data for the CINECA HPC system accounts

enter the requested data in the google spreadsheet document available here (by the end of March 2024)

Bibliography/References and detailed topics treated during lectures, slides, notebooks, etc.

Given the highly dynamic nature of the topics covered in the course, there is no single reference text. During the course the sources will be indicated and provided from time to time in the form of scientific and technical articles and book chapters.

Some classic readings on Deep Learning based on differentiable neural networks:

DL: I. Goodfellow, Y. Bengio, A. Courville: Deep Learning, MIT Press (https://www.deeplearningbook.org/)
PB: P. Baldi, Deep Learning in Science, Cambridge University Press
DL2: C. Bishop, Deep Learning, Springer
GRL: W. L. Hamilton, Graph Representation Learning, Morgan & Claypool (https://www.cs.mcgill.ca/~wlh/grl_book/files/GRL_Book.pdf)

lecture 1 - 27.2.2024 (slides, recording) h17:00-19:00
- course information
- ANN 101: (DL ch 6 (6.1, 6.2, 6.3, 6.4, 6.5), BIS ch 5 (5.1, 5.2, 5.3, 5.5), DUDA ch 6 (6.1, 6.2, 6.3, 6.4, 6.5, 6.8))
  - artificial neuron model and MLPs
  - activation functions for hidden and output layers
  - training of an ANN
  - loss functions
  - SGD, momentum, learning rate, variable lr and optimizers
  - learning curves, bias-variance tradeoff and double descent in DNN
  - regularisation
    - dropout
    - early stopping
    - noise injection
    - data augmentation
    - weight regularisation L1, L2, L1+L2
  - a simple example of a shallow MLP implemented in pytorch
lecture 2 - 28.2.2024 (slides, recording) h16:00-18:00
- Convolutional-NN (DL ch 9 (9.1, 9.2, 9.3, 9.4, 9.7, 9.8, 9.9, 9.10, 9.11))
  - image representation and input properties of a CNN (symmetry, translation invariance, self-similarity, compositionality, locality) and learned convolutional filters (DL ch 9 (9.1,9.2, 9.4))
  - local receptive field
  - convolution and shared weights
  - pooling layers
  - CNN architectures: LeNet, AlexNet, VGG, Inception, ResNet, DenseNet, ...
- Analysis of sequences: task definition and problems (DL ch 10 (intro, 10.2))
  - Elementary RNN cell: structure and operating principle
  - StackedRNN, Bidirectional RNN, Encoder-Decoder RNN (seq2seq)
  - Back-propagation through time
  - Long-term correlations and the vanishing and exploding gradient problems: gated cells and Long Short-Term Memory RNNs
  - LSTM: description of operations (note by C.Olah and for details DL ch 10 (10.10))
Hands-on 1 - 5.3.2024 (notebook, recording) h 14:00-16:00
- hands-on: pytorch library usage, an example of auto-grad based optimisation, implementation of a simple CNN to solve a classification task
lecture 3 - 6.3.2024 (slides, recording) h 16:00-18:00
- learning methods:
  - recap of learning paradigms (supervised, unsupervised, reinforcement learning)
  - semi-supervised learning
  - self-supervised learning
    - contrastive learning, SimCLR (DL2 6.3.5)
    - non-contrastive learning, Barlow Twins (https://arxiv.org/pdf/2103.03230.pdf)
  - deep residual learning, denoisers based on residual learning
  - transfer learning and domain adaptation (DL2 6.3.4)
  - knowledge transfer based on knowledge distillation (https://arxiv.org/abs/1503.02531)
hands-on 2 - 12.3.2024 (notebook; recording not available) h: 14:00-16:00
- hands-on: use of skip connections and denoising CNN (DnCNN), exercise description & assignment
lecture 4 - 13.3.2024 (slides, recording) h: 16:00-18:00
- neural architectures for object detection and segmentation (DL2 10.4, 10.5)
  - semantic segmentation, downsampling-upsampling (arXiv:1411.4038, arXiv:1505.04366)
  - object detection, IoU, anchor boxes, non-max suppression
    - region proposals, R-CNN, Fast and Faster R-CNN (arXiv:1311.2524, arXiv:1506.01497)
    - YOLO and SSD models (arXiv:1506.02640, arXiv:1512.02325)
  - Instance segmentation, Mask R-CNN (arXiv:1703.06870)
  - Pose
hands-on 3 - 19.3.2024 (notebook, recording) h: 14:00-16:00
- hands-on: DnCNN solution + SSL Barlow Twins exercise assignment
hands-on 4 - 20.3.2024 (notebook, recording) h: 16:00-18:00
- hands-on: use of the MONAI library for semantic segmentation of medical images
hands-on 5 - 27.3.2024 (notebook, slides, recording) h:16:00-18:00
- Barlow Twins problem solution + introduction to implementation of Bayesian NN
lecture 5 - 3.4.2024 (slides, recording) h:16:00-18:00
- uncertainty quantification in ANN
  - IID and OOD data
  - types of uncertainty: approximation, epistemic and aleatoric uncertainty
  - calibration error and ECE (https://arxiv.org/pdf/1706.04599.pdf)
  - ensemble methods:
    - deep ensembles (https://arxiv.org/pdf/1612.01474.pdf)
    - MC dropout (https://arxiv.org/pdf/1506.02142.pdf)
  - Bayesian-NN (https://arxiv.org/pdf/2007.06823.pdf)
    - Pyro library (https://github.com/pyro-ppl/pyro): example implementation of an MCMC-based BNN for a simple regression task (see DL ch 17 for a simple introduction to MCMC)
  - conformal predictions (https://arxiv.org/pdf/2107.07511.pdf)
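As an illustration of the calibration topic in lecture 5, the following is a minimal NumPy sketch of the expected calibration error with equal-width confidence bins, ECE = sum over bins of (n_b / N) * |acc_b - conf_b|. It is not taken from the course notebooks: the function name, the number of bins and the toy data are assumptions chosen only for this example.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum_b (n_b / N) * |acc_b - conf_b|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Bins are (lower, upper]; the first bin also takes confidence == 0.
        in_bin = (confidences > lower) & (confidences <= upper)
        if lower == 0.0:
            in_bin |= confidences == 0.0
        if in_bin.any():
            acc = correct[in_bin].mean()        # fraction of correct predictions
            conf = confidences[in_bin].mean()   # mean predicted confidence
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Toy predictions, made up for illustration only.
conf = [0.95, 0.90, 0.85, 0.58, 0.55]   # max softmax probability per sample
hit = [1, 1, 0, 1, 0]                   # 1 if the predicted class was correct
ece = expected_calibration_error(conf, hit, n_bins=5)
print(f"ECE = {ece:.3f}")
```

A perfectly calibrated model has ECE equal to zero; the value depends on the binning, so the number of bins should be quoted together with the result.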
lecture 6 - 9.4.2024 (slides, recording) h:14:00-16:00
- Graph Neural Networks (DL2 ch 13, GRL sections 5 and 6, PyTorch Geometric web site)
  - introduction
  - graphs and representation
  - permutation equivariance
  - graph convolutions and message passing
  - basic GCN layers
  - self-loop GCN
  - graph attention networks
  - solutions to the over-smoothing problem in GNNs
  - normalization
  - GNN python libraries
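The graph convolution and permutation equivariance items of lecture 6 can be sketched in a few lines of NumPy. The example below assumes the common self-loop GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the function name, the 3-node graph and the random features are hypothetical and serve only as an illustration, not as the implementation used in the course.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One GCN step with self-loops: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalisation of the adjacency matrix.
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

# Toy path graph 0 - 1 - 2 with random node features (illustrative only).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # 3 nodes, 4 input features
w = rng.normal(size=(4, 2))   # 4 input features -> 2 output features
out = gcn_layer(adj, x, w)

# Permutation equivariance: relabelling the nodes permutes the output rows.
p = np.eye(3)[[2, 0, 1]]      # permutation matrix
out_perm = gcn_layer(p @ adj @ p.T, p @ x, w)
print(np.allclose(out_perm, p @ out))
```

In practice a library such as PyTorch Geometric works with sparse edge lists rather than a dense adjacency matrix; the dense form is used here only to keep the algebra visible.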
hands-on 6 - 10.4.2024 (notebook, recording) h: 16:00-18:00
- Graph Neural Networks and PyTorch Geometric
hands-on 7 - 16.4.2024 (notebook, recording) h: 14:00-16:00
- point cloud classification with GNNs
lecture 7 - 17.4.2024 (slides, recording) h: 16:00-18:00
- attention mechanism
  - the RNNSearch encoder-decoder model (arXiv:1409.0473)
  - attention and the Nadaraya-Watson kernel estimator
  - attention layers vs fully connected layers
- transformer architecture (arXiv:1706.03762)
  - word embedding (brief overview)
  - (masked) multi-head (self-)attention based on scaled dot product
  - layer normalization
  - positional embedding
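For the scaled dot-product attention item of lecture 7, a minimal single-head NumPy sketch is given below, computing softmax(Q K^T / sqrt(d_k)) V with an optional causal mask. The function name, the `causal` flag and the toy input are assumptions made for this example; the multi-head projections, layer normalization and positional embedding of the full transformer are deliberately left out.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """Single-head attention: softmax(q k^T / sqrt(d_k)) v.

    With causal=True (self-attention on one sequence), position i can
    only attend to positions j <= i.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    if causal:
        # Mask out the strictly upper triangle (future positions).
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy sequence of 4 tokens with 8-dimensional embeddings (illustrative only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
attn_out, attn_weights = scaled_dot_product_attention(tokens, tokens, tokens, causal=True)
print(attn_weights.round(2))
```

With the causal mask the attention matrix is lower triangular, so the first token attends only to itself and its output equals its own value vector.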
hands-on 8 - 23.4.2024 (notebook, recording) h: 14:00-16:00
- implementation of attention layers and of the transformer architecture
hands-on 9 - 24.4.2024 (slides, notebook, recording) h: 16:00-18:00
- modern evolutions of transformer architectures: BERT/GPT/GPT2/GPT3/...
- vision transformer (arXiv:2010.11929)
- multimodal transformers
- assignment of project #4 (ViT)
lecture 8 - 30.4.2024 (slides, recording) h: 14:00-16:00
- AI explainability and interpretability
hands-on 10 - 7.5.2024 (notebook, recording) h: 14:00-16:00
- xAI examples + PointNet++ solution
hands-on 11 - 8.5.2024 (notebook, recording) h: 16:00-18:00
- ViT solution
lecture 9 - 14.5.2024 h:14:00-16:00
- Q&A with the instructors
lecture 10 - 15.5.2024 (slides, recording) h: 16:00-18:00
- AutoEncoders (DL ch 14, DL2 19.1)
  - under-complete auto-encoders
  - linear AE and PCA
  - over-complete AEs
  - denoising AE
  - sparse AE
  - contractive AE
  - AE for self-supervised anomaly detection
  - examples of implementation in pytorch
lecture 11 - 21.5.2024 (slides, recording) h: 14:00-16:00
- seminar by Sergio Orlandini on HPC resources and AI parallel programming at CINECA - part 1
lecture 12 - 22.5.2024 (slides, recording) h: 16:00-18:00
- generative DL (DL ch. 20, DL2 ch 11.3, 19.2)
  - autoregressive models
  - latent variable models
  - VAE, ELBO theorem
lecture 12 - 29.5.2024 (slides, recording) h: 16:00-18:00