Scientific Programme

Course 1: Randomized Numerical Linear Algebra
(B Bah, AIMS South Africa)

This course will cover recent developments in randomized matrix algorithms of interest in large-scale machine learning and statistical data analysis. It will introduce students to basic algorithms that use randomization in a non-trivial way for fundamental matrix problems such as matrix multiplication, least-squares regression and low-rank matrix approximation. It will start with a brief review of linear algebra and probability distributions. The implementation and application of the algorithms will be in Python, Julia, Scilab and/or Octave.
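
To give a flavour of the kind of algorithm meant here, the following is a minimal sketch of randomized matrix multiplication by sampling column/row pairs. It is an illustration only, not course material: the function name `sampled_matmul`, the choice of norm-proportional sampling probabilities and the small test matrices are all assumptions made for this example, and plain Python lists are used so that it runs without extra libraries.

```python
import random


def sampled_matmul(A, B, c, rng):
    """Estimate the product A*B from c sampled column/row pairs.

    A is m x n and B is n x p, both given as lists of rows.
    Index k is drawn with probability proportional to
    ||A[:, k]|| * ||B[k, :]|| (an assumed, commonly used choice),
    and each sampled outer product is rescaled by 1 / (c * p_k)
    so that the estimate is unbiased.
    """
    m, n, p = len(A), len(B), len(B[0])
    weights = []
    for k in range(n):
        col_norm = sum(A[i][k] ** 2 for i in range(m)) ** 0.5
        row_norm = sum(x ** 2 for x in B[k]) ** 0.5
        weights.append(col_norm * row_norm)
    total = sum(weights)
    estimate = [[0.0] * p for _ in range(m)]
    if total == 0:
        return estimate  # the exact product is the zero matrix
    probs = [w / total for w in weights]
    for k in rng.choices(range(n), weights=probs, k=c):
        scale = 1.0 / (c * probs[k])
        for i in range(m):
            for j in range(p):
                estimate[i][j] += scale * A[i][k] * B[k][j]
    return estimate


def frobenius_error(X, Y):
    """Frobenius norm of X - Y for two matrices of equal shape."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
    B = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    exact = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(4)]
             for i in range(5)]
    for c in (10, 100, 1000):
        err = frobenius_error(exact, sampled_matmul(A, B, c, rng))
        print(f"c = {c:4d}  Frobenius error = {err:.3f}")
```

The error shrinks on average as the number of samples `c` grows, at the cost of more work; this trade-off between accuracy and computation is typical of randomized matrix algorithms.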

Course 2: Topics in the Analysis of Large Databases
(M Bogdan, University of Wroclaw)

In the first part of this lecture we will use the classical setup of the multivariate normal distribution to discuss generic problems in the analysis of large data sets: the needle-in-a-haystack problem, limits of signal detectability, multiple testing and regularization techniques. In the second part we will apply these notions to the analysis of high-dimensional generalized linear models. We will discuss the properties of information criteria for model selection and of regularization techniques such as ridge regression, LASSO and SLOPE. If time permits, we will also discuss some unsupervised techniques for dimensionality reduction, such as PCA, sparse PCA and sparse subspace clustering.

Course 3: Spatial Methods for Spatial Data Science
(S Dabo-Niang, Université de Lille)

Spatial statistics includes any statistical technique for studying phenomena observed on spatial sets. Such phenomena appear in a variety of fields: epidemiology, environmental science, econometrics, image processing and many others. The modelling of spatial data is among the most interesting research subjects in dependent data analysis. This is motivated by the increasing number of situations in different fields of applied science in which the data are spatial in nature. Complex issues arise in spatial analysis, many of which are neither clearly defined nor completely resolved, and these form the basis of current research.

This course introduces the methodology and application of spatial statistical models to young researchers and PhD students. More specifically, its objective is to provide an introduction to spatial statistics and to show how to model spatial dependencies and integrate them into spatial data analysis. The course will cover topics such as: exploratory analysis of spatial data; spatial clustering; spatial regression and prediction models; estimation methods; model selection and specification. The models and methods will be illustrated through the analysis of real data. This practical part will be done in R.

Reference: Insee-Eurostat. Handbook of Spatial Analysis: Theory and Application with R. 2018

Course 4: Modern Graphical Models
(Piotr Graczyk, Université d’Angers)

Graphical models provide one of the most powerful methods of unsupervised learning in modern data science. The lectures will start with the basics of conditional independence and Gaussian graphical models. Subsequent lectures will present the axioms, Markov properties and factorization property of graphical models. Maximum likelihood estimators (MLE) of the covariance matrix will be discussed. The exact computation of the MLE requires the graph to be decomposable, so we will learn the basics of decomposable graphs. We will finish with graphical model selection, both LASSO-type and Bayesian. The lectures will contain numerous examples, starting with Simpson's paradox of apparent sex discrimination at a public university and ending with Big Data examples. Students will be encouraged to carry out some programming tasks in R.

Course 5: Complex Networks, Embedding Methods and Applications
(F Kalala Mutombo, University of Lubumbashi)

Complex networks are networks whose patterns of connection between elements are neither purely regular nor purely random. Most real-world networks, such as transportation, social or gene-regulatory networks, are complex. The course will start with a broad range of approaches and techniques drawn from social network analysis, graph theory and network science for analyzing real-world network data. Next, we introduce embedding methods for networks to deal with high dimensionality and heterogeneity. Throughout the course, theoretical material will be presented together with data and code using NetworkX in Python in the Jupyter Notebook environment (or Spyder). Specific topics include: the basic conceptual and mathematical formulation of networks; basic network metrics (e.g. paths, components, degree distributions); centrality measures; general properties of real-world networks; models of networks; community detection.
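
As a small illustration of the basic metrics listed above (paths, components and degree distributions), here is a minimal sketch on a toy graph. It deliberately uses plain Python dictionaries rather than NetworkX so that it runs without any installation; the helper names and the example edge list are assumptions made for this illustration and are not taken from the course material.

```python
from collections import Counter, deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def degree_distribution(adj):
    """Counter mapping each degree to the number of nodes with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


# Hypothetical toy network: a triangle with a pendant node, plus a separate edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
adj = build_adjacency(edges)

print("degree distribution:", dict(degree_distribution(adj)))
print("components:", [sorted(c) for c in connected_components(adj)])
print("distances from 'a':", bfs_distances(adj, "a"))
```

In practice a library such as NetworkX provides these computations directly; writing them out once by hand mainly helps to see what the metrics measure.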

Course 6: Deep Generative Models
(S Kroon, Stellenbosch University)

This course considers the problem of large-scale learning and Bayesian inference in deep generative models. Exact inference in such hierarchical models is not tractable, so approaches based on variational inference are widely used. Modern approaches focus on learning, in an unsupervised manner, a mapping from observed data to the variational parameters governing the approximate posterior over the latent variables, a technique known as amortized inference. Training optimizes the generative model parameters and those of the variational parameter mapping simultaneously, by formulating so-called encoder and decoder networks and training them with stochastic optimization methods available in modern neural network libraries. The lectures will cover the above material in the context of two major types of deep generative models, namely variational autoencoders and normalizing flow models. Students will do some theoretical derivations in the context of variational inference, and spend time developing and applying deep generative models in Python using PyTorch.

Course 7: High-Dimensional Statistics Based on Random Matrix Theory
(J Yao, University of Hong Kong)

The course gives a short introduction to recent progress in high-dimensional statistics in connection with random matrix theory. First, fundamental results on the eigenvalues of large sample covariance matrices will be reviewed, including the Marcenko-Pastur law and the Bai-Silverstein central limit theorem. Next, applications to hypothesis testing on large covariance matrices, factor modelling and PCA will be discussed. Students will learn a few important theoretical results and representative applications in high-dimensional statistics. They will have the opportunity to consolidate this learning with a number of exercises and simulation studies.
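
For orientation, the Marcenko-Pastur law mentioned above can be stated as follows in its standard form. The notation is an assumption made for this note and may differ from that used in the lectures: $X$ is a $p \times n$ matrix with i.i.d. entries of mean zero and variance $\sigma^2$, $S_n = \frac{1}{n} X X^{\top}$ is the sample covariance matrix, and $p/n \to y \in (0, 1]$.

```latex
% Limiting density of the eigenvalues of S_n as p, n -> infinity with p/n -> y
f_y(x) = \frac{1}{2\pi y \sigma^2 x}\,\sqrt{(b - x)(x - a)}, \qquad a \le x \le b,
\qquad\text{where}\qquad
a = \sigma^2 \bigl(1 - \sqrt{y}\bigr)^2, \quad
b = \sigma^2 \bigl(1 + \sqrt{y}\bigr)^2 .
```

In particular, even when the population covariance is $\sigma^2 I$, the sample eigenvalues spread over the whole interval $[a, b]$ rather than concentrating at $\sigma^2$, which is why classical fixed-dimension procedures need correction when $p$ is comparable to $n$.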

Reference: Jianfeng Yao, Shurong Zheng and Z. D. Bai. Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge University Press, 2015

How to participate

All those wishing to take part, please click here to apply.

Those based outside South Africa, please click here to apply for CIMPA financial support.

Closing date for applications: 20 April 2022

Registration and logistics: research-admin@aims.ac.za