Schedule

PLENARY 1: Cluster detection for high-dimensional spatial data

Sophie Dabo | University of Lille and INRIA, France | 11.00 - 11.40 | Slides

Abstract

In environmental surveillance, the detection of clusters of environmental black spots is of major interest due to the adverse health effects of pollutants, as well as their known synergistic effects. We introduce new spatial scan statistics for multivariate functional data, applicable to detecting clusters of abnormal spatial measurements of a high-dimensional variable in a given region while taking their correlations into account. Mathematically, the methodology is derived from a functional multivariate analysis of variance (MANOVA), an adaptation of the Hotelling test statistic, and a multivariate extension of the Wilcoxon test statistic.
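
As a rough, hypothetical illustration of the general "scan a window, score it with a multivariate test" idea (using plain vector-valued measurements and a classical Hotelling T² score, not the functional MANOVA statistics of the talk), consider the sketch below. The function names, circular-window construction, and toy data are all assumptions made for illustration.

```python
# Illustrative multivariate spatial scan: score circular windows with a
# Hotelling-type T^2 statistic comparing measurements inside vs. outside.
# This is NOT the method from the talk; it only conveys the scan idea.
import numpy as np

def hotelling_t2(x_in, x_out):
    """Two-sample Hotelling T^2 comparing mean vectors inside/outside a window."""
    n1, n2 = len(x_in), len(x_out)
    diff = x_in.mean(axis=0) - x_out.mean(axis=0)
    # Pooled covariance of the two groups.
    s_pooled = ((n1 - 1) * np.cov(x_in, rowvar=False) +
                (n2 - 1) * np.cov(x_out, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(s_pooled, diff)

def scan_clusters(coords, values, radius):
    """Return the window centre with the largest Hotelling T^2 score."""
    best_score, best_centre = -np.inf, None
    for centre in coords:                       # candidate windows centred at each site
        inside = np.linalg.norm(coords - centre, axis=1) <= radius
        if 2 < inside.sum() < len(coords) - 2:  # need enough points on both sides
            score = hotelling_t2(values[inside], values[~inside])
            if score > best_score:
                best_score, best_centre = score, centre
    return best_centre, best_score

# Toy example: 200 sites, 3 correlated pollutants, one contaminated disc.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
values = rng.multivariate_normal(np.zeros(3), np.eye(3) + 0.5, size=200)
hot = np.linalg.norm(coords - [7.0, 7.0], axis=1) <= 1.5
values[hot] += 2.0                              # abnormal measurements in the disc
print(scan_clusters(coords, values, radius=1.5))
```

In spatial scan methodology, the significance of the highest-scoring window is typically assessed afterwards by Monte Carlo permutation of the measurements over the sites.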

Leveraging the Banach Contraction Principle in Reinforcement Learning

Yae U. Gaba | Quantum Leap Africa, Rwanda | 11.50 - 12.05 | Slides

Abstract

In this presentation, I will explore the mathematical foundations of classical reinforcement learning, highlighting how the Bellman operator acts as a contraction mapping on a complete metric space and how value-based methods such as Q-learning use this property to discover optimal policies. The aim is to illustrate the simplicity of these foundational concepts, to give the audience a deeper understanding of reinforcement learning from a topological perspective, and to foster inspiration for the development of future reinforcement learning algorithms.
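
As a minimal illustrative sketch of the contraction-mapping viewpoint (not material from the talk), the snippet below applies the Bellman optimality operator repeatedly on a small randomly generated MDP; by the Banach fixed-point theorem, the sup-norm distance between successive iterates shrinks by at least a factor γ per sweep. The toy MDP and all parameter choices are assumptions.

```python
# Minimal sketch: the Bellman optimality operator T is a gamma-contraction in
# the sup-norm, so repeatedly applying T to any starting V converges to V*
# (Banach fixed-point theorem).  The printout shows the geometric shrinkage.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]

def bellman(V):
    """(T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]"""
    return np.max(R + gamma * P @ V, axis=1)

V = np.zeros(n_states)
for sweep in range(60):
    V_new = bellman(V)
    gap = np.max(np.abs(V_new - V))
    print(f"sweep {sweep:2d}   ||T V - V||_inf = {gap:.2e}")
    if gap < 1e-8:
        break
    V = V_new
```

Q-learning can be read as a stochastic, sample-based approximation of this same fixed-point iteration.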

Algebraic and Geometric Aspects of Knowledge Graph Embeddings

Kossi Amouzouvi | ScaDS.AI at Technische Universität Dresden, Germany | 12.10 - 12.25 | Slides

Abstract

Knowledge Graph Representation Learning (KGRL) is one of the core subfields of graph-based AI. KGRL approaches embed a knowledge graph (KG) into a vector space while aiming to preserve the underlying structure and semantics of the KG. More specifically, the algebraic and geometric properties of the embedding space contribute to an optimal representation of the nodes and edges of the KG. KGRL approaches are fundamental for AI, since they act as a bridge between symbolic knowledge representation and machine learning techniques. In this talk, we will review state-of-the-art KG embedding models and discuss how the algebra and geometry of the embedding space contribute to the overall performance of the models.
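
As a concrete, simplified example of how the algebra of the embedding space enters, the sketch below scores triples in the style of a translational model (TransE-like): a triple (h, r, t) is considered plausible when h + r lies close to t, so relations act as translations of the entity space. The toy graph, dimension, and single margin-based update are illustrative assumptions, not the models discussed in the talk.

```python
# Illustrative TransE-style scoring: entities and relations are vectors in R^d
# and a triple (h, r, t) is plausible when the translated head h + r lies close
# to the tail t.  Toy graph and update step are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
entities = ["Kigali", "Rwanda", "Dresden", "Germany"]
relations = ["capital_of", "located_in"]
d = 8
E = {e: rng.normal(scale=0.1, size=d) for e in entities}
R = {r: rng.normal(scale=0.1, size=d) for r in relations}

def score(h, r, t):
    """Lower is better: distance between the translated head and the tail."""
    return np.linalg.norm(E[h] + R[r] - E[t])

triples = [("Kigali", "capital_of", "Rwanda"), ("Dresden", "located_in", "Germany")]

# One margin-based update: pull each true triple together and push a corrupted
# triple (same head and relation, wrong tail) apart, using squared distances.
lr, margin = 0.05, 1.0
for h, r, t in triples:
    t_neg = rng.choice([e for e in entities if e != t])
    if score(h, r, t) ** 2 + margin > score(h, r, t_neg) ** 2:
        g_pos = 2 * (E[h] + R[r] - E[t])       # gradient of ||h + r - t||^2 w.r.t. h
        g_neg = 2 * (E[h] + R[r] - E[t_neg])
        E[h] -= lr * (g_pos - g_neg)
        R[r] -= lr * (g_pos - g_neg)
        E[t] += lr * g_pos
        E[t_neg] -= lr * g_neg

print({(h, r, t): round(float(score(h, r, t)), 3) for h, r, t in triples})
```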

PLENARY 2: Stochastic Gradient Descent: understanding adaptive step-sizes, momentum, and initialization

Rachel Ward | University of Texas at Austin, United States | 14.00 - 14.40 | Slides

Abstract

Stochastic gradient descent (SGD) is the foundational algorithm used in machine learning optimization, but several algorithmic modifications to the basic SGD algorithm are often needed to make it “work” on high-dimensional non-convex problems. Three of the crucial modifications are: adaptive step-size updates, momentum, and careful random initialization of the parameters. This talk will discuss recent theoretical insights towards understanding why adaptivity, momentum, and careful random initialization are so powerful in practice. In particular, the theory unveils a novel but simple initialization method for gradient descent on matrix- and tensor-factorization problems; with this initialization, we prove that gradient descent discovers optimal low-rank matrix and tensor factorizations in a small number of steps.
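
As a hedged illustration of the three ingredients on a toy problem (a sketch under assumptions, not the analysis presented in the talk), the snippet below runs gradient descent on a rank-2 matrix factorization with (i) an AdaGrad-norm-style adaptive step size, (ii) heavy-ball momentum, and (iii) a small random initialization of the factors. The problem size, normalization, and hyperparameters are all assumptions.

```python
# Toy sketch on  min_{U,V} 0.5 * ||U V^T - M||_F^2  combining:
#   (i)  an AdaGrad-norm-style adaptive step size,
#   (ii) heavy-ball momentum,
#   (iii) small random initialization of the factors.
import numpy as np

rng = np.random.default_rng(3)
n, r = 20, 2
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
M /= np.linalg.norm(M, 2)                         # unit spectral norm, for stability

U = 1e-2 * rng.normal(size=(n, r))                # small random initialization
V = 1e-2 * rng.normal(size=(n, r))
mU, mV = np.zeros_like(U), np.zeros_like(V)       # momentum buffers
acc = 1.0                                         # AdaGrad-norm accumulator b_0
eta, beta = 0.2, 0.9

for step in range(1001):
    E = U @ V.T - M
    gU, gV = E @ V, E.T @ U                       # gradients of 0.5*||U V^T - M||_F^2
    acc += np.sum(gU**2) + np.sum(gV**2)
    lr = eta / np.sqrt(acc)                       # adaptive step size (only decreases)
    mU, mV = beta * mU + gU, beta * mV + gV       # heavy-ball momentum
    U, V = U - lr * mU, V - lr * mV
    if step % 200 == 0:
        print(f"step {step:4d}   ||U V^T - M||_F = {np.linalg.norm(U @ V.T - M):.3e}")
```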

Analysis of Gradient Descent and Stochastic Gradient Descent for Training Deep Linear Neural Networks

Gabin Maxime Nguegnang | RWTH Aachen University, Germany | 14.50 - 15.05 | Slides

Abstract

In this talk, we will present our work on the analysis of gradient descent (GD) and stochastic gradient descent (SGD) for learning deep linear neural networks. First, we establish the boundedness of the GD iterates and prove convergence to a critical point of the square loss under suitable conditions on the step sizes. We then extend the convergence-to-a-global-minimum results of Bah et al. (2020) from gradient flow to GD. Our work provides precise conditions that ensure convergence for both constant and decreasing step sizes. Moreover, our maximal allowed step size does not vanish exponentially with the number of layers, and we show numerically that violating our step-size bound may result in divergence. Finally, we will conclude with our work in progress on extending the insights of this study from GD to SGD.
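
To make the setting concrete, here is a minimal sketch of gradient descent on the square loss of a deep linear network f(x) = W_L ⋯ W_1 x. The toy data, depth, initialization, and constant step size are assumptions for illustration; they are not the precise conditions established in the work.

```python
# Minimal sketch: gradient descent on the (sample-averaged) square loss of a
# deep *linear* network  f(x) = W_L ... W_1 x.  The printed loss should decrease.
import numpy as np

rng = np.random.default_rng(4)
d, L, n = 5, 4, 50                       # width, number of layers, number of samples
X = rng.normal(size=(d, n))
Y = rng.normal(size=(d, d)) @ X          # targets generated by a linear map

Ws = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(L)]
lr = 0.02                                # assumed constant step size

def product(layers):
    """Return the end-to-end matrix W_k ... W_1 for the given list of layers."""
    P = np.eye(d)
    for W in layers:
        P = W @ P
    return P

for step in range(2001):
    E = product(Ws) @ X - Y              # residual of the end-to-end linear map
    grads = []
    for j in range(L):
        before = product(Ws[:j])         # layers below W_{j+1} (identity for j = 0)
        after = product(Ws[j + 1:])      # layers above W_{j+1} (identity for j = L-1)
        grads.append(after.T @ E @ (before @ X).T / n)   # grad of 0.5/n * ||.||_F^2
    for W, g in zip(Ws, grads):
        W -= lr * g
    if step % 500 == 0:
        print(f"step {step:4d}   loss = {0.5 * np.linalg.norm(E)**2 / n:.3e}")
```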

Panel Discussion

Growing African talents in the mathematics of Machine Learning

Panelists: