Fall 2019

Quick info:

  • To sign up for the mailing list (you do not need to be enrolled in the seminar), visit our Google Groups website.
  • The seminar meets Tuesdays, 3:30 to 4:30 PM in the Newton Lab (ECCR 257) in the Classroom Wing of the Engineering Center on the CU Boulder main campus (for information on finding Newton Lab, see visiting the applied math department).
  • The seminar is open to the public! You do not have to be enrolled in the class...
  • ...but if you are enrolled in the class, then for a passing grade you must:
    1. attend all the talks (missing an occasional talk is permissible if you have a valid reason; email the instructors), and
    2. give a ~20 min presentation at some point in the semester, either on your own research or presenting a paper.

List of Talks

(all times/locations Tues 3:30 PM at Newton Lab, ECCR 257, unless otherwise indicated)

  • Aug 27, organizational meeting, no talk
  • Sept 3, Bo Waggoner, "Toward a Characterization of Loss Functions for Distribution Learning"
  • Sept 10, Jorge Poveda, "Real-Time Optimization with Robustness and Acceleration via Hybrid Dynamical Systems and Averaging Theory"
  • Sept 17, Ashutosh Trivedi, "Reinforcement Learning and Formal Requirements"
  • Sept 24, Zhishen Huang, "Finding local minimizers in nonconvex and non-smooth optimization" and Antony Pearson, "Extracting structure from contaminated independent models"
  • Oct 1, Yury Makarychev, "Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering"
  • Oct 8, Joy Mueller, "Exact Sampling from the Stationary Distribution of Chemical Reaction Networks"
  • Oct 15, Purnendu, "The mathematical secrets of Computational Origami"
  • Oct 22, Alec Michael Dunton, "Learning a kernel matrix for nonlinear dimensionality reduction" (Weinberger et al., 2004)
  • Oct 29, Gonzalo Mateos, "Network Topology Inference from Spectral Templates"
  • Nov 5, Mohsen Imani, "Towards Learning with Brain Efficiency"
  • Nov 12, Sriram Sankaranarayanan, "Reasoning about Neural Feedback Systems"
  • Nov 19,
    • Altug Karakurt, "Multi-Armed Bandit Problems with Queue Dynamics"
    • Liam Madden, "Time-varying optimization"
    • Michael McCabe, "A Brief Overview of Graph Neural Networks"
  • Dec 3,
    • Caleb Miller, "Ensemble Transform Particle Filter"
    • Ruyu Tan, "Tracking Epidemics With Google Flu Trends Data and a State-Space SEIR Model"
    • Wenqi Zhang, "A nonstationary and non-Gaussian moving average model for solar irradiance downscaling"
  • Dec 10,
    • Amir Ajalloeian, "Inexact Online Proximal-gradient Method for Time-varying Convex Optimization"
    • Maddela Avinash, "Semidefinite Relaxation technique to solve Optimal power flow Problem"
    • Ayoub Ghriss, "Hierarchical Deep Reinforcement Learning through Mutual Information Maximization"

Abstracts

Sept 3

Title: Toward a Characterization of Loss Functions for Distribution Learning

Speaker: Bo Waggoner

Abstract: A common machine-learning task is to learn a probability distribution over a very large domain. Examples include natural language processing and generative adversarial networks. But how should the learned distribution be evaluated? A natural approach is to draw test samples and use a loss function. However, no loss function, not even the popular log loss, can satisfy a set of natural axioms (inspired by the literature on evaluating human forecasters). We show this impossibility can be overturned, and many simple loss functions can have strong usefulness guarantees, by using "one weird trick" -- calibration, a classical forecasting requirement. These results imply that requiring learning algorithms to be calibrated, a kind of regularization, allows us to provably evaluate them while picking a loss function tailored to the setting.

Joint work with Nika Haghtalab and Cameron Musco; https://arxiv.org/abs/1906.02652

Speaker Bio: Bo Waggoner is a new Assistant Professor of Computer Science at CU Boulder working at the interface of game theory, machine learning, and theoretical CS. Prior to Colorado, he held postdoc positions at Microsoft Research NYC and the University of Pennsylvania, and received his PhD from Harvard in 2016. https://www.bowaggoner.com

Abstracts

Sept 10

Title: Real-Time Optimization with Robustness and Acceleration via Hybrid Dynamical Systems and Averaging Theory

Speaker: Jorge Poveda

Abstract: In this talk we will discuss robust and accelerated zero-order algorithms for the solution of black-box optimization problems in dynamical systems. In particular, we will introduce a family of novel derivative-free dynamics that can be universally modeled as singularly perturbed hybrid dynamical systems with resetting mechanisms. From this family of dynamics, we synthesize four algorithms designed for convex, strongly convex, constrained, and unconstrained zero-order optimization problems. For each case, we establish semi-global practical asymptotic or exponential stability results, and we show how to construct well-posed discretized algorithms that retain the main properties of the algorithms. Given that existing averaging theorems for singularly perturbed hybrid systems are not directly applicable to our setting, we derive a new averaging theorem that relaxes some of the existing assumptions required to apply averaging theory in hybrid systems. This allows us to make a clear link between the $\mathcal{K}\mathcal{L}$ bounds that characterize the rates of convergence of the average system and the original algorithms. We also show that our results are applicable to non-hybrid zero-order dynamics modeled as ODEs, thus providing a unifying framework for hybrid and non-hybrid zero-order dynamics. The performance of the algorithms will be illustrated via numerical examples. Potential extensions to game-theoretic and multi-agent settings will also be discussed.

Speaker Bio: Prof. Jorge I. Poveda received the M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from the University of California, Santa Barbara, in 2016 and 2018, respectively. He is an Assistant Professor in the Department of Electrical, Computer, and Energy Engineering at the University of Colorado, Boulder. Before joining CU, he was a Postdoctoral Fellow at Harvard University in 2018 and a Research Intern at Mitsubishi Electric Research Laboratories during the summers of 2016 and 2017. He was a Best Student Paper Award finalist at the 2017 IEEE Conference on Decision and Control. His research interests lie at the intersection of adaptive control, game theory, and hybrid dynamical systems theory, with applications in optimization and control of cyber-physical systems.

Abstracts

Sept 17

Title: Reinforcement Learning and Formal Requirements

Speaker: Ashutosh Trivedi

Abstract: Reinforcement learning is an approach to controller synthesis where agents rely on reward signals to choose actions in order to satisfy the requirements implicit in reward signals. Oftentimes non-experts have to come up with the requirements and their translation to rewards under significant time pressure, even though manual translation is time-consuming and error-prone. For safety-critical applications of reinforcement learning, a rigorous design methodology is needed and, in particular, a principled approach to requirement specification and to the translation of objectives into the form required by reinforcement learning algorithms.

Formal logic provides a foundation for the rigorous and unambiguous requirement specification of learning objectives. However, reinforcement learning algorithms require requirements to be expressed as scalar reward signals. We discuss a recent technique, called limit-reachability, that bridges this gap by faithfully translating logic-based requirements into the scalar reward form needed in model-free reinforcement learning. This technique enables the synthesis of controllers that maximize the probability to satisfy given logical requirements using off-the-shelf, model-free reinforcement learning algorithms.

Speaker Bio: Ashutosh Trivedi is an assistant professor in the Department of Computer Science at the University of Colorado Boulder. He is affiliated with the Programming Languages and Verification Group and the Theory Group at the University of Colorado Boulder. His research focuses on applying rigorous mathematical reasoning techniques to design and analyze safe and secure cyber-physical systems (CPS) with guaranteed performance. He investigates foundational issues (decidability and complexity) related to modeling and analysis of CPS as well as practically focused tools that can be used by practitioners to analyze large systems at scale.


Abstracts (2 speakers)

Sept 24

Title: Finding local minimizers in nonconvex and non-smooth optimization

Speaker: Zhishen Huang

Abstract: We consider the problem of finding local minimizers in nonconvex and non-smooth optimization. The objective function we consider is the sum of a nonconvex function and an l1-penalty. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires different analysis and a different algorithm.


Title: Extracting structure from contaminated independent models

Speaker: Antony Pearson

Abstract: We describe a new quantity, the latent weight, which measures the average fraction of data from a multivariate probabilistic source that can be attributed to an independent model. We derive mathematical properties of this weight and contrast its interpretation with that of p-values resulting from hypothesis tests of independence. We report an efficient algorithm to compute this weight from a given source and apply the methodology to real transcription factor binding data to demonstrate its usefulness in exploratory data analysis.


If time permits, Raf will present a basic problem about linear regression that, for some reason, is still open.

Abstracts

October 1st, 2019

Title: Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Speaker: Yury Makarychev (Associate Professor at the Toyota Technological Institute at Chicago (TTIC))

Abstract: Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k/ε)/ε^2)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p.

For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

Based on a joint work with Konstantin Makarychev and Ilya Razenshteyn.
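As a quick numerical illustration of the phenomenon (not the paper's proof technique; the data sizes and target dimension below are invented for the example), a scaled random Gaussian projection approximately preserves the k-means cost of a fixed clustering even after a large dimension reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_cost(X, labels):
    """Sum of squared distances of each point to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost

# Synthetic data: k = 3 clusters in d = 500 dimensions.
d, k, n_per = 500, 3, 100
centers = rng.normal(scale=5.0, size=(k, d))
X = np.vstack([c + rng.normal(size=(n_per, d)) for c in centers])
labels = np.repeat(np.arange(k), n_per)

# Random Gaussian projection to m << d dimensions, scaled by 1/sqrt(m)
# so squared norms are preserved in expectation.
m = 40
G = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ G

ratio = kmeans_cost(Y, labels) / kmeans_cost(X, labels)
print(ratio)  # close to 1: the clustering cost survives the projection
```

The theorem is much stronger than this single check: it guarantees a (1+ε) distortion simultaneously for every clustering, with m = O(log(k/ε)/ε²) independent of the number of points.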

Speaker Bio: Yury Makarychev received his MS degree in Mathematics from Moscow State University in 2001 and his PhD degree in Computer Science from Princeton University in 2008. He spent the following two years as a Postdoctoral Researcher at Microsoft Research in Redmond, WA and Cambridge, MA.

Yury’s research interests include combinatorial optimization, approximation algorithms, semi-definite programming, and low-distortion metric embedding. He has recently worked on developing approximation algorithms for unique games, constraint satisfaction, graph partitioning, and vertex ordering problems, investigating tradeoffs between local and global properties of metric spaces, and studying lift-and-project hierarchies of mathematical relaxations.

Abstracts

October 8th, 2019

Title: Exact Sampling from the Stationary Distribution of Chemical Reaction Networks

Speaker: Joy Mueller

Abstract: Chemical reaction networks (CRNs) are collections of chemical species and the rules that govern their interactions. Large reacting systems (species counts with orders of magnitude larger than 10^3) may often be modeled deterministically using coupled ordinary differential equations. However, dilute reacting systems, which have species counts with orders of magnitude between 10^2 and 10^3, are highly susceptible to statistical fluctuations and must be modeled stochastically. A first approach is to solve the Forward Kolmogorov equation, but analytic solutions are known in only a handful of special cases and numerical methods are therefore used to compute distributions and other quantities of interest. In this project, we focus on the stationary distribution of CRNs and describe our recent efforts to develop perfect simulation algorithms that will enable us to sample values directly from the stationary distribution of the network while bypassing convergence issues of more traditional approaches. To do this, we define a concept of ordering on stoichiometric compatibility classes and generalize thinning algorithms typically associated with birth-and-death processes.


Abstracts

October 15th, 2019

Title: The mathematical secrets of Computational Origami

Speaker: Purnendu

Abstract: Origami is the Japanese name for the centuries-old art of folding paper into representations of birds, insects, animals, plants, human figures, inanimate objects, and abstract shapes. In the purest form of origami, the figure is folded from a single uncut square of paper. Throughout the history of origami, most origami design has been carried out by a combination of trial and error and heuristic techniques based on the folder’s intuition. However, in the past 20-25 years, things have changed dramatically and the field of origami design has become more and more algorithmic, based on mathematical ideas from geometry. The principles of geometry were first applied to origami around the mid-twentieth century, when Japanese physicists and mathematicians began to formulate axioms that explain how folding creates three-dimensional objects from flat material. In this work, we will go through the mathematical principles which underlie the art of paper folding and present a complete algorithm for the design of an arbitrary origami figure. The algorithm is based on a set of mathematical conditions on the crease pattern which can be mapped to a tree graph. In this fashion, the problem is transformed into a nonlinear constrained optimization, which, as it turns out, is closely related to existing circle packing and triangulation algorithms. This algorithmic approach to paper folding has opened up the scope of a plethora of engineering applications in robotics, aerospace, mechanical engineering, material science, and biology.

Abstracts

October 22nd, 2019

Title: Learning a kernel matrix for nonlinear dimensionality reduction (Weinberger et. al. 2004)

Speaker: Alec Michael Dunton

Abstract: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that unfolds the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming---a fundamentally different computation than previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.

Abstracts

October 29th, 2019

Title: Network Topology Inference from Spectral Templates

Speaker: Gonzalo Mateos

Abstract: Advancing a holistic theory of networks necessitates fundamental breakthroughs in modeling, identification, and controllability of distributed network processes – often conceptualized as signals defined on the vertices of a graph. Under the assumption that the signal properties are related to the topology of the graph where they are supported, the goal of graph signal processing (GSP) is to develop algorithms that fruitfully leverage this relational structure, and can make inferences about these relationships when they are only partially observed. After presenting the fundamentals of GSP, we leverage these ideas to address the problem of network topology inference from graph signal observations. It is assumed that the unknown graph encodes direct relationships between signal elements, which we aim to recover from observable indirect relationships generated by a diffusion process on the graph. The innovative approach is to consider the Graph Fourier Transform of the acquired signals associated with an arbitrary graph and, among all the feasible networks, search for one that endows the resulting transforms with target spectral properties and the sought graph with appealing physical characteristics such as sparsity. Leveraging results from GSP and sparse recovery, efficient topology inference algorithms with theoretical guarantees are put forth. Numerical tests corroborate the effectiveness of the proposed algorithms when used to recover social and structural brain networks from synthetically-generated signals, as well as to identify the structural properties of proteins.

Bio: Gonzalo Mateos earned the B.Sc. degree from Universidad de la Republica, Uruguay, in 2005, and the M.Sc. and Ph.D. degrees from the University of Minnesota, Twin Cities, in 2009 and 2011, all in electrical engineering. He joined the University of Rochester, Rochester, NY, in 2014, where he is currently an Assistant Professor with the Department of Electrical and Computer Engineering, as well as a member of the Goergen Institute for Data Science. During the 2013 academic year, he was a visiting scholar with the Computer Science Department at Carnegie Mellon University. From 2004 to 2006, he worked as a Systems Engineer at Asea Brown Boveri (ABB), Uruguay. His research interests lie in the areas of statistical learning from Big Data, network science, decentralized optimization, and graph signal processing, with applications in dynamic network health monitoring, social, power grid, and Big Data analytics. He currently serves as Associate Editor for the IEEE Transactions on Signal Processing, the IEEE Transactions on Signal and Information Processing over Networks, and is a member of the IEEE SigPort Editorial Board. Dr. Mateos received the NSF CAREER Award in 2018, the 2017 IEEE Signal Processing Society Young Author Best Paper Award (as senior co-author), and the Best Paper Awards at SPAWC 2012, SSP 2016, as well as ICASSP 2018 and 2019. His doctoral work has been recognized with the 2013 University of Minnesota's Best Dissertation Award (Honorable Mention) across all Physical Sciences and Engineering areas.

Abstracts

November 5th, 2019 (Joint with RCDS Seminar)

Title: Towards Learning with Brain Efficiency

Speaker: Mohsen Imani

Abstract: Modern computing systems are plagued with significant issues in efficiently performing learning tasks. In this talk, I will present a new brain-inspired computing architecture. It supports a wide range of learning tasks while offering higher system efficiency than other existing platforms. I will first focus on HyperDimensional (HD) computing, an alternative method of computation that exploits key principles of brain functionality: (i) robustness to noise/error and (ii) intertwined memory and logic. To this end, we design a new learning algorithm resilient to hardware failure. We then build the architecture exploiting emerging technologies to enable processing in memory. I will also show how we use the new architecture to accelerate other brain-like computations such as deep learning and other big data processing.
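As a rough illustration of the "intertwined memory and logic" idea, here is a minimal HD-computing sketch (not the speaker's architecture; the dimensionality, class names, and noise level are invented for the example). Items are random high-dimensional bipolar vectors, sets are stored by superposition, and classification is a single dot product, which degrades gracefully under noise.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 10000  # hypervector dimensionality

def random_hv():
    """Random bipolar (+1/-1) hypervector."""
    return rng.choice([-1, 1], size=D)

def bundle(*hvs):
    """Superposition (elementwise majority vote) -- stores a set of items."""
    return np.sign(np.sum(hvs, axis=0))

def similarity(a, b):
    """Normalized dot product; near 0 for unrelated hypervectors."""
    return a @ b / D

# Encode two (hypothetical) classes as bundles of their sample vectors.
cat_samples = [random_hv() for _ in range(5)]
dog_samples = [random_hv() for _ in range(5)]
cat_class = bundle(*cat_samples)
dog_class = bundle(*dog_samples)

# A heavily corrupted cat sample is still far closer to the cat class.
query = cat_samples[0].copy()
flip = rng.choice(D, size=D // 4, replace=False)  # corrupt 25% of entries
query[flip] *= -1

print(similarity(query, cat_class), similarity(query, dog_class))
```

The robustness comes purely from dimensionality: random hypervectors are nearly orthogonal, so even a 25%-corrupted query keeps a large margin over the wrong class.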

Bio: Mohsen Imani is a Ph.D. candidate in the Department of Computer Science and Engineering at UC San Diego. His research interests are in brain-inspired computing and computer architecture. He is the author of several publications at top-tier conferences and journals. His contributions have resulted in over $40M in grants from multiple government agencies (NSF, SRC) and several companies including IBM, Intel, Micron, and Qualcomm. He has received the most prestigious awards from the UCSD school of engineering, including the Gordon Engineering Leadership Award and the Outstanding Graduate Research Award. He has also received several best paper award nominations, from the DAC 2019, ISQED 2018, and ICCD 2016 conferences. Mohsen will be on the academic job market this year.

Abstracts

November 12th, 2019

Title: Reasoning about Neural Feedback Systems

Speaker: Sriram Sankaranarayanan

Abstract: Data-driven components such as feedforward neural networks are increasingly being used in safety-critical systems such as autonomous vehicles and closed-loop medical devices. Neural networks compute nonlinear functions, and even relatively tiny networks present enormous challenges for existing reasoning techniques used in formal verification. In this work, we will present first steps into verifying properties of neural networks in isolation, and reasoning about properties of dynamical systems with neural networks as feedback.

Joint work with Souradeep Dutta (CU Boulder), Ashish Tiwari (Microsoft) and Susmit Jha (SRI).

Bio: Sriram Sankaranarayanan is an associate professor of Computer Science at the University of Colorado, Boulder. His research interests include automatic techniques for reasoning about the behavior of computer and cyber-physical systems. Sriram obtained a PhD in 2005 from Stanford University where he was advised by Zohar Manna and Henny Sipma. Subsequently he worked as a research staff member at NEC research labs in Princeton, NJ. He has been on the faculty at CU Boulder since 2009. Sriram has been the recipient of awards including the President's Gold Medal from IIT Kharagpur (2000), Siebel Scholarship (2005), the CAREER award from NSF (2009), Dean's award for outstanding junior faculty (2012), outstanding teaching (2014), and the Provost's faculty achievement award (2014).


Abstracts

November 19th, 2019

Speakers: Altug Karakurt, Liam Madden, Michael McCabe


Title: Multi-Armed Bandit Problems with Queue Dynamics

Speaker: Altug Karakurt

Abstract: The celebrated multi-armed bandit (MAB) problem is a classical formulation of online learning problems. This setting captures the exploration-exploitation trade-off faced by a decision-maker who tries to discover its environment through noisy observations while simultaneously trying to maximize its short-term reward. As a simple model that captures a fundamental phenomenon in decision-making problems, the classical MAB setting has inspired many extensions involving novel challenges such as combinatorial, contextual, active-learning, and multi-objective tasks.

In this talk, we consider a recent extension to MAB that considers queueing dynamics. Unlike the classical setting, the decision-maker does not have an infinite backlog of tasks to act on, but instead the tasks are generated following a known stationary arrival process. This new formulation finds natural applications in various communication problems such as packet routing, resource allocation in multi-channel systems and scheduling. We survey the recent developments in this field, discuss their practical implications and propose future research directions.
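For readers new to bandits, the exploration-exploitation trade-off mentioned above is easiest to see in the classical UCB1 index policy (the baseline setting, not the queueing extension of the talk); the arm means and horizon below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def ucb1(means, horizon):
    """UCB1 on Bernoulli arms: play each arm once, then pick the arm
    maximizing empirical mean + sqrt(2 ln t / pulls)."""
    n_arms = len(means)
    pulls = np.zeros(n_arms)
    rewards = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initialization: try every arm once
        else:
            ucb = rewards / pulls + np.sqrt(2 * np.log(t + 1) / pulls)
            arm = int(np.argmax(ucb))
        pulls[arm] += 1
        rewards[arm] += rng.random() < means[arm]  # Bernoulli reward
    return pulls

means = [0.3, 0.5, 0.8]
pulls = ucb1(means, horizon=5000)
print(pulls)  # the 0.8 arm should receive the large majority of pulls
```

The confidence-radius term shrinks as an arm is pulled more, so suboptimal arms are sampled only logarithmically often; the queueing variants surveyed in the talk change what "reward" and "task" mean, not this basic mechanism.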


Title: Time-varying optimization

Speaker: Liam Madden

Abstract: When an optimization problem is used for decision-making or information-extraction in real-time, there is a chance that algorithms operating within the given computation and communication limitations are unable to converge quickly enough for the problem to be modeled as approximately time-invariant. In this case, we have to use online algorithms as opposed to batch algorithms. We will consider some examples to gain intuition about online algorithms and then we will give a universal lower complexity bound for online first-order methods.
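A minimal sketch of the online setting (all constants here are invented for the example): one gradient step per time step against a quadratic whose minimizer drifts. Unlike the batch case, the error does not vanish; it settles at a level set by the drift rate, which is exactly the regime where lower bounds of the kind mentioned above apply.

```python
import numpy as np

# Track the minimizer of f_t(x) = 0.5 * (x - theta_t)^2, where the
# target theta_t drifts over time; one gradient step per time step.
T, eta = 200, 0.5
x = 0.0
errors = []
for t in range(T):
    theta_t = np.sin(0.05 * t)  # slowly moving optimum (assumed drift model)
    grad = x - theta_t          # gradient of f_t at the current iterate
    x -= eta * grad             # single online gradient step
    errors.append(abs(x - theta_t))

print(max(errors[50:]))  # after a transient, the tracking error stays bounded
```

Here the per-step contraction (factor 1 - eta) fights the per-step drift (at most 0.05), giving a steady tracking error of roughly drift/eta = 0.1 rather than convergence to zero.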


Title: A Brief Overview of Graph Neural Networks

Speaker: Michael McCabe

Abstract: Deep neural networks have contributed to vastly improved performance across a variety of tasks in fields like NLP and computer vision. One property of deep learning that gets credited for this progress is the ability to encode prior knowledge of the problem structure into the architecture of the network. Unfortunately, the convenient low-dimensional grid structure found in these problems, which allows for straightforward learning of locally compact filters, is not present in the wide variety of non-Euclidean data used by other scientific fields, which can often only be represented as graphs or point clouds. Graph Neural Networks, and in particular Graph Convolutional Neural Networks, are a family of architectures developed to address these types of problems. In this presentation, I'll review the major ideas and milestones in the development of modern graph neural networks and highlight several recent applications in the physical sciences.
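To make the "convolution on a graph" idea concrete, here is a single graph-convolution layer in the widely used Kipf-Welling form, sketched with random weights on a toy 4-node path graph (the graph, feature sizes, and weights are invented for the example, and no training is shown).

```python
import numpy as np

rng = np.random.default_rng(2)

def gcn_layer(A, H, W):
    """One graph-convolution layer (Kipf & Welling style):
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    i.e. average each node's features with its neighbors', then apply
    a shared linear map and a nonlinearity."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: a 4-node path, 3 input features, 2 output features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))

H_out = gcn_layer(A, H, W)
print(H_out.shape)  # (4, 2): one feature vector per node
```

The weight matrix W is shared across all nodes, which is the graph analogue of the locally compact, weight-shared filters that grid convolutions exploit.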


Abstracts

December 3rd, 2019

Speakers: Caleb Miller, Ruyu Tan, Wenqi Zhang


Title: Ensemble Transform Particle Filter

Speaker: Caleb Miller

Abstract: I will review the paper, "A Nonparametric Ensemble Transform Method for Bayesian Inference" authored by Sebastian Reich. The ensemble transform particle filter (ETPF) is a sequential Monte Carlo algorithm that uses the idea of optimal transport as a method of resampling. I will review particle filters, the ETPF method and results presented in the paper.


Title: Tracking Epidemics With Google Flu Trends Data and a State-Space SEIR Model

Speaker: Ruyu Tan

Abstract: Infectious disease surveillance has traditionally played a sentinel role in public health preparedness of a pandemic. Here, we embed a classical mathematical epidemiology model [a susceptible-exposed-infected- recovered (SEIR) model] within the state-space framework, thereby extending the SEIR dynamics to allow changes through time. The implementation of this model is based on a particle filtering algorithm, which learns about the epidemic process sequentially through time. We take a close look at the Google Flu Trends data describing the spread of flu in the United States during 2003–2009 and in nine separate U.S. states chosen to represent a wide range of health care and emergency system strengths and weaknesses.
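For readers unfamiliar with the SEIR dynamics underlying the state-space model above, here is a minimal deterministic Euler-step simulation (the rate constants are invented for the example; the talk's model additionally lets such parameters evolve through time and fits them with a particle filter).

```python
import numpy as np

def seir_step(S, E, I, R, beta, sigma, gamma, dt=1.0):
    """One Euler step of the deterministic SEIR model, with compartments
    as population fractions: beta = transmission rate, sigma = 1/latent
    period, gamma = recovery rate."""
    N = S + E + I + R
    new_exposed   = beta * S * I / N
    new_infected  = sigma * E
    new_recovered = gamma * I
    S -= dt * new_exposed
    E += dt * (new_exposed - new_infected)
    I += dt * (new_infected - new_recovered)
    R += dt * new_recovered
    return S, E, I, R

# Start with a small seed of exposed individuals and run 300 days.
state = (0.99, 0.01, 0.0, 0.0)
for _ in range(300):
    state = seir_step(*state, beta=0.5, sigma=0.2, gamma=0.25)

S, E, I, R = state
print(R)  # with beta/gamma = 2, most of the population ends up recovered
```

Embedding this step function in a state-space framework amounts to adding process noise to the states (and possibly the rates) and linking observed case counts, such as Google Flu Trends signals, to I through an observation model.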


Title: A nonstationary and non-Gaussian moving average model for solar irradiance downscaling

Speaker: Wenqi Zhang

Abstract: Renewable energy sources such as wind and solar power are increasing in adoption. Historically, power has flowed from large power plants to customers. Increasing penetration of renewable energies such as solar power from photovoltaic rooftop installations has made the distribution network a two-way-street with power now being generated at the customer level. The incorporation of renewables introduces uncertainty and variability in the power grid which affects grid voltages. Distribution network operation studies are being adapted to include renewables; however, such studies require high-quality data on solar irradiance that adequately reflect realistic climatological and diurnal variabilities. Data from satellite-based products are spatially complete, but temporally coarse, whereas solar irradiances exhibit high-frequency variation at very fine timescales down to minutes. We propose a new stochastic downscaling method from satellite-based 30-minute snapshots of global horizontal irradiance to the one-minute resolution. The first step is a stochastic decision rule for distinguishing between clear and non-clear days. Solar irradiance’s first and second-order structures vary diurnally and seasonally, and our model adapts to such nonstationarity. Moreover, empirical irradiance data exhibits highly non-Gaussian behavior with heavier tails; we develop a nonstationary and non-Gaussian moving average model that is shown to capture realistic solar variability at multiple timescales in our data examples. We also propose a new estimation scheme based on Cholesky factors of empirical autocovariance matrices, bypassing difficult and inaccessible likelihood-based approaches. The approach is illustrated using the National Solar Radiation Database, as well as direct in situ measurements that are part of the University of Oregon’s Solar Radiation Monitoring Laboratory.


Abstracts

December 10th, 2019

Speakers: Amir Ajalloeian, Maddela Avinash, Ayoub Ghriss


Title: Inexact Online Proximal-gradient Method for Time-varying Convex Optimization

Speaker: Amir Ajalloeian

Abstract: This paper considers an online proximal-gradient method to track the minimizers of a composite convex function that may continuously evolve over time. The online proximal-gradient method is "inexact," in the sense that: (i) it relies on approximate first-order information of the smooth component of the cost; and (ii) the proximal operator (with respect to the non-smooth term) may be computed only up to a certain precision. Under suitable assumptions, convergence of the error iterates is established for strongly convex cost functions. On the other hand, the dynamic regret is investigated when the cost is not strongly convex, under the additional assumption that the problem includes feasibility sets that are compact. Bounds are expressed in terms of the cumulative error and the path length of the optimal solutions. This suggests how to allocate resources to strike a balance between performance and precision in the gradient computation and in the proximal operator.
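A minimal (exact-oracle) instance of the online proximal-gradient idea, for intuition only: track a time-varying lasso problem by taking one proximal-gradient step per time step. The problem sizes, drift model, and regularization weight are invented for the example; the paper's contribution concerns what happens when the gradient and the prox are additionally computed inexactly.

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Track x_t minimizing 0.5*||A x - b_t||^2 + lam*||x||_1 as b_t drifts,
# taking a single proximal-gradient step per time step (online ISTA).
n, d, lam = 30, 10, 0.1
A = rng.normal(size=(n, d))
eta = 1.0 / np.linalg.norm(A, 2) ** 2  # step size <= 1/L (L = ||A||_2^2)
x = np.zeros(d)
x_true = np.zeros(d)
x_true[:3] = 1.0                        # sparse ground truth

for t in range(500):
    x_star = x_true * (1.0 + 0.5 * np.sin(0.01 * t))  # moving sparse target
    b_t = A @ x_star + 0.01 * rng.normal(size=n)
    grad = A.T @ (A @ x - b_t)          # gradient of the smooth part
    x = soft_threshold(x - eta * grad, eta * lam)     # prox step

err = np.linalg.norm(x - x_star)
print(err)  # small: the iterate tracks the slowly moving minimizer
```

As in the paper's bounds, the residual tracking error scales with the path length of the optimal solutions (how fast x_star moves per step) relative to the contraction provided by strong convexity.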


Title: Semidefinite Relaxation technique to solve Optimal power flow Problem

Speaker: Maddela Avinash

Abstract: I will discuss using a convex relaxation technique to find the optimal solution of the cost function of a power distribution system. The conventional optimal power flow problem is nonconvex, and the traditional Newton-Raphson method has convergence issues when the system reaches its limits. Semidefinite programming approximates the power flow constraints by enlarging the feasible set to make the problem convex. This convex problem can then be solved to minimize the total cost of generation, transmission, and distribution of electric power.


Title: Hierarchical Deep Reinforcement Learning through Mutual Information Maximization

Speaker: Ayoub Ghriss

Abstract: As in human learning, biological organisms can master tasks from extremely small samples. This suggests that acquiring new skills happens in a hierarchical fashion, starting with simpler tasks that allow the abstraction of newly seen samples. While reinforcement learning is rooted in neuroscience and psychology, Hierarchical Reinforcement Learning (HRL) was developed in the machine learning field by adding abstraction of either states or actions. Temporally abstract actions, our main focus, consist of a top-level (manager) policy and a set of temporally extended low-level (worker) policies. At each step, the manager picks a policy from this set, which runs until a specified set of termination states is reached. Decision making in this hierarchy starts with the top-level policy, which assigns low-level policies to different domains of the state space; each low-level policy operates as any other monolithic policy on its assigned domain. In this talk, we introduce HRL and present our method for learning the hierarchy: we use mutual information maximization to learn the top-level policy and an on-policy method (Trust Region Policy Optimization) to learn the low-level policies.