Fall 2021

Statistics, Optimization and Machine Learning Seminar

Quick info

We meet Tuesdays 3:30-4:30pm MT in ECCR 257 (Newton Lab). To sign up for the mailing list and receive announcements about talks (you do not need to be enrolled in the seminar), visit our Google Groups website and add yourself to the group.

The seminar is organized by Professors Rafael Frongillo (CS Dept) and Stephen Becker (Applied Math Dept).

Fall 2021 Update: the seminar is happening!

The seminar was canceled Fall 2020 and Spring 2021, but now it is back for Fall 2021. We anticipate running in hybrid mode for most talks: there will be an audience in person, but anyone wishing to join remotely can do so. Please join the Google Groups list, as that is where we will distribute remote access information (we'll use Zoom).

The seminar is always open to the public!

You do not have to be enrolled in the class, but if you are enrolled, then to earn a passing grade you must:

    1. attend all the talks (missing an occasional talk is permissible if you have a valid reason; just email the instructors), and

    2. give a ~20 min presentation at some point in the semester, either on your own research or on a paper.

List of Talks

  • Aug 24. No talk, just organizational meeting for students who are enrolled

  • Aug 31. Kevin Doherty

    • Title: Derivative Free Optimization with Interpolation and Trust Regions: Efficient Use of Zeroth Order Information

  • Sep 7. Paper discussion

    • Title: Fast Rates for Structured Prediction

  • Sep 14. Stephen Becker, Assoc. Prof. of Applied Math, CU Boulder

    • Title: Algorithmic stability for generalization guarantees in machine learning

  • Sep 21. Gabriel P Andrade

    • Title: Learning in Matrix Games can be Arbitrarily Complex

  • Sep 28. Anish Thilagar

    • Title: Efficient Competitions and Online Learning with Strategic Forecasters

  • Oct 5. Lijun Chen

    • Title: Distributed Control and Learning of Networked Dynamical Systems

  • Oct 12. Samy Wu Fung, Assistant Professor at the Colorado School of Mines, Dept of Applied Math & Stat.

    • Title: Efficient Training of Infinite-depth Neural Networks via Jacobian-free Backpropagation

  • Oct 19. Gongguo Tang, Associate Professor in Electrical Engineering at CU Boulder

    • Title: Geometry and algorithm for some nonconvex optimizations

  • Oct 26. Ishan Kumar

    • Title: Machine Learning in Sports

  • Nov 2. Jessie Finocchiaro

    • Title: Designing loss functions for general prediction tasks

  • Nov 9. Student presentations

  • Nov 16. Student presentations

  • Nov 23. Thanksgiving holiday week. No talk

  • Nov 30. Aaditya Ramdas, Assistant Professor at CMU Department of Statistics and Machine Learning

    • Title: Comparing sequential forecasters

  • Dec 7. TBD

Abstracts

  • Aug 24. No talk, just organizational meeting for students who are enrolled

  • Aug 31. Kevin Doherty

Title: Derivative Free Optimization with Interpolation and Trust Regions: Efficient Use of Zeroth Order Information

Abstract: Derivative-free optimization via interpolation-based trust-region methods is a type of zeroth-order optimization that aims to optimize a black-box function using as few function evaluations as possible. These methods are useful when derivatives are either unavailable or expensive to compute. Instead of derivatives, these methods build an interpolant that serves as a model of the black-box function and refine the model as new function evaluations become available. In this way, they can efficiently optimize an objective that is expensive to evaluate, as in simulation optimization and PDE-constrained optimization. We will discuss some of the challenges of scaling this method to higher dimensions, as well as techniques to overcome poor matrix conditioning and the large memory footprint of the resulting Vandermonde system.

Biography: Kevin Doherty is a second-year Ph.D. student in Applied Mathematics at the University of Colorado Boulder. He is currently working with Stephen Becker on optimization methods that aim to make efficient use of available information. Before coming to the University of Colorado, Kevin received a master's degree in Mathematics at George Washington University. While earning his master's, Kevin was employed as a signal processing engineer and worked with radar and communications technologies.
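
For readers new to model-based derivative-free optimization, the following is a hypothetical minimal sketch of the interpolate-then-trust-region loop described in the abstract, using a linear model rather than the quadratic interpolants discussed in the talk. Function names and constants are illustrative; production solvers manage the geometry and conditioning of the interpolation set far more carefully.

    import numpy as np

    def dfo_trust_region(f, x0, radius=1.0, max_evals=200, tol=1e-6):
        # Model-based derivative-free trust-region loop with a *linear* interpolation model.
        n = len(x0)
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        evals = 1
        while evals + n < max_evals and radius > tol:
            # Interpolation set: n points displaced from x along the coordinate axes.
            Y = radius * np.eye(n)
            F = np.array([f(x + y) for y in Y])
            evals += n
            # Linear model m(x + s) = fx + g @ s, determined by the conditions Y g = F - fx.
            g = np.linalg.solve(Y, F - fx)
            # Trust-region step: minimize the linear model over the ball ||s|| <= radius.
            s = -radius * g / (np.linalg.norm(g) + 1e-12)
            f_new = f(x + s)
            evals += 1
            rho = (fx - f_new) / max(-g @ s, 1e-12)   # actual vs. predicted decrease
            if rho > 0.1:        # accept the step; expand the radius on very good agreement
                x, fx = x + s, f_new
                radius *= 2.0 if rho > 0.75 else 1.0
            else:                # reject the step and shrink the trust region
                radius *= 0.5
        return x, fx

    # Example: minimize a shifted quadratic treated as a black box.
    x_opt, f_opt = dfo_trust_region(lambda z: np.sum((z - 1.0) ** 2), np.zeros(5))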


  • Sep 7. Paper discussion

    • Title: Fast Rates for Structured Prediction

  • Sep 14. Stephen Becker

Title: Algorithmic stability for generalization guarantees in machine learning

Abstract: Inspired by the practical success of deep learning, the broader math community has been energized recently to find theoretical justification for deep learning methods. There is a large amount of theory from the computer science community, dating to the 1980s and earlier, but usually the quantitative guarantees are too loose to be helpful in practice, and it is rare that theory can predict something useful (such as at which iteration to stop early in order to prevent over-fitting).


Many of these theories are less well-known inside applied math, so we briefly review essential results before focusing on the notion of algorithmic stability, popularized in the early 2000s, which is an alternative to the more mainstream VC dimension approach, and is one avenue that might give sharper theoretical guarantees. Algorithmic stability is appealing to applied mathematicians, and in particular analysts, since a lot of the technical work is similar to analysis used for convergence proofs.


We give an overview of the fundamental results of algorithmic stability, focusing on the stochastic gradient descent (SGD) method in the context of a nonconvex loss function, and give the latest state-of-the-art bounds, including some of our own work (https://arxiv.org/abs/2006.05610, joint with L. Madden and E. Dall'Anese), which is one of the first results to suggest when to perform early stopping.
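
To make the notion of algorithmic stability concrete, here is a small illustrative experiment (not from the paper above): run the same SGD procedure on two training sets that differ in a single example and measure how much the learned model's prediction on a fresh point changes. Stability bounds control exactly this kind of gap, and a small gap translates into a generalization guarantee.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgd_least_squares(X, y, epochs=5, lr=0.01):
        # Plain SGD on the squared loss, with the same fixed sampling order for every run.
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in range(len(y)):
                w -= lr * (X[i] @ w - y[i]) * X[i]
        return w

    # Two "neighboring" training sets that differ in exactly one example.
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    X2, y2 = X.copy(), y.copy()
    X2[0], y2[0] = rng.normal(size=d), rng.normal()

    w1, w2 = sgd_least_squares(X, y), sgd_least_squares(X2, y2)
    x_fresh = rng.normal(size=d)
    print("prediction gap on a fresh point:", abs(x_fresh @ (w1 - w2)))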


  • Sep 21. Gabriel P Andrade

Title: Learning in Matrix Games can be Arbitrarily Complex

Abstract: Many multi-agent systems with strategic interactions have their desired functionality encoded as the Nash equilibrium of a game, e.g. machine learning architectures such as Generative Adversarial Networks. Directly computing a Nash equilibrium of these games is often impractical or impossible in practice, which has led to the development of numerous learning algorithms with the goal of iteratively converging on a Nash equilibrium. Unfortunately, the dynamics generated by the learning process can be very intricate, and instances that fail to converge become hard to interpret. In this paper we show that, in a strong sense, this dynamic complexity is inherent to games. Specifically, we prove that replicator dynamics, the continuous-time analogue of Multiplicative Weights Update, even when applied to a very restricted class of games, known as finite matrix games, is rich enough to approximate arbitrary dynamical systems. In the context of machine learning, our results are positive in the sense that they show the nearly boundless dynamic modelling capabilities of current machine learning practices, but also negative in implying that these capabilities may come at the cost of interpretability. As a concrete example, we show how replicator dynamics can effectively reproduce the well-known strange attractor of Lorenz dynamics (the "butterfly effect") while achieving no regret.
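
For context, here is a minimal sketch of single-population replicator dynamics for a matrix game, integrated with a simple forward-Euler scheme. This only illustrates the dynamics named in the abstract; the specific games and the embedding used in the paper to reproduce Lorenz dynamics are not shown.

    import numpy as np

    def replicator(A, x0, dt=1e-3, steps=20000):
        # Single-population replicator dynamics for payoff matrix A:
        #   dx_i/dt = x_i * ((A x)_i - x^T A x),
        # integrated with forward Euler; x is kept on the probability simplex.
        x = np.asarray(x0, dtype=float)
        traj = [x.copy()]
        for _ in range(steps):
            Ax = A @ x
            x = x + dt * x * (Ax - x @ Ax)
            x = np.clip(x, 0.0, None)
            x /= x.sum()
            traj.append(x.copy())
        return np.array(traj)

    # Rock-paper-scissors: the dynamics cycle around the mixed equilibrium rather than converge.
    A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    trajectory = replicator(A, [0.5, 0.3, 0.2])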


  • Sep 28. Anish Thilagar

Title: Efficient Competitions and Online Learning with Strategic Forecasters

Abstract: Winner-take-all competitions in forecasting and machine learning suffer from distorted incentives. Witkowski et al. (2018) identified this problem and proposed ELF, a truthful mechanism to select a winner. We show that, from a pool of n forecasters, ELF requires Θ(n log n) events or test data points to select a near-optimal forecaster with high probability. We then show that standard online learning algorithms select an ε-optimal forecaster using only O(log(n)/ε²) events, by way of a strong approximate-truthfulness guarantee. This bound matches the best possible even in the nonstrategic setting. We then apply these mechanisms to obtain the first no-regret guarantee for non-myopic strategic experts.
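
As a point of reference for the online-learning side of the result, here is a minimal sketch (not the ELF mechanism, and ignoring all strategic and truthfulness considerations) of selecting a forecaster with the standard multiplicative-weights (Hedge) algorithm. The function name and learning rate are illustrative.

    import numpy as np

    def hedge_select(losses, eta=0.5):
        # Multiplicative-weights (Hedge) over n forecasters.
        # losses: array of shape (T, n); losses[t, i] in [0, 1] is the scoring-rule
        # loss of forecaster i on event t.  Returns the final weight vector; the
        # highest-weight forecaster is the selection.
        T, n = losses.shape
        w = np.ones(n) / n
        for t in range(T):
            w = w * np.exp(-eta * losses[t])
            w = w / w.sum()
        return w

    # Example with synthetic losses: forecaster 0 is slightly better on average.
    rng = np.random.default_rng(0)
    losses = rng.random((500, 4))
    losses[:, 0] *= 0.8
    print("selected forecaster:", np.argmax(hedge_select(losses)))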


  • Oct 5. Lijun Chen

Title: Distributed Control and Learning of Networked Dynamical Systems

Abstract: Complex networked systems like the power grid, the Internet, and networked self-driving vehicles consist of interconnected, active, and possibly self-interested components, operate with incomplete information and in uncertain environments, and must achieve certain network-wide objectives or collective behaviors. Problems associated with such systems are typically large, computationally hard, and require distributed solutions; yet they are also very structured and have features that can be exploited by appropriate computational methods. In this talk, I will present our work on developing optimization approaches for such problems, bringing together optimization, systems theory, and domain-specific knowledge to explore the structure of the underlying problems and leverage it for the principled design of distributed control and learning architectures and algorithms. In particular, I will discuss how to exploit the dynamics of networked systems for real-time distributed optimization, the factorized structure of networked systems for distributed and structured learning, and the convexity of the underlying problems for robust and scalable design under limited cyber resources. I will conclude the talk with a brief discussion of a few future directions such as distributed autonomy for networked multiagent systems.


  • Oct 12. Samy Wu Fung, Assistant Professor at the Colorado School of Mines, Dept of Applied Math & Stat.

Title: Efficient Training of Infinite-depth Neural Networks via Jacobian-free Backpropagation

Abstract: A promising trend in deep learning replaces fixed depth models by approximations of the limit as network depth approaches infinity. This approach uses a portion of network weights to prescribe behavior by defining a limit condition. This makes network depth implicit, varying based on the provided data and an error tolerance. Moreover, existing implicit models can be implemented and trained with fixed memory costs in exchange for additional computational costs. In particular, backpropagation through implicit depth models requires solving a Jacobian-based equation arising from the implicit function theorem. We propose a new Jacobian-free backpropagation (JFB) scheme that circumvents the need to solve Jacobian-based equations while maintaining fixed memory costs. This makes implicit depth models much cheaper to train and easy to implement. Numerical experiments show JFB is more computationally efficient while maintaining competitive accuracy for classification tasks.
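
A rough illustration of the idea, assuming a simple tanh fixed-point layer in PyTorch (class name and hyperparameters are illustrative, not the authors' code): run the fixed-point iteration without gradient tracking, then backpropagate through a single application of the layer map, which stands in for the Jacobian-based linear solve.

    import torch
    import torch.nn as nn

    class JFBImplicitLayer(nn.Module):
        # Implicit layer whose output solves the fixed-point equation z = tanh(W z + U x).
        def __init__(self, dim_in, dim_hidden, max_iter=50, tol=1e-4):
            super().__init__()
            self.W = nn.Linear(dim_hidden, dim_hidden, bias=False)
            self.U = nn.Linear(dim_in, dim_hidden)
            self.max_iter, self.tol = max_iter, tol

        def forward(self, x):
            # Forward pass: fixed-point iteration with gradient tracking turned off,
            # so memory cost does not grow with the number of iterations.
            z = torch.zeros(x.shape[0], self.W.in_features, device=x.device)
            with torch.no_grad():
                for _ in range(self.max_iter):
                    z_next = torch.tanh(self.W(z) + self.U(x))
                    if (z_next - z).norm() < self.tol:
                        z = z_next
                        break
                    z = z_next
            # Jacobian-free backward pass: backpropagate through one application of the
            # layer map, treating the converged z as a constant, instead of solving the
            # Jacobian-based equation given by the implicit function theorem.
            return torch.tanh(self.W(z.detach()) + self.U(x))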


  • Oct 19. Gongguo Tang, CU Boulder ECEE Dept, Associate Prof.

Title: Geometry and algorithm for some nonconvex optimizations

Abstract: Great progress has been made in the past few years in our understanding of nonconvex optimization. In this talk, I will share with you three of our works in this direction. In the first, we study low-rank matrix optimization with a general objective function, solved by factoring the matrix variable as the product of two smaller matrices. We characterize the global optimization geometry of the nonconvex factored problem and show that the corresponding objective function satisfies the robust strict saddle property as long as the original objective function satisfies restricted strong convexity and smoothness properties. In the second, we recognized that many machine learning problems involve minimizing empirical risk functions with well-behaved population risks. Instead of analyzing the nonconvex empirical risk directly, we first study the landscape of the corresponding population risk, which is usually easier to characterize, and then build a connection between the landscape of the empirical risk and its population risk. Lastly, we study the convergence behavior of alternating minimization, a class of algorithms widely used to solve the aforementioned nonconvex optimizations. We show that under mild assumptions on the (nonconvex) objective functions, these algorithms avoid strict saddles and converge to second-order optimal solutions almost surely from random initialization.
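
As a concrete instance of the algorithm class mentioned at the end of the abstract, here is a minimal alternating-minimization sketch for the simplest factored low-rank problem, minimizing ||M - U V^T||_F^2 over U and V, where each half-step is a least-squares solve. This is illustrative only and does not reproduce the saddle-avoidance analysis from the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    def alt_min_lowrank(M, r, iters=100):
        # Alternating minimization for  min_{U, V} ||M - U V^T||_F^2,  U: n x r, V: m x r.
        # Each half-step is an ordinary least-squares solve; initialization is random.
        n, m = M.shape
        U = rng.normal(size=(n, r))
        V = rng.normal(size=(m, r))
        for _ in range(iters):
            U = M @ V @ np.linalg.pinv(V.T @ V)      # fix V, solve for U
            V = M.T @ U @ np.linalg.pinv(U.T @ U)    # fix U, solve for V
        return U, V

    # Sanity check on an exactly rank-3 matrix.
    M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
    U, V = alt_min_lowrank(M, r=3)
    print("relative error:", np.linalg.norm(M - U @ V.T) / np.linalg.norm(M))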


  • Oct 26. Ishan Kumar

Title: Machine Learning in Sports


  • Nov 2. Jessie Finocchiaro

Title: Designing loss functions for general prediction tasks

Abstract: Supervised machine learning algorithms often learn to predict by minimizing loss functions measuring error over a set of training data. The choice of loss function should (a) be easy to optimize (convex), (b) correspond to what one wants to predict (consistent), and (c) be efficient (have low prediction dimension). This presentation will introduce the notion of property elicitation and use it as a tool to design such losses for general (finite) prediction tasks via the embeddings framework. We introduce this framework as a way of constructing convex, consistent surrogate losses for discrete prediction tasks, such as high-confidence classifiers, rankings, top-k prediction, and structured prediction problems. Given such a task, we will finally introduce a tool to give lower bounds on the prediction dimension of these embeddings. No prior knowledge is necessary.
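
As a warm-up, here is a standard textbook example of a convex, consistent surrogate (not the embeddings framework from the talk): the hinge loss for binary 0-1 loss. Consistency means that minimizing the expected surrogate and then applying a link (here, the sign) recovers the 0-1-optimal report. A quick numerical check:

    import numpy as np

    def expected_hinge(u, p):
        # Expected hinge loss of report u when P(Y = +1) = p and Y takes values in {-1, +1}.
        return p * np.maximum(0.0, 1.0 - u) + (1 - p) * np.maximum(0.0, 1.0 + u)

    us = np.linspace(-2, 2, 4001)
    for p in [0.2, 0.8]:
        u_star = us[np.argmin(expected_hinge(us, p))]
        # sign(u_star) agrees with the 0-1-optimal report sign(2p - 1): consistency.
        print(f"p = {p}: surrogate minimizer {u_star:+.2f}, linked prediction {np.sign(u_star):+.0f}")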


  • Nov 9. Student presentations

  • Nov 16. Student presentations

  • Nov 23. Thanksgiving holiday week. No talk


  • Nov 30. Aaditya Ramdas, CMU Department of Statistics and Machine Learning, Assistant Professor

Title: Comparing sequential forecasters

Abstract: Consider two or more forecasters, each making a sequence of predictions for different events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding all unverifiable assumptions on how the forecasts or outcomes were generated?

This work presents a novel and rigorous answer to this question. We design a sequential inference procedure for estimating the time-varying difference in forecast quality as measured by a relatively large class of proper scoring rules. The resulting confidence intervals can be continuously monitored to yield statistically valid comparisons at arbitrary data-dependent stopping times ("anytime-valid"); this is enabled by employing variance-adaptive supermartingales.

Motivated by Shafer and Vovk's game-theoretic probability, our coverage guarantees are also distribution-free, in the sense that they make no distributional assumptions whatsoever on the forecasts or outcomes. We demonstrate their effectiveness by comparing forecasts on Major League Baseball (MLB) games and statistical postprocessing methods for ensemble weather forecasts.
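
For intuition only, here is a small sketch of tracking the running difference in a proper scoring rule (the Brier score) between two forecasters on simulated data. The anytime-valid confidence sequences from the talk, built from variance-adaptive supermartingales, are not reproduced here, and all quantities below are synthetic.

    import numpy as np

    rng = np.random.default_rng(1)

    def brier(p, y):
        # Brier score, a proper scoring rule for probability forecasts of binary events.
        return (p - y) ** 2

    # Simulated data: two forecasters predicting T binary events.
    T = 1000
    truth = 0.2 + 0.6 * rng.random(T)                          # latent event probabilities
    y = rng.binomial(1, truth)                                  # realized outcomes
    p_A = np.clip(truth + 0.05 * rng.normal(size=T), 0, 1)      # sharper forecaster
    p_B = np.clip(truth + 0.20 * rng.normal(size=T), 0, 1)      # noisier forecaster

    # Running average of the score difference; positive values favor forecaster A.
    diff = brier(p_B, y) - brier(p_A, y)
    running_avg = np.cumsum(diff) / np.arange(1, T + 1)
    print("average score gap after T events:", running_avg[-1])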


  • Dec 7. TBD