Schedule

To receive announcements about talks, please sign up at the Google Groups page!

Welcome to the SOML / StatOptML (Statistics, Optimization, and Machine Learning) seminar for Spring 2017.

Note that all talks are in the Newton Lab (ECCR 257) unless otherwise specified.

Mar 21, Paul Constantine, Department of Applied Mathematics and Statistics, Colorado School of Mines
Time/Location: 3:30 - 4:30, DLC 170, Discovery Learning Center (Note the room change!!)

Title: Active Subspaces: Emerging Ideas for Dimension Reduction in Computational Science and Engineering Models

Abstract: Scientists and engineers use computer simulations to study relationships between a physical model's inputs and its output predictions. However, thorough parameter studies -- e.g., constructing response surfaces, optimizing or averaging -- are challenging, if not impossible, when the simulation is expensive and the model has several inputs. To enable parameter studies in these cases, the engineer may attempt to reduce the dimension of the model's input parameter space. Active subspaces are part of an emerging set of subspace-based dimension reduction tools that identify important directions in the input parameter space. Constantine will (i) describe computational methods for discovering a model's active subspaces, (ii) propose strategies for exploiting the reduced dimension to enable otherwise infeasible parameter studies, and (iii) review results from several science and engineering applications. For more information, visit: activesubspaces.org.
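
For readers new to the topic, here is a minimal sketch of one common formulation from the active subspaces literature: estimate the average outer product of the model's gradients over sampled inputs, then take its leading eigenvectors as the important directions. It is not necessarily what the talk covers. The function name `active_subspace`, the toy model f(x) = (a . x)^2, and the sample size are assumptions made for illustration.

```python
import numpy as np

def active_subspace(grads, k):
    """Estimate a k-dimensional active subspace from sampled gradients.

    grads: array of shape (num_samples, num_inputs), one gradient per row.
    Returns the eigenvalues (descending) and the k leading eigenvectors.
    """
    # Monte Carlo estimate of C = E[grad f(x) grad f(x)^T]
    C = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    return eigvals[order], eigvecs[:, order[:k]]

# Toy model (hypothetical): f(x) = (a . x)^2 varies only along direction a.
rng = np.random.default_rng(0)
a = np.array([3.0, 0.0, 4.0, 0.0])
X = rng.uniform(-1.0, 1.0, size=(500, 4))       # sampled inputs
grads = 2.0 * (X @ a)[:, None] * a[None, :]     # exact gradient 2 (a . x) a

eigvals, W = active_subspace(grads, k=1)
# Cosine of the angle between the recovered direction and a (sign-insensitive)
alignment = abs(W[:, 0] @ a) / np.linalg.norm(a)
```

In this toy case only one eigenvalue is non-negligible, so a single direction captures all of the model's variation. In practice, a gap in the eigenvalue spectrum is what suggests how many directions to keep.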


Mar 14, Abtin Rahimian, Courant Institute of Mathematical Sciences, New York University
Time/Location: 3:30 - 4:30, DLC 170, Discovery Learning Center (Note the room change!!)

Title: Fast Algorithms for Structured Matrices in Simulations of Physical Systems

Abstract: Real-world complex phenomena are typically characterized by interacting physical processes, uncertain parameters, dynamic boundaries, and close coupling over a wide span of spatial and temporal scales. Predictive computational models of such phenomena inherit these characteristics and require many novel algorithmic components. In this talk, I will identify some common features and challenges in physical modeling, focusing on cellular hemodynamics and cell biomechanics, and outline algorithms that enable predictive simulations of these processes. I will discuss some recent advances in efficiently solving large linear systems arising from the discretization of such models using the Tensor-Train decomposition.


Mar 7, Alexandra Kolla, Assistant Professor of Computer Science, University of Illinois Urbana-Champaign
Time/Location: 3:30 - 4:30, DLC 170, Discovery Learning Center (Note the room change!!)

Title: The Sound of Graphs

Abstract: In this talk, Kolla will discuss major implications that linear algebraic techniques have in understanding and resolving hard computational and graph theoretical questions, as well as unifying various areas of mathematics and computer science. She will focus on two representative examples, stemming from two key areas of computer science, namely computational complexity and robust network design respectively:

- Resolving the infamous Unique Games Conjecture and its implications for the theory of inapproximability.
- Constructing optimal expander graphs, and its implications for fault-tolerant network design and clustering.

Kolla will show how, via the prism of spectral graph theory, those seemingly unrelated questions can be seen to be deeply connected. She will present techniques she developed to tackle both problems, which span a wide range of areas such as linear algebra, convex optimization, group theory, harmonic analysis and probability theory.


Feb 28, Lisa Natale, Department of Ecology & Evolutionary Biology, University of Colorado Boulder
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: How near is near enough?: Proximity-based social networks to aid California condor conservation

Abstract: When the California condor (Gymnogyps californianus) reached a direly low count in the 1980s, the U.S. Fish & Wildlife Service undertook an ardent recovery effort. While the program has been successful in raising numbers, condor populations are far from self-sustaining. Lead poisoning is a potent threat, especially for inland populations. Observed condor behavior suggests that the social structure of populations may influence individual birds' risk for poisoning. We are interested in exploring this potential relationship, and we seek to start by mapping the birds' social network using a rich spatio-temporal dataset containing observations of individual birds across many years. Building from the common assumption made in the construction of ecological proximity-based social networks, namely that co-location implies a social connection between individuals, we ask: on what timescale are two birds observed in the same location considered co-located (and hence socially linked)? I will briefly present two approaches to this problem and hope to receive more ideas to try from those in attendance. 


Feb 28, Joseph Benzaken, Department of Applied Mathematics, University of Colorado Boulder
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: A Parametric Framework for Propagation, Analysis, and Control of Geometric Variation in Engineering Design

Abstract: There is a great need for efficient, robust, and informative design space exploration tools throughout the engineering design cycle. In early-stage design, this tool helps the end-user gain understanding of how changes in design features affect overall system response, narrow down the admissible design space through the application of design constraints, as well as determine optimality criteria for a given design problem. In late-stage design, this tool helps the end-user quantify, propagate, and ultimately control the effect of uncertain geometric variations after a final design has been selected. Despite the obvious need for effective design space exploration tools, current approaches are largely ad-hoc, application-specific, and they require frequent user interaction with both Computer Aided Design (CAD) and Computer Aided Engineering (CAE) software packages.

In this talk, I introduce a parametric framework for propagation, analysis, and control of geometric variation in engineering design. The isogeometric paradigm naturally yields the concept of a "geometric family" where, rather than treating each engineering model as an independent entity, we instead parametrize geometries belonging to the same "family" by a set of shared design parameters. A subset of geometries belonging to the same "family" is selected through a collocation-like method and analyzed. Subsequently, technologies emerging from the uncertainty quantification community provide the necessary tools for representing the entirety of the design space from these collocated samples.

This methodology addresses the issues of manufacturing uncertainty in late-stage design by understanding the effects of perturbations in design parameters on the corresponding solution field, allowing the specification of geometric tolerances such that the resulting product remains in compliance with the predefined system requirements such as allowable stress or maximum displacement. This application is exemplified through the linear-elastic analysis of a square plate with a hole and an L-bracket, where we consider the influence of perturbations in stochastic geometric parameters on the resulting displacement field.


Feb 21, Nathan Heavner, Department of Applied Mathematics, University of Colorado Boulder
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: Randomized Algorithms for Analysis of High Dimensional Data

Abstract: Determining low-rank approximations of matrices is a problem that occurs commonly in areas of data mining and machine learning, such as Principal Component Analysis, Latent Semantic Indexing, and the PageRank algorithm, to name a few. We will discuss a method for computing an approximate singular value decomposition which is faster than classical deterministic methods and executes efficiently in parallel computing environments. Of particular interest is a procedure involving randomized projections which efficiently computes an approximate basis for the numerical range of a matrix. Further applications of the randomized range finder for other problem types in data analysis will be briefly discussed. 
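
For readers unfamiliar with the idea, the following is a minimal sketch of a basic randomized range finder and the approximate SVD built on it, assuming a Gaussian test matrix and a small oversampling parameter. It is not necessarily the variant presented in the talk. The function names, the `oversample` default, and the test matrix are assumptions made for illustration.

```python
import numpy as np

def randomized_range_finder(A, k, oversample=10, seed=0):
    """Return an orthonormal basis Q that approximately spans the range of A."""
    rng = np.random.default_rng(seed)
    # Random Gaussian test matrix with a few extra columns for robustness
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Y = A @ Omega                 # sample the range (column space) of A
    Q, _ = np.linalg.qr(Y)        # orthonormalize the samples
    return Q

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD of A via projection onto the sampled range."""
    Q = randomized_range_finder(A, k, oversample, seed)
    B = Q.T @ A                   # small matrix: (k + oversample) x n
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small               # lift left singular vectors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical test matrix of exact rank 5
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))

U, s, Vt = randomized_svd(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The expensive steps are the products `A @ Omega` and `Q.T @ A`, which are matrix-matrix multiplications and therefore map well onto parallel hardware. The dense SVD is applied only to the small matrix `B`.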


Feb 21, Gregor Robinson, Department of Applied Mathematics, University of Colorado Boulder
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: Pursuing Intuitive Models for a Weather Mystery: MCMC for Selecting High-Dimensional Dynamics

Abstract: The Madden-Julian Oscillation (MJO) is a complex multiscale atmospheric phenomenon that directly impacts the monsoon for a majority of Earth's human population and has cascading effects on global climate. The mechanisms behind its formation are unclear, despite decades of thorough research. Detailed multiphysics weather models capture some of the MJO's dynamics with reasonable accuracy, but are too complicated to isolate the mechanisms responsible. A number of simplified models capture some of the qualitative features, but observation has failed to narrow the variety of simple models that point to dramatically different physics. It is often desirable to confront this kind of model comparison problem with a Bayesian framework. Bayesian model comparison not only allows one to assign relative probabilities to models, but also avoids overfitting by gracefully penalizing models with a large number of parameters. Unfortunately, long sample times hinder the application of Markov Chain Monte Carlo (MCMC) that is generally used to approximate the analytically intractable integrals required of most Bayesian problems. This talk will discuss why we still want to use Bayesian model comparison for the MJO, describe a simple way to parallelize MCMC without loss of accuracy, and touch on some ongoing work to mitigate the difficulties of performing MCMC in extremely high-dimensional search spaces. 


Feb 14, Shudong Hao, Department of Computer Science, University of Colorado Boulder
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: Learning Low-Resource Languages for Emergent Incidents

Abstract: Social networks and social media have become popular platforms for communication. During a disaster, these platforms become even more important, as people send out messages looking for help. For languages that few people outside the region understand and for which few resources are available, however, it is much more difficult to understand the messages and provide assistance quickly. Traditional machine translation systems usually require huge language resources and a long time to train an accurate model. Both requirements limit their usage in this situation. Multilingual topic models can learn consistent representations across languages without using huge datasets, which helps people understand the messages and respond to them quickly.

In this presentation, I will briefly review current multilingual topic models and talk about how to increase the language resources for training them with the help of a native informant in a short time, so that they can be applied to emergent incidents.
 

Feb 7, William Kleiber, Department of Applied Mathematics, University of Colorado Boulder
Time/Location: 2:30 - 3:30, ECAD 100, Clark Conference Room (Engineering Dean's Office) (Note the time AND room change!)

Title: Simulation of High-Resolution Random Fields

Abstract: Simulation of spatial random fields is an essential goal in most geostatistical analyses. This talk will focus on two approaches to generating high resolution simulations of nonstationary Gaussian random fields. The first relies on spatial deformation of a stationary solution that can then be generated using extant approaches such as circulant embedding. The second generates approximate realizations for an arbitrary given covariance function. The procedure rests on sequential conditional simulations based on topologically connected regions in the domain. We suggest initializing this consecutive conditioning approach with an initial coarse resolution simulation over the domain to reduce bias of correlations at long distances. We provide a theoretical justification for the local prediction step. Both approaches are illustrated on climatological datasets.

Jan 31, Mohit Iyyer, Department of Computer Science, University of Maryland, College Park
Time/Location: 2:30 - 3:30, ECCR 257, Newton Lab (Note the time change!)

Title: Deep Learning for Creative Language Understanding

Abstract: Creative language--the sort found in novels, film, and comics--contains a wide range of linguistic phenomena, from increased syntactic complexity (e.g., metaphors, sarcasm) to high-level discourse structures such as narrative and character arcs. In this talk, I explore how we can use deep learning to understand, generate, and answer questions about creative language. In particular, I present neural architectures for two different tasks involving creative language understanding: 1) modeling dynamic fictional relationships in novels and 2) predicting dialogue and artwork from comic book panels. I also propose a method to disentangle an author's writing style from the content of their words by strategically weakening parts of the network architecture. These tasks are motivated through quiz bowl, a trivia game that contains many questions about novels, paintings, and comics, for which I present deep models that are competitive with human players. I conclude by discussing future plans to build more engaging conversational agents by leveraging systems for creative language understanding and question answering.


Jan 24, Satyen Kale, Google Research NYC
Time/Location: 3:30 - 4:30, ECCR 257, Newton Lab

Title: Online Boosting Algorithms

Abstract: We initiate the study of boosting in the online setting, where the task is to convert a "weak" online learner into a "strong" online learner. The notions of weak and strong online learners directly generalize the corresponding notions from standard batch boosting. For the classification setting, we develop two online boosting algorithms. The first algorithm is an online version of boosting-by-majority, and we prove that it is essentially optimal in terms of the number of weak learners and the sample complexity needed to achieve a specified accuracy. The second algorithm is adaptive and parameter-free, albeit not optimal.

For the regression setting, we give an online gradient boosting algorithm which converts a weak online learning algorithm for a base class of regressors into a strong online learning algorithm which works for the linear span of the base class. We also give a simpler boosting algorithm for regression that obtains a strong online learning algorithm which works well for the convex hull of the base class, and prove its optimality.