# Machine Learning Seminar

This is the home page for the Machine Learning Seminar Series, sponsored by the Computer Science Department, at the University of Massachusetts Lowell. The faculty coordinator is Farhad Pourkamali, and please feel free to email him for more information. Unless otherwise noted, the meetings are on Wednesdays at 4:30 PM in DAN 321.

## Fall 2019 Speakers

### Jose Bento, Wednesday, September 4, 2019

**Title:** How should we (correctly) compare multiple graphs?* *

**Abstract: **Graph representations of real-world phenomena are ubiquitous – from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks require a distance (or, conversely, a similarity) measure between graphs. Examples include clustering of graphs and anomaly detection, nearest neighbor and similarity search, pattern recognition, and transfer learning. Such tasks find applications in diverse areas including image processing, chemistry, and social network analysis, to name a few. Intuitively, given two graphs, their distance is a score quantifying their structural differences. A highly desirable property for such a score is that it is a metric, i.e., it is non-negative, symmetric, positive-definite, and, crucially, satisfies the triangle inequality. Metrics exhibit significant computational advantages over non-metrics. For example, operations such as nearest-neighbor search, clustering, outlier detection, and diameter computation have known fast algorithms precisely when performed over objects embedded in a metric space. Unfortunately, algorithms to compute several classic distances between graphs do not scale to large graphs; other distances do not satisfy all of the metric properties: non-negativity, positive definiteness, symmetry, and triangle inequality. The purpose of this tutorial is to go over the recent and expanding literature of graph metric spaces, focusing specifically on tractable metrics. Furthermore, we also explain how to compute the distance between n graphs in a way that the resulting distance satisfy a generalization of the triangle inequality to n elements, and is still tractable.

### Stephen Becker, Wednesday, September 11, 2019

**Title:** Randomized tensor decompositions

**Abstract:** We discuss some of the work from our research group, starting with randomized tensor decompositions. Here, one seeks to decompose a tensor into a CP or Tucker or Interpolative decomposition. A significant challenge is working with streaming data, or datasets that are so large that they can only be loaded into memory piece-by-piece; one such example is data generated by a numerical simulation. By using stochastic "sketches", our methods handle these types of data sets, sometimes with provable guarantees. This is joint work with Osman Malik.

### Marek Petrik, Wednesday, September 18, 2019

**Title:** Algorithms for robust reinforcement learning

**Abstract: **Access to cheap and precise domain simulators is essential for many reinforcement learning algorithms. Domain experts, however, find building such simulators time-consuming, impractical, or even impossible. They would like to optimize their actions using logged datasets, which are typically too small, noisy, or biased. This leads to decisions that appear good in training but fail when deployed. In this talk, I will discuss new algorithms that can, with high probability, make good decisions with small datasets or imprecise domain models. We use Bayesian models to incorporate relevant prior information and robust MDPs to immunize solutions to data errors. The work is motivated by problems in ecology and agriculture, in which making a bad decision is expensive and data is inherently scarce.

### Jeremias Sulam, Wednesday, September 25, 2019

**Title:** Neural networks as sparse pursuit algorithms

**Abstract:** Over the last few decades sparsity has become a driving force in the development of new and better algorithms in signal and image processing. In the context of the late deep learning zenith, a recent pivotal work showed that deep neural networks can be interpreted and analyzed as pursuit algorithms seeking for sparse representations of signals belonging to a multilayer synthesis sparse model. In this talk, we will first review these works and further show that this observation is correct but incomplete, in the sense that such a model provides a symbiotic mixture of coupled synthesis and analysis sparse priors. We will make this observation precise and use it to expand on uniqueness guarantees and stability bounds for the pursuit of multilayer sparse representations. We will then explore a convex relaxation of the resulting pursuit and derive efficient optimization algorithms to approximate its solution. Importantly, we will deploy these algorithms in a supervised learning formulation that generalizes feed-forward convolutional neural networks into recurrent ones, improving their performance without increasing the number of parameters the model.

### Christina Boucher, Wednesday, October 2, 2019

**Title: **Developing computational methods to analyze foodborne illness on a large scale

**Abstract:** The money and time needed to sequence a genome have decreased remarkably in the past decade. With this decrease has come an increase in the number and rate at which sequence data is collected for public sequencing projects. This led to the existence of GenomeTrakr, which is alarge public effort to use genome sequencing for surveillance and detection of outbreaks of foodborne illnesses. This effort includes over 50,000 samples, spanning several species available through this initiative—a number that continues to rise as datasets are continually added. Unfortunately, analysis of this dataset has been limited due to its size. In this talk, I will describe our method for constructing the colored de Bruijn graph for large datasets that is based on partitioning the data into smaller datasets, building the colored de Bruijn graph using a FM-index based representation, and succinctly merging these representations to build a single graph. Finally, I will show its’ capability of building acolored de Bruijn graph for 16,000 strains from GenomeTrakr in a manner that allows it to be updated. Lastly, I conclude by outlining some opportunities for further study in this area.

### Alex Gittens, Wednesday, October 9, 2019

**Title:** Adaptive sketching for the low-rank tensor approximation problem

**Abstract: **Sketching has been used with great success to reduce the cost of linear algebraic computations that arise in scientific computation and machine learning applications. Recently there is interest in extending the use of sketching to non- convex problems; in particular, we consider the application of sketching to low CP- rank approximation of tensors. Prior works have introduced novel forms of sketching that are well-suited to this setting from the computational perspective, but have left open the question of whether the resulting heuristics converge. In this work, we provide generic conditions under which sketched, regularized alternating least squares algorithms guarantee convergence to a stationary point. Our results imply that the sketching rate must vary during the optimization procedure to ensure convergence; this contrasts with current practice, where the sketching rate is fixed a priori. Based on this insight, we introduce CPD-MWU, an algorithm that predicts and tracks the best choice of sketching rate over the course of the optimization, and show empirically that CPD-MWU performs as well or better than the other sketched ALS algorithms while also being much more robust to the sketching rate selection.

### Irene Chen, Wednesday, October 16, 2019

**Title:** Robustly Extracting Medical Knowledge from Electronic Health Records

**Abstract:** Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method to leverage non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend to a larger set of complete patient visits within a hospital system. We conclude with a discussion on how to robustly extract medical knowledge from EHRs.

### Soledad Villar, Wednesday, October 23, 2019

**Title:** Classification-aware dimensionality reduction and genetic marker selection

**Abstract:** Classification and dimensionality reduction are two fundamental problems in mathematical data science. Given labeled points in a high-dimensional vector space, we seek a projection onto a low dimensional subspace that maintains the classification structure of the data. Taking inspiration from large margin nearest neighbor classification, this work introduces SqueezeFit, a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data. We apply this framework to the marker selection problem, where we use linear programming to find the markers in genetic data that exhibit the classification structure of single-cell RNA sequencing data.

### James Murphy, Wednesday, October 30, 2019

**Title:** Data-Dependent Distances for Unsupervised and Active Learning

**Abstract:** Approaches to unsupervised clustering and active learning with data-dependent distances are proposed. By considering metrics derived from data-driven graphs, robustness to noise and class geometry is achieved. The proposed algorithms enjoy theoretical guarantees on flexible data models, and also have quasilinear computational complexity in the number of data points. Applications to a range of real data will be shown, demonstrating the practical applicability of our methods. Portions of this work are joint with Anna Little (Michigan State University) and Mauro Maggioni (Johns Hopkins University).

### Carlotta Domeniconi, Wednesday, November 6, 2019

**Title:** Learning in Complex, Large, and Dynamic Networks

**Abstract:** Community detection and role detection in networks are broad problems with many applications. Existing methods for community discovery frequently make use of edge density and node attributes; however, the methods ultimately have different definitions of community and build strong assumptions about community features into their models. Furthermore, users in online social networks often have very different structural positions which may be attributed to a latent factor: roles. In this talk, I will present recent advances in community and role discovery in networks, and discuss how their interplay can lead to a new definition of community viewed as interactions of roles. This is joint work with Matt Revelle.

### Brian Kulis, Wednesday, November 13, 2019

**Title: ** New Directions in Metric Learning

**Abstract: **Metric learning is a supervised machine learning problem concerned with learning a task-specific distance function from supervised data. It has found numerous applications in problems such as similarity search, clustering, and ranking. Much of the foundational work in this area focused on the class of so-called Mahalanobis metrics, which may be viewed as Euclidean distances after linear transformations of the data. This talk will describe two recent directions in metric learning: deep metric learning and divergence learning. The first replaces the linear transformations with the output of a neural network, while the second considers a broader class than Mahalanobis metrics. I will discuss some of my recent work along both of these fronts, as well as ongoing attempts to combine these approaches together using a novel framework called deep divergences.

### Tina Eliassi-Rad, Wednesday, November 20, 2019

**Title:** Just Machine Learning

**Abstract:** Tom Mitchell in his 1997 Machine Learning textbook defined the well-posed learning problem as follows: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” In this talk, I will discuss current tasks, experiences, and performance measures as they pertain to fairness in machine learning. The most popular task thus far has been risk assessment. For example, Jack’s risk of defaulting on a loan is 8, Jill’s is 2; Ed’s risk of recidivism is 9, Peter’s is 1. We know this task definition comes with impossibility results (e.g., see Kleinberg et al. 2016, Chouldechova 2016). I will highlight new findings in terms of these impossibility results. In addition, most human decision-makers seem to use risk estimates for efficiency purposes and not to make fairer decisions. The task of risk assessment seems to enable efficiency instead of fairness. I will present an alternative task definition whose goal is to provide more context to the human decision-maker. The problems surrounding experience have received the most attention. Joy Buolamwini (MIT Media Lab) refers to these as the “under-sampled majority” problem. The majority of the population is non-white, non-male; however, white males are overrepresented in the training data. Not being properly represented in the training data comes at a cost to the under-sampled majority when machine learning algorithms are used to aid human decision-makers. There are many well-documented incidents here; for example, facial recognition systems have poor performance on dark-skinned people. In terms of performance measures, there are a variety of definitions here from group- to individual-fairness, from anti-classification, to classification parity, to calibration. I will discuss our null model for fairness and demonstrate how to use deviations from this null model to measure favoritism and prejudice in the data.

### Paul Hand, Wednesday, December 4, 2019

**Title:** Neural Network Priors for Inverse Imaging Problems

**Abstract: **Recovering images from very few measurements is an important task in medical, biological, astronomical, and other image problems. Doing so requires assuming a model of what makes some images natural. Such a model is called an image prior. Classical priors such as sparsity have led to the speedup of Magnetic Resonance Imaging by a factor of 10x in certain cases. With the recent developments in machine learning, neural networks have been shown to provide efficient and effective priors for inverse problems arising in imaging. They can, for example, achieve comparable recovery quality as classical sparsity priors with 10x less data. In this talk, we will discuss three different philosophies of neural network image priors. First are generative models which describe desired images as belonging to an explicitly parameterized low-dimensional manifold. Second are invertible priors, which learn a probability distribution over all images. Third are unlearned priors, which include neural networks which require no training at all. For all these methods, we will see how to exploit the prior in a tractable optimization problem. We will present empirical performance and discuss relevant theoretical results and opportunities.