For students enrolled: paper signup (Google Sheets)

Abstracts

SPEAKER: Manuel Lladser, Associate Professor of Applied Mathematics at the University of Colorado Boulder
TITLE: Approximation of Markovian functionals
In this presentation, we will discuss work in progress to approximate the distribution of a so-called linear functional of the path of a Markov chain. Archetypical examples are sojourn times, such as those encountered in genomic sequence analyses, telecommunication protocols, and inventory models. We will see how one can use low-rank matrices to approximate the distribution of said functionals in the L1 norm (as opposed to the traditional approach based on the L2 norm). The technical motivation for this is that the L1 norm is, up to a constant factor, equal to the "total variation distance," which has a more practical probabilistic interpretation, besides being the standard metric used to analyze Markovian processes. This work is in collaboration with Dr. Barrera from Universidad Adolfo Ibáñez in Chile.

SPEAKER: Daniel Larremore, Assistant Professor, BioFrontiers Institute & Department of Computer Science, University of Colorado Boulder
TITLE: A Physical Model for Efficient Ranking in Networks
We present a principled model and algorithm to infer a hierarchical ranking of nodes in directed networks. Unlike other methods such as minimum violation ranking, it assigns real-valued scores to nodes rather than simply ordinal ranks, and it formalizes the assumption that interactions are more likely to occur between individuals with similar ranks. It provides a natural framework for a statistical significance test that distinguishes whether the inferred hierarchy is due to the network topology or is instead due to random chance, and it can be used to perform inference tasks such as predicting the existence or direction of edges. The ranking is inferred by solving a linear system of equations, which is sparse if the network is; thus the resulting algorithm is extremely efficient and scalable. We illustrate these findings by analyzing real and synthetic data and show that our method outperforms others, in both speed and accuracy, in recovering the underlying ranks and predicting edge directions. This work is a collaboration with Caterina De Bacco and Cris Moore.

SPEAKER: Keith Lindsay, Climate and Global Dynamics Laboratory, University Corporation for Atmospheric Research (UCAR)
TITLE: A Newton-Krylov Solver for Fast Spin-up of Online Ocean Tracers
A challenge that arises when simulating tracers in an ocean model is spinning up the tracers to be in balance with the model's circulation. This spin-up is desirable for clean comparison of the modeled solution to observations, such as nutrient distributions, and for initializing transient experiments, such as those done with coupled climate-carbon models (e.g., bomb radiocarbon). Two aspects of the challenge are the long time scales of ocean ventilation and the short time scales of processes in the upper ocean. We present results here that demonstrate the successful application of a Newton-Krylov based solver to efficiently spin up tracers in online ocean tracer simulations.

SPEAKER: Peter Wills, PhD Student, Department of Applied Mathematics, University of Colorado Boulder
TITLE: Anomaly Detection on Graphs using the Resistance Metric
In the era of big data, algorithms are needed to analyze graphical data (consisting of objects and relationships) that is dynamic (changing in time). In particular, one might wish to know when a change made to a graph significantly affects the flow of information on the graph. We propose an algorithm based on the graph resistance, which is a topologically sensitive measurement of distance between nodes on a graph. We provide quantitative metrics and experimental data indicating the utility of this approach.

SPEAKER: Kathleen Finlinson, PhD Student, Department of Applied Mathematics, University of Colorado Boulder
TITLE: Tunability of Neural Networks
TBD

SPEAKER: Wen Zhou, Assistant Professor of Statistics at Colorado State University, Fort Collins
TITLE: A Nonparametric Procedure to Detect Spurious Discoveries with Sparse Signals
Identifying a subset of response-associated covariates from a large number of candidates has become a fundamental tool for scientific discovery in many fields, particularly in biology, including differential expression analysis in genomics, the genome-wide association study (GWAS) in genetics, and critical transcription factor identification in the Encyclopedia of DNA Elements (ENCODE) project. However, given the high dimensionality and the sparsity of signals in data from such studies, spurious discoveries can easily arise. In addition, the ubiquity of data with mixed types, along with sophisticated dependence structures, greatly limits the applicability of traditional goodness-of-fit based procedures. In this paper, we introduce a statistical measure of the goodness of spurious fit based on the maximum rank correlations among predictors and responses. The proposed statistic imposes no assumptions on the data types and underlying models, and can be regarded as a generalization of the maximum spurious correlation for linear models. We derive the asymptotic distribution of this goodness of spurious fit under very mild assumptions on the associations among predictors and responses. This asymptotic distribution depends on the sample size, the ambient dimension, the number of predictors under study, and the covariance information. We propose a multiplier bootstrap procedure to estimate this distribution and use it as the benchmark to guard against spurious discoveries. It is also applied to variable selection problems for high-dimensional generalized regression.
The theory and method are illustrated by numerical studies. We also apply our method to both GWAS and ENCODE studies to demonstrate that the proposed measure provides a statistical verification of the detected biomarkers in practice and reveals the necessity of a two-stage, or even multi-stage, statistical approach for general genomic or genetic research.

SPEAKER: Jason Dou, PhD Student in Operations Management, University of Colorado Boulder
TITLE: A Least Squares Approach to Appointment Scheduling under Patient Cancellation and No-Show Behavior
Patient cancellation and no-show behavior is a major challenge in appointment scheduling. A stochastic dynamic programming model for appointment scheduling is introduced. We develop a least squares Monte Carlo approach to tackle the problem and show promising performance via extensive numerical experiments.

SPEAKER: Fan You, PhD Student in Operations Management, University of Colorado Boulder
TITLE: An Approximate Dynamic Approach to a Rolling-horizon Appointment Scheduling Problem
We consider a rolling-horizon appointment scheduling problem with multiple patient classes. The problem is formulated as an infinite-horizon discounted-cost Markov decision process. We consider affine and finite-horizon approximations and show that they admit compact representations and can be efficiently solved as small-scale linear programs. A numerical study illustrates the performance of the heuristic control policies based on the approximations.

SPEAKER: Luis Tenorio, Associate Professor of Applied Mathematics and Statistics, Colorado School of Mines
TITLE: Randomization Methods for Large Linear Least-squares and Inverse Problems
I will consider randomized versions of stochastic Newton and stochastic quasi-Newton methods that can be used to solve large linear least-squares and inverse problems where the large data sets present a significant computational burden (e.g., the size may exceed computer memory or data are collected in real time). In the proposed framework, stochasticity is introduced in two different ways as a means to overcome these computational limitations. The randomized recursion defines quasimartingales with provable convergence properties.

SPEAKER: Carlos Martins-Filho, Professor of Economics, University of Colorado Boulder
TITLE: Estimation of a Partially Linear Regression in Triangular Systems
We propose kernel-based estimators for the components of a partially linear regression in a triangular system where endogenous regressors appear both in the linear and nonparametric components of the regression. Compared with other estimators currently available in the literature, e.g., the sieve estimators proposed in Ai and Chen (2003) or Otsu (2011), our estimators have explicit functional form and are much easier to implement. They rely on a set of assumptions introduced by Newey et al. (1999) that characterize what has become known as the "control function" approach for endogeneity in regression. We explore conditional moment restrictions that make this model suitable for additive regression estimation as in Kim et al. (1999) and Manzan and Zerom (2005). We establish consistency and square-root-n asymptotic normality of the estimator for the parameters in the linear component of the model, give a uniform rate of convergence, and establish the asymptotic normality of the estimator of the nonparametric component. In addition, for statistical inference, a consistent estimator for the covariance of the limiting distribution of the parametric estimator is provided.

PRESENTER: Richard Clancy, PhD Student, Department of Applied Mathematics, University of Colorado Boulder
TITLE: Lazy PCA: Even Faster SVD Decomposition Yet Without Agonizing Pain

PRESENTER: Gabriel Ortiz-Pena, PhD Student, Department of Astrophysical and Planetary Sciences, University of Colorado Boulder
TITLE: Understanding Black-box Predictions via Influence Functions

SPEAKER: Yu Du, Assistant Professor of Business Analytics, University of Colorado Denver
TITLE: Selective Linearization for Multiblock Statistical Learning Problems
ABSTRACT: We consider the problem of minimizing a sum of several convex nonsmooth functions.
In this talk, we introduce a new algorithm called the selective linearization method, which iteratively linearizes all but one of the functions and employs simple proximal steps. The algorithm is a form of multiple operator splitting in which the order of processing partial functions is not fixed, but rather determined in the course of calculations. It is one of the first operator-splitting type methods that are globally convergent for an arbitrary number of operators without artificial duplication of variables. The algorithm is a multiblock extension of the alternating linearization (ALIN) method for solving structured nonsmooth convex optimization problems. Global convergence is proved and estimates of the convergence rate are derived. Specifically, under a strong convexity condition, the number of iterations needed to achieve solution accuracy ε is of order O(ln(1/ε)/ε). The convergence rate analysis technique we developed can also be used to derive the rate of convergence of the classical bundle ALIN method, for which no convergence rate estimate has been available so far. We report results of extensive comparison experiments on statistical learning problems such as the large-scale fused lasso regularization problem, the overlapping group lasso problem, and the regularized support vector machine problem. The numerical results demonstrate the efficacy and accuracy of the method.

SPEAKER: McKell Carter, Assistant Professor of Psychology and Neuroscience, University of Colorado Boulder
TITLE: Information Processing Models of Neuroimaging Data
ABSTRACT:
