Past Seminar Events

Winter 2021

The Winter 2021 schedule is here.

Fall 2020

  • Thursday, December 3

    • Speaker: Simon Fontaine

    • Title: Speeding up R with C/C++/Fortran

    • Abstract: R is a great language for statisticians because of its interactivity, simple syntax, and access to the many libraries produced by the community. These features also allow rapid prototyping, testing, and debugging of new methods; however, the short implementation time is often outweighed by long execution times. To produce more efficient code and libraries, R possesses simple interfaces to compiled languages such as C/C++ and Fortran, enabling efficient runtimes with limited additional implementation effort. Indeed, executing the core calculations of a method in a compiled language can produce speed-ups on the order of 10-100x, and up to 1000x in some cases. We will work through a few examples showcasing the efficiency of compiled code relative to interpreted R code, and build a simple R package from scratch to demonstrate how straightforward the process is.

    • Note: The second part of the presentation will be run as a workshop: if you wish to follow along, you will need the R packages “Rcpp” and “devtools” installed. This talk is presented by the UMich Statistics Computing Club.

    • Link: [Slides] [Recording] (UMich Login Required)
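
    • Sketch: The talk's examples use R with Rcpp; as an analogous illustration of the interpreted-versus-compiled gap it targets, this toy Python snippet compares a pure-interpreter loop against the same reduction in NumPy's compiled kernel (timings are machine-dependent).

      ```python
      import time
      import numpy as np

      x = list(range(5_000_000))

      # Interpreted loop: every iteration pays interpreter overhead.
      t0 = time.perf_counter()
      total = 0.0
      for v in x:
          total += v
      t_loop = time.perf_counter() - t0

      # The same reduction in a compiled kernel, as Rcpp provides for R.
      arr = np.asarray(x, dtype=np.float64)
      t0 = time.perf_counter()
      total_np = arr.sum()
      t_numpy = time.perf_counter() - t0

      print(f"loop/NumPy speed-up: {t_loop / t_numpy:.0f}x")
      ```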


  • Thursday, November 19

    • Speaker: Yichao Li (Tsinghua University)

    • Title: Stratification and Optimal Resampling for Sequential Monte Carlo

    • Abstract: Sequential Monte Carlo, also known as particle filtering, has been widely accepted as a powerful computational tool for making inferences about dynamical systems. A key step in sequential Monte Carlo is resampling, which steers the algorithm towards the future dynamics. Several strategies have been proposed and used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling, and optimal transport resampling. We show that, in the one-dimensional case, optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. In the multidimensional case, we improve the variance upper bound of stratified resampling after sorting the particles along the Hilbert curve, and the improved rate is the lowest possible for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that, in the multidimensional case, the mean squared error of sequential quasi-Monte Carlo can be reduced to what is, to the best of our knowledge, the lowest existing rate by implementing Hilbert curve resampling and selecting a specific low-discrepancy set.

    • Link: [Paper] [Slides] [Code] [Recording] (UMich Login Required)
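
    • Sketch: a minimal NumPy illustration (not the authors' code) of multinomial versus stratified resampling; stratified resampling draws one uniform per stratum, which keeps the offspring counts closer to n·w_i. Per the paper, applying it to particles sorted by position is, in one dimension, equivalent to optimal transport resampling.

      ```python
      import numpy as np

      def multinomial_resample(w, rng):
          """Ancestor indices drawn i.i.d. from the normalized weights."""
          n = len(w)
          return rng.choice(n, size=n, p=w)

      def stratified_resample(w, rng):
          """One uniform per stratum [(i-1)/n, i/n): lower resampling variance."""
          n = len(w)
          u = (np.arange(n) + rng.random(n)) / n
          return np.minimum(np.searchsorted(np.cumsum(w), u), n - 1)

      rng = np.random.default_rng(0)
      w = rng.dirichlet(np.ones(8))
      print("expected:   ", np.round(8 * w, 2))
      print("multinomial:", np.bincount(multinomial_resample(w, rng), minlength=8))
      print("stratified: ", np.bincount(stratified_resample(w, rng), minlength=8))
      ```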


  • Thursday, November 12

    • Speaker: Emery Brown (MIT)

    • Title: State-Space Multitaper Time-Frequency Analysis

    • Abstract: Rapid growth in sensor and recording technologies is spurring rapid growth in time series data. Nonstationary and oscillatory structure in time series is commonly analyzed using time-varying spectral methods, but these widely used techniques lack a statistical inference framework applicable to the entire time series. We develop a state-space multitaper (SS-MT) framework for time-varying spectral analysis of nonstationary time series, and we implement the SS-MT spectrogram estimation algorithm efficiently in the frequency domain as parallel 1D complex Kalman filters. In analyses of human EEGs recorded under general anesthesia, the SS-MT paradigm provides enhanced denoising (>10 dB) and spectral resolution relative to standard multitaper methods, a flexible time-domain decomposition of the time series, and a broadly applicable empirical Bayes framework for statistical inference.

    • Link: [Paper] [Recording] (UMich Login Required)
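
    • Sketch: a toy Python version (parameter values illustrative, not from the paper) of one of the parallel 1D complex Kalman filters described above: each frequency's multitaper Fourier coefficient is modeled as a latent complex AR(1) state observed in noise.

      ```python
      import numpy as np

      def complex_kalman_1d(y, a=0.99, sig_v2=0.1, sig_w2=1.0):
          """Scalar Kalman filter for complex observations
              x_t = a * x_{t-1} + v_t,   y_t = x_t + w_t,
          with real variances sig_v2 and sig_w2. One such filter
          runs independently per frequency (and per taper)."""
          x, P = 0j, 1.0
          out = np.empty(len(y), dtype=complex)
          for t, yt in enumerate(y):
              x_pred, P_pred = a * x, a * a * P + sig_v2   # predict
              K = P_pred / (P_pred + sig_w2)               # real-valued gain
              x = x_pred + K * (yt - x_pred)               # complex update
              P = (1.0 - K) * P_pred
              out[t] = x
          return out  # |out|**2 estimates one spectrogram row over time

      rng = np.random.default_rng(0)
      truth = np.cumsum(rng.normal(0, 0.3, 200) + 1j * rng.normal(0, 0.3, 200))
      y = truth + rng.normal(0, 1.0, 200) + 1j * rng.normal(0, 1.0, 200)
      x_hat = complex_kalman_1d(y, a=1.0)  # random-walk state dynamics
      ```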


  • Thursday, November 5, 1:00-2:00 PM ET

    • Speaker: Frank Wood (University of British Columbia)

    • Title: Uncertainty in Artificial Intelligence: Techniques from the Intersection of Deep Learning and Probabilistic Programming

    • Abstract: In this talk I will argue that a proper treatment of uncertainty remains relevant in the quest for artificial intelligence. I will introduce techniques from modern, deep probabilistic programming and argue that the languages and tools implementing these techniques are necessary to conduct the search over the space of inductive biases that include Bayesian computation. Specifically, I will tell a story that ties together my group's recent work on neural processes, amortized variational inference, inference compilation, wake-sleep, auto-encoding sequential Monte Carlo, reinforcement learning as inference, and meta-learning, focusing on the AI-inspired inference problems at their core and showing off results achieved on some pretty cool modern AI and science problems.

    • Alt: Wainwright, Jordan, and Blei were right; variational inference is the way to go. Tran, Goodman, and Murphy were right; deep probabilistic programming is the way to go. LeCun, Hinton, and Bengio were right; deep neural networks do approximate inference. Gelman and Wiecki were right; Stan and PyMC3 are good enough for most small-data Bayesian statistical analyses.

    • Link: [Recording] (UMich Login Required)


  • Thursday, October 29

    • Speaker: Samuel Hansen

    • Title: Library Resources and Tools for Statistics Research

    • Abstract: In this seminar, our Statistics and Mathematics Librarian, Samuel Hansen, will cover the numerous resources available through the University of Michigan Library. Beyond discussing how to access and use these resources (arXiv; MathSciNet; Web of Science; Scopus; Springer, SIAM, AMS, and Taylor & Francis eBooks), they will explore methods for diving deep into the literature of a topic and how best to manage all of the literature and citations you find.

    • Link: [Resource] [Recording] (UMich Login Required)


  • Thursday, October 22

    • Topic: Recent Alumni and Internship Panel (Industry)

    • Panelists: Ziping, Laura, Anwesha, Jack, and Misha

    • Abstract: We're excited to have two current PhD students who have completed summer internships and to welcome back three of our PhD alumni who have gone on to pursue careers in industry. This event will be an opportunity to ask them how they navigated the application process, how their experience prepared them for their careers, and their reflections on working in different industries since graduation. We hope this will help our students make more informed decisions, and we encourage students from all years to come with any questions they may have for our panelists.


  • Thursday, October 15, 1:00-2:00 PM ET

    • Topic: Recent Alumni Panel (Academia)

    • Panelists: Yuqi Gu, Tianxi Li, Naveen N. Narisetty, Jingshen Wang

    • Abstract: We're excited to welcome back four PhD alumni from our department who have gone on to pursue careers in academia. This event will be an opportunity to ask them about how their time at Michigan prepared them for their careers, how they navigated the application process, and what it's like to be a postdoc or professor. We encourage students from all years to come with any questions they may have for our panelists.


  • Thursday, October 8

    • Title: Git's a Wonderful Life

    • Speaker: Dan Kessler

    • Abstract: Whether you're analyzing data, developing an R package, or writing your dissertation, chances are you're editing text files. Effective version control of these files can protect against data loss and mistakes, improve your efficiency, help you isolate bugs, and facilitate collaboration with others. Git is a leading tool for version control, but many statisticians have at best a shallow understanding of how it works. In this talk, I'll present Git in a way that you may not have seen before: rather than giving you a cookbook of steps to memorize, I'll explicate some of its underlying architecture. This will help you understand how to use Git effectively in a variety of situations. The presentation is designed to be hands-on, and frequent questions from the audience are encouraged. Please ensure that you have Git installed on your computer so that you can follow along.

    • Note: This talk is presented by the UMich Statistics Computing Club. Learn more at our website or talk to us on our MS Teams channel.

    • Link: [Slides] [Recording] (UMich Login Required)
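
    • Sketch: one piece of the architecture the talk explicates is Git's content-addressable object store; this minimal Python snippet shows how a blob's ID is derived purely from its bytes.

      ```python
      import hashlib

      def git_blob_id(content: bytes) -> str:
          """A Git blob's ID is the SHA-1 of a header 'blob <size>\\0'
          followed by the raw file bytes, so identical content always
          maps to the same object name."""
          return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

      # Agrees with `git hash-object` on the same bytes:
      print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
      ```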


  • Thursday, October 1

    • Speaker: Karl Rohe (University of Wisconsin–Madison)

    • Title: Vintage Factor Analysis with Varimax Performs Statistical Inference with an application to “elite-sphere” Twitter

    • Abstract: This talk has two parts: the first is methodological, the second applied.

      • Part 1: Factor Analysis is nearly a century old and remains popular today with practitioners. A key step, the factor rotation, is historically controversial because it appears to be unidentifiable. This controversy goes back as far as Charles Spearman. The unidentifiability is still reported in all modern multivariate textbooks. Part 1 of this talk will overturn this controversy and provide a positive theory for PCA with a varimax rotation. Just as sparsity helps to find a solution in p>n regression, we show that sparsity resolves the rotational invariance of factor analysis. PCA + varimax is fast to compute and provides a unified spectral estimation strategy for Stochastic Blockmodels, topic models (LDA), and nonnegative matrix factorization. Moreover, the estimator is consistent for an even broader class of models and the old factor analysis diagnostics (which have been used for nearly a century) assess the identifiability.

      • Part 2: This part will discuss our lab's website, murmuration.wisc.edu, which tracks 24 flocks of “elite-sphere” Twitter users who discuss politics and current events. These flocks were computed with PCA + varimax applied to a ~400k x ~1M matrix recording whom “elite-sphere” Twitter users follow. We found flocks representing Bernie Bros, White Nationalists, Academics, Media, and others. Every day, we analyze ~500k tweets to document the key news events, list the flocks that discussed each event, and describe how each flock discussed it.

    • Link: [Slides] [Recording]
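
    • Sketch: a minimal NumPy version (not the speaker's code) of the PCA + varimax pipeline from Part 1: take the top-k left singular vectors, then find the orthogonal rotation that maximizes the variance of their squared entries (Kaiser's varimax, via the standard SVD-based iteration).

      ```python
      import numpy as np

      def varimax(U, n_iter=100, tol=1e-8):
          """Return U @ R, where the orthogonal R maximizes the
          variance of the squared entries of U @ R."""
          n, k = U.shape
          R, prev = np.eye(k), 0.0
          for _ in range(n_iter):
              Z = U @ R
              B = U.T @ (Z**3 - Z * ((Z**2).sum(axis=0) / n))
              u, s, vt = np.linalg.svd(B)
              R = u @ vt
              if s.sum() < prev * (1.0 + tol):
                  break
              prev = s.sum()
          return U @ R

      # PCA + varimax on a toy matrix; under sparse factor models the
      # rotated coordinates concentrate, making blocks/topics readable.
      rng = np.random.default_rng(0)
      A = rng.random((300, 40))
      U = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)[0][:, :5]
      Z = varimax(U)
      ```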


  • Thursday, September 24

    • Speaker: Ben Dai (University of Minnesota)

    • Title: Embedding Learning

    • Abstract: Numerical embedding has become a standard technique for processing and analyzing unstructured data that cannot be expressed in a predefined fashion. It stores the main characteristics of the data by mapping them onto a numerical vector. An embedding is often unsupervised and constructed by transfer learning from large-scale unannotated data; given an embedding, a downstream learning method, referred to as a two-stage method, becomes applicable to unstructured data. In this article, we introduce a novel framework of embedding learning that delivers higher learning accuracy than the two-stage method while identifying an optimal learning-adaptive embedding. In particular, we propose a concept of U-minimal sufficient learning-adaptive embeddings, based on which we seek an optimal one that maximizes the learning accuracy subject to an embedding constraint. When specializing the general framework to classification, we derive a graph embedding classifier based on a hyperlink tensor representing multiple hypergraphs, directed or undirected, that characterize multi-way relations in unstructured data. Numerically, we design algorithms based on blockwise coordinate descent and projected gradient descent to implement linear and feed-forward neural network classifiers, respectively. Theoretically, we establish a learning theory quantifying the generalization error of the proposed method. Moreover, we show, in linear regression, that the one-hot encoder is preferable among two-stage methods, yet its dimension restriction hinders its predictive performance. For the graph embedding classifier, the generalization error matches the standard fast rate or the parametric rate for linear or nonlinear classification, respectively. Finally, we demonstrate the utility of the classifiers on two benchmarks in grammatical classification and sentiment analysis.

    • Link: [Paper] [Slides] [Recording] (UMich Login Required)
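
    • Sketch: a toy NumPy rendering of the "two-stage method" the abstract contrasts against: stage one fixes an embedding (here the one-hot encoder the abstract analyzes), stage two fits the downstream model on the embedded data. The proposed framework instead adapts the embedding to the downstream learning task.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      cats = rng.integers(0, 4, size=200)            # toy "unstructured" tokens
      means = np.array([0.5, -1.0, 2.0, 0.0])
      y = means[cats] + rng.normal(0, 0.1, size=200)

      E = np.eye(4)[cats]                            # stage 1: one-hot embedding
      beta, *_ = np.linalg.lstsq(E, y, rcond=None)   # stage 2: linear regression
      print(np.round(beta, 2))                       # recovers the category means
      ```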


  • Thursday, September 17

    • Topic: Statistics in the Community: Community-University Partnerships Fostering Data Science Education

    • Presenter: Stephen Salerno (STATCOM; Department of Biostatistics, University of Michigan)

    • Abstract: Conceived in 2001 at Purdue University and chartered nationally in 2006 through an ASA Member Initiatives Grant, Statistics in the Community (STATCOM) is a volunteer program that offers free statistical consulting services to non-profit community and governmental organizations in the areas of data organization, analysis, and interpretation. As a founding chapter, STATCOM at the University of Michigan is a prime example of community engagement through data science education.

      Since its charter, STATCOM at the University of Michigan has had eight student presidents, three faculty co-advisors, several faculty project advisors, and over 100 volunteers. Our volunteers are graduate students in the departments of Biostatistics, Statistics, Survey Methodology, and others, who are passionate about serving the community. This talk will elaborate on how STATCOM's model of university-community partnership benefits both our students and our partners.

      Our clients benefit from assessments of areas of unmet need, answers to important operational questions, and thorough program evaluations. Our students work with our partners' data and learn to communicate statistical concepts and key results to stakeholders, skills that supplement their coursework and carry forward into their careers. Most importantly, this hands-on experience positively impacts our local community and provides a service unique to the strengths of our student volunteers.