Keynote speakers of the JEF'2022 Conference (in alphabetical order)

Professor Matias D. Cattaneo

Department of Economics, Princeton University, USA.

Short Bio: Matias D. Cattaneo is a Professor of Operations Research and Financial Engineering (ORFE) at Princeton University, where he is also an Associated Faculty in the Department of Economics, the Center for Statistics and Machine Learning (CSML), and the Program in Latin American Studies (PLAS). His research spans econometrics, statistics, data science and decision science, with particular interests in program evaluation and causal inference. Most of his work is interdisciplinary and motivated by quantitative problems in the social, behavioral, and biomedical sciences. As part of his main research agenda, Matias has developed novel semi-/non-parametric, high-dimensional, and machine learning inference procedures with demonstrably superior robustness to tuning parameter and other implementation choices. Matias is currently an Amazon Scholar, and he has advised several governmental, multilateral, non-profit, and for-profit organizations around the world. He also serves on the editorial boards of the Journal of the American Statistical Association, Econometrica, Operations Research, the Journal of Business & Economic Statistics, Econometric Theory, the Econometrics Journal, and the Journal of Causal Inference.

Title: On Binscatter

Summary: Binscatter, or a binned scatter plot, is a very popular tool in applied microeconomics. It provides a flexible, yet parsimonious way of visualizing and summarizing mean, quantile, and other nonparametric regression functions in large data sets. It is also often used for informal evaluation of substantive hypotheses such as linearity or monotonicity of the unknown function. This paper presents a foundational econometric analysis of binscatter, offering an array of theoretical and practical results that aid both in understanding current practices (i.e., their validity or lack thereof) and in guiding future applications. In particular, we highlight important methodological problems related to covariate adjustment methods used in current practice, and provide a simple, valid approach. Our results include a principled choice for the number of bins, confidence intervals and bands, and hypothesis tests for parametric and shape restrictions for mean, quantile, and other functions of interest, among other new methods, all applicable to canonical binscatter as well as to nonlinear, higher-order polynomial, smoothness-restricted and covariate-adjusted extensions thereof. Companion general-purpose software packages for Python, R, and Stata are provided. From a technical perspective, we present novel theoretical results for possibly nonlinear semi-parametric partitioning-based series estimation with random partitions that are of independent interest.
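For readers unfamiliar with the construction, the canonical binscatter described above can be sketched in a few lines. This is only an illustrative sketch, not the companion software: the function name `binscatter`, the fixed `n_bins`, and the rank-based handling of ties are assumptions made here, and none of the paper's contributions (data-driven number of bins, covariate adjustment, confidence bands, tests) are implemented.

```python
import math
import random
from statistics import fmean

def binscatter(x, y, n_bins=10):
    """Canonical binned scatter: within-bin means of x and y over
    quantile-spaced bins of x (roughly equal counts per bin)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= n_bins <= len(x):
        raise ValueError("n_bins must be between 1 and the sample size")
    pairs = sorted(zip(x, y))  # order observations by x
    n = len(pairs)
    points = []
    for j in range(n_bins):
        # Rank-based bin boundaries (assumption: ties in x are split by rank).
        lo, hi = j * n // n_bins, (j + 1) * n // n_bins
        chunk = pairs[lo:hi]
        points.append((fmean(p[0] for p in chunk), fmean(p[1] for p in chunk)))
    return points

# Simulated data: a nonlinear mean function plus noise.
random.seed(0)
xs = [random.random() for _ in range(5000)]
ys = [math.sin(2 * math.pi * v) + random.gauss(0, 0.5) for v in xs]
points = binscatter(xs, ys, n_bins=20)
for bx, by in points[:3]:
    print(f"x-bar = {bx:.3f}, y-bar = {by:.3f}")
```

Plotting the returned pairs gives the familiar binned scatter plot; with 20 bins the within-bin means trace the sine-shaped regression function that the raw scatter of 5000 points would obscure.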

Professor Richard A. Davis, Howard Levene Professor of Statistics

Department of Statistics, Columbia University, New York, USA.

Short Bio: Richard Davis is the Howard Levene Professor of Statistics at Columbia University and former chair of the Statistics Department (2013-19). He has held academic positions at MIT and Colorado State University, and visiting appointments at numerous other universities. He was Hans Fischer Senior Fellow at the Technical University of Munich (2009-12), Villum Kan Rasmussen Visiting Professor (2011-13) at the University of Copenhagen, and Jubilee Professor at Chalmers University (2019). Davis is a fellow of the Institute of Mathematical Statistics and the American Statistical Association, and is an elected member of the International Statistical Institute. He was president of IMS in 2016 and Editor-in-Chief of the journal Bernoulli (2010-12). He is co-author (with Peter Brockwell) of the best-selling books Time Series: Theory and Methods and Introduction to Time Series and Forecasting, and of the time series analysis computer software package ITSM2000. Together with Torben Andersen, Jens-Peter Kreiss, and Thomas Mikosch, he co-edited the Handbook of Financial Time Series and, with Holan, Lund, and Ravishanker, the Handbook of Discrete-Valued Time Series. In 1998, he won (with collaborator W.T.M. Dunsmuir) the Koopmans Prize for Econometric Theory.

Title: Time Series Estimation of the Dynamic Effects of Disaster-Type Shocks

Summary: This paper provides three results for SVARs under the assumption that the primitive shocks are mutually independent. First, a framework is proposed to study the dynamic effects of disaster-type shocks with infinite variance. We show that the least squares estimates of the VAR are consistent but have non-standard properties. Second, it is shown that the restrictions imposed on a SVAR can be validated by testing independence of the identified shocks. The test can be applied whether the data have fat or thin tails, and to over-identified as well as exactly identified models. Third, the disaster shock is identified as the component with the largest kurtosis, where the mutually independent components are estimated using an estimator that is valid even in the presence of an infinite variance shock. Two applications are considered. In the first, the independence test is used to shed light on the conflicting evidence regarding the role of uncertainty in economic fluctuations. In the second, disaster shocks are shown to have a short-term economic impact arising mostly from feedback dynamics. (This is joint work with Serena Ng.)
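The third result, labelling the component with the largest kurtosis as the disaster shock, can be illustrated with a toy example. The sketch below is not the authors' estimator: the components are simulated directly rather than estimated from SVAR residuals, and the names `sample_kurtosis` and `pick_disaster_component` are hypothetical helpers introduced here for illustration.

```python
import math
import random

def sample_kurtosis(series):
    """Sample kurtosis m4 / m2**2 (close to 3 for Gaussian data)."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((v - mean) ** 2 for v in series) / n
    m4 = sum((v - mean) ** 4 for v in series) / n
    return m4 / m2 ** 2

def pick_disaster_component(components):
    """Return the name of the component with the largest sample kurtosis,
    together with the kurtosis of every component."""
    scores = {name: sample_kurtosis(vals) for name, vals in components.items()}
    return max(scores, key=scores.get), scores

# Assumption: three already-separated, mutually independent shock series.
rng = random.Random(0)
n = 2000
components = {
    "gaussian": [rng.gauss(0, 1) for _ in range(n)],
    "uniform": [rng.uniform(-1, 1) for _ in range(n)],
    # Standard Cauchy draws (infinite variance) via the inverse-CDF transform.
    "heavy_tailed": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
}
disaster, scores = pick_disaster_component(components)
print(disaster, {name: round(k, 2) for name, k in scores.items()})
```

The sample kurtosis remains computable even when the population variance is infinite, and a handful of extreme draws dominates it, which is what makes it usable as a labelling device in this toy setting.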

Keynote speakers of the HDDA-XI Workshop (in alphabetical order)

Professor Yang Feng,

School of Global Public Health, New York University, New York, USA.

Short Bio: Yang Feng is an associate professor of biostatistics in the School of Global Public Health at New York University. Feng focuses on developing and applying machine learning methods in public health, high-dimensional data analysis, network models, nonparametric methods, and bioinformatics. He has published over 50 papers in journals including the Annals of Statistics, JASA, JRSSB, Biometrika, IEEE-PAMI, JMLR, Science Advances, JoE, and JBES. He is currently an associate editor for journals including JASA, JBES, and Statistica Sinica. His research is supported by NSF and NIH.

Title: Random Subspace Ensemble

Summary: We propose a flexible ensemble framework, Random Subspace Ensemble (RaSE). In the RaSE algorithm, we aggregate many weak learners, where each weak learner is trained in a subspace optimally selected from a collection of random subspaces using a base method. In addition, we show that in a high-dimensional framework, the number of random subspaces needs to be very large to guarantee that a subspace covering signals is selected. Therefore, we propose an iterative version of the RaSE algorithm and prove that, under some specific conditions, fewer generated random subspaces are needed to find a desirable subspace through iteration. We study the RaSE framework for classification, where a general upper bound for the misclassification rate is derived, and for screening, where the sure screening property is established. An extension called Super RaSE is proposed to allow the algorithm to select the optimal pair of base method and subspace during the ensemble process. The RaSE framework is implemented in the R package RaSEn on CRAN.
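The basic (non-iterative) scheme described above, in which each weak learner keeps the best of several random subspaces and the learners are then aggregated, can be sketched as follows. This is a simplified illustration under stated assumptions and not the RaSEn implementation: the base method is a nearest-centroid classifier, the selection criterion is plain training error, and aggregation is by majority vote, all chosen here for brevity. The names `rase_fit`, `rase_predict`, `n_learners`, `n_subspaces`, and `max_dim` are hypothetical.

```python
import random

def fit_centroids(X, y, features):
    """Nearest-centroid base learner restricted to a feature subset."""
    cents = {}
    for label in set(y):
        rows = [X[i] for i in range(len(X)) if y[i] == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return cents

def predict_one(cents, features, row):
    """Assign the label whose centroid is closest within the subspace."""
    def dist(c):
        return sum((row[f] - c[k]) ** 2 for k, f in enumerate(features))
    return min(cents, key=lambda label: dist(cents[label]))

def error_rate(cents, features, X, y):
    wrong = sum(predict_one(cents, features, r) != t for r, t in zip(X, y))
    return wrong / len(y)

def rase_fit(X, y, n_learners=25, n_subspaces=30, max_dim=3, seed=0):
    """For each weak learner, draw random subspaces and keep the best one."""
    rng = random.Random(seed)
    p = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        best = None
        for _ in range(n_subspaces):
            size = rng.randint(1, max_dim)
            features = sorted(rng.sample(range(p), size))
            cents = fit_centroids(X, y, features)
            # Assumption: training error stands in for the selection criterion.
            err = error_rate(cents, features, X, y)
            if best is None or err < best[0]:
                best = (err, features, cents)
        ensemble.append((best[1], best[2]))
    return ensemble

def rase_predict(ensemble, row):
    """Aggregate the weak learners by majority vote."""
    votes = [predict_one(cents, features, row) for features, cents in ensemble]
    return max(set(votes), key=votes.count)

# Simulated data: 20 features, only features 0 and 1 carry signal.
data_rng = random.Random(1)
def draw(label):
    row = [data_rng.gauss(0, 1) for _ in range(20)]
    row[0] += 2.0 * label
    row[1] -= 2.0 * label
    return row

y_train = [i % 2 for i in range(200)]
X_train = [draw(t) for t in y_train]
y_test = [i % 2 for i in range(200)]
X_test = [draw(t) for t in y_test]
model = rase_fit(X_train, y_train)
accuracy = sum(rase_predict(model, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

In this toy setup only a small fraction of random subspaces of size at most three contain both signal features, which hints at why many subspaces are needed in high dimensions and why the iterative variant mentioned in the summary is attractive.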


Relevant Papers and the R Package:

  • Tian, Y., & Feng, Y. (2021). RaSE: Random Subspace Ensemble Classification. Journal of Machine Learning Research, 22(45).

  • Tian, Y., & Feng, Y. (2021). RaSE: A variable screening framework via random subspace ensembles. Journal of the American Statistical Association, (just-accepted), 1-30.

  • Zhu, J., & Feng, Y. (2021). Super RaSE: Super Random Subspace Ensemble Classification. Journal of Risk and Financial Management, 14(12), 612.

  • R Package RaSEn: https://cran.r-project.org/web/packages/RaSEn/index.html


Professor Arnoldo Frigessi,

Department of Biostatistics, University of Oslo, Norway.

Short Bio: Arnoldo Frigessi is professor of statistics at the University of Oslo and the Oslo University Hospital. He is director of the Oslo Center for Biostatistics and Epidemiology and of the centre for research-based innovation BigInsight, a consortium of partners from academia and the public and private sectors. Frigessi develops new methods in statistics and machine learning and stochastic models to study principles, dynamics and patterns of complex dependence. Currently, he has research collaborations in genomics, personalised medicine, infectious disease modeling (including Covid-19), and preference learning. Frigessi is an elected member of the Norwegian Academy of Science and Letters and was knighted Cavaliere Ordine al Merito della Repubblica Italiana.

Title: Coordinated Architectures Across Clusters in Integrative Studies: a Bayesian Two-Way Latent Structure Model

Summary: We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. Many of the methods proposed so far make an unreasonable assumption of a common clustering across all data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources. A common scaling across data sources is not required, and inference is obtained by a Gibbs sampler, which we improve with a warm start strategy and modified density functions to robustify and speed convergence. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. Our formulation makes no nestedness assumptions of samples across data sources, so that a sample missing data from one genomic source can be clustered according to its existing data sources. We apply our model to a Norwegian breast cancer cohort of ductal carcinoma in situ and invasive tumors, comprising somatic copy-number alteration, methylation and expression datasets. This is joint work with David M. Swanson, Tonje Lien, Helga Bergholtz and Therese Sørlie.

Professor Shuangge Ma,

Yale School of Public Health, Yale University, USA.

Short Bio: Dr. Shuangge Ma is a Professor of Biostatistics at Yale School of Public Health. He obtained his Ph.D. in statistics from the University of Wisconsin, Madison in 2004. Dr. Kjell Doksum was on his dissertation committee. He had his postdoctoral training at the University of Washington between 2004 and 2006. His research interests include genetic epidemiology, high-dimensional statistics, survival analysis, and cancer biostatistics.

Title: Gaussian graphical model-based heterogeneity analysis via penalized fusion

Summary: Heterogeneity is a hallmark of many complex diseases. This study has been motivated by the unsupervised heterogeneity analysis for complex diseases based on molecular and imaging data, for which network-based analysis can be more informative than analysis limited to mean, variance, and other simple distributional properties. In the literature, there has been limited research on network-based heterogeneity analysis, and a common limitation shared by the existing techniques is that the number of subgroups needs to be specified a priori or in an ad hoc manner. We develop a novel approach for heterogeneity analysis based on the Gaussian graphical model. It applies penalization to the mean and precision matrix parameters to generate regularized and interpretable estimates. A fusion penalty is imposed to "automatedly" determine the number of subgroups. The heterogeneity analysis of non-small-cell lung cancer based on single-cell gene expression data of the Wnt pathway and that of lung adenocarcinoma based on histopathological imaging data not only demonstrate the practical applicability of the proposed approach but also lead to interesting new findings.

Professor Vijay Nair, Donald A. Darling Professor Emeritus of Statistics

Managing Director and Head of Modeling, Machine Learning, and Advanced Computing in the Corporate Model Risk Group at Wells Fargo

University of Michigan, USA.

Short Bio: Vijay Nair is currently with Wells Fargo Bank where he is a Managing Director and Head of the Advanced Technologies for Modeling (AToM) group in Corporate Model Risk. His group develops techniques, algorithms, and computational technologies to facilitate best practices in quantitative modeling. Team members have technical expertise in statistical and mathematical modeling, machine learning, NLP, AI, and advanced computing. Before joining Wells Fargo in 2016, Vijay was with the University of Michigan in Ann Arbor, where he was Donald A. Darling professor of statistics and professor of industrial & operations engineering. He was also a Distinguished Data Scientist at the Michigan Institute for Data Sciences. Vijay was Chair of the Statistics Department there for 12 years. Prior to that, he was a Research Scientist at Bell Labs. He has served as President of the International Statistical Institute and International Society for Business and Industrial Statistics as well as chief editor of several journals. He has been elected as Fellow of several professional societies: American Association for the Advancement of Science, American Statistical Association, American Society for Quality, and Institute of Mathematical Statistics.

Title: Supervised Machine Learning: Interpretability and Applications

Summary: Machine learning techniques are increasingly used in a wide variety of application areas. They have better predictive performance than traditional statistical methods, and they work well with large datasets (large n and p). However, their use in regulated industries such as banking has been limited by the need for interpretability. In this presentation, I will describe the state of the practice in interpreting ML models, ranging from post-hoc techniques and use of surrogate models to inherently interpretable ML models. This will also cover research and applications in our own group.

Professor Anand N Vidyashankar,

Department of Statistics, George Mason University, Virginia, USA.

Short Bio: Anand Vidyashankar is a professor in the Department of Statistics at George Mason University. His current research interests are mathematical and probabilistic foundations of machine learning and deep learning, high-dimensional statistics, Markovian and non-Markovian population processes, and privacy and security analytics. For his recent work, see anandnv.squarespace.com.

Title: Post Selection Inference and Local Dependence

Summary: Post-model selection inference is a significant research problem in high-dimensional data analysis, and several methods that account for model selection uncertainty are under intense study. In these problems, it is typically assumed that the relationship between the response and the covariate remains constant in the entire covariate space. A natural question then is the impact of this assumption when models are selected rather than fixed, as in traditional considerations. This presentation provides a detailed description of the bias in model selection and the resulting inference, and of methods to mitigate it. In the process, and using different considerations, we describe a post-selection version of the correlation curve leading to a local dependence function in heterocorrelatious datasets, a terminology used by Prof. Doksum and his co-authors. Extensions of these concepts to streaming data are also provided.