--- 2016. Presented at the National Science Foundation (NSF) in late January for the NSF Data Science Seminar Series.
In a concise 25-minute presentation, Jordan made his case for thinking of data science as the combination of computational and inferential thinking, noting that the most appealing challenge in Big Data for him is the potential for personalization. He began by observing that the core theories of computer science and statistics were developed separately, leaving an oil-and-water problem to be surmounted. As an example, he noted that core statistical theory has no place for runtime and other computational resources, while core computational theory has no place for statistical risk. …
The ASA Curriculum Guidelines for Undergraduate Programs in Statistical Science (PDF) states, “Institutions need to ensure students entering the work force or heading to graduate school have the appropriate capacity to ‘think with data’ and to pose and answer statistical questions.” The guidelines also note the increasing importance of data science. While the guidelines were explicitly silent about the first course, they do state the following: …
--- 2016. At Stanford's first Women in Data Science Conference, engineers from industry and academia discuss personalized medicine, entertainment, marketing, cybersecurity and more.
Almost anywhere we turn, evidence of a data revolution abounds. That realization suffused the inaugural Women in Data Science Conference at Stanford. Sharing the opening stage with Persis Drell, dean of the Stanford School of Engineering, was conference organizer Margot Gerritsen, associate professor of energy resources engineering at Stanford and director of the Institute for Computational and Mathematical Engineering. Gerritsen amplified Drell's remarks, saying: “Data science is a very rapidly growing field of increasing importance. So much research and business decisions are based on data. If we want to ask all of the right questions and analyze all aspects of a problem, we need diversity and multidisciplinary thinking.” Here are several insights that emerged from this daylong exploration of our unprecedented ability to harness the power of data. …
Heterogeneity is unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify them. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust for heterogeneity in the original data. Once the heterogeneity is adjusted for, we are able to remove the biases induced by batch effects and to enhance inferential power by aggregating the homogeneous residuals from multiple sources. Under the pervasive assumption that the latent heterogeneity factors simultaneously affect a large fraction of the observed variables, we provide a rigorous theory to justify the proposed framework. Our framework also allows the incorporation of informative covariates and appeals to the "Blessing of Dimensionality". As an illustrative application of this generic framework, we consider the problem of estimating a high-dimensional precision matrix for graphical model inference based on multiple datasets. We also provide thorough numerical studies on both synthetic datasets and a brain imaging dataset to demonstrate the efficacy of the developed theory and methods.
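To make the general recipe concrete, here is a rough Python sketch of low-rank heterogeneity adjustment: a rank-k principal-component (factor) contribution is removed from each source, the residuals are pooled, and a sparse precision matrix is fit to the pooled residuals. This is only an illustration of the idea, not the ALPHA estimator itself; the number of factors k, the plain PCA factor estimate, and the graphical-lasso step are assumptions made for the example.

```python
# Illustrative sketch only: estimate a rank-k "heterogeneity" component in
# each source with plain PCA, remove it, pool the residuals, and fit a sparse
# precision matrix. ALPHA itself uses adaptive factor estimation with
# theoretical guarantees; k and the GraphicalLasso penalty are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import GraphicalLasso

def remove_low_rank(X, k):
    """Return residuals of X after subtracting its rank-k PCA approximation."""
    Xc = X - X.mean(axis=0)
    pca = PCA(n_components=k).fit(Xc)
    low_rank = pca.inverse_transform(pca.transform(Xc))
    return Xc - low_rank  # approximately "homogeneous" residuals

def pooled_precision(datasets, k=2, penalty=0.05):
    """datasets: list of (n_i x p) arrays from different sources."""
    residuals = np.vstack([remove_low_rank(X, k) for X in datasets])
    return GraphicalLasso(alpha=penalty).fit(residuals).precision_
```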
Drawing on work by statisticians John Tukey, John Chambers, Bill Cleveland and Leo Breiman, the author presents a vision of data science based on the activities of people who are ‘learning from data’, and describes an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, while being able to accommodate the same short-term goals.
This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; the authors also develop strategies for leveraging information from the first part of the data at the inference step for greater accuracy.
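As a rough illustration of the split-sample recipe (not the authors' exact procedure, which also carries information from the screening stage into the inference stage), one can screen with the lasso on one half of the data and run classical least-squares inference on the selected variables with the other half:

```python
# A minimal sketch of the split-sample idea, assuming a lasso screen on the
# first half and ordinary least-squares inference on the second half. The
# function name and tuning choices here are illustrative, not the paper's.
import numpy as np
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

def split_screen_infer(X, y, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    first, second = idx[: len(y) // 2], idx[len(y) // 2 :]

    # Stage 1: screen for potentially relevant variables on the first half.
    lasso = LassoCV(cv=5).fit(X[first], y[first])
    selected = np.flatnonzero(lasso.coef_)
    if selected.size == 0:
        return selected, None

    # Stage 2: classical inference on the reduced set using the second half.
    design = sm.add_constant(X[second][:, selected])
    fit = sm.OLS(y[second], design).fit()
    return selected, fit.pvalues[1:]  # p-values for the selected variables
```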
They study randomized sketching methods for approximately solving least-squares problems with a general convex constraint. The quality of a least-squares approximation can be assessed in different ways: either in terms of the value of the quadratic objective function (cost approximation), or in terms of some distance measure between the approximate minimizer and the true minimizer (solution approximation). Focusing on the latter criterion, their first main result provides a general lower bound on any randomized method that sketches both the data matrix and vector in a least-squares problem; as a surprising consequence, the most widely used least-squares sketch is sub-optimal for solution approximation. They then present a new method known as the iterative Hessian sketch, and show that it can be used to obtain approximations to the original least-squares problem using a projection dimension proportional to the statistical complexity of the least-squares minimizer, and a logarithmic number of iterations.
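A minimal sketch of the iterative Hessian sketch for the unconstrained case, assuming Gaussian sketching matrices (the sketch size m, the iteration count, and the zero initialization are illustrative choices, not the paper's tuning):

```python
# A minimal sketch of iterative Hessian sketching for unconstrained least
# squares, assuming Gaussian sketching matrices. The sketch size m must be
# large enough (at least d) for the sketched Hessian to be invertible.
import numpy as np

def iterative_hessian_sketch(A, y, m, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iter):
        S = rng.standard_normal((m, n))   # fresh sketch at every iteration
        SA = S @ A
        H = SA.T @ SA / m                 # sketched approximation of A^T A
        grad = A.T @ (y - A @ x)          # exact (negative) gradient of the LS objective
        x = x + np.linalg.solve(H, grad)  # Newton-like correction step
    return x
```

Only the Hessian term is sketched, so the per-iteration cost of forming and factoring H scales with the sketch size m rather than with n, and the distance to the exact least-squares solution contracts geometrically once m is on the order of the problem's statistical dimension.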