Research Interests

Design and Analysis of Experiments

My primary area of research interest is design and analysis of experiments (DAE). Learning from experimentation and, more informally, from trial and error is as old as humanity itself. Sound theory, principles, and methods for designing experiments as part of the scientific endeavor were developed throughout the 20th century, with new challenges arising in a world that is hungry for data. Experiments are conducted in almost every academic discipline and throughout industry. As a result, DAE remains a vitally important area for students to learn about, both in statistics and in other academic fields. Regrettably, too many miss out on learning the basics, perhaps in part because our courses on the topic have not always been great. For example, the connection between data collection and analysis is often not appreciated or understood, and many experiments have been conducted without the experimenter realizing that the collected data are useless for answering the questions of interest.

Research in DAE can take on many different forms. There is room for those interested in developing theory, for those interested in exploring new methodology, for those whose forte is in computing, and for those who are simply interested in advising on how to run and analyze experiments properly. In short, DAE can be a field of dreams for researchers with widely different skill sets.

Within statistics, DAE is sometimes viewed as a bit out in left field. For those who hold this view, it may say more about them than about DAE. The pivotal role that DAE continues to play in science and industry places it at the very center of discovery and improvement. One has to be pretty far away from that center to perceive DAE as being somewhere out in left field.

The spectrum of my own research endeavors in DAE is reflected in my publication record. That research has been driven by at least two factors: (1) furthering my own understanding of theory and methods; and (2) the research interests of students and collaborators. A selective summary of topics within this spectrum includes:

  • design and analysis for experiments using supersaturated designs

  • optimal design inspired solutions for big data analysis

  • optimal design for generalized linear models and nonlinear models

  • optimal design for mixed effects models

  • design and analysis of event-related fMRI experiments

  • crossover designs

  • orthogonal arrays and fractional factorial experiments

  • combinatorial problems related to statistical designs

  • trend-free designs

  • designs for the comparison of a standard treatment to test treatments

  • connection between design of experiments and survey sampling designs

  • sampling plans when contiguous units provide similar information

Collectively, the work has helped me to understand some of the problems a little bit better, and hopefully this has had the same effect on my collaborators and students.

Big Data Challenges

As a discipline, statistics has its roots in applied data-oriented problems. In fact, its origins lie in census counts in the 19th century. During most of the 20th century, the datasets that helped to fuel the growth of statistics, such as in agriculture, were, by today's standards, quite small. Throughout this growth, statistics remained at its core a very applied discipline, especially for those working in industry and government and those involved in consulting in academia. In terms of academic research and teaching, the focus has shifted over time, more than once, and not necessarily in the same way at every institution. But certainly, there have been times and places with a much stronger emphasis on statistical theory. These days, with more computational power and better software, the emphasis has shifted back to working with data, sometimes enormous amounts of data.

While statistics is the science of collecting, organizing, analyzing, and interpreting data, it (fortunately) does not have a patent on working with data. Many interesting developments, terms, techniques and ideas related to data have come from elsewhere. In particular, not surprisingly, terminology and studies about the volume of data often come from the computer science world. The term big data, initially not necessarily referring to a single large dataset but to the growth rate of the volume of data, has been used at least since the 1990s. However, the concept of information explosion goes back as far as the 1940s.

While the term big data does not mean the same thing to everyone, and is not liked by everyone, its current meaning refers primarily to a dataset that is too large to manage or process with commonly available tools. As one common description puts it, "Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time." This obviously means that what is big data for me may not be big data for you, and what is big data today may not be big data tomorrow.

From a statistical perspective, there are at least two broad areas of challenges in the big data arena:

  1. Data collection. A larger volume of data, easily collected electronically, is no substitute for a careful data collection plan. What questions do we hope to answer, what inferences do we plan to make, what data should we collect, and from which population? Large quantities of low-quality data will not help to answer any of the questions of interest.

  2. Data exploration and analysis. Datasets may be too large for traditional visualization or data analysis techniques.

To be completed.

Informatics, Data Analytics and Data Science

To be completed.