Workshops

The Chapter aims to offer a one-day technical workshop every Fall, and occasionally at other times as well.

Upcoming Workshop

Updates will be posted in late Summer 2023.


Past Workshops

October 14, 2022

An Introduction to Second-generation p-values and their Use in Statistical Practice: Jeffrey D. Blume and Megan H. Murray

Second-generation p-values were recently proposed to address the well-known imperfections of classical p-values. Their implementation can largely be thought of as codifying ‘good standard practice’ for interpreting and reporting classical p-values. Second-generation p-values maintain the favorable properties of classical p-values while emphasizing scientific relevance to expand their utility, functionality, and applicability. In particular, they can report evidence in favor of the alternative, in favor of the null hypothesis, or neither (inconclusive); they automatically incorporate an adjustment for multiple comparisons and multiple looks; they have lower false discovery rates than classical p-values; and they are easier to interpret. Second-generation p-values have been shown to work well in regularized models. They also lead to significantly improved model selection procedures in linear and generalized linear models. Also, second-generation p-values are non-denominational in the sense that they are readily applied in frequentist, likelihood and Bayesian settings.


This course will briefly revisit the history of p-values as originally envisioned in significance and hypothesis testing, and the resulting controversy over their scientific interpretation. The importance of distinguishing between three key inferential quantities (the measure of the strength of evidence, design error rates, and false discovery rates for observed data) will be illustrated. The second-generation p-value will be introduced and contrasted with standard methods. The workshop will explain how to design studies in which the second-generation p-value is used as the primary mode of inference. We will cover computation of second-generation p-values (in R), guidelines for presenting results, and, when appropriate, how to present accompanying false discovery rates. Multiple examples will be presented using data from clinical trials, observational studies, and high-dimensional analysis of large-scale data. Advanced applications in model selection, adaptive monitoring of clinical trials, and regularized models will be shown if time permits. Mathematical details will be kept to a minimum; e.g., statistical properties will be presented but without mathematical proof.
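
For a flavor of the computation, here is a minimal R sketch of the interval-overlap definition of the second-generation p-value given by Blume and colleagues; the function name and example intervals are ours, and the instructors' own R tools may differ.

    # Second-generation p-value for an interval estimate [est.lo, est.hi]
    # versus an interval null hypothesis [null.lo, null.hi].
    sgpv <- function(est.lo, est.hi, null.lo, null.hi) {
      est.len  <- est.hi - est.lo        # length of the interval estimate
      null.len <- null.hi - null.lo      # length of the null interval
      overlap  <- max(0, min(est.hi, null.hi) - max(est.lo, null.lo))
      # The correction term caps very wide (uninformative) intervals at 1/2
      (overlap / est.len) * max(est.len / (2 * null.len), 1)
    }

    sgpv(0.1, 0.7, -0.2, 0.2)   # about 0.17: mostly, but not wholly, outside the null zone
    sgpv(0.3, 0.7, -0.2, 0.2)   # no overlap: p_delta = 0, evidence for the alternative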

October 18, 2019  

Essential Bayes: Paradigm, Techniques, Software, and Applications: Fang Chen, SAS Institute

This course reviews the fundamentals of Bayesian methods (prior distributions, inferences, multilevel modeling, and so on), introduces computational techniques (simulation algorithms such as MCMC, Metropolis, and Hamiltonian Monte Carlo; approximation algorithms such as variational Bayes and Expectation Propagation), surveys modern-day Bayesian software packages (BUGS, NIMBLE, Stan, PROC MCMC), and presents Bayesian treatment of various statistical topics, including regression models, multilevel hierarchical models, missing data analysis, model assessment, and predictions. The tutorial emphasizes the practical aspects of performing Bayesian analysis; concepts and guidance are demonstrated through software and examples. SAS® software is used for analyses, including the MCMC procedure for general modeling and the specialized BGLIMM procedure for Bayesian generalized mixed models.
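
The tutorial's demonstrations are in SAS; as a language-neutral sketch of the MCMC machinery it surveys, here is a tiny random-walk Metropolis sampler in R for a toy binomial model (the model, proposal scale, and names are ours).

    set.seed(1)
    x <- 7; n <- 10                        # toy data: 7 successes in 10 trials
    # Log posterior of the success probability theta under a flat prior
    log.post <- function(theta) {
      if (theta <= 0 || theta >= 1) return(-Inf)
      x * log(theta) + (n - x) * log(1 - theta)
    }
    n.iter <- 10000
    draws  <- numeric(n.iter)
    theta  <- 0.5                          # starting value
    for (i in seq_len(n.iter)) {
      prop <- theta + rnorm(1, sd = 0.2)   # symmetric random-walk proposal
      if (log(runif(1)) < log.post(prop) - log.post(theta)) theta <- prop
      draws[i] <- theta
    }
    mean(draws[-(1:1000)])                 # posterior mean after burn-in; exact answer is 8/12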

October 15, 2018

Data Mining using JMP Pro: Dick De Veaux, Williams College

A well-constructed model will provide you with much-needed insights that can ultimately help your organization make more strategic, informed decisions. And the most successful models are those that originate with data mining. Noted author, applied statistician, and data-modeling expert Dick De Veaux says preparing your data for analysis is just as important as the analysis itself: figure out which data are relevant; identify trends you need to explore further with data visualization; sift through the extraneous information to pull only the most important variables into an analysis; and eliminate missing values before they become problematic. De Veaux will present a series of cross-industry case studies using JMP Pro software that showcase the ways in which successful models helped drive real improvements. You will learn how to identify the relationships and correlations in your data set that have implications for your work; explore your data and prepare data sets for analysis; design the methodology that best fits the data you have available; and turn predictive analytics from a job requirement into a professional advantage.

September 9, 2017

Applied Longitudinal Analysis: Garrett Fitzmaurice, Harvard Medical School

The goal of this course is to provide a broad introduction to statistical methods for analyzing longitudinal data. The emphasis is on the practical aspects of longitudinal analysis. I begin the course with a review of established methods for longitudinal data analysis when the response of interest is continuous and present a general introduction to linear mixed effects models for continuous responses. Next, I discuss how smoothing and semi-parametric regression allow greater flexibility for the form of the relationship between the mean response and covariates. I demonstrate how the mixed model representation of penalized splines makes this extension straightforward. When the response of interest is categorical (e.g., binary or count data), two main extensions of generalized linear models to longitudinal data have been proposed: "marginal models" and "generalized linear mixed models." While both classes account for the within-subject correlation among the repeated measures, they differ in approach. We will highlight the main distinctions between these two types of models and discuss the types of scientific questions addressed by each.
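
As a minimal R illustration of the linear mixed effects models the course introduces, here is a random-intercept-and-slope fit using the lme4 package and its bundled sleepstudy data; the dataset choice is ours, not the course's.

    library(lme4)
    # Reaction time vs. days of sleep deprivation, with subject-specific
    # random intercepts and slopes around the population-average trend
    fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(fit)   # fixed effects, variance components, and correlations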

September 10, 2016

R Programming: From the Classroom to the Real World: Jay Emerson, Yale University

This course reviews the core R language and teaches essentials of R programming to R users at a range of levels. It uses real-world data problems and emphasizes graphical exploration and development. This Cleveland iteration of the course is designed for participants who have some prior experience working with R (roughly at the "advanced beginner" level or higher). The distinction between programming (or scripting) with R and using R is an important one. Most people can use R as a tool for a small number of focused tasks that fit neatly into different boxes. This workshop emphasizes outside-the-box problem solving, where no single function or package is likely to be sufficient. The process is as important as the solution, and this approach to the R language is invaluable in the classroom and in the real world.

October 19, 2015

Using Propensity Scores to Effectively Design and Analyze Observational Studies: Thomas E. Love, Ph.D., Case Western Reserve University

This course describes and demonstrates effective strategies for using propensity score methods to address the potential for selection bias in observational studies comparing the effectiveness of treatments or exposures. We review the main analytical techniques associated with propensity score methods (matching, weighting, multivariate regression adjustment and stratification using the propensity score, sensitivity analysis for matched samples) and describe key strategic concerns related to effective propensity score estimation, assessment and display of covariate balance, choice of analytic technique, and communicating results effectively. Although we will focus on established approaches to dealing with design and analytical challenges, we conclude the session by reviewing some literature regarding recent methodological advances in propensity scores and application of propensity score methods to problems in health policy research.
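
As a small taste of one technique reviewed, stratification on an estimated propensity score, here is a hedged R sketch on simulated data (the data-generating model, effect size, and variable names are all ours).

    set.seed(42)
    # Toy observational data: treatment uptake depends on two confounders
    n   <- 2000
    age <- rnorm(n, 50, 10)
    sev <- rnorm(n, 0, 1)                  # illness severity
    trt <- rbinom(n, 1, plogis(-0.05 * (age - 50) + 0.8 * sev))
    y   <- 1 + 0.5 * trt + 0.03 * age + 0.7 * sev + rnorm(n)

    # Step 1: estimate the propensity score by logistic regression
    ps <- fitted(glm(trt ~ age + sev, family = binomial))

    # Step 2: stratify on propensity quintiles, estimate the treatment
    # effect within each stratum, and pool across strata
    strata  <- cut(ps, quantile(ps, 0:5 / 5), include.lowest = TRUE)
    effects <- sapply(split(data.frame(y, trt), strata),
                      function(d) mean(d$y[d$trt == 1]) - mean(d$y[d$trt == 0]))
    mean(effects)                          # close to the true effect of 0.5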

October 25, 2014

An Introduction to Using R for Data Visualization: Robert Kabacoff, Management Research Group (MRG)

R has become one of the most popular languages for data analysis and graphics. Its ability to create sophisticated, highly customized, publication-quality graphs on multiple platforms is unparalleled. The first half of this one-day workshop will include an introduction to R (R syntax, common mistakes, using packages) and basic data management (importing, cleaning, reformatting data). The second half of this workshop will focus on data visualization techniques, including a practical review of R's major graphing capabilities (including base functionality), as well as new capabilities provided by packages such as ggplot2 (grammar of graphics) to create a wide range of univariate and multivariate graphs.
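
As a single illustrative example of the ggplot2 material, here is a minimal multivariate plot built from aesthetic mappings and layers; the dataset and variable choices are ours.

    library(ggplot2)
    # One aesthetic mapping per variable; layers add points and trend lines
    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point(size = 2) +
      geom_smooth(method = "lm", se = FALSE) +
      labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")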

October 4 and 5, 2013

Workshop (October 4): Introduction to Statistical Computing with R and Bioconductor 

Symposium (October 5): Statistics at the Crossroads: Its Multifaceted Impact on the Society

This workshop and symposium, jointly presented by the Cleveland Chapter of the ASA and the University of Akron Department of Statistics, celebrated the International Year of Statistics. The hands-on workshop covered the basics of R applied to a variety of fields and the basics of Bioconductor for cutting-edge biomedical applications. The symposium explored the evolving role of statistics across society.

October 5, 2012

Introduction to Structural Equation Modeling: Douglas Gunzler, Case Western Reserve University

This workshop will provide an introduction to structural equation modeling (SEM), a very general technique combining path models with latent (unobserved) variables. SEM-based approaches link conceptual models, path diagrams and regression-style equations together to capture complex and dynamic relationships among a web of variables. Topics will include the advantages of SEM-based approaches, modeling causal relationships, measurement error, model specification, and assessing model fit. The speaker will go through the steps of conducting SEM analyses with real data using Mplus, one of the major software packages for SEM.
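
The workshop itself uses Mplus; purely for illustration, here is the same style of model in lavaan, an open-source R alternative, fit to lavaan's bundled Holzinger-Swineford data (the software substitution and example are ours).

    library(lavaan)
    # Confirmatory factor analysis: three latent variables, nine indicators
    model <- '
      visual  =~ x1 + x2 + x3
      textual =~ x4 + x5 + x6
      speed   =~ x7 + x8 + x9
    '
    fit <- cfa(model, data = HolzingerSwineford1939)
    summary(fit, fit.measures = TRUE)   # loadings plus fit indices (CFI, RMSEA)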

October 5, 2011

A Variety of Workshops at The Ninth International Conference on Health Policy and Statistics

The Ninth International Conference on Health Policy and Statistics (ICHPS) was held in Cleveland on October 5-7. Part of the conference included a full day of workshops by well-known speakers on a variety of statistical, methodological, and measurement topics. The Cleveland Chapter of the ASA was a co-sponsor of ICHPS. These workshops served as an alternative to our usual annual Fall workshop.

November 11, 2010

Hands-on R Workshop: Jay Kerns and Andy Chang, Youngstown State University

The workshop will be held in a computer lab. The morning session will be an introduction to R, including data management, graphics, and similar topics. The afternoon session will cover more advanced topics implemented in R, including multiple linear regression, logistic regression, survival analysis, and resampling.
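
As one small example of the afternoon material, here is a minimal survival analysis in R with the survival package's bundled lung data; the example is ours rather than the instructors'.

    library(survival)
    # Kaplan-Meier curves by sex, plus a log-rank test of the difference
    fit <- survfit(Surv(time, status) ~ sex, data = lung)
    plot(fit, col = 1:2, xlab = "Days", ylab = "Survival probability")
    survdiff(Surv(time, status) ~ sex, data = lung)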

October 30, 2009

The Goals and Methods of Genetic Epidemiology: Robert C. Elston, Case Western Reserve University [This segment will be rescheduled to a later date]

This course will comprise two components, each lasting about 1.5 hours: (1) definitions of terms used in human genetics and an introduction to the goals of genetic epidemiology, with an explanation of genetic segregation, linkage, and association analysis; and (2) an overview of what can be done with, and how to use, the Statistical Analysis for Genetic Epidemiology (S.A.G.E.) software package, including a description of the purpose of each S.A.G.E. program, how to format and input data to the S.A.G.E. package, and a brief demonstration of the S.A.G.E. GUI (http://darwin.cwru.edu/sage/).


Sample Size and Power Calculations: Paul Mathews, Mathews Malnar and Bailey, Inc.

Paul will present methods for calculating sample size for confidence intervals, and sample size and power for hypothesis tests, for: means, standard deviations, proportions, counts, regression, correlation and agreement, ANOVA for fixed and random effects, reliability, process capability, and gage error studies. Practical methods using large-sample approximations, variable transformations, and the delta method will be emphasized, but exact methods will be noted where the approximate methods fail. Paul will also demonstrate software solutions using Russ Lenth's free Piface program (www.stat.uiowa.edu/~rlenth/Power/), Power and Sample Size (PASS, by NCSS Inc., www.ncss.com), R (www.r-project.org), and MINITAB (www.minitab.com).
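
For a zero-install starting point, base R's power functions handle two of the simplest cases covered; the example inputs below are ours.

    # Sample size per group for a two-sample t-test detecting a mean
    # difference of 5 when the within-group SD is 10
    power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)

    # Sample size per group for comparing two proportions (0.10 vs 0.20)
    power.prop.test(p1 = 0.10, p2 = 0.20, sig.level = 0.05, power = 0.80)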

October 21, 2008

Using Propensity Score Methods Effectively To Deal With Selection Bias: Thomas E. Love, Case Western Reserve University

This course is designed to provide a friendly, applied, and practical survey of propensity score methods used for dealing with selection bias in observational studies of exposure effects. Propensity score methods are applicable whenever treatment or policy decisions are of interest, and numerous examples and illustrations will be presented from a variety of subject areas drawn from published articles in biostatistics, education, and public health research and from the speaker's experiences working with industrial clients in insurance, market research, and management consulting.

October 29, 2007

Kernel Methods In A Regularization Framework For Nonparametric Model Building: Yoonkyung Lee, The Ohio State University

Regularization methods for model building and prediction are popular in statistics and machine learning. They may be viewed as procedures that modify the maximum likelihood principle or the principle of empirical risk minimization. In particular, methods of regularization in reproducing kernel Hilbert spaces provide a unified framework for nonparametric statistical model building. Examples include smoothing splines and support vector machines.


In this workshop, kernel methods are explained with examples, and some issues of model selection and computation for the practical implementation of the methods are discussed. Applications to genomic data for building a medical diagnostic algorithm and selecting relevant biomarkers will be presented, as will marketing data for finding predictive demographic factors.
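
As one concrete instance of a reproducing-kernel method, here is a minimal support vector machine with a Gaussian (radial) kernel in R via the e1071 package; the dataset and tuning values are ours.

    library(e1071)
    # SVM classifier with a radial-basis kernel on the iris data
    fit  <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1)
    pred <- predict(fit, iris)
    table(pred, iris$Species)   # in-sample confusion matrix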

December 7, 2006

The Design of Industrial Screening Experiments: Angela Dean, The Ohio State University

Screening is the process of using designed experiments and statistical analyses to sift through a very large number of features, such as factors, genes or compounds, in order to discover the few features that influence a measured response. In current research, screening methods are actively being developed for the detection of factors which have a substantial effect on the average response or response variability in a complex system. In particular, the design and analysis of supersaturated and group screening experiments has been shown to be effective for this purpose and much research has recently been done in this area. 


The workshop will discuss recent work on the construction of supersaturated designs as well as methods of analysis. Various types of two-stage screening experiments and their uses in searching for active factors will be discussed, along with a description of a recent group screening experiment run successfully at Jaguar Cars. A comparison between the methodology of supersaturated designs and two-stage group screening will be presented.

December 15, 2005

Hands-On Bayesian Data Analysis Using WinBUGS: William F. Guthrie, National Institute of Standards and Technology

This workshop is designed to provide statisticians, scientists, or engineers with the tools necessary to begin to use Bayesian inference in applied problems. Participants in the course will learn the basics of Bayesian modeling and inference using Markov chain Monte Carlo simulation with the open-source software package WinBUGS. 

The workshop will introduce some of the theory underlying Bayesian analysis, but will primarily focus on Bayesian analysis of "real-world" scientific applications using examples from collaborative research with NIST scientists and engineers. Topics discussed will include Bayesian modeling, Markov chain Monte Carlo algorithms, convergence tests, model validation, and inference. 

October 11, 2004

Using Propensity Score Methods Effectively: Thomas E. Love, Case Western Reserve University and MetroHealth Medical Center

This course is designed to provide a friendly, applied and practical survey of propensity score methods used for dealing with selection bias in observational studies of exposure effects. Propensity score methods are applicable whenever treatment or policy decisions are of interest, and numerous examples and illustrations will be presented from a variety of subject areas drawn from published articles in biostatistics, education and public health research and from the speaker's experiences working with industrial clients in insurance, market research and management consulting.

October 27, 2003

Analysis of Curves: Esteban Walker, University of Tennessee

Advances in technology have dramatically increased the amount and quality of data that are recorded in all areas of human endeavor. Thousands of measurements are available nowadays in situations where previously only a few measurements, at given points in time or space, were taken. These measurements allow the reconstruction of the whole profile or "signature". Basically, the profile becomes the unit of analysis. This seminar will discuss two problems with profiles: (1) how to determine if predetermined sets of curves are different, and (2) how to identify clusters in a set of curves. Examples from various fields will be presented. Instructions on the implementation of these techniques in S-Plus and SAS will be given.
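
A toy R sketch of the second problem, identifying clusters in a set of curves, might proceed as follows (the seminar demonstrates S-Plus and SAS; the simulated curves and basis choice here are ours): represent each curve by its B-spline coefficients, then cluster the coefficient vectors.

    set.seed(7)
    library(splines)
    t.grid <- seq(0, 1, length.out = 50)
    B <- bs(t.grid, df = 6)                     # common B-spline basis
    make.curve <- function(shift)               # noisy, possibly phase-shifted sine
      sin(2 * pi * (t.grid + shift)) + rnorm(50, sd = 0.2)
    curves <- rbind(t(replicate(10, make.curve(0))),      # group 1
                    t(replicate(10, make.curve(0.25))))   # group 2
    coefs <- t(apply(curves, 1, function(y) coef(lm(y ~ B - 1))))
    kmeans(coefs, centers = 2)$cluster          # recovers the two groups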

October 14, 2002

A Review: Missing Data: Joseph Schafer, Pennsylvania State University

Statistical procedures for missing data have vastly improved in the last two decades, yet misconceptions and unsound practice still abound. In this seminar, we will …

October 3, 2001

Experiments: Planning, Analysis, and Parameter Design Optimization: C. F. Jeff Wu, The University of Michigan

This seminar will be based on the book "Experiments: Planning, Analysis, and Parameter Design Optimization" by Jeff Wu and Mike Hamada (2000). Course notes will be made available to attendees. This book contains many new methods not found in existing textbooks, and covers more than 80 data sets and 200 exercises. The new tools covered include robust parameter design, use of minimum aberration criterion for optimal factor assignment, orthogonal arrays of economic run size, analysis strategies to exploit interactions, experiments for reliability improvement, and analysis of experiments with non-normal responses. Data from real experiments will be used to illustrate concepts. Time will be reserved for questions and discussion.
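
As a toy illustration of the style of analysis the book develops, here is a 2^3 full factorial in R with an active two-factor interaction; the simulated effects are ours, and a saturated model like this one is screened by effect size (e.g., a half-normal plot) rather than by p-values.

    set.seed(3)
    # 2^3 full factorial in coded (-1, +1) units with a simulated response
    design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
    design$y <- with(design, 10 + 3 * A + 2 * B + 2.5 * A * B + rnorm(8, sd = 0.5))
    fit <- lm(y ~ A * B * C, data = design)     # saturated model: 8 runs, 8 terms
    coef(fit)                                   # A, B, and A:B stand out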

May 2, 2001

Permutation Tests: A Guide for Practitioners: Phillip Good, Information Research

This is a half-day course on permutation methods. It is intended for practicing statisticians and others with an interest in applying statistical methods. High school algebra is assumed, but no higher-level mathematics is required. Some familiarity with computer simulation would also be helpful. Attendees will be given historical background on resampling methods and a formal introduction to these methods. Emphasis will be placed on the wide variety of applications of the techniques and the computer-intensive nature of their implementation, along with many examples and "real world" applications. The course is intended for those who use statistical methods in their work. This includes practitioners in medicine, business, engineering, and the social sciences. It also will be useful to professors of statistics and those who do statistical research but may not be familiar with resampling methods and want to be updated on the latest advances in methodology and application. Dr. Good is the author of two popular texts on resampling methods. Resampling is a powerful approach that has only recently seen an explosion in applications, due to computational advances that make these computer-intensive methods practical.
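
The core resampling idea fits in a few lines of R; this two-sample sketch, with made-up data, is ours rather than Dr. Good's.

    set.seed(11)
    x <- c(12.1, 14.3, 13.8, 15.2, 14.9)   # made-up treatment values
    y <- c(11.4, 12.0, 13.1, 12.6, 11.8)   # made-up control values
    obs  <- mean(x) - mean(y)              # observed difference in means
    pool <- c(x, y)
    # Reshuffle group labels to build the permutation null distribution
    perms <- replicate(10000, {
      idx <- sample(length(pool), length(x))
      mean(pool[idx]) - mean(pool[-idx])
    })
    mean(abs(perms) >= abs(obs))           # two-sided permutation p-value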

October 4, 2000

The Grammar of Graphics: Designing a System for Displaying Statistical Graphics on the Web: Leland Wilkinson, SPSS, Inc.

"The Grammar of Graphics" (GOG) is the title of a recent Springer-Verlag book that encapsulates a new theory of statistical graphics. GOG is based on an algebraic model. It contrasts with the prevailing view toward classifying and analyzing charts by their appearance - a view that one might call Pictures of Graphics (POG). In POG, there are pie charts, bar charts, line charts, and so on. Not only are most books and papers on graphics organized by chart type, but so also are most charting programs and production libraries.By contrast, GOG begins with a strong statement: there is no such thing as a pie chart. A pie chart is a stacked bar graphic measured on a proportional scale and transformed into polar coordinates. Significantly, the description of simple charts in POG (such as a pie) are more complex in GOG and seemingly complex charts in POG (such as scatterplot matrices) are simple to describe in GOG. This contrast between surface POG descriptions and deep GOG specifications exposes not only previously unappreciated subtleties in the structures of common charts but also the existence of charts not generally known. GOG is ideally suited for designing a system for interactive Web graphics. Examples will be shown using a Java production library called nViZn (formerly GPL).

November 3, 1999

Logistic Regression: Mike Kutner

Data are said to be binary when each outcome falls into one of two categories, such as alive or dead, success or failure, true or false. Binary outcome data occur frequently in practice and are commonly analyzed using the logistic regression model. This workshop will emphasize the practical aspects of modeling binary outcome data using logistic regression, including checking the adequacy of the fitted models. Several examples from the health sciences will be presented to illustrate the techniques. Parallels between multiple linear regression modeling and multiple logistic regression modeling will be drawn throughout the workshop. Therefore, attendees of this workshop are expected to be familiar with multiple linear regression modeling techniques.
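
A minimal R version of the kind of model discussed, with a stand-in dataset of ours rather than a health-sciences example:

    # Logistic regression of a binary outcome (transmission type) on two
    # predictors; glm() parallels lm() from multiple linear regression
    fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
    summary(fit)                 # coefficients on the log-odds scale
    exp(coef(fit))               # odds ratios
    head(predict(fit, type = "response"))   # fitted probabilities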

September 18, 1998

Survival Analysis Extended to Recurrent Events Data with Repairable Products, Disease Recurrences and Other Applications: Wayne B. Nelson

Analysis of recurrent events data is an emerging area of survival analysis whose prominence is increasing due to its many and varied applications. Most life-data courses deal with a single endpoint, end of life or failure, which is modeled with a life distribution that must be estimated. In contrast, recurrence data are modeled with a stochastic process. This workshop provides a simple nonparametric model along with data plots and analyses, including point estimates, confidence intervals, and comparisons for populations with recurrent events. This recent methodology is often more appropriate than parametric methods based on a nonhomogeneous Poisson process, which depend upon often unrealistic assumptions, such as independent increments.
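
In the simplest setting, where every unit is followed over the same window, the nonparametric mean cumulative function (MCF) reduces to a few lines of R; this sketch uses made-up recurrence times and ignores the staggered entry and censoring that the full methodology handles.

    n <- 5                                           # units under observation
    event.times <- c(1.2, 2.5, 2.5, 3.1, 4.0, 4.7)   # made-up recurrences, pooled
    # At each event time the MCF steps up by (events at that time) / n
    mcf <- cumsum(table(event.times)) / n
    plot(as.numeric(names(mcf)), mcf, type = "s",
         xlab = "Time", ylab = "Mean cumulative number of events")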

October 5, 1996

Teaching Statistics to Non-Statisticians: A Panel Presentation

This year's chapter fall workshop will feature five of our chapter members who have taught statistics to non-statisticians. They will describe the specific topics in their courses and the software, texts, and audio-video materials they use, and will demonstrate statistical devices such as the quincunx, black box, helicopter, and Deming funnel. They will focus on the details of what has worked and what hasn't.

The workshop presenters were: