|
(Joseph Beck) The goal of this tutorial is to reduce the number of common, recurring problems in the statistical analyses of papers presented at the ITS, AIED, and EDM conferences. Many researchers are able to conduct a study that generates data and load that data into a statistics package. At that point, too many researchers run statistical tests without understanding the assumptions those tests make or the limits on the inferences they support. Even worse, many community members do not recognize these problems, so erroneous conclusions can be believed and built upon. The most common mistake is misunderstanding the independence assumptions made by most tests. It is frequently the case that collected data are not independent of each other, and “frequently” rises to “almost certainly” when doing educational data mining. Thus, the tutorial will spend considerable time on this topic and explain how a lack of independence violates the assumptions of various tests. We will cover a variety of recovery strategies, ranging from the very conservative (compute group means and use those, accepting the reduction in N) to the more aggressive (explicitly modeling confounding factors to achieve conditional independence). The other main topic is the misinterpretation of p-values. Even though “accepting the null” makes no semantic sense in the framework of hypothesis testing, researchers often do it, particularly when it agrees with their hypothesis. Thus, this tutorial will deviate slightly from its goal of not teaching statistics per se and will give a brief, intuitive explanation of statistical power. Once attendees understand statistical power, we can discuss the dangers of reporting statistically significant effects, particularly for large data sets, without any estimate of the magnitude of the effect. |
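
The conservative recovery strategy named in the abstract (compute group means and accept the smaller N) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the tutorial itself: the simulated data, the choice of the student as the grouping unit, the helper names `welch_t` and `simulate_condition`, and the use of Welch's t statistic. Both conditions are simulated with no true effect, but the observations from one student share a common offset and are therefore not independent.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def simulate_condition(rng, n_students, n_items):
    """Hypothetical data: {student_id: [scores]} with a shared per-student offset."""
    data = {}
    for student in range(n_students):
        offset = rng.gauss(0.0, 1.0)  # makes one student's scores correlated
        data[student] = [offset + rng.gauss(0.0, 0.5) for _ in range(n_items)]
    return data


rng = random.Random(0)
control = simulate_condition(rng, n_students=10, n_items=50)
treatment = simulate_condition(rng, n_students=10, n_items=50)

# Naive analysis: pool all 500 observations per condition as if independent.
flat_control = [x for scores in control.values() for x in scores]
flat_treatment = [x for scores in treatment.values() for x in scores]
t_naive, df_naive = welch_t(flat_treatment, flat_control)

# Conservative analysis: one mean per student, so N is the number of students.
means_control = [mean(scores) for scores in control.values()]
means_treatment = [mean(scores) for scores in treatment.values()]
t_group, df_group = welch_t(means_treatment, means_control)

print(f"naive:       t = {t_naive:.2f}, df = {df_naive:.1f}")
print(f"group means: t = {t_group:.2f}, df = {df_group:.1f}")
```

The naive analysis claims hundreds of degrees of freedom, while the group-mean analysis has at most 18, which reflects the number of independent units (students) actually sampled. Reporting the difference in means alongside the test statistic also gives readers an estimate of the magnitude of the effect, not only its significance.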