Post date: May 16, 2013 10:16:46 AM
(I've updated the post with some good examples provided by Suzanne van Gils at the bottom).
Also, read this blog post by Andrew Gelman.
I was drinking a beer with Rolf Zwaan and Iris Schneider yesterday when I brought up a topic that has been on my mind lately, namely the limited role of statisticians in the current reform in psychology. You would think that if you are passionate about statistics, you want to help people calculate them correctly in any way you can. Sure, your main interest might be finding a way to calculate effect sizes for multivariate statistics, or figuring out better ways to deal with missing data, etc., but given that there is a large audience of researchers who have to use statistics and look to you for guidance, you’d think some statisticians would be interested in helping a poor mathematically challenged psychologist out by offering some practical advice.
Let me illustrate what statisticians do instead, by referring to a paper I read a few weeks ago by Ruscio & Mullen (2012) about ‘Confidence Intervals for the Probability of Superiority Effect Size Measure and the Area Under a Receiver Operating Characteristic Curve’. It’s about effect sizes. More specifically, it compares 9 analytic and 3 bootstrap methods to construct a confidence interval for the A statistic, which is a non-parametric version of the common language (CL) effect size (I discuss CL in my effect size primer).
Let’s take a moment to let it sink in what these researchers did. First, there is the CL effect size. Only a few people know about it, and even fewer people use it (despite the fact that it is quite useful). Then, there is a non-parametric version of it, the A statistic. Despite its attractive properties (it is very robust against outliers and violations of parametric assumptions – who doesn’t want that in an effect size partner?), seeing it in practical use is a very, very rare thing indeed. The question of interest is how people can report confidence intervals for this rarely used effect size. Confidence intervals for effect sizes are rarely provided even in meta-analyses (which are pretty rare themselves). The authors compare these 12 different ways to calculate a CI around the A statistic. So we are talking rarely used, times rarely used, times rarely used. Now, if you happen to be the person in the world who really wants to report a confidence interval for an A statistic, you can read the paper. It’s excellent work. For the rest of you, I wanted to talk about why statisticians are doing so very, very little to make researchers’ lives easier.
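And to show how little effort it would take to make this practical: the A statistic itself is nothing scary. It is simply the proportion of all pairs of observations from the two groups in which a score from the first group beats a score from the second, with ties counting for half. A minimal sketch in R – using made-up data and a simple percentile bootstrap of the kind the paper compares – could look like this:

# Probability of superiority (the A statistic): the proportion of all (x, y)
# pairs in which x exceeds y, with ties counted as 0.5.
a_statistic <- function(x, y) {
  mean(outer(x, y, ">") + 0.5 * outer(x, y, "=="))
}

# Made-up example data: two independent groups.
set.seed(1)
x <- rnorm(30, mean = 0.5)
y <- rnorm(30)

a_statistic(x, y)  # point estimate; 0.5 would mean no effect

# Simple percentile bootstrap CI: resample each group with replacement,
# recompute A, and take the 2.5th and 97.5th percentiles.
boot_a <- replicate(2000, a_statistic(sample(x, replace = TRUE),
                                      sample(y, replace = TRUE)))
quantile(boot_a, c(0.025, 0.975))

That is obviously not the careful evaluation the paper provides, but it is the kind of ready-to-use snippet I wish more papers included.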
A comedian (I’m sorry I can’t provide a name, but it was some American late night show I think) said something along the lines of: ‘Liking a charity on Facebook is the least you can do. Literally. It is like doing almost completely nothing’. I’m reminded of this every time I hear a statistician or methodologist complain about researchers (they especially like to complain about psychologists) not reporting effect sizes, not pre-registering their experiments, not switching to Bayesian statistics, etc. Complaining about this is the least you can do. Literally.
When I wrote the manuscript on effect sizes, I was pretty frustrated with statistics papers that provided formulas, and nothing more. Surely, anyone who can compare 3 bootstrap procedures for the CI of the A statistic can also provide a spreadsheet that will help you calculate the CL effect size and the A statistic. If you care about people actually using this statistic, I’d say that making a spreadsheet is 10 minutes well spent. Note that I’m perfectly OK with examining the best way to calculate confidence intervals – I also write papers that are of interest only to a few people working on the same thing, and I enjoy doing it. But given that you do something that other people can use (which is one of the great things about being a statistician, I think), I’m talking about 10 minutes to make it easy for people to actually use the procedures you are talking about. Nevertheless, statisticians rarely provide spreadsheets with their papers (there are exceptions, of course). It often took me half an hour to get one specific formula to work in my spreadsheet. God knows how I survived 32 years in this world without knowing how to get an inverse sine formula working in Excel, but there you have it. Even if statisticians go beyond merely complaining about how researchers don’t do a good job, and write a paper about how researchers should calculate something, then providing some formulas is perhaps not the least you can do, but it is not enough. And yes, I know it was cost-effective for most statisticians to learn how to use R, but when you (or your fellow statisticians) taught me statistics, you put me behind a computer with SPSS. I didn’t choose to be taught how to use crappy statistical software (instead of R) when I had a world of time to learn new skills, but right now, and until a statistician takes over the Introduction to Psychology course I have to teach next semester, I’ll have to make do with SPSS. It’s fine to give a script in R, but it is not enough.
Yes, many researchers have problems with statistics. And there are some excellent attempts to teach people how to improve the way they report statistics (e.g., ESCI, by Geoff Cumming). Yes, we should pre-register. Yes, we should have understood years ago that flexible data analysis has severe consequences for the reliability of the conclusions we draw. And in the end, it’s each researcher’s responsibility to learn how to do a good job. But I am amazed by the limited practical significance (forgive the pun) of the work most statisticians have been doing. Sure, there are exceptions, most notably Cohen, but even Cohen never made a spreadsheet. Let’s look at some of the people who have made a significant contribution to improving the way psychologists do research in recent years:
1) Discovering Statistics Using SPSS. I’m pretty sure I have just made several statisticians physically ill by starting my list with this book. And yes, you can complain about many things in the book (but remember: Complaining is the least you can do. Literally). However, for many graduate students I know, this book is finally making it clear what they should do when they analyze their results. Not only which buttons to push, but also which assumptions to check. Oh, and did I mention Andy Field is not a statistician, but an experimental psychologist?
2) G*Power. Probably the most widely used program to perform a priori power analyses. The contribution this piece of software has made to improving the way researchers plan their experiments is impossible to exaggerate. Made by a team of experimental psychologists (for a sense of what such a power analysis boils down to, see the short R sketch after this list). http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/who-we-are
3) Open Science Framework. It (finally) allows people to pre-register their experiments, and is the most practical solution to move towards confirmatory research. Created by Brian Nosek and Jeffrey Spies, who are experimental psychologists. http://www.openscienceframework.org/project/4znZP/wiki/home.
4) PsychFileDrawer.org. Publication bias has been haunting the field for decades, but the first attempt to do something about it comes from, wait for it, experimental psychologists. http://www.psychfiledrawer.org/about.php
5) False-Positive Psychology. The paper by Simmons, Nelson, and Simonsohn has been cited over 232 times in perhaps 2 years, according to Google Scholar. It is one of those articles that make it crystal clear what happens if you don’t do statistics correctly. Sure, you can complain about some of their recommendations (e.g., using at least 20 subjects in each condition), but it got the message across. No statisticians were harmed, or involved, in writing the article. Instead, the authors are experimental psychologists.
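As an aside to point 2: for the simplest case, the question an a priori power analysis answers – how many participants do I need to detect an effect of a given size with a given power? – can even be asked in a single call to base R. The effect size and power below are just placeholder values:

# A priori power analysis for an independent two-sample t-test: how many
# participants per group are needed to detect a standardized difference of
# 0.5 with 80% power at alpha = .05? (The numbers are placeholders.)
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
# Gives an n of about 64 per group.

G*Power does vastly more than this, of course, but that is the basic question it answers.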
Are these examples biased? Am I missing some important practical contributions by statisticians? Perhaps (and feel free to let me know on Twitter, @Lakens). Perhaps we should fix things ourselves, and it’s silly to assume the work of statisticians should have practical significance. But perhaps this post will motivate some of them to put on their anthropological hats, come and take a look at the practical problems we are dealing with, and help us improve the way we work.
Suzanne van Gils contacted me, and noted:
In the field of Organizational Psychology, the interactions with statisticians are a bit better. For example:
Jeremy Dawson (Stats MSc and PhD(?)) provides an excellent website to plot interactions (two- and three-way) and calculate slopes and slope differences: http://www.jeremydawson.co.uk/slopes.htm
Chris Stride (stats) runs a statistical consultancy service called Figure it Out, with extremely useful workshops and manuals that are also accessible to non-statisticians: http://www.offbeat.group.shef.ac.uk/FIO/qualifications.htm
The Muthen couple (psychometrics) runs a website on Mplus with a million tips and a very helpful FAQ section. http://www.statmodel.com/
Finally, we have Preacher & Hayes (exp./quant. psy.), who provided SPSS macros that by now can calculate any moderation, mediation, or combination you can think of (a rough sketch of the bootstrapped indirect effect such macros estimate is given at the end of this post): http://www.quantpsy.org/medn.htm
In addition, there is a Facebook group in which both would answer any questions posted, up until last year (when it became too repetitive and time-consuming). The group is currently still open, so any previous answers can be found through Google.
Also, Edwards and Lambert (psy.) designed a user-friendly Excel macro (including a readme file) for mediation and bootstrapping that accompanies their paper on the topic.
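To give a sense of what the Preacher & Hayes macros mentioned above automate: the core of a simple mediation analysis is the indirect effect a*b, with a bootstrap confidence interval around it. A rough sketch in R, using made-up data and two plain regressions rather than the macros themselves, could look like this:

# Made-up data for a simple mediation model: x predicts m, m predicts y.
set.seed(1)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.1 * x + rnorm(n)
d <- data.frame(x, m, y)

# Indirect effect a*b: a is the effect of x on m, b is the effect of m on y
# controlling for x.
indirect <- function(d) {
  a <- coef(lm(m ~ x, data = d))["x"]
  b <- coef(lm(y ~ m + x, data = d))["m"]
  unname(a * b)
}

indirect(d)  # point estimate of the indirect effect

# Percentile bootstrap CI: resample rows of the data, recompute a*b.
ab_boot <- replicate(2000, indirect(d[sample(nrow(d), replace = TRUE), ]))
quantile(ab_boot, c(0.025, 0.975))

The macros handle far more complicated models than this, but the bootstrapped indirect effect is the basic idea.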