Update: Since a lot of people are finding this blog post, please note you can download the practical primer I've written about calculating and reporting effect sizes here: http://openscienceframework.org/project/ixGcd/ On this page, you can also download a spreadsheet to calculate effect sizes when you have the data, and my new effect size spreadsheet (From_R2D2) that you can use to calculate effect sizes from the published literature, or that can be used to convert between effect sizes.
Until approximately one month ago, I had the following understanding of effect sizes. If you do a t-test, you can calculate Cohen’s d by entering some numbers in an online form you get when you search for ‘online Cohen’s d calculator’. If you do an ANOVA, there is a checkbox in an option menu that will give you partial eta squared. If you report these numbers, reviewers will not complain. Now maybe it’s the availability heuristic talking here, but I feel my approach to reporting effect sizes is pretty representative of experimental psychologists.
I am tempted to write υ = .21 after an F-test in my next paper, just to see what will happen. Will reviewers simply assume υ is actually an effect size measure? I would not be surprised. Because the first rule of not understanding effect sizes is you don’t talk about not understanding effect sizes.
Just as in Fight Club, where nobody is supposed to talk about Fight Club but they all end up knowing where to go for the next Fight Club, I’ll talk about what I didn’t know about effect sizes. When I tried to calculate an a-priori sample size from the results of a paired-samples t-test, and G*Power asked me for Cohen’s dz, I thought I could ignore that tiny little z attached to the d without any problems. I thought there was just one Cohen’s d, and had no idea you always have to report which of the many different calculations you have used. Not that I could have answered the question of how I calculated Cohen’s d if I wanted to, unless ‘I got it after typing in some numbers in this online spreadsheet’ counts as an explanation.
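To make that little z concrete, here is a minimal sketch of Cohen’s dz for a paired-samples design, assuming the standard definition (the mean of the difference scores divided by their standard deviation, which equals t divided by the square root of the number of pairs). The function names and the difference scores are made up for illustration; they do not come from any real study.

```python
import math
import statistics


def cohens_dz_from_differences(differences):
    """Cohen's dz: mean of the paired differences divided by their sample SD."""
    return statistics.mean(differences) / statistics.stdev(differences)


def cohens_dz_from_t(t_value, n_pairs):
    """Cohen's dz recovered from a reported paired-samples t-value."""
    return t_value / math.sqrt(n_pairs)


# Hypothetical difference scores (condition A minus condition B) for 5 participants.
differences = [1.0, 2.0, 3.0, 4.0, 5.0]
n_pairs = len(differences)

# The paired t-value is the mean difference divided by its standard error.
t_value = statistics.mean(differences) / (statistics.stdev(differences) / math.sqrt(n_pairs))

dz_from_data = cohens_dz_from_differences(differences)
dz_from_t = cohens_dz_from_t(t_value, n_pairs)
print(round(dz_from_data, 3), round(dz_from_t, 3))
```

Both routes give the same number, which is why a reported t-value and sample size are enough to recover dz from a published paired t-test.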
I didn’t know that for a One-Way ANOVA, partial eta squared is the same as eta-squared. The fact that ηp² is often reported for One-Way ANOVAs indicates that researchers are either very passionate about unnecessary subscript letters, or rely too much on the effect sizes as they are provided by statistical software packages.
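A small sketch of why the two coincide, assuming a between-subjects One-Way ANOVA where the total sum of squares is just the effect plus the error. The sums of squares below are invented for illustration, and the function names are my own.

```python
def eta_squared(ss_effect, ss_total):
    """Eta squared: proportion of the total variation attributed to the effect."""
    return ss_effect / ss_total


def partial_eta_squared_from_ss(ss_effect, ss_error):
    """Partial eta squared: the effect relative to the effect plus its error term."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares from a between-subjects One-Way ANOVA.
ss_effect = 30.0
ss_error = 120.0

# With a single factor there are no other effects, so total = effect + error.
ss_total = ss_effect + ss_error

eta2 = eta_squared(ss_effect, ss_total)
partial_eta2 = partial_eta_squared_from_ss(ss_effect, ss_error)
print(eta2, partial_eta2)
```

As soon as a design has more than one factor, the total also contains the other effects, and the two values start to differ.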
Something else I didn’t know was that you can always calculate partial eta squared from the F-value and the two degrees of freedom associated with an F-test: ηp² = (F × df_effect) / (F × df_effect + df_error). For example, if an article gives F(1, 38) = 7.21, you can calculate that ηp² = (7.21 × 1) / (7.21 × 1 + 38) = 0.16. Try it. It really works. When I prepared my replication study for the reproducibility project by the Open Science Framework, I had to e-mail the author of the article for the raw data. A lot of hassle, which I now realize was completely unnecessary. Conveniently, the author of the article didn’t know I was wasting his time, and was extremely cooperative in trying to figure out how to calculate an effect size from the data, but it suffices to say there were some forms on internet websites involved, and no simple arithmetic you can do by hand.
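If you would rather not do it by hand, the same arithmetic fits in a few lines. This is only a sketch of the calculation described above; the function name and the argument names are my own choice.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from a reported F-value and its two degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# The example from the text: F(1, 38) = 7.21.
eta_p2 = partial_eta_squared(7.21, 1, 38)
print(f"{eta_p2:.2f}")  # prints 0.16
```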
Another thing I didn’t know is that when you are performing an a-priori power analysis for a within-subject design, you should not directly insert the partial eta squared value that SPSS provides into G*Power. G*Power by default uses a different way to calculate partial eta squared, and using the SPSS version will give you a wrong sample size estimate. It is an easy mistake to make. I only figured it out when I tried to compare sample size estimates from an a-priori power analysis for a paired t-test and a repeated measures ANOVA, and had to e-mail the G*Power team to ask for an explanation (who replied within an hour with the answer – they are great). There are published articles that make this mistake, and studies with a sample size that is assumed to lead to 95% power, while the actual power of the study is much lower.
However, the most important thing I didn’t know is how easy it is to understand effect sizes. Sure, there are many situations where there are different, all equally defensible, ways in which you can calculate an effect size. And calculating generalized omega squared for a 2×3 mixed model design where you’ve thrown in a covariate for good measure will probably take you the better part of an afternoon (but don’t worry, there will be only about 12 people in the world who are able to judge whether the value you calculated is correct or not). But for most practical purposes, and most of the studies you have done so far, it’s really pretty easy.
I wrote a short article that explains how you can calculate effect sizes for t-tests and ANOVAs. You might wonder why you would want to read it, given that I just explained how incredibly limited my knowledge of effect sizes was a month ago. I don’t blame you. I surely didn’t think I would be explaining to others how to calculate effect sizes. So why did I write the article? It started just as some note taking when I tried to figure out what to do. But then I started to get annoyed. There are some good books about effect sizes (e.g., Aberson, 2010; Cohen, 1988; Cumming, 2012; Ellis, 2010; Grissom & Kim, 2005; Maxwell & Delaney, 2004; Murphy, Myors, & Wolach, 2012) but I don’t expect you to spend the next 5 weeks reading them. Besides some minor annoyances (e.g., information being spread out over 2 dozen articles, a focus on between-subject designs despite the prevalence of within-subject designs in experimental psychology, and descriptions of a lot of different effect sizes and their unbiased estimates without guidance on which effect size to report in which situation), my major annoyance was that the articles provided formulas, and left it up to the reader to figure out what to do with them. I don’t know many statisticians, so I’m just going to assume they are empathic, friendly people, who understand most individuals are not very interested in statistics, and therefore try to make it as easy as possible for people to use formulas (for excellent examples, see ESCI (Cumming & Finch, 2001) and G*Power (Faul et al., 2009)). I would expect these pro-social individuals to realize that the larger part of the scientific community just wants to report the correct effect size with as little effort as possible (which is a very rational goal), and that authors would make it easy for researchers to calculate effect sizes by providing, oh I don’t know, a spreadsheet?
So in addition to the article, I made a spreadsheet (download here: http://openscienceframework.org/project/ixGcd/). It has a decision tree (which was a great suggestion by Job van Wolferen) to guide you to the correct calculation. You fill in the required numbers in the green cells, and the grey cells provide the output you need to report the effect size. A short sentence is provided as an example of how to report the outcome of the statistical test, and the effect sizes. It should be pretty easy to use. The article and spreadsheet are not yet peer-reviewed, so I can give no guarantees, but I’m pretty sure all calculations are correct. However, if you know more about this than I do, and you find some mistakes, let me know, and I’ll update the spreadsheet.
I've submitted the manuscript, because peer-review can only improve it, but I have no idea whether this would be interesting for a journal. However, what's more important for me is that I sincerely hope this is useful for some of you, and this will make it easier for you to report the information that will allow other people to perform a-priori power analyses when building on your work, or to include your studies in a meta-analysis.