Enter the meta-analysts

Meta-analysis is an advanced statistical technique which enables researchers to combine the results from a number of different studies. This has a great advantage: in general, the more participants a study has, the better it is able to demonstrate any genuine effect. By combining studies, one is in effect able to convert them into one much larger study, which ought to be much more capable of telling us what is really happening. A number of meta-analyses have been conducted on offending behaviour programmes; again, these have concentrated on the treatment of sex offenders, reflecting the current preoccupation with this kind of offending.
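As a rough illustration of how combining studies can reveal an effect that no single small study shows, the sketch below uses fixed-effect (inverse-variance) pooling of log odds ratios, one common textbook method. It is not necessarily the method Lösel and Schmucker used (many meta-analyses use random-effects models instead), and the five studies, their odds ratios and standard errors are invented purely for illustration.

```python
import math


def ci_95(log_or, se):
    """95% confidence interval for an odds ratio, given its log and standard error."""
    return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)


def pool_fixed_effect(studies):
    """Fixed-effect (inverse-variance) pooling of odds ratios.

    studies: list of (odds_ratio, se) pairs, where se is the standard error
    of the log odds ratio. More precise (usually larger) studies get more weight.
    Returns the pooled odds ratio and its 95% confidence interval.
    """
    weights = [1 / se ** 2 for _, se in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled_log_or = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return math.exp(pooled_log_or), ci_95(pooled_log_or, pooled_se)


# Five hypothetical small studies (invented figures). Each interval includes 1,
# so none of them is conclusive on its own.
studies = [(1.5, 0.30), (1.4, 0.35), (1.6, 0.32), (1.3, 0.30), (1.5, 0.28)]
for odds_ratio, se in studies:
    low, high = ci_95(math.log(odds_ratio), se)
    print(f"OR {odds_ratio:.2f}  95% CI {low:.2f} to {high:.2f}")

pooled_or, (low, high) = pool_fixed_effect(studies)
print(f"Pooled OR {pooled_or:.2f}  95% CI {low:.2f} to {high:.2f}")
```

With these invented figures every individual interval straddles 1, but the pooled interval lies entirely above 1: the combined "larger study" is precise enough to detect an effect the separate studies could not. The same arithmetic also means that any bias shared by the pooled studies is carried straight into the combined result.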

Lösel and Schmucker (2005) presented a meta-analysis which purported to show a good overall treatment effect for cognitive-behavioural sex offender treatment. They combined the results of 80 comparisons drawn from 69 studies; some of these had been formally published, while others were unpublished studies obtained by appealing to colleagues. This meta-analysis has been influential, and is widely quoted in support of offending behaviour programmes based on cognitive-behavioural principles. However, some aspects of it seem to have escaped wide attention:
  1. Lösel and Schmucker acknowledged the comments of Rice and Harris (2003) about selection bias and then completely ignored them; 84% of the comparisons they used failed to meet the Rice and Harris criteria for adequate control groups.
  2. They also acknowledged the “bound for publication” hypothesis but failed to realise that releasing studies to them was a form of publication in itself by the Copas definition (referred to in Statistics).
  3. They stated that there was “no difference” between different evaluation designs; their own data showed that this was not true.
  4. Nonetheless, they inadvertently presented data which enabled some examination of the ideas raised by Rice and Harris, and by Copas.
The following data are taken from Lösel and Schmucker (2005), Table 3. Bear in mind that they claimed there was no difference in the amount of treatment effect shown by different experimental designs. Lösel and Schmucker, in common with many other researchers, measured the strength of the treatment effect using the "odds ratio". This shows how much better the treated group did than the control group: an odds ratio of 1 means that the treated and untreated groups performed the same, while a ratio greater than 1 means the treated group did better. By the same token, a ratio smaller than 1 means that the control group did better.
Level on the Maryland Scale                          Odds ratio   No. of studies   Statistical significance (p)
Level 3 (two groups, no matching or randomisation)   2.08         17               < 0.001
Level 4 (matched-pairs design)                       1.19         6                Not significant
Level 5 (randomised)                                 1.48         6                Not significant

It is quite clear that the results were not the same regardless of research design. The only studies showing a statistically significant treatment effect were those at Level 3, which is known to incorporate a bias in favour of treatment groups.
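To make the odds ratio concrete, the sketch below computes one from a simple two-by-two table of reoffending outcomes. It assumes the orientation described in the text (values above 1 favour the treated group), which corresponds to dividing the control group's odds of reoffending by the treated group's odds; the exact computation in any particular study may differ, and all group sizes and counts here are hypothetical.

```python
def odds_ratio(treated_reoffended, treated_total, control_reoffended, control_total):
    """Odds ratio oriented so that values above 1 favour the treated group.

    Assumed convention: odds of reoffending in the control group divided by
    odds of reoffending in the treated group.
    """
    treated_odds = treated_reoffended / (treated_total - treated_reoffended)
    control_odds = control_reoffended / (control_total - control_reoffended)
    return control_odds / treated_odds


# Hypothetical groups of 100 offenders each (invented figures).
print(odds_ratio(10, 100, 20, 100))  # treated group reoffended less: ratio above 1
print(odds_ratio(15, 100, 15, 100))  # no difference: ratio of exactly 1
print(odds_ratio(20, 100, 10, 100))  # control group reoffended less: ratio below 1
```

In the first example 10% of the treated group reoffend against 20% of the control group, a ratio of rates of 2, yet the odds ratio is 2.25. Odds ratios exaggerate the simple difference in reoffending rates somewhat, which is worth remembering when they are quoted as measures of effect.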

Lösel and Schmucker also stated that more recent studies showed a greater treatment effect. This is frequently reported, and is normally attributed to the fact that cognitive-behavioural programmes were introduced only in the mid-1990s; since they also purport to be more effective, it would not be surprising if these new techniques improved treatment effects. However, the table below, also using data from Lösel and Schmucker's Table 3, casts doubt upon the claim.

Decade of studies   Odds ratio   No. of studies   Statistical significance (p)   Range of odds ratios found
1970s               2.03         14               < 0.001                        1.34 – 3.09
1980s               1.38         30               < 0.01                         1.08 – 1.77
1990s               1.27         17               Not significant                0.86 – 1.87

It is in fact clear that the odds ratios were higher in earlier decades and fell steadily as time went on. In the 1970s they were highly significant, in the 1980s reasonably significant, and in the 1990s not significant at all. It is not clear how this squares with Lösel and Schmucker's statement that more recent programmes were more effective.

One final point: the right-hand column shows the highest and lowest odds ratios found in each decade. In the 1970s and 1980s even the lowest odds ratios were higher than 1; that is, even the lowest showed some treatment effect, spurious or not. Studies reported in the 1990s, however, were more mixed: some had odds ratios below 1, meaning that the control group was doing better than the treated group. This is an alarming thought, but one which comes up again later.

Why should treatment effects apparently become smaller over time? There could be many reasons, but, coupled with the finding that the treatment effect diminishes as research designs become more rigorous, the most likely explanation is simply that research designs got better. An additional possibility, however, is that this is another demonstration of the "bound for publication" bias: Lösel and Schmucker compiled their meta-analysis in the early 2000s, and it is hardly likely that their colleagues would drag unsuccessful 1970s studies out of their archives when they could present much more recent work which looked much better.
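A small simulation can illustrate how selective release of favourable studies would inflate a pooled estimate. Everything in it is invented: the programme is assumed to have no effect at all (a true odds ratio of 1), study precision is drawn from an arbitrary range, and colleagues are assumed, hypothetically, to hand over only studies showing a significant benefit. It is a caricature of the "bound for publication" mechanism, not a model of what Lösel and Schmucker's colleagues actually did.

```python
import math
import random


def pooled_log_or(studies):
    """Fixed-effect (inverse-variance) pooled log odds ratio of (log_or, se) pairs."""
    weights = [1 / se ** 2 for _, se in studies]
    return sum(w * lo for w, (lo, _) in zip(weights, studies)) / sum(weights)


def simulate(n_studies=1000, seed=1):
    """Simulate studies of a programme with no real effect (true OR = 1)."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        se = rng.uniform(0.2, 0.6)   # hypothetical range of study precision
        log_or = rng.gauss(0.0, se)  # sampling error around a true log OR of 0
        studies.append((log_or, se))
    # Hypothetical release rule: only studies showing a significant benefit
    # (z > 1.96) are handed over to the meta-analysts.
    released = [(lo, se) for lo, se in studies if lo / se > 1.96]
    return studies, released


all_studies, released = simulate()
print(f"All studies:      pooled OR {math.exp(pooled_log_or(all_studies)):.2f}")
print(f"Released studies: pooled OR {math.exp(pooled_log_or(released)):.2f} "
      f"({len(released)} of {len(all_studies)})")
```

Pooling every simulated study gives an odds ratio close to 1, as it should. Pooling only the released ones gives a figure well above 1, even though the programme does nothing. The meta-analysis faithfully combines what it is given, but it cannot correct for what it is not given.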