The effectiveness of offending behaviour programmes

What are offending behaviour programmes?

The term “offending behaviour programmes” (OBPs) refers to a range of treatment interventions which are intended to train offenders in new ways of thinking and/or behaving, in the hope that this will reduce their likelihood of reoffending. There have been various trends in the provision of such programmes, mainly due to intellectual fashion rather than to any scientific justification for them.

Virtually all programmes are now described as “cognitive-behavioural” in nature. These techniques originated in the treatment of depression and anxiety, and have a fair degree of research support for their effectiveness in that field. The basic principle is to get people to challenge entrenched ways of thinking (their own or other people’s) and to try to develop new ones, leading to new patterns of behaviour. It is acknowledged that this is of no use unless it is followed up with practice at the new behaviour.

Earlier programmes (up to the 1960s) were based on “psychodynamic” (Freudian) principles, which have generally not been supported by research, either in this field or in others. They were derived from a mental illness model, according to which offenders were regarded as being similar to sick people who needed to be cured (this is a paraphrase!). It became apparent that offenders are not sick, but have antisocial behaviour patterns, which they might be helped to change. This newer view was more in tune with modern thinking, which emphasises self-determination and human rights, and has moved away from the idea that all antisocial behaviour is evidence of mental illness.

It was known that some factors could contribute to reduced recidivism, but these were general social, educational and economic factors, not geared to offenders’ specific patterns of offending behaviour; perhaps specific interventions using modern “scientific” techniques might be more effective. In the 1970s, the first steps were taken in developing behavioural methods (e.g., the use of aversion therapy to try to change sexual orientation in sex offenders). These methods were not dropped because they were found to be ineffective (most were never evaluated and little work was done anyway) but because of a change in fashion (Roxanne Lieb, DSPD conference, 2005).

There were a number of reasons for introducing these programmes, and the following is not intended to be an exhaustive list:

  1. Reconviction rates have always tended to be about 30% for first-time prisoners and about 70% for the rest; the evidence has mainly been that people pass through prison unchanged; pressure groups campaigned for something more constructive to be done.

  2. Imprisonment is expensive.

  3. Serious offenders (especially sexual and violent ones) raise public anxiety, putting pressure on politicians to provide yet more prison places.

  4. Victims were demanding more consideration: better “treatment” might reduce the risk of more victims, and reassure existing ones that their attackers were being reformed.

  5. Some measures, such as providing vocational training, tend to reduce reoffending generally. However, it was thought that modern psychological techniques might enable the Prison and Probation Services to reduce offending in those groups which cause most public anxiety.

  6. The Probation Service in particular was under a lot of pressure at the time, largely due to its perceived ineffectiveness, and needed a new direction. Offending behaviour programmes were seized upon as a means of saving the service. Many offending behaviour programmes are now conducted in the community and run by probation officers.

A few convincing individuals managed to persuade the Home Office that modern “cognitive-behavioural” techniques held the answer. This claim went far beyond what was known at the time, although there was a “What Works” literature which purported to feature effective programmes (most of which had not been scientifically evaluated). Programmes were given the go-ahead using special funding (not from the regular budget), on the understanding that they would be inspected and monitored to ensure adherence to treatment principles and so avoid “therapeutic drift”. Essentially, therapeutic drift means departing from the approved methods and goals of treatment, so that little progress is made. Traditional psychotherapists are often accused of this, resulting in treatment becoming prolonged and of little value.

It was originally (and rightly) decided that programmes would be evaluated for effectiveness, but most of them never have been, and probably won’t be. It was also decided that evaluators would not be the same people who ran the programmes, in order to prevent bias in the evaluations, but this objective too has slipped. An Accreditation Panel was set up to make sure that programmes conformed to the principles thought to be effective; it reviews programmes before and during implementation.

There are many arguments about what should have happened next. What did happen was that programmes were rolled out nationwide without any proper piloting. No less than 60% of the budget went on sex offenders, who constitute 10% of the prison population and commit 1% of crimes. Evaluations were carried out by teams including people responsible for running programmes. Most programmes were not evaluated at all. When they were, the studies used to evaluate them were methodologically weak, and likely to show a treatment effect even where none existed (see below). Accreditation was carried out by a clinical team committed to the idea that programmes were effective. Results of evaluations were not always published. Programmes were devised on a rigid "one size fits all" basis, inimical to recognising the variety of human behaviour. A major objection raised by prisoners and their legal representatives to these programmes is that they are not in any way tailored to the individual, and are therefore unlikely to be effective.

Evaluating treatment outcomes is a major problem in all fields dealing with human behaviour, whether they are medical, psychological, educational, or anything else. Part of the problem is the known tendency for almost any intervention to have some kind of impact. For example, in medicine (and often in psychology) people tend to feel better just because they know something is being done, even if they are being given useless imitation medicine. This well-known "placebo" effect is just one of the problems. Thus, simply trying out a treatment and seeing if people have (or claim to have) changed afterwards is not enough.

The Maryland Scale

In recent years, researchers have tended to adopt the Maryland Scale of rigour in research design. This scale has five levels, and the higher the number the more rigorous the research design. These five levels are:

Level 1. Correlation between a crime prevention programme and a measure of crime or crime risk factors at a single point in time. (Essentially, this means an observational study from which no conclusions about causes can actually be drawn).

Level 2. Temporal sequence between the programme and the crime or risk outcome clearly observed, or the presence of a comparison group without demonstrated comparability to the treatment group. (This means a group underwent the treatment, and their behaviour changed but there was no adequate demonstration that the change was actually caused by the treatment).

Level 3. A comparison between two or more comparable units of analysis, one with and one without the programme. (This is generally accepted as the minimum level which can be claimed to show a difference between treated and untreated groups; however, recent evidence suggests that even this is not adequate (Rice and Harris, 2003; Weisburd, 2003) because there is a built-in selection bias which causes the treated group to look better than it really is).

Level 4. Comparison between multiple units with and without the programme, controlling for other factors, or using comparison units that evidence only minor differences. (The best example of this is probably a prospective matched pairs design; in this design, pairs of individuals are selected to be as like each other as possible in all known relevant respects, such as risk factors. Then one is assigned to the treatment group and one to the control group. If either drops out, his "twin" is also removed, thus maintaining comparability between the two groups).

Level 5. Random assignment and analysis of comparable units to programme and comparison groups. (This is the randomised controlled trial design (RCT), which is now mandatory for the evaluation of medical treatments. In this design, a large pool of individuals is allocated at random to either the treatment or the control group, the randomisation ensuring that the two groups are comparable. Its main drawback is that for randomisation to work properly the groups must be very large, and so this design costs a lot of money).
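The prospective matched pairs design described under Level 4 can be sketched in a few lines of Python. This is only an illustrative sketch: the offender records, the single numeric "risk" score used for matching, and the coin flip used to decide which member of a pair is treated are all assumptions made for the example, not details taken from any actual study.

```python
import random

def matched_pairs(offenders, rng):
    """Pair offenders with adjacent risk scores, then assign one member of
    each pair to treatment and the other to control.

    Assumption: matching is done on a single hypothetical 'risk' score and
    the within-pair assignment is decided by a coin flip.
    """
    ranked = sorted(offenders, key=lambda o: o["risk"])
    pairs = []
    # Step through the ranked list two at a time; an odd one out is left unpaired.
    for i in range(0, len(ranked) - 1, 2):
        a, b = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:
            a, b = b, a
        pairs.append({"treatment": a, "control": b})
    return pairs

def drop_incomplete_pairs(pairs, dropouts):
    """If either member of a pair drops out, remove his 'twin' as well."""
    return [
        p for p in pairs
        if p["treatment"]["id"] not in dropouts
        and p["control"]["id"] not in dropouts
    ]

# Hypothetical data: ten offenders with made-up risk scores.
rng = random.Random(0)
offenders = [{"id": i, "risk": rng.randint(1, 10)} for i in range(10)]

pairs = matched_pairs(offenders, rng)
remaining = drop_incomplete_pairs(pairs, dropouts={3})

print(len(pairs), "pairs formed;", len(remaining), "left after one dropout")
```

Because a dropout takes his matched partner out of the analysis with him, the two groups stay comparable however many people fail to complete the course.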

There is no doubt that a majority of studies which have been conducted to evaluate the effectiveness of offending behaviour programmes do show a treatment effect. However, there is considerable doubt about the reliance which can be placed upon these studies. The reason for this is that the vast majority conform only to level 3 on the Maryland Scale. Why is this a problem?

The answer to this is that a great many people drop out of programmes without completing them. Still others refuse even to enter the programmes. There is now considerable evidence that both of these groups pose a higher risk of reoffending than the more conforming prisoners who consent to undergo “treatment”. Consider the table below:

Type of offender                    In Treatment group?        In Control group?
Refusers (refuse treatment)         No                         Yes
Dropouts (do not complete)          Depends on the study       Yes
Completers (complete the course)    Yes                        Yes
It is obvious that all three types of individual will be present in the control group. However, of the two high-risk groups, one (refusers) will be excluded from the treatment group by definition, and the other (dropouts) may or may not be present in it, depending on how the study is conducted. No one will be excluded from the control group; indeed, its members are very unlikely to know that they are even in it.

As Rice and Harris (2003) have pointed out, this produces a built-in bias in favour of the treatment group. High-risk offenders are excluded from the treatment group but not the control group, thus making the treatment group look good by comparison. There is absolutely no way of correcting for this problem using a level 3 design. This is why this design is generally regarded as discredited in the field of treatment evaluation, and would not be acceptable in the field of medicine, for example. There is one further problem: in some studies it is distinctly possible that people who refuse to take part in the treatment group are allocated to the control group, thus increasing its risk even further, and making its reconviction performance much worse by comparison. Researchers do not generally report details like this, so we cannot know for certain, but even the possibility raises further serious questions.
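The size of this built-in bias can be illustrated with a small worked calculation. The Python sketch below assumes, purely for illustration, that refusers and dropouts each make up 20% of the offender population and are reconvicted at a rate of 60%, while completers make up 60% and are reconvicted at 30%. None of these figures comes from Rice and Harris or any other study; they are invented for the example. Crucially, the treatment is assumed to have no effect whatsoever.

```python
# Hypothetical population shares and reconviction rates (invented figures).
# The treatment is assumed to have NO effect: each type reoffends at the
# same rate whether or not it is treated.
POPULATION = {
    "refuser":   {"share": 0.2, "reconviction": 0.6},
    "dropout":   {"share": 0.2, "reconviction": 0.6},
    "completer": {"share": 0.6, "reconviction": 0.3},
}

def group_rate(included):
    """Expected reconviction rate of a group containing the given offender
    types in the same proportions as they occur in the population."""
    total_share = sum(POPULATION[t]["share"] for t in included)
    reconvicted = sum(
        POPULATION[t]["share"] * POPULATION[t]["reconviction"] for t in included
    )
    return reconvicted / total_share

# Control group: nobody is excluded.
control = group_rate(["refuser", "dropout", "completer"])

# Treatment group, counting completers only (refusers and dropouts excluded).
treated_completers_only = group_rate(["completer"])

# Treatment group when dropouts are counted (refusers are still excluded).
treated_with_dropouts = group_rate(["dropout", "completer"])

print(f"Control group:                 {control:.1%}")
print(f"Treated (completers only):     {treated_completers_only:.1%}")
print(f"Treated (including dropouts):  {treated_with_dropouts:.1%}")
```

Under these assumed figures the "treated" group appears to be reconvicted about twelve percentage points less often than the control group when only completers are counted, and still shows an apparent advantage when dropouts are included, even though the treatment changes nothing.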

Rice and Harris, having established that level 3 studies could not decide the question, went on to consider level 4 and level 5 studies. Very few of these have been published, and Rice and Harris could not find a single one that had shown a treatment effect.

Rice and Harris have been largely ignored. The majority of evaluation studies published continue to be at level 3 on the Maryland Scale. Why should this be, when it has been conclusively shown that such studies are not adequate? The fact is, there are a lot of professionals who just don’t want to see this: denial is not the sole province of offenders. In a popular book on psychiatric and psychological legal testimony, Margaret Hagen (1997) stated: “There would be a lot of people out of work. With so much at stake, it is too much to expect the truth”. Many researchers, including many who hold important positions in the Prison and Probation Services, have built professional reputations and workplace empires on this work. They have a deep-rooted belief in its value, and simply cannot be objective.