20140826_R4

Source: BBC Radio 4

URL: N/A

Date: 26/08/2014

Event: Jolyon Jenkins presents Everything we Know is Wrong

Credit: BBC Radio 4

People:

    • Professor John Crabbe: Professor of Behavioral Neuroscience
    • Dr. Daniele Fanelli: Visiting professor, University of Montreal
    • Professor Hal Herzog: Professor of Psychology, Western Carolina University
    • Professor John Ioannidis: Professor in Disease Prevention, Stanford University
    • Jolyon Jenkins: BBC producer
    • Bridget Kendall: BBC correspondent
    • Professor Marcus Munafo: Professor of Biological Psychology, Bristol University
    • Professor Brian Nosek: Social psychologist, University of Virginia
    • Dr. Simone Schnall: Social psychologist, University of Cambridge

Bridget Kendall: Stress is one of the diseases of modern living...

Jolyon Jenkins: This is a very young Bridget Kendall on Newsnight, introducing an item about the health effects of pet ownership.

Bridget Kendall: Doctors have discovered that by simply stroking a pet, blood pressure is brought down...

Male voice 1: We looked at heart patients who had had a heart attack or had chest pain. Pets made a difference in survival...

Male voice 2: The blood pressure was consistently reduced, when the dog was present.

Male voice 3: There's something about interacting with pets, which is calming.

Jolyon Jenkins: This isn't a programme about pets - pets are just a way in. But it seems like a classic example of a problem in science - the so-called "decline effect". It's the tendency for everything you thought you knew to turn out to be wrong. Or, at least, less true than you thought. Back in 1980, a researcher called Erica Friedman did a one year follow-up study of a hundred people who'd had heart attacks.

Hal Herzog: What she found was that whether or not people had pets was a major factor in the death rates -

Jolyon Jenkins: This is Hal Herzog, a psychologist from Western Carolina University, author of Some We Love, Some We Hate, Some We Eat. He's an expert on anthrozoology.

Hal Herzog: - so that people with pets had one fourth the death rate as people who did not have pets. And that was a very strong effect. [Sound of dog barking.] Some studies have replicated that, but other studies have not. And so, for example, there is a recent study which found it's the opposite! Which - and it was a study of 400 people who had suffered heart attacks, and what they found was that when you looked at both the remission rate and the death rate, that pet owners were twice as likely to either die or have a second heart attack, as were people that owned [he meant to say "didn't own"?] pets. [Sound of cat growling.]

Jolyon Jenkins: Thirty years on, there are some bits of evidence that suggest that, in some ways, having a pet may be good for you. But it's hardly conclusive.

Hal Herzog: If a person is lonely, and they're depressed or they have cardiac issues, it's probably a mistake for them to assume that by getting a puppy, that your life is suddenly going to get better - their life could become worse.

Jolyon Jenkins: Why does a dog make you worse?

Hal Herzog: How can a dog make you worse? Let me count the ways... [Laughs.] Er, number one: your dog can bark and make your neighbour angry - in the United States, dogs are the second biggest cause of conflicts between neighbours. Five million people in the United States are bitten by dogs, 85,000 people in the United States are taken to hospital emergency rooms because they trip over their pet. And then they don't necessarily make you happier. For example, I talked to a - it was a newspaper reporter, actually, from New York, had just recently moved to Manhattan and she was lonely. And so she'd read all this literature on the "pet effect", and so she went out and got herself a dog. And she never bonded with the dog. And so the magic never happened, and she was still lonely, although in this case she was lonely with this animal that she didn't particularly like and that she had to take outside and go for walks three times a day, picking up its poo. By the way, I'm not the Grinch - I'm a pet lover myself, but what I'm trying to do is I'm trying to take a look at this really large body of literature objectively.

Jolyon Jenkins: What's true for dogs seems true across science - you start by discovering an effect that seems true and strong, but over time, as more and more people study it, the effect seems to shrink. Here's Marcus Munafo from Bristol University.

Marcus Munafo: I was looking at the literature, trying to identify genes that were robustly associated with whether or not we smoke, for example. And something that began to emerge, were patterns in these literatures which suggested, for example, that the strength of evidence that a particular gene is associated with a particular outcome weakened over time.

Jolyon Jenkins: Daniele Fanelli at the University of Montreal gives the example of so-called second-generation antipsychotic drugs.

Daniele Fanelli: The literature shows that, compared to the first studies, over time, the effects that are recorded have gone down.

Jolyon Jenkins: It happens in all sciences. Psychologist Brian Nosek from the University of Virginia.

Brian Nosek: There is what appears to be a pervasive decline effect, and we've been observing that in some of our research, as well. Every theory looks much bigger in its initial phases than in its final phases.

Jolyon Jenkins: Looked at one way, the decline effect seems like a kind of cosmic revenge on scientists. It's as if the truth is a perishable commodity, which wears off. Even if you don't buy that, there's something very strange going on. At the heart of science is the idea of replicability - if you discover something, other people should be able to repeat your experiment and get the same result. If something only works for you, then it's no use. But what if different laboratories, doing the same experiment, get different results? Twenty years ago, this was a possibility that troubled Professor John Crabbe, a neuroscientist in Oregon. He was working with mice.

John Crabbe: I think there was the assumption, among a lot of folks, that mouse behaviour should be absolutely stable and replicable, from laboratory to laboratory. But there's not a lot of data to back up that assumption.

Jolyon Jenkins: So he asked "What if you did the same experiments on the same genetic strains of mice at the same time in three different laboratories?" He teamed up with colleagues in Edmonton, Canada, and in New York.

John Crabbe: We all used the same source of animal bedding in the cages, we fed them the same food, we had them on the same light-dark cycle, we tested them at exactly the same time of day, and we all handled them by picking them up by the tail instead of - some people handle mice by forceps, and we didn't do that.

Jolyon Jenkins: You went to great lengths, didn't you.

John Crabbe: Does "anal-retentive" have a hyphen? You know, I think this was approaching the level of OCD.

Jolyon Jenkins: All three labs did the same experiments. How much do mice run around in novel environments? How does their behaviour change if they drink alcohol or are given cocaine?

John Crabbe: There were significant effects of the laboratory occasionally that cut across strain differences. So mice were less active, overall, in Edmonton, Alberta, than they were in Portland.

Jolyon Jenkins: Same mice.

John Crabbe: Same mice. And we don't know why. But the important finding was that the pattern of strain differences was different in the three different laboratories.

Jolyon Jenkins: So there's something different going on in the labs.

John Crabbe: Mm-hm.

Jolyon Jenkins: But we don't know what.

John Crabbe: We don't know what. One of the things that is an obvious possibility is how they respond to the specific experimenter.

Jolyon Jenkins: So what's different about the experimenters? Is it the way they look?

John Crabbe: It's possible.

Jolyon Jenkins: In one paper you suggest that maybe they had different smells, different odours, the experimenters.

John Crabbe: Sure. No doubt. I mean, rodents in general are pretty reactive to pheromones.

Jolyon Jenkins: So where does that leave the possibility of doing experiments with mice? In a way, you might argue that this just says "Well, we're never going to be able to standardise the conditions so we might as well give up on mice".

John Crabbe: Well, you might as well give up on science. The take-home message should be that you need to be careful, because if you can't reproduce a finding in your own laboratory, there's absolutely no reason to believe that another laboratory's going to be able to reproduce that finding.

Jolyon Jenkins: And often they can't. Some findings decline away to zero. Professor John Ioannidis, originally from Greece but now at Stanford, has spent the last three decades casting a sceptical eye over scientific literature, and wondering how much of it is actually true. Ten years ago, he did a survey of nearly 50 of the recent top papers in the medical journals.

John Ioannidis: These were papers that had been cited by more than a thousand other papers in the scientific literature. We're talking about the crème de la crème of medical research. And for each one of these papers, I tried to find whether there was a subsequent study that was larger, better controlled and more conclusive, to test what the original highly-cited paper had proposed. Five out of six of the claims that were based on non-randomised data had been proven to be wrong, and about 25% of the claims based on randomised data were also found to be either completely wrong or grossly exaggerated. Since then, there have been other studies looking at other findings in different fields, and the results usually are pretty disparaging. So, for example, there have been efforts to replicate experiments, even involving the original investigators, and two papers that have done that with a large number of experiments have shown that about 70 - 90% of the original claims in papers that have appeared in the very best journals, the highest impact factor ones, by very well-trained and visible teams of investigators, they cannot be reproduced. So the failure rate is about 70 - 90% in that domain. We have experience from genetic associations - 98% to 99% of the previously claimed associations of genes with disease could not be replicated.

Jolyon Jenkins: So is it actually your belief that the scientific journals are mostly full of results that aren't true?

John Ioannidis: If you take an average paper, it's very likely that this is the case.

Jolyon Jenkins: According to John Ioannidis, the academic science industry is pretty much organised to produce lots of novel and interesting results but not necessarily true ones. He did some statistical modelling, and, based on reasonable assumptions about how research effort is organised, came up with results summarised by the title of his provocative paper "Why most published research findings are false". It starts with the fact that too many studies are too small or underpowered to find anything out. Statisticians talk about the power of a study, by which they mean the ability of a test to detect an effect if the effect actually exists. If the power is too low, a study will usually miss effects that really do exist, and the positive results it does produce are more likely to be flukes. But there's a twist. When an underpowered study finds something that's actually true, the chances are that the effect will seem bigger than it really is.

John Ioannidis: If you run a very small study and you detect an effect - let's start with the assumption that the effect is really out there, so indeed that risk factor is causing some trouble but it has a small effect, overall. And let's say that you run a small study to try to find that, and you do detect a significant signal. Then what happens is that you have found something that is genuine, but the magnitude of the effect that you have discovered is inflated, and it could be tremendously inflated.

Jolyon Jenkins: So for it to cut through, it has to appear to be much bigger than it really is.

John Ioannidis: Exactly, because otherwise it wouldn't have been discovered. So in a sense, you're suffering from what we call the "winner's curse". You're the winner, you have discovered that, which is a genuine discovery, but in a way you're unlucky because you're finding an effect that is disproportionate, it's extreme compared to what reality is.

Jolyon Jenkins: Underpowered studies don't make scientific sense but they do make political sense, in the cutthroat world of academia. If you're under pressure to come up with publishable research findings, then the last thing you should do is throw all your resources into a big study that might come up with nothing. It'll look as if you've just been wasting your time. Better to hedge your bets in the hope that at least one of your studies comes up with the goods. At Bristol University, Marcus Munafo did a survey of what's published in his field, neuroscience, and found that there was a chronic problem with studies that had too little power to be useful.

Marcus Munafo: All of the incentive structures point towards us finding new things and lots of new things. And that means that if you've got a finite pot of cash to devote towards your research, you're better off, strategically, dividing that into smaller chunks to run multiple studies rather than investing it all in one question. So each one of those individual studies will give you an answer, and some of them may well be correct, but even those which are correct will probably be overestimating the size of the - of any true effect, so you run the risk, in small studies, of either finding evidence for an association that isn't there, or is in the opposite direction to reality, if you like, or of overestimating any true effect.

Jolyon Jenkins: For each individual scientist, it makes career sense, but in the long run it's incredibly wasteful, because...

Marcus Munafo: The point is that if you have a long sequence of small studies that are giving inconclusive or inconsistent findings, that literature can carry on for far longer than it would have done if someone had, earlier on in the process, simply run a larger, more authoritative, definitive study that would have given a clear answer. So we showed, in a review of that particular animal literature, that the total number of animals used over the lifetime of that literature was far more than would have been required in a single authoritative study. But the problem is that that single authoritative study would have needed to have been much larger than is the norm in that particular field, and so people are resistant to increasing their sample sizes by an order of magnitude because there are financial implications and, in the case of animal studies, very strong ethical implications. And so there are these competing demands, if you like, between the need to have a large authoritative study and the need to use resources sparingly and to reduce the use of animals in research as much as possible.

Jolyon Jenkins: So financial incentives are driving scientists to do more and more research that produces unreliable results. But you wouldn't know that, from the literature. There, everything is positive - the discoveries keep on coming.

Daniele Fanelli: Throughout the scientific literature, scientists are reporting more and more positive results.

Jolyon Jenkins: Daniele Fanelli, in Montreal.

Daniele Fanelli: And it's quite a dramatic increase, and it's something that you cannot explain, scientifically. So there is very little doubt in my mind that something has been changing in science, over the last - between 20 and 40 years. In some disciplines, above 95% of what is reported are positive results.

Jolyon Jenkins: One well-known explanation is that journals simply aren't interested in publishing negative results. But that doesn't quite explain why psychology, for instance, has far more positive results than, say, astrophysics. Another explanation is that scientists themselves sit on the negative results. Hal Herzog, the dog man, started asking around at conferences to see if there were any unpublished studies that showed that pets weren't good for human health.

Hal Herzog: And what I found was three studies which found no beneficial impact of pets on their owners, but yet none of these have been published.

Jolyon Jenkins: Why do you think they don't get published?

Hal Herzog: What worries me is that people sometimes don't publish them because they don't like their own results. One of the problems with my field is that it attracts people that are animal-lovers. And that's great. But there's a downside to that. I think we have to be careful about being blinded by what we want to find, rather than what we actually find.

Jolyon Jenkins: But probably the biggest reason for the increase in positive results - or, rather, seemingly positive results - is the career pressure on scientists. Daniele Fanelli has some evidence for this.

Daniele Fanelli: For the simple fact of having a first author working in the United States, studies tend to report stronger results, no matter what the research question is, to begin with. We hypothesise, at the moment, that it has to do with how scientific careers are decided upon in the United States more than elsewhere. That is: your publication record and where you published, and so on, have a greater influence on your career opportunities in the US than in other countries.

Jolyon Jenkins: So this is quite a dangerous thing, isn't it, that if scientists are under pressure to publish, because of their careers, that in itself is distorting the whole scientific process.

Daniele Fanelli: Yes, it's completely unscientific logic to link career advancement upon success and impact measured in this way.

Jolyon Jenkins: According to John Ioannidis, it's not only possible but pretty much obligatory to hype your findings.

John Ioannidis: Let's say that I'm spending two years of my life performing a study and analysing these data. And according to my original intention and my original protocol, I see absolutely nothing that seems to be highly statistically significant. I have two options. One is to acknowledge that I have found nothing new, and then probably no-one will want to publish that in a major journal - the promotion committees will say "Well, what have you contributed?" Or I can start exploring that study and that dataset further - I can start data-dredging and coming up with some results that seem to be interesting - they may even be highly significant, in statistical terms - but now I have really deviated from my original intention, I'm entering an area where everything is possible. I can get any result that I want.

Jolyon Jenkins: You can find a pattern in anything, effectively. It's like seeing faces in shadows. I mean, if you look hard enough, you will always see something that seems to you to be meaningful.

John Ioannidis: I think it's a very good analogy. Literally, there's no result that you cannot make it seem plausible, even though it's a completely [sic] red herring and nothing that has any seriousness.

Jolyon Jenkins: This is getting close to fraud, but it's a kind of allowable fraud. Out-and-out fraud seems fairly rare. Daniele Fanelli.

Daniele Fanelli: On average, 2% will admit to having either manipulated or fabricated or falsified their result at least once. Between 12 and 14% know - or think they know a colleague who did these kinds of behaviours.

Jolyon Jenkins: Of course, you have to take those figures with a pinch of salt. In the old days, scientific fraud was blatant. There were the scientists doing skin grafts on mice who coloured the skin in with a felt pen to make the grafts look more convincing. Fabricating data like that is obviously wrong, but there's a more subtle kind of fraud, which Daniele Fanelli calls "grey areas".

Daniele Fanelli: If you ask people "Did you ever write a paper stating, as the main hypothesis, something that was effectively a chance finding in your research?", about 30% will recall having done something like that. They report these higher numbers because they don't really think it's entirely wrong to do.

Jolyon Jenkins: So here's how it might work. You start with a hypothesis - let's say that a new drug will lower blood pressure. You give half your patients the new drug and you give the other half a placebo. After a while, you notice that the patients getting the drug don't actually have lower blood pressure but the male patients seem to be growing more hair on their head. So you change your hypothesis. "The drug", you speculate, "promotes hair growth in men". And you've got a result, which you can publish. Is that wrong? Well, yes it is. It really is. If you come up with your hypothesis after you gather the data, you might just have got a chance result. My impression from talking to working scientists, off the record, is that this kind of thing happens all the time. And they don't think it's wrong, because they persuade themselves that they've found something significant.

Daniele Fanelli: The fundamental problem, when we're talking about these grey-area behaviours, is that people don't describe them in the paper. People will not present those as potentially chance results.

Jolyon Jenkins: It's science-y but it's not science. But surely the truth will out - people will test your findings, and if they're false you'll get found out. Surprisingly, not a lot of scientific effort goes into replicating research findings. Psychologist Brian Nosek.

Brian Nosek: There was a study published in 2012 that tried to count how frequently this happens. And they found that 1% of the published literature could be described as replications of prior results.

Jolyon Jenkins: Why is the figure so low?

Brian Nosek: I can't make a career out of doing replications of other work. There is an ecosystem of universities, of grant-funding agencies, of journals, that all are perpetuating this system of rewards that is only saying "Go for the next innovation". At the expense of replication.

Jolyon Jenkins: I suppose it would probably be even worse for your career if you failed to replicate something, and the person whose study you were trying to replicate was influential in the field. I can imagine that it's not great for your career, particularly when you're starting out, to try and undermine something which was being established by someone quite senior.

Brian Nosek: There certainly are risks for that. It's also similar risks for pursuing something innovative that challenges someone more senior career [sic].

Jolyon Jenkins: So the prudent thing to do is neither to try and replicate something exactly, nor to do something completely novel, but to steer a kind of middle path, to be reasonably conservative but just tinker a little bit with what's already known. Is that, is that the route to success? To sort of jump on the bandwagon but not to try and derail the bandwagon?

Brian Nosek: That is one route of success, and another would be to be innovative but not clash with any existing theories.

John Ioannidis: It's very difficult to remain objective and to be willing to accept that at some point, maybe you get some results, or other people will get results, that completely annihilate your theory. It's very likely that, consciously or unconsciously, each one of us will try to defend their theory. And that could sometimes lead to the creation of bubbles of evidence, or quote-unquote "evidence", where people are generating results that just fit a given perspective or a given theory or a given explanation of reality, but it's not really reality.

Jolyon Jenkins: But things are changing. There are now several projects around the world systematically trying to reproduce published results, in an attempt to find out how many are actually true.

Brian Nosek: One of those projects is called the Reproducibility Project. So we picked three journals from 2008, and particular studies from those issues were in the eligible set to select from. Our early returns are that about a third of the results that we've examined are reproducing the original results very cleanly. So it does suggest that there is less reproducibility than most might have assumed.

Jolyon Jenkins: But trying to replicate or disprove other people's theories can cause big problems. One of the people whose work came under scrutiny was Simone Schnall, a Cambridge psychologist who published research claiming that people's moral judgements were influenced by how clean they felt - if you primed them with sentences about cleanliness, they tended to be less severe in their moral judgements. Before the replicators started their work, they approached Simone Schnall to get her buy-in.

Simone Schnall: I said "yeah", I've shared all my materials with them - I was quite flattered, in fact, that they selected my work and because I was very confident in the effect, as well. And I reviewed the proposal to say that everything was accurate.

Jolyon Jenkins: But her confidence was apparently misplaced. The replicators in America couldn't reproduce her findings, and the main investigator posted on his blog that it was an "epic fail".

Simone Schnall: When I saw that, I was rather disturbed, to be honest, and especially since an email was circulated to many social psychologists, so it was widely circulated and it was distributed over Twitter, and so on.

Jolyon Jenkins: Other people pitched in. One senior Harvard academic described the replicators as "shameless little bullies".

Simone Schnall: The bullying really comes in when, after a result fails to replicate, people draw all kinds of inappropriate conclusions about what it means for the person who did this initial research. On social media, people make negative comments about the finding, they draw all kinds of conclusions about the person who did the research, they treat it as if one failed replication can invalidate an entire line of research, and they also voice accusations of improper research behaviour.

Jolyon Jenkins: Simone Schnall also took to social media, explaining at great length why she thought the replicators' study didn't disprove her original findings. Basically, it came down to the fact that she had tested British students and they had tested Americans, who seemed to have stronger moral judgements. It would take hours to disentangle who's right, but what's interesting is how personalised it all became. [To Simone Schnall]. Do you feel that your own reputation has been damaged?

Simone Schnall: In a recent interview for a substantial grant, somebody questioned my work, and yes I suspect it has done some damage to my reputation.

Jolyon Jenkins: The trouble, from your point of view, surely, is that it just makes you look like a bad loser.

Simone Schnall: Yes, that's of course the problem, but it's true I approved the study method, and the method was indeed appropriate for that particular project. But I had not had any opportunity to actually evaluate the results as well.

Brian Nosek: It is an example of how fraught this kind of work is. People with very different perspectives on very different sides of the issue have had heated words. We are so tied up with our effects that we feel a sense of needing to defend them. Discovering where we are wrong is really where the great parts of science emerge.

Jolyon Jenkins: She doesn't think she is wrong.

Brian Nosek: Yes, I know. And she may not be. There is uncertainty that we have to get comfortable with.

Jolyon Jenkins: So you can be quite high-minded about this, but from her point of view, she's been bullied, and people have taken sides. I mean, one of her colleagues has described your side as "shameless little bullies".

Brian Nosek: Yeah, it is heated, in ways that are not productive for having the sort of scientific conversations that we idealise.

Jolyon Jenkins: Isn't it possible that in future people are going to say "Well, I'm not sure I really want to be involved in this, if it's going to end up with me being dragged through the mud"?

Brian Nosek: Yeah, if this particular example were the ordinary outcome, I would certainly not want to be involved, either.

Jolyon Jenkins: The great Nobel Prize-winning physicist Richard Feynman once wrote "The first principle is that you must not fool yourself, and you are the easiest person to fool". But could you ever be considered to have had a successful scientific career if you'd never made a positive discovery? John Ioannidis.

John Ioannidis: Even though it sounds like a paradox, it should be a fair assessment, so there's about 15 million scientists publishing scientific papers. I mean, even under the very best optimistic interpretation of what has happened in scientific research, I don't think that anyone would claim that we have 15 million major discoveries, which means that if you divide the numbers, probably for each scientist the average track is that they will work very, very hard in their lives, but they will not really come up with any major discoveries.

Jolyon Jenkins: And it's fine to admit that?

John Ioannidis: I think it's perfectly fine to admit that, and I think we should respect the major effort that we're all putting into science, trying to get these few important discoveries that can change the world.

Jolyon Jenkins: The problem is not that they're not finding things out, the problem is that they are sort of pretending to themselves that they are.

John Ioannidis: They have to. Because if they don't do that, they're not going to get funded. So this is I think what we need to change. We can promise that we will do our best. We cannot promise that we will save the world.