Hello, I’m Dr Dr Dr John NA Brown, consulting UX researcher, and this is a UX mystery.
The Case of the Significant Difference
An acquaintance who works for the world’s biggest search engine company in Zurich called me in Carinthia to ask if I would be willing to help a friend of his in New York solve a problem that her team in California had been working on for a few years. One of the reasons I agreed to take on the case was so that I would someday be able to tell a story that starts with that sentence.
While I was working on that case, I ended up being asked to consult on over a dozen others. Let me tell you about one that I volunteered to solve.
I spent my second week on the job at a gathering of their Qualitative and Quantitative UX researchers from around the world. Just to be clear, the qualitative folks are interested in opinions and feelings. The quantitative folks focus on measurable events and hard numbers. My acquaintance from Zurich was one of the three team leaders running the show and my client from New York was another. The third was a woman I hadn’t yet met.
She went to the front of the room and introduced a member of her team, explaining that he would share the story of a big research project he had run in the previous year; a project that had failed. Her words told us that everyone can make mistakes and that we can all learn from them. It seemed to me that her tone, her body language, and the look on her face were telling us that she had been displeased with her subordinate, and that this was his punishment.
The young qualitative researcher started by saying that he really had thought his intervention would work, and that he wished he could someday look at the results and see them finally make sense and show that he had been right.
His presentation started, and he explained that a 5-point opinion survey had shown that users were unhappy with a particular product. That kind of survey is called a Likert scale, named after Rensis Likert, the psychologist who came up with it back in 1932. He called it “A Technique for the Measurement of Attitudes.” You’ve used that kind of survey before if you’ve ever left a 1-5 star review on an app, or rated anything on a scale of 1 to 10.
Let’s all be clear that I’m making up the following numbers. I want to illustrate the problem, not give away in-house data.
He showed us a bar graph summing up the results of a 5-point Likert scale where 20 percent of the people surveyed had been strongly in favour of the product (5 out of 5), but everyone else had either been neutral or against it. The average opinion was typed out below the graph. It was slightly negative: 2.8 out of 5, just below the neutral midpoint of 3. His next job was to improve people’s opinions.
He showed us the product and the survey, and he explained how he had collected data about what had displeased the users. Then he showed us how he had tried to resolve those issues, and what he had designed to replace the problem areas. Finally, he showed us the results of the survey he ran after the intervention to measure what people thought of the new and improved product. This was another bar graph, again with the average written below it.
To his embarrassment, the new survey also returned an average rating of 2.8 out of 5.
The poor guy looked wretched, and nearly everyone there enthusiastically joined in on criticizing his work. They suggested areas to improve his method, and offered to review his work in the future, all in a fantastic spirit of schadenfreude.
Eventually my acquaintance interrupted the ruckus and asked if I wanted to share a comment.
With thanks, I reminded the young researcher of the wish he had told us about in his opening remarks. Then I told him that this was his lucky day, because the graphs from his presentation had clearly shown us that his new design had in fact significantly improved people’s opinions. It was the statistics that were wrong.
About half of the people in the room started shouting at me. A charming quantitative researcher who had been chatting with me politely all morning was now slamming her fist onto the desk in front of her, screaming that it was impossible to judge significance without running all of the numbers through the right software. She and the others seemed outraged. They went on for a while. Once they’d worn themselves out a little, I cleared my throat and softly said:
“If only there were some kind of universal search engine where we could look this up…”
That got them all started up shouting again, but I didn’t mind. Better they should tire themselves out now, so that they’d be less likely to interrupt when I got around to explaining.
Now, before I offer up that explanation to you, I’m going to suggest you try to solve this puzzle yourself. If you’d like, you can pause this playback and think for a while. You can even go back and listen to the story again. I promise that every clue you need was right there the whole time. For those of you who’d like to see what the graphs looked like, they should be plainly visible wherever you found this link. They may help, but they aren’t necessary.
Why don’t you go ahead and give it a try. When you’re ready, come on back.
--
Welcome back.
Did you figure out that the problem was that they were looking at average answers? If so, congratulations. If not, no worries. The point of this story is that this problem had stumped some very smart and very capable people, and my answer had upset some very skilled quantitative statisticians.
Most quantitative data can be averaged. Qualitative data cannot. For instance, you can calculate the average height of people in a group by measuring their individual heights, adding them up, and then dividing the total by the number of people you measured. That works because every height was measured on the same scale. People’s opinions aren’t measured on the same scale. When you rate a meal 5 out of 5 and I rate it 4, that could be because we disagree on the quality of the meal, or it could be because we disagree about what a meal would have to include in order to rate a perfect score. In statistical terms, Likert responses are ordinal: the answers have an order, but the distances between them aren’t fixed or shared, so adding them up and dividing tells you nothing reliable.
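To make that concrete, here’s a minimal sketch in Python, using invented counts in the same spirit as the made-up numbers from the story: two response distributions with obviously different shapes that collapse to exactly the same average.

```python
# Invented counts, echoing the made-up numbers from the story.
# Each dict maps a Likert rating (1-5) to how many of 100 respondents chose it.
before = {1: 10, 2: 40, 3: 30, 4: 0, 5: 20}   # polarized: a devoted 20% at 5
after  = {1: 5,  2: 30, 3: 50, 4: 10, 5: 5}   # fewer detractors, a bigger middle

def average_rating(counts):
    """The grand average of all ratings -- the number typed under each graph."""
    total = sum(rating * n for rating, n in counts.items())
    return total / sum(counts.values())

print(average_rating(before))  # 2.8
print(average_rating(after))   # 2.8 -- identical, though the shapes clearly differ
```

The average throws away exactly the information that matters here: how the opinions are spread across the scale.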
Using quantitative statistics on qualitative data means your results will always be wrong.
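My quick analysis of those graphs was qualitative, and I’ll save the details for later. But if you’d like a taste of what comparing distributions instead of averages can look like, here is one conventional, purely frequency-based sketch, a chi-square test on the response counts. To be clear: this is not the method from the story, and counting how many people chose each answer is quantitative data about qualitative categories; no rating ever gets averaged.

```python
# A standard frequency-based comparison (a sketch, not the qualitative
# analysis from the story): treat each rating as a category and ask
# whether the two count distributions differ.
# Requires scipy; the counts are the same invented ones as above.
from scipy.stats import chi2_contingency

before = [10, 40, 30, 0, 20]   # respondents per rating, 1 through 5
after  = [5, 30, 50, 10, 5]

chi2, p, dof, expected = chi2_contingency([before, after])
print(f"chi-squared = {chi2:.1f}, p = {p:.1g}")
# For these invented counts, p lands well below 0.001: the two
# distributions differ significantly, even though both average 2.8.
```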
If you’d like a more detailed explanation of how I did a quick qualitative analysis of the graphs, it will be in my forthcoming book. If you’d like to know more about in-depth qualitative statistics, I discuss that in another mystery in this series. You could also just look up “qualitative statistics” on your favourite search engine.