What is data journalism?

By MaryJo Webster

What is data journalism?

To answer that, let's circle back to a more basic question: What is journalism?

This is an old profession that has changed a lot over time, but one thing has always remained constant. Journalists ask important questions and seek out answers to share with their neighbors.

A long time ago, the answers largely came from human sources. Public records laws changed that starting in the 1960s, providing access for journalists and the public to government documents. In those early years, that meant paper records.

Then computers came onto the scene and by the 1980s, investigative journalists, in particular, realized they wouldn't be able to truly access government records if they didn't know how to work with electronic versions where information was neatly packed into rows and columns, often encompassing multiple tables. Getting answers from this form of government records meant learning how to be a data analyst.

But as numerous articles have pointed out, including this one, at least some journalists were analyzing data long before the practice had a name.

Since then, journalists who have these analytical skills have often been treated as a niche in the newsroom. For those of us in that role, that's partly a good thing (lots of job security!) and partly a bad thing, because we think analyzing data is just another form of asking questions and a skill every journalist should have.

Learning how to analyze data is kind of like learning a foreign language. You have to learn how to translate your questions into another language (which differs depending on which computer program you're using) and then you also have to interpret the answers. Many journalists presume it requires being good at math. It definitely does not. The level of math we're doing is about what a fourth grader learns, and most often the work is in the realm of basic statistics, with an emphasis on basic.

Underneath that layer of learning a computer language (which isn't as hard as it sounds), we have something every journalist already knows: Asking questions. The best data journalists are always the people in the room who ask the best questions. And the ones who ask a lot of questions.

One big misconception I run into over and over is that editors see data journalists merely as people who gather data points. They ask a reporter to go "get some data" to go with a story, but what they really mean is to ask the government agency for a couple of big, fat numbers that somebody has already compiled.

Data journalists, though, are people who ask questions of the raw data that hasn't yet been analyzed and make their own data points, rather than relying on someone else's interpretation. That raw data they start with is usually the most detailed information they can get. And usually the question(s) they are asking are ones that haven't already been asked and answered by others. The reason this results in such powerful stories is that raw data provides insights that you can't get any other way.

For example, let's say a journalist asks the local police department how many crimes occurred in the city last year. The police will give her one number. On its own, it really doesn't tell her anything. Even if she also got the number for the previous year, all she could do with the pair is say whether crime went up or down.

Those big, fat numbers leave out the detail. We don't know WHERE the crimes occurred, or WHEN they occurred, or WHO they might have affected or WHO might have committed them, or WHAT happened during each of those crimes. Maybe the police have done some analysis and answered some of those questions, but what if they haven't?

If we had a spreadsheet where each row represented a crime last year and the columns identified the date it was reported, the location where it occurred, the name of the person arrested (if there was one) and perhaps some detail about the incident, then we could do our own summaries to find things like: Where was the biggest concentration of crimes? Which part of the city saw the biggest increase compared with the year before? What day of the week or time of day do the crimes most often occur?
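To make that concrete, here is a minimal sketch in Python of what those summaries might look like. Everything in it is made up for illustration: the column names (date_reported, neighborhood, offense), the neighborhoods and the handful of rows are assumptions, not real police data, and a real file would have thousands of rows and would be read from disk.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical incident-level data: one row per reported crime.
# The columns and values are invented for this example.
RAW = """date_reported,neighborhood,offense
2022-03-04,Downtown,Theft
2022-07-16,Northside,Burglary
2023-01-06,Downtown,Theft
2023-01-07,Downtown,Assault
2023-02-11,Northside,Theft
2023-02-17,Downtown,Burglary
2023-03-03,Riverside,Theft
"""

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

rows = list(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    row["date"] = date.fromisoformat(row["date_reported"])

latest = [r for r in rows if r["date"].year == 2023]
previous = [r for r in rows if r["date"].year == 2022]

# WHERE: which neighborhood had the biggest concentration of crimes?
by_place = Counter(r["neighborhood"] for r in latest)
top_place, top_count = by_place.most_common(1)[0]

# WHERE, compared with the year before: which area rose the most?
prev_by_place = Counter(r["neighborhood"] for r in previous)
places = set(by_place) | set(prev_by_place)
change = {p: by_place[p] - prev_by_place[p] for p in places}
biggest_rise = max(change, key=change.get)

# WHEN: which day of the week shows up most often?
by_day = Counter(DAYS[r["date"].weekday()] for r in latest)
busiest_day = by_day.most_common(1)[0][0]

print(top_place, top_count)
print(biggest_rise, change[biggest_rise])
print(busiest_day)
```

Each question turns into the same two steps: pick the rows you care about, then count them by one column. A spreadsheet pivot table or a database query does the same thing under a different name.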

In addition to being able to get answers to those questions, the data journalist also gets to see all the detail. They can zoom in on a particular crime and find examples that can be used as the anecdotes in the story. Seeing the individual incidents also helps you understand the big picture. Let's say there's a big increase in crime in one neighborhood. You can then pull up all the incidents in that neighborhood and see whether the increase is being driven by a particular kind of crime, a rash of crimes during a given week or some other pattern.
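That kind of zooming in might look something like the sketch below, again in Python. The drill_down function, the field names and the incidents themselves are all hypothetical, invented only to show the idea of filtering to one neighborhood and then counting by type of crime and by week.

```python
from collections import Counter
from datetime import date

# Hypothetical incidents, invented for this example.
incidents = [
    {"date": date(2023, 5, 8), "neighborhood": "Downtown",
     "offense": "Car break-in", "detail": "Window smashed in parking ramp"},
    {"date": date(2023, 5, 9), "neighborhood": "Downtown",
     "offense": "Car break-in", "detail": "Laptop taken from back seat"},
    {"date": date(2023, 5, 11), "neighborhood": "Downtown",
     "offense": "Car break-in", "detail": "Two vehicles entered overnight"},
    {"date": date(2023, 5, 20), "neighborhood": "Downtown",
     "offense": "Assault", "detail": "Fight outside a bar"},
    {"date": date(2023, 5, 10), "neighborhood": "Northside",
     "offense": "Burglary", "detail": "Garage entered"},
]


def drill_down(rows, neighborhood):
    """Return one neighborhood's incidents plus counts by offense and ISO week."""
    local = [r for r in rows if r["neighborhood"] == neighborhood]
    by_offense = Counter(r["offense"] for r in local)
    # isocalendar()[1] is the ISO week number of the date.
    by_week = Counter(r["date"].isocalendar()[1] for r in local)
    return local, by_offense, by_week


local, by_offense, by_week = drill_down(incidents, "Downtown")

# WHAT is driving the numbers? WHEN did it cluster?
top_offense, top_count = by_offense.most_common(1)[0]
top_week, week_count = by_week.most_common(1)[0]

# The individual rows behind the pattern are candidates for anecdotes.
examples = [r["detail"] for r in local if r["offense"] == top_offense]

print(top_offense, top_count)
print(top_week, week_count)
print(examples)
```

The counts point you to the pattern, and the rows behind the counts are where you go looking for the people and incidents to report out.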

The questions that data journalists ask are usually WHO, WHAT, WHEN, and WHERE. We try to ask the WHY question, but data is usually terrible at providing an answer to that (we usually still need the humans for that one). Sounds like Reporting 101 class, doesn't it?

It's important to note that there are different levels of data journalism. A lot of it is simple enough that a few hours of training will get you off the ground. Some of the fancier projects you see winning the big prizes are ones that came about as a result of very sophisticated analyses and large databases that required not only a higher level of skill but also years of experience (a.k.a. fluency).

So in the beginning, your data analyses will be like those first conversations you have when you learn a foreign language (Where is the bathroom? How much does that cost? How do I get to the train station?). You'll need to keep practicing to get to the point where you can have a long, in-depth conversation with someone who has spoken that language their whole life. But you will get there.