Data Journalism

Mongabay stories often include discussion of data to elucidate and analyze measurable phenomena like land cover change. This section will help you write about data and choose valid data sources.

The basics

“Data” is a plural word (its singular form is “datum”), so please use plural verb forms.

“The data show,” not “The data shows”

Please mention where the data came from and, if possible, when and how they were collected.

Data sources

Vet every data source for reliability, currency, scope and bias before including its data in a story.

Make sure the data source includes a thorough methodology explaining how and when the data were collected, and which data were included.

Independently collected data published in peer-reviewed studies tend to be the most reliable.

Data self-published by advocacy organizations can provide interesting, valid insights — but be aware of reliability issues.

Running data and conclusions past outside experts is highly recommended.

Writing about data

Context is important when discussing data. For instance, saying “Data from the University of Maryland indicate Colombia’s Guaviare Department lost ___ hectares of tree cover in 2015…” doesn’t really mean much unless you compare it to something.

Colombia’s Guaviare Department lost __ of its tree cover in 2015
Colombia’s Guaviare Department lost __ hectares of tree cover in 2015, representing a doubling over 2014 numbers
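
Both kinds of context come down to simple arithmetic. Here is a minimal Python sketch, assuming made-up figures: the hectare values and the total tree cover below are placeholders, not real data for Guaviare, and the function names are our own.

```python
def percent_change(old, new):
    """Return the percent change from old to new."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


def share_of_total(part, total):
    """Return part as a percentage of total."""
    return part / total * 100


# Hypothetical figures, for illustration only.
loss_2014 = 10_000            # hectares of tree cover lost in 2014
loss_2015 = 20_000            # hectares of tree cover lost in 2015
total_tree_cover = 4_000_000  # hectares of tree cover at the start of 2015

change = percent_change(loss_2014, loss_2015)
share = share_of_total(loss_2015, total_tree_cover)

print(f"Change over 2014: {change:.0f} percent")
print(f"Share of tree cover lost in 2015: {share:.1f} percent")
```

Note that a doubling is a 100 percent increase, not a 200 percent increase.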

Make sure the data points you’re comparing were collected via the same methodologies. For instance, tree cover loss data detected via satellite should not be compared to FAO forest cover data that were collected via on-the-ground estimates.

Do not confuse correlation with causation. Correlation implies two variables are related to one another, while causation means a change in one was directly caused by the other. Causation can be difficult to prove, particularly for situations with a high number of variables. When in doubt, go with correlation.

The data show a correlation between primary forest loss and road construction
The clearing of the patch of rainforest was caused by the development of an illicit runway
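
A correlation coefficient is one common way to quantify how closely two variables move together. The sketch below computes Pearson's r by hand in Python. The road and forest loss figures are invented for illustration, and the function name pearson_r is our own. A value near 1 or -1 indicates a strong linear relationship, but it says nothing about whether one variable caused the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unscaled covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unscaled variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five districts, for illustration only.
new_roads_km = [10, 20, 30, 40, 50]
primary_forest_loss_ha = [1200, 1900, 3100, 3900, 5200]

r = pearson_r(new_roads_km, primary_forest_loss_ha)
print(f"Pearson r = {r:.3f}")
```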

There can be mistakes in data sets, so include measures of uncertainty (e.g. margins of error) when applicable.
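
Margins of error also matter when comparing two numbers. As a rough check, the Python sketch below tests whether the uncertainty ranges of two estimates overlap. The estimates and margins are hypothetical, and the function names are our own. Overlapping ranges do not prove that nothing changed, but they are a signal to describe the difference cautiously.

```python
def uncertainty_range(estimate, margin):
    """Return the (low, high) range implied by an estimate and its margin of error."""
    return (estimate - margin, estimate + margin)


def ranges_overlap(a, b):
    """Return True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical tree cover loss estimates in hectares, for illustration only.
range_2014 = uncertainty_range(10_000, 1_500)  # (8500, 11500)
range_2015 = uncertainty_range(12_000, 1_500)  # (10500, 13500)

if ranges_overlap(range_2014, range_2015):
    print("The ranges overlap: the apparent increase may fall within the uncertainty.")
else:
    print("The ranges do not overlap: the increase exceeds the stated uncertainty.")
```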

Working with data

Researchers may send you their data sets, which can provide a wealth of information. If you are working within a data set, please repeat your calculations multiple times to mitigate errors.

Averaging numbers is the most common way to calculate the central tendency of a data set. But if you’re working with a data set that has a few big outliers, the median may be a better choice, since averages can be skewed by outliers and may not accurately represent the center point of the data set.

Example: Consider a data set with the following (hypothetical) tree cover loss values for 2007-2011:

2007 – 3,000 hectares

2008 – 4,000 hectares

2009 – 5,000 hectares

2010 – 6,000 hectares

2011 – 38,000 hectares

The average of these values is 11,200 hectares while the median is 5,000 hectares. If you were writing about the jump in tree cover loss in 2011, it would make more sense to compare that number to the median of the data set (5,000) rather than the average (11,200) since the former more accurately represents the central tendency of this particular group of data.
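
For anyone who prefers a script to a spreadsheet, the same figures can be checked with Python's built-in statistics module. This is only a sketch using the hypothetical values from the example above. The variable names are our own.

```python
from statistics import mean, median

# Hypothetical tree cover loss values (hectares) from the example above.
loss_by_year = {2007: 3000, 2008: 4000, 2009: 5000, 2010: 6000, 2011: 38000}
values = list(loss_by_year.values())

average_loss = mean(values)    # pulled upward by the 2011 outlier
median_loss = median(values)   # the middle value once the list is sorted

print(f"Average: {average_loss:,.0f} hectares")
print(f"Median: {median_loss:,.0f} hectares")

# How many times larger was the 2011 loss than the median year?
times_median = loss_by_year[2011] / median_loss
print(f"2011 loss was {times_median:.1f} times the median")
```

Running the calculation a second way like this is also a quick check against spreadsheet errors.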

Microsoft Excel is very useful for performing calculations and creating charts. If you’re working with a lot of data points, it’s generally best to create a chart to visualize trends rather than describing them at length in the text.

Example: This Mongabay article includes two kinds of charts that concisely and effectively visualize land cover change data.

Microsoft provides tutorials on how to create a chart using data in an Excel spreadsheet. They also have resources on Excel’s various formulas and functions.

If you’d like help crunching data and creating charts, contact Mongabay's graphics specialist.
