ORIGINS OF EPIDEMIOLOGY

John Snow and the Cholera outbreak of 1849

In 1849 John Snow published his analysis of the association between London districts and cholera. He belonged to the contagion school of thought, believing that cholera was spread by something travelling between people. He theorised that the areas worst hit by cholera were the districts closest to the Thames, where sewage was pumped into the river. Unfortunately, William Farr had a different theory. He belonged to the miasma school, believing that cholera was spread by something in the air, and he theorised that elevation above sea level was the main causal factor driving the outbreak. Farr’s status as the Registrar General gave him more sway with the authorities, and his theory was supported. Snow’s ideas were rejected, which ultimately resulted in the continuation of the outbreak and the loss of hundreds of lives.

In 2004 the Journal of Public Health published an article re-examining the 1849 cholera outbreak data: https://www.ncbi.nlm.nih.gov/pubmed/15313591. The article presented the original data and the analyses performed by both Farr and Snow, as well as a reanalysis. For the reanalysis the authors used a method that had not been developed in 1849: logistic regression. Logistic regression is a standard method in modern statistics, but it was not developed until 1958. It is covered in our third course, Statistics for Public Health: Logistic Regression for Public Health.

Snow had limited data when he first published his theory in 1849. He presented simple descriptive statistics, which, he observed, showed more deaths in the districts that received water from the Thames after sewage had been pumped into it.

In 1852, Farr published additional data on deaths from the 1849 cholera outbreak along with a more detailed analysis. He proposed eight potential explanatory variables and looked at simple associations between them and cholera deaths. The explanatory variables were elevation above sea level; crowding, measured by persons per house and persons per acre; wealth, measured by the average value of a house within a district and the average house value per head of the district population; poor rate precept per pound of house; annual mortality; and water supply. It’s not clear how Farr arrived at these explanatory variables. Farr graphed the individual association between each explanatory variable and the number of deaths. He also split each explanatory variable into two groups reflecting high and low values (this is known as dichotomisation, which we will cover later) and compared the ratio of deaths in the low value groups to the high value groups. Farr found an association between each of these explanatory variables and cholera deaths but concluded that the evidence supported elevation as the most important factor.
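As a rough illustration of the dichotomisation described above, the sketch below splits districts at the median of one explanatory variable and compares the average death rate in the low group with that in the high group. The district names, elevations and death rates are hypothetical toy values, not Farr’s figures, and the function names, the median cut-off and the unweighted average are assumptions made purely for illustration. It is not a reconstruction of Farr’s actual calculation.

```python
from statistics import median

# Hypothetical toy data, NOT Farr's figures:
# (district, elevation in feet, deaths per 10,000 residents)
districts = [
    ("A", 2, 120),
    ("B", 5, 90),
    ("C", 20, 60),
    ("D", 40, 30),
    ("E", 80, 20),
    ("F", 100, 10),
]

def dichotomise(rows, cutoff):
    """Split rows into a low group (value < cutoff) and a high group (value >= cutoff)."""
    low = [row for row in rows if row[1] < cutoff]
    high = [row for row in rows if row[1] >= cutoff]
    return low, high

def mean_rate(rows):
    """Unweighted average death rate across the districts in a group."""
    return sum(row[2] for row in rows) / len(rows)

# Assumption: use the median elevation as the cut-off between "low" and "high".
cutoff = median(row[1] for row in districts)
low, high = dichotomise(districts, cutoff)

# Ratio of the average death rate in the low group to that in the high group.
ratio = mean_rate(low) / mean_rate(high)
print(f"cut-off = {cutoff}, low/high death rate ratio = {ratio:.2f}")
```

A ratio well above 1 only says that deaths were more common in the low group. Note also that collapsing a continuous variable into two groups discards information, and that the unweighted average used here ignores differences in district population.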

Farr acknowledged that the data showed an association between cholera deaths and water supply but argued that his analysis showed that elevation played a role over and above that of water supply. He surmised this after splitting the data into groups based on water districts and presenting, within each group, the ratio of deaths between the high and low categories of each explanatory variable.
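The idea of comparing high and low groups within each water district can be sketched as a simple stratified comparison. The data below are hypothetical toy values, not the 1849 figures, and they have been deliberately constructed so that an apparent elevation effect disappears once water supply is taken into account. That is not what Farr reported; the example only shows the mechanics of the comparison. The water supply labels and function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy data, NOT the 1849 figures:
# (water supply, elevation group, deaths per 10,000 residents)
districts = [
    ("Thames below sewers", "low", 120),
    ("Thames below sewers", "low", 100),
    ("Thames below sewers", "high", 110),
    ("Upstream source", "low", 30),
    ("Upstream source", "high", 35),
    ("Upstream source", "high", 25),
]

def mean(values):
    return sum(values) / len(values)

def low_to_high_ratio(rows):
    """Ratio of the mean death rate in low-elevation districts to high-elevation ones."""
    low = [rate for _, elevation, rate in rows if elevation == "low"]
    high = [rate for _, elevation, rate in rows if elevation == "high"]
    return mean(low) / mean(high)

# Crude comparison: ignore water supply altogether.
crude_ratio = low_to_high_ratio(districts)

# Stratified comparison: compute the same ratio within each water supply group.
by_water = defaultdict(list)
for row in districts:
    by_water[row[0]].append(row)
stratified = {water: low_to_high_ratio(rows) for water, rows in by_water.items()}

print(f"crude ratio: {crude_ratio:.2f}")
for water, ratio in stratified.items():
    print(f"{water}: {ratio:.2f}")
```

In this toy data the crude ratio is above 1 only because more of the low-elevation districts happen to draw water from the contaminated source; within each water supply group the ratio is 1. This is one way an association can arise without the variable being a cause.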

What both men failed to recognise in their analyses is that a correlation or association between a potential explanatory variable and an outcome does not imply causation. As a result, the conclusions Farr drew from his analysis were wrong. There are many famous examples of spurious correlations; in fact, there is an entire website dedicated to them: www.tylervigen.com/spurious-correlations. When looking back at the original analyses we must remember that Farr and Snow were working in the 1800s and were limited by the methods available to them. More sophisticated modelling methods had not yet been developed. If they had been, perhaps the later cholera outbreaks would have had better outcomes for the residents of London.

The Millennium Development Goals (MDGs)

In 2000 the United Nations (UN) set 8 development goals related to poverty, healthcare, education and equality, which were agreed upon by 189 nations. These were named the Millennium Development Goals (MDGs). The idea was to provide a framework of time-bound targets by which progress could be measured for each of the 8 goals. Statisticians selected indicators that could be used to monitor progress towards these goals between 2000 and 2015.

The report published in 2015 highlighted that the MDGs had achieved a number of successes, including reducing the number of people living in extreme poverty by more than half and reducing the global under-5 mortality rate by more than half between 1990 and 2015.

The success of the MDGs is attributed in part to the continuous monitoring that was undertaken and to the specification of targets, to be reached over the course of 15 years, that were used as indicators of progress. As the UN put it, “What gets measured gets done.” The full report, which can be found here http://www.un.org/millenniumgoals/2015_MDG_Report/pdf/MDG%202015%20rev%20(July%201).pdf, suggests that the monitoring plans of the MDGs allowed countries to focus their development policies. For example, in Colombia, local data showed “uneven rates of progress, which motivated local governments to implement key interventions according to local priorities”. This highlights the influence that good quality data can have on policy.

The monitoring task was a huge undertaking in terms of data collection and assessment. It was achieved in part through strong collaboration with global organisations and between developed and developing countries. Setting the MDGs also provided motivation to “increase the production and use of development data”. Progress reports relied on access to reliable data, on the necessary resources being in place to collect those data, and on the capability to analyse them, especially in developing countries. As the final report remarks: “Monitoring requirements drew attention to the need for strengthening statistical capacity and improving statistical methodologies and information systems at both national and international levels. Over time, this increased the availability of more and better data, while improving coordination within national statistical systems and leading to new statistical methodologies.” For example, “in 2003, only 2 per cent of developing countries had at least two data points for 16 or more of the 22 indicators, by 2014 this figure had reached 79 per cent.”

“The MDG monitoring experience … demonstrated that effective use of data can help to galvanize development efforts, implement successful targeted interventions, track performance and improve accountability”. Looking ahead, the report highlighted that “strengthening statistical capacity is the foundation for monitoring progress of the new development agenda” and “promoting open, easily accessible data and data literacy is key for effective use of data for development decision-making”.


Source: https://www.coursera.org/learn/introduction-statistics-data-analysis-public-health/supplement/hJm7H/john-snow-and-the-cholera-outbreak-of-1849