The HE corpus contains 12,070 occurrences of the concept data.
Data occurs mostly in documents published in Europe, followed by North America, Asia, Africa and MENA with comparatively smaller contributions. Overall, the top five contributors in terms of occurrences are IGO, State, NGO, C/B and RC organisations.
Occurrences from IGO and C/B were mostly obtained from general documents published in Europe. NGO and RC documents provide the greatest number of occurrences, primarily from activity reports published in Europe.
State documents mostly generated occurrences in activity reports published in North America.
Data
  is a
    currency
    force for good
    preoccupation
    prerequisite
    lifeblood of decision-making
  can vary in quality & sources
  is analysed to help decision-making
  has great importance & limitations
  is collected in many manners
    written communication
    oral communication
    physical observation
    mixed means
  the collection of which
    requires training in methodologies, tools, variables, & other factors
    is being fundamentally redesigned because of internet & smartphones
    is conducted periodically in rounds & continuously in real-time
    may use multiple methods to later triangulate results
    may require partnerships at various levels
  is missing in areas with "data gaps"
    variables (gender, HIV status, mortality, minority group)
    cultures
    programming
    data sources
    organisations
    types of financing
    countries included/excluded from the HDI
    related phenomena (internal vs. external migration)
  can be collected more effectively via
    surveys (individual/industry)
    monitoring
    data gap analysis
    partnership
    statistical correction (as a last resort)
    standardisation
  the lack of which especially affects
    the poorest, most vulnerable
    the local level
    fragile states, economies
    nutrition progress
    accountability
    integration
  is experiencing a revolution
    with enormous quantities becoming available
    with new approaches
  should be
    protected
    managed better
    used in a timely fashion
    disaggregated
    used transparently
    comparable
    shared more often
  the management of which has
    methodological challenges
    resource issues
    related issues
Very few definitional contexts were found for data, with no complete, explicit definitions. Only five useful contexts were found, all of which consist of metaphors and uncontroversial but nonetheless subjective valuations. With the information available, data is categorised as a currency, force for good, preoccupation, prerequisite, and the lifeblood of decision-making. Although there is much discussion over data (see Debates & Controversies), this is limited to how it is used rather than how it should be defined.
It was recognized that "data is the currency of the private sector" and has intrinsic value.
Many, including the UN Secretary-General, advocate that data is a force for good, which will help "get people the support they need, more quickly and more efficiently" (Meneghetti, 2018).
Data itself has become a primary preoccupation in the field of disaster management and the humanitarian sector more broadly.
Data are a prerequisite for delivering the 2030 Agenda for Sustainable Development, ensuring that no one is left behind.
The United Nations Secretary-General's report, "A world that counts", says "Data are the lifeblood of decision-making and the raw material for accountability."
As with the lack of definitional contexts for data, there is no typology to speak of in the HE corpus. Instead, the most productive contexts with data are related to its uses in multiword terms; this analysis centres on these compounds.
Data appears in many multiword terms, most frequently those regarding how it is used and handled. These terms make up a third of all data contexts - a sizeable portion - but their distribution varies dramatically. Following is a summary of the highest frequency terms, which together offer a sense of the different aspects of humanitarian data usage.
Data collection makes up over 10% of cases and has a more even distribution than the compounds below. It appears in most text types without extreme outliers, though it is a more common term in Europe and Africa than in other regions.
Data quality and data source together account for 15% of contexts, but roughly 90% of their cases appear in the AA organisation subtype, in North America. These contexts are used in reporting and hence are very repetitive.
Data protection (3% of contexts) is overwhelmingly concentrated in European Activity Reports, led by the ICRC. It has seen an exponential increase in usage since 2005, appearing twice a year in the 2000s and 85 times in 2017 alone. The GDPR plays a significant role in this rise, but does not entirely explain low frequencies in other geographic regions.
Data management has the greatest number of cases in Europe (120) but the highest relative density in Africa, at two to four times that of other regions. As with other cases of data, its usage is increasing over time.
Data analysis is most often discussed in Activity Reports, although its relative density there is only a third to half of that in Strategy and General Documents. It is especially infrequent in Oceania and MENA.
Overall, these aspects of how organisations approach the use of data have several commonalities. They generally are
managed by a specific team;
managed jointly or require partnership;
improved/monitored via expert consultations;
provided as a unique service to organisations with less expertise/fewer resources;
subject to inter-/intra-organisational frameworks, models, best practices, systems, etc.;
considered areas for improvement.
Data collection is given considerably more focus than other compounds. As the highest frequency term and one of the most evenly distributed, it could be described as the centre of attention for humanitarian organisations as a whole. In the contexts found here, data collection is a process undertaken by humanitarian organisations for their own programmes, although some contexts reference the use of data from external sources (e.g., NASA satellite imagery). Its features are summarised below; see also Debates & Controversies for a list of challenges for data collection.
Data is collected via
  written communication
    SMS
    questionnaires
    case studies
  oral communication
    hotlines
    meetings
    focus groups
    interviews
    household surveys
  physical observation
    direct observation
    physical & biological measurements
    aerial footage
    satellite imagery
  mixed means
    tablets, smart phones
    apps/software, online portals (developed internally or externally)
    crowdsourcing
    GPRS transmission
    metadata, GPS tagging, bar-code distribution
Data collection
requires training in methodologies, tools, variables, and other factors
is being fundamentally redesigned because of internet and smartphones
is conducted periodically in rounds and/or continuously in real-time
may use multiple methods to later triangulate results
may require partnerships at various levels
Data gaps are mentioned most frequently in European IGO and NGO General Documents. While no explicit definitions were found, the term is used to refer to the lack of data in an area, such that humanitarians cannot comprehensively understand or respond to issues. The quote below emphasises their impact and relevance for humanitarians.
To quote again the late, former UN Secretary-General Kofi Annan, "Data gaps undermine our ability to target resources, develop policies and track accountability. Without good data, we're flying blind. If you can't see it, you can't solve it."
Data gaps exist within/across
variables (gender, HIV status, mortality, minority group)
cultures
programming
data sources
organisations
types of financing
countries included and excluded from the HDI
related phenomena (internal vs. external migration)
They are addressed via
surveys (individual or industry)
monitoring
data gap analysis
partnership
statistical correction (as a last resort)
standardisation
They cause worse outcomes for
the poorest, most vulnerable
the local level
fragile states, economies
nutrition progress
accountability
integration
Data gaps are mentioned with
data deficits
knowledge gaps
quality issues
data incompatibilities
In the HE corpus, data revolution begins to appear in the mid-2010s as a need stated by the UN. For instance, it is related to PARIS21 (GD-155) and appears most often in General Documents. Data revolution is explicitly linked to the areas of sustainable development and nutrition. General Document 158 has the most substantive treatment of the concept. While its usage has increased rapidly, it is still a relatively low-frequency compound.
The somewhat related concepts of big data and open data, considered contemporary "buzzwords," also have low but steadily increasing frequencies from Europe and North America. For each there are early adopters and proponents, whose discourse contributes much of the content in the HE corpus. There is also some questioning of what these concepts mean for the future of humanitarian work.
Following on from this work, PARIS21 has prepared a road map for a country-led data revolution that identifies three main elements for success:
1. a major and sustained increase in the generation and use of data to help countries and the world as a whole to deal with the major challenges of eliminating extreme poverty, leaving no one behind and managing natural resources
2. promotion of real institutional change and much more effective use of technology to improve the performance of everyone involved in the production and use of data
3. making data accessible to everyone in ways that they are able to understand and use the data to hold
A key challenge is that the data revolution is not yet producing dividends for most developing countries. Having appropriate information and communications technology (ICT) infrastructure is a pre-condition for seizing the opportunities presented by the data revolution. ICT can also increase the speed, accuracy and impact of data collection and dissemination while reducing costs. Yet for this to happen, it is essential to bridge the significant digital divide that underlies the data divide.
Frequent words that accompany a term are known as collocates. A given term and its collocates form collocations. These can be extracted automatically based on statistics and curated manually to explore interactions with concepts.
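As a rough illustration of what "extracted automatically based on statistics" can mean, the sketch below counts words that co-occur with a term inside a small window and ranks them with the logDice association score. The choice of logDice, the window size and the function name collocates are assumptions made for this example; the report does not state which statistic or tool was used for the HE corpus.

```python
import math
from collections import Counter


def collocates(tokens, node, window=3):
    """Rank words co-occurring with `node` by logDice (an assumed statistic).

    `tokens` is a list of lower-cased words; `window` is the number of
    tokens inspected on each side of every occurrence of `node`.
    """
    freq = Counter(tokens)   # frequency of every word in the text
    pair = Counter()         # co-occurrence counts with the node word
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] != node:
                pair[tokens[j]] += 1
    # logDice = 14 + log2(2 * f(node, word) / (f(node) + f(word)))
    scores = {
        word: 14 + math.log2(2 * f_xy / (freq[node] + freq[word]))
        for word, f_xy in pair.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sentence, not taken from the corpus.
sample = "data collection requires training and data collection uses tablets".split()
ranked = collocates(sample, "data", window=1)
print(ranked)
```

The ranked list would then be curated manually, as described above, before any collocate is reported.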
Comparisons over time between organisation types with the greatest number of hits (IGO, State, NGO, C/B and RC organisations) may prove to be meaningful. Below is a histogram of the top yearly collocation for each of the five organisation types with the greatest contribution, as well as across all organisation types.
Collocational data for Data was found to be scarce. Across all 5 organisation types analysed, only 3 top collocates were obtained:
collection;
revolution; and
disaggregation
IGO documents generated disaggregation as top collocate in 2016. Other top IGO collocates include collection and revolution.
State documents generated collection as top collocate in 2014 with the highest overall score.
NGO documents generated collection as top collocate in 201, obtaining the highest overall score.
C/B documents only generated collection as top collocate for 2017.
RC documents generated revolution as top collocate for 2018.
Organisation subcorpora present both unique collocations and collocations shared with other organisation types. Unique collocations make it possible to discover what a particular organisation type says about data that others do not.
IGO documents feature the following top 10 unique collocates:
specify
JODI (Joint Organization Data Initiative)
census
reorganisation
column
inventory
smuggling
compilation
watch
ecosystem
State documents feature the following top 10 unique collocates:
verify
DQA (Data Quality Assessment)
FY (Financial Year)
fertility
history
issue-specific
region-specific
semi-annual
exceeded
LAG (Legal Advisors Group)
NGO documents feature the following top 10 unique collocates:
coder
ICMP (International Commission on Missing Persons)
ICCO (Inter-Church Organisation for Development Cooperation)
B'Tselem (a Jerusalem-based non-profit organisation)
automatic
regulation
ODK (Open Data Kit software)
prioritisation
sindicate
IDMC (Internal Displacement Monitoring Centre)
C/B documents feature the following top 10 unique collocates:
FTS (Financial Tracking Service)
partial
UNOCHA (United Nations Office for the Coordination of Humanitarian Affairs)
GHA (Global Humanitarian Assistance)
mid-2015
focused
vector
incomplete
express
interactive
RC documents feature the following top 10 unique collocates:
signify
denote
admission
institution-wide
refining
forensic
centralize
informed
finish
administrator
Shared collocations make it possible to discover what different organisation types have in common when they discuss data. These constitute intersections between subcorpora.
Top collocates shared by 2 organisation types are:
validation (State + NGO)
UCDP (Uppsala Conflict Data Program) (IGO + C/B)
visualisation (NGO + IGO)
update (NGO + IGO)
transparency (IGO + C/B)
utilize (State + IGO)
UNESCO (State + IGO)
vary (State + NGO)
type (State + IGO)
transfer (NGO + IGO)
Top collocates shared by 3 organisation types are:
visualization (RC + NGO + IGO)
verification (State + NGO + IGO)
warehouse (State + RC + IGO)
UNICEF (State + NGO + IGO)
UN (State + IGO + C/B)
unit (State + NGO + IGO)
workshop (RC + NGO + IGO)
treatment (State + NGO + IGO)
two (State + NGO + IGO)
water (State + NGO + IGO)
Top collocates shared by 4 organisation types are:
user (State + NGO + IGO + C/B)
tool (RC + NGO + IGO + C/B)
trend (State + RC + NGO + IGO)
technology (State + RC + NGO + IGO)
world (State + NGO + IGO + C/B)
team (State + RC + NGO + IGO)
year (State + NGO + IGO + C/B)
training (State + RC + NGO + IGO)
work (State + RC + NGO + IGO)
Top collocates shared by 5 organisation types are:
use (State + RC + NGO + IGO + C/B)
total (State + RC + NGO + IGO + C/B)
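The distinction between unique and shared collocates can be read as simple set arithmetic over the collocate lists of each organisation type. The sketch below is a minimal illustration under that assumption. The function name unique_and_shared is hypothetical, and the input is a toy subset of the collocates listed above rather than the full lists.

```python
def unique_and_shared(collocates_by_type):
    """Split collocates into those unique to one type and those shared.

    `collocates_by_type` maps an organisation type to a set of collocates.
    Returns (unique, shared): `unique` maps each type to the collocates no
    other type has; `shared` maps each collocate found in two or more
    types to the set of types that have it.
    """
    unique = {}
    for org, words in collocates_by_type.items():
        others = set().union(
            *(w for o, w in collocates_by_type.items() if o != org)
        )
        unique[org] = words - others

    shared = {}
    for word in set().union(*collocates_by_type.values()):
        orgs = {o for o, w in collocates_by_type.items() if word in w}
        if len(orgs) > 1:
            shared[word] = orgs
    return unique, shared


# Toy subset of the lists above (three organisation types only).
toy = {
    "IGO": {"census", "transparency", "use"},
    "C/B": {"FTS", "transparency", "use"},
    "State": {"verify", "use"},
}
unique, shared = unique_and_shared(toy)
print(unique)
print(shared)
```

Grouping the shared dictionary by the number of organisation types per collocate would reproduce the "shared by 2, 3, 4 or 5 organisation types" layout used in this section.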
The chart below represents the distribution of data between 2005 and 2019 in terms of the number of occurrences and relative frequency of occurrences. It also allows you to view the distribution across Regions, Organisations and Document types.
The relative frequency of a concept compares its occurrences in a specific subcorpus (i.e. Year, Region, Organisation Type, Document Type) to its total number of occurrences in the entire HE corpus. This indicates how typical a word is of a specific subcorpus and allows tentative comparisons to be drawn between subcorpora, e.g. Europe vs Asia or NGO vs IGO. You can read these relative frequencies as follows:
Relative frequency is expressed as a percentage, above or below the total number of occurrences, which are set at 100%. This measure is obtained by dividing the number of occurrences by the relative size of a particular subcorpus.
Under 100%: a word is less frequent in a subcorpus than in the entire corpus. This means that the word is not typical or specific to a given subcorpus.
100%: a word is as frequent in a subcorpus as it is in the entire corpus.
Over 100%: a word is more frequent in a subcorpus than in the entire corpus. This means that the word in question is typical or specific to a given subcorpus.
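One way to make this reading concrete is sketched below. It assumes that "relative size" means the subcorpus's share of all tokens in the corpus, and that the number of occurrences is likewise taken as a share of all occurrences of the word. The function name and the token counts are hypothetical; only the total of 12,070 occurrences comes from this report.

```python
def relative_frequency(hits_in_sub, hits_total, sub_tokens, corpus_tokens):
    """Relative frequency (%) of a word in a subcorpus.

    Assumed reading: the share of the word's occurrences that fall in the
    subcorpus, divided by the share of the corpus that the subcorpus
    represents, expressed as a percentage.
    """
    share_of_hits = hits_in_sub / hits_total
    share_of_corpus = sub_tokens / corpus_tokens
    return 100 * share_of_hits / share_of_corpus


# Hypothetical subcorpus holding 20% of the corpus tokens
# but 30% of the 12,070 occurrences of "data".
value = relative_frequency(
    hits_in_sub=3_621,
    hits_total=12_070,
    sub_tokens=200_000,        # hypothetical token count
    corpus_tokens=1_000_000,   # hypothetical token count
)
print(f"{value:.0f}%")
```

Under these assumptions a subcorpus that holds the same share of occurrences as of tokens scores 100%, which matches the interpretation given above.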
As an author, you may be interested in exploring why a concept appears more or less frequently in a given subcorpus. This may be related to the concept's nature, the way humanitarians in a given year, region, organisation type or document type use the concept, or the specific documents in the corpus and subcorpora themselves. To manually explore the original corpus data, you can consult each Contexts section where available or search the corpus itself if need be.
Occurrences of data were highest in 2018, also obtaining the highest relative frequency recorded (142%).
Europe generated the greatest number of occurrences and North America generated the highest relative frequency with 156%.
The organisation types with the highest relative frequency of data are State, C/B, IGO, Net, Project and WHS.
General documents provided the greatest number of occurrences as well as the highest relative frequency with 168%.
This shows the evolution of data in the vast Google Books corpus, which gives you a general idea of the trajectory of the term in English books between 1950 and 2019. Values are expressed as a percentage of the total corpus instead of occurrences.
Please note that this is not a domain-specific corpus. However, it provides a general overview of data and its evolution across domains.
Data increases at a steady pace from 1950 and reaches its peak in 1983. From then onwards it declines until 2019.
A sample of 100 contexts was collected to better understand humanitarian difficulties surrounding data. Among these contexts, collection, analysis, limitation, gaps, protection, and quality were the most frequent keywords. Data collection received by far the most attention, although low frequency keywords still offer important perspective, such as possible disagreements over data interpretation.
collection
analysis
limitations
gaps
protection
quality
management
timeliness
disaggregation
revolution
transparency
comparability
sharing
availability
capacity
interpretation
Various challenges exist regarding the use of data in the humanitarian field. As the keywords above show, these distinct yet interrelated areas pose multidimensional challenges at all stages of operations, with data collection being the perennial issue.
Generally speaking, contexts across the corpus share similar messages and underscore the shared difficulties of modernising data systems. While data limitations occur for many reasons, their impact on organisational effectiveness is a commonality.
The list below includes issues that have been related to data collection, though they notably extend to later stages, including analysis. This reflects the preventive attitude that organisations have with regard to potential difficulties that could undermine results at any stage, as seen here:
The key lesson is that a high-profile data analyst must be involved in the study from inception to incorporate any variations before the actual data collection.
methodological challenges
  bias for reporting more extreme cases
  quality/error control
  data entry
  data cleaning
  diversity of methods internationally
  quantity and representativeness
  statistical validity
  proper disaggregation (age, sex, etc.)
  comparability across data sets
  timeliness (i.e. delay between collection, analysis and response)
  streamlining
  harmonisation
  verification
resource issues
  unequal access to resources and technology
  data gaps
  outdated tools, methods
  availability, sharing across organisations
related issues
  physical and digital security
  government authorisation
  site access, safety
  local cultures, practices
  adapting and testing with local conditions
While many contexts only mention the problematic state of data management in passing, others, generally from General Documents, offer more in-depth commentary. These commentaries tend to reflect the general concerns enumerated in the list above, despite the unique circumstances of each organisation.
Main challenges raised by IASC organizations:
Linking data analysis and collective decision-making. Decision-makers often make decisions without having the right data in hand.
Protect what has worked so far in the area of information management and reduce duplication.
Optimize technology at our disposal (mobile phones are underutilized).
Capacity challenge faced by governments and partners that are excluded from digital revolution.
Lack of Sex and Age disaggregated data.
Need for a common approach.
Inter-operability and data protection differ between organizations making it difficult to share data.
In the same survey, organisations raised concerns about data privacy and security issues, inadequate IT systems and human resources for effective publishing, and the need for further improvements to the IATI Standard to ensure that it meets the needs of the humanitarian community. Grand Bargain signatories will work together with the support of external partners in the coming months and years to overcome these challenges.
A veritable explosion in the volume, variety, veracity, source and speed of available data creates ever-increasing opportunities to understand the world and respond more effectively to development challenges (Data Revolution Group, 2014). But there are major data gaps, including in civil registration and vital statistics systems (CRVS), poverty data and humanitarian assessments, and whole populations can be rendered invisible as a result. According to one estimate, as many as 350 million people are likely to be absent from the data used to measure development progress (Carr-Hill, 2013), many of whom are in countries affected by humanitarian crises (Development Initiatives, 2017b).
The American Red Cross is able to track and integrate social media comments from a disaster-affected area into response decision-making. The problem with data monitoring is that the sheer volume of data needs to be converted into timely and actionable information. During Hurricane Sandy, for example, more than half-a-million Instagram pictures and 20 million tweets were posted. In Japan, there were more than 177 million disaster-related tweets the day after the 2011 earthquake. Another challenge is the risk to data security and privacy, and of information misuse. These concerns are legitimate, but the actual risk may vary and it depends on the type of data being collected.
But transparency is neither neutral nor natural: instead, organizational approaches to improving transparency, such as engaging with a constituency on Facebook, are artificial and constructed. Moreover, in a world characterized by information overload, it is difficult to determine both what is relevant and what is missing. Finally, transparency also has an internal aspect: technical challenges to transparency have long been considered a problem. Data silos – systems that are not designed to facilitate data exchange – have proliferated within the humanitarian community and are believed to greatly complicate transparency.
Aside from discussing the challenges of data management, organisations have also clearly stated the progress being made, sometimes offering valuable and unique examples.
This report presents impressive examples where technologies already contribute to humanitarian action, often with the result of putting affected communities at the centre of humanitarian action as engaged participants and not merely as witnesses or recipients of aid. In Syria, for example, digital data collection tools were adapted and are now used to serve as a commodity tracking system, monitoring the distribution of supplies as they are transported and delivered by local partner organizations in areas that remain inaccessible to international humanitarian agencies.
This is particularly the case for livelihoods in the informal sector, which in some cases can make up more than 50 per cent of the economic activity of a village or region. In response to the challenge of data quality, CRED has developed a ranking system to rationalize its choice of data sources. This improves transparency but cannot address the lack of standardized and systematic data collection. In 2003, the World Disasters Report (Chapter 7) argued that data quality could be improved by encouraging active rather than passive data collection.
This chapter discusses how thinking on development and development co-operation have been informed by the availability and use of data, and what now needs to change to efficiently exploit traditional data sources and take advantage of new ones. It argues that the data revolution is contributing to three shifts in focus: from gross domestic product to multi-dimensional well-being; from aggregate to micro data; and from administrative data to smart data.