Linguistic Analysis

The Linguistic Analysis Portal for the Humanitarian Encyclopedia

Information for Authors

Welcome to the Linguistic Analysis Portal for the Humanitarian Encyclopedia. We recommend using the latest version of Mozilla Firefox for a better experience.

As an author of an entry in the Humanitarian Encyclopedia (HE), you receive a Linguistic Analysis Report (LAR) based on the information contained in the HE Corpus. Each LAR is individually designed to assist authors in explaining and analyzing humanitarian concepts.

LARs provide rich overviews of how concepts behave in a large collection of documents, i.e. a text corpus. Thanks to natural language processing, linguists can identify textual phenomena and draw insightful conclusions by analysing word frequencies and co-occurrences and by comparing sections of the corpus with one another. Your LAR contains results that can help you understand how different humanitarian organizations, as well as other authors, treat your concept.
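
For readers curious about what counting word frequencies and co-occurrences involves, the following is a minimal sketch in Python. It is an illustration only and not the method used to produce the LARs: the tokenizer, the function names, the window size and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def cooccurrences(tokens, target, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, not taken from the HE Corpus.
sample = (
    "Resilience programmes build community resilience. "
    "Community resilience reduces risk."
)
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
neighbours = cooccurrences(tokens, "resilience", window=1)

print(freqs.most_common(2))
print(neighbours.most_common(3))
```

In this toy example, the words that most often sit next to "resilience" hint at the concepts it is associated with, which is the intuition behind the collocation and related-concept sections of a LAR.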

Here you will find a general description of the structure of LARs and of the HE Corpus, which will help you understand your LAR and get the most out of it.

Concepts

Access

Affected population

Aid

Armed actors

Capacity-building

Cash

Child

Civilian

Climate Change

Communication

Community

Community engagement

Community based approach

Competition

Conflict

Context

Contingency planning

Coordination

Corruption

Culture

Disaster risk reduction

Do no harm

Early warning

Education

Effectiveness

Efficiency

Emergency

Empathy

Empowerment

Epidemic

Equity

Ethics

Ethnicity

Famine

Forced displacement

Funding

Gender based violence

Grand bargain

Health

Humanitarian action

Humanitarian actor

Humanitarian imperative

Humanitarian reform

Humanitarian space

Humanitarian worker

Humanitarian-development nexus

Implementation

Inclusion

Independence

Innovation

Integrated approach

Inter-governmental organisation

Intervention

Justice

Knowledge

Law

Leadership

Leave no one behind

Livelihood

Local organisation

Localisation

Logistics

Management

Mandate

Mitigation

Need

Non-governmental organisation

Nutrition

Participation

Partnership

Peace

Policy

Poverty

Programme

Protection

Psychosocial support

Quality

Recovery

Rehabilitation

Remote-sensing

Resilience

Response

Responsibility-to-protect

Rights-based approach

Risk

Sanitation

Security

Services

Solidarity

Sustainability

Terrorism

Urbanisation

The LAR

Each report is generally composed of the following elements:

Frequencies, which will allow you to see the regions, document types, years and organization types where your concept appears relevant.
Definitions, whether standardized and authoritative or ad hoc, together with a summary of definitional elements and a comparison based on the corpus metadata.
Related concepts, indicating how concepts change their relational behaviour based on organization type, geographical areas or time.
Frequent collocations, mostly nouns, adjectives and verbs, showing other surrounding concepts in the corpus.
Synonyms and antonyms, where applicable, together with the sources from which they were extracted.
Usage over time, where applicable, according to both the HE corpus and Google Ngram Viewer.
Trends, debates and controversies surrounding the concept.

All of these elements are accompanied by comparative observations, a series of interactive graphics and the contexts from which the data were extracted. Graphics are shown as small snapshots on the main page of each entry; you can open them in a new page by clicking on Click here to enlarge. If you click on See contexts, you will see a dynamic table where you can filter the results based on different parameters.

The HE Corpus

The HE Corpus is a collection of 4,824 documents, amounting to a total of 71,201,157 distinct words. Each document carries metadata, which allows the corpus to be divided into multiple sections, i.e. sub-corpora. The following sections describe the corpus composition by issuing organisation, region and year of publication, as well as document type. You can access and browse the documents by code in the left-hand panel.
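
As a rough illustration of how metadata can divide a corpus into sub-corpora, the sketch below groups a few made-up document records by one metadata field. The `Document` class, its field names, the document codes, the organisation type labels and the word counts are all hypothetical; they do not reflect the actual HE Corpus schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical metadata record for one corpus document."""
    code: str
    organisation_type: str
    region: str
    year: int
    document_type: str
    word_count: int


def split_by(documents, field):
    """Group documents into sub-corpora keyed by the value of one metadata field."""
    subcorpora = defaultdict(list)
    for doc in documents:
        subcorpora[getattr(doc, field)].append(doc)
    return dict(subcorpora)


def total_words(documents):
    """Sum the word counts of a list of documents."""
    return sum(doc.word_count for doc in documents)


# Made-up records for illustration only.
docs = [
    Document("D-001", "NGO", "Africa", 2015, "Activity Report", 12000),
    Document("D-002", "NGO", "Asia", 2017, "Strategy", 8000),
    Document("D-003", "IGO", "Africa", 2017, "General Document", 20000),
]

by_region = split_by(docs, "region")
by_year = split_by(docs, "year")

for region, members in sorted(by_region.items()):
    print(region, len(members), total_words(members))
```

The same function can split the records by any other field, for example `split_by(docs, "document_type")`, so that results for one sub-corpus can be compared with those for another.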

Organisation Types and Subtypes

The HE Corpus consists of documents produced by multiple organisations dealing with humanitarian matters. Each organisation is categorised in a hierarchy of 11 types and 26 subtypes.

The following interactive chart shows the composition of the HE Corpus by organisation type and subtype. Clicking on a wedge of the big pie chart reveals the subtypes comprising each type. To find out more about each organisation type and subtype, simply hover over each wedge.

Refresh the page if the graphics are not displayed properly.

World Regions

Comparing phenomena across regions can yield significant insight into how a concept behaves in different parts of the world. Based on their place of publication, documents in the HE Corpus are classified into 7 regions: Africa, Asia, CCSA (Caribbean, Central and South America), Europe, MENA (Middle East and North Africa), North America and Oceania.

The following map illustrates the distribution of words in the corpus according to each world region. Click on each bubble to learn more about the distribution of organisation types in each region.

Document Types & Year of Publication

Organisations involved in the humanitarian domain generate a wealth of textual data. Based on their nature and communication role, all documents in the HE Corpus are classified into one of 3 broad categories: General Document, Activity Report and Strategy. In addition, each document is identified by its year of publication. Not only can linguists compare patterns across document types, but they can also look at when phenomena emerge and how they evolve across time.

The following histogram shows the distribution of all documents in the corpus by document type between 2005 and 2019.
