Welcome to the Linguistic Analysis Portal for the Humanitarian Encyclopedia. We recommend using the latest version of Mozilla Firefox for a better experience.
As an author of an entry in the Humanitarian Encyclopedia (HE), you receive a Linguistic Analysis Report (LAR) based on the information contained in the HE Corpus. Each LAR is individually designed to assist authors in explaining and analyzing humanitarian concepts.
LARs provide rich overviews of how concepts behave in a large collection of documents, i.e. a text corpus. Thanks to natural language processing, linguists can identify textual phenomena and draw conclusions by analysing word frequencies and co-occurrences and by comparing sections of the corpus with one another. Your LAR contains results that can help you understand how different humanitarian organizations, as well as other authors, treat your concept.
Here you will find a general description of the structure of LARs and the HE Corpus, which will allow you to better understand your LAR and get the most out of it.
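To make the idea of word frequencies and co-occurrences concrete, here is a minimal sketch in Python. It is an illustration only, not the method used to produce the LARs: the sample sentence, the function names and the window size are all assumptions made for this example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def cooccurrences(tokens, node, window=3):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text; a real analysis would run over the whole corpus.
sample = (
    "Humanitarian access is essential. Armed actors restricted access "
    "to the region, and access to aid remained limited."
)

tokens = tokenize(sample)
freqs = Counter(tokens)  # word frequencies
neighbours = cooccurrences(tokens, "access", window=2)

print(freqs.most_common(3))
print(neighbours.most_common(3))
```

In this sample, "access" occurs three times and "to" is its most frequent neighbour. On a corpus of realistic size, the same counts are usually combined with a statistical association measure before any conclusion is drawn.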
Access
Aid
Armed actors
Capacity-building
Cash
Child
Community
Community based approach
Context
Coordination
Corruption
Culture
Disaster risk reduction
Early warning
Education
Effectiveness
Efficiency
Empathy
Equity
Ethics
Ethnicity
Famine
Funding
Humanitarian action
Humanitarian actor
Humanitarian imperative
Humanitarian reform
Humanitarian space
Humanitarian worker
Humanitarian-development nexus
Implementation
Inclusion
Inter-governmental organisation
Intervention
Justice
Law
Leadership
Local organisation
Management
Mandate
Mitigation
Need
Nutrition
Partnership
Peace
Policy
Poverty
Programme
Protection
Quality
Recovery
Rehabilitation
Response
Responsibility-to-protect
Security
Services
Solidarity
Sustainability
Terrorism
Urbanisation
Each report is generally composed of the following elements:
Frequencies, which will allow you to see the regions, document types, years and organization types where your concept appears relevant.
Definitions, whether standardized and authoritative or ad hoc, together with a summary of definitional elements and a comparison based on the corpus metadata.
Related concepts, indicating how concepts change their relational behaviour based on organization type, geographical areas or time.
Frequent collocations, mostly nouns, adjectives and verbs, showing other surrounding concepts in the corpus.
Synonyms and antonyms, where applicable, together with the sources from which they were extracted.
Usage over time, where applicable, according to both the HE Corpus and Google Ngram Viewer.
Trends, debates and controversies surrounding the concept.
All of these elements are accompanied by comparative observations, together with a series of interactive graphics and the contexts from which the data were extracted. Graphics are shown as small snapshots on the main page of each entry; you can open them in a new page by clicking on Click here to enlarge. If you click on See contexts, you will see a dynamic table where you can filter the results based on different parameters.
The HE Corpus is a collection of 4,824 documents, amounting to a total of 71,201,157 distinct words. Each document contains metadata, which makes it possible to divide the corpus into multiple sections, i.e. sub-corpora. The following sections describe the corpus composition by issuing organisation, region and year of publication, as well as document type. You can access and browse the documents by code in the left-hand panel.
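The way metadata divides a corpus into sub-corpora can be sketched as follows. This is a minimal illustration: the record layout, the field names (`code`, `region`, `year`, `doc_type`, `text`), the document codes and the sample texts are hypothetical and do not reflect the actual HE Corpus schema.

```python
from collections import defaultdict

# Hypothetical document records; the field names and values are assumptions.
documents = [
    {"code": "DOC-A", "region": "Africa", "year": 2012,
     "doc_type": "Strategy", "text": "access to aid remains limited"},
    {"code": "DOC-B", "region": "Asia", "year": 2015,
     "doc_type": "Activity Report", "text": "cash programmes expanded"},
    {"code": "DOC-C", "region": "Africa", "year": 2015,
     "doc_type": "Activity Report", "text": "early warning systems improved access"},
]


def split_by(docs, field):
    """Group documents into sub-corpora keyed by the value of one metadata field."""
    subcorpora = defaultdict(list)
    for doc in docs:
        subcorpora[doc[field]].append(doc)
    return dict(subcorpora)


def word_count(docs):
    """Total number of whitespace-separated words across a list of documents."""
    return sum(len(doc["text"].split()) for doc in docs)


by_region = split_by(documents, "region")
for region, docs in sorted(by_region.items()):
    print(region, len(docs), "documents,", word_count(docs), "words")
```

The same `split_by` call works for any other metadata field, for example `split_by(documents, "year")`, which is what makes comparisons across regions, years and document types possible.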
The HE Corpus consists of documents produced by multiple organisations dealing with humanitarian matters. Each organisation is categorised in a hierarchy of 11 types and 26 subtypes.
The following interactive chart shows the composition of the HE Corpus by organisation type and subtype. Clicking on a wedge of the big pie chart reveals the subtypes comprising each type. To find out more about each organisation type and subtype, simply hover over each wedge.
Comparing phenomena across regions can yield significant insight into how a concept behaves in different parts of the world. Based on their place of publication, documents in the HE Corpus are classified into 7 regions: Africa, Asia, CCSA (Caribbean, Central and South America), MENA (Middle East and North Africa), North America and Oceania.
The following map illustrates the distribution of words in the corpus according to each world region. Click on each bubble to learn more about the distribution of organisation types in each region.
Organisations involved in the humanitarian domain generate a wealth of textual data. Based on their nature and communication role, all documents in the HE Corpus are classified into one of 3 broad categories: General Document, Activity Report and Strategy. In addition, each document is identified by its year of publication. Not only can linguists compare patterns across document types, but they can also look at when phenomena emerge and how they evolve across time.
The following histogram shows the distribution of all documents in the corpus by document type between 2005 and 2019.
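The counts behind such a histogram could be tallied along these lines. This is a sketch under stated assumptions: the `(year, document type)` pairs are invented sample data, and `histogram_counts` is a hypothetical helper, not part of the portal.

```python
from collections import Counter

# The three document categories named in the text above.
DOC_TYPES = ("General Document", "Activity Report", "Strategy")


def histogram_counts(records, first_year=2005, last_year=2019):
    """Count documents per (year, document type), ignoring years outside the range."""
    counts = Counter()
    for year, doc_type in records:
        if first_year <= year <= last_year and doc_type in DOC_TYPES:
            counts[(year, doc_type)] += 1
    return counts


# Invented sample records, for illustration only.
records = [
    (2005, "Strategy"),
    (2005, "Strategy"),
    (2010, "Activity Report"),
    (2019, "General Document"),
    (2021, "Strategy"),  # outside 2005-2019, so it is not counted
]

counts = histogram_counts(records)
for year in sorted({y for y, _ in counts}):
    row = ", ".join(f"{t}: {counts[(year, t)]}" for t in DOC_TYPES)
    print(year, row)
```

Each printed row corresponds to one year on the horizontal axis, with one bar segment per document type.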