Task A - Term Typing

Discover the generalized type for a lexical term

Given training instances defined per the following formalism 

Here, S is an optional context sentence (if available in the source ontology), L is the lexical term to be typed, and T is its conceptual term type. In the test phase, types are hidden, and participants predict them for the given terms using their trained models. To account for synonymy in English, systems may return a synonym cluster as the type prediction for a term. In that case, the evaluation program checks whether the designated source ontology type is present in the synonym cluster.
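The instance structure and the synonym-cluster check described above could be sketched in Python roughly as follows. This is only an illustration, not the official evaluation program: the class name `TermTypingInstance`, the function `cluster_contains_gold`, and the case- and whitespace-insensitive matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TermTypingInstance:
    """One instance: lexical term L, type T, optional context sentence S."""
    term: str                        # L, the lexical term prompted for
    term_type: Optional[str] = None  # T, hidden (None) in the test phase
    sentence: Optional[str] = None   # S, only if the source ontology has one


def _normalize(label: str) -> str:
    # Assumption: labels are compared ignoring case and extra whitespace.
    return " ".join(label.lower().split())


def cluster_contains_gold(predicted_cluster: Iterable[str], gold_type: str) -> bool:
    """Return True if the gold type appears in the predicted synonym cluster."""
    gold = _normalize(gold_type)
    return any(_normalize(candidate) == gold for candidate in predicted_cluster)
```

Under these assumptions, a prediction counts as correct when any member of the returned cluster equals the source ontology type after normalization; a stricter or looser matching rule would change only `_normalize`.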

SubTask A.1 - Term Typing - WordNet

The WordNet dataset is a lexico-semantic data dump derived from the original WordNet and released as a benchmark dataset with pre-created train and test splits. Overall, the training set consists of 40,943 terms and the test set of 9,470 terms, with 18 different relation types between the terms and four term types (noun, verb, adverb, adjective). Statistics of the train and test sets, with the number of types for this subtask, are as follows:

Examples from this dataset, following the definition above, are presented as follows.

SubTask A.2 - Term Typing - GeoNames

GeoNames consists of geographical locations organized into 680 categories, which are grouped into 9 higher-level categories, e.g., H for stream, lake, and sea, and R for road and railroad. Statistics of the train and test sets, with the number of types for this subtask, are as follows:

Examples from this dataset, following the definition above, are presented as follows.

SubTask A.3 - Term Typing - UMLS


NCI is a subontological source from UMLS, produced by NCI Enterprise Vocabulary Services (EVS) to facilitate the standardization of terminology across the Institute and the larger biomedical community. It provides reference terminology for many NCI and other systems, and covers vocabulary for clinical care, translational and basic research, public information, and administrative activities. Statistics of the train and test sets, with the number of types for this subtask, are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.
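For sources where one lexical term can carry several types, one convenient way to hold the training data is a mapping from each term to the set of its types. The sketch below illustrates that idea only; the function name `group_types_by_term` is hypothetical, and the placeholder strings in the usage note are not taken from any of the datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_types_by_term(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect every type T observed for each lexical term L."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for term, term_type in pairs:
        grouped[term].add(term_type)
    # Return a plain dict so missing terms raise KeyError instead of
    # silently creating empty entries.
    return dict(grouped)
```

For example, calling it on the placeholder pairs `("term_a", "type_x")`, `("term_a", "type_y")`, and `("term_b", "type_x")` yields two types for `term_a` and one for `term_b`.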


MEDCIN is a subontological source from UMLS. A medical terminology system, MEDCIN covers a wide range of medical components, including symptoms, medical history, physical examination findings, diagnostic tests, diagnoses, and treatment options. Within MEDCIN, numerous clinical hierarchies have been established by linking various MEDCIN data elements to describe the diagnoses found in the diagnostic index. Unlike reference terminologies, which typically focus on semantic relationships between words, MEDCIN adopts a clinical relationship approach, emphasizing the interconnectedness of medical concepts within diagnostic hierarchies. Statistics of the train and test sets, with the number of types for this subtask, are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.


SNOMEDCT_US is a subontological source from UMLS. It serves as the foundational general terminology used in electronic health records (EHRs). Its concepts have distinct meanings and formal logic-based definitions, and are organized into hierarchies. Statistics of the train and test sets, with the number of types for this subtask, are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.

SubTask A.4 - Term Typing - GO

Biological Process

GO-Biological Process is a subontological source from the Gene Ontology (GO). It describes the larger biological processes accomplished by multiple molecular activities. The statistics for SubTask A.4 - GO - Biological Process are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.

Cellular Component

GO-Cellular Component is a subontological source from the Gene Ontology (GO). It describes a location, relative to cellular compartments and structures, that is occupied by a macromolecular machine. The statistics for SubTask A.4 - GO - Cellular Component are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.

Molecular Function

GO-Molecular Function is a subontological source from the Gene Ontology (GO). It describes activities performed by gene products at the molecular level, such as “catalysis” or “transport”. The statistics for SubTask A.4 - GO - Molecular Function are as follows:

Examples from this dataset, following the definition above, are presented as follows. As the examples show, a given lexical term L can have multiple types in this source.