May 16, 2020 at Palais du Pharo, Marseille, France
Computational Terminology covers an increasingly important aspect of many areas of Natural Language Processing, such as text mining, information retrieval, information extraction, summarization, textual entailment, document management systems, question-answering systems, ontology building and machine translation. Terminological information is paramount for knowledge mining from texts, including bilingual texts, for scientific discovery and competitive intelligence. Scientific needs in fast-growing domains (such as biology, medicine, chemistry and ecology) and the overwhelming amount of textual data published daily demand that terminology be acquired and managed systematically and automatically, while in well-established domains (such as law, economics, banking and music) the demand is for fine-grained analyses of documents for knowledge description and acquisition. For all specialized domains, multilingual terminology is increasingly necessary.
Four years have passed since the last Computerm workshop, held at Coling 2016. During this period, deep learning and neural methods have become the state of the art for most NLP applications, reaching higher performance on various tasks. This workshop aims to investigate what deep learning has brought to computational terminology and its traditional topics, its impact on human applications, and the new questions it raises within the scope of terminology.
The aim of this sixth Computerm workshop is to bring together Natural Language Processing and Human Language Technology researchers as well as terminology researchers and practitioners to discuss recent advances in computational terminology and their impact on automatic and human applications. We also host a special session for the shared task TermEval, which uses the large, manually annotated ACTER dataset (Annotated Corpora for Term Extraction Research), which covers multiple domains and languages.
This year, in addition to the general session, we introduce a special session for the TermEval shared task.
Themes and topics
For the general session, we call for submissions in the following areas, though the list does not limit the range of topics to be considered:
- term extraction, including single and complex terms built either morphologically or syntactically, which is the core of terminological activity and lays the basis for other terminological topics and tasks;
- event recognition and extraction, which extends the notion of the terminological entity from terms denoting static units to terms denoting procedural and dynamic processes;
- acquisition of semantic relations among terms, an important research topic with applications such as the population and update of existing knowledge bases, the definition of domain-specific templates in information extraction, and the disambiguation of terms;
- term variation management, which helps to deal with the dynamic nature of terms, their acquisition from heterogeneous sources, and their integration, standardization and representation for a wide range of applications and resources. It is increasingly important, as this research problem must be addressed when working with thesauri, ontologies and textual data. Term variation also covers paraphrases and reformulations of terms arising from historical, regional, local or personal factors. In addition, the discovery of synonymous terms or term clusters benefits many NLP applications;
- definition and terminological context extraction, an important research area that aims to provide usage examples and descriptions of terminological entities. Such definitions and usage contexts may contain elements necessary for the formal description of terms and concepts within ontologies;
- consideration of user expertise, an emerging issue in terminological activity: specialized domains contain notions and terms that are often incomprehensible to non-experts or laypersons (such as patients in the medical area, or bank clients in the banking and economics areas). This aspect, although related to specialized areas, provides a direct link between specialized languages and general language and is crucial for applications such as automatic email generation or spoken language interfaces;
- distributional semantic analysis in specialized domains to construct semantico-conceptual representations of the domain (domain ontologies, thesauri, terminological resources), which requires dealing with small corpora, in contrast to the very large corpora available for general language;
- monolingual and multilingual resources, which open the possibility of developing cross-lingual and multilingual applications and require specific corpora, methods and tools whose design and evaluation are challenging issues;
- robustness and portability of statistical methods, which make it possible to apply methods developed in one given context to other contexts (corpora, domains, languages, etc.) and to share research expertise among them;
- detection of unfortunate artefacts in terminology processing, such as suspicious terms and term translation errors, which are clues to fake scientific news and fake neural scientific news;
- social networks and modern media processing, which attract an increasing number of researchers and provide challenging material to be processed;
- utilization of terminologies in various NLP applications, including machine translation, another novel and challenging research direction, as terminologies are a necessary component of any NLP system dealing with domain-specific literature.
In addition, experiments on the evaluation of terminological methods and tools are encouraged, since they provide useful evidence of the utility of terminological resources:
- direct evaluation may concern the efficiency of the terminological methods and tools to capture the terminological entities and relations, as well as various kinds of related information;
- indirect evaluation may concern the use of terminological resources in various NLP applications and the impact these resources have on the performance of automatic systems. In this case, research and competition tracks (such as TREC, BioCreative, CLEF, CLEF-eHealth, I2B2, *SEM, and other shared tasks) provide particularly fruitful evaluation contexts and have proved very successful in identifying key problems in terminology such as term variation and ambiguity.
TermEval: A special session for a shared task on term extraction
This time we include a special session, TermEval, which will be a shared task on monolingual term extraction using the ACTER dataset. This dataset contains over 100k manual annotations in comparable corpora in three different languages (English, French, and Dutch) and four different domains (corruption, dressage, heart failure, and wind energy). Participants in the shared task can take part in one or more languages and will get access to the annotated data in three of the domains, while the heart failure domain will be provided at a later stage for evaluation. Participants can choose from different tracks (including or excluding named entities, and open or closed track) and will be ranked based on the F1-scores of their lists of automatically extracted terms on the evaluation corpus. Apart from the scores, there will also be more in-depth evaluations of how the tools handle difficulties, e.g. infrequent terms, single-word vs. multiword terms, etc. All information concerning the shared task is available at http://termeval.ugent.be.
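To make the ranking criterion concrete, here is a minimal sketch of how an F1-score could be computed for a list of extracted terms against a gold-standard list. It is not the official TermEval scorer: the function name `term_f1`, the exact-match comparison after lowercasing and trimming, and the toy term lists are all assumptions made for illustration.

```python
def term_f1(extracted, gold):
    """Return (precision, recall, f1) for two collections of term strings.

    Assumption: a predicted term counts as correct only if it matches a
    gold term exactly after lowercasing and trimming whitespace.
    """
    predicted = {t.strip().lower() for t in extracted if t.strip()}
    reference = {t.strip().lower() for t in gold if t.strip()}

    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, made-up lists (not taken from ACTER).
gold_terms = ["heart failure", "ejection fraction", "diuretic", "cardiomyopathy"]
system_terms = ["Heart failure", "diuretic", "patient", "ejection fraction"]

precision, recall, f1 = term_f1(system_terms, gold_terms)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.75 R=0.75 F1=0.75
```

Because the score is computed on the set of unique terms rather than on individual occurrences, a term extracted many times counts once; whether casing or lemmatization should be normalized is a design choice the task organizers define, not this sketch.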
Submissions and publications
The workshop is open to submissions using different approaches, ranging from term extraction in various languages (using verb co-occurrence, information-theoretic approaches, machine learning, etc.) and the extraction of translation pairs from bilingual corpora based on terminology, to semantically oriented approaches and theoretical aspects of terminology.
We encourage authors to submit their research work related to various aspects of computational terminology, such as those mentioned in this call. We are especially interested in terminology evolution and neologisms in specialized domains.
Workshop authors will be invited to submit an extended version of their work to a special issue of an international journal or to a book collection.
Details of the submission procedure are given on the submission page.
Identify, describe and share your LRs!
- Describing your LRs in the LRE Map is now a normal practice in the submission procedure of LREC (introduced in 2010 and adopted by other conferences). To continue the efforts initiated at LREC 2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will have the possibility, when submitting a paper, to upload LRs in a special LREC repository. This effort of sharing LRs, linked to the LRE Map for their description, may become a new “regular” feature for conferences in our field, thus contributing to creating a common repository where everyone can deposit and share data.
- As scientific work requires accurate citations of referenced work so as to allow the community to understand the whole context and also replicate the experiments conducted by other researchers, LREC 2020 endorses the need to uniquely identify LRs through the use of the International Standard Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique Identifier to be assigned to each Language Resource. The assignment of ISLRNs to LRs cited in LREC papers will be offered at submission time.