PROGRAMME

The book of abstracts is now available in a draft version (it may take a while to load). Publishing a draft first allows errors to be corrected and abstracts of canceled presentations to be removed before the final version. Conference participants are encouraged to report any errors to the (virtual) conference helpdesk or to lingcoll (at) gmx (dot) de.

List of participants (draft as of Nov. 25, 2020)

Below you will find links to the conference programme, which will be updated continuously as necessary. Presenters: please check your schedule about two hours before your presentation starts, as minor shifts may still occur. We have done our best to take authors' time zones and language requirements into account. Please report errors to lingcoll (at) gmx (dot) de.

English and German sessions have different background colors. Abstracts can be located by looking up the name of the first author (highlighted in boldface) in the book of abstracts, where contributions are sorted by first author.

Abstracts of Invited Keynote Presentations

Nov. 26, 2020

Christopher Manning, Stanford University, Departments of Linguistics and Computer Science, USA

Empirical Perspectives on Human Language, Its Structure, Acquisition, and Interpretation

[click here to see abstract]

I will discuss what we have learned from about 30 years of empirical, statistical, computationally-based research into human language, its structure, acquisition, and interpretation. On the one hand, human language is a paradigmatic example of a categorical system and a rule-based system. On the other hand, human language is a squishy, variable, and changing thing, and the data we get is changing statistical patterns of actual human language use. Recent decades have seen the development of many language corpora and formal and computational tools for building models of language from them. In what ways do and don’t these models give insight into the structure, acquisition, and interpretation of human language? How has the perspective changed with the progression from corpus-based NLP to machine learning models, and now deep learning or neural models of language?

Nov. 27, 2020

Yannis Ioannidis, Athena R.C., Athens, Greece

SciTopix: A Multi-Lens on Topic Modeling

[click here to see abstract]

In this presentation, I will outline the main characteristics of SciTopix, an extensible platform that addresses the problem of analyzing a corpus of documents, both its raw text and particular internal and external metadata, to extract the topics dealt with in the corpus overall and to annotate each document with the topics it concerns. Its primary focus is scientific corpora: it collects and analyzes scientific publications, patents and other related information as well as varied additional side or extracted information (e.g., authors, venues, grants, semantic annotations, bio-entities) and links (e.g., citation network), aiming to alleviate the impact of information overload that a large corpus may create. SciTopix combines advanced topic modelling, natural language processing (NLP) and visualization techniques to accurately identify and present underlying thematic information (i.e., topics) that corresponds to and characterizes hidden overlapping entity clusters. Generated topics serve as the means to draw connections between scientific areas, scientific concepts, people, organizations, funding sources, specialized terms, etc., both within and across each category, and both within and across scientific disciplines and technical domains. SciTopix is already in operation in OpenAIRE (a key EC e-Infrastructure supporting open access and open science in scholarly communication) and has repeatedly been used productively.
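For readers unfamiliar with topic modeling, the sketch below shows the basic idea in a few lines of Python using scikit-learn's LatentDirichletAllocation: a toy corpus is turned into a bag-of-words matrix, a small number of topics is inferred, and each document is annotated with its dominant topic. This is only a minimal illustration of the general technique; it does not reflect SciTopix's actual models, metadata handling, or visualization components.

# Minimal topic-modeling illustration (not the SciTopix pipeline):
# infer topics from a toy corpus and tag each document with its dominant topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dependency parsing of scientific text corpora",
    "neural machine translation and linguistic structure",
    "topic models for scholarly publications and citation networks",
]

# Bag-of-words representation of the corpus
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit a small LDA model; doc_topics holds one topic distribution per document
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Show the top terms per topic and each document's dominant topic
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:3]]
    print(f"topic {i}: {', '.join(top)}")
for doc, dist in zip(docs, doc_topics):
    print(f"{doc!r} -> topic {dist.argmax()}")

In a real scholarly corpus, the document-topic distributions produced this way are what allow topics to act as links between papers, authors, and funding sources, as described in the abstract.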

Nov. 28, 2020

Jan Hajič, Charles University, Prague, Czech Republic

SynSemClass and Parallel Dependency Treebank Annotation

[click here to see abstract]

At the Institute of Formal and Applied Linguistics at Charles University in Prague, treebank creation and rich annotation have a long tradition, and the underlying dependency-oriented, deep syntactic theory (the Functional Generative Description, or FGD) is almost 60 years old now. There is now a family of Prague Dependency Treebanks, mostly for Czech, but one of them is parallel: the Prague Czech-English Dependency Treebank (PCEDT), published in full in 2012 (and extended, in several respects, later). The annotation scheme consists (for written texts) of four layers: the original plain text, the morphological annotation layer, the dependency syntax layer, and the deep syntactic layer, also called "tectogrammatical", borrowing the term from the underlying FGD theory. The tectogrammatical layer annotation revolves around verbs, which are the core (root) elements of clauses, which in turn connect to form sentences. Valency information about the verbs is implicitly present in the dependency annotation, and explicitly in the accompanying valency lexicons (in the case of PCEDT, both for Czech and English). Recently, a new project, called SynSemClass, has been started to create an ontology of event types suitable for truly semantic annotation (or a "knowledge graph" type of annotation), in which verbs (or, more precisely, verb senses) are grouped into synonym classes, assigned semantic roles, and carefully linked back to the valency lexicons and other external resources. Such linking facilitates future annotation with the new resources, especially on top of the existing one. In the talk, the SynSemClass project, the PCEDT, and the first experiments with its annotation will be presented.
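As a rough illustration of the kind of data structure such a resource implies, the sketch below models a synonym class whose members are verb senses with links back to their valency lexicon entries. All field names, role labels, and identifiers here are hypothetical assumptions for illustration; they do not follow the actual SynSemClass data format.

# Hypothetical sketch of a SynSemClass-style synonym class entry.
# Field names, role labels, and identifiers are illustrative assumptions,
# not the project's released data format.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VerbSense:
    lemma: str        # verb lemma, e.g. a Czech or English verb
    sense_id: str     # identifier of the sense in the source valency lexicon
    lexicon: str      # which valency lexicon the sense links back to

@dataclass
class SynSemClassEntry:
    class_id: str                  # identifier of the synonym class
    roles: List[str]               # semantic roles shared by all class members
    members: List[VerbSense] = field(default_factory=list)

# Example: one class grouping a Czech and an English verb sense
entry = SynSemClassEntry(
    class_id="example_class",
    roles=["Agent", "Theme"],
    members=[
        VerbSense(lemma="koupit", sense_id="sense-1", lexicon="PDT-Vallex"),
        VerbSense(lemma="buy", sense_id="sense-1", lexicon="EngVallex"),
    ],
)
print(entry.class_id, [m.lemma for m in entry.members])

The point of such a structure is the explicit back-linking: each member of a class keeps a reference to its valency lexicon entry, which is what makes annotation on top of the existing treebank layers feasible.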

Nov. 28, 2020

Philipp Koehn, Johns Hopkins University, USA

What can Linguistics Teach Machine Translation? What can Linguistics Learn from Machine Translation?

[click here to see abstract]

The building of machine translation systems has undergone many changes in approach and many refinements over its decades of research and development. While over the last twenty years the use of data has trumped the use of linguistic insight, building linguistically motivated models of translation has been a constant undercurrent of this work. When statistical machine translation arrived with word-based models, these models slowly incorporated ideas from morphology, syntax, and even deeper semantics. Now, with the turn to neural machine translation, the field has reverted to linguistic nihilism. Nevertheless, there have been efforts to arm even these models with linguistic principles. Neural translation models have also been probed to see whether they discover linguistic concepts, which linguistic challenges they fail to address, and what inherent biases they may incorporate.