Prof. Lonneke van der Plas, Assoc. Professor at USI

Intro

Prof. Lonneke van der Plas

Institute of Argumentation, Linguistics and Semiotics

Dalle Molle Institute for Artificial Intelligence

------

Faculty of Communication, Culture and Society

Faculty of Informatics

USI Università della Svizzera italiana, Lugano, Switzerland

Office: Palazzo principale, Ufficio 258 (Livello P2), Via Buffi 13, 6900 Lugano

Since October 2024, I have been associate professor in NLP at USI, at the Institute of Argumentation, Linguistics and Semiotics, and adjunct professor at the Faculty of Informatics. I lead the USI NLP Research Lab, which I started after I left the Idiap Research Institute in Martigny. I was associate professor at the University of Malta from 2014 to 2020. Before that, I was junior professor at the Institute for Natural Language Processing (IMS), University of Stuttgart, where I led a research group (see below) in the framework of the SFB 732, a collaborative research centre that brings computational linguists and linguists together. I did a post-doc (maître-assistante) at the University of Geneva, working on cross-lingual transfer of semantic role labelling as part of the CLASSiC project. I earned my PhD from the University of Groningen (Department of Humanities Computing), where I worked on vector-based word representations for medical question answering within the Alfa-Informatica group. Before that, I did the M.Phil in Computer Speech and Language Processing at the University of Cambridge. The M.Phil has since been renamed Computer Speech, Text and Internet Technology.

Research in the lab is highly interdisciplinary and includes collaborations with social and cognitive scientists, linguists, and partners in the life sciences, health, finance and business. I have been working on the following subjects: computational creativity, cross-lingual natural language processing, vector-based representations of language, (medical) terminology extraction, (medical) question answering, semantic role labelling, and low-resource languages.

Visiting fellowships:

I was a DSI fellow at the University of Zurich from October 2019 to January 2020.

I was an Erasmus Mundus LCT visiting scholar at the Shanghai Jiao Tong University (SJTU) and at the School of Computing and Information Systems of the University of Melbourne in June and July 2016.

I was a visiting academic at the Division of Information and Communication Sciences of Macquarie University, Sydney, from January to March 2007.

Current main projects:

NCCR Evolving Language

SNSF (2024-2028, principal investigator)

SNSF (2020-2023, associate investigator)

The Swiss National Centre of Competence in Research (NCCR) Evolving Language is a nationwide interdisciplinary research consortium bringing together research groups from the humanities, from language and computer science, the social sciences, and the natural sciences at an unprecedented level. Together, we aim at solving one of humanity’s great mysteries: What is language? How did our species develop the capacity for linguistic expression, for processing language in the brain, and for consistently passing down new variations to the next generation? How will our capacity for language change in the face of digital communication and neuroengineering?

Together with Lena Jäger and Andrea Migliano from the University of Zurich, I will lead a project on lexical innovation.

PhD student: Diego Rossini (USI)

C-LING (sole applicant)

SNSF 2022-2026

This project aims to create computational models of language as a tool for creative thinking. We will use statistical patterns extracted from large text corpora, as well as structured knowledge bases, to inform these models. We aim to generate new concepts by comparing statistical patterns in large text corpora from different cultures and domains, just like a person may get new ideas from travelling and collaborating with people from different backgrounds. At the same time, we want to go one step further by looking at more complex constructs such as new ideas, for example scientific discoveries. For such complex constructs, statistical patterns alone will not suffice, and structural knowledge will be added to our models.

PhD students:

Molly Petersen (EPFL)

Mete Ismayilzada (EPFL)

AI2Pub (co-applicant)

SNSF Agora 2025-2028

Artificial intelligence (AI) has become a powerful and pervasive technology, influencing many aspects of our daily lives. However, its rapid growth and integration into society raise complex questions and concerns. Our team of scientists and communication experts is committed to improving public understanding of AI technologies, fostering a positive societal impact. Building on the foundations of our previous project, NewsOnAI, we will go beyond traditional communication channels such as workshops and media publications. Our proactive engagement efforts will establish dynamic, two-way communication between scientists and the public, utilizing new formats like interactive exhibitions and theater performances. Analyzing feedback from these initiatives will enable scientists to pursue research directions that effectively address societal concerns. Over time, we anticipate that our efforts will create a multiplying social impact, fostering informed public discourse and a deeper understanding of AI technologies.

ORIENTER (co-applicant)

SNSF SPIRIT 2025-2029

This project aims to investigate what aspects of patients’ language and behaviour can be effectively and efficiently modelled by very recent AI techniques in the diagnostic construct of depression disorders, considering relevant demographic variables such as cultural background (native language) and gender.

Past projects:

SEM24

Innosuisse 2023-2025 (PI)

SEM24 focuses on multilingual skill extraction from resumes and job ads, incorporating insights from the fields of HRM and NLP. We work with real-world data in collaboration with the EHL business school and ARCA24. The semantic engine will identify a wider range of important skills, including soft skills, improving the matching quality, mitigating bias and significantly reducing manual labor.

Post-doc:

Laura Vasquez

Developer:

Samuel Michel

Transfer manager:

Alexandre Nanchen

FactCheck

Hasler Foundation (2024, co-PI)

In this project, we plan to build a dataset and implement methods for multi-modal fact checking.

Postdoc: Michiel van der Meer

LT-BRIDGE

H2020-WIDESPREAD (2021–2024, main applicant/coordinator)

Bridging the technology gap: Integrating Malta into European Research and Innovation efforts for AI-based language technologies.

UPSKILLS (UPgrading the SKIlls of Linguistics and Language Students)

EU Erasmus+ (2021-2023, main applicant/coordinator)

The central goal of this project is to tackle the identified skills gaps and mismatches among linguistics and language students by supporting the development of innovative materials that better meet the learning outcomes needed in the current job market, in collaboration with language technology companies.

MUFINS (NLP-Driven, MUltilingual FInancial News and content Search)

Malta Enterprise Research grant (2020-2022, co-applicant)

This project aims to investigate transfer learning techniques for a variety of NLP tasks in multiple languages, within the financial and news domains. The project is carried out in collaboration with CityFalcon Trading Ltd.

Postdoc:

Marc Tanti

MASRI (Maltese Automatic Speech Recognition, https://www.um.edu.mt/projects/masri/)

UM Research Fund (2018-2020, co-applicant)

This project aims to build an automatic speech recogniser for Maltese, a low-resource language, using various bootstrapping approaches.

SFB 732 project D11: A crosslingual approach to the analysis of compound nouns

DFG (2014-2018, main applicant)

This project tries to bridge the gap between computational linguistics and theoretical linguistics by using linguistically-informed models and explicitly testing hypotheses stemming from the linguistics literature. It proposes a compositional approach to noun-noun (N-N) compound analysis with an interdependent three-level model that comprises splitting the compound, capturing the meaning of its components, and identifying the covert relation that holds between them. We used multi-lingual data throughout the project, in analysis and evaluation, and followed a language-independent approach, using automatic, knowledge-lean, data-driven methods.

PhD students:

Stefan Müller

Patrick Ziering

Collaborations with Gianina Iordachioaia (Institute for English Linguistics)

CLASSiC project: Cross-lingual semantic annotation transfer from English to French

EC FP7 (2008-2011, post-doc)

In the CLASSiC project (Computational Learning in Adaptive Systems for Spoken Conversation) we focused on semantic role labeling for French, and in particular on methods to automatically generate semantic annotations for French. Syntactic annotation was available for French, but semantic annotation was not. Since semantic annotation was available for English and there are parallel corpora for the language pair English-French, we transferred the semantic annotation from English to the French translations using word alignments. Contrary to previous work (Padó and Pitel, TALN 2007; Padó and Lapata, Comp. Ling. 2009; Basili et al. CICLing 2009), we did not use an ontology constructed for the target language, because we wanted to minimize the amount of manual labour and aimed for broad-coverage annotations. We used the PropBank annotation framework constructed for English to annotate French sentences, after having tested the cross-lingual validity of PropBank (Van der Plas et al., LAW 2010). Because we know that there is a high correlation between syntax and semantics (see also Merlo and Van der Plas, ACL 2009), we leveraged the information contained in the syntactic annotations in a second step. In this step we trained a syntactic-semantic parser on the combination of syntactic annotations and the semantic annotations resulting from transfer. The automatically generated semantic annotations for French are close to the upper bound from manual annotations (Van der Plas et al., ACL 2011).
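As a rough illustration of the transfer step, the minimal Python sketch below copies PropBank-style labels from English tokens to the French tokens they are aligned with. It is a simplified toy under stated assumptions, not the project's actual pipeline: the function name `project_labels`, the example sentence pair, the labels and the alignment links are all invented for illustration, and the second step (training a syntactic-semantic parser on the projected annotations) is not shown.

```python
from typing import List, Optional, Tuple


def project_labels(
    source_labels: List[Optional[str]],
    alignments: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each labelled source token's label to the target tokens aligned with it.

    `alignments` holds (source_index, target_index) word-alignment links.
    Unaligned target tokens stay unlabelled (None).
    """
    target_labels: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignments:
        label = source_labels[src_idx]
        # Project only real labels, and keep the first label a target token receives.
        if label is not None and target_labels[tgt_idx] is None:
            target_labels[tgt_idx] = label
    return target_labels


# Hypothetical toy data: an English sentence with PropBank-style labels,
# its French translation, and hand-written word-alignment links.
english = ["Mary", "bought", "a", "book"]
french = ["Marie", "a", "acheté", "un", "livre"]
english_labels = ["A0", "buy.01", None, "A1"]
alignments = [(0, 0), (1, 2), (2, 3), (3, 4)]

projected = project_labels(english_labels, alignments, len(french))
for token, label in zip(french, projected):
    print(token, label)
```

In this toy example the French auxiliary "a" has no alignment link and therefore receives no label, which hints at why a second, syntax-aware step is useful for cleaning up and completing projected annotations.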

Watch a video of the CLASSiC system.

PhD project: Automatic lexico-semantic acquisition for question answering (NWO IMIX: 2003-2008, PhD student)

(Promotor: John Nerbonne, co-promotor: Gosse Bouma)

Freedom and liberty share the same meaning. Paris denotes a city, and the word party triggers associations of wine and fun for many. People naturally acquire these lexico-semantic relations such as synonyms, categorised named entities, and associations by using language in their daily life.

For many natural language processing applications, such as question answering, this type of information is essential, e.g. to recognise that a particular meaning can be inferred from different text variants or to compensate for the lack of general world knowledge.

This thesis proposes three methods for using large text corpora to acquire lexico-semantic information automatically: a syntax-based method, a multilingual word-alignment-based method and a proximity-based method. The three methods complement each other in the type of data needed, the way they deal with sparse data and most importantly, in the types of lexico-semantic information they provide. This information is then applied to the Groningen question answering system Joost. Among the different types of lexico-semantic information acquired, categorised named entities, e.g. Paris denotes a city, improved the system the most and this information was obtained with the syntax-based method.

Side projects:

In my spare time, I like to draw and paint (https://flic.kr/s/aHsj9J6viw), go hiking, swim, and eat good food.
