10th June 2022
The MWE workshop will be hybrid and take place in Ada Lovelace at Regents Court and online.
9:30 am - 9:50 am
Arrival and set up
9:50 am - 10:00 am
Introduction to the MWE Group
10:00 am - 11:00 am
Title: Towards Language Technology for a Truly Multilingual World?
Abstract:
Language technology tools such as Google Translate or virtual assistants (Siri, Alexa) were components of collective SciFi-inspired imagination not many years ago. Today, they are an essential driver of the digital AI transformation, used by hundreds of millions of people. A key challenge in multilingual NLP is developing general language-independent architectures that will be equally applicable to any language. However, this ambition is hindered by 1) the large variation in structural and semantic properties of the world’s languages, as well as 2) the scarcity of raw and task data for many different languages, tasks, and application domains. As a consequence, existing language technology is still largely limited to a handful of resource-rich languages, leaving the vast majority of the world’s 7,000+ languages and their speakers behind, thus amplifying the problem of the “digital language divide”. In this talk, I will discuss the importance of addressing multilingualism and of bringing language technology to minor and low-resource languages and communities as well. I will introduce some recent techniques, breakthroughs and lessons learned that aim to deal with such large cross-language variations and low-data learning regimes. I will also demonstrate that low-resource languages, despite very positive research trends and results achieved in recent years, still lag behind major languages in terms of performance, resources, overall representation in NLP/IR research and other key aspects, and will outline several crucial challenges for future research in this vibrant NLP area.
11:00 am - 11:30 am
SemEval 2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding
Harish Tayyar Madabushi, Edward Gow-Smith
11:30 am - 12:00 pm
MWE Representations: Methods and Data Efficiency
Dylan Phelps
12:00 pm - 12:30 pm
Idiomaticity Detection in the context of Multiword Expressions and Language Models
Joanne Boisson
(Cardiff University)
12:30 pm - 1:00 pm
Improving Tokenisation by Alternative Treatment of Spaces
Edward Gow-Smith
1:00 pm - 2:00 pm
Lunch, Discussion and Posters
2:00 pm - 3:00 pm
Brainstorming and project planning
3:00 pm - 4:00 pm
Title: Do the eyes mirror the mind? A view from language processing.
Abstract:
Language provides a variety of ways to express events. To describe a scene we can say “The nurse vaccinated the policeman.” or, alternatively, “The policeman was vaccinated by the nurse.”
There is nothing much in the scene itself that would force us to use one or the other description. There seems to be ‘something in us’, however, that has a preference: we can decide to foreground the nurse or the policeman to satisfy our communicative needs. Thus, the truly exciting question is: does this have any effect on our conversation partner? Does our choice affect how others view the situations that we describe?
We investigated this question by tracking the eyes of native speakers of English while they looked at static scenes after hearing each scene described in one of two possible ways. We found (among many other interesting things) that the description of a scene does affect how it is viewed: it either changes the order in which the participants of a scene are accessed or it changes the amount of attention each participant receives.
4:00 pm - 4:30 pm
Harish Tayyar Madabushi
4:30 pm - 5:15 pm
MSc Student projects