The Sheffield Special Interest Group in Multiword Expressions (MWEs) focuses on methods and processes for handling MWEs in NLP.
Despite significant advances in NLP, MWEs are still not handled effectively (Yu and Ettinger, 2020; Garcia et al., 2021; Tayyar Madabushi et al., 2021).
Read more about our tutorial, Psychological, Cognitive and Linguistic BERTology: An Idiomatic Multiword Expression Perspective, which will be delivered at both LREC 2022 and COLING 2022.
Members of the team organised SemEval-2022 Task 2 on Multilingual Idiomaticity Detection and Sentence Embedding. You can read more about the task on the task website.
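For readers new to the area, the sketch below shows one common way of framing idiomaticity detection: binary classification of a sentence paired with its target MWE using a multilingual pretrained encoder. This is only an illustration of the problem setting, not the task's official baseline; the model choice, the sentence-pair framing and the predict_idiomatic helper are illustrative assumptions.

```python
# A minimal sketch of idiomaticity detection as binary sentence-pair
# classification. Illustrative only: the model, the framing and the
# predict_idiomatic helper are assumptions, not the official task baseline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # any multilingual encoder could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_idiomatic(sentence: str, mwe: str) -> int:
    """Return 1 if the target MWE is predicted to be used idiomatically, else 0.

    The sentence and the MWE are encoded as a pair so the classifier
    knows which expression in the sentence it should judge.
    """
    inputs = tokenizer(sentence, mwe, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())

# The classification head is untrained here, so the prediction is not
# meaningful until the model is fine-tuned on labelled idiomatic/literal data.
print(predict_idiomatic("He finally kicked the bucket last night.", "kick the bucket"))
```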
Lexical semantics, multilinguality, and cognitively motivated NLP. This work includes techniques for multiword expression treatment using statistical methods and distributional semantic models, and applications such as text simplification and question answering.
Online content verification (misinformation detection), personalised NLP, text simplification, machine translation, quality estimation of machine translation, document-level evaluation of NLP task outputs
PhD Student (2020)
Cross-Domain Idiomatic Multiword Representations for Natural Language Processing
PhD Student (2021)
Biomedical NLP, multiword expressions, deep neural networks, efficient NLP, language models
PhD Student (2021)
Computational linguistics, multiword expressions, domain adaptation and robustness, emojis and rapidly evolving language, fairness, equality and sustainability.
Emily Ip, Computer Science with Speech and Language Processing
Yutong Gu, Advanced Computer Science
Darshan Adiga Haniya Narayana, Computer Science with Speech and Language Processing
Yihua Huang, Data Analytics
Meifang Li, Advanced Computer Science
Rui Li, Advanced Computer Science
Mohammed Yaseen Maniyar, Advanced Computer Science
Phelps, D., Fan, X.R., Gow-Smith, E., Tayyar Madabushi, H., Scarton, C. and Villavicencio, A., 2022. Sample Efficient Approaches for Idiomaticity Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC 2022). European Language Resources Association.
Phelps, D., 2022. drsphelps at SemEval-2022 Task 2: Learning Idiom Representations Using BERTRAM. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics.
Tayyar Madabushi, H., Gow-Smith, E., Garcia, M., Scarton, C., Idiart, M. and Villavicencio, A., 2022. SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics.
Gow-Smith, E., Tayyar Madabushi, H., Scarton, C. and Villavicencio, A., 2022. Improving Tokenisation by Alternative Treatment of Spaces. arXiv preprint arXiv:2204.04058.
Tayyar Madabushi, H., Gow-Smith, E., Scarton, C. and Villavicencio, A., 2021. AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 3464-3477). Association for Computational Linguistics.
Vickers, P., Wainwright, R., Tayyar Madabushi, H. and Villavicencio, A., 2021. CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 125-133). Association for Computational Linguistics.
The Sheffield SIG-MWE group is partly supported by the EPSRC grant "Modeling Idiomaticity in Human and Artificial Language Processing" (EP/T02450X/1).
This project aims to develop linguistically motivated computational models that can recognise and accurately process idiomatic (non-literal) language, informed by cognitive data on human processing. Equipping models with the ability to process idiomatic expressions is particularly important for obtaining more accurate representations, which can lead to gains in downstream tasks such as machine translation and text simplification. The originality of this work lies in integrating linguistic and cognitive clues about human idiomatic language processing into the construction of word and phrase representations, and in applying these representations in downstream tasks.