The Sheffield Special Interest Group in Multiword Expressions (MWEs) focuses on methods and processes for handling MWEs in NLP.
Despite significant advances in NLP, MWEs are still not handled effectively (Yu and Ettinger, 2020; Garcia et al., 2021; Tayyar Madabushi et al., 2021).
Read more about our tutorial, Psychological, Cognitive and Linguistic BERTology: An Idiomatic Multiword Expression Perspective, which will be delivered at both LREC 2022 and COLING 2022.
Members of the team organised SemEval-2022 Task 2 on Multilingual Idiomaticity Detection and Sentence Embedding. You can read more about the task on the task website.
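For readers new to the area, the sketch below shows one common way of framing idiomaticity detection: binary classification of a sentence paired with its target MWE using a multilingual pretrained encoder. This is only an illustration of the problem setting, not the task's official baseline; the model choice, the sentence-pair framing and the predict_idiomatic helper are illustrative assumptions.

```python
# A minimal sketch of idiomaticity detection as binary sentence-pair
# classification. Illustrative only: the model, the framing and the
# predict_idiomatic helper are assumptions, not the official task baseline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # any multilingual encoder could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_idiomatic(sentence: str, mwe: str) -> int:
    """Return 1 if the target MWE is predicted to be used idiomatically, else 0.

    The sentence and the MWE are encoded as a pair so the classifier
    knows which expression in the sentence it should judge.
    """
    inputs = tokenizer(sentence, mwe, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())

# The classification head is untrained here, so the prediction is not
# meaningful until the model is fine-tuned on labelled idiomatic/literal data.
print(predict_idiomatic("He finally kicked the bucket last night.", "kick the bucket"))
```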
Lexical semantics, multilinguality, and cognitively motivated NLP. This work includes techniques for multiword expression treatment using statistical methods and distributional semantic models, and applications such as text simplification and question answering.
Online content verification (misinformation detection), personalised NLP, text simplification, machine translation, quality estimation of machine translation, document-level evaluation of NLP task outputs
PhD Student (2020)
Cross-Domain Idiomatic Multiword Representations for Natural Language Processing
PhD Student (2021)
Biomedical NLP, multiword expressions, deep neural networks, efficient NLP, language models
PhD Student (2021)
Computational linguistics, multiword expressions, domain adaptation and robustness, emojis and rapidly evolving language, fairness, equality and sustainability.
Emily Ip, Computer Science with Speech and Language Processing
Yutong Gu, Advanced Computer Science
Darshan Adiga Haniya Narayana, Computer Science with Speech and Language Processing
Yihua Huang, Data Analytics
Meifang Li, Advanced Computer Science
Rui Li, Advanced Computer Science
Mohammed Yaseen Maniyar, Advanced Computer Science
Phelps, D., Fan, X.R., Gow-Smith, E., Tayyar Madabushi, H., Scarton, C. and Villavicencio, A., 2022. Sample Efficient Approaches for Idiomaticity Detection. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC 2022). European Language Resources Association.
Phelps, D., 2022. drsphelps at SemEval-2022 Task 2: Learning Idiom Representations Using BERTRAM. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics.
Tayyar Madabushi, H., Gow-Smith, E., Garcia, M., Scarton, C., Idiart, M. and Villavicencio, A., 2022. SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). Association for Computational Linguistics.
Gow-Smith, E., Tayyar Madabushi, H., Scarton, C. and Villavicencio, A., 2022. Improving Tokenisation by Alternative Treatment of Spaces. arXiv preprint arXiv:2204.04058.
Tayyar Madabushi, H., Gow-Smith, E., Scarton, C. and Villavicencio, A., 2021. AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 3464-3477). Association for Computational Linguistics.
Vickers, P., Wainwright, R., Tayyar Madabushi, H. and Villavicencio, A., 2021. CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 125-133). Association for Computational Linguistics.
The Sheffield SIG-MWE group is partly supported by the EPSRC grant "Modeling Idiomaticity in Human and Artificial Language Processing" (EP/T02450X/1).
This project aims to develop linguistically motivated computational models that can recognise and accurately process idiomatic (non-literal) language, informed by cognitive data on human processing. Equipping models with the ability to process idiomatic expressions is particularly important for obtaining more accurate representations, which can lead to gains in downstream tasks such as machine translation and text simplification. The originality of this work lies in integrating linguistic and cognitive clues about human idiomatic language processing into the construction of word and phrase representations, and in applying these representations in downstream tasks.