I am a computational linguist, a cognitive scientist who studies language and knowledge using computational methods. More specifically, I would describe myself as a data-driven formal semanticist: I use evidence from corpora, psychology and the neural sciences, analyzed using statistical and machine learning methods, to test hypotheses about semantic and pragmatic interpretation formulated in mathematically precise ways, and/or to develop new theories of this kind. For instance, my work on anaphoric reference (also called coreference in NLP)--e.g., on ambiguous anaphoric expressions, on salience, on bridging reference, and on reference to abstract objects--has been driven by behavioral experiments and by the analysis of corpora (which, in most cases, we created) and of disagreements in corpus annotation. Most recently, we have collected such data using Games-With-A-Purpose (GWAPs) such as Phrase Detectives, TileAttack!, Wormingo/Lingotorium, WordClicker and Lingotowns. My work on the organization and acquisition of conceptual knowledge involves using machine learning techniques to acquire evidence about commonsense and lexical knowledge from corpora and brain data. I am also involved in a number of projects applying NLP methods to real-world problems, such as deception detection and the detection of offensive language online. Finally, I am very interested in working with other languages, particularly Arabic and Italian, but I have also worked on Basque, Bengali, Chinese, Hindi and Japanese. For up-to-date information on my research, see my profiles on Google Scholar and ResearchGate.
I currently hold two appointments. In the UK, I am a full professor of Computational Linguistics in the School of Electronic Engineering and Computer Science at Queen Mary University of London, and a member of the University's Cognitive Science and Games and AI research groups. I am also a Fellow of the Alan Turing Institute and a supervisor in both the IGGI Doctoral Training Centre in Intelligent Games and Game Intelligence and the Wellcome Trust's PhD Programme in Health Data in Practice. My official QMUL page is here. In the Netherlands, I am a full professor of Natural Language Understanding in the Natural Language Processing Group of the Department of Information and Computing Sciences at Utrecht University. My official UU page is here.
My main current project in the UK is ARCIDUCA, funded by EPSRC. ARCIDUCA aims to improve conversational agents in general, and in particular their ability to interpret reference and coreference in dialogue, by embedding them in online gaming platforms such as Light or Minecraft, where they can interact with people, receive feedback and ask clarification questions. In the Netherlands, my main current project is the Dealing with Meaning Variation in NLP project, funded by NWO. Its objective is to carry out both fundamental and applied research on meaning variation in NLP along several dimensions of variation, exploring the interconnections between these dimensions and their implications for NLP research and applications. Both ARCIDUCA and Dealing with Meaning Variation follow up on the DALI project funded by the European Research Council (ERC), which investigated Disagreements in Anaphora and Language Interpretation. In DALI we developed games-with-a-purpose such as Phrase Detectives, TileAttack! and Wormingo and used them to collect large datasets of judgments about anaphoric interpretation, which we then used to study anaphora and to develop models of anaphora resolution. Please visit the project's page for publications and updates, or follow our YouTube channel on Games and NLP. Through DALI, we are also collaborating on building the LingoBoingo portal of games-with-a-purpose for creating linguistic resources.
Other past projects include the SENSEI project, on using discourse information to support the summarization of conversations, including online forums; the Concepts in Brain and Language project, in collaboration with the University of Trento, devoted to studying conceptual representations by combining brain imaging with techniques for acquiring concepts from corpora, and its spinoffs ADAM and Deep Relations; the Deception in Text project with Tommaso Fornaciari, on detecting deceptive reviews; several projects on using NLP to support the detection of human rights violations, including the Human Rights, Big Data and Technology Project at the University of Essex and a Knowledge Transfer Partnership (KTP) with Minority Rights Group on human rights violations in Iraq; the Brain and Emotions project, also in collaboration with Trento, on studying emotions using brain data; ARRAU, on studying difficult cases of anaphora; the GALATEAS EU project, on using HLT techniques to facilitate the analysis of query logs in digital libraries; the 2007 Johns Hopkins workshop ELERFED (using lexical and encyclopedic knowledge for entity disambiguation), which led to the development of the BART toolkit; GNOME (generating referring expressions); and LiveMemories (using information extraction to support knowledge sharing).
I am happy to supervise MSc- and PhD-level projects in Conversational AI, anaphora/coreference, Games-With-A-Purpose and other forms of crowdsourcing, deception detection, and Arabic NLP. My research and my ongoing and past projects are described in more detail here.
I am a co-founder of Dialogue and Discourse and have been an Associate Editor of the journal since its foundation. I am also co-editor of the Computational and Mathematical section of Language and Linguistics Compass.
Earlier on, I was at the University of Essex, School of Computer Science and Electronic Engineering, and a member of Essex University's Language and Computation group. Before that, I was at the University of Trento, Center for Mind and Brain Sciences, where I started the CLIC Lab.