Text mining and its applications

Much of my research, and my funded research in particular, applies CL/NLP methods to practical problems such as detecting deceptive language, supporting human rights organizations and medical research, and managing big data in digital libraries or social media. Many of these projects are collaborations with Udo Kruschwitz.

Deception detection using stylometric methods

Methods for discovering whether assertions are reliable or deceptive could have a variety of applications, e.g., in court, or to assess the reliability of online reviews. In collaboration with my former PhD student Tommaso Fornaciari I have studied the application of stylometric techniques (techniques that use statistics about the occurrence of function words to determine psychological traits of the author of a text) to evaluate the reliability of witness statements in court (Fornaciari and Poesio, 2013) and of Amazon reviews (Fornaciari and Poesio, 2014). In collaboration with another former PhD student, Fabio Celli, we have also explored the use of personality identification techniques for this task (Fornaciari et al, 2013).

This work has been supported by the development of two datasets: Deceptive Statements in Italian Courts (DECOUR), and Deception in REViews (DEREV), both of which are publicly available.
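
As a rough illustration of the kind of function-word statistics that stylometric techniques rely on, the sketch below computes the relative frequency of a few function words in a text. It is only a minimal example under stated assumptions: the word list and the function name function_word_profile are hypothetical, and the features and classifiers actually used in the studies cited above are not described on this page.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words.
# Real stylometric studies use much larger, language-specific lists.
FUNCTION_WORDS = ("i", "we", "you", "the", "a", "of", "in", "to",
                  "and", "but", "not", "no")


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        # Empty input: every frequency is zero.
        return {word: 0.0 for word in FUNCTION_WORDS}
    counts = Counter(tokens)
    return {word: counts[word] / total for word in FUNCTION_WORDS}


profile = function_word_profile(
    "I did not see the car, and I was not in the house."
)
for word, freq in profile.items():
    if freq > 0:
        print(f"{word}: {freq:.3f}")
```

A profile of this kind could then serve as a feature vector for a classifier trained on statements labelled as truthful or deceptive; that step is left out here because it depends on the dataset and learning method chosen.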

Social media mining and NLP support for human rights work

Social media are a very rich source of information of different kinds. Many of my current text mining projects are concerned with extracting information from social media.

An area in which we have been particularly active is the use of social media mining in support of the work of human rights organizations. In a KTP project with Minority Rights Group we contributed to developing the Ceasefire Iraq platform to support grassroots reporting on human rights abuses. Part of this work was the development of an Arabic social media monitoring system that identifies possible reports of human rights abuses on Twitter (Alhelbawy et al, 2016; Alhelbawy et al, submitted).

This work has continued through the Human Rights, Big Data and Technology Project. In this project we are collaborating with the UN High Commissioner for Refugees (UNHCR) to develop methods for predicting refugee crises using a combination of NLP and computer vision methods.

Medical applications of NLP and early diagnosis of mental health issues

The use of NLP methods to support medical research in general and for early diagnosis of mental health issues in particular is one of the main areas of research of the Cognitive Science Research Group at EECS.

My own research on these topics started several years ago with work in collaboration with the University of Essex dept. of Biology on information extraction from medical text. With my then PhD student Olivia Sanchez Graillet I investigated in particular the role of semantic interpretation and anaphoric interpretation in relation extraction in these domains. In (Sanchez-Graillet and Poesio, 2007a) we studied the effect of negation detection on the extraction of protein-protein interactions. In (Sanchez-Graillet and Poesio, 2007b) we tried to identify conflicting claims about protein-protein interactions in the literature. Earlier, in (Sanchez-Graillet et al, 2006), we looked at the effect of anaphoric interpretation on the task. We also explored the issue of deidentification / anonymization, in collaboration with the University of Essex's Data Archive (Poesio et al, 2006).

More recently, I started to work in the area of mental health diagnosis, focusing again on using semantic and discourse information (in particular, animacy detection) for this purpose. A number of studies have indicated that Alzheimer's patients' language becomes progressively more concrete, whereas depressed patients' language becomes more abstract. My PhD student Kevin Glover developed the Genitive Ratio, an approach to assessing the degree of abstractness or concreteness of a text (Glover, 2017). Evaluations of this method on texts produced by patients diagnosed as having those illnesses suggest that the GR might be successful at monitoring the prognosis of both illnesses, facilitating timely clinical interventions.

Our group is part of a consortium that was recently awarded a Wellcome Trust 4yr PhD Programme in Health Data in Practice.

Text mining in the Digital Humanities

Another rich source of textual data is represented by digital libraries. In the GALATEAS project we applied information extraction techniques to analyze query logs. In an ongoing collaboration with the Bagolini Archaeological Lab from the University of Trento, we have been developing NER techniques to facilitate the upload and search of scholarly articles in Archaeology. More recently, we have focused in particular on the use of active learning methods to improve the quality of our mining methods.

Projects (in reverse chronological order)

  • Human Rights in the Era of Big Data and Technology (2016-21), funded by ESRC. The premise of the project is that the use of big data may offer unprecedented opportunities to secure the fulfilment of human rights, but equally, its misuse may interfere with the enjoyment and protection of human rights.
  • Improving Reporting of Human Right Abuses through Arabic Social Media (2014-17), a KTP collaboration between Essex and Minority Rights Group funded by Innovate UK.
  • SENSEI (2013-16), an EU project on using discourse to summarize online conversations.
  • GALATEAS (2009-12), an EU project on using text mining to analyze query logs.
  • LiveMemories (2006-10), a large project on using text mining to support creation of shared knowledge funded by the Provincia di Trento.

Main publications