Text mining and its applications
Much of my research, and in particular my funded research, is concerned with the application of CL/NLP methods to practical problems, such as detecting deceptive language, using NLP to support human rights organizations, using NLP to support medical research, or managing big data in digital libraries or in social media. Many of these projects are collaborations with Udo Kruschwitz.
Deception detection using stylometric methods
Methods for discovering whether assertions are reliable or deceptive could have a variety of applications, e.g., in court, or to assess the reliability of online reviews. In collaboration with my former PhD student Tommaso Fornaciari I have studied the application of stylometric techniques--techniques that use statistics about the occurrence of function words in order to determine psychological traits of the author of a text--to evaluate the reliability of witness statements in court (Fornaciari and Poesio, 2013) and of Amazon reviews (Fornaciari and Poesio, 2014). In collaboration with another former PhD student, Fabio Celli, we have also explored the use of personality identification techniques for this task (Fornaciari et al, 2013).
This work has been supported by the development of two datasets: Deceptive Statements in Italian Courts (DECOUR), and Deception in REViews (DEREV), both of which are publicly available.
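As a rough illustration of what "statistics about the occurrence of function words" can look like in practice, the following minimal sketch computes the relative frequency of a few function words in a text. It is not the feature set used in the papers cited above: the word list `FUNCTION_WORDS`, the tokenizer, and the function name `function_word_profile` are all assumptions made for this example, and the example is in English rather than Italian.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words.
# A real stylometric study would use a much larger, language-specific list.
FUNCTION_WORDS = (
    "i", "you", "he", "she", "we", "they",
    "the", "a", "an", "of", "in", "on", "to",
    "and", "but", "not", "no",
)


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    # Naive tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


profile = function_word_profile("I did not see him, and I was not in the room.")
```

A vector of this kind, computed per statement or per review, could then be passed to any standard classifier; which features and classifiers actually work for deception detection is an empirical question addressed in the cited papers.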
Social media mining and NLP support for human rights work
Social media are a very rich source of information of different kinds. Many of my current text mining projects are concerned with extracting information from social media.
An area in which we have been particularly active is the use of social media mining in support of the work of human rights organizations. In a KTP project with Minority Rights Group we contributed to developing the Ceasefire Iraq platform to support grassroots reporting on human rights abuse. Part of this work was the development of an Arabic social media monitoring system that identifies possible reports of human rights abuse on Twitter (Alhelbawy et al, 2016; Alhelbawy et al, submitted).
This work has continued through the Human Rights, Big Data and Technology Project. In this project we are collaborating with the UN High Commissioner for Refugees (UNHCR) to develop methods for predicting refugee crises using a combination of NLP and computer vision methods.
Medical applications of NLP and early diagnosis of mental health issues
The use of NLP methods to support medical research in general and for early diagnosis of mental health issues in particular is one of the main areas of research of the Cognitive Science Research Group at EECS.
My own research on these topics started several years ago with work in collaboration with the University of Essex Department of Biology on information extraction from medical text. With my then PhD student Olivia Sanchez-Graillet I investigated in particular the role of semantic interpretation and anaphoric interpretation in relation extraction in these domains. In (Sanchez-Graillet and Poesio, 2007a) we studied the effect of negation detection on the extraction of protein-protein interactions. In (Sanchez-Graillet and Poesio, 2007b) we tried to identify conflicting claims about protein-protein interactions in the literature. Earlier, in (Sanchez-Graillet et al, 2006), we looked at the effect of anaphoric interpretation on the task. We also explored the issue of deidentification / anonymization, in collaboration with the University of Essex's Data Archive (Poesio et al, 2006).
More recently, I started to work in the area of mental health diagnosis, focusing again on using semantic and discourse information (in particular, animacy detection) for this purpose. A number of studies have indicated that Alzheimer's patients' language becomes progressively more concrete, whereas depressed patients' language becomes more abstract. My PhD student Kevin Glover developed the Genitive Ratio, an approach to assessing the degree of abstractness or concreteness of a text (Glover, 2017). Evaluations of this method on texts produced by patients diagnosed as having those illnesses suggest that the GR might be successful at monitoring the prognosis of both illnesses, facilitating timely clinical interventions.
Our group is part of a consortium that was recently awarded a Wellcome Trust 4yr PhD Programme in Health Data in Practice.
Text mining in the Digital Humanities
Another rich source of textual data is represented by digital libraries. In the GALATEAS project we applied information extraction techniques to analyze query logs. In an ongoing collaboration with the Bagolini Archaeological Lab from the University of Trento, we have been developing NER techniques to facilitate the upload and search of scholarly articles in Archaeology. More recently, we have focused in particular on the use of active learning methods to improve the quality of our mining methods.
Projects (in inverse chronological order)
- Human Rights in the Era of Big Data and Technology (2016-21), funded by ESRC. The premise of the project is that the use of big data may offer unprecedented opportunities to secure the fulfilment of human rights, but equally, its misuse may interfere with the enjoyment and protection of human rights.
- Improving Reporting of Human Right Abuses through Arabic Social Media (2014-17), a KTP collaboration between Essex and Minority Rights Group funded by Innovate UK.
- SENSEI (2013-16), an EU project on using discourse to summarize online conversations.
- GALATEAS (2009-12), an EU project on using text mining to analyze query logs.
- LiveMemories (2006-2010), a large project on using text mining to support creation of shared knowledge funded by the Provincia di Trento.
Main publications
- Ayman Alhelbawy, Mark Lattimer, Udo Kruschwitz, Chris Fox and Massimo Poesio, submitted. An NLP-Powered Human Rights Monitoring Platform.
- Kevin Glover, 2017. The Genitive Ratio and its Applications. PhD dissertation, University of Essex.
- Ayman Alhelbawy, Udo Kruschwitz and Massimo Poesio, 2016. Towards a corpus of violence acts in Arabic social media. Proc. of LREC.
- Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio, 2015. Combining Minimally Supervised Methods for Arabic Named Entity Recognition. Transactions of the ACL. (pdf)
- Ans Alghamdi, Francesca Bonin, Asif Ekbal, Sriparna Saha, Fabio Cavulli, Sara Tonelli, Massimo Poesio, and Udo Kruschwitz, 2014. Active Expert Learning for the Digital Humanities. In Proceedings of STRIX, Gothenburg, November.
- Tommaso Fornaciari and Massimo Poesio, 2014. Identifying fake Amazon reviews as learning from crowds. Proc. of EACL, Gothenburg, April.
- Tommaso Fornaciari, Fabio Celli, and Massimo Poesio, 2013. The Effect of Personality Type on Deceptive Communication Style. In Proc. of FORTAN, Uppsala, August.
- Deirdre Lungley, Massimo Poesio, Marco Trevisan, Maha Althobaiti and Vien Nguyen, 2013. GALATEAS D2W: A Multi-lingual Disambiguation to Wikipedia Web Service. In Proc. of ENRICH, Dublin, August.
- Tommaso Fornaciari and Massimo Poesio, 2013. Automatic deception detection in Italian court cases. Journal of AI and Law. 21(3), 303--340. (pdf)
- This paper was discussed in the Wall Street Journal.
- T.-Vien Nguyen and Massimo Poesio, 2012. Entity disambiguation and linking over queries using encyclopedic knowledge. In Proc. of the Sixth Workshop on Analytics for Noisy Unstructured Text Data (AND 2012, in conjunction with COLING 2012), Mumbai, India.
- Francesca Bonin, Fabio Cavulli, Massimo Poesio, and Egon W. Stemle, 2012. Annotating Archaeological Texts: An Example of Domain-Specific Annotation in the Humanities. In Proc. of the Sixth Linguistic Annotation Workshop (LAW) at ACL 2012, Jeju, Korea, pp. 134-138. (pdf)
- Tommaso Fornaciari and Massimo Poesio, 2012. DECOUR: A corpus of Deceptive Statements in Italian Courts. In Proc. of LREC, Istanbul.
- Tommaso Fornaciari and Massimo Poesio, 2012. On the Use of Homogenous Sets of Subjects in Deceptive Language Analysis. In Proc. of EACL Workshop on Computational Approaches to Deception Detection, Avignon. (pdf)
- Asif Ekbal, Francesca Bonin, Sriparna Saha, Egon Stemle, Eduard Barbu, Fabio Cavulli, Christian Girardi, and Massimo Poesio, 2011. Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation. Journal for Language Technology and Computational Linguistics, 26(20), 39–51.
- Massimo Poesio, Eduard Barbu, Francesca Bonin, Fabio Cavulli, Asif Ekbal, Christian Girardi, Francesco Nardelli, Sriparna Saha, and Egon W. Stemle, 2011. The humanities research portal: Human language technology meets humanities publication repositories. In Proceedings of Supporting Digital Humanities (SDH), Copenhagen.
- Josef Steinberger, Massimo Poesio, Mijail Kabadjov and Karel Jezek, 2007. Two uses of anaphora resolution in summarization. Information Processing and Management, v. 43, n. 6, 1663-1680. Special issue on Summarization (Donna Harman, ed.). (pdf of preliminary version)
- Olivia Sanchez-Graillet and Massimo Poesio, 2007a. Negation of protein-protein interactions. Bioinformatics, v. 23, n. 13, 424-432. (Full content of article in .html)
- Olivia Sanchez-Graillet and Massimo Poesio, 2007b. Discovering contradicting protein-protein interactions in text. In Proc. of BIONLP.
- Olivia Sanchez-Graillet, Massimo Poesio, Mijail A. Kabadjov, and Roman Tesar, 2006. What kind of problems do protein interactions raise for anaphora resolution? A preliminary analysis. In Proc. of SMBM.
- Massimo Poesio, Mijail A. Kabadjov, Philippe Goux, Udo Kruschwitz, Elizabeth Bishop and Louise Corti, 2006. An anaphora resolution-based anonymization module. In Proc. of LREC.
- Olivia Sanchez-Graillet and Massimo Poesio, 2004. Acquiring Bayesian Networks from text. In Proc. of LREC.