Semantic Interpretation of Personal Health MessageS

The SIPHS project ran from 2015 to 2020. You can find an up-to-date project page here.

About

Open online data such as microblogs and discussion board messages have the potential to be an incredibly valuable source of information about health in populations. Going beyond simple keyword search and harnessing this data for public health represents both an opportunity and a challenge to natural language processing (NLP). SIPHS aims to help health experts leverage social media for their own clinical and scientific studies through automatic techniques that encode messages according to a machine-understandable semantic representation.

At the technological level, SIPHS seeks to pioneer new methods for NLP and machine learning (ML). Social media remains a challenging area for NLP for a variety of reasons: short de-contextualised messages, high levels of ambiguity and out-of-vocabulary words, use of slang and an evolving vocabulary, as well as inherent bias towards sensational topics. The fellowship seeks to harness the progress made so far in NLP for social media analysis in the commercial domain and develop it further to provide meaningful public health evidence. One key aspect not previously addressed is the clinical coding of patient messages. Although knowledge brokering systems exist for clinical and scientific texts (e.g. MetaMap), their performance on social media messages has been poor. SIPHS aims to utilise the rich availability of ontological resources in biomedicine together with ML on annotated message data to disambiguate informal language. Research will also aim to understand the communicative function of messages, for example whether the message reports direct experience or is related to news, humour or marketing. If these problems are successfully overcome, an important barrier to data integration with other types of clinical data will be removed.

Publications

  • Gritta, M., Pilehvar, M. T., Limsopatham, N. and Collier, N. (2017), "Vancouver Welcomes You! Minimalist Location Metonymy Resolution", in Proceedings of the Association for Computational Linguistics Annual Meeting (ACL 2017), Vancouver, Canada, August (in press). Outstanding paper award.
  • WSDM Workshop on Mining Online Health Reports (2017), Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 825-826 [html]
  • Limsopatham, N. and Collier, N. (2016), “Bidirectional LSTM for Named Entity Recognition in Shared Task”, in Proceedings 2nd Workshop on Noisy User-generated Text at COLING 2016, Osaka, Japan, pp. 145-152. [pdf]
  • Limsopatham, N. and Collier, N. (2016), “Learning orthographic features in bi-directional LSTM for biomedical named entity recognition”, in Proceedings 5th Workshop on Building and Evaluating Resources for Biomedical Text Mining at COLING 2016, Osaka, Japan, pp. 10-19. [pdf]
  • Limsopatham, N. and Collier, N. (2016), “Normalising medical concepts in social media texts by learning semantic representation”, in Proceedings of the Association for Computational Linguistics Annual Meeting (ACL 2016), Berlin, Germany, August 1st to 7th [pdf][data].
  • Limsopatham, N. and Collier, N. (2016), “Modelling the combination of generic and target domain embeddings in a convolution neural network for sentence classification”, in Proc. BioNLP 2016 at the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, August 12th to 13th [pdf].
  • Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalize medical terms in social media messages”, in Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal. [pdf][data]
  • Alvaro, N., Conway, M., Doan, S., Lofi, C., Overington, J., & Collier, N. (2015). “Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use”, Journal of biomedical informatics, 58, 280-287. [html]

Data Sets / Software

  • Data sets for Limsopatham, N. and Collier, N. (2016), “Normalising medical concepts in social media texts by learning semantic representation” [zenodo]
  • Data sets for Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalize medical terms in social media messages” [zenodo]

Outreach

  • Invited talk to the International Society of Pharmacovigilance (ISoP 2017) at their tutorial on Pharmacovigilance and the Social Media [html] [slides]
  • WSDM Workshop on Mining Online Health Reports at the Tenth ACM International Conference on Web Search and Data Mining (February 2017)
  • Invited Talk at the National Institute of Informatics, Japan (March 2017)
  • Invited Talk at the UK-Korea Healthcare Policy and Technology Relationship in New Fields of Interest, Seoul National University Hospital, Korea (March 2017)
  • Invited Talk at the Cambridge Language Sciences Annual Symposium 2017 (November 2017) [video]
  • Invited Talk at the Seventh International Workshop on Health Text Mining and Information Analysis (LOUHI), USA (November 2016) [html]
  • Seminar at the Cambridge University Festival of Ideas (October 2016)
  • Invited Talk at the Cambridge University Linguistics Society (March 2016) [html]
  • Seminar at the University of Warwick (October 2015) [slides]
  • Invited Talk at the Social Science and Big Data Methods Workshop, Cambridge (September 2015) [html]
  • Invited Talk at the Centre for Science and Policy, 8th Policy Leader's Fellowship Meeting, Cambridge (June 2015)
  • Invited Talk at the Public Health at Cambridge Network Showcase, Cambridge (June 2015) [html][video]

Collaborators

  • Dr. Dietrich Rebholz-Schuhmann (INSIGHT, National University of Ireland at Galway)
  • Prof. Wendy Chapman (University of Utah)
  • Dr. Mike Conway (University of Utah)
  • Prof. Ingemar Cox (UCL, EPSRC IRC)
  • Prof. Nigel Lightfoot (CORDS network)
  • Dr. David Milward (Linguamatics)
  • Prof. Peter Murray-Rust (University of Cambridge)
  • Dr. Richard Pebody (Public Health England)

Funding

SIPHS is funded by an EPSRC Experienced Researcher Fellowship (EP/M005089/1).