Roles and responsibilities

Welcome! I am Director of Research in Computational Linguistics at the Department of Theoretical and Applied Linguistics (DTAL) at the University of Cambridge and co-founder of the Language Technology Laboratory. I am also a member of the EPSRC Peer Review College (2015-present), an elected member of the faculty board in Modern and Medieval Languages (2016-2018) and a study steering committee member on the NIHR DEPEND project (2015-2018). I am a member of the Cambridge Centre for Science and Policy (CSaP), Cambridge Big Data, Cambridge Health, Medicine and Society as well as Cambridge Language Sciences.

Research Interests
  • Computational linguistics
  • Information extraction, Web mining and knowledge discovery
  • Lexical semantics and ontologies
  • Machine learning approaches for NLP, including representation learning of concepts
  • Natural language processing for public health, clinical and biomedical applications

1/2017 New funded PhD position on NLP for measuring the veracity of rumours (project 'Rumour Mill'): deadline March 13th. Contact me early to discuss if interested. The position is now also online here.

11/2016 Prospective PhD and MPhil students: please see below.

I am funded by a 5-year, 1.2 million EPSRC fellowship (EP/M005089/1) for the Semantic Interpretation of Personal Health messages on the Web (SIPHS) project. This is an international collaborative effort to leverage social media data for digital disease applications such as detecting infectious disease outbreaks and adverse drug reactions.

MRC PheneBank (MR/M025160/1): I am PI on the PheneBank project. This project seeks to develop a new method for identifying and harmonising human phenotypes from the scientific literature, together with their associations to entities of interest such as diseases, genes and other phenotypes.

Recent publications
  1. Pilehvar, M. T. and Collier, N. (2016), “De-Conflated Semantic Representations”, arXiv preprint arXiv:1608.01961. To appear in EMNLP 2016 in November. Download pdf.
  2. Limsopatham, N. and Collier, N. (2016), “Normalising medical concepts in social media texts by learning semantic representation”, in Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, August 1-7. Download pdf.
  3. Le, H.Q., Tran, M.V., Dang, T.H., Ha, Q.T. and Collier, N. (2016), “Sieve-based coreference resolution enhances semi-supervised learning model for chemical-induced disease relation extraction”, in Database, Oxford University Press, vol. 2016: article ID baw102; DOI: 10.1093/database/baw102. Download pdf.
  4. Pilehvar, M. T. and Collier, N. (2016), “Improved Semantic Representation for Domain-Specific Entities”, in Proc. BioNLP 2016 at the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, August 12-13.  Download pdf.
  5. Limsopatham, N. and Collier, N. (2016), “Modelling the combination of generic and target domain embeddings in a convolutional neural network for sentence classification”, in Proc. BioNLP 2016 at the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin, Germany, August 12-13. Download pdf.
For a snapshot of my publications, including the most cited, please follow the link to Google Scholar. They are also available on the LTL publications page.


I have been actively involved in teaching throughout my career and have taught a range of computational linguistics units, both at the Department of Informatics at Sokendai and at the University of Cambridge, where I have given frequent guest lectures. I currently teach on the Biomedical Information Processing course at the Computer Lab during the Lent term.

Prospective PhD students

I am delighted to consider PhD project proposals from students with a strong background in computing, linguistics or AI. I do, however, receive a steady stream of such enquiries, so to save time I ask that in your initial message you (a) provide a brief overview of your project idea and, importantly, how it relates to my research interests, and (b) include an up-to-date CV with overall course grades. If you wish to apply for the PhD course starting in October 2017, please contact me by October/November 2016. You might find it useful to see some project ideas here as starting points.

MPhil students

MPhil students on the ACS course: please contact me about project proposals on Biomedical Information Processing. You can find two project ideas here.

Present activities

I am currently:
I recently:

Other background

Prior to joining the University of Cambridge I was an FP7 Marie Curie fellow on the PhenoMiner project at EMBL-EBI (2012-2014) and Associate Professor at the National Institute of Informatics in Tokyo, where I led the Natural Language Processing laboratory. From 2007 to 2012 I served as a technology advisor on the international Global Health Security Action Group technical working group on Risk Management and Communication. I obtained my PhD in computational linguistics in 1996 at UMIST (now the University of Manchester) for my research into the application of neural networks to machine translation.

I am a senior member of the Association for Computing Machinery (1996 - present) and a member of the Association for Computational Linguistics (1996 - present).