Nigel Collier

Professor of Natural Language Processing, University of Cambridge

Fellow, the Alan Turing Institute

News (5/10/2020): We're hiring an RA to work on NLP in the Automated Understanding and Alerting of Disease Outbreaks from Global News Media project (EPI-AI). Please see here. The deadline is 6/12/2020.

I have been working in NLP and AI for over 20 years. Before joining the University of Cambridge on an EPSRC Experienced Researcher Fellowship (2015-2020), I spent the early part of my career in Japan (1996-2012). I was a Toshiba Fellow, a postdoc at Tokyo University with Junichi Tsujii, and an Associate Professor at the newly formed National Institute of Informatics, where I led the NLP lab for 12 years before returning to the UK on a Marie Curie Research Fellowship. As an undergraduate I studied for a BSc in Computer Science at the University of Leeds (1992). I received an MSc in Machine Translation (1994) and a PhD in Computational Linguistics (1996) from the University of Manchester (UMIST) for my research on English-Japanese Lexical Transfer using a Hopfield Neural Network.

My work focuses on natural language processing and machine learning. My research interests are broadly in creating better models for natural language understanding as well as applications with the potential for tangible social impact, for example in the area of global health (see below for the BioCaster and EPI-AI projects) where I am a member of the WHO's Epidemic Intelligence from Open Sources initiative.

For a list of publications please see Google Scholar.

Prospective PhD students: I am always interested in supervising new NLP projects on the PhD in Computation, Cognition and Language. Before contacting me, please make sure that you meet the minimum requirements and take time to check out my publications. In your email, please send a CV with a brief statement of research interests. Please note the application deadline and the documents you need to submit with your application.

Enquiries for postdoctoral opportunities are always welcome. When needed I can help explore funding sources for fellowships from UK, EU and other agencies.

Over the years many people have contributed to the research and publications in my lab. Here's a list of current students, postdocs and alumni.

Current projects

2020.2 to 2023.8 ESRC EPI-AI: Automated Understanding and Alerting of Disease Outbreaks from Global News Media (with Professor David Buckeridge and Dr Nick King, McGill University). The EPI-AI project aims to achieve a step change in automated global epidemic alerting using news media monitoring. Teams at McGill and Cambridge universities, in collaboration with national and international public health agencies, are adopting an interdisciplinary approach that combines natural language processing, epidemiology, biomedical informatics and bioethics to address this complex task.

2020.2 to 2024.1 HDR UK Text Analytics Resource (with a consortium of researchers led by Professor Richard Dobson and Dr Angus Roberts, King's College London).

2020.4 to 2022.3 Alan Turing Institute: Interpretable and Explainable Deep Learning for Natural Language Understanding and Commonsense Reasoning (with Professor Thomas Lukasiewicz, University of Oxford).

2017.9 to 2021.9 Mapping of Rumours and Information Diffusion (with Dr Chryssi Giannitsarou and Dr Flavio Toxvaerd, University of Cambridge).

Selected publications on NLP for epidemic detection and mapping

    • Collier, N., Doan, S., Kawazoe, A., Goodwin, R. M., Conway, M., Tateno, Y., ... & Shigematsu, M. (2008). BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24), 2940-2941. [pdf]

    • Hay, S. I., Battle, K. E., Pigott, D. M., Smith, D. L., Moyes, C. L., Bhatt, S., ... & Gething, P. W. (2013). Global mapping of infectious disease. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1614), 20120250. [pdf]

    • Collier, N., Son, N. T., & Nguyen, N. M. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. Journal of Biomedical Semantics, 2(5), S9. [pdf]

    • Kawazoe, A., Jin, L., Shigematsu, M., Barrero, R., Taniguchi, K., & Collier, N. (2006). The Development of a Schema for the Annotation of Terms in the BioCaster Disease Detecting/Tracking System. In KR-MED. [pdf]

    • Collier, N., Goodwin, R. M., McCrae, J., Doan, S., Kawazoe, A., Conway, M., ... & Dien, D. (2010). An ontology-driven system for detecting global health events. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 215-222). Association for Computational Linguistics. [pdf]

Recently completed projects

2015.2 to 2020.2 EPSRC SIPHS (EP/M005089/1): I was funded by a 1.2 million 5-year EPSRC fellowship to carry out the Semantic Interpretation of Personal Health messages on the Web (SIPHS) project. This international collaborative effort leveraged social media data for digital disease applications such as detecting infectious disease outbreaks and adverse drug reactions.

2015.10 to 2018.10 MRC PheneBank (MR/M025160/1): This project aimed to develop a new method for the identification and harmonisation of human phenotypes from the scientific literature as well as their associations to entities of interest such as diseases, genes and other phenotypes.

Selected recent publications

    1. Conforti, C., Berndt, J., Pilehvar, M. T., Giannitsarou, C., Toxvaerd, F., & Collier, N. (2020). Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). [pdf]

    2. Basaldella, M., & Collier, N. (2019). BioReddit: Word Embeddings for User-Generated Biomedical NLP. In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019). [pdf]

    3. Gritta, M., Pilehvar, M. T., & Collier, N. (2019). A pragmatic guide to geoparsing evaluation. Language Resources and Evaluation, 1-30. [pdf]

    4. Prokhorov, V., Pilehvar, M. T., Kartsaklis, D., Lio, P., & Collier, N. (2019). Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 6900-6907). [pdf]

    5. Le, H. Q., Can, D. C., Ha, Q. T., & Collier, N. (2019). A Richer-but-Smarter Shortest Dependency Path with Attentive Augmentation for Relation Extraction. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (Vol. 1, pp. 2902-2912). Association for Computational Linguistics. [pdf]

    6. Prokhorov, V., Pilehvar, M. T., & Collier, N. (2019). Generating Knowledge Graph Paths from Textual Definitions using Sequence-to-Sequence Models. In Proceedings of NAACL-HLT (pp. 1968-1976). [pdf]

    7. Kartsaklis, D., Pilehvar, M. T., & Collier, N. (2018). Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), Brussels, Belgium (pp. 1959-1970). [pdf]


Nigel Collier, Professor of Natural Language Processing

The Language Technology Lab, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, 9 West Road, Cambridge CB3 9DB, United Kingdom

Tel: +44 (0)1223-760373

Email: nhc30 [AT] cam dot ac dot uk

Office: Room TR-23, English Faculty Building

ORCID ID: 0000-0002-7230-4164

Follow on Twitter | Slideshare | LinkedIn