Nigel Collier - PhenoMiner

PhenoMiner

The PhenoMiner project ran from October 2012 to December 2014 at the European Bioinformatics Institute (EMBL-EBI) at the Wellcome Trust Genome Campus in Hinxton, Cambridge, UK.

About

Phenotypes play a key role in inferring the complex relationships between genes and heritable human diseases. PhenoMiner is a research project aimed at capturing and encoding the phenotypes described in the scientific literature. This should provide insights into the complex processes involved in human diseases and enable semantic interoperability with existing biomedical ontologies, such as those describing human anatomy, genetics and behaviour.

PhenoMiner is based on text- and data-mining technology: natural language processing, machine learning and conceptual analysis. It builds on insights from semantic parsing to extract structured information about phenotypes from whole sentences, rather than relying on string matching as many existing techniques do. The system exploits the wealth of data locked within the scientific literature, in repositories such as PubMed Central and Europe PMC, to extract the semantic vocabulary that scientists use to describe phenotypes. It is intended to provide scientists, clinicians and informaticians with the data and tools they need to gain new insights into Mendelian diseases.

Publications

- Collier, N., Tran, M. V., Le, H. Q., Oellrich, A., Kawazoe, A., Hall-May, M. and Rebholz-Schuhmann, D. (2012), "A hybrid approach to finding phenotype candidates in genetic texts", in Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, December 10-14.
- Collier, N., Oellrich, A. and Groza, T. (2013), "Toward knowledge support for analysis and interpretation of complex traits", Genome Biology 14(9):214. [html]
- Groza, T., Oellrich, A. and Collier, N. (2013), "Using silver and semi-gold standard corpora to compare open named entity recognisers", in 2013 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM 2013), pp. 481-485.
- Tran, M. V., Le, H. Q., Phi, V. T., Pham, T. B. and Collier, N. (2013), "Exploring a probabilistic Earley parser for event composition in biomedical texts", in Proceedings of the BioNLP workshop shared task at ACL 2013, Sofia, Bulgaria, pp. 130-134. [html]
- Collier, N., Tran, M., Le, H., Ha, Q., Oellrich, A. and Rebholz-Schuhmann, D. (2013), "Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking", PLoS One 8(10): e72965. [html] [pdf]
- Collier, N., Paster, F. and Tran, M. V. (2014), "The impact of near domain transfer on biomedical named entity recognition", in Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) at EACL, pp. 11-20. [pdf]
- Collier, N., Oellrich, A. and Groza, T. (2014), "Concept selection for phenotypes and disease-related annotations using support vector machines" in Proc. PhenoDay and Bio-Ontologies at ISMB 2014. [pdf]
- Collier, N., Kafkas, S., Kim, J. H. and McEntyre, J. (2015), "OMIM concept annotation: Steps towards automated tagging the disease literature using PhenoMiner phenotypes", Force 2015 Research Communications and e-Scholarship Conference, Oxford, 12-13 January. [pdf poster]
- Kafkas, S., Kim, J. H., McEntyre, J. and Collier, N. (2015), "Analysis of PhenoMiner phenotypes in the open access full text literature", Force 2015 Research Communications and e-Scholarship Conference, Oxford, 12-13 January. [pdf poster]
- Oellrich, A., Collier, N., Smedley, D. and Groza, T. (2015), "Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes", PLoS One 10(1): e0116040. [pdf]
- Groza, T., Köhler, S., Dölken, S., Collier, N., Oellrich, A., Smedley, D., Couto, F. M., Baynam, G., Zankl, A. and Robinson, P. N. (in press), "Automatic concept recognition in the Human Phenotype Ontology reference gold standard and test suites corpora", Database, OUP.
- Collier, N., Groza, T., Smedley, D., Robinson, P. N., Oellrich, A. and Rebholz-Schuhmann, D. (2015), "PhenoMiner: from text to a database of phenotypes associated with OMIM disorders", under submission for Database, OUP.
- Collier, N., Oellrich, A. and Groza, T. (2015), "Concept selection for phenotypes and diseases using learn to rank", under submission for the Journal of Biomedical Semantics.

Data Sets

PhenoMiner database demonstration
Phenotype entities in Medline for auto-immune and cardiovascular diseases in CSV format. Please cite:

Collier, N., Paster, F. and Tran, M. V. (2014), "The impact of near domain transfer on biomedical named entity recognition", in Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi) at EACL, pp. 11-20. [pdf]

PhenoMiner database in XML format at Zenodo (DOI: 10.5281/zenodo.12493) or at GitHub.
EuropePMC external links to OMIM inferred from PhenoMiner data (select an article and then select the External Links tab).

Outreach

- List of text mining tools
- Symposium at the University of East Anglia (April 2013)
- Colloquium held at the University of Zurich (Oct. 2013)
- PhenoDay 2014, a joint workshop held with the Bio-Ontologies and Bio-Link Special Interest Groups at ISMB 2014.
- Seminar at the University of Sheffield (September 2014); slides available.
- Seminar at the University of Manchester (November 2014)
- Seminar at the French National Institute for Agricultural Research (INRA), Paris (December 2014)
- Biomedical Linked Annotation Hackathon (BLAH) at DBCLS (February 2015)
- Special issue of PhenoDay to appear in BMC Bioinformatics (Spring 2015)
- Tutorial on biomedical text mining at TGAC (Spring 2015)
- Follow PhenoMiner-related updates on Twitter using the hashtag #PhenoMiner.

Collaborators

Dietrich Rebholz-Schuhmann (University of Zurich)
Damian Smedley (Wellcome Trust Sanger Institute)
Anika Oellrich (Wellcome Trust Sanger Institute)
Tudor Groza (University of Queensland)
Peter Robinson (Charité Universitätsmedizin Berlin)
Vu Tran Mai (Vietnam National University, Hanoi)
Jo McEntyre (EMBL-EBI)

Funding

PhenoMiner is funded by an FP7 Marie Curie Fellowship (grant 301806).
