Supervisors: Prof. Frank Barry and Dr. Helena F. Deus (NUIG), Prof. Walter Kolch and Prof. Boris Kholodenko (UCD)
Many cell processes, such as proliferation and differentiation, are controlled by signalling cascades, i.e. chains of proteins that relay signals from the cell surface to the nucleus, where they ultimately regulate gene transcription. One of the most important signalling cascades in carcinogenesis is the ERK/MAPK pathway; mutations in the genes encoding the proteins of this pathway are believed to be capable of turning normal cells into cancer cells. Recent research suggests that such signal transduction pathways are organised as communication networks, in which information is processed and integrated through relay stations formed by multi-protein complexes [1]. Identifying the proteins involved in these signalling cascades, and understanding how they interact to produce a chain of events, is therefore a crucial step towards devising drugs that restore normal activity in the cell.
Computational methods have become a popular means of predicting potential protein-protein interactions, based on 3D protein docking, domain-domain interactions or co-evolution models. The accuracy and predictive power of such models rely heavily on the amount and quality of the integrated information used as input [2]. The current state of the art in building these models relies on ad hoc integration of the relevant information, e.g. sequence and structure data, and every additional layer of information must be extracted, transformed and integrated separately before it can be used as input. Alternatively, Linked Data can serve as an integrative technology. It rests on the simple idea that the relationships between entities such as proteins can be represented as a network, in which each entity is a node and each relationship to another entity, e.g. a drug or another protein, is an arc. Moreover, both the entities and the links between them can be dereferenced, i.e. their descriptions and associated properties can be retrieved automatically from the Web and used to create new layers of integrated information. Multiple studies have shown that these technologies are suitable for integrating proteomics and genomics experimental results [3-5].
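The node-and-arc model described above can be illustrated with subject-predicate-object triples, the data model underlying RDF and Linked Data. The sketch below is a minimal in-memory illustration under stated assumptions: the `http://example.org/` namespace, the predicates `interactsWith` and `targets`, and the drug `DrugX` are all hypothetical, and dereferencing (fetching each URI's description from the Web) is not shown. A real setup would use dereferenceable URIs from existing resources and an RDF store or SPARQL endpoint rather than Python tuples.

```python
from collections import defaultdict

# Hypothetical namespace; real Linked Data would use dereferenceable URIs.
EX = "http://example.org/"

# Each triple is (subject, predicate, object): subjects/objects are nodes,
# predicates label the arcs between them.
triples = [
    (EX + "RAF1", EX + "interactsWith", EX + "MAP2K1"),
    (EX + "MAP2K1", EX + "interactsWith", EX + "MAPK1"),
    (EX + "DrugX", EX + "targets", EX + "MAP2K1"),  # hypothetical drug-target link
]


def build_graph(triples):
    """Index triples by subject so each node's outgoing arcs can be looked up."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def neighbours(graph, node, predicate=None):
    """Return the objects linked from `node`, optionally filtered by predicate."""
    return [o for p, o in graph.get(node, []) if predicate is None or p == predicate]


graph = build_graph(triples)
print(neighbours(graph, EX + "MAP2K1"))
```

In this toy model, adding a further layer of information, for example drug-target links from another source, amounts to appending more triples to the same list, without changing how existing entities are stored.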
In this project, Linked Data technologies will be used to represent protein-protein interactions. The research will focus on identifying the types of relationship best suited to representing both the provenance of the interaction information (e.g. mass spectrometry, co-upregulation) and its probabilistic value, in order to create non-overlapping layers of information. Representing protein-protein interaction data in this format will enable the creation of mathematical constructs, e.g. adjacency matrices, that can be manipulated algebraically to identify the topology of the protein-protein interaction network. The advance beyond the state of the art will be the ability to enrich predictive models with layers of information added on demand, such as drug interactions and their effect on the network topology.
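As an illustration of the adjacency-matrix idea, the sketch below keeps one symmetric matrix per evidence type (one layer per provenance), with each entry holding an interaction probability, and then combines the layers. Everything in it is an assumption for illustration: the protein list, evidence labels and probabilities are hypothetical, the layers are combined with a noisy-OR rule (which treats the layers as independent evidence) chosen as a stand-in rather than a method defined in this proposal, and a drug effect is crudely modelled as removing every interaction of the targeted protein. Squaring the binarised matrix counts paths of length 2, a simple algebraic route to topological features such as shared interaction partners.

```python
# Hypothetical proteins and evidence records, for illustration only.
proteins = ["RAF1", "MAP2K1", "MAPK1", "KSR1"]

# Each record: (protein_a, protein_b, evidence type, probability of interaction).
evidence = [
    ("RAF1", "MAP2K1", "mass_spectrometry", 0.9),
    ("MAP2K1", "MAPK1", "mass_spectrometry", 0.8),
    ("MAP2K1", "MAPK1", "co_upregulation", 0.5),
    ("KSR1", "MAP2K1", "co_upregulation", 0.6),
]


def build_layers(proteins, evidence):
    """Build one symmetric adjacency matrix per evidence type (one layer per provenance)."""
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    layers = {}
    for a, b, source, prob in evidence:
        layer = layers.setdefault(source, [[0.0] * n for _ in range(n)])
        i, j = index[a], index[b]
        layer[i][j] = layer[j][i] = prob  # undirected interaction
    return layers


def combine_noisy_or(layers, n):
    """Assumed combination rule: P = 1 - prod_k (1 - p_k) over independent layers."""
    combined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            miss = 1.0
            for layer in layers.values():
                miss *= 1.0 - layer[i][j]
            combined[i][j] = 1.0 - miss
    return combined


def threshold(matrix, cutoff):
    """Binarise: 1 where the combined probability reaches the cutoff, else 0."""
    return [[1 if x >= cutoff else 0 for x in row] for row in matrix]


def matmul(a, b):
    """Plain matrix product; (A @ A)[i][j] counts paths of length 2 from i to j."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def knock_out(adj, k):
    """Hypothetical drug effect: remove every interaction of protein k."""
    return [[0 if k in (i, j) else x for j, x in enumerate(row)] for i, row in enumerate(adj)]


layers = build_layers(proteins, evidence)
combined = combine_noisy_or(layers, len(proteins))
adj = threshold(combined, 0.5)
degrees = [sum(row) for row in adj]  # number of interaction partners per protein
paths2 = matmul(adj, adj)            # off-diagonal: shared partners; diagonal: degree
print(dict(zip(proteins, degrees)))
print([sum(row) for row in knock_out(adj, proteins.index("MAP2K1"))])
```

With these hypothetical values MAP2K1 emerges as the hub (degree 3), and knocking it out leaves the other three proteins with no interactions. Keeping the layers separate until the combination step means a single evidence type can be dropped or reweighted without rebuilding the others.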
1. Kolch W: Coordinating ERK/MAPK signalling through scaffolds and inhibitors. Nature Reviews Molecular Cell Biology 2005, 6:827-837.
2. Wierling C, Herwig R, Lehrach H: Resources, standards and tools for systems biology. Briefings in Functional Genomics and Proteomics 2007, 6:240-251.
3. Anwar N, Hunt E: Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies. BMC Bioinformatics 2009, 10:S3.
4. Deus HF, Prud'hommeaux E, Zhao J, Marshall MS, Samwald M: Provenance of Microarray Experiments for a Better Understanding of Experiment Results. In ISWC 2010 SWPM. 2010.
5. Deus HF, Veiga DF, Freire PR, et al.: Exposing The Cancer Genome Atlas as a SPARQL endpoint. Journal of Biomedical Informatics 2010, 43:998-1008.