Authors: Nicholette Ivezaj and Jennifer Howard
Date: December 13, 2016
Objectives:
- To identify the gene for a protein purified from HeLa cells using a partial amino acid sequence.
- To explore gene structure and alternate forms of the gene as well as the tissues in which it is expressed.
- To identify the structure and function, in the form of biochemical pathways, of the gene and the protein for which it codes.
- To discover how the protein is implicated in disease.
Introduction/Background:
Although bioinformatics is still a young field, its techniques have become inseparable from the study of molecular biology, which now relies heavily on them. The term 'bioinformatics' was introduced by Paulien Hogeweg and Ben Hesper in the 1970s to refer to "the study of informatic processes in biotic systems" (4). A main goal of molecular biology has been to learn how living things process, store, and use their information. The process through which a living thing expresses a gene is itself a form of information processing, as it is unidirectional and quantifiable. Because no human being can sift through bodies of information as vast as the genetic code, computational analysis and databases became the most practical way to study it.
Bioinformatics has made it possible to share scientific information on an unprecedented scale. As computers grew in influence and popularity, so too did the number of users of, and contributors to, biological databases. In the past, databases were separate and each held only a narrow range of information; now they overlap, link to one another, and update automatically as new information is submitted elsewhere. This makes it much easier to analyze new protein or nucleic acid sequences, or sequences that have recently gained biological importance. Bioinformatics has also been useful for testing molecular interactions with computational models rather than by trial and error in a wet lab. This has helped advance the development of therapies and medicines, because protein structures can be visualized and matched against candidate drugs or other proteins, allowing scientists to predict whether they are likely to interact before testing them experimentally.
For our project, the availability of diverse databases hastened our exploration of the B-Raf proto-oncogene, serine/threonine kinase (BRAF) gene in Homo sapiens. BRAF is a member of the Raf family of proteins and is involved in regulating the MAPK/ERK pathway, which stands for the mitogen-activated protein kinase/extracellular signal-regulated kinase pathway. This pathway plays a crucial role in cell division, differentiation, and secretion. The BRAF gene is thus very important because mutations in it can result in problems with cell division, often leading to cancer. This project contributes to research on conditions related to this gene by giving researchers a single website that gathers much of BRAF's most important information, with links to the original sources should they wish to explore further. We hope this will streamline the efforts of scientists and help, even if only in a small way, in the search for new treatments for conditions caused by BRAF mutations.
Methods:
- Our partial amino acid sequence was entered into NCBI BLAST, which identified it as B-Raf proto-oncogene, serine/threonine kinase in Homo sapiens. Our protein matched the entry with 100% query coverage and a total score of 1531 out of a maximum score of 1531. This page provided us with the protein accession number (8) for our gene. Following the link for this gene took us to its NCBI page (2), from which we obtained the mRNA (5) and genomic (6) accession numbers in addition to the protein accession number. This page also listed our gene's synonyms, function, and pathways.
- To explore isoforms of our gene, we searched Ensembl, which provided the locations of the exons and introns and indicated whether each isoform was protein coding; if it was not, Ensembl explained what error had occurred. Expression locations for the gene and its isoforms were not available on Ensembl (10, 11, 12), and so were found by searching for BRAF on AceView (7).
- To gain an overall understanding of the structure and function of Braf, we used its AceView profile (7), which provided general information about its domains. The secondary structure of Braf was determined from the Braf entry on UniProt (26), which also provided information about other significant residues. 3D structural representations of Braf were found using PDB (23), which provided the published structures of the different Braf domains; these could then be manipulated and explored interactively using PyMOL software.
- AceView also provided links to more detailed pathway diagrams for the mechanisms in which Braf is involved.
- OMIM (21) was used to generate a list of diseases associated with Braf mutations, and also provided clinical features and information about the diseases. The Atlas of Genetics and Cytogenetics in Oncology and Hematology (15) provided further information about the Braf mutations correlated with the different diseases.
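Two of the steps above reduce to simple interval arithmetic, which the following Python sketch illustrates. Query coverage can be thought of as the percentage of query residues that fall inside at least one aligned segment, and intron locations can be derived as the gaps between consecutive exons. This is only a minimal sketch under those assumptions, not how BLAST or Ensembl compute these values internally. The function names and all coordinates in the usage lines are hypothetical; this report does not state the length of the partial sequence or the BRAF exon coordinates.

```python
from typing import List, Tuple

# 1-based, inclusive (start, end) coordinates, as used by most genome browsers.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals into a sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def query_coverage(query_length: int, aligned: List[Interval]) -> float:
    """Percent of query positions covered by at least one aligned segment."""
    covered = sum(end - start + 1 for start, end in merge_intervals(aligned))
    return 100.0 * covered / query_length


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of a single transcript."""
    merged = merge_intervals(exons)
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(merged, merged[1:])
    ]


# Hypothetical toy values, NOT taken from the BRAF entries cited above.
print(query_coverage(300, [(1, 300)]))             # 100.0
print(query_coverage(200, [(1, 100), (51, 150)]))  # 75.0
print(introns_from_exons([(100, 200), (301, 400), (501, 650)]))
# [(201, 300), (401, 500)]
```

A match with 100% query coverage, like the one reported in the first step, corresponds to the first usage line: every position of the query lies inside an aligned segment.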
Tools:
- NCBI (2) is a database where nucleic acid or amino acid sequences can be searched in order to identify the gene from which they originated.
- Ensembl (1) is a vertebrate genome browser which can be used for comparative genomics and to predict function.
- AceView (9) is a database containing all public mRNA sequences and experimental cDNAs from which alternative transcripts can be found.
- OMIM (21): The Online Mendelian Inheritance in Man is a database that includes detailed summaries, pathways, and genetic information about inherited diseases and disorders.
- PDB (23): The Protein Databank provides structures, summaries, and information about proteins.
- UniProt (26): UniProt is a protein database including structural information, functional information, pathways, and other information about proteins.
- PubMed: PubMed is a database of biochemical literature that allows searching for publications relevant to topics in biochemical research.
- Atlas of Genetics and Cytogenetics in Oncology and Hematology (15): A database about cancer related genes and their clinical features and research about cancer related
- PyMOL: Molecular graphics software that allows publicly available PDB structure files to be viewed, manipulated, and highlighted interactively.
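To give a sense of what the PDB files opened in PyMOL contain, the Python sketch below reads the fixed-width ATOM records of the legacy PDB text format and counts the residues in each chain. It is a minimal illustration rather than a full parser, and it is not part of the workflow described above. The names `Atom`, `parse_atoms`, `residues_per_chain`, and `TOY_PDB` are hypothetical, and the four records in `TOY_PDB` are made-up coordinates, not data from any published Braf structure.

```python
from typing import Dict, List, NamedTuple, Set


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the PDB fixed-column layout."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38: x coordinate
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
    )


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Collect every ATOM/HETATM record found in the text of a PDB file."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


def residues_per_chain(atoms: List[Atom]) -> Dict[str, int]:
    """Count the distinct residue numbers seen in each chain."""
    seen: Dict[str, Set[int]] = {}
    for atom in atoms:
        seen.setdefault(atom.chain, set()).add(atom.res_seq)
    return {chain: len(residues) for chain, residues in seen.items()}


# Made-up toy records for illustration only (not a real structure).
TOY_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
ATOM      3  N   ALA A   2      24.100  25.900   3.500
ATOM      4  CA  ALA B   1      10.000  11.000  12.000
"""

atoms = parse_atoms(TOY_PDB)
print(residues_per_chain(atoms))  # {'A': 2, 'B': 1}
```

For real structures, an established parser or PyMOL itself is the safer choice, since actual files include alternate locations, insertion codes, and multiple models that this sketch ignores.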