Basic Local Alignment Search Tool (BLAST) : Bioinformatics tool that finds regions of similarity between sequences to locate possible matches among genes, nucleic acid sequences, and proteins (31)
Ensembl : Genomic database that provides information on eukaryotic genomes, ranging from genomic sequence to protein structure (8,9)
U.S. National Library of Medicine : Provides up-to-date scientific journal articles on experiments and research across many scientific topics. It also hosts a variety of helpful databases, including those found on NCBI (20,22-27)
GenBank : A sequence database maintained by the NIH (National Institutes of Health) that contains all publicly available DNA sequences, which can be compared to identify unknown sequences and their relatives (4,6)
NCBI AceView : A database of mRNA transcript sequence information that includes details such as variants, domains/motifs, conserved sequences, and validated poly-A signals (32)
NCBI Molecular Modeling Database : Database hosted by NCBI that supplies 3D structures of DNA, RNA, and proteins derived from Protein Data Bank entries (16,19)
NCBI Protein : NCBI database of protein sequences, including those translated from coding nucleotide sequences, along with information on polypeptide and protein structure (5,7)
Online Mendelian Inheritance in Man (OMIM) : A database available through NCBI that describes human genes and Mendelian genetic disorders along with their phenotypes, covering a majority of human genes
Protein Data Bank : Database of experimentally determined 3D protein structures, along with information on isoforms and sequence features (16)
Science Direct : A source of peer-reviewed scientific articles as well as textbook chapters relating to biology and other scientific fields (12)
UniProt : A database that organizes protein entries and provides their structure, sequence, variants, and domains (14,15)
We were first given a partial amino acid sequence by Dr. Bagga of Ramapo College of New Jersey:
"YKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGAC"
We ran this sequence through NCBI BLAST, which returned similar sequences along with the percentage identity between each one and our query. This identified our protein as Bcl-2, which is encoded by the BCL2 gene (Entrez Gene ID 596). The gene record gave further information on which genomic sequence our protein is derived from and which isoforms exist. From it we determined that the protein is derived from the genomic sequence NG_009361.1 (2), which encodes two mRNA transcript variants: isoform alpha (NM_000633.2) (4) and isoform beta (NM_000657.2) (6). The genomic sequence page provided extensive information on the location of our gene, its size, and features such as possible exons and coding sequences.
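To make the BLAST similarity figure concrete, the sketch below computes percent identity from a pair of already-aligned sequences, counting identical columns over the full alignment length, with gap columns (written as '-') counted as non-identical. This only illustrates the metric under that assumption; it is not BLAST's own implementation, which also finds the alignment and scores it with a substitution matrix such as BLOSUM62. The function name percent_identity and the example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over a pairwise alignment: identical columns / all columns.

    Assumes both strings come from the same alignment (equal length, '-' for gaps).
    Gap columns are included in the denominator, so they lower the identity.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a shared gap is not an identity
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Hypothetical hit: the first 14 residues of our query with one substitution (E -> D).
    print(round(percent_identity("YKLSQRGYEWDAGD", "YKLSQRGYDWDAGD"), 1))  # → 92.9
```

A single mismatch in 14 aligned residues therefore reads as roughly 92.9% identity; a hit matching the whole query exactly would read as 100%.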
Pre-mRNA Transcript Variants
From the NCBI genomic sequence page we found the two existing isoforms, alpha and beta, each encoded by a slightly different sequence with slightly different features. We used this page to determine the structure of each pre-mRNA from the information it gives on size, location, coding sequence, exon locations, poly-A sites, and poly-A signals. Ensembl (8,9) and AceView (32) supplemented the limited information available on the poly-A signals and exon locations of each isoform. From the exon lengths and locations and the coding sequence coordinates, we determined the 5' and 3' UTRs: exonic sequence upstream of the start codon forms the 5' UTR, and exonic sequence downstream of the stop codon forms the 3' UTR. NCBI also described how the transcript structure differs between the two forms: the beta isoform has an altered 3' UTR and a distinct C-terminus. Each nucleotide page also listed the protein encoded by that transcript variant.
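Determining the UTRs from exon and coding sequence locations is interval arithmetic. As a minimal sketch, assuming exon and CDS coordinates are 1-based, inclusive, and given on the gene's own strand, the 5' UTR length is the number of exonic bases before the first coding base and the 3' UTR length is the number of exonic bases after the last coding base. The function name utr_lengths and the coordinates used below are hypothetical, not the actual BCL2 values.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def utr_lengths(exons: List[Interval], cds_start: int, cds_end: int) -> Tuple[int, int]:
    """Return (5' UTR length, 3' UTR length) in nucleotides.

    Assumes all coordinates are on the gene's own strand, and that
    cds_start..cds_end runs from the first base of the start codon
    to the last base of the stop codon.
    """
    five_prime = 0
    three_prime = 0
    for start, end in sorted(exons):
        # Exonic bases lying before the first coding base belong to the 5' UTR.
        if start < cds_start:
            five_prime += min(end, cds_start - 1) - start + 1
        # Exonic bases lying after the last coding base belong to the 3' UTR.
        if end > cds_end:
            three_prime += end - max(start, cds_end + 1) + 1
    return five_prime, three_prime


if __name__ == "__main__":
    # Hypothetical three-exon transcript; introns are 101-200 and 401-600.
    exons = [(1, 100), (201, 400), (601, 800)]
    print(utr_lengths(exons, cds_start=51, cds_end=701))  # → (50, 99)
```

In this hypothetical layout the 5' UTR is the first 50 bases of exon 1 and the 3' UTR is the last 99 bases of exon 3; the coding exonic bases in between (50 + 200 + 101 = 351) form a whole number of codons.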
Protein
Using the NCBI nucleotide sequence pages (8,9), we identified the proteins encoded by these transcripts: alpha (NP_000624.2) (5) and beta (NP_000648.2) (7). These NCBI pages provided good basic information on the structure of our proteins, including amino acid length and domain features, but did not explain in detail how the differences between the isoforms affect function and 3D conformation. We used the Protein Data Bank (16) to obtain 3D models of our proteins and information such as the number of alpha helices and hydrophobic regions. UniProt (14,15) gave us the subcellular localization, information on specific binding domains, and the other Bcl-2 family proteins that Bcl-2 interacts with. We then used the U.S. National Library of Medicine databases (20,22-27) and Science Direct (12) to gather information on how these binding domains interact with each other and with other proteins to form Bcl-2's structure and complexes. These journal articles also described the known or predicted functions of the complexes.
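Hydrophobic regions can also be approximated directly from sequence. The sketch below swaps in a different technique from the database lookup described above: a Kyte-Doolittle hydropathy sliding window, which averages per-residue hydropathy values over a window and merges windows whose average exceeds a threshold. The window of 19 residues and threshold of 1.6 are assumptions (a commonly used setting for spotting candidate membrane-spanning stretches), and the function names are hypothetical.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy value for each of the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Average hydropathy of each full window; entry i covers seq[i:i + window].

    Raises KeyError for letters outside the 20 standard one-letter codes.
    """
    scores = [KD[aa] for aa in seq.upper()]
    if window <= 0 or window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def hydrophobic_regions(seq: str, window: int = 19, threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge above-threshold windows into 1-based, inclusive (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for i, avg in enumerate(hydropathy_profile(seq, window)):
        if avg <= threshold:
            continue
        start, end = i + 1, i + window
        if regions and start <= regions[-1][1] + 1:
            # Overlaps or touches the previous region, so extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions
```

Calling hydrophobic_regions on the partial sequence above would return any such candidate stretches, which could then be compared against the Protein Data Bank and UniProt annotations. Because that sequence is a fragment, a stretch near either end may be cut short or missed.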
Pathways and Mutations
We identified the biological pathways our protein acts in through scientific journal articles from both the U.S. National Library of Medicine (20,22-27) and Science Direct (12). Having already gathered information on the protein's structure and function, we used these databases to find the pathways it is associated with, most notably intrinsic apoptosis and the regulation of calcium homeostasis. Most mutations were found in the literature on these same databases while researching associated diseases, but UniProt (14,15) provided the specific sequence-level mutations of the protein. We used OMIM for preliminary research to narrow down the disorders most commonly associated with the Bcl-2 protein.
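A sequence-level mutation names a reference residue, a position, and a replacement residue. As a hypothetical sketch, assuming the compact one-letter form "L23P" (reference residue, position, alternate residue), the function below checks that the reference residue matches before applying the change. The first_position parameter is an assumption for working with a fragment such as the partial sequence above, whose first residue is not necessarily position 1 of the full-length protein. The function name and the example variants are illustrative only and are not real Bcl-2 mutations.

```python
import re

# Hypothetical compact notation: reference residue, position, alternate residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(seq: str, variant: str, first_position: int = 1) -> str:
    """Return seq with a single residue substitution applied.

    first_position is the full-length numbering of seq[0]; set it when seq
    is a fragment so that positions match full-length annotations.
    """
    match = SUBSTITUTION.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized substitution: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    index = pos - first_position
    if not 0 <= index < len(seq):
        raise IndexError(f"position {pos} is outside the sequence")
    # Guard against numbering mistakes: the stated reference residue must be present.
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


if __name__ == "__main__":
    # Hypothetical variant on the first eight residues of the partial sequence.
    print(apply_substitution("YKLSQRGY", "L3P"))  # → YKPSQRGY
```

The reference check is the useful part: if a variant's numbering assumes the full-length protein but the sequence in hand is a fragment, the mismatch is reported instead of silently changing the wrong residue.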