(Photo Credit: Rachel P McCord)
"We are drowning in information yet thirsting for wisdom." -E.O. Wilson.
When Wilson said this, he was referring to the current data avalanche in biology. Every day, scientists around the world collect, compile, and analyze vast amounts of genomic, proteomic, metabolomic, and every other kind of -omic data. To satisfy our thirst, a new field, Computational Biology (also called Bioinformatics), has emerged to make sense of all this data.
One of the great things about this field is the open-access nature of its data and analytical tools. This means anyone with a computer and an internet connection can comb through gene sequences, analyze protein structures, or search organismal distribution maps.
I was first introduced to this field by taking a class taught by the brilliant Dr. Rachel P McCord and Chris Playter. This web page will serve as a repository of modern tools of computational biology which I have learned from them and used for my own personal research.
All fields of study have their own language and use of mathematics, and Bioinformatics is no different.
Bioinformatics is rife with terminology and codes, but there are various websites dedicated to defining terms and identifying biological structures.
UniProt: A database that assigns a unique identifying code (an accession number) to each protein and links it to the gene that encodes it. It also provides each sequence in FASTA format. Note: It's a good idea to include this accession number as well as the PDB ID for any protein you may be writing about.
Ontology Search: A database that has codes for what type of experiment was performed to generate a certain data set.
gProfiler: TBD
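Since UniProt hands out sequences in FASTA format, it helps to know how little structure the format has: a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader written with only the Python standard library. The function name parse_fasta, the accession X00000, and the sequence are made up for illustration and are not taken from any real database entry.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may wrap across several lines
    return records


# Hypothetical record: the accession and sequence are placeholders.
example = """>sp|X00000|DEMO_HUMAN Hypothetical demo protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example.splitlines())
for header, sequence in records.items():
    # UniProt-style headers look like: database|accession|entry-name description
    accession = header.split("|")[1]
    print(accession, len(sequence))
```

For real work, an established library such as Biopython is a safer choice than a hand-rolled parser; this is only meant to show what is inside the file.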
Bioinformatics, along with many other fields of biology, makes use of statistics to help understand the biological world. Most of my statistical training came from Dr. James Fordyce of the Ecology and Evolutionary Biology Department at the University of Tennessee, Knoxville. He has a YouTube channel, Stats EEB, where all the lectures from his EEB 411: Biostatistics and EEB 560: Biometry classes are uploaded for the public to view.
EEB Class Playlist: https://youtube.com/playlist?list=PLSjL6PTXbeae40aHSCgfzU9zOnGYLho_R
Note: I made this playlist to make it easier to find the videos, as I can never remember his channel name.
The source code that shapes biology. DNA was one of the first -omic data sources scientists deciphered. This means that there are various websites and databanks designed to analyze genetic sequences. Here are just a few:
Genes- Small segments of DNA that give rise to proteins, the molecular workhorses of the cell. We are particularly interested in where these genes are, what proteins they code for, and what happens when they go wrong.
BLAST: BLAST is a one-stop shop for all things gene analysis, from identifying unknown genes and designing primers to aligning sequences.
UCSC Genome Browser: Maintained by the University of California, Santa Cruz, this is a database of genomes where one can search for genes in species from microbe to man.
Gene Enrichments: We like to see which functions are over-represented among the genes in a data set, especially if we are looking at genes that are up- or down-regulated in diseases.
Enrichr: Gives a lot of detailed information about enrichments.
GeneOntology: A good resource for looking for enrichments. Its enrichment analysis runs on software called PANTHER, so points for coolness.
Reactome: Gives a really cool enrichment map output.
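To give a feel for what an enrichment result means, here is a minimal sketch of an over-representation test based on the hypergeometric distribution. It asks: if I drew my gene list at random from the genome, how likely is it that at least this many genes would carry a given annotation? This is a generic textbook calculation and is not claimed to be the exact method any of the tools above use. The function name enrichment_p_value and all the numbers in the example are made up for illustration.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes by chance.

    population: total number of genes considered (the background)
    annotated:  background genes that carry the term of interest
    sample:     number of genes in our list
    hits:       genes in our list that carry the term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)  # all ways to draw the list
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, 200 of them annotated,
# and a list of 100 genes that contains 8 annotated ones.
p = enrichment_p_value(population=20000, annotated=200, sample=100, hits=8)
print(f"p = {p:.2e}")
```

By chance alone we would expect about one annotated gene in a list of 100 here, so finding eight gives a very small p-value. Real tools also correct for testing many terms at once, which this sketch leaves out.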
3D Genome Structure. Our cells are the most complicated piece of organic origami in the known universe. Each cell in our body contains over 6 ft of DNA, so it must be intricately folded. So, how do we determine the geometry of our genome?
HiGlass- This tool visualizes genomic data and lets us see which bits of DNA are interacting with each other, giving us great insight into various diseases.
In each of our cells is a structure called the Nucleus, and it houses all our genetic information in the form of DNA. For scale, if we were to increase the size of our nuclei to the size of a tennis ball, it would have over 9 miles of DNA inside it. (Photo Credit: Rachel P McCord).
The molecular workhorses of the cell. Without proteins, life as we know it couldn't exist, so there's lots of research out there about how these proteins look, work, and interact with other proteins and genetic material.
A cartoon representation of Inosine-5′-Monophosphate Dehydrogenase Type II. The first of many enzymes responsible for synthesizing nucleotides. Photo Credit: Sintchak et al. 2001.
PDB: A repository of over 200,000 protein structures determined by experimental and computational methods.
BLAST: BLAST also does protein sequence alignments.
Motifs: Sometimes, we want to look and see if any parts of a protein have structural and functional similarity to other known proteins, especially when we don't have an experimentally determined structure. There are many websites that do this, but these are the ones I have worked with the most.
Interaction: Proteins interact with other proteins, and we often want to see the networks that arise from these interactions.
Cancer plagues every multicellular organism, so it is a commonly studied system in Bioinformatics. When analyzing cancer data, we want to see what genes, mutations, and risk factors are associated with different outcomes.
Data Visualization- Bioinformatic data from cancer can be overwhelming for even the best computers, so we often rely on data portals to access the small fraction of data we are interested in.
ICGC Data Portal - This data portal is maintained by the International Cancer Genome Consortium. It is a great resource for enrichment analysis and cohort comparisons, and it has a cool "Oncogrid" feature, which compares genetic pathways with different cancer types.
Mutation Signatures- Dr. Rachel P. McCord often describes the human genome as a noisy sports stadium. There's a lot going on, and trying to take it all in at once is just going to leave you confused. Instead, if you listen for certain cues ("Defense", "Rocky Top", "Touchdown Tennessee", "Roll Tide"), then you will have a better chance of figuring out where you are. This tool applies that logic to cancer genomic data; by looking for certain "cues" we are able to learn something of clinical relevance about a particular cancer.
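As a rough illustration of what a "cue" can look like in practice, one widely used convention groups single-base mutations into six classes (C>A, C>G, C>T, T>A, T>C, T>G) by always reading the change from the strand where the original base is a C or a T. Counting how often each class occurs gives a simple profile of a tumor. The sketch below shows only that counting step. It is not the method of the tool described above, and the function name substitution_class and the mutation list are made up for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Label a single-base change from the pyrimidine strand, e.g. G>A becomes C>T."""
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):
        # Purine reference base: report the change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


# Hypothetical (reference, alternate) base pairs, for illustration only.
mutations = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "G"), ("C", "A")]

profile = Counter(substitution_class(ref, alt) for ref, alt in mutations)
for label, count in sorted(profile.items()):
    print(label, count)
```

Full signature analyses go further, for example by also recording the bases on either side of each mutation, but the idea of tallying recognizable patterns is the same.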
Here are the deformed nuclei of cancer cells stained with a fluorescent dye. They are deformed because metastasizing cells must squeeze through narrow gaps in tissues, and the nucleus, the biggest part of the cell, takes a beating. (Photo Credit: Rachel P. McCord)
Plants are fascinating organisms that lay the foundation for every terrestrial ecosystem on earth and have lives as intricate and interesting as any animal. Many of the botanical bioinformatic tools I have worked with are concerned with the distribution of plants across the globe.
Here we see the lovely Acalypha reptans of the Euphorbiaceae family, commonly called the Chenille Plant. It is a common sight in many gardens across America. (Photo Credit: Layla Dishman)
GBIF- This is an international portal to access information about the distribution of many organisms, including plants.
SERNEC- This is similar to GBIF in that you can look for distributions of plants, but it is powered by herbarium data from the Southeastern United States. They also have a fun "Plant of the Day" quiz.
UTK Herbarium- My alma mater has one of the best herbaria in the Southeast, so I had to include a direct link to it. Here you can search for plants by common name, scientific name, or Tennessee county.
Layla's Flickr- Layla, a close personal friend and colleague, has been documenting plant biodiversity for many years. Her Flickr account is organized by scientific names and by other broad groupings (such as flower color), so it is very user-friendly and has high-resolution, research-grade images of many specimens.
Angiosperm Phylogeny Website- A bit of an old and clunky website, but it is a great repository of botanical knowledge, including up-to-date phylogenies of angiosperms and other plant lineages.
If you want to learn more about cutting-edge Bioinformatics/Computational Biology research, check out Dr. Rachel P McCord's work.