Bioinformatics
http://www.dnaftb.org/dnaftb/ DNA from the Beginning
http://www.johnkyrk.com/ Cell biology animation
http://multimedia.mcb.harvard.edu/media.html
http://www.youtube.com/watch?v=fNyq4A08mTo
http://www.youtube.com/watch?v=4rLDND-Dix0
http://acg.media.mit.edu/people/fry/genocarto.html
http://celldynamics.org/celldynamics/gallery/mathModel.html
http://bioinformatics.oxfordjournals.org http://www.ploscompbiol.org/
http://mbi.dkfz-heidelberg.de/projects/cellsim/cellosim/index.html
PySCeS: Python Simulator for Cellular Systems http://pysces.sourceforge.net/index.html
http://www.bii.a-star.edu.sg/achievements/applications/cellware/index.asp CellWare
http://rusty.fhl.washington.edu/ingeneue/ Java program to build and analyze genetic networks
http://sodium.physics.drexel.edu/systemsBiology/
http://www.cms3.cnr.it/index.php?option=com_content&task=view&id=52&Itemid=33
http://www.celldynamics.org/celldynamics/index.html
http://www3.niaid.nih.gov/labs/aboutlabs/psiim/computationalBiology/
http://tsb.mssm.edu/prime/
http://www.mbi.osu.edu/
A nucleotide is one of the structural components, or building blocks, of DNA and RNA. A nucleotide consists of a base (in DNA one of adenine, thymine, guanine, and cytosine; in RNA uracil takes the place of thymine) plus a molecule of sugar and one of phosphoric acid.
A promoter is the part of a gene that contains the information to turn the gene on or off. Transcription is initiated at the promoter.
The non-coding strand is the strand of DNA that does not carry the information necessary to make a protein. The non-coding strand is complementary to the coding strand and is also known as the antisense strand.
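The relationship between the two strands can be sketched in a few lines of Python. This is only an illustration: the function name and the example sequence are made up for this note, and the sequence is assumed to contain only the four DNA bases.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(coding_strand):
    """Return the antisense strand, written 5'->3', for a coding strand given 5'->3'."""
    # Reverse first (the strands run antiparallel), then pair each base.
    return "".join(COMPLEMENT[base] for base in reversed(coding_strand.upper()))

coding = "ATGGCC"                   # hypothetical example sequence
print(reverse_complement(coding))   # GGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.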
Autosome: Any chromosome other than a sex chromosome.
Gene: An ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product, such as a protein (for example an enzyme) or an RNA molecule.
Haploid: A cell with half the usual number of chromosomes, that is, only one chromosome set. In humans that is 23 chromosomes. Sperm cells and egg cells are haploid.
The human haploid genome contains 23 chromosomes, of which 22 are autosomes common to both females and males. The two sex chromosomes are called X and Y. The egg cell contains only an X chromosome, while a sperm cell contains either X or Y. Every somatic cell (that is, every cell other than the germ cells) in the human body contains 46 chromosomes: 22 autosomes from each parent plus a pair of sex chromosomes, either X-X (female) or X-Y (male). Since the two sets of 22 autosomes correspond to each other, it is enough to sequence one set of 22 autosomes plus the two sex chromosomes, X and Y, for a total of 24 chromosomes to be sequenced.
http://scienceandreason.blogspot.com/2006/11/alternative-splicing.html
http://scienceandreason.blogspot.com/search/label/gene%20expression
http://scienceandreason.blogspot.com/2007/06/rna-tails-and-gene-expression.html
The human genome has fewer than 25,000 different genes, in a genome of 3.12 billion base pairs. And the human genome is far from the largest. Ordinary corn has 5 billion base pairs and 50,000 genes.
It's estimated that humans use at least 100,000 different proteins, maybe a lot more, so the point is that some genes must be capable of coding for a lot more than just one protein. It's now understood that this is accomplished by the process known as alternative splicing.
Only a few years ago – definitely less than ten years – gene expression was thought to be a fairly simple process. One gene coded for one protein. The gene was "transcribed" from DNA to messenger RNA (mRNA), and in turn the mRNA was used to direct the manufacture of proteins in structures called ribosomes.
But then there was a series of "complications". Genes could be turned "on" or "off" by means of transcription factors, which are separate proteins produced by separate genes, and which are capable of either promoting or suppressing the transcription of other genes. Further, genes are not straight uninterrupted segments of DNA that correspond directly (via mRNA) to proteins, because genes contain segments called introns that are edited out of finished mRNA and ignored. And what is more, coding segments of genes (called exons) can be spliced together in different ways to produce finished mRNA. This makes it possible to obtain multiple distinct proteins from a single gene.
And then, outside of the RNA transcription process, it turns out that small bits of RNA, called microRNA (miRNA) and small interfering RNA (siRNA), which are coded for in parts of the genome long thought to be "junk", can become attached to mRNA and inhibit (or perhaps at times promote) production of proteins from it. Nor should we forget to mention ribozymes, which can also mess around with mRNA. And if all that weren't enough, there are also a variety of epigenetic factors which can turn on or off entire segments of a genome.
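A toy sketch of transcription followed by splicing may make the exon/intron idea concrete. Everything here is hypothetical: the gene sequence, the exon coordinates, and the function names are invented for illustration, and real splicing depends on splice-site signals that this sketch ignores.

```python
def transcribe(coding_dna):
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")

def splice(pre_mrna, exons):
    """Join the given exon intervals (0-based, end-exclusive); the rest is intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical gene: exon1 + intron + exon2 + intron + exon3.
gene = "ATGGCC" + "GTAAGT" + "TTTGAC" + "GTCAGT" + "AAATGA"
exons = [(0, 6), (12, 18), (24, 30)]   # made-up coordinates

pre_mrna = transcribe(gene)
full = splice(pre_mrna, exons)                      # all three exons kept
skipped = splice(pre_mrna, [exons[0], exons[2]])    # exon 2 skipped

print(full)     # AUGGCCUUUGACAAAUGA
print(skipped)  # AUGGCCAAAUGA
```

The two outputs are different finished mRNAs from the same gene, which is the sense in which alternative splicing lets one gene yield more than one protein.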
Is that all? No. There are probably a number of other mechanisms that modify, regulate, and control gene expression – mechanisms as yet undiscovered. After all, there's a lot of "junk" DNA, whose function we still have no clue about – except that a lot of it isn't truly "junk".
MicroRNA (miRNA) is a short (about 21 to 23 nucleotides) single-stranded RNA molecule that is now recognized as playing an important role in gene regulation – even though the term has been in use only since 2001. It is similar to, but distinct from, another type of short RNA, known as small interfering RNA (siRNA).
Although miRNA and siRNA both have gene regulation functions, there are subtle differences. MiRNA may be slightly shorter than siRNA (which has 20 to 25 nucleotides). MiRNA is single-stranded, while siRNA is formed from two complementary strands. The two kinds of RNA are encoded slightly differently in the genome. And the mechanism by which they regulate genes is slightly different.
MiRNA attaches to a piece of messenger RNA (mRNA), which is the master template for building a protein, in a non-coding part at one end of the molecule (the 3' untranslated region). This acts as a signal to prevent translation of the mRNA into a protein. SiRNA, on the other hand, typically pairs with a coding region of the mRNA, which usually leads to the mRNA being cut and degraded so that it cannot be translated.
An exon is a region of a gene that contains part of the code for producing the gene's protein. Each exon codes for a specific portion of the complete protein. In some species (including humans), a gene's exons are separated by long regions of DNA (called introns, or sometimes "junk DNA") that do not code for protein.
An intron is a noncoding sequence of DNA that is initially copied into RNA but is cut out of the final RNA transcript.
Gene expression is the process by which proteins are made from the instructions encoded in DNA.
http://en.wikipedia.org/wiki/Gene_regulatory_network
http://en.wikipedia.org/wiki/Signal_transduction
Pairwise alignment / multiple sequence alignment http://en.wikipedia.org/wiki/Sequence_alignment
Levenshtein (edit) distance. Affine gap penalty: gap_penalty = cost_opening + cost_extension * gap_length
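Both ideas on the line above are short enough to sketch in Python. The edit distance below is the standard dynamic programming recurrence with unit costs; the gap penalty follows the formula as written in these notes. The default cost values are illustrative assumptions, not values taken from any particular tool.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free if equal
            ))
        prev = curr
    return prev[-1]

def gap_penalty(gap_length, cost_opening=10.0, cost_extension=0.5):
    """Affine gap penalty: one opening cost plus a per-position extension cost."""
    if gap_length <= 0:
        return 0.0
    return cost_opening + cost_extension * gap_length

print(levenshtein("kitten", "sitting"))  # 3
print(gap_penalty(3))                    # 11.5
```

With an affine penalty one long gap is cheaper than several short ones of the same total length, which matches the intuition that a single insertion or deletion event is more likely than many independent ones.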
http://www.biostat.wisc.edu/bmi776/syllabus.html
BLOSUM (BLOcks SUbstitution Matrix, an amino acid substitution matrix)
PAM (point accepted mutation, also expanded as percent accepted mutation)
matrix_value=log(freq_observed/freq_expected)
matrix_value=0 means the substitution occurs as often as expected by chance
matrix_value<0 means the substitution is less likely than by chance
matrix_value>0 means the substitution occurs more often than by chance
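The sign rule above follows directly from the logarithm. A minimal sketch, assuming base-2 logarithms and made-up frequencies (published matrices also scale and round the values, which is not shown here):

```python
import math

def log_odds(freq_observed, freq_expected):
    """Log-odds score: log of observed over expected substitution frequency."""
    return math.log2(freq_observed / freq_expected)

# Hypothetical frequencies, for illustration only.
print(log_odds(0.010, 0.010))  # 0.0, as often as chance
print(log_odds(0.020, 0.010))  # positive, more often than chance
print(log_odds(0.005, 0.010))  # negative, less often than chance
```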
Sequence Alignment and Assembling (local pdf)
sequence assemblers: http://en.wikipedia.org/wiki/Sequence_assembly
http://mummer.sourceforge.net for comparing an entire genome against another
http://www.repeatmasker.org screens DNA sequences for repeats and low complexity sequences
Local alignment / global alignment: in local alignment, high-scoring local regions take precedence over the overall alignment; in global alignment, the sequences are aligned from end to end.
Smith-Waterman - dynamic programming method for local alignment
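A compact sketch of Smith-Waterman follows. It is a simplification under stated assumptions: a fixed match/mismatch score stands in for a substitution matrix such as BLOSUM, the gap penalty is linear rather than affine, and the score values are arbitrary illustrative defaults.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere: this is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

In the example only the shared ACG region is reported; the mismatching flanks are ignored, which is the "local takes precedence over overall" behaviour described above.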
http://bix.ucsd.edu/bioalgorithms http://www.geneious.com
http://discover.nci.nih.gov/microarrayAnalysis/Affymetrix.Preprocessing.jsp
If you run the same biological sample on two separate microarrays you will get slightly different results.
This is just part of the inherent variation that you have with any laboratory assay.
Normalization is a method that attempts to remove some of this variation.
http://www.rci.rutgers.edu/~cabrera/ST/c5.pdf
1. Multiply each array by a constant to make the mean (median) intensity the same for each array.
2. Adjust the arrays using some control or housekeeping genes that you would expect to have the same intensity level across all of the samples.
3. Match the percentiles of each array.
4. Adjust using a nonlinear smoothing curve.
5. Adjust using control genes
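Two of the approaches listed above, scaling to a common median (item 1) and matching percentiles (item 3, often called quantile normalization), can be sketched with the standard library alone. This is a minimal illustration, not the procedure of any specific microarray package: it assumes every array has the same number of probes, that medians are nonzero, and it breaks ties arbitrarily. The intensity values are made up.

```python
from statistics import mean, median

def median_scale(arrays):
    """Multiply each array by a constant so that all medians become equal."""
    medians = [median(a) for a in arrays]
    target = mean(medians)                      # common median to scale to
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])   # probe indices by rank
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = rank_means[rank]
        result.append(normalized)
    return result

arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]     # hypothetical intensities
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print([median(a) for a in median_scale(arrays)])
```

Median scaling only shifts the overall level of each array, while quantile normalization forces the whole intensity distribution to match, so it is the more aggressive of the two.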
http://ensembl.genome.tugraz.at/ http://www.phrap.org/ http://www.softgenetics.com/
http://code.google.com/p/mosaik-aligner/ http://www.clcbio.com/ http://www.scubeindia.com/SoftGenetics/nextgene.html
http://samtools.sourceforge.net/
http://maq.sourceforge.net/glfProgs.shtml
http://www.politigenomics.com/
http://bioinformatics.bc.edu/marthlab/EagleView http://bioinformatics.oxfordjournals.org/cgi/reprint/btp611v1.pdf
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527701/ http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000369
http://www.geospiza.com/index.shtml
http://www.valexllc.com/a-compatibility.html
http://seqanswers.com/forums/showthread.php?t=43
http://www.politigenomics.com/next-generation-sequencing-informatics
Notes:
Units: B – bytes, b – bases
PA is primary analysis (includes image feature extraction and base calling)
PA CPU is calculated as the wall-clock time multiplied by the number of CPU cores
ABI SOLiD data, except rate, are representative of a single slide
ABI SOLiD and Illumina GA IIx primary analysis is done on instrument
454 paired-end reads vary in length depending on location of internal adapter
SRA is the size of the files (SFF, SRF, or FASTQ) that are submitted to the NCBI Short Read Archive