Genomics Tutorial: Contains text, images and videos

This website provides the background necessary to understand genomics security and privacy papers. We developed it as supplementary material to our SoK submission.

Disclaimer: Definitions of the terms are taken from the following sources; we collect the relevant terms here for the reader's convenience. We have modified some of the definitions to improve accuracy, and we include diagrams from various sources to further clarify the terms. The source URL of every image can be seen in the browser's address bar by clicking on the image. Videos are linked directly from YouTube.

Genomics Terminologies


Allele

Alternative form of a genetic locus; a single allele for each locus is inherited from each parent.
The same applies to animal traits, e.g., eye color.

Genes, Traits, and Alleles

Alleles (Please click to see on YouTube)


Annotation

See "Genome annotation" below.


Assembly

See "Genome assembly" below.


Base

A nitrogen-containing molecule having the chemical properties of a base. DNA contains the nitrogenous bases adenine (A), guanine (G), cytosine (C), and thymine (T).
Also see "Nucleotide".

Base pair (bp)

Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs.


Chromosome

The self-replicating genetic structure of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.

What is a Chromosome?

Complex trait

Trait that has a genetic component that does not follow strict Mendelian inheritance. May involve the interaction of two or more genes or gene-environment interactions.


Deletion

A loss of part of the DNA from a chromosome; can lead to a disease or abnormality.


Diploid

A full set of genetic material consisting of paired chromosomes, one from each parental set. Most animal cells except the gametes have a diploid set of chromosomes. The diploid human genome has 46 chromosomes.

DNA (deoxyribonucleic acid)

The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between pairs of nucleotides. The four nucleotides in DNA contain the bases adenine (A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.
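
The pairing rule above (A with T, G with C) is what makes one strand deducible from the other. The following is a minimal sketch of that deduction; the function names and the example sequence are illustrative assumptions, not part of the original definitions. The reversal step reflects the fact that the two strands run in opposite directions.

```python
# Pairing rules from the definition above: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


example = "ATGCGT"  # hypothetical sequence, for illustration only
print(complement(example))          # TACGCA
print(reverse_complement(example))  # ACGCAT
```

Applying `complement` twice returns the original strand, which is the sense in which each strand carries the same information as its partner.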

What is DNA?

DNA sequence

The relative order of base pairs, whether in a DNA fragment, gene, chromosome, or an entire genome.

Reference Human Genome

Finished DNA Sequence

High-quality, low-error, gap-free DNA sequence of the human genome. Achieving this ultimate 2003 HGP goal requires additional sequencing to close gaps, reduce ambiguities, and allow for only a single error every 10,000 bases, the agreed-upon standard for HGP finished sequence.

Draft sequence

The sequence generated by the HGP as of June 2000 that, while incomplete, offers a virtual road map to an estimated 95% of all human genes. Draft sequence data are mostly in the form of 10,000 base pair-sized fragments whose approximate chromosomal locations are known.


Gene

The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule).

What is a gene?

Genetics 101 (Part 1 of 5): What are genes?



Genome

All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.

What is a Genome?

Genome annotation

Process of identifying elements in the genome and attaching biological information to these elements. Automatic annotation tools perform this process by computer analysis, as opposed to manual annotation (i.e., curation), which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline. 

Genome assembly

Process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines, which can read up to 1,000 nucleotides or bases at a time. A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged, and the process continues.
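
The overlap-and-merge idea described above can be sketched as a greedy procedure: find the pair of reads with the longest suffix-to-prefix overlap, merge them, and repeat. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no repeated regions); the function names, the `min_len` threshold, and the example reads are hypothetical, and real assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    max_len = min(len(a), len(b))
    for size in range(max_len, min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

The greedy choice is easy to follow but can mis-assemble repetitive genomes, which is one reason it serves here only as an illustration.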

Genome sequence

Order of nucleotides or bases within DNA molecules that make up an organism's entire genome. The four bases are adenine, guanine, cytosine, and thymine, represented as A, G, C, and T.

Genome-wide association study (GWAS)

Examination of many common genetic variants in different individuals to see if any variant is statistically associated with a trait. A GWAS is used to identify regions of the genome that may be involved in a disease or another phenotype of interest.
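
To make "statistically associated" concrete, one simple per-variant test compares allele counts between people with the trait (cases) and people without it (controls) using a Pearson chi-square test on a 2x2 table. The sketch below is one common choice, not the only test used in practice; the function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternative allele is more frequent among cases.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(chi2)  # 8.0
```

Because a GWAS repeats such a test for a very large number of variants, the resulting p-values need correction for multiple testing before any single association is reported.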

Clinical applications of GWAS data


Genotype

The genetic constitution of an organism, as distinguished from its physical appearance (its phenotype).


Genotype (Please see the video on YouTube)


Haploid

A single set of chromosomes (half the full set of genetic material) present in the egg and sperm cells of animals and in the egg and pollen cells of plants. Human beings have 23 chromosomes in their reproductive cells.


Haplotype

A way of denoting the collective genotype of a number of closely linked loci on a chromosome.

High-throughput sequencing

A fast method of determining the order of bases in DNA.

Illumina Sequencing Technology


Homozygous

Describes an organism that has two identical alleles at a given locus.

Human Genome Project

An international scientific research project with a primary goal of determining the sequence of chemical base pairs which make up human DNA, and of identifying and mapping the total genes of the human genome from both a physical and functional standpoint.


Insertion

The addition of one or more nucleotide base pairs into a DNA sequence. Insertions can range in size from one base pair incorrectly inserted into a DNA sequence to a section of one chromosome inserted into another.

Linkage disequilibrium (LD)

Non-random association of alleles at different loci: the alleles occur together more often than can be accounted for by chance. This usually indicates that the two alleles are physically close on the DNA strand.
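
"More often than can be accounted for by chance" can be quantified. For two loci with two alleles each, a standard measure is D, the observed frequency of a haplotype minus the product of its allele frequencies, together with its normalized form r². The sketch below assumes phased haplotypes written as two-character strings; the function name and the example data are hypothetical.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of 2-character strings, one character per locus, e.g. "AB".
    The alleles of the first haplotype are used as the reference alleles.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(v for h, v in counts.items() if h[0] == a) / n
    p_b = sum(v for h, v in counts.items() if h[1] == b) / n
    p_ab = counts[a + b] / n
    d = p_ab - p_a * p_b  # zero when the loci are statistically independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Alleles A and B always travel together: complete LD.
print(linkage_disequilibrium(["AB"] * 4 + ["ab"] * 4))        # (0.25, 1.0)
# All four combinations equally common: no LD.
print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"] * 2))   # (0.0, 0.0)
```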

Disease Susceptibility - Gene-disease Association Studies

Locus (pl. loci)

The position on a chromosome of a gene or other chromosome marker; also, the DNA at that position. The use of locus is sometimes restricted to mean expressed DNA regions.

Mendelian inheritance

One way in which genetic traits are passed from parents to offspring. Named for Gregor Mendel, who first studied and recognized the existence of genes and this pattern of inheritance.


Microarray

Sets of miniaturized chemical reaction areas that also may be used to test DNA fragments, antibodies, or proteins.


Mutation

Any heritable change in DNA sequence.


Nucleotide

A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule.
Also see "Base".

Pharmacogenomics (Personalized Medicine)

The study of the interaction of an individual's genetic makeup and response to a drug.

Genomic Medicine at Work at Mayo Clinic

Mayo Clinic Individualizing Medicine 2012 Conference

Personalized Medicine

Personalized medicine is the tailoring of medical prevention and treatment to the individual characteristics of each patient, including but not limited to genetic information.


Phenotype

The physical characteristics of an organism or the presence of a disease that may or may not be genetic.

Genetics 101 (Part 4 of 5): What are Phenotypes?


Polymorphism

Difference in DNA sequence among individuals that may underlie differences in health. Genetic variations occurring in more than 1% of a population would be considered useful polymorphisms for genetic linkage analysis.

Restriction enzyme, endonuclease

A protein that recognizes specific, short nucleotide sequences and cuts DNA at those sites. Bacteria contain over 400 such enzymes that recognize and cut more than 100 different DNA sequences.

Restriction fragment length polymorphism (RFLP)

Variation among individuals in the sizes of DNA fragments cut by specific restriction enzymes; polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. RFLPs usually are caused by mutation at a cutting site.
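
The two entries above can be illustrated together: a restriction enzyme cuts wherever its recognition sequence occurs, and a mutation that destroys one recognition site changes the fragment sizes. The sketch below uses EcoRI, which recognizes GAATTC and cuts between the G and the first A; the function name, the single-strand simplification, and the example sequences are assumptions made for illustration.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut dna at every occurrence of site.

    The cut falls cut_offset bases into the recognition site
    (1 for EcoRI: G^AATTC). Only one strand is modeled.
    """
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


individual_1 = "AAGAATTCGGGAATTCTT"  # two EcoRI sites (hypothetical sequence)
individual_2 = "AAGAATTCGGGAGTTCTT"  # second site lost through an A-to-G change

print([len(f) for f in digest(individual_1)])  # [3, 8, 7]
print([len(f) for f in digest(individual_2)])  # [3, 15]
```

The differing lists of fragment lengths are the polymorphism that an RFLP analysis detects.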

Restriction Fragment Length Polymorphism


Sequencing

Determination of the order of nucleotides (base sequences) in a DNA or RNA molecule or the order of amino acids in a protein.

Single nucleotide polymorphism (SNP)

DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is variable in a population.
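
As a small illustration of the definition above, the following sketch scans sequences that are already aligned to the same length and reports the positions at which more than one base is observed. The function name and the sequences are hypothetical, and the sketch ignores insertions, deletions, and population frequency thresholds.

```python
def find_snps(sequences: list[str]) -> dict[int, set[str]]:
    """Map each variable position (0-based) to the set of bases observed there."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos, column in enumerate(zip(*sequences)):
        alleles = set(column)
        if len(alleles) > 1:  # more than one base seen at this position
            snps[pos] = alleles
    return snps


population = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]  # hypothetical individuals
print(sorted(find_snps(population)))  # [2, 4]
```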

Genetics 101 (Part 2 of 5): What are SNPs?

Single-gene disorder

Hereditary disorder caused by a mutant allele of a single gene (e.g., Duchenne muscular dystrophy, retinoblastoma, sickle cell disease).


Substitution

In genetics, a type of mutation caused by replacement of one nucleotide in a DNA sequence with another nucleotide or replacement of one amino acid in a protein with another amino acid.

Tandem repeat sequences

Multiple copies of the same base sequence on a chromosome; used as markers in physical mapping.

X chromosome

One of the two sex chromosomes, X and Y. Humans have 23 chromosome pairs. A typical male has an XY chromosome pair and a typical female has XX.

Y chromosome

One of the two sex chromosomes, X and Y. Humans have 23 chromosome pairs. A typical male has an XY chromosome pair and a typical female has XX.

Genomics 101 Tutorials (PDF) 

Genomics Background (Written by us. Explains some genomic concepts discussed in the paper and does not contain all material of this website)

Other resources