created by KateN
on 2015-11-18
A biallelic site is a specific locus in a genome that contains two observed alleles, counting the reference as one, and therefore allowing for one variant allele. In practical terms, this is a site where, across multiple samples in a cohort, you have evidence for a single non-reference allele. Shown below is a toy example in which the consensus sequences for samples 1-3 have a deletion at position 7. Sample 4 matches the reference. This is considered a biallelic site because only two alleles are observed: a deletion and the reference allele ```G```.
```
            1 2 3 4 5 6 7 8 9
Reference:  A T A T A T G C G
Sample 1 :  A T A T A T - C G
Sample 2 :  A T A T A T - C G
Sample 3 :  A T A T A T - C G
Sample 4 :  A T A T A T G C G
```
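As a minimal sketch of the definition above, the following Python snippet collects the distinct alleles seen in one column of the toy alignment. The function name `observed_alleles` and the use of `-` to mark a deleted base are assumptions made for illustration; this is not how GATK represents or calls variants internally.

```python
# Toy alignment from the example above; "-" marks a deleted base (an assumption
# of this sketch, not a GATK convention).
reference = "ATATATGCG"
samples = {
    "Sample 1": "ATATAT-CG",
    "Sample 2": "ATATAT-CG",
    "Sample 3": "ATATAT-CG",
    "Sample 4": "ATATATGCG",
}


def observed_alleles(position, reference, samples):
    """Return the set of alleles seen at a 1-based position, reference included."""
    index = position - 1
    alleles = {reference[index]}
    alleles.update(seq[index] for seq in samples.values())
    return alleles


alleles = observed_alleles(7, reference, samples)
print(sorted(alleles))    # ['-', 'G']
print(len(alleles) == 2)  # True: two observed alleles, so the site is biallelic
```

Every other position in this toy alignment yields a single allele, so position 7 is the only variant site.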
---
A multiallelic site is a specific locus in a genome that contains three or more observed alleles, again counting the reference as one, and therefore allowing for two or more variant alleles. This is a site where, across multiple samples in a cohort, you see evidence for two or more non-reference alleles. Shown below is a toy example in which the consensus sequences for samples 1-3 have a deletion or a SNP at position 7. Sample 4 matches the reference. This is considered a multiallelic site because four alleles are observed: a deletion, the reference allele ```G```, a ```C``` (SNP), and a ```T``` (SNP). True multiallelic sites are not observed very frequently unless you look at very large cohorts, so they are often taken as a sign of a noisy region where artifacts are likely.
```
            1 2 3 4 5 6 7 8 9
Reference:  A T A T A T G C G
Sample 1 :  A T A T A T - C G
Sample 2 :  A T A T A T C C G
Sample 3 :  A T A T A T T C G
Sample 4 :  A T A T A T G C G
```
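In a VCF file, the alternate alleles at a site are listed in the ALT column as a comma-separated list, so the biallelic/multiallelic distinction can be read from the number of ALT entries. The sketch below is a hypothetical helper (`classify_site` is not a GATK function). It assumes the toy deletion would be encoded VCF-style, anchored on the preceding base, as REF `TG` and ALT `T`, with the two SNPs written against the same REF as `TC` and `TT`.

```python
def classify_site(ref, alt_field):
    """Classify a site from VCF-style REF and ALT values.

    alt_field is the comma-separated ALT column; "." means no alternate allele.
    """
    alts = {a for a in alt_field.split(",") if a not in (".", "")}
    alts.discard(ref)          # ignore an ALT that merely repeats the reference
    n_alleles = 1 + len(alts)  # the reference counts as one allele
    if n_alleles == 1:
        return "no variant"
    if n_alleles == 2:
        return "biallelic"
    return "multiallelic"


print(classify_site("TG", "T"))        # biallelic (first toy example)
print(classify_site("TG", "T,TC,TT"))  # multiallelic (second toy example)
```

A site with exactly three alleles (one REF plus two ALT entries) also comes out as multiallelic here, since the definition above covers three or more alleles.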
Updated on 2015-12-01
From olavur on 2017-05-01
> True multiallelic sites are not observed very frequently unless you look at very large cohorts, so they are often taken as a sign of a noisy region where artifacts are likely.
I have a few questions on this subject:
From Sheila on 2017-05-04
@olavur
Hi,
Have a look at “A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data”. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198575/
-Sheila
From genes on 2018-02-18
Hi,
I am doing an undergrad project on the apolipoprotein gene and wondering whether this gene is triallelic or multiallelic, as it is not really clear in the literature. I know it has the alleles E2, E3, and E4, but how would you find/interpret the other alleles on databases such as the NCBI? or does this database always assume genes to be diallelic? (It’s quite complex to get around for those with little background in genetics.)
From Sheila on 2018-02-19
@genes
Hi,
I hope [this dictionary entry](https://software.broadinstitute.org/gatk/documentation/article?id=11033) will help you.
-Sheila
EDIT: Sorry, I just realized I linked you to the same article you posted in. We usually refer to any sites that have more than two alleles present as multiallelic. In your case, having three alleles is triallelic, but we also refer to it as multiallelic. I am not sure about “how would you find/interpret the other alleles on databases such as the NCBI? or does this database always assume genes to be diallelic?” What are you exactly trying to accomplish?