Choosing the references for MuTect2

Post date: Nov 20, 2019 4:2:25 PM

1 - Panel of normals

Using the coverage information per sample per location from the global aspen dataset, located at:

/uufs/chpc.utah.edu/common/home/u6000989/data/aspen/gbs_pando_plus/Alignments_mem/coverageInformation.txt

The states neighboring Utah used here are Idaho, Wyoming, Colorado and Nevada, denoted by the codes ID, WY, CO and NV.

First, I create a new file containing the neighboring states only.
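The filtering step can be sketched as follows. This is a minimal Python illustration, not the script actually used: the layout of coverageInformation.txt is not described here, so the tab-separated "sample, state, coverage" columns, the function names and the toy sample IDs are all assumptions.

```python
# State codes kept for the panel of normals (from the text above).
NEIGHBOR_STATES = {"ID", "WY", "CO", "NV"}

def parse_line(line):
    """Split one line, ASSUMED to be 'sample<TAB>state<TAB>coverage'."""
    sample, state, coverage = line.rstrip("\n").split("\t")
    return sample, state, int(coverage)

def keep_neighbor_states(lines):
    """Return the parsed records whose state code is a neighboring state."""
    records = [parse_line(line) for line in lines if line.strip()]
    return [rec for rec in records if rec[1] in NEIGHBOR_STATES]

# Toy input with made-up sample IDs; real data would be read from the file.
example = ["a01\tID\t900000\n", "a02\tUT\t1200000\n", "a03\tCO\t400000\n"]
neighbors = keep_neighbor_states(example)
```

If the state code is instead embedded in the sample name, only parse_line would need to change.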

I plot the coverage distribution; the median is 1 000 000. I keep the samples with coverage greater than 500 000 and less than 1 500 000.

I then randomly pick one hundred trees from this selection.
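The coverage window and the random draw could look like the sketch below. It assumes the coverage values are already available as a sample-to-coverage mapping; the function name, the fixed seed (added only so the draw is reproducible) and the toy data are illustrative and not taken from the original analysis.

```python
import random

def pick_panel(coverage_by_sample, low=500_000, high=1_500_000, n=100, seed=1):
    """Keep samples with low < coverage < high, then draw n without replacement."""
    # Sort the eligible names so the draw does not depend on dictionary order.
    eligible = sorted(s for s, c in coverage_by_sample.items() if low < c < high)
    if len(eligible) < n:
        raise ValueError(f"only {len(eligible)} samples pass the coverage filter")
    rng = random.Random(seed)  # seed is an assumption, for reproducibility only
    return rng.sample(eligible, n)

# Toy data: 300 hypothetical trees with evenly spaced coverage values.
toy = {f"tree{i:03d}": 400_000 + i * 5_000 for i in range(300)}
panel = pick_panel(toy)
```

Writing the selected names one per line would then give a file in the spirit of panelOfNormals.txt.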

The file "panelOfNormals.txt" is in the path /uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/gbs_pando_plus/Variants_mem_bcftools/rozenn

2 - Individual references: Pando and non-Pando

Mapping individuals in space and coloring them by PC1 score reveals that all individuals with a positive PC1 score cluster together and form the Pando clone, consistent with previous analyses (Mock 2008).

I run k-means clustering on the PC1 score to separate Pando from non-Pando trees. (Open question: cluster on PC1 only, or on all PCs? Compare the two here.)
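For illustration, two-cluster k-means on a single score reduces to the short loop below. This is a plain-Python sketch of the technique, not the contents of makeClusters.R; the function name, the min/max initialization and the PC1 values are made up for the example.

```python
def kmeans_1d_two_clusters(values, max_iter=100):
    """Lloyd's algorithm for k = 2 on one-dimensional data."""
    # Initialize the two centers at the extremes of the data.
    centers = [min(values), max(values)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return labels, centers

# Hypothetical PC1 scores: three negative (label 0) and three positive (label 1).
pc1 = [-2.1, -1.8, -2.4, 3.0, 2.7, 3.3]
labels, centers = kmeans_1d_two_clusters(pc1)
```

With this initialization, label 1 is always the cluster with the higher PC1 center, which would correspond to Pando if Pando trees are the ones with positive PC1 scores. Clustering on all PCs would need a distance over vectors instead of the absolute difference used here.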

Once Pando and non-Pando trees are separated, I plot the read depth spatially and choose three individuals within Pando and three individuals outside Pando.

I chose the following individuals (spatially separated, not on the periphery, and with good coverage):

Pando: 191-S 179-B 131-S

non-Pando: 117-B 645-S 013-S

Scripts used: makeClusters.R and mapRdp.R
