popanc summaries complete

Post date: Aug 07, 2015 8:37:3 PM

I finished summarizing the new data, mainly using old scripts (see the earlier notes). I should have only the new files and figures. I have also re-summarized the human data (note that we always count ancestry from population 0, so this is Han ancestry). I calculated overall and chromosome-specific ancestry and summarized the proportion of SNPs with more or less Han ancestry (based on 95% CIs) than the chromosome or genome average. I then looked at the SNPs with the greatest and least Han ancestry (below the 0.05th or above the 99.95th quantiles, 118 SNPs), which were clustered in a few regions. Here is the R code and some results (including annotation information from the UCSC human genome browser).

## read the per-chromosome ancestry estimates (chromosomes 1-22) into a list

q<-lapply(1:22,function(i) read.table(paste0("popQ_estpanc_uygur15snpChrom",i),header=FALSE,sep=","))

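## per chromosome: mean ancestry (col 1) and the proportion of SNPs whose
## 95% CI (col 3 = lower bound, col 4 = upper bound) excludes that mean
chsum<-matrix(NA,nrow=22,ncol=2)
for(i in 1:22){
    mn<-mean(q[[i]][,1])
    dif<-mean(q[[i]][,3] > mn | q[[i]][,4] < mn)
    chsum[i,1]<-mn
    chsum[i,2]<-dif
}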

## unwrap it

mn<-NULL

lb<-NULL

ub<-NULL

for(i in 1:22){

mn<-c(mn,q[[i]][,1])

lb<-c(lb,q[[i]][,3])

ub<-c(ub,q[[i]][,4])

}

gmn<-mean(mn)

#[1] 0.5286457

mean(lb > gmn | ub < gmn)

# [1] 0.5511119
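The interval test used above (a SNP counts as "different" when its 95% CI lies entirely above or entirely below the reference mean) can be written as a small helper. This is only a minimal sketch on made-up numbers: the function name prop_excluding and the toy vectors are hypothetical, and it assumes, as the code above suggests, that the lower and upper bounds are supplied as two vectors of equal length.

```r
## Proportion of SNPs whose interval excludes a reference value.
## lower, upper: CI bounds per SNP; ref: the mean to compare against.
## (prop_excluding is a hypothetical helper, not part of the original scripts)
prop_excluding<-function(lower,upper,ref){
    stopifnot(length(lower)==length(upper))
    mean(lower > ref | upper < ref)
}

## toy data for illustration only, not real estimates
est<-c(0.2,0.5,0.8,0.55)
lower<-c(0.1,0.4,0.7,0.45)
upper<-c(0.3,0.6,0.9,0.65)
ref<-mean(est)
pd<-prop_excluding(lower,upper,ref)
```

With the real data the equivalent call would be prop_excluding(lb,ub,gmn), which is the same expression as mean(lb > gmn | ub < gmn).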

## examine most interesting/extreme regions

chrm<-read.table("ChromPosAll.txt")

qs<-quantile(mn,probs=c(0.0005,0.9995))

# 0.05% 99.95%

#0.09020496 0.92730454

x<-which(mn < qs[1])

y<-which(mn > qs[2])
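Because the extreme SNPs cluster in a few regions, it can help to collapse the outlier indices into regions automatically. The following is a rough sketch, not the procedure used for these notes: collapse_regions and the max_gap threshold are hypothetical, and applying it to the real data would assume that ChromPosAll.txt holds chromosome and position in its first two columns, in the same SNP order as mn.

```r
## Merge outlier SNPs into regions: SNPs on the same chromosome that are
## no more than max_gap bp from the previous outlier join its region.
## (hypothetical helper; max_gap is an arbitrary choice)
collapse_regions<-function(chr,pos,max_gap=1e6){
    stopifnot(length(chr)==length(pos),length(pos)>0)
    o<-order(chr,pos)
    chr<-chr[o]
    pos<-pos[o]
    n<-length(pos)
    ## a new region starts at the first SNP, when the chromosome changes,
    ## or when the distance to the previous SNP exceeds max_gap
    newreg<-c(TRUE,chr[-1]!=chr[-n] | diff(pos) > max_gap)
    starts<-which(newreg)
    ends<-c(starts[-1]-1,n)
    data.frame(chr=chr[starts],start=pos[starts],end=pos[ends],
               nsnps=ends-starts+1)
}

## toy positions for illustration only
chr<-c(2,2,2,11,11,2)
pos<-c(100,150,5000,10,20,120)
reg<-collapse_regions(chr,pos,max_gap=1000)
```

Under the assumption about the file layout stated above, the high-ancestry regions would come from something like collapse_regions(chrm[y,1],chrm[y,2]) and the low-ancestry regions from collapse_regions(chrm[x,1],chrm[x,2]); the resulting regions depend on the gap threshold chosen.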

mean anc. = 0.529 = proportion Han, prop. different than mean = 55.1%

admixtools (with western Europe = French) found 0.475-0.548 Han

by chromosome (LG), mean ancestry and prop. of SNPs different than the mean

      mean      propdif
 [1,] 0.5468160 0.5796532
 [2,] 0.5615941 0.5704260
 [3,] 0.4996456 0.5519860
 [4,] 0.5600322 0.5050435
 [5,] 0.4816287 0.5421950
 [6,] 0.4881419 0.5259300
 [7,] 0.5338468 0.5792225
 [8,] 0.5273559 0.5171486
 [9,] 0.5517027 0.5802171
[10,] 0.4960120 0.5680428
[11,] 0.5095252 0.5035152
[12,] 0.5106656 0.5561389
[13,] 0.5582172 0.5101141
[14,] 0.5408113 0.5560561
[15,] 0.5701374 0.4744753
[16,] 0.5746712 0.5461558
[17,] 0.5590689 0.5051450
[18,] 0.4858594 0.4990286
[19,] 0.5497742 0.4502618
[20,] 0.5348564 0.4878353
[21,] 0.5109929 0.4348076
[22,] 0.5042182 0.5752655

annotations from http://genome.ucsc.edu/, only UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs and Comparative Genomics)

below the 0.05th or above 99.95th quantiles = 118 SNPs

high = 3 regions, 59 SNPs

chr10:72175772-72303464, 13 SNPs

Multiple genes: Homo sapiens eukaryotic translation initiation factor 4E binding protein 2 (EIF4EBP2), mRNA; Homo sapiens phosphatase domain containing, paladin 1 (PALD1), mRNA; Homo sapiens nodal growth differentiation factor (NODAL), mRNA.

chr2:155507550-157032398, 43 SNPs

Homo sapiens potassium inwardly-rectifying channel, subfamily J, member 3 (KCNJ3), transcript variant 3, mRNA.

RefSeq Summary (NM_001260509): Potassium channels are present in most mammalian cells, where they participate in a wide range of physiologic responses. The protein encoded by this gene is an integral membrane protein and inward-rectifier type potassium channel. The encoded protein, which has a greater tendency to allow potassium to flow into a cell rather than out of a cell, is controlled by G-proteins and plays an important role in regulating heartbeat. It associates with three other G-protein-activated potassium channels to form a heteromultimeric pore-forming complex that also couples to neurotransmitter receptors in the brain and whereby channel activation can inhibit action potential firing by hyperpolarizing the plasma membrane. These multimeric G-protein-gated inwardly-rectifying potassium (GIRK) channels may play a role in the pathophysiology of epilepsy, addiction, Down's syndrome, ataxia, and Parkinson's disease. Alternative splicing results in multiple transcript variants encoding distinct proteins.

chr2:182130703-182175151, 3 SNPs

AK125001 - Homo sapiens cDNA FLJ43011 fis, clone BRTHA2015853.

low = 4 regions, 59 SNPs

chr2:127589505-127875189, 17 SNPs

Homo sapiens bridging integrator 1 (BIN1), transcript variant 3, mRNA.

Genetic Association Database: BIN1. CDC HuGE Published Literature: BIN1.

Positive Disease Associations: Alzheimer Disease; Cholesterol, LDL; Insulin Resistance; Potassium

Related Studies:

Alzheimer Disease: Xiaolan Hu et al. 2011. Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PloS one. [PubMed 21390209]

Alzheimer Disease: Paul Hollingworth et al. 2011. Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nature genetics. [PubMed 21460840]

chr11:89858790-89914934, 2 SNPs

Homo sapiens N-acetylated alpha-linked acidic dipeptidase 2 (NAALAD2), mRNA.

chr11:26727812-27268044, 24 SNPs

Multiple genes:

Homo sapiens solute carrier family 5 (sodium/glucose cotransporter), member 12 (SLC5A12), mRNA. SLC5A12 -> insulin resistance?

Homo sapiens butyrobetaine (gamma), 2-oxoglutarate dioxygenase (gamma-butyrobetaine hydroxylase) 1 (BBOX1), mRNA. BBOX1 Positive Disease Associations: Cholesterol, LDL; Iron; Stroke

Homo sapiens fin bud initiation factor homolog (zebrafish) (FIBIN), mRNA.

chr11:28372927-28656999, 16 SNPs

none