Post date: Aug 07, 2015 8:37:3 PM
I finished summarizing the new data, mainly using old scripts (see my earlier notes). I should have only the new files and figures. I have also re-summarized the human data (note that we always count ancestry from population 0, so this is Han ancestry). I calculated overall and chromosome-specific ancestry and summarized the proportion of SNPs with more or less Han ancestry than the chromosome or genome average (i.e., SNPs whose 95% CI excludes that average). I then looked at the SNPs with the least and greatest Han ancestry (below the 0.05th and above the 99.95th percentiles, 118 SNPs), which were clustered in a few regions. Here is the R code and some results (including annotation information from the UCSC human genome browser).
## read the per-chromosome estimates (one comma-separated file per chromosome, 1-22)
q<-lapply(1:22,function(i) read.table(paste0("popQ_estpanc_uygur15snpChrom",i),header=FALSE,sep=","))
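## per chromosome: mean Han ancestry (column 1 = per-SNP estimate) and the
## proportion of SNPs whose 95% CI (columns 3 and 4) excludes that mean
chsum<-matrix(NA,nrow=22,ncol=2,dimnames=list(NULL,c("mean","propdif")))
for(i in 1:22){
  mn<-mean(q[[i]][,1])
  dif<-mean(q[[i]][,3] > mn | q[[i]][,4] < mn)
  chsum[i,1]<-mn
  chsum[i,2]<-dif
}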
## unwrap: pool the SNPs from all chromosomes
qall<-do.call(rbind,q)
mn<-qall[,1]
lb<-qall[,3]
ub<-qall[,4]
## genome-wide mean, and proportion of SNPs whose 95% CI excludes it
gmn<-mean(mn)
#[1] 0.5286457
mean(lb > gmn | ub < gmn)
# [1] 0.5511119
## examine most interesting/extreme regions
chrm<-read.table("ChromPosAll.txt")
qs<-quantile(mn,probs=c(0.0005,0.9995))
# 0.05% 99.95%
#0.09020496 0.92730454
## indices of SNPs with the least (x) and greatest (y) Han ancestry
x<-which(mn < qs[1])
y<-which(mn > qs[2])
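The two loops above compute the same statistic at two scales: the share of SNPs whose 95% CI lies entirely above or below a reference mean. Below is a minimal sketch that wraps both in one function. It assumes (as the loops imply) that column 1 of each table holds a SNP's estimated Han ancestry and columns 3 and 4 the lower and upper 95% CI bounds; `ci_excludes` and `summarise_ancestry` are hypothetical helper names, and the toy data are made up for illustration.

```r
## Hypothetical helpers (not from the original scripts).
## Assumed layout of each per-chromosome table: column 1 = estimated Han
## ancestry of a SNP, columns 3 and 4 = lower and upper 95% CI bounds.
ci_excludes <- function(lb, ub, ref) {
  ## TRUE where the whole 95% CI lies above or below ref
  lb > ref | ub < ref
}

summarise_ancestry <- function(q) {
  ## per-chromosome mean and proportion of SNPs whose CI excludes it
  chsum <- t(sapply(q, function(d) {
    m <- mean(d[, 1])
    c(mean = m, propdif = mean(ci_excludes(d[, 3], d[, 4], m)))
  }))
  ## pool all chromosomes for the genome-wide version
  qall <- do.call(rbind, q)
  gmn <- mean(qall[, 1])
  list(chsum = chsum,
       genome_mean = gmn,
       genome_propdif = mean(ci_excludes(qall[, 3], qall[, 4], gmn)))
}

## toy input: two "chromosomes" with three SNPs each (made-up numbers)
toy <- list(
  data.frame(V1 = c(0.2, 0.5, 0.8), V2 = c(0.2, 0.5, 0.8),
             V3 = c(0.1, 0.4, 0.7), V4 = c(0.3, 0.6, 0.9)),
  data.frame(V1 = rep(0.5, 3), V2 = rep(0.5, 3),
             V3 = rep(0.45, 3), V4 = rep(0.55, 3))
)
res <- summarise_ancestry(toy)
res$chsum           # propdif is 2/3 for the first toy chromosome, 0 for the second
res$genome_propdif  # 2 of 6 SNPs have a CI excluding the pooled mean of 0.5
```

On the real data, `summarise_ancestry(q)` performs the same operations as the loops, so it should reproduce `chsum`, `gmn` and the 55.1% figure, provided the column-layout assumption holds.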
Mean ancestry = 0.529 (proportion Han); proportion of SNPs different from the mean = 55.1%.
ADMIXTOOLS (with western Europe represented by French) estimated 0.475-0.548 Han ancestry.
By chromosome (LG): mean Han ancestry and proportion of SNPs whose 95% CI excludes the chromosome mean
mean propdif
[1,] 0.5468160 0.5796532
[2,] 0.5615941 0.5704260
[3,] 0.4996456 0.5519860
[4,] 0.5600322 0.5050435
[5,] 0.4816287 0.5421950
[6,] 0.4881419 0.5259300
[7,] 0.5338468 0.5792225
[8,] 0.5273559 0.5171486
[9,] 0.5517027 0.5802171
[10,] 0.4960120 0.5680428
[11,] 0.5095252 0.5035152
[12,] 0.5106656 0.5561389
[13,] 0.5582172 0.5101141
[14,] 0.5408113 0.5560561
[15,] 0.5701374 0.4744753
[16,] 0.5746712 0.5461558
[17,] 0.5590689 0.5051450
[18,] 0.4858594 0.4990286
[19,] 0.5497742 0.4502618
[20,] 0.5348564 0.4878353
[21,] 0.5109929 0.4348076
[22,] 0.5042182 0.5752655
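A quick way to rank chromosomes using the table above is sketched below, with the printed per-chromosome means and the genome-wide mean of 0.5286457 copied in by hand. Note that `gmn` is the mean over all SNPs, so it weights each chromosome by its number of SNPs and need not equal the unweighted mean of the 22 chromosome means.

```r
## per-chromosome mean Han ancestry, copied from the table above (chr1-chr22)
chmean <- c(0.5468160, 0.5615941, 0.4996456, 0.5600322, 0.4816287,
            0.4881419, 0.5338468, 0.5273559, 0.5517027, 0.4960120,
            0.5095252, 0.5106656, 0.5582172, 0.5408113, 0.5701374,
            0.5746712, 0.5590689, 0.4858594, 0.5497742, 0.5348564,
            0.5109929, 0.5042182)
names(chmean) <- paste0("chr", 1:22)
gmn <- 0.5286457  # genome-wide mean over all SNPs, from the output above

names(which.max(chmean))  # "chr16"
names(which.min(chmean))  # "chr5"
above <- names(chmean)[chmean > gmn]  # 12 chromosomes
below <- names(chmean)[chmean < gmn]  # 10 chromosomes
```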
Annotations from http://genome.ucsc.edu/, using only UCSC Genes (RefSeq,
GenBank, CCDS, Rfam, tRNAs and Comparative Genomics).
SNPs below the 0.05th or above the 99.95th percentile = 118 SNPs
high Han ancestry = 3 regions, 59 SNPs
chr10:72175772-72303464, 13 SNPs
Multiple genes: Homo sapiens eukaryotic translation initiation
factor 4E binding protein 2 (EIF4EBP2), mRNA; Homo sapiens
phosphatase domain containing, paladin 1 (PALD1), mRNA; Homo
sapiens nodal growth differentiation factor (NODAL), mRNA.
chr2:155507550-157032398, 43 SNPs
Homo sapiens potassium inwardly-rectifying channel, subfamily J,
member 3 (KCNJ3), transcript variant 3, mRNA. RefSeq Summary
(NM_001260509): Potassium channels are present in most mammalian
cells, where they participate in a wide range of physiologic
responses. The protein encoded by this gene is an integral membrane
protein and inward-rectifier type potassium channel. The encoded
protein, which has a greater tendency to allow potassium to flow into
a cell rather than out of a cell, is controlled by G-proteins and
plays an important role in regulating heartbeat. It associates with
three other G-protein-activated potassium channels to form a
heteromultimeric pore-forming complex that also couples to
neurotransmitter receptors in the brain and whereby channel
activation can inhibit action potential firing by hyperpolarizing the
plasma membrane. These multimeric G-protein-gated inwardly-rectifying
potassium (GIRK) channels may play a role in the pathophysiology of
epilepsy, addiction, Down's syndrome, ataxia, and Parkinson's
disease. Alternative splicing results in multiple transcript variants
encoding distinct proteins.
chr2:182130703-182175151, 3 SNPs
AK125001 - Homo sapiens cDNA FLJ43011 fis, clone BRTHA2015853.
low Han ancestry = 4 regions, 59 SNPs
chr2:127589505-127875189, 17 SNPs
Homo sapiens bridging integrator 1 (BIN1), transcript variant 3, mRNA.
Genetic Association Database: BIN1
CDC HuGE Published Literature: BIN1
Positive Disease Associations: Alzheimer Disease, Cholesterol, LDL,
Insulin Resistance, Potassium
Related Studies:
Alzheimer Disease
Xiaolan Hu et al. PloS one 2011, Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease., PloS one. [PubMed 21390209]
Alzheimer Disease
Paul Hollingworth et al. Nature genetics 2011, Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease., Nature genetics. [PubMed 21460840]
chr11:89858790-89914934, 2 SNPs
Homo sapiens N-acetylated alpha-linked acidic dipeptidase 2
(NAALAD2), mRNA.
chr11:26727812-27268044, 24 SNPs
Multiple genes: Homo sapiens solute carrier family 5 (sodium/glucose
cotransporter), member 12 (SLC5A12), mRNA; SLC5A12 -> insulin
resistance?; Homo sapiens butyrobetaine (gamma), 2-oxoglutarate
dioxygenase (gamma-butyrobetaine hydroxylase) 1 (BBOX1), mRNA; BBOX1
Positive Disease Associations: Cholesterol, LDL, Iron, Stroke;
Homo sapiens fin bud initiation factor homolog (zebrafish) (FIBIN),
mRNA
chr11:28372927-28656999, 16 SNPs
none (no UCSC Genes annotations)
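The extreme SNPs above were reported as contiguous regions. Below is a minimal sketch of how selected SNPs (e.g. the indices `x` or `y` from the code) could be collapsed into regions. It assumes ChromPosAll.txt holds the chromosome in column 1 and the base-pair position in column 2, in the same row order as `mn`; `snp_regions` and its `max_gap` threshold (1 Mb by default) are hypothetical choices, not the ones used to produce the list above.

```r
## Hypothetical helper: collapse selected SNPs into contiguous regions.
## chrom, pos: chromosome and base-pair position of each selected SNP
## max_gap:    assumed maximum distance (bp) between neighbouring SNPs in a region
snp_regions <- function(chrom, pos, max_gap = 1e6) {
  if (is.factor(chrom)) chrom <- as.character(chrom)
  o <- order(chrom, pos)            # sort by chromosome, then position
  chrom <- chrom[o]
  pos <- pos[o]
  n <- length(pos)
  ## a new region starts at the first SNP, at a change of chromosome,
  ## or after a gap larger than max_gap
  new_region <- c(TRUE, chrom[-1] != chrom[-n] | diff(pos) > max_gap)
  id <- cumsum(new_region)
  data.frame(
    chrom = as.vector(tapply(chrom, id, function(v) v[1])),
    start = as.vector(tapply(pos, id, min)),
    end   = as.vector(tapply(pos, id, max)),
    nsnps = as.vector(table(id))
  )
}

## toy example (made-up positions)
res <- snp_regions(chrom = c(2, 2, 2, 2, 10, 10),
                   pos = c(100, 500, 900, 5000, 200, 300),
                   max_gap = 1000)
print(res)
## three regions: chr2 100-900 (3 SNPs), chr2 5000 (1 SNP), chr10 200-300 (2 SNPs)
```

On the real data this would be called separately for the two tails, e.g. `snp_regions(chrm[y,1], chrm[y,2])` for the high-Han SNPs, so that each region contains only one kind of extreme. The function expects at least one SNP.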