Generate (beautiful) Manhattan and QQ plots with ggplot2 and ggrastr

This photo was taken at 5:09 AM on May 27, 2017, during a voyage from Kagoshima to Kuchinoshima.

Since the Manhattan plot draws millions (sometimes tens of millions) of points, it is not wise to output the plots in vector format. To generate and save Manhattan plots, we usually output them in PNG format, basically using an R package 'qqman.' But it's not beautiful when enlarged, and more importantly, it's obviously inferior when lined up with other raster format graphics.

Today I usually use ggplot2 and ggrastr to generate a better Manhattan plot. Using ggrastr, we can rasterize plots but keep lines, text, and labels in vector format (link).

Launch R.

Load packages.

library(data.table)

library(ggplot2)

library(ggrastr)

library(patchwork)

Load EWAS result (consist of chr, pos, and p-value).

df=as.data.frame(fread("EWAS.res",header=T),stringsAsFactors=F)

head(df,4)

# chr pos pval

# 1 12345 0.9882

# 1 12349 0.6454

# 1 12368 0.8322

# 1 12552 0.2339

df$chr=factor(df$chr,levels=as.character(1:22))

# [if you make and use a dummy dataset]

chr=rep(1:22,1000)

df=data.frame(chr=chr[sort(chr)],pos=rep(1:1000,22),pval=runif(max=1,min=0,n=22*1000),stringsAsFactors=F)

Calculate cummulative value of position. This will be used as the position on X-axis on Manhattan plot.

df$pos2=df$pos

for(chr in 2:22){

tmp=chr-1

tmp2=max(as.numeric(df[df$chr==tmp,"pos2"]),na.rm=T)

df[df$chr==chr,"pos2"]=df[df$chr==chr,"pos"]+tmp2

}

Calculate center positions of each chromosome on X-axis. This will be used to set the positions of chromosome numbers.

df2=data.frame(chr=1:22,center=NA,stringsAsFactors=F)

for(chr in 1:22){

start=min(df[df$chr==chr,"pos2"],na.rm=T)

end=max(df[df$chr==chr,"pos2"],na.rm=T)

center=mean(c(start,end))

df2[df2$chr==chr,"center"]=center

}

Make a repeating pattern of two colors. My favorite is:

colour1="#2e4057"

colour2="#8c8c8c"

Get Bonferroni-corrected threshold.

bonf=0.05/nrow(df)

Ready to plot.

Manhattan plot:

g1=ggplot(df,aes(x=pos2,y=-log10(pval),colour=as.character(chr)))+

geom_hline(yintercept=-log10(bonf),linetype="dashed")+

geom_point_rast(size=0.5)+

scale_colour_manual(values=c("1"=colour1,"2"=colour2,"3"=colour1,"4"=colour2,"5"=colour1,"6"=colour2,"7"=colour1,"8"=colour2,"9"=colour1,"10"=colour2,"11"=colour1,"12"=colour2,"13"=colour1,"14"=colour2,"15"=colour1,"16"=colour2,"17"=colour1,"18"=colour2,"19"=colour1,"20"=colour2,"21"=colour1,"22"=colour2))+ # not cool though

theme_classic()+theme(legend.position="NONE")+

scale_x_continuous(breaks=df2$center,labels=c(1:15,"",17,"",19,"",21,""))+

xlab("Chromosome")+

ylab(expression(paste(-log[10],"(",italic(P), " value)")))+

geom_hline(yintercept=-log10(bonf),linetype="dashed")

QQ plot:

df_nrow=nrow(df)

exp.pval=(1:df_nrow-0.5)/df_nrow

exp.pval.log=as.data.frame(-log10(exp.pval))

var.pval=df$pval[order(df$pval)]

var.pval.log=as.data.frame(-log10(var.pval))

N=df_nrow

cupper=-log10(qbeta(0.95,1:N,N-1:N+1))

clower=-log10(qbeta(1-0.95,1:N,N-1:N+1))

df2=cbind(exp.pval.log,var.pval.log,cupper,clower)

colnames(df2)=c("expected","var","cup","clow")

g2=ggplot(df2)+geom_point_rast(aes(x=expected,y=var),colour="black",size=0.5)+geom_abline(slope=1, intercept=0,colour="grey")+geom_line(aes(expected,cup),linetype=2)+geom_line(aes(expected,clow),linetype=2)+theme_bw()+theme(panel.grid.major=element_blank(),panel.grid.minor=element_blank())+xlab(expression(paste(-log[10],"(expected ", italic(P), " value)")))+ylab(expression(paste(-log[10],"(observed ", italic(P), " value)")))+theme_classic()

g=g1+g2+plot_layout(widths=c(2,1))+plot_annotation(tag_levels='a')+theme(plot.tag=element_text(size=18))

ggsave("man+qq.pdf",g,width=9,height=3)

Then, plots will be generated.

Points are generated in raster format while lines and texts are in vector format.

Page updated

Google Sites

Report abuse