IMPUTE2

IMPUTE version 2 (also known as IMPUTE2) is a program for phasing observed genotypes and imputing missing genotypes, based on ideas from Howie et al. 2009 [1]. Most users need only a few of the program's basic functions, but the developers also provide a collection of specialized and powerful options.

If you are new to IMPUTE2, or to phasing and imputation in general, the IMPUTE developers provide materials for learning the basics [2].

Important Notes

- Request enough nodes, processors, and memory to meet the requirements of your jobs

Installed Versions

To list all available versions of IMPUTE2, run the following command. The same command works for other applications as well.

module avail impute

output:

---------------------- /usr/local/share/modulefiles -------------------------

impute/2.3.2

The default version is marked with "(default)" after the module name and can be loaded with:

module load impute

Other versions of IMPUTE2 can be loaded with:

module load impute/<version>
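A script can load a specific version and confirm that the impute2 executable is actually available before doing any work, which avoids a job failing partway through. The sketch below assumes the module system provides a `module` shell function and that loading the module puts `impute2` on PATH; the function name `ensure_impute2` is hypothetical, and the default version string is the one listed by "module avail impute" above.

```shell
#!/bin/bash
# Hypothetical helper: load a pinned IMPUTE2 module (if the module system is
# present) and print the path of the impute2 executable, or fail clearly.
ensure_impute2() {
    local version="${1:-2.3.2}"   # version shown by "module avail impute"
    # "module" is normally a shell function set up by the modules environment
    if type module >/dev/null 2>&1; then
        module load "impute/${version}" || return 1
    fi
    # command -v prints the resolved path and fails if impute2 is not found
    if ! command -v impute2 >/dev/null 2>&1; then
        echo "impute2 not found on PATH" >&2
        return 1
    fi
    command -v impute2
}
```

For example, `ensure_impute2 2.3.2 || exit 1` near the top of a batch script stops the job immediately if the program cannot be found.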

Running IMPUTE2

Interactive Job

Request a compute node using Slurm's srun command:

srun --x11 --nodes=1 -n 1 --mem=4gb --time=1:00:00 --pty /bin/bash

Load the module:

module load impute

Running from Command line

Copy the example directory to your user allocation:

cp -r /usr/local/doc/IMPUTE/2.3.2./Example .

cd Example

The following command, taken from the script pre-phasing.slurm, can also be run directly on the command line:

impute2 \
-prephase_g \
-m ./example.chr22.map \
-g ./example.chr22.study.gens \
-int 20.4e6 20.5e6 \
-Ne 20000 \
-o ./example.chr22.prephasing.impute2

output:

(last lines of output)

diploid sampling success rate: 0.989
haploid sampling success rate: (no haploid sampling performed)
-generating consensus haplotype estimates (minimizing switch error)
Have a nice day!
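When running this step by hand, a small wrapper can stop early with a clear message if an input file is missing, and confirm afterwards that the haplotype file read by the imputation step (example.chr22.prephasing.impute2_haps, passed to -known_haps_g in imputation.slurm) was written. This is a minimal sketch, assuming it is run from inside the Example directory with the module already loaded; the function name `run_prephasing` is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around the pre-phasing command shown above.
run_prephasing() {
    local map=./example.chr22.map
    local gens=./example.chr22.study.gens
    local out=./example.chr22.prephasing.impute2
    local f

    # Fail early if an input file is missing
    for f in "$map" "$gens"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done

    impute2 \
        -prephase_g \
        -m "$map" \
        -g "$gens" \
        -int 20.4e6 20.5e6 \
        -Ne 20000 \
        -o "$out" || return 1

    # The imputation step reads the estimated haplotypes from <out>_haps
    if [ ! -s "${out}_haps" ]; then
        echo "expected ${out}_haps was not written" >&2
        return 1
    fi
}
```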

Batch Job

The example provides two scripts for running the workflow as batch jobs: pre-phasing.slurm and imputation.slurm (the latter is shown below). To limit the files copied to the temporary directory $PFSDIR for the computation, the subdirectory ./base/ holds only the input files the job needs. All files are available by 'cp -r /usr/local/doc/IMPUTE2/'

#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --mem=4gb
#SBATCH -n 1

# Stage only the required input files in the temporary job directory
cp -r "$SLURM_SUBMIT_DIR"/base/* "$PFSDIR"
cd "$PFSDIR" || exit 1

module load impute

# Example code for performing imputation following prephasing
impute2 -use_prephased_g -m ./example.chr22.map -h ./example.chr22.1kG.haps \
-l ./example.chr22.1kG.legend -known_haps_g ./example.chr22.prephasing.impute2_haps \
-strand_g ./example.chr22.study.strand -int 20.4e6 20.5e6 -Ne 20000 \
-o ./example.chr22.one.phased.impute2 -phase

# Copy results back from the temporary directory to the submission ('home') directory or other permanent storage
cp -ru ./* "$SLURM_SUBMIT_DIR"

Submit the job:

sbatch imputation.slurm

The job's screen output is written to slurm-<jobid>.out, and the other output files are copied back to your working directory.
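Because imputation.slurm uses the haplotypes estimated by the pre-phasing step, the two jobs can be chained with a Slurm dependency so that imputation starts only after pre-phasing finishes successfully. This sketch assumes both scripts are in the current directory and that the imputation job can find the pre-phasing output (for example, because that file has been placed in ./base/); the function name `submit_pipeline` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit pre-phasing, then imputation with an afterok
# dependency so the second job runs only if the first one succeeds.
submit_pipeline() {
    local prephase_id
    # --parsable makes sbatch print just "jobid" or "jobid;cluster"
    prephase_id=$(sbatch --parsable pre-phasing.slurm) || return 1
    prephase_id=${prephase_id%%;*}   # drop an optional ";cluster" suffix
    sbatch --dependency="afterok:${prephase_id}" imputation.slurm
}
```

Run `submit_pipeline` from the Example directory; `squeue -u $USER` should then show the imputation job pending until the pre-phasing job completes.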

IMPUTE2 with SHAPEIT

A tutorial on pre-phasing and imputation using SHAPEIT and IMPUTE2 is available in the HPC SHAPEIT Guide.

Troubleshooting

IMPUTE2 is memory intensive, so jobs can fail with "out of memory" errors.

The IMPUTE2 website makes the following suggestion:

Splitting a chromosome into smaller chunks is often a good computational strategy anyway, since it allows the chunks to be imputed separately on multiple computer processors. This decreases the effective computing time and limits the amount of RAM needed for each run.
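Chunk boundaries can be generated with a short script. The sketch below prints non-overlapping start/end positions in base pairs, one pair per line, each of which can be passed to -int in a separate IMPUTE2 run with its own -o file name. The function name `make_chunks` and the 5 Mb default chunk size are assumptions for illustration; choose a size that suits your data and memory limits.

```shell
#!/bin/bash
# Hypothetical helper: split the region [start, end] into chunks of
# chunk_size bp and print "start end" for each chunk (inclusive bounds).
make_chunks() {
    local region_start=$1 region_end=$2 chunk_size=${3:-5000000}
    local start end
    for (( start = region_start; start <= region_end; start += chunk_size )); do
        end=$(( start + chunk_size - 1 ))
        # Clip the last chunk to the end of the region
        if (( end > region_end )); then
            end=$region_end
        fi
        echo "$start $end"
    done
}
```

For example, `make_chunks 20400001 20500000 40000` prints three intervals, the last of which is clipped to end at 20500000. Each line can then be submitted as its own job so the chunks run in parallel.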

Alternatively, request more memory, for example by requesting more processors, following the instructions at HPC High Memory Job. Such jobs may wait longer in the queue. Monitor the job's memory usage with the top command.
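top shows memory use interactively; for a record you can review after the job ends, a loop around ps can log the resident memory (RSS, in KiB) of a running process. This is a sketch assuming a Linux procps `ps`; the function name `log_rss` and its optional sample limit are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a timestamped RSS sample for a PID every
# <interval> seconds until the process exits or <max> samples are taken.
log_rss() {
    local pid=$1 interval=${2:-30} max=${3:-0} n=0 rss
    # ps exits non-zero once the process is gone, which ends the loop
    while rss=$(ps -o rss= -p "$pid"); do
        echo "$(date +%T) pid=$pid rss_kib=${rss//[[:space:]]/}"
        n=$(( n + 1 ))
        # Optional sample limit (0 means keep sampling until the process exits)
        if (( max > 0 && n >= max )); then
            break
        fi
        sleep "$interval"
    done
}
```

For example, after starting impute2 in the background with `&`, `log_rss "$!" 60 > memory.log` records one sample per minute until impute2 exits; the peak value suggests how much memory to request next time.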

References:

[1] B. N. Howie, P. Donnelly, and J. Marchini (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6): e1000529.

[2] Impute2 Home

[3] Example URL