IMPUTE2
IMPUTE version 2 (also known as IMPUTE2) is a program for phasing observed genotypes and imputing missing genotypes, based on ideas from Howie et al. 2009 [1]. Most people use only a few of the program's basic functions, but the developers have also built up a collection of specialized and powerful options.
If you are new to IMPUTE2, or to phasing and imputation in general, the IMPUTE2 developers provide materials for learning the basics [2].
Important Notes
Request nodes, processors, and memory adequately, as required by your jobs.
Installed Versions
All installed versions of IMPUTE2 can be listed by issuing the following command. The same approach works for other applications. At the time of writing, version 2.3.2 is installed on hpctest, and version 2.3.0 is installed on hpclogin.
module avail impute
output:
---------------------- /usr/local/share/modulefiles -------------------------
impute/2.3.2
The default version is identified by "(default)" after the module name and can be loaded with:
module load impute
The other versions of IMPUTE2 can be loaded with:
module load impute/<version>
Running IMPUTE2
Interactive Job
Request a node (using a Slurm call):
srun --x11 --nodes=1 -n 1 --mem=4gb --time=1:00:00 --pty /bin/bash
Load the module:
module load impute
Running from the Command Line
Copy the example directory to your user allocation:
cp -r /usr/local/doc/IMPUTE/2.3.2/Example .
cd Example
The following command may be extracted from the script pre-phasing.slurm, and run on the command line:
impute2 \
-prephase_g \
-m ./example.chr22.map \
-g ./example.chr22.study.gens \
-int 20.4e6 20.5e6 \
-Ne 20000 \
-o ./example.chr22.prephasing.impute2
output:
...last lines of output.
diploid sampling success rate: 0.989
haploid sampling success rate: (no haploid sampling performed)
-generating consensus haplotype estimates (minimizing switch error)
Have a nice day!
Batch Job
Two script files are available in the example directory for running batch jobs: pre-phasing.slurm and imputation.slurm (the latter is shown below). To limit which files are copied to $PFSDIR for the computation, a subdirectory ./base/ is prepared that contains only the necessary input files. All files can be obtained with 'cp -r /usr/local/doc/IMPUTE2/'
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --mem=4gb
#SBATCH -n 1
cp -r $SLURM_SUBMIT_DIR/base/* $PFSDIR
cd $PFSDIR
module load impute
# Example code for performing imputation following prephasing
impute2 -use_prephased_g -m ./example.chr22.map -h ./example.chr22.1kG.haps \
-l ./example.chr22.1kG.legend -known_haps_g ./example.chr22.prephasing.impute2_haps \
-strand_g ./example.chr22.study.strand -int 20.4e6 20.5e6 -Ne 20000 \
-o ./example.chr22.one.phased.impute2 -phase
# copy results back from temporary to 'home' directory or other permanent storage.
cp -ru * $SLURM_SUBMIT_DIR
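Because the script copies only ./base/* to $PFSDIR, every input file named in the impute2 command has to be present in ./base/ before the job is submitted. The following is a minimal sketch of such a check, not part of the provided example; the function name check_base_inputs is hypothetical, and the file list is simply taken from the command shown in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the example): report which input
# files of the imputation command are missing from a staging directory.
check_base_inputs() {
    local dir="${1:-./base}"   # staging directory, defaults to ./base
    local missing=0
    local f
    # File names as they appear in the impute2 command of imputation.slurm.
    for f in \
        example.chr22.map \
        example.chr22.1kG.haps \
        example.chr22.1kG.legend \
        example.chr22.prephasing.impute2_haps \
        example.chr22.study.strand
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when no input file is absent.
    [ "$missing" -eq 0 ]
}
```

After sourcing this definition, a call such as check_base_inputs ./base returns a non-zero status and lists the missing files on standard error if anything is absent, so it can be run just before sbatch.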
Submit the job:
sbatch imputation.slurm
Find the partial output in the file slurm-<jobid>.out and the other output files in your working directory.
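One quick way to inspect the Slurm output file is to look for the closing line that appears in the sample output earlier on this page. The sketch below assumes that "Have a nice day!" is printed only when IMPUTE2 finishes normally; that is an inference from the sample output, not a documented guarantee, and the function name impute2_finished is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a log file contains IMPUTE2's closing line.
# Assumption: the line "Have a nice day!" marks a normal finish.
impute2_finished() {
    local log="$1"
    # -q: no output, -F: match the text literally rather than as a regex.
    [ -f "$log" ] && grep -qF 'Have a nice day!' "$log"
}
```

For example, impute2_finished slurm-<jobid>.out (with the real job ID filled in) returns success when the line is present and failure when the file is missing or the line is absent.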
IMPUTE2 with SHAPEIT
The tutorial on Pre-phasing imputation using SHAPEIT and IMPUTE2 is available at HPC SHAPEIT Guide.
Troubleshooting
IMPUTE2 is memory intensive, so your job may fail with an "Out of Memory" error.
The IMPUTE2 website offers the following suggestion:
Splitting a chromosome into smaller chunks is often a good computational strategy anyway, since it allows the chunks to be imputed separately on multiple computer processors. This decreases the effective computing time and limits the amount of RAM needed for each run.
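The chunking strategy quoted above can be sketched as a small shell function that prints the interval boundaries, one chunk per line, for use with the -int option. This is an illustration only: the function name chunk_intervals, the region boundaries, and the 5 Mb chunk size are assumptions chosen for the example, and it is assumed that -int accepts plain integer base-pair positions as well as the 20.4e6 notation used earlier on this page.

```shell
#!/bin/bash
# Hypothetical helper: print "start end" pairs that cover the closed
# interval [start, end] in consecutive chunks of at most `size` positions.
chunk_intervals() {
    local start="$1" end="$2" size="$3"
    local lo hi
    # Guard against a non-positive chunk size, which would loop forever.
    if [ "$size" -le 0 ]; then
        echo "chunk size must be positive" >&2
        return 1
    fi
    lo="$start"
    while [ "$lo" -le "$end" ]; do
        hi=$((lo + size - 1))
        # The last chunk is truncated at the end of the region.
        if [ "$hi" -gt "$end" ]; then
            hi="$end"
        fi
        echo "$lo $hi"
        lo=$((hi + 1))
    done
}

# Example: split positions 20000001..35000000 into 5 Mb chunks.
chunk_intervals 20000001 35000000 5000000
# prints:
# 20000001 25000000
# 25000001 30000000
# 30000001 35000000
```

Each printed pair could then be passed to a separate impute2 run as the two arguments of -int, for instance one batch job per chunk, so that each run needs less RAM.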
You can also assign more memory to the job by requesting more processors, following the instructions at HPC High Memory Job. Your job may then remain in the queue for a longer period of time. Please monitor the memory usage with the top command.
References:
[1] B. N. Howie, P. Donnelly, and J. Marchini (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6): e1000529 [Open Access Article] [Supplementary Material]
[2] Impute2 Home
[3] Example URL