FAQs Imputation

  • What do different flavours of best guess genotype (*bgn,*bg,*bgs) refer to? What are the different filtering criteria applied to different best guess genotypes from Ricopili Imputation? Which best guess genotype should I use?

      • The different flavours of best guess genotype differ in their filtering criteria.

      • These files can be found in the "cobg_dir_genome_wide/" sub-directory of your imputation directory.

  • Does anyone know if I need to manually do a liftover if my SNPs are not in HG19?

    • There is no need to lift over any datasets; the pipeline does this automatically. First the build is guessed (down to hg16), and then the data are lifted over to hg19 if needed. Have a look at the subdirectory "pi_sub": the dataset named "*hg19.bed/bim/fam" is the one on hg19 (it will be exactly the same as your starting dataset if that is already mapped to hg19). For more details, have a look at the "*noma" files, which list the SNPs that are not mapped to each specific build; the build whose file has the lowest number of SNPs is the one the original dataset is mapped to.
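
    • The "lowest number of unmapped SNPs" rule can be checked by hand with a short script. The following is only a sketch, not part of Ricopili: it assumes each "*noma" file is plain text with one unmapped SNP per line, and the function names (count_unmapped, likely_build, summarise_pi_sub) are made up for illustration.

```python
from pathlib import Path


def count_unmapped(noma_path):
    """Count non-empty lines in one "*noma" file.

    Assumption: the file is plain text with one unmapped SNP per line.
    """
    with open(noma_path) as handle:
        return sum(1 for line in handle if line.strip())


def likely_build(unmapped_counts):
    """Return the key (file or build name) with the fewest unmapped SNPs."""
    if not unmapped_counts:
        raise ValueError("no *noma counts supplied")
    return min(unmapped_counts, key=unmapped_counts.get)


def summarise_pi_sub(pi_sub_dir="pi_sub"):
    """Count unmapped SNPs for every file matching *noma in pi_sub_dir."""
    counts = {
        path.name: count_unmapped(path)
        for path in Path(pi_sub_dir).glob("*noma")
    }
    return likely_build(counts), counts
```

    • Calling summarise_pi_sub("pi_sub") from your imputation directory would return the file name with the fewest unmapped SNPs together with all counts, which you can compare against what the pipeline reports.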

  • I’d like to perform imputations in only a few regions rather than genome-wide.

    • You can specify chunks to exclude from imputation. Save all the chunks that do not fall within your range(s) in a file, one per line (format below), and start imputation with: --refiex EXFILE

    • Here is an example EXFILE:

    • chr17_006_009

    • chr17_018_021

    • This format is based on the reference_info file, which splits the entire reference into chunks to be used for imputation. Each line of this file specifies a genomic chunk starting with chromosome number, followed by the basepair position borders of the genomic region that it spans (in megabases). This way you can find the correspondence of SNPs/regions to chunks and keep those of interest.

    • You can easily generate a template containing all chunks, for use with --refiex, for any reference panel by adding the --reference_info flag to the impute_dirsub command:

    • impute_dirsub --refdir <your_reference_directory> --out <out_dir_name> --reference_info

    • You can also add the --chunk INT flag, which creates chunks INT times larger than the original (possible values: 1, 2, 5, 10, 20). For example, on the Broad cluster the following command will produce a refiex_templ file with 969 chunks (i.e. the smallest chunk size) with the HRC reference:

    • impute_dirsub --refdir /psych/ripke/imputation_references/HRC_EGAZ00001239289_2016a/chr1_22c/ --reference_info --out errandout --chunk 1

    • You can put this file into your imputation directory and then grep inverse (grep -v) the chunks that interest you into your EXFILE.

    • (If you need to re-run, first remove the reference_info file).
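
    • If you would rather select chunks by genomic range than by name, the chunk naming described above (chromosome, then start and end in megabases) can be parsed directly. The sketch below is not part of Ricopili: it assumes chunk names look exactly like chr17_006_009 and that you supply the full list of chunk names yourself (for example, read from your template file); parse_chunk and chunks_to_exclude are hypothetical helper names.

```python
import re

# Assumed chunk-name pattern: chr<chromosome>_<start Mb>_<end Mb>,
# e.g. chr17_006_009 (chromosome 17, 6 Mb to 9 Mb).
CHUNK_RE = re.compile(r"^chr([0-9A-Za-z]+)_(\d+)_(\d+)$")


def parse_chunk(name):
    """Split a chunk name into (chromosome, start_mb, end_mb)."""
    match = CHUNK_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised chunk name: {name!r}")
    chrom, start_mb, end_mb = match.groups()
    return chrom, int(start_mb), int(end_mb)


def chunks_to_exclude(all_chunks, regions):
    """Return the chunks that overlap none of the regions of interest.

    regions is a list of (chromosome, start_mb, end_mb) tuples, with the
    chromosome given as a string such as "17". The returned names are
    the lines to write to the EXFILE passed to --refiex.
    """
    excluded = []
    for name in all_chunks:
        chrom, chunk_start, chunk_end = parse_chunk(name)
        overlaps = any(
            chrom == region_chrom
            and chunk_start < region_end
            and region_start < chunk_end
            for region_chrom, region_start, region_end in regions
        )
        if not overlaps:
            excluded.append(name)
    return excluded


if __name__ == "__main__":
    # Illustrative chunk list; in practice take the names from your template.
    chunks = ["chr17_003_006", "chr17_006_009", "chr17_009_012"]
    # Keep only chunks overlapping chr17:6.5-8.0 Mb; print the rest.
    for chunk in chunks_to_exclude(chunks, [("17", 6.5, 8.0)]):
        print(chunk)
```

    • Writing the returned names to a file, one per line, gives the same result as the grep -v approach; check a few chunks against the reference_info file before starting imputation.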

  • Can I use --serial with the imputation module?

    • It is only recommended if you have multiple cores available; in that case, please use it in combination with --sepa.

  • Is it OK to update sex after imputation? Is this information really used for imputation?

    • Sex information is used when chrX data is present (individuals with sex mismatches will be excluded). Have a look at the report. If there is no chrX data to start with, then there are only warnings during QC.

  • How can I use an external imputation server (Michigan or Sanger) in combination with Ricopili?

    • This works only with versions later than Jan 8th, 2019. Please follow this document here.