Can I run this module (postimp_navi) on GWAS summary statistics from a different source?
Yes, as long as the summary statistics are in the same format as the Ricopili output (e.g. the same column order).
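A minimal Python sketch of the reformatting step, assuming whitespace-delimited text with a single header line. The target column order and the rename map below are hypothetical placeholders: copy the real header from one of your own Ricopili daner files, since the exact column names (e.g. the frequency columns) depend on the run.

```python
from typing import Dict, List


def reorder_to_ricopili(lines: List[str], target: List[str],
                        rename: Dict[str, str]) -> List[str]:
    """Reorder whitespace-delimited sumstats lines so columns follow `target`.

    `rename` maps column names used by the foreign source to the names used
    in the Ricopili header; unmapped names are kept as they are.
    """
    header = [rename.get(col, col) for col in lines[0].split()]
    missing = [col for col in target if col not in header]
    if missing:
        raise ValueError(f"columns missing from input: {missing}")
    # Position of each target column in the foreign file.
    idx = [header.index(col) for col in target]
    out = [" ".join(target)]
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        out.append(" ".join(fields[i] for i in idx))
    return out


# Hypothetical foreign file with different column names and order.
foreign = [
    "rsid chrom pos effect_allele other_allele odds_ratio stderr pval",
    "rs123 1 12345 A G 1.05 0.02 0.01",
]
target = ["CHR", "SNP", "BP", "A1", "A2", "OR", "SE", "P"]  # placeholder order
rename = {"rsid": "SNP", "chrom": "CHR", "pos": "BP", "effect_allele": "A1",
          "other_allele": "A2", "odds_ratio": "OR", "stderr": "SE", "pval": "P"}
print("\n".join(reorder_to_ricopili(foreign, target, rename)))
```

The sketch only renames and reorders columns; it does not convert effect sizes, so a source that reports Beta instead of OR would need that conversion as well.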
Can I replicate PRS generated by danscore (Ricopili) with plink?
Yes, with one difference to keep in mind: danscore expects the effect as an OR (the pipeline converts it to log(OR) internally), whereas plink --score expects the effect as log(OR).
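A minimal Python sketch of the OR to log(OR) conversion, assuming a whitespace-delimited daner-style file whose header contains SNP, A1 and OR columns. It covers only the effect-size conversion; any SNP selection that danscore applies is not reproduced here.

```python
import math
from typing import List


def daner_to_plink_score(lines: List[str]) -> List[str]:
    """Turn daner-style lines (with SNP, A1 and OR columns) into a
    three-column SNP / A1 / log(OR) file for plink --score."""
    header = lines[0].split()
    snp, a1, odds = (header.index(c) for c in ("SNP", "A1", "OR"))
    out = ["SNP A1 LOG_OR"]
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[odds] in ("NA", "nan"):
            continue  # skip blank lines and SNPs without an estimate
        log_or = math.log(float(fields[odds]))  # natural log of the OR
        out.append(f"{fields[snp]} {fields[a1]} {log_or:.6g}")
    return out


daner = ["CHR SNP BP A1 A2 OR SE P",
         "1 rs1 100 A G 1.0 0.05 0.9",
         "1 rs2 200 C T 2.718281828 0.05 1e-8"]
print("\n".join(daner_to_plink_score(daner)))
# SNP A1 LOG_OR
# rs1 A 0
# rs2 C 1
```

Written to a file, the output could then be passed to PLINK 1.9 with something like plink --bfile target --score score.txt 1 2 3 header --out prs, where target, score.txt and prs are placeholder names.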
Can I use the Ricopili pipeline to generate plots from summary statistics from a different source? / How do I start the Ricopili postimputation module halfway through?
The sumstats must have the same format as the Ricopili output (e.g. the same column order).
Use the --results flag and pass it a text file listing the filenames of all the sumstats; you can also give just one file.
This generates the plots without running the whole module.
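One way to build the list file for --results, as a minimal Python sketch. The directory, list-file name and the daner_*.gz pattern are assumptions; absolute paths are written to be safe in case the module is run from another directory.

```python
from pathlib import Path
from typing import List


def write_results_list(sumstats_dir: str, list_file: str,
                       pattern: str = "daner_*.gz") -> List[str]:
    """Write one absolute sumstats path per line to `list_file`."""
    files = sorted(str(p.resolve()) for p in Path(sumstats_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {sumstats_dir}")
    Path(list_file).write_text("\n".join(files) + "\n")
    return files
```

For example, write_results_list("sumstats", "sumstats_list.txt") followed by postimp_navi --out <label> --results sumstats_list.txt, adding whatever other flags you normally use.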
Why do I get NA values in the output of the association analysis?
Genome-wide NAs: this is most likely due either to perfect correlation between your phenotype and one of the covariates (e.g. in logistic regression, the 1st PC from the population-structure PCA can perfectly explain your phenotype if all cases have positive PC1 values and all controls have negative ones) or to a monomorphic phenotype after subsetting to a group of individuals.
Occasional NAs: these most likely result from MAF filtering on SNPs. The default filter is 0.001, so markers with MAF below this value are not analyzed and all their association values are set to NA. (A workaround for this default filtering will be introduced soon.)
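A quick pre-check for these situations, as a minimal Python sketch (not Ricopili code). It assumes the usual 1 = control / 2 = case coding and per-individual covariate values in the same order as the phenotypes; the 0.001 threshold is the default mentioned above.

```python
from typing import Sequence


def is_monomorphic(phenotype: Sequence[int]) -> bool:
    """True if everyone in the (subsetted) sample has the same phenotype value."""
    return len(set(phenotype)) < 2


def perfectly_separates(phenotype: Sequence[int], covariate: Sequence[float],
                        case: int = 2, control: int = 1) -> bool:
    """True if some threshold on `covariate` splits cases from controls
    completely (complete separation), so logistic regression has no finite
    estimate."""
    cases = [c for p, c in zip(phenotype, covariate) if p == case]
    controls = [c for p, c in zip(phenotype, covariate) if p == control]
    if not cases or not controls:
        return False  # nothing to separate; see is_monomorphic instead
    return max(controls) < min(cases) or max(cases) < min(controls)


def below_maf_threshold(freq: float, threshold: float = 0.001) -> bool:
    """True if a marker with allele frequency `freq` would be dropped by an
    MAF filter at `threshold` (MAF = the smaller of freq and 1 - freq)."""
    return min(freq, 1.0 - freq) < threshold
```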
What does it mean if area_plot_16_speed throws this error message: missing: ..../hapmap_ref//debakker/genetic_map_chr17.txt
The reference has probably been moved after its creation. Please copy the files genetic_map_chr*.txt from refdir into the pop_* subdirectories.
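The copy step as a minimal Python sketch. It assumes the pop_* subdirectories sit directly inside refdir (a placeholder for your reference directory); compare with the path in your error message and adjust the destination pattern if your layout differs.

```python
import shutil
from pathlib import Path


def copy_genetic_maps(refdir: str, pop_glob: str = "pop_*") -> int:
    """Copy every genetic_map_chr*.txt in `refdir` into each subdirectory
    matching `pop_glob`; returns the number of files copied."""
    ref = Path(refdir)
    maps = sorted(ref.glob("genetic_map_chr*.txt"))
    pop_dirs = sorted(d for d in ref.glob(pop_glob) if d.is_dir())
    copied = 0
    for pop in pop_dirs:
        for genetic_map in maps:
            shutil.copy2(genetic_map, pop / genetic_map.name)
            copied += 1
    return copied
```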
I noticed that the LDSC output uses a population prevalence of 1%. How do I get LDSC output for a different population prevalence?
Go to the directory where you ran the postimp_navi module and change into the ./report_* directory. Run the following command with the desired population prevalence (e.g. 10%) and the daner file name, which has the form daner_*.gz and is named after the label given to the postimp_navi --out flag (e.g. MDD):
my.ldsc2 daner_MDD.gz --pop_prev 0.1
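Why the prevalence matters: LDSC reports liability-scale SNP heritability by rescaling the observed-scale h2 with a factor that depends on the population prevalence K and the sample prevalence P (the observed-to-liability conversion of Lee et al. 2011). The Python sketch below recomputes that factor with the standard library for intuition only. It assumes my.ldsc2 passes --pop_prev through to LDSC's usual conversion and is not a substitute for rerunning my.ldsc2; the h2 value is hypothetical.

```python
from statistics import NormalDist


def liability_factor(pop_prev: float, samp_prev: float) -> float:
    """Factor converting observed-scale h2 to liability-scale h2,
    given population prevalence K and sample prevalence P."""
    k, p = pop_prev, samp_prev
    threshold = NormalDist().inv_cdf(1.0 - k)  # liability threshold t
    z = NormalDist().pdf(threshold)            # normal density at t
    return k**2 * (1 - k)**2 / (p * (1 - p) * z**2)


# Hypothetical observed-scale h2 and a balanced case-control sample (P = 0.5).
h2_obs = 0.1
for k in (0.01, 0.1):
    print(k, round(h2_obs * liability_factor(k, 0.5), 4))
```

Only the liability-scale estimate changes with K; the observed-scale h2 does not depend on the population prevalence.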
Do I need special settings to run a quantitative trait analysis with the Postimputation module?
Please make sure that your phenotype values contain a period, e.g. add '.0' to integers. If the values are '1's and '2's, the Postimputation module will perform a case-control analysis rather than a quantitative trait analysis. This guidance applies to the Postimputation module run on a single cohort (e.g. postimp_navi --out cohort --mds PCcovs --coco 1,2,3,4,5 --pheno pheno_decimal.txt --idin cohort_keep --nohet --noldsc).
For meta-analysis (e.g. postimp_navi --out metaanalysis --result list_of_gzipped_daner_inputfiles.txt --noldsc --nolahunt), no phenotype file is supplied, and the meta-analysis runs as though it were a case-control analysis. However, the OR column in the daner output file can be converted to Beta (Beta = log(OR)), which is the correct result for a quantitative trait analysis (according to our cross-check with Plink 1.9, i.e. plink --meta-analysis daner_cohort1 daner_cohort2 daner_cohort3 + qt --out metaanalysis). The SE column in the meta-analysis daner output is the SE of log(OR), i.e. the SE of Beta.
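A minimal Python sketch of both conversions. It assumes a whitespace-delimited phenotype file with three columns (FID IID value) and no header, and treats -9 and NA as missing codes; these file-layout details are assumptions, so check them against your own phenotype file.

```python
import math
from typing import List

MISSING = {"-9", "NA"}  # assumed missing-value codes; adjust to your data


def to_decimal(value: str) -> str:
    """Append '.0' to plain integer phenotype values so they read as
    quantitative; other values are returned unchanged."""
    if value not in MISSING and value.lstrip("-").isdigit():
        return value + ".0"
    return value


def format_pheno_lines(lines: List[str]) -> List[str]:
    """Rewrite 'FID IID PHENO' lines so the phenotype contains a period."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fid, iid, pheno = fields
        out.append(f"{fid} {iid} {to_decimal(pheno)}")
    return out


def or_to_beta(odds_ratio: float) -> float:
    """Beta for a quantitative-trait meta-analysis: the natural log of the
    OR column in the daner output (its SE column is already the SE of Beta)."""
    return math.log(odds_ratio)


print(format_pheno_lines(["FAM1 ID1 1", "FAM2 ID2 2.5", "FAM3 ID3 -9"]))
# ['FAM1 ID1 1.0', 'FAM2 ID2 2.5', 'FAM3 ID3 -9']
```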