created by Geraldine_VdAuwera
on 2016-05-27
A pedigree is a structured description of the familial relationships between samples.
Some GATK tools can incorporate pedigree information into their analysis if it is provided as a PED file through the --pedigree (or -ped) argument.
PED files are tabular text files describing metadata about the samples. See http://www.broadinstitute.org/mpg/tagger/faq.html and http://zzz.bwh.harvard.edu/plink/data.shtml#ped for more information.
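The PED file is a whitespace-delimited (space or tab) file. The first six columns are mandatory:
1. Family ID
2. Individual ID
3. Paternal ID
4. Maternal ID
5. Sex (1 = male; 2 = female; other = unknown)
6. Phenotype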
The IDs are alphanumeric: the combination of family and individual ID should uniquely identify a person. If an individual's sex is unknown, then any character other than 1 or 2 can be used in the fifth column.
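To make the column layout concrete, here is a minimal Python sketch of reading one PED row into its six mandatory fields and checking that each (family ID, individual ID) pair is unique. It is not GATK's parser; the names PedRecord, parse_ped_line and check_unique_ids are hypothetical, and the convention that a parent ID of "0" means "parent not in the file" is taken from the PLINK format description linked above.

```python
from dataclasses import dataclass

# Fifth-column sex codes: 1 = male, 2 = female, any other value = unknown.
SEX_CODES = {"1": "male", "2": "female"}


@dataclass(frozen=True)
class PedRecord:  # hypothetical container, not a GATK class
    family_id: str
    individual_id: str
    paternal_id: str  # "0" when the father is not in the file (PLINK convention)
    maternal_id: str  # "0" when the mother is not in the file (PLINK convention)
    sex: str          # "male", "female" or "unknown"
    phenotype: str    # raw text; its meaning depends on the whole file


def parse_ped_line(line: str) -> PedRecord:
    """Split one whitespace-delimited PED row into its six mandatory columns."""
    fields = line.split()  # no-argument split() treats any run of spaces/tabs as one delimiter
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}: {line!r}")
    fam, ind, pat, mat, sex, pheno = fields[:6]
    return PedRecord(fam, ind, pat, mat, SEX_CODES.get(sex, "unknown"), pheno)


def check_unique_ids(records) -> None:
    """Raise ValueError if two rows share the same (family ID, individual ID) pair."""
    seen = set()
    for rec in records:
        key = (rec.family_id, rec.individual_id)
        if key in seen:
            raise ValueError(f"duplicate individual: {key}")
        seen.add(key)
```

For example, parse_ped_line("FAM001 1 0 0 1 2") yields a male founder whose phenotype text is "2", and a row with "x" in the fifth column is read as sex "unknown".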
A PED file must have exactly one phenotype, in the sixth column. The phenotype can be either a quantitative trait or an "affected status" code: GATK automatically detects which type it is, based on whether any value other than 0, 1, 2 or the missing-phenotype code (-9) is observed.
Affected status should be coded as follows:
-9 = missing
0 = missing
1 = unaffected
2 = affected
If any value other than -9, 0, 1 or 2 is detected, the samples are assumed to have quantitative phenotype values, which are interpreted as strings.
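The detection rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not GATK's code: the function names are hypothetical, and it assumes the comparison is purely textual, so a value such as "2.0" would count as outside the affected-status codes and switch the whole file to quantitative mode.

```python
# Sixth-column values that are consistent with an "affected status" coding.
AFFECTED_STATUS_CODES = {"-9", "0", "1", "2"}


def phenotype_kind(values) -> str:
    """Classify a file's phenotype column from all of its sixth-column values.

    Assumption: values are compared as text, exactly as they appear in the file.
    """
    if all(v in AFFECTED_STATUS_CODES for v in values):
        return "affected_status"
    return "quantitative"  # kept as raw strings, matching the text above


def decode_affected_status(value: str) -> str:
    """Map an affected-status code to a label (-9 and 0 both mean missing)."""
    return {"1": "unaffected", "2": "affected"}.get(value, "missing")
```

Because a single out-of-range value changes the interpretation of every row, the check runs over the whole column rather than row by row.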
Note that genotypes (column 7 onwards) cannot be supplied to GATK through the PED file.
You can add a comment to a PED or MAP file by starting the line with a # character. The rest of that line will be ignored, so make sure none of the IDs start with this character.
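A minimal sketch of the comment rule, assuming a reader that drops any line whose first character is "#" and, as an extra assumption not stated above, also skips blank lines. The function name iter_ped_rows is hypothetical. It also shows why an ID must never start with "#": such a row would be silently discarded.

```python
def iter_ped_rows(lines):
    """Yield the whitespace-split fields of each data row in a PED file.

    `lines` can be any iterable of strings, such as an open file object.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # comment line: everything on it is ignored
        fields = line.split()
        if fields:  # assumption: blank lines are skipped rather than rejected
            yield fields
```

Usage: list(iter_ped_rows(open("family.ped"))) returns one list of fields per person, where family.ped is a hypothetical file name.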
Each -ped argument can be tagged with NO_FAMILY_ID, NO_PARENTS, NO_SEX or NO_PHENOTYPE to tell the GATK PED parser that the corresponding fields are missing from the PED file.
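The effect of these tags on the expected row layout can be modeled as follows. This is a hypothetical sketch: the tag names come from the text above, but the assumption that each tag simply removes its column(s), with NO_PARENTS removing both parent columns, is modeled on PLINK's behavior and is not taken from GATK's source. The helper names are invented for illustration, and the exact column-count check reflects the fact that genotype columns cannot be supplied.

```python
FULL_LAYOUT = ["family_id", "individual_id", "paternal_id", "maternal_id", "sex", "phenotype"]

# Assumed mapping from each tag to the column(s) it declares missing.
TAG_DROPS = {
    "NO_FAMILY_ID": {"family_id"},
    "NO_PARENTS": {"paternal_id", "maternal_id"},
    "NO_SEX": {"sex"},
    "NO_PHENOTYPE": {"phenotype"},
}


def expected_columns(tags):
    """Return the column names a row should contain once `tags` are applied."""
    dropped = set()
    for tag in tags:
        if tag not in TAG_DROPS:
            raise ValueError(f"unknown PED tag: {tag}")
        dropped |= TAG_DROPS[tag]
    return [col for col in FULL_LAYOUT if col not in dropped]


def row_to_dict(fields, tags=()):
    """Label a row's fields; assumes no extra (e.g. genotype) columns are present."""
    cols = expected_columns(tags)
    if len(fields) != len(cols):
        raise ValueError(f"expected {len(cols)} columns {cols}, got {len(fields)}")
    return dict(zip(cols, fields))
```

For example, with NO_FAMILY_ID, NO_PARENTS and NO_SEX, a row holds only the individual ID and the phenotype.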
Example
Here are two individuals (one row = one person):
FAM001 1 0 0 1 2
FAM001 2 0 0 1 2
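Both individuals above have 0 in the paternal and maternal columns, so they are founders (no parents in the file), and both are male (5th column = 1) and affected (6th column = 2). The sketch below loads the example and recovers those relationships. It is illustrative only: the helper names are hypothetical, and it assumes, as in PLINK, that parent IDs refer to individual IDs within the same family.

```python
EXAMPLE = """\
FAM001 1 0 0 1 2
FAM001 2 0 0 1 2
"""


def load_pedigree(text):
    """Map (family ID, individual ID) to that person's parents, sex and phenotype."""
    people = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # skip blank lines and comments
        fam, ind, pat, mat, sex, pheno = fields[:6]
        people[(fam, ind)] = {"father": pat, "mother": mat, "sex": sex, "phenotype": pheno}
    return people


def founders(people):
    """Individuals whose father and mother are both "0", i.e. not in the file."""
    return sorted(key for key, p in people.items() if p["father"] == "0" and p["mother"] == "0")


def children_of(people, family_id, parent_id):
    """Individual IDs in `family_id` that list `parent_id` as father or mother."""
    return sorted(
        ind
        for (fam, ind), p in people.items()
        if fam == family_id and parent_id in (p["father"], p["mother"])
    )
```

Usage: founders(load_pedigree(EXAMPLE)) returns [("FAM001", "1"), ("FAM001", "2")]. Adding a hypothetical third row "FAM001 3 1 0 2 1" would make individual 3 a child of individual 1.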
Updated on 2017-09-17
From egarmo on 2017-09-10
The second link, at harvard.edu, is broken.
From egarmo on 2017-09-10
http://felixfan.github.io/kinship/ shows a fairly ‘humane’ way to handle and plot .ped files, provided you are using R.
From egarmo on 2017-09-10
Which GATK tools actually get .ped files as input?
From Sheila on 2017-09-17
@egarmo
Hi,
I fixed the link. Thanks for pointing it out. As for GATK tools that use ped files, you will have to check the documentation for the tools, but a lot of the Genotype Refinement tools take in ped files.
-Sheila
From McClintock on 2019-02-09
> Each -ped argument can be tagged with NO_FAMILY_ID, NO_PARENTS, NO_SEX, NO_PHENOTYPE to tell the GATK PED parser that the corresponding fields are missing from the PED file.
Hi, I have a question: how do I tag this argument?
From drtamermansour on 2019-06-22
Hi, I want to confirm how GATK links the PED file to the VCF file. Does it match the individual ID in the PED file to the sample ID in the VCF file, or does it join the family ID and the individual ID in the PED file before matching them to the sample IDs in the VCF?