Day 10 - NHGRI GWAS catalogue

posted Jul 13, 2011, 7:44 AM by Colm O'Dushlaine   [ updated Jul 13, 2011, 8:04 AM ]
There is another nice resource out there called the Catalog of Published Genome-Wide Association Studies, available at http://www.genome.gov/gwastudies/. It has its limitations, but it is a nice list of associations that we can be pretty confident about. So what I decided to do here was to compare my 23andMe results with this list.

I first made a "light" version of the download (www.genome.gov/admin/gwascatalog.txt), selecting the fields of interest: trait, position, gene names, risk allele and risk-allele frequency in controls:

Adiposity       13      80959207        SPRY2   SPRY2 - ARF4P4  rs534870-A      rs534870        0.68
Adiposity       16      53816275        FTO     FTO     rs8050136-C     rs8050136       0.60
Schizophrenia   19      42066279        ATP5SL, CEACAM21        PLEKHA3P1 - CEACAM21    rs4803480-A     rs4803480       0.13
...
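One way to do this field selection (a sketch, not necessarily how the file above was made) is to pick columns by header name rather than by position, so the script survives column reordering between catalogue releases. It assumes gwascatalog.txt is tab-delimited with a header row containing the names listed in the BEGIN block (Disease/Trait, Chr_id, Chr_pos, Reported Gene(s), Mapped_gene, Strongest SNP-Risk Allele, SNPs, Risk Allele Frequency); those names and the function name make_lite are assumptions, so check them against the actual download.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a "light" catalogue with only the wanted columns.
# Assumes a tab-delimited file whose first line is a header of column names.
make_lite() {
  awk -F '\t' -v OFS='\t' '
    BEGIN {
      # Column names to keep, in output order (assumed header names).
      n = split("Disease/Trait;Chr_id;Chr_pos;Reported Gene(s);Mapped_gene;Strongest SNP-Risk Allele;SNPs;Risk Allele Frequency", want, ";")
    }
    { sub(/\r$/, "") }                # tolerate Windows line endings
    NR == 1 {
      # Map each header name to its column number.
      for (i = 1; i <= NF; i++) col[$i] = i
      for (j = 1; j <= n; j++)
        if (!(want[j] in col)) { print "missing column: " want[j] > "/dev/stderr"; exit 1 }
      next
    }
    {
      # Rebuild the line from the wanted columns only.
      line = $(col[want[1]])
      for (j = 2; j <= n; j++) line = line OFS $(col[want[j]])
      print line
    }
  ' "$1"
}
```

Usage would be something like `make_lite gwascatalog.txt > gwascatalog.txt.lite`. Failing loudly on a missing column is deliberate: a silently empty field would be much harder to spot later in the pipeline.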

I also wanted some Caucasian minor allele frequency (MAF) data, so I used the HapMap CEU samples (get_maf.pl is attached):

 # HapMap CEU genotype counts (hg18) from UCSC
 wget http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/hapmapSnpsCEU.txt.gz
 gunzip hapmapSnpsCEU.txt.gz
 perl get_maf.pl > ceu.maf
 # drop any lines containing "Warning" or "warning"
 grep -v arning ceu.maf > ceu.maf.clean
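get_maf.pl is attached rather than shown, so here is a rough awk sketch of the underlying calculation: each homozygote contributes two copies of its allele, each heterozygote one copy of each, and the MAF is the smaller allele count over the total. It assumes the UCSC hapmapSnpsCEU.txt columns are bin, chrom, chromStart, chromEnd, name, score, strand, observed, allele1, homoCount1, allele2, homoCount2, heteroCount (so columns 5, 10, 12 and 13 are used); verify this against the table schema before relying on it. The function name ceu_maf is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "rsid<TAB>MAF" for each SNP in a HapMap table.
# Assumed column layout (UCSC hapmapSnps table plus leading bin column):
#   5 = name (rsid), 10 = homoCount1, 12 = homoCount2, 13 = heteroCount
ceu_maf() {
  awk -F '\t' '
    {
      het = $13
      a1 = 2 * $10 + het              # copies of allele1 across all individuals
      a2 = 2 * $12 + het              # copies of allele2
      total = a1 + a2
      if (total == 0) next            # no genotypes at this SNP: skip it
      minor = (a1 < a2) ? a1 : a2
      printf "%s\t%.4f\n", $5, minor / total
    }
  ' "$1"
}
```

For example, 58 homozygotes for allele1, none for allele2 and 2 heterozygotes give 2 minor copies out of 120, a MAF of 0.0167.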

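I then ran parse.pl (attached; the command below calls it parse_gwas_catalogue.pl) on the light file to make gwascatalogue.res.txt, which adds my genotype, the risk-allele dose and the CEU MAF to each association: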

 perl parse_gwas_catalogue.pl gwascatalog.txt.lite
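parse.pl isn't reproduced here, but the genotype and dose columns in its output can be sketched in awk. This assumes a 23andMe raw data download (tab-delimited rsid, chromosome, position, genotype, with '#' comment lines) and the light catalogue layout shown earlier (column 6 is the SNP-risk allele such as rs534870-A, column 7 the rsid). The function name add_dose is hypothetical, no-calls ('--') are skipped, and the naive allele count makes no attempt to resolve strand, so a risk allele reported on the opposite strand to the 23andMe call will be miscounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append my genotype and risk-allele dose to each
# catalogue row whose SNP appears in a 23andMe raw data file.
#   $1 = 23andMe raw data, $2 = light catalogue
add_dose() {
  awk -F '\t' -v OFS='\t' '
    # First file: build rsid -> genotype (skip "#" comment lines).
    NR == FNR {
      sub(/\r$/, "")                  # tolerate Windows line endings
      if ($0 !~ /^#/ && NF >= 4) geno[$1] = $4
      next
    }
    # Second file: catalogue rows for SNPs that were genotyped.
    ($7 in geno) {
      n = split($6, parts, "-")       # "rs534870-A" -> "rs534870", "A"
      risk = (n >= 2) ? parts[2] : ""
      g = geno[$7]
      if (g == "--" || risk !~ /^[ACGT]$/) next   # no-call or unusable risk allele
      # Count copies of the risk allele in the genotype (0, 1 or 2).
      dose = 0
      for (i = 1; i <= length(g); i++)
        if (substr(g, i, 1) == risk) dose++
      print $0, g, dose
    }
  ' "$1" "$2"
}
```

A call such as `add_dose my_23andme_raw.txt gwascatalog.txt.lite` (file names illustrative) would print the catalogue rows with genotype and dose appended; the CEU MAF would then be a second join on the rsid.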


I wanted the SNPs that were rare. The file contains both the control frequency reported in each study and the CEU frequency. Here, I'm pulling out SNPs with a CEU frequency of 2.5% or less (column 11, ceu_maf), as I think the CEU figures are more accurate than the frequencies listed in the studies.

 gawk -F "\t" '$11 <= 0.025 {print  }' gwascatalogue.res.txt

disease chr     pos     gene    up..down        snp_allele      snp     risk_all_cont   cod_geno        dose    ceu_maf
Type 2 diabetes 9       10430602        PTPRD   PTPRD   rs649891-C      rs649891        0.35    CC      2       0.0167
Optic disc parameters   1       92077097        CDC7, TGFBR3    RPL39P13 - HSP90B3P     rs1192415-G     rs1192415       NR      GG      2       0.0167
Parkinson's disease     17      44828931        MAPT    NSF     rs199533-C      rs199533        0.78    AA      0       0.0167
Non-alcoholic fatty liver disease histology (AST)       9       78425925        Intergenic      OSTF1 - PCSK5   rs12344488-A    rs12344488      0.07    AA      2       0.0169
Optic disc parameters   1       92077097        CDC7,TGFBR3     RPL39P13 - HSP90B3P     rs1192415-G     rs1192415       0.18    GG      2       0.0167
Optic disc size (disc)  1       92077097        HSP90B3P        RPL39P13 - HSP90B3P     rs1192415-A     rs1192415       NR      GG      0       0.0167
Ankylosing spondylitis  5       96129512        ERAP1   ERAP1   rs27434-A       rs27434 0.23    AA      2       0.0167
Parkinson's disease     17      44828931        NSF     NSF     rs199533-C      rs199533        0.83    AA      0       0.0167
Parkinson's disease     17      43719143        MAPT,  C17orf69, KIAA1267, LOC644246, IMP5      C17orf69        rs393152-A      rs393152        0.82    GG      0       0.0167
Obesity (extreme)       10      37982097        ZNF248  MTRNR2L7 - TLK2P2       rs7474896-T     rs7474896       0.14    TT      2       0.0167
Rheumatoid arthritis    6       138002637       TNFAIP3, OLIG3  OLIG3 - TNFAIP3 rs10499194-C    rs10499194      0.71    TT      0       0.0167

The genotypes in the third-from-last column (cod_geno, e.g. 'CC') are mine, and the 'dose' field is how many copies of the risk allele I carry (0, 1 or 2). Of the above results, I think the most interesting are the ones where I carry two copies of the risk allele:

Type 2 diabetes 9       10430602        PTPRD   PTPRD   rs649891-C      rs649891        0.35    CC      2       0.0167
Optic disc parameters   1       92077097        CDC7, TGFBR3    RPL39P13 - HSP90B3P     rs1192415-G     rs1192415       NR      GG      2       0.0167
Non-alcoholic fatty liver disease histology (AST)       9       78425925        Intergenic      OSTF1 - PCSK5   rs12344488-A    rs12344488      0.07    AA      2       0.0169
Optic disc parameters   1       92077097        CDC7,TGFBR3     RPL39P13 - HSP90B3P     rs1192415-G     rs1192415       0.18    GG      2       0.0167
Ankylosing spondylitis  5       96129512        ERAP1   ERAP1   rs27434-A       rs27434 0.23    AA      2       0.0167
Obesity (extreme)       10      37982097        ZNF248  MTRNR2L7 - TLK2P2       rs7474896-T     rs7474896       0.14    TT      2       0.0167
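The shortlist above is the rare-SNP table restricted to rows with a dose of 2. A minimal awk sketch of that filter, assuming gwascatalogue.res.txt is tab-delimited with dose in column 10 and ceu_maf in column 11 as in the results above (the function name shortlist is hypothetical). The numeric check on column 11 also drops a header line and any row with an empty CEU frequency, which a bare `$11 <= 0.025` would let through, since awk compares an empty field against the constant as a string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep rare SNPs (CEU MAF <= 2.5%) where I am
# homozygous for the risk allele (dose == 2).
#   column 10 = dose, column 11 = ceu_maf
shortlist() {
  awk -F '\t' '$11 ~ /^[0-9.]+$/ && $11 + 0 <= 0.025 && $10 == 2' "$1"
}
```

Running `shortlist gwascatalogue.res.txt` should print only the matching rows, unchanged.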

I have bad eyesight (very bad!), so I'm intrigued by the optic disc findings! The obesity, T2D and similar hits are less interesting to me, as these are common traits with complex genetic bases.


Attachments:
get_maf.pl (3k)
parse.pl (3k)