GTEx Data and Analysis FAQs

A1) How can I access the RNA sequence data?

For privacy reasons, NIH policy prevents us from releasing the raw sequence data via the GTEx portal. These data are available through dbGaP.

A2) What does a variant ID of "chr8:73811903:D" or "chr8:73809509:I" mean?

All variant IDs are from the 1000 Genomes Project, obtained during imputation using 1000 Genomes as the reference panel. Some of the variants are small insertions and deletions. These variant IDs begin with the chromosome and the position of the first base, and end with 'D' for a deletion or 'I' for an insertion. For details on the chromosome position (genome build 37) and the REF and ALT alleles of all variants used in the GTEx eQTL analysis, you can download the file GTEx_var_genot_imputed_info4_maf05_CR95_CHR_POSb37_ID_REF_ALT.txt.zip from the "Datasets" link, under the Reference header.
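As an illustration, an ID of this form can be split into its parts with a few lines of Python. This is only a minimal sketch: the function name parse_indel_variant_id is hypothetical and not part of any GTEx tool, and it handles only the chr:position:D/I form described above.

```python
def parse_indel_variant_id(variant_id):
    """Split an ID such as 'chr8:73811903:D' into (chromosome, position, kind)."""
    chrom, pos, code = variant_id.split(":")
    kinds = {"D": "deletion", "I": "insertion"}
    if code not in kinds:
        raise ValueError(f"not an indel-style variant ID: {variant_id!r}")
    # Drop the 'chr' prefix if present and convert the position to an integer.
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(pos), kinds[code]
```

For example, parse_indel_variant_id("chr8:73811903:D") separates the chromosome ("8"), the position of the first base (73811903), and the variant type ("deletion").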

A3) What does the sample ID for an RNA-Seq or genotype sample stand for, such as GTEX-14753-1626-SM-5NQ9L?

The sample ID for an RNA-Seq or genotype sample is made up of the following 3 components, joined by dashes, as illustrated by the example "GTEX-14753-1626-SM-5NQ9L":

  1. "GTEX-YYYYY" (e.g. GTEX-14753) represents the GTEx donor ID. This ID should be used to link between the various RNA-Seq and genotype samples that come from the same donor.
  2. "YYYY" (e.g., "1626") mostly refers to the tissue site, BUT we do not recommend using it for tissue site designation. Sample mix-ups sometimes occur and are corrected, but this part of the ID does not change when that happens. The accurate tissue site designation for all samples can be obtained from the "Tissue Site Detail" field (encoded as "SMTSD") in the Sample Attributes file [Datasets->Download->GTEx_Data_V6_Annotations_SampleAttributesDS.txt].
  3. "SM-YYYYY" (e.g., SM-5NQ9L) is the RNA or DNA aliquot ID used for sequencing.

'Y' stands for any number or capital letter.
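The three components can be pulled apart programmatically, for instance to link samples from the same donor. The following is a minimal sketch, assuming IDs follow exactly the five-field layout of the example above; the names GtexSampleId and parse_sample_id are hypothetical.

```python
from typing import NamedTuple


class GtexSampleId(NamedTuple):
    donor_id: str    # e.g. "GTEX-14753"; use this to link samples from one donor
    site_code: str   # e.g. "1626"; NOT a reliable tissue designation (use SMTSD)
    aliquot_id: str  # e.g. "SM-5NQ9L"


def parse_sample_id(sample_id):
    """Split a sample ID of the form GTEX-<donor>-<site>-SM-<aliquot>."""
    parts = sample_id.split("-")
    if len(parts) != 5 or parts[0] != "GTEX" or parts[3] != "SM":
        raise ValueError(f"unexpected sample ID layout: {sample_id!r}")
    return GtexSampleId("-".join(parts[:2]), parts[2], "-".join(parts[3:]))
```

Grouping samples by the donor_id field then collects the RNA-Seq and genotype samples that come from the same donor.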

A4) How can I map GTEx variant IDs to dbSNP rs IDs?

A lookup table is available on the GTEx Portal Datasets Page (for release V7, the file is GTEx_Analysis_2016-01-15_v7_WholeGenomeSeq_635Ind_PASS_AB02_GQ20_HETX_MISS15_PLINKQC.lookup_table.txt.gz).
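Because the lookup table is large, it can be convenient to stream it and keep only the variants of interest. The sketch below assumes the table is tab-delimited with a header row; the column names "variant_id" and "rs_id_dbSNP147_GRCh37p13" are assumptions, so check the header of the file you downloaded and pass the actual names. The function name map_variants_to_rsids is hypothetical.

```python
import csv
import gzip


def map_variants_to_rsids(path, wanted_ids,
                          id_col="variant_id",
                          rsid_col="rs_id_dbSNP147_GRCh37p13"):
    """Return {variant ID: rs ID} for the requested variant IDs.

    The default column names are assumptions; verify them against the file header.
    """
    wanted = set(wanted_ids)
    found = {}
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Stream row by row so the whole table never has to fit in memory.
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[id_col] in wanted:
                found[row[id_col]] = row[rsid_col]
    return found
```

Variant IDs that are absent from the table are simply missing from the returned dictionary.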

A5) Why are some ischemic times less than zero?

In the sample annotation file, the samples have a SMTSISCH value which indicates minutes of ischemia time. Some of these values are less than zero. Is this time calculated from the time the patient is pronounced dead or when the heart is no longer pumping or when the ventilator is stopped, or all of the above?

Sample-specific ischemic time is defined as the time from death or withdrawal of life support until the time the sample is placed in a fixative solution or frozen. So it's all of those scenarios, depending on the particular patient, although NOT from when the person is pronounced dead, but rather from the actual time of death (or as close to it as possible for rapid autopsy patients). The negative times should appear only for blood samples and represent samples that were collected pre-mortem. Those were all from organ donor ventilator cases where life support was about to be shut off, and the patient would have been perfused prior to organ harvest, so blood was taken prior to that happening in those cases.
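To see which samples were collected pre-mortem, one can scan the sample annotation rows for negative SMTSISCH values. This is a minimal sketch: it assumes each row is a dictionary (for example from csv.DictReader with a tab delimiter), that the sample identifier column is named "SAMPID" (an assumption), and that missing values appear as empty strings or "NA". The function name premortem_samples is hypothetical.

```python
def premortem_samples(rows, time_col="SMTSISCH", id_col="SAMPID"):
    """Return the IDs of samples whose ischemic time (minutes) is below zero."""
    ids = []
    for row in rows:
        value = row.get(time_col, "")
        if value in ("", "NA"):
            continue  # no ischemic time recorded for this sample
        if float(value) < 0:
            ids.append(row[id_col])
    return ids
```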

A6) Was the RNA-seq protocol for GTEx strand specific?

No. RNA-seq was performed using the Illumina TruSeq library construction protocol. This is a non-strand specific polyA+ selected library. For more details, please visit our documentation page: https://gtexportal.org/home/documentationPage

A7) Where can I find the sample annotations for GTEX-111CU-1826-SM-5GZYN? I searched in the biobank but I could not find that sample identifier.

The GTEx biobank inventory contains information about samples that are currently in our freezers. The sample aliquots that were used for genotyping and RNA-seq were used up during processing, so they will not appear in the biobank inventory. The biobank inventory should contain related parent samples, and searching for the sample identifier GTEX-111CU-1826-SM-5GZYN should return those related samples (providing they have not been depleted).

To find the sample and subject annotations for samples used in an analysis release, please use the sample and subject annotation files. You can download these files here: https://gtexportal.org/home/datasets

A8) Why are there samples in dbGaP for donors that do not have genotypes in the VCF file?

The RNA-Seq and genotyping processes are run separately. All donors will eventually be genotyped, but their genotypes may not have been produced and QC'd in time for the current release. Some donors have been excluded from the VCF due to relatedness or for biological or clinical reasons (see also below).

A9) If a donor is found to have a pre-existing condition, are any targeted organs excluded from the specimen collection process?

GTEx sample exclusion is done at the donor level. Donors were excluded from the analysis freeze if the donor was a 'biological outlier', e.g. had a large chromosomal duplication or deletion -- duplication of a whole chromosome or chromosome arm or a large CNV (>1Mb) associated with a known syndrome based on literature search, such as OMIM -- or underwent transgender surgery. We have also removed all but one individual from related donor subsets (primarily pairs). To date, we have not excluded specific donors from specific tissues based on their cause of death or medical history.

A10) Why is the effect direction of some eQTLs opposite to that reported elsewhere?

The effect sizes of eQTLs are defined as the effect of the alternative allele (ALT) relative to the reference (REF) allele in the human genome reference. In other words, the eQTL effect allele is the ALT allele, not the minor allele.
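When comparing with a study that reports effects relative to the minor allele, the sign may need to be flipped. A minimal sketch, assuming you know the ALT allele frequency in the GTEx samples; the function name effect_relative_to_minor_allele is hypothetical.

```python
def effect_relative_to_minor_allele(slope_alt, alt_allele_freq):
    """Re-express an ALT-vs-REF effect as a minor-vs-major allele effect."""
    if not 0.0 <= alt_allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # If ALT is the major allele, the minor allele is REF, so the sign flips.
    return -slope_alt if alt_allele_freq > 0.5 else slope_alt
```

For instance, an ALT effect of 0.3 at an ALT frequency of 0.8 corresponds to a minor-allele effect of -0.3, which explains an apparently opposite direction.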

A11) What is the difference between the tissue sites Brain - Frontal Cortex and Brain - Cortex, Brain - Cerebellum and Brain - Cerebellar Hemisphere?

The Brain - Frontal Cortex and Brain - Cortex samples, and the Brain - Cerebellum and Brain - Cerebellar Hemisphere samples, should be considered sample duplicates. One member of each pair (Brain - Cortex and Brain - Cerebellum) was sampled at the same time as the donor's non-brain tissue samples and was preserved in PAXgene tissue fixative solution.

The remaining whole brain was then shipped to the University of Miami Brain Endowment Bank, where 8-11 brain sub-regions were sampled. The Brain - Frontal Cortex and Brain - Cerebellar Hemisphere were re-sampled at this time, as close as possible to the original sampling sites. All brain sub-regions sampled at the Miami Brain Bank were preserved by snap freezing. Hence the paired brain regions differ in the time of sampling (those re-sampled at the Brain Bank have a longer ischemic time) and in the manner in which the sample was preserved.

A12) Why are the results in release v7 different from those in release v6p? My SNP of interest had a significant eQTL in v6p, but in v7 it no longer has a significant eQTL.

There may be multiple reasons for the observed differences. v7 has a larger sample size than v6p. The eQTL calls in v7 are based on whole genome sequencing data, and the RNA-Seq data have been reprocessed. eQTLs that lose significance from one release to the next were likely only weakly significant. You can verify this in the *egenes.txt.gz files from v6p by comparing pval_nominal to pval_nominal_threshold. Since the v7 data are better powered and of higher quality, we recommend using the v7 results.
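One way to judge how weakly significant an association was is to express the gap between pval_nominal and pval_nominal_threshold in orders of magnitude. This is a sketch under the assumption that an association passes when pval_nominal is at or below pval_nominal_threshold; the function name significance_margin is hypothetical.

```python
import math


def significance_margin(pval_nominal, pval_nominal_threshold):
    """Orders of magnitude by which the p-value clears the threshold.

    Positive values pass the threshold; values near zero are borderline.
    """
    if pval_nominal <= 0 or pval_nominal_threshold <= 0:
        raise ValueError("p-values must be positive")
    return math.log10(pval_nominal_threshold) - math.log10(pval_nominal)
```

A margin close to zero in v6p suggests the association was borderline and could plausibly drop out after reprocessing.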

A13) Why are isoform and gene expression estimated separately using two different tools (RNA-SeQC for gene expression and RSEM for isoform expression)?

We found that eQTL discovery is better powered using the gene-level expression estimates generated with RNA-SeQC. The RSEM estimates are based on combining isoform-level estimates, which adds uncertainty to the resulting gene-level values (the isoform-level estimates are highly inaccurate in some cases).

A14) How was the autolysis score measured?

The autolysis score was assigned by a pathologist during a visual inspection of the histology image. The assigned values ranged from 0 to 3 (None, Mild, Moderate, and Severe).

A15) I have access to the GTEx BAM files on dbGaP, but I need FASTQ files. How can I obtain the GTEx FASTQ files?

Only BAM files are available on dbGaP, but it is easy to generate FASTQs from them using Picard tools. We provide a wrapper here: https://github.com/broadinstitute/gtex-pipeline/blob/master/rnaseq/src/run_SamToFastq.py
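For orientation, a direct call to Picard's SamToFastq tool can be assembled as below. This is not the linked wrapper, only a minimal sketch that assumes paired-end reads, a local picard.jar (the path is an assumption), and Java on the PATH; the function name sam_to_fastq_command is hypothetical.

```python
def sam_to_fastq_command(bam_path, fastq1, fastq2, picard_jar="picard.jar"):
    """Build the argument list for a Picard SamToFastq run on a paired-end BAM."""
    return [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={bam_path}",               # input BAM
        f"FASTQ={fastq1}",             # first-in-pair reads
        f"SECOND_END_FASTQ={fastq2}",  # second-in-pair reads
    ]
```

The resulting list can be passed to subprocess.run(command, check=True) to execute the conversion.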

A16) How should eQTL effect sizes be interpreted?

The portal displays two values that are related to the eQTL effect sizes:

Allelic fold-change (aFC), a measure of cis-eQTL effect size, is defined as the log2 ratio of the expression of the haplotype carrying the alternative eVariant allele to the expression of the haplotype carrying the reference allele. Currently, due to computational limitations, only the aFC of the most significant variant of each eGene is available.

Normalized effect size (NES), previously known as the effect size on the portal, is defined as the slope of the linear regression, and is computed as the effect of the alternative allele (ALT) relative to the reference allele (REF) in the human genome reference GRCh37/hg19 (i.e., the eQTL effect allele is the ALT allele). NES are computed in a normalized space where magnitude has no direct biological interpretation.

eVariant: a significant variant associated with an eGene.
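Because aFC is on a log2 scale, it converts to a plain expression ratio by exponentiation. A minimal sketch (the function name afc_to_fold_change is hypothetical):

```python
def afc_to_fold_change(log2_afc):
    """Convert a log2 allelic fold-change into an ALT/REF expression ratio."""
    return 2.0 ** log2_afc
```

An aFC of 1 therefore means the ALT haplotype is expressed at twice the level of the REF haplotype, and an aFC of -1 means half the level. No equivalent conversion applies to NES, which is computed in a normalized space.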

A17) What are TPM units? How were these calculated?

Gene and transcript expression on the GTEx Portal are shown in Transcripts per Million (TPM), calculated as

$$\mathrm{TPM}_t = 10^6 \cdot \frac{n_t / l_t}{\sum_{s \in T} n_s / l_s}$$

where n_t is the number of reads for transcript/gene t, l_t is the normalized transcript/gene length, and T is the set of all transcripts/genes. For additional information, see https://academic.oup.com/bioinformatics/article/26/4/493/243395
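In words, TPM as usually defined divides each read count by its length and rescales the results so that they sum to one million. The sketch below follows that definition with plain dictionaries; the function name tpm and its input layout are illustrative assumptions, not the pipeline's actual code.

```python
def tpm(read_counts, lengths):
    """Compute TPM from per-transcript read counts and normalized lengths.

    Both arguments are dictionaries keyed by transcript/gene identifier.
    """
    # Length-normalized read rate for each transcript/gene.
    rates = {t: read_counts[t] / lengths[t] for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads: TPM is undefined")
    # Rescale so that the values sum to one million.
    return {t: 1e6 * rate / total for t, rate in rates.items()}
```

By construction the returned values sum to one million within a sample, so TPM values are relative abundances within that sample.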

A18) Where can I find clinical information about the donors?

The publicly available donor phenotypes and sample annotations can be downloaded from the GTEx Portal in the Annotation section of each release here: https://gtexportal.org/home/datasets

Due to the nature of our donor consent, the public phenotypes and annotations are limited. If you need access to protected phenotypes and annotations, you can apply for access via dbGaP here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v7.p2

GTEx Releases

GTEx_release_table_portal_V7_090617.xlsx

GTEx Portal-specific FAQs

P1) Why are there different numbers of tissues available on the Search eQTLs page?

The "Search Precomputed Significant eQTLs" section allows you to look up cis eQTLs which have been precomputed in a +/- 1Mb cis window around the transcription start site (TSS). Significance was determined using a Q-value threshold. At least 70 samples per tissue are necessary to achieve the statistical power needed for this type of analysis.

In contrast, the "Test Your Own SNP-Gene Associations" section allows you to compute, on demand, an association between a SNP and gene of your choice. The association may be cis or trans. This calculation may be performed in tissues for which we have more than 10 samples. No Q-value filtering is performed and the user is left to interpret the significance of the p-value.
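The two sample-size rules can be summarized as a simple check per tissue. This is an illustrative sketch only, using the thresholds stated above (at least 70 samples for precomputed eQTLs, more than 10 for on-demand tests); the function name eligible_analyses is hypothetical.

```python
def eligible_analyses(sample_counts):
    """Map each tissue to the portal analyses its sample count allows."""
    result = {}
    for tissue, n_samples in sample_counts.items():
        result[tissue] = {
            "precomputed_eqtls": n_samples >= 70,  # at least 70 samples
            "on_demand_test": n_samples > 10,      # more than 10 samples
        }
    return result
```

A tissue with, say, 40 samples would appear in the on-demand section but not among the precomputed eQTLs, which is why the two tissue lists differ in length.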

P2) What does Ref Allele mean on the eQTL plot?

REF stands for reference allele, as determined by the hg19/GRCh37 human genome reference. ALT stands for alleles that are alternate in comparison to the reference. The variant IDs are from the 1000 Genomes Project. A file with the chromosome positions (genome build 37) and the REF and ALT alleles of all variants used in the GTEx eQTL analysis can be downloaded from the "Datasets" page, under the Reference header: GTEx_var_genot_imputed_info4_maf05_CR95_CHR_POSb37_ID_REF_ALT.txt.zip. Information on the minor allele, including allele frequencies and nucleotides, can be found on dbSNP. For example, the top SNP in Lung is rs2687967. That link leads to dbSNP, which shows that the reference allele is G and the alternate allele is A.

P3) What browsers does the GTEx portal support?

The GTEx Portal is tested on the latest versions of Chrome and Firefox, and Safari version 7+. Please note that the GTEx Portal will not work properly with Internet Explorer.

P4) Why is the GTEx Portal not working for me after the latest update?

While we have taken steps to reduce the chance of this happening, sometimes browsers cache important portal files and do not recognize that they have changed. If you are having problems with the GTEx Portal, please try clearing your cache first. If that doesn't solve the problem, then please contact us.

P5) Can I use one of the figures in the GTEx Portal in my paper?

Yes, you are free to use figures in the GTEx Portal in your publications. Most of the figures on the GTEx Portal now have a Download button above and to the left of the figure. This will download the figure in .svg format, which is a vector-based format.

Please acknowledge the GTEx Project and/or Portal. An example acknowledgement statement follows:

The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from: [insert, where appropriate] the GTEx Portal on MM/DD/YY and/or dbGaP accession number phs000424.vN.pN on MM/DD/YYYY.

Google Sign-In FAQs

G1) Why doesn't my username/password work anymore?

We have upgraded to use Google Sign-In. You will need to sign in with your Google ID. If you don't have a Google ID, then you will need to register with Google.

G2) Will the GTEx Portal see my Google password?

No, the GTEx Portal will never see your Google password. When you sign in using your Google ID and password, you will be using a dialog that is controlled by Google, not by the GTEx Portal. Your ID and password are communicated directly to Google, not through the GTEx Portal.

G3) Why did you change to Google Sign-In?

We migrated to Google Sign-In for several reasons. First, this allows us to provide more functionality to users with less effort on our part. Using Google Sign-In, we no longer have to maintain our login, logout, password-maintenance, and forgot-password functionality. Instead, Google handles all of those features. In addition, Google already provides for more secure two-factor authentication, if you choose to enable it. We want to focus our efforts on scientific features, rather than on user account management. Second, using Google Sign-In will allow us to integrate with analytical pipeline engines like FireCloud in the future.

G4) Where can I find out more about Google Sign-In?

You can read more about Google Sign-In here: https://support.google.com/accounts/answer/112802?hl=en

G5) I've tried logging in, but nothing has changed, and I still can't access protected pages.

Note that ad-blockers may interfere with the sign-in process and prevent it from loading correctly. Please try disabling your content blocker while on the GTEx Portal, or whitelist our website. If you continue having issues with the sign-in process, please contact us through the contact page and we will try our best to troubleshoot.