Tumor-only LOH inference

[Use dChip 10/21/05+] See Beroukhim et al. 2006 for details of the method. A Hidden Markov Model (HMM) is used to infer the probability of LOH from the observed LOH calls (for paired normal/tumor samples) or from the genotype calls (for a tumor sample without a paired normal). Use “Chromosome/Display inferred” (or press the “I” key) to toggle to the inferred probability of LOH (the right-side figure above), which is displayed on a color scale from blue (1) through white (0.5) to yellow (0). The blue curve on the right is an LOH score that measures the prevalence of LOH at each marker across the samples; it is computed as the average probability of LOH over the samples. “Chromosome/Permute data” can be used to assess the significance of the regions with peak LOH scores. The inferred LOH calls can be exported with “Chromosome/Export SNP data”.
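The inference step can be illustrated with a minimal sketch, assuming a simplified two-state model (retention vs LOH) with one average HET rate, a fixed genotyping error, and a constant probability of switching state between adjacent markers. The function and parameter names (`loh_posterior`, `loh_score`, `het_rate`, `geno_error`, `p_switch`, `p_loh0`) are hypothetical, and this is not dChip's implementation. It computes the posterior probability of LOH at each marker with the forward-backward algorithm, then averages it across samples to form a per-marker LOH score.

```python
# Hypothetical simplified two-state HMM; not dChip's code.
def loh_posterior(genotypes, het_rate=0.2, geno_error=0.01, p_switch=0.01, p_loh0=0.1):
    """Posterior P(LOH) at each marker of one tumor sample.

    genotypes: list of "AA", "AB", "BB", or None (no call), in chromosome order.
    Hidden states: 0 = retention, 1 = LOH.
    """
    def emit(g, state):
        if g is None:  # a no-call carries no information
            return 1.0
        # Under retention an AB call appears at the average HET rate;
        # under LOH it can only come from a genotyping error.
        p_het = het_rate if state == 0 else geno_error
        return p_het if g == "AB" else 1.0 - p_het

    trans = [[1 - p_switch, p_switch], [p_switch, 1 - p_switch]]
    n = len(genotypes)

    # Forward pass, normalized at each marker to avoid underflow.
    fwd = []
    pred = [1 - p_loh0, p_loh0]
    for i, g in enumerate(genotypes):
        if i > 0:
            pred = [sum(fwd[-1][j] * trans[j][k] for j in range(2)) for k in range(2)]
        cur = [pred[k] * emit(g, k) for k in range(2)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalized at each marker.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        g_next = genotypes[i + 1]
        b = [sum(trans[j][k] * emit(g_next, k) * bwd[i + 1][k] for k in range(2))
             for j in range(2)]
        total = sum(b)
        bwd[i] = [x / total for x in b]

    # Posterior P(LOH) is proportional to forward * backward at each marker.
    return [f[1] * b[1] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)]


def loh_score(samples, **params):
    """LOH score per marker: the average posterior P(LOH) over all samples."""
    posteriors = [loh_posterior(s, **params) for s in samples]
    return [sum(col) / len(col) for col in zip(*posteriors)]


# Usage: a long homozygous run flanked by heterozygous calls.
calls = ["AB", "AA", "AB", "BB"] + ["AA"] * 60 + ["AB", "BB", "AB", "AA"]
post = loh_posterior(calls)  # high inside the homozygous run, low at the flanking AB calls
```

Normalizing at every marker keeps the recursion numerically stable on long chromosomes; the per-marker scale factors cancel when the forward and backward terms are combined.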

Sometimes the inferred calls switch back and forth between LOH and retention over short stretches. This can be caused by interspersed homozygous and heterozygous genotypes resulting from normal-cell contamination or hyperploidy. Try a larger “Options/Chromosome/Genotyping error” value to make the inferred calls smoother.
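Why a larger genotyping error smooths the result can be seen with a small Viterbi sketch, assuming the same kind of simplified two-state model as above (retention vs LOH, hypothetical parameters `het_rate`, `geno_error`, `p_switch`, `p_loh0`); it is an illustration, not dChip's code. A single AB call inside a long homozygous run splits the most likely LOH path in two when the genotyping error is tiny, because an AB call under LOH is then nearly impossible; with a larger error the stray AB call is absorbed as noise.

```python
import math

# Hypothetical simplified two-state HMM (0 = retention, 1 = LOH); not dChip's code.
def viterbi_loh(genotypes, het_rate=0.2, geno_error=0.01, p_switch=0.01, p_loh0=0.1):
    """Most likely retention/LOH state path for one tumor sample."""
    def log_emit(g, state):
        if g is None:  # no-call: uninformative
            return 0.0
        p_het = het_rate if state == 0 else geno_error
        return math.log(p_het if g == "AB" else 1.0 - p_het)

    log_stay, log_move = math.log(1 - p_switch), math.log(p_switch)
    score = [math.log(1 - p_loh0) + log_emit(genotypes[0], 0),
             math.log(p_loh0) + log_emit(genotypes[0], 1)]
    back = []  # back[i][k]: best previous state for state k at marker i + 1
    for g in genotypes[1:]:
        ptr, new = [], []
        for k in range(2):
            cands = [score[j] + (log_stay if j == k else log_move) for j in range(2)]
            j_best = 0 if cands[0] >= cands[1] else 1
            ptr.append(j_best)
            new.append(cands[j_best] + log_emit(g, k))
        back.append(ptr)
        score = new

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def count_loh_segments(path):
    """Number of maximal runs of LOH (state 1) in a state path."""
    return sum(1 for i, s in enumerate(path) if s == 1 and (i == 0 or path[i - 1] == 0))


calls = ["AA"] * 40 + ["AB"] + ["BB"] * 40
tiny_error = count_loh_segments(viterbi_loh(calls, geno_error=1e-6))
larger_error = count_loh_segments(viterbi_loh(calls, geno_error=0.01))
```

Here `tiny_error` counts two LOH segments and `larger_error` counts one.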

To infer the LOH status of non-informative markers in paired normal/tumor LOH analysis, the method “Options/Inferred LOH method/Same boundary” ("Fill in noninformative for pair") can be used in addition to the HMM method. It finds runs of consecutive non-informative calls that are bounded on each side by an informative call. If the observed LOH calls at the two boundary SNPs agree (both loss or both retention), the non-informative markers in between are assigned that same call.
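The “Same boundary” rule can be written as a short sketch. The call codes used here ("L" for loss, "R" for retention, "N" for non-informative) and the function name `fill_same_boundary` are hypothetical labels for illustration, not dChip's internal representation. Runs of non-informative calls at either end of a chromosome have only one informative boundary, so they are left unchanged.

```python
# Hypothetical call codes: "L" = loss, "R" = retention, "N" = non-informative.
def fill_same_boundary(calls):
    """Fill runs of non-informative calls whose two informative boundaries agree."""
    filled = list(calls)
    last = None  # index of the most recent informative call
    for i, c in enumerate(calls):
        if c == "N":
            continue
        # A run of "N" lies between last and i; fill it only if both ends agree.
        if last is not None and i - last > 1 and calls[last] == c:
            for j in range(last + 1, i):
                filled[j] = c
        last = i
    return filled


print(fill_same_boundary(["N", "L", "N", "N", "L", "N", "R", "N", "R", "N"]))
# ['N', 'L', 'L', 'L', 'L', 'N', 'R', 'R', 'R', 'N']
```

The run between a loss and a retention call stays non-informative, since the boundaries disagree.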

For tumor-only LOH inference, when no reference genotype file is specified at "Options/Chromosome", the HMM sets the probability of observing an AB (heterozygous) call given the underlying retention state to the "Options/Chromosome/Average HET rate" (e.g. 0.3 for 10K and 0.2 for 100K/500K arrays). The smaller this rate, the more likely consecutive homozygous calls are under retention alone, and therefore the less need there is to infer LOH to explain them. When a reference genotype file is used, SNP-specific HET rates are estimated from the file and used for the basic HMM, and previous-marker-dependent HET rates are estimated from the file and used for “HMM considering haplotype”.
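The effect of the HET rate on the evidence for LOH can be made concrete with a small sketch. Under the simplified emission model assumed here, a homozygous call has probability `1 - het_rate` under retention and `1 - geno_error` under LOH, so a run of homozygous calls adds `log((1 - geno_error) / (1 - het_rate))` per marker to the log-likelihood ratio in favor of LOH. The helper `snp_het_rates` shows one plausible way to estimate SNP-specific HET rates from reference genotypes; its pseudocount shrinkage toward a default rate is an assumption for this illustration, not a documented dChip setting.

```python
import math

# Hypothetical helpers illustrating the emission model; not dChip's code.
def snp_het_rates(reference, pseudo=1.0, default=0.2):
    """Per-SNP HET rate from reference genotypes ("AA"/"AB"/"BB"/None per SNP).

    The pseudocount (an assumption) shrinks rates toward `default` so that
    SNPs with few calls never get a rate of exactly 0 or 1.
    """
    rates = []
    for i in range(len(reference[0])):
        calls = [s[i] for s in reference if s[i] is not None]
        n_het = sum(1 for g in calls if g == "AB")
        rates.append((n_het + pseudo * default) / (len(calls) + pseudo))
    return rates


def hom_run_log_lr(het_rates, geno_error=0.01):
    """Log-likelihood ratio (LOH vs retention) of homozygous calls at these SNPs."""
    return sum(math.log((1 - geno_error) / (1 - h)) for h in het_rates)


# A 20-SNP homozygous run is stronger evidence for LOH when the HET rate is higher.
evidence_10k = hom_run_log_lr([0.3] * 20)
evidence_100k = hom_run_log_lr([0.2] * 20)
```

With a lower HET rate each homozygous call is less surprising under retention, so `evidence_100k` is smaller than `evidence_10k`.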

The tumor-only LOH inference using haplotype information is illustrated here with 100K SNP array data. First, combine the 100K Xba and Hind arrays, and read the combined data file into dChip using “Analysis/Get external data”. At “Tools/Array list file”, put only tumor samples in a “Standardize group” so that LOH is inferred from tumor samples alone. At “Analysis/Chromosome/Options”, set “Inferred LOH method” to “HMM considering haplotype”, and specify a normal reference genotype file at “Reference genotype file” (unzip this file: 100K genotypes of CEPH 60 parents, data source; make such files; 500K genotypes of CEPH 60 parents). V12/14/06+: When no "Options/Reference genotype file" is specified, the normal samples given a "Ploidy(numeric)" value of 2 in the sample info file are used to estimate SNP heterozygosity and genotype dependence probabilities.
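The “genotype dependence probabilities” used by “HMM considering haplotype” can be illustrated by estimating, for each SNP, the chance of a heterozygous call given the genotype at the previous SNP. This is a simplified sketch under stated assumptions: conditioning only on the previous SNP's genotype call, and the pseudocount shrinkage, are choices made for this illustration, and `conditional_het_rates` is a hypothetical name rather than dChip's procedure. Because neighboring SNPs on shared haplotypes are correlated, these conditional rates can differ sharply from the marginal HET rate.

```python
from collections import defaultdict

# Hypothetical sketch of previous-marker-dependent HET rates; not dChip's code.
def conditional_het_rates(reference, pseudo=1.0, default=0.2):
    """Estimate P(AB at SNP i | genotype at SNP i-1) from reference genotypes.

    reference: list of samples, each a list of "AA"/"AB"/"BB"/None per SNP.
    Returns one dict per SNP i >= 1, keyed by the previous SNP's genotype;
    only genotypes actually observed at SNP i-1 appear as keys.
    """
    rates = []
    for i in range(1, len(reference[0])):
        # previous genotype -> [number of AB calls at SNP i, number of called pairs]
        counts = defaultdict(lambda: [0, 0])
        for sample in reference:
            prev, cur = sample[i - 1], sample[i]
            if prev is None or cur is None:
                continue  # skip marker pairs with a no-call
            counts[prev][0] += 1 if cur == "AB" else 0
            counts[prev][1] += 1
        rates.append({g: (n_het + pseudo * default) / (n + pseudo)
                      for g, (n_het, n) in counts.items()})
    return rates


ref = [["AA", "AA"], ["AA", "AA"], ["AB", "AB"], ["AB", "AB"], ["AB", "BB"]]
rates = conditional_het_rates(ref, pseudo=0)
```

In this toy reference, SNP 2 is never heterozygous after an AA call but is heterozygous in two of three samples after an AB call; a haplotype-aware HMM could use such rates in place of a single average HET rate under the retention state.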

If the “Remove LOH regions” threshold (T) is not 100%, an HMM-inferred LOH region in a tumor-only sample is removed if its genotypes are “consistent” with more than T% of the 60 normal reference samples (see the manuscript cited above for details). Click OK to run the HMM for tumor-only LOH inference. The figure below: (top) using the basic HMM (“Inferred LOH method” set to “Hidden Markov Model”), the LOH inferred from paired normal and tumor samples is compared with the LOH inferred from tumor-only samples; the many short blue horizontal lines indicate falsely inferred LOH. (bottom) The same comparison using the “HMM considering haplotype” method together with a “Remove LOH regions” threshold of 10%.
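The “Remove LOH regions” filter can be sketched as follows. The exact definition of “consistent” is given in the manuscript; this sketch assumes, for illustration only, that a reference sample is consistent with a tumor region when its genotypes match the tumor's at every marker where both are called. The function names and the `(start, end)` region format are hypothetical.

```python
# Hypothetical sketch; the "consistent" rule below is an assumption, not dChip's definition.
def consistent_fraction(tumor_region, reference_regions):
    """Fraction of reference samples whose genotypes match the tumor's
    at every marker in the region where both have a call."""
    def consistent(ref):
        return all(t == r for t, r in zip(tumor_region, ref)
                   if t is not None and r is not None)
    return sum(consistent(r) for r in reference_regions) / len(reference_regions)


def filter_loh_regions(regions, tumor, reference, threshold_pct):
    """Drop LOH regions consistent with more than threshold_pct % of reference samples.

    regions: list of (start, end) marker indices, inclusive, inferred as LOH.
    A threshold of 100 disables the filter.
    """
    if threshold_pct >= 100:
        return list(regions)
    kept = []
    for start, end in regions:
        t = tumor[start:end + 1]
        refs = [s[start:end + 1] for s in reference]
        if 100 * consistent_fraction(t, refs) <= threshold_pct:
            kept.append((start, end))
    return kept


tumor = ["AA", "BB", "AA", "AB", "AA", "AA"]
reference = [["AA", "BB", "AA", "AB", "AB", "AB"]] * 3 + [["AB"] * 6] * 7
print(filter_loh_regions([(0, 2), (4, 5)], tumor, reference, 10))
# [(4, 5)]
```

The first region matches 3 of the 10 reference samples (30%), above the 10% threshold, so it is treated as a common homozygous haplotype rather than LOH and removed.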