hapLOH is software that combines haplotype estimates and SNP array data to identify somatic segmental copy number and copy-neutral mutations. Common applications include detection of tumor-associated mutations in low-purity tumor-normal mixture samples and discovery of clonal aberrations in non-malignant tissues. To obtain the software, use the contact information at the end of the page.

Installation

hapLOH is a binary executable that depends on Perl and
Python interpreters. hapLOH requires Releases are available for MacOS and Linux. The
software is distributed as a compressed tarball. To install hapLOH with version
number VERSION, move it to the directory in which you wish to install it and unpack it:
This will create a directory named

Quickstart Examples

If you don't add the haploh executable to your path, then replace

The download includes a directory

    haploh --baf example05.bafs --phased example05.hapguess

We recommend that you set the event length and event prevalence according to the expected characteristics of your samples; see the usage notes below for guidance on how to choose parameters. You can organize your output by specifying an output directory, which will be created if it doesn't already exist.

    haploh --baf EXAMPLES/example05.bafs --phased EXAMPLES/example05.hapguess --event_mb 10 --event_prevalence 0.05 --destdir Output_10mb_prev05 --random_seed 1234

Input Files
Required

Basic input includes two files, one for the BAFs and one for the phased genotypes, in the formats described below. Each file should contain data for one individual only. Markers should be ordered by genomic position, and paired files should contain data for exactly the same markers (missing values are allowed).

BAF file
A single-line, space-delimited file containing BAFs in genomic position order. Missing values may be denoted by '?'; numerical values outside of the range [0,1] will also be considered missing.

Statistical haplotype file
A two-line or two-column, space-delimited file with rows (or columns) corresponding to haplotypes. Alleles should be coded as A/B and missing values should be denoted by '?'.

Optional

To estimate the overrepresented haplotype, in addition to the hapLOH inputs above, you will need to supply a file describing the switch rates for the phase estimates. Note that this file is required if you use the switch rate file

Command Line Options

Required
Recommended
Advanced
Output Files

If the --baf option is used, the prefix of the BAF input file is used as the prefix for the output files (e.g. test.baf -> test.switch_enumeration).

Note: A directory intermediates/ will be created containing symbolic links to the input files. This directory exists only to accommodate the current implementation and may be deleted after the run.
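Because paired input files must describe exactly the same markers, it can help to check them before a run. The following is a minimal Python sketch of such a check, based only on the BAF and haplotype file formats described under Input Files. It is not part of hapLOH: the function names `parse_bafs` and `parse_haplotypes` are hypothetical, and the handling of an ambiguous two-marker file (read as two lines) is an assumption.

```python
def parse_bafs(line):
    """Parse one space-delimited BAF line.

    '?' and numbers outside [0, 1] are returned as None (missing).
    """
    values = []
    for token in line.split():
        if token == "?":
            values.append(None)
            continue
        x = float(token)  # raises ValueError on a non-numeric token
        values.append(x if 0.0 <= x <= 1.0 else None)
    return values


def parse_haplotypes(text):
    """Parse a two-line or two-column haplotype file into two allele lists.

    Assumption: a file with exactly two lines is read as two-line layout.
    """
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if len(rows) == 2:
        hap1, hap2 = rows
    elif rows and all(len(r) == 2 for r in rows):
        hap1 = [r[0] for r in rows]
        hap2 = [r[1] for r in rows]
    else:
        raise ValueError("expected a two-line or two-column haplotype file")
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes differ in length")
    for allele in hap1 + hap2:
        if allele not in ("A", "B", "?"):
            raise ValueError("unexpected allele code: %r" % allele)
    return hap1, hap2


def same_marker_count(bafs, haplotypes):
    """Return True if the BAF list and both haplotypes cover the same number of markers."""
    return len(bafs) == len(haplotypes[0]) == len(haplotypes[1])
```

A typical use would be to read each file as text, pass the contents to the two parsers, and stop if `same_marker_count` returns False. Matching counts do not prove that the markers themselves are identical, since neither format carries marker names.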
There are three basic applications for hapLOH: localization, testing a specific region of interest, and estimation of the over-represented haplotype.

Localization

This is the most common use of hapLOH and is the default procedure. Two important options are the --event_mb and --event_prevalence options. Although they have default values and therefore are not required, we suggest the user consider specifying these according to the expected size of the events of interest and the characteristics of the sample. These values will be used to determine the transition probabilities. A few guidelines when choosing parameters:
Detection

hapLOH currently does not include an option for assessing the evidence of allelic imbalance in a specific region, but you can do it yourself using a few of the output files. To perform detection (i.e. testing a specific region for deviation from the null phase concordance rate), you will need to select values from the .switch_enumeration file that correspond to your region of interest. First determine which markers in your dataset are located in the region of interest. Then use the .informative file to determine which of those markers are informative (note that indices are 0-based). Since the values in the .switch_enumeration file correspond to every consecutive pair of informative markers, there will be one fewer value than the number of informative markers. Drop the last informative marker in the region of interest and select the values corresponding to the remaining informative markers; the average of these is the observed phase concordance rate for the region. The localization HMM is run by default, but if you are only interested in detection you can turn it off with the command line flag

Estimation of the Overrepresented Haplotype

To apply the HMM for estimating the over- and underrepresented haplotypes, invoke hapLOH with the flag

This procedure produces ordered haplotypes covering all of the markers in the dataset, but note that order is only meaningful when imbalance exists.

Post-processing

We have a set of working scripts (mostly in R, some in Perl) for several common next steps in summarizing and making inference from hapLOH output. Here's a partial list of procedures for which we have written scripts.
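The following is not one of the scripts mentioned above. It is a minimal Python sketch of the detection calculation described in the Detection section, written under stated assumptions: the function name `region_phase_concordance` is hypothetical, the informative marker indices and switch enumeration values are assumed to have already been read into Python lists (the on-disk layout of the .informative and .switch_enumeration files is not described here), and the values are assumed to be numeric so that their mean is the concordance rate.

```python
def region_phase_concordance(informative, switch_values, first_marker, last_marker):
    """Observed phase concordance rate for markers first_marker..last_marker.

    informative   : sorted 0-based indices of informative markers (assumed
                    to come from the .informative file)
    switch_values : one numeric value per consecutive pair of informative
                    markers (assumed to come from the .switch_enumeration file)
    Returns None if fewer than two informative markers fall in the region.
    """
    expected = max(len(informative) - 1, 0)
    if len(switch_values) != expected:
        raise ValueError("expected one fewer switch value than informative markers")

    # Positions (within the informative list) of markers inside the region.
    positions = [k for k, idx in enumerate(informative)
                 if first_marker <= idx <= last_marker]

    # Drop the last informative marker in the region: value k describes
    # the pair (informative[k], informative[k + 1]).
    positions = positions[:-1]
    if not positions:
        return None

    return sum(switch_values[k] for k in positions) / len(positions)
```

For example, with informative markers at indices 0, 2, 3, 5, 8 and 9 and a region spanning markers 2 through 8, the function averages the values for the pairs (2, 3), (3, 5) and (5, 8). Testing that average against the null phase concordance rate is a separate step that this sketch does not attempt.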
Advanced Usage Notes

You might find the advanced options useful for testing specific aspects of the method or for specialized cases in which you want to control the HMM parameters. See the table of available options above.

FAQs
hapLOH relies on the genotype calls from the sample being representative of the germline genotypes. In samples with tumor purity higher than about 25%, the genotype calls in imbalanced regions may be no-calls or may be called as homozygous. Such markers are uninformative, and hapLOH will not recognize the region as imbalanced. In this case, using the genotype calls from a paired normal sample, if available, will restore the informative genotypes. There is no special "paired" mode; simply specify a file containing haplotype estimates made from the normal genotypes as the
Yes. To apply hapLOH to Affy data, you'll need to generate B allele frequencies, which you can do from .CEL files using the Affymetrix Power Tools (APT) and PennCNV software. This PennCNV page has a nice step-by-step guide for downloading and setting up APT and PennCNV and using them to convert .CEL files into genotype calls, BAFs, and LRRs.

Contact and Reference

If you have any questions or comments, please contact Selina at svattathil@utexas.edu.

hapLOH is an implementation of the method described in Vattathil, Selina, and Paul Scheet. "Haplotype-based profiling of subtle allelic imbalance with SNP arrays." Genome Research 23.1 (2013): 152-158.