Infer copy numbers

Set “Options/Chromosome/Inferred copy method” to “Hidden Markov Model” to infer copy numbers (see Zhao et al. 2004, page 2 for the method). In the inferred copy number view on the right (figure above), no significance value is associated with the inferred copy number: the whole curve is the most likely underlying copy number under the HMM. Abnormal copy numbers inferred in normal samples are likely to be false positives. Toggle between the inferred and raw copy numbers to confirm or reject the inferred copy numbers.
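Because the inferred curve is the single most likely path of hidden copy states, it corresponds to Viterbi decoding, which returns one best path rather than per-marker probabilities or p-values. The sketch below illustrates that idea only; it is not the model of Zhao et al. (2004) or dChip's code. The function name viterbi_copy, the Gaussian emission with standard deviation sd, and the stay probability p_stay are hypothetical choices for illustration.

```python
import math

def viterbi_copy(raw, states, sd=0.5, p_stay=0.99):
    """Most likely copy-number path for raw copy numbers (Viterbi decoding).

    Hypothetical model for illustration: Gaussian emission around each state
    with standard deviation `sd`; stay in the same state with probability
    `p_stay`, otherwise jump to any other state with equal probability.
    """
    k = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (k - 1))

    def log_emit(x, c):
        # Gaussian log-density, dropping the constant shared by all states
        return -0.5 * ((x - c) / sd) ** 2

    def log_trans(i, j):
        return log_stay if i == j else log_move

    # score[j]: best log-probability of any path ending in state j (uniform start)
    score = [log_emit(raw[0], c) - math.log(k) for c in states]
    back = []  # back[t][j]: best predecessor of state j at marker t + 1
    for x in raw[1:]:
        ptr = [max(range(k), key=lambda i: score[i] + log_trans(i, j)) for j in range(k)]
        score = [score[ptr[j]] + log_trans(ptr[j], j) + log_emit(x, states[j])
                 for j in range(k)]
        back.append(ptr)
    # Trace back from the best final state
    j = max(range(k), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [states[i] for i in reversed(path)]
```

For example, viterbi_copy([2.1, 1.9, 2.8, 2.0, 0.9, 1.1, 1.0, 1.0, 2.0, 2.1], [0, 1, 2, 3, 4], sd=0.2) absorbs the single high marker (2.8) into the surrounding copy 2 but calls the run of four markers near 1 as copy 1, because two state switches cost more than one outlier can repay.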

Computing inferred copy numbers takes time, and the following procedure saves time in later sessions. In the inferred copy view, use “Chromosome/Export SNP data” to save the inferred copy numbers to a file. In the next session, use the same array list file as before, set “Options/Chromosome/Inferred copy method” to “Read from file”, and specify the inferred copy number file at “Options/Chromosome/Inferred copy file” to read in the existing inferred copy numbers rapidly. This also makes it possible to analyze different groups of DCP files separately (for example, normalized against different references, such as WGA and unamplified samples), export a copy number file for each group, combine these files column-wise, and finally read all DCP files together (specifying their parent directory as the data directory) to view the inferred copy numbers of all arrays.
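Combining the exported files column-wise amounts to a join on the SNP identifier. The Python sketch below assumes a hypothetical layout (tab-delimited, a header row, the SNP ID in the first column and one copy-number column per array); the real export from “Chromosome/Export SNP data” may carry extra annotation columns, so the column handling would need adjusting to the actual file. The function name combine_columnwise is illustrative.

```python
import csv

def combine_columnwise(paths, out_path):
    """Join exported copy-number tables side by side on the SNP ID.

    Assumed (hypothetical) layout: tab-delimited, header row, SNP ID in the
    first column, one copy-number column per array. Only SNPs present in
    every file are kept, in the order of the first file.
    """
    header = ["SNP"]
    tables = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter="\t")
            cols = next(reader)
            header.extend(cols[1:])  # array names from this file
            tables.append({rec[0]: rec[1:] for rec in reader if rec})
    # dicts preserve insertion order, so this follows the first file's SNP order
    shared = [snp for snp in tables[0] if all(snp in t for t in tables)]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for snp in shared:
            row = [snp]
            for t in tables:
                row.extend(t[snp])
            writer.writerow(row)
    return len(shared)
```

Joining on the SNP ID rather than pasting rows blindly guards against files whose SNPs are listed in different orders.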

Set “Options/Chromosome/HMM length” to N to perform HMM inference of LOH and copy number on stretches of at most N SNP markers at a time. This can speed up analysis of SNP arrays with density above 100K: chromosome 1 has more than 9K markers, but one can set “HMM length” to 1000. The HMM length is mainly a practical consideration: on a 500K array one chromosome can contain 50K SNPs, yet SNPs far apart provide little information about each other. The HMM is therefore run separately on each chromosome segment containing the specified number of SNPs (e.g. an HMM length of 10K). Except near segment boundaries, the HMM-inferred copy numbers will be very similar for different large HMM lengths.
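The segmentation can be sketched as splitting a chromosome's marker indices into consecutive stretches of at most N markers and decoding each stretch independently. The helper names and the non-overlapping split are assumptions; dChip may place or handle segment boundaries differently. The decode argument stands for any per-segment inference routine.

```python
def hmm_segments(n_markers, hmm_length):
    """Consecutive (start, end) index ranges of at most hmm_length markers."""
    if hmm_length <= 0:
        raise ValueError("hmm_length must be positive")
    return [(start, min(start + hmm_length, n_markers))
            for start in range(0, n_markers, hmm_length)]


def decode_in_segments(values, hmm_length, decode):
    """Run `decode` (any per-segment inference function) on each stretch and
    concatenate the results in marker order."""
    out = []
    for start, end in hmm_segments(len(values), hmm_length):
        out.extend(decode(values[start:end]))
    return out
```

For a chromosome with 9,200 markers and an HMM length of 1000, hmm_segments returns ten stretches, the last covering markers 9000 to 9199. Each stretch is decoded on its own, so only markers near stretch boundaries lose information from their neighbors.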

[Version 7/20/06+] Setting “Options/Chromosome/Inferred copy step” to 1 (default) infers integer copy numbers 0 to 26. Setting it to a value > 0 and < 1 infers fractional copy numbers in multiples of the copy step, up to 27 copy numbers. This option accommodates fractional copy numbers in tumor samples with high ploidy (average copy number), because normalization across tumor and normal samples scales the overall ploidy of tumor samples to 2. For example, a sample with a ploidy of 10 has a real copy change step of 0.2 (= 2/10) once normalization scales its overall ploidy to 2. Setting “Inferred copy step” to 0 infers this custom set of copy numbers: {0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.3, 1.6, 2, 2.5, 3, 4, 5, 6, 8, 10, 12, 15, 18, 21, 25, 30, 40, 60, 100}.
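The candidate copy-number grids can be written down directly. The custom set for step 0 is copied from the text. For a step between 0 and 1, the sketch assumes the 27 states k * step for k = 0 to 26, which reproduces the integer case 0 to 26 at step 1; dChip's exact fractional grid may differ. The helper step_for_ploidy is a hypothetical name for the arithmetic in the ploidy example (step = 2 / ploidy).

```python
CUSTOM_STATES = [0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.3, 1.6, 2, 2.5, 3, 4,
                 5, 6, 8, 10, 12, 15, 18, 21, 25, 30, 40, 60, 100]


def copy_states(step):
    """Candidate copy numbers for an "Inferred copy step" value.

    step == 0 returns the custom set from the text. For 0 < step <= 1 this
    ASSUMES the 27 states k * step, k = 0..26 (an illustrative guess for
    fractional steps).
    """
    if step == 0:
        return list(CUSTOM_STATES)
    if not 0 < step <= 1:
        raise ValueError("step must be 0 or in (0, 1]")
    # round() removes floating-point noise such as 0.6000000000000001
    return [round(k * step, 10) for k in range(27)]


def step_for_ploidy(ploidy):
    """Copy change step after normalization scales overall ploidy to 2."""
    return 2 / ploidy
```

For example, step_for_ploidy(10) gives 0.2, and copy_states(0.2) then starts 0, 0.2, 0.4, 0.6.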

The upper limit for raw and median-smoothed copy numbers is 655. For HMM-inferred copy numbers, it is 27 multiplied by the copy number step (e.g. 27 * 0.2 = 5.4), or 100 if the copy number step is 0.

Set “Options/Chromosome/Inferred copy method” to “Median smoothing” and set a SNP marker window size (e.g. 10) to use median-smoothed raw copy numbers as the inferred copy numbers. Compared with HMM inference, this method is faster and gives results closer to the raw copy numbers; it is also robust to outliers in raw copy numbers and requires no HMM parameters. However, median-smoothed copy numbers are not as smooth as HMM-inferred ones, and copy changes spanning fewer markers than half the window size will be smoothed out.
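A running median over the marker window can be sketched as follows. The function name is hypothetical, and the window placement (roughly centered, truncated at chromosome ends) is an assumed convention rather than dChip's documented behavior.

```python
from statistics import median


def median_smooth(raw, window):
    """Running median of raw copy numbers over `window` consecutive markers.

    Assumed convention: the window covers (window - 1) // 2 markers to the
    left and window // 2 to the right, truncated at the chromosome ends.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    left, right = (window - 1) // 2, window // 2
    n = len(raw)
    return [median(raw[max(0, i - left):min(n, i + right + 1)]) for i in range(n)]
```

With a window of 7, an isolated outlier disappears and a two-marker drop from copy 2 to copy 1 is smoothed back to 2, while a six-marker drop survives unchanged. This matches the trade-off described above: robustness to outliers at the cost of erasing short copy changes.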

[V9/14/07+] “Options/Chromosome/d.f. of t” is the degrees of freedom of the t-distribution used for the HMM emission probabilities. A lower value gives the t-distribution heavier tails, making it more tolerant of outliers, which may lead to smoother HMM-inferred copy numbers. A sub-integer “Inferred copy step”, such as 0.2, may also help smoothing. “Window” applies only to median smoothing.
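The effect of the d.f. setting can be seen by comparing how strongly an outlier is penalized under a Student t density. The sketch uses the standard location-scale t log-density; the function names, scale and d.f. values are illustrative, not dChip's emission parameters.

```python
import math


def t_logpdf(x, mu, scale, df):
    """Log-density of a Student t distribution with location mu and scale `scale`."""
    z = (x - mu) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))


def outlier_penalty(z, df):
    """Drop in log-density for a point z scale units from the mean,
    relative to a point at the mean (normalizing constants cancel)."""
    return t_logpdf(z, 0.0, 1.0, df) - t_logpdf(0.0, 0.0, 1.0, df)
```

For a marker five scale units from a state's mean, outlier_penalty gives about -4.5 at d.f. = 3 and -9.4 at d.f. = 30, versus -12.5 under a Gaussian. A small penalty means an HMM gains little by switching states to explain one outlier, which is why a lower d.f. tends to smooth the inferred copy number.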