Conditional Random Pattern Model for Copy Number Aberration (CNA) Detection

 

Software Usage:

Step 1: Download the package and decompress it, for example to c:\CRP_CNA

Step 2: Start MATLAB (version 7.0 or higher) and change the current directory to ‘c:\CRP_CNA’.

Step 3: Run the two demo files: “sim_HapMap_Example.m” and “sim_Log2_ratio_Example.m”.
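The three steps can be sketched at the MATLAB prompt as follows. This is a minimal sketch that assumes the package was decompressed to c:\CRP_CNA as in Step 1; the variable name pkgDir is introduced here for illustration only, and what each demo does is inferred from its file name.

```matlab
% Assumed install location from Step 1; adjust to your own path.
pkgDir = 'c:\CRP_CNA';

if exist(pkgDir, 'dir')
    cd(pkgDir);                 % Step 2: make the package the current directory
    sim_HapMap_Example;         % Step 3: demo script (HapMap data, judging by its name)
    sim_Log2_ratio_Example;     % Step 3: demo script (log2-ratio data, judging by its name)
else
    fprintf('Package directory not found: %s\n', pkgDir);
end
```

If the directory does not exist, the sketch only prints a message, so it can be pasted safely before the package is installed.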

 

Functions:

There are two major functions: snpSmooth.m and simSNP_CRP_Infer.m

snpSmooth.m smooths the input log2-ratio signals (the input data).
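To show what smoothing a log2-ratio signal means in practice, here is a generic centered moving average. It is a stand-in for illustration only: the actual method and arguments of snpSmooth.m are not described here, so please check that .m file. The signal, the window width w, and all variable names are assumptions.

```matlab
% Hypothetical noisy log2-ratio sequence: 50 normal probes, 30 deleted, 50 normal.
x = [zeros(1,50), -0.5*ones(1,30), zeros(1,50)] + 0.25*randn(1,130);

w = 5;                          % window width, an arbitrary choice for illustration
kernel = ones(1, w) / w;        % equal weights that sum to 1

% Centered moving average; 'same' keeps the output the same length as x.
% NOTE: this is a generic smoother, not the algorithm used by snpSmooth.m.
xs = conv(x, kernel, 'same');
```

Values near the two ends of xs are averaged against implicit zeros, so they are biased toward 0; a real pipeline would treat the boundaries explicitly.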

simSNP_CRP_Infer.m implements copy number inference using the CRP model. Several parameters in simSNP_CRP_Infer.m are adjustable; please check the .m file for more details. The most important parameters are listed here:

a)      Transition probability (inside simSNP_CRP_Infer.m). We define three copy number states: normal, deletion and amplification. TP is therefore a 3×3 matrix that gives the probability of a transition from one state to another.

TP = [0.6, 0.2, 0.2; 0.2, 0.6, 0.2; 0.2, 0.2, 0.6];

 

b)      Initial probability (inside simSNP_CRP_Infer.m), which defines the prior probability of each of the three states.

IP = [0.1, 0.8, 0.1]';

 

c)      Mean and standard deviation values of the input log2-ratio signals (set outside simSNP_CRP_Infer.m), which are passed as input arguments to simSNP_CRP_Infer.m.

Mu = [-0.5, 0, 0.38];  Sigma = ones(1,3)*0.25;
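The following sketch shows one way to sanity-check the parameter values listed above and to see how they interact for a single probe. It is not the CRP inference performed by simSNP_CRP_Infer.m; it only applies Bayes' rule with Gaussian likelihoods. The observation x is hypothetical, and the state order (deletion, normal, amplification) is an assumption inferred from the Mu values.

```matlab
% Parameter values copied from the list above.
TP = [0.6, 0.2, 0.2; 0.2, 0.6, 0.2; 0.2, 0.2, 0.6];
IP = [0.1, 0.8, 0.1]';
Mu = [-0.5, 0, 0.38];
Sigma = ones(1,3)*0.25;

% Each row of TP, and the vector IP, should sum to 1.
assert(all(abs(sum(TP, 2) - 1) < 1e-12));
assert(abs(sum(IP) - 1) < 1e-12);

% Gaussian likelihood of one hypothetical log2-ratio value under each state.
x = -0.6;
lik = exp(-(x - Mu).^2 ./ (2*Sigma.^2)) ./ (sqrt(2*pi)*Sigma);

% Posterior for an isolated probe: prior IP times likelihood, normalized.
% ASSUMPTION: index 1 = deletion, 2 = normal, 3 = amplification.
post = (lik .* IP') / sum(lik .* IP');
[pmax, k] = max(post);          % k is the most probable state index
```

Because the prior puts 0.8 on the middle state, a single probe has to fall well below 0 before the first state becomes the most probable one; this is one reason a model that uses neighboring probes is preferable to classifying each probe on its own.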

 

Input Data:

The input data of the CRP_CNA model are the “log2-ratio” signals, obtained as log2(disease SNP intensity / normal SNP intensity). In this study, we used “CNAG” to extract the log2-ratio sequences from the SNP array files (.CEL and .CHP). Users who want to test new SNP array files should first use CNAG to extract the log2-ratio sequences. CNAG can be downloaded at: http://www.genome.umin.jp/CNAGtop2.html.
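As a small worked example of the log2-ratio definition above, the sketch below computes it for four probes. The intensity values and variable names are hypothetical; in practice the log2-ratio sequences come from CNAG as described.

```matlab
% Hypothetical probe intensities for a disease sample and a matched normal sample.
diseaseIntensity = [1200, 800, 450, 1600];
normalIntensity  = [1000, 800, 900, 1000];

% Element-wise log2 ratio: 0 means no change, negative suggests loss,
% positive suggests gain.
log2Ratio = log2(diseaseIntensity ./ normalIntensity);
```

Equal intensities give a ratio of exactly 0, and a disease intensity that is half the normal one gives -1.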

 

There are two simulated data sets. One contains simulated log2-ratios. The other contains simulated 500K SNP array data based on the HapMap samples publicly available on the Affymetrix website.

Simulated Log2-ratio data set: simLog2RatioData.mat (load it into MATLAB)

Simulated SNP array data set: ‘.\HapMap\*.mat’.
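The two data sets can be inspected as sketched below. The variable names stored inside the .mat files are not documented here, so the sketch lists them rather than assuming any; it also assumes the current directory is the package directory. The names dataFile, S and matFiles are introduced for illustration.

```matlab
% Inspect the simulated log2-ratio data set without assuming its variable names.
dataFile = 'simLog2RatioData.mat';
if exist(dataFile, 'file')
    whos('-file', dataFile);    % list the variables stored in the file
    S = load(dataFile);         % load them as fields of a struct
    disp(fieldnames(S));
end

% List the simulated SNP array data files in the HapMap subdirectory.
matFiles = dir(fullfile('HapMap', '*.mat'));
for i = 1:numel(matFiles)
    fprintf('%s\n', matFiles(i).name);
end
```

Loading into a struct keeps the file's variables out of the workspace until you know what they are called.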

 

The HapMap sample NA10851 was used to simulate these data at three different SNR levels. At each SNR level there are two simulated data sets: one simulates copy number loss, the other copy number gain. The original CEL and CHP files are also provided in the ‘HapMapOri’ directory, so users can test them with other software packages designed for Affymetrix arrays.

 

Important NOTE about the CRF MATLAB code developed by Dr. Kevin Murphy:

We used the conditional random field code developed by Dr. Kevin Murphy, which can be downloaded at:

http://www.cs.ubc.ca/~murphyk/Software/CRF/crfGeneralOld.html. This is also stated in our paper.

 

The paper is published here:

 

Li F, Zhou X, Huang WT, Wong STC, Chang CC. Conditional Random Pattern Model for Copy Number Aberration Detection. BMC Bioinformatics, 11:200, 2010.
 
 
This software can be downloaded here.