
Example method description using dChip

Array normalization, expression value calculation, and clustering analysis were performed using DNA-Chip Analyzer (dChip; Li & Wong 2001a). The Invariant Set Normalization method (Li & Wong 2001b) was used to normalize arrays at the probe-cell level to make them comparable, and the model-based method (Li & Wong 2001b) was used for probe selection and for computing expression values. Each expression value was assigned a standard error as a measure of its accuracy; these standard errors were subsequently used to compute 90% confidence intervals of fold changes in two-sample or two-group comparisons (Li & Wong 2001b). The lower confidence bound of a fold change is a conservative estimate of the real fold change. Genes whose expression increased or decreased after treatment by more than 2-fold (lower confidence bound) were selected for further study.
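To make the "lower confidence bound" criterion concrete, the following is a minimal sketch of one standard way to get a confidence interval for a ratio of two estimates that each carry a standard error (a Fieller-style inversion). It is an illustration only: whether it matches dChip's exact computation is an assumption, and the function names `fold_change_ci` and `is_selected`, the constant `Z90`, and the example numbers are all hypothetical.

```python
import math

# Two-sided 90% normal quantile (assumed here; z**2 is the 90% chi-square(1) quantile).
Z90 = 1.6449


def fold_change_ci(treated, se_treated, baseline, se_baseline, z=Z90):
    """Return (lower, upper) bounds for the ratio treated / baseline.

    Solves (treated - r * baseline)**2 <= z**2 * (se_treated**2 + r**2 * se_baseline**2)
    for r. Returns None when the baseline is too noisy for a bounded interval.
    """
    c = z * z
    a = baseline ** 2 - c * se_baseline ** 2
    if a <= 0:
        return None  # baseline not clearly different from zero
    b = treated * baseline
    k = treated ** 2 - c * se_treated ** 2
    root = math.sqrt(b * b - a * k)  # non-negative whenever a > 0
    return (b - root) / a, (b + root) / a


def is_selected(treated, se_treated, baseline, se_baseline, threshold=2.0):
    """True if the conservative bound shows a change of more than `threshold`-fold."""
    ci = fold_change_ci(treated, se_treated, baseline, se_baseline)
    if ci is None:
        return False
    lower, upper = ci
    increased = lower > threshold
    # For a decrease, the conservative bound is the upper end of the ratio.
    decreased = 0 < upper < 1.0 / threshold
    return increased or decreased


# Hypothetical gene: the point estimate is 5-fold, but the bound used for selection is smaller.
print(fold_change_ci(1000.0, 50.0, 200.0, 20.0))
```

The point of the conservative bound is that a gene with a large but noisy fold change is not selected unless the whole interval lies beyond the 2-fold threshold.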

Hierarchical clustering analysis is used to group genes with similar expression patterns (Li & Wong 2003). A gene is selected for clustering if (1) its expression values in the 20 samples have a coefficient of variation (standard deviation / mean) between 0.5 and 10, and (2) it is called “Present” by MAS5 (or GCOS or dChip) software in more than 5 samples. The expression values for each gene across the 20 samples are then standardized by a linear transformation to have mean 0 and standard deviation 1, and the distance between two genes is defined as 1 - r, where r is the standard correlation coefficient between the 20 standardized values of the two genes. The two genes with the smallest distance are merged first into a super-gene, connected by branches whose length represents their distance, and removed from further merging. The expression level of the newly formed super-gene in each sample is the average of the standardized expression levels of the two genes (average-linkage). The next pair of genes (or super-genes) with the smallest distance is then merged, and the process is repeated until all genes are merged into one cluster. The dendrogram in Figure X illustrates the final clustering tree, in which genes close to each other have highly similar standardized expression values across the 20 samples.
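The filtering and merging steps described above can be sketched in a few lines of plain Python. This is a minimal illustration of the procedure as worded in this paragraph, not dChip's implementation: the function names, the use of the sample standard deviation, the tuple representation of super-genes, and the toy data are all assumptions made for the example.

```python
import math
from itertools import combinations


def passes_filter(values, present_calls, cv_min=0.5, cv_max=10.0, min_present=5):
    """Keep a gene if its CV is within range and it is Present in more than min_present samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return cv_min <= sd / mean <= cv_max and sum(present_calls) > min_present


def standardize(values):
    """Linearly rescale a profile to mean 0 and standard deviation 1.

    Assumes the profile is not constant (the CV filter above excludes such genes).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def distance(x, y):
    """1 - r, where r is the correlation coefficient of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def cluster(profiles):
    """Merge the closest pair repeatedly; return the merges as (left, right, distance)."""
    active = {name: standardize(vals) for name, vals in profiles.items()}
    merges = []
    while len(active) > 1:
        a, b = min(combinations(active, 2),
                   key=lambda pair: distance(active[pair[0]], active[pair[1]]))
        d = distance(active[a], active[b])
        # The super-gene profile is the per-sample average of the two merged profiles.
        merged = [(u + v) / 2.0 for u, v in zip(active[a], active[b])]
        del active[a], active[b]
        active[(a, b)] = merged  # nested tuples record the tree structure
        merges.append((a, b, d))
    return merges


# Toy data with 4 samples instead of 20 (hypothetical values).
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 9], "g3": [4, 1, 3, 2]}
for left, right, d in cluster(toy):
    print(left, right, round(d, 3))
```

Each merge distance corresponds to a branch height in the dendrogram, so the list returned by `cluster` carries the information needed to draw the tree. The exhaustive pair search is quadratic per merge, which is acceptable for a sketch but slow for thousands of genes.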