
Read external data files

If no DAT/CEL files are available for Affymetrix array data, dChip can still read a tab-delimited text file whose columns hold the expression value, absolute call, and standard error (SE) for each array:
 

probe set        130a     130a call  130a SE  130b     130b call  130b SE
AFFX-BioB-5_at   3348.89  P          281.398  3825.92  P          225.898
AFFX-BioB-M_at   3478.22  P          400.583  6778.75  P          273.612
AFFX-BioB-3_at   2322.84  P          180.836  3437.77  P          158.029
AFFX-BioC-5_at   7837.85  P          628.778  7590.25  P          402.236
AFFX-BioC-3_at   5887.03  P          501.962  6473.87  P          316.34
AFFX-BioDn-5_at  4416.52  P          711.782  8313.93  P          556.247
AFFX-BioDn-3_at  16049.3  P          1870.28  18681.5  P          1048.73
AFFX-CreX-5_at   24904.8  P          1728.4   29241.8  P          1095.15
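
To make the column layout concrete, the following Python sketch parses a file of this shape outside dChip. It assumes every array contributes three adjacent columns in the order expression, call, SE, as in the sample table; the function name `parse_external` is hypothetical and is not part of dChip.

```python
import csv
import io

def parse_external(text):
    """Split a tab-delimited table into per-array (expression, call, SE) values.

    Assumes each array occupies three adjacent columns in the order
    expression, call, SE, as in the sample table.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    arrays = header[1::3]  # every third header cell names an array
    data = {name: {} for name in arrays}
    for row in body:
        probe = row[0]
        for i, name in enumerate(arrays):
            expr, call, se = row[1 + 3 * i : 4 + 3 * i]
            data[name][probe] = (float(expr), call, float(se))
    return data

# First two data rows of the sample table, tab-delimited.
sample = (
    "probe set\t130a\t130a call\t130a SE\t130b\t130b call\t130b SE\n"
    "AFFX-BioB-5_at\t3348.89\tP\t281.398\t3825.92\tP\t225.898\n"
    "AFFX-BioB-M_at\t3478.22\tP\t400.583\t6778.75\tP\t273.612\n"
)
parsed = parse_external(sample)
print(parsed["130b"]["AFFX-BioB-M_at"])  # (6778.75, 'P', 273.612)
```

A file without the optional call and SE columns would need a different column stride; this sketch does not handle that case.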

 
The absolute call and SE columns are optional and can be specified in the “Analysis/Get External Data” dialog.
 

The first row of the data file should contain array names and the first column should contain gene names. Data files exported by dChip may have additional columns of gene annotations starting from the 2nd column (such as “gene, Accession, LocusLink, Description”). To read such files with “Analysis/Get External Data”, delete these columns in Excel and save the file as a tab-delimited text file. Alternatively, specify "Skip column 2 to x" to ignore columns 2 to x.
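
As an alternative to deleting the annotation columns by hand in Excel, the same cleanup can be scripted. The following is a minimal Python sketch, not a dChip feature: the helper `drop_columns` is hypothetical, and the annotation values in the example row are made-up placeholders.

```python
import csv
import io

def drop_columns(rows, first, last):
    """Remove 1-based columns first..last (inclusive) from every row."""
    return [row[:first - 1] + row[last:] for row in rows]

# Hypothetical dChip export: columns 2-3 hold annotation placeholders.
exported = (
    "probe set\tgene\tAccession\t130a\t130b\n"
    "AFFX-BioB-5_at\tgeneX\taccX\t3348.89\t3825.92\n"
)
rows = list(csv.reader(io.StringIO(exported), delimiter="\t"))
cleaned = drop_columns(rows, 2, 3)

# Write the result back out as tab-delimited text.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(cleaned)
print(out.getvalue())
```

For a real file, replace the `io.StringIO` objects with files opened via `open(path, newline="")`.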

 

If there are missing values in the external data file, leave them blank in Excel and save the file as tab-delimited text so that “Get external data” treats blank cells as missing values. If the last column may contain missing values, add a pseudo last column with all values set to “1” so that the data is read correctly. Afterwards, use “Array list file” to specify only the real samples to be used in the analysis.
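
The pseudo last column can also be appended with a short script. The Python sketch below assumes an expression-only file (no call or SE columns); the function name `add_pseudo_column` and the column name "pseudo" are hypothetical choices.

```python
def add_pseudo_column(lines, name="pseudo"):
    """Append a trailing column: `name` in the header row, "1" in data rows."""
    padded = []
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")  # strip line endings but keep trailing tabs
        padded.append(line + "\t" + (name if i == 0 else "1"))
    return padded

lines = [
    "probe set\t130a\t130b",
    "AFFX-BioB-5_at\t3348.89\t3825.92",
    "AFFX-BioB-M_at\t3478.22\t",  # 130b value left blank (missing)
]
padded = add_pseudo_column(lines)
for line in padded:
    print(line)
```

The blank cell in the last real column is now followed by a non-blank value, so each row keeps the same number of fields.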

 

The “Get External Data/Other information” tab prompts the user to read in a “gene information file” or “sample information file”; these files are the same as those used for data read in by “Analysis/Open group”. However, if the sample (column) names in the external data file are already meaningful, the “array name” and “sample name” columns of the sample information file can both be identical to the sample names in the first line of the external data file.

 

Click “OK” to read in the data file. If successful, the “Modeled” indicator appears in the lower-right corner to show that the expression data is available for high-level analysis. The “Normalized” indicator is not shown; if the data has already been normalized, one can proceed directly to high-level analysis.

 

If the data has not been normalized beforehand, use “Analysis/Normalize” to normalize the expression values with the Invariant Set Normalization (ISN) method (Version 1.0 uses a simplified ISN method with a fixed rank difference threshold of 50 and no iteration). The standard error attached to an expression value is scaled by the ratio of the expression values before and after normalization. Check the “Show scatter-plot…” option to show normalization scatter-plots when normalizing (this requires an installation of R).
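
The SE scaling can be illustrated with a small Python sketch. It assumes the factor is the normalized value divided by the original value, so that the relative error is preserved; this direction of the ratio and the handling of a zero expression value are assumptions, not confirmed dChip behavior, and `scale_se` is a hypothetical name.

```python
def scale_se(expr_before, expr_after, se_before):
    """Scale an SE by the factor normalization applied to its expression value.

    Assumption: factor = expr_after / expr_before.
    """
    if expr_before == 0:
        return se_before  # ratio undefined; leave the SE unchanged (assumption)
    return se_before * (expr_after / expr_before)

# Illustrative numbers only: a value raised by 10% has its SE raised by 10%.
print(scale_se(100.0, 110.0, 10.0))
```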

 

Afterwards, the high-level analysis can be applied without the “Analysis/Model-based expression” step (since no CEL values are read in). For example, the “Tools/Array list file” function can be used to pool replicate arrays, and “Analysis/Hierarchical Clustering” and “Analysis/Compare Samples” can be performed as usual.