Sample information file

If the CEL data file names are not informative, we can specify alternative names for them in a “sample information file”. This is a tab-delimited text file; if you edit it in Excel, make sure to save it in text format with “File/Save As/Save as type: Text (Tab delimited)”. The first (header) line is required. The first two columns are also required: the array file names (without the directory name or the .CEL or .DCP extension; you can copy the “Array” column from the “Array summary file” generated by “Open group” into the “Array name” column here) and the corresponding sample names. The sample names should be different for each array and also different from any array name; a sample name can be left blank, in which case the array name is used as the sample name. The remaining columns are optional and describe sample properties using discrete words or numbers. Here is an example file:

 

Array name       Sample name   Grade   Marker 1   Marker 2   Marker 3
LG2000102601AA   N1            II      FALSE      positive   positive
LG2000102602AA   N3            III     FALSE      negative   positive
LG2000102603AA   N4            II      FALSE      positive   low-positive
LG2000102604AA   N5            III     FALSE      negative   negative
LG2000102605AA   N6            II      FALSE      positive   positive
LG2000102606AA   N7            III     FALSE      negative   negative
LG2000102607AA   N8            III     FALSE      negative   negative
LG2000102608AA   N9            III     TRUE       negative   negative
LG2000102609AA   N10           II      FALSE      positive   positive
LG2000102610AA   N11           III     FALSE      negative   negative
LG2000102611AA   N12           III     TRUE       negative   negative
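
The rules for the first two columns can be checked before the file is loaded. The following is a minimal Python sketch, not part of the software itself: the function name check_sample_info is hypothetical, and it only tests the rules stated above (a header line, at least two columns, unique sample names, sample names that differ from every array name, and a blank sample name falling back to the array name).

```python
import csv
import io


def check_sample_info(text):
    """Return a list of problems found in tab-delimited sample information text.

    Hypothetical helper: it checks only the rules described in this section.
    """
    # Parse the tab-delimited text and skip completely empty lines.
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows or len(rows[0]) < 2:
        return ["a header line with at least two columns is required"]

    body = rows[1:]
    array_names = {r[0].strip() for r in body}
    problems = []
    seen = set()
    for row_no, r in enumerate(body, start=1):
        array = r[0].strip()
        sample = r[1].strip() if len(r) > 1 else ""
        # A non-blank sample name must not coincide with any array name.
        if sample and sample in array_names:
            problems.append(f"data row {row_no}: sample name '{sample}' is also an array name")
        # A blank sample name means the array name is used instead.
        effective = sample or array
        if effective in seen:
            problems.append(f"data row {row_no}: duplicate sample name '{effective}'")
        seen.add(effective)
    return problems
```

An empty list means none of these rules was violated; it does not guarantee that the file will be accepted by the software.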

Using a “sample information file” is highly recommended. It is useful in later functions such as Significant sample clusters and selecting samples by category, and the sample properties make the sample clustering easier to assess visually than textual sample names alone. For example, if the sample name “14c1” refers to “day 14, pair one, control sample”, we can create three sample information columns called “Day”, “Pair” and “Treatment”, and this sample has the values “14”, “1” and “C” in these three columns.
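
When sample names follow a regular scheme like the “14c1” example, the property columns can be derived with a short script instead of being typed by hand. The sketch below assumes a hypothetical naming pattern (day digits, one treatment letter, pair digits); the function name split_sample_name and the pattern are illustrative and would need adjusting to the actual names.

```python
import re

# Assumed naming scheme: <day digits><one treatment letter><pair digits>, e.g. "14c1".
NAME_PATTERN = re.compile(r"^(\d+)([A-Za-z])(\d+)$")


def split_sample_name(name):
    """Split a sample name such as '14c1' into Day, Pair and Treatment values."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"sample name does not follow the assumed scheme: {name!r}")
    day, treatment, pair = match.groups()
    # Treatment is upper-cased so that 'c' and 'C' fall into the same category.
    return {"Day": day, "Pair": pair, "Treatment": treatment.upper()}


def to_columns(name):
    """Return the three values as one tab-delimited fragment: Day, Pair, Treatment."""
    values = split_sample_name(name)
    return "\t".join([values["Day"], values["Pair"], values["Treatment"]])
```

For “14c1” this gives the values 14, 1 and C, which can be appended after the array name and sample name on the corresponding line of the file.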

You may add a numerical column to the “sample information file”. The column header needs to contain “(numeric)”, for example “Time(numeric)”. Such a continuous variable will be standardized and displayed at the top of the clustering picture.
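
To illustrate what standardizing a numeric column might look like, here is a minimal Python sketch. It assumes that “standardized” means subtracting the mean and dividing by the sample standard deviation, and that the “(numeric)” tag is matched exactly as written; the software's own calculation is not documented here and may differ. The function names are hypothetical.

```python
import statistics


def numeric_columns(header):
    """Return the indices of columns whose header contains '(numeric)'."""
    return [i for i, name in enumerate(header) if "(numeric)" in name]


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (assumed definition)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs at least 2 values
    if sd == 0:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

With a header of “Array name”, “Sample name” and “Time(numeric)”, only the third column is treated as numeric, and its values are rescaled before display.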

Comments