Affymetrix Microarry by Bioconductor

Base one tutorial:

# Because the R package will output some figures, we need X11 forwarding support. 
# To conform you do have X11 forwarding service, run command 'xclock' to see if you can show a clock on your screen.

# If you can see the clock, go to next step. Otherwise, please follow this link to setup x11 forwarding:

# create and folder to work with and go into it
mkdir ~/data/affy-test && cd ~/data/affy-test

# Copy testing Affymetirx testing files and sample information file
/gpfs/data/shared/biomed/affy_cell_files/* .

# Take a look at the copied files
ls -al
total 152M
-rw-rw-rw-  1 ldong ccvstaff 5070371 Nov  6  2004 COLD_12H_SHOOT_REP1.cel
-rw-rw-rw-  1 ldong ccvstaff 5070416 Nov  6  2004 COLD_12H_SHOOT_REP2.cel
-rw-rw-rw-  1 ldong ccvstaff 5071728 Nov  6  2004 COLD_6H_SHOOT_REP1.cel
-rw-rw-rw-  1 ldong ccvstaff 5070448 Nov  6  2004 COLD_6H_SHOOT_REP2.cel
-rw-rw-rw-  1 ldong ccvstaff 5070567 Nov  6  2004 COLD_CONTROL_12H_SHOOT_REP1.cel
-rw-rw-rw-  1 ldong ccvstaff 5070655 Nov  6  2004 COLD_CONTROL_12H_SHOOT_REP2.cel
-rw-r----- 1 ldong ccvstaff  224 Nov  2 13:16 pdata.txt

# Take a look at the sample information file content
#### Notice the order of the file names is the same as the output of command "ls -al" above. This is very important to have the same order.
less pdata.txt
COLD_12H_SHOOT_REP1.cel sample  12h
COLD_12H_SHOOT_REP2.cel sample  12h
COLD_6H_SHOOT_REP1.cel  sample  6h
COLD_6H_SHOOT_REP2.cel  sample  6h
COLD_CONTROL_12H_SHOOT_REP1.cel control 12h
COLD_CONTROL_12H_SHOOT_REP2.cel control 12h

# Load R module
module load R/2.15

# Start R
R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

# Next, we will work under R. All commands will be copied and pasted after the R command tempt (>,as should above).

# First load libraries




# read in sample information

pd <- new("AnnotatedDataFrame", data = read.table("pdata.txt", header = FALSE, row.names = 1))

# Calculate gene expression and do quality control. This command will create a few figures.

eset <- affystart(groups = rep(1:3, each = 2),

                 groupnames = unique(paste(pData(pd)[,1], pData(pd)[,2], sep = "-")))

# Add sample information into gene express data

eset <- new("ExpressionSet", exprs = exprs(eset), phenoData = pd, annotation = annotation(eset))

# Set up a filter. Here,

A is: The value you want to exceed. k is: The number of elements that have to exceed A.

# Notice here '6' is in log2 scale (after log transformation).

# Here, we require at least the probe set has expression level higher than 2^6 in at least 3 of the 6 samples.

filt <- filterfun(

kOverA(3, 6)

# Find all the probe sets which pass the filter

index <- genefilter(eset, filt)

# Apply the filter

eset <- eset[index,]

# Find group name for each sample and build up design

grps <- paste(pData(eset)[,1], pData(eset)[,2], sep = ".")

design <- model.matrix(~ 0 + factor(grps))

colnames(design) <- levels(factor(grps))

# Fit gene repression data with design

fit <- lmFit(eset, design)

# list the unique groups


# Setup contrasting sample groups

contrasts <- makeContrasts(sample.12h-control.12h, sample.6h-sample.12h, levels=design)

# Fit a linear model and extract contrasts of interest.

fit2 <-, contrasts)

fit2 <- eBayes(fit2)

# Further filter and output the results

limma2annaffy(eset, fit2, design, contrasts, annotation(eset), pfilt = 0.00001, text=TRUE, interactive=FALSE)

# After this, you should see a few files
 [1] "COLD_12H_SHOOT_REP1.cel"         "COLD_12H_SHOOT_REP2.cel"        
 [3] "COLD_6H_SHOOT_REP1.cel"          "COLD_6H_SHOOT_REP2.cel"         
 [7] "Density plot.pdf"                "Digestion plot.pdf"             
 [9] "Expression values.txt"           "PCA plot.pdf"                   
[11] "pdata.txt"                       "sample.12h - control.12h.html"  
[13] "sample.12h - control.12h.txt"    "sample.6h - sample.12h.html"    
[15] "sample.6h - sample.12h.txt"     

# If you want to see all the available options for any R function/command, use this command. Here 'command' is the command you want to check:

# Quite R

# The step "limma2annaffy" above needs a probe set annotation database from Bioconductor, if the database is missing, you will see error like:
Loading required package: xlaevis2.db
Error in aaf.handler(chip = chip) : 
  Couldn't load data package xlaevis2.db
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'xlaevis2.db'
# Let me know if you see this error. Or you can directly print out the data using your own annotations

# Read probe annotation from text file, which should include all probe sets on the array
annot <- read.csv("ann.txt", header = TRUE, sep = "\t", row.names = 1) # assume the first column is probe set ID

# Get the analysis result for the first contrast
analysis = topTable(fit2, coef=1, number = 1000000000000)
colnames(analysis) = paste("1-", colnames(analysis), sep="")

The summary table contains the following information: logFC is the log2-fold change, the AveExpr is the average expression value accross all arrays and channels, the moderated t-statistic (t) is the logFC to its standard error, the P.Value is the associated p-value, the adj.P.Value is the p-value adjusted for multiple testing and the B-value (B) is the log-odds that a gene is differentially expressed (the-higher-the-better). Usually one wants to base gene selection on the adjusted P-value rather than the t- or B-values. More details on this can be found in the limma PDF manual (type 'limmaUsersGuide()') or on this FAQ page.

# Get the analysis result for the second contrast
analysis1 = topTable(fit2, coef=2, number = 1000000000000)
colnames(analysis1) = paste("2-", colnames(analysis1), sep="")

# Merge the two contrast analysis together
ana = merge(x=analysis, y=analysis1, by.x="1-ID", by.y="2-ID")

# Further merge with gene expression data
anal = merge(x=exprs(eset), y=ana, by.x='row.names', by.y="1-ID")

# Further merge with probe set annotation data and write the date into a text file
write.table(merge(x=anal, y=annot, by.x="Row.names", by.y="row.names", all.x=TRUE), file="all_data_with_analysis.txt", sep="\t", col.names = TRUE)

Now you should be able to work on your own data. Let me know if you have any question.