Raw paired-end bulk RNA-seq fastq files from Lee et al. (2016) [7] were downloaded from the NCBI Sequence Read Archive (Accession number: SRR2089755), and processed on the UT EduPod server. After reviewing sequencing read quality with FastQC[9], Illumina TruSeq adapters were removed from each sample with Trim Galore[10]. Trimmed reads were aligned to the hg19 human reference genome [11] using STAR (Spliced Transcripts Alignment to a Reference) aligner [12] following the Harvard Chan Bioinformatics Core Introduction to RNA-Seq using high-performance computing guide [13]. Aligned reads were assigned to genes using featureCounts [14] with the GENCODE v19 annotation [15], producing a raw count matrix with 57,820 features across all 20 samples.
Differential gene expression analysis was done by DESeq2 [16] in RStudio locally on my laptop (R version 4.5.2) following the Harvard Chan Bioinformatics Core DGE Workshop guide [17]. The raw count matrix was entered as a DESeqDataSet with condition as the design factor across the four tissue types and ColonTumor set as the reference level. I ran four pairwise comparisons: ColonTumor vs NormalColon, LiverMet vs NormalLiver, LiverMet vs ColonTumor, and NormalLiver vs NormalColon. Log2 fold change shrinkage was applied using ashr [18], and genes were considered significantly differentially expressed at FDR < 0.05 and |log2 fold change| > 1. Gene expression counts were normalized using rlog transformation with blind = TRUE for visualizations. Principal component analysis was performed on rlog-normalized counts using the plotPCA function, and volcano plots were made with EnhancedVolcano [19]. Heatmaps were generated using pheatmap [20] with scale = "row" for z-score scaling.
To identify genes upregulated in liver metastatic tumors relative to primary colon tumors independent of the liver-specific gene transcription they adopt to survive in their new liver environment, a liver background filtering step was applied. Genes significantly upregulated in normal liver relative to normal colon (NormalLiver vs NormalColon, FDR < 0.05, log2FC > 1) were identified and excluded from the LiverMet vs ColonTumor DEG list.
The expression of genes reported as significant DEGs between metastatic tumors and their primary tumor source by Lee et al. [7] and Bocuk et al. [6] were examined in the LiverMet vs ColonTumor comparison (without liver background filtering) to look at concordance between my analysis of the clinical samples and the original authors’ and to evaluate translational relevance of the mouse model respectively. Selected candidate genes were examined in the TCGA colon adenocarcinoma cohort using GEPIA2 [8] for expression box plots and Kaplan-Meier survival analysis.