RNA-seq data from matched primary colon tumors, liver metastases, normal colon tissue, and normal liver tissue from five patients were processed using the pipeline described in Methods. All 20 samples had at least 89% of reads mapping uniquely and a minimum of 30 million uniquely mapped reads, indicating sufficient data quality for downstream differential expression analysis.
PCA of rlog-normalised counts showed clear division of the samples into each group. PC1 explains 75% of the variance and seems to represent tissue type, with colon samples clustering to the left and liver at the far right. We can see the metastatic tumors from the liver have moved towards the normal liver samples in PC1. PC2 accounted for 9% of the variance and separates cancer samples, which cluster together near the top, from normal tissue samples below.
DESeq [16] analysis was run on the raw gene counts and ashr [18] was used for log fold change shrinkage on results. For downstream analysis, FDR < 0.05 and |log2 fold change| > 1 were used as significance cutoffs. 7742 differentially expressed genes (DEGs) were identified between normal liver and normal colon tissues. This large number of DEGs is consistent with tissue type explaining the majority of variance in PC1 in the PCA above. The distinct transcriptional profiles between these samples reflects the dramatically different biochemical processes that happen in these two organs, as the liver is responsible for filtering blood and has many roles in metabolism, while the colon absorbs water and excretes solid waste at the end of the gastrointestinal tract. The set of genes identified here as upregulated in the normal liver samples relative to the normal colon samples were considered background liver signal in later analysis of DEGs between the primary colon tumors and liver metastatic tumors.
When comparing the tumor samples to their respective adjacent normal tissues, 4662 DEGs were identified between colon tumors and normal colon tissue, while 7931 DEGs were identified between the tumor cells that metastasized to the liver and normal liver tissue. DEGs between the primary colon tumors and normal colon tissue capture the cancer-associated transcriptional changes, as they have the same tissue of origin. In contrast, the much larger number of DEGs in the liver metastasis compared to normal liver tissue reflects that both cancer-associated transcriptional changes and fundamental transcriptional differences between liver and colon tissue are present in these data.
The volcano plot for DEGs between tumors that had metastasized to the liver and the primary colon tumors sampled from each patient looked drastically different from the other comparisons. Only 788 DEGs were identified, and there were approximately six times more upregulated DEGs than downregulated, while the other sample comparisons were more symmetrical. Many of the upregulated genes labeled on the plot here were also some of the most significantly upregulated in the normal liver tissue relative to normal colon tissue and labeled in that volcano plot (Fig5a).
To look at this relationship further, this heatmap was made using the 788 DEGs between primary colon tumors and metastatic liver tumors. Based on expression of these genes, hierarchical clustering cleanly separated each sample type together. The long dark purple block in the normal liver tissue confirms that much of the DEGs between the colon and liver sourced tumors are genes related to tissue-specific differences that are upregulated in normal liver cells relative to normal colon cells. The liver tumor samples look
light purple for many of these genes, while both normal colon and colon tumors look orange. Another shorter dark purple block is present in normal colon, and many of these genes are dark purple in the colon tumor samples but light orange in tumors from the liver. Together, these regions of genes support that tumors in the liver are losing expression of colon-identity genes and gaining expression of liver-identity genes. This tissue-specific transcriptional reprogramming between colon tumors that metastasize to the liver has been characterized before [21].
To find specific gene expression changes that may underlie metastatic biology, I wanted to look beyond changes that help colon tumor cells take on a more liver-like identity. I filtered out DEGs enriched in normal liver relative to normal colon tissue samples, which I termed background liver signal. From the 788 DEGs between liver and colon sourced tumors, 60 remained after this filter step. For readability, the 20 DEGs with the most significant adjusted p-values were used to make another heatmap showing expression differences between primary colon tumors and liver metastatic tumors. Hierarchical clustering by sample type was clear.
The original Lee et al. publication of these clinical RNA-seq data was focused on finding fusion genes, but they also reported genes enriched in metastatic tumors relative to primary tumors after filtering out those also enriched in either normal adjacent tissue. They used FPKM normalization rather than DESeq2, accounting for some differences in replication when I compared my results for 12 of these genes. This heatmap clustered the samples correctly, and 11/12 (91%) were enriched in metastatic tumors relative to primary tumors, with 7/12 (58%) significant at FDR < 0.05. This concordance between different analysis methods for the same data, supports the strength of these results.
To address my initial question of how well Bocuk et al. had uncovered clinically relevant biological insights into metastatic disease, I took the list of 32 genes they claimed were related to liver metastasis development (based on the DEGs between tumors that grew in the liver after injection compared to the cultured CMT-93 cell line) and made another heatmap of gene expression in the human clinical tumor samples. The samples failed to group appropriately by hierarchical clustering, with one colon tumor placed with the liver tumors. Only 11/32 (34%) genes were enriched in liver metastatic tumors relative to the primary colon tumors, and only 1/32 (3%) was significant at FDR < 0.05 (CXCR4, CXC chemokine receptor type 4). The MMP2 gene was significanlty enriched in opposite direction from Bocuk et al.'s results, showing significantly higher expression in primary tumors in human samples. This poor replication rate would be consistent with fundamental differences between their CMT-93 portal vein injection and human disease, potentially due to their injection bypassing the early steps of tumor evolution necessary for metastasis. Additionally, these 32 DEGs were found by filtering their dataset down to only look at differential expression in the 119 genes from the commercial Qiagen Tumor Metastasis RT2 Profiler PCR Array. This filter may have limited them from identifying additional candidate genes involved in metastasis in their model. Comparing the liver tumor cells to the cultured cell line rather than a matched primary colon tumor may also add transcriptional differences from adaptation to cell culture conditions through repeated passaging [22] that would not be clinically relevant.
To further explore the DEGs that looked significant between primary colon tumors and metastatic liver tumors, I explored TCGA expression and survival data using the GEPIA2 web tool [90] for the top 20 significant DEGs from my data analysis and the single significant DEG that replicated from the portal vein injection mouse model. The TCGA colon adenocarcinoma (COAD) dataset with matched normal samples was used to make boxplots showing gene expression. The Kaplan-Meier curves compare survival between COAD patients split into a high or low expression group for each gene. From my top DEGs, only FOXC1 showed a significant result on survival (logrank p = 0.013, HR = 1.9). Expression of FOXC1 trended higher in tumors than matched normal tissue, but did not meet statistical significance. In contrast, CXCR4 showed no significant difference in expression or survival analysis (logrank p = 0.77, HR = 1.1) in the COAD data set (though expression did show a larger range in tumor compared to normal).
FOXC1 is a transcription factor associated with cellular plasticity [23]. Its association with worse overall survival and upregulation from normal tissue to primary tumor, and again to liver metastasis in my data, is consistent with how the metastatic cascade strongly selects for cells higher in plasticity [24].
CXCR4 is a chemokine receptor that interacts exclusively with its ligand CXCL12, which is much more highly expressed in the liver than in the colon [25]. Despite the lack of significant findings in my TCGA results, CXCR4’s role in cancer progression, and its upregulation in liver metastatic tumors relative to primary colon tumors have been well documented [26]. The matched COAD tumors and normal tissue samples on TCGA overwhelmingly consist of primary colon tumors, so the significace of this gene in metastasis to high-CXCL12 expressing tissues does not come through.
Transcriptomic profiling is a powerful tool for assessing the translational validity of animal models, allowing for interspecies comparisons to human clinical samples.
Most changes between metastatic liver tumors and their primary colon tumors were due to metastatic tumors losing colon-identity gene expression and gaining liver-identity gene expression.
The CMT-93 portal vein injection model showed poor translational validity in my analysis of human clinical samples. Artificial injection route, cell line culture adaptations, and commercially constrained gene selection could all contribute to this lack of concordance. However, CXCR4 was a robust cross-species candidate gene that was significantly related to liver tumors in both the mouse model and clinical samples.
Using data from TCGA may miss import biology relevant to metastatic disease since the database includes mostly primary tumors. Matched clinical datasets with normal tissue, primary tumor, and metastatic tumor from the same patient are essential for studying metastasis.