RNA-Seq is a sequencing method used to determine gene expression levels. RNA-Seq data originates from extracted RNA that was reverse transcribed into DNA and sequenced on a next-generation sequencing platform. The number of reads determined to have originated from each transcript (usually by alignment) is proportional to its expression level.

RNA-Seq data in the GDC is used to generate a gene expression profile for tumor samples across many cancer types and to determine which gene expression levels are responsible for tumor development. The GDC harmonizes RNA-Seq data by aligning raw RNA reads to the GRCh38 reference genome build and calculating gene expression levels with standardized protocols1. RNA-Seq data is mostly available for tumor samples, although some normal samples have associated RNA-Seq data.







RNA-Seq data is available as aligned reads (BAM) and as expression levels, provided both as raw counts and as values normalized with TPM, FPKM, or FPKM-UQ. Reads that did not align are also included in BAM files to facilitate the retrieval of the original raw data.
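To make the relationship between raw counts and the normalized values concrete, here is a minimal sketch of the textbook FPKM and TPM formulas. It is an illustration under stated assumptions, not the GDC pipeline: the function names, the toy counts and the gene lengths are hypothetical, the FPKM denominator is assumed to be the sum of all counts in the sample (the GDC's own protocol may define it differently), and FPKM-UQ is not covered.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of gene length per million counted fragments.

    Assumption: the library size is the sum of all counts in the sample.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Hypothetical toy sample: three genes whose counts scale with gene length.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

print("FPKM:", fpkm(counts, lengths_bp))
print("TPM: ", tpm(counts, lengths_bp))
```

In this toy sample every gene has the same count-to-length ratio, so both measures assign the three genes equal values. TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM.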

Over the course of a dozen years, TCGA generated over 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. The data, which has already led to improvements in our ability to diagnose, treat, and prevent cancer, will remain publicly available for anyone in the research community to use.

The Cancer Genome Atlas (TCGA) has accrued RNA-Seq-based transcriptome data for more than 4000 cancer tissue samples across 12 cancer types, but translating these data into biological insights remains a major challenge. We analyzed and compared the transcriptomes of 4043 cancer and 548 normal tissue samples from 21 TCGA cancer types and created a comprehensive catalog of gene expression alterations for each cancer type. By clustering genes into co-regulated gene sets, we identified seven cross-cancer gene signatures altered across a diverse panel of primary human cancer samples. A 14-gene signature extracted from these seven cross-cancer gene signatures precisely differentiated between cancerous and normal samples: the predictive accuracies of leave-one-out cross-validation (LOOCV) were 92.04%, 96.23%, 91.76%, 90.05%, 88.17%, 94.29% and 99.10% for BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC, respectively. A lung cancer-specific gene signature, containing the SFTPA1 and SFTPA2 genes, accurately distinguished lung cancer from other cancer samples: the predictive accuracies of LOOCV for the TCGA and GSE5364 data were 95.68% and 100%, respectively. These gene signatures provide rich insights into the transcriptional programs that trigger tumorigenesis and metastasis, and many genes in the signature gene panels may be of significant value to the diagnosis and treatment of cancer.

Recent advances in cancer genomics have created a rich resource for studying the causes of cancer. The Cancer Genome Atlas (TCGA)1 has accrued more than 10,000 cases of human cancer including over 25 different cancer types. Datasets including RNA-Seq, miRNA-Seq, Exon-Seq, somatic mutations, methylation and CNV for each case are publicly available via the TCGA Data Portal ( -data.nci.nih.gov/tcga/tcgaHome2.jsp) and the UCSC Cancer Genomics Hub. Translating these data into biological insights remains a major challenge. Several studies have analyzed genome-wide mutational patterns in different cancer types and identified genes harboring functional mutations implicated in cancerogenesis2,3,4,5. Cancer is thought to be driven by gene expression pattern changes due to the accumulation of mutations or epigenetic modifications; thus, a comprehensive characterization of alterations in gene expression will not only advance our understanding of cancer biology but also provide a large number of new potential diagnostic and therapeutic targets for cancer. Cheng et al.6 introduced a method to identify cancer-associated attractors and revealed some interesting biomolecular events shared among multiple cancer types based on microarray gene expression data. However, genome-wide association analysis of RNA-Seq transcriptome data across various TCGA cancer types has rarely been reported. RNA-Seq, a revolutionary technology for genome-wide gene expression profiling, offers several key advantages compared to microarrays7 and could better characterize the transcriptomic changes associated with human cancers.

In this study, we analyzed and compared the RNA-Seq transcriptomes of 4043 cancer and 548 solid tissue normal samples across 21 types of cancer from TCGA. We created a catalog of gene expression alterations for each cancer type, and our results show that the alterations in gene expression vary substantially between different tumor types. Studies have shown that cancer involves many different genes and that a majority of these genes have a small to moderate effect8, so it is difficult to detect these effects by single-gene analysis. By clustering genes into co-regulated gene sets, we are able to examine the cumulative effects of a group of functionally related genes. We performed gene set association analysis for each cancer type; our results revealed several common gene signatures shared by multiple cancer types and a lung cancer-specific gene signature. We also validated these signatures using several non-TCGA data sets. These cross-cancer and cancer-specific transcriptional aberrations improve our understanding of the etiology of human cancers and are of great importance for the diagnosis and treatment of cancer.
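The idea of examining a gene set as a whole, rather than one gene at a time, can be illustrated with a small sketch. The scoring scheme here (the mean of per-gene z-scores within each sample) is an assumption chosen for simplicity and is not taken from the study. The function names, gene labels and expression values are all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    m = mean(values)
    s = pstdev(values)
    # A gene with no variation carries no signal; map it to zeros.
    return [0.0 if s == 0 else (v - m) / s for v in values]


def gene_set_scores(expression, gene_set):
    """Return one score per sample: the mean z-score over the genes in the set.

    expression maps a gene name to a list of values, one per sample.
    """
    standardized = [zscores(expression[gene]) for gene in gene_set]
    n_samples = len(standardized[0])
    return [mean(row[i] for row in standardized) for i in range(n_samples)]


# Hypothetical expression values for four samples; the last sample is
# elevated in both GENE_A and GENE_B.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 10.0],
    "GENE_B": [2.0, 2.5, 3.0, 9.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}

scores = gene_set_scores(expression, ["GENE_A", "GENE_B"])
print(scores)
```

Averaging standardized values lets several modest, same-direction shifts add up to a clearer per-sample signal than any single gene provides, which is the motivation the paragraph above gives for working with gene sets.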

The cell cycle lies at the core of cancer16,17. In normal cells, the cell cycle is controlled by a series of signaling pathways by which a cell grows, replicates its DNA and divides. In cancers, as a result of mutations, this regulatory process malfunctions, resulting in uncontrolled cell proliferation that leads to carcinogenesis18,19. From a pathway perspective, we hypothesize that there may be two potential carcinogenic mechanisms, as illustrated in Fig. 1: (1) one or more driver mutations lie within a cell cycle-associated pathway, altering its expression pattern and consequently leading to cancer; (2) one or more driver mutations lie in an organ/tissue-specific pathway or another pathway not related to the cell cycle, which interacts with a cell cycle-associated pathway, alters its expression pattern and ultimately results in cancer. Since the deregulation of the cell cycle is a common characteristic shared by multiple cancer types, we expected that the expression of cell cycle-associated pathways would be altered across a range of cancers. By analyzing and comparing the transcriptome data of 12 cancer types, we can test this hypothesis.

We have shown that the cancerous and adjacent normal samples from BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC can be precisely classified using the 14-gene cross-cancer signature. To test whether the same holds true for non-TCGA data sources, we downloaded two RNA-Seq data sets, GSE40419 (ref. 146) and GSE50760 (ref. 147), and one microarray data set, GSE5364 (ref. 148), from the Gene Expression Omnibus (GEO). GSE40419 includes the RNA-Seq expression values for 87 lung adenocarcinomas and 77 adjacent normal tissues, while GSE50760 contains the RNA-Seq expression values of 54 samples (18 primary colorectal cancer, 18 liver metastasis and 18 normal colon) generated from 18 colorectal cancer patients. We performed LOOCV on these two data sets based on the expression values of the 14-gene cross-cancer signature. We found that the tumor and normal samples were accurately classified: the predictive accuracies for GSE40419 and GSE50760 were 97.14% and 93.33%, respectively. GSE5364 includes 341 samples from multiple solid cancers: 18 lung tumor samples, 12 lung normal samples, 183 breast tumor samples, 13 breast normal samples, 9 colon tumor samples, 9 colon normal samples, 9 liver tumor samples, 8 liver normal samples, 16 oesophagus tumor samples, 13 oesophagus normal samples, 35 thyroid tumor samples and 16 thyroid normal samples. LOOCV was carried out for the tumor and normal samples of each tumor type in this data set; the predictive accuracies for lung, breast, colon, liver, oesophagus and thyroid samples were 100%, 93.37%, 100%, 100%, 94.12% and 68.63%, respectively. These results show that our 14-gene cross-cancer signature precisely differentiated between tumor and normal samples for all tumor types in GSE5364 except thyroid. The signature's failure to effectively distinguish thyroid tumors from normal thyroid samples is consistent with the results from the TCGA data.
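The LOOCV procedure used throughout these results can be sketched as follows: each sample is held out in turn, a classifier is fitted on the remaining samples, and the held-out sample is predicted. The text does not say which classifier was used, so the nearest-centroid rule below is an assumption made purely for illustration. The function names and the two-feature toy data are likewise hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        n = counts[label]
        return sum((s / n - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=squared_distance)


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two signature genes measured in six samples.
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.2], [4.8, 5.1], [5.3, 4.9]]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

print(loocv_accuracy(X, y))
```

Because every sample is predicted by a model that never saw it, the reported accuracy is the fraction of held-out samples classified correctly. With strongly imbalanced classes it is worth comparing that fraction against the share of the larger class.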

We found that CLUSTER1520 is a lung cancer-specific gene signature. In the 548 adjacent normal tissue samples of 12 TCGA cancer types, the expression level of CLUSTER1520 in the lung tissue samples was strikingly higher than in any other tissue samples, and the same holds true for tumor samples if THCA tumor samples are excluded from the analysis (Fig. 3). Moreover, CLUSTER1520 showed a substantially reduced level of expression in the lung tumor samples as compared to lung normal samples. To test whether this signature can be used to differentiate lung tumors from other tumors, we divided all cancer samples from the 12 TCGA cancer types into two classes, lung cancer samples (LUAD, LUSC) and non-lung cancer samples (BLCA, BRCA, COAD, HNSC, LIHC, KICH, KIRC, KIRP, PRAD, THCA), and performed LOOCV on these two classes using the expression values of CLUSTER1520. The predictive accuracy was 95.68%; that is, lung cancer samples were effectively identified among the 12 TCGA cancer types based on the expression pattern of CLUSTER1520. We also validated CLUSTER1520 as a lung cancer-specific gene signature on a non-TCGA microarray data set (GSE5364). GSE5364 includes 6 tumor types, and we divided those tumor samples into two classes: lung tumor samples and non-lung tumor samples (breast, colon, liver, oesophagus, thyroid). The predictive accuracy of LOOCV for these two classes of tumor samples was 100%, demonstrating that lung and non-lung tumor samples were accurately classified based on CLUSTER1520. These results show that CLUSTER1520 is a lung cancer-specific gene signature and that genes in this signature are potential targets for developing novel lung cancer therapies.
