The Cancer Genome Atlas data processing software tools | Genome annotation
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Due to the complexity of this collection, numerous software tools have been developed for retrieving, exploring, visualizing, and analyzing TCGA data.
Assesses the effect of genes on survival using cancer samples. KM plotter is a web application, developed for meta-analysis-based biomarker assessment, that can be used for breast, ovarian, lung, gastric, and liver cancer. The software includes patients with a mean follow-up of 69/40/49/33 months. The subtool miRpower enables the validation of the prognostic relevance of microRNAs (miRNAs) in breast and liver cancer.
Provides a web resource for exploring, visualizing, and analyzing multidimensional cancer genomics data. cBioPortal reduces molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression, and proteomic events. The query interface, combined with customized data storage, enables researchers to interactively explore genetic alterations across samples, genes, and pathways and, when available in the underlying data, to link these to clinical outcomes.
Enables users to perform a diverse range of gene expression analyses. GEPIA is an interactive web application that analyzes the RNA sequencing expression data of more than 9,000 tumors and 8,000 normal samples from The Cancer Genome Atlas (TCGA) and the GTEx projects. It provides customizable functions such as tumor/normal differential expression analysis, profiling according to cancer types or pathological stages, patient survival analysis, similar gene detection, correlation analysis and dimensionality reduction analysis.
Performs secure analyses and visualization of users’ private functional genomics data sets in the context of public and shared genomic/phenotypic data sets. UCSC Xena is a Functional Genomics Browser that provides analytics, visualization and Galaxy integration for analyzing and viewing the public data hubs. It gives access to public databases (Xena Public Data Hubs) and allows users to mix in and compare private data (Xena Private Data Hubs).
Aids in querying, downloading, analyzing and integrating The Cancer Genome Atlas (TCGA) data. TCGAbiolinks can: i) facilitate retrieval of TCGA open-access data, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) allow users to download a specific version of the data and thus to easily reproduce earlier research results. It provides multiple analysis and visualization methods so that complete analysis pipelines can be developed easily.
Automatically generates, tests and deploys API clients for rapid response to API changes. Firebrowse provides a raw construct of an R function including all the comments, which will be rendered as the documentation when the final package is built. Firebrowse can be integrated into existing analysis workflows. Using such API clients instead of downloading flat files has several advantages, including having the latest data available, making a separate data import step unnecessary and avoiding data re-formatting, which is often an additional source of errors.
Profiles the content of a miRNA sequencing run. Given a set of aligned reads in one or more .sam files, the tool produces an annotated version of each .sam file in which every read is given an annotation based on its coordinates. Additional summary information about the content of each sample is also generated, including the miRNA species and other genomic features found.
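As a rough illustration of coordinate-based read annotation, the sketch below appends a label to each SAM alignment line according to the first feature interval that contains the read's leftmost mapped position. It is not the tool's own code: the feature table, the feature name `mir-demo` and the `ZF:Z:` tag are hypothetical choices made for the example, and a real pipeline would also need to consider read length, strand and overlapping features.

```python
from typing import List, Tuple

# Hypothetical feature table: (chromosome, start, end, name), 1-based inclusive.
Feature = Tuple[str, int, int, str]

def annotate_sam_line(line: str, features: List[Feature]) -> str:
    """Append a ZF:Z:<name> tag chosen from the read's leftmost position."""
    if line.startswith("@"):
        # SAM header lines pass through unchanged.
        return line
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[2], int(fields[3])  # RNAME and 1-based POS columns
    label = "unannotated"
    if chrom != "*":  # "*" marks an unmapped read
        for f_chrom, start, end, name in features:
            if f_chrom == chrom and start <= pos <= end:
                label = name
                break
    return "\t".join(fields + ["ZF:Z:" + label])

# Illustrative data only (assumption, not taken from any real annotation set).
features = [("chr1", 100, 180, "mir-demo")]
read_hit = "r1\t0\tchr1\t120\t255\t4M\t*\t0\t0\tACGT\tIIII"
read_miss = "r2\t0\tchr1\t500\t255\t4M\t*\t0\t0\tACGT\tIIII"

annotated_hit = annotate_sam_line(read_hit, features)
annotated_miss = annotate_sam_line(read_miss, features)
```

Counting the labels across all reads of a sample would give the kind of per-sample summary mentioned above.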
A software tool that integrates multi-resource omics data. CrossHub was designed to analyze TCGA transcriptomic and epigenomic data in the context of ENCODE, Jaspar and various miRNA target prediction algorithms. This approach is intended to reveal gene expression regulation mechanisms such as methylation, transcription factor (TF)-mediated transcription repression/activation and microRNA interference. CrossHub has a scalable design intended to cover a wider variety of the cancer types available in TCGA. This tool may be a starting point for integrating the data of several major projects such as TCGA and ENCODE.
Provides data-derived statistical network models for 8 human cancers. Cancer Landscapes (CL) contains several functions for biological interpretability of the network models, such as pathway analysis, drug-target recommendations, survival network analysis, and candidate gene selection. Cancer Landscapes is also a community server that enables you to mark and comment on predictions of interest, for instance to plan experiments with collaborators.
Provides data structures and methods to represent, manipulate, and integrate multi-assay genomic experiments. MultiAssayExperiment can integrate any data class that supports basic subsetting and dimension names. It supports many data classes by default without additional accommodations. The tool was used to visualize the overlap in assays performed for adrenocortical carcinoma patients. It also made it possible to confirm correlations between somatic mutation and copy number burden in colorectal cancer and breast cancer.
Offers a solution for the visualization, querying and exploration of data of interest within complex datasets. CGC supports best-practice, reproducible scientific research at scale. It preserves privacy and security and renders massive cancer genomic datasets available and usable for research. This tool can be considered as a gateway for an ecosystem of interoperable cloud resources to support scientific discovery.
Allows researchers to use publicly available data to study prognostic implications of genes of interest in multiple cancers. PROGgene is a survival analysis tool for performing biomarker identification with a large repository of public datasets. Users can upload their own gene expression datasets. The software can be used to conduct on-the-fly survival analysis and create survival plots (Kaplan-Meier, KM plots) based on the expression of user-supplied genes in user-selected datasets from multiple cancers.
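To make the idea of an expression-based Kaplan-Meier plot concrete, here is a minimal sketch of the product-limit estimator applied to two groups split at the median expression of a gene. It is a generic textbook calculation and not the code of any tool listed here; the function names and the six-patient data set are invented for illustration, and a real analysis would add a significance test (for example a log-rank test) and plotting.

```python
from statistics import median
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if the
    patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every patient who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def km_by_median_split(expression, times, events):
    """Split patients at the median expression and estimate one curve per group."""
    cutoff = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cutoff]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cutoff]
    high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
    low_curve = kaplan_meier([t for t, _ in low], [e for _, e in low])
    return high_curve, low_curve

# Invented example data: follow-up in months, event flags, gene expression.
times = [5, 8, 12, 12, 20, 25]
events = [1, 0, 1, 1, 0, 1]
expression = [2.0, 7.5, 3.1, 9.0, 1.2, 8.4]

overall_curve = kaplan_meier(times, events)
high_curve, low_curve = km_by_median_split(expression, times, events)
```

Each curve is a step function: the survival estimate stays constant between event times and drops only when an event is observed, while censored patients simply leave the risk set.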
An open source software package to obtain the TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. In a user-friendly format with one single function call, our package downloads and fully processes the desired TCGA data to be seamlessly integrated into a computational analysis pipeline. No further technical or biological knowledge is needed to utilize our software, thus making TCGA data easily accessible to data scientists without specific domain knowledge.
Generates an interpretable gene factor matrix by constraining the factor matrices based on prior classification information (such as functional gene sets). GIFT offers interpretable factor matrices by encoding prior knowledge as a regularization term in its objective function. A distinctive feature of this software is the use of mask matrices in the regularization term of the loss function, which allows a soft regularization. It can also identify non-member genes that are related to the gene set.
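The effect of a mask matrix in a regularization term can be sketched as follows. This is a simplified two-dimensional illustration under assumed conventions, not GIFT's actual objective: here a mask entry of 1 marks a gene that is not a member of the gene set tied to that factor, the penalty is taken to be `lam` times the sum of squared masked factor values, and the function names, `lam` and `lr` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def masked_penalty(factor: Matrix, mask: Matrix, lam: float) -> float:
    """Penalty lam * sum((mask * factor)^2) over all entries."""
    total = 0.0
    for row_a, row_m in zip(factor, mask):
        for a, m in zip(row_a, row_m):
            total += (m * a) ** 2
    return lam * total

def shrink_step(factor: Matrix, mask: Matrix, lam: float, lr: float) -> Matrix:
    """One gradient step on the penalty alone: a <- a - lr * 2 * lam * m^2 * a."""
    return [
        [a - lr * 2.0 * lam * (m ** 2) * a for a, m in zip(row_a, row_m)]
        for row_a, row_m in zip(factor, mask)
    ]

# Rows are genes, columns are factors (assumed layout for this example).
factor = [[0.9, 0.4],
          [0.2, 0.8]]
mask = [[0.0, 1.0],   # gene 0 is a member of set 0, not of set 1
        [1.0, 0.0]]   # gene 1 is a member of set 1, not of set 0

penalty = masked_penalty(factor, mask, lam=0.5)
updated = shrink_step(factor, mask, lam=0.5, lr=0.5)
```

In this sketch the masked entries shrink toward zero but are not forced to zero, which is what "soft" means here: a non-member gene with strong support from the data can keep a non-zero value and so still be linked to the gene set.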
Offers a visual web editor for cancer pathways. PathwayMapper can be used for viewing pre-curated cancer pathways with the ability to overlay genomic alteration data, and as an interactive graphical editing tool for creating and modifying pathways. It supports remote users collaborating on pathway editing. The tool supports alignment operations by showing guidelines as two nodes are aligned as well as facilitating operations for horizontal and vertical alignment of two or more nodes based on the first selected node.
Produces an interpretable factor matrix by silencing the factor values associated with non-members of pre-defined gene groups. Silenced-TF forces the factor values of masked regions to zero and then updates the remaining values using the row-wise update rule of the P-Tucker software. This software can pinpoint the significance of genes but cannot find new genes that show an association with a latent function; this limitation is addressed by the GIFT method.
Allows the querying and visualization of the expression, DNA methylation and clinical Cancer Genome Atlas (TCGA) data on a single-gene level. MEXPRESS is a data visualization tool that shows correlations among datasets and can integrate visualizations of different data types for hundreds of samples. The software can be useful to test hypotheses that concern the discovery of DNA methylation or expression-based biomarkers.
Enables researchers to study the expression level of genes and to compare primary tumor with normal tissue samples. UALCAN is an interactive web resource that provides the information and plotting capability needed to examine stage-, grade-, race- and other subgroup-specific expression patterns from transcriptome sequencing data. This portal can aid in the identification of candidate biomarkers of specific cancer subclasses, with diagnostic, prognostic or therapeutic implications. It can also be used as a platform for in silico validation of target genes.
Enables data processing and prioritization of candidate cancer genes. OncoScape is a method for integrating several genome-wide datasets to characterize the molecular aberration landscape of human cancer. The software provides functionality to assess alterations of single genes for each data type and each cancer type individually. It can be applied in several additional contexts, such as comparing patients who respond to specific treatments with those who are resistant, or identifying specific characteristics of different subtypes of a disease.
Facilitates expression-based isoform-level analysis of large-scale TCGA (The Cancer Genome Atlas) multi-cancer RNA-seq data. ISOexpresso is a web application that provides isoform-level expression profiles for genes of interest and creates visualizations. The software includes two functions: (1) the Isoform Expression View function, which delivers condition- or tissue-specific isoform expression patterns, and (2) the User Data Annotation function, which offers annotation results of user-provided variants via expression-based prioritization of gene isoforms.