Computational protocol: R-Spondin chromosome rearrangements drive Wnt-dependent tumour initiation and maintenance in the intestine

Protocol publication

[…] The quality of the raw FASTQ files was checked with FastQC. Raw FASTQ reads were mapped to the mouse reference genome GRCm38 using STAR two-pass alignment (v2.4.1d; default parameters), gene expression was estimated using Cufflinks (v2.2.1), and per-gene counts were quantified using HTSeq (v0.6.0). An FPKM threshold of 0.1 was set to define expressed genes, and the DESeq2 variance-stabilizing transformation was applied to the read counts for downstream unsupervised analyses. To confirm the presence of the PTPRK–RSPO3 gene fusion, we mapped reads to GRCm38 using STAR chimeric alignment, extracted reads that mapped within +1/−1 base pair of the expected junction location, realigned them to the fused and wild-type sequences, and visualized the alignments with IGV. We additionally examined the read counts mapping to the RSPO3 and PTPRK exons to confirm the fusion. To confirm that the APC group models the transcriptional profiles seen in colorectal cancer, we extracted all gene sets from the C2 MSigDB database that contain either the keyword ‘colorectal’ or ‘colon’. We then performed GSEA pre-ranked analysis (v2.2.1) on the log2 fold changes calculated after testing differential expression between the APC-mutant and APC-WT groups. To provide additional validation, we developed a gene signature of colorectal cancer (COAD) using TCGA data downloaded from Firehose. The downloaded data included 459 COAD samples and 41 normal control samples. We tested differential expression between normal and cancer samples to develop a 1158-gene signature (adjusted P-value threshold of 0.01 and log2 fold change threshold of 3). We again used GSEA pre-ranked analysis on the log2 fold changes between APC-mutant and APC-WT to confirm enrichment of our COAD gene signature. We used R (v3.2.2) and the gplots package to create all visualizations and to perform hierarchical clustering and principal component analysis.
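The gene-set selection, signature filtering, and ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' code (they worked in R with DESeq2 and GSEA). The function names are hypothetical, the keyword match is assumed to apply to the gene-set name, and the signature filter assumes the stated thresholds mean adjusted P-value < 0.01 and absolute log2 fold change > 3; the excerpt does not specify either detail.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Keywords taken from the protocol text.
KEYWORDS = ("colorectal", "colon")


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets


def select_by_keyword(
    gene_sets: Dict[str, List[str]], keywords: Iterable[str] = KEYWORDS
) -> Dict[str, List[str]]:
    """Keep gene sets whose NAME contains any keyword (assumption: name, not description)."""
    keys = [k.lower() for k in keywords]
    return {
        name: genes
        for name, genes in gene_sets.items()
        if any(k in name.lower() for k in keys)
    }


def select_signature(
    results: Iterable[Tuple[str, float, Optional[float]]],
    padj_cutoff: float = 0.01,
    lfc_cutoff: float = 3.0,
) -> List[str]:
    """Filter (gene, log2FC, padj) rows.

    Assumption: padj < cutoff and |log2FC| > cutoff; rows with missing padj are dropped.
    """
    return [
        gene
        for gene, lfc, padj in results
        if padj is not None and padj < padj_cutoff and abs(lfc) > lfc_cutoff
    ]


def rank_by_log2fc(log2fc: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order genes from highest to lowest log2 fold change, as a pre-ranked list expects."""
    return sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True)
```

The ranked list returned by `rank_by_log2fc` could be written out as a two-column, tab-separated file for a pre-ranked enrichment run; how ties and missing fold changes were handled in the original analysis is not stated.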
We used the vertebrate homology list provided by Mouse Genome Informatics (ftp://ftp.informatics.jax.org/pub/reports/HOM_MouseHumanSequence.rpt) to convert mouse gene symbols to human gene symbols. We used DESeq2 (v1.12.3) to perform all differential expression analyses, and used Nextflow (http://dx.doi.org/10.6084/m9.figshare.1254958) to implement our computational pipelines. […]
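As an illustration of the symbol conversion, the sketch below builds a mouse-to-human lookup from a tab-delimited homology report. It is a sketch under assumptions, not the authors' implementation: the column names (`HomoloGene ID`, `NCBI Taxon ID`, `Symbol`) are assumed and should be checked against the header of the downloaded file, and the function name is hypothetical. The taxon IDs 10090 (mouse) and 9606 (human) are the standard NCBI identifiers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

MOUSE_TAXON = "10090"  # NCBI taxon ID for Mus musculus
HUMAN_TAXON = "9606"   # NCBI taxon ID for Homo sapiens


def build_mouse_to_human(
    report_text: str,
    key_col: str = "HomoloGene ID",   # assumed name of the homology-class column
    taxon_col: str = "NCBI Taxon ID",  # assumed name of the taxon column
    symbol_col: str = "Symbol",        # assumed name of the gene-symbol column
) -> Dict[str, List[str]]:
    """Map each mouse symbol to the human symbols sharing its homology class."""
    # Group symbols by homology class, split by species.
    groups = defaultdict(lambda: {MOUSE_TAXON: [], HUMAN_TAXON: []})
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    for row in reader:
        taxon = row[taxon_col].strip()
        if taxon in (MOUSE_TAXON, HUMAN_TAXON):
            groups[row[key_col]][taxon].append(row[symbol_col].strip())

    # Mouse genes without a human partner in the same class are left unmapped.
    mapping: Dict[str, List[str]] = {}
    for members in groups.values():
        if not members[HUMAN_TAXON]:
            continue
        for mouse_symbol in members[MOUSE_TAXON]:
            mapping[mouse_symbol] = sorted(members[HUMAN_TAXON])
    return mapping
```

Returning a list per mouse symbol keeps one-to-many homologs visible; how such cases were resolved in the original analysis is not described in the excerpt.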

Pipeline specifications

Software tools: FastQC, STAR, Cufflinks, HTSeq, DESeq2, IGV, GSEA, gplots, Nextflow
Databases: TCGA Data Portal, MSigDB
Application: RNA-seq analysis
Organisms: Homo sapiens, Mus musculus
Diseases: Colonic Neoplasms, Neoplasms