Computational protocol: Disruption of NCOA2 by recurrent fusion with LACTB2 in colorectal cancer

Protocol publication

[…] Total RNA isolated from the primary colon tumor tissue was subjected to transcriptome sequencing. Poly-A-containing mRNA purification, double-stranded cDNA synthesis, end repair, 3' end adenylation, adapter ligation and enrichment of DNA fragments for RNA-Seq library construction were performed using the reagents provided in the Illumina TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA, USA). RNA-Seq library sequencing was then performed on an Illumina HiSeq 2000 (Illumina) according to the manufacturer's instructions. The sequencing reads were first filtered to remove low-quality reads and reads containing adapters. All remaining qualified reads were aligned to the human reference genome hg19 downloaded from UCSC (Santa Cruz, CA, USA). In total, 11.5 Gb of qualified sequence reads were obtained, of which ~71% mapped uniquely to the human genome. The fragments per kb of exon per million fragments mapped (FPKM) expression level of each gene was calculated using the program Cufflinks. Both the SOAPfuse and deFuse (Vancouver, BC, Canada) programs were used to scan the transcriptome data for fusion RNAs. The de novo transcriptome assembler SOAPdenovo-Trans was then used for transcript assembly to double-check gene fusions matching somatic SVs [...]

Genomic DNA from primary tumor and peripheral blood was fragmented to an average size of 500 nucleotides. Standard Illumina protocols (Illumina) and Illumina paired-end adapters (Illumina) were then used for library preparation. DNA library sequencing was then performed on an Illumina Solexa sequencing platform (Illumina) according to the manufacturer's instructions. SVs were called with a general clustering approach similar to dRanger (Cambridge, MA, USA). First, the fragment-length distribution was analyzed to evaluate the library insert size range and to select discordant read pairs, that is, pairs whose mates map to different chromosomes, or map to the same chromosome at unexpected positions (> max library insert size + 4.5 × s.d.) or in unexpected orientations (incorrect order on opposite strands, or any order on the same strand). Second, clusters of discordant pairs were taken to indicate potential rearrangements and were used to determine the rough range of each SV. To retain only somatic SVs, clusters with any supporting discordant pairs in the same region of the matched normal sample were discarded. Moreover, clusters with fewer than five supporting read pairs or falling within UCSC simple repeat regions were discarded. To identify the exact SV breakpoints at single-nucleotide resolution, an in-house program, SeekSV, was used. Similar to CREST, SeekSV uses next-generation short reads that align only partially to the reference genome to call SVs, in four steps: (i) obtain soft-clipped reads from the BWA alignment results; (ii) align the clipped sequences (the unmapped parts of the soft-clipped reads) to the human reference genome hg19; (iii) obtain the SV breakpoints from the break-end positions in the alignment results; (iv) obtain somatic SV breakpoints by comparing SVs in tumor with those in blood. SV breakpoints identified by SeekSV that were concordant with SV clusters predicted by the clustering method above were then selected to fill the SV gaps. Sequences disrupted by SVs were annotated to gene regions according to the refGene database downloaded from UCSC. […]
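
As an illustration only, the discordant-pair definition and the cluster filters described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' clustering pipeline or SeekSV: the names `ReadPair`, `Cluster`, `discordance` and `is_somatic_candidate` are hypothetical, the mate distance is approximated from leftmost mapping positions (read length is ignored), and the maximum insert size and its standard deviation are assumed to have been estimated beforehand from the library's fragment-length distribution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int        # leftmost mapped position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int        # leftmost mapped position of mate 2
    reverse2: bool   # True if mate 2 maps to the reverse strand


def discordance(pair: ReadPair, max_insert: float, sd: float,
                k: float = 4.5) -> Optional[str]:
    """Return a label if the pair is discordant, otherwise None.

    The distance cutoff follows the text: max insert size + 4.5 x s.d.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.reverse1 == pair.reverse2:
        # Any order on the same strand is unexpected.
        return "same_strand"
    # Opposite strands: the forward-strand mate is expected upstream.
    if pair.reverse1:
        fwd_pos, rev_pos = pair.pos2, pair.pos1
    else:
        fwd_pos, rev_pos = pair.pos1, pair.pos2
    if fwd_pos > rev_pos:
        return "wrong_order"
    if rev_pos - fwd_pos > max_insert + k * sd:
        return "too_far"
    return None


@dataclass(frozen=True)
class Cluster:
    """Hypothetical summary of one cluster of discordant pairs."""
    tumor_support: int      # discordant pairs supporting it in tumor
    normal_support: int     # discordant pairs in the same region, normal
    in_simple_repeat: bool  # overlaps a UCSC simple repeat region


def is_somatic_candidate(cluster: Cluster, min_support: int = 5) -> bool:
    """Keep clusters with >= 5 tumor pairs, none in normal, not in repeats."""
    return (cluster.tumor_support >= min_support
            and cluster.normal_support == 0
            and not cluster.in_simple_repeat)
```

In this sketch a pair is labelled by the first criterion it fails, so a same-strand pair is never also tested for distance. In practice the strand and position fields would be read from the BWA alignment records rather than constructed by hand.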

Pipeline specifications

Software tools: dRanger, SeekSV, BWA
Application: WGS analysis
Diseases: Colonic Neoplasms, Neoplasms, Colorectal Neoplasms