Computational protocol: High-resolution detection of chromosomal rearrangements in leukemias through mate pair whole genome sequencing

[…] Libraries were prepared from 1 microgram of high molecular weight genomic DNA using Illumina's Nextera Mate Pair Sample Preparation Kit, according to the manufacturer's instructions for gel-free preparation of a library with a 2 kb effective insert size (size distribution mode 2 kb). The libraries were sequenced on an Illumina HiSeq 2500 sequencer, 2x100 bp, to an average raw coverage depth of 5x. Raw sequence reads were base-called using CASAVA RTA 1.18. [...]

Following Illumina guidelines for mate-pair post-processing, adapter sequences were removed using Trimmomatic v0.32. The remaining pairs were aligned to the hg19 human reference genome sequence using BWA 0.7.4-r385, resulting in 3x mapped coverage. The mapped reads were processed using the locally developed software TIDDIT (publicly available under the GNU General Public License version 3.0), which implements a sliding-window analogue of a previously published procedure. A list of detected rearrangements was generated by FindTranslocations with a sliding window size of 10000, a minimum insert size of 1000, a maximum insert size of 100000, outtie orientation, a minimum supporting pairs cutoff of 8, and all other parameters at their defaults.

Non-unique events were filtered out using a set of 35 germline genomes investigated by the same method. None of these 35 genomes harbored rearrangements in the regions recurrently involved in leukemia that were investigated in our sample cohort. Links supporting events that connect areas with repetitive sequences, annotated using repeat tracks from the UCSC Genome Browser, were considered misalignment artifacts and were not investigated further. The remaining events are presented as a list of putative leukemia-specific events and graphically as circle plots. A schematic representation of the whole workflow is shown in the accompanying figure. All events were manually inspected at the nucleotide level using the Integrative Genomics Viewer (IGV).
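The three filtering steps described above (support cutoff, removal of events also seen in the germline set, removal of repeat-associated links) can be illustrated with a minimal Python sketch. This is not TIDDIT or FindTranslocations code. The `Event` record, the helper names, the 10000 bp matching tolerance, the rule that both breakpoints must fall in repeats, and all coordinates are assumptions made purely for illustration; only the cutoff of 8 supporting pairs comes from the protocol.

```python
from dataclasses import dataclass

MIN_SUPPORTING_PAIRS = 8   # cutoff stated in the protocol
MATCH_TOLERANCE = 10_000   # assumption: breakpoints within 10000 bp count as the same event


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one candidate rearrangement (not a TIDDIT format)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    supporting_pairs: int


def same_event(x, y, tol=MATCH_TOLERANCE):
    """True if x and y join the same chromosomes with both breakpoints within tol bp.

    Assumes breakpoints are stored in a consistent order (a before b) in every record.
    """
    return (
        x.chrom_a == y.chrom_a
        and x.chrom_b == y.chrom_b
        and abs(x.pos_a - y.pos_a) <= tol
        and abs(x.pos_b - y.pos_b) <= tol
    )


def in_repeat(chrom, pos, repeats):
    """True if pos lies inside any half-open [start, end) repeat interval on chrom."""
    return any(start <= pos < end for start, end in repeats.get(chrom, []))


def filter_events(events, germline_events, repeats):
    """Keep events that pass the support, germline and repeat filters."""
    kept = []
    for ev in events:
        if ev.supporting_pairs < MIN_SUPPORTING_PAIRS:
            continue  # too few supporting mate pairs
        if any(same_event(ev, g) for g in germline_events):
            continue  # non-unique: also present in the germline set
        # Assumption: a link is an artifact only if BOTH ends fall in repeats.
        if in_repeat(ev.chrom_a, ev.pos_a, repeats) and in_repeat(ev.chrom_b, ev.pos_b, repeats):
            continue
        kept.append(ev)
    return kept


# Made-up coordinates, for illustration only.
events = [
    Event("chr9", 1_500_000, "chr22", 2_500_000, 14),  # passes all filters
    Event("chr1", 1_000_000, "chr2", 2_000_000, 5),    # too few supporting pairs
    Event("chr3", 5_000_000, "chr7", 8_000_000, 20),   # matches a germline event
    Event("chr4", 100_500, "chr5", 200_500, 12),       # both ends in repeats
]
germline = [Event("chr3", 5_004_000, "chr7", 7_998_000, 9)]
repeats = {"chr4": [(100_000, 101_000)], "chr5": [(200_000, 201_000)]}

kept = filter_events(events, germline, repeats)
print(kept)
```

With real data, the linear scans over the germline events and repeat intervals would likely be replaced by an interval index; the simple loops are kept here so that each filter stays visible.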
Polymerase chain reaction (PCR) and Sanger sequencing were used to validate unique events that had sufficient support on manual inspection but had escaped detection during routine analysis. Primer sequences and PCR conditions are described in the supplementary material. […]

Pipeline specifications

Software tools: BaseSpace, Trimmomatic, BWA, TIDDIT, IGV
Databases: UCSC Genome Browser
Applications: WGS analysis, Genome data visualization
Organisms: Ilex paraguariensis, Homo sapiens, Danio rerio
Diseases: Leukemia
Chemicals: Nucleotides