Protocol publication

[…] Alignment and variant calling were performed within the BCbio framework. Reads were aligned to the hg19 human reference genome assembly using BWA; no realignment or recalibration was performed. Duplicate reads were removed from the final BAM files using Samblaster. Variants were called using VarDict with the following thresholds: minimum read support of 3, minimum mean position in reads of 5, minimum mean base quality Phred score of 25, and minimum mean mapping quality score of 10.

Mutations were annotated using SnpEff according to the NCBI RefSeq gene model. Known somatic and germline actionable mutations (i.e., those known to be responsive to a targeted therapy) with allele frequency ≥ 2.5% were prioritized. Common germline SNPs, specifically SNPs not reported in COSMIC but reported in dbSNP and annotated mostly as benign or likely benign according to ClinVar, or having a global minor allele frequency > 0.0025 in TCGA, were removed from downstream analysis. Additionally, variants were filtered by cohort frequency: novel variants present in ≥ 40% of samples and in ≥ 10 samples with an average allele frequency < 15%, and any other variant present in ≥ 75% of samples and in ≥ 10 samples, were considered too common to be functional. Germline variants found in the 14 matched normal samples were also excluded from downstream analysis. A comprehensive annotated mutation file is included as supplemental material.

Seq2C was used to estimate gene copy-number variation by comparing normalized mean gene coverage across samples in a cohort. Four cohorts were processed separately: two external datasets, both using regions from the Agilent SureSelect Human All Exon V4 capture BED file; 38 RR samples using the Agilent SureSelect Human All Exon V5 capture BED file; and 37 RR samples using the Agilent SureSelect Human All Exon V5+UTR capture BED file. Outlier genes with low coverage were removed using a 3x upper/lower quartile threshold, and the filtered data were segmented with the DNAcopy package, using default settings, in the R statistical software. GISTIC2.0 was used to identify consensus copy-number alterations with the following settings: gene.gistic = yes, amplifications.threshold = 0.2, deletions.threshold = 0.2, join.segment.size = 4, qv.thresh = 0.25, remove.X = yes, cap.val = 1.5, confidence.level = 0.75, broad.length.cutoff = 0.98, max.sample.segs = 2500, arm.peel = no. […]
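
The cohort-frequency filter described above can be written as a small predicate. The following is a minimal sketch, not the authors' code: the `CohortVariant` structure, its field names, and the `too_common` function are hypothetical. It also assumes that "novel" means a variant not reported in the known-variant databases, and that the ≥ 75% rule applies to every variant not already caught by the novel-variant rule.

```python
from dataclasses import dataclass


@dataclass
class CohortVariant:
    """Per-variant cohort summary (hypothetical structure for illustration)."""
    is_novel: bool          # assumed: not reported in known-variant databases
    n_samples: int          # number of cohort samples carrying the variant
    mean_allele_freq: float # mean allele frequency as a fraction (0.12 = 12%)


def too_common(variant: CohortVariant, cohort_size: int) -> bool:
    """Return True if the variant is considered too common to be functional."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")

    # Both rules require the variant to be seen in at least 10 samples.
    if variant.n_samples < 10:
        return False

    fraction = variant.n_samples / cohort_size

    # Rule 1: novel variants in >= 40% of samples with mean allele frequency < 15%.
    if variant.is_novel and fraction >= 0.40 and variant.mean_allele_freq < 0.15:
        return True

    # Rule 2 (assumed to cover all remaining variants): present in >= 75% of samples.
    return fraction >= 0.75
```

For example, in a cohort of 75 samples, a novel variant seen in 40 samples with a mean allele frequency of 10% would be flagged, whereas the same variant with a mean allele frequency of 20% would be kept because it is present in fewer than 75% of samples.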

Pipeline specifications

Software tools bcbio-nextgen, BWA, SAMBLASTER, VarDict, SnpEff, GISTIC
Databases ClinVar, dbSNP, TCGA Data Portal
Application WES analysis
Organisms Homo sapiens
Diseases Lymphoma, Neoplasms