Protocol publication

[…] The coding regions and exon-intron boundaries (plus ≥ 10 bp of each intron) of 56 genes were enriched from germline DNA using a custom-designed HaloPlex Targeted Enrichment Assay panel (Agilent Technologies, Santa Clara, CA, USA). The libraries were sequenced on a HiSeq2500 Genome Analyzer (Illumina, San Diego, CA, USA) as described previously.
Sequencing data were processed and analysed using an in-house bioinformatics pipeline constructed using SEQLINER v0.1a (http://bioinformatics.petermac.org/seqliner). Raw reads (FASTQ files) were first quality-checked using FastQC (v0.11.2; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed using cutadapt (v1.7.1) to ensure high read quality. Filtered reads were then aligned to the human reference genome (GRCh37/hg19) using the Burrows-Wheeler Aligner (BWA), with base quality score recalibration and indel realignment performed using the Genome Analysis Toolkit (GATK v3.2.2). GATK UnifiedGenotyper v2.4 (Broad Institute, Cambridge, MA, USA), HaplotypeCaller and PLATYPUS were used for variant calling. Annotation of variants was performed using a local copy of the Ensembl version R73 database and a customised version of Ensembl Variant Effect Predictor. Variants were determined by reference to the canonical transcripts, using the Ensembl definition of a canonical transcript: (1) the longest Consensus Coding Sequence Project translation with no stop codons; (2) if there is no (1), the longest Ensembl/Havana merged translation with no stop codons; (3) if there is no (2), the longest translation with no stop codons; (4) if there is no translation, the longest non-protein-coding transcript. Only variants that were identified by at least two variant callers, with a total read depth of at least ten and an alternate allele read proportion ≥ 20%, were included in the analysis. Loss-of-function (LoF) mutations were defined as stop-gained, frameshift or essential splice site mutations.
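The canonical-transcript rule and the inclusion thresholds described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SEQLINER pipeline itself: the `Transcript` and `Call` records, their field names, the `source` tiers and the Sequence Ontology style consequence labels are all hypothetical, and the alternate allele proportion is assumed to be alternate reads divided by total read depth.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Thresholds stated in the text.
MIN_CALLERS = 2
MIN_DEPTH = 10
MIN_ALT_FRACTION = 0.20

# Assumed consequence labels (Sequence Ontology style); the labels used
# by the original pipeline are not given in the text.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


@dataclass
class Transcript:
    """Hypothetical transcript record; not the Ensembl schema."""
    transcript_id: str
    source: str               # "ccds", "merged" (Ensembl/Havana) or "other"
    translation_length: int   # 0 if the transcript has no translation
    transcript_length: int
    has_stop_codon: bool = False  # stop codon inside the translation


@dataclass
class Call:
    """Hypothetical variant call merged across callers."""
    callers: Set[str]   # names of the callers that reported the variant
    depth: int          # total read depth at the site
    alt_reads: int      # reads supporting the alternate allele
    consequence: str    # predicted consequence on the canonical transcript


def pick_canonical(transcripts: List[Transcript]) -> Optional[Transcript]:
    """Apply tiers (1) to (4): CCDS, merged, any translation, then non-coding."""
    usable = [t for t in transcripts
              if t.translation_length > 0 and not t.has_stop_codon]
    for tier in ("ccds", "merged", None):  # None means any source
        pool = [t for t in usable if tier is None or t.source == tier]
        if pool:
            return max(pool, key=lambda t: t.translation_length)
    noncoding = [t for t in transcripts if t.translation_length == 0]
    # Returns None when nothing qualifies under any tier.
    return max(noncoding, key=lambda t: t.transcript_length, default=None)


def passes_inclusion(call: Call) -> bool:
    """At least two callers, depth of at least ten, alternate fraction of at least 20%."""
    if len(call.callers) < MIN_CALLERS or call.depth < MIN_DEPTH:
        return False
    return call.alt_reads / call.depth >= MIN_ALT_FRACTION


def is_lof(call: Call) -> bool:
    """Stop-gained, frameshift or essential splice site consequence."""
    return call.consequence in LOF_CONSEQUENCES
```

In this sketch a call reported by a single caller is rejected regardless of depth, and the depth check runs before the division so that a zero-depth site cannot raise an error.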
The in silico assessment tools Condel, Polymorphism Phenotyping version 2 (PolyPhen-2), SIFT, Combined Annotation Dependent Depletion (CADD) and rare exome variant ensemble learner (REVEL) were used to examine the likely pathogenicity of missense variants. Variants were defined as “likely deleterious” when predicted deleterious or damaging by Condel, PolyPhen-2 or SIFT, or when they had a CADD score ≥ 15 or a REVEL score ≥ 0.5. The Exome Aggregation Consortium (ExAC) and Exome Variant Server (EVS) databases were used as additional references for the frequency of variants in the general population. Because this study was focused on the identification of moderate- to high-penetrance alleles, which are expected to be rare, only variants with a population allele frequency ≤ 0.001 (in both overall and European Caucasian populations) were assessed. Variants were visually inspected using Integrative Genomics Viewer to exclude artifacts. […]
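The “likely deleterious” definition and the population-frequency cut-off above amount to a pair of simple predicates. The sketch below is illustrative only: the `MissenseScores` record and the prediction labels are hypothetical, and two choices are assumptions rather than statements from the text. The first is counting a PolyPhen-2 "possibly damaging" call as damaging. The second is treating a variant that is absent from a frequency database as rare.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds stated in the text.
CADD_MIN = 15.0
REVEL_MIN = 0.5
MAX_POPULATION_AF = 0.001

# Assumed prediction labels; whether "possibly_damaging" counts as
# damaging is not stated in the text.
DAMAGING_LABELS = {"deleterious", "probably_damaging", "possibly_damaging"}


@dataclass
class MissenseScores:
    """Hypothetical per-variant record; None means the tool gave no result."""
    condel: Optional[str] = None
    polyphen2: Optional[str] = None
    sift: Optional[str] = None
    cadd: Optional[float] = None
    revel: Optional[float] = None


def is_likely_deleterious(s: MissenseScores) -> bool:
    """True if any one of the five tools flags the variant."""
    if any(label in DAMAGING_LABELS for label in (s.condel, s.polyphen2, s.sift)):
        return True
    if s.cadd is not None and s.cadd >= CADD_MIN:
        return True
    return s.revel is not None and s.revel >= REVEL_MIN


def is_rare(overall_af: Optional[float], european_af: Optional[float]) -> bool:
    """Frequency at or below 0.001 in both populations; None is assumed to mean absent."""
    return all(af is None or af <= MAX_POPULATION_AF
               for af in (overall_af, european_af))
```

Because the criteria are combined with "or", a single tool is enough to label a missense variant likely deleterious, whereas the frequency filter requires both population estimates to fall at or below the threshold.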

Pipeline specifications

Software tools: FastQC, cutadapt, BWA, GATK, Platypus, Condel, PolyPhen, SIFT, CADD, REVEL, IGV
Application: WGS analysis
Organisms: Homo sapiens
Diseases: Breast Neoplasms, Neoplasms