Computational protocol: Whole exome sequencing as a diagnostic tool for patients with ciliopathy-like phenotypes

Protocol publication

[…] Exome sequencing and analysis were performed at the Centro Nacional de Análisis Genómico (CNAG-CRG, Barcelona, Spain). Exome enrichment used the NimbleGen SeqCap EZ v3.0 system following the manufacturer's protocol version 4.2, with pre-capture multiplexing. Briefly, 1 μg of genomic DNA was fragmented with a Covaris™ E210, and adapters containing Illumina-specific indexes were ligated using a KAPA Library Preparation kit (Kapa Biosystems). Adapter-ligated DNA fragments were enriched by 7 cycles of pre-capture PCR using KAPA HiFi HotStart ReadyMix (2X) (Kapa Biosystems) and analysed on an Agilent 2100 Bioanalyzer with the DNA 1000 assay. Five libraries were pooled, with a combined mass of 1250 ng, for the bait hybridisation step (47°C; 68 h). After washing (47°C), the multiplexed captured library was recovered with Capture beads and amplified with 14 cycles of post-capture PCR using KAPA HiFi HotStart ReadyMix (2X). The size, concentration and quality of the captured library were determined using an Agilent DNA 1000 chip. Enrichment success was measured by a qPCR SYBR Green assay on a Roche LightCycler® 480 Instrument, evaluating one genomic locus in pre- and post-capture material.

Each library pool was sequenced on an Illumina HiSeq 2000 instrument in a fraction of a sequencing lane following the manufacturer's protocol, with a 2x101 bp paired-end run. Image analysis, base calling and quality scoring were performed with the manufacturer's Real Time Analysis software (RTA 1.13.48), and FASTQ sequence files were then generated with CASAVA. [...] Sequencing reads were trimmed from the 3' end up to the first base with a Phred quality >9 and mapped to the Human Genome Reference v37 with decoy sequences (Broad Institute) using GEM [].
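The 3' trimming rule can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding (the convention used from CASAVA 1.8 onwards) and reads "trimmed from the 3' end up to the first base with a Phred quality >9" as: remove bases from the 3' end until a base with quality above 9 is reached. The function names are hypothetical; this is not the trimming code used at CNAG.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII FASTQ quality string (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 9, offset: int = 33) -> Tuple[str, str]:
    """Remove 3'-end bases until the first base with Phred quality > min_q."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual, offset)
    end = len(scores)
    # Walk back from the 3' end while bases are at or below the threshold.
    while end > 0 and scores[end - 1] <= min_q:
        end -= 1
    return seq[:end], qual[:end]


# Usage: under Phred+33, 'I' encodes Q40, '*' Q9 and '#' Q2.
print(trim_3prime("ACGTA", "III*#"))  # ('ACG', 'III')
```

A base at exactly Q9 is still trimmed, because the rule keeps only bases with quality strictly greater than 9.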
BAM files containing only properly paired, uniquely mapped reads were processed with Picard tools v1.110 to remove duplicates, and local realignment was performed with the Genome Analysis Toolkit (GATK) v3.1 []. Samtools v0.1.19 [] was then used on the processed BAM files to call single nucleotide variants (SNVs) and small insertions and deletions (INDELs). Functional annotations from Ensembl release 75 [] were added to the resulting Variant Call Format (VCF) file using snpEff []. snpSift [] was used to add information from dbSNP v137 [], the 1000 Genomes Project (1000GP) [], the NHLBI Exome Sequencing Project [Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA] and a variety of conservation and deleteriousness predictions included in dbNSFP v2.5 []. [...]

This process was carried out following a strategy of our own design to narrow the results down to a small number of candidate variants (). Homozygous and compound heterozygous variants in known cilia-related genes were analysed first; homozygous variants in other genes were then analysed in the same way. Only confidently called positions, with a coverage of at least 15X and a genotype quality >20, were considered. Variants with a minor allele frequency >0.01 in dbSNP, or with an alternative allele frequency >0.01 in the NHLBI ESP or 1000GP databases, were then excluded. Next, based on functional annotation, variants with a snpEff-predicted high or moderate effect were kept, whereas synonymous coding variants and intronic variants not associated with splice-site alterations were excluded. Finally, the remaining variants were assessed with several in silico algorithms that predict the effect at the protein level (PolyPhen-2 [], SIFT [], MutationTaster [] and the likelihood ratio test (LRT) []). Once potentially pathogenic variants were selected, COBALT [] was used to analyse protein residue conservation across species.
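The filtering cascade and inheritance tiers can be sketched in Python as follows. The `Variant` record and its field names are hypothetical (they are not snpEff or SnpSift output keys); a missing frequency is treated as "not reported in the database", and the snpEff impact classes HIGH and MODERATE stand in for the "high or moderate effect" step. Genotypes are assumed unphased, so two heterozygous variants in the same gene are flagged only as a possible compound heterozygote. The in silico predictor step is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Variant:
    # Hypothetical flattened record; field names are illustrative only.
    gene: str
    genotype: str                      # "hom" (homozygous alt) or "het"
    depth: int                         # read coverage at the position
    gq: int                            # genotype quality
    dbsnp_maf: Optional[float] = None  # None = not reported in the database
    esp_af: Optional[float] = None     # NHLBI ESP alternative allele frequency
    kg_af: Optional[float] = None      # 1000 Genomes alternative allele frequency
    impact: str = "MODIFIER"           # snpEff impact class


def _above(freq: Optional[float], cutoff: float) -> bool:
    return freq is not None and freq > cutoff


def passes_filters(v: Variant, max_freq: float = 0.01) -> bool:
    if v.depth < 15 or v.gq <= 20:     # keep confident positions only
        return False
    if _above(v.dbsnp_maf, max_freq):  # too common in dbSNP
        return False
    if _above(v.esp_af, max_freq) or _above(v.kg_af, max_freq):
        return False
    return v.impact in {"HIGH", "MODERATE"}


def prioritise(variants: List[Variant],
               cilia_genes: Set[str]) -> Tuple[List[Variant], List[Variant]]:
    """Tier 1: cilia-gene homozygous or possible compound het; tier 2: other homozygous."""
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        if passes_filters(v):
            by_gene[v.gene].append(v)
    tier1: List[Variant] = []
    tier2: List[Variant] = []
    for gene, vs in sorted(by_gene.items()):
        hom = [v for v in vs if v.genotype == "hom"]
        het = [v for v in vs if v.genotype == "het"]
        if gene in cilia_genes:
            tier1.extend(hom)
            if len(het) >= 2:  # unphased: only a *possible* compound heterozygote
                tier1.extend(het)
        else:
            tier2.extend(hom)
    return tier1, tier2
```

In use, `cilia_genes` would hold the symbols of the known cilia-related genes, and tier 1 is reviewed before tier 2, mirroring the order described above.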
Additionally, the Endeavour [] and ToppGene Suite [] prioritisation tools were used to rank the genes harbouring the final list of variants, with a training set of 51 known ciliary genes (). For the top-ranked genes, STRING [] and GeneMANIA [] were used to explore possible co-expression and/or interactions between the encoded proteins and other ciliary proteins. When appropriate, the potential effect on splice sites was assessed with several prediction tools, all run with default settings: NNSplice [], NetGene2 [], Human Splicing Finder [] and ASSEDA []. Finally, WES data were used to check whether candidate variants lie within runs of homozygosity (ROH), as expected in consanguineous cases. To this end, homozygosity mapping with exome data was carried out with PLINK v1.9 [] using settings optimised for exome data []. […]
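Checking whether a candidate variant lies within an ROH reduces to an interval lookup. The Python sketch below assumes the ROH segments are already available as (chromosome, start, end) tuples, for example parsed from the `.hom` report that PLINK writes with `--homozyg`; the helper names are hypothetical. Overlapping segments are merged first, so a binary search over segment starts is enough to answer each query.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates
RohIndex = Dict[str, Tuple[List[int], List[int]]]


def build_roh_index(segments: Iterable[Segment]) -> RohIndex:
    """Group ROH segments by chromosome, merge overlaps, keep sorted starts/ends."""
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in segments:
        grouped[chrom].append((start, end))
    index: RohIndex = {}
    for chrom, segs in grouped.items():
        merged: List[List[int]] = []
        for start, end in sorted(segs):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend overlapping run
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_roh(index: RohIndex, chrom: str, pos: int) -> bool:
    """True if pos falls inside any ROH segment on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last segment starting at or before pos
    return i >= 0 and pos <= ends[i]
```

Chromosome names in the variant list and in the ROH segments must use the same convention (e.g. "1" rather than "chr1") for the lookup to match.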

Pipeline specifications

Software tools: BaseSpace, Picard, GATK, SAMtools, SnpEff, SnpSift, PolyPhen, SIFT, ENDEAVOUR, ToppGene Suite, GeneMANIA, NNSplice, NetGene2, HSF, ASSEDA, PLINK
Databases: dbNSFP, dbSNP, Exome Variant Server
Applications: WGS analysis, WES analysis, GWAS
Organisms: Homo sapiens
Diseases: Bardet-Biedl Syndrome, Genetic Diseases, Inborn