Computational protocol: Identification of a novel mutation in the APTX gene associated with ataxia oculomotor apraxia

[…] DNA libraries were processed and analyzed using the Sentieon whole-exome analysis workflow (Version 201611.01) with default settings (Sentieon Inc.). Briefly, libraries were mapped to the human reference genome (hg19) with Burrows–Wheeler alignment (BWA)-mem software (version 0.7.12) and then realigned around indels with GATK IndelRealigner. Next, base recalibration was performed with GATK BaseRecalibrator, taking into account the read group, quality scores, and cycle and context covariates. Variants were called with GATK HaplotypeCaller to generate genomic variant call format (gVCF) files. Variant filtering and annotation were done using Golden Helix VarSeq Version 1.1 software (Golden Helix Inc.). The gVCF file of each sequenced family member was uploaded to VarSeq and organized by pedigree. Variants were filtered for GQ > 20 and DP > 10. Of the variants that passed this filter, only those predicted to cause loss-of-function or missense mutations were analyzed, and only if their allele frequency was <0.01 or missing. The public databases used in this study for determining allele frequencies were dbSNP Common 144 (NCBI), 1000 Genomes Project phase 3, Exome Aggregation Consortium version 0.3 (ExAC), NHLBI GO Exome Sequencing Project (Exome Variant Server, NHLBI Exome Sequencing Project [ESP]), and UK10K project (ALSPAC - Variant Frequencies 2013-11-01, GHI). Based on the pedigree, we predicted that the disease would follow an autosomal recessive pattern. Thus, we analyzed variants that were homozygous only in affected individuals. The total number of reads, percentage of aligned reads, and average read length of each WES sample were calculated using SAMtools with the stats option. To evaluate the efficiency of the exome capture and the total coverage along the targeted regions, BEDTools software (Broad Institute) was used.
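The variant filtering criteria described above (GQ > 20, DP > 10, loss-of-function or missense effect, allele frequency <0.01 or missing, and homozygosity only in affected individuals) can be illustrated with a minimal Python sketch. This is not the VarSeq implementation: the `Call` record, the genotype labels, the effect labels, and the function names are hypothetical, and applying the quality thresholds to every family member is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical effect labels; VarSeq's own annotation terms may differ.
ANALYZED_EFFECTS = {"LoF", "missense"}


@dataclass
class Call:
    """One sample's genotype call at a variant site (hypothetical record)."""
    gq: int          # genotype quality
    dp: int          # read depth
    genotype: str    # "hom_alt", "het", or "hom_ref"


def passes_quality(call: Call) -> bool:
    # Thresholds from the text: GQ > 20 and DP > 10.
    return call.gq > 20 and call.dp > 10


def is_rare(freqs: Dict[str, Optional[float]], cutoff: float = 0.01) -> bool:
    # Keep the variant if every database frequency is below the cutoff
    # or missing (None).
    return all(f is None or f < cutoff for f in freqs.values())


def is_recessive_candidate(calls: Dict[str, Call], affected: Set[str]) -> bool:
    # Homozygous alternate in every affected member and in no unaffected one.
    # Assumption: every family member's call must also pass the quality filter.
    for sample, call in calls.items():
        if not passes_quality(call):
            return False
        if (call.genotype == "hom_alt") != (sample in affected):
            return False
    return True


def keep_variant(effect: str,
                 freqs: Dict[str, Optional[float]],
                 calls: Dict[str, Call],
                 affected: Set[str]) -> bool:
    return (effect in ANALYZED_EFFECTS
            and is_rare(freqs)
            and is_recessive_candidate(calls, affected))
```

For example, a missense variant absent from all databases that is homozygous in the affected child and heterozygous in both parents would be kept, whereas the same variant would be dropped if an unaffected parent were also homozygous.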
Briefly, BEDTools coverage with the -hist option was used to obtain a coverage histogram for each feature in the BED file of the Integrated DNA Technologies (IDT) exome probe target regions from the aligned BAM files. The cumulative coverage plot of the exome-sequencing results was generated using R (the R Project for Statistical Computing). Briefly, the total number of reads for each alignment was calculated from the summary histogram for all of the features in the BED file. This value was then used to calculate the cumulative coverage (expressed as a percentage of total reads) for each coverage depth. A graph of capture target bases (%) versus depth was then plotted using R's plot function. […]
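The cumulative coverage calculation described above can be sketched as follows. The study used R; Python is substituted here purely for illustration. The sketch assumes the summary rows of `bedtools coverage -hist` output are tab-separated with five fields (the literal `all`, depth, number of bases at that depth, total size of the features, and fraction at that depth), and the function name `cumulative_coverage` is hypothetical.

```python
from typing import Iterable, List, Tuple


def cumulative_coverage(hist_lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (depth, percent of target bases covered at >= depth) pairs.

    Only the summary rows (first field "all") are used; per-feature rows
    are skipped.
    """
    rows = []
    for line in hist_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 5 and fields[0] == "all":
            depth, n_bases = int(fields[1]), int(fields[2])
            rows.append((depth, n_bases))
    rows.sort()

    total = sum(n for _, n in rows)
    if total == 0:
        return []

    result = []
    remaining = total  # bases with depth >= the current depth
    for depth, n_bases in rows:
        result.append((depth, 100.0 * remaining / total))
        remaining -= n_bases
    return result
```

The resulting pairs can be passed to any plotting routine, with depth on the x-axis and percentage of capture target bases on the y-axis, to reproduce the kind of plot described in the text.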

Pipeline specifications