Protocol publication

[…] Quality control of the amplicon sequencing data was performed using FastQC (version 0.11.2). Primer sequences were removed from the 5′ ends of reads using the cutadapt tool (version 1.7.1) with the ‘anchor’ option. The maximum error rate for primer sequences was set to 10%. Reads were mapped to the human genome reference sequence hs37d5 using BWA MEM (version 0.7.10). Picard (version 1.85) and the DepthOfCoverage GATK tool (version 3.3–0) were used to collect alignment and amplicon coverage statistics. The median amplicon coverage of the negative control sample was used to determine the threshold for including samples from the same batch in further analysis. Only samples with a median coverage that was >10× that of the negative control were included. This removed 7 cases and 11 controls, leaving 434 cases (432 independent) and 389 controls for further analysis. The UnifiedGenotyper tool, as implemented in GATK version 2.6, was used for variant calling. ANNOVAR (version 2014 Nov 12) was used to provide functional annotation of the variants. Variants were annotated with reference to the transcripts listed in the original publication. […] Stop-gain, frameshift, and splice site variants were categorized according to their predicted effect on protein function. Missense variants reported as disease-causing in association with MND in the Human Gene Mutation Database were categorized as pathogenic. The remaining missense variants were categorized according to in silico scores of pathogenicity and conservation. The scores used are listed below, with the thresholds for supporting pathogenicity/conservation in brackets: SIFT (=D), PolyPhen HDIV (=P/D), LRT (=D), Mutation Taster (=D), Mutation Assessor (=M/H), FATHMM (=D), CADD phred (>15), GERP (>2), phyloP (>2), and SiPhy (>10).
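As an illustration only, the thresholds listed above can be applied as a simple per-variant count, which is then mapped to the three tiers the protocol uses (7–10, 4–6, and 0–3 supporting measures). The sketch below is not the authors' code: the field names in the dictionaries are hypothetical, and treating a missing score as non-supporting is an assumption the protocol does not state.

```python
# Thresholds copied from the protocol text; the key names are hypothetical
# and would need to match the column names of the annotation output in use.
CATEGORICAL_SUPPORT = {
    "SIFT": {"D"},
    "PolyPhen_HDIV": {"P", "D"},
    "LRT": {"D"},
    "MutationTaster": {"D"},
    "MutationAssessor": {"M", "H"},
    "FATHMM": {"D"},
}
NUMERIC_CUTOFF = {"CADD_phred": 15, "GERP": 2, "phyloP": 2, "SiPhy": 10}


def count_supporting(scores):
    """Count how many of the 10 measures support pathogenicity/conservation.

    Assumption: a missing score (absent key or None) does not count as support.
    """
    n = 0
    for name, supporting_values in CATEGORICAL_SUPPORT.items():
        if scores.get(name) in supporting_values:
            n += 1
    for name, cutoff in NUMERIC_CUTOFF.items():
        value = scores.get(name)
        if value is not None and value > cutoff:  # strictly greater, as in the text
            n += 1
    return n


def tier(n_supporting):
    """Map the count of supporting measures to the protocol's three tiers."""
    if n_supporting >= 7:
        return "likely pathogenic"
    if n_supporting >= 4:
        return "uncertain significance"
    return "likely benign"


# Made-up scores for a single missense variant (not data from the study).
example = {
    "SIFT": "D", "PolyPhen_HDIV": "P", "LRT": "N", "MutationTaster": "D",
    "MutationAssessor": "L", "FATHMM": "T",
    "CADD_phred": 22.1, "GERP": 4.5, "phyloP": 1.2, "SiPhy": 12.0,
}
print(count_supporting(example), tier(count_supporting(example)))
```

In this made-up example six measures meet their thresholds, so the variant would fall in the uncertain significance tier.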
Variants for which 7–10 in silico measures supported pathogenicity/conservation were categorized as likely pathogenic, variants with 4–6 supporting measures were categorized as uncertain significance, and variants with 0–3 supporting measures were categorized as likely benign. […] Fisher's exact tests were used to compare the number of variants in cases and controls. One-tailed tests were used because of the prior assumption that cases contain more pathogenic variants than controls. Genotype-phenotype association testing was used to compare clinical phenotypes with different MND genotypes. Only unrelated individuals (n = 431) were included in the analysis, with the first individual recruited to the SMNDR from each pair retained for analysis. Variables were examined for collinearity using Pearson's correlation coefficient. Age of onset and age at diagnosis were highly correlated (r = 0.98, p < 5 × 10⁻⁷), as were duration of disease from onset and duration of disease from diagnosis (r = 0.87, p < 5 × 10⁻⁷). Disease onset is more important to disease biology than date of diagnosis, which relies on clinical services; therefore, variables related to diagnosis were excluded from genotype-phenotype association testing. Time to diagnosis was also excluded from genotype-phenotype analysis, as this is a derived variable. Univariate analysis was carried out for the following variables: sex, age of onset, duration of disease from onset, site of onset, and family history. Two-tailed Fisher's exact test was used for categorical data, and the t-test or Mann–Whitney U test was used for parametric and non-parametric continuous data, respectively. Variables with significant univariate association at p ≤ 0.1 were entered into binomial logistic regression models dependent on gene.
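The one-tailed Fisher's exact comparison of variant carriers in cases versus controls, described above, can be sketched with the standard library alone. This is a minimal illustration, not the SPSS procedure the authors ran: the function name and argument names are hypothetical, and the counts in the usage line are a small textbook-style table, not study data.

```python
from math import comb


def fisher_one_tailed(case_with, case_without, control_with, control_without):
    """One-tailed Fisher's exact p-value for enrichment of carriers in cases.

    Returns P(X >= case_with) under the hypergeometric null with all
    margins of the 2x2 table held fixed.
    """
    n_cases = case_with + case_without
    n_with = case_with + control_with
    total = n_cases + control_with + control_without

    # Carriers among cases cannot exceed either margin.
    upper = min(n_cases, n_with)
    numerator = 0
    for k in range(case_with, upper + 1):
        # Ways to pick k carriers and (n_cases - k) non-carriers as the cases.
        numerator += comb(n_with, k) * comb(total - n_with, n_cases - k)
    return numerator / comb(total, n_cases)


# Small illustrative table: 3 of 4 cases and 1 of 4 controls carry a variant.
print(round(fisher_one_tailed(3, 1, 1, 3), 4))  # 17/70 -> 0.2429
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for cohort-sized tables; the single division at the end converts the exact ratio to a float.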
Logistic regression analysis was used to test the hypothesis that significant variables were independently associated with having a pathogenic or loss-of-function variant in one of the genes tested. For the purposes of the model, age of onset was grouped by decade. Results from logistic regression modeling were considered significant if p < 0.05. SPSS Statistics version 21 was used for all statistical analysis. […]
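Two preparatory steps from the passage above, carrying forward variables with univariate p ≤ 0.1 and grouping age of onset by decade, might look like the following sketch. The helper names are hypothetical, the p-values are invented for illustration, and labelling each decade by its lower bound (for example, 60 for ages 60 to 69) is an assumption; the protocol does not say how the decade groups were defined.

```python
def onset_decade(age_years):
    """Return the lower bound of the decade containing the age of onset.

    Assumption: decades are the bins 0-9, 10-19, ..., labelled by lower bound.
    """
    return int(age_years // 10) * 10


def screen_variables(univariate_p, threshold=0.1):
    """Keep variables whose univariate p-value is at or below the threshold."""
    return [name for name, p in univariate_p.items() if p <= threshold]


# Invented p-values, for illustration only.
univariate_p = {"sex": 0.45, "age_of_onset": 0.03, "family_history": 0.10}
print(screen_variables(univariate_p))  # ['age_of_onset', 'family_history']
print(onset_decade(67))                # 60
```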

Pipeline specifications

Software tools: FastQC, cutadapt, BWA, Picard, GATK, ANNOVAR, PolyPhen, FATHMM, PHAST, SiPhy, SPSS
Applications: Miscellaneous, Amplicon sequencing analysis
Diseases: Motor Neuron Disease