Computational protocol: Examining the role of common and rare mitochondrial variants in schizophrenia

Protocol publication

[…] We used HaploGrep [] to assign haplotypes to haplogroups and to check for potential contamination in our dataset. Briefly, HaploGrep weights each polymorphism present in Phylotree17 (a phylogenetic tree of worldwide human mitochondrial DNA variation) according to how informative it is for defining haplogroups. The SNPs in the input file are classified as either informative or remaining (not informative). A score is computed from the weights of the “informative SNPs” and is “penalized” by the number of remaining SNPs (for details see (14)). There are eight main European haplogroups under the N clade: HV (H, V), JT (J, T), U (which includes K), I, W, and X []. To check for contamination or uncertainty in the haplogroup classification (the latter due to the limited number of SNPs available on the genotyping chip), we attempted to assign haplogroups using the remaining SNPs. In clean data, the reference haplogroup H2a2a1 is expected; assignment of a different haplogroup suggests contamination. We found a second haplogroup for 3.5% of the sample and excluded these individuals (N = 380) from further analysis. Thus, the total number of samples available for analysis was 10,214 (4,591 cases and 5,623 controls). We also excluded samples whose imputed haplogroups had an overall rank less than 0.8.
[…] We developed a novel approach for mitochondrial DNA imputation that allowed us to gain data for another 30 common SNPs, giving 71 in total (after post-imputation filters of “info” score > 0.3 and MAF > 1%). We downloaded 7,141 public European mitochondrial sequences from the Human Mitochondrial DataBase [] and used them as the reference panel (SNP N = 188 after filtering by MAF > 1%). Imputation was performed using IMPUTE2 v.2 software [], following the instructions for chromosome X and recoding all individuals in the Swedish data set as males for the purpose of this analysis (the list of haplotypes with genotyped/imputed SNPs is in ).
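
The thresholds quoted above (“info” score > 0.3, MAF > 1%, haplogroup rank of at least 0.8) can be illustrated with a minimal Python sketch. It is not the authors' code: the `ImputedSnp` record, its field names, the helper functions, and the example values are all hypothetical, and the sketch assumes that per-SNP info scores and allele frequencies have already been parsed from the imputation output.

```python
from dataclasses import dataclass


@dataclass
class ImputedSnp:
    position: int    # mtDNA position (illustrative field name)
    info: float      # imputation "info" score
    alt_freq: float  # frequency of the non-reference allele


def minor_allele_frequency(alt_freq: float) -> float:
    """Return the frequency of the rarer of the two alleles."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_snp_filters(snp: ImputedSnp,
                       min_info: float = 0.3,
                       min_maf: float = 0.01) -> bool:
    """Keep a SNP only if info > 0.3 and MAF > 1% (thresholds from the text)."""
    return snp.info > min_info and minor_allele_frequency(snp.alt_freq) > min_maf


def passes_haplogroup_filter(rank: float, min_rank: float = 0.8) -> bool:
    """Keep a sample unless its haplogroup rank is below 0.8."""
    return rank >= min_rank


# Made-up values, for illustration only.
snps = [
    ImputedSnp(position=73, info=0.95, alt_freq=0.45),    # passes both filters
    ImputedSnp(position=150, info=0.20, alt_freq=0.30),   # fails on info
    ImputedSnp(position=195, info=0.90, alt_freq=0.005),  # fails on MAF
    ImputedSnp(position=263, info=0.80, alt_freq=0.995),  # minor allele too rare
]
kept_positions = [s.position for s in snps if passes_snp_filters(s)]
print(kept_positions)
```

Taking the minimum of the allele frequency and its complement means a variant is treated as rare whether the rare allele is the reference or the alternate one.
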
We evaluated imputation performance both by removing genotyped SNPs one at a time from the input files and confirming the accuracy of the imputed genotypes, and by comparing allele frequencies of the imputed SNPs with reported frequencies from other datasets (see section).
To determine the accuracy of our approach to assigning haplogroups based on genotypes, and to check whether imputed data would identify haplogroups more accurately than genotyped-only SNPs (from the Illumina HumanExome array), we performed the following analysis: i) We determined which haplogroups were present in our data using HaploGrep2; ii) We then extracted the full profile for each of these from Phylotree17; iii) From these profiles we selected only SNPs present on the Illumina HumanExome array to create pseudo-samples; and iv) We then imputed back any missing data for these pseudo-samples. Then, we compared the Phylotree-assigned haplogroup with the haplogroups defined based on imputed (genotyped/imputed SNPs) and genotyped-only SNPs. […]
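
The two evaluations described above, masking one genotyped SNP at a time and comparing haplogroup calls against the Phylotree assignment, can be sketched in Python as follows. This is an illustration under stated assumptions, not the published pipeline. The imputation step in the protocol is IMPUTE2; here it is replaced by a toy nearest-haplotype imputer (`impute_site`, a hypothetical helper) so that the example is self-contained. Haplotypes are coded as tuples of 0/1 alleles, and haplogroup agreement is measured by exact label match, which is a simplifying assumption.

```python
from typing import Dict, Sequence, Tuple

Haplotype = Tuple[int, ...]  # one 0/1 allele per SNP (mtDNA is haploid)


def mismatches(a: Haplotype, b: Haplotype, skip: int) -> int:
    """Count differing alleles between two haplotypes, ignoring site `skip`."""
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i != skip and x != y)


def impute_site(sample: Haplotype, site: int,
                reference: Sequence[Haplotype]) -> int:
    """Toy stand-in for IMPUTE2: copy the allele at `site` from the reference
    haplotype closest to the sample at all other sites (first one wins ties)."""
    closest = min(reference, key=lambda ref: mismatches(sample, ref, site))
    return closest[site]


def leave_one_out_accuracy(samples: Sequence[Haplotype],
                           reference: Sequence[Haplotype]) -> Dict[int, float]:
    """Mask each SNP in turn, re-impute it, and report per-SNP concordance."""
    n_sites = len(samples[0])
    accuracy = {}
    for site in range(n_sites):
        correct = sum(impute_site(s, site, reference) == s[site] for s in samples)
        accuracy[site] = correct / len(samples)
    return accuracy


def haplogroup_agreement(expected: Sequence[str], called: Sequence[str]) -> float:
    """Fraction of pseudo-samples whose called haplogroup equals the expected one."""
    matches = sum(1 for e, c in zip(expected, called) if e == c)
    return matches / len(expected)


# Made-up reference panel and samples, for illustration only.
reference_panel = [(0, 0, 0), (1, 1, 1)]
study_samples = [(0, 0, 0), (1, 1, 1)]
per_snp_accuracy = leave_one_out_accuracy(study_samples, reference_panel)

expected_groups = ["H1", "J1c", "U5a", "K1a"]
imputed_calls = ["H1", "J1c", "U5a", "U8"]
agreement = haplogroup_agreement(expected_groups, imputed_calls)
```

In practice the same `haplogroup_agreement` comparison would be run twice, once with calls from genotyped plus imputed SNPs and once with calls from genotyped-only SNPs, so the two rates can be compared directly.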

Pipeline specifications

Software tools: HaploGrep, IMPUTE
Databases: HmtDB, nextstrain
Applications: Phylogenetics, GWAS