Computational protocol: Whole mitochondrial DNA sequencing in Alpine populations and the genetic history of the Neolithic Tyrolean Iceman

Protocol publication

[…] Thirty-nine DNA samples belonging to haplogroup K were selected from more than 800 individuals from the Eastern Italian Alps (Trentino-Alto Adige, Veneto and Friuli regions) who had been analyzed at low resolution in previous studies. Additionally, forty-three new buccal swab samples from South Tyrol were collected after authorization by the Ethics Committee of the Azienda sanitaria dell’Alto Adige. Informed consent was obtained from all subjects. The experimental protocol was carried out in accordance with the approved guidelines and was approved by the above-mentioned Ethics Committee.

DNA extraction was carried out using a commercial kit (Gentra Puregene Buccal Cell Kit, Qiagen). The samples were then typed for the SNPs diagnostic of mtDNA haplogroup K (A10550G, T11299C and T14798C, all in the coding region) using the SNaPshot method (Applied Biosystems, Carlsbad, CA). Only three samples (~7%) carried the diagnostic variants. Afterwards, all K samples (n = 42) were sequenced for the whole mtDNA genome (16,569 base pairs) using the methods described by Torroni and collaborators with minor modifications. mtDNA variants were named relative to the revised Cambridge Reference Sequence (rCRS).

To classify the sequences into the sub-lineages of haplogroup K, we compared our data with the worldwide human mtDNA phylogenetic tree (PhyloTree Build 16) using HaploGrep 2.0 Beta (http://haplogrep.uibk.ac.at/blog/visuali ze-yo/) and checked the assignments manually. The new sequences were deposited in GenBank (accession numbers KT749793 to KT749816). [...]

We collated all available complete K1 mitogenomes from present-day populations (n = 1042; October 2014) from different sources (GenBank, the 1000 Genomes Project). The 1000 Genomes Project mitogenomes were processed using in-house scripts. We also compiled data regarding the geographic origin of the samples.
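The SNP screen for haplogroup K described above can be illustrated with a minimal Python sketch. The three positions and base changes come from the protocol; the function names, the dictionary-based genotype format, the sample data, and the rule that a sample must carry all three derived bases are assumptions made for illustration, not the authors' actual procedure.

```python
# Diagnostic coding-region SNPs for haplogroup K, as listed in the protocol.
# Each entry maps an rCRS position to (reference base, derived base).
K_DIAGNOSTIC_SNPS = {
    10550: ("A", "G"),
    11299: ("T", "C"),
    14798: ("T", "C"),
}


def carries_k_variants(genotype):
    """Return True if every diagnostic position shows the derived base.

    `genotype` is a hypothetical dict mapping rCRS position -> observed base.
    ASSUMPTION: all three variants are required; untyped positions count
    as not derived.
    """
    return all(
        genotype.get(pos) == derived
        for pos, (_ref, derived) in K_DIAGNOSTIC_SNPS.items()
    )


def screen_samples(samples):
    """Return the IDs of samples that carry all diagnostic variants."""
    return [sid for sid, gt in samples.items() if carries_k_variants(gt)]


# Made-up genotypes, for illustration only.
samples = {
    "S1": {10550: "G", 11299: "C", 14798: "C"},  # all derived
    "S2": {10550: "A", 11299: "T", 14798: "T"},  # all reference
    "S3": {10550: "G", 11299: "C", 14798: "T"},  # one position not derived
}
print(screen_samples(samples))
```

In this sketch only samples passing the screen would move on to whole-mitogenome sequencing; a real workflow would also need to handle ambiguous or failed genotype calls.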
More than four hundred mitogenomes (n = 409) were from individuals with reported European ancestry, 440 were of unknown geographic origin and the remainder came from non-European regions (see for more details).

PhyloTree Build 16 was used as a reference to draft the skeleton of haplogroup K1 and its branches; some branches were updated according to newly available data. For the sequences most closely related to the Iceman haplotype, we reconstructed the phylogeny using the reduced-median algorithm. Reticulations were resolved using the relative mutation rates at each position. The phylogenetic tree was drafted using mtPhyl v3.520 (http://eltsov.org) and adjusted manually according to PhyloTree Build 16 and the data added in the present study.

We estimated the time to the most recent common ancestor (TMRCA) of clades using a time-dependent clock that incorporates a correction for purifying selection, with the maximum likelihood (ML) procedure in PAML 3.13. A skeleton of the tree based on a well-known mtDNA phylogeny (PhyloTree Build 16) was initially provided to the software in Newick format. The TMRCA was then obtained assuming the HKY85 mutation model (excluding indels and hotspot mutations such as 16182C, 16183C and 16519C), gamma-distributed rates (approximated by a discrete distribution with 32 categories) and three partitions: hypervariable region I (HVS-I: positions 16051–16400), hypervariable region II (HVS-II: positions 68–263), and the coding region. A set of outgroups was used, drawn from haplogroups that represent most of the main branches of the worldwide phylogeny, from the rCRS to the oldest African nodes.

The Bayesian skyline plot (BSP) showing changes in the effective population size of haplogroup K1 through time was generated with the Bayesian Evolutionary Analysis Sampling Trees (BEAST) software. The analysis used only the mtDNA coding region. MCMC samples were based on 40,000,000 generations, with the first 4,000,000 generations discarded as burn-in.
We used an HKY substitution model and a strict clock with a substitution rate of 1.69 × 10⁻⁸ as reported in. BEAST outputs were analyzed with the software Tracer (http://tree.bio.ed.ac.uk/software/tracer/). […]
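A few numeric details of the dating analyses above can be made concrete with a short Python sketch. The HVS-I and HVS-II ranges, the excluded hotspot positions, the MCMC chain length, the burn-in and the clock rate are taken from the protocol. The coding-region bounds (577–16023), the reading of the rate as substitutions per site per year, and all function names are assumptions introduced here; none of this is the PAML or BEAST configuration itself.

```python
# Partition bounds (inclusive, rCRS numbering).
HVS1 = (16051, 16400)   # from the protocol
HVS2 = (68, 263)        # from the protocol
CODING = (577, 16023)   # ASSUMPTION: conventional bounds, not stated in the protocol

# Hotspot positions named in the protocol as excluded from dating.
EXCLUDED_HOTSPOTS = {16182, 16183, 16519}


def partition_of(position):
    """Return the partition name for a position, or None if it is unused."""
    if position in EXCLUDED_HOTSPOTS:
        return None
    if HVS1[0] <= position <= HVS1[1]:
        return "HVS-I"
    if HVS2[0] <= position <= HVS2[1]:
        return "HVS-II"
    if CODING[0] <= position <= CODING[1]:
        return "coding"
    return None  # outside the three partitions


def burn_in_fraction(total_generations, discarded_generations):
    """Fraction of the MCMC chain discarded as burn-in."""
    return discarded_generations / total_generations


def years_per_substitution(rate_per_site_per_year, n_sites):
    """Expected years between substitutions across n_sites under a strict clock.

    ASSUMPTION: the rate is expressed in substitutions per site per year.
    """
    return 1.0 / (rate_per_site_per_year * n_sites)


coding_sites = CODING[1] - CODING[0] + 1
print(partition_of(16100), partition_of(100), partition_of(10550))
print(burn_in_fraction(40_000_000, 4_000_000))
print(round(years_per_substitution(1.69e-8, coding_sites)))
```

Under these assumptions the burn-in corresponds to the first tenth of the chain, and the last line gives a rough sense of how slowly the coding region accumulates substitutions under the stated strict clock.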

Pipeline specifications

Software tools: HaploGrep, PAML, BEAST
Databases: nextstrain
Application: Phylogenetics