Computational protocol: The Molecular Ecology of the Extinct New Zealand Huia

Protocol publication

[…] Measures of genetic variation, including observed and expected heterozygosities (HO and HE, respectively) and number of alleles, were calculated in Arlequin 2000. GENEPOP version 3.4 was used to test for linkage disequilibrium and deviation from Hardy-Weinberg parameters. The sequential Bonferroni correction was applied to adjust the level of significance for multiple tests.

False alleles can occur in heterozygous and homozygous genotypes. Therefore, the rate of false allele formation is estimated across all positive amplifications. In contrast, allelic dropout can only be detected in heterozygotes; therefore, the allelic dropout rate is calculated using only the positive amplifications of heterozygotes. The rate of allelic dropout (ADOμ) and the rate of false alleles (FAμ) were calculated for each locus and across all loci using the equations below, as recommended by Broquet & Petit (these calculations included the genotype data independently replicated at the University of Auckland).

Allelic dropout was calculated using the equation $\mathrm{ADO}\mu = D_j / A_{\mathrm{het}j}$, where $D_j$ = the number of amplifications of locus j where an ADO event is observed, and $A_{\mathrm{het}j}$ = the number of positive amplifications of heterozygotes. The false allele rate (FAμ) was determined using $\mathrm{FA}\mu = F_j / A_j$, where $F_j$ = the number of amplifications at locus j where a false allele is observed, and $A_j$ = the total number of amplifications (both hetero- and homozygotes).

The probability of false homozygotes at each locus after repeated PCR reactions (P) was calculated using the equation $P = K \times (K/2)^{n-1}$, where K = the ADOμ at each locus and n is the number of repeated amplifications; in this work, n = 7.

Consensus genotypes obtained from the multiple tubes method were used to examine population structure in Huia. The lack of provenance for the majority of samples prevented the application of traditional population genetic analyses such as F-statistics.
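The genotyping error formulas above (ADOμ, FAμ, and the probability P of a false homozygote after n replicate PCRs) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the example counts are hypothetical and are not taken from the Huia dataset.

```python
def allelic_dropout_rate(dropout_events, het_amplifications):
    """ADO rate at one locus: D_j / A_hetj (heterozygote amplifications only)."""
    if het_amplifications <= 0:
        raise ValueError("need at least one positive heterozygote amplification")
    return dropout_events / het_amplifications


def false_allele_rate(false_allele_events, amplifications):
    """FA rate at one locus: F_j / A_j (all positive amplifications)."""
    if amplifications <= 0:
        raise ValueError("need at least one positive amplification")
    return false_allele_events / amplifications


def false_homozygote_probability(ado_rate, n_replicates):
    """P = K * (K / 2) ** (n - 1), where K is the ADO rate at the locus."""
    return ado_rate * (ado_rate / 2) ** (n_replicates - 1)


# Hypothetical counts for a single locus (assumed values, not study data).
ado = allelic_dropout_rate(dropout_events=12, het_amplifications=60)
fa = false_allele_rate(false_allele_events=3, amplifications=150)
p_false_hom = false_homozygote_probability(ado, n_replicates=7)

print(f"ADO = {ado:.3f}, FA = {fa:.3f}, P(false homozygote) = {p_false_hom:.2e}")
```

Because the ADO rate is raised to the power n, even a fairly high per-amplification dropout rate gives a very small probability of a false homozygote after seven replicates.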
Instead, two Bayesian clustering methods that do not require prior population information to partition samples into genetic groups were used to detect any possible genetic structuring in the Huia microsatellite dataset: STRUCTURE 2.1 and PARTITION. Neither method requires the population of origin for individual samples, or even the number of sampled populations (K), to be known. Both methods identify clusters of individuals that are in Hardy-Weinberg and linkage equilibrium, but differ in their treatment of admixed individuals.

STRUCTURE was used with no input of prior population information in relation to individual samples, and admixture was assumed. Allele frequencies among clusters were considered to be independent to prevent overestimation of cluster number. Four independent analyses of K = 1–5 were performed using $10^6$ MCMC repetitions, with the first 50,000 repetitions discarded as ‘burn-in’ following visual confirmation that equilibrium had been reached. To select the optimal K, the posterior probabilities of the data, P(X|K), were calculated from the mean estimated log-likelihood of each K (ln P(X|K)).

PARTITION was applied only to the Huia samples that possessed a full complement of genotype data because missing data are not permitted in this software package. The parameter μ (the prior probability distribution on K) was set at 1, i.e. equal probabilities of each K were assumed, and the parameter θ (the prior distribution of alleles in the ancestral population) was varied from 1 to 20. The maximum number of source populations was changed with each analysis from 4 to 8.
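The STRUCTURE model-selection step described above can be illustrated with a short Python sketch. The exact calculation is not given here, so this assumes the commonly used approach of applying Bayes' rule with a uniform prior on K to the mean ln P(X|K) values. The function name and the log-likelihood values are hypothetical.

```python
import math


def posterior_over_k(mean_ln_likelihood):
    """Turn mean ln P(X|K) values into P(K|X), assuming a uniform prior on K."""
    peak = max(mean_ln_likelihood.values())
    # Subtract the maximum before exponentiating so the weights do not underflow.
    weights = {k: math.exp(v - peak) for k, v in mean_ln_likelihood.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical mean ln P(X|K) for K = 1..5 (assumed values, not study results).
ln_px = {1: -1250.0, 2: -1256.0, 3: -1263.5, 4: -1270.0, 5: -1281.0}

posterior = posterior_over_k(ln_px)
best_k = max(posterior, key=posterior.get)

for k in sorted(posterior):
    print(f"K = {k}: P(K|X) = {posterior[k]:.4f}")
print("Most probable K:", best_k)
```

Working on the log scale matters here: log-likelihoods of this magnitude would underflow to zero if exponentiated directly.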
For PARTITION, estimates of the posterior probabilities were made after 50,000 observations of the Markov chain, with the first 5,000 observations omitted as ‘burn-in’.

Shared allele distances (1 minus half the average number of shared alleles per locus) between Huia were calculated online. The mean number of alleles (NA) per locus and mean expected heterozygosity (HE) were compared between Huia and Saddleback. Data from 41 individuals from the Hen Island population of Saddleback were used in these comparisons as all contemporary Saddleback populations derive from this island. [...]

The genetic diversity parameter (θ) was calculated using a Bayesian framework that uses a coalescent approach to obtain a joint estimate of various population genetic parameters such as genetic diversity, growth, migration, and recombination rates. We estimated θ from 21 individuals for a 199 bp fragment of the mitochondrial hypervariable region. To ensure that the Bayesian estimate of θ was robust, we performed a number of repeat analyses. Fourteen preliminary analyses were conducted with a range of starting parameters (e.g. sample size and sampling increment). Posterior probability distributions were compared between runs to assess whether we converged on an estimate of θ. We also assessed convergence by calculating the effective sample size (ESS) using the program Tracer v1.4. An ESS of 100–200 has been suggested to indicate convergence. Our estimates were well above this value, in the thousands or greater.

After preliminary analyses were completed, a final estimate of θ was performed from 10 replicates, each with the following starting parameters: θ = 0.015, and a linear prior of 0.0001–3. Two initial chains sampled 5000 trees with a sampling increment of 40, of which the first 7000 trees sampled were discarded, followed by 4 final chains to produce the estimate, in which, after a burn-in of 5000 trees, every 50th tree of $5 \times 10^6$ trees was sampled.
Adaptive chain heating was used (chain temperatures 1, 1.1, 1.2, and 1.3). We also performed separate LAMARC analyses to test for signatures of exponential growth or shrinkage using the same searching strategy. […]
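The shared allele distance defined earlier in this section (1 minus half the average number of shared alleles per locus) can be sketched in Python as follows. This is a minimal illustration, not the online tool used in the study. It assumes diploid genotypes stored as allele-size pairs, counts shared alleles by multiset overlap (one common convention), and skips loci missing in either individual. The genotypes are hypothetical.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Alleles shared at one locus between two diploid genotypes (0, 1 or 2)."""
    overlap = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(overlap.values())


def shared_allele_distance(individual_a, individual_b):
    """1 minus half the average number of shared alleles per locus.

    Assumption: loci with a missing genotype (None) in either bird are skipped.
    """
    counts = [
        shared_alleles(a, b)
        for a, b in zip(individual_a, individual_b)
        if a is not None and b is not None
    ]
    if not counts:
        raise ValueError("no loci typed in both individuals")
    return 1 - (sum(counts) / len(counts)) / 2


# Hypothetical three-locus microsatellite genotypes (allele sizes in bp).
bird_1 = [(152, 156), (201, 201), (98, 104)]
bird_2 = [(152, 152), (201, 205), (100, 106)]

distance = shared_allele_distance(bird_1, bird_2)
print(f"Shared allele distance: {distance:.3f}")
```

Under this convention, two identical multilocus genotypes have a distance of 0 and two individuals sharing no alleles at any locus have a distance of 1.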

Pipeline specifications

Software tools: Arlequin, Genepop, BEAST, LAMARC
Application: Population genetic analysis
Organisms: Heteralocha acutirostris, Homo sapiens