Computational protocol: Historical and current introgression in a Mesoamerican hummingbird species complex: a biogeographic perspective

Similar protocols

Protocol publication

[…] Mitochondrial DNA sequences were assembled by eye and checked for stop codons using SEQUENCHER v. 4.9 (Genecodes, Ann Arbor, MI, USA) and then manually aligned with SE-AL v. 2.0a11 ( All unique sequences used in this study are deposited in GenBank (Accession numbers KM198972–KM199267; ).Using DnaSP v. 4.2 (), the sequences of each region and combined sequences were examined for haplotype variation (H), segregating sites (S), haplotype diversity (h) and nucleotide diversity (π) per species. A statistical haplotype network was generated with the median-joining algorithm in NETWORK v. ( using the combined mtDNA matrix () to visualize haplotype relationships. [...] A Bayesian inference (BI) phylogenetic tree was constructed using MRBAYES v. 3.1.2 () with data partitioned by region (tRNA) and each codon position within the coding genes (ND2, ATPase 6 and ATPase 8), as suggested by Bayes Factor analysis (). The substitution model for each partition was identified using jMODELTEST v. 0.1.1 () and the Akaike information criterion (AIC; ).Seven species of the emerald hummingbird group were included as outgroups, including conspecifics A. saucerottei from South America, A. edward, A. tobaci, A. viridigaster, A. cyanocephala, A. violiceps, A. viridifrons, and Campylopterus curvipennis based on and . Most of the sequences are new to this study; others were downloaded from GenBank (A. saucerottei, EU042523 and GU167205; A. viridigaster, EU042526).The BI analysis employed partition-specific DNA evolution models for each gene with two parallel Markov chain Monte Carlo (MCMC) analyses executed simultaneously; each was run for 20 million generations, sampling every 1000 generations. A majority consensus tree was calculated, showing nodes with a posterior probability (PP) of 0.5 or more. Bayesian PP values were calculated from the sampled trees remaining after 5000 burn-in samples were discarded () to only include trees after stationarity was reached. The consensus tree was visualized in FIGTREE v. 1.2.3 ( [...] The time of the most recent common ancestor (TMRCA) of the clade beryllina-cyanura-saucerottei was estimated using BEAST v. 1.5.4 (). We used the mtDNA sequences partitioned by gene (ND2 and ATPase 6–8) and those of the outgroup hummingbird species. The best-fit model of evolution, HKY+G for each partition, empirical base frequencies, and an uncorrelated lognormal relaxed model as the clock model were used. A coalescent model assuming constant population size was used to model the tree prior. The coalescent tree prior used in this analysis appears to be a better fit when mixed datasets are predominantly intraspecific data (). We grouped the Amazilia sequences and constrained them to be monophyletic ().To calibrate the tree we used the substitution rate of 0.029 substitutions per site per million years (s/s/My; 0.0145 s/s/l/My) for the ND2 and the geometric mean of 0.026 and 0.019 for the ATPase 6–8 (0.0222 s/s/My; 0.0111 s/s/l/My) obtained for Hawaiian honeycreepers (), or the average rate of 0.0068 s/s/My (0.0034 s/s/l/My) for the ND2 and the geometric mean of 0.0059 and 0.0047 (0.00265 s/s/l/My) for the ATPase 6–8 obtained for major bird orders ().The root of the tree was calibrated using the average 27.7 Ma (normal prior, SD 3.5, range 35.4–19.9 Ma) divergence time for the basal split between C. curvipennis and all the Amazilia hummingbirds (), followed by the age of the crown clade formed by A. cyanocephala, A. beryllina, A. cyanura, A. edward, A. saucerottei, A. viridigaster, A. violiceps and A. viridifrons (normal prior, mean 13.9 Ma, SD 3.0, range 19.8–8.0; ) as a secondary calibration.We performed one run of 30 million generations with random starting trees, sampling every 1000 steps, and discarding the first 10% of trees as burn-in. Results were viewed in TRACER v. 1.5 () to ensure that the effective sample sizes (ESS) for all priors and the posterior distribution were higher than 200, and finally annotated the trees using TREEANNOTATOR v. 1.5.4 () summarized as a maximum clade credibility tree with mean divergence times and 95% highest posterior density (HPD) intervals of age estimates and visualized in FIGTREE. [...] The demographic history of each Amazilia species was inferred by means of neutrality tests and mismatch distributions with ARLEQUIN v. 3.5 (). To test whether populations evolved under neutrality, Fu’s Fs test and Tajima’s D tests were calculated with 1000 permutations, and mismatch distributions were calculated using the sudden expansion model of with 9000 bootstrap replicates. The validity of the sudden expansion assumption was determined using the sum of squared deviations (SSD) and Harpending’s raggedness index (Hri), which are higher in stable, non-expanding populations (). [...] The extent of linkage disequilibrium between pairs of loci, and departures from Hardy-Weinberg (H-W) equilibrium within populations and loci were calculated using GENEPOP v. 4.2 () with Bonferroni corrections applied to correct for multiple simultaneous comparisons (). Null allele frequencies for each locus were estimated using FreeNA ().Population genetic structure based on Bayesian clustering was inferred using STRUCTURE v. 2.3.4 (). We ran STRUCTURE under the admixture model with correlated allele frequencies without prior information on population origin. Ten independent chains were run for each K, from K = 1 to K = 8. The length of the burn-in was 100,000 and the number of MCMC iterations after burn-in was 500,000 and convergence was achieved. The most likely number of populations was determined estimating the DeltaK (ΔK) and the log likelihood of K, Ln P(K) = L(K) between successive K values (). The identity of any individual as a putative migrant or migrant ancestry was assessed through the model with prior population information, assigning individuals in K populations, based on the DeltaK results. A migration rate of 0.05 was assumed. Finally, a mean-centred principal component analysis (PCA) on the microsatellite individual-genotype matrix was conducted using the R package ADEgenet () to compare the clustering results with those obtained using STRUCTURE. […]

Pipeline specifications

Software tools Genepop, adegenet
Application Population genetic analysis
Organisms cellular organisms