Computational protocol: Selection and geographic isolation influence hummingbird speciation: genetic, acoustic and morphological divergence in the wedge-tailed sabrewing (Campylopterus curvipennis)

Similar protocols

Protocol publication

[…] We extracted genomic DNA from one of the feather samples using chelex (5%, []), and that from tissue samples using the DNA easy blood and tissue kit (Qiagen, Inc.), following the recommended protocol. We used PCR to amplify two mitochondrial genes. The ATPase 6 and 8 coding region (875 bp), which includes two partially overlapping mitochondrial genes, was amplified using primers L8929 and H9947 []. The first domain of the control region (532 bp) was amplified using primers ARCO1F (5' AATTTTATGGTGTTTGTGTGTGAA 3') and ARCO1R (5' ACCCTAGCACAACTCGCACT 3') designed for this study from an Archilochus colubris D-loop and the complete tRNA-Phe gene sequence, and a partial 12 S ribosomal RNA gene sequence (GenBank accession EF520732.1). Amplification reactions (25 μl total volume) contained 0.72× buffer, 3.5 mM MgCl2, 0.14 mM of each dNTP, 0.2 μg/μl of BSA, 0.3 μM of each primer, 0.05 U Taq (Promega), and 2-5 μl of genomic DNA. PCR reactions were performed in a 2720 thermal cycler (Applied Biosystems) or in an Eppendorf Mastercycler thermocycler with the following temperature profile: initial denaturation at 94°C for 5 min; 35 cycles consisting of denaturation at 94°C for 1.5 min, annealing at 53-60°C for 1 min and an extension of 1.5 min at 72°C; and a final extension of 72°C for 7 min. PCR products were visualized on a 1% agarose gel stained with ethidium bromide, purified with the QIAquick PCR purification kit (Qiagen, Inc.) and sequenced using the Big Dye cycle terminator kit. Sequences were visualized in a 310 automated sequencer (Applied Biosystems) and by Macrogen Inc. ATPase genes were sequenced in both directions and the control region only in the reverse direction using primer ARCO1R except when ambiguities were present in the sequences. Sequences were edited using SEQUENCHER 4.8 demo (Gene Codes) and then manually aligned with SE-AL 2.0a11 []. All unique sequences used in this study have been deposited in the GenBank under accession nos. HQ380686-HQ380755.Samples were genotyped at 10 polymorphic microsatellite loci designed specifically for Campylopterus curvipennis ([], GeneBank accession nos. GQ294539-GQ294550). PCR conditions and fragment sizing are fully described in Abdoullaye et al. []. [...] We reconstructed intraspecific phylogenetic relationships among haplotypes using Bayesian inference in MRBAYES 3.12 []. Data were partitioned into two matrices, one corresponding to 875 bp of ATPase 6 and 8 and the other to 532 bp of control region. MODELTEST 3.7 [] was run for each partition to choose the model of molecular evolution that best fit our sequence data. The best model for ATPase under the Bayesian information criterion (BIC) was HKY + G (base frequencies: A, 0.3588; C, 0.1664; G, 0.2373; T, 0.2376; gamma distribution shape parameter = 0.1825; transition/tranversion ratio = 6.22), and for the control region was TrN + I (base frequencies: A, 0.3688; C, 0.1564; G, 0.2470; T, 0.2278; proportion of invariable sites = 0.6869). We ran the analysis for 10 million generations using four chains (twice), sampling every 100th generation. We plotted the number of generations versus likelihood scores to check stationarity. Trees prior to stationarity were discarded as burn-in (25%), and the remaining were used to generate a consensus tree, later visualized in FIGTREE 1.2.3. We used three congeners as outgroups (C. rufus, C. hemileucurus, and C. largipennis) to root the tree. Alignment for the phylogenetic analyses is included as an additional file [Additional file ].In the absence of appropriate internal calibration points for many groups of birds, the 2% divergence-per-My clock calibration has been widely used. However, the degree of heterogeneity of molecular evolution rates across lineages and genetic loci could confound the accuracy of divergence time estimates, making the use of the 2% rule controversial [-]. In a more recent study, however, Weir and Schluter [] cross-validated 90 avian clock calibrations (including one for hummingbirds) for cytochrome b obtained from fossil records and biogeographical events, demonstrating support for the 2% rule across taxonomic orders. Similar substitution rates have been described for other protein-coding mitochondrial DNA markers such as ATPase and ND2 [,-]. Nevertheless, in order to minimize potential biases of this calibration, we calculated divergence times using two rates for the ATPase coding region data (0.02 and 0.05 substitutions/site/My) as suggested by Tarr and Fleischer [], Milá et al. [], and Barber and Klicka []. Uncertain homology arising from hypervariability or heteroplasmy in non-protein coding mitochondrial control region could confound measures of molecular clock estimates [,], so we did not estimate divergence time for that locus. Time to the most recent common ancestor (TMRCA) was estimated for the resulting clades using a Bayesian MCMC sampling approach with BEAST 1.4.2 []. TMRCA estimates were obtained using a relaxed clock model with log-normally distributed and uncorrelated rates of substitution between branches. No topological constraints were used allowing topological uncertainty to be taken into account. The model of evolution determined by MODELTEST under BIC (HKY+G for ATPase), was used for the analyses, and all other priors were set as default values in the program BEAUTI 1.4.7 that was used to create the XML files for input into BEAST. We ran BEAST for 10 million generations sampled every 1,000 generations, with the first 10 percent of the sampled points removed as burn-in. We obtained TMRCA estimates employing a range of 2 - 5% My to convert TMRCA to millions of years ago.A statistical parsimony haplotype network was obtained using the program TCS 1.2.1 [] using the 95% connection probability limit. Some ambiguities were detected in the networks (loops), and were broken according to three criteria (frequency, topology and geography), as proposed by Pfenninger and Posada [].We assessed the genetic variation within populations by calculating the haplotype diversity (Hd) and nucleotide diversity (π). To estimate population genetic structure we conducted an analysis of molecular variance (AMOVA). We used the model of molecular evolution of Tamura and Nei with a gamma shape parameter = 1.2757 for the combined data set, to estimate genetic distances. A total of 16,000 permutations were estimated to determine the AMOVA's significance. Two AMOVAs were run, grouping sampling sites based on the fragmented geographical range of wedge-tailed sabrewings (Table Figure ), and grouping sampling sites into subspecific groups (SMO, TUX, and YUC). Genetic differentiation among populations was determined with Φ statistics and pairwise FST. Differences between populations were tested using 10,000 permutations among populations with Fisher's exact test. Genetic diversity estimates, AMOVAs and pairwise FST were performed in ARLEQUIN 2.0 []. [...] Expected and observed heterozygosity and mean number of alleles per locus (allelic diversity) were calculated in GENEPOP 3.4 []. Allelic richness, a measure of the number of alleles per locus among populations independent of sample size, was calculated with the program FSTAT 2.9.3. Microsatellite genotypes were tested for linkage disequilibrium between pairs of loci and for departures from Hardy-Weinberg equilibrium (HWE) within populations and loci in GENEPOP. Bonferroni corrections were applied to correct for multiple simultaneous comparisons [].To investigate population genetic structure we calculated global and pairwise RST [] and pairwise FST [] in RSTCALC 2.2 [] and ARLEQUIN, respectively. Differences between populations were tested using 10,000 permutations with a Fisher's exact test. To represent the relationships between localities we constructed a neighbor-joining tree using RST pairwise values in PAUP 4.0 [].To examine geographic patterns of population genetic substructure, we performed Bayesian genetic clustering using STRUCTURE 2.2.3 [], which is used to infer the most likely number of genetic clusters (K) through the multilocus genotypic data. We ran STRUCTURE without population information under the admixture model with correlated allele frequencies. Ten independent chains were run for each K, from K = 1 to K = 10. The length of the burn-in was 500,000 and the number of Markov chain Monte Carlo (MCMC) replications after the burn-in was 1,000,000. To determine an accurate number of clusters we calculated the statistic ΔK based on the rate of change in the log probability of data between successive K values following Evanno et al. [].Contemporary gene flow (M) among populations, and groups of populations, defined by mtDNA and microsatellite analyses, was estimated with a maximum likelihood coalescent approach using microsatellite data and MIGRATE v. 3 []. The first genealogy was started with a random tree, and initial theta and migration rate (M) parameters were obtained from FST calculations. We ran 10 short (30,000 genealogies sampled) and three long chains (800,000 genealogies sampled), after discarding the first 20,000 genealogies as a burn-in.Both mtDNA and microsatellite genetic analyses indicated three genetically diverged groups. We investigated whether these groups of populations had diverged in the face of gene flow by using IMa []. We performed two different tests, one involving populations from more recently diverged SMO and TUX populations, and the other involving populations east and west of the Isthmus of Tehuantepec. IMa is based on a Bayesian coalescent implementation of a model in which an ancestral population splits into two populations that may exchange genes in both directions at unequal rates during divergence. We performed preliminary simulations to specify the appropriate priors for each analysis, and ran three replicates with different random seed numbers with burn-in periods of 3,000,000 steps and 50,000,000 steps in chain following burn-in. We obtained estimates of migration rates between populations (m1, m2), and effective population sizes (q) from the ancestral population and after the split occurred between the two groups in both analyses. Estimates of migration rates were converted to the effective number of immigrants per generation by using estimates of theta. For estimates of effective population sizes of each genetic group we assumed a mutation rate per generation of 2.96 × 10-3 []. [...] In order to test whether the degree of male phenotypic differentiation (vocal and morphological characters) was caused by drift or divergent selection, we performed an indirect coalescent-based simulation method developed by Masta and Maddison []. Because sexual selection by female choice reduces the amount of time needed for fixation of male phenotypic characters, without reducing the effective population size of neutral mitochondrial genes, Masta and Maddison's method tests statistically whether the rate of fixation of mitochondrial sequences differ from those of male phenotypic characters. This method assumes that nuclear genes code for the phenotypic character of interest (in our case, body size measurements and predisposition to learn certain song type), and asks whether the degree of fixation of phenotypes observed is likely to occur under the assumption of neutrality (see also [,]). We performed the HKA neutrality test for each set of sequences, to determine whether the genes were appropriate neutral markers.In order to implement this method with our data, we performed two different analyses: in the first analysis for morphological characters we considered three representative populations (a higher number of populations increases the complexity of simulations; []) corresponding to the three genetic groups (Tux, Ciel and Nov). Because the use of continuous characters in phylogenetic studies can be problematic, we converted the morphological measurements into discrete characters states for this analysis. We constructed the character states of each of the continuous variables (wing chord, tail length and culmen) using the gap-weighting method in which the observed variation is divided into a number of segments of equal size (five character states), giving large weights to large differences in individual measures and small weights to small differences []. Continuous characters were coded as follows: wing chord, 0: 62-63 mm, 1: 64-66 mm, 2: 67-68 mm, 3: 69-71 mm, 4: 72-74 mm; tail length, 0: 47-49 mm, 1: 50-53 mm, 2: 54-55 mm, 3: 60-61 mm, 4: 63-64 mm; culmen, 0: 20-22 mm, 1: 23-24 mm, 2: 25-27 mm, 3: 27.5-28 mm, 4: 28.5-30 mm. The character states were fixed for the three populations considered: Nov, intermediate in size with short bill (wing chord: 0, 1; tail lengh: 0, 1; culmen: 0,1); Ciel, intermediate in size (wing chord: 0, 1; tail lengh: 1, 2; culmen: 2, 3); Tux, large in size (wing chord: 2, 3, 4; tail lengh: 3, 4; culmen: 3, 4). In the second analysis for vocal variation, we considered four populations in which vocal characters are fixed (Ciel, Ord, Tux and Nov); vocal traits were coded as presence/absence of introductory syllable type. In both analyses tests were performed alternatively by means of two assumptions of population divergence, dichotomous branching and simultaneous divergence.We first calculated the s statistic of Slatkin and Maddison [], a measure of incomplete lineage sorting among populations, for maximum parsimony trees inferred from the analysis of mtDNA sequences (results not shown); larger s values indicate greater levels of incomplete lineage sorting, suggesting a more recent population divergence, and smaller s values indicate lower levels of incomplete lineage sorting suggesting older population divergence. Then, we compared the observed s value against s values obtained from simulated trees to estimate time since population divergence. This comparison was performed with computer simulations generating 10,000 gene trees within a population tree by gene coalescence. We set Ne = 500 in each population (as suggested by Masta and Maddison, []), and through simulations the upper 95% confidence limit was estimated for the number of generations since population divergence that would be expected to give the observed s value. Finally, simulations of nuclear gene trees were run to estimate the probability of complete fixation of morphological (s = 2) and vocal (s = 3) characters. Estimates of generations since population divergence were divided by four considering that nuclear genes have four times the population size of mitochondrial genes. Estimates of s, branch lengths and coalescent simulations were performed in MESQUITE v. 2.73 []. […]

Pipeline specifications