Computational protocol: Rapid ecological isolation and intermediate genetic divergence in lacustrine cyclic parthenogens

Similar protocols

Protocol publication

[…] Specimens of D. parvula and D. retrocurva were collected from populations (n = 29) throughout the continental United States and Canada (see Additional file , Table S1). Most populations were from areas in Northern United States and southern Canada where the two species' distributions overlap (Fig. , Additional file : Table S1). We obtained from 1-10 individuals per population (average 5.44 individuals/population).Total genomic DNA was extracted using Quick Extract (Epicentre). Samples were homogenized in 35-45 μl of Quick Extract solution, then incubated at 65°C for 2 h and 98°C for 10 minutes, and stored at -20°C. One mitochondrial and three nuclear markers were sequenced from the extracted DNA. A 631 bp fragment of mitochondrial protein-coding NADH-2 (ND2) was amplified with the primers (5' - GTTCATGCCCCATTTATAGGTTA - 3') and (5'- GAAGGTTTTTAGTTTAGTTAACTTAAAATTCT-3'). The primers (5'-TTACGAGTCCAGATGGGCTT-3') and (5'- ATCCGTTATGAATCCCTGACTGA-3') were used to amplify a 669 bp fragment of protein-coding HSP90 gene. A 433 bp fragment of the nuclear rab GTPase (F6F12) gene was amplified with the primers (5'- CGTTTCGAATTGGCTTACTGA-3') and (5'- CATCGTTATCTGTCTACGTCTTGAA-3'), while a 534 bp fragment of the translation initiation factor (G6G12) gene was amplified with the primers (5'- AGAAATTCAACATGCCCAAGA-3') and (5'- CGTCGACGAAGTTGACAGTAT-3') []. Each Polymerase Chain Reaction (PCR) was 50-54 μl in total and consisted of 4-8 μl of extracted DNA, 5 μl of10 × PCR buffer [50 mM KCl, 1.5 mg MgCl2, 10 mM Tris-HCl pH 8.3, 0.01% (w/v) gelatin], 1.5 μl of DNTP's (2 mM of each), 1 μl of 10 μM of each primer and 1 μl of standard Taq DNA polymerase. Each PCR was conducted on a MJ Thermocycler with the following conditions for the mitochondrial (ND2) gene: 40 cycles at 94°C for 30 s, 48°C for 30 s, and 72°C for 1 min, and a final extension at 72°C for 7 min. The PCR temperature profiles for the nuclear (HSP90) gene were 40 cycles at 94°C for 30 s, 50°C for 30 s and 72°C for 1 min. with a final extension at 72°C for 5 min. For the nuclear (F6F12) gene the PCR temperature profile was 40 cycles at 94°C for 30 s, 58°C for 30 s and 72°C for 1 min. with a final extension at 72°C for 5 min, identical temperature profiles were used for the nuclear (G6G12) gene with the exception of the annealing temperature set at 53°C rather than 58°C. Sequences of all nuclear and mitochondrial PCR products were obtained in both directions by Genaissance Pharmaceuticals or the University of Washington and were both assembled and edited with SEQUENCHER ver. 4.2 (Gene Code) then manually aligned in SE-AL 2.0 [].An individual was considered heterozygous when overlapping peaks (lower peak > 95% of higher peak) were observed at any given site on the sequence electropherogram. Individuals with multiple heterozygous sites were cloned with the Invitrogen TOPO TA kit to determine the different alleles at each site with observed heterozygosity. The QIAprep Spin Miniprep Kit (QIAGEN) was used for plasmid purification and the respective primers for each particular locus were used to sequence the cloned inserts. For each individual, six cloned fragments were sequenced to help detect cloning artifacts []. The combined data for all four makers was a 2268 bp sequence with both alleles represented for each individual for the nuclear genes. All individuals sequenced for the F6F12 and G6G12 markers were also represented with sequences for the HSP90 and ND2 markers. DNA quality appeared to prohibit sequencing of all genes for all individuals. Therefore, we conducted analyses for 1) individuals represented by sequences for only the mitochondrial ND2 gene, 2) individuals represented by sequences with only the three nuclear genes, and 3) individuals represented by sequences for all four genes (total evidence).The number of individuals sequenced from each marker used in the analyses can be seen in Additional file : Table S1. Each individual was represented by two copies of each sequence to account for both alleles at a heterozygote site in the nuclear markers, while the mitochondrial (ND2) gene was replicated twice to correspond with the two copies of each nuclear gene sequence for the analyses. We performed a phylogenetic analysis on the total evidence data set (all four genes combined) using Maximum Likelihood (ML) method with RAxML [] and a file to partition the alignment by codon position and gene. For all analyses a GTR substitution model was implemented for the analysis and unique model parameters were optimized and applied for each partition. Support values were estimated using 100 bootstrap replicates.Because the transition from polyphyly to monophyly is continuous, a categorical description of divergence can fail to identify significant divergence before monophyly is reached []. Thus, detecting divergence in closely related or recently diverged species phylogenetically can be difficult if monophyly has not been reached. The recently described Genealogical Sorting Index (gsi) allows one to detect significant genealogical divergence before monophyly by quantifying the relative degree of exclusive ancestry of a labelled group on a rooted tree topology. Since branch tips along a tree generally represent gene copies of a given locus, the value of gsi is the degree of genealogical exclusivity among the sampled gene copies. The calculated gsi value ranges from 0-1, with 0 representing panmixia and 1 representing monophyly and permutation tests provide a statistical test of significance for this value [].We calculated the gsi value for each species (D. parvula and D. retrocurva) and tested the null hypothesis that the gene copies labelled D. retrocurva and D. parvula form a single intermixed group. Because strongly unbalanced group representation can result in a decreased power to detect significant exclusive relationships [] we determined the gsi for the mitochondrial data, the nuclear data, and the total evidence data after randomly removing D. retrocurva individuals from the data to achieve a balanced sampling scheme, since this species was over represented compared to D. parvula.A median-joining haplotype network was constructed for the mitochondrial data and the three combined nuclear gene sequence data using NETWORK ver. []. In order to further estimate divergence time we used the isolation with migration- analytic (IMa) model, which was designed to analyze recently separated populations (or in this case, species) not at equilibrium. This model estimates six demographic parameters including gene flow rates and time of divergence while generating relative likelihoods/posterior probabilities []. Unfortunately we could not get reliable migration or gene flow estimates in IMa likely due to the increased number of parameters in the model (J. Hey communication); therefore, we used IMa solely for estimating divergence time. Since the degree of current gene flow between D. parvula and D. retrocurva is unknown we ran two IMa analyses, one with the migration prior set to zero migrants per generation and another with ten migrants per generation between the two species which we treated as populations. For both analyses, we used the Hasegawa-Kishino-Yano model [] and the Infinite Sites mutation model for the mitochondrial and nuclear markers, respectively. The mutation rates entered in the model were the number of bp in the particular marker multiplied by 3.3 × 10-6 mutations/site/year for the mitochondrial marker and 1.05 × 10-7 mutations/site/year for the nuclear markers [], assuming 5 generations per year; previous estimates adjusted to seasonal duration of localities [,]. The analysis included 30 million generations post burn-in (1 million generations) with six Metropolis-coupled Markov chains with the first heating parameter for the linear heating scheme set to 0.05. Tajima's D test and the four-gamete test (DNASP ver. 4.0) indicate all markers are consistent with the expectations of neutrality and the nuclear markers show no evidence of recombination, both assumptions of the IMa model.Sequences from the rapidly evolving mitochondrial ND2 gene were used to determine the demographic history of both D. parvula and D. retrocurva lineages. Three mismatch distributions were calculated from the number of differences between pairs of haplotypes within and between species in ARLEQUIN []. The Sum of Square Deviations (SSD) was computed for each species separately then tested whether the observed distributions deviated significantly from those expected under the population expansion model using 10000 permutation replicates. We used the parameter τ to estimate the time since the expansion (t) using the equation t = τ/2 u, where u is the mutation rate per sequence per generation [] assuming a mutation rate of 6.6 × 10-8/site/generation [] and five generations/year.We ran hierarchical Analysis of Molecular Variance (AMOVA) in ARLEQUIN [] to determine the total genetic variation explained by differences (i) between the two species D. parvula and D. retrocurva, (FCT), (ii) among populations within a species, (FSC), and (iii) within a population (FIS). […]

Pipeline specifications