Computational protocol: Isolation with differentiation followed by expansion with admixture in the tunicate Pyura chilensis

Similar protocols

Protocol publication

[…] Samples were obtained via scuba diving from eight localities in the coast of Chile (Table ). For each specimen approximately 10 mm3 of tunic tissue was preserved in absolute ethanol. DNA was extracted from 25 mg of tissue using the standard phenol-chloroform procedure [].Partial sequences of the mitochondrial gene Cytochrome Oxidase I (COI), were obtained using the primers HCO and LCO []. The gene COI is often used in phylogeographic [] and Barcoding (e.g. []) projects given its high interspecies divergence and relatively low intra-species divergence. As a nuclear counterpart, we developed primers and obtained sequences of the Elongation Factor 1 alpha (EF1a) gene. This gene has shown to have a relatively fast evolutionary rate in invertebrates, and has provided informative characters for phylogeographic reconstructions (e.g. [,,-]). Contigs from 454 transcriptomic data of Pyura chilensis (Haye & Gallardo-Escárate, unpublished data) were used for GO analysis using Blast2Go []. The contig matching the EF1a gene was used to design primers for the EF1a gene for Pyura chilensis using Geneious R6 []. The primers EF1aPchF (TTGCGATCTTTTCCGCGATTGCT) and EF1aPchR (TGGGCTATATACGCAACGCTACGA) produced successful amplifications and were used to obtain partial EF1a sequences for individuals of each COI haplogroup. Primer pairs for both genes were used in PCR performed in a final volume of 10 μl with 1X PCR buffer, 1.3 Mm of MgCl2, 0.6 μM of each primer, 0.2 mM of each dNTP, 0.03 U μL-1 of Taq polymerase [Fermentas], 1.5 mg mL-1 Bovine Serum Albumin, and 20 ng of DNA template. Cycling conditions consisted of an initial soak of 10 min at 94°C followed by 35 cycles, each of 1 min at 94°C, 1 min at 40°C and 2 min at 72°C, and a final extension of 72°C for 13 minutes. Sequencing of purified amplicons was performed with an ABI 3730XL capillary automated DNA sequencer [Applied Biosystems].Sequences were aligned using Geneious R6 aligner and were visually inspected and corrected, if necessary, using amino acidic sequences (translation) as a guide. All polymorphic sites were carefully inspected to call for the right base. EF1a sequences were inspected for heterozygotes carefully and the proper IUPAC one letter code was assigned to indicate the pair of nucleotides of each site that was variable within an individual. All sequences obtained were deposited in the GenBank nucleotide database [COI GenBank accessions: KC918366 - KC918536; EF1a GenBank accessions: KC936798 - KC936876].Complete sequences, as well as each codon position, were analyzed for mutation saturation using DAMBE 5.3.16 []. Deviations from neutrality were analyzed from synonymous and nonsynonymous mutations with z-test codon-based test of neutrality (difference between nonsynonymous and synonymous substitution rates) calculated with the Nei-Gojoboti method [], and Fisher’s exact test of neutrality based on sequences also with the Nei-Gojoboti method [,]; both performed in MEGA 5.2 [].The haplotypic phases of EF1a sequence data were determined using PHASE 2.1 [] implemented in the software DnaSP 5.10 []. Five independent runs were performed with 10000 iterations, a thinning interval value of 10 and a burn-in period of 10000. The best results were selected based on the overall goodness of fit. Haplotypes that had a posterior probability of resolution ≥ 90% were chosen for further analyzes. Additionally, genotypic data of EF1a SNP loci with over 5% of polymorphism (based on minor allele frequency) was generated by visual inspection of chromatograms. [...] Sequences of the COI gene were analyzed all together as well as by COI haplogroup. For phylogenetic analyses of EF1a sequences, and to represent the variability within individuals, the nucleotide sequences analyzed included heterozygotes, denoted by the corresponding IUPAC letter code. Analyses were performed with the PAUP* 4.0b10 [] and MrBayes 2.6.0 [] plugins for Geneious R6. Trees were estimated using Bayesian inference (BI), Maximum likelihood (ML), and Maximum Parsimony (MP) optimization criteria. The best-fit molecular evolution model for each dataset (all COI sequences, each COI haplogroup and EF1a sequence data) was chosen with Akaike Information Criterion implemented in jModelTest 0.1.1 []. Chosen models were applied to BI and ML analyses. Support values for nodes of ML and MP were obtained from Bootstrap analyses with 10000 replicates. Bayesian posterior probability values were obtained from two independent runs of 1000000 iterations that led to an average standard deviation of split frequencies between chains below 0.01. Twenty percent of the trees were discarded as burn-in.ML, BI and MP analyses were performed on a COI data set consisting of the 100% consensus sequence of each haplogroup, and sequences of three outgroup taxa of the genus Pyura (GenBank accessions in Figure b). Unfortunately there were no EF1a sequences available to polarize the EF1a gene tree.The time of divergence of the COI haplogroups was estimated using the average number of differences between the sequences of haplogroups obtained using DnaSP. Tunicates have a high mutation rate of mitochondrial encoding sequences when compared to other bilaterians [], although no specific values have been published. Even though for marine invertebrates a 1.5% Myr-1 mutation rate is considered relatively high, a fastest 3% Myr-1 mutation rate is likely closer to the real mutation rate for ascidians, which has been described as more than double the “typical” marine invertebrate rate (e.g., [,]). Additionally, several authors have pointed that intraspecific mutation rates are much greater than substitution rates between species as mutation rates decrease over time [,]. However, it is not yet clear how to correctly estimate given the existing variation between taxa and genes. Under the above considerations we applied a divergence rate of 3% Myr-1 to COI data of Pyura chilensis. This rate triples the one usually used for COI of marine invertebrates (1% Myr-1) and doubles what is generally considered a fast rate (1.5% Myr-1). [...] From the EF1a SNP data, exact test for Hardy-Weinberg equilibrium for each locus and each population were calculated using the Markov chain method with 5000 iterations implemented in Genepop v4.1.4 [,] followed by sequential Bonferroni correction []. Observed and expected heterozygosities per locus per population, population level F ST , and pairwise F ST were obtained with Arlequin 3.5 [] and inbreeding coefficients (F IS ) with Fstat [].Bayesian clustering analysis was performed in STRUCTURE 2.3.4 [] using the linkage model of ancestry []. Ten independent runs for each genetic cluster (K) (1 to 8) were performed with 500000 MCMC iterations after a burn-in of 50000. The delta K method was used for the inference of the best K[]. [...] Genetic structure was evaluated for all the COI data, each of the COI haplogroups detected with phylogenetic analyses, and for the EF1a haplotype data. DnaSP software was used to estimate haplotype and nucleotide diversities, average number of differences between pairs of sequences and Hudson’s Snn []. Haplotype frequency per site was obtained from DnaSP. Arlequin was used to estimate genetic differentiation of populations with ΦST (F ST analogue that accounts for haplotype frequency and divergence) with 10000 permutations. The spatial analysis of molecular variance implemented in SAMOVA 1.0 [] was performed to estimate the number of population groups; statistical significance was evaluated with 1000 random permutations. The most likely number and composition of population groups were chosen based on the probability and F CT values. Isolation by distance was tested with a Mantel test between genetic differentiation and geographic distance [] using 10000 permutations in Arlequin. The correlation between linearized genetic differentiation and geographic distance was also explored.Tajima’s D[] and Fu & Li’s F[] neutrality tests were used to infer demographic history of all the haplotype datasets. Significance was estimated using a beta distribution in DnaSP. A negative Tajima’s D value is expected if populations have experienced an expansion. The timing of demographic events was inferred from two methods: Mismatch Distribution Analyses [] and from Bayesian Skyline plots []. Mismatch distributions of the frequency of nucleotidic differences between pairs of sequences, used to detect historical population expansion [], and calculation of the Raggedness index and its associated probability, were performed in Arlequin. The position of the mean (τ) in unimodal distributions corresponds to the start of the population expansion. Timing of the expansion can be calculated with the relationship τ = 2 μt, where μ is the substitution rate for the whole haplotype per generation and t is time in generations. Analyses of mismatch frequency distribution of pairwise differences between sequences [,] are very conservative because they use little information of the data set []. Consequently, they were only performed for haplogroups that had a signal of possible demographic expansion based on neutrality tests. From the obtained τ values, as well as the upper and lower boundary of τ, COI haplogroup expansions were estimated using mutation rates of 1.5% and 3% Myr-1. Past changes in population size were also evaluated with Bayesian Skyline Plots generated in BEAST 1.7.4 [,]. This analysis uses a coalescent approach coupled to a MCMC sampling procedure to generate a probability distribution of past population sizes. The run consisted of 200 million iterations, sampling every 1000 MCMC steps. The initial 10% was discarded as burn-in. Convergence of data and demographic plots were performed with TRACER v.1.5 []. […]

Pipeline specifications