Computational protocol: Phylogenetic surveys on the newt genus Tylototriton sensu lato (Salamandridae, Caudata) reveal cryptic diversity and novel diversification promoted by historical climatic shifts

Similar protocols

Protocol publication

[…] A total of 108 samples was used, representing all 23 recognized species (we did not use “T. daweishanensis” because it is assigned as synonym of T. yangi) of Tylototriton s.l., which were collected from 49 localities scattering across the range of the genus in the southeastern Asia (; ). Based on previous studies (; ), 23 species representing major divisions within Salamandridae were included in our phylogenetic analyses, and one Ambystoma mexicanum was used as outgroup (). In these 23 Salamandridae species, two specimens of Echinotriton chinhaiensis () were collected in this study. The Animal Care and Use Committee of Chengdu Institute of Biology, CAS provided full approval for this purely observational research (Number: CIB2010031015). Field experiments were approved by the Management Office of the Kuankuoshui Nature Reserve (number: KKSNR201204002) and the Management Office of the Dabashan Nature Reserve (number: DBSNR201204002).The genomic DNA was extracted with the Qiagen DNeasy tissue kits (Qiagen, Hilden, Germany) from tissues preserved in 95% alcohol. Fragments of two mitochondrial genes, i.e., 16S and ND2, and two nuclear genes, NCX1 and BDNF, were amplified for all samples collected in this work. Primers for amplifying ND2, NCX1 and BDNF genes were designed in this study (). The polymerase chain reaction (PCR) cycles included an initial denaturation step of 4 min at 94 °C and 35 cycles of denaturation for 30 s at 94 °C, primer annealing for 30 s at 48–58 °C, and extension for 1 min 30 s at 72 °C. Sequencing was performed in both directions using the primers in PCR on ABI3730 sequencer. New sequences were deposited in GenBank (accession numbers were shown in ). ND2 sequences of 28 specimens were retrieved from GenBank (see ), and ND2 sequences of two T. yangi samples were from .Alignments were performed in the program MAFFT v.6 () with default options. To avoid artificial bias in refining alignments, GBLOCKS v.0.91.b () with default settings was used to extract regions of defined sequence conservation from the length-variable 16S fragment. DNA sequences of each protein-coding gene were translated into amino acid sequences, and then the amino acid sequences of each gene were aligned in MEGA v.6 (). The nucleotide gaps were determined according to the positions of amino-acid gaps in the alignments of the amino acid sequences. Non-sequenced fragments were defined as missing loci. Recombination in the nuclear markers was tested in the program RDP3 () with default setting. The test indicated that there was no recombination in the nuclear markers. [...] Gene trees were reconstructed just for the mitochondrial DNA (mtDNA) data and four-gene concatenated data, respectively, because there were few variable sites in nuclear gene data for reconstructing reliable nuclear gene trees. Maximum likelihood (ML) and Bayesian Inference (BI) analyses were conducted in the programs RAXML v.8.1.24 () and MRBAYES v.3.2.6 (), respectively. To avoid under- or over-parameterization (; ), the best partition scheme and the best evolutionary model for each partition were selected for the phylogenetic analyses using the program PARTITIONFINDER v.1.1.1 (). For PARTITIONFINDER analyses, 16S gene and each codon position of each protein-coding gene was defined and the “greedy” search where branch length estimates were unlinked under the Bayesian Inference Criteria (BIC) as recommended by was performed. In ML analyses, the best tree was produced from 1,000 iterations of the random addition of taxa. For the ML tree, branch supports were drawn from 10,000 non-parametric bootstrap replicates. Four-gene concatenated data was analyzed with joint branch optimization to minimize the impact of missing loci on branch lengths. For BI analyses, gaps were scored as presence/absence data in the program GAP-CODER () using the procedure of (). In BI analyses, the parameters for each partition were unlinked, and branch lengths were allowed to vary proportionately across partitions. Two independent runs were initiated each with four simultaneous Markov Chain Monte Carlo (MCMC) chains for 100 million generations and sampled every 1,000 generations. Convergence of chains and burn-in periods were determined using the program TRACER v.1.6 (). Then the burn-in samples (the first 25% defined by Tracer) were discarded, and then a maximum bifurcating consensus tree and posterior probability (pp) were derived from the remaining samples.To explore the possibility that the unlinked gene trees might be incongruent among each other or with the species tree due to processes such as hybridization, incomplete lineage sorting, horizontal gene transfer, or gene duplication (; ; ; ), the species tree was inferred using multispecies coalescent inference. The analyses were conducted using the *BEAST option () in the program BEAST v.2.3.0 (), as described in the following section together with dating estimations. [...] Divergence times were estimated also based just on mitochondrial data and four-gene concatenated data, respectively, because there were few variable sites in nuclear gene data for reconstructing reliable nuclear gene trees. Time-calibrated phylogenies were reconstructed by incorporating fossil calibration points using BEAST. Four Salamandridae fossil records were used as calibration points. The first calibration point was the divergence between Tylototriton s.l. and Pleurodeles for which the minimum age was set to 44 Mya, based on a Tylototriton-related salamandrid fossil, Chelotriton (; ; ). The relevant fossil of Brachycormus was not used because it was alleged to have been a neotenic amphibian or a derivate of Chelotriton (; ). Secondly, the minimum age of ancestor of Taricha and Notophthalmus was presumed to be 22 Mya based on a fossil of Taricha oligocenica (). Then the minimum age of the common ancestor of Triturus was given as 24 Mya based on a Triturus fossil (). Finally, based on the fossil of Procynops miocenicus closely related to Cynops (), we presumed the minimum divergence time between Cynops and Paramesotriton at ∼15 Mya. A lognormal prior distribution was applied for each calibration point in view of the fact that this model allowed for uncertainty of the age estimation of the fossil and bias associated with the incompleteness of the fossil record ().For BEAST analyses, to avoid zero-length branches, each coalescent cluster was represented by one specimen. So the Generalized Mixed Yule Coalescent (GMYC) method (; ) was used to identify coalescent clusters within Tylototriton s.l. The GMYC method was proposed to potentially promote objective delimitation of coalescent clusters even based on single gene and supplies explanations for divergences between clusters by speciation models (). GMYC analysis was implemented in the R package SPLITS (Species Limits by Threshold Statistic) using the single-threshold option. Because the GMYC analysis needs a time-scale tree, a preliminary BEAST analysis was conducted only based on mtDNA data set with the best partitioning scheme and the best model for each partition selected above. In this analysis, a conservative parameter setting (i.e., a relaxed lognormal clock with the mean rate set to 1 and a coalescent Yule model as tree prior) was assigned (). MCMC chains were run for 20 million generations with sampling every 1,000 generations, and the first 25% generations were discarded as burn-in. GMYC analysis resulted in 39 cluster representatives (see ‘results’), which were used for the following analyses.For BEAST analyses, the best partition scheme and the best evolutionary model for each partition were also selected for the phylogenetic analyses using the program PARTITIONFINDER. An unlinked uncorrected lognormal relaxed clock to each partition, a Yule speciation process for branching rates, and the model of four rate categories for nucleotide substitutions were defined for the analyses. A ML tree was used as a starting tree inferred by RAXML based on single-GMYC-cluster representatives. Two runs of 100 million generations sampling every 1,000 generations were conducted. The convergence of MCMC chains and burn-in periods were determined by Tracer. A maximum clade credibility (MCC) tree with posterior probability was calculated using TREEANNOTATOR.Species tree was inferred using *BEAST. The method needs prior assignments of specimens to species. So the selection of specimens for *BEAST analyses was according to the GMYC clusters and the sampling number (>1) of each cluster. GMYC singletons and individuals with missing data genes were excluded. Because we obtained sequences of nuclear genes only for 91 Tylototriton s.l. samples and two E. chinhaiensis samples, outgroup fossil calibrations (see above) could not be utilized for the *BEAST analysis. So, the nucleotide substitution rate of each partition was set according to the rate estimations from the four-gene concatenated Beast analyses conducted above. For *BEAST analysis, mitochondrial DNA (16S + ND2) and each nuclear gene was defined as an independent partition, respectively. The best-fitting substitution model for each partition was selected by the program JMODELTEST v.2.1 (; ). *BEAST analyses were carried out using the unlinked uncorrected lognormal relaxed clock to each partition and a Yule speciation process as tree prior. Lognormal prior distributions were assigned for mean rate of each partition. Two runs of 150 million generations were carried out with sampling every 1,000 generations. Convergence of MCMC chains and burn-in periods were assessed using TRACER.Topology of the *BEAST tree with fewer number of species was almost consistent with the BEAST tree except for the position of one major clade (see ‘results’). Therefore, to produce a “complete species-time-tree”, a final topology-constrained BEAST analysis was conducted based on mtDNA of GMYC cluster representatives and all outgroups by applying a constraint of the topology as the *BEAST species tree. The fossil calibrations and parameter-settings were specified as constant as the mtDNA-BEAST analysis above. [...] Diversification analyses are dependent on the number of user-specified biodiversity units. Yet there is controversy on the criterion to delimit species-level lineages. To evaluate how robust models of diversification were to variation of number of species-level units, phylogenetic lineages were delimited according to three criteria: GMYC clusters (statistically inferred coalescent lineages), currently recognized nominal species (NS) and NS plus (NSP: NS plus independent lineages representing putative cryptic species, for example, T. wenxianensis lineage 2 and 3, see ‘results’). Two thousand trees were randomly sampled from the post-burn-in posterior trees resulted from the final topology-constrained Beast analysis. Then, MCC tree and 2,000 sampled posterior trees were pruned according to each lineage delimitating strategy. The number of lineages varied when we used different units of biodiversity: 39 GMYC-clusters, 23 NS, and 27 NSP (see results). Finally, the pruned 39-tips, 23-tips and 27-tips trees were used for diversification analyses.To visualize the temporal accumulation of lineages, the log-transformed lineage-through-time (LTT) plots were constructed for each lineage delimitating strategy. To test if the time-lag between the origin and divergence of extant Tylototriton s.l. departed from random expectations under the birth-death model, simulations (n = 10,000) under birth-death process were run using five different sampling fractions: 100%, 90%, 80%, 70% and 50%, respectively. Parameters for the simulations included the speciation and extinction rates, the age of the stem group and the number of extant lineages all estimated from the observed data based on the birth-death model. Simulations were carried out in the R () package TREESIM (). LTT plots of the simulated trees were then constructed to compare with the average LTT plot of the empirical data.To examine diversification rate shifts, the birth-death likelihood (BDL) test () was carried out in R packages LASER () and GEIGER (). In BDL analysis, six models (pure birth, birth-death, yule2rate, DDL (diversity-dependent linear decrease) and DDX (diversity-dependent exponential decrease)) were fitted to the observed data under the Akaike’s information criterion (AIC). The best model was selected by calculating the difference (ΔAIC) in AIC scores between the best rate-variable (RV) models and the best rate-constant (RC) models (). The significance of the observed ΔAICRC statistic was assessed by one-tailed tests (). Additionally, other three diversification models were also tested. SPVAR permits speciation rate to vary over time, EXVAR permits extinction rate to vary over time, and BOTHVAR allows both to vary over time (). Finally, Akaike weights () were calculated for each model. […]

Pipeline specifications