Protocol publication

[…] Sequences of each gene were aligned using MAFFT v7.122b with the L-INS-i option for ITS and the G-INS-i option for MCM7, TSR1, RPB1, and RPB2. Alignments were trimmed with BioEdit v7.2.2 and analyzed as described below.

To initially assess whether strains formed distinct phylogenetic clusters, a preliminary neighbor joining (NJ) analysis was performed on a concatenated data set of all aligned genes using SeaView v4.4.2 with the following options: NJ; observed distance; do not ignore all gap sites.

Next, individual genes were analyzed using four methods: neighbor joining, maximum parsimony, maximum likelihood, and Bayesian inference. NJ was performed in SeaView v4.4.2 as described above, with 1000 bootstrap replicates. Maximum parsimony heuristic searches were performed using PAUP4.10b with these parameters: uninformative characters excluded, midpoint rooting, simple sequence addition, TBR swapping algorithm, collapse and multitrees in effect, 100 maximum trees saved. This was followed by the computation of a parsimony strict consensus tree. RAxML 8.0.20 was used to compute a maximum likelihood tree using the GTRGAMMA model, chosen because it includes the parameter G for rate heterogeneity among sites. In RAxML, by default, G has 25 rate categories, making the estimation of the proportion of invariable sites (I) unnecessary because G mathematically accounts for I. Support values were assessed using the ‘rapid bootstrapping’ option with 1000 replicates.

Prior to Bayesian inference, jModelTest v2.1.4 was used to select the best-fitting evolutionary model for each gene. For each gene alignment, likelihood scores were computed with the following options: 3 substitution schemes; base frequencies on (+F); rate variation on with 8 rate categories (+G, nCat = 8); ML optimized base tree; NNI search algorithm.
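The alignment and maximum likelihood steps described above could be scripted roughly as follows. This is a sketch, not the authors' actual commands: the file names, run name, random seeds, and the `raxmlHPC` binary name are assumptions. `--localpair` and `--globalpair` with `--maxiterate 1000` are the MAFFT options that correspond to L-INS-i and G-INS-i. Using RAxML's `-f a` mode (rapid bootstrapping and ML search in one run) is also an assumption, since the text only says that the rapid bootstrapping option was used.

```python
# Sketch only: builds (does not run) command lines for the alignment and
# maximum likelihood steps. File names, seeds and binary names are assumptions.

# MAFFT strategy per gene, as stated in the text.
MAFFT_STRATEGY = {
    "ITS": "--localpair",    # L-INS-i
    "MCM7": "--globalpair",  # G-INS-i
    "TSR1": "--globalpair",
    "RPB1": "--globalpair",
    "RPB2": "--globalpair",
}


def mafft_command(gene, fasta_in):
    """Return the MAFFT argument list for one gene (MAFFT writes to stdout)."""
    return ["mafft", MAFFT_STRATEGY[gene], "--maxiterate", "1000", fasta_in]


def raxml_command(alignment, run_name, seed=12345, replicates=1000):
    """Return a RAxML rapid-bootstrap plus ML search command under GTRGAMMA."""
    return [
        "raxmlHPC",
        "-f", "a",              # rapid bootstrapping plus ML search (assumed mode)
        "-m", "GTRGAMMA",
        "-p", str(seed),        # parsimony starting-tree seed (hypothetical value)
        "-x", str(seed),        # rapid bootstrap seed (hypothetical value)
        "-#", str(replicates),  # number of bootstrap replicates
        "-s", alignment,
        "-n", run_name,
    ]
```

Either list can be passed to `subprocess.run`; for MAFFT, standard output would need to be redirected to the alignment file.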
The proportion of invariable sites (+I) was not considered in our model testing because it had minimal impact on estimates of rates and coalescence times for closely related species. According to the Bayesian Information Criterion (BIC), the HKY + G model was selected for the ITS, RPB2, and TSR1 loci, and K80 + G was selected for MCM7 and RPB1.

Bayesian inference analyses were performed with BEAST v2.1.3. BEAUti v2.1.3 was used to generate the input XML file. Gene alignments were loaded in BEAUti, and each gene partition was assigned a separate site model, clock model, and tree model. The site model was chosen according to the jModelTest results described above, and the gamma category count was set to 8. All substitution rates, the gamma shape, and the kappa parameter were estimated and left on default settings. Because all of our Wallemia strains were closely related, we chose the estimated strict clock and, for all gene partitions, the Yule model of speciation (which does not take species extinction into account), conditional on the root. The birth rate, clock rate, and mutation rate priors were set to exponential, except that the mutation rate prior for RPB2 was set to uniform. Kappa parameters for the HKY models were left on lognormal. The MCMC chain length was set to 1.0 × 10^8, with one tree stored every 20,000 generations. Three independent BEAST analyses were run, each with a different random seed. Convergence and effective sample size were monitored with Tracer v1.6. All gene trees from each independent run were combined with LogCombiner v2.1.3 with a burn-in of 10%. The consensus tree was generated with TreeAnnotator v2.1.3, with the target tree type set to maximum clade credibility tree and node heights set to mean heights.

All trees generated from these analyses were imported into FigTree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).
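As a rough check on how many posterior trees the MCMC settings above yield, the arithmetic can be sketched as follows. Two points are assumptions rather than statements from the text: that each run also logs the initial state (generation 0), and that burn-in is discarded as a whole number of samples per run, rounded down. The function name is hypothetical.

```python
def retained_samples(chain_length, log_every, burnin_percent, runs):
    """Count the trees kept after discarding burn-in from each independent run.

    Assumes generation 0 is logged and that burn-in is rounded down to a whole
    number of samples per run.
    """
    per_run = chain_length // log_every + 1       # samples logged by one run
    discarded = per_run * burnin_percent // 100   # burn-in samples per run
    return runs * (per_run - discarded)


# Gene-tree analysis: 1.0 x 10^8 generations, one tree every 20,000,
# three runs, 10% burn-in.
print(retained_samples(100_000_000, 20_000, 10, 3))  # prints 13503
```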
Isolates were assigned to a clade number if they were recovered as a distinct group in the strict parsimony analyses and with >80% support values in the NJ, maximum likelihood, and Bayesian analyses. We started the assessment on the right-hand side of the tree (most recent in molecular time) and worked to the left, using groupings in the initial NJ tree based on the concatenated alignment.

After the isolates were assigned to clades, we used the species phylogeny approach implemented in *BEAST. *BEAST infers a species tree by considering divergence times, population sizes, and gene trees from multiple genes sampled from multiple individuals, using a mixture of coalescent and Yule processes. Alignments were imported into the *BEAST template inside BEAUti. We used the same site models, clock models, and priors as for the Bayesian analysis described above. Additionally, the Yule model conditional on the root was chosen for the species tree branching prior, and the prior distributions for the species birth rate and the population mean were set to normal. Each strain was designated as a separate species using a tab-delimited mapping file. Isolates that lacked sequence information for certain genes were included, with the missing sequences filled in with “?”, which BEAST treats as missing information. As above, the MCMC chain length was set to 1.0 × 10^8, with one tree stored every 20,000 generations. Three independent *BEAST analyses were run, each with a different random seed. Convergence and effective sample size were monitored with Tracer v1.6. The species trees from all independent runs were combined with LogCombiner v2.1.3 with a burn-in of 25%.
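The two input-preparation steps mentioned above for *BEAST (filling in missing gene sequences with “?” and mapping each strain to its own species) might look like the following minimal sketch. The function names are hypothetical. The two-column taxon/species layout of the mapping file is also an assumption, as the text only states that the file was tab delimited.

```python
def pad_missing(alignment, isolates):
    """Return {isolate: sequence}, giving isolates absent from the alignment
    an all-'?' row of the alignment's length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment rows must all have the same length")
    placeholder = "?" * lengths.pop()
    return {iso: alignment.get(iso, placeholder) for iso in isolates}


def species_mapping_lines(isolates):
    """Return tab-delimited 'taxon<TAB>species' lines, one species per isolate.

    The two-column layout is an assumption, not taken from the text.
    """
    return [f"{iso}\t{iso}" for iso in isolates]
```

For example, `pad_missing({"s1": "ACGT"}, ["s1", "s2"])` gives `s2` the placeholder row `????`.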
The consensus species tree was generated with TreeAnnotator v2.1.3, with the target tree type set to maximum clade credibility tree and node heights set to mean heights.

To provide stronger support for the species hypothesis, a species delimitation analysis was conducted using the program BPP3, which uses a Bayesian approach to evaluate species delimitation. This method accommodates the species phylogeny as well as incomplete lineage sorting caused by ancestral polymorphism. We used the preliminary NJ tree from the concatenated data set of all aligned genes described above as a guide tree. A gamma prior G(2, 1000), with mean 2/1000 = 0.002, was used on the population size parameters (θs). The age of the root in the species tree (τ0) was assigned the gamma prior G(2, 1000), while the other divergence time parameters were assigned the Dirichlet prior. Each analysis was run three times to confirm consistency between runs.

To compare the resolution of these markers as potential secondary barcodes, MEGA 5 was used to calculate uncorrected pairwise distances (p-distances) between all sequences for each gene. These distances were used to calculate the between-clade and within-clade p-distances in Microsoft Excel.

All sequences were deposited in GenBank. Alignments and trees were deposited in TreeBASE under study accession no. S15232. […]
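The p-distance comparison described above was done in MEGA 5 and Microsoft Excel; the same calculation can be illustrated with a short sketch. The function names are hypothetical, and skipping sites where either sequence has a gap or missing character (pairwise deletion) is an assumption, since the text does not state how gaps were handled.

```python
from itertools import combinations

GAP_OR_MISSING = set("-?N")


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites among the sites
    where both sequences have a base (pairwise deletion, an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_OR_MISSING or b in GAP_OR_MISSING:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_distances(sequences, clade_of):
    """Return (mean within-clade, mean between-clade) p-distance.

    sequences maps isolate -> aligned sequence; clade_of maps isolate -> clade.
    A mean is None when there are no pairs of that kind.
    """
    within, between = [], []
    for a, b in combinations(sorted(sequences), 2):
        d = p_distance(sequences[a], sequences[b])
        (within if clade_of[a] == clade_of[b] else between).append(d)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(within), mean(between)
```

A marker that separates clades well would show a between-clade mean clearly larger than the within-clade mean.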

Pipeline specifications

Software tools: MAFFT, BioEdit, SeaView, PAUP*, RAxML, jModelTest, BEAST, FigTree, BPP, MEGA
Databases: TreeBASE
Application: Phylogenetics
Organisms: Wallemia sebi