Computational protocol: A remarkable diversity of bone-eating worms (Osedax; Siboglinidae; Annelida)

Protocol publication

[…] Sequences were assembled using CodonCode Aligner v. 2.06 (CodonCode Corporation, Dedham, MA, USA), aligned using MUSCLE [], and edited by eye using MacClade v. 4.08 []. We used MrModelTest [] and the Akaike information criterion [] to determine appropriate evolutionary models for each gene (Table ). COI and H3 were partitioned by codon position, and parameters were estimated separately for each position. RNA secondary structures were predicted with GeneBee and used to partition stems and loops in 16S, 18S, and 28S sequences. The doublet model was used for RNA stems and a standard 4 × 4 nucleotide model was used for RNA loops. The number of indel haplotypes for rRNA sequences (total number of indels, number after excluding overlapping indels, and average length of indels) were estimated with DnaSP v. 4.90.1 [] using the diallelic model. Gaps in the RNA sequences were treated as a fifth character state in subsequent Bayesian phylogenetic analyses and as missing data in parsimony and maximum likelihood (ML) analyses. The program DAMBE [] was used to examine saturation of the mitochondrial COI sequences for the Osedax OTUs and outgroup taxa.

First, each gene was analyzed separately using MrBayes v. 3.1.2 [,]. Bayesian analyses were run as six chains for 5 × 10^6 generations. Print and sample frequencies were 1,000 generations, and the burn-in was the first 100 samples. We used AWTY [] to assess whether analyses reached convergence and FigTree v. 1.1.2 [] to display the resulting trees. We then used the incongruence length difference (ILD) function implemented in PAUP* v. 4.0 [] to assess congruence of the tree topologies produced by the individual gene partitions. ILD tests were conducted both with and without the outgroup taxa. The ILD partition homogeneity test was run for 1,000 replicates with 10 random additions of gene sequences.

A combined analysis was conducted with concatenated sequences from the five genes.
If available, multiple individuals of each OTU were sequenced for each gene; however, the concatenated multilocus sequences used in the phylogenetic analyses were obtained from a single representative individual for each OTU. The five gene regions were partitioned separately according to the previously determined model parameters. Bayesian phylogenetic analyses were then conducted with MrBayes v. 3.1.2. Maximum parsimony analysis of the combined data set was performed with PAUP* v. 4.0 [] using an equally weighted character matrix, heuristic searches using the tree-bisection-reconnection branch-swapping algorithm, and 100 random addition replicates. The resulting shortest tree included 3481 steps. A parsimony jackknife analysis (with 37% deletion) was run for 100 iterations with the same settings as the parsimony search. ML analysis of the combined data was conducted using RAxML 7.0.4 (with bootstrapping), with GTR+I+G as the model for each partition. RAxML analyses were performed on the CIPRES cluster at the San Diego Supercomputer Center.

[...]

A Bayesian MCMC method implemented in BEAST v. 1.4.8 [] was used to estimate the evolutionary ages of internal nodes in the tree topology derived from the combined phylogenetic analysis. Estimates of the time to most recent common ancestor (T) were based on two calibrated nucleotide substitution rates for mitochondrial COI. Substitution rates (r) were estimated as percentage per lineage per million years (my), so they equal one-half the divergence per unit of time (T) between taxa (r = 100 × D/(2T)). First, we assumed a conventional substitution rate, r1 = 0.7%, based on the D = 1.4% per my pairwise divergence rate commonly cited for shallow-water marine invertebrates that were isolated by the emergence of the Isthmus of Panama [].
Second, we used a slower rate, r2 = 0.21%, previously calibrated from a vicariant event that split cognate species of deep-sea hydrothermal vent annelids between the East Pacific Rise and the northeastern Pacific ridge system about 28.5 mya []. Calibrations were not available for the other genes.

We used a relaxed, uncorrelated, lognormal molecular clock with a general time reversible (GTR) substitution model that was unlinked across codon positions. Initial MCMC test runs consisted of 10 million generations to optimize the scale factors of the prior function. Three independent MCMC chains were run for 100 million generations, sampled every 1,000 generations. Results were visualized in FigTree v. 1.1.2 and Tracer v. 1.4 []. […]
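
The rate definition used in the protocol (r is one-half the pairwise divergence per unit time, expressed as a percentage per lineage per my) can be illustrated with a small Python sketch. This is not the authors' code, and BEAST estimates node ages under a relaxed clock rather than by this direct division. The function names and the 10% example divergence are hypothetical; only r1 = 0.7%, r2 = 0.21%, D = 1.4% per my, and the 28.5 my calibration age come from the text.

```python
def per_lineage_rate(pairwise_divergence: float, time_my: float) -> float:
    """r = 100 * D / (2T): percent substitutions per lineage per my.

    pairwise_divergence (D) is a proportion, e.g. 0.014 for 1.4%.
    """
    if time_my <= 0:
        raise ValueError("time_my must be positive")
    return 100.0 * pairwise_divergence / (2.0 * time_my)


def divergence_time(pairwise_divergence: float, rate_pct: float) -> float:
    """Rearranged form, T = 100 * D / (2r), assuming a strict clock."""
    if rate_pct <= 0:
        raise ValueError("rate_pct must be positive")
    return 100.0 * pairwise_divergence / (2.0 * rate_pct)


def implied_divergence(rate_pct: float, time_my: float) -> float:
    """Pairwise divergence (proportion) implied by rate r over time T."""
    return 2.0 * rate_pct * time_my / 100.0


R1 = 0.7   # % per lineage per my (shallow-water calibration, from the text)
R2 = 0.21  # % per lineage per my (deep-sea vent calibration, from the text)

# A 1.4% per my pairwise divergence rate corresponds to r1.
print(f"r from D = 1.4%/my: {per_lineage_rate(0.014, 1.0):.2f}%")

# Hypothetical example: two taxa differing at 10% of COI sites.
d_example = 0.10
print(f"T under r1: {divergence_time(d_example, R1):.1f} my")
print(f"T under r2: {divergence_time(d_example, R2):.1f} my")

# Divergence implied by r2 over the 28.5 my calibration (derived, not reported).
print(f"D implied by r2 over 28.5 my: {implied_divergence(R2, 28.5):.4f}")
```

Because T scales inversely with r, any age obtained under r2 is 0.7/0.21 (about 3.3) times older than the same age obtained under r1.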

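The chain lengths and sampling intervals reported for the MrBayes and BEAST runs determine how many samples each run produced and how much was discarded as burn-in. The sketch below only does that bookkeeping. The `McmcRun` class is a hypothetical helper, not part of either program, and the count ignores the starting state that some programs also record. No burn-in is reported for the BEAST runs, so none is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McmcRun:
    generations: int         # total chain length
    sample_every: int        # sampling interval in generations
    burnin_samples: int = 0  # samples discarded from the start

    @property
    def samples(self) -> int:
        # Samples taken after generation 0.
        return self.generations // self.sample_every

    @property
    def burnin_generations(self) -> int:
        return self.burnin_samples * self.sample_every

    @property
    def burnin_fraction(self) -> float:
        return self.burnin_samples / self.samples

    @property
    def retained(self) -> int:
        return self.samples - self.burnin_samples


# Single-gene MrBayes runs: 5 x 10^6 generations, sampled every 1,000,
# with the first 100 samples discarded (values from the text).
mrbayes = McmcRun(generations=5_000_000, sample_every=1_000, burnin_samples=100)

# BEAST runs: 100 million generations per chain, sampled every 1,000.
beast = McmcRun(generations=100_000_000, sample_every=1_000)
BEAST_CHAINS = 3  # three independent chains (from the text)

print(f"MrBayes samples: {mrbayes.samples}, retained: {mrbayes.retained}")
print(f"MrBayes burn-in: {mrbayes.burnin_generations} generations "
      f"({mrbayes.burnin_fraction:.0%} of the run)")
print(f"BEAST samples per chain: {beast.samples}, "
      f"across chains: {beast.samples * BEAST_CHAINS}")
```

Under these settings the MrBayes burn-in covers the first 100,000 generations, which is 2% of each run.
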
Pipeline specifications

Software tools: CodonCode Aligner, MUSCLE, MrModelTest, DnaSP, DAMBE, MrBayes, AWTY, FigTree, RAxML, BEAST
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Caenorhabditis elegans