## Protocol publication

[…] The alignment was straightforward and required only a few unambiguous gaps in the Mll intron. Analyses were conducted separately for the Cyt b gene and the combined nuclear genes, and for each dataset, phylogenetic trees were obtained using maximum parsimony (MP), maximum likelihood (ML), and Bayesian methods. Because saturation of base substitution was suspected in the Cyt b data, we partitioned the data by codon positions and plotted transitions against transversions to detect potential saturation [] using **PAUP*** version 4.0b10 []. This analysis suggested that the third codon position of the Cyt b gene suffered from substitution saturation (Additional file ). However, trees reconstructed with and without the third codon position did not differ significantly in terms of resolution and branch support; thus, we only report results obtained using all positions combined. Using PAUP*, we performed MP analysis with 100 random addition analyses and tree bisection-reconnection (TBR) branch-swapping, and we validated the robustness of the MP trees using bootstrap analysis with 1000 replications. Before the ML analysis, the program **Modeltest** 3.0 [] was used to select appropriate models of base substitution and to estimate model parameters. ML analysis of the Cyt b dataset proved to be very computer-intensive using the TBR swapping algorithm and so was done using **PhyML** [], while that of the combined nuclear dataset was performed using PAUP* with ten random addition analyses and TBR branch swapping. ML bootstrap analyses of both datasets were conducted with PhyML with 100 replications. For the Bayesian analysis, we used the program **MrModeltest** 2.2 [] to select an appropriate model of base substitution for each gene partition. Based on the selected models, we performed Bayesian analyses using **MrBayes** 3.1.2 [] with the unlink command (nuclear dataset only) to estimate substitution parameters separately for each partition. 
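
The saturation check described above (transitions plotted against transversions, per codon position) was run in PAUP*. As a rough illustration of the underlying counting only, the following Python sketch tallies transitions and transversions for every pair of aligned sequences. It assumes the alignment starts at codon position 1, and the function names and the toy sequences are hypothetical, not data from the study.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify(a, b):
    """Return 'ti' (transition), 'tv' (transversion), or None.

    None is returned for identical bases and for gaps or ambiguity codes.
    """
    if a == b or a not in BASES or b not in BASES:
        return None
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "ti" if same_class else "tv"


def ti_tv_by_codon_position(seqs):
    """Count transitions and transversions per codon position for each pair.

    Assumes the alignment is in frame (index 0 is codon position 1).
    Returns {position: [(ti, tv), ...]} with one tuple per sequence pair.
    """
    points = {1: [], 2: [], 3: []}
    for s1, s2 in combinations(seqs, 2):
        counts = {pos: [0, 0] for pos in (1, 2, 3)}
        for i, (a, b) in enumerate(zip(s1.upper(), s2.upper())):
            kind = classify(a, b)
            if kind is None:
                continue
            counts[i % 3 + 1][0 if kind == "ti" else 1] += 1
        for pos in (1, 2, 3):
            points[pos].append(tuple(counts[pos]))
    return points


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["ATGGCACTA", "ATGGCGCTG", "ATAGCTTTC"]
points = ti_tv_by_codon_position(toy_alignment)
```

Plotting the `(ti, tv)` pairs for each position gives the kind of scatter used to judge saturation: if transitions stop increasing relative to transversions among the more divergent pairs, saturation at that position is suspected.
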
Analyses consisted of running four simultaneous chains for 2 × 10⁶ generations, sampling trees every 1000 generations for a total of 2001 trees. We plotted the ln-likelihood of the sampled trees against generation time to identify the region of the analysis in which the parameter estimates were stable. Then, we discarded the burn-in region (trees and parameter estimates obtained before equilibrium; the initial 1001 trees), and the remaining 1000 samples were used to estimate the tree topology, branch lengths, and substitution parameters. [...] The number of vertebrae was counted for 1–19 representative individuals for each taxon identified from the Cyt b phylogeny (Additional file ). Vertebral counts were taken from radiographs produced with a Softex Type C-60 (Softex, Tokyo, Japan) exposed at 20 kVp and 4 mA for 20 seconds. Then, the average vertebra number for each taxon was mapped onto the combined nuclear ML phylogeny, and ancestral state estimates of vertebra number were obtained using the squared change parsimony algorithm as implemented in **Mesquite** version 1.12 []. To test for correlated evolution between interstitial habitat use and vertebra number along cladogenesis, we calculated independent contrasts [] of each of the variables using the PDAP module of Mesquite []. Each species was coded as either interstitial (1) or non-interstitial (0) as designated in Table . The use of discrete character states does not violate the assumption of independent contrasts []. [...] The combined nuclear gene ML phylogeny was transformed to an ultrametric tree using the penalized-likelihood relaxed clock approach with the program r8s []. Because a reliable maximum prior for the age of the root node is not available, we first obtained a linearized tree by fixing the age of the root node at one arbitrary time unit (without any internal constraints) with a smoothing parameter of 10 as inferred by cross validation. 
This analysis produces an ultrametric tree with fixed relative node heights but without absolute ages. Divergence times on this tree were then calibrated using (1) the split between Gymnogobius isaza and G. urotaenia, which was previously estimated at 2.3 Ma based on geological calibration [] and (2) the divergence rate of the Cyt b gene. For the latter approach, every node on the ultrametric tree obtained by penalized likelihood was used individually as a calibration point to obtain a conservative range of estimates for the MRCA of the interstitial Luciogobius species. We used the Cyt b divergence rate because it is generally conserved across fish taxa, has been used previously for dating in gobies [], and clusters at approximately 2.7% per My in gobiid taxa []. Using the 2.7% per My divergence rate, the average pairwise ML distance across each node was transformed to absolute age, and this was used individually as a calibration point. We obtained 95% credibility intervals of node estimates by producing 100 bootstrapped datasets using the **SeqBoot** program in the PHYLIP package [], estimating branch lengths of the above ML phylogeny under the substitution model used in the ML calculations, and performing divergence date estimates as described above. In addition to the above two approaches using internal calibration points, we also adjusted the age of the root node so that the overall rate of Cyt b substitution equals the previously inferred rate of 0.0135 substitutions/site/My, which is half the 2.7% divergence rate []. We first calculated branch lengths of the ML topology using the Cyt b data with a substitution model and parameters inferred using Modeltest. This tree was linearized by fixing the age of the root node, and the substitution rate along each branch was calculated, using the program r8s. We iterated the age of the root node until an average substitution rate comparable to 0.0135 substitutions/site/My was found. 
The inferred root node age was then used to estimate the age of the MRCA of interstitial Luciogobius by performing a penalized likelihood analysis on the combined nuclear ML phylogeny. […]
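
The calibration arithmetic described in this protocol can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the r8s workflow itself: the function names, node labels, relative node heights, and the example distance are all hypothetical, and only the 2.7% per My divergence rate, the 0.0135 substitutions/site/My per-lineage rate, and the 2.3 Ma calibration come from the text.

```python
# Rates quoted in the text.
DIVERGENCE_RATE = 0.027                   # pairwise divergence per site per My
SUBSTITUTION_RATE = DIVERGENCE_RATE / 2   # per-lineage rate, 0.0135


def node_age_from_distance(mean_pairwise_distance, divergence_rate=DIVERGENCE_RATE):
    """Convert an average pairwise distance across a node into an age in My.

    Two lineages accumulate changes independently after the split, so the
    pairwise distance grows at the divergence rate (twice the per-lineage rate).
    """
    return mean_pairwise_distance / divergence_rate


def scale_relative_ages(relative_ages, calibration_node, calibration_age):
    """Rescale relative node heights so the calibration node has a known age.

    relative_ages maps node labels to heights on an ultrametric tree whose
    root was fixed at 1.0 arbitrary time unit.
    """
    factor = calibration_age / relative_ages[calibration_node]
    return {node: height * factor for node, height in relative_ages.items()}


# Hypothetical relative node heights (root fixed at 1.0), for illustration only.
relative_ages = {
    "root": 1.0,
    "interstitial_mrca": 0.5,
    "isaza_urotaenia_split": 0.125,
}

# Approach 1: anchor the tree on the 2.3 Ma Gymnogobius split.
ages_geological = scale_relative_ages(relative_ages, "isaza_urotaenia_split", 2.3)

# Approach 2: anchor one node on an age derived from a hypothetical distance.
calibration_age = node_age_from_distance(0.054)
ages_rate_based = scale_relative_ages(relative_ages, "interstitial_mrca", calibration_age)
```

Repeating the second approach with each node in turn as the calibration point yields the range of estimates the text describes for the MRCA of the interstitial species.
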

## Pipeline specifications

| Field | Value |
|---|---|
| Software tools | PAUP*, ModelTest-NG, PhyML, MrModelTest, MrBayes, Mesquite, PHYLIP |
| Application | Phylogenetics |
| Diseases | Hypertension |