Computational protocol: Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes

Protocol publication

[…] RAG1 sequences for 225 species of bony fish (three species of lungfish, one species of coelacanth and 221 species of ray-finned fish) and two species of sharks, which we used as outgroups, were downloaded from GenBank (Additional file ). Taxa were sampled to maximize both the number of taxonomic groups included in the analysis and the number of fossil calibration points that could be assigned to the phylogeny. Sequences were aligned automatically using ClustalW [], and the alignment was then refined by eye using MEGA 4 []. A survey of the fossil fish literature allowed us to identify 45 calibration points, which were used to date 44 clades identified in the tree as well as the root of the tree (Additional file ). We used BEAST v 1.4.6 [] to estimate divergence times under a model of uncorrelated, log-normally distributed rates. We assigned soft upper bounds to the prior distributions of all fossil calibrations using log-normal distributions as described in Table . We specified a Yule prior on the rates of cladogenesis. The data set was assumed to have evolved under a GTR model with invariant sites and gamma-distributed rate heterogeneity. We constrained the monophyly of a number of groups to reflect generally accepted phylogenetic relationships. Five independent analyses of 20,000,000 generations each were run. Output from each run was analyzed using TRACER 1.4 []; 25% of the trees were discarded as burn-in, and the remaining trees were combined using TreeAnnotator 1.4.6 to produce the timescale.

[…] MEDUSA [] is an extension of the flexible rate shift model introduced by Rabosky et al. []. Rabosky's approach combines two likelihoods. The first is called the phylogenetic likelihood and uses the timing of splits along the resolved backbone of a phylogenetic tree to find maximum likelihood estimates for birth and death rates following equations developed by Nee et al. [].
The second is called the taxonomic likelihood and uses information about the total species richness of an unresolved tip clade on a phylogeny, along with the age of the split between the unresolved clade and its sister group, to estimate diversification rates following methods developed by Magallón and Sanderson []. Rabosky et al. [] presented a likelihood ratio test that compares a model in which birth and death rates are allowed to shift on one branch of a phylogeny with unresolved tip clades against a model in which birth and death rates are held constant across the tree. MEDUSA extends this procedure by adding rates in a stepwise fashion. First, the AIC score of a model with a single birth rate and a single death rate is calculated for the unresolved tree using the combined likelihood estimator presented by Rabosky et al. []. This two-parameter model is then compared to the best four-parameter model (two birth rates and two death rates), in which the birth rate and the death rate are allowed to shift on the branch in the unresolved tree that produces the greatest improvement in the likelihood score. If the difference in AIC score between the two- and four-parameter models is substantial (ΔAIC ≥ 4, []), then this rate shift is retained. Next, the four-parameter model is compared to the best six-parameter model by finding the optimal place on the tree for a second rate shift, which adds a third pair of rates. The process is continued until additional rate shifts no longer produce a substantial improvement in AIC score. A full description of MEDUSA is given in Additional file .

To implement MEDUSA with the actinopterygian data, we first assembled taxonomic richness data from FISHBASE [] for major lineages of fishes. Then we pruned the timetree in Fig. , , down to 27 representative lineages. Our goal in pruning the timetree was to preserve as much of its backbone as would still permit us to assign species richness unambiguously to tip lineages.
Thus, for example, we did not retain splitting events within Percomorpha because, although it was possible to assign species richness to some percomorph subclades such as tetraodontiforms, we could not confidently assign the entire species richness of other percomorphs to lineages included in our sampling. We used this pruned chronogram plus the taxonomic richness to estimate birth and death rates for ray-finned fishes and tested for rate shifts across the tree in R [] using the LASER [] and GEIGER [] packages. […]
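The taxonomic likelihood described above can be illustrated with a short numerical sketch. It is not the authors' R code: it assumes the standard birth-death expression for the probability that a surviving clade of stem age t contains n species, P(n | t) = (1 − β) β^(n−1) with β = (e^(rt) − 1) / (e^(rt) − ε), where r is the net diversification rate (birth minus death) and ε is the relative extinction rate (death divided by birth). The function names and the toy clades are hypothetical and are not data from the study.

```python
import math


def clade_log_prob(n_species, stem_age, r, eps):
    """Log-probability that a surviving clade of the given stem age has n_species.

    Assumes a constant-rate birth-death process started from one lineage and
    conditioned on survival to the present.
    r   = net diversification rate (birth - death), must be > 0
    eps = relative extinction rate (death / birth), must lie in [0, 1)
    """
    if n_species < 1 or stem_age <= 0 or r <= 0 or not 0 <= eps < 1:
        raise ValueError("need n_species >= 1, stem_age > 0, r > 0, 0 <= eps < 1")
    growth = math.exp(r * stem_age)
    beta = (growth - 1.0) / (growth - eps)
    # log[(1 - beta) * beta^(n - 1)], written to stay stable for large n
    return math.log1p(-beta) + (n_species - 1) * math.log(beta)


def taxonomic_log_likelihood(clades, r, eps):
    """Sum the per-clade log-probabilities over (richness, stem_age) pairs."""
    return sum(clade_log_prob(n, t, r, eps) for n, t in clades)


# Hypothetical tip clades: (species richness, stem age in Myr). Illustrative only.
toy_clades = [(30, 120.0), (400, 150.0), (5, 90.0)]

# Scan a few candidate rates; a real analysis would optimize r and eps numerically.
for rate in (0.01, 0.03, 0.05):
    print(rate, taxonomic_log_likelihood(toy_clades, rate, eps=0.5))
```

In a combined estimator of the kind described in the text, this term would be added to the phylogenetic log-likelihood of the backbone before maximizing over the rates. That step is omitted here.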
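The stepwise procedure can be sketched as a greedy forward selection on AIC. This is a minimal illustration under stated assumptions, not the LASER or GEIGER implementation: `log_lik_fn` is a hypothetical stand-in for the maximized combined log-likelihood given a set of shift branches, and the toy scoring function below uses invented numbers purely to show the stopping rule (a shift is kept only when ΔAIC ≥ 4).

```python
def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def stepwise_shifts(branches, log_lik_fn, threshold=4.0):
    """Greedily add rate shifts while each one improves AIC by >= threshold.

    branches   : candidate branch labels where a shift may be placed
    log_lik_fn : hypothetical callable mapping a list of shift branches to the
                 maximized log-likelihood of that model
    Each rate regime contributes one birth rate and one death rate, so a model
    with k shifts has 2 * (k + 1) parameters.
    """
    shifts = []
    best_aic = aic(log_lik_fn(shifts), 2)  # single birth rate and death rate
    while True:
        candidates = [b for b in branches if b not in shifts]
        if not candidates:
            break
        # Pick the branch whose shift gives the highest likelihood.
        scored = [(log_lik_fn(shifts + [b]), b) for b in candidates]
        best_ll, best_branch = max(scored, key=lambda pair: pair[0])
        new_aic = aic(best_ll, 2 * (len(shifts) + 2))
        if best_aic - new_aic < threshold:
            break  # improvement is not substantial; stop adding shifts
        shifts.append(best_branch)
        best_aic = new_aic
    return shifts, best_aic


# Toy likelihood: each branch adds a fixed, invented gain in log-likelihood.
toy_gain = {"A": 10.0, "B": 3.0, "C": 0.5}


def toy_log_lik(shift_branches):
    return -100.0 + sum(toy_gain[b] for b in shift_branches)


selected, final_aic = stepwise_shifts(list(toy_gain), toy_log_lik)
print(selected, final_aic)
```

In the toy run, a shift on branch A lowers AIC from 204 to 188 and is retained, while adding branch B would lower it only to 186 (ΔAIC = 2), so the search stops with one shift.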

Pipeline specifications

Software tools: MEGA, BEAST, GEIGER
Databases: FishBase, TimeTree
Application: Phylogenetics
Organisms: Dipturus trachyderma