Computational protocol: A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification

Protocol publication

[…] Transcriptome assemblies were generated for Oncorhynchus mykiss, Salmo salar and Coregonus clupeaformis using Sanger and Roche 454 sequences from NCBI. We created local BLAST [] databases for these species, as well as for Thymallus thymallus, Osmerus mordax and Esox lucius, incorporating all available NCBI sequences. BLASTn identified 98 sequences that were putative one-to-one orthologues in E. lucius and O. mordax, which, in turn, were used in BLASTn searches against NCBI and local databases, revealing 56 putative paralogue pairs common to S. salar and O. mykiss, often represented by T. thymallus and C. clupeaformis. BLASTp searches against NCBI identified putative orthologues from Acanthopterygii and Ostariophysi. Comparative genomics was performed in Ensembl. [...]

Before performing sequencing experiments (see below), we scrutinized expectations of teleost-wide orthology and the salmonid WGD in bioinformatics-derived sequence datasets where at least two salmonid subfamilies were represented. Phylogenetic analyses were performed using maximum likelihood (ML), maximum parsimony (MP) and neighbour joining (NJ) in MEGA v. 5.0 [], and a Bayesian (BY) method in BEAST v. 1.7.4 []. The BY analysis included an uncorrelated lognormal relaxed molecular clock (ULRC) model and a Yule speciation tree prior []. Tracer v. 1.5.0 was used to confirm MCMC sampling convergence in all BEAST analyses described from this point onwards. All sequence alignments described hereafter were performed in MAFFT v. 7 []. A priori criteria for teleost-wide orthology were based on branching patterns from a comprehensive multi-locus phylogenetic study spanning teleost evolution []. Thus, Ostariophysi was expected to split from other sequences at the tree root, estimated under the BY approach []. Using comparative genomics, we also demonstrated that the sequences did not include paralogues retained from the teleost WGD []. The criterion for the salmonid WGD was that salmonid sequences would form a sister group to E. lucius [], splitting into two paralogous clades represented by multiple species. When T. thymallus and/or C. clupeaformis sequences branched in one paralogous clade represented by both species of Salmoninae, we designed primers targeting cDNAs in these subfamilies (see electronic supplementary material, table S4). [...]

Phylogenetic analysis was performed separately on 27 paralogous datasets including T. thymallus and C. lavaretus sequences obtained experimentally. As teleost-wide orthology was strongly supported in preliminary analyses, we limited the data to include salmonids, E. lucius and O. mordax. Criteria for inclusion in combined analyses are given in . A custom R [] script (produced by Dr Charles Paxton, School of Mathematics and Statistics, University of St Andrews) generated every possible concatenation of 18 separate WGD paralogue alignments meeting the stated criteria and sampled them at random. This allowed us to explore the effect of combining WGD paralogue data, where many unique concatenation possibilities exist. Accordingly, 50 randomly sampled concatenations were employed in ML, NJ and MP phylogenetic analyses, exploring the effect of the third codon position on the results (see electronic supplementary material, tables S1 and S6).

Next, 36 true gene orthologues representing the 18 WGD paralogue pairs were combined into a single concatenation using E. lucius and O. mordax as outgroups to both salmonid paralogues. Phylogenetic analysis was performed employing multiple sequence character partitions (amino acids (AA), nucleotides with all codon positions or just positions 1 and 2) using BY (BEAST) and ML (GARLI v. 2.0) [], employing a model identified by PartitionFinder [] as the best-fitting character partition (among different proteins or genes/codon positions). As supporting methods, we also performed NJ and MP analyses on multiple sequence character partitions. [...]
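The random sampling of paralogue concatenations described above can be illustrated with a minimal Python sketch. It is not the authors' R script, which is not reproduced in this excerpt. The sketch assumes that, for each gene, the two WGD paralogues can be placed in either of two concatenated blocks, and that swapping the two block labels gives an equivalent concatenation, so that 18 genes yield 2^17 distinct arrangements. The function names `sample_assignments` and `concatenate` and the toy sequences are hypothetical.

```python
import random


def sample_assignments(n_genes, n_samples, seed=0):
    """Draw distinct block assignments without replacement.

    Each assignment is a tuple of booleans, one per gene, saying whether
    that gene's two paralogues are swapped between the two blocks.
    Gene 0 is never swapped (assumption: this removes the symmetry
    between the two block labels), leaving 2**(n_genes - 1) arrangements.
    """
    total = 2 ** (n_genes - 1)
    if n_samples > total:
        raise ValueError("more samples requested than arrangements exist")
    rng = random.Random(seed)
    codes = rng.sample(range(total), n_samples)
    return [
        (False,) + tuple(bool((code >> g) & 1) for g in range(n_genes - 1))
        for code in codes
    ]


def concatenate(pairs, swaps):
    """Join per-gene paralogue pairs into two concatenated blocks.

    `pairs` holds one (copy_1, copy_2) tuple of aligned sequences per gene
    for a single taxon; `swaps` says which genes have their copies exchanged.
    """
    block_a, block_b = [], []
    for (copy_1, copy_2), swap in zip(pairs, swaps):
        first, second = (copy_2, copy_1) if swap else (copy_1, copy_2)
        block_a.append(first)
        block_b.append(second)
    return "".join(block_a), "".join(block_b)


# 50 random arrangements of 18 paralogue pairs, as in the text.
assignments = sample_assignments(n_genes=18, n_samples=50)

# Toy two-gene example (hypothetical sequences) with the second gene swapped.
toy_blocks = concatenate([("AA", "CC"), ("GG", "TT")], (False, True))
```

In a real analysis each gene would be a multi-taxon alignment, with the same swap applied to every salmonid species and the outgroup sequence placed in both blocks.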
A further time-calibrated BEAST tree was produced using CO1 sequences available for 65 salmonid species []. This was temporally calibrated using four deep-branching divergence times from the 7580 bp mitogenome tree, employing normally distributed priors spanning 95% credibility intervals. The explicit aim was to assign additional species richness to the temporal framework estimated under the more character-rich (and presumably more robust) mitogenome-derived time scale. Several diversification analyses were performed on the CO1 tree using packages available in the R language. LTT plots were generated using phytools [], which was also used to perform a two-tailed constant-rates test based on the γ-statistic []. Temporal diversification patterns were also assessed by fitting and comparing survival models [] in APE []. The BiSSE [] analysis was performed in Diversitree [].

Global sea-level estimates spanning 130 Ma to present, representing 1100 data points, were taken from the literature []. Data means and s.d. were calculated for 1 Myr intervals, the first bin being 0–1 Ma. […]
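The 1 Myr binning of sea-level estimates can be sketched as follows. This is an illustrative Python example rather than the authors' code. It assumes that each record is an (age in Ma, sea level) pair, that bins are half-open intervals starting at 0–1 Ma, and that the s.d. is the sample standard deviation (the source does not say which estimator was used). The function name `bin_sea_level` and the example values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def bin_sea_level(records, width=1.0):
    """Summarise sea-level estimates in fixed-width age bins.

    Returns a dict mapping (bin_start, bin_end) in Ma to (mean, s.d.).
    Bins holding a single value get NaN for the s.d., because a sample
    standard deviation needs at least two observations.
    """
    bins = defaultdict(list)
    for age, level in records:
        bins[int(math.floor(age / width))].append(level)

    summary = {}
    for idx in sorted(bins):
        values = bins[idx]
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[(idx * width, (idx + 1) * width)] = (mean(values), sd)
    return summary


# Hypothetical records: two estimates in the 0-1 Ma bin, one in 1-2 Ma.
example = bin_sea_level([(0.2, 10.0), (0.8, 14.0), (1.5, 20.0)])
```

Bins with no data are simply absent from the result, which keeps gaps in the record visible rather than filling them with interpolated values.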

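For readers unfamiliar with the constant-rates test mentioned above, the γ-statistic can be computed directly from the node ages of an ultrametric tree. The Python sketch below follows the standard Pybus and Harvey formulation and is offered as an illustration only; the analysis in the source was run in phytools. The function names and the example node ages are hypothetical, and the two-tailed p-value assumes γ is standard normal under a constant-rate pure-birth process.

```python
import math


def gamma_statistic(node_ages):
    """Pybus & Harvey gamma from internal node ages (root included).

    `node_ages` are times before present for the n - 1 internal nodes
    of a fully bifurcating ultrametric tree with n tips.
    """
    ages = sorted(node_ages, reverse=True)
    n = len(ages) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma needs at least three tips")

    # g[k] is the length of the interval during which k lineages exist.
    boundaries = ages + [0.0]
    g = {k: boundaries[k - 2] - boundaries[k - 1] for k in range(2, n + 1)}

    total = sum(k * g[k] for k in range(2, n + 1))
    running = 0.0
    partial_sum = 0.0
    for i in range(2, n):  # i = 2 .. n - 1
        running += i * g[i]
        partial_sum += running

    numerator = partial_sum / (n - 2) - total / 2
    denominator = total * math.sqrt(1 / (12 * (n - 2)))
    return numerator / denominator


def two_tailed_p(gamma):
    """Two-tailed p-value under a standard normal null distribution."""
    return math.erfc(abs(gamma) / math.sqrt(2))


# Nodes clustered near the root give a negative gamma (early branching).
early = gamma_statistic([10.0, 9.0, 8.0])
# Nodes clustered near the tips give a positive gamma (late branching).
late = gamma_statistic([10.0, 2.0, 1.0])
```

Negative values indicate that branching events are concentrated towards the root relative to a constant-rate expectation, and positive values that they are concentrated towards the tips.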
Pipeline specifications

Software tools: BLASTN, BLASTP, MEGA-V, BEAST, MAFFT, GARLI, PartitionFinder, Phytools, APE, Diversitree
Applications: Phylogenetics, GWAS
Organisms: Danio rerio