Computational protocol: Driving south: a multi-gene phylogeny of the brown algal family Fucaceae reveals relationships and recent drivers of a marine radiation

Protocol publication

[…] Lyophilized tissue was powdered for 5 min on a Mixer Mill (MM 300 - Retsch, Germany) and total RNA was isolated using the extraction method described in Pearson et al. []. RNA integrity was confirmed by electrophoresis on 1.2% denaturing agarose gels. For reverse transcription, a solution of 1 μg total RNA, 1 mM dNTPs and 5 μM oligo d(T) was denatured at 70°C for 5 min and placed on ice for > 1 min. First Strand Buffer, DTT (0.1 M), RNase OUT and SuperScript™ III (Invitrogen) were added, the mix was incubated at 55°C for 1-2 h, and the reaction was then heat-inactivated at 80°C for 10 min.

A total of 13 coding regions were selected for sequence analysis (Additional file ). Specific primers were designed from Expressed Sequence Tag (EST) consensus sequences in F. vesiculosus or F. serratus [] using Primer3 software version 0.4.0 []. PCR was carried out in 20 μl reaction volumes containing 1-3 μl of cDNA (1/40 dilution) as template, 1.5 mM, 0.2 μM dNTPs, 0.5 μM of each primer and 1 U of Taq polymerase, with the following conditions: initial denaturation at 94°C for 3 min; 35 cycles of denaturation at 94°C for 20 s and annealing at 58°C for 90 s; and a final extension at 65°C for 5 min. Products were sequenced at the Centre of Marine Sciences, University of Algarve (ABI 3130xl). The resulting chromatograms were analyzed using CodonCode Aligner v1.6.3 (CodonCode Corp., Dedham, Massachusetts, USA).

[...] The cDNA primers were too specific to amplify gene products outside the family Fucaceae, in particular from the sister families Xiphophoraceae and Hormosiraceae [,,]. To include these sister families in the multi-gene phylogenetic estimations, we used additional ITS sequence information from a previous study [], but applied more advanced analytical methods.
Those sequences were first re-aligned using MAFFT v6 [], using the E-INS-i option recommended for sequences with multiple conserved domains and long gaps []. K80 plus I plus G was selected as the best-fitting model for the nucleotide data set based on AIC as implemented in MrModeltest []. The ITS dataset was analysed using maximum likelihood and Bayesian approaches as described in the multi-gene phylogenetic analyses section. This rooted phylogenetic resolution of the genera in the Fucaceae based on ITS sequences was used to infer the basal genera of the family. These genera, Ascophyllum and Silvetia, were used as outgroups to root the multi-gene phylogenetic analyses aimed at inferring the order of the previously unresolved speciation events.

[...] The cDNA sequence dataset (Additional file ) was first aligned using MAFFT v6 [], using the G-INS-i option recommended for sequences with global homology []. Models of sequence evolution were selected based on the Akaike Information Criterion (AIC) as implemented in MrModeltest v2.3 [] for each of the 13 partitions defined by each gene: the Hasegawa-Kishino-Yano model (HKY; []) was most appropriate for the 1st, 11th and 12th partitions, HKY plus I for the 5th, 6th, 7th and 10th partitions and HKY plus G for the 13th partition; Kimura 2-parameter (K80; []) for the 8th and 9th partitions and K80 plus I for the 2nd partition; the Symmetrical model (SYM; []) plus G for the 3rd partition; and General Time Reversible (GTR; []) plus I for the 4th partition. The combined data set was analyzed as one partition using the GTR model plus I and G.

Maximum likelihood bootstrap analysis with 999 replicates was performed to infer the phylogenetic relationships for the combined data set using PhyML v3.0.1 []. The substitution parameters were estimated over a neighbor-joining tree. Tree searching operations were set to the best of nearest-neighbour interchange (NNI) and subtree pruning and regrafting (SPR).
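The AIC-based model choice described above for each partition can be illustrated with a minimal sketch. It is not MrModeltest's implementation: the log-likelihood values below are hypothetical placeholders, and the parameter counts are the standard numbers of free substitution-model parameters (branch lengths are left out because they are shared by all candidates).

```python
# Minimal sketch of AIC-based model selection for one partition.
# AIC = 2k - 2 lnL; the model with the lowest AIC is preferred.

def aic(log_likelihood, n_params):
    """Akaike Information Criterion for a fitted model."""
    return 2 * n_params - 2 * log_likelihood

def select_best_model(candidates):
    """candidates maps model name -> (lnL, number of free parameters).
    Returns the best model name and the AIC score of every candidate."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical lnL values for one partition (NOT taken from the study).
# Free substitution parameters: K80 = 1, HKY = 4, GTR = 8; +I or +G adds 1.
example_candidates = {
    "K80": (-2310.4, 1),
    "HKY": (-2301.2, 4),
    "HKY+I": (-2299.9, 5),
    "GTR+I": (-2298.5, 9),
}

best_model, aic_scores = select_best_model(example_candidates)
print(best_model)
for name, score in sorted(aic_scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {score:.1f}")
```

In this toy example the richer GTR+I model has the highest likelihood but is penalized for its extra parameters, which is the trade-off AIC is meant to capture.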
Partitioned Bremer support analysis [] was performed using TreeRot v2 [,] to measure how the different data partitions contributed to the decay index for each node in the context of the combined data analysis.

Bayesian inferences were performed with MrBayes v3.1.2 []. For the partitioned analysis, the substitution model and branch length estimates were allowed to vary independently in each partition. General forms of these models were used because the software manual specifically recommends against fixed priors for α (the gamma shape parameter) and I (the proportion of invariable sites), so that different values of these parameters are explored more efficiently. The number of generations was set to 10^6, sampling every 1000 generations, in two parallel runs with four chains per run []. Majority rule consensus trees were computed after discarding as burn-in the first 25% of the trees, which were sampled before MCMC convergence. Support for clades given by posterior probabilities was thus represented by the majority rule percentage.

[...] Two major problems preclude a well-defined fossil record for the brown algae: a) almost all brown algae are uncalcified; b) they are easily misidentified because of their morphological similarities with some members of the Rhodophyta []. Brown algae are known, however, from Miocene rocks in California and diatomaceous sediments in Central Europe [,]. Some of these can be directly compared to genera of the extant family Sargassaceae, such as Cystoseirites (similar to Cystoseira) or Paleohalidrys (which has modern representatives), which belong to the order Fucales and provide a valuable framework for evolutionary parameter estimation and molecular dating of Fucaceae [].

Likelihood ratio tests significantly rejected a strict (uniform) molecular clock for the alignment.
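As a rough illustration of such a clock test, the sketch below computes the standard likelihood ratio statistic, 2(lnL_unconstrained - lnL_clock), and compares it with a chi-square distribution with n - 2 degrees of freedom for n taxa. The log-likelihoods and the number of taxa are hypothetical placeholders, not values from the study, and the helper names are made up for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    using the closed-form series (no external libraries needed)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df = 2m+1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{m} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total

def clock_lrt(lnl_clock, lnl_unconstrained, n_taxa):
    """Likelihood ratio test of a strict clock against the unconstrained tree."""
    statistic = 2.0 * (lnl_unconstrained - lnl_clock)
    df = n_taxa - 2
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical inputs (assumptions for illustration only).
stat, df, p_value = clock_lrt(lnl_clock=-5234.7, lnl_unconstrained=-5210.3, n_taxa=12)
print(f"LRT = {stat:.1f}, df = {df}, p = {p_value:.2e}")
print("strict clock rejected" if p_value < 0.05 else "strict clock not rejected")
```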
Node age estimates were therefore obtained by Bayesian-calibrated phylogenies using an uncorrelated log-normal relaxed clock, as suggested for protein-coding genes in a broad variety of species []. Gene-specific gamma-distributed rate heterogeneity among sites and partitioning by codon position allowed separate estimation of non-synonymous and synonymous sites []. The HKY model of evolution was defined as proposed by Shapiro et al. [] for coding regions. Tree priors were set to the coalescent, using constant population size and expansion growth, and to Yule speciation models of demographic history. Monophyletic constraints were imposed for the nodes that were used to calibrate the evolutionary rates. Uniform priors were used for the tmrca of the family Fucaceae (Aquitanian to Tortonian ages of the Miocene epoch: minimum age of 7 Myr; maximum age of 23 Myr; based on [] and previous analyses using 5.8S ribosomal nuclear DNA together with ITS-1 and ITS-2 regions; see Additional file ). Tree priors were used for the tmrca of the genus Fucus. MCMC chains were run in BEAST v1.5.4 for 10^7 generations, with burn-in and sampling as described above []. Identical sequences or those with genetic distances less than 0.002 were removed prior to the analyses to prevent nodes with zero-length branches in the dated reconstruction. Convergence and stationarity of the chains were evaluated by plotting trace files in Tracer v1.4 []. Phylogenetic trees were represented using R statistical software v2.13.0 [] together with the "ape v2.5-1" library [].

[...] Methods to estimate the influence of species' traits on lineage diversification have improved with recent advances in the detection of phylogenetic signatures of state-dependent speciation and extinction []. In particular, hypotheses of trait acquisition for a binary character and asymmetry in the direction of trait evolution can now be tested through the formulation of a model [].
For example, mating system is likely to confer unequal probabilities of speciation and extinction. Two states of the character were used for mating system evolution (dioecious vs. hermaphroditic), under one-parameter (MK1) and asymmetrical 2-parameter (MK2) Markov k-state models [-]. The binary state speciation and extinction model (BiSSE, []) was also used to avoid incorrect rejection of irreversible evolution [].

Alternative hypotheses concerning geographic range evolution and diversification (Pacific vs. Atlantic) were also tested using a geographic state speciation and extinction model (GeoSSE; []). We applied the model to test the relative contributions of speciation, extinction, and dispersal to diversity differences between oceans []. We also considered different combinations of state-independent and state-dependent diversification, and dispersal (Table ).

BiSSE and GeoSSE model assumptions were satisfied through the use of the best rooted tree based on the dated ITS and multi-gene phylogenies: i) rooted phylogenetic tree with branch lengths; ii) contemporaneous terminal taxa; and iii) ultrametric tree []. Characters were binary with known state for each of the terminal taxa. Models were fitted by maximum likelihood nonlinear optimization from a heuristic starting point based on the character-independent birth-death model. Model results were evaluated and compared using the logarithm of the likelihood and the AIC values for the final fitted models. Ancestral character states and the associated uncertainty were also estimated from the scaled likelihood of each character state. Analyses were carried out using the R statistical software [], with the "diversitree v0.7-2" and "ape v2.5-1" packages [,-].

The dispersal-extinction-cladogenesis (DEC) likelihood model was also implemented to infer geographic ancestry and estimate rates of dispersal and local extinction [,]. Unconstrained and stratified biogeographical models were considered.
The latter model stratified the phylogeny into different time slices, reflecting the Bering Strait configuration over time while considering divisions that retained enough phylogenetic events []. Five time slices were chosen that reflect the hypothesized openings of the Bering Strait during the history of Fucaceae: between 13 and 11 Ma, between 7.3 and 6.6 Ma, between 5.5 and 4.0 Ma, between 3.6 and 3.2 Ma, and between 2.5 Ma and the present day (see Figure for a detailed time-placement of the recurrent opening events [,]). For each time slice, we defined a Q matrix in which transition rates were made dependent on the geographical connectivity between areas (i.e. opening and closing of the Bering Strait). Lagrange analyses were configured using the web application from the same authors (URL: http://www.reelab.net/lagrange/configurator; [,]) and run locally using Lagrange v.20110117 []. Results were summarized and plotted using the R statistical software [] with the "ape v2.5-1" package []. […]
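The time-stratified connectivity described above can be sketched in a few lines of Python. This is an illustration only and does not use the Lagrange API: the opening windows are taken from the text, while the two-area coding (Pacific, Atlantic), the function names, and the dispersal multipliers of 1.0 (strait open) and 0.0 (strait closed) are assumptions made for the example.

```python
# Hypothesized Bering Strait openings as (older bound, younger bound) in Ma,
# as listed in the text.
OPEN_WINDOWS = [(13.0, 11.0), (7.3, 6.6), (5.5, 4.0), (3.6, 3.2), (2.5, 0.0)]

AREAS = ("Pacific", "Atlantic")  # assumed two-area coding

def strait_open(age_ma):
    """True if the given age (Ma before present) falls in an opening window."""
    return any(younger <= age_ma <= older for older, younger in OPEN_WINDOWS)

def dispersal_matrix(age_ma, open_rate=1.0, closed_rate=0.0):
    """Symmetric 2x2 dispersal multiplier matrix for the slice containing age_ma.
    Rows and columns follow AREAS; the rate values are illustrative assumptions."""
    rate = open_rate if strait_open(age_ma) else closed_rate
    return [[0.0, rate],
            [rate, 0.0]]

# Connectivity at a few example ages.
for age in (12.0, 9.0, 6.0, 1.0):
    state = "open" if strait_open(age) else "closed"
    print(f"{age:>4} Ma: strait {state}, matrix = {dispersal_matrix(age)}")
```

A stratified analysis would supply one such matrix per time slice, so that dispersal between oceans is only possible while the strait is open.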

Pipeline specifications

Software tools: MAFFT, MrModelTest, PhyML, MrBayes, BEAST, APE, Diversitree
Application: Phylogenetics
Diseases: Disorders of Sex Development