Computational protocol: Broadly sampled multigene trees of eukaryotes

Protocol publication

[…] To align SSU-rDNA sequences, we used HMMER v2.1.4 [], whereas protein-coding genes were aligned with Clustal W []. For the SSU-rDNA alignment, we aligned the sequences using HMMER while incorporating secondary structure. These sequences were downloaded from the European Ribosomal Database []. The resulting alignment was further edited manually in MacClade v4.05 []. Protein-coding genes were aligned as amino acids using Clustal W [] as implemented in DNAstar's Lasergene software and manually adjusted in MacClade v4.05 []. For the phylogenetic analysis, we restricted our analysis to unambiguously aligned regions for which we were confident in positional homology as assessed by eye. For a subset of our analyses, we tried two different masks (conservative vs. liberal) of ambiguous positions and found no significant differences in inferences from topologies and support (data not shown).

Genealogies were inferred using MrBayes [], RAxML [] and PHYML []. Bayesian analyses were performed with the parallel version of MrBayes 3.1.2 using the GTR+I+Γ (for nucleotide) and RtREV (for amino acid) models of sequence evolution []. Four to 16 simultaneous MCMCMC chains were run for 4 million generations, sampling every 100 generations. Stationarity in likelihood scores was determined by plotting the -lnL against the generation. All trees below the observed stationarity level were discarded, resulting in a 'burnin' that comprised 25% of the posterior distribution of trees. The 50% majority-rule consensus tree was determined to calculate the posterior probabilities for each node. RAxML was run for 100 iterations using the GTRGAMMA model for nucleotide data and PROTGAMMA with the RtREV matrix for amino acid data. The datasets were partitioned to allow RAxML to assign different parameters for each gene. One hundred replicates for bootstrap analyses were run in RAxML and PHYML, and a 50% majority-rule consensus was calculated to determine the support values for each node.
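The burn-in and consensus steps described above can be illustrated with a minimal Python sketch. It is not the MrBayes implementation: trees are reduced to sets of clades (each clade a frozenset of taxon names), the function names and toy taxa are hypothetical, and the "more than half" threshold is an assumption about how the 50% majority rule is applied.

```python
from collections import Counter

def discard_burnin(samples, burnin_fraction=0.25):
    """Drop the first `burnin_fraction` of the samples (samples are in sampling order)."""
    n_burn = int(len(samples) * burnin_fraction)
    return samples[n_burn:]

def clade_support(trees):
    """Fraction of trees in which each clade (a frozenset of taxa) appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def majority_rule(support, threshold=0.5):
    """Keep clades present in more than `threshold` of the trees (assumed strict cutoff)."""
    return {clade: p for clade, p in support.items() if p > threshold}

# Sampling scheme from the text: 4 million generations, one sample every 100.
n_samples = 4_000_000 // 100          # 40000 samples per run (ignoring generation 0)
n_burnin = int(n_samples * 0.25)      # 10000 samples discarded as burn-in

# Toy data (hypothetical taxa A-D): 8 sampled trees, each a set of clades.
AB, CD = frozenset({"A", "B"}), frozenset({"C", "D"})
AC, BD = frozenset({"A", "C"}), frozenset({"B", "D"})
samples = [{AC, BD}] * 2 + [{AB, CD}] * 4 + [{AB}, {AC, BD}]

post = discard_burnin(samples)        # first 2 of 8 samples removed
support = clade_support(post)         # clade frequencies = posterior probabilities
consensus = majority_rule(support)    # clades retained in the consensus

for clade, p in sorted(consensus.items(), key=lambda item: -item[1]):
    print(sorted(clade), round(p, 3))
```

The same clade-frequency calculation applies to the bootstrap analyses, with the 100 replicate trees taking the place of the post-burn-in sample and no burn-in removed.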
MrModelTest [] and ProtTest 1.3 [] were used to select the appropriate model of sequence evolution for the nucleotide and amino acid data, respectively. […]
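The excerpt does not state which selection criterion was applied. As an illustration only, the sketch below ranks candidate models by the Akaike information criterion (AIC = 2k - 2 lnL), one criterion commonly used for this purpose. The log-likelihood values are hypothetical placeholders, not results from this study, and the parameter counts cover the substitution model only (branch lengths are shared by all candidates and do not change the ranking).

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: name -> (lnL, free substitution-model parameters).
candidates = {
    "JC": (-12500.0, 0),
    "HKY": (-12300.0, 4),
    "GTR+I+G": (-12100.0, 10),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name}: AIC = {scores[name]:.1f}")
print("selected:", best)
```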

Pipeline specifications