Computational protocol: Phylogenomic Analyses Support Traditional Relationships within Cnidaria

Protocol publication

[…] All 454 data were assembled with Newbler (version 2.5.3). Agalma (versions 0.4.0–0.5.0) was used for all other analysis steps through supermatrix construction. A git repository of the analysis code is available at https://bitbucket.org/caseywdunn/cnidaria2014. This source code is sufficient to reconstruct the supermatrix from the data and includes all settings and parameters used for these intermediate steps. Agalma is a workflow that automates all the steps in a phylogenomic analysis and keeps track of data provenance and the parameters used, allowing full reproducibility of the results. It takes Illumina sequence reads and, after filtering and quality control, generates fully annotated assemblies. Externally assembled transcriptomes can also be imported into Agalma for downstream analysis. Across species, Agalma identifies homologous sequences, determines gene orthology based on gene tree topology, and generates a supermatrix of concatenated orthologous genes.

We sampled 1,262 genes to generate a supermatrix with 50% occupancy. This matrix has a length of 365,159 aa. Three taxa, Calibelemnon francei, Craspedacusta sowerbii, and Obelia longissima, had less than 5% occupancy and were excluded from further analyses. The primary matrix (matrix 1) used for all phylogenetic analyses therefore has 38 taxa and 54% gene occupancy. From this matrix, we constructed a reduced matrix (matrix 2) by also removing two poorly sampled taxa, the ceriantharian (16.6% gene sampling) and Haliclystus sanjuanensis (6.5% gene sampling), because they were unstable in the primary analyses. This produced a reduced matrix with 57% gene occupancy.

We inferred phylogenetic relationships using both Maximum Likelihood (ML) and Bayesian Inference (BI) approaches. For ML, we used ExaML v 1.0.12 with the WAG+Γ model of amino acid substitution on the unpartitioned matrices 1 and 2.
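
The occupancy figures quoted for the supermatrices can be illustrated with a small sketch. It assumes that occupancy means the fraction of taxon-by-gene cells that contain a sequence, and that a taxon's own occupancy is the fraction of genes it is sampled for; the protocol does not spell out how Agalma computes these values. The function names and the toy data are hypothetical and do not come from Agalma or from the study.

```python
from typing import Dict, Set


def matrix_occupancy(presence: Dict[str, Set[str]], n_genes: int) -> float:
    """Fraction of taxon-by-gene cells that contain a sequence (assumed definition)."""
    if not presence or n_genes <= 0:
        return 0.0
    filled = sum(len(genes) for genes in presence.values())
    return filled / (len(presence) * n_genes)


def drop_sparse_taxa(
    presence: Dict[str, Set[str]], n_genes: int, min_occupancy: float = 0.05
) -> Dict[str, Set[str]]:
    """Keep only taxa whose own gene occupancy reaches the threshold."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return {
        taxon: genes
        for taxon, genes in presence.items()
        if len(genes) / n_genes >= min_occupancy
    }


# Hypothetical toy data: 40 genes and three made-up taxa.
all_genes = [f"gene{i}" for i in range(40)]
toy = {
    "taxon_A": set(all_genes),      # sampled for every gene
    "taxon_B": set(all_genes[:20]),  # sampled for half of the genes
    "taxon_C": set(all_genes[:1]),   # 1 of 40 genes, i.e. 2.5% occupancy
}

filtered = drop_sparse_taxa(toy, n_genes=len(all_genes))
print("taxa kept:", sorted(filtered))
print("occupancy before:", matrix_occupancy(toy, len(all_genes)))
print("occupancy after:", matrix_occupancy(filtered, len(all_genes)))
```

Removing a nearly empty taxon raises the overall occupancy of the remaining matrix, which is the same direction of change as the reported step from 50% to 54% once the three sparse taxa were excluded.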
We also ran a partitioned ML analysis on matrix 1 according to the results of PartitionFinder v 1.1.1. For PartitionFinder, we used genes as initial partitions, linked branch lengths across partitions, used the Bayesian Information Criterion (BIC) to select among all models of amino acid substitution, and used the relaxed hierarchical clustering algorithm to search for a good partitioning scheme. Bootstrap values were estimated for the unpartitioned and partitioned analyses with 200 replicates.

BI was conducted with PhyloBayes MPI v. 1.4e using the CAT model of evolution, with the global exchange rates either fixed to uniform values (CAT-Poisson) or inferred from the data (CAT-GTR). For these analyses, constant sites were removed from the alignment to improve MCMC mixing. Two independent MCMC chains were run on matrix 1, adjusting the number of cycles until convergence was achieved. Analyses with the CAT-GTR setting did not converge despite long CPU time, so we do not include results from these analyses here. Convergence was assessed with time-series plots of the likelihood scores and required maximum bipartition discrepancies across chains of less than 0.1. Post-burn-in (50%) sampled trees were combined and summarized with a majority rule consensus tree. […]
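
The bipartition-based convergence criterion and the majority-rule summary described above can be sketched as follows. This is a minimal illustration, not the PhyloBayes implementation: it assumes each sampled tree has already been reduced to its set of bipartitions (each written here as a frozenset of the taxon names on one side of a branch), and all function names and toy chains are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Bipartition = FrozenSet[str]
Tree = FrozenSet[Bipartition]  # a sampled tree, summarized as its bipartitions


def post_burnin(samples: List[Tree], burnin_fraction: float = 0.5) -> List[Tree]:
    """Discard the first part of a chain as burn-in (50% by default)."""
    return samples[int(len(samples) * burnin_fraction):]


def bipartition_frequencies(trees: List[Tree]) -> Dict[Bipartition, float]:
    """Fraction of sampled trees that contain each bipartition."""
    if not trees:
        raise ValueError("no trees left after burn-in")
    counts: Counter = Counter()
    for tree in trees:
        counts.update(tree)  # each bipartition counted once per tree
    return {bip: n / len(trees) for bip, n in counts.items()}


def max_bipartition_discrepancy(chain_a: List[Tree], chain_b: List[Tree]) -> float:
    """Largest absolute difference in bipartition frequency between two chains."""
    freq_a = bipartition_frequencies(post_burnin(chain_a))
    freq_b = bipartition_frequencies(post_burnin(chain_b))
    return max(
        (abs(freq_a.get(b, 0.0) - freq_b.get(b, 0.0)) for b in set(freq_a) | set(freq_b)),
        default=0.0,
    )


def majority_rule_bipartitions(chain_a: List[Tree], chain_b: List[Tree]) -> Set[Bipartition]:
    """Bipartitions found in more than half of the combined post-burn-in trees."""
    combined = post_burnin(chain_a) + post_burnin(chain_b)
    return {b for b, f in bipartition_frequencies(combined).items() if f > 0.5}


# Hypothetical toy chains over four made-up taxa.
ab, cd, ac = frozenset({"A", "B"}), frozenset({"C", "D"}), frozenset({"A", "C"})
tree_1, tree_2 = frozenset({ab, cd}), frozenset({ac})
chain_1 = [tree_2, tree_2, tree_1, tree_1]
chain_2 = [tree_2, tree_2, tree_1, tree_2]

maxdiff = max_bipartition_discrepancy(chain_1, chain_2)
print("max discrepancy:", maxdiff, "converged:", maxdiff < 0.1)
```

In this toy case the two chains disagree strongly after burn-in, so the 0.1 threshold is not met. With real runs, the same comparison would be repeated as the chains grow, in line with the protocol's statement that the number of cycles was adjusted until convergence was achieved.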

Pipeline specifications

Software tools: Newbler, Agalma, ExaML, PartitionFinder, PhyloBayes
Application: Phylogenetics