Computational protocol: Phylogeny and mitochondrial gene order variation in Lophotrochozoa in the light of new mitogenomic data from Nemertea

Protocol publication

[…] Sequences were assembled using BioEdit []. tRNA genes were detected and annotated with ARWEN [] and tRNAscan-SE []. Protein-coding and rRNA genes were first identified by BLAST search; gene boundaries were then determined by comparison with alignments of several lophotrochozoan taxa. Nucleotide composition was computed using BioEdit, and GC- and AT-skew were determined using the formulation of Perna and Kocher []. [...]

For phylogenetic analysis, a concatenated dataset of mitochondrial amino acid alignments from 12 genes was built. The gene atp8 was excluded from the analysis because it is missing from many genomes (nematodes, platyhelminthes, chaetognaths) and because it is the smallest and least conserved of the protein-coding genes. Sequence data from 104 species, most of them with complete mt genome entries, were retrieved from GenBank; for accession numbers see Additional file . Alignments were done with ClustalW [] as implemented in BioEdit []. For the large dataset, non-conserved sites were excluded from likelihood analyses using the Gblocks software [], with the following parameter settings: minimum number of sequences for a conserved position: 55; minimum number of sequences for a flanking position: 55; maximum number of contiguous nonconserved positions: 8; minimum length of a block: 10; allowed gap positions: with half. In this case 2294 amino acid sites (= 49%) were recovered from the original dataset of 4654 amino acids. For maximum likelihood analysis, we used RAxML 7.0.4 [,] as offered on the CIPRES web portal. We chose mtRev+G+I because mtRev was the only model derived from mitochondrial data available on this platform. We performed a search for the best tree and 100 bootstrap replicates.

For more sophisticated analyses we chose a smaller dataset focussed on Lophotrochozoa (26 species) and used four species of Ecdysozoa and Deuterostomia as the outgroup to Lophotrochozoa.
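
The GC- and AT-skew determination mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited form of the Perna and Kocher formulation, AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C), computed on one strand. The function name `strand_skews` and the toy sequence are hypothetical and not taken from the protocol, which used BioEdit for the base counts.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Return AT- and GC-skew for one strand of a nucleotide sequence.

    Assumed formulation: AT-skew = (A - T) / (A + T),
                         GC-skew = (G - C) / (G + C).
    Characters other than A, C, G, T (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    # Guard against division by zero for sequences lacking a base pair class.
    at_skew = (a - t) / (a + t) if (a + t) else float("nan")
    gc_skew = (g - c) / (g + c) if (g + c) else float("nan")
    return {"AT": at_skew, "GC": gc_skew}


# Hypothetical toy sequence: A=2, T=3, G=1, C=3
toy = "AATTTGCCC"
result = strand_skews(toy)
print(result)
```

For the toy sequence this gives an AT-skew of -0.2 and a GC-skew of -0.5; negative values indicate an excess of T over A and of C over G on the strand analysed.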
Due to the better conservation among the alignments, we used the complete alignments of twelve protein-coding genes and built a concatenated alignment with a final length of 3820 amino acids. We used this smaller dataset to test different models in maximum likelihood analysis (mtRev, mtZoa), to run a Bayesian analysis, and to perform hypothesis testing of alternative topologies. With the smaller dataset, partitioned model optimization was performed by partitioning the dataset according to the 12 genes. Besides RAxML with mtRev+G+I (100 bootstrap runs), we used Treefinder v. Oct 2008 [] to perform maximum likelihood analyses with mtRev+G+I and the self-implemented mtZoa+G+I model (each with LR-ELW, 1000 replications). The mtZoa model is optimized for amino acid alignments from lophotrochozoan taxa []. In all likelihood analyses, the same model was used for each partition, but its parameters were optimized separately (unlinked) for each partition.

In addition, a Bayesian analysis was performed with MrBayes 3.1.2 []. Two runs of four parallel chains each were run for 1,000,000 generations, sampling one tree every thousand generations. According to the log-likelihood plots, 200 trees were discarded as burn-in. Model settings were mtRev+G+I (unpartitioned due to time limitations).

Hypothesis testing was done by computing best trees and per-site likelihoods with RAxML (mtRev+G+I) for a set of constrained trees. The per-site likelihoods were used to perform the AU test [] with CONSEL 0.1j []. […]
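
The gene-wise partitioning of the concatenated alignment described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `concatenate` and `raxml_partition_lines`, the toy gene names and sequences, and the choice to pad a taxon lacking a gene with gap characters are all hypothetical. The partition lines follow the `MODEL, name = start-end` layout of RAxML partition files, with `MTREV` assumed as the model label.

```python
def concatenate(alignments):
    """Concatenate per-gene alignments and record partition boundaries.

    alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Returns (concatenated {taxon: sequence}, [(gene, start, end), ...]),
    where start and end are 1-based and inclusive.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing this gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    concatenated = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return concatenated, partitions


def raxml_partition_lines(partitions, model="MTREV"):
    """Format boundaries as partition-file lines, one per gene."""
    return [f"{model}, {gene} = {start}-{end}" for gene, start, end in partitions]


# Hypothetical toy data: two genes, two taxa
toy = {
    "cox1": {"taxonA": "MKV", "taxonB": "MRV"},
    "nad1": {"taxonA": "LL-F", "taxonB": "LLMF"},
}
concatenated, partitions = concatenate(toy)
for line in raxml_partition_lines(partitions):
    print(line)
```

Keeping the boundaries alongside the concatenated matrix lets the same model be assigned to every gene while its parameters are estimated separately per partition, which is the arrangement the text describes.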

Pipeline specifications

Software tools: BioEdit, ARWEN, tRNAscan-SE, Clustal W, Gblocks, RAxML, MrBayes, CONSEL
Applications: Genome annotation, Phylogenetics
Organisms: Lineus viridis, Terebratulina retusa
Diseases: Dracunculiasis