Computational protocol: MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

Protocol publication

[…] Trees are estimated in MetaPIGA-2.0 under the Maximum Likelihood (ML) criterion using any of five nucleotide substitution models: Jukes-Cantor (JC), Kimura's two-parameter (K2P), Hasegawa-Kishino-Yano 1985 (HKY85), Tamura-Nei 1993 (TN93), and General Time Reversible (GTR). Analyses can account for rate heterogeneity among sites using a proportion of invariant sites (Pinv) and/or a discrete Gamma distribution of rates (γ-distr). All parameters of the model (transition/transversion ratio or components of the rate matrix, the shape parameter of the γ-distr, and Pinv) can be set by the user or estimated from a Neighbor Joining (NJ) tree. The same parameters, plus branch lengths and among-partition relative rates, can undergo intra-step optimization periodically during the search and/or at the end of the search.
Datasets can be partitioned into character sets ("charsets") either with a graphical tool (see below) or by writing the corresponding commands in a batch file. In MetaPIGA-2.0, all partitions are assumed to evolve on the same topology (as in the vast majority of phylogeny inference programs, the analysis is therefore assumed to be performed on a non-recombining piece of DNA, and phenomena such as hybridization and incomplete lineage sorting are ignored), but all other parameters (base frequencies, substitution matrix rates, shape parameter of the γ-distr, and Pinv) are optimized separately for each partition. Among-partition rate variation parameters enter the likelihood equation as a factor that modifies the branch lengths for the corresponding partition. Branch lengths are optimized as usual, but the relative rates of partitions are optimized separately, with the constraint that the average of the among-partition rates, weighted by partition size, is 1. [...]
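The size-weighted constraint on among-partition rates described above can be illustrated with a minimal Python sketch. The function names (normalize_partition_rates, scaled_branch_length) and the example numbers are assumptions made for illustration; they are not taken from the MetaPIGA-2.0 source code, and the program's actual optimization procedure may differ.

```python
def normalize_partition_rates(raw_rates, partition_sizes):
    """Rescale relative rates so that their size-weighted mean equals 1.

    raw_rates       -- unnormalized relative rate of each partition
    partition_sizes -- number of characters (sites) in each partition
    """
    if len(raw_rates) != len(partition_sizes):
        raise ValueError("one rate is required per partition")
    total_sites = sum(partition_sizes)
    weighted_mean = sum(
        rate * size for rate, size in zip(raw_rates, partition_sizes)
    ) / total_sites
    return [rate / weighted_mean for rate in raw_rates]


def scaled_branch_length(branch_length, relative_rate):
    """Branch length as seen by one partition: shared length times its rate."""
    return branch_length * relative_rate


# Hypothetical example: two partitions of 300 and 100 sites.
sizes = [300, 100]
rates = normalize_partition_rates([0.5, 2.0], sizes)
weighted_average = sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)
```

After normalization the faster partition keeps its fourfold rate advantage, while the weighted average of the rates is 1, so the shared branch lengths remain interpretable as an average over all sites.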
MetaPIGA-2.0 is written in Java 1.6, so that a single code base runs on 32- and 64-bit platforms under MacOS X, Linux, and Windows. Computing and storing the likelihood of large trees requires a large amount of random-access memory (RAM). Whereas 32-bit systems can allocate a maximum of ~2 GB of memory to the Java Virtual Machine (JVM), 64-bit systems are in practice limited only by the amount of memory installed on the computer (the theoretical limit is about 18 billion gigabytes). MetaPIGA-2.0 uses Java multithreading to take advantage of multiprocessor and multicore computers, so that some tasks can be run in parallel. Because replicates are independent, they are particularly well suited to parallelization: any number of different cores/processors can be assigned to different replicates. In addition, the metaGA heuristic itself is well suited to parallel implementation because processes such as mutation, selection, and likelihood computation are unrelated to CP and are therefore independent across populations. Hence, different metaGA populations can be distributed to different cores/processors. Parallelization of metaGA populations can be combined with parallelization of replicates (e.g., 16 cores allow 4 metaGA replicates with 4 populations per replicate to run simultaneously).
MetaPIGA-2.0 uses standard formats: it reads and writes datasets in Nexus format and trees in Newick format. All search settings can be saved in a metaPIGA block incorporated into the Nexus file, allowing easy management and command-line runs.
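The core-allocation example above (16 cores, 4 populations per replicate, hence 4 simultaneous replicates) can be sketched as follows. This is an illustration in Python, not the Java implementation used by MetaPIGA-2.0; simultaneous_replicates, run_replicate, and run_replicates are hypothetical names, and run_replicate is a placeholder that returns a made-up score instead of performing a tree search.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simultaneous_replicates(total_cores, populations_per_replicate):
    """Number of replicates that fit when each population gets one core."""
    if populations_per_replicate < 1:
        raise ValueError("at least one population per replicate is required")
    if total_cores < populations_per_replicate:
        raise ValueError("not enough cores for a single replicate")
    return total_cores // populations_per_replicate


def run_replicate(seed):
    """Placeholder for one independent search replicate (hypothetical).

    Returns the seed and a fake log-likelihood; no tree search is performed.
    """
    rng = random.Random(seed)
    return seed, -1000.0 - rng.random()


def run_replicates(n_replicates, total_cores, populations_per_replicate):
    """Run independent replicates, as many at a time as the cores allow."""
    workers = simultaneous_replicates(total_cores, populations_per_replicate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the seeds.
        return list(pool.map(run_replicate, range(n_replicates)))


results = run_replicates(n_replicates=8, total_cores=16,
                         populations_per_replicate=4)
```

Because replicates share no state, they can be scheduled in any order. Python threads only illustrate the scheduling here; CPU-bound work in Python would normally use processes rather than threads.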
A Nexus file without a metaPIGA block is interpreted correctly by MetaPIGA-2.0 and runs with default parameters.
MetaPIGA-2.0 can be run from the command line, but it also offers an extensive graphical user interface (GUI) giving access to all search settings: defining and managing charsets; including/excluding taxa, characters, and charsets; managing dataset partitions; choosing and parametrizing heuristics; defining substitution models and their parameters; choosing starting-tree options; controlling operators; and defining stop criteria and replicates. All settings are associated with an interactive "mouse-over" help system. MetaPIGA-2.0 also implements three statistical methods for selecting the substitution model that best fits the data: the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion. The MetaPIGA-2.0 GUI provides a detailed run window showing graphs specific to the chosen heuristic (e.g., for a metaGA search with replicates: the progression of the current best likelihood of each population, as well as the current topology, branch support values, and average branch lengths of the consensus tree).
Batch files are particularly useful for running a single dataset sequentially under multiple settings and/or several datasets with the same settings. MetaPIGA-2.0 supports batch files, which can be written manually or generated using tools available in the GUI: datasets and their settings can be duplicated, settings can be copy-pasted from one dataset to another, and multiple combinations of datasets and settings can be saved in a batch file that can be run either in the GUI (with various graphical information on search progress) or from the command line.
Input and result trees are manipulated in Newick format but visualized graphically in the GUI and can be exported for other programs.
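Two of the model-selection criteria mentioned above, the Akaike and Bayesian Information Criteria, can be computed from the maximized log-likelihood of each model. The sketch below uses the standard formulas AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. Several things are assumptions for illustration: the log-likelihood values are invented, base frequencies are counted as free parameters for HKY85, TN93, and GTR, branch lengths are left out because they are shared by all models on the same topology, and the number of sites is used as the sample size n. None of this is taken from the MetaPIGA-2.0 code.

```python
import math

# Free substitution-model parameters (assumption: branch lengths excluded,
# base frequencies counted as 3 free parameters where they are estimated).
FREE_PARAMETERS = {"JC": 0, "K2P": 1, "HKY85": 4, "TN93": 5, "GTR": 8}


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(log_likelihoods, n_sites, criterion="AIC"):
    """Return (best model name, all scores) under the chosen criterion."""
    if criterion not in ("AIC", "BIC"):
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    scores = {}
    for name, lnl in log_likelihoods.items():
        k = FREE_PARAMETERS[name]
        scores[name] = aic(lnl, k) if criterion == "AIC" else bic(lnl, k, n_sites)
    return min(scores, key=scores.get), scores


# Invented log-likelihoods for a hypothetical 1000-site alignment.
example_lnl = {"JC": -5200.0, "K2P": -5150.0, "HKY85": -5100.0,
               "TN93": -5099.5, "GTR": -5098.0}
best_aic, aic_scores = best_model(example_lnl, n_sites=1000, criterion="AIC")
best_bic, bic_scores = best_model(example_lnl, n_sites=1000, criterion="BIC")
```

Both criteria penalize additional parameters, so a richer model is preferred only when its gain in log-likelihood outweighs the penalty; BIC penalizes more strongly as the alignment grows.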
MetaPIGA-2.0 also integrates a Tree Viewer that allows viewing, rerooting, and printing trees, as well as computing the likelihood of any tree (under any substitution model) and optimizing its model parameters. Three other tools are implemented in MetaPIGA-2.0: a tree generator (using the starting-tree settings), a consensus builder (using user trees and/or trees saved in the Tree Viewer), and a memory-setting tool that defines the maximum amount of memory allocated to the program. […]
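As a quick check of the memory figures quoted earlier, the theoretical 64-bit limit follows from the size of the address space. The short Python calculation below assumes decimal gigabytes (10^9 bytes); the function name address_space_bytes is hypothetical. It concerns only the addressable range, not the amount of memory a given JVM will actually grant.

```python
def address_space_bytes(pointer_bits):
    """Number of distinct byte addresses for a given pointer width."""
    return 2 ** pointer_bits


BYTES_PER_GB = 10 ** 9  # decimal gigabyte (assumption)

gb_64bit = address_space_bytes(64) / BYTES_PER_GB
gb_32bit = address_space_bytes(32) / BYTES_PER_GB

# Express the 64-bit limit in billions of gigabytes.
billions_of_gb_64bit = gb_64bit / 10 ** 9
```

The 64-bit value comes to roughly 18.4 billion gigabytes, consistent with the "about 18 billion gigabytes" stated above. The full 32-bit address space is about 4.3 GB, of which the text reports that only about 2 GB can be allocated to the JVM.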

Pipeline specifications

Software tools: MetaPiga, PHYLIP, newicktree, ATV
Application: Phylogenetics