Reconstructs gene trees by learning gene- and species-specific substitution rates across multiple complete genomes. SPIDIR achieves significantly higher accuracy, addresses species-level heterotachy and enables studies of gene evolution in the context of species evolution. It uses a generative model of gene-tree evolution to calculate the likelihood of a phylogeny. The tool can be viewed as specifying a prior probability on gene-tree branch lengths, which could replace the uniform branch length prior.
Selects blocks following a reproducible set of conditions. Gblocks is a program that eliminates poorly aligned positions and divergent regions of a DNA or protein alignment so that it becomes more suitable for phylogenetic analysis. It provides a web server that implements important features to make its use as simple as possible without losing the functionality that is necessary in most cases.
Enables the in silico inference of bacterial taxonomy through the analysis of peptidomes. The P4P methodology uses whole genome data and in silico protein digestion to infer bacterial taxonomy, namely at the species and subspecies levels. The principal purpose is to generate a valid and manageable list of peptides that are potentially specific to each strain. This tool can also support accurate phylogenetic reconstruction for conventionally challenging groups of organisms.
Allows identification of rogue taxa. RogueNaRok is an algorithm that implements the maximum agreement subtree, the triple frequency, and node distance methods as well as a tool for pruning taxa from a set of input trees. This algorithm assists users in studying various parameters for the identification of rogue taxa. It can be used to detect rogue taxa that affect support values on the best-known tree. The results of all user searches are summarized in a single table.
Computes split scores. SplitSup is a program that reads an alignment in PHYLIP format and returns either (i) a set of scores corresponding to a list of user-provided splits; or (ii) the values of a split score in a sliding window across the length of the alignment. Users can specify the window size, the number of nucleotides to move the window for the next computation, and the minimum number of sites without gaps required to compute scores in each window.
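The three sliding-window parameters described above can be illustrated with a minimal Python sketch. It only enumerates the windows that would be scored; it does not compute any split score, and the function name, parameter names and the toy alignment are hypothetical, not part of SplitSup.

```python
def sliding_windows(alignment, window, step, min_ungapped):
    """Yield (start, end) for each window with enough gap-free columns.

    alignment    -- list of equal-length aligned sequences (hypothetical input)
    window       -- window size in alignment columns
    step         -- number of columns to move the window each time
    min_ungapped -- minimum number of gap-free columns required
    """
    length = len(alignment[0])
    for start in range(0, length - window + 1, step):
        end = start + window
        # A column is gap-free when no sequence has '-' at that position.
        ungapped = sum(
            all(seq[i] != "-" for seq in alignment)
            for i in range(start, end)
        )
        if ungapped >= min_ungapped:
            yield start, end


alignment = ["ACGT-ACGTA", "ACGTTACG-A", "AC-TTACGTA"]
windows = list(sliding_windows(alignment, window=4, step=2, min_ungapped=3))
print(windows)  # [(0, 4), (4, 8), (6, 10)]
```

The window starting at column 2 is skipped because it covers two gapped columns and so has only two gap-free sites.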
Estimates the dates of the internal nodes of a phylogenetic tree. node.dating is divergence-time analysis software that uses a maximum-likelihood method. It can be extended to incorporate a variable molecular clock. The molecular clock assumption implies that mutations are strictly additive over time, which does not hold in practice. It may be possible to incorporate this ‘negative’ evolution into the model.
Allows users to handle rich phylogenetic data objects. Bio::Phylo is composed of over 50 modules that provide functionality for manipulating and transforming objects, such as annotation, sampling and simulation of tree topologies, and visualization. It is designed to accept a wide range of input and output formats and also includes extension packages that can be downloaded separately.
Provides a git-based data store for archiving and curating phylogenetic estimates of species relationships. By incorporating curation into the data storage, Phylesystem has lowered the activation cost of entering data into an archive while also allowing continued curation, whether by the original authors or by researchers interested in re-using these data, to improve the associated metadata. Phylesystem is part of the Open Tree of Life project and complements existing systems by storing phylogenetic statements and associated metadata in a consistent format while retaining the history of edits made to the data themselves.
Computes the triplet and quartet distances between general rooted or unrooted trees, respectively. The tqDist program is based on algorithms with running time O(n log n) for the triplet distance calculation and O(d · n log n) for the quartet distance calculation, where n is the number of leaves in the trees and d is the degree of the tree with minimum degree. The software package is carefully implemented in C++, and interfaces for scripting in Python (via the module pyTQDist) and in R (via the package rtqdist) are also provided.
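To make the triplet distance concrete, the following Python sketch counts the leaf triples whose rooted topology differs between two trees. It is a naive cubic-time enumeration written for illustration only; it is not the O(n log n) algorithm used by tqDist and does not use the pyTQDist interface. Trees are assumed to be nested tuples with string leaf names, which is a representation chosen here for brevity.

```python
from itertools import combinations


def clusters(tree):
    """Return (leaf set of tree, leaf sets of all internal nodes)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def triplet_topology(cluster_list, triple):
    """Return the pair of leaves grouped together, or None if unresolved."""
    for cluster in cluster_list:
        inside = cluster & triple
        if len(inside) == 2:
            return inside
    return None


def triplet_distance(tree_a, tree_b):
    """Count leaf triples whose induced topology differs between the trees."""
    leaves_a, clusters_a = clusters(tree_a)
    leaves_b, clusters_b = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    differing = 0
    for combo in combinations(sorted(leaves_a), 3):
        triple = frozenset(combo)
        if triplet_topology(clusters_a, triple) != triplet_topology(clusters_b, triple):
            differing += 1
    return differing


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
t3 = ((("A", "B"), "C"), "D")
t4 = ((("A", "C"), "B"), "D")
print(triplet_distance(t1, t2), triplet_distance(t3, t4))
```

Because the clusters of a rooted tree are nested or disjoint, at most one pair of any triple can be separated from the third leaf, so the first matching cluster determines the topology.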
Computes the gene tree probability. STELLS2 is usually more accurate than several existing methods when there is one allele per species. It outperforms these methods significantly when there are multiple alleles per species. The tool stores sub-optimal species trees (evaluated during optimization) and their likelihood values, which can be useful. It is able to evaluate multiple candidate species trees.
Allows automated and high-throughput detection of nonhomologous regions. PREQUAL is a program for pre-alignment quality filtering. It uses a probabilistic model to test evidence of homology between amino acid residues in pairs of unaligned sequences, and residues showing no statistical evidence of homology are filtered. Given a parameterized pair hidden Markov model (pairHMM), it calculates the posterior probability (PP) of a character being related to a character from another sequence.
Assists users in spatial studies of ecology, evolution and genetics. SDMtoolbox is an ArcGIS toolbox designed to automate complicated spatial analysis. It simplifies many geographic information system (GIS) analyses required for species distribution modelling and other spatial ecological analyses, reducing the need for repetitive and time-consuming climate data pre-processing and post-modelling analyses of species distribution models (SDMs).
Provides a phylogenetic reconstruction method specifically designed for reconstructing gene trees in the case of a known species tree. SPIMAP uses a Bayesian framework to model sequence evolution, gene duplication, loss, and substitution rate variation, thus incorporating many disparate types of information in a principled way. This method models rate variation that is correlated across all branches of the tree (gene-specific rate) as well as rates specific to each species lineage (species-specific rates). When both these effects are modeled, the result is a more informative prior which leads to increased reconstruction accuracy.
Infers ancestral gene orders and evolutionary scenarios. OrthoAlign follows an alignment approach and deduces the nonoverlapping events still noticeable in the alignment. The events considered in these gene order alignment problems can be duplications, losses, rearrangements or substitutions. It can also deduce the size distributions of the events on small and large phylogenies.
Assists in carrying out and teaching meta-analysis in Ecology and Evolutionary biology (E&E). OpenMEE was developed to make advanced methods for statistical research synthesis, based on best practices, available without cost to the scientific community by providing an intuitive graphical user interface (GUI) to the diverse and growing statistical functionalities of the R ecosystem. Its interface also guides users to build appropriate synthesis models that provide high-quality analyses for the most common and important ecological questions.
Determines the topological dissimilarity for rooted phylogenetic networks. CDRPN is a web application which provides users with four possible metrics that can be run individually or simultaneously: (i) semi-equivalence metric; (ii) tripartition metric; (iii) equivalence metric and (iv) vector metric. This application can also be run as standalone software and accepts only files in extended Newick format.
Investigates the history of recombination events that affected a given sample of bacterial genomes. ClonalOrigin performs a comparative analysis of the sequences of a sample of bacterial genomes and reconstructs the recombination events that have taken place in their ancestry. It contains an algorithm that performs inference under its recombination model from sequence alignments, and parallelization makes this inference feasible for whole-genome alignments.
Models each host as a distinct population, and transmissions between hosts as migration events. This phylogenetic software can (i) infer transmission events, (ii) account for the uncertainty associated with the possible presence of non-sampled hosts and (iii) use data from multiple samples of the same host. SCOTTI not only accounts for diversity and evolution within a host, but also for other sources of bias, namely non-sampled hosts and multiple infections of the same host. SCOTTI addresses the urgent need for software to analyse genomic and epidemiological data while accommodating incomplete or patchy host sampling, mixed infections, and within-host variation. For these reasons, this method can help to reconstruct transmission histories in a broad range of outbreaks, both bacterial and viral.
Measures regional levels of DNA gain and loss between two species via several custom scripts. The Regional variation in DNA gain and loss toolkit includes scripts that return the reference and query start and end coordinates for each gap or fill as output. The toolkit can query a genome for gaps, fills and ancestral elements, return enrichment P-values, or use stretched genomes to obtain a binned genome.
Identifies annotated protein-coding gene features, generates a maximum likelihood phylogenetic tree and reports various mitochondrial genes and sequence information in a table format. The phylogenetic trees recovered using both Bayesian and ML methods support the results of studies using fragments of mtDNA and nuclear markers and of other smaller-scale studies using whole mitogenomes. In comparison to the fragment-based phylogenies, nodal support values are generally higher despite reduced taxon sampling, suggesting there is value in using more complete mitogenomic data.
Detects signatures of selection within populations, strains, or species. Saguaro identifies regions under lineage-specific constraint for the first set, and genomic segments that we attribute to incomplete lineage sorting in the second dataset. The method detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. The software is applicable to a wide variety of experimental populations.
Deals with duplications, losses and rearrangements for the alignment of a set of gene orders related through a phylogenetic tree. multiOrthoAlign is based on a heuristic generalization of OrthoAlign, a previously developed pairwise alignment algorithm. This software can be extended and applied to other rearrangement operations such as substitutions, insertions, and tandem or inverted duplications.
Factorizes branch lengths into time and rates. MAP-DP is a Bayesian framework easily adapted to take divergence time information into account, by restricting a speciation to a specific interval or using a prior distribution on the interval. This Markov chain Monte Carlo (MCMC) method is a part of the PRIME Project. This model reduces inference time for large phylogenies by orders of magnitude and could also be used for tree rooting.
Provides a web interface for a semantic-based repository of phylogenetic data. CDAO-Store is an open source platform which offers three modules: (i) a data importer module for importing phylogenies and their related data into the repository, (ii) a repository module for storage and querying, and (iii) an exporter module to allow users to interact with the repository and provide a way to visualize the stored data.
Generates customized datasets of genetic sequences. Tree Pruner provides an editor where users can pick evolutionary properties of interest. Users can choose to accentuate targeted tips or sub-trees to refine the final curated data. Researchers can also save an edited dataset for further editing or subsequent analysis. It can be used for creating datasets that highlight evolutionary representation or shared genotype.
Reduces the size of phylogenetic data so that users can analyze larger datasets. Treemmer is an iterative algorithm that reduces the redundancy and size of phylogenetic trees while avoiding loss of diversity and without requiring additional information. The application can be run automatically or manually. It was tested on two whole genome datasets of Mycobacterium tuberculosis and influenza A virus.
Performs phylogenetic sampling. physamp takes a phylogenetic tree into account to sample a sequence alignment. This tool contains two programs: (1) bppalnoptim, which samples a sequence alignment by removing sequences to maximize the number of sites suitable for a given analysis; and (2) bppphysamp, which samples a sequence alignment by removing redundant sequences.
Deduces transmission clusters from a given phylogenetic tree inferred from viral sequences. TreeCluster reduces the number of clusters of leaves of a bifurcating phylogenetic tree through several modes: average clade mode, length, length clade, maximum, maximum clade, median clade, root distance and single linkage clade. It can be employed on ultra-large datasets.
Calculates ancestral gene orders by taking into account the phylogenetic tree and the gene orders assigned to the leaves of the tree. DupLoCut aims to find the most parsimonious assignment of gene orders under the duplication-loss evolutionary model. This software handles inverted duplications and is appropriate for dealing with larger instances or pairs of rather distant genomes.
Trims redundant operational taxonomic units (OTUs) to limit the complexity and the redundancy of phylogenetic datasets. Treetrimmer is a tree-based dereplication method that selects information according to support values, branch lengths and taxonomic information related to each sequence. It allows users to collect reduced, reproducible OTU datasets with user-defined parameters that can also be used in further analysis.
Allows the analysis of large numbers of unaligned long DNA sequences through the application of disk-based partitioned suffix trees (based on MGB’s DiGeST). STS is designed as an easy-to-use tool to index, search, and analyze very large DNA sequence datasets. Accordingly, the program is accessed through a Java Web Start link on a web page, which automatically installs or updates the program files for the user. Results are presented in tabular form, can be sorted based on multiple criteria, and are easily integrated into subsequent queries with a mouse click, providing for a natural analysis workflow. In addition, since the initial construction of suffix trees is computationally expensive, STS allows the user to load previously constructed suffix trees for analysis. Thus, once the suffix tree forest has been constructed for a given data set, future analyses can be run much more quickly by skipping the most time-consuming step.
Characterizes loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. STAG-CNS integrates data from the promoters of conserved orthologous genes in three or more species simultaneously. It can also be used to identify differences in the loss or retention of conserved noncoding sequences between duplicated genes. It can be employed to identify even smaller regions of conserved sequence within gene promoters with acceptable false discovery proportions.
Facilitates research in ecological niche modelling. ENMeval provides resources for MAXENT users. First, it includes six methods to partition data for training and testing, including three designed to achieve spatially independent splits. Secondly, it executes a series of models across a user-defined range of settings. Finally, it provides six evaluation metrics to characterize model performance.
Estimates the break rates across a genome. BRAG is a program that provides a detailed survey of break rates across the genome by computing the likelihood landscape of the break rate at every site in the genome. The software, which employs an interval graph-based approach, uses pairwise alignments between the genome of interest, termed the reference, and a set of related genomes. It is sensitive to the quality of the genome assemblies utilized.
Reduces the computing time of canonical neighbour-joining. RapidNJ is a search heuristic that provides a search strategy for the optimisation criterion used to select the next pair to merge. The heuristic applies the same optimisation criterion as the original neighbour-joining method but improves on the running time by eliminating parts of the search space that cannot contain the optimal node pair.
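The idea of pruning the search while keeping the same optimisation criterion can be sketched in Python as follows. The first function is the exhaustive selection step of canonical neighbour-joining, which minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i. The second function is a simplified illustration of bound-based pruning: it scans each row in order of increasing distance and stops once a lower bound on Q can no longer beat the best value found. The bound and all names are assumptions made for this sketch and are not taken from the RapidNJ source code.

```python
def canonical_pair(d):
    """Exhaustive neighbour-joining selection: minimise Q over all pairs."""
    n = len(d)
    r = [sum(row) for row in d]
    best, best_pair = float("inf"), None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if q < best:
                best, best_pair = q, (i, j)
    return best_pair, best


def pruned_pair(d):
    """Same criterion, but skip row entries that cannot improve on the best Q."""
    n = len(d)
    r = [sum(row) for row in d]
    r_max = max(r)
    best, best_pair = float("inf"), None
    for i in range(n):
        # Visit the other nodes in order of increasing distance from i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in order:
            # Since r[j] <= r_max, this is a lower bound on Q for this entry
            # and for every later (more distant) entry in the row.
            if (n - 2) * d[i][j] - r[i] - r_max >= best:
                break
            q = (n - 2) * d[i][j] - r[i] - r[j]
            if q < best:
                best, best_pair = q, (min(i, j), max(i, j))
    return best_pair, best


# Small symmetric distance matrix used only as an illustration.
d = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(canonical_pair(d), pruned_pair(d))
```

Both functions return the same minimum because the pruned search only discards entries whose lower bound already equals or exceeds the best value seen so far.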
Allows genome-wide state estimation based on multivariate features from different species, using functional genomic signals. Phylo-HMGP is a continuous-trait probabilistic model that incorporates the evolutionary affinity among multiple species into the hidden Markov model (HMM). It thus exploits both temporal dependencies across species in the context of evolution, and spatial dependencies along the genome in a continuous-trait model. The software can be applied to different types of functional genomic signals.
Assists users in the assembly and the analysis of multi-gene datasets. SequenceMatrix is a concatenation tool that was designed for sequence data and for the export of all matrices as if they contained DNA sequences. This program generates taxon and character sets according to users’ specifications. Users can also exclude individual sequences from the export.
Provides users a solution for measuring and analyzing compositional change for occurrence data using zeta diversity. Zetadiv is a program that permits several types of analysis on zeta diversity: (1) the analysis of zeta diversity; (2) the analysis of the distance decay of zeta diversity; or (3) the analysis of the hierarchical scaling of zeta diversity. It also includes features for computing zeta-diversity for a specific number of assemblages and for a range of numbers of assemblages.
Produces phylogenomic datasets using highly multiplexed amplicon sequencing. HiMAP employs amplicon sequencing based on highly multiplexed polymerase chain reaction (PCR) to generate its datasets. It provides features for locus selection, primer design, target amplification, sequencing, and post-sequencing data processing and analysis. The method requires minimal hands-on time at the bench, and data can be processed rapidly into consensus calls, which avoids read mapping or assembly.
Provides the workflow used to obtain whole-genome sequence data of 340 sequence type (ST) 772 Staphylococcus aureus isolates (the Bengal Bay clone). bengal-bay allows users to reproduce core analyses, including parameter settings, cluster resource configurations and versioned software distributions. The workflow implements Anaconda virtual environments, including software distributed in the Bioconda channel and is executable through Snakemake.
Generates spatially-explicit extinction date surfaces of population persistence from georeferenced sighting data of variable quality. spatExtinct measures the last likely year of presence on a cell-by-cell basis across landscapes. It can identify potential zones of persistence. This software also estimates zones of plausible persistence and is designed to address data-limited situations.
Provides a set of scripts performing several computations related to the Quartet Index. Quartet Index is defined as the sum, over all 4-tuples of different leaves of the tree, of a value that quantifies the symmetry of the joint evolution of the species they represent. It can be computed in linear time and its expected value and variance can be explicitly computed on any probabilistic model of phylogenetic trees satisfying two natural conditions: independence under relabeling and sampling consistency.
Reduces lineage evolutionary rate heterogeneity gene-by-gene in multi-gene datasets. LSX is effective in reducing long branch attraction (LBA) artefacts in simulated nucleotide data and in two biological multi-gene datasets. It can be useful to reduce phylogenetic biases. This tool employs a different criterion which considers both too fast and too slow evolving sequences for removal.
Enables detection of all orthologs that share a given evolutionary context. CLfinder-OrthNet identifies co-linearity (CL) in the arrangement of orthologous loci among multiple genomes and builds networks of orthologs to encode and visualize all evolutionary events, such as gene duplication, deletion, and transposition, in each ortholog group. It detects multiple forms of gene-level structural variation, including tandem duplications, deletions, transpositions, and combinations of them.
Offers a suite of tools for creating data folds via blocking for evaluation of species distribution models. blockCV includes three different blocking strategies: spatial blocking, spatial buffering and environmental blocking. It also provides tools that deal with typical nuances in species data, choose block size and allocate blocks to folds.
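As a rough illustration of spatial blocking, the Python sketch below assigns points to square grid blocks and then allocates whole blocks to folds in a fixed round-robin order, so that points in the same block always share a fold. The square grid, the planar coordinates, the round-robin allocation and all names are assumptions made for this example; they do not reproduce the blockCV R interface.

```python
def spatial_folds(points, block_size, k):
    """Return a fold index (0..k-1) for each (x, y) point.

    Points are grouped into square blocks of side block_size, and each
    block is assigned to a fold as a whole.
    """
    def block_of(point):
        x, y = point
        return (int(x // block_size), int(y // block_size))

    # Sort the occupied blocks so that the allocation is reproducible.
    blocks = sorted({block_of(p) for p in points})
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block_of(p)] for p in points]


points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 1.9), (2.2, 2.8)]
folds = spatial_folds(points, block_size=1.0, k=2)
print(folds)
```

Keeping each block inside a single fold is what separates training and testing data in space; a random split of individual points would not give that separation.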
Identifies multiple subsets that meet the minimum nearest neighbor distance (NND) constraint. spThin is a spatial method that takes a set of occurrence records. It provides a spatial thinning method, which can be used to process occurrence records for use in constructing and evaluating ecological niche modeling (ENMs), as well as in other spatial analyses. It also enables research into the optimal level of thinning for various species in varying environments.
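A minimum nearest neighbor distance constraint can be illustrated with the following Python sketch, which repeatedly removes the record that has the most neighbours closer than the thinning distance, breaking ties at random, until every remaining pair satisfies the constraint. Running it with different seeds can yield different valid subsets. The planar Euclidean distance, the removal rule and the function name are assumptions made for this example rather than a description of the spThin implementation.

```python
import math
import random


def thin(points, min_dist, seed=0):
    """Return a subset of points whose pairwise distances are all >= min_dist."""
    rng = random.Random(seed)
    kept = list(points)
    while True:
        # For every record, count the neighbours that violate the constraint.
        counts = [
            sum(
                1
                for j, q in enumerate(kept)
                if j != i and math.dist(p, q) < min_dist
            )
            for i, p in enumerate(kept)
        ]
        worst = max(counts, default=0)
        if worst == 0:
            return kept
        # Remove one of the most crowded records, chosen at random among ties.
        candidates = [i for i, c in enumerate(counts) if c == worst]
        kept.pop(rng.choice(candidates))


records = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.2, 5)]
thinned = thin(records, min_dist=1.0, seed=1)
print(thinned)
```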
Provides an effective implementation of a top-down construction for suffix trees. Wotd is built on a space-efficient representation of suffix trees that requires 12 bytes per input character in the most extreme cases, and an average of 8.5 bytes per input character for a set of files of different types. This implementation technique avoids a constant alphabet factor in the running time.
Provides functions for species distribution modeling. dismo predicts entire geographic distributions from occurrences at a number of sites and the environment at these sites. It also provides a number of functions that can assist in using Boosted Regression Trees.
Allows users to import trees with bootstraps and branch lengths and then root the tree, collapse the tree, and measure the length and other attributes. TreeCollapseCL is a Java program that can (1) root or re-root the tree to the outgroup specified by the user, (2) collapse all nodes with bootstrap values below the threshold provided by the user to polytomies, and (3) calculate the length from each leaf to the node above the root node as well as the average bootstrap value for each leaf.
Computes a number of summary statistics for a set of trees. TreeStat integrates several statistical approaches: tree-balance statistics, tree shape, population genetic and other methods such as tree length or root-to-tip lengths.
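The collapsing step (2) can be sketched in Python as follows. The sketch assumes a tree stored as nested dictionaries with hypothetical "name", "support" and "children" keys, and it ignores branch lengths, which a real implementation would need to add to the promoted children. It illustrates the idea only and is not the TreeCollapseCL code, which is written in Java.

```python
def collapse(node, threshold):
    """Return a copy of node with weakly supported internal nodes removed."""
    new_children = []
    for child in node.get("children", []):
        child = collapse(child, threshold)
        is_internal = bool(child.get("children"))
        if is_internal and child.get("support", 0) < threshold:
            # Promote the grandchildren to this node, creating a polytomy.
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    result = {key: value for key, value in node.items() if key != "children"}
    if new_children:
        result["children"] = new_children
    return result


tree = {
    "children": [
        {"support": 40, "children": [{"name": "A"}, {"name": "B"}]},
        {"support": 95, "children": [{"name": "C"}, {"name": "D"}]},
    ]
}
collapsed = collapse(tree, threshold=70)
print([child.get("name", "internal") for child in collapsed["children"]])
```

With a threshold of 70, the node with support 40 is removed and its leaves A and B attach directly to the root, while the node with support 95 is kept.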