
PhymmBL

A classification system for metagenomics experiments that assigns taxonomic labels to short DNA reads. PhymmBL combines two sources of evidence: (i) composition-based taxonomic predictions from Phymm and (ii) homology results from the Basic Local Alignment Search Tool (BLAST). From these it labels each input sequence with its best guess as to the taxonomy of the source organism. According to its authors, input sequences as short as 100 base pairs are classified more accurately with PhymmBL than with any other method available at the time. PhymmBL predicts species, genus, family, order, class and phylum for each read, so users can arrange results at the levels of specificity relevant to their research goals.
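As a rough illustration of how the two kinds of evidence can be merged, the Python sketch below adds a composition score (for example, a Phymm-style log-likelihood) to a term derived from the best BLAST E-value for each candidate taxon, then keeps the highest total. The function name `combine_scores` and the `weight` parameter are hypothetical; this is not PhymmBL's published scoring code.

```python
import math


def combine_scores(imm_scores, blast_evalues, weight=1.0):
    """Return (taxon, score) maximizing composition score plus weighted homology evidence.

    imm_scores: taxon -> composition log-likelihood (higher is better)
    blast_evalues: taxon -> best BLAST E-value for the read (lower is better)
    weight: hypothetical, illustrative tuning constant (an assumption)
    """
    best_taxon, best_score = None, -math.inf
    # Only taxa scored by the composition model are considered.
    for taxon, imm in imm_scores.items():
        evalue = blast_evalues.get(taxon)
        # No hit adds nothing; the floor keeps log10 defined for E-values of exactly 0.
        homology = 0.0 if evalue is None else -math.log10(max(evalue, 1e-300))
        score = imm + weight * homology
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score
```

Smaller E-values raise the homology term, so a strong BLAST hit can outweigh a slightly better composition score, while taxa without a hit fall back on composition alone.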

MetaFast / METAgenome FAST analysis toolkit

Represents a shotgun metagenome from an arbitrary environment as a modified de Bruijn graph made up of simplified components. MetaFast sits between k-mer spectrum analysis and assembly and combines the strengths of these two alignment-free approaches: the speed of the former and the precision of the latter. Because it does not depend on a reference, it works efficiently for both extensively studied and novel microbiota types. For multiple metagenomes, the resulting representation is used to compute a pairwise similarity matrix. The dimensional structure of the metagenomic components preserved by the algorithm reflects the inherent subspecies-level diversity of the microbiota. MetaFast is computationally efficient and especially promising for the analysis of metagenomes from novel environmental niches.
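The idea of turning reads into graph components and then comparing samples can be sketched in Python. This is a simplified, hypothetical illustration rather than MetaFast's algorithm: it joins the (k-1)-mer prefix and suffix of every k-mer from all samples with a union-find structure, ignores reverse complements and graph simplification, counts each sample's k-mers per connected component, and compares samples with Bray-Curtis dissimilarity (0 for identical profiles, 1 for disjoint ones). The choice of Bray-Curtis and all function names are assumptions.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set structure over (k-1)-mer graph nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def component_profiles(samples, k=4):
    """Count each sample's k-mers per connected component of a shared de Bruijn graph."""
    uf = UnionFind()
    # Each k-mer is an edge joining its (k-1)-mer prefix and suffix nodes.
    for reads in samples.values():
        for read in reads:
            for km in kmers(read, k):
                uf.union(km[:-1], km[1:])
    profiles = {}
    for name, reads in samples.items():
        counts = Counter()
        for read in reads:
            for km in kmers(read, k):
                counts[uf.find(km[:-1])] += 1
        profiles[name] = counts
    return profiles


def bray_curtis(a, b):
    """0.0 for identical profiles, 1.0 for profiles with no shared components."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(c, b[key]) for key, c in a.items())
    return 1.0 - 2.0 * shared / total


def dissimilarity_matrix(samples, k=4):
    profiles = component_profiles(samples, k)
    names = sorted(profiles)
    return {(p, q): bray_curtis(profiles[p], profiles[q]) for p in names for q in names}
```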

MaxBin

A software tool for binning assembled metagenomic sequences based on an Expectation-Maximization (EM) algorithm. Given assembled metagenomic sequences together with either read coverage information or the sequencing reads themselves, MaxBin recovers the underlying bins (genomes) of the microbes in a metagenome. For convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content and genome size, in the binning summary page. After binning is finished, MEGAN or similar software can be run on the MaxBin bins to determine the taxonomy of each bin.
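To show what an EM-based binning step looks like, the Python sketch below fits a mixture of Poisson distributions to contig coverage values: the E-step computes each bin's responsibility for each contig, and the M-step re-estimates the bin weights and mean coverages. It is a deliberate simplification under stated assumptions: MaxBin's model also uses tetranucleotide frequencies, and `em_poisson_bins` with its evenly spaced initialization is hypothetical.

```python
import math


def poisson_logpmf(x, lam):
    """Log-probability of observing coverage x under a Poisson with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_poisson_bins(coverages, n_bins, n_iter=100):
    """Fit a Poisson mixture to contig coverages with EM; return (bin labels, bin means)."""
    lo, hi = min(coverages), max(coverages)
    # Hypothetical initialization: spread the means evenly over the coverage range.
    means = [lo + (hi - lo) * (j + 0.5) / n_bins + 1e-6 for j in range(n_bins)]
    weights = [1.0 / n_bins] * n_bins
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each bin for each contig (log-sum-exp for stability).
        resp = []
        for x in coverages:
            logs = [math.log(w) + poisson_logpmf(x, m) for w, m in zip(weights, means)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights and mean coverages from the responsibilities.
        for j in range(n_bins):
            rj = sum(r[j] for r in resp)
            weights[j] = max(rj / len(coverages), 1e-12)
            means[j] = max(sum(r[j] * x for r, x in zip(resp, coverages)) / max(rj, 1e-12), 1e-6)
    labels = [max(range(n_bins), key=lambda j: r[j]) for r in resp]
    return labels, means
```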

PPS+ / PhyloPythiaS+

A taxonomic assignment program that, according to its authors, achieves a precision of 80% or more even for low-ranking taxa in metagenome samples. PPS+ is a fully automated successor of the PhyloPythiaS software. It automatically determines the most relevant taxa to model and suitable training sequences directly from the input sample, and uses them to build a sample-specific structured-output SVM taxonomic classifier for the taxonomic binning of that sample. This makes it usable by researchers who lack the experience in the field, or the time, to search for suitable training sequences and manually construct a taxonomic classifier that matches a particular metagenome sample. PPS+ is best suited to large NGS metagenome samples whose assembled contigs (>1 kb) carry marker genes, or to datasets that include high-quality, longer PacBio consensus reads.
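The core of a composition-based SVM classifier can be sketched in plain Python: sequences become normalized k-mer frequency vectors, and one linear SVM per taxon is trained with the Pegasos stochastic subgradient method. This is a swapped-in, flat one-vs-rest classifier without a bias term, not the structured-output SVM that PPS+ builds; `k=2` is chosen only to keep the example small, and all function names are hypothetical.

```python
import random
from itertools import product


def kmer_features(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            vec[j] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(xs, ys, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias) trained with Pegasos; ys are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin test uses the current weights
            w = [(1 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * ys[i] * xi for wi, xi in zip(w, xs[i])]
    return w


def train_one_vs_rest(seqs, labels, k=2):
    xs = [kmer_features(s, k) for s in seqs]
    return {c: train_pegasos(xs, [1 if lab == c else -1 for lab in labels]) for c in sorted(set(labels))}


def predict(models, seq, k=2):
    x = kmer_features(seq, k)
    return max(models, key=lambda c: dot(models[c], x))
```

Each taxon's model is trained independently here; a structured-output SVM instead exploits the relationships between taxa in the hierarchy, which this sketch does not attempt.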

FROGS / Find, Rapidly, OTUs with Galaxy Solution

Analyzes large sets of amplicon sequences and produces abundance tables of Operational Taxonomic Units (OTUs) with their taxonomic affiliations. FROGS is a set of 13 tools, designed for biologists and bioinformaticians, that processes amplicon reads from Illumina or Roche sequencing technologies. According to its authors, the software produces accurate community compositions, even at fine scales (species or genus) and in large communities (>100 different species) with very heterogeneous abundances.

MetaProb

An assembly-assisted approach for reference-free metagenomic binning. MetaProb handles both short and long reads in a probabilistic framework based on probabilistic sequence signatures. In its authors' comparison with other state-of-the-art binning algorithms over several short- and long-read datasets, MetaProb achieved the best F-measure in most cases. MetaProb can also estimate the number of species in a metagenomic sample, which adds a degree of freedom to the analysis.

COCACOLA

A general framework that automatically bins contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA was demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. Its reported performance advantage rests on two aspects. First, it uses the L1 distance instead of the Euclidean distance during initialization, for better taxonomic identification. More importantly, it combines the advantages of hard and soft clustering through sparsity regularization. In addition, the COCACOLA framework can incorporate customized knowledge to improve binning accuracy.
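The role of the L1 distance can be illustrated with a small k-medians loop in Python: each contig profile (for example, composition features plus coverages across samples) is assigned to the nearest center under L1 distance, and each center then moves to the coordinate-wise median of its members, the point that minimizes total L1 distance. This only sketches L1-based assignment under these assumptions; COCACOLA's sparsity-regularized factorization is not shown, and the function names and feature layout are hypothetical.

```python
def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def k_medians(points, centers, n_iter=20):
    """Assign profiles to the nearest center under L1, then move centers to coordinate-wise medians."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centers)), key=lambda j: l1(p, centers[j])) for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave empty clusters' centers unchanged
                centers[j] = [median(col) for col in zip(*members)]
    return labels, centers
```

Compared with squared Euclidean distance, L1 grows only linearly with each coordinate's deviation, so a single outlying feature pulls less strongly on the assignment.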

PPS / PhyloPythiaS

A web server for the taxonomic assignment of metagenome sequences. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments.

MetaBAT

Integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. According to its authors, MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins from a very large assembly consisting of millions of contigs in a matter of hours on a single node.
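The tetranucleotide frequency (TNF) feature can be computed as in the Python sketch below. Each 4-mer is merged with its reverse complement, which leaves 136 canonical tetranucleotides, so a contig and its reverse complement receive identical profiles. Only the feature is shown; MetaBAT converts TNF and abundance distances into empirical probabilities, whereas the plain Euclidean distance here is a stand-in assumption, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    return s.translate(COMPLEMENT)[::-1]


# Canonical form = lexicographically smaller of a 4-mer and its reverse complement.
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})
INDEX = {k: i for i, k in enumerate(CANONICAL)}


def tnf(seq):
    """Tetranucleotide frequency vector of a contig over the canonical 4-mers."""
    seq = seq.upper()
    counts = [0] * len(CANONICAL)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows with ambiguous bases
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```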

MetaCluster-TA

Bins and annotates short paired-end reads. MetaCluster-TA is an assembly-assisted approach that, instead of annotating each read or assembled contig separately, bins similar reads/contigs into the same cluster and annotates the whole cluster. The software consists of three phases: (i) construction of long virtual contigs from assembly and probabilistic grouping of short reads, (ii) q-mer distribution estimation and clustering, and (iii) cluster annotation and merging.

PHYSCIMM / PHY Sequence Clustering with Interpolated Markov Models

Clusters sequences by modeling each cluster with an interpolated Markov model (IMM). PHYSCIMM first partitions the sequences using supervised Phymm classifications and then refines the partition in an unsupervised, iterative IMM clustering stage. It was tested by clustering sequencing reads from an in vitro simulated microbial community. SCIMM, which performs the IMM clustering without the Phymm-based initialization, and PHYSCIMM are intended to help researchers determine the relationships between sequencing reads from a metagenomics project, and may also be useful in other bioinformatics applications.
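The iterative clustering stage can be approximated in Python with fixed-order Markov chains: train one model per cluster, reassign every read to the model under which it is most likely, and repeat until the assignments stop changing. This is a simplification under stated assumptions: a true IMM interpolates between several orders, the initial labels (which PHYSCIMM would take from Phymm classifications) are supplied by the caller, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudo=1.0):
    """Fixed-order Markov chain with add-pseudocount smoothing (a simplification of an IMM)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def log_prob(context, base):
        row = counts.get(context, {})
        return math.log((row.get(base, 0.0) + pseudo) / (sum(row.values()) + 4 * pseudo))

    return log_prob


def log_likelihood(model, seq, order=2):
    return sum(model(seq[i - order:i], seq[i]) for i in range(order, len(seq)))


def iterative_cluster(reads, init_labels, order=2, max_iter=10):
    """Retrain one model per cluster and reassign reads until the labels stop changing."""
    labels = list(init_labels)
    for _ in range(max_iter):
        groups = defaultdict(list)
        for read, lab in zip(reads, labels):
            groups[lab].append(read)
        models = {lab: train_markov(members, order) for lab, members in groups.items()}
        new_labels = [max(models, key=lambda lab: log_likelihood(models[lab], read, order)) for read in reads]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

In the test below, the second read starts in the wrong cluster and is moved after the first reassignment, which is the kind of correction the unsupervised stage is meant to make.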

MrGBP / Multi-resolution Genomic Binary Patterns

Extracts local ‘texture’ changes from nucleotide sequence data using techniques from image processing. MrGBP captures local changes in numerical representations of genetic sequence data. To do so, it employs the multi-resolution local binary patterns (MLBP) method, which offers a viable feature space alternative to textual representations of sequence data. The captured genomic signature changes can then be passed through dimensionality reduction steps to visualise the data in a lower dimension.
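A one-dimensional local binary pattern can be sketched in Python: at each position, compare the 2r neighbours with the centre value, set one bit for every neighbour that is greater than or equal to it, and histogram the resulting codes at several radii r. The nucleotide-to-number mapping (`A=0, C=1, G=2, T=3`), the radii and the function names are hypothetical choices, not MrGBP's published encoding, and the dimensionality reduction step is left out.

```python
# Hypothetical numeric mapping; MrGBP's actual encoding may differ.
NUMERIC = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_codes(values, radius):
    """1-D local binary pattern: compare the 2*radius neighbours of each point with the point itself."""
    codes = []
    for i in range(radius, len(values) - radius):
        neighbours = values[i - radius:i] + values[i + 1:i + radius + 1]
        code = 0
        for bit, v in enumerate(neighbours):
            if v >= values[i]:
                code |= 1 << bit
        codes.append(code)
    return codes


def multi_resolution_histogram(seq, radii=(1, 2)):
    """Concatenate normalized LBP-code histograms computed at several radii."""
    values = [NUMERIC[b] for b in seq.upper() if b in NUMERIC]
    feature = []
    for r in radii:
        hist = [0.0] * (1 << (2 * r))  # 2r neighbours -> 2^(2r) possible codes
        codes = lbp_codes(values, r)
        for c in codes:
            hist[c] += 1
        n = len(codes) or 1
        feature.extend(h / n for h in hist)
    return feature
```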

VizBin

A Java-based application that offers efficient and intuitive reference-independent visualization of metagenomic datasets from single samples for subsequent human-in-the-loop inspection and binning. The method is based on nonlinear dimension reduction of genomic signatures and exploits the superior pattern recognition capabilities of the human eye-brain system for cluster identification and delineation. The general applicability of VizBin for the analysis of metagenomic sequence data was demonstrated with results from two cellulolytic microbial communities and one human-borne microbial consortium. Its authors also report superior performance compared with other analogous metagenomic visualization and binning methods.

kmer project

Rapidly estimates pairwise dissimilarity between metagenomes. Although the technique was applied to gut microbiota, it should be useful for arbitrary metagenomes, including those with novel microbiota. A dissimilarity measure based on the k-mer spectrum provides a broader perspective than measures based on alignment against reference sequence sets. It helps avoid missing notable features of metagenomic composition, particularly those related to the presence of unknown bacteria, viruses or eukaryotes, as well as technical artifacts (sample contamination, reads of non-biological origin, etc.), at the early stages of bioinformatic analysis. The method complements reference-based approaches and can easily be integrated into metagenomic analysis pipelines.
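A minimal k-mer spectrum dissimilarity can be sketched in Python: count the k-mers of each metagenome, optionally drop rare k-mers that are likely sequencing errors, and compare the spectra pairwise with Bray-Curtis dissimilarity. The choice of Bray-Curtis, the `min_count` filter and all names are assumptions for illustration; the exact measure used by the original method is not given here.

```python
from collections import Counter


def kmer_spectrum(reads, k=5, min_count=1):
    """Count every k-mer in a metagenome's reads, dropping k-mers seen fewer than min_count times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter({km: c for km, c in counts.items() if c >= min_count})


def bray_curtis(a, b):
    """0.0 for identical spectra, 1.0 for spectra with no k-mers in common."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(c, b[km]) for km, c in a.items())
    return 1.0 - 2.0 * shared / total


def dissimilarity_matrix(metagenomes, k=5, min_count=1):
    """Return sorted sample names and the square matrix of pairwise dissimilarities."""
    spectra = {name: kmer_spectrum(reads, k, min_count) for name, reads in metagenomes.items()}
    names = sorted(spectra)
    return names, [[bray_curtis(spectra[p], spectra[q]) for q in names] for p in names]
```

Because no reference database is consulted, k-mers from an unknown organism or from contaminating sequence still contribute to the distance, which is the property the paragraph above emphasizes.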

AMBER / Assessment of Metagenome BinnERs

Calculates performance metrics and creates comparative visualizations. AMBER is an evaluation package for the comparative assessment of genome reconstructions from metagenome benchmark data sets. It helps bioinformaticians who are optimizing data processing pipelines, as well as method developers, assess genome binning programs on benchmark metagenome data sets. Its results are available in several convenient output formats, allowing in-depth comparisons of binnings produced by different programs, software versions, or parameter settings.
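Two common binning metrics, purity and completeness, can be computed from a gold-standard contig-to-genome mapping as in the Python sketch below. Each predicted bin is mapped to the genome contributing most of its base pairs; purity is that genome's share of the bin, and completeness is the share of that genome recovered by the bin. This is a simplified, hypothetical helper, not AMBER's interface, and AMBER's exact metric definitions may differ.

```python
from collections import Counter, defaultdict


def bin_metrics(assignments, gold, lengths):
    """Per-bin purity and completeness, weighted by contig length in base pairs.

    assignments: contig -> predicted bin; gold: contig -> true genome; lengths: contig -> bp
    """
    genome_size = Counter()
    for contig, genome in gold.items():
        genome_size[genome] += lengths[contig]
    per_bin = defaultdict(Counter)
    for contig, b in assignments.items():
        per_bin[b][gold[contig]] += lengths[contig]
    results = {}
    for b, genomes in per_bin.items():
        # Map each bin to the genome contributing most of its base pairs.
        genome, bp = genomes.most_common(1)[0]
        results[b] = {
            "genome": genome,
            "purity": bp / sum(genomes.values()),
            "completeness": bp / genome_size[genome],
        }
    return results
```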

MyCC

An automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and identify the reconstructed genomic fragments. Its authors report superior performance of MyCC compared with other binning tools, including CONCOCT, GroopM, MaxBin and MetaBAT, on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). In addition, MyCC's visualization of metagenomes aids in reconstructing genomes from distinct bins.

DectICO

An alignment-free supervised metagenomic classification method. Its features are the intrinsic correlations of oligonucleotides (ICO): the feature set is selected dynamically with a kernel partial least squares algorithm, and the feature matrices extracted with this set are then used to train support vector machine (SVM) classifiers. DectICO can accurately classify metagenomic samples without depending on known microbial genomes. Selecting the ICO features dynamically offers better stability and generality than sequence-composition-based classification algorithms, giving DectICO a new perspective on metagenomic sample classification.

MBBC / Metagenomic Binning Based on Clustering

A taxonomy-independent approach to clustering environmental shotgun reads that considers k-mer frequencies in reads and Markov properties of the inferred OTUs. On twelve simulated datasets, MBBC reliably estimated the number of species, the genome sizes and the relative abundance of each species, regardless of whether the reads contained errors. On multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods in the accuracy of the estimated species number, genome sizes and percentage of correctly assigned reads, among other metrics.

BiMeta

A two-phase algorithm for binning metagenomic reads without reference genomes. Instead of clustering reads directly, BiMeta adds a preprocessing phase in which reads that potentially belong to the same cluster are grouped, and each group is represented by a so-called seed of non-overlapping reads. The idea is motivated by careful observation of the l-mer frequency distributions on sets of non-overlapping reads extracted from microbial genomes. According to its authors, BiMeta achieves higher performance than state-of-the-art binning algorithms on both simulated and real metagenomic datasets. Another strength of BiMeta is that it works well with both short and long reads.

S-GSOM / Seeded Growing Self-Organising Map

Automatically identifies clusters in the feature map using already-available labelled samples (seeds). S-GSOM consists of three core procedures: (1) the very small number of available or selected seeds is combined with the other, unlabelled samples; (2) the combined samples are presented to the GSOM for training, with the seeds treated the same as the unlabelled data; and (3) S-GSOM performs an extra cluster identification phase as post-processing.

Autometa

Clusters a simplified subset of contigs from individual metagenome assemblies so that binning scales with metagenomic complexity. Autometa bins microbial genomes de novo from single shotgun metagenomes, using sequence homology, coverage and nucleotide composition to distinguish between contigs. The presence of marker genes can be used to estimate both the completeness of a bin and its level of contamination, since each marker should be detected only once per bin.
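The marker-gene reasoning above translates into two ratios, sketched in Python below: completeness is the fraction of expected single-copy markers found in a bin, and contamination counts the extra copies beyond the first, relative to the number of expected markers. These formulas follow the common CheckM-style convention and are an assumption; Autometa's exact computation may differ, and `marker_quality` is a hypothetical name.

```python
from collections import Counter


def marker_quality(bin_markers, expected_markers):
    """Estimate completeness and contamination of a bin from single-copy marker genes.

    bin_markers: marker names found on the bin's contigs (duplicates allowed)
    expected_markers: the set of markers expected exactly once per genome
    """
    counts = Counter(m for m in bin_markers if m in expected_markers)
    total = len(expected_markers)
    completeness = len(counts) / total
    # Every copy beyond the first suggests contigs from another genome.
    contamination = sum(c - 1 for c in counts.values()) / total
    return completeness, contamination
```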

PhyloPythia

Phylogenetically classifies variable-length DNA sequence fragments. PhyloPythia uses sequence composition to characterize sequence fragments phylogenetically. It can classify genomic fragments of ≥ 1–3 kb at all taxonomic ranks considered (domain, phylum, class, order and genus), including fragments that originate from novel organisms. PhyloPythia was used to analyse three metagenomes: the Sargasso Sea sample and two samples of Enhanced Biological Phosphorus Removal (EBPR) sludge used in industrial wastewater processing.

MetaObtainer

Extracts the reads of a specified species from next-generation sequencing (NGS) short reads. MetaObtainer uses overlap information to group short reads and then uses composition information to obtain the specified species. It was compared with TOSS (another NGS read classification tool) on synthetic datasets that varied in the number of species, the phylogenetic distances between species, abundance ratios and sequencing error rates. The results show that the tool handles large-scale datasets on a personal computer in acceptable time.
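The two-stage idea, grouping reads by overlap and then selecting a group by composition, can be sketched in Python. Overlaps are approximated by shared k-mers merged with a union-find structure, and composition is reduced to GC content compared against a target value. Both simplifications, the parameter `target_gc` and all function names are hypothetical and do not reproduce MetaObtainer's actual method.

```python
from collections import defaultdict


def group_by_overlap(reads, k=8):
    """Union reads that share at least one k-mer (a crude stand-in for overlap detection)."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for idx, r in enumerate(reads):
        for i in range(len(r) - k + 1):
            km = r[i:i + k]
            if km in first_seen:
                parent[find(idx)] = find(first_seen[km])
            else:
                first_seen[km] = idx
    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


def gc_content(seqs):
    bases = "".join(seqs)
    return sum(b in "GC" for b in bases) / max(len(bases), 1)


def obtain_species(reads, target_gc, k=8):
    """Return the read group whose GC content is closest to the target species' GC content."""
    groups = group_by_overlap(reads, k)
    return min(groups, key=lambda g: abs(gc_content([reads[i] for i in g]) - target_gc))
```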