MetaPhlAn / Metagenomic Phylogenetic Analysis

Estimates the relative abundance of microbial cells by mapping reads against a reduced set of clade-specific marker sequences. MetaPhlAn accurately profiles microbial communities and requires only minutes to process millions of metagenomic reads. The classifier compares each metagenomic read from a sample against a precomputed marker catalog, using nucleotide BLAST searches to identify high-confidence matches, and then reports clade abundances for one or more sequenced metagenomes.

PhymmBL

A classification system designed for metagenomics experiments that assigns taxonomic labels to short DNA reads. PhymmBL combines two components: (i) composition-directed taxonomic predictions from Phymm and (ii) basic local alignment search tool (BLAST)-based homology results. PhymmBL combines these to label each input sequence with its best guess as to the taxonomy of the source organism. Input sequences as short as 100 base pairs can be phylogenetically classified with PhymmBL more accurately than with any other existing method. PhymmBL predicts species, genus, family, order, class and phylum for each read, allowing users to arrange results according to levels of specificity relevant to their research goals.

MEGAN / MEtaGenome ANalyzer

Allows users to taxonomically and functionally explore and analyze large-scale microbiome sequencing data. MEGAN is a comprehensive microbiome analysis toolbox for metagenome, metatranscriptome, amplicon and other types of data. Users can perform taxonomic, functional or comparative analysis, map reads to reference sequences, build reference-based multiple alignments, perform reference-guided assembly and integrate their own classifications.

AMPHORA / AutoMated PHylogenomic infeRence Application

Allows genome tree reconstruction and metagenomic phylotyping. AMPHORA is an application for large-scale protein phylogenetic analysis. The software supports the analysis of DNA sequences, which means that users can apply AMPHORA2 directly to metagenomic reads without the need to first annotate the sequences. It can phylotype metagenomic sequences from a mixed population of bacteria and archaea and should be useful for the study of microbial evolution and ecology in the genomic era. A web application and a flavor of AMPHORA2 are also available.

Centrifuge

A rapid and sensitive classifier for microbial sequences with low memory requirements and a speed comparable to the fastest systems. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge classifies 10 million reads against a database of all complete prokaryotic and viral genomes within 20 minutes using one CPU core and less than 8 GB of RAM. Furthermore, Centrifuge can also build an index for NCBI’s entire nt database of non-redundant sequences from both prokaryotes and eukaryotes. Searching that index requires a computer system with 128 GB of RAM, but runs over 3500 times faster than Megablast.
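The BWT and FM-index idea mentioned above can be illustrated with a toy example. This is a minimal sketch for small strings, not Centrifuge's optimized index: the function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical, and the rotation-sorting construction is only practical for short inputs.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str):
    """Return (C, occ): C[c] counts characters smaller than c,
    occ[c][i] counts occurrences of c in bwt_str[:i]."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: number of times pattern occurs in the indexed text."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("ACGTACGTAC")
C, occ = build_fm_index(transformed)
hits = count_occurrences("ACG", C, occ, len(transformed))
```

The backward search touches the index once per pattern character, regardless of reference length, which is why FM-index based classifiers can keep both memory and query time low.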

Taxonomer

An ultrafast web-tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Using real-world case-studies, we show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.

SATIVA / Semi-Automatic Taxonomy Improvement and Validation Algorithm

A phylogeny-aware method to automatically identify taxonomically mislabeled sequences (mislabels) using statistical models of evolution. SATIVA uses the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically proposes a corrected taxonomic classification for them. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels.

WebMGA

A customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools for tasks such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contamination, taxonomic analysis and functional annotation. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analyses or to configure and run customized pipelines. WebMGA offers researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

k-SLAM / k-mer Sorted List Alignment and Metagenomics

A metagenomic classifier for the characterization of metagenomic data. k-SLAM uses a k-mer method to rapidly produce alignments of the reads against a database and can therefore find genes and variants. A novel pseudo-assembly technique chains neighboring alignments together to improve taxonomic specificity. The main data structure is a sorted list of k-mers, which makes k-SLAM extremely fast and parallelizable. In human microbiome metagenomics, k-SLAM could be used to identify gut parasites and also to detect plant sequences that can indicate dietary components.
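A sorted k-mer list can be illustrated roughly as follows. This is a simplified sketch, not k-SLAM's actual implementation: the helper names and the toy genomes `g1` and `g2` are hypothetical, and the pseudo-assembly step is left out.

```python
from bisect import bisect_left


def build_sorted_kmers(references, k):
    """Collect (k-mer, genome name, offset) triples and sort them once."""
    entries = []
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            entries.append((seq[i:i + k], name, i))
    entries.sort()
    return entries


def lookup(entries, kmer):
    """Binary-search the sorted list and return every (genome, offset) hit."""
    pos = bisect_left(entries, (kmer,))  # (kmer,) sorts before (kmer, name, offset)
    hits = []
    while pos < len(entries) and entries[pos][0] == kmer:
        hits.append(entries[pos][1:])
        pos += 1
    return hits


references = {"g1": "ACGTAC", "g2": "TTACGT"}  # hypothetical toy genomes
index = build_sorted_kmers(references, k=3)
seed_hits = lookup(index, "ACG")
```

Because identical k-mers end up adjacent after sorting, each lookup is a binary search followed by a short scan, and independent chunks of the list can be processed in parallel.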

LMAT / Livermore Metagenomics Analysis Toolkit

A method that shifts computational costs to an offline computation by creating a taxonomy/genome index that supports scalable metagenomic classification. LMAT is designed to efficiently assign taxonomic labels to as many reads as possible in very large metagenomic datasets and report the taxonomic profile of the input sample. The quick 'single pass' analysis of every read makes it possible to run additional, more computationally expensive analyses, such as metagenomic assembly or sensitive database searches, on targeted subsets of reads.

PPS+ / PhyloPythiaS+

A taxonomic assignment program that produces accurate assignments from metagenome samples, with a precision of 80% or more even for low-ranking taxa. PPS+ is a fully automated successor of the PhyloPythiaS software. It automatically determines the most relevant taxa to be modeled and suitable training sequences directly from the input sample, which are then used to generate a sample-specific structured output SVM taxonomic classifier for the taxonomic binning of a sample. This makes it usable by researchers who lack the experience or the time to search for suitable training sequences and manually construct a taxonomic classifier that matches a particular metagenome sample. PPS+ is best suited for the analysis of large NGS metagenome samples with assembled contigs (> 1 kb) carrying marker genes, or datasets that include high-quality, longer PacBio consensus reads.

deSPI

A metagenomic read classification method that offers higher speed and an affordable memory cost with equal or higher sensitivity and accuracy. deSPI recognizes and analyzes the short-token matches between the reads and the reference sequences through a de Bruijn graph framework. It handles the reads with two key techniques: indexing (deSPI constructs the de Bruijn graph of the reference with a user-defined k-mer size) and classification (deSPI retrieves the maximal exact matches longer than l bp). Overall, considering the speed, memory footprint, sensitivity and accuracy, deSPI can provide efficient and effective taxonomy classification for metagenomic reads.
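For readers unfamiliar with the indexing step, a de Bruijn graph of a reference for a chosen k-mer size can be sketched as below. This only illustrates the general data structure, with a hypothetical function name and toy sequence. It does not reproduce deSPI's index format or its retrieval of maximal exact matches.

```python
from collections import defaultdict


def de_bruijn_graph(reference, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


# Toy reference; k plays the role of the user-defined k-mer size.
graph = de_bruijn_graph("ACGTACGA", k=3)
branching_nodes = sorted(node for node, successors in graph.items() if len(successors) > 1)
```

In this toy graph the node `CG` has two successors, which is the kind of branching that appears wherever a k-mer context is shared by different downstream sequences.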

PROTAX

A probabilistic method for taxonomical classification of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX distributes the total probability of one across the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but that do not have reference sequences, the possibility of unknown taxonomical units, as well as mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can utilize any kind of sequence similarity measure or the outputs of other classifiers as predictors.

DUDes

A reference-based taxonomic profiler that introduces a novel top-down approach to analyze metagenomic NGS samples. Rather than predicting an organism's presence in the sample based only on relative abundances, DUDes first identifies possible candidates by comparing the strength of the read mapping in each node of the taxonomic tree in an iterative manner. Instead of using the lowest common ancestor (LCA), we propose a new approach: the deepest uncommon descendent (DUD). We showed in experiments that DUDes works for single and multiple organisms and can identify low-abundance taxonomic groups with high precision. Additionally, DUDes provides a strain identification method that can propose one or more strains present in a sample.

PPS / PhyloPythiaS

A web server for the taxonomic assignment of metagenome sequences. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments.

MARTA / Metagenomic AND rDNA Taxonomic Assignment

A suite of Java-based software for assigning taxonomy to DNA sequences. MARTA is useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities, including the BLAST software and the Taxonomy database, and is easily controlled at the command line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower-scoring candidates when the top hit lacks taxonomic classification.

CLARK / CLAssifier based on Reduced K-mers

An approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to the best state-of-the-art tools and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions.


Identifies genome-specific markers (GSMs) from currently sequenced microbial genomes using a k-mer based approach. The identified GSMs can be used to identify microbial strains/species in metagenomes, especially in the human microbiome, where many reference genomes are available. Two levels of GSMs, strain-specific and species-specific, are currently supported. The approach can be applied directly to raw metagenomes to identify microbial strains/species, without complex data pre-processing.

WEVOTE / WEighted VOting Taxonomic idEntification

Classifies metagenome shotgun sequencing DNA reads based on an ensemble of existing methods using k-mer-based, marker-based and naive similarity-based approaches. WEVOTE is an efficient and automated tool that combines multiple individual taxonomic identification methods to produce more precise and sensitive microbial profiles. WEVOTE is developed primarily to identify reads generated by metagenome shotgun sequencing. It is expandable and has the potential to incorporate additional tools to produce a more accurate taxonomic profile.
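The ensemble idea can be illustrated with a plain agreement vote over per-read calls. This is only a sketch under stated assumptions: WEVOTE's actual weighting scheme is not described here, and the function name, the `min_agree` threshold and the tool labels are hypothetical.

```python
from collections import Counter


def consensus_call(calls, min_agree=2):
    """calls maps a tool name to the taxon it assigned to one read (None if unclassified).

    Returns the taxon backed by the most tools, or None when support is
    below min_agree or the top two taxa are tied.
    """
    counts = Counter(taxon for taxon in calls.values() if taxon is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    top_taxon, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_agree or tied:
        return None
    return top_taxon


# Hypothetical calls from three classifiers for a single read.
read_calls = {
    "kmer_tool": "Escherichia coli",
    "marker_tool": "Escherichia coli",
    "similarity_tool": "Salmonella enterica",
}
label = consensus_call(read_calls)
```

A taxonomy-aware ensemble would typically fall back to a shared ancestor when tools disagree instead of leaving the read unclassified; that refinement is omitted here to keep the sketch short.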

MetaCache

Builds an index data structure and classifies reads against it. MetaCache advances the state of the art in k-mer based read classification approaches. It uses minhashing to index only a subset of k-mers, which significantly reduces the size of the index data structure. This tool employs context-aware k-mer matches within a local window rather than at any position of the whole genome. It aims at context-aware classification, providing highly competitive accuracy while consuming significantly less memory than other approaches.
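Minhashing over local windows can be sketched as follows. This is an illustration of the general technique, not MetaCache's code: the hash function, the function names and all parameter values (`k`, `s`, `window_len`) are assumptions chosen for the example.

```python
import hashlib


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (stand-in for a fast integer hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(window, k, s):
    """Keep only the s smallest distinct k-mer hashes of a window."""
    hashes = {kmer_hash(window[i:i + k]) for i in range(len(window) - k + 1)}
    return sorted(hashes)[:s]


def window_sketches(seq, k, s, window_len):
    """Sketch consecutive windows; adjacent windows overlap by k-1 bases."""
    stride = window_len - k + 1
    return [
        sketch(seq[start:start + window_len], k, s)
        for start in range(0, len(seq) - k + 1, stride)
    ]


genome = "ACGTTGCAAGCTTAGGCTAACGTTAGC"  # toy sequence
per_window = window_sketches(genome, k=4, s=3, window_len=10)
```

Storing `s` hashes per window instead of every k-mer is what shrinks the index, and recording which window a hash came from lets a classifier require that a read's matches cluster in one region of a genome.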

SMURF / Short MUltiple Regions Framework

Enables identification of near full-length 16S rRNA gene sequences in microbial communities. SMURF may be applied to standard sample preparation protocols with very few modifications. It also paves the way to high-resolution profiling of low-biomass and fragmented DNA, as in the case of formalin-fixed and paraffin-embedded samples, fossil-derived DNA or DNA exposed to other degrading conditions. This approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons.

taxMaps

Maps short-read data for taxonomic classification. taxMaps is designed to deal with large DNA/RNA metagenomics data. It can prioritize mapping to multiple indexes, produces detailed mapping reports and offers interactive visualization of results. The tool offers researchers a way to conduct very sensitive searches on very large databases. It provides class-leading accuracy and comprehensiveness while balancing performance. taxMaps appears to be useful in pathogen identification from clinical or environmental samples.

FHiTINGS / Fungal High-throughput Taxonomic Identification tool for use with Next-Generation Sequencing

Provides rapid upper taxonomic rank information for fungal sequences using the Index Fungorum database. FHiTINGS is open source data processing software written in Python that rapidly facilitates the identification and taxonomic classification of fungal internal transcribed spacer (ITS) sequences. It provides three unique functions: (i) it consolidates the top BLAST hits with tied E-values for an individual sequence into a single identification, or classifies the sequence as “Ambiguous”, (ii) it provides the identification for that sequence at taxonomic ranks from species through kingdom, and (iii) it sums and sorts all results into a user-friendly list format.
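Function (i) can be sketched as a per-rank agreement check over the hits that share the best E-value. This is an assumption about how such consolidation might work, not FHiTINGS's own code: the function name, the input layout and the example hits are hypothetical.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def consolidate_top_hits(hits):
    """hits is a list of (evalue, lineage) pairs, where lineage maps rank -> name.

    Among the hits tied for the lowest E-value, report a rank's name when all
    of them agree and "Ambiguous" otherwise.
    """
    best = min(evalue for evalue, _ in hits)
    tied = [lineage for evalue, lineage in hits if evalue == best]
    result = {}
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in tied}
        name = names.pop() if len(names) == 1 else None
        result[rank] = name if name is not None else "Ambiguous"
    return result


# Hypothetical BLAST hits for one ITS sequence: two tie at the best E-value.
shared = {"kingdom": "Fungi", "phylum": "Ascomycota", "class": "Eurotiomycetes",
          "order": "Eurotiales", "family": "Aspergillaceae", "genus": "Aspergillus"}
hits = [
    (1e-50, {**shared, "species": "Aspergillus niger"}),
    (1e-50, {**shared, "species": "Aspergillus flavus"}),
    (1e-20, {**shared, "genus": "Penicillium", "species": "Penicillium rubens"}),
]
identification = consolidate_top_hits(hits)
```

Here the two tied hits agree down to genus, so the sequence is identified as Aspergillus while the species rank is reported as "Ambiguous"; the weaker third hit is ignored.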