Homology-Based Taxonomic Software Tools | Shotgun metagenomic sequencing data analysis
A majority of methods available for binning datasets obtained using shotgun sequencing belong to the taxonomy-dependent category. Based on the strategy used for comparing reads with sequences/pre-computed models, taxonomy-dependent methods can be sub-classified into alignment-based, composition-based and hybrid methods.
Searches nucleotide databases using a nucleotide query. BLASTN's key features are searching with short sequences and cross-species comparison. Users can optimize the search for (i) highly similar sequences, (ii) more dissimilar sequences or (iii) somewhat similar sequences. This web application searches sequence sets held in NCBI data sources.
Allows users to taxonomically and functionally explore and analyze large-scale microbiome sequencing data. MEGAN is a comprehensive microbiome analysis toolbox for metagenomic, metatranscriptomic, amplicon and other data. Users can perform taxonomic, functional or comparative analysis, map reads to reference sequences, build reference-based multiple alignments, run reference-guided assembly and integrate their own classifications.
Estimates the relative abundance of microbial cells by mapping reads against a reduced set of clade-specific marker sequences. MetaPhlAn accurately profiles microbial communities and requires only minutes to process millions of metagenomic reads. The classifier compares each metagenomic read from a sample against a precomputed marker catalog, using nucleotide BLAST searches, to identify high-confidence matches. From these matches it reports clade abundances for one or more sequenced metagenomes.
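The marker-based idea can be illustrated with a minimal sketch. It assumes reads have already been matched to markers by some aligner; the function name, the input layout and the length normalization are illustrative assumptions and do not reproduce MetaPhlAn's actual implementation.

```python
from collections import defaultdict


def clade_relative_abundance(hits, marker_lengths, marker_to_clade):
    """Estimate clade relative abundances from read-to-marker hits.

    hits: iterable of marker IDs, one per confidently matched read.
    marker_lengths: dict of marker ID -> marker length in bp.
    marker_to_clade: dict of marker ID -> clade name.
    All three inputs are hypothetical structures used for illustration.
    """
    # Count matched reads per marker.
    reads_per_marker = defaultdict(int)
    for marker in hits:
        reads_per_marker[marker] += 1

    # Length-normalized coverage of each marker, grouped by clade.
    coverage_by_clade = defaultdict(list)
    for marker, clade in marker_to_clade.items():
        coverage = reads_per_marker[marker] / marker_lengths[marker]
        coverage_by_clade[clade].append(coverage)

    # Average the marker coverages per clade, then scale them to sum to 1.
    clade_cov = {c: sum(v) / len(v) for c, v in coverage_by_clade.items()}
    total = sum(clade_cov.values())
    if total == 0:
        return {c: 0.0 for c in clade_cov}
    return {c: cov / total for c, cov in clade_cov.items()}
```

Averaging over several markers per clade makes the estimate less sensitive to a single marker that attracts spurious hits; a real profiler would typically use a more robust statistic than the plain mean.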
A classification system designed for metagenomics experiments that assigns taxonomic labels to short DNA reads. PhymmBL combines two components, (i) composition-directed taxonomic predictions from Phymm and (ii) basic local alignment search tool (BLAST)-based homology results, to label each input sequence with its best guess as to the taxonomy of the source organism. Input sequences as short as 100 base pairs can be phylogenetically classified, and PhymmBL is reported to do so more accurately than the other methods it was compared with. PhymmBL predicts species, genus, family, order, class and phylum for each read, allowing users to arrange results according to the levels of specificity relevant to their research goals.
A method that uses a subset of marker genes (MGs) for taxonomic profiling of metagenomes. mOTU is available as standalone software and is also implemented in MOCAT. Species-level profiles are generated by mapping reads from metagenomes to a database (mOTU.v1.padded) consisting of 10 MGs extracted from 3,496 prokaryotic reference genomes (downloaded from NCBI) and 263 publicly available metagenomes (from the MetaHIT and HMP projects).
Allows genome tree reconstruction and metagenomic phylotyping. AMPHORA is an application for large-scale protein phylogenetic analysis. The software supports the analysis of DNA sequences, which means that users can apply AMPHORA2 directly to metagenomic reads without first annotating the sequences. It can phylotype metagenomic sequences from a mixed population of bacteria and archaea and should be useful for the study of microbial evolution and ecology in the genomic era. A web application and a flavor of AMPHORA2 are also available.
An approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to that of the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies about 32 million metagenomic short reads per minute with high accuracy. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions.
Represents the consensus sequences in a region of interest (ROI). Pyrotools is based on the graph technique that resembles the partial order graph or variant graph. It can model error patterns in the sequencing reads using the conditional random field (CRF) technique. The tool uses a machine learning method to estimate the scoring function from high-throughput sequencing (HTS) data.
Bins and annotates short paired-end reads. MetaCluster-TA is an assembly-assisted approach which, instead of annotating each read or assembled contig separately, bins similar reads/contigs into the same cluster and annotates the whole cluster. The software consists of three phases: (i) construction of long virtual contigs from assembly and probabilistic grouping of short reads, (ii) q-mer distribution estimation and clustering and (iii) cluster annotation and merging.
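To make the second phase more concrete, here is a minimal sketch of q-mer distribution estimation followed by a simple clustering step. The greedy seed-based clustering, the Manhattan distance and the threshold are assumptions chosen for brevity; MetaCluster-TA's own clustering procedure is not reproduced here.

```python
from collections import Counter
from itertools import product


def qmer_distribution(seq, q=4):
    """Return the normalized q-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + q] for i in range(len(seq) - q + 1))
    # Fixed ordering over all 4**q possible q-mers; q-mers with N are ignored.
    all_qmers = ["".join(p) for p in product("ACGT", repeat=q)]
    total = sum(counts[k] for k in all_qmers)
    if total == 0:
        return [0.0] * len(all_qmers)
    return [counts[k] / total for k in all_qmers]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def greedy_cluster(contigs, q=4, threshold=0.5):
    """Put each contig into the first cluster whose seed profile is close enough.

    contigs: dict of contig name -> sequence (hypothetical input layout).
    """
    clusters = []  # list of (seed_profile, [member names])
    for name, seq in contigs.items():
        profile = qmer_distribution(seq, q)
        for seed, members in clusters:
            if manhattan(seed, profile) <= threshold:
                members.append(name)
                break
        else:
            clusters.append((profile, [name]))
    return [members for _, members in clusters]
```

Annotating a whole cluster rather than each member is what allows very short reads, which carry too little signal on their own, to inherit a label from longer contigs with a similar composition.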
Uses the RDP naive Bayesian classifier to provide rapid classification of library sequences into the new phylogenetically consistent higher-order bacterial taxonomy. RDP Library Compare is an online tool for comparing microbial communities based on 16S rRNA sequences. It also estimates the probability of observing the difference in a given taxon using a statistical test.
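The general principle of a naive Bayesian sequence classifier can be sketched as follows. This is a generic word-based naive Bayes with simple additive smoothing, written as an illustration only; the word size, the smoothing constants and the function names are assumptions and are not taken from the RDP implementation.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-letter words in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training, k=8):
    """Count, per taxon, in how many training sequences each word occurs.

    training: dict of taxon name -> list of sequences (hypothetical layout).
    """
    model = {}
    for taxon, seqs in training.items():
        word_counts = {}
        for s in seqs:
            for w in kmers(s, k):
                word_counts[w] = word_counts.get(w, 0) + 1
        model[taxon] = (word_counts, len(seqs))
    return model


def classify(model, query, k=8):
    """Return (taxon, log-score) maximizing the smoothed word likelihood."""
    words = kmers(query, k)
    best_taxon, best_score = None, -math.inf
    for taxon, (word_counts, n_seqs) in model.items():
        # Sum of log P(word | taxon); the words are treated as independent,
        # which is the "naive" assumption. Smoothing avoids log(0).
        score = sum(
            math.log((word_counts.get(w, 0) + 0.5) / (n_seqs + 1))
            for w in words
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score
```

A query shorter than `k` yields no words, so every taxon scores equally; a practical classifier would reject such input or report low confidence.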
Stores and orders metagenomic reads of viral and fungal organisms. NBC can handle a complete pyrosequencing dataset and gives the full taxonomy for each read, so users can easily investigate the taxonomic composition of their datasets. This platform contains a list of genomes, internal transcribed spacer (ITS) sequences and whole genomes.
Enables creation of filters for a given reference and then categorization of sequences against those filters. BBT is a Bloom filter implementation that includes heuristics to control false positives and increase speed. The software was designed for pre-processing and quality control (QC) applications such as contamination detection, but it can be suitable for other purposes.
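The underlying data structure can be sketched in a few lines. The following is a minimal, generic Bloom filter over reference k-mers; the class, the hashing scheme and the scoring function are illustrative assumptions, and none of BBT's false-positive heuristics or speed optimizations are reproduced.

```python
import hashlib


class BloomFilter:
    """A basic Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filter(reference, k=25, **kwargs):
    """Insert every k-mer of a reference sequence into a new filter."""
    bf = BloomFilter(**kwargs)
    for i in range(len(reference) - k + 1):
        bf.add(reference[i:i + k])
    return bf


def fraction_matching(read, bf, k=25):
    """Fraction of the read's k-mers that the filter reports as present."""
    n = len(read) - k + 1
    if n <= 0:
        return 0.0
    hits = sum(read[i:i + k] in bf for i in range(n))
    return hits / n
```

A read could then be assigned to a reference when `fraction_matching` exceeds a chosen cutoff. Because a Bloom filter can report k-mers it never stored, requiring several matching k-mers per read is one simple way to keep false assignments rare.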
A program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences. By default, Kaiju uses either the available complete genomes from NCBI RefSeq or the microbial subset of the non-redundant protein database nr used by NCBI BLAST, optionally also including fungi and microbial eukaryotes.
A rapid and sensitive classifier for microbial sequences with low memory requirements and a speed comparable to the fastest systems. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge classifies 10 million reads against a database of all complete prokaryotic and viral genomes within 20 minutes using one CPU core and less than 8 GB of RAM. Centrifuge can also build an index for NCBI’s entire nt database of non-redundant sequences from both prokaryotes and eukaryotes. Searching this index requires a computer system with 128 GB of RAM, but runs over 3500 times faster than Megablast.
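For readers unfamiliar with the indexing scheme, the following minimal sketch builds a BWT with FM-index tables and runs the classic backward search for exact matches. It uses a naive suffix sort and uncompressed tables, so it only illustrates the principle; Centrifuge's optimized, compressed index is not reproduced, and all function names are illustrative.

```python
def build_fm_index(text):
    """Build suffix array, C table and occurrence table for text."""
    text += "$"  # unique terminator, sorts before A, C, G, T
    # Naive suffix array construction: fine for a demo, not for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around to '$').
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def find(pattern, index):
    """Return sorted start positions of exact matches via backward search."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

The search cost depends on the pattern length rather than on the size of the indexed text, which is why this family of indexes suits read classification against large reference collections.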
An ultrafast web tool for comprehensive metagenomics data analysis and interactive visualization of results. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Real-world case studies show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.
Maps taxonomic short-read data. taxMaps is designed to deal with large DNA/RNA metagenomics data. It can prioritize mapping to multiple indexes, produce detailed mapping reports and offer interactive visualization of results. The tool gives researchers a way to conduct very sensitive searches on very large databases. It provides class-leading accuracy and comprehensiveness while balancing performance. taxMaps appears to be useful for pathogen identification in clinical or environmental samples.
Uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and it classifies sequences with higher precision than the current best rank-flexible classifier.
Identifies genome-specific markers (GSMs) from currently sequenced microbial genomes using a k-mer based approach. The identified GSMs can be used to detect microbial strains/species in metagenomes, especially in the human microbiome, where many reference genomes are available. Two levels of GSMs are currently supported: strain-specific and species-specific. The approach can be applied directly to raw metagenomes to identify microbial strains/species, without complex data pre-processing.
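The k-mer based idea can be illustrated with a minimal sketch: a k-mer counts as a marker for a genome if it occurs in that genome and in no other. The function names and the value of `k` are assumptions for illustration; the sketch ignores reverse complements and any further filtering that the actual tool may apply.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_markers(genomes, k=50):
    """Return dict of genome name -> set of k-mers found in that genome only.

    genomes: dict of genome name -> sequence (hypothetical input layout).
    """
    # Record which genomes contain each k-mer.
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    # Keep only the k-mers owned by exactly one genome.
    markers = {name: set() for name in genomes}
    for km, names in owners.items():
        if len(names) == 1:
            markers[next(iter(names))].add(km)
    return markers


def detect(read, markers, k=50):
    """Return the genomes whose markers appear in a raw read."""
    read_kmers = kmers(read, k)
    return {name for name, ms in markers.items() if ms & read_kmers}
```

Because the markers are exact k-mers, raw reads can be screened by simple set lookups, which is consistent with the claim that no complex pre-processing is needed.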