Provides aligned and annotated ribosomal RNA (rRNA) gene sequence data, along with tools to allow researchers to analyse their own rRNA gene sequences. RDP offers tools for browsing and searching the data collections, for taxonomic classification and nearest neighbour search, for primer-probe testing and for tree building. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics.
Identifies the relative frequencies of reference sequences contributing to a pooled DNA sample. Karp combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. It is accurate across a variety of read lengths and when samples contain reads originating from organisms absent from the reference. Karp employs an Expectation Maximization (EM) algorithm that uses information from all the reads to accurately estimate the relative frequencies of each reference in the sample.
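The EM step described above can be illustrated with a minimal sketch. It is not Karp's implementation: the function name `em_relative_frequencies` and the input layout (a precomputed table of read-versus-reference likelihoods) are assumptions made for illustration, and the quality-aware likelihood calculation itself is left out.

```python
def em_relative_frequencies(likelihoods, n_iter=200, tol=1e-10):
    """Estimate reference frequencies with a plain EM loop.

    likelihoods[i][j] is the (hypothetical, precomputed) likelihood of
    read i given reference j; 0.0 means the read is incompatible.
    """
    n_refs = len(likelihoods[0])
    theta = [1.0 / n_refs] * n_refs  # start from uniform frequencies
    for _ in range(n_iter):
        counts = [0.0] * n_refs
        # E-step: split each read across references in proportion to
        # current frequency times likelihood.
        for row in likelihoods:
            weights = [t * l for t, l in zip(theta, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read matches no reference; skip it
            for j, w in enumerate(weights):
                counts[j] += w / total
        norm = sum(counts)
        if norm == 0.0:
            raise ValueError("no read is compatible with any reference")
        # M-step: new frequencies are the normalised expected counts.
        new_theta = [c / norm for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta
```

In this sketch, a read that maps equally well to several references is shared among them according to the frequencies estimated from all other reads, which is how information from uniquely mapped reads helps resolve multiply mapped ones.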
BROCC / BLAST Read and Operational Taxonomic Unit Consensus Classifier
Provides a well-characterized tool kit for sequence-based enumeration of eukaryotic organisms in human microbiome samples. BROCC is a pipeline for attributing sequences that was tailored for use with the complex and sometimes inconsistent taxonomic assignments characteristic of single cell eukaryotes. It also facilitates interfacing with the popular QIIME pipeline. These methods are best used for (i) comparing among communities, (ii) providing an overview of eukaryotic lineages in a community at a relatively high taxonomic level, and (iii) generating hypotheses for specific species present.
An ultrafast web-tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Using real-world case-studies, we show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.
RiboFR-Seq / Ribosomal RNA gene flanking region sequencing
A method for capturing both ribosomal RNA variable regions and their flanking protein-coding genes simultaneously. This approach goes beyond traditional metagenomic analysis by taking into account not only phylogenetic features of 16S rRNA typing but also metagenome-scale genes derived from the same sample. Combined with classical amplicon sequencing and shotgun metagenomic sequencing, RiboFR-Seq can link the annotations of 16S rRNA and metagenomic contigs to make a consensus classification. It can also accurately locate multiple 16S rRNA sequences through BRPs and thus assist metagenomic assembly and binning.
PyNAST / Python Nearest Alignment Space Termination
Serves as a flexible tool for aligning sequences to a template alignment. PyNAST is a reimplementation of NAST (Nearest Alignment Space Termination), introducing new features that increase its portability and flexibility. Its availability as an open source application with three convenient interfaces will allow the application of the NAST algorithm on a wider basis, to larger datasets, and in novel domains. In this package, the user can specify an arbitrary template alignment, supplied as a standard FASTA alignment file, to which candidate sequences should be aligned.
SINA / SILVA Incremental Aligner
Aligns and optionally taxonomically classifies ribosomal RNA (rRNA) gene sequences. SINA is part of the rRNA gene processing pipeline of the SILVA ribosomal databases project. The software can execute a homology search based on the computed alignment and generate per-sequence classifications from the search results. It can also convert reference alignments from FASTA to ARB format. SINA was compared with the commonly used high-throughput multiple sequence alignment (MSA) programs PyNAST and mothur.
A method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. The main advantage of the Taxy approach over all existing methods is the inherent read length invariance of the composition estimates. First of all, this property makes it possible to fully utilize ultra-short reads from all high-throughput sequencing technologies. Secondly, without losing comparability, it allows the use of datasets with heterogeneous sequence lengths, which for instance arise from a combination of raw reads and assembled contigs. In this case, the method is also robust with respect to erroneous assemblies because no taxonomic assignment of contigs is actually performed. Finally, Taxy facilitates the comparability of data obtained from different sequencing platforms. This advantage is of particular importance because the heterogeneity of sequencing technologies and the associated read lengths is still increasing.
Enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. It has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. Comprehensive help pages, tutorials and videos are provided via a wiki page.
In contrast to the oligonucleotide-based Taxy method, Taxy-Pro is based on mixture model analysis of protein signatures in terms of protein domain frequencies. The Pfam domain counts of a metagenome under study can be obtained from the CoMet webserver and are imported as a protein signature (or "profile") in Taxy-Pro. The mixture model-based estimation of metagenomic taxon abundances is realized on the basis of reference signatures from all domains of life including viruses. Furthermore, Taxy-Pro for the first time includes signatures of viral metagenomes as reference data to provide realistic estimates of the virus fraction.
SEK / Sparsity Exploiting K-mer
An approach where the estimation of the bacterial community composition is performed jointly. SEK is based on kernel density estimators and mixture density models, and it leads to solving an under-determined system of linear equations under a particular sparsity assumption. In summary, the SEK approach is implemented in three separate steps: off-line computation of k-mers using a reference database of 16S rRNA genes with known taxonomic classification, online computation of k-mers for a given sample and then final online estimation of the relative frequencies of taxonomic units in the sample by solving an under-determined system of linear equations.
SONS / Shared OTUs and Similarity
Determines the abundance distribution of OTUs that are either endemic to or shared between samples. From these OTU counts, SONS estimates the overlap between the communities’ memberships and structures. Because SONS is directly compatible with output files from DOTUR, it is possible to quickly determine the fraction of OTUs shared by two communities for any desired distance level. SONS is a versatile tool that complements the suite of tools used by microbial ecologists.
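The idea of endemic versus shared OTUs can be sketched in a few lines. This is an illustration on observed counts only, not the estimators SONS itself applies; the function name `shared_otu_summary` and the dictionary-based input are hypothetical.

```python
def shared_otu_summary(counts_a, counts_b):
    """Summarise OTU overlap between two communities.

    counts_a and counts_b map an OTU identifier to the number of
    sequences assigned to it in each sample (hypothetical input format).
    """
    otus_a = {otu for otu, n in counts_a.items() if n > 0}
    otus_b = {otu for otu, n in counts_b.items() if n > 0}
    shared = otus_a & otus_b
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return {
        "shared_otus": sorted(shared),
        "only_a": sorted(otus_a - otus_b),  # endemic to sample A
        "only_b": sorted(otus_b - otus_a),  # endemic to sample B
        # Membership overlap: fraction of each sample's OTUs that are shared.
        "fraction_otus_shared_a": len(shared) / len(otus_a) if otus_a else 0.0,
        "fraction_otus_shared_b": len(shared) / len(otus_b) if otus_b else 0.0,
        # Structure overlap: fraction of each sample's sequences in shared OTUs.
        "fraction_seqs_shared_a": sum(counts_a[o] for o in shared) / total_a if total_a else 0.0,
        "fraction_seqs_shared_b": sum(counts_b[o] for o in shared) / total_b if total_b else 0.0,
    }
```

Running the function once per distance level, on OTU tables clustered at that level, would give the shared fraction as a function of distance.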
ARK / Aggregation of Reads by K-means
A software package for estimation of bacterial community composition. ARK is based on a statistical argument via mixture density formulation. The community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample.
A software platform for automated taxonomic and functional analysis of metagenome data. MetaSAMS includes a pipeline consisting of three different classifiers that perform taxonomic profiling of metagenome sequences. In addition, MetaSAMS implements a functional pipeline based on contigs that automatically assigns functions to predicted coding sequences. MetaSAMS provides tools for statistical and comparative analyses based on taxonomic and functional annotations. It has been successfully applied for the analysis of a biogas-producing microbial community from a biogas-production plant.
SPINGO / SPecies level IdentificatioN of metaGenOmic amplicons
A rapid, accurate and flexible classifier that improves the taxonomic resolution of 16S rRNA gene amplicons down to species level. While its primary target is species from any type of environmental sample, it can also be adapted to arbitrary classification hierarchies, such as the Clostridium clusters commonly used for characterising mammalian gut microbiota. SPINGO was consistently the most accurate species classifier when compared to the other methods. Its efficient algorithm also provides a significant speed-up over existing classifiers which, combined with its high accuracy, makes SPINGO particularly valuable now that amplicons are routinely sequenced in the hundreds of millions.
TUIT / Taxonomic Unit Identification Tool
An efficient open source and platform-independent application that can perform taxonomic classification on its own or can be used in combination with the RDP II Classifier to maximize the taxonomic identification rate. TUIT is applicable for 16S rRNA gene sequence classification; however, it is not restricted to 16S rRNA sequences. In addition, TUIT may be used as a complementary tool for effective taxonomic classification of nucleotide sequences generated by many current platforms, such as Roche 454 and Illumina.
A modern graphical user interface for custom BLAST databases. SequenceServer lets you rapidly set up a BLAST+ server with an intuitive user interface for use locally and for sharing with colleagues over the web. SequenceServer has been used for research on emerging model organisms (e.g., sea cucumber, starfish, falcons, Hessian fly, sugar-apple tree, Streptocarpus rexii), and for research in bioadhesion and environmental microbiology. SequenceServer is a main querying mechanism for several community databases (e.g., Drosophila suzukii, planarians, birch and ash tree, Amborella, echinoderms, Fusarium, ants, butterfly), and is also used as an educational resource.
MALT / MEGAN alignment tool
Aligns and analyzes metagenomic DNA sequencing data. MALT is based on a taxonomic binning algorithm that can assign reads to bacterial species. Applied to ancient microbiomes from oral cavity and lung, it can pick up the weak signal of the original microbiomes and identify multiple species that are representative of the respective host environment. The tool is useful in DNA screening studies, in pathogen screening in clinical contexts and in large-scale metagenomic and metatranscriptomic projects.
Predicts the environment or host phenotype from microbial samples based on k-mer distributions in shallow subsamples of 16S rRNA data. MicroPheno addresses three principal questions: (i) the use of k-mer representations versus Operational Taxonomic Unit (OTU) features, (ii) the benefits of shallow sub-sampling and (iii) classical methods versus the deep learning approach. It was used to compare k-mer representations with OTU features in two tasks: body-site identification and Crohn’s disease classification.
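A k-mer distribution over a shallow subsample can be computed as in the sketch below. This is a generic illustration rather than MicroPheno's own code; the function name `kmer_profile`, the default `k`, and the fixed random seed are assumptions.

```python
import random
from collections import Counter


def kmer_profile(reads, k=6, sample_size=None, seed=0):
    """Return normalised k-mer frequencies for a (sub)sample of reads.

    reads is a list of nucleotide strings. If sample_size is given and
    smaller than the number of reads, a random subsample is drawn first.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    if sample_size is not None and sample_size < len(reads):
        reads = rng.sample(reads, sample_size)
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

The resulting dictionary can be turned into a fixed-length feature vector over all 4^k possible k-mers and passed to any classifier.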
Produces fine level taxonomy classifications. TaxAss is based on an ecosystem-specific database and a comprehensive database to maintain the full biological diversity, taxonomic richness and accuracy of an amplicon dataset. It sorts operational taxonomic units (OTUs) that share high percent identity with ecosystem-specific reference sequences. This tool was applied on a variety of freshwater amplicon datasets and more specifically to the ecosystem-specific Freshwater Training Set (FreshTrain).
A multi-modular package that automates inferring and/or comparing the functional characteristics of an environment using taxonomic abundance generated from one or more environmental sample datasets. Vikodak is based upon the assumption that the overall metabolic/functional potential of any given environmental niche is a function of the sum total of genes/proteins/enzymes that are encoded and expressed by various interacting microbes residing in that niche. Vikodak is expected to be a valuable addition to the family of existing tools for 16S-based function prediction.
An algorithm for taxonomic classification of 16S rDNA fragments. The C16S algorithm adopts a two-phase approach for the taxonomic classification of partial or full-length 16S rDNA sequences. First, the algorithm scores a given 16S rDNA fragment against precomputed genus-specific Hidden Markov Models (HMMs) and identifies the genus corresponding to the highest-scoring HMM. Then, it takes into account the quality of the HMM alignment and restricts the assignment to an appropriately higher taxonomic level. The high accuracy levels obtained with this algorithm indicate its suitability for obtaining an accurate snapshot of microbial diversity in an environment.
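The two-phase logic can be sketched as follows. This is a schematic illustration, not the C16S implementation: HMM scoring is assumed to have been done already, and the rank list, the `alignment_quality` value in [0, 1], and the per-rank thresholds are all hypothetical.

```python
# Ranks from most to least specific (hypothetical choice for this sketch).
RANKS = ["genus", "family", "order", "class", "phylum"]


def classify_fragment(hmm_scores, lineage, alignment_quality, thresholds):
    """Assign a fragment to the most specific rank its alignment supports.

    hmm_scores: genus name -> score of the fragment against that genus HMM.
    lineage: genus name -> dict mapping each rank to a taxon name.
    alignment_quality: quality of the best alignment (assumed in [0, 1]).
    thresholds: rank -> minimum quality required to report at that rank.
    """
    # Phase 1: pick the genus whose HMM scores highest.
    best_genus = max(hmm_scores, key=hmm_scores.get)
    # Phase 2: back off to a higher rank when the alignment is weak.
    for rank in RANKS:
        if alignment_quality >= thresholds[rank]:
            return rank, lineage[best_genus][rank]
    return "unclassified", None
```

With thresholds of, say, 0.9 for genus and 0.8 for family, a fragment whose best alignment has quality 0.85 would be reported at family level instead of being forced into a genus.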
