A classification system designed for metagenomics experiments that assigns taxonomic labels to short DNA reads. PhymmBL combines two components: (i) composition-directed taxonomic predictions from Phymm and (ii) basic local alignment search tool (BLAST)-based homology results, and uses them together to label each input sequence with its best guess as to the taxonomy of the source organism. Input sequences as short as 100 base pairs can be phylogenetically classified with PhymmBL more accurately than with any other existing method. PhymmBL predicts species, genus, family, order, class and phylum for each read, allowing users to arrange results according to the levels of specificity relevant to their research goals.
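The idea of merging a composition score with a homology score can be sketched as a per-taxon weighted sum. This is a minimal illustration, not PhymmBL's code: the `Candidate` record, the combination form `imm_score + weight * (offset - log10(E))` and the default `weight`/`offset` values are hypothetical assumptions chosen only to show how a strong BLAST hit can outweigh a slightly better composition score.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    taxon: str
    imm_score: float     # composition-based log-likelihood (higher is better); assumed input
    blast_evalue: float  # best BLAST E-value for this taxon (lower is better); assumed input


def combined_score(c, weight=1.0, offset=4.0):
    """Hypothetical combination: composition score plus a weighted BLAST term."""
    # Clamp the E-value so log10 never sees 0 for a perfect hit.
    evalue = max(c.blast_evalue, 1e-300)
    return c.imm_score + weight * (offset - math.log10(evalue))


def best_taxon(candidates, weight=1.0, offset=4.0):
    """Return the taxon whose combined score is highest."""
    return max(candidates, key=lambda c: combined_score(c, weight, offset)).taxon
```

For example, a taxon with a slightly worse composition score (-100 vs -98) but a much stronger BLAST hit (E = 1e-10 vs 1e-2) wins under these illustrative weights.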
A program for unsupervised binning of metagenomic contigs. CONCOCT uses nucleotide composition (k-mer frequencies), coverage data across multiple samples and linkage data from paired-end reads.
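Combining composition and coverage amounts to building one feature vector per contig. The sketch below is a simplified assumption, not CONCOCT's implementation: it uses k = 4, a pseudocount of 1, log-transformed k-mer frequencies concatenated with log relative per-sample coverage, and it does not merge reverse complements or perform the dimensionality reduction and clustering steps that a real binner would apply afterwards.

```python
import itertools
import math


def kmer_profile(seq, k=4):
    """Normalized k-mer frequencies with a pseudocount of 1 per k-mer.

    Simplification: k-mers are not merged with their reverse complements.
    """
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [1.0] * len(kmers)  # pseudocount avoids log(0) below
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts]


def feature_vector(seq, coverages, k=4):
    """Concatenate log k-mer frequencies with log relative per-sample coverage."""
    comp = [math.log(f) for f in kmer_profile(seq, k)]
    # Pseudocount of 1 per sample, then normalize so the coverage profile sums to 1.
    denom = sum(coverages) + len(coverages)
    cov = [math.log((c + 1.0) / denom) for c in coverages]
    return comp + cov
```

A contig observed in three samples therefore yields a vector of 4^4 + 3 = 259 values.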
Integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins from a very large assembly consisting of millions of contigs in a matter of hours on a single node.
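Tetranucleotide frequency (TNF) is usually computed over canonical 4-mers, where each 4-mer is merged with its reverse complement, giving 136 features. The sketch below computes a TNF profile and a plain Euclidean distance between two contigs. It is an illustration only: MetaBAT converts such raw distances into probabilities using empirically derived distributions, which this sketch does not attempt.

```python
import itertools
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    return s.translate(COMPLEMENT)[::-1]


def canonical_tetramers():
    """Each 4-mer is represented by the lexicographically smaller of itself and its reverse complement."""
    return sorted({min(km, revcomp(km))
                   for km in ("".join(p) for p in itertools.product("ACGT", repeat=4))})


CANON = canonical_tetramers()
INDEX = {km: i for i, km in enumerate(CANON)}


def tnf(seq):
    """Relative frequencies of canonical tetranucleotides in a sequence."""
    counts = [0.0] * len(CANON)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        km = seq[i:i + 4]
        if set(km) <= set("ACGT"):  # ignore windows with ambiguous bases
            counts[INDEX[min(km, revcomp(km))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def tnf_distance(a, b):
    """Euclidean distance between two TNF profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(tnf(a), tnf(b))))
```

Because counting is canonical, a contig and its reverse complement have identical profiles, which matters since assemblers may output either strand.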
Allows users to bin and annotate short paired-end reads. MetaCluster-TA is an assembly-assisted approach which, instead of annotating each read or assembled contig separately, bins similar reads or contigs into the same cluster and annotates the whole cluster. The software consists of three phases: (i) construction of long virtual contigs from assembly and probabilistic grouping of short reads, (ii) q-mer distribution estimation and clustering and (iii) cluster annotation and merging.
A software tool for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. By providing assembled metagenomic sequences together with either read coverage information or the sequencing reads themselves, users can recover the underlying bins (genomes) of the microbes in their metagenomes. For convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content and genome size, in the binning summary page. Once binning is finished, users can run MEGAN or similar software on the MaxBin bins to determine the taxonomy of each bin.
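To make the Expectation-Maximization step concrete, the sketch below fits a mixture of Poisson distributions to contig coverage values and assigns each contig to its most probable bin. This is a deliberately reduced assumption: it uses coverage only, treats it as Poisson-distributed, and takes user-supplied initial means, whereas MaxBin also incorporates tetranucleotide signals and its own initialization.

```python
import math


def poisson_logpmf(x, lam):
    """log P(x | Poisson(lam)); x may be a non-integer coverage value."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_poisson_mixture(coverages, init_means, n_iter=100):
    """Fit a K-component Poisson mixture to coverages; return means, weights, hard labels."""
    means = [float(m) for m in init_means]
    k = len(means)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each contig belongs to each bin.
        resp = []
        for x in coverages:
            logs = [math.log(weights[j]) + poisson_logpmf(x, means[j]) for j in range(k)]
            top = max(logs)  # log-sum-exp trick for numerical stability
            probs = [math.exp(l - top) for l in logs]
            s = sum(probs)
            resp.append([p / s for p in probs])
        # M-step: re-estimate mixing weights and Poisson means (floored to stay positive).
        for j in range(k):
            rj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(rj / len(coverages), 1e-12)
            means[j] = max(sum(r[j] * x for r, x in zip(resp, coverages)) / rj, 1e-9)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, weights, labels
```

With coverages of roughly 5x and 50x, the two components converge to means near 5 and 50.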
Performs taxonomic assignment for bacterial 16S and amplicon sequencing. HmmUFOtu is a pipeline that aims to help users determine microbial community composition and diversity. It places every read from the submitted sequences within a known reference tree, then performs phylogeny-based operational taxonomic unit (OTU) clustering and produces an assignment for each read. It supports a wide range of DNA substitution models, including GTR and HKY85.
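As an example of the substitution models mentioned above, the HKY85 instantaneous rate matrix sets the rate from base i to base j to kappa * pi_j for transitions (A<->G, C<->T) and pi_j for transversions, with diagonal entries making each row sum to zero. The sketch below builds that matrix and rescales it to one expected substitution per unit time. It illustrates the model itself and is not taken from HmmUFOtu's code.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(pi, kappa):
    """HKY85 rate matrix for equilibrium frequencies pi (order A, C, G, T) and ts/tv ratio kappa."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # Transitions are scaled by kappa; transversions are not.
                q[i][j] = pi[j] * (kappa if (a, b) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    # Normalize so the expected substitution rate at equilibrium is 1.
    rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[v / rate for v in row] for row in q]
```

With uniform frequencies and kappa = 1 the matrix reduces to the Jukes-Cantor (JC69) model, where every off-diagonal rate equals 1/3.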
Improves the contig bins produced by existing binning tools. D2SBin calculates the dissimilarity between a contig and a bin's center based on a Markov model of k-tuple sequence composition. The software favors a relative sequence composition model over the direct use of absolute sequence composition. In addition, it relies only on k-tuple composition, so it can be applied to a single metagenomic sample.
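The "relative" composition idea can be illustrated with the d2S-style dissimilarity: k-tuple counts are centered by subtracting the counts expected under a background Markov model, and the centered profiles are then compared. For brevity this sketch assumes an order-0 (independent bases) background and compares two sequences directly; D2SBin compares a contig against a bin center and may use higher-order Markov backgrounds.

```python
import itertools
import math
from collections import Counter


def centered_counts(seq, k):
    """Observed k-tuple counts minus counts expected under an order-0 background.

    Assumes seq contains at least one A, C, G or T.
    """
    seq = seq.upper()
    words = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    base = Counter(c for c in seq if c in "ACGT")
    total_bases = sum(base.values())
    n_words = sum(counts[w] for w in words)  # only words made of A, C, G, T
    centered = {}
    for w in words:
        p = 1.0
        for c in w:
            p *= base[c] / total_bases
        centered[w] = counts[w] - n_words * p
    return centered


def d2s_dissimilarity(a, b, k=4):
    """d2S-style dissimilarity in [0, 1]; 0 means identical centered profiles."""
    x, y = centered_counts(a, k), centered_counts(b, k)
    num = sx = sy = 0.0
    for w in x:
        d = math.sqrt(x[w] ** 2 + y[w] ** 2)
        if d == 0:
            continue  # word absent from both centered profiles
        num += x[w] * y[w] / d
        sx += x[w] ** 2 / d
        sy += y[w] ** 2 / d
    if sx == 0 or sy == 0:
        raise ValueError("a centered profile is identically zero")
    return 0.5 * (1.0 - num / math.sqrt(sx * sy))
```

By the Cauchy-Schwarz inequality the normalized term lies in [-1, 1], so the result always falls between 0 and 1.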
Performs common tasks in metagenomic data analysis from raw read quality control to bin extraction and analysis. MetaWRAP provides a collection of modules, each being a standalone program addressing one aspect of WMG data processing or analysis, including read quality control (QC), assembly, visualization, taxonomic profiling, and binning. Users can follow the intuitive workflow or use only specific functions. Its modularity gives the investigator flexibility in their analysis approach.
A Java-based application which offers efficient and intuitive reference-independent visualization of metagenomic datasets from single samples for subsequent human-in-the-loop inspection and binning. The method is based on nonlinear dimension reduction of genomic signatures and exploits the superior pattern recognition capabilities of the human eye-brain system for cluster identification and delineation. We demonstrate the general applicability of VizBin for the analysis of metagenomic sequence data by presenting results from two cellulolytic microbial communities and one human-borne microbial consortium. The superior performance of our application compared to other analogous metagenomic visualization and binning methods is also presented.
A general framework that automatically bins contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. First, it employs L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering through sparsity regularization. In addition, the COCACOLA framework seamlessly incorporates customized knowledge to improve binning accuracy.
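Why the choice between L1 and Euclidean distance matters can be shown with nearest-center assignment: the same point may be assigned to different centers under the two metrics. This toy sketch illustrates only that effect; it does not reproduce COCACOLA's initialization or its sparsity-regularized optimization, and the helper names are hypothetical.

```python
import math


def l1_distance(a, b):
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers, dist):
    """Index of the center closest to point under the given metric."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))
```

For the point (0, 0) with centers (2, 2) and (0, 3), L1 distances are 4 and 3 (second center wins), while Euclidean distances are about 2.83 and 3 (first center wins). L1 penalizes many small per-feature differences more than one large difference, relative to Euclidean distance.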
Allows analysis of large sets of amplicon sequences and yields abundance tables of Operational Taxonomic Units (OTUs) with their taxonomic affiliation. FROGS is a set of 13 tools, designed for biologists and bioinformaticians, that processes amplicon reads coming from Illumina or Roche sequencing technologies. The software can produce accurate community compositions, including at fine scales (species or genus) and in large communities (>100 different species) with very heterogeneous abundances.
Metagenomes are often characterized by high levels of unknown reads with no similarity to any sequences in GenBank. Although these are often discarded from analysis, they contain a wealth of information for comparative metagenomics. crAss is a tool that enables fast and intuitive analysis of complete metagenomic datasets by counting the number of shared contigs between samples in a cross-assembly of all reads.
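The core counting step can be sketched once the cross-assembly is done and each contig is known to contain reads from a given set of samples. The input format (a mapping from contig ID to the set of contributing sample names) and the function name are hypothetical; crAss derives further distance measures from such counts, which are not shown here.

```python
from itertools import combinations


def shared_contig_counts(contig_samples, samples):
    """Count, for every pair of samples, the cross-assembly contigs containing reads from both.

    contig_samples: dict mapping contig ID -> set of sample names whose reads it contains.
    """
    shared = {pair: 0 for pair in combinations(sorted(samples), 2)}
    for present in contig_samples.values():
        for pair in combinations(sorted(present), 2):
            if pair in shared:  # ignore samples not listed in `samples`
                shared[pair] += 1
    return shared
```

Pairs of samples that never co-occur in a contig keep a count of zero, so the result covers every pair.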
Represents a shotgun metagenome from an arbitrary environment as a modified de Bruijn graph consisting of simplified components. MetaFast lies between k-mer spectrum analysis and assembly, and combines the best of these two alignment-free approaches: the speed of the former and the precision of the latter. Because it is reference-independent, it performs efficiently for both extensively studied and novel microbiota types. For multiple metagenomes, the resulting representation is used to obtain a pairwise similarity matrix. The dimensional structure of the metagenomic components preserved in our algorithm reflects the inherent subspecies-level diversity of microbiota. MetaFast is computationally efficient and especially promising for the analysis of metagenomes from novel environmental niches.
Allows users to batch-process FASTA and FASTQ files from amplicon sequencing studies. SEED simplifies clustering, quality filtering and trimming, taxonomic identification, the creation and description of molecular taxa and their phylogenetic placement, and the quick assessment of basic microbial community statistics. Moreover, it includes a graphical user interface (GUI) for processing data from Illumina, Ion Torrent and Sanger sequencing.
An assembly-assisted approach for reference-free metagenomic binning. MetaProb can handle both short and long reads in a novel probabilistic framework by using probabilistic sequence signatures. We compared binning performance on several short- and long-read datasets against other state-of-the-art binning algorithms, showing that MetaProb achieves the best F-measure in most cases. MetaProb can also estimate the number of species in a metagenomic sample, adding a degree of freedom to the analysis.
Investigates and captures the complex structure of metagenomic datasets. BMC3C runs multiple clusterings on a dataset with different initializations or algorithms. It represents contigs using independent statistics of codon usage. The tool combines the advantages of the base clustering methods while mitigating, or even avoiding, their disadvantages.
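One common way to combine several base clusterings is a co-association (evidence accumulation) matrix: for each pair of contigs, record the fraction of runs that put them in the same cluster, then link pairs that co-occur in more than a chosen fraction of runs. The sketch below is a generic ensemble-clustering example under that assumption; BMC3C's actual consensus procedure and its codon-usage features are not reproduced here, and the 0.5 threshold is illustrative.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster label."""
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1
    runs = len(labelings)
    return [[v / runs for v in row] for row in m]


def consensus_clusters(labelings, threshold=0.5):
    """Link items whose co-association exceeds threshold; return connected components as labels."""
    m = co_association(labelings)
    n = len(m)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Because cluster IDs are arbitrary in each run, comparing pairs rather than raw labels lets runs with different numbers or orderings of clusters vote together.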
Separates short paired-end reads from different organisms in a metagenomic dataset. TOSS uses abundance levels to separate genomes. It is able to distinguish unique l-mers from repeats. The tool starts by constructing a graph of l-mers and then clusters the unique l-mers. It can be used for very short reads and is able to handle multiple genomes with arbitrary abundance levels and sequencing errors.
An automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, the visualization of metagenomes in MyCC aids in the reconstruction of genomes from distinct bins.
Uses tetranucleotide frequencies, differential coverage and read mapping information to bin assembled contigs. MetaWatt uses DIAMOND blastx, HMMER and ARAGORN for quality control. MetaWatt is very fast, runs on a normal PC or laptop and offers a graphical user interface for effective data exploration.
Identifies, selects and maps ribosomal reads onto the 16S ribosomal gene, with the option to perform taxonomic classification. riboFrame can compare the taxonomic performance of different variable regions by selecting post hoc the region to be analyzed. It detects and positions ribosomal reads among a large number of short reads and then proceeds with taxonomic classification.