Homology-Based Taxonomic Software Tools | Shotgun metagenomic sequencing data analysis

A majority of methods available for binning datasets obtained using shotgun sequencing belong to the taxonomy-dependent category. Based on the strategy used for comparing reads with sequences/pre-computed models, taxonomy-dependent methods can be sub-classified into alignment-based, composition-based and hybrid methods.

MEGAN / MEtaGenome ANalyzer
Allows users to taxonomically and functionally explore and analyze large-scale microbiome sequencing data. MEGAN is a comprehensive microbiome analysis toolbox for metagenome, metatranscriptome, amplicon and other types of data. Users can perform taxonomic, functional or comparative analysis, map reads to reference sequences, build reference-based multiple alignments, perform reference-guided assembly, and integrate their own classifications.
MetaPhlAn / Metagenomic Phylogenetic Analysis
Estimates the relative abundance of microbial cells by mapping reads against a reduced set of clade-specific marker sequences. MetaPhlAn accurately profiles microbial communities and requires only minutes to process millions of metagenomic reads. The classifier compares each metagenomic read from a sample against this precomputed marker catalog using nucleotide BLAST searches to identify high-confidence matches, and from these matches it derives clade abundances for one or more sequenced metagenomes.
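The marker-based abundance idea described above can be illustrated with a minimal Python sketch. It is not MetaPhlAn's code: the marker table, the hit counts, the length normalization and the function name relative_abundance are all assumptions made for illustration.

```python
from collections import defaultdict

def relative_abundance(marker_info, read_hits):
    """marker_info: marker id -> (clade, marker length in bp).
    read_hits: marker id -> number of reads matched to that marker.
    Returns clade -> relative abundance (fractions that sum to 1)."""
    hits = defaultdict(int)
    length = defaultdict(int)
    for marker, (clade, n_bp) in marker_info.items():
        length[clade] += n_bp
        hits[clade] += read_hits.get(marker, 0)
    # Normalize read counts by total marker length (an assumed normalization)
    # so that clades with more or longer markers are not over-counted.
    coverage = {c: hits[c] / length[c] for c in length if hits[c] > 0}
    total = sum(coverage.values())
    return {c: cov / total for c, cov in coverage.items()} if total else {}

# Hypothetical marker catalog and hit counts.
markers = {
    "m1": ("Escherichia", 1000),
    "m2": ("Escherichia", 1000),
    "m3": ("Bacteroides", 500),
}
hits = {"m1": 30, "m2": 30, "m3": 15}
profile = relative_abundance(markers, hits)
```

In this toy input both clades have the same per-base marker coverage, so each receives half of the profile even though their raw hit counts differ.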
PhymmBL
A classification system designed for metagenomics experiments that assigns taxonomic labels to short DNA reads. PhymmBL combines two components: (i) composition-directed taxonomic predictions from Phymm and (ii) basic local alignment search tool (BLAST)-based homology results. PhymmBL combines these to label each input sequence with its best guess as to the taxonomy of the source organism. Input sequences as short as 100 base pairs can be phylogenetically classified with PhymmBL more accurately than with any other existing method. PhymmBL predicts species, genus, family, order, class and phylum for each read, allowing users to arrange results according to levels of specificity relevant to their research goals.
AMPHORA / AutoMated PHylogenomic infeRence Application
Allows genome tree reconstruction and metagenomic phylotyping. AMPHORA is an application for large-scale protein phylogenetic analysis. The software supports the analysis of DNA sequences, which means that users can apply AMPHORA2 directly to metagenomic reads without the need to first annotate the sequences. It can phylotype metagenomic sequences from a mixed population of bacteria and archaea and should be useful for the study of microbial evolution and ecology in the genomic era. A web application and a flavor of AMPHORA2 are also available.
CLARK / CLAssifier based on Reduced K-mers
An approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to the best state-of-the-art tools and that it is significantly faster than any of its competitors. In its fastest single-threaded mode CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions.
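As a rough illustration of classification with reduced, target-specific k-mers, the Python sketch below keeps only k-mers that occur in a single target and lets each read vote with them. It is a toy under stated assumptions, not CLARK's implementation: the function names, the tie rule and the example sequences are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_discriminative_index(targets, k):
    """targets: label -> reference sequence.
    Returns k-mer -> label, keeping only k-mers found in exactly one label."""
    owner, shared = {}, set()
    for label, seq in targets.items():
        for km in kmers(seq, k):
            if km in shared:
                continue
            if km in owner and owner[km] != label:
                del owner[km]      # seen in two labels: not discriminative
                shared.add(km)
            else:
                owner[km] = label
    return owner

def classify(read, index, k):
    """Assign the label with the most discriminative k-mer hits, or None."""
    votes = Counter(index[km] for km in kmers(read, k) if km in index)
    ranked = votes.most_common(2)
    if not ranked:
        return None                # no discriminative k-mer in the read
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None                # tie between the two best labels
    return ranked[0][0]

# Hypothetical references; "ACGT" occurs in both and is therefore discarded.
targets = {"A": "ACGTACGTTTGACA", "B": "ACGTGGGCCCATAT"}
index = build_discriminative_index(targets, 4)
label = classify("GGGCCCAT", index, 4)
```

Discarding shared k-mers up front is what makes the index "reduced": lookups at classification time touch only k-mers that carry taxonomic signal.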
MetaCluster-TA
Bins and annotates short paired-end reads. MetaCluster-TA is an assembly-assisted approach which, instead of annotating each read or assembled contig separately, bins similar reads/contigs into the same cluster and annotates the whole cluster. The software consists of three phases: (i) construction of long virtual contigs from assembly and probabilistic grouping of short reads, (ii) q-mer distribution estimation and clustering and (iii) cluster annotation and merging.
Centrifuge
A rapid and sensitive classifier for microbial sequences with low memory requirements and a speed comparable to the fastest systems. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge classifies 10 million reads against a database of all complete prokaryotic and viral genomes within 20 minutes using one CPU core and requiring less than 8 GB of RAM. Furthermore, Centrifuge can also build an index for NCBI’s entire nt database of non-redundant sequences from both prokaryotes and eukaryotes. The search requires a computer system with 128 GB of RAM, but runs over 3500 times faster than Megablast.
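The BWT/FM-index scheme mentioned above can be sketched in a few lines of Python. This is a textbook, uncompressed version for illustration only and makes no claim about Centrifuge's optimized data structures; the function names build_fm_index and find are hypothetical, and the naive suffix sort is only suitable for tiny inputs.

```python
def build_fm_index(text):
    """Build a toy FM index: suffix array, first-column offsets, occurrence table."""
    text += "$"  # sentinel, assumed to sort before every other symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def find(pattern, index):
    """Backward search: return sorted start positions of pattern in the text."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

fm = build_fm_index("ACGTACGA")
positions = find("ACG", fm)  # start positions of "ACG" in the indexed text
```

Backward search processes the pattern from its last character to its first, narrowing a suffix-array interval at each step, so the cost of a query depends on the pattern length rather than on the size of the indexed text.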
Taxonomer
An ultrafast web-tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Using real-world case-studies, we show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.
taxMaps
Performs taxonomic mapping of short-read data. taxMaps is designed to deal with large DNA/RNA metagenomics datasets. It can prioritize mapping to multiple indexes, produces detailed mapping reports and offers interactive results visualization. The tool gives researchers a way to conduct very sensitive searches on very large databases. It provides class-leading accuracy and comprehensiveness while balancing performance. taxMaps appears to be useful in pathogen identification from clinical or environmental samples.
RITA / Rapid Identification of Taxonomic Assignments
Uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and it classifies sequences with higher precision than the current best rank-flexible classifier.
Identifies genome-specific markers (GSMs) from currently sequenced microbial genomes using a k-mer based approach. The identified GSMs can be used to detect microbial strains/species in metagenomes, especially in the human microbiome, for which many reference genomes are available. Two levels of GSMs, strain-specific and species-specific, are currently supported. The approach can be applied directly to raw metagenomes to identify microbial strains/species, without complex data pre-processing.
MetaCV / Metagenome Composition Vector
A composition and phylogeny-based algorithm to classify very short metagenomic reads (75-100 bp) into specific taxonomic and functional groups. MetaCV performs as well as BlastX-based methods (in both sensitivity and specificity) on simulated short reads, but runs 300 times faster, and thus provides fast and accurate analysis of huge amounts of NGS data. To our knowledge, MetaCV, which benefits from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within affordable time.
k-SLAM / k-mer Sorted List Alignment and Metagenomics
A metagenomic classifier for the characterization of metagenomic data. k-SLAM uses a k-mer method to rapidly produce alignments of the reads against a database and can therefore find genes and variants. A novel pseudo-assembly technique chains neighboring alignments together to improve taxonomic specificity. The main data structure is a sorted list of k-mers which makes k-SLAM extremely fast and parallelizable. In human microbiome metagenomics, k-SLAM could be used to identify gut parasites and also to detect plant sequence that can indicate dietary components.
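The sorted-list idea can be sketched as follows in Python: k-mers from reads and reference genomes go into one list, sorting brings identical k-mers next to each other, and each read/genome pair then votes for an alignment offset. This is a simplified illustration, not k-SLAM's implementation; the record layout, the function names and the offset vote are assumptions.

```python
from collections import Counter
from itertools import groupby

def kmer_records(name, seq, k, is_read):
    """One (k-mer, is_read, sequence name, position) record per k-mer."""
    return [(seq[i:i + k], is_read, name, i) for i in range(len(seq) - k + 1)]

def shared_kmers(genomes, reads, k):
    """Return (read, genome, offset) for every k-mer shared by a read and a genome."""
    records = []
    for name, seq in genomes.items():
        records += kmer_records(name, seq, k, False)
    for name, seq in reads.items():
        records += kmer_records(name, seq, k, True)
    records.sort()  # after sorting, identical k-mers are adjacent in the list
    matches = []
    for _, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        refs = [r for r in group if not r[1]]
        queries = [r for r in group if r[1]]
        for _, _, read, rpos in queries:
            for _, _, genome, gpos in refs:
                # gpos - rpos is where the read would start on the genome.
                matches.append((read, genome, gpos - rpos))
    return matches

# Hypothetical toy data: the read matches the genome starting at position 2.
genomes = {"g1": "TTACGTACCA"}
reads = {"r1": "ACGTAC"}
support = Counter(shared_kmers(genomes, reads, 4))
```

Offsets supported by several k-mers are candidate alignments. Because the work is dominated by one large sort, the approach parallelizes naturally.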
SATIVA / Semi-Automatic Taxonomy Improvement and Validation Algorithm
A phylogeny-aware method to automatically identify taxonomically mislabeled sequences (mislabels) using statistical models of evolution. SATIVA uses the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically proposes a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels.
GraftM
Enables phylogenetically informed classification of metagenomic sequences. GraftM uses the open reading frame finder OrfM to translate nucleotide sequences, hidden Markov models (HMMs) or DIAMOND-based pairwise comparison to search for target gene families, and phylogenetic placement using pplacer for classification. It can be used to screen large public datasets for genes of interest (e.g. McrA), which led to the discovery of three novel McrA-containing lineages distinct from the Euryarchaeota, Bathyarchaeota and Verstraetearchaeota.
SMURF / Short MUltiple Regions Framework
Enables identification of near full-length 16S rRNA gene sequences in microbial communities. SMURF may be applied to standard sample preparation protocols with very few modifications. It also paves the way to high-resolution profiling of low-biomass and fragmented DNA, as in formalin-fixed, paraffin-embedded samples, fossil-derived DNA or DNA exposed to other degrading conditions. This approach is not restricted to combining amplicons of the 16S rRNA gene and may be applied to any set of amplicons.
LMAT / Livermore Metagenomics Analysis Toolkit
A method that shifts computational costs to an offline computation by creating a taxonomy/genome index that supports scalable metagenomic classification. LMAT is designed to efficiently assign taxonomic labels to as many reads as possible in very large metagenomic datasets and report the taxonomic profile of the input sample. The quick 'single pass' analysis of every read makes it possible to follow up with more computationally expensive analyses, such as metagenomic assembly or sensitive database searches, on targeted subsets of reads.
MetaCache
Builds an index data structure and classifies reads. MetaCache advances the state of the art in k-mer based read classification approaches. It uses minhashing to index only a subset of k-mers, reducing the size of the index data structure significantly. This tool employs context-aware k-mer matches within a local window rather than at any position of the whole genome. It aims at context-aware classification providing highly competitive accuracy while consuming significantly less memory than other approaches.
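A minimal Python sketch of windowed minhashing is given below. It only illustrates the general technique: the hash function, the window layout, the voting rule and all names (kmer_hash, sketch, build_index, classify) are assumptions and do not reproduce MetaCache's implementation.

```python
import hashlib
from collections import Counter

def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(seq, k, s):
    """Minhash sketch: the s smallest distinct k-mer hashes of seq."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]

def build_index(genomes, k, s, window):
    """Map each sketched hash to the (genome, window number) pairs containing it."""
    index = {}
    stride = window - k + 1  # windows overlap by k-1 bases, so no k-mer is lost
    for name, seq in genomes.items():
        starts = range(0, max(len(seq) - k + 1, 1), stride)
        for w, start in enumerate(starts):
            for h in sketch(seq[start:start + window], k, s):
                index.setdefault(h, []).append((name, w))
    return index

def classify(read, index, k, s):
    """Return the (genome, window) sharing the most sketch hashes with the read."""
    votes = Counter()
    for h in sketch(read, k, s):
        for location in index.get(h, []):
            votes[location] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy genomes; the read is copied from the second window of g1.
genomes = {"g1": "ACGTTGCAAGGCTTAACCGGATCC", "g2": "TTTTTTTTTTGGGGGGGGGG"}
index = build_index(genomes, k=4, s=3, window=10)
read = genomes["g1"][7:17]
hit = classify(read, index, k=4, s=3)
```

Only s hashes per window are stored instead of every k-mer, which is where the memory saving comes from, and a hit reports a window within a genome rather than just the genome.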
Taxator-tk
Performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa.
MARTA / Metagenomic AND rDNA Taxonomic Assignment
A suite of Java-based software to better provide taxonomic assignments to DNA sequences. MARTA is useful for protistologists, virologists, mycologists and other microbial ecologists. The program relies on NCBI utilities including the BLAST software and Taxonomy database and is easily manipulated at the command-line to specify a BLAST candidate's query-coverage or percent identity requirements; other options include the ability to set minimal consensus requirements (%) for each of the eight major taxonomic ranks (Domain, Kingdom, Phylum, ...) and whether to consider lower scoring candidates when the top-hit lacks taxonomic classification.
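The filtering and per-rank consensus options described above could work roughly as in the Python sketch below. It is not MARTA's code (MARTA is Java-based, and its exact rules are not given here): the hit fields, the three example ranks, the thresholds and the function names are hypothetical.

```python
from collections import Counter

def filter_hits(hits, min_identity, min_coverage):
    """Keep BLAST candidates that meet percent-identity and query-coverage cutoffs."""
    return [h for h in hits
            if h["identity"] >= min_identity and h["coverage"] >= min_coverage]

def consensus_assignment(hits, ranks, min_agreement):
    """Walk ranks from broad to specific and stop at the first rank where the
    most common taxon falls below that rank's required agreement fraction."""
    lineages = [h["lineage"] for h in hits]
    assignment = {}
    for rank in ranks:
        names = [lin[rank] for lin in lineages if lin.get(rank)]
        if not names:
            break
        taxon, count = Counter(names).most_common(1)[0]
        if count / len(lineages) < min_agreement[rank]:
            break
        assignment[rank] = taxon
        # Assumed design choice: only hits consistent so far vote at lower ranks.
        lineages = [lin for lin in lineages if lin.get(rank) == taxon]
    return assignment

# Hypothetical BLAST candidates for one query sequence.
hits = [
    {"identity": 98.0, "coverage": 95.0,
     "lineage": {"domain": "Eukaryota", "phylum": "Ciliophora", "genus": "Paramecium"}},
    {"identity": 97.0, "coverage": 92.0,
     "lineage": {"domain": "Eukaryota", "phylum": "Ciliophora", "genus": "Tetrahymena"}},
    {"identity": 96.0, "coverage": 90.0,
     "lineage": {"domain": "Eukaryota", "phylum": "Apicomplexa", "genus": "Plasmodium"}},
    {"identity": 70.0, "coverage": 40.0,
     "lineage": {"domain": "Bacteria", "phylum": "Proteobacteria", "genus": "Escherichia"}},
]
ranks = ["domain", "phylum", "genus"]  # illustrative subset of ranks
thresholds = {"domain": 1.0, "phylum": 0.6, "genus": 0.8}
kept = filter_hits(hits, min_identity=90.0, min_coverage=80.0)
result = consensus_assignment(kept, ranks, thresholds)
```

With these inputs the weak bacterial hit is filtered out, the remaining candidates agree at domain level and mostly at phylum level, and the assignment stops before genus because the two ciliate hits disagree.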
TIPP / Taxon Identification and Phylogenetic Profiling
A marker-based method developed for abundance profiling of metagenomic data. TIPP combines SATé-enabled phylogenetic placement, a phylogenetic placement method, with statistical techniques to control the classification precision and recall, and results in improved abundance profiles. TIPP is highly accurate even in the presence of high indel errors and novel genomes, and matches or improves on previous approaches, including NBC, mOTU, PhymmBL, MetaPhyler and MetaPhlAn.
DUDes
A reference-based taxonomic profiler that introduces a novel top-down approach to analyze metagenomic NGS samples. Rather than predicting an organism's presence in the sample based only on relative abundances, DUDes first identifies possible candidates by comparing the strength of the read mapping in each node of the taxonomic tree in an iterative manner. Instead of using the lowest common ancestor (LCA), we propose a new approach: the deepest uncommon descendent (DUD). We showed in experiments that DUDes works for single and multiple organisms and can identify low-abundance taxonomic groups with high precision. Additionally, DUDes provides a strain identification method that can propose one or more strains present in a sample.
PROTAX
A probabilistic method for taxonomical classification of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX decomposes the probability of one to the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but that do not have reference sequences, the possibility of unknown taxonomical units, as well as mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can utilize any kind of sequence similarity measures or the outputs of other classifiers as predictors.
FOCUS2
A sensitive and fast solution to bin metagenomic samples. FOCUS2 first runs FOCUS to predict the taxa in the sample and then refines the profiling using a fast aligner with a reduced version of the PATRIC database created on the fly. The PATRIC database opens new horizons in the metagenomics binning world because it is over 12x bigger than previous databases and brings many new taxa into classification. The speed, sensitivity, and precision of FOCUS2 position metagenomics to capitalize on expanding databases and ask novel interdisciplinary questions currently beyond reach.
MetaTrans
Analyzes the structure and functions of active microbial communities using the power of multi-threaded computers. MetaTrans is designed to perform two types of RNA-Seq analyses: taxonomic and gene expression. It performs quality-control assessment and rRNA removal, maps reads against functional databases and also handles differential gene expression analysis. Its efficacy was validated by analyzing data from synthetic mock communities, data from a previous study and data generated from twelve human fecal samples.
WebMGA
A customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools for tasks such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis and functional annotation. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analyses or to configure and run customized pipelines. WebMGA offers researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.