Unlock your biological data


A pipeline for constructing operational taxonomic units (OTUs) de novo from next-generation reads that achieves high accuracy in biological sequence recovery and improves richness estimates on mock communities. UPARSE works by quality-filtering reads, trimming them to a fixed length, optionally discarding singleton reads and then clustering the remaining reads. UPARSE reports OTU sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.
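The preprocessing steps described above (quality filtering, fixed-length trimming, optional singleton removal) can be sketched as follows. This is a minimal illustration, not UPARSE's own code: the function names, the expected-error filter and the `max_ee` threshold are assumptions chosen for the example, and the clustering step itself is left out.

```python
from collections import Counter


def expected_errors(phred_scores):
    """Sum of the per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def preprocess(reads, trim_len, max_ee=1.0, drop_singletons=True):
    """Filter, trim and dereplicate reads before clustering.

    reads: iterable of (sequence, phred_scores) pairs.
    Returns (sequence, abundance) pairs sorted by decreasing abundance.
    max_ee is a hypothetical quality threshold used for this sketch.
    """
    counts = Counter()
    for seq, quals in reads:
        # 1. Quality filter: drop reads with too many expected errors.
        if expected_errors(quals) > max_ee:
            continue
        # 2. Trim to a fixed length; reads shorter than that are dropped.
        if len(seq) < trim_len:
            continue
        counts[seq[:trim_len]] += 1
    # 3. Optionally discard sequences that were seen only once.
    min_count = 2 if drop_singletons else 1
    kept = [(s, n) for s, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


example_reads = [
    ("ACGTAC", [40] * 6),
    ("ACGTAC", [40] * 6),
    ("ACGTTT", [40] * 6),  # singleton
    ("ACGTAC", [2] * 6),   # low quality
    ("ACG", [40] * 3),     # too short
]
print(preprocess(example_reads, trim_len=6))
```

The abundance-sorted output is the usual input for a greedy clustering step, which would then pick centroids from the most abundant sequences first.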
Assists users in extracting partial ribosomal RNA (rRNA) sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. Metaxa is a program that enables the utilization of any genetic marker for taxonomic classification of metagenome and amplicon sequence data. Several functionalities are offered by this tool, such as: (1) sorting out specific non-target sequences from the dataset; or (2) producing separate files for paired-end reads matching rRNA for further downstream analysis.
A clustering method that exploits USEARCH to assign sequences to clusters. UCLUST is superior to CD-HIT in several respects: it is usually significantly faster, uses significantly less memory, can cluster at lower identities and is more sensitive. While CD-HIT often fails to identify the closest cluster, or overlooks that a match is possible (false negative), UCLUST rarely misses a match and in most cases finds the best possible match. UCLUST also enables rapid clustering of much larger numbers of sequences.
RDP Classifier / Ribosomal Database Project Classifier
Provides rapid taxonomic placement and summary data based on rRNA sequence data. For each high-throughput experiment, the RDP Classifier can include the number of input sequences belonging to each taxon. For query sequences from regions of bacterial diversity with less-defined taxonomy, the RDP Classifier tends to provide classification results with low confidence estimates. It can also be adapted to additional phylogenetically coherent bacterial taxonomies.
An algorithm for hierarchical clustering analysis of massive sequence data. To avoid confusion, we note that ESPRIT-Tree is not a program for determining phylogenetic trees, but rather for producing hierarchical clusters of sequences based on sequence similarity, using a tree-like data structure. We extended the concept of space partition used by previous methods for handling sequence data of varying lengths. By assuming that sequence data lives in a pseudometric space, we created a distance-based partition of the data without explicitly defining an inner-product operator to divide the space, and organized the partition results in a pseudometric-based partition tree. By repeatedly applying the triangle inequality, a fast closest-pair searching algorithm was developed within the ESPRIT-Tree framework. An efficient method for dynamic insertion and deletion of tree nodes was also developed.
microPITA / microbiomes: Picking Interesting Taxonomic Abundance
Picks interesting taxonomic abundance. microPITA is a computational tool enabling sample selection in two-stage (tiered) studies. Using two-stage designs can more efficiently allocate resources, reducing study costs, and maximizing the use of samples. A selection of samples can be performed to target various microbial communities including: (i) samples with the most diverse community (maximum diversity), (ii) samples dominated by specific microbes (targeted feature), (iii) samples with microbial communities representative of the survey (representative dissimilarity) or (iv) samples with the most extreme microbial communities in the survey (most dissimilar).
dbOTU / Distribution-based OTU calling
Provides an algorithm to inform the creation of OTUs for large next-generation sequencing studies employing the distribution of 16S rRNA sequences. dbOTU was implemented in three different ways. The third version is based on the Levenshtein edit distance and uses the number of single-position insertions, deletions, or substitutions required to modify one sequence into another, as an approximation for the sequence dissimilarity, with the aim of increasing its efficiency.
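As an illustration of the edit distance mentioned above, here is a minimal sketch of the standard dynamic-programming Levenshtein computation. It is not taken from dbOTU; the `dissimilarity` helper, which normalizes by the longer sequence length, is an assumption added for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-position insertions, deletions or
    substitutions needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance scaled by the longer length (an assumed normalization)."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


print(levenshtein("ACGT", "AGT"))
print(dissimilarity("ACGT", "ACGA"))
```

The row-by-row formulation keeps memory proportional to the shorter sequence, which matters when many pairs of reads have to be compared.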
CROP / Clustering 16S rRNA for OTU Prediction
Provides a clustering tool that automatically determines the best clustering result for 16S rRNA sequences at different phylogenetic levels. Our study shows that CROP gives accurate clustering results, both in terms of the number of clusters and their abundance levels, for various types of 16S rRNA datasets. In contrast, the standard hierarchical clustering strategy, even with the preclustering process and the average linkage method, still frequently overestimates the number of operational taxonomic units (OTUs) in the presence of sequencing errors, resulting in an underestimation of the abundance level of the underlying OTUs. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods.
Solves the problems of arbitrary global clustering thresholds and centroid selection induced input-order dependency, and creates robust and more natural Operational Taxonomic Units (OTUs) than current greedy, de novo, scalable clustering algorithms. The purpose of Swarm is to provide a novel clustering algorithm that handles massive sets of amplicons. Results of traditional clustering algorithms are strongly input-order dependent, and rely on an arbitrary global clustering threshold. Swarm results are resilient to input-order changes and rely on a small local linking threshold, representing the maximum number of differences between two amplicons. Swarm forms stable, high-resolution clusters, with a high yield of biological information.
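The idea of a small local linking threshold can be sketched as below. This is a simplified illustration, not Swarm's implementation: it ignores amplicon abundances, counts only substitutions between equal-length sequences, and all function names are assumptions made for the example.

```python
from collections import deque


def linked(a, b, d):
    """True if two equal-length amplicons differ at no more than d positions.
    Amplicons of different lengths are never linked in this simplified sketch."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= d


def local_linking_clusters(amplicons, d=1):
    """Grow each cluster by repeatedly adding amplicons within d differences
    of any amplicon already in the cluster."""
    unassigned = set(amplicons)
    clusters = []
    for seed in sorted(unassigned):  # sorted only to make the output deterministic
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            neighbours = sorted(s for s in unassigned if linked(current, s, d))
            for s in neighbours:
                unassigned.discard(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(sorted(cluster))
    return sorted(clusters)


amplicons = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC"]
print(local_linking_clusters(amplicons, d=1))
print(local_linking_clusters(list(reversed(amplicons)), d=1))
```

Because membership depends only on which pairs are within `d` differences, both calls return the same clusters regardless of input order. "AATT" joins the first cluster through "AAAT" even though it is two differences away from "AAAA", which a single global threshold around one centroid would not allow.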
Enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. It has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. Comprehensive help pages, tutorials and videos are provided via a wiki page.
A classification method for 16S rDNA sequence samples that uses the natural structure of microbial community data encoded by a phylogenetic tree. We showed that using the phylogenetic information leads to an improved classification accuracy compared with the state-of-the-art classification algorithms. Unlike many popular classification methods, which consider features (or operational taxonomic unit (OTU) frequencies) in isolation, our method takes advantage of the similarities between OTUs encoded by the phylogenetic tree.
Classifies ribosomal RNA sequences in terms of their taxonomy and operational taxonomic unit (OTU) classification. MAPseq uses a reference set of full-length ribosomal RNA sequences for which taxonomies are known, and for which a set of high-quality OTU clusters has been previously generated. It provides sequence read mapping against hierarchically clustered and annotated reference sequences. This tool can be applied to individual samples but it can also be used to analyze very large and diverse sequence collections.
DOTUR / Distance based OTU and Richness determination
Assigns sequences to OTUs (operational taxonomic units) by using either the furthest, average, or nearest neighbor algorithm for each distance level. DOTUR uses the frequency at which each OTU is observed to construct rarefaction and collector’s curves for various measures of richness and diversity. It was designed to calculate various diversity indices and richness estimators. Diversity indices and richness estimators are useful to compare the relative complexity of two or more communities and to estimate the completeness of sampling of a community.
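A rarefaction curve of the kind described above can be computed directly from OTU frequencies. The sketch below uses the standard analytic formula for the expected number of OTUs in a random subsample drawn without replacement; it is an illustration rather than DOTUR's own procedure, and the function name is an assumption.

```python
from math import comb


def rarefaction(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n sequences.

    otu_counts: number of sequences assigned to each OTU.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the total
    number of sequences and N_i the size of OTU i.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of sequences")
    all_subsamples = comb(total, n)
    # comb(total - c, n) counts the subsamples that miss OTU i entirely;
    # it is 0 when n exceeds total - c.
    return sum(1 - comb(total - c, n) / all_subsamples for c in otu_counts)


counts = [5, 3, 2]
curve = [rarefaction(counts, n) for n in range(sum(counts) + 1)]
print(curve)
```

The curve starts at 0, rises steeply while new OTUs are still being encountered, and flattens toward the observed number of OTUs; a curve that has not flattened suggests that sampling of the community is incomplete.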
NINJA-OPS / NINJA Is Not Just Another - OTU Picking Solution
Takes advantage of Burrows-Wheeler (BW) alignment using an artificial reference chromosome composed of concatenated reference sequences, the “concatesome”, as the BW input. NINJA-OPS also allows for convenient quality control of data, such as fast reverse complementing, base pair trimming, and a specialized denoising transformation. This method can transform an entire MiSeq run into a QIIME-formatted BIOM table in under 10 minutes on a laptop, achieving higher accuracy and more exact matches than USEARCH. It implements several pre-filtering methods that elicit substantial speedup when coupled with existing tools.
Performs parallel hierarchical clustering of sequences. ESPRIT-Forest is an algorithm with a cluster version. The software inherits the same pipeline as ESPRIT and ESPRIT-Tree, which performs pre-processing, hierarchical clustering and statistical analysis. The algorithm organizes sequences into a pseudometric-based partitioning tree for sub-linear time searching of nearest neighbors, and then uses a new multiple-pair merging criterion to construct clusters in parallel using multiple threads.
MICCA / MICrobial Community Analysis
Provides accurate results, reaching a good compromise between modularity and usability. MICCA is a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. It provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves understanding of the structure of environmental and human-associated microbial communities.
PhyloToAST / Phylogenetic Tools for Analysis of Species-level Taxa
Distributes BLAST-based OTU picking across computing clusters. PhyloToAST provides several improved/new visualization methods, tools for filtering and sub-setting results files, simple name lookup for OTU IDs, and finally, exposes the API used to build all of these tools for interested developers. In addition, PhyloToAST enables easy reproducibility by displaying the azimuth and angle during the interactive 3D plotting mode, and allowing users to input those values in later sessions.
Analyses DNA barcode datasets. jMOTU uses an explicit and deterministic algorithm to define molecular operational taxonomic units. It is useful for both individual specimen-based Sanger sequencing surveys and bulk-environment metagenetic surveys using long-read next-generation sequencing (NGS) data. The tool can analyse tens of thousands of sequences in a short time on a desktop computer. jMOTU aims to use a distance metric that reflects the genuine genetic distance between sequences, as this is most likely to give clusterings that correspond to biological reality.
16S Classifier
A Random Forest-based tool developed to carry out fast, efficient and accurate taxonomic classification of 16S rRNA sequences. 16S Classifier has the unique ability to classify small hypervariable regions of 16S rRNA. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.
CLUSTOM-CLOUD / CLUSTering 16S NGS sequences by Overlap Minimization
A distributed clustering program that can efficiently and accurately cluster 16S sequences under distributed and cloud-computing environments. CLUSTOM-CLOUD is a significant upgrade to its predecessor, CLUSTOM. The enhancements include: (i) implementation of k-mer transformation, (ii) removal of duplicate sequences (dereplication), and importantly (iii) the implementation of IMDG technology to store data directly into RAM rather than hard disks of individual nodes. Importantly, CLUSTOM-CLOUD inherits the high accuracy of its ancestor CLUSTOM, as also confirmed by the comparative exercise.
A fast clustering tool specifically designed for clustering highly-similar DNA sequences. Given a set of sequences and a sequence similarity threshold, DNACLUST creates clusters whose radius is guaranteed not to exceed the specified threshold. Underlying DNACLUST is a greedy clustering strategy that owes its performance to novel sequence alignment and k-mer based filtering algorithms. DNACLUST can also produce multiple sequence alignments for every cluster, allowing users to manually inspect clustering results, and enabling more detailed analyses of the clustered data.
Infers Operational Taxonomic Units (OTUs) from massive 16S rRNA sequences with high accuracy and low computational complexity. DBH is a clustering method that consists of two distinct elements: (i) a seed selection strategy based on de Bruijn (DB) graph theory is introduced to reduce read errors and (ii) a greedy heuristic clustering procedure is employed to decrease the computational burden, avoiding the large memory required for storing seeds and/or a distance matrix. This method can also efficiently handle large-scale datasets.
MSClust / Multi-Seeds based Clustering algorithm
An adaptive multi-seeds based heuristic clustering method that avoids the large memory needed for storing seeds and/or a distance matrix. MSClust uses a greedy heuristic strategy to build one cluster at a time. Each cluster is expanded from a limited initial set with multi-seeds, where the initial multi-seeds are generated based on an adaptive strategy. Unassigned sequences are then compared to the seeds sequentially. A new sequence is added to the current cluster and removed from the input if the average distance between the sequence and the seeds is smaller than the user-defined threshold; otherwise, the sequence is marked as unassigned.
Filters unclassified and/or rare operational taxonomic units from 16S rRNA gene sequence libraries by screening against consensus structural models for small-subunit (SSU) rRNA. SSUnique promotes the exploration of unclassified diversity in microbiome research and enables the discovery of substantial novel taxonomic lineages through the analysis of a large variety of existing data sets. SSUnique contains visualization tools for exploring phylogenetic novelty in microbiome data, especially useful for very large data sets.
Divides a set of amplicon reads into clusters. OTUCLUST is a sequence-clustering application that performs sequence dereplication and chimera removal. This method is based on a strategy in which the clusters are constructed incrementally by comparing an abundance-ordered list of input sequences against the representative set of already-chosen sequences. The procedure is composed of three main steps: (i) dereplication and abundance estimation, (ii) de novo chimera removal (optional, with UCHIME) and (iii) clustering using the dereplicated sequences as centroids.
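The incremental, abundance-ordered strategy described above can be sketched in a few lines. This is an illustration under stated assumptions, not OTUCLUST's code: the optional chimera removal step is omitted, `difflib.SequenceMatcher` stands in for a proper alignment-based identity, and the function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; a real pipeline would use
    an alignment-based sequence identity instead."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(reads, threshold=0.97):
    """Cluster reads around centroids chosen in order of decreasing abundance.

    Returns a dict mapping each centroid to its (sequence, abundance) members.
    """
    # Step (i): dereplication and abundance estimation.
    abundance = Counter(reads)
    ordered = sorted(abundance.items(), key=lambda item: (-item[1], item[0]))

    # Step (iii): compare each sequence against the centroids chosen so far.
    clusters = {}
    for seq, count in ordered:
        for centroid in clusters:
            if similarity(seq, centroid) >= threshold:
                clusters[centroid].append((seq, count))
                break
        else:
            # No existing centroid is similar enough: start a new cluster.
            clusters[seq] = [(seq, count)]
    return clusters


reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAT"] * 2 + ["TTTTGGGGCC"] * 3
for centroid, members in greedy_cluster(reads, threshold=0.85).items():
    print(centroid, members)
```

Processing the most abundant sequences first reflects the assumption that abundant sequences are more likely to be error-free, so they make better centroids than rare variants.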
