The 16S ribosomal RNA gene [or 16S ribosomal DNA (rDNA)] is a commonly used marker for bacterial identification because of its universal distribution among bacterial species and its slow rate of sequence evolution. To reduce the dimensionality of the large sequence datasets generated by high-throughput sequencing of 16S rDNAs, the reads are clustered into operational taxonomic units (OTUs) (Ye, 2011) that roughly represent taxa at a phylogenetic level set by a user-defined sequence similarity cutoff. The abundance of each OTU is defined as the number of sequences in the OTU. Representative sequences from each OTU are chosen and used to assign taxonomy to the OTUs and to construct phylogenetic trees.
Measures the phylogenetic distance between sets of taxa in a phylogenetic tree as the fraction of the branch length of the tree that leads to descendants from either one environment or the other, but not both. UniFrac can be used to determine whether communities are significantly different, to compare many communities simultaneously using clustering and ordination techniques, and to measure the relative contributions of different factors, such as chemistry and geography, to similarities between samples.
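The branch-length fraction described above can be illustrated with a minimal sketch. It assumes the tree has already been flattened into a list of branches, each given as a length plus the set of tip taxa below it; the toy tree `BRANCHES`, the taxon names, and the function name `unweighted_unifrac` are hypothetical and are not part of the UniFrac software.

```python
from typing import Iterable, Set, Tuple

# Hypothetical toy tree ((A,B),(C,D)): each entry is
# (branch length, set of tip taxa descending from that branch).
BRANCHES = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (0.5, {"A", "B"}),
    (2.0, {"C"}),
    (1.5, {"D"}),
    (0.5, {"C", "D"}),
]


def unweighted_unifrac(
    branches: Iterable[Tuple[float, Set[str]]],
    env1: Set[str],
    env2: Set[str],
) -> float:
    """Fraction of observed branch length that leads to only one environment."""
    unique = 0.0  # branch length leading to exactly one environment
    total = 0.0   # branch length leading to at least one environment
    for length, tips in branches:
        in1 = bool(tips & env1)
        in2 = bool(tips & env2)
        if in1 or in2:
            total += length
            if in1 != in2:
                unique += length
    # Two empty communities share no observed branch length; report 0.0.
    return unique / total if total else 0.0


if __name__ == "__main__":
    # Environment 1 contains taxa A and B; environment 2 contains B and C.
    print(unweighted_unifrac(BRANCHES, {"A", "B"}, {"B", "C"}))
```

Identical communities give a distance of 0 and communities with no shared branches give 1. This sketch covers only the presence/absence form of the measure and ignores sequence abundances.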
A clustering method that uses the USEARCH algorithm to assign sequences to clusters. Compared with CD-HIT, UCLUST is usually significantly faster, uses significantly less memory, can cluster at lower identities, and is more sensitive. Whereas CD-HIT often fails to identify the closest cluster or misses a possible match altogether (a false negative), UCLUST rarely misses a match and in most cases finds the best possible match. UCLUST also enables rapid clustering of much larger numbers of sequences.
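As a rough illustration of assigning sequences to clusters at an identity cutoff, the sketch below uses a simple greedy, centroid-based scheme. It is not the UCLUST implementation: the position-wise `identity` function assumes equal-length, pre-aligned sequences and stands in for the USEARCH search step, and all function names and the 0.9 cutoff are hypothetical.

```python
from typing import Dict, List, Sequence


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes equal-length, pre-aligned input)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Sequence[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to its best-matching centroid, or start a new cluster."""
    clusters: Dict[str, List[str]] = {}  # centroid sequence -> member sequences
    for seq in seqs:
        best_centroid = None
        best_identity = -1.0
        for centroid in clusters:
            ident = identity(seq, centroid)
            if ident > best_identity:
                best_centroid, best_identity = centroid, ident
        if best_centroid is not None and best_identity >= threshold:
            clusters[best_centroid].append(seq)
        else:
            clusters[seq] = [seq]  # the sequence becomes a new centroid
    return clusters


if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AAAAAAAAAA"]
    for centroid, members in greedy_cluster(reads, threshold=0.9).items():
        # The OTU abundance is the number of sequences in the cluster.
        print(centroid, len(members))
```

The result depends on input order, because the first sequence that fails to match any existing centroid becomes the centroid of a new cluster.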
Aims to be a comprehensive software package that lets users analyze community sequence data with a single piece of software. mothur builds upon previous tools to provide a flexible and powerful package for analyzing sequencing data. Extensive community-supported documentation and support are available through a MediaWiki-based wiki and a discussion forum.
Assigns sequences to OTUs (operational taxonomic units) by using either the furthest, average, or nearest neighbor algorithm for each distance level. DOTUR uses the frequency at which each OTU is observed to construct rarefaction and collector’s curves for various measures of richness and diversity. It was designed to calculate various diversity indices and richness estimators. Diversity indices and richness estimators are useful to compare the relative complexity of two or more communities and to estimate the completeness of sampling of a community.
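The kinds of quantities derived from OTU frequencies can be sketched as follows. The text above does not specify which formulas DOTUR implements, so the choices here are assumptions made for illustration: the bias-corrected Chao1 richness estimator, the Shannon diversity index, and an analytical expected-richness formula for one point on a rarefaction curve. All function names are hypothetical.

```python
import math
from typing import Sequence


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # OTUs observed exactly once
    f2 = sum(1 for c in counts if c == 2)  # OTUs observed exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over the observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefied_richness(counts: Sequence[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n sequences."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of sequences")
    denom = math.comb(total, n)
    # An OTU with c sequences is absent from the subsample with
    # probability C(total - c, n) / C(total, n).
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


if __name__ == "__main__":
    otu_counts = [10, 5, 2, 2, 1, 1]  # hypothetical OTU abundances
    print("Chao1:", chao1(otu_counts))
    print("Shannon:", shannon(otu_counts))
    # One rarefaction point per subsample size.
    for n in (1, 5, 10, 21):
        print(n, rarefied_richness(otu_counts, n))
```

A rarefaction curve that is still rising at the full sample size suggests that sampling of the community is incomplete, which is also what a Chao1 estimate well above the observed OTU count indicates.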
Provides rapid taxonomic placement and summary data based on rRNA sequence data. For each high-throughput experiment, the RDP Classifier can report the number of input sequences belonging to each taxon. For query sequences from regions of bacterial diversity with less-defined taxonomy, the RDP Classifier tends to provide classification results with low confidence estimates. It can also be adapted to additional phylogenetically coherent bacterial taxonomies.
A pipeline for constructing operational taxonomic units (OTUs) de novo from next-generation reads that achieves high accuracy in biological sequence recovery and improves richness estimates on mock communities. UPARSE works by quality-filtering reads, trimming them to a fixed length, optionally discarding singleton reads and then clustering the remaining reads. UPARSE reports OTU sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.
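The preprocessing steps listed above (quality filtering, trimming to a fixed length, optional singleton removal) can be sketched as below. This is not the UPARSE code. The expected-error filter is an assumed choice of quality criterion, reads are assumed to arrive as (sequence, Phred scores) pairs, and the function names, thresholds, and toy reads are hypothetical. The clustering step itself is omitted.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def expected_errors(quals: Sequence[int]) -> float:
    """Sum of per-base error probabilities, using P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def preprocess(
    reads: Iterable[Tuple[str, Sequence[int]]],
    max_ee: float = 1.0,
    trim_len: int = 8,
    drop_singletons: bool = True,
) -> List[Tuple[str, int]]:
    """Filter, trim, and dereplicate reads; return (sequence, count) by abundance."""
    kept = []
    for seq, quals in reads:
        if expected_errors(quals) > max_ee:
            continue  # quality filter (assumed criterion)
        if len(seq) < trim_len:
            continue  # too short to trim to the fixed length
        kept.append(seq[:trim_len])
    counts = Counter(kept)
    if drop_singletons:
        # A singleton is a unique sequence seen only once after trimming.
        counts = Counter({s: n for s, n in counts.items() if n > 1})
    return counts.most_common()


if __name__ == "__main__":
    toy_reads = [
        ("ACGTACGTAA", [40] * 10),
        ("ACGTACGTCC", [40] * 10),
        ("ACGTACGTAA", [2] * 10),   # low quality, removed by the filter
        ("TTTTTTTTTT", [40] * 10),  # becomes a singleton after trimming
        ("ACGT", [40] * 4),         # shorter than trim_len, removed
    ]
    print(preprocess(toy_reads))
```

Trimming to a fixed length before dereplication means that reads differing only past the trim point collapse into one unique sequence, which is why the first two toy reads are counted together.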
Allows analysis of large phylogenetic datasets. Fast UniFrac facilitates rapid identification of patterns in large and complex datasets and can expand insight into a wide range of microbial processes. The tool can perform lineage-specific analyses by including only a subset of the tree, which gives insight into the specific lineages responsible for associations with ecologically important variables.