
STARS / Signal Tool for the Analysis of Relationships between Sequences

Groups DNA sequences by clustering. STARS uses the K-means algorithm and GSP methods to perform cluster analyses of DNA sequences. This software assesses the differentiation capability of the DNA sequences of molecular markers employed in phylogenetic analyses. Markers are selected according to three criteria: (1) the marker must code for a protein, (2) it should have been used across a wide range of the tree of life, and (3) it should have a homogeneous length and a minimum number of reported copies.
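
The clustering step can be illustrated with a minimal sketch. It assumes a simple numeric representation (normalized k-mer frequency vectors) as a stand-in for STARS's GSP encoding, and uses a plain Lloyd's K-means whose initial centroids are the first k sequences; the function names are hypothetical and not part of STARS.

```python
import math
from itertools import product

def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Normalized k-mer frequency vector over ACGT (stand-in for a GSP encoding)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def kmeans(points: list[list[float]], k: int, iters: int = 100) -> list[int]:
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, `kmeans([kmer_profile(s) for s in seqs], k=2)` separates AT-rich from GC-rich sequences when one of each is listed first. Real marker sequences would need a better initialization (e.g. k-means++) and a representation closer to the one STARS actually uses.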


Uses MinHash locality-sensitive hashing to reduce large sequences to a representative sketch and rapidly estimate pairwise distances between genomes or metagenomes. Using Mash, we explored several use cases, including a 5,000-fold size reduction and clustering of all ~55,000 NCBI RefSeq genomes in 46 CPU hours. The resulting 93 MB sketch database includes all RefSeq genomes, effectively delineates known species boundaries, reconstructs approximate phylogenies, and can be searched in seconds using assembled genomes or raw sequencing runs from Illumina, Pacific Biosciences, and Oxford Nanopore. For metagenomics, Mash scales to thousands of samples and can replicate Human Microbiome Project and Global Ocean Survey results in a fraction of the time.
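
A minimal sketch of the MinHash idea follows, assuming bottom-s sketches of canonical k-mers. It substitutes BLAKE2b for the MurmurHash3 function Mash uses, so hash values (not the method) differ from Mash's. The distance uses the Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index.

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def hash64(kmer: str) -> int:
    # 64-bit hash via BLAKE2b; Mash uses MurmurHash3, so this is a stand-in.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def sketch(seq: str, k: int = 21, s: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the s smallest hashes of the canonical k-mers."""
    hashes = {hash64(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]

def jaccard_estimate(a: list[int], b: list[int], s: int) -> float:
    """Estimate the Jaccard index from two bottom-s sketches."""
    sa, sb = set(a), set(b)
    union_bottom = sorted(sa | sb)[:s]  # bottom-s sketch of the union
    if not union_bottom:
        return 0.0
    shared = sum(1 for x in union_bottom if x in sa and x in sb)
    return shared / len(union_bottom)

def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); defined as 1 when j = 0."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k
```

Because only the s smallest hashes are kept, a sketch has a fixed size regardless of genome length, which is where the large size reduction comes from.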

IMNGS / Integrated Microbial NGS

Offers a complete workflow for de novo analysis of users’ own raw 16S rRNA gene amplicon datasets, for comparison with existing data. IMNGS is an innovative platform that uniformly and systematically screens for and processes all prokaryotic 16S rRNA gene amplicon datasets available in the Sequence Read Archive (SRA) and uses them to build sample-specific sequence databases and OTU-based profiles. Users can easily query this integrative sequence resource through a web interface.


Analyzes time-series data of microbial community profiles. MetaMIS is based on a Lotka-Volterra model and can interpret interaction networks. It tolerates high levels of missing data and ignores the influence of rare microbes when estimating interaction information. Thanks to its ability to organize multiple interaction networks into a consensus network, the tool can be used in comparative studies. It allows researchers to analyze interactions conveniently and to visualize network topology.
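
The generalized Lotka-Volterra model underlying this kind of analysis can be sketched as a forward simulation. Each taxon i grows as dx_i/dt = x_i (r_i + sum_j A_ij x_j), where r holds intrinsic growth rates and A the interaction matrix. This is an assumed, minimal explicit-Euler integrator; MetaMIS solves the inverse problem (estimating A from observed time series), which this sketch does not attempt.

```python
def glv_step(x: list[float], r: list[float], A: list[list[float]], dt: float) -> list[float]:
    """One explicit Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [
        x[i] + dt * x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]

def simulate(x0: list[float], r: list[float], A: list[list[float]],
             dt: float = 0.01, steps: int = 2000) -> list[list[float]]:
    """Return the trajectory, including the initial state."""
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(glv_step(traj[-1], r, A, dt))
    return traj
```

With a single taxon, r = [1.0] and A = [[-1.0]], the model reduces to logistic growth and the trajectory approaches a carrying capacity of 1. Negative off-diagonal entries in A encode competition, positive ones facilitation.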


Maintains, persists, and searches complete distance matrices; built on top of BugMat. findNeighbour comprises two components: (1) an OpenMP-parallelized C++ application, derived from BugMat, that maintains an in-memory distance matrix computed from mapped genomic data; and (2) a database that stores and queries arbitrary metadata about each sequence, including quality information such as the number of bases called.
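
The in-memory distance matrix plus per-sequence metadata can be illustrated with a small sketch. It assumes sequences are already mapped to a common reference (equal length) and that uncalled bases are written as `N`; the class and method names are hypothetical and do not reflect findNeighbour's actual API.

```python
def snp_distance(a: str, b: str) -> int:
    """Count positions where both bases are called (not 'N') and differ.
    Assumes both sequences are aligned to the same reference (equal length)."""
    return sum(1 for x, y in zip(a, b) if x != "N" and y != "N" and x != y)

class DistanceMatrix:
    """Hypothetical in-memory store: pairwise SNP distances plus metadata."""

    def __init__(self) -> None:
        self.seqs: dict[str, str] = {}
        self.dist: dict[tuple[str, str], int] = {}
        self.meta: dict[str, dict] = {}

    def add(self, name: str, seq: str) -> None:
        # Compute distances to every stored sequence, keeping the matrix complete.
        for other, s in self.seqs.items():
            d = snp_distance(seq, s)
            self.dist[(name, other)] = d
            self.dist[(other, name)] = d
        self.seqs[name] = seq
        # Quality metadata, e.g. the number of called bases.
        self.meta[name] = {"called_bases": sum(b != "N" for b in seq)}

    def neighbours(self, name: str, threshold: int) -> list[str]:
        """All other sequences within `threshold` SNPs of `name`."""
        return sorted(o for o in self.seqs if o != name and self.dist[(name, o)] <= threshold)
```

Adding a sequence costs one comparison per stored sequence, so the matrix stays complete and neighbour queries become simple lookups.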


Simulates horizontal gene transfers (HGTs) between the genomes of microbial communities. HgtSIM can introduce different degrees of similarity between the transferred genes in donor and recipient genomes. It can be used to assess the recovery rate of HGTs from a simulated metagenomic shotgun-sequencing dataset after various sequence assembly processes. The tool can assist in developing robust pipelines that maximize success in recovering HGTs from complex metagenomic data.
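
The core simulation idea, transferring a donor gene at a chosen sequence identity, can be sketched as follows. It assumes identity is controlled purely by random point substitutions (no indels) and that the insertion site is uniform over the recipient; both choices and the function names are illustrative assumptions, not HgtSIM's own procedure.

```python
import random

def mutate_to_identity(gene: str, identity: float, rng: random.Random) -> str:
    """Copy `gene` with round((1 - identity) * len) random substitutions."""
    n_mut = round((1 - identity) * len(gene))
    positions = rng.sample(range(len(gene)), n_mut)  # distinct positions
    seq = list(gene)
    for p in positions:
        # Substitute with a different base so every mutation changes the site.
        seq[p] = rng.choice([b for b in "ACGT" if b != seq[p]])
    return "".join(seq)

def transfer(gene: str, recipient: str, identity: float,
             rng: random.Random) -> tuple[str, str, int]:
    """Insert a mutated copy of `gene` at a random position in `recipient`.
    Returns (new recipient genome, inserted sequence, insertion site)."""
    mutated = mutate_to_identity(gene, identity, rng)
    site = rng.randrange(len(recipient) + 1)
    return recipient[:site] + mutated + recipient[site:], mutated, site
```

Passing a seeded `random.Random` keeps simulations reproducible, which matters when comparing how different assembly pipelines recover the same set of transfers.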


Identifies the “core microbiome” associated with a given habitat. COREMIC uses presence/absence data to perform an analysis that complements existing methods. It helps researchers develop working hypotheses when searching for microbes well suited to a habitat or to a host-microbe interaction. The tool can also be used to confirm laboratory studies that have identified target microbes thought to be important symbionts or to be associated with a specific habitat.
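
To make the presence/absence idea concrete, here is a sketch of a simple prevalence-threshold rule. A taxon counts as "core" if it is present in at least `min_in` of the habitat (in-group) samples and at most `max_out` of the comparison (out-group) samples. The thresholds and function names are assumptions; COREMIC uses its own statistical procedure rather than fixed cut-offs.

```python
def prevalence(taxon: str, group: list[set[str]]) -> float:
    """Fraction of samples in `group` in which `taxon` is present."""
    return sum(taxon in s for s in group) / len(group)

def core_taxa(samples: dict[str, set[str]], in_ids: list[str], out_ids: list[str],
              min_in: float = 0.9, max_out: float = 0.5) -> list[str]:
    """Taxa frequent in the habitat samples and comparatively rare elsewhere.
    samples: sample id -> set of taxa detected (presence/absence data)."""
    ingroup = [samples[s] for s in in_ids]
    outgroup = [samples[s] for s in out_ids]
    taxa = set().union(*ingroup)
    return sorted(
        t for t in taxa
        if prevalence(t, ingroup) >= min_in and prevalence(t, outgroup) <= max_out
    )
```

The out-group filter is what separates habitat-associated taxa from taxa that are simply ubiquitous across all samples.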


Reconstructs haplotypes from complex microbiomes. Hansel/Gretel is composed of a data structure for the storage and manipulation of evidence (Hansel) and an algorithm for the recovery of haplotypes from a metahaplome (Gretel). It can recover and rank haplotypes using evidence from pairs of single nucleotide polymorphisms (SNPs) observed on sequenced reads. It can extract haplotypes from metagenomic data of microbial communities and can be applied to analogous haplotyping problems.
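
The division of labour between an evidence store and a path-recovery algorithm can be sketched in simplified form. Here the "Hansel-like" part counts how often pairs of SNP alleles co-occur on the same read, and the "Gretel-like" part greedily picks, site by site, the allele best supported by the alleles already chosen. This greedy rule is an illustrative simplification, not Gretel's actual probabilistic model, and recovers only the single best haplotype.

```python
from collections import Counter
from itertools import combinations

def build_evidence(reads: list[dict[int, str]]) -> tuple[Counter, Counter]:
    """Count single-allele and allele-pair observations across reads.
    Each read is a dict {snp_index: allele} for the SNP sites it covers."""
    singles, pairs = Counter(), Counter()
    for read in reads:
        items = sorted(read.items())
        for i, a in items:
            singles[(i, a)] += 1
        for (i, a), (j, b) in combinations(items, 2):  # i < j after sorting
            pairs[(i, a, j, b)] += 1
    return singles, pairs

def greedy_haplotype(singles: Counter, pairs: Counter,
                     alleles_per_site: list[tuple[str, ...]]) -> str:
    """Choose one allele per site, left to right, preferring the allele with the
    most pairwise support from alleles already chosen (ties: single counts)."""
    path: list[str] = []
    for j, options in enumerate(alleles_per_site):
        def support(b: str, j: int = j) -> tuple[int, int]:
            pair_support = sum(pairs[(i, a, j, b)] for i, a in enumerate(path))
            return (pair_support, singles[(j, b)])
        path.append(max(options, key=support))
    return "".join(path)
```

Ranking several haplotypes would additionally require down-weighting the evidence already explained by each recovered path before searching again.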

REBACCA / Regularized Estimation of the BAsis Covariance based on Compositional dAta

Identifies significant co-occurrence patterns by finding sparse solutions to a rank-deficient system. To be specific, we construct the system using log ratios of count or proportion data and solve it using the l1-norm shrinkage method. Our comprehensive simulation studies show that REBACCA (i) generally achieves higher accuracy than existing methods when a sparsity condition is satisfied; (ii) controls false positives at a pre-specified level, while other methods fail in various cases; and (iii) runs considerably faster than the existing comparable method.
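
Why the system is rank-deficient can be shown with the standard compositional-data identity that such log-ratio systems are built from. Assume observed proportions $X_i$ arise from unobserved basis abundances $Y_i$, and let $\omega_{ij} = \operatorname{Cov}(\log Y_i, \log Y_j)$. Since $\log(X_i/X_j) = \log(Y_i/Y_j)$, each pair of taxa gives one linear equation:

$$\tau_{ij} = \operatorname{Var}\!\left[\log\frac{X_i}{X_j}\right] = \omega_{ii} + \omega_{jj} - 2\,\omega_{ij}, \qquad 1 \le i < j \le p.$$

This yields $p(p-1)/2$ equations in $p(p+1)/2$ unknowns, so the system is underdetermined by $p$. Stacking the unknowns in a vector $\boldsymbol{\omega}$ and the equations as $\boldsymbol{\tau} = A\boldsymbol{\omega}$, a sparse solution can be sought with an $\ell_1$ penalty; penalizing only the off-diagonal terms is an assumption made here for illustration:

$$\hat{\boldsymbol{\omega}} = \arg\min_{\boldsymbol{\omega}} \; \tfrac{1}{2}\lVert \boldsymbol{\tau} - A\boldsymbol{\omega} \rVert_2^2 + \lambda \sum_{i<j} \lvert \omega_{ij} \rvert.$$

For example, with $p = 3$ taxa there are 3 equations but 6 unknown covariances, which is why a sparsity assumption is needed to pick a solution.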


Provides a set of geographic utilities for sequencing-based microbial ecology studies. Although the geographic location of samples is an important aspect of environmental microbiology, none of the major software packages used to process microbiome data includes utilities that let users map and explore the spatial dimension of their data. phylogeo addresses this gap with plotting and mapping functions that can be used to visualize the geographic distribution of samples, to examine the relatedness of microbiomes using ecological distances, and to map the geographic distribution of particular sequences.
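
Relating geographic distance to ecological distance can be sketched by pairing great-circle (haversine) distances with Bray-Curtis dissimilarities for every pair of samples. This Python sketch only illustrates the kind of comparison described; it does not reproduce phylogeo's own functions, and the input layout is a hypothetical choice.

```python
import math
from itertools import combinations

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def geo_vs_ecological(samples: dict[str, tuple[float, float, list[float]]]):
    """samples: name -> (lat, lon, abundance vector).
    Returns (name_a, name_b, km, bray_curtis) for every sample pair."""
    rows = []
    for a, b in combinations(sorted(samples), 2):
        lat_a, lon_a, x = samples[a]
        lat_b, lon_b, y = samples[b]
        rows.append((a, b, haversine_km(lat_a, lon_a, lat_b, lon_b), bray_curtis(x, y)))
    return rows
```

Plotting the third column against the fourth gives a quick view of whether community dissimilarity increases with distance.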

CoNet / Co-occurrence Network

Detects significant non-random patterns of co-occurrence (copresence and mutual exclusion) in incidence and abundance data. CoNet can open new opportunities for targeted mechanistic studies of the microbial ecology of the human microbiome. It was designed with (microbial) ecological data in mind, but can be applied more generally to infer relationships between objects observed across samples (for example, between genes present or absent across organisms).
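
Scoring one pair of taxa for copresence or mutual exclusion can be sketched with a Spearman correlation and a permutation test. A positive correlation suggests copresence and a negative one mutual exclusion. This sketch uses a single measure chosen for illustration; CoNet itself can combine several measures and apply its own significance procedure.

```python
import random

def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_pvalue(x: list[float], y: list[float], n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided p-value: how often shuffled data match or beat |rho|."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Applied to every pair of taxa, the resulting p-values would also need multiple-testing correction before edges are added to a network.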

MIDAS / Metagenomic Intra-species Diversity Analysis System

Allows users to measure bacterial strain-level gene content, single nucleotide polymorphisms (SNPs), and species abundance from shotgun metagenomes. MIDAS can categorize genetic variants by strain, enabling large-scale population-genetic analyses of metagenomes. The application provides a computational pipeline that combines taxonomic profiling with pan-genome and whole-genome alignment, allowing users to compare their data against over 30,000 reference genomes.
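
Once reads are aligned to a reference genome, SNP calling reduces to inspecting per-site allele counts. The sketch below assumes a pileup given as base counts per position and calls a SNP when the site has enough depth and the second most common base exceeds a minimum minor-allele frequency. The thresholds and input format are hypothetical, not MIDAS defaults.

```python
def call_snps(pileup: dict[int, dict[str, int]], min_depth: int = 10,
              min_maf: float = 0.05) -> list[tuple[int, str, str, float]]:
    """pileup: position -> {base: read count}.
    Returns (position, major allele, minor allele, minor allele frequency)."""
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge allele frequencies
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # monomorphic site
        (major, _), (minor, n_minor) = ranked[0], ranked[1]
        maf = n_minor / depth
        if maf >= min_maf:
            calls.append((pos, major, minor, maf))
    return calls
```

Minor-allele frequencies tracked across samples are the raw material for grouping variants by strain.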


Characterizes microbial samples from nucleotide or protein sequences. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. The software suggests protein families associated with the presence of particular phenotypes. It may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.


Shares, validates, and documents mock community data resources. mockrobiota includes data set and sample metadata, expected composition data, and links to raw data for each mock community data set. It does not supply physical sample materials directly, but the metadata for each mock community indicate whether physical sample materials are available. mockrobiota currently requires expected observation data in the form of sequence annotations (e.g., taxonomy or gene annotations), but also accepts references to sequences in the form of accession numbers.

COMETS / Computation Of Microbial Ecosystems in Time and Space

Provides a modeling framework that integrates dynamic flux balance analysis (dFBA) with diffusion on a lattice. Because COMETS runs dFBA on a lattice, it can track the spatio-temporal dynamics of multiple microbial species in complex environments at full genome-scale resolution. This allows users to perform time-dependent metabolic simulations of microbial ecosystems, bridging the gap between stoichiometric and environmental modeling.
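
The coupling between local growth and diffusion on a lattice can be sketched on a 1D grid. In COMETS, the growth update at each lattice site comes from solving a flux balance linear program. Here that solve is replaced by an assumed placeholder (Michaelis-Menten uptake times a fixed biomass yield) so the sketch stays dependency-free. It therefore shows the lattice and time-stepping structure, not FBA itself.

```python
def diffuse(field: list[float], d_coef: float, dt: float, dx: float) -> list[float]:
    """Explicit finite-difference diffusion on a 1D lattice with no-flux
    boundaries. Stable when d_coef * dt / dx**2 <= 0.5."""
    n = len(field)
    c = d_coef * dt / dx ** 2
    out = field[:]
    for i in range(n):
        left = field[i - 1] if i > 0 else field[i]       # mirror at left edge
        right = field[i + 1] if i < n - 1 else field[i]  # mirror at right edge
        out[i] = field[i] + c * (left - 2 * field[i] + right)
    return out

def growth_step(biomass: list[float], nutrient: list[float], vmax: float, km: float,
                yield_coef: float, dt: float) -> tuple[list[float], list[float]]:
    """Placeholder for the per-site FBA solve: Michaelis-Menten uptake bound
    times a fixed yield (an assumption, not a linear program)."""
    new_b, new_n = biomass[:], nutrient[:]
    for i in range(len(biomass)):
        uptake = vmax * nutrient[i] / (km + nutrient[i]) * biomass[i] * dt
        uptake = min(uptake, nutrient[i])  # cannot consume more than is present
        new_n[i] -= uptake
        new_b[i] += yield_coef * uptake
    return new_b, new_n
```

Alternating `diffuse` and `growth_step` each time step conserves nutrient plus biomass divided by the yield, a useful sanity check when swapping in a real FBA solver.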

PPANINI / Prioritization and Prediction of functional Annotation for Novel and Important genes via automated data Network Integration

Ranks genes using a combination of community parameters such as prevalence and abundance across samples. PPANINI is a computational pipeline that can be used with or without gene clustering. Gene clustering groups similar unannotated sequences (at 97% identity in translated sequences), and importance is then calculated for the gene clusters rather than for individual genes. This increases the metagenomic prevalence of unannotated genes, and thus the probability that they are ranked higher on the PPANINI "importance" scale.
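
Combining prevalence and abundance into a single ranking can be sketched as follows. Each gene (or gene cluster) gets a percentile for prevalence and a percentile for mean abundance, and the two are combined with a harmonic mean, so a gene must score well on both to rank highly. The harmonic-mean rule is an assumption for illustration, not necessarily PPANINI's exact formula.

```python
def percentile_ranks(values: list[float]) -> list[float]:
    """Fraction of values less than or equal to each value (ties share rank)."""
    n = len(values)
    return [sum(v <= x for v in values) / n for x in values]

def prioritize(abundance: dict[str, list[float]]) -> list[str]:
    """abundance: gene -> per-sample abundances.
    Returns genes ordered by an assumed combined importance score."""
    genes = list(abundance)
    prevalence = [sum(a > 0 for a in abundance[g]) / len(abundance[g]) for g in genes]
    mean_abund = [sum(abundance[g]) / len(abundance[g]) for g in genes]
    p_prev = percentile_ranks(prevalence)
    p_abund = percentile_ranks(mean_abund)
    # Harmonic mean of the two percentiles (both are >= 1/n, so no zero division).
    scores = {g: 2 * p * q / (p + q) for g, p, q in zip(genes, p_prev, p_abund)}
    return sorted(genes, key=scores.get, reverse=True)
```

Summing abundances over the members of a gene cluster before calling `prioritize` mirrors the clustered mode described above, where the cluster rather than the individual gene is ranked.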


Uses elastic net regularization to infer which gene families are predictive and then combines these gene families to estimate the composite metabolomes. MelonnPan is composed of two high-level workflows: MelonnPan-Fit and MelonnPan-Prediction. The MelonnPan-Fit workflow creates a weight matrix that links an optimal set of gene families to a subset of predictable metabolites, following rigorous internal and external validation. The MelonnPan-Prediction workflow takes a gene family table and outputs a metabolomic table.
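
The fit/predict split can be sketched for a single metabolite with elastic net regression solved by coordinate descent. The objective is (1/2n)||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||^2), where rows of X are samples and columns are gene families. This pure-Python version assumes centred inputs and omits the intercept and cross-validation; it is not MelonnPan's implementation.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X: list[list[float]], y: list[float], lam: float = 0.1,
                alpha: float = 0.5, n_iter: int = 200) -> list[float]:
    """Coordinate descent; assumes X columns and y are centred (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the partial residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

def predict(X: list[list[float]], w: list[float]) -> list[float]:
    """Prediction step: gene family profiles times fitted weights."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]
```

Fitting one weight vector per metabolite and stacking them column-wise gives a gene-family-by-metabolite weight matrix; prediction is then a matrix product with a new gene family table.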


Creates sample identifiers that are unique across projects, project teams, and institutions and that have the following properties: they are short, correctable with respect to common types of transcription errors, opaque, and compatible with existing standards without relying on centralized infrastructure. cual-id assigns each sample a globally unique universally unique identifier (UUID) and generates human-friendly 4- to 12-character identifiers that map to these UUIDs and are unique within a project.
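
The mapping from short, project-unique identifiers to global UUIDs can be sketched with Python's standard `uuid` and `secrets` modules. The alphabet, which drops easily confused characters (0/o, 1/l/i), and the case-insensitive lookup are assumptions meant to illustrate transcription-friendliness; they are not cual-id's actual scheme, which also supports error correction.

```python
import secrets
import uuid

# Hypothetical alphabet: digits and lowercase letters without 0, 1, i, l, o.
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"

class ProjectRegistry:
    """Maps short, project-unique human-friendly IDs to global UUIDs."""

    def __init__(self, length: int = 6) -> None:
        if not 4 <= length <= 12:
            raise ValueError("length must be between 4 and 12")
        self.length = length
        self.short_to_uuid: dict[str, uuid.UUID] = {}

    def new_sample(self) -> tuple[str, uuid.UUID]:
        full = uuid.uuid4()  # random, globally unique identifier
        while True:
            short = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
            if short not in self.short_to_uuid:  # retry on collision within project
                self.short_to_uuid[short] = full
                return short, full

    def lookup(self, short: str):
        # Normalize common transcription variants (whitespace, case) first.
        return self.short_to_uuid.get(short.strip().lower())
```

The short ID is only guaranteed unique within one registry (project); global uniqueness comes from the UUID it maps to.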