

Ray

Assembles reads obtained with new sequencing technologies (Illumina, 454, SOLiD) using MPI 2.2. Ray helps reduce the number of contigs and the number of errors, and can serve as a basis for developing a universally applicable assembler. The tool computes assemblies in parallel using the message passing interface (MPI). Ray performs very well on mixed datasets and helps to assemble genomes from high-throughput sequencing data.


SSAKE

Provides a de novo assembler for short DNA sequence reads. SSAKE is designed to help leverage the information from short sequence reads by assembling them into contigs and scaffolds that can be used to characterize novel sequencing targets. SSAKE assembles whole reads (not k-mers) and, as such, is well suited for structural variant assembly and detection. SSAKE is written in Perl and runs on Linux. It cycles through short sequence reads stored in a hash table and progressively searches through a prefix tree for extension candidates. The algorithm has assembled 25 to 300 bp reads (genome, transcriptome, amplicon) from viral, bacterial and fungal genomes. SSAKE is lightweight, robust, and simple to set up and run.
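The greedy extension idea described above can be illustrated with a minimal sketch. It is not SSAKE's code: a plain dictionary keyed by read prefix stands in for the prefix tree, the overlap length is fixed, and the names `index_by_prefix` and `extend_seed` are hypothetical.

```python
from collections import defaultdict


def index_by_prefix(reads, overlap):
    """Map each prefix of length `overlap` to the set of reads starting with it."""
    index = defaultdict(set)
    for read in reads:
        if len(read) > overlap:
            index[read[:overlap]].add(read)
    return index


def extend_seed(seed, reads, overlap):
    """Greedily extend `seed` at its 3' end using whole reads.

    Extension stops at a dead end (no candidate) or an ambiguity
    (more than one candidate). Each read is used at most once.
    """
    index = index_by_prefix(reads, overlap)
    used = {seed}
    contig = seed
    while True:
        candidates = index.get(contig[-overlap:], set()) - used
        if len(candidates) != 1:
            break
        read = next(iter(candidates))
        used.add(read)
        contig += read[overlap:]  # append only the part past the overlap
    return contig


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "ACGGTT"]
    print(extend_seed("ACGTAC", reads, overlap=4))
```

Stopping on ambiguity is a deliberately conservative choice for the sketch; a real assembler would weigh coverage and consensus before giving up on an extension.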


Orione

Allows integrative investigation of next generation sequencing (NGS) microbiology data. Orione supports the whole life cycle of microbiology research data, from production and annotation to publication and sharing. It can be used for a variety of microbiological projects, including bacterial resequencing, de novo assembly and microbiome investigations. This tool is implemented on the Galaxy web platform.


Cortex

Allows de novo genome assembly and multisample variant calling. Cortex is a modular set of multi-threaded programs for manipulating assembly graphs. The Linked de Bruijn Graph (LdBG) data structure and its associated algorithms are implemented as part of the software. It was used for two tasks where long-range information is likely to be beneficial: finding large differences from a reference, and analyzing the genomic context of drug resistance genes, which was validated using a PacBio reference assembled for the sample.

MHAP / MinHash Alignment Process

A reference implementation of a probabilistic sequence overlapping algorithm. MHAP is designed to efficiently detect all overlaps between noisy long-read sequence data. It efficiently estimates Jaccard similarity by compressing sequences to their representative fingerprints composed of min-mers (minimum k-mers). MHAP is included within the Canu assembler, which is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).
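As a rough illustration of how min-mer fingerprints estimate Jaccard similarity, here is a minimal MinHash sketch. It is not MHAP's implementation: the seeded BLAKE2b hash, the k-mer size, the sketch size, and the function names are all assumptions chosen for readability.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (seq must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Seeded 64-bit hash; each seed acts as an independent hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(seq, k=5, num_hashes=128):
    """Fingerprint: for each hash function, keep the minimum hash over all k-mers."""
    ks = kmers(seq, k)
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where the two fingerprints agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(a, b, k=5):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    a = "ACGTACGTTAGCCGATAGCTAGCTAGGATCCA"
    b = "ACGTACGTTAGCCGATAGCTAGCTAGGTTCCA"
    print(exact_jaccard(a, b), estimate_jaccard(sketch(a), sketch(b)))
```

The estimate converges on the exact value as the number of hash functions grows, while each sequence is compared through a fixed-size fingerprint rather than its full k-mer set.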

CLC Genomics Workbench

Analyzes, compares, and visualizes next generation sequencing (NGS) data. CLC Genomics Workbench offers a complete and customizable solution for genomics, transcriptomics, epigenomics, and metagenomics. The software lets users build custom workflows that combine quality control steps, adapter trimming, read mapping, variant detection, and multiple filtering and annotation steps into a pipeline.

PBcR / PacBio Corrected Reads

An approach that uses short, high-identity sequences to correct the errors inherent in long, single-molecule sequences. PBcR, implemented as part of the Celera Assembler, trims and corrects individual long-read sequences by first mapping short-read sequences to them and then computing a highly accurate hybrid consensus sequence, improving read accuracy from as low as 80% to over 99.9%. The corrected, “hybrid” PBcR reads may then be de novo assembled alone, assembled in combination with other data, or exported for other applications.

SGA / String Graph Assembler

Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.
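To make the FM-index mentioned above more concrete, here is a small, uncompressed teaching sketch of a Burrows-Wheeler transform with backward search for counting pattern occurrences. It is not SGA's code: real implementations build the transform without sorting all rotations and store sampled, compressed occurrence tables. All function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, with '$' appended as terminator."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_index(bwt_str):
    """Build the two FM-index tables from a BWT string.

    c_table[ch]: number of characters in the text strictly smaller than ch.
    occ[ch][i]:  number of occurrences of ch in bwt_str[:i].
    """
    alphabet = sorted(set(bwt_str))
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += bwt_str.count(ch)
    occ = {ch: [0] for ch in alphabet}
    for ch in bwt_str:
        for a in alphabet:
            occ[a].append(occ[a][-1] + (a == ch))
    return c_table, occ


def count_occurrences(pattern, c_table, occ, n):
    """Backward search: count occurrences of pattern in the indexed text."""
    lo, hi = 0, n  # half-open interval of matching rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGACG")
    c_table, occ = fm_index(transformed)
    print(transformed, count_occurrences("ACG", c_table, occ, len(transformed)))
```

The search touches one table entry per pattern character, independent of the text length, which is what makes the structure attractive for overlap queries over very large read sets.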


ABySS

Assembles the very large data sets produced by sequencing individual human genomes. ABySS is a parallelized sequence assembler. The primary innovation in ABySS is a distributed representation of a de Bruijn graph (DBG), which allows parallel computation of the assembly algorithm across a network of commodity computers. The ABySS algorithm proceeds in two stages. First, all possible substrings of length k (termed k-mers) are generated from the sequence reads. The k-mer data set is then processed to remove read errors, and initial contigs are built. In the second stage, mate-pair information is used to extend contigs by resolving ambiguities in contig overlaps.
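The first stage described above (generating k-mers and building a de Bruijn graph) can be sketched on a single machine as follows. This is not the ABySS implementation: nothing here is distributed, and the low-count filter is only an assumed stand-in for the actual error-removal step. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    k-mers seen fewer than `min_count` times are dropped, a simple
    (assumed) proxy for removing k-mers that come from read errors.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG", "ACGTTC"]  # the last read carries an error
    graph = build_de_bruijn(count_kmers(reads, k=3), min_count=2)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

In a distributed variant, each k-mer would be assigned to a machine (for example by hashing it), so that the graph is partitioned rather than held in one `dict`.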

MIRA / Mimicking Intelligent Read Assembly

A Swiss army knife of sequence assembly, developed and used over the past 16 years to get assembly jobs done efficiently and, especially, accurately. MIRA is a whole genome shotgun (WGS) and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent and PacBio data. It supports ancillary data in TRACEINFO format (from NCBI), marks places of interest with tags so that they can be found quickly in finishing programs, and has a single nucleotide polymorphism (SNP) analysis pipeline for sequencing data from viruses and prokaryotes.

DISCOVAR de novo

A large (and small) de novo genome assembler. DISCOVAR de novo quickly generates highly accurate and complete assemblies using the same single library data. It requires reads from only a single PCR-free library, and has tested well on relatively homozygous human and mouse genomes. DISCOVAR de novo has the potential to create, from relatively inexpensive sequencing libraries and read coverage, assemblies that are sufficiently complete and contiguous to serve a wide range of downstream comparative, population, and functional genomic analyses.


DISCOVAR

A variant caller and small genome assembler. The heart of DISCOVAR is a de novo genome assembler, one that is accurate enough to produce assemblies that can be used for variant calling given a reference sequence. DISCOVAR can also generate de novo assemblies for small genomes, but consider using DISCOVAR de novo instead, which can assemble genomes up to mammalian size. DISCOVAR provides a more complete inventory of an individual’s genetic variants than had been previously possible. As such, it adds to the tools that can be used to probe the genetic basis of disease. It may be particularly useful in cases where targeted or exome sequencing fails to find causal mutations.


Supernova

Creates true diploid de novo assemblies. Supernova can separate homologous chromosomes over long distances, in this sense capturing the true biology of a diploid genome. The Supernova approach is based on seven human samples. These assemblies used identical code with the same parameters, as a ‘pushbutton’ process that ran in two days on a single server. The diploid human assemblies from this tool use sequence from the same sample. This approach yields much longer phase blocks than previous diploid human assemblies. These diploid human assemblies are the first to be validated using finished sequence from the same sample, and the first whose phasing accuracy has been validated using parental sequences.


SOAPdenovo

Provides a short-read assembly method that can build a de novo draft assembly for human-sized genomes. SOAPdenovo is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way. SOAPdenovo is made up of six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) read mapping, scaffold construction, and gap closure. The latest major improvements to SOAPdenovo are: i) an enhanced error correction algorithm, ii) reduced memory consumption in DBG construction, iii) resolution of longer repeat regions in contig assembly, iv) increased assembly length and coverage in scaffolding and v) improved gap closure.


A5

Produces assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled. A5 is an assembly pipeline that simplifies the entire genome assembly process by automating its stages and by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.


ALLPATHS-LG

Provides a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100 bp) such as those produced by the new generation of sequencers. The ALLPATHS-LG assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies. ALLPATHS-LG requires high sequence coverage of the genome in order to compensate for the shortness of the reads. The precise coverage required depends on the length and quality of the paired reads, but is typically on the order of 100x or above.
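For a sense of what a 100x requirement means in practice, sequencing depth is commonly computed as (number of reads × read length) / genome size. The short sketch below applies that formula; the function names and the example figures are illustrative assumptions, not values taken from the ALLPATHS-LG documentation.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mbp bacterial genome sequenced with 100 bp reads.
    print(reads_needed(100, 100, 5_000_000))    # reads required for 100x
    print(coverage(2_500_000, 100, 5_000_000))  # depth from 2.5 million reads
```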


Redundans

A pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative heterozygous contigs. Redundans consists of three main steps: (i) detection and selective removal of redundant contigs from an initial standard assembly, (ii) scaffolding of the resulting non-redundant assembly using paired-end, mate-pair and/or fosmid-based reads and (iii) gap closing. The resulting assembly represents a chimeric reference genome in which each heterozygous region results from a random sorting of the haplotypes. We tested our pipeline on simulated and naturally occurring heterozygous genomes and compared its accuracy to other existing tools.


Velvet

Manipulates de Bruijn graphs (DBG) for genomic sequence assembly. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. It takes in short read sequences, removes errors, and produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs. Velvet represents an approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.


minimap and miniasm

A mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling single molecule real-time (SMRT) and Oxford Nanopore Technologies (ONT) reads without an error correction stage. Miniasm implements the ‘O’ and ‘L’ steps in the overlap-layout-consensus (OLC) assembly paradigm. It confirms that long noisy reads can be assembled without an error correction stage and that, without this stage, the assembly process can be greatly accelerated and simplified while achieving contiguity and large-scale accuracy comparable to existing pipelines, at least for genomes without excessive repetitive sequences. Together, the two tools can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold C. elegans data in 9 minutes, orders of magnitude faster than existing pipelines, though the consensus sequence error rate is as high as that of the raw reads.


MaSuRCA

Allows variable read lengths while tolerating a significant level of sequencing error. MaSuRCA combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. It transforms large numbers of paired-end reads into a much smaller number of longer ‘super-reads’. The tool can significantly improve its assemblies when the original data are augmented with long reads. It has been used to assemble de novo a variety of genomes, sometimes improving on published genomes using added data, sometimes creating the first publicly available draft genome for the species.


PE-Assembler

Handles large datasets and produces highly contiguous and accurate assemblies within reasonable time. PE-Assembler is based on a simple 3' extension approach and does not involve representing the entire genome in the form of a graph. Fundamentally, it is similar to other 3' extension approaches such as SSAKE, VCAKE and SHARCGS. However, it improves upon such early approaches in multiple ways. The extensive use of paired-end reads ensures that the dataset is localized within the region. Hence, this method can be run in parallel to greatly speed up execution while staying within reasonable system requirements. Ambiguities are resolved using a multiple path extension approach, which takes into account sequence coverage, support from multiple paired libraries and more subtle information such as the span distribution of the paired-end reads.

SIMBA / SImple Manager for Bacterial Assemblies

Manages assembly projects of bacterial genomes sequenced with Next Generation Sequencing (NGS) platforms. SIMBA is composed of three parts: (i) projects, a module for administering sequencing projects, assisted by tools for format conversion and quality checks; (ii) assemblies, which provides several software tools to conduct de novo assemblies; and (iii) curation, which allows users to perform tasks such as scaffolding by reference or optical mapping, genome visualization, or gap closing through supercontig construction.


Assemblathon

Aims to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. Assemblathon is a set of periodic collaborative efforts that help improve methods of genome assembly. It offers a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This project is an international effort that aims to produce a genomic zoo: sequences that represent the genomes of 10,000 vertebrate species.


PASHA

Takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. PASHA is a parallelized short read assembler using de Bruijn graphs (DBG) that is able to produce more contiguous high-quality assemblies in less time than three leading assemblers: Velvet, ABySS and SOAPdenovo. It employs a sorted vector data structure, instead of a hash-map, to store k-mers and their graph-related information.
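The sorted-vector idea can be illustrated with a small sketch: k-mers are kept in one sorted list, their associated values in a parallel list, and lookups use binary search instead of hashing. This is an illustration of the general technique only, not PASHA's data layout; the function names are hypothetical and the stored value is simply a k-mer count.

```python
from bisect import bisect_left
from collections import Counter


def build_sorted_kmer_table(reads, k):
    """Return (sorted k-mers, parallel list of counts)."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    kmers = sorted(counts)
    return kmers, [counts[kmer] for kmer in kmers]


def lookup(kmers, values, query):
    """Binary-search the sorted k-mer list; return 0 if the k-mer is absent."""
    i = bisect_left(kmers, query)
    if i < len(kmers) and kmers[i] == query:
        return values[i]
    return 0


if __name__ == "__main__":
    kmers, values = build_sorted_kmer_table(["ACGTA", "CGTAC"], k=3)
    print(kmers, values, lookup(kmers, values, "CGT"))
```

A sorted array trades O(1) expected lookups for O(log n) ones, but it stores no buckets or pointers and is traversed sequentially, which can matter when billions of k-mers must fit in memory.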


Organelle_PBA

Assembles chloroplast and mitochondrial genomes. Organelle_PBA selects the chloroplast reads from a whole genome sequencing (WGS) pool, maps the reads to a reference sequence from a closely related species, and then performs read correction and de novo assembly using Sprai. It removes the assembly redundancy caused by a circular assembly and resolves the chloroplast genome inverted repeats. The tool was applied to chloroplast genome assemblies for two species, Picea glauca and Sinningia speciosa.

MITObim / MITOchondrial Baiting and Iterative Mapping

An in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping. MITObim is capable of reconstructing mitochondrial genomes without the need for a reference genome of the targeted species by relying solely on (a) mitochondrial genome information of more distantly related taxa or (b) short mitochondrial barcoding sequences (seeds), such as the commonly used cytochrome-oxidase subunit 1 (COI), as a starting reference. MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements, and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24h using a standard desktop computer.

SATRAP / SOLiD Assembler TRAnslation Program

A computer program designed to efficiently translate de novo assembled color-space sequences into a base-space format. The program was tested and validated using simulated and real transcriptomic data; its modularity allows an easy integration into more complex pipelines, such as Oases for RNA-seq de novo assembly. SATRAP is available either as a multi-step pipeline incorporating several tools for RNA-seq assembly or as an individual module for use with the Oases package.
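For readers unfamiliar with color space, the sketch below shows the standard SOLiD di-base encoding, in which each color (0 to 3) encodes the transition between two adjacent bases. With A, C, G, T coded as 0, 1, 2, 3, the color is the XOR of the two base codes. This illustrates the encoding only and is not SATRAP's translation procedure; the function names are hypothetical.

```python
_BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_CODE_TO_BASE = "ACGT"


def colorspace_to_bases(first_base, colors):
    """Decode a color string into bases, given the known base preceding it."""
    code = _BASE_TO_CODE[first_base]
    bases = []
    for color in colors:
        code ^= int(color)  # next base code = previous base code XOR color
        bases.append(_CODE_TO_BASE[code])
    return "".join(bases)


def bases_to_colorspace(first_base, bases):
    """Encode bases into colors, given the known base preceding them."""
    previous = _BASE_TO_CODE[first_base]
    colors = []
    for base in bases:
        current = _BASE_TO_CODE[base]
        colors.append(str(previous ^ current))
        previous = current
    return "".join(colors)


if __name__ == "__main__":
    print(colorspace_to_bases("T", "0123"))
    print(bases_to_colorspace("T", "TGAT"))
```

Because each decoded base depends on the one before it, a single wrong color corrupts every downstream base, which is one reason translating assembled color-space sequences needs more care than this direct decoding.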


Canu

An assembler specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes. The program can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets.
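The NG50 statistic quoted above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the genome size (N50 uses the total assembly length instead). A minimal sketch of that definition follows; the function names and the example lengths are illustrative only.

```python
def ng50(contig_lengths, genome_size):
    """Contig length at which the cumulative sum reaches half the genome size.

    Returns 0 if the assembly covers less than half of the genome.
    """
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total * 2 >= genome_size:
            return length
    return 0


def n50(contig_lengths):
    """N50: the same statistic, measured against the total assembly length."""
    return ng50(contig_lengths, sum(contig_lengths))


if __name__ == "__main__":
    lengths = [50, 40, 30, 20, 10]  # hypothetical contig lengths
    print(ng50(lengths, genome_size=200), n50(lengths))
```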