Assembly scaffolding software tools | De novo genome sequencing data analysis
The de novo assembly of short-read sequencing data usually leads to a fragmented set of genomic sequences (contigs). Ordering and orientating such contigs (scaffolding) represents the first, nontrivial step towards genome finishing and usually requires extensive processing and manual editing of large blocks of sequence. The preferred approach to genome scaffolding is currently based on assembling the sequenced reads into contigs and then using paired-end information to join them into scaffolds.
SSPACE is a stand-alone program for scaffolding pre-assembled contigs using NGS paired-read data. It is unique in offering the possibility to manually control the scaffolding process. By using the distance information of paired-end and/or mate-pair data, SSPACE is able to assess the order, distance and orientation of contigs and combine them into scaffolds.
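The idea of deriving inter-contig distance from read pairs can be illustrated with a minimal Python sketch. It is not SSPACE's implementation; the function names, the `min_links` threshold, and the coordinate convention (a pair's insert starts at `start_on_a` on contig A and ends at `end_on_b` on contig B, with A lying to the left of B) are assumptions made for illustration.

```python
from statistics import median


def estimate_gap(len_a, links, insert_size):
    """Estimate the gap between contig A and contig B from linking read pairs.

    len_a       -- length of contig A (the left contig)
    links       -- list of (start_on_a, end_on_b) tuples, one per read pair:
                   start_on_a is where the insert begins on A,
                   end_on_b is where the insert ends on B (both 0-based)
    insert_size -- expected library insert size

    For each pair: insert_size = (len_a - start_on_a) + gap + end_on_b,
    so the gap is solved per pair and the median is returned.
    """
    if not links:
        raise ValueError("no linking read pairs between the two contigs")
    gaps = [insert_size - (len_a - start_a) - end_b for start_a, end_b in links]
    return median(gaps)


def should_join(links, min_links=2):
    """Hypothetical rule: join two contigs only if enough pairs support the link."""
    return len(links) >= min_links


# Three read pairs linking a 1000 bp contig A to contig B, 500 bp insert library.
links = [(800, 100), (850, 150), (900, 190)]
if should_join(links):
    print(estimate_gap(1000, links, 500))
```

Using the median rather than the mean keeps a single mis-mapped pair from skewing the estimate. A negative result would suggest that the two contigs overlap rather than being separated by a gap.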
Constructs de novo draft assemblies for human-sized genomes. SOAPdenovo is designed specifically to assemble Illumina GA short reads and can resolve longer repeat regions during contig assembly. It comprises six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) read mapping, scaffold construction, and gap closure. It was used as a basis for the MEGAHIT software.
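As a rough illustration of the de Bruijn graph construction and contig assembly steps, the following Python sketch builds a graph from reads and extends a contig along an unbranched path. It is a toy model under simplifying assumptions, not SOAPdenovo's code: it ignores reverse complements, sequencing errors and in-degree branching, and the function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds one edge."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def extend_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Two overlapping toy reads, k = 4.
reads = ["ATGGCGT", "GGCGTGCA"]
graph = build_de_bruijn(reads, 4)
print(extend_unambiguous(graph, "ATG"))  # ATGGCGTGCA
```

The walk stops at the first node with zero or several successors. Branch points of that kind are typically where repeats fragment an assembly into separate contigs, which is what the later paired-end mapping and scaffolding modules are meant to address.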
Provides a de novo transcriptome assembler for short RNA-seq reads. Oases assembles unmapped RNA-seq reads into full-length transcripts. It enables reconstruction with different k values via dynamic cutoffs. Its features include support for an array of hash lengths, dynamic noise filtering, resolution of alternative splicing (AS) events, and merging of multiple assemblies.
Provides a de novo transcriptome assembler designed specifically for RNA-Seq. SOAPdenovo-Trans is derived from the SOAPdenovo2 genome assembler and adapted for transcriptome assembly. The software processes RNA-Seq data and handles alternative splicing (AS). It uses a multiple-k-mer method either to merge the resulting assemblies into one final set or to iterate over several k-mer de Bruijn graph (DBG) assemblies during contig construction.
Provides tools for analyzing Pacific Biosciences long-read sequencing data. PBSuite is composed of two projects: PBJelly and PBHoney. The first is an automated pipeline for aligning long sequencing reads to draft assemblies. The second provides structural variant identification approaches that exploit the high mappability of long reads by considering intra-read discordance and soft-clipped tails.
A stand-alone program for scaffolding pre-assembled contigs using long reads (e.g. PacBio RS reads). Using the long-read information, contigs (or scaffolds) are placed in the right order and orientation in so-called super-scaffolds. The SSPACE-LongRead hybrid assembly approach has been tested on a number of bacterial genomes and in most cases results in fewer than 10 super-scaffolds (numbers based on draft assemblies constructed with one Illumina MiSeq paired-end and one PacBio RS C2 SMRT library, both at 100X coverage).
A scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, with consistent improvement over state-of-the-art programs in scaffold correctness and contiguity. OPERA provides a rigorous framework for scaffolding repetitive sequences and a systematic approach for combining data from different second-generation (Illumina, Ion Torrent) and third-generation (PacBio, ONT) sequencing technologies. OPERA efficiently scaffolds large genomes with provable scaffold properties, providing an avenue for the systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.