Assembly scaffolding software tools | De novo genome sequencing data analysis
The de novo assembly of short-read sequencing data usually leads to a fragmented set of genomic sequences (contigs). Ordering and orientating such contigs (scaffolding) represents the first, nontrivial step towards genome finishing and usually requires extensive processing and manual editing of large blocks of sequence. The preferred approach to genome scaffolding is currently based on assembling the sequenced reads into contigs and then using paired-end information to join them into scaffolds.
A stand-alone program for scaffolding pre-assembled contigs using NGS paired-read data. It is unique in offering the possibility to manually control the scaffolding process. By using the distance information of paired-end and/or mate-pair data, SSPACE is able to assess the order, distance and orientation of the contigs and combine them into scaffolds.
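The idea of using read pairs to order and orient contigs can be illustrated with a minimal Python sketch. It is not SSPACE's implementation: the function names, the `(contig, strand)` input format, the assumption of a forward-reverse (FR) paired-end library, and the `min_links` threshold are all illustrative choices, and insert-size (distance) estimation is left out.

```python
from collections import Counter

FLIP = {"+": "-", "-": "+"}

def link_from_pair(hit1, hit2):
    """Turn one read pair into a contig link.

    Each hit is (contig_name, strand) for one mate. Assuming an FR library,
    read 1 points forward along the fragment and read 2 points backward.
    Returns ((A, oA), (B, oB)), meaning contig A in orientation oA precedes
    contig B in orientation oB, or None if both mates hit the same contig.
    """
    (c1, s1), (c2, s2) = hit1, hit2
    if c1 == c2:
        return None
    link = ((c1, s1), (c2, FLIP[s2]))
    # The same adjacency read from the opposite strand of the scaffold.
    mirror = ((c2, s2), (c1, FLIP[s1]))
    # Keep one canonical form so both views are counted together.
    return min(link, mirror)

def supported_links(pairs, min_links=2):
    """Count links over all pairs and keep those seen at least min_links times."""
    counts = Counter()
    for hit1, hit2 in pairs:
        link = link_from_pair(hit1, hit2)
        if link is not None:
            counts[link] += 1
    return {link: n for link, n in counts.items() if n >= min_links}

pairs = [
    (("ctgA", "+"), ("ctgB", "-")),  # fragment from the forward strand
    (("ctgB", "-"), ("ctgA", "+")),  # same adjacency, fragment from the reverse strand
    (("ctgA", "+"), ("ctgC", "+")),  # single, unsupported link
    (("ctgA", "+"), ("ctgA", "-")),  # both mates inside one contig
]
links = supported_links(pairs, min_links=2)
```

Here only the ctgA to ctgB adjacency survives the threshold, because it is supported by two independent pairs; requiring several links per join is a common guard against chimeric pairs and repeats.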
Constructs de novo draft assemblies for human-sized genomes. SOAPdenovo is specially designed to assemble Illumina GA short reads and is able to resolve longer repeat regions in contig assembly. SOAPdenovo is made up of six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) read mapping, scaffold construction, and gap closure. It was used as a basis for the MEGAHIT software.
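The de Bruijn graph and contig steps can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions (error-free reads, a single strand, no coverage filtering, only out-degree checked during the walk), not SOAPdenovo's actual data structures; `build_de_bruijn` and `walk_unambiguous` are hypothetical names.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

graph = build_de_bruijn(["ATGGCG", "GGCGTA"], k=4)
contig = walk_unambiguous(graph, "ATG")
```

The two overlapping reads share the k-mer GGCG, so the walk joins them into a single sequence. A repeat would show up as a node with more than one successor, which is where the walk stops and where paired-end information becomes necessary.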
Provides a de novo transcriptome assembler for short RNA-seq reads. Oases assembles unmapped RNA-seq reads into full-length transcripts. It enables reconstruction with different k-values via dynamic cutoffs. Its features include a range of hash lengths, dynamic noise filtering, resolution of alternative splicing (AS) events, and merging of multiple assemblies.
Provides a de novo transcriptome assembler specifically made for RNA-Seq. SOAPdenovo-Trans is derived from the SOAPdenovo2 genome assembler and adapted for transcriptome assembly. The software aims to process RNA-Seq data and handles alternative splicing (AS). It uses a multiple k-mer method either to merge the resultant assemblies into one final set or to iterate over several k-mer de Bruijn graph (DBG) assemblies during contig construction.
Allows de novo assembly of a transcriptome using a reference proteome. STM exploits the fact that, by translating contigs into amino acid sequences, it is possible to search for orthologous regions in a reference proteome, even when it belongs to a distantly related organism. The method can join multiple transcript fragments that are part of a single gene, providing new and valuable information on the order and orientation of these fragments along the original transcript. Multiple-k, a method that performs multiple assemblies with various k-mer lengths and retains the best part of each one to form the final assembly, is also available.
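Translating a contig into amino acid sequences, the first step described above, can be sketched as a six-frame translation in Python. This is an illustrative sketch, not STM's code: it assumes the standard genetic code and an unambiguous ACGT alphabet, marks unknown codons as X, and leaves out the subsequent search against the reference proteome.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' is a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons only; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translations(contig):
    """Return {(strand, offset): protein} for the three frames on each strand."""
    contig = contig.upper()
    frames = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
```

Because the strand and frame of a contig are unknown in advance, all six translations are candidates; the frame that matches a reference protein also reveals the orientation of the fragment.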
Provides tools for the analysis of Pacific Biosciences long-read sequencing data. PBSuite is composed of two projects: PBJelly and PBHoney. The first is an automated pipeline for aligning long sequencing reads to draft assemblies. The second provides variant-identification approaches that exploit the high mappability of long reads, considering intra-read discordance and soft-clipped tails.
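As an illustration of what a soft-clipped tail looks like in practice, the following Python sketch extracts the clipped lengths at both ends of an alignment from its SAM CIGAR string. It is not PBHoney's implementation; the function name and the `min_tail` threshold are hypothetical, and a real analysis would go on to realign the tails and cluster their breakpoints.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_tails(cigar, min_tail=200):
    """Return (left, right) soft-clip lengths, with tails below min_tail reported as 0.

    min_tail is an illustrative cutoff for ignoring short clips.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips; drop them before inspecting the ends.
    ops = [o for o in ops if o[1] != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left if left >= min_tail else 0, right if right >= min_tail else 0)

tails = soft_clipped_tails("350S4000M20I1000M15S")
```

In this example the 350-base clip at the start of the read is kept as a candidate tail, while the 15-base clip at the end falls below the cutoff and is ignored.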