Gap closing software tools | De novo genome sequencing data analysis
While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads.
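As an illustration of what a "gap" looks like in practice, the following minimal sketch locates runs of N in a scaffold sequence. The function name `find_gaps` and the `min_length` parameter are hypothetical and chosen for this example only; none of the tools listed here is implied to work this way.

```python
import re

def find_gaps(sequence, min_length=1):
    """Return half-open (start, end) coordinates of each run of N/n
    that is at least min_length bases long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]

# Toy scaffold (made up for this example) with two gaps.
scaffold = "ACGTNNNNNACGTTNNACG"
for start, end in find_gaps(scaffold):
    print(f"gap at {start}-{end}, length {end - start}")
```

Raising `min_length` skips short N runs, which can be useful when isolated ambiguous bases should not be treated as gaps.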
A single-cell assembler for capturing and sequencing “microbial dark matter” that forms small pools of randomly selected single cells (called a mini-metagenome) and then sequences all genomes from the mini-metagenome at once. SPAdes is intended for both standard isolates and single-cell MDA bacterial assemblies. It works with Illumina or IonTorrent reads and can produce hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. Additional contigs can also be provided to be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads and can take several paired-end and mate-pair libraries as input simultaneously.
Constructs de novo draft assemblies for human-sized genomes. SOAPdenovo is specially designed to assemble Illumina GA short reads and is able to resolve longer repeat regions in contig assembly. SOAPdenovo is made up of six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) read mapping, scaffold construction, and gap closure. It was used as a basis for the MEGAHIT software.
A stand-alone program for closing gaps within pre-assembled scaffolds. GapFiller is unique in offering the possibility to manually control the gap closure process. By using the distance information of paired-read data, GapFiller seeks to close the gap from each edge in an iterative manner.
Aims to perform both ungapped and gapped alignment. SOAP is a program that includes special modules for alignment of paired-end, small RNA and messenger RNA (mRNA) tag sequences. When aligning a read onto the reference sequence, it allows either a certain number of mismatches or one continuous gap.
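To make the idea of ungapped alignment with a mismatch budget concrete, here is a naive sketch that slides a read along a reference and reports positions within a given Hamming distance. It is not SOAP's algorithm: the names `hamming` and `ungapped_hits` and the `max_mismatches` parameter are assumptions for this example, and the brute-force scan ignores indexing, gapped alignment and the reverse strand.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def ungapped_hits(read, reference, max_mismatches=2):
    """Return (position, mismatches) for every ungapped placement of
    read on reference with at most max_mismatches mismatches."""
    hits = []
    n = len(read)
    for pos in range(len(reference) - n + 1):
        d = hamming(read, reference[pos:pos + n])
        if d <= max_mismatches:
            hits.append((pos, d))
    return hits

# Toy data (made up): the read matches at position 2 with one mismatch.
print(ungapped_hits("ACGTTCGG", "TTACGTACGGA"))
```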
Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT compiles four tools: (i) ABACAS, which classifies and orientates contigs and estimates the sizes of gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within the scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which helps with annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.
An automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including those of very large genomes. Sealer has broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond.
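The general idea of closing a gap by walking a de Bruijn graph between the two flanking sequences can be sketched as follows. This is a simplified illustration under stated assumptions, not Sealer's implementation: a plain Python `set` stands in for the Bloom filter, only the forward strand is considered, the first shortest path found is returned without checking that it is unique, and the names `close_gap`, `kmers` and `max_gap` are hypothetical.

```python
from collections import deque

def kmers(seq, k):
    """Yield every k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def close_gap(left_flank, right_flank, reads, k, max_gap=100):
    """Return the bases filling the gap between the flanks, or None."""
    # A set of read k-mers stands in for the Bloom filter (assumption).
    kmer_set = set()
    for read in reads:
        kmer_set.update(kmers(read, k))

    start, goal = left_flank[-k:], right_flank[:k]
    if len(start) < k or len(goal) < k:
        return None
    if start not in kmer_set or goal not in kmer_set:
        return None

    # Breadth-first search over (k-mer, depth) states, bounded by max_gap.
    queue = deque([(start, "")])
    visited = {(start, 0)}
    while queue:
        node, appended = queue.popleft()
        depth = len(appended)
        if node == goal and depth >= k:
            # The last k appended bases belong to the right flank.
            return appended[:-k]
        if depth >= max_gap + k:
            continue
        for base in "ACGT":
            nxt = node[1:] + base
            state = (nxt, depth + 1)
            if nxt in kmer_set and state not in visited:
                visited.add(state)
                queue.append((nxt, appended + base))
    return None

# Toy data (made up): reads spanning a 3-base gap between two flanks.
reads = ["CGTACGGATT", "ACGGATTGCA"]
print(close_gap("ACGTAC", "TTGCAA", reads, k=4))
```

A real Bloom filter trades a small false-positive rate for much lower memory use, which is what makes this kind of search feasible on very large genomes; the exact set used here avoids false positives but would not scale in the same way.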
Closes gaps with a preassembled contig set or a long read set (e.g., error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs, and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. An accompanying package, GMvalue, is a tool to determine misassembly sites in contigs and scaffolds.