Gap closing software tools | De novo genome sequencing data analysis
While next-generation sequencing technologies have made sequencing genomes faster and more affordable, deciphering the complete genome sequence of an organism remains a significant bioinformatics challenge, especially for large genomes. Low sequence coverage, repetitive elements and short read length make de novo genome assembly difficult, often resulting in sequence and/or fragment “gaps” – uncharacterized nucleotide (N) stretches of unknown or estimated lengths. Some of these gaps can be closed by re-processing latent information in the raw reads.
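As a minimal sketch of what such gaps look like in practice, the snippet below scans a scaffold sequence for runs of N and reports their coordinates. The function name `find_gaps` and the toy scaffold are illustrative assumptions, not part of any tool listed here.

```python
import re

def find_gaps(sequence, min_length=1):
    """Return (start, end, length) for each run of N/n.

    Coordinates are 0-based and end-exclusive.
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            gaps.append((match.start(), match.end(), length))
    return gaps

# Toy scaffold (hypothetical): two contigs' worth of sequence with two gaps.
scaffold = "ACGTNNNNNACGTTGCANNACG"
for start, end, length in find_gaps(scaffold):
    print(f"gap at {start}-{end} ({length} bp)")
```

Raising `min_length` skips very short N runs, which some pipelines treat as ambiguous bases rather than scaffold gaps.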
A single-cell assembler for capturing and sequencing “microbial dark matter” that forms small pools of randomly selected single cells (called a mini-metagenome) and further sequences all genomes from the mini-metagenome at once. SPAdes is intended for both standard isolates and single-cell MDA bacteria assemblies. It works with Illumina or IonTorrent reads and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. Additional contigs can also be provided to be used as long reads. SPAdes supports paired-end reads, mate-pairs and unpaired reads and can take as input several paired-end and mate-pair libraries simultaneously.
Constructs de novo draft assemblies for human-sized genomes. SOAPdenovo is specially designed to assemble Illumina GA short reads and is able to resolve longer repeat regions in contig assembly. SOAPdenovo is made up of six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) reads mapping, scaffold construction, and gap closure. It was used as a basis for the MEGAHIT software.
A stand-alone program for closing gaps within pre-assembled scaffolds. GapFiller is unique in offering the possibility to manually control the gap closure process. By using the distance information of paired-read data, GapFiller seeks to close the gap from each edge in an iterative manner.
An approach that raises the quality of draft assemblies towards finished quality without manual intervention, using local assemblies of reads from gap regions. IMAGE utilises the large number of sequences that an Illumina Genome Analyzer produces. Reads that correspond to gaps or questionable regions are identified and reassembled locally before being incorporated back into the final assembly. An advantage of a local assembly as opposed to a de novo one is that the number of reads used is only a fraction of total available reads. This reduces the complexity of regions to be assembled as well as the time and computing memory required.
Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT combines four tools: (i) ABACAS, which classifies and orientates contigs and estimates the sizes of the gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within the scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which helps with annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.
Aims to perform both ungapped and gapped alignment. SOAP is a program including special modules for alignment of paired-end, small RNA and messenger RNA (mRNA) tag sequences. It allows either a certain number of mismatches or one continuous gap for aligning a read onto the reference sequence.
Closes gaps with a preassembled contig set or a long read set (i.e., error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs, and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. An accompanying package, GMvalue, is a tool to determine misassembly sites in contigs and scaffolds.
A graphical application to evaluate and close gaps. GapBlaster was developed in the Java programming language. GapBlaster uses contigs obtained in the assembly of the genome to perform an alignment against a draft of the genome/scaffold, using BLAST or MUMmer to close gaps. Then, all identified alignments of contigs that extend through the gaps in the draft sequence are presented to the user for further evaluation via the GapBlaster graphical interface. GapBlaster presents significant results compared to other similar software and has the advantage of offering a graphical interface for manual curation of the gaps.
An automated finishing application that uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, including that of very large genomes. Sealer has broad utility for finishing genomes across the tree of life, from bacterial genomes to large plant genomes and beyond.
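The general idea of closing a gap with a Bloom-filter-backed de Bruijn graph can be sketched as follows. This is a toy illustration, not Sealer's implementation: the `BloomFilter` class, the `close_gap` depth-first search, the single value of k and the example sequences are all assumptions made for the sketch.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: membership tests may give false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

def build_filter(reads, k):
    """Store every k-mer of every read; graph edges are implied by k-mer overlap."""
    kmers = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers

def close_gap(left_flank, right_flank, kmers, k, max_gap=100):
    """Depth-first walk from the last k-mer of the left flank to the
    first k-mer of the right flank; returns the gap sequence or None."""
    start, goal = left_flank[-k:], right_flank[:k]
    stack = [start]  # each entry is the sequence walked so far
    while stack:
        path = stack.pop()
        kmer = path[-k:]
        if kmer == goal and len(path) >= 2 * k:
            return path[k:-k]  # strip both flanking k-mers
        if len(path) >= 2 * k + max_gap:
            continue  # give up on walks longer than the allowed gap
        for base in "ACGT":
            if (kmer[1:] + base) in kmers:
                stack.append(path + base)
    return None

# Hypothetical example: reads cover a region whose middle 10 bp are a gap.
genome = "ACGTTGCAAGGCTTACCGATCGGATTCAGT"
reads = [genome[0:16], genome[8:24], genome[14:30]]
kmers = build_filter(reads, k=5)
filled = close_gap(genome[:10], genome[20:], kmers, k=5)
print(filled)
```

The filter stores only k-mer membership, so memory stays small even for large genomes, at the cost of occasional false-positive branches that the search has to tolerate.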
Allows users to close gaps in the genome assembly as well as validate the genomic scaffold. FinIS considers all gaps simultaneously to find gap sequences that best match the read data and to correctly resolve repeats. Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both precision and recall values for the correctness of assembled sequences, when compared to competing programs.
Can carry out assembly and analysis of microbial genome sequences. BugBuilder takes sequence reads as inputs and produces submission-ready annotated genome assemblies, with ease-of-use prioritised to allow non-expert users to obtain acceptable results. It can be customized to work with most command-line assembly and scaffolding tools. The tool supports all sequence types used commonly in microbial genome assembly.
Minimizes the gap open and gap extension penalties depending on the length of the homopolymer run. BFAST-Gap is based on the Smith-Waterman alignment algorithm. It is specially designed to perform on Ion Torrent reads where gaps occur more frequently in longer homopolymer runs. This tool provides several functionalities to estimate the weight: constant, logistic, exponential, and piecewise constant.
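To make the idea of a length-dependent weight concrete, here is a minimal sketch of the four weighting shapes named above. The functional forms, parameter names and default values are assumptions chosen for illustration; they are not taken from the BFAST-Gap source or documentation.

```python
import math

def constant_weight(run_length, value=1.0):
    """Same weight regardless of homopolymer length (assumed form)."""
    return value

def logistic_weight(run_length, midpoint=4.0, steepness=1.0):
    """Weight falls smoothly from ~1 to ~0 around `midpoint` (assumed form)."""
    return 1.0 / (1.0 + math.exp(steepness * (run_length - midpoint)))

def exponential_weight(run_length, rate=0.5):
    """Weight decays exponentially with run length (assumed form)."""
    return math.exp(-rate * (run_length - 1))

def piecewise_constant_weight(run_length, steps=((2, 1.0), (4, 0.5)), floor=0.25):
    """Weight drops in discrete steps; `steps` holds (max_length, weight) pairs."""
    for max_length, weight in steps:
        if run_length <= max_length:
            return weight
    return floor

def gap_penalties(run_length, weight_fn, gap_open=10.0, gap_extend=1.0):
    """Scale hypothetical base penalties by the weight for this run length."""
    w = weight_fn(run_length)
    return gap_open * w, gap_extend * w

for n in (1, 3, 6):
    print(n, gap_penalties(n, exponential_weight))
```

With any of the decreasing weights, a gap inside a long homopolymer run costs less than the same gap in a short run, which is the behaviour the description above attributes to the tool.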
Finds and uses reads that originate from repeat-related gaps. GAPPadder is a sensitive approach for closing gaps on draft genomes with paired-end reads and mate-paired reads. It takes advantage of long insert size data and performs a two-stage local assembly approach to construct more complete gap sequences. It also performs local assembly from reads that originate from gap regions. GAPPadder can still be useful in the age of long-read genome assembly.
Closes gaps in genome assembly using error-corrected long reads. LR_Gapcloser (1) utilizes the alignments to find the long reads that bridge the gap, and (2) fills the original long read sequence into the genomic gaps. A compiled bwa is provided for fragmenting and aligning the long reads to scaffolds. The software was tested among various assemblies of short contigs with long reads.
Closes the gaps between scaffolded contigs with a higher accuracy compared with a similar tool. OMACC is advantageous because it takes into account gap size carefully via rescaling the optical map and applying a length constraint when selecting the path of contigs for gap closure. In addition, it applies an advanced graph search algorithm to efficiently infer the correct number of repeat copies in the gap between two contigs. We apply OMACC and FINISH on both simulated and real data sets. OMACC achieves a >90% accuracy, higher than the <73% by FINISH, and more than doubles the contig N50 lengths. OMACC also maintains a sensitivity similar to that of FINISH. Thus, OMACC should benefit various downstream biological studies via accurately connecting contigs into a more complete genome with the assistance of an optical map.
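For readers unfamiliar with the N50 metric mentioned above, the following minimal sketch computes it from a list of contig lengths. The contig lengths are made-up numbers for illustration only and are not OMACC results.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input

# Hypothetical contig lengths before and after joining contigs across gaps.
before = [100, 80, 60, 40, 20]
after = [180, 100, 20]  # 100+80 and 60+40 joined (gap sequence ignored)
print(n50(before), n50(after))
```

Joining contigs leaves the total assembly size roughly unchanged but shifts it into fewer, longer sequences, which is why successful gap closure raises N50.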
An efficient tool to find regions to fill gaps of draft genome sequences. FGAP demands low computational resources, its output makes the results easy to analyze, and it can be used for small or large genome assemblies. FGAP can effectively reduce the effort to improve draft genome sequences in a few steps, minimizing the number of unknown regions for human evaluation and reducing the need to obtain new data. In addition, FGAP has been successfully used to close gaps of draft sequences of several bacterial and fungal genome projects.
A computational tool integrated with a database that allows the curation and extension of contigs produced by de novo assemblers, and the production of a scaffold, even when there is little overlap between the sequences (which can be validated by searching for homologies in biological databases). In addition, G4ALL allows various users to work on the project simultaneously, reducing the time needed for curation. G4ALL has a graphic edition and validation interface for de novo-assembled contigs, which allows the orientation and sorting of these contigs in relation to a reference genome based on the results of the alignment generated by the appropriate software in table format, with 11 columns (query, reference, alignment length, mismatches, gaps, query start, query end, source start, source end, e-value, bitscore and identity). This software has been successfully used in a number of projects involving organisms such as Archaea and viruses.