Error correction software tools | Whole-genome sequencing data analysis
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies.
An open source program and Python library for de novo sequencing, consensus and variant calling on data from Oxford Nanopore Technologies’ MinION platform. Features include: de novo error…
Computes an improved consensus sequence for the assembly. LQS uses accurate short-read data and/or Pacific Biosciences circular consensus reads to correct error-prone long reads sufficiently for…
A package for error-correcting DNA barcodes. Hamming allows one run of a massively parallel pyrosequencer to process up to 1544 samples simultaneously. The tagged barcoding strategy can be used to…
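Error-correcting barcodes rely on nearest-neighbour decoding. If every pair of barcodes in a set differs at d or more positions, an observed barcode carrying at most (d - 1) // 2 substitutions is still closer to its true barcode than to any other. The Python sketch below illustrates that rule with a small hypothetical barcode list and substitution errors only. It is not code from the Hamming package, and the function names are illustrative.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: list[str]) -> int:
    """Smallest pairwise Hamming distance in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def decode(observed: str, barcodes: list[str]) -> str | None:
    """Assign an observed barcode to a sample, or None if it cannot be corrected.

    A set with minimum distance d corrects up to (d - 1) // 2 substitutions;
    barcodes further from every entry are rejected rather than guessed.
    """
    max_errors = (min_distance(barcodes) - 1) // 2
    best = min(barcodes, key=lambda bc: hamming(observed, bc))
    return best if hamming(observed, best) <= max_errors else None


# Hypothetical 8-base barcodes with minimum pairwise distance 4 (illustration only).
BARCODES = ["AAAAAAAA", "CCCCAAAA", "AAAACCCC", "GGGGGGGG"]
```

Rejecting barcodes that are too far from every entry keeps reads out of the wrong sample, at the cost of discarding some reads.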
A module in the Celera Assembler software package that performs error correction on PacBio long reads by mapping shorter, high accuracy reads onto the long reads.
An approach that utilizes short, high-identity sequences to correct the error inherent in long, single-molecule sequences. PBcR, implemented as part of the Celera Assembler, trims and corrects…
A set of tools for fast alignment of long reads for consensus and assembly. The Falcon toolkit is a simple collection of code used for studying efficient assembly algorithms for haploid and…
Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms…
Improves the accuracy of nanopore reads to around 97% after two rounds of correction. Nanocorrect is a prototype nanopore correction pipeline. This pipeline is inspired by pbdagcon, which used…
A memory-efficient error correction method that uses a Bloom filter as the main data structure. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small…
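A Bloom filter stores set membership in a bit array: each k-mer sets several bit positions, and a query reports "present" only if all of its positions are set, so false positives are possible but false negatives are not. The sketch below is a generic Bloom filter for k-mers, not BLESS's implementation. The sizing formulas are the standard ones; the BLAKE2b-based double hashing is an assumption made for illustration.

```python
import hashlib
import math


class KmerBloomFilter:
    """Minimal Bloom filter for k-mer membership (n_items must be > 0)."""

    def __init__(self, n_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, h = (m / n) ln 2 hashes.
        self.size = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.n_hashes = max(1, round(self.size / n_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, kmer: str):
        # Double hashing: derive n_hashes bit positions from two 64-bit values
        # taken from a single 128-bit BLAKE2b digest.
        digest = hashlib.blake2b(kmer.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little")
        for i in range(self.n_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(kmer))


# Usage: insert every 5-mer of a read, then query membership.
bf = KmerBloomFilter(n_items=1000)
read = "ACGTACGTTGCA"
k = 5
for i in range(len(read) - k + 1):
    bf.add(read[i:i + k])
```

The memory cost depends only on the expected number of k-mers and the tolerated false-positive rate, not on k, which is why Bloom filters suit memory-constrained k-mer counting.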
A computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. This approach provides…
Demonstrates the value of properly accounting for errors in unique molecular identifiers (UMIs). UMI-tools removes PCR duplicates and implements a number of different UMI deduplication schemes. It…
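One of the UMI deduplication schemes described for UMI-tools is the "directional" method: UMIs observed at the same mapping position are linked by an edge a -> b when they differ by one base and count(a) >= 2 * count(b) - 1, and each network reachable from an abundant UMI is collapsed into one molecule. The sketch below follows that rule in simplified form; it is not UMI-tools code, and the function name and example counts are illustrative.

```python
from __future__ import annotations

from collections import deque


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts: dict[str, int]) -> list[list[str]]:
    """Group UMIs observed at one position into putative molecules.

    Edge a -> b when the UMIs differ by one base and
    counts[a] >= 2 * counts[b] - 1, i.e. b is plausibly an error copy of a.
    """
    umis = sorted(counts, key=lambda u: (-counts[u], u))
    seen: set[str] = set()
    groups = []
    for root in umis:  # start from the most abundant unassigned UMI
        if root in seen:
            continue
        group, queue = [], deque([root])
        seen.add(root)
        while queue:  # breadth-first search along directed edges
            node = queue.popleft()
            group.append(node)
            for other in umis:
                if (other not in seen
                        and hamming(node, other) == 1
                        and counts[node] >= 2 * counts[other] - 1):
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups
```

The count condition keeps two similar UMIs with comparable abundance apart: with counts of 10 and 10, neither absorbs the other, so both are reported as distinct molecules.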
A method for correcting long and highly erroneous sequencing reads. LoRMA shows that efficient alignment-free methods can be applied to highly erroneous long-read data. The current approach needs…
Implements iterative error correction by using modules from String Graph Assembler (SGA). SGA-ICE is an iterative error correction pipeline that runs SGA in multiple rounds of k-mer-based correction…
A bioinformatics tool for error correction of HTS read data. SHREC can identify erroneous reads with sensitivity and specificity of over 99% and 96% for simulated data with error rates of up to 3% as…
An error correction algorithm for correcting reads from DNA sequencing platforms such as the Illumina Genome Analyzer, Illumina HiSeq or Roche/454 Genome Sequencer.
Allows accurate error correction in high-throughput sequencing data, such as those generated by the Illumina Genome Analyzer. The HiTEC algorithm uses a thorough statistical analysis of the suffix array…
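A suffix array lists the start positions of all suffixes of a text in lexicographic order, so every occurrence of a substring occupies one contiguous block that two binary searches can locate. Occurrence counts of this kind are the raw material for suffix-array-based error statistics. The sketch below shows only the counting step, assumes reads are joined with a "$" separator, and is not HiTEC's code. Its naive construction uses quadratic time and memory, so it suits toy inputs only.

```python
from __future__ import annotations


def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, sorted lexicographically (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count occurrences of pattern with two binary searches over the suffix array."""
    m = len(pattern)

    def prefix(rank: int) -> str:
        return text[sa[rank]:sa[rank] + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


reads = ["ACGTACG", "TACGA"]
text = "$".join(reads) + "$"  # '$' stops matches from spanning two reads
sa = suffix_array(text)
```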
Is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while…
A software tool developed in C++ for correcting sequencing errors in short reads from next-generation sequencing platforms. Reptile works with the spectrum of k-mers from the input reads, and…
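Reptile is described as working with the k-mer spectrum of the input reads. In spectrum-based correction, k-mers seen many times across the data set are treated as "solid" and rare ones as likely errors. The sketch below is a deliberately simple, generic version of that idea (try single-base substitutions until every k-mer in a read is solid); it is not Reptile's algorithm, and k and min_count are hypothetical parameters.

```python
from __future__ import annotations

from collections import Counter


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def correct_read(read: str, counts: Counter, k: int, min_count: int) -> str:
    """Fix at most one substitution so that every k-mer in the read is solid.

    A k-mer is solid if it occurs at least min_count times. Reads that are
    already solid, or that no single substitution can fix, are returned as-is.
    """
    def all_solid(seq: str) -> bool:
        return all(counts[km] >= min_count for km in kmers(seq, k))

    if all_solid(read):
        return read
    for i, base in enumerate(read):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            if all_solid(candidate):
                return candidate
    return read
```

For example, with five copies of ACGTTGCATG and one read ACGTAGCATG, k = 4 and min_count = 3, the four weak 4-mers around position 4 are resolved by changing A back to T.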
An error corrector for Illumina reads. QuorUM is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads and preserving the most true k-mers. It is…
A logistic-regression-based classifier distinguishing heterozygous sites from systematic errors. Given a list of candidate heterozygous genomic locations and a SAM file of sequenced reads, SysCall…
A massively parallelized and highly efficient error correction module for Illumina read data. Trowel both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum. With…
A hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to Jabba is that this mapping is…
A profile homology search tool for PacBio reads. Frame-Pro uses a hidden Markov model (HMM) and a directed acyclic graph to correct errors in DNA sequencing reads. It can also provide…
A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer, without repeating the whole procedure for each value. To do…
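ntHash is a recursive (rolling) nucleotide hash: the hash of each k-mer is updated from the previous one in constant time instead of being recomputed over all k bases. The sketch below shows a cyclic-polynomial rolling hash in that spirit, where a k-mer's hash is the XOR of per-base seeds rotated by their distance from the right end. The seed values are arbitrary placeholders, not ntHash's constants, and the sketch omits ntHash's canonical (reverse-complement) hashing and its derivation of multiple hash values.

```python
from __future__ import annotations

MASK64 = (1 << 64) - 1

# Placeholder 64-bit seeds per base; arbitrary values, not ntHash's own.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0xD6E8FEB86659FD93,
}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def direct_hash(kmer: str) -> int:
    """Hash computed from scratch: XOR of seeds rotated by distance from the end."""
    h = 0
    for i, base in enumerate(kmer):
        h ^= rol(SEED[base], len(kmer) - 1 - i)
    return h


def rolling_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of seq; each step after the first costs O(1)."""
    if len(seq) < k:
        return []
    h = direct_hash(seq[:k])
    hashes = [h]
    for i in range(k, len(seq)):
        # Rotate the window, cancel the outgoing base, add the incoming base.
        h = rol(h, 1) ^ rol(SEED[seq[i - k]], k) ^ SEED[seq[i]]
        hashes.append(h)
    return hashes
```

The update works because rotating the old hash by one bit shifts every remaining base to its new position, while the outgoing base ends up rotated by exactly k bits and is cancelled by XOR.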
Corrects noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. CoLoRMap is based on two novel ideas:…
A fast and memory-efficient k-mer-based error corrector. Unlike other error correctors, which count k-mers to identify solid k-mers, Lighter uses a novel sampling technique and only two Bloom filters.
Allows long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long-read region can be aligned to at…
Processes and differentiates polymerase chain reaction (PCR) duplicates from biological duplicates. UMI-Reducer uses Unique Molecular Identifiers (UMIs) and the mapping position of the read to…
Aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. The latest version of iCORN is based on SMALT (mapper), samtools, GATK, snp-o-matic…
An error correction technique based on multiple alignment. Karect supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the…
Corrects substitution errors in an Illumina archive using a k-mer trie. On real MiSeq and HiSeq Illumina archives, ACE yields higher gains in terms of coverage depth, outperforming state-of-the-art…
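A k-mer trie shares prefixes between k-mers, so a lookup that tolerates a few substitutions can abandon a whole branch as soon as its mismatch budget is exceeded. That matters for error correction because the high-count one-substitution neighbours of a rare k-mer suggest candidate fixes. The sketch below is a generic trie with counts at the leaves, not ACE's implementation; the class and method names are illustrative.

```python
from __future__ import annotations


class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.count = 0  # occurrences of the k-mer ending at this node


class KmerTrie:
    def __init__(self, k: int) -> None:
        self.k = k
        self.root = TrieNode()

    def add_read(self, read: str) -> None:
        """Insert every k-mer of the read, incrementing its leaf count."""
        for i in range(len(read) - self.k + 1):
            node = self.root
            for base in read[i:i + self.k]:
                node = node.children.setdefault(base, TrieNode())
            node.count += 1

    def search(self, kmer: str, max_mismatches: int = 1) -> dict[str, int]:
        """Return stored k-mers within max_mismatches substitutions, with counts."""
        hits: dict[str, int] = {}

        def walk(node: TrieNode, depth: int, prefix: str, mismatches: int) -> None:
            if depth == len(kmer):
                if node.count:
                    hits[prefix] = node.count
                return
            for base, child in node.children.items():
                cost = mismatches + (base != kmer[depth])
                if cost <= max_mismatches:  # prune branches over the budget
                    walk(child, depth + 1, prefix + base, cost)

        walk(self.root, 0, "", 0)
        return hits
```

A rare k-mer such as ACGA whose only one-mismatch neighbour, ACGT, is far more frequent is a natural candidate for substitution correction.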
A free, fast and easy-to-use sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm but still maintains a speed comparable to implementations based on greedy…
Improves long-read accuracy through short-read alignment. LSCplus overcomes LSC's long running time and improves quality. Only 1/3-1/4 of the time and 1/20-1/25 of the error…
Dynamically assesses errors within reads based on position-specific and local quality scores. ADEPT is the first tool that we are aware of that dynamically processes data and relies on within-dataset…
An algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads are available before DNA fragment assembly, which is of high importance to many…
A hybrid approach developed to take advantage of data generated by the MinION device. We combine Illumina and Oxford Nanopore technologies to produce NaS reads of up to 60 kb that aligned with no…
Detects and corrects errors using preassembled Illumina short reads. MIRCA uses an alignment-based approach, supports substitution, insertion and deletion errors, using pre-assembled short reads as a…
Employs novel alignment and error correction algorithms that are much more efficient than state-of-the-art aligners and error correction tools. MECAT can be used for effectively de novo assembling…
An effective method for correcting sequencing errors using a generalized suffix trie. PLURIBUS utilizes multiple manifestations of an error in the trie to accurately identify errors and suggest…
Assembles long error-prone reads using de Bruijn graphs. While the running time of overlap-layout-consensus (OLC) assemblers is dominated by the overlap detection step, the running time of the…
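In a de Bruijn graph, each (k-1)-mer is a node and each k-mer contributes an edge from its prefix to its suffix, so overlapping reads merge into shared paths. The sketch below builds such a graph from toy, error-free reads and follows a path while each node has a single successor. It illustrates the data structure only and does not reproduce how the tool described above handles long error-prone reads; the function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Node = (k-1)-mer; each distinct k-mer adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unbranched(graph: dict[str, set[str]], start: str) -> str:
    """Follow nodes with exactly one successor, spelling out the sequence."""
    seq, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq
```

With reads ACGTTG and GTTGCA and k = 4, the shared k-mer GTTG joins both reads into a single path that spells ACGTTGCA.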
An accurate parameter-free read error-correction method that can be run on inexpensive hardware and can make use of multicore parallelization whenever available.
A hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen…
An efficient, scalable, and robust error correction algorithm for correcting short reads. The steps of EC can be broken into three independent tasks. First, it builds k-mers and hashes the k-mers…
A quality assessment package for next-generation sequencing data. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read…
A software application that provides evidence for the validity of base calls believed to be sequencing errors and it is applicable to Ion Torrent and 454 data.
A method for the combined analysis of data from second generation sequencers and third generation sequencers, with the former delivering a massive (10⁸) number of accurate short reads and the latter…
Identifies and corrects read errors, targeting repetitive genomes. Redeem is different from existing methods for identifying sequencing errors. It models genome repetition and can be fed with…
A method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. SleepEC does not require a reference genome and is of relevance for microRNA studies,…
Aims to reduce next generation sequencing (NGS) error rate. RECOUNT is an implementation of an Expectation Maximization algorithm for tag count correction. Using both the reference genome and…
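The Expectation Maximization idea behind tag count correction can be sketched on a toy model: each observed tag is assumed to arise from one of the distinct observed tags (the candidate true tags) through independent substitutions at a uniform error rate e, and EM re-estimates each candidate's abundance. This is a simplified illustration, not RECOUNT's model: it ignores the reference genome and base qualities, assumes all tags have equal length, and error_rate and n_iter are hypothetical parameters.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def em_tag_counts(observed: dict[str, int], error_rate: float = 0.01,
                  n_iter: int = 50) -> dict[str, float]:
    """Re-estimate true tag counts with EM under a uniform substitution model.

    P(observe t | true tag j) = (e / 3)^d * (1 - e)^(L - d), d = Hamming(t, j).
    Candidate true tags are the observed tags themselves (equal length L).
    """
    tags = list(observed)
    total = sum(observed.values())
    length = len(tags[0])
    lik = {
        (t, j): (error_rate / 3) ** hamming(t, j)
        * (1 - error_rate) ** (length - hamming(t, j))
        for t in tags for j in tags
    }
    theta = {j: 1 / len(tags) for j in tags}  # uniform starting abundances
    for _ in range(n_iter):
        expected = {j: 0.0 for j in tags}
        for t in tags:
            # E-step: responsibility of each true tag j for observed tag t.
            weights = {j: theta[j] * lik[(t, j)] for j in tags}
            norm = sum(weights.values())
            for j in tags:
                expected[j] += observed[t] * weights[j] / norm
        # M-step: new abundances are the expected shares of all reads.
        theta = {j: expected[j] / total for j in tags}
    return {j: theta[j] * total for j in tags}
```

With a dominant tag of 1000 reads and a one-mismatch variant of 5 reads, EM moves part of the variant's count onto the dominant tag while preserving the total number of reads.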
A hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster.