


Error correction software tools | High-throughput sequencing data analysis

Characterizing the errors generated by common high-throughput sequencing platforms and distinguishing true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single-nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies.

Source text:
Laehnemann et al. (2016). Denoising DNA deep sequencing data: high-throughput sequencing errors and their correction. Brief Bioinform.

1 - 50 of 111 results
hamming
A package for error-correcting DNA barcodes. hamming allows one run of a massively parallel pyrosequencer to process up to 1,544 samples simultaneously. The tagged barcoding strategy can be used to obtain sequences from hundreds of samples in a single sequencing run and to perform phylogenetic analyses of microbial communities from pyrosequencing data. The combination of error-correcting barcodes and massively parallel sequencing is rapidly transforming our understanding of microbial habitats located throughout our biosphere, as well as those associated with the human body.
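The error-correcting barcode idea can be sketched in a few lines: an observed tag is assigned to a barcode only when exactly one designed barcode lies within the correctable Hamming distance. A minimal sketch with a hypothetical four-barcode set (real designs use longer codes, but the principle of a guaranteed minimum pairwise distance is the same):

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical barcode set; all pairwise Hamming distances here are 4,
# so any single-base error still maps to a unique barcode.
BARCODES = ["ACGT", "TGCA", "GATC", "CTAG"]

def assign_barcode(observed, max_dist=1):
    """Assign an observed tag to the unique barcode within max_dist errors;
    return None if no barcode, or more than one, is that close."""
    hits = [b for b in BARCODES if hamming(observed, b) <= max_dist]
    return hits[0] if len(hits) == 1 else None

print(assign_barcode("ACGA"))  # ACGT (one substitution corrected)
print(assign_barcode("AAAA"))  # None (no barcode within distance 1)
```

Rejecting ambiguous tags rather than guessing is what keeps mis-assignment rates low in practice.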
MECAT / Mapping Error Correction and de novo Assembly Tool
Aligns sequences with or without local alignment. MECAT is a program based on a pseudolinear alignment scoring algorithm that exploits distance difference factors (DDFs). The software is composed of four modules: (i) a single-molecule real-time (SMRT) reads pairwise mapper; (ii) an SMRT reads reference mapper; (iii) a noise corrector; and (iv) a hierarchical assembly pipeline that extends the Canu pipeline.
SGA-ICE / SGA-Iteratively Correcting Errors
Implements iterative error correction using modules from the String Graph Assembler (SGA). SGA-ICE is an iterative error correction pipeline that runs SGA in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total number of erroneous reads.
NGS-eval
A user-friendly way to inspect NGS datasets obtained from the sequencing of genetic markers in microbial communities. The error calculation functionality enables evaluation of the overall sequencing quality and can further be used to assess the outcome of NGS data-processing pipelines. The interactive plots in NGS-eval quickly illustrate the read coordinates where errors occur. A high frequency of errors at specific positions can be useful for detecting novel (common) sequence variants and for identifying differences between the strains present in the sample and those used as reference sequences.
BLESS / BLoom-filter-based Error correction Solution for high-throughput Sequencing reads
A memory-efficient error correction method that uses a Bloom filter as its main data structure. BLESS 2, the new version of BLESS, improves runtime and accuracy while maintaining low memory usage. Its error correction algorithm is more accurate than that of BLESS, and it has been parallelized using hybrid MPI and OpenMP programming. Compared with five top-performing tools, BLESS 2 was the fastest when executed with MPI on two computing nodes of twelve cores each. It also showed at least 11 percent higher gain while retaining the memory efficiency of the previous version for large genomes.
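The appeal of a Bloom filter here is that k-mer membership can be stored in a fixed, small bit array at the cost of a tunable false-positive rate. A toy sketch of the data structure (BLESS's actual implementation is far more engineered; hash choice and sizing below are illustrative only):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter for k-mer membership: may return false
    positives, never false negatives."""
    def __init__(self, size_bits=1 << 20, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item per hash.
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
for kmer in ("ACGTA", "CGTAC", "GTACG"):   # "solid" k-mers from trusted reads
    bf.add(kmer)
print("ACGTA" in bf)  # True
print("TTTTT" in bf)  # almost certainly False (tiny false-positive chance)
```

During correction, a read base whose surrounding k-mers are absent from the filter is a candidate error; the memory footprint stays constant regardless of how many k-mers the genome contains.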
UMI-tools / Unique Molecular Identifiers-tools
Demonstrates the value of properly accounting for errors in unique molecular identifiers (UMIs). UMI-tools removes PCR duplicates and implements a number of different UMI deduplication schemes. It can extract, remove and append UMI sequences from FASTQ reads. Compared with previous methods, it is superior at estimating the true number of unique molecules. Simulations provide insight into the impact on quantification accuracy and indicate that applying an error-aware method becomes even more important at higher sequencing depths.
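The intuition behind error-aware deduplication can be sketched with a greedy directional-adjacency pass: an abundant UMI "absorbs" a one-mismatch neighbour whose count is consistent with it being a PCR or sequencing error. This is a simplified stand-in, not UMI-tools' actual network-based algorithm:

```python
def hamming1(a, b):
    """True if the two equal-length UMIs differ at exactly one position."""
    return sum(x != y for x, y in zip(a, b)) == 1

def dedup_directional(umi_counts):
    """Estimate the true molecule count: a UMI absorbs a 1-mismatch
    neighbour when its count is at least 2x the neighbour's minus 1
    (the directional criterion; here applied greedily by abundance)."""
    umis = sorted(umi_counts, key=umi_counts.get, reverse=True)
    absorbed = set()
    for i, u in enumerate(umis):
        if u in absorbed:
            continue
        for v in umis[i + 1:]:
            if v in absorbed:
                continue
            if hamming1(u, v) and umi_counts[u] >= 2 * umi_counts[v] - 1:
                absorbed.add(v)   # v is likely an error copy of u
    return len(umis) - len(absorbed)

counts = {"ATGC": 100, "ATGA": 2, "TTTT": 50}  # ATGA looks like an error of ATGC
print(dedup_directional(counts))  # 2 unique molecules
```

A naive unique-count would report 3 molecules here; the error-aware estimate of 2 is why the distinction matters at high depth, where error UMIs accumulate.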
PAGIT / Post-Assembly Genome-Improvement Toolkit
Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT bundles four tools: (i) ABACAS, which orders and orientates contigs and estimates the sizes of gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which transfers annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.
debarcer / De-Barcoding and Error Correction
Facilitates the use of barcoded data generated by SiMSen-seq. Debarcer is a package for working with next-generation sequencing (NGS) data that contain molecular barcodes. It processes raw .fastq files containing SiMSen-seq barcoded adaptor regions using a combination of standard bioinformatics tools such as bwa and Bio-SamTools, together with Perl and R, to extract information from alignment files. Debarcer collects the read data for each amplicon and barcode (a 'sequence family') and then, based on the alignment extracted from the .bam file, indexes each base by genomic position.
iCORN / Iterative Correction of Reference Nucleotides
Aligns deep-coverage short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. The latest version of iCORN is based on SMALT (mapper), SAMtools, GATK, snp-o-matic and Perl scripts. After very few iterations, iCORN efficiently corrects the homopolymer errors that are often present in 454 data, thus potentially improving the ability to combine assemblies constructed with different sequencing technologies.
Hammer
Enables error correction without any uniformity assumptions. Hammer is based on a combination of the Hamming graph and a simple probabilistic model. The software was evaluated on both non-uniform single-cell data and normal multi-cell data. Its running time and memory requirements are asymptotically (and in practice) dominated by the initial sorting of distinct k-mers. Hammer was tested using reads generated from a single cell of the E. coli K-12 strain, sequenced on one lane of the Illumina GAII pipeline.
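The Hamming-graph idea can be illustrated by clustering k-mers whose pairwise distance is small and keeping the most abundant member of each connected component as the putative true k-mer. Hammer itself avoids the quadratic all-pairs comparison used in this sketch; the input counts are invented for the example:

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def cluster_kmers(kmer_counts, max_dist=1):
    """Connected components of the Hamming graph (edges between k-mers
    within max_dist), via union-find; returns one representative per
    component, chosen as the most abundant k-mer."""
    kmers = list(kmer_counts)
    parent = {k: k for k in kmers}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in combinations(kmers, 2):    # O(n^2): fine for a sketch only
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)
    clusters = {}
    for k in kmers:
        clusters.setdefault(find(k), []).append(k)
    return [max(c, key=kmer_counts.get) for c in clusters.values()]

counts = {"ACGTA": 40, "ACGTT": 1, "ACCTA": 2, "GGGGG": 7}
print(sorted(cluster_kmers(counts)))  # ['ACGTA', 'GGGGG']
```

The low-count neighbours ACGTT and ACCTA collapse into the abundant ACGTA, which is exactly the behaviour that makes the approach robust to non-uniform single-cell coverage: no global abundance threshold is assumed.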
Blue
Corrects sequencing errors. Blue is a k-spectrum algorithm that uses read context to choose between alternative replacement k-mers, with the overall goal of minimizing the number of changes needed to correct an entire read. The software can correct all three types of possible errors: substitutions, insertions and deletions. It has been used to improve the assemblies for published microbial genomes derived from pure cultures and on metagenomic datasets to improve draft genome assemblies of the dominant organisms in these communities.
Trowel
A massively parallelized and highly efficient error correction module for Illumina read data. Trowel both corrects erroneous base calls and boosts base qualities based on the k-mer spectrum. With high-quality k-mers and relevant base information, Trowel achieves high accuracy across different short-read sequencing applications. Latency in the data path has been significantly reduced through efficient data access and data structures. In performance evaluations, Trowel was highly competitive with other tools regardless of coverage, genome size, read length and fragment size.
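k-mer-spectrum correction in general (not any one tool's specific algorithm) can be sketched in three steps: count all k-mers, treat rare ones as untrusted, and substitute a base when doing so turns an untrusted k-mer into a trusted one. K and the count threshold below are toy values; real tools infer them from the data:

```python
from collections import Counter

K, MIN_COUNT = 5, 3   # illustrative values only

def kmer_spectrum(reads):
    """Count every k-mer occurring in the read set."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - K + 1):
            counts[r[i:i + K]] += 1
    return counts

def correct_read(read, counts):
    """Greedy single-substitution correction: replace one base of an
    untrusted (rare) k-mer if that yields a trusted k-mer."""
    read = list(read)
    for i in range(len(read) - K + 1):
        kmer = "".join(read[i:i + K])
        if counts[kmer] >= MIN_COUNT:
            continue                        # k-mer is trusted as-is
        for j in range(K):                  # try fixing one position
            for base in "ACGT":
                cand = kmer[:j] + base + kmer[j + 1:]
                if cand != kmer and counts[cand] >= MIN_COUNT:
                    read[i + j] = base
                    break
            else:
                continue
            break
    return "".join(read)

reads = ["ACGTACGTA"] * 5 + ["ACGTACCTA"]   # last read has one error (G->C)
counts = kmer_spectrum(reads)
print(correct_read("ACGTACCTA", counts))    # ACGTACGTA
```

Handling insertions and deletions, choosing among competing fixes, and adjusting base qualities are where real implementations diverge from this minimal substitution-only sketch.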
Karect / KAUST Assembly Read Error Correction Tool
An error correction technique based on multiple alignment. Karect supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in correcting individual base errors (up to 10% increase in accuracy gain) and in post-de novo assembly quality (up to 10% increase in NGA50).
LoRMA
A method for correcting long and highly erroneous sequencing reads. LoRMA shows that efficient alignment-free methods can be applied to highly erroneous long-read data, whereas current approaches need alignments to take the global context of errors into account. Reads corrected by LoRMA have an error rate of less than half that of reads corrected by previous self-correction methods. Furthermore, its throughput is 20% higher than that of previous self-correction methods on read sets with coverage of at least 75×.
ntHash
A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer without repeating the whole procedure for each value. A single hash value is computed from the k-mer, and each extra hash value is then derived from it with a few additional multiplication, shift and XOR operations. This is very useful for bioinformatics applications that rely on data structures such as the Bloom filter. Experimental results demonstrate a substantial speed improvement over conventional approaches, while retaining a near-ideal hash value distribution.
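A simplified rolling hash in the spirit of ntHash shows how the next k-mer's hash can be derived from the previous one with a rotation and two XORs, instead of rehashing all k bases. The seed values below are arbitrary, not ntHash's actual constants:

```python
# Arbitrary 64-bit seed per base (hypothetical; not ntHash's constants).
SEED = {"A": 0x3C8B_FBB3_95C6_0474, "C": 0x3193_C185_62A0_2B4C,
        "G": 0x2032_3ED0_8257_2324, "T": 0x2955_49F5_4BE2_4456}
MASK = (1 << 64) - 1

def rol(x, n):
    """64-bit left rotation."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK

def hash_kmer(kmer):
    """Direct hash: XOR of each base's seed, rotated by its offset."""
    k = len(kmer)
    h = 0
    for i, b in enumerate(kmer):
        h ^= rol(SEED[b], k - 1 - i)
    return h

def roll(h, k, out_base, in_base):
    """Slide the window right by one base in O(1)."""
    return (rol(h, 1) ^ rol(SEED[out_base], k) ^ SEED[in_base]) & MASK

seq, k = "ACGTACGT", 4
h = hash_kmer(seq[:k])
for i in range(1, len(seq) - k + 1):
    h = roll(h, k, seq[i - 1], seq[i + k - 1])
    assert h == hash_kmer(seq[i:i + k])   # rolling matches direct hashing
print("rolling hash consistent")
```

Because rotation and XOR are self-inverting, removing the outgoing base and adding the incoming one costs a constant number of word operations per position, which is exactly what makes streaming k-mer insertion into a Bloom filter cheap.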
Frame-Pro
A profile homology search tool for PacBio reads. Frame-Pro uses profile hidden Markov models (HMMs) and a directed acyclic graph to correct errors in DNA sequencing reads. It can also output the profile alignments of the corrected sequences against characterized protein families. Results show that Frame-Pro enables more sensitive homology search and corrects more errors than a popular error correction tool that does not rely on hybrid sequencing.
CoLoRMap
Corrects noisy long reads, such as those produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. CoLoRMap is based on two novel ideas: using a classical shortest-path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of the unmapped mates of mapped short reads. Results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods.
TRUmiCount
Removes the biases inherent in raw unique molecular identifier (UMI) counts and produces unbiased, low-noise measurements of transcript abundance. This allows TRUmiCount to make comparisons between different genes, exons and other genomic features. The algorithm applies a bias correction and phantom-UMI removal based on expected read counts. TRUmiCount can thus help increase the accuracy of many quantitative applications of next-generation sequencing (NGS).
ShoRAH / Short Reads Assembly into Haplotypes
A computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population while accounting for sequencing errors. This approach also provides the user with an estimate of the quality of the reconstruction. Further, ShoRAH can reconstruct the global haplotypes and estimate their frequencies. ShoRAH was run on simulated data and on real data obtained in wet-lab experiments to assess its reliability.