Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers.
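
The basic task fits in a few lines of Python. The sketch below is a naive baseline, not any of the tools listed here. It keeps every distinct k-mer in an in-memory dictionary, and that dictionary is what outgrows memory on large data sets. The names `count_kmers` and `reverse_complement` are illustrative. Merging a k-mer with its reverse complement ("canonical" counting) is a common convention assumed here and can be switched off.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k: int, canonical: bool = True) -> Counter:
    """Count every length-k substring of every read.

    With canonical=True, a k-mer and its reverse complement are counted
    together under whichever of the two is lexicographically smaller.
    K-mers containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not set(kmer) <= _VALID:
                continue
            if canonical:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts
```

For example, `count_kmers(["ACGTT"], 3)` returns `Counter({'ACG': 2, 'AAC': 1})`. ACG and CGT are reverse complements of each other, so they share one entry.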

A k-mer counting algorithm. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data…
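
A common way to make k-mers up to 31 bases into compact hash-table keys is 2-bit packing. Each base takes two bits, so 31 × 2 = 62 bits fit in a single 64-bit word. The sketch below assumes that encoding as an illustration. It is not taken from the tool described above and does not attempt the multithreaded, lock-free table itself. The function names are hypothetical.

```python
# 2-bit code per base; input is assumed to contain only A/C/G/T.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"

MAX_K = 31  # 2 * 31 = 62 bits, which fits in one 64-bit machine word


def encode(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base, first base highest."""
    if len(kmer) > MAX_K:
        raise ValueError("k-mer too long for a single 64-bit word")
    value = 0
    for base in kmer:
        value = (value << 2) | _CODE[base]
    return value


def decode(value: int, k: int) -> str:
    """Inverse of encode for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append(_BASES[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def rolling_codes(seq: str, k: int):
    """Yield the packed code of every k-mer in seq with O(1) work per step.

    Shifting left by two bits appends the next base; masking to 2k bits
    drops the base that just left the window.
    """
    mask = (1 << (2 * k)) - 1
    value = 0
    for i, base in enumerate(seq):
        value = ((value << 2) | _CODE[base]) & mask
        if i >= k - 1:
            yield value
```

Because each code is a plain integer, it can serve directly as a hash-table key. Only the most recent 2k bits are kept, so sliding along a read never re-encodes a whole k-mer.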

Identifies and excludes non-target sequences independent of database. SAG-QC calculates the probability that a sequence was derived from contaminants by comparing k-mer compositions with the no…
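
Comparing k-mer compositions starts from a fixed-length frequency vector per sequence. The sketch below builds such a profile (tetranucleotides by default) and compares two profiles with Euclidean distance. SAG-QC's actual probability calculation is not reproduced here. The distance measure and the names `kmer_profile` and `euclidean` are assumptions chosen for illustration.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of each of the 4**k possible k-mers, in a fixed order.

    K-mers containing characters other than A/C/G/T are ignored.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def euclidean(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

A fragment whose profile sits far from the target genome's profile would be a candidate contaminant. How "far" becomes a probability is the part a real tool has to model.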

Provides a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail is a minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust. The…
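
For readers unfamiliar with the format, this is roughly what a FASTA parser has to do. The Python sketch below is not Needletail's API, which is a Rust library. It joins multi-line sequences and keeps only the first whitespace-delimited token of each header. `parse_fasta` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines.

    Sequence lines belonging to one record are concatenated. The id is the
    first whitespace-delimited token after '>' (empty if there is none).
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because it is a generator, records are produced one at a time instead of loading the whole file. A minimal-copying library pushes the same idea further, down to avoiding per-record allocations.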

Estimates the frequencies of k-mers in genomics datasets. ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to…
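
The sampling idea can be illustrated by keeping only k-mers whose hash has its lowest `s` bits equal to zero. Each distinct k-mer is then either always sampled or never sampled, so its sampled count is exact. The resulting histogram only needs to be scaled back up by 2**s. This is a simplified sketch, not ntCard's estimator. BLAKE2b from `hashlib` stands in for ntHash, and `estimate_histogram` is a hypothetical name.

```python
import hashlib
from collections import Counter


def kmer_hash(kmer: str) -> int:
    """Stand-in 64-bit hash (ntCard itself uses ntHash)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def estimate_histogram(reads, k: int, s: int = 4) -> dict:
    """Estimate f_i, the number of distinct k-mers that occur exactly i times.

    About 1 in 2**s distinct k-mers is kept (those whose hash has its s
    lowest bits equal to zero). Their counts are exact, and each histogram
    bin is multiplied by 2**s to extrapolate to all k-mers.
    """
    mask = (1 << s) - 1
    sampled = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer_hash(kmer) & mask == 0:
                sampled[kmer] += 1
    histogram = Counter(sampled.values())
    return {freq: n * (1 << s) for freq, n in sorted(histogram.items())}
```

With `s=0` every k-mer is kept and the histogram is exact. Larger `s` trades accuracy for a table roughly 2**s times smaller.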

A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer, without repeating the whole procedure for each value. To do…
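
The core of an ntHash-style rolling hash is a cyclic-rotation trick. A k-mer's hash is the XOR of per-base 64-bit seeds, each rotated by its distance from the end of the window. Sliding the window then takes one rotation of the old hash, one XOR to remove the outgoing base and one XOR to add the incoming base. The sketch below follows that scheme under stated assumptions. The seed constants are arbitrary and not ntHash's own. The multiply-and-xor-shift used in `extra_hashes` is one plausible way to derive several values from a single base hash, not ntHash's exact formula.

```python
MASK64 = (1 << 64) - 1

# Arbitrary 64-bit per-base seeds for illustration (not ntHash's constants).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0xD6E8FEB86659FD93,
}
MULTI_SEED = 0x90B45D39FB6DA1FA  # hypothetical constant for extra hashes


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def direct_hash(kmer: str) -> int:
    """Hash computed from scratch: XOR of seeds rotated by distance to the end."""
    k = len(kmer)
    h = 0
    for j, base in enumerate(kmer):
        h ^= rol(SEED[base], k - 1 - j)
    return h


def rolling_hashes(seq: str, k: int):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = direct_hash(seq[:k])
    yield h
    for i in range(1, len(seq) - k + 1):
        # Shift all terms by one, cancel the outgoing base, add the incoming one.
        h = rol(h, 1) ^ rol(SEED[seq[i - 1]], k) ^ SEED[seq[i + k - 1]]
        yield h


def extra_hashes(h: int, m: int, k: int) -> list[int]:
    """Derive m hash values from one base hash without rescanning the k-mer."""
    values = [h]
    for i in range(1, m):
        t = (h * (i ^ (k * MULTI_SEED))) & MASK64
        t ^= t >> 27
        values.append(t)
    return values
```

Checking `rolling_hashes` against `direct_hash` on every window confirms that the O(1) update matches the from-scratch definition, including for k > 64 where rotations wrap around.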

A user-friendly, extendible and scalable toolkit for rapidly counting, comparing and analysing k-mers from various data sources. The tools in KAT assist the user with a wide range of tasks including…
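
One basic analysis in this family of toolkits is the k-mer frequency spectrum: for each multiplicity i, how many distinct k-mers occur exactly i times. The sketch below computes it naively in memory. It is an illustration, not KAT's implementation, and `kmer_spectrum` is a hypothetical name.

```python
from collections import Counter


def kmer_spectrum(reads, k: int) -> dict:
    """Map each multiplicity to the number of distinct k-mers with that count."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return dict(sorted(Counter(counts.values()).items()))
```

On real sequencing data, the low-multiplicity end of such a spectrum is typically dominated by k-mers containing errors. The main peak reflects sequencing coverage.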

A computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants…

A very fast single-threaded k-mer counting algorithm. KCMBT uses cache-efficient burst tries to store k-mers. A burst trie is a trie in which a full node is split into multiple nodes to make space…

A disk-based approach to efficiently perform k-mer counting for large genomes using a small amount of memory. This approach is based on a novel technique called Minimum Substring Partitioning (MSP).…
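
The partitioning step can be sketched as follows. Each k-mer's "minimum substring" is its lexicographically smallest length-p substring. Consecutive k-mers with the same minimum substring share most of their bases, so they are written out once as a "super k-mer". All k-mers with the same minimum substring go to the same partition. Each partition can therefore be counted on its own, and the per-partition results never overlap. This is an illustration under those assumptions, not the MSP implementation. In-memory lists stand in for partition files, CRC32 is an arbitrary choice of partition hash, and all names are hypothetical.

```python
import zlib
from collections import Counter


def minimizer(kmer: str, p: int) -> str:
    """Lexicographically smallest length-p substring of kmer."""
    return min(kmer[j:j + p] for j in range(len(kmer) - p + 1))


def super_kmers(seq: str, k: int, p: int):
    """Yield (minimizer, super_kmer) for maximal runs of k-mers sharing a minimizer."""
    start, current = 0, None
    for i in range(len(seq) - k + 1):
        m = minimizer(seq[i:i + k], p)
        if current is None:
            current = m
        elif m != current:
            yield current, seq[start:i + k - 1]  # covers k-mers start .. i-1
            start, current = i, m
    if current is not None:
        yield current, seq[start:]


def partition_of(m: str, n_partitions: int) -> int:
    """Deterministic partition index for a minimizer."""
    return zlib.crc32(m.encode()) % n_partitions


def count_by_partition(reads, k: int, p: int, n_partitions: int) -> list[Counter]:
    partitions = [[] for _ in range(n_partitions)]  # stand-ins for files on disk
    for read in reads:
        for m, sk in super_kmers(read, k, p):
            partitions[partition_of(m, n_partitions)].append(sk)
    results = []
    for bucket in partitions:  # only one partition needs to be in memory at a time
        c = Counter()
        for sk in bucket:
            for i in range(len(sk) - k + 1):
                c[sk[i:i + k]] += 1
        results.append(c)
    return results
```

A super k-mer of length L stores L - k + 1 k-mers in L characters instead of (L - k + 1) * k. That saving is what shrinks the intermediate data written to disk.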

A streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input and the space requirements are…
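
To show what a single-pass, fixed-memory estimate of distinct k-mers can look like, the sketch below uses a different, well-known technique: the k-minimum-values (KMV) estimator. It is not necessarily the algorithm described above. Each k-mer is hashed, and only the t smallest distinct hash values are kept. If the largest of those, normalized to [0, 1), is U, then (t - 1) / U estimates the number of distinct k-mers. BLAKE2b as the hash and the name `estimate_distinct` are assumptions.

```python
import hashlib
import heapq


def _hash64(kmer: str) -> int:
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def estimate_distinct(reads, k: int, t: int = 256) -> int:
    """Estimate the number of distinct k-mers with O(t) memory (KMV sketch)."""
    if t < 2:
        raise ValueError("t must be at least 2")
    heap = []        # max-heap (values negated) of the t smallest hashes seen
    members = set()  # hashes currently in the heap, so repeats are ignored
    for read in reads:
        for i in range(len(read) - k + 1):
            x = _hash64(read[i:i + k])
            if x in members:
                continue
            if len(heap) < t:
                heapq.heappush(heap, -x)
                members.add(x)
            elif x < -heap[0]:
                # Push the new hash and evict the current largest one.
                removed = -heapq.heappushpop(heap, -x)
                members.discard(removed)
                members.add(x)
    if len(heap) < t:
        return len(heap)  # fewer than t distinct hashes: exact up to collisions
    u = -heap[0] / 2**64  # t-th smallest hash, normalized to [0, 1)
    return round((t - 1) / u)
```

The relative error shrinks roughly as 1/sqrt(t), so `t` directly trades memory for accuracy.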

A fast reusable k-mer toolkit capable of running on multiple platforms. KAnalyze is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted…

A method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Turtle is designed to minimize…

Identifies all the k-mers that occur more than once in a DNA sequence data set. BFCounter does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly…
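
The Bloom-filter idea can be sketched in two passes. In the first pass, a k-mer seen for the first time is only recorded in the Bloom filter. A k-mer the filter reports as already present (a true repeat or a false positive) is promoted to an explicit hash table. The second pass counts the promoted k-mers exactly, and candidates with count 1 (the false positives) are dropped. This is our reading of the approach, not BFCounter's code. The hash scheme, filter size and names are assumptions.

```python
import hashlib


class BloomFilter:
    """Bit array with n_hashes positions per item (double hashing over SHA-256)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def count_repeated_kmers(reads, k: int, n_bits: int = 1 << 20, n_hashes: int = 3) -> dict:
    """Return exact counts for k-mers occurring more than once (reads must be re-iterable)."""
    bloom, candidates = BloomFilter(n_bits, n_hashes), {}
    for read in reads:  # pass 1: singletons stay only in the Bloom filter
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in bloom:
                candidates[kmer] = 0  # seen before, or a Bloom false positive
            else:
                bloom.add(kmer)
    for read in reads:  # pass 2: exact counts for candidates only
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in candidates:
                candidates[kmer] += 1
    return {kmer: c for kmer, c in candidates.items() if c > 1}
```

The saving comes from singletons, which in error-prone sequencing data are often the majority of distinct k-mers. They cost only a few bits each in the filter instead of a full hash-table entry.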

A streaming algorithm for k-mer counting which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all…
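
A common way to bound memory in this kind of approach is to let each k-mer's hash choose both an iteration (a pass over the input) and a partition within that pass. In each pass, only that pass's k-mers are written out. Each partition is then counted separately, so memory is governed by partition size rather than by the total number of distinct k-mers. The sketch below assumes that scheme and keeps partitions as in-memory lists where a disk-based tool would use files. It does not reproduce the trade-off formulas of the algorithm described above. CRC32 and `partitioned_count` are illustrative choices.

```python
import zlib
from collections import Counter


def partitioned_count(reads, k: int, n_iters: int, n_parts: int):
    """Yield (kmer, count) for every distinct k-mer using hash partitioning.

    Each k-mer belongs to exactly one (iteration, partition) pair, so every
    k-mer is reported once, with its full count.
    """
    for it in range(n_iters):
        partitions = [[] for _ in range(n_parts)]  # would be files on disk
        for read in reads:  # one full pass over the input per iteration
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                h = zlib.crc32(kmer.encode())
                if h % n_iters == it:
                    partitions[(h // n_iters) % n_parts].append(kmer)
        for bucket in partitions:  # count one partition at a time
            yield from Counter(bucket).items()
```

More iterations mean more passes over the input but less temporary disk space per pass. More partitions mean smaller in-memory tables. Together they give the kind of memory, time and disk trade-off mentioned above.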

A utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g.,…

A flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much…
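
In a suffix array, all suffixes that begin with the same k-mer are adjacent. In an enhanced suffix array, the longest-common-prefix (LCP) table marks exactly where one such run ends and the next begins. Counting k-mers then reduces to a single scan. The sketch below shows this for one sequence using a naive suffix-array construction (sorting suffixes directly). It is an illustration of the data structure, not Tallymer's implementation, and the function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i]."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = text[sa[i - 1]:], text[sa[i]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[i] = n
    return lcp


def count_kmers_sa(text: str, k: int) -> dict:
    """Count k-mers by scanning runs of suffixes that share a k-prefix."""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    counts = {}
    for i, pos in enumerate(sa):
        if len(text) - pos < k:
            continue  # suffixes shorter than k contain no k-mer
        kmer = text[pos:pos + k]
        if i > 0 and lcp[i] >= k:
            counts[kmer] += 1  # same k-prefix as the previous suffix
        else:
            counts[kmer] = 1   # first suffix of a new run
    return counts
```

Because the index does not depend on k, the same suffix and LCP arrays can be reused to count k-mers for any k without rebuilding anything.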