Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads. Although simple in principle,…
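
To make the definition above concrete, here is a minimal in-memory sketch of k-mer counting. The function name `count_kmers` is a hypothetical helper, not taken from any tool listed here, and this naive approach ignores the memory and speed concerns that the tools below are designed to address.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring of `sequence` (illustrative helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, count)
```

Holding every k-mer in one dictionary works for short sequences only; for whole-genome read sets the table quickly outgrows memory, which is the problem the approaches listed below try to solve in different ways.
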
Jellyfish
Desktop

A k-mer counting algorithm. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data…

SAG-QC
Desktop

Identifies and excludes non-target sequences independent of database. SAG-QC calculates the probability that a sequence was derived from contaminants by comparing k-mer compositions with the no…

Needletail
Desktop

Provides a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail is a minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust. The…

ntCard
Desktop

Estimates the frequencies of k-mers in genomics datasets. ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to…

ntHash
Desktop

A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer, without repeating the whole procedure for each value. To do…
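
The idea of hashing a stream of k-mers without rehashing each one from scratch can be illustrated with a generic cyclic-polynomial rolling hash. This is only a sketch of the general technique: the seed values, the function names, and the 64-bit word size are assumptions made for the example and are not ntHash's actual constants or API.

```python
MASK = (1 << 64) - 1

# Arbitrary 64-bit seeds chosen for illustration; NOT ntHash's constants.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r):
    """Rotate the 64-bit integer x left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def direct_hash(kmer):
    """Hash a k-mer from scratch in O(k) time."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if k <= 0 or len(seq) < k:
        return
    h = direct_hash(seq[:k])
    yield h
    for i in range(k, len(seq)):
        # Rotate the old hash, cancel the outgoing base, add the incoming one.
        h = rol(h, 1) ^ rol(SEED[seq[i - k]], k) ^ SEED[seq[i]]
        yield h
```

Because rotation distributes over XOR, each rolled value equals the value `direct_hash` would compute for the same window, so the stream costs one O(k) hash followed by constant-time updates.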

KAT (K-mer Analysis Toolkit)
Desktop

A user-friendly, extendible and scalable toolkit for rapidly counting, comparing and analysing k-mers from various data sources. The tools in KAT assist the user with a wide range of tasks including…

FastGT
Desktop

A computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants…

KCMBT (k-mer Counter based on Multiple Burst Trees)
Desktop

A very fast single-threaded k-mer counting algorithm. KCMBT uses cache efficient burst tries to store k-mers. A burst trie is a trie in which a full node is split into multiple nodes to make space…

MSPKmerCounter
Desktop

A disk-based approach to efficiently perform k-mer counting for large genomes using a small amount of memory. This approach is based on a novel technique called Minimum Substring Partitioning (MSP).…
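
The partitioning idea can be sketched as follows, under the assumption that each k-mer is assigned to a bucket keyed by its lexicographically smallest substring of length p. The function names and the in-memory dictionary standing in for on-disk partition files are illustrative choices, not MSPKmerCounter's implementation.

```python
from collections import Counter, defaultdict


def minimizer(kmer, p):
    """Return the lexicographically smallest length-p substring of kmer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def partition_kmers(reads, k, p):
    """Group k-mers into buckets keyed by their minimum p-substring."""
    partitions = defaultdict(list)  # stand-in for one file per partition
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            partitions[minimizer(kmer, p)].append(kmer)
    return partitions


def count_by_partition(reads, k, p):
    """Count k-mers one partition at a time, then merge the results."""
    totals = Counter()
    for bucket in partition_kmers(reads, k, p).values():
        # Identical k-mers always share a bucket, so each bucket can be
        # counted independently with a small table.
        totals.update(Counter(bucket))
    return totals
```

Since every copy of a given k-mer lands in the same bucket, only one bucket's table needs to be in memory at a time, which is what allows a small memory footprint in a disk-based setting.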

KmerStream
Desktop

A streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input and the space requirements are…

KAnalyze
Desktop

A fast reusable k-mer toolkit capable of running on multiple platforms. KAnalyze is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted…

Turtle
Desktop

A method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Turtle is designed to minimize…

BFCounter
Desktop

Identifies all the k-mers that occur more than once in a DNA sequence data set. BFCounter does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly…
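
A rough sketch of how a Bloom filter can screen out k-mers seen only once is shown below. The class, the SHA-256-based hash positions, the filter size, and the exact second pass are all assumptions made for this example; they illustrate the general technique and are not BFCounter's code.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative defaults."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_and_check(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def repeated_kmers(reads, k, bloom=None):
    """Return exact counts of k-mers that occur at least twice in reads."""
    if bloom is None:
        bloom = BloomFilter()
    candidates = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Only k-mers the filter has (probably) seen before are stored.
            if bloom.add_and_check(kmer):
                candidates.add(kmer)
    # Second pass: exact counts remove the filter's false positives.
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in candidates:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= 2}
```

The filter never reports a previously inserted k-mer as absent, so every truly repeated k-mer reaches the candidate set. Singletons mostly stay out of the explicit table, and the few that slip in as false positives are dropped after the exact recount.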

DSK (Disk Streaming of K-mers)
Desktop

A streaming algorithm for k-mer counting which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all…

KMC (K-Mer Counter)
Desktop

A utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g.,…

TALLYMER
Desktop

A flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much…
