k-mer software tools | Whole-genome sequencing data analysis

Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads.…
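
To make the definition above concrete, here is a minimal Python sketch of exact k-mer counting with a sliding window. The function name `count_kmers` is hypothetical and is not taken from any tool listed on this page; the tools below use far more memory-efficient strategies.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence of length n has n - k + 1 overlapping k-mers
    # (none when n < k, because the range is then empty).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ATATAT", 2).items()):
        print(kmer, n)
```

A plain in-memory dictionary like this is only practical for small inputs, which is the gap the specialized counters in this list aim to fill.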
KMC (K-Mer Counter)
Desktop

A utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g.,…

Jellyfish
Desktop

A k-mer counting algorithm. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data…
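
One plausible reading of the 31-base limit is that each base can be packed into 2 bits, so a 31-mer needs 62 bits and fits in a single 64-bit word. The sketch below illustrates that packing under this assumption; `encode_kmer` and `decode_kmer` are hypothetical names, and the code is not Jellyfish's implementation.

```python
# 2-bit code per nucleotide (an illustrative convention, assumed here).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base, first base highest."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def decode_kmer(value: int, k: int) -> str:
    """Unpack an integer produced by encode_kmer back into k bases."""
    return "".join(
        BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k)
    )


if __name__ == "__main__":
    packed = encode_kmer("ACGT")
    print(packed, decode_kmer(packed, 4))
```

With this layout, integer keys can be used directly in a hash table instead of strings.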

KAnalyze
Desktop

A fast reusable k-mer toolkit capable of running on multiple platforms. KAnalyze is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted…

ntCard
Desktop

Estimates the frequencies of k-mers in genomics datasets. ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to…
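
Hashing a stream of k-mers efficiently usually relies on a rolling hash, where each new value is derived from the previous one instead of rehashing all k bases. The sketch below shows a generic cyclic-polynomial rolling hash in Python. The per-base seed constants and function names are arbitrary choices made for illustration; they are not the constants or the API of ntHash.

```python
MASK = (1 << 64) - 1

# Arbitrary 64-bit seeds, one per base (illustrative, NOT ntHash's values).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xBF58476D1CE4E5B9,
    "G": 0x94D049BB133111EB,
    "T": 0xD6E8FEB86659FD93,
}


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer: str) -> int:
    """Hash one k-mer from scratch in O(k)."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(seq: str, k: int):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        # Rotate, cancel the base leaving the window, add the entering base.
        h = rol(h, 1) ^ rol(SEED[seq[i - k]], k) ^ SEED[seq[i]]
        yield h
```

The update is valid because rotation distributes over XOR, so the contribution of the outgoing base can be removed exactly.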

TALLYMER
Desktop

A flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much…

DSK (Disk Streaming of K-mers)
Desktop

A streaming algorithm for k-mer counting which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all…
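
The memory and time trade-off described above can be illustrated with a simplified partitioned count: the input is scanned several times, and each pass only counts the k-mers whose hash falls in the current partition, so only one partition is held in memory at once. This is a sketch under that assumption, not DSK's actual algorithm (which also writes partitions to disk); `count_in_passes` is a hypothetical name and CRC32 is an arbitrary hash choice.

```python
import zlib
from collections import Counter


def count_in_passes(reads, k, n_passes):
    """Yield (kmer, count) pairs, holding one hash partition at a time.

    `reads` must be re-iterable (for example a list), because it is
    scanned once per pass.
    """
    for p in range(n_passes):
        partition = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                # Each k-mer belongs to exactly one partition.
                if zlib.crc32(kmer.encode()) % n_passes == p:
                    partition[kmer] += 1
        yield from partition.items()
```

More passes lower the peak memory but require re-reading the input more often.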

Turtle
Desktop

A method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Turtle is designed to minimize…

PgSA (Pseudogenome Suffix Array)
Desktop

Aims to index and query collections of next-generation sequencing (NGS) reads data in main memory. PgSA contains analysis of reads, error correction and variant calling from RNA-seq experiments. This…

ntHash
Desktop

A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer, without repeating the whole procedure for each value. To do…

KAT (K-mer Analysis Toolkit)
Desktop

A user-friendly, extendible and scalable toolkit for rapidly counting, comparing and analysing k-mers from various data sources. The tools in KAT assist the user with a wide range of tasks including…

KCMBT (k-mer Counter based on Multiple Burst Trees)
Desktop

A very fast single-threaded k-mer counting algorithm. KCMBT uses cache efficient burst tries to store k-mers. A burst trie is a trie in which a full node is split into multiple nodes to make space…

microTaboo
Desktop

Allows efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. microTaboo is able to identify mutations, inversions, and insertions, even in…

SAG-QC
Desktop

Identifies and excludes non-target sequences independent of database. SAG-QC calculates the probability that a sequence was derived from contaminants by comparing k-mer compositions with the no…

BFCounter
Desktop

Identifies all the k-mers that occur more than once in a DNA sequence data set. BFCounter does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly…
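
The idea of using a Bloom filter to find repeated k-mers can be sketched as follows: a k-mer is only promoted to the exact table when the filter reports it as already seen, so most singletons never occupy a table entry. This is a minimal illustration under assumed parameters, not BFCounter's code; the class, the function names, and the BLAKE2-based hashing are all choices made for this example. A second exact pass removes Bloom filter false positives.

```python
import hashlib
from collections import Counter


class BloomFilter:
    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting BLAKE2b differently.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item: str) -> bool:
        """Insert item; return True if it was (possibly) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def kmers(reads, k):
    for read in reads:
        for i in range(len(read) - k + 1):
            yield read[i:i + k]


def repeated_kmers(reads, k, n_bits=1 << 20, n_hashes=3):
    """Return exact counts of k-mers that occur more than once."""
    bloom = BloomFilter(n_bits, n_hashes)
    # Pass 1: k-mers the filter has seen before become candidates.
    candidates = {km for km in kmers(reads, k) if bloom.add(km)}
    # Pass 2: count candidates exactly and drop false positives.
    counts = Counter(km for km in kmers(reads, k) if km in candidates)
    return {km: c for km, c in counts.items() if c > 1}
```

Because the filter can report false positives but never false negatives, every truly repeated k-mer is guaranteed to reach the candidate set.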

Gerbil
Desktop

Counts k-mers for k ≥ 32. Gerbil is the result of an intensive process of algorithm engineering. It loads genome reads from disk and redistributes them to temporary files. The tool counts the k-mers…

Squeakr
Desktop

Counts and queries k-mers. Squeakr is based on the counting quotient filter (CQF). It supports fast queries and dynamic k-mer insertion, deletion, and modification. The tool offers competitive…

FastGT
Desktop

Allows whole-genome genotyping of genome variants from raw sequencing reads. FastGT directly genotypes known variants from next-generation sequencing (NGS) data by counting unique k-mers.…

SRC (Short Read Connector)
Desktop

Estimates the number of occurrences of a read in a read set and proposes a list of similar reads between sets. SRC is composed of two algorithms: SRC_counter and SRC_linker. They can connect any read…

KmerStream
Desktop

A streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input and the space requirements are…

Needletail
Desktop

Provides a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail is a minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust. The…

MSPKmerCounter
Desktop

A disk-based approach, to efficiently perform k-mer counting for large genomes using a small amount of memory. This approach is based on a novel technique called Minimum Substring Partitioning (MSP).…
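
As a rough illustration of partitioning by minimum substring, the sketch below assigns each k-mer to a bucket keyed by its lexicographically smallest substring of length p. Adjacent k-mers often share that substring, so they tend to land in the same bucket. The function names and the parameter `p` are assumptions made for this example; a real MSP implementation also merges consecutive k-mers that share a minimum substring into longer strings before writing them to disk, which this sketch omits.

```python
from collections import defaultdict


def minimizer(kmer: str, p: int) -> str:
    """Return the lexicographically smallest length-p substring of kmer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def partition_by_minimizer(read: str, k: int, p: int) -> dict:
    """Group the k-mers of a read by their minimum length-p substring."""
    parts = defaultdict(list)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        parts[minimizer(kmer, p)].append(kmer)
    return dict(parts)


if __name__ == "__main__":
    for key, group in partition_by_minimizer("GTATCGCTA", 5, 2).items():
        print(key, group)
```

Each bucket can then be counted independently, which keeps the working set small.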

