k-mer software tools | Whole-genome sequencing data analysis

Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including for genome and transcriptome assembly, for metagenomic sequencing, and for error correction of sequence reads.…
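To make the definition concrete, here is a minimal Python sketch of exact k-mer counting with an in-memory dictionary. The function name `count_kmers` and the decision to skip windows containing `N` are assumptions made for illustration; the tools listed on this page use far more memory-efficient strategies than this.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every substring of length k in a DNA sequence.

    Hypothetical helper for illustration: windows containing the
    ambiguous base N are skipped, which is one common convention.
    """
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    # In "ATATAT" with k = 2, AT occurs 3 times and TA occurs 2 times.
    for kmer, n in sorted(count_kmers("ATATAT", 2).items()):
        print(kmer, n)
```

A sequence of length n yields at most n - k + 1 k-mers, so memory use grows with the number of distinct k-mers rather than with k itself.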
seer
Desktop

seer Sequence Element EnRichment

Identifies sequence elements. seer can detect associations with antibiotic resistance caused by both the presence of a gene and single-nucleotide polymorphisms (SNPs) in coding regions, as well as…

KMC
Desktop

KMC K-Mer Counter

A utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g.,…

Jellyfish
Desktop

Jellyfish

A k-mer counting algorithm. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data…

KAnalyze
Desktop

KAnalyze

A fast reusable k-mer toolkit capable of running on multiple platforms. KAnalyze is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted…

ntCard
Desktop

ntCard

Estimates the frequencies of k-mers in genomics datasets. ntCard uses the ntHash algorithm to efficiently compute hash values for streamed sequences. It then samples the calculated hash values to…

TALLYMER
Desktop

TALLYMER

A flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives a much…

DSK
Desktop

DSK Disk Streaming of K-mers

A streaming algorithm for k-mer counting which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all…

Turtle
Desktop

Turtle

A method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Turtle is designed to minimize…

PgSA
Desktop

PgSA Pseudogenome Suffix Array

Aims to index and query collections of next-generation sequencing (NGS) read data in main memory. PgSA contains analysis of reads, error correction and variant calling from RNA-seq experiments. This…

ntHash
Desktop

ntHash

A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer, without repeating the whole procedure for each value. To do…
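The general idea behind hashing streamed sequences efficiently is a rolling hash: the hash of each k-mer is derived from the hash of the previous one in constant time. The sketch below is a generic cyclic-polynomial rolling hash in Python, not ntHash's actual implementation; the per-base seed constants and the function names `direct_hash` and `rolling_hashes` are arbitrary assumptions chosen for illustration.

```python
MASK = (1 << 64) - 1

# Arbitrary 64-bit constants, one per base (illustrative, not ntHash's seeds).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def direct_hash(kmer):
    """Hash a k-mer from scratch in O(k) time."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(sequence, k):
    """Yield the hash of every k-mer, updating in O(1) per position."""
    if len(sequence) < k:
        return
    h = direct_hash(sequence[:k])
    yield h
    for i in range(k, len(sequence)):
        outgoing, incoming = sequence[i - k], sequence[i]
        # Rotate the old hash, cancel the base leaving the window,
        # then mix in the base entering it.
        h = rol(h, 1) ^ rol(SEED[outgoing], k) ^ SEED[incoming]
        yield h


if __name__ == "__main__":
    seq, k = "GTATCGCTA", 4
    rolled = list(rolling_hashes(seq, k))
    direct = [direct_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    print(rolled == direct)
```

Because rotation distributes over XOR, the rolled value equals the hash computed from scratch, which the usage example checks for every window.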

KAT
Desktop

KAT K-mer Analysis Toolkit

A user-friendly, extendible and scalable toolkit for rapidly counting, comparing and analysing k-mers from various data sources. The tools in KAT assist the user with a wide range of tasks including…

KCMBT
Desktop

KCMBT k-mer Counter based on Multiple Burst Trees

A very fast single-threaded k-mer counting algorithm. KCMBT uses cache efficient burst tries to store k-mers. A burst trie is a trie in which a full node is split into multiple nodes to make space…

microTaboo
Desktop

microTaboo

Allows efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. microTaboo is able to identify mutations, inversions, and insertions, even in…

SAG-QC
Desktop

SAG-QC

Identifies and excludes non-target sequences independent of database. SAG-QC calculates the probability that a sequence was derived from contaminants by comparing k-mer compositions with the no…

BFCounter
Desktop

BFCounter

Identifies all the k-mers that occur more than once in a DNA sequence data set. BFCounter does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly…
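The Bloom-filter idea can be illustrated with a small Python sketch. This is not BFCounter's code: the class `BloomFilter`, the function `repeated_kmers`, the filter size, and the use of an exact second pass to discard false positives are all assumptions made for this example. A k-mer is only promoted to the exact table once the filter reports it as already seen, so k-mers that occur once mostly never reach the table.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter using double hashing (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def kmers(read, k):
    """Yield each k-mer of a read in order."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def repeated_kmers(reads, k):
    """Return exact counts of k-mers occurring more than once.

    `reads` must be a list, because it is traversed twice.
    """
    seen = BloomFilter()
    candidates = {}
    # Pass 1: a k-mer becomes a candidate when the filter says
    # it was seen before (possibly a false positive).
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                candidates[kmer] = 0
            else:
                seen.add(kmer)
    # Pass 2: count candidates exactly, then drop false positives.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in candidates:
                candidates[kmer] += 1
    return {kmer: n for kmer, n in candidates.items() if n > 1}


if __name__ == "__main__":
    print(repeated_kmers(["ACGTACGT", "TTACGG"], 4))
```

The filter never reports a previously added k-mer as absent, so every k-mer that truly repeats becomes a candidate; the second pass only removes the occasional singleton admitted by a false positive.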

findGSE
Desktop

findGSE

Estimates the size of (heterozygous diploid or homozygous) genomes. findGSE is a sophisticated method for k-mer-based genome size estimation (GSE), which relies on a mixture model by fitting k-mer frequencies iteratively using a…

skm-tools
Desktop

skm-tools

Includes several tools for counting and intersecting skip-mers between different datasets. skm-tools is a suite of two tools: (1) skm-count, which counts the number of occurrences…

Gerbil
Desktop

Gerbil

Counts k-mers for k ≥ 32. Gerbil is the result of an intensive process of algorithm engineering. It loads genome reads from disk and redistributes them to temporary files. The tool counts the k-mers…

Squeakr
Desktop

Squeakr

Counts and queries k-mers. Squeakr is based on the counting quotient filter (CQF). It supports fast queries and dynamic k-mer insertion, deletion, and modification. The tool offers competitive…

FastGT
Desktop

FastGT

Allows whole-genome genotyping of genome variants from raw sequencing reads. FastGT directly genotypes known variants from next-generation sequencing (NGS) data by counting unique k-mers.…

SRC
Desktop

SRC Short Read Connector

Estimates the number of occurrences of a read in a read set and proposes a list of similar reads between sets. SRC is composed of two algorithms: SRC_counter and SRC_linker. They can connect any read…

KmerStream
Desktop

KmerStream

A streaming algorithm for estimating the number of distinct k-mers present in high-throughput sequencing data. The algorithm runs in time linear in the size of the input, and the space requirements are…

Gk-arrays
Desktop

Gk-arrays

Provides a data structure to index the k-mers of reads. Gk-arrays is an algorithm that builds a data structure to index reads. This structure is kept in main memory once built and repeatedly accessed to…

Needletail
Desktop

Needletail

Provides a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail is a minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust. The…

MSPKmerCounter
Desktop

MSPKmerCounter

A disk-based approach to efficiently perform k-mer counting for large genomes using a small amount of memory. This approach is based on a novel technique called Minimum Substring Partitioning (MSP).…
