Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers.

(Melsted and Pritchard, 2011) Efficient counting of k-mers in DNA sequences using a bloom filter. BMC Bioinformatics.
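
To make the definition concrete, here is a minimal sketch of the naive approach: slide a window of length k along a sequence and tally each substring in an in-memory table. This is not the Bloom filter method of the cited paper. The function name `count_kmers` and the example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring of `sequence` using a sliding window.

    Illustrative only: the table holds one entry per distinct k-mer,
    which is what makes this approach memory-hungry on large data sets.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 windows (none if n < k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

The table can grow to as many as min(4^k, n - k + 1) entries for a sequence of length n over the four-letter DNA alphabet, so the number of distinct k-mers, rather than the counting logic itself, drives memory use.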

