MHAP specifications


Unique identifier OMICS_13515
Alternative name MinHash Alignment Process
Software type Pipeline/Workflow
Interface Command line interface
Restrictions to use None
Input data Long read sequences, a tab-delimited file of repetitive k-mers
Input format FASTA, DAT
Output data Overlaps in a format similar to BLASR’s M4 format
Biological technology Pacific Biosciences
Operating system Unix/Linux, Mac OS, Windows
Programming languages Java
License Apache License version 2.0
Computer skills Advanced
Version 2.1.1
Stability Stable
Requirements JDK, Apache Maven
Maintained Yes




Publication for MinHash Alignment Process

MHAP citations


The long reads ahead: de novo genome assembly using the MinION

PMCID: 5770995
PMID: 29375809
DOI: 10.5256/f1000research.12992.r24090

[…] e reads. Like PBcR, Canu was shown to successfully assemble both MinION and PacBio reads. The pipeline includes three stages: correction, trimming and assembly. Overlaps are found using the efficient minhash alignment process (MHAP), which hashes k-mers using different hash functions and, for each hash function, stores the smallest integer to which a k-mer of the sequence is hashed. Comparing the h […]
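
The hashing scheme described in this excerpt can be illustrated with a minimal Python sketch. It is not MHAP's actual implementation: the function names, the choice of salted BLAKE2b as the family of hash functions, and the default values of k and the number of hashes are all assumptions made for illustration. Each sequence is reduced to a sketch holding, per hash function, the smallest hash value over its k-mers; the fraction of sketch positions on which two sequences agree then estimates the Jaccard similarity of their k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer, seed):
    """Hash a k-mer to a 64-bit integer.

    Assumption: a different salt per seed stands in for
    "different hash functions".
    """
    digest = hashlib.blake2b(
        kmer.encode("ascii"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(seq, k=16, num_hashes=128):
    """For each hash function, keep the smallest hash over all k-mers."""
    kms = kmers(seq, k)
    if not kms:
        raise ValueError("sequence is shorter than k")
    return [min(hash_kmer(km, seed) for km in kms)
            for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions where both minima agree."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Two reads that share many k-mers tend to share many minima, so a high estimate flags a candidate overlap without aligning the full sequences. More hash functions give a tighter estimate at the cost of a larger sketch.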


Alignment free sequence comparison: benefits, applications, and tools

Genome Biol
PMCID: 5627421
PMID: 28974235
DOI: 10.1186/s13059-017-1319-7

[…] nopore) provides an opportunity to study new genomes with unprecedented speed and quality. However, the noisy nature of sequencing data demands dedicated solutions to access more complex genomes. The MinHash Alignment Process was designed for this task employing probabilistic, locality-sensitive hashing. Integration of the MinHash Alignment Process with the Celera Assembler enabled reference-grade […]


The Complete Genome Sequence of the Phytopathogenic Fungus Sclerotinia sclerotiorum Reveals Insights into the Genome Architecture of Broad Host Range Pathogens

Genome Biol Evol
PMCID: 5381539
PMID: 28204478
DOI: 10.1093/gbe/evx030

[…] A de novo genome assembly of S. sclerotiorum strain 1980 was generated using MHAP version 1.5b1 with default settings. To assess contiguity of the assembled sequences, they were aligned to the previously generated optical map with MapSolver version 3.2 (OpGen, Gaithersburg, […]


Single Molecule Sequencing of the Drosophila serrata Genome

PMCID: 5345708
PMID: 28143951
DOI: 10.1534/g3.116.037598

[…] 136,119 filtered subreads with a mean read length of 8840 bp and an N50 of 12,220 bp (Supplemental Material, Figure S1). The PacBio genome was assembled using the PBcR pipeline, which implements the MHAP algorithm within the Celera Assembler (version 8.3rc2), and polished with Quiver (GenomicConsensus version 0.9.2 and ConsensusCore version 0.8.8) in three steps: (1) errors were corrected […]


Innovations and challenges in detecting long read overlaps: an evaluation of the state of the art

PMCID: 5408847
PMID: 28003261
DOI: 10.1093/bioinformatics/btw811

[…] Minimap is an overlapper/mapping tool that combines concepts from many of its predecessors, such as DALIGNER (k-mer sorting for cache efficiency), MHAP (computing minimizers) and GraphMap (clustering collinear chains of matching seeds). Minimap subsamples the hashed k-mer space by computing minimizers, and compiles the corresponding k-mers along […]
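
The minimizer subsampling mentioned in this excerpt can be sketched as follows. This is a simplified illustration, not code from Minimap or MHAP: the function name and the parameters k and w are hypothetical, and k-mers are ordered lexicographically for readability, whereas the excerpt describes subsampling a hashed k-mer space. For every window of w consecutive k-mers, only the smallest k-mer is kept, so neighbouring windows often select the same k-mer and the index shrinks.

```python
def minimizers(seq, k=5, w=4):
    """Return sorted (position, k-mer) pairs.

    One pair is selected per window of w consecutive k-mers: the
    lexicographically smallest k-mer, with ties broken by the
    leftmost position. Lexicographic order is an assumption made
    for readability; a hash-based order could be substituted.
    """
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kms[i], i))
        picked.add((best, kms[best]))
    return sorted(picked)


# "ACGTTGCA" has six 3-mers; four of them are selected as minimizers.
print(minimizers("ACGTTGCA", k=3, w=3))
# prints [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (5, 'GCA')]
```

Because two sequences that share a long enough exact substring select the same minimizers inside it, comparing minimizer sets finds shared seeds while storing only a fraction of all k-mers.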


A New Chicken Genome Assembly Provides Insight into Avian Genome Structure

PMCID: 5217101
PMID: 27852011
DOI: 10.1534/g3.116.035923

[…] Sequences were assembled using the PBcR-MHAP algorithm version 8.2. PBcR-MHAP assemblies were derived from a total of 18.7 Gb corrected sequence data, which is estimated to represent ∼15.3× coverage, based upon the size of the Gallus_gal […]


MHAP institution(s)
Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA; Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA; Invincea Labs, Arlington, VA, USA; National Biodefense Analysis and Countermeasures Center, Frederick, MD, USA; Pacific Biosciences of California, Inc., Menlo Park, CA, USA
MHAP funding source(s)
This work was partly funded under Agreement No. HSHQDC-07-C-00020 awarded by the Department of Homeland Security Science and Technology Directorate (DHS/S&T) for the management and operation of the National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and Development Center.

