CheckM pipeline

CheckM specifications

Information


Unique identifier: OMICS_08837
Name: CheckM
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
License: GNU General Public License version 2.0
Computer skills: Advanced
Version: 0.9.7
Stability: Stable
Maintained: Yes

Maintainer


  • Donovan H. Parks

CheckM citations (9)
2018
PMCID: 5946040

[…] trimmed using Trimmomatic version 0.32 (5) and assembled into contigs with SPAdes version 3.9.0 (6), applying the error correction and the single-cell mode. Genome completeness was estimated using CheckM (7). The SAG is composed of 132 contigs totaling 1.42 Mbp (N50, 74.2 kbp; 720× coverage), with a G+C content of 35.5%. The SAG is of high quality, at 94.6% completeness (∼4.6% contamination), […]

2017
PMCID: 5240137

[…] which included BLAST similarity checks and identification of outlying contigs based on tetranucleotide frequencies. As previously, genome completeness and contamination was determined using CheckM [77] and the assemblies were annotated using RAST [78]. The “core” Endozoicomonas genome (i.e., genes present in all genomes) was determined by clustering high quality proteins (greater than 10 […]

2017
PMCID: 5282799

[…] the process of adapter removal, quality trimming, error-correction, and contig generation [37, 38]. The completeness and contamination of the assemblies was estimated using PhyloSift [22] and CheckM [39]. Across all strains, genome completeness was determined to be a minimum of 98.9%, and the maximum contamination was 0.99% (Additional file 1). Isolates were predominantly annotated using […]

2017
PMCID: 5328954

[…] output was converted to BAM format, sorted and indexed using SAMtools 0.1.19 (Li et al., 2009) and visualized in the Integrative Genomics Viewer 2.3.18 (Thorvaldsdóttir et al., 2013). Additionally, CheckM 0.9.7 (Parks et al., 2014) was used to assess the completeness and heterogeneity of the genomes. Visualization of the synteny, GC skew and genomic islands was accomplished using Circos 0.64 […]

2017
PMCID: 5401914

[…] 76 and GenBank version 213 (Supplementary Table S1). Genomes were assessed for completeness and contamination by scoring the presence of conserved single-copy marker genes within each genome using CheckM (Parks et al., 2015). The median estimated genome completeness for this dataset is 99.4% and the minimum is 81.9%. Genomes were estimated to be less than 10% contaminated, with all but eight […]

CheckM institution(s)
Australian Centre for Ecogenomics, School of Chemistry & Molecular Biosciences, The University of Queensland, St. Lucia, Queensland, Australia; Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Queensland, Australia; Advanced Water Management Centre, The University of Queensland, St. Lucia, Queensland, Australia

CheckM review


Sarah Turner

I have found this tool extremely useful for evaluating bins created by various binning tools (MetaBAT, MaxBin, and MyCC are the ones I've used the most). It's very good for getting a preliminary overview of how "good" the bins are, and sometimes for assigning bins to specific taxa. (I'm working with fairly complex environmental samples, so my bins are often fairly incomplete or contaminated and cannot be assigned at the family, genus, or species level.)
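
To make "incomplete" and "contaminated" concrete, here is a minimal Python sketch of how such scores can be derived from single-copy marker genes. It is a simplified illustration, not CheckM's actual method: it weights every marker equally, whereas CheckM works with lineage-specific, collocated marker sets. The function name `estimate_quality` and the marker labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def estimate_quality(marker_hits: Iterable[str],
                     expected_markers: Iterable[str]) -> Tuple[float, float]:
    """Return (completeness %, contamination %) for one bin.

    Simplified assumption: each expected single-copy marker counts equally.
    A marker seen at least once adds to completeness; every extra copy of
    a marker adds to contamination.
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("expected_markers must not be empty")

    # Count only hits that belong to the expected marker set.
    counts = Counter(m for m in marker_hits if m in expected)

    present = sum(1 for m in expected if counts[m] >= 1)
    extra_copies = sum(counts[m] - 1 for m in expected if counts[m] > 1)

    completeness = 100.0 * present / len(expected)
    contamination = 100.0 * extra_copies / len(expected)
    return completeness, contamination


if __name__ == "__main__":
    # Hypothetical bin: marker m3 is missing and m2 appears twice.
    markers = ["m1", "m2", "m3", "m4"]
    hits = ["m1", "m2", "m2", "m4"]
    print(estimate_quality(hits, markers))
```

In this toy case three of four markers are found (75% complete) and one marker is duplicated (25% contaminated), which mirrors the kind of numbers reported for real bins.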

Installation can be somewhat challenging, because the tool has a fair number of dependencies that need to be installed as well. Ultimately, I chose to install it on a virtual machine running a 64-bit version of Ubuntu; I could perform most of the basic functions there, but didn't have enough RAM for some of the more computationally intensive functions (for example, tetra, which calculates the tetranucleotide signatures).
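
For readers unfamiliar with tetranucleotide signatures, the following Python sketch shows the basic idea: slide a 4-base window along a sequence and record the relative frequency of each of the 256 possible tetranucleotides. This is an illustrative assumption about the general technique, not CheckM's implementation of `tetra`; the function name `tetra_signature` is hypothetical, and the sketch does not merge reverse complements.

```python
from collections import Counter
from itertools import product
from typing import Dict


def tetra_signature(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 tetranucleotides in seq.

    Windows containing anything other than A, C, G or T (for example N)
    are skipped. If no valid window exists, all frequencies are 0.0.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    valid = set(kmers)

    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in valid:
            counts[kmer] += 1

    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


if __name__ == "__main__":
    signature = tetra_signature("ACGTACGT")
    # Show only the tetranucleotides that actually occur.
    print({k: v for k, v in signature.items() if v > 0})
```

Contigs from the same genome tend to have similar signatures, which is why such vectors are useful for spotting outlying contigs in a bin.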

I find the "bin_qa_plot" very useful for assessing bins at a glance, and there are a variety of other useful graphical outputs for looking more closely at the coverage and composition of individual bins. The tool also offers a number of ways to modify and refine bins manually, and the combination has the potential to be very powerful in recovering more complete draft genomes from metagenomes.

The tool can also be used to evaluate the results of single-cell genomics or genomes recovered from isolates, so it has broader applications than just metagenomics.