gkmSVM statistics


Citations per year

Number of citations per year for the bioinformatics software tool gkmSVM

Tool usage distribution map

This map represents all the scientific publications referring to gkmSVM per scientific context

Associated diseases


Popular tool citations

Enhancer prediction

gkmSVM specifications


Unique identifier OMICS_20453
Name gkmSVM
Alternative names gapped-kmer-SVM, gkmSVM-R
Software type Package/Module
Interface Command line interface
Restrictions to use None
Input format BED, FORMAT
Operating system Unix/Linux, Mac OS, Windows
Programming languages C++, R
License GNU General Public License version 3.0
Computer skills Advanced
Version 0.79.0
Stability Stable
Maintained Yes







  • Michael A. Beer
  • Mahmoud Ghandi

Additional information

The R package is available on CRAN: https://cran.r-project.org/web/packages/gkmSVM/index.html. A C++ implementation is also available at http://www.beerlab.org/gkmsvm/downloads/gkmsvm-2.0.tar.gz.

Publications for gapped-kmer-SVM

gkmSVM citations


Prediction of enhancer promoter interactions via natural language processing

BMC Genomics
PMCID: 5954283
PMID: 29764360
DOI: 10.1186/s12864-018-4459-6

[…] According to Table , we observe that our sequence embedding features outperform experimental features in TargetFinder and sequence features computed in gkmSVM and SPEID. Here, to further improve the prediction accuracy of our model, we attempt to combine our sequence embedding features and experimental features in TargetFinder. Concretely, we concaten […]


Predicting double strand DNA breaks using epigenome marks or DNA at kilobase resolution

Genome Biol
PMCID: 5856001
PMID: 29544533
DOI: 10.1186/s13059-018-1411-7

[…] se in accuracy in the out-of-bag sample. To discriminate between DSB and non-DSB sites, we randomly selected genomic sequences that matched sizes, GC, and repeat contents of DSB sites using R package gkmSVM (https://cran.r-project.org/web/packages/gkmSVM). To learn the model, we mapped epigenomic data, DNA motifs, and DNA shape as follows. For epigenomic data including ChIP-seq and DNase-seq data, […]
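
The excerpt above describes building a negative set from random genomic sequences matched to the positive regions in size, GC content, and repeat content. The following is a minimal Python sketch of only the size and GC part of that idea, assuming a pre-extracted pool of candidate background sequences. The helper names (`gc_fraction`, `sample_matched_background`) and the `gc_tol` tolerance are hypothetical illustrations, not the gkmSVM package's API or implementation, and repeat-content matching is not modeled.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sample_matched_background(positives, candidates, gc_tol=0.02, seed=0):
    """Hypothetical helper: for each positive sequence, pick one unused
    candidate of the same length whose GC fraction is within gc_tol.

    Repeat-content matching (used in the cited study) is NOT modeled.
    Returns a list aligned with `positives`; None marks a positive for
    which no suitable candidate was found.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # randomize which matching candidate gets chosen
    used = set()
    matched = []
    for pos in positives:
        target_gc = gc_fraction(pos)
        for i, cand in enumerate(pool):
            if i in used:
                continue
            if len(cand) == len(pos) and abs(gc_fraction(cand) - target_gc) <= gc_tol:
                used.add(i)  # sample without replacement
                matched.append(cand)
                break
        else:
            matched.append(None)  # no candidate satisfied the constraints
    return matched
```

For example, `sample_matched_background(["GGCCAATT", "ATATATAT"], ["AAAAAAAA", "GCGCATAT", "TTAATTAA", "CCGGTTAA"])` returns one 8-bp candidate with GC fraction 0.5 followed by one with GC fraction 0.0; which specific candidates are chosen depends on the random seed. A greedy first-fit search keeps the sketch short; a real pipeline over whole genomes would bin candidates by length and GC instead of scanning the pool linearly.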


Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

GigaScience
PMCID: 5838836
PMID: 29618048
DOI: 10.1093/gigascience/gix136

[…] of CAGE peaks mapped, the RNA-Seq signal from 86 cattle RNA-Seq datasets, the Villar H3K27Ac signal, the SVM enhancer scores (enhancer activity predicted by a machine learning classification method, gkmSVM) [], the number of overlapping annotations, the conservation score based on the UCSC 100-way vertebrate alignment [], and the number of TFBS based on Cluster-Buster scanning []. The main filter […]


Predicting enhancers with deep convolutional neural networks

BMC Bioinformatics
PMCID: 5773911
PMID: 29219068
DOI: 10.1186/s12859-017-1878-3

[…] features are enriched in enhancers and have potential biological meaning. Ghandi et al. improved kmer-SVM by adopting another type of sequence features called gapped k-mers []. Their method, known as gkmSVM, showed robustness in the estimation of k-mer frequencies and allowed higher performance than kmer-SVM. However, k-mer features, though unbiased, may lack the ability to capture high order char […]
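
The excerpt above mentions gapped k-mers as the feature type behind gkmSVM. Informally, a gapped k-mer is an l-long word in which only k positions are informative and the remaining l − k positions are gaps. The Python sketch below enumerates these features explicitly for illustration and computes a plain, unnormalized inner product between two sequences' count vectors. The function names are hypothetical, and this is a simplified stand-in: the published tool computes its kernel more efficiently, handles details such as reverse complements and normalization, and may weight features differently, none of which is reproduced here.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq: str, l: int, k: int) -> Counter:
    """Count gapped k-mers in `seq` (hypothetical helper).

    For every l-long window, each choice of k informative positions
    yields one feature; the other l - k positions are written as '.'.
    With k == l this reduces to ordinary contiguous k-mer counts.
    """
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    seq = seq.upper()
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            pattern = "".join(
                ch if i in keep_set else "." for i, ch in enumerate(window)
            )
            counts[pattern] += 1
    return counts


def gkm_similarity(a: str, b: str, l: int, k: int) -> int:
    """Unnormalized dot product of two gapped k-mer count vectors
    (a simplified kernel, not gkmSVM's exact kernel)."""
    ca = gapped_kmer_counts(a, l, k)
    cb = gapped_kmer_counts(b, l, k)
    return sum(n * cb[p] for p, n in ca.items())  # Counter returns 0 for missing keys
```

For instance, `gapped_kmer_counts("ACGT", l=3, k=2)` yields the six features `AC.`, `A.G`, `.CG`, `CG.`, `C.T`, and `.GT`, each with count 1, because each of the two 3-bp windows contributes C(3, 2) = 3 gapped patterns. Allowing gaps means two sequences can share features even when they never share a full contiguous l-mer, which is one intuition for why gapped k-mer counts are more robust than full-length k-mer counts.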


Chromatin accessibility prediction via convolutional long short term memory networks with k mer embedding

Bioinformatics
PMCID: 5870572
PMID: 28881969
DOI: 10.1093/bioinformatics/btx234

[…] we give an introduction to the datasets prepared for classification tasks and some details about model training procedure. Then in Section 3.2, we evaluate our method and compare its performance with gkmSVM and DeepSEA. Next in Section 3.3, we analyze k-mer embedding by probing into the k-mer statistics and visualizing the embedding vectors. Additionally in Section 3.4, we prove the effectiveness […]


Predicting the impact of non coding variants on DNA methylation

Nucleic Acids Res
PMCID: 5499808
PMID: 28334830
DOI: 10.1093/nar/gkx177

[…] d direction-included, we used the absolute value of the predictions for the same type of cell line (LCL, lymphoblastoid cell line) from which the meQTL were discovered from. For deltaSVM, we used the gkmSVM weights trained on GM12878 DNase Hyper-sensitive Sites (DHS). For Basset, we used the absolute SAD (SNP Accessibility Difference) scores predicted for GM12878. […]

gkmSVM institution(s)
The Broad Institute of MIT and Harvard, Cambridge, MA, USA; School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran; Department of Engineering Science, College of Engineering, University of Tehran, and Institute for Research in Fundamental Sciences (IPM), Tehran, Iran; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
gkmSVM funding source(s)
Supported by NIH grant R01 HG0007348 and grants from IPM (No. CS1391-4-02 and No. 94050016).
