BioLemmatizer statistics


Citations per year

Number of citations per year for the bioinformatics software tool BioLemmatizer

Tool usage distribution map

This map represents all the scientific publications referring to BioLemmatizer per scientific context

Associated diseases

This word cloud represents BioLemmatizer usage per disease context



BioLemmatizer specifications


Unique identifier: OMICS_04827
Name: BioLemmatizer
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
Computer skills: Advanced
Version: 1.2
Stability: Stable
Maintained: Yes




  • Haibin Liu

Publication for BioLemmatizer

BioLemmatizer citations


First steps in automatic summarization of transcription factor properties for RegulonDB: classification of sentences about structural domains and regulated processes

PMCID: 5737074
PMID: 29220462
DOI: 10.1093/database/bax070

[…] erformed Sentence split and POStagging by using the Stanford POS Tagger 3.6 program (). This is a widely-used POS tagger which utilizes tags from the Penn Treebank tag set (). After that, we used the BioLemmatizer 1.2 program () for lemmatization, which is a lemmatizer fit to the biological domain. The BioLemmatizer requires previously assigned POS tags to determine the lemmas; this was the reason […]


BCC NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition

EURASIP J Bioinform Syst Biol
PMCID: 5419958
PMID: 28477208
DOI: 10.1186/s13637-017-0060-6

[…] kens, (iii) lemmatization to convert the tokens to the basic form of the word, (iv) POS tagging, and (v) chunking. OpenNLP [] was used for sentence splitting, tokenization, POS tagging, and chunking. BioLemmatizer [] was employed for lemmatization. […]


Text mining for improved exposure assessment

PLoS One
PMCID: 5336247
PMID: 28257498
DOI: 10.1371/journal.pone.0173132

[…] pture whether a word appears or not in a given abstract against all of the words that appear in the corpus. We lemmatize (stem) the text in order to reduce sparsity of the words occurring. We use the BioLemmatizer, which is trained on biomedical texts []. 8. N-gram Extraction: We extract noun compound bigrams such as “blood sample”, or “breast milk”, as they can represent a concept in the text. We […]


Molecular mechanisms involved in the side effects of fatty acid amide hydrolase inhibitors: a structural phenomics approach to proteome wide cellular off target deconvolution and disease association

NPJ Syst Biol Appl
PMCID: 5516858
PMID: 28725477
DOI: 10.1038/npjsba.2016.23

[…] A total 22,345,439 PubMed abstracts dated before July 2014 were downloaded from the National Library of Medicine. BioLemmatizer v1.2 was used to transform a word to a lemma. Part-of-speech tagger was carried out using RDRPostagger v1.13. Disease, gene and chemical name entities were recognized using DNorm v0.0.6; […]


Filtering large scale event collections using a combination of supervised and unsupervised learning for event trigger classification

J Biomed Semantics
PMCID: 4864999
PMID: 27175227
DOI: 10.1186/s13326-016-0070-4

[…] non, out, poly, post, re, self, trans, under}. We obtained this list experimentally by careful examination of the ST-set. Finally, we lemmatize all the trigger words, and all of their parts, using the BioLemmatizer tool [] which is specifically developed for the biomedical domain, and record all the produced lemmas for each trigger word. Remove any punctuation or special characters from the beginnin […]


Chemical entity recognition in patents by combining dictionary based and statistical approaches

PMCID: 4852402
PMID: 27141091
DOI: 10.1093/database/baw061

[…] nd used to train tmChem: part-of-speech (POS) tags, lemmas and word-vector clusters. We used the BioC natural language processing pipeline () to generate POS tags with MaxentTagger () and lemmas with BioLemmatizer (). Recent studies have shown that features based on clusters of word vectors can improve classification performance (, ). We used the word2vec tool ( […]

BioLemmatizer institution(s)
Colorado Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO, USA
