BioLemmatizer statistics




BioLemmatizer specifications


Unique identifier: OMICS_04827
Name: BioLemmatizer
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
Computer skills: Advanced
Version: 1.2
Stability: Stable
Maintained: Yes




  • Haibin Liu


BioLemmatizer in publications

PMCID: 5336247
PMID: 28257498
DOI: 10.1371/journal.pone.0173132

[…] whether a word appears or not in a given abstract against all of the words that appear in the corpus. We lemmatize (stem) the text in order to reduce sparsity of the words occurring. We use the BioLemmatizer, which is trained on biomedical texts. 8. N-gram extraction: we extract noun compound bigrams such as “blood sample” or “breast milk”, as they can represent a concept in the text. […]

PMCID: 4864999
PMID: 27175227
DOI: 10.1186/s13326-016-0070-4

[…] out, poly, post, re, self, trans, under}. We obtained this list experimentally by careful examination of the st-set. Finally, we lemmatize all the trigger words, and all of their parts, using the BioLemmatizer tool, which is specifically developed for the biomedical domain, and record all the produced lemmas for each trigger word. Remove any punctuation or special characters […]

PMCID: 4852402
PMID: 27141091
DOI: 10.1093/database/baw061

[…] used to train tmChem: part-of-speech (POS) tags, lemmas and word-vector clusters. We used the BioC natural language processing pipeline to generate POS tags with MaxentTagger and lemmas with BioLemmatizer. Recent studies have shown that features based on clusters of word vectors can improve classification performance. We used the word2vec tool […]

PMCID: 4642081
PMID: 26551594
DOI: 10.1186/1471-2105-16-S16-S2

[…] node, their relaxed POS tags (P*, allowing a plural noun form to match with a singular, or various conjugated forms of a verb to match) and the lemmatized form (L, derived from application of the BioLemmatizer) of the associated tokens must be identical ("P*+L" matching criteria). Pattern matching proceeds iteratively and bottom-up, to enable the extraction of complex and nested events. […]
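
As an illustration only, the "P*+L" criterion described in this excerpt can be sketched as follows. The tag mapping, the Token class and the function names are hypothetical and are not taken from the cited paper or from BioLemmatizer itself; the lemmas are supplied by hand here, whereas in practice they would come from a lemmatizer such as BioLemmatizer.

```python
from dataclasses import dataclass

# Hypothetical relaxation table (an assumption, not the cited paper's):
# collapse singular/plural nouns and all verb inflections into one class each.
RELAXED_POS = {
    "NN": "NN", "NNS": "NN",
    "VB": "VB", "VBD": "VB", "VBG": "VB",
    "VBN": "VB", "VBP": "VB", "VBZ": "VB",
}


@dataclass(frozen=True)
class Token:
    text: str
    pos: str    # Penn Treebank style part-of-speech tag
    lemma: str  # hand-supplied here; normally produced by a lemmatizer


def relaxed_pos(tag: str) -> str:
    """Return the relaxed class for a tag; unknown tags are left unchanged."""
    return RELAXED_POS.get(tag, tag)


def matches_p_star_l(a: Token, b: Token) -> bool:
    """Two tokens match when both relaxed POS tag and lemma are identical."""
    return relaxed_pos(a.pos) == relaxed_pos(b.pos) and a.lemma == b.lemma


if __name__ == "__main__":
    binds = Token("binds", "VBZ", "bind")
    bound = Token("bound", "VBD", "bind")
    print(matches_p_star_l(binds, bound))
```

Under these assumptions, "binds" and "bound" match because both relax to the verb class and share the lemma "bind", while a noun and a verb with the same lemma would not.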

PMCID: 4642041
PMID: 26551454
DOI: 10.1186/1471-2105-16-S16-S1

[…] conditional random fields (CRFs) sieve to detect direct relations between B. subtilis genes that are "hidden" as target mentions within events. To better address text from biomedicine, we use the BioLemmatizer instead of a general lemmatizer. We incorporate an additional knowledge resource, the B. subtilis protein-protein interaction network from the STRING database, which is used within […]


BioLemmatizer institution(s)
Colorado Computational Pharmacology, University of Colorado School of Medicine, Aurora, CO, USA
