GENIA Tagger statistics

Citations per year

Number of citations per year for the bioinformatics software tool GENIA Tagger

Tool usage distribution map

This map shows all scientific publications referring to GENIA Tagger, grouped by scientific context.

Associated diseases

This word cloud shows GENIA Tagger usage by disease context.

GENIA Tagger specifications

Unique identifier: OMICS_05279
Name: GENIA Tagger
Software type: Package/Module
Interface: Command line interface
Restrictions to use: None
Operating system: Unix/Linux
Computer skills: Advanced
Version: 3.0
Stability: Stable
Maintained: Yes

GENIA Tagger citations


Classification and analysis of a large collection of in vivo bioassay descriptions

PLoS Comput Biol
PMCID: 5517062
PMID: 28678787
DOI: 10.1371/journal.pcbi.1005641

[…] ions based on their semantic similarity. We begin with preprocessing and grammatical analysis of the assay descriptions extracted from ChEMBL (the overview of this step is illustrated by […]). We use the GENIA tagger to tokenize the sentences and to annotate the words with part-of-speech (POS) tags and other linguistic features. Next, we use custom grammatical patterns to chunk the descriptions such t […]
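The excerpt above uses the GENIA tagger to tokenize sentences and attach POS tags and other linguistic features. As a minimal sketch, assuming the tagger's usual tab-separated output (one token per line with the word, its base form, POS tag, chunk tag and named-entity tag, and a blank line between sentences), the annotations could be read back into Python records as follows. The `Token` class, the `parse_genia_output` function and the hand-written sample lines are illustrative assumptions, not output captured from GENIA Tagger 3.0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    word: str   # surface form
    base: str   # base form (lemma)
    pos: str    # part-of-speech tag
    chunk: str  # chunk tag in IOB format, e.g. B-NP
    ne: str     # named-entity tag in IOB format, e.g. B-protein


def parse_genia_output(text: str) -> List[List[Token]]:
    """Split tab-separated tagger output into sentences of Token records.

    Assumes five tab-separated columns per token line and a blank line
    between sentences (an assumption about the output format).
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:
        # Output may end without a trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical, hand-written sample in the assumed format.
sample = (
    "IL-2\tIL-2\tNN\tB-NP\tB-protein\n"
    "gene\tgene\tNN\tI-NP\tI-protein\n"
    "expression\texpression\tNN\tI-NP\tO\n"
    "requires\trequire\tVBZ\tB-VP\tO\n"
    ".\t.\t.\tO\tO\n"
)

for sent in parse_genia_output(sample):
    print([(t.word, t.base, t.pos) for t in sent])
```

The strict five-column check is a deliberate choice in this sketch: if the real output differs from the assumed layout, it fails loudly instead of silently misaligning columns.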


Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks

Database (Oxford)
PMCID: 5088735
PMID: 27777244
DOI: 10.1093/database/baw140

[…] ed corpus: we use a sentence boundary detection tool called ‘Splitta’ to split each PubMed abstract into sentences, which are subsequently tagged using the GENIA Tagger; (ii) feature extraction: this component extracts the features from the preprocessed PubMed text, including words, part-of-speech tags, chunking information, word shape features such a […]
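The feature extraction step lists words, part-of-speech tags, chunking information and word shape features. The exact feature set of that study is not given in the excerpt, so the following is only a hypothetical sketch of per-token features built from (word, POS, chunk) triples such as those the GENIA Tagger produces. The shape encoding (uppercase to `X`, lowercase to `x`, digit to `d`), the collapsed "short shape" and the ±1 word window are common conventions chosen here for illustration.

```python
import re
from typing import Dict, List, Tuple


def word_shape(word: str) -> str:
    """Map each character to a class: uppercase -> X, lowercase -> x, digit -> d.

    Any other character (e.g. a hyphen) is kept unchanged.
    """
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def short_shape(word: str) -> str:
    """Collapse runs of the same class, e.g. 'BRCA1' -> 'Xd', 'IL-2' -> 'X-d'."""
    return re.sub(r"(.)\1+", r"\1", word_shape(word))


def token_features(tokens: List[Tuple[str, str, str]], i: int) -> Dict[str, str]:
    """Hypothetical feature set for token i, given (word, POS, chunk) triples."""
    word, pos, chunk = tokens[i]
    return {
        "word": word.lower(),
        "pos": pos,
        "chunk": chunk,
        "shape": word_shape(word),
        "short_shape": short_shape(word),
        # +/-1 context window, padded at sentence boundaries
        "prev_word": tokens[i - 1][0].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1][0].lower() if i + 1 < len(tokens) else "</S>",
    }


# Hand-assigned tags for illustration, not tagger output.
sentence = [("BRCA1", "NN", "B-NP"), ("mutations", "NNS", "I-NP")]
print(token_features(sentence, 0))
```

Such feature dictionaries are the kind of input a CRF toolkit typically consumes; which features actually help is an empirical question for each corpus.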


BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID

Database (Oxford)
PMCID: 5009341
PMID: 27589962
DOI: 10.1093/database/baw121

[…] loped tools for data pre-processing: the LingPipe sentence splitter for detecting sentence boundaries, OSCAR4’s tokenizer for segmenting sentences into tokens, and the GENIA Tagger for lemmatization as well as part-of-speech and chunk tagging. The recognition of gene/protein (F-score 70%) and organism mentions (F-score 73%) in text was addressed by training Condi […]
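Chunk tags assigned by the GENIA Tagger follow an IOB scheme (for example `B-NP`, `I-NP`, `O`). A small sketch of grouping such a tag sequence into labelled chunks is shown below. The function name, the lenient handling of an `I-` tag that does not continue an open chunk (it is treated like `B-`), and the example words and hand-assigned tags are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB chunk tags (B-NP, I-NP, O, ...) into (label, text) chunks."""
    chunks: List[Tuple[str, str]] = []
    label: Optional[str] = None
    span: List[str] = []

    def flush() -> None:
        # Emit the open chunk, if any, and reset the state.
        nonlocal label, span
        if label is not None:
            chunks.append((label, " ".join(span)))
        label, span = None, []

    for word, tag in zip(words, tags):
        prefix, _, typ = tag.partition("-")
        if prefix == "I" and typ == label:
            span.append(word)  # continue the open chunk
            continue
        flush()  # B-, O, or a mismatched I- closes the open chunk
        if prefix in ("B", "I"):  # a stray I- starts a new chunk (lenient)
            label, span = typ, [word]
    flush()
    return chunks


words = ["The", "GENIA", "tagger", "assigns", "chunk", "tags", "."]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
print(iob_to_chunks(words, tags))
```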


Argo: enabling the development of bespoke workflows and services for disease annotation

Database (Oxford)
PMCID: 4869796
PMID: 27189607
DOI: 10.1093/database/baw066

[…] tter. These in turn were decomposed into tokens by the OSCAR4 Tokeniser; the tokens were then assigned lemmatised forms as well as part-of-speech (POS) and chunk tags by the GENIA Tagger. Figure 4. We employed the NERsuite package, an implementation of conditional random fields (CRFs), to apply pre-trained models for sequence labelling. […]


Building a glaucoma interaction network using a text mining approach

BioData Min
PMCID: 4857381
PMID: 27152122
DOI: 10.1186/s13040-016-0096-2

[…] $\frac{\#\,\text{relevant retrieved instances}}{\#\,\text{relevant instances}}$, $F_1 = \frac{2PR}{P + R}$ (1). The text retrieval step performance metrics and values are listed in Table […] and Table […]. For the entity extraction step performance, the GENIA tagger targets a broader domain. Hence, it can be expected to tag varied entities (including localization, cell type, DNA, etc.), but possibly fewer genes/proteins than the GenTag tagger. This is […]
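Equation (1) combines precision P and recall R into the F1 score. A minimal sketch of computing all three from sets of retrieved and relevant instances follows; representing instances as hashable identifiers in Python sets is an assumption made for illustration, not the evaluation code of the cited study.

```python
from typing import Set, Tuple


def precision_recall_f1(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Return (P, R, F1) with F1 = 2PR / (P + R), as in equation (1)."""
    tp = len(retrieved & relevant)                    # relevant retrieved instances
    p = tp / len(retrieved) if retrieved else 0.0     # precision
    r = tp / len(relevant) if relevant else 0.0       # recall
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0  # harmonic mean of P and R
    return p, r, f1


p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Here 4 items are retrieved, 2 of which are among the 3 relevant items, so P = 0.5, R = 2/3 and F1 = 4/7 ≈ 0.571. Returning 0.0 when a set is empty avoids division by zero, which is a convention of this sketch rather than a universal rule.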


miRiaD: A Text Mining Tool for Detecting Associations of microRNAs with Diseases

J Biomed Semantics
PMCID: 4877743
PMID: 27216254
DOI: 10.1186/s13326-015-0044-y

[…] . Chunking is the task of identifying and grouping words in a sentence into constituents (noun groups, verb groups etc.) called “chunks”. Sentences are tagged with part-of-speech (POS) tags using the Genia Tagger. We further chunk the words based on syntactically related POS tags to form noun phrases (NPs), verb groups (VGs) and prepositional phrases (PPs). After chunking, we use iSimp, which […]
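The miRiaD excerpt forms noun phrases, verb groups and prepositional phrases from POS tags. The paper's actual patterns are not shown, so the sketch below uses three deliberately simplified, hypothetical patterns over Penn Treebank tags: NP is an optional determiner, any adjectives and one or more nouns; VG is a verb optionally followed by further verbs with adverbs in between; PP is a preposition followed by an NP. Each tag is mapped to a single character so that regular-expression match offsets line up with token positions. The tag classes, pattern names and example tags (hand-assigned, not tagger output) are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical one-character classes for Penn Treebank tags.
_EXACT = {"DT": "D", "IN": "I", "MD": "V", "CD": "N"}


def tag_class(tag: str) -> str:
    """Map a POS tag to D, I, N, J, V, R, or O (other)."""
    if tag in _EXACT:
        return _EXACT[tag]
    for prefix, cls in (("NN", "N"), ("JJ", "J"), ("VB", "V"), ("RB", "R")):
        if tag.startswith(prefix):
            return cls
    return "O"


# Simplified, hypothetical patterns; PP is tried first so that a
# preposition absorbs the noun phrase that follows it.
PATTERN = re.compile(r"(?P<PP>ID?J*N+)|(?P<NP>D?J*N+)|(?P<VG>V(?:R*V)*)")


def chunk_by_pos(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (chunk type, text) pairs for (word, POS) tokens."""
    codes = "".join(tag_class(pos) for _, pos in tokens)  # one char per token
    chunks: List[Tuple[str, str]] = []
    for m in PATTERN.finditer(codes):
        text = " ".join(w for w, _ in tokens[m.start():m.end()])
        chunks.append((m.lastgroup, text))  # name of the alternative that matched
    return chunks


example = [("miR-21", "NN"), ("is", "VBZ"), ("highly", "RB"), ("expressed", "VBN"),
           ("in", "IN"), ("breast", "NN"), ("cancer", "NN"), (".", ".")]
print(chunk_by_pos(example))
```

Real chunking rules would need more cases (coordination, possessives, pronouns), but the single-character encoding keeps the pattern matching simple and the token alignment exact.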

