BC2GM-corpus specifications


Unique identifier: OMICS_20391
Name: BC2GM-corpus
Alternative name: BioCreative II Gene Mention-corpus
Restrictions to use: None
Community driven: No
Data access: File download
User data submission: Not allowed
Maintained: Yes


  • L. Smith
  • Lorraine Tanabe
  • John Wilbur

Publication for BioCreative II Gene Mention-corpus

BC2GM-corpus citations


PALM IST: Pathway Assembly from Literature Mining an Information Search Tool

Sci Rep
PMCID: 4437304
PMID: 25989388
DOI: 10.1038/srep10021

[…] BioCreative task II gene mention (BC2GM) corpus is concerned with the named entity extraction of gene and gene product mentioned in text. BC2GM test set containing 5000 sentences were utilized for gene mention programs’ evaluation. Bi […]


A modular framework for biomedical concept recognition

BMC Bioinformatics
PMCID: 3849280
PMID: 24063607
DOI: 10.1186/1471-2105-14-281

[…] parsing and named entity recognition. However, it only presents results for gene and protein recognition, where BANNER claims the best performing results using a ML model trained on the corpus of the BioCreative II gene mention corpus. Thus, we decided to also use Cocoa and Whatizit to compare the achieved performance results. Since Cocoa concept classes do not match directly to the ones provid […]


A corpus of full text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

BMC Bioinformatics
PMCID: 3483229
PMID: 22901054
DOI: 10.1186/1471-2105-13-207

[…] ith the distributed models. Distributed models are trained on one of several available gene mention corpora: the NLPBA corpus, the GENIA corpus, the BioCreative I gene mention corpus, or the BioCreative II gene mention corpus. Two sets of results are provided, based on a comparison of the system output on (a) the development portion of the CRAFT public release data set and (b) the comp […]


BC2GM-corpus institution(s)
National Center for Biotechnology Information, Bethesda, MD, USA; IBM TJ Watson Research Center, Yorktown Heights, NY, USA; [etc.]
BC2GM-corpus funding source(s)
Supported by the Intramural Research Program of the NIH, NLM, and NCBI.

