Assists users in assessing text extraction from biomedical literature figures. DeTEXT contains over 500 typical biomedical literature figures drawn from about 300 full-text articles randomly chosen from PubMed Central. The database provides annotation guidelines and tools, along with evaluation protocols for text detection and word recognition. It constitutes an image dataset for biomedical literature figure detection, recognition, and retrieval that can be used as a benchmark.
Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China; School of Foreign Studies, University of Science and Technology Beijing, Beijing, China; School of Computer Science, University of Massachusetts Amherst, MA, USA; Department of Quantitative Health Sciences, University of Massachusetts Medical School, MA, USA
DeTEXT funding source(s)
Supported by the National Natural Science Foundation of China (61105018, 61473036) and by the National Institutes of Health through the National Institute of General Medical Sciences under award number 5R01GM095476 and the National Center for Advancing Translational Sciences under award number UL1TR000161.