Computational protocol: The Pre-Eclampsia Ontology: A Disease Ontology Representing the Domain Knowledge Specific to Pre-Eclampsia

[…] We use a literature curation pipeline to collect concepts and terms about specific knowledge domains from full-text biomedical articles. All steps are performed with open-source software. The pipeline collects gene names, gene variants, environmental factors and clinical features. It extracts information as follows:
1) First, it extracts the full text from article PDF files with PDFx []. PDFx is a rule-based system designed to reconstruct the logical structure of scholarly articles in PDF form, regardless of their formatting style.
2) Next, it extracts genetic variants from the extracted full text with tmVar []. tmVar is a text-mining tool based on a conditional random field (CRF) model that extracts a wide range of sequence variants described at the protein, DNA and RNA levels.
3) Then, it extracts gene and protein names with BANNER []. BANNER is an open-source biomedical named entity recognition system, described by its authors as an executable survey of advances in the field.
4) Then, it extracts phenotypic features (e.g. disease states, complications and laboratory results) by annotating the text against previously developed ontologies through the NCBO BioPortal v4.0 REST service []. In this annotation step, we defined as phenotypic features the terms annotated by at least one of the following four ontologies: Human Phenotype Ontology (HP) [], Disease Ontology (DO) [], Online Mendelian Inheritance in Man (OMIM) [] and Mammalian Phenotype Ontology (MP) [].
5) Finally, it extracts environmental factors by exact matching against the Environmental Factor Dictionary, which contains 606 environmental factors such as high BMI and ethnicity. These factors were collected from 40 articles by manual curation, which continued until the number of factors stopped increasing, i.e. until the factor base became saturated (). […]
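The variant step (step 2) consumes tmVar output. The following is a minimal Python sketch, not the authors' code, assuming tmVar writes its results in PubTator format (tab-separated annotation lines of PMID, start offset, end offset, mention, type and normalized form) and labels variants with the types DNAMutation, ProteinMutation and SNP. The function name and the sample record are hypothetical.

```python
from collections import defaultdict

# Assumed tmVar mention types; adjust to the tmVar version actually used.
VARIANT_TYPES = {"DNAMutation", "ProteinMutation", "SNP"}

def parse_pubtator_variants(text):
    """Group variant mentions by PMID from PubTator-format output (assumed layout)."""
    variants = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines and 'PMID|t|...' / 'PMID|a|...' text lines,
        # which lack a numeric start offset in the second field.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        pmid, start, end, mention, mtype = fields[:5]
        normalized = fields[5] if len(fields) > 5 else ""
        if mtype in VARIANT_TYPES:
            variants[pmid].append((int(start), int(end), mention, normalized))
    return dict(variants)

# Hypothetical sample record in the assumed format.
sample = "\n".join([
    "12345|t|A variant study",
    "12345|a|We found c.677C>T in MTHFR.",
    "12345\t25\t33\tc.677C>T\tDNAMutation\tc|SUB|C|677|T",
    "",
])
print(parse_pubtator_variants(sample))
```

Skipping lines whose second field is not numeric keeps title and abstract lines out of the annotation parser even if they happen to contain tabs.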
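Step 4 can be sketched against the BioPortal Annotator. This sketch assumes the ontology acronyms HP, DOID, OMIM and MP, the endpoint https://data.bioontology.org/annotator with `text`, `ontologies` and `apikey` query parameters, and a JSON response in which each annotation carries `annotatedClass.links.ontology` and a list of matched text spans. The helper names are hypothetical, and the filter only illustrates the "at least one of four ontologies" rule.

```python
from urllib.parse import urlencode

# Assumed BioPortal acronyms for HP, DO, OMIM and MP.
PHENOTYPE_ONTOLOGIES = {"HP", "DOID", "OMIM", "MP"}
ANNOTATOR_URL = "https://data.bioontology.org/annotator"

def build_annotator_url(text, apikey):
    """Build a GET request URL restricted to the four phenotype ontologies."""
    params = {
        "text": text,
        "ontologies": ",".join(sorted(PHENOTYPE_ONTOLOGIES)),
        "apikey": apikey,
    }
    return ANNOTATOR_URL + "?" + urlencode(params)

def phenotypic_features(annotations):
    """Keep (ontology, term) pairs annotated by at least one of the four ontologies."""
    features = set()
    for ann in annotations:
        # Assumed response shape: the ontology link ends with its acronym.
        link = ann["annotatedClass"]["links"]["ontology"]
        acronym = link.rstrip("/").rsplit("/", 1)[-1]
        if acronym not in PHENOTYPE_ONTOLOGIES:
            continue
        for match in ann.get("annotations", []):
            features.add((acronym, match["text"].lower()))
    return features
```

In practice the URL would be fetched (for example with `urllib.request.urlopen`) with a valid API key, and the decoded JSON list passed to `phenotypic_features`. Filtering again on the client side keeps the rule explicit even if the request is not restricted by ontology.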
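Step 5 is a dictionary lookup. The following is a minimal sketch of exact matching, assuming case-insensitive whole-word matching and a preference for the longest entry when entries overlap. Apart from "high BMI" and "ethnicity", the dictionary entries shown are hypothetical.

```python
import re

def build_matcher(factors):
    """Compile one regex that matches any dictionary entry as a whole phrase."""
    # Longest terms first, so "high BMI" is matched before a shorter entry like "BMI".
    terms = sorted({f.lower() for f in factors}, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    # Case-insensitive matching is an assumption, not stated in the protocol.
    return re.compile(pattern, re.IGNORECASE)

def match_factors(text, matcher):
    """Return the distinct dictionary entries found in the text."""
    return sorted({m.group(0).lower() for m in matcher.finditer(text)})

factors = ["high BMI", "BMI", "ethnicity", "smoking"]  # partly hypothetical entries
matcher = build_matcher(factors)
print(match_factors("Women with high BMI and a history of smoking; ethnicity was recorded.", matcher))
```

Because regex alternation tries alternatives in order, sorting by length makes the longest matching entry win at each position.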
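The dictionary was built by curating articles until no new factors appeared. One way to make that stopping rule concrete is sketched below. It assumes a hypothetical `patience` parameter: curation stops once that many consecutive articles add no new factor. The protocol does not state the exact criterion.

```python
def curate_until_saturated(articles, patience=5):
    """Accumulate curated factors article by article until growth stops.

    `articles` is an iterable of per-article factor sets; `patience` (hypothetical)
    is how many consecutive articles must add nothing before stopping.
    Returns the collected factors and the cumulative count after each article.
    """
    seen = set()
    growth = []
    no_gain = 0
    for factors in articles:
        new = set(factors) - seen
        seen |= new
        growth.append(len(seen))
        no_gain = 0 if new else no_gain + 1
        if no_gain >= patience:
            break  # the factor base is treated as saturated
    return seen, growth
```

The `growth` list is the saturation curve: a plateau at the end shows where the number of factors stopped increasing.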

Pipeline specifications

Software tools: tmVar, BANNER
Application: Information extraction
Organisms: Homo sapiens
Diseases: Eclampsia, Hypertension, Pre-Eclampsia