Computational protocol: Ranking Biomedical Annotations with Annotator's Semantic Relevancy

Protocol publication

[…] As an example, we use only protein data in the experiments, but our approach can also be applied to other biomedical entities. We first manually collect 500 protein structures and their scientific names from […], download their files (such as FASTA sequences and PDB files), crawl their web pages, extract basic attributes from the files and web pages, and import them into SQL Server. We then search Anne O'Tate [] with the scientific names of those protein structures and randomly select 1000 unique authors as our initial annotators.

Although the online databases contain some annotations and ontologies of biomedical entities, few of them are suitable for frequent pattern mining. We therefore automatically generate 20000 historical annotations, of which 60 percent are designed as shown in […] and the rest are generated randomly: random annotator, random biomedical entity, and random annotation with random correctness. As shown in […], 1000 of the annotators are classified into 9 types. Each type is designed to contribute a certain number of annotations with correctness in a certain range. To test the cold-start problem, several users are designed to contribute 5 or fewer annotations. Conversely, to ensure that patterns can be found, at least five users of each type annotate 5 to 15 biomedical entities with common features.

As for the web information, we search in advance and store in the database the weights for the 20000 pairs of users and biomedical entities. First, each biomedical entity is extended by one step in FACTA+ to obtain its related concepts. Then, to evaluate the weight, we gather information in two ways: searching Google for news, talks, and homepages, and searching PIE the search [] for papers and other documents.
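The construction of the historical annotation set described above (20000 annotations, 60 percent designed by annotator type, the rest fully random) can be illustrated with a minimal Python sketch. The counts come from the text. Everything else is an assumption made for illustration: the function name `generate_annotations`, the record layout, the assignment of annotators to types, and the evenly spaced correctness bands standing in for the nine type definitions, which are not reproduced in this excerpt. The cold-start users and the shared-feature entity groups are omitted.

```python
import random


def generate_annotations(n_total=20000, designed_share=0.6,
                         n_annotators=1000, n_entities=500,
                         n_types=9, seed=0):
    """Return a list of synthetic annotation records (illustrative only)."""
    rng = random.Random(seed)

    # ASSUMPTION: placeholder correctness bands, one per annotator type.
    # The protocol's real per-type ranges are not given in this excerpt.
    bands = [(k / n_types, (k + 1) / n_types) for k in range(n_types)]

    # ASSUMPTION: annotators are assigned to types round-robin.
    annotator_type = [a % n_types for a in range(n_annotators)]

    n_designed = round(n_total * designed_share)
    annotations = []
    for i in range(n_total):
        annotator = rng.randrange(n_annotators)
        entity = rng.randrange(n_entities)
        designed = i < n_designed
        if designed:
            # Designed part: correctness drawn from the annotator type's band.
            low, high = bands[annotator_type[annotator]]
            correctness = rng.uniform(low, high)
        else:
            # Random part: random annotator, entity, and correctness.
            correctness = rng.random()
        annotations.append({
            "annotator": annotator,
            "entity": entity,
            "correctness": correctness,
            "designed": designed,
        })
    return annotations


if __name__ == "__main__":
    data = generate_annotations()
    print(len(data), sum(a["designed"] for a in data))
```

Fixing the seed keeps the generated set reproducible between runs, which is convenient when comparing ranking methods on the same synthetic data.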
To search Google, we write a C# program that automatically searches predefined credible websites through the Google service, using keywords that include the name or affiliation of the annotator and the scientific name of the biomedical entity, optionally supplemented by an extended concept or an attribute name of the biomedical entity. In addition, we apply and evaluate PIE the search to count the documents that indicate their semantic relationship. The resulting corpus contains a set of medical articles in XML format. From each article we construct a text file by extracting relevant fields such as the title, the summary, and the body (if they are available). […]
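The step of turning each XML article into a text file can be sketched as follows, using Python's standard `xml.etree.ElementTree` module. The element names `title`, `summary`, and `body` and the function name `article_to_text` are assumptions for illustration; the actual schema of the corpus is not described here, and the original work does not state which tooling was used for this step.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical tag names; adjust to the real corpus schema.
FIELDS = ("title", "summary", "body")


def article_to_text(xml_string, fields=FIELDS):
    """Extract the available fields from one XML article as plain text."""
    root = ET.fromstring(xml_string)
    parts = []
    for tag in fields:
        node = root.find(f".//{tag}")
        if node is None:
            continue  # field not available in this article
        # Collect text from nested elements and normalize whitespace.
        text = " ".join("".join(node.itertext()).split())
        if text:
            parts.append(text)
    return "\n".join(parts)


if __name__ == "__main__":
    sample = (
        "<article><title>Protein X</title>"
        "<body><p>First.</p> <p>Second   part.</p></body></article>"
    )
    print(article_to_text(sample))
```

Fields that are missing from an article are skipped rather than written as empty lines, matching the "if they are available" condition in the text.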

Pipeline specifications

Software tools: FACTA+, PIE the search
Application: Information extraction