Computational protocol: Collaborative relation annotation and quality analysis in Markyt environment

Protocol publication

[…] The Markyt annotation environment tries to accommodate the particular requirements of each project as closely as possible while maintaining a general production life cycle. As with similar environments, Markyt requires the specification of the documents to be annotated and the types of entities and relations to be considered (as stated in the project’s guidelines), and enables one or more annotators to participate in the project. Annotation rounds reflect the iterative nature of the production life cycle. The initial round may start from the raw, unannotated documents or from a first set of annotations based on dictionary lookup or the predictions of external automatic systems. Whether new annotation rounds are created depends on the evaluated quality of the existing annotations, namely the inter-annotator agreement (IAA).

Markyt allows rounds to be copied and duplicated. Copying annotations enables a second annotator to work on the annotations of the first annotator independently, i.e. the second annotator can amend the annotations proposed by the first annotator as desired. Duplicating annotations, in turn, creates a data checkpoint, i.e. it allows rollback if needed. For example, this is handy when the annotation guidelines undergo a significant change but the new round of annotation shows that the change was detrimental to the overall annotation purposes. Once corpus quality is considered acceptable (an administrative decision), the final corpus is produced based on user-specified annotation consensus rules (also an administrative decision) and becomes available for download.

Markyt is built on top of open technologies and standards to grant extensibility and interoperability with other systems. Its web model-view-controller (MVC) design pattern and core development are supported by the open-source CakePHP framework. The user interface relies on HTML5 (http://www.w3.org/TR/html5/) and CSS3 (http://www.css3.info/) technologies. The Rangy library assists in the browser-independent implementation of common document object model (DOM) range and selection tasks (http://code.google.com/p/rangy/), and Ajax and jQuery technologies are used to enhance user–system interaction (http://jquery.com/).

The following subsections describe the rationale behind the new functionalities and the basics of their implementation. [...]

Markyt can import raw, unannotated documents as well as existing entity and relation annotations. This functionality is useful when taking on existing annotation projects (e.g. extending an entity corpus to a relation corpus) or when the results of automated entity identification and relation extraction tools are considered useful as the basis of annotation.

Currently, Markyt is able to import raw, unannotated documents in TSV and BioC inline XML. Moreover, the tool is able to import annotations in the following formats: BRAT standoff annotation, BioC inline XML, BioNLP standoff representation and BioCreative TSV. Both documents and annotations are stored in the relational database supporting Markyt operations. Document contents are saved in HTML format with UTF-8 encoding, which ensures multilingual support. After reaching annotation consensus, users can export the final corpus in Markyt standoff and inline TSV formats, Markyt JSON or BioC XML. […]
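To make the idea of importing existing entity and relation annotations more concrete, the following minimal Python sketch reads the text-bound (`T`) and binary relation (`R`) lines of a BRAT standoff file. It is not Markyt's own importer (which is built on CakePHP), only an illustration of the kind of information such an import has to recover. The names `Entity`, `Relation` and `parse_brat`, as well as the `Protein` and `Interaction` labels in the sample, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    spans: list  # list of (start, end) character offsets
    text: str


@dataclass
class Relation:
    id: str
    type: str
    args: dict  # role name -> entity id, e.g. {"Arg1": "T1"}


def parse_brat(ann_text):
    """Parse text-bound (T) and relation (R) lines of BRAT standoff text."""
    entities, relations = {}, {}
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Layout: T1<TAB>Type start end[;start end ...]<TAB>covered text
            ann_type, offsets = fields[1].split(" ", 1)
            spans = [tuple(int(n) for n in part.split())
                     for part in offsets.split(";")]
            covered = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, ann_type, spans, covered)
        elif ann_id.startswith("R"):
            # Layout: R1<TAB>Type Arg1:T1 Arg2:T2
            ann_type, *pairs = fields[1].split()
            args = dict(pair.split(":", 1) for pair in pairs)
            relations[ann_id] = Relation(ann_id, ann_type, args)
        # Other annotation kinds (events, attributes, notes) are skipped
        # in this sketch.
    return entities, relations


# Hypothetical annotations for the sentence "TP53 binds to MDM2".
SAMPLE = (
    "T1\tProtein 0 4\tTP53\n"
    "T2\tProtein 14 18\tMDM2\n"
    "R1\tInteraction Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_brat(SAMPLE)
```

Keeping entities and relations in separate dictionaries keyed by annotation identifier mirrors the standoff idea: a relation only stores references to entity identifiers, so entity offsets can be amended in a later round without rewriting the relations that point to them.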

Pipeline specifications

Software tools: Markyt, BioC, BRAT, BioCreative
Application: Information extraction