The advent of controlled vocabularies for gene product annotation has had a deep impact on life science research, since it was a prerequisite for the analysis of high-throughput screens and for cross-referencing between databases covering different model organisms and different types of data. Successful vocabularies in the life sciences range from formal ontologies defined in description logics, via directed acyclic graphs, to hierarchical terminologies that define narrower and broader terms. In recent years, numerous ontologies have been created, as evidenced by the more than 90 ontologies listed by the Open Biomedical Ontology (OBO) Foundry. Creating ontologies is a labour-intensive, difficult, manual process, which is supported by dedicated ontology editors. Recently, there have been efforts to alleviate these difficulties through text mining, which comprises a host of techniques ranging from natural language processing to statistics.

