This article is part of the supplement: Proceedings of the Bio-Ontologies Special Interest Group Meeting 2010
Automating generation of textual class definitions from OWL to English
1 School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
2 European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
3 Department of Computing, Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
Journal of Biomedical Semantics 2011, 2(Suppl 2):S5 doi:10.1186/2041-1480-2-S2-S5Published: 17 May 2011
Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort and there is often a lag between specification of the main part of an ontology (logical descriptions and definitions of entities) and the development of the text-based definitions. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. The application described here uses NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities, so avoiding the bottleneck of authoring these definitions by hand.
To produce the descriptions, the program collects all the axioms relating to a given entity, groups them according to common structure, realises each group through an English sentence, and assembles the resulting sentences into a paragraph, to form as ‘coherent’ a text as possible without human intervention. Sentence generation is accomplished using a generic grammar based on logical patterns in OWL, together with a lexicon for realising atomic entities. We have tested our output for the Experimental Factor Ontology (EFO) using a simple survey strategy to explore the fluency of the generated text and how well it conveys the underlying axiomatisation. Two rounds of survey and improvement show that overall the generated English definitions are found to convey the intended meaning of the axiomatisation in a satisfactory manner. The surveys also suggested that one form of generated English will not be universally liked; that intrusion of too much ‘formal ontology’ was not liked; and that too much explicit exposure of OWL semantics was also not liked.
Our prototype tools can generate reasonable paragraphs of English text that can act as definitions. The definitions were found acceptable by our survey and, as a result, the developers of EFO are sufficiently satisfied with the output that the generated definitions have been incorporated into EFO. Whilst not a substitute for hand-written textual definitions, our generated definitions are a useful starting point.
An on-line version of the NLG text definition tool can be found at http://swat.open.ac.uk/tools/ webcite. The questionaire and sample generated text definitions may be found at http://mcs.open.ac.uk/nlg/SWAT/bio-ontologies.html webcite.