This article is part of the supplement: Proceedings of the Fourth International Symposium on Semantic Mining in Biomedicine (SMBM)
Event extraction for DNA methylation
1 Department of Computer Science, University of Tokyo, Tokyo, Japan
2 School of Computer Science, University of Manchester, Manchester, UK
3 National Centre for Text Mining, University of Manchester, Manchester, UK
Journal of Biomedical Semantics 2011, 2(Suppl 5):S2 doi:10.1186/2041-1480-2-S5-S2
Published: 6 October 2011
We consider the task of automatically extracting DNA methylation events from the biomedical literature. DNA methylation is a key mechanism of epigenetic control of gene expression and is implicated in many cancers, but there has been little study of automatic information extraction for DNA methylation.
We present an annotation scheme for DNA methylation that follows the representation of the BioNLP shared task on event extraction. We select a set of 200 abstracts including a representative sample of all PubMed citations relevant to DNA methylation, and introduce manual annotation for this corpus, marking nearly 3000 gene/protein mentions and 1500 DNA methylation and demethylation events. We retrain a state-of-the-art event extraction system on the corpus and find that automatic extraction of DNA methylation events, the methylated genes, and their methylation sites can be performed at 78% precision and 76% recall.
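To make the event representation concrete, the following is a minimal sketch of how an annotation in BioNLP shared task-style standoff format might be read. The example sentence, the offsets, and the label names used here (`DNA_methylation`, `Theme`, `Site`, `Entity`) are illustrative assumptions, not lines taken from the corpus, and the parser is not part of the released resources.

```python
from dataclasses import dataclass

# Hypothetical sentence and annotations, written for illustration only.
TEXT = "Methylation of the p16 promoter was observed."

# Assumed standoff layout: "T" lines are text-bound spans (id, type, offsets, text);
# "E" lines are events (id, type:trigger, then role:argument pairs).
STANDOFF = (
    "T1\tProtein 19 22\tp16\n"
    "T2\tDNA_methylation 0 11\tMethylation\n"
    "T3\tEntity 23 31\tpromoter\n"
    "E1\tDNA_methylation:T2 Theme:T1 Site:T3\n"
)


@dataclass
class TextBound:
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    id: str
    type: str
    trigger: str
    args: dict


def parse_standoff(standoff):
    """Split standoff lines into text-bound spans and events, keyed by id."""
    spans, events = {}, {}
    for line in standoff.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            ann_type, start, end = fields[1].split(" ")
            spans[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), fields[2])
        elif ann_id.startswith("E"):
            parts = fields[1].split()
            ev_type, trigger = parts[0].split(":")
            args = dict(p.split(":") for p in parts[1:])
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return spans, events


spans, events = parse_standoff(STANDOFF)
for ev in events.values():
    theme = spans[ev.args["Theme"]]          # the methylated gene/protein
    site = spans.get(ev.args.get("Site"))    # the methylation site, if annotated
    print(ev.type, "| theme:", theme.text, "| site:", site.text if site else None)
```

In this sketch the event ties together a trigger word, the methylated gene as its theme, and an optional site, which mirrors the three pieces of information evaluated above.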
Our results demonstrate that reliable extraction methods for DNA methylation events can be created through corpus annotation and straightforward retraining of a general event extraction system. The introduced resources are freely available for use in research from the GENIA project homepage: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA.