Topics in machine learning for biomedical literature analysis and text retrieval

Islamaj Doğan, Rezarta; Yeganova, Lana

doi:10.1186/2041-1480-3-S3-S1

Volume 3 Supplement 3

Machine Learning for Biomedical Literature Analysis and Text Retrieval in the International Conference on Machine Learning and Applications 2011

Research
Open access
Published: 05 October 2012

Journal of Biomedical Semantics volume 3, Article number: S1 (2012)

Introduction

Biomedical literature articles housed at the National Library of Medicine contain a wealth of scholarly knowledge of significant importance to researchers and health care professionals alike. Researchers rely on this information to build new hypotheses and to validate scientific discoveries, and health care professionals rely on it to keep up to date with health-related issues [1].

The ever-expanding volume of biomedical publications and other biomedical communications calls for better methods of efficiently accessing and retrieving relevant information from these textual resources. The digitization of medical information, in particular, requires methods for efficient automatic processing of medical and biomedical text. Automatic text processing is founded on natural language processing techniques, which combine linguistic knowledge and computer science theory to address the computational aspects of the task. Machine learning algorithms are heavily employed in these applications, as is regularly seen at many other annual conferences.

The special session on "Machine Learning in Biomedical Literature Analysis and Text Retrieval" was held for the second time as part of the 10th International Conference on Machine Learning and Applications, in Honolulu, Hawaii on December 18-21, 2011. The goal of this session was to present advancements in machine learning techniques that can improve the analysis of biomedical text.

In this supplement we present a collection of papers originally presented and published in the proceedings of the International Conference on Machine Learning and Applications (ICMLA 2011). These papers constitute an advance beyond the work originally presented at the conference and have gone through a separate rigorous review process.

The papers presented in this issue represent a wide cross-section of the work going on in machine learning today, with a focus on biomedical literature and clinical text. Kate [2] presents an unsupervised method that automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. The author observes that clinical reports are written using a subset of natural language, and that different genres of clinical reports use different sublanguages, which makes supervised training of a parser for clinical sentences very difficult. Ravikumar et al. [3] propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. They identify the amino acid residue mentions in the text using linguistic patterns and apply an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs. They demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction. Kim et al. [4] develop an unsupervised document clustering algorithm with the property that clusters are sufficiently explanatory for human understanding. For every cluster they extract subject terms and use them to describe the cluster. Yeganova et al. [5] study methods for automatically learning meaningful biomedical categories in Medline in an unsupervised fashion. They present methods for automatically extracting categories that are discussed in Medline. Rather than imposing external ontologies on Medline, they look for categories that emerge from the text. Finally, Clematide et al. [6] present a method for extracting and ranking the relations among different types of biomedical entities to make the curation process more efficient. The authors make use of existing resources such as the Pharmacogenomics Knowledge Base (PharmGKB) and the Comparative Toxicogenomics Database (CTD) to create a gold standard.

While covering a wide variety of topics, all papers in the supplement share one common characteristic: a shift from supervised methods towards semi-supervised and unsupervised methods. The authors agree that creating labeled training sets is extremely expensive and time-consuming. In response, they propose new and creative ways of automatically building training sets, and they demonstrate resourcefulness by using information from existing knowledge sources to compile training data.
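The idea of compiling training data from an existing knowledge source can be illustrated with a minimal sketch of distant supervision. It is not the method of any paper in this supplement: the knowledge base `KNOWN_PAIRS`, the entity lists `DRUGS` and `GENES`, the function `label_sentences`, and the example sentences are all hypothetical and hand-written for illustration. A sentence that mentions both members of a known pair is treated as a positive example, and any other drug-gene co-mention as a negative one.

```python
from itertools import product

# Hypothetical knowledge base of known (drug, gene) relations, hand-written
# for illustration; a real system would load these from a curated resource.
KNOWN_PAIRS = {("warfarin", "CYP2C9"), ("clopidogrel", "CYP2C19")}
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"CYP2C9", "CYP2C19"}


def label_sentences(sentences):
    """Return (sentence, drug, gene, label) tuples via distant supervision.

    A co-mention of a drug and a gene is labeled 1 if the pair is in the
    knowledge base and 0 otherwise. No manual annotation is involved.
    """
    examples = []
    for sentence in sentences:
        # Crude whitespace tokenization; an assumption made to keep the sketch short.
        tokens = set(sentence.replace(".", " ").replace(",", " ").split())
        drugs = sorted(DRUGS & {t.lower() for t in tokens})
        genes = sorted(GENES & tokens)
        for drug, gene in product(drugs, genes):
            label = 1 if (drug, gene) in KNOWN_PAIRS else 0
            examples.append((sentence, drug, gene, label))
    return examples


corpus = [
    "Warfarin dose is influenced by CYP2C9 variants.",
    "Clopidogrel was given; CYP2C9 was also genotyped.",
    "No entities are mentioned here.",
]
training_set = label_sentences(corpus)
```

Labels produced this way are noisy, since a co-mention does not guarantee that the sentence actually expresses the relation; the sketch only shows how a knowledge source can stand in for manual labeling.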

In conclusion, we thank the reviewers for their hard work and dedication to maintaining a professional review process. We also thank all authors of submitted papers for their diligence in responding to reviewers' comments.

References

1. Lu Z: PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford). 2011
2. Kate R: Unsupervised Grammar Induction of Clinical Report Sublanguage. Journal of Biomedical Semantics. 2012, 3 (Suppl 3): S4-10.1186/2041-1480-3-S3-S4.
3. Ravikumar K, Liu H, Cohn J, Wall M, Verspoor K: Literature mining of protein-residue associations with graph rules learned through distant supervision. Journal of Biomedical Semantics. 2012, 3 (Suppl 3): S2-
4. Kim S, Wilbur WJ: Thematic clustering of text documents using an EM-based approach. Journal of Biomedical Semantics. 2012, 3 (Suppl 3): S6-10.1186/2041-1480-3-S3-S6.
5. Yeganova L, Kim W, Comeau D, Wilbur WJ: Finding biomedical categories in Medline. Journal of Biomedical Semantics. 2012, 3 (Suppl 3): S3-
6. Clematide S, Rinaldi F: Ranking relations between diseases, drugs and genes for a curation task. Journal of Biomedical Semantics. 2012, 3 (Suppl 3): S5-10.1186/2041-1480-3-S3-S5.

Acknowledgements

Funding: This research was supported by the Intramural Research Program of the NIH, National Library of Medicine.

Author information

Authors and Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
Rezarta Islamaj Doğan & Lana Yeganova

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

RID and LY are the special session co-chairs at ICMLA 2011 and contributed equally to the overall organization, reviewing and editing of this supplement on "Machine Learning for Biomedical Literature Analysis and Text Retrieval".

Rezarta Islamaj Doğan and Lana Yeganova contributed equally to this work.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Islamaj Doğan, R., Yeganova, L. Topics in machine learning for biomedical literature analysis and text retrieval. J Biomed Semant 3 (Suppl 3), S1 (2012). https://doi.org/10.1186/2041-1480-3-S3-S1
