Open Access Highly Accessed Research

Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank

Jyotishman Pathak1*, Richard C Kiefer2, Suzette J Bielinski3 and Christopher G Chute1

Author Affiliations

1 Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA

2 Department of Information Technology, Mayo Clinic, Rochester, MN, USA

3 Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA

Journal of Biomedical Semantics 2012, 3:10  doi:10.1186/2041-1480-3-10

Published: 17 December 2012

Abstract

Background

Genome-wide association studies (GWAS) have enabled new exploration of how genetic variation contributes to health and disease etiology. Historically, however, GWAS have been limited by inadequate sample sizes, owing to the cost of genotyping and phenotyping study subjects. This has prompted several academic medical centers to form “biobanks”, in which biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored for a large number of subjects. These biobanks provide substantial opportunities to discover novel genotype-phenotype associations and to foster hypothesis generation.

Results

In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the use of the Resource Description Framework (RDF) for representing EHR diagnosis and procedure data, and we enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism and to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
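
As a rough illustration of the approach described above, the following minimal sketch shows how diagnosis codes might be written as RDF triples and how a federated query could join clinical and genotype data held at separate endpoints. It is not the authors' implementation: the `http://example.org/` namespace, the predicate names (`hasDiagnosis`, `hasGenotype`, `snp`, `allele`), the endpoint URLs, and the choice of ICD-9-CM codes (250.00 for Type 2 Diabetes, 244.9 for Hypothyroidism) are all assumptions made for the example, as is the use of the SPARQL 1.1 `SERVICE` clause as the federation mechanism.

```python
# Hypothetical namespace; not taken from the Mayo Clinic Biobank schema.
EX = "http://example.org/ehr/"


def diagnosis_triples(patient_id, icd9_codes):
    """Return N-Triples lines asserting each diagnosis code for one patient."""
    subject = f"<{EX}patient/{patient_id}>"
    return [
        f'{subject} <{EX}hasDiagnosis> "{code}" .'
        for code in icd9_codes
    ]


def federated_query(clinical_endpoint, genotype_endpoint, icd9_code):
    """Build a SPARQL 1.1 query string that joins two endpoints on ?patient.

    Each SERVICE block is evaluated at a different (assumed) endpoint; the
    shared ?patient variable links diagnoses to genotype calls.
    """
    return f"""
PREFIX ex: <{EX}>
SELECT ?patient ?snp ?allele WHERE {{
  SERVICE <{clinical_endpoint}> {{
    ?patient ex:hasDiagnosis "{icd9_code}" .
  }}
  SERVICE <{genotype_endpoint}> {{
    ?patient ex:hasGenotype ?call .
    ?call ex:snp ?snp ;
          ex:allele ?allele .
  }}
}}
""".strip()


if __name__ == "__main__":
    # Assumed ICD-9-CM codes: 250.00 (Type 2 Diabetes), 244.9 (Hypothyroidism).
    for line in diagnosis_triples("p001", ["250.00", "244.9"]):
        print(line)
    print(federated_query(
        "http://example.org/clinical/sparql",
        "http://example.org/genotype/sparql",
        "250.00",
    ))
```

The sketch only builds strings, so it runs without a triple store; in practice the triples would be loaded into an RDF store and the query submitted to a SPARQL endpoint that supports federation.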

Conclusions

This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and to identify genotype-phenotype associations.