Journal of Biomedical Semantics


Open Access, Highly Accessed Research

De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields

Hercules Dalianis* and Sumithra Velupillai

Author Affiliations

Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden


Journal of Biomedical Semantics 2010, 1:6 doi:10.1186/2041-1480-1-6

Published: 12 April 2010

Additional files

Additional file 1:

An overview of annotation classes used for de-identification by different research groups on clinical corpora.

Format: PDF, Size: 62 KB

This file can be viewed with: Adobe Acrobat Reader

Open Data

Additional file 2:

(Table 2) Results of the initial experiment using all 28 classes from the automatic Consensus Gold standard, giving results on exact matches. The different divisions show which classes have been merged for the remaining six experiments on the automatic Consensus Gold standard. The classes Person name and Location have been further collapsed from the original sets of classes. The column Annotated contains the number of annotated instances of each class in the Gold Standard. The column Retrieved contains the number of instances retrieved (that is, produced) by the CRF classifier. The column Relevant contains the number of correctly retrieved instances.

Format: PDF, Size: 91 KB

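The Annotated, Retrieved and Relevant columns described for Additional file 2 are the usual inputs to precision, recall and F-score. The sketch below assumes the standard definitions (precision = Relevant / Retrieved, recall = Relevant / Annotated, and the balanced F-score as their harmonic mean); the listing above does not spell out the formulas, and the class and function names and the counts in the example are hypothetical, not taken from the tables.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    """Per-class counts, named after the table columns."""
    annotated: int  # instances in the Gold Standard
    retrieved: int  # instances produced by the classifier
    relevant: int   # correctly retrieved instances (exact matches)


def precision(c: ClassCounts) -> float:
    # Share of retrieved instances that are correct.
    return c.relevant / c.retrieved if c.retrieved else 0.0


def recall(c: ClassCounts) -> float:
    # Share of annotated instances that were found.
    return c.relevant / c.annotated if c.annotated else 0.0


def f_score(c: ClassCounts) -> float:
    # Balanced F-score: harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for one annotation class (not from the paper).
example = ClassCounts(annotated=200, retrieved=160, relevant=120)
print(precision(example), recall(example), round(f_score(example), 4))
```

The zero guards only avoid division errors for classes that were never annotated or never retrieved; how such classes should be treated in an overall average is a separate choice.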

Additional file 3:

Results of all seven experiments using the automatic Consensus Gold standard. The total average over all classes is given. For each new experiment, conceptually similar annotation classes are merged into a more general annotation class. The final experiment shows the results of merging all annotation classes into one general PHI class.

Format: PDF, Size: 49 KB

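The merging of conceptually similar annotation classes described for Additional file 3 can be illustrated with a small label-mapping sketch. It assumes annotations are stored as (start, end, label) triples and that a match is exact on both span and label; the fine-grained class names "First name" and "Last name", the mapping and the example data are hypothetical and serve only to show why merging can change the number of correct instances.

```python
from typing import Dict, List, Tuple

# (start offset, end offset, class label); an assumed representation.
Annotation = Tuple[int, int, str]

# Hypothetical mapping from fine-grained classes to a merged class.
MERGE_MAP: Dict[str, str] = {
    "First name": "Person name",
    "Last name": "Person name",
}


def merge_annotations(annotations: List[Annotation],
                      mapping: Dict[str, str]) -> List[Annotation]:
    # Replace each label by its merged class; unmapped labels are kept.
    return [(s, e, mapping.get(label, label)) for s, e, label in annotations]


def to_single_class(annotations: List[Annotation],
                    name: str = "PHI") -> List[Annotation]:
    # Final step: collapse every class into one general class.
    return [(s, e, name) for s, e, _ in annotations]


def count_relevant(gold: List[Annotation], predicted: List[Annotation]) -> int:
    # Exact match: same span and same label.
    return len(set(gold) & set(predicted))


# Hypothetical data: the first prediction has the right span, wrong class.
gold = [(0, 4, "First name"), (5, 12, "Last name"), (20, 26, "Location")]
pred = [(0, 4, "Last name"), (5, 12, "Last name"), (20, 26, "Location")]

before = count_relevant(gold, pred)
after = count_relevant(merge_annotations(gold, MERGE_MAP),
                       merge_annotations(pred, MERGE_MAP))
print(before, after)
```

In this toy case a confusion between two classes that end up in the same merged class stops counting as an error, which is one reason scores for merged classes are not directly comparable to scores for the original ones.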

Additional file 4:

Results of the manual Consensus Gold standard using four-fold cross-evaluation.

Format: PDF, Size: 29 KB


Additional file 5:

Results of the manual Consensus Gold standard using ten-fold cross-evaluation.

Format: PDF, Size: 29 KB


Additional file 6:

Initial annotation classes, used annotation classes and proposed annotation classes. Initial annotation classes are those proposed in [15]. Used annotation classes are those used to create the first Gold Standard (100 EPRs), also described in [15]. Proposed annotation classes are those proposed in this article, which arose from consensus discussions among the annotators.

Format: PDF, Size: 83 KB
