De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
* Corresponding author: Hercules Dalianis hercules@dsv.su.se
- Equal contributors
Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden
Journal of Biomedical Semantics 2010, 1:6 doi:10.1186/2041-1480-1-6
Published: 12 April 2010
Additional files
Additional file 1:
An overview of annotation classes. An overview of annotation classes used for de-identification by different research groups on clinical corpora.
Format: PDF, Size: 62 KB
Additional file 2:
(Table 2) Results of the initial experiment using all 28 classes from the automatic Consensus Gold standard. Results of the initial experiment using all 28 classes from the automatic Consensus Gold standard, giving results on exact matches. The different divisions show which classes have been merged for the remaining six experiments on the automatic Consensus Gold standard. The classes Person name and Location have been further collapsed from the original sets of classes. The column Annotated contains the number of annotated classified instances (Gold Standard). The column Retrieved contains the number of retrieved instances (produced by the CRF classifier). The column Relevant contains the number of correctly retrieved instances.
Format: PDF, Size: 91 KB
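The three columns described for Additional file 2 (Annotated, Retrieved, Relevant) are the counts from which precision, recall and F-score are conventionally derived. The following is a minimal sketch in Python, assuming the standard definitions (precision = Relevant / Retrieved, recall = Relevant / Annotated). The function name and the counts in the usage lines are hypothetical and are not taken from the article's tables.

```python
def precision_recall_f1(annotated, retrieved, relevant):
    """Return (precision, recall, F1) from instance counts.

    annotated: number of instances in the gold standard
    retrieved: number of instances produced by the classifier
    relevant:  number of correctly retrieved instances
    """
    # Guard against empty denominators, e.g. a class that is never predicted.
    precision = relevant / retrieved if retrieved else 0.0
    recall = relevant / annotated if annotated else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f1(annotated=100, retrieved=80, relevant=60)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Returning zero when a denominator is empty is one possible convention; an evaluation script might instead report such classes as undefined.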
Additional file 3:
Results of all seven experiments using the automatic Consensus Gold standard. The total average over all classes is given. For each new experiment, conceptually similar annotation classes are merged into a more general annotation class. The final experiment shows the results of merging all annotation classes into one general PHI class.
Format: PDF, Size: 49 KB
Additional file 4:
Results of the manual Consensus Gold standard using four-fold cross-evaluation.
Format: PDF, Size: 29 KB
Additional file 5:
Results of the manual Consensus Gold standard using ten-fold cross-evaluation.
Format: PDF, Size: 29 KB
Additional file 6:
Initial annotation classes, used annotation classes and proposed annotation classes. Initial annotation classes are those proposed in [15]. Used annotation classes are those that were used in the creation of the first Gold Standard (100 EPRs), also described in [15]. Proposed annotation classes are the ones proposed in this article, which arose from consensus discussions among the annotators.
Format: PDF, Size: 83 KB
