Table 4

Number of correct and incorrect terms for each of the rewrite and suppression rules.

Rule                      Most frequent            Random
                          Correct    Incorrect     Correct    Incorrect

Rewrite rules
Syntactic inversion       50         0             100        0
Possessives               50         0             100        0
Short/long form           49         1             98         2
Angular brackets          50         0             97         3
Semantic type             50         0             100        0
Begin parentheses         1          25            -          -
End parentheses           49         1             96         4
Begin brackets            38         12            91         9
End brackets              46         4             95         5

Suppression rules
Dosages                   50         0             100        0
Short token               50         0             100        0
At-sign                   -          -             -          -
EC numbers                50         0             99         0
Any classification        50         0             100        0
Any underspecification    50         0             100        0
Miscellaneous             50         0             100        0
Words > 5                 0          50            5          95


For every rule, the calculations are based on the 50 terms most frequently found in the corpus and on 100 terms randomly selected from the corpus (if available). The At-sign rule has no values because no terms suppressed by this rule were found in the corpus.
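
As an illustration of how the counts in the table can be read, the following minimal Python sketch turns a (correct, incorrect) pair into a fraction of correct terms. The function name `fraction_correct`, the dictionary `most_frequent`, and the choice of rows are assumptions made for this example and are not part of the original study; a rule without values (shown as "-" in the table) is represented as `None`.

```python
from typing import Optional, Tuple

# A few (correct, incorrect) pairs copied from the "Most frequent" columns.
# None stands for a rule with no values ("-" in the table).
most_frequent = {
    "Syntactic inversion": (50, 0),
    "Short/long form": (49, 1),
    "Begin brackets": (38, 12),
    "Words > 5": (0, 50),
    "At-sign": None,
}


def fraction_correct(counts: Optional[Tuple[int, int]]) -> Optional[float]:
    """Return correct / (correct + incorrect), or None if no terms were evaluated."""
    if counts is None:
        return None
    correct, incorrect = counts
    total = correct + incorrect
    if total == 0:
        return None
    return correct / total


if __name__ == "__main__":
    for rule, counts in most_frequent.items():
        value = fraction_correct(counts)
        shown = "n/a" if value is None else f"{value:.2f}"
        print(f"{rule}: {shown}")
```

Returning `None` instead of zero for rules without evaluated terms keeps "no data" distinct from "no correct terms", which matters for rows such as At-sign.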

Hettne et al. Journal of Biomedical Semantics 2010 1:5   doi:10.1186/2041-1480-1-5
