This article is part of the supplement: Proceedings of the Fourth International Symposium on Semantic Mining in Biomedicine (SMBM)

Open Access Research

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

Dietrich Rebholz-Schuhmann1*, Antonio Jimeno Yepes1, Chen Li1, Senay Kafkas1, Ian Lewin1, Ning Kang2, Peter Corbett3, David Milward3, Ekaterina Buyko4, Elena Beisswanger4, Kerstin Hornbostel4, Alexandre Kouznetsov5, René Witte6, Jonas B Laurila5, Christopher JO Baker5, Cheng-Ju Kuo7, Simone Clematide8, Fabio Rinaldi8, Richárd Farkas9, György Móra9, Kazuo Hara10, Laura I Furlong11, Michael Rautschka11, Mariana Lara Neves12, Alberto Pascual-Montano12, Qi Wei13, Nigel Collier13, Md Faisal Mahbub Chowdhury14, Alberto Lavelli14, Rafael Berlanga15, Roser Morante16, Vincent Van Asch16, Walter Daelemans16, José Luís Marina17, Erik van Mulligen2, Jan Kors2 and Udo Hahn4

  • * Corresponding author: Dietrich Rebholz-Schuhmann

Author Affiliations

1 EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK

2 Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, NL

3 Linguamatics Ltd, St. John's Innovation Centre, Cambridge, UK

4 Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität, Jena, Germany

5 Dept. of Computer Science & Applied Statistics, University of New Brunswick, Canada

6 Dept. of Computer Science & Software Engineering, Concordia University, Montreal, Canada

7 Institute of Information Science, Academia Sinica, Taipei 115, Taiwan

8 University of Zürich, Switzerland

9 Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Hungary

10 Nara Institute of Science and Technology, Nara, Japan

11 Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain

12 National Center for Biotechnology-CSIC, Madrid, Spain

13 National Institute of Informatics, Tokyo, Japan

14 Fondazione Bruno Kessler, Trento, Italy

15 Universitat Jaume I, Spain

16 CLiPS, University of Antwerp, Belgium

17 Complutense University of Madrid, Spain


Journal of Biomedical Semantics 2011, 2(Suppl 5):S11  doi:10.1186/2041-1480-2-S5-S11

Published: 6 October 2011



Competitions in text mining measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of a GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups through the harmonisation of annotations from automatic text mining solutions: the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO), and species (SPE). This corpus was used in the First CALBC Challenge, which asked participants to annotate it with their text processing solutions.
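The harmonisation step can be illustrated with a minimal majority-vote sketch: spans proposed by several annotation systems are kept when enough systems agree. This is a simplification for illustration only — the function name, the span representation, and the agreement threshold are assumptions, and the actual CALBC harmonisation additionally aligns annotations with differing boundaries.

```python
from collections import Counter

def harmonise(annotation_sets, threshold):
    """Keep each (start, end, group) span that at least `threshold`
    of the contributing systems proposed (simple majority voting;
    a real harmonisation would also merge overlapping boundaries)."""
    votes = Counter(span for spans in annotation_sets for span in set(spans))
    return sorted(span for span, n in votes.items() if n >= threshold)

# Three hypothetical systems annotating the same abstract:
systems = [
    [(0, 7, "PRGE"), (22, 30, "DISO")],                    # system A
    [(0, 7, "PRGE"), (22, 30, "DISO"), (40, 45, "CHED")],  # system B
    [(0, 7, "PRGE"), (40, 45, "CHED")],                    # system C
]
print(harmonise(systems, threshold=2))
# → [(0, 7, 'PRGE'), (22, 30, 'DISO'), (40, 45, 'CHED')]
```

Raising the threshold trades recall for precision in the resulting silver standard: a stricter vote keeps only spans most systems agree on.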


All four PPs of the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could either ignore the training data and deliver annotations from their own annotation systems, or train a machine-learning approach on the provided pre-annotated data. In general, the annotation solutions performed worse on entities from the categories CHED and PRGE than on entities categorised as DISO and SPE. The best performance across all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I.

The participants' data sets were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided the participant had not used the annotated SSC-I data for training. The performances of the participants' solutions were then measured against the SSC-II. Again, the annotation solutions produced better results for DISO and SPE than for CHED and PRGE.


The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking against the SSC-II yields better performance for the CPs' annotation solutions than benchmarking against the SSC-I.
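The benchmarking described above amounts to computing precision, recall and F-measure of each system's annotations against the silver-standard spans. A minimal sketch, assuming exact-span matching and (start, end, group) tuples (the paper's actual evaluation also considers relaxed boundary matching):

```python
def prf(standard, predicted):
    """Precision, recall and F1 for exact-span matching of predicted
    (start, end, group) annotations against a (silver-)standard set."""
    standard, predicted = set(standard), set(predicted)
    tp = len(standard & predicted)                # spans found in both
    p = tp / len(predicted) if predicted else 0.0  # precision
    r = tp / len(standard) if standard else 0.0    # recall
    f = 2 * p * r / (p + r) if p + r else 0.0      # harmonic mean
    return p, r, f

silver = [(0, 7, "PRGE"), (22, 30, "DISO"), (40, 45, "CHED"), (50, 55, "SPE")]
system = [(0, 7, "PRGE"), (22, 30, "DISO"), (40, 45, "CHED"), (60, 66, "PRGE")]
print(prf(silver, system))
# → (0.75, 0.75, 0.75)
```

An average F-measure over the four semantic groups, as reported for the SSC-I, would then be the mean of the per-group F1 scores.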