Comparing scientific discourse annotation schemes for enhanced knowledge extraction

Anita de Waard1, Paul Thompson2, Maria Liakata3, Raheel Nawaz2 and Sophia Ananiadou2

1 Elsevier Labs

2 National Centre for Text Mining, University of Manchester

3 University of Aberystwyth and European Bioinformatics Institute

Whilst various text mining tools have been developed to extract pieces of knowledge from scientific texts, the discourse context of this extracted knowledge is often not taken into account. A particular piece of knowledge may, for example, represent an accepted fact, a hypothesis, the results of an experiment, or an analysis based on experimental results. Furthermore, this knowledge may represent the author's current work, or work reported elsewhere. The ability to recognise different discourse elements automatically provides vital information for the correct interpretation of extracted knowledge, allowing, for example, scientific claims to be linked to experimental evidence, or newly reported experimental knowledge to be isolated.

In this paper, we will compare three different schemes for the annotation of discourse elements within scientific papers. The schemes have different perspectives: one is driven by the need to describe biomedical events [1], another is content-driven, seeking to identify the main components of a scientific investigation [2], and the third focusses on how epistemic knowledge is conveyed in discourse [3]. The schemes vary in both the types of discourse elements identified and the granularity of the units to which the annotation is applied, i.e. complete sentences [2], segments of sentences [3], or specific relationships/events occurring within these sentences [1]. The comparison of the annotation schemes is facilitated through the annotation of three full papers according to each of the schemes. The comparison will consider the relative merits of each scheme, and how the information annotated by the different schemes can complement one another to provide enriched details about knowledge extracted from the texts.

[1] Raheel Nawaz, Paul Thompson, John McNaught and Sophia Ananiadou (2010). "Meta-Knowledge Annotation of Bio-Events". Proceedings of LREC 2010, pp. 2498-2507.

[2] Maria Liakata, Simone Teufel, Advaith Siddharthan and Colin R. Batchelor (2010). "Corpora for the Conceptualisation and Zoning of Scientific Papers". Proceedings of LREC 2010, pp. 2054-2061.

[3] Anita de Waard (2009b). "Categorizing Epistemic Segment Types in Biology Research Articles". Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23, 2009.