About RST

Rhetorical Structure Theory (RST) is a theory that investigates text coherence, especially the relations that hold between parts of a text at both the macrostructural and microstructural levels (MANN and THOMPSON, 1988). RST has been used in Descriptive Linguistics and in Computational Linguistics.

In Linguistics, RST is a framework for the analysis of texts. It is very useful for describing the superstructure of diverse text genres, as shown in studies of Brazilian Portuguese by Giering (2009), Decat (2010) and Antonio (2004). RST is also a prominent theory in Functional Linguistics for the investigation of clause combining, where it describes the relations that hold between clauses in the microstructure (MATTHIESSEN and THOMPSON, 1988).

An important research question in RST is which linguistic means speakers use to signal relations. According to RST, these are relations of meaning, not of form (MANN and THOMPSON, 1988). This means that relations can hold and be interpreted without being formally marked, for example by connectives. Other linguistic means of signaling relations must therefore be investigated in Phonology (intonation in spoken language, for example), Morphology, Syntax, text genre, etc. (TABOADA, 2006).

A further strength of RST is that the theory can be applied to any language and, according to Marcu (2000), can be used to describe almost all text genres. Corpora in many languages have already been annotated using RST. Carlson et al. (2002) manually annotated newspaper articles in English. Taboada and Renkema (2011) annotated, besides newspaper articles, advertisements, letters, magazine articles, scientific papers, book reviews and opinion articles. Stede (2004) annotated newspaper articles in German. Pardo and Seno (2005) annotated texts about computing in Brazilian Portuguese. Cardoso et al. (2011) annotated a corpus of news texts written in Brazilian Portuguese. Da Cunha et al. (2011) annotated scientific papers from diverse areas in Spanish. Iruskieta et al. (2013) annotated abstracts of scientific texts in Basque. Iruskieta et al. (forthcoming) annotated a multilingual corpus of scientific texts in English, Spanish and Basque.

Many tools make the annotation and revision tasks easier. RSTTool is very useful for the annotation of rhetorical structure (O'DONNELL, 2000); RSTeval can be used for the automatic evaluation of pairs of rhetorical structures in the same language (MAZIERO and PARDO, 2009) and is implemented for English, Portuguese, Spanish and Basque; and RhetDatabase is a database in which the cues that signal rhetorical relations can be annotated (PARDO, 2005), for both linguistic and computational use. There are also websites (Basque corpus at http://ixa2.si.ehu.eus/diskurtsoa/en/; English, Spanish and Basque multilingual corpus at http://ixa2.si.ehu.eus/rst/; and Portuguese corpus at http://ixa2.si.ehu.eus/rst/pt) where users can look up all the occurrences of any relation in the corpus, the relations of a chosen text, the linear segmentation of a text, the rhetorical relations that are linked to the central unit in the discourse structure, the signals of the rhetorical relations, and any information in the corpus based on part of speech.

Tools for performing automatic tasks have also been designed using RST: automatic segmenters and parsers for English (MARCU, 2000; TOFILOSKI et al., 2009), for Brazilian Portuguese (PARDO, 2006), for Spanish (DA CUNHA et al., 2012) and for Basque (IRUSKIETA et al., 2011).

Within the RST framework, many software applications for automatic discourse analysis have been designed: for example, there are analyzers for Japanese (SUMITA et al., 1992), for English (CORSTON-OLIVER, 1998; MARCU, 2000; HANNEFORTH et al., 2003), for Spanish (MAZIERO et al., 2011) and for Brazilian Portuguese (PARDO et al., 2004).

There have been RST investigations of parallel corpora (ABELEN et al., 1993; CUI, 1986; MARCU et al., 2000; GUY, 2001; TABOADA, 2004, among others). Some examine how differences in rhetorical structure can be captured in translation rules for use in automatic translation tasks (KORZEN and GYLLING, 2012); others examine how translation strategies affect rhetorical structures (DA CUNHA and IRUSKIETA, 2010; IRUSKIETA et al., forthcoming). Nevertheless, there is still no parallel corpus annotated at the discourse level that can serve as a reference for the scientific community and for eventual computational exploitation in automatic translation tasks.

Other researchers have used RST for different tasks. For example, Cardoso et al. (2013) have developed and evaluated a set of methods for subtopic segmentation of news texts. Their results show that discourse organization mirrors subtopic changes in a text.

A good summary of the work done on and with RST is available in Taboada and Mann (2006), and there is plenty of information about the theory at http://www.sfu.ca/rst.



ABELEN, E.; REDEKER, G.; THOMPSON, S.A. The rhetorical structure of US-American and Dutch fund-raising letters. Text, 13(3), 1993. p. 323-350.

ANTONIO, J.D. Estrutura retórica e articulação de orações em narrativas orais e em narrativas escritas do português. Araraquara, 2004. Tese (Doutorado em Linguística e Língua Portuguesa). Faculdade de Ciências e Letras/ Unesp/ Araraquara.

CARDOSO, P.C.F.; MAZIERO, E.G.; JORGE, M.L.C.; SENO, E.M.R.; DI FELIPPO, A.; RINO, L.H.M.; NUNES, M.G.V.; PARDO, T.A.S. CSTNews - A Discourse-Annotated Corpus for Single and Multi-Document Summarization of News Texts in Brazilian Portuguese. In: Proceedings of the 3rd RST Brazilian Meeting, 2011. p. 88-105. Cuiabá/MT, Brasil.

CARDOSO, P.C.F.; TABOADA, M.; PARDO, T.A.S. On the contribution of discourse to topic segmentation. In: Proceedings of the 14th Annual SIGDial Meeting on Discourse and Dialogue, 2013. p. 92-96. Metz, França.

CARLSON, L. et al. RST Discourse Treebank, LDC2002T07 [Corpus]. Philadelphia, PA: Linguistic Data Consortium, 2002.

CORSTON-OLIVER, S. Identifying the linguistic correlates of rhetorical relations, Proceedings of the ACL Workshop on Discourse Relations and Discourse Markers 1998, pp. 8–14.

CUI, S. A comparison of English and Chinese expository rhetorical structures. University of California (USA): UCLA dissertation, 1986.

DA CUNHA, I.; SAN JUAN, E.; TORRES-MORENO, J.M.; LLOBERES, M.; CASTELLÓN, I. DiSeg 1.0: The first system for Spanish discourse segmentation. Expert Systems with Applications, 39(2), 2012. p. 1671-1678.

DA CUNHA, I. and IRUSKIETA, M. Comparing rhetorical structures in different languages: The influence of translation strategies. Discourse Studies, 12(5), 2010. p. 563-598.

DA CUNHA, I., TORRES-MORENO, J. and SIERRA, G., 2011. On the Development of the RST Spanish Treebank, 5th Linguistic Annotation Workshop (LAW V '11). Association for Computational Linguistics, pp. 1-10. 23 June 2011.

DECAT, M. B. N. Estrutura retórica e articulação de orações em gêneros textuais diversos: uma abordagem funcionalista. In: MARINHO, J. H. C.; SARAIVA, M. E. F. (Org.). Estudos da língua em uso: da gramática ao texto. Belo Horizonte: Editora UFMG, 2010, p. 231-262.

GIERING, M. E. A organização retórica de artigos de divulgação científica midiática e a organização sequencial do texto. Desenredo (PPGL/UPF), v. 5, p. 78-99, 2009.

HANNEFORTH, T.; HEINTZE, S.; STEDE, M. Rhetorical parsing with underspecification and forests. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume of the Proceedings of HLT-NAACL 2003, Short Papers, Volume 2. Association for Computational Linguistics, 2003. p. 31-33.

IRUSKIETA, M.; ARANZABE, M.; DÍAZ DE ILARRAZA, A.; GONZALEZ, I.; LERSUNDI, M.; LOPEZ DE LACALLE, O. The RST Basque TreeBank: an online search interface to check rhetorical relations. 4th Workshop RST and Discourse Studies, p. 40-49, Sociedade Brasileira de Computação, Fortaleza, CE, Brasil. October 20-24, 2013.

IRUSKIETA, M.; DA CUNHA, I.; TABOADA, M. A Qualitative Comparison Method for Rhetorical Structures: Identifying different discourse structures in multilingual corpora. Language Resources and Evaluation. Forthcoming.

IRUSKIETA, M.; DÍAZ-DE-ILARRAZA, A.; LERSUNDI, M., 2011. Bases para la implementación de un segmentador discursivo para el euskera, 8th Brazilian Symposium in Information and Human Language Technology (STIL 2011), October 2011.

KORZEN, I.; GYLLING, M. Text structure in a contrastive and translational perspective: on information density and clause linkage in Italian and Danish. Translation: Computation, Corpora, Cognition, 2(1), 2012. p. 23-46.

MANN, W.C.; THOMPSON, S.A. Rhetorical Structure Theory: Toward a functional theory of text organization. Text-Interdisciplinary Journal for the Study of Discourse, 8(3), 1988. p. 243-281.

MARCU, D. The theory and practice of discourse parsing and summarization. Cambridge: The MIT press, 2000.

MARCU, D.; CARLSON, L.; WATANABE, M. The automatic translation of discourse structures. Paper presented at the 1st North American chapter of the Association for Computational Linguistics conference, Seattle (USA), April 30-May 4, 2000.

MATTHIESSEN, C.; THOMPSON, S. The structure of discourse and ‘subordination’. In: HAIMAN, J.; THOMPSON, S. (Eds.) Clause Combining in Grammar and Discourse. Amsterdam/Philadelphia: J. Benjamins, 1988. p. 275-329.

MAZIERO, E.G.; PARDO, T.A.S., 2009. Metodologia de avaliação automática de estruturas retóricas [Methodology for automatic evaluation of rhetorical structures], 7th Brazilian Symposium in Information and Human Language Technology (STIL), 8-11 September 2009.

MAZIERO, E.G.; PARDO, T.A.S.; DA CUNHA, I.; TORRES-MORENO, J.M.; SANJUAN, E. DiZer 2.0 - An Adaptable On-line Discourse Parser. In: Proceedings of the 3rd RST Brazilian Meeting, pp. 1-17. October 26, 2011. Cuiabá/MT, Brasil.

O'DONNELL, M., 2000. RSTTool 2.4: a markup tool for Rhetorical Structure Theory, First International Conference on Natural Language Generation INLG '00, June 12-16 2000, ACL, pp. 253-256.

PARDO, T.A.S. SENTER: um segmentador sentencial automático para o português do Brasil. 2006.

PARDO, T.A.S. Métodos para análise discursiva automática [Methods for automatic discourse analysis], Instituto de Ciências Matemáticas e de Computação, 2005.

PARDO, T.A.S., NUNES, M.G.V. and RINO, L.H.M. DiZer: An Automatic Discourse Analyzer for Brazilian Portuguese. Advances in Artificial Intelligence–SBIA 2004. p. 224-234.

PARDO, T.A.S. and SENO, E.R.M., 2005. Rhetalho: um corpus de referência anotado retoricamente [Rhetalho: a rhetorically annotated reference corpus], Anais do V Encontro de Corpora, 24-25 November 2005.

GUY, R. Rhetorical styles and news texts: A contrastive analysis of rhetorical relations in Chinese and Australian news-journal text. ASAA E-Journal of Asian Linguistics and Language-teaching 1(1), 2001. p. 1-22.

SCOTT, D.R., DELIN, J.; HARTLEY, A.F. Identifying congruent pragmatic relations in procedural texts. Languages in Contrast 1(1), 1998. p. 45-82.

STEDE, M. The Potsdam commentary corpus, 2004 ACL Workshop on Discourse Annotation, 25-26 July 2004, Association for Computational Linguistics, pp. 96-102.

SUMITA, K., ONO, K., CHINO, T.; UKITA, T. A discourse structure analyzer for Japanese text, 1992, ICOT, p. 1133-1140.

TABOADA, M. Building coherence and cohesion: Task-oriented dialogue in English and Spanish. Amsterdam and Philadelphia: John Benjamins, 2004.

TABOADA, M.; MANN, W.C. Applications of Rhetorical Structure Theory. Discourse Studies, 8(4), 2006. p. 567-588.

TABOADA, M. Discourse Markers as Signals (or Not) of Rhetorical Relations. Journal of Pragmatics, 38(4), 2006. p. 567-592.

TABOADA, M.; RENKEMA, J. Discourse Relations Reference Corpus. 2011 (last update).

TOFILOSKI, M.; BROOKE, J.; TABOADA, M. A syntactic and lexical-based discourse segmenter, 47th Annual Meeting of the Association for Computational Linguistics, 2-7 August 2009, ACL, p. 77-80.