About RST

RST (Rhetorical Structure Theory) is a theory that investigates text coherence, especially the relations that hold between parts of a text at both the macrostructural and microstructural levels (MANN and THOMPSON, 1988). RST has been used in Descriptive Linguistics and in Computational Linguistics.

In Linguistics, RST is a framework for the analysis of texts. It is very useful for describing the superstructure of diverse text genres, as shown in research on Brazilian Portuguese by Giering (2009), Decat (2010) and Antonio (2004). In addition, RST is a prominent theory in Functional Linguistics for the investigation of clause combining, where it describes the relations that hold between clauses in the microstructure (MATTHIESSEN and THOMPSON, 1988).

An important research question in RST is which linguistic means speakers use to signal relations. According to RST, these relations are relations of meaning rather than of form (MANN and THOMPSON, 1988). This means that relations can hold and be interpreted without being formally marked, for example by connectives. Thus, other linguistic means of signaling relations must be investigated in Phonology (intonation in spoken language, for example), Morphology, Syntax, text genre, etc. (TABOADA, 2006).

An important aspect of RST is that the theory can be applied to any language and can be used to describe almost all text genres, according to Marcu (2000). Corpora in many languages have already been annotated using RST: Carlson et al. (2002) manually annotated newspaper articles in English. Taboada and Renkema (2011) annotated, besides newspaper articles, advertisements, letters, magazine articles, scientific papers, book reviews and opinion articles. Stede (2004) annotated newspaper articles in German. Pardo and Seno (2005) annotated texts about computing in Brazilian Portuguese. Cardoso et al. (2011) annotated a corpus of news texts written in Brazilian Portuguese. Da Cunha et al. (2011) annotated scientific papers from diverse areas in Spanish. Iruskieta et al. (2013) annotated abstracts of scientific texts in Basque. Iruskieta et al. (Forthcoming) annotated a multilingual corpus of scientific texts in English, Spanish and Basque.

Many tools make the annotation and revision tasks easier. RSTTool is very useful for the annotation of rhetorical structure (O’DONNELL, 2000); RSTeval can be used for the automatic evaluation of pairs of rhetorical structures in the same language (MAZIERO and PARDO, 2009) and is implemented for English, Portuguese, Spanish and Basque; and RhetDatabase is a database in which the cues that signal rhetorical relations can be annotated (PARDO, 2005), for both linguistic and computational use. There are also websites (the Basque corpus at http://ixa2.si.ehu.eus/diskurtsoa/en/; the English, Spanish and Basque multilingual corpus at http://ixa2.si.ehu.eus/rst/; and the Portuguese corpus at http://ixa2.si.ehu.eus/rst/pt) where the user may look up all the occurrences of any relation in the corpus, the relations of a chosen text, the linear segmentation of a text, the rhetorical relations that are linked to the central unit in the discourse structure, the signals of the rhetorical relations, and any information in the corpus based on part of speech.

Tools for performing automatic tasks have been designed using RST: there are automatic parsers for English (MARCU, 2000; TOFILOSKI and BROOKE ET AL., 2009), for Brazilian Portuguese (PARDO, 2006), for Spanish (DA CUNHA and SAN JUAN ET AL., 2012) and for Basque (IRUSKIETA and DÍAZ-DE-ILARRAZA ET AL., 2011).

Within the RST framework, many software applications for automatic discourse analysis have been designed: for example, there are analyzers for Japanese (SUMITA and ONO ET AL., 1992), for English (CORSTON-OLIVER, 1998; MARCU, 2000; HANNEFORTH and HEINTZE ET AL., 2003), for Spanish (MAZIERO ET AL., 2011) and for Brazilian Portuguese (PARDO and NUNES ET AL., 2004).

There are RST investigations of parallel corpora (ABELEN, REDEKER ET AL., 1993; CUI, 1986; MARCU, CARLOS ET AL., 2000; GUY, 2001; TABOADA, 2004, among others). Some examine how differences in rhetorical structure can be captured in translation rules for use in automatic translation tasks (KORZEN and GYLLING, 2012); others examine how translation strategies and rhetorical structures affect each other (DA CUNHA and IRUSKIETA, 2010; IRUSKIETA, DA CUNHA ET AL., FORTHCOMING). Nevertheless, there is no parallel corpus annotated at the discourse level that can serve as a reference for the scientific community and for eventual computational exploitation in automatic translation tasks.

Other researchers have used RST for other tasks. For example, Cardoso et al. (2013) developed and evaluated a set of methods for subtopic segmentation of news texts. Their results show that discourse organization mirrors subtopic changes in a text.

A good summary of what has been done on and with RST is available in Taboada and Mann (2006), and there is plenty of information about the theory at http://www.sfu.ca/rst.



