Automatic Identification of Discourse Relations in Text
Svetlana Stoyanchev
SLTC Newsletter, February 2011
Structure makes text coherent and meaningful. Discourse relations (also known as rhetorical or coherence relations) link clauses in text and compose the overall text structure. They are used in natural language processing applications such as text summarization and natural language generation, and the study of discourse has become a prominent part of the field. Rhetorical Structure Theory (RST), formalized by Mann and Thompson in 1988 [1], proposes a set of relations such as evidence, concession, justification, background, and circumstance. These relations were later refined, extended, and applied to corpus annotation. In a recent SLTC Newsletter article, Annie Louis described the Penn Discourse Treebank (PDTB) [2], a one-million-word corpus annotated with discourse relations. In this article we discuss human agreement on the discourse annotation tasks of the PDTB and the RST corpus [4] (another publicly available resource annotated with discourse relations) and review approaches to automatic identification of discourse relations.
Uses of Discourse
Discourse relations are used extensively in natural language processing applications, including text summarization and natural language generation. In summarization, discourse relations help identify which text should be included in a summary [5, 6] and help produce an appropriate ordering of sentences in a summary [7]. In generating speech from text, discourse relations are used to achieve higher prosodic quality of the speech [8]. In our current work on dialogue generation from text, we rely on discourse relation parsing as the initial text processing step [9]. Education is another field that uses discourse relation theory. In automatic essay scoring, discourse relations have been shown to improve performance [10]. In automated tutoring systems, the discourse structure of the tutoring dialogue is used to analyze the student's performance [11]. In question answering, discourse relations that express reason can help identify answers to "why" questions.
Practical NLP applications rely on automatic detection of discourse relations. Louis et al. [13] note that the structure of discourse relations is the most useful information for content selection in the summarization task, but "the state of current parsers might limit the benefits obtainable from discourse".
Human Agreement on Annotations
The performance of automatic classifiers is bounded by the inter-annotator agreement on the experimental dataset. If human annotators do not reliably agree on their tag assignments, automatic algorithms cannot be expected to outperform them.
In the RST corpus annotation [4], the authors report a marked improvement in annotation agreement over a 10-month period. Inter-annotator agreement on relation assignment rose from kappa = 0.60 in April 2000 to 0.76 in January 2001. This shows that even for humans, discourse relation detection is a very difficult task. To reach a reliable level of discourse annotation, annotators have to be trained extensively.
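As a rough illustration of what a kappa value measures, the sketch below computes Cohen's kappa for two annotators who label the same items. The article does not say which kappa variant the RST corpus authors used, so the two-annotator Cohen formulation is an assumption here, and the relation labels assigned to each item are made up for the example.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical annotations of six text spans (not real corpus data).
annotator_1 = ["evidence", "concession", "background",
               "evidence", "circumstance", "evidence"]
annotator_2 = ["evidence", "concession", "evidence",
               "evidence", "circumstance", "background"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Kappa discounts the agreement expected by chance, which is why it is lower than the raw proportion of matching labels (4 of 6 in this toy example).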
The RST annotation scheme posits a single tree structure over the document. RST annotation involves 1) segmenting the text and 2) assigning a discourse relation from a list of 110 relations (a rather daunting task). The PDTB [2] annotation scheme, on the other hand, does not assume a particular structure. PDTB annotation involves 1) detecting spans for explicit and implicit discourse connectives ("but", "because", "when", etc.) and 2) disambiguating the discourse connectives. For an implicit relation, where no discourse connective is present, the PDTB annotator first chooses the most applicable connective. The PDTB annotation scheme lets annotators choose among three levels of granularity: four classes at the topmost level ("Comparison", "Contingency", "Temporal", and "Expansion"), which are further subdivided into types and subtypes. At the topmost level inter-annotator agreement is 94%, while at the lowest, subtype level (which corresponds to the RST annotations) it is 80%.
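The effect of granularity on agreement can be sketched by comparing two annotators' sense labels after truncating them to a given depth of the hierarchy. This is a minimal sketch: the dot-separated label format, the function names, and the four annotated items are assumptions for illustration, not the PDTB file format or real corpus data.

```python
def truncate_sense(sense, level):
    """Keep the first `level` parts of a dot-separated sense label."""
    return ".".join(sense.split(".")[:level])

def agreement_at_level(senses_a, senses_b, level):
    """Fraction of items on which two annotators agree at a given depth."""
    pairs = list(zip(senses_a, senses_b))
    same = sum(truncate_sense(a, level) == truncate_sense(b, level)
               for a, b in pairs)
    return same / len(pairs)

# Hypothetical class.type.subtype labels from two annotators.
annotator_a = ["Contingency.Cause.Reason", "Comparison.Contrast",
               "Temporal.Asynchronous.Precedence", "Expansion.Conjunction"]
annotator_b = ["Contingency.Cause.Result", "Comparison.Concession",
               "Temporal.Asynchronous.Precedence", "Expansion.Conjunction"]

for level in (1, 2, 3):
    print(level, agreement_at_level(annotator_a, annotator_b, level))
```

In this toy data the annotators always agree on the top-level class and disagree more often as the labels become finer, the same pattern as the 94% versus 80% figures reported for the PDTB.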
Observation on Tree Structure in RST Annotation
Relations at the leaf (lowest) level of an RST tree correspond to relations within a sentence or between consecutive sentences. Relations at the higher levels of the tree correspond to relations between multiple sentences or paragraphs. In our attempt to annotate the full tree structure of a monologue paraphrase of an expository dialogue [14], we found practically no agreement on the higher-level relations and structure of the RST trees. The agreement at the leaf level is 0.62, comparable with the initial (before training) agreement on the RST annotations. This lack of agreement at the higher levels of the tree may be caused by the type of data set, or by the fact that discourse relations between multiple sentences are more ambiguous. Further experiments are needed to determine why agreement is lower at the higher levels of the tree than at the leaf level.
Automatic Detection of Discourse Relations
Automatic detection of discourse relations has been applied to and tested on several tasks and data sets:
- detection of relations within a sentence
- identification of explicit (signalled by a discourse connective) relations
- identification of implicit (not signalled) discourse relations
- RST discourse parsing and identifying structure of the overall text
Soricut and Marcu (2003) [15] use the RST corpus to train and test a sentence-level discourse parser. The authors use lexical and syntactic information in a sentence first to identify segments and then to identify the discourse relations between them. They report that the segmentation task achieves an f-score of 0.85 and is not strongly affected by the syntactic parser's errors. They find that the relation tagging task on manually segmented data achieves an f-score of 0.75, not very different from the human performance of 0.77.
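For readers unfamiliar with the measure, the sketch below shows one common way to score a segmenter: treat each segment boundary as a token position and compute precision, recall, and f-score over the predicted positions. It is not the evaluation code of Soricut and Marcu; the function name and the boundary positions are hypothetical.

```python
def boundary_f_score(gold_boundaries, predicted_boundaries):
    """Precision, recall and f-score over segment boundary positions."""
    gold, predicted = set(gold_boundaries), set(predicted_boundaries)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical token indices after which a segment ends.
gold = {4, 9, 15, 21}
predicted = {4, 9, 17, 21, 25}

precision, recall, f_score = boundary_f_score(gold, predicted)
print(precision, recall, round(f_score, 2))
```

The f-score is the harmonic mean of precision and recall, so a segmenter is penalized both for spurious boundaries and for missed ones.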
In an approach to automatic sense disambiguation based on the GraphBank corpus [3], Wellner et al. [16] use data pre-processing techniques such as event detection, modal parsing (identifying subordinate verb relations and their types), and temporal parsing over events. The authors also use knowledge resources (the Word Sketch Engine and the Brandeis Semantic Ontology) for similarity measures. Using maximum entropy classification with an extensive set of linguistically motivated features, their method achieves 81% accuracy on the sense assignment task (using a set of 10 coarse-grained relations and assuming that nuclearity, i.e. the direction of the relation, is given).
Pitler et al. [17, 18] investigate automatic detection of discourse relations between and within sentences using the PDTB corpus, evaluating explicit (signalled by a discourse connective) and implicit (not signalled) relations separately. The authors find that explicitly signalled relations are recognized very well (>90% accuracy). For implicitly conveyed relations, however, their approach yields accuracy below 50% on 6-way classification (4 high-level relation classes + Entity relation + no relation). Similarly, Lin et al. [19] achieve 40% accuracy on identifying implicit relations.
duVerle and Prendinger [20] develop a full RST structure parser using a range of lexical, semantic, and structural features with Support Vector Machine classification. They report an f-score that reaches 73% of the human inter-annotator agreement f-score. The parser's web interface is publicly available.
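The gap between explicit and implicit relations can be made concrete with a toy lookup: when a connective is present, it already points to a likely relation class, and when it is absent, a lookup has nothing to work with. This is only an illustrative sketch, not the method of Pitler et al. or Lin et al.; the small lexicon and the function name are assumptions, and the lookup ignores the fact that connectives can be ambiguous.

```python
import re

# Toy lexicon mapping connectives to a top-level class.
# An assumption for illustration, not derived from the PDTB.
TOY_CONNECTIVE_CLASS = {
    "because": "Contingency",
    "but": "Comparison",
    "however": "Comparison",
    "when": "Temporal",
    "and": "Expansion",
}

def guess_explicit_relation(text):
    """Return (connective, class) for the first known connective, else None."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in TOY_CONNECTIVE_CLASS:
            return token, TOY_CONNECTIVE_CLASS[token]
    return None  # no connective found: the relation, if any, is implicit

explicit = guess_explicit_relation("The match was cancelled because it rained.")
implicit = guess_explicit_relation("The match was cancelled. It rained.")
print(explicit)
print(implicit)
```

The second example expresses the same causal link without a connective, so a classifier has to rely on other evidence (word pairs, syntax, world knowledge), which is what makes implicit relations harder to recognize.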
Conclusions
Discourse relations are useful in many NLP tasks, and automatic detection of discourse structure can benefit practical NLP applications. While sentence-level structure can be extracted with accuracy close to human agreement, extracting the overall document structure is more challenging. Explicit discourse relations between clauses (signalled with a connective such as because, but, however, etc.) can be recognized with high accuracy. However, there is room for improvement in the recognition of implicit discourse relations, which constitute over 45% of the relations in the PDTB corpus.
For more information, see:
References on theory of discourse:
[1] William C. Mann and Sandra A. Thompson, "Rhetorical structure theory: Towards a functional theory of text organization", Text, 8, 1988
[2] Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, and Bonnie Webber, "The Penn Discourse Treebank 2.0", Proceedings of LREC 2008.
[3] Florian Wolf and Edward Gibson, "Representing discourse coherence: A corpus-based study", Computational Linguistics, 2005
[4] Lynn Carlson, Daniel Marcu and Mary Okurowski, "Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory", In Proceedings of the Second SIGdial Workshop on Discourse and Dialog, 2001
References on the use of discourse annotations:
[5] D. Marcu. From discourse structures to text summaries. In Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization, pages 82-88, 1997.
[6] I. Mani, E. Bloedorn, and B. Gates. Using cohesion and coherence models for text summarization. In AAAI Symposium Technical Report SS-98-06, pages 69-76. AAAI Press, 1998.
[7] R. Barzilay, N. Elhadad, and K. McKeown. Inferring strategies for sentence ordering in multidocument news summarization. J. Artif. Intell. Res. (JAIR), 17:35-55, 2002.
[8] M. Theune. Contrast in concept-to-speech generation. Computer Speech & Language, 16(3-4), 2002.
[9] P. Piwek and S. Stoyanchev. Generating expository dialogue from monologue: motivation, corpus and preliminary rules. In Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2010.
[10] E. Miltsakaki and K. Kukich. Evaluation of text coherence for electronic essay scoring systems. Natural Language Engineering, 10:25-55, March 2004.
[11] M. Rotaru and D. J. Litman. Discourse structure and performance analysis: Beyond the correlation.
[13] Annie Louis, Aravind Joshi and Ani Nenkova, Discourse indicators for content selection in summarization, Proceedings of SIGDIAL 2010.
[14] S. Stoyanchev and P. Piwek. Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), Malta, 2010.
References on automatic detection of discourse relations:
[15] R. Soricut and D. Marcu. 2003. Sentence level discourse parsing using syntactic and lexical information. In HLT-NAACL.
[16] B. Wellner, J. Pustejovsky, C. Havasi, A. Rumshisky, and R. Sauri. 2006. Classification of discourse coherence relations: An exploratory study using multiple knowledge sources. In Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue.
[17] Emily Pitler, Mridhula Raghupathy, Hena Mehta, Ani Nenkova, Alan Lee, Aravind Joshi, "Easily Identifiable Discourse Relations", Proceedings of COLING, 2008.
[18] E. Pitler et al. Automatic sense prediction for implicit discourse relations in text. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009
[19] Z. Lin, M. Kan, and H. T. Ng. Recognizing Implicit Discourse Relations in the Penn Discourse Treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2009.
[20] D. A. duVerle and H. Prendinger. A novel discourse parser based on support vector machine classification. Proceedings of ACL, 2009.
If you have comments, corrections, or additions to this article, please contact the author: Svetlana Stoyanchev, s.stoyanchev [at] open [dot] ac [dot] uk.
Svetlana Stoyanchev is Research Fellow at the Open University. Her interests are in dialogue systems, natural language generation, discourse, and information presentation.

