Scheherazade isn't meant for just Aesop fables. The annotation project I ran with it, DramaBank, contains 110 annotations; most of these are Aesop, but the collection also includes some epic poetry, a news article, some contemporary nonfiction and some literary short stories. Only the Aesop encodings feature detailed "timeline" annotation, though. I realized over the course of my experiments that grounding everything linguistically in verb and noun frames made annotating longer, more abstract and more complex stories not only very slow, but quite difficult. So for the non-Aesop stories, I asked annotators to focus on interpretative-layer annotation, creating only "placeholder" timeline propositions (sketched as data after the list below):
highlight a span of the source text that relates to an agent's plan or goal
create a timeline proposition that is simply "X acts" where "X" is the agent in question
place the proposition in the correct temporal position relative to other propositions in the story's timeline, but don't bother with more specific actions or properties
head over to the Interpretations panel and annotate the plan or goal in the usual fashion
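To make the result of these steps concrete, here is a minimal sketch of the kind of data a "placeholder" annotation produces: a bare "X acts" proposition anchored to a text span and ordered on the timeline, with a goal or plan from the interpretative layer attached to it. The class and field names below are hypothetical, invented only for illustration; they are not the actual Scheherazade data model.

    // Hypothetical illustration -- not the real Scheherazade classes.
    import java.util.ArrayList;
    import java.util.List;

    class SourceSpan {
        final int start, end;                 // character offsets into the source text
        SourceSpan(int start, int end) { this.start = start; this.end = end; }
    }

    class PlaceholderProposition {
        final String agent;                   // the "X" in "X acts"
        final int timelinePosition;           // ordinal slot relative to other propositions
        final SourceSpan span;                // the highlighted source text
        PlaceholderProposition(String agent, int timelinePosition, SourceSpan span) {
            this.agent = agent;
            this.timelinePosition = timelinePosition;
            this.span = span;
        }
    }

    class Interpretation {
        final String kind;                    // e.g. "goal" or "plan"
        final String gloss;                   // the annotator's reading
        final PlaceholderProposition anchor;  // the timeline proposition it hangs off
        Interpretation(String kind, String gloss, PlaceholderProposition anchor) {
            this.kind = kind; this.gloss = gloss; this.anchor = anchor;
        }
    }

    public class PlaceholderExample {
        public static void main(String[] args) {
            // "The fox acts" at timeline slot 2, anchored to characters 210-340 of the fable.
            PlaceholderProposition p =
                new PlaceholderProposition("the fox", 2, new SourceSpan(210, 340));
            List<Interpretation> layer = new ArrayList<>();
            layer.add(new Interpretation("goal", "the fox wants the crow's cheese", p));
            layer.add(new Interpretation("plan", "flatter the crow into singing", p));
            for (Interpretation i : layer) {
                System.out.println(i.kind + ": " + i.gloss);
            }
        }
    }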
This approach allowed the annotators to complete encodings of long stories in just a few hours each. The outlier was one overachieving undergraduate and Medieval studies enthusiast who provided an agent-intentive reading of all of Beowulf! Her encoding includes some 476 nodes and 413 arcs, and covers about half of the poem's 25,000 words -- that is, the annotator associated timeline propositions and interpretative-layer content with source text spans that, collectively, covered about half the original text. This tour de force took a bit more than 15 hours and, according to the student, gave her a new appreciation for the text (as any 15-hour close read would!).
Conversely, part of DramaBank features Aesop fables that have timeline annotation but no interpretative annotation, and yet another part features fables annotated with both "layers". This latter subset was the most interesting to work with, because I could test which layer was more helpful for finding story similarities and analogies. I essentially asked: when it comes to understanding which stories are most similar to one another, what matters more -- the specific nouns and verbs that the author uses to describe the action, or the underlying kinds of thematic content, the goal and plan structures, that exist beneath the nouns and verbs?
The interpretative content "won" the contest handily. Goal and plan similarity was a much stronger predictor of overall story similarity (as judged by independent raters) than noun and verb similarity. This suggests that the interpretative-layer symbols, such as the "provides for" arc, meaningfully relate to the way that we humans read, understand and remember the stories we hear.
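The evaluation itself relied on independent human judgments of overall story similarity, but the intuition can be illustrated with a toy computation. In the sketch below, each fable is reduced to a set of symbols at each layer and compared with simple Jaccard overlap; the symbol names and the scoring scheme are made up for illustration and are not the method used in the actual study.

    // Toy illustration of layer-by-layer similarity -- not the DramaBank evaluation.
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class SimilaritySketch {
        // Jaccard overlap: one simple way to score how much two symbol sets share.
        static double jaccard(Set<String> a, Set<String> b) {
            Set<String> inter = new HashSet<>(a); inter.retainAll(b);
            Set<String> union = new HashSet<>(a); union.addAll(b);
            return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
        }

        public static void main(String[] args) {
            // Surface layer: the nouns and verbs used to narrate two fables.
            Set<String> surfaceA = new HashSet<>(Arrays.asList(
                "fox", "crow", "cheese", "flatter", "sing", "drop"));
            Set<String> surfaceB = new HashSet<>(Arrays.asList(
                "wolf", "lamb", "stream", "accuse", "devour"));

            // Interpretative layer: goal/plan-level structures beneath the words.
            Set<String> interpA = new HashSet<>(Arrays.asList(
                "strong-forms-goal", "deceptive-plan", "goal-achieved", "victim-harmed"));
            Set<String> interpB = new HashSet<>(Arrays.asList(
                "strong-forms-goal", "deceptive-plan", "goal-achieved", "victim-harmed"));

            System.out.printf("surface similarity:        %.2f%n", jaccard(surfaceA, surfaceB));
            System.out.printf("interpretative similarity: %.2f%n", jaccard(interpA, interpB));
        }
    }

Two fables that share almost no vocabulary can still look nearly identical at the goal-and-plan level, and that is the kind of resemblance the human similarity judgments tracked.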
That's not to say that timeline annotation is unnecessary -- while it didn't work as well for the similarity task, it could be useful in a variety of other situations, such as serving as a content plan for text generation systems that learn how to "voice" a story in words. One exciting project along these lines is underway at UC Santa Cruz, where Prof. Marilyn Walker and her students are experimenting with combining Scheherazade encodings with other, more customizable surface generators.
I hope you enjoyed this walk-through of Scheherazade. Please contact me if you have any questions, comments or bug reports. And if you'd like to see the story encoding that I came up with for "The Fox and the Crow" while writing this tutorial, it's here.
If you're a programmer interested in working with Scheherazade as a library, download the latest Linux distribution and check out the API via the Javadoc and the sample application ScheherazadeDemo.java. You might, for example, build an alternate GUI for the underlying story logic engine, or do different kinds of analysis on your own set of encoded stories.
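To give a rough sense of what such an analysis program might look like, here is a sketch that loads a set of encodings and prints a few counts. The Encoding class and the load method below are placeholder stand-ins invented for this example; the real entry points are the ones documented in the Javadoc and demonstrated in ScheherazadeDemo.java.

    // Sketch only: placeholder types stand in for whatever the real library exposes.
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    class Encoding {
        final String title;
        final int timelinePropositions;
        final int interpretationNodes;
        Encoding(String title, int timelinePropositions, int interpretationNodes) {
            this.title = title;
            this.timelinePropositions = timelinePropositions;
            this.interpretationNodes = interpretationNodes;
        }
    }

    public class CorpusReport {
        // Hypothetical loader: a real program would call into the Scheherazade
        // library here instead of returning fabricated counts.
        static Encoding load(File f) {
            return new Encoding(f.getName(), 42, 17);
        }

        public static void main(String[] args) {
            List<Encoding> corpus = new ArrayList<>();
            for (String path : args) {
                corpus.add(load(new File(path)));
            }
            for (Encoding e : corpus) {
                System.out.println(e.title + ": " + e.timelinePropositions
                        + " timeline propositions, " + e.interpretationNodes
                        + " interpretative nodes");
            }
        }
    }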
Lastly, if you decide you'd like to cite this work in your own publications, please keep in mind an important distinction:
the Story Intention Graph, or SIG, is the representation (data structure) I've proposed that describes narratives in terms of goals, plans, beliefs, actions, statives, attempts and outcomes. As with, say, Rhetorical Structure Theory, it may someday be possible to automatically annotate a story into the SIG with intelligent software.
Scheherazade is a particular software tool for annotating stories with the SIG model.
I mention this to dispel the notion that this work is all about large amounts of manual annotation. My hope is to show that the SIG is a meaningful and useful representation for stories, and a worthwhile goal for automatic understanding -- and that in the meantime, we can automatically tag certain aspects of it (like attempts or outcomes) that will give us insight into why stories work the way they do.
Thanks for reading, and happy annotating!