Scheherazade is a software tool and platform for symbolically encoding narratives using the Story Intention Graph or SIG representation.

"Symbolically encoding" means taking two kinds of input, a source text and a controlled vocabulary of structured data (a kind of dictionary of story elements), and recreating the story as well as possible using only the story elements in the controlled vocabulary. It's kind of like translation, except you are translating into a machine-readable language rather than another natural language.

I developed Scheherazade as part of my thesis, which was about various approaches to modeling narrative discourse in machine-readable forms. This document will teach you how to use Scheherazade to create your own story encodings.

The controlled vocabulary that Scheherazade uses -- a representation I call the Story Intention Graph, or SIG -- assumes that stories are made of:
  • nouns, which can be found in an electronic noun dictionary
  • verbs and the selectional restrictions of their semantic roles, which can be found in an electronic verb frame dictionary
  • adjectives, adverbs and other modifiers
  • a notion of the underlying timeline of a story, including states in time, and references to hypothetical actions that don't actually take place
  • a strong sense of agency among characters in the story: that they have inner worlds and desires
  • goals that those characters have
  • plans that those characters devise to reach those goals
  • attempts by those characters to reach their goals
  • the outcomes of those attempts and the resulting affectual impacts
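
As a rough illustration of how elements like these might fit together as nodes and connecting arcs, here is a minimal Python sketch. It is not Scheherazade's actual data model or API: the class names (Node, StoryGraph), the node kinds ("event", "goal", "outcome") and the arc labels ("followed_by", "attempt_toward", "results_in", "frustrates") are all hypothetical, chosen only for this example, and the fable is paraphrased loosely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One story element. The 'kind' values used here are illustrative only."""
    node_id: str
    kind: str   # e.g. "event", "goal", "outcome" (hypothetical labels)
    text: str   # the span of the story this node stands for


@dataclass
class StoryGraph:
    """A toy graph of story elements joined by labeled arcs."""
    nodes: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)  # (source_id, label, target_id)

    def add_node(self, node_id, kind, text):
        self.nodes[node_id] = Node(node_id, kind, text)
        return self.nodes[node_id]

    def add_arc(self, source_id, label, target_id):
        # Refuse arcs that point at elements which have not been encoded yet.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.arcs.append((source_id, label, target_id))

    def neighbors(self, source_id, label):
        """Return the nodes reached from source_id over arcs with this label."""
        return [
            self.nodes[target]
            for source, arc_label, target in self.arcs
            if source == source_id and arc_label == label
        ]


def build_fox_and_grapes():
    """Encode a loose paraphrase of 'The Fox and the Grapes'."""
    graph = StoryGraph()
    graph.add_node("e1", "event", "The fox sees grapes hanging from a vine.")
    graph.add_node("g1", "goal", "The fox eats the grapes.")
    graph.add_node("e2", "event", "The fox jumps at the grapes.")
    graph.add_node("o1", "outcome", "The fox fails to reach the grapes.")
    graph.add_arc("e1", "followed_by", "e2")      # timeline
    graph.add_arc("e2", "attempt_toward", "g1")   # an attempt to reach a goal
    graph.add_arc("e2", "results_in", "o1")       # the outcome of the attempt
    graph.add_arc("o1", "frustrates", "g1")       # impact on the goal
    return graph


graph = build_fox_and_grapes()
for node in graph.neighbors("e2", "attempt_toward"):
    print(node.kind, "-", node.text)
```

The point of the sketch is only that timeline elements (events) and agency elements (goals, attempts, outcomes) can live in the same graph, so a query can move from what happened to why a character did it.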

This is a list of elements which I've developed as the "words" in a dictionary of story elements with which we can retell all kinds of textual and nontextual narratives. All these elements fit together in one large interconnected graph, with nodes and connecting arcs. Scheherazade is a tool that makes it possible for trained annotators to perform this encoding/retelling on any story they wish. In practice, I ran three experiments:
  • having annotators carefully retell small Aesop fables in terms of nouns, verbs, adjectives and time, but not dealing with agency, goals, plans, attempts or outcomes
  • having annotators carefully retell small Aesop fables in terms of all the above elements
  • having annotators carefully retell a set of longer, more complicated and more varied stories, such as Beowulf and contemporary nonfiction, in terms of time, agency, goals, plans, attempts and outcomes -- but not nouns, verbs or adjectives

My main research result was that the true essence of a story lies in the latter half of the list (time, agency, goals, plans, attempts and outcomes), because those features allowed me to build a system that could find similarities and analogies across different stories. The top half of the list, with its more language-level features, is where most effort in computational linguistics lies -- but those features are not as useful for studying narrative in particular. This points to a future where we can develop systems that listen to us tell our stories, and understand what we mean and where we are coming from. This work is a small step in that direction.

I hope Scheherazade can be useful to you in your experiments as well. What follows is a tutorial that shows how to use Scheherazade by example. Namely, together we will encode an Aesop fable using all of the above types of symbols.

Please download the software package from this page and follow the link below to begin. If you have any questions or concerns, or would like to contribute to the tutorial, please do not hesitate to contact me at delson [at] cs [dot] columbia [dot] edu. Thank you and happy annotating!
--David Elson