Voz is a system that explores techniques for automatic extraction of narrative information from text. Voz combines off-the-shelf NLP tools, common sense knowledge databases and domain knowledge to extract a symbolic representation of a text and compute features related to narrative information.
Voz implements an NLP pipeline that reuses several components from readily available open source NLP toolkits and knowledge bases. Voz is implemented in Java and Python (preview deployment, old version, unstable). It relies on several open source NLP toolkits (Parser Services) exposed through a web service (preview deployment, limited old version), which is also available for download as a turnkey solution for Google App Engine (Webapp2).
For additional information or if you use any component from the system, please cite either of the papers below.
J. Valls-Vargas, J. Zhu, S. Ontañón (2015). Narrative Hermeneutic Circle: Improving Character Role Identification from Natural Language Text via Feedback Loops. IJCAI 2015. [PDF]
J. Valls-Vargas, S. Ontañón, J. Zhu (2014). Toward Automatic Character Identification in Unannotated Narrative Text. INT 7 at ELO 2014. [PDF]
The system is currently under active development. Any updates will be posted on this page.
Weka package implementing the continuous (or generalized) Jaccard distance [ZIP]
How to install? In the package manager, select unofficial [PNG]
How to use? Select it in an algorithm that uses a distance measure, e.g., IBk [PNG]
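For readers unfamiliar with the measure, the sketch below illustrates one common definition of the continuous (generalized) Jaccard distance: one minus the ratio between the sum of element-wise minima and the sum of element-wise maxima. It is not taken from the Weka package above, whose implementation may differ; the function name `generalized_jaccard_distance`, the restriction to non-negative values, and the handling of two all-zero vectors are assumptions made for illustration.

```python
from typing import Sequence


def generalized_jaccard_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - sum(min(x_i, y_i)) / sum(max(x_i, y_i)).

    Hypothetical helper for illustration; assumes non-negative feature values.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("this formulation assumes non-negative values")

    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))

    if denominator == 0:
        # Both vectors are all zeros; treating them as identical is a
        # convention chosen here, not something the source specifies.
        return 0.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # On binary vectors the measure reduces to the usual set-based Jaccard distance.
    print(generalized_jaccard_distance([1, 0, 1], [1, 1, 0]))
    # On real-valued vectors it compares overlapping mass against total mass.
    print(generalized_jaccard_distance([0.2, 0.8], [0.4, 0.4]))
```

With binary feature vectors this coincides with the set-based Jaccard distance, which is why it is often described as a generalization to continuous attributes.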
Parser Services: Webservice for Stanford Parser, Stanford CoreNLP, Apache OpenNLP and Berkeley Parser
The following packages contain the datasets used in our publications.
Dataset used in our paper at INT 2014.
Dataset used in our paper at AIIDE 2014.
Dataset used in our paper at IJCAI 2015 (dataset available on request email@example.com).
Dataset used for the experiments in our user study (dataset available on request firstname.lastname@example.org).
Please note the dataset currently does not contain the full text of the stories.
This is the link to our user study. We collected data on October 31st, 2017. Responses submitted after this date may not be considered.