What Defines A Story?

Stories are all around us, and have been argued to be one of the foundations of what makes us human. But what is a story, technically speaking? How do you separate a statement from a narrative?

Why do we care?

If we could detect the stories in a transcript of a conversation, we could answer many questions. For instance, we could study whether narrative complexity improves with age, or whether a child's storytelling ability reflects their vocabulary. We could also learn more about how the brain functions in terms of creativity.

In this section, our questions are simply:

Is this line in the conversation part of a story or not?

Can we build a computer model that can recognize a story?

Transcript Labeling

The best way to teach a computer to identify a story is to first have humans define it. The lab wrote a coding manual that describes what they count as a story and how to label a conversation for stories. One of my first tasks, on July 6th, was to follow the 23-page manual and label 3 conversation transcripts for stories, indicating whether each sentence was in or out of a story, and why. I completed this on July 22nd (it took longer because of Covid).

A page from the labeling manual...

Data Transformation - September 29th

To get a large number of transcripts labeled in accordance with the coding manual, the lab needs volunteers to come in and label these conversations. For perspective, there are 200 transcripts, and we need multiple people to look at each one. The lab is currently creating a streamlined experiment that will make this process a lot simpler, using an experiment creation system called Gorilla. My job was to transform the original transcripts into a file that Gorilla could understand, so that it would display the transcripts along with the response options (participants only have to label each line in or out). Eventually, my job will also be to take the results that come out of Gorilla and transform them into an understandable format.

My requirements:

  • Each transcript should be a separate Excel or csv file (either is fine).

  • Every ten lines of transcript should be transposed into 10 columns titled Transcript1...10.

  • You will also need to add a series of columns titled 'options1...10' that contain "in,out" if there is a line of transcript in the corresponding Transcript column, and are empty if there is not. This is important for the final lines of each transcript when the total number of lines is not divisible by 10. If, say, only 6 lines remain on the final screen, we only want 6 sets of response buttons to appear. Given the constraints of the Gorilla platform, this was how I figured out how to remove the extra response buttons.

  • You will also need to add a 'display' column to tell Gorilla what type of task is being used for each line of stimuli. At the start, we'll need one line that has 'Instructions' in the display column with all other columns blank, then the rest should all say 'Transcription'.
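The requirements above could be sketched roughly as follows. This is a minimal illustration rather than the code I actually wrote: it assumes the transcript has already been read into a list of strings (one per line), it writes csv rather than Excel, and the names `build_rows` and `write_spreadsheet` are made up for this example.

```python
import csv

SCREEN_SIZE = 10  # lines of transcript shown on one Gorilla screen


def build_rows(lines, screen_size=SCREEN_SIZE):
    """Return (header, rows) for one transcript (hypothetical helper)."""
    transcript_cols = [f"Transcript{i}" for i in range(1, screen_size + 1)]
    option_cols = [f"options{i}" for i in range(1, screen_size + 1)]
    header = ["display"] + transcript_cols + option_cols

    # First row: the instructions screen, with every other column blank.
    rows = [["Instructions"] + [""] * (2 * screen_size)]

    # Every later row holds up to `screen_size` transcript lines.
    for start in range(0, len(lines), screen_size):
        chunk = list(lines[start:start + screen_size])
        padding = [""] * (screen_size - len(chunk))
        # "in,out" only where a transcript line exists, so the last
        # screen gets no response buttons for its empty slots.
        options = ["in,out"] * len(chunk)
        rows.append(["Transcription"] + chunk + padding + options + padding)
    return header, rows


def write_spreadsheet(lines, out_path):
    """Write one transcript to a csv file in the layout described above."""
    header, rows = build_rows(lines)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

With a 16-line transcript, this produces one 'Instructions' row followed by two 'Transcription' rows, and the second of those has only six filled Transcript columns and six "in,out" cells.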

My code takes any number of files in a folder (as long as they all share the same format, which was an issue we ran into) and converts them according to the specifications above. It needs very little input from the user: only a file path to the folder containing the files.
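As a rough sketch of the folder loop described above (not the original code), something like the following would work. It assumes the transcripts are `.txt` files and that a per-file conversion function is passed in; `convert_folder`, `convert_one`, and the `gorilla_output` folder name are all hypothetical. Files that do not match the expected format are collected in a skipped list instead of stopping the whole run.

```python
from pathlib import Path


def convert_folder(folder, convert_one, pattern="*.txt", out_dirname="gorilla_output"):
    """Apply convert_one(src_path, out_path) to every matching file in folder.

    Returns (converted, skipped): the names of the files that were converted,
    and (name, reason) pairs for the files whose format was not understood.
    """
    folder = Path(folder)
    out_dir = folder / out_dirname
    out_dir.mkdir(exist_ok=True)

    converted, skipped = [], []
    for path in sorted(folder.glob(pattern)):
        out_path = out_dir / (path.stem + ".csv")
        try:
            convert_one(path, out_path)
            converted.append(path.name)
        except ValueError as err:
            # Assumption: convert_one raises ValueError on an unexpected
            # format (UnicodeDecodeError is a subclass, so bad encodings
            # are caught here too).
            skipped.append((path.name, str(err)))
    return converted, skipped
```

Passing the conversion step in as a function keeps the loop separate from the formatting rules, which matches the order in which the two pieces were finished.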

The first version of the code, without a loop to run through a folder, was finished on October 7th, and the looped version was finished on October 20th.