The Idea
This project started off with the goal of creating a pure deep learning chatbot. It ended up generating play scripts using a combination of models.
The Data
I built two training datasets, one from the Shakespeare corpus and one from a collection of mystery movie scripts. From each, I removed everything that was not a stage direction or a line of dialogue, split what remained into sequences, and created special tokens to mark whether each sequence was an action (stage direction) or a line.
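The post does not describe the raw file formats or name the special tokens. As a minimal sketch, assuming stage directions appear in [brackets] or (parentheses) and dialogue appears as `SPEAKER: text`, the cleaning step could look like this. The token strings `<|action|>` and `<|line|>` are hypothetical; `<|endoftext|>` is GPT-2's built-in end-of-text token, used here as the end-of-sequence marker on the assumption that the original did something similar.

```python
import re
from typing import List, Optional, Tuple

ACTION = "<|action|>"   # hypothetical special token marking a stage direction
LINE = "<|line|>"       # hypothetical special token marking a line of dialogue
END = "<|endoftext|>"   # GPT-2's built-in end-of-text token, used as end-of-sequence

# Assumed conventions: stage directions are wrapped in [...] or (...),
# and dialogue is written as "SPEAKER: text".
STAGE_RE = re.compile(r"^\s*[\[(](.+)[\])]\s*$")
DIALOGUE_RE = re.compile(r"^\s*([A-Z][A-Za-z .']*):\s*(.+)$")


def classify(raw: str) -> Optional[Tuple[str, str]]:
    """Return ("action", text), ("line", text), or None for anything else."""
    m = STAGE_RE.match(raw)
    if m:
        return "action", m.group(1).strip()
    m = DIALOGUE_RE.match(raw)
    if m:
        return "line", f"{m.group(1).strip()}: {m.group(2).strip()}"
    return None


def to_sequences(raw_lines: List[str]) -> List[str]:
    """Drop non-script text and wrap each remaining item in its special tokens."""
    sequences = []
    for raw in raw_lines:
        item = classify(raw)
        if item is None:
            continue  # headings, blank lines, scene numbers, etc. are discarded
        kind, text = item
        token = ACTION if kind == "action" else LINE
        sequences.append(f"{token} {text} {END}")
    return sequences
```

For example, `to_sequences(["ACT I", "[Enter HAMLET]", "HAMLET: To be, or not to be."])` drops the act heading and keeps one action sequence and one line sequence.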
The Training
I used the pretrained GPT-2 model as a starting point and fine-tuned two versions of it: one on the Shakespeare dataset and the other on the mystery script dataset.
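The post does not say which training code was used. As a minimal sketch, assuming the Hugging Face `transformers` library and PyTorch, each copy could be fine-tuned by registering the special tokens, resizing the embedding matrix to cover them, and training with the standard causal language-modeling loss. The token names, learning rate, sequence length, and output directories are hypothetical. The heavy imports sit inside `finetune` so the token helper can be used on its own.

```python
from typing import List

SPECIAL_TOKENS = ["<|action|>", "<|line|>"]  # hypothetical token names


def add_script_tokens(tokenizer, model) -> int:
    """Register the script tokens and grow the model's embeddings to match."""
    added = tokenizer.add_special_tokens({"additional_special_tokens": SPECIAL_TOKENS})
    model.resize_token_embeddings(len(tokenizer))
    return added


def finetune(texts: List[str], output_dir: str, epochs: int = 1,
             lr: float = 5e-5, max_length: int = 512) -> None:
    """Fine-tune a fresh copy of pretrained GPT-2 on one corpus (batch size 1)."""
    # Imported lazily so add_script_tokens works without these packages installed.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    add_script_tokens(tokenizer, model)

    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=max_length)
            # For causal LM training the labels are the inputs; the model
            # shifts them internally to predict each next token.
            loss = model(**enc, labels=enc["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling it once per corpus, e.g. `finetune(shakespeare_seqs, "gpt2-shakespeare")` and `finetune(mystery_seqs, "gpt2-mystery")` (hypothetical names), yields two independent models that share the same special tokens.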
The Result
To generate a script, I gave each model a short text as context. The Shakespeare model received the title "Padronicus"; I then added my special token and let the model generate text until it produced the end-of-sequence token. Its output was then passed to the mystery model, this time with the title "Samantha." Once again, the model received a title and the special token. I repeated this exchange for a number of turns, and ran the whole process several times, to generate a handful of dialogues between the two models.
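The alternating loop can be sketched as follows. Each model is represented as a plain function from prompt to continuation, so the turn-taking logic can be read (and tested) without loading any weights; `hf_generator` shows one way such a function might be built from a fine-tuned checkpoint with Hugging Face `transformers`. The prompt layout (title, prior turns, then the special token), the `Title: text` transcript format, and the token names are assumptions, not the project's documented format.

```python
from typing import Callable, List, Tuple

LINE = "<|line|>"      # hypothetical special token that cues a new line
END = "<|endoftext|>"  # GPT-2's end-of-text token, assumed as end-of-sequence

# A generator maps a prompt to the model's continuation.
Generator = Callable[[str], str]


def build_prompt(title: str, history: List[str]) -> str:
    """Assumed layout: the speaking model's title, prior turns, then the cue token."""
    return "\n".join([title, *history, LINE])


def converse(models: List[Tuple[str, Generator]], turns: int) -> List[str]:
    """Let the models take turns, each seeing everything said so far."""
    history: List[str] = []
    for step in range(turns):
        title, generate = models[step % len(models)]
        # Keep only the text before the end-of-sequence token.
        reply = generate(build_prompt(title, history)).split(END, 1)[0].strip()
        history.append(f"{title}: {reply}")
    return history


def hf_generator(model_dir: str, max_new_tokens: int = 80) -> Generator:
    """Wrap a fine-tuned GPT-2 checkpoint (hypothetical directory) as a Generator."""
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)

    def generate(prompt: str) -> str:
        enc = tok(prompt, return_tensors="pt")
        out = model.generate(**enc, max_new_tokens=max_new_tokens, do_sample=True,
                             eos_token_id=tok.eos_token_id,
                             pad_token_id=tok.eos_token_id)
        # Decode only the new tokens; keep special tokens so END can be found.
        new_tokens = out[0, enc["input_ids"].shape[1]:]
        return tok.decode(new_tokens, skip_special_tokens=False)

    return generate
```

Because sampling is enabled, calling `converse([("Padronicus", hf_generator("gpt2-shakespeare")), ("Samantha", hf_generator("gpt2-mystery"))], turns=6)` several times produces a handful of different dialogues; the checkpoint directories here are hypothetical.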