The Impersonation Game
Louise Bryce
Music and Speech Modelling 2018
Abstract
This project examines four impersonations of real people, two male and two female, to discuss what makes an impersonation good or bad. It compares fundamental frequency (f0) and word and pause length between each original and its impersonation, using Praat and MATLAB for the analysis. Zetterholm's literature is discussed and used as a foundation for the research. No gender difference was found in the analysis, and the overall mean values of the original and the impersonation were always very close.
Introduction
Impersonations are a form of entertainment widely used in television and film, to both serious and comedic effect. Actors inhabit their character and try to represent that person's mannerisms, sometimes in an exaggerated way. This project explores four of these performances:
James Franco as Tommy Wiseau in The Disaster Artist (2017)
Colin Firth as King George VI in The King's Speech (2010)
Natalie Portman as Jackie Kennedy in Jackie (2016)
Meryl Streep as Margaret Thatcher in The Iron Lady (2011)
All of these performances won the actors awards for their portrayals, including the Academy Awards for Best Actor (Colin Firth) and Best Actress (Meryl Streep).
Literature Review
"An impersonator is used to resembling other people and has the ability to pretend and make other people believe that they are another person. "
(Zetterholm, 2006)
Imitation is an important way for young children to learn to speak a language (Zetterholm, 1997); copying the phonetic patterns of speech helps with this learning. Imitations are also used in art, allowing the performer to exaggerate features, mostly for comedic effect. Impersonations take this a step further, with the performer embodying the person they are portraying, be it for comedic or serious effect (Zetterholm, 2006). In a 1997 study, Zetterholm noted that “It is possible that it is easier to impersonate someone who has a very special voice with characteristic features, such as a creaky or nasal voice or falsetto. Is it necessary to change the voice quality to convince the audience or is it sufficient to exaggerate some characteristic features of the target speaker?”. Many of these qualities are taken and highly exaggerated to comedic effect, for example in Alec Baldwin's impersonation of Donald Trump on Saturday Night Live. While this is not entirely realistic, certain cues are kept the same so that the audience knows exactly who is being impersonated. Laver (1994) refers to impersonation as a stereotyping exercise in which the performer takes the target's most outward traits and pushes them to a new level.
On this, Watercutter (2017) states that "Baldwin skillfully picks out pieces of Trump’s inflection and speech patterns and amplifies them to just the right pitch." Baldwin draws heavily on the prosody of Donald Trump's real speech, leading to a realistic yet exaggerated performance. In previous studies, both rhythm and f0 have been analysed to discuss such performances.
Methodology
Data Set
A variety of performances were considered, including Gary Oldman as Winston Churchill in Darkest Hour (2017). While this is an outstanding performance that has received much critical praise, the music in the scene was found to be too overwhelming and could have changed the results. This problem was encountered in many performances, so the performances chosen have at most subtle background music.
Performances were also chosen because the actor says exactly the same sentences as the real person, which allows a closer comparison. These sentences therefore generally come from the more dramatic portions of the film, where the character is seen in the public eye, for example giving an important speech or presenting a documentary. For each performance a scene was chosen in which the actor reproduces a passage of speech that was actually spoken by the real person. For Jackie this was the tour of the White House that she presented, which is re-enacted in the film. Shorter excerpts of 8-20 seconds were then selected. Full sentences were used to give a closer overview, and in each case the clearest sentence was picked.
Each section was downloaded and then trimmed so that only the sentence itself was present, with little noise before and after. Each is a stereo WAV file at 16-bit / 44.1 kHz, which was considered high enough quality for this study. The files were named using the convention filmname-original or filmname-impersonation.
Analysis
Timing
Each sentence was loaded into Praat and a TextGrid was created for it. This allows the start and end of each word or pause to be marked manually. A pause was originally defined as a silence longer than 500 ms. Each file was given its own TextGrid with each word labelled and pauses left blank. The data varies greatly in length, so some files took longer to annotate than others. It was then found that the original/impersonation pairs had different numbers of pauses and words, because dramatic interpretation changed things slightly or pauses were moved so that they no longer matched up.
To counter this, each pause in one TextGrid was given a matching pause in the other, even if that pause was much shorter than 500 ms. This ensured that every word and pause would line up in the analysis and that the datasets for each pair were the same length. Once this was completed, Praat produced a .txt file listing each word or pause with its start, end and duration. This study concentrated on the duration.
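The alignment check described above can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB used in the project; the (label, start, end) tuple layout, the function names and the timings are assumptions invented for the example, not values taken from the recordings.

```python
from typing import List, Tuple

# Hypothetical layout for one TextGrid interval: (label, start_s, end_s).
Interval = Tuple[str, float, float]


def is_pause(label: str) -> bool:
    """Pauses were left blank in the TextGrid, so an empty label means a pause."""
    return label.strip() == ""


def durations(intervals: List[Interval]) -> List[Tuple[str, float]]:
    """Return (label, duration in seconds) for each interval, in order."""
    return [(label, round(end - start, 3)) for label, start, end in intervals]


def same_structure(a: List[Interval], b: List[Interval]) -> bool:
    """True when both tiers have the same length and pauses at the same indices."""
    if len(a) != len(b):
        return False
    return all(is_pause(x[0]) == is_pause(y[0]) for x, y in zip(a, b))


# Illustrative timings only; these are not measurements from the study.
original = [("came", 0.00, 0.30), ("", 0.30, 0.90), ("here", 0.90, 1.20)]
impersonation = [("came", 0.00, 0.28), ("", 0.28, 0.40), ("here", 0.40, 0.75)]

print(same_structure(original, impersonation))
print(durations(original))
print(durations(impersonation))
```

Note that the impersonation pause in this example is shorter than 500 ms but is still kept as a pause, mirroring the decision to give every pause a partner in the other file.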
[Figure: Jackie - Impersonation in Praat]
MATLAB was used for the rest of the analysis. To import the .txt files, an online tool created by Dafydd Gibbon (2008) was eventually used; it quickly converts a TextGrid file into a CSV file that MATLAB can read. This was found to be the best method, as doing the conversion manually took much longer.
It was then found that these CSV files also had to be edited by hand, because MATLAB was unable to read columns that contain words. Numbers was used to delete all columns containing words, leaving only four columns of relevant numbers, with the most important (duration) in the last.
Each CSV file therefore gives a list of durations in the fourth column. To analyse this, each file goes through a for loop in MATLAB that counts the rows and steps through them, plotting each point on a graph. On each graph the original and impersonation data are shown together.
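The manual column deletion could in principle be avoided by reading only the duration column and skipping anything non-numeric. The following is a sketch in Python, not the MATLAB workflow actually used; the column layout of the converter's output is an assumption here (duration is taken to be the last field of each row), and the sample rows are invented.

```python
import csv
import io


def read_durations(handle) -> list:
    """Collect the last field of each CSV row as a float.

    Assumes (hypothetically) that duration is the final column. Rows whose
    last field is not a number, such as a header row, are skipped.
    """
    values = []
    for row in csv.reader(handle):
        if not row:
            continue  # ignore blank lines
        try:
            values.append(float(row[-1]))
        except ValueError:
            continue  # header or other non-numeric row
    return values


# Invented sample in an assumed layout: tier, label, start, end, duration.
sample = io.StringIO(
    "tier,label,start,end,duration\n"
    "words,came,0.00,0.30,0.30\n"
    "words,,0.30,0.90,0.60\n"
)
print(read_durations(sample))
```

With a real file the same function would be called on `open(path, newline="")` instead of the in-memory sample.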
A bar graph was also used showing the mean differences in each pair for ease of analysis.
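The quantity behind each bar can be illustrated as the difference between the two mean durations of a pair. This is a minimal Python sketch under that assumption; the function name, the film key and the durations are hypothetical and do not reproduce the project's data or its MATLAB script.

```python
from statistics import mean


def mean_difference(original, impersonation):
    """Mean duration of the impersonation minus that of the original, in seconds."""
    return mean(impersonation) - mean(original)


# Hypothetical durations in seconds, invented for illustration only.
pairs = {
    "example-film": ([0.30, 0.60, 0.30], [0.28, 0.12, 0.35]),
}

for name, (original, impersonation) in pairs.items():
    # A negative value means the impersonation's words and pauses are shorter on average.
    print(f"{name}: {mean_difference(original, impersonation):+.3f} s")
```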
Frequency
First, a mean f0 for the entire file was taken from Praat and logged by hand. An f0 value was then logged for each word, and a similar technique was used to get this data into MATLAB, although here the data was converted by hand from a text file to a CSV file. These were analysed in the same way as above, with the f0 of each word plotted on a graph. A similar bar graph was then made to show the overall differences in mean f0.
Results
Word and Pause Length
Male:
Female:
The graphs above chart how long each word or pause lasted in each phrase. Each word or pause in the original is matched to the same word or pause in the impersonation.
From this we can see that the male impersonations match their originals much more closely than the female ones.
Within Jackie there is an exact match at the start before the two begin to differ. The Iron Lady presents the biggest difference, and this can clearly be heard when listening to the clips. The pauses are much longer: pauses that last only a few milliseconds in the original are significantly longer in the impersonation.
The mean word length shows a great deal of difference in two of the impersonations. It is also apparent that in most cases the impersonation has shorter pauses and words overall.
Frequency:
Male:
Female:
The above graphs show f0 per word; this data disregards pauses and matches each word to its pair. The f0 contours in the male examples have the same up-and-down shape even though the frequencies themselves differ. The same pattern can be seen in the female examples, with rises and falls happening in the same places.
The bar graph of mean f0 covers the entire phrase, although this includes pauses, which could skew the data. In every case the impersonation is slightly lower in pitch. The King's Speech almost matches exactly here. This does not change depending on gender, which is unexpected.
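One way to put a number on "same shape, different frequencies" is a Pearson correlation between the two per-word f0 lists, since correlation ignores a constant offset in pitch. This was not part of the analysis in this project; the Python sketch below is only an illustration, and the f0 values in it are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-word f0 values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a completely flat contour has no defined correlation")
    return covariance / (spread_x * spread_y)


# Invented f0 values in Hz: the second contour is the first shifted down by 10 Hz.
original_f0 = [100.0, 120.0, 110.0, 130.0]
impersonation_f0 = [90.0, 110.0, 100.0, 120.0]

print(round(pearson(original_f0, impersonation_f0), 3))
```

A value near 1 would indicate that rises and falls coincide even when the impersonation sits lower in pitch, which is the pattern described above.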
Discussion
An interesting point occurs in Jackie at the line "Came here to find hardly anything...". Both the real person and Natalie Portman take a breath with exactly the same qualities but in slightly different places, two words apart. Natalie Portman is impersonating the character to such a degree that she is copying and embodying the speech patterns rather than just copying sentence by sentence. This did not come up in the technical analysis but was noticed through repeated listening, and it means that the pauses and words stop matching. These examples can be listened to here:
Jackie - Impersonation Jackie - Original
From the data analysis above it can be seen that gender does not matter in either case: while the female examples are naturally higher pitched, the differences between impersonation and original do not depend on this. The mean f0 tends to be very close, and the general frequency patterns are very closely related.
Word and pause length varies much more; in many cases a large pause is present in one recording but absent from the other. These cases are apparent in the graphs where one line is high and the other is almost at 0. While the impersonation was generally longer in file length, the mean word and pause lengths were shorter, possibly because the words are shorter but the pauses much longer.
These results were not as expected, although more data may change the findings. No direct correlations were found, and different actors study and perform real people in hugely different ways. While it can generally be heard that the actors lengthen words and pauses for dramatic effect, no definitive data was found to show this across cases, and it is at present only an observation.
Further Work
Further work could include many more examples to better correlate the data. Prosody could also be researched in more detail to better track the rhythm of the speech and give more detailed answers; the nPVI (normalised pairwise variability index) could be used for this. Further work should also isolate the voice or use only examples where no music is present.
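For reference, the nPVI of a sequence of m durations d_1 ... d_m is 100 times the average, over the m-1 neighbouring pairs, of |d_k - d_(k+1)| divided by the mean of the two durations. A minimal Python sketch follows. The nPVI is usually computed over vocalic or syllable intervals, so applying it to the word durations collected in this project would be an assumption, and the example values are invented.

```python
def npvi(durations):
    """Normalised pairwise variability index of a list of positive durations."""
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    # Difference between each neighbouring pair, normalised by the pair's mean.
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Invented durations in seconds, for illustration only.
print(npvi([0.20, 0.20, 0.20]))          # perfectly even rhythm
print(round(npvi([0.10, 0.30, 0.10]), 1))  # strongly alternating rhythm
```

Because each difference is divided by the local mean, the index is insensitive to overall speaking rate, which would make it suitable for comparing an actor with a real speaker who talks faster or slower.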
References
Boersma, P., & Weenink, D. Praat, computer software, available from http://www.fon.hum.uva.nl/praat/
The Disaster Artist. (2017). [film] Hollywood: New Line Cinema.
The Iron Lady. (2011). [film] UK: Pathé.
The King's Speech. (2010). [film] UK: UK Film Council.
MATLAB version 6.5.1, 2003, computer software, The MathWorks Inc., Natick, Massachusetts.
Gibbon, D. (2008). Praat TextGrid to CSV spreadsheet format converter. [online] Available at: http://wwwhomes.uni-bielefeld.de/gibbon/Forms/Python/PHONETICS/textgrid2csv.html
Laver, J. (1994). Principles of phonetics. Cambridge, Cambridge University Press.
Watercutter, A. (2018). Alec Baldwin’s Trump Impression Is a Technical Marvel. [online] WIRED. Available at: https://www.wired.com/story/alec-baldwin-trump-impression-technical-analysis/ [Accessed 12 Mar. 2018].
Zetterholm, E. (1997). Impersonation: a phonetic case study of the imitation of a voice. 1st ed. Lund, pp.269-287.
Zetterholm, E. (2006). Same speaker – different voices A study of one impersonator and some of his different imitations. Lund: Lund University.
Zetterholm, E. (2003). Voice imitation. Lund: Lund University.