Effects of prosody and accommodation on the comprehension of an unfamiliar language
Guillermo G. Peeters - Music & Speech Modelling 2018
Abstract:
When two subjects have a close relationship, the strong bond between them may give rise to human mimicry (both gestural and expressive) and to unconscious choreographies when performing routine tasks. Vocal accommodation is a form of human mimicry by which speakers mutually align their vocal characteristics as they interact with each other.
This project investigates whether this psychological mechanism can work successfully across languages in dyadic dialogues, easing understanding between a person and their close companion even when the spoken words are not understood literally or immediately.
Literature Review:
The previous research used as grounding and assessment for this study is Human Mimicry, Chapter 5 of Advances in Experimental Social Psychology, by Tanya L. Chartrand and Rick Van Baaren (2009) [1]. This chapter provides valuable information about, amongst other topics, non-conscious verbal mimicry, and evaluates the evidence for mimicry as a communication tool.
Juan P. Robledo et al., in their study Musical Intervals in Speech (2016) [2], examine the musical properties of speech and, most relevant to our study, support the existence of a psychological disposition by which two speakers reciprocally modulate the musical intervals in their speech through mutual real-time vocal accommodation.
Neither of these studies covers this phenomenon taking place across languages, nor do they explore its long-term consequences.
Present study approach:
This study proposes that, between two closely related subjects, the comprehension of a message spoken in an unfamiliar language is made possible by the listener's familiarity with the prosodic features of the speaker (stress, word duration, intonation) and by accommodation from the speaker towards the listener. Possible external causes of this effect are the following:
Familiarity with the vocabulary used by a particular person in a particular environment or daily routine.
Gestural communication.
Familiarity with the textural features of the orator’s voice and, consequently, recognition and identification of the speaker.
Method:
Source of the data:
The stimuli consisted of a closed set of phrases spoken by Flemish native speakers.
The phrases were extracted from the listening model exams of the Certificaat Nederlands als Vreemde Taal (CNaVT*). In addition, phrases frequently used in a domestic environment were added to the dataset. The CNaVT certificate is subdivided into four language-skill levels:
A2: Social Informal. For applicants who want to function in informal, everyday situations.
B1: Social Formal. For applicants who want to function independently in more formal contexts in Dutch or Flemish society.
B2: Business Professional/Educational Start Skilled. For applicants who want to function on the shop floor, especially in the care sector or in an administrative profession.
C1: EDUP: Educational Professional. For applicants who want to function in education or in a business environment and need an advanced knowledge of Flemish.
Fig. 1: CNaVT official degrees categorised into 4 levels and 3 domains.
The selected phrases were of approximately the same length and were categorised as:
Contextualised Informal (A1): Common and home environments, e.g., how was your day?, what’s for dinner today?, I am not sleeping at home tonight, do your homework.
Social Informal (A2-Inf): Out-of-context phrases extracted from the A2 INFO listening exam, e.g., the building is very famous, twenty people were at a party,…
Social Formal (A2-F): Out-of-context phrases extracted from the B1 FORM listening exam, e.g., pointing with the finger is considered to be in poor taste, voltage across an ideal conductor is proportional to the current through it,…
Social Professional (B2): Phrases extracted from the B2 PROF and STRT listening exams.
Advanced Professional (C1): Phrases extracted from the EDUP listening exam.
Each phrase was voiced by three different types of orators, grouped according to their proximity towards participants.
High-Proximity Orators (H.P.): Orators in a close partnership with the participant, such as a wife-husband relationship or a lifelong bond.
Distant Orators (D.O.): Familiar to the participants, but not as closely related to them as the High-Proximity Orators.
External Orators (E.O.): These orators had never met the participants and provided the condition of voices never heard before.
Participants:
The participants were six non-Flemish speakers, each closely related to at least one of the orators. They were asked to act as listeners and "amateur" translators.
In addition, control subjects who were completely unfamiliar with Flemish speech and control subjects who were native speakers were used to establish expected performance grades for each level.
Procedure:
The study aims to evaluate to what extent prosody and vocal accommodation favour comprehension across languages. Therefore, the method must reduce the influence of the external causes listed at the end of the approach section, since they lie outside the scope of the study.
The phrases were read strictly word for word by the speakers, who were not allowed to change any of the words. This removed the effect of the listener's familiarity with the orator's vocabulary.
The cues were recorded and played back using an electronic device, ruling out any form of gestural communication. Each recording was processed using the professional software iZotope RX. The processing included background noise cancelling, normalization of all the audio clips and conversion to mono.
Recognition of the High-Proximity Orators' voice textures by their respective participants was an evident risk, as it may enhance the comprehension of the given message. The Distant Orators, whose voices should consequently also have been recognised, helped to isolate the contribution of this effect.
The participants listened to 15 cues in random order, drawn from all the language levels and spoken by each of the orators, and rated their level of comprehension on hearing each stimulus.
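The random ordering of the cues can be illustrated with a short sketch. It assumes that the 15 cues correspond to one cue per combination of the five phrase categories and the three orator types, which the text does not state explicitly. The names LEVELS, ORATORS and build_playlist are hypothetical and only serve the illustration.

```python
import random
from itertools import product

# Phrase categories and orator types as named in this paper.
LEVELS = ["A1", "A2-Inf", "A2-F", "B2", "C1"]
ORATORS = ["H.P.", "D.O.", "E.O."]


def build_playlist(seed=None):
    """Return every (level, orator) pair once, in random order.

    Assumption: 15 cues = 5 categories x 3 orator types.
    A seed makes the order reproducible for a given participant.
    """
    rng = random.Random(seed)
    cues = list(product(LEVELS, ORATORS))
    rng.shuffle(cues)
    return cues


if __name__ == "__main__":
    for level, orator in build_playlist(seed=1):
        print(level, orator)
```

Using a separate seed per participant would give each listener a different but repeatable order.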
The participants filled in a form rating their relationship with their respective High-Proximity Orator. The form also asked the participants to translate each phrase as accurately as possible and to rate their degree of confidence in each translation. Ideally, the participants would also have answered the corresponding questions of the CNaVT listening test. However, these questions are designed to be answered in the context of the whole listening block, not from isolated phrases.
Fig. 2: Form filled by one of the participants.
Evaluation of the participants' performances:
Confidence values and valorisation of the translations:
Table 1 shows the participants' results for each level, phrase and type of orator. The value for each translation was derived from the confidence rating and a valorisation of the translation. This valorisation followed these criteria:
Word-for-word correct translations were rated 100/100.
Mistakes in translating trivial words, which did not alter the meaning of the phrase, were rated 90/100.
Incomplete translations missing trivial words (grammatically speaking) were rated 75/100 to 89/100, depending on the number of absent words.
Inaccurate translations containing both the subject and the predicate were rated 50/100 to 75/100.
Translations missing the subject or the predicate were rated 20/100 to 49/100.
If the participant wrote only one correct word, the translation was rated from 10/100 up to 20/100, depending on the grammatical value of the word, e.g., translations of direct objects or of the nucleus of the subject were rated higher.
Blank answers were rated 0.
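The criteria above can be written down as a lookup table of score bands. This is only a sketch: the category names are hypothetical labels, and because the text does not specify how the confidence rating is combined with the valorisation, the sketch encodes the bands alone and checks whether a given score falls inside the band of its category.

```python
# Score bands (lowest, highest) out of 100 for each translation category,
# taken from the criteria listed above. The keys are hypothetical labels.
SCORE_BANDS = {
    "word_for_word": (100, 100),
    "trivial_word_mistake": (90, 90),
    "missing_trivial_words": (75, 89),
    "inaccurate_with_subject_and_predicate": (50, 75),
    "missing_subject_or_predicate": (20, 49),
    "single_correct_word": (10, 20),
    "blank": (0, 0),
}


def is_valid_score(category, score):
    """Return True if the score lies inside the band of the category."""
    lowest, highest = SCORE_BANDS[category]
    return lowest <= score <= highest


if __name__ == "__main__":
    print(is_valid_score("missing_trivial_words", 80))
    print(is_valid_score("blank", 10))
```

Note that some adjacent bands share a boundary value (75 and 20), so a score alone does not always identify its category.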
Table 1: Experiment results for each orator, participant and Flemish language level. Challenging results are marked with an * (see the exceptions section).
The phrases that constituted the C1, B1 and A2 difficulty levels were remarkably advanced for the language level of the participants. Therefore, the grades for these levels were expected to be low. To evaluate these performances while still taking the level of the subjects into account, an expected grade out of 100 was established for each phrase from the results of the control participants. This value was directly related to the difficulty of each phrase and, consequently, to its CNaVT level.
Results:
For each level and orator category, the results shown in Table 1 were averaged across participants. The possible values range from 0/100 to 100/100.
The averaged results for the Educational Professional level (C1) were 33.3 for H.P., 16.6 for D.O. and 25 for E.O. The expected average result, derived from the control subjects, was 30/100. The High-Proximity Orator results were the highest and the only ones that met the expectation.
For the performances on the Business Professional level (B1) the averaged results were 50.5 for H.P., 17.5 for D.O. and 21.7 for E.O. In this case, the expected average result was 50/100.
The averaged results for the Social Formal (A2-F) and Social Informal (A2-Inf) phrases were 55 and 71.7 respectively for H.P., 50 and 45 for D.O. and 13.3 and 28.3 for E.O. In these cases, the expected average results were 60/100 for the A2-F level and 70/100 for the A2-Inf level.
The results for the Contextualised Informal (A1) level were 85 for H.P., 46.6 for D.O. and 70 for E.O. The expected result for this category was 90/100.
Fig. 3: Average results for each orator category and Flemish language level next to its respective expected grade.
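The averaged figures reported in this section can be collected in a small table and compared against the expected grades. The sketch below only restates the numbers given above. The level labels follow the text, and the names RESULTS, best_orator and levels_meeting_expectation are hypothetical.

```python
# Averaged results per level and orator category, with the expected grade,
# copied from the Results section. Level labels follow the text.
RESULTS = {
    "C1": {"H.P.": 33.3, "D.O.": 16.6, "E.O.": 25.0, "expected": 30},
    "B1": {"H.P.": 50.5, "D.O.": 17.5, "E.O.": 21.7, "expected": 50},
    "A2-F": {"H.P.": 55.0, "D.O.": 50.0, "E.O.": 13.3, "expected": 60},
    "A2-Inf": {"H.P.": 71.7, "D.O.": 45.0, "E.O.": 28.3, "expected": 70},
    "A1": {"H.P.": 85.0, "D.O.": 46.6, "E.O.": 70.0, "expected": 90},
}
ORATORS = ("H.P.", "D.O.", "E.O.")


def best_orator(level):
    """Return the orator category with the highest average at this level."""
    row = RESULTS[level]
    return max(ORATORS, key=lambda orator: row[orator])


def levels_meeting_expectation(orator):
    """Return the levels where this orator category reaches the expected grade."""
    return [
        level
        for level, row in RESULTS.items()
        if row[orator] >= row["expected"]
    ]


if __name__ == "__main__":
    for level in RESULTS:
        print(level, best_orator(level))
    for orator in ORATORS:
        print(orator, levels_meeting_expectation(orator))
```

With these numbers, H.P. has the highest average at every level and reaches the expected grade at three of the five levels, while D.O. and E.O. do not reach it at any level.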
Exceptions:
The cases in which the hypothesis of this study was not supported (i.e., phrases that obtained better results when spoken by a Distant or External Orator than when spoken by the High-Proximity Orator) were few and were considered exceptional. Nevertheless, these cues are analysed below using Praat to study to what extent the more successfully comprehended recordings resembled the ones that, according to this study, should have shown higher results (namely, the High-Proximity Orator recording of the same phrase).
To this end, the analysis looks for similarities in pitch contour, intensity and word duration (rhythm) between these unexpectedly successful recordings and their respective High-Proximity Orator versions.
The four cases mentioned are marked with an * in Table 1. This section describes each case.
Case 1:
Participant 2's grades (see Table 1) on the C1 level were higher for the External Orator (left) than for the High-Proximity Orator (right). The phrase at issue is “Niet alles kan je zomaar via mail regelen”, which means "Not everything can be arranged via email". Except for the first word, niet, no similarities can be found in the pitch contours, word durations or intensity annotations.
Case 2:
Participant 4's grades on the B1 level were higher for the Distant Orator (left) than for the High-Proximity Orator (right). In this case, the phrase is “De werkelijkheid ziet er minder wonderbaar uit”, which translates as "Reality appears not to be as miraculous". Once more, the recordings show no similarities as far as this analysis can tell.
Case 3:
Participant 6's grades on the A2-F level were higher for D.O. (left) than for H.P.O. (right).
The phrase in question is “Opa wil natuurlijk liever slagroomijs maar het is nu Oma’s verjaardag”, which in English means "Grandpa prefers whipped-cream ice cream of course, but it is now Grandma’s birthday".
In this case, the word durations, the pitch contours and the intensity annotations of the two recordings share some similarities, except for the word slagroom-, on which the H.P.O. raises the pitch considerably.
Although this analysis offers no explanation for these exceptions, a more intensive analysis of these phrases and a further session of the experiment with a larger dataset are yet to be done. Moreover, it would be rash to rule out the possibility that these participants happened to be familiar with the words of these sentences and therefore translated them well.
Conclusions:
The results for High-Proximity Orators are in most cases above the expected grades. Even where the expected grades are not reached, the averaged results for H.P.O. are always higher than those for the other two types of orators at the same level, and in some cases higher across levels. This study interprets this as showing that the participants, even though they lacked the Flemish grammar and vocabulary needed to decode the meaning of each sentence, were more successful at the task when they were closely related to the speaker and, therefore, familiar with the speaker's way of speaking.
Comparing the results for Distant and External Orators revealed no explanatory pattern. In most cases, External Orators obtained better results than Distant Orators, possibly because the External Orators were professional speakers. Some of the participants had met their assigned Distant Orator numerous times; in some cases, their relationship was as close as that between aunt and niece. This helps to discard the possible external causes for this effect discussed previously. Even though the participants knew the Distant Orators personally and recognised their voices, they did not pay more attention and were not more successful in their performances than when translating voices they had never heard before. An alternative reading of these results is that the accommodation effect does not take place when the relationship between speaker and listener is not remarkably close.
Appendix:
CNaVT exam models:
Social Informal (INFO) - A2 http://cnavt.org/voorbereiding/voorbeeldexamens#maatschappelijk-informeel-info---a2 .
Social Formal (FORM) - B1 http://cnavt.org/voorbereiding/voorbeeldexamens#maatschappelijk-formeel-form---b1 .
Business Professional (PROF) - B2 http://cnavt.org/voorbereiding/voorbeeldexamens#zakelijk-professioneel-prof---b2 .
Educational Start Skilled (STRT) http://cnavt.org/voorbereiding/voorbeeldexamens#educatief-startbekwaam-strt---b2
Bibliography:
[1] Tanya L. Chartrand, Rick Van Baaren. Chapter 5, Human Mimicry. Advances in Experimental Social Psychology. © Elsevier, 2009. https://www.sciencedirect.com/science/article/pii/S006526010800405X
[2] Juan P. Robledo, Esteban Hurtado, Felipe Prado, Domingo Román, Carlos Cornejo. Musical Intervals in Speech: Psychological disposition modulates ratio precision among interlocutors’ nonlocal f0 production in real-time dyadic conversation. Psychology of Music, vol. 44, issue 6, pp. 1404-1418. First published March 21, 2016. http://journals.sagepub.com/doi/10.1177/0305735616634452