Volume 6 (2015) Issue 2 - Article Shehata

Journal of Linguistics and Language Teaching


Talker Variability and Second Language Word Recognition: A New Training Study


Asmaa Shehata (Calgary, Canada)


Abstract

The purpose of this study was to investigate how training with multiple talkers affects native English speakers’ acquisition of the Arabic pharyngeal-glottal consonant contrast (/ħ/-/h/), which is not contrastive in English. Learners’ performance on two discrimination tasks following a word-learning phase was analyzed in terms of training type (multiple talkers vs. single talker) and task type (non-lexical vs. lexical). The findings of the two experiments revealed a significant effect of training type: the multiple-talker groups in both experiments performed more accurately on the two XAB tasks than did the single-talker groups. This finding suggests that talker variability may be a significant factor affecting learners’ ability to distinguish words on the basis of L2 consonant contrasts. In addition, the groups’ scores differed across the two discrimination tasks, but these differences were not significant, suggesting that the distinct demands of the two tasks did not have a significant beneficial effect on learning the nonnative contrast.

Key words: lexical representations, second language, talker variability, task type, word recognition



1 Introduction

Accented speech produced by second language (L2) learners who acquire their L2 after childhood is a major area of L2 speech research, and several scholars have explored why a foreign accent persists in L2 speech (Flege et al. 1999, MacKay et al. 2006, MacKay et al. 2001). Previous studies have identified several factors that contribute to the accentedness of L2 speech, such as learners’ age of arrival in a new host country (Baker & Trofimovich 2006), the amount of exposure to the target language (TL) (Bradlow & Bent 2008), learners’ cultural attitudes (Moyer 2007), musical ability (Wong et al. 2007), and length of residence (Flege & Liu 2001). It is also well documented that L2 phonological contrasts that do not exist in learners’ first languages (L1s) are challenging to learn, and this difficulty has been reported as one of the factors behind L2 accented speech. For example, native Dutch speakers have difficulty distinguishing English words like bet and bat because of the English /æ/-/ε/ contrast (Cutler & Broersma 2005), native Spanish and Portuguese speakers have difficulty discriminating the English words beat and bit (Bion et al. 2006), and native Japanese speakers have difficulty with the English words right and light (Aoyama et al. 2004).

Research examining whether adult L2 learners can categorize and create lexical representations for L2 phonological contrasts has reported mixed results. While some studies have shown that learners are unable to lexically encode the target L2 contrasts (Curtin et al. 1998, Ota et al. 2009, Pater 2003, 2004), others have provided evidence that learners can successfully create lexical representations for newly learned words differentiated by novel phonological contrasts that they have difficulty distinguishing in non-lexical discrimination tasks. For example, Weber & Cutler (2004) used eye-tracking to examine the mapping of phonetic information onto lexical entries in a second language. They explored native Dutch speakers’ ability to discriminate the English lax vowel pair /ε/–/æ/ and the diphthong pair /aɪ/-/eɪ/. Participants were asked to choose, from the pictures shown on a computer screen, only those that matched the words they heard. The findings showed that learners maintained a distinction between English words containing /ε/ and /æ/ in their lexical representations, even though they could not perceive the contrast in the online auditory word identification task.

On the other hand, a large body of laboratory-based training studies with both infants and adult L2 learners has provided evidence that auditory perception training can improve performance on novel L2 contrasts (Bradlow et al. 1997, Lively et al. 1993, Lively et al. 1994, Wang et al. 2003). For instance, McCandliss et al. (2002) explored native Japanese speakers’ identification of the English /r/-/l/ contrast in two training environments: a high-variability training in which multiple talkers produced both real words and nonwords, and a limited-variability training in which sequences of sounds ranging from /r/ to /l/ were spoken by a single talker. A comparison of the two training environments showed a significant improvement after training for learners in the high-variability condition, suggesting that a rich training environment can improve native Japanese speakers’ perceptual identification of the English /l/ and /r/ phonemes.

Further evidence for the link between high-variability training and the acquisition of L2 phonological forms can be found in Logan et al.’s (1991) study of native speakers of Japanese. Before training, six native speakers of Japanese were tested on their ability to identify /l/ and /r/ via a pretest that included 16 minimal pairs contrasting the target phonemes. Participants were instructed to mark which word of each minimal pair printed in their answer booklet they had heard. The training phase consisted of 15 sessions in which participants heard 272 trials (the words of 68 minimal pairs contrasting /l/ and /r/ in a variety of word positions, i.e. initial singleton, initial cluster, intervocalic, final singleton, and final cluster, each presented twice) produced by five different talkers (three times each), and were asked to choose which of the words on the computer screen matched the word they heard. Participants could proceed to the next trial only if their answer was correct; otherwise, the correct answer was highlighted and the stimulus was presented again before the next trial. After the training phase, the subjects took a posttest identical to the pretest, followed by two generalization tests: Generalization Test 1, in which 98 novel words were produced by a new talker, and Generalization Test 2, in which 98 novel words were produced by a familiar talker. The results showed that participants’ ability to distinguish English /r/ and /l/ improved significantly, and this improvement was maintained when they were retested three weeks later. Thus, the high-variability training environment supported the native Japanese speakers’ ability to accurately discriminate the target phonemes.

It has also been shown that low-variability training environments, which include only one or two sources of variability (e.g., talker, stimulus, phonetic environment, or speaking rate), do not support learners’ ability to recognize the phonological forms of new phonemes. For example, Strange & Dittmann (1984) trained eight female native Japanese speakers to distinguish English /r/ and /l/ in word-initial position only, over 14-18 training sessions. The researchers used three sets of stimuli in training: a set of real-word minimal pairs (e.g., rock/lock) and two sets of synthetic minimal pairs, with feedback provided for each correct response. Their findings showed that the Japanese listeners did not notably improve in distinguishing /r/ and /l/ in a generalization task with natural speech tokens of /r/-/l/ minimal pairs. While subjects were able to transfer knowledge acquired during training to identify the unfamiliar contrast in nonword stimuli, they were not able to do so with real words. It was therefore concluded that word perception accuracy decreases when learners listen to word lists that lack stimulus variability.

The training studies discussed thus far all concern cases in which various sources of variability in training, such as stimuli, talkers, phonetic environments, and tasks, positively affected learners’ perception of unfamiliar segments. Other training studies have investigated the influence of training on the learning of nonnative suprasegmental contrasts. For instance, Wang et al. (1999) trained eight native English speakers for two weeks to identify the four Mandarin tones in real words spoken by native Mandarin speakers. To measure improvement in tone identification due to training, Wang et al. (1999) used a pretest and a posttest, followed by two generalization tests that examined whether the training benefit extended to new stimuli produced by new talkers. The researchers were also interested in whether the gains would be retained over time, so they conducted a long-term retention test six months after the training. The results indicated a significant improvement from pretest (69% correct responses) to posttest (90% correct responses), a gain of 21 percentage points in tone identification accuracy. Wang et al. (1999) also found that the native English speakers’ recognition of Mandarin tones improved in the two generalization tests, in which trainees successfully extended their knowledge of the target tone contrasts to new stimuli produced by novel talkers. These findings provide evidence that a high-variability training paradigm can improve L2 learners’ acquisition of both novel segmental and suprasegmental contrasts.

2 Talker Variability

In light of the findings of the training studies above, talker variability is considered one of the principal sources of variability influencing learners’ perception (Halle 1985). The qualities of talkers’ voices differ because of a number of factors, such as the shape, size, and length of the vocal tract, and talkers also differ in acoustic properties such as the rate and duration of formant transitions. Crucially, these factors have been reported to influence listeners’ perception of L2 speech (Cartell 1984). A series of infant and adult studies has explored the effects of talker variability on speech perception in general, and on word recognition in particular. However, these studies have reported conflicting findings, which are briefly summarized below.

2.1 Infant Studies

The influence of talker variability on infants’ perception of novel phonemes has been investigated extensively. Some studies demonstrated that exposure to several talkers helped infants accurately discriminate unfamiliar phonetic categories. For example, six-month-old infants who first learned fricative contrasts and two vowel contrasts (i.e., the English vowels /a/-/i/ and /a/-/ɔ/) showed some ability to differentiate the target contrasts when they were spoken by different speakers (Barker & Newman 2004). In a related study, Houston & Jusczyk (2000) examined the effects of talker variability on the recognition of words in fluent speech by two age groups of infants: seven-and-a-half-month-olds and ten-and-a-half-month-olds. When there was a one-day delay between the training and testing sessions, the seven-and-a-half-month-old infants showed a significant improvement in word recognition only when the test stimuli were spoken by talkers of the same gender as the talker in the training session. While these infants successfully generalized to words spoken by two new female talkers, they were unable to recognize words produced by two new male talkers. This finding suggests that, at this age, differences in talker gender affect infants’ recognition of spoken words. The ten-and-a-half-month-old infants, by contrast, were able to generalize words produced by a single talker to talkers of the opposite gender.

Further evidence for the positive role of multi-talker variability can be found in Rost & McMurray’s (2009) study, which tested the role of phonetic variability in two experiments. In Experiment 1, 39 monolingual English 14-month-olds saw pictures whose labels were spoken by one talker. In Experiment 2, 16 monolingual English 14-month-olds saw the same pictures and heard their labels spoken by 18 talkers. The results showed that infants who heard the labels (e.g., buk vs. puk) spoken by multiple talkers were more accurate at distinguishing words contrasting /p/ and /b/ in initial position than infants who heard labels spoken by a single talker. These findings reveal a positive contribution of talker variability to infants’ identification of novel contrasts.

Other infant studies, however, showed that talker variability hinders infants’ speech recognition. For example, Jusczyk et al. (1992) found that variability in both tokens and talkers impeded two-month-old infants’ word recognition after a delay interval. In the first experiment, infants in both training conditions noticed the change from bug in the training session to dug in the test session; in the following experiments, which included a two-minute delay between training and testing, only the infants in the single-talker group detected the change in the target phoneme. The researchers therefore concluded that listening to a single talker could help two-month-olds establish lexical representations for the contrasting phonemes of the target language. More recently, Schmale & Seidl (2009) reported that nine-month-old infants were unable to distinguish words when they were produced by different talkers who varied in both voice and accent. Variability in talkers increased the word processing load, which in turn resulted in the infants’ poor word recognition performance. In sum, prior research has reported conflicting findings concerning the role of talker variability in infants’ speech perception and word recognition.

2.2 Adult Studies

Contradictory results regarding talker variability and its impact on perception are not limited to infant studies; they can be found in adult studies as well. A number of studies have reported a negative role of talker variability. For example, Mullennix et al. (1989) conducted a series of experiments. In Experiment 1, 22 native English speakers identified 68 English words presented in noise and produced either by a single talker or by multiple talkers. Experiment 2 included a naming task in which 12 native English speakers were asked to name each target word, produced by multiple talkers, as soon as they heard it. In Experiment 3, 70 native speakers performed a naming task with 96 English words that varied in frequency (48 low-frequency and 48 high-frequency words), produced in either a single-talker or a multiple-talker condition. In Experiment 4, 30 native English speakers listened to the same stimuli as in Experiment 3 and were asked to write down the words they heard. Across the four experiments, subjects performed worse when they listened to multiple talkers.

Likewise, Sommers et al. (1994) asked two groups of native English speakers, one in a single-talker condition and one in a multiple-talker condition, to type the words they heard as accurately as they could. The researchers also tested two other groups of subjects in either a single-rate condition (i.e., all words in a list were produced at one rate: fast, medium, or slow) or a mixed-rate condition (i.e., the words in a list were produced at different speaking rates). Furthermore, Sommers and colleagues recruited 60 more subjects to investigate performance on word lists that varied in talker only, in speaking rate only, or along both dimensions. All tokens were presented in noise. The findings revealed that subjects identified words produced by a single talker more accurately than words produced by different talkers. Hearing the target stimuli produced by more than one talker at different speaking rates (slow, medium, and fast) also hindered the subjects’ perception. The researchers therefore concluded that too much variability in the speech signal impeded the subjects’ perception.

In contrast to the studies discussed above, a number of L2 training studies have shown a positive role of talker variability. For example, Lively et al. (1993) examined which of three word positions (initial singleton, initial cluster, and intervocalic) Japanese learners found the most difficult. Their findings showed that native Japanese speakers who received multiple-talker training were more accurate than those in the single-talker group at identifying English /l/ and /r/ spoken by both familiar and unfamiliar talkers in the generalization task. The authors concluded that, compared with multiple-talker training, hearing a single talker did not enable listeners to generalize to novel tokens and novel talkers. In a follow-up study, Lively et al. (1994) trained Japanese learners of English in Japan for three weeks, using the same stimulus set as Lively et al. (1993). After 15 training sessions, the Japanese speakers’ identification of the English /r/-/l/ contrast had improved significantly as a result of the high-variability training paradigm. Three months later, the subjects’ retention of the new contrast was assessed with generalization tests, which showed no significant decline in their ability to reliably categorize English /r/ and /l/, confirming the efficiency of this type of training for the acquisition of nonnative contrasts.

More recently, Bradlow & Bent (2008) examined the influence of talker variability on listeners’ transcription skills. The authors presented English sentences produced by three groups of talkers (multiple native Chinese talkers, a single native Chinese talker, and five native English talkers) to 87 native English listeners. Listeners in the multiple-talker group transcribed the target sentences more accurately than those in the other two groups. The researchers concluded that the benefit of talker variability extends to accented speech, where it helped improve listeners’ transcription accuracy. Other studies have found the influence of talker variability to be only marginally significant, with both multiple- and single-talker training facilitating the perception of unfamiliar speech. For example, Hardison (2003) investigated the impact of word position, adjacent vowel, talker variability, and training type (auditory-visual versus auditory-only) on native Japanese and Korean speakers’ perception of the English /r/-/l/ contrast. The researcher recruited 16 native Japanese speakers and eight native Korean speakers, who participated in two experiments. Experiment 1 included two training environments: one with auditory and visual input, and one with auditory input only. It comprised a pretest, training, a posttest, and two generalization tests (one with a familiar talker from the training phase and one with an unfamiliar talker). Experiment 2 examined the impact of visual input on training Korean learners of English. Like Experiment 1, it included two training environments: an auditory-visual training group and an auditory-only training group.
Moreover, each of these training groups was divided into two subgroups: one in which the subjects heard stimuli produced by multiple talkers and one in which they heard stimuli produced by a single talker. The findings indicated a significant effect of training type, word position, and adjacent vowel on the perception and production of /r/ and /l/ by the ESL participants in the two training conditions, but only a marginally significant effect of talker variability on performance in the two generalization tests. The Korean speakers showed only limited success in generalizing their training to new tokens produced by unfamiliar talkers.

Based on the studies discussed thus far, two main conclusions can be drawn.

    • Firstly, testing the relative effects of talker variability on learners’ acquisition of novel phonological features has shown mixed findings.

While some studies found talker variability to be a factor that impairs learners’ performance, others provided evidence in favor of a positive role of talker variability. A third group of studies found only a minor role for talker variability, with both single- and multiple-talker training helping L2 learners acquire the phonological structure of L2 words. These conflicting findings show the need for further research on this issue. Therefore, the first goal of the present study is to determine whether learners can benefit from the variety of talker-specific properties of speech when learning nonnative consonant contrasts.

    • Secondly, talker-variability studies have mainly used non-lexical tasks that examined the learners’ online perception of the newly learned phoneme contrasts and paid little attention to the lexical processing of these contrasts.

Thus, the second goal of the current study is to further examine the impact of talker variability on adult L2 learners’ ability to categorically discriminate and lexically store unfamiliar contrasting phonemes, using both lexical and non-lexical tasks.

2.3 Task Type

Previous L2 research has shown that differing task demands influence learners’ perceptual performance (Logan & Pruitt 1995, Matthews & Brown 2004, Werker & Tees 1984b). For example, lexical tasks are reported to be more demanding than non-lexical tasks because they require listeners to access their memory for the meaning of the target stimuli (Curtin et al. 1998). Prior second language acquisition (SLA) studies of L2 learners’ ability to detect and encode novel L2 phonological features have used both non-lexical and lexical tasks, with differing results. For example, after training, native speakers of English were more accurate at discriminating Thai voicing contrasts in a non-lexical task than in a lexical task that required memory of the target contrasts (Curtin et al. 1998). Conversely, Hayes-Harb & Masuda (2008) found that native English speakers who had studied Japanese for one year accurately discriminated the Japanese consonant length contrast in a lexical task, but were significantly less accurate on a non-lexical task.

Unlike these two patterns of findings, Pater’s (2003) study showed no difference in the performance of native English speakers on non-lexical and lexical tasks. When subjects were asked to match a word they heard to the corresponding picture, they performed as well on the lexical XAB identification task as on the non-lexical XAB identification task. Pater concluded that the similar design of the two tasks, which used the same pictures and the same number and types of phases, accounted for learners’ similar performance. These conflicting findings call for further investigation of this issue. Accordingly, the third main goal of the current study is to examine the possible influence of task type (i.e., non-lexical versus lexical) on learners’ recognition of novel L2 contrasts.

3 The Study

With these three goals in mind, the present study is guided by the following research questions:

    1. Does training with a single talker versus multiple talkers influence L2 learners’ identification of newly learned sound contrasts, in terms of generalization to novel talkers?
    2. Is there any more accurate performance with multiple-talker training than with single-talker training on a non-lexical task?
    3. Is there any more accurate performance with multiple-talker training than with single-talker training on a lexical task?
    4. Does task type (in this case, non-lexical versus lexical) influence learners’ ability to discriminate novel L2 phoneme contrasts?

The acquisition of Arabic by native English speakers is an ideal scenario for this research because of the recent rapid increase in enrolment in Arabic classes in North America in general and the US in particular, where new Arabic programs have been established and the teaching of Arabic has matured as a profession (Al-Batal & Belnap 2006). Although new summer programs have been established in the Arab world and professional organizations have seen membership grow, with the American Association of Teachers of Arabic doubling its total membership in less than a year (Ryding 2006), Arabic is still classified as one of the less commonly studied second languages (Rabiee 2010). In addition, Arabic includes a number of consonant contrasts that do not exist in English, and their acquisition by native speakers of English is notably difficult (Al Mahmoud 2013, Alwabari 2013). Like other Arabic contrasts, the /ħ/-/h/ contrast has received little attention in the L2 phonology literature, even though it appears to be especially difficult: in Al Mahmoud’s study, “discrimination of /h/-/ħ/ was significantly worse than all other contrasts” (Al Mahmoud 2013: 22). Taken together, these considerations justify examining this Arabic contrast in the present study.

3.1 Experiment 1

Experiment 1 was designed to explore the role of talker variability in the acquisition of novel L2 phonemes on a non-lexical discrimination task. Specifically, it examined how single-talker versus multiple-talker training affects the recognition of the Arabic pharyngeal-glottal contrast by learners with no prior experience with Arabic, in terms of generalization of training to stimuli produced by unfamiliar talkers.

3.1.1 Participants

Thirty native English speakers (11 males and 19 females) with no prior knowledge of Arabic were recruited from undergraduate courses at the University of Utah (USA). Seven received course credit for their voluntary participation; the other 23 were paid. In a background questionnaire completed before the study, all participants reported having no speech or hearing problems and no neurological disorders, and reported not being under the influence of any medication that might affect their motor skills. The participants’ mean age was 22.5 years. Participants were randomly assigned to one of two word-learning environments: a single-talker environment (7 males and 8 females) or a multiple-talker environment (4 males and 11 females). To prevent the idiosyncrasies of any one talker from driving the results, participants in the single-talker environment were randomly assigned to one of three subgroups:

    • Group 1, which listened to stimuli produced only by Talker 1,
    • Group 2, which listened to stimuli produced only by Talker 2,
    • and Group 3, which listened to stimuli produced only by Talker 3. See Table 1 below.

Table 1: Summary of Training Environments in Experiment 1

3.1.2 Stimuli

Experiment 1 included two sets of stimuli. The first set consisted of 12 disyllabic Arabic nonwords forming six minimal pairs that contrasted the target Arabic phonemes (i.e. /h/ and /ħ/) in three positions: word-initial (e.g. ħaθa-haθa), intervocalic (e.g. diħi-dihi), and word-final (e.g. itiħ-itih). The second set consisted of six filler tokens, three minimal pairs contrasting phonemes found in both English and Arabic, which served as controls and appeared in the same vowel environments as the target stimuli: word-initial (e.g. sata-ʃata), intervocalic (e.g. fisi-fiʃi), and word-final (e.g. anas-anaʃ). Each stimulus was randomly assigned a picture representing its meaning. Because the stimuli were Arabic nonwords and the subjects had no prior exposure to Arabic, any picture could be assigned to any auditory stimulus. The pseudowords and their assigned meanings (pictures) are listed in Table 2 below.

Table 2: List of pseudowords and their assigned meanings (pictures)

Six male native speakers of Egyptian Arabic were recruited from the University of Utah community to produce the spoken materials. The talkers were recorded in a sound-attenuated booth, using a Marantz PMD 660 recorder and a Samson QV microphone, reading the stimuli in the carrier sentence “uridu ʔan ʔaktubu kalemeta ________” (‘I want to write the word _________’). Talkers were instructed to read the list of 18 Arabic nonwords, written in Arabic script, three times at their normal speaking rate, in a different random order each time. The second production of each stimulus was extracted for presentation in the study. Table 3 provides information about the six native Arabic talkers.

Table 3: Talker Group: Six Native Arabic (Egyptian) Speakers

3.1.3 Procedure

This experiment was administered in a single session in a sound-attenuated booth, where audio and visual stimuli were presented on a computer and through Sony MDR-7506 headphones at a comfortable listening level. Experiment 1 included three phases: word learning, a criterion test, and a non-lexical discrimination test. All phases were presented with the DMDX software developed by Forster & Forster (2003). First, in the word-learning phase, participants heard each nonword while seeing the picture representing its meaning, and they were instructed to learn the words and their meanings as well as possible. Participants in the single-talker training environment heard stimuli produced by a single talker (Talker 1, Talker 2, or Talker 3), whereas their counterparts in the multiple-talker training environment heard stimuli produced by all three talkers (Talker 1, Talker 2, and Talker 3). Each of the 18 Arabic nonwords was presented twice per block, and each block was presented three times, for a total of 108 presentations (18 × 2 × 3) in random order in each training environment.
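The presentation count above (18 nonwords × 2 presentations per block × 3 blocks = 108) can be made concrete with a short Python sketch. It assumes, since the text does not specify, that the order was shuffled within each block rather than across all 108 presentations, and it does not model how tokens from the three talkers were distributed in the multiple-talker condition. The word labels and the function name `build_learning_sequence` are hypothetical placeholders, not the DMDX item file actually used.

```python
import random


def build_learning_sequence(words, reps_per_block=2, n_blocks=3, seed=None):
    """Build a randomized word-learning presentation list (hypothetical sketch).

    Each word appears `reps_per_block` times in every block; the block is
    repeated `n_blocks` times, and the order is shuffled within each block
    (an assumption: the source only says presentations were randomized).
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = [w for w in words for _ in range(reps_per_block)]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


# Hypothetical labels standing in for the 18 nonwords (12 target + 6 filler).
WORDS = [f"nonword_{i:02d}" for i in range(1, 19)]

seq = build_learning_sequence(WORDS, seed=1)
print(len(seq))  # 108 = 18 words x 2 per block x 3 blocks
```

Shuffling within blocks guarantees that every word is heard exactly twice in each block and six times overall, whatever the random seed.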

After the word-learning phase, participants took the criterion test, which tested their knowledge of the training stimuli on a non-lexical discrimination task that did not require lexical access. In this test, participants heard a word (X), saw a picture (A), and then saw another picture (B), and their task was to decide whether the word (X) matched picture A or picture B by pressing either the right or left shift key (labeled First and Second) on the keyboard. Each word appeared as X twice, once matched with picture A and once with picture B. The criterion test thus included 36 test items (18 × 2), presented in a different random order, and participants in both training conditions heard stimuli produced by the same talker(s) as in the word-learning phase. This task did not require discrimination of the target contrasts. To proceed to the next phase, participants had to score 90% or better on the criterion test; participants who scored below 90% repeated the training phase as many times as needed until they achieved the passing score. Figure 1 displays an example of a criterion test item.
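The criterion-test design can be sketched in Python: each of the 18 words serves as X twice, once with its own picture in position A and once in position B, giving 36 items, and at least 90% correct is required to proceed. On 36 items that means at least 33 correct, because 32/36 is only about 89%. The sketch assumes, since the text does not say, that the competing picture on each item was drawn at random from another word's picture. `build_criterion_items`, `passed_criterion`, and the item fields are hypothetical names, and pictures are represented simply by the label of the word they were assigned to.

```python
import random

PASS_CRITERION = 0.90  # minimum proportion correct needed to proceed


def build_criterion_items(words, seed=None):
    """Return two sound-picture-picture items per word (hypothetical sketch).

    Pictures are identified by the word they depict. The foil picture is drawn
    at random from the other words, which is an assumption, not the study's rule.
    """
    rng = random.Random(seed)
    items = []
    for word in words:
        foil = rng.choice([w for w in words if w != word])
        # Once with the matching picture first (A), once second (B).
        items.append({"sound": word, "A": word, "B": foil, "answer": "First"})
        items.append({"sound": word, "A": foil, "B": word, "answer": "Second"})
    rng.shuffle(items)
    return items


def passed_criterion(n_correct, n_items):
    """True if the participant reached the 90% criterion."""
    return n_correct / n_items >= PASS_CRITERION


words = [f"nonword_{i:02d}" for i in range(1, 19)]
items = build_criterion_items(words, seed=0)
print(len(items))                # 36
print(passed_criterion(33, 36))  # True  (about 91.7%)
print(passed_criterion(32, 36))  # False (about 88.9%)
```

A participant for whom `passed_criterion` returns False would, as described above, repeat the word-learning phase and retake the test.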

Fig. 1: Example presentations in the criterion test (Sound-Picture-Picture) used in Experiment 1
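The criterion rule (36 items, at least 90% correct, otherwise repeat training) can be expressed as a short loop. The Python sketch below is a hypothetical illustration: `train` and `test` stand in for one word-learning phase and one criterion test, and the `max_rounds` cap is added only so the example terminates; the article states that training could be repeated as many times as needed.

```python
N_ITEMS = 18 * 2      # 18 nonwords, each appearing twice as X -> 36 items
CRITERION_PCT = 90    # passing score: 90% or better


def min_correct_to_pass(n_items=N_ITEMS, criterion_pct=CRITERION_PCT):
    """Smallest number correct whose percentage is >= the criterion.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    """
    return (n_items * criterion_pct + 99) // 100


def run_until_criterion(train, test, max_rounds=20):
    """Alternate training and criterion testing until the score passes.

    `train` runs one word-learning phase; `test` returns the number of
    criterion items answered correctly. Returns the round on which the
    participant passed, or None if the (hypothetical) cap is reached.
    """
    threshold = min_correct_to_pass()
    for round_no in range(1, max_rounds + 1):
        train()
        if test() >= threshold:
            return round_no
    return None


print(min_correct_to_pass())  # 33, since 32/36 is only 88.9%
```

For example, a simulated participant scoring 30, 32, and then 33 correct would pass on the third round.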

Third, after passing the criterion test, participants proceeded to the last phase, the XAB non-lexical discrimination test, which examined their ability to distinguish the Arabic pharyngeal-glottal minimal pairs. In this test (sound-sound-sound), participants in both training groups listened to auditory stimuli produced by three unfamiliar talkers (Talker 4, Talker 5, and Talker 6), none of whom had been heard in the word-learning phase by either group. Each trial consisted of three auditory words (X, A, and B), and participants were asked whether X was more similar to A or to B (e.g. /diħi/-/diħi/-/anah/), responding by pressing either the right or the left shift key (labeled First and Second) on the computer keyboard. The final test also included 36 trials, but unlike the criterion test, these comprised 24 contrast trials (in which A and B were minimal pairs) and 12 foil trials (in which A and B were not members of a minimal pair), presented in random order. Figure 2 shows an example of the non-lexical test stimuli as presented to subjects in the two training groups.

Fig. 2: Example presentations in the XAB non-lexical discrimination task (Sound-Sound-Sound) used in Experiment 1
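One way to obtain the 24 contrast trials and 12 foil trials is to cross each item with both answer positions. The Python sketch below makes several assumptions not stated in the article: the item labels are hypothetical, each of the 6 target minimal pairs contributes 4 trials (either member as X, in either A/B order), each filler word is paired with a hypothetical non-minimal-pair partner in both orders, and one of the three unfamiliar talkers is chosen at random for each trial.

```python
import random

# Hypothetical labels: 6 target minimal pairs (/ħ/ vs /h/) and 6 filler words.
TARGET_PAIRS = [(f"T{i}-ħ", f"T{i}-h") for i in range(1, 7)]
FILLERS = [f"F{i}" for i in range(1, 7)]
NOVEL_TALKERS = ["Talker 4", "Talker 5", "Talker 6"]


def make_trial(kind, x, a, b, rng):
    # The correct response is whichever position (First = A, Second = B) holds X.
    return {"type": kind, "X": x, "A": a, "B": b,
            "answer": "First" if x == a else "Second",
            "talker": rng.choice(NOVEL_TALKERS)}  # assumed random talker choice


def build_xab_trials(seed=None):
    rng = random.Random(seed)
    trials = []
    # Contrast trials: A and B are a minimal pair -> 6 pairs x 2 X-words x 2 orders = 24.
    for w1, w2 in TARGET_PAIRS:
        for x in (w1, w2):
            for a, b in ((w1, w2), (w2, w1)):
                trials.append(make_trial("contrast", x, a, b, rng))
    # Foil trials: A and B are not a minimal pair -> 6 fillers x 2 orders = 12.
    # Hypothetical pairing: if fillers are listed in pair order (F1/F2, F3/F4,
    # F5/F6), an offset of 3 never pairs a word with its own minimal-pair partner.
    for i, x in enumerate(FILLERS):
        partner = FILLERS[(i + 3) % len(FILLERS)]
        for a, b in ((x, partner), (partner, x)):
            trials.append(make_trial("foil", x, a, b, rng))
    rng.shuffle(trials)
    return trials


trials = build_xab_trials(seed=7)
print(len(trials), sum(t["type"] == "contrast" for t in trials))  # 36 24
```

With this scheme, half of the trials have the correct answer in each position, so a bias toward pressing First or Second cannot by itself raise accuracy above chance.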

3.1.4 Results

Proportion correct (the proportion of responses correctly identifying the intended production of the talker) was calculated for each participant. The data were submitted to a mixed-design analysis of variance, with item type (two levels: target, filler) as a within-subjects variable and training group (single talker, multiple talker) as a between-subjects variable. The main effect of training group was significant (F (1,28) = 88.866, p < .001, partial eta squared = .760), with participants in the multiple-talker training group (.915) performing more accurately than those in the single-talker training group (.671). The effect of item type was also significant (F (1,28) = 79.646, p < .001, partial eta squared = .740), with performance on filler items (.911) higher than that on target items (.675). The interaction of item type and training group was also significant (F (1,28) = 39.685, p < .001, partial eta squared = .586).
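The reported effect sizes can be checked against the F ratios: for an effect with df_effect numerator degrees of freedom and df_error error degrees of freedom, partial eta squared equals (F × df_effect) / (F × df_effect + df_error). The short Python function below applies this standard identity; it is a consistency check on the published values, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported above for Experiment 1, each with df = (1, 28)
reported = {
    "training group": (88.866, 0.760),
    "item type": (79.646, 0.740),
    "item type x training group": (39.685, 0.586),
}
for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 28)
    print(f"{effect}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

Each computed value matches the reported one to three decimals. The same identity also reproduces the effect sizes reported for Experiment 2 (df = 1, 28) and for the combined analysis (df = 1, 56); for instance, F (1,56) = 108.965 gives .661.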

Following up on the significant interaction of item type and training group, we now consider the results for each item type separately. There was a significant effect of training group on performance on target items (F (1,28) = 161.398, p < .001), with more accurate performance by subjects in the multiple-talker training group (.881) than by those in the single-talker training group (.469). However, the effect of training group on filler items was not significant (F (1,28) = 3.564, p = .069; single-talker group: .872, multiple-talker group: .950). Thus, while performance on filler items, on which all subjects were expected to perform well, did not differ significantly between the groups, performance on target items did, and in the expected direction: subjects in the multiple-talker training group outperformed those in the single-talker training group. Figure 3 presents a visualization of these results:

Fig. 3: Proportion correct for subjects in the two training groups on the non-lexical task; bars represent +/-1 standard error

3.2 Experiment 2

Unlike Experiment 1, Experiment 2 tested the possible influence of talker-variability training on participants' ability to generalize knowledge gained from word-based training to novel talkers in a lexical discrimination task that required them to match auditory forms to pictures. The final test therefore assessed primarily whether participants had stored the target contrast in their representations of the newly learned words.

3.2.1 Participants

Thirty native English speakers aged 18 to 31 (M = 24.5) with no prior knowledge of Arabic took part in this experiment. They were recruited from the University of Utah campus and had not participated in Experiment 1. Participants received either undergraduate course credit (N = 16) or a $10 payment (N = 14) for their voluntary participation. In a background questionnaire, they reported no speech or hearing problems and no neurological disorders, and the questionnaire data also confirmed that none of them were under the influence of any medication that might affect their motor skills. Participants were randomly assigned to one of the two word-learning environments: single-talker training (4 males and 11 females) or multiple-talker training (6 males and 9 females).

3.2.2 Stimuli

The stimulus sets used in Experiment 1 were also used in Experiment 2: 9 minimal pairs (12 target nonwords + 6 filler nonwords) with a Consonant-Vowel-Consonant-Vowel (CVCV) structure, with the target pairs contrasting the Arabic glottal and pharyngeal fricatives /h/ and /ħ/ in three positions (initial, intervocalic, and final). Experiment 2 also used the same pictures and the same productions by the same native Arabic speakers.

3.2.3 Procedure

As in Experiment 1, the first two phases, word learning and the criterion test, were included in Experiment 2, using the same auditory and visual materials. Again, the word-learning phase included 108 tokens (18 words [12 targets + 6 fillers] × 2 presentations × 3 blocks), and the criterion test included 36 items (18 words × 2 presentations), presented in a different random order for each participant. Participants who passed the criterion test with 90% or better accuracy proceeded to the final test; otherwise, the training phase began again. Following Pater (2003), the final test differed from that of Experiment 1: it was an XAB lexical discrimination test (sound-picture-picture) identical in format to the criterion test, in which learners in both training groups heard a word (X), saw a picture (A), and then saw another picture (B), and were asked to match the auditory word to the correct picture (A or B) by pressing either the right or the left shift key (labeled First and Second) on the keyboard. Unlike the criterion test, this task required discrimination of the target pharyngeal-glottal contrast: in contrast trials, A and B depicted the two members of a target minimal pair (e.g. /hibi/-/ħibi/). As in Experiment 1, the tokens in the final test were produced by unfamiliar talkers. Figure 4 shows an example of stimuli presented in the lexical discrimination test in Experiment 2.

Fig. 4: Example presentations in the XAB lexical discrimination task (Sound-Picture-Picture) used in Experiment 2

3.2.4 Results

As in Experiment 1, proportion correct (the proportion of responses correctly identifying the intended production of the talker) was calculated for each participant, and the data were submitted to a mixed-design analysis of variance with item type (two levels: target, filler) as a within-subjects variable and training group (single talker, multiple talker) as a between-subjects variable. This analysis revealed a significant main effect of training group (F (1,28) = 20.264, p < .001, partial eta squared = .420), with subjects in the multiple-talker training group (.869) performing more accurately than their counterparts in the single-talker training group (.654). The main effect of item type was also significant (F (1,28) = 35.598, p < .001, partial eta squared = .560), with performance on targets (.676) lower than that on fillers (.847). The interaction of item type and training group was significant as well (F (1,28) = 11.861, p < .01, partial eta squared = .298).

Following up on the significant interaction of item type and training group, we now consider the results for each item type separately. There was a significant difference between the two training groups for target items (F (1,28) = 47.722, p < .001, partial eta squared = .630), with subjects in the multiple-talker training group performing more accurately (.833) than those in the single-talker training group (.519). In contrast, the effect of training group on filler items was not significant (F (1,28) = 3.281, p = .081, partial eta squared = .105). Hence, as in Experiment 1, performance on filler items, on which all subjects were expected to perform well, did not differ significantly between the groups, whereas performance on target items did, and in the expected direction: subjects in the multiple-talker training group outperformed those in the single-talker training group. Figure 5 presents a visualization of these results:

Fig. 5: Mean proportion correct for subjects in the two training groups on the lexical task; bars represent +/-1 standard error

As demonstrated above, both Experiment 1 and Experiment 2 revealed the expected pattern of results, with subjects in the multiple-talker training conditions outperforming those in the single-talker training conditions on the target items. The data also indicated that all participants were more accurate at identifying the familiar contrast from their native language (/s/-/ʃ/) than the novel one (/ħ/-/h/).

3.2.4.1 Comparison of Experiment 1 and Experiment 2: Results

To evaluate the effect of task type (non-lexical versus lexical), the results from Experiments 1 and 2 were compared. An analysis of variance was performed with task type (two levels: non-lexical, lexical) and training group (two levels: single talker, multiple talker) as between-subjects variables and item type (two levels: targets, fillers) as a within-subjects variable. As expected from the results reported above for each experiment, the main effect of item type was significant (F (1,56) = 108.965, p < .001, partial eta squared = .661; target mean: .676; filler mean: .879). The main effect of training group was also significant (F (1,56) = 71.415, p < .001, partial eta squared = .560; single-talker mean: .663; multiple-talker mean: .892), as was the interaction of item type and training group (F (1,56) = 46.304, p < .001, partial eta squared = .453). In contrast, neither the main effect of task type (F (1,56) = 1.320, p > .05, partial eta squared = .023; non-lexical task mean: .793; lexical task mean: .762) nor any of the two-way or three-way interactions involving task type were significant (all p > .05). Overall, these findings indicate that performance did not differ significantly between the non-lexical and the lexical tasks. These findings are represented visually in Figure 6:

Fig. 6: Mean proportion correct for subjects in Experiment 1 (non-lexical task) and Experiment 2 (lexical task); bars represent +/-1 standard error

4 Discussion

The experiments presented here examined how variability in talkers' voices and task type affect native English speakers' recognition of the Arabic /ħ/-/h/ contrast on two discrimination tasks that required detection of the target contrast. To this end, two groups of native English speakers in each experiment were taught 18 Arabic nonwords produced either by a single talker or by three talkers. In Experiment 1, participants who heard the target tokens produced by multiple talkers during training performed significantly more accurately on the test items than their counterparts in the single-talker training group. Native English speakers who heard the target tokens spoken by various talkers during the training phase achieved 88% correct on the target items, suggesting that the availability of multiple talkers in the training environment improved their knowledge of the phonological forms of the newly learned words. One possibility is that the multiple talkers' speech signals provided rich language input in which the indexical properties of the talkers' voices were integrated with the linguistic content of the target language, which in turn facilitated these learners' recognition of the phonological forms of the novel Arabic words. In contrast, subjects in the single-talker condition were deprived of this advantage, and hearing different voices for the first time in the final test added extra difficulty to their task. Not only did they have to focus on the novel contrast in the auditory forms in order to detect the difference between the target tokens, but they also needed to attend to new voices whose productions of the newly learned words might sound different from those of the familiar single talker in the previous phases. As a result, they distinguished the target words with only 47% accuracy.

Despite the difficulty of the XAB lexical task reported by Pater (2003), subjects in the multiple-talker group in Experiment 2 were able to exploit the target phoneme contrast to discriminate the meanings of words in the lexical discrimination task (83% correct on target items). For example, realizing that the two tokens diħi and dihi refer to two different lexical items (a pen and a paper clip, respectively) gave subjects in the multiple-talker group adequate information to detect the difference between their medial consonants /ħ/ and /h/, and this knowledge in turn helped them establish phonetic categories for the target contrast. Considered together, findings from the two experiments provide additional evidence supporting the positive role of talker variability in the acquisition of /ħ/-/h/, an Arabic consonant contrast that learners of Arabic often find challenging. Because the multiple-talker advantage emerged on both the non-lexical and the lexical task, these results strengthen the evidence for the beneficial role of talker variability in L2 acquisition.

The accurate performance of the multiple-talker groups can be explained within the framework of exemplar models (Goldinger 1998, Johnson 1997). According to this approach, the acoustic characteristics of target tokens produced by different talkers, including indexical information (e.g. information about a talker's gender, age, dialect, social class, and speaking rate) as well as phonetic information, are stored in learners' mental lexicons, which facilitates recognition of new instances of these target words when they are produced by new talkers. Subjects in the multiple-talker groups stored representations of each target word in three voices, which helped them identify the same words when produced by novel talkers in the test phase. Subjects in the single-talker groups, by contrast, stored fewer representations of each word and consequently lacked enough exemplars to recognize the productions of the new talkers.
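To make the exemplar account concrete, the toy Python model below stores one exemplar per category per training talker and classifies a novel talker's tokens by summed similarity to the stored exemplars. It is a generic summed-similarity classifier, not the model of Goldinger (1998) or Johnson (1997); the single acoustic cue, the category means, the talker offsets, and the exponential similarity function are all hypothetical choices, and a different parameter set could give a different outcome.

```python
import math

# Hypothetical 1-D acoustic cue separating /ħ/ (centred at +1.0) from /h/ (-1.0);
# each talker shifts the cue by a talker-specific offset (indexical variation).
CATEGORY_MEANS = {"ħ": 1.0, "h": -1.0}


def make_exemplars(talker_offsets):
    """Store one exemplar per category per training talker."""
    return [(cat, mean + off)
            for off in talker_offsets
            for cat, mean in CATEGORY_MEANS.items()]


def classify(token, exemplars, sensitivity=1.0):
    """Choose the category whose exemplars have the largest summed similarity,
    with similarity = exp(-sensitivity * |distance|)."""
    totals = {cat: 0.0 for cat in CATEGORY_MEANS}
    for cat, value in exemplars:
        totals[cat] += math.exp(-sensitivity * abs(token - value))
    return max(totals, key=totals.get)


def accuracy(exemplars, novel_offset):
    """Proportion of a novel talker's /ħ/ and /h/ tokens classified correctly."""
    tests = [(cat, mean + novel_offset) for cat, mean in CATEGORY_MEANS.items()]
    hits = sum(classify(value, exemplars) == cat for cat, value in tests)
    return hits / len(tests)


single = make_exemplars([1.2])              # one (hypothetical) training talker
multi = make_exemplars([-1.2, 0.0, 1.2])    # three (hypothetical) training talkers
print(accuracy(single, -0.5), accuracy(multi, -0.5))  # 0.5 1.0
```

In this toy setting, the single training talker's offset shifts both stored exemplars upward, so the novel talker's /ħ/ token lands closer to the stored /h/ exemplar; spreading exemplars across three voices keeps the summed similarity centred on the category means.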

In relation to the second question, comparing subjects' performance in the two experiments showed no significant difference between the XAB non-lexical task and the XAB lexical task (multiple-talker groups: 92% vs. 87% correct; overall: 79% vs. 76% correct), despite the different demands of each task. This finding is consistent with Pater's (2003) study, in which subjects' performance on the two XAB tasks did not differ (78% correct on both tasks). One possible reason for this finding is the similar L2 input that subjects in the two experiments received. The same information, whether introduced by one or by multiple talkers, presumably led subjects to map stimuli consistently across the two stages of the experiments (Schneider & Shiffrin 1997), regardless of the different demands of the tasks they performed afterwards. In other words, subjects in both experiments arguably began the final XAB discrimination task, non-lexical or lexical, with the same mental representations of the newly learned words, so the different demands of the tasks did not influence their performance.

In the area of L2 instruction, findings from the present study are important for both teachers and designers of L2 materials. They underscore the importance of using rich acoustic input characterized by variability in talkers, stimuli, and phonetic environments when introducing novel L2 phonetic features. Such a change in L2 teaching methods is expected to facilitate the acquisition of L2 phonology. Additionally, the findings draw L2 teachers' attention to the value of using less controlled tasks, similar to the criterion tests used in this study, to better prepare learners for the demands of the more controlled tasks that follow. The rationale is that learners need practice before they are tested on their mastery of the given materials.

In terms of pedagogy, the robust benefit of talker variability in the two experiments implies that L2 learners of Arabic at an early stage of learning can benefit from exposure to several talkers providing variable Arabic input, which may help them overcome phonological and lexical confusions. This finding also draws the attention of L2 instructors in general, and of Arabic instructors in particular, to the importance of exposing learners to numerous speakers of the target language, for instance by integrating multimedia and/or guest speakers of the target L2. A systematic examination of the impact of factors such as talker variability and task type is therefore essential to clarify sources of confusion in the acquisition of L2 consonant contrasts. Difficult consonants of this kind should arguably be addressed more directly and explicitly in pre-reading activities than consonants that learners can be expected to acquire through their use in everyday conversation.

The present study also raises some interesting questions for future research such as:

    • How are linguistic information and indexical properties retained in learners’ lexicons (i.e. in the same or separate units)?
    • How are novel L2 features initially stored?
    • How are they transferred from learners’ working memory into their long-term memory?

Answers to these questions could give a fuller picture of speech perception development with respect to talker variability and consequently improve our understanding of this issue. Moreover, replicating this study with other salient Arabic contrasts, both segmental and suprasegmental, and using both perception and production tests is another direction for future research that could enrich word recognition research in particular and L2 speech research in general.

References

Al-Batal, M. & Belnap, R. K. (2006). The teaching and learning of Arabic in the United States: Realities, needs, and future directions. In K. M. Wahba, Z. A. Taha & L. England (Eds.), Handbook for Arabic language teaching professionals in the 21st century (pp. 389-399). Mahwah, NJ: Lawrence Erlbaum Associates.

Al Mahmoud, M. S. (2013). Discrimination of Arabic contrasts by American learners. Studies in Second Language, 3(2), 261-292.

Alwabari, S. (2013). Non-Native Production of Arabic Pharyngeal and Pharyngealized Consonants (Master’s thesis). Carleton University, Ottawa. Available from ProQuest Dissertations and Theses database.

Aoyama, K., Flege, J. E., Guion, S. G., Akahane-Yamada, R., & Yamada, T. (2004). Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics, 32, 233–250.

Barker, B. A., & Newman, R. S. (2004). Listen to your mother! The role of talker familiarity in infant streaming. Cognition, 94, B45-B53.

Baker, W., & Trofimovich, P. (2006). Perceptual paths to accurate production of L2 vowels: the role of individual differences. IRAL, 44(3), 231-250.

Bion, R. A. H., Escudero, P., Rauber, A. S., & Baptista, B. O. (2006). Category formation and the role of spectral quality in the perception and production of English front vowels. Proceedings of Interspeech 2006, 1363-1366.

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to nonnative speech. Cognition, 106(2), 707–729.

Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. The Journal of the Acoustical Society of America, 101(4), 2299-2310.

Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., & Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Perception & Psychophysics, 61(5), 977-985.

Cartell, T. D. (1984). Contributions of fundamental frequency, formant spacing, and glottal waveform to talker identification. Res. Speech Percept. Tech. Rep. No. 5 (Indiana Univ., Bloomington, IN).

Curtin, S., Goad, H., & Pater, J. V. (1998). Phonological transfer and levels of representation: The perceptual acquisition of Thai voice and aspiration by English and French speakers. Second Language Research, 14(4), 389–405.

Cutler, A., & Broersma, M. (2005). Phonetic precision in listening. In W. J. Hardcastle & J. M. Beck (Eds.), A figure of speech: A festschrift for John Laver (pp. 63-91). Mahwah, NJ: Erlbaum.

Cutler, A., Weber, A., & Otake, T. (2006). Asymmetric mapping from phonetic to lexical representations in second-language listening. Journal of Phonetics, 34, 269–284.

Flege, J.E., & Liu, S. (2001). The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition, 23, 527-552.

Flege, J. E., MacKay, I. A., & Meador, D. (1999). Native Italian speakers’ perception and production of English vowels. Journal of the Acoustical Society of America, 106(5), 2973-2987.

Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers, 35, 116-124.

Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279.

Halle, M. (1985). Speculations about the representations of words in memory. In V. A. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 101-114). Orlando: Academic Press.

Hardison, D. M. (2003). Acquisition of second-language speech: Effects of visual cues, context, and talker variability. Applied Psycholinguistics, 24(4), 495-522.

Hayes-Harb, R., & Masuda, K. (2008). Development of the ability to lexically encode novel L2 phonemic contrast. Second Language Research, 24(1), 5–33.

Houston, D., & Jusczyk, P. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26(5), 1570-1582.

Johnson, K., & Mullennix, J. W. (Eds.). (1997). Talker variability in speech processing. San Diego: Academic Press, pp. 1–237.

Jusczyk, P., Pisoni, D., & Mullennix, J. (1992). Effects of talker variability on speech perception by 2-month-old infants. Cognition, 43(3), 253–291.

Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/: II. The role of phonetic environment and talker variability in learning new perceptual categories. Journal of the Acoustical Society of America, 94, 1242–1255.

Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y., & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/: III. Long-term retention of new phonetic categories. Journal of the Acoustical Society of America, 96(4), 2076–2087.

Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America, 89, 874–886.

MacKay, I.R.A., Flege, J.E., & Imai, S. (2006). Evaluating the effects of chronological age and sentence duration on degree of perceived foreign accent. Applied Psycholinguistics, 27, 157-183.

MacKay, I. R. A., Meador, D., & Flege, J. E. (2001). The identification of English consonants by native speakers of Italian. Phonetica, 58, 103-125.

Martin, C. S., Mullennix, J. W., Pisoni, D. B., & Summers, W. V. (1989). Effects of talker variability on recall of spoken word lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(4), 676-684.

McCandliss, B. D., Fiez, J. A., Protopapas, A., Conway, M., & McClelland, J. L. (2002). Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cognitive, Affective, and Behavioral Neuroscience, 2, 89-109.

Moyer, A. (2007). Do language attitudes determine accent? A study of bilinguals in the USA. Journal of Multilingual and Multicultural Development, 28(6), 502-517.

Mullennix, J. W., & Pisoni, D. B. (1990). Stimulus variability and processing dependencies in speech perception. Perception and Psychophysics, 47, 379-390.

Mullennix, J. W., Pisoni, D. B., & Martin, C. S. (1989). Some effects of talker variability on spoken word recognition. Journal of the Acoustical Society of America, 85, 365–378.

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception and Psychophysics, 60(3), 355–376.

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5(1), 42-46.

Ota, M., Hartsuiker, R. J., & Haywood, S. (2009). The KEY to the ROCK: near-homophony in nonnative visual word recognition. Cognition, 111, 263-269.

Pater, J. (2003). The perceptual acquisition of Thai phonology by English speakers: task and stimulus effects. Second Language Research, 19(3), 209-223.

Pater, J., Stager, C., & Werker, J. (2004). The perceptual acquisition of phonological contrasts. Language, 80(3), 384-402.

Rabiee, M. (2010). Arabic, Farsi fluency considered ‘critical’ to US national security. Retrieved from http://www.voanews.com/english/news/usa/Arabic-Farsi-Fluency-Considered-Critical-to-US-National-Security-102171449.html.

Rost, G., & McMurray, B. (2009). Speaker variability augments phonological processing in early word learning. Developmental Science, 12(2), 339-349.

Ryding, C. K. (2006). Teaching Arabic in the United States. In K. M. Wahba, Z. A. Taha & L. England (Eds.), Handbook for Arabic language teaching professionals in the 21st century (pp. 13-20). Mahwah, NJ: Lawrence Erlbaum Associates.

Schmale, R., & Seidl, A. (2009). Accommodating variability in voice and foreign accent: flexibility of early word representations. Developmental Science, 12(4), 583-601.

Schneider, W., & Shiffrin, R. M. (1997). Controlled and automatic information processing: I. Detection, search, and attention. Psychological Review, 84, 1–66.

Sommers, M. S., Nygaard, L. C., & Pisoni, D. B. (1994). Stimulus variability and spoken word recognition. I. Effects of variability in speaking rate and overall amplitude. Journal of the Acoustical Society of America, 96, 1314-1324.

Sommers, M. S., Kirk, K. I., & Pisoni, D. B. (1997). Some considerations in evaluating spoken word recognition by normal-hearing, noise-masked normal-hearing, and cochlear implant listeners. I: The effects of response format. Ear and Hearing, 18, 89-99.

Strange, W., & Dittman, S. (1984). Effect of discrimination training on the perception of /r-l/ by Japanese adults learning English. Perception and Psychophysics, 36(2), 131-145.

Wang, Y., Jongman, A., & Sereno, J. A. (2006). Second language acquisition and processing of Mandarin tone. In E. Bates, L. Tan, & O. Tzeng (Eds.), Handbook of Chinese psycholinguistics (pp. 250-257). Cambridge: Cambridge University Press.

Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. Journal of the Acoustical Society of America, 106(6), 3649-3658.

Weber, A., & Cutler, A. (2004). Lexical competition in nonnative spoken-word recognition. Journal of Memory and Language, 50, 1–25.

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10, 420–422.

Author:

Dr. Asmaa Shehata

University of Calgary

Department of Linguistics, Languages and Cultures

CH C114, 2500 University Drive NW,

Calgary, Alberta T2N 1N4, Canada

Email: Asmaa.shehata@ucalgary.ca