Volume 8 (2017) Issue 2 - Article Bliss et al.

Journal of Linguistics and Language Teaching

pp. 173-188

Using Multimedia Resources to Integrate Ultrasound Visualization for Pronunciation Instruction into Postsecondary Language Classes*

Heather Bliss (Victoria, Canada) / Khia A. Johnson (Vancouver, Canada) /

Strang Burton (Vancouver, Canada) / Noriko Yamane (Hiroshima, Japan) /

Bryan Gick (Vancouver, Canada)


Ultrasound visualization has been demonstrated to be an effective tool for the teaching and learning of the pronunciation of challenging speech sounds. However, ultrasound can be difficult to implement in a classroom setting, as it is costly and well-suited to single participants or small groups. This article describes the development, implementation, and evaluation of a solution to this problem, which delivers ultrasound overlay videos in an online format to learners of various languages. Ultrasound overlay videos combine a midsagittal view of the tongue during speech with a side profile view of a speaker’s head, allowing learners to see the articulations of speech sounds. Videos have been developed for isolated speech sounds of all the world’s languages, as well as for words in Japanese, Cantonese, and some Salish languages of British Columbia, Canada. They have been incorporated into blended learning paradigms at Canadian and other universities. Testimonials, surveys, and classroom studies all speak to their effectiveness.

Keywords: pronunciation, ultrasound visualization, multimedia resources, blended learning

1 Introduction

Pronunciation instruction has recently seen a resurgence within second language (L2) pedagogy after a period of relative dormancy. Seemingly at odds with the Communicative Approach and its focus on function over form (e.g. Celce-Murcia, Brinton & Goodwin 1996: 7-11), pronunciation instruction has been described as having been marginalized within the academic discipline of Applied Linguistics (Derwing & Munro 2005) and has suffered from a lack of infrastructure and institutional support (Levis 2009, 2015). However, the field of L2 pronunciation is clearly growing; the conference Pronunciation in Second Language Learning and Teaching has been held annually since 2009, and the Journal of Second Language Pronunciation began in 2015, both with the goals of creating a forum for academic discourse around the topic of pronunciation instruction and of increasing the visibility of the subfield of L2 pronunciation within the fields of Applied Linguistics and Language Education. An increase in meta-analyses and reviews of pronunciation instruction studies also speaks to the growth of the field (Bliss, Abel & Gick (to appear), Lee, Jang & Plonsky 2014, Thomson & Derwing 2014). Pronunciation plays a critical role in learners achieving communicative competence, and pronunciation instruction is an essential, if occasionally overlooked, aspect of L2 pedagogy (Neri et al. 2002: 3-4). Despite renewed awareness of the importance of pronunciation, it remains a challenge for instructors to address the pronunciation needs of their students; one of the hurdles is a relative scarcity of research describing and evaluating pedagogical tools, methods, and approaches for teaching pronunciation to L2 learners.

In this article, we aim to contribute to this growing body of research by reporting on the development, application, and evaluation of a pedagogical tool that uses ultrasound imaging technology in an innovative way for L2 pronunciation learning. Our goal was to democratize ultrasound visualization for L2 teachers and learners, making use of the technology in such a way that it could be effectively incorporated in a range of teaching and learning contexts, without needing specialized equipment or expertise. The solution was to develop a technique for creating ultrasound overlay videos, which combine ultrasound videos of a speaker’s tongue movements in speech with profile views of the speaker’s head, and to integrate them into L2 courses via a blended learning paradigm. This paper not only highlights the potential for ultrasound to be incorporated into L2-pronunciation teaching-and-learning contexts, but also serves as a testimonial to the type of language teaching tools that can result from collaboration between researchers, instructors, and learners themselves.

The present article proceeds as follows. In Section 2, we describe the benefits and challenges of using ultrasound visualization technology in L2 pronunciation instruction, as a means to situate ultrasound-based instruction in the context of the teaching and learning of pronunciation. In Section 3, we document the methods used to develop ultrasound overlay videos, and in Section 4, we describe the various applications that have been used for these videos, for different languages and different teaching and learning contexts. In Section 5, we discuss the types of evaluation that have been carried out for assessing the effectiveness of these videos on improving learners’ pronunciation and their impact on student learning. Section 6 concludes.

2 Benefits and Challenges of Ultrasound in L2 Pronunciation Instruction

Current research on pronunciation instruction converges on the point that the once-standard paradigm of engaging learners in “listen-and-repeat” drills is not, in itself, a successful method for pronunciation instruction (e.g. Jones 1997). Aside from the question of how these or other pronunciation training methods can be structured to incorporate the learners’ communicative and contextual needs, there are two key problems with this approach.

First, the “listen-and-repeat” approach does not support the development of learners’ metalinguistic awareness, a key aspect of pronunciation learning (Celce-Murcia et al. 1996, Neri et al. 2002). The positive impact of explicit phonetic instruction for language learners is well-established (e.g. Halliday, McIntosh & Strevens 1964, Lord 2005, Saito 2007, 2011, Gordon, Darcy & Ewert 2013, Kissling 2012, Olson 2014). One of the outcomes of such instruction is increased awareness of phonetic contrasts in the L2, and this increased awareness can lead to better comprehensibility and improved pronunciation (Kennedy & Trofimovich 2010, Venkatagiri & Levis 2007).

Second, the “listen-and-repeat” approach draws only on auditory stimuli as the input. However, the acquisition of new speech sounds is a multimodal experience: learners make use of not only auditory but also visual information to acquire the speech sounds and patterns in a new language (e.g. Catford & Pisoni 1970, Navarra & Soto-Faraco 2007). The visual modality can provide learners with critical information about various aspects of pronunciation, including articulation and prosody. Some articulatory information can be learned via lip movements, an easily accessible source of visual (and tactile) information (Dale & Poms 1994, Navarra & Soto-Faraco 2007, Neri et al. 2002). Beyond lip movements, however, other articulatory movements and processes are hidden from plain view, making it difficult if not impossible for learners to observe (and interpret) them without the use of specialized tools that facilitate visualization of articulation, either indirectly via visual displays of acoustic information from which articulatory information can be derived (e.g. spectrograms), or directly via visual displays of the vocal tract using ultrasound or intra-oral techniques such as EPG or EMA (Bliss, Abel & Gick (to appear)). Notably, the use of these visualization tools in the context of pronunciation learning requires a degree of metalinguistic awareness in order for the learner to interpret what he or she is seeing, and it has been observed that an awareness of articulatory gestures can facilitate improved pronunciation of new sounds.

Of the various tools that facilitate visualization of articulatory configurations and movements, ultrasound has proven to be particularly effective. Inspired by work with deaf and hard-of-hearing spoken language learners (e.g. Bernhardt, Gick, Bacsfalvi & Adler-Bock 2005, Bernhardt, Gick, Bacsfalvi & Ashdown 2003), ultrasound has been used as a biovisual feedback tool for the teaching and learning of difficult second language sounds for over a decade (Cleland, Scobbie, Nakai & Wrench 2015, Gick, Bernhardt, Bacsfalvi & Wilson 2008, Ouni 2014, Pillot-Loiseau, Kamiyama & Kocjančič Antolík 2015, Tateishi & Winters 2013, Tsui 2012, Wilson 2014, Wilson & Gick 2006, Wu, Gendrot, Hallé & Adda-Decker 2015). In pronunciation instruction, ultrasound has been demonstrated to be particularly effective for improving learners’ vowel articulations, which provide little proprioceptive feedback due to the lack of contact with another articulator or a lack of fricated airflow (Cleland et al. 2015, Pillot-Loiseau et al. 2015), and it is also effective for improving lateral and rhotic articulations (e.g. l and r), which crucially rely on the timing of movements of different parts of the tongue (Gick et al. 2008, Tateishi & Winters 2013, Tsui 2012, Wu et al. 2015), including active dorsal retraction, similar to schwa and back rounded vowels (Gick, Kang & Whalen 2002).

Ultrasound works by emitting an ultra-high frequency sound wave through a transducer (or probe) that, in speech applications, can be held against the neck beneath the chin so that the sound can travel through the tongue. This ultra-high frequency sound wave is reflected back to the transducer and is used to create a two-dimensional midsagittal image of the tongue. As ultrasound does not image through bone or air, it can typically be used only to image the surface of the tongue, and not the palate, jaw, or pharyngeal wall.
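To make the echo principle concrete, the following minimal sketch converts the round-trip travel time of a reflected pulse into an estimated depth. It assumes the conventional average speed of sound in soft tissue (about 1540 m/s), a value that is not stated in this article; the function name `echo_depth_cm` and the example timing are purely illustrative.

```python
# Assumption: conventional average speed of sound in soft tissue, as commonly
# used in medical ultrasound. This value is not taken from the article.
SPEED_OF_SOUND_SOFT_TISSUE_M_PER_S = 1540.0


def echo_depth_cm(round_trip_seconds,
                  speed_m_per_s=SPEED_OF_SOUND_SOFT_TISSUE_M_PER_S):
    """Estimate the depth (in cm) of a reflecting surface from the echo delay.

    The pulse travels to the reflector and back, so the one-way distance
    is half of the total distance covered.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    one_way_metres = speed_m_per_s * round_trip_seconds / 2.0
    return one_way_metres * 100.0


# Hypothetical example: an echo arriving 100 microseconds after the pulse.
depth = echo_depth_cm(100e-6)
print(f"{depth:.1f} cm")  # prints "7.7 cm"
```

Under these assumptions, an echo delay of 100 microseconds corresponds to a reflector roughly 7.7 cm from the transducer, which is the order of magnitude relevant for imaging the tongue surface from beneath the chin.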

Ultrasound is highly effective because, by visualizing the articulators, it

has the potential to contribute to the teaching of pronunciation through both a top-down method (i.e., by shedding more light on underlying articulatory settings) and a bottom-up method (i.e., by enabling learners to view real-time images of their tongues as they produce individual sounds). (Gick et al. 2008: 313)

In comparison, auditory-based pronunciation instruction requires the learner to map acoustic information onto articulatory movements, an extra step that may lead to confusion for the learner, and runs the risk of

drawing false inferences and incorrectly judging the learner’s pronunciation as right when it is wrong, or as wrong when it is right (Wilson 2014: 287).

Ultrasound-based pronunciation instruction, on the other hand, facilitates the observation of a learner’s articulation directly and is therefore less prone to these kinds of errors. As noted by Wilson & Gick:

if learners are able to see directly the articulators, then they probably have an improved perception of the articulatory adjustments needed to improve their pronunciation. (Wilson & Gick 2006: 148)

There are also a number of practical advantages to using ultrasound in pronunciation instruction, as compared with other biovisual feedback tools. Ultrasound is safe, non-invasive, and versatile. The machines are increasingly affordable and portable, and can be set up and used easily. Ultrasound technology is quickly advancing, with handheld devices and smartphone plug-ins being developed and evaluated for various medical applications (e.g. Clarius 2016). However, ultrasound also faces certain challenges in the language-learning context. As it is currently not feasible under most teaching conditions to have multiple ultrasound machines available at any given time, ultrasound has been used less often for teaching in large-group settings such as classrooms or language laboratories. Rather, ultrasound has been viewed as being particularly suited to a ‘single participant design’, in which a single learner’s particular needs, variations, and challenges can be addressed by a one-on-one instructor (Gick et al. 2008: 314). A second challenge with ultrasound-based instruction is in terms of the learners’ capacity to understand what they are seeing. Without specialized training in both ultrasound technology and articulatory phonetics, the images can be difficult to interpret. Trained instructors can advise learners on how to interpret ultrasound images, but this restricts learners to using ultrasound only when an instructor is available; they cannot practice independently. Given these challenges, a question that arises in the context of the language classroom is how to make available to language learners the benefits of ultrasound-based pronunciation instruction beyond the one-to-one instructor-learner paradigm. In the following section, we detail a solution we have developed that allows for flexible, independent pronunciation learning using ultrasound.

3 The Development of Ultrasound Overlay Videos

Our objective was to combine the advantages that ultrasound technology affords with multimedia techniques to develop a pronunciation learning tool that gives learners flexibility, control, and autonomy. By developing ultrasound-based pronunciation resources that could be made available online and in an open-access format, we aimed to enhance teaching and learning by reducing lecture hours and making learning more sustainable and achievable. We wanted to develop strategies to engage students in classroom activities more effectively, and provide rich, meaningful, and intuitive learning experiences. The accessibility and practicality of these methods can help to encourage learner autonomy and lifelong learning. To address some of the limitations of ultrasound in language-learning contexts described above, and to meet the goal of making ultrasound-based pronunciation training accessible and interpretable to more learners, we developed a technique for creating ultrasound overlay videos. These videos combine ultrasound images of tongue movements with external profile views of a speaker’s head. A still frame from a video is given in Figure 1 below; some examples of the videos can be viewed at http://enunciate.arts.ubc.ca.

Figure 1: Still Frame of an Ultrasound Overlay Video

Figure 1 shows a sagittal view of a speaker’s face with an image of the speaker’s tongue overlaid on the surface, creating the illusion of being able to see inside the speaker’s mouth. In the videos themselves, the lips and the jaw move simultaneously along with the tongue to produce sounds, either on their own or as part of words in the L2; learners are able to view the shapes and movements of all of these articulators at the same time.

The videos were created by simultaneously recording a speaker’s tongue and facial movements in speech (using ultrasound and a camcorder, respectively) and then employing the Adobe Creative Cloud software suite to generate overlay videos from the raw footage. A clapperboard was used at the beginning of each filmed segment during the recording process to create a salient audio cue for the two video streams to be synchronized. Once synchronized, the higher quality audio stream (recorded with a headset microphone directly into the laptop) was retained and the other audio stream was discarded. On the ultrasound video, the tongue shape was isolated, either by erasing all other visible imagery in each frame or by using a masking technique that draws a perimeter around the tongue. The black-and-white tongue image was then brightened and coloured pink to more closely resemble a real tongue, and to make the image more interpretable for learners. The tongue image was then scaled and overlaid on the facial image to create the final product (for a more detailed description of this methodology, see Abel et al. 2015 and Yamane et al. 2015).
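The clapperboard-based synchronization step lends itself to automation. The sketch below shows one possible approach, not the procedure used in this project (where alignment was carried out in Adobe Creative Cloud): it estimates the offset between two audio tracks by finding the lag that maximizes their cross-correlation. The function names, the toy signals, and the sample rate are hypothetical; a practical implementation would likely read real audio files and use an optimized array library rather than plain Python loops.

```python
def estimate_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means the shared event (e.g. the clap) occurs `lag`
    samples later in `other` than in `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate lag.
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_in_seconds(lag_samples, sample_rate_hz):
    """Convert a lag in samples into seconds."""
    return lag_samples / sample_rate_hz


# Toy example: the same short "clap" appears at sample 10 in one track
# and at sample 17 in the other (both tracks are otherwise silent).
clap = [1.0, -0.8, 0.5]
laptop_audio = [0.0] * 10 + clap + [0.0] * 20
camcorder_audio = [0.0] * 17 + clap + [0.0] * 13

lag = estimate_lag(laptop_audio, camcorder_audio, max_lag=15)
print(lag, lag_in_seconds(lag, sample_rate_hz=100))  # prints "7 0.07"
```

In this toy case, the camcorder track would be shifted earlier by 7 samples (0.07 s at the assumed 100 Hz sample rate) before the two video streams are combined. A sharp, isolated sound such as a clap yields a clear correlation peak, which is why it makes a convenient synchronization cue.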

The production methodology is replicable, meaning that instructors of various languages can create customized videos for their learners’ needs. On average, each second of video takes between 20 and 60 minutes to produce. We continue to explore methods for accelerating and automating the video creation process. Once the basic overlay videos are created, additional features can be added, such as freeze frames or slow-motion sequences, in order to highlight certain articulatory shapes and movements.
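For planning purposes, the reported range of 20 to 60 minutes of production time per second of video can be turned into a rough estimate for a clip of a given length. The helper below is a minimal sketch; the function name and the 5-second example clip are hypothetical, and actual production times will depend on the footage and on the editor’s experience.

```python
def production_time_minutes(video_seconds, low_per_second=20, high_per_second=60):
    """Return a (low, high) estimate, in minutes, for producing an overlay video.

    The default rates reflect the reported range of 20 to 60 minutes of
    work per second of finished video.
    """
    if video_seconds < 0:
        raise ValueError("video length must be non-negative")
    return video_seconds * low_per_second, video_seconds * high_per_second


# Hypothetical example: a 5-second video of a single word.
low, high = production_time_minutes(5)
print(f"{low / 60:.1f} to {high / 60:.1f} hours")  # prints "1.7 to 5.0 hours"
```

Even a short clip can therefore represent several hours of editing, which is one motivation for automating parts of the workflow.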

The videos have been successfully used to date as part of blended learning paradigms in large, university-level language classes. Application of the videos in L2 learning is discussed in more detail in Section 4. An online clickable IPA chart allows learners to view all of the sounds of the International Phonetic Alphabet, and in some cases, videos are embedded in self-directed online tutorials that supplement in-class lessons.

The final point regarding the development of the videos is that this project was designed and carried out collaboratively. The technology was initially developed and tested by a team of linguists, and the project was quickly expanded to include a team of language instructors and technical experts. The rationale for the collaboration was based on the insight that linguistic research into the science of sounds and their articulations (i.e. articulatory phonetics) and its implementations can be productively applied to language teaching and learning. More specifically, linguists and language educators worked together to develop educational resource videos that draw on the expertise of linguists in the area of ultrasound technology and its uses in visualizing speech sounds, as well as the expertise of language instructors in terms of identifying applications and specific pronunciation challenges for language learners. To deliver these educational resource videos successfully to students, the research team relied on the technical expertise of instructional designers and information technologists for assistance in areas including instructional design, the use of technology, web design, and software development and implementation.

4 Applications of Ultrasound Overlay Videos in L2 Teaching and Learning

4.1 Video Library

As a general resource for learners of all languages, we first produced a video library focusing on a wide range of sounds of the world’s languages. The videos in this library are freely available online and licensed under a Creative Commons license, which permits anyone to use them, but they cannot be adapted or used for commercial purposes without our permission. There are two types of videos in the library: videos of individual sounds presented in a clickable IPA chart, and instructional videos that help learners to understand the utility of ultrasound-based instruction in language learning. Both types of videos are described below.

4.1.1 Clickable IPA Chart

The clickable IPA chart includes 91 videos, one for each sound in the International Phonetic Alphabet. Learners can click on each sound in the chart to view detailed information about the sound, including an ultrasound overlay video that shows how the sound is articulated. For the consonant sounds, the videos consist of a number of repetitions of the consonant followed by a vowel as well as flanked by two vowels (e.g. Ca, aCa). For the vowel sounds, the videos consist of the vowel in isolation, or flanked by consonants (e.g. V, bV, Vb, bVb). As it is designed to include all documented sounds of the world’s languages, the clickable IPA chart can be used by teachers and learners of any language in any type of instructional environment in which internet access is available. The clickable IPA chart has been used as a teaching resource for hundreds of students in introductory linguistics classes at the University of British Columbia, and is also used as a supplementary resource in a number of language classes. Details of how it impacts student learning in these contexts are given in Section 5.

While useful as a general resource, the chart does not present sounds in a communicative context; learners are not seeing and hearing the sounds as they are found in real words in the language(s) they are learning. To better focus on the specific sounds of individual languages, we have produced customized videos for a number of languages as a supplementary resource to the library of sounds. These customized resources will be discussed in more detail in Section 4.2.

4.1.2 Instructional Videos

Instructional videos made available online were created to help guide students’ interpretations of the ultrasound overlay videos in the clickable IPA chart. This is an important component in using ultrasound overlay videos, as many students, especially those outside of linguistics, have not been exposed to midsagittal (mid-line) views of the tongue before. The aim of these videos is to better equip students in their interpretation of the audiovisual speech information conveyed in the overlay videos, even though the image of the tongue superimposed on the face can be interpreted much more intuitively than raw ultrasound. The videos explain what ultrasound is, and highlight the types of features that students should be aware of, such as tongue height and backness, as well as lip rounding.

4.2 Customized Videos

At the request of language teachers and learners, we have been developing customized ultrasound overlay videos in addition to those accessed from the clickable IPA chart, in order to explain and situate the L2 sounds within the context of learners’ broader experiences. Each set of custom videos has been developed through collaboration with language teachers, native speakers, and linguistics researchers. In this section, we describe three sets of videos – those developed for Japanese, Cantonese, and Salish languages. In addition, we are currently in the process of developing materials for four additional languages: German, French, Spanish, and Mandarin. For each set of videos described below, we outline who the primary collaborators were, what the goals in developing and implementing the videos were from the perspective of our collaborators and our research, and any other details pertinent to the specific project.

4.2.1 Japanese

In collaboration with Japanese instructors in the Department of Asian Studies at the University of British Columbia, we developed a self-directed online pronunciation tutorial for learners of Japanese that incorporates both ultrasound overlay videos and animated videos informed by ultrasound-based research on the articulatory properties of Japanese sounds (Yamane, Howson & Wei 2015). These videos have since been used in Japanese classrooms at a number of other universities, including Columbia and Princeton, as well as in English classes at universities throughout Japan. To develop the tutorial, linguists and language teachers worked together to identify common pronunciation problems (such as the r-sound of Japanese), and the tutorial focuses on these sounds in particular through the use of ultrasound overlay and animated videos. As learners progress through the tutorial, they are given opportunities to take pre- and post-tests to evaluate their perception, and they are awarded virtual badges that can be sent to their instructors; some instructors are accepting the badges towards course credit.

4.2.2 Cantonese

The University of British Columbia is home to Canada’s first university-level Cantonese program, and while the program is robust, fewer materials are available for Cantonese instruction compared to other languages taught at the university level. Based on feedback from a Cantonese instructor, we identified two sets of sounds that are consistently challenging for learners, yet important to learn early on, as they distinguish many words in Cantonese. Two sets of videos were developed, one for each set of challenging sounds. The first illustrates the minimal set of stop consonants [p]-[t]-[k] in coda position, which are unreleased and challenging to perceive. The second set illustrates the vowel length contrast [a]-[a:]. The videos were produced with one speaker of Hong Kong Cantonese. Minimal pairs or sets were identified for both the consonant and vowel contrasts, and produced in isolation by the speaker. There is one video per word, and the videos are embedded within a tutorial website that situates them pedagogically (for a detailed description of the videos and their implementation, see Bliss et al. 2017). A pedagogical study evaluating the effectiveness of the videos for Cantonese learners will be discussed in Section 5.3.3.

4.2.3 Salish Languages

In collaboration with the University of Victoria, the W̱SÁNEĆ School Board, and the Stó:lō and Splatsin communities of British Columbia, our team has worked to develop ultrasound overlay videos for three critically endangered Salish languages – SENĆOŦEN, Halq’emeylem, and Secwepemc, respectively. Each of these languages boasts a large consonant inventory with a number of sounds and contrasts that can be especially challenging for English-speaking learners. Videos were developed from word lists designed to showcase a large sample of the challenging sounds for each language. Video and ultrasound were recorded in the field with a portable setup. We had the honour to record an L1 speaker for each language, as well as several L2 SENĆOŦEN speakers. The goals in developing these videos were to create pronunciation materials to help teach the complex consonant inventories and to provide virtual access to L2 learners. These videos contribute both to language revitalization and documentation efforts (further details can be found in Bliss, Burton & Gick 2016, and Bliss et al. (forthcoming)).

5 Evaluation of Ultrasound Overlay Video Integration

Having described the various applications of ultrasound overlay videos, we now turn to the question of whether the videos in fact benefit language learners’ pronunciation of challenging sounds. In this section, we report on three types of information that evaluate the impact of ultrasound overlay videos on pronunciation learning:

    1. instructor and community testimonials,
    2. student surveys, and
    3. comparative classroom studies.

Each of these is detailed below. Testing the efficacy of ultrasound overlay videos remains in the early stages, and further research is underway to create a fuller and more accurate assessment of the overlay videos as a resource for language teaching and learning.

5.1 Instructor and Community Testimonials

Across the board, the language instructors we collaborated with expressed enthusiasm and support for the materials, confirming that they filled a gap in the materials available for teaching pronunciation, and that they were useful resources for students to use outside of classroom time. We received formal endorsements indicating enthusiastic support for our project from several different language departments at the University of British Columbia. A common thread in the testimonials was that the ultrasound-based pronunciation materials supported rather than competed with their own curriculum. This highlights the collaborative and interdisciplinary nature of the project.

Following the development of the videos for SENĆOŦEN, Halq’emeylem and Secwepemc, we collected commentary about the project from various community members involved in it. The videos are considered to be useful, especially for building metalinguistic awareness and providing a valuable visual resource. The videos also give learners an opportunity to listen to and watch speakers many more times than would be feasible in person, particularly in contexts of language endangerment, in which there are few fluent L1 speakers and demands are placed on their time. This type of feedback has been both encouraging and useful. It highlights a high level of engagement, as well as the collaborative nature of the relationship with language learning communities.

5.2 Student Surveys

Qualitative surveys conducted with students who had used the ultrasound overlay videos in their classes as part of a blended learning paradigm similarly indicate that the videos are perceived as having a positive impact on pronunciation learning. For instance, the clickable IPA chart and accompanying instructional videos were recently incorporated into a second-year phonetics / phonology course at the University of British Columbia, and a survey was conducted to assess students’ experience with the videos. Of the 26 students who responded to the survey, 23 (88.5%) indicated that the resources were easy to use and that they helped them understand how sounds are articulated, 21 (80.8%) indicated that the resources helped them understand the differences between sounds, and 24 (92.3%) indicated that they would recommend the resources to other students (see Yamane et al. 2015 for additional discussion).

Another survey was given to students of Japanese after exposure to the customized Japanese materials. A Japanese instructor at the University of British Columbia implemented a flipped classroom approach, wherein students were instructed to watch (outside of classroom time) a tutorial video incorporating ultrasound overlay technology to explain a challenging set of Japanese sounds. In the classroom, they had opportunities to discuss the video and practice the sounds in peer groups. A survey was conducted to evaluate the effectiveness of the instructional videos, and of the 57 students surveyed, 46 (or 80.7%) reported that the ultrasound video helped them to understand how to pronounce the Japanese sounds, and 47 (or 82.5%) reported that it helped them to achieve the correct pronunciation (see Tsuda et al. 2015 for additional discussion).

5.3 Comparative Classroom Studies

While surveys and testimonials provide important qualitative feedback regarding teachers’ and learners’ experiences with ultrasound overlay videos, these alone do not provide conclusive evidence that the videos in fact do improve pronunciation. To supplement the qualitative data, we have conducted a series of comparative classroom studies that allow us to quantitatively assess the impact of ultrasound overlay videos on learners’ pronunciation. These studies are detailed below.

5.3.1 Introductory Linguistics

In order to assess whether ultrasound overlay videos assist linguistics students in learning new speech sounds through the IPA, we conducted an experiment in which four different sections of the same introductory linguistics course were exposed to different instructional interventions (one of which was exposure to ultrasound overlay videos) and then tested on their comprehension and retention of the speech sounds. Each of the sections received the same lecture on a novel set of speech sounds, followed by one of four instructional methods:

    1. a baseline textbook-style handout explaining the contrast,
    2. classroom production practice, repeating after an instructor in unison,
    3. pairwise production practice, in which students practiced contrasts and gave each other feedback, and
    4. watching ultrasound overlay videos of each of the sounds in class.

Immediately following the instruction, students were given a quiz evaluating their comprehension and perception of the different sounds. A second quiz was administered one week later to test their knowledge retention. Results from this experiment indicate that while all groups demonstrated an improvement in their knowledge and perception, there was no significant difference between the interventions. This suggests that ultrasound overlay videos are at least as effective as other methods, though notably they do not require the same amount of face-to-face time as some of the other methods considered in the study, and could thus provide a viable alternative to more intensive approaches (for additional details about this experiment, see Abel et al. 2017).

5.3.2 Japanese

In order to assess the impact of biovisual feedback using ultrasound technology on pronunciation training, we conducted an experiment in which ultrasound recordings were made of four learners of Japanese (L1 Korean) both before and after the pedagogical intervention. The intervention consisted of a 45-60-minute training session that included watching online video tutorials introducing ultrasound imaging and articulatory phonetics, as well as practice with ultrasound in which learners could see their own tongue shapes while producing sounds, comparing them with those of a native speaker (the instructor). The results of the experiment show that the learners all made significant improvements, with post-training tongue shapes modeling more closely those of a native Japanese speaker (for additional details on this experiment, see Noguchi et al. 2015).

5.3.3 Cantonese

The study described in this section was specifically designed to assess the value that the ultrasound overlay videos add to pronunciation training, in the context of a Cantonese language class at the University of British Columbia. The stimuli used in this experiment were described in Section 4.2.2. The results reported here are from a pilot version of this study; a full follow-up version is currently in progress (see Cheng et al. 2017 for preliminary results). This study asks whether “interacting with ultrasound-enhanced videos improves beginner Cantonese learners’ ability to differentiate between challenging Cantonese sounds in their perception and production” (Bliss et al. 2017). Thirteen students in an introductory Cantonese language class were split into two groups, each of which was given access to one of two near-identical websites: one with ultrasound overlay videos and one with audio recordings instead of videos. Students completed a production exam and were then given a week to interact with the websites at their leisure. After the week, students completed a second production exam as well as a perception quiz. While firm conclusions cannot be drawn from such a small sample, the trends are promising: the preliminary production results are consistent with the hypothesis that students with access to the videos demonstrate greater improvement than those with a similar but audio-only experience.

6 Conclusions and Future Directions

Ultrasound overlay videos have proven to be a promising resource for language learners and linguistics students alike. This is supported through preliminary empirical research as well as feedback from learners and teachers in a variety of instructional settings. Developing effective resources for language learning that students can interact with independently and outside the classroom is an important and challenging task, for which there is high demand from our collaborators. This need is particularly acute when considering the context of teaching critically endangered languages, where access to native speakers is limited. While ultrasound overlay videos are promising, much research remains to be done.

In ongoing and future work, our team is completing the follow-up study with Cantonese, developing materials for additional languages, and designing further classroom studies with a variety of languages. The primary goal of our upcoming research is to pinpoint the degree to which ultrasound overlay videos and ultrasound-informed pronunciation resources are useful and effective in the context of language learning, and how best to assess their efficacy.


Abel, J., B. Allen, S. Burton, M. Kazama, M. Noguchi, A. Tsuda, N. Yamane & B. Gick (2015). Ultrasound-Enhanced Multimodal Approaches to Pronunciation Teaching and Learning. In: Canadian Acoustics 43 (2015).

Abel, J., H. Bliss, B. Gick, M. Noguchi, M. Schellenberg & N. Yamane (2017). Comparing Instructional Reinforcements in Phonetics Pedagogy. In: Isei-Jaakkola, Toshiko (ed.). Proceedings of the International Symposium on Applied Phonetics.

Bernhardt, B., B. Gick, P. Bacsfalvi & J. Ashdown (2003). Speech habilitation of hard of hearing adolescents using electropalatography and ultrasound as evaluated by trained listeners. In: Clinical Linguistics & Phonetics 17 (2003) 3, 199-216.

Bernhardt, B., B. Gick, P. Bacsfalvi & M. Adler-Bock (2005). Ultrasound in speech therapy with adolescents and adults. In: Clinical Linguistics & Phonetics 19 (2005) 6/7, 605-617.

Bliss, H., J. Abel & B. Gick (to appear). Computer-Assisted Visual Articulation Feedback in L2 Pronunciation Instruction: A Review. In: Journal of Second Language Pronunciation.

Bliss, H., S. Bird, P. A. Cooper, S. Burton & B. Gick (to appear). Seeing Speech: Ultrasound-based Multimedia Resources for Pronunciation Learning in Indigenous Languages. In: Language Documentation & Conservation.

Bliss, H., S. Burton & B. Gick (2016). Ultrasound Overlay Videos and their Application in Indigenous Language Learning and Revitalization. In: Canadian Acoustics 44 (2016) 3, 136-37.

Bliss, H., L. Cheng, Z. Lam, R. Pai, M. Schellenberg & B. Gick (2017). Ultrasound Technology and its Role in Cantonese Pronunciation Teaching and Learning. In: Levis, John & Mary Grantham O’Brien (eds.). Proceedings of the 8th Annual Conference on Pronunciation in Second Language Learning and Teaching, 33-46.

Catford, J. C. & D. B. Pisoni (1970). Auditory versus articulatory training in exotic sounds. In: The Modern Language Journal 54 (1970) 7, 477-481.

Celce-Murcia, M., D. M. Brinton & J. M. Goodwin (1996). Teaching pronunciation: A reference for teachers of English to speakers of other languages. Cambridge: Cambridge University Press.

Cheng, L., H. Bliss, M. Schellenberg & B. Gick (2017). Ultrasound Overlay Videos: Testing its Effectiveness for Teaching L2 Cantonese Sound Contrasts. Poster presented at LSURC: Language Sciences Undergraduate Research Conference. Vancouver: University of British Columbia, February 2-3, 2017.

Clarius (2016). Wireless, handheld ultrasound for iOS and Android debuts. [Press release]. (https://www.clarius.me/aium-debut-pr/; 11.12.2017).

Cleland, J., J. M. Scobbie, S. Nakai & A. Wrench (2015). Helping children learn non-native articulations: The implications for ultrasound-based clinical intervention. Paper presented at the 2015 International Congress of Phonetic Sciences, Glasgow, Scotland. (http://www.icphs2015.info/pdfs/Papers/ICPHS0698.pdf; 12.08.2015).

Dale, P. W. & L. Poms (1994). English pronunciation for international students. Englewood Cliffs, NJ: Prentice Hall.

Derwing, T. & M. Munro (2005). Second language accent and pronunciation teaching: A research based approach. In: TESOL Quarterly 39 (2005), 379-97.

Gick, B., B. Bernhardt, P. Bacsfalvi & I. Wilson (2008). Ultrasound imaging applications in second language acquisition. In: Hansen Edwards, J. G. & M. L. Zampini (eds.). Phonology and second language acquisition. Amsterdam: John Benjamins, 309-322.

Gick, B., A. M. Kang & D. H. Whalen (2002). MRI evidence for commonality in the post-oral articulations of English vowels and liquids. In: Journal of Phonetics 30 (2002) 3, 357-371.

Gordon, J., I. Darcy & D. Ewert (2013). Pronunciation teaching and learning: Effects of explicit phonetic instruction in the L2 classroom. In: Levis, J. & K. LeVelle (eds.). Proceedings of the 4th Pronunciation in Second Language Learning and Teaching Conference. Aug. 2012. Ames, IA: Iowa State University, 194-206.

Halliday, M. A. K., A. McIntosh & P. Strevens (1964). The Linguistic Sciences and Language Teaching. London: Longman.

Jones, R. H. (1997). Beyond “listen-and-repeat”: Pronunciation teaching materials and theories of second language acquisition. In: System 25 (1997) 1, 103-112.

Kennedy, S. & P. Trofimovich (2010). Language awareness and second language pronunciation: A classroom study. In: Language Awareness 19 (2010) 3, 171-185.

Kissling, E. (2012). The effect of phonetics instruction on adult learners’ perception and production of L2 sounds. Doctoral dissertation, Georgetown University.

Lee, J., J. Jang & L. Plonsky (2014). The effectiveness of second language pronunciation instruction: A meta-analysis. In: Applied Linguistics 36 (2014) 3, 345-355.

Levis, J. M. (2009). Pronunciation teaching: Possibilities and perils. Paper presented as part of an invited colloquium, Canadian Association of Applied Linguistics, May 2009. Carleton University, Ottawa, Canada.

Levis, J. M. (2015). The journal of second language pronunciation: An essential step towards a disciplinary identity. In: Journal of Second Language Pronunciation 1 (2015) 1, 1-10.

Lord, G. (2005). (How) can we teach foreign language pronunciation? On the effects of a Spanish phonetics course. In: Hispania 88 (2005) 3, 557-567.

Moisik, S. R. (2013). The epilarynx in speech. Unpublished doctoral dissertation, University of Victoria.

Moisik, S. R., J. H. Esling, S. Bird & H. Lin (2011). Evaluating laryngeal ultrasound to study larynx state and height. In: Lee, W. S. & E. Zee (eds.). Proceedings of the 17th International Congress of Phonetic Sciences Hong Kong, 136-139.

Navarra, J. & S. Soto-Faraco (2007). Hearing lips in a second language: Visual articulatory information enables the perception of second language sounds. In: Psychological Research 71 (2007), 4-12.

Neri, A., C. Cucchiarini, H. Strik & L. Boves (2002). The pedagogy-technology interface in computer-assisted pronunciation training. In: Computer Assisted Language Learning 21 (2002) 5, 393-408.

Noguchi, M., N. Yamane, A. Tsuda, M. Kazama, B. Kim & B. Gick (2015). Towards protocols for L2 pronunciation training using ultrasound imaging. Poster presented at the 7th Annual Pronunciation in Second Language Learning and Teaching (PSLLT) Conference, Dallas, Texas. October 2015.

Olson, D. (2014). Benefits of visual feedback on segmental production in the L2 classroom. In: Language Learning and Technology 18 (2014) 3, 173-192.

Ouni, S. (2014). Tongue control and its implication in pronunciation training. In: Computer Assisted Language Learning 27 (2014) 5, 439-453.

Pillot-Loiseau, C., T. Kamiyama & T. Kocjančič Antolík (2015). French /y/-/u/ contrast in Japanese learners with/without ultrasound feedback: Vowels, non-words and words. Paper presented at the 2015 International Congress of Phonetic Sciences, Glasgow, Scotland. (http://www.icphs2015.info/pdfs/Papers/ICPHS0485.pdf; 12.08.2015).

Saito, K. (2007). The influence of explicit pronunciation instruction on pronunciation in EFL settings: The case of English vowels and Japanese learners of English. In: The Linguistics Journal 3 (2007) 3, 16-40.

Saito, K. (2011). Examining the Role of Explicit Phonetic Instruction in Native-Like and Comprehensible Pronunciation Development: An Instructed SLA Approach to L2 Phonology. In: Language Awareness 20 (2011) 1, 45-49.

Tateishi, M. & S. Winters (2013). Does ultrasound training lead to improved perception of a non-native sound contrast? Evidence from Japanese learners of English. Paper presented at the 2013 meeting of the Canadian Linguistic Association, Victoria, BC, Canada. (http://homes.chass.utoronto.ca/~cla-acl/actes2013/Tateishi_and_Winters-2013.pdf; 12.08.2015).

Thomson, R. & T. Derwing (2014). The effectiveness of L2 pronunciation instruction: A narrative review. In: Applied Linguistics 36 (2014) 3, 326-344.

Tsuda, A., K. Yonemoto & H. Hayashi (2015). Teaching pronunciation using the online pronunciation learning website eNunciate!. Paper presented at the 2015 Annual Conference of the Canadian Association for Japanese Language Education, Vancouver, BC.

Tsui, H. M. (2012). Ultrasound speech training for Japanese adults learning English as a second language. Unpublished MSc thesis, University of British Columbia.

Venkatagiri, H. S. & J. M. Levis (2007). Phonological awareness and speech comprehensibility: An exploratory study. In: Language Awareness 16 (2007) 4, 263-277.

Wilson, I. (2014). Using ultrasound for teaching and researching articulation. In: Acoustical Science and Technology 35 (2014) 6, 285-289.

Wilson, I. & B. Gick (2006). Ultrasound technology and second language acquisition research. In: Grantham O’Brien, M., C. Shea & J. Archibald (eds.). Proceedings of the 8th Generative Approaches to Second Language Acquisition Conference (GASLA 2006). Somerville, MA: Cascadilla Proceedings Project, 148-152.

Wu, Y., C. Gendrot, P. Hallé & M. Adda-Decker (2015). On improving the pronunciation of French /r/ in Chinese learners by using real-time ultrasound visualization. Paper presented at the 2015 International Congress of Phonetic Sciences, Glasgow, Scotland. (http://www.icphs2015.info/pdfs/Papers/ICPHS0786.pdf; 12.08.2015).

Yamane, N., J. Abel, B. Allen, S. Burton, M. Kazama, M. Noguchi, A. Tsuda & B. Gick (2015). Ultrasound-integrated pronunciation teaching and learning. Ultrafest VII, University of Hong Kong.

Yamane, N., P. Howson & P.-C. Wei (2015). An Ultrasound Examination of Taps in Japanese. In: The Scottish Consortium for ICPhS 2015 (ed.). Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: University of Glasgow.

* We thank all members of the eNunciate project team, including Jennifer Abel, Blake Allen, Sonya Bird, Lauretta Cheng, Misuzu Kazama, Bosung Kim, Masaki Noguchi, Raymond Pai, Murray Schellenberg, Asami Tsuda, and Yik Tung Wong. We also thank our community partners from WSÁNEĆ (Lou Claxton, PEPAḴIYE Ashley Cooper, ÍYIXELTW̱ Nick Henry, Tiffany Śwxeloselwet Joseph, Katia Olson, and Tye Swallow), Stó:lō (Siyamiateliyot Elizabeth Phillips), and Splatsin (Ntlola Emmeline Felix, Aaron Leon, Rosalind Williams, as well as liaison Christine Schreyer). This work has been supported by NIH Grant DC-02717 to Haskins Laboratories, the UBC Teaching and Learning Enhancement Fund, the UBC Faculty of Arts, and a Banting Fellowship to the first author.


Dr. Heather Bliss

Banting Postdoctoral Fellow

Department of Linguistics

University of Victoria

Email: hbliss@uvic.ca

Khia A. Johnson

Doctoral Student

Department of Linguistics

University of British Columbia

Email: khia.johnson@gmail.com

Dr. Strang Burton


Department of Linguistics

University of British Columbia

Email: strang.burton@ubc.ca

Dr. Noriko Yamane

Associate Professor

Graduate School of Integrated Arts and Sciences

Hiroshima University

Email: yamanen@hiroshima-u.ac.jp

Dr. Bryan Gick


Department of Linguistics

University of British Columbia

Email: gick@mail.ubc.ca