SLaTE 2019

8th Workshop on Speech and Language Technology in Education

Program

A summary of the SLaTE 2019 schedule is presented immediately below; details of the presentations included in each session appear further down the page. Note that oral presentation slots are 25 minutes, with a target of 20 minutes for the presentation and 5 minutes for discussion.

Friday, September 20

8:00 - 8:30: Registration

8:30 - 8:45: Welcome remarks

8:45 - 9:45: Keynote Presentation (Martin Russell)

9:45 - 10:15: Coffee break

10:15 - 11:55: Paper Session I (Spoken CALL Shared Task)

11:55 - 13:00: Lunch (buffet lunch provided to SLaTE attendees)

13:00 - 14:15: Paper Session II

14:15 - 14:45: Coffee break

14:45 - 15:45: Demo Session

15:45 - 16:15: Sponsor presentations

16:15 - 17:00: SLaTE General Assembly and Panel Discussion

17:00 - 22:00: SLaTE banquet for all registered participants at the Buschenschank Dorner in the beautiful "Styrian Tuscany" wine region (bus transportation will be provided)

Saturday, September 21

8:30 - 9:45: Paper Session III

9:45 - 11:00: Coffee break and Poster Session

11:00 - 12:00: Keynote Presentation (Dorothy Chun)

12:00 - 12:45: Lunch (buffet lunch provided to SLaTE attendees)

12:45 - 14:00: Paper Session IV

14:00 - 14:15: Coffee break

14:15 - 15:30: Paper Session V

15:30 - 15:45: Closing remarks

Keynote Speakers

Martin Russell, Professor of Information Engineering, School of Computer Science, University of Birmingham

  • Time: Friday, September 20, 8:45 - 9:45
  • Title: Identification of phonetic structure in deep neural networks used in speech recognition (and why this is important for SLaTE)
  • Abstract: Automatic speech recognition (ASR) is a fundamental technology for SLaTE. In applications such as computer-assisted language learning, reading or pronunciation tuition, it is the key enabling technology rather than just an alternative means to interact with a computer. Progress in ASR over the past 40 years is characterised by steady, incremental scientific advances, punctuated by more fundamental changes. The latter include the shift from explicit use of human knowledge to more formal mathematical methods at the end of the 1970s, the almost universal adoption of statistical techniques based on hidden Markov models (HMMs) at the end of the 1980s, and the rise of deep neural networks (DNNs) at the start of the 21st century. The trend is to abandon the use of human knowledge in favour of machine learning, enabled by improvements in computer technology and the availability of ever larger speech corpora. Unfortunately, as our reliance on human knowledge diminishes, so does our ability to understand how our ASR systems work from the perspective of speech and, in particular, phonetics. Researchers whose primary goal is better ASR performance may argue that this doesn’t really matter, but this is not the case in SLaTE applications, where a detailed understanding of why an ASR system has made a particular decision may be crucial for providing the user with feedback.

This talk is about understanding, from the perspective of speech science, how the DNNs used in ASR represent speech. I will begin with results of experiments indicating that visualization of the patterns of activity in a low-dimensional “bottleneck” layer of a DNN can be interpreted in phonetic terms. Then I will consider evidence that this interpretation is robust and useful, in that it is maintained in a topological sense across different versions of the same DNN trained on the same and different speech corpora. Next, I will consider how these results might be exploited to provide phonetically useful feedback in SLaTE applications, and I will propose a particular example from language acquisition in children.

In the final part of the talk I will briefly discuss how these results suggest that a particular type of mathematical object, called a topological manifold, might provide a new model of “acoustic speech space” and present the results from ASR experiments using a variant of the basic DNN-HMM structure that is inspired by these ideas.
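
As a purely illustrative aside (not part of the talk), the kind of bottleneck-layer visualization described in the abstract above could be sketched as follows: given frame-level bottleneck activations and frame-level phone labels (for example from a forced alignment), project the activations to two dimensions and colour the frames by phone. The array names bn and phones and the use of a PCA projection are assumptions made for this sketch, not details of the speaker's experiments.

    # Illustrative sketch only: plot bottleneck-layer activations grouped by phone.
    # Assumes frame-level bottleneck features `bn` (N x d) and phone labels
    # `phones` (N,) are already available, e.g. from a forced alignment.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def plot_bottleneck_space(bn: np.ndarray, phones: np.ndarray) -> None:
        """Project bottleneck activations to 2-D and colour each frame by its phone."""
        xy = PCA(n_components=2).fit_transform(bn) if bn.shape[1] > 2 else bn
        for phone in np.unique(phones):
            sel = phones == phone
            plt.scatter(xy[sel, 0], xy[sel, 1], s=2, label=str(phone))
            # Mark each phone cluster's centroid; phonetically similar phones
            # tend to occupy neighbouring regions of this space.
            cx, cy = xy[sel].mean(axis=0)
            plt.annotate(str(phone), (cx, cy))
        plt.title("Bottleneck-layer activations by phone")
        plt.show()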

Dorothy Chun, Professor of Education, University of California, Santa Barbara

  • Time: Saturday, September 21, 11:00 - 12:00
  • Title: It’s not what you say, it’s how you say it (or Der Ton macht die Musik): Visualizing tone and intonation in L2 speech
  • Abstract: Based on my role as Editor-in-Chief of the journal Language Learning & Technology and on my own research in the field of Computer-Assisted Pronunciation Teaching (CAPT), I will provide a brief overview of current research on using speech technologies for second language learning (O’Brien et al., 2018), including automatic speech recognition (Van Doremalen et al., 2013) and text-to-speech (Hilbert et al., 2010). I will then focus on one of the most challenging aspects of speech for L2 learners, namely prosody, and will specifically outline how technology might be used to help with the perception and production of L2 tone and intonation, sub-components of prosody. Prosody has been shown to be even more important than individual sounds for comprehensibility and intelligibility of (L2) speech (Munro & Derwing, 2011), yet it is difficult to teach and learn. My own projects and research have involved the development and user studies of software and apps for teaching tone and intonation using multimodal technologies that provide learners with audio and visual input and feedback (Chun et al., 2015; Chun & Levis, forthcoming; Niebuhr et al., 2017). What is needed for the future are training and assessment tools for L2 intonation at the discourse level (e.g., Hardison, 2005, 2018; Wang et al., 2017). A critical component will be how to provide usable feedback to learners based on the acoustic visualizations. I will suggest ways that applied linguists and speech technologists can fruitfully collaborate to advance cutting-edge technologies for contextualized utterances and visualizations of intonation patterns that concretely display how “it’s not what you say but how you say it.”


Paper Session I (Spoken CALL Shared Task): Friday, Sept. 20, 10:15 - 11:55

Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei. Overview of the 2019 Spoken CALL Shared Task.

Daniele Falavigna, Roberto Gretter and Marco Matassoni. The FBK system for the 2019 Spoken CALL Shared Task.

Mengjie Qian, Peter Jancovic and Martin Russell. The University of Birmingham 2019 Spoken CALL Shared Task Systems: Exploring the importance of word order in text processing.

Volodymyr Sokhatskyi, Olga Zvyeryeva, Ievgen Karaulov and Dmytro Tkanov. Embedding-based system for the Text part of CALL v3 shared task.

Paper Session II (Pronunciation): Friday, Sept. 20, 13:00 - 14:15

Adriana Guevara-Rukoz, Alexander Martin, Yutaka Yamauchi and Nobuaki Minematsu. Prototyping a web-based phonetic training game to improve /r/-/l/ identification by Japanese learners of English.

Lei Chen, Qianyong Gao, Qiubing Liang, Jiahong Yuan and Yang Liu. Automatic Scoring Minimal-Pair Pronunciation Drills by Using Recognition Likelihood Scores and Phonological Features.

Aparna Srinivasan, Chiranjeevi Yarra and Prasanta Kumar Ghosh. Automatic assessment of pronunciation and its dependent factors by exploring their interdependencies using DNN and LSTM.

Demo Session: Friday, Sept. 20, 14:45 - 15:45

Chiranjeevi Yarra and Prasanta Kumar Ghosh. voisTUTOR: Virtual Operator for Interactive Spoken English TUTORing.

Elham Akhlaghi Baghoojari, Branislav Bédi, Matthias Butterweck, Cathy Chua, Johanna Gerlach, Hanieh Habibi, Junta Ikeda, Manny Rayner, Sabina Sestigiani and Ghil'ad Zuckermann. Demonstration of LARA: A Learning and Reading Assistant.

Ralph Rose. Fluidity: Developing second language fluency with real-time feedback during speech practice.

Gary Yeung, Alison L. Bailey, Amber Afshan, Morgan Tinkler, Marlen Q. Pérez, Alejandra Martin, Anahit A. Pogossian, Samuel Spaulding, Hae Won Park, Manushaqe Muco, Abeer Alwan and Cynthia Breazeal. A robotic interface for the administration of language, literacy, and speech pathology assessments for children.

Paper Session III (Comprehensibility, Intelligibility, and Dialect Classification): Saturday, Sept. 21, 8:30 - 9:45

Zhenchao Lin, Yusuke Inoue, Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito and Nobuaki Minematsu. Native Listeners' Shadowing of Non-native Utterances as Spoken Annotation Representing Comprehensibility of the Utterances.

Wei Xue, Catia Cucchiarini, Roeland van Hout and Helmer Strik. Acoustic correlates of speech intelligibility: the usability of the eGeMAPS feature set for atypical speech.

Johanna Dobbriner and Oliver Jokisch. Implementing and evaluating methods of dialect classification on read and spontaneous German speech.

Poster Session: Saturday, Sept. 21, 9:45 - 11:00

Jorge Proença, Ganna Raboshchuk, Ângela Costa, Paula Lopez-Otero and Xavier Anguera. Teaching American English pronunciation using a TTS service.

Fred Richardson, John Steinberg, Gordon Vidaver, Steve Feinstein, Ray Budd, Jennifer Melot, Paul Gatewood and Douglas Jones. Corpora Design and Score Calibration for Text Dependent Pronunciation Proficiency Recognition.

Sweekar Sudhakara, Manoj Kumar Ramanathi, Chiranjeevi Yarra, Anurag Das and Prasanta Kumar Ghosh. Noise robust goodness of pronunciation (GoP) measures using teacher's utterance.

Yiting Lu, Katherine Knill, Mark Gales, Potsawee Manakul and Yu Wang. Disfluency Detection for Spoken Learner English.

Chiranjeevi Yarra, Manoj Kumar Ramanathi and Prasanta Kumar Ghosh. Comparison of automatic syllable stress detection quality with time-aligned boundaries and context dependencies.

Ray Budd, Tamas Marius, Doug Jones and Paul Gatewood. Using K-Means in SVR-Based Text Difficulty Estimation.

Prasanna Kothalkar, Dwight Irvin, Ying Luo, Joanne Rojas, John Nash, Beth Rous and John Hansen. Tagging child-adult interactions in naturalistic, noisy, daylong school environments using i-vector based diarization system.

Paper Session IV (CALL Systems and Reading Prosody): Saturday, Sept. 21, 12:45 - 14:00

Neasa Ní Chiaráin and Ailbhe Ní Chasaide. An Scéalaí: autonomous learners harnessing speech and language technologies.

Elham Akhlaghi Baghoojari, Branislav Bédi, Matt Butterweck, Cathy Chua, Johanna Gerlach, Hanieh Habibi, Junta Ikeda, Manny Rayner, Sabina Sestigiani and Ghil'ad Zuckermann. Overview of LARA: A Learning and Reading Assistant.

Erika Godde, Gerard Bailly and Marie-Line Bosse. Reading Prosody Development: Automatic Assessment for a Longitudinal Study.

Paper Session V (Classroom): Saturday, Sept. 21, 14:15 - 15:30

Mohamed El Hajji, Morgane Daniel and Lucile Gelin. Transfer Learning based Audio Classification for a noisy and speechless recordings detection task, in a classroom context.

Helmer Strik, Anna Ovchinnikova, Camilla Giannini, Angela Pantazi and Catia Cucchiarini. Student’s acceptance of MySpeechTrainer to improve spoken academic English.

Satoshi Kobashikawa, Atushi Odakura, Takao Nakamura, Takeshi Mori, Kimitaka Endo, Takafumi Moriya, Ryo Masumura, Yushi Aono and Nobuaki Minematsu. Does Speaking Training Application with Speech Recognition Motivate Junior High School Students in Actual Classroom? -- A Case Study.