Importance of Sound and Linguistics
The Universal Design for Learning (UDL) guidelines (CAST, 2018) state that “sound is a particularly effective way to convey the impact of information, which is why sound design is so important in movies and why the human voice is particularly effective for conveying emotion and significance.”
Morgan (2021) summarizes Gardner's linguistic intelligence as the intelligence most closely tied to a person’s ability to solve a problem or do something considered valuable in one or more cultures. It appears to be the one intelligence most widely shared by humans across the world, because without linguistic skills (in semantics, phonology, syntax, and pragmatics) people would have difficulty functioning effectively in the world.
In the following sections, we'll explore how Eleven Labs' Text-to-Speech (TTS) and Speech-to-Speech (STS) technologies connect to learning design and technology theories, and what the research on these technologies suggests about their impact on learners of different linguistic skill levels and learning abilities.
Speech's Role in Learning Design and Knowledge Acquisition
Learners encode the sounds the words make. This is one of the reasons why much of what we teach young children is done through song, rhyme, and rhythm (Spielman et al., 2018). Eleven Labs' TTS and STS technologies let teachers and instructors help their learners encode prepared written or oral information: the AI voice generator creates natural, realistic-sounding voices, and its precision tuning tools set whatever pacing makes sense for the audience.
Ertmer and Newby (2018) note that constructivism assumes transfer can be facilitated by involvement in authentic tasks anchored in meaningful contexts. Similarly, three themes are often identified with Vygotsky’s ideas of sociocultural learning: (1) human development and learning originate in social, historical, and cultural interactions, (2) use of psychological tools, particularly language, mediates development of higher mental functions, and (3) learning occurs within the Zone of Proximal Development (Polly et al., 2018). In summary, learning occurs on deeper levels in relatable, real-life contexts. With Eleven Labs' AI voice generation, educators can immerse their audience more deeply in a specific auditory context by drawing on its extensive AI voice library and modulation controls (e.g., regional accents and tones).
TTS in Research
Few studies have examined the newer TTS/STS technologies. One common theme in the few resources found centers on the model of short-term memory (also known as working memory) proposed by Baddeley and Hitch (1974). In this model, a central executive part of memory supervises or controls the flow of information to and from the three short-term systems: a visuospatial sketchpad, an episodic buffer, and a phonological loop (Spielman et al., 2018). Extending this model is Cognitive Load Theory, which posits that multimedia learning occurs with limited working memory resources and distinguishes three types of load: intrinsic, extraneous, and germane (Kalyuga, 2011, as cited in Liew et al., 2023).
In researching the impact of TTS on students with reading disorders, Bonifacci et al. (2022) found that in the text-to-speech condition, both groups (students with and without dyslexia) showed better reading comprehension and reduced rates of mind wandering. Students with dyslexia were significantly more on task in the text-to-speech condition than in the self-paced reading condition. The authors also noted that TTS would be helpful to all students, not just students with dyslexia, when engaging with long or difficult material that evokes low motivation. Eleven Labs' TTS and STS technologies therefore have a clear place in the reading impairment space, as they make it easy for educators to add another modality to aid reading comprehension. Relating this back to the working memory model, TTS and STS provide a phonological stimulus that helps the central executive encode the information into memory, improving retention and comprehension.
Research from de Almeida et al. (2022) suggests that TTS can facilitate pronunciation practice, in terms of both perception and production. De Almeida et al. (2022) highlight that pronunciation inaccuracies result in communication breakdowns, and that language users must improve their pronunciation abilities in order to deliver and comprehend L2 speech. Tailoring pronunciation instruction to each individual student's needs in a classroom environment, however, is too taxing on the instructor's time, since those needs can all differ. Liew et al. (2023), citing Davis et al. (2019), indicate that voice prosody (vocal qualities such as pitch, tempo, stress, intonation, melody, loudness, accent, and pause) can influence cognitive load, specifically germane cognitive load, among non-native English speakers. Taken together, these two articles suggest that Eleven Labs is a strong candidate for helping language educators, since it provides an AI-generated library of thousands of different voices and TTS capabilities in 29 different languages.
In stark contrast to the other two research articles, Liew et al. (2023) make the case for a neutral style of TTS. While this review of Eleven Labs' technology has highlighted the depth and diversity of its many TTS/STS features, the research by Liew et al. draws attention to what may be the most important aspect of Eleven Labs' technology: the subtle but significant difference that the quality and customizability of TTS/STS prosody can make. Liew et al. (2023) find that for lower-level and/or non-native English speakers, a voice imbued with strong prosodic cues (a strong-prosodic voice) can impede fluent understanding of the intended content. They summarize Plass and Kalyuga's 'Emotions as Suppressor' view, which suggests that processing emotions can impose extraneous cognitive load that competes for working memory resources during multimedia learning.
References
Bonifacci, P., Colombini, E., Marzocchi, M., Tobia, V., & Desideri, L. (2022). Text‐to‐speech applications to reduce mind wandering in students with dyslexia. Journal of Computer Assisted Learning, 38(2), 440–454. https://doi.org/10.1111/jcal.12624
CAST. (2018). Universal Design for Learning Guidelines version 2.2. Retrieved from http://udlguidelines.cast.org
de Almeida, J. F., Gottardi, W., & Tumolo, C. H. S. (2022). Automatic Speech Recognition and Text-to-Speech Technologies for L2 Pronunciation Improvement: Reflections on their Affordances. Texto Livre, 15, e36736. https://doi.org/10.35699/1983-3652.2022.36736
Ertmer, P. A., & Newby, T. (2018). Behaviorism, Cognitivism, Constructivism: Comparing Critical Features From an Instructional Design Perspective. In R. E. West (Ed.), Foundations of Learning and Instructional Design Technology: The Past, Present, and Future of Learning and Instructional Design Technology. EdTech Books. https://edtechbooks.org/lidtfoundations/behaviorism_cognitivism_constructivism
Liew, T. W., Tan, S.-M., Pang, W. M., Khan, M. T. I., & Kew, S. N. (2023). I am Alexa, your virtual tutor!: The effects of Amazon Alexa’s text-to-speech voice enthusiasm in a multimedia learning environment. Education and Information Technologies, 28(2), 1455–1489. https://doi.org/10.1007/s10639-022-11255-6
Morgan, H. (2021). Howard Gardner’s Multiple Intelligences Theory and His Ideas on Promoting Creativity. Distributed by ERIC Clearinghouse.
Polly, D., Allman, B., Casto, A., & Norwood, J. (2018). Sociocultural Perspectives of Learning. In R. E. West (Ed.), Foundations of Learning and Instructional Design Technology: The Past, Present, and Future of Learning and Instructional Design Technology. EdTech Books. https://edtechbooks.org/lidtfoundations/sociocultural_perspectives_of_learning
Spielman, R., Dumper, K., Jenkins, W., Lacombe, A., Lovett, M., & Perlmutter, M. (2018). Memory. In R. E. West (Ed.), Foundations of Learning and Instructional Design Technology. EdTech Books. https://edtechbooks.org/lidtfoundations/memory
ElevenLabs. (n.d.). Text to Speech & AI Voice Generator. https://elevenlabs.io/