10/30: Universal translation & AI that speaks

Laura Romig, Language Ambassador, Class of 2025

Imagine this scenario: You can speak any language in the world, at any time. Within seconds, something you've written or said in your native language can be spoken out loud, in your voice, with your inflections and pitch, in a completely different language you've never learned or even heard of. It sounds almost utopian, a universal-language black box straight out of science fiction, like the Babel fish in The Hitchhiker's Guide to the Galaxy. And yet, in today's world, it's already a reality. Sort of.


Several companies, from technology giants like Meta to specialized startups like Resemble.AI, now offer tools that can replicate your voice in dozens of languages. From just a few seconds of recorded speech, these tools can generate new clips in an AI voice that resembles yours, speaking any of the supported languages.


When this sort of technology appears in science fiction, the usual implication is that language differences no longer matter: thought has been universalized, and you can hear anyone speak, naturally enough, in your native language. It also seems to imply that there is no longer any need to learn languages, at least for those who can afford the technology. That is the conclusion some AI optimists are drawing from tools like these.


What does this mean for our current world? While AI voice technology isn't quite at the fantastic level of science fiction, it's good enough that it could easily be mistaken for a human speaker. You may have already heard AI-voiced speech in public spaces, on websites with audio options, or elsewhere. And the technology is convincing enough that prominent political figures, such as Mayor Eric Adams of New York, are using it to disseminate information to citizens.


This type of technology allows people like Mayor Adams to present themselves to voters as speakers of other languages: he used Yiddish, Spanish, and Mandarin, all of which have large populations of speakers in New York. It does accomplish one goal of linguistic justice and accessibility: offering information to citizens in their first languages and acknowledging the linguistic diversity that exists in our country. But it does so through technology and generated voices, not through the care and inclusion of people who actually speak those languages. Furthermore, this seemingly noble goal of accessibility and justice is difficult to separate from the goal of promoting one's campaign and reaching more voters for personal gain.


Beyond disseminating public information, what are the other practical applications of technology like this? One startup offering AI-voice technology in different languages, ElevenLabs, said it hoped the technology could be used for narrating audiobooks and "eliminate... linguistic barriers to content." But why rely on AI-generated voices when there are real, living speakers of these languages out in the world who could be hired to read the audiobook, or to translate the content? Relying on AI instead might achieve the exact opposite: in trying to spread a language further, it neglects the actual speakers and communities who use it, denies them professional and creative opportunities, and potentially even hinders the development and growth of that language.


What about personal use: traveling in countries where a language you don't know is spoken, or interacting with people you share no language with, whether in business or personal settings? It's a shortcut, a quick fix like using Google Translate abroad, but with the added element of human-sounding speech. Yet that convenience keeps your brain from even trying to understand a language you don't speak, or to begin grasping the cultural elements and practices embedded in any language.


In science fiction, universal voice translators connect directly to your thoughts and let you express yourself authentically in a language you don't know; the fantastical element lets us skip wondering how exactly the technology produces words in the unspoken language, or translates them back for the listener. In reality, the machines aren't neutral bits of magic serving the plot: their translations, and their use of your voice, come from software written by humans, shaped both by the input you give it and by the data it has been given in the past. I've already discussed the implications of this in previous articles: certain languages have more or better data, or receive more attention, while others are neglected.


I'll add to that: some languages may be given political preference, to influence communities of voters, or commercial preference, to target segments of the market. Language could become a commodity: buy the entirety of a language, at your fingertips to speak through a machine, for $19.99 a month... and neglect all the cultural knowledge and personal connections that make learning a language meaningful. Using AI-generated language also means missing out on the documented benefits of learning a language, including cognitive flexibility and empathy. Ultimately, technology like this is an impressive tool and product, but it reduces language to the same thing. And language has never been just a tool or a product.


For more, read about the legal implications of AI-generated voices. And if you're inclined, email me and tell me what you think about AI-voiced language and the role of language learning in a world of "universal translation."