I’ve long felt that words carry distinct vibrations: "war" resonates differently than "clouds." That led me to wonder whether the inner dialogue we speak or the poetry we write has a measurable sonic character, even as those vibrations shift across cultures and languages. Phonetic Portrait is a generative sound experience that translates spoken language into music by analyzing phonetic structure rather than semantic meaning.
Keywords: p5.js, creative computing, ML - speech recognition
Process
First Iteration:
Central Idea/Question: I wanted to focus on creating a vibration for each word. How can we give each word a character and represent that character as a vibration?
Ideation: I typically start conceptualizing by jotting down thoughts on execution; in this case, how I wanted to represent the character of each word.
Implementation: Using the p5.js ML Speech Recognition library, speech is:
Converted to text
Parsed into letters
Classified letter by letter as vowel or consonant, with each letter assigned a weighted value (vowels = 5, consonants = 2)
The consonant percentage selects the audio sample, while accumulated vowel values modulate playback timing. As speech continues, these values compound, producing an evolving soundscape (see the sketch below).
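To make the pipeline concrete, here is a minimal sketch of the first iteration's logic. It assumes the p5.speech library (p5.SpeechRec) for recognition and p5.sound for playback; the sample filenames and the exact way the two weights combine are my placeholders, not the project's actual values.

```javascript
// Minimal sketch of the first iteration's pipeline, assuming p5.SpeechRec
// (p5.speech) for recognition and p5.sound for playback. Sample filenames
// and the way the two weights combine are placeholders.
const VOWELS = 'aeiou';
let samples = [];          // graded from low to high consonant density
let vowelAccumulator = 0;  // compounds as speech continues

function preload() {
  // placeholder files; any set of short samples would work
  samples = ['soft.wav', 'mid.wav', 'hard.wav'].map((f) => loadSound(f));
}

function setup() {
  noCanvas();
  const rec = new p5.SpeechRec('en-US', () => handleSpeech(rec.resultString));
  rec.continuous = true;
  rec.start();
}

function mousePressed() {
  userStartAudio(); // browsers require a user gesture before audio can play
}

function handleSpeech(text) {
  const letters = text.toLowerCase().replace(/[^a-z]/g, '');
  if (!letters.length) return;

  let vowelValue = 0;
  let consonantValue = 0;
  for (const ch of letters) {
    if (VOWELS.includes(ch)) vowelValue += 5; // vowels weighted 5
    else consonantValue += 2;                 // consonants weighted 2
  }
  vowelAccumulator += vowelValue; // accumulates across utterances

  // weighted consonant share selects the sample (assumed combination)
  const share = consonantValue / (consonantValue + vowelValue);
  const sample = samples[min(samples.length - 1, floor(share * samples.length))];

  // accumulated vowel value modulates playback rate, so the soundscape evolves
  sample.rate(map(vowelAccumulator % 400, 0, 400, 0.6, 1.4));
  sample.play();
}
```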
Playtesting:
Second Iteration:
Pivot Question: For the second iteration I asked, how can we get closer to the vision of the words themselves creating the sound?
Ideation: Back to the notebook, only this time without a G2 pen (rare sighting)!
Implementation: In this second iteration, speech is converted into sound using oscillators and envelopes. Each word is mapped to a note in the C major pentatonic scale so the output remains harmonically cohesive. Letters are grouped by phonetic type (vowel, plosive, fricative, affricate, nasal, glide, or liquid) and assigned estimated frequencies and durations based on linguistic literature and general phonetic conventions. These values are averaged per word to determine its pitch and length, creating a musical representation of speech that reflects phonetic texture.
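Here is a sketch of how that mapping might look with p5.sound's Oscillator and Envelope. The per-group frequency and duration estimates are placeholders standing in for the literature-derived values, and the letter-based grouping is a simplification of real phoneme classes.

```javascript
// Sketch of the second iteration's word-to-note mapping, assuming p5.sound.
// The per-group frequencies (Hz) and durations (s) below are placeholder
// estimates, not the project's actual values.
const PENTATONIC = [261.63, 293.66, 329.63, 392.0, 440.0]; // C4 D4 E4 G4 A4

// crude letter-to-phonetic-type grouping; a real version would use phonemes
const GROUPS = [
  { type: 'vowel',     letters: 'aeiou',  freq: 500,  dur: 0.30 },
  { type: 'plosive',   letters: 'pbtdkg', freq: 2000, dur: 0.08 },
  { type: 'fricative', letters: 'fvszh',  freq: 4000, dur: 0.15 },
  { type: 'affricate', letters: 'jc',     freq: 2500, dur: 0.12 },
  { type: 'nasal',     letters: 'mn',     freq: 300,  dur: 0.20 },
  { type: 'glide',     letters: 'wy',     freq: 700,  dur: 0.18 },
  { type: 'liquid',    letters: 'lr',     freq: 900,  dur: 0.22 },
];

let osc, env;

function setup() {
  noCanvas();
  osc = new p5.Oscillator('sine');
  osc.amp(0);
  osc.start();
  env = new p5.Envelope();
  env.setADSR(0.02, 0.1, 0.5, 0.2); // attack, decay, sustain level, release
  env.setRange(0.8, 0);
}

function playWord(word) {
  let freqSum = 0, durSum = 0, matched = 0;
  for (const ch of word.toLowerCase()) {
    const g = GROUPS.find((g) => g.letters.includes(ch));
    if (g) { freqSum += g.freq; durSum += g.dur; matched++; }
  }
  if (!matched) return;

  // average the group estimates per word, then snap the result onto the
  // pentatonic scale so the output stays harmonically cohesive
  const avgFreq = freqSum / matched;
  const idx = constrain(floor(map(avgFreq, 300, 4000, 0, PENTATONIC.length)),
                        0, PENTATONIC.length - 1);
  osc.freq(PENTATONIC[idx]);
  env.play(osc, 0, durSum / matched); // sustain for the averaged duration
}
```

In practice this would be driven by the same speech-recognition callback as the first iteration, calling playWord once per recognized word.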
Playtesting:
Future/Challenges/Learnings: With no prior experience using oscillators or envelopes, I combined three oscillators and experimented by trial and error. I struggled with the overly electronic sound, wanting the audio to feel rooted in the words themselves rather than assigning meaning to specific tracks. After some feedback, though, I realized there is nothing wrong with using samples, especially once I concluded that I don't like the purely digital sounds. As for next steps, I'm applying to "Open Legends" by Black Interactive with an iteration of this piece for the Black History Month exhibit. I plan to present the piece in a phone-booth setup, giving users headphones and a microphone so they can speak to themselves and immerse themselves in the sounds.
Some iterations I'll make to the final version:
Add prompts to guide the user, while offering the choice to freestyle
Add multiple language options
Make the initial visual inviting