Sonification Algorithm

The DSP Tools We Used

Out of Class Tools We Used

High-Level Algorithm

The figure on the right provides a high-level overview of our algorithm. Our project can be used to sonify both text and images. Both routes ultimately build on our sonify_character() algorithm, which forms the foundation of our work.

Sonify_Character() 

To develop our sonify_character() algorithm, we went through the following steps. We start by creating a binary ASCII matrix to represent an image of the character. Alongside that matrix, we create an equally sized matrix of musical notes based on a pre-seeded musical chord, assigning pitches upward from the chord root. We then overlay the ASCII representation of the character onto the musical note matrix using element-wise multiplication, which identifies all the musical notes present in the spatial representation of the character. Next, we create a MIDI chord object for each matrix column, going from left to right. Lastly, we synthesize audio from the MIDI chord objects using a square-wave wavetable synthesizer.
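The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not our actual implementation: the 5x3 bitmap font for "T" and the C-major pitch stack are stand-in assumptions, and real MIDI object creation and synthesis are omitted.

```python
import numpy as np

# Binary ASCII matrix for a stand-in 5x3 glyph of "T" (1 = part of the glyph).
glyph = np.array([
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])

# Note matrix: MIDI pitches of a pre-seeded chord (here C major, an
# illustrative choice), assigned upward from the chord root, so the
# lowest matrix row holds the highest pitch visually on top.
chord = [60, 64, 67, 72, 76]                      # C4, E4, G4, C5, E5
notes = np.tile(np.array(chord[::-1])[:, None], (1, glyph.shape[1]))

# Overlay: element-wise multiplication keeps only the pitches where
# the glyph has "ink".
overlay = glyph * notes

# One chord per matrix column, scanned left to right; each chord is
# the set of surviving pitches in that column.
def column_chords(overlay):
    return [sorted(overlay[overlay[:, c] > 0, c].tolist())
            for c in range(overlay.shape[1])]

chords = column_chords(overlay)
```

Each entry of `chords` would then become one MIDI chord object before synthesis.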


The figure on the right shows this process for the letter "W", with the ASCII representation, musical note matrix, overlaid note matrix, MIDI chord object representation, and audio spectrogram, in that order.

Issues with This Algorithm

While this algorithm proved to be effective, we ran into trouble with letters like "H" or "A", where the "middle bars" of the letters were hard to distinguish. Upon further research, we learned that the human ear is drawn to high, varying melodies and not to static inner voices. Thus, to make similar elements easier to identify aurally, we determined we needed to implement a static contour amplification algorithm. We decided to tackle this problem with an image-processing-based approach, implementing a horizontal edge detector to identify static contours in our ASCII character.

Image-Based Approach

When creating our edge detection algorithm, we wanted to find long horizontal segments in the image of our letter. To do this, we used a horizontal gradient edge-detection kernel, which we convolved with the ASCII character representation in a 2D convolution. Once the edges were detected, we applied a gain to them that increases from left to right up to a pre-set, experimentally determined maximum value. These gains are then applied to the MIDI messages used by the synthesizer, amplifying static contours in the final generated audio.
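A minimal sketch of this step, under illustrative assumptions: a simple vertical finite difference stands in for the gradient kernel (its magnitude responds to horizontal bars), and the gain ceiling of 2.0 is a placeholder for the experimentally determined maximum.

```python
import numpy as np

def detect_horizontal_edges(glyph):
    """Vertical finite difference; its magnitude marks horizontal bars."""
    return np.abs(np.diff(glyph.astype(float), axis=0, prepend=0))

def contour_gains(edges, max_gain=2.0):
    """Ramp gain from 1 up to max_gain, left to right, on detected edges."""
    h, w = edges.shape
    ramp = np.broadcast_to(np.linspace(1.0, max_gain, w), (h, w))
    gains = np.ones((h, w))
    mask = edges > 0
    gains[mask] = ramp[mask]        # edge pixels get the increasing gain
    return gains

# Stand-in 5x3 bitmap of "H": two verticals plus the middle bar.
H = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 1],
])
edges = detect_horizontal_edges(H)
gains = contour_gains(edges)
```

The resulting `gains` matrix would scale the velocities of the corresponding MIDI messages before synthesis.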

On the right is an image of the edge detection algorithm applied to the letter H. From top to bottom, we see the detected edges, the calculated gains, and the final spectrogram of the generated audio signal (where the intensity of the middle bar increases from left to right).

Sonify_Word()

To sonify words, we apply our sonify_character() algorithm to every letter in a word. As we do so, we vary the pan across the stereo field, so a listener hears the generated audio move from left to right. The image on the right shows a spectrogram and chromagram of the word "HI". From the chromagram, we can identify the presence of different musical notes in the frequency domain.
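The panning idea can be sketched as follows. This is an illustrative assumption about the pan law (constant-power panning) rather than our exact implementation, and the per-letter signals are stand-in arrays rather than real synthesized audio.

```python
import numpy as np

def pan_letters(letter_signals):
    """Place each letter's mono audio progressively left to right."""
    n = len(letter_signals)
    out = []
    for i, mono in enumerate(letter_signals):
        p = 0.5 if n == 1 else i / (n - 1)   # 0 = hard left, 1 = hard right
        theta = p * np.pi / 2
        left = np.cos(theta) * mono          # constant-power pan law
        right = np.sin(theta) * mono
        out.append(np.stack([left, right]))
    # Letters play one after another, sweeping across the stereo field.
    return np.concatenate(out, axis=1)       # shape (2, total_samples)

word = [np.ones(4), np.ones(4)]              # two stand-in "letters"
stereo = pan_letters(word)
```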

Sonify_Phrase() & Sonify_Text() 

When sonifying multiple words or a phrase, it is not as simple as calling sonify_word() for each word. This is because sentences are more than words concatenated together: each sentence is composed of constituent phrases, and Western musical syntax is often organized similarly. Therefore, we chose to assign a single chord to each word of the phrase. To determine the chord progression for moving between words, we attempt to create a connection between hierarchies of words and hierarchies of chords. We do this by classifying the sentence by a type of emotion and determining its tonal center.
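One way this mapping could look in code, purely as a hedged sketch: the emotion-to-key table and the I-IV-V-I progression below are illustrative placeholders, not our actual harmony rules or emotion classifier.

```python
# Tonal centers (MIDI root pitches) chosen per classified emotion.
# These pairings are illustrative assumptions.
EMOTION_TO_KEY = {"happy": 60, "sad": 57}    # e.g. C major / A minor

# Semitone offsets of a simple I-IV-V-I progression, cycled over the words.
PROGRESSION = [0, 5, 7, 0]

def word_chords(words, emotion):
    """Assign one chord per word, walking the progression from the tonal center."""
    root = EMOTION_TO_KEY[emotion]
    chords = []
    for i, w in enumerate(words):
        r = root + PROGRESSION[i % len(PROGRESSION)]
        chords.append((w, [r, r + 4, r + 7]))   # major triad on that degree
    return chords

assignments = word_chords(["HELLO", "WORLD"], "happy")
```

Each assigned triad would then seed the note matrix used by sonify_character() for that word's letters.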

Below, we see this implemented with the phrase "HELLO WORLD, WE MEET AGAIN." From the chromagram, we can clearly see that each individual word corresponds to a unique musical chord.

Sonify_Images() 

Using sonify_character(), we can also sonify low-resolution (simple) images. To do this, we first convert the image to 0-1 grayscale. We then smooth it with a Gaussian-blur low-pass filter to prevent aliasing in the next step, and downsample the image by discarding every other pixel. We repeat this blur-and-downsample process until an image of the desired resolution is obtained. Finally, we round each pixel value to either 0 or 1 and apply the sonify_character() algorithm to the result. The picture on the right shows the progression of taking a simple image of a pair of glasses, blurring and filtering it, and converting it into a MIDI file to sonify.
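The blur-and-downsample loop can be sketched like this. As assumptions for illustration, a 3x3 box average stands in for the Gaussian blur, and 0.5 is used as the rounding threshold.

```python
import numpy as np

def blur(img):
    """Cheap 3x3 box smoothing as a stand-in low-pass (anti-alias) filter."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / 9.0

def shrink_to_binary(img, target):
    """Blur, discard every other pixel, repeat to target size, then binarize."""
    img = img.astype(float)
    while max(img.shape) > target:
        img = blur(img)[::2, ::2]     # low-pass, then drop every other pixel
    return (img >= 0.5).astype(int)   # round each pixel to 0 or 1

img = np.ones((8, 8))                 # stand-in 0-1 grayscale image
small = shrink_to_binary(img, 4)
```

The resulting binary matrix plays the same role as the ASCII character matrix in sonify_character().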

We also believe it would be possible to sonify higher-resolution, "complex" images by breaking the image up into a grid and applying sonify_simple_image() to each small square of that grid. However, recognizing such an image would require an extremely high level of listener training, and we acknowledge that the difficulty of the challenge grows rapidly as the elements of an image begin to interact spatially with each other.