Results and Conclusions

Results from Our Project

As you can hear in the Try It Out! section, our current algorithm works well: it sonifies characters, words, phrases, and low-resolution images fairly distinctly. Within our group, we have been able to identify which letter is being played about 90% of the time, words about 80% of the time, and phrases about 70% of the time. We are satisfied with this result because these numbers are close to the benchmarks we set at the beginning of the project development process. In addition, with our expansion to sonifying images, we are able to identify the image about 50% of the time. However, the listener has to be told in advance that they are hearing an image and not a letter/word/phrase, and there were also instances where we had to provide some basic context about the image to point each other in the right direction.

Strengths & Weaknesses

Strengths in Our Approach:

Weaknesses in Our Approach:

Reflection

We believe this project overall helped us gain a better understanding of how convolution is used in the context of images. We had to perform a 2D convolution for our image-based approach, and developing that process taught us a lot about convolution as applied to images. In addition, when it came to our edge detection algorithm, we gained a much better understanding of the nuances of different edge detectors. We initially tried to use a Sobel edge detector, but quickly learned that it detects edges with any horizontal component. However, we only wished to identify edges that were purely horizontal for our algorithm; thus, a horizontal gradient detector performed significantly better.
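As a rough illustration of that distinction (the kernels and function below are our own sketch, not the project's actual code), compare a Sobel kernel, whose smoothing across columns spreads its response, with a bare two-pixel gradient kernel that only measures row-to-row intensity change:

    import numpy as np
    from scipy.signal import convolve2d

    # Sobel kernel for vertical gradients; its smoothing across columns
    # is what made it respond to edges with any horizontal component.
    sobel_y = np.array([[ 1,  2,  1],
                        [ 0,  0,  0],
                        [-1, -2, -1]])

    # Bare two-pixel vertical difference: a minimal horizontal-edge
    # detector with no cross-column smoothing.
    horiz_grad = np.array([[ 1],
                           [-1]])

    def edge_response(image, kernel):
        """Convolve a grayscale image (2D numpy array) with an edge kernel."""
        return convolve2d(image, kernel, mode='same', boundary='symm')

    # e.g. response = edge_response(img, horiz_grad)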

We also learned more about the significant characteristics of a sound in the context of audio analysis. For example, a spectrogram doesn't tell the whole story for music analysis, since frequencies an octave apart are considered the same note. This is where the chromagram comes in: it folds those different frequencies into the same pitch class, letting you see that they are actually the same note and letting chord patterns emerge. This is relevant to our project because we hope to be a stepping stone to an SSD. Having a clear understanding of how we can identify, analyze, and characterize various known musical characteristics is critical to understanding how we could better improve the effectiveness of our tool at communicating a message.
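As a concrete example of the distinction (the file name below is illustrative, not from our project), librosa can compute both views; the chromagram collapses every octave of a pitch into one of 12 pitch-class bins:

    import numpy as np
    import librosa

    # Load one of our generated clips (file name is a placeholder).
    y, sr = librosa.load('sonified_phrase.wav')

    # Spectrogram view: every frequency bin is kept separate.
    spectrogram = np.abs(librosa.stft(y))

    # Chromagram view: all octaves of the same pitch class fold into one
    # of 12 bins, so repeated notes and chord patterns become visible.
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    print(chroma.shape)  # (12, n_frames): one row per pitch class C, C#, ..., B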

We were able to overcome many of the challenges we faced previously. One of the earlier challenges was having code in both MATLAB and Python and needing to consolidate it into one language. We tried some online translation tools at first, but ultimately decided to retype the code with the same logic in the target language, since we believed that was the only way to ensure no major mistakes crept into the transfer. We were also able to control the speed of the MIDI file interpretation. However, we learned that faster doesn't always mean better because we are dealing with human interpretation: we want to play the audio at a speed that lets a listener follow it clearly, even if that takes a little longer.
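For the speed control, one simple approach (sketched below with the mido library; the function name, file names, and factor are illustrative rather than our exact code) is to rescale every delta time in the MIDI file:

    import mido

    def retime(path_in, path_out, speed=0.75):
        """Rescale all delta times so playback runs at `speed` times the
        original tempo; speed < 1 slows playback for easier listening."""
        mid = mido.MidiFile(path_in)
        for track in mid.tracks:
            for msg in track:
                msg.time = int(round(msg.time / speed))
        mid.save(path_out)

    retime('character.mid', 'character_slow.mid', speed=0.75)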

Significant improvements to our work lie on the image sonification side of our algorithms. As mentioned previously, accurate interpretation of a sonified image is challenging; typically, a person needs to know what they could be hearing in order to identify an image accurately. We think expanding on this aspect of our current algorithm could enable very interesting applications and truly grow its scope. However, the complex nature of images presents a significant challenge to both the listener and the algorithm, and we currently have no method of interpreting and communicating color.

We are also considering an alternate approach to detect and amplify static contours in our sonify_character() algorithm. In this approach, a DFT is applied to the generated audio signal to identify common, repeated frequencies in each character; these common frequencies show up as peaks in the DFT. After identifying the peaks, we apply gain factors to those peak frequencies (which correspond to rows in our music note matrix). After applying these gains, we hope to hear the static contours in a character more clearly. One thing worth mentioning is that this approach would likely benefit from switching from a square wave synthesizer to a sine wave synthesizer so that the detected frequencies are more "pure." Purer tones would make the peaks larger and more easily distinguished when present, making the algorithm more precise and effective.
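A minimal sketch of this idea (assuming a 1D numpy audio signal; the function name, gain, and prominence threshold below are placeholders, not tuned values) could look like:

    import numpy as np
    from scipy.signal import find_peaks

    def amplify_static_contours(signal, gain=2.0, prominence_frac=0.25):
        """Boost the repeated (peak) frequencies in a character's audio.

        Frequencies that recur throughout the character show up as peaks
        in the magnitude spectrum; applying a gain to those bins and
        resynthesizing should make the static contours stand out.
        """
        spectrum = np.fft.rfft(signal)
        magnitude = np.abs(spectrum)

        # Peaks in the DFT magnitude mark the common, repeated frequencies.
        peaks, _ = find_peaks(magnitude, prominence=prominence_frac * magnitude.max())

        # Apply the gain only at the peak bins, then invert the transform.
        spectrum[peaks] *= gain
        return np.fft.irfft(spectrum, n=len(signal))

With a sine wave synthesizer, each note's energy would concentrate in a single bin rather than spreading across the square wave's odd harmonics, which is why the peaks would be larger and easier to pick out.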