Home

Track Separation of Multiphonic Music

I designed and implemented an algorithm that splits a chord into its individual notes.

For our demo, we decomposed an E7#9 chord played on a guitar. An E7#9 chord consists of five notes: E, G#, B, D, and G (not necessarily in that order). Using a rudimentary filter that searches for peaks in the amplitudes of the spectrogram, we found that we could split this guitar chord into its five component notes. Below is the spectrogram of the original chord. Notice the amplitudes of the harmonics, especially right after the chord is strummed.
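The original demo was written in MATLAB, but the idea behind the peak-searching filter can be sketched in a few lines of Python. This is a minimal illustration (the function name and parameters are my own, not the project's): average the spectrogram over time, pick the strongest spectral peaks, and invert one narrow frequency band per peak back into its own audio track.

```python
import numpy as np
from scipy import signal

def split_chord(x, fs, n_notes=5, band_hz=30.0):
    """Toy peak-picking separator: find the n_notes strongest peaks in
    the average magnitude spectrum, then keep a narrow band around each
    peak and invert it back into its own audio track."""
    f, t, S = signal.stft(x, fs=fs, nperseg=4096)
    mag = np.abs(S).mean(axis=1)                      # average spectrum over time
    min_sep = max(1, int(band_hz / (f[1] - f[0])))    # peaks at least band_hz apart
    peaks, _ = signal.find_peaks(mag, distance=min_sep)
    strongest = sorted(peaks[np.argsort(mag[peaks])[-n_notes:]])
    tracks = []
    for p in strongest:
        mask = (np.abs(f - f[p]) < band_hz)[:, None]  # keep one narrow band
        _, y = signal.istft(S * mask, fs=fs, nperseg=4096)
        tracks.append(y)
    return f[strongest], tracks
```

A real chord also carries overtones of each note, which is why the harmonics in the spectrogram matter; this sketch only isolates one band per peak.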

Here's a spectrogram of an E7#9 chord on guitar. 

Below are the decomposed sound files of the five notes that make up the E7#9 chord: E, G#, B, D, and G. The clearest note is the G#, and the most muddled is the G. Because G and G# are so close in pitch, the filter performed worst on the G: both G and G# can be heard in the last wav file below.
Regardless, the files below demonstrate that the chord has been fully decomposed by our filter. Combining the five wav files reproduces a close approximation of the original E7#9 chord.
Note: The sound files below are very quiet; you may need to increase the volume to hear the files.

Below is the combination of the sound files above. It sounds similar to the original recording, but a little distorted. This is because we kept only certain strips of frequencies rather than the whole spectrum; in other words, the new sound file carries much less information than the original.
Nevertheless, the demo works!

This work resulted in a publication with recommended citation: Greer, Tim and Remba, Joshua, "Track Separation in Multiphonic Music" (2014). Washington University Undergraduate Research Digest, Volume 9, Issue 2


The code and a full demo of this work can be found at http://wuseparateways.weebly.com

Synthesizing Speech

Can we synthesize speech? Here is an excerpt from a woman saying, "She had your dark greasy suit in greasy wash water all year":

Using the power contour, f0 values, and formants, I tried to synthesize this woman's voice. I used an impulse-train exciter for voiced speech and white noise for unvoiced speech. Here was the result:

Only certain parts of the speech are recognizable, which suggests that a more accurate excitation model would improve the synthesis. Overall, I was impressed with how well this very rudimentary model performed!
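The actual synthesizer is the MATLAB script linked below; as a hedged illustration of the voiced half of the idea, here is a Python sketch of the source-filter scheme: an impulse train at the pitch period excites an all-pole filter with one resonance per formant (the function name and the formant/bandwidth values are my own assumptions, not taken from the project).

```python
import numpy as np
from scipy import signal

def synth_voiced(f0, formants, bandwidths, fs=10000, dur=0.5):
    """Source-filter sketch: an impulse train at the pitch period
    excites an all-pole filter with one resonance per formant.
    Unvoiced frames would swap the impulse train for white noise."""
    excitation = np.zeros(int(fs * dur))
    excitation[::int(fs / f0)] = 1.0                 # impulse-train exciter
    # one stable pole pair per formant (fc = center, bw = bandwidth)
    poles = [np.exp(-np.pi * bw / fs + 2j * np.pi * fc / fs)
             for fc, bw in zip(formants, bandwidths)]
    a = np.poly(poles + [p.conjugate() for p in poles]).real
    return signal.lfilter([1.0], a, excitation)

# rough, textbook-style formant guesses, not values from the recording
y = synth_voiced(f0=120, formants=[700, 1220, 2600],
                 bandwidths=[80, 90, 120])
```

Swapping `excitation` for white noise in unvoiced frames completes the crude model described above.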

The code and a full demo of this work can be found at https://github.com/timothydgreer/speech/blob/master/potpourri/synthesized.m


Pitch Detection Using Cepstrum

For this little project, I used the cepstrum to determine the pitch of a speaker. Here, a man is saying "ahhh":

Using cepstral analysis, I was able to "cut through" the formants and recover the fundamental frequency: about 120 Hz. This is found by taking the peak of the cepstral graph and dividing the sampling rate by the index of that peak. In this case, the peak was at index 83, and 10000/83 ≈ 120 Hz.
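The computation above is compact enough to show in full. This is a Python sketch of the standard cepstral pitch estimator (the function name and the quefrency search limits are my own choices): take the inverse FFT of the log magnitude spectrum, find its peak in the plausible pitch range, and divide the sampling rate by that index.

```python
import numpy as np

def cepstral_pitch(x, fs, fmin=60, fmax=400):
    """Estimate f0 from the peak of the real cepstrum, searching only
    quefrencies that correspond to plausible speaking pitches."""
    log_mag = np.log(np.abs(np.fft.rfft(x * np.hamming(len(x)))) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)   # quefrency search window
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return fs / peak                              # e.g. 10000 / 83 ≈ 120 Hz
```

Restricting the search window keeps the slowly varying formant envelope (which lives at low quefrencies) from being mistaken for the pitch peak.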



Robotic Vowels

We know that vowels tend to have characteristic formants. I tried to simulate the speech of two vowels: 'a' and 'e'.

I convolved an exciter, represented as an impulse train, with two decaying sinusoids representing the formants of each vowel.
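The MATLAB script linked below does the real work; as a rough Python sketch of the same recipe (function name, decay rate, and formant values are my own assumptions), the exciter is convolved with one decaying sinusoid per formant:

```python
import numpy as np

def robotic_vowel(f0, formants, fs=8000, dur=0.4, decay=60.0):
    """Convolve an impulse-train exciter with a short decaying sinusoid
    per formant (applied in cascade)."""
    n = int(fs * dur)
    y = np.zeros(n)
    y[::int(fs / f0)] = 1.0                    # impulse-train exciter at f0
    t = np.arange(int(0.05 * fs)) / fs         # 50 ms decaying-sine response
    for fc in formants:
        h = np.exp(-decay * t) * np.sin(2 * np.pi * fc * t)
        y = np.convolve(y, h)[:n]
    return y

# very approximate first two formants: 'e' near (400, 2000) Hz
e_male = robotic_vowel(f0=200, formants=(400, 2000))
```

Changing `f0` from 200 Hz to, say, 110 Hz turns the "male" speaker into a deeper one without touching the formants, which is exactly the source-filter separation the vowel experiment relies on.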

Here is the output I get from a "male" (200 Hz) speaker saying "e":




The code and a full demo of this work can be found at https://github.com/timothydgreer/speech/blob/master/potpourri/produce_vowels.m Feel free to play around with the code!


Visualizing Sounds

Here is a sound file of a man saying, "Six nine eight nine six four two": https://github.com/timothydgreer/speech/blob/master/show_spectrograms/specs.wav

Here are two spectrograms of the sound, computed in MATLAB:

This is called a narrowband spectrogram. Notice the closely spaced parallel lines? Those are harmonics.
Narrowband spectrograms are generally used to find the fundamental frequency.

This is called a wideband spectrogram. Notice how the harmonic lines aren't visible here?
Wideband spectrograms are generally used to examine a signal's intensity over time, and the center of each band of energy is generally taken to be a formant frequency.
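The only difference between the two plots is the analysis window length: long windows give fine frequency resolution (narrowband), short windows give fine time resolution (wideband). A small Python sketch of the trade-off (the signal and window sizes here are my own example, not the "six nine eight…" recording):

```python
import numpy as np
from scipy import signal

fs = 8000
t = np.arange(fs) / fs
# toy "voiced" signal: a fundamental plus one harmonic
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)

# narrowband: long (~32 ms) window -> fine frequency, coarse time
f_nb, t_nb, S_nb = signal.spectrogram(x, fs, nperseg=256)

# wideband: short (~4 ms) window -> coarse frequency, fine time
f_wb, t_wb, S_wb = signal.spectrogram(x, fs, nperseg=32)
```

With the long window the two harmonics fall into separate frequency bins; with the short window they blur together, but individual pitch periods show up in time instead.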



Analyzing Sounds Using a Sinusoidal Model
One way of analyzing and reconstructing audio signals is to approximate them with a sinusoidal model. Such models are useful for resynthesizing audio signals while ignoring the "non-sinusoidal" components of the sound.

I experimented with the sinusoidal model. Here is a short saxophone phrase:

I ran my algorithm on the last 7 seconds of this audio file. Here is the sinusoidal approximation to the sound:



And here is the residual audio file (the sound that is left over after the sinusoidal approximation):



This was an interesting study on how we can approximate sounds using sinusoids!
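My algorithm worked on the saxophone recording above; as a much-simplified Python sketch of the general idea (the function name and frame scheme are my own, and a real system would use windowing, overlap-add, and peak tracking across frames), each frame keeps only its strongest FFT bins and everything else becomes the residual:

```python
import numpy as np

def sinusoidal_model(x, n_peaks=10, frame=1024):
    """Toy sinusoidal model: per frame, keep only the n_peaks strongest
    FFT bins and resynthesize; the residual is whatever the sinusoids
    fail to capture (breath noise, attacks, etc.)."""
    model = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, frame):
        X = np.fft.rfft(x[start:start + frame])
        keep = np.argsort(np.abs(X))[-n_peaks:]   # strongest sinusoids
        Xs = np.zeros_like(X)
        Xs[keep] = X[keep]
        model[start:start + frame] = np.fft.irfft(Xs, frame)
    return model, x - model                       # (sinusoidal part, residual)
```

For a steady tone the residual is nearly silent; for the saxophone clip it contains exactly the breathy, transient content that a handful of sinusoids can't represent.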

Turing Machine
Together with two CCSC high school students and Ben Nahill, Jack Lepird, and Chad Spensky from MIT Lincoln Laboratory, I created a Turing machine.

A Turing machine is an abstraction of a computer: it reads, writes, and erases ones and zeros on a tape, one cell at a time, according to a table of rules.
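Our machine was a physical build (code linked below), but the read/write/erase loop it implements fits in a few lines. Here is a minimal Python simulator of the same abstraction (the function name and rule format are my own):

```python
def run_turing(tape, rules, state="start", max_steps=1000):
    """Minimal Turing machine: `rules` maps (state, symbol) to
    (symbol_to_write, head_move, next_state); None is the blank symbol."""
    cells = dict(enumerate(tape))
    pos = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = rules[(state, cells.get(pos))]
        cells[pos] = write                 # write (or erase) the cell
        pos += 1 if move == "R" else -1    # move the head
    return [v for _, v in sorted(cells.items()) if v is not None]

# example rule table: flip every bit, halting at the first blank cell
flip = {
    ("start", 0): (1, "R", "start"),
    ("start", 1): (0, "R", "start"),
    ("start", None): (None, "L", "halt"),
}
```

Despite its simplicity, this read-write-move loop is enough to compute anything a conventional computer can.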

Here is a demo of the Turing Machine:


A full write-up of what we did can be found here: https://www.ll.mit.edu/news/StudentsBuildReplicaOfTuringMachine.html

Code for the Turing machine can be found here: https://github.com/timothydgreer/turing_machine


Movie Recommender System
I wanted to get a flavor for how Netflix creates its recommendations for users, so I created a recommender system that predicts which movies a user may like based on that user's favorite (and least-favorite) movies. Using a database of 1692 movies rated by 943 users (from IMDB), I rated movies that I enjoyed and used my algorithm to predict which unwatched movies I might enjoy. Here are 5 of the movies I used as input:
Rated 5 for Toy Story (1995)
Rated 4 for GoldenEye (1995)
Rated 4 for Seven (Se7en) (1995)
Rated 1 for Spellbound (1945)
Rated 3 for Rosencrantz and Guildenstern Are Dead (1990)
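The full system is in the GitHub repo linked below; as a hedged sketch of the collaborative-filtering idea behind it (the function name and hyperparameters here are my own, and this Python toy is far smaller than the real 1692-movie problem), gradient descent learns a feature vector per movie and a preference vector per user so that their product matches the observed ratings:

```python
import numpy as np

def fit_cf(R, mask, n_features=3, lam=0.01, lr=0.02, iters=8000, seed=0):
    """Collaborative filtering by gradient descent: learn movie features
    X and user preferences Theta so X @ Theta.T matches the observed
    ratings (mask is 1 where a rating exists, 0 where it is missing)."""
    rng = np.random.default_rng(seed)
    n_movies, n_users = R.shape
    X = 0.1 * rng.standard_normal((n_movies, n_features))
    Theta = 0.1 * rng.standard_normal((n_users, n_features))
    for _ in range(iters):
        err = (X @ Theta.T - R) * mask          # error on observed entries only
        X, Theta = (X - lr * (err @ Theta + lam * X),
                    Theta - lr * (err.T @ X + lam * Theta))
    return X @ Theta.T                          # predictions for every pair
```

The entries of the returned matrix at unobserved positions are the predicted ratings, which is how recommendations like the list below get ranked.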

Here was the output (I'm only including the top 5 recommendations):
Predicting rating 8.7 for movie Shawshank Redemption, The (1994)
Predicting rating 8.6 for movie Good Will Hunting (1997)
Predicting rating 8.5 for movie Usual Suspects, The (1995)
Predicting rating 8.5 for movie Schindler's List (1993)
Predicting rating 8.4 for movie Star Wars (1977)

I can say that although I might be a biased user, I enjoyed my recommendations!

If you want to get some recommendations yourself, feel free to check out my project on my GitHub: https://github.com/timothydgreer/machine_learning/tree/master/HW8

For help, see the readme in the GitHub repo.

Digit Recognition
Would you call this a 0 or a 6?

My neural network, built for optical character recognition, classifies this number as a 6.
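The trained network lives in the repo below; the classification step itself is just a forward pass through the layers. A minimal Python sketch of that step (the function names, sizes, and weights here are illustrative, not the trained digit model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network: the predicted class
    is whichever output unit fires most strongly."""
    hidden = sigmoid(W1 @ x + b1)     # hidden-layer activations
    output = sigmoid(W2 @ hidden + b2)  # one score per class
    return int(np.argmax(output))
```

For the ambiguous image above, the 6-unit's output simply edges out the 0-unit's, which is why the network calls it a 6.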

Try the algorithm for yourself! See my GitHub: https://github.com/timothydgreer/machine_learning/tree/master/HW3/ and run ex3_nn to see the algorithm in action.

Performances
I've been blessed to play with some incredible musicians on saxophone and piano. Below are some highlights from my senior recital at Washington University in St. Louis, where I'm backed by some very talented musicians from the St. Louis area. We played "Star Eyes," "These Foolish Things," and "Just Friends." I'm playing tenor saxophone here, trying to emulate the styles of Oliver Nelson (a fellow Wash U alumnus!) and Dexter Gordon.

I also had the honor of playing in Petra and the Priorities at Washington University in St. Louis. We opened for Gym Class Heroes, Fitz and the Tantrums, and the Dum Dum Girls. Here's a sample from this band, which was steeped in funk, soul, and Motown. I'm playing sax on a song penned by the band: