Home

Paper Accepted at Nature's Scientific Reports

After almost a year of revisions and rewrites, my paper, "Creating musical features using multi-faceted, multi-task encoders based on transformers," was accepted by Nature's Scientific Reports. For this work, I investigated interpretable ways of describing music so that we can better classify songs by genre, identify the mood of a piece, and more. My hope is that this work will encourage others to study music from both a computational and a musical perspective, because I think doing so can better illuminate how we consume and enjoy music.


Secured Position of Applied Scientist at Amazon Music

I'm pleased to announce that I'll be joining Amazon Music's Personalization Team in the fall of 2022, where I'll be applying my knowledge of recommendation systems and music information retrieval to help bring great songs and artists to customers' ears!


Successfully Submitted Dissertation

My PhD dissertation, "Creating Cross-Modal, Context-Aware Representations of Music for Downstream Tasks," was submitted in August 2022! The DOI is here: https://doi.org/10.25549/usctheses-oUC111376000. I am currently preparing two journal submissions and exploring future opportunities.


Successfully Defended Thesis

I successfully defended my thesis, "Creating Cross-Modal, Context-Aware Representations of Music for Downstream Tasks," in May 2022! My committee members were Dr. Shri Narayanan, Dr. David Kempe, Dr. Mohammad Soleymani, Dr. Jonas Kaplan, and Dr. Morteza Dehghani. Dr. Assal Habibi gave valuable input as well. Next up: editing and submitting my written dissertation!


Preparing for Thesis Defense

I am now preparing to defend my thesis in early summer 2022! I am also on the hunt for post-graduation opportunities in industry, so please contact me if you know of a good fit for my skill set!


Late-Breaking Demo Accepted at ISMIR

My late-breaking demo paper, "Harmonize This Melody: Automatic Four-Part Harmony Generation Using Neo-Riemannian Voice-Leading," was accepted at the 2021 International Society for Music Information Retrieval (ISMIR) conference. The paper can be found here, and the code can be found on my GitHub: https://github.com/timothydgreer/4_part_harmonizer


Passed Thesis Proposal

My thesis proposal, titled "Creating Cross-Modal, Context-Aware Representations of Music for Downstream Tasks," was presented to my committee and accepted. With this PhD requirement complete, I can now continue working on my dissertation and plan to defend by Spring 2022!


Paper Accepted at Content-Based Multimedia Indexing (CBMI)

My music research team at USC just learned that our paper, "Loss Function Approaches for Multi-label Music Tagging," was accepted for publication at CBMI. This work shows that deep learning models can automatically label and classify music by genre. The conference will be held virtually in June 2021, after which the paper will be published.


Paper Accepted at Prestigious PLOS ONE Journal

I contributed as a first author to the paper titled, "A computational lens into how music characterizes genre in film," which was published in PLOS ONE on 04/08/2021. In this general-interest paper, we show that it is possible to predict a film's genre from the music in its soundtrack. You can read the full paper here.


Music Tagging System Wins MediaEval 2020

My music research team at the University of Southern California just won an international music tagging challenge called MediaEval, where we classified the musical genre and mood of 55,000 songs better than competing teams from all over the world. It is a great honor to have our work recognized! The code that we used to train and run our best-performing model is here.


Nominated for Best Production Music Artist at 2020 Mark Awards

The group that I play in, Saticöy, was nominated for Best Production Music Artist at the 2020 Mark Awards, hosted by the Production Music Association. While we did not win the award, it was an honor to be nominated and showcase our work, which can be found here and here.


Presented at ICASSP

I presented my paper, "The Role of Annotation Fusion Methods in the Study of Human-Reported Emotion Experience During Music Listening," at ICASSP, which was hosted online. Here is the link to my talk: https://2020.ieeeicassp-virtual.org/presentation/poster/role-annotation-fusion-methods-study-human-reported-emotion-experience-during

Awarded Best Presentation at USC’s 12th Annual Graduate Research Symposium 

My presentation "A Multimodal View into Music's Effect on Human Neural, Physiological, and Emotional Experience" was given "Best Presentation" honors at USC’s 12th Annual Graduate Research Symposium.  Additionally, this same presentation was given "Best Poster Session" honors at USC's Computer Science  PhD Visit Day Poster Session.

Work Accepted at ICASSP

"The Role of Annotation Fusion Methods in the Study of Human-Reported Emotion Experience During Music Listening" was just accepted for poster presentation at ICASSP in Barcelona!


ACM Multimedia Work in the Press

"A Multimodal View into Music's Effect on Human Neural, Physiological, and Emotional Experience," my paper that was presented at ACM Multimedia, has been featured in the press! See this link for the original story: https://viterbischool.usc.edu/news/2019/10/why-music-makes-us-feel-according-to-ai/

A Vox-style video was also created that presented the findings of my work. See below!

Presented at ACM Multimedia

I was fortunate enough to present my work, "A Multimodal View into Music's Effect on Human Neural, Physiological, and Emotional Experience," at The Acropolis in Nice, France. I received very helpful feedback and comments about the poster, which will serve me well in future research!


Now a member of Indie-Pop Group Saticöy

I'm now playing saxophone and keyboards in an Indie-Pop group called Saticöy! We collaborated on a Capitol Records release called "Soundtrack of Dreams" (listen to my sax playing here), and now we are songwriting and playing live shows. Recently, we were lucky to perform at the Troubadour and Capitol Records Studio A! More music and shows to come!

Work accepted at Speech, Music, and Mind 2019 (SMM '19)

My recent work, "Using Shared Representations of Words and Chords in Music for Genre Classification," has been accepted at SMM '19! Below this paragraph is a visualization of hit songs that you know and love, according to how these songs' chords and lyrics align. T-SNE was used to determine distances in this unitless, 2D space. Please be patient as this loads! Zoom by dragging, and resize to the original size by double-clicking!


Paper Accepted at ACM Multimedia!

My paper on affective computing during music listening was accepted at ACM Multimedia! I'm tremendously honored to have this paper accepted, and I look forward to giving a lecture presentation in Nice in October!

Work accepted at International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

My recent work, "Learning Shared Vector Representations of Lyrics and Chords in Music" has been accepted for lecture presentation at ICASSP 2019! The paper can be found here: https://ieeexplore.ieee.org/abstract/document/8683735/

Presentation on Music Processing Given at Annenberg Symposium

My work was featured at USC's Annenberg Research Symposium in April. This work is similar to the project described below ("Examining the Interplay Between Lyrics and Chords in Music"). A picture from the symposium is below:

Beatboxing Work Featured in New York Times! (and other news sources)

The work we have done on beatboxing has attracted a lot of press! Very humbling and exciting to have so many people interested in this work!

New York Times: https://www.nytimes.com/2018/11/07/science/beatboxing-mri-scanner.html

Popular Science: https://www.popsci.com/beatbox-mri-linguistics

Live Science: https://www.livescience.com/64032-beatboxers-mri-scan.html

Science Daily: https://www.sciencedaily.com/releases/2018/11/181107172901.htm

Smithsonian: https://www.smithsonianmag.com/smart-news/take-look-beatboxing-inside-180970758/

More to come!

(11/08/2018)

Presenting Work on Beatboxing

I work with a group of researchers at USC to learn more about how beatboxers create sounds. I'm presenting our findings at the Acoustical Society of America Meeting in Victoria, Canada.

If you're interested in learning more about this topic, please refer to this website: https://sail.usc.edu/span/beatboxingproject/ (11/01/2018)

Examining the Interplay Between Lyrics and Chords in Music

I developed a script that collects data from Ukutabs.com, which has the chords and lyrics of pop songs in parallel.

After collecting this data, we can embed each modality and use these representations of chords, lyrics, or chords and lyrics together to perform downstream tasks. A paper describing this work in more detail has been submitted.

Here is a visualization of the embedding space of lyrics and chords together, after converting all songs to the key of C:

You can see that chords diatonic to the key are close together in this space (C, G, F, Am, Dm), and chords outside the key also cluster together (Ebm, Bbm, Db). Dominant chords are clustered together as well. This suggests that lyrics and chords are used in tandem to create music, so when analyzing songs, it is important to capture the interplay between the two.
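As a rough illustration of the embedding step (not the exact pipeline in the paper or the repo below), here is a sketch that trains word2vec-style chord embeddings with gensim on toy chord sequences that are assumed to be already transposed to C:

```python
# Minimal sketch of the chord2vec idea: train word2vec-style embeddings over chord
# sequences (assumed already transposed to C). The example sequences are made up;
# the real data comes from the Ukutabs scrape described above.
from gensim.models import Word2Vec

chord_sequences = [
    ["C", "Am", "F", "G", "C"],
    ["C", "G", "Am", "F"],
    ["Dm", "G", "C", "Am"],
    ["Db", "Ebm", "Bbm", "Db"],
]

model = Word2Vec(
    sentences=chord_sequences,
    vector_size=32,   # small embedding dimension for a toy corpus
    window=2,         # chords within two positions count as context
    min_count=1,
    sg=1,             # skip-gram
    seed=0,
)

# Chords that appear in similar harmonic contexts end up close in the embedding space
print(model.wv.most_similar("C", topn=3))
```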

Code can be found at: https://github.com/timothydgreer/chord2vec

(04/28/2018)

Detecting When Laugh Tracks are Being Used in Sitcoms

I developed a method of detecting laugh tracks in sitcoms.

This method uses MFCCs and the phonological symmetry of group laughter to find when a laugh track is happening in Friends.

Using this method, we may be able to identify when someone is being humorous in media.
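Here is a minimal sketch of the general MFCC-plus-classifier recipe, assuming librosa and scikit-learn; the clip file names are hypothetical, and the linked repository contains the actual method.

```python
# Minimal sketch (not the repo's actual pipeline): frame-level MFCC features plus a
# simple classifier to flag laugh-track segments. File paths and labels are hypothetical.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_frames(path, sr=22050):
    """Return one 13-dimensional MFCC vector per analysis frame of the audio file."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # shape: (n_frames, 13)

laugh = mfcc_frames("laugh_clip.wav")       # hypothetical clip containing a laugh track
speech = mfcc_frames("dialogue_clip.wav")   # hypothetical clip without one
X = np.vstack([laugh, speech])
y = np.concatenate([np.ones(len(laugh)), np.zeros(len(speech))])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new episode frame by frame; runs of high probability mark likely laugh-track segments
probs = clf.predict_proba(mfcc_frames("episode.wav"))[:, 1]
print("Frames likely containing a laugh track:", np.where(probs > 0.8)[0][:20])
```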

If you're interested in learning more, check out this github: https://github.com/timothydgreer/humor_detection (04/10/2018)

Featured in Music Video

I was recently featured on a song by YouTube sensation Michael Constantino (03/05/2018): https://www.youtube.com/watch?v=1rr9WWs6XU4

Check it out!

Track Separation of Multiphonic Music

I created and implemented an algorithm that splits a chord recording into its individual notes.

For our demo, we decomposed an E7#9 chord played on a guitar. An E7#9 chord consists of five notes: E, G#, B, D, and G (not necessarily in that order). Using our rudimentary filter, which searches for peaks in the amplitudes of the spectrogram, we found that we could split this guitar chord into its five component notes. Below, you will find the spectrogram of the original chord. Notice the amplitudes of the harmonics, especially right after the chord is strummed. (A rough code sketch of this band-filtering idea appears at the end of this post.)

Here's a spectrogram of an E7#9 chord on guitar. 


The audio file is here: http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9.wav

Below are the decomposed sound files of the five notes that comprise the E7#9 chord: E, G#, B, D, and G. The clearest note seems to be the G#, and the most muddled is the G. Because of the proximity of the G and G# notes, the filter performed worst on the G: both G and G# can be heard in the last WAV file below.

Still, the files below demonstrate that our filter decomposes the chord: combining the five WAV files yields a close approximation of the original E7#9 chord.

Note: The sound files below are very quiet; you may need to increase the volume to hear the files.

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9e.wav

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9gsharp.wav

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9b.wav

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9d.wav

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9g.wav

Below is the combination of the sound files above. You can hear that this combination is similar to the original sound file, but it sounds a little distorted. This is because we have kept only certain strips of frequencies, not the whole spectrum; in other words, there is much less information in this new sound file than in the original.

Nevertheless, the demo works!

http://wuseparateways.weebly.com/uploads/2/1/9/1/21910058/e7sharp9reconstructed.wav

This work resulted in a publication with recommended citation: Greer, Tim and Remba, Joshua, "Track Separation in Multiphonic Music" (2014). Washington University Undergraduate Research Digest, Volume 9, Issue 2

The abstract can be found here: http://openscholarship.wustl.edu/cgi/viewcontent.cgi?article=1019&context=vol9_iss2

The code and full demo of this work is found at http://wuseparateways.weebly.com
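For anyone who wants to experiment, here is a rough Python sketch of the band-filtering idea (the original code on the linked site is MATLAB); the note frequency, number of harmonics, and bandwidth below are illustrative values, not the ones used in the demo.

```python
# A rough sketch (assuming numpy/scipy, not the original MATLAB code) of the band-filtering
# idea: keep only narrow frequency strips around one note's fundamental and harmonics,
# then resynthesize that note from the masked spectrogram.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

def isolate_note(x, fs, f0, n_harmonics=8, half_width_hz=30.0):
    """Keep only narrow bands around f0 and its harmonics, zero out everything else."""
    f, t, Z = stft(x, fs=fs, nperseg=4096)
    mask = np.zeros_like(f, dtype=bool)
    for k in range(1, n_harmonics + 1):
        mask |= np.abs(f - k * f0) < half_width_hz
    Z_masked = Z * mask[:, None]
    _, y = istft(Z_masked, fs=fs, nperseg=4096)
    return y

fs, chord = wavfile.read("e7sharp9.wav")      # the chord recording linked above
chord = chord.astype(np.float64)
if chord.ndim > 1:                            # mix down to mono if needed
    chord = chord.mean(axis=1)

g_sharp = isolate_note(chord, fs, f0=207.65)  # G#3 is roughly 207.65 Hz
wavfile.write("e7sharp9gsharp_sketch.wav", fs,
              (g_sharp / np.max(np.abs(g_sharp))).astype(np.float32))
```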

Synthesizing Speech

Can we synthesize speech? Here is an excerpt of a woman saying, "She had your dark suit in greasy wash water all year":

Using the power, f0 values, and formants, I tried to synthesize the voice of this woman. I used an impulse train exciter for the voiced speech and white noise for unvoiced speech. Here was the result:

Only certain parts of the speech are recognizable. This suggests that a more accurate excitation model would be better for synthesizing this passage. Overall, I was impressed with how well this very rudimentary model performed!
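Here is a minimal Python sketch of the source-filter idea described above (the linked code below is the actual MATLAB implementation); the pitch, formant frequencies, and bandwidths are illustrative values only.

```python
# A minimal sketch of source-filter synthesis: an impulse-train exciter for voiced frames,
# white noise for unvoiced frames, shaped by second-order resonators placed at the formant
# frequencies. All parameter values are illustrative.
import numpy as np
from scipy.signal import lfilter

fs = 16000

def resonator(signal, f_formant, bandwidth=100.0):
    """Filter the excitation with a 2nd-order resonance at f_formant."""
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * f_formant / fs
    a = [1, -2 * r * np.cos(theta), r ** 2]   # poles at the formant frequency
    return lfilter([1 - r], a, signal)

def voiced_frame(f0, duration=0.2):
    """Impulse train at the pitch period (voiced excitation)."""
    excitation = np.zeros(int(fs * duration))
    excitation[::int(fs / f0)] = 1.0
    return excitation

def unvoiced_frame(duration=0.2):
    """White-noise excitation for unvoiced sounds."""
    return np.random.default_rng(0).normal(size=int(fs * duration))

# Voiced vowel-like frame: f0 = 120 Hz, formants roughly at 500/1500/2500 Hz
frame = voiced_frame(120.0)
for formant in (500.0, 1500.0, 2500.0):
    frame = resonator(frame, formant)
```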

The code and full demo of this work is found at https://github.com/timothydgreer/speech/blob/master/potpourri/synthesized.m

Pitch Detection Using Cepstrum

For this little project, I used the cepstrum to determine the pitch of a speaker. Here, a man is saying "ahhh":

Using cepstral analysis, I was able to "cut through" the formants and estimate the fundamental frequency: about 120 Hz. This is found by locating the peak of the cepstrum and dividing the sampling rate by that peak's index. In this case, the peak was at index 83, and 10000/83 ≈ 120 Hz.
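A minimal sketch of this cepstral pitch estimate, assuming a mono recording (the file name here is a placeholder) sampled at 10 kHz; the linked MATLAB script below is the actual implementation.

```python
# Minimal sketch: real cepstrum of a voiced segment, then a peak search in a plausible
# pitch range. The file name is hypothetical; the sampling rate follows the example above.
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("ahhh.wav")          # hypothetical mono recording of "ahhh"
x = x.astype(np.float64)

# Real cepstrum: inverse FFT of the log magnitude spectrum
spectrum = np.fft.fft(x * np.hamming(len(x)))
cepstrum = np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))

# Search for a peak in a plausible pitch range (here 60-300 Hz)
low, high = int(fs / 300), int(fs / 60)
peak_index = low + np.argmax(cepstrum[low:high])
print(f"Estimated f0: {fs / peak_index:.1f} Hz")   # e.g., index 83 at 10 kHz gives ~120 Hz
```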

Full code can be found here: https://github.com/timothydgreer/speech/blob/master/cepstral_analysis/find_freq_cepstrum.m

Robotic Vowels

We know that vowels tend to have certain formants. I tried to simulate the speech of two vowels: 'a' and 'e'.

I convolved an exciter, represented as an impulse train, with two decaying sinusoids representing the formants of each vowel.

Here is the output I get from a "male" (200 Hz) speaker saying "e":
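Here is a small Python sketch of the same idea (the linked MATLAB script below is the real implementation); the formant frequencies and decay rate are illustrative, not the values used in that script.

```python
# Minimal sketch: convolve a 200 Hz impulse train with two decaying sinusoids standing in
# for a vowel's first two formants. All numbers here are illustrative.
import numpy as np

fs = 10000
duration = 0.5
t = np.arange(int(fs * duration)) / fs

# Exciter: impulse train at 200 Hz (a "male" pitch)
exciter = np.zeros_like(t)
exciter[::fs // 200] = 1.0

# A decaying sinusoid approximating one formant resonance
def formant(freq, decay=60.0, length=512):
    n = np.arange(length) / fs
    return np.exp(-decay * n) * np.sin(2 * np.pi * freq * n)

# Roughly /e/-like formants at 500 Hz and 2300 Hz
vowel = np.convolve(exciter, formant(500.0), mode="full")
vowel = np.convolve(vowel, formant(2300.0), mode="full")
vowel /= np.max(np.abs(vowel))   # normalize before playback or writing to a WAV file
```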

The code and full demo of this work can be found at https://github.com/timothydgreer/speech/blob/master/potpourri/produce_vowels.m. Feel free to play around with the code!

Visualizing Sounds

Here is a sound file of a man saying, "Six nine eight nine six four two": https://github.com/timothydgreer/speech/blob/master/show_spectrograms/specs.wav

Here are two spectrograms of the sound, computed from MATLAB:

The first (pictured above) is called a narrowband spectrogram. Notice the closely spaced striations stacked up the frequency axis? Those are the harmonics.

Narrowband spectrograms are generally used to find the fundamental frequency.

This is called a wideband spectrogram. See how the individual harmonics are no longer resolved here?

Wideband spectrograms are generally used to examine a signal's intensity over time; the center of each band of energy is typically taken to be a formant frequency.
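A minimal sketch of how the two spectrogram types differ in practice: the only change is the analysis window length. The file name follows the linked example, but this is not the original MATLAB script.

```python
# Minimal sketch: a long window gives a narrowband spectrogram (fine frequency resolution,
# harmonics visible); a short window gives a wideband spectrogram (fine time resolution,
# formant bands visible).
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("specs.wav")   # the recording linked above (assumed mono)

fig, axes = plt.subplots(2, 1, sharex=True)
for ax, nperseg, title in [(axes[0], 1024, "Narrowband (long window)"),
                           (axes[1], 128, "Wideband (short window)")]:
    f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    ax.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
    ax.set_title(title)
    ax.set_ylabel("Frequency (Hz)")
axes[1].set_xlabel("Time (s)")
plt.show()
```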

The code and full demo of this work is found at https://github.com/timothydgreer/speech/blob/master/show_spectrograms/show_spectrograms.m

Analyzing Sounds using Sinusoidal Model

One way of analyzing and reconstructing audio signals is to approximate them using a sinusoidal model. Sinusoidal models are useful for resynthesizing audio signals while ignoring the "non-sinusoidal" components of the sound.

I experimented with the sinusoidal model. Here is a short saxophone phrase:

http://www.freesound.org/people/xserra/sounds/204182/

I ran my algorithm on the last 7 seconds of this audio file. Here is the sinusoidal approximation to the sound:

And here is the residual audio file (the sound that is left over after the sinusoidal approximation is removed):

This was an interesting study on how we can approximate sounds using sinusoids!
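Here is a rough sketch of the analysis/residual split, using a crude "keep the strongest spectral peaks per frame" rule rather than true partial tracking; the input file name is a placeholder, and this is not the algorithm I actually ran.

```python
# A rough sketch of sinusoidal modeling: in each frame, keep the strongest spectral bins,
# resynthesize them, and treat whatever is left as the residual.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

fs, x = wavfile.read("sax_phrase.wav")     # hypothetical mono saxophone recording
x = x.astype(np.float64)

f, t, Z = stft(x, fs=fs, nperseg=2048)

# Keep only the 20 largest-magnitude bins per frame (a crude stand-in for peak tracking)
Z_sines = np.zeros_like(Z)
for j in range(Z.shape[1]):
    top = np.argsort(np.abs(Z[:, j]))[-20:]
    Z_sines[top, j] = Z[top, j]

_, sinusoidal = istft(Z_sines, fs=fs, nperseg=2048)

# Residual = original minus the sinusoidal reconstruction (trimmed to matching length)
residual = x[: len(sinusoidal)] - sinusoidal[: len(x)]
```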

Turing Machine

Together with two CCSC high school students and Ben Nahill, Jack Lepird, and Chad Spensky from MIT Lincoln Laboratory, I created a Turing machine.

A Turing machine is an abstraction of a computer: it reads and writes symbols (here, ones and zeros) on a tape, moving along the tape according to a simple set of rules.

Here is a demo of the Turing Machine:

A full write-up of what we did can be found here: https://www.ll.mit.edu/news/StudentsBuildReplicaOfTuringMachine.html

Code for the Turing machine can be found here: https://github.com/timothydgreer/turing_machine
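To give a flavor of the read/write/move cycle described above, here is a toy simulator sketch in Python; the transition table (which flips every bit and then halts) is made up for illustration and is not the machine we built.

```python
# Toy Turing machine simulator: transitions map (state, symbol) -> (new_symbol, move, new_state).
from collections import defaultdict

def run(tape, transitions, state="start"):
    """Run the machine until it reaches the 'halt' state, then return the tape contents."""
    cells = defaultdict(lambda: "0", enumerate(tape))
    head = 0
    while state != "halt":
        symbol = cells[head]
        new_symbol, move, state = transitions[(state, symbol)]
        cells[head] = new_symbol                  # write
        head += 1 if move == "R" else -1          # move the head left or right
    return "".join(cells[i] for i in sorted(cells))

# Example machine: flip every bit, halt at the first blank cell
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}
print(run("1011 ", flip_bits))   # -> "0100 "
```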

Movie Recommender System

I wanted to get a flavor for how Netflix creates its recommendations for users, so I created a recommender system that predicts which movies a user may like based on that user's favorite (and least-favorite) movies. Using a database of 1692 movies and ratings from 943 users (from IMDB), I rated movies that I enjoyed and used my algorithm to predict which unwatched movies I might enjoy. Here were 5 movies that I used as part of my input (a rough sketch of the collaborative-filtering idea appears after the results below):

Rated 5 for Toy Story (1995)

Rated 4 for GoldenEye (1995)

Rated 4 for Seven (Se7en) (1995)

Rated 1 for Spellbound (1945)

Rated 3 for Rosencrantz and Guildenstern Are Dead (1990)

Here was the output (I'm only including the top 5 recommendations):

Predicting rating 8.7 for movie Shawshank Redemption, The (1994)

Predicting rating 8.6 for movie Good Will Hunting (1997)

Predicting rating 8.5 for movie Usual Suspects, The (1995)

Predicting rating 8.5 for movie Schindler's List (1993)

Predicting rating 8.4 for movie Star Wars (1977)

I can say that although I might be a biased user, I enjoyed my recommendations!
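Here is a rough sketch of the collaborative-filtering idea behind recommendations like these: low-rank matrix factorization on a tiny made-up ratings matrix. The linked repository below contains the actual implementation and data I used.

```python
# Minimal sketch of collaborative filtering by low-rank matrix factorization.
# The ratings matrix here is a toy example (0 = unrated).
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 4, 0, 1],      # rows = users, columns = movies, 0 means "not rated"
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
rated = R > 0

k, lam, lr = 2, 0.1, 0.01                         # latent dims, regularization, learning rate
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
M = rng.normal(scale=0.1, size=(R.shape[1], k))   # movie factors

for _ in range(5000):
    err = (U @ M.T - R) * rated                   # only penalize observed ratings
    U -= lr * (err @ M + lam * U)
    M -= lr * (err.T @ U + lam * M)

predictions = U @ M.T
print(np.round(predictions, 1))                   # predicted ratings, including unrated entries
```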

If you want to get some recommendations yourself, feel free to check out my project on my GitHub: https://github.com/timothydgreer/machine_learning/tree/master/HW8

For help, see the README in the GitHub repository.

Digit Recognition

Would you call this a 0 or a 6?

My neural network, built for optical character recognition, classifies this number as a 6.
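For a sense of what the classifier does at inference time, here is a minimal sketch of the forward pass through a small feedforward network; the layer sizes and weights below are illustrative placeholders, while the trained weights live in the linked repository.

```python
# Minimal sketch of a feedforward classifier's forward pass: flatten the digit image,
# apply one hidden layer, and return the most activated output class. Weights are random
# placeholders for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(image_vec, W1, b1, W2, b2):
    """image_vec: flattened grayscale digit image; returns the predicted class index."""
    hidden = sigmoid(W1 @ image_vec + b1)
    output = sigmoid(W2 @ hidden + b2)
    return int(np.argmax(output))          # index of the most activated output unit

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(25, 400)), np.zeros(25)   # placeholder hidden-layer weights
W2, b2 = rng.normal(size=(10, 25)), np.zeros(10)    # placeholder output-layer weights
print(predict(rng.random(400), W1, b1, W2, b2))
```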

Try the algorithm for yourself! See my GitHub: https://github.com/timothydgreer/machine_learning/tree/master/HW3/ and run ex3_nn to see it in action.

Performances

I've been blessed to play with some incredible musicians on saxophone and piano. Below are some highlights from my senior recital at Washington University in St. Louis, where I'm backed by some very talented musicians from the St. Louis area. We played "Star Eyes," "These Foolish Things," and "Just Friends." I'm playing tenor saxophone here, trying to emulate the styles of Oliver Nelson (a fellow Wash U alumnus!) and Dexter Gordon.

https://soundcloud.com/user-53962697/senior-concert-highlights

I had the honor of playing in Petra and the Priorities at Washington University in St. Louis. We opened for the Gym Class Heroes, Fitz and the Tantrums, and the Dum Dum Girls. Here's a sample from this band, which was steeped in funk, soul, and Motown. I'm playing sax here on a song penned by the band:

https://soundcloud.com/petraandthepriorities/original (Link no longer works)