Rhythmic Motif Detection in Hip-Hop
By Robert Blaauboer
Differences in Flow
An emcee, as vocalists in Hip-Hop are called, has his or her own flow. Flow, defined by Krims (2000) as 'an MC's rhythmic delivery', is related to the kind of internal rhythm an emcee has. Adams (2009) further defined it as 'the rap equivalent of technique for instrumentalist'. Examples of these techniques are metrical techniques, such as the placement of rhyming syllables or of accented syllables, and articulative techniques, such as the amount of legato used or the degree of articulation of consonants. A clear example of differences in flow can be found by comparing old school Hip-Hop (Kurtis Blow - Basketball) with modern Hip-Hop (Aesop Rock - None Shall Pass).
Recent Work
Until recently, Hip-Hop received relatively little research attention. Work by Mitchell Ohriner (2015) compares the approach taken by three emcees on a single track with an ambiguous rhythm. Concurrently, a new digital corpus of rap transcriptions called MCFlow has been developed by Nathaniel Condit-Schultz (2016). This recent work sets out the different layers of an emcee's flow (see figure below). Across these four layers (Rhyme, Pitch Stress, Syllable Stress and the Surface layer), boundaries can be created using rhythmic breaks and pitch contours, which yields nestable prosodic units.
Motivation
In musicological research on Hip-Hop, transcriptions of songs are a valuable asset. By developing a way to detect rhythmic motifs within songs, we could possibly analyse the entire corpus of a single emcee. In this way the rhythmic tendencies of an emcee could be determined, and further analysis could be done on similarities between emcees.
Data
In order to detect rhythmic motifs based on syllable timings, an annotation of syllable timings was needed. These syllable timings correspond to the Surface layer described above. A single verse was annotated in each of two songs by Aesop Rock ('None Shall Pass' and '39 Thieves'). His relatively monotone delivery relies on rhyme and syllable stress rather than pitch stress, so rhythmic motif detection based on his syllable timings should give good results. Across the two songs a total of 465 syllables were annotated.
Method
In order to detect rhythmic motifs we have to look for syllable groups with similar relative timings. To find similarly timed groups we use Dynamic Time Warping. This algorithm takes two sequences of the same length and calculates how much it has to adjust each point for the sequences to line up. This distance can be interpreted as an error of sorts. The segments that are compared are between 3 and 10 syllables long. Syllable groups of length 2 would correspond too strongly with the beat, and groups of more than 10 syllables might contain two motifs.
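The comparison of two syllable groups can be sketched as follows. This is a minimal illustration rather than the code used for the results: it assumes that relative timings are onsets measured from the first syllable of the group, and that the local cost is the absolute timing difference. The function names and the onset values are made up for the example.

```python
def relative_timings(onsets):
    """Shift a group of syllable onsets (seconds) so the first one is at 0.

    Assumption: 'relative timing' means offset from the group's first syllable.
    """
    first = onsets[0]
    return [t - first for t in onsets]


def dtw_distance(a, b):
    """Textbook Dynamic Time Warping cost between two numeric sequences.

    The local cost is the absolute difference between two timings (assumption).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical onset times (seconds) for three 4-syllable groups.
group_a = relative_timings([10.0, 10.25, 10.5, 11.0])
group_b = relative_timings([42.0, 42.25, 42.5, 43.0])
group_c = relative_timings([60.0, 60.5, 61.5, 62.0])

print(dtw_distance(group_a, group_b))  # same rhythm at a different position
print(dtw_distance(group_a, group_c))  # different rhythm, larger error
```

Because the timings are taken relative to the start of each group, two groups with the same rhythm give an error of zero no matter where in the verse they occur.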
If we compute the distance between all syllable sequences of the same length, we get the following plot. Here the total error between a syllable group and all other syllable groups is plotted.
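A rough sketch of how such a 'total error' curve could be computed is given below. It assumes a sliding window over the annotated onsets and sums the DTW distance from each group to every other group of the same length. The helper names and the toy onsets (which include a long pause) are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook DTW cost with absolute difference as local cost (assumption)."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[-1][-1]


def windows(onsets, length):
    """Relative timings of every run of `length` consecutive syllables."""
    return [[t - onsets[s] for t in onsets[s:s + length]]
            for s in range(len(onsets) - length + 1)]


def total_errors(onsets, length):
    """For each group, the summed DTW distance to all other groups."""
    groups = windows(onsets, length)
    return [sum(dtw_distance(g, h) for j, h in enumerate(groups) if j != i)
            for i, g in enumerate(groups)]


# Toy onsets in seconds: steady half-second syllables with one long pause.
onsets = [0.0, 0.5, 1.0, 1.5, 5.0, 5.5, 6.0, 6.5]
errors = total_errors(onsets, 3)
for start, err in enumerate(errors):
    print(start, err)
```

In this toy input the groups that span the pause receive the largest total error, while the evenly spaced groups on either side of it receive the smallest.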
The peak in the middle corresponds to a pause of multiple seconds. Since a syllable group containing such a gap does not resemble any other group, a very large error occurs here. By zooming in on a section of the graph we gain a little more insight.
For each length we can see certain local minima, which should indicate syllable groups that are similar to other syllable groups. Selecting one of those local minima, we can plot the individual error between this group and every other group. By selecting the local minima below a certain threshold we hopefully find the positions of syllable groups with similar relative timings.
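Selecting local minima below a threshold could look roughly like the sketch below. The function name, the example errors and the threshold value are illustrative assumptions; the first and last positions are skipped because they have only one neighbour.

```python
def local_minima_below(errors, threshold):
    """Indices where the error dips below both neighbours and the threshold."""
    hits = []
    for i in range(1, len(errors) - 1):
        is_dip = errors[i] < errors[i - 1] and errors[i] <= errors[i + 1]
        if is_dip and errors[i] < threshold:
            hits.append(i)
    return hits


# Hypothetical errors between one selected group and every other group.
errors = [5.0, 2.0, 4.0, 1.0, 6.0, 3.0, 7.0]
print(local_minima_below(errors, threshold=3.0))  # -> [1, 3]
```

The dip at index 5 is a local minimum but is not below the threshold, so it is not reported.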
Plotting the relative timings of the selected syllable groups will indicate whether these are similar. A similar curve in the figure below indicates that these 3-syllable segments have similar timings and could be considered a motif (although a 3-syllable motif is very short and might be more indicative of another feature of the emcee's flow).
In order to find all motifs of a certain length we can take all the local minima of the 'total error' plot and find all the groups that correspond to those minima. Certain segments will inevitably occur multiple times, so we keep only the unique syllable groups. Another improvement can be made if we consider that two syllable groups at the same position are likely part of the same motif, and that the shorter segment is likely a part of the longer segment. If we look at the plot below we can see that these are likely to be two rhythmic motifs of 5 syllables, since the curves diverge along two paths.
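The clean-up step described above can be sketched as follows, assuming each detected group is stored as a (start position, length) pair. That representation and the function name are hypothetical choices for this example.

```python
def merge_groups(groups):
    """Drop duplicates and keep only the longest group per start position.

    `groups` is an iterable of (start, length) pairs (assumed representation).
    A shorter group starting at the same syllable is treated as part of the
    longer one.
    """
    longest = {}
    for start, length in groups:
        if length > longest.get(start, 0):
            longest[start] = length
    return sorted(longest.items())


# Hypothetical detections: one duplicate and one nested shorter segment.
detected = [(4, 3), (4, 5), (4, 3), (10, 4)]
print(merge_groups(detected))  # -> [(4, 5), (10, 4)]
```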
Possible Improvements
Currently the error between segments is an average error over the syllables. A maximum error would detect rhythmic motifs more accurately: at the moment, two sequences that coincide completely except for a single large deviation get a relatively low error, even though they are rhythmically very different.
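The difference between the two error measures can be illustrated with a small example. This is a simplified sketch that compares syllables position by position, without the warping step, and the timings are invented.

```python
def mean_error(a, b):
    """Average absolute timing difference per syllable."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def max_error(a, b):
    """Largest absolute timing difference over all syllables."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two hypothetical 5-syllable groups (relative timings in seconds) that
# coincide everywhere except for a one-second deviation on the last syllable.
group_a = [0.0, 0.5, 1.0, 1.5, 2.0]
group_b = [0.0, 0.5, 1.0, 1.5, 3.0]

print(mean_error(group_a, group_b))  # small, the deviation is averaged out
print(max_error(group_a, group_b))   # large, the deviation dominates
```

With a threshold somewhere between the two values, the average error would accept this pair as a motif while the maximum error would reject it.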
The current annotation of syllables is based on the surface layer containing every single syllable. A second annotation layer indicating which syllables are stressed could also increase accuracy since these stresses are important to rhythmic motifs and are very likely to coincide.
References
Krims, Adam (2000). Rap Music and the Poetics of Identity. Cambridge: Cambridge University Press.
Adams, K. (2009). On the Metrical Techniques of Flow in Rap Music. Music Theory Online.
Ohriner, Mitchell (2015). Metric Ambiguity and Flow in Rap Music: A Corpus-Assisted Study of Outkast's "Mainstream" (1996).
Condit-Schultz, Nathaniel (2016). A Digital Corpus of Rap Transcriptions.