What features of music influence our perception of the pulse?
Abstract
In this experiment, participants were asked to annotate excerpts from six pieces of western popular and rock music, first only listening to the isolated drums and then to the full recordings. The differences between annotations were analysed to find relationships between the instrumentation and the perception of the pulse.
Introduction
When listening to music, people often refer to features such as the ‘groove’, ‘swing’, the drummer being ‘ahead’ or ‘behind’ the beat, or the perception of ‘pushing’ and ‘pulling’. The use of these expressions suggests that the perceived timing of the pulse might be independent of the exact timing of the onsets of any single instrument. Instead, it is possible that listeners extrapolate the pulse from the relative timing of the onsets of multiple instruments, which could have implications for both beat tracking research and music production.
For example, it might be an important consideration during the implementation and evaluation of beat tracking algorithms, especially if they are designed to imitate human tapping or if they are being evaluated against hand-annotated datasets. Furthermore, gaining a deeper understanding of how the rhythmical relationships between instruments affect the perception of the pulse could be used to inform decisions during the production of music.
Method
To measure the relationship between instrumentation and the perception of the pulse, a dataset was created containing 40-second excerpts of six pieces of western popular and rock music. The dataset included two versions of each of these recordings; the first version with only the isolated drums and the second version with all the instruments and vocals as they appear on the released recording. The first three songs chosen for the dataset were recent and most likely recorded to a click track. The other three songs were recorded earlier, before using a click track during recording was as common as it is today; these were most likely not recorded to a click track. Six solo click track recordings were also included at the average tempo detected for each of these recordings.
Isolated drum recordings were chosen for two reasons. First, drums play a vital role in denoting the pulse and rhythm in western popular music. Second, the onsets of drums are easily detected, which was important to obtain the ground truth for the evaluation of the data collected.
The ground truth dataset was generated using BeatRoot to automatically annotate the isolated drum recordings. BeatRoot uses spectral flux to achieve accurate onset detection. Correct detection of the tempo octave was not essential for this experiment, as the method for evaluating the datasets was designed to account for differences in tempo octave perception between the ground truth and the participant annotations. Furthermore, the absolute position of the detected onsets would not affect the results, as the evaluation focuses on differences between annotations rather than on absolute values.
Five participants took part in this experiment. They were asked to produce annotations for each of the excerpts in the dataset using Sonic Visualiser. The participants were four males and one female, aged between 25 and 35.
Analysis
For the analysis, each time instant in the participant annotation was matched to the closest time instant in the ground truth annotation. If the time difference between the two instants was larger than half of the mean of the inter-onset intervals (IOIs) of the ground truth annotations minus the standard deviation of the IOIs of the ground truth dataset, the time instant was excluded from the analysis. Where multiple participant annotations were close to the same ground truth annotation, only the closest match was selected. For each match found, the timing difference between the ground truth and the participant annotation was measured as the delay of the participant annotation relative to the ground truth annotation.
The first step of the analysis was to measure the median standard deviation across all six excerpts for each participant and for each category: ‘Click’, ‘Beat’ and ‘Track’. The worst performance of each participant was excluded from their statistic. The following table shows the standard deviations for each category, measured in milliseconds:
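The matching procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the code used in the experiment: the function name `match_annotations` is hypothetical, times are assumed to be in seconds, and the exclusion threshold is read as half the mean IOI minus the IOI standard deviation, which is one possible interpretation of the rule.

```python
from statistics import mean, stdev


def match_annotations(taps, truth):
    """Match each tap to its closest ground truth beat.

    Returns a dict {ground_truth_index: delay}. A positive delay means
    the tap came after the ground truth beat. Needs at least 3 beats.
    """
    iois = [b - a for a, b in zip(truth, truth[1:])]
    # Assumed reading of the threshold: half the mean IOI, minus the IOI std.
    threshold = 0.5 * mean(iois) - stdev(iois)

    matches = {}
    for tap in taps:
        # Index of the ground truth beat closest to this tap.
        idx = min(range(len(truth)), key=lambda i: abs(truth[i] - tap))
        delay = tap - truth[idx]
        if abs(delay) > threshold:
            continue  # too far from any beat: excluded from the analysis
        # When several taps compete for one beat, keep only the closest.
        if idx not in matches or abs(delay) < abs(matches[idx]):
            matches[idx] = delay
    return matches
```

With a perfectly regular ground truth the IOI standard deviation is zero, so the threshold reduces to half a beat period; the more the drum tempo fluctuates, the stricter the threshold becomes.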
Table 1:
For many participants, accuracy improves for the isolated drum recordings. This could be the first indication that the presence of other instruments in the recording affects the timing of participants’ tapping. Even though the measurements were taken against a ground truth obtained from the isolated drum annotations, the increase in standard deviation for the full recordings still suggests that the participants based their perception of the pulse on other elements in the recording when these were present. Therefore, the increase does not necessarily mean decreased accuracy.
This is probably not the case for the first category, the ‘Click’, which very often has the worst score. As this category does not present any ambiguity in interpretation, the low accuracy of the annotation for most participants could be due to lack of music training or lack of experience of playing with a click track.
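The per-category statistic behind Table 1 could be computed along these lines. This is only a sketch under stated assumptions: the ‘worst performance’ is taken to be the excerpt with the largest standard deviation, the population standard deviation is used (the text does not say which variant was applied), and the name `category_score` is hypothetical.

```python
from statistics import median, pstdev


def category_score(delays_per_excerpt):
    """Median standard deviation (ms) over excerpts, worst excerpt dropped.

    delays_per_excerpt: one list of measured delays (ms) per excerpt,
    all for a single participant and a single category.
    """
    spreads = sorted(pstdev(delays) for delays in delays_per_excerpt)
    # Assumption: the "worst" performance is the largest standard deviation.
    return median(spreads[:-1])
```

Dropping the worst excerpt before taking the median keeps a single badly tapped excerpt from dominating a participant's score.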
The next stage of the analysis was to compare the scores for each music excerpt separately and look for trends and patterns.
Excerpt 1
Table 2:
This table shows statistics of the measured delay (in milliseconds) in tapping compared to the ground truth annotations for each participant for the isolated drums (‘Beat’) and full recording (‘Track’) for excerpt 1. The last part of the table shows the change in each of the values obtained by subtracting each ‘Beat’ value from the corresponding ‘Track’ value. The ‘Average Difference’ row shows the mean average for each of the differences, signifying if there is a strong trend between participants to shift the timing of their tapping in a single direction. The last row shows the mean average of the absolute values of the differences. This value is useful to distinguish the average amount of shift in either direction.
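The two summary rows described above can be illustrated with a short sketch. The function name `summarise_shift` and the numbers in the usage note are hypothetical and are not taken from the experiment's data.

```python
from statistics import mean


def summarise_shift(beat_values, track_values):
    """Summarise per-participant changes between 'Beat' and 'Track' values (ms)."""
    # Change for each participant: 'Track' value minus 'Beat' value.
    diffs = [t - b for b, t in zip(beat_values, track_values)]
    return {
        "differences": diffs,
        # Signed mean: shows whether participants shift in a common direction.
        "average_difference": mean(diffs),
        # Mean of absolute values: size of the shift regardless of direction.
        "average_absolute_difference": mean([abs(d) for d in diffs]),
    }
```

For example, made-up ‘Beat’ delays of 10 and 20 ms with ‘Track’ delays of 4 and 24 ms give differences of -6 and 4 ms, an average difference of -1 ms and an average absolute difference of 5 ms. A signed average near zero combined with a large absolute average would indicate that participants shifted, but in opposite directions.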
For this particular excerpt, there is a clear trend: when presented with the full recording, all the participants responded by tapping earlier, resulting in negative values in the last part of the table, which signify a decrease in the measured delay from the ground truth. Could this mean that the drummer is playing ‘behind’ the beat, i.e. the drum onsets occur later than the onsets of the other instruments? Given that the delay values were measured against the detected onsets of the drums, the decrease in delay signifies that the participants were tapping closer to the onsets of the drums when listening to the full recording. The question is whether the participants were building an expectation of the next drum hit, or whether they were merely following the drum recording. In the latter case, the presence of instruments with earlier onsets and the drummer playing ‘behind the beat’ would explain this trend.
However, this might not be the only explanation. For example, it might be easier to predict the next beat when listening to a full band. Another reason could be that the presence of other instruments creates a feeling of increased ‘energy’ resulting from features other than the relative timing of onsets. These features could be timbre (more bass), perceived loudness, or a sense of movement produced by compression and similar processing.
The plots below show the measured delay for both the ‘Beat’ and ‘Track’ annotations for each participant shown against the ground truth annotations.
Figure 1:
These plots provide further insight, as they reveal some interesting patterns. The plot for participant 1 shows that after the first five seconds (perhaps the time that it took to adjust to the tempo and rhythmical pattern), the patterns for each version of the excerpt look almost identical, with only the ‘Track’ annotation shifted lower (decreased delay) and later in time (relative to the x-axis). The peak between 15-20 seconds is similar to the peak found in the ‘Beat’ annotation of participant 5 and the trough in both the ‘Beat’ and ‘Track’ annotations of participant 4. These patterns can be explained as a reaction to a short break in the drum beat that occurs at this time.
Another interesting observation is that the plots of the ‘Beat’ vs. the ‘Track’ annotations look like mirrored versions of each other in annotations produced by participants 2, 4 and 5. These could possibly be signs of ‘pushing’ and ‘pulling’ interactions between the musicians. The plot of participant 5 shows oscillating patterns; this could mean that this participant was paying more attention to shorter structural patterns within the music.
Excerpt 2
Table 3:
Figure 2:
This table shows another notable decrease in the delay for the full recording for all of the participants. Once again, the plots share common patterns and features. For example, many of the curves show a sharp change, a peak or a trough between 15-20 seconds; this is a reaction to a structural change in the excerpt, reflected both in the drum beat and in the other elements, including the melody. There are characteristic patterns present in both the ‘Beat’ and ‘Track’ annotations for each participant. Sometimes, a peak in one graph manifests as a trough in another.
Excerpt 3
Table 4:
For the third excerpt, the table shows that the difference between the average and median timing of ‘Beat’ and ‘Track’ annotations is not as large as for the previous two excerpts. Does this mean that participants did not react to the changes in instrumentation as in the previous two excerpts?
It is worth noting that this excerpt has a much higher tempo than the previous two excerpts. The table below shows the tempo that was measured for each of the music excerpts.
Table 5:
The excerpt being at a higher tempo means that there is less time for the tapping to be late or early. It might therefore be useful to express this value as a percentage of the IOI instead, to see the amount of relative change that has occurred.
Table 6:
We can see that even with the difference expressed as a percentage of the IOI calculated for the given bpm, the change occurring in the third excerpt is much smaller.
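The conversion from a difference in milliseconds to a percentage of the IOI can be sketched as below. It assumes that the IOI is the beat period implied by the measured tempo, i.e. 60000 / bpm milliseconds; the function name `shift_as_percent_of_ioi` is hypothetical.

```python
def shift_as_percent_of_ioi(diff_ms, bpm):
    """Express a timing difference (ms) as a percentage of the beat period."""
    # Assumption: the IOI equals the beat period at the measured tempo.
    ioi_ms = 60000.0 / bpm
    return 100.0 * diff_ms / ioi_ms
```

For instance, at 120 bpm the beat period is 500 ms, so a shift of -25 ms corresponds to -5% of the IOI, while the same -25 ms at 60 bpm would be only -2.5%.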
Figure 3:
By examining the plots, we can see that patterns that could be related to the musical structure are common. Patterns that could be explained by a change in the perceived timing of the pulse also appear. For example, participant 1 shows almost identical-looking plots for both the ‘Beat’ and ‘Track’ annotations, with the ‘Track’ annotation showing a slight increase in delay. Referring to Table 1, we can see that this participant has the lowest measured standard deviation for the ‘Click’ category out of all participants. This could suggest musical training and therefore increased accuracy at high tempi.
Two other patterns worth noting are those produced by participants 2 and 3. These share similar traits: the contours of the curves are similar, starting low, reaching a peak at around 15-20 seconds and then decreasing again. For both of these participants, the ‘Track’ delay is consistently lower than the ‘Beat’ delay.
Excerpts 4-6
Table 7:
Figure 4:
Table 8:
Figure 5:
Table 9:
Figure 6:
The analysis of excerpts 4-6 shows no notable differences or patterns that would suggest that a systematic shift in the tapping delay, associated with a difference in the perception of the pulse timing, has occurred.
Results
The analysis of the first two excerpts has shown patterns that would be expected if there was a shift in the perception of the pulse timing; this presents as a consistent decrease in the measured tapping delay that persists for the entire length of the annotation. This change is also reflected in the statistical measurements as a noticeable difference between the average tapping delay in the ‘Beat’ and ‘Track’ annotations.
The third excerpt has shown only a slight difference in the tapping delay. However, three of the participants’ annotations show that tapping delays were consistently lower for either the ‘Beat’ or the ‘Track’ version of the excerpt, depending on the participant.
Momentary peaks and troughs are often present in both ‘Beat’ and ‘Track’ annotations for all the excerpts and are similar between different participants. However, these are more indicative of structural changes.
There were no signs of a shift in the perception of pulse timing in the last three excerpts. One factor that could be responsible for this is that all of the excerpts start with a drum intro, with the rest of the band joining in after a few bars. Some of the participants have shown a reaction to this change for some of the excerpts. However, it presented itself as a short-term change as opposed to a steady shift in the tapping delay. It is possible that once the pulse timing has been established by the drummer, it is ‘resistant’ to change. This would have to be tested.
Another factor is that the last three excerpts were most likely not recorded to a click track. However, the expectation in such cases would be to observe the opposite trend - a greater difference between the delay measured for the isolated drum track and the complete recording, resulting from the musicians using timing more freely as an expressive tool.
Regarding implications for the design and evaluation of beat tracking algorithms, we can see from the data that the range between the minimum and maximum delay with respect to the ground truth annotations can be higher than 100 ms in some cases. It is common to use windows of 50-70 ms for beat tracking evaluation. Some evaluation methods, such as Cemgil, penalise the accuracy of an estimated beat location based on its distance from the closest ground truth annotation. If a hand-annotated dataset is used for the evaluation, it is possible that many of the hand annotations would fall outside the evaluation window or lie far away from prominent onsets. This would result in a lower measured performance of beat tracking algorithms.
Although this experiment has found some evidence that the human perception of pulse timing can shift depending on the instrumentation, the hypothesised reasons for these shifts are only speculation. It is not clear why these shifts are clearly present in the first two excerpts and not in the last three. It seems that the presence of a click track during the recording could have an effect on this. As of now, the results are inconclusive. In future experiments, this could be overcome by using a larger database and recruiting more participants to produce the annotations. A thorough analysis of the music included in the database could also help to clarify some of the questions posed by this experiment.
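To illustrate how annotation offsets of this size interact with evaluation, the sketch below contrasts a fixed tolerance window with a Gaussian distance-based score in the spirit of the Cemgil measure. It is a simplified illustration, not a reference implementation: the function names are hypothetical, the 70 ms window is taken from the range quoted above, and the 40 ms Gaussian width is an assumed value.

```python
import math


def window_hit_rate(annotations, beats, window=0.07):
    """Fraction of annotations with a detected beat within +/- window seconds."""
    hits = sum(
        1 for a in annotations if any(abs(b - a) <= window for b in beats)
    )
    return hits / len(annotations)


def gaussian_accuracy(annotations, beats, sigma=0.04):
    """Cemgil-style score: Gaussian-weighted distance to the nearest beat.

    sigma (seconds) is an assumed width; the score is normalised by the
    mean of the two sequence lengths so extra or missing beats lower it.
    """
    total = sum(
        max(math.exp(-((b - a) ** 2) / (2 * sigma ** 2)) for b in beats)
        for a in annotations
    )
    return total / ((len(annotations) + len(beats)) / 2)
```

Under the fixed window, a tap that is 120 ms from the nearest detected beat counts as a complete miss, whereas the Gaussian score degrades gradually with distance; in both cases a systematic offset in the hand annotations lowers the measured performance.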
Limitations
This experiment involved only a small number of participants (five) and a small dataset (six songs). Therefore, the results are not representative and cannot be generalised.
There are a few possible sources of error, mainly regarding the conditions in which the participants performed the annotations. These were not controlled, and it is possible that the choice of equipment and environment could have an effect on the results.
The annotations were performed by tapping a key on a computer keyboard. It can be difficult to perform timing annotations precisely using a computer keyboard, especially if the participants did not have previous experience using a computer keyboard as an input device for musical information.
References
Prögler, J.A., 1995. Searching for swing: Participatory discrepancies in the jazz rhythm section. Ethnomusicology, 39(1), pp.21-54.
Keil, C., 1987. Participatory discrepancies and the power of music. Cultural Anthropology, 2(3), pp.275-283.
McKinney, M.F. and Moelants, D., 2006. Ambiguity in tempo perception: What draws listeners to different metrical levels?. Music Perception: An Interdisciplinary Journal, 24(2), pp.155-166.
Davies, M.E., Degara, N. and Plumbley, M.D., 2009. Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06.
Dixon, S., 2007. Evaluation of the audio beat tracking system BeatRoot. Journal of New Music Research, 36(1), pp.39-50.
Excerpt 1
Bellamy, M. (2006). Supermassive Black Hole. [Black Holes and Revelations].
Excerpt 2
Reznor, T. (2005). The Hand That Feeds. [With Teeth].
Excerpt 3
Grohl, D. (1997). Monkey Wrench. [The Colour and the Shape].
Excerpt 4
Paich, D. (1982). Rosanna. [Toto IV].
Excerpt 5
Sting. (1978). Can’t Stand Losing You. [Outlandos d’Amour].
Excerpt 6
Wonder, S. (1972). Superstition. [Talking Book].