DrumNet: High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction

Sony Computer Science Laboratories (Sony CSL), Paris, France

Sony CSL Paris develops technology for AI-assisted music production. The goal is not to replace musicians but to give them better tools for realizing their creative ideas more efficiently. DrumNet is based on an artificial neural network that learns rhythmic relationships between different instruments and encodes these relationships in a 16-dimensional style space. A comparable tool is the Logic Pro X Drummer, which lets the user specify the playing style by navigating a two-dimensional space. Unlike the Logic Pro X Drummer, however, DrumNet dynamically adapts to the existing music. In its current form, DrumNet can autonomously generate kick drum tracks (following the statistics of the training data), be controlled by manually navigating the style space, or be used to extract a style from an existing piece.

Unlike many other generative music technologies, DrumNet takes existing audio tracks directly as input and generates the kick drum track as audio output. Working directly on audio makes DrumNet more useful for music production than models based on MIDI input. We demonstrate the generality of the model by providing many examples of full songs with different generated kick drum tracks on this website. As a proof of concept, we trained the model only on kick drum rhythms, but we are currently extending it to generate a whole drum set.

This page contains supplementary material to the paper:

"High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction", Stefan Lattner and Maarten Grachten, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019), New Paltz, New York, U.S.A., October 20-23. [pdf]

Acknowledgements

We thank Adonis Storr, Tegan Koster, Stefan Weißenberger, Clemens Riedl and Karin Hageneder for their contributions in producing the example tracks.

Please use headphones or good speakers to listen to the examples, as kick drums are not audible otherwise.

The video below shows a demo of the DrumNet prototype, which uses models trained on different percussion types (the paper presents only the kick drum model).

Originals

Drehscheibe

Orgs Waltz


Gipsy Love

Miss You

Samples

Each of the following audio files contains a different generated kick drum track. The style of each kick drum track is determined by a 16-dimensional vector sampled from independent multivariate Gaussians. This style vector defines the relationship between the kick onsets and the onsets of bass and snare, as well as the beats and downbeats (estimated by a downbeat detector). The style vector is tempo- and time-invariant: the model adjusts the tempo and timing of the output according to the input, not according to the style vector.
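As a minimal sketch of the sampling step described above, the snippet below draws several 16-dimensional style vectors from a Gaussian with independent dimensions. The zero mean, unit standard deviation, and the function name `sample_style_vectors` are assumptions made for illustration; the actual distribution parameters used by DrumNet are not given on this page.

```python
import numpy as np

STYLE_DIM = 16  # dimensionality of the style space, as stated in the text


def sample_style_vectors(n_tracks, seed=None, mean=0.0, std=1.0):
    """Draw one style vector per track from a Gaussian with independent dimensions.

    mean and std are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=(n_tracks, STYLE_DIM))


# One row per generated kick drum track.
styles = sample_style_vectors(4, seed=0)
print(styles.shape)
```

Each row would then be passed to the model together with the input audio; different rows yield different kick drum tracks for the same song.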

The screenshot above each set shows a few bars of each generated track. The order of the tracks in the screenshots corresponds to the order of the audio samples, read top to bottom, then left to right.

Gipsy Love

Drehscheibe

Orgs Waltz

Miss You

Style Transfer

In this section, a style vector is inferred from a source song and applied to target songs, transferring the kick drum playing "style" to other tracks. The similarities are not always obvious, as the output is determined not only by the style vector but also by the way the bass and snare are played (the style vector defines the kick drum in relation to them).
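To illustrate why the same style can sound different on different targets, here is a deliberately simplified toy, not the DrumNet network: the "style" is reduced to four hypothetical weights on the context features (bass onset, snare onset, beat, downbeat), and a kick is placed wherever the weighted context reaches a threshold. The function `kick_from_style`, the weights, and the two target patterns are all made up for this example, and the step of inferring a style from a source song is not modelled.

```python
import numpy as np


# Context columns: bass onset, snare onset, beat, downbeat (1 = present at that step).
def kick_from_style(style_weights, context, threshold=1.0):
    """Toy rule: place a kick wherever the weighted context reaches the threshold."""
    activation = np.asarray(context) @ np.asarray(style_weights)
    return (activation >= threshold).astype(int)


# Hypothetical style: kick on downbeats, or when a bass onset coincides with a beat.
style = np.array([0.5, 0.0, 0.5, 1.0])

# Two made-up target songs, four time steps each.
target_a = np.array([
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
])
target_b = np.array([
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
])

kick_a = kick_from_style(style, target_a)  # kicks only on the first step
kick_b = kick_from_style(style, target_b)  # kicks on the first and third steps
```

The same style produces different kick patterns because the bass falls on different steps in the two targets, which is the effect described in the paragraph above.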

Orgs Waltz has a very regular electronic kick drum hitting at every beat. In most target songs, this characteristic is reproduced.

Orgs Waltz -> Drehscheibe

Orgs Waltz -> Gipsy Love

Orgs Waltz -> Orgs Waltz

Orgs Waltz -> Miss You

In the style extracted from Drehscheibe, the kick drum tends to play on beats 1 and 3 of a bar, and sometimes on beats 2 and 4 when a strong bass onset is present.

Drehscheibe -> Drehscheibe

Drehscheibe -> Gipsy Love

Drehscheibe -> Orgs Waltz

Drehscheibe -> Miss You

The kick drum of Gipsy Love tends to play shortly before and on the downbeat, and concurrently with the bass. This tendency can be observed when the style is transferred to the other pieces. Note that Gipsy Love has a time signature of 3/4, but its style can still be reasonably transferred to the other songs, which are in 4/4.

Gipsy Love -> Drehscheibe

Gipsy Love -> Gipsy Love

Gipsy Love -> Orgs Waltz

Gipsy Love -> Miss You

Miss You provides a style vector that causes the kick drum to form a pickup before the actual beat (before beat 1 or 3).

Miss You -> Drehscheibe

Miss You -> Gipsy Love

Miss You -> Orgs Waltz

Miss You -> Miss You

Tempo Invariance

Here, we extracted the style from each song at its original tempo and applied it to time-stretched variants of the same song.

This results in kick drum tracks that are almost identical, but adjusted to the respective tempo.

This shows that style vectors are indeed tempo-invariant, and the model automatically adjusts the output tempo to the input tempo.
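One way to read "almost identical, but adjusted to the tempo" is that the kick onsets keep their positions when measured in beats rather than in seconds. The sketch below checks this for an idealized case; the onset times, the 120 BPM base tempo, and both helper functions are hypothetical values and names chosen for illustration, not data from the examples on this page.

```python
import numpy as np


def onsets_in_beats(onset_times_s, tempo_bpm):
    """Convert onset times in seconds to positions in beats."""
    return np.asarray(onset_times_s) * tempo_bpm / 60.0


def time_stretch_onsets(onset_times_s, tempo_factor):
    """Onset times after playing the track at tempo_factor times the original tempo."""
    return np.asarray(onset_times_s) / tempo_factor


original_tempo = 120.0  # BPM, assumed for this example
kick_onsets = np.array([0.0, 0.5, 1.0, 1.75])  # seconds, made-up pattern
reference_beats = onsets_in_beats(kick_onsets, original_tempo)

# The same tempo factors as in the examples below: 80% to 120%.
for factor in (0.8, 0.9, 1.0, 1.1, 1.2):
    stretched = time_stretch_onsets(kick_onsets, factor)
    beats = onsets_in_beats(stretched, original_tempo * factor)
    # Positions in beats are unchanged; only the timing in seconds differs.
    assert np.allclose(beats, reference_beats)
```

In the real examples the match is only approximate, since the model re-estimates onsets and beats from the stretched audio.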

Drehscheibe

80% Tempo

90% Tempo

100% Tempo

110% Tempo

120% Tempo

Gipsy Love

80% Tempo

90% Tempo

100% Tempo

110% Tempo

120% Tempo

Orgs Waltz

80% Tempo

90% Tempo

100% Tempo

110% Tempo

120% Tempo

Miss You

80% Tempo

90% Tempo

100% Tempo

110% Tempo

120% Tempo
