Demo of "Modeling Interpretation Variations in Music Performance Rendering using CVRNN"

Akira Maezawa, Kazuhiko Yamamoto, Takuya Fujishima (Yamaha Corporation)

Correspondence: akira.maezawa __at__ music.yamaha.com

Demonstrations

(Loading the audio might take a few minutes)

Baroque

demo-baroque.mp3
bach_badinerie.mp3

Classic

demo-classic.mp3

Romantic

chop2816-default.mp3

Interpretation sequence sampled from the prior

chop2816-calm-vector.mp3

Same as on the left, but the interpretation sequence has been offset. Notice that the rendering becomes much calmer, with shorter articulation.

Ragtime

ragtime-default.mp3

Interpretation sequence sampled from the prior

ragtime-calm-vector.mp3

Same song as on the left, but with the interpretation sequence adjusted in the same way as above. Notice that here, too, the playing becomes calmer and softer, showing that the effect of the interpretation vector remains consistent across pieces.
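The adjustment used in the two "calm" demos above amounts to adding a fixed bias to the interpretation sequence sampled from the model's prior. A minimal sketch of that idea, assuming a standard-normal prior and hand-picked latent dimensions (all names and sizes here are illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 32, 16  # time steps and latent dimensionality (assumed values)

# Sample an interpretation sequence z_1..z_T from a standard-normal prior.
z = rng.standard_normal((T, D))

# A fixed offset applied to every frame shifts the whole performance
# toward one region of latent space (the "calm" direction in the demos);
# the direction and scale here are chosen by hand for illustration.
calm_bias = np.zeros(D)
calm_bias[0] = -2.0

z_calm = z + calm_bias  # same sequence, globally offset

# A decoder (not shown) would render z or z_calm into an expressive
# performance; only the latent sequence differs between the two demos.
print(z_calm.shape)
```

Because the same bias vector is reused for both pieces, hearing a consistent change (calmer, softer playing) in both supports the claim that latent directions carry a piece-independent meaning.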

Some comparisons with the baselines

Song 1

raw-1.wav

Original MIDI (Raw)

finale-1.wav

Rendered with Finale

proposed-1.wav

Rendered with the proposed method

Song 2

raw-2.wav

Original MIDI (Raw)

finale-2.wav

Rendered with Finale

proposed-2.wav

Rendered with the proposed method

Visualizing the piano-roll as the interpretation vector is changed

Here, we can see that the articulation (as well as the dynamics and the tempo) changes as we change the interpretation vector. The audio plays the bottom two piano-rolls, first the left one and then the right. You can hear subtle nuances in the articulation as well as the tempo.

laent-change-demo.mp3

Performance generated with the default interpretation vector sampled from the prior

Performance generated with the default interpretation vector sampled from the prior, with an additional bias added