Dance Generation by Sound Symbolic Words

Miki Okamura *1     Naruya Kondo *1    Tatsuki Fushimi *1    Maki Sakamoto *2    Yoichi Ochiai *1

*1 University of Tsukuba    *2 The University of Electro-Communications

We propose a novel approach that generates dances from sound-symbolic words (onomatopoeia) as input. We also created a new dataset of sound-symbolic word and dance pairs, based on the AIST Dance Video Database.

Abstract

This study introduces a novel approach to generate dance motions using onomatopoeia as input, with the aim of enhancing creativity and diversity in dance generation. Unlike text and music, onomatopoeia conveys rhythm and meaning through abstract word expressions without constraints on expression and without need for specialized knowledge. We adapt the AI Choreographer framework and employ the Sakamoto system, a feature extraction method for onomatopoeia focusing on phonemes and syllables. Additionally, we present a new dataset of 40 onomatopoeia-dance motion pairs collected through a user survey. Our results demonstrate that the proposed method enables more intuitive dance generation and can create dance motions using sound-symbolic words from a variety of languages, including those without onomatopoeia. This highlights the potential for diverse dance creation across different languages and cultures, accessible to a wider audience.

Paper Overview

Results in Various Languages

For users to generate dances intuitively, even without prior knowledge of onomatopoeia, the system must produce plausible dances when whatever onomatopoeia comes to mind is used as input as is. The Sakamoto system we employ focuses on the phonemes and phonology of the input word, so there are virtually no restrictions on the onomatopoeia that can be used as input, and even newly coined onomatopoeia are accepted. To demonstrate this experimentally, we additionally created a variety of onomatopoeia inputs and attempted to generate dances from them.

input: "ディンダン"  (叮当 (Mandarin), Dīngdāng, /diŋ˥daŋ˥/)

input: "グィビオ"  (Gwibio (Welsh), /gwibio/)

input: "チャムキロ"  (चम्किलो (Nepali), Chamkilo, /ʧamkilo/)

input: "ブルバー"  (Blubber (German), /ˈblubər/)

input: "ジュジューニー"  (Жужжание (Russian), Zhuzhzhanie, /žuɕˈžaniə/)

input: "ワッケル"  (Wackel (German), /ˈvakəl/)
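
The inputs above are katakana strings, so any word that can be written in katakana can be broken into sound units regardless of its source language. The sketch below is not the Sakamoto system, whose actual feature extraction is not described on this page. It only illustrates, under that assumption, why a sound-based front end needs no fixed vocabulary: it splits a katakana string into morae by attaching small kana to the preceding character. The function name `split_morae` and the `SMALL_KANA` set are hypothetical.

```python
# Illustrative only: a simple mora splitter for katakana input.
# This is NOT the Sakamoto system; names here are hypothetical.

# Small kana that combine with the preceding character into one mora.
# The sokuon (ッ), the long-vowel mark (ー), and ン each count as their own mora,
# so they are deliberately not in this set.
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_morae(word: str) -> list[str]:
    """Split a katakana word into a list of morae."""
    morae: list[str] = []
    for ch in word:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. "デ" + "ィ" -> "ディ"
        else:
            morae.append(ch)
    return morae


if __name__ == "__main__":
    for w in ["ディンダン", "グィビオ", "チャムキロ", "ブルバー", "ジュジューニー", "ワッケル"]:
        print(w, split_morae(w))
```

Because the splitter looks only at characters, a coined word is handled the same way as an established one, which mirrors the property described in the paragraph above.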

Comparison between generated motion and voice waveform

To verify the validity of the generated dance, we compare the audio waveform of the input onomatopoeia with the generated dance. The audio waveform represents the rhythm of the onomatopoeia, and since rhythm is closely related to the meaning of the onomatopoeia, we can investigate semantic plausibility by looking at how the meaning and the waveform relate to the dance movement. The sound-symbolic meanings of the onomatopoeia, as described by ChatGPT, are: a. the sound of some liquid boiling; b. the sound of a cat meowing; c. the sound of something hitting or being hit by a hard object.
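
The comparison above is visual. One possible way to put a number on it, offered only as a sketch and not as the authors' procedure, is to correlate the loudness envelope of the spoken onomatopoeia with the overall speed of the generated motion. The code assumes the audio is a list of samples and the motion is a list of frames, each frame a list of 3D joint positions; the function names, the data layout, and the choice of Pearson correlation are all assumptions.

```python
import math

# Hypothetical helper names and data layout; not taken from the paper.


def rms_envelope(audio, sample_rate, fps):
    """RMS loudness of the audio, one value per motion frame."""
    hop = sample_rate // fps  # audio samples per motion frame
    return [
        math.sqrt(sum(s * s for s in audio[i:i + hop]) / hop)
        for i in range(0, len(audio) - hop + 1, hop)
    ]


def motion_speed(joints, fps):
    """Mean joint speed between consecutive frames.

    joints[t][j] is the (x, y, z) position of joint j at frame t.
    """
    speeds = []
    for prev, cur in zip(joints, joints[1:]):
        dists = [math.dist(p, c) for p, c in zip(prev, cur)]
        speeds.append(fps * sum(dists) / len(dists))
    return speeds


def pearson(xs, ys):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: loudness and movement both alternate between quiet and active.
    audio = [0, 0, 1, 1, 0, 0, 1, 1]
    joints = [[(0, 0, 0)], [(0, 0, 0)], [(1, 0, 0)], [(1, 0, 0)], [(2, 0, 0)]]
    env = rms_envelope(audio, sample_rate=4, fps=2)
    speed = motion_speed(joints, fps=2)
    print(pearson(env, speed))
```

A high correlation would only indicate that movement intensity follows the loudness of the word. It says nothing about whether the motion matches the word's meaning, which still has to be judged as described above.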