Neural double-tracking

A singing voice effect that reproduces the natural randomness of singing voices

Abstract

Double-tracking (DT) is a recording technique in which one singer performs the same phrase two or more times and the takes are combined in a mix to thicken the vocal. It works because a human voice differs slightly every time it sings (inter-utterance variation). Current singing voice synthesizers, by contrast, always produce the same voice from a given musical score (no inter-utterance variation), so they cannot synthesize layered (double-tracked) voices. We propose a DNN-based method that models the inter-utterance variation, together with a post-filter that randomly modulates the synthesized voice without degrading its naturalness as a singing voice, so that the modulated voice can be mixed with the original. Because our method learns the variation from data, it achieves a more natural double-tracked sound than the conventional signal-processing-based method (ADT).
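
The paper listed under the publications names a generative moment matching network in its title. Networks of that kind are typically trained by minimizing the maximum mean discrepancy (MMD) between generated samples and real samples. The following is a minimal sketch of that criterion only, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, and the two-dimensional toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )

rng = np.random.default_rng(0)
# Toy stand-ins (assumed): samples of "natural" inter-utterance variation,
# a generator whose output distribution matches it, and a deterministic
# generator that always outputs the same point (no variation at all).
natural = rng.normal(0.0, 1.0, size=(200, 2))
matched = rng.normal(0.0, 1.0, size=(200, 2))
collapsed = np.zeros((200, 2))

mmd_matched = mmd2(natural, matched)
mmd_collapsed = mmd2(natural, collapsed)
```

In this toy setting the deterministic generator scores a much larger MMD than the one whose samples follow the same distribution as the data, which is the property a moment-matching criterion relies on to reward output that varies the way real data does.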

Examples

Input voice

Output voice

Other samples

Our method can be applied not only to synthesized voices but also to human singers' voices. That is, from a single vocal performance by a user, it automatically generates a naturally layered (double-tracked) voice that sounds as if the singer had performed the phrase several times and the takes had been mixed.
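
To see why the variation between takes matters, consider the mixing step on its own. The sketch below is a toy illustration under stated assumptions, not the proposed method: a sine tone stands in for a voice, the sample rate `SR` is arbitrary, and the second take is a hypothetical one whose pitch drifts randomly by a few cents. The helper names `synth_tone` and `mix_takes` are made up for this example.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed for this toy example)

def synth_tone(f0_hz):
    """Render a sine tone from a per-sample F0 contour (toy stand-in for a voice)."""
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / SR
    return np.sin(phase)

def mix_takes(take_a, take_b):
    """Average two equal-length takes into one mixed signal."""
    return 0.5 * (take_a + take_b)

rng = np.random.default_rng(0)
n = SR  # one second of audio
f0 = np.full(n, 220.0)

# Hypothetical second take: the same note with a slow random pitch drift,
# drawn at 9 control points (in cents) and linearly interpolated per sample.
drift_cents = np.interp(
    np.arange(n), np.linspace(0, n - 1, 9), rng.normal(0.0, 10.0, 9)
)
f0_varied = f0 * 2.0 ** (drift_cents / 1200.0)

take = synth_tone(f0)
identical_mix = mix_takes(take, take)                # the same take mixed with itself
varied_mix = mix_takes(take, synth_tone(f0_varied))  # the take mixed with a varied take
```

Mixing a take with an exact copy of itself returns the original waveform, so nothing is layered. Only a second take that differs from the first changes the mix, which is why a deterministic synthesizer needs some model of inter-utterance variation before double-tracking is possible.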

Publications

Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking," Proc. ICASSP, pp. 7070--7074, Brighton, United Kingdom, May 2019. (preprint) (poster)
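Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, and Hiroshi Saruwatari, "Moment-matching-based random modulation post-filter for DNN-synthesized singing voices and its application to neural double-tracking," IPSJ SIG Technical Report, 2018-SLP-125, no. 20, pp. 1--6, Dec. 2018 (in Japanese). (slide)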