A GAN Based Approach to Lip-Sync 2D Cartoon Animations without Requiring Raw Cartoon Datasets
Abstract
We present a generative adversarial network (GAN) based approach to lip-syncing 2D cartoon animations. Most previous work addresses lip-sync for real-life speaking videos, while lip-sync for 2D cartoon animation has rarely been discussed, even though the traditional workflow for creating 2D cartoon animation is highly time-consuming. The main obstacle to automatically lip-syncing 2D cartoon animation, especially with a deep learning approach, is the lack of datasets of well-lip-synced cartoon animation. This paper therefore presents a GAN-based approach that achieves 2D cartoon animation lip-sync without collecting raw cartoon animation datasets. We construct a cartoon-style speaking-video dataset by applying style transfer to real-life speaking videos, and use the resulting dataset to train a GAN-based lip-syncing model. The results show that our approach generates natural lip-synced cartoon animations, and a user study demonstrates its effectiveness.
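The dataset-construction step described above can be sketched as follows: frames from real-life speaking videos are converted to a cartoon style, then paired with their aligned audio windows to form training samples for the lip-sync GAN. This is a minimal illustrative sketch only; the function names (`style_transfer`, `build_lipsync_dataset`) are hypothetical placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of the cartoon-style dataset construction.
# All names here are placeholders, not the authors' real code.

def style_transfer(frame):
    """Placeholder for a cartoon style-transfer model applied per frame."""
    return {"pixels": frame["pixels"], "style": "cartoon"}

def build_lipsync_dataset(frames, audio_windows):
    """Pair each cartoonized frame with its time-aligned audio window."""
    assert len(frames) == len(audio_windows), "frames and audio must be aligned"
    dataset = []
    for frame, audio in zip(frames, audio_windows):
        cartoon = style_transfer(frame)  # real-life frame -> cartoon style
        dataset.append({"image": cartoon, "audio": audio})
    return dataset

# Toy example: 3 frames paired with 3 aligned audio windows.
frames = [{"pixels": [i]} for i in range(3)]
audio = [[0.0] * 4 for _ in range(3)]
pairs = build_lipsync_dataset(frames, audio)
print(len(pairs))  # 3
```

The resulting (cartoon frame, audio window) pairs would then serve as supervision for the GAN-based lip-syncing model, sidestepping the need for hand-made cartoon footage.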
Examples of the videos generated by our model:
It can also take a video as input:
Video of the demo system:
The input images are from https://www.ghibli.jp/works/, which permits this kind of fair use.
The input videos are from https://www.youtube.com/c/OPAPJP, provided by the OPAP-JP contributors (https://opap.jp/contributors/), and are licensed under CC BY 4.0.
The input audio clips are from Common Voice (https://commonvoice.mozilla.org/), which is released under the CC0 license.