A GAN Based Approach to Lip-Sync 2D Cartoon Animations without Requiring Raw Cartoon Datasets

Abstract

We present a generative adversarial network (GAN) based approach to lip-syncing 2D cartoon animations. Most prior work addresses lip-sync for real-life speaking videos; lip-sync for 2D cartoon animations has rarely been discussed, even though the traditional workflow for creating them is highly time-consuming. The main obstacle to automatically lip-syncing a 2D cartoon animation, especially with a deep learning approach, is the lack of datasets of well-lip-synced cartoon animations. This paper therefore presents a GAN-based approach that achieves 2D cartoon animation lip-sync without collecting any raw cartoon animation dataset. We construct a cartoon-style speaking video dataset by applying style transfer to real-life speaking videos, and use the resulting dataset to train a GAN-based lip-syncing model. The results show that our approach generates natural lip-synced cartoon animations, and a user study further demonstrates its effectiveness.
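For concreteness, the dataset-construction step can be sketched as frame-wise style transfer: decode a real-life speaking video, push every frame through a pretrained photo-to-cartoon generator, and re-encode the result. The sketch below assumes PyTorch and OpenCV; the AnimeGAN2 hub model and the fixed 512x512 frame size are illustrative assumptions, not necessarily the exact setup used in this work.

```python
import cv2
import torch

# Illustrative choice of photo-to-cartoon network: any pretrained
# style-transfer generator would do; this work does not mandate this one.
stylizer = torch.hub.load("bryandlee/animegan2-pytorch:main", "generator",
                          pretrained="face_paint_512_v2").eval()

SIZE = 512  # fixed frame size keeps the generator's output shape predictable

def cartoonize_video(src_path: str, dst_path: str) -> None:
    """Frame-wise style transfer: real-life speaking video -> cartoon-style clip."""
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, (SIZE, SIZE))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (SIZE, SIZE))
        # BGR uint8 -> RGB float tensor in [-1, 1], shape (1, 3, H, W)
        x = torch.from_numpy(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).float()
        x = x.permute(2, 0, 1).unsqueeze(0) / 127.5 - 1.0
        with torch.no_grad():
            y = stylizer(x)[0]
        # [-1, 1] float -> uint8 HWC, then back to BGR for OpenCV
        y = ((y.clamp(-1, 1) + 1.0) * 127.5).byte()
        y = y.permute(1, 2, 0).contiguous().numpy()
        out.write(cv2.cvtColor(y, cv2.COLOR_RGB2BGR))
    cap.release()
    out.release()
```

Note that OpenCV's `VideoWriter` drops the audio track, so the original audio has to be muxed back into each cartoonized clip (e.g., with ffmpeg) before the pairs can be used to train the lip-syncing model.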

Examples of the videos generated by our model:

1.mp4
2.mp4
3.mp4

It can also take a video as input:

video_input.mp4

Video of the demo system:

demo.mov

The input images are from https://www.ghibli.jp/works/ , which permits our fair use of them.

The input videos are from https://www.youtube.com/c/OPAPJP , provided by OPAP-JP contributors (https://opap.jp/contributors/) and licensed under CC-BY 4.0.

The input audio clips are from Common Voice (https://commonvoice.mozilla.org/ ), which is released under the CC0 license.