My Research

Using Conditional GANs for generating beat-like head gestures.

Generating novel realizations of prototypical behaviors, such as head shakes, head nods, or prototypical hand gestures, that are well synchronized with speech requires enough training samples for a model to learn them. We developed a framework that finds arbitrary gestures given only a few instances. Using these samples, we can train a speech-driven model constrained on these gestures.

Adding discourse-related constraints to speech-driven models helps the models capture the characteristic patterns associated with each constraint.
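
As an illustration only, one simple way to supply a constraint to a speech-driven model is to append a one-hot code for the discourse label to every frame of speech features. The sketch below assumes this scheme; the label names, the feature values, and the function `condition_on_constraint` are hypothetical placeholders, not the actual configuration of our models.

```python
# Hypothetical discourse labels, used here only for illustration.
CONSTRAINTS = ["affirmation", "negation", "question", "statement"]

def condition_on_constraint(speech_feats, constraint):
    """Append a one-hot constraint code to every frame of speech features.

    speech_feats: list of frames, each a list of floats
    constraint:   one of CONSTRAINTS
    returns:      list of frames, each extended by len(CONSTRAINTS) values
    """
    one_hot = [0.0] * len(CONSTRAINTS)
    one_hot[CONSTRAINTS.index(constraint)] = 1.0
    return [list(frame) + one_hot for frame in speech_feats]

# Two toy frames with three (made-up) acoustic features each.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_on_constraint(frames, "negation")
```

Every frame of an utterance carries the same code, so the model can associate the label with the motion patterns that co-occur with it.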


[Panels: Original, Constrained, Unconstrained]

One limitation of speech-driven models is that they require all the utterances of the conversational agents (CAs) to be pre-recorded. Text-to-speech (TTS) synthesis offers a flexible solution to this problem. However, if we train the models with natural speech and test them with TTS, there is a mismatch between the training and testing conditions.

We created a parallel corpus of synthetic speech that is time-aligned with the original recordings, for which we have motion capture data. This gives us the opportunity to retrain or adapt the models to TTS and avoid the mismatch.
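
To make the idea of time alignment concrete, here is a minimal sketch under stated assumptions: word boundaries are available for both the natural and the synthetic version of an utterance (for example, from forced alignment), and each synthetic word is stretched or compressed to the duration of its natural counterpart. The function `word_stretch_factors` and the toy timings are hypothetical, and the actual time-scale modification of the audio is not shown.

```python
def word_stretch_factors(natural_words, synthetic_words):
    """Compute per-word time-scaling factors for the synthetic speech.

    Each input is a list of (word, start_sec, end_sec) tuples.
    A factor above 1 means the synthetic word must be stretched,
    below 1 means it must be compressed, to match the natural duration.
    """
    if len(natural_words) != len(synthetic_words):
        raise ValueError("both versions must contain the same number of words")
    factors = []
    for (n_word, n_start, n_end), (s_word, s_start, s_end) in zip(
        natural_words, synthetic_words
    ):
        if n_word != s_word:
            raise ValueError(f"word mismatch: {n_word!r} vs {s_word!r}")
        factors.append((n_word, (n_end - n_start) / (s_end - s_start)))
    return factors

# Toy boundaries in seconds (made-up values).
natural = [("hello", 0.0, 0.5), ("world", 0.5, 1.25)]
synthetic = [("hello", 0.0, 0.25), ("world", 0.25, 0.75)]
factors = word_stretch_factors(natural, synthetic)
```

Once the synthetic words share the timing of the natural ones, the motion capture frames recorded with the original speech can be paired directly with the synthetic audio.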