Projects and Demos

GAN-based high-quality talking face video synthesis

  • Proposed a conditional generation model capable of generating temporally coherent, high-resolution, realistic talking faces with lip sync, natural head movement, and facial expressions.

  • Online demo from National Business Daily

The demo is generated for debugging. It is not the release version.

GAN-based dancing and action generation (exploratory)

  • Proposed a conditional dance generation pipeline with temporally consistent 3D pose estimation and mesh reconstruction.



The demo is generated for debugging. It is not the release version.

CG-rendering-based 3D cartoon digital human

  • Fully controllable 3D avatar with Mixamo skeleton actions and speech-driven facial blendshapes, with an API for user-customized or randomly generated actions.


The demo is generated for debugging. It is not the release version.

Automatic online video composition platform

  • Large-scale batch composition of videos from user-provided data, featuring NLP-based paragraph generation, text-to-speech, digital human interpretation, and data visualization.


The demo is generated for debugging. It is not the release version.

Identifying children with ASD based on their face processing abnormalities

  • Proposed the first machine learning framework for ASD prediction from eye movement patterns.

Child behavior analysis with machine learning methods

  • Collected a large-scale multimodal behavior dataset of children with ASD (including audio, multi-view video, depth, questionnaires, and expert evaluations).

  • Applied machine learning methods to child behavior analysis, including object detection, human pose estimation and tracking, person re-identification, face detection/recognition, and speech recognition.

Speaker and language recognition with phonemic tokenization

  • We present a generalized i-vector framework with phonetic tokenizations and tandem features. First, the tokens used for the zero-order statistics are extended from MFCC-trained GMM components to phonemes, 3-grams, and tandem-feature-trained GMM components. Second, given the posterior probabilities on the tokens, the feature used for the first-order statistics is also extended from MFCC to tandem features.

  • Experimental results are reported on NIST SRE 2010 and NIST LRE 2007. The proposed framework outperforms the i-vector baseline by a relative 45% in terms of equal error rate (EER) and norm minDCF values.

  • This work has been accepted to Interspeech 2014.
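
As an illustration of the two statistics mentioned above, here is a minimal sketch in plain Python. It assumes the standard definitions (zero-order: per-token sum of frame posteriors; first-order: posterior-weighted sum of frame features) and is not the implementation used in the paper. The function name `sufficient_statistics` and the toy data are hypothetical. The point it shows is that the posteriors may come from any tokenizer (GMM components, phonemes, 3-grams), while the features may be of a different kind (for example, tandem features).

```python
from typing import List, Sequence, Tuple


def sufficient_statistics(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zero- and first-order statistics over frames.

    posteriors[t][c]: posterior probability of token c at frame t
                      (from any tokenizer; hypothetical input here)
    features[t][d]:   d-th feature dimension at frame t
                      (need not be the feature the tokenizer used)
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    if not posteriors:
        raise ValueError("at least one frame is required")

    num_tokens = len(posteriors[0])
    dim = len(features[0])
    zero_order = [0.0] * num_tokens
    first_order = [[0.0] * dim for _ in range(num_tokens)]

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zero_order[c] += gamma  # N_c = sum_t gamma_t(c)
            for d, value in enumerate(x_t):
                first_order[c][d] += gamma * value  # F_c = sum_t gamma_t(c) x_t
    return zero_order, first_order


# Toy data (made up): 3 frames, 2 tokens, 2-dimensional features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
N, F = sufficient_statistics(posteriors, features)
```

Because the accumulation only needs per-frame posteriors and per-frame features, swapping the tokenizer or the feature type changes the inputs but not the accumulation itself.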

Multi-Class Semantic Segmentation with Confidence Propagation

  • Intuitively, confidence propagation automatically finds confident superpixel predictions and propagates their confidence to unconfident superpixel predictions, thereby possibly correcting wrong predictions made at those superpixels.
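
The intuition above can be sketched as a single propagation step over a superpixel adjacency graph. This is only a toy illustration under stated assumptions, not the method from the project: confidence is taken to be the maximum class probability, and the names `propagate_confidence`, `threshold`, and `mix` are hypothetical. Unconfident superpixels blend their class distribution with a confidence-weighted average of their confident neighbors.

```python
from typing import Dict, List


def propagate_confidence(
    probs: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
    threshold: float = 0.8,
    mix: float = 0.5,
) -> Dict[int, int]:
    """Return a class label per superpixel after one propagation step.

    probs[s]:     class probabilities predicted for superpixel s
    neighbors[s]: ids of superpixels adjacent to s
    threshold:    assumed cutoff on max probability for "confident"
    mix:          assumed weight given to the neighbors' evidence
    """
    confident = {s for s, p in probs.items() if max(p) >= threshold}
    updated: Dict[int, List[float]] = {}

    for s, p in probs.items():
        sources = [n for n in neighbors.get(s, []) if n in confident]
        if s in confident or not sources:
            # Confident predictions, and isolated unconfident ones, are kept.
            updated[s] = list(p)
            continue
        # Confidence-weighted average of the confident neighbors.
        weights = [max(probs[n]) for n in sources]
        total = sum(weights)
        avg = [
            sum(w * probs[n][k] for w, n in zip(weights, sources)) / total
            for k in range(len(p))
        ]
        updated[s] = [(1 - mix) * a + mix * b for a, b in zip(p, avg)]

    # Final label is the most probable class of the updated distribution.
    return {s: max(range(len(p)), key=p.__getitem__) for s, p in updated.items()}


# Toy chain of three superpixels (made-up numbers): the middle one is
# unconfident and initially leans toward class 1.
probs = {0: [0.9, 0.1], 1: [0.45, 0.55], 2: [0.95, 0.05]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
labels = propagate_confidence(probs, neighbors)
```

In this toy chain the middle superpixel's label flips from class 1 to class 0 because both of its neighbors are confidently class 0.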