Projects and Demos
GAN-based high quality talking face video synthesis
Proposed a conditional generation model capable of generating temporally coherent, high-resolution, realistic talking faces with lip sync, natural head movement and facial expressions.
Online demo from National Business Daily
This demo was generated for debugging and is not the release version.
GAN-based dancing and action generation (exploratory)
Proposed a conditional dancing generation pipeline with temporally consistent 3D pose estimation and mesh reconstruction.
This demo was generated for debugging and is not the release version.
CG rendering based 3D cartoon digital human
Fully controllable 3D avatar with Mixamo skeleton actions and speech-driven facial blendshapes, plus an API for user-customized or randomly generated actions.
This demo was generated for debugging and is not the release version.
Automatic online video composition platform
Large-scale batch composition of videos from user-provided data, featuring NLP-based paragraph generation, text-to-speech, digital human narration and data visualization.
This demo was generated for debugging and is not the release version.
Identifying children with ASD based on their face processing abnormality
Proposed the first machine learning framework for ASD prediction with eye movement patterns.
Children behavior analysis with machine learning methods
Collected a large-scale multimodal behavior dataset of children with ASD (including audio, multi-view videos, depth, questionnaires and expert evaluations).
Analyzed children's behavior with machine learning methods, including object detection, human pose estimation and tracking, person re-identification, face detection/recognition, and speech recognition.
Speaker and language recognition with phonemic tokenization
We present a generalized i-vector framework with phonetic tokenizations and tandem features. First, the tokens used for the zero-order statistics are extended from MFCC-trained GMM components to phonemes, 3-grams and GMM components trained on tandem features. Second, given the posterior probabilities of these tokens, the features used for the first-order statistics are extended from MFCC to tandem features.
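To make the two extensions concrete, here is a minimal sketch of the zero- and first-order (Baum-Welch) statistics that an i-vector extractor consumes, with the token posteriors and the first-order features passed in as separate inputs. It is an illustration under stated assumptions, not the project's code: the function name `baum_welch_stats`, its arguments, and the optional centering by per-token means are all hypothetical. Decoupling the two inputs is the point: the posteriors may come from MFCC-trained GMM components, a phoneme recognizer, or tandem-feature GMMs, while the feature vectors may be MFCC or tandem features.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def baum_welch_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    token_means: Optional[Sequence[Sequence[float]]] = None,
) -> Tuple[Vector, List[Vector]]:
    """Hypothetical helper: zero- and first-order statistics per token.

    posteriors[t][c]: posterior of token c at frame t (from any tokenizer).
    features[t]: feature vector at frame t for the first-order statistics;
        it need not be the feature the tokenizer was trained on.
    token_means: optional per-token means (an assumption) used to center F_c.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("need the same, non-zero number of frames in both inputs")
    num_tokens = len(posteriors[0])
    dim = len(features[0])
    n = [0.0] * num_tokens                       # N_c = sum_t gamma_t(c)
    f = [[0.0] * dim for _ in range(num_tokens)]  # F_c = sum_t gamma_t(c) * y_t
    for gamma_t, y_t in zip(posteriors, features):
        for c, g in enumerate(gamma_t):
            n[c] += g
            for d, y in enumerate(y_t):
                f[c][d] += g * y
    if token_means is not None:
        # Centered first-order statistics: F_c - N_c * m_c
        f = [
            [f[c][d] - n[c] * token_means[c][d] for d in range(dim)]
            for c in range(num_tokens)
        ]
    return n, f
```

For example, with posteriors `[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]` over two tokens and features `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`, the sketch returns N = `[1.5, 1.5]` and F = `[[2.5, 4.0], [6.5, 8.0]]`.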
Experimental results on NIST SRE 2010 and NIST LRE 2007 show that the proposed framework outperforms the i-vector baseline by 45% relative in terms of equal error rate (EER) and normalized minDCF.
This work has been accepted to Interspeech 2014.
Multi-Class Semantic Segmentation with Confidence Propagation
Intuitively, confidence propagation automatically finds confident superpixel predictions and propagates their confidence to unconfident ones, which can correct wrong predictions made at those superpixels.
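The propagation rule itself is not specified here, so the following is only a hypothetical sketch of the intuition, not the project's algorithm. It assumes that a superpixel's confidence is its top class probability, that superpixels at or above an assumed `threshold` stay fixed, and that each unconfident superpixel is repeatedly re-estimated as the confidence-weighted average of its own and its neighbors' class distributions over a superpixel adjacency graph. The function name, threshold and iteration count are illustrative assumptions.

```python
from typing import Dict, List


def propagate_confidence(
    probs: List[List[float]],
    neighbors: Dict[int, List[int]],
    threshold: float = 0.7,
    iterations: int = 5,
) -> List[int]:
    """Hypothetical confidence propagation over a superpixel adjacency graph.

    probs[i]: class probability vector predicted for superpixel i.
    neighbors[i]: indices of superpixels adjacent to superpixel i.
    Returns the final class label (argmax) for each superpixel.
    """
    current = [list(p) for p in probs]  # copy so the input is not mutated
    for _ in range(iterations):
        updated = [list(p) for p in current]
        for i, p in enumerate(current):
            if max(p) >= threshold:
                continue  # confident predictions are kept as they are
            pool = [i] + neighbors.get(i, [])
            # Weight each distribution in the pool by its confidence.
            total = sum(max(current[j]) for j in pool)
            updated[i] = [
                sum(max(current[j]) * current[j][k] for j in pool) / total
                for k in range(len(p))
            ]
        current = updated
    return [max(range(len(p)), key=p.__getitem__) for p in current]
```

In a three-superpixel chain where the middle superpixel is weakly predicted as class 1 (`[0.45, 0.55]`) between two confident class-0 neighbors (`[0.9, 0.1]` and `[0.95, 0.05]`), one update moves it to roughly `[0.82, 0.18]`, so the labels become `[0, 0, 0]` instead of the raw argmax `[0, 1, 0]`.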