RIDGE
Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation
Ghazanfar Ali Hwangyoun Kim Jae-In Hwang
Korea Institute of Science and Technology
Abstract
Co-speech gestures are essential for natural human communication, yet existing synthesis methods fall short of delivering semantically aligned and contextually appropriate motions. In this paper, we present RIDGE, a hybrid system that combines rule-based and deep-learning approaches to generate realistic gestures for virtual avatars and human-computer interaction. RIDGE employs a high-fidelity rule base, generated from motion-capture data with the assistance of large language models, to select reliable gesture mappings. When a high-confidence match is not available, a contrastively trained deep-learning model steps in to produce semantically appropriate gestures. Evaluated with a novel Gesture Cluster Affinity (GCA) metric, our system outperforms existing baselines, achieving a GCA score of 0.73, compared with 0.60 for a rule-based baseline and 0.52 for an end-to-end model, while ground truth scores 0.90. Detailed analyses of the system architecture, data preprocessing, and evaluation methodology demonstrate RIDGE's potential to enhance gesture synthesis.
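The hybrid dispatch described above — try the rule base first, fall back to the deep model when no mapping clears a confidence threshold — can be sketched roughly as follows. All names here (`lookup_rule`, `GestureModel`, `THRESHOLD`) and the example rule entries are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of RIDGE-style hybrid gesture selection.
# Hypothetical names and threshold; for illustration only.

THRESHOLD = 0.8  # illustrative confidence cutoff, not from the paper

def lookup_rule(phrase, rule_base):
    """Return (gesture, confidence) for the best-scoring rule match."""
    candidates = rule_base.get(phrase, [(None, 0.0)])
    return max(candidates, key=lambda g: g[1])

class GestureModel:
    """Stand-in for the contrastively trained deep model."""
    def generate(self, phrase):
        return f"generated_gesture_for:{phrase}"

def synthesize_gesture(phrase, rule_base, model):
    gesture, conf = lookup_rule(phrase, rule_base)
    if gesture is not None and conf >= THRESHOLD:
        return gesture              # high-confidence rule mapping
    return model.generate(phrase)   # deep-learning fallback

# Toy rule base: each phrase maps to (gesture, confidence) pairs.
rules = {"hello": [("wave", 0.95)], "maybe": [("shrug", 0.4)]}
model = GestureModel()
print(synthesize_gesture("hello", rules, model))  # rule path: "wave"
print(synthesize_gesture("maybe", rules, model))  # low confidence: model output
```

The key design point the abstract implies is that the rule path is preferred when reliable, so the learned model only handles inputs the rule base cannot cover confidently.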