Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots
Youngwoo Yoon, Woo-Ri Ko, Minsu Jang, Jaeyeon Lee, Jaehong Kim, and Geehyuk Lee
ETRI, KAIST
Abstract
Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Existing robots use rule-based speech-gesture association, but implementing it requires human labor and expert prior knowledge. We present a learning-based co-speech gesture generation model trained on 52 h of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder that generates a sequence of gestures. The model successfully produces various gestures, including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate co-speech gesture generation on a NAO robot running in real time.
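The encoder-decoder structure described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the released model: the class name `TextToGestureSketch`, the layer choices (a single GRU encoder and a GRU-cell decoder), the all-zero start pose, and every dimension below are assumptions chosen only to show how a word sequence might be mapped to a sequence of pose vectors.

```python
import torch
import torch.nn as nn


class TextToGestureSketch(nn.Module):
    """Toy encoder-decoder: word ids in, a sequence of pose vectors out.

    Hypothetical illustration; all sizes are arbitrary placeholders.
    """

    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, pose_dim=10):
        super().__init__()
        self.pose_dim = pose_dim
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Encoder: reads the speech text as a sequence of word embeddings.
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Decoder: takes the previous pose and updates its hidden state.
        self.decoder_cell = nn.GRUCell(pose_dim, hidden_dim)
        self.to_pose = nn.Linear(hidden_dim, pose_dim)

    def forward(self, word_ids, n_frames):
        # word_ids: (batch, n_words) integer tensor
        _, h_n = self.encoder(self.embedding(word_ids))
        hidden = h_n[-1]  # (batch, hidden_dim) summary of the text
        # Assumed start pose: all zeros.
        pose = torch.zeros(word_ids.size(0), self.pose_dim)
        poses = []
        for _ in range(n_frames):
            # Generate one pose per frame, feeding each pose back in.
            hidden = self.decoder_cell(pose, hidden)
            pose = self.to_pose(hidden)
            poses.append(pose)
        return torch.stack(poses, dim=1)  # (batch, n_frames, pose_dim)


model = TextToGestureSketch()
word_ids = torch.randint(0, 1000, (2, 7))  # 2 utterances, 7 words each
gestures = model(word_ids, n_frames=30)
print(gestures.shape)
```

The sketch is untrained, so its output is meaningless as motion; it only shows the data flow from text to a fixed-length pose sequence. For the actual architecture and training procedure, refer to the paper and the code repositories listed on this page.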
Preprint: [arXiv]
TED Gesture Dataset
https://github.com/youngwoo-yoon/youtube-gesture-dataset
You can download Python scripts to generate gesture datasets. We also provide a pre-built TED Gesture Dataset that includes all extracted poses for more than 1,700 videos. Please visit the GitHub repository.
Model Codes
- https://github.com/youngwoo-yoon/Co-Speech_Gesture_Generation
- This is a version modified for the GENEA Workshop 2020. You can train and test it on the Trinity Gesture Dataset.
- Model code only: https://github.com/ai4r/Co-Speech-Gesture-Generation
- Unofficial PyTorch implementation by Pieter Wolfert: https://github.com/pieterwolfert/co-speech-humanoids
Citation
@INPROCEEDINGS{
yoonICRA19,
title={Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots},
author={Yoon, Youngwoo and Ko, Woo-Ri and Jang, Minsu and Lee, Jaeyeon and Kim, Jaehong and Lee, Geehyuk},
booktitle={Proc. of The International Conference on Robotics and Automation (ICRA)},
year={2019}
}