Audio-Visual Interpretable and Controllable Video Captioning

Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore and Chenliang Xu

University of Rochester

[Paper] [Demo]