Look-Hear: Gaze Prediction for Speech-directed Human Attention