Driver Gaze in the Wild

Labelling data for human behaviour analysis is a complex and time-consuming task. In this paper, a fully automatic technique for labelling an image-based gaze behaviour dataset for driver gaze zone estimation is proposed. Domain knowledge is added to the data recording paradigm so that labels can later be generated automatically using speech-to-text (STT) conversion. To remove the noise in the STT output arising from subjects of different ethnicities, the speech frequency and energy are analysed. The resulting Driver Gaze in the Wild (DGW) dataset contains 586 recordings captured at different times of the day, including the evening. The large-scale dataset contains 338 subjects with an age range of 18-63 years. As the data is recorded under different lighting conditions, an illumination-robust layer is proposed for the Convolutional Neural Network (CNN). Extensive experiments show the variance in the database, which resembles real-world conditions, and the effectiveness of the proposed CNN pipeline. The proposed network is also fine-tuned for the eye gaze prediction task, which shows the discriminativeness of the representation learnt by our network on the proposed DGW dataset.
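As a rough illustration of the gaze-zone classification task, the sketch below builds a 9-class classifier on a standard ImageNet-pretrained ResNet-50 in PyTorch. The backbone, input size and preprocessing are assumptions for illustration only; in particular, the paper's illumination-robust layer is not reproduced here.

```python
# Minimal sketch of a 9-class gaze-zone classifier (NOT the paper's exact
# architecture): an ImageNet-pretrained ResNet-50 with its final layer
# replaced. The illumination-robust layer proposed in the paper is omitted.
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_ZONES = 9  # the nine gaze zones marked in the car

def build_gaze_zone_model() -> nn.Module:
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_ZONES)
    return backbone

# Typical preprocessing for an ImageNet-pretrained backbone (assumed values,
# not taken from the paper).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

if __name__ == "__main__":
    model = build_gaze_zone_model()
    dummy = torch.randn(1, 3, 224, 224)   # one face crop
    logits = model(dummy)                  # shape: (1, 9)
    print(logits.argmax(dim=1))            # predicted zone index
```

Replacing the final fully-connected layer and fine-tuning on face crops from each gaze zone is the usual transfer-learning baseline for this kind of task.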

Demo:

1.avi

DGW Dataset Overview:

The following frames are from the Driver Gaze in the Wild (DGW) dataset, which contains 586 recordings of 338 subjects in different illumination conditions.

Audio-Based Automatic Labelling Framework:

The data has been collected in a car with different subjects in the driver's seat. We pasted number stickers on different gaze zones of the car. The nine car zones cover the rear-view mirror, side mirrors, radio, speedometer and windshield. The recording sensor is a Microsoft LifeCam RGB camera, which also contains a microphone. During recording, we asked the subjects to look at the zones marked with numbers in different orders. For each zone, the subject fixates on the zone's number, speaks the number aloud and then moves to the next zone. To record realistic behaviour, no constraint was placed on the subjects about looking via eye movements and/or head movements; each subject chose whichever way was comfortable. This leads to more naturalistic data.
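To make the labelling idea concrete, below is a minimal Python sketch of how spoken zone numbers with word-level timestamps could be turned into frame-level labels. The transcript format, the timings and the digit-to-zone mapping are illustrative assumptions; any speech-to-text engine that returns word start times could supply the transcript.

```python
# Sketch: converting spoken zone numbers into frame-level gaze-zone labels.
# The transcript is assumed to come from a speech-to-text engine that
# provides word-level start times; the words and times in the example are
# made up for illustration only.
from typing import List, Tuple

WORD_TO_ZONE = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

def label_frames(transcript: List[Tuple[str, float]],
                 num_frames: int, fps: float) -> List[int]:
    """Assign a zone label (1-9, or 0 before the first spoken number) to each frame."""
    events = sorted((t, WORD_TO_ZONE[w.lower()])
                    for w, t in transcript if w.lower() in WORD_TO_ZONE)
    labels, zone, idx = [], 0, 0
    for frame in range(num_frames):
        t = frame / fps
        # the label is the most recently spoken zone number before this frame
        while idx < len(events) and events[idx][0] <= t:
            zone = events[idx][1]
            idx += 1
        labels.append(zone)
    return labels

if __name__ == "__main__":
    # Illustrative transcript: (word, start time in seconds).
    transcript = [("one", 1.2), ("two", 3.8), ("three", 6.1)]
    print(label_frames(transcript, num_frames=300, fps=30.0))
```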


An overview of the automatic data annotation technique is shown in the following figure. At the top are representative frames from each zone. Please note the zone numbers written out as words below the curve. At the bottom right are the reference car zones.
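The abstract mentions analysing the speech frequency and energy to clean up the speech-to-text output. As a rough sketch of the energy part, the NumPy snippet below computes short-time energy and thresholds it to find spans where the subject is actually speaking; the window length, hop size and threshold are illustrative assumptions, not the paper's settings.

```python
# Sketch: short-time energy of the recorded audio, usable to locate the
# segments where the subject speaks a zone number. Window, hop and threshold
# values are illustrative assumptions.
import numpy as np

def short_time_energy(signal: np.ndarray, frame_len: int = 1024,
                      hop: int = 512) -> np.ndarray:
    """Mean squared amplitude per analysis window."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        window = signal[start:start + frame_len]
        energies.append(float(np.mean(window ** 2)))
    return np.asarray(energies)

def speech_segments(signal: np.ndarray, sr: int,
                    threshold_ratio: float = 0.1):
    """Return (start_sec, end_sec) spans whose energy exceeds a fraction of the peak."""
    hop = 512
    energy = short_time_energy(signal, hop=hop)
    mask = energy > threshold_ratio * energy.max()
    segments, start = [], None
    for i, active in enumerate(mask):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start * hop / sr, i * hop / sr))
            start = None
    if start is not None:
        segments.append((start * hop / sr, len(mask) * hop / sr))
    return segments

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
    # Synthetic example: silence with a short burst of a 200 Hz tone.
    sig = np.zeros_like(t)
    sig[sr // 2: sr] = 0.5 * np.sin(2 * np.pi * 200 * t[sr // 2: sr])
    print(speech_segments(sig, sr))
```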

Contact:

  • To access the database for research or commercial purposes, please email Abhinav Dhall at abhinav[DOT]dhall[at]monash[DOT]edu.

Report:

Shreya Ghosh, Abhinav Dhall, Garima Sharma, Sarthak Gupta and Nicu Sebe. Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset. ICCVW 2021 (link)