Human Judgment Baseline

Description :

We conducted a set of experiments to measure human judgment in multi-stage drowsiness detection. We asked four volunteers per fold (20 volunteers in total) to watch the unlabeled, muted videos in each fold and to write down a real number between 0 and 10 estimating the degree of drowsiness in each video. Before the experiment, the volunteers (8 female and 12 male; 3 undergraduate and 17 graduate students) were shown sample videos that illustrated the drowsiness scale. They were then left alone in a room to watch and annotate the videos, and were allowed to rewind or fast-forward at will. To make sure that each judgment was independent of the other videos of the same person, volunteers were instructed to annotate one video of each subject before annotating a second video of any subject. The results of these experiments are reported and compared with the results of our baseline method. Observers (age 26.1 ± 2.9 years, mean ± SD) came from computer science, psychology, nursing, social work and information systems majors.
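The ordering rule above (one video of every subject before a second video of any subject) can be sketched as a round-robin schedule. This is only an illustration, not the procedure actually used in the experiment: the function name `round_robin_order`, the `videos_by_subject` mapping and the random shuffling within each round are all assumptions.

```python
import random
from itertools import zip_longest


def round_robin_order(videos_by_subject, seed=0):
    """Return a viewing order in which every subject appears once
    before any subject appears a second time (hypothetical helper)."""
    rng = random.Random(seed)
    # Shuffle each subject's videos so the round a video lands in is arbitrary.
    shuffled = [rng.sample(videos, len(videos))
                for videos in videos_by_subject.values()]
    order = []
    # Each "round" takes at most one video from every subject.
    for round_videos in zip_longest(*shuffled):
        batch = [v for v in round_videos if v is not None]
        rng.shuffle(batch)  # hide any fixed subject order within the round
        order.extend(batch)
    return order
```

With two subjects and two videos each, the first two entries of the returned list always belong to different subjects.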

Video Accuracy (VA) :

"Video Accuracy" is the main metric; it summarizes the final classification results on the test videos. For the human judgment experiment, the five-fold cross-validated accuracy is 57.8%.
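A minimal sketch of how a five-fold cross-validated video accuracy could be computed from 0 to 10 ratings is given below. Several details are assumptions, since the text does not state them: that there are three classes, that ratings are binned at 10/3 and 20/3, and that the cross-validated figure is the unweighted mean of the per-fold accuracies. All function names are hypothetical.

```python
from statistics import mean


def score_to_class(score, low=10 / 3, high=20 / 3):
    """Map a 0-10 rating to a class index (thresholds are assumed)."""
    if score < low:
        return 0
    if score < high:
        return 1
    return 2


def video_accuracy(scores, labels):
    """Fraction of videos whose binned rating equals the true class."""
    predictions = [score_to_class(s) for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def cross_validated_accuracy(folds):
    """Unweighted mean of per-fold accuracies (assumed aggregation).

    `folds` is a list of (scores, labels) pairs, one pair per fold.
    """
    return mean(video_accuracy(scores, labels) for scores, labels in folds)
```

If the folds differ in size, pooling all videos before computing accuracy would give a slightly different number than averaging per fold; which of the two was used for the 57.8% figure is not specified here.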