Microwave-Clock Search Dataset
Gregory Zelinsky, Zhibo Yang, Lihan Huang, Yupei Chen, Seoyoung Ahn, Zijun Wei, Hossein Adeli, Dimitris Samaras, Minh Hoai
Examples
(Example images from the dataset appeared here.)
Description
Human attention during free viewing has generated wide interest in the computer vision community. Search behavior, however, where fixation scanpaths depend strongly on the viewer's goal, has received far less attention, even though visual search constitutes much of everyday human behavior. One reason for this is the absence of real-world image datasets on which search models can be trained. Here we present a carefully created dataset for two target categories, microwaves and clocks, curated from the COCO2014 dataset. A total of 2183 images were presented to multiple participants, each tasked with searching for one of the two categories. This yielded a total of 16184 validated fixations used for training, making our microwave-clock search (MCS) dataset currently one of the largest datasets of eye fixations in categorical search. Another contribution is our 40-image testing dataset, in which every image contains both a microwave and a clock target. Distinct fixation patterns emerged depending on whether participants searched the same images for a microwave (n=30) or a clock (n=30), so models must predict different search scanpaths from identical pixel inputs. This dataset will provide a useful testbed for methods that generate category-specific priority maps for the modeling of visual search behavior.
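To make the evaluation setting concrete, here is a minimal sketch of scoring a predicted scanpath against a human one. This is only an illustration (mean fixation-to-fixation Euclidean distance after truncating to the shorter sequence); it is not the metric used in the papers below, and the coordinate values are hypothetical.

```python
# Minimal sketch: compare a predicted scanpath against a human scanpath
# with mean Euclidean distance between corresponding fixations.
# The published work uses more sophisticated scanpath metrics.
import numpy as np

def scanpath_distance(pred, human):
    """Mean fixation-to-fixation distance in pixels, over the shorter sequence."""
    n = min(len(pred), len(human))
    p = np.asarray(pred[:n], dtype=float)
    h = np.asarray(human[:n], dtype=float)
    return float(np.linalg.norm(p - h, axis=1).mean())

# Hypothetical example: same image, two target categories.
human_microwave = [(120, 300), (180, 340), (210, 355)]
human_clock = [(420, 80), (455, 60)]
predicted = [(130, 310), (190, 330), (220, 360)]

print(scanpath_distance(predicted, human_microwave))  # small: microwave-like
print(scanpath_distance(predicted, human_clock))      # large: not clock-like
```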
Resource
Download
MCS Dataset (876 MB) contains:
Image stimuli: 1494 target-present (TP) clock + 2972 target-absent (TA) clock + 689 TP microwave + 1992 TA microwave images for training; 40 TP + 40 TA images for testing.
Eye fixations: 41127 fixations (microwave) + 47793 fixations (clock) for training; 51465 fixations (microwave + clock) for testing.
Demo code: Python code for scanpath visualization (see the first sketch below).
Python code for generating foveated images (see the second sketch below).
Source code for training the inverse-reinforcement-learning (IRL) model.
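As a rough illustration of what the scanpath-visualization demo does, here is a minimal matplotlib sketch. The fixation format (a list of (x, y) pixel coordinates in presentation order) and the file name are assumptions, not the dataset's documented schema.

```python
# Minimal sketch: overlay a fixation scanpath on a search image.
# Assumes fixations are (x, y) pixel coordinates in presentation order
# (a hypothetical format, not necessarily the dataset's).
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def plot_scanpath(image_path, fixations):
    """Draw numbered fixations connected by saccade lines."""
    img = mpimg.imread(image_path)
    xs = [f[0] for f in fixations]
    ys = [f[1] for f in fixations]
    plt.imshow(img)
    plt.plot(xs, ys, "-o", color="yellow", linewidth=2, markersize=8)
    for i, (x, y) in enumerate(fixations, start=1):
        plt.annotate(str(i), (x, y), color="red", fontsize=10,
                     ha="center", va="center")
    plt.axis("off")
    plt.show()

if __name__ == "__main__":
    # Hypothetical example input; the real dataset files may differ.
    plot_scanpath("example_image.jpg", [(320, 240), (410, 180), (455, 150)])
```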
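Similarly, here is a sketch of one common foveation scheme: blending a sharp and a blurred copy of the image, weighted by distance from the fixation point. The dataset's demo code may use a different method (e.g., a multi-scale pyramid); the radius and sigma values here are arbitrary choices.

```python
# Minimal sketch: foveate an image around a fixation point by blending
# the original with a Gaussian-blurred copy. Fully sharp within `radius`
# pixels of fixation, fully blurred beyond 2 * radius.
import numpy as np
from scipy.ndimage import gaussian_filter
from PIL import Image

def foveate(image, fix_x, fix_y, radius=100.0, sigma=8.0):
    """Return an image that is sharp near (fix_x, fix_y) and blurred elsewhere."""
    img = np.asarray(image, dtype=np.float32)
    blurred = np.stack(
        [gaussian_filter(img[..., c], sigma) for c in range(img.shape[-1])],
        axis=-1,
    )
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xx - fix_x) ** 2 + (yy - fix_y) ** 2)
    # Smooth falloff from fully sharp (weight 1) to fully blurred (weight 0).
    weight = np.clip(1.0 - (dist - radius) / radius, 0.0, 1.0)[..., None]
    out = weight * img + (1.0 - weight) * blurred
    return Image.fromarray(out.astype(np.uint8))

if __name__ == "__main__":
    im = Image.open("example_image.jpg").convert("RGB")
    foveate(im, fix_x=320, fix_y=240).save("example_foveated.jpg")
```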
A related dataset: COCO-Search18.
Paper
If you are using this work, please cite the following papers:
G.J. Zelinsky, Y. Chen, S. Ahn, H. Adeli, Z. Yang, L. Huang, D. Samaras, and M. Hoai. Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning. arXiv preprint arXiv:2001.11921, 2020.
G.J. Zelinsky, Z. Yang, L. Huang, Y. Chen, S. Ahn, Z. Wei, H. Adeli, D. Samaras, and M. Hoai. Benchmarking Gaze Prediction for Categorical Visual Search. CVPR Workshops, 2019.
Notes
You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the images or derived data for any commercial purpose.
You agree not to further copy, publish, or distribute any portion of the MCS dataset, except that copies may be made for internal use at a single site within the same organization.
The CVLab@Stony Brook University reserves the right to terminate your access to the database at any time.