From where and how to what we see
Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) from eye tracking data alone, without utilizing any image information. The proposed algorithm spatially clusters the eye tracking data obtained on an image into coherent groups and subsequently models the likelihood of each cluster containing faces or text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it reliably predicts potential face/head (humans, dogs and cats) and text locations. Furthermore, the approach can be used to select regions of interest for further analysis by face and text detectors. This hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to running the object detection algorithm alone. We also present a new eye tracking dataset from 15 subjects on 300 images selected from the ICDAR, Street-view, Flickr and Oxford-IIIT Pet datasets.
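The clustering step can be illustrated with a minimal sketch. The abstract does not specify the clustering method, so the greedy proximity-based grouping below, the `radius` threshold, and all function names are assumptions chosen only to show the idea of grouping 2D fixation points into spatially coherent clusters:

```python
import math

def cluster_fixations(points, radius=80.0):
    """Greedy spatial clustering of 2D fixation points (illustrative only).

    Each point joins the nearest existing cluster whose centroid lies
    within `radius` pixels; otherwise it seeds a new cluster. The radius
    is a hypothetical value, not taken from the paper.
    """
    clusters = []  # each cluster: {"points": [...], "centroid": (x, y)}
    for p in points:
        best, best_d = None, radius
        for c in clusters:
            d = math.dist(p, c["centroid"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append({"points": [p], "centroid": p})
        else:
            best["points"].append(p)
            pts = best["points"]
            # Recompute the centroid as the mean of member fixations.
            best["centroid"] = (
                sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts),
            )
    return clusters
```

In the paper's pipeline, each resulting cluster would then be scored by the MRF for its likelihood of covering a face, text, or background region.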