Learning bottom-up text attention maps for text detection using stroke width transform

Abstract

Humans have a remarkable ability to quickly discern regions containing text from other noisy regions in images. The primary contribution of this paper is to learn a model that mimics this behavior and aids text detection algorithms. The proposed approach utilizes multiple low-level visual features that signify visually salient regions and learns a model that produces a text attention map indicating potential text regions in images. In the next stage, a text detector based on the stroke width transform focuses only on these selected image regions, achieving the dual benefits of reduced computation time and better detection performance. Experimental results on the ICDAR 2003 text detection dataset demonstrate that the proposed method outperforms the baseline implementation of the stroke width transform, and the generated text attention maps compare favorably with human fixation maps on text images.
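
The abstract describes a two-stage pipeline: low-level feature maps are combined by a learned model into a text attention map, and the stroke width transform (SWT) detector is then run only on the high-attention regions. The sketch below is a minimal illustration of that flow, not the authors' implementation: the two feature maps (edge density and local contrast), the linear weights, and the run_swt_on_regions stub are hypothetical placeholders standing in for the paper's features, learned model, and SWT detector.

```
"""Illustrative sketch of the attention-then-SWT pipeline (placeholder features,
hypothetical weights, and an SWT stub -- not the paper's actual method)."""
import numpy as np
from scipy import ndimage


def low_level_features(gray):
    """Stand-ins for low-level saliency cues: edge density and local contrast."""
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    edge_density = ndimage.uniform_filter(np.hypot(gx, gy), size=15)
    local_mean = ndimage.uniform_filter(gray, size=15)
    local_sq = ndimage.uniform_filter(gray ** 2, size=15)
    contrast = np.sqrt(np.maximum(local_sq - local_mean ** 2, 0.0))

    def norm(f):
        return (f - f.min()) / (np.ptp(f) + 1e-8)

    return np.stack([norm(edge_density), norm(contrast)], axis=-1)


def text_attention_map(gray, weights):
    """Combine feature maps linearly; `weights` plays the role of the learned model."""
    att = low_level_features(gray) @ weights
    return (att - att.min()) / (np.ptp(att) + 1e-8)


def run_swt_on_regions(gray, mask):
    """Placeholder for an SWT detector restricted to pixels inside `mask`:
    here it only returns bounding slices of the masked connected components."""
    labelled, n = ndimage.label(mask)
    return ndimage.find_objects(labelled)


if __name__ == "__main__":
    gray = np.random.rand(240, 320)      # stand-in for a grayscale test image
    weights = np.array([0.6, 0.4])       # hypothetical learned feature weights
    attention = text_attention_map(gray, weights)
    mask = attention > 0.7               # keep only high-attention pixels
    candidates = run_swt_on_regions(gray, mask)
    print(f"{mask.mean():.1%} of pixels passed to SWT, "
          f"{len(candidates)} candidate regions")
```

The masking step is where the claimed savings come from: only the pixels retained by the attention map need to be processed by the comparatively expensive SWT stage, while discarded background clutter can no longer generate false detections.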

ICB Affiliated Authors

Authors
S. Karthikeyan, V. Jagadeesh, and B. S. Manjunath
Date
Type
Peer-Reviewed Conference Presentation
Journal
Proceedings of the IEEE International Conference on Image Processing
Pages
2–6