Learning bottom-up text attention maps for text detection using stroke width transform

Abstract

Humans have a remarkable ability to quickly discern regions containing text from other noisy regions in images. The primary contribution of this paper is to learn a model that mimics this behavior and aids text detection algorithms. The proposed approach utilizes multiple low-level visual features that signify visually salient regions and learns a model that produces a text attention map indicating potential text regions in images. In the next stage, a text detector based on the stroke width transform focuses only on these selected image regions, achieving the dual benefits of reduced computation time and better detection performance. Experimental results on the ICDAR 2003 text detection dataset demonstrate that the proposed method outperforms the baseline implementation of the stroke width transform, and the generated text attention maps compare favorably with human fixation maps on text images.
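
The abstract describes a two-stage pipeline: low-level feature maps are combined by a learned model into a text attention map, and the stroke width transform (SWT) detector is then run only on the high-attention regions. The sketch below is a minimal illustration of that flow, not the authors' implementation: the two feature maps (edge density and local contrast), the linear weights, and the run_swt_on_regions stub are hypothetical placeholders standing in for the paper's features, learned model, and SWT detector.

```
"""Illustrative sketch of the attention-then-SWT pipeline (placeholder features,
hypothetical weights, and an SWT stub -- not the paper's actual method)."""
import numpy as np
from scipy import ndimage


def low_level_features(gray):
    """Stand-ins for low-level saliency cues: edge density and local contrast."""
    gx = ndimage.sobel(gray, axis=1)
    gy = ndimage.sobel(gray, axis=0)
    edge_density = ndimage.uniform_filter(np.hypot(gx, gy), size=15)
    local_mean = ndimage.uniform_filter(gray, size=15)
    local_sq = ndimage.uniform_filter(gray ** 2, size=15)
    contrast = np.sqrt(np.maximum(local_sq - local_mean ** 2, 0.0))

    def norm(f):
        return (f - f.min()) / (np.ptp(f) + 1e-8)

    return np.stack([norm(edge_density), norm(contrast)], axis=-1)


def text_attention_map(gray, weights):
    """Combine feature maps linearly; `weights` plays the role of the learned model."""
    att = low_level_features(gray) @ weights
    return (att - att.min()) / (np.ptp(att) + 1e-8)


def run_swt_on_regions(gray, mask):
    """Placeholder for an SWT detector restricted to pixels inside `mask`:
    here it only returns bounding slices of the masked connected components."""
    labelled, n = ndimage.label(mask)
    return ndimage.find_objects(labelled)


if __name__ == "__main__":
    gray = np.random.rand(240, 320)      # stand-in for a grayscale test image
    weights = np.array([0.6, 0.4])       # hypothetical learned feature weights
    attention = text_attention_map(gray, weights)
    mask = attention > 0.7               # keep only high-attention pixels
    candidates = run_swt_on_regions(gray, mask)
    print(f"{mask.mean():.1%} of pixels passed to SWT, "
          f"{len(candidates)} candidate regions")
```

The masking step is where the claimed savings come from: only the pixels retained by the attention map need to be processed by the comparatively expensive SWT stage, while discarded background clutter can no longer generate false detections.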

ICB Affiliated Authors

Authors
S. Karthikeyan, V. Jagadeesh, and B. S. Manjunath
Date
Type
Peer-Reviewed Conference Presentation
Journal
Proceedings of the IEEE International Conference on Image Processing
Pages
2–6