High-level Component Filtering for Robust Scene Text Detection

Weilin Huang (黄韡林)
Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences
Multimedia Laboratory, The Chinese University of Hong Kong

Outline
■ Introduction
  ♦ Connected Component and Sliding-Window Methods
  ♦ Stroke Width Transform (SWT)
  ♦ SWT based Text Detection
■ Stroke Feature Transform
  ♦ Colour Information on Text Stroke Detection
■ Text Covariance Descriptor (TCD)
  ♦ TCD for Component Filtering
  ♦ TCD for Text-line Filtering
■ Convolution Neural Network Induced MSER Trees
  ♦ Maximally Stable Extremal Regions (MSERs)
  ♦ CNN for Component Classification
  ♦ Component Splitting

I. Introduction: Text Detection Methods
■ Connected Component Methods
  ♦ Step 1: Separate text and non-text information at pixel level
  ♦ Step 2: Group text pixels to construct character components
  ♦ Advantages: fast computation
  ♦ Limitations: not robust, erroneous components, many false alarms
  ♦ Examples: SWT, MSERs
■ Sliding-Window Methods
  ♦ Step 1: Train a text classifier
  ♦ Step 2: Scan a sliding sub-window through the image
  ♦ Advantages: high-level text classification
  ♦ Limitations: computationally costly, difficult feature design

I. Introduction: Stroke Width Transform (1)
■ Example
■ SWT operator (low-level pixel filter)
  ♦ Canny edges
  ♦ Gradient orientation for ray tracking
  ♦ Stroke width constraint: |O_p - O_q| < λ
  ♦ Compute stroke width between paired pixels
  ♦ Output: SWT map
■ Problem 1: Erroneous connection
  ♦ Connecting multiple characters
  ♦ Separating single characters
■ Problem 2: Many non-text components

I. Introduction: SWT based Text Detection
■ Complete processing (pipeline figure):
  SWT -> Component filtering -> Text components -> GP -> Grouped text lines -> Text-line (TL) filtering -> Final text lines
  ♦ Filters: heuristic filtering; Random Forest classifier (heuristic and geometric features)
■ Our improvements: more powerful high-level filters
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.

II. Stroke Feature Transform (SFT) (1)
■ Stroke Feature Transform (SFT):
  ♦ Stroke width constraint: |O_p - O_q| < λ_1
  ♦ Stroke color constraint: |C_p - C_q| < λ_2
  ♦ Neighborhood coherency constraint
■ For comparison, SWT uses only the stroke width constraint: |O_p - O_q| < λ
■ Output
  ♦ SWT: stroke width map
  ♦ SFT: stroke width map and stroke color map

II. Stroke Feature Transform (SFT) (2)
■ SFT vs SWT
  ♦ Mitigates inter-component connections
  ♦ Enhances intra-component connections
  ♦ Better character candidate detection
  ♦ Higher recall

II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust
  ♦ Text-like outliers: bricks, windows, leaves, ...
  ♦ Many false alarms, hence low precision
  ♦ Heuristic filters do not work well
  ♦ High-level, learning-based filtering is required

III. Text Covariance Descriptor (TCD) (1)
■ Text Covariance Descriptor
  ♦ Each pixel is represented by d features
  ♦ TCD is computed as the covariance of these feature vectors over a given region U
  ♦ Multiple features are incorporated in a single matrix

III. Text Covariance Descriptor (TCD) (2)
■ TCD for components (9x9 covariance features)
  ♦ Pixel coordinates along the X- and Y-axes: encode spatial information
  ♦ Pixel intensities and RGB values: color uniformity
  ♦ Stroke width and distance values: stroke width/distance consistency
  ♦ Edge information from the Canny detector: stroke spatial layout
■ 9 features in total construct a 9 x 9 matrix
■ Transform to a 45-dim feature vector
■ Get component confidence maps by RF classifier

III. Text Covariance Descriptor (TCD) (3)
■ TCD for text-lines
  ♦ 12x12 covariance features
    Mean properties of component features: uniformity
    Coordinates of component centers: spatial information
    Heights of components: consistency
    Horizontal distances between components: text spatial layout
  ♦ 16x16 covariance features
    16-bin HOG on edge pixels: oriented spatial features
■ Get text-line confidence maps by RF classifier

III. Text Covariance Descriptor (TCD) (4)
■ Component and text-line confidence maps

III. Text Covariance Descriptor (TCD) (5)
■ Top: TCD for component; Middle: TCD for text-line; Bottom: detection

III. Text Covariance Descriptor (TCD) (5)
■ Results
■ Failure Cases
W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.

V. Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) Tree
L. Neumann and J. Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
■ MSER vs SWT
  ♦ Detects low-quality text: higher recall
  ♦ Generates more non-text components: lower precision
  ♦ Requires a more powerful classifier/filter

V. Convolution Neural Network Induced MSER Trees (2)
■ A Two-layer Convolution Neural Network (CNN)
T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.

V. Convolution Neural Network Induced MSER Trees (3)
■ Training data: 15000 synthetic samples
■ Data transformation
  ♦ Fixed size of 32x32
  ♦ Horizontal warp
  ♦ Include additional image context

V. Convolution Neural Network Induced MSER Trees (3)
■ CNN confidence scores (figure panels: MSERs, CNN scores, component splitting, detection)

V. Convolution Neural Network Induced MSER Trees (4)
■ Component splitting: criteria for an erroneously connected component
  ♦ High aspect ratio
  ♦ Positive confidence score
  ♦ Leaf of the MSER tree, or confidence score > all children

V. Convolution Neural Network Induced MSER Trees (5)
■ Comparisons with SFT-TCD

V. Convolution Neural Network Induced MSER Trees (6)
■ Results

V. Convolution Neural Network Induced MSER Trees (7)
■ Results on the ICDAR 2011 Database
W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.

The End
Thank You!
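
Backup note for Section III (TCD for components): a minimal sketch of how a covariance descriptor of this kind could be computed. It assumes the standard unbiased sample covariance over the pixels of a region U. The function names `covariance_descriptor` and `upper_triangle`, the 1/(n-1) normalisation, and the random stand-in features are assumptions made for illustration, not the authors' implementation. The sketch shows why 9 per-pixel features give a 9 x 9 matrix and a 45-dim vector: a symmetric 9 x 9 matrix has 9 x 10 / 2 = 45 unique entries.

```python
import random
from typing import List, Sequence


def covariance_descriptor(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the d x d sample covariance of n per-pixel feature vectors.

    `features` holds one d-dimensional vector per pixel of the region U.
    Assumption: unbiased 1/(n-1) normalisation (not stated on the slides).
    """
    n = len(features)
    if n < 2:
        raise ValueError("a region needs at least two pixels")
    d = len(features[0])
    # Per-feature mean over all pixels in the region.
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[k] - mean[k] for k in range(d)]
        # Accumulate the outer product of the centred feature vector.
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[value / (n - 1) for value in row] for row in cov]


def upper_triangle(cov: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a symmetric d x d matrix into its d(d+1)/2 unique entries."""
    d = len(cov)
    return [cov[i][j] for i in range(d) for j in range(i, d)]


if __name__ == "__main__":
    # Hypothetical region: 50 pixels, 9 random stand-in features each
    # (x, y, intensity, R, G, B, stroke width, stroke distance, edge).
    rng = random.Random(0)
    region = [[rng.random() for _ in range(9)] for _ in range(50)]
    vector = upper_triangle(covariance_descriptor(region))
    print(len(vector))  # 45
```

The resulting vector has a fixed length regardless of how many pixels the component contains, which is what allows a single classifier such as a Random Forest to score components of different sizes.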