Con-Text: Text Detection Using Background
Connectivity for Fine-Grained Object Classification
Sezer Karaoglu, Jan van Gemert, Theo Gevers
Can we achieve better object recognition with
the help of scene text?
Goal
• Exploit details conveyed by text in the scene to improve visual
classification of very similar instances.
Figure: example scenes with labels SKY, CAR and scene text DJ SUBS Breakfast, Starbucks Coffee
Applications: linking images from Google Street View to textual business
information such as the Yellow Pages, geo-referencing, information retrieval
Challenges of Text Detection in Natural
Scene Images
o Lighting
o Surface Reflections
o Unknown background
o Non-Planar objects
o Unknown Text Font
o Unknown Text Size
o Blur
Literature Review: Text Detection
• Texture Based:
Wang et al. “End-to-End Scene Text Recognition”, ICCV ’11
Limitations: computational complexity; dataset specific
Advantage: does not rely on heuristic rules
• Region Based:
Epshtein et al. “Detecting Text in Natural Scenes with Stroke Width Transform”, CVPR ’10
Limitation: hard to define connectivity
Advantage: segmentation helps to improve OCR performance
Motivation to remove background for Text
Detection
• To discard the majority of image regions before further processing.
• To reduce false positives caused by text-like image regions (fences, bricks,
windows, and vegetation).
• To reduce dependency on text style.
Proposed Text Detection Method
Automatic BG seed selection
BG reconstruction
Text detection by BG subtraction
Background Seed Selection
Figure panels: Original Image, Color Boosting, Contrast, Objectness
• Color, contrast and objectness responses are used as features.
• A Random Forest classifier is used, with 100 trees selected based on the out-of-bag error.
• Each tree is constructed with three random features.
• Node splits are made based on the Gini criterion.
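As a reminder of what the Gini criterion measures, here is a minimal sketch in plain Python. The function names `gini` and `split_impurity` and the labels "bg" / "text" are illustrative assumptions, not taken from the slides; the impurity formula 1 - sum of squared class proportions is the standard one.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


# A split that separates the classes perfectly has impurity 0,
# one that leaves both children mixed 50/50 has impurity 0.5.
pure = split_impurity(["bg", "bg"], ["text", "text"])
mixed = split_impurity(["bg", "text"], ["bg", "text"])
```

A tree would pick, among its three randomly drawn features, the threshold with the lowest weighted impurity.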
Conditional Dilation for BG
connectivity
repeat
until
where B is the structuring element (3-by-3 square), M is the binary image in which BG seeds are ones,
and X is the gray-level input image
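The repeat/until loop can be illustrated with grayscale morphological reconstruction by dilation, a common textbook form of conditional dilation. This is a sketch under assumptions, not necessarily the authors' exact update rule: the marker is assumed to be X at the seed pixels of M and 0 elsewhere, each step dilates with the 3-by-3 square B and clips the result by X, and the loop stops when nothing changes. The function names are hypothetical.

```python
def dilate3x3(img):
    """Gray-level dilation with a 3-by-3 square: max over each pixel's neighborhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = max(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
    return out


def conditional_dilation(M, X):
    """Grow the seeds in M inside the gray-level image X until stable."""
    # Assumed marker: X at seed pixels, 0 elsewhere.
    marker = [[x if m else 0 for m, x in zip(mrow, xrow)]
              for mrow, xrow in zip(M, X)]
    while True:
        grown = dilate3x3(marker)
        # Condition the dilation on X: never exceed the input image.
        nxt = [[min(g, x) for g, x in zip(grow, xrow)]
               for grow, xrow in zip(grown, X)]
        if nxt == marker:
            return marker
        marker = nxt


# Toy image: flat background (5) with one brighter pixel (9) in the middle.
X = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]
M = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # one BG seed in the corner
background = conditional_dilation(M, X)
residue = [[x - b for x, b in zip(xrow, brow)] for xrow, brow in zip(X, background)]
```

In this toy case the reconstructed background is flat, and subtracting it from X leaves only the bright center pixel, which mirrors the idea of text detection by BG subtraction.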
Text Recognition Experiments
• ICDAR’03 Dataset with 251 test images, 5370 characters, 1106
words.
ICDAR 2003 Dataset: Char. Recognition Results
Method            Cl. Rate (%)
ABBYY             36
Karaoglu et al.   62
Proposed          63
The proposed system removes 87% of the non-text regions, where non-text regions
account for 91% of the test set on average. It retains approximately 98% of the
text regions.
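A rough back-of-the-envelope check of what these figures imply, assuming (this is an interpretation, not stated on the slide) that all three percentages are area fractions over the whole test set:

```python
non_text_share = 0.91            # share of the test set that is non-text
text_share = 1.0 - non_text_share
non_text_removed = 0.87          # fraction of non-text regions removed
text_retained = 0.98             # fraction of text regions kept

kept_non_text = non_text_share * (1.0 - non_text_removed)  # 0.91 * 0.13
kept_text = text_share * text_retained                     # 0.09 * 0.98
kept_total = kept_non_text + kept_text
text_fraction_after = kept_text / kept_total
```

Under that assumption roughly a fifth of the original area remains for further processing, and the text share rises from 9% to a little over 40% of what is left.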
ImageNet Dataset
Example classes: Bakery, Country House, Discount House, Funeral, Pizzeria, Steak
• ImageNet building and place of business dataset (24255 images, 28
classes; largest dataset ever used for scene text recognition)
• The images do not necessarily contain scene text.
• Visual features: 4000 visual words, standard gray SIFT only.
• Text features: bag-of-bigrams built from the OCR results obtained for each image
in the dataset.
• 3 repeats, to compute standard deviations in Avg. Precision.
• Histogram Intersection Kernel in LIBSVM.
• Text only, Visual only and Fused results are compared.
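The text features and the kernel can be sketched in a few lines. This is an illustration under assumptions: the bigrams are taken to be character bigrams of the OCR'd words, no histogram normalization is applied, and the function names are hypothetical. The histogram intersection kernel itself is the standard K(a, b) = sum over bins of min(a_i, b_i).

```python
from collections import Counter


def bag_of_bigrams(words):
    """Count character bigrams over a list of OCR'd words (lower-cased)."""
    counts = Counter()
    for word in words:
        w = word.lower()
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    return counts


def histogram_intersection(a, b):
    """K(a, b) = sum of min(a_i, b_i); bins missing from either side add 0."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


# Hypothetical OCR output for two images.
img_a = bag_of_bigrams(["Starbucks", "Coffee"])
img_b = bag_of_bigrams(["coffee"])
similarity = histogram_intersection(img_a, img_b)
```

Kernel values computed this way for all image pairs could be supplied to LIBSVM as a precomputed kernel matrix; how the text and visual kernels are combined for the fused result is not specified here.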
Fine-Grained Building Classification Results
Features             Avg. Precision
Text (OCR)           15.6 ± 0.4
Visual (BoW)         32.9 ± 1.7
Fusion (BoW + OCR)   39.0 ± 2.6
Discount House: ranks of four example images per method
Visual:   #269, #431, #584, #2752
Text:     #1, #4, #5, #8
Proposed: #1, #4, #5, #8
Conclusion
• Background removal is a suitable approach for scene text detection.
• A new text detection method using background connectivity and color, contrast and objectness cues is proposed.
• Improved scene text recognition performance.
• Improved fine-grained object classification performance with fusion of visual and scene text information.
DEMO
TRY HERE