TEXT EXTRACTION FROM INVARIANT COMPLEX IMAGE
NOURI ALI AL MABROUK AL HASHI
UNIVERSITI TEKNOLOGI MALAYSIA
TEXT EXTRACTION FROM INVARIANT COMPLEX IMAGE
NOURI ALI AL MABROUK AL HASHI
A project submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Science (Computer Science)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
July 2009
To my dearest father, mother and family,
for their encouragement, blessing and inspiration …
ACKNOWLEDGEMENT
Alhamdulillah, I am grateful to ALLAH SWT for His blessing and mercy, for giving me the strength along the challenging journey of carrying out this project and making it successful.
First of all, I would like to express my deepest appreciation to my supervisor, Prof. Dr. Dzulkifli Mohamad, for his effort, guidance and support throughout this project. Without his advice, suggestions and guidance, the project would not have successfully achieved its objectives.
To all lecturers who have taught me, thank you for the lessons you have delivered. Not forgetting all my friends, thank you for your useful ideas, information and moral support during the course of study.
Last but not least, I would like to express my heartiest appreciation to my parents
and my family, who are always there when it matters most.
ABSTRACT
Great progress has been made in Optical Character Recognition (OCR) technology. Most current OCRs, however, can only read characters printed on sheets of paper according to rigid format restrictions. The detection and extraction of text regions in an image are therefore well-known problems in the Computer Vision research area. The goal of this project is to extract text from an image using an edge-based algorithm and to recognize it using fuzzy logic. The algorithms are implemented and evaluated on a set of natural-scene images that vary in size, scale and orientation. Various kernels can be used for the edge-detection operation; the whole set of eight kernels is produced by taking one kernel and rotating its coefficients circularly, and the edge-detection operator is calculated by forming a matrix centered on the pixel chosen as the center of the matrix area. Localization then further enhances the detected regions by eliminating non-text regions. Edge detection works quite well for digital images containing multi-scale and multi-orientation text, but it cannot be relied on for practical images corrupted by other types of noise, since detecting edges in a digital image requires sharp intensity transitions and low noise. Moreover, the input here is a colored image. Edges are detected at eight orientations and convolved with a Gaussian, after which the strong edges suitable for detecting text are selected; the project handles complex images by using the eight kernels to accomplish this task. Finally, identified pixels are used to determine the characters by means of fuzzy logic.
ABSTRAK
Great progress has been made in optical character recognition (OCR) technology. However, most current OCRs can only recognize characters under restricted formats; for this reason, detecting and extracting text regions from an image remains a problem under active study in computer systems research. This project aims to extract and recognize text from an image using an edge-based method and a fuzzy logic algorithm. The algorithms were implemented and evaluated on several sets of natural images differing in size, scale and orientation. A compass operator was used, containing eight kernels that detect edges in eight different directions; every pair of edges is then collected together using the resulting edge levels. This method is very important for determining features under movement and elongation changes, which also affect the related height features. Localization is also involved, enhancing the regions by eliminating non-text areas. The input is a colored image; edges are detected at eight orientations and convolved with a Gaussian method, after which the strong edges suited to text detection are selected. As noted, complex images are handled using the eight kernels to accomplish this task. Then, pixel detection is used to obtain the desired features using fuzzy logic.
TABLE OF CONTENTS

CHAPTER    TITLE

           DECLARATION
           DEDICATION
           ACKNOWLEDGEMENT
           ABSTRACT
           ABSTRAK
           TABLE OF CONTENTS
           LIST OF TABLES
           LIST OF FIGURES
           LIST OF SYMBOLS
           LIST OF APPENDICES

1          INTRODUCTION
           1.1  Introduction
           1.2  Problem Background
           1.3  Problem Statement
           1.4  Project Objectives
           1.5  Scope of the Project
           1.6  Significance of the Study
           1.7  Report Organization

2          LITERATURE REVIEW
           2.1  Introduction
           2.2  Segmentation Categories
                2.2.1  Threshold Based Segmentation
                2.2.2  Clustering Techniques
                2.2.3  Matching
                2.2.4  Edge Based Segmentation
                2.2.5  Region Based Segmentation
           2.3  Categories of Variance Text
                2.3.1  Lighting Variance
                2.3.2  Scale Variance
                2.3.3  Orientation Variance
           2.4  Recognition Text
                2.4.1  Text Detection
                2.4.2  Text Area Identification
                2.4.3  Text Region Localization
                2.4.4  Text Extraction and Binary Image
           2.5  Analytic Segmentation
                2.5.1  Pattern Recognition
                2.5.2  Statistical Pattern Recognition
                2.5.3  Data Clustering
                2.5.4  Fuzzy Sets
                2.5.5  Neural Networks
                2.5.6  Structural Pattern Recognition
                2.5.7  Syntactic Pattern Recognition
                2.5.8  Approximate Reasoning Approach to Pattern Recognition
                2.5.9  Applications of Support Vector Machine (SVM)
           2.6  Pattern Recognition System
                2.6.1  The Structure of Pattern Recognition System
                2.6.2  Applications of Pattern Recognition
                2.6.3  Character Recognition
           2.7  Run-Length Coding Algorithm
                2.7.1  Neighbors
                2.7.2  Path
                2.7.3  Foreground
                2.7.4  Connectivity
                2.7.5  Connected Components
                2.7.6  Background
                2.7.7  Boundary
                2.7.8  Interior
                2.7.9  Surrounds
                2.7.10 Component Labeling
           2.8  Properties of Text
                2.8.1  Removing the Borders
                2.8.2  Dividing the Text into Rows
                2.8.3  Dividing the Rows "Lines" into the Words
                2.8.4  Dividing the Word into Characters
           2.9  Identifying Character
           2.10 Fuzzy Logic
                2.10.1 What is Fuzzy Logic?
                2.10.2 What is the Fuzzy Logic Toolbox?
                2.10.3 Fuzzy Sets
                2.10.4 Membership Function
                2.10.5 If-Then Rules
                2.10.6 Fuzzy Inference System
                2.10.7 Rule Review
                2.10.8 Surface Review
           2.11 Summary

3          METHODOLOGY
           3.1  Introduction
           3.2  Problem Statement and Literature Review
           3.3  System Development
           3.4  Performance Evaluation
           3.5  General Steps of Proposed Techniques
           3.6  Proposed Algorithm for Edge Based Text Region Extraction
           3.7  Detection
           3.8  Feature Map and Candidate Text Region Detection
                3.8.1  Directional Filtering
                3.8.2  Edge Selection
                3.8.3  Feature Map Generation
                3.8.4  Localization
                3.8.5  Character Extraction
           3.9  Connection Component
           3.10 Fuzzy Logic
           3.11 Summary

4          IMPLEMENTATION
           4.1  Introduction
           4.2  Input Image
           4.3  Complement Edge Detection
           4.4  Eight Edge Detection
           4.5  Image Localization
           4.6  Separate Text from Background
           4.7  Reduce Size
                4.7.1  Determine Borders
                4.7.2  Divide Text into Rows
           4.8  Determine Character by Run-Length

5          RESULTS DISCUSSION
           5.1  Introduction
           5.2  Discussion on Results
           5.3  Experimental Results and Discussion
           5.4  Project Advantage
           5.5  Suggestion and Future Works
           5.6  Conclusion

6          CONCLUSION

           REFERENCES
           Appendices
LIST OF TABLES

TABLE NO.    TITLE

3.1    Results to object to rows
4.1    Running time of major steps
4.2    Results after image scan, where ST=start, EN=end and RW=row
5.1    Performance evaluation 1
5.2    Performance evaluation 2
5.3    Performance evaluation 3
LIST OF FIGURES

FIGURE NO.    TITLE

2.1    General model of extraction text
2.2    The composition of a PR system
2.3    Horizontal projection calculated from run-length code
2.4    4- and 8-neighborhoods for rectangular image location; pixel [i,j] is located in center
2.5    4-path and 8-path
2.6    Border of an image
2.7    Ambiguous border
2.8    A binary image with its boundary, interior and surrounds
2.9    An image (a) and its connected component image (b)
2.10   Divide the text into rows
2.11   Divide the rows into the words
2.12   Divide the word into characters
2.13   Identify character
2.14   A classical set and fuzzy set representation of "warm room temperature"
2.15   (a) input of pixel (b) input of location for pixel
2.16   Output variable "letter"
2.17   Building the system with fuzzy logic
3.1    Proposed method
3.2    Block diagram of general steps of proposed approach
3.3    Gaussian filter
3.4    Sample Gaussian pyramid with 8 levels
3.5    Extraction operation
3.6    Edge detection
3.7    U-shape object with runs after pixeltoruns
3.8    8-neighborhoods for rectangular image location; pixel [i,j] is located in center of each figure
3.9    Identify the character
3.10   (a) example of fuzzy input (b) example of fuzzy output
4.1    Original image
4.2    Structure 3x3 (filter)
4.3    Our example of convolution operation
4.4    Kernels used
4.5    Directions of edge detection
4.6    Structure of convolution
4.7    Operation of kernel 0
4.8    Edge detection
4.9    Effect of adding two edges
4.10   Total of edges detection
4.11   Localized text
4.12   Separate text from background
4.13   Test image 1 (a) image (b) localization (c) result
4.14   Test image 2 (a) image (b) localization (c) result
4.15   Determine borders
4.16   (a) row one (b) row two
4.17   Identified character
4.18   Ten inputs and one output
4.19   Input one n1
4.20   Output
4.21   Output of extracted text
5.1    Sample 1
5.2    Sample 2
5.3    Sample 3
5.4    Sample 4
5.5    Sample 5
5.6    Sample 6
5.7    Sample 7
5.8    Sample 8
5.9    Sample 9
5.10   Sample 10
LIST OF SYMBOLS

OCR  -  Optical character recognition
CC   -  Connected components
BAG  -  Black adjacency graph
AMA  -  Aligning-and-merging analysis
SVM  -  Support vector machine
RLC  -  Run-length code
PR   -  Pattern recognition
SE   -  Structuring element
MFs  -  Membership functions
FIS  -  Fuzzy inference system
LIST OF APPENDICES

APPENDIX    TITLE

A1    Matlab command to find binary image
A2    Matlab command using fuzzy logic to identify character
CHAPTER I
INTRODUCTION
1.1 Introduction

During the past years, studies in the field of computer vision and pattern recognition have shown a great amount of interest in content retrieval from images and videos. This content can be in the form of objects, colors, textures and shapes, as well as the relationships between them.
As stated by (Kwang, Keechul and Jin, 2003c), text data is particularly interesting, because images can contain text that varies in size, orientation and alignment, as well as complex backgrounds, which makes the problem of automatic text extraction extremely challenging. In recent years, great progress has been made in Optical Character Recognition (OCR) techniques, but they can only handle text against a plain monochrome background and cannot extract text from a complex background. Commercial OCR engines cannot yet detect and recognize text embedded in a complex background directly.
Extraction of text from images has relied mainly on the properties of text. The past few years have witnessed rapid growth in the number and variety of applications using fuzzy logic. Fuzzy logic is a logical system which is an extension of multi-valued logic; here it is used to identify the characters after extracting text from an image. (Kongqiao and Jari, 2003b) proposed a character recognition approach that comprises a character boundaries operation for invariance to multi-scale and multi-orientation. Finally, it is expected that the results will demonstrate the success of the text extraction and recognition process on complex images.
1.2 Problem Background

Most applications that involve natural scene documents, where text and graphics are blended together, need some separation between text and graphics, and detecting and recognizing text without computer help is a difficult task in the information processing field. Because of that, intensive projects have been carried out to perform extraction and recognition by machine, and automatic text extraction and recognition have been topics of research for years.
(Jagath and Xiaoqing, 2006b) proposed an edge-based text extraction algorithm which is robust with respect to font size, color, intensity, orientation, effects of illumination, reflection, shadows, perspective distortion and the complexity of the image background, and which can quickly and effectively localize and extract text from real scenes.
(Kongqiao and Jari, 2003b) proposed a connected-component based (CC-based) method which combines color clustering, a black adjacency graph (BAG), an aligning-and-merging-analysis (AMA) scheme and a set of heuristic rules to detect text in sign recognition applications such as street indicators and billboards. (Rainer and Axel, 2002c) proposed a feed-forward neural network to localize and segment text from complex images; it is designed specifically for horizontal text with at least two characters. (Yuzhong, Kalle and Anil, 1995) proposed a hybrid of CC-based and texture-based methods to extract text; although experimental results show that the combination of these two methods performs better, the monochrome constraint used also fails to detect touching characters. (Kwang, Keechul and Jin, 2003c) combined a Support Vector Machine (SVM) and the continuously adaptive mean shift algorithm (CAMSHIFT) to detect and identify text regions. (Datong, Herve and Jean, 2001) used an SVM to identify text lines from candidates. However, experimental results show that both methods above are mainly designed for video captions.
(Jiang and Jie, 2000) developed a three-layer hierarchical adaptive text detection algorithm for natural scenes; this method has been applied in a prototype Chinese sign translation system which mostly handles horizontal and/or vertical alignment. (Ezaki, Bulacu and Schomaker, 2004) proposed four character extraction methods based on connected components; the performance of the different methods depends on character size. (Takuma, Yasuaki and Minoru, 2003d) proposed a digit classification system to recognize telephone numbers written on signboards, where candidate digit regions are extracted from an image through edge extraction, enhancement and labeling. (Matsuo, Ueda and Michio, 2002d) proposed a text extraction algorithm for scene images based on an identification stage of the local target area and adaptive thresholding. (Xilin, Jie, Jing and Alex, 2003e) proposed a framework for automatic detection of signs from natural scenes; this framework considers critical challenges in sign extraction and can extract signs robustly under different conditions.
Based on these studies, this project proposes an extraction strategy that relies on edge detection of text and characters, in conjunction with fuzzy logic to recognize the characters.
1.3 Problem Statement

An effective extraction method may provide significant improvement in multi-orientation and multi-scale recognition performance. To reach good recognition performance, it is important to solve explicit extraction problems such as different scales and different orientations.

The main research question is "How can an effective extraction of text that varies in scale and orientation be achieved?", with the following sub-questions:
1. How have recent extraction approaches performed?
2. How might a system improve on existing extraction approaches?
3. How can the performance of the proposed extraction and character recognition be evaluated and measured?
1.4 Project Objectives

Based on the problem statement above, this project encompasses a set of objectives that are associated with the milestones of the project process. The project objectives are listed below.
1. To develop an improved extraction method based on edge detection and fuzzy
logic.
2. To verify the effectiveness of the proposed technique as compared to existing
techniques.
1.5 Scope of the Project

In order to accomplish the objectives of this study, it is important to identify the scope, which covers the following aspects:
1. This research is concerned with the extraction of text from images and the recognition of characters using fuzzy logic.
2. This research is concerned with invariant complex images.
3. Dilation and erosion are used to remove noise and touching between characters.
4. Fuzzy logic is used for identifying the characters.
1.6 Significance of the Study

This study is carried out with the main objective of extracting text. Based on the results obtained, it is hoped to achieve the following:
1. To give exposure to another promising extraction technique that could offer better, or at least the same, performance as existing techniques.
2. To solve extraction problems such as complex backgrounds, different styles, fonts, etc.
3. To encourage more work exploring the advantages of extraction and recognition.
1.7 Report Organization

This report is divided into six chapters. The first chapter is an introduction and brief overview of the project, including the problem background, problem statement, objectives, scope and significance of the study. Chapter II reviews the literature of previous studies on text extraction and character recognition performance analysis, including the techniques of the analysis and their results. Chapter III covers the framework and methodology of the project, which focuses on application-based analysis. Chapter IV presents the implementation. Chapter V contains the discussion of results. Chapter VI concludes the project.
CHAPTER II
LITERATURE REVIEW
2.1 Introduction

This chapter discusses issues related to the study. It describes the state of the art of segmentation categories and focuses on the recognition of text. It also describes analytic segmentation, the run-length coding algorithm and the properties of text, and explains how a character is identified based on fuzzy logic, which is used to determine the character.
2.2 Segmentation Categories

Segmentation categories are considered here for the portions of an image included within the text structure: image segmentation is often an essential step in image analysis, object representation, visualization and many other image processing tasks. A great variety of segmentation methods has been proposed in the past decades, and some categorization is necessary to present the methods properly. The categorization presented here therefore reflects the emphasis of the approaches rather than a strict division.
2.2.1 Threshold Based Segmentation
Histogram thresholding is a slicing technique used to segment the image. It may be applied directly to the image, provided it is combined with pre- and post-processing techniques.
2.2.2 Clustering Techniques
Although clustering is sometimes used as a synonym for segmentation, clustering techniques denote mechanisms that are primarily used in exploratory analysis of high-dimensional data, grouping measurements that are similar in some sense.
2.2.3 Matching
When we know approximately what an object we wish to identify in an image looks like, we can use this knowledge to locate the object in the image. We can also discriminate this object by matching between the pixels themselves.
2.2.4 Edge-Based Segmentation
With this technique, edges detected in an image are assumed to represent object boundaries and are used to identify the objects. From knowledge of an object's boundaries, we can recognize the object. According to (Rabbani and Chellappan, 2007), edge detection is a fundamental tool used in most image processing applications to obtain information from frames, as a step in feature extraction and object segmentation.
2.2.5 Region Based Segmentation
While an edge-based technique attempts to find the object boundaries and then locate the object itself by filling them in, a region-based technique takes the opposite approach, i.e. it starts in the middle of an object and then "grows" outward until it meets the object's boundaries. In this work, we focus on edge-based segmentation. However, we may face multi-orientation and multi-scale problems in text regions; because of this, a sophisticated approach is required to segment characters properly.
2.3 Categories of Variance Text

Nowadays, commercial advertising tools are rapidly increasing and being deployed through posters on walls, signs on roads and lighted indicators mounted in public streets, in different styles whose font size, color, orientation, lightness and text alignment can be easily edited and modified to make the display more attractive and tempting. The following describes the categories of text variance in detail.
2.3.1 Lighting Variance
(Gatos, Pratikakis, Kepene and Perantonis, 2005a) observed that an image varies with the lighting conditions on the text overlaid on it. When a text has varying lightness, this affects the process of extracting the text from the image.
2.3.2 Scale Variance
(Xiaoqing and Jagath, 2006a) suggested that image properties vary according to the distance at which the camera is placed. When pictures are taken at different distances, this affects the resolution of the image and of the text.
2.3.3 Orientation Variance
(Xiaoqing and Jagath, 2006b) noted that an image varies with the camera angle. When pictures are taken from different angles, the text overlaid on the image appears at different sizes, because of the different locations and positions of the cameras.
2.4 Recognition Text

Generally, extraction of text can be drawn as a combination of separate modules that process text in images, from raw data up to extraction of the text. (Tsung, Yung and Chih, 2006c) clearly described text extraction as shown in Figure 2.1. Most image text detection and extraction methods deal with static image text.
Figure 2.1 General model of extraction text
Firstly, the raw image data goes through an initial text detection step to achieve a suitable form. This form then allows text identification, ensuring that the detected text is actually present in the image. Next comes the text localization stage, which is important for localizing the text. Finally, text extraction is used to extract the text from the image.
2.4.1 Text Detection
Text detection refers to the part of the entire process performed prior to the localization and extraction steps. In this model, detection has the purpose of converting raw data into a suitable form and calibrating the text lineaments.
Acquiring text via a scanner device yields an image containing text, as seen in Figure 2.2. During text detection, a process is run to identify whether text or noise has been detected. (Qixiang, Wen, Weiqiang and Wei, 2003a) used an algorithm based on the Sobel edge operation in four directions. In this algorithm, the edge density represents the precision of text localization, and a gradient map is obtained in the three RGB components. Morphological "close" and "open" operations are then applied: the "close" operation makes edge pixels adjacent to each other, while the "open" operation disconnects the edge map where it is too narrow to contain text. The projection profile of the image blocks gives a compact representation of the spatial pixels, and edge-dense blocks in the edge map are bounded after profile projection. Their performance was compared with that of (Roshanak and Shohreh, 2005c), who proposed a method based on finding text edges using the information content of sub-image coefficients of the discrete-wavelet-transformed input image; most text-bearing images are well characterized by the edges they contain, "dense edges" are a distinct characteristic of text blocks which can be used to detect possible text regions, and Sobel detection is effective in extracting the strong edges of an image. Earlier, (Tsai, Chen and Fang, 2006c) generated edge maps for text detection, relying mainly on edge information to detect text; two edge maps are generated for the detection of scrolling text, with each edge map produced by applying Sobel detection to the entire input image.
2.4.2 Text Area Identification
Ideally, text area identification has the task of confirming the text detection. (Qixiang, Wen, Weiqiang and Wei, 2003a) devised three rules to confirm candidate text blocks. First, text block height and width are defined such that a text line contains at least two words. Second, the text size is limited by thresholds T1 and T2, set to 8 pixels and 32 pixels respectively; text blocks whose height is smaller than 8 pixels or larger than 32 pixels can still be found in a zoomed image. Third, blocks that contain too few edge pixels are eliminated; however, noise similar to the text size, i.e. non-text, sometimes remains, so wavelet features and an SVM are used to classify the candidate text blocks. Although text has its own properties, they may be quite weak and irregular; text merely includes some strokes, i.e. horizontal, vertical, up-right-slanting and up-left-slanting strokes. These strokes are regular to some extent when considered as one block, but never regular pixel by pixel. Furthermore, (Tsung, Yung and Chih, 2006c) addressed the false alarms due to complex backgrounds, which can be filtered using the horizontal edge map to identify the text region and obtain a refined text region: the pixels with horizontal edges in the detected text region are counted to decide whether it is a true text region.
2.4.3 Text Region Localization
(Xiaoqing and Jagath, 2006a) showed that text embedded in an image appears in clusters, i.e. it is arranged compactly; thus, clustering characteristics can be used to localize text regions. Since the intensity of the feature map represents the possibility of text, simple global thresholding can be employed to highlight the regions of high text possibility, resulting in a binary image. A morphological dilation operator can easily connect very close regions together, while leaving those far away from each other isolated; dilating the binary image yields joint areas referred to as text blobs. Two constraints are used to filter out blobs which do not contain text: the first filters out all very small isolated blobs, while the second filters out blobs whose widths are much smaller than their corresponding heights. The retained blobs are enclosed in boundary boxes, whose four pairs of coordinates are determined by the maximum and minimum coordinates of the top, bottom, left and right points of the corresponding blobs. To avoid missing character pixels that lie near or outside the initial boundary, the width and height of the boundary box are padded by small amounts.
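As a rough illustration of this localization step, the sketch below thresholds a feature map, dilates it, and filters the resulting blobs by size and aspect ratio using MATLAB Image Processing Toolbox functions. The threshold value, structuring-element size and filtering limits are assumptions chosen for illustration, not values taken from the cited method.

    % Localization sketch: threshold, dilate, filter blobs (assumed parameters).
    fmap = im2double(imread('featuremap.png'));   % feature map as a grayscale image
    bw   = fmap > 0.4;                            % global threshold (assumed value)
    bw   = imdilate(bw, strel('square', 7));      % connect very close regions into blobs
    stats = regionprops(bwlabel(bw), 'Area', 'BoundingBox');
    imshow(fmap); hold on;
    for k = 1:numel(stats)
        b = stats(k).BoundingBox;                 % [x y width height]
        % Constraint 1: drop very small isolated blobs.
        % Constraint 2: drop blobs much narrower than they are tall.
        if stats(k).Area > 50 && b(3) >= 0.5 * b(4)
            b(1:2) = b(1:2) - 2;                  % pad the box by a small amount
            b(3:4) = b(3:4) + 4;
            rectangle('Position', b, 'EdgeColor', 'r');
        end
    end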
2.4.4 Text Extraction and Binary Image
The final step in recognizing text overlaid on an image is text extraction. The goal of text extraction is to convert the grayscale image of an accepted text region into an OCR-ready binary image, where all character pixels are black and all others are white. To extract static text over a complex background, bitmap integration over time is often used to remove the moving background of a text region. However, (Tsung, Yung and Chih, 2006c) observed that a seed-fill algorithm is suitable for eliminating false text character regions to enhance the recognition rate.
The Otsu method is often used to calculate the threshold for segmenting text from the background. For vertically moving text, vertical adaptive thresholding is applied, and horizontal adaptive thresholding for horizontally moving text. Correspondingly, (Jie, Jigui and Shengsheng, 2006d) combined strong and boosted edges after dilation, followed by an AND operation, to form the text region; the results of the dilation and logical operations are mapped onto the original image to obtain the text regions, and the remaining non-text regions are identified and eliminated by removing them from the binary image. Meanwhile, the method of (Xiaoqing and Jagath, 2006a) cannot handle characters embedded in shade or complex backgrounds; their final stage extracts accurate binary characters from the localized text regions, using uniform white characters on a pure black background, so that existing OCR can be used directly for recognition.
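A minimal sketch of the Otsu-based binarization mentioned above, using the standard MATLAB functions graythresh and im2bw; the file name is a placeholder, and the final inversion assumes dark text on a light background, which is an assumption rather than part of the cited methods:

    % Otsu thresholding of a localized text region (sketch).
    region = rgb2gray(imread('textregion.png'));  % grayscale text region
    level  = graythresh(region);                  % Otsu threshold in [0,1]
    bw     = im2bw(region, level);                % binarize the region
    bw     = ~bw;                                 % invert if needed so the characters
                                                  % match the OCR's expected polarity
    imshow(bw);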
2.5 Analytic Segmentation

2.5.1 Pattern Recognition

(Jie, Jigui and Shengsheng, 2006d) explained that pattern recognition (PR) is a subject concerned with object description and classification methods. It is also a collection of mathematical, statistical, heuristic and inductive techniques that play a fundamental role in enabling computers to execute tasks the way human beings do. Pattern recognition includes many methods, which has led to the development of numerous applications in different fields; the practical aim of these methods is the emulation of intelligence.
2.5.2 Statistical Pattern Recognition
Statistical decision and estimation theories have been commonly used in PR for a long time. Statistical PR is a classical method of PR that was worked out over a long period of development. It is based on the feature vector distributions obtained from probabilistic and statistical models, where the statistical model is defined by a family of class-conditional probability density functions (the probability of feature vector x given a class). We put the features in some chosen order and can then regard the set of features as a feature vector. Statistical pattern recognition deals with features only, without considering the relations between them.
2.5.3 Data Clustering
Its aim is to find similar clusters in a mass of data without needing any information about known clusters; it is an unsupervised method. In general, data clustering methods can be partitioned into two classes: hierarchical clustering and partition clustering.
2.5.4 Fuzzy Sets
The thinking process of human beings is often fuzzy and uncertain, and human languages are often fuzzy as well. In reality we cannot always give complete answers or classifications, so the theory of fuzzy sets came into being. Fuzzy sets can describe the extension and intension of a concept effectively.
2.5.5 Neural Networks
Neural networks have developed very quickly since the first neural network model was proposed in 1943, especially after the Hopfield neural network and the famous BP algorithm came into being. A neural network performs data clustering based on distance measurement, and the method is model-independent. The neural approach applies biological concepts to machines to recognize patterns; the outcome of this effort is the invention of artificial neural networks, built from knowledge of the physiology of the human brain. Neural networks are composed of a series of different, associated units. In addition, the genetic algorithm applied in neural networks is a statistical optimization algorithm proposed by (Holland, 1975).
2.5.6 Structural Pattern Recognition
Structural pattern recognition is not based on a firm theory; it relies on segmentation and feature extraction. (Pavlidis, 1977) said that structural pattern recognition lays emphasis on the description of structure, namely explaining how some simple sub-patterns compose a pattern. There are two main methods in structural pattern recognition: syntax analysis and structure matching. The basis of syntax analysis is the theory of formal languages, while the basis of structure matching is special mathematical techniques based on sub-patterns. When the relations among the parts of an object are considered, structural pattern recognition is best. It deals with symbolic information, and it is often combined with statistical classification or neural networks, through which more complex pattern recognition problems, such as the recognition of multidimensional objects, can be handled.
2.5.7 Syntactic Pattern Recognition
This method basically emphasizes the rules of composition. An attractive aspect of syntactic methods is their suitability for dealing with recursion. After customizing a series of rules that describe the relations among the parts of an object, syntactic pattern recognition, a special kind of structural pattern recognition, can be used.
2.5.8 Approximate Reasoning Approach to Pattern Recognition

This method, which uses two concepts, fuzzy implication and the compositional rule of inference, can cope with the problem of rule-based pattern recognition.
2.5.9 Applications of Support Vector Machine (SVM)

The SVM is a relatively recent method with a simple structure and has been researched widely. As discussed by (Hyeran and Seong, 2002a), the SVM is based on statistical learning theory, and it is an effective tool for solving pattern recognition and function estimation problems, especially classification and regression. It has been applied to a wide range of pattern recognition tasks such as face detection, verification and recognition, object detection and recognition, and speech recognition.
2.6 Pattern Recognition System

A pattern recognition system can be described as a process that copes with real or noisy data. Whether the decision made by the system is right or not mainly depends on the decisions made by the human expert.
2.6.1 The Structure of Pattern Recognition System
A pattern recognition system is based on a PR method and mainly includes three mutually associated yet distinct processes. The aim of pattern classification is to utilize the information acquired from pattern analysis to discipline the computer to accomplish the classification. A very common description of a pattern recognition system includes five steps; the classification/regression/description step shown in Figure 2.2 is the kernel of the system. Classification is the PR problem of assigning an object to a class; the output of the PR system is an integer label, such as classifying a product as "1" or "0" in a quality control test. Regression is a generalization of the classification task, where the output of the PR system is a real-valued number, such as predicting the share value of a firm based on past performance and stock market indicators. Description is the problem of representing an object in terms of a series of primitives, where the PR system produces a structural or linguistic description. A general composition of a PR system is given below.
Figure 2.2 The composition of a PR system
2.6.2 Applications of Pattern Recognition
It is true that application has been one of the most important drivers of PR theory. Pattern recognition has been developed for many years, and PR technology has been applied in many fields, one of which is character recognition.
2.6.3 Character Recognition
Character extraction from a scene image is based on identification of a local target. Character recognition is commonly performed after the image has been binarized using a single threshold value. In photographic images, characters are most often located on signboards or similar regions of coherent background; therefore, once the signboard region is identified as the local target area, it can be treated similarly to a document image. Binarization then produces a useful image for character recognition: the character and background regions in the local target area are separated using a threshold value calculated for that area.
2.7 Run-Length Coding Algorithm

Usually, after converting the image into a binary image, we deal with zeros and ones representing the background and foreground. (Chengjie, Jie and Trac, 2002b) and (Kofi, Andrew, Patrick and Jonathan, 2007) describe run-length coding as the standard coding technique for block-transform-based image/video compression: a block of quantized transform coefficients is first represented as a sequence of RUN (number of consecutive zeros) and LEVEL (value of the following nonzero coefficient) pairs, which are then entropy coded. Here, a RUN takes the pixels in the same row as a block; every block of RUNs can be represented in a horizontal projection, and every horizontal projection can be calculated from the run-length code. The problem then is to group all points of the image that are labeled as object points into an object image. We assume that such points are spatially close. This notion of spatial proximity requires a more precise definition, so that an algorithm can be devised to group spatially close points into components, as shown in Figure 2.3 below.
Figure 2.3 Horizontal projection calculated from run-length code
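A minimal sketch of run-length coding for the rows of a binary image, recording each run of 1-pixels as a (start column, end column) pair; the function and variable names are hypothetical, chosen only for this illustration:

    % Encode each row of a binary image as runs of 1-pixels (sketch).
    function runs = rowRuns(bw)
        runs = cell(size(bw, 1), 1);        % runs{r} = [start end; ...] per row
        for r = 1:size(bw, 1)
            d = diff([0 bw(r, :) 0]);       % pad so the row edges become transitions
            starts = find(d == 1);          % 0 -> 1 transitions start a run
            ends   = find(d == -1) - 1;     % 1 -> 0 transitions end a run
            runs{r} = [starts(:) ends(:)];
        end
    end

For the row [0 1 1 0 1], this yields the runs [2 3; 5 5]; summing the run lengths of each row gives exactly the horizontal projection of Figure 2.3.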
2.7.1 Neighbors
A pixel in a digital image is spatially close to several other pixels. In a digital image represented on a square grid, a pixel has a common boundary with four pixels and shares a corner with four additional pixels. We say that two pixels are 4-neighbors if they share a common boundary, and 8-neighbors if they share at least one corner. For example, the pixel at location [i,j] has 4-neighbors [i+1,j], [i-1,j], [i,j+1] and [i,j-1]. The 8-neighbors of the pixel include the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]. A pixel is said to be 4-connected to its 4-neighbors and 8-connected to its 8-neighbors, as shown in Figure 2.4 below.
4-neighbors: [i+1,j], [i-1,j], [i,j+1] and [i,j-1]
8-neighbors: the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]
Figure 2.4: 4- and 8-neighborhoods for a rectangular image grid; pixel [i,j] is located at the center of each figure
2.7.2 Path
A path from the pixel at [i0,j0] to the pixel at [in,jn] is a sequence of pixel indices [i0,j0], [i1,j1], ..., [in,jn] such that the pixel at [ik,jk] is a neighbor of the pixel at [ik+1,jk+1] for all k with 0 <= k <= n-1. If the neighbor relation uses 4-connection, the path is a 4-path; for 8-connection, the path is an 8-path. Simple examples of these are shown in Figure 2.5 below.
Figure 2.5 4-path and 8-path
2.7.3 Foreground
The set of all 1-pixels in an image is called the foreground and is denoted by S. The foreground represents the objects that exist on the background; it may be a text or another object in the image. The foreground is generally more interesting than the background.
2.7.4 Connectivity
A pixel p ∈ S is said to be connected to q ∈ S if there is a path from p to q consisting entirely of pixels of S. Note that connectivity is an equivalence relation: for any three pixels p, q and r in S, we have the following properties.
1. Pixel p is connected to p (reflexivity).
2. If p is connected to q, then q is connected to p (commutativity).
3. If p is connected to q and q is connected to r, then p is connected to r (transitivity).
2.7.5 Connected Components
A set of pixels in which each pixel is connected to all the others is called a connected component. A connected component can be built from runs: the pixels that lie together in each row form a run, each run containing pixels joined together; every run is then connected with runs in other rows, so that eventually the connected component grows from all the pixels connected together. This is why the approach is also called run-length coding.
2.7.6 Background
The set of all connected components of S' (the complement of S) that have points on the border of the image is called the background. All other components of S' are called holes. Let us consider the simple picture shown in Figure 2.6 below.
Figure 2.6 Border of an image
How many objects and how many holes are in this figure? If we consider 4-connectivity for both foreground and background, there are four objects, each 1 pixel in size, and one hole. If we use 8-connectivity, there is one object and no hole. In both cases we have an ambiguous situation, and a similar ambiguity arises in the simple case shown in Figure 2.7 below.
Figure 2.7 Ambiguous border
If the 1s are connected, then the 0s should not be. To avoid this awkward situation, 4-connectivity should be used for S'.
2.7.7 Boundary
The boundary of S is the set of pixels of S that have 4-neighbors in the complement of S; the boundary is usually denoted by S'. The boundary consists of the edges of the object, which separate the object from other objects or from the background, and the intensity on the boundary differs from that on either side of it. When we want to detect text in an image, we have to know the boundary of the text.
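Under this definition, the boundary can be computed in MATLAB by eroding S with the 4-neighborhood structuring element and keeping the pixels that disappear; a sketch (the file name is a placeholder, and the diamond structuring element encodes the 4-neighbor test):

    % Boundary of S: pixels of S with at least one 4-neighbor outside S.
    S = imread('binarytext.png') > 0;            % binary object image
    boundary = S & ~imerode(S, strel('diamond', 1));
    interior = S & ~boundary;                    % the interior is S minus its boundary
    imshow(boundary);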
2.7.8 Interior
The interior is the set of pixels of S that are not in its boundary; the interior of S is (S - S'). The interior is the inner part of the object, which represents the object itself, and it differs from the boundary in terms of intensity, because the boundary is the line that separates the object from the background.
2.7.9 Surrounds
Region T surrounds region S (or S is inside T) if any 4-path from any point of S to the border of the picture must intersect T. Figure 2.8 below shows an example of a simple binary image with its boundary, interior and surrounds.
Figure 2.8 A binary image with its boundary, interior and surrounds
2.7.10 Component Labeling
One of the most common operations in machine vision is finding the connected components in an image. The points in a connected component form a candidate region for representing an object. As mentioned earlier, in computer vision most objects have surfaces, and points belonging to a surface project to spatially close points; this notion of "spatially close" is captured by connected components in the digital image. It should be mentioned that connected component algorithms usually form a bottleneck in a binary vision system: the algorithm is sequential in nature, because finding connected components is a global operation. If there is only one object in an image, there may be no need to find the connected components; however, if there are many objects whose properties and locations need to be found, the connected components must be determined. A component labeling algorithm finds all connected components in an image and assigns a unique label to all points in the same component. Figure 2.9 shows an image and its labeled connected components.
Figure 2.9 An image (a) and its connected component image (b)
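In MATLAB's Image Processing Toolbox this labeling operation is available directly as bwlabel; a short sketch, where the small test matrix is made up for illustration:

    % Label the connected components of a small binary image.
    bw = [1 1 0 0 0;
          1 1 0 0 1;
          0 0 0 1 0;
          0 0 1 0 0];
    [labels4, n4] = bwlabel(bw, 4);   % 4-connectivity: diagonal 1s stay separate
    [labels8, n8] = bwlabel(bw, 8);   % 8-connectivity: diagonal 1s are joined
    % Here n4 == 4 but n8 == 2: the diagonal chain on the right is one
    % component under 8-connectivity and three components under 4-connectivity.
    disp(labels4); disp(labels8);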
2.8 Properties of Text
2.8.1 Removing the Borders
The borders should be removed; this reduces the image size, so that only the rectangular part of the image containing the text(s) remains. The image contains many connected components whose pixels are "1" while the background pixels are "0". Removing the borders involves four stages. First, scanning top-down from the first row: if a row does not contain any pixel "1", it is removed, and this continues until a row containing a "1" is found; that row is the top border of the image. The same procedure is applied bottom-up until a row containing a "1" is found; this completes the horizontal stage. Afterwards, the same operation is performed vertically, on columns instead of rows, scanning from the first column left-to-right until a column containing a "1" is found, and then in reverse from right-to-left.
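A compact way to express this border removal in MATLAB, assuming bw is the binary image with text pixels equal to 1 (the file name and variable names are illustrative):

    % Crop away empty border rows and columns of a binary image (sketch).
    bw   = imread('binarytext.png') > 0;   % any binary text image
    rows = find(any(bw, 2));               % rows containing at least one 1
    cols = find(any(bw, 1));               % columns containing at least one 1
    cropped = bw(rows(1):rows(end), cols(1):cols(end));
    imshow(cropped);                       % only the text-bearing rectangle remains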
2.8.2 Dividing the Text into Rows
After removing the borders, the area is divided into rows. (Mohanad and Mohammad, 2006e) give every text two bounding lines, i.e. a start line at the first row and an end line at the last row of the text. The start line and end line differ from text to text because of multi-scale variation: every text has properties assigned to it depending on its size. That is, the lengths and widths of connected components that exist side by side, i.e. horizontally across many rows, represent a text whose properties and size differ from those of another text (multi-scale). On this basis we can find the start line and end line of each text and then process it, as shown in Figure 2.10 below.
below
Figure 2.10 Divide the text into rows
2.8.3 Dividing the Rows “Lines” into the Words
The single line is then divided into words. For this, (Mohanad and Mohammad, 2006e) remove the empty areas before and after the text, but a key question must be answered here: size and scale differ from one text to another, and the spacing between words depends on the scale. To answer it, we can rely on the length and width of the connected components when they are characters; on this basis, the space between words and between characters within a word can be recognized from the ratio of length to width. A word may be a single character or more, the size may differ within the same word, and the word need not have a meaning, as shown in Figure 2.11.
Figure 2.11 Divide the rows into the words
2.8.4 Dividing the Word into Characters
Each word is then divided into characters, which are saved in an array. On this basis we know the characters in each word and each word in the text. Afterwards, with each connected component known, fuzzy logic is used to recognize the character; any connected component that is not recognized is noise or distortion remaining in the image, as shown in Figure 2.12 below.
Figure 2.12 Divide the word into characters
2.9 Identifying Character

After enclosing a connected component in a rectangular segment, the rectangle has four corners and nine identifying pixels used to distinguish the character, as shown in Figure 2.13 below.
Figure 2.13 Identify character
Here, (Mohanad and Mohammad, 2006e) proposed that any character can be identified based on its four corners and the center point at the intersection of the x-axis and y-axis. Based on this criterion, we can identify the character easily. For any pixel at any corner, a "0" means background and a "1" means a pixel of the character, i.e. of the connected component. Every character has properties different from the others; for example, the character "a" has the upper-left corner off, upper-right corner off, lower-left corner off, lower-right corner on, and the center pixel off. These properties differ from those of other characters, so we can recognize a character as long as it has no noise or distortion.
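The sketch below illustrates this corner-and-center test on the bounding box of a single connected component. The probe positions and the template for the letter "a" follow the description above; treating one five-pixel template as sufficient is an illustrative simplification of the fuzzy rules described later.

    % Probe the four corners and the center of a character's bounding box (sketch).
    function feat = cornerFeatures(glyph)
        % glyph: binary matrix of one connected component, 1 = character pixel
        [h, w] = size(glyph);
        feat = [glyph(1, 1), ...                     % upper-left corner
                glyph(1, w), ...                     % upper-right corner
                glyph(h, 1), ...                     % lower-left corner
                glyph(h, w), ...                     % lower-right corner
                glyph(round(h/2), round(w/2))];      % center pixel
    end

    % Example test against the pattern quoted above for the character 'a':
    % off, off, off, on, off.
    % isA = isequal(cornerFeatures(glyph), [0 0 0 1 0]);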
2.10 Fuzzy Logic
An objective of Fuzzy Logic has been to make computers think like people.
Fuzzy Logic can deal with the vagueness intrinsic to human thinking and natural
language and recognizes that its nature is different from randomness. Using Fuzzy Logic
algorithms could enable machines to understand and respond to vague human concepts
such as hot, cold, large, small, etc. It also could provide a relatively simple approach to
reach definite conclusions from imprecise information.
2.10.1 What is Fuzzy Logic?
The term Fuzzy Logic has been used in two different senses. It is thus important
to clarify the distinctions between these two different usages of the term. In a narrow
sense, Fuzzy Logic refers to a logical system that generalizes classical two valued logic
for reasoning under uncertainty. In a broad sense, Fuzzy Logic refers to all of the
theories and technologies that employ fuzzy sets, which are classes with unsharp
boundaries.
For instance, the concept of "warm room temperature" may be expressed as an interval (e.g. [70 F, 78 F]) in classical set theory. However, the concept does not have a well-defined natural boundary. A representation of the concept closer to human interpretation allows a gradual transition from "not warm" to "warm". To achieve this, the notion of membership in a set needs to become a matter of degree; this is the essence of fuzzy sets. An example of a classical set and a fuzzy set is shown in Figure 2.14, where the vertical axis represents the degree of membership in a set.
Figure 2.14 A classical set and fuzzy set representation of "warm room temperature"
2.10.2 What is the Fuzzy Logic Toolbox?
The Fuzzy Logic Toolbox is a collection of functions built on the MATLAB®
numeric computing environment. It provides tools for you to create and edit fuzzy
inference systems within the framework of MATLAB.
2.10.3 Fuzzy Sets
Fuzzy logic starts with the concept of a fuzzy set. A fuzzy set is a set without a crisp, clearly defined boundary; it can contain elements with only a partial degree of membership. To understand what a fuzzy set is, first consider what is meant by a classical set. A classical set is a container that wholly includes or wholly excludes any given element. For example, the pixel values above are represented by "0" or "1", i.e. off or on respectively. That concerns the value of a pixel; there is also the location of the pixel, at a corner or at the center point. How do we know whether the location of a pixel is upper-left, upper-right, middle-left, middle-right, lower-left or lower-right? This depends on a fuzzy set, which determines the location of the pixel, as shown in Figure 2.15 below and explained further under "Membership Function".
Figure 2.15 (a) Input of a pixel
Figure 2.15 (b) Input of location for a pixel
2.10.4 Membership Function
A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is sometimes referred to as the universe of discourse, a fancy name for a simple concept. Figure 2.15 above illustrates the membership function curves: a pixel value of "0-0.5" or "0.6-1" maps to "off" or "on" respectively, and a pixel location of "0.1-0.3", "0.4-0.6" or "0.7-0.9" maps to low, median or high respectively. This is the benefit of fuzzy sets: the membership functions generate the output from the input pixel value and pixel location, as shown for the output variable in Figure 2.16 below.
Figure 2.16 Output variable “letter”
2.10.5 If-Then Rules
Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. If-then rule statements are used to formulate the conditional statements that comprise fuzzy logic. A fuzzy rule is the basic unit for capturing knowledge in many fuzzy systems. A fuzzy rule has two components, an IF part (also referred to as the antecedent) and a THEN part (also referred to as the consequent):

IF <antecedent> THEN <consequent>

where the antecedent describes a condition and the consequent describes a conclusion.
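As an illustration of how these pieces fit together in the Fuzzy Logic Toolbox, the sketch below builds a tiny Mamdani system with a pixel-value input, a pixel-location input and one output, roughly mirroring Figures 2.15 and 2.16. The membership-function ranges follow the intervals quoted in Section 2.10.4; the output variable's membership functions and the two rules are assumptions for illustration only.

    % Tiny Mamdani FIS sketch (classic Fuzzy Logic Toolbox functions).
    fis = newfis('letterid');
    fis = addvar(fis, 'input', 'pixelvalue', [0 1]);
    fis = addmf(fis, 'input', 1, 'off', 'trapmf', [0 0 0.4 0.5]);
    fis = addmf(fis, 'input', 1, 'on',  'trapmf', [0.5 0.6 1 1]);
    fis = addvar(fis, 'input', 'pixellocation', [0 1]);
    fis = addmf(fis, 'input', 2, 'low',    'trimf', [0.1 0.2 0.3]);
    fis = addmf(fis, 'input', 2, 'median', 'trimf', [0.4 0.5 0.6]);
    fis = addmf(fis, 'input', 2, 'high',   'trimf', [0.7 0.8 0.9]);
    fis = addvar(fis, 'output', 'letter', [0 1]);
    fis = addmf(fis, 'output', 1, 'notletter', 'trimf', [0 0.25 0.5]);
    fis = addmf(fis, 'output', 1, 'isletter',  'trimf', [0.5 0.75 1]);
    % Rule rows: [input1-MF input2-MF output-MF weight connective(1=AND)]
    % e.g. IF pixelvalue is on AND pixellocation is low THEN letter is isletter.
    fis = addrule(fis, [2 1 2 1 1; 1 3 1 1 1]);
    out = evalfis([0.8 0.2], fis)   % evaluate an 'on' pixel at a low location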
2.10.6 Fuzzy Inference Systems
Fuzzy inference is the process of formulating the mapping from a given input to
an output using fuzzy logic. The mapping then provides a basis from which decisions
can be made, or patterns discerned. The process of fuzzy inference involves all of the
pieces that are described in the previous sections: membership functions, fuzzy logic
operators, and if-then rules.
2.10.7 Rule Review

The Rule Viewer displays a roadmap of the whole fuzzy inference process. It is based on the fuzzy inference diagram described above.
2.10.8 Surface Review
The Surface Viewer has a special capability that is very helpful in cases with two or more inputs and one output. Figure 2.17 below shows building the system with fuzzy logic.
Figure 2.17 Building the system with fuzzy logic
2.11 Summary
This chapter has given a detailed overview of segmentation categories and text recognition, which play an important role in character recognition. Text detection, text area identification, text region localization, and text extraction and binarization were each drawn and explained separately. Analytic segmentation, the run-length coding algorithm, the properties of text and character identification were also covered. Finally, fuzzy logic was introduced as the means used to determine the text.
CHAPTER III
METHODOLOGY
3.1 Introduction

This chapter describes the project methodology and the proposed technique. Figure 3.1 shows the project framework, which is employed to provide a systematic method of procedures and principles aimed at achieving the objectives of this study. The purpose of having a methodology is to simplify the analysis process and to explain the requirements and formulation of the project. This is important to ensure that the phases of the project can be completed smoothly and on time.
Figure 3.1 Proposed method
3.2 Problem Statement and Literature Review

As described in Chapter I, this project proposes a segmentation approach to solve invariant complex image problems. In this regard, a study was carried out on the latest related literature in text segmentation. This study covers edge segmentation styles, text detection, text identification, text localization, text extraction, analytic segmentation and current segmentation techniques. This investigation is essential for designing the novel method and helps ensure better performance.
3.3 System Development

This project develops the proposed approach in two separate parts. Firstly, a heuristic segmentation model based on edge detection uses the kernels (eight angles) of a compass operator; every two kernels that are opposite to each other are then combined, giving four directions of edge detection. Combining two opposite kernels gives complemented edge detection in each direction, and collecting all four edge detections gives the total over all edge directions. Feature extraction is then used to extract the text from the image, with the help of a run-length algorithm to determine the connected components. Secondly, a fuzzy logic system is used to identify the characters: a rectangle is placed around each connected component and nine pixels are used to identify it.
3.4 Performance Evaluation

At this project phase, the performance of the proposed method is evaluated using edge strength. Performance is expressed as percentages of correct segmentation and missed segmentation; a miss occurs when the edges are not dense, or when the character is very similar to the background. If the proposed method does not perform well on multi-scale and multi-orientation text, system development is revised to enhance the proposed segmentation method.
3.5 General Steps of Proposed Techniques

This project proposes a segmentation approach with analytical strategies based on the kernels of a compass edge-detection operator. First the kernels are collected; next, feature extraction is used to extract the text; then fuzzy logic is used to identify the characters. Finally, a character extraction algorithm is used to extract the characters properly. Figure 3.2 illustrates a block diagram of the general steps of the proposed approach.
Figure 3.2 Block diagram of general steps of proposed approach
3.6 Proposed Algorithm for Edge Based Text Region Extraction

The basic steps of the edge-based text extraction algorithm are given below; the details are explained in the following sections, and a MATLAB sketch follows the list.
Step 1: Input (read) the image with its original colors.
Step 2: Create a Gaussian pyramid by convolving the input image with a Gaussian kernel and successively down-sampling each direction by half.
Step 3: Create directional kernels to detect edges at the 0, 45, 90, 135, 180, 225, 270 and 315 degree orientations.
Step 4: Convolve each image in the Gaussian pyramid with each orientation filter.
Step 5: Combine the kernel responses to detect edges at the 0+180, 45+225, 90+270 and 135+315 orientations.
Step 6: Dilate the resultant image using a sufficiently large structuring element (3x3) to cluster candidate text regions together.
Step 7: Create the final output image with text in white pixels against a plain black background.
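The listing below is a condensed MATLAB sketch of Steps 1 to 7. The eight directional kernels are built by rotating the eight border coefficients of a Sobel-like base kernel circularly, as the abstract describes; the base kernel, the pyramid depth, the threshold and the file name are illustrative assumptions, and the detailed per-level loops appear as equations (3.1) to (3.13) in the next section.

    % Edge-based text region extraction, Steps 1-7 (sketch).
    img = im2double(rgb2gray(imread('scene.jpg')));    % Step 1: read image

    % Step 3: eight directional kernels via circular rotation of the border
    % coefficients of a base kernel (Sobel-like base, assumed).
    pos  = [1 1; 1 2; 1 3; 2 3; 3 3; 3 2; 3 1; 2 1];   % clockwise border positions
    base = [-1 0 1; -2 0 2; -1 0 1];
    ring = arrayfun(@(k) base(pos(k,1), pos(k,2)), (1:8)');
    kernel = cell(1, 8);
    for j = 1:8
        v = circshift(ring, j - 1);                    % rotate by 45-degree steps
        K = zeros(3);
        for k = 1:8, K(pos(k,1), pos(k,2)) = v(k); end
        kernel{j} = K;
    end

    GK    = fspecial('gaussian');                      % Gaussian kernel
    total = zeros(size(img));
    level = img;
    for i = 1:8                                        % Step 2: 8-level pyramid
        for j = 1:8                                    % Step 4: 8 orientations
            e = imfilter(level, kernel{j}, 'conv');
            total = total + abs(imresize(e, size(img)));   % Step 5: the sum
        end                                            % combines opposite pairs
        level = imresize(imfilter(level, GK, 'conv'), 0.5);
    end

    bw = im2bw(mat2gray(total), 0.3);                  % threshold (assumed value)
    bw = imdilate(bw, strel('square', 3));             % Step 6: cluster regions
    imshow(bw);                                        % Step 7: white text on black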
3.7 Detection

This section corresponds to Steps 1 to 4 of Section 3.6. Given an input image, the regions with a possibility of containing text are detected. A Gaussian pyramid is created using a Gaussian kernel of size 3x3, down-sampling the image in each direction by half; down-sampling refers to the process whereby an image is resized to a lower resolution than its original one. The Gaussian filter of size 3x3 is shown in Figure 3.3. Each level in the pyramid corresponds to the input image at a different resolution; a sample Gaussian pyramid with 8 levels of resolution is shown in Figure 3.4. These images are next convolved with directional filters at the different orientations for edge detection: horizontal (0+180), vertical (90+270), diagonal (45+225) and diagonal (135+315). The kernels used are shown in Figure 3.5 and their application in Figure 3.6.
Figure 3.3 Gaussian filter
Figure 3.4 Sample Gaussian pyramid with 8 levels
Figure 3.5 The eight directional kernels (0, 45, 90, 135, 180, 225, 270 and 315 degrees) and their directional responses
Each of the kernels shown above is stored in a variable (kernel0, kernel45, kernel90, kernel135, kernel180, kernel225, kernel270 and kernel315), and the eight kernels are then placed in a cell array called kernel{}. Next, the Gaussian filter is created, as shown in the equation below:

GK = fspecial('gaussian')    (3.1)
After creating the Gaussian filter, the Gaussian pyramid is built by convolving the image with the Gaussian filter over eight levels, starting from the original size, as shown in the equations below:

pyramid{i} = image1, i = 0, …, 7    (3.2)

image2 = imfilter(image1, GK, 'conv')    (3.3)

Next, each level is down-sampled by 0.5, for i = 0 to 7, as shown in the equation below:

pyramid{i} = imresize(image2, 0.5), i = 0, …, 7    (3.4)
Next, the image at each level of the pyramid is convolved with the eight edge-detection kernels, as shown in the equation below:

Conv{i,j} = imfilter(pyramid{i}, kernel{j}, 'conv'), i, j = 0, …, 7    (3.5)
Then the filtered images are resized back to the original image size:

Conv2{i,j} = imresize(Conv{i,j}, [size(image1,1) size(image1,2)]), i, j = 0, …, 7    (3.6)
We now have eight pyramid levels, each with edge detection at the eight orientations {0, 45, 90, 135, 180, 225, 270, 315}. After each level is returned to the original image size, the maps for the same orientation are summed across the eight levels, giving one combined edge map per orientation (the results for these edges are shown in Chapter IV, Figure 4.2). The equation below shows this collection operation:

total{i} = im2bw(Conv2{1,i} + Conv2{2,i} + Conv2{3,i} + Conv2{4,i} + Conv2{5,i} + Conv2{6,i} + Conv2{7,i} + Conv2{8,i}), i = 0, …, 7    (3.7)
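Taken together, equations (3.1) to (3.7) amount to the following minimal MATLAB sketch. This is a hedged illustration, assuming the kernel{} array built earlier, 1-based indexing, and a hypothetical input file name; it is not the project's verified listing.

GK = fspecial('gaussian');                    % 3x3 Gaussian filter, eq. (3.1)
image1 = rgb2gray(imread('scene.jpg'));       % hypothetical input, converted to gray
pyramid = cell(1, 8);
current = double(image1);
for i = 1:8                                   % eqs. (3.2)-(3.4): build the pyramid
    pyramid{i} = current;
    current = imresize(imfilter(current, GK, 'conv'), 0.5);
end
total = cell(1, 8);
for j = 1:8                                   % one combined map per orientation
    acc = 0;
    for i = 1:8                               % eqs. (3.5)-(3.6): filter, restore size
        c = imfilter(pyramid{i}, kernel{j}, 'conv');
        acc = acc + imresize(c, [size(image1, 1) size(image1, 2)]);
    end
    total{j} = im2bw(mat2gray(acc));          % eq. (3.7): binarize the summed map
end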
Given k operators, gk(x,y) is the image obtained by convolving f(x,y) with the k-th operator at the original size. The seven remaining levels could also be shown, but they would take considerable space, so only the original-size results are given, as shown in Figure 3.6. The gradient is defined as

g(x,y) = max_k gk(x,y)    (3.8)
Figure 3.6 Edge detection at the eight orientations (0, 45, 90, 135, 180, 225, 270 and 315 degrees)
After convolving the image with the orientation kernels, the edge maps of opposite orientations are summed, which gives the final edge detection for each direction, as shown in the equations below.
Edge_first = Edge"0" + Edge"180"    (3.9)

Edge_second = Edge"45" + Edge"225"    (3.10)

Edge_third = Edge"90" + Edge"270"    (3.11)

Edge_fourth = Edge"135" + Edge"315"    (3.12)

Edge_total = Edge_first + Edge_second + Edge_third + Edge_fourth    (3.13)
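In MATLAB terms, equations (3.9) to (3.13) reduce to a few additions over the per-orientation maps; the sketch below assumes total{1..8} holds the 0, 45, 90, 135, 180, 225, 270 and 315 degree maps in that order.

edgeFirst  = total{1} + total{5};   % 0 + 180, eq. (3.9)
edgeSecond = total{2} + total{6};   % 45 + 225, eq. (3.10)
edgeThird  = total{3} + total{7};   % 90 + 270, eq. (3.11)
edgeFourth = total{4} + total{8};   % 135 + 315, eq. (3.12)
edgeTotal  = edgeFirst + edgeSecond + edgeThird + edgeFourth;   % eq. (3.13)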
A feature map is then created. A weighting factor is associated with each pixel to classify it as a candidate or non-candidate for a text region. A pixel is a candidate for text if it is highlighted in all of the edge maps created by the directional filters; thus, the feature map is a combination of all edge maps at different scales and orientations, with the highest-weighted pixels present in the resultant map.
3.8 Feature Map and Candidate Text Region Detection
3.8.1 Directional Filtering
In the proposed method, the magnitude of the second derivative of intensity is used as the measurement of edge strength, as this allows better detection of the intensity peaks that normally characterize text in an image. The edge density is calculated from the average edge strength within a window. Considering effectiveness and efficiency, eight orientations (0, 45, 90, 135, 180, 225, 270 and 315 degrees) are used to evaluate the variance of orientation, where 0 and 180 denote the vertical direction, 90 and 270 denote the horizontal direction, and 45, 135, 225 and 315 denote the four diagonal directions; the convolution is carried out with the compass operator.
3.8.2 Edge Selection
Vertical edges form the most important strokes of characters, and their lengths reflect the heights of the corresponding characters. By extracting and grouping these strokes, text of different heights (sizes) can be located. However, in a real scene under an indoor environment many other objects, such as windows, doors and walls, also produce strong vertical edges, so not all vertical edges can be used to locate text. Vertical edges produced by such non-character objects normally have very large lengths. Therefore, by grouping vertical edges into long and short edges, those with extremely long lengths can be eliminated, retaining the short edges for further processing.

After thresholding, long vertical edges may become broken short edges, which can cause false alarms (false positives). The proposed method therefore uses a two-stage edge generation method. The first stage obtains strong vertical edges by combining edge "0" and edge "180", as described in the equations below:
Edge_v^strong = |Ev|_z    (3.14)

Edge_v^strong = Edge"0"_bw + Edge"180"_bw    (3.15)

where Ev is the "0"+"180" intensity edge image, i.e., the 2D convolution result of the original image with the "0" and "180" kernels, and |·|_z is a thresholding operator that gives a binary result of the vertical edges; it is not very sensitive to the threshold value.
The second stage obtains weak vertical edges, as described below:

dilated = Dilation_{3×3}(Edge_v^strong)    (3.16)

closed = Closing_{m×m}(dilated)    (3.17)

Edge_v^weak = |Ev · (closed − dilated)|_z    (3.18)
where the morphological dilation with a rectangular structuring element of size 3×3 is used to eliminate the effects of slightly slanted edges, and a vertical linear structuring element of size m×m is then employed in the closing operator to force the strong vertical edges closed. The resultant vertical edges are the combination of the strong and weak edges, as described in the equation below:

Edge_v = Edge_v^strong + Edge_v^weak    (3.19)
With the two-stage edge generation complete, the resultant vertical edge image is obtained from the equations above. A morphological thinning operator, followed by a connected component labeling and analysis algorithm, is then applied to the resultant vertical edges, as described in the equations below:

Thinned = Thinning(Edge_v)    (3.20)

Labeled = BWlabel(Thinned, 8)    (3.21)
where the morphological thinning operator reduces the widths of the resultant vertical edges to one pixel. Since a high value in the length-labeled image represents a long edge, a simple thresholding operation, described in the equation below, is used to separate out the short edges:

Short_v_bw = |Ev^lengthlabeled|_z    (3.22)
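The two-stage edge generation of equations (3.14) to (3.22) can be sketched in MATLAB as below. The threshold value, the vertical line length and the 50-pixel length cut-off are illustrative assumptions, not the project's tuned values; img and kernel{} are assumed from the earlier sketches.

Ev = imfilter(double(img), kernel{1}, 'conv') ...
   + imfilter(double(img), kernel{5}, 'conv');         % "0" + "180" responses
strongV = im2bw(mat2gray(abs(Ev)), 0.3);               % eqs. (3.14)-(3.15)
dilated = imdilate(strongV, strel('square', 3));       % eq. (3.16)
closed  = imclose(dilated, strel('line', 9, 90));      % eq. (3.17), vertical closing
weakV   = im2bw(mat2gray(abs(Ev) .* (closed & ~dilated)), 0.3);   % eq. (3.18)
edgeV   = strongV | weakV;                             % eq. (3.19)
thinned = bwmorph(edgeV, 'thin', Inf);                 % eq. (3.20): one pixel wide
labeled = bwlabel(thinned, 8);                         % eq. (3.21)
stats   = regionprops(labeled, 'Area');                % edge lengths (1 px thick)
shortV  = ismember(labeled, find([stats.Area] < 50));  % eq. (3.22): keep short edges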
3.8.3 Feature Map Generation
As noted, regions containing text have significantly higher values of average edge density, strength and orientation variance than non-text regions. These three characteristics are exploited to refine the candidate regions by generating a feature map that suppresses false regions and enhances true candidate text regions. This procedure is described below:

Candidate = Dilation_{m×m}(Short_v_bw)    (3.23)

where a morphological dilation with an m×m structuring element is applied to the selected short vertical edge image to obtain the potential candidate text regions.
3.8.4 Localization
This part corresponds to Step 6 of Section 3.6. The process of localization further enhances the text regions by eliminating non-text regions. One property of text is that its characters usually appear close to each other in the image, forming a cluster; using a morphological dilation operation, these possible text pixels can be clustered together, eliminating pixels that are far from the candidate text regions. Dilation is an operation that expands or enhances the region of interest using a structural element of the required shape and/or size. The dilation is carried out with a structuring element of size 3×3 in order to enhance the regions that lie close to each other.

The resultant image after dilation may still contain some non-text regions or noise, which needs to be eliminated. An area filtering step is carried out to eliminate the noise blobs present in the image.
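A minimal MATLAB sketch of this localization step is given below; the 3×3 element follows the text, while the 100-pixel area cut-off for the noise blobs is an illustrative assumption.

clustered = imdilate(candidate, strel('square', 3));   % cluster nearby text pixels
localized = bwareaopen(clustered, 100);                % area filtering: drop small blobs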
3.8.5 Character Extraction
This corresponds to Step 7 of Section 3.6. Common OCR systems require the input image to be such that the characters can be easily parsed and recognized: the text and background should be monochrome, and the background-to-text contrast should be high. This process therefore generates an output image with white text against a black background.
3.9 Connected Components
The labeling of connected components in a binary image is a fundamental operation in pattern recognition. This algorithm transforms a binary image into a symbolic one in which each connected component has a unique numeric label. The image can be represented in a number of ways: arrays, run-lengths, quadtrees, octrees and bintrees. Usually, after the conversion of the image to binary form, zeros and ones represent the background and foreground, and a run-length coding algorithm is used. A RUN means "a maximal block of consecutive foreground pixels in the same row"; every run contributes to the horizontal projection, and the horizontal projection can be calculated from the run-length code. The labeling algorithm is represented with an equivalence table. Resolving the equivalence table has been the focus of most labeling algorithms; the method used here takes little effort to implement and also minimizes memory use. The process uses a run-length encoding representation. Conversion of the original binary image to run-length encoded format is easily parallelized by processing multiple rows in parallel. The run-length encoded format is much more compact than the binary image (individual runs have a single label), so the sequential label propagation stage is much faster than in the conventional algorithm. Details of the algorithm are given below. The stages involved in this implementation are as follows:
1. Pixels are converted to runs, in parallel by rows.
2. Initial labeling and propagation of labels.
3. Equivalence table resolution.
4. Translating run labels to connected components.
The design is parallelized as much as possible. Although stages 2 and 3 are sequential, they operate on runs, which are far less numerous than pixels. Like stage 1, stage 4 can be executed in parallel by row. A run has the properties {ID, EQ, s, e, r}, where ID is the identity number of the run, EQ is the equivalence value, s the x offset of the start pixel, e the x offset of the end pixel, and r the row.
The first stage is a row-wise parallel conversion from pixels to runs. Depending on the location and access mode of the memory holding the binary image, the entire image may be partitioned into n parts to perform n run-length encodings in parallel. The use of runs rather than pixels reduces the size of the equivalence table. The following sequential local operations are performed in parallel on each partition of an M × N image to assign pixels to runs:
Algorithm 3.1: PixelsToRuns(T)
T: T(x, y) = I(x, y)
i ← 0; Block ← 0
for each pixel T(x, y), scanning each row x = 1 … M:
    if T(x, y) = 1 and Block = 0 then
        s_i ← x
        Block ← 1
    if Block = 1 and (T(x, y) = 0 or x = M) then
        e_i ← x − 1
        r_i ← y
        ID_i ← EQ_i ← 0
        i ← i + 1
        Block ← 0
where Block is 1 while a run is being scanned in partition T, and M is the width of the image. A run is complete when the end of a row is reached or when a background pixel is reached. The maximum possible number of runs in an M × N image is ⌈M/2⌉ × N.
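For illustration, stage 1 can be written compactly in MATLAB for a single row. This is a hedged sketch following the run properties {ID, EQ, s, e, r}; the author's FPGA-oriented version processes multiple rows in parallel.

function runs = pixelToRuns(row, r)
% Convert one binary image row (vector of 0/1) into runs {ID, EQ, s, e, r}.
d = diff([0 row 0]);                 % +1 at run starts, -1 just past run ends
s = find(d == 1);                    % x offsets of start pixels
e = find(d == -1) - 1;               % x offsets of end pixels
runs = struct('ID', num2cell(zeros(size(s))), 'EQ', num2cell(zeros(size(s))), ...
              's', num2cell(s), 'e', num2cell(e), 'r', num2cell(r * ones(size(s))));
end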
The second stage involves initial labelling and propagation of labels. The IDs and equivalences (EQs) of all runs are initialized to zero. This is followed by a raster scan of the runs, assigning provisional labels that propagate to any adjacent runs on the row below. Any unassigned run (ID_i = 0) is given a unique value for both its ID and EQ. For each run i with ID ID_i, excluding runs on the last row of the image, the runs one row below run_i are scanned for an overlap. An overlapping run j in 4-adjacency (i.e., s_i ≤ e_j and e_i ≥ s_j) or 8-adjacency (i.e., s_i − 1 ≤ e_j and e_i + 1 ≥ s_j) is assigned the ID ID_i if and only if ID_j is unassigned. If there is a conflict (the overlapping run already has an assigned ID_j), the equivalence of run i, EQ_i, is set to ID_j. This is summarized in Algorithm 3.2.
Algorithm 3.2: InitLabelling(runs)
m ← 1
for i ← 1 to TotalRuns do
    if ID_i = 0 then
        ID_i ← EQ_i ← m
        m ← m + 1
    for each run r_j on the row below r_i do
        if ID_j = 0 and e_i ≥ s_j and s_i ≤ e_j then
            ID_j ← ID_i
            EQ_j ← ID_i
        if ID_j ≠ 0 and e_i ≥ s_j and s_i ≤ e_j then
            EQ_i ← ID_j
where TotalRuns excludes runs on the last row of the image. Applying PixelsToRuns() to the object in Figure 3.7 (a 'U'-shaped object) generates four runs, each with unassigned ID and EQ.
Figure 3.7 U-shaped object with 4 runs after PixelsToRuns
Table 3.1 Runs extracted from the object, by row
The third stage is the resolution of conflicts, as shown in Algorithm 3.3. In the example above (Figure 3.7 and Table 3.1), a conflict occurs at B3: the initially assigned EQ = 1 in iteration 1 changes to EQ = 2 in iteration 3 due to the overlap with B1 and B4 (see Table 3.1). This conflict is resolved in ResolveConflict(), resulting in ID = 2 and EQ = 2 for all four runs. Even though ResolveConflict() is highly sequential, it takes half the total cycles because the two if-statements in the second loop are executed simultaneously. The final IDs (final labels) are written back to the image at the appropriate pixel locations without scanning the entire image, since each run has associated s, e and r values.
Algorithm 3.3: ResolveConflict(runs)
for i ← 1 to TotalRuns do
    if ID_i ≠ EQ_i then
        TID ← ID_i
        TEQ ← EQ_i
        for j ← 1 to TotalRuns do
            if ID_j = TID then
                ID_j ← TEQ
            if EQ_j = TID then
                EQ_j ← TEQ
Table 3.2 Results after image scan, where ST = start, EN = end and RW = row

Run   ID   EQ   ST   EN   RW
B1     0    0    4    5    1
B2     0    0    1    2    2
B3     0    0    4    5    2
B4     0    0    1    5    3
Labeling is then complete. As shown in Figure 3.7 above, the runs are extracted during the scan while the 8-adjacency labeling is done. We use 8-neighbors, which share at least one corner; their positions around pixel [i,j] are [i+1,j], [i−1,j], [i,j+1], [i,j−1], [i+1,j+1], [i+1,j−1], [i−1,j+1] and [i−1,j−1], as shown in Figure 3.8 below.
Figure 3.8 The 8-neighborhood of a rectangular image pixel; [i,j] is located at the center of the figure
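For cross-checking the run-based labeling, MATLAB's built-in 8-adjacency labeling can be applied to the same binary image; this is a usage sketch only, not part of the proposed parallel algorithm.

labels = bwlabel(BW, 8);           % unique numeric label per connected component
numComponents = max(labels(:));    % how many components were found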
3.10 Fuzzy logic
Fuzzy logic is used to identify each character (connected component) in the image. After a connected component is determined, a rectangle is drawn around it. We must then know which pixel lies at each corner, along with the other pixels used to identify the connected component, as shown in Figure 3.9.
Figure 3.9 Identify the character
Afterward, each identified pixel is given a value between 0 and 1 and sent to the fuzzy system to recognize the identified pixels of the connected component. When the fuzzy system receives these values together with their "on" or "off" status, it determines whether the component is a character or noise. Figure 3.10 below shows the design of the fuzzy inputs for pixel location and pixel status.
Figure 3.10 (a) Example of fuzzy input
Figure 3.10 (b) Example of fuzzy outputs
3.11 Summary
This chapter discusses the general framework of the proposed methodology. The plan discussed in this chapter must be followed to reach the objective of this project, and each step of the methodology has been shown in some detail. First, the problem statement was outlined, followed by the literature review, system development, performance evaluation, the proposed technique, connected component labeling and fuzzy logic; each phase of the project procedure was discussed briefly. Finally, the last phase is the conclusion and the report writing. Each of these stages plays an important role in accomplishing the project.
CHAPTER IV
IMPLEMENTATION
4.1 Introduction

This chapter presents and discusses the findings of the project. The findings are presented as extraction of the text and recognition of the characters.
4.2 Input Image

The input is a colored image with resolution 255×256, from which the text will be extracted, as shown in Figure 4.1 below.
Figure 4.1 Original image
The proposed method is based on the fact that edges are a reliable feature of text regardless of color/intensity, orientation and scale. Edge strength, density and orientation variance are three distinguishing characteristics of text embedded in images, and they can be used as the main features for detecting text. The proposed method consists of three stages: candidate text region detection, text region localization and character extraction. It rests on the idea that edge information in an image is found by looking at the relationship between a pixel and its neighbours, i.e., an edge is found at a discontinuity of grey-level values. An ideal edge detector should produce an edge indication localized to a single pixel located at the mid-point of the slope.
The first derivative at any point in an image is obtained from the magnitude of the gradient at that point. A change of the image function can be described by a gradient that points in the direction of the largest growth of the image function. Here, 3×3 kernels (filters) are passed over the image; W(i,j) is the weight for pixel (i,j) in the image array, whose value is determined by the number of edge orientations within the window (filter). Figure 4.2 shows the structure of a kernel.
Figure 4.2 Structure of the 3×3 kernel (filter)

In the structure (filter) shown, the edge density is calculated from the average edge strength within a window. Based on a threshold, the center pixel is classified as a strong edge (edge detected) or a weak edge (no edge detected); Figure 4.2 marks the place of the detected edge in the filter as it lies over the image array, as in our example in Figure 4.3 below.
Figure 4.3 Our example of the convolution operation
Convolution is a simple mathematical operation which is fundamental to many common image processing operators. Convolution provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality. This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.

In an image processing context, one of the input arrays is normally just a gray-level image. The second array is usually much smaller, is also two-dimensional (although it may be just a single pixel thick), and is known as the kernel. Figure 4.3 shows an example image and kernel.
The convolution is performed by sliding the kernel over the image, generally
starting at the top left corner, so as to move the kernel through all the positions where
the kernel fits entirely within the boundaries of the image. (Note that
implementations differ in what they do at the edges of images.) Each kernel position
corresponds to a single output pixel, the value of which is calculated by multiplying
together the kernel value and the underlying image pixel value for each of the cells in
the kernel, and then adding all these numbers together.
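As a concrete illustration of the sliding-kernel mechanics just described, below is a minimal MATLAB sketch for a 3×3 kernel K over a gray-level image img (both assumed variables). It is written in correlation form; MATLAB's conv2 additionally flips the kernel, as the final comment shows.

[h, w] = size(img);
out = zeros(h - 2, w - 2);                    % only positions where K fits entirely
for y = 1:h - 2
    for x = 1:w - 2
        patch = double(img(y:y+2, x:x+2));    % pixels under the kernel
        out(y, x) = sum(sum(patch .* K));     % multiply together, then add
    end
end
% Equivalent built-in call: out = conv2(double(img), rot90(K, 2), 'valid');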
The edge-detection operator is a matrix-area gradient operation that determines the level of variance between different pixels. It is calculated by forming a matrix centered on a pixel chosen as the center of the matrix area; if the value of this matrix area is above a given threshold, the middle pixel is classified as an edge. Based on this operation the image is divided into several regions; regions with text in them normally have much higher average values of edge density, strength and orientation variance than non-text regions. All gradient-based algorithms have kernel operators that calculate the strength of the slope in directions orthogonal to each other, commonly vertical through diagonal; the contributions of the different components of the slopes are combined to give the total value of the edge strength.
Eight types of kernel (filter) are used to detect edges in eight directions, as shown in Figure 4.4. Each kernel detects edges according to its angle, so by using these kernels edges can be detected in all directions.
Figure 4.4 The eight kernels used (0, 45, 90, 135, 180, 225, 270 and 315 degrees)
Various kernels can be used for this operation. The whole set of 8 kernels is
produced by taking one of the kernels and rotating its coefficients circularly. Each of
the resulting kernels is sensitive to an edge orientation ranging from 0° to 315° in
steps of 45°, where 0° corresponds to a vertical edge.
The maximum response for each pixel is the value of the corresponding pixel
in the output magnitude image. The values for the output orientation image lie
between 1 and 8, depending on which of the 8 kernels produced the maximum
response. This edge detection method is also called edge template matching, because
a set of edge templates is matched to the image, each representing an edge in a
certain orientation. The edge magnitude and orientation of a pixel is then determined
by the template that matches the local area of the pixel the best as shown in Figure
4.5 below.
Figure 4.5 Directions of edge detection
The edge detector is an appropriate way to estimate the magnitude and orientation of an edge. Although differential gradient edge detection needs a rather time-consuming calculation to estimate the orientation from the component magnitudes, template-based edge detection obtains the orientation directly from the kernel with the maximum response. The set of kernels is limited to 8 possible orientations; however, experience shows that most direct orientation estimates are not much more accurate. On the other hand, the template approach needs 8 convolutions for each pixel, running from the "E" kernel, sensitive to edges in the vertical direction, around to the last "SE" kernel for the diagonal direction. The result for the edge magnitude image is very similar for both methods, provided the same convolving kernel is used, as shown in Figure 4.5.
A variety of edge detectors are available for detecting edges in digital images, each with its own advantages and disadvantages. The basic idea behind edge detection is to find places in an image where the intensity changes rapidly. Based on this idea, an edge detector may either locate the places where the first derivative of the intensity is greater in magnitude than a specified threshold, or find the places where the second derivative of the intensity has a zero crossing. In template-based edge detection, the image is convolved with a set of (in general 8) convolution kernels, each of which is sensitive to edges in a different orientation. For each pixel, the local edge gradient magnitude is estimated as the maximum response over all 8 kernels at that pixel location:
|G| = max(|Gi| : i = 1, …, n)
where Gi is the response of kernel i at the particular pixel position and n is the number of convolution kernels. The local edge orientation is estimated from the orientation of the kernel that yields the maximum response.
We now explain how the magnitude is computed with kernel "0", i.e., the vertical direction at "E", as shown below in Figure 4.6; the same procedure applies to all eight kernels from 0 to 315.

Figure 4.6 Structure of the convolution (kernel "0", its kernel structure and a sample image)
The vertical edge component is calculated with kernel KE. |KE| gives an indication of the intensity of the gradient at the current pixel. The direction of the gradient is given by the mask with the maximal response; this is valid for all the following operators approximating the first derivative. Figure 4.7 below explains the operation of kernel 0.

Figure 4.7 Operation of kernel 0
GE = (Z3·a3 + Z6·a6 + Z9·a9) − (Z1·a1 + Z4·a4 + Z7·a7)
Total edges = |GE| + |GNE| + |GN| + |GNW| + |GW| + |GSW| + |GS| + |GSE|
The gradient is estimated in eight possible directions (for a 3×3 convolution mask), and the convolution result of the greatest magnitude indicates the gradient magnitude. Operators approximating the first derivative of an image function are sometimes called compass operators because of their ability to determine gradient direction. A proper threshold value therefore has to be selected so that only real edges are kept and false edges are rejected. The selection of a threshold value is an important design decision that depends on a number of factors, such as image brightness, contrast, level of noise, and even edge direction. Typically, the threshold is selected following an analysis of the gradient image histogram; the selection of the threshold is thus an important parameter for obtaining better performance on noisy images. The output of the thresholding stage is extremely sensitive, and there are no automatic procedures for satisfactorily determining thresholds that work for all images.
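One common data-driven choice, in line with the histogram analysis mentioned above, is Otsu's method; the MATLAB sketch below illustrates it (gradMag, the gradient-magnitude image, is an assumed variable, and this is only one possible thresholding scheme, not the project's prescribed one).

g = mat2gray(gradMag);    % normalize the gradient magnitudes to [0, 1]
t = graythresh(g);        % Otsu threshold derived from the histogram
edges = g > t;            % keep only responses above the threshold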
Here, the image has been edge-detected with the eight kernels after convolution with the Gaussian pyramid, as shown in Figure 4.8 below.
Kernel 0: block structure and result
Figure 4.8 (a) Detected edges for kernel 0

Figure 4.8 (a) shows edge detection using kernel 0. The kernel's block structure divides the response into white and black parts; the black part marks what is detected from the image, and within it the white regions represent the detected edges while the black regions represent the strokes, which is how the edges are detected. The same interpretation applies to the remaining kernels in Figures 4.8 (b) to (h); for kernel 180 in Figure 4.8 (e) the roles of white and black are reversed.
Kernel 45: block structure and result
Figure 4.8 (b) Detected edges for kernel 45
Kernel 90: block structure and result
Figure 4.8 (c) Detected edges for kernel 90
Kernel 135: block structure and result
Figure 4.8 (d) Detected edges for kernel 135
Kernel 180: block structure and result
Figure 4.8 (e) Detected edges for kernel 180
Kernel 225: block structure and result
Figure 4.8 (f) Detected edges for kernel 225
Kernel 270: block structure and result
Figure 4.8 (g) Detected edges for kernel 270
Kernel 315: block structure and result
Figure 4.8 (h) Detected edges for kernel 315
4.3 Complementary Edge Detection

After the edges have been detected and each edge map convolved with the Gaussian pyramid, the two edge maps of each opposite pair are added together, giving the complete edge detection for both angles. Figure 4.9 below shows the effect of combining two edge maps: the black regions represent the background or strokes, and the white regions represent the detected edges. Edges are detected vertically at angles (0, 180), diagonally at angles (45, 225) and (135, 315), and horizontally at angles (90, 270).

Figure 4.9 Effect of adding two edges: a) edges "0" and "180", b) edges "45" and "225", c) edges "90" and "270", d) edges "135" and "315"
4.4 Eight Edges Detection
Figure 4.10 shows the result of adding all edges that have been detected.
Figure 4.10 Total of edges detection
4.5 Image Localization

The process of localization further enhances the text regions by eliminating non-text regions. Normally, text embedded in an image appears in clusters, i.e., it is arranged compactly, so the characteristics of clustering can be used to localize text regions. Since the intensity of the feature map represents the possibility of text, a simple global thresholding can be employed to highlight the regions with high text possibility, resulting in a binary image. A morphological dilation operator can easily connect very close regions together while leaving regions that are far from each other isolated. In the proposed method, a morphological dilation operator with a 3×3 square structuring element is applied to the previously obtained binary image to obtain joint areas referred to as text blobs. Two constraints are then used: the first filters out all very small isolated blobs, and the second filters out blobs whose widths are much smaller than their corresponding heights, as shown in Figure 4.11 below.
Figure 4.11 Localized text
4.6 Separate Text from Background

The text and background should be monochrome, and the background-to-text contrast should be high; this process therefore generates an output image with white text against a black background. The effect of this operation is shown in Figure 4.12.
Figure 4.12 Separate text from background
The algorithm has also been tested on other images; the results can be seen in Figures 4.13 and 4.14 below.
Figure 4.13 Test image 1: a) image, b) localization, c) result
Figure 4.14 Test image 2: a) image, b) localization, c) result
4.7 Reduce Size

4.7.1 Determine Borders

The borders should be removed; this reduces the image size. Only the rectangular part of the image that contains the text remains, as shown in Figure 4.15 below.
Figure 4.15 Determine borders
4.7.2 Divide Text into Rows
Now, after determining the region of text, the text is divided into rows; dilation and erosion are also used to separate attached characters, such as those found in the word "masters" in Figure 4.15 above. The result is shown in Figures 4.16 (a) and (b) below.
Figure 4.16 (a) row one
Figure 4.16 (b) row two
4.8 Determine Characters by Run-Length and Recognize by Fuzzy Logic

Here, we determine the characters, how many characters there are in each word, and how many words there are in each text line; finally, the text is extracted from the image. Each connected component then has its identified pixels, as shown in Figure 4.17 below.
Figure 4.17 Identified character
Let us suppose the following:
N1 = upper left, center
N2 = upper center, on
N3 = upper right, off
N4 = center center, off
N5 = center center, on
N6 = center right, off
N7 = lower left, off
N8 = lower center, on
N9 = lower right, off
N10 = half lower center
Now, the identified pixels (N1 to N10) are sent to the fuzzy logic system, which has ten inputs and one output, as shown in Figures 4.18, 4.19 and 4.20 below.
Figure 4.18 Ten inputs and one output
Figure 4.19 Input one N1
Figure 4.20 Output
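A minimal sketch of such a Mamdani system, using the classic Fuzzy Logic Toolbox API, is given below. The membership functions, the example rule and the output band for "s" are illustrative assumptions, not the project's tuned design.

fis = newfis('charRecog');                        % Mamdani FIS
for k = 1:10                                      % ten pixel inputs N1..N10
    fis = addvar(fis, 'input', sprintf('N%d', k), [0 1]);
    fis = addmf(fis, 'input', k, 'off', 'trimf', [0 0 0.5]);
    fis = addmf(fis, 'input', k, 'on',  'trimf', [0.5 1 1]);
end
fis = addvar(fis, 'output', 'character', [0 10]);
fis = addmf(fis, 'output', 1, 's', 'trimf', [0 1 2]);   % "s" maps to the 0..2 band
% One illustrative rule: N2, N5 and N8 "on", the side pixels "off"
% (0 = don't care; last three entries: output MF, weight, AND connective).
rule = [0 2 1 1 2 1 1 2 1 0 1 1 1];
fis = addrule(fis, rule);
score = evalfis(pixelValues, fis);                % pixelValues: 1x10 vector in [0,1]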
As can be seen from the output shown in Figure 4.20, the output value for the character "s" lies between 0 and 2, so this character can be recognized as "s"; the same scheme is applied to all characters, and the output for the whole text is shown in Figure 4.21 below.
Figure 4.21 Output of extracted text
4.9 Summary

An attempt was made to evaluate the performance of edge detection for images. Experimental results demonstrate that edge detection works quite well for digital images with multiple scales and orientations, whereas this type of edge detection cannot be used directly on practical images that are generally corrupted with other types of noise. However, it can be used successfully in conjunction with a suitable digital filter that substantially reduces the effect of noise before edge detection is applied. The result for the edge magnitude image is very similar across all kernels, provided the same convolving kernel is used.
CHAPTER V
RESULTS DISCUSSION
5.1 Introduction

Recently, optical character recognition (OCR) has become widely used for character extraction from images, and there is a very large number of approaches to text extraction from images. This research extracts text by applying edge detection using eight kernels. Identified pixels are then used for character recognition after the connected components are determined, and fuzzy logic is used as the character classifier after the identified pixels are sent to it.
5.2 Discussion on Results

The project findings include the following.
We present an effective and robust general-purpose text detection and extraction algorithm that can automatically detect and extract text from complex images and video frames. Following the property that text has higher edge strength in eight directions, edge detection is applied to obtain eight directional edge maps. Then, based on the weaker text properties, text features are extracted to statistically characterize text and non-text areas. Using these edge features, we propose a text detection algorithm for images and video frames. The algorithm has good detection performance and is robust in detecting text; it also performs well under multiple scales and orientations, being able to detect text edges of different sizes. With eight filters detecting in eight directions, the image array is characterized more specifically: the image area is divided into eight directional responses, each detected by the corresponding filter. The approach rests on the idea that edge information in an image is found by looking at the relationship between a pixel and its neighbors, i.e., an edge is found at a discontinuity of grey-level values; an ideal edge detector should produce an edge indication localized to a single pixel located at the mid-point of the slope. Edge detection is an appropriate way to estimate the magnitude and orientation of an edge: although differential gradient edge detection needs a rather time-consuming calculation to estimate the orientation from the component magnitudes, template-based edge detection obtains the orientation directly from the kernel with the maximum response.
As noted, the detected edges are strong edges that represent the edges of the text; the kernel structure, through its center pixel, determines whether a strong edge exists and thereby identifies the edge. Among the detected edges there may also be long edges that are not part of the text, so a maximum-length standard is applied, and any edge longer than this standard is removed. The run-length-encoding-based connected component labeling algorithm that we have successfully implemented is an extension that exploits the desirable properties of run-length encoding, combined with the ease of parallel conversion to run-length encoding.
The "identified pixel" method of recognizing characters was fast and gave suitable performance, but blurred characters are difficult to recognize because the method depends on determining the corners of the connected component and the points on its sides. Finally, the fuzzy logic systems were developed with the Matlab Fuzzy Toolbox; the systems are based on the "Mamdani" fuzzy approach. The first task is to define the inputs and the output of the fuzzy system; this stage depends on expert decision.
Finally, the text areas are identified by empirical rule analysis and refined through projection profile analysis. Experiments with various kinds of natural images and video frames show that the proposed method is effective in distinguishing text regions from non-text regions and is robust to font size, font color, background complexity and language. In future work on text detection in videos, the performance needs to be further improved for text captured by cameras under strong illumination changes and text distortion.
5.3 Experimental Results and Discussion

In order to evaluate the performance of the proposed method, we use ten test images with different font sizes, orientations, perspectives and alignments. Figures 5.1 to 5.10 show some of the results. We can see that the proposed method localizes and extracts the text from images with different font sizes, orientations and perspectives.
Figure 5.1 Sample 1: image, localization, result
Figure 5.2 Sample 2: image, localization, result
Figure 5.3 Sample 3: image, localization, result
Figure 5.4 Sample 4: image, localization, result
Figure 5.5 Sample 5: image, localization, result
Figure 5.6 Sample 6: image, localization, result
Figure 5.7 Sample 7: image, localization, result
Figure 5.8 Sample 8: image, localization, result
Figure 5.9 Sample 9: image, localization, result
Figure 5.10 Sample 10: image, localization, result
There is no universally accepted method for evaluating the performance of text localization, so we assess the accuracy of our algorithm's output by manually counting the number of correctly localized characters against the ground truth, in terms of precision rate and recall rate. The performance can be evaluated using equations (5.1), (5.2) and (5.3).
Precision = (correctly located / (correctly located + false positives)) × 100%    (5.1)

Recall = (correctly located / (correctly located + false negatives)) × 100%    (5.2)

False positive rate = (false positives / (correctly located + false negatives)) × 100%    (5.3)
Correctly located: the correct localization of text existing in the image, i.e., the localized region lies exactly on the place of the text.
False positive: a region localized as text that does not actually contain text (a non-text object).
False negative: a real text region that the localization process missed.
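These rates are simple ratios of the manual counts; a minimal MATLAB sketch of equations (5.1) to (5.3) is given below, where correct, falsePos and falseNeg stand for the manually counted values.

precision = correct / (correct + falsePos) * 100;    % eq. (5.1)
recall    = correct / (correct + falseNeg) * 100;    % eq. (5.2)
falseRate = falsePos / (correct + falseNeg) * 100;   % eq. (5.3)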
Table 5.1 Performance evaluation 1

Sample     Resolution   Precision rate   Recall rate   False positive
Sample1    256x255      75%              66%           33%
Sample2    150x190      68%              65%           53%
Sample3    370x280      75%              58%           30%
Sample4    169x170      80%              83%           80%
Sample5    406x307      67%              67%           50%
Sample6    250x190      80%              60%           90%
Sample7    460x480      60%              70%           65%
Sample8    290x220      62%              67%           60%
Sample9    408x300      55%              62%           40%
Sample10   373x498      60%              71%           65%
Now we evaluate the performance of the text extraction. We assess the accuracy of the algorithm's output by manually counting the number of falsely detected characters and the number of detected characters in the image. Given the marked ground truth and the detected result of our algorithm, we can calculate the recall and false alarm rates using equations (5.4) and (5.5).
Recall = (number of correctly detected text / number of text) × 100%    (5.4)

False alarm rate = (number of falsely detected text / number of detected text) × 100%    (5.5)
Number of correctly detected text: the number of characters detected correctly in the text.
Number of text: the number of characters in the text.
Number of falsely detected text: the number of detected characters that are not actually text characters.
Table 5.2 Performance evaluation 2

Sample     Resolution   False alarm rate   Recall rate
Sample1    256x255      25%                80%
Sample2    150x190      5.26%              95%
Sample3    370x280      33%                75%
Sample4    169x170      2%                 99%
Sample5    406x307      44%                69%
Sample6    250x190      1%                 99%
Sample7    460x480      53%                65%
Sample8    290x220      20%                80%
Sample9    408x300      40%                70%
Sample10   373x498      30%                75%
Table 5.3 Performance evaluation 3

Sample     Resolution   Feature        Accuracy
Sample1    256x255      Cover book     80%
Sample2    150x190      Cover book     95%
Sample3    370x280      Poster         75%
Sample4    169x170      Poster         99%
Sample5    406x307      Poster         69%
Sample6    250x190      Note           99%
Sample7    460x480      Poster         65%
Sample8    290x220      Poster         80%
Sample9    408x300      Cover product  70%
Sample10   373x498      Cover book     75%
5.4 Project Advantage
The following are some advantages of the project. Edge detection obtains the edge maps of the image, which decreases the influence of the background and effectively detects the initial text candidates; the text feature is computed at each pixel from the edge detection, using the kernels on the candidate text. This approach can effectively detect text with different font sizes, font colors, languages, spacing, distributions and background complexities, since these distinct characteristics can be used to find possible text areas. Text is mainly composed of strokes in the horizontal, vertical, up-right and up-left directions, so regions with higher edge strength in these directions can be considered text regions; we use the edge detector to obtain edge maps in eight directions. However, due to uneven illumination and/or reflection, long vertical edges produced by non-character objects may have a large intensity variance, and after thresholding these long vertical edges may become broken short edges, which may cause false alarms (false positives); likewise, character surfaces can be uneven because of varied lighting and shadows, as well as the nature of the character shapes themselves. The basic idea behind edge detection is to find places in an image where the intensity changes rapidly: an edge detector may either locate the places where the first derivative of the intensity is greater in magnitude than a specified threshold, or find the places where the second derivative of the intensity has a zero crossing.
5.5 Suggestions and Future Works

There are several suggestions and future works that can be areas of interest for researchers and developers aiming to improve and enhance overall performance. Our main future work involves using a suitable existing OCR technique to recognize the extracted text. The possible contributions are: first, handling both printed document and scene text images; second, reduced sensitivity to image color/intensity and robustness with respect to font, size, orientation, uneven illumination, perspective and reflection effects; third, distinguishing text regions from texture-like regions, such as window frames and wall patterns, by using the variance of edge orientations, and producing a binary output that can be used directly as input to an existing OCR engine for character recognition without any further processing; a suitable existing OCR technique could also be used to recognize extracted text from landmarks. Moreover, the basic criterion for using edge detection on digital images is that the image should contain sharp intensity transitions and low noise of the Poisson type; handling images that do not satisfy these conditions is left for future work.
5.6 Conclusion

In this project we used edge detection for text detection in complex images, using eight kernels (filters) to accomplish this task. We then used identified pixels to determine each character by means of fuzzy logic. We also fulfilled the aims, objectives and scope of the project which were outlined before.
CHAPTER VI
CONCLUSION
Many methods exist to automatically detect and extract text from complex images, according to the text properties present in the image. We use edge detection in eight directions, where each direction detects the edge strength whose higher density represents a detected edge in the image. Our algorithm's outputs were satisfactory in this process, using the magnitude and orientation to detect the edges of the text: a structuring element (kernel) with its center pixel is used to calculate the edge, and if the edge falls below the required threshold it is ignored, otherwise it is selected as a detected edge.

Edge strength, which represents the edges of the text, is determined by the structuring element at the center pixel; among the detected edges there may also be long edges that are not part of the text. Identified pixels recognize characters quickly and with suitable performance, but blurred characters are difficult to recognize because the method depends on determining the identified pixels. Finally, the text areas are identified by empirical rule analysis and refined through projection profile analysis. Experiments with various kinds of natural images and video frames show that the proposed method is effective in distinguishing text regions from non-text regions and is robust to font size, font color, background complexity and language. In future work on text detection in videos, the performance needs to be further improved for text captured by cameras under strong illumination changes and text distortion.
REFERENCES
Alasdir (2004). Introduction to Digital Image Processing with Matlab. Springer-Verlag, Berlin Heidelberg, 2007.
Chunmei, Chunheng and Ruwei (2005). Text Detection in Images Based on Unsupervised Classification of Edge-Based Features. Proceedings of the Eighth International Conference on Document Analysis and Recognition, IEEE 0-7520-5263, China, 2005.
Chengjie, Jie and Trac (2002b). Adaptive Run-Length Coding. IEEE 0-7803-7622-6, Baltimore, MD 21218.
Datong, Herve and Jean (2001). Text Identification in Complex Background Using SVM. IEEE 0-7695-1272-0, Dalle Molle Institute for Perceptual Artificial Intelligence, Switzerland, 2001.
Ezaki, Bulacu and Schomaker (2004). Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons. In Proceedings of the International Conference on Pattern Recognition (ICPR'04), pp. 683-686, 2004.
Fuzzy Logic Toolbox User's Guide (2006f). The MathWorks, Inc.
Gatos, Pratikakis, Kepene and Perantonis (2005a). Text Detection in Indoor/Outdoor Scene Images. National Center for Scientific Research "Demokritos", GR-153 10 Agia Paraskevi, Athens, Greece.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.
Hyeran and Seong-Whan (2002a). Applications of Support Vector Machines for Pattern Recognition: A Survey. SVM 2002, LNCS 2388, pp. 213-236, 2002.
Jagath and Xiaoqing (2006b). An Edge-Based Text Region Extraction Algorithm for Indoor Mobile Robot Navigation. International Journal of Signal Processing, 3(4). Western Ontario, London, ON, N6A 5B9, Canada.
Jie, Jigui and Shengsheng (2006d). Pattern Recognition: An Overview. IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 6, Changchun 130012, China, June 2006.
Jiang and Jie (2000). An Adaptive Algorithm for Text Detection from Natural Scenes. University of Pittsburgh, 2000.
Kofi, Andrew, Patrick and Jonathan (2007b). Run-Length Based Connected Component Algorithm for FPGA Implementation. University of Lincoln, England, 2007.
Kongqiao and Jari (2003b). Character Location in Scene Images from a Digital Camera. Journal of the Pattern Recognition Society, Tampere, Finland, 2003.
Kwang, Keechul and Jin (2003c). Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 12, December 2003.
Kim, Byun, Song, Choi, Chi and Chung (2004). Scene Text Extraction in Natural Scene Images Using Hierarchical Feature Combining and Verification. Proceedings of the 17th International Conference on Pattern Recognition, IEEE 1051-4651/04.
Mohanad and Mohammad (2006e). Text Detection and Character Recognition Using Fuzzy Image Processing. Journal of Electrical Engineering, Vol. 57, No. 5, Jordan, 2006.
Matsuo, Ueda and Michio (2002d). Extraction of Character Strings from Scene Images by Binarizing Local Target Areas. Transactions of the Institute of Electrical Engineers, 122-C(2), 232-241, Japan, 2004.
Pavlidis, T. (1977). Structural Pattern Recognition. Springer-Verlag, New York, 1977.
Qixiang, Wen, Weiqiang and Wei (2003a). Robust Text Detection Algorithm in Images and Video Frames. IEEE 0-7803-8185-8, Chinese Academy of Sciences, China.
Qixiang, Qingming, Wen and Debin (2005b). Fast and Robust Text Detection in Images and Video Frames. Image and Vision Computing, 23, China, 2005.
Roshanak and Shohreh (2005c). Text Segmentation from Images with Textured and Colored Background. Sharif University of Technology, Tehran, Iran.
Rabbani and Chellappan (2007a). Fast and New Approach to Gradient Edge Detection. International Journal of Soft Computing, 2(2), 325-330, India, 2007.
Rainer and Axel (2002c). Localizing and Segmenting Text in Images and Videos. University of Pittsburgh, 2002.
Sivanandam, Deepa and Sumathi (2007). Introduction to Fuzzy Logic Using Matlab.
Tsai, Chen and Fang (2006c). A Comprehensive Motion Videotext Detection, Localization and Extraction Method. IEEE 0-7803-9584-0, Taoyuan County 320, Taiwan, R.O.C.
Takuma, Yasuaki and Minoru (2003d). Digit Classification on Signboards for Telephone Number Recognition. Proceedings of the Seventh International Conference on Document Analysis and Recognition, IEEE 0-7695-1960-1, Japan, 2003.
Victor, Raghavan and Edward (1999). TextFinder: An Automatic System to Detect and Recognize Text in Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 11, November 1999, Amherst.
Xiaoqing and Jagath (2006a). Multiscale Edge-Based Text Extraction from Complex Images. IEEE 1-4244-0367-7, London, Ontario, N6A 5B9, Canada.
Xilin, Jie, Jing and Alex (2003e). Automatic Detection of Signs with Affine Transformation. University Mobile Technologies, 2003.
Yuzhong, Kallekearu and Anil (1995). Locating Text in Complex Color Images.
APPENDICES
x = imread('image.jpg');     % read the input image
x1 = reducesize(x);          % reduce size by half, eight times (pyramid)
x2 = convolve(x1);           % convolve each image with the kernels
x3 = imresize(x2);           % return each image to the original size
x4 = addedge(x3);            % collect the edges together (x4 is the total of the added edges)
x5 = dilation(x4);           % dilate the image
x6 = erosion(x5);            % erode the image
x7 = eliminated(x6);         % eliminate long edges
x8 = extract(x7);            % extract the text from the image
imagebinary(x8);             % produce the binary image

A1: Matlab commands to find the binary image (reducesize, convolve, addedge, dilation, erosion, eliminated, extract and imagebinary are the project's own helper functions)
x1 = removeborders('image.jpg');   % read the image and remove its borders
x2 = dividetintotext(x1);          % divide the image into text regions
x3 = dividetextintorows(x2);       % divide the text into rows
x4 = bwlabel(x3, 8);               % label the connected components
x5 = identifedpixel(x4);           % identify the pixels of each connected component
x6 = sendfuzzylogic(x5);           % send the data to the fuzzy logic system
x7 = recognizecharacter(x6);       % recognize the characters

A2: Matlab commands using fuzzy logic to identify characters (removeborders, dividetintotext, dividetextintorows, identifedpixel, sendfuzzylogic and recognizecharacter are the project's own helper functions)