TEXT EXTRACTION FROM INVARIANT COMPLEX IMAGE
NOURI ALI AL MABROUK AL HASHI
UNIVERSITI TEKNOLOGI MALAYSIA
TEXT EXTRACTION FROM INVARIANT COMPLEX IMAGE
NOURI ALI AL MABROUK AL HASHI
A project submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Science (Computer Science)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
July 2009
To my dearest father, mother and family,
for their encouragement, blessing and inspiration …
ACKNOWLEDGEMENT
Alhamdulillah, I am grateful to ALLAH SWT for His blessing and mercy, for giving me the strength along the challenging journey of carrying out this project and making it successful.
First of all, I would like to express my deepest appreciation to my supervisor, Prof. Dr. Dzulkifli Mohamad, for his effort, guidance and support throughout this project. Without his advice, suggestions and guidance, the project would not have successfully achieved its objectives.
To all lecturers who have taught me, thank you for the lessons you have delivered. Not forgetting all my friends, thank you for your useful ideas, information and moral support during the course of study.
Last but not least, I would like to express my heartiest appreciation to my parents
and my family, who are always there when it matters most.
ABSTRACT
Great progress has been made in Optical Character Recognition (OCR) technology. Most current OCRs, however, can only read characters printed on sheets of paper according to rigid format restrictions. The detection and extraction of text regions in an image are therefore well-known problems in the Computer Vision research area. The goal of this project is to extract text from an image using an edge-based algorithm and to recognize it using fuzzy logic. The algorithms are implemented and evaluated on a set of natural-scene images that vary in size, scale and orientation. Various kernels can be used for the edge-detection operation; the whole set of eight kernels is produced by taking one kernel and rotating its coefficients circularly, and the edge-detection operator is calculated by forming a matrix centered on the pixel chosen as the center of the matrix area. Localization then further enhances the detected regions by eliminating non-text regions. Edge detection works quite well for digital images containing multi-scale and multi-orientation text, but it cannot be relied on for practical images corrupted by other types of noise, since detecting edges in a digital image requires sharp intensity transitions and low noise. Moreover, the input here is a colored image. Edges are detected at eight orientations and convolved with a Gaussian, after which the strong edges suitable for detecting text are selected; the project handles complex images by using the eight kernels to accomplish this task. Finally, identified pixels are used to determine the characters by means of fuzzy logic.
ABSTRAK
Great progress has been made in optical character recognition (OCR) technology. However, most current OCRs can only recognize characters under restricted formats; for this reason, detecting and extracting text regions from an image remains a problem under active study in computer systems research. This project aims to extract and recognize text from an image using an edge-based method and a fuzzy logic algorithm. The algorithms were implemented and evaluated on several sets of natural images differing in size, scale and orientation. A compass operator was used, containing eight kernels that detect edges in eight different directions; every pair of edges is then collected together using the resulting edge levels. This method is very important for determining features under movement and elongation changes, which also affect the related height features. Localization is also involved, enhancing the regions by eliminating non-text areas. The input is a colored image; edges are detected at eight orientations and convolved with a Gaussian method, after which the strong edges suited to text detection are selected. As noted, complex images are handled using the eight kernels to accomplish this task. Then, pixel detection is used to obtain the desired features using fuzzy logic.
TABLE OF CONTENTS

CHAPTER    TITLE

           DECLARATION
           DEDICATION
           ACKNOWLEDGEMENT
           ABSTRACT
           ABSTRAK
           TABLE OF CONTENTS
           LIST OF TABLES
           LIST OF FIGURES
           LIST OF SYMBOLS
           LIST OF APPENDICES

1          INTRODUCTION
           1.1  Introduction
           1.2  Problem Background
           1.3  Problem Statement
           1.4  Project Objectives
           1.5  Scope of the Project
           1.6  Significance of the Study
           1.7  Report Organization

2          LITERATURE REVIEW
           2.1  Introduction
           2.2  Segmentation Categories
                2.2.1  Threshold Based Segmentation
                2.2.2  Clustering Techniques
                2.2.3  Matching
                2.2.4  Edge Based Segmentation
                2.2.5  Region Based Segmentation
           2.3  Categories of Variance Text
                2.3.1  Lighting Variance
                2.3.2  Scale Variance
                2.3.3  Orientation Variance
           2.4  Recognition Text
                2.4.1  Text Detection
                2.4.2  Text Area Identification
                2.4.3  Text Region Localization
                2.4.4  Text Extraction and Binary Image
           2.5  Analytic Segmentation
                2.5.1  Pattern Recognition
                2.5.2  Statistical Pattern Recognition
                2.5.3  Data Clustering
                2.5.4  Fuzzy Sets
                2.5.5  Neural Networks
                2.5.6  Structural Pattern Recognition
                2.5.7  Syntactic Pattern Recognition
                2.5.8  Approximate Reasoning Approach to Pattern Recognition
                2.5.9  Applications of Support Vector Machine (SVM)
           2.6  Pattern Recognition System
                2.6.1  The Structure of Pattern Recognition System
                2.6.2  Applications of Pattern Recognition
                2.6.3  Character Recognition
           2.7  Run-Length Coding Algorithm
                2.7.1  Neighbors
                2.7.2  Path
                2.7.3  Foreground
                2.7.4  Connectivity
                2.7.5  Connected Components
                2.7.6  Background
                2.7.7  Boundary
                2.7.8  Interior
                2.7.9  Surrounds
                2.7.10 Component Labeling
           2.8  Properties of Text
                2.8.1  Removing the Borders
                2.8.2  Dividing the Text into Rows
                2.8.3  Dividing the Rows "Lines" into the Words
                2.8.4  Dividing the Word into Characters
           2.9  Identifying Character
           2.10 Fuzzy Logic
                2.10.1 What is Fuzzy Logic?
                2.10.2 What is the Fuzzy Logic Toolbox?
                2.10.3 Fuzzy Sets
                2.10.4 Membership Function
                2.10.5 If-Then Rules
                2.10.6 Fuzzy Inference System
                2.10.7 Rule Review
                2.10.8 Surface Review
           2.11 Summary

3          METHODOLOGY
           3.1  Introduction
           3.2  Problem Statement and Literature Review
           3.3  System Development
           3.4  Performance Evaluation
           3.5  General Steps of Proposed Techniques
           3.6  Proposed Algorithm for Edge Based Text Region Extraction
           3.7  Detection
           3.8  Feature Map and Candidate Text Region Detection
                3.8.1  Directional Filtering
                3.8.2  Edge Selection
                3.8.3  Feature Map Generation
                3.8.4  Localization
                3.8.5  Character Extraction
           3.9  Connection Component
           3.10 Fuzzy Logic
           3.11 Summary

4          IMPLEMENTATION
           4.1  Introduction
           4.2  Input Image
           4.3  Complement Edge Detection
           4.4  Eight Edge Detection
           4.5  Image Localization
           4.6  Separate Text from Background
           4.7  Reduce Size
                4.7.1  Determine Borders
                4.7.2  Divide Text into Rows
           4.8  Determine Character by Run-Length

5          RESULTS DISCUSSION
           5.1  Introduction
           5.2  Discussion on Results
           5.3  Experimental Results and Discussion
           5.4  Project Advantage
           5.5  Suggestion and Future Works
           5.6  Conclusion

6          CONCLUSION

           REFERENCES
           Appendices
LIST OF TABLES

TABLE NO.    TITLE

3.1    Results to object to rows
4.1    Running time of major steps
4.2    Results after image scan, where ST=start, EN=end and RW=row
5.1    Performance evaluation 1
5.2    Performance evaluation 2
5.3    Performance evaluation 3
LIST OF FIGURES

FIGURE NO.    TITLE

2.1    General model of extraction text
2.2    The composition of a PR system
2.3    Horizontal projection calculated from run-length code
2.4    4- and 8-neighborhoods for rectangular image location; pixel [i,j] is located in center
2.5    4-path and 8-path
2.6    Border of an image
2.7    Ambiguous border
2.8    A binary image with its boundary, interior and surrounds
2.9    An image (a) and its connected component image (b)
2.10   Divide the text into rows
2.11   Divide the rows into the words
2.12   Divide the word into characters
2.13   Identify character
2.14   A classical set and fuzzy set representation of "warm room temperature"
2.15   (a) input of pixel (b) input of location for pixel
2.16   Output variable "letter"
2.17   Building the system with fuzzy logic
3.1    Proposed method
3.2    Block diagram of general steps of proposed approach
3.3    Gaussian filter
3.4    Sample Gaussian pyramid with 8 levels
3.5    Extraction operation
3.6    Edge detection
3.7    U-shape object with runs after pixeltoruns
3.8    8-neighborhoods for rectangular image location; pixel [i,j] is located in center of each figure
3.9    Identify the character
3.10   (a) example of fuzzy input (b) example of fuzzy output
4.1    Original image
4.2    Structure 3x3 (filter)
4.3    Our example of convolution operation
4.4    Kernels used
4.5    Directions of edge detection
4.6    Structure of convolution
4.7    Operation of kernel 0
4.8    Edge detection
4.9    Effect of adding two edges
4.10   Total of edges detection
4.11   Localized text
4.12   Separate text from background
4.13   Test image 1 (a) image (b) localization (c) result
4.14   Test image 2 (a) image (b) localization (c) result
4.15   Determine borders
4.16   (a) row one (b) row two
4.17   Identified character
4.18   Ten inputs and one output
4.19   Input one n1
4.20   Output
4.21   Output of extracted text
5.1    Sample 1
5.2    Sample 2
5.3    Sample 3
5.4    Sample 4
5.5    Sample 5
5.6    Sample 6
5.7    Sample 7
5.8    Sample 8
5.9    Sample 9
5.10   Sample 10
LIST OF SYMBOLS

OCR  -  Optical character recognition
CC   -  Connected components
BAG  -  Black adjacency graph
AMA  -  Aligning-and-merging analysis
SVM  -  Support vector machine
RLC  -  Run-length code
PR   -  Pattern recognition
SE   -  Structuring element
MFs  -  Membership functions
FIS  -  Fuzzy inference system
LIST OF APPENDICES

APPENDIX    TITLE

A1    Matlab command to find binary image
A2    Matlab command using fuzzy logic to identify character
CHAPTER I
INTRODUCTION
1.1 Introduction

During the past years, studies in the field of computer vision and pattern recognition have shown a great amount of interest in content retrieval from images and videos. This content can be in the form of objects, colors, textures and shapes, as well as the relationships between them.
As stated by (Kwang, Keechul and Jin, 2003c), text data is particularly interesting, because images can contain text that varies in size, orientation and alignment, as well as complex backgrounds, which makes the problem of automatic text extraction extremely challenging. In recent years, great progress has been made in Optical Character Recognition (OCR) techniques, but they can only handle text against a plain monochrome background and cannot extract text from a complex background. Commercial OCR engines cannot yet detect and recognize text embedded in a complex background directly.
Extraction of text from images has relied mainly on the properties of text. The past few years have witnessed rapid growth in the number and variety of applications using fuzzy logic. Fuzzy logic is a logical system which is an extension of multi-valued logic; here it is used to identify the characters after extracting text from an image. (Kongqiao and Jari, 2003b) proposed a character recognition approach that comprises a character boundaries operation for invariance to multi-scale and multi-orientation. Finally, it is expected that the results will demonstrate the success of the text extraction and recognition process on complex images.
1.2 Problem Background

Most applications that involve natural scene documents, where text and graphics are blended together, need some separation between text and graphics, and detecting and recognizing text without computer help is a difficult task in the information processing field. Because of that, intensive projects have been carried out to perform extraction and recognition by machine, and automatic text extraction and recognition have been topics of research for years.
(Jagath and Xiaoqing, 2006b) proposed an edge-based text extraction algorithm which is robust with respect to font size, color, intensity, orientation, effects of illumination, reflection, shadows, perspective distortion and the complexity of the image background, and which can quickly and effectively localize and extract text from real scenes.
(Kongqiao and Jari, 2003b) proposed a connected-component based (CC-based) method which combines color clustering, a black adjacency graph (BAG), an aligning-and-merging-analysis (AMA) scheme and a set of heuristic rules to detect text in sign recognition applications such as street indicators and billboards. (Rainer and Axel, 2002c) proposed a feed-forward neural network to localize and segment text from complex images; it is designed specifically for horizontal text with at least two characters. (Yuzhong, Kalle and Anil, 1995) proposed a hybrid of CC-based and texture-based methods to extract text; although experimental results show that the combination of these two methods performs better, the monochrome constraint used also fails to detect touching characters. (Kwang, Keechul and Jin, 2003c) combined a Support Vector Machine (SVM) and the continuously adaptive mean shift algorithm (CAMSHIFT) to detect and identify text regions. (Datong, Herve and Jean, 2001) used an SVM to identify text lines from candidates. However, experimental results show that both methods above are mainly designed for video captions.
(Jiang and Jie, 2000) developed a three-layer hierarchical adaptive text detection algorithm for natural scenes; this method has been applied in a prototype Chinese sign translation system which mostly handles horizontal and/or vertical alignment. (Ezaki, Bulacu and Schomaker, 2004) proposed four character extraction methods based on connected components; the performance of the different methods depends on character size. (Takuma, Yasuaki and Minoru, 2003d) proposed a digit classification system to recognize telephone numbers written on signboards, where candidate digit regions are extracted from an image through edge extraction, enhancement and labeling. (Matsuo, Ueda and Michio, 2002d) proposed a text extraction algorithm for scene images based on an identification stage of the local target area and adaptive thresholding. (Xilin, Jie, Jing and Alex, 2003e) proposed a framework for automatic detection of signs from natural scenes; this framework considers critical challenges in sign extraction and can extract signs robustly under different conditions.
Based on these studies, this project proposes an extraction strategy that relies on edge detection of text and characters, in conjunction with fuzzy logic to recognize the characters.
1.3 Problem Statement

An effective extraction method may provide significant improvement in multi-orientation and multi-scale recognition performance. To reach good recognition performance, it is important to solve explicit extraction problems such as different scales and different orientations.

The main research question is "How can an effective extraction of text that varies in scale and orientation be achieved?", with the following sub-questions:
1. How have recent extraction approaches performed?
2. How might a system improve on existing extraction approaches?
3. How can the performance of the proposed extraction and character recognition be evaluated and measured?
1.4 Project Objectives

Based on the problem statement above, this project encompasses a set of objectives that are associated with the milestones of the project process. The project objectives are listed below.
1. To develop an improved extraction method based on edge detection and fuzzy
logic.
2. To verify the effectiveness of the proposed technique as compared to existing
techniques.
1.5 Scope of the Project

In order to accomplish the objectives of this study, it is important to identify the scope, which covers the following aspects:
1. This research is concerned with the extraction of text from images and the recognition of characters using fuzzy logic.
2. This research is concerned with invariant complex images.
3. Dilation and erosion are used to remove noise and touching between characters.
4. Fuzzy logic is used for identifying the characters.
1.6 Significance of the Study

This study is carried out with the main objective of extracting text. Based on the results obtained, it is hoped to achieve the following:
1. To give exposure to another promising extraction technique that could offer better, or at least the same, performance as existing techniques.
2. To solve extraction problems such as complex backgrounds, different styles, fonts, etc.
3. To encourage more work exploring the advantages of extraction and recognition.
1.7 Report Organization

This report is divided into six chapters. The first chapter is an introduction and brief overview of the project, including the problem background, problem statement, objectives, scope and significance of the study. Chapter II reviews the literature of previous studies on text extraction and character recognition performance analysis, including the techniques of the analysis and their results. Chapter III covers the framework and methodology of the project, which focuses on application-based analysis. Chapter IV presents the implementation. Chapter V contains the discussion of results. Chapter VI concludes the project.
CHAPTER II
LITERATURE REVIEW
2.1 Introduction

This chapter discusses issues related to the study. It describes the state of the art of segmentation categories and focuses on the recognition of text. It also describes analytic segmentation, the run-length coding algorithm and the properties of text, and explains how a character is identified based on fuzzy logic, which is used to determine the character.
2.2 Segmentation Categories

Segmentation categories are considered here for the portions of an image included within the text structure: image segmentation is often an essential step in image analysis, object representation, visualization and many other image processing tasks. A great variety of segmentation methods has been proposed in the past decades, and some categorization is necessary to present the methods properly. The categorization presented here therefore reflects the emphasis of the approaches rather than a strict division.
2.2.1 Threshold Based Segmentation
Histogram thresholding is a slicing technique used to segment the image. It may be applied directly to the image, provided it is combined with pre- and post-processing techniques.
2.2.2 Clustering Techniques
Although clustering is sometimes used as a synonym for segmentation, clustering techniques denote mechanisms that are primarily used in exploratory analysis of high-dimensional data, grouping measurements that are similar in some sense.
2.2.3 Matching
When we know approximately what an object we wish to identify in an image looks like, we can use this knowledge to locate the object in the image. We can also discriminate this object by matching between the pixels themselves.
2.2.4 Edge-Based Segmentation
With this technique, edges detected in an image are assumed to represent object boundaries and are used to identify the objects. From knowledge of an object's boundaries, we can recognize the object. According to (Rabbani and Chellappan, 2007), edge detection is a fundamental tool used in most image processing applications to obtain information from frames, as a step in feature extraction and object segmentation.
2.2.5 Region Based Segmentation
While an edge-based technique attempts to find the object boundaries and then locate the object itself by filling them in, a region-based technique takes the opposite approach, i.e. it starts in the middle of an object and then "grows" outward until it meets the object's boundaries. In this work, we focus on edge-based segmentation. However, we may face multi-orientation and multi-scale problems in text regions; because of this, a sophisticated approach is required to segment characters properly.
2.3 Categories of Variance Text

Nowadays, commercial advertising tools are rapidly increasing and being deployed through posters on walls, signs on roads and lighted indicators mounted in public streets, in different styles whose font size, color, orientation, lightness and text alignment can be easily edited and modified to make the display more attractive and tempting. The following describes the categories of text variance in detail.
2.3.1 Lighting Variance
(Gatos, Pratikakis, Kepene and Perantonis, 2005a) observed that an image varies with the lighting conditions on the text overlaid on it. When a text has varying lightness, this affects the process of extracting the text from the image.
2.3.2 Scale Variance
(Xiaoqing and Jagath, 2006a) suggested that image properties vary according to the distance at which the camera is placed. When pictures are taken at different distances, this affects the resolution of the image and of the text.
2.3.3 Orientation Variance
(Xiaoqing and Jagath, 2006b) noted that an image varies with the camera angle. When pictures are taken from different angles, the text overlaid on the image appears at different sizes, because of the different locations and positions of the cameras.
2.4 Recognition Text

Generally, extraction of text can be drawn as a combination of separate modules that process text in images, from raw data up to extraction of the text. (Tsung, Yung and Chih, 2006c) clearly described text extraction as shown in Figure 2.1. Most image text detection and extraction methods deal with static image text.
Figure 2.1 General model of extraction text
Firstly, the raw image data goes through an initial text detection step to achieve a suitable form. This form then allows text identification, ensuring that the detected text is actually present in the image. Next comes the text localization stage, which is important for localizing the text. Finally, text extraction is used to extract the text from the image.
2.4.1 Text Detection
Text detection refers to the part of the entire process performed prior to the localization and extraction steps. In this model, detection has the purpose of converting raw data into a suitable form and calibrating the text lineaments.
Acquiring text via a scanner device yields an image containing text, as seen in Figure 2.2. During text detection, a process is run to identify whether text or noise has been detected. (Qixiang, Wen, Weiqiang and Wei, 2003a) used an algorithm based on the Sobel edge operation in four directions. In this algorithm, the edge density represents the precision of text localization, and a gradient map is obtained in the three RGB components. Morphological "close" and "open" operations are then applied: the "close" operation makes edge pixels adjacent to each other, while the "open" operation disconnects the edge map where it is too narrow to contain text. The projection profile of the image blocks gives a compact representation of the spatial pixels, and edge-dense blocks in the edge map are bounded after profile projection. Their performance was compared with that of (Roshanak and Shohreh, 2005c), who proposed a method based on finding text edges using the information content of sub-image coefficients of the discrete-wavelet-transformed input image; most text-bearing images are well characterized by the edges they contain, "dense edges" are a distinct characteristic of text blocks which can be used to detect possible text regions, and Sobel detection is effective in extracting the strong edges of an image. Earlier, (Tsai, Chen and Fang, 2006c) generated edge maps for text detection, relying mainly on edge information to detect text; two edge maps are generated for the detection of scrolling text, with each edge map produced by applying Sobel detection to the entire input image.
2.4.2 Text Area Identification
Ideally, text area identification has the task of confirming the text detection. (Qixiang, Wen, Weiqiang and Wei, 2003a) devised three rules to confirm candidate text blocks. First, text block height and width are defined such that a text line contains at least two words. Second, the text size is limited by thresholds T1 and T2, set to 8 pixels and 32 pixels respectively; text blocks whose height is smaller than 8 pixels or larger than 32 pixels can still be found in a zoomed image. Third, blocks that contain too few edge pixels are eliminated; however, noise similar to the text size, i.e. non-text, sometimes remains, so wavelet features and an SVM are used to classify the candidate text blocks. Although text has its own properties, they may be quite weak and irregular; text merely includes some strokes, i.e. horizontal, vertical, up-right-slanting and up-left-slanting strokes. These strokes are regular to some extent when considered as one block, but never regular pixel by pixel. Furthermore, (Tsung, Yung and Chih, 2006c) addressed the false alarms due to complex backgrounds, which can be filtered using the horizontal edge map to identify the text region and obtain a refined text region: the pixels with horizontal edges in the detected text region are counted to decide whether it is a true text region.
2.4.3 Text Region Localization
(Xiaoqing and Jagath, 2006a) showed that text embedded in an image appears in clusters, i.e. it is arranged compactly; thus, clustering characteristics can be used to localize text regions. Since the intensity of the feature map represents the possibility of text, simple global thresholding can be employed to highlight the regions of high text possibility, resulting in a binary image. A morphological dilation operator can easily connect very close regions together, while leaving those far away from each other isolated; dilating the binary image yields joint areas referred to as text blobs. Two constraints are used to filter out blobs which do not contain text: the first filters out all very small isolated blobs, while the second filters out blobs whose widths are much smaller than their corresponding heights. The retained blobs are enclosed in boundary boxes, whose four pairs of coordinates are determined by the maximum and minimum coordinates of the top, bottom, left and right points of the corresponding blobs. To avoid missing character pixels that lie near or outside the initial boundary, the width and height of the boundary box are padded by small amounts.
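As a rough illustration of this localization step, the sketch below thresholds a feature map, dilates it, and filters the resulting blobs by size and aspect ratio using MATLAB Image Processing Toolbox functions. The threshold value, structuring-element size and filtering limits are assumptions chosen for illustration, not values taken from the cited method.

    % Localization sketch: threshold, dilate, filter blobs (assumed parameters).
    fmap = im2double(imread('featuremap.png'));   % feature map as a grayscale image
    bw   = fmap > 0.4;                            % global threshold (assumed value)
    bw   = imdilate(bw, strel('square', 7));      % connect very close regions into blobs
    stats = regionprops(bwlabel(bw), 'Area', 'BoundingBox');
    imshow(fmap); hold on;
    for k = 1:numel(stats)
        b = stats(k).BoundingBox;                 % [x y width height]
        % Constraint 1: drop very small isolated blobs.
        % Constraint 2: drop blobs much narrower than they are tall.
        if stats(k).Area > 50 && b(3) >= 0.5 * b(4)
            b(1:2) = b(1:2) - 2;                  % pad the box by a small amount
            b(3:4) = b(3:4) + 4;
            rectangle('Position', b, 'EdgeColor', 'r');
        end
    end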
2.4.4 Text Extraction and Binary Image
The final step in recognizing text overlaid on an image is text extraction. The goal of text extraction is to convert the grayscale image of an accepted text region into an OCR-ready binary image, where all character pixels are black and all others are white. To extract static text over a complex background, bitmap integration over time is often used to remove the moving background of a text region. However, (Tsung, Yung and Chih, 2006c) observed that a seed-fill algorithm is suitable for eliminating false text character regions to enhance the recognition rate.
The Otsu method is often used to calculate the threshold for segmenting text from the background. For vertically moving text, vertical adaptive thresholding is applied, and horizontal adaptive thresholding for horizontally moving text. Correspondingly, (Jie, Jigui and Shengsheng, 2006d) combined strong and boosted edges after dilation, followed by an AND operation, to form the text region; the results of the dilation and logical operations are mapped onto the original image to obtain the text regions, and the remaining non-text regions are identified and eliminated by removing them from the binary image. Meanwhile, the method of (Xiaoqing and Jagath, 2006a) cannot handle characters embedded in shade or complex backgrounds; their final stage extracts accurate binary characters from the localized text regions, using uniform white characters on a pure black background, so that existing OCR can be used directly for recognition.
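A minimal sketch of the Otsu-based binarization mentioned above, using the standard MATLAB functions graythresh and im2bw; the file name is a placeholder, and the final inversion assumes dark text on a light background, which is an assumption rather than part of the cited methods:

    % Otsu thresholding of a localized text region (sketch).
    region = rgb2gray(imread('textregion.png'));  % grayscale text region
    level  = graythresh(region);                  % Otsu threshold in [0,1]
    bw     = im2bw(region, level);                % binarize the region
    bw     = ~bw;                                 % invert if needed so the characters
                                                  % match the OCR's expected polarity
    imshow(bw);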
2.5 Analytic Segmentation

2.5.1 Pattern Recognition

(Jie, Jigui and Shengsheng, 2006d) explained that pattern recognition (PR) is a subject concerned with object description and classification methods. It is also a collection of mathematical, statistical, heuristic and inductive techniques that play a fundamental role in enabling computers to execute tasks the way human beings do. Pattern recognition includes many methods, which has led to the development of numerous applications in different fields; the practical aim of these methods is the emulation of intelligence.
2.5.2 Statistical Pattern Recognition
Statistical decision and estimation theories have been commonly used in PR for a long time. Statistical PR is a classical method of PR that was worked out over a long period of development. It is based on the feature vector distributions obtained from probabilistic and statistical models, where the statistical model is defined by a family of class-conditional probability density functions (the probability of feature vector x given a class). We put the features in some chosen order and can then regard the set of features as a feature vector. Statistical pattern recognition deals with features only, without considering the relations between them.
2.5.3 Data Clustering
Its aim is to find similar clusters in a mass of data without needing any information about known clusters; it is an unsupervised method. In general, data clustering methods can be partitioned into two classes: hierarchical clustering and partition clustering.
2.5.4 Fuzzy Sets
The thinking process of human beings is often fuzzy and uncertain, and human languages are often fuzzy as well. In reality we cannot always give complete answers or classifications, so the theory of fuzzy sets came into being. Fuzzy sets can describe the extension and intension of a concept effectively.
2.5.5 Neural Networks
Neural networks have developed very quickly since the first neural network model was proposed in 1943, especially after the Hopfield neural network and the famous BP algorithm came into being. A neural network performs data clustering based on distance measurement, and the method is model-independent. The neural approach applies biological concepts to machines to recognize patterns; the outcome of this effort is the invention of artificial neural networks, built from knowledge of the physiology of the human brain. Neural networks are composed of a series of different, associated units. In addition, the genetic algorithm applied in neural networks is a statistical optimization algorithm proposed by (Holland, 1975).
2.5.6 Structural Pattern Recognition
Structural pattern recognition is not based on a firm theory; it relies on segmentation and feature extraction. (Pavlidis, 1977) said that structural pattern recognition lays emphasis on the description of structure, namely explaining how some simple sub-patterns compose a pattern. There are two main methods in structural pattern recognition: syntax analysis and structure matching. The basis of syntax analysis is the theory of formal languages, while the basis of structure matching is special mathematical techniques based on sub-patterns. When the relations among the parts of an object are considered, structural pattern recognition is best. It deals with symbolic information, and it is often combined with statistical classification or neural networks, through which more complex pattern recognition problems, such as the recognition of multidimensional objects, can be handled.
2.5.7 Syntactic Pattern Recognition
This method basically emphasizes the rules of composition. An attractive aspect of syntactic methods is their suitability for dealing with recursion. After customizing a series of rules that describe the relations among the parts of an object, syntactic pattern recognition, a special kind of structural pattern recognition, can be used.
2.5.8 Approximate Reasoning Approach to Pattern Recognition

This method, which uses two concepts, fuzzy implication and the compositional rule of inference, can cope with the problem of rule-based pattern recognition.
2.5.9 Applications of Support Vector Machine (SVM)

The SVM is a relatively recent method with a simple structure and has been researched widely. As discussed by (Hyeran and Seong, 2002a), the SVM is based on statistical learning theory, and it is an effective tool for solving pattern recognition and function estimation problems, especially classification and regression. It has been applied to a wide range of pattern recognition tasks such as face detection, verification and recognition, object detection and recognition, and speech recognition.
2.6 Pattern Recognition System

A pattern recognition system can be described as a process that copes with real or noisy data. Whether the decision made by the system is right or not mainly depends on the decisions made by the human expert.
2.6.1 The Structure of Pattern Recognition System
A pattern recognition system is based on a PR method and mainly includes three mutually associated yet distinct processes. The aim of pattern classification is to utilize the information acquired from pattern analysis to discipline the computer to accomplish the classification. A very common description of a pattern recognition system includes five steps; the classification/regression/description step shown in Figure 2.2 is the kernel of the system. Classification is the PR problem of assigning an object to a class; the output of the PR system is an integer label, such as classifying a product as "1" or "0" in a quality control test. Regression is a generalization of the classification task, where the output of the PR system is a real-valued number, such as predicting the share value of a firm based on past performance and stock market indicators. Description is the problem of representing an object in terms of a series of primitives, where the PR system produces a structural or linguistic description. A general composition of a PR system is given below.
Figure 2.2 The composition of a PR system
2.6.2 Applications of Pattern Recognition
It is true that application has been one of the most important drivers of PR theory. Pattern recognition has been developed for many years, and PR technology has been applied in many fields, one of which is character recognition.
2.6.3 Character Recognition
Character extraction from a scene image is based on identification of a local target. Character recognition is commonly performed after the image has been binarized using a single threshold value. In photographic images, characters are most often located on signboards or similar regions of coherent background; therefore, once the signboard region is identified as the local target area, it can be treated similarly to a document image. Binarization then produces a useful image for character recognition: the character and background regions in the local target area are separated using a threshold value calculated for that area.
2.7 Run-Length Coding Algorithm

Usually, after converting the image into a binary image, we deal with zeros and ones representing the background and foreground. (Chengjie, Jie and Trac, 2002b) and (Kofi, Andrew, Patrick and Jonathan, 2007) describe run-length coding as the standard coding technique for block-transform-based image/video compression: a block of quantized transform coefficients is first represented as a sequence of RUN (number of consecutive zeros) and LEVEL (value of the following nonzero coefficient) pairs, which are then entropy coded. Here, a RUN takes the pixels in the same row as a block; every block of RUNs can be represented in a horizontal projection, and every horizontal projection can be calculated from the run-length code. The problem then is to group all points of the image that are labeled as object points into an object image. We assume that such points are spatially close. This notion of spatial proximity requires a more precise definition, so that an algorithm can be devised to group spatially close points into components, as shown in Figure 2.3 below.
Figure 2.3 Horizontal projection calculated from run-length code
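A minimal sketch of run-length coding for the rows of a binary image, recording each run of 1-pixels as a (start column, end column) pair; the function and variable names are hypothetical, chosen only for this illustration:

    % Encode each row of a binary image as runs of 1-pixels (sketch).
    function runs = rowRuns(bw)
        runs = cell(size(bw, 1), 1);        % runs{r} = [start end; ...] per row
        for r = 1:size(bw, 1)
            d = diff([0 bw(r, :) 0]);       % pad so the row edges become transitions
            starts = find(d == 1);          % 0 -> 1 transitions start a run
            ends   = find(d == -1) - 1;     % 1 -> 0 transitions end a run
            runs{r} = [starts(:) ends(:)];
        end
    end

For the row [0 1 1 0 1], this yields the runs [2 3; 5 5]; summing the run lengths of each row gives exactly the horizontal projection of Figure 2.3.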
2.7.1 Neighbors
A pixel in a digital image is spatially close to several other pixels. In a digital image represented on a square grid, a pixel has a common boundary with four pixels and shares a corner with four additional pixels. We say that two pixels are 4-neighbors if they share a common boundary, and 8-neighbors if they share at least one corner. For example, the pixel at location [i,j] has 4-neighbors [i+1,j], [i-1,j], [i,j+1] and [i,j-1]. The 8-neighbors of the pixel include the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]. A pixel is said to be 4-connected to its 4-neighbors and 8-connected to its 8-neighbors, as shown in Figure 2.4 below.
4-neighbors: [i+1,j], [i-1,j], [i,j+1] and [i,j-1]
8-neighbors: the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]
Figure 2.4: 4- and 8-neighborhoods for a rectangular image grid; pixel [i,j] is located at the center of each figure
2.7.2 Path
A path from the pixel at [i0,j0] to the pixel at [in,jn] is a sequence of pixel indices [i0,j0], [i1,j1], ..., [in,jn] such that the pixel at [ik,jk] is a neighbor of the pixel at [ik+1,jk+1] for all k with 0 <= k <= n-1. If the neighbor relation uses 4-connection, the path is a 4-path; for 8-connection, the path is an 8-path. Simple examples of these are shown in Figure 2.5 below.
Figure 2.5 4-path and 8-path
2.7.3 Foreground
The set of all 1-pixels in an image is called the foreground and is denoted by S. The foreground represents the objects that exist on the background; it may be a text or another object in the image. The foreground is generally more interesting than the background.
2.7.4 Connectivity
A pixel p ∈ S is said to be connected to q ∈ S if there is a path from p to q consisting entirely of pixels of S. Note that connectivity is an equivalence relation: for any three pixels p, q and r in S, we have the following properties.
1. Pixel p is connected to p (reflexivity).
2. If p is connected to q, then q is connected to p (commutativity).
3. If p is connected to q and q is connected to r, then p is connected to r (transitivity).
2.7.5 Connected Components
A set of pixels in which each pixel is connected to all the others is called a connected component. A connected component can be built from runs: the pixels that lie together in each row form a run, each run containing pixels joined together; every run is then connected with runs in other rows, so that eventually the connected component grows from all the pixels connected together. This is why the approach is also called run-length coding.
2.7.6 Background
The set of all connected components of S' (the complement of S) that have points on the border of the image is called the background. All other components of S' are called holes. Let us consider the simple picture shown in Figure 2.6 below.
Figure 2.6 Border of an image
How many objects and how many holes are in this figure? If we consider 4-connectivity for both foreground and background, there are four objects, each 1 pixel in size, and one hole. If we use 8-connectivity, there is one object and no hole. In both cases we have an ambiguous situation, and a similar ambiguity arises in the simple case shown in Figure 2.7 below.
Figure 2.7 Ambiguous border
If the 1s are connected, then the 0s should not be. To avoid this awkward situation, 4-connectivity should be used for S'.
2.7.7 Boundary
The boundary of S is the set of pixels of S that have 4-neighbors in the complement of S; the boundary is usually denoted by S'. The boundary consists of the edges of the object, which separate the object from other objects or from the background, and the intensity on the boundary differs from that on either side of it. When we want to detect text in an image, we have to know the boundary of the text.
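Under this definition, the boundary can be computed in MATLAB by eroding S with the 4-neighborhood structuring element and keeping the pixels that disappear; a sketch (the file name is a placeholder, and the diamond structuring element encodes the 4-neighbor test):

    % Boundary of S: pixels of S with at least one 4-neighbor outside S.
    S = imread('binarytext.png') > 0;            % binary object image
    boundary = S & ~imerode(S, strel('diamond', 1));
    interior = S & ~boundary;                    % the interior is S minus its boundary
    imshow(boundary);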
2.7.8 Interior
The interior is the set of pixels of S that are not in its boundary; the interior of S is (S - S'). The interior is the inner part of the object, which represents the object itself, and it differs from the boundary in terms of intensity, because the boundary is the line that separates the object from the background.
2.7.9 Surrounds
Region T surrounds region S (or S is inside T) if any 4-path from any point of S to the border of the picture must intersect T. Figure 2.8 below shows an example of a simple binary image with its boundary, interior and surrounds.
Figure 2.8 A binary image with its boundary, interior and surrounds
2.7.10 Component Labeling
One of the most common operations in machine vision is finding the connected components in an image. The points in a connected component form a candidate region for representing an object. As mentioned earlier, in computer vision most objects have surfaces, and points belonging to a surface project to spatially close points; this notion of "spatially close" is captured by connected components in the digital image. It should be mentioned that connected component algorithms usually form a bottleneck in a binary vision system: the algorithm is sequential in nature, because finding connected components is a global operation. If there is only one object in an image, there may be no need to find the connected components; however, if there are many objects whose properties and locations need to be found, the connected components must be determined. A component labeling algorithm finds all connected components in an image and assigns a unique label to all points in the same component. Figure 2.9 shows an image and its labeled connected components.
Figure 2.9 An image (a) and its connected component image (b)
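In MATLAB's Image Processing Toolbox this labeling operation is available directly as bwlabel; a short sketch, where the small test matrix is made up for illustration:

    % Label the connected components of a small binary image.
    bw = [1 1 0 0 0;
          1 1 0 0 1;
          0 0 0 1 0;
          0 0 1 0 0];
    [labels4, n4] = bwlabel(bw, 4);   % 4-connectivity: diagonal 1s stay separate
    [labels8, n8] = bwlabel(bw, 8);   % 8-connectivity: diagonal 1s are joined
    % Here n4 == 4 but n8 == 2: the diagonal chain on the right is one
    % component under 8-connectivity and three components under 4-connectivity.
    disp(labels4); disp(labels8);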
2.8 Properties of Text
2.8.1 Removing the Borders
The borders should be removed; this reduces the image size, so that only the rectangular part of the image containing the text(s) remains. The image contains many connected components whose pixels are "1" while the background pixels are "0". Removing the borders involves four stages. First, scanning top-down from the first row: if a row does not contain any pixel "1", it is removed, and this continues until a row containing a "1" is found; that row is the top border of the image. The same procedure is applied bottom-up until a row containing a "1" is found; this completes the horizontal stage. Afterwards, the same operation is performed vertically, on columns instead of rows, scanning from the first column left-to-right until a column containing a "1" is found, and then in reverse from right-to-left.
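A compact way to express this border removal in MATLAB, assuming bw is the binary image with text pixels equal to 1 (the file name and variable names are illustrative):

    % Crop away empty border rows and columns of a binary image (sketch).
    bw   = imread('binarytext.png') > 0;   % any binary text image
    rows = find(any(bw, 2));               % rows containing at least one 1
    cols = find(any(bw, 1));               % columns containing at least one 1
    cropped = bw(rows(1):rows(end), cols(1):cols(end));
    imshow(cropped);                       % only the text-bearing rectangle remains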
2.8.2 Dividing the Text into Rows
After removing the borders, the area is divided into rows. (Mohanad and Mohammad, 2006e) give every text two bounding lines, i.e. a start line at the first row and an end line at the last row of the text. The start line and end line differ from text to text because of multi-scale variation: every text has properties assigned to it depending on its size. That is, the lengths and widths of connected components that exist side by side, i.e. horizontally across many rows, represent a text whose properties and size differ from those of another text (multi-scale). On this basis we can find the start line and end line of each text and then process it, as shown in Figure 2.10 below.
below
Figure 2.10 Divide the text into rows
2.8.3 Dividing the Rows “Lines” into the Words
The single line is then divided into words. For this, (Mohanad and Mohammad, 2006e) remove the empty areas before and after the text, but a key question must be answered here: size and scale differ from one text to another, and the spacing between words depends on the scale. To answer it, we can rely on the length and width of the connected components when they are characters; on this basis, the space between words and between characters within a word can be recognized from the ratio of length to width. A word may be a single character or more, the size may differ within the same word, and the word need not have a meaning, as shown in Figure 2.11.
Figure 2.11 Divide the rows into the words
2.8.4 Dividing the Word into Characters
Each word is then divided into characters, which are saved in an array. On this basis we know the characters in each word and each word in the text. Afterwards, with each connected component known, fuzzy logic is used to recognize the character; any connected component that is not recognized is noise or distortion remaining in the image, as shown in Figure 2.12 below.
Figure 2.12 Divide the word into characters
2.9 Identifying Character

After enclosing a connected component in a rectangular segment, the rectangle has four corners and nine identifying pixels used to distinguish the character, as shown in Figure 2.13 below.
Figure 2.13 Identify character
Here, (Mohanad and Mohammad, 2006e) proposed that any character can be identified based on its four corners and the center point at the intersection of the x-axis and y-axis. Based on this criterion, we can identify the character easily. For any pixel at any corner, a "0" means background and a "1" means a pixel of the character, i.e. of the connected component. Every character has properties different from the others; for example, the character "a" has the upper-left corner off, upper-right corner off, lower-left corner off, lower-right corner on, and the center pixel off. These properties differ from those of other characters, so we can recognize a character as long as it has no noise or distortion.
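The sketch below illustrates this corner-and-center test on the bounding box of a single connected component. The probe positions and the template for the letter "a" follow the description above; treating one five-pixel template as sufficient is an illustrative simplification of the fuzzy rules described later.

    % Probe the four corners and the center of a character's bounding box (sketch).
    function feat = cornerFeatures(glyph)
        % glyph: binary matrix of one connected component, 1 = character pixel
        [h, w] = size(glyph);
        feat = [glyph(1, 1), ...                     % upper-left corner
                glyph(1, w), ...                     % upper-right corner
                glyph(h, 1), ...                     % lower-left corner
                glyph(h, w), ...                     % lower-right corner
                glyph(round(h/2), round(w/2))];      % center pixel
    end

    % Example test against the pattern quoted above for the character 'a':
    % off, off, off, on, off.
    % isA = isequal(cornerFeatures(glyph), [0 0 0 1 0]);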
2.10 Fuzzy Logic
An objective of Fuzzy Logic has been to make computers think like people.
Fuzzy Logic can deal with the vagueness intrinsic to human thinking and natural
language and recognizes that its nature is different from randomness. Using Fuzzy Logic
algorithms could enable machines to understand and respond to vague human concepts
such as hot, cold, large, small, etc. It also could provide a relatively simple approach to
reach definite conclusions from imprecise information.
2.10.1 What is Fuzzy Logic?
The term Fuzzy Logic has been used in two different senses. It is thus important
to clarify the distinctions between these two different usages of the term. In a narrow
sense, Fuzzy Logic refers to a logical system that generalizes classical two valued logic
for reasoning under uncertainty. In a broad sense, Fuzzy Logic refers to all of the
theories and technologies that employ fuzzy sets, which are classes with unsharp
boundaries.
For instance, the concept of "warm room temperature" may be expressed as an interval (e.g. [70 F, 78 F]) in classical set theory. However, the concept does not have a well-defined natural boundary. A representation of the concept closer to human interpretation allows a gradual transition from "not warm" to "warm". To achieve this, the notion of membership in a set needs to become a matter of degree; this is the essence of fuzzy sets. An example of a classical set and a fuzzy set is shown in Figure 2.14, where the vertical axis represents the degree of membership in a set.
Figure 2.14 A classical set and fuzzy set representation of "warm room temperature"
2.10.2 What is the Fuzzy Logic Toolbox?
The Fuzzy Logic Toolbox is a collection of functions built on the MATLAB®
numeric computing environment. It provides tools for you to create and edit fuzzy
inference systems within the framework of MATLAB.
2.10.3 Fuzzy Sets
Fuzzy logic starts with the concept of a fuzzy set. A fuzzy set is a set without a crisp, clearly defined boundary; it can contain elements with only a partial degree of membership. To understand what a fuzzy set is, first consider what is meant by a classical set. A classical set is a container that wholly includes or wholly excludes any given element. For example, the pixel values above are represented by "0" or "1", i.e. off or on respectively. That concerns the value of a pixel; there is also the location of the pixel, at a corner or at the center point. How do we know whether the location of a pixel is upper-left, upper-right, middle-left, middle-right, lower-left or lower-right? This depends on a fuzzy set, which determines the location of the pixel, as shown in Figure 2.15 below and explained further under "Membership Function".
Figure 2.15 (a) Input of a pixel
Figure 2.15 (b) Input of location for a pixel
2.10.4 Membership Function
A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is sometimes referred to as the universe of discourse, a fancy name for a simple concept. Figure 2.15 above illustrates the membership function curves: a pixel value of "0-0.5" or "0.6-1" maps to "off" or "on" respectively, and a pixel location of "0.1-0.3", "0.4-0.6" or "0.7-0.9" maps to low, median or high respectively. This is the benefit of fuzzy sets: the membership functions generate the output from the input pixel value and pixel location, as shown for the output variable in Figure 2.16 below.
Figure 2.16 Output variable “letter”
2.10.5 If-Then Rules
Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. If-then rule statements are used to formulate the conditional statements that comprise fuzzy logic. A fuzzy rule is the basic unit for capturing knowledge in many fuzzy systems. A fuzzy rule has two components, an IF part (also referred to as the antecedent) and a THEN part (also referred to as the consequent):

IF <antecedent> THEN <consequent>

where the antecedent describes a condition and the consequent describes a conclusion.
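As an illustration of how these pieces fit together in the Fuzzy Logic Toolbox, the sketch below builds a tiny Mamdani system with a pixel-value input, a pixel-location input and one output, roughly mirroring Figures 2.15 and 2.16. The membership-function ranges follow the intervals quoted in Section 2.10.4; the output variable's membership functions and the two rules are assumptions for illustration only.

    % Tiny Mamdani FIS sketch (classic Fuzzy Logic Toolbox functions).
    fis = newfis('letterid');
    fis = addvar(fis, 'input', 'pixelvalue', [0 1]);
    fis = addmf(fis, 'input', 1, 'off', 'trapmf', [0 0 0.4 0.5]);
    fis = addmf(fis, 'input', 1, 'on',  'trapmf', [0.5 0.6 1 1]);
    fis = addvar(fis, 'input', 'pixellocation', [0 1]);
    fis = addmf(fis, 'input', 2, 'low',    'trimf', [0.1 0.2 0.3]);
    fis = addmf(fis, 'input', 2, 'median', 'trimf', [0.4 0.5 0.6]);
    fis = addmf(fis, 'input', 2, 'high',   'trimf', [0.7 0.8 0.9]);
    fis = addvar(fis, 'output', 'letter', [0 1]);
    fis = addmf(fis, 'output', 1, 'notletter', 'trimf', [0 0.25 0.5]);
    fis = addmf(fis, 'output', 1, 'isletter',  'trimf', [0.5 0.75 1]);
    % Rule rows: [input1-MF input2-MF output-MF weight connective(1=AND)]
    % e.g. IF pixelvalue is on AND pixellocation is low THEN letter is isletter.
    fis = addrule(fis, [2 1 2 1 1; 1 3 1 1 1]);
    out = evalfis([0.8 0.2], fis)   % evaluate an 'on' pixel at a low location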
2.10.6 Fuzzy Inference Systems
Fuzzy inference is the process of formulating the mapping from a given input to
an output using fuzzy logic. The mapping then provides a basis from which decisions
can be made, or patterns discerned. The process of fuzzy inference involves all of the
pieces that are described in the previous sections: membership functions, fuzzy logic
operators, and if-then rules.
2.10.7 Rule Review

The Rule Viewer displays a roadmap of the whole fuzzy inference process. It is based on the fuzzy inference diagram described above.
2.10.8 Surface Review
The Surface Viewer has a special capability that is very helpful in cases with two or more inputs and one output. Figure 2.17 below shows building the system with fuzzy logic.
Figure 2.17 Building the system with fuzzy logic
2.11 Summary
This chapter has given a detailed overview of segmentation categories and text recognition, which play an important role in character recognition. Text detection, text area identification, text region localization, and text extraction and binarization were each drawn and explained separately. Analytic segmentation, the run-length coding algorithm, the properties of text and character identification were also covered. Finally, fuzzy logic was introduced as the means used to determine the text.
CHAPTER III
METHODOLOGY
3.1 Introduction

This chapter describes the project methodology and the proposed technique. Figure 3.1 shows the project framework, which is employed to provide a systematic method of procedures and principles aimed at achieving the objectives of this study. The purpose of having a methodology is to simplify the analysis process and to explain the requirements and formulation of the project. This is important to ensure that the phases of the project can be completed smoothly and on time.
Figure 3.1 Proposed method
3.2 Problem Statement and Literature Review

As described in Chapter I, this project proposes a segmentation approach to solve invariant complex image problems. In this regard, a study was carried out on the latest related literature in text segmentation. This study covers edge segmentation styles, text detection, text identification, text localization, text extraction, analytic segmentation and current segmentation techniques. This investigation is essential for designing the novel method and helps ensure better performance.
3.3 System Development

This project develops the proposed approach in two separate parts. Firstly, a heuristic segmentation model based on edge detection uses the kernels (eight angles) of a compass operator; every two kernels that are opposite to each other are then combined, giving four directions of edge detection. Combining two opposite kernels gives complemented edge detection in each direction, and collecting all four edge detections gives the total over all edge directions. Feature extraction is then used to extract the text from the image, with the help of a run-length algorithm to determine the connected components. Secondly, a fuzzy logic system is used to identify the characters: a rectangle is placed around each connected component and nine pixels are used to identify it.
3.4 Performance Evaluation

At this project phase, the performance of the proposed method is evaluated using edge strength. Performance is expressed as percentages of correct segmentation and missed segmentation; a miss occurs when the edges are not dense, or when the character is very similar to the background. If the proposed method does not perform well on multi-scale and multi-orientation text, system development is revised to enhance the proposed segmentation method.
3.5 General Steps of Proposed Techniques

This project proposes a segmentation approach with analytical strategies based on the kernels of a compass edge-detection operator. First the kernels are collected; next, feature extraction is used to extract the text; then fuzzy logic is used to identify the characters. Finally, a character extraction algorithm is used to extract the characters properly. Figure 3.2 illustrates a block diagram of the general steps of the proposed approach.
Figure 3.2 Block diagram of general steps of proposed approach
3.6 Proposed Algorithm for Edge Based Text Region Extraction

The basic steps of the edge-based text extraction algorithm are given below; the details are explained in the following sections, and a MATLAB sketch follows the list.
Step 1: Input (read) the image with its original colors.
Step 2: Create a Gaussian pyramid by convolving the input image with a Gaussian kernel and successively down-sampling each direction by half.
Step 3: Create directional kernels to detect edges at the 0, 45, 90, 135, 180, 225, 270 and 315 degree orientations.
Step 4: Convolve each image in the Gaussian pyramid with each orientation filter.
Step 5: Combine the kernel responses to detect edges at the 0+180, 45+225, 90+270 and 135+315 orientations.
Step 6: Dilate the resultant image using a sufficiently large structuring element (3x3) to cluster candidate text regions together.
Step 7: Create the final output image with text in white pixels against a plain black background.
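The listing below is a condensed MATLAB sketch of Steps 1 to 7. The eight directional kernels are built by rotating the eight border coefficients of a Sobel-like base kernel circularly, as the abstract describes; the base kernel, the pyramid depth, the threshold and the file name are illustrative assumptions, and the detailed per-level loops appear as equations (3.1) to (3.13) in the next section.

    % Edge-based text region extraction, Steps 1-7 (sketch).
    img = im2double(rgb2gray(imread('scene.jpg')));    % Step 1: read image

    % Step 3: eight directional kernels via circular rotation of the border
    % coefficients of a base kernel (Sobel-like base, assumed).
    pos  = [1 1; 1 2; 1 3; 2 3; 3 3; 3 2; 3 1; 2 1];   % clockwise border positions
    base = [-1 0 1; -2 0 2; -1 0 1];
    ring = arrayfun(@(k) base(pos(k,1), pos(k,2)), (1:8)');
    kernel = cell(1, 8);
    for j = 1:8
        v = circshift(ring, j - 1);                    % rotate by 45-degree steps
        K = zeros(3);
        for k = 1:8, K(pos(k,1), pos(k,2)) = v(k); end
        kernel{j} = K;
    end

    GK    = fspecial('gaussian');                      % Gaussian kernel
    total = zeros(size(img));
    level = img;
    for i = 1:8                                        % Step 2: 8-level pyramid
        for j = 1:8                                    % Step 4: 8 orientations
            e = imfilter(level, kernel{j}, 'conv');
            total = total + abs(imresize(e, size(img)));   % Step 5: the sum
        end                                            % combines opposite pairs
        level = imresize(imfilter(level, GK, 'conv'), 0.5);
    end

    bw = im2bw(mat2gray(total), 0.3);                  % threshold (assumed value)
    bw = imdilate(bw, strel('square', 3));             % Step 6: cluster regions
    imshow(bw);                                        % Step 7: white text on black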
3.7 Detection

This section corresponds to Steps 1 to 4 of Section 3.6. Given an input image, the regions with a possibility of containing text are detected. A Gaussian pyramid is created using a Gaussian kernel of size 3x3, down-sampling the image in each direction by half; down-sampling refers to the process whereby an image is resized to a lower resolution than its original one. The Gaussian filter of size 3x3 is shown in Figure 3.3. Each level in the pyramid corresponds to the input image at a different resolution; a sample Gaussian pyramid with 8 levels of resolution is shown in Figure 3.4. These images are next convolved with directional filters at the different orientations for edge detection: horizontal (0+180), vertical (90+270), diagonal (45+225) and diagonal (135+315). The kernels used are shown in Figure 3.5 and their application in Figure 3.6.
Figure 3.3 Gaussian filter
Figure 3.4 Sample Gaussian pyramid with 8 levels
Figure 3.5 The eight directional kernels (0, 45, 90, 135, 180, 225, 270 and 315 degrees) and their directional responses
Each of the kernels shown above is stored in a variable (kernel0, kernel45, kernel90, kernel135, kernel180, kernel225, kernel270 and kernel315), and the eight kernels are then placed in a cell array called kernel{}. Next, the Gaussian filter is created, as shown in the equation below:

GK = fspecial('gaussian')    (3.1)
After creating the Gaussian filter, the Gaussian pyramid is built by convolving the image with the Gaussian filter over eight levels, starting from the original size, as shown in the equations below:

pyramid{i} = image1, i = 0, …, 7    (3.2)

image2 = imfilter(image1, GK, 'conv')    (3.3)

Next, each level is down-sampled by 0.5, for i = 0 to 7, as shown in the equation below:

pyramid{i} = imresize(image2, 0.5), i = 0, …, 7    (3.4)
Next, the image at each level of the pyramid is convolved with the eight edge-detection kernels, as shown in the equation below:

Conv{i,j} = imfilter(pyramid{i}, kernel{j}, 'conv'), i, j = 0, …, 7    (3.5)
Then the filtered images are resized back to the original image size:

Conv2{i,j} = imresize(Conv{i,j}, [size(image1,1) size(image1,2)]), i, j = 0, …, 7    (3.6)
We now have eight pyramid levels, each with edge detection at the eight orientations {0, 45, 90, 135, 180, 225, 270, 315}. After each level is returned to the original image size, the maps for the same orientation are summed across the eight levels, giving one combined edge map per orientation (the results for these edges are shown in Chapter IV, Figure 4.2). The equation below shows this collection operation:

total{i} = im2bw(Conv2{1,i} + Conv2{2,i} + Conv2{3,i} + Conv2{4,i} + Conv2{5,i} + Conv2{6,i} + Conv2{7,i} + Conv2{8,i}), i = 0, …, 7    (3.7)
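Taken together, equations (3.1) to (3.7) amount to the following minimal MATLAB sketch. This is a hedged illustration, assuming the kernel{} array built earlier, 1-based indexing, and a hypothetical input file name; it is not the project's verified listing.

GK = fspecial('gaussian');                    % 3x3 Gaussian filter, eq. (3.1)
image1 = rgb2gray(imread('scene.jpg'));       % hypothetical input, converted to gray
pyramid = cell(1, 8);
current = double(image1);
for i = 1:8                                   % eqs. (3.2)-(3.4): build the pyramid
    pyramid{i} = current;
    current = imresize(imfilter(current, GK, 'conv'), 0.5);
end
total = cell(1, 8);
for j = 1:8                                   % one combined map per orientation
    acc = 0;
    for i = 1:8                               % eqs. (3.5)-(3.6): filter, restore size
        c = imfilter(pyramid{i}, kernel{j}, 'conv');
        acc = acc + imresize(c, [size(image1, 1) size(image1, 2)]);
    end
    total{j} = im2bw(mat2gray(acc));          % eq. (3.7): binarize the summed map
end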
Given k operators, gk(x,y) is the image obtained by convolving f(x,y) with the k-th operator at the original size. The seven remaining levels could also be shown, but they would take considerable space, so only the original-size results are given, as shown in Figure 3.6. The gradient is defined as

g(x,y) = max_k gk(x,y)    (3.8)
Figure 3.6 Edge detection at the eight orientations (0, 45, 90, 135, 180, 225, 270 and 315 degrees)
After convolving the image with the orientation kernels, the edge maps of opposite orientations are summed, which gives the final edge detection for each direction, as shown in the equations below.
Edge_first = Edge"0" + Edge"180"    (3.9)

Edge_second = Edge"45" + Edge"225"    (3.10)

Edge_third = Edge"90" + Edge"270"    (3.11)

Edge_fourth = Edge"135" + Edge"315"    (3.12)

Edge_total = Edge_first + Edge_second + Edge_third + Edge_fourth    (3.13)
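In MATLAB terms, equations (3.9) to (3.13) reduce to a few additions over the per-orientation maps; the sketch below assumes total{1..8} holds the 0, 45, 90, 135, 180, 225, 270 and 315 degree maps in that order.

edgeFirst  = total{1} + total{5};   % 0 + 180, eq. (3.9)
edgeSecond = total{2} + total{6};   % 45 + 225, eq. (3.10)
edgeThird  = total{3} + total{7};   % 90 + 270, eq. (3.11)
edgeFourth = total{4} + total{8};   % 135 + 315, eq. (3.12)
edgeTotal  = edgeFirst + edgeSecond + edgeThird + edgeFourth;   % eq. (3.13)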
A feature map is then created. A weighting factor is associated with each pixel to classify it as a candidate or non-candidate for a text region. A pixel is a candidate for text if it is highlighted in all of the edge maps created by the directional filters; thus, the feature map is a combination of all edge maps at different scales and orientations, with the highest-weighted pixels present in the resultant map.
3.8 Feature Map and Candidate Text Region Detection
3.8.1 Directional Filtering
In the proposed method, the magnitude of the second derivative of intensity is used as the measurement of edge strength, as this allows better detection of the intensity peaks that normally characterize text in an image. The edge density is calculated from the average edge strength within a window. Considering effectiveness and efficiency, eight orientations (0, 45, 90, 135, 180, 225, 270 and 315 degrees) are used to evaluate the variance of orientation, where 0 and 180 denote the vertical direction, 90 and 270 denote the horizontal direction, and 45, 135, 225 and 315 denote the four diagonal directions; the convolution is carried out with the compass operator.
3.8.2 Edge Selection
Vertical edges form the most important strokes of characters, and their lengths reflect the heights of the corresponding characters. By extracting and grouping these strokes, text of different heights (sizes) can be located. However, in a real scene under an indoor environment many other objects, such as windows, doors and walls, also produce strong vertical edges, so not all vertical edges can be used to locate text. Vertical edges produced by such non-character objects normally have very large lengths. Therefore, by grouping vertical edges into long and short edges, those with extremely long lengths can be eliminated, retaining the short edges for further processing.

After thresholding, long vertical edges may become broken short edges, which can cause false alarms (false positives). The proposed method therefore uses a two-stage edge generation method. The first stage obtains strong vertical edges by combining edge "0" and edge "180", as described in the equations below:
Edge_v^strong = |Ev|_z    (3.14)

Edge_v^strong = Edge"0"_bw + Edge"180"_bw    (3.15)

where Ev is the "0"+"180" intensity edge image, i.e., the 2D convolution result of the original image with the "0" and "180" kernels, and |·|_z is a thresholding operator that gives a binary result of the vertical edges; it is not very sensitive to the threshold value.
The second stage obtains weak vertical edges, as described below:

dilated = Dilation_{3×3}(Edge_v^strong)    (3.16)

closed = Closing_{m×m}(dilated)    (3.17)

Edge_v^weak = |Ev · (closed − dilated)|_z    (3.18)
where the morphological dilation with a rectangular structuring element of size 3×3 is used to eliminate the effects of slightly slanted edges, and a vertical linear structuring element of size m×m is then employed in the closing operator to force the strong vertical edges closed. The resultant vertical edges are the combination of the strong and weak edges, as described in the equation below:

Edge_v = Edge_v^strong + Edge_v^weak    (3.19)
With the two-stage edge generation complete, the resultant vertical edge image is obtained from the equations above. A morphological thinning operator, followed by a connected component labeling and analysis algorithm, is then applied to the resultant vertical edges, as described in the equations below:

Thinned = Thinning(Edge_v)    (3.20)

Labeled = BWlabel(Thinned, 8)    (3.21)
where the morphological thinning operator reduces the widths of the resultant vertical edges to one pixel. Since a high value in the length-labeled image represents a long edge, a simple thresholding operation, described in the equation below, is used to separate out the short edges:

Short_v_bw = |Ev^lengthlabeled|_z    (3.22)
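The two-stage edge generation of equations (3.14) to (3.22) can be sketched in MATLAB as below. The threshold value, the vertical line length and the 50-pixel length cut-off are illustrative assumptions, not the project's tuned values; img and kernel{} are assumed from the earlier sketches.

Ev = imfilter(double(img), kernel{1}, 'conv') ...
   + imfilter(double(img), kernel{5}, 'conv');         % "0" + "180" responses
strongV = im2bw(mat2gray(abs(Ev)), 0.3);               % eqs. (3.14)-(3.15)
dilated = imdilate(strongV, strel('square', 3));       % eq. (3.16)
closed  = imclose(dilated, strel('line', 9, 90));      % eq. (3.17), vertical closing
weakV   = im2bw(mat2gray(abs(Ev) .* (closed & ~dilated)), 0.3);   % eq. (3.18)
edgeV   = strongV | weakV;                             % eq. (3.19)
thinned = bwmorph(edgeV, 'thin', Inf);                 % eq. (3.20): one pixel wide
labeled = bwlabel(thinned, 8);                         % eq. (3.21)
stats   = regionprops(labeled, 'Area');                % edge lengths (1 px thick)
shortV  = ismember(labeled, find([stats.Area] < 50));  % eq. (3.22): keep short edges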
3.8.3 Feature Map Generation
As noted, regions containing text have significantly higher values of average edge density, strength and orientation variance than non-text regions. These three characteristics are exploited to refine the candidate regions by generating a feature map that suppresses false regions and enhances true candidate text regions. This procedure is described below:

Candidate = Dilation_{m×m}(Short_v_bw)    (3.23)

where a morphological dilation with an m×m structuring element is applied to the selected short vertical edge image to obtain the potential candidate text regions.
3.8.4 Localization
This part corresponds to Step 6 of Section 3.6. The process of localization further enhances the text regions by eliminating non-text regions. One property of text is that its characters usually appear close to each other in the image, forming a cluster; using a morphological dilation operation, these possible text pixels can be clustered together, eliminating pixels that are far from the candidate text regions. Dilation is an operation that expands or enhances the region of interest using a structural element of the required shape and/or size. The dilation is carried out with a structuring element of size 3×3 in order to enhance the regions that lie close to each other.

The resultant image after dilation may still contain some non-text regions or noise, which needs to be eliminated. An area filtering step is carried out to eliminate the noise blobs present in the image.
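A minimal MATLAB sketch of this localization step is given below; the 3×3 element follows the text, while the 100-pixel area cut-off for the noise blobs is an illustrative assumption.

clustered = imdilate(candidate, strel('square', 3));   % cluster nearby text pixels
localized = bwareaopen(clustered, 100);                % area filtering: drop small blobs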
3.8.5 Character Extraction
This corresponds to Step 7 of Section 3.6. Common OCR systems require the input image to be such that the characters can be easily parsed and recognized: the text and background should be monochrome, and the background-to-text contrast should be high. This process therefore generates an output image with white text against a black background.
3.9 Connected Components
The labeling of connected components in a binary image is a fundamental operation in pattern recognition. This algorithm transforms a binary image into a symbolic one in which each connected component has a unique numeric label. The image can be represented in a number of ways: arrays, run-lengths, quadtrees, octrees and bintrees. Usually, after the conversion of the image to binary form, zeros and ones represent the background and foreground, and a run-length coding algorithm is used. A RUN means "a maximal block of consecutive foreground pixels in the same row"; every run contributes to the horizontal projection, and the horizontal projection can be calculated from the run-length code. The labeling algorithm is represented with an equivalence table. Resolving the equivalence table has been the focus of most labeling algorithms; the method used here takes little effort to implement and also minimizes memory use. The process uses a run-length encoding representation. Conversion of the original binary image to run-length encoded format is easily parallelized by processing multiple rows in parallel. The run-length encoded format is much more compact than the binary image (individual runs have a single label), so the sequential label propagation stage is much faster than in the conventional algorithm. Details of the algorithm are given below. The stages involved in this implementation are as follows:
1. Pixels are converted to runs, in parallel by rows.
2. Initial labeling and propagation of labels.
3. Equivalence table resolution.
4. Translating run labels to connected components.
The design is parallelized as much as possible. Although stages 2 and 3 are sequential, they operate on runs, which are far less numerous than pixels. Like stage 1, stage 4 can be executed in parallel by row. A run has the properties {ID, EQ, s, e, r}, where ID is the identity number of the run, EQ is the equivalence value, s the x offset of the start pixel, e the x offset of the end pixel, and r the row.
The first stage is a row-wise parallel conversion from pixels to runs. Depending on the location and access mode of the memory holding the binary image, the entire image may be partitioned into n parts to perform n run-length encodings in parallel. The use of runs rather than pixels reduces the size of the equivalence table. The following sequential local operations are performed in parallel on each partition of an M × N image to assign pixels to runs:
Algorithm 3.1: PixelsToRuns(T)
T: T(x, y) = I(x, y)
i ← 0; Block ← 0
for each pixel T(x, y), scanning each row x = 1 … M:
    if T(x, y) = 1 and Block = 0 then
        s_i ← x
        Block ← 1
    if Block = 1 and (T(x, y) = 0 or x = M) then
        e_i ← x − 1
        r_i ← y
        ID_i ← EQ_i ← 0
        i ← i + 1
        Block ← 0
where Block is 1 while a run is being scanned in partition T, and M is the width of the image. A run is complete when the end of a row is reached or when a background pixel is reached. The maximum possible number of runs in an M × N image is ⌈M/2⌉ × N.
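For illustration, stage 1 can be written compactly in MATLAB for a single row. This is a hedged sketch following the run properties {ID, EQ, s, e, r}; the author's FPGA-oriented version processes multiple rows in parallel.

function runs = pixelToRuns(row, r)
% Convert one binary image row (vector of 0/1) into runs {ID, EQ, s, e, r}.
d = diff([0 row 0]);                 % +1 at run starts, -1 just past run ends
s = find(d == 1);                    % x offsets of start pixels
e = find(d == -1) - 1;               % x offsets of end pixels
runs = struct('ID', num2cell(zeros(size(s))), 'EQ', num2cell(zeros(size(s))), ...
              's', num2cell(s), 'e', num2cell(e), 'r', num2cell(r * ones(size(s))));
end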
The second stage involves initial labelling and propagation of labels. The IDs and equivalences (EQs) of all runs are initialized to zero. This is followed by a raster scan of the runs, assigning provisional labels that propagate to any adjacent runs on the row below. Any unassigned run (ID_i = 0) is given a unique value for both its ID and EQ. For each run i with ID ID_i, excluding runs on the last row of the image, the runs one row below run_i are scanned for an overlap. An overlapping run j in 4-adjacency (i.e., s_i ≤ e_j and e_i ≥ s_j) or 8-adjacency (i.e., s_i − 1 ≤ e_j and e_i + 1 ≥ s_j) is assigned the ID ID_i if and only if ID_j is unassigned. If there is a conflict (the overlapping run already has an assigned ID_j), the equivalence of run i, EQ_i, is set to ID_j. This is summarized in Algorithm 3.2.
Algorithm 3.2: InitLabelling(runs)
m ← 1
for i ← 1 to TotalRuns do
    if ID_i = 0 then
        ID_i ← EQ_i ← m
        m ← m + 1
    for each run r_j on the row below r_i do
        if ID_j = 0 and e_i ≥ s_j and s_i ≤ e_j then
            ID_j ← ID_i
            EQ_j ← ID_i
        if ID_j ≠ 0 and e_i ≥ s_j and s_i ≤ e_j then
            EQ_i ← ID_j
where TotalRuns excludes runs on the last row of the image. Applying PixelsToRuns() to the object in Figure 3.7 (a 'U'-shaped object) generates four runs, each with unassigned ID and EQ.
Figure 3.7 U-shaped object with 4 runs after PixelsToRuns
Table 3.1 Runs extracted from the object, by row
The third stage is the resolution of conflicts, as shown in Algorithm 3.3. In the example above (Figure 3.7 and Table 3.1), a conflict occurs at B3: the initially assigned EQ = 1 in iteration 1 changes to EQ = 2 in iteration 3 due to the overlap with B1 and B4 (see Table 3.1). This conflict is resolved in ResolveConflict(), resulting in ID = 2 and EQ = 2 for all four runs. Even though ResolveConflict() is highly sequential, it takes half the total cycles because the two if-statements in the second loop are executed simultaneously. The final IDs (final labels) are written back to the image at the appropriate pixel locations without scanning the entire image, since each run has associated s, e and r values.
Algorithm 3.3: ResolveConflict(runs)
for i ← 1 to TotalRuns do
    if ID_i ≠ EQ_i then
        TID ← ID_i
        TEQ ← EQ_i
        for j ← 1 to TotalRuns do
            if ID_j = TID then
                ID_j ← TEQ
            if EQ_j = TID then
                EQ_j ← TEQ
Table 3.2 Results after image scan, where ST = start, EN = end and RW = row

Run   ID   EQ   ST   EN   RW
B1     0    0    4    5    1
B2     0    0    1    2    2
B3     0    0    4    5    2
B4     0    0    1    5    3
Labeling is then complete. As shown in Figure 3.7 above, the runs are extracted during the scan while the 8-adjacency labeling is done. We use 8-neighbors, which share at least one corner; their positions around pixel [i,j] are [i+1,j], [i−1,j], [i,j+1], [i,j−1], [i+1,j+1], [i+1,j−1], [i−1,j+1] and [i−1,j−1], as shown in Figure 3.8 below.
Figure 3.8 The 8-neighborhood of a rectangular image pixel; [i,j] is located at the center of the figure
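For cross-checking the run-based labeling, MATLAB's built-in 8-adjacency labeling can be applied to the same binary image; this is a usage sketch only, not part of the proposed parallel algorithm.

labels = bwlabel(BW, 8);           % unique numeric label per connected component
numComponents = max(labels(:));    % how many components were found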
3.10 Fuzzy logic
Fuzzy logic is used to identify each character (connected component) in the image. After a connected component is determined, a rectangle is drawn around it. We must then know which pixel lies at each corner, along with the other pixels used to identify the connected component, as shown in Figure 3.9.
Figure 3.9 Identify the character
Afterward, each identified pixel is given a value between 0 and 1 and sent to the fuzzy system to recognize the identified pixels of the connected component. When the fuzzy system receives these values together with their "on" or "off" status, it determines whether the component is a character or noise. Figure 3.10 below shows the design of the fuzzy inputs for pixel location and pixel status.
Figure 3.10 (a) Example of fuzzy input
Figure 3.10 (b) Example of fuzzy outputs
3.11 Summary
This chapter discusses the general framework of the proposed methodology. The plan discussed in this chapter must be followed to reach the objective of this project, and each step of the methodology has been shown in some detail. First, the problem statement was outlined, followed by the literature review, system development, performance evaluation, the proposed technique, connected component labeling and fuzzy logic; each phase of the project procedure was discussed briefly. Finally, the last phase is the conclusion and the report writing. Each of these stages plays an important role in accomplishing the project.
CHAPTER IV
IMPLEMENTATION
4.1 Introduction

This chapter presents and discusses the findings of the project. The findings are presented as extraction of the text and recognition of the characters.
4.2 Input Image

The input is a colored image with resolution 255×256, from which the text will be extracted, as shown in Figure 4.1 below.
Figure 4.1 Original image
The proposed method is based on the fact that edges are a reliable feature of text regardless of color/intensity, orientation and scale. Edge strength, density and orientation variance are three distinguishing characteristics of text embedded in images, and they can be used as the main features for detecting text. The proposed method consists of three stages: candidate text region detection, text region localization and character extraction. It rests on the idea that edge information in an image is found by looking at the relationship between a pixel and its neighbours, i.e., an edge is found at a discontinuity of grey-level values. An ideal edge detector should produce an edge indication localized to a single pixel located at the mid-point of the slope.
The first derivative at any point in an image is obtained from the magnitude of the gradient at that point. A change of the image function can be described by a gradient that points in the direction of the largest growth of the image function. Here, 3×3 kernels (filters) are passed over the image; W(i,j) is the weight for pixel (i,j) in the image array, whose value is determined by the number of edge orientations within the window (filter). Figure 4.2 shows the structure of a kernel.
Figure 4.2 Structure of the 3×3 kernel (filter)

In the structure (filter) shown, the edge density is calculated from the average edge strength within a window. Based on a threshold, the center pixel is classified as a strong edge (edge detected) or a weak edge (no edge detected); Figure 4.2 marks the place of the detected edge in the filter as it lies over the image array, as in our example in Figure 4.3 below.
Figure 4.3 Our example of the convolution operation
Convolution is a simple mathematical operation which is fundamental to many common image processing operators. Convolution provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality. This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values.

In an image processing context, one of the input arrays is normally just a gray-level image. The second array is usually much smaller, is also two-dimensional (although it may be just a single pixel thick), and is known as the kernel. Figure 4.3 shows an example image and kernel.
The convolution is performed by sliding the kernel over the image, generally
starting at the top left corner, so as to move the kernel through all the positions where
the kernel fits entirely within the boundaries of the image. (Note that
implementations differ in what they do at the edges of images.) Each kernel position
corresponds to a single output pixel, the value of which is calculated by multiplying
together the kernel value and the underlying image pixel value for each of the cells in
the kernel, and then adding all these numbers together.
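As a concrete illustration of the sliding-kernel mechanics just described, below is a minimal MATLAB sketch for a 3×3 kernel K over a gray-level image img (both assumed variables). It is written in correlation form; MATLAB's conv2 additionally flips the kernel, as the final comment shows.

[h, w] = size(img);
out = zeros(h - 2, w - 2);                    % only positions where K fits entirely
for y = 1:h - 2
    for x = 1:w - 2
        patch = double(img(y:y+2, x:x+2));    % pixels under the kernel
        out(y, x) = sum(sum(patch .* K));     % multiply together, then add
    end
end
% Equivalent built-in call: out = conv2(double(img), rot90(K, 2), 'valid');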
The edge-detection operator is a matrix-area gradient operation that determines the level of variance between different pixels. It is calculated by forming a matrix centered on a pixel chosen as the center of the matrix area; if the value of this matrix area is above a given threshold, the middle pixel is classified as an edge. Based on this operation the image is divided into several regions; regions with text in them normally have much higher average values of edge density, strength and orientation variance than non-text regions. All gradient-based algorithms have kernel operators that calculate the strength of the slope in directions orthogonal to each other, commonly vertical through diagonal; the contributions of the different components of the slopes are combined to give the total value of the edge strength.
Eight types of kernel (filter) are used to detect edges in eight directions, as shown in Figure 4.4. Each kernel detects edges according to its angle, so by using these kernels edges can be detected in all directions.
Figure 4.4 The eight kernels used (0, 45, 90, 135, 180, 225, 270 and 315 degrees)
Various kernels can be used for this operation. The whole set of 8 kernels is
produced by taking one of the kernels and rotating its coefficients circularly. Each of
the resulting kernels is sensitive to an edge orientation ranging from 0° to 315° in
steps of 45°, where 0° corresponds to a vertical edge.
The maximum response for each pixel is the value of the corresponding pixel
in the output magnitude image. The values for the output orientation image lie
between 1 and 8, depending on which of the 8 kernels produced the maximum
response. This edge detection method is also called edge template matching, because
a set of edge templates is matched to the image, each representing an edge in a
certain orientation. The edge magnitude and orientation of a pixel is then determined
by the template that matches the local area of the pixel the best as shown in Figure
4.5 below.
Figure 4.5 Directions of edge detection
The edge detector is an appropriate way to estimate the magnitude and orientation of an edge. Although differential gradient edge detection needs a rather time-consuming calculation to estimate the orientation from the component magnitudes, template-based edge detection obtains the orientation directly from the kernel with the maximum response. The set of kernels is limited to 8 possible orientations; however, experience shows that most direct orientation estimates are not much more accurate. On the other hand, the template approach needs 8 convolutions for each pixel, running from the "E" kernel, sensitive to edges in the vertical direction, around to the last "SE" kernel for the diagonal direction. The result for the edge magnitude image is very similar for both methods, provided the same convolving kernel is used, as shown in Figure 4.5.
A variety of edge detectors are available for detecting edges in digital images, each with its own advantages and disadvantages. The basic idea behind edge detection is to find places in an image where the intensity changes rapidly. Based on this idea, an edge detector may either locate the places where the first derivative of the intensity is greater in magnitude than a specified threshold, or find the places where the second derivative of the intensity has a zero crossing. In template-based edge detection, the image is convolved with a set of (in general 8) convolution kernels, each of which is sensitive to edges in a different orientation. For each pixel, the local edge gradient magnitude is estimated as the maximum response over all 8 kernels at that pixel location:
|G| = max(|Gi| : i = 1, …, n)
where Gi is the response of kernel i at the particular pixel position and n is the number of convolution kernels. The local edge orientation is estimated from the orientation of the kernel that yields the maximum response.
We now explain how the magnitude is computed with kernel "0", i.e., the vertical direction at "E", as shown below in Figure 4.6; the same procedure applies to all eight kernels from 0 to 315.

Figure 4.6 Structure of the convolution (kernel "0", its kernel structure and a sample image)
The vertical edge component is calculated with kernel KE. |KE| gives an indication of the intensity of the gradient at the current pixel. The direction of the gradient is given by the mask with the maximal response; this is valid for all the following operators approximating the first derivative. Figure 4.7 below explains the operation of kernel 0.

Figure 4.7 Operation of kernel 0
GE = (Z3·a3 + Z6·a6 + Z9·a9) − (Z1·a1 + Z4·a4 + Z7·a7)
Total edges = |GE| + |GNE| + |GN| + |GNW| + |GW| + |GSW| + |GS| + |GSE|
The gradient is estimated in eight possible directions (for a 3×3 convolution mask), and the convolution result of the greatest magnitude indicates the gradient magnitude. Operators approximating the first derivative of an image function are sometimes called compass operators because of their ability to determine gradient direction. A proper threshold value therefore has to be selected so that only real edges are kept and false edges are rejected. The selection of a threshold value is an important design decision that depends on a number of factors, such as image brightness, contrast, level of noise, and even edge direction. Typically, the threshold is selected following an analysis of the gradient image histogram; the selection of the threshold is thus an important parameter for obtaining better performance on noisy images. The output of the thresholding stage is extremely sensitive, and there are no automatic procedures for satisfactorily determining thresholds that work for all images.
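One common data-driven choice, in line with the histogram analysis mentioned above, is Otsu's method; the MATLAB sketch below illustrates it (gradMag, the gradient-magnitude image, is an assumed variable, and this is only one possible thresholding scheme, not the project's prescribed one).

g = mat2gray(gradMag);    % normalize the gradient magnitudes to [0, 1]
t = graythresh(g);        % Otsu threshold derived from the histogram
edges = g > t;            % keep only responses above the threshold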
Here, the image has been edge-detected with the eight kernels after convolution with the Gaussian pyramid, as shown in Figure 4.8 below.
Kernel 0: block structure and result
Figure 4.8 (a) Detected edges for kernel 0

Figure 4.8 (a) shows edge detection using kernel 0. The kernel's block structure divides the response into white and black parts; the black part marks what is detected from the image, and within it the white regions represent the detected edges while the black regions represent the strokes, which is how the edges are detected. The same interpretation applies to the remaining kernels in Figures 4.8 (b) to (h); for kernel 180 in Figure 4.8 (e) the roles of white and black are reversed.
Kernel 45: block structure and result
Figure 4.8 (b) Detected edges for kernel 45
Kernel 90: block structure and result
Figure 4.8 (c) Detected edges for kernel 90
Kernel 135: block structure and result
Figure 4.8 (d) Detected edges for kernel 135
Kernel 180: block structure and result
Figure 4.8 (e) Detected edges for kernel 180
Kernel 225: block structure and result
Figure 4.8 (f) Detected edges for kernel 225
Kernel 270: block structure and result
Figure 4.8 (g) Detected edges for kernel 270
Kernel 315: block structure and result
Figure 4.8 (h) Detected edges for kernel 315
4.3 Complementary Edge Detection

After the edges have been detected and each edge map convolved with the Gaussian pyramid, the two edge maps of each opposite pair are added together, giving the complete edge detection for both angles. Figure 4.9 below shows the effect of combining two edge maps: the black regions represent the background or strokes, and the white regions represent the detected edges. Edges are detected vertically at angles (0, 180), diagonally at angles (45, 225) and (135, 315), and horizontally at angles (90, 270).

Figure 4.9 Effect of adding two edges: a) edges "0" and "180", b) edges "45" and "225", c) edges "90" and "270", d) edges "135" and "315"
4.4 Eight Edges Detection
Figure 4.10 shows the result of adding all edges that have been detected.
Figure 4.10 Total of edges detection
4.5 Image Localization

The process of localization further enhances the text regions by eliminating non-text regions. Normally, text embedded in an image appears in clusters, i.e., it is arranged compactly, so the characteristics of clustering can be used to localize text regions. Since the intensity of the feature map represents the possibility of text, a simple global thresholding can be employed to highlight the regions with high text possibility, resulting in a binary image. A morphological dilation operator can easily connect very close regions together while leaving regions that are far from each other isolated. In the proposed method, a morphological dilation operator with a 3×3 square structuring element is applied to the previously obtained binary image to obtain joint areas referred to as text blobs. Two constraints are then used: the first filters out all very small isolated blobs, and the second filters out blobs whose widths are much smaller than their corresponding heights, as shown in Figure 4.11 below.
Figure 4.11 Localized text
4.6 Separate Text from Background

The text and background should be monochrome, and the background-to-text contrast should be high; this process therefore generates an output image with white text against a black background. The effect of this operation is shown in Figure 4.12.
Figure 4.12 Separate text from background
The algorithm has also been tested on other images; the results can be seen in Figures 4.13 and 4.14 below.
Figure 4.13 Test image 1: a) image, b) localization, c) result
Figure 4.14 Test image 2: a) image, b) localization, c) result
4.7 Reduce Size

4.7.1 Determine Borders

The borders should be removed; this reduces the image size. Only the rectangular part of the image that contains the text remains, as shown in Figure 4.15 below.
Figure 4.15 Determine borders
4.7.2 Divide Text into Rows
Now, after determining the region of text, the text is divided into rows; dilation and erosion are also used to separate attached characters, such as those found in the word "masters" in Figure 4.15 above. The result is shown in Figures 4.16 (a) and (b) below.
Figure 4.16 (a) row one
Figure 4.16 (b) row two
4.8 Determine Characters by Run-Length and Recognize by Fuzzy Logic

Here, we determine the characters, how many characters there are in each word, and how many words there are in each text line; finally, the text is extracted from the image. Each connected component then has its identified pixels, as shown in Figure 4.17 below.
Figure 4.17 Identified character
Let us suppose the following:
N1 = upper left, center
N2 = upper center, on
N3 = upper right, off
N4 = center center, off
N5 = center center, on
N6 = center right, off
N7 = lower left, off
N8 = lower center, on
N9 = lower right, off
N10 = half lower center
Now, the identified pixels (N1 to N10) are sent to the fuzzy logic system, which has ten inputs and one output, as shown in Figures 4.18, 4.19 and 4.20 below.
Figure 4.18 Ten inputs and one output
Figure 4.19 Input one N1
Figure 4.20 Output
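A minimal sketch of such a Mamdani system, using the classic Fuzzy Logic Toolbox API, is given below. The membership functions, the example rule and the output band for "s" are illustrative assumptions, not the project's tuned design.

fis = newfis('charRecog');                        % Mamdani FIS
for k = 1:10                                      % ten pixel inputs N1..N10
    fis = addvar(fis, 'input', sprintf('N%d', k), [0 1]);
    fis = addmf(fis, 'input', k, 'off', 'trimf', [0 0 0.5]);
    fis = addmf(fis, 'input', k, 'on',  'trimf', [0.5 1 1]);
end
fis = addvar(fis, 'output', 'character', [0 10]);
fis = addmf(fis, 'output', 1, 's', 'trimf', [0 1 2]);   % "s" maps to the 0..2 band
% One illustrative rule: N2, N5 and N8 "on", the side pixels "off"
% (0 = don't care; last three entries: output MF, weight, AND connective).
rule = [0 2 1 1 2 1 1 2 1 0 1 1 1];
fis = addrule(fis, rule);
score = evalfis(pixelValues, fis);                % pixelValues: 1x10 vector in [0,1]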
As can be seen from the output shown in Figure 4.20, the output value for the character "s" lies between 0 and 2, so this character can be recognized as "s"; the same scheme is applied to all characters, and the output for the whole text is shown in Figure 4.21 below.
Figure 4.21 Output of extracted text
4.9 Summary

An attempt was made to evaluate the performance of edge detection for images. Experimental results demonstrate that edge detection works quite well for digital images with multiple scales and orientations, whereas this type of edge detection cannot be used directly on practical images that are generally corrupted with other types of noise. However, it can be used successfully in conjunction with a suitable digital filter that substantially reduces the effect of noise before edge detection is applied. The result for the edge magnitude image is very similar across all kernels, provided the same convolving kernel is used.
CHAPTER V
RESULTS DISCUSSION
5.1 Introduction

Recently, optical character recognition (OCR) has become widely used for character extraction from images, and there is a very large number of approaches to text extraction from images. This research extracts text by applying edge detection using eight kernels. Identified pixels are then used for character recognition after the connected components are determined, and fuzzy logic is used as the character classifier after the identified pixels are sent to it.
5.2 Discussion on Results

The project findings include the following.
We present an effective and robust general-purpose text detection and extraction algorithm that can automatically detect and extract text from complex images and video frames. Following the property that text has higher edge strength in eight directions, edge detection is applied to obtain eight directional edge maps. Then, based on the weaker text properties, text features are extracted to statistically characterize text and non-text areas. Using these edge features, we propose a text detection algorithm for images and video frames. The algorithm has good detection performance and is robust in detecting text; it also performs well under multiple scales and orientations, being able to detect text edges of different sizes. With eight filters detecting in eight directions, the image array is characterized more specifically: the image area is divided into eight directional responses, each detected by the corresponding filter. The approach rests on the idea that edge information in an image is found by looking at the relationship between a pixel and its neighbors, i.e., an edge is found at a discontinuity of grey-level values; an ideal edge detector should produce an edge indication localized to a single pixel located at the mid-point of the slope. Edge detection is an appropriate way to estimate the magnitude and orientation of an edge: although differential gradient edge detection needs a rather time-consuming calculation to estimate the orientation from the component magnitudes, template-based edge detection obtains the orientation directly from the kernel with the maximum response.
As noted, the detected edges are strong edges that represent the edges of the text; the kernel structure, through its center pixel, determines whether a strong edge exists and thereby identifies the edge. Among the detected edges there may also be long edges that are not part of the text, so a maximum-length standard is applied, and any edge longer than this standard is removed. The run-length-encoding-based connected component labeling algorithm that we have successfully implemented is an extension that exploits the desirable properties of run-length encoding, combined with the ease of parallel conversion to run-length encoding.
The "identified pixel" method of recognizing characters was fast and gave suitable performance, but blurred characters are difficult to recognize because the method depends on determining the corners of the connected component and the points on its sides. Finally, the fuzzy logic systems were developed with the Matlab Fuzzy Toolbox; the systems are based on the "Mamdani" fuzzy approach. The first task is to define the inputs and the output of the fuzzy system; this stage depends on expert decision.
Finally, the text areas are identified by empirical rule analysis and refined through projection profile analysis. Experiments with various kinds of natural images and video frames show that the proposed method is effective in distinguishing text regions from non-text regions and is robust to font size, font color, background complexity and language. In future work on text detection in videos, the performance needs to be further improved for text captured by cameras under strong illumination changes and text distortion.
5.3 Experimental Results and Discussion

In order to evaluate the performance of the proposed method, we use ten test images with different font sizes, orientations, perspectives and alignments. Figures 5.1 to 5.10 show some of the results. We can see that the proposed method localizes and extracts the text from images with different font sizes, orientations and perspectives.
Figure 5.1 Sample 1: image, localization, result
Figure 5.2 Sample 2: image, localization, result
Figure 5.3 Sample 3: image, localization, result
Figure 5.4 Sample 4: image, localization, result
Figure 5.5 Sample 5: image, localization, result
Figure 5.6 Sample 6: image, localization, result
Figure 5.7 Sample 7: image, localization, result
Figure 5.8 Sample 8: image, localization, result
Figure 5.9 Sample 9: image, localization, result
Figure 5.10 Sample 10: image, localization, result
There is no universally accepted method for evaluating the performance of text localization, so we assess the accuracy of our algorithm's output by manually counting the number of correctly localized characters against the ground truth, in terms of precision rate and recall rate. The performance can be evaluated using equations (5.1), (5.2) and (5.3).
Precision = (correctly located / (correctly located + false positives)) × 100%    (5.1)

Recall = (correctly located / (correctly located + false negatives)) × 100%    (5.2)

False positive rate = (false positives / (correctly located + false negatives)) × 100%    (5.3)
Correctly located: the correct localization of text existing in the image, i.e., the localized region lies exactly on the place of the text.
False positive: a region localized as text that does not actually contain text (a non-text object).
False negative: a real text region that the localization process missed.
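These rates are simple ratios of the manual counts; a minimal MATLAB sketch of equations (5.1) to (5.3) is given below, where correct, falsePos and falseNeg stand for the manually counted values.

precision = correct / (correct + falsePos) * 100;    % eq. (5.1)
recall    = correct / (correct + falseNeg) * 100;    % eq. (5.2)
falseRate = falsePos / (correct + falseNeg) * 100;   % eq. (5.3)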
Table 5.1 Performance evaluation 1

Sample     Resolution   Precision rate   Recall rate   False positive
Sample1    256x255      75%              66%           33%
Sample2    150x190      68%              65%           53%
Sample3    370x280      75%              58%           30%
Sample4    169x170      80%              83%           80%
Sample5    406x307      67%              67%           50%
Sample6    250x190      80%              60%           90%
Sample7    460x480      60%              70%           65%
Sample8    290x220      62%              67%           60%
Sample9    408x300      55%              62%           40%
Sample10   373x498      60%              71%           65%
Now we evaluate the performance of the text extraction. We assess the accuracy of the algorithm's output by manually counting the number of falsely detected characters and the number of detected characters in the image. Given the marked ground truth and the detected result of our algorithm, we can calculate the recall and false alarm rates using equations (5.4) and (5.5).
Recall = (number of correctly detected text / number of text) × 100%    (5.4)

False alarm rate = (number of falsely detected text / number of detected text) × 100%    (5.5)
Number of correctly detected text: the number of characters detected correctly in the text.
Number of text: the number of characters in the text.
Number of falsely detected text: the number of detected characters that are not actually text characters.
Table 5.2 Performance evaluation 2

Sample     Resolution   False alarm rate   Recall rate
Sample1    256x255      25%                80%
Sample2    150x190      5.26%              95%
Sample3    370x280      33%                75%
Sample4    169x170      2%                 99%
Sample5    406x307      44%                69%
Sample6    250x190      1%                 99%
Sample7    460x480      53%                65%
Sample8    290x220      20%                80%
Sample9    408x300      40%                70%
Sample10   373x498      30%                75%
Table 5.3 Performance evaluation 3

Sample     Resolution   Feature        Accuracy
Sample1    256x255      Cover book     80%
Sample2    150x190      Cover book     95%
Sample3    370x280      Poster         75%
Sample4    169x170      Poster         99%
Sample5    406x307      Poster         69%
Sample6    250x190      Note           99%
Sample7    460x480      Poster         65%
Sample8    290x220      Poster         80%
Sample9    408x300      Cover product  70%
Sample10   373x498      Cover book     75%
5.4 Project Advantage
The following are some advantages of the project. Edge detection obtains the edge maps of the image, which decreases the influence of the background and effectively detects the initial text candidates; the text feature is computed at each pixel from the edge detection, using the kernels on the candidate text. This approach can effectively detect text with different font sizes, font colors, languages, spacing, distributions and background complexities, since these distinct characteristics can be used to find possible text areas. Text is mainly composed of strokes in the horizontal, vertical, up-right and up-left directions, so regions with higher edge strength in these directions can be considered text regions; we use the edge detector to obtain edge maps in eight directions. However, due to uneven illumination and/or reflection, long vertical edges produced by non-character objects may have a large intensity variance, and after thresholding these long vertical edges may become broken short edges, which may cause false alarms (false positives); likewise, character surfaces can be uneven because of varied lighting and shadows, as well as the nature of the character shapes themselves. The basic idea behind edge detection is to find places in an image where the intensity changes rapidly: an edge detector may either locate the places where the first derivative of the intensity is greater in magnitude than a specified threshold, or find the places where the second derivative of the intensity has a zero crossing.
5.5 Suggestions and Future Works

There are several suggestions and future works that can be areas of interest for researchers and developers aiming to improve and enhance overall performance. Our main future work involves using a suitable existing OCR technique to recognize the extracted text. The possible contributions are: first, handling both printed document and scene text images; second, reduced sensitivity to image color/intensity and robustness with respect to font, size, orientation, uneven illumination, perspective and reflection effects; third, distinguishing text regions from texture-like regions, such as window frames and wall patterns, by using the variance of edge orientations, and producing a binary output that can be used directly as input to an existing OCR engine for character recognition without any further processing; a suitable existing OCR technique could also be used to recognize extracted text from landmarks. Moreover, the basic criterion for using edge detection on digital images is that the image should contain sharp intensity transitions and low noise of the Poisson type; handling images that do not satisfy these conditions is left for future work.
5.6 Conclusion

In this project we used edge detection for text detection in complex images, using eight kernels (filters) to accomplish this task. We then used identified pixels to determine each character by means of fuzzy logic. We also fulfilled the aims, objectives and scope of the project which were outlined before.
CHAPTER VI
CONCLUSION
Many methods exist to automatically detect and extract text from complex images, according to the text properties present in the image. We use edge detection in eight directions, where each direction detects the edge strength whose higher density represents a detected edge in the image. Our algorithm's outputs were satisfactory in this process, using the magnitude and orientation to detect the edges of the text: a structuring element (kernel) with its center pixel is used to calculate the edge, and if the edge falls below the required threshold it is ignored, otherwise it is selected as a detected edge.

Edge strength, which represents the edges of the text, is determined by the structuring element at the center pixel; among the detected edges there may also be long edges that are not part of the text. Identified pixels recognize characters quickly and with suitable performance, but blurred characters are difficult to recognize because the method depends on determining the identified pixels. Finally, the text areas are identified by empirical rule analysis and refined through projection profile analysis. Experiments with various kinds of natural images and video frames show that the proposed method is effective in distinguishing text regions from non-text regions and is robust to font size, font color, background complexity and language. In future work on text detection in videos, the performance needs to be further improved for text captured by cameras under strong illumination changes and text distortion.
REFERENCES
Alasdir (2004). Introduction to Digital Image Processing with Matlab. Springer-Verlag, Berlin Heidelberg, 2007.
Chunmei, Chunheng and Ruwei (2005). Text Detection in Images Based on Unsupervised Classification of Edge-Based Features. Proceedings of the Eighth International Conference on Document Analysis and Recognition, IEEE 0-7520-5263, China, 2005.
Chengjie, Jie and Trac (2002b). Adaptive Run-Length Coding. IEEE 0-7803-7622-6, Baltimore, MD 21218.
Datong, Herve and Jean (2001). Text Identification in Complex Background Using SVM. IEEE 0-7695-1272-0, Dalle Molle Institute for Perceptual Artificial Intelligence, Switzerland, 2001.
Ezaki, Bulacu and Schomaker (2004). Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons. In Proceedings of the International Conference on Pattern Recognition (ICPR'04), pp. 683-686, 2004.
Fuzzy Logic Toolbox User's Guide (2006f). The MathWorks, Inc.
Gatos, Pratikakis, Kepene and Perantonis (2005a). Text Detection in Indoor/Outdoor Scene Images. National Center for Scientific Research "Demokritos", GR-153 10 Agia Paraskevi, Athens, Greece.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975.
Hyeran and Seong-Whan (2002a). Applications of Support Vector Machines for Pattern Recognition: A Survey. SVM 2002, LNCS 2388, pp. 213-236, 2002.
Jagath and Xiaoqing (2006b). An Edge-Based Text Region Extraction Algorithm for Indoor Mobile Robot Navigation. International Journal of Signal Processing, 3(4). Western Ontario, London, ON, N6A 5B9, Canada.
Jie, Jigui and Shengsheng (2006d). Pattern Recognition: An Overview. IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 6, Changchun 130012, China, June 2006.
Jiang and Jie (2000). An Adaptive Algorithm for Text Detection from Natural Scenes. University of Pittsburgh, 2000.
Kofi, Andrew, Patrick and Jonathan (2007b). Run-Length Based Connected Component Algorithm for FPGA Implementation. University of Lincoln, England, 2007.
Kongqiao and Jari (2003b). Character Location in Scene Images from a Digital Camera. Journal of the Pattern Recognition Society, Tampere, Finland, 2003.
Kwang, Keechul and Jin (2003c). Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 12, December 2003.
Kim, Byun, Song, Choi, Chi and Chung (2004). Scene Text Extraction in Natural Scene Images Using Hierarchical Feature Combining and Verification. Proceedings of the 17th International Conference on Pattern Recognition, IEEE 1051-4651/04.
Mohanad and Mohammad (2006e). Text Detection and Character Recognition Using Fuzzy Image Processing. Journal of Electrical Engineering, Vol. 57, No. 5, Jordan, 2006.
Matsuo, Ueda and Michio (2002d). Extraction of Character Strings from Scene Images by Binarizing Local Target Areas. Transactions of the Institute of Electrical Engineers, 122-C(2), 232-241, Japan, 2004.
Pavlidis, T. (1977). Structural Pattern Recognition. Springer-Verlag, New York, 1977.
Qixiang, Wen, Weiqiang and Wei (2003a). Robust Text Detection Algorithm in Images and Video Frames. IEEE 0-7803-8185-8, Chinese Academy of Sciences, China.
Qixiang, Qingming, Wen and Debin (2005b). Fast and Robust Text Detection in Images and Video Frames. Image and Vision Computing, 23, China, 2005.
Roshanak and Shohreh (2005c). Text Segmentation from Images with Textured and Colored Background. Sharif University of Technology, Tehran, Iran.
Rabbani and Chellappan (2007a). Fast and New Approach to Gradient Edge Detection. International Journal of Soft Computing, 2(2), 325-330, India, 2007.
Rainer and Axel (2002c). Localizing and Segmenting Text in Images and Videos. University of Pittsburgh, 2002.
Sivanandam, Deepa and Sumathi (2007). Introduction to Fuzzy Logic Using Matlab.
Tsai, Chen and Fang (2006c). A Comprehensive Motion Videotext Detection, Localization and Extraction Method. IEEE 0-7803-9584-0, Taoyuan County 320, Taiwan, R.O.C.
Takuma, Yasuaki and Minoru (2003d). Digit Classification on Signboards for Telephone Number Recognition. Proceedings of the Seventh International Conference on Document Analysis and Recognition, IEEE 0-7695-1960-1, Japan, 2003.
Victor, Raghavan and Edward (1999). TextFinder: An Automatic System to Detect and Recognize Text in Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 11, November 1999, Amherst.
Xiaoqing and Jagath (2006a). Multiscale Edge-Based Text Extraction from Complex Images. IEEE 1-4244-0367-7, London, Ontario, N6A 5B9, Canada.
Xilin, Jie, Jing and Alex (2003e). Automatic Detection of Signs with Affine Transformation. University Mobile Technologies, 2003.
Yuzhong, Kallekearu and Anil (1995). Locating Text in Complex Color Images.
APPENDICES
x = imread('image.jpg');     % read the input image
x1 = reducesize(x);          % reduce size by half, eight times (pyramid)
x2 = convolve(x1);           % convolve each image with the kernels
x3 = imresize(x2);           % return each image to the original size
x4 = addedge(x3);            % collect the edges together (x4 is the total of the added edges)
x5 = dilation(x4);           % dilate the image
x6 = erosion(x5);            % erode the image
x7 = eliminated(x6);         % eliminate long edges
x8 = extract(x7);            % extract the text from the image
imagebinary(x8);             % produce the binary image

A1: Matlab commands to find the binary image (reducesize, convolve, addedge, dilation, erosion, eliminated, extract and imagebinary are the project's own helper functions)
x1 = removeborders('image.jpg');   % read the image and remove its borders
x2 = dividetintotext(x1);          % divide the image into text regions
x3 = dividetextintorows(x2);       % divide the text into rows
x4 = bwlabel(x3, 8);               % label the connected components
x5 = identifedpixel(x4);           % identify the pixels of each connected component
x6 = sendfuzzylogic(x5);           % send the data to the fuzzy logic system
x7 = recognizecharacter(x6);       % recognize the characters

A2: Matlab commands using fuzzy logic to identify characters (removeborders, dividetintotext, dividetextintorows, identifedpixel, sendfuzzylogic and recognizecharacter are the project's own helper functions)