International Journal of Engineering Trends and Technology (IJETT) – Volume 22 Number 5 - April 2015
ISSN: 2231-5381

A Review Paper on Character Recognition using Binarization technique

Shalu
M.Tech (CSE), Sri Ram College of Engineering, Palwal (India)

Abstract: This paper presents a review of the use of the binarization technique for character recognition. A well-known document image analysis product is Optical Character Recognition (OCR) software, which recognizes characters in a scanned document. The main focus of this work is to extract features obtained by the binarization technique for the recognition of handwritten characters of the English language. Several techniques, such as noise removal, image normalization and feature extraction, are used in the preprocessing phase. Using the binarization technique for feature extraction provides very promising results, and the classifier used to recognize the handwritten characters is a multilayer feed forward neural network.

Keywords: OCR, Binarization technique, Feature Extraction, Multilayer feed forward network

I. INTRODUCTION

Paper is important in our daily life because it is cheap, reliable, easily available, flexible in filling, secure for future reference and easy to keep. A huge amount of important historical data is also written on paper. So there is a great demand to digitize all these paper documents so that people all over the world can access these important sources of knowledge. For this purpose, the image of handwritten text is preprocessed and segmented into individual characters, which are then recognized by a neural network classifier. The process of reading handwritten text from static surfaces is termed off-line cursive handwriting recognition. Simulating the behaviour of the human brain in a machine has opened innovative prospects for improving the man-machine interface. For the last four decades, the classification of cursive and unconstrained handwritten characters has been a major issue in this field of research.

A) Recognition phases

There are 4 main phases in the given work. First, the RGB image is converted into a grayscale image. Then feature extraction is performed, a binary image is obtained using the binarization technique, and finally the characters are recognized.

Figure 1: Recognition Phases

B) Character Recognition

Optical character recognition (OCR) is the commonly used term for character recognition, the conversion of digital or handwritten images into computer-readable form. It is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to the implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the term OCR has now been broadened to include digital image processing as well.

OCR deals with machine recognition of the characters present in an input image obtained by a scanning operation. The input document is read, preprocessed, feature extracted and recognized, and the recognized text is displayed in a picture box. The goal of OCR is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The process involves several steps, including segmentation, feature extraction and classification.

In academic systems and library management, the significance of OCR is already proven. Character recognition was originally performed using mirrors or lenses.
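The first recognition phases listed in Section A (RGB-to-grayscale conversion followed by binarization) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `rgb_to_gray` and `binarize`, the fixed threshold of 128, and the choice to mark dark ink as 1 are hypothetical and are not taken from the reviewed work. The luminance weights are the common ITU-R BT.601 coefficients.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def rgb_to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image (rows of pixels) to grayscale intensities.

    Uses the BT.601 luminance weights, a common convention (assumed here).
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (dark ink, foreground) or 0 (light background).

    The global threshold of 128 is an assumption for illustration only.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# Toy 2x3 "image": dark strokes on a light background.
toy_image = [
    [(255, 255, 255), (0, 0, 0), (255, 255, 255)],
    [(10, 10, 10), (250, 250, 250), (30, 30, 30)],
]
gray_image = rgb_to_gray(toy_image)
binary_image = binarize(gray_image)
print(gray_image)
print(binary_image)
```

A single global threshold is the simplest possible choice. Unevenly lit scans would likely need an adaptive threshold instead, which this sketch does not attempt.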
Character recognition is treated as a separate field so that characters can be recognized effectively. OCR is an important image processing application in which recognition is based on multiple parameters. These parameters include feature extraction and feature specification. The features depend on the algorithmic approach adopted to extract them.

C) Applications of Character Recognition

Character recognition is widely used and applied in different fields. Some of the fields that use this approach are Data Entry, Form Readers, Aid for the Blind, Text Entry etc.

Process automation is an area of application in which some particular process is controlled. The general approach is to gather all the available information and to use the postcode as a redundancy check.

Signature verification and identification is an area useful for banking purposes. The identity of the writer is established without reading the handwriting. The pattern to be matched is simply a signature, which is compared with the signatures collected in a database.

Automatic cartography is helpful for recognizing characters on maps. Graphics and symbols are mixed, and different fonts and styles can be present during recognition.

Automatic number plate readers are used for vehicles. Here the input image must be captured by a fast camera and, unlike in the other applications, it is not a bilevel image, which makes recognition more complex.

II. EXISTING WORK

A lot of work has already been done by different researchers to improve the recognition of characters and make it more effective. Some of the efforts of earlier researchers are discussed in this section.

Dapeng Tao [1] presented a work on the recognition of similar handwritten Chinese characters using a discriminative locality alignment approach. The work uses a hybrid learning approach that combines locality alignment and subspace analysis. The defined approach is kernel based: it uses PCA as the initial stage, followed by discriminative locality analysis. The obtained results show that the proposed approach provides effective accuracy.

Another comparative study of different recognition and classification approaches for handwritten Devnagari character recognition was done by U. Pal [2]. In this work the author presented an analysis of 12 classifiers with 4 feature sets. The approaches include projection distance analysis, the subspace method, the linear discriminant function etc. The author performed the analytical study under different kinds of information analysis, such as curvature based and gradient based analysis.

David Andre [3] presented a work on learning and updating rules in an OCR system using a genetic approach. The author defined a genetic programming approach for effective character identification, using human hand-coded rules for initial population generation and rule updating. The author analyzed the work on different datasets under different real-world problems such as noise.

Angelo Marcelli [4] presented a work based on structural analysis for shape based recognition. Effective encoding and transformation are applied so that an effective vector space is generated and processed under a genetic approach. The vector based structural analysis is performed under the genetic approach to carry out the recognition.

Soumen Bag [5] presented a work on the recognition of handwritten characters based on the structural shape of the character. Skeletal convexity is used to describe the shape of the character. Recognition is done using Longest Common Subsequence matching. A dataset of handwritten Bengali characters was used for testing, and promising preliminary results were obtained.

Another work on feature extraction based character recognition using a neural network was done by J. Pradeep [6] in 2011. The author defined the analysis under different feature extraction approaches followed by a neural network. This approach includes an effective character training process so that the recognition rate is improved. The author defined the work for English alphabets.

In 2011, Huiqin Lin [7] presented research to improve an assignment correcting system using character recognition. The presented approach is a distribution based segmentation model in which segmentation is done using centroid based analysis, and angular analysis is used to perform the recognition. The author applied deflection based adaptive correction so that high-accuracy matching can be performed. The author also aimed to improve the recognition rate using thinning and regularity analysis between character positions. This kind of analysis includes a skeleton based recognition process.

In 2013, Cao Xinyan [8] presented the work "Handwritten Mathematical Symbol Recognition Based on Niche Genetic Algorithm". This method makes use of the searching ability of the ecological niche genetic algorithm and of the nonlinear mapping and associative ability of the BP neural network. It extracts the coarse grid characteristics, the projection features, the cross-cut characteristics and the structural features. It then uses the selection, crossover, mutation and elimination operations of the ecological niche genetic algorithm to optimize the initial weight values and thresholds of the BP neural network. Finally, the well-trained NGA-BP network recognizes the mathematical symbols.

In 2010, Reza Azmi [9] presented the work "A hybrid GA and SA algorithms for feature selection in recognition of hand-printed Farsi characters". In this research a hybrid feature selection technique based on genetic and simulated annealing algorithms is proposed. The approach is evaluated using a Bayesian classifier on a dataset of hand-printed Farsi characters.

Another work, on Tamil character recognition using a neural network, was proposed by P. Banumathi [10] in 2011. The author defined the process under different styles, shapes, sizes and orientations. The author worked on paragraphs and separated them using a segmentation approach based on centroid analysis together with a special dot based segmentation approach.

Anshul Gupta [11] presented an offline recognition system for handwritten English words. Recognition approaches are categorized into two classes, holistic and segmentation based. In the holistic class, feature extraction is done according to the size of the vocabulary. Segmentation uses bottom-up approaches that eventually produce a meaningful text. Finally, the postprocessing stage uses a lexicon to increase the recognition accuracy.

Jia Zeng [12] defined a character modelling and recognition approach based on statistical structure analysis and a Markov model based recognition process. This structure analysis includes stroke analysis, neighbourhood character analysis and an encoding technique so that effective recognition can be performed. The recognition process is performed on certain features using the Markov model based predictive approach. The work was implemented on a Korean dataset, and the obtained results show that it improves the accuracy of the system.

III. PROPOSED WORK AND IMPLEMENTATION

Image acquisition is performed first: handwritten character images are captured using a digital camera or scanned using a scanner. All the characters are converted to an image format such as .jpg or .bmp. The samples can be written with pens of different colours. In the studied work, samples were contributed by 10 different people, and a total of 1300 character image samples were collected.

A) Preprocessing

Preprocessing of an image is done to remove variability in handwritten characters. In this phase, grayscale conversion and binarization are performed.

A.1. GrayScale Conversion

In this phase of preprocessing, the input image of a handwritten character in .bmp or .jpg format is converted to grayscale format using the MATLAB function 'rgb2gray'.

A.2. Binarization

The goal of binarization is to minimize the unwanted information present in the image while protecting the useful information. It must preserve as much of the useful information and detail in the image as possible and, on the other hand, efficiently eliminate the background noise associated with the image.

Figure 2: Stepwise Input Image Conversions into Different formats

B) Feature Extraction

Curvic feature extraction is performed to identify the image features. This feature set is used as the basic training set for the segmentation and classification process.

Figure 3: Extracted features of image in Binary format

C) Implementation

The implementation is performed using the Neural Network Training tool (nntraintool).

Figure 4: nntraintool

The number of network learning iterations must be selected in such a way that the network converges properly with the least generalization error. The maximum number of epochs allowed for the training process has been set to 100000.
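The stopping rule for training can be illustrated with a short Python sketch. It is a stand-in under stated assumptions, not the reviewed implementation: a single sigmoid neuron trained on a toy AND problem replaces the multilayer network and nntraintool, and the names `train`, `forward`, `goal` and `lr`, as well as the error goal of 0.01, are hypothetical. Only the cap of 100000 epochs comes from the text.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input vector, target in {0, 1})


def forward(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Output of one sigmoid neuron for input vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Sample], max_epochs: int = 100000,
          goal: float = 0.01, lr: float = 0.5):
    """Batch gradient descent that stops at the error goal or at max_epochs.

    Returns (weights, bias, epochs_run, mse, converged). The mean squared
    error is measured at the start of each epoch, before the weight update.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        squared_error = 0.0
        for x, target in samples:
            y = forward(weights, bias, x)
            err = y - target
            squared_error += err * err
            delta = err * y * (1.0 - y)  # error times sigmoid derivative
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi
            grad_b += delta
        mse = squared_error / len(samples)
        if mse <= goal:
            return weights, bias, epoch, mse, True  # converged in time
        # Constant factors of the gradient are absorbed into the learning rate.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias, max_epochs, mse, False  # epoch cap reached


AND_SAMPLES: List[Sample] = [
    ((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0),
]
weights, bias, epochs_run, mse, converged = train(AND_SAMPLES)
print(converged, epochs_run)

# With a cap of one epoch the goal cannot be met, so training just stops.
_, _, short_epochs, _, short_converged = train(AND_SAMPLES, max_epochs=1)
print(short_converged, short_epochs)
```

The returned flag distinguishes a run that met the error goal from one that merely hit the epoch cap, which is useful when judging whether the cap was set high enough.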
If the network does not converge within the maximum allowed number of epochs, training stops.

IV. CONCLUSION

In the given work, the neural network was trained with 50 samples of each of the 26 characters, i.e. 1300 (50 x 26) character image samples from the database were used. As a result, a classification accuracy of 85.62% was achieved. The training, feature extraction and classification techniques that determine the accuracy of the recognition system can be refined further, because there is always scope for improvement.

References

[1] Dapeng Tao, "Similar handwritten Chinese character recognition by kernel discriminative locality alignment", Pattern Recognition Letters, pp 186-194, 2014
[2] U. Pal, "Comparative Study of Devnagari Handwritten Character Recognition using Different Feature and Classifiers", 10th International Conference on Document Analysis and Recognition, pp 1112-1115, 2009
[3] David Andre, "Learning and Upgrading Rules for an OCR System Using Genetic Programming", 0-7803-1899-4/94 ©1994 IEEE
[4] Angelo Marcelli, "Exploring genetic programming for modeling character shape", 0-7803-6583-6/00 ©2000 IEEE
[5] Soumen Bag, "Recognition of Bengali Handwritten Characters Using Skeletal Convexity and Dynamic Programming", Second International Conference on Emerging Applications of Information Technology, pp 265-268, 2011
[6] J. Pradeep, "Neural Network based Handwritten Character Recognition system without feature extraction", International Conference on Computer, Communication and Electrical Technology (ICCCET), pp 40-44, 2011
[7] Huiqin Lin, "The Research of Algorithm for Handwritten Character Recognition in Correcting Assignment System", Sixth International Conference on Image and Graphics, pp 456-460, 2011
[8] Cao Xinyan, "Handwritten Mathematical Symbol Recognition Based on Niche Genetic Algorithm", 2013 Third International Conference on Intelligent System Design and Engineering Applications, 978-0-7695-4923-1/12 ©2012 IEEE
[9] Reza Azmi, "A hybrid GA and SA algorithms for feature selection in recognition of hand-printed Farsi characters", 978-1-4244-6585-9/10 ©2010 IEEE
[10] P. Banumathi, "Handwritten Tamil Character Recognition using Artificial Neural Networks", International Conference on Process Automation, Control and Computing, pp 1-5, 2011
[11] Anshul Gupta, "Offline Handwritten Character Recognition Using Neural Network", 978-1-4577-2058-1 ©2011 IEEE
[12] Jia Zeng, "Markov Random Field-Based Statistical Character Structure Modeling for Handwritten Chinese Character Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 5, pp 767-780, 2008

http://www.ijettjournal.org