Character Recognition Technique using Feature Extraction

1 UdayaTheja.V, 2 Sangamesh, 3 Dr. Rajshekhar Ghogge
1 M.Tech (CNE), Dept of ISE, Dr.AIT, Bangalore, India, udayatheja.v@gmail.com
2 M.Tech (CNE), Dept of ISE, Dr.AIT, Bangalore, India, sangamesh.kollur@gmail.com
3 Associate Professor, Dept of ISE, Dr.AIT, Bangalore, India, rajsgsrm@yahoo.co.in

ABSTRACT- Character recognition has long been a critical area of Artificial Intelligence. Recognition is a trivial task for humans, but writing a computer program that performs character recognition is extremely difficult. Recognizing patterns is one of those things humans do well and computers don't. The reasons are the many sources of variability, abstraction, and the absence of hard-and-fast rules that define the appearance of a visual character; hence rules need to be heuristically deduced from samples. Feature extraction is an essential step in recognizing a character through off-line methods. This paper gives a detailed overview of different feature extraction techniques used in the recognition of different characters.

Index Terms- Character Recognition, Feature Extraction.

I. INTRODUCTION

Character recognition is the process of classifying an input character according to predefined character classes. With the increasing interest in computer applications, modern society needs computers that can read text. The text may be a scanned handwritten document, typed text in various fonts, or a combination of both. A character recognition system helps communication between a human and a machine. Optical Character Recognition deals with the problem of recognizing optically processed characters. Optical recognition is performed off-line, after the writing or printing has been completed, as opposed to on-line recognition, where the computer recognizes the characters as they are written. Both hand-printed and printed characters may be recognized, but performance depends directly on the quality of the input documents.

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into computer-readable text. OCR is used as a form of data entry from an original paper source, whether passport documents, bank statements, receipts, mail, business cards or any number of other printed records.

Classical methods are not well suited to the recognition of visual characters, for the following reasons [9]:
1. The "same" characters differ in size, shape and style from person to person, and even from time to time for the same person. The source of confusion is the high level of abstraction: there are thousands of styles of type in common use, and a character recognition program must recognize most of them.
2. Like any image, visual characters are subject to spoilage due to noise. Noise consists of random changes to a pattern, particularly near the edges. A character with much noise may be interpreted as a completely different character by a computer program.
3. There are no hard-and-fast rules that define the appearance of a visual character. Hence rules need to be heuristically deduced from samples.

Character recognition systems are useful in license plate recognition, smart card processing, automatic data entry, bank cheque/DD processing, money counting machines, postal automation, address and zip code recognition, writer identification, etc.

Several different techniques exist for recognizing characters. One distinguishes characters by the number of loops in a character, and another by the direction of their concavities. These methods can be applied one after the other to increase the accuracy and speed of recognition.
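The loop-counting idea mentioned above can be made concrete with a flood fill. The sketch below is an illustration under stated assumptions, not a method taken from the cited works: the character is a binary image given as a list of rows (1 = black pixel, 0 = background), and one loop is counted for every 4-connected background region that does not touch the image border. The function name `count_loops` is hypothetical.

```python
from collections import deque


def count_loops(img):
    """Hypothetical helper: count enclosed background regions (loops).

    Assumed convention: img is a list of equal-length rows, 1 = black, 0 = background.
    Background is traced with 4-connectivity, so ink pixels that touch only
    diagonally still close a loop.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sr, sc):
        # Breadth-first fill of one background region; report whether it
        # reaches the image border (then it is outside the character, not a loop).
        queue = deque([(sr, sc)])
        seen[sr][sc] = True
        touches_border = False
        while queue:
            r, c = queue.popleft()
            if r in (0, h - 1) or c in (0, w - 1):
                touches_border = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and img[nr][nc] == 0:
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return touches_border

    loops = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] == 0 and not seen[r][c] and not flood(r, c):
                loops += 1
    return loops
```

With this convention a ring-shaped "0" yields one loop, an "8" two and a "1" none, which is enough to split characters into coarse groups before a finer classifier is applied. Noisy images may produce spurious tiny loops, so a minimum region size may be worth imposing.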
OCR is a common method of digitizing printed texts so that they can be electronically searched, edited, stored more compactly, displayed on-line, and used in machine processes such as machine translation, key data extraction, text-to-speech and text mining. It is an active field of research in pattern recognition and artificial intelligence.

Because OCR translates images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text, it is often used to convert paper books and documents into electronic files. When a single paper page is scanned into the computer, the result is just an image file, a photo of the page. The computer cannot understand the letters on the page, so one cannot search for words, or edit the text and have the words re-wrap as one types and changes the font, as in a word processor. OCR software converts the image into a text or word-processor file so that these things become possible. The result is much more flexible and compact than the original page photo.

Fig. 1: Areas of Character Recognition

The more constrained the input, the better the performance of the OCR system. However, totally unconstrained handwriting remains a challenge for OCR machines. Still, the computer reads fast, and technical advances are continually bringing the technology closer to its ideal.

Fig. 1 shows the two main types of character recognition: off-line and on-line. Off-line character recognition captures data from documents through optical scanners or cameras, whereas on-line character recognition uses digitizers that directly capture the writing together with the order of the strokes and speed information.
Different Areas of Character Recognition: Off-line character recognition has two main types: single characters and handwritten script. Optical Character Recognition can be used for:
- Data entry for business documents, e.g. cheques, passports, receipts and bank statements.
- Automatic number plate recognition.
- Automatic extraction of key information from insurance documents.
- Extracting business card information into a contact list.
- Making textual versions of printed documents more quickly.
- Making electronic images of printed documents searchable.
- Converting handwriting in real time to control a computer (pen computing).

II. OFF-LINE RECOGNITION

Off-line recognition operates on pictures generated by an optical scanner. The data is two-dimensional and space-ordered, which means that overlapping characters cannot be separated easily. Off-line handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained in this form is regarded as a static representation of handwriting. Off-line handwriting recognition is comparatively difficult, as different people have different handwriting styles. As of today, OCR engines are primarily focused on machine-printed text, and ICR (Intelligent Character Recognition) on hand-"printed" text.

Cursive handwriting recognition has used the Hough transform and a neural network [4]. The Hough transform is a line detection technique that tolerates deformation, disconnections and noise. Instead of searching for linear strokes in the image, the global directional information at each pixel of the image is computed and stored in several feature maps. Assigning a single orientation to each pixel is avoided in order to preserve useful information. Each feature map is then processed by zones to estimate the local orientation of the strokes. Finally, the image is recognized by a neural network classifier. These systems work for the recognition of segmented cursive characters, cursive words and the first letter of cursive words.

There are simple and fast algorithms for detecting italic, bold and all-capital words without performing actual character recognition [3]. Researchers present a statistical study showing that the detection of such words may play a key role in automatic information retrieval from documents. Moreover, detection of italicized words can be used to improve the recognition accuracy of a text recognition system. These algorithms have been tested on a considerable number of document images and gave accurate results on all of them, and they are easy to implement.

Feature extraction is one of the important methods in off-line recognition.

III. FEATURE EXTRACTION

Researchers have used many methods of feature extraction for handwritten characters [5]. Shadow codes, fractal codes, profiles, moments, templates, structural features (points, primitives), wavelets, directional features, etc. have been addressed in the literature. A survey of existing work on character recognition shows that little effort has been devoted to feature enhancement that removes the confusion between similarly shaped characters.

The idea of the feature point extraction algorithm is to identify characters based on features somewhat similar to those humans use to identify characters [1][8][11]. Programmers must manually determine the properties they consider important. Example properties include the aspect ratio, the percentage of pixels above the horizontal half point, the percentage of pixels to the right of the vertical half point, the number of strokes, the average distance from the image centre, whether the character is symmetric under reflection about the y axis, and whether it is symmetric under reflection about the x axis.
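The example properties listed above are simple enough to compute directly from a binary image. The sketch below is one possible reading of them, not the authors' implementation. It assumes a character already cropped to its bounding box and stored as a list of rows with 1 for black pixels, takes the aspect ratio as width divided by height, measures "above the horizontal half point" and "right of the vertical half point" relative to the geometric centre of the image, and interprets "is reflected" as mirror symmetry. The number of strokes is omitted because it needs a stroke segmentation step the paper does not describe. The function name `property_features` is hypothetical.

```python
import math


def property_features(img):
    """Hypothetical helper: hand-chosen global properties of a binary character.

    Assumed convention: img is a list of equal-length rows, 1 = black, 0 = background,
    already cropped to the character's bounding box.
    """
    h, w = len(img), len(img[0])
    ink = [(r, c) for r in range(h) for c in range(w) if img[r][c]]
    n = len(ink)
    if n == 0:
        raise ValueError("image contains no black pixels")
    # Geometric centre of the pixel grid (row and column indices run 0..h-1, 0..w-1).
    cy, cx = (h - 1) / 2, (w - 1) / 2
    above = sum(1 for r, _ in ink if r < cy)  # strictly above the horizontal half line
    right = sum(1 for _, c in ink if c > cx)  # strictly right of the vertical half line
    mean_dist = sum(math.hypot(r - cy, c - cx) for r, c in ink) / n
    return {
        "aspect_ratio": w / h,  # assumption: width / height
        "pct_above_half": 100.0 * above / n,
        "pct_right_of_half": 100.0 * right / n,
        "avg_dist_from_centre": mean_dist,
        # Mirror symmetry about the vertical (y) axis: every row reads the same reversed.
        "reflected_y_axis": all(row == row[::-1] for row in img),
        # Mirror symmetry about the horizontal (x) axis: reversing the row order changes nothing.
        "reflected_x_axis": img == img[::-1],
    }
```

For a plain ring such as `[[1,1,1],[1,0,1],[1,1,1]]`, both symmetry flags are true and 37.5% of the black pixels lie above the half line. Before classification, such a dictionary would be turned into a fixed-order numeric vector.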
Choosing the properties by hand gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network, because the properties are not learned automatically. Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition systems.

1. Projection Method

The projection method [10] compresses the data through projection. Black pixel counts are taken along parallel lines through the image area to generate marginal distributions. The direction of projection can be horizontal, vertical, diagonal, or all of these. Furthermore, the character can be divided vertically and horizontally into four parts and the same projection applied to each quarter, which improves the recognition rate. Fig. 3.1 [10] shows the horizontal and vertical projections of a character.

Fig. 3.1: Projection method

2. Border Transition Technique (BTT)

The border transition technique assumes that all characters are oriented vertically. Each character is partitioned into four equal quadrants, and each quadrant is scanned in both the vertical and horizontal directions to count the zero-to-one transitions. Fig. 3.2 shows the partition and transitions of the character 6 using BTT.

Fig. 3.2: Border Transition Technique

A. Zoning

Zoning divides the character into smaller areas (zones) [10]. The black pixels in each zone are counted, and features are extracted by accumulating or averaging the profiles in each zone. Fig. 3.3 shows 16×16 to 8×8 zoning.

Fig. 3.3: 16×16 to 8×8 zoning

B. Graph Matching Method

A graph matching method [4] uses structural features of a character and is robust to changes of font or rotation. Three features are defined. First, an end point is connected to only one pixel and carries position information. A branch point is connected to three or more pixels; it stores information about the features connected to it, including the kind of feature, its position and its direction. A curve point is connected to two pixels. However, a pixel on a straight line is also connected to two pixels, so direction information is used to discriminate between a curve point and a straight line. Fig. 3.4 shows the character 3 described by the graph matching method using end points, branch points and curve points.

Fig. 3.4: Graph matching method

IV. CONCLUSION

Character recognition methods have developed remarkably in the last decade. A variety of techniques have emerged, influenced by developments in related fields such as image recognition and face recognition. The difficulty of performing accurate recognition is determined by the nature of the text to be read and by its quality. Generally, improper segmentation rates for unconstrained material increase progressively from machine print to handprint to cursive writing. We believe that wise use of feature extraction has led to improved accuracies. Each character must be described by features on which its classification is based, and two or more techniques can be combined to improve the accuracy of the system. It is hoped that this comprehensive discussion will provide insight into the concepts involved, and perhaps provoke further advances in the area. We have included a list of references sufficient to provide a more detailed understanding of the approaches described.

REFERENCES

[1] P. S. Deshpande, Latesh Malik, Sandhya Arora, "Handwritten Devanagari Character Recognition Using Connected Segments and Minimum Edit Distance", IEEE, 2007.
[2] Rókus Arnold, Póth Miklós, "Character Recognition Using Neural Networks", CINTI 2010, IEEE, pp. 311-314.
[3] Feng Yang, Fan Yang, "Character Recognition Using Parallel BP Neural Network", ICALIP 2008, IEEE, pp. 1595-1599.
[4] Jieun Kim, Ho-sub Yoon, "Graph Matching Method for Character Recognition in Natural Scene Images", INES 2011, IEEE, pp. 347-350.
[5] T. Wakabayashi, U. Pal, F. Kimura, Y. Miyake, "F-ratio Based Weighted Feature Extraction for Similar Shape Character Recognition", ICDAR 2009, IEEE, pp. 196-200.
[6] E. Kavallieratos, N. Antoniades, N. Fakotakis, G. Kokkinakis, "Extraction and recognition of handwritten alphanumeric characters from application forms".
[7] Rumiana Krasteva, "Bulgarian Hand-Printed Character Recognition Using Fuzzy C-Means Clustering", Problems of Engineering and Robotics, pp. 112-117.
[8] Mohammed Abu Ayshi, M. Jay Kimmel, Diane C. Simmons, "Character recognition system using spatial and structural features", US Patent 7,010,166 B2.
[9] Shashank Araokar, "Visual Character Recognition using Artificial Neural Networks".
[10] Attaullah Khawaja, Shen Tingzhi, Noor Mohammad Memon, Altaf Rajpa, "Recognition of printed Chinese characters by using Neural Network", IEEE, 2006, pp. 169-172.
[11] Yuk Ying Chung, Man To Wong, "Handwritten Character Recognition By Fourier Descriptors And Neural Network", IEEE TENCON 1997, pp. 391-394.
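The projection, border transition and zoning features of Section III reduce to a few lines of array arithmetic. The following is a minimal sketch under stated assumptions, not the implementation used in [10]. Images are lists of rows with 1 for black pixels. The BTT quadrant split assumes even image dimensions and counts a zero-to-one transition only between two adjacent pixels inside the same quadrant. Zoning returns the black-pixel count of each zone of a `grid × grid` partition. All function names are hypothetical.

```python
def projections(img):
    """Hypothetical helper: horizontal (per-row) and vertical (per-column)
    black-pixel counts, i.e. the marginal distributions of the image."""
    horizontal = [sum(row) for row in img]
    vertical = [sum(col) for col in zip(*img)]
    return horizontal, vertical


def count_zero_to_one(seq):
    """Number of positions where a 0 is immediately followed by a 1."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


def border_transition_features(img):
    """Hypothetical BTT sketch: split into four equal quadrants and count 0->1
    transitions row-wise (horizontal scan) and column-wise (vertical scan).

    Returns [h_TL, v_TL, h_TR, v_TR, h_BL, v_BL, h_BR, v_BR].
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    mh, mw = h // 2, w // 2
    quadrants = [
        [row[:mw] for row in img[:mh]],  # top-left
        [row[mw:] for row in img[:mh]],  # top-right
        [row[:mw] for row in img[mh:]],  # bottom-left
        [row[mw:] for row in img[mh:]],  # bottom-right
    ]
    features = []
    for q in quadrants:
        horizontal = sum(count_zero_to_one(row) for row in q)
        vertical = sum(count_zero_to_one(col) for col in zip(*q))
        features.extend([horizontal, vertical])
    return features


def zoning(img, grid=8):
    """Hypothetical helper: black-pixel count in each zone of a grid x grid partition."""
    h, w = len(img), len(img[0])
    if h % grid or w % grid:
        raise ValueError("image dimensions must be divisible by grid")
    zh, zw = h // grid, w // grid
    return [
        [
            sum(img[r][c]
                for r in range(i * zh, (i + 1) * zh)
                for c in range(j * zw, (j + 1) * zw))
            for j in range(grid)
        ]
        for i in range(grid)
    ]
```

For a 16×16 character, `zoning(img, grid=8)` gives an 8×8 grid of counts over 2×2-pixel zones, which is one reading of the 16×16 to 8×8 zoning of Fig. 3.3. The two projection vectors, the eight BTT values and the flattened zone counts can then be concatenated into a single feature vector.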