International Journal of Engineering Trends and Technology (IJETT) – Volume 24, Number 1, June 2015

An Approach of K-Means and ART Network for Character Recognition

1Ankush Goyal, 2Shallu
1Asst. Prof. (CSE), Sri Ram College of Engg., Palwal, India
2M.Tech (CSE), Sri Ram College of Engg., Palwal, India

Abstract- The main utility of a character recognition system is to classify digital and optical patterns so that the corresponding alphanumeric characters are obtained. To perform this recognition, a series of operations is applied, such as segmentation, feature extraction and classification, and the actual recognition of the character is based on these operations. Scanning is also applied to human-readable characters and text so that effective detection of the characters can be performed. The presented work has three stages. In the first stage, the image is improved by removing noise. In the second stage, feature extraction is done using a K-Means approach to identify the character ROI and the feature points. At the final stage, image classification and recognition are performed using an ART network approach. The results obtained from the system show an effective recognition rate.

Keywords: OCR, KNN, ART Network, Feature Extraction

I. INTRODUCTION

Most human work that was once done in written form is now performed using computer systems. Character recognition has grown into application areas such as reading cheques, reading traffic number plates and reading electricity meters. The major broad areas associated with handwritten character recognition include reading digital characters from printed media and converting them to textual form, recognizing the characters and textual information present on printed media, and enhancing the digital representation of characters. Character recognition is one of the applications of neural networks.

Figure 1 : Applications of Neural Network

Image compression needs information for processing, and neural networks can receive and process a wide range of information at once. Character recognition is a widely used field for the recognition of digital and handwritten characters, and neural networks help in recognizing characters. Feature extraction is a field that extracts information from data or images, and the multilayer perceptron neural network is highly useful here. Classification can be done on the basis of different patterns, and neural networks provide architectures such as ART for this purpose.

Optical character recognition (OCR) is the commonly used term for character recognition and refers to the conversion of digital or handwritten images into computer-readable form. It is a field of research in pattern recognition, artificial intelligence and machine vision. The goal of OCR is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The process of OCR involves several steps, including segmentation, feature extraction and classification. OCR works as its name suggests: it recognizes the characters in a document that has been scanned into the computer. Optical word recognition (OWR), on the other hand, recognizes words rather than characters. OWR accomplishes this by comparing and contrasting the results of several OCR engines, through which it evaluates and then identifies each word.
In studies whose results we review below, OWR has proved more effective than OCR. Intelligent character recognition (ICR) can recognize and extract printed handwritten characters as well as cursive handwritten characters. The ICR recognition system does not give highly accurate results every time, as handwritten characters can vary in style, font and cursiveness; every individual has a personal style of writing, which makes it difficult to recognize all writers with the same accuracy in the same system. ICR software commonly includes a self-learning component that trains itself on new inputs and adapts automatically to different inputs. Intelligent word recognition (IWR) works on handwritten words or phrases instead of character by character. IWR technology matches handwritten words against a user-defined dictionary, significantly reducing the character errors encountered in typical character recognition engines.

Figure 2 : Applications of Character Recognition

Process automation is an application area in which recognition is used to control a particular process; the general approach is to gather all the available information and to use the postcode for the redundancy check. Signature verification and identification is an area useful for banking: the identity of the writer is established without reading the handwriting, and the pattern to be matched is simply a signature compared against signatures collected in a database. Automatic cartography is helpful for recognizing characters from maps, where graphics and symbols are mixed and different fonts and styles can be present during recognition. Automatic number plate readers are used for vehicles; here the input image must be captured by a fast camera and is not like other bilevel images, which makes recognition complex.
II. RELATED WORK

Character recognition involves image processing and is also an important application of neural networks. The work already done by different researchers in this area is discussed in this section.

Tim J. Klassen [1] presented an effective recognition process for Arabic characters, covering both online and offline character recognition. The author used an SOM-based heuristic approach to perform feature analysis on online data so that effective recognition could be obtained, and a genetic approach to improve the recognition process.

Yuefeng Chen [2] defined an artificial-immune-system-based handwritten character recognition method and analyzed the optimization of recognition rate and time. The approach is based on biological principles, with memory-cell-based analysis, and was evaluated experimentally on a UCI dataset. The adaptive algorithm provided by the author improved speed and accuracy and was able to perform pattern recognition and abnormality detection.

S. Nagaprasad [3] presented a data-mining-based neural network model for soil image classification and processing. The author implemented spatial image mining for soil classification using diverse domains such as digital image processing, neural networks and soil fundamentals. The three most important algorithms used in the implementation are the Back Propagation Network (BPN), Adaptive Resonance Theory 1 (ART1) and Simplified Fuzzy ARTMAP, applied to soil classification as well as spatial image recognition. The author is extending this research by combining visual data mining with spatial data mining algorithms, such as spatial clustering, spatial association rules and self-organizing maps, in order to detect patterns in the data even more effectively.

Dan C. Ciresan [4] defined a flexible, high-performance neural network approach for image classification, presenting a fast, fully parameterizable GPU implementation of convolutional neural network variants. The presented feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.

Munish Kumar [5] presented, in 2011, a KNN-based handwritten Gurmukhi character recognition method. In this work, information about a character is first extracted by creating its skeleton; character features in terms of diagonal and transition counts are then computed, and the Euclidean distance is calculated to find the nearest neighbor. The presented work showed an accuracy of 94.12% in recognition.

Puttipong Mahasukhon [6] presented a fuzzy-theory-based handprinted English character recognition system. The work is divided into two main stages, feature extraction and pattern recognition; position, size and shape are the parameters that create variation in recognition. The system was tested on the 26 lowercase hand-printed English characters written by different writers.

Nadine Hajj [7] presented, in 2012, a system for isolated-letter handwriting recognition. Two stages are defined for the work: feature extraction using pen trajectory modeling and classification using support vector machines (SVM). The best recognition rate achieved was 89.15%, using a k-nearest-neighbor classifier with k = 3 and dynamic time warping.

R. Arnold [8] proposed a system using MATLAB's Neural Network Toolbox to recognize printed and handwritten characters by projecting them onto grids of different sizes. The character recognition match depends on the resolution of the character projection, and it was found that this resolution is necessary for evaluating the match. The results showed that not every writing style can be recognized by the same network with the same precision.

Another author, E. J. Bellagarda [9], showed his work using a bank of multilayer feedforward neural networks for handwritten character recognition. He used preclassification based on a segmentation concept taken as the basic building block of handwriting; the second element is a connectionist approach, in which a set of parallel networks is used instead of a single network. The results are evaluated on similarly shaped characters and on upper-case characters in a discrete manner.

K. Toscano [10] worked on the recognition of cursive handwriting and on testing the system's ability to work like a human being. Feature extraction is done using SALOM, a natural spline function, and the steepest descent method is used for optimization. The recognition phase has two sub-phases, global feature classification and local feature classification.

Another work, on dictionary-based analysis for character recognition, was done by Shinji Tsuruoka [11]. The author defined a separate library set for each writer to identify writing similarity, together with character-specific analysis and feature-space generation so that an effective covariance matrix can be produced. The work was defined on Japanese characters so that effective recognition can be achieved.
III. PROPOSED APPROACH

In this section, the proposed hybrid model for recognition is presented. In the earlier stage, a training set is defined on which feature extraction is performed, and a feature dataset is generated. Once the feature dataset is obtained, a noisy input image is captured on which the recognition process is performed. The pre-processing stage includes the denoising algorithm and the identification of the character area in the image. To remove the image noise, a Gaussian filter is applied in this work, and to perform the image segmentation, a combination of mathematical filters is applied; these include a convolution filter and morphological filters. Based on these filters, the character area is extracted from the image. At the final stage, the K-Means/ART network approach is applied to perform the recognition and classification. K-Means is used here for clustering and feature extraction, from which the vigilance vector of the recognition process is obtained. The vigilance ratio match is then performed using the ART network to identify the character class among the dataset classes. The basic model of the presented work is shown in figure 3.

Figure 3 : Character Image Recognition (input noisy image → denoising algorithm → feature extraction by K-Means based clustering → ART based recognition)

The basic algorithmic approaches used in this work are given below. The complete work is divided into two main algorithmic stages: Gaussian-filter-based denoising and the hybrid recognition algorithm. These approaches are defined in this section.

A) Gaussian Filter

The presented work is defined to perform recognition on a noisy input image, so denoising is performed first, using a Gaussian filter; the algorithmic approach is shown in figure 4. The Gaussian filter used here is more robust than simple mean filtering. For comparison, in a median filter a single very unrepresentative pixel in a neighborhood does not affect the output significantly, and because the output must be the value of one of the pixels in the neighborhood, the filter does not create new, unrealistic pixel values when it straddles an edge; for this reason the median filter is much better at preserving sharp edges than the mean filter, and these advantages also help in removing uniform noise from an image. In the presented work an FFT-based process is used to obtain the signal values, and the denoising approach is effective for additive as well as multiplicative noise. The flowchart of this stage is shown below.

Figure 4 : Gaussian Filter (Start → read the input image → define the Gaussian noise level Leveli → implement the FFT on the input image → implement the Gaussian adaptive filter → perform the inverse FFT on the derived image → derive the result image)
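The paper does not give an implementation of this denoising stage, so the following is only a minimal sketch of one plausible reading of figure 4: the FFT of the noisy image is multiplied by a Gaussian transfer function and the inverse FFT derives the result image. The function name gaussian_denoise, the parameter sigma and the synthetic test image are illustrative assumptions, not taken from the paper.

import numpy as np

def gaussian_denoise(img, sigma=15.0):
    """Frequency-domain Gaussian low-pass, following the FFT steps of figure 4.

    img   : 2-D grayscale array
    sigma : width of the Gaussian transfer function in frequency space
            (an assumed parameter; the paper does not state one).
    """
    rows, cols = img.shape
    # Step 1: FFT of the noisy input image, centered for easier masking
    spectrum = np.fft.fftshift(np.fft.fft2(img.astype(float)))

    # Step 2: Gaussian transfer function H(u, v) = exp(-D^2 / (2 * sigma^2))
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    H = np.exp(-(U**2 + V**2) / (2.0 * sigma**2))

    # Step 3: apply the filter and invert the FFT to derive the result image
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * H))
    return np.real(filtered)

# Example: denoise a synthetic 100x100 stroke image with multiplicative noise
if __name__ == "__main__":
    clean = np.zeros((100, 100))
    clean[30:70, 45:55] = 255.0                         # a crude vertical stroke
    noisy = clean * (1 + 0.1 * np.random.randn(100, 100))
    result = gaussian_denoise(noisy, sigma=15.0)
    print(result.shape, result.min(), result.max())

A spatial-domain Gaussian convolution would give a very similar result; the frequency-domain form is shown only because figure 4 explicitly lists FFT and inverse-FFT steps.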
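The feature-extraction stage that follows denoising is likewise described only at the level of "K-Means based clustering of the character ROI". The sketch below assumes one common concrete choice: cluster the coordinates of the foreground (character) pixels and use the normalized cluster centroids as the feature vector. The name kmeans_feature_points, the value of k, the iteration count and the binarization of the test image are all assumptions, not the authors' specification.

import numpy as np

def kmeans_feature_points(binary_img, k=8, iters=50, seed=0):
    """Cluster foreground pixel coordinates with K-Means; the centroids act as
    feature points of the character (a sketch of the feature-extraction stage)."""
    ys, xs = np.nonzero(binary_img)                     # character (ROI) pixels
    pts = np.column_stack([ys, xs]).astype(float)
    rng = np.random.default_rng(seed)
    centroids = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(iters):
        # assign every pixel to its nearest centroid
        d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster goes empty
        new = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # a simple feature vector: centroids normalized by the image size
    h, w = binary_img.shape
    return (centroids / [h, w]).ravel()

# Example on a crude synthetic character image
if __name__ == "__main__":
    img = np.zeros((100, 100), dtype=np.uint8)
    img[20:80, 48:52] = 1                               # vertical stroke
    img[20:24, 30:70] = 1                               # horizontal bar
    print(kmeans_feature_points(img, k=4))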
B) Recognition

The recognition is defined here using K-Means and an ART network, as a feature-based vigilance match of the input image against the dataset images. The dataset is trained at the initial stage, the vigilance values are obtained, and the vigilance-value dataset is generated. The algorithmic approach for the recognition process is shown in table 1.

Table 1 : Recognition Algorithm

1.  We have a trained ART network with N classes
2.  Input image Img
3.  Define vigilance vector V
4.  matchratio = 0
5.  p = null  /* initialize the match image */
6.  for c = 1 to N
7.  {
8.      img1 = GetImage(c)
9.      Find feature difference Diff = img1 - Img
10.     M = Matchingratio(Img, img1)
11.     if Diff >= V and M < matchratio
12.     {
13.         matchratio = M
14.         p = img1
15.     }
16. }
17. if (p == null)
18. {
19.     Print "No Match Image Found"
20. }
21. else
22. {
23.     Print "Image Detected " + p
24. }
25. }
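The following is a hedged Python rendering of the matching loop in table 1. Matchingratio is not defined in the paper, so a simple pixel-agreement ratio is assumed here, and the comparison on line 11 (M < matchratio, with matchratio initialized to 0) is read as "keep the class with the highest matching ratio that also clears the vigilance threshold", since a literal reading would never accept a candidate. The names recognise, matching_ratio, the vigilance value 0.9 and the toy templates are illustrative, not the authors' code.

import numpy as np

def matching_ratio(a, b):
    """Fraction of pixels on which the two images agree
    (an assumed definition; the paper does not define Matchingratio)."""
    return np.mean(a == b)

def recognise(img, class_images, vigilance=0.9):
    """ART-style best-match search over the trained class images (table 1)."""
    best_label, best_ratio = None, 0.0
    for label, template in class_images.items():        # one template per class
        m = matching_ratio(img, template)
        if m >= vigilance and m > best_ratio:            # assumed intent of line 11
            best_label, best_ratio = label, m
    if best_label is None:
        print("No Match Image Found")
    else:
        print("Image Detected", best_label, f"({best_ratio:.4%})")
    return best_label

# Example with two toy 100x100 class templates
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    templates = {"A": rng.integers(0, 2, (100, 100)),
                 "B": rng.integers(0, 2, (100, 100))}
    query = templates["A"].copy()
    query[:5, :5] ^= 1                                   # a little noise
    recognise(query, templates, vigilance=0.9)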
IV. RESULTS

The presented work is applied to cursive alphanumeric characters defined in grayscale. A sample of the dataset is shown in figure 5, and the properties of the dataset are listed in table 2.

Figure 5 : Dataset Sample

Table 2 : Dataset Properties

    Parameter            Value
    Number of Images     26
    Color                No
    Image Type           Alphabet
    Image Size           100x100
    Image Format         BMP
    Image Fault/Noise    Yes (level 0.2)
    Noise Intensity      0.1
    Noise Type           Speckle
    Image Filtration     Gaussian
    Recognition          K-Means ART Network
    Input Image          A(1)

The recognition process is defined at class level under the vigilance vector so that effective recognition is performed. The recognition property set is shown in table 3.

Table 3 : Recognition Properties

    Properties                            Values
    Number of Training Images             26
    Number of Test Images                 12
    Noisy Images                          5
    Correctly Detected                    11
    Noisy Correctly Detected              4
    Non-Noisy Correctly Detected          7
    Recognition Rate (Non-Noisy Images)   100%
    Recognition Rate (Noisy Images)       80%
    Matching Ratio of Input Image (A)     99.7339%
    Matching Ratio of Input Image (S)     99.7722%

Of the 12 test images, 11 were correctly detected, an overall recognition rate of about 91.7%: 7 of the 7 non-noisy images (100%) and 4 of the 5 noisy images (80%). The results are also shown in the form of a plot graph for the recognition of character A, as shown in figure 6.

Figure 6 : Matching Ratio Plot Graph

Figure 7 : Histogram for Input Image (A)

V. CONCLUSION

In this paper, a K-Means/ART network approach is defined to perform character recognition. The work is defined for English alphanumeric characters and remains effective for noisy images. The recognition rate obtained from the work shows effective detection of the characters.

References

[1] Tim J. Klassen, "Towards the On-line Recognition of Arabic Characters", 0-7803-7278-6/02 © 2002 IEEE.
[2] Yuefeng Chen, "A Handwritten Character Recognition Algorithm based on Artificial Immune", International Conference on Computer Application and System Modeling, vol. 12, pp. 273-276, 2010.
[3] S. Nagaprasad, "Spatial Data Mining Using Novel Neural Networks for Soil Image Classification and Processing", International Journal of Engineering Science and Technology.
[4] Dan C. Ciresan, "Flexible, High Performance Convolutional Neural Networks for Image Classification", Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.
[5] Munish Kumar, "k-Nearest Neighbor Based Offline Handwritten Gurmukhi Character Recognition", 978-1-61284-859-4 © 2011 IEEE.
[6] Puttipong Mahasukhon, "Hand Printed English Character Recognition Based on Fuzzy Theory", 978-1-4673-0819-9 © 2012 IEEE.
[7] Nadine Hajj, "Isolated Handwriting Recognition via Multi-Stage Support Vector Machines", 978-1-4673-2276-8 © 2012 IEEE.
[8] R. Arnold, "Character Recognition Using Neural Networks", 978-1-4244-9279-4 © 2010 IEEE.
[9] E. J. Bellagarda, "On-line Handwritten Character Recognition Using Parallel Neural Networks", 0-7803-1775-0 © 1994 IEEE.
[10] K. Toscano, "Cursive Character Recognition System", 0-7695-2569-5 © 2006 IEEE.
[11] Shinji Tsuruoka, "Personal Dictionaries for Handwritten Character Recognition Using Characters Written by a Similar Writer", 12th International Conference on Frontiers in Handwriting Recognition, pp. 599-604, 2010.