ONLINE HANDWRITTEN CHARACTER RECOGNITION BASED ON ONLINE-OFFLINE FEATURES USING BP NEURAL NETWORK

Abdul Fadlil1, Marzuki Khalid2, Rubiyah Yusof3
Centre for Artificial Intelligence and Robotics (CAIRO), Universiti Teknologi Malaysia, Jalan Semarak, 54100 Kuala Lumpur, Malaysia

Abstract

Handwriting recognition is still widely researched, as current techniques remain inadequate and inefficient. This paper presents an online handwritten character recognition system based on online-offline features and the back-propagation neural network algorithm. The novel aspect of the system is its feature extraction technique, which is built on both temporal and spatial handwriting signals. The goal is to improve feature extraction for online handwriting recognition. Recognition results on the UNIPEN and IRONOFF databases show a higher rate of accuracy for writer-independent applications.

Keywords: online handwriting, character recognition, neural network, feature extraction

1 Introduction

Handwriting recognition is not a new technology; however, research in this area has gained much attention lately, owing to growing interest in new ways of entering data into computers. Rather than relying on the conventional keyboard, many computers are now equipped with handwriting input screens, such as those used in tablet PCs.

To achieve high handwriting recognition accuracy, many different feature representations and classification methods have been proposed. For online handwriting recognition, examples of the standard features extracted are position, direction, curvature, and the pen-down/pen-up flag [Schenkel et al. 1994], and examples of classification methods include Hidden Markov Models (HMM), Artificial Neural Networks (ANN), Fuzzy Logic, Support Vector Machines (SVM), and hybrid HMM/ANN [Poisson et al. 2002].
Handwriting recognition is divided into two distinct domains: online and offline recognition. In the online case, recognition is based on dynamic information captured with special equipment, and the input is recognized in real time. In the offline case, a static representation resulting from the digitization of a document is available as an image. Online handwriting recognition is important where keyboards are difficult to use, for example when the writer is mobile and the device needs to be portable. Mobile devices such as Personal Digital Assistants (PDAs), electronic pads, and smartphones have online handwriting recognition interfaces integrated into them. It is therefore important to further improve the recognition performance of these applications.

In this paper, we propose a new approach to character recognition based on the integration of online-offline features with a Multi-Layer Perceptron (MLP) recognizer. The online handwriting recognition system consists of a pre-processing module and a recognition module, as shown in Figure 1. The following section describes the pre-processing module, which includes data acquisition, size normalization, re-sampling, and feature extraction. The architecture and training algorithm of the recognizer are presented in Section 3. The experimental results are discussed in Section 4. The last section concludes the paper.

Figure 1. A block diagram of the proposed “Online Handwriting Recognition System”

--------------------------------------------
1 e-mail: fadlil@lycos.com
2 e-mail: marzuki@utmkl.utm.my
3 e-mail: rubiyah@utmkl.utm.my

2 Preprocessing module

Data acquisition captures the handwriting of the user and can be based on a variety of input tools. An online handwriting recognition system based on a neural network requires two processes: training and testing.
For these processes, we used online (x, y) coordinate data from the UNIPEN [Guyon et al. 1994] and IRONOFF [Viard-Gaudin et al. 1999] databases.

Size normalization is performed by scaling each character both horizontally and vertically [Li et al. 1997]:

$$ x_i = W \, \frac{x_i^o - x_{\min}}{x_{\max} - x_{\min}} \tag{1} $$

$$ y_i = H \, \frac{y_i^o - y_{\min}}{y_{\max} - y_{\min}} \tag{2} $$

where $(x_i^o, y_i^o)$ denotes the original point, $(x_i, y_i)$ is the corresponding point after transformation, $x_{\min} = \min_i x_i^o$, $x_{\max} = \max_i x_i^o$, $y_{\min} = \min_i y_i^o$, $y_{\max} = \max_i y_i^o$, and $W$ and $H$ are the width and height of the normalized character, respectively.

Re-sampling makes the raw data points equidistant along the arc length, using a simple linear interpolation algorithm. The re-sampling step $\Delta S$ is a fraction of the total arc length $L$:

$$ L = \sum_{i=1}^{n-1} d_i \tag{3} $$

$$ d_i = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2} \tag{4} $$

$$ \Delta S = \frac{L}{n_1} \tag{5} $$

where $d_i$ denotes the distance between consecutive points and $n$ is the number of raw points. After re-sampling, each character has a fixed number $n_1$ of points (50 in our system), which provides a fixed-size input to the neural network.

The purpose of the feature extraction module is to enhance the variability that helps to discriminate between classes. In this system, online and offline features are used together. The online features are the pen-up/down state, the pen coordinates, the direction, and the curvature ($\theta$ denotes the direction at a point and $\phi$ the curvature). The pen state is a binary feature: “1” indicates that the pen is touching the pad (pen-down) and “0” indicates that it is not (pen-up).

Because the second derivatives $d^2x/ds^2$ and $d^2y/ds^2$ are not bounded, the local curvature of the stroke is approximated by the angle $\phi$ between two elementary segments, as shown in Figure 3. Like the direction, this angle is encoded by its cosine and sine. Using the subtraction formulas for sine and cosine, these values can be calculated from the direction angles $\theta$ of the neighbouring points (Equations (6) and (7)) as:

$$ \cos\phi(n) = \cos\big(\theta(n+1) - \theta(n-1)\big) \tag{8} $$

$$ \sin\phi(n) = \sin\big(\theta(n+1) - \theta(n-1)\big) \tag{9} $$

Figure 2. Estimation of writing direction.
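The size normalization of Equations (1)-(2) and the arc-length re-sampling of Equations (3)-(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The function names `normalize` and `resample` are hypothetical. Starting the samples at the first raw point, treating the character as a single stroke, and guarding against a zero extent are all assumptions that the paper does not specify.

```python
import math


def normalize(points, width=1.0, height=1.0):
    """Scale (x, y) points into a width x height box, following Eqs. (1)-(2)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Assumption: avoid division by zero for perfectly vertical/horizontal strokes.
    dx = (x_max - x_min) or 1.0
    dy = (y_max - y_min) or 1.0
    return [(width * (x - x_min) / dx, height * (y - y_min) / dy)
            for x, y in points]


def resample(points, n1=50):
    """Return n1 points spaced Delta S = L / n1 apart along the trajectory."""
    # Cumulative arc length at each raw point; the last value is L of Eq. (3).
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))  # d_i of Eq. (4)
    total = cum[-1]
    if total == 0.0:
        return [points[0]] * n1
    step = total / n1  # Delta S of Eq. (5)
    out = []
    seg = 0
    for k in range(n1):
        target = k * step  # assumption: sampling starts at the first raw point
        # Advance to the raw segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0.0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        # Linear interpolation inside the segment.
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

For example, `resample(normalize(raw_points), n1=50)` would turn a list of raw `(x, y)` tuples (here the hypothetical `raw_points`) into the 50 points per character used in this system. With a step of L / n1 and sampling from the first point, the last sample falls one step short of the final raw point.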
The direction of a stroke is determined by a discrete approximation of the first derivatives with respect to the arc length, $dx/ds$ and $dy/ds$, where $ds = \sqrt{dx^2 + dy^2}$. These approximations can be calculated as shown in Figure 2, in which the following calculations are required [Raph et al. 1997]:

$$ \cos\theta(n) = \frac{\Delta x(n)}{\Delta s(n)} \tag{6} $$

$$ \sin\theta(n) = \frac{\Delta y(n)}{\Delta s(n)} \tag{7} $$

Figure 3. Estimation of curvature.

The online feature components are bounded and vary between 0 and +1 as follows [Jung et al. 2000]:

$$ f_0 = \begin{cases} 1 & \text{if pen is down} \\ 0 & \text{otherwise} \end{cases} \tag{10} $$

$$ f_1 = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{11} $$

$$ f_2 = \frac{y - y_{\min}}{y_{\max} - y_{\min}} \tag{12} $$

$$ f_3 = \frac{\cos\theta + 1}{2} \tag{13} $$

$$ f_4 = \frac{\sin\theta + 1}{2} \tag{14} $$

$$ f_5 = \frac{\cos\phi + 1}{2} \tag{15} $$

$$ f_6 = \frac{\sin\phi + 1}{2} \tag{16} $$

Offline features are computed by transforming the online data of the UNIPEN and IRONOFF databases into images. The (x, y) coordinate points of each character are mapped onto a grid of 10 rows and 10 columns, giving a 10×10 binary image. The complete feature set combines the online and offline features; an example is shown in Figure 4. In total, 450 feature values are presented as input to the neural network: the 7 online features for each of the 50 re-sampled points (350 values) plus the 100 pixels of the binary image.

Figure 4. Example of features extracted for the normalized digit “2”.

3 Recognition module

In this section we describe the architecture and training algorithm of the Multi-Layer Perceptron (MLP) with one hidden layer, as shown in Figure 5.

Figure 5. The MLP with one hidden layer

In Figure 5, $y_k$, $z_j$ and $x_i$ are the signals of the output, hidden, and input layers, respectively. The parameter $w_{jk}$ denotes the weights between the hidden and output layers, and $v_{ij}$ denotes the weights between the input and hidden layers. The numbers of neurons in the input, hidden, and output layers are $n$, $p$, and $m$, respectively.

The output of the hidden layer is as follows:

$$ z_j = f(z\_in_j) \tag{17} $$

$$ z\_in_j = \sum_{i=1}^{n} x_i v_{ij} \tag{18} $$

and the output of the output layer is

$$ y_k = f(y\_in_k) \tag{19} $$

$$ y\_in_k = \sum_{j=1}^{p} z_j w_{jk} \tag{20} $$

where $f(x)$ is normally a sigmoid function:

$$ f(x) = \frac{1}{1 + \exp(-x)} \tag{21} $$
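The forward pass of Equations (17)-(21) can be illustrated with a short, dependency-free Python sketch. It is not the authors' code. The helper names `sigmoid`, `init_weights` and `forward` are hypothetical, and the weight range of ±0.1 is an assumption. As in the equations, no bias terms are used.

```python
import math
import random


def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + exp(-x)), Eq. (21)."""
    if x < -700.0:  # avoid overflow in math.exp for very negative inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(rows, cols, rng, scale=0.1):
    """Small random weights; the range [-scale, scale] is an assumed choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def forward(x, v, w):
    """Propagate input x through one hidden layer.

    x: list of n inputs, v: n x p input-to-hidden weights,
    w: p x m hidden-to-output weights. Returns (z, y).
    """
    n, p, m = len(x), len(v[0]), len(w[0])
    # Hidden layer: z_j = f(sum_i x_i * v_ij), Eqs. (17)-(18).
    z = [sigmoid(sum(x[i] * v[i][j] for i in range(n))) for j in range(p)]
    # Output layer: y_k = f(sum_j z_j * w_jk), Eqs. (19)-(20).
    y = [sigmoid(sum(z[j] * w[j][k] for j in range(p))) for k in range(m)]
    return z, y
```

A network with the 450 inputs described above could be set up as `rng = random.Random(0)`, `v = init_weights(450, 100, rng)`, `w = init_weights(100, 10, rng)`, where the hidden-layer size of 100 and the 10 output classes (for digits) are illustrative values.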
The system is a multi-layer feed-forward neural network trained using the back-propagation algorithm. The input vector values are derived from measurements of the extracted features and are bounded between 0 and +1. During learning, an input vector is presented to the network and propagated from the input layer to the output layer. In the learning phase, the learning rate is preset and the weights of the network are initialized to small random values. The weights are then updated as follows:

$$ \Delta w_{jk} = \alpha \, \delta_k \, z_j \tag{22} $$

where

$$ \delta_k = (t_k - y_k) \, f'(y\_in_k) \tag{23} $$

and

$$ \Delta v_{ij} = \alpha \, \delta_j \, x_i \tag{24} $$

where

$$ \delta_j = \sum_{k=1}^{m} \delta_k \, w_{jk} \, f'(z\_in_j) \tag{25} $$

Here $\alpha$ is the learning rate and $t_k$ is the target value of output $k$.

In this system, we found that the best number of hidden-layer neurons is 100, and the learning rate was set to 0.5. After training, the system holds the knowledge needed to recognize an unknown input from its feature vector $x$. The recognition result is the output with the maximum value,

$$ k^* = \arg\max_{1 \le k \le m} y_k \tag{26} $$

and the input sample is classified as class $k^*$.

4 Experimental results

The proposed MLP was trained and tested using the isolated character portion of the UNIPEN database, subset categories 1a, 1b and 1c, and also using the IRONOFF database. Each database is split into two sets: a training set and a testing set. All experiments were performed with the proposed MLP system in writer-independent (omni-writer) mode, and the recognition results are shown in Table 1. The results are also compared with the systems based on offline features using a Space Displacement Neural Network (SDNN) and a Multi-Layer Perceptron (MLP) developed by Poisson [Poisson et al. 2002].

Table 1. The experimental results of online handwriting character recognition

With the novel features used in this system, the average recognition rates are better. The experimental results indicate that a system integrating online-offline features may further improve recognition accuracy in the future.

5 Conclusion

The results of our experiments show that the system recognizes writer-independent online handwritten characters effectively, with high accuracy and in real time. In the future, accuracy may be improved further using new feature extraction techniques and recognition methods. The system could also be extended to online cursive handwriting recognition.

References

BAHLMANN, C., HAASDONK, B., AND BURKHARDT, H. 2002. On-line Handwriting with Support Vector Machines – A Kernel Approach. Proc. of the 8th Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), 49-50.

GUYON, I., SCHOMAKER, L., PLAMONDON, R., LIBERMAN, M., AND JANET, S. 1994. UNIPEN project on-line data exchange and recognizer benchmarks. Proc. of the 12th Int’nl Conference on Pattern Recognition, 9-13.

JUNG, K., AND KIM, H. J. 2000. On-line Recognition of Cursive Korean Characters using Graph Representation. Pattern Recognition, vol. 33, 399-412.

LI, X., AND YEUNG, D. Y. 1997. On-line Handwritten Alphanumeric Character Recognition using Dominant Points in Strokes. Pattern Recognition, vol. 31, 1, 31-44.

PLAMONDON, R., AND SRIHARI, S. N. 2000. On-line and Offline Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, 1, 63-81.

POISSON, E., VIARD-GAUDIN, C., AND LALLICAN, P. M. 2002. Multi-modular Architecture Based on Convolutional Neural Networks for Online Handwritten Character Recognition. ICONIP’02.

RAPH, G. 1997. Run-On Recognition in an On-line Handwriting Recognition System. Report, University of Karlsruhe.

SCHENKEL, M., GUYON, I., AND HENDERSON, D. 1994. On-line Cursive Script Recognition using Time Delay Neural Networks and Hidden Markov Models. Proc. ICAASSP’93, vol. 2, 637-640.

VIARD-GAUDIN, C., LALLICAN, P. M., KNERR, S., AND BINTER, P. 1999. The IRESTE On/Off (IRONOFF) Dual Handwriting Database. ICDAR’99.