Improvement of the Recognition Module of WinBank
6.199 Advanced Undergraduate Project
Daniel González
MIT EECS 2002
Advisors: Professor Amar Gupta and Dr. Rafael Palacios
Table of Contents
1. Introduction
2. Background
2.1 WinBank
2.1.1 Preprocessing Module
2.1.2 Recognition Module
2.1.3 Postprocessing Module
2.2 Neural Networks
3. Procedure
3.1 Creation
3.2 Training
3.3 Testing
3.4 Evaluation
4. Network Parameters
4.1 Neural Network Architecture
4.1.1 Hidden Layer Size
4.1.2 Network Type
4.1.3 Transfer Functions
4.2 Neural Network Training
4.2.1 Performance Functions
4.2.2 Training Algorithms
5. Results
5.1 Feed-Forward Network Results
5.1.1 Hidden Layer Sizes
5.1.2 Transfer Functions
5.1.3 Performance Functions
5.1.4 Training Algorithms
5.1.5 Total Network Analysis
5.2 LVQ Network Results
5.3 Elman Network Results
6. Conclusion
References
Appendix A: MATLAB Code
González 2
1. Introduction
More than 60 billion checks are written annually in the United States alone. The current
system for processing these checks involves human workers who read the values from the checks
and enter them into a computer system. Two readers are used for each check to increase
accuracy. This method of processing checks requires an enormous amount of overhead.
Because such a large number of checks are written annually, even a small reduction in the cost of
processing a single check adds up to significant savings. WinBank is a program that is being
created to automate check processing, drastically reducing the time and money spent processing
checks.
WinBank receives the scanned image of a check as input, and outputs the value for which
the check was written. This process of translating physical text (in this case, hand-written
numerals) into data that can be manipulated and understood by a computer is known as Optical
Character Recognition (OCR). WinBank implements OCR through heavy use of a concept from
artificial intelligence known as neural networks. Neural networks can be used to solve a variety
of problems and are a particularly good method for solving pattern recognition problems. The
effectiveness of a neural network at solving problems depends on many different network
parameters, including its architecture and the process by which a network is taught to solve
problems (known as training).
This paper explores the different neural network architectures considered for use in
WinBank and the processes used to train them. The following section presents background
information on WinBank and neural networks, and is followed by a discussion of the procedures
used to test the different types of neural networks considered. This procedural information is
followed by an explanation of the different parameters (and their associated values) used for
creating and training the networks. The next section presents the values obtained from evaluating
the performances of the neural networks. The final section identifies the best neural network for
use in WinBank, as well as other neural networks that may be useful in other problems.
2. Background
The main focus of this paper is the module of WinBank that uses neural networks to
recognize handwritten numbers. However, a brief overview of the entire WinBank system and
background information on neural networks are presented here for the readers’ benefit.
2.1 WinBank
The Productivity From Information Technology Initiatives (PROFIT) group at MIT’s
Sloan School of Management is developing a program called WinBank in an effort to automate
check processing in both the United States and Brazil. WinBank achieves this automation by
implementing OCR with a heavy dependence on neural networks. The program is organized into
three main modules that combine to implement OCR. The three modules that make up WinBank
are the preprocessing module, the postprocessing module, and the recognition module.
2.1.1 Preprocessing Module
The preprocessing module takes the scanned image of a check as input, and outputs
binary images in a format that is useful for the recognition module. The preprocessing module
first analyzes the scanned image to determine the location of the courtesy amount block (CAB).
The CAB is the location on the check that contains the dollar amount of the check in Arabic
numerals (figure 1). After determining the location of the CAB, the preprocessing module next
attempts to segment the value written in the CAB into individual digits. These segments are then
passed through a normalization procedure designed to make all of the characters a uniform size
and a uniform thickness. The preprocessed images are then individually output to the
recognition module.
Figure 1: Courtesy Amount Block (CAB) Location
2.1.2 Recognition Module
The recognition module is the main engine that attempts to classify the number
represented by each image received from the preprocessing module. The recognition module
feeds the output obtained from the preprocessing module into a neural network. The neural
network then attempts to identify the value represented by this input and outputs a number from
zero to nine.
2.1.3 Postprocessing Module
The postprocessing module receives the output of the recognition module and gauges the
strength of the recognition module’s guess. If the postprocessing module is not satisfied that the
recognition module has output a correct value, then either the entire process begins again
(making different decisions along the way) or the check is rejected and a human steps in to
identify the value of the check. If the postprocessing module is satisfied with the recognition
module’s output, then it outputs this value as the output of WinBank.
Figure 2: The three major modules of WinBank. [Diagram: the Preprocessing Module (CAB location, image segmentation, image normalization) passes normalized, segmented images to the Recognition Module (number identification, basic verification), which passes the identified number to the Postprocessing Module (detailed verification), which produces the WinBank output; the diagram also shows a feedback path.]
2.2 Neural Networks
Artificial neural networks are modeled after the organic neural networks in the brain of
an organism. The fundamental unit of an organic neural network is the neuron. Neurons receive
input from one or more other neurons. The strength of the effect that each input has on a
neuron depends on the neuron’s proximity to the neuron from which it received the input [3]. If
the combined value of these inputs is strong enough, then the neuron receiving these signals
outputs a brief pulse. When neurons combine with many other neurons (there are approximately
$10^{11}$ neurons in the human brain [3]) to form networks, an organism can learn to think and make
decisions.
Figure 3: Real Neuron (left), Model of an Artificial Neuron (right)
Although much simpler, artificial neural networks operate in much the same way as
organic neural networks. Artificial neurons receive inputs from other neurons. The strength of
the effect that each input has on a neuron is determined by a weight associated with the input.
The receiving neuron takes the sum of these weighted inputs, adds a bias value if it has one, and
outputs a value according to its transfer function. Neurons can be combined into sets of neurons
called layers. The neurons in a layer do not interconnect with each other, but interconnect with
neurons in other layers. A neural network is made up of one or more neurons, organized into
one or more layers. The layer that receives the network input is called the input layer and the
layer that produces the network output is called the output layer. Neural networks can have one or
more layers between the input and output layers. These layers are called hidden layers. Two
major components that contribute to the effectiveness of a neural network at solving a particular
problem are its architecture and the method by which it is trained.
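The computation a single neuron performs can be sketched in a few lines of Python. This is a minimal illustration, not WinBank code: the function names and the use of the logarithmic-sigmoidal transfer function are assumptions made for the example.

```python
import math

def logsig(n: float) -> float:
    """Logarithmic-sigmoidal transfer function: maps any real net input into (0, 1)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)  # rewritten form avoids overflow for large negative n
    return e / (1.0 + e)

def neuron_output(inputs, weights, bias=0.0, transfer=logsig):
    """Sum the weighted inputs, add the bias, and apply the transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    net_input = sum(w * p for w, p in zip(weights, inputs)) + bias
    return transfer(net_input)

# The weighted sum (0.5 - 1.0) plus the bias 0.5 is exactly 0, and logsig(0) = 0.5.
print(neuron_output([1.0, 0.0, 1.0], [0.5, -2.0, -1.0], bias=0.5))  # 0.5
```

A layer is simply many such neurons sharing the same inputs, each with its own weights and bias.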
Different neural networks can have different architectures. In this paper, the following
parameters are considered when discussing neural network architecture: hidden layer size, the
type of network, and the transfer function or functions used at each layer.
In order for a neural network to learn how to correctly solve a problem, appropriate
network connections and their corresponding weights must be determined through a process
called training. There are many different algorithms used for training a neural network. The
various training procedures and neural network architectures considered for use in WinBank are
presented in later sections.
Figure 4: Basic Neural Network Structure

For notational convenience, artificial neural networks and artificial neurons will hereafter be referred to as neural
networks and neurons, respectively.
3. Procedure
Many different types of neural networks were designed, created, trained, tested, and
evaluated in an effort to find the appropriate neural network architecture and training method for
use in WinBank. These networks were evaluated according to the main goal of WinBank:
decrease the overhead involved in check processing as much as possible while achieving the
highest possible degree of accuracy. Neural networks that decrease the overhead involved in
check processing are fast and require little human intervention, while neural networks that
achieve a high degree of accuracy make the fewest errors when classifying numbers.
This section discusses the procedure used to create, train, test, and evaluate the various neural
networks according to this goal.
The creation, training, and testing of each neural network was done using the MathWorks
software package MATLAB. MATLAB contains a “Neural Network Toolbox” that facilitates
rapid creation, training, and testing of neural networks. MATLAB was chosen to use for
WinBank development because this toolbox would save an enormous amount programming
effort.
3.1 Creation
Creating a neural network is simply a matter of calling the appropriate MATLAB
function and supplying it with the necessary information. For example, the following code
creates a new feed-forward network that uses the logarithmic-sigmoidal transfer function in both
layers and trains its neurons with the resilient backpropagation training algorithm:
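net = newff(mm, [25 10], {'logsig' 'logsig'}, 'trainrp');  % 25 hidden and 10 output neurons, logsig in both layers, 'trainrp' = resilient backpropagation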
This network has an input layer, a hidden layer consisting of 25 neurons, and an output layer
consisting of 10 neurons. mm is a matrix of size number_of_inputs x 2. Each row contains the
minimum and maximum value that a particular input node can have. See appendix A for more
MATLAB code that can be used to create and analyze other neural networks.
3.2 Training
Neural networks are useful for OCR because they can often generalize and correctly
classify inputs they have not previously seen. In order to reach a solid level of generalization, large
amounts of data must be used during the training process. We used data from the National
Institute of Standards and Technology’s (NIST) Special Database 19: Handprinted Forms and
Characters Database.
NIST Special Database 19 (SD19) is a database that contains Handwriting Sample Forms
(HSFs) from 3699 different writers (figure 5). Each HSF had thirty-four fields used to gather
samples of letters and numbers. Some fields were randomly generated for each HSF to obtain
a larger variety of samples. Twenty-eight of the thirty-four fields were digit fields. SD19
contains scanned versions of each HSF (11.8 dots per millimeter) as well as segmented versions
of the HSFs, allowing for easy access to specific samples.
Digit samples were obtained from SD19 for use in training and testing the neural
networks. Once obtained, the samples were normalized so that each sample was upright and of
the same thickness. Some of these samples were used to create a training set and others were
used to create a validation set. A training set is used to update network weights and biases, while
a validation set is used to help prevent overfitting. After training, each network went through a
testing procedure to gather data for evaluation of its usefulness in WinBank.
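One common way a validation set guards against overfitting is early stopping: training halts once the validation error has failed to improve for a fixed number of consecutive epochs, and the weights from the best epoch are kept. The sketch below assumes this scheme (MATLAB's training functions expose a comparable limit through the max_fail training parameter); the function name and the sample error curve are hypothetical.

```python
def early_stopping_epoch(validation_errors, max_fail=5):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation errors.

    Training stops after max_fail consecutive epochs without a new lowest
    validation error; the weights saved at best_epoch would then be restored.
    """
    if not validation_errors:
        raise ValueError("need at least one epoch of validation error")
    best_epoch, best_error, fails = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_error, best_epoch, fails = error, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch, best_epoch
    return len(validation_errors) - 1, best_epoch

# Validation error falls, then rises as the network starts to overfit.
errors = [0.9, 0.6, 0.4, 0.35, 0.36, 0.38, 0.41]
print(early_stopping_epoch(errors, max_fail=3))  # (6, 3)
```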

For detailed information on WinBank’s normalization procedure, see [4].
Figure 5: Handwriting Sample Form from SD19
3.3 Testing
Two different sets of data were obtained in order to test each network. The first set of
data consisted of 10000 samples from SD19 (1000 samples per digit). These samples were
presented to each network using the sim function of MATLAB. Network-specific procedures
were then used to compare the output of each neural network against the desired outputs. The
second set of data used to test each network was a set of multiples.
A multiple occurs when image segmentation fails to recognize two adjacent numbers as
individual numbers and presents the recognition module with one image of two numbers (figure
6). Because a multiple is not a number, a multiple should be sent back to the preprocessing
module for resegmentation. In order to test the different neural networks on multiples, multiples
from several checks were used to create a testing set of multiples.
Figure 6: Example of a multiple (double zero)
3.4 Evaluation
Running a network simulation in MATLAB produces a matrix of outputs. This matrix
of actual network outputs can be compared to a target matrix of desired network outputs to
evaluate the performance of each network. Here, the main goal of WinBank should be divided
into its two components: the accuracy of a network, and its ability to reduce processing
overhead. Several parameters were obtained from each network test to evaluate the performance
of each network according to these goals. The percentage of correct outputs (GOOD), the
percentage of incorrect outputs (WRONG), and the percentage of rejected outputs (REJECT)
were obtained from the SD19 test set. The ideal network maximizes GOOD while minimizing
REJECT and WRONG. MULTIPLES REJECTED and NUMBER are two parameters obtained
from testing the networks on the testing set of multiples. MULTIPLES REJECTED is the
percentage of multiples rejected by the network, and should be maximized. NUMBER is the
percentage of multiples classified as numbers, and should be minimized. Another useful value
for network evaluation is the time spent training the network.
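The SD19 percentages can be computed from the matrix of network outputs once a rejection rule is fixed. This section does not state WinBank's rule, so the sketch below assumes a simple hypothetical one: reject an output column whose largest element falls below a threshold, and otherwise classify it as the digit with the largest output. The threshold value and function names are illustrative only.

```python
def classify(column, threshold=0.5):
    """Return the winning digit (0-9) for one output column, or None to reject.

    Hypothetical rule: reject when the strongest output is below `threshold`.
    """
    best = max(range(len(column)), key=lambda i: column[i])
    return best if column[best] >= threshold else None

def evaluate(outputs, targets, threshold=0.5):
    """Compute GOOD, WRONG and REJECT percentages.

    outputs: one 10-element output column per test sample.
    targets: the correct digit for each sample, in the same order.
    """
    good = wrong = reject = 0
    for column, digit in zip(outputs, targets):
        guess = classify(column, threshold)
        if guess is None:
            reject += 1
        elif guess == digit:
            good += 1
        else:
            wrong += 1
    total = len(targets)
    return {name: 100.0 * count / total
            for name, count in (("GOOD", good), ("WRONG", wrong), ("REJECT", reject))}
```

Under the same assumed rule, MULTIPLES REJECTED would be the percentage of multiples for which classify returns None, and NUMBER the remaining percentage.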
Important data for each neural network trained and tested was maintained in a
MATLAB struct array named netData. Each netData struct array has fields for the each
important value, such as the training time (obtained using MATLAB’s tic and toc functions)
and hidden layer size of the network. This struct array allowed for easy storage and access to
important information.
4. Network Parameters
The following parameters were varied during the creation and training of the neural
networks:
1. hidden layer size
a. 25
b. 50
c. 85
2. network type
a. feed-forward
b. learning vector quantization
c. Elman
3. transfer function used at network layers
a. logarithmic-sigmoidal
b. tangential-sigmoidal
c. hard limit
d. linear
e. competitive
4. performance function
a. least mean of squared errors
b. least sum of squared errors
5. training algorithm
a. batch gradient descent with momentum
b. resilient backpropagation
c. BFGS
d. Levenberg-Marquardt
e. random
4.1 Neural Network Architecture
4.1.1 Hidden Layer Size
Each neural network tested for use in WinBank had the same base structure. The input
layer consisted of 117 nodes that receive input from the preprocessing module. These nodes
correspond to the 13 x 9 pixels of the normalized binary image produced by the preprocessing
module. The output layer consisted of 10 nodes, the output of which is ideally high at the output
node corresponding to the appropriate digit, and low at every other output node. The hidden
layer structure, however, is architecture dependent. The number of hidden layers is not an
important factor in the performance of a network because it has been rigorously proven that one
hidden layer can match the performance achieved with any number of hidden layers [2].
Because of this, all of the neural networks tested were implemented using only one hidden layer.
The size of the hidden layer, however, is an important factor. Three values were tested for the
number of nodes in the hidden layer of each neural network architecture: 25, 50, and 85. These
values were obtained based on previous experience, and provide a diverse group of values
without creating excessive computation.
Figure 7: Basic Neural Network Architecture
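The computational cost of each hidden layer size can be gauged by counting the weights and biases in a fully connected 117-H-10 network. The arithmetic below is an illustration added here, not a figure reported in the original study.

```python
def parameter_count(inputs=117, hidden=25, outputs=10):
    """Weights plus biases of a fully connected network with one hidden layer."""
    hidden_params = inputs * hidden + hidden    # input-to-hidden weights + hidden biases
    output_params = hidden * outputs + outputs  # hidden-to-output weights + output biases
    return hidden_params + output_params

for h in (25, 50, 85):
    print(h, parameter_count(hidden=h))
# 25 3210
# 50 6410
# 85 10890
```

The count grows linearly with H, so the 85-node network has a little over three times as many adjustable values as the 25-node network.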
4.1.2 Network Type
There are a variety of network types that can be used when creating neural networks.
The network type can determine various network parameters, such as the type of neurons that are
present in each layer and the method by which network layers are interconnected. Past
experience indicates that feed-forward networks work very well for OCR. Because of this, much
more time was spent analyzing feed-forward networks than any other networks. The
following types were evaluated for use in WinBank:
1. Feed-forward neural networks (also known as multi-layer perceptrons) are made up of
two or more layers of neurons. The output of each layer is simply fed into the next layer,
hence the name feed-forward networks. Each layer can have a different transfer function
and size.
2. Learning Vector Quantization (LVQ) networks consist of an input layer, a hidden
competitive layer, and an output linear layer. Competitive layers output zero for all
neurons except for the neuron that is associated with the most positive element of the net
input, which outputs one. The linear layer transforms the competitive layer’s output into
target classifications defined by the user [1].
3. Elman networks are a type of recurrent network that consists of two feed-forward layers
and have feedback from the first layer’s output to the first layer’s input. The neurons of
the hidden layer have a tangential-sigmoidal transfer function, and the neurons of the
output layer have a linear transfer function [1].
4.1.3 Transfer Functions
Each neuron uses a transfer function in order to determine its output based on its input.
The following five transfer functions have been tested for use in WinBank:
1. The logarithmic-sigmoidal transfer function takes an input valued between negative
infinity and positive infinity and outputs a value between zero and positive one.
2. The tangential-sigmoidal transfer function takes an input valued between negative
infinity and positive infinity and outputs a value between negative one and positive one.
3. The hard limit transfer function outputs zero if the net input of a neuron is less than
zero, and outputs one if the net input of a neuron is greater than or equal to zero.
4. The linear transfer function produces a linear mapping of input to output.
5. The competitive transfer function is used in competitive learning and accepts a net
input vector for a layer and returns neuron outputs of zero for all neurons except for the
winner, the neuron associated with the most positive element of the net input [1].
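Python sketches of the five transfer functions follow. The names mirror the toolbox functions logsig, tansig, hardlim, purelin and compet, but these are independent re-implementations written for illustration, not the toolbox code.

```python
import math

def logsig(n):
    """Logarithmic-sigmoidal: output in (0, 1)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)  # rewritten form avoids overflow for large negative n
    return e / (1.0 + e)

def tansig(n):
    """Tangential-sigmoidal (hyperbolic tangent): output in (-1, 1)."""
    return math.tanh(n)

def hardlim(n):
    """Hard limit: 1 if the net input is >= 0, otherwise 0."""
    return 1.0 if n >= 0 else 0.0

def purelin(n):
    """Linear: the output equals the net input."""
    return n

def compet(net_inputs):
    """Competitive: 1 for the neuron with the most positive net input, 0 elsewhere."""
    winner = max(range(len(net_inputs)), key=lambda i: net_inputs[i])
    return [1.0 if i == winner else 0.0 for i in range(len(net_inputs))]

print(compet([0.2, -1.0, 0.7, 0.1]))  # [0.0, 0.0, 1.0, 0.0]
```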
4.2 Neural Network Training
Two important training parameters that affect neural network performance are the
performance function and the training algorithm.
4.2.1 Performance Functions
Performance functions are used in supervised learning to help update the network
weights and biases. In supervised learning, a network is provided with the desired output for
each input. All of the neural networks tested for use in WinBank were trained using supervised
learning. The error is defined as the difference between the desired output and the actual
network output. Network weights in WinBank are updated according to one of two performance
functions to reduce the network error:
1. Least mean of squared errors (MSE): minimizes the average of the squared network
errors.
2. Least sum of squared errors (SSE): minimizes the sum of the squared network errors.
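For a given set of errors the two performance functions differ only by a constant factor, the number of error terms, so they are minimized by the same weights; what changes is the scale of the gradient, which interacts with the learning rate of gradient-based algorithms. A minimal sketch with a hypothetical network output:

```python
def sse(targets, outputs):
    """Sum of squared errors over all output elements."""
    return sum((t - a) ** 2 for t, a in zip(targets, outputs))

def mse(targets, outputs):
    """Mean of squared errors: the SSE divided by the number of error terms."""
    return sse(targets, outputs) / len(targets)

# Desired output for the digit 3 versus a hypothetical network output.
target = [0.0] * 10
target[3] = 1.0
output = [0.1] * 10
output[3] = 0.7
print(sse(target, output), mse(target, output))
```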
4.2.2 Training Algorithms
There are many different algorithms that can be used to train a neural network. All of the
training algorithms that follow are backpropagation algorithms that implement batch training.
Training algorithms that use backpropagation begin by calculating the changes in the
weights of the final layer before proceeding to compute the changes in the weights of the
previous layer. They continue in this backwards fashion until reaching the input layer. The
procedure used to compute the changes in the input weights for each node is specific to each
algorithm, and there are various trade-offs in speed, memory consumption, and accuracy
associated with each algorithm.
Algorithms that implement batch training wait until every input in the training set has been
presented to the network before making any changes to the network weights. Once all of the
inputs have been presented, the training algorithm modifies the weights according to its
procedure. Each such pass through the training set is called an epoch.
The following methods were tested on one or more architectures for use in WinBank:
1. Batch Gradient Descent with Momentum training algorithm (GDM): This training
algorithm updates the network weights in the direction of the negative gradient of the
performance function by a factor determined by a parameter known as the learning rate.
This algorithm makes use of momentum, which allows a network to respond not only to
the local gradient, but also to recent trends in the error surface, allowing networks to
avoid getting stuck in shallow minima [1].
2. Resilient Backpropagation training algorithm (RP): Backpropagation algorithms that
rely on gradient descent can get stuck in local minima or slow down significantly when
the magnitude of the gradient is small. The resilient backpropagation training algorithm
avoids this problem by using the sign of the gradient to determine the direction of the
weight change. The magnitude of the weight change is determined by a separate update value
that is sensitive to the behavior of this sign. If the sign does not change for two consecutive
iterations, then the magnitude of the weight change is increased by a constant factor. The
magnitude is decreased by a constant factor when the sign of the derivative of the
performance function with respect to the weight changes from the previous iteration. If
this derivative is zero, then the value of the magnitude remains the same. In other words, if
the weight oscillates, the magnitude of its change is decreased, and if the weight continues to
change in the same direction for several iterations, the magnitude of the weight change is
increased [1]. This method of changing magnitudes allows the resilient backpropagation
algorithm to converge very rapidly.
3. The BFGS training algorithm belongs to a class of training algorithms known as quasi-Newton algorithms. These algorithms approximate Newton’s method, which updates
network weights according to the following basic step:
$$x_{k+1} = x_k - A_k^{-1} g_k$$
where $x_{k+1}$ is the updated vector of weights and biases, $x_k$ is the current vector of weights
and biases, $g_k$ is the current gradient, and $A_k$ is the Hessian matrix (second derivatives) of
the performance index at the current values of the weights and biases [1]. Quasi-Newton
algorithms approximate the complex and computationally expensive calculation of the
Hessian matrix by using a function of the gradient instead of calculating the second
derivative.
4. Levenberg-Marquardt training algorithm: This training algorithm is another
algorithm that approximates Newton’s method by updating network weights and biases in
the following manner:
$$x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e$$
where $J$ is a matrix, known as the Jacobian matrix, that contains the first derivatives of
the network errors with respect to the weights and biases, $e$ is a vector of network errors,
$I$ is the identity matrix, and $\mu$ is a scalar that determines how closely this update
approximates Newton’s method. When $\mu$ is zero, the update becomes Newton’s method using
the approximate Hessian matrix $J^T J$. When $\mu$ is large, it becomes gradient descent with a
small step size [1].
5. Random training algorithm: This training algorithm uses gradient descent in order to
converge upon a solution. The difference between this algorithm and others, however, is
that this algorithm trains the network by supplying the inputs and corresponding targets
in a random order. This algorithm does not support validation or test vectors.
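The difference between GDM and RP, the two algorithms that were ultimately trained successfully, is easiest to see on a single weight. The sketch below minimizes a one-dimensional quadratic error surface with each update rule. It is a simplified illustration under stated assumptions: the momentum form is the one with a (1 - mc) factor on the gradient step, the Rprop variant omits weight backtracking, and the learning rate, momentum constant and step-size factors are illustrative values rather than those used for WinBank.

```python
def gradient(w):
    """Derivative of the illustrative error surface E(w) = (w - 3)^2 (minimum at w = 3)."""
    return 2.0 * (w - 3.0)

def train_gdm(w=0.0, lr=0.1, mc=0.9, epochs=200):
    """Batch gradient descent with momentum: each step blends the previous
    step (weight mc) with a step down the gradient (weight 1 - mc)."""
    step = 0.0
    for _ in range(epochs):
        step = mc * step - (1.0 - mc) * lr * gradient(w)
        w += step
    return w

def train_rprop(w=0.0, delta=0.07, inc=1.2, dec=0.5, delta_max=50.0, epochs=100):
    """Resilient backpropagation (simplified: no weight backtracking).
    Only the sign of the gradient sets the direction of each step."""
    prev_grad = 0.0
    for _ in range(epochs):
        g = gradient(w)
        if g * prev_grad > 0:      # sign unchanged: enlarge the step
            delta = min(delta * inc, delta_max)
        elif g * prev_grad < 0:    # sign flipped (overshoot): shrink the step
            delta *= dec
        # if either derivative is zero, delta keeps its previous value
        if g > 0:
            w -= delta
        elif g < 0:
            w += delta
        prev_grad = g
    return w

print(train_gdm(), train_rprop())  # both should end close to the minimum at w = 3
```

Because Rprop ignores the gradient's magnitude, scaling the error (for example, switching between SSE and MSE) leaves its steps unchanged, whereas it effectively rescales the learning rate of GDM.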
5. Results
5.1 Feed-Forward Network Results
A large amount of data was obtained from training and testing various feed-forward
network architectures and training algorithms. Individual parameters are considered first,
followed by a discussion of several parameters together. Results obtained from training and
testing any architecture using the hard limit transfer function are not included in plots because
no architecture using hard limit could be properly trained for use in OCR.
Data is presented for each of the two test sets. The important parameters associated with
SD19 test data are the accuracy and the rejection rate. The accuracy is the percentage of
properly recognized inputs. The rejection rate is the percentage of inputs that could not be
recognized by the neural network and had to be sent for further processing (either by humans or
computers). The important parameters associated with the test set containing multiples are
multiples rejected and multiples classified as numbers. Multiples rejected is the percentage of
inputs that the network cannot recognize and rejects. Multiples classified as numbers is the
percentage of inputs that the network classifies as a number. The training time is a parameter
independent of test data and is the number of seconds spent training a particular network. The
testing time was obtained for each network, but these times were all very similar and will not be
discussed further.
5.1.1 Hidden Layer Sizes
Each feed-forward network architecture was tested with three different hidden layer sizes.
The different sizes were 25 nodes, 50 nodes, and 85 nodes. The results for each test set are
shown below.
Figure 8: Results from feed-forward networks trained with varying hidden layer sizes and tested on properly
segmented and normalized images.
Figure 9: Results from feed-forward networks trained with varying hidden layer sizes and tested on images
of multiples.
5.1.2 Transfer Functions
Each neural network architecture was trained and tested using the tangential-sigmoidal,
logarithmic-sigmoidal, and hard limit transfer functions. Network architectures that used the
hard limit transfer function implemented it in the output layer and the architectures were trained
and tested with either the logarithmic-sigmoidal or tangential-sigmoidal transfer functions in use
for the neurons of the hidden layer. No useful results came from networks trained with the
hard limit transfer function. Each of these networks identified every input presented as the
number one. This occurs because feed-forward networks need differentiable transfer functions
during training: backpropagation computes weight changes from the derivative of each transfer
function, and the hard limit function has a derivative of zero everywhere except at zero, where it
is not differentiable. Data associated with networks using the hard limit transfer function are
thus omitted from the graphs of this section.
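The contrast can be seen by comparing derivatives, which backpropagation multiplies into every weight update. The sketch below (function names are illustrative) evaluates the derivative of the logarithmic-sigmoidal function, a(1 - a) with a = logsig(n), next to that of the hard limit function, which is zero for every nonzero net input, so a hard-limit layer passes back no error signal with which to adjust weights.

```python
import math

def logsig(n):
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)

def logsig_derivative(n):
    """d/dn logsig(n) = a * (1 - a) with a = logsig(n); positive for every n."""
    a = logsig(n)
    return a * (1.0 - a)

def hardlim_derivative(n):
    """The hard limit is flat on both sides of zero, so its slope there is 0;
    at exactly n = 0 the function jumps and has no derivative."""
    if n == 0:
        raise ValueError("hard limit is not differentiable at n = 0")
    return 0.0

for n in (-2.0, -0.5, 0.5, 2.0):
    print(n, logsig_derivative(n), hardlim_derivative(n))
```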
Figure 10: Results from feed-forward networks trained with two different transfer functions and tested on
properly segmented and normalized images.
Figure 11: Results from feed-forward networks trained with two different transfer functions and tested on
images of multiples.
5.1.3 Performance Functions
Each of the feed-forward neural network architectures was trained with both the least
mean squared error (MSE) and least sum of squared error (SSE) performance functions.
Figure 12: Results from feed-forward networks trained with two different performance functions and tested
on properly segmented and normalized images.
Figure 13: Results from feed-forward networks trained with two different performance functions and tested
on images of multiples.
5.1.4 Training Algorithms
Each of the feed-forward network architectures was trained and tested thoroughly with
both the batch gradient descent with momentum (GDM) and resilient backpropagation (RP)
training algorithms. However, networks could not be trained with the BFGS (trainbfg) and
Levenberg-Marquardt (trainlm) training algorithms because of unacceptable training time and
memory usage. It took trainlm seventeen minutes to train a network with a training set of one
sample. Any increase of the hidden layer’s size beyond twenty-five neurons yielded an “out of
memory” error, despite experimentation with the memory reduction parameter. Similar results
were obtained from training and testing networks using trainbfg; thus, training and testing of
architectures using trainlm and trainbfg was aborted. However, tests of networks trained with
GDM and RP yielded useful results, displayed in the graphs below.
Figure 14: Results from feed-forward networks trained with two different training algorithms and tested on
properly segmented and normalized images.
Figure 15: Results from feed-forward networks trained with two different training algorithms and tested on
images of multiples.
5.1.5 Total Network Analysis
The network parameters considered individually above are now taken together in an
effort to find the neural network parameters that best suit the goals of WinBank. The graphs
below plot the rejection rate against the percentage of incorrect outputs and accuracy,
respectively, of networks tested with SD19 test data. The ideal neural network in figure 16 is
located as close to the origin of the left graph as possible, and as close to the top left corner of the
right graph as possible. These locations minimize the rejection rate and the percentage of
incorrect outputs while maximizing the percentage of correct outputs. A high accuracy rate is
desirable because of the cost and inconvenience of inaccurate check values being entered into a
computer system. A low rejection rate is desirable because a high rejection rate means more
human intervention, increasing check processing overhead. Because the ideal neural network
does not exist, a compromise must be made between these two rates; in that compromise, high
accuracy is given somewhat more weight than a low rejection rate.
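The evaluation that produces GOOD %, REJECT %, and WRONG % can be sketched as below in Python. The rejection rule used here (reject when the largest of the ten outputs falls below a threshold) and the threshold value of 0.5 are assumptions for illustration; this section does not specify WinBank's actual rule. Each test image is counted exactly once, so the three percentages sum to 100, as they do in the tables of this section.

```python
def score(outputs, labels, threshold=0.5):
    # outputs: one list of ten output activations per test image
    # labels: the true digit for each image
    # Hypothetical rejection rule: reject when the strongest output is below threshold
    good = reject = wrong = 0
    for activations, label in zip(outputs, labels):
        best = max(range(len(activations)), key=lambda k: activations[k])
        if activations[best] < threshold:
            reject += 1
        elif best == label:
            good += 1
        else:
            wrong += 1
    n = len(labels)
    return {"GOOD %": 100.0 * good / n,
            "REJECT %": 100.0 * reject / n,
            "WRONG %": 100.0 * wrong / n}
```

Raising the hypothetical threshold moves borderline images out of the correct and incorrect counts and into the reject count, which is the accuracy/rejection compromise discussed above.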
Figure 17: Two useful graphs for evaluating network performance on SD19 test inputs. Each node on the
plot corresponds to a neural network.
The input to the neural network will not always be properly segmented and normalized
images such as those used to evaluate the accuracy and rejection parameters above. It is very
likely that at some time the neural network will receive an image of a multiple as input. The
ideal neural network will reject the image of a multiple rather than classify it as a number.
Unfortunately, the neural networks best suited to receive and classify properly
normalized and segmented images are not the same neural networks best suited to receive and
properly deal with images of multiples.
Tables 1 and 2 contain values obtained from the networks that best handle proper input and
multiples, respectively. The top ten networks for handling appropriately segmented and
normalized data classify, on average, 72 percent of inputs that are multiples as numbers, while
rejecting an average of only 27 percent of the multiples. On the other hand, the top ten networks
at handling inputs that are multiples reject an average of 76 percent of proper data and correctly
classify an average of only 19 percent of these data.
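These averages can be reproduced directly from the columns of Tables 1 and 2. In the Python sketch below the values are copied from those tables; only the averaging is added.

```python
# Table 1 (top ten networks on properly segmented inputs): behaviour on multiples
table1_reject_mult = [33.33, 33.33, 18.18, 36.36, 39.39, 18.18, 15.15, 27.27, 21.21, 30.30]
table1_number = [66.67, 66.67, 81.82, 63.64, 60.61, 81.82, 84.85, 72.73, 78.79, 69.70]

# Table 2 (top ten networks on multiples): behaviour on proper inputs
table2_good = [1.91, 0, 0.02, 30.32, 13.49, 24.59, 11.1, 30.56, 26.89, 54.13]
table2_reject = [93.59, 100, 99.97, 69.21, 78.75, 73.94, 79.14, 54.97, 71.97, 40.86]


def mean(values):
    return sum(values) / len(values)


for name, column in [("Table 1 REJECT (MULT) %", table1_reject_mult),
                     ("Table 1 NUMBER %", table1_number),
                     ("Table 2 GOOD %", table2_good),
                     ("Table 2 REJECT %", table2_reject)]:
    print(f"{name}: {mean(column):.2f}")
```

The printed averages are 27.27 and 72.73 for Table 1, and 19.30 and 76.24 for Table 2.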
Because of these differences, a simple tradeoff must be made. A good segmentation module
should produce more proper inputs than improper ones, and the networks equipped to handle
multiples are all but useless when handling proper inputs; therefore, a network is chosen that is
better equipped to handle proper inputs than multiples.
Table 1: Top ten feed-forward networks according to highest GOOD % and lowest REJECT %
GOOD %   REJECT %   WRONG %   REJECT (MULT) %   NUMBER %
85.88    10.48      3.64      33.33             66.67
85.81    10.46      3.73      33.33             66.67
85.29    11.16      3.55      18.18             81.82
84.74    12.32      2.94      36.36             63.64
79.17    18.13      2.7       39.39             60.61
75.61    18.89      5.5       18.18             81.82
73.49    21.13      5.38      15.15             84.85
73.21    23.85      2.94      27.27             72.73
71.67    23.61      4.72      21.21             78.79
70.51    26.55      2.94      30.30             69.70
Table 2: Top ten feed-forward networks according to highest REJECT (MULT)%
GOOD %   REJECT %   WRONG %   REJECT (MULT) %   NUMBER %
1.91     93.59      4.5       100.00            0.00
0        100        0         100.00            0.00
0.02     99.97      0.01      100.00            0.00
30.32    69.21      0.47      84.85             15.15
13.49    78.75      7.76      81.82             18.18
24.59    73.94      1.47      81.82             18.18
11.1     79.14      9.76      69.70             30.30
30.56    54.97      14.47     60.61             39.39
26.89    71.97      1.14      57.58             42.42
54.13    40.86      5.01      54.55             45.45
From the data in table 3, networks one through four have comparable results, except that
network one has a training time roughly an order of magnitude smaller than those of networks
two through four. Because network weights are initialized randomly, running the same
procedure with the same data more than once can produce slightly different results. Because of
this variation, values that are similar are considered to be the same.
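The selection rule described here (treat similar values as equal, then prefer the shorter training time) can be sketched in Python using the first five rows of Table 3. The tolerance of 2 percentage points is an illustrative assumption; the text does not state a numeric threshold.

```python
# First five rows of Table 3: (network, training time in s, GOOD %, REJECT %)
rows = [
    (1, 2036, 85.88, 10.48),
    (2, 13361, 85.81, 10.46),
    (3, 18569, 85.29, 11.16),
    (4, 19326, 84.74, 12.32),
    (5, 3179, 79.17, 18.13),
]


def comparable(rows, tolerance):
    # Networks whose GOOD % and REJECT % are both within `tolerance`
    # percentage points of the best values (tolerance is hypothetical)
    best_good = max(r[2] for r in rows)
    best_reject = min(r[3] for r in rows)
    return [r for r in rows
            if best_good - r[2] <= tolerance and r[3] - best_reject <= tolerance]


group = comparable(rows, tolerance=2.0)
fastest = min(group, key=lambda r: r[1])  # shortest training time wins
print([r[0] for r in group], fastest[0])
```

With this tolerance, networks one through four form the comparable group and network one is selected for its training time.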
Table 3: Feed-Forward Networks Sorted by descending GOOD% and ascending REJECT%
Network  TRAIN TIME (s)  GOOD %   REJECT %   WRONG %   REJECT (MULT) %   NUMBER %
1        2036            85.88    10.48      3.64      33.33             66.67
2        13361           85.81    10.46      3.73      33.33             66.67
3        18569           85.29    11.16      3.55      18.18             81.82
4        19326           84.74    12.32      2.94      36.36             63.64
5        3179            79.17    18.13      2.7       39.39             60.61
6        1131            75.61    18.89      5.5       18.18             81.82
7        861             73.49    21.13      5.38      15.15             84.85
8        1310            73.21    23.85      2.94      27.27             72.73
9        930             71.67    23.61      4.72      21.21             78.79
10       1192            70.51    26.55      2.94      30.30             69.70
11       658             70.08    22.6       7.32      30.30             69.70
12       8270            62.39    15.29      22.32     45.45             54.55
13       5715            54.13    40.86      5.01      54.55             45.45
14       1610            30.56    54.97      14.47     60.61             39.39
15       1399            30.32    69.21      0.47      84.85             15.15
16       2036            26.89    71.97      1.14      57.58             42.42
17       2255            24.59    73.94      1.47      81.82             18.18
18       5225            18.19    74.46      7.35      42.42             57.58
19       1421            13.49    78.75      7.76      81.82             18.18
20       1277            11.48    64.56      23.96     45.45             54.55
21       4129            11.1     79.14      9.76      69.70             30.30
22       1946            1.91     93.59      4.5       100.00            0.00
23       19452           0.02     99.97      0.01      100.00            0.00
24       5039            0        100        0         100.00            0.00
Networks seven and eleven have results that may not differ significantly from those of network
one, and their training times are less than half that of network one. The remaining networks
differ significantly from these networks and are not considered for use in WinBank. Table 4
contains the network parameters associated with networks one, seven, and eleven (numbered 1,
2, and 3 in that table).
Table 4: Network parameters associated with top three feed-forward networks
Network  Training Algorithm  Performance Function  Hidden Layer Size  Transfer Function  Training Time (s)
1        GDM                 SSE                   50                 logsig             2036
2        RP                  SSE                   25                 logsig             861
3        RP                  MSE                   25                 logsig             658
5.2 LVQ Network Results
A small number of LVQ networks were tested for comparison with feed-forward networks.
The results of the best LVQ network are shown in table 5. Although its accuracy is fairly high,
its inability to reject outputs leads to a high percentage of incorrect outputs. It cannot reject
outputs because an LVQ network produces binary output: every input is assigned to exactly one
class, with no graded output value that could be compared against a rejection threshold. This
inability to reject outputs, together with a relatively long training time, made it impractical to
pursue LVQ networks further.
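The nearest-prototype view of LVQ makes the rejection problem concrete. In the Python sketch below (the standard LVQ1 formulation, not the toolbox's newlvq implementation; the prototypes and learning rate are hypothetical), classification returns the class of whichever prototype is closest, so every input, however unlike the training data, receives some class label. The learning step moves the winning prototype toward a correctly classified input and away from a misclassified one.

```python
def nearest_prototype(x, prototypes):
    # prototypes: list of (weight vector, class label) pairs
    # Index of the prototype closest to x by squared Euclidean distance
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, prototypes[i][0])))


def classify(x, prototypes):
    # Always returns the winner's class: the output is a hard class
    # assignment, so there is no confidence value to threshold for rejection
    return prototypes[nearest_prototype(x, prototypes)][1]


def lvq1_step(x, label, prototypes, lr=0.1):
    # LVQ1 update: move the winner toward x if its class is correct, away otherwise
    i = nearest_prototype(x, prototypes)
    w, cls = prototypes[i]
    direction = 1.0 if cls == label else -1.0
    prototypes[i] = ([wj + direction * lr * (xj - wj) for wj, xj in zip(w, x)], cls)
```

A rejection option would require an additional rule, such as a hypothetical distance cutoff, outside the basic LVQ output.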
Table 5: Top LVQ network results
Performance Function  Hidden Layer Size  Train Time (s)  Accuracy (%)  Reject (%)  Wrong (%)
MSE                   50                 6475.88         73            0           27
5.3 Elman Network Results
A small number of Elman networks were tested. Table 6 contains results from the best
Elman network produced. Low accuracy and long training time were the reasons that further
testing of Elman networks was abandoned.
Table 6: Top Elman network results
Performance Function  Hidden Layer Size  Layer 1 Transfer Function  Layer 2 Transfer Function  Train Time (s)  Accuracy (%)  Reject (%)  Wrong (%)
MSE                   50                 logsig                     logsig                     8091.24         21.18         0           78.82
6. Conclusion
The top network for use in WinBank has a hidden layer of 50 nodes, uses the
logarithmic-sigmoidal transfer function at the hidden and output layers, and uses the GDM
training algorithm in combination with the SSE performance function. This combination took
2036 seconds to train and achieved an accuracy of 85 percent, while rejecting only 10 percent of
its outputs. Although the other two networks of table 4 produce similar output and take much
less time to train, they are approximately 12 to 16 percentage points less accurate and reject
approximately 11 to 12 percentage points more outputs. It is unlikely that the network will need
to be retrained very often, which makes the longer training time of network 1 in table 4
insignificant. If the application changes and requires the network to be trained more often, then
the top three networks should be tested several times and evaluated according to the averages of
the values obtained. Averaging reduces the run-to-run variation caused by random weight
initialization, so that small differences between the networks become meaningful and the
appropriate network can be chosen. However, because network training does not currently need
to occur often, network 1 in table 4 is the best network to use in WinBank.
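The repeated-testing procedure suggested above could be organized as in the following Python sketch. The data layout, the number of runs, and the tie-breaking rule are assumptions for illustration; reporting the standard deviation alongside the mean shows whether a difference between configurations exceeds the run-to-run variation.

```python
from statistics import mean, stdev


def summarize(runs):
    # runs: list of (GOOD %, REJECT %) pairs from repeated train/test cycles of
    # one configuration, each started from different random initial weights
    goods = [g for g, _ in runs]
    rejects = [r for _, r in runs]
    return {"GOOD mean": mean(goods),
            "GOOD sd": stdev(goods) if len(goods) > 1 else 0.0,
            "REJECT mean": mean(rejects)}


def choose(configs):
    # configs: dict mapping a configuration name to its list of runs.
    # Highest mean GOOD % wins; ties go to the lower mean REJECT %.
    def key(name):
        s = summarize(configs[name])
        return (s["GOOD mean"], -s["REJECT mean"])
    return max(configs, key=key)


# Usage (runs_1, runs_2, runs_3 would be hypothetical repeated results):
# choose({"network 1": runs_1, "network 2": runs_2, "network 3": runs_3})
```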
References
[1] Demuth, Howard and Beale, Mark. “Neural Network Toolbox.” (2001)
[2] Sinha, Anshu. “An Improved Recognition Module for the Identification of Handwritten Digits.” Master’s Thesis, Massachusetts Institute of Technology. (1999)
[3] Winston, Patrick. “Artificial Intelligence.” (1992)
[4] Palacios, Rafael and Gupta, Amar. “A System for Processing Handwritten Bank Checks Automatically.” Working paper 4346-02.
Appendix A: MATLAB Code
The following functions can be used to create, train, and test the neural networks
described above. For more information, see the Neural Network Toolbox [1].
Creation
Feed-Forward Networks
newff(mm, sizeArray, transferFunctionCellArray, trainingAlgorithm);
LVQ Networks
newlvq(mm, hiddenLayerSize, percentages);
Elman Networks
newelm(mm, sizeArray, transferFunctionCellArray);
mm: Matrix of size number_of_inputs x 2. Each row contains the minimum and maximum
value that a particular input node can have.
sizeArray: array that contains the size of each layer (not including the input layer)
transferFunctionCellArray: cell array that contains strings representing the transfer functions
for each layer (not including the input layer).
Transfer function        MATLAB string
logarithmic-sigmoidal    logsig
tangential-sigmoidal     tansig
hard limit               hardlim
linear                   purelin
competitive              (automatic for appropriate layer)
trainingAlgorithm: A string representing the training algorithm for the network.
Training algorithm                     MATLAB string
Batch Gradient Descent with Momentum   traingdm
Resilient Backpropagation              trainrp
BFGS                                   trainbfg
Levenberg-Marquardt                    trainlm
Random                                 trainr
hiddenLayerSize: The size of the hidden layer
percentages: vector of the expected fraction of inputs belonging to each class.
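For reference, the transfer functions in the table above compute the following. This Python sketch follows the formulas given in the Neural Network Toolbox documentation [1] for logsig, tansig, hardlim, and purelin; it does no overflow guarding for very large negative inputs and is meant only as an illustration.

```python
import math


def logsig(n):
    # Logarithmic sigmoid: output in (0, 1)
    return 1.0 / (1.0 + math.exp(-n))


def tansig(n):
    # Tan-sigmoid: output in (-1, 1); mathematically equal to tanh(n)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def hardlim(n):
    # Hard limit: 1 for n >= 0, otherwise 0
    return 1.0 if n >= 0 else 0.0


def purelin(n):
    # Linear: output equals input
    return n
```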
Training
[net, tr] = train(net, trainData, T, [], [], VV);
net: neural network to be trained
trainData: training data set
T: desired output for each input
VV: structure containing the validation inputs and targets
Testing
output = sim(net, testData);
net: neural network to be tested
testData: testing data set