Neural Network Application Design

April 12, 2016
Introduction to Artificial Intelligence
Lecture 19: Neural Network Application Design II
NN Application Design
Now that we got some insight into the theory of
artificial neural networks, how can we design networks
for particular applications?
Designing NNs is basically an engineering task.
As we discussed before, for example, there is no
formula that would allow you to determine the optimal
number of hidden units in a BPN for a given task.
NN Application Design
We need to address the following issues for a
successful application design:
• Choosing an appropriate data representation
• Performing an exemplar analysis
• Training the network and evaluating its performance
We are now going to look into each of these topics.
Data Representation
• Most networks process information in the form of
input pattern vectors.
• These networks produce output pattern vectors
that are interpreted by the embedding application.
• All networks process one of two types of signal
components: analog (continuously variable) signals
or discrete (quantized) signals.
• In both cases, signals have a finite amplitude; their
amplitude has a minimum and a maximum value.
Data Representation
The main question is:
How can we appropriately capture these signals and
represent them as pattern vectors that we can feed
into the network?
We should aim for a data representation scheme that
maximizes the ability of the network to detect (and
respond to) relevant features in the input pattern.
Relevant features are those that enable the network to
generate the desired output pattern.
Data Representation
Similarly, we also need to define a set of desired
outputs that the network can actually produce.
Often, a “natural” representation of the output data
turns out to be impossible for the network to produce.
We are going to consider internal representation
and external interpretation issues as well as specific
methods for creating appropriate representations.
Internal Representation Issues
As we said before, in all network types, the amplitude
of input signals and internal signals is limited:
• analog networks: values usually between 0 and 1
• binary networks: only the values 0 and 1 are allowed
• bipolar networks: only the values –1 and 1 are allowed
Without this limitation, patterns with large amplitudes
would dominate the network’s behavior.
A disproportionately large input signal can activate a
neuron even if the relevant connection weight is very
small.
Creating Data Representations
The patterns that can be represented by an ANN most
easily are binary patterns.
Even analog networks “like” to receive and produce
binary patterns – we can simply round values < 0.5 to
0 and values ≥ 0.5 to 1.
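As a quick sketch of this rounding rule in Python:

```python
# Threshold analog activations into binary values at 0.5.
def binarize(values, threshold=0.5):
    """Round each value to 0 if below the threshold, else to 1."""
    return [1 if v >= threshold else 0 for v in values]

binarize([0.2, 0.5, 0.9])  # -> [0, 1, 1]
```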
To create a binary input vector, we can simply list all
features that are relevant to the current task.
Each component of our binary vector indicates
whether one particular feature is present (1) or
absent (0).
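A minimal sketch of such a binary feature vector; the feature names here are made-up examples:

```python
# A fixed list of task-relevant features (hypothetical examples).
FEATURES = ["has_wings", "has_feathers", "lays_eggs", "can_fly"]

def to_binary_vector(present_features):
    """Return a 0/1 vector: 1 if a feature is present, 0 if absent."""
    return [1 if f in present_features else 0 for f in FEATURES]

to_binary_vector({"has_wings", "has_feathers", "lays_eggs"})
# -> [1, 1, 1, 0]
```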
Creating Data Representations
With regard to output patterns, most binary-data
applications perform classification of their inputs.
The output of such a network indicates to which class
of patterns the current input belongs.
Usually, each output neuron is associated with one
class of patterns.
For any input, only one output neuron should be
active (1) and the others inactive (0), indicating the
class of the current input.
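A sketch of such a target vector, with one output neuron per class (the class names are hypothetical):

```python
# One output neuron per class: the target is 1 at the class index
# and 0 everywhere else (a "one-hot" target).
CLASSES = ["circle", "square", "triangle"]

def one_hot_target(class_name):
    return [1 if c == class_name else 0 for c in CLASSES]

one_hot_target("square")  # -> [0, 1, 0]
```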
Creating Data Representations
In other cases, classes are not mutually exclusive,
and more than one output neuron can be active at the
same time.
Another variant would be the use of binary input
patterns and analog output patterns for
“classification”.
In that case, again, each output neuron corresponds
to one particular class, and its activation indicates the
probability (between 0 and 1) that the current input
belongs to that class.
Creating Data Representations
For non-binary (e.g., ternary) features:
• Use multiple binary inputs to represent non-binary
states (e.g., 001 for “red”, 010 for “green”, 100 for
“blue” for representing three possible colors).
• Treat each feature in the pattern as an individual
subpattern.
• Represent each subpattern with as many positions
(units) in the pattern vector as there are possible
states for the feature.
• Then concatenate all subpatterns into one long
pattern vector.
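The steps above can be sketched as follows; the feature definitions are made-up examples:

```python
# Each non-binary feature becomes a subpattern with one unit per
# possible state; the subpatterns are concatenated into one vector.
FEATURE_STATES = {
    "color": ["red", "green", "blue"],
    "size":  ["small", "large"],
}

def encode(sample):
    vector = []
    for feature, states in FEATURE_STATES.items():
        # One-hot subpattern for this feature's current state.
        vector += [1 if sample[feature] == s else 0 for s in states]
    return vector

encode({"color": "green", "size": "large"})  # -> [0, 1, 0, 0, 1]
```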
Creating Data Representations
Another way of representing n-ary data in a neural
network is using one neuron per feature, but scaling
the (analog) value to indicate the degree to which a
feature is present.
Good examples:
• the brightness of a pixel in an input image
• the output of an edge filter
Poor examples:
• the letter (1 – 26) of a word
• the type (1 – 6) of a chess piece
Creating Data Representations
This can be explained as follows:
The way NNs work (both biological and artificial ones)
is that each neuron represents the
presence/absence of a particular feature.
Activations 0 and 1 indicate absence or presence of
that feature, respectively, and in analog networks,
intermediate values indicate the extent to which a
feature is present.
Consequently, a small change in one input value
leads to only a small change in the network’s
activation pattern.
Creating Data Representations
Therefore, it is appropriate to represent a non-binary feature by
a single analog input value only if this value is scaled, i.e., it
represents the degree to which a feature is present.
This is the case for the brightness of a pixel or the output of an
edge detector.
It is not the case for letters or chess pieces.
For example, assigning values to individual letters (a = 0, b =
0.04, c = 0.08, …, z = 1) implies that a and b are in some way
more similar to each other than are a and z.
Obviously, in most contexts, this is not a reasonable
assumption.
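A small check illustrates the point: under this letter encoding, a and b come out far closer than a and z, a similarity that has no meaning for most tasks:

```python
# Evenly spaced analog values for letters (a = 0.0, b = 0.04, ..., z = 1.0)
# impose a spurious notion of similarity between letters.
def letter_value(ch):
    return (ord(ch) - ord('a')) / 25

gap_ab = abs(letter_value('b') - letter_value('a'))  # 0.04
gap_az = abs(letter_value('z') - letter_value('a'))  # 1.0
```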
Creating Data Representations
It is also important to note that in artificial (unlike
natural!), fully connected networks, the order of the
features in your input vectors does not influence the
outcome.
For the network's performance, it is not necessary to
represent, for example, similar features in
neighboring input units.
All units are treated equally; the neighborhood of two
neurons does not suggest to the network that they
represent similar features.
Of course, once you have specified a particular order,
you cannot change it during training or testing.
Exemplar Analysis
When building a neural network application, we must
make sure that we choose an appropriate set of
exemplars (training data):
• The entire problem space must be covered.
• There must be no inconsistencies (contradictions)
in the data.
• We must be able to correct such problems without
compromising the effectiveness of the network.
Ensuring Coverage
For many applications, we do not just want our
network to classify any kind of possible input.
Instead, we want our network to recognize whether an
input belongs to one of the given classes or whether it
is “garbage” that cannot be classified.
To achieve this, we train our network with both
“classifiable” and “garbage” data (null patterns).
For the null patterns, the network is supposed to
produce a zero output or to activate a designated
“null neuron”.
Ensuring Coverage
In many cases, we use a 1:1 ratio for this training,
that is, we use as many null patterns as there are
actual data samples.
We have to make sure that all of these exemplars
taken together cover the entire input space.
If it is certain that the network will never be presented
with “garbage” data, then we do not need to use null
patterns for training.
Ensuring Consistency
Sometimes there may be conflicting exemplars in our
training set.
A conflict occurs when two or more identical input
patterns are associated with different outputs.
Why is this problematic?
Ensuring Consistency
Assume a BPN with a training set including the
exemplars (a, b) and (a, c).
Whenever the exemplar (a, b) is chosen, the network
adjusts its weights to produce an output for a that is
closer to b.
Whenever (a, c) is chosen, the network changes its
weights for an output closer to c, thereby “unlearning”
the adaptation for (a, b).
In the end, the network will associate input a with an
output that is “between” b and c but is neither
exactly b nor c, so the network error caused by these
exemplars will not decrease.
For many applications, this is undesirable.
Ensuring Consistency
To identify such conflicts, we can sort the exemplars
and search for identical input patterns that are
associated with different outputs.
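A sketch of such conflict detection; it uses a dictionary lookup rather than sorting and searching, but it finds the same conflicts:

```python
# Detect conflicting exemplars: identical input patterns that are
# mapped to different outputs.
def find_conflicts(exemplars):
    seen = {}        # input pattern -> first output observed for it
    conflicts = []
    for x, y in exemplars:
        if x in seen and seen[x] != y:
            conflicts.append(x)
        seen.setdefault(x, y)
    return conflicts

find_conflicts([("0011", "0101"), ("0011", "0010")])  # -> ["0011"]
```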
How can we resolve an identified conflict?
Of course, the easiest way is to eliminate the
conflicting exemplars from the training set.
However, this reduces the amount of training data that
is given to the network.
Eliminating exemplars is the best way to go if it is
found that these exemplars represent invalid data, for
example, inaccurate measurements.
In general, however, other methods of conflict
resolution are preferable.
Ensuring Consistency
Another method combines the conflicting patterns.
For example, if we have exemplars
(0011, 0101),
(0011, 0010),
we can replace them with the following single exemplar:
(0011, 0111).
The way we compute the output vector of the new exemplar
based on the two original output vectors depends on the
current task.
It should be the value that is most “similar” (in terms of the
external interpretation) to the original two values.
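For binary outputs, the combination in this example amounts to a bitwise OR; whether OR is the right merge depends on the external interpretation. A sketch:

```python
# Merge the output vectors of two conflicting exemplars with a
# bitwise OR, reproducing the (0101, 0010) -> 0111 example above.
def merge_outputs(a, b):
    return "".join("1" if x == "1" or y == "1" else "0"
                   for x, y in zip(a, b))

merge_outputs("0101", "0010")  # -> "0111"
```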
Ensuring Consistency
Alternatively, we can alter the representation scheme.
Let us assume that the conflicting measurements were taken at
different times or places.
In that case, we can just expand all the input vectors, and the
additional values specify the time or place of measurement.
For example, the exemplars
(0011, 0101),
(0011, 0010)
could be replaced by the following ones:
(100011, 0101),
(010011, 0010).
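A sketch of this expansion, prefixing each input with a one-hot context code (here, for illustration, one position per exemplar):

```python
# Expand each input vector with a one-hot context prefix (e.g. the
# time or place of measurement) so formerly identical inputs differ.
def add_context(exemplars, n_contexts):
    expanded = []
    for i, (x, y) in enumerate(exemplars):
        prefix = "".join("1" if j == i else "0" for j in range(n_contexts))
        expanded.append((prefix + x, y))
    return expanded

add_context([("0011", "0101"), ("0011", "0010")], 2)
# -> [("100011", "0101"), ("010011", "0010")]
```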
Ensuring Consistency
One advantage of altering the representation scheme
is that this method cannot create any new conflicts.
Expanding the input vectors cannot make two or more
of them identical if they were not identical before.
Training and Performance Evaluation
A more insightful way of evaluating performance than
measuring the error on the training data is
partial-set training.
The idea is to split the available data into two sets –
the training set and the test set.
The network’s performance on the second set
indicates how well the network has actually learned
the desired mapping.
We should expect the network to interpolate, but not
extrapolate.
Therefore, this test also evaluates our choice of
training samples.
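A minimal sketch of such a split; the test fraction and the use of a fixed random seed are arbitrary choices for illustration:

```python
import random

# Partial-set training: hold out part of the data as a test set and
# judge the network by its error on exemplars it never saw in training.
def split_data(exemplars, test_fraction=0.2, seed=0):
    data = list(exemplars)
    random.Random(seed).shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    return data[n_test:], data[:n_test]   # (training set, test set)
```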
Training and Performance Evaluation
If the test set only contains one exemplar, this type of
training is called “hold-one-out” training.
It is to be performed sequentially for every individual
exemplar.
This, of course, is a very time-consuming process.
A less extreme version of hold-one-out training is
cross validation, in which we split the dataset into n
subsets.
Each subset serves as the test set once, with the
other (n – 1) subsets forming the training set.
This means that n training processes are performed;
this is referred to as n-fold cross validation.
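The procedure can be sketched as follows:

```python
# n-fold cross-validation: split the data into n subsets; each subset
# serves as the test set exactly once, with the remaining n - 1
# subsets forming the training set.
def cross_validation_folds(exemplars, n):
    folds = [exemplars[i::n] for i in range(n)]
    for k in range(n):
        test_set = folds[k]
        training_set = [x for j, fold in enumerate(folds) if j != k
                        for x in fold]
        yield training_set, test_set
```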