8.1. How to use a multi-layer neural net for recognition?
(Translation by Piotr Ciskowski; piotr.ciskowski@pwr.wroc.pl)
Multi-layer neural nets, which you studied thoroughly in the previous chapter, may be used for
many tasks. Nevertheless, if we want to analyze their properties and the way they work, the most appropriate
area is image recognition. Image recognition is a problem in which a neural net (or another kind of learning
machine) decides which classes given images belong to. The analyzed objects may be of various kinds,
from digital camera images to scanned or grabbed analog images. We present a comparison of a digital and
an analog image in fig. 8.1, in order to illustrate what we are talking about, but also to unsettle those readers
who are too confident that “digital” always means “better”.
Fig. 8.1. A comparison of an analog and a digital image (panels: analog image, digital image)
This book, however, is about neural nets, not about images, so we are not going to go deep into the theory of
image recognition, especially in the sense of picture recognition. Still, it is worth noting that the problem of
recognizing images, namely pictures, actually started this field of research and gave it its name. The first
(historical!) neural net, built by Frank Rosenblatt (presented in fig. 8.2), was used to recognize images, and that is
why it was called a “Perceptron”. Please regard this picture (even if quite ancient and of poor quality) with
proper respect, as a relic of one of the first achievements in the discipline we are studying in this book.
Fig. 8.2. Rosenblatt’s “perceptron”, the first neural network recognizing images
The meaning of the word “image” has been generalized so much that neural nets are now used for
recognizing samples of sound signals (e.g. spoken commands), seismic or other geophysical signals (when
searching for geological ledges), symptoms of patients to be diagnosed, scores of companies applying for loans,
and much more. We consider all these tasks to be image recognition, even though the “images” mentioned above are in
fact, respectively, acoustic, geophysical, diagnostic, economic, or of some other kind.
A neural net used for image recognition usually has several inputs, supplied with signals
representing features of the objects being recognized. These may be, for example, coefficients describing
the shape of a machine part or the texture of liver tissue. We often use many inputs because we want to show the
neural net all the features of the analyzed object, so that the net can learn properly how to recognize it.
However, the number of image features is much smaller than the number of image elements (pixels). If you supply
the net with the raw digital image, the number of its inputs will run into hundreds of thousands, or even a
few million! That is why we practically never use neural nets for analyzing raw images. The nets usually “see”
the features of the analyzed image, extracted by other, independent programs outside the net. This image
preparation is sometimes called “preprocessing”.
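To make the difference in scale concrete, here is a minimal Python sketch. The image sizes, the function names and the three shape features are assumptions chosen only for illustration; they are not taken from any particular preprocessing program.

```python
def raw_input_count(width, height):
    """Number of net inputs needed if every pixel of a grayscale image is one input."""
    return width * height


def extract_features(image):
    """Hypothetical preprocessing: reduce a binary image (rows of 0/1 values)
    to three shape features: object area, bounding-box width and height."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    area = sum(sum(row) for row in image)
    height = rows[-1] - rows[0] + 1 if rows else 0
    width = cols[-1] - cols[0] + 1 if cols else 0
    return [area, width, height]


# Illustrative image sizes (assumed, not from the text).
print(raw_input_count(640, 480))     # hundreds of thousands of inputs
print(raw_input_count(1920, 1080))   # a few million inputs

# A tiny 4x4 binary "image" with a three-pixel object.
tiny_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
features = extract_features(tiny_image)
print(len(features), features)       # only three inputs reach the net
```

Whatever the size of the image, the net in this sketch would receive only as many inputs as there are features.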
A neural net used for recognition usually has several outputs as well. Generally speaking, each
output is assigned to a specific class. For example, an OCR (optical character recognition) system may use over
60 outputs, each assigned to a certain character: the first output neuron indicates the letter A, the second neuron
the letter B, and so on. We discussed this in chapter 2; if you would like to recall the possible output signals of a
net used for recognition, go back to figures 2.30, 2.31 and 2.33.
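The "one output per class" convention can be sketched in a few lines of Python. The class list below (capital letters, small letters and digits, 62 classes in total) is only an assumed example of an alphabet with over 60 characters, and choosing the strongest output is one simple way of reading the answer, not necessarily the one used in a given OCR system.

```python
import string

# Assumed class list: output 0 stands for "A", output 1 for "B", and so on.
CLASSES = list(string.ascii_uppercase + string.ascii_lowercase + string.digits)


def target_vector(label):
    """Desired output signals: 1.0 on the neuron assigned to `label`, 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in CLASSES]


def decode(outputs):
    """Read the net's answer as the class whose output neuron gives the strongest signal."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]


# A made-up output vector in which the second neuron responds most strongly.
outputs = [0.05] * len(CLASSES)
outputs[1] = 0.9
print(len(CLASSES), decode(outputs))
```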
There is usually at least one hidden layer of neurons between the input and the output of the net, and we
will now study the hidden layer in more detail. Generally speaking, there may be many neurons in the hidden
layer or just a few of them. After a brief analysis of the processes going on in neural nets, we usually think that
more neurons in the hidden layer give a more “clever” net. However, you will soon learn that it is not always
worth having a net with large “built-in intelligence”, as it sometimes turns out to be surprisingly disobedient!
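The structure described here (inputs, one hidden layer, outputs) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the logistic (sigmoid) neuron, the random initial weights, the layer sizes and all function names are chosen for the example and are not prescribed by the text. The sketch shows that the size of the hidden layer is a free design parameter and that the number of weights to be learned grows with it.

```python
import math
import random


def make_layer(n_inputs, n_neurons, rng):
    """Random weights for one layer; the last weight in each row acts as a bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def build_net(n_inputs, n_hidden, n_outputs, seed=0):
    """A net with exactly one hidden layer: [hidden layer, output layer]."""
    rng = random.Random(seed)
    return [make_layer(n_inputs, n_hidden, rng),
            make_layer(n_hidden, n_outputs, rng)]


def layer_output(weights, signals):
    """Each neuron: weighted sum of its inputs plus bias, passed through a sigmoid."""
    result = []
    for w in weights:
        s = sum(wi * xi for wi, xi in zip(w[:-1], signals)) + w[-1]
        result.append(1.0 / (1.0 + math.exp(-s)))
    return result


def forward(net, features):
    """Pass the feature signals through the hidden layer and then the output layer."""
    signals = features
    for weights in net:
        signals = layer_output(weights, signals)
    return signals


def count_weights(net):
    """Total number of adjustable weights (including biases) in the net."""
    return sum(len(row) for layer in net for row in layer)


# Same inputs and outputs, different hidden layer sizes (assumed values).
small = build_net(n_inputs=5, n_hidden=3, n_outputs=4)
large = build_net(n_inputs=5, n_hidden=30, n_outputs=4)
print(count_weights(small), count_weights(large))
print(forward(small, [0.2, 0.4, 0.1, 0.9, 0.5]))
```

Both nets accept the same five features and produce four output signals; only the number of hidden neurons, and with it the number of weights the learning process has to set, differs.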