Pattern Recognition – definition

The Basic Task
Assign an object to one of a set of classes
Examples
Character Recognition
Classify a new image as an alphanumeric character
Face Recognition
Classify a new image as the face of one of a class of persons
Stock Market Prediction
Given a set of financial data, predict whether the stock market will rise or fall
Earthquake Prediction
Given a set of geological observations, predict when an earthquake will occur
Fault diagnosis
Given a set of tests on a piece of equipment, diagnose what’s wrong with it
Medical diagnosis
Given a set of medical symptoms, diagnose what disease a patient has
Autonomous vehicle control
Given data about a vehicle and its environment, determine its next moves
1.1 Definitions
All the above situations have the following things in common:
a set of objects:
alphanumeric characters
faces
states of the stock market
geological situations
faulty pieces of equipment
sick patients
state of vehicle and its environment
a set of features describing each object:
pixels
financial data
seismic observations
tests on equipment
medical symptoms
a set of classes to which each object is to be assigned:
‘A’ ‘B’ ‘C’ … etc.
Fred’s face, Joe’s face, Mary’s face … etc.
Stock market rise, stock market fall
Earthquake 0 on Richter scale, 1 on Richter scale ….
Capacitor faulty, transistor faulty …
Lung cancer, stomach cancer, heart disease,…
Move left, move right, speed up, slow down, etc…
1.2 Central Task
We must design an algorithm which takes the features as input and produces
the class as output:

features → classifier → class
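At its simplest, a classifier is just a function from a feature vector to a class label. The sketch below illustrates the idea with a hand-made rule for the stock-market example; the features, weights and threshold are invented for illustration, not part of any real trading model.

```python
# Minimal sketch of the central task: features in, class out.
# The features and weights here are hypothetical.

def classify(features):
    """Toy stock-market classifier: features = (price_change, volume_change)."""
    price_change, volume_change = features
    # A hand-made decision rule: predict "rise" if the weighted
    # evidence is positive, "fall" otherwise.
    score = 0.7 * price_change + 0.3 * volume_change
    return "rise" if score > 0 else "fall"

print(classify((0.5, 0.2)))   # → rise
print(classify((-0.4, 0.1)))  # → fall
```

In a real system the rule inside `classify` is not written by hand: it is learned from data, as described in section 1.5.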
1.3 Features
Types of Feature
Continuous – real-valued
Discrete – integer
Binary
Categorical – non-ordered
Feature Space
In image analysis the features are the pixel values. These are usually integer-valued (e.g. grey levels 0–255). Let the number of pixels be N.
Then each image can be described by a vector of integer values – let us call
it the feature vector
Each element of the vector can be interpreted as a dimension of an N-dimensional space – let us call this feature space
Each image can be represented as a point in this space. Its co-ordinates are
given by its pixel values.
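The feature-vector view can be made concrete with a tiny example: a 2×2 grey-level image flattened into a point in a 4-dimensional feature space. The pixel values below are made up.

```python
import math

# A 2x2 grey-level image (values 0-255), as rows of pixels.
image = [
    [ 12, 200],
    [255,  34],
]

# Flatten the rows into one feature vector of length N = 4.
feature_vector = [pixel for row in image for pixel in row]
print(feature_vector)  # → [12, 200, 255, 34]

# The image is now a point in 4-D feature space, so the similarity of
# two images can be measured as the Euclidean distance between points.
other = [10, 198, 250, 40]
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature_vector, other)))
```

Distances like this one are the basic ingredient of the cluster-based view of classification described next.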
1.4 Clusters
Everything in this course depends on the following assumption:
The points which represent different images belonging to the same class
will fall into clusters within feature space
Most classification algorithms are based on computing boundaries within
feature space which separate the regions within which the different classes
lie.
These boundaries are called decision surfaces.
Pattern recognition algorithms differ in how they draw these boundaries.
Some draw linear boundaries.
Some draw quadratic boundaries.
Some draw more complex boundaries.
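A decision surface can be represented as a function g of the features whose sign tells us which side of the boundary a point lies on. The sketch below shows a linear and a quadratic boundary in a 2-D feature space; the coefficients are made up, since in practice the learning phase would set them.

```python
# Decision surfaces as sign-of-g functions. Points with g > 0 are on
# one side of the surface, points with g < 0 on the other.
# All coefficients below are hypothetical.

def linear_boundary(x, y):
    # A straight line: x + 2y - 3 = 0.
    return 1.0 * x + 2.0 * y - 3.0

def quadratic_boundary(x, y):
    # A circle of radius 2 centred at the origin: x^2 + y^2 - 4 = 0.
    return x ** 2 + y ** 2 - 4.0

def classify(g, x, y):
    return "class A" if g(x, y) > 0 else "class B"

print(classify(linear_boundary, 2, 2))     # → class A (g = 3 > 0)
print(classify(quadratic_boundary, 1, 1))  # → class B (inside the circle)
```

More complex boundaries just mean more complex functions g; the sign test stays the same.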
1.5 The Two Phases of Pattern Recognition:
Learning and Classification
Every pattern recognition algorithm has two phases.
Learning phase
Classification phase
During the learning phase we give the algorithm a set of data with known
classes. The algorithm uses this data to decide where to put the decision
surfaces.
During the classification phase we present the algorithm with new data
which it hasn’t seen before. It then uses the decision surfaces to decide the
class of the new data.
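The two phases can be illustrated with a nearest-centroid classifier, used here purely as a simple stand-in for the real algorithms covered later: the learning phase computes one centroid per class from labelled data, and the classification phase assigns new data to the nearest centroid. The training points below are invented.

```python
def learn(training_data):
    """Learning phase: compute one centroid per class from labelled data."""
    sums, counts = {}, {}
    for features, label in training_data:
        counts[label] = counts.get(label, 0) + 1
        sums[label] = [s + f for s, f in
                       zip(sums.get(label, [0] * len(features)), features)]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Classification phase: assign new data to the nearest centroid."""
    def dist2(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Hypothetical labelled training data: (features, class).
training = [((0, 0), "fall"), ((1, 0), "fall"),
            ((5, 5), "rise"), ((6, 4), "rise")]
centroids = learn(training)

print(classify(centroids, (5, 4)))  # → rise
print(classify(centroids, (0, 1)))  # → fall
```

Here the "decision surface" is implicit: it is the set of points equidistant from the two centroids.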
1.6 Common Themes
Noise – all data (both training data and testing data) contains noise
Occam’s razor – we mustn’t make the decision surfaces more complicated
than is justified by the training data
Probabilistic reasoning – we can never classify a new image with 100%
certainty, we can only give a probability of its belonging to each class.
Accuracy – even the best pattern recognition algorithm makes mistakes. It
will occasionally classify a new image into the wrong class.
Misclassifications can occur for various reasons:
Noise in the data
Decision boundaries being incorrectly set
Clusters overlapping in feature space
We are continually in the presence of uncertainty. Because our training data
is necessarily incomplete and contains noise, we can never be certain that our
decision boundaries are correct. Even if the decision boundaries were correct,
we could still misclassify a new image because of noise or because the
clusters overlap. Hence the need for probabilistic reasoning.
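One common way to express this uncertainty, sketched below, is to have the classifier output a probability for each class rather than a single hard label, here by passing made-up per-class scores through a softmax. This is one illustrative choice, not the only way to produce class probabilities.

```python
import math

def class_probabilities(scores):
    """Convert raw per-class scores into probabilities via softmax."""
    exps = {label: math.exp(score) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical scores from a medical-diagnosis classifier.
probs = class_probabilities({"lung cancer": 2.0,
                             "stomach cancer": 0.5,
                             "heart disease": 0.1})
print(probs)  # probabilities sum to 1; none is ever exactly 1.0
```

The highest-scoring class gets the highest probability, but never probability 1: the classifier always admits some chance of the other classes.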
1.7 Noise
All data contains noise.
In images this is usually caused by thermal agitation in the camera’s
photodetectors. Therefore two images of the same object are never identical.
There will always be some random variation in the pixel values.
If we were to take many images of the same object and plot them in feature
space, they would form a spherical cluster whose radius is proportional to
the amount of noise in the images.
This cluster may straddle a decision boundary, causing the image to be put
sometimes into one class and sometimes into another.
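This effect is easy to simulate: generate many noisy copies of the same "image" and measure how far they scatter from the noise-free original. Gaussian pixel noise is assumed here as the noise model, and the pixel values are made up.

```python
import math
import random

random.seed(0)  # deterministic run for illustration

def mean_distance_from_centre(true_pixels, noise_sigma, copies=500):
    """Average distance of noisy copies from the noise-free image."""
    total = 0.0
    for _ in range(copies):
        # Add independent Gaussian noise to every pixel.
        noisy = [p + random.gauss(0, noise_sigma) for p in true_pixels]
        total += math.sqrt(sum((n - p) ** 2
                               for n, p in zip(noisy, true_pixels)))
    return total / copies

image = [100, 150, 200, 50]  # a hypothetical 4-pixel image
low = mean_distance_from_centre(image, noise_sigma=1.0)
high = mean_distance_from_centre(image, noise_sigma=5.0)
print(low < high)  # → True: more noise gives a wider cluster
```

The cluster radius grows with the noise level, exactly as described above.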
1.8 Overfitting – Occam’s razor
We must not make the decision boundaries too complex
Is the curve in this boundary justified or not? With so few data points it is
difficult to tell. It is possible that if we had a bigger sample there would
be some data points on the other side of the boundary.
As the amount of data increases so does our confidence that the decision
boundary is actually curved.
However, the following situation shows the opposite effect.
The wiggles shown in this boundary are almost certainly not justified by this
data.
This phenomenon is called overfitting. It occurs when we try to fit the
decision boundary to effects in the data which are probably due to noise.
To prevent overfitting we should use simple models for our decision
boundaries (such as straight lines or simple curves). We should use complex
models only when we have sufficient data to justify it.
In general the model of the decision boundary should not contain more
parameters than there are objects in the data set – this is Occam’s razor
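Overfitting can be demonstrated with curve fitting, which is the same trade-off in one dimension. Below, data drawn from a straight line plus noise is fitted two ways: a 2-parameter least-squares line, and a polynomial forced exactly through all five points (5 parameters for 5 points, violating the rule above). The data values are made up.

```python
# Hypothetical data: roughly y = x, plus noise.
xs = [0, 1, 2, 3, 4]
ys = [0.1, 1.2, 1.9, 3.2, 3.9]

# Simple model: least-squares straight line (2 parameters).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def line(x):
    return slope * x + intercept

def interpolant(x):
    """Complex model: Lagrange polynomial through all 5 points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# On a new point from the true line y = x, the simple model is far closer:
# the interpolant has chased the noise instead of the trend.
x_new, y_true = 5, 5.0
print(abs(line(x_new) - y_true), abs(interpolant(x_new) - y_true))
```

The interpolant fits the training data perfectly yet predicts badly; the straight line fits the training data imperfectly yet generalises well. That is Occam's razor in action.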
This problem becomes very severe when we have a large number of
dimensions – the “Curse of Dimensionality”. We shall see how we can reduce
the number of dimensions using a technique called Principal Components
Analysis.
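As a preview, the sketch below applies Principal Components Analysis via the eigendecomposition of the covariance matrix, using numpy. The toy data is 2-D but lies almost on a line, so one principal component captures nearly all the variation and the data can be reduced to 1-D with little loss; the data-generating recipe is invented for illustration.

```python
import numpy as np

# Hypothetical 2-D data with essentially 1-D structure:
# the second feature is ~2x the first, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
data = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=100)])

# PCA: centre the data, then eigendecompose its covariance matrix.
centred = data - data.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending eigenvalues

# Project onto the top principal component: 2 dimensions -> 1.
top_component = eigenvectors[:, -1]
reduced = centred @ top_component
print(reduced.shape)  # → (100,)

# Fraction of the total variance kept by the first component.
print(eigenvalues[-1] / eigenvalues.sum())  # close to 1 for this data
```

Dropping the low-variance directions in this way is how PCA fights the curse of dimensionality.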