Pattern Recognition – Definition

The Basic Task: assign an object to one of a set of classes.

Examples:
- Character recognition: classify a new image as an alphanumeric character.
- Face recognition: classify a new image as the face of one of a set of known persons.
- Stock market prediction: given a set of financial data, predict whether the stock market will rise or fall.
- Earthquake prediction: given a set of geological observations, predict when an earthquake will occur.
- Fault diagnosis: given a set of tests on a piece of equipment, diagnose what is wrong with it.
- Medical diagnosis: given a set of medical symptoms, diagnose what disease a patient has.
- Autonomous vehicle control: given data about a vehicle and its environment, determine its next moves.

1.1 Definitions

All the above situations have the following things in common:

- A set of objects: alphanumeric characters; faces; states of the stock market; geological situations; faulty pieces of equipment; sick patients; states of a vehicle and its environment.
- A set of features describing each object: pixels; financial data; seismic observations; tests on equipment; medical symptoms.
- A set of classes to which each object is to be assigned: 'A', 'B', 'C', ...; Fred's face, Joe's face, Mary's face, ...; stock market rise, stock market fall; earthquake of 0 on the Richter scale, 1 on the Richter scale, ...; capacitor faulty, transistor faulty, ...; lung cancer, stomach cancer, heart disease, ...; move left, move right, speed up, slow down, ...

1.2 Central Task

We must design an algorithm which takes the features as inputs and produces the class as output:

    features → classifier → class

1.3 Features

Types of feature:
- Continuous (real-valued)
- Discrete (integer-valued)
- Binary
- Categorical (non-ordered)

Feature Space

In image analysis the features are the pixel values. These are usually integer-valued (e.g. greylevels 0-255). Let the number of pixels be N.
Then each image can be described by a vector of integer values; let us call it the feature vector. Each element of the vector can be interpreted as a dimension of an N-dimensional space; let us call this feature space. Each image can then be represented as a point in this space, whose co-ordinates are given by its pixel values.

1.4 Clusters

Everything in this course depends on the following assumption: the points which represent different images belonging to the same class will fall into clusters within feature space.

Most classification algorithms are based on computing boundaries within feature space which separate the regions within which the different classes lie. These boundaries are called decision surfaces. Pattern recognition algorithms differ in how they draw these boundaries: some draw linear boundaries, some draw quadratic boundaries, and some draw more complex boundaries.

1.5 The Two Phases of Pattern Recognition: Learning and Classification

Every pattern recognition algorithm has two phases:
- a learning phase
- a classification phase

During the learning phase we give the algorithm a set of data with known classes. The algorithm uses this data to decide where to put the decision surfaces. During the classification phase we present the algorithm with new data which it has not seen before. It then uses the decision surfaces to decide the class of the new data.

1.6 Common Themes

Noise: all data (both training data and testing data) contains noise.

Occam's razor: we must not make the decision surfaces more complicated than is justified by the training data.

Probabilistic reasoning: we can never classify a new image with 100% certainty; we can only give a probability of its belonging to each class.

Accuracy: even the best pattern recognition algorithm makes mistakes. It will occasionally classify a new image into the wrong class. Misclassifications can occur for various reasons:
- noise in the data
- decision boundaries being incorrectly set
- clusters overlapping in feature space

We are continually in the presence of uncertainty. Because our training data is necessarily incomplete and contains noise, we can never be certain that our decision boundaries are correct. Even if the decision boundaries were correct, we could still misclassify a new image because of noise or because the clusters overlap. Hence the need for probabilistic reasoning.

1.7 Noise

All data contains noise. In images this is usually caused by thermal agitation in the camera's photodetectors. Two images of the same object are therefore never identical: there will always be some random variation in the pixel values. If we were to take many images of the same object and plot them in feature space, they would form a roughly spherical cluster, whose radius is proportional to the amount of noise in the images. This cluster may straddle a decision boundary, which would cause the image to be put sometimes into one class and sometimes into another.

1.8 Overfitting – Occam's razor

We must not make the decision boundaries too complex. Is a curve in a boundary justified or not? With so little data it is difficult to tell: it is possible that, if we had a bigger sample, there might be some data points on the other side of the boundary. As the amount of data increases, so does our confidence that the decision boundary really is curved. However, the opposite effect also occurs: a boundary may contain wiggles that are almost certainly not justified by the data. This phenomenon is called overfitting. It occurs when we try to fit the decision boundary to effects in the data which are probably due to noise. To prevent overfitting we should use simple models for our decision boundaries (such as straight lines or simple curves), and use complex models only when we have sufficient data to justify them.
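The ideas so far can be tied together in a minimal Python sketch: noisy copies of two objects form clusters in feature space; a learning phase places a centroid in each cluster (the simplest way of fixing a linear decision surface); a classification phase assigns a new image to the nearest centroid. The pixel values and noise level below are invented for illustration, and real feature vectors would have far more than three dimensions.

```python
import math
import random

random.seed(1)

# Hypothetical "true" feature vectors (pixel values) of two objects,
# one per class.
true_a = [40.0, 60.0, 50.0]
true_b = [170.0, 150.0, 160.0]

def noisy_image(true_pixels, sigma):
    """Simulate camera noise: add Gaussian noise to every pixel."""
    return [p + random.gauss(0.0, sigma) for p in true_pixels]

# Learning phase: from training images with known classes, compute each
# cluster's centroid (its mean point in feature space).
train_a = [noisy_image(true_a, 10.0) for _ in range(50)]
train_b = [noisy_image(true_b, 10.0) for _ in range(50)]

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

cent_a, cent_b = centroid(train_a), centroid(train_b)

# Classification phase: a new, unseen image goes to the class of the
# nearest centroid. The implied decision surface is the plane midway
# between the two centroids - a linear boundary.
def classify(image):
    return "A" if math.dist(image, cent_a) < math.dist(image, cent_b) else "B"

print(classify(noisy_image(true_a, 10.0)))
print(classify(noisy_image(true_b, 10.0)))
```

Note that if the noise level were raised until the two clusters overlapped, some new images would inevitably land on the wrong side of the boundary, which is exactly the misclassification mechanism described above.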
In general, the model of the decision boundary should not contain more parameters than there are objects in the data set: this is Occam's razor. The problem becomes very severe when we have a large number of dimensions (the "curse of dimensionality"). We shall see how we can reduce the number of dimensions using a technique called Principal Components Analysis.
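As a preview of the idea, here is a minimal sketch of Principal Components Analysis on invented 2-D data that lies roughly along a line, so that a single direction carries almost all of the variance. It finds the largest eigenvalue of the 2x2 covariance matrix by the standard closed-form formula and projects the data onto the corresponding eigenvector, reducing two dimensions to one.

```python
import math

# Invented 2-D data lying roughly along the line y = x.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 4.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# Covariance matrix [[a, b], [b, c]] of the mean-centred data.
a = sum((x - mx) ** 2 for x, _ in data) / n
b = sum((x - mx) * (y - my) for x, y in data) / n
c = sum((y - my) ** 2 for _, y in data) / n

# Largest eigenvalue of the 2x2 covariance matrix, and its (unit)
# eigenvector: this is the first principal component.
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
vx, vy = b, lam - a                   # unnormalised eigenvector
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Project each 2-D point onto the principal axis: 2 dimensions become 1.
projected = [(x - mx) * vx + (y - my) * vy for x, y in data]

# Fraction of the total variance kept by the single retained dimension.
explained = lam / (a + c)
print(round(explained, 3))  # close to 1.0 for this data
```

For real image data the same computation is done on an N x N covariance matrix, and only the few eigenvectors with the largest eigenvalues are kept, which is how PCA tames the curse of dimensionality.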