Pixel-based image classification
Lecture 8

What is image classification or pattern recognition
Image classification is the process of converting multispectral (or hyperspectral) images into patterns of varying gray tones or assigned colors that represent either:
• clusters of statistically different sets of multiband data, some of which can be correlated with separable classes/features/materials. This is the result of Unsupervised Classification; or
• numerical discriminators composed of these sets of data that have been grouped and specified by associating each with a particular class, etc., whose identity is known independently and which has representative areas (training sites) within the image where that class is located. This is the result of Supervised Classification.

Spectral classes are those that are inherent in the remote sensor data and must be identified and then labeled by the analyst.
Information classes are those that human beings define.

Unsupervised classification
The computer or algorithm automatically groups pixels with similar spectral characteristics (means, standard deviations, covariance matrices, correlation matrices, etc.) into unique clusters according to some statistically determined criteria. The analyst then re-labels and combines the spectral clusters into information classes.

Supervised classification
Sites whose class is known a priori through a combination of fieldwork, map analysis, and personal experience are identified as training sites; the spectral characteristics of these sites are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Every pixel both within and outside the training sites is then evaluated and assigned to the class of which it has the highest likelihood of being a member.

Hard vs. Fuzzy classification
Supervised and unsupervised classification algorithms typically use hard classification logic to produce a classification map that consists of hard, discrete categories (e.g., forest, agriculture).
Conversely, it is also possible to use fuzzy set classification logic, which takes into account the heterogeneous and imprecise nature (mixed pixels) of the real world. The result is the proportion of each of the m classes within a pixel (e.g., 10% bare soil, 10% shrub, 80% forest). Fuzzy classification schemes are not currently standardized.

Pixel-based vs. Object-oriented classification
In the past, most digital image classification was based on processing the entire scene pixel by pixel. This is commonly referred to as per-pixel (pixel-based) classification.
Object-oriented classification techniques allow the analyst to decompose the scene into many relatively homogeneous image objects (referred to as patches or segments) using a multiresolution image segmentation process. The various statistical characteristics of these homogeneous image objects in the scene are then subjected to traditional statistical or fuzzy logic classification. Object-oriented classification based on image segmentation is often used for the analysis of high-spatial-resolution imagery (e.g., 1 × 1 m Space Imaging IKONOS and 0.61 × 0.61 m DigitalGlobe QuickBird).

Knowledge-based information extraction: Artificial Intelligence
• Neural network
• Decision tree
• Support vector machine (SVM)
• …

Purposes of classification
• Land use and land cover (LULC)
• Vegetation types
• Geologic terrains
• Mineral exploration
• Alteration mapping
• …

Example spectral plot
[Figure: scatter plot of pixels in spectral space; axes: Band 1, Band 2]
• Two bands of data.
• Each pixel marks a location in this 2-D spectral space.
• Our eyes can split the data into clusters.
• Some points do not fit clusters.

1. Unsupervised classification
Uses statistical techniques and iterative procedures to group n-dimensional data into their natural spectral clusters; certain clusters are then labeled as specific information classes.

K-means and ISODATA
For the first iteration, arbitrary starting values (i.e., the cluster properties) have to be selected. These initial values can influence the outcome of the classification.
In general, both methods first assign arbitrary initial cluster values. The second step assigns each pixel to the closest cluster. In the third step the new cluster mean vectors are calculated based on all the pixels in each cluster. The second and third steps are repeated until the "change" between iterations is small. The "change" can be defined in several different ways, either by measuring the distance the cluster mean vectors have moved from one iteration to the next or by the percentage of pixels that have changed cluster between iterations.

The ISODATA algorithm adds further refinements by splitting and merging clusters. Clusters are merged if either the number of members (pixels) in a cluster is less than a certain threshold or the centers of two clusters are closer than a certain threshold. A cluster is split into two different clusters if its standard deviation exceeds a predefined value and the number of members (pixels) is at least twice the threshold for the minimum number of members.

Advantages
• Requires no prior knowledge of the region
• Human error is minimized
• Unique classes are recognized as distinct units

Disadvantages
• Classes do not necessarily match informational categories of interest
• Limited control of classes and identities
• Spectral properties of classes can change with time

Distance measures
Distance measures are used to group or cluster brightness values together. Euclidean distance between points in spectral space is a common way to calculate closeness.

K-means (unsupervised)
1. A set number of cluster centers are positioned randomly through the spectral space.
2. Pixels are assigned to their nearest cluster.
3. The mean location is re-calculated for each cluster.
4. Repeat 2 and 3 until movement of cluster centers is below threshold.
5. Assign class types to spectral clusters.

Example k-means
[Figure: three scatter plots; axes: Band 1, Band 2]
1. First iteration. The cluster centers are set at random. Pixels will be assigned to the nearest center.
2. Second iteration. The centers move to the mean center of all pixels in this cluster.
3. N-th iteration. The centers have stabilized.

Example ISODATA
[Figure: three scatter plots; axes: Band 1, Band 2]
1. Data is clustered, but the blue cluster is very stretched in band 1.
2. Cyan and green clusters have only 2 or fewer pixels, so they will be removed.
3. Either assign outliers to the nearest cluster, or mark them as unclassified.

ISODATA: Initial Cluster Values (properties)
• Number of classes
• Maximum iterations
• Pixel change threshold (0 - 100%). The change threshold is used to end the iterative process when the number of pixels in each class changes by less than the threshold. The classification will end when either this threshold is met or the maximum number of iterations has been reached.
• Initializing from statistics (Erdas) or from input (ENVI). The initial values to enter in ENVI are minimum # pixels in class, maximum class stdev, minimum class distance, and maximum # merge pairs.
• Maximum class stdev (in pixel value). If the stdev of a class is larger than this threshold, the class is split into two classes.
• Minimum class distance (in pixel value) between class means. If the distance between two class means is less than the minimum value entered, ENVI merges the classes.
• Optional: maximum stdev from mean (1 to 3σ) and maximum distance error (in pixel value). If either of these is set, some pixels might not be classified.

ISODATA examples
• 5-10 classes, 8 iterations, 5 for change threshold (MinP 5, MaxSD 1, MinD 5, MMP 2)
• 1-5 classes, 11 iterations, 5 for change threshold (MinP 5, MaxSD 1, MinD 5, MMP 2)
[Figure panels: 5 classes; 10 classes]

2. Supervised classification: training sites selection
Training sites are based on areas known a priori through a combination of fieldwork, map analysis, and personal experience, using:
• on-screen selection of polygonal training data (ROI), and/or
• on-screen seeding of training data (ENVI does not have this, Erdas Imagine does).
The seed program begins at a single x, y location and evaluates neighboring pixel values in all bands of interest. Using criteria specified by the analyst, the seed algorithm expands outward like an amoeba as long as it finds pixels with spectral characteristics similar to the original seed pixel. This is a very effective way of collecting homogeneous training information.
• from a spectral library of field measurements

Advantages
• Analyst has control over the selected classes, tailored to the purpose
• Has specific classes of known identity
• Does not have to match spectral categories on the final map with informational categories of interest
• Can detect serious errors in classification if training areas are misclassified

Disadvantages
• Analyst imposes a classification (may not be natural)
• Training data are usually tied to informational categories and not spectral properties (remember diversity)
• Training data selected may not be representative
• Selection of training data may be time consuming and expensive
• May not be able to recognize special or unique categories because they are not known or small

Statistic extraction of each training site
Each pixel in each training site associated with a particular class (c) is represented by a measurement vector, X_c; the average of all pixels in a training site is called the mean vector, M_c; and each class also has a covariance matrix, V_c.

$$
X_c = \begin{bmatrix} BV_{i,j,1} \\ BV_{i,j,2} \\ BV_{i,j,3} \\ \vdots \\ BV_{i,j,k} \end{bmatrix}
\qquad
M_c = \begin{bmatrix} \mu_{c1} \\ \mu_{c2} \\ \mu_{c3} \\ \vdots \\ \mu_{ck} \end{bmatrix}
\qquad
V_c = \begin{bmatrix}
\mathrm{cov}_{c11} & \mathrm{cov}_{c12} & \cdots & \mathrm{cov}_{c1k} \\
\mathrm{cov}_{c21} & \mathrm{cov}_{c22} & \cdots & \mathrm{cov}_{c2k} \\
\vdots & \vdots & & \vdots \\
\mathrm{cov}_{ck1} & \mathrm{cov}_{ck2} & \cdots & \mathrm{cov}_{ckk}
\end{bmatrix}
$$

where BV_{i,j,k} is the brightness value for the i,jth pixel in band k, μ_ck represents the mean value of all pixels obtained for class c in band k, and cov_ckl is the covariance of class c between bands k and l.
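The statistics extraction described above can be sketched in a few lines. This is a minimal illustration, not the procedure used by ENVI or Erdas Imagine: the function name `training_statistics`, the 2-band "forest" brightness values, and the use of the sample covariance (dividing by n - 1) are all assumptions made for the example.

```python
def training_statistics(pixels):
    """Return (mean vector M_c, covariance matrix V_c) for one training class.

    `pixels` is a list of measurement vectors X_c, one per training pixel,
    each holding one brightness value per band.
    """
    n = len(pixels)
    k = len(pixels[0])  # number of bands
    # Mean brightness value of the class in each band.
    mean = [sum(p[b] for p in pixels) / n for b in range(k)]
    # Sample covariance between every pair of bands (assumed n - 1 divisor).
    cov = [
        [
            sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in pixels) / (n - 1)
            for b in range(k)
        ]
        for a in range(k)
    ]
    return mean, cov


# Hypothetical 2-band brightness values for a "forest" training site.
forest = [(38, 35), (40, 36), (39, 34), (41, 37), (37, 33)]
M_forest, V_forest = training_statistics(forest)
print("mean vector:", M_forest)
print("covariance matrix:", V_forest)
```

Only the mean vector and covariance matrix need to be kept for each class; the individual training pixels are not needed again by the distance-based or probability-based decision rules.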
Selecting ROIs
[Figure: ROIs selected for Alfalfa, Cotton, Grass, and Fallow]
[Figure: spectra of ROIs from ETM+ image]
[Figure: spectra from library, resampled to match TM/ETM+ (6 bands)]

Supervised classification methods
Various supervised classification algorithms may be used to assign an unknown pixel to one of m possible classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output.
Parametric classification algorithms assume that the observed measurement vectors X_c obtained for each class in each spectral band during the training phase of the supervised classification are Gaussian; that is, they are normally distributed. Nonparametric classification algorithms make no such assumption.
Several widely adopted nonparametric classification algorithms include:
• one-dimensional density slicing
• parallelepiped
• minimum distance
• nearest-neighbor
• neural network and expert system analysis
The most widely adopted parametric classification algorithm is:
• maximum likelihood

Hyperspectral classification methods
• Binary Encoding
• Spectral Angle Mapper
• Matched Filtering
• Spectral Feature Fitting
• Linear Spectral Unmixing

2.1 Parallelepiped
This is a widely used digital image classification decision rule based on simple Boolean "and/or" logic.

$$
\mu_{ck} - \sigma_{ck} \le BV_{ijk} \le \mu_{ck} + \sigma_{ck}
\qquad\text{i.e.}\qquad
L_{ck} \le BV_{ijk} \le H_{ck}
$$

where σ_ck is the standard deviation of the training data for class c in band k, and L_ck = μ_ck − σ_ck and H_ck = μ_ck + σ_ck are the low and high thresholds.

If a pixel value lies above the low threshold and below the high threshold for all n bands being classified, it is assigned to that class. If the pixel value falls in multiple classes, ENVI assigns the pixel to the last class matched. Areas that do not fall within any of the parallelepipeds are designated as unclassified. In ENVI, you can use 1-3 standard deviations from the mean as the threshold.
This is a computationally efficient method. However, an unknown pixel might meet the criteria of more than one class; in the usual formulation of the rule it is always assigned to the first class for which it meets all criteria (ENVI, as noted, uses the last class matched). The Minimum Distance to Means rule can assign any pixel to just one class.

Parallelepiped example
[Figure: training classes plotted in spectral space, in this example using 2 bands]

Parallelepiped example continued
• Each class type defines a spectral box.
• Note that some boxes overlap even though the classes are spatially separable.
• This is due to band correlation in some classes.
• Can be overcome by customising boxes.

Threshold setting: 1 means 1 stdev from mean, 2 means 2 stdev from mean, 3 means 3 stdev from mean. Use 1 and you will classify only the pixels closest to the class; use 3 and you will also include pixels that are not so close to the class.

2.2 Minimum distance
The distance used in a minimum distance to means classification algorithm can take two forms: the Euclidean distance based on the Pythagorean theorem and the "round the block" distance. The Euclidean distance is more computationally intensive, but it is more frequently used.

$$
Dist = \sqrt{\left(BV_{ijk} - \mu_{ck}\right)^2 + \left(BV_{ijl} - \mu_{cl}\right)^2}
$$

e.g., the distance of point a to class forest is

$$
Dist = \sqrt{(40 - 39.1)^2 + (40 - 35.5)^2} \approx 4.6
$$

All pixels are classified to the nearest class unless a standard deviation or distance threshold is specified, in which case some pixels may be unclassified if they do not meet the selected criteria.
• If neither Max stdev nor Max distance error is set, all pixels will be classified.
• If the Max stdev from mean is set at 2 (stdev), then the pixels with values outside the mean ± 2σ will not be classified.
• If the Max distance error is set at 4.2 (pixel value), then the pixels with distance larger than 4.2 will not be classified.

2.3 Maximum likelihood
Instead of being based on training class multispectral distance measurements, the maximum likelihood decision rule is based on probability.
The maximum likelihood procedure assumes that the training data for each class in each band are normally distributed (Gaussian). Training data with bi- or n-modal histograms in a single band are not ideal. In such cases the individual modes probably represent unique classes that should be trained upon individually and labeled as separate training classes.
The probability of a pixel belonging to each of a predefined set of m classes is calculated based on a normal probability density function, and the pixel is then assigned to the class for which the probability is the highest.

The estimated probability density function for class w_i (e.g., forest) is computed using the equation:

$$
\hat{p}(x \mid w_i) = \frac{1}{(2\pi)^{1/2}\,\hat{\sigma}_i} \exp\left[-\frac{1}{2}\,\frac{(x - \hat{\mu}_i)^2}{\hat{\sigma}_i^{2}}\right]
$$

where exp[ ] is e (the base of the natural logarithms) raised to the computed power, x is one of the brightness values on the x-axis, \hat{\mu}_i is the estimated mean of all the values in the forest training class, and \hat{\sigma}_i^2 is the estimated variance of all the measurements in this class. Therefore, we need to store only the mean and variance of each training class (e.g., forest) to compute the probability function associated with any of the individual brightness values in it.

For multiple bands of remote sensor data for the classes of interest, we compute an n-dimensional multivariate normal density function using:

$$
p(X \mid w_i) = \frac{1}{(2\pi)^{n/2}\,|V_i|^{1/2}} \exp\left[-\frac{1}{2}\,(X - M_i)^T V_i^{-1} (X - M_i)\right]
$$

where |V_i| is the determinant of the covariance matrix, V_i^{-1} is the inverse of the covariance matrix, and (X − M_i)^T is the transpose of the vector (X − M_i). The mean vectors (M_i) and covariance matrix (V_i) for each class are estimated from the training data.

Without Prior Probability Information:
Decide unknown measurement vector X is in class i if, and only if, p_i > p_j for all j ≠ i out of the 1, 2, ..., m possible classes, where

$$
p_i = -\frac{1}{2}\log_e |V_i| - \frac{1}{2}\,(X - M_i)^T V_i^{-1} (X - M_i)
$$

To assign the measurement vector X of an unknown pixel to a class, the maximum likelihood decision rule computes the value p_i for each class. Then it assigns the pixel to the class that has the largest value.

Unless you select a probability threshold (0-1), all pixels are classified. Each pixel is assigned to the class that has the highest probability.
Probability threshold from [0, 1]: 0 means zero probability of similarity, 1 means 100% probability of similarity.
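The decision rule without prior probabilities can be sketched as follows. This is an illustrative sketch restricted to two bands so that the determinant and inverse of the covariance matrix can be written out by hand; the function names `discriminant` and `classify` and the class statistics for "forest" and "water" are hypothetical, and no probability threshold is applied.

```python
import math


def discriminant(x, mean, cov):
    """p_i = -1/2 ln|V_i| - 1/2 (X - M_i)^T V_i^-1 (X - M_i), for 2 bands."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # |V_i|
    # Inverse of a 2x2 matrix, written out to avoid external libraries.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]  # X - M_i
    quad = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
    return -0.5 * math.log(det) - 0.5 * quad


def classify(x, class_stats):
    """Return the name of the class with the largest discriminant value."""
    return max(class_stats, key=lambda name: discriminant(x, *class_stats[name]))


# Hypothetical class statistics: name -> (mean vector, covariance matrix).
class_stats = {
    "forest": ([39.0, 35.0], [[2.5, 2.25], [2.25, 2.5]]),
    "water": ([12.0, 8.0], [[1.0, 0.2], [0.2, 1.5]]),
}

label = classify([40, 40], class_stats)
print(label)
```

With more than two bands the same rule applies, but the determinant and inverse would normally come from a linear algebra library rather than the closed-form 2 × 2 expressions used here.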
2.4 Mahalanobis Distance
M-distance is similar to the Euclidean distance:

$$
Dist = (X - M_i)^T V_i^{-1} (X - M_i)
$$

It is similar to the Maximum Likelihood classification but assumes all class covariances are equal and therefore is a faster method. All pixels are classified to the closest ROI class unless you specify a distance threshold, in which case some pixels may be unclassified if they do not meet the threshold (in DN).

2.5 Spectral Angle Mapper

2.6 Spectral Feature Fitting
Spectral Feature Fitting (SFF) compares the fit of image reflectance spectra to selected reference reflectance spectra using a least-squares technique. SFF is an absorption-feature-based methodology. Both reflectance spectra should be continuum removed.
A scale image is output for each reference spectrum and is a measure of absorption feature depth, which is related to material abundance. The image and reference spectra are compared at each selected wavelength in a least-squares sense, and the root mean square (rms) error is determined for each reference spectrum.
[Figure: least-squares technique (regression)]
A continuum is a mathematical function used to isolate a particular absorption feature for analysis.
[Figure: Supervised classification method: Spectral Feature Fitting. Source: http://popo.jpl.nasa.gov/html/data.html]
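The least-squares comparison described for Spectral Feature Fitting can be sketched as below. This is one plausible formulation under stated assumptions, not ENVI's actual implementation: both spectra are assumed to be already continuum removed and sampled at the same wavelengths, absorption depth is taken as 1 minus the continuum-removed value, and the reference depth is scaled to the pixel depth with a single factor and no intercept. The function name `feature_fit` and the spectra values are hypothetical.

```python
import math


def feature_fit(image_cr, reference_cr):
    """Fit a continuum-removed reference spectrum to a pixel spectrum.

    Returns (scale, rms): the least-squares scale factor applied to the
    reference absorption depth, and the root mean square error of the fit.
    """
    # Absorption depth relative to the continuum (assumed definition).
    img = [1.0 - v for v in image_cr]
    ref = [1.0 - v for v in reference_cr]
    # Least-squares solution of img ≈ scale * ref.
    scale = sum(i * r for i, r in zip(img, ref)) / sum(r * r for r in ref)
    residuals = [i - scale * r for i, r in zip(img, ref)]
    rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return scale, rms


# Hypothetical continuum-removed spectra at five wavelengths.
reference = [1.0, 0.9, 0.6, 0.9, 1.0]
pixel = [1.0, 0.95, 0.8, 0.95, 1.0]

scale, rms = feature_fit(pixel, reference)
print("scale:", scale, "rms:", rms)
```

In this sketch a larger scale value means a deeper absorption feature in the pixel relative to the reference, and a small rms means the shape of the feature matches the reference well.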