CEE 6150: Digital Image Processing
LAB 11: Unsupervised Classification

The overall task is to classify a scene using both supervised and unsupervised methods, evaluate the classifications, and compare the results. You will classify the ginna_2000 7-band image using the parameters listed below (plus the follow-on procedures in the case of the ISODATA classification) and compute the confusion matrices. A flow chart of the processing steps appears below.

[Flow chart boxes: Image; Training data; ML Class (single threshold 0.5); ISODATA (10-20 classes); Clump; ML Classified Image; ISODATA Classified Image; Test data; Confusion Matrix B; Confusion Matrix C]

The settings for the 2 procedures are:

1. Supervised Classification
   i.   Classes: Select all 8 classes
   ii.  Prob. threshold: Single Value = 0.5
   iii. Data scale factor: 255
   iv.  Rule images: Not required

2. ISODATA Classification
   i.    Number of Classes: 20-25
   ii.   Maximum Iterations: 30
   iii.  Change threshold: 5
   iv.   Min. # pixels in Class: 30
   v.    Max. class Stdev: 4
   vi.   Min. Class distance: 5
   vii.  Max # Merge Pairs: 2
   viii. Max. Stdev.: 6

   #   CLASS
   1   Bare Soil
   2   Forest
   3   Asphalt
   4   Water
   5   Grass
   6   Orchard
   7   Bldg_dk
   8   Bldg_lt

3. Assign classes to the ISODATA results

There will be between 20 and 25 classes resulting from the ISODATA classification, each identified by a number and a color. Your job is to assign each ISODATA class to one of the desired classes. You may also identify an ISODATA class as "unclassified" or assign it to a category not in the classification list. The numbers and colors appear in the chart below:

[Chart: blank worksheet for ISODATA classes 1-22, with columns for the ISODATA class number and color and for the assigned class number, color, and name]

Some of the classes should be easy to assign. Others will be more troublesome (e.g., all the vegetation types). For those that appear mixed:
   1. Create a table listing the ISODATA class and all the possible assignments.
      (A table with my assignments is shown below.)
   2. Adjust the zoom box to fit within an area that you consider to be a single category (Orchard and Forest examples are shown below).
   3. Display the histogram for the zoom box.
      a. Select Enhance > Interactive Stretching.
      b. In the histogram dialog box, select Source > Zoom.
      This will allow you to see the relative distribution of ISODATA classes within that category.

[Figure: zoom-box histograms for an Orchard area and a Forest area, with peaks labeled by ISODATA class number (17, 19, and 20 appear among the labels)]

[Table: List of possible assignments (bold indicates the dominant class). Rows: original ISODATA classes 15-22. Columns: Grass, Orchard, Forest, Assigned Class. The assigned classes are repeated in the final assignment table below.]

My assignments appear below. Note that there are several classes that do not correspond to any of the training classes. Class 5 includes asphalt in the parking lots and portions of the shoreline. In this case I chose to live with the misclassification since there was too much to lose in terms of the asphalt class. Class 7 is part water, part shore, part shadow, and part cars. Water may be the dominant class, but it includes too much misclassification and I chose to leave it unclassified. Classes appears to correspond to shadows in vegetated areas, and Class 10 corresponds to the industrial area in the lower right (probably concrete pavement). This I also assigned to "asphalt". It might be more appropriate now to rename the class "roads". The buildings were all either confused with other classes or unclassified. My final class assignment was:

   ISODATA Class   Assigned #   Assigned class
         1             1        water
         2             1        water
         3             1        water
         4             1        water
         5             2        asphalt
         6             2        asphalt
         7             -        unclassified
         8             -        unclassified
         9             2        asphalt
        10             2        asphalt
        11             3        soil
        12             2        asphalt
        13             3        soil
        14             3        soil
        15             4        grass
        16             6        forest
        17             6        forest
        18             4        grass
        19             4        grass
        20             5        orchard
        21             5        orchard
        22             6        forest

Your assignment need not match this.

4. Reassign the ISODATA classes to the desired classes.
   1.
Select Classification > Post Classification > Combine Classes.
   2. Select the ISODATA classified image, then click OK. The Combine Classes Parameters dialog appears.
   3. Select a class for input from the Input Classes list. The selected class name appears in the Input Class field.
   4. Select an output class by clicking on a class name in the Output Classes list.
   5. When both the input and output classes are selected, click Add Combination to finalize the selection. The new combined class is shown in the Combined Classes list at the bottom of the dialog. To deselect a combined class, select its name in the Combined Classes list.

5. Change the class colors and names
   This is not absolutely necessary, but it will make comparison with the ML classification much easier.
   1. Display the ISODATA classification image.
   2. In the ISODATA display menu bar, select Tools > Color Mapping > Class Color Mapping. The Classification Mapping dialog appears.
   3. Select a class name in the Selected Classes list, click the Color button, and select the new color from the resulting menu. You may also select colors by:
      o Entering new values into the Red, Green, and Blue fields.
      o Moving the color adjustment slider bars.
   4. Change the class name by editing it in the Class Name field.
   5. Select File > Save Changes to retain the new colors.

6. Visually compare the modified ISODATA classification with the ML classification.

7. Create a confusion matrix for the ISODATA classification and compare with the ML confusion matrix data.

##############################################################################

[Figures: ML Class. with 0.5 threshold; ISODATA Classification]

1. Evaluation of the classification
   a.
Visual evaluation

ML Class with 0.5 threshold
The general impression is that the classification is good as far as it goes and that the major problem is the very large number of unclassified pixels. Unclassified pixels around the borders are not a surprise; these are mixed pixels and, if the classes on either side of the border are sufficiently different spectrally, then mixed pixels should not be classified. Water has been captured quite well, with major problems occurring only near the shore where the water may be turbid or shallow, or where individual pixels combine shore and water. The orchard in which the training was done has been classified rather effectively, as has the adjacent orchard. The orchard below the training orchard shows significant misclassification as forest, while the orchards toward the right side of the image suffer from an excess of unclassified pixels. Bare soil has also been recognized most effectively in the field used for training, and is less well characterized farther away from the training area. The grass and forest classes are reasonably well characterized without obvious major problems. Finally, the asphalt class appears to be good as far as it goes, but there are many asphalt pixels left unclassified. The light-toned buildings in the power plant area have generally been labeled correctly, but the dark-toned buildings are largely misclassified or unclassified.

ISODATA classification
The grouped ISODATA classification has done a very nice job with the water: most of the lake is properly classified, with very few misclassifications offshore and relatively few pixels misclassified as asphalt. This classifier has even captured some of the inlet water in the power plant area. The asphalt and bare soil classes are rather consistently confused and are also misclassified as shoreline. The number of unclassified pixels in these classes is not excessive and many could be recaptured by some contextual operations.
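The handout does not say which contextual operation is intended. One common option is a majority (mode) filter that gives an unclassified pixel the label shared by most of its neighbors. The sketch below is only an illustration under stated assumptions: the function name `fill_unclassified`, the use of label 0 for "unclassified", the 3x3 neighborhood, and the `min_votes` threshold are all hypothetical choices, not settings from the lab.

```python
from collections import Counter

UNCLASSIFIED = 0  # assumed label for unclassified pixels (hypothetical)


def fill_unclassified(labels, min_votes=5):
    """Return a copy of a 2-D label grid with unclassified pixels filled in.

    An unclassified pixel takes the most common classified label among its
    3x3 neighbors, but only if that label occurs at least `min_votes` times.
    Votes are always read from the input grid, so the result does not depend
    on the order in which pixels are visited.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != UNCLASSIFIED:
                continue
            # Count classified neighbors, clipping the window at the edges.
            votes = Counter(
                labels[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and labels[rr][cc] != UNCLASSIFIED
            )
            if votes:
                label, count = votes.most_common(1)[0]
                if count >= min_votes:
                    out[r][c] = label
    return out


# Tiny illustrative grid (not lab data): 2 and 3 are arbitrary class labels.
grid = [
    [2, 2, 2],
    [2, 0, 2],
    [3, 3, 0],
]
filled = fill_unclassified(grid)
```

In this toy grid the center pixel has five neighbors labeled 2 and is filled, while the corner pixel has too few classified neighbors and stays unclassified. A stricter `min_votes` recaptures fewer pixels but is less likely to paint over genuine mixed pixels along class borders.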
The unsupervised classifier had much more trouble distinguishing among the vegetated classes. This is particularly true of the orchard and forest classes. The unsupervised classification appears to have been incapable of capturing either building class. In fact, most of the buildings in the power plant area are unclassified.

Discussion

For the ML classification with a 0.5 threshold, the Producer's accuracy is quite high even though there is a large number of unclassified pixels. The User's accuracy is even higher. Nonetheless, this is a reasonably accurate representation of the accuracy of the classification. This is because the areas used for training and testing were a good match. They did not capture the true range of variability of the classes (thus the large number of unclassified pixels), but the only misrepresentation is in the extent of the unclassified pixels. 32% of the image is unclassified, and that is not represented in the confusion matrix or the error metrics.

Confusion matrices in terms of percentages

ML CLASSIFICATION
                                   TEST DATA (percentage)
Class            Soil   Forest  Asphalt    Water    Grass  Orchard  Bldg_dk  Bldg_lt    Total
Unclass.         7.72    13.06      7.7     5.01     2.78     7.54    82.19    53.32    10.36
Bare Soil       91.85        0        0        0        0        0        0        0     8.28
Forest              0    84.54        0        0        0     0.66        0        0     9.35
Asphalt             0        0     92.2        0        0        0    17.81        0       14
Water               0        0        0    94.99        0        0        0        0    32.52
Grass               0        0        0        0    97.08     4.61        0        0    11.12
Orchard             0      2.4        0        0     0.14    87.19        0        0    11.54
Bldg_dk             0        0      0.1        0        0        0        0        0     0.02
Bldg_lt          0.43        0        0        0        0        0        0    46.68     2.81
Total             100      100      100      100      100      100      100      100      100

ISODATA CLASSIFICATION
                                     TEST DATA (percentage)
Class              Soil  Asphalt    Water    Grass  Orchard   Forest  Bldg_dk  Bldg_lt    Total
Unclassified       1.29     4.65        0        0     4.19    12.77        0    99.87     8.68
Soil               16.9     0.93        0       48        0        0    35.62        0     7.27
Asphalt           81.82    94.21        0        0        0        0    64.38     0.13    22.22
Water                 0     0.21      100        0        0        0        0        0    34.27
Grass                 0        0        0    42.72     47.6    30.91        0        0    14.17
Orchard               0        0        0        0    43.65    24.49        0        0     8.32
Forest                0        0        0     9.27     4.55    31.83        0        0     5.08
Class 7               0        0        0        0        0        0        0        0        0
Class 8               0        0        0        0        0        0        0        0        0
Total               100      100      100      100      100      100      100      100      100

Suggested improvements

ML Classifier: It is not likely that one will be able to extend the classification to include many of the unclassified pixels without degrading (much less improving) the accuracy of the classification. Since there is very little class-to-class misclassification, one might want to keep the existing classes and define new classes in the unclassified areas that would extend the classification while giving more control over the class-to-class misclassification. This would ultimately demand that one define classes for currently undefined categories such as the industrial area.

Classification accuracies are low for the ISODATA classification. There appears to be significant error for every class but water. The confusion matrix shows the difficulty in distinguishing among the 3 vegetation classes.
There are a number of misclassifications that are not apparent in the confusion matrix because the problem areas were not covered by the training and test data sets, e.g., the shoreline and nearshore water being classified as asphalt and the confusion of asphalt with bare soil. On the other hand, after grouping, two new classes emerged which, by their very presence, avoided some of the errors in the ML Classifier without a threshold.

Errors and Accuracy metrics

                  ML 0.5 thresh              ISODATA
Class          Comm. (%)   Omis. (%)    Comm. (%)   Omis. (%)
Bare Soil           0         8.15        79.04       83.1
Forest              0.91     15.46        31.35       68.17
Asphalt             1.44      7.8         36.51        5.79
Water               0         5.01         0.09        0
Grass               5.35      2.92        67.3        57.28
Orchard             2.41     12.81        32.25       56.35
Bldg_dk           100       100            0         100
Bldg_lt             1.38     53.32         0         100

                  ML 0.5 thresh              ISODATA
Class          Prod. Acc. (%)  User Acc. (%)    Prod. Acc. (%)  User Acc. (%)
Bare Soil          91.85          100              16.9            20.96
Forest             84.54           99.09           31.83           68.65
Asphalt            92.2            98.56           94.21           63.49
Water              94.99          100             100              99.91
Grass              97.08           94.65           42.72           32.7
Orchard            87.19           97.59           43.65           67.75
Bldg_dk             0               0               0               0
Bldg_lt            46.68           98.62            0               0

               Overall 92.9    Kappa 91.1       Overall 0.864   Kappa 0.827
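The error and accuracy metrics above follow from a confusion matrix by standard definitions: producer's accuracy divides each diagonal entry by its column (test data) total, user's accuracy divides it by its row (classification) total, omission and commission errors are their complements, and kappa compares the overall agreement with the agreement expected by chance. A minimal sketch follows, assuming a square matrix of pixel counts with rows as classified labels and columns as test classes. The function name `accuracy_metrics` and the 3-class counts are hypothetical and are not the lab data. An "Unclassified" row, as in the matrices above, is not handled here.

```python
def accuracy_metrics(matrix):
    """Compute accuracy metrics from a square confusion matrix of counts.

    matrix[i][j] is the number of test pixels whose reference class is j
    and whose classified label is i (rows = classification, columns = test).
    """
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_totals)
    diagonal = [matrix[k][k] for k in range(n)]

    # Producer's accuracy: correct pixels / reference pixels of that class.
    producer = [d / c if c else 0.0 for d, c in zip(diagonal, col_totals)]
    # User's accuracy: correct pixels / pixels labeled as that class.
    user = [d / r if r else 0.0 for d, r in zip(diagonal, row_totals)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the row and column totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    return {
        "producer": producer,
        "user": user,
        "omission": [1 - p for p in producer],
        "commission": [1 - u for u in user],
        "overall": overall,
        "kappa": kappa,
    }


# Hypothetical 3-class counts, for illustration only (not the lab data).
example = [
    [50, 5, 0],
    [10, 40, 5],
    [0, 5, 85],
]
metrics = accuracy_metrics(example)
print(round(metrics["overall"], 3), round(metrics["kappa"], 3))
```

Classes with an empty row or column are given an accuracy of 0.0 in this sketch, which is a convention chosen here rather than something the handout specifies. Multiplying the returned fractions by 100 gives percentages comparable to the tables above.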