CEE 615: Digital Image Processing
Lab 9: Principal Components Analysis

Task A: Compute and interpret the statistics (covariance matrix, correlation matrix and the resulting principal components) for a multiband image.

Load the SPOT_XI image from the ENVI directory.

Compute the statistics for the full image.
    Select Basic Tools > Statistics > Compute Statistics.
    Select SPOT_XI, full scene, all bands. Select OK.
    In the "Compute Statistics Parameters" window, make the selections illustrated in Figure 1. (Some options won't become active until other options have been selected.)
    Enter a name for the output statistics file. I suggest SPOT_XI-PCA.sta.
    Select OK.

Figure 1: Statistics Parameters Window

Examine the Statistics Report.
    Note that bands 1, 3 and 4 have a good dynamic range (max > 200), but the dynamic range for band 2 (red) is relatively low (max = 158).
    The variance (the diagonal elements of the covariance matrix) is relatively low in the visible (band 1: green; band 2: red), increases sharply in band 3 (NIR) and then decreases in band 4 (SWIR).
    Covariance (the off-diagonal elements of the covariance matrix) is relatively high for all band combinations.

You should have computed a covariance "image", which gives a visual display of the covariance matrix (Figure 2a). This is often easier to interpret than the covariance matrix itself. You may also display the correlation "image" to visualize the correlation matrix.

Figure 2: a) Covariance; b) Correlation

From the correlation matrix (Figure 2b), it is apparent that the visible bands are highly correlated (corr. = 0.965), and that both are moderately correlated with the SWIR channel (corr. = 0.684, 0.763). There is poor correlation between band 3 and all other bands. This suggests that the information in band 3 is unique (or very noisy).

Consider the eigenvector table. The eigenvectors are in the rows, with the weightings for the individual bands in the columns.
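For readers who want to see what sits behind the Statistics Report, the quantities described above can be sketched in NumPy. This is only an illustration under stated assumptions: the 4-band cube is synthetic (it is not the SPOT_XI data), the function name `band_statistics` is hypothetical, and the code is not a description of how ENVI computes its statistics internally.

```python
import numpy as np

def band_statistics(cube):
    """Covariance, correlation, eigenvalues and eigenvectors of a (rows, cols, bands) cube."""
    bands = cube.shape[-1]
    # One row per pixel, one column per band; float avoids integer overflow.
    pixels = cube.reshape(-1, bands).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)       # variances on the diagonal
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)          # normalize covariance to correlation
    evals, evecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]          # reorder so PC1 comes first
    evals = evals[order]
    evecs_rows = evecs[:, order].T           # eigenvectors in rows, bands in columns
    return cov, corr, evals, evecs_rows

# Synthetic stand-in for a 4-band scene (an assumption for illustration only):
# bands 1, 2 and 4 share a common signal; band 3 is independent with high variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 60))
cube = np.stack(
    [
        10.0 * base + rng.normal(size=base.shape),
        8.0 * base + rng.normal(size=base.shape),
        30.0 * rng.normal(size=base.shape),
        6.0 * base + 5.0 * rng.normal(size=base.shape),
    ],
    axis=-1,
)

cov, corr, evals, evecs_rows = band_statistics(cube)
print("correlation matrix:\n", np.round(corr, 3))
print("eigenvalues:", np.round(evals, 2))
print("fraction of variance per PC:", np.round(evals / evals.sum(), 3))
```

The eigenvectors are returned with one eigenvector per row so that the layout matches the eigenvector table in the Statistics Report.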
The first eigenvector has positive weightings for bands 1 and 2, and negative weightings for bands 3 and 4. More importantly, band 3 has the largest (magnitude) weighting and will clearly dominate the 1st PC image. The second eigenvector is dominated by band 4.

The eigenvalues sum to ~1992. That means that the variance "explained" by the first eigenvector (principal component, PC) is 1208.6 / 1992 = 0.607, or ~61%. Most of the remaining variance is "explained" by the second eigenvector (PC2), with less than two percent of the variance "explained" by the last two PCs. To summarize:

Eigenvector   Description           Eigenvalue   Fraction of variance explained   Cumulative fraction explained
1             Dominated by band 4   1208.6       0.607                            0.607
2             Dominated by band 3    745.2       0.374                            0.981
3             Band 4 - visible        35.5       0.018                            0.999
4             Band 1 - Band 2          2.66      0.001                            1.000

Compute and display the PC images for bands 1 and 2.
Bands 1 and 2 are highly correlated. Applying the PC transformation to this pair will optimally separate the most highly correlated information into PC1 and the least correlated information into PC2 (based on variance).
    Select Transform > Principal Component > Compute Statistic > Forward PC Rotation > PC Rotation from New Stats.
    Select SPOT_XI.
    Select Spectral Subset, highlight bands 1 and 2, and select OK.
    Select OK in the Principal Components Input File window.
    Fill out the Forward PC Parameters window (Figure 3). Be careful to use file names that characterize the procedure (e.g., 2-bands, covariance).
    Select OK.
    Display the two PC images as gray scale images in two separate display windows.
    Display the stats for this operation: select Basic Tools > Statistics > View Statistics File, then select the 2-band stats file.

Figure 3: Forward PC Parameters

PC1 is a nearly equal combination of bands 1 and 2 and describes more than 98% of the variance of the image pair. PC2 is a difference image that explains less than 2% of the total variance.

CHALLENGE: Can you tell whether or not this image has been geometrically altered?
Hint: SPOT is a pushbroom scanner.

Compute and display the PC images for all 4 bands.
    a. Select Transform > Principal Component > Compute Statistic > Forward PC Rotation > PC Rotation from Existing Stats.
    b. Select SPOT_XI, full scene, all bands. Select OK.
    c. Select the statistics file created in the previous step (SPOT_XI-PCA.sta).
    d. Enter a name for the output PC image file. I suggest SPOT_XI-PCA.img.
    e. Verify that the transformation will be based on the covariance matrix.
    f. Select OK.
    g. Display the 4 PC images. Notice:
        i. Contrast decreases from PC1 to PC4. Compare the histograms of the images to see this graphically.
        ii. PC1 is essentially a "brightness" image.
        iii. PC2 contains the bulk of the color contrast.
        iv. Boundaries (edges) tend to be more pronounced in PC3 and PC4, while topographic detail is suppressed.
        v. Banding noise (low variance and uncorrelated) does not appear until PC4.
        vi. Compare the eigenvector components for PC4 to the components for PC2 of the 2-band transformation. (Try displaying a 2-D scatterplot using these two bands.)
        vii. Compare the eigenvector components for PC2 to the components for PC1 of the 2-band transformation.

Recreate the original images using a subset of the principal component images.
    a. Use the first two PC images to recreate the original image data. The first 2 PC images "explain" 98% of the variance. Is this good enough?
        i. Select Transform > Principal Components > Inverse PC Rotation.
        ii. Choose the PC image (SPOT_XI-PCA.img).
        iii. Select Spectral Subset and select the first 2 PC images. Select OK.
        iv. Select OK.
        v. Select the appropriate stats file (SPOT_XI-PCA.sta).
        vi. Select an appropriate name for the images created using the PCA inversion (SPOT_XIinvPCA-3.img).
        vii. Verify that the inversion is performed using the covariance matrix.
        viii. Select OK.
    b. Compare the original data with the inverse 2-PC image data.
        i. Display the original and transformed images side by side, one band at a time. Stretch the frame to display the full image.
        ii. Look for differences between the images. (Hint: Look at the lake. Look at the boundaries of the lake.)
        iii. Link the two images and use the zoom window to examine random areas. What is your opinion?
    c. Repeat the inversion procedure using the first three PC images.
    d. Compare the original data with the inverse 3-PC image data. The differences are harder to see by direct visual comparison. To see ONLY the differences, you can use the Spectral Math function to subtract one image from the other.
        i. Select Basic Tools > Spectral Math.
        ii. Under "Enter an expression", enter float(s1) - float(s2). The conversion to floating point ensures that the difference will be handled correctly (e.g., avoiding byte arithmetic and any possible confusion over the sign).
        iii. Select OK.
        iv. Select S1 in the upper window.
        v. Select Map Variable to Input File below the second window, select the invPC image and select OK. This assigns the invPC image to the variable S1.
        vi. Use the same procedure to assign S2 to the original SPOT_XI image.
        vii. Either select output to memory or choose a name for the result file.
        viii. Select OK and examine the results.

Ideally, the difference image should be mostly noise. Based on the statistics, the 3 PC images "explain" 99.9% of the variance in the image data. Generally, the last little bit of uncorrelated variance is dominated by noise and can be ignored. Indeed, when the higher-order PC data are essentially noise, the inverted data are cleaner and relatively noise free. This can be a major improvement, especially with a noisy system.

The SPOT data are remarkably "clean" (i.e., noise free), and the sorting done by the PCA has not been particularly helpful. What has been removed is what appears in the last PC image, which shows the banding noise but also shows significant image detail.

Task B. PCA with hyperspectral data: Hyperion scene of Lansing, NY

Use the Lansing_272_clipped image on the CEE6150 Assignments page.
This is a 196-band subset of a 224-band Hyperion image shown in class. (Uncalibrated and redundant bands have been removed.)

Compute and display the statistics for the full scene (Basic Tools > Statistics > Compute New Statistics).
    Be sure to request the covariance image and eigenvalue plots.
    Save the statistics as a text file.
    Display the covariance and correlation images.
    Open the Cursor Location/Value window. You can use this to get specific values for locations in the covariance and correlation images.

Evaluate the statistics for the hyperspectral scene.
    Based on a visual inspection of the correlation matrix, how many distinct spectral regions are there in the image data? Please identify the regions by band number and wavelength range. (The position of the cursor in the Cursor Location/Value window corresponds to the band numbers. The wavelength for each band can be found in the available bands list.)
    Note: The light gray band in the visible corresponds to the green peak in vegetation. The gray bands in the NIR correspond to atmospheric water absorption bands. The dark bands in the SWIR correspond to strong atmospheric water absorption bands.
    What are your criteria for selecting the number of unique spectral ranges?
    What is the typical variance for each range? (You can get a better visual idea of this from the spectral plot of the standard deviation in the statistics window.)

Perform the principal components transformation on the full data set.
    Use the covariance criterion for the transformation.
    Be careful to name the statistics file in a way that will make it easily identifiable.
    How many of the PC images appear to have usable information?
    How many of the PC images are dominated by noise?
    What percentage of the variance in this subset is "explained" by the PC images with obviously usable information?

Perform the inverse PCA using only the first 4 PC images.

Evaluate the effect of the inverse PCA using the limited set of PC images.
    Display the animation set for the inverse PCA bands AND the animation set for the same band range of the original image.
    Sort through the original images looking for spectral regions that are obviously contaminated with noise. In those regions, compare bands from the inverted PCA and the original data.
        Is there noise in the original images that has been removed in the inverse PC images?
        Is there any noise or other artifact in the inverse PC images that was not in the original data?
        Is there noise, or are there other artifacts, that persist in the inverse PC images (i.e., that have not been removed)? If so, can you posit a reason why this would have happened?
    Link the original and inverse PCA images (Tools > Link > Link Displays), then display the spectral profile for the original and inverse PCA data (Tools > Profile > z-profile (spectrum)).
    Arrange the images, the zoom images and the spectral profiles so that it is easy to view them all together.
    Examine spectra in homogeneous areas (water, forest, soil) and compare spectra for the two image data sets. Click on an area in the zoom window and then move the cursor using the arrow keys.
        Water: Note the relative noisiness of the two spectra, especially in the SWIR where water is essentially black. Note the relative stability of the spectra as you move the cursor through the water area.
        Forest: Note the relative clarity of the green peak in the visible. Observe the noise level in the SWIR.
        Soil: Consider the relative smoothness of the spectra in the inverse PCA images. Consider the stability of the inverse PCA spectra relative to the original images.
    Does it appear that 4 PCs were sufficient to characterize the full range of spectral detail in the 196-band image set?

Perform the principal components transformation on the spectral range 700-1300 nm.
    Use the covariance criterion for the transformation.
    Be careful to name the statistics file in a way that will make it easily identifiable.
    How many of the PC images appear to have usable information?
    How many of the PC images are dominated by noise?
    What percentage of the variance in this subset is "explained" by the PC images with obviously usable information?

Perform the inverse PCA using only the first 3 PC images.

Evaluate the effect of the inverse PCA using the limited set of PC images.
    Display the animation set for the inverse PCA bands AND the animation set for the same band range of the original image.
    Sort through the images, comparing the filtered and original data for each band, and locate any obvious differences between the data sets.
        Are there bands or band ranges in which the data are obviously altered?
        Is there noise in the original images that has been removed in the inverse PC images?
        Is there any noise or other artifact in the inverse PC images that was not in the original data?
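
The forward rotation, truncated inverse rotation and floating-point difference image used in both tasks can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not ENVI's implementation: the 8-band byte cube is synthetic (two underlying signals plus a little noise), and the function names `pca_forward` and `pca_inverse` are hypothetical.

```python
import numpy as np

def pca_forward(cube):
    """Covariance-based forward PC rotation of a (rows, cols, bands) cube."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(evals)[::-1]           # PC1 = largest eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    pcs = (pixels - mean) @ evecs             # project onto eigenvectors
    return pcs.reshape(rows, cols, bands), evals, evecs, mean

def pca_inverse(pc_cube, evecs, mean, k):
    """Inverse rotation that keeps only the first k PC images."""
    rows, cols, bands = pc_cube.shape
    pcs = pc_cube.reshape(-1, bands)[:, :k]
    pixels = pcs @ evecs[:, :k].T + mean      # discarded PCs are treated as zero
    return pixels.reshape(rows, cols, bands)

# Synthetic byte image (an assumption for illustration only).
rng = np.random.default_rng(1)
rows, cols, bands = 40, 50, 8
s1 = rng.normal(size=(rows, cols))
s2 = rng.normal(size=(rows, cols))
w1 = np.linspace(1.0, 2.0, bands)
w2 = np.linspace(-1.0, 1.0, bands)
signal = s1[..., None] * w1 + s2[..., None] * w2
noise = 0.1 * rng.normal(size=(rows, cols, bands))
cube = np.clip(128 + 20 * (signal + noise), 0, 255).astype(np.uint8)

pc_cube, evals, evecs, mean = pca_forward(cube)

k = 2
explained = evals[:k].sum() / evals.sum()     # fraction of variance kept
recon = pca_inverse(pc_cube, evecs, mean, k)
recon_full = pca_inverse(pc_cube, evecs, mean, bands)

# Difference image, analogous to float(s1) - float(s2) in Spectral Math.
diff = recon.astype(np.float64) - cube.astype(np.float64)

# Why the float conversion matters: unsigned byte subtraction wraps around.
a = np.array([10], dtype=np.uint8)
b = np.array([20], dtype=np.uint8)
wrapped = a - b                               # stays uint8, sign is lost
correct = a.astype(np.float32) - b.astype(np.float32)

print("fraction of variance in first", k, "PCs:", round(float(explained), 4))
print("RMS difference per band:", np.round(np.sqrt((diff ** 2).mean(axis=(0, 1))), 3))
```

In this sketch the total squared difference equals the variance carried by the discarded PCs, which is why the difference image should look like noise when the discarded PCs contain only noise.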