Change Detection Exercise: Time Series Change Analysis Using Standardized Principal Components Analysis
INTRODUCTION
In the section covering image transformations, Principal Components Analysis (PCA) was applied as a data compression technique. The principal aim was to identify which bands account for the largest amount of variance, so that they could be selected for other analysis tasks such as image classification, or simply for image enhancement by combining information from several spectral bands. The variant of PCA used there is termed unstandardized PCA because it uses the variance-covariance matrix in the calculation of eigenvalues and eigenvectors. In that case, PCA was used to investigate variance patterns in the spectral domain.
In this exercise we apply another variant, known as Standardized Principal Components Analysis, to analyze remotely sensed data in the temporal domain. Standardized PCA is based on the correlation matrix, which is derived from the variance-covariance matrix by dividing each covariance by the product of the standard deviations of the two bands involved; this is equivalent to performing PCA on bands converted to standard scores. This procedure has been found to be very useful in the analysis of time series data sets where the interest is in identifying phenomena or signals that propagate over time. It is often applied to single-band data (for example, vegetation index maps) that map only one phenomenon over the land surface, e.g. vegetation greenness. The standardization is intended to minimize the undue influence of extraneous factors such as atmospheric interference (aerosols and water vapour) and changes in surface illumination conditions. In this way, the different temporal variance patterns of the phenomenon of interest (in this case vegetation) can be extracted effectively from the time series measurements.
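To make the standardization concrete: the correlation between bands i and j is r_ij = c_ij / (s_i s_j), where c_ij is their covariance and s_i = sqrt(c_ii). The NumPy sketch below uses toy data and a hypothetical function name (`correlation_from_covariance`); it checks that this conversion gives the same matrix as NumPy's own `np.corrcoef` and as the covariance matrix of z-scores, which is why standardized PCA is insensitive to differences in scale between bands.

```python
import numpy as np


def correlation_from_covariance(cov: np.ndarray) -> np.ndarray:
    """Convert a variance-covariance matrix into a correlation matrix.

    Each covariance c_ij is divided by s_i * s_j, where s_i = sqrt(c_ii).
    """
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


# Toy "time series": 500 pixels observed on 3 dates with very different
# scales (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([base + 0.1 * rng.normal(size=500),
                     10 * base + rng.normal(size=500),
                     0.01 * base + 0.001 * rng.normal(size=500)])

cov = np.cov(X, rowvar=False)            # unstandardized PCA would use this
corr = correlation_from_covariance(cov)  # standardized PCA uses this

# The same matrix results from the covariance of standard scores (z-scores).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr_from_z = np.cov(Z, rowvar=False)
```

The diagonal of the correlation matrix is all ones, so every date contributes equally to the total variance regardless of its original range of values.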
THE TIME SERIES DATA
The data set used for this exercise consists of 60 monthly NDVI images of Africa (which we can consider to be bands in the spectral sense). It is part of the USAID/FEWS time series archive stretching back to 1981 and is processed at NASA Goddard Space Flight Center to support the famine early warning activities of both FEWS and FAO. The time series used in this exercise covers the period January 1986 to December 1990. This period is of special interest because it enables us to investigate a number of patterns of variation related to the factors that influence vegetation reflectance, both bioclimatic factors and noise due to sensor instrument characteristics.
Each image has 256 columns by 320 rows. The original data were registered to an 8 km grid; we have aggregated them by averaging to a 30 km grid, which acts as a filter for high-frequency noise related to topographic variability. This spatial averaging enables us to identify a number of time signals of interest, including those related to patterns of interannual variability in the vegetation index data.
The images are named JAN86GV.IMG .. DEC90GV.IMG. Each image file has a corresponding documentation file named JAN86GV.DOC .. DEC90GV.DOC. Ancillary information includes a series of vector files: COUNTRYH.VEC (country boundaries), COASTH.VEC (continental coastline), LAKESH.VEC (lakes) and RIVERSH.VEC (major rivers).
PROCEDURE
NDVI Image exploration
P1. Use the display system of your software to examine the images named JAN86GV and AUG86GV. These two images show the levels of NDVI over Africa for January 1986 and August 1986 respectively. Note that the highest NDVI levels are located in the southern hemisphere in January and in the northern hemisphere in August. This shows, in broad terms, the variation in the pattern and location of maximum vegetation greenness that is related to the climatic growing seasons.
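The 30 km spatial averaging described under THE TIME SERIES DATA acts as a low-pass filter. A minimal NumPy sketch of block averaging is given here for illustration only. It assumes an integer aggregation factor, whereas 30 km is not a whole multiple of 8 km, so the resampling actually applied to the FEWS data (not described in this exercise) may well differ, for example by using area weighting. `block_average` is a hypothetical name.

```python
import numpy as np


def block_average(image: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a 2-D image to a coarser grid by averaging factor x factor blocks.

    Assumes an integer factor; rows/columns that do not fill a complete
    block at the bottom or right edge are dropped.
    """
    rows = (image.shape[0] // factor) * factor
    cols = (image.shape[1] // factor) * factor
    trimmed = image[:rows, :cols]
    # Split each axis into (block index, position within block), then average
    # over the within-block positions.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = block_average(fine, 2)  # [[2.5, 4.5], [10.5, 12.5]]
```

Averaging n independent noisy pixels reduces the noise standard deviation by roughly a factor of sqrt(n), which is why the coarser grid suppresses small-scale, high-frequency variation.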
In general, desert areas (the Sahara in the north and the Namib-Kalahari in the southwest) show low levels of NDVI irrespective of the time of year, as they are largely devoid of vegetation. You can visually examine the images for the remaining months of 1986 using your display system.
Figure 1. Normalized Difference Vegetation Index maps for January 1986 (left) and August 1986 (right).
Standardized PCA Analysis of NDVI time series data
P2. Now run your software's Principal Components Analysis routine. Specify the name of your time series file (which lists the images to be used in this procedure) or select the images from your list of data files. Make sure the images are input in sequence from January 1986 to December 1990. Elect to use the Standardized option. Indicate that you wish to output 8 component images and, if required, enter the names of the output images. Depending on the speed of your computer, this procedure should take less than 30 minutes. Note that up to 60 components (one per input image) could be output if we wanted them. You may be required to scale your output images; choose whatever default option your software provides.
When your software has finished analyzing the data you will have a number of outputs. For the purpose of this exercise, two forms of output are important: the component images, and the component loadings together with the per cent variance explained by each component (see Figure 2). The loadings are the correlation coefficients between each original monthly image and each component image, and they are very useful for interpreting the component patterns in this exercise. To explore the results further, plot the loadings in any statistical analysis package or in your analysis system: draw a graph for each component with the months on the x-axis and the loading values on the y-axis. Figure 2 shows an example of the type of loadings output you should get.
Figure 2. An example of component loadings output from Standardized PCA analysis.
P3. Use your software system to display Component 1, and display the component loadings chart for Component 1 in your statistical analysis system.
Q1. What per cent of the variance is accounted for by Component 1?
Q2. Using the loadings graph, what can you infer from the loadings pattern between the original input images and Component 1?
Figure 3. Component 1 spatial pattern (left) and loadings chart (right).
P4. Use your system to display the Component 2 image and the loadings chart for Component 2.
Q3. What percentage of the variance is accounted for by Component 2? Using the loadings chart, which months in the series have high positive loadings on Component 2, and which ones have high negative loadings? Describe the spatial pattern shown by Component 2.
Figure 4. Component 2 spatial pattern (left) and loadings chart (right).
P5. Use your display system to view Components 3 to 8 and their respective component loadings charts. As in P4 above, note the amount of variance accounted for by each component and describe each component's spatial pattern using the respective loadings graph.
Figure 5. Component 3 spatial pattern (left) and loadings chart (right).
Figure 6. Component 4 spatial pattern (left) and loadings chart (right).
Figure 7. Component 5 spatial pattern (left) and loadings chart (right).
Figure 8. Component 6 spatial pattern (left) and loadings chart (right).
Figure 9. Component 7 spatial pattern (left) and loadings chart (right).
Figure 10. Component 8 spatial pattern (left) and loadings chart (right).
If you have finished examining the eight components, you can look at any of the later components and compare their spatial and loadings patterns with those of the earlier components.
OBSERVATIONS
This exercise raises a number of issues regarding the application of standardized PCA in the temporal domain.
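Before turning to these issues, it may help to see what the Standardized option in P2 computes. The following is a minimal NumPy sketch, not the implementation used by any particular package: `standardized_pca` is a hypothetical name, and details such as output scaling (for example, IDRISI's integer storage) are not reproduced. Each pixel is treated as an observation and each monthly image as a variable; the loadings are obtained as eigenvector times the square root of the eigenvalue, which equals the correlation between each date and each component.

```python
import numpy as np


def standardized_pca(stack: np.ndarray, n_components: int = 8):
    """Standardized (correlation-matrix) PCA of an image time series.

    stack: array of shape (n_dates, rows, cols), e.g. 60 monthly NDVI images
           ordered January 1986 ... December 1990.
    Returns (components, loadings, percent_variance):
      components       (n_components, rows, cols) component images
      loadings         (n_dates, n_components) correlation of each date with each component
      percent_variance (n_components,) per cent of total variance per component
    """
    n_dates, rows, cols = stack.shape
    # Each pixel is an observation, each date a variable ("band").
    X = stack.reshape(n_dates, rows * cols).T.astype(float)
    # Convert every date to standard scores (z-scores).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix between dates (n_dates x n_dates).
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    k = n_components
    scores = Z @ eigvecs[:, :k]                      # pixels x k component values
    components = scores.T.reshape(k, rows, cols)
    # Loading of date j on component k = eigvec[j, k] * sqrt(eigval[k]),
    # which equals the correlation between date j and component k.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # The trace of R equals n_dates, so eigval / n_dates is a share of variance.
    percent_variance = 100.0 * eigvals[:k] / n_dates
    return components, loadings, percent_variance
```

Because the eigenvalues of a correlation matrix sum to the number of input images, the per cent variance of a component is simply its eigenvalue divided by 60 (times 100); a Component 1 share of 96.7 per cent therefore corresponds to an eigenvalue of about 0.967 × 60 ≈ 58.0. Eigenvector signs are arbitrary, so a component image and its loadings may both appear with reversed sign relative to your software's output; their product, and hence the interpretation, is unchanged.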
Unlike the spectral-domain application of unstandardized PCA, here we are investigating the variability of a single phenomenon (NDVI) over time. PCA allows us to separate the various patterns of variability embedded in the time series into different components. We treat the monthly maps as "spectral bands"; taken as a time series, each pixel, and thus each map, contains information about the variance characteristics of the phenomenon. The same technique could be used to process a time series of the LANDSAT TM images used in the exercise on Principal Components Analysis.
There are a number of important observations we can draw from this exercise. Component 1 shows a pattern similar to a typical continental vegetation map of Africa. The loadings indicate that all the months are highly correlated with this pattern, with loadings above 0.90. This component alone accounts for 96.7 per cent of the variance in the 60-month time series. In a time series sense, however, our interest is not really in the typical pattern but in the change patterns, i.e. the atypical components.
Component 2 is such an atypical component. It is computed from the residuals after the variance accounted for by Component 1 has been removed. Component 2 accounts for only 1.97 per cent of the total continental-scale variance, yet it contains very useful information on the seasonality of vegetation. As can be seen in Figure 4, it shows a strong positive NDVI anomaly in a band stretching from Senegal to Ethiopia in the northern hemisphere (green areas) and a negative anomaly in the southern hemisphere (red to deep blue). As illustrated by the component loadings, the positive anomaly in the northern hemisphere has peaks in approximately July - August and troughs from approximately November of one calendar year to April of the following year, forming a sinusoidal temporal pattern.
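The sinusoidal shape of the Component 2 loadings can also be checked numerically. One simple aid, not part of the original exercise, is to locate the strongest peak in the Fourier amplitude spectrum of a loading series. The sketch below assumes the loadings for one component have been exported as a 1-D array of 60 monthly values; `dominant_period` is a hypothetical name.

```python
import numpy as np


def dominant_period(loadings: np.ndarray, sample_spacing: float = 1.0) -> float:
    """Return the period (in samples, e.g. months) of the strongest
    non-zero-frequency peak in the amplitude spectrum of a loading series."""
    series = np.asarray(loadings, dtype=float)
    series = series - series.mean()                 # remove the constant level
    spectrum = np.abs(np.fft.rfft(series))
    freqs = np.fft.rfftfreq(series.size, d=sample_spacing)
    peak = np.argmax(spectrum[1:]) + 1              # skip the zero-frequency term
    return 1.0 / freqs[peak]


months = np.arange(60)
annual_like = np.cos(2 * np.pi * (months - 7) / 12)  # synthetic annual cycle
period = dominant_period(annual_like)                 # about 12 months
```

With 60 monthly values the only resolvable periods are 60/k months (60, 30, 20, 15, 12, 10, ...), so an annual or semiannual cycle is picked out cleanly, while a 1.5 to 2 year oscillation can only be located approximately.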
Months near the peak are positively correlated, and months in the trough negatively correlated, with this positive spatial pattern throughout the series. The reverse applies to the southern hemisphere, i.e. peaks in NDVI during December - March and negative anomalies in June - September. This pattern indicates the annual cycle in greenness that is synchronized with the first mode of annual excursions of the ITCZ north and south of the equator. This mode essentially illustrates the annual cycle in greenness associated with precipitation around the summer and winter solstices.
Component 3 is a residual pattern calculated after the variance accounted for by Components 1 and 2 has been removed. It accounts for only 0.28 per cent of the total variance. The spatial pattern of Component 3 shows a positive anomaly across the Sahel and in southeastern South Africa, and a strong negative anomaly in a band immediately south of the Sahel (Figure 5). The loadings chart shows a slightly bimodal time signal. The positive anomalies are associated with greening in the Sahel between January and March (slight positive loadings) and between July and October. While the latter period corresponds well with the peak of the Sahelian growing season, when the ITCZ reaches its northernmost position, the former is an unexpected anomaly at a time of year when there is no precipitation across the Sahel. Previous research (Eastman and Fulk, 1993) has suggested that this anomaly may be related to attenuation of the NDVI signal by preferential scattering of short-wavelength radiation by aerosol dust particles, which are common in the atmosphere over the region at this time of year, when strong Harmattan winds blow from the Sahara in the north. This suggests a false greening during January - March that is not indicative of vegetation seasonality.
There is also a drop in the component loadings to negative values between April and May, prior to the beginning of the growing season. This anomaly is related to the increase in atmospheric water vapour content with the advance of the ITCZ into the Sahelian belt (Justice et al, 1991).
Component 4 shows positive NDVI anomalies in the Congo equatorial forest region and in parts of East Africa (Figure 6). As shown in the loadings chart, April, May and June and September, October and November are positively correlated with this pattern, while the remaining months are negatively correlated with it. This temporal pattern illustrates the semiannual cycle in vegetation greenness related to the precipitation regime associated with the equinoctial passages of the ITCZ, with peaks in April and October.
Component 5 shows a seasonality pattern similar to that of Component 4, with a strong bimodal pattern (Figure 7). In this case, however, the loadings peak mainly in May and November. The strongest negative anomalies occur over the desert regions and the strongest positive anomalies over the forested regions (e.g. the Congo Forest). However, as can be seen in the loadings chart, the dominant feature is a slight, progressive negative trend in the loadings over the 1986-1990 period. Because the loadings become increasingly negative while the spatial pattern is negative over the Sahara, the product of the two implies an apparent increase in NDVI over the desert regions and a decrease in NDVI over the forested regions. This apparent greening over desert areas is attributed to NDVI anomalies caused by orbital decay of the sensing platform. An aging NOAA-9 was in service during most of this time series period; over four years its equatorial crossing time drifted from 16.10 hrs to 14.20 hrs. The effect of orbital decay is to bias calculated NDVI values over bare surfaces, especially desert areas.
Bright desert targets show a higher reflectivity, especially in the visible red wavelengths relative to the infrared, at low sun angles because of differential degradation of the sensor channels since the prelaunch calibration (Teillet et al, 1990). As a consequence there is an apparent increase in calculated NDVI over desert areas (Price, 1991; Tateishi and Kajiwara, 1992). Similar anomalies have been found by Kaufman and Holben (1990), Eastman and Fulk (1993) and Los et al (1994). The changeover to the more stable NOAA-11 can be seen as the correlations drop to negative values in November/December 1988, with a much more constant amplitude pattern in the loadings for the remainder of the series.
Component 6 shows a strong positive anomaly in East Africa (eastern Kenya, Ethiopia and Somalia) (Figure 8), with a pronounced bimodality indicating another double greening pattern: one associated with the ITCZ at its northernmost position in July and another associated with this discontinuity in December/January. There is a strong negative anomaly over the Congo forest region, and slight positive anomalies in the Kalahari and along the Mediterranean coast of North Africa. The temporal pattern shows a decrease in the amplitude of the anomalies up to about November 1988, before the changeover to NOAA-11. This amplitude variation is related to the shift in the orbit of the sensing platform and preferentially affects forested areas, which appear as dark targets under low sun angle conditions and hence show a decrease in NDVI over time as the orbit decays.
The spatial pattern of Component 7 (Figure 9) shows a strong positive residual in NDVI across the Sahel, East Africa and the southeastern African coastal region.
Examination of the loadings chart (Figure 9) indicates positive associations with this positive pattern in mid-1986, early to mid-1987, late 1988 to mid-1989 and early 1990, indicating anomalously high levels of NDVI in these regions at these times. Between these periods, negative associations can be found, indicating lower than usual levels of NDVI (e.g., early 1987, mid/late 1988, and mid to late). The pattern thus appears to oscillate with a wavelength of roughly 1.5 to 2 years and is largely interannual.
The image for Component 8 (Figure 10) shows a very strong and coherent positive residual over southern Africa (most particularly Botswana and South Africa). Positive residuals also occur in western Kenya, northeastern Uganda, southern Sudan and Morocco. The loadings chart (Figure 10) shows negative loadings in early to mid 1987 (negative NDVI anomaly), followed by strongly positive loadings in 1988 reaching a peak in 1989 (positive NDVI anomalies), and then a progression back towards neutral to slightly negative association in late 1990. This pattern corresponds well with the El Niño - Southern Oscillation (ENSO) phenomenon, which defines the pattern of interannual climate variability over the southern Africa region by influencing precipitation and hence vegetation greenness.
SUMMARY
In summary, the patterns shown in Components 2-4 and their corresponding temporal correlation coefficients (loadings) are largely manifestations of the response of land surface vegetation to large-scale seasonal changes in the precipitation fields associated with the general circulation of the atmosphere. These spatial patterns illustrate the zonal asymmetry about the equator in NDVI variability that is related to the seasonal precipitation patterns.
Superimposed on these patterns, as illustrated in Components 3, 5 and 6, are anomalies in the NDVI time signal related to atmospheric conditions and to the orbital instability of the imaging platform over time. Components 7 and 8 illustrate slowly varying patterns associated with interannual phenomena. Each successive component accounts for a smaller proportion of the variance, and the associated anomaly or residual spatial patterns are increasingly regional or local.
Figure 11. Eigenvalue magnitudes as a function of principal component number.
Each successive component accounts for a smaller portion of the total variance and thus explains only localized information. The magnitude of the eigenvalues decays exponentially, as shown by the fitted least squares line. According to the rules of the log-eigenvalue (LEV) diagram (Wilks, 1995), the ideal cutoff point could be component 9, as the bar graphs level out and approach the zero line. Unlike traditional applications of PCA in remote sensing, we cannot use the per cent variance explained to determine which components to retain. Instead, we can use other measures such as spatial autocorrelation, or plot the eigenvalues on a logarithmic scale as in Figure 11 to get a sense of the cutoff point for the important components. Each case study, however, will require detailed examination of the component patterns and their associated loadings.
The ability to disentangle different patterns of variability from a time series like the one used in this study illustrates the unique ability of the principal components technique to extract different types of phenomena from complex time series measurements in ways that may not be readily apparent from NDVI time profiles or other change analysis methods.
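The log-eigenvalue reasoning can be sketched numerically. Eigenvalues that decay exponentially plot as a straight line on a log scale, so a least squares line is fitted to the trailing components, and leading components that rise clearly above it are candidates to retain. This is a minimal sketch under stated assumptions: the function name `lev_fit` is hypothetical, all eigenvalues must be positive, and the component where the straight-line tail begins (`tail_start`) is a judgement read off the plot, not something the code decides.

```python
import numpy as np


def lev_fit(eigenvalues, tail_start: int):
    """Least squares line through log10(eigenvalue) for the 'tail' components.

    eigenvalues: positive values sorted in decreasing order (component 1 first).
    tail_start:  1-based component number where the straight-line tail is
                 assumed to begin.
    Returns (slope, intercept, excess); excess[i] is how far
    log10(eigenvalue of component i+1) lies above the fitted line.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    comp = np.arange(1, lam.size + 1)
    log_lam = np.log10(lam)
    tail = comp >= tail_start
    slope, intercept = np.polyfit(comp[tail], log_lam[tail], 1)
    excess = log_lam - (slope * comp + intercept)
    return slope, intercept, excess


# Synthetic example: an exponential tail with three "signal" components on top.
lam = 10 ** (-0.2 * np.arange(1, 21))
lam[:3] *= 50
slope, intercept, excess = lev_fit(lam, tail_start=4)
```

In this synthetic case the first three components lie well above the fitted line and the rest sit on it, which is the kind of break the LEV diagram is meant to reveal.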
CREDITS
This exercise was written by Assaf Anyamba at Clark University. The data were provided by the USAID/FEWS Project. Similar data sets for the whole world are available from the NOAA/NASA AVHRR Land Pathfinder Data archive.
REFERENCES
The use of Standardized Principal Components Analysis in the analysis of remotely sensed land surface time series measurements has emerged only in the last couple of years. The technique has been used widely in meteorology and climatology to identify propagating phenomena in geophysical data sets derived from remote sensing instruments. For basic information on the implementation of Principal Components Analysis, refer to the references in the section on Principal Components Analysis in the spectral domain. The references below will be useful for further reading on the applications and interpretation of component patterns derived from time series measurements.
Anyamba, A. and Eastman, J. R. (1996) Interannual Variability of NDVI over Africa and its relation to El Niño / Southern Oscillation. International Journal of Remote Sensing, 17(13): 2533-2548.
Eastman, J. R. and Fulk, M. A. (1993a) Time Series Analysis of Remotely Sensed Data Using Standardized Principal Components Analysis. Proceedings, 25th International Symposium on Remote Sensing and Global Environmental Change, Volume I, April 4-8, Graz, Austria: I485-I496.
Eastman, J. R. and Fulk, M. A. (1993b) Long Sequence Time Series Evaluation Using Standardized Principal Components. Photogrammetric Engineering and Remote Sensing, 59(6): 991-996.
Eastman, J. R. and Fulk, M. A. (1992) Time Series Map Analysis Using Standardized Principal Component Analysis. ASPRS/ACSM/RT'92 Technical Papers, Vol. 1: Global Change and Education, August 3-8, Washington, D.C.: 195-204.
Eklundh, L. and Singh, A. (1993) A comparative analysis of standardized and unstandardized Principal Components Analysis in Remote Sensing. International Journal of Remote Sensing, 14(7): 1359-1370.
Richards, J. A. (1984) Thematic Mapping from Multitemporal Image Data Using Principal Components Transformation. Remote Sensing of Environment, 16: 35-46.
Singh, A. and Harrison, A. (1985) Standardized Principal Components. International Journal of Remote Sensing, 6(6): 883-896.
Tucker, C. J., Townshend, J. R. G. and Goff, T. E. (1985) African Land-Cover Classification Using Satellite Data. Science, 227(4685): 369-375.
Wilks, D. S. (1995) Statistical Methods in the Atmospheric Sciences. New York: Academic Press, 359-398.
Software Implementation Notes
IDRISI for Windows 1.0
Use DISPLAY to view JAN86GV and AUG86GV using the Idrisi256 palette. Use the Modify map components option to add the country and coastline boundary files COUNTRYH.VEC and COASTH.VEC respectively.
The Standardized Principal Components Analysis routine in IDRISI is named TSA and can be found in the Change / Time Series submenu of the Analysis menu. To run this module you need to create a time series file with the extension .ts. This is a simple ASCII file that can be created using the IDRISI text editor. Its first line gives the number of image files in the list (in this case 60), and each following line gives the name of one file. Here is an example of how the file should look:
Figure 12. An example of a Time Series File in the IDRISI editor.
Enter the name of the time series file in the first input box when asked for a time series file. Enter the number of components you want to produce (up to 60). Select a loadings output option: the default is a DIF (Data Interchange Format) spreadsheet file, or you can choose an IDRISI values file to be used with the PROFILE module. Enter a 3-character prefix for the output files. For example, if you use "TSA", your components will be named TSACMP1 .. TSACMP60 and your DIF file will be named TSAPCA.DIF (or TSA1.VAL for a values file).
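Typing 60 names by hand invites ordering mistakes, and P2 requires the images in sequence from January 1986 to December 1990. The sketch below generates the list in chronological order and formats it as described above (count on the first line, one name per line). It is an assumption that TSA expects bare image names without the .IMG extension or a path; check this against Figure 12 and your IDRISI documentation. The function names `monthly_names` and `ts_file_text` are hypothetical.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def monthly_names(start_year: int, end_year: int, suffix: str = "GV") -> list[str]:
    """Chronological image names such as JAN86GV ... DEC90GV."""
    return [f"{month}{year % 100:02d}{suffix}"
            for year in range(start_year, end_year + 1)
            for month in MONTHS]


def ts_file_text(names: list[str]) -> str:
    """Time series list text: file count on the first line, then one name per line."""
    return "\n".join([str(len(names)), *names]) + "\n"


names = monthly_names(1986, 1990)
text = ts_file_text(names)
# To save it, for example: open("ndvi8690.ts", "w").write(text)  (hypothetical file name)
```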
The images are scaled and stored as integer components by default to save hard disk space. After TSA has finished running, you can use the DISPLAY system to look at your component images and PROFILE to view the loadings charts. If you choose the DIF option, you can import the loadings file into statistical programs such as QUATTRO PRO, EXCEL, STATISTICA or Lotus 1-2-3 to plot your temporal patterns like the ones shown above. In some instances, you may want to rescale your component images symmetrically using STRETCH to better visualize the spatial anomaly patterns.
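A symmetric rescaling maps zero to the middle of the display range, so positive and negative anomalies of the same magnitude appear equally bright and dark. The sketch below is a generic linear version for illustration; it does not reproduce the options of IDRISI's STRETCH module, and `symmetric_stretch` is a hypothetical name.

```python
import numpy as np


def symmetric_stretch(component: np.ndarray, levels: int = 256) -> np.ndarray:
    """Linearly rescale a component image so that zero maps to the middle grey level.

    The limits are +/- max(|value|), so equal positive and negative anomalies
    receive grey levels equally far from the centre.
    """
    centre = int(np.rint((levels - 1) / 2.0))
    limit = np.abs(component).max()
    if limit == 0:
        return np.full(component.shape, centre, dtype=np.int32)
    # -limit -> 0, 0 -> (levels - 1) / 2, +limit -> levels - 1
    scaled = (component / limit + 1.0) / 2.0 * (levels - 1)
    return np.rint(scaled).astype(np.int32)


grey = symmetric_stretch(np.array([-2.0, 0.0, 2.0]))  # [0, 128, 255]
```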