Time Series Exercise - University of the Western Cape

Change Detection Exercise:
Time Series Change Analysis using Standardized Principal Components Analysis
INTRODUCTION
In the section covering image transformations, Principal Components Analysis (PCA) was
applied as a form of data compression. The principal aim was to identify which
bands account for the largest amount of variance and thus can be selected for use in
other analysis tasks such as image classification, or simply for image enhancement
by combining information from various spectral bands. The variant of PCA used was
termed unstandardized PCA because it uses the variance-covariance matrix in the
calculation of eigenvalues and eigenvectors. The use of PCA in that case was to investigate
variance patterns in the spectral domain. In this exercise we are going to apply another
variant of PCA, known as Standardized Principal Components Analysis, to analyze
remotely sensed data in the temporal domain. The implementation of Standardized PCA
is based on the use of a correlation matrix, which is derived from the covariance
matrix by dividing each covariance by the product of the standard deviations of the two
images concerned. This is equivalent to computing the covariance matrix of standard scores. This
procedure has been found to be very useful in the analysis of time series data sets where
the interest is in the identification of phenomena or signals that propagate over time.
Often it is applied to single band data (for example vegetation index maps) which map
only one given phenomenon over the land surface, e.g. vegetation greenness. The
standardization is intended to minimize the undue influence of extraneous factors,
e.g. atmospheric interference (aerosols and water vapour) and changes in surface illumination
conditions. In this way the different time variance patterns of the phenomenon of
interest (in this case vegetation) can be extracted from the time series measurements
effectively.
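As a rough illustration of the standardization step described above, the following Python sketch derives a correlation matrix from a covariance matrix. The pixel values and function names are invented for this example and are not part of the exercise data or of any particular software package.

```python
import math

def covariance_matrix(images):
    """Covariance between every pair of images.

    `images` is a list of equal-length lists, one flattened image per entry.
    """
    n_pixels = len(images[0])
    means = [sum(img) / n_pixels for img in images]
    k = len(images)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            total = sum(
                (images[i][p] - means[i]) * (images[j][p] - means[j])
                for p in range(n_pixels)
            )
            cov[i][j] = cov[j][i] = total / (n_pixels - 1)
    return cov

def correlation_from_covariance(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]

# Three tiny synthetic "images" of four pixels each (illustrative values only).
images = [
    [0.1, 0.4, 0.7, 0.2],
    [0.2, 0.5, 0.9, 0.1],
    [0.6, 0.3, 0.2, 0.7],
]
cov = covariance_matrix(images)
corr = correlation_from_covariance(cov)
for row in corr:
    print([round(value, 3) for value in row])
```

The diagonal of the result is 1 for every image, so each month carries equal weight in the analysis regardless of its overall variance.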
THE TIME SERIES DATA
The data set used for this exercise consists of 60 monthly NDVI images (which we can
consider to be bands in a spectral sense) for Africa. The data set is part of the USAID/FEWS
time series archive stretching back to 1981. The data set is processed at NASA Goddard
Space Flight Center to support both FEWS and FAO's famine early warning activities.
The time series that we will use in this exercise covers the period January 1986 to
December 1990. This period is of special interest because it will enable us to investigate a
number of patterns of variation related to the various factors that influence vegetation
reflectance patterns, both bioclimatic factors and noise due to sensor instrument characteristics.
Each image has 256 columns by 320 rows. The original data were registered to an 8km
grid; we have contracted them by averaging to a 30km grid as a filter for high frequency noise
related to topographic variability. This spatial averaging procedure enables us to identify
a number of time signals of interest, including those related to patterns of interannual
variability, from the vegetation index data. The images are named JAN86GV.IMG ..
DEC90GV.IMG. Each image file has a corresponding documentation file, named
JAN86GV.DOC .. DEC90GV.DOC. Ancillary information includes a series of vector
files: COUNTRYH.VEC (country boundaries), COASTH.VEC (continental coastline),
LAKESH.VEC (lakes) and RIVERSH.VEC (major rivers).
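The contraction by averaging mentioned above can be pictured with the Python sketch below. It averages non-overlapping square blocks of pixels. The integer factor, the grid values and the function name are assumptions made for illustration; the exercise does not state the exact procedure used to go from the 8km grid to the 30km grid, and that ratio is not a whole number.

```python
def block_average(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows - rows % factor, factor):
        coarse_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [
                grid[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse

# Synthetic 4 x 4 grid of NDVI-like values (illustrative only).
fine = [
    [0.2, 0.4, 0.6, 0.8],
    [0.2, 0.4, 0.6, 0.8],
    [0.1, 0.1, 0.5, 0.5],
    [0.1, 0.1, 0.5, 0.5],
]
coarse = block_average(fine, 2)
print(coarse)
```

Averaging in this way suppresses pixel-to-pixel variation, which is why it acts as a filter for high frequency noise.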
PROCEDURE
NDVI Image exploration
P1. Use the display system of your software to examine the images named JAN86GV and
AUG86GV. These two images show the levels of NDVI over Africa for January 1986
and August 1986 respectively. As you can see, the highest levels of NDVI are located in
the southern hemisphere in January and in the northern hemisphere in August. This shows
in a broad manner the variation in the pattern and location of maximum vegetation
greenness that is related to the climatic growing seasons. In general, desert areas (the
Sahara in the north, and the Namib - Kalahari in the southwest) show low levels of NDVI
irrespective of the season, as they are largely devoid of vegetation. You can visually
examine the images for the remaining months in 1986 using your display system.
Figure 1. Normalized Difference Vegetation Index maps for January 1986 (left) and
August 1986 (right)
Standardized PCA Analysis of NDVI time series data
P2. Now run your software's Principal Components Analysis routine. Specify the name of
your time series file (which lists the images to be used in this procedure) or select them from
your list of data files. Make sure your images are input sequentially from January 1986 to
December 1990. Elect to use the Standardized option. Indicate that you wish to output 8
component images and, if required, enter the names of the output images. This procedure
should take less than 30 minutes, depending on the speed of your computer. Note that we
could output up to 60 components if we wanted to. You may be required to scale your
output images; choose whatever default option is offered by your software system.
When your software has finished analyzing the data you will have a number of outputs. For
the purpose of this exercise, two forms of output are important: the component images,
and the component loadings together with the per cent variance explained by each component (see Figure
2 below). The loadings are a measure of the degree of correlation between each original
monthly input image and the new component patterns. The component loadings are very
useful in interpreting the component patterns in this exercise.
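To make the two outputs concrete, the Python sketch below computes loadings and per cent variance from a small correlation matrix. It is a minimal illustration, not the routine used by any particular package: the 3 x 3 correlation matrix is invented, the function names are hypothetical, and the eigenvectors are found by simple power iteration with deflation. In standardized PCA the eigenvalues sum to the number of input images, so the per cent variance of a component is its eigenvalue divided by that number, and each loading is the eigenvector element multiplied by the square root of the eigenvalue.

```python
import math

def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(matrix)
    vec = [float((i + 1) ** 2) for i in range(k)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec

def standardized_pca(corr, n_components):
    """Return (per cent variance, loadings) for the leading components."""
    k = len(corr)
    work = [row[:] for row in corr]  # copy, so the input is not modified
    results = []
    for _ in range(n_components):
        eigenvalue, vec = dominant_eigenpair(work)
        eigenvalue = max(eigenvalue, 0.0)  # guard against rounding below zero
        loadings = [v * math.sqrt(eigenvalue) for v in vec]
        results.append((100.0 * eigenvalue / k, loadings))
        # Deflation: remove the variance explained by this component.
        for i in range(k):
            for j in range(k):
                work[i][j] -= eigenvalue * vec[i] * vec[j]
    return results

# Invented correlation matrix for three highly correlated "months".
corr = [
    [1.00, 0.90, 0.80],
    [0.90, 1.00, 0.85],
    [0.80, 0.85, 1.00],
]
results = standardized_pca(corr, 3)
for number, (percent, loadings) in enumerate(results, start=1):
    print(number, round(percent, 2), [round(x, 3) for x in loadings])
```

With inputs this strongly correlated, the first component takes most of the variance and every input loads highly on it, which is the same behaviour the exercise asks you to look for in Component 1.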
In order to explore our results further we can plot the loadings statistics in any
statistical analysis package or in your analysis system. Plot a graph for each of the
components with the months on the x-axis and the loadings values on the y-axis. Figure 2
shows an example of the type of loadings output you should get.
Figure 2. An example of component loadings output from Standardized PCA
analysis
P3. Use your software system to display Component 1 and also the component
loadings chart for component 1 in your statistical analysis system.
Q1. What is the per cent variance accounted for by Component 1?
Q2. Using the loadings graph, what can you infer about the relationship
between the original input images and Component 1?
Figure 3. Component 1 spatial pattern (left) and loadings chart at right
P4. Use your system to display Component 2 image. Also display the loadings
chart for Component 2.
Q3. What is the percentage variance accounted for by Component 2? Using the
loadings chart, which months in the series have high positive loadings on
Component 2, and which ones have high negative loadings? Describe the spatial
pattern shown by Component 2.
Figure 4. Component 2 spatial pattern (left) and loadings chart at right
P5. Use your display system to view Components 3 to 8 and their respective
component loadings charts. As in Q3 above, note the amount of variance
accounted for by each component and describe the component spatial pattern
using the respective loadings graph.
Figure 5. Component 3 spatial pattern (left) and loadings chart at right
Figure 6. Component 4 spatial pattern (left) and loadings chart at right
Figure 7. Component 5 spatial pattern (left) and loadings chart at right
Figure 8. Component 6 spatial pattern (left) and loadings chart at right
Figure 9. Component 7 spatial pattern (left) and loadings chart at right
Figure 10. Component 8 spatial pattern (left) and loadings chart at right
If you have finished examining the eight components you can take a look at any of the
later components and compare the spatial and loadings patterns with the higher level
components.
OBSERVATIONS
This exercise raises a number of issues regarding the application of standardized PCA in
the temporal domain. Unlike in the spectral domain using unstandardized PCA, in this case
we are investigating the variability of a given phenomenon (e.g. NDVI) over time. PCA
allows us to segregate the various patterns of variability embedded in a time series data set
into different components. We treat the time measurement maps as "spectral bands"; thus,
taken as a time series, each pixel and thus each map contains information about the
variance characteristics of the given phenomenon. We can use the same technique to
process a time series of LANDSAT TM images like those used in the exercise on Principal
Components Analysis.
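The idea of treating each monthly map as a band can be sketched in Python as follows. Each map is converted to standard scores, and the value of a component image at a pixel is then a weighted sum of that pixel's standardized time series. The three tiny maps and the weight vector are invented for illustration; in a real analysis the weights would be an eigenvector of the correlation matrix.

```python
import math

def standardize(image):
    """Convert one flattened map to standard scores (mean 0, sd 1)."""
    n = len(image)
    mean = sum(image) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in image) / (n - 1))
    return [(v - mean) / sd for v in image]

def component_image(images, weights):
    """Weighted sum of the standardized maps, evaluated pixel by pixel."""
    z_maps = [standardize(img) for img in images]
    n_pixels = len(images[0])
    return [
        sum(w * z[p] for w, z in zip(weights, z_maps))
        for p in range(n_pixels)
    ]

# Three synthetic "monthly maps" of four pixels each (illustrative only).
images = [
    [0.10, 0.40, 0.70, 0.20],
    [0.20, 0.50, 0.90, 0.10],
    [0.15, 0.45, 0.80, 0.20],
]
# Hypothetical unit-length weight vector standing in for an eigenvector.
weights = [1.0 / math.sqrt(3)] * 3
scores = component_image(images, weights)
print([round(s, 3) for s in scores])
```

Pixels that are consistently high across all maps receive high scores on such an equally weighted component, which is why the first component tends to resemble a typical map of the phenomenon.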
There are a number of important observations we can draw from this exercise. Component
1 shows a pattern that is similar to a typical continental vegetation map of Africa. The
loadings indicate that all the months are highly correlated with this pattern, with
loadings over 0.90. This component alone accounts for 96.7 per cent of the variance in
the 60 month time series. However, in a time series sense our interest is not really in
examining the typical patterns but rather in the change patterns or the atypical
components. Component 2 is such an atypical component. It is computed from the
residuals after the variance accounted for by Component 1 is removed. Component 2
accounts for only 1.97 per cent of the entire continental scale variance. It nevertheless contains very
useful information on the seasonality patterns of vegetation. As can be seen in Figure 4, it
shows a strong positive NDVI anomaly pattern in a band stretching from Senegal to
Ethiopia in the northern hemisphere (green areas) and a negative anomaly in the southern
hemisphere (red to deep blue). As illustrated by the component loadings, the positive
anomaly in the northern hemisphere has peaks approximately in July - August and
troughs from approximately November of one calendar year to April of the following
year, forming a sinusoidal temporal pattern. These months have positive and negative
correlations, respectively, with this positive spatial pattern throughout the series. The
reverse explanation applies to the southern hemisphere, i.e. peaks in NDVI during
December - March and negative anomalies in June - September. This pattern indicates the
annual cycle in the flux of greenness in NDVI that is synchronized with the first mode of
annual excursions of the ITCZ north and south of the equator. This mode essentially
illustrates the annual cycle in greenness associated with precipitation around the summer and winter
solstices.
Component 3 is a residual pattern calculated after the variance
accounted for by Components 1 and 2 has been removed. It accounts for only 0.28 per
cent of the total variance. The spatial pattern for Component 3 shows a positive
anomaly across the Sahel and in southeastern South Africa, and a strong negative
anomaly in a band immediately south of the Sahel (Figure 5). The loadings chart shows a
slight bimodal time signal pattern. The positive anomalies are associated with greening in
the Sahel between January and March (slight positive loadings) and between July and
October. While the latter period corresponds well with the peak timing of the Sahelian
growing season, when the ITCZ reaches its northernmost position, the former seems to be
an unexpected anomaly at a time of the year when there is no precipitation across the
Sahel. Previous research (Eastman and Fulk, 1993) has suggested that this anomaly may
be related to attenuation of the NDVI signal by preferential scattering of short-wavelength
radiation by aerosol dust particles, which are common in the atmosphere over the
region during this time of the year, when strong Harmattan winds blow from
the Sahara in the north. This therefore suggests a false greening that is not indicative of
vegetation seasonality during the period January - March. There is also a drop in the
component loadings to negative values between April and May, prior to the beginning of
the growing season. This anomaly is related to the increase in atmospheric water vapor
content with the advance of the ITCZ into the Sahelian belt (Justice et al., 1991).
Component 4 shows positive NDVI anomalies in the Congo equatorial forest region and
in parts of East Africa (Figure 6). As shown in the loadings chart, the months April, May,
June and September, October, November are positively correlated with this pattern, while
the rest of the months are negatively associated with this pattern. This temporal pattern
illustrates the semiannual cycle in vegetation greenness that is related to the annual cycle
in precipitation associated with the equinoctial maxima of the ITCZ with peaks in April
and October.
Component 5 shows a similar seasonality pattern to that of Component 4, with a strong
bimodal pattern (Figure 7). In this case, however, the peaks in the loadings are mainly in
May and November. The strongest negative anomalies occur over the desert regions and
the strongest positive anomalies occur over the forested regions (e.g. the Congo Forest).
However, as can be seen in the loadings chart, the dominant trend is a progressive slight
negative trend in the loadings over the 1986-1990 period that is negatively associated
with the negative anomaly over the Sahara desert (thus an apparent increase in NDVI
over the desert regions and a decrease in NDVI over the forested regions). This pattern of
apparent greening over desert areas is attributed to anomalies in NDVI due to decay in
the orbital cycle of the sensing platform. An aging NOAA-9 was in service during most of
this time series period; over time its equatorial crossing time drifted later, from about
14.20 hrs to about 16.10 hrs, over a four year period. The effect of orbital decay is to bias calculated
NDVI values over bare surfaces, especially desert areas. Bright desert targets provide a
higher reflectivity, especially in the visible red wavelengths compared to the infrared, at
low sun angles because of differential degradation in the mapping channels since the
prelaunch calibration (Teillet et al., 1990). As a consequence there is an apparent increase in
calculated NDVI over desert areas (Price, 1991; Tateishi and Kajiwara, 1992). Similar
anomalies have been found in analyses by Kaufman and Holben (1990), Eastman and
Fulk (1993) and Los et al. (1994). The changeover to a more stable NOAA-11 can be seen as
the correlations drop to negative values in November/December 1988, with a much more
constant amplitude pattern in the loadings for the remainder of the series.
Component 6 shows a strong positive anomaly in East Africa (eastern Kenya,
Ethiopia and Somalia) (Figure 8), with a pronounced bimodality indicating another
double greening pattern, one associated with the ITCZ during its extreme northernmost
position in July and another associated with this discontinuity in December / January.
There is a strong negative anomaly in the Congo forest region, and slight
positive anomalies in the Kalahari and along the Mediterranean coast of north Africa. The
temporal pattern shows a decrease in the amplitude of the anomalies up to about
November 1988, before the changeover to NOAA-11. This amplitude variation is related
to the shift in the orbital cycle of the sensing platform and preferentially affects forested areas,
which appear as dark targets under low sun angle conditions and thus show a
decrease in the levels of NDVI over time as the sensor orbit decays.
The spatial pattern of Component 7 (Figure 9) shows a strong positive residual in NDVI
across the Sahel, East Africa, and the southeastern Africa coastal region. Examination of
the loadings chart shown in Figure 9 indicates that there were positive associations
with this positive pattern in mid-1986, early to mid-1987, late 1988 to mid-1989 and
early 1990, indicating anomalously high levels of NDVI in these regions at these times.
Between each of these, periods of negative association can be found, indicating lower
than usual levels of NDVI (e.g., early 1987, mid/late 1988, and mid to late). Thus the
pattern is one which appears to oscillate with a period of roughly 1.5 to 2 years and
is largely interannual.
The image for Component 8 (Figure 10) shows a very strong and coherent positive
residual over Southern Africa (most particularly Botswana and South Africa). Positive
residuals are also seen to occur in western Kenya, northeastern Uganda, southern Sudan,
and Morocco. The loadings chart (Figure 10) shows negative loadings in early to mid
1987 (negative NDVI anomaly), followed by strongly positive loadings in 1988 reaching
a peak in 1989 (positive NDVI anomalies), followed by a progression back towards
neutral to slightly negative association in late 1990. This pattern corresponds well with
what is known as the El Niño - Southern Oscillation (ENSO) phenomenon, which defines
the pattern of interannual climate variability over the Southern Africa region by
influencing the precipitation patterns and hence the patterns of vegetation greenness.
SUMMARY
In summary, the patterns shown in components 2-4 and their corresponding time
correlation coefficients (loadings) are largely manifestations of the response of the land
surface vegetation matrix to large scale seasonal changes in the precipitation fields
associated with the general circulation of the atmosphere. These spatial patterns illustrate
the zonal asymmetry about the equator in the NDVI variability that is related to the seasonal
precipitation patterns. Superimposed on these patterns, as illustrated in components 3, 5
and 6, are anomalies caused by attenuation of the NDVI time signal, which is
related to atmospheric conditions and to the instability in the orbital cycle of the imaging system over
time. Components 7 and 8 illustrate slowly varying patterns of variability that are
associated with interannual phenomena. Each of these components accounts for a smaller
and smaller proportion of variance, and the associated anomaly or residual spatial patterns
are more regionalized or localized in space.
Figure 11. Eigenvalue magnitudes as a function of principal component number. Each
successive component accounts for a smaller portion of the total variance, thus explaining
only localized information. The magnitude of the eigenvalues decays exponentially, as
shown by the fitted least squares line. According to the rules of the log-eigenvalue
(LEV) diagram (Wilks, 1995), the ideal cutoff point could be component 9, as the bars
level out and approach the zero line.
Unlike applications of PCA as traditionally used in remote sensing, we cannot use the
percent variance explained as a determinant of the important components to retain. We
can use other measures such as spatial autocorrelation or, if we plot the eigenvalues
on a logarithmic scale as shown in Figure 11 above, we can get a sense of
where the cut-off point for the important components lies. Each case study, however, will
require detailed examination of the component patterns and their associated loadings.
The ability to disentangle different patterns of variability from a time series array like the
one used in this study illustrates the unique ability of the principal components technique in
the extraction of different types of time series phenomena from complex time series
measurements, in ways that may not be readily apparent from time profiles of NDVI and
other change analysis methods.
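The logarithmic view of the eigenvalues can be sketched in Python as below. Because standardized PCA works on a correlation matrix, the eigenvalues sum to the number of input images (60 here), so each eigenvalue can be recovered from the per cent variance quoted in the observations for Components 1 to 3. Only those three values are given in the text, so the fitted line is purely illustrative; a real LEV diagram would use all 60 eigenvalues. The function name is hypothetical.

```python
import math

N_IMAGES = 60  # number of monthly images in the series

# Per cent variance for Components 1-3, as quoted in the observations.
percent_variance = [96.7, 1.97, 0.28]

# In standardized PCA the eigenvalues sum to the number of images.
eigenvalues = [p / 100.0 * N_IMAGES for p in percent_variance]
log_eigenvalues = [math.log10(ev) for ev in eigenvalues]

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

components = [1, 2, 3]
slope, intercept = least_squares_line(components, log_eigenvalues)

for number, ev, log_ev in zip(components, eigenvalues, log_eigenvalues):
    print(number, round(ev, 3), round(log_ev, 3))
print("slope of log10(eigenvalue) per component:", round(slope, 3))
```

A straight line on this logarithmic scale corresponds to exponential decay of the eigenvalues; the point where the plotted values flatten out is the candidate cut-off.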
CREDITS
This exercise was written by Assaf Anyamba at Clark University. The data were provided
by the USAID/FEWS Project. Similar data sets for the whole world can be obtained from
the NOAA/NASA AVHRR Land Pathfinder Data.
REFERENCES
The use of Standardized Principal Components Analysis in the analysis of land surface
remotely sensed time series measurements has only happened in the last couple of years.
The technique has been used widely in meteorology and climatology to identify
propagating phenomena in geophysical data sets derived from remote sensing
instruments. For basic information on the implementation of Principal Components
Analysis refer to the references in the section on Principal Components Analysis in the
spectral domain. The references below will be useful for further reading on the
applications and interpretation of component patterns derived from time series
measurements.
Anyamba, A. and Eastman, J. R. (1996) Interannual Variability of NDVI over Africa and
its relation to El Niño / Southern Oscillation. International Journal of Remote Sensing
17(13) : 2533-2548.
Eastman, J. R. and Fulk, M. A. (1993a) Time Series Analysis of Remotely Sensed Data
Using Standardized Principal Components Analysis. Proceedings 25th International
Symposium on Remote Sensing and Global Environmental Change, Volume I. April 4-8,
Graz, Austria. I485-I496.
Eastman, J. R. and Fulk, M. A. (1993b) Long Sequence Time Series Evaluation Using
Standardized Principal Components. Photogrammetric Engineering and Remote Sensing.
59(6): 991-996.
Eastman, J. R. and Fulk, M. A. (1992) Time Series Map Analysis Using Standardized
Principal Component Analysis. ASPRS/ACSM/RT'92 Technical Papers, Vol. 1: Global
Change and Education, August 3-8, Washington, D. C. 195-204.
Eklundh, L. and Singh, A. (1993) A comparative analysis of standardized and
unstandardized Principal Components Analysis in Remote Sensing. International Journal
of Remote Sensing, 14(7): 1359-1370.
Richards, J. A. (1984) Thematic Mapping from Multitemporal Image Data Using
Principal Components Transformation. Remote Sensing of Environment, 16: 35-46.
Singh, A. and Harrison, A. (1985) Standardized Principal Components. International
Journal of Remote Sensing, 6(6): 883-896.
Tucker, C. J., Townshend, J. R. G. and Goff, T. E. (1985) African Land-Cover
Classification Using Satellite Data. Science, 227(4685): 369-375.
Wilks, D. S. (1995) Statistical Methods in the Atmospheric Sciences. New York:
Academic Press. 359-398.
Software Implementation Notes
IDRISI for Windows 1.0


Use DISPLAY to view JAN86GV and AUG86GV using the
Idrisi256 palette. Use the Modify map components option to add the country
and coastal outline boundary files: COUNTRYH.VEC and
COASTH.VEC respectively.
The Standardized Principal Components Analysis routine in
IDRISI is named TSA, and can be found under the Change / Time
Series submenu of the Analysis menu. In order to run this module
you need to create a time series file with the extension .ts after the
file name. This is a simple ASCII file that can be created using the
IDRISI text editor. It has on its first line the number of image files
in the list (in this case 60), followed on each subsequent line by the name
of one file. Here is an example of what the file should look like:
Figure 12. An example of a Time Series File in the IDRISI editor.
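The layout described above (the count on the first line, then one image name per line) could also be generated with a short script rather than typed by hand. The Python sketch below is one way to do it under stated assumptions: only JAN86GV, AUG86GV and DEC90GV are named in this exercise, so the other month abbreviations are assumed to follow the same three-letter pattern, and the names are assumed to be listed without a file extension. The function names and the output file name in the final comment are hypothetical.

```python
# Assumed three-letter month codes; only JAN, AUG and DEC appear in the exercise.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def time_series_names(first_year=1986, last_year=1990, suffix="GV"):
    """Image names in chronological order, e.g. JAN86GV .. DEC90GV."""
    return [
        f"{month}{year % 100:02d}{suffix}"
        for year in range(first_year, last_year + 1)
        for month in MONTHS
    ]

def time_series_file_text(names):
    """Count on the first line, then one image name per line."""
    return "\n".join([str(len(names))] + names) + "\n"

names = time_series_names()
text = time_series_file_text(names)
lines = text.splitlines()
print(lines[0], lines[1], lines[-1])

# To save the list, write `text` to a file with the .ts extension, for example:
# with open("AFNDVI.ts", "w") as handle:   # hypothetical file name
#     handle.write(text)
```

Generating the list programmatically also guarantees that the images are entered sequentially from January 1986 to December 1990, as the procedure requires.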






Enter the name of the time series file in the first input box when
asked for a Time series file.
Enter the number of components you want to produce (you can
produce up to 60 components).
Select a loadings output option. The default is a DIF (Data Interchange
Format) spreadsheet file, or you can choose an Idrisi VALUES file to be used
with the PROFILE module.
Enter a 3 character prefix for the output files. For example, if you
use "TSA" your components will be named TSACMP1 ..
TSACMP60 and your DIF file will be named TSAPCA.DIF, or
TSA1.VAL for your values file.
The images are scaled and stored as integer components by default
to save hard disk space.
After TSA has finished running you can use the DISPLAY system
to look at your component images and PROFILE to view the
loadings charts. If you choose the DIF option you can import the
loadings file into programs like QUATTRO PRO,
EXCEL, STATISTICA or Lotus 1-2-3 to plot your temporal
patterns like the ones shown above. In some instances, you may
want to rescale your component images symmetrically using STRETCH
in order to better visualize your spatial anomaly patterns.
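The idea behind a symmetric rescaling can be sketched in Python as follows. This is not the algorithm used by the STRETCH module, whose details are not given here; it is only an illustration, with a hypothetical function name and invented values, of scaling component values so that zero falls in the middle of the display range and positive and negative anomalies of equal size sit equally far from it.

```python
def symmetric_stretch(values, levels=256):
    """Rescale values to 0 .. levels-1 with zero mapped to the middle.

    The range is set by the largest absolute value, so positive and
    negative anomalies are treated symmetrically.
    """
    limit = max(abs(v) for v in values)
    top = levels - 1
    if limit == 0:
        # All values are zero: place everything at the middle level.
        return [top // 2 for _ in values]
    return [int(round((v + limit) / (2 * limit) * top)) for v in values]

# Invented component values with a stronger negative than positive anomaly.
component_values = [-2.0, -0.5, 0.0, 1.0]
stretched = symmetric_stretch(component_values)
print(stretched)
```

With a symmetric palette, such a stretch makes it easier to tell positive from negative anomalies at a glance.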