Quantitative methods

Quantitative methods (Wilson, 2003)
Distributions and Statistics
The following exercise provides a review of basic concepts used in statistical
characterization and comparison of data including the concepts of probability, population
parameters and correlation coefficients. The exercise makes use of data obtained from
aerial photographs. The photographs were taken using new hyperspectral imaging
technology. The specific type of hyperspectral imagery used in this study is referred to as
AVIRIS. AVIRIS stands for Airborne Visible/Infrared Imaging Spectrometer. The
range of wavelengths scanned by the AVIRIS spectrometer extends from about 0.37
microns to 2.507 microns (see below). This corresponds roughly to the visible and near
or reflected infrared part of the electromagnetic spectrum (below). Each pixel in an
AVIRIS scene contains information recorded at 224 separate wavelengths extending over
the 0.37 to 2.507 micron range. The mean wavelength difference between samples is
approximately 10 nanometers or 1/100th of a micron.
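The 10 nanometer figure can be checked with a little arithmetic. The sketch below assumes the quoted endpoints (0.37 and 2.507 microns) are the centers of the first and last bands, which the text does not state explicitly; the function name is made up for illustration.

```python
# Mean spacing between AVIRIS band centers, using the figures quoted above.
START_UM = 0.37    # shortest wavelength, microns
END_UM = 2.507     # longest wavelength, microns
N_BANDS = 224      # spectral samples recorded for each pixel


def mean_band_spacing_nm(start_um, end_um, n_bands):
    """Average gap between adjacent band centers, in nanometers."""
    # n_bands samples leave (n_bands - 1) gaps; 1 micron = 1000 nm.
    return (end_um - start_um) * 1000.0 / (n_bands - 1)


spacing_nm = mean_band_spacing_nm(START_UM, END_UM, N_BANDS)
print(f"{spacing_nm:.2f} nm")  # about 9.58 nm, i.e. roughly 10 nm
```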
Figure 1: Ultra-violet through thermal infrared regions of the
electromagnetic spectrum.
The AVIRIS image shown below (Figure 2) was recorded at a wavelength of 557.07 nm.
This corresponds to Band 20 in the complete set of 224 spectral views.
Figure 2: Image of the Cuprite area taken at a wavelength of 0.557 microns (Band 20).
The four areas analyzed in this problem, Regions 1 through 4, are located by the
light-colored squares.
The data sets used in this exercise consist of the reflectance values recorded for each
pixel in the 4 square regions located on the image (Figure 2). Individual data sets are
referred to by their band number and region number. Regions 1 through 3 located in each
image were selected because within any given scene these regions appear visually to be
characterized by approximately the same relative intensity level. There also appears to be
relatively little intensity variation within each of these areas at a given wavelength.
Region 4 is a high reflectance region with apparently constant reflectance that lies within
a playa lake.
Data sets are referred to by the AVIRIS band number rather than the peak wavelength
corresponding to that band. Hence, the data set containing reflectance values measured
for Band 20 in Region 1 is referred to as 20-1.dat. The data set containing reflectance
values measured in Region 2 is 20-2.dat, and so on (i.e., 20-3.dat and 20-4.dat).
Band 100, shown below (Figure 3), was photographed at a wavelength of 1284.20 nm.
Data sets from this image are named 100-1.dat through 100-4.dat.
Figure 3: AVIRIS image recorded at a wavelength of 1284.2nm corresponding to
Band 100.
The Band 200 image (Figure 4) was photographed at a wavelength of 2268.4 nm.
Data sets from this image are named 200-1.dat through 200-4.dat.
Figure 4: AVIRIS image recorded at 2268.4nm. Boxes outline areas from which
reflectance values have been extracted.
_____________________________________________
Objective of this week’s and next week’s Computer Lab Exercises: Learn
basics of a new software package – PSI-Plot. Review basic concepts associated with the
univariate and bivariate statistical characterization and comparison of data.
Concepts and Tasks:
• Compute probability distributions of reflectance values.
• Compute sample statistics: mean, variance, standard deviation.
• Compute joint variation of two variables. For example, compute the covariance between
reflectance measured at different wavelengths within the same area.
• Compute the correlation coefficient.
Exercise
Additional bands from the AVIRIS image supplement the data presented above. AVIRIS
data is often referred to as hyperspectral data because it measures reflectance at a large
number of different wavelengths in the electromagnetic spectrum. Several instruments
measure multiple band reflectances, such as the Landsat Multispectral Scanner, SPOT
Multispectral Scanner and JERS-1 OPS. One of the more familiar of these is Landsat
Thematic Mapper or Landsat TM. Landsat TM measures reflectance in 7 different
regions of the EM spectrum. The Landsat TM bands are tabulated below.
Band  Wavelength (nm)   Characteristics
1     450 - 520         Blue-green. Maximum penetration of water.
2     520 - 600         Green. Matches green reflection peak of vegetation.
3     630 - 690         Red. Matches a chlorophyll absorption band important to distinguish vegetation types.
4     760 - 900         Reflected IR. Useful for determining biomass content and mapping shorelines.
5     1550 - 1750       Reflected IR. Indicates moisture content of soil and vegetation.
6     10400 - 12500     Thermal IR. Nighttime thermal mapping and soil moisture.
7     2080 - 2350       Reflected IR. Mineral absorption band associated with hydrothermally altered rocks.
The data sets provided for this exercise include AVIRIS bands 16 and 18. AVIRIS bands
16, 18 and 20 correspond to 517.6nm, 537.33nm and 557.07nm. These bands span
Landsat TM Green band 2 (520-600 above). Also included in the data sets available for
this exercise is AVIRIS band 52 recorded at 846.28nm. This band lies within Landsat TM
band 4. Note also that AVIRIS band 200 recorded at a 2268.4nm wavelength lies within
band 7 of the Landsat TM data.
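The band matching described above can be sketched as a simple range lookup. The ranges are the nominal Landsat TM limits tabulated above; the function and variable names are made up for illustration. Note that a strict lookup places the 517.6 nm center just under the 520 nm edge, in TM band 1, whereas this handout groups AVIRIS band 16 with TM band 2, presumably because it sits within a few nanometers of that edge.

```python
# Nominal Landsat TM band limits in nanometers, copied from the table above.
LANDSAT_TM_BANDS = {
    1: (450, 520),
    2: (520, 600),
    3: (630, 690),
    4: (760, 900),
    5: (1550, 1750),
    6: (10400, 12500),
    7: (2080, 2350),
}

# AVIRIS band number -> band center in nanometers, as quoted in the text.
AVIRIS_CENTERS_NM = {16: 517.6, 18: 537.33, 20: 557.07,
                     52: 846.28, 100: 1284.2, 200: 2268.4}


def landsat_tm_band(wavelength_nm):
    """Return the first TM band whose nominal range contains the wavelength.

    Returns None when the wavelength falls in a gap between TM bands.
    """
    for band, (low, high) in LANDSAT_TM_BANDS.items():
        if low <= wavelength_nm <= high:
            return band
    return None


for aviris_band, center_nm in AVIRIS_CENTERS_NM.items():
    print(aviris_band, center_nm, landsat_tm_band(center_nm))
```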
The table below summarizes the data available for this exercise.
Wavelength (nm)  Landsat TM Band #  AVIRIS Band #  Data set name
517.6            2                  16             16-1.dat through 16-4.dat
537.33           2                  18             18-1.dat through 18-4.dat
557.07           2                  20             20-1.dat through 20-4.dat
846.28           4                  52             52-1.dat through 52-4.dat
1284.2           NA                 100            100-1.dat through 100-4.dat
2268.4           7                  200            200-1.dat through 200-4.dat
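For anyone scripting the file handling, the full list of data set names can be generated from the band and region numbers. This is a minimal sketch that assumes every file follows the band-region.dat pattern described earlier for 20-1.dat; the function name is made up for illustration.

```python
AVIRIS_BANDS = (16, 18, 20, 52, 100, 200)
REGIONS = (1, 2, 3, 4)


def data_set_names(bands=AVIRIS_BANDS, regions=REGIONS):
    """Build names such as '20-1.dat' for every band and region pair."""
    return [f"{band}-{region}.dat" for band in bands for region in regions]


names = data_set_names()
print(len(names))   # 24
print(names[:4])    # ['16-1.dat', '16-2.dat', '16-3.dat', '16-4.dat']
```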
To obtain the data made available for this exercise you will need to access shared data on
my office computer through the network neighborhood. The following instructions are
provided as a reference guide to network neighborhood and copying procedures.
PSI-Plot Basics
PSI-Plot is a technical plotting and data processing program similar to Excel. The
following instructions are meant to take you step-by-step through basic statistical evaluation
and plotting of a couple of data sets using the PSI-Plot software.
Copy 16-1.dat and 20-1.dat from your H:\ drive to your C:\ or G:\ drive.
AN IMPORTANT NOTE: PSI-Plot does not let you know when a disk is
full. You may think you have saved a file only to find out later that your
disk was full and your file lost. So check your disk to be sure you have
needed storage space.
GETTING INTO PSI-PLOT
Double click on the PSI-Plot icon.
(If the PSI-Plot shortcut doesn't appear on your machine we will set one up.)
When the PSI-Plot window opens up, click on FILE then on IMPORT DATA.
Choose the appropriate folder containing 16-1.dat. To import 16-1.DAT, double click on
16-1.DAT. A single column of data (Column 1) will appear in the spreadsheet. Press PAGE DOWN
to scroll through it; there are 182 rows of values. Press HOME to return to the top.
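The same single-column file can also be read outside PSI-Plot. The sketch below assumes the .dat files are plain text with one reflectance value per line, which this handout implies but does not state; the function name is made up, and the demonstration file holds placeholder numbers, not real reflectance data.

```python
import os
import tempfile


def read_column(path):
    """Read one numeric value per non-blank line and return a list of floats."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


# Demonstration with a throwaway file of made-up placeholder values.
# In practice, pass the path to 16-1.dat instead.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.dat")
    with open(demo_path, "w") as handle:
        handle.write("100\n200\n\n300\n")
    values = read_column(demo_path)

print(len(values), values)
```

For the real 16-1.dat, len(values) should match the 182 rows shown in the spreadsheet.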
NOTES
GRAPHING AND PLOTTING!
Let’s generate a histogram of these numbers.
Click on PLOT (a menu drops down)
Click on 2D Special > (menu opens to right)
Click on Histogram
Note the different options that are available and
Click OK (Plot window opens up)
NOTES
ODDS and ENDS
Move the mouse arrow down into the plot area and click it in an empty area of the
plot.
Note that the plot will be highlighted (e.g., square dots appear on the plot margins).
You can resize the plot by clicking on the highlighted edge or corner points. You can move
the plot around by clicking within the bounds of the plot and dragging it to a desired
location. Try it.
To close your plot, move the mouse over to the upper left corner and
click on the - sign. Then
click on CLOSE, then
click on NO (you don’t want to save it). This will return you to your spreadsheet.
Try using a different number of intervals and change the bar size to see what the effects are.
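To see why the number of intervals matters, it helps to count the values per interval by hand. The sketch below uses equal-width intervals spanning the minimum to the maximum value; this is an assumption, since PSI-Plot's own binning rule is not described here. The function name is made up and the sample values are placeholders, not reflectance data.

```python
def histogram_counts(values, n_intervals):
    """Count values in n_intervals equal-width bins spanning min to max."""
    low, high = min(values), max(values)
    counts = [0] * n_intervals
    width = (high - low) / n_intervals
    if width == 0:
        # All values identical: everything lands in the first bin.
        counts[0] = len(values)
        return counts
    for value in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bin.
        index = min(int((value - low) / width), n_intervals - 1)
        counts[index] += 1
    return counts


sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]   # made-up placeholder values
print(histogram_counts(sample, 4))  # [3, 5, 1, 1]
print(histogram_counts(sample, 2))  # [8, 2]
```

With fewer intervals the single high value merges into a broad bar; with more intervals it stands apart from the main cluster.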
NOTES
Make Another Plot
PLOT
2D SPECIAL
HISTOGRAM
Check out the Advanced Options
Note that the plots default to a landscape layout. Let’s change that to portrait.
Click on FILE (Menu drops down - note variety of selections. Experiment later.)
Click on Printer Setup.
Click on Portrait, then
Click on OK
Resize the graph to fit in the upper third of the sheet.
Double click on the graph title 16-1.DAT.
16-1.DAT will be highlighted in blue. You can type in any title you’d like. Do so now, and
also note the other options in this window, including Font, Size, Italic, etc.
Click on OK.
You can change axis labels the same way.
Note the tool box off to the right.
Click on abc
Bring the cross hairs over to a suitable place on your graph.
Hold down the left mouse button and drag open a rectangular box to place a label.
When you release the left mouse button, a text format window will open up. You can enter
a relevant label as you did for the title and axes above.
Click on OK when done.
NOTES:
Click on your label, and move it around.
Click on an open space within the graph. Note that the graph is highlighted. Now
Click on the label you entered above. Note that the graph remains highlighted.
You will have to push it back.
Go to VIEW (a window drops down).
Click on Push Back.
Now click on your label. Voila!
This has just been a basic run through on some of the options available through
PLOT.
We will return for more throughout the semester. The best way to learn will be to
experiment.
OK, instead of closing the graph, this time just change windows back to your spreadsheet.
Click on WINDOW, then click on 16-1.DAT. You should now be back in the 16-1.DAT
spreadsheet.
MORE NOTES?
GENERATE HISTOGRAMS OF DATA SETS 16-1 and 20-1.dat
Resize and position both histograms on one page for easy comparison.
PRELIMINARY EXERCISE: Compute statistical characteristics of the two sample
data sets.
Open up the Data, Descriptive Statistics option. Note that you could select more
than one column if you wanted. Note that you can also specify data ranges. Click OK.
The following list of information will appear. Scroll down and check it out.
For now just write down the Minimum value, Maximum value, Median, Mean and
Standard Deviation of this data set.
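The five quantities listed above can be cross-checked with a few lines of code. This sketch uses the sample standard deviation (n - 1 in the denominator); whether PSI-Plot reports the sample or the population value is an assumption to verify against its output. The function name is made up and the sample values are placeholders, not reflectance data.

```python
import statistics


def describe(values):
    """Return the minimum, maximum, median, mean and sample SD of the values."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 denominator
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up placeholder values
summary = describe(sample)
for name, value in summary.items():
    print(f"{name} = {value:.2f}")
```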
Return to your plot page and construct a label on the appropriate histogram showing the
above information. Your plot should look something like the following:
[Example plot: histogram of 16-1.dat. Horizontal axis: Reflectance (4400 to 6400).
Vertical axis: Frequency of Occurrence (0 to 80). Label on plot: Min = 4765, Max = 6188,
Median = 4996, Mean = 5007.93, S.D. = 149.21]
Do the same for data set 20-1.dat.

• Compute the correlation coefficient between the two data sets using the approach
discussed in Davis.
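As a cross-check on a hand calculation, the standard Pearson product-moment correlation coefficient (the covariance divided by the product of the two standard deviations) can be coded directly. This sketch assumes that is the approach given in Davis, and that the two data sets are paired row by row, i.e. row k of each file refers to the same pixel. The function name is made up and the sample values are placeholders, not reflectance data.

```python
import math


def correlation_coefficient(x, y):
    """Pearson r = cov(x, y) / (s_x * s_y) for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least 2 values each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and variances all use the n - 1 denominator,
    # which cancels in the ratio below.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)


band_a = [1, 2, 3, 4, 5]   # made-up placeholder values
band_b = [2, 1, 4, 3, 5]   # made-up placeholder values
r = correlation_coefficient(band_a, band_b)
print(f"r = {r:.2f}")  # r = 0.80
```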
SAVING YOUR FILES:
PSI-Plot saves files of type filename.PDW and filename.PGW, where the D and G stand for
Data and Graphic. If you wish to leave your files in a data format, it is necessary to Export
them rather than Save them. Data files typically take up less space than PDW files, but this
is up to you.
FOR TODAY:
Print off the plot you put together. Put your name on it and hand it in. Make sure to note the
correlation coefficient.
What to do next?
Exercise 1
You have been given a total of 24 different data sets. Each data set shares some similarity
with other data sets in terms of either wavelength (band number) or surface location.
There are 6 different wavelengths and 4 different surface locations (6 x 4 = 24 data sets).

• Devise a hypothesis to test certain features of the data. The hypothesis could be posed as
a test of differences within a given area between the measurements of reflectance made
at different wavelengths. Alternatively, a hypothesis could be formed to test for
differences between areas measured at the same wavelength.

• State your hypothesis in writing (one paragraph).

• State how you will examine your hypothesis (one paragraph; for example, a t-test or
correlation coefficient).

• Present your results and refer to labeled figures.
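One way to examine a hypothesis about a difference in mean reflectance is a two-sample t statistic. The sketch below uses Welch's unequal-variance form, which is one common variant and not necessarily the form given in the course text. The function name is made up and the sample values are placeholders, not reflectance data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


region_a = [1, 2, 3, 4, 5]   # made-up placeholder values
region_b = [2, 3, 4, 5, 6]   # made-up placeholder values
t, df = welch_t(region_a, region_b)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -1.00, df = 8.0
```

The computed t is then compared with the critical value from a t table at the chosen significance level and degrees of freedom.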
All reports should be typewritten. The written report for this exercise should be no more
than one single-spaced typewritten page (12-point font or smaller). Hand in next lab period and be
prepared to present your results in a group discussion.