Quantitative methods (Wilson, 2003)

Distributions and Statistics

The following exercise provides a review of basic concepts used in the statistical characterization and comparison of data, including probability, population parameters and correlation coefficients. The exercise makes use of data obtained from aerial photographs taken using new hyperspectral imaging technology. The specific type of hyperspectral imagery used in this study is referred to as AVIRIS, which stands for Airborne Visible/Infrared Imaging Spectrometer. The range of wavelengths scanned by the AVIRIS spectrometer extends from about 0.37 microns to 2.507 microns. This corresponds roughly to the visible and near (reflected) infrared part of the electromagnetic spectrum (Figure 1). Each pixel in an AVIRIS scene contains information recorded at 224 separate wavelengths extending over the 0.37 to 2.507 micron range. The mean wavelength difference between samples is approximately 10 nanometers, or 1/100th of a micron.

Figure 1: Ultra-violet through thermal infrared regions of the electromagnetic spectrum.

The AVIRIS image shown below (Figure 2) was recorded at a wavelength of 557.07 nm. This corresponds to Band 20 in the complete set of 224 spectral views.

Figure 2: Image of the Cuprite area taken at a wavelength of 0.557 microns. The four areas analyzed in this problem (Regions 1 through 4) are located by the light-colored squares.

The data sets used in this exercise consist of the reflectance values recorded for each pixel in the 4 square regions located on the image (Figure 2). Individual data sets are referred to by their band number and region number. Regions 1 through 3 were selected because, within any given scene, these regions appear visually to be characterized by approximately the same relative intensity level.
There also appears to be relatively little intensity variation within each of these areas at a given wavelength. Region 4 is a high-reflectance region with apparently constant reflectance that lies within a playa lake.

Data sets are referred to by the AVIRIS band number rather than the peak wavelength corresponding to that band. Hence, the data set containing reflectance values measured for Band 20 in Region 1 is referred to as 20-1.dat. The data set containing Band 20 reflectance values measured in Region 2 is 20-2.dat, and so on (i.e. 20-3.dat and 20-4.dat).

Band 100, shown below (Figure 3), was recorded at a wavelength of 1284.20 nm. Data sets from this image are named 100-1.dat through 100-4.dat.

Figure 3: AVIRIS image recorded at a wavelength of 1284.2 nm, corresponding to Band 100.

The Band 200 image (Figure 4) was recorded at a wavelength of 2268.4 nm. Data sets from this image are named 200-1.dat through 200-4.dat.

Figure 4: AVIRIS image recorded at 2268.4 nm. Boxes outline areas from which reflectance values have been extracted.

_____________________________________________

Objective of this week’s and next week’s Computer Lab Exercises:
Learn the basics of a new software package, PSI-Plot.
Review basic concepts associated with the univariate and bivariate statistical characterization and comparison of data.

Concepts and Tasks:
Compute probability distributions of reflectance values.
Compute sample statistics: mean, variance and standard deviation.
Compute the joint variation of two variables; for example, compute the covariance between reflectance measured at different wavelengths within the same area.
Compute the correlation coefficient.

Exercise

Additional bands from the AVIRIS image supplement the data presented above. AVIRIS data is often referred to as hyperspectral data because it measures reflectance at a large number of different wavelengths in the electromagnetic spectrum.
Several instruments measure reflectance in multiple bands, such as the Landsat Multispectral Scanner, the SPOT Multispectral Scanner and JERS-1 OPS. One of the more familiar of these is the Landsat Thematic Mapper, or Landsat TM. Landsat TM measures reflectance in 7 different regions of the EM spectrum. The Landsat TM bands are tabulated below.

Band   Wavelength (nm)   Characteristics
1      450 - 520         Blue-green. Maximum penetration of water.
2      520 - 600         Green. Matches green reflection peak of vegetation.
3      630 - 690         Red. Matches a chlorophyll absorption band important to distinguish vegetation types.
4      760 - 900         Reflected IR. Useful for determining biomass content and mapping shorelines.
5      1550 - 1750       Reflected IR. Indicates moisture content of soil and vegetation.
6      10400 - 12500     Thermal IR. Nighttime thermal mapping and soil moisture.
7      2080 - 2350       Reflected IR. Mineral absorption band associated with hydrothermally altered rocks.

The data sets provided for this exercise include AVIRIS bands 16 and 18. AVIRIS bands 16, 18 and 20 correspond to 517.6 nm, 537.33 nm and 557.07 nm. These bands lie within, or in the case of band 16 just below the lower edge of, the Landsat TM green band 2 (520 - 600 nm above). Also included in the data sets available for this exercise is AVIRIS band 52, recorded at 846.28 nm. This band lies within Landsat TM band 4. Note also that AVIRIS band 200, recorded at a wavelength of 2268.4 nm, lies within band 7 of the Landsat TM data. The table below summarizes the data available for this exercise.

Wavelength (nm)   Landsat TM Band #   AVIRIS Band #   Data set names
517.6             2                   16              16-1.dat through 16-4.dat
537.33            2                   18              18-1.dat through 18-4.dat
557.07            2                   20              20-1.dat through 20-4.dat
846.28            4                   52              52-1.dat through 52-4.dat
1284.2            NA                  100             100-1.dat through 100-4.dat
2268.4            7                   200             200-1.dat through 200-4.dat

To obtain the data made available for this exercise you will need to access shared data on my office computer through the Network Neighborhood. The following instructions are provided as a reference guide to Network Neighborhood and copying procedures.
PSI-Plot Basics

PSI-Plot is a technical plotting and data processing program similar to Excel. The following instructions are meant to take you step-by-step through basic statistical evaluation and plotting of a couple of data sets using the PSI-Plot software.

Copy 16-1.dat and 20-1.dat from your H:\ drive to your C:\ or G:\ drive.

AN IMPORTANT NOTE: PSI-Plot does not let you know when a disk is full. You may think you have saved a file only to find out later that your disk was full and your file was lost. So check your disk to be sure you have the storage space you need.

GETTING INTO PSI-PLOT
Double click on the PSI-Plot icon. (If the PSI-Plot shortcut doesn't appear on your machine we will set one up.)
When the PSI-Plot window opens up, click on FILE then on IMPORT DATA.
Choose the folder containing 16-1.dat.
To import 16-1.DAT, double click on 16-1.DAT. A single column of data (Column 1) will appear in the spreadsheet.
Press PAGE DOWN to scroll through the column; there are 182 rows (values).
Press HOME to return to the top.

NOTES

GRAPHING AND PLOTTING!
Let’s generate a histogram of these numbers.
Click on PLOT (a menu drops down).
Click on 2D Special > (a menu opens to the right).
Click on Histogram.
Note the different options that are available, then click OK (the plot window opens up).

NOTES

ODDS and ENDS
Move the mouse arrow down into the plot area and click in an empty area of the plot. Note that the plot will be highlighted (e.g. square dots appear on the plot margins).
You can resize the plot by clicking on the highlighted edge or corner points and dragging.
You can move the plot around by clicking within the bounds of the plot and dragging it to a desired location. Try it.
To close your plot, move the mouse over to the upper left corner and click on the - sign. Then click on CLOSE, then click on NO (you don’t want to save it). This will return you to your spreadsheet.
Try using a different number of intervals and change the bar size to see what the effects are.
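If you would like to see the effect of the number of intervals outside PSI-Plot, the short Python sketch below counts values in equal-width intervals and prints a rough text histogram for three interval counts. It is only an illustration: the function name histogram_counts is made up for this example, it is not how PSI-Plot computes its histograms, and the 182 values are synthetic stand-ins rather than the AVIRIS reflectances.

```python
import random


def histogram_counts(values, n_intervals):
    """Count the values falling in each of n_intervals equal-width bins."""
    lo, hi = min(values), max(values)
    counts = [0] * n_intervals
    width = (hi - lo) / n_intervals
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_intervals - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Synthetic stand-in values, NOT the AVIRIS reflectances: replace them
    # with the column read from a file such as 16-1.dat.
    rng = random.Random(0)
    values = [rng.gauss(5000, 150) for _ in range(182)]
    for n in (5, 10, 20):
        print(f"{n} intervals")
        for count in histogram_counts(values, n):
            print(f"  {count:3d} " + "#" * count)
```

With few intervals the bars are tall and the shape is coarse; with many intervals the bars are short and the outline becomes ragged. The same trade-off applies when you change the number of intervals in the PSI-Plot histogram dialog.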
NOTES

Make Another Plot
PLOT, 2D SPECIAL, HISTOGRAM
Check out the Advanced Options.
Note that the plots default to a landscape layout. Let’s change that to portrait.
Click on FILE (a menu drops down; note the variety of selections and experiment later).
Click on Printer Setup.
Click on Portrait, then click on OK.
Resize the graph to fit in the upper third of the sheet.
Double click on the graph title 16-1.DAT. 16-1.DAT will be highlighted in blue. You can type in any title you’d like. Do so now, and also note the other options in this window, including Font, Size, Italic, etc. Click on OK.
You can change axis labels the same way.
Note the tool box off to the right. Click on the abc tool.
Bring the crosshairs over to a suitable place on your graph. Hold down the left mouse button and drag open a rectangular box to place a label. When you release the left mouse button, a text format window will open up. You can enter a relevant label as you did for the title and axes above. Click on OK when done.

NOTES:
Click on your label, and move it around.
Click on an open space within the graph. Note that the graph is highlighted. Now click on the label you entered above. Note that the graph remains highlighted. You will have to push it back.
Go to VIEW (a menu drops down). Click on Push Back. Now click on your label. Voila!

This has just been a basic run-through of some of the options available through PLOT. We will return for more throughout the semester. The best way to learn will be to experiment.

OK, instead of closing the graph, this time just change windows back to your spreadsheet. Click on WINDOW, then click on 16-1.DAT. You should now be back in the 16-1.DAT spreadsheet.

MORE NOTES?

GENERATE HISTOGRAMS OF DATA SETS 16-1.dat and 20-1.dat
Resize and position both histograms on one page for easy comparison.

PRELIMINARY EXERCISE: Compute statistical characteristics of the two sample data sets.
Open up the Data, Descriptive Statistics option.
Note that you could select more than one column if you wanted. Note that you can also specify data ranges. Click OK.
A list of information will appear. Scroll down and check it out. For now just write down the Minimum value, Maximum value, Median, Mean and Standard Deviation of this data set.
Return to your plot page and construct a label on the appropriate histogram showing the above information. Your plot should look something like the following.

[Example histogram for 16-1.dat. Horizontal axis: Reflectance. Vertical axis: Frequency of Occurrence. Label on plot: Min = 4765, Max = 6188, Median = 4996, Mean = 5007.93, S. D. = 149.21]

Do the same for data set 20-1.dat.
Compute the correlation coefficient between the two data sets using the approach discussed in Davis.

SAVING YOUR FILES: PSI-Plot saves files of type filename.PDW and filename.PGW, where the D and G stand for Data and Graphic. If you wish to leave your files in a data format, it is necessary to Export them rather than Save them. Data files typically take up less space than PDW files, but this is up to you.

FOR TODAY: Print off the plot you put together. Put your name on it and hand it in. Make sure to note the correlation coefficient.

What to do next?

Exercise 1

You have been given a total of 24 different data sets. Each data set shares some similarity with other data sets in terms of either wavelength (band number) or surface location. There are six different wavelengths and four different surface locations.

Devise a hypothesis to test certain features of the data. The hypothesis could be posed as a test of differences, within a given area, between the measurements of reflectance made at different wavelengths. Alternatively, a hypothesis could be formed to test for differences between areas measured at the same wavelength.

State your hypothesis in writing (one paragraph).
State how you will examine your hypothesis (one paragraph; for example, a t-test or a correlation coefficient).
Present your results and refer to labeled figures.

All reports should be typewritten. The written report for this exercise should be no more than one single-spaced typewritten page (12-point font or smaller). Hand it in at the next lab period and be prepared to present your results in a group discussion.
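If you want to cross-check the numbers you get from PSI-Plot, the Python sketch below computes the same kinds of quantities by hand. It is a minimal illustration under several assumptions, not the method from Davis or the one PSI-Plot uses internally: it uses the usual sample (n - 1) definitions of variance and covariance, it assumes each .dat file holds one reflectance value per line, and it assumes that row k in two files from the same region refers to the same pixel (which the correlation coefficient requires). All function names are made up for this example, and the t statistic shown is the unequal-variance (Welch) form; you would still need to compare it against a t table.

```python
import math
import statistics

BANDS = (16, 18, 20, 52, 100, 200)   # AVIRIS band numbers in the handout
REGIONS = (1, 2, 3, 4)               # surface locations in the handout


def data_set_names():
    """List the 24 file names implied by the band-region naming scheme."""
    return [f"{band}-{region}.dat" for band in BANDS for region in REGIONS]


def load_column(path):
    """Read one numeric value per line (ASSUMED layout of the .dat files)."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def describe(values):
    """Return the summary values the handout asks you to write down."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def covariance(x, y):
    """Sample covariance of paired values x[k], y[k]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Correlation coefficient: covariance over the product of the SDs."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


def welch_t(x, y):
    """t statistic for a difference in means without assuming equal variance."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        vx / len(x) + vy / len(y)
    )


if __name__ == "__main__":
    # Made-up numbers for illustration only, NOT values from the data sets.
    # With the real files you would use load_column("16-1.dat"), etc.
    band_a = [4980.0, 5010.0, 4995.0, 5100.0, 4950.0]
    band_b = [5200.0, 5260.0, 5230.0, 5390.0, 5150.0]
    print(describe(band_a))
    print("covariance:", covariance(band_a, band_b))
    print("correlation:", correlation(band_a, band_b))
    print("Welch t:", welch_t(band_a, band_b))
```

The correlation coefficient only makes sense for paired values, so it suits comparisons between bands within one region. For comparisons between regions at the same wavelength, where the pixels are not paired, a difference-of-means test such as the t statistic above is the more natural choice.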