
CROP AREA ASSESSMENTS USING LOW, MODERATE, AND HIGH RESOLUTION IMAGERY: A GEOTOOLS APPROACH

Gregory T. Koeln
Vice President, Environmental and GIS Services
Earth Satellite Corporation
6011 Executive Boulevard, Suite 400
Rockville, MD 20852
gkoeln@earthsat.com

R. Peter Kollasch
Senior Scientist, Applications Development
Earth Satellite Corporation
6011 Executive Boulevard, Suite 400
Rockville, MD 20852
pkollasc@earthsat.com

ABSTRACT

Performing agricultural assessments can be prohibitively expensive, especially when extensive ground truth collection is required. An approach has been developed which enables these assessments to be performed without ground collection, utilizing high-resolution imagery in the place of ground truth data. GeoTools is a concept of developing procedures, techniques, training, geospatial software tools, and standard operating procedures (SOPs) that will promote the use, increase the accuracy, reduce the time required for analyses, and decrease the cost associated with using multi-source geospatial data for agricultural assessments. This is primarily accomplished through the development and use of new methods for data collection and analysis, as well as the integration of digital image processing and statistical analysis tools. The GeoTools approach statistically integrates the use of high, moderate, and low spatial resolution imagery in a Nested Area Frame Sampling (NAFS) or multi-stage sampling approach. GeoTools has been successfully used to calculate cropland area in various parts of the world.
The steps used to derive cropland area included: 1) stratifying the study area using coarse resolution imagery, 2) assigning an a priori percent crop area to each stratum, 3) selecting the optimum locations for sampling cropland area using moderate resolution satellite imagery, 4) extracting cropland area from the moderate resolution imagery, 5) correcting the cropland area derived from moderate resolution imagery with data collected from high resolution imagery, 6) calculating the total cropland area for the study area, 7) calculating the confidence interval of the cropland area, and 8) validating the results.

INTRODUCTION

Agricultural surveys as performed by agricultural agencies frequently use a multistage approach that incorporates on-the-ground sampling as one of the stages. This approach involves significant expense and logistical support, which may not be available in many circumstances. The methodology exists to perform highly accurate assessments, but the extensive use of field data required makes these procedures expensive to perform, and nearly impossible to complete if the area under study is difficult to access. For this reason, a methodology has been sought that would allow accurate agricultural assessments to be performed utilizing imagery alone. Such a technique has the potential of reducing the cost required for extensive field data collection, as well as making it possible to perform studies in regions where such collection is virtually impossible.

Imagery acquired from Earth-orbiting satellites has long been a primary source of data for these surveys. These surveys extensively utilize commercially available imagery, which is available in a wide variety of resolutions, ranging from the coarse resolution of AVHRR, SeaWiFS and SPOT Vegetation, through the moderate resolution of Landsat, SPOT, IRS LISS and others.
The recent advent of high-resolution imagery from IKONOS, soon to be joined by QuickBird and other satellites, makes it possible to consider the use of high-resolution satellite imagery in this context. Aerial photography and airborne scanners represent other sources of high-resolution imagery. Effective utilization of these many imagery sources of varied resolution for the purpose of performing agricultural assessments is the objective of the GeoTools project.

GeoTools is an effort to develop the methodology and tools which will decrease the cost and increase the accuracy involved in using multiple resolutions of imagery together to perform agricultural assessments. The major thrust of the GeoTools project is in the following areas:

• Develop and refine the operational and technical procedures
• Document these procedures in the form of Standard Operating Procedures (SOPs)
• Develop software tools that facilitate performing these techniques.

Several GeoTools approaches have been developed which achieve the desired effect of reducing the cost of imagery exploitation while preserving or increasing the accuracy of the results. The savings are accomplished by utilizing statistical sampling techniques. Two techniques of note are called Nested Area Frame Sampling and Area Frame Sampling. Both techniques utilize image stratification as the basis for a sampling approach. Nested Area Frame Sampling (NAFS) is a multistage stratified area estimate involving three separate resolutions of imagery, whereas Area Frame Sampling (AFS), as described here, involves only a single stage of sampling using two resolutions of imagery. This paper will describe the technical approach to NAFS, the more complex of the two procedures, and will allude only briefly to the AFS approach.

NESTED AREA FRAME SAMPLING

The NAFS approach described here relies almost entirely on imagery analysis to arrive at an accurate estimate of how much of a specified crop or crops is growing in the region of interest. For this reason, it can be applied where extensive fieldwork is impossible or is too expensive. The procedure is generally applied to one crop or a collection of crops (e.g. row crops, orchards, or subsistence crops) at a time. If results for multiple crops are required, a separate analysis is required for each. The same data may potentially be used to support these analyses. The approach is designed for use in large regions, such as a country or perhaps a continent.

Nested Area Frame Sampling is a sampling-based methodology that uses a multistage stratified area estimation procedure. Sampling is used as an alternative to a census or total enumeration approach. A census done entirely with high-resolution imagery (HRI) data would be very accurate, but prohibitively expensive, while a cultivated land inventory calculated from AVHRR and TM data alone would not achieve the required accuracy. Consequently, the double sampling or two-stage strategy using multiple resolutions of imagery is employed in the NAFS procedure to achieve higher accuracy with a lower cost.

Three distinct resolutions of imagery are used in the NAFS approach. The entire study area is stratified using coarse resolution imagery (typically AVHRR). Samples of moderate resolution imagery, such as TM data, are collected to characterize the strata. Finer resolution samples (referred to as segments) are sampled within the footprint of the primary samples.

For ease of reference in the following discussion, the three levels of imagery required are referred to by the name of the sensor whose data we have typically used in each context. This substitution is employed to make a complicated situation more comprehensible. Figure 1 shows which image type is used to represent each level of imagery.
The reader should understand, when encountering the term AVHRR, that the term ‘Stratification’ can be substituted, as can an equivalent (in spatial resolution) sensor name, such as OrbView II, SPOT Vegetation or IRS WiFS. The same discussion applies equally to references to the term TM (Primary Samples) and HRI (Secondary Samples).

AN OVERVIEW OF THE NAFS APPROACH

NAFS is a two-stage sampling procedure. The major steps in the process are as follows:

• Definition of the spectral strata
• Selection of primary sampling units
• Estimation of cultivated area for primary samples
• Selection of secondary sampling units
• Correction of primary sample results with secondary sample data
• Calculation of total cultivated area for study area and sub-regions of the study area
• Calculation of confidence intervals for the estimate
• Validation of the cultivated area estimate

Figure 1. Illustration of the NAFS Concept (I. Stratification: coarse, AVHRR; II. Primary Samples: moderate, TM; III. Secondary Samples: high resolution, HRI).

AVHRR or other coarse resolution data are used to stratify the study area. The stratification is used to improve the efficiency of sampling by targeting sample selection to areas where they will do the most good. Prior knowledge of the crop’s occurrence in the study area is used in the targeting process.

After primary sampling areas are identified, the TM or other moderate resolution imagery are acquired, georectified, and analyzed. These data are interpreted for the crop of interest. From these data the percent of cultivated land in each stratum is calculated. Multiplying the area of each stratum times the percent of cultivated land in each stratum and summing these values for the entire study area yields a preliminary estimate of the total cultivated area of the crop.

A similar sample targeting approach is used to locate the secondary samples, which are collected within the footprints of the primary samples.
The secondary sample data are used to correct the cultivated area statistics obtained directly from the TM data. The secondary samples are analyzed and the results used to adjust estimates derived from the primary samples. The estimate is improved by correcting the percent of cultivated area obtained for each stratum from the TM data through the use of regression analysis comparing the HRI analysis areas directly with the TM analysis. The regression approach is used to create correction factors that can be used to adjust the percent of cultivated area obtained from the TM data for each stratum. These adjusted estimates are used to re-determine the percent-cultivated land for each stratum.

The final estimate of cultivated land for the entire study area and the selected administrative areas is obtained by using the adjusted percent of cultivated land by stratum in a direct expansion for the entire study area and selected administrative areas. Any bias introduced in the categorization of cultivated area from the TM data will be corrected with the secondary sample data processed from HRI. The HRI provides a substitute for in situ data in hard-to-sample areas and a cost-effective means for collecting data on cultivated areas even for regions for which access is not difficult. The validation process tries to identify potential biases in the cultivated area estimate and places confidence intervals on the estimate.

STRATIFICATION OF THE STUDY AREA

The first step in the NAFS procedure is stratification of the study area. In statistical analyses, the objective of stratification is to segment the population into units that are similar in the characteristic being measured for the population. The variance of the measure of the population will be less within each stratum than between strata. In the NAFS case, the percent of the stratum representing the crop of interest is the quantity for which spatial consistency is desired.
In a good stratification of the study area, each stratum is homogeneous with regard to the percent of the crop in the stratum. Ideally, sampling for percent cultivated area in various locations within the stratum should result in similar percentages.

Stratification serves two purposes. First, it improves sampling efficiency by optimizing the number and location of samples to be taken. Second, it provides the basis for calculating the estimate for the entire study area. The basic equation for creating the estimate is as follows:

    e_c = \sum_{s=1}^{n} p_s a_s

where: e_c is the crop estimate, s is the stratum number, n is the number of strata, p_s is the percent of the crop of interest represented by the stratum, and a_s is the area of the stratum. Table 1 illustrates a simple stratification with four strata, and shows how the stratum areas are used as weights to calculate an estimate for the entire study area.

    s        a_s              p_s     e_c
    1        1,000 km sq      60%     600 km sq
    2        2,000 km sq      50%     1,000 km sq
    3        3,000 km sq      30%     900 km sq
    4        4,000 km sq      10%     400 km sq
    Total    10,000 km sq             2,900 km sq

Table 1. Example of Use of Strata to Calculate Total Cultivated Area

AVHRR data are used to create the stratification. AVHRR 10-day composites for selected time periods are available for download from the US Geological Survey’s EROS Data Center (EDC) at their worldwide web site (http://edcwww.cr.usgs.gov/landdaac/1KM/comp10d.html). The two bands of the time series of AVHRR data are downloaded and processed to compute a greenness index, reducing each 10-day composite to a single band. This greenness index is reprojected to the required projection of the study. Multiple greenness bands covering the entire growing season in the area, perhaps for several years, are used in the analysis. These are combined into a single data file and processed using a standard image classification procedure to obtain up to 200 spectral classes, which are used as strata.
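To make the weighting in Table 1 concrete, the following minimal Python sketch evaluates the estimate equation for the four example strata. The function and variable names are illustrative only and are not part of any GeoTools software; the percentages from Table 1 are entered as fractions.

```python
# Stratum areas (km sq) and crop shares taken from Table 1.
stratum_area_km2 = [1000.0, 2000.0, 3000.0, 4000.0]
crop_fraction = [0.60, 0.50, 0.30, 0.10]   # p_s expressed as fractions


def crop_estimate(areas, fractions):
    """Return e_c, the sum over all strata of p_s * a_s."""
    if len(areas) != len(fractions):
        raise ValueError("one crop fraction is needed per stratum")
    return sum(p * a for p, a in zip(fractions, areas))


total_km2 = crop_estimate(stratum_area_km2, crop_fraction)
print(f"Estimated cultivated area: {total_km2:,.0f} km sq")
```

With the Table 1 inputs the result should agree with the 2,900 km sq total shown in the table.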
ERDAS IMAGINE’s ISODATA and Maximum Likelihood classifier routines are used to create the stratification for the study area. Using many temporal composites in this way creates a temporal classification that effectively stratifies the study area into regions with similar greenness response over time.

The number of strata used depends on the characteristics of the crop under study and the budget for collecting and analyzing primary and secondary samples. Our assumption is that a larger number of strata will increase the likelihood that each stratum will be stationary (homogeneous with regard to percent cultivated area). It is clear that an increase in the number of strata must be accompanied by an increase in the number of primary and secondary samples required to characterize them. Any stratum that is not sampled or is poorly sampled may need to be grouped with another stratum with a similar greenness response.

The degree of consistency that strata possess with respect to the characteristic being measured is termed stationarity, and strata possessing this consistency are called stationary. Lack of stationarity is the major issue relating to the stratification. If prior information on the planted extent of the crop of interest is available, the stationarity of each stratum can be tested. This check can be done visually by displaying the prior data for each stratum masked (e.g. by changing opacity) by the stratum extent. This may also be approached mathematically. Strata found to be non-stationary may be divided into two strata, or possibly merged with other strata. An alternative is to ensure that non-stationary strata are sampled at a higher rate than other strata, so that the variance due to non-stationarity is reduced.

Some strata will be readily recognized as non-cultivated regions (not containing the crop of interest).
Previous experience indicates that one of the major sources of error in the NAFS approach is the contribution from large strata which should have no contribution, but which have been contaminated by coregistration issues, which can be significant when dealing with widely diverse resolutions. These strata are identified and removed from the analysis so that they do not cause an unwarranted increase in the estimate.

SELECTION OF THE PRIMARY SAMPLE AREAS

Both primary and secondary samples are required for the NAFS approach. The primary sampling unit is the TM scene. The primary samples will be used to estimate the percent of cultivated area of the crop of interest for each stratum. Each TM scene will provide samples for many strata. The number of TM scenes (primary samples) to be processed is a tradeoff between processing cost and minimizing sampling variance. The more TM scenes selected, the smaller the sampling variance, but the greater the cost. The allocation of TM samples includes determining the number of samples to be acquired and the location of the optimum set of TM samples.

By the nature of TM data, the TM data is a cluster sample. The truly independent sample is the farmer’s field, but to have primary samples the size of a farmer’s field and to randomly sample farmers’ fields throughout the study area would be cost prohibitive. Consequently, the TM data represents a cluster sample and the allocation (both in total number and distribution) of the cluster samples (TM data) is critical to the success of the NAFS approach. Ideally, the average size of the farmer’s field would be known. Without this knowledge, an estimate can be made (e.g. 20 ha). Because the total number of independent samples (farmers’ fields) per stratum is ultimately so large, the variance due to sample size is negligible and the importance of knowing the average field size is reduced.

A sequential allocation strategy is utilized to ensure that all strata are adequately sampled.
This method employs a variance equation that also permits the utilization of prior knowledge to optimally select samples. Each potential sampling area is identified and characterized by how much area of each stratum it samples. In practice this step is generally performed by overlaying on the study area a grid with cells somewhat smaller than the size of the primary samples. The grid cells are then characterized by how much of each stratum they sample. At each step of this sampling process, for each potential sample area (cell), the resulting variance of the current sample set, if this sample is selected, is computed. From the set of potential samples, that sample is selected which produces the lowest variance. The equation that computes this variance is shown below.

The sampling value of the sample allocation needs to be determined. The sampling value can be measured as the overall variance of the percent cultivated land in each stratum as computed from intersecting (IMAGINE’s SUMMARY command) the strata with prior data on cultivated lands. This variance, Var (Pag), is defined below.

    \mathrm{Var}(P_{ag}) = \sum_{s=1}^{n} \left( \frac{A_s}{\sum_{s=1}^{n} A_s} \right)^{2} \frac{P_s (1 - P_s)}{N_s}

where: s is the stratum number, n is the number of strata created from the AVHRR data, A_s is the area of each stratum, P_s is the expected proportion of cultivated land in each stratum computed using prior knowledge of the study area (this parameter is initialized to 0.5 if no prior dataset is available), and N_s is the number of fields allocated in stratum s.

To illustrate the allocation of primary samples (TM scenes), assume that 100 TM scenes cover the entire study area. Var (Pag) is calculated for each of the 100 TM scenes. The TM scene with the smallest Var (Pag) is the first TM scene allocated. The second scene allocated is the scene which, when paired with the first scene, produces the smallest Var (Pag).
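The variance criterion and the step-by-step selection it drives can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeoTools implementation: the number of fields N_s is assumed to be the stratum area covered by the selected cells divided by an average field size (20 ha here), an unsampled stratum is assumed to count as one field so the variance stays finite, and all names and the example coverage values are hypothetical.

```python
def var_pag(stratum_area, expected_p, n_fields):
    """Var(Pag) = sum_s (A_s / sum A)^2 * P_s * (1 - P_s) / N_s."""
    total_area = sum(stratum_area)
    return sum(
        (a / total_area) ** 2 * p * (1.0 - p) / n
        for a, p, n in zip(stratum_area, expected_p, n_fields)
    )


def fields_sampled(selected, coverage, field_area, floor=1.0):
    """N_s for a set of cells: sampled area per stratum / field size.

    ASSUMPTION: a stratum with no coverage is given `floor` fields so
    that the variance remains finite; the paper does not specify this.
    """
    n_strata = len(coverage[0])
    counts = []
    for s in range(n_strata):
        area = sum(coverage[c][s] for c in selected)
        counts.append(max(area / field_area, floor))
    return counts


def allocate(coverage, stratum_area, expected_p, n_select, field_area=20.0):
    """Sequentially add the candidate cell that gives the lowest variance."""
    selected, trace = [], []
    remaining = set(range(len(coverage)))
    for _ in range(min(n_select, len(coverage))):
        best = min(
            sorted(remaining),
            key=lambda c: var_pag(
                stratum_area, expected_p,
                fields_sampled(selected + [c], coverage, field_area)),
        )
        selected.append(best)
        remaining.remove(best)
        trace.append(var_pag(
            stratum_area, expected_p,
            fields_sampled(selected, coverage, field_area)))
    return selected, trace


# Hypothetical example: 3 strata, 4 candidate cells, areas in hectares.
stratum_area = [500_000.0, 300_000.0, 200_000.0]
expected_p = [0.5, 0.5, 0.5]      # no prior data, so 0.5 as in the text
coverage = [                      # hectares of each stratum inside each cell
    [40_000.0, 0.0, 0.0],
    [20_000.0, 15_000.0, 0.0],
    [0.0, 10_000.0, 25_000.0],
    [15_000.0, 10_000.0, 10_000.0],
]
order, variances = allocate(coverage, stratum_area, expected_p, n_select=3)
print("selection order:", order)
print("variance after each step:", variances)
```

Plotting the `variances` list against the number of selected cells gives the kind of curve used to choose a break point for the number of primary samples.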
The third scene allocated is the scene which, when combined with the first two scenes allocated, produces the smallest Var (Pag). This process is repeated until all the required scenes are allocated. A plot of the Var (Pag) for each combination of scenes (1 through 100) will aid in deciding the break point for the total number of scenes (primary samples) to obtain. Figure 2 illustrates the reduction in Var (Pag) as the number of scenes increases and a potential break point.

Figure 2. Reduction of Variance as Number of Primary Samples Increases.

ESTIMATION OF CULTIVATED AREA FOR THE PRIMARY SAMPLES

The objective of the exploitation of the TM scenes is to identify which areas represent the crop of interest. This job is a classic land use characterization procedure, and could be done with any of a range of procedures. We have chosen to do this with an unsupervised classification approach, which employs three general processes: clustering, labeling, and raster editing.

To cluster, an unsupervised classification routine (IMAGINE’s ISODATA) is used to cluster the multispectral data into a predetermined number of spectral classes. A higher number of spectral classes is used for complex images with significant potential for spectral class confusion (when a single spectral class represents more than one informational class) or when the number of required informational classes is high. ISODATA builds a signature file that defines the class centroids and IMAGINE’s maximum likelihood classification routine is used to determine the spectral class for each pixel in an image.

An image analyst will label each of the spectral classes as to the informational class (landcover class) that it best represents. IMAGINE’s Raster Attribute Editor is used for this process, called grouping or labeling. The analyst assigns like thematic colors to the spectral classes defining a single informational class (e.g.
all spectral classes representing water might be colored blue) and assigns each spectral class to the landcover class that it best fits. IMAGINE’s Raster Attribute Editor and raster editing capabilities provide good tools for labeling the spectral classes and recoding the spectral classes to the final informational classes.

A review is conducted, and any labeling errors detected are corrected through raster editing. The most common source of error comes from the need to assign spectral classes that contain confusion (represent more than one land cover class) to a single target class. These confused classes are identified, and most of the raster editing is focused upon them.

In general, a higher number of spectral classes in the classification will reduce the amount of spectral class confusion and improve the accuracy of the classification, and thereby reduce the amount of raster editing required. However, the higher the number of spectral classes, the more difficult the process of labeling the spectral classes to informational classes becomes. Using 240 classes in the classification seems to be a good compromise since it balances file size (each value still fits in one byte, leaving some space for target classes), spectral confusion (reduced by increasing the number of spectral classes), and complexity of labeling. Tools are needed which will more efficiently assign spectral classes to informational classes.

The basic steps to determine cultivated area for the TM scenes (primary samples) are listed below, without further explanation:

1. Geocode the data to the best available map sources,
2. Create 1:250,000-scale image prints for each TM scene,
3. Using the HRI for the segment data and other sources of ground truth, delineate on the image prints examples of cultivated and non-cultivated areas to aid the analysts in assigning spectral classes to the required informational classes,
4. Create spectral signature file (240 spectral classes) using ISODATA,
5.
Create categorized image using signature file and maximum likelihood classifier,
6. Group (label) the 240 spectral classes to desired informational classes,
7. Raster edit any observable spectral confusion,
8. Produce overlay of derived land cover at the same scale as the image print,
9. Review overlay of derived land cover and the image print,
10. Note any areas of potential errors,
11. Review derived landcover map by quality control team, and
12. Repeat steps 6 through 11 until the quality control team has approved the landcover map.

ALLOCATION OF THE SECONDARY SAMPLES

HRI will be used as the secondary samples. Every primary sample (TM scene) should have several secondary samples collected within its footprint. Within the footprint of the secondary samples, segments measuring 3 km by 3 km are selected for interpretation. The purpose of the secondary samples is to correct the estimate obtained from the TM data with higher resolution data. From previous studies, it appears that sampling 20 segments per TM image is adequate for this process.

Several approaches have been utilized for locating the secondary samples within the footprint of the TM scenes. One approach is simple random selection of 3 km by 3 km blocks. If the crop of interest represents only a small area of the scene, a random selection of 20 segments on a TM scene could result in sampling more segments without cultivated areas than those with cultivated areas. This can be avoided by employing a stratified random sample.

A grid of 3 km by 3 km segments is generated to cover the entire TM scene. For each potential segment, the land cover categorization from the TM data is used to determine percentage of no data (clouds, shadows, and buffer) for each segment. In addition, by intersecting the segments with the categorized TM data, each segment is assigned the percent-cultivated area contained in the segment. Only segments that have little missing data are then processed.
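The segment characterization described above can be sketched as follows. This is an illustrative Python fragment, not the IMAGINE workflow: the class codes, the `max_no_data` threshold (the text only says "little missing data"), and the tiny example raster are all assumptions. In practice a 3 km segment would span many pixels; a block of 2 pixels is used here only to keep the example readable.

```python
NO_DATA, CROP, OTHER = 0, 1, 2   # hypothetical class codes


def characterize_segments(classified, block, max_no_data=0.10):
    """Tile a classified raster into block x block segments.

    Returns (segment_row, segment_col, pct_no_data, pct_cultivated) for
    every segment whose no-data share does not exceed `max_no_data`.
    Percent cultivated is expressed as a share of the whole segment.
    """
    rows, cols = len(classified), len(classified[0])
    kept = []
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            cells = [classified[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            no_data = cells.count(NO_DATA) / len(cells)
            if no_data > max_no_data:
                continue   # too much cloud, shadow, or buffer
            cultivated = cells.count(CROP) / len(cells)
            kept.append((r0 // block, c0 // block,
                         100.0 * no_data, 100.0 * cultivated))
    return kept


# Hypothetical 4 x 4 categorized TM subset.
classified = [
    [1, 1, 0, 0],
    [1, 2, 0, 2],
    [2, 2, 1, 1],
    [2, 2, 1, 1],
]
segments = characterize_segments(classified, block=2)
for seg in segments:
    print(seg)
```

The surviving segments, each tagged with its percent-cultivated value, form the pool from which a stratified random draw can then be made.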
Twenty-five percent of the segments should be randomly sampled from those segments that are 25 percent or less cultivated. Twenty-five percent of the segments should be randomly sampled from those segments that are 26 to 50 percent cultivated, and fifty percent of the segments should be sampled from the segments that are more than 50 percent cultivated.

ESTIMATION OF CULTIVATED AREA FOR THE SECONDARY SAMPLES

The high-resolution data for the secondary samples, once obtained, must be interpreted for the crop of interest. The cultivated area for each of the selected segments is obtained by total enumeration of the cultivated area in each segment. The outline of the segment location is transferred to the HRI. The analyst then creates a vector coverage for each segment which delineates the area of the crop of interest on the segment. This technique is very accurate, but time consuming.

An alternative to the total enumeration method for determining the percent-cultivated area on each segment is the dot grid analysis approach. This method is especially appropriate if the high-resolution data is available only in hardcopy. A grid of dots with 15 columns by 15 rows is laid over each 3 km by 3 km segment. At the center point of each grid cell, the analyst determines if the center point is the crop of interest, not the crop, or missing data (e.g. cloud, cloud shadow, or data drop). Percent cultivation for each segment is then calculated from the ratio of these counts. The dot grid sampling approach may be nearly as accurate as the total enumeration technique and may be much less expensive to calculate. Either technique is improved in accuracy and reduced in cost if softcopy is available for the high-resolution imagery.

ESTIMATION OF TOTAL CULTIVATED AREA

Estimation of the total cultivated area is done in four steps. A correction factor is calculated for each TM scene that corrects percent-cultivated area.
The correction is applied within scene and the corrected percent-cultivated area for each stratum is determined. A new overall estimate of the percent crop for each stratum is computed by taking an area-weighted average of the contributions from each scene to that stratum. The total cultivated area for each stratum is then determined by multiplying the adjusted percent-cultivated area for the stratum by the area of the stratum and summing across all strata in the study area.

The interpreted results for the HRI segments are pairwise matched with the TM results for the exact same areas to create a within-scene regression equation to correct the percent cultivated area. The TM results become the independent variable, and the HRI results the dependent variable in this process. In this process, the results derived from the TM analysis become a predictor, when used with the regression equation developed from this pairwise analysis, to predict what the results from the more accurate HRI interpretation would be. A sample of the input to this regression process for one scene is shown in Table 2.

    Segment ID    Percent Cultivated Area    Percent Cultivated Area
                  From TM Data               From HRI
    1             28.1                       23.3
    2             22.6                       28.4
    3             80.9                       76.9
    4             93.6                       84.8
    5             88.3                       74.3
    6             37.7                       46.7
    7             64.2                       53.3
    8             33.7                       35
    9             51.2                       48.2
    10            82.4                       88.7
    11            61.7                       64.3
    12            43.5                       38.6

Table 2. Example of Segment Data Used to Correct Percent Cultivated Area as Derived From TM Imagery.

The regression analysis for the above example yields the equation:

    e_{HRI} = 5.815 + 0.8616 \, e_{TM}

where: e_{HRI} is the percent crop corrected with the HRI results, and e_{TM} is the percent crop derived from the TM analysis. The input points and the graph of the equation are shown in Figure 3.

Figure 3. Graph of Regression Equation for Correcting the Percent Crop Area for Primary Sampling (TM Data) Units Based upon the Secondary Samples (HRI).

The regression correction is applied within the same scene it was developed in.
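The within-scene correction can be checked with an ordinary least-squares fit of the Table 2 values. The sketch below uses only the Python standard library; the function names are illustrative, and ordinary least squares is assumed to be the regression method, since the paper does not name it explicitly.

```python
# Table 2: percent cultivated area per segment from TM and from HRI.
tm_pct = [28.1, 22.6, 80.9, 93.6, 88.3, 37.7,
          64.2, 33.7, 51.2, 82.4, 61.7, 43.5]
hri_pct = [23.3, 28.4, 76.9, 84.8, 74.3, 46.7,
           53.3, 35.0, 48.2, 88.7, 64.3, 38.6]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(tm_pct, hri_pct)


def correct_tm_percent(e_tm):
    """Predict the HRI-equivalent percent crop from a TM-derived percent."""
    return intercept + slope * e_tm


print(f"eHRI = {intercept:.3f} + {slope:.4f} * eTM")
print(f"TM 50.0 percent corrects to {correct_tm_percent(50.0):.1f} percent")
```

Fitting the twelve segments this way should reproduce the coefficients quoted above to the printed precision, which suggests the published equation is a plain least-squares line through the Table 2 points.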
The TM-derived percent of each stratum representing the crop of interest is plugged into the regression equation derived from its own scene to arrive at a more precise estimate of the percent crop in the stratum, corrected by the high-resolution analysis. The next step is to regroup these corrected values to create a table showing percentage of cultivated land for each stratum. From this table an area-weighted average of the overall percent of stratum 1 represented by the crop of interest is calculated. Table 3 gives an example of the layout of this table and illustrates the computation for a single stratum.

    Stratum    Scene    Total Area of    Percent of Crop    Percent of Crop    Cultivated Land
    ID         ID       Stratum (ha)     in Stratum         in Stratum         in Stratum from
                        on Scene         from TM            Corrected          TM Data (ha)
                                                            with HRI
    1          1        250,000          55                 50.6               126,500
    1          2        150,000          60                 55.2               82,800
    1          3        50,000           58                 53.4               26,700
    1          N        25,000           53                 48.8               12,200
    1          Total    475,000                             52.3               248,200

Table 3. Example of Percentage of Cultivated Land by Stratum.

In the numeric example shown, the weighted average is 52.3 percent. The same procedure must be applied to all of the strata. The entire purpose of the HRI analysis was to arrive at these refined values for the percent crop represented by each stratum.

The final step in estimating total cultivated area is to multiply the percent cultivated area in each stratum by the total area of the stratum and to sum these areas for all the strata for the final estimate. This expansion is illustrated in Table 4.

    Stratum    Stratum Area    Percent Cultivated    Total Cultivated
               (sq km)         (as adjusted)         Area (sq km)
    1          4,750           52.3                  2,484.25
    2          5,280           20.8                  1,098.24
    3          3,210           12.6                  404.46
    4          6,100           3.5                   213.5
    .
    N          3,200           80.2                  2,566.4
    Total      22,540                                6,766.85

Table 4. Example of Table Used to Calculate Total Cultivated Area.

VALIDATION OF TOTAL CULTIVATED AREA

Examining corroborative data from other sources and calculation of sampling error and associated confidence interval are means of evaluating cultivated area estimations.
Although corroborative data for estimations of cultivated area are often outdated or unreliable, these data can be used to gain confidence that no major or systematic error was introduced in the process.

In assessing cultivated area, three sources of variance must be considered: variance due to sample size, variance due to stratum stationarity, and variance due to labeling error (classification errors). Care must be taken to avoid any potential bias in the sample selection process, since this will not be detected in the variance estimation procedures.

As has been earlier stated, the sample size for these studies is so large when treating the farmer’s field as an independent sample, that the variance due to sample size can be considered to be negligible. One issue relating to sample size is the question of whether each stratum is adequately sampled by the primary sample set. A graph of sampling density by stratum can be produced to determine this. An example of such a graph is shown in Figure 4. Any stratum that is not well sampled by primary sampling units (TM data) should be combined with the most similar stratum (based on greenness response). As a general rule, all strata should have a two-percent or larger sampling by the primary sampling units. This analysis can be done as soon as the strata and primary sampling units have been defined.

Figure 4. Sample Density by Stratum.

Stationarity analysis is more complex. After the TM analysis is complete, the stationarity can be assessed by laying a grid over the entire study area and determining the density of each grid cell with respect to the crop of interest. The mean and variance of the percent-cultivated land for each stratum in each grid cell can be calculated. One can also re-compute the overall estimator by leaving out a primary sample, computing the overall estimator without that sample and repeating these processes N-1 times, where N represents the number of primary samples (TM scenes) processed.
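The leave-one-out recomputation can be sketched as below. This is a simplified Python illustration, not the production procedure: the estimator is assumed to be the area-weighted percent per stratum followed by direct expansion, the data structure is hypothetical, and the example uses only the stratum 1 values from Table 3 (475,000 ha being the same stratum listed as 4,750 sq km in Table 4).

```python
def total_cultivated(scenes, stratum_area):
    """Area-weighted percent per stratum, expanded to the full stratum area.

    scenes: list of dicts mapping stratum id -> (area on scene, percent crop)
    stratum_area: dict mapping stratum id -> total stratum area
    """
    total = 0.0
    for stratum, area in stratum_area.items():
        obs = [scene[stratum] for scene in scenes if stratum in scene]
        sampled = sum(a for a, _ in obs)
        if sampled == 0:
            continue   # ASSUMPTION: an unsampled stratum contributes nothing
        pct = sum(a * p for a, p in obs) / sampled
        total += area * pct / 100.0
    return total


def leave_one_out(scenes, stratum_area):
    """Recompute the estimator once with each primary sample withheld."""
    return [
        total_cultivated(scenes[:i] + scenes[i + 1:], stratum_area)
        for i in range(len(scenes))
    ]


# Stratum 1 from Table 3: (area on scene in ha, HRI-corrected percent crop).
scenes = [
    {1: (250_000.0, 50.6)},
    {1: (150_000.0, 55.2)},
    {1: (50_000.0, 53.4)},
    {1: (25_000.0, 48.8)},
]
stratum_area = {1: 475_000.0}

full_estimate = total_cultivated(scenes, stratum_area)
jackknife = leave_one_out(scenes, stratum_area)
print(f"all scenes: {full_estimate:,.0f} ha")
for i, value in enumerate(jackknife, start=1):
    print(f"without scene {i}: {value:,.0f} ha")
```

A wide spread among the leave-one-out values would suggest that the estimate for that stratum depends heavily on individual scenes, which is one symptom of poor stationarity.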
The variance due to classification error must be estimated by a more roundabout method. By measuring the overall variance of the process (see below) and subtracting the variance due to stationarity (assuming the variance due to sample size is zero), the variance due to labeling can be estimated. The difference should be an estimate of the variance due to classification inconsistency between the HRI and TM analyses (primary and secondary sampling units).

We have chosen a non-parametric approach to calculating a confidence interval for the estimate. In part this is because the NAFS approach frequently incorporates estimates based upon a series of independent regression equations and other estimators; consequently, a formula for the confidence interval of the overall estimator cannot be derived in a straightforward manner. A resampling approach is used because it makes fewer assumptions about the character of the data being measured and is computationally simple, although a very large number of computations are required.

The process starts with the set of N samples, typically the 3 km by 3 km blocks that were interpreted with HRI data and that formed the basis for the regression analyses. The method involves drawing from this set a very large number (usually 10,000) of subsamples, each of N samples, sampled with replacement so that any one sample may be repeated multiple times in a subsample. Each of these subsamples receives exactly the same processing sequence that was used to produce the final estimate. By this means, 10,000 new final estimates are produced, each somewhat different from the others because of the sampling with replacement. Ordering these estimates from smallest to largest gives a measure of confidence in the result. The 5,000th value in the sequence should approximate the final estimate in magnitude.
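The resampling procedure can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the block values are invented for illustration, the estimator is a simple mean standing in for the full processing sequence (regression, correction, and expansion), and the names `bootstrap_estimates` and `rank_interval` are hypothetical.

```python
import random
from statistics import mean


def bootstrap_estimates(samples, estimator, n_boot=10_000, seed=0):
    """Draw n_boot subsamples of size N with replacement and return the
    estimator values sorted from smallest to largest."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_boot):
        subsample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(subsample))
    estimates.sort()
    return estimates


def rank_interval(sorted_estimates, level):
    """Pick the two ordered values that bracket the central `level` share,
    using 1-based ranks (for example ranks 500 and 9,500 of 10,000 at 0.90)."""
    n = len(sorted_estimates)
    tail = (1.0 - level) / 2.0
    lower_rank = max(int(round(tail * n)), 1)
    upper_rank = min(int(round((1.0 - tail) * n)), n)
    return sorted_estimates[lower_rank - 1], sorted_estimates[upper_rank - 1]


# Hypothetical HRI-interpreted percent crop for eight 3 km by 3 km blocks.
block_percent_crop = [48.0, 55.5, 51.2, 60.1, 44.7, 53.3, 57.9, 49.6]

estimates = bootstrap_estimates(block_percent_crop, mean)
median_estimate = estimates[len(estimates) // 2 - 1]  # the 5,000th value
lower90, upper90 = rank_interval(estimates, 0.90)
lower95, upper95 = rank_interval(estimates, 0.95)

print(f"5,000th value: {median_estimate:.2f}")
print(f"90% interval: {lower90:.2f} to {upper90:.2f}")
print(f"95% interval: {lower95:.2f} to {upper95:.2f}")
```

In a real application the `estimator` argument would be replaced by a function that reruns the whole estimation chain on each subsample, which is where most of the computational cost arises.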
Selecting the 500th and 9,500th values provides the lower and upper bounds of the confidence interval at the 90% level. Similarly, selecting the 250th and 9,750th values in the sequence provides the lower and upper bounds at the 95% level. This approach is illustrated in Figure 5. In this illustration the final estimate is 420,000 ha, and the 90% confidence bounds are 360,000 ha and 480,000 ha.

Figure 5. Illustration of Non-Parametric Confidence Interval Calculation.

Currently, the NAFS approach does not have a way to ascertain the optimum number of primary samples (TM scenes) required to achieve a desired accuracy. Selecting too few samples per primary will result in a confidence interval wider than desired. Selecting too many samples will yield a confidence interval narrower than the user requires, but at higher cost. We envision a process in which a few samples are obtained and the variance of these samples is determined. Based upon the variance of this small number of samples, the number of samples needed to obtain the desired confidence interval will then be determined.

The NAFS process has proven very effective for estimating the area of crops cultivated on a regional scale. The process is unquestionably complex and difficult to perform. The procedures continue to be refined to identify the pitfalls and the places where improvements can be made. An effort is underway to develop software tools that will reduce the complexity of parts of the process, and thereby minimize the potential for errors in performing the procedures. These software tools are being developed within ERDAS Imagine through a cooperative effort between ERDAS and Earth Satellite Corporation. Standard Operating Procedures are being produced to make the approach more widely usable and to facilitate understanding of the process.
