ACCURACY ASSESSMENT OF A VEGETATION MAP OF NORTHEASTERN CALIFORNIA USING PERMANENT PLOTS AND FUZZY SETS

1998

AUTHORS

Jeff Milliken
Remote Sensing/GIS Specialist, USDI Bureau of Reclamation
Sacramento, California

Debby Beardsley
Research Forester, PNW Research, US Forest Service
Portland, Oregon

Samantha Gill
Assistant Professor, California Polytechnic University
San Luis Obispo, California

ABSTRACT

The accuracy of a northeastern California vegetation map was assessed using the data from a grid of permanent Forest Inventory and Analysis (FIA) plots collected by Region 5 and the Pacific Northwest Research Station (PNW). The map was assessed in three parts: the Modoc National Forest, the Lassen National Forest and the lands outside National Forest boundaries. Accuracy was assessed hierarchically, resulting in separate assessments for vegetation growth form (lifeform) and species association (CALVEG) within lifeform. A fuzzy logic approach was employed. Fuzzy sets allowed for the recognition that plots did not always fit unambiguously into a single map class. For each plot, all possible map classes were given a rating between absolutely wrong (1) and absolutely right (5). The accuracy of the map for lifeform was high. On average 82% of the sites had the best possible label, and 89% of the sites had labels that would be considered ‘right’. For many of the map classes, the accuracy was greater than 75%. The grid plot design undersampled some of the mapped classes but was a cost-effective way to generate an accuracy assessment based on a probability sample.

www.fs.fed.us/r5/rsl/publications/

INTRODUCTION

The Lassen-Modoc project was a USDA Forest Service Region 5 and California Department of Forestry and Fire Protection cooperative vegetation mapping program covering 9 million acres of the northeastern portion of California (fig. 1). Vegetation maps were produced using remote-sensed processing and GIS modeling techniques (Miller et al., 1994).
Image data for this project was 1991 Landsat Thematic Mapper. The classification was completed in the fall of 1995. For each polygon (minimum mapping unit of 1 hectare), a lifeform type and CALVEG (Classification and Assessment with Landsat of Visible Ecological Groupings) (USDA, 1981) type were mapped. In addition, size and density were mapped for forest CALVEG types. The map was assessed in three parts: the Modoc National Forest, the Lassen National Forest and the lands outside National Forest boundaries. The purpose of this paper is to report the accuracy assessment results for lifeform and CALVEG classes. Readers may contact Ralph Warbington at the Region 5 Remote Sensing Laboratory in Sacramento, California, for the detailed accuracy assessment of all the mapped classes.

Figure 1. Lassen-Modoc Project Area.

The most common approach to collecting ground truth information for assessing the accuracy of a map is to visit a site in the field corresponding to a polygon on the map and to classify it based on the classification scheme used in the mapping project. This method is often very time consuming and expensive due to the number of samples needed to perform a valid accuracy assessment and the field costs associated with visiting each site. Therefore, many maps have no accuracy assessment information.

The approach in this study was to use permanent USFS Forest Inventory and Analysis (FIA) (USDA, 1992; USDA, 1995) field plots as ground truth sites. These plots were already established. Thus, the costs associated with collecting accuracy assessment data were significantly reduced.

A modified fuzzy logic accuracy assessment approach based on Gopal and Woodcock (1994) was used in this project. The concept of a fuzzy set was introduced by Zadeh (1965, 1973) to describe imprecision that is characteristic of much of human reasoning. With fuzzy sets, there are different grades of membership within a class.
In the case of a vegetation map, one label may be absolutely correct, but other labels may be considered good or acceptable. For example, for a given site (in this case an inventory plot within a map polygon) a map label of red fir may be considered absolutely correct, but a map label of subalpine conifer might still be considered acceptable. Using the traditional error matrix, only one possible answer (considered to be the best answer by an 'expert' in the field) is compared to the map label. Fuzzy set theory allows the user and producer to look at ranges of acceptable answers.

METHODS

Accuracy site data collection

The accuracy site data for the accuracy assessment were permanent USFS FIA field plots installed on a 3.4-mile grid across California. These plots had been measured independently from the vegetation mapping project in order to provide current estimates of forest land area, timber volume, net annual growth, mortality, and harvest in California. Plot installation on the Modoc and Lassen National Forests was administered by the USFS Region 5 inventory staff between 1993 and 1994. Plots on lands outside National Forest boundaries were measured by the Pacific Northwest Research Station of the USFS (PNW) in 1992. The inventory grid provided 312 accuracy sites on the Modoc National Forest, 291 sites on the Lassen National Forest and 701 sites on lands outside National Forests.

The National Forest plots were a cluster of 5 points spanning 2.5 acres. At each point, species, diameter and height were recorded for live and dead trees. Percent cover of all understory species was also recorded. In addition, at each point, the inventory crew assigned a best and second best lifeform class and CALVEG type. On several points, the inventory crew assigned only a best lifeform or CALVEG type because, in their opinion, there was only one correct answer (USDA, 1995).
The crew had no knowledge of the map labels when making these evaluations.

The plots installed by PNW were a cluster of 5 points over 6 acres (USDA, 1992). For the 701 sites, crews classified each point of each plot as conifer, hardwood, rangeland (a combination of shrub and/or herb), or non-vegetated. The lifeform classification of 70% of these sites was done with photo interpretation. For 30% of the sites the classification was made on the ground. All 701 sites were used for the lifeform assessment of lands outside National Forests. For the 215 sites visited on the ground, crews collected species, diameter and height on live and dead trees and percent cover on shrub, grass and herb species. The sites visited on the ground were used for the CALVEG assessment of lands outside National Forests. CALVEG was not a field-collected item for the PNW crews. Therefore, a CALVEG type was assigned to each point by using summaries of the field-collected data (percent cover by species by vegetation layer), the field plot descriptions, and a CALVEG key.

Assigning fuzzy ratings for each possible map label

The accuracy assessment was based on comparing the map label of each sample site with evaluations based on ground data. For each site, a rating was given for all possible lifeform and CALVEG labels of the map without knowledge of the actual map label at the site. The rating scheme used in this study was:

5: absolutely right. If this were the map label it would be a perfect match.
4: good. Would be happy to find this label on the map.
3: acceptable. Maybe not the best possible map label but it is acceptable.
2: understandable but wrong. Not an acceptable map label. There is something about the site that makes the label understandable but there is clearly a better one.
1: absolutely wrong. The label is absolutely unacceptable.
Ratings for each possible label were derived from the inventory crew’s evaluations as well as knowledge of vegetation gradients within lifeform and CALVEG classes. The following procedure was used to assign a ‘fuzzy’ rating to each possible lifeform and CALVEG class for each site. First, a score of two was given to a class the field crew considered the best label for the point and a score of one was given to a class the crew considered second best. These scores were summed over all the points in the cluster plot and divided by the maximum possible score (2 * the number of points in the cluster) to obtain a normalized score. Fuzzy ratings were then assigned to the normalized scores as follows:

normalized score    fuzzy rating
>0.9                5 (absolutely right)
0.6-0.9             4
0.4-0.6             3
0.2-0.4             2
<0.2                1 (absolutely wrong)

Using this approach, a class that the field crew rated best at every point in a plot was assured a fuzzy rating of 5 (absolutely right), and a class that was not assigned to any point in a plot was given a fuzzy rating of 1 (absolutely wrong). Second, because the crews only indicated best and second best lifeform and CALVEG classes, the ratings of some of the classes were increased based on expert knowledge of which possible map labels would be acceptable given the map label that received the highest score.

RESULTS AND DISCUSSION

Lifeform Accuracy

Lifeform was the first level of the accuracy assessment to be evaluated (table 1). The fuzzy logic approach provided two measures of accuracy: the MAX operator and the RIGHT operator. The MAX operator was the more conservative measure of accuracy. This operator measured how frequently the map label was the best choice for the site. The RIGHT operator accepted matches using any degree of right, which in this assessment was any rating greater than or equal to 3 (acceptable, good, or absolutely right). In other words, the RIGHT operator measured how frequently the map label was an acceptable choice for the site.
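
The scoring step and the two operators described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names and the input layout (one `(best, second_best)` pair per cluster point) are hypothetical, and the handling of scores that fall exactly on a bin boundary (0.2, 0.4, 0.6) is an assumption, since the published table does not say which bin those values belong to. The second, expert-knowledge step that raised some ratings is not modeled, and ties for the highest rating are assumed to count as a MAX match.

```python
from fractions import Fraction


def normalized_score(points, label):
    """Sum 2 for each point where `label` is the crew's best call and 1 where
    it is second best, then divide by the maximum possible score."""
    total = 0
    for best, second_best in points:
        if label == best:
            total += 2
        elif label == second_best:
            total += 1
    return Fraction(total, 2 * len(points))


def fuzzy_rating(score):
    """Map a normalized score to the 1-5 rating scale.
    Assumption: boundary values go to the higher bin, except 0.9,
    which the table places below the '>0.9' bin."""
    if score > Fraction(9, 10):
        return 5
    if score >= Fraction(6, 10):
        return 4
    if score >= Fraction(4, 10):
        return 3
    if score >= Fraction(2, 10):
        return 2
    return 1


def rate_site(points, classes):
    """Fuzzy rating of every candidate map class for one site."""
    return {c: fuzzy_rating(normalized_score(points, c)) for c in classes}


def max_match(ratings, map_label):
    """MAX operator: the map label has the highest rating at the site."""
    return ratings[map_label] == max(ratings.values())


def right_match(ratings, map_label):
    """RIGHT operator: the map label is rated acceptable or better (>= 3)."""
    return ratings[map_label] >= 3


# Hypothetical 5-point plot: conifer is best and shrub second best everywhere.
plot = [("conifer", "shrub")] * 5
ratings = rate_site(plot, ["conifer", "shrub", "herbaceous"])
```

For this hypothetical plot, conifer scores 10/10 and shrub 5/10, so a map label of shrub would fail the MAX operator but pass the RIGHT operator, while a label of herbaceous would fail both.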
Using the MAX operator, the overall lifeform accuracy of the map was between 77% and 88%. Using the RIGHT operator, the overall lifeform accuracy of the map increased to between 84% and 96%. The accuracy of the map for lifeform was also weighted by the area of each class in the map. Of the classes with an adequate sample, the least accurate was the shrub class.

The matrices below (tables 2-4) show between which classes confusion occurred. For some classes, there are more errors than sites because, at some sites, more than one class had a higher rating than the map label. These matrices identified the number of times classes received a rating greater than the map label. Columns show errors of omission and rows show errors of commission. An error of omission means an area of a ‘known’ class has been omitted from the map. An error of commission means a particular mapped class includes areas that are better labeled as other classes.

In the shrub class on the Modoc National Forest, there were many more errors of commission than omission, meaning the shrub class was probably overmapped. Because most of these errors of commission occurred with the conifer class, the conifer class was probably undermapped and misidentified as shrub. The classification system required that 10% conifer cover would be mapped as conifer. However, there were typically areas of sparse conifer cover that had extensive shrub understories and spectrally ‘looked’ more like a shrub lifeform. The increase in accuracy in the shrub class using the RIGHT operator indicated that the confusion between the conifer and shrub classes was due to these sparse conifer stands, as ‘fuzzy’ ratings accounted for sparse conifer stands.

The majority of the lifeform confusion on the Lassen National Forest portion of the map was between the conifer and the shrub classes.
The matrix below (table 3) would suggest that the conifer class was somewhat overmapped rather than undermapped as on the Modoc National Forest part of the map. However, because most of the errors of commission were with the shrub class and most of the errors of omission were also with the shrub class, it was difficult to predict a trend in the error between conifer and shrub. On this portion of the map, the shrub class was the least accurate (table 1) and the error was with all other lifeforms (table 3). Confusion between shrub, herbaceous, and nonvegetated classes was probably due to spectral similarity between desert-type shrub communities and dry grass or barren ground.

The overall lifeform accuracy of the map was somewhat less for areas outside National Forests. The combined shrub/herb class was the least accurate, probably because this class was an aggregated class consisting of chaparral, herbaceous and shrub types. This class was overmapped and the conifer and hardwood types were somewhat undermapped. A review of the site data for this portion of the map suggested that the confusion between the conifer and shrub/herb classes occurred within western juniper stands and the confusion between the hardwood and shrub/herb classes primarily occurred within blue oak stands. This confusion is understandable considering that both western juniper and blue oak communities often have widely spaced trees with shrub/herb understories.

CALVEG accuracy

There was an adequate sample to assess the accuracy of the conifer CALVEG classes on all three portions of the map, and the shrub CALVEG classes on the Modoc National Forest.

Conifer CALVEG accuracy

Overall accuracy of the conifer CALVEG map labels was greater than 75% using the RIGHT operator. On the Modoc National Forest portion of the map, three classes that comprised 76% of the conifer area (white fir, western juniper, and eastside pine) were highly accurate.
The most troublesome class on the Modoc National Forest was the mixed-conifer fir class, where the map label was the best choice for the site only 3% of the time. There was also low accuracy in the red fir class. In the mixed-conifer fir class most of the confusion was with eastside pine and white fir (table 6). Mixed-conifer fir accuracy increased significantly when using the RIGHT operator. In the Warner Mountains, mixed-conifer fir is a ‘transitional’ type in elevation between eastside pine and white fir. Thus, mixed-conifer fir would be considered an acceptable class for some eastside pine and white fir sites although not the best answer. The confusion matrix (table 6) showed that most of the errors in the mixed-conifer fir class were errors of commission, indicating that too much mixed-conifer fir was mapped. The majority of confusion in the red fir class was with lodgepole pine and whitebark pine. The error is understandable given that red fir is a major associate of lodgepole pine in the Medicine Lakes area and a major associate of whitebark pine in alpine areas.

On the Lassen National Forest portion of the map, accuracy was low for the mixed-conifer types when using the MAX operator. However, accuracy increased dramatically using the RIGHT operator. This increase was seen on the Modoc National Forest portion of the map as well, and is probably indicative of mixed classes in general. The greatest amount of confusion in the mixed-conifer fir class was with the white fir class (table 7). Users of the map can expect to find areas of the map labeled as mixed-conifer fir that are actually better labeled as white fir. Similarly, the mixed-conifer pine class appeared to be overmapped, as there were more errors of commission than omission in this class. Users can expect to find areas of the map labeled mixed-conifer pine that would be better labeled eastside pine or ponderosa pine.
Using the MAX operator, the accuracy of conifer CALVEG map labels for areas outside National Forests was low (35%) but increased to 78% using the RIGHT operator (table 1). As only 4 of the 12 mapped CALVEG classes were adequately sampled for accuracy assessment, the overall conifer CALVEG accuracy figures for this portion of the map could be misleading. The mixed-conifer pine class may be overmapped and on a number of sites better labeled as ponderosa pine (table 8). The ponderosa pine class showed confusion with a number of the CALVEG classes (table 8).

Shrub CALVEG accuracy

The overall accuracy for the shrub classes on the Modoc National Forest using the MAX operator was 45%, but increased to 91% using the RIGHT operator (table 9). Most of the confusion was between basin sagebrush, low sagebrush and bitterbrush (table 10), the three classes with a sufficient sample for accuracy assessment. The magnitude of error between these classes was not large, which is why the accuracy improved to 91% using the RIGHT operator. This is not surprising considering that the CALVEG description for basin sagebrush lists low sagebrush as a likely associate, and bitterbrush is associated with both the basin sagebrush and the low sagebrush classes. As seen in table 10, users of these maps may expect basin sagebrush to be ‘overmapped’ in areas of low sagebrush. In addition, bitterbrush and low sagebrush are likely to be overmapped in areas of basin sagebrush.

CONCLUSIONS

Overall accuracy at the lifeform level was high for all areas assessed. Using the MAX operator, lifeform accuracy ranged from 77% for areas outside National Forests to over 85% for both the Modoc and Lassen National Forests. When the RIGHT operator was used, the accuracy increased to 84% for areas outside National Forests and to greater than 90% for areas within the two National Forests.
Accuracy for the CALVEG types that were adequately sampled was not as high as for lifeform, but was generally greater than 75% using the RIGHT operator. Because lifeform accuracy was high, any aggregation of CALVEG types to more general categories (e.g., Wildlife Habitat Relationship types or Society of American Foresters types) would typically result in greater accuracy.

A number of classes in the map were undersampled and results for these classes should be used with caution. However, as a function of the FIA grid inventory design, classes that cover most of the map were adequately sampled. Additional accuracy assessment sites are recommended for undersampled classes.

Using inventory data to assess the accuracy of a vegetation map is a unique and promising approach. Because assessing the accuracy of such a large area is typically very expensive (the value of the data set in this assessment is over $690,000), using inventory data can provide a cost-effective way of assessing the accuracy of vegetation maps. The initial set of data could be supplemented by techniques such as post-stratification, cluster sampling, double sampling, or regression estimators similar to those suggested by Stehman (1996). In this way, information needed to adequately assess the accuracy of vegetation maps can be incorporated into standard forest inventory designs.

LITERATURE CITED

Gopal, S. and C.E. Woodcock. (1994). Theory and Methods for Accuracy Assessment of Thematic Maps Using Fuzzy Sets, Photogrammetric Engineering and Remote Sensing, 60(2): 181-188.

Miller, S., H. Eng, M. Byrne, J. Milliken, M. Rosenberg. (1994). Northeastern California Vegetation Mapping: A Joint Agency Effort, Remote Sensing and Ecosystem Management: Proceedings of the Fifth Forest Service Remote Sensing Applications Conference, April 11-15, 1994, pp. 115-125. ASPRS, Bethesda, Maryland.

Stehman, S.V. (1996).
Cost-effective, Practical Sampling Strategies for Accuracy Assessment of Large-Area Thematic Maps, Spatial Accuracy Assessment in Natural Resources and Environmental Sciences: Second International Symposium, U.S.D.A. Forest Service, Rocky Mountain Forest and Range Experiment Station, Fort Collins, CO, General Technical Report RM-GTR-277, pp. 485-492.

USDA, U.S. Forest Service - Regional Ecology Group. (1981). CALVEG: A Classification of California Vegetation, San Francisco, CA. 168p.

USDA, U.S. Forest Service - PNW Research. (1992). Field Manual for California Inventory, Portland, OR.

USDA, U.S. Forest Service - Region 5. (1995). Forest Inventory and Analysis User's Guide, San Francisco, CA.

Zadeh, L. (1965). Fuzzy Sets, Information and Control, 8:338-353.

Zadeh, L. (1973). Outline of a New Approach to the Analysis of Complex or Imprecise Concepts. IEEE Transactions: Systems, Man, and Cybernetics, SMC 3:28-44.

ACKNOWLEDGMENTS

Thanks to the California Department of Forestry and Fire Protection for joint funding of this effort.