CHAPTER 7

Quality in the GLIMS Glacier Database

Bruce H. Raup, Siri Jodha S. Khalsa, Richard L. Armstrong, William A. Sneed, Gordon S. Hamilton, Frank Paul, Fiona Cawkwell, Matthew J. Beedle, Brian P. Menounos, Roger D. Wheate, Helmut Rott, Liu Shiyin, Li Xin, Shangguan Donghui, Cheng Guodong, Jeffrey S. Kargel, Chris F. Larsen, Bruce F. Molnia, Joni L. Kincaid, Andrew Klein, and Vladimir Konovalov

ABSTRACT

Global Land Ice Measurements from Space (GLIMS) is an international initiative to map the world’s glaciers and to build a geospatial database of glacier vector outlines that is usable via the World Wide Web. The GLIMS initiative includes glaciologists at 82 institutions, organized into 27 Regional Centers (RCs), who analyze satellite imagery to map glaciers in their regions of expertise. The results are collected at the U.S. National Snow and Ice Data Center (NSIDC) and ingested into the GLIMS Glacier Database. A concern for users of the database is data quality. The process of classifying multispectral satellite data to extract vector outlines of glaciers has been automated to some degree, but there remain stages requiring human interpretation. To quantify the repeatability and precision of data provided by different RCs, we designed a method of comparative image analysis whereby analysts at the RCs and NSIDC could derive glacier outlines from the same set of images, chosen to contain a variety of glacier types. We carried out four such experiments. The results were compiled, compared, and analyzed to quantify inter-RC analysis consistency. These comparisons have improved RC ability to produce consistent data, and in addition show that in the lower reaches of a glacier, precision of glacier outlines is typically 3 to 4 pixels. Variability in the accumulation area and over parts of the glacier that are debris covered tends to be higher. The ingest process includes quality control steps that must be passed before data are accepted into the database.
These steps ensure that ingested data are well georeferenced and internally consistent. The GLACE experiments and ingest-time quality control steps have led to improved quality and consistency of GLIMS data. This chapter presents the GLACE experiments and the quality control steps incorporated in the data ingest process. More recent similar studies are referenced.

7.1 INTRODUCTION

GLIMS is the first attempt to build a globally complete, high-resolution map of glacier extents; currently there are complete regional glacier inventories and incomplete global inventories. The GLIMS Glacier Database has begun to allow new scientific questions to be addressed, such as global statistics of glacier area and area-elevation distribution, global trends in glacier area change and mass change, and regional variability in rates of change. The GLIMS Glacier Database contains not just point locations for glaciers, as in the World Glacier Inventory (WGI), but also glacier outlines as closed polygons, which record where the glacier boundaries were at a specific time. Also recorded for many glaciers are extents of supraglacial debris and lakes, proglacial lakes, snow lines, and approximate center flow lines, as well as nonspatial data such as glacier name, source imagery and maps, and analyst details.

As of early 2014, the number of glacier outlines in the GLIMS Glacier Database was 122,414, representing approximately 70% of the estimated total of Earth’s glaciers. The total area covered in the database is 520,000 km², also about 70% of the estimated total area. For such a global database to be useful and trustworthy to users, close attention must be paid to data quality and consistency. Glacier outlines need to have good consistency between regions and over time in order for scientific questions to be addressed.
The design of the database itself imposes a consistent set of parameters and one data model on GLIMS analysts, but despite large gains in the degree of automation of glacier classification in satellite imagery, automated algorithms must be tailored to the particular characteristics of glaciers from region to region, and human judgment and subjectivity remain necessary ingredients of the mapping process. The calculation of area changes introduces additional pitfalls. Factors that affect the quality of glacier outlines derived from satellite imagery include image georeferencing; variations of seasonal snow cover; debris cover on glaciers; the working definition of “glacier” as an entity that may be connected to other ice bodies; and difficulties in defining ice flow divides. Differing interpretations of snowfields in the accumulation area or of debris-covered ice in the ablation region can greatly affect the calculated area for a glacier, possibly leading to erroneous climatic interpretation.

This chapter presents the ways in which the GLIMS core developers and RCs have addressed methodological challenges encountered in spaceborne glacier mapping. These steps include the development of standard methods for mapping land ice from satellite imagery; the development of standard tools, such as GLIMSView, for glacier mapping and packaging of the resulting data; glacier analysis comparison experiments (GLACE), in which mapping results from multiple analysts are compared; the design of the GLIMS Glacier Database; and the quality control steps in the data ingest process.

7.2 STANDARD METHODS AND TOOLS

Different people have different ideas about how to represent glacier boundaries digitally. For example, imagine a glacier with a rock outcrop in the middle of it (a nunatak). The glacier outline is typically represented by a polygon (sequence of vertices in longitude/latitude or projected coordinates), but the choice for representation of the outcrop is less obvious. Geographic Information System (GIS) tools allow polygons to have “holes”, and this method is a frequent choice for representing nunataks. Holes are integral to the polygon, however, and must therefore share attributes with that polygon. Within GLIMS it was decided to allow for the possibility that nunataks would have a separate set of attributes, and so they are represented by separate polygons instead of holes in the glacier outline polygon. This one example illustrates the need for standard ways of representing glacier entities within GLIMS. The GLIMS Analysis Tutorial (http://glims.org/MapsAndDocs/guides.html) documents the GLIMS approach to modeling glacier entities.

Additionally, it is important to have standard formats for transferring glacier-mapping data from the analyst to the GLIMS Glacier Database. The GLIMS Core Technical Group defined a standard GLIMS data transfer format, which is documented at http://glims.org/MapsAndDocs/datatransfer/data_transfer_specification.html.

A software tool called GLIMSView was created in order to make it easier for GLIMS RCs to produce glacier-mapping data in the correct data model and to package these data in the GLIMS data transfer format. It supports manual digitization of glacier boundaries from satellite imagery, and exports the outlines and all attributes (e.g., name of analyst, Regional Center information, and physical parameters such as glacier area) in the GLIMS data transfer format. It can also import already existing glacier outlines, and therefore can be used as a packaging tool for glacier outlines to prepare them for ingest into the GLIMS Glacier Database. It has been used for both purposes by a number of RCs. GLIMSView is free, downloadable, open-source software that runs on Linux and Windows. Development ceased in 2009, and continued development is contingent on new funding for that purpose. Similar functionality could be built in the form of plug-ins for GIS software such as QGIS, GRASS, or ArcGIS.

7.3 ACCURACY AND PRECISION IN GLACIER MAPPING

Given the distributed nature of glacier-mapping efforts in GLIMS, it was recognized early on that the differences in mapping results (from different algorithms and analysts) needed to be quantified. Several experiments have been done, conducted either by individual Regional Centers or set up by the Core GLIMS Team, to compare results under controlled conditions. These experiments have focused on analytical variations and all sources of error arising from applying different image classification algorithms, manual image interpretation, and the complete end-to-end effect of the mapping effort.

To evaluate the repeatability of manual digitization, GLIMS participants in Sweden had nine operators outline distinct lake shorelines in a high-resolution aerial photograph, and found that relative uncertainty in the resulting outlines was 2.5 pixels, though this could be improved by applying binary-encoded transects perpendicular to the lake boundaries (Sannel and Brown 2010). Similarly, Sneed (2007) describes a test whereby the terminus of a glacier in Svalbard was digitized five times independently, and the results were compared. For a glacier of area 1.242 km², variations in digitization of the terminal boundary were found to result in an area uncertainty of approximately 1.7%. A part of the set of outlines is shown in Fig. 7.1.

Figure 7.1. Five manual digitization trials described in Sneed (2007), performed separately from the GLACE experiments. Five independent digitizations of a glacier boundary are plotted over the source image.
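A repeatability test of this kind reduces to computing the area of each digitized outline and comparing the spread. The following is a minimal sketch, not the procedure used by Sneed (2007) or by GLIMS: it assumes outlines are given as lists of (x, y) vertices in projected meters, computes planar area with the shoelace formula, and reports the standard deviation of the areas relative to their mean. The function names and the five rectangular "trials" are hypothetical and purely illustrative.

```python
from statistics import mean, stdev


def polygon_area(vertices):
    """Planar area of a simple polygon (shoelace formula).

    vertices: list of (x, y) tuples in projected units (e.g., meters);
    the ring is closed implicitly (last vertex connects to the first).
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def relative_area_spread(outlines):
    """Sample standard deviation of outline areas divided by their mean."""
    areas = [polygon_area(o) for o in outlines]
    return stdev(areas) / mean(areas)


# Hypothetical data: five digitizations of a 1000 m wide glacier tongue
# that differ only in where the analyst placed the terminus.
terminus_positions_m = [1240.0, 1250.0, 1230.0, 1260.0, 1220.0]
trials = [[(0.0, 0.0), (1000.0, 0.0), (1000.0, y), (0.0, y)]
          for y in terminus_positions_m]

spread = relative_area_spread(trials)
print(f"relative area spread: {100.0 * spread:.2f}%")
```

Areas computed this way are only meaningful in a projected coordinate system; outlines stored in longitude/latitude would need to be projected first.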
Paul (2007) tested the repeatability of manual digitization by one person, and also by two people, and found that relative error in resulting glacier area exceeds 10% when the glacier area is 0.1 km² or smaller. For larger glaciers, relative error was 5% or less.

Suites of automated methods used for the initial mapping of glacier outlines have been analyzed in several previous studies (Albert 2002, Paul et al. 2002, Paul and Kääb 2005, Racoviteanu et al. 2009) and generally show only marginal differences among the applied methods. Many common image classification algorithms perform well for clean glaciers (glaciers lacking rock-debris cover), and most of them perform poorly when glaciers are debris covered.

Some additional characterizations and assessments of error are given in many chapters in this book. We draw special attention to the treatments provided by Ramachandran et al. (Section 6.3 on sensor calibration and ASTER image geometric corrections and errors, and Section 6.4.2.3 on detection versus full resolvability of features); Demuth et al. (Online Supplement 16.3 on error of digitized glacier boundaries); and Krumwiede et al. (Section 22.4.6 on error of digitized glacier boundaries and areas). Certainly as GLIMS and other glacier analysis initiatives move toward change assessments and other derivatives, the origination and propagation of all significant errors must be tracked with ever greater care.

7.4 GLACIER ANALYSIS COMPARISON EXPERIMENTS (GLACE)

The GLIMS Core Team decided to implement a series of glacier analysis comparison experiments (GLACE, pronounced the same as “glass”) to quantify uncertainty in glacier mapping from satellite imagery. Four GLACE experiments have been carried out to date: GLACE 1, GLACE 2, GLACE 2A, and GLACE 3A. GLACE 1 and GLACE 2 focused on automated methods for glacier mapping from imagery, and participants were allowed to use the software tools and algorithms of their choice.
GLACE 2A and GLACE 3A evaluated only manual digitization of the glacier boundaries.

7.4.1 GLACE 1 and GLACE 2

GLACE 1 and GLACE 2 allowed the participants to use the tools and algorithms they planned to use operationally in GLIMS. The goal was to assess the precision and repeatability (variability) in the resulting data under realistic conditions. These experiments were not intended to assess the absolute accuracy of mapping results. Hence, mapping results were compared with each other, but not with any independent and validated source of glacier boundary information.

In both GLACE 1 and GLACE 2, images were chosen to contain a variety of glacier types, and various types of boundaries: ice–rock, ice–vegetation, and ice–ice (Table 7.1). A digital elevation model (DEM) was made available as ancillary data, to be used to aid interpretation of optical imagery. However, because we knew that some participants had the facility to orthorectify and terrain-correct imagery and others did not, we chose to prohibit orthorectification for the purposes of these experiments, so that the results would all be comparable. The participants used a variety of methods, ranging from manual digitization to fully automated techniques (Table 7.2). In GLACE 1, participants were requested to digitize the boundary of one small glacier manually.

GLACE 1 was conducted in 2004 and results were reported at the August 2004 GLIMS Workshop in Oslo, Norway, the Fall 2004 Meeting of the American Geophysical Union (Raup et al. 2004), and the December 2004 GLIMS Mini-workshop in San Francisco. GLACE 2 was carried out in the autumn of 2005, and results were reported at the GLIMS Meeting in New Zealand in February 2006 and at the Arctic Workshop in Boulder, Colorado in March 2006. GLACE 2 included a change detection component using multitemporal optical imagery. The analysis methods used in GLACE 1 and GLACE 2 are summarized in Table 7.2.
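Two of the automated techniques listed in Table 7.2 are simple enough to sketch: thresholding the ratio of two bands (for example, ASTER band 3 over band 4 with a threshold of 2.0) and thresholding the normalized difference snow index, which for ASTER can be written (B1 − B4)/(B1 + B4). The sketch below is an illustration only, not any participant's actual workflow. It works on small nested lists rather than real imagery, the pixel values are invented, and the NDSI cutoff of 0.4 is an assumed, commonly quoted value that does not come from this chapter.

```python
def band_ratio_mask(band_a, band_b, threshold):
    """Classify a pixel as ice/snow where band_a / band_b exceeds threshold.

    Pixels with a zero or negative denominator are left unclassified (False).
    """
    return [[b > 0 and a / b > threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]


def ndsi(visible, swir):
    """Normalized difference snow index, (visible - swir) / (visible + swir)."""
    return [[(v - s) / (v + s) if (v + s) != 0 else 0.0
             for v, s in zip(row_v, row_s)]
            for row_v, row_s in zip(visible, swir)]


# Invented 2 x 2 pixel values standing in for ASTER bands 1, 3, and 4.
band1 = [[150, 60], [180, 55]]
band3 = [[120, 40], [200, 35]]
band4 = [[30, 38], [50, 0]]

ratio_mask = band_ratio_mask(band3, band4, threshold=2.0)  # threshold from Table 7.2
ndsi_grid = ndsi(band1, band4)
NDSI_CUTOFF = 0.4  # assumption, not taken from the chapter
ndsi_mask = [[value > NDSI_CUTOFF for value in row] for row in ndsi_grid]

print(ratio_mask)
print(ndsi_mask)
```

In practice the resulting mask would still need vectorizing and manual editing, since masks of this kind tend to miss debris-covered ice and ice in cast shadow.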
Many of the automated methods applied a threshold to the ratio of two sensor channels (Paul and Kääb 2005). The normalized difference snow index (NDSI) was also used, which for ASTER can be defined as (B1 − B4)/(B1 + B4), where B1 = Band 1 and B4 = Band 4 (Hall et al. 1995, Paul 2007). When B1 is saturated, B2 is sometimes used. Individual algorithm choices were based on participants’ previous experience applying them to glaciers in their regions. Comparisons of automated glacier-mapping algorithms are given by Albert (2002), Paul et al. (2002), and Paul (2007).

While GLACE 1 revealed systematic problems with image preprocessing and interpretation, the goal of GLACE 2 was to derive a quantitative estimate of confidence in GLIMS analysis results, with an additional focus on change detection. We selected two images covering the same area, the Klinaklini Glacier and surrounding glacier system in the Coast Mountains of British Columbia, Canada (Fig. 7.2): an ASTER scene, acquired September 21, 2000 and a Landsat 5 TM scene, acquired September 21, 1991 (precisely nine years earlier; see Table 7.1). This allowed participants to evaluate the ability to detect surface changes based on images acquired from different instruments with different characteristics, such as spatial and radiometric resolution. The region features a glacier system containing many tributaries, a variety of sizes of mountain glaciers, clearly visible transient snow lines, ice flow divides, various glacier boundary types, and debris-covered as well as clean glaciers. While not all RCs have glaciers with morainal material in their normal GLIMS domains, the ice masses in these images provided a region of clean ice that we predicted would work well with algorithms tuned for high-latitude types of glaciers (with minimal debris cover).

Table 7.1. Satellite images used in the GLACE experiments.

Image ID                     Acquisition date     Sensor       GLACE No.
SC:AST_L1A.003:2004103566    September 6, 2001    ASTER        1
P050R24_5T910921             September 21, 1991   Landsat TM   2
SC:AST_L1A.003:2010881449    September 21, 2000   ASTER        2, 2A
SC:AST_L1A.003:2035265399    July 20, 2006        ASTER        3A

Table 7.2. Tools and techniques used in GLACE 1 and GLACE 2. The group (participant) numbers below have been assigned randomly (separately for GLACE 1 and GLACE 2).

GLACE 1
Group  Tools                          Techniques
1      Matlab, GLIMSView              Band ratio 3/4, 3/6, 3/8 → RGB; manual, maps
2      ERDAS Imagine, Arc/Info        Band ratio 3/4, threshold 2.0, visual interpretation
3      Arc/Info, GLIMSView            Unsupervised classification with manual editing
4      ENVI, PCI, Arc/Info            Multistep ratio thresholding algorithm
5      Arc/Info                       Ratio 3/4, threshold 2.4; manual in shadows; >0.2 km²
6      Matlab, GLIMSView, topo maps   Ratio 3/4, threshold 2.5
7      PCI Works                      PCA on 1–4, NDSI

GLACE 2
Group  Tools                          Techniques
1      PCI                            Band ratio enhancement; manual delineation of outlines
2      Matlab, ERDAS, GLIMSView       Three different band ratios as RGB; manual interpretation
3      GLIMSView                      Manual delineation of outlines
4      PCI, ESRI                      Band ratio, threshold
5      ENVI 4.2, Google Earth         Manual delineation of outlines
6      ENVI 4.1, ESRI                 Manual delineation of outlines
7      PCI, ESRI                      Bands 3–5 supervised classification for accumulation, ablation
8      GLIMSView, ESRI, ENVI, ERDAS   Unsupervised classification based on NDSI and ASTER 2/5 ratio; manual cleanup of automatically generated vectors

Abbreviations: RGB = red, green, blue; PCA = principal components analysis; NDSI = normalized difference snow index. GLACE 2A and GLACE 3A employed only manual digitization.

7.4.2 GLACE 2A and GLACE 3A (manual digitization)

GLACE 2A and GLACE 3A were performed as part of dedicated GLIMS workshops, and participants interpreted the imagery and manually created outlines while sitting together in a computer lab at the workshop venues.
The goal of these experiments was to remove from consideration the differences arising from the application of different algorithms and tools, and use only manual methods in order to evaluate variability in human interpretation of the imagery.

GLACE 2A was conducted as part of a GLIMS workshop held in Tucson, Arizona in September 2005. Approximately 10 participants used the GLIMSView software package (http://glims.org/glimsview/) to manually digitize the boundaries of a small glacier in British Columbia from the ASTER image used in GLACE 2 (Table 7.1). ASTER bands 1, 2, and 3 were displayed as blue, green, and red, respectively, to create a visible near-infrared (VNIR) false-color composite image. After a short learning period to get familiar with GLIMSView, each participant visually interpreted the image and produced a vector outline of the glacier extent by tracing its perimeter with the mouse, basing their interpretation on their glaciological expertise and previous experience viewing satellite imagery of glaciers. They also produced vector lines to denote the location of snow lines and center flow lines. The glacier’s boundaries included a flow boundary (ice–ice contact), as well as ice–rock boundaries.

Figure 7.2. Images used in GLACE 1 and GLACE 2. (Left, GLACE 1) False-color composite ASTER image acquired September 6, 2001; (right, GLACE 2) false-color composite Landsat TM image acquired September 21, 1991. Details on the images used in all the GLACE experiments are listed in Table 7.1. Figure can also be viewed as Online Supplement 7.1.

After producing a glacier outline using only the ASTER image, the participants viewed the glacier using Google Earth, which at that time included a moderate-resolution multispectral image (probably from Landsat’s TM instrument) and a DEM. The combination of the multispectral imagery and elevation data is viewable as a pseudo-3D scene from an arbitrary angle.
The analysts used this new source of information with the ASTER image and created a new set of outlines.

The GLACE 3A experiment was similar to GLACE 2A, and was held in conjunction with the August 2006 GLIMS Workshop, held in Cambridge, England. Participants manually digitized the boundary of the terminus of the Klinaklini Glacier, British Columbia, Canada. Participants for all four experiments are listed in Table 7.5 (p. 182).

7.5 GLACE RESULTS

7.5.1 GLACE 1 and GLACE 2

The quality of the results in GLACE 1 was variable and the experiment revealed problems such as (1) georeferencing errors (Fig. 7.3, left panel), (2) interpretation errors in manual digitization, (3) interpretation differences in manual digitization (Fig. 7.4), and (4) algorithmic deficiencies in automated methods (Fig. 7.3, right panel). An example of an interpretation error is the inclusion of nonglacier material, such as a rock slope or proglacial lake, within the glacier boundary. Interpretation differences result from varying definitions of what to include as a “glacier” (e.g., should the laterally adjacent snow slope be part of the glacier? Where should the boundary between a debris-covered glacier and a partly ice-cored moraine that is separate from the glacier be drawn?). Algorithmic deficiencies led to underestimation of glacier area in several cases. For example, parts of the tongues of some glaciers were lightly debris covered, leading some algorithms to misclassify those regions as rock (nonglacier).

Figure 7.3. (Left) All GLACE 1 glacier boundaries overlaid on the ASTER image that was analyzed in the experiment. Gross georeferencing errors, due to some initial difficulty in handling ASTER imagery, are apparent. (Right) Some GLACE 1 glacier boundaries for Spencer Glacier overlaid on the ASTER image. Classification errors include inclusion of the proglacial lake as part of the glacier (group 3, orange), and exclusion of lightly debris-covered ice near the glacier terminus (group 6, yellow). Blue = group 2; green = group 1. Figure can also be viewed as Online Supplements 7.2a and 7.2b.

Figure 7.4. (Left) GLACE 1 boundaries for Skookum Glacier overlaid on the ASTER image that was analyzed in the experiment. A portion of the glacier is debris covered, making it dark in color. Some analysts mistakenly excluded this from their glacier polygons. Analysts also differed in their interpretation of the snowfield on the glacier’s northern side (north is up in image). (Right) Two GLACE 2 glacier outlines overlaid on the September 21, 2000 ASTER image from that experiment. Some analysts included the small tributary glacier (indicated by arrow) as part of the Klinaklini Glacier, while others did not. Figure can also be viewed as Online Supplements 7.3a and 7.3b.

GLACE 1 was the first test of this kind, and was therefore a learning experience at various levels. Notably, many in the GLIMS community were new to ASTER imagery, which poses unique challenges (Abrams et al. 2002) for georeferencing in some software. Additionally, at the time of this experiment, the GLIMS community had not yet formulated a single definition of “glacier” for the purposes of GLIMS glacier delineation. These problems were starting to be addressed by the time of the GLACE 2 experiment.

Table 7.3. Quantitative comparison between different versions of manually digitized outlines for a specific glacier (an unnamed glacier on the east side of Boggs Peak, 12 km east of Portage, Alaska; 60.835°N, 148.742°W, GLIMS ID G211257E60835N, visible in Fig. 7.3 in upper central part of left image), produced by the participants in GLACE 1. Group numbers have been assigned randomly.

Group number   Area (km²)
1              1.79
3              2.81
4              3.68
5              3.91
6              4.01
7              5.65
Mean           3.642
Std. Dev.      1.293
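The summary rows of Table 7.3 can be checked directly from the six tabulated areas. The short sketch below assumes that the tabulated standard deviation is the sample (n − 1) standard deviation, which is the form that reproduces the published value; the variable names are illustrative.

```python
from statistics import mean, stdev

# Manually digitized areas (km^2) by group number, copied from Table 7.3.
areas_km2 = {1: 1.79, 3: 2.81, 4: 3.68, 5: 3.91, 6: 4.01, 7: 5.65}

values = list(areas_km2.values())
area_mean = mean(values)
area_std = stdev(values)              # sample standard deviation (n - 1)
relative_std = area_std / area_mean   # spread relative to the mean area

print(f"mean = {area_mean:.3f} km^2")
print(f"std dev = {area_std:.3f} km^2")
print(f"relative std dev = {100.0 * relative_std:.1f}%")
```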
Given the different data models in which some of the automatically generated data were delivered, meaningful quantitative comparisons among them were impossible without modifying some of the data first. In light of this and the large qualitative differences, qualitative comparisons were deemed sufficient for most of the outlines submitted in this round of GLACE experiments. By contrast, the manually digitized glacier outlines of GLACE 1 were all similar to each other. Table 7.3 shows the calculated areas and their summary statistics. Not all groups produced a manually digitized outline for this glacier. All the outlines produced in the GLACE 2 experiment are shown in Fig. 7.5. The georeferencing problems encountered in GLACE 1 were largely mitigated in GLACE 2. However, interpretation differences remained. Fig. 7.4 (right panel), for example, shows that different analysts treated smaller tributary glaciers differently. In this case, one analyst included the small tributary as part of the main glacier, while another excluded it. This sort of problem led to an extensive discussion at the 2006 New Zealand GLIMS Meeting, and subsequently on the GLIMS electronic mailing list, about how to specify a strict practical definition of the term glacier for use within the GLIMS project. This resulted in a formal definition being included in the GLIMS Analysis Tutorial, as discussed below. In order to quantify the differences between outlines produced from the same image, for a given pair of outlines (from two different analysts), we calculated the straight line (shortest) Euclidean distance between each vertex of one outline and the other outline. This was done by generating a ‘‘distance grid’’ for each polygonal outline where the value at each grid cell is the normal distance from the cell center to that outline. Each grid was then sampled at the locations of the vertices of all the other outlines. These distances are similar to the Hausdorff distance (Alt et al. 
1995) used in other disciplines (polygonal feature matching in medical imaging, for example). However, instead of retaining the maximum of these distances (the Hausdorff distance), we examined the statistics for all of them. The result is two sets of distances for every possible outline pair, each set consisting of distances between each vertex of one outline and the other. (There are two sets because calculation of the distances from one set of vertices to the other outline is not a symmetric operation.) Each set of distances represents a measure of the difference between two outlines, and these have been plotted as box-and-whisker plots in Fig. 7.6. The extent of the boxes is the interquartile range, the whiskers extend from the 5th to 95th percentiles, and outliers are shown as circles. The thick horizontal line is the median.

Figure 7.5. All outlines from GLACE 2, Landsat image. The outlines generally match well in the terminus area, whereas there is high variability in the accumulation area. The analyst who produced the red outline applied a different (non-GLIMS) data model, and digitized the contribution of each tributary to the terminus trunk separately. The yellow outline excluded morainal material in the terminus area which should have been included in the glacier outline. Figure can also be viewed as Online Supplement 7.4.

In terminus areas, the polygons generally had hundreds of vertices. Distances are calculated between each vertex of one polygon and the interpolated straight-line connection (within the UTM Zone 9 projection) between vertices of the other. Because vertex density is high, there is no effect from varying numbers of vertices in the polygons. Fig. 7.7 shows the distribution of distances from the 581 vertices in polygon 1 to polygon 3, two of the better and more consistent polygons. Standard deviation is 71 m, or approximately 4.7 pixels. Therefore, total positional uncertainty due to all sources for the best analysts was about 4.7 pixels.
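The vertex-to-outline comparison described above can be sketched in a few lines. Note the swapped-in technique: the chapter's analysis sampled a rasterized distance grid at the vertex locations, whereas this sketch computes the exact distance from each vertex to the nearest straight-line segment of the other outline, which avoids building a grid for a small example. Coordinates are assumed to be in a projected system (meters), and the function names and the two test outlines are hypothetical.

```python
import math
from statistics import median, stdev


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment (repeated vertex)
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def vertex_to_outline_distances(source, target):
    """Distance from every vertex of `source` to the closed ring `target`."""
    n = len(target)
    segments = [(target[i], target[(i + 1) % n]) for i in range(n)]
    return [min(point_segment_distance(v, a, b) for a, b in segments)
            for v in source]


def summarize(distances):
    """Summary statistics; `max` is the directed Hausdorff distance."""
    return {"median": median(distances),
            "stdev": stdev(distances),
            "max": max(distances)}


# Two hypothetical outlines: B equals A plus one extra vertex 40 m beyond
# A's upper edge.
outline_a = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
outline_b = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (50.0, 140.0), (0.0, 100.0)]

d_ab = vertex_to_outline_distances(outline_a, outline_b)
d_ba = vertex_to_outline_distances(outline_b, outline_a)
print(summarize(d_ab))
print(summarize(d_ba))
```

The two directions give different results for this pair, because every vertex of A lies on B but one vertex of B lies off A. This is the asymmetry that leads to two sets of distances per outline pair.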
An additional feature of the GLACE 2 experiment was analysis of two images, separated by nine years, of the same glacier system. Participating RCs produced a set of glacier outlines from each image and provided an estimate of area change for the glacier. Some analyses showed a slight increase in area, while others showed a slight decrease. On aggregate, the overall results showed area change that was not statistically different from zero. However, the results from the most internally consistent analysis indicated that the Klinaklini Glacier lost approximately 1% of its area from 1991 to 2000 (Table 7.4). Note that the standard deviation of the measured area changes is greater than the mean (or median) change. The anomalously high area from group 5 is due to inclusion of rock outcrops internal to the glacier in the area computation. This, and the areas for group 2, were identified as outliers and were excluded from the summary statistics in Table 7.4. Similarly, the change in area from group 6 was omitted from the summary statistics of area change due to its obvious underestimate of area. Overall, the area of the Klinaklini Glacier does not appear to have changed significantly during the nine years between image acquisitions. Mass loss can only be inferred, but there is evidence in the images, such as elevated vegetation trim lines, of glacier thinning.

Figure 7.6. The distances between all the vertices of one glacier outline and the other outlines were determined. This matrix of box plots summarizes interpolygon distances in GLACE 2. Distance between each polygon is shown on the diagonal for comparison. One polygon (from group 6) was created from automated methods that led to a narrower outline than the others; hence the distances between that outline and the others (bottom row, rightmost column) are larger than the other pairs. The distances on the vertical axes are in meters. The width of the box plots has no meaning. The calculation of distances from the vertices of one polygon to another polygon is not a symmetric operation, though the values are generally similar.

Figure 7.7. Histogram of the distances between outlines 1 and 3 in GLACE 2. These two outlines are visually consistent with each other.

7.5.2 GLACE 2A and GLACE 3A

In the manual analysis of the small glacier near Klinaklini Glacier (GLACE 2A), the analysts each produced either one or two outlines. Some produced one, then after viewing the glacier in Google Earth, produced another using the additional information. Others had not produced an outline by the time they viewed the glacier in Google Earth, and produced only the second outline. A few of the participants were satisfied enough with their first outline that seeing Google Earth made no difference, and they produced no second outline.

The results were highly variable, particularly in the interpretation of ice–ice flow boundaries (ice divides) in the upper snow-covered reaches of the glacier, as well as the terminus region (Fig. 7.8). Viewing the upper part of the glacier using only the nadir image, analysts found it difficult to consistently identify where the change in slope was between the glacier of interest and its neighbor. In the terminus region, a rocky or debris-covered area adjacent to the glacier was interpreted to be a valley wall by some analysts, and a debris-covered glacier by others. Fig. 7.8 shows the outlines superimposed over the imagery provided by Google Earth, where it is clear from the topographic information that the rocky area is a valley wall. The red lines were produced before viewing the glacier in Google Earth, and the blue lines were produced after. There is less variability in the blue outlines compared with the red. Fig.
7.9 shows variability in the resulting areas calculated from the outlines before viewing the three-dimensional data of Google Earth (left panel) and after (right panel). In this case, the use of the 3D information led to less variability as well as a smaller final outline for the glacier. In general, however, we do not expect the use of 3D information to lead to smaller estimates of glacier size, but only to reduce variability of the estimates. The addition of topographic information enabled the analysts to interpret the scene with higher confidence, and the resulting outlines were in much better agreement with each other. This exercise emphasized the fact that topographic information is crucial for proper boundary delineation where there are ice flow divides and supraglacial debris. The outlines produced in GLACE 3A are shown in Fig. 7.10. The lateral boundaries are well identified by all participants, but there are a few slight differences in the terminus region.
7.5.3 Discussion
As discussed above, errors can be categorized as georeferencing errors, interpretation errors, interpretation differences, or algorithmic deficiencies. The automated glacier-mapping methods used in the GLACE tests were based only on multispectral data (not topography), and thus were best suited for delineation of glaciers without optically thick (opaque) and extensive debris cover or ice divides. Some of the larger errors were due to debris cover or tributaries being excluded from the glacier area, varying interpretation of ice flow divides, and an iceberg-filled lake being included in the glacier area. The participants who edited the results from their automated algorithms to compensate for these effects achieved improved results. It is clear that topographic information can be crucial for accurate delineation of glacier boundaries, especially in accumulation zones and where there is supraglacial debris cover. While manual digitization is well suited to final editing, automated algorithms are recommended as the first step to produce a glacier map for an entire scene. Manual editing can then be used to fix errors due to debris cover and cast shadow. Algorithms that use both multispectral imagery and topography to map debris-covered glaciers automatically are being used increasingly, and are valuable sources of a first map of glaciers in regions where debris-covered glaciers are numerous. Automated algorithms remove human subjectivity from the process and can map an entire satellite scene hundreds of times faster than purely manual digitization (Bishop et al. 2001, Paul et al. 2004, Raup et al. 2007a, Racoviteanu et al. 2009). The GLACE tests have helped the GLIMS community converge on appropriate algorithms for different glacier types (Paul et al. 2009).
For interpretation errors and differences, the GLACE experiments have been instrumental in identifying and reducing errors and variability among glacier outlines produced from satellite imagery by different GLIMS collaborators.
Table 7.4. Changes in area of the Klinaklini Glacier as determined by the different groups participating in GLACE 2. Group numbers have been assigned randomly, and differently from GLACE 1. In both area and area change measurements, data that were clearly outliers, marked in the table by asterisks, were removed before calculating the summary statistics at the bottom of the table. Acquisition dates for the ASTER and Landsat scenes were September 21, 2000 and September 21, 1991, respectively.
Group number  ASTER area (km²)  TM area (km²)  Area change (km²)  Area change (%)
1             450.7             441.3          9.4                2.13
2             304.7*            316.8*         -12.1              -3.82
3             409.5             408.6          0.9                0.22
4             454.4             453.4          1.0                0.22
5             677.7*            n/a            n/a                n/a
6             459.8             503.9          -44.1*             -8.75*
7             402.1             413.7          -11.6              -2.80
8             474.4             479.9          -5.5               -1.15
Min           402.1             408.6          -12.1              -3.82
Max           474.4             503.9          9.4                2.13
Median        452.55            447.4          -2.3               -0.465
Mean          441.82            450.1          -2.98              -0.867
Std Dev       29.13 (6.6%)      37.20 (8.3%)   8.34               2.18
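The summary rows of Table 7.4 can be checked with a few lines of Python. The sketch below rests on assumptions inferred from the tabulated statistics rather than stated in the text: that area change is the ASTER (2000) area minus the TM (1991) area, that percentage change is taken relative to the TM area, that the excluded outliers are the ASTER areas of groups 2 and 5, the TM area of group 2, and the area change of group 6, and that the standard deviation is the sample standard deviation. All variable and function names are illustrative.

```python
from statistics import mean, median, stdev

# Areas (km^2) reported by each GLACE 2 group; group 5 supplied no TM area.
aster = {1: 450.7, 2: 304.7, 3: 409.5, 4: 454.4,
         5: 677.7, 6: 459.8, 7: 402.1, 8: 474.4}
tm = {1: 441.3, 2: 316.8, 3: 408.6, 4: 453.4,
      6: 503.9, 7: 413.7, 8: 479.9}

# Assumed outlier exclusions (inferred from the summary rows, not from the text).
ASTER_OUTLIERS = {2, 5}
TM_OUTLIERS = {2}
CHANGE_OUTLIERS = {6}

# Assumed sign convention: change = ASTER area minus TM area,
# with the percentage expressed relative to the TM area.
change = {g: aster[g] - tm[g] for g in tm}
percent = {g: 100.0 * change[g] / tm[g] for g in tm}


def keep(data, outliers):
    """Return the values of `data` whose group number is not an outlier."""
    return [value for group, value in data.items() if group not in outliers]


def summarize(values):
    """Min, max, median, mean, and sample standard deviation of `values`."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


summary = {
    "aster": summarize(keep(aster, ASTER_OUTLIERS)),
    "tm": summarize(keep(tm, TM_OUTLIERS)),
    "change": summarize(keep(change, CHANGE_OUTLIERS)),
    "percent": summarize(keep(percent, CHANGE_OUTLIERS)),
}

for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Under these assumptions the computed values agree with the tabulated summary rows to within rounding, which also suggests that the area change values for groups 2, 6, 7, and 8 are negative.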
One of the chief outcomes is a standard definition of “glacier” for the purposes of GLIMS. The definition was agreed upon through discussions at several conferences since the first GLACE experiment, and has been included in a document called the GLIMS Analysis Tutorial (Raup and Khalsa 2007). The definition is crafted to be specifically applicable to satellite remote sensing of glaciers. The definition reads:
A glacier or perennial snow mass, identified by a single GLIMS glacier ID, consists of a body of ice and snow that is observed at the end of the melt season, or, in the case of tropical glaciers, after transient snow melts. This includes, at a minimum, all tributaries and connected feeders that contribute ice to the main glacier, plus all debris-covered parts of it. Excluded is all exposed ground, including nunataks. An ice shelf shall be considered as a separate glacier.
The ramifications of this definition, such as how to treat steep rock walls that are the source of snow that avalanches onto the glacier, are discussed in the GLIMS Analysis Tutorial. The tutorial also documents recommended practices for the creation of GLIMS data within the context of an appropriate data model. Another consequence of these experiments is the continued development of a series of standard analysis modules and algorithm descriptions that RCs can use to produce uniform glacier data for GLIMS. These standard methods can be implemented in GLIMSView, though this software is currently primarily used for manual digitization of glacier outlines, surface facies, and glacier center lines. GLIMSView provides a framework for consistent production and formatting of outline data, and is extensible for future inclusion of additional processing algorithms (Raup et al. 2007b).
Figure 7.8. GLACE 2A outlines viewed over the default imagery in Google Earth. The red lines were produced before viewing the glacier in Google Earth, and the blue lines were produced after.
Various processing protocols have been discussed in the GLIMS community, and have been documented in some Regional Center work flow guides. All these documents, and the GLIMS Analysis Tutorial, are available at http://glims.org/. As additional tools are implemented and guidelines are developed, we anticipate conducting further comparative image analysis experiments to validate the protocols and analysis modules as they evolve. Within the context of a recent European glacier-mapping project, the Glaciers Climate Change Initiative (Glaciers_CCI), other comparative image analysis experiments have been carried out and have reached similar conclusions to the GLACE results (Paul et al. 2012).
GLIMSView remains a useful tool for ensuring adherence to standardization protocols by guiding the analyst through predetermined processing steps in the protocol, or through its use as a “filter” program, which ensures that certain processing steps have been taken before exporting the data into the data transfer format. We recognize, however, that different researchers within the GLIMS project have their own preferred software tools, and have tailored algorithms to the characteristics specific to glaciers in their own regions. Several processing protocols have therefore been recommended, each tailored to a specific set of problems associated with a particular type of glacier system. Development of appropriate tools is an ongoing effort. GLIMS held a workshop in Boulder, Colorado, U.S.A. in June of 2008 to address these specific topics. The primary outcome was a more detailed set of guidelines tailored to different glacier types and software packages. More information about this workshop can be found in Racoviteanu et al. (2009).
Figure 7.9. Box plots showing the variability of the glacier area calculated from the glacier outlines produced before (left) viewing the glacier in Google Earth and after (right). The extent of the box is the interquartile range, the whiskers extend to the 5th and 95th percentiles, and outliers are shown as circles. The thick horizontal line is the median.
Figure 7.10. Results of manual glacier delineation performed in GLACE 3A. Lateral boundaries are well identified by all participants, but there was some disagreement about the details of the terminus, due primarily to differing interpretations of broken ice.
7.6 GLIMS GLACIER DATABASE AND THE DATA INGEST PROCESS
Glacier outlines, attributes, and related metadata are stored at NSIDC in a relational database. The database software, PostgreSQL with PostGIS add-ons, is open source, and contains data types and functions well suited to storing geospatial and related nongeospatial data. Glacier outlines are stored as polygons whose vertices are in the longitude/latitude (geographic) coordinate system on the WGS-84 datum.
7.6.1 Ingest quality control steps
When a Regional Center has produced a set of GLIMS glacier data, it submits the data to NSIDC via a Web interface that captures metadata on the processing steps used. These metadata include information on tools used, how geocoding of the source imagery was done, radiometric calibration, topographic correction, and the algorithms used for classification and interpretation of the imagery. Several quality control (QC) steps are applied at NSIDC before final ingest into the database. These include automated checking for data completeness and integrity (e.g., existence of necessary IDs and other attributes, proper segment order and correct circulation direction (handedness) of polygons, proper numeric range, polygon closure), and visualizing the data on a map and within Google Earth. When problems are found, the RC is contacted with a request to fix the problems. Finally, after the data have been inserted into the database and become publicly viewable via the web interface, the submitter is notified and requested to view the dataset via the GLIMS web map browser and do a final check for accuracy.
The first QC steps are performed automatically by software in the data submission system. Before any person sees the submission, uploaded data files are checked for proper formatting, presence and integrity of required attributes, proper coordinate system for spatial data (latitude and longitude on the WGS-84 datum), proper formatting and existence of GLIMS glacier identifiers, and proper closure and data model for glacier boundary polygons.
The quality control steps implemented in the ingest process, in the form of software and procedures, have been effective in ensuring that bad data do not get ingested into the database. Fig. 7.11 shows the glacier inventory for British Columbia displayed in Google Earth before ingest. This visualization method allows for easy identification of errors in the dataset. Fig. 7.12 shows an example where an error in one of the glacier IDs broke the link between the outline and its metadata, and subsequently the outline for that glacier was dropped by the ingest software. The ingest software issued a warning about this, and the visual clue in Google Earth is unmistakable. Fig. 7.13 shows an example from a different region where there was an offset between submitted glacier outlines and imagery in Google Earth. In some cases the georeferencing of the imagery in Google Earth may itself be incorrect, but in practice this is rare, and any offset of the glacier outlines in Google Earth warrants further investigation of the submitted data.
The ingest software checks many other items for basic data integrity, with the result that the data in the GLIMS Glacier Database are consistent in metadata and ID links. Some metadata fields are optional, and population of these fields varies, but the QC steps in the ingest process strive to ensure high-quality data in all mandatory fields and also those optional ones that are populated.
As a result of these QC steps, the data that are ingested into the GLIMS Glacier Database typically have only the types of errors and uncertainties typified by the best of the outlines in the GLACE experiments. Uncertainty is generally three or four pixels (1σ) in the terminus region, and can be considerably higher (hundreds of meters) in the accumulation areas at ice–ice boundaries. It is expected that improvements in the georeferencing of source imagery and mapping of ice divides will be achieved as more accurate DEMs become available.
Figure 7.11. The 17,585 GLIMS glacier outlines for British Columbia displayed in Google Earth for quality checking before ingest into the GLIMS Glacier Database.
Figure 7.12. A missing outline for a glacier in British Columbia becomes obvious when displayed in Google Earth. This error was corrected before ingest. Red lines represent glacier boundaries; green polygons surround rock outcrops that are internal to the glacier.
Figure 7.13. GLIMS glacier outlines showing a geographic offset in Google Earth. This prompted checking with the Regional Center. Red lines represent glacier boundaries; green polygons surround rock outcrops that are internal to the glacier.
7.6.2 Representation of measurement error
For each segment of each glacier outline, the GLIMS Glacier Database contains fields that store positional uncertainty. There are four different fields for each polygon segment: “local” and “global” uncertainty in the x and y directions, both expressed in meters.
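As a rough illustration of how the four per-segment uncertainty fields might be carried in analysis code, the sketch below defines a small Python data class and a helper that converts a pixel-based estimate into meters. The class name, field names, and helper are hypothetical and are not taken from the GLIMS database schema; the 15 m pixel size in the usage note assumes the ASTER visible and near-infrared bands.

```python
from dataclasses import dataclass

FIELD_NAMES = ("local_x_m", "local_y_m", "global_x_m", "global_y_m")


@dataclass(frozen=True)
class SegmentUncertainty:
    """Positional uncertainty for one outline segment, in meters.

    Field names are illustrative only, not the actual GLIMS column names.
    """
    local_x_m: float
    local_y_m: float
    global_x_m: float
    global_y_m: float

    def __post_init__(self):
        # All four values are required and represent distances,
        # so reject missing or negative entries.
        for name in FIELD_NAMES:
            value = getattr(self, name)
            if value is None or value < 0:
                raise ValueError(f"{name} must be a non-negative number of meters")


def pixels_to_meters(n_pixels: float, pixel_size_m: float) -> float:
    """Convert an uncertainty expressed in image pixels to meters."""
    return n_pixels * pixel_size_m
```

For example, `pixels_to_meters(3, 15.0)` and `pixels_to_meters(4, 15.0)` bracket a three-to-four-pixel uncertainty for 15 m imagery, and the results could be passed to `SegmentUncertainty` as the local x and y values.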
Local uncertainty is an estimate of the location precision of each vertex in the polygon, and is usually directly related to image resolution, though it can be affected by interpretation difficulty, such as at ice flow divides or debris-covered ice at the terminus. Global uncertainty is an estimate of the accuracy of the entire segment’s position, generally related to georeferencing accuracy of the image. These fields are mandatory; they cannot be left blank. At the time of ingest, these numbers are compared with the positions of the polygons viewed over imagery as supplied by the analyst and in Google Earth. Polygons are sometimes also overlaid on Shuttle Radar Topography Mission (SRTM) DEM visualizations as an additional check.
These uncertainty values are determined by the analyst, taking into account image resolution, quality (e.g., extent of cloudiness), snow conditions, amount of debris cover on the glacier, and ease of determining flow divides, if present, which in turn depends on the quality of ancillary data such as DEMs, ground-based photographs, or field-based data such as velocity measurements. These considerations are generally documented in the processing description, also stored within the database.
7.6.3 Derived parameters in the database
As part of the ingest process, it is possible to derive additional parameters from glacier outlines and store these as additional information in the database. Currently, the primary derived parameter is glacier area. While many Regional Centers provide the area of each glacier, at ingest time the area of each glacier is calculated and stored within the database. This ensures that (1) every glacier has an associated area stored, and (2) all such areas are calculated in a consistent manner. For each glacier, PostGIS SQL functions are used to
- project the coordinates to cylindrical equal area (projected meters);
- calculate the area of the polygon(s) tagged with glac_bound (that is, the glacier boundary polygon(s));
- calculate the area of the polygon(s) tagged with intrnl_rock (that is, the internal rock (nunatak) boundary polygon(s));
- subtract the internal rock area from the area within the glacier boundary polygon(s) to get the final area for the glacier.
The area calculations are done using the usual formula for calculating the area of planar polygons, so an equal-area projection must be used. This Perl code constructs the correct SQL query, depending on whether there are internal rock polygons:
    # Areas are summed in square meters in the projected coordinates;
    # dividing by 1000000 converts the result to square kilometers.
    if (count_of_internal_rocks($id) > 0) {
        # Glacier boundary area minus internal rock (nunatak) area.
        $update_statement = 'UPDATE glacier_dynamic SET db_calculated_area ='
          . ' ((select sum(st_area(st_transform(glacier_polys,32767)))'
          . ' FROM glacier_polygons, glacier_dynamic'
          . " WHERE line_type='glac_bound'"
          . " AND glacier_polygons.analysis_id=$id"
          . " AND glacier_dynamic.analysis_id=$id"
          . " AND glacier_dynamic.record_status='okay')"
          . ' - (select sum(st_area(st_transform(glacier_polys,32767)))'
          . ' FROM glacier_polygons, glacier_dynamic'
          . " WHERE line_type='intrnl_rock'"
          . " AND glacier_polygons.analysis_id=$id"
          . " AND glacier_dynamic.analysis_id=$id"
          . " AND glacier_dynamic.record_status='okay'))/1000000"
          . " WHERE analysis_id = $id";
    } else {
        # No internal rock polygons: use the glacier boundary area alone.
        $update_statement = 'UPDATE glacier_dynamic SET db_calculated_area ='
          . ' (select sum(st_area(st_transform(glacier_polys,32767)))'
          . ' FROM glacier_polygons, glacier_dynamic'
          . " WHERE line_type='glac_bound'"
          . " AND glacier_polygons.analysis_id=$id"
          . " AND glacier_dynamic.analysis_id=$id"
          . " AND glacier_dynamic.record_status='okay')/1000000"
          . " WHERE analysis_id = $id";
    }
The PostGIS function st_transform performs the projection, and 32767 is an identifier for the cylindrical equal area projection.
7.7 CONCLUSION
The GLIMS community has taken steps to ensure the high quality of data in the GLIMS Glacier Database.
GLIMS analysis comparison experiments have revealed specific potential problems in deriving glacier outlines from satellite imagery that can lead to inconsistent results when building a database of such outlines from multiple sources. The problems are more related to methodological questions during postprocessing than to technical issues of initial image classification. The central question of what constitutes a “glacier” touches most of the problems encountered: treatment of tributaries and rock outcrops, location of ice divides, interpretation of debris-covered glacier parts and lakes with icebergs, and snowfields that may hide parts of the glacier perimeter or obscure a small ice patch completely. The experiments have led to the development, adoption, and documentation of definitions, processing protocols, tools, and quality-control steps that have improved the consistency and quality of glacier data going into the database. After the documents were distributed to the GLIMS community, the analysis quality was observed at the data ingest stage to have improved. We estimate that glacier outline digitization repeatability (1σ) is of the order of 3 to 4 pixels (45 to 60 m for ASTER) in regions where interpretation is straightforward, but uncertainties can remain much higher for individual glaciers where interpretation is difficult (e.g., ice flow divides in regions without proper DEMs). Incorporation of topographic information into the work flow is thus crucial to reduce this uncertainty. As tools, protocols, and data availability evolve, more GLACE tests will likely be carried out.
7.8 ACKNOWLEDGMENTS
The GLIMS initiative at the NSIDC was begun with the support of NASA awards NNG04GF51A and NNG04GM09G. We would like to thank the late Mark Dyurgerov, Paul Geissler, Christian Georges, Chris Helm, Ella Lee, and Claudia Riedl for their involvement in the GLACE experiments.
ASTER data courtesy of NASA/GSFC/METI/Japan Space Systems, the U.S./Japan ASTER Science Team, and the GLIMS project.
7.9 REFERENCES
Abrams, M., Hook, S., and Ramachandran, B. (2002) ASTER User Handbook, Version 2. NASA Jet Propulsion Laboratory, Pasadena, CA.
Albert, T.H. (2002) Evaluation of remote sensing techniques for ice-area classification applied to the tropical Quelccaya Ice Cap, Peru. Polar Geography, 26(3), 210–226.
Alt, H., Behrends, B., and Blömer, J. (1995) Approximate matching of polygonal shapes. Annals of Mathematics and Artificial Intelligence, 13, 251–265.
Bishop, M.P., Bonk, R., Kamp, U., and Shroder, J. (2001) Terrain analysis and data modeling for alpine glacier mapping. Polar Geography, 25(3), 182–201.
Hall, D.K., Riggs, G., and Salomonson, V. (1995) Development of methods for mapping global snow cover using moderate resolution imaging spectroradiometer data. Remote Sensing of Environment, 54(2), 127–140.
Paul, F. (2007) The New Swiss Glacier Inventory 2000: Application of Remote Sensing and GIS (Schriftenreihe Physische Geographie, Glaziologie und Geomorphodynamik No. 52). Universität Zürich, 210 pp.
Paul, F., Kääb, A., Maisch, M., Kellenberger, T., and Haeberli, W. (2002) The new remote-sensing-derived Swiss glacier inventory, I: Methods. Annals of Glaciology, 34, 355–361.
Paul, F., Huggel, C., and Kääb, A. (2004) Combining satellite multispectral image data and a digital elevation model for mapping debris-covered glaciers. Remote Sensing of Environment, 89, 510–518.
Paul, F., and Kääb, A. (2005) Perspectives on the production of a glacier inventory from multispectral satellite data in the Canadian Arctic: Cumberland Peninsula, Baffin Island. Annals of Glaciology, 42, 59–66.
Paul, F., Barry, R., Cogley, J., Frey, H., Haeberli, W., Ohmura, A., Ommanney, C., Raup, B., Rivera, A., and Zemp, M. (2009) Recommendations for the compilation of glacier inventory data from digital sources. Annals of Glaciology, 50(53), 119–126.
Paul, F., Barrand, N., Berthier, E., Bolch, T., Casey, K., Frey, H., Joshi, S., Konovalov, V., Bris, R.L., Moelg, N. et al. (2012) On the accuracy of glacier outlines derived from remote sensing data. Annals of Glaciology, 54(63), 171–182.
Racoviteanu, A.E., Paul, F., Raup, B., Khalsa, S.J.S., and Armstrong, R. (2009) Challenges and recommendations in mapping of glacier parameters from space: Results of the 2008 Global Land and Ice Measurements from Space (GLIMS) workshop, Boulder, Colorado, USA. Annals of Glaciology, 53, 53–69.
Raup, B., and Khalsa, S.J.S. (2007) GLIMS Analysis Tutorial. National Snow and Ice Data Center, Boulder, CO. Available at http://glims.org/MapsAndDocs/guides.html
Raup, B., Khalsa, S., Armstrong, R., Cawkwell, F., Georges, C., Hamilton, G., Sneed, W., Jr., and Wheate, R. (2004) Comparative image analysis to ensure data quality in the global land ice measurements from space (GLIMS) glacier database. EOS Trans. Am. Geophys. Union, 85(47), Supplement, abstract H23D-1151.
Raup, B., Kääb, A., Kargel, J.S., Bishop, M.P., Hamilton, G., Lee, E., Paul, F., Rau, F., Soltesz, D., Khalsa, S.J.S. et al. (2007a) Remote sensing and GIS technology in the Global Land Ice Measurements from Space (GLIMS) project. Computers and Geosciences, 33, 104–125, doi: 10.1016/j.cageo.2006.05.015.
Raup, B., Racoviteanu, A., Khalsa, S., Helm, C., Armstrong, R., and Arnaud, Y. (2007b) The GLIMS geospatial glacier database: A new tool for studying glacier change. Global and Planetary Change, 56, 101–110, doi: 10.1016/j.gloplacha.2006.07.018.
Sannel, A.B.K., and Brown, I.A. (2010) High-resolution remote sensing identification of thermokarst lake dynamics in a subarctic peat plateau complex. Canadian Journal of Remote Sensing, 36(Suppl. 1), S26–S40.
Sneed, W.A. (2007) Satellite remote sensing of Arctic glacier–climate interactions. Master’s thesis, University of Maine.
Table 7.5.
Participants in GLACE 1, GLACE 2, GLACE 2A, and GLACE 3A, and their affiliations at the time of the experiments.
Regional Center number (Steward number)  Institution, participants                                                         Experiments
3                                        University of Alberta, Canada; Fiona CAWKWELL                                     1, 2, 3A
3 (536)                                  University of Northern British Columbia, Canada; Roger WHEATE and Brian MENOUNOS  1, 2
6                                        University of Maine, U.S.A.; Gordon HAMILTON and Bill SNEED                       1, 2, 2A
– (a)                                    Cambridge University; Narelle BAKER                                               3A
8                                        CAREERI, Lanzhou, China; Guodong CHENG, Shiyin LIU, Xin LI, Donghui SHANGGUAN     1
5                                        Portland State University, U.S.A.; Matthew HOFFMAN                                3A
11                                       University of Innsbruck, Austria; Claudia RIEDL, Helmut ROTT                      1, 2
11 (507)                                 University of Zurich, Switzerland; Frank PAUL                                     1, 2
13                                       University of Innsbruck, Austria; Christian GEORGES                               1
602                                      University of Colorado, U.S.A.; Matthew BEEDLE                                    2
17                                       Russian Academy of Sciences, Russia; Vladimir KONOVALOV                           2
– (a)                                    Texas A&M University, U.S.A.; Andrew KLEIN, Joni KINCAID                          2
604                                      U.S. Geological Survey (Flagstaff, AZ), U.S.A.; Ella LEE, Paul GEISSLER           2A
606                                      University of Arizona, U.S.A.; Jeff KARGEL                                        2A
– (a)                                    University of Alaska, U.S.A.; Christopher LARSEN                                  2A
602                                      University of Colorado, U.S.A.; Bruce RAUP                                        2A, 3A
602                                      University of Colorado, U.S.A.; Christopher HELM                                  2A
– (a)                                    U.S. Geological Survey (Reston, VA), U.S.A.; Bruce MOLNIA                         2A
– (a)                                    University of Colorado, U.S.A.; Mark DYURGEROV                                    2A
– (a)                                    University of Otago, New Zealand; Shelley MACDONELL                               3A
(a) Dashes indicate no assignment yet to a Regional Center.