Articles Local Search for Optimal Global Map Generation Using Middecadal Landsat Images Robert A. Morris, John Gasch, and Lina Khatib n NASA and the United States Geological Survey (USGS) are collaborating to produce a global map of the Earth using Landsat 5 Thematic Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) remote sensor data from the period of 2004 through 2007. The map is composed of thousands of scene locations, and for each location there are tens of different images of varying quality to choose from. Constraints and preferences on map quality make it desirable to develop an automated solution to the map-generation problem. This article formulates a global map-generator problem as a constraint-optimization problem (GMGCOP) and describes an approach to solving it using local search. The article also describes the integration of a GMG solver into a graphical user interface for visualizing and comparing solutions, thus allowing for solutions to be generated with human participation and guidance. 84 AI MAGAZINE Overview and Motivation NASA’s LCLUC1 program is partnering with the EROS2 Data Center to produce a high-resolution mosaic map of the Earth. The map will consist of a data set of high-quality images of the Earth’s continental landmass using Landsat 5 (L5) Thematic Mapper (TM) and Landsat 7 (L7) Enhanced Thematic Mapper Plus (ETM+) sensor data from the middecadal period of 2004 through 2007. This project is known as the Global Land Survey 2005 (GLS-2005). The primary purpose of such maps is to facilitate monitoring of global changes in land cover by Earth scientists. The end product will be composed of roughly 9500 Worldwide Reference System 2 (WRS-2)3 Landsat scene locations; there are typically 10 or more high-quality candidate images available for each scene location. Eventually, more than 300,000 images must be evaluated and down-selected to create the final survey data set. The resulting data map will be distributed to the public at no charge through a USGS website. In addition to providing benefits to researchers in the Earth sciences, it will likely become the next-generation backdrop for GoogleEarth (which currently uses the GeoCover-2000 data set). A collection of diverse preference criteria defines a high-quality image map. First, a good map will typically consist of the best (most cloud-free) image data available per scene. The metric employed for this measure is the automated cloud-cover assessment (ACCA), a statistic derived from an algorithm that identifies clouds from data through the difference in mean temperature with Earth’s surface. Second, as the majority of Earth Copyright © 2009, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602 Articles science applications deal with health and density of agricultural and other vegetative land cover, images taken during peak vegetation maturity are preferred. The metric employed for this purpose is the normalized difference vegetation index (NDVI). NDVI represents the historical average vegetation density and maturity within each global land scene cell by calendar month. Thus, the preference is toward images taken during seasons with high NDVI. Third, to be usable for regional scientific studies, it is preferable to choose image data that are seasonally consistent with neighboring scenes. Fourth, to accommodate land-cover and land-use change analysis, there is a preference for images that were taken in the same season as high-quality images from previous survey data sets, for example, from the GeoCover-2000 set. Finally, due to a malfunction of the image scanner on L7, ETM+ produces imagery that has coverage discontinuities such that an individual image covers only 78 percent of the land area. To compensate, two images of the same scene taken on different days must be combined to produce a composite image that partially or fully closes the gaps. Pairs of images of a common scene must therefore be chosen to maximize coverage (minimize gap), which means the two scenes should be mutually out of phase. Each image is assigned a gap–phase statistic, or GPS, which is an absolute measure of the geometric registration of the image scan line with respect to the scene center point. Such GPS values are used to compute the area coverage of composite images. The size of the space of candidate global image maps, as well as the number of criteria for quality, make manual scene selection difficult and tedious. Effective manual scene selection would require a visual comparison of thousands of image map mosaics, looking for defects to image quality (such as cloud cover) or smoothness of the mosaic (for example, due to adjacent images being taken in different seasons). For this reason, the Global Land Survey team sought the use of automation to assist in the process of generating and comparing candidate image maps for quality. In particular, it became evident that it is straightforward to map the scene selection process for a global image map into a standard constraint-optimization problem, for which effective automated solvers exist. This article presents a formulation of the mapgeneration problem as a constraint-optimization problem and describes an approach to solving the problem using local search. The next section describes the problem in more technical detail. We then describe a local search approach to solving the problem. Finally, we discuss the user interface, large area scene selection interface (LASSI), into which the solver is integrated. Global Map Generation as a Constraint-Optimization Problem Constraint optimization defines a set of approaches to solving a wide range of computationally hard problems. Constraint-optimization problems (COPs) generalize constraint-satisfaction problems (CSPs). A CSP consists of a set of variables and associated domains that provide candidate values to the variables. A set of constraints defines relationships among values over subsets of the variables. Solutions to a CSP are complete assignments of domain values to the variables that satisfy the constraints. A COP (Larrosa and Dechter 2003) is a CSP that contains an objective function for ordering the set of solutions in terms of a measure of quality. Often the objective function is based on a representation of a distinction between hard and soft constraints. Hard constraints must be satisfied by any solution. Soft constraints can be used to represent preferences for value combinations among subsets of variables. The objective function value for a solution can then be expressed as a combination (for example, sum) of the preference values for the soft constraints. Global map generation (GMG) can be viewed as a constraint-optimization problem (GMG-COP), with a set of variables V = {vi,j} indexed by WRS-2 path and row number i, j. Each variable vi,j thus represents a scene location and is associated with a domain Di,j = {di,j,1, . . . , di,j,m}, where each di,j,k represents a TM or ETM+ image taken at that location. The WRS organization of scene locations into path and row induces a grid or lattice structure of binary relations between scene locations (see figure 1). Because of the symmetry of adjacency, it suffices to represent the set of adjacent scenes in terms of two functions, n : Vi,j → Vi,j− 1 (north neighbor) and e : Vi,j → Vi− 1,j (east neighbor), which return the variable corresponding to the scene that is north (east) of the designated variable. A solution s to the GMG-COP is a set of assignments s = {vi,j ← 冬di,j,k, di,j,l冭}. The need for a pair of images arises from the L7/ETM+ gap anomaly. One partial image is called the base, which covers approximately 78 percent of the WRS scene area. The other partial image is called the fill. If the base is a TM image from L5, where there are no missing image data, we set di,j,l = di,j,k by convention. For an arbitrary solution s, we write bs(vi,j), fs(vi,j) for the base and fill values for the scene vi,j assigned by s. The GMG problem is a multiobjective optimization problem, in which a set of potentially competing preference criteria are used to evaluate and compare solutions. The preference criteria are single image, ETM+ composite, and criteria relating pairs of adjacent images. Single image criteria include minimizing cloud cover, maximizing NDVI value, preferring acquisi- SUMMER 2009 85 Articles Image Scene Centers WRS Path Number 20 19 18 30 31 Typical Scene Area Land area covered in each scene varies by latitude 32 Image paths are numbered westward with path 001 passing through Greenland and eastern South America WRS Row Number Image rows are numbered southward beginning at latitutde N80 degrees, with row 60 centered on the equator Figure 1. The Worldwide Reference System. The WRS coordinate system for uniquely identifying scene locations on the Earth for Landsat remote sensing in terms of a system of 233 paths and 248 rows. The slanted vertical line represents the satellite track (path). The horizontal line represents a row. Each rectangular scene is identified by a unique path, row number. In a COP-GMG problem, each variable is associated with a unique scene location, and the domain of each variable is the set of images captured by either L5 or L7 for that scene. The result is a grid organization for the constraints and preferences that determine global map quality. Figure redrawn from an image courtesy NASA.9 tion dates centered in study period (2005 or 2006 versus the fringe years 2004 or 2007), prefer imagery taken by a specific sensor (L7/ETM+ or L5/TM), boosting preference for L5/TM imagery in agricultural regions,4 and maximizing seasonal compatibility with imagery from earlier data set (GeoCover-2000). ETM+ composite criteria include minimizing gaps in data that remain after compositing image pairs, minimizing the temporal difference between the composited images’ acquisition dates, minimizing cloud cover of fill image, and maximizing NDVI value of fill image. Criteria relating pairs of adjacent images include minimizing the temporal difference between the image acquisition date, minimizing the seasonality difference between the images (the days of the year, ignoring the year the images were acquired), and preferring adjacent images acquired from a common sensor (sensor homogeneity), that is, both L5/TM or both L7/ETM+. 86 AI MAGAZINE To assess each candidate image, the image is represented as a vector of metadata of values that score the image on each criterion. The result is a set of 16 criteria for evaluating solutions, as shown in table 1. The table provides a description of the criterion, an associated function (defined below) that allows for the generation of a quality measure for each metadata value input, the weight variable assigned to it, and an example weight value used for generating image maps of North America (discussed further later). Each metadata vector value is associated with a function that is used to evaluate solution quality. We normalize by considering merit values in the range [0, 1] where 0 is worst and 1 is best. This way the objective function is a maximization and always positive. The functions defined in table 1 are interpreted as follows. First, the quality of an individual image can be depicted in terms of two functions: NDVI merit value: ndvi : D → [0, 1]; and cloud cover: Articles Description Weight N. America Weight NDVI value of the base image Criterion Function ndvi(bS(vi,j )) w1 60 Cloud cover of the base image acca(bS(vi,j)) w2 20 NDVI value of the fill image ndvi(f S(vi,j)) w3 30 Cloud cover of the fill image acca(f S(vi,j)) w4 20 Temporal difference between the base and fill images dateΔ(bS(vi,j), f S(vi,j)) w5 10 Area covered with base/fill image composition cover(bS(vi,j), f S(vi,j)) w6 15 Temporal difference between north/south neighbors dateΔ(bS(vi,j), bS(n(vi,j))) w7 0 Temporal difference between east/west neighbors dateΔ(bS(vi,j), bS(e(vi,j))) w8 0 Day of year (seasonal) difference between north/south neighbors doyΔ(bS(vi,j), bS(n(vi,j))) w9 4 Day of year (seasonal) difference between east/west neighbors doyΔ(bS(vi,j), bS(e(vi,j))) w10 4 L5/TM image chosen isL5(bS(vi,j)) w11 0 L7/ETM+ image chosen isL7(bS(vi,j)) w12 10 Sensor homogeneity across neighbors sh(bS(vi,j)) w13 5 Image acquisition date in the range (2005 and 2006) prefYr[05 − 06](bS(vi,j)) w14 10 Day of year (seasonal) difference with GeoCover-2000 data set doyΔ(bS(vi,j), bS (gc00(vi,j))) w15 15 Preference for gap-free (L5/TM) imagery over agricultural areas gfAg(bS(vi,j), vi,j) w16 40 Table 1. GMG Evaluation Criteria. Shown in this table are descriptions, mathematical functions, weight variable names, and weights employed for North America map generation using the GLS-2005 data set. acca : D → [0, 1]. Second, the area coverage function cover : D × D → [0, 1], assigns a value that indicates goodness of fit between a base and fill image used in a composite. Third, there are two functions associated with measuring the time difference between the acquisition of neighboring pairs of images. Absolute day difference, dateΔ: D × D → [0, 1] is the number of days between image acquisition.5 Day of year difference, doy Δ : D × D → [0, 1] is the gap in days (ignoring the year in which an image was acquired). The latter function is used to reward solutions that assign images that are seasonally similar, regardless of year, whereas the former rewards solutions with pairs of images with small temporal distance. To express relative preferences for TM or ETM+ images, the function isL5(bs(v i,j)) : D → {0, 1} assigns 1 to images acquired by TM, and 0 otherwise. Function isL7(bs(vi,j)) is similar for base images acquired by ETM+. Function sh(bs(vi,j)) : D → {0, 0.5, 1} assigns a value representing the degree of sensor homogeneity with the north/east neighboring base images: 0 if different from both, 0.5 if same as exactly one, and 1 if same as both (which means all three neighbors are L5/TM or all are L7/ETM+). prefYr[05 − 06](bs(vi,j)) : D → {0, 1} is true (that is, 1) if the base image comes from the 2005–2006 GLS database. doyΔ(bs(vi,j), gc00i,j ) : D × D → [0, 1] measures seasonal closeness between the candidate image and that of the image gc00i,j selected for the GeoCover-2000 survey (that is, it is preferable to monitor landcover over time between images captured during the same season). Finally, gfAg(bs(vi,j), vi,j) : D → [0, 1] where gfAg(bs(vi,j), vi,j) = Ag(vi,j) ∗ isL5(bs(vi,j)). The function Ag() evaluates to the fraction of landmass of the WRS cell containing significant agricultural area, where 0.0 represents none and 1.0 represents 100 percent. The set of solutions can be ordered in terms of the objective of maximizing individual scene quality while maximizing base or fill phase difference, and minimizing the temporal differences between (the bases of) adjacent images. Given an arbitrary solution s, its score is the value of the weighted summation depicted in figure 2. An optimal solution s* to this GMG problem is one that receives the maximum score based on f. Solving Using Local Search Local search defines a class of approximation algorithms that can find near-optimal solutions within reasonable running times. Given the pair (S, f), where S is the set of solutions and f is the SUMMER 2009 87 Articles f(s) = Σ i,j w 1 ∗ ndvi(bs (vi,j)) + w 2 ∗ acca(bs (vi,j)) + w 3 ∗ ndvi( fs (vi,j)) + w 4 ∗ acca( fs (vi,j)) + w 5 ∗ dateΔ(bs (vi,j), fs(vi,j)) + w 6 ∗ cover(bs (vi,j), fs(vi,j)) + w 7 ∗ dateΔ(bs (vi,j), bs(n(vi,j))) + w 8 ∗ dateΔ(bs (vi,j), bs(e(vi,j))) + w 9 ∗ doyΔ(bs(vi,j), bs(n(vi,j))) + w 10 ∗ doyΔ(bs(vi,j), bs(e(vi,j))) + w 11 ∗ isL5(bs (vi,j)) + w 12 ∗ isL7(bs(vi,j)) + w 13 ∗ sh(bs(vi,j)) + w 14 ∗ prefYr [05 − 06](bs(vi,j)) + w 15 ∗ doyΔ(bs(vi,j), bs(gc00(vi,j))) + w 16 ∗ gfAg(bs(vi,j ), vi,j) space; only problems with small induced width can be solved. Given a set of variables and associated constraints, finding the ordering of the variables with a minimum induced width is an NP-hard problem. Although to our knowledge no proof exists, it appears that the induced width of a constraint graph arranged as a square grid of size n × n is n. This linear growth rate imposes a practical limitation on the size of problems solved in a reasonable time by bucket elimination to roughly n = 30, too small for the GMG problem.6 Hybrid approaches that combine bucket elimination with branch and bound have demonstrated an improvement in performance over pure bucket elimination for problems with a grid structure (Larrosa, Morancho, and Niso 2005). Although these results justify the future application of these methods to the GMG problem, in this effort we did not attempt to solve the problem using a complete method, but rather chose local search. Reasons for Local Search Figure 2. The Function f for Scoring an Arbitrary Solution s to a GMG-COP. f(s) is evaluated as a weighted summation of the scores for s with respect to the component criterion functions defined in table 1. objective function, let S* be the set of best solutions (that is, the ones with the highest score according to f), and f* be the best score. Members of S* are called global optima. Local search iteratively searches through the set S to find a local optimal solution, a solution for which no better can be found. Local optima need not be in general members of S*. Complete Approaches to Solving COPs Computationally, the problem solved by GMG is similar in structure to the problem of assigning frequencies to radio transmitters (Cabon et al. 1999) and other generalizations of the map-coloring problem. There are two complete general constraint-based methods for solving such problems: through search, as with branch-and-bound algorithms; and through variable elimination, for example, using bucket elimination (Dechter 2003). The worst case time and space complexity of the latter is tightly bounded by a parameter of the problem called the induced width, which arises out of an ordering of the variables. Specifically, complexity of bucket elimination is O(n * d w+1), where n is the number of variables, d is the size of the largest domain, and w is the induced width. In practice, the primary drawback in performance is 88 AI MAGAZINE The reasons for adopting a local search method to solving constraint-optimization problems are well documented. They include anytime performance, flexibility and ease of implementation, and the ability to solve large problems. Anytime performance: On average, local search behaves well in practice, yielding low-order polynomial running times (Aarts and Lenstra 1997). Because the criteria space is high-dimensional, it is difficult, from what comes first, to quantitatively characterize globally preferred solutions. Consequently, our customers were interested in a system that could examine large parts of the search space quickly to determine weight settings that produced adequate results. Flexibility and ease of implementation: Our customers required us to build, and demonstrate the advantages of, automated solutions in a short period of time (2 months). Local search can be easily implemented. Ability to solve large problems: As optimization problems go, the GMG-COP can be considered large. Local search has been shown to be effective on large problems. There are also domain-specific reasons for choosing local search. Specifically, since the cloudassessment (ACCA) statistic is a close but imperfect metric of cloud truth, particularly around coastlines, it was felt that it made more sense to have a solver that could generate multiple solutions easily and allow humans to conduct further evaluations of the solutions and, where necessary, tweak them. A complete solver, which would likely take significantly longer in generating a single solution, would make such an iterative process less viable. Articles Implementation To conduct the search, local search relies on the notion of a neighborhood function, N : S → S, a function that takes one solution (called the current solution) and returns a new solution that differs from the current in some small way. To be effective, a neighborhood function should be simple; it should not require a lot of time to compute. For GMG-COP, the neighborhood function randomly selects a cell and replaces the selected image with a new one. A neighborhood function is exact if each local optimum it finds as the result of search is globally optimal. The neighborhood function used to solve GMG-COP is not exact. Designing a local search algorithm is based on deciding three components: how an initial solution, or seed, is generated, how to select a neighboring solution, and when to terminate search. For the GMG solver described in this paper, we took a simple approach to deciding these issues, reasoning that complexity should be introduced only as needed, that is, only as warranted by inferior performance of simpler approaches, as expressed by the customers. First, a good design for a seed generator is one that intuitively starts in a good location in the search space. A good location is one that is relatively close to optimal solutions, where close is measured by the length of the path from it to an optimal solution using the neighborhood function. For the GMG-COP, we chose a seed that picks the highest individual quality image for each cell, ignoring preferences related to adjacency. This seed is easy to generate (there is no need to consider adjacency constraints) and should be a good quality solution because it favors cloud-free images with high NDVI value. Second, choosing a neighboring solution requires, first, choosing which cell to change. The simplest approach is to pick the cell at random. Since local search is memoryless, in the sense that it does not keep track of where it’s been previously, it may not be able in general to avoid examining the same solution multiple times. To avoid this, sometimes algorithms have taboo lists, lists of variables recently chosen to change. Variables are put on the list after being chosen and eventually taken off after some number of iterations. Variables on the list can’t be selected on a given iteration. In our implementation we applied an extreme case of taboo list: once a scene is selected for examination, it is immediately placed on the taboo list to allow for all other scenes to be examined in the current iteration (the ordering of scene selection is random). Third, given a selected cell, there are also a number of ways to select among the set of neighboring solutions based on changes made to that cell. Some are deterministic; that is, given the same decision to make, the algorithm will make the same choice each time. Others are nondeterministic. Algorithms such as simulated annealing and genetic algorithms are nondeterministic. Initially, we opted for a deterministic approach, of which there are two kinds: first improvement or best improvement. First improvement examines neighbors, in a local search sense, until one is found that is better than the current solution; that one becomes the new current solution. Best improvement examines all the neighbors and picks the one that improves upon the current solution the most. Either of these generates a greedy approach, one that always chooses an improving solution. A variation of best improvement is where a neighbor with the best score is chosen, even if the score is worse than the score of the current solution. This approach allows for the possibility that a globally optimal solution may not be on the greedy path from an initial seed solution. Finally, choosing a termination condition requires deciding how many solutions will be generated before the algorithm halts. The simplest approach will be to define a termination condition that says halt when you reach the first locally optimal solution or after a fixed number of solutions, MAX, have been generated, whichever comes first. A slightly more sophisticated version of this simple local search is called multistart: here, for some fixed number of runs, we start with different initial (seed) solutions. Such initial solutions can be fully randomly generated (our implementation), semirandomly generated, or deterministically generated. An example of deterministically generated initial solutions employed here is to assign to each scene the best self-quality image/pair. Alternatively, the local optimum of one run of simple local search can be used as the initial solution for the next run. Testing and Results Testing the GMG-COP occurred in two stages. First, we compared different variations in multistart local search to determine the best performing algorithm. Four variations were tested, based on two variations of two criteria: the initial solution and the choice of neighbor. The initial solutions tried were a randomly generated solution and the solution consisting of the set of images that scored highest individually (that is, with respect to cloud cover, NDVI, and base–fill quality). The choice of neighbor was either done on a first improvement basis, that is, the first alternative that improved the overall score, or best improvement basis, that is, of all the images, selecting the one that most improved the score. The results indicate that the best strategy for finding high-quality solutions is through exploration: with a random initial solution, and a best improvement neighbor selection, SUMMER 2009 89 Articles progress was quickly made towards solutions with higher quality than those found by the other approaches. We speculate that a random seed works better than one based on individual scene quality because the latter forced the search into a local optimum that was not globally optimal. In the other stage, we were interested in the extent of the improvement offered by an automated solution over current practice, which consists of manually generating solutions. Towards this end, tests were conducted by the customers at USGS and the Landsat mission using the GeoCover2000 (GC2K) data set. The results showed that GMG, implemented as a simple algorithm that we later improved significantly, produced a solution that was 23 percent better quality than the manually generated solution, based on the objective function scores. The customers viewed this result as significant enough to warrant integration and deployment of the solver. On the complete GLS-2005 data set, the GMG solver converges to a solution in about a minute. A more detailed discussion of experiments during GMG development is found in Khatib et al. (2007) and Morris et al. (2008). Large Area Scene Selection Interface (LASSI) As noted previously, the GMG solver arrives at its solution based on input consisting of a metadata representation of an image. Metadata furnish a low-fidelity, quantified assessment of several image criteria, such as cloud contamination and vegetation maturity (NDVI). Each metadata metric is a global average assessment over the entire image area, but each metric is subject to its own systematic errors. For example, in Franks et al. (2008), the ACCA algorithm for generating cloudcover metadata sometimes is not able to detect prevalent cirrus clouds, haze, or forest fire smoke. Visual inspection, on the other hand, can easily detect these scene imperfections. In general, a GMG solution is only as good as the metadata it uses to select the corresponding image, so noisy input can translate into a less than ideal solution. To address the reality of these potential sources of suboptimal solutions, GMG is embedded into a graphical user interface and visualization tool known as LASSI (large area scene selection interface).7 LASSI allows users to adjust objective function criteria weights prior to launching the GMG solver, visually assess mosaic thumbnail image renderings of the GMG solutions, examine quality of solutions with respect to each objective function criterion, manually disqualify individual images from consideration by GMG based on visual assessment, manually override images selected by GMG based on visual assessment and external 90 AI MAGAZINE knowledge, and remotely access and view browse imagery for each WRS cell from the USGS Landsat image archive. In the following, we illustrate these capabilities in the process of scene selection for GLS-2005, summarizing some of the details found in Franks et al. (2008). Setting Weights for GLS-2005 Scene selection for GLS-2005 often involved running GMG separately for different regions of the Earth, with different weight assignments to criteria based on different climate and other relevant differences among the regions. The user initializes these weights somewhat by educated trial and error. The relative weights are influenced by the depth of the candidate image pool, the land cover characteristics, and seasonality. A deeper candidate pool allows the user to set the weights more aggressively. In dry regions, the ACCA weights can be relaxed in favor of more aggressive NDVI. Conversely, in tropical regions the NDVI and temporal factors can be downweighted in favor of more aggressive ACCA. For agricultural regions, we prefer L5/TM where available. For other temperate regions, the temporal factors are more important, trading slight degradation of ACCA or NDVI in favor of selections that minimize bordering scene discontinuities. For example, table 1 shows weight assignments eventually chosen for North American scene selection. This assignment reflects the relative importance of the vegetation (NDVI) criteria, w1 and w3, due to the primary application of the data for landuse change studies. Cloud-cover weights (ACCA), w2 and w4, could be relatively lower because the average cloud cover of the data set was low, and hence this criterion could be fairly easily satisfied. The two distinct values for acquisition date difference between base and filler images arose from the importance of a complete image in areas of rapid land-use change, such as croplands. In general, the goal for GLS-2005 data was to achieve a 95 percent coverage with each base-pair pair. For more details on the interpretation of the weight assignments, see Franks et al. (2008), which focuses in depth on the user perspective. Viewing Solution Quality Maps The LASSI interface allows users to examine the quality of each GMG-generated solution on each individual criterion. Each metadata map portrays a metadata criterion in color gradients on a WRS-2 map grid. These maps enable the user to assess the quality of the solution with respect to a single dimension. Figure 3 is one such view, in this instance, of the NDVI metric. Similar metadata maps are available for sensor (discriminates Landsat 5 TM versus Landsat 7 ETM+ images in the Articles Figure 3. LASSI Screen Capture. This screen capture shows NDVI quality of the GMG-generated solution after optimizing over all criteria. Lighter color signifies better quality image of the displayed criterion. solution set); day of year (relative time of year of base acquisitions); year (acquisition year); ACCA (cloud assessment, measure of cloud pixels— domain 0 to 10 percent); NDVI (NDVI, normalized with respect to peak NDVI by WRS scene); raw NDVI (nonnormalized—raw—NDVI); preference year (distinguishes whether the chosen image was acquired within the middle two years, such as 2005 or 2006, or the fringe years, such as 2004 or 2007); preferred day of year (depicts the seasonal temporal difference between acquisition dates of the chosen image with respect to the date of the corresponding WRS scene in the GeoCover-2000 data set; minimizing this time difference improves the utility of the two data sets for trending analysis); northern neighbor (depicts the temporal difference between neighboring along-track scenes, north to south; minimizing this difference reduces potential discontinuities in the resulting endproduct map); and eastern neighbor (depicts the temporal difference between neighboring adjacent-path scenes, east to west). Other metadata views depict the quality of images chosen to fill gaps in the L7/ETM+ images. These include quality of the ACCA fill, the cloud assessment of gap fill images; coverage, that is, the relative success of gap-filling (100 percent coverage is desired for each base-fill image pair); NDVI difference, that is, for each base-fill image pair, the relative difference in seasonality between these images (if the difference is too great, the pair may produce a composited image with undesirable artifacts); and temporal difference, that is, for each base-fill image pair, the relative difference in acquisition dates between these images (if the difference is too great, the pair may result in artifacts similar SUMMER 2009 91 Articles Figure 4. GMG Optimized Images of Southwest United States. This screen shot shows a portion of the GMG-optimized solution of North America. For a selected thumbnail, the LASSI interface shows all candidate images along the bottom of the screen. User may override the GMG-selected image by choosing one of the alternatives. to NDVI differences, in addition to differences in sun angle). Viewing Thumbnail Image Mosaics After a solution has been produced by GMG, the user may use the LASSI interface to view thumbnails of the selected scenes. By double-clicking any metadata map, LASSI produces a mosaic map of thumbnail browse images for that area. The user may also view a full-screen browse image of any thumbnail. This is especially useful for revealing cloud-contaminated images that may not be accurately represented in the metadata ACCA criterion, as explained above. This mosaic map display also enables the user to view the chosen ETM+ base and fill images for sideby-side comparison, as seen in figure 4. A small 92 AI MAGAZINE window in the lower right of this display plots the monthly NDVI of this scene with markers showing the relative acquisition time of year of the base and fill images (where applicable). Adjacent to each thumbnail is a list of metadata criteria and values. Finally, the bottom of the screen features a horizontally scrolling list of thumbnails for all candidate images of any selected WRS grid cell. From this list, the user may manually override the original GMG selection by choosing alternate acquisitions for the base or the fill images. Global Scene Generation Process Global scene generation using GMG/LASSI is an iterative process. The user initializes the objective function weight parameters and then invokes GMG to produce a strawman solution. After exam- Articles Figure 5. Screen Shot of North America NDVI Metadata Map. Taken from a GMG produced solution that was optimized only on cloud cover (ACCA) criteria. ining the metadata maps, the user tweaks these weights if necessary to compromise in some dimensions to improve others. In the end, the user will view the solution’s thumbnail mosaic map and manually fine-tune it if necessary to eliminate imagery with popcorn clouds, contrails, snow, or other contaminants that may not have been accounted for in the metadata. For the 2005 global map construction, the project has elected to pursue the problem independently for each continental landmass. This way the GMG objective function weights may be tailored for each global region. The 2005 scene selection of North America using GMG/LASSI was completed first. Then, by the end of 2008, the scene selection of the rest of the global continental landmass was completed.8 Compromise among Competing Criteria With multicriteria optimization, we often encounter the problem of the negative interactions among criteria, that is, when increasing the quality of a solution with respect to one criterion causes a decreased quality with respect to another. For example, when only the ACCA criterion was preferred, GMG generated a map for North America that is suboptimal in NDVI. See figure 5 and compare it to figure 3 where optimization occurred jointly over both factors (in addition to all other factors). Note the existence of darker areas, which indicate lower quality assignments. Figure 6 shows the images (and interface) for the case of preferring only ACCA but neglecting all other factors. The resulting map shows seasonal discontinuities, par- SUMMER 2009 93 Articles Figure 6. Screen Shot of Southeast United States Thumbnail Images. Taken from a GMG produced solution that was optimized only on cloud cover (ACCA) criteria. ticularly notable by the presence of snow cover in some images. A comparison to figure 4 suggests that the marginal improvement over cloud coverage (ACCA) is a poor trade for the deterioration in seasonality quality (NDVI). Such a detrimental effect on NDVI, which was observed for all other factors as well, supports the argument that all factors must be considered simultaneously when building the best map. It is apparent from confronting the GMG problem that automated solvers are better suited to evaluate the effects of the interactions among multiple criteria (more than two) than humans. Conclusions This article has described an approach for generating high-quality global image maps that incorpo- 94 AI MAGAZINE rates an automated COP local search solver into a robust visual interface that allows users to manually manipulate solution images. Using the solver to manage the GLS-2005 survey yielded measurable improvements in the quality of global image maps, with a beneficial reduction in the efforts required to produce those maps. GLS-2005 scene selection was complicated by the utilization of multiple sensors, as well as requirements for gapfilling, which were not considerations in prior global map productions such as GeoCover-2000. A completely manual solution to optimal global map generation would have been infeasible. The combined approach we employed, using automation augmented by human guidance, proved to be a feasible compromise. Based on a successful application of the GMG on the middecadal global Landsat data set, the GMG will be used for additional data set projects as well. Articles One such mapping application has to do with the USGS goal to create a state mosaic for all 50 states in the United States. The use of GMG has the potential of automating a large part of that effort. Due to the scanner failure on L7, GMG can greatly reduce the labor necessary to exploit the Landsat data archive. The L5 and L7 remote sensing spacecrafts are expected to continue service into 2012, and GMG offers potential ongoing benefit with its ability to rapidly explore and assess large numbers of candidate solutions for regional and global Earth science studies and other mapping applications. USGS and NASA are already planning for the next decadal survey, GLS-2010. That plan includes leveraging on the success of LASSI and GMG. The objective function criteria and aspects of the interface occasionally undergo refinements based on evolving customer requirements. For example, the preference for L5/TM imagery over agricultural scenes was a late enhancement after preliminary North America map production revealed distinct crop maturity discontinuities in farmland where GMG had selected L7/ETM+ gap-fill composite pairs. Another recent change allows for more human intervention into the solution generated. For example, as the result of a recent update, if a user manually selects an L7 base-fill image pair, then the automated solver is not allowed to alter that selection or reverse the base-fill images. The global map-generation problem provides an ideal domain for testing and evaluating constraint-based optimization solvers. Furthermore, the GMG solver is of significant potential benefit to the Earth science research community, allowing scientists access to improved automated tools to study the Earth’s changing ecosystem. There are future plans to apply the approach described in this paper to generating complete moon maps using Clementine image data. Acknowledgements Thanks to Steven Covington and Jeremy Frank for their contributions to this work, and also to the anonymous referees for their helpful suggestions on improving the presentation. Notes 1. Land-Cover and Land-Use Change (lcluc.umd.edu). 2. USGS Earth Resources and Observation Science (eros.usgs.gov). 3. L5 and L7 follow the WRS-2 coordinate system for indexing locations on the Earth where data is acquired. WRS-2 indexes a location through a set of paths and rows, with a 16-day repeat cycle. L5 follows the WRS-2 system with a temporal offset of 8 days relative to L7. The WRS-2 indexes orbits (paths) and scene centers (rows) into a global grid system (day time and night time) of 233 paths by 248 rows. We refer to each path, row element as a scene location. See figure 1. 4. L5 imagery is preferred, where available, over predominantly agricultural regions because temporal differences between L7/ETM+ base and fill images result in more pronounced and problematic artifacts in homogeneous farmland. 5. Same interpretation for the temporal difference between base and fill images criterion. 6. As noted in personal correspondence with Javier Larrosa in 2007. 7. LASSI is currently tailored toward Landsat imagery, but it is feasible to customize it to work with other imagery sources. 8. Availability of data to date can be found at landsat.usgs.gov/science_GLS2005.php. 9. Taken from the NASA Landsat Science Data Users Handbook (http://landsathandbook.gsfc.nasa.gov/handbook/handbook_htmls/chapter5/chapter5.html) References Aarts, E., and Lenstra, J. K. 1997. Local Search in Combinatorial Optimization. New York: John Wiley and Sons. Cabon, B.; de Givry, S.; Lobjois, L.; Schiex, T.; and Warners, J. P. 1999. Radio Link Frequency Assignment. Constraints 4(1): 78– 87. Dechter, R. 2003. Constraint Processing. San Francisco: Morgan Kaufmann. Franks, S.; Masek, J.; Headley, R.; Gasch, J.; Covington, S.; and Arvidson, T. 2008. Large Area Scene Selection Interface (LASSI): Methodology of Selecting Landsat Imagery for the Global Land Survey 2005. Unpublished paper. Khatib, L.; Gasch, J.; Morris, R.; and Covington, S. 2007. Local Search for Optimal Global Map Generation Using Mid-Decadal Landsat Images. In Preference Handling for Artificial Inteligence: Papers from the 2007 AAAI Workshop. Technical Report WS-0710. Menlo Park, CA: AAAI Press. Larrosa, J., and Dechter, R. 2003. Boosting Search with Variable Elimination in Constraint Optimization and Constraint Satisfaction Problems. Constraints 8(3): 303– 326. Larrosa, J.; Morancho, E.; and Niso, D. 2005. On the Practical Use of Variable Elimination in Constraint Optimization Problems: Still-life as a Case Study. JAIR 23: 421– 440. Morris, R.; Khatib, L.; Gasch, J; and Covington, S. 2008. Automated Global Map Generation Using Mid-Decadal Landsat Images. Ninth International Symposium on Artificial Intelligence, Robotics and Automation in Space (iSAIRAS). Pasadena, CA: Jet Propulsion Laboratory, California Institute of Technology. Robert Morris is a researcher in the automated planning and scheduling group in the Intelligent Systems Division at NASA Ames Research Center. For many years he has served as principal investigator on projects dealing with applying automated information technologies for sensor web coordination. He is currently the principal (or coprincipal) investigator on a number of Earth science projects for building applications of automated planning and scheduling technologies. His research interests include representing and reasoning about time, constraint-based planning, and developing search techniques for constraint optimization. John Gasch is a systems engineer at NASA Goddard Space Flight Center, presently as a senior mission analyst for the Landsat Flight Operations Team. Gasch has contributed to the design and development of numerous mission planning and scheduling systems for NASA and USGS Earth science missions, as well as electronic countermeasures systems for DOD. He received a B.S. in computer science from the University of Maryland and an M.S. in computer science from Johns Hopkins University. His concentration is on maximizing the utility of Earth observation remote sensing satellite systems and improving mission operations efficiency. Lina Khatib is a senior research engineer and scientist with SGT Inc. at NASA Ames Research Center where, since 1999, she has worked in the Planning and Scheduling group, Intelligent Systems Division, on various projects in intelligent automation for space and Earth sciences. Her Ph.D. is in computer science with expertise in constraint reasoning and preferences, heuristic search, temporal reasoning, intelligent planning and scheduling, and problem solving. SUMMER 2009 95