Local Search for Optimal Global Map Generation Using Middecadal Landsat Images

advertisement
Articles
Local Search for Optimal
Global Map Generation Using
Middecadal Landsat Images
Robert A. Morris, John Gasch, and Lina Khatib
n NASA and the United States Geological Survey (USGS) are collaborating to produce a global map of the Earth using Landsat 5 Thematic
Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) remote sensor data
from the period of 2004 through 2007. The
map is composed of thousands of scene locations, and for each location there are tens of different images of varying quality to choose from.
Constraints and preferences on map quality
make it desirable to develop an automated solution to the map-generation problem. This article formulates a global map-generator problem
as a constraint-optimization problem (GMGCOP) and describes an approach to solving it
using local search. The article also describes the
integration of a GMG solver into a graphical
user interface for visualizing and comparing
solutions, thus allowing for solutions to be generated with human participation and guidance.
84
AI MAGAZINE
Overview and Motivation
NASA’s LCLUC1 program is partnering with the EROS2 Data
Center to produce a high-resolution mosaic map of the Earth.
The map will consist of a data set of high-quality images of the
Earth’s continental landmass using Landsat 5 (L5) Thematic
Mapper (TM) and Landsat 7 (L7) Enhanced Thematic Mapper
Plus (ETM+) sensor data from the middecadal period of 2004
through 2007. This project is known as the Global Land Survey
2005 (GLS-2005). The primary purpose of such maps is to facilitate monitoring of global changes in land cover by Earth scientists.
The end product will be composed of roughly 9500 Worldwide Reference System 2 (WRS-2)3 Landsat scene locations;
there are typically 10 or more high-quality candidate images
available for each scene location. Eventually, more than
300,000 images must be evaluated and down-selected to create
the final survey data set. The resulting data map will be distributed to the public at no charge through a USGS website. In addition to providing benefits to researchers in the Earth sciences, it
will likely become the next-generation backdrop for GoogleEarth (which currently uses the GeoCover-2000 data set).
A collection of diverse preference criteria defines a high-quality image map. First, a good map will typically consist of the
best (most cloud-free) image data available per scene. The metric employed for this measure is the automated cloud-cover
assessment (ACCA), a statistic derived from an algorithm that
identifies clouds from data through the difference in mean temperature with Earth’s surface. Second, as the majority of Earth
Copyright © 2009, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602
Articles
science applications deal with health and density
of agricultural and other vegetative land cover,
images taken during peak vegetation maturity are
preferred. The metric employed for this purpose is
the normalized difference vegetation index
(NDVI). NDVI represents the historical average
vegetation density and maturity within each global land scene cell by calendar month. Thus, the
preference is toward images taken during seasons
with high NDVI. Third, to be usable for regional
scientific studies, it is preferable to choose image
data that are seasonally consistent with neighboring scenes. Fourth, to accommodate land-cover
and land-use change analysis, there is a preference
for images that were taken in the same season as
high-quality images from previous survey data
sets, for example, from the GeoCover-2000 set.
Finally, due to a malfunction of the image scanner
on L7, ETM+ produces imagery that has coverage
discontinuities such that an individual image covers only 78 percent of the land area. To compensate, two images of the same scene taken on different days must be combined to produce a
composite image that partially or fully closes the
gaps. Pairs of images of a common scene must
therefore be chosen to maximize coverage (minimize gap), which means the two scenes should be
mutually out of phase. Each image is assigned a
gap–phase statistic, or GPS, which is an absolute
measure of the geometric registration of the image
scan line with respect to the scene center point.
Such GPS values are used to compute the area coverage of composite images.
The size of the space of candidate global image
maps, as well as the number of criteria for quality,
make manual scene selection difficult and tedious.
Effective manual scene selection would require a
visual comparison of thousands of image map
mosaics, looking for defects to image quality (such
as cloud cover) or smoothness of the mosaic (for
example, due to adjacent images being taken in
different seasons). For this reason, the Global Land
Survey team sought the use of automation to assist
in the process of generating and comparing candidate image maps for quality. In particular, it
became evident that it is straightforward to map
the scene selection process for a global image map
into a standard constraint-optimization problem,
for which effective automated solvers exist.
This article presents a formulation of the mapgeneration problem as a constraint-optimization
problem and describes an approach to solving the
problem using local search. The next section
describes the problem in more technical detail. We
then describe a local search approach to solving
the problem. Finally, we discuss the user interface,
large area scene selection interface (LASSI), into
which the solver is integrated.
Global Map Generation as a
Constraint-Optimization Problem
Constraint optimization defines a set of approaches to solving a wide range of computationally hard
problems. Constraint-optimization problems
(COPs) generalize constraint-satisfaction problems
(CSPs). A CSP consists of a set of variables and associated domains that provide candidate values to
the variables. A set of constraints defines relationships among values over subsets of the variables.
Solutions to a CSP are complete assignments of
domain values to the variables that satisfy the constraints. A COP (Larrosa and Dechter 2003) is a CSP
that contains an objective function for ordering
the set of solutions in terms of a measure of quality. Often the objective function is based on a representation of a distinction between hard and soft
constraints. Hard constraints must be satisfied by
any solution. Soft constraints can be used to represent preferences for value combinations among
subsets of variables. The objective function value
for a solution can then be expressed as a combination (for example, sum) of the preference values for
the soft constraints.
Global map generation (GMG) can be viewed as
a constraint-optimization problem (GMG-COP),
with a set of variables V = {vi,j} indexed by WRS-2
path and row number i, j. Each variable vi,j thus
represents a scene location and is associated with a
domain Di,j = {di,j,1, . . . , di,j,m}, where each di,j,k represents a TM or ETM+ image taken at that location.
The WRS organization of scene locations into path
and row induces a grid or lattice structure of binary relations between scene locations (see figure 1).
Because of the symmetry of adjacency, it suffices to
represent the set of adjacent scenes in terms of two
functions, n : Vi,j → Vi,j− 1 (north neighbor) and e :
Vi,j → Vi− 1,j (east neighbor), which return the variable corresponding to the scene that is north (east)
of the designated variable.
A solution s to the GMG-COP is a set of assignments s = {vi,j ← 冬di,j,k, di,j,l冭}. The need for a pair of
images arises from the L7/ETM+ gap anomaly. One
partial image is called the base, which covers
approximately 78 percent of the WRS scene area.
The other partial image is called the fill. If the base
is a TM image from L5, where there are no missing
image data, we set di,j,l = di,j,k by convention. For an
arbitrary solution s, we write bs(vi,j), fs(vi,j) for the
base and fill values for the scene vi,j assigned by s.
The GMG problem is a multiobjective optimization problem, in which a set of potentially competing preference criteria are used to evaluate and
compare solutions. The preference criteria are single image, ETM+ composite, and criteria relating
pairs of adjacent images.
Single image criteria include minimizing cloud
cover, maximizing NDVI value, preferring acquisi-
SUMMER 2009 85
Articles
Image Scene Centers
WRS Path Number
20
19
18
30
31
Typical
Scene Area
Land area
covered in each
scene varies by
latitude
32
Image paths are numbered
westward with path 001
passing through Greenland
and eastern South America
WRS
Row Number
Image rows are numbered
southward beginning
at latitutde N80 degrees,
with row 60 centered
on the equator
Figure 1. The Worldwide Reference System.
The WRS coordinate system for uniquely identifying scene locations on the Earth for Landsat remote sensing in terms of a system of 233
paths and 248 rows. The slanted vertical line represents the satellite track (path). The horizontal line represents a row. Each rectangular scene
is identified by a unique path, row number. In a COP-GMG problem, each variable is associated with a unique scene location, and the
domain of each variable is the set of images captured by either L5 or L7 for that scene. The result is a grid organization for the constraints
and preferences that determine global map quality. Figure redrawn from an image courtesy NASA.9
tion dates centered in study period (2005 or 2006
versus the fringe years 2004 or 2007), prefer
imagery taken by a specific sensor (L7/ETM+ or
L5/TM), boosting preference for L5/TM imagery in
agricultural regions,4 and maximizing seasonal
compatibility with imagery from earlier data set
(GeoCover-2000).
ETM+ composite criteria include minimizing
gaps in data that remain after compositing image
pairs, minimizing the temporal difference between
the composited images’ acquisition dates, minimizing cloud cover of fill image, and maximizing
NDVI value of fill image.
Criteria relating pairs of adjacent images include
minimizing the temporal difference between the
image acquisition date, minimizing the seasonality difference between the images (the days of the
year, ignoring the year the images were acquired),
and preferring adjacent images acquired from a
common sensor (sensor homogeneity), that is,
both L5/TM or both L7/ETM+.
86
AI MAGAZINE
To assess each candidate image, the image is represented as a vector of metadata of values that
score the image on each criterion. The result is a
set of 16 criteria for evaluating solutions, as shown
in table 1. The table provides a description of the
criterion, an associated function (defined below)
that allows for the generation of a quality measure
for each metadata value input, the weight variable
assigned to it, and an example weight value used
for generating image maps of North America (discussed further later).
Each metadata vector value is associated with a
function that is used to evaluate solution quality.
We normalize by considering merit values in the
range [0, 1] where 0 is worst and 1 is best. This
way the objective function is a maximization and
always positive.
The functions defined in table 1 are interpreted
as follows. First, the quality of an individual image
can be depicted in terms of two functions: NDVI
merit value: ndvi : D → [0, 1]; and cloud cover:
Articles
Description
Weight
N. America Weight
NDVI value of the base image
Criterion Function
ndvi(bS(vi,j ))
w1
60
Cloud cover of the base image
acca(bS(vi,j))
w2
20
NDVI value of the fill image
ndvi(f S(vi,j))
w3
30
Cloud cover of the fill image
acca(f S(vi,j))
w4
20
Temporal difference between the base and fill images
dateΔ(bS(vi,j), f S(vi,j))
w5
10
Area covered with base/fill image composition
cover(bS(vi,j), f S(vi,j))
w6
15
Temporal difference between north/south neighbors
dateΔ(bS(vi,j), bS(n(vi,j)))
w7
0
Temporal difference between east/west neighbors
dateΔ(bS(vi,j), bS(e(vi,j)))
w8
0
Day of year (seasonal) difference between north/south neighbors
doyΔ(bS(vi,j), bS(n(vi,j)))
w9
4
Day of year (seasonal) difference between east/west neighbors
doyΔ(bS(vi,j), bS(e(vi,j)))
w10
4
L5/TM image chosen
isL5(bS(vi,j))
w11
0
L7/ETM+ image chosen
isL7(bS(vi,j))
w12
10
Sensor homogeneity across neighbors
sh(bS(vi,j))
w13
5
Image acquisition date in the range (2005 and 2006)
prefYr[05 − 06](bS(vi,j))
w14
10
Day of year (seasonal) difference with GeoCover-2000 data set
doyΔ(bS(vi,j), bS (gc00(vi,j)))
w15
15
Preference for gap-free (L5/TM) imagery over agricultural areas
gfAg(bS(vi,j), vi,j)
w16
40
Table 1. GMG Evaluation Criteria.
Shown in this table are descriptions, mathematical functions, weight variable names, and weights employed for North America map generation using the GLS-2005 data set.
acca : D → [0, 1]. Second, the area coverage function cover : D × D → [0, 1], assigns a value that indicates goodness of fit between a base and fill image
used in a composite. Third, there are two functions associated with measuring the time difference between the acquisition of neighboring pairs
of images. Absolute day difference, dateΔ: D × D →
[0, 1] is the number of days between image acquisition.5 Day of year difference, doy Δ : D × D →
[0, 1] is the gap in days (ignoring the year in which
an image was acquired). The latter function is used
to reward solutions that assign images that are seasonally similar, regardless of year, whereas the former rewards solutions with pairs of images with
small temporal distance. To express relative preferences for TM or ETM+ images, the function
isL5(bs(v i,j)) : D → {0, 1} assigns 1 to images
acquired by TM, and 0 otherwise. Function
isL7(bs(vi,j)) is similar for base images acquired by
ETM+. Function sh(bs(vi,j)) : D → {0, 0.5, 1} assigns
a value representing the degree of sensor homogeneity with the north/east neighboring base
images: 0 if different from both, 0.5 if same as
exactly one, and 1 if same as both (which means
all three neighbors are L5/TM or all are L7/ETM+).
prefYr[05 − 06](bs(vi,j)) : D → {0, 1} is true (that is,
1) if the base image comes from the 2005–2006
GLS database. doyΔ(bs(vi,j), gc00i,j ) : D × D → [0, 1]
measures seasonal closeness between the candidate image and that of the image gc00i,j selected
for the GeoCover-2000 survey (that is, it is preferable to monitor landcover over time between
images captured during the same season). Finally,
gfAg(bs(vi,j), vi,j) : D → [0, 1] where gfAg(bs(vi,j), vi,j)
= Ag(vi,j) ∗ isL5(bs(vi,j)). The function Ag() evaluates
to the fraction of landmass of the WRS cell containing significant agricultural area, where 0.0 represents none and 1.0 represents 100 percent.
The set of solutions can be ordered in terms of
the objective of maximizing individual scene quality while maximizing base or fill phase difference,
and minimizing the temporal differences between
(the bases of) adjacent images. Given an arbitrary
solution s, its score is the value of the weighted
summation depicted in figure 2. An optimal solution s* to this GMG problem is one that receives
the maximum score based on f.
Solving Using Local Search
Local search defines a class of approximation
algorithms that can find near-optimal solutions
within reasonable running times. Given the pair
(S, f), where S is the set of solutions and f is the
SUMMER 2009 87
Articles
f(s) = Σ i,j
w 1 ∗ ndvi(bs (vi,j))
+ w 2 ∗ acca(bs (vi,j))
+ w 3 ∗ ndvi( fs (vi,j))
+ w 4 ∗ acca( fs (vi,j))
+ w 5 ∗ dateΔ(bs (vi,j), fs(vi,j))
+ w 6 ∗ cover(bs (vi,j), fs(vi,j))
+ w 7 ∗ dateΔ(bs (vi,j), bs(n(vi,j)))
+ w 8 ∗ dateΔ(bs (vi,j), bs(e(vi,j)))
+ w 9 ∗ doyΔ(bs(vi,j), bs(n(vi,j)))
+ w 10 ∗ doyΔ(bs(vi,j), bs(e(vi,j)))
+ w 11 ∗ isL5(bs (vi,j))
+ w 12 ∗ isL7(bs(vi,j))
+ w 13 ∗ sh(bs(vi,j))
+ w 14 ∗ prefYr [05 − 06](bs(vi,j))
+ w 15 ∗ doyΔ(bs(vi,j), bs(gc00(vi,j)))
+ w 16 ∗ gfAg(bs(vi,j ), vi,j)
space; only problems with small induced width
can be solved.
Given a set of variables and associated constraints, finding the ordering of the variables with
a minimum induced width is an NP-hard problem.
Although to our knowledge no proof exists, it
appears that the induced width of a constraint
graph arranged as a square grid of size n × n is n.
This linear growth rate imposes a practical limitation on the size of problems solved in a reasonable
time by bucket elimination to roughly n = 30, too
small for the GMG problem.6 Hybrid approaches
that combine bucket elimination with branch and
bound have demonstrated an improvement in performance over pure bucket elimination for problems with a grid structure (Larrosa, Morancho, and
Niso 2005). Although these results justify the
future application of these methods to the GMG
problem, in this effort we did not attempt to solve
the problem using a complete method, but rather
chose local search.
Reasons for Local Search
Figure 2. The Function f for Scoring an
Arbitrary Solution s to a GMG-COP.
f(s) is evaluated as a weighted summation of the scores for s with respect to
the component criterion functions defined in table 1.
objective function, let S* be the set of best solutions (that is, the ones with the highest score
according to f), and f* be the best score. Members
of S* are called global optima. Local search iteratively searches through the set S to find a local optimal solution, a solution for which no better can be
found. Local optima need not be in general members of S*.
Complete Approaches to Solving COPs
Computationally, the problem solved by GMG is
similar in structure to the problem of assigning frequencies to radio transmitters (Cabon et al. 1999)
and other generalizations of the map-coloring
problem. There are two complete general constraint-based methods for solving such problems:
through search, as with branch-and-bound algorithms; and through variable elimination, for
example, using bucket elimination (Dechter 2003).
The worst case time and space complexity of the
latter is tightly bounded by a parameter of the
problem called the induced width, which arises out
of an ordering of the variables. Specifically, complexity of bucket elimination is O(n * d w+1), where
n is the number of variables, d is the size of the
largest domain, and w is the induced width. In
practice, the primary drawback in performance is
88
AI MAGAZINE
The reasons for adopting a local search method to
solving constraint-optimization problems are well
documented. They include anytime performance,
flexibility and ease of implementation, and the
ability to solve large problems.
Anytime performance: On average, local search
behaves well in practice, yielding low-order polynomial running times (Aarts and Lenstra 1997).
Because the criteria space is high-dimensional, it is
difficult, from what comes first, to quantitatively
characterize globally preferred solutions. Consequently, our customers were interested in a system
that could examine large parts of the search space
quickly to determine weight settings that produced
adequate results.
Flexibility and ease of implementation: Our customers required us to build, and demonstrate the
advantages of, automated solutions in a short period of time (2 months). Local search can be easily
implemented.
Ability to solve large problems: As optimization
problems go, the GMG-COP can be considered
large. Local search has been shown to be effective
on large problems.
There are also domain-specific reasons for
choosing local search. Specifically, since the cloudassessment (ACCA) statistic is a close but imperfect
metric of cloud truth, particularly around coastlines, it was felt that it made more sense to have a
solver that could generate multiple solutions easily and allow humans to conduct further evaluations of the solutions and, where necessary, tweak
them. A complete solver, which would likely take
significantly longer in generating a single solution,
would make such an iterative process less viable.
Articles
Implementation
To conduct the search, local search relies on the
notion of a neighborhood function, N : S → S, a function that takes one solution (called the current solution) and returns a new solution that differs from
the current in some small way. To be effective, a
neighborhood function should be simple; it
should not require a lot of time to compute. For
GMG-COP, the neighborhood function randomly
selects a cell and replaces the selected image with a
new one. A neighborhood function is exact if each
local optimum it finds as the result of search is
globally optimal. The neighborhood function used
to solve GMG-COP is not exact.
Designing a local search algorithm is based on
deciding three components: how an initial solution, or seed, is generated, how to select a neighboring solution, and when to terminate search. For
the GMG solver described in this paper, we took a
simple approach to deciding these issues, reasoning that complexity should be introduced only as
needed, that is, only as warranted by inferior performance of simpler approaches, as expressed by
the customers.
First, a good design for a seed generator is one
that intuitively starts in a good location in the search
space. A good location is one that is relatively close
to optimal solutions, where close is measured by
the length of the path from it to an optimal solution using the neighborhood function. For the
GMG-COP, we chose a seed that picks the highest
individual quality image for each cell, ignoring
preferences related to adjacency. This seed is easy
to generate (there is no need to consider adjacency constraints) and should be a good quality solution because it favors cloud-free images with high
NDVI value.
Second, choosing a neighboring solution
requires, first, choosing which cell to change. The
simplest approach is to pick the cell at random.
Since local search is memoryless, in the sense that
it does not keep track of where it’s been previously, it may not be able in general to avoid examining the same solution multiple times. To avoid
this, sometimes algorithms have taboo lists, lists of
variables recently chosen to change. Variables are
put on the list after being chosen and eventually
taken off after some number of iterations. Variables
on the list can’t be selected on a given iteration. In
our implementation we applied an extreme case of
taboo list: once a scene is selected for examination,
it is immediately placed on the taboo list to allow
for all other scenes to be examined in the current
iteration (the ordering of scene selection is random).
Third, given a selected cell, there are also a number of ways to select among the set of neighboring
solutions based on changes made to that cell. Some
are deterministic; that is, given the same decision
to make, the algorithm will make the same choice
each time. Others are nondeterministic. Algorithms such as simulated annealing and genetic
algorithms are nondeterministic. Initially, we opted for a deterministic approach, of which there are
two kinds: first improvement or best improvement. First improvement examines neighbors, in a
local search sense, until one is found that is better
than the current solution; that one becomes the
new current solution. Best improvement examines
all the neighbors and picks the one that improves
upon the current solution the most. Either of these
generates a greedy approach, one that always
chooses an improving solution. A variation of best
improvement is where a neighbor with the best
score is chosen, even if the score is worse than the
score of the current solution. This approach allows
for the possibility that a globally optimal solution
may not be on the greedy path from an initial seed
solution.
Finally, choosing a termination condition
requires deciding how many solutions will be generated before the algorithm halts. The simplest
approach will be to define a termination condition
that says halt when you reach the first locally optimal
solution or after a fixed number of solutions, MAX,
have been generated, whichever comes first. A
slightly more sophisticated version of this simple
local search is called multistart: here, for some fixed
number of runs, we start with different initial
(seed) solutions. Such initial solutions can be fully randomly generated (our implementation),
semirandomly generated, or deterministically generated. An example of deterministically generated
initial solutions employed here is to assign to each
scene the best self-quality image/pair. Alternatively, the local optimum of one run of simple local
search can be used as the initial solution for the
next run.
Testing and Results
Testing the GMG-COP occurred in two stages. First,
we compared different variations in multistart
local search to determine the best performing algorithm. Four variations were tested, based on two
variations of two criteria: the initial solution and
the choice of neighbor. The initial solutions tried
were a randomly generated solution and the solution consisting of the set of images that scored
highest individually (that is, with respect to cloud
cover, NDVI, and base–fill quality). The choice of
neighbor was either done on a first improvement
basis, that is, the first alternative that improved the
overall score, or best improvement basis, that is, of
all the images, selecting the one that most
improved the score. The results indicate that the
best strategy for finding high-quality solutions is
through exploration: with a random initial solution, and a best improvement neighbor selection,
SUMMER 2009 89
Articles
progress was quickly made towards solutions with
higher quality than those found by the other
approaches. We speculate that a random seed
works better than one based on individual scene
quality because the latter forced the search into a
local optimum that was not globally optimal.
In the other stage, we were interested in the
extent of the improvement offered by an automated solution over current practice, which consists of
manually generating solutions. Towards this end,
tests were conducted by the customers at USGS
and the Landsat mission using the GeoCover2000 (GC2K) data set. The results showed that
GMG, implemented as a simple algorithm that we
later improved significantly, produced a solution
that was 23 percent better quality than the manually generated solution, based on the objective
function scores. The customers viewed this result
as significant enough to warrant integration and
deployment of the solver.
On the complete GLS-2005 data set, the GMG
solver converges to a solution in about a minute. A
more detailed discussion of experiments during
GMG development is found in Khatib et al. (2007)
and Morris et al. (2008).
Large Area Scene
Selection Interface (LASSI)
As noted previously, the GMG solver arrives at its
solution based on input consisting of a metadata
representation of an image. Metadata furnish a
low-fidelity, quantified assessment of several
image criteria, such as cloud contamination and
vegetation maturity (NDVI). Each metadata metric
is a global average assessment over the entire
image area, but each metric is subject to its own
systematic errors. For example, in Franks et al.
(2008), the ACCA algorithm for generating cloudcover metadata sometimes is not able to detect
prevalent cirrus clouds, haze, or forest fire smoke.
Visual inspection, on the other hand, can easily
detect these scene imperfections. In general, a
GMG solution is only as good as the metadata it
uses to select the corresponding image, so noisy
input can translate into a less than ideal solution.
To address the reality of these potential sources of
suboptimal solutions, GMG is embedded into a
graphical user interface and visualization tool
known as LASSI (large area scene selection interface).7 LASSI allows users to adjust objective function criteria weights prior to launching the GMG
solver, visually assess mosaic thumbnail image
renderings of the GMG solutions, examine quality of solutions with respect to each objective function criterion, manually disqualify individual
images from consideration by GMG based on visual assessment, manually override images selected
by GMG based on visual assessment and external
90
AI MAGAZINE
knowledge, and remotely access and view browse
imagery for each WRS cell from the USGS Landsat
image archive.
In the following, we illustrate these capabilities
in the process of scene selection for GLS-2005,
summarizing some of the details found in Franks et
al. (2008).
Setting Weights for GLS-2005
Scene selection for GLS-2005 often involved running GMG separately for different regions of the
Earth, with different weight assignments to criteria
based on different climate and other relevant differences among the regions. The user initializes
these weights somewhat by educated trial and
error. The relative weights are influenced by the
depth of the candidate image pool, the land cover
characteristics, and seasonality. A deeper candidate
pool allows the user to set the weights more aggressively. In dry regions, the ACCA weights can be
relaxed in favor of more aggressive NDVI. Conversely, in tropical regions the NDVI and temporal
factors can be downweighted in favor of more
aggressive ACCA. For agricultural regions, we prefer L5/TM where available. For other temperate
regions, the temporal factors are more important,
trading slight degradation of ACCA or NDVI in
favor of selections that minimize bordering scene
discontinuities.
For example, table 1 shows weight assignments
eventually chosen for North American scene selection. This assignment reflects the relative importance of the vegetation (NDVI) criteria, w1 and w3,
due to the primary application of the data for landuse change studies. Cloud-cover weights (ACCA),
w2 and w4, could be relatively lower because the
average cloud cover of the data set was low, and
hence this criterion could be fairly easily satisfied.
The two distinct values for acquisition date difference between base and filler images arose from the
importance of a complete image in areas of rapid
land-use change, such as croplands. In general, the
goal for GLS-2005 data was to achieve a 95 percent
coverage with each base-pair pair. For more details
on the interpretation of the weight assignments,
see Franks et al. (2008), which focuses in depth on
the user perspective.
Viewing Solution Quality Maps
The LASSI interface allows users to examine the
quality of each GMG-generated solution on each
individual criterion. Each metadata map portrays a
metadata criterion in color gradients on a WRS-2
map grid. These maps enable the user to assess the
quality of the solution with respect to a single
dimension. Figure 3 is one such view, in this
instance, of the NDVI metric. Similar metadata
maps are available for sensor (discriminates Landsat 5 TM versus Landsat 7 ETM+ images in the
Articles
Figure 3. LASSI Screen Capture.
This screen capture shows NDVI quality of the GMG-generated solution after optimizing over all criteria. Lighter color signifies better quality image of the displayed criterion.
solution set); day of year (relative time of year of
base acquisitions); year (acquisition year); ACCA
(cloud assessment, measure of cloud pixels—
domain 0 to 10 percent); NDVI (NDVI, normalized
with respect to peak NDVI by WRS scene); raw
NDVI (nonnormalized—raw—NDVI); preference
year (distinguishes whether the chosen image was
acquired within the middle two years, such as
2005 or 2006, or the fringe years, such as 2004 or
2007); preferred day of year (depicts the seasonal
temporal difference between acquisition dates of
the chosen image with respect to the date of the
corresponding WRS scene in the GeoCover-2000
data set; minimizing this time difference improves
the utility of the two data sets for trending analysis); northern neighbor (depicts the temporal difference between neighboring along-track scenes,
north to south; minimizing this difference reduces
potential discontinuities in the resulting endproduct map); and eastern neighbor (depicts the
temporal difference between neighboring adjacent-path scenes, east to west).
Other metadata views depict the quality of
images chosen to fill gaps in the L7/ETM+ images.
These include quality of the ACCA fill, the cloud
assessment of gap fill images; coverage, that is, the
relative success of gap-filling (100 percent coverage
is desired for each base-fill image pair); NDVI difference, that is, for each base-fill image pair, the
relative difference in seasonality between these
images (if the difference is too great, the pair may
produce a composited image with undesirable artifacts); and temporal difference, that is, for each
base-fill image pair, the relative difference in acquisition dates between these images (if the difference
is too great, the pair may result in artifacts similar
SUMMER 2009 91
Articles
Figure 4. GMG Optimized Images of Southwest United States.
This screen shot shows a portion of the GMG-optimized solution of North America. For a selected thumbnail, the LASSI interface shows all
candidate images along the bottom of the screen. User may override the GMG-selected image by choosing one of the alternatives.
to NDVI differences, in addition to differences in
sun angle).
Viewing Thumbnail Image Mosaics
After a solution has been produced by GMG, the
user may use the LASSI interface to view thumbnails of the selected scenes. By double-clicking any
metadata map, LASSI produces a mosaic map of
thumbnail browse images for that area. The user
may also view a full-screen browse image of any
thumbnail. This is especially useful for revealing
cloud-contaminated images that may not be accurately represented in the metadata ACCA criterion,
as explained above.
This mosaic map display also enables the user to
view the chosen ETM+ base and fill images for sideby-side comparison, as seen in figure 4. A small
92
AI MAGAZINE
window in the lower right of this display plots the
monthly NDVI of this scene with markers showing
the relative acquisition time of year of the base and
fill images (where applicable). Adjacent to each
thumbnail is a list of metadata criteria and values.
Finally, the bottom of the screen features a horizontally scrolling list of thumbnails for all candidate images of any selected WRS grid cell. From
this list, the user may manually override the original GMG selection by choosing alternate acquisitions for the base or the fill images.
Global Scene Generation Process
Global scene generation using GMG/LASSI is an
iterative process. The user initializes the objective
function weight parameters and then invokes
GMG to produce a strawman solution. After exam-
Articles
Figure 5. Screen Shot of North America NDVI Metadata Map.
Taken from a GMG produced solution that was optimized only on cloud cover (ACCA) criteria.
ining the metadata maps, the user tweaks these
weights if necessary to compromise in some
dimensions to improve others. In the end, the user
will view the solution’s thumbnail mosaic map and
manually fine-tune it if necessary to eliminate
imagery with popcorn clouds, contrails, snow, or
other contaminants that may not have been
accounted for in the metadata.
For the 2005 global map construction, the project has elected to pursue the problem independently for each continental landmass. This way the
GMG objective function weights may be tailored
for each global region. The 2005 scene selection of
North America using GMG/LASSI was completed
first. Then, by the end of 2008, the scene selection
of the rest of the global continental landmass was
completed.8
Compromise among Competing Criteria
With multicriteria optimization, we often
encounter the problem of the negative interactions
among criteria, that is, when increasing the quality of a solution with respect to one criterion causes a decreased quality with respect to another. For
example, when only the ACCA criterion was preferred, GMG generated a map for North America
that is suboptimal in NDVI. See figure 5 and compare it to figure 3 where optimization occurred
jointly over both factors (in addition to all other
factors). Note the existence of darker areas, which
indicate lower quality assignments. Figure 6 shows
the images (and interface) for the case of preferring
only ACCA but neglecting all other factors. The
resulting map shows seasonal discontinuities, par-
SUMMER 2009 93
Articles
Figure 6. Screen Shot of Southeast United States Thumbnail Images.
Taken from a GMG produced solution that was optimized only on cloud cover (ACCA) criteria.
ticularly notable by the presence of snow cover in
some images. A comparison to figure 4 suggests
that the marginal improvement over cloud coverage (ACCA) is a poor trade for the deterioration in
seasonality quality (NDVI). Such a detrimental
effect on NDVI, which was observed for all other
factors as well, supports the argument that all factors must be considered simultaneously when
building the best map. It is apparent from confronting the GMG problem that automated solvers
are better suited to evaluate the effects of the interactions among multiple criteria (more than two)
than humans.
Conclusions
This article has described an approach for generating high-quality global image maps that incorpo-
94
AI MAGAZINE
rates an automated COP local search solver into a
robust visual interface that allows users to manually manipulate solution images. Using the solver
to manage the GLS-2005 survey yielded measurable improvements in the quality of global image
maps, with a beneficial reduction in the efforts
required to produce those maps. GLS-2005 scene
selection was complicated by the utilization of
multiple sensors, as well as requirements for gapfilling, which were not considerations in prior
global map productions such as GeoCover-2000. A
completely manual solution to optimal global map
generation would have been infeasible. The combined approach we employed, using automation
augmented by human guidance, proved to be a
feasible compromise.
Based on a successful application of the GMG on
the middecadal global Landsat data set, the GMG
will be used for additional data set projects as well.
Articles
One such mapping application has to
do with the USGS goal to create a state
mosaic for all 50 states in the United
States. The use of GMG has the potential of automating a large part of that
effort. Due to the scanner failure on
L7, GMG can greatly reduce the labor
necessary to exploit the Landsat data
archive. The L5 and L7 remote sensing
spacecrafts are expected to continue
service into 2012, and GMG offers
potential ongoing benefit with its ability to rapidly explore and assess large
numbers of candidate solutions for
regional and global Earth science studies and other mapping applications.
USGS and NASA are already planning
for the next decadal survey, GLS-2010.
That plan includes leveraging on the
success of LASSI and GMG.
The objective function criteria and
aspects of the interface occasionally
undergo refinements based on evolving customer requirements. For example, the preference for L5/TM imagery
over agricultural scenes was a late
enhancement after preliminary North
America map production revealed distinct crop maturity discontinuities in
farmland where GMG had selected
L7/ETM+ gap-fill composite pairs.
Another recent change allows for
more human intervention into the
solution generated. For example, as
the result of a recent update, if a user
manually selects an L7 base-fill image
pair, then the automated solver is not
allowed to alter that selection or
reverse the base-fill images.
The global map-generation problem
provides an ideal domain for testing
and evaluating constraint-based optimization solvers. Furthermore, the
GMG solver is of significant potential
benefit to the Earth science research
community, allowing scientists access
to improved automated tools to study
the Earth’s changing ecosystem. There
are future plans to apply the approach
described in this paper to generating
complete moon maps using Clementine image data.
Acknowledgements
Thanks to Steven Covington and Jeremy Frank for their contributions to
this work, and also to the anonymous
referees for their helpful suggestions
on improving the presentation.
Notes
1. Land-Cover and Land-Use Change
(lcluc.umd.edu).
2. USGS Earth Resources and Observation
Science (eros.usgs.gov).
3. L5 and L7 follow the WRS-2 coordinate
system for indexing locations on the Earth
where data is acquired. WRS-2 indexes a
location through a set of paths and rows,
with a 16-day repeat cycle. L5 follows the
WRS-2 system with a temporal offset of 8
days relative to L7. The WRS-2 indexes
orbits (paths) and scene centers (rows) into
a global grid system (day time and night
time) of 233 paths by 248 rows. We refer to
each path, row element as a scene location.
See figure 1.
4. L5 imagery is preferred, where available,
over predominantly agricultural regions
because temporal differences between
L7/ETM+ base and fill images result in
more pronounced and problematic artifacts in homogeneous farmland.
5. Same interpretation for the temporal difference between base and fill images criterion.
6. As noted in personal correspondence
with Javier Larrosa in 2007.
7. LASSI is currently tailored toward Landsat imagery, but it is feasible to customize it
to work with other imagery sources.
8. Availability of data to date can be found
at landsat.usgs.gov/science_GLS2005.php.
9. Taken from the NASA Landsat Science
Data Users Handbook (http://landsathandbook.gsfc.nasa.gov/handbook/handbook_htmls/chapter5/chapter5.html)
References
Aarts, E., and Lenstra, J. K. 1997. Local
Search in Combinatorial Optimization. New
York: John Wiley and Sons.
Cabon, B.; de Givry, S.; Lobjois, L.; Schiex,
T.; and Warners, J. P. 1999. Radio Link Frequency Assignment. Constraints 4(1): 78–
87.
Dechter, R. 2003. Constraint Processing. San
Francisco: Morgan Kaufmann.
Franks, S.; Masek, J.; Headley, R.; Gasch, J.;
Covington, S.; and Arvidson, T. 2008. Large
Area Scene Selection Interface (LASSI):
Methodology of Selecting Landsat Imagery
for the Global Land Survey 2005. Unpublished paper.
Khatib, L.; Gasch, J.; Morris, R.; and Covington, S. 2007. Local Search for Optimal
Global Map Generation Using Mid-Decadal
Landsat Images. In Preference Handling for
Artificial Inteligence: Papers from the 2007
AAAI Workshop. Technical Report WS-0710. Menlo Park, CA: AAAI Press.
Larrosa, J., and Dechter, R. 2003. Boosting
Search with Variable Elimination in Constraint Optimization and Constraint Satisfaction Problems. Constraints 8(3): 303–
326.
Larrosa, J.; Morancho, E.; and Niso, D.
2005. On the Practical Use of Variable Elimination in Constraint Optimization Problems: Still-life as a Case Study. JAIR 23: 421–
440.
Morris, R.; Khatib, L.; Gasch, J; and Covington, S. 2008. Automated Global Map
Generation Using Mid-Decadal Landsat
Images. Ninth International Symposium on
Artificial Intelligence, Robotics and Automation in Space (iSAIRAS). Pasadena, CA: Jet
Propulsion Laboratory, California Institute
of Technology.
Robert Morris is a researcher in the automated planning and scheduling group in
the Intelligent Systems Division at NASA
Ames Research Center. For many years he
has served as principal investigator on projects dealing with applying automated
information technologies for sensor web
coordination. He is currently the principal
(or coprincipal) investigator on a number
of Earth science projects for building applications of automated planning and scheduling technologies. His research interests
include representing and reasoning about
time, constraint-based planning, and
developing search techniques for constraint optimization.
John Gasch is a systems engineer at NASA
Goddard Space Flight Center, presently as a
senior mission analyst for the Landsat
Flight Operations Team. Gasch has contributed to the design and development of
numerous mission planning and scheduling systems for NASA and USGS Earth science missions, as well as electronic countermeasures systems for DOD. He received
a B.S. in computer science from the University of Maryland and an M.S. in computer science from Johns Hopkins University. His concentration is on maximizing
the utility of Earth observation remote
sensing satellite systems and improving
mission operations efficiency.
Lina Khatib is a senior research engineer
and scientist with SGT Inc. at NASA Ames
Research Center where, since 1999, she has
worked in the Planning and Scheduling
group, Intelligent Systems Division, on various projects in intelligent automation for
space and Earth sciences. Her Ph.D. is in
computer science with expertise in constraint reasoning and preferences, heuristic
search, temporal reasoning, intelligent
planning and scheduling, and problem
solving.
SUMMER 2009 95
Download