The effects of uncertainty in forest inventory plot locations

Ronald E. McRoberts, Geoffrey R. Holden, and Greg C. Liknes
North Central Research Station, USDA Forest Service,
Saint Paul, Minnesota 55108 USA
_____________________________________________________________________
Abstract
Data from forest inventory plots are used to obtain a variety of estimates,
many of which require knowledge of exact plot locations.
Although geographic coordinates of plot locations are measured with greater precision
than in the past, the measurements are still subject to uncertainty. In
addition, to comply with policies that prohibit disclosure of exact locations,
some inventory programs release only perturbed plot locations to the public.
For design-based estimates, a worst case scenario of the effects of
uncertainty in plot locations is evaluated in terms of the radius of circular
areas of interest and the maximum uncertainty in plot locations. In a model-based
context, the effects of uncertainty in plot locations on predictions from
a logistic regression model calibrated using inventory data and satellite
imagery are investigated. Finally, a method is developed for circumventing
the effects of perturbed plot locations when using spectral values of satellite
imagery as independent variables in modeling applications.
_____________________________________________________________________
Introduction
Many applications of forest inventory data are spatial in nature and require knowledge
of exact plot locations. Examples include estimating the volume of timber within a
specified distance of a mill and calibrating models using inventory plot data and the
spectral values of satellite imagery for pixels containing the plot centers. Although
coordinates of plot locations are quite precisely measured using global positioning
system (GPS) receivers, aerial imagery, and digitization methods, the coordinates still
have uncertainty associated with them. In addition, for privacy and sample integrity
reasons, some forest inventory programs are prohibited from disclosing exact plot
locations to the public. For example, the Forest Inventory and Analysis (FIA) program
of the Forest Service, U.S. Department of Agriculture, randomly perturbs all plot
locations by as much as 0.8 km (0.5 mi) and some by as much as 1.6 km (1.0 mi)
before releasing them to the public. In these situations, uncertainty in plot locations
may contribute to both bias and uncertainty in estimates based on inventory data.
The objectives of the study were threefold: (1) to estimate the effects of uncertainty in
plot locations on the uncertainty in design-based estimates, (2) to estimate the effects
of uncertainty in plot locations on the uncertainty in model-based estimates, and (3) to
investigate methods for circumventing the effects of perturbed plot locations for model-
based applications that use spectral values of satellite imagery as independent
variables.
Methods
Estimation using inventory plot data has historically been design-based, although
model-based applications are becoming more extensive. The properties of design-based
estimators derive from the sampling designs used to obtain the data. Design-based
estimates often consist of plot-based means and variances of means for
selected areas of interest (AOI). For design-based estimates, the primary effect of
uncertainty in plot locations is that the set of plots determined to be in an AOI will
exclude some plots that are actually in the AOI and include some that are actually
outside the AOI. The negative effects of the uncertainty in plot locations decrease as
the uncertainty decreases, as the size of circular AOIs increases, and as the strength
of spatial correlation among plot observations increases.
The properties of model-based estimators derive from the mathematical forms of the
models, unexplained residual variability around model predictions, and the spatial
correlation among residuals.
For this study, model-based estimation entails
formulating mathematical models of the relationships between dependent and
independent variables, predicting the value of the dependent variable for each
estimation unit in the AOI, and calculating the mean of predictions over all estimation
units in the AOI. When the dependent and independent variables are observed at the
same geographic location and the plot coordinates themselves are not independent
variables, then the estimated model of the relationship is unaffected by uncertainty in
plot locations, although the spatial correlation among model prediction residuals may
be poorly estimated. For many analyses, however, the dependent and independent
variables are not observed coincidentally. For example, the independent variables
may be spectral values of satellite image pixels containing the centers of the sampling
units on which the dependent variable is observed. In this case, uncertainty arises in
model-based estimates as a result of uncertainty in the plot locations, errors in image
registration, and errors in co-registration of the plot locations to the image. When the
independent and dependent variables are not observed at the same geographic
location, bias may be introduced into the model predictions, residual variability may
increase, and the spatial correlation among residuals may be poorly estimated. The
effects of uncertainty in plot locations on model-based estimates decrease as the
uncertainty in plot location decreases, as registration errors decrease, as the size of
the sampling unit increases, and as the strength of the spatial correlation among
observations of the variables increases. However, unlike the case of design-based
estimation, the negative effects of uncertainty in plot locations do not necessarily
decrease as the size of the AOI increases.
McRoberts et al. (in press) proposed a framework for systematically estimating the
effects of uncertainty in forest inventory plot locations on design- and model-based
estimates. The framework proposed estimating the effects separately for design- and
model-based estimation, and considered three factors: (1) the range and strength of
spatial correlation, (2) the sizes of AOIs, and (3) the spatial resolution of the units on
which variables are observed.
The effects of uncertainty in plot locations on design-based estimates
For a circular AOI (Figure 1), the expected correlation, ρ, between design-based
estimates using exact plot locations and estimates using locations with uncertainty
may be expressed in terms of five quantities: (1) the radius, R, of the AOI; (2) the
distribution of plot location errors in the interval, [-rmax, rmax], where rmax is the maximum
uncertainty; (3) the number of plots with exact locations in B whose locations with
uncertainty place them in C; (4) the number of plots with exact locations in C whose
locations with uncertainty place them in B; and (5) the spatial correlation of the
attribute of interest.
Figure 1. Circular AOI. (Figure labels: radius R, maximum uncertainty rmax, and regions A, B, and C.)
A worst case scenario occurs when all plots with exact locations in B are replaced by
plots with exact locations in C and when observations for the plots in B are
uncorrelated with observations for plots in C. The latter condition is truly worst case
and occurs only when the status of the forest in B is substantially different than the
status of the forest in C; e.g., B is forest and C is nonforest. Under this scenario,
assuming a maximum uncertainty distance of rmax, the worst case correlation between
means estimated using data from exact locations and data from locations with
uncertainty may be expressed as,
$$\rho = \frac{\text{Area}_A}{\text{Area}_A + \text{Area}_B} = \frac{\pi (R - r_{max})^2}{\pi R^2} = \left(1 - \frac{r_{max}}{R}\right)^2 \qquad [1]$$
The worst case correlation with respect to AOI radius, R, is shown in Figure 2 for four
values of rmax: 0.04 km, 0.20 km, 0.80 km, and 1.60 km. The first value of rmax
corresponds approximately to maximum GPS error, and the third and fourth values
correspond to the intermediate and maximum plot location perturbing distances,
respectively, used by the FIA program.
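The worst case correlation of Equation [1], (1 − rmax/R)², is simple enough to evaluate directly. The following is a minimal sketch, not the authors' code; the function names are hypothetical, and the inverse function is just Equation [1] solved for R.

```python
import math


def worst_case_correlation(aoi_radius_km, r_max_km):
    """Equation [1]: rho = (1 - r_max / R)^2 for a circular AOI of radius R."""
    if aoi_radius_km <= 0 or not 0 <= r_max_km <= aoi_radius_km:
        raise ValueError("require R > 0 and 0 <= r_max <= R")
    return (1.0 - r_max_km / aoi_radius_km) ** 2


def min_radius_for_correlation(target_rho, r_max_km):
    """Equation [1] solved for R: the smallest AOI radius whose worst case
    correlation reaches target_rho."""
    if not 0 < target_rho < 1:
        raise ValueError("target_rho must lie strictly between 0 and 1")
    return r_max_km / (1.0 - math.sqrt(target_rho))


if __name__ == "__main__":
    # The four maximum uncertainty distances named in the text, for a 16-km AOI.
    for r_max in (0.04, 0.20, 0.80, 1.60):
        print(r_max, round(worst_case_correlation(16.0, r_max), 4))
```

For example, R = 16 km and rmax = 0.8 km give (1 − 0.05)² = 0.9025.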
Figure 2. Correlations for worst case scenarios. (Axes: correlation, 0.0 to 1.0, versus AOI radius, 0 to 50 km; one curve for each maximum perturbing distance: 0.05 km, 0.20 km, 0.80 km, and 1.60 km.)
The effects of uncertainty in plot locations on model-based estimation
McRoberts (in review) developed a logistic regression model to predict the probability
of forest for estimation units corresponding to individual satellite image pixels:
$$E(p_i) = \frac{\exp(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_m X_{im})}{1 + \exp(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_m X_{im})} \qquad [2]$$
where E(.) denotes statistical expectation, pi is the probability of forest for the ith pixel,
exp(.) is the exponential function, Xij is the value of the jth spectral band for the ith pixel,
and the βs are parameters to be estimated. Observations of forest/nonforest were
obtained from FIA plots, and the spectral data were obtained from Landsat TM/ETM+
imagery. The FIA field plot consists of four 7.31-m (24-ft) radius circular subplots. The
subplots are configured as a central subplot and three peripheral subplots with centers
located at 36.58 m (120 ft) and azimuths of 0°, 120°, and 240° from the center of the
central subplot. Locations of forested or previously forested plots are measured using
GPS receivers, while locations of non-forested plots are measured using aerial
imagery and digitization methods. For this study, inventory data for three 15-km radius
circular study areas in Minnesota, USA, were used and consisted of observations for
83 plots for which 200 subplots were completely forested and 132 subplots were
completely non-forested. Landsat imagery for two Minnesota scenes, rows 27 and 28
of path 28, for three dates corresponding to early, peak, and late vegetation green-up
was used. The spatial configuration of the FIA subplots with centers separated by
36.58 m and the 30-m x 30-m spatial resolution of the TM/ETM+ imagery permit
individual subplots to be associated with individual image pixels. The subplot area of
167.87 m² is approximately 19 percent of the 900 m² pixel area. The satellite image-based
predictor variables consisted of the normalized difference vegetation index
(NDVI) and the greenness, brightness, and wetness tasseled cap (TC) transformations
of the spectral values, scaled to the interval [0,255], for each of the three image dates.
The calibration of the logistic regression model was based on data aggregated from all
three study areas.
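As an illustration of how Equation [2] turns the spectral predictors for one pixel into a probability of forest, consider the minimal sketch below. The function name is hypothetical and the coefficient values in the usage lines are invented for illustration only; they are not the calibrated parameters from this study.

```python
import math


def forest_probability(betas, spectral_values):
    """Evaluate Equation [2] for one pixel.

    betas[0] is the intercept (beta_0); betas[1:] pair with the m predictor
    values X_i1, ..., X_im for the pixel.
    """
    if len(betas) != len(spectral_values) + 1:
        raise ValueError("need one intercept plus one coefficient per predictor")
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], spectral_values))
    # Algebraically equal to exp(eta) / (1 + exp(eta)); the two branches
    # avoid overflow when |eta| is large.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Invented coefficients for two predictors scaled to [0, 255].
    example_betas = [-6.0, 0.03, 0.01]
    print(forest_probability(example_betas, [180, 90]))
```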
Within each of the three 15-km radius study areas, design- and model-based
estimates of proportion forest area were calculated. To estimate the effects of
uncertainty in inventory plot locations, four maximum perturbation distances were
selected: rmax=0.05 km, rmax=0.20 km, rmax=0.80 km, and rmax=1.60 km. The first value
corresponds, approximately, to the sum of maximum GPS error and maximum image
registration error; the third and fourth values correspond to the intermediate and
maximum plot location perturbing distances, respectively, used by the FIA program.
The plot locations were perturbed in four steps: (1) perturbations, rlon for longitude and
rlat for latitude, were randomly selected from a uniform distribution with positive density
in the interval [–rmax, rmax]; (2) the total perturbation, $r_{tot} = \sqrt{r_{lon}^2 + r_{lat}^2}$, was checked to
ensure that rtot ≤ rmax; if not, Step 1 was repeated; (3) rlon and rlat were added to the
exact coordinates of all four subplots for each plot to obtain the perturbed locations;
and (4) the perturbed subplot locations were checked to ensure they remained in their
respective study area; if not, Steps 1-3 were repeated. The spectral values of the
pixels containing the perturbed subplot centers were associated with the subplot
attributes, the model was recalibrated, and predictions, p̂, of the probability of forest
were calculated for each pixel in each study area. The procedure was repeated 10
times for each selection of rmax.
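The four perturbation steps amount to rejection sampling. The sketch below is one possible reading of them, under assumptions not stated in the text: coordinates are treated as planar and expressed in km, the study area is a circle given by a center and radius, and all function and parameter names are hypothetical.

```python
import math
import random

SUBPLOT_DISTANCE_KM = 0.03658  # 36.58 m from central to peripheral subplot centers


def subplot_centers(x_km, y_km):
    """Four subplot centers for a plot whose central subplot is at (x_km, y_km)."""
    centers = [(x_km, y_km)]
    for azimuth_deg in (0.0, 120.0, 240.0):
        az = math.radians(azimuth_deg)
        # Azimuth is clockwise from north: east offset uses sin, north uses cos.
        centers.append((x_km + SUBPLOT_DISTANCE_KM * math.sin(az),
                        y_km + SUBPLOT_DISTANCE_KM * math.cos(az)))
    return centers


def perturb_plot(subplots, r_max, area_center, area_radius, rng):
    """Apply one common random shift to all subplots of a plot."""
    while True:
        # Step 1: independent uniform draws for the two coordinate shifts.
        r_lon = rng.uniform(-r_max, r_max)
        r_lat = rng.uniform(-r_max, r_max)
        # Step 2: reject draws whose total length exceeds r_max.
        if math.hypot(r_lon, r_lat) > r_max:
            continue
        # Step 3: add the same shift to every subplot of the plot.
        moved = [(x + r_lon, y + r_lat) for x, y in subplots]
        # Step 4: accept only if all subplots remain inside the study area.
        if all(math.hypot(x - area_center[0], y - area_center[1]) <= area_radius
               for x, y in moved):
            return moved


if __name__ == "__main__":
    rng = random.Random(1)
    plot = subplot_centers(3.0, 4.0)
    print(perturb_plot(plot, 0.8, (0.0, 0.0), 15.0, rng))
```

Steps 1 and 2 together yield shifts distributed uniformly over a disk of radius rmax, rather than over the enclosing square.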
The analyses took two forms. First, for each study area and for each selection of rmax,
proportion forest area was estimated as the mean of pixel predictions over the entire
study area for the exact and each of the 10 perturbed plot models. The bias in
proportion forest area estimates due to perturbed plot locations was calculated as the
difference between the mean of the 10 estimates for perturbed plot models and the
estimate for the exact plot model. Second, for the exact and each of the 10 perturbed
plot models, all pixels in each study area were classified as nonforest if the predicted
probability of forest was p̂ ≤ 0.5 and forest if p̂ > 0.5. The misclassification proportion
for each perturbed plot model was calculated as the proportion of pixels for which the
classifications based on exact and perturbed plot model predictions differed. For each
selection of rmax and for each study area, the mean misclassification proportion over
the 10 perturbed plot models was calculated.
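The two summary measures can be written compactly. The following sketch assumes that each model's predictions are available as a list of per-pixel probabilities in a common pixel order; the function names and the toy numbers are hypothetical.

```python
def proportion_forest(probabilities):
    """Proportion forest area: mean of pixel predictions over the study area."""
    return sum(probabilities) / len(probabilities)


def classify(probabilities, threshold=0.5):
    """True (forest) if the prediction exceeds the threshold, else False."""
    return [p > threshold for p in probabilities]


def bias(exact, perturbed_runs):
    """Mean estimate over the perturbed plot models minus the exact plot estimate."""
    mean_perturbed = (sum(proportion_forest(run) for run in perturbed_runs)
                      / len(perturbed_runs))
    return mean_perturbed - proportion_forest(exact)


def mean_misclassification(exact, perturbed_runs):
    """Mean proportion of pixels whose class differs from the exact plot model."""
    exact_classes = classify(exact)
    proportions = []
    for run in perturbed_runs:
        differ = sum(a != b for a, b in zip(exact_classes, classify(run)))
        proportions.append(differ / len(exact))
    return sum(proportions) / len(proportions)


if __name__ == "__main__":
    # Toy predictions for four pixels: one exact model, two perturbed models.
    exact = [0.9, 0.2, 0.6, 0.4]
    runs = [[0.8, 0.3, 0.4, 0.4], [0.9, 0.6, 0.6, 0.3]]
    print(bias(exact, runs), mean_misclassification(exact, runs))
```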
Circumventing the effects of perturbed plot locations
For models that use spectral values of satellite imagery as independent variables, a
method for circumventing the effects of perturbed plot locations was investigated.
First, a check was made to determine if searches of an entire Landsat image using
only spectral values could uniquely locate individual pixels. If so, inventory data
appended with the spectral values of corresponding pixels provides sufficient
information to determine exact plot locations, a violation of the FIA non-disclosure
policy. One, two, and three dates of imagery were searched to determine the number
of pixels with the same combination of spectral values for each of 1,000 randomly
selected pixels. If only a small proportion of image pixels have unique combinations of
spectral values, then inventory data appended with spectral values of associated
pixels may be released with confidence that most exact subplot locations cannot be
determined. However, if a large proportion of pixels have unique combinations of
spectral values, then alternatives must be sought.
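The uniqueness check can be sketched as a frequency count over combinations of spectral values. The example below assumes the image is held as a list of per-pixel tuples of integer spectral values; the function names and the toy data are hypothetical.

```python
from collections import Counter


def stack_dates(*date_images):
    """Concatenate the per-date band tuples of each pixel into one tuple."""
    return [sum(bands, ()) for bands in zip(*date_images)]


def proportion_uniquely_located(image_pixels, sample_indices):
    """Proportion of sampled pixels whose combination of spectral values
    occurs exactly once in the whole image."""
    counts = Counter(image_pixels)
    unique = sum(1 for i in sample_indices if counts[image_pixels[i]] == 1)
    return unique / len(sample_indices)


if __name__ == "__main__":
    # Toy three-pixel image with one band per date.
    date1 = [(1,), (1,), (2,)]
    date2 = [(5,), (6,), (5,)]
    everything = range(3)
    print(proportion_uniquely_located(date1, everything))
    print(proportion_uniquely_located(stack_dates(date1, date2), everything))
```

Stacking dates lengthens each tuple, so repeated combinations become rarer as dates are added.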
An alternative may be to perturb the spectral values of the pixels associated with the
exact plot locations. If the perturbed spectral values mask the exact plot locations but
yet retain sufficient information for model calibration, then the effects of perturbed plot
locations may be circumvented while yet complying with the FIA non-disclosure policy.
First, for each of the 1,000 randomly selected image pixels, each spectral value for all
three dates of imagery was perturbed by randomly selecting an integer from a uniform
distribution with positive density in the interval, [-Imax, Imax] where Imax=1, 2, …, 5. For
one, two, and three dates of imagery, all image pixels at the minimum distance in
spectral space from each of the 1,000 pixels with perturbed spectral values were
identified. Second, for the logistic regression model, the spectral values for pixels
associated with FIA subplots were perturbed by randomly selecting an integer from a
uniform distribution with positive density in the interval, [-Imax, Imax] where Imax=1, 2, …,
5. The model was then recalibrated, and predictions, p̂, were calculated for each
pixel in each study area. The procedure was repeated 10 times. The bias and mean
misclassification proportion were calculated in the same manner as when plot
locations were perturbed.
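The spectral perturbation and the subsequent search can be sketched as follows. Two points are assumptions: the text does not name the distance metric, so squared Euclidean distance is used here, and it does not say whether perturbed values were clamped to [0,255], so no clamping is applied. Function names and outcome labels are hypothetical.

```python
import random


def perturb_spectral(values, i_max, rng):
    """Add an independent uniform integer from [-i_max, i_max] to each value."""
    return tuple(v + rng.randint(-i_max, i_max) for v in values)


def search_outcome(image_pixels, index, perturbed):
    """Classify a search for pixel `index` given its perturbed spectral values.

    Returns 'single' if the pixel is the only one at minimum distance,
    'multiple' if it shares the minimum with other pixels, and
    'not_at_minimum' otherwise.
    """
    # Assumption: squared Euclidean distance in spectral space.
    dists = [sum((a - b) ** 2 for a, b in zip(p, perturbed)) for p in image_pixels]
    d_min = min(dists)
    nearest = [i for i, d in enumerate(dists) if d == d_min]
    if index not in nearest:
        return "not_at_minimum"
    return "single" if len(nearest) == 1 else "multiple"


if __name__ == "__main__":
    rng = random.Random(0)
    image = [(10, 10), (10, 10), (50, 50), (12, 10)]
    for idx in range(len(image)):
        print(idx, search_outcome(image, idx, perturb_spectral(image[idx], 2, rng)))
```

The three outcome labels correspond to the three proportion columns reported in Table 1.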
Results
Design-based estimates
Based on equation [1] and Figure 2, the effects of GPS error (rmax=0.04 km) on design-based
estimates are negligible. For circular AOIs and a maximum perturbing distance of
0.8 km, the minimum worst case correlation between means of proportion forest area
based on exact and perturbed plot locations is greater than ρ=0.90 when the radius of
a circular AOI is greater than R=16 km and is greater than ρ=0.95 when the radius is
greater than R=32 km. For a maximum perturbing distance of 1.6 km, a radius of
R=32 km is required for a worst case correlation of ρ=0.90, and a radius of R=64 km
is required for a worst case correlation of ρ=0.95. The actual correlations will depend
on the shape of the AOI and the spatial correlation among observations of the attribute
of interest.
Model-based estimates
The 10 replications of procedures for each rmax and each Imax produced coefficients of
variation for estimates of bias that were generally less than 0.10 and always less than
0.15. Coefficients of variation for mean misclassification proportion were generally
less than 0.05 and always less than 0.10. Thus, although the replications were small
in number, they were sufficient to produce quite precise estimates of bias and mean
misclassification proportion.
One effect of uncertainty in plot locations when dependent and independent variables
are not observed at the same geographic locations is bias in estimated model
relationships. Even the relatively small uncertainty associated with the combination of
GPS and image registration errors produced detectable bias and misclassification for
the logistic regression model (Figures 3 and 4).
Figure 3. Bias in proportion forest estimates for 15-km radius study areas. (Axes: bias versus maximum plot location perturbation, 0.0 to 1.6 km; one curve for each of Study Areas 1, 2, and 3.)
Figure 4. Mean misclassification proportion for 15-km radius study areas. (Axes: mean misclassification proportion versus maximum plot location perturbation, 0.0 to 1.6 km; one curve for each of Study Areas 1, 2, and 3.)
Circumventing the effects of perturbed plot locations
With one, two, and three dates of Landsat TM/ETM+ imagery, 0.118, 0.974, and 1.000
proportions of pixels, respectively, were uniquely located when the spectral values
were not perturbed (Table 1). These proportions indicate that appending inventory
subplot data with spectral values of corresponding pixels violates the FIA non-disclosure policy.
Table 1. Results of searching image for a given pixel (proportion of 1,000 pixels).

              Maximum        Pixel of interest at minimum distance   Pixel of interest
              pixel          -------------------------------------   not at minimum
Imagery       perturbation   Single pixel(1)   Multiple pixels(2)    distance(3)
---------------------------------------------------------------------------------------
Single date   0              0.118             0.882                 0.000
              1              0.002             0.034                 0.964
              2              0.002             0.004                 0.994
              3              0.000             0.002                 0.998
              4              0.000             0.002                 0.998
              5              0.000             0.002                 1.000
Two dates     0              0.974             0.016                 0.000
              1              0.242             0.418                 0.340
              2              0.072             0.138                 0.790
              3              0.020             0.036                 0.944
              4              0.010             0.028                 0.962
              5              0.002             0.006                 0.992
Three dates   0              1.000             0.000                 0.000
              1              0.708             0.230                 0.062
              2              0.286             0.266                 0.448
              3              0.108             0.146                 0.746
              4              0.082             0.058                 0.860
              5              0.032             0.034                 0.934
---------------------------------------------------------------------------------------
(1) The proportion of 1,000 pixels for which the pixel was the only pixel at the
minimum spectral distance.
(2) The proportion of 1,000 pixels for which the pixel was one of multiple pixels
at the minimum spectral distance.
(3) The proportion of 1,000 pixels for which the pixel was not among the pixels
at the minimum spectral distance.
Assume, arbitrarily, that the criterion for compliance with the FIA non-disclosure policy
is that no more than 10 percent of pixels can be located by searching the image. This
criterion means that the proportion of searches for individual pixels for which the pixel
of interest is not found at the minimum spectral distance is 0.900 or greater. For one
date of Landsat TM/ETM+ imagery, spectral value perturbations of randomly selected,
uniformly distributed integers from the interval [-1, 1] are sufficient; for two dates,
integers from the interval [-3, 3] are sufficient; and for three dates, integers from the
interval [-5,5] are sufficient. Thus, relatively small perturbations are sufficient to mask
the locations of subplots.
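The selection of sufficient perturbation intervals can be reproduced from the last column of Table 1. The sketch below transcribes those proportions and, for each number of image dates, finds the smallest Imax that meets the 0.900 criterion; the variable and function names are hypothetical.

```python
# Proportion of 1,000 pixels for which the pixel of interest was NOT at the
# minimum spectral distance (last column of Table 1), keyed by Imax.
NOT_AT_MINIMUM = {
    "single date": {0: 0.000, 1: 0.964, 2: 0.994, 3: 0.998, 4: 0.998, 5: 1.000},
    "two dates": {0: 0.000, 1: 0.340, 2: 0.790, 3: 0.944, 4: 0.962, 5: 0.992},
    "three dates": {0: 0.000, 1: 0.062, 2: 0.448, 3: 0.746, 4: 0.860, 5: 0.934},
}


def smallest_sufficient_imax(proportions, criterion=0.900):
    """Smallest Imax whose not-at-minimum proportion meets the criterion,
    or None if no tabulated Imax does."""
    for i_max in sorted(proportions):
        if proportions[i_max] >= criterion:
            return i_max
    return None


if __name__ == "__main__":
    for dates, proportions in NOT_AT_MINIMUM.items():
        print(dates, smallest_sufficient_imax(proportions))
```

A stricter criterion can be explored by changing the `criterion` argument; the function returns None when no tabulated interval is sufficient.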
The negative effects of perturbing spectral values with integers from the interval [-5, 5]
on logistic model predictions were a small bias in estimates of proportion forest area
and a small misclassification proportion (Table 2). However, the negative effects were
smaller than or comparable to the effects of perturbing plot locations from the interval
[-0.05 km, 0.05 km] (Table 2). The important result of this finding is that the effects of
perturbing spectral values from the interval [-5, 5] are no greater than the combined
effects of GPS measurement error and image registration error.
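Table 2. Effects of perturbing locations and spectral values.

         Plot location perturbation          Spectral value perturbation
         [-0.05 km, 0.05 km]                 [-5, 5]
         --------------------------------    --------------------------------
Study              Mean misclassification              Mean misclassification
Area     Bias      proportion                Bias      proportion
-----------------------------------------------------------------------------
1        0.0306    0.0399                    0.0088    0.0193
2        0.0383    0.0507                    0.0152    0.0280
3        0.0690    0.1085                    0.0899    0.0969
-----------------------------------------------------------------------------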
Conclusions
Three conclusions may be drawn from this study. First, for design-based analyses, the
effects of GPS error are negligible, and the worst case effects of greater uncertainty for
circular AOIs may be estimated from Equation [1] and Figure 2. Similar results for
AOIs of other shapes are expected except when the AOIs have very narrow
components. Second, appending spectral values of satellite image pixels to inventory
data violates the FIA policy of not disclosing exact plot locations. The third conclusion,
however, is that the negative effects of perturbing plot locations in order to comply with
the policy may be circumvented at least partially for the logistic regression model by
perturbing spectral values with randomly selected, uniformly distributed integers from
the interval [-5, 5]. This action not only masks inventory plot locations but retains most
of the information in the imagery. Further, the effects on bias and misclassification of
these spectral value perturbations were typically less than the effects of GPS and
image registration errors. Nevertheless, caution should be exercised when using
different model forms and when predicting forest attributes that exhibit different spatial
correlation.
References
Liknes, G.C., Holden, G.R., Nelson, M.D., and McRoberts, R.E. (2005) Spatially
locating FIA plots from pixel values. In R. E. McRoberts, G.A. Reams, P. C. Van
Deusen, and W.H. McWilliams (Eds). Proceedings of the Fourth Annual Forest
Inventory and Analysis Symposium. NC-GTR- U.S. Department of Agriculture, Forest
Service, St. Paul, MN.
McRoberts, R.E., Holden, G.R., Nelson, M.D., Moser, W.K., Lister, A.J., King, S.L.,
LaPoint, E.B., Coulston, J.W., Smith, W.B., and Reams, G.A. (In press) Estimating
and circumventing the effects of perturbing and swapping inventory plot locations.
Journal of Forestry.
McRoberts, R.E. (In review) Using a logistic regression model, satellite imagery, and
inventory data to estimate forest area. Remote Sensing of Environment.