Error Sources in Regional Airborne LiDAR Surveys

Ross Nelson
Biospheric Sciences Branch, Code 614.4
NASA-Goddard Space Flight Center
Greenbelt, Maryland 20771 USA
1-301-614-6632
Ross.F.Nelson@nasa.gov

November 17, 2005

EXTENDED ABSTRACT

Airborne lasers may be used as sampling tools to estimate regional (e.g., county, state, province, prefecture, ecozone) forest resources. LiDARs (Light Detection and Ranging) are used to measure distances from aircraft to forest canopy and from aircraft to ground along flight transects 10s to 1000s of kilometers long. The height of the forest canopy, i.e., the difference between these ranging measurements, can be quantitatively related to the amount of wood on the ground. When used in conjunction with Line Intercept Sampling techniques (LIS, Kaiser 1983; DeVries 1986), airborne LiDAR profiles (Figure 1) may be employed to estimate forest merchantable volume, total aboveground dry biomass, carbon, and forest fuel loads over large areas.

One critical component of this sampling procedure is the development of an equation or set of equations to predict, for instance, biomass, as a function of laser height measurements. Typically, these equations are developed by locating ground plots in selected areas beneath portions of the airborne LiDAR transect. Ground-measured biomass is paired with coincident laser measurements of forest height, and parametric or nonparametric regression approaches are used to construct the predictive equations.

Investigators employ different models to relate ground-measured biomass to laser-measured forest heights. Height-biomass relationships generally tend to be linear or slightly curvilinear, and the relationships are frequently heteroskedastic (Lambert et al. 2005, see Figure 2). Log-log models are commonly employed (e.g., Næsset 2002; Næsset and Gobakken 2005) to control biomass scatter as height increases.
Although these log models improve regression fit and control heteroskedasticity, an argument has been made (Nelson et al. 2004) that regional estimates are less accurate when regression estimates are back-transformed, even after accounting for the back-transformation bias (Wiant and Harner 1979).

Figure 1. An airborne LiDAR profile acquired over the state of Delaware, on the mid-Atlantic coast of the eastern US, in the summer of 2000. The bottom picture is a color-infrared airphoto with land cover delineated and the aircraft flight line (yellow) superimposed. The six-digit numbers (hhmmss, GMT) mark the aircraft location, recorded once every 2 seconds by the LiDAR GPS. The aircraft is flying at ~50 m/s; approximately 1.4 km of flight line is shown. The top graph depicts the associated laser trace of vegetation heights (from Nelson et al. 2004). The laser spike at the stream actually depicts null or zero laser returns; water absorbs the near-infrared (λ=0.905 μm) pulse.

Fifty-six flight lines totaling over 5159 km of flight data over the state of Delaware were acquired during the summer of 2000. A small profiling LiDAR (the Portable Airborne Laser System, PALS, Nelson et al. 2003a) was used to collect the forest canopy height information. The 56 parallel lines, oriented N-S along the long axis of the state, were spaced 1 km apart. Delaware includes three counties (New Castle, Kent, and Sussex), and county and statewide estimates of forest volume, biomass, carbon, impervious surface area, open water area, and wildlife habitat have been generated (Nelson et al. 2003b, 2004, 2005). Variance estimates reported in these studies were calculated assuming that the systematically acquired data were actually a random sample.

Figure 2. A scatterplot of 90th-percentile laser height (d90, X axis, in m) versus total aboveground dry biomass (tagdb, Y axis, in t/ha) for two study areas in North Carolina (NC) and Tennessee (ORNL, Oak Ridge National Lab), USA.
The blue and red points are deciduous forest; the green points are loblolly pine plantations. BioSAR is a vegetation RaDAR. PALS height values are illustrated.

An assumption of randomness in a systematic survey can lead to a significant variance overestimate (Osborne 1942; Nyyssönen et al. 1967, 1971), especially if a population is ordered or spatially autocorrelated (Cochran 1977, p. 221; Sukhatme et al. 1984, p. 417). Conversely, a laser-based line intercept sample ignores various sources of error, including the regression error discussed above, leading to variance underestimates. The objectives of this study are twofold:

(1) Determine and quantify the effects of including regression error in the variance of laser-based estimates of biomass.

(2) Test three different weighted variance estimators, namely a simple random sampling estimator (SRS), a successive differences estimator (SD, Lindeberg 1924; 1926; Guest 1951), and Newton's Method (NM, T. Gregoire, personal communication, 2005), and compare these to the empirical systematic sample variances to identify the most accurate estimator.

LIS Sampling Error without and with Regression Error:

County and statewide standard errors are calculated without and with regression error. Regression error is introduced by adding random error to the regression estimate of biomass at the segment level, where a segment is a short section (≤40 m) of a linear laser transect that is completely contained within a single cover type. Regression error is assumed to be normally distributed, and the size of the error is based on the standard deviation of the regression residuals, as follows:

    $\mathrm{reg}_{jks} = N[0,1] \cdot s_{j}^{\mathrm{reg}}$,

where $s_{j}^{\mathrm{reg}}$ is the regression RMSE, and $j$ is the land cover subscript. Weighted sums of segment estimates are averaged to calculate flight line estimates, and weighted flight line estimates are averaged to calculate regional estimates. Weights are related to lengths of segments or flight lines.
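The error-injection and aggregation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function names, the data layout, the segment values, and the RMSE of 30 t/ha are hypothetical, a single cover type is assumed, and "weights related to lengths" is interpreted here as simple length-weighted means.

```python
import numpy as np

def add_regression_error(pred, rmse, rng):
    """Perturb segment-level biomass predictions with N(0,1) * RMSE noise."""
    pred = np.asarray(pred, dtype=float)
    return pred + rng.standard_normal(pred.shape) * rmse

def weighted_mean(values, weights):
    """Weighted mean; here the weights are segment or flight line lengths."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * values) / np.sum(weights))

def regional_estimate(lines, rmse, rng):
    """Aggregate segments -> flight lines -> region.

    lines: list of (segment_biomass_t_ha, segment_length_m) array pairs,
           one pair per flight line (hypothetical layout).
    rmse:  regression RMSE for the cover type (assumed single cover type).
    """
    line_means, line_lengths = [], []
    for biomass, length in lines:
        noisy = add_regression_error(biomass, rmse, rng)
        line_means.append(weighted_mean(noisy, length))   # segment level
        line_lengths.append(float(np.sum(length)))
    return weighted_mean(line_means, line_lengths)        # flight line level

# Hypothetical data: two flight lines, segments no longer than 40 m.
lines = [
    (np.array([120.0, 95.0, 140.0]), np.array([40.0, 25.0, 40.0])),
    (np.array([80.0, 110.0]), np.array([30.0, 40.0])),
]
rng = np.random.default_rng(0)
no_error = regional_estimate(lines, rmse=0.0, rng=rng)     # LIS estimate only
with_error = regional_estimate(lines, rmse=30.0, rng=rng)  # one noisy realization
```

Repeating the noisy call many times and comparing the spread of the resulting estimates with the error-free case is one way to gauge how much the regression error adds to the sampling error.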
When standard errors based on strictly linear predictive models are compared with ln-ln model standard errors, the ln-ln standard errors are, on average, 4-5 times larger than the comparable linear results. The addition of linear regression error adds, on average, 2-10% to the LIS error. The addition of back-transformed ln-ln regression error adds, on average, 20-40%. Though the ln-ln regression models had uniformly higher R² values, results suggest that, at least with the models developed in this study and with the procedures employed to process the flight line data, the use of ln-ln models to predict biomass leads to inflated variances, poorer cross-validation accuracy (Nelson et al. 2004), and excessive bias. These results reflect the fact that small residual errors can grow significantly when the ln(biomass) estimates are back-transformed.

Considering a strictly linear, predictive regression model, 95% confidence limits are on the order of 5-10 t/ha for the forested cover classes if an area the size of Delaware is transected with flight lines spaced 2 km apart, i.e., 28 flight lines. In general, the more ubiquitous the cover type on the landscape, the smaller the standard error of the estimate.

Estimating Regional Sampling Error: Three Estimators

Results for the three variance estimators (SRS, SD, and NM) were compared to empirical results to see how well the estimators tracked the systematic sample results, with regression error included. The results indicated that, for study areas between 2500 and 5000 km², the weighted simple random sampling estimator, averaged across cover type and sampling intensity, tracked the empirical standard errors within ~0-20%. The SRS estimator, within these areal bounds, is conservative. Below 2500 km², unfortunately, the SRS estimator quickly becomes pathologically nonconservative. The weighted successive differences estimator was the most accurate of the three considered on study areas below 2500 km².
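For orientation, the standard unweighted forms of the SRS and successive differences estimators of the variance of a mean can be sketched as below. The study used weighted versions whose exact form is not given in this abstract, so the function names, the omission of weights and of any finite population correction, and the flight line values are illustrative assumptions only. Newton's Method is left out because its formula is not described here.

```python
import numpy as np

def srs_var_of_mean(y):
    """Simple random sampling estimator: s^2 / n (no finite population correction)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return float(np.var(y, ddof=1) / n)

def sd_var_of_mean(y):
    """Successive differences estimator: sum((y[i+1] - y[i])^2) / (2 n (n - 1)).

    Assumes y is ordered as the flight lines are laid out on the ground.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.diff(y)  # differences between adjacent flight lines
    return float(np.sum(d ** 2) / (2.0 * n * (n - 1)))

# Hypothetical flight line means (t/ha) with a smooth spatial trend.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
srs_se = np.sqrt(srs_var_of_mean(y))
sd_se = np.sqrt(sd_var_of_mean(y))
```

With these trended values the SRS variance is 0.5 and the SD variance is 0.1: differencing adjacent lines removes most of the trend, which is why the two estimators can diverge on ordered or spatially autocorrelated populations.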
The SD estimator was consistently conservative throughout the range of areas considered in this study. The SD estimator overestimated the systematic sample standard errors, including regression error, by 10-33%. Trends suggest that the weighted SRS estimator should be considered on areas exceeding 5000 km², though this observation is based on extrapolation.

LITERATURE CITED

1. Cochran, W.G. 1977. Sampling Techniques, 3rd ed. John Wiley & Sons, New York. 428 pp.
2. DeVries, P.G. 1986. Sampling Theory for Forest Inventory. Springer-Verlag, New York. 399 pp.
3. Guest, P.G. 1951. The Estimation of Standard Error from Successive Finite Differences. Journal of the Royal Statistical Society, Series B (Methodological) 13(2): 233-237.
4. Kaiser, L. 1983. Unbiased Estimation in Line Intercept Sampling. Biometrics 39: 965-976.
5. Lambert, M.-C., C.-H. Ung, and F. Raulier. 2005. Canadian national tree aboveground biomass equations. Canadian Journal of Forest Research 35: 1996-2018.
6. Lindeberg, J.W. 1924. Über die Berechnung des Mittelfehlers des Resultates einer Linientaxierung. Acta Forestalia Fennica 25: 3-22. (in German)
7. Lindeberg, J.W. 1926. Zur Theorie der Linientaxierung. Acta Forestalia Fennica 31(6): 3-9. (in German)
8. Næsset, E. 2002. Predicting forest stand characteristics with airborne scanning laser using a practical two-stage procedure and field data. Remote Sensing of Environment 80: 88-99.
9. Næsset, E., and T. Gobakken. 2005. Estimating forest growth using canopy metrics derived from airborne laser scanner data. Remote Sensing of Environment 96(3-4): 453-465.
10. Nelson, R.F., G. Parker, and M. Hom. 2003a. A Portable Airborne Laser System for Forest Inventory. Photogrammetric Engineering and Remote Sensing 69(3): 267-273.
11. Nelson, R.F., E.A. Short, and M.A. Valenti. 2003b. A Multiple Resource Inventory of Delaware Using Airborne Laser Data. BioScience 53(10): 981-992.
12. Nelson, R.F., M. Valenti, A. Short, and C. Keller. 2004. Measuring Biomass and Carbon in Delaware Using an Airborne Profiling LiDAR. Scandinavian Journal of Forest Research 19: 500-511. [Erratum. 2005, 3: 283-284.]
13. Nelson, R.F., C. Keller, and R. Ratnaswamy. 2005. Locating and Estimating the Extent of Delmarva Fox Squirrel Habitat Using an Airborne LiDAR Profiler. Remote Sensing of Environment 96(3-4): 292-301.
14. Nyyssönen, A., P. Kilkki, and E. Mikkola. 1967. On the Precision of Some Methods of Forest Inventory. Acta Forestalia Fennica 81. 60 pp.
15. Nyyssönen, A., P. Roiko-Jokela, and P. Kilkki. 1971. Studies on improvement of the efficiency of systematic sampling in forest inventory. Acta Forestalia Fennica 116. 26 pp.
16. Osborne, J.G. 1942. Sampling Errors of Systematic and Random Surveys of Cover-type Areas. Journal of the American Statistical Association 37(218): 256-264.
17. Sukhatme, P.V., B.V. Sukhatme, S. Sukhatme, and C. Asok. 1984. Sampling Theory of Surveys with Applications. Iowa State University Press, Ames, Iowa. 526 pp.
18. Wiant, H.V., and E.J. Harner. 1979. Percent Bias and Standard Error in Logarithmic Regression. Forest Science 25(1): 167-168.