Towards Small-footprint Airborne LiDAR-assisted Large Scale Operational Forest Inventory - A case study of integrating LiDAR data into Forest Inventory and Analysis in Kenai Peninsula, Alaska

Yuzhen Li

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2009

Program Authorized to Offer Degree: College of Forest Resources

University of Washington Graduate School

This is to certify that I have examined this copy of a doctoral dissertation by Yuzhen Li and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Chair of the Supervisory Committee: Gerard F. Schreuder

Reading Committee: Gerard F. Schreuder, David G. Briggs, Eric C. Turnblom

Date: __________

In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of the dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to ProQuest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform”.
Signature __________ Date __________

University of Washington

Abstract

Towards Small-footprint Airborne LiDAR-assisted Large Scale Operational Forest Inventory - A case study of integrating LiDAR data into Forest Inventory and Analysis in Kenai Peninsula, Alaska

Yuzhen Li

Chair of the Supervisory Committee: Professor Gerard F. Schreuder, College of Forest Resources

Many studies have already demonstrated that small-footprint airborne LiDAR has the capacity to measure forest biophysical characteristics and that the accuracy of the results is relatively consistent and independent of specific LiDAR systems. However, most previous studies were conducted in small research areas. To date, there have been relatively few examples of applying LiDAR to large-area operational forest inventory because of the high cost and the lack of methodology and expertise. The main objective of this research is to develop processing and analysis techniques to facilitate the use of small-footprint LiDAR data for large-scale Forest Inventory and Analysis (FIA) on the Kenai Peninsula of Alaska. Results from this study indicate that it is possible to develop parsimonious regression models for different forest types using three primary LiDAR metrics: mean height, coefficient of variation of height and canopy point density. LiDAR mean height represents canopy height in the field, coefficient of variation of height represents canopy depth, and canopy point density represents canopy cover. These three LiDAR metrics succinctly describe the 3D canopy structure and have a clear biological interpretation. Forest aboveground biomass models using these three LiDAR metrics have R² values ranging from 0.68 to 0.87 for three different forest types. This research also assessed, through simulation, the effects of plot position error and plot size on these three LiDAR metrics and on predicted forest biomass.
Results show that plot position accuracy and plot size are important factors affecting the accuracy and precision of LiDAR metrics and predicted biomass in heterogeneous forest stands. Results suggest that small position errors are acceptable in homogeneous forest stands, but accurate field plot positions are necessary in heterogeneous forest stands. In the context of FIA, acquiring accurate coordinates for the subplots is not currently part of the standard plot protocol. If it is not possible to obtain accurate GPS locations for each subplot, linking LiDAR data with field measurements using larger plots, each encompassing the four subplots, may provide a way to characterize forest condition at a scale similar to that of the four subplots combined. Finally, maps of predicted plot-level forest height over the whole study region were produced from both LiDAR data and field measurements, and the distribution of predicted stand height from field data is very similar to the distribution of predicted LiDAR mean height. In conclusion, the methodology and results presented in this dissertation demonstrate that it is feasible to integrate LiDAR data with the existing FIA field plot network.

Table of Contents

List of Figures
List of Tables
Chapter 1 Introduction
1.1 Current FIA inventory scheme
1.2 Using airborne LiDAR in forest inventory
1.3 Research objective
Chapter 2 Literature Review
2.1 LiDAR system
2.1.1 Airborne LiDAR system
2.2 Airborne LiDAR application in forestry
2.2.1 Creating Digital Terrain Models in forested area
2.2.2 Deriving forest structure characteristics for forest inventory
2.2.3 Applying LiDAR data in operational forest inventory
Chapter 3 LiDAR-derived Metrics Selection
3.1 Introduction
3.1.1 LiDAR metrics selection
3.1.2 Generality of LiDAR-based forest structure prediction models
3.2 Data and methods
3.2.1 Study sites
3.2.2 LiDAR data
3.2.3 LiDAR metrics selection methods
3.3 Results
3.3.1 LiDAR metrics selected by principal component analysis
3.3.2 Model comparisons
3.4 Discussion
Chapter 4 Effects of Plot Position Error and Plot Size on LiDAR-derived Metrics and Predicted Biomass
4.1 Introduction
4.2 Data and methods
4.2.1 Unsupervised classification
4.2.2 The contagion spatial variation index
4.2.3 Simulation
4.3 Results
4.3.1 LiDAR patch classification and spatial variation
4.3.2 Effects of plot location error and plot size on LiDAR-derived metrics
4.3.3 Effects of plot location error and plot size on predicted biomass
4.4 Discussion
Chapter 5 Forest Height Prediction from Field Measurement and LiDAR Data via Spatial Models
5.1 Introduction
5.2 Study area and data description
5.3 Methods
5.4 Results
5.4.1 Empirical semivariogram model fitting
5.4.2 Spatial prediction
5.4.3 Difference in predicted plot-level heights between field-based measurements and LiDAR-based measurements
5.5 Discussion
Chapter 6 Conclusions
List of References

List of Figures

Figure 1.1 The FIA national hexagon array (Bechtold and Patterson 2005)
Figure 1.2 FIA Phase 2 plot design (Bechtold and Patterson 2005)
Figure 3.1 Location of three study sites (denoted by black stars)
Figure 3.2 Results of plot-level LiDAR-based estimation of aboveground biomass for Capitol Forest (CF), Mission Creek (MC), Kenai Peninsula (KE) and the combined dataset with mean height, coefficient of variation of height and canopy point density as predictor variables
Figure 4.1 LiDAR patches classification results based on 5 m × 5 m grids without field plot location
Figure 4.2 LiDAR patches classification results based on 5 m × 5 m grids with field plot center indicated by black asterisk
Figure 4.3 LiDAR patches clumped classification results based on 5 m × 5 m grids without field plot location
Figure 4.4 LiDAR patches clumped classification results based on 5 m × 5 m grids with field plot center indicated by black asterisk
Figure 4.5 Contagion value for classified LiDAR patches
Figure 4.6 Mean of the differences between LiDAR-derived mean height from simulated plots and from original plots over 100 simulations
Figure 4.7 Standard deviation of the differences between LiDAR-derived mean height from simulated plots and from original plots over 100 simulations
Figure 4.8 Mean of the differences between LiDAR-derived canopy cover from simulated plots and from original plots over 100 simulations
Figure 4.9 Standard deviation of the differences between LiDAR-derived canopy cover from simulated plots and from original plots over 100 simulations
Figure 4.10 Mean of the differences between LiDAR-derived coefficient of variation of height from simulated plots and from original plots over 100 simulations
Figure 4.11 Standard deviation of the differences between LiDAR-derived coefficient of variation of height from simulated plots and from original plots over 100 simulations
Figure 4.12 Ratio of average residual from simulated plots versus the mean of field-estimated biomass over 100 simulations
Figure 5.1 Map of study area. The picture in the middle is a LANDSAT ETM+ image of the study area, and red circles indicate field plot locations
Figure 5.2 Empirical semivariogram fitting of four aggregated plot-level heights
Figure 5.3 Maps of predicted plot-level heights from field measurements along with their standard error estimates
Figure 5.4 Maps of predicted plot-level heights from LiDAR data along with their standard error estimates
Figure 5.5 Empirical cumulative distribution function and kernel density function of predicted plot-level heights
Figure 5.6 Differences of predicted plot-level heights between field-based measurements and LiDAR-based measurements
Figure 5.7 Empirical probability density function of the differences of predicted plot-level heights

List of Tables

Table 3.1 Summary of field plots for three study sites
Table 3.2 LiDAR system specification for three study sites
Table 3.3 Correlation between principal components and original LiDAR metrics
Table 3.4 Final aboveground biomass regression models from different statistical methods
Table 4.1 Proportion of classes in classified LiDAR patches (5 m × 5 m resolution)
Table 4.2 Biomass regression models for 30 selected FIA plots based on original plot location for two different plot sizes
Table 5.1 Summary of predicted plot-level height

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my major professor, Dr. Gerard F. Schreuder, for his time, patience and contributions to the successful completion of this research. I have been very privileged to receive guidance and support from Dr. Schreuder, even after his retirement. My appreciation also extends to Dr. Hans-Erik Andersen and Mr. Robert McGaughey for their valuable insights, technical support and the constructive challenges they raised throughout this project. They made my graduate study the most enriching professional experience I have had so far. I would also like to thank the members of my supervisory committee, Professor David Briggs and Professor Eric Turnblom, for their long-term professional and personal support. They both served on my Master's committee before. Without their trust and support, I probably would not be here today. My thanks also go to Stephen Reutebuch of the USDA Forest Service for his guidance. I would also like to thank Mr. Ken Winterberger from the Anchorage Forestry Science Laboratory, USDA Forest Service PNW station, for helping with field data and answering many of my questions. I would also like to thank my graduate student friends Jacob Strunk, Tobey Clarkin, Andrew Cooke, Akira Kato, Joowon Park, Tracey Marsh, Alicia Sullivan, Rapeepan Kantavichai and Nora Konnyu for their technical or mental support. I appreciate the financial support provided by the Forest Inventory and Analysis program and the Precision Forestry Cooperative. Finally, my deepest appreciation goes to my family, especially to my husband, my son and my parents, for their love, support and sacrifices over this long journey.
I thank you all from the bottom of my heart.

Chapter 1 Introduction

Operational forest inventory collects information relating to forest resources at the large-landscape, regional, or national level. It includes estimating forest area, species composition, growth, mortality, and harvesting. The USDA Forest Service Forest Inventory and Analysis (FIA) program is one example of operational forest inventory at the national level. The FIA program conducts periodic surveys of forestland in the United States to determine its extent, condition, growth and removal on both private and public lands. As national and regional interest in assessing and monitoring ecosystem sustainability has grown in recent years, FIA is being asked to provide information on the distribution and trends of forest structure and diversity at more detailed levels, with higher accuracy and a shorter inventory cycle (American Forest Council 1992). This represents a significant challenge for traditional ground-based forest inventory, especially in remote areas like Alaska where the costs to install and remeasure ground plots are prohibitive. Airborne Light Detection and Ranging (LiDAR) is an active remote sensing technique that offers the potential to capture detailed three-dimensional information describing tree canopies over large areas in a very short time. It has been reported repeatedly in the literature that airborne LiDAR can produce reliable and useful estimates of tree-level, average plot-level and stand-level forest inventory parameters, such as tree height, tree diameter distribution, stand volume, and biomass, especially in scientific projects conducted in forest types with relatively simple composition and structure, such as boreal forest. Attempts are also being made to introduce airborne LiDAR into operational forest inventory, especially in Nordic countries, with promising results (Naesset et al. 2004, Naesset 2007).
Integrating LiDAR into large-scale operational FIA may provide an efficient way to shorten the inventory cycle and increase the accuracy of estimates compared with the current FIA inventory design.

1.1 Current FIA inventory scheme

The current FIA, as conducted by the US Forest Service, consists of three phases. In Phase 1, remote sensing imagery, mostly aerial photographs, is used to stratify lands into forest and nonforest. Phase 2 involves field data collection. Permanent field plots are established based on a national array of approximately 6000-acre hexagons, each containing one permanent ground plot (Figure 1.1) (Bechtold and Patterson 2005). All vegetated plots that fall on National Forest System land and forested plots on other lands are sampled on the ground. At each ground plot, a cluster of four circular subplots arranged in a fixed pattern is established (Figure 1.2), and tree and plot measurements are collected. Phase 3 is designed to assess forest health by sampling a subset of Phase 2 plots. Approximately one out of every 17 Phase 2 plots is identified as a Phase 3 plot, and measures related to forest ecosystem health, such as tree crown, soil, lichen and down woody debris, are collected.

Figure 1.1 The FIA national hexagon array (Bechtold and Patterson 2005)

Figure 1.2 FIA Phase 2 plot design (Bechtold and Patterson 2005)

The Phase 2 plot design consists of a cluster of four circular subplots (Figure 1.2). The subplots are 1/24 acre in size with a radius of 24.0 ft. The center subplot is subplot 1. The centers of subplots 2, 3 and 4 are located 120.0 ft (horizontal distance) from the center of subplot 1 at azimuths of 0, 120 and 240 degrees, respectively. The center of subplot 1 (plot center) is obtained using a Global Positioning System (GPS) receiver, while the centers of the other subplots are often located using tape and compass based on the horizontal distance and azimuth among subplots. Data on trees with a diameter of 5.0 in or greater are collected on each subplot.
Each subplot contains a microplot of approximately 1/300 acre in size with a radius of 6.8 ft. The center of the microplot is offset 12.0 ft (horizontal distance) at an azimuth of 90 degrees from each subplot center. Saplings and seedlings are measured on each microplot. Field plots may also include annular plots of ¼ acre in size with a radius of 58.9 ft, with the annular plot center coinciding with each subplot center. Annular plots are used to sample rare events, such as very large trees (USDA Forest Service 2003). This clustering pattern was designed to sample more local variation and at the same time overcome the constraints of cost and time associated with simple random sampling, since travel expenditures make up most of the inventory cost (Birdsey 1995, Bechtold and Patterson 2005).

Currently, the FIA inventory design is essentially ground-plot based, and inventory parameters for large areas are estimated by applying statistical estimation models. This involves logistically complex, labor-intensive field work and often incorporates intricate sampling schemes and extrapolation efforts due to the inherent complexity of forest areas. The costs in terms of money, time and labor are huge. In addition, FIA's sampling design has an intensity of one plot per approximately 6000 acres and is assumed to produce a random, equal-probability sample. Traditionally, the FIA program has reported estimates of forest attributes for states and counties. Due to the low sampling rate (4 subplots × 1/24 acre per 6000 acres ≈ 0.003%), the FIA sample design may not be adequate to capture forest spatial variability at large scales. Some studies already indicate that the current FIA plot size is not big enough to capture the density of large trees and snags, species richness and mortality in mature old-growth Douglas-fir stands (Gray 2003). Finally, the number of FIA field plots available for model development is often constrained by accessibility and cost.
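The plot dimensions and sampling rate quoted in this section can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the FIA protocol: the function names are hypothetical, and azimuths are assumed to be measured clockwise from north, which the text does not state explicitly.

```python
import math

ACRE_FT2 = 43560.0  # square feet per acre


def circle_area_acres(radius_ft):
    """Area in acres of a circular plot with the given radius in feet."""
    return math.pi * radius_ft ** 2 / ACRE_FT2


def subplot_centers(distance_ft=120.0, azimuths_deg=(0.0, 120.0, 240.0)):
    """(east, north) offsets in feet of subplots 1 to 4 from subplot 1.

    Assumption of this sketch: azimuths run clockwise from north.
    """
    centers = [(0.0, 0.0)]  # subplot 1 is the plot center
    for azimuth in azimuths_deg:
        a = math.radians(azimuth)
        centers.append((distance_ft * math.sin(a), distance_ft * math.cos(a)))
    return centers


def sampling_fraction(n_subplots=4, subplot_acres=1.0 / 24.0, hexagon_acres=6000.0):
    """Fraction of one hexagon's area covered by the subplots of one plot."""
    return n_subplots * subplot_acres / hexagon_acres


if __name__ == "__main__":
    print(f"subplot area   : {circle_area_acres(24.0):.4f} acre (nominal 1/24)")
    print(f"microplot area : {circle_area_acres(6.8):.5f} acre (nominal 1/300)")
    print(f"annular area   : {circle_area_acres(58.9):.4f} acre (nominal 1/4)")
    for number, (east, north) in enumerate(subplot_centers(), start=1):
        print(f"subplot {number}: east {east:8.2f} ft, north {north:8.2f} ft")
    print(f"sampling rate  : {100 * sampling_fraction():.4f} %")
```

The nominal areas are rounded values: a 24.0 ft radius gives slightly less than 1/24 acre, and the sampling rate works out to about 0.0028%, which rounds to the 0.003% quoted above.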
1.2 Using airborne LiDAR in forest inventory

LiDAR actively transmits beams of light toward an object of interest and receives the light that is scattered and reflected by objects in its path. The round-trip travel time between transmission and reception is used to calculate the distance (range) to the object: the range is half the travel time multiplied by the speed of light. Unlike passive remote sensing, LiDAR does not image reflected or emitted solar radiance from objects in a given scene; instead, it systematically emits near-infrared laser pulses and records the georeferenced x, y, and z coordinates and intensity of the reflections, resulting in a high-density and high-accuracy 3D point cloud (Flood and Gutelius 1997).

In a forested area, the ability of some laser pulses to penetrate partly into and possibly through the forest canopy to produce several separately recordable reflections provides the theoretical basis for analysis of three-dimensional (3-D) forest structure using LiDAR measurements (Ackermann 1999). The 3-D characteristics of LiDAR data make it possible to measure the vertical dimension of the canopy, which is difficult to measure in ground surveys or using aerial photographs. At the same time, it is easy to filter out the ground returns in LiDAR data to avoid mixing ground and canopy reflections, which is a common problem in 2D photographs and satellite images. In addition, compared to field-based inventory, LiDAR data has other advantages, such as short data acquisition and processing time, extensive area coverage, precise georeferenced location and highly accurate measurements. All these features make it very attractive to the forestry community.

In recent years, the use of airborne LiDAR to describe forest structure characteristics has been widely studied. Numerous studies have shown that mean tree height and canopy height distribution can be directly retrieved from LiDAR data at the plot level.
Other important structure characteristics, such as stem volume, basal area, stand density, aboveground biomass and canopy fuel parameters, can be estimated by regression techniques with acceptable accuracy and precision (Naesset 2002, Andersen 2003, Holmgren 2004, Maltamo et al. 2004, Naesset et al. 2004, Andersen et al. 2005). There have also been some attempts to extract individual tree attributes from LiDAR measurements (St-Onge 2000, Young et al. 2000, Hyyppa et al. 2001, Lim et al. 2001, Magnussen et al. 2001, Popescu et al. 2002, 2003, Holmgren et al. 2003, Brandtberg 2007), including species classification (Brandtberg et al. 2003), individual tree measurements (Persson et al. 2002) and growth (Yu et al. 2004a). Many experiments have already demonstrated that LiDAR has the capacity to measure forest biophysical characteristics and that the accuracy of the results is relatively consistent and independent of specific LiDAR systems.

Before the transition from research to practical application, more comprehensive research efforts are needed to assess LiDAR performance over large areas. Studies over large areas are necessary to determine the actual accuracy of predictions of forest stand attributes using LiDAR measurements. However, it is unusual to have accurately georeferenced field plots available over large regions. The USDA Forest Service FIA provides a unique opportunity because FIA establishes and maintains a nationwide field plot network and collects field data using the same field protocol. In addition, FIA ground plots are distributed over all forested areas in the nation (with a few exceptions, such as interior Alaska and Hawaii), cover a complete range of forest conditions, and are measured over time.
Field measurements collected by FIA can be used to develop and validate LiDAR analysis models, whereas LiDAR techniques extend data that were previously available only for a few ground plots to a much larger region, providing information on spatial variability that is hard to capture using field plots alone.

However, there are some problems in integrating LiDAR data with FIA field plots. The common procedures for using LiDAR data require accurate locations for field plots. Traditional ground-based FIA plots are not designed to provide spatially explicit information, and the plot location information recorded in the field only serves as an approximate reference for locating the plot for the next visit. The accuracy of these locations is usually poor. In addition, because of canopy interference with GPS reception, it is often difficult to obtain accurate ground positions, especially under dense canopy. Errors of 10 meters in current FIA plot records are not uncommon, and some plots may have position errors as high as 50 meters (Reutebuch et al. 2005). For example, investigators in the North Central FIA unit found that the average separation distance of 1145 remeasured FIA plots was 13.6 m with a standard deviation of 46.2 m (Hoppus and Lister 2006). This makes it difficult to georeference LiDAR data with FIA field plots and presents a challenge to integrating LiDAR data with FIA field measurements. Some studies acknowledge the possible position error of ground reference plots (Brandtberg et al. 2003, Holmgren et al. 2003), and their recommendation is to use high-precision GPS units. For the FIA field plot network, considering the large number of plots involved, the cost of GPS equipment and the weight that field crews must carry are high; thus acquiring accurate plot locations using high-precision GPS is expensive, especially in remote areas.
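To give a sense of scale for these position errors, the sketch below computes how much of a circular subplot is still covered when the recorded center is displaced from the true center. It is a purely geometric illustration using the standard formula for the intersection area of two equal circles. It is not the simulation procedure used later in this dissertation, and the function name is hypothetical.

```python
import math

FT_TO_M = 0.3048


def circle_overlap_fraction(radius, offset):
    """Fraction of a circular plot shared with an equal-radius circle
    whose center is displaced by `offset` (same units as `radius`)."""
    if offset <= 0:
        return 1.0
    if offset >= 2 * radius:
        return 0.0  # the two circles no longer intersect
    # Lens-shaped intersection area of two circles of equal radius.
    lens = (2 * radius ** 2 * math.acos(offset / (2 * radius))
            - 0.5 * offset * math.sqrt(4 * radius ** 2 - offset ** 2))
    return lens / (math.pi * radius ** 2)


# 24.0 ft subplot radius from the FIA Phase 2 design, converted to meters.
subplot_radius_m = 24.0 * FT_TO_M

for error_m in (0.0, 2.0, 5.0, 10.0, 15.0):
    fraction = circle_overlap_fraction(subplot_radius_m, error_m)
    print(f"position error {error_m:5.1f} m -> overlap {fraction:.2f}")
```

Because the subplot radius is only about 7.3 m, a 10 m error leaves roughly one fifth of the true subplot inside the recorded footprint, and an error of about 14.6 m or more leaves none. Whether that matters for the derived metrics depends on how heterogeneous the surrounding stand is.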
1.3 Research objective

This study explores the feasibility of using multiple-return small-footprint LiDAR data to support large-scale forest inventory in south-central Alaska, USA, where field work is expensive and useful satellite imagery is not easy to obtain due to persistent cloud cover. The main objective was to investigate the utility of small-footprint LiDAR data in the context of large-scale assessment and monitoring of forest height and biomass, especially in remote regions such as Alaska. This study examined three important questions regarding the use of LiDAR in the context of operational forest inventory:

1) Is it possible to select a small set of LiDAR metrics that have strong predictive power and also have a clear biological interpretation?
2) What are the effects of plot position error and plot size on derived LiDAR metrics and predicted plot-level biomass?
3) What are the differences between predicted plot-level heights based on operational field inventory and on LiDAR measurements when compared over a large region using spatial modeling?

This study attempts to make a contribution to the broader field of forest measurements through the innovative application and analysis of LiDAR for forest inventory, especially for large-scale operational forest inventory where accurate field plot positions are not available. Results from this study will provide valuable information regarding the usability of LiDAR for the US Forest Service FIA program given the operational constraints of the current FIA field plot design.

The main study area is located west of the Kenai Mountains on the Kenai Peninsula in south-central Alaska. This area covers approximately 3000 square miles, and the primary forest types are white spruce (Picea glauca), black spruce (Picea mariana), paper birch (Betula papyrifera) and mixed spruce and birch. A total of 105 FIA permanent field plots located in this area were used in this study.
In addition, two other small areas were used as supplementary samples. One is a 5.2 km² study area within the Capitol State Forest in western Washington State. This area is dominated by Douglas-fir (Pseudotsuga menziesii) and western hemlock (Tsuga heterophylla). The other area is located in the Mission Creek watershed, in the eastern Cascade Mountains of Washington State. The main species are Douglas-fir and ponderosa pine (Pinus ponderosa) with scattered grand fir (Abies grandis).

The remainder of this dissertation is organized as follows. Chapter 2 reviews the relevant literature on LiDAR systems and the methods used to apply LiDAR in forest measurement and inventory. Chapter 3 describes three methods for selecting LiDAR metrics (stepwise regression, principal component analysis and Bayesian Model Averaging) and presents forest aboveground biomass models using the selected LiDAR metrics. Chapter 4 describes a simulation approach to examine the effects of plot position error and plot size on derived LiDAR metrics and predicted plot-level biomass. Chapter 5 applies spatial modeling techniques to produce maps of predicted plot-level height over western Kenai, and then compares predicted heights from operational field inventory and LiDAR measurements. Chapter 6 summarizes the main conclusions and implications of the research and discusses the limitations of this study.

Chapter 2 Literature Review

2.1 LiDAR system

LiDAR is an active laser remote sensing technology. It transmits laser pulses, typically in the infrared wavelengths, toward an object of interest at high frequencies, and receives the light that is scattered and reflected by objects in its path. The round-trip travel time between transmission and reception is used to calculate the distance (range) to the object: the range is half the travel time multiplied by the speed of light.
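The time-of-flight range calculation can be sketched as follows. Because the pulse travels to the target and back, the one-way range is half the round-trip travel time multiplied by the speed of light. The function names and the 1000 m range are illustrative assumptions, and the vacuum speed of light is used as an approximation of the speed in air.

```python
# Speed of light in vacuum; the slightly lower speed in air is ignored here.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(travel_time_s):
    """One-way distance in meters to a reflecting surface, given the
    round-trip travel time of a laser pulse in seconds."""
    return 0.5 * SPEED_OF_LIGHT_M_PER_S * travel_time_s


def round_trip_time(range_m):
    """Round-trip travel time in seconds for a target at the given range."""
    return 2.0 * range_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # Hypothetical sensor-to-target range of 1000 m.
    t = round_trip_time(1000.0)
    print(f"round trip for 1000 m: {t * 1e6:.2f} microseconds")
    print(f"recovered range      : {range_from_round_trip(t):.1f} m")
    # A timing difference of 1 ns corresponds to about 0.15 m of range.
    print(f"range per nanosecond : {range_from_round_trip(1e-9):.3f} m")
```

The last line shows why timing precision matters: separating two reflecting surfaces that are 0.15 m apart along the beam requires resolving about one nanosecond.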
A typical LiDAR system consists of three main components: a Global Positioning System (GPS) to provide position information, an Inertial Navigation System (INS) for attitude determination and a laser scanner to provide the range from the laser-beam firing point to its footprint (Bang et al. 2008). By varying the wavelength of the light transmitted, pulse frequency and duration, and other factors, LiDAR can be used in a variety of applications to detect numerous substances. Scanning laser systems may be mounted on different platforms: on a tripod (terrestrial LiDAR system), on an aircraft (airborne LiDAR system), or on a satellite (space-borne LiDAR system).

Ground-based laser scanning is used to capture very high-resolution data describing architectural details in construction projects. Ground-based laser scanning systems have been used in forestry research, and they can provide detailed reconstructions of trunk, branch and leaf distribution from which tree locations, diameter and height, timber volume and canopy gap fraction can be quantified (Hopkinson et al. 2004, Danson et al. 2008, Litkey et al. 2008), but the complexity of forest scenes makes analysis very complicated. Space-borne LiDAR systems have often been used in atmospheric research and a few large-scale ecosystem studies (Blair et al. 2001, Lefsky et al. 2002, Boudreau et al. 2008). Due to limited data availability and coarse resolution, there are not many studies that apply space-borne LiDAR data for forest inventory (Pflugmacher et al. 2008, Pang et al. 2008). Airborne LiDAR systems are commercially available and have been used to map and model terrain elevation. In the past two decades, airborne LiDAR systems have been used to model forest canopy structure and function, mostly in scientific research projects (Lefsky et al. 1999, 2002, Næsset 2002, Drake et al. 2003, Holmgren 2004, Lim and Treitz 2004, Maltamo et al. 2004, Means et al. 1999, Næsset et al. 2004, Andersen et al. 2005).
There are also some efforts to promote the use of airborne LiDAR systems in operational forest inventory, especially in Scandinavian countries (Naesset 2007).

2.1.1 Airborne LiDAR system

In airborne laser scanning, a swath of terrain under the aircraft is surveyed through the lateral deflection of the laser pulses and the forward movement of the aircraft. The scanning pattern within the swath is established by an oscillating mirror or rotating prism which causes the pulses to sweep across in a consistent pattern below the aircraft (Baltsavias 1999b). Baltsavias (1999a) and Wehr and Lohr (1999) presented the basic principles and formulas of airborne LiDAR. Baltsavias (1999b) compared laser scanning to photogrammetry in the following aspects: sensors, platforms, flight planning, data acquisition conditions, imaging, object reflectance, automation, accuracy, flexibility and maturity, production time and costs, and concluded that the two technologies are fairly complementary and their integration can lead to more accurate and complete products.

There are two main categories of airborne LiDAR systems: small-footprint discrete-return LiDAR and large-footprint waveform-recording LiDAR. Small-footprint discrete-return LiDAR devices measure either one (single-return systems) or a small number (multiple-return systems) of heights by identifying, in the return signal, major peaks that represent discrete objects in the path of the laser illumination. The distance corresponding to the time elapsed before the leading edge of the peak(s), and sometimes the power of each peak, are the typical measurements recorded by this type of system (Wehr and Lohr 1999). Large-footprint waveform-recording devices record the time-varying intensity of the returned energy from each laser pulse, providing a record of the height distribution of the surfaces illuminated by the laser pulse.
Small-footprint discrete-return LiDAR systems have a small Instantaneous Field Of View (IFOV) which is usually between 0.2m and 0.9m. The small diameter of their footprint and the high repetition rates of these systems together can yield dense distributions of sampled points. Thus, discrete-return systems are preferred for detailed mapping of ground and canopy surface topography (Flood and Gutelius 1997). Another advantage is their ability to aggregate the data over areas and scales specified during data analysis, so that specific locations on the ground, such as a particular forest inventory plot or even a single tree crown, can be characterized. Finally, discrete-return systems are readily available, with ongoing and rapid development.

Large-footprint waveform LiDAR systems have a large IFOV which is usually 5m or larger (although small-footprint waveform LiDAR systems are starting to emerge). Waveform-recording LiDAR systems record the entire time-varying power of the return signal from all illuminated surfaces and are therefore capable of collecting more information on canopy structure than all but the most spatially dense collections of small-footprint LiDAR. In addition, waveform-recording LiDAR integrates canopy structure information over a relatively large footprint and is capable of storing that information efficiently, from the perspective of both data storage and data analysis. Finally, waveform-recording LiDAR is currently being collected globally from the spaceborne ICESat system (Lefsky et al. 2002).

Means et al. (2000) give a good comparison between small- and large-footprint LiDAR by examining them with respect to their design, capabilities and uses, especially in the context of forestry application. The primary differences between small- and large-footprint LiDARs involve the scale and resolution of terrain and vegetation characterization.

The technical capabilities of LiDAR systems have increased rapidly.
Baltsavias (1999a, 1999b) reviewed existing commercial systems and firms ten years ago. All systems have been improved since then. For example, for small-footprint LiDAR systems, the industry standard has advanced from systems emitting 5000 pulses per second and measuring a single return to those emitting between 75,000 and 250,000 pulses per second, and measuring up to seven returns, with most recording the intensity of each return (Moffiet et al. 2005). Today most LiDAR systems can record multiple pulses. Some systems have an integrated digital camera to provide digital images that can be used in bare earth modeling and feature classification procedures.

2.2 Airborne LiDAR application in forestry

Over forested areas, most of the laser pulses are reflected by the leaves and branches of the trees, but a certain fraction of the laser pulses can pass partly or entirely through the forest canopy and reach the forest floor through small gaps in the canopy. Thus it is possible to reconstruct both the three-dimensional structure of the forest canopy and the terrain surface under the canopy using LiDAR point cloud data.

2.2.1 Creating Digital Terrain Models in forested area

To generate Digital Terrain Models (DTM), terrain LiDAR points have to be separated from vegetation LiDAR points. Various filtering algorithms have been proposed. Kilian et al. (1996) generated a DTM based on mathematical morphology operations for comparing height differences. Kraus and Pfeifer (1998) used a discriminate function and introduced linear prediction into the DTM generation. Axelsson (1999, 2000) described a method based on progressive densification of a triangular irregular network (TIN). In this method, a surface is allowed to fluctuate within certain values and points from the point cloud are added to the TIN during iteration; the iterations proceed until no further low ground points can be added. Vosselman (2000) proposed a slope-based filtering method.
The basic idea behind his algorithm is that a large height difference between two nearby points is unlikely to be caused by a steep slope in the terrain; therefore the higher point has a high probability of being a non-ground point, such as a vegetation hit. Wang et al. (2007) introduced a Gaussian-fitting model to identify ground returns. The filtering algorithm used by the commercial software TopScan is an iterative procedure that first computes a rough terrain model from the lowest LiDAR points found in a moving window of a rather large size. All points with residuals exceeding a given threshold are filtered out, and a new DTM is calculated from the remaining points. This step is repeated several times, reducing the window size with each iteration (Petzold et al. 1999).

In terms of terrain model accuracy, Kraus and Pfeifer (1998) report a vertical root mean squared error (RMSE) of 57 cm for a wooded area in Austria. In a study over open areas with flat hard surfaces, Pereira and Janssen (1999) report accuracies of 15 cm. In a study under a conifer forest canopy in western Washington, Reutebuch et al. (2003) reported overall ground surface errors of 22 cm with a standard deviation of 24 cm under a variety of canopy densities. Despite intense efforts in the creation of high-resolution DTMs from LiDAR data, the characterization of terrain topography under dense forest conditions remains challenging.

2.2.2 Deriving forest structure characteristics for forest inventory

There are two main approaches for deriving forest characteristics using small-footprint discrete LiDAR: the individual tree delineation approach and the plot-level regression approach based on the LiDAR canopy height distribution (Reutebuch et al. 2005, Packalen et al. 2008). The former approach is usually used for high resolution LiDAR data with 5-10 LiDAR returns per square meter, and the latter is used for low resolution LiDAR with about one LiDAR return per square meter (Packalen et al. 2008).
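The slope-based filtering idea summarized in Section 2.2.1 can be illustrated with a small sketch. This is a simplified reading of the concept, not Vosselman's published algorithm: a point is rejected as non-ground when some neighbor within a search radius lies lower by more than an allowed slope times the horizontal distance. The function name, the `max_slope` and `radius` values, and the sample points are all hypothetical.

```python
import math


def slope_filter(points, max_slope=0.3, radius=5.0):
    """Label each (x, y, z) point True if it is accepted as ground.

    A point is rejected when any neighbor within `radius` (horizontal
    distance) is lower by more than `max_slope * distance`.
    Brute-force O(n^2) search, adequate only for a small illustration.
    """
    labels = []
    for xi, yi, zi in points:
        is_ground = True
        for xj, yj, zj in points:
            d = math.hypot(xi - xj, yi - yj)
            if 0.0 < d <= radius and zi - zj > max_slope * d:
                is_ground = False
                break
        labels.append(is_ground)
    return labels


# Hypothetical returns: four on gently sloping terrain, one canopy hit.
points = [
    (0.0, 0.0, 100.0),
    (1.0, 0.0, 100.1),
    (2.0, 0.0, 100.2),
    (1.0, 1.0, 112.0),  # about 12 m above its neighbors: vegetation
    (3.0, 0.0, 100.3),
]
labels = slope_filter(points)
print(labels)
```

In practice the allowed slope would need to reflect the steepest terrain expected in the survey area, and a spatial index would replace the brute-force neighbor search.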
2.2.2.1 Individual tree identification and single-tree properties derivation

A common method in individual tree delineation is to detect trees from an interpolated canopy height model by locating local maxima of the height values. After that, trees are segmented around the local maxima using some kind of region growing algorithm (St-Onge 2000, Young et al. 2000, Hyyppa et al. 2001, Magnussen et al. 2001, Popescu et al. 2002, 2003 and Holmgren et al. 2003). Once treetops are located, tree height can be obtained by subtracting the corresponding DTM elevation from the treetop elevation. Tree diameter and crown area can be predicted using their relationship with tree height, and tree volume can be calculated using estimated diameter and height (Hyyppa and Inkinen 1999, Persson et al. 2002). Other individual tree parameters, such as the height to crown, are also derived from LiDAR points (Maltamo et al. 2006, Popescu and Zhao 2008).

Estimated height using individual tree extraction approaches is usually lower than field measured tree height (Nilsson 1996, Persson et al. 2002, Andersen et al. 2006). Hyyppa and Inkinen (1999) reported a standard error of less than 1m for the estimated height of overstory coniferous trees, Persson et al. (2002) reported much less than 1m, and Brandtberg (1999) reported slightly more than 1m for a test using Norway spruce. The tendency to underestimate height is probably because 1) light transmitted by the laser will usually penetrate the outer surface of the tree crowns before a significant return signal is recorded and 2) a large portion of the pulses will be reflected from the lower part of the visible tree crowns.

Factors that influence the quality of the 3D single tree extraction algorithm are the density of the raw point cloud and the forest condition. Higher point density will improve the accuracy of tree extraction, and better results can be expected for less dense forest stands. Magnussen et al.
(1999) proposed that if 6–10 laser hits per tree crown are obtained, individual trees may be detected. The biggest challenge when using a canopy height model to identify individual trees is that neighboring trees are often not separated, so a tree group instead of a single tree is often formed (Young et al. 2000). Moreover, only the dominant tree layer can be detected, and smaller trees in the intermediate and lower height levels cannot be recognized since they are invisible in the canopy height model. Some attempts have been made to improve this. In the study by Maltamo et al. (2004), a theoretical distribution function was used to produce estimates of timber volume and number of stems. Assuming a Weibull distribution of tree height, large trees were obtained with individual tree delineation whereas the number of undetected small trees was predicted from the Weibull distribution. They reported that the accuracy of estimated stand volume and stand density was improved. There are some new efforts to extract individual trees using raw LiDAR points instead of a canopy height model. Wang et al. (2007) claimed that their procedure can detect trees in the lower canopy layer, but they did not test the accuracy due to the lack of field data.

There are some studies on species differentiation using LiDAR data. Brandtberg et al. (2003) classified three deciduous species (oaks, red maples and yellow poplars) using LiDAR intensity data and relative height differences between the first and last vegetation returns. Holmgren and Persson (2004) classified Scots pine and Norway spruce using the structure and shape of the tree crowns and intensity data. Moffiet et al. (2005) conducted exploratory data analysis to assess the potential of laser return type and return intensity as variables for classifying white cypress pine and poplar box.
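The local-maxima step of individual tree detection described at the start of this subsection can be sketched as follows. This is an illustrative simplification under stated assumptions: the canopy height model (CHM) is a small raster already normalized to height above ground, a treetop is any cell strictly higher than all of its eight neighbors, and the `min_height` cutoff and sample raster are hypothetical. The region-growing segmentation that normally follows is not shown.

```python
def find_treetops(chm, min_height=2.0):
    """Return (row, col, height) for CHM cells higher than all 8 neighbors.

    `chm` is a list of equal-length rows of heights above ground (metres).
    Cells below `min_height` are skipped as ground or low vegetation.
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c, h))
    return tops


# Hypothetical 5 x 5 canopy height model with two crowns.
chm = [
    [0.0, 1.0, 1.0, 0.5, 0.0],
    [1.0, 12.0, 8.0, 3.0, 1.0],
    [1.0, 8.0, 6.0, 9.0, 2.0],
    [0.5, 3.0, 9.0, 15.0, 4.0],
    [0.0, 1.0, 2.0, 4.0, 1.0],
]
treetops = find_treetops(chm)
print(treetops)
```

A fixed 3 x 3 window illustrates why neighboring crowns can merge or split: the window size interacts with crown width and raster resolution, which is one reason variable-window approaches appear in the literature.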
Brandtberg (2007) presented a new approach, a directed graph, for tree species classification and tried to develop a theoretical framework based on the laser interaction with trees. An improved classification accuracy of 64% was reported for three leaf-off individual tree species: oak, red maple and yellow poplar. However, Holmgren et al. (2008) reported that laser scanner data alone do not provide enough information to enable tree species classification at the individual tree-level and that the best discrimination can be obtained when using a combination of LiDAR and multispectral data. The identification of a range of species or of distinct trees in more heterogeneous forests has yet to be demonstrated using LiDAR data.

Most individual tree based approaches have been applied in coniferous managed forests, and moderate success has been achieved in delineating trees and predicting certain metrics. However, it is still difficult in natural forests since the distribution of trees among different species and size classes is complex.

2.2.2.2 Plot-level regression-model-based forest structure measurement

Three-dimensional LiDAR data represent measurements of reflecting surfaces within the forest canopy. Canopy structure characteristics, such as canopy height profile and canopy LiDAR point density distribution, have been successfully derived and used to estimate forest stand characteristics, such as basal area, stand density, tree diameter distribution, stand volume, aboveground biomass, and canopy fuel parameters (Lefsky et al. 1999, 2002, Næsset 2002, Drake et al. 2003, Holmgren 2004, Lim and Treitz 2004, Maltamo et al. 2004, Næsset et al. 2004, Andersen et al. 2005, Bollandsas and Naesset 2007). The most popular procedure described in the literature is to apply multiple linear regression techniques to relate the spatial distribution of LiDAR returns to coincident plot-level stand inventory variables.
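To make the notion of plot-level LiDAR metrics concrete, the sketch below computes three summary metrics of the kind named in the abstract of this dissertation (mean height, coefficient of variation of height, and canopy point density) from the return heights of one plot. The exact definitions are assumptions made for illustration only: heights are taken as already normalized to height above ground, "canopy" returns are those above a hypothetical 2 m threshold, and canopy point density is the fraction of all returns that are canopy returns.

```python
import statistics


def plot_metrics(heights, canopy_threshold=2.0):
    """Summarize the return heights (metres above ground) of one plot.

    Returns mean canopy height, coefficient of variation of canopy height,
    and canopy point density (canopy returns / all returns).
    The threshold and these definitions are illustrative assumptions.
    """
    canopy = [h for h in heights if h > canopy_threshold]
    if len(canopy) < 2:
        raise ValueError("need at least two canopy returns")
    mean_h = statistics.fmean(canopy)
    cv_h = statistics.stdev(canopy) / mean_h
    density = len(canopy) / len(heights)
    return {"mean_height": mean_h, "cv_height": cv_h, "canopy_density": density}


# Hypothetical plot: two near-ground returns and three canopy returns.
metrics = plot_metrics([0.0, 0.5, 10.0, 12.0, 14.0])
print(metrics)
```

Metrics like these become the predictor columns of the plot-level regression, with one row per field plot and the field-measured variable (for example biomass) as the response.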
Naesset (2002) presented a two-stage LiDAR-based stand inventory procedure that has been widely adopted. In the first stage, individual canopy height distributions of LiDAR points were created for each training plot, and regression relationships between stand structure variables and LiDAR metrics extracted from the canopy height distribution were developed. In the second stage, all forest stands in the LiDAR acquisition area were divided into a grid of cells with cell size equal to the training plot size. Based on the developed empirical regression model, stand structure variables were predicted for each cell, and final stand estimates were computed as the average or total values of the individual cell predictions.

In studies carried out across a wide variety of different forest types in North America, Japan, Europe, Australia, and Canada, LiDAR-derived canopy structure metrics have been shown to be highly correlated with forest inventory variables, and most reported coefficients of determination are greater than 0.6 (Lefsky et al. 1999, 2002, Næsset 2002, Drake et al. 2003, Wulder 2003, Holmgren 2004, Lim and Treitz 2004, Maltamo et al. 2004, Næsset et al. 2004, Andersen et al. 2005, Tickle et al. 2006, Bollandsas and Naesset 2007). Experiences from leading research conducted in Scandinavia indicate that laser-based stand inventory is able to produce stand information with accuracies superior to those of conventional methods based on fieldwork and aerial photo-interpretation (Naesset 2007).

Besides deriving inventory variables in structurally homogeneous single-layer forests, there are some attempts to quantify forest structure in heterogeneous multi-layer forests. Zimble et al. (2003) studied the possibility of using height variance from LiDAR data and field data to distinguish single-story and multi-story forests. Riano et al. (2003) used cluster analysis to separate understory trees and overstory trees. Maltamo et al.
(2005) applied a histogram thresholding method to the height distribution of laser hits to separate different tree layers.

Current research shows a trend toward a combination of laser data and spectral imagery. LiDAR can be used to locate and describe properties of tree crowns while spectral imagery is used to enhance species classifications (Lim et al. 2001, Holmgren et al. 2003, St-Onge 2003, Popescu et al. 2004). Using a combination of LiDAR data and aerial photographs, Packalen and Maltamo (2007) tried the nonparametric k-most similar neighbor method to predict species-specific forest variables such as volume, stem density, basal area, median diameter and tree height. The results showed similar accuracy to results from the current stand-level field inventory in Finland. The characteristics of Scots pine and Norway spruce were predicted more accurately than those of deciduous trees.

2.2.2.3 Comparing individual tree and plot-level regression model approaches

Individual tree identification-based approaches offer the possibility of providing individual tree-level parameters for all trees and are operationally preferable since most existing inventory systems rely on individual tree information. In addition, individual tree approaches represent an improvement over current inventory sampling methods based on data collected on a small number of intensively monitored plots and then applied broadly across the forested landscape. However, there is still a long way to go before the individual tree-based approaches can be used in operational practice.

Plot-level regression modeling is straightforward and suitable for operational use. Many plot-level regression models have been developed for different forest types. One disadvantage is that accurate ground plot position is required for model development.

Packalen et al. (2008) compared an individual tree detection approach and plot-level regression modeling using the same dataset from managed boreal forests in Finland.
They concluded that both approaches produced equally accurate estimates of stem volume and Lorey’s height. However, stem density estimates were less accurate with both approaches. In particular, the individual tree approach had a large bias and RMSE and underestimated stand density.

2.2.3 Applying LiDAR data in operational forest inventory

So far LiDAR applications in forestry are concentrated largely in scientific research projects. Application of LiDAR for operational forest inventories has been relatively limited because of the high cost involved, and the lack of methodology and expertise. The increasing availability of commercial LiDAR systems, decreasing cost, and recognition of the wide range of information that can be obtained from LiDAR data have led to an increase in utilization. Operational applications have occurred in recent years, especially in Northern Europe.

The first operational application in the world was developed in Norway by Prof. Erik Naesset from the Agricultural University of Norway (Naesset 2002, 2004). This method is now commercially marketed and implemented in Norway. A survey company in Norway called Prevista (http://www.prevista.no/) offers services using LiDAR for large area operational forest surveys, and it claims to deliver many important forest stand characteristics at a competitive price, such as volume per acre, mean diameter, diameter distribution, dominant and mean height, basal area and number of stems. It has completed several projects for Norway and Sweden. The method they use is a two-stage procedure (Naesset 2002). In the first stage, georeferenced field plots and LiDAR plots were used to develop stratum-specific empirical relationships between various metrics derived from the laser data and tree characteristics measured in the field. Such relationships are extrapolated in the second stage to provide corresponding estimates for each stand.
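The two-stage procedure can be sketched in a few lines. This is a deliberately reduced illustration, not the method as implemented by Naesset or by any survey company: it uses a single LiDAR metric and ordinary least squares in stage one, then averages cell-level predictions within a stand in stage two. All function names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Stage 1: ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def stand_estimate(cell_metrics, intercept, slope):
    """Stage 2: predict every grid cell in a stand and average the results."""
    predictions = [intercept + slope * m for m in cell_metrics]
    return sum(predictions) / len(predictions)


# Hypothetical training plots: LiDAR mean height (m) vs. field volume (m3/ha).
plot_mean_height = [5.0, 10.0, 15.0, 20.0]
plot_volume = [50.0, 100.0, 150.0, 200.0]
intercept, slope = fit_line(plot_mean_height, plot_volume)

# Hypothetical stand tiled into three cells the same size as the plots.
stand_volume = stand_estimate([8.0, 12.0, 16.0], intercept, slope)
print(intercept, slope, stand_volume)
```

Matching the cell size to the training plot size, as the procedure specifies, keeps the metric distributions in stage two comparable to those the regression was fitted on.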
Their method is intended for use in area-based inventories where the aim is to provide estimates of growth and volume in each stand for the purposes of forest management planning. So far six projects have been conducted using this method, with the largest project covering a total area of 49,000ha (Naesset 2007). It was reported that the differences between LiDAR predicted and ground reference values are from -0.58 to -0.85m for mean height (standard deviation: 0.64 to 1.01m), -0.60m to -0.99m for dominant height (sd: 0.67 to 0.84m), 0.15 to 0.74cm for mean diameter (sd: 1.33 to 2.42cm), 34 to 108ha-1 for stem number (sd: 97 to 466ha-1), 0.43 to 2.51m2ha-1 for basal area (sd: 1.83 to 3.94m2ha-1), and 5.9 to 16.1 m3ha-1 for volume (sd: 15.1 to 35.1m3ha-1) (Naesset 2004). This procedure depends on precise locations of field plots in the first stage (Naesset 2002, 2004).

The main conclusion from Nordic countries is that the tested procedures, although slightly different between countries and validated with data from different laser instruments, seem to be robust for use in practical inventories over large areas, at least if the forest is dominated by coniferous species. Topographic variability and variability in laser sampling density seem to have limited impact on the applicability of the stand-based procedures. The bias seems to be at an acceptable level, and the precision for most of the evaluated stand characteristics is higher than that obtained using traditional inventory methods. The methods are also superior to conventional inventory methods as far as inventory costs and data utility are concerned (Næsset et al. 2004).

In the USA, Parker and Evans (2004) proposed a double sampling method of using LiDAR for forest inventory, and they applied this method in a 1,200 acre forest in Louisiana (Parker and Glass 2004, Parker and Mitchel 2005) and a 5,000 acre timberland in central Idaho (Parker and Evans 2004).
Unlike Naesset’s methods, Parker’s method is essentially individual-tree based. Individual trees were selected from LiDAR data using a focal filter procedure from a smoothed LiDAR canopy surface, and tree height was calculated as the difference between the interpolated canopy and DTM surfaces (Parker and Mitchel 2005). Relationships between Diameter at Breast Height (DBH) and height, estimated from field data, were applied to LiDAR tree heights to predict DBH for LiDAR trees. Basal area and volume were then calculated on both coincident LiDAR and field plots, and they were used as auxiliary variables in the double-sampling to predict the variable of interest, such as volume of the total area. They reported that there was no statistically significant difference in adjusted mean volume estimates between high-density (four hits per 1m2) and low-density (one hit per 1m2) LiDAR data, even though it appears that tree heights from high-density LiDAR more closely approximate ground-measured height. They also reported sampling errors of 8.16% versus 7.60% without height adjustment and 8.98% versus 8.63% with height adjustment on the Louisiana site for high- and low-density LiDAR, and 11.5% sampling error on the Idaho site for low-density LiDAR. In their study, using adjusted height increases the sampling error of the double-sample volume regression estimates, which is somewhat unusual. Their explanation is that the double-sample procedure adjusts the bias between phase 1 (LiDAR plots) and phase 2 (ground plots) volume estimates, and any additional error introduced by the height adjustment affects the regression estimation. The height adjustment removed the bias in the LiDAR height estimate, thus dampening the inherent variation in heights and volume that is normally adjusted by the regression estimator.
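For readers unfamiliar with double sampling, the following sketch shows the standard textbook regression estimator, which may differ in detail from the estimator Parker and colleagues used. A large phase 1 sample carries only the inexpensive auxiliary variable (here, LiDAR-derived plot volume), while a smaller phase 2 subsample also has the ground-measured value; the phase 2 mean is adjusted by the regression slope times the difference in auxiliary means. Variable names and data are hypothetical.

```python
def double_sampling_regression_estimate(x_phase1, x_phase2, y_phase2):
    """Regression estimator of the population mean of y under double sampling.

    x_phase1: auxiliary values on all phase 1 (LiDAR) plots
    x_phase2: auxiliary values on the phase 2 (ground-visited) plots
    y_phase2: ground-measured values on the phase 2 plots
    """
    xbar1 = sum(x_phase1) / len(x_phase1)
    xbar2 = sum(x_phase2) / len(x_phase2)
    ybar2 = sum(y_phase2) / len(y_phase2)
    sxx = sum((x - xbar2) ** 2 for x in x_phase2)
    sxy = sum((x - xbar2) * (y - ybar2) for x, y in zip(x_phase2, y_phase2))
    slope = sxy / sxx
    # Shift the ground mean by how far the subsample's auxiliary mean
    # is from the auxiliary mean of the full phase 1 sample.
    return ybar2 + slope * (xbar1 - xbar2)


# Hypothetical volumes (m3/ha): LiDAR on five plots, ground on three of them.
lidar_all = [100.0, 200.0, 300.0, 400.0, 500.0]
lidar_sub = [100.0, 200.0, 300.0]
ground_sub = [110.0, 210.0, 310.0]
adjusted_mean = double_sampling_regression_estimate(lidar_all, lidar_sub, ground_sub)
print(adjusted_mean)
```

In this toy example the ground subsample happens to cover the lower-volume plots, so the estimator raises the ground mean to account for the higher LiDAR mean over the full phase 1 sample.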
In addition, they used a Monte Carlo simulation to randomly assign LiDAR height measurements to species groups and obtained a volume distribution across species, which is an interesting approach to provide species information in LiDAR data applications. However, they did not provide complete validation for their methods.

Nelson et al. (2004) used first-return data from an airborne laser profiler combined with a video camera to estimate forest volume and biomass by a line intercept sampling method in Delaware (5205 km2). Instead of depending on accurate registration between airborne laser and ground transects, they defined an inventory procedure based on a canopy simulator which uses mapped ground tree data to recreate a canopy model of the ground plots at 0.25 m × 0.25 m resolution. Then linear and multiple regression equations were fit between ground measurements and simulated laser measurements on ground transects. Finally these relationships were used in conjunction with airborne laser data acquired over the study site to produce regional estimates (Nelson et al. 1997). The original test in tropical forest of Costa Rica did not produce good results. On two of three study sites, the laser estimates of basal area, volume and biomass grossly misrepresented ground estimates. Estimates on the third site were within 24% of ground estimates. In the test in Delaware (Nelson et al. 2004), they reported that merchantable volume estimates from the LiDAR profiler were less than US Forest Service estimates by 15% statewide and 22% at the county level. Total above-ground dry biomass estimates were within 22% of USFS estimates at the county level and within 20% at the state level. In general the relationships developed in their study are not as strong as those obtained in most other LiDAR studies. This is probably in part because they used only first returns from a LiDAR profiling system, while other studies use multiple returns from a LiDAR scanning system.
The profiling system only collects data from a narrow strip beneath the platform. Recording only first returns limited their capability to extract an accurate digital terrain model. In addition, the first returns only contain information on canopy height, not on canopy vertical structure.

In Australia, LiDAR, combined with large scale photography, was used to quantify the species distribution and forest structure in a 220,000 hectare area (Tickle et al. 2006). LiDAR and photography were acquired over 150 primary sampling units of size 7.5 ha (500 m × 150 m). Photography was used for species interpretation and forest type stratification, and LiDAR was used for extracting canopy height information. Each of the 150 primary sampling units was then subdivided into 30 systematically numbered secondary sampling units which were 50 m × 50 m in area. Based on the stratification results, and considering access condition and travel time, a total of 34 secondary sampling units were established as ground plots. Regression relationships on height, foliage/branch projected cover and foliage projected cover were developed between coincident LiDAR plots and ground plots at individual tree and stand level. The R2 values were high, and the regression relationships were then extrapolated to the whole area. After comparison with several existing survey systems, they claimed that sampling with photography and LiDAR, either singly or in combination, provided similar estimates at the broad levels but also allowed access to more detailed records. For example, based on species interpretation from large scale photography and structure estimates from LiDAR, they found that Callitris and Angophora dominated higher height classes while Acacias generally dominated the lower height classes. This kind of detailed information is impossible to get from the traditional inventory.

Similar efforts on large-scale LiDAR forest inventory are also reported in Canada (Wulder 2003).
Among all such efforts, the only operational application is the one in Norway. Olsson (2003) attributed Norway’s success to several positive factors, such as a coniferous dominated landscape; researchers working with laser scanning of forest resources; a surveying company owning a modern laser scanner; and the presence of state subsidies to help coordinate large forest mapping efforts among many land owners.

For most other places, despite the promising results from intensive research efforts, there is still a long way to go before applying LiDAR to large-area forest survey operationally. There are several reasons. First, the current cost of LiDAR data is still high, especially for high point density (about $1 per acre). Second, there is a lack of documented relationships between forest canopy structure measured by LiDAR and the forest structure measured in the field over large areas (Lefsky et al. 1999). Most previous studies have a relatively small number of field plots from a restricted area, so the accuracy and precision in predicting forest stand attributes may be overestimated, both because of the small sample size and because of the relative uniformity of species composition and environmental condition over these small study areas. Third, there is a lack of knowledge on the relationship between LiDAR system settings and measurement precision. Studies so far have concentrated on linking coincident LiDAR and field plots together and testing what measurements LiDAR is best suited for (Popescu et al. 2003). There have not been many efforts to study how to take full advantage of the wall-to-wall mapping of forest structure provided by LiDAR, such as assessing spatial variability across a large area to guide field plot distribution.

There are three main advantages when using LiDAR data in large-scale operational forest inventory.
First, LiDAR data can be collected quickly over large areas and is readily amenable to automated processing and analysis, so information can be quickly updated when changes happen. Second, laser data can be used to extend a limited ground sampling effort over areas that may not be easily accessible by ground inventory crews. Third, LiDAR data do not have the saturation problems (Nilson & Peterson, 1994) commonly seen in passively sensed image products.

Chapter 3 LiDAR-derived Metrics Selection

3.1 Introduction

Given the anticipated decline in the cost of LiDAR data collection in the near future, it is expected that LiDAR data will be an increasingly useful tool in forest inventory. In a few years, the use of LiDAR data may be as commonplace as the use of aerial photos and topographic maps today. However, most published LiDAR studies focus on developing empirical regression relationships between LiDAR metrics and forest structure field measures and do not consider LiDAR metric selection and biological interpretation explicitly. In addition, most LiDAR-based models were developed within a relatively small study area. Little work has been done to assess the generality of these models across different forest types and regions. In order for LiDAR data to be useful as an operational tool in forest management, these questions have to be addressed.

3.1.1 LiDAR metrics selection

Forest canopy is the photosynthetic powerhouse of forest productivity and it is closely related to what is commonly referred to as stand structure, defined as the size and number of woody stems per unit area, and related statistics (Oliver and Larson 1996). The close connection between canopy structure and woody stems provides the biological basis for the strong regression relationship between LiDAR-derived (canopy-based) structural metrics and field measurements of stand structure (woody stem-based).
However, due to the complex 3D structure (position and orientation) of forest canopy components and the variation in reflectivity between leaves, branches, and twigs within tree crowns, interactions between canopy and laser pulses are very complex. A few existing studies have attempted to describe the laser photon interaction with forest canopy using SLICER large-footprint waveform data (Ni-Meister et al. 2001), but physical models using small-footprint discrete-return LiDAR data are not yet available, although with increases in pulse rate and data density this might become possible in the future. The most popular procedure described in the literature is to apply multiple linear regression techniques to link LiDAR canopy structure metrics with coincident forest stand field measurements. The large number and complex spatial arrangement of LiDAR returns over forest canopies can result in a large set of potential predictor variables for regression analysis. As an example, a total of 46, 44 and 39 LiDAR metrics were used in the regression models in Næsset (2002), Næsset (2004), and Hall et al. (2005), respectively. LiDAR data are 3D measurements of tree components (stems, branches, and foliage); thus most LiDAR metrics are related to canopy height and are often highly correlated. Regression models with highly correlated independent variables are not stable from a statistical perspective and are hard to interpret from a biological perspective. Model parsimony (minimizing the number of LiDAR metrics and avoiding redundant information) needs to be seriously considered in model building. Næsset et al. (2005) reduced the number of LiDAR variables from 34 original LiDAR metrics to 7, 5 and 3 non-correlated principal components for the young forest, mature forest on poor sites, and mature forest on good sites respectively.
Although this method ensured that there was no correlation between predictor variables, the models are difficult to interpret because principal components are linear combinations of the original LiDAR metrics and do not have a clear physical meaning. Hudak et al. (2006) applied best-subset regression on a suite of 26 predictor variables derived from LiDAR, Advanced Land Imager multispectral and panchromatic data and geographic (X, Y, Z) location, and identified small sets of variables for predicting tree basal area and tree density. Best-subset regression uses the branch-and-bound algorithm to find a specified number of best models containing a specified number of predictor variables. The problem with best-subset regression is that the number of predictors has to be defined in advance, so the best model is the best for a given number of predictor variables rather than among all possible models. Hall et al. (2005) selected LiDAR predictor variables from a pool of 39 LiDAR metrics based on mechanistic hypotheses of why these metrics should be good predictors for each stand structural variable considered. The problem is that the relationships between LiDAR canopy measurement and field stand structure are very complex and it is difficult to validate their mechanistic hypotheses. Lefsky et al. (2005a) explored LiDAR metrics selection using large-footprint SLICER data in western Oregon and Washington states. The correlations between LiDAR canopy structure and field stand structure indices were analyzed using canonical correlation analysis. Mean height, cover (or leaf area index) and height variability were found to represent the fundamental data structure, to contain the majority of data variability, and to be associated with physical characteristics. This method provided a way to place both LiDAR canopy metrics and field stand indices within the overall covariance structure and can be used as a guide for model selection.
Since the description of LiDAR canopy structure developed in their study was designed specifically for large-footprint SLICER waveform data, it is not clear how this method can be adapted to small-footprint LiDAR point data.

3.1.2 Generality of LiDAR-based forest structure prediction models

Many site-specific empirical relationships have been developed across a variety of forest types in both Europe and North America, but published models are very different in terms of model precision, model form and the LiDAR predictor variables included. As high-resolution LiDAR data become increasingly available, there is a great need for simple, accurate, and physically meaningful prediction models that can be used or easily adapted to different regions and sensor systems. Næsset et al. (2005) studied the effect of inventory site on estimating mean tree height, dominant height, mean diameter, stem number, basal area, and timber volume. Separate regression models were developed for each inventory area, as well as common models using two inventory areas simultaneously. They concluded that the coefficients of LiDAR-based models did not differ significantly across the two tested inventory sites except for mean height. Lefsky et al. (2002) found that a single regression model based on mean canopy height and mean canopy cover derived from large-footprint waveform SLICER data was sufficient to model aboveground biomass across three biomes: temperate deciduous, temperate coniferous and boreal coniferous. Lefsky et al. (2005b) compared the relationship between LiDAR-measured canopy structure and coincident field measurements of forest stand structure using data from five locations in the Pacific Northwest of the USA with contrasting composition. Of the 17 stand structure variables considered, they reported eight equations that were valid for all sites, including aboveground biomass and leaf area index.
Instead of dividing data into training and testing samples, data from all study sites were used to develop prediction models, and the predicted values from the overall regression model were compared with the observed values for each site to check the generality of the model. The RMSE values reported in their paper were therefore not truly RMSE, but residual standard deviation. It is highly possible that RMSE values were underestimated and the generality of the overall model was overestimated. In contrast, Drake et al. (2003) reported that the relationships between LiDAR metrics and aboveground biomass were significantly different between two study areas using Laser Vegetation Imaging Sensor (LVIS) data. Apart from the different LiDAR systems applied, the reasons for these inconsistent results are not clear, and further work is needed to investigate the generality of LiDAR-based prediction models.

This study tested three different variable selection methods (stepwise regression, principal component analysis and Bayesian Model Averaging) to develop LiDAR-based aboveground biomass prediction models for three different forest types: a moist Douglas-fir (Pseudotsuga menziesii) / western hemlock (Tsuga heterophylla) forest in western Washington state, a dry Ponderosa pine (Pinus ponderosa) forest in the eastern Cascade Mountains of Washington state, and a birch/spruce forest on the Kenai Peninsula of Alaska. As an exploratory study, the objectives were to investigate: 1) whether it is possible to develop LiDAR-based aboveground forest biomass models with a small set of LiDAR metrics that have a clear biological interpretation; and 2) whether models from different variable selection methods are significantly different in terms of the goodness of the model fit.
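The point made earlier in this section, that an error computed on the same plots used to fit a model tends to understate prediction error, can be illustrated with a minimal sketch. It is not part of the analysis reported here: the data are synthetic, the helper names (`fit_ols`, `in_sample_rmse`, `loocv_rmse`) are hypothetical, and leave-one-out cross-validation is assumed only as one convenient way of holding plots out.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict(beta, X):
    """Predicted response for the rows of X."""
    return beta[0] + np.asarray(X) @ beta[1:]

def in_sample_rmse(X, y):
    """Error measured on the same plots that were used to fit the model."""
    resid = y - predict(fit_ols(X, y), X)
    return float(np.sqrt(np.mean(resid ** 2)))

def loocv_rmse(X, y):
    """Leave-one-out error: each plot is predicted by a model fit without it."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_ols(X[keep], y[keep])
        errors[i] = y[i] - predict(beta, X[i:i + 1])[0]
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic stand-in for (LiDAR metrics, log biomass); not data from this study.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
y_demo = 10.0 + X_demo @ np.array([0.5, 1.0, -0.3]) + rng.normal(scale=0.5, size=30)

rmse_fit = in_sample_rmse(X_demo, y_demo)
rmse_cv = loocv_rmse(X_demo, y_demo)
print(f"in-sample RMSE: {rmse_fit:.3f}, leave-one-out RMSE: {rmse_cv:.3f}")
```

For a least-squares fit, the leave-one-out value cannot be smaller than the in-sample value, which is the direction of the bias discussed above.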
3.2 Data and methods

3.2.1 Study sites

Both LiDAR and field data were collected over three study areas: 1) Capitol Forest in western Washington State, 2) Mission Creek in eastern Washington State, and 3) Kenai Peninsula in south-central Alaska (Figure 3.1). A summary of the field plots for these study sites is shown in Table 3.1.

Figure 3.1 Location of three study sites (denoted by black stars)

Table 3.1 Summary of field plots for three study sites

Study site (a) | Location | Forest type | Stand age (yr) | Plot size (ac) | Number of plots | Trees per acre
CF | Western Washington state | Douglas-fir and western hemlock, moist site | 70 | 0.2 | 98 | 60
MC | Eastern Cascades, Washington state | Douglas-fir and Ponderosa pine, dry site | 25 | 0.62 | 66 | 112
KE | South-central Alaska | Spruce and birch | 74 | 0.167 | 105 | 66

(a) CF: Capitol Forest; MC: Mission Creek; KE: Kenai Peninsula.

Area 1 was a 5.2 km² study area within the Capitol State Forest, western Washington State (122.990W to 123.323W, 46.828N to 47.087N). The area is dominated by Douglas-fir and western hemlock. Additional species include western red cedar (Thuja plicata), red alder (Alnus rubra), and maple (Acer spp.). A total of 98 field inventory 0.2-acre plots were used in this study. Field inventory was conducted in the fall of 1998 and spring of 1999, and measurements acquired at each plot included species and diameter at breast height (DBH) for all trees greater than 14.2 cm in DBH. In addition, total height and height-to-base-of-live-crown were measured on a representative selection (47%) of trees over the range of diameters using a hand-held laser rangefinder. This site is the location of an ongoing experimental silvicultural trial, and a detailed description of the plot measurement protocol can be found in a previous report (Curtis et al. 2004).

Area 2 was located in the Mission Creek watershed, in the eastern Cascade Mountains of Washington State (120.450W to 120.631W, 47.383N to 47.477N).
The main species are Douglas-fir and Ponderosa pine with scattered grand fir (Abies grandis). A total of 66 plots with plot size 50 m by 50 m were used in this study. Data collected at each plot included tree species, DBH, and three height measurements for all trees: height to dead crown, height to live crown, and total height. Canopy closure, the proportion of open sky obscured by vegetation, was measured using a Lemmon Spherical Densiometer Model-A at each sampled grid point. This site is part of an ongoing forest fire and fire surrogates experiment carried out by the US Forest Service, and a detailed description of the plot measurement protocol can be found in a previous paper (Lolley 2005). Field measurements were collected in the summer of 2003 and all trees were stem-mapped in the summer of 2004 using an Impulse laser rangefinder and Trimble GPS system.

Area 3 was located in the west of the Kenai Mountains, Kenai Peninsula, south-central Alaska (149.498W to 151.804W, 59.580N to 61.456N). The area covers approximately 3000 square miles and elevation ranges from sea level to 600 m. Primary forest types are white spruce (Picea glauca), black spruce (Picea mariana), paper birch (Betula papyrifera) and mixed spruce and birch. A total of 105 Forest Inventory and Analysis (FIA) permanent field plots located in this area were used in this study. Each field plot consists of a cluster of four circular subplots approximately 1/24 acre in size with a radius of 24.0 ft. Most plots were measured by FIA crews in the summers of 2001-2003. Trees greater than or equal to 5 inches in DBH were tallied and tree height was measured for several site trees within the plot. For detailed plot and tree measurement information, please refer to the Forest Inventory and Analysis National Core Field Guide (2005).

Plot-level aboveground biomass (including leaves, branches and stem) was estimated for the Capitol Forest and Mission Creek study areas using BIOPAK (Means et al. 1994).
For the Kenai study area, aboveground biomass of individual trees was estimated using equations developed in Washington, Oregon and British Columbia (Shaw 1979, Alemdag 1984, Manning et al. 1984, and Singh 1984) and plot-level aboveground biomass was then calculated by summing all trees within the four subplots.

3.2.2 LiDAR data

3.2.2.1 LiDAR system specification

High-density LiDAR data were acquired over the Capitol Forest study area with a SAAB TopEye system mounted on a helicopter platform in March 1999. LiDAR data for the Kenai Peninsula and Mission Creek study areas were acquired with an OPTECH ALTM 30/70 kHz LiDAR system mounted on a twin-engine Cessna 320 in May and August 2004 respectively. The system settings and flight parameters are shown in Table 3.2.

Table 3.2 LiDAR system specification for three study sites

Study site (a) | LiDAR system | Flying speed (m/s) | Flying height (b) (m) | Swath width (m) | Laser pulse density (points/m²) | Beam footprint (diameter, cm)
CF | SAAB TopEye | 25 | 750 | 70 | 4 | 40
MC | OPTECH ALTM 30/70 kHz LiDAR | 50 | 1200 | 300 | >4 | 84
KE | OPTECH ALTM 30/70 kHz LiDAR | 50 | 1200 | 300 | >4 | 84

(a) CF: Capitol Forest; MC: Mission Creek; KE: Kenai Peninsula.
(b) Flying height is above ground level height.

3.2.2.2 Derivation of LiDAR metrics

For each study site, the vendor provided raw LiDAR point data consisting of XYZ coordinates and return intensity information for all LiDAR points in ASCII text format. In addition, the vendor provided “filtered ground” data representing ground returns isolated via a proprietary filtering algorithm. These filtered ground returns were used to generate a digital terrain model (DTM). All return observations (points) were spatially registered to the DTM according to their coordinates. The relative height of each point was computed as the difference between its Z coordinate and the terrain surface height. Points with a relative height value less than 2 m were excluded to eliminate ground hits and the effect of stones, shrubs, etc.
and the remaining points were considered to be laser canopy hits. A set of variables that describe the canopy height distribution (the 10th, 25th, 50th, 75th and 90th height percentiles, maximum height, mean height and coefficient of variation of height) were calculated from all returns of the laser canopy hits for each field plot. In addition, the canopy point density (D) was calculated as the percentage of the first return canopy hits divided by the total number of first returns (both canopy hits and ground hits). At the Kenai Peninsula study site, LiDAR metrics were calculated at the whole-plot level, which contains all four subplots, instead of at the individual subplot level. The list of plot-level LiDAR metrics was then merged with the plot-level field-based aboveground biomass estimates and imported into the R statistical analysis software.

3.2.3 LiDAR metrics selection methods

3.2.3.1 Stepwise regression

Multiple linear regression models, which include all extracted LiDAR metrics as predictor variables, were first applied for each study area. Based on residual plots and variable transformations suggested by the Alternating Conditional Expectations method (Raftery and Richardson 1996), logarithm-transformed forest biomass was used as the dependent variable. Standard backward stepwise regression was then conducted and the best fitting models were selected based on the lowest Akaike Information Criterion (AIC) value.

3.2.3.2 Bayesian model averaging

Bayesian Model Averaging (BMA) is a Bayesian method that involves averaging over all possible combinations of independent variables and accounts for uncertainty about model form and assumptions (Raftery et al. 2005). Under BMA, all possible models are considered and predictor variables are selected based on the posterior probability.
The posterior distribution of a predictor variable is a weighted average of its posterior distribution under each of the models considered, where a model’s weight is equal to the posterior probability that it is correct, given that one of the models considered is correct. This method avoids the problem that the selected model depends on the order in which variable selection and outlier identification are carried out. Suppose we have data $D$ and we want to make inference about an unknown quantity $\chi$. If there are $p$ possible predictors in the regression model, the number of models $K$ could be quite large (as many as $2^p$). The BMA posterior distribution of $\chi$ is

\[ P(\chi \mid D) = \sum_{i=1}^{K} P(\chi \mid D, M_i)\, P(M_i \mid D), \]

where $P(\chi \mid D, M_i)$ is the posterior distribution of $\chi$ given the model $M_i$, and $P(M_i \mid D)$ is the posterior probability that $M_i$ is the correct model, given that one of the models considered is correct. The posterior model probability is given by

\[ P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{\sum_{j=1}^{K} P(D \mid M_j)\, P(M_j)}, \]

where $P(D \mid M_i)$ is the integrated likelihood of model $M_i$, which can be approximated using the Bayesian Information Criterion (BIC):

\[ \mathrm{BIC}_i = n \log(1 - R_i^2) + p_i \log(n), \]

where $R_i^2$ is the value of R-square, $p_i$ is the number of predictors for the i-th regression model and $n$ is the sample size (Raftery et al. 1997). The sum over all models is approximated by finding the models with the highest posterior probability using the fast leaps and bounds algorithm. As an attempt to select both LiDAR metrics and models at the same time, BMA was used and the model with the highest posterior model probability was selected.

3.2.3.3 Principal component analysis

Principal component analysis describes the variation of a set of multivariate data in terms of a set of uncorrelated variables, each of which is a particular linear combination of the original variables.
The first principal component accounts for as much variation of the original data as possible, the second component is chosen to account for as much of the remaining variation as possible subject to being uncorrelated with the first component, and so on (Everitt and Dunn 2001). Using principal component analysis, a subset of variables that explain the majority of variation can be selected from a large set of (possibly highly correlated) predictor variables. The procedure is as follows:
1) Decide how much of the total variation contained in the original variables needs to be accounted for, where values between 70% and 90% are usually suggested (Jolliffe 1972);
2) Find the number of components which explain such variation. This number indicates the effective dimensionality of the data and is the size of the subset of original variables to be retained; and finally,
3) Select original variables, one associated with each component, as the variable not already chosen which has the greatest absolute coefficient value on the component.

Principal component analysis was used to select LiDAR metrics from the pool of available LiDAR metrics: maximum height, mean height, the 10th, 25th, 50th, 75th and 90th height percentiles, coefficient of variation of height and canopy point density. The minimum variation that needed to be explained was set to 95%. Two kinds of principal component regression models were developed. One used the most significant principal components as predictor variables (denoted as PCA_1) and the other used the LiDAR metrics selected by principal component analysis as predictor variables (denoted as PCA_2).

Separate aboveground biomass regression models were developed using the selection methods described above for each study site, as well as common models using the three study sites simultaneously.
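The metric derivation in Section 3.2.2.2 and the three-step selection procedure in Section 3.2.3.3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in this study (the analysis was carried out in R). The function names `plot_metrics` and `pca_select` are hypothetical; the sample standard deviation and NumPy's default percentile interpolation are assumed for the coefficient of variation and the percentiles; canopy point density is returned as a fraction rather than a percentage; and PCA is assumed to be run on the correlation matrix. The demonstration data are synthetic.

```python
import numpy as np

CANOPY_MIN_HEIGHT = 2.0  # m; returns below this are treated as ground hits

def plot_metrics(heights, is_first_return):
    """Canopy height metrics for one plot.

    heights: height above the DTM (m) of every return in the plot.
    is_first_return: True where the return is a first return.
    """
    h = np.asarray(heights, dtype=float)
    first = np.asarray(is_first_return, dtype=bool)
    canopy = h[h >= CANOPY_MIN_HEIGHT]
    metrics = {
        "Maxht": canopy.max(),
        "Meanht": canopy.mean(),
        # Assumption: sample standard deviation (ddof=1).
        "CV": canopy.std(ddof=1) / canopy.mean(),
    }
    for p in (10, 25, 50, 75, 90):
        metrics[f"P{p}"] = np.percentile(canopy, p)
    # First-return canopy hits divided by all first returns (as a fraction).
    metrics["D"] = np.mean(h[first] >= CANOPY_MIN_HEIGHT)
    return {name: float(value) for name, value in metrics.items()}

def pca_select(X, names, min_explained=0.95):
    """Pick one original variable per retained principal component."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    # Step 2: smallest number of components reaching the threshold.
    k = min(int(np.searchsorted(cum, min_explained)) + 1, len(names))
    # Step 3: per component, the not-yet-chosen variable with the
    # greatest absolute coefficient.
    chosen = []
    for j in range(k):
        for idx in np.argsort(np.abs(eigvecs[:, j]))[::-1]:
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return [names[i] for i in chosen], float(cum[k - 1])

# One small hypothetical plot: five returns, four of them first returns.
demo = plot_metrics([0.5, 1.0, 3.0, 5.0, 7.0], [True, True, True, False, True])

# Synthetic plot-by-metric table: three latent factors mixed into nine columns.
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 3))
X_demo = latent @ rng.normal(size=(3, 9)) + 0.05 * rng.normal(size=(60, 9))
names = ["Maxht", "Meanht", "CV", "P10", "P25", "P50", "P75", "P90", "D"]
selected, explained = pca_select(X_demo, names)
print(demo)
print(selected, round(explained, 3))
```

Choosing variables rather than components keeps the predictors in physical units, which is the motivation for the PCA_2 models described above.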
3.3 Results

3.3.1 LiDAR metrics selected by principal component analysis

Principal component analysis indicated that the first three principal components accounted for more than 95% of the total variation contained in the original set of LiDAR metrics. This holds for each of the three individual study sites and for the combined dataset. Specifically, the first three principal components explained 98.5%, 96.0%, 97.6% and 98.6% of the total variation contained in the original LiDAR metrics for the Capitol Forest, Mission Creek and Kenai Peninsula study sites and the combined dataset respectively. Based on the criteria set for variable selection, this means that only three original LiDAR metrics are needed to explain the majority of the variation contained in the LiDAR data. The coefficients defining the nine principal components with the original LiDAR metrics are shown in Table 3.3. These coefficients were scaled so that they represent correlations between LiDAR metrics and the principal components. For all three study sites, mean height had the largest absolute correlation with the first principal component, coefficient of variation of height had the largest absolute correlation with the second principal component, and canopy point density had the largest absolute correlation with the third principal component. Therefore, mean height, coefficient of variation of height and canopy point density explain most of the variation in the original LiDAR metrics set and they were selected as the most predictive variables for regression model PCA_2. After combining the three study sites, mean height, canopy point density and coefficient of variation of height were selected again as the most predictive variables, but their order is slightly different from that for the individual sites (Table 3.3).
For the individual study sites, coefficient of variation of height had the largest absolute correlation with the second principal component and canopy point density had the largest absolute correlation with the third principal component, while for the combined dataset, coefficient of variation of height had the largest absolute correlation with the third principal component and canopy point density had the largest absolute correlation with the second principal component.

Table 3.3 Correlation between principal components and original LiDAR metrics

Study site  Metric  PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9

CF
  Maxht   -0.368 -0.221 -0.174 0.247 0.595 0.600
  Meanht  -0.392 -0.141 -0.443 0.785
  CV      0.133 -0.776 0.373 -0.468 0.120
  P10     -0.298 0.511 0.720 -0.353
  P25     -0.385 -0.491 -0.486 0.515 0.271 0.129 -0.124
  P50     -0.388 -0.110 -0.155 -0.132 -0.189 -0.751 -0.227 -0.380
  P75     -0.383 -0.170 -0.394 0.795 0.182
  P90     -0.378 -0.206 0.152 -0.381 0.594 -0.315 -0.429
  D       0.128 -0.146 -0.979
MC
  Maxht   -0.334 -0.357 -0.120 -0.473 0.662 -0.279
  Meanht  -0.434 0.119 0.883
  CV      -0.158 0.627 0.408 -0.637
  P10     -0.179 0.546 -0.696 -0.218 0.246 0.247
  P25     -0.360 0.325 0.361 0.493 0.580 -0.135 -0.158
  P50     -0.427 0.293 -0.313 0.604 0.400 -0.301
  P75     -0.427 0.109 -0.331 -0.162 -0.797 -0.141
  P90     -0.405 -0.216 -0.143 -0.343 -0.623 0.423 -0.271
  D       -0.975 0.184
KE
  Maxht   -0.330 -0.391 -0.313 0.789
  Meanht  -0.391 0.167 0.894
  CV      -0.232 -0.486 -0.274 0.100 -0.788
  P10     -0.343 0.334 -0.686 -0.305 0.341 0.256 -0.123
  P25     -0.372 0.244 -0.212 -0.597 -0.447 0.427 -0.111
  P50     -0.386 0.112 0.276 -0.412 0.264 -0.654 -0.292
  P75     -0.386 0.114 0.406 0.171 0.506 0.585 -0.193
  P90     -0.378 -0.187 0.243 -0.169 0.493 -0.629 -0.157 -0.249
  D       -0.191 -0.971
Combined
  Maxht   -0.351 -0.350 0.709 0.492
  Meanht  -0.365 -0.147 0.906
  CV      0.274 -0.843 -0.365 -0.275
  P10     -0.342 -0.135 0.269 -0.869 -0.165
  P25     -0.363 -0.113 -0.566 0.611 0.304 -0.142 -0.220
  P50     -0.364 0.188 -0.231 -0.677 0.505 -0.217
  P75     -0.362 -0.164 0.192 -0.397 -0.195 -0.766 -0.129
  P90     -0.360 -0.227 0.158 0.110 -0.437 0.631 0.362 -0.244
  D       -0.167 0.974 -0.101 -0.105

* PC1: the first principal component; PC2: the second principal component; … ; PC9: the ninth principal component.
Maxht: maximum height of all LiDAR returns above 2 m within plot boundary (m).
Meanht: mean height of all LiDAR returns above 2 m within plot boundary (m).
CV: coefficient of variation of height based on all LiDAR returns above 2 m within plot boundary.
P10: 10th percentile height of all LiDAR returns above 2 m within plot boundary (m); P25: 25th percentile height of all LiDAR returns above 2 m within plot boundary (m); …; P90: 90th percentile height of all LiDAR returns above 2 m within plot boundary (m).
D: canopy point density, calculated as the percentage of the first return canopy hits divided by the total number of first returns (both canopy hits and ground hits). Canopy hits have height greater than or equal to 2 m.

3.3.2 Model comparisons

Table 3.4 lists the final biomass models selected by the different variable selection methods for the individual study sites and the combined study sites. All models have high R-square values ranging from 0.67 to 0.88. R-square values at the Kenai site were lower than those at the Mission Creek site, which were a little lower than those at the Capitol Forest site. Within each study site, Stepwise models had slightly higher R-square values than BMA and PCA models, which means that Stepwise models explained slightly more variation in aboveground biomass than BMA models and models from principal component analysis (PCA_1 and PCA_2). BMA models explained almost the same amount of variation as models containing the first three principal components (PCA_1) and models containing only mean height, coefficient of variation of height and canopy point density (PCA_2).
Despite the similar R-square values within each study site, the number of LiDAR metrics selected by the different statistical methods differed, and Stepwise models tended to contain more LiDAR metrics than BMA and PCA models (Table 3.4). Canopy point density was the only LiDAR metric selected by Stepwise, BMA and PCA_2 models for all three study sites. The coefficients of canopy point density were consistent (i.e. approximately the same) within each study site, but not consistent across study sites. For the other LiDAR metrics selected, the coefficients were very different across selection methods and study sites (Table 3.4), which indicated that the common model using the combined dataset was not good enough to capture individual variation within each study site. Across study sites, PCA_2 models contain the same set of LiDAR metrics: mean height, coefficient of variation of height and canopy point density. Figure 3.2 shows LiDAR-based biomass predictions from PCA_2 models versus field-based biomass estimates for the three separate models and the common model from the combined dataset. As indicated in Figure 3.2, overall model fit was good for both the separate models and the common model, as the relationship was not far from the 1:1 line. However, the coefficients for these three LiDAR metrics were very different across study sites. The coefficient of mean height was 0.04 at Capitol Forest, 0.05 at Mission Creek, 0.34 at Kenai Peninsula and 0.11 for the combined dataset. The model coefficient of the coefficient of variation of height was 0.03 at Capitol Forest, 2.29 at Mission Creek, 9.36 at Kenai Peninsula and 5.66 for the combined dataset. Finally, for canopy point density, the coefficient was 2.35 at Capitol Forest, 1.85 at Mission Creek, 2.61 at Kenai Peninsula and 3.14 for the combined dataset.
Table 3.4 Final aboveground biomass regression models from different statistical methods

Study site | Method | Final model | Number of predictor variables | R2
CF | Step | LN(Bio) (b) = 9.50 + 0.097*Meanht + 1.47*CV - 0.05*P90 + 2.42*D | 4 | 0.88
CF | BMA | LN(Bio) = 9.97 + 0.03*P25 + 2.39*D | 2 | 0.87
CF | PCA_1 (a) | LN(Bio) = 12.46 - 0.02*PC1 + 0.01*PC2 - 0.61*PC3 | 3 | 0.87
CF | PCA_2 | LN(Bio) = 9.88 + 0.04*Meanht + 0.03*CV + 2.35*D | 3 | 0.87
MC | Step | LN(Bio) = 7.97 - 0.03*Maxht + 0.47*Meanht + 4.73*CV - 0.10*P25 - 0.23*P75 + 1.89*D | 6 | 0.76
MC | BMA | LN(Bio) = 8.83 + 0.05*Meanht + 2.29*CV + 1.85*D | 3 | 0.74
MC | PCA_1 | LN(Bio) = 11.70 - 1.0*PC1 - 0.09*PC2 - 0.39*PC3 | 3 | 0.73
MC | PCA_2 | LN(Bio) = 8.83 + 0.05*Meanht + 2.29*CV + 1.85*D | 3 | 0.74
KE | Step | LN(Bio) = 1.58 - 2.72*Meanht + 14.03*CV + 1.48*P25 + 1.48*P75 + 2.90*D | 5 | 0.70
KE | BMA | LN(Bio) = 2.83 + 8.70*CV + 0.25*P75 + 2.70*D | 3 | 0.69
KE | PCA_1 | LN(Bio) = 9.89 - 0.49*PC1 - 0.55*PC2 - 0.41*PC3 | 3 | 0.67
KE | PCA_2 | LN(Bio) = 2.41 + 0.34*Meanht + 9.36*CV + 2.61*D | 3 | 0.68
Combined | Step | LN(Bio) = 5.49 + 0.42*Meanht + 5.18*CV - 0.66*P50 + 0.66*P75 - 0.30*P90 + 2.98*D | 6 | 0.75
Combined | BMA | LN(Bio) = 5.49 + 0.42*Meanht + 5.18*CV - 0.66*P50 + 0.66*P75 - 0.30*P90 + 2.98*D | 6 | 0.75
Combined | PCA_1 | LN(Bio) = 11.23 - 0.42*PC1 + 0.70*PC2 - 0.64*PC3 | 3 | 0.71
Combined | PCA_2 | LN(Bio) = 5.64 + 0.11*Meanht + 5.66*CV + 3.14*D | 3 | 0.72

(a) PCA_1: regression models using the first three principal components (PC1, PC2 and PC3) as predictor variables; PCA_2: regression models using mean height, coefficient of variation of height and canopy point density as predictor variables.
(b) LN(Bio): log-transformed aboveground biomass (kg/ha).
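As a worked illustration of how the PCA_2 equations in Table 3.4 would be applied, the sketch below evaluates each model for one hypothetical plot. The coefficients are copied from the table; everything else is an assumption made for illustration: the function names, the input values (mean height 20 m, CV 0.3, canopy point density 0.8), the reading of LN as the natural logarithm, and the treatment of canopy point density as a fraction between 0 and 1. The back-transformation is a plain exponential with no correction for log-transformation bias.

```python
import math

# (intercept, Meanht, CV, D) coefficients of the PCA_2 models in Table 3.4.
PCA2_MODELS = {
    "CF": (9.88, 0.04, 0.03, 2.35),
    "MC": (8.83, 0.05, 2.29, 1.85),
    "KE": (2.41, 0.34, 9.36, 2.61),
    "Combined": (5.64, 0.11, 5.66, 3.14),
}

def ln_biomass(site, meanht, cv, d):
    """Predicted LN(Bio) for one plot from the site's PCA_2 model."""
    b0, b_h, b_cv, b_d = PCA2_MODELS[site]
    return b0 + b_h * meanht + b_cv * cv + b_d * d

def biomass_kg_per_ha(site, meanht, cv, d):
    """Naive back-transformation; no log-bias correction is applied."""
    return math.exp(ln_biomass(site, meanht, cv, d))

# Hypothetical plot (assumed values, not from the study data).
for site in PCA2_MODELS:
    value = ln_biomass(site, meanht=20.0, cv=0.3, d=0.8)
    print(f"{site}: LN(Bio) = {value:.3f}")
```

Evaluating the same plot under all four models makes the site dependence of the coefficients visible as differences in predicted biomass.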
[Figure 3.2: four scatter-plot panels (CF, MC, KE, Combined); x-axis: Field LN(Biomass); y-axis: Predicted LN(Biomass)]

Figure 3.2 Results of plot-level LiDAR-based estimation of aboveground biomass for Capitol Forest (CF), Mission Creek (MC), Kenai Peninsula (KE) and the combined dataset with mean height, coefficient of variation of height and canopy point density as predictor variables. Lines represent 1:1 relationship.

3.4 Discussion

As expected, there is a significant relationship between field-based aboveground biomass estimates and LiDAR metrics for our three study sites. The biological basis behind this is the ecological and biomechanical links between canopy vertical structure and forest stand structure parameters. From the perspective of tree form and function development, there is usually a connection between the differences in vertical canopy structure and differences in forest biomass both through forest succession and across areas with contrasting environmental conditions. For example, Larson (1963) reported that crown geometry and crown position exert considerable control over bole form and vertical distribution of stem increment. LiDAR sensors directly measure three-dimensional characteristics of forest canopy structure, which provides a good foundation for high correlations between LiDAR metrics and forest biomass. However, trees might develop different stem and crown shape relationships across different environmental conditions and geological regions, even for the same species. This might explain why model coefficients were different across the three study sites.
In this study, mean height, coefficient of variation of height and canopy point density were selected by principal component analysis as the most predictive variables, in the same order, for all three LiDAR datasets tested, and biomass models developed using these three metrics had high R-square values. LiDAR mean height represents canopy height in the field, coefficient of variation of height represents canopy depth, and canopy point density represents canopy cover. These three LiDAR metrics succinctly describe the 3D canopy structure, which explains why they capture the majority of the variation contained in the LiDAR data. From a resource management standpoint, these kinds of LiDAR-based forest structure models would be analogous to the aerial stand volume tables that have been widely used in forest inventory for a long time. Aerial stand volume tables present (in tabular form) the relationship between forest structure variables easily estimated from aerial photos (often mean tree height and percent canopy cover) and stand volume (Paine and Kiser 2003). Because aerial photos are passively sensed, these methods cannot account for variation in stand volume associated with the vertical structure of the canopy. Previous studies have indicated that crown ratio, defined as the ratio of the crown length to total stem length, is an important indicator of the growth history of the tree and significantly influences the allometric scaling between foliage and wood biomass (Makela and Valentine 2006). The use of three-dimensional forest structure information provided by LiDAR has the potential to provide reliable estimation of variation associated with the canopy vertical structure.
The most predictive LiDAR metrics set (mean height, coefficient of variation of height, and canopy point density) found in this study is consistent with the mean tree height and percent canopy cover used in the aerial stand volume table, while the third variable, coefficient of variation of height, is a measure of canopy vertical variation. Because most LiDAR returns are from the dominant trees, especially from the outer canopy of the dominant trees, the distribution of LiDAR return heights is weighted toward the tallest trees. As a result, the LiDAR mean height likely represents the height of the overstory trees. On the other hand, field-derived forest stand structure parameters are calculated using all trees in the plot, so the inclusion of the coefficient of variation of height helps to account for intermediate tree crowns in the overstory and suppressed trees in the understory. Within each study site, LiDAR canopy structure information summarized by mean height, coefficient of variation of height and canopy point density explained a similar amount of variation compared to other models. The predictive ability of these three LiDAR metrics is good for forest biomass across all three forest types, which indicates that the combination of mean height, coefficient of variation of height and canopy point density represents a sufficient and concise quantitative description of the canopy structural content and therefore provides a good representation of stand structure characteristics. Models using these three LiDAR metrics likely capture the fundamental allometric relationships between foliage volumes and stem biomass. This finding is consistent with results from large-footprint SLICER data (Lefsky et al. 2005a), in which mean height, cover or leaf area index and height variability were found to explain most of the variability in forest physical characteristics.
After combining our three datasets, mean height, coefficient of variation of height and canopy point density were again found to explain the majority of variation. However, the coefficients from the combined model were different from the individual models, which suggests that the general model representing all study sites may produce more bias for each individual site than models developed for the specific site. In comparison to stepwise and BMA models, models containing mean height, coefficient of variation of height and canopy point density (PCA_2) explained similar levels of variation in aboveground biomass, but PCA_2 models are relatively simple in form and have a clear biological interpretation. The straightforward prediction models described in this study will greatly facilitate the application of LiDAR to practical forest inventory and management. To characterize forest stand structure, a remote measurement of canopy structure that is rapid, reproducible, and has a spatial resolution commensurate with the scale of structural variation is needed, because existing ground-based approaches are slow, inexact, or highly averaged spatially. As a rapidly growing remote sensing technology, LiDAR offers great potential to capture detailed three-dimensional canopy information rapidly. Findings from this study indicate that it is possible to develop straightforward regression models for different forest types using three primary LiDAR metrics - mean height, coefficient of variation of height and canopy point density. If this holds for a wide range of forest types and LiDAR systems, the operational use of LiDAR for forest inventory may become common in the future.
Chapter 4 Effects of Plot Position Error and Plot Size on LiDAR-derived Metrics and Predicted Biomass

4.1 Introduction

The most popular approach to using small-footprint discrete LiDAR data in a forest inventory is to develop empirical regression relationships between forest stand structure parameters measured in the field and LiDAR metrics extracted from laser canopy hits within corresponding field plots. Accurate field plot location is crucial for successfully linking LiDAR data with field-measured forest biophysical variables. Field plot locations are often obtained using a global positioning system (GPS). The accuracy of GPS locations depends on survey environment, survey parameters and methodology (Piedallu and Gegout 2005). Currently, GPS manufacturers only provide accuracy specifications under clear sky conditions. It is known that the accuracy of GPS under a forest canopy is much lower than under clear sky conditions because trees attenuate or completely block the GPS satellite signals. Some studies have acknowledged the error in the position of ground reference plots in forest conditions (Bolduc et al. 1999, Brandtberg et al. 2003, Holmgren et al. 2003), and their recommendation is to use relatively expensive, survey-grade high-accuracy GPS units. Because of the cost, this is feasible only when a small number of field plots are involved and the plots are in single-layer forests without dense canopy. In large area operational inventories, less accurate, easy-to-carry, recreational-grade GPS receivers are commonly used. For example, FIA plots were originally located with a variety of handheld GPS units, ranging from military-grade Rockwell PLGR units to recreational-grade units, which produced a wide range of positional error, sometimes exceeding 20 meters in the horizontal direction (Hoppus and Lister 2006).
Traditionally, plot location in large scale forest inventory is intended to assist the field crews in relocating the plots, as well as to document their general location, so obtaining plot locations with several meters of error is acceptable. However, in the context of a double-sampling inventory design, error in the position of the ground reference plots may result in a mismatch between field plots and LiDAR data. This mismatch could weaken the empirical relationship between field measurements and LiDAR-derived metrics, which may, in turn, influence the estimation of forest inventory variables. A variety of techniques have been used in the past to obtain more accurate field plot positions. In a study in Australia, the average inventory field plot position error was approximately 10 meters (Hollaus et al. 2007), and the forest inventory data were manually co-registered to the LiDAR data. The position of each sample plot center was adjusted so that the measured single-tree positions best fit the visually detectable tree positions in the LiDAR canopy height model and the measured tree heights best fit the canopy height model. Only 103 of the 143 sample plots could be clearly co-registered to the LiDAR data (Hollaus et al. 2007). In addition, this method is time-consuming and subjective. Gobakken and Naesset (2008) assessed the effects of positioning errors on LiDAR-derived metrics and biophysical stand properties through simulation. Nine different levels of field position errors were assessed. It was reported that the standard deviation of the differences between various LiDAR-derived metrics generated at incorrect plot positions and those generated at the true positions increased with increasing plot position error. However, the mean of the differences between incorrect plot positions and ground-truth positions was not reported.
They also concluded that plot position errors had a larger effect in poor sites with more scattered trees compared to more productive sites with denser canopies and more evenly-distributed trees. Breidenbach et al. (2007) examined the effect of plot position error on predictions of Lorey’s tree height. First, thirty simulated plot locations were generated with plot centers a specified distance away from the true plot center and an angle of 12 degrees between successive simulated plots. Then the 3rd quartile of LiDAR point height was derived for each of the simulated plots. Finally, Lorey’s tree height was calculated from linear regression models with the 3rd quartile height as the covariate. The conclusion was that the root mean squared error for tree height increased only slightly with increasing distance to the plot center.

Is accurate plot location absolutely necessary? Under what conditions can this requirement be relaxed? Depending on forest condition and field plot size, a small position error may be acceptable. Slight mismatches between field plots and LiDAR data may not make a significant difference in the estimation. It may be reasonable to expect that LiDAR-derived metrics are relatively stable with large plot sizes and in homogeneous forest stands, but not with small plots or in heterogeneous stands. This hypothesis could be tested by relocating and resizing LiDAR virtual plots, and then comparing LiDAR metrics extracted from the new plots with those from the original plots. The objective of this chapter is to assess the effects of plot location error and plot size on selected LiDAR-derived metrics and predicted biomass through simulation.

4.2 Data and methods

LiDAR data from ninety-five 300m*300m patches from the western Kenai Peninsula, Alaska were used for this study. In Chapter 3 and Li et al. (2008), it was shown that three LiDAR-derived metrics, mean height, coefficient of variation of height and canopy LiDAR point density, explained the majority of variation contained in LiDAR data.
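As a rough illustration of the three metrics named above, the following sketch computes them from a list of return heights above ground. Several details are assumptions made only for this example and are not taken from the text: the function name `lidar_metrics`, the 2 m height threshold that separates canopy returns from ground and understory returns, the choice to compute mean height and coefficient of variation from canopy returns only, and the expression of canopy point density as the ratio of canopy returns to all returns.

```python
import math


def lidar_metrics(heights, canopy_threshold=2.0):
    """Return (mean height, CV of height, canopy point density) for one plot or cell.

    heights: return heights above ground in meters.
    canopy_threshold: hypothetical cutoff (m) for calling a return a canopy hit.
    """
    if not heights:
        raise ValueError("no LiDAR returns supplied")

    # Assumption: canopy returns are those above the height threshold.
    canopy = [h for h in heights if h > canopy_threshold]

    # Canopy point density as a ratio of canopy returns to all returns.
    density = len(canopy) / len(heights)
    if not canopy:
        return 0.0, 0.0, density

    mean_height = sum(canopy) / len(canopy)
    if len(canopy) > 1:
        variance = sum((h - mean_height) ** 2 for h in canopy) / (len(canopy) - 1)
    else:
        variance = 0.0

    # Coefficient of variation as a ratio (standard deviation / mean).
    cv = math.sqrt(variance) / mean_height
    return mean_height, cv, density
```

The same function could be applied to the returns falling in each grid cell or in each field plot; only the clipping step that selects the returns would differ.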
These three LiDAR metrics were calculated for each patch at 1m*1m, 5m*5m, 10m*10m, and 15m*15m resolution. Then, unsupervised classification was conducted and a spatial variation index, contagion, was computed for each LiDAR patch. Forest stands often contain clusters of trees which are distinctly different, in terms of horizontal and vertical structure, from surrounding trees. This may be due to species differences, age differences, site factors or silvicultural treatments. It is suspected that the spatial variation index would be sensitive to these differences. Based on the contagion value of each LiDAR patch, LiDAR patches were then grouped into three categories: homogeneous, medium, and heterogeneous. Ten FIA plots from each category were selected and simulations were made on these thirty plots.

4.2.1 Unsupervised classification

Registered maps at four different resolutions (1m*1m, 5m*5m, 10m*10m, and 15m*15m) were generated in ENVI© based on LiDAR-derived mean height, coefficient of variation of height, and canopy LiDAR point density. For each resolution, maps were combined into a multiband raster image in which each cell has a three-dimensional attribute vector of LiDAR metrics. The ISODATA algorithm was used to classify the raster image (Tou and Gonzalez 1974). ISODATA first randomly chooses k initial cluster centers, or means, then assigns each pixel to the closest cluster. The new cluster mean vectors are then calculated. The process is iterated and the cluster centers are updated until the "change" between iterations is small. The objective of the ISODATA algorithm is to minimize the within-cluster variability. The ISODATA algorithm implemented in ENVI© follows fourteen principal steps detailed in Tou and Gonzalez (1974). In this study, the number of iterations was set to ten, the minimum number of classes was set to one and the maximum number of classes was set to ten.
The minimum number of pixels in each class was set to one and the maximum class deviation was set to one. The minimum class distance was set to five and the maximum number of merged pairs was set to two. Classified images typically suffer from a lack of spatial coherency (speckle or holes in classified areas). Adjacent similar classified areas were therefore smoothed using morphological operators. The selected classes were clumped together by first performing a dilate operation and then an erode operation on the classified image using a kernel of size 3*3.

4.2.2 The contagion spatial variation index

Here we borrow the concept of spatial variation from landscape ecology to assess the classified LiDAR image. Spatial variation is a function of spatial scale, which encompasses both extent and grain. Extent is the overall area encompassed by an investigation. Grain is the size of the individual units of observation. Any inferences on spatial variability in a system are dependent on the scale and are constrained by the extent and grain of investigation. Here we assess spatial variation in the classified LiDAR map; the extent is the fixed 300m*300m LiDAR patch and the grain is the grid size used to calculate LiDAR metrics. Metrics describing spatial variation usually fall into two categories: those that quantify the composition of features without reference to spatial attributes, and those that quantify the spatial configuration and therefore require spatial information (Cushman & McGarigal 2003). Composition metrics are associated with the variety and abundance of the attribute of interest, such as richness, evenness and diversity. Spatial configuration refers to the spatial character and arrangement, position, or orientation of the experimental units within the landscape (Cushman & McGarigal 2003). Contagion is one of the common metrics used to describe spatial configuration and it is used here to quantify spatial variation contained in the LiDAR patches.
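To make the contagion computation concrete, here is a minimal sketch that evaluates a FRAGSTATS-style contagion index on a small classified grid. It is an illustration only and not the FRAGSTATS implementation: the function name `contagion`, the four-neighbour adjacency rule, and the counting of every adjacency once from each side (including adjacencies between cells of the same class) are assumptions of this example.

```python
import math
from collections import Counter


def contagion(grid):
    """Contagion (0-100) of a classified grid given as a list of equal-length rows."""
    rows, cols = len(grid), len(grid[0])
    n_cells = rows * cols
    class_counts = Counter(value for row in grid for value in row)
    m = len(class_counts)
    if m < 2:
        raise ValueError("contagion is undefined when fewer than two classes are present")

    # g[(i, k)]: number of adjacencies from a cell of class i to a cell of class k.
    g = Counter()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    g[(grid[r][c], grid[rr][cc])] += 1

    # Total adjacencies involving each class i (sum of g_ik over k).
    row_totals = Counter()
    for (i, _k), count in g.items():
        row_totals[i] += count

    total = 0.0
    for (i, _k), count in g.items():
        p_i = class_counts[i] / n_cells          # proportion of the grid in class i
        q_ik = p_i * count / row_totals[i]       # P_i * g_ik / sum_k g_ik
        total += q_ik * math.log(q_ik)

    return (1.0 + total / (2.0 * math.log(m))) * 100.0
```

Class pairs that never touch contribute nothing to the double sum, which matches the usual convention that a zero adjacency proportion is skipped rather than passed to the logarithm.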
Contagion index is defined as

\[
\mathrm{Contagion} = \left[ 1 + \frac{\sum_{i=1}^{m} \sum_{k=1}^{m} \left[ P_i \left( \frac{g_{ik}}{\sum_{k=1}^{m} g_{ik}} \right) \right] \left[ \ln \left( P_i \left( \frac{g_{ik}}{\sum_{k=1}^{m} g_{ik}} \right) \right) \right]}{2 \ln(m)} \right] \times 100
\]

where $P_i$ is the proportion of the LiDAR patch occupied by class i, $g_{ik}$ is the number of adjacencies between pixels of class i and class k, and m is the number of classes present in the LiDAR patch. Contagion approaches 0 when the distribution of adjacencies (at the level of individual cells) among unique classes becomes increasingly uneven and it equals 100 when all classes are equally adjacent to all other patch types. FRAGSTATS (McGarigal and Marks 1995) was used to calculate contagion indexes. For the 95 classified LiDAR maps, the contagion index was calculated at the grid level. After visual examination of the classified maps, the raw LiDAR point clouds, and the photos taken on field visits, LiDAR patches with a contagion index less than 30 were considered to be heterogeneous in terms of spatial variation, those between 30 and 60 were considered to be medium, and those greater than 60 were considered to be homogeneous.

4.2.3 Simulation

Thirty field plots and corresponding LiDAR patches, ten from each spatial variation category, were selected for the simulation study. To investigate the effects of position errors on metrics derived from the laser data, the position errors of field plots were simulated. This was done by introducing a horizontal shift to the field plot coordinates prior to extracting laser points within the new plots. The shift distance from the original field plot position was varied from 1m to 20m in increments of 1m, and the shift direction was chosen randomly. For each fixed shift distance, 100 simulated plots were generated. Mean height, coefficient of variation of height and canopy point density were computed from LiDAR points within the original plot boundary and from LiDAR points within the shifted plot boundary.
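The shifting step can be sketched as follows for a single metric (mean height). This is a simplified illustration under stated assumptions, not the procedure actually used: plots are treated as circles, points are `(x, y, height)` tuples in a projected coordinate system in meters, and the names `shifted_centers`, `clip_heights` and `mean_height_differences` are hypothetical.

```python
import math
import random
from statistics import mean


def shifted_centers(x0, y0, distance, n_sims, rng):
    """Plot centers moved a fixed distance from (x0, y0) in random directions."""
    centers = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        centers.append((x0 + distance * math.cos(theta),
                        y0 + distance * math.sin(theta)))
    return centers


def clip_heights(points, cx, cy, radius):
    """Heights of the points that fall inside a circular plot (assumed plot shape)."""
    r2 = radius * radius
    return [z for x, y, z in points if (x - cx) ** 2 + (y - cy) ** 2 <= r2]


def mean_height_differences(points, x0, y0, radius, distance, n_sims=100, seed=0):
    """Differences (shifted minus original) in mean height for one shift distance."""
    rng = random.Random(seed)
    original = mean(clip_heights(points, x0, y0, radius))
    differences = []
    for cx, cy in shifted_centers(x0, y0, distance, n_sims, rng):
        heights = clip_heights(points, cx, cy, radius)
        if heights:  # skip a shifted plot that happens to contain no returns
            differences.append(mean(heights) - original)
    return differences
```

Calling `mean_height_differences` for each distance from 1 to 20 and summarizing the returned list with `statistics.mean` and `statistics.stdev` would give one mean and one standard deviation per plot and distance; the other two metrics could be handled the same way.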
The differences between corresponding metrics derived from the plots with simulated positions and plots with original positions were computed for each sample plot at each simulation. The mean and standard deviation of these differences were summarized for each spatial variation category. For LiDAR-derived metrics, three plot sizes were tested: 0.04 acre (corresponding to the FIA subplot size), 0.08 acre (corresponding to double the FIA subplot size) and 1.5 acre (corresponding to the large plot which contains four FIA subplots). For each plot size considered, sixty thousand simulated plots (100 simulations * 20 distances * 30 plots) were generated. LiDAR points within these simulated plots were clipped and LiDAR-derived metrics were calculated for each simulated plot.

The effects of plot position errors and plot size on predicted biomass were also assessed using the following procedure: 1) biomass estimates were calculated using field-measured DBH and height; 2) regression models were developed with log-transformed biomass as the dependent variable, and LiDAR-derived mean height, canopy LiDAR point density and coefficient of variation of height from the original plots as predictive variables; 3) biomass was predicted for each simulated plot using the coefficients of the developed regression models and LiDAR-derived metrics from the simulated plots; 4) the residuals between back-transformed predicted biomass and original biomass estimates were calculated; 5) residuals were summarized according to simulated distances and LiDAR patch spatial variation categories. Since FIA only measures trees within the four subplots, only plot sizes of 0.04 acre and 1.5 acre were considered when assessing the effects on predicted biomass.
4.3 Results

4.3.1 LiDAR patch classification and spatial variation

Four different grid cell sizes were used to calculate LiDAR-derived mean height, coefficient of variation of height, and canopy point density: 1m*1m, 5m*5m, 10m*10m, and 15m*15m, and an ISODATA unsupervised classification was implemented on the raster maps at each resolution. The classification maps at 1m*1m had a salt and pepper appearance and the classification maps at 10m*10m and 15m*15m appeared too smoothed (not shown here). Considering that the average tree crown radius in this sample dataset is 2.8m, the classification maps at 5m*5m were selected for further investigation. Figures 4.1 and 4.2 show raw classification results at 5m*5m resolution without and with field plot locations, and Figures 4.3 and 4.4 show the corresponding smoothed classification results. In total, six classes were produced and their proportions of total area are listed in Table 4.1. Since no ground training dataset is available for classification, biological interpretation of the classes is impossible. However, this doesn’t matter in this study, where the main objective of classification is to stratify areas in the LiDAR patches. Since it was found in Chapter 3 that the three LiDAR-derived metrics used for classification capture the majority of variation contained in the LiDAR data, the classification results should provide useful information when stratifying areas within the LiDAR patch. Assuming LiDAR data accurately capture forest stand structure, LiDAR image classification results should indicate forest structure strata in the field. Considering that numerous previous studies have shown that LiDAR can collect highly detailed measurements of three-dimensional forest structure (Lefsky et al. 1999, 2002, Næsset 2002, Drake et al. 2003, Holmgren 2004, Lim and Treitz 2004, Maltamo et al. 2004, Næsset et al. 2004, Andersen et al. 2005, Bollandsas and Naesset 2007), this assumption seems very reasonable.
In addition, field visits to 24 plots during the summer of 2007 confirmed that the classification results make sense. Classification results clearly indicate that some forest areas characterized by the LiDAR patches are more uniform than others (Figures 4.1 and 4.3). In general, the majority of field plots are near the center of LiDAR patches, though there are a few exceptions (Figures 4.2 and 4.4). Some field plots are located within the main class of their LiDAR patches while other field plots are located near the boundary of different classes. The contagion index based on clumped classes ranges from 23.3 to 90.9 with a mean of 44.1. A larger contagion value indicates a more homogeneous spatial arrangement of classes within the boundary. Twelve LiDAR patches have a contagion value less than 30, ten LiDAR patches have a contagion value greater than 60, and seventy-three LiDAR patches have a contagion value between 30 and 60 (Figure 4.5).

Figure 4.1 LiDAR patches classification results based on 5X5m grids without field plot location

Figure 4.2 LiDAR patches classification results based on 5X5m grids with field plot center indicated by black asterisk

Table 4.1 Proportion of classes in classified LiDAR patches (5mX5m resolution)
Class               Proportion (%)
Class 1 (red)       26.54
Class 2 (green)     36.43
Class 3 (blue)      19.09
Class 4 (yellow)    11.74
Class 5 (cyan)      5.39
Class 6 (pink)      0.84

Figure 4.3 LiDAR patches clumped classification results based on 5X5m grids without field plot location

Figure 4.4 LiDAR patches clumped classification results based on 5X5m grids with field plot center indicated by black asterisk

Figure 4.5 Contagion value for classified LiDAR patches (vertical axis: contagion, 0 to 100; horizontal axis: plot ID)

4.3.2 Effects of plot location error and plot size on LiDAR-derived metrics

Thirty field plots and their corresponding LiDAR patches, ten from each category, were selected for simulation. The mean and standard deviation of the differences between LiDAR-derived metrics from simulated plots and from original plots are shown in Figures 4.6-4.11, in which the homogeneous category is colored in orange, the medium category is colored in green and the heterogeneous category is colored in blue.

Figure 4.6 shows boxplots of the average differences between corresponding LiDAR-derived mean height computed from the simulated plots and from the original plots for three different LiDAR patch types over 100 simulations. For plot sizes of 0.04 acre and 0.08 acre, the averaged differences for LiDAR-derived mean height are within ±0.5m for the homogeneous LiDAR patches (orange). This range increases to ±2m for the medium and the heterogeneous LiDAR patches (green and blue). It is clear from Figure 4.6 that as the distance between simulated plot position and original plot position increases, the averaged differences for LiDAR-derived mean height in the homogeneous LiDAR patches are small and stay relatively stable. In contrast, the averaged differences in the medium and heterogeneous LiDAR patches increase until the position error is around 10m. For the largest plot size of 1.5 acre, the averaged differences of LiDAR-derived mean height between simulated plots and original plots are very small in all three types of LiDAR patches and they don’t change much as the position error increases.

Figure 4.6 Mean of the differences between LiDAR-derived mean height from simulated plots and from original plots over 100 simulations.

Figure 4.7 shows the standard deviation of the differences between corresponding LiDAR-derived mean height computed from the simulated plots and from original plots across a sequence of distances over 100 simulations.
The standard deviation of the differences increases as the distance between simulated plot center and original plot center increases. For a fixed position error, the standard deviation of the differences in the homogeneous LiDAR patches is smaller than in the medium and heterogeneous LiDAR patches. Comparing plot sizes of 0.04 acre, 0.08 acre and 1.5 acre, the standard deviation increases as plot size decreases.

Figures 4.8 and 4.9 show the mean and standard deviation of the differences between LiDAR-derived canopy cover (expressed as a percentage) computed from simulated plots and from original plots. LiDAR canopy cover here has the same definition as canopy point density; the only difference is that canopy cover is expressed as a percentage while canopy point density is expressed as a ratio. Similar to mean height, the averaged differences of LiDAR canopy cover in the heterogeneous LiDAR patches are greater than those in the medium and homogeneous LiDAR patches. As the distance between simulated plots and original plots increases, the mean of the differences of LiDAR-derived canopy cover gently increases. As plot size increases, the averaged differences decrease. A similar pattern exists for the standard deviation of the differences in LiDAR canopy cover.

Figures 4.10 and 4.11 show the mean and standard deviation of the differences between LiDAR-derived coefficient of variation of height from simulated plots and from original plots. Similar to LiDAR-derived mean height and canopy cover, the average difference of coefficient of variation of height increases as LiDAR patches become more heterogeneous; it also increases as the distance between simulated plots and original plots increases, and decreases when plot size increases from 0.04 acre to 0.08 acre to 1.5 acre.

Figure 4.7 Standard deviation of the differences between LiDAR-derived mean height from simulated plots and from original plots over 100 simulations.
Figure 4.8 Mean of the differences between LiDAR-derived canopy cover from simulated plots and from original plots over 100 simulations.

Figure 4.9 Standard deviation of the differences between LiDAR-derived canopy cover from simulated plots and from original plots over 100 simulations.

Figure 4.10 Mean of the differences between LiDAR-derived coefficient of variation of height from simulated plots and from original plots over 100 simulations.

Figure 4.11 Standard deviation of the differences between LiDAR-derived coefficient of variation of height from simulated plots and from original plots over 100 simulations.

4.3.3 Effects of plot location error and plot size on predicted biomass

Table 4.2 shows the regression models of biomass based on LiDAR metrics from the original plot position at the subplot level and at the level of the 1.5 acre plots, which contain all four subplots.

Table 4.2 Biomass regression models for 30 selected FIA plots based on original plot location for two different plot sizes
Plot size    Model                                                          R2
0.04 acre    LN(Biomass a) = 7.004 + 0.113*meanht + 0.014*cv + 1.484*d      0.498
1.5 acre     LN(Biomass) = 10.246 + 0.024*meanht - 0.702*cv + 2.700*d       0.634
a Biomass: above ground biomass (kg/ha); meanht: LiDAR-derived mean height (m); cv: coefficient of variation of LiDAR-derived height; d: canopy point density represented by ratio

Figure 4.12 shows boxplots of the ratio of the averaged residual to the mean of estimated biomass from field measurement for each category over simulated distances. The residual is the difference between estimated biomass from field measurement and predicted biomass obtained using the regression models in Table 4.2 and LiDAR-derived metrics from simulated plots. For each LiDAR patch category, the mean of estimated biomass from field measurements is fixed across position error distances and simulations, so changes in the ratio indicate changes in the residual.
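A minimal sketch of how the ratio could be computed from the Table 4.2 models is given below. The coefficients are copied from Table 4.2; everything else is an assumption of this example: the function names, the plain exponential back-transformation without any bias correction, and the sign convention residual = predicted minus field estimate (chosen so that a negative ratio corresponds to under-prediction).

```python
import math
from statistics import mean

# Coefficients (intercept, meanht, cv, d) from Table 4.2.
SUBPLOT_MODEL = (7.004, 0.113, 0.014, 1.484)      # 0.04 acre
LARGE_PLOT_MODEL = (10.246, 0.024, -0.702, 2.700)  # 1.5 acre


def predict_biomass(meanht, cv, d, coefficients=SUBPLOT_MODEL):
    """Back-transformed biomass (kg/ha) from the log-linear model.

    Assumption: simple exp() back-transformation, no bias correction.
    """
    b0, b1, b2, b3 = coefficients
    return math.exp(b0 + b1 * meanht + b2 * cv + b3 * d)


def residual_ratio(field_biomass, predicted_biomass):
    """Average residual divided by the mean field-estimated biomass.

    field_biomass and predicted_biomass are parallel lists, one entry per plot.
    Assumption: residual = predicted - field estimate.
    """
    residuals = [p - f for f, p in zip(field_biomass, predicted_biomass)]
    return mean(residuals) / mean(field_biomass)
```

In a simulation setting, `predicted_biomass` would hold, for each plot in a category, the prediction obtained from the metrics of a shifted plot at one distance, so that one ratio is produced per category, distance and simulation.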
For a plot size of 0.04 acre, the ratio in the homogeneous LiDAR patches is within 15% of the mean of estimated biomass from field measurement and it stays relatively stable as the simulated plot is moved away from the original plot, while ratios in the medium and heterogeneous LiDAR patches show increasing residuals as distance increases. Most residuals in the medium and heterogeneous categories are within 50% of the mean of the estimated biomass from field measurement; however, a few residuals are nearly 100% of the mean biomass (Figure 4.12). In addition, the majority of ratios in the medium LiDAR patches are negative, which indicates that predicted biomass in the medium category tends to be less than the estimated biomass from field measurement. For a plot size of 1.5 acre, all three categories have a small ratio. As position error increases, the ratio in the homogeneous patches barely changes while the ratio in the medium and heterogeneous patches slightly increases.

Figure 4.12 Ratio of average residual from simulated plots versus the mean of field-estimated biomass over 100 simulations

4.4 Discussion

This study presents an automatic procedure to assess the effects of plot position error and plot size on LiDAR-derived metrics. First, grid-level LiDAR metrics were extracted from the 3D LiDAR point cloud, and a multi-band LiDAR image was created with each band consisting of a single LiDAR metric. Then, unsupervised classification was implemented to stratify LiDAR patches. Finally, simulation was conducted and LiDAR metrics from simulated plots and from original plots were compared. One advantage of this method is that only LiDAR data were used to assess spatial variation; no ground information is necessary. LiDAR data usually cover a larger area than field plots, so using LiDAR data for classification could capture more spatial variation information, which could in turn allow LiDAR data to be used as a sampling tool to guide where to locate field plots.
However, further validation is needed before application in operational forest inventories. The results of this study have shown that three important LiDAR-derived metrics - mean height, canopy point density and coefficient of variation of height - are sensitive to plot position error, especially in the LiDAR patches of heterogeneous forests. As the distance between simulated plots and original plots increases, the mean and the standard deviation of the differences between LiDAR-derived metrics from simulated plots and from original plots increase. In addition, plot size greatly affects the differences between these three LiDAR-derived metrics from simulated plots and from original plots. As plot size increases, the mean and standard deviation of the differences decrease. At a plot size of 1.5 acre, the averaged differences of LiDAR-derived metrics between simulated plots and original plots are small for LiDAR patches of homogeneous to heterogeneous forest. The findings are consistent with results from Gobakken and Naesset (2008), who reported that the standard deviation of the differences for the LiDAR height percentiles, LiDAR density-related metrics, maximum laser canopy height, arithmetic mean laser canopy height and coefficient of variation of laser canopy height increased with increasing plot position error. The effects of plot size on LiDAR-derived metrics are not surprising. For a fixed simulated distance, a larger plot size means more overlap area between simulated plots and original plots, which increases the chance of small differences between LiDAR-derived metrics from simulated plots and from original plots. For example, if the simulated plot centers are 5 meters away from the original plot center, the common area is 57.3% for a plot size of 0.04 acre, 69.4% for a plot size of 0.08 acre and 92.7% for a plot size of 1.5 acre.
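The common-area percentages can be checked with the standard formula for the intersection of two equal circles, sketched below. Circular plots are assumed, and the function names are hypothetical. The 57.3% figure for the 0.04 acre subplot is reproduced closely when the subplot is given a 24 ft (7.3152 m) radius, which is also an assumption of this example rather than something stated in the text.

```python
import math

ACRE_M2 = 4046.8564224  # square meters per acre


def radius_from_acres(acres):
    """Radius (m) of a circular plot with the given area in acres."""
    return math.sqrt(acres * ACRE_M2 / math.pi)


def overlap_fraction(radius, shift):
    """Fraction of a circular plot still covered after its center moves by `shift`.

    Uses the lens-area formula for two equal circles whose centers are
    `shift` apart, divided by the area of one circle.
    """
    if shift <= 0:
        return 1.0
    if shift >= 2.0 * radius:
        return 0.0
    x = shift / (2.0 * radius)
    return (2.0 / math.pi) * (math.acos(x) - x * math.sqrt(1.0 - x * x))


if __name__ == "__main__":
    subplot_radius = 7.3152  # assumed 24 ft subplot radius, in meters
    for label, radius in (("0.04 acre", subplot_radius),
                          ("0.08 acre", subplot_radius * math.sqrt(2.0)),
                          ("1.5 acre", radius_from_acres(1.5))):
        print(label, round(100.0 * overlap_fraction(radius, 5.0), 1))
```

Because the overlap depends only on the ratio of shift distance to plot radius, the same function also shows how quickly the common area vanishes for small plots as the shift approaches twice the radius.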
Beyond a distance of about 14 meters, simulated plots no longer overlap the original plots for a plot size of 0.04 acre, whereas the overlap is still 20.7% for a plot size of 0.08 acre and 79.8% for a plot size of 1.5 acre. In this study, LiDAR metrics derived from LiDAR patches of homogeneous forest are found to stay relatively stable as the distance between simulated plot centers and the original plot center increases, but this is not the case in LiDAR patches of heterogeneous forest, especially for the small plot size. It should be emphasized that the definition of homogeneous and heterogeneous is based on the contagion value for the classified LiDAR patch (300m*300m) of forest and not on forest structure measured in the field plots, even though these two are closely correlated. Due to financial limitations and the inability to access some FIA plots, it was not possible to check the consistency of spatial variation between the LiDAR patch and forest structure in the field for all plots studied. However, field visits to 24 field plots during summer 2007 confirmed that the classification results and spatial variation grouping were reasonable. In addition, numerous previous studies have shown that LiDAR-derived metrics can capture three-dimensional forest structure. Thus spatial variation in a LiDAR patch should provide a good indication of the spatial variation in forest structure.

Since LiDAR-derived metrics are shown to be subject to errors if the plot location is not accurate, especially in LiDAR patches of heterogeneous forest, it is likely that stand properties predicted from LiDAR metrics will be affected by plot position error. However, since regression models linking stand properties and LiDAR-derived metrics usually contain several LiDAR metrics and often involve variable transformation, it is hard to quantify the effects of position errors on predicted stand properties.
Nevertheless, biomass estimates from FIA subplot 1 and from large plots which contain four FIA subplots were used to assess the effect of plot location error and plot size. The results indicate that for a plot size of 0.04 acre, the ratio of the averaged residual to the mean of field-estimated biomass is small in the homogeneous LiDAR patches. This means that the averaged predicted biomass in the simulated plots doesn’t differ much from estimated biomass based on field measurements in the original plots. In the LiDAR patches of heterogeneous forest, the ratio increases with increasing distance. This means that the differences between predicted biomass in the simulated plots and estimated biomass based on field measurements in the original plots increase. This is consistent with results from Gobakken and Naesset (2008), who reported that the mean and standard deviation of the differences for mean tree height, stand basal area and stand volume increased with increasing plot position error, especially on poor sites where there were normally few stems. For large plots (1.5 acre), only small ratios were obtained over the whole sequence of simulated distances, indicating that the average predicted biomass doesn’t differ much from the actual field plot biomass even as positional error increases. It should be noted that biomass estimates for the large plot (1.5 acre) are actually the average of the four subplots, not the mean of the whole large plot, since FIA only measures trees within the four subplots.

The findings from this study imply that as plot size increases, the effect of plot location error on LiDAR metrics decreases. Small position errors are acceptable in homogeneous forest stands, but it is important to have accurate plot positions in heterogeneous forest stands with high spatial variation. Whenever possible, using larger plot sizes will reduce the effects of plot position error on LiDAR-derived metrics.
In the context of FIA, if plot location is obtained with recreational-grade GPS, as is the case in most areas, matching LiDAR data with field measurements at the subplot level is risky because of the inaccurate plot locations and the small subplot size, especially in forest stands with high spatial variation. In this case, linking LiDAR data with field measurements using larger plots, which encompass four subplots, may provide a way to characterize forest condition at a scale similar to that of the four subplots combined. If plot location is obtained with survey-grade GPS, however, the use of the smaller subplots is probably better: FIA only measures trees within the four subplots, not within the whole large plot (1.5 acre), so there is a good chance of accurately georeferencing LiDAR data to the field subplots, whereas the large LiDAR plot covers more area than the four subplots combined.

Chapter 5 Forest Height Prediction from Field Measurement and LiDAR Data via Spatial Models

5.1 Introduction

Forest height is a crucial inventory attribute for calculating timber volume, forest biomass and site potential, and for scheduling silvicultural treatment. Measuring height by current photogrammetric or field survey techniques is time consuming and expensive. As an emerging remote sensing tool, airborne LiDAR has been studied as a source of height information. Two different approaches have been used to obtain height measurements from LiDAR data. The first approach is to identify individual trees using a canopy height model and extract their heights, and the second approach is to regress plot-level or stand-level height on LiDAR-derived metrics which describe the vertical and horizontal distribution of the forest canopy (Hyyppä et al. 2000, Næsset 2002, Persson et al. 2002, Maltamo et al. 2004, Andersen et al. 2006).
Many studies have reported that the accuracy of height estimates from LiDAR data is comparable to that of field height measurement, while others found that LiDAR tends to underestimate individual tree height because of the low probability that small-footprint laser pulses will intercept the apex of the tree top (Hyyppä et al. 2000, Gaveau and Hill 2003, Yu et al. 2004, Andersen et al. 2006). Though these results are promising, most of the reported studies were conducted over small areas, and field heights were measured carefully or with more expensive and accurate instruments than the hand-held rangefinder commonly used in forest inventory practice, such as in the US Forest Service Forest Inventory and Analysis (FIA) program. The accuracy of LiDAR-derived height relative to field height measurement is not clearly understood in an operational forest inventory setting.

Another issue with large-area operational forest inventory is the accuracy of plot positions. As stated in Chapter 4, less accurate, easy-to-carry GPS receivers are often used to record the position of field plots. This may lead to inaccurate geographical coregistration of field plots with LiDAR data. If field plots are poorly georeferenced, it is likely that the empirical regression relationship between field height and LiDAR metrics will be affected. Models describing spatial correlations have been used to determine forest biophysical parameters and characterize forest ecosystem structure (Biging and Dobbertin 1995, Stoyan and Stoyan 1998, Stoyan and Penttinen 2000, Lappi 2001, Zawadzki et al. 2005). In this study, instead of linking field plots with LiDAR data directly, a stationary spatial process was assumed for plot-level height, and spatial models were then applied to predict plot-level height at unobserved locations, separately from field inventory data and from LiDAR data.
The particular objective is to produce maps of predicted plot-level height over a large region, and then compare the distributions of heights predicted from operational field inventory and from LiDAR measurements.

5.2 Study area and data description

As in Chapters 3 and 4, a set of 95 FIA field plots located in the west of the Kenai Mountains, Kenai Peninsula, Alaska, is used for this study. Each field plot consists of a cluster of four circular subplots, each approximately 1/24 acre in size with a radius of 24.0 ft, and each subplot contains a 6.8-foot fixed-radius microplot (Bechtold and Patterson 2005). Within each subplot, the heights of trees with a diameter at breast height (DBH) of 5.0 inches or greater were measured; within each microplot, the heights of saplings (1.0-4.9 inches DBH) and seedlings were measured. At each subplot center, a polygon type, which is a unique combination of land cover type, forest density, forest stand size and forest stand origin, was determined and the size of the polygon was recorded (field procedures for coastal Alaska inventory 2003). Two aggregated plot-level heights, plot tree height and stand height, were defined and calculated for the purpose of this study. Plot tree height is defined as the average height of individual trees on the plot with DBH equal to or greater than 5 inches, weighted by polygon area. Stand height is defined as the average height of trees with DBH equal to or greater than 5 inches, saplings and seedlings on the plot, weighted by polygon area.

As described in Chapters 3 and 4, LiDAR data were collected over each field plot and the surrounding area. For each 300 m by 300 m LiDAR patch, a digital terrain model (DTM) was generated using returns classified by the data provider as bare-earth points. All LiDAR returns were then spatially registered to the DTM using their coordinates. The relative height of each return was calculated as the difference between its vertical Z coordinate and the terrain surface height.
Returns with a relative height of less than 2 meters were excluded to eliminate ground returns, rocks, stumps and low vegetation. The remaining points were considered to be laser canopy hits. Finally, the laser canopy hits within the boundary of a 144-foot fixed-radius plot containing the four subplots were extracted, and LiDAR plot mean height and 90th percentile height were calculated. The large plot was used instead of the four individual subplots to decrease the effect of inaccurate field plot positions resulting from poor GPS positions or from azimuth and distance errors when locating the individual subplots.

Figure 5.1 Map of study area. The picture in the middle is a LANDSAT ETM+ image of the study area; red circles indicate field plot locations. The picture on the right shows the LiDAR coverage over one example field plot, colored by height.

5.3 Methods

Four aggregated plot-level heights (plot tree height and stand height from field measurements, LiDAR plot mean height and LiDAR 90th percentile height) from 95 plots were assumed to be a partial realization of a stationary Gaussian process. That is, for the process $\{Z(s) : s \in D \subset \mathbb{R}^2\}$, the vector $Z = (Z(s_1), \ldots, Z(s_n))^T$ has a multivariate normal distribution, where $Z(s)$ represents the aggregated plot-level height at location $s$; $D$ is a fixed subset of 2-dimensional Euclidean space containing the spatial coordinates $s_1, \ldots, s_n$; $s_i$ holds the longitude and latitude coordinates of location $i$; and $n$ is the number of locations, 95 in this case. Stationarity means that for any set of $n$ sites $\{s_1, \ldots, s_n\}$ and any $h \in \mathbb{R}^2$, the distribution of $(Z(s_1), \ldots, Z(s_n))$ is the same as that of $(Z(s_1+h), \ldots, Z(s_n+h))$, which implies that the joint distribution does not change when shifted in space. Further, an isotropic process was assumed, which means that the semivariogram function depends on the separation vector $h$ only through its length $\|h\|$.
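For reference, under the stationarity and isotropy assumptions just stated, the semivariogram takes the standard textbook form below. This is a general definition rather than an equation taken from the author's text; the symbol $\gamma$ is introduced here for the semivariogram, and the second equality uses the fact that a stationary process has a constant mean.

```latex
\gamma(\lVert h \rVert)
  = \tfrac{1}{2}\,\mathrm{Var}\bigl[Z(s+h) - Z(s)\bigr]
  = \tfrac{1}{2}\,\mathrm{E}\bigl[\bigl(Z(s+h) - Z(s)\bigr)^{2}\bigr],
  \qquad s,\; s+h \in D .
```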
For the sake of simplicity, the Gaussian process was assumed to have a constant mean, that is, $Z(s) = \mu + \omega(s) + \varepsilon(s)$, where $\mu$ is the mean component of the model, $\omega(s)$ is a zero-centered stationary Gaussian spatial process which captures the residual spatial association, and $\varepsilon(s)$ is an uncorrelated pure error term. The term $\omega(s)$ introduces the partial sill and range parameters, and $\varepsilon(s)$ adds the nugget effect (Banerjee et al. 2004). Empirical semivariograms of plot-level heights were first fitted with four theoretical parametric models: Gaussian, exponential, Matern and spherical. Model parameters were estimated by restricted maximum likelihood. For detailed differences between theoretical semivariogram models, refer to Banerjee et al. (2004). The theoretical models make it possible to calculate semivariance values for any $h$, which are necessary for other geostatistical calculations and analyses such as kriging. Finally, ordinary kriging was applied and maps of predicted height were produced over the entire region along with the associated standard error. All computations were conducted with the geoR package in R (Ribeiro Jr. and Diggle 2001).

5.4 Results

5.4.1 Empirical semivariogram model fitting

Figure 5.2 shows empirical semivariograms as fit using four theoretical models for both field-measurement-based and LiDAR-based plot-level heights. The semivariogram is the function describing the degree of spatial dependence of aggregated plot-level heights, and the empirical semivariogram is a nonparametric estimate of the semivariogram. The empirical semivariance for a separation vector $h$ is derived by calculating one-half the average squared difference in plot-level height over every pair of plot locations separated by $h$. These values are then plotted against the distances between data pairs. Field plots in our sample were spread over the western Kenai region, with a maximum separation distance of about 163,500 m.
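The empirical semivariance defined above can be sketched in a few lines of Python. This is an illustrative stand-in, not the geoR computation used in the study: the function name, the grouping of pairs into equal-width distance bins, and the arguments `bin_width`, `max_dist` and `min_pairs` are assumptions of the sketch, and coordinates are assumed to be planar (for example, projected meters).

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, max_dist, min_pairs=30):
    """Return (bin_midpoint, semivariance, n_pairs) for each populated distance bin.

    coords : list of (x, y) tuples in planar units
    values : plot-level heights, one per coordinate
    Bins holding fewer than `min_pairs` pairs are dropped.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    sq_diff_sums = [0.0] * n_bins
    pair_counts = [0] * n_bins

    for i, j in combinations(range(len(values)), 2):
        dist = math.hypot(coords[i][0] - coords[j][0],
                          coords[i][1] - coords[j][1])
        if dist >= max_dist:
            continue  # ignore pairs beyond the chosen cutoff distance
        b = min(int(dist // bin_width), n_bins - 1)
        sq_diff_sums[b] += (values[i] - values[j]) ** 2
        pair_counts[b] += 1

    result = []
    for b in range(n_bins):
        if pair_counts[b] >= min_pairs:
            # One-half the average squared difference within the bin.
            gamma = 0.5 * sq_diff_sums[b] / pair_counts[b]
            result.append(((b + 0.5) * bin_width, gamma, pair_counts[b]))
    return result
```

A call such as `empirical_semivariogram(coords, heights, bin_width=5000.0, max_dist=80000.0)` would return one tuple per retained bin; the bin width and cutoff shown here are placeholders, not values reported in the study.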
It is common not to compute the empirical semivariogram up to the largest possible distance, because the shrinking number of available pairs at larger distances increases the variability of the empirical semivariogram. A general recommendation is to compute the empirical semivariogram up to about one half of the maximum separation distance in the data (Schabenberger and Gotway 2005). In addition, since field plots do not fall on a regular grid, the distances between pairs are all different, so the distance range needs to be divided into regular bins, with the reported distance values representing the bin midpoints. At least 30 pairs per bin were used to calculate the empirical semivariogram (Banerjee et al. 2004).

Figure 5.2 Empirical semivariogram fitting of four aggregated plot-level heights

Figure 5.2 shows that the semivariance of the aggregated plot-level heights has a similar pattern over distance. All semivariograms rise up to a distance of around 40,000 m and then level off or decrease, which implies that aggregated plot-level heights from two plots may not be correlated when the plots are more than 40,000 m apart. No semivariogram passes through the origin, which suggests that the nugget effect is nonzero in all cases. The estimated sill differs among the four heights: estimated sill values are 8, 30, 8 and 17 for plot tree height, stand height, LiDAR mean height and LiDAR 90th percentile height, respectively. The estimated sill is the sum of the variation explained by the spatial structure and the nugget effect.

Four different semivariogram models - Gaussian, exponential, Matern and spherical - were fit to the empirical semivariograms. The main differences among these theoretical models are the smoothness of the curve and whether the sill can be reached or not. The smoothness parameter is infinity for the Gaussian model, 1 for the Matern model and 0.5 for the exponential model. These models were fit interactively "by eye" and curves based on the best fitting model parameters were drawn in Figure 5.2.
Within small distances, the spherical curve rises quickly and reaches the plateau in a short distance. The curvature of the Gaussian curve changes sign within a short distance. There is not much difference between the exponential (red dashed line) and Matern (green dotted line) models. From visual examination, none of the models fit well. The better fitting Matern model was finally chosen as the covariance function.

5.4.2 Spatial prediction

Using the Matern covariance model, ordinary kriging was applied, and height predictions and standard errors over the region were computed at 300 m by 300 m pixel resolution. Contour maps of predicted height and standard error are displayed in Figures 5.3 and 5.4, and summary statistics are shown in Table 5.1. Empirical cumulative distribution functions and probability density functions of predicted plot-level height are plotted in Figure 5.5. As expected, predicted plot tree height is higher than predicted stand height, and predicted LiDAR 90th percentile height is higher than predicted LiDAR mean height. The mean of predicted plot tree height is very similar to the mean of predicted LiDAR 90th percentile height, but predicted plot tree height has a much narrower range than predicted LiDAR 90th percentile height. This is confirmed by the distribution curves in Figure 5.5, in which the predicted LiDAR 90th percentile height, represented by the blue line, spreads more widely than the predicted plot tree height, represented by the black line. Predicted stand height has a similar mean and range to predicted LiDAR mean height. In fact, their empirical distributions (green and red lines in Figure 5.5) are very close. However, predicted stand height has a much larger kriging standard error (5.05 to 5.37 m) than predicted LiDAR mean height (1.94 to 2.78 m).
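The ordinary kriging step described in this section can be illustrated with a small, self-contained Python sketch. It is not the geoR code used in the study. To stay within the standard library, the exponential semivariogram (the Matern case with smoothness 0.5) is substituted for the fitted Matern model, and all function names, coordinates and parameter values are hypothetical.

```python
import math


def exp_semivariogram(h, nugget, psill, rng):
    """Exponential semivariogram with nugget; gamma(0) = 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, nugget, psill, rng):
    """Return (prediction, standard_error) at `target` from the observed values."""
    n = len(values)

    def gamma(p, q):
        return exp_semivariogram(math.hypot(p[0] - q[0], p[1] - q[1]),
                                 nugget, psill, rng)

    # Kriging system: semivariances between observations, bordered by the
    # unbiasedness constraint that the weights sum to one.
    a = [[gamma(coords[i], coords[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(c, target) for c in coords] + [1.0]

    sol = solve(a, b)
    weights, lagrange = sol[:n], sol[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, math.sqrt(max(variance, 0.0))
```

Applying `ordinary_kriging` to the center of every 300 m by 300 m pixel would give the kind of prediction and standard error surfaces discussed here; the dense solver is adequate for a sample of 95 plots but is not meant for large data sets.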
Table 5.1 Summary of predicted plot-level height

                                Mean (m)   Median (m)   Minimum (m)   Maximum (m)
Plot tree height                 12.34       12.41         10.12         14.62
Stand height                      7.66        7.72          4.62         10.96
LiDAR mean height                 7.37        7.49          4.12         11.25
LiDAR 90th percentile height     12.00       12.22          6.05         17.18

The contour maps shown in Figures 5.3 and 5.4 reveal similar spatial patterns for height predicted from field measurements and from LiDAR data. A circular area of low height is shown in the north-east of the Kenai Peninsula. Maps of kriging standard error also show the same pattern among the different types of plot-level heights, except that the standard error of predicted stand height is slightly larger. As expected, all standard error maps indicate that the standard error is small near the locations of the observed points.

Figure 5.3 Maps of predicted plot-level heights from field measurements along with their standard error estimates

Figure 5.4 Maps of predicted plot-level heights from LiDAR data along with their standard error estimates

Figure 5.5 Empirical cumulative distribution function and kernel density function of predicted plot-level heights

5.4.3 Difference in predicted plot-level heights between field-based measurements and LiDAR-based measurements

Three groups of comparisons were made: predicted plot tree height vs predicted LiDAR mean height, predicted plot tree height vs predicted LiDAR 90th percentile height, and predicted stand height vs predicted LiDAR mean height. Maps of the differences are shown in Figure 5.6. On average, predicted plot tree height is much higher than predicted LiDAR mean height, with a mean difference of 4.97 m. The differences between predicted plot tree height and predicted LiDAR 90th percentile height, and between predicted stand height and predicted LiDAR mean height, are very small. For the majority of grid cells, these differences are within 1 m, as shown in Figure 5.7.
On average, predicted plot tree height is higher than predicted LiDAR 90th percentile height by 0.34 m, and predicted stand height is higher than predicted LiDAR mean height by 0.28 m.

Figure 5.6 Differences of predicted plot-level heights between field-based measurements and LiDAR-based measurements

Figure 5.7 Empirical probability density function of the differences of predicted plot-level heights

5.5 Discussion

Semivariogram results indicate that aggregated plot-level heights in this dataset appear to be spatially correlated until the distance between locations exceeds about 40,000 m. However, few pairs are located within short distances, because FIA plots are established on an array of approximately 6,000-acre hexagons with each hexagon containing only one plot (Bechtold and Patterson 2005), so the results might have been different if the field plots had a different distribution pattern.

Spatial prediction results show that at 300 m by 300 m pixel resolution, the distribution of predicted stand height is comparable to the distribution of predicted LiDAR mean height, with a mean difference of only 0.28 m, but predicted plot tree height is much higher than predicted LiDAR mean height, with a mean difference of 4.97 m. As described earlier, stand height is calculated from trees, saplings and seedlings, while plot tree height is calculated from trees only. In the literature, mean tree height from field measurements is often reported to be higher than the corresponding averaged laser canopy height, because the majority of laser returns miss the tree tops and are reflected from the sides of the crowns of dominant and co-dominant trees. The magnitude of the difference depends on forest conditions and the LiDAR acquisition specifications used and varies from study to study, but most reported differences are within 3 meters (Næsset et al. 2004).
The large difference between predicted plot tree height and predicted LiDAR mean height in our results probably arises because forests in the western Kenai region have very low stand density (mean stand density is 66 trees per acre), low height and relatively open canopies, so the laser can easily pass through the upper canopy and some laser returns are indeed reflected from saplings and seedlings. This also explains why the average height of trees, saplings and seedlings is very similar to the predicted LiDAR mean height (Li 2008). In addition, field plot height is the weighted average of tree heights from the four surveyed subplots, while LiDAR mean height is the average of the canopy return heights within the large plot containing all four subplots. This might explain why the minimum and maximum of field plot height and LiDAR mean height are different. The mean of predicted plot tree height is comparable to the mean of predicted LiDAR 90th percentile height, but predicted plot tree height tends to have a smaller standard error and range than predicted LiDAR 90th percentile height. Both field-based plot-level height and LiDAR-based height display similar spatial patterns across the whole region.

The choice of the covariance function affects the kriging prediction. Since our primary interest is spatial prediction, the correctness of the covariance model is important. Unfortunately, the selected parametric Matern model does not fit the empirical semivariogram well, even though cross validation results indicate that it is acceptable. Consequently, the spatial prediction results may not be highly accurate. In addition, the distance between field plots is large and the spatial correlation indicated in the semivariogram is not strong, which may also contribute to inaccurate spatial predictions. Nevertheless, the kriging surface maps produced in this study provide a visual display of the spatial distribution of height, which is very useful information for forest inventory and monitoring.
For the sake of simplicity, a constant-mean Gaussian process model was assumed. Considering the large area covered, adding covariates such as weather parameters and site conditions may improve prediction precision.

Reliable tree height mapping is useful to support forest inventory and monitoring. Most vegetation mapping today is conducted by manual photo interpretation or with satellite imagery combined with field surveys. The manual photo interpretation technique is costly and the results depend on the interpreter. Mapping based on optical satellite imagery requires that the area of interest is cloud-free. In Alaska, nearly persistent cloud cover precludes acquisition of useful optical satellite images for a particular time period. A remote measurement of forest structure that is rapid, reproducible and that provides reasonable spatial resolution is needed. As a rapidly growing remote sensing technology, LiDAR offers great potential to capture canopy structure. However, because of the high cost of applying LiDAR data in operational forest inventory, LiDAR data are primarily acquired over specific project areas that are typically much smaller than the spatial extent at which most satellite image datasets are routinely acquired. In addition, it is unusual to have accurately georeferenced field plots available over large regions. These factors may limit the operational use of LiDAR.

In this study, instead of developing regression models that assume accurate field plot locations, a new approach was developed that uses discontinuous LiDAR coverage and spatial models. This new approach produced estimates of plot-level height over a large region using discontinuous LiDAR data that are comparable to those obtained using field inventory. The results are particularly useful for remote areas like Alaska where field work is expensive and optical satellite imagery is not easy to obtain.
This approach could save time when a quick assessment is necessary and greater accuracy is not needed.

Chapter 6 Conclusions

The results presented in this dissertation provide valuable information regarding the utility of LiDAR data for operational forest inventory given the field plot design and inaccurate plot locations. The processing and analysis techniques described help address the methodological challenges of using small-footprint airborne LiDAR to facilitate large-scale operational forest inventory, especially when accurate plot locations are not available. Overall, this research shows that small-footprint airborne LiDAR has a promising future in forest inventory and analysis. Results of this study offer answers to three important questions regarding the use of LiDAR in the context of operational forest inventory:

1) Is it possible to select a small set of LiDAR metrics which have strong prediction power and also have clear biological interpretations?

In Chapter 3 of this dissertation, three variable selection methods - stepwise regression, principal component analysis (PCA), and Bayesian Model Averaging (BMA) - were compared using LiDAR data from three very different forest types: Douglas-fir and western hemlock forest in moist western Washington State, Douglas-fir and ponderosa pine forest in dry central Washington State, and spruce and birch forest on the Kenai Peninsula in south central Alaska. Separate aboveground biomass regression models were developed for each study site, as well as common models using the three study sites combined. Results from principal component analysis indicate that three LiDAR metrics - mean height, coefficient of variation of height and canopy LiDAR point density - explain the majority of the variation contained within a larger set of LiDAR-derived metrics, and this is true for the three different study sites and for the combined dataset. Thus these three metrics were selected as predictive variables for the biomass PCA regression models.
Final biomass models based on the three variable selection methods have R² values ranging from 0.67 to 0.88, and the models contain different sets of LiDAR-derived metrics. Within each study site, the stepwise models had slightly higher R² values than the BMA and PCA models, but the stepwise models tended to contain more LiDAR metrics than the BMA and PCA models. The BMA models had R² values similar to those of the PCA models. However, the BMA models contain different LiDAR metrics across the three study sites, whereas the PCA models contain the same set of LiDAR metrics for all three study sites: mean height, coefficient of variation of height and canopy point density.

It is encouraging that the same set of LiDAR-derived metrics was found to be the most predictive across the different forest types. In the literature, many site-specific empirical relationships have been developed across a variety of forest types, but published models are very different in terms of model form and the LiDAR metrics included. To apply LiDAR data in an operational forest inventory, there is a great need for simple, accurate, consistent, and physically meaningful prediction models that can be used in, or easily adapted to, different regions and sensor systems. Results from this study indicate that it is possible to develop straightforward regression models for different forest types using three primary LiDAR metrics - mean height, coefficient of variation of height and canopy point density. These kinds of LiDAR-based forest structure models would be analogous to the aerial stand volume tables that have been widely used in forest inventory for a long time. If this holds for a wide range of forest types and LiDAR systems, it is expected that the operational use of LiDAR in forest inventory will become common.

Another appealing aspect of these three LiDAR metrics is their biological interpretation.
By definition, LiDAR mean height represents canopy height in the field, coefficient of variation of height represents canopy depth, and canopy point density represents canopy cover. The three LiDAR metrics succinctly describe the 3D canopy structure, which explains why these metrics capture the majority of variation contained in LiDAR data. Forest canopy structure closely correlates with stand structure, which is defined as the size and number of woody stems per unit area. Thus models using these three LiDAR metrics likely capture the fundamental allometric relationships between foliage volumes and stem biomass. It is possible to select a small set of LiDAR metrics which have strong prediction power and also have clear biological interpretations. However, due to the different coefficients at different study sites, individual site models using these three variables are recommended.

2) What are the effects of plot location error and plot size on LiDAR-derived metrics and predicted biomass?

Traditionally, field plot positions recorded in a large-scale forest inventory program, such as FIA, are intended to assist field crews in relocating the plots, as well as to document their general location. Plot locations with small errors (several meters) are acceptable. However, when using LiDAR data in forest inventory, plot position errors may result in mismatch between field plots and LiDAR data, and thus may affect the empirical relationship between field measurements and LiDAR-derived metrics, which may then influence the prediction of forest inventory variables. The most challenging aspect of integrating LiDAR data into the US Forest Service FIA program is the inaccuracy of field plot locations. In Chapter 4 of this dissertation, an original automated procedure to assess the effects of plot location error and plot size on LiDAR-derived metrics and predicted biomass is presented.
First, grid-level LiDAR metrics were extracted from 3D LiDAR points, and a multi-band LiDAR image was created with one band per metric. Second, unsupervised classification was used to stratify the LiDAR patches. Finally, simulation was conducted and LiDAR metrics from simulated plots and from original plots were compared. The results show that for the small plot sizes of 0.04 acre and 0.08 acre, the averaged differences of three LiDAR-derived metrics - mean height, canopy cover and coefficient of variation of height - are small in the LiDAR patches of homogeneous forest, and these differences do not change much as the distance between simulated plot position and original plot position increases. In the LiDAR patches of the medium and heterogeneous forests, these differences increase with increasing distance between simulated plot position and original plot position. For a plot size of 1.5 acre, the average differences of LiDAR-derived metrics between simulated plots and original plots are very small in all three types of LiDAR patches, and these differences change little as the position error increases.

The effects of plot position error and plot size on aboveground biomass estimates were assessed using the residual between field-estimated biomass and biomass predicted using regression models developed from original plots and simulated LiDAR metrics. The results indicate that for small plots (0.04 acre), the averaged predicted biomass in the simulated plots for LiDAR patches of homogeneous forest does not differ much from the biomass estimated from field measurements in the original plots, but in the LiDAR patches of heterogeneous forest the difference between predicted biomass in the simulated plots and biomass estimated from field measurements in the original plots increases with increasing position error.
In summary, the results show that the accuracy of field plot position and the size of the field plot are important factors affecting the accuracy and precision of LiDAR-derived metrics and predicted biomass in heterogeneous forest stands. The logical conclusion is that it is important to have accurate field plot positions in these types of stands. On the other hand, in homogeneous stands, small position errors are acceptable. Whenever possible, using larger plots will reduce the effects of plot position error on LiDAR metrics and predicted stand biophysical variables. In the context of US Forest Service FIA, georeferencing LiDAR data with field measurements at the subplot level is risky due to inaccurate plot position records and the small subplot size, especially in forest stands with high spatial variation. In this case, linking LiDAR data with field measurements using larger plots, which encompass four subplots, may provide a way to characterize forest condition at a scale similar to that of the four subplots combined.

3) What are the differences between predicted plot-level height based on operational field inventory and heights based on LiDAR measurements when compared over a large region using spatial models?

In Chapter 5, four aggregated plot-level heights (plot tree height and stand height from field measurements, LiDAR plot mean height and LiDAR 90th percentile height) were defined and compared. Plot tree height is defined as the average height of individual trees with DBH equal to or greater than 5 inches, weighted by polygon area. Stand height is defined as the average height of trees with DBH equal to or greater than 5 inches, saplings and seedlings, weighted by polygon area. LiDAR plot mean height and 90th percentile height are based on the laser canopy hits within the boundary of a 144-foot fixed-radius plot containing the four subplots.
A stationary Gaussian process with constant mean was assumed, and empirical semivariograms of plot-level heights were fit with theoretical parametric models. Ordinary kriging was then implemented, and contour maps of predicted plot-level height from field height measurements and from LiDAR data were produced over the entire region along with maps of estimated standard error. Results indicate that, at a 300 m by 300 m pixel resolution, the spatial trends of predicted plot-level height are similar between field measurements and LiDAR measurements. The distribution of predicted stand height is very similar to the distribution of predicted LiDAR mean height, with a mean difference of only 0.28 m. The mean of predicted plot tree height is comparable to the mean of predicted LiDAR 90th percentile height, but the distribution of predicted LiDAR 90th percentile height has much heavier tails.

In this study, instead of developing regression models that assume accurate field plot locations, a new approach that uses discontinuous LiDAR coverage and spatial models was developed, and maps of predicted plot-level height over a large region were produced. Forest height is a crucial forest inventory variable, and forest height mapping is important in forest inventory and monitoring. The method and results are particularly useful for remote areas like Alaska where field work is expensive and optical satellite imagery is difficult or impossible to obtain.

In conclusion, the methodology and results presented in this dissertation demonstrate that it is feasible to integrate LiDAR data into large-scale forest inventory. Three LiDAR-derived variables - mean height, coefficient of variation of height and canopy point density, which quantify canopy height, canopy depth and canopy cover, respectively - have strong prediction power for forest biophysical structure parameters.
When matching field measurements that may have large location errors with LiDAR data, large plots are recommended because of plot position error and plot size concerns. Using LiDAR data alone, it is possible to stratify forest stands and map forest height. The methodologies developed in this dissertation can be automated with little manpower involved. After these methods are streamlined, it should be possible to provide results, such as regression models and maps over large areas, within a few weeks of the LiDAR acquisition.

There are some limitations to this study. First, the LiDAR data used do not provide continuous wall-to-wall coverage but are limited to small patches centered on field plots. Classification over these small patches may not represent forest stand condition over the larger spatial extent. Second, aboveground biomass is the total weight of oven-dried biological material present above the soil surface in a specified area, and the biomass estimates from field measurements in this study were obtained using previously developed allometric relationships with field-measured DBH and height. Individual tree biomass was calculated using allometric equations, and plot-level biomass is the sum over all measured trees, converted to mass per unit area. The majority of existing allometric equations do not account for wood density, which is known to affect the precision of individual tree biomass estimates, because wood density varies in complex ways among individuals of a given species, among geographic locations, and within the vertical and radial dimensions of individual trees (Fearnside 1997). Using these allometric equations smooths out tree-to-tree variation, and the results may differ from the true biomass obtained through destructive harvesting. Thus, the coefficients of the regression models between biomass and LiDAR metrics developed in this study may change if true biomass is used.
However, true biomass is difficult, if not impossible, to obtain at present. In addition, depending on the objectives, true biomass estimates over large areas may not be necessary, since variation among individual trees may average out. Third, no other image data were used in this study. Fusion of LiDAR data with optical data, such as aerial photos and satellite images, may help characterize the forest canopy, since LiDAR data provide only structural information, while optical data record reflectance properties indicative of forest composition that can be used for species recognition. More research is clearly needed to test different sensor and flight parameters and to combine LiDAR data with hyperspectral images.

Given the anticipated decline in the cost of LiDAR data collection, it is expected that LiDAR data will be an increasingly useful tool in forest inventory. This study may lead to further advancements and efficiencies in large-scale forest inventories in the following areas: 1) LiDAR data may be used to measure a broad range of structural attributes to quickly update existing forest structure maps. 2) Well-calibrated LiDAR data could be used to capture inventory attributes in remote or inaccessible regions. All of these may lead to more meaningful and ecologically relevant measures of forest composition, structure and function.

List of References

Ackermann, F. 1999. Airborne laser scanning – present status and future expectations. ISPRS Journal of Photogrammetry and Remote Sensing 54: 64-67.
Alemdag, I.S. 1984. Total tree and merchantable stem biomass equations for Ontario hardwoods. Report PI-X-46. Canadian Forestry Service, Petawawa National Forestry Institute, Chalk River, ON. 54 p.
American Forest Council. 1992. Report of the Blue Ribbon Panel on forest inventory and analysis. Arlington, VA. American Forest Council. 14 p.
Andersen, H.-E., R.J. McGaughey, and S.E. Reutebuch. 2005. Estimating forest canopy fuel parameters using LiDAR data.
Remote Sensing of Environment 94: 441-449.
Andersen, H.-E. 2003. Estimation of critical forest structure metrics through the spatial analysis of airborne laser scanner data. Ph.D. dissertation, University of Washington, Seattle, WA.
Andersen, H.-E., Reutebuch, S. E., and McGaughey, R. J. 2006. A rigorous assessment of tree height measurements obtained using airborne lidar and conventional field methods. Canadian Journal of Remote Sensing, Vol. 32 (5): p355-366.
Axelsson, P. 1999. Processing of laser scanner data – algorithms and applications. ISPRS Journal of Photogrammetry & Remote Sensing. Vol. 54: 138–147.
Axelsson, P., 2000. DEM generation from laser scanner data using adaptive TIN models. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 34 (B4/1), 110-117.
Baltsavias, E.P. 1999a. Airborne laser scanning: basic relations and formulas. ISPRS Journal of Photogrammetry and Remote Sensing 54: 199-214.
Baltsavias, E.P. 1999b. A comparison between photogrammetry and laser scanning. ISPRS Journal of Photogrammetry & Remote Sensing 54: 83–94.
Baltsavias, E.P. 1999c. Airborne laser scanning: existing systems and firms and other resources. ISPRS Journal of Photogrammetry & Remote Sensing 54: 164–198.
Bang, K. I., Habib, A.F., Kusevic, K., Mrstik, P., 2008. Integration of terrestrial and airborne lidar data for system calibration. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Vol. XXXVII. Part B1. Beijing 2008 Commission I, WG I/2.
Banerjee, S., Carlin, B. P. and Gelfand, A. E. 2004. Hierarchical modeling and analysis of spatial data. Chapman & Hall/CRC.
Bechtold, W. and Patterson, P. 2005. The enhanced forest inventory and analysis program - national sampling design and estimation procedures. USDA Forest Service, Southern Research Station, General Technical Report SRS-80. 85 p.
Biging, G. S. and Dobbertin, M. 1995. Evaluation of competition indices in individual tree growth models.
Forest Science. Vol 41: p360-377.
Birdsey, R. 1995. A Brief History of the “Straddler Plot” Debates. Forest Science Monograph, 31, 7-11.
Blair, J. B., Hofton, M., Luthcke, S.B. 2001. Wide-swath imaging LiDAR development for airborne and spaceborne applications. International Archives of Photogrammetry and Remote Sensing, Volume XXXIV-3/W4 Annapolis, MD, 22-24 Oct. 2001.
Bollandsas, O. M., Naesset, E. 2007. Estimating percentile-based diameter distributions in uneven-sized Norway spruce stands using airborne laser scanner data. Scandinavian Journal of Forest Research. Vol. 22: 33-47.
Boudreau, J., Nelson, R., Margolis, H.A., Beaudoin, A., Guindon, L., Kimes, D., 2008. Regional aboveground forest biomass using airborne and spaceborne LiDAR in Québec. Remote Sensing of Environment. Volume 112, Issue 10, Pages 3876-3890.
Brandtberg, T., 1999. Automatic individual tree-based analysis of high spatial resolution remotely sensed data. Silvestria 118, PhD Thesis, Centre for Image Analysis, Swedish University of Agricultural Sciences, Uppsala, Sweden.
Brandtberg, T., Warner, T., Landenberger, R. E., and McGraw, J. B. 2003. Detection and analysis of individual leaf-off tree crowns in small footprint, high sampling density lidar data from the eastern deciduous forest in North America. Remote Sensing of Environment 85(3): 290-303.
Brandtberg, T. 2007. Classifying individual tree species under leaf-off and leaf-on conditions using airborne lidar. ISPRS Journal of Photogrammetry and Remote Sensing, 61(5): 325-340.
Breidenbach, J., McGaughey, R., Andersen, H-E., Reutebuch, S. E. 2007. Influence of plot location errors on the estimation of forest parameters with LIDAR data. Proceedings of ForestStat07. November 5-7, 2007. Montpellier, France.
Curtis, R., Marshall, D., and DeBell, D. (eds.). 2004. Silvicultural Options for Young-Growth Douglas-Fir Forests: The Capitol Forest Study – Establishment and First Results. US For. Serv. Gen. Tech. Rep. PNW-GTR-598.
USDA Forest Service, Pacific Northwest Research Station, Portland, OR. 110 p.
Cushman, S. A., and McGarigal, K. 2003. Landscape-level patterns of avian diversity in the Oregon Coast Range. Ecological Monographs 73:259-281.
Danson, F.M., Armitage, R.P., Bandugula, V., Ramirez, F.A., Tate, N.J., Tansey, K.J., Tegzes, T., 2008. Terrestrial laser scanners to measure forest canopy gap fraction. SilviLaser 2008 CD, Sept. 17-19, 2008.
Drake, J.B., R.G. Knox, R.O. Dubayah, D.B. Clark, R. Condit, J.B. Blair, and M. Hofton. 2003. Above-ground biomass estimation in closed canopy neotropical forests using lidar remote sensing: factors affecting the generality of relationship. Global Ecology and Biogeography 12:147-159.
Everitt, B.S., and G. Dunn. 2001. Applied Multivariate Data Analysis. Arnold, London. 342 p.
Fearnside, P.M. 1997. Wood density for estimating forest biomass in Brazilian Amazonia. Forest Ecology and Management. Vol. 90 (1): 59-87.
Flood, M. & Gutelius, B. 1997. Commercial implications of topographic terrain mapping using scanning airborne laser radar. Photogrammetric Engineering and Remote Sensing, Vol. 63(4): 327-329.
USDA Forest Service. 2003. Field procedures for the coastal Alaska inventory. US Forest Service. 182 p.
USDA Forest Service. 2005. Forest Inventory and Analysis National Core Field Guide, Volume 1: Field data collection procedures for phase 2 plots, Version 3.0. Available online at http://fia.fs.fed.us/library/field-guides-methodsproc/docs/2006/core_ver_3-0_10_2005.pdf; last accessed December 17, 2007.
Gaveau, D., and Hill, R. 2003. Quantifying canopy height underestimation by laser pulse penetration in small-footprint airborne laser scanning data. Canadian Journal of Remote Sensing. Vol 29 (5): p650-657.
Gobakken, T. & Naesset, E. 2008. Assessing effects of sample plot positioning errors on biophysical stand properties derived from airborne laser scanner data. Proceedings of SilviLaser 2008. Sept 17-19, 2008. Edinburgh, UK.
Gray, A.
2003. Monitoring stand structure in mature coastal Douglas-fir forest: effect of plot size. Forest Ecology and Management. 175: 1-16.
Hall, S.A., I.C. Burke, D.O. Box, M.R. Kaufmann, and J.M. Stoker. 2005. Estimating stand structure using discrete-return lidar: an example from low density, fire prone ponderosa pine forests. Forest Ecology and Management 208:189-209.
Hollaus, M., Wagner, W., Maier, B., Schadauer, K. 2007. Airborne laser scanning of forest stem volume in a mountainous environment. Sensors. Vol 7: p1559-1577.
Holmgren, J., Nilsson, M., and Olsson, H. 2003. Estimation of tree height and stem volume on plots using airborne laser scanning. Forest Science 49(3): 419-428.
Holmgren, J. 2004. Prediction of tree height, basal area and stem volume in forest stands using airborne laser scanning. Scandinavian Journal of Forest Research 19: 543-553.
Holmgren, J. and A. Persson. 2004. Identifying Species of Individual Trees Using Airborne Laser Scanner. Remote Sensing of Environment, 90:415-423.
Hopkinson, C., Chasmer, L., Young-Pow, C. & Treitz, P., 2004. Assessing forest metrics with a ground-based scanning lidar. Canadian Journal of Forest Research 34 (3), 573–583.
Hoppus, M. L. and Lister, A. 2006. The Status of Accurately Locating FIA Plots Using GPS. USFS Northeastern Research Station, Forest Inventory and Analysis White Paper.
Hudak, A.T., N.L. Crookston, J.S. Evans, M.J. Falkowski, A.S. Smith, P.E. Gessler, and P. Morgan. 2006. Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral satellite data. Canadian Journal of Remote Sensing 32: 1-13.
Hyyppä, J. and Inkinen, M. 1999. Detecting and Estimating Attributes for Single Trees Using Laser Scanner. Photogrammetric Journal of Finland, 16:27-42.
Hyyppä, J., Pyysalo, U., Hyyppä, H., and Samberg, A. 2000. Elevation accuracy of laser scanning-derived digital terrain and target models in forest environment.
In proceedings of EARSel-SIG-workshop on LIDAR. June 16-17, 2000, Dresden, Germany. FRG, Dresden.
Hyyppä, J., Kelley, O., Lehikoinen, M. and M. Inkinen. 2001. A Segmentation-based Method to Retrieve Stem Volume Estimate from 3-d Tree Height Models Produced by Laser Scanners. IEEE Transactions on Geoscience and Remote Sensing, 39:969-975.
Hyyppä, J., Hyyppä, H., Litkey, P., Yu, X., Haggrén, H., Rönnholm, P., Pyysalo, U., Pitkänen J., and Maltamo, M. 2004. Algorithm and methods of airborne laser scanning for forest measurements. In: International archives of photogrammetry, remote sensing and information sciences. Volume XXXVI, Part 8/W2. Edited by M. Thies, B. Koch, H. Spiecker, H. Weinacker.
Jolliffe, I.T. 1972. Discarding variables in a principal component analysis. I: Artificial data. Applied Statistics 21: 160-173.
Kilian, J., Haala N., Englich M., 1996. Capture and evaluation of airborne laser scanner data. International Archives of Photogrammetry and Remote Sensing, Vol. XXXI, B3, Vienna, Austria.
Kraus, K., Pfeifer N., 1998. Determination of terrain models in wooded area with airborne laser scanner data. ISPRS Journal of Photogrammetry & Remote Sensing 53, pp. 193-203.
Lappi, J. 2001. Forest inventory of small areas combining the calibration estimator and a spatial model. Canadian Journal of Forest Research Vol 31: p1551–1560.
Larson, P.R. 1963. Stem form development of forest trees. Forest Science Monograph No. 5. 42 p.
Lefsky, M.A., W.B. Cohen, S.A. Acker, G.G. Parker, T.A. Spies, and D. Harding. 1999. Lidar remote sensing of the canopy structure and biophysical properties of Douglas-fir Western Hemlock Forests. Remote Sensing of Environment 70: 339-361.
Lefsky, M.A., W.B. Cohen, G.G. Parker, and D.J. Harding. 2002. Lidar remote sensing for ecosystem studies. BioScience 52(1): 19-30.
Lefsky, M.A., A.T. Hudak, W.B. Cohen, and S.A. Acker. 2005a. Patterns of covariance between forest stand and canopy structure in the Pacific Northwest.
Remote Sensing of Environment 95: 517-531.
Lefsky, M.A., A.T. Hudak, W.B. Cohen, and S. A. Acker. 2005b. Geographic variability in lidar predictions of forest stand structure in the Pacific Northwest. Remote Sensing of Environment 95: 532-548.
Li, Y., Andersen, H., McGaughey, R. 2008. A Comparison of statistical methods for estimating forest biomass from Light Detection and Ranging data. Western Journal of Applied Forestry. Vol. 23 (4): 223-231.
Li, Y. 2008. A comparison of forest height prediction from field measurement and LiDAR data via spatial model. Proceedings of FIA symposium 2008. Oct 21-23, 2008, Park City, Utah. In press.
Lim, K.S. and P.M. Treitz. 2004. Estimation of above ground forest biomass from airborne discrete return laser scanner data using canopy-based quantile estimators. Scandinavian Journal of Forest Research 19:558-570.
Litkey, P., Liang, X., Kaartinen, H., Hyyppä, J., Kukko, A., Holopainen, M. 2008. Single-scan TLS methods for forest parameter retrieval. SilviLaser 2008 CD, Sept. 17-19, 2008.
Lolley, M.R. 2005. Wildland Fuel Conditions and Effects of Modeled Fuel Treatments on Wildland Fire Behavior and Severity in Dry Forests of the Wenatchee Mountains. MS thesis, University of Washington, Seattle, WA.
Makela, A., and H.T. Valentine. 2006. Crown ratio influences allometric scaling in trees. Ecology 87: 2967-2972.
Maltamo, M., Mustonen, K., Hyyppä, J., Pitkänen, J., and Yu, X. 2004. The accuracy of estimating individual tree variables with airborne laser scanning in a boreal nature reserve. Canadian Journal of Forest Research, Vol 34(9): p1791–1801.
Maltamo, M., Eerikainen, K., Pitkanen, J., Hyyppa, J., Vehmas, M. 2004. Estimation of timber volume and stem density based on scanning laser altimetry and expected tree size distribution functions. Remote Sensing of Environment. Vol 90: p319-330.
Maltamo, M., Packalen, P., Yu, X., Eerikainen, K., Hyyppa, J., Pitkanen, J. 2005.
Identifying and quantifying structural characteristics of heterogeneous boreal forests using laser scanner data. Forest Ecology and Management. Vol 216: p41-50.
Maltamo, M., Hyyppa, J., Malinen, J. 2006. A comparative study of the use of laser scanner data and field measurements in the prediction of crown height in boreal forests. Scandinavian Journal of Forest Research. Vol 21: p231-238.
Manning, G.H., M.R.C. Massie, and J. Rudd. 1984. Metric single-tree weight tables for the Yukon Territory. Inf. Report BC-X-250, Canadian Forestry Service, Pacific Forest Research Centre, Victoria, BC. 60 p.
Magnussen, S., Eggermont, P. and V.N. LaRiccia. 1999. Recovering Tree Heights from Airborne Laser Scanner Data. Forest Science, Vol. 45: 407-422.
Magnussen, S., Gougeon, F., Leckie, D. and Wulder, M. 2001. Predicting Tree Heights from a Combination of LiDAR Canopy Heights and Digital Stem Counts. Workshop on Land Surface Mapping and Characterization Using Laser Altimetry, Annapolis, MD, USA, October 22-24, 2001.
McGarigal, K., and Marks, B. J. 1995. FRAGSTATS: spatial pattern analysis program for quantifying landscape structure. USDA For. Serv. Gen. Tech. Rep. PNW-351.
Means, J.E., A.H. Heather, J.K. Greg, B.A. Paul, and W.K. Mark. 1994. Software for computing plant biomass--BIOPAK users guide. US For. Serv. Gen. Tech. Rep. PNW-GTR-340. Pacific Northwest Research Station, Portland, OR. 180 p.
Means, J. E., Acker, S.A., Harding, D.J., Blair, J.B., Lefsky, M. A., Cohen, W. B., Harmon, M.E. and W. A. McKee. 1999. Use of Large-footprint Scanning Airborne LiDAR to Estimate Forest Stand Characteristics in the Western Cascades of Oregon. Remote Sensing of Environment, 67(3): 298-308.
Means, J., S. Acker, B. Fitt, M. Renslow, L. Emerson, C. Hendrix. 2000. Predicting forest stand characteristics with airborne scanning lidar. Photogrammetric Engineering and Remote Sensing 66(1):1367-1371.
Moffiet, T., K. Mengersen, C. Witte, R. King, and R. Denham. 2005.
Airborne laser scanning: exploratory data analysis indicates potential variables for classification of individual trees or stands according to species. ISPRS Journal of Photogrammetry and Remote Sensing 59: 289-309.
Næsset, E. 2002. Predicting forest stand characteristics with airborne laser using a practical two-stage procedure and field data. Remote Sensing of Environment 80: 88-99.
Næsset, E. 2004. Practical large-scale forest stand inventory using a small-footprint airborne scanning laser. Scandinavian Journal of Forest Research 19: 164-179.
Naesset, E., Gobakken, T., Holmgren, J., Hyyppa, J. 2004. Laser scanning of forest resources: The Nordic experience. Scandinavian Journal of Forest Research 19(6): 482-499.
Næsset, E., O. M. Bollandsas, and T. Gobakken. 2005. Comparing regression methods in estimation of biophysical properties of forest stands from two different inventories using laser scanner data. Remote Sensing of Environment. 94: 541-553.
Naesset, E. 2007. Airborne laser scanning as a method in operational forest inventory: status of accuracy assessments accomplished in Scandinavia. Scandinavian Journal of Forest Research. Vol 22: p433-442.
Nelson, R., Short, A., & Valenti, M. 2004. Measuring biomass and carbon in Delaware using an airborne profiling LIDAR. Scandinavian Journal of Forest Research. Vol 19: 500-511.
Nilson, T., & Peterson, U., 1994. Age dependence of forest reflectance: Analysis of main driving factors. Remote Sensing of Environment. Vol. 48: 319–331.
Nilsson, M. 1996. Estimation of tree heights and stand volume using an airborne LiDAR system. Remote Sensing of Environment 56:1-7.
Ni-Meister, W., D.B. Jupp, and R. Dubayah. 2001. Modeling Lidar waveform in heterogeneous and discrete canopies. IEEE Transactions on Geoscience and Remote Sensing 39: 1943-1958.
Oliver, C.D. and B.C. Larson. 1996. Forest stand dynamics. McGraw-Hill, New York. 467 p.
Olsson, H. 2003.
Summary of the ScandLaser 2003 workshops and recent development in Sweden. In: Proceedings of the ISPRS working group VIII/2 'Laser-Scanners for Forest and Landscape Assessment', Freiburg, Germany, 03-06 October 2004. Edited by M. Thies, B. Koch, H. Spiecker, H. Weinacker.
Packalen, P., Maltamo, M. 2007. The k-MSN method for the prediction of species-specific stand attributes using airborne laser scanning and aerial photographs. Remote Sensing of Environment. Vol 109: 328-341.
Packalen, P., Pitkanen, J., Maltamo, M. 2008. Comparison of individual tree detection and canopy height distribution approaches: a case study in Finland. Proceedings of SilviLaser 2008. Sept 17-19, 2008. Edinburgh, UK.
Paine, D.P. and J.D. Kiser. 2003. Aerial photography and image interpretation. Wiley, Hoboken, New Jersey. 632 p.
Pang, Y., Lefsky, M., Andersen, H-E., Miller, M.E. and Sherrill, K. 2008. Validation of the ICESat vegetation product using crown-area-weighted mean height derived using crown delineation with discrete return lidar data. Canadian Journal of Remote Sensing, Vol. 34, Supplement 2: S471-S484.
Parker, R.C. & Glass P.A., 2004. High- versus low-density LiDAR in a double-sample forest inventory. Southern Journal of Applied Forestry. Vol 28 (4): 205-210.
Parker, R.C. & Evans. 2004. An application of LiDAR in a double-sample forest inventory. Western Journal of Applied Forestry. Vol 19 (2): 95-101.
Parker, R. C., & Mitchel A.L. 2005. Smoothed versus unsmoothed LiDAR in a double-sample forest inventory. Southern Journal of Applied Forestry. Vol 29 (1): 40-47.
Pereira, L. & Janssen, L., 1999. Suitability of laser data for DTM generation: a case study in the context of road planning and design. Photogrammetry Engineering and Remote Sensing. Vol. 54: 244-253.
Persson, Å., Holmgren, J. and Söderman, U. 2002. Detecting and measuring individual trees using an airborne laser scanner. Photogrammetric Engineering and Remote Sensing, Vol 68: p925-932.
Petzold, B., P. Reiss, and W.
Stössel, 1999. Laser scanning - surveying and mapping agencies are using a new technique for the derivation of digital terrain models. ISPRS Journal of Photogrammetry & Remote Sensing 54:95-104.
Pflugmacher, D., Cohen, W., Kennedy, R., and Lefsky, M. 2008. Regional applicability of forest height and aboveground biomass models for the geoscience laser altimeter system. Forest Science. Vol. 54 (6): 647-657.
Piedallu, C. and J.-C. Gegout. 2005. Effects of Forest Environment and Survey Protocol on GPS Accuracy. Photogrammetric Engineering & Remote Sensing 71(9): 1071-1078.
Popescu, S. C., Wynne, R. H. and R. F. Nelson. 2002. Estimating Plot-level Tree Heights with LiDAR: Local Filtering with a Canopy Height Based Variable Window Size. Computers and Electronics in Agriculture, 37(1-3):71-95.
Popescu, S.C., Wynne, R.H. and Nelson, R.F. 2003. Measuring individual tree crown diameter with lidar and assessing its influence on estimating forest volume and biomass. Canadian Journal of Remote Sensing, Vol 29 (5): 564-577.
Popescu, S.C., Wynne, R.H. and J.A. Scrivani. 2004. Fusion of Small-footprint LiDAR and Multispectral Data to Estimate Plot Level Volume and Biomass in Deciduous and Pine Forests in Virginia, USA. Forest Science, 50(4):551-565.
Popescu, S. C., Zhao, K. G. 2008. A voxel-based lidar method for estimating crown base height for deciduous and pine trees. Remote Sensing of Environment. 112: 767-781.
Pretzsch, H. 1997. Analysis and modeling of spatial stand structures. Methodological considerations based on mixed beech-larch stand in Lower Saxony. Forest Ecology and Management. Vol 97: p237-253.
Raftery, A.E. and S. Richardson. 1996. Model selection for generalized linear models via GLIB, with application to epidemiology. Bayesian Biostatistics. Edited by D.A. Berry and D.K. Stangl. New York: Marcel Dekker, p. 321-354.
Raftery, A.E., D. Madigan, and J.A. Hoeting. 1997. Bayesian model averaging for regression models. Journal of the American Statistical Association 92: 179-191.
Raftery, A., I. Painter, and C. Volinsky. 2005. BMA: An R Package for Bayesian Model Averaging. R News 5: 2-8.
Reutebuch, S., McGaughey, R., Andersen, H., Carson, W., 2003. Accuracy of a high-resolution lidar terrain model under a conifer forest canopy. Canadian Journal of Remote Sensing, 29, pp. 527-535.
Reutebuch, S.E., Andersen, H-E, and McGaughey, R.J. 2005. Light Detection and Ranging (LIDAR): An Emerging Tool for Multiple Resource Inventory. Journal of Forestry 103(6): 286-292.
Riano, D., Meier, E., Allgöwer, B., Chuvieco, E. and Ustin, S.L., 2003. Modeling airborne laser scanning data for the spatial generation of critical forest parameters in fire behavior modeling. Remote Sensing of Environment. Vol. 86(2): 177–186.
Ribeiro Jr., P.J. and Diggle, P.J. 2001. geoR: A package for geostatistical analysis. R News, Vol 1(2): p15-18. ISSN 1609-3631.
Schabenberger, O. and Gotway, C.A. 2005. Statistical methods for spatial data analysis. Chapman & Hall/CRC. 488 p.
Shaw, D.L. 1979. Biomass equations for Douglas-fir, western hemlock, and red cedar in Washington and Oregon. In Proceedings of the Forest Resource Inventories Workshop, Colorado State University, Fort Collins, CO, July 23-26, 1979. p. 763-781.
Singh, T. 1984. Biomass equations for six major tree species of the Northwest Territories. Inf. Report NOR-X-257. Canadian Forestry Service, Northern Forest Research Centre, Edmonton, AB. 21 p.
St-Onge, B. A. 2000. Estimating Individual Tree Heights of the Boreal Forest Using Laser Altimetry and Videography. Workshop of ISPRS WG III/2 and III/5: Mapping Surface Structure and Topography by Airborne and Spaceborne Lasers, 7-9.11.1999.
St-Onge, B., and Véga, C. 2003. Combining stereophotogrammetry and lidar to map forest canopy height. In Proceedings of the ISPRS working group III/3 workshop “3-D reconstruction from airborne laserscanner and InSAR data”, Dresden, Germany, 8–10 October 2003, volume XXXIV, part 3/W13. pp. 205–210. La Jolla, CA.
International Archives of Photogrammetry and Remote Sensing, 32:179-184.
Stoyan, D. and Stoyan, H. 1998. Non-homogeneous Gibbs process models for forestry - a case study. Biometrical Journal. Vol 40: p521-531.
Stoyan, D. and Penttinen. 2000. Recent application of point process methods in forestry statistics. Statistical Science. Vol 15(1): p61-78.
Tickle, P.K., Lee, A., Lucas, R.M., Austin, J., and Witte, C. 2006. Quantifying Australian forest floristics and structure using small footprint LIDAR and large scale aerial photography. Forest Ecology and Management. 223: 379-394.
Tou, J.L. & Gonzalez, R.C. 1974. Pattern recognition principles. Addison-Wesley Publishing Company. 377 p.
Vosselman, G. 2000. Slope based filtering of laser altimetry data. IAPRS, Vol. XXXIII, Part B3, Amsterdam, The Netherlands, pp. 935-942.
Wang, C., Glenn, N., Streutker, D. 2007. Ground-return Identification of Airborne LiDAR Data in a Forested Area Using Gaussian-fitting Models. American Geophysical Union, Fall Meeting 2007, abstract #B41E-04.
Wang, Y., Weinacker, H., Koch, B. 2007. Development of a procedure for vertical structure analysis and 3D-single tree extraction within forests based on LiDAR point cloud. IAPRS volume XXXVI, part 3/w52, 2007.
Wehr, A., Lohr, U. 1999. Airborne laser scanning – an introduction and overview. ISPRS Journal of Photogrammetry & Remote Sensing 54: 68–82.
Wulder, M.A. 2003. The current status of laser scanning of forests in Canada and Austria. In: ScandLaser Proceedings, 21-33.
Yu, X., Hyyppä, J., Hyyppä, H., and Maltamo, M. 2004a. Effects of flight altitude on tree height estimation using airborne laser scanning. International Archives of Photogrammetry, Remote Sensing and Spatial Information Science. Vol XXXVI 8/W2.
Yu, X., J. Hyyppa, H. Kaartinen, M. Maltamo. 2004b. Automatic detection of harvested trees and determination of forest growth using airborne laser scanning. Remote Sensing of Environment 90: 451-462.
Zawadzki, J., Cieszewski, C.J., Zasada, M.
and Lowe, R.C. 2005. Applying geostatistics for investigations of forest ecosystems using remote sensing imagery. Silva Fennica. Vol 39(4): p599–617.
Zimble, D. A., Evans, D. L., Carlson, G. C., Parker, R. C., Grado, S. C., Gerard, P. D. 2003. Characterizing vertical forest structure using small-footprint airborne LiDAR. Remote Sensing of Environment. 87: 171-182.

VITA

Name: Yuzhen Li

Education
Ph.D. (2009) Quantitative Resource Management (Remote Sensing), University of Washington, Seattle, WA, USA
MS (2008) Statistics, University of Washington, Seattle, WA, USA
MS (2005) Quantitative Resource Management (Forest Biometrics), University of Washington, Seattle, WA, USA
MS (1998) Silviculture, Chinese Academy of Forestry, Beijing, China
BS (1995) Forest Science, ShanDong Agriculture University, TaiAn, China

Professional Experience
Graduate Intern, June 2008 – Sep. 2008, Biometrics & Statistics group, Western Timberlands Research Department, Weyerhaeuser Company, Federal Way, WA
Graduate Teaching Assistant, Jan. 2008 – June 2008, Department of Statistics, University of Washington, Seattle, WA
Graduate Research Assistant, Sep. 2001 – Dec. 2007 and Sep. 2008 – Mar. 2009, College of Forest Resources, University of Washington, Seattle, WA
Assistant Researcher, July 1998 – Mar. 2001, Chinese Academy of Forestry, Beijing, China