Quantifying uncertainty in forest carbon modelling with Bayesian statistics

G. Patenaude1, M. Van Oijen2, R. Milne2 and T.P. Dawson1

1 Environmental Change Institute, University of Oxford, 1a Mansfield Road, Oxford, UK, OX1 3SZ. E-mail: bu01gp@ceh.ac.uk
2 Centre for Ecology and Hydrology, Bush Estate, Penicuik, Midlothian, Edinburgh, UK.

Abstract

Process-based models have been widely used for assessing forest above-ground carbon content and fluxes, as they give deeper insight into the drivers of forest production and fluxes and offer greater flexibility than conventional production tables. However, despite the large number of models available, few have reached operational status beyond research use. A lack of data and of knowledge about their reliability has hampered their practical application. In this paper, we present Bayesian calibration as a solution to this problem. The parameters and the data used in the calibration are represented as probability distributions that reflect our degree of certainty about them; as further information is gained, the distributions are updated. The approach favours presenting uncertainties over deriving a single optimised parameter set from a goodness-of-fit criterion. It is tested on the 3-PG model for estimating the above-ground carbon stocks of UK Corsican pine forests. The results show that the approach can produce uncertainty distributions for both model outputs and parameters.

Introduction

Process-based models have been widely used in forest physiology and forest ecology, as they give deeper insight into the drivers of forest production and offer greater flexibility than conventional production tables (Landsberg and Waring, 1997).
However, despite this flexibility, few process-based models have reached operational status, as little is known about their reliability or about the uncertainty in their parameters and outputs. Bayesian calibration now makes it possible to parameterise models while quantifying the uncertainty in parameters and model outputs (Gertner et al., 1999; Green et al., 1999). The parameters and the data used in the calibration are represented as probability distributions that reflect our degree of certainty about them (Jansen, 1999). In this paper, a Bayesian approach was used to calibrate the 3-PG model (Physiological Principles Predicting Growth; Landsberg and Waring, 1997) to quantify the carbon held in a UK Corsican pine plantation.

Study site, model initialisation and climate data

The Thetford plantation is located in East Anglia, UK (52°30′ N, 0°30′ E), and has been extensively studied since its creation (e.g. Baker, 1992; Ovington, 1957). Of all stands, Corsican pine of yield class 14 was modelled here, as this is the prevalent yield class for this species in Thetford. The 3-PG model was initialised at a tree age of 15 years using biomass data obtained from Baker (1992) and Baker et al. (1994). Root, stem (including branches) and foliage biomass were 7.1, 22 and 9.8 t ha⁻¹ respectively. Conversion to carbon content was achieved using species-specific conversion factors (Hamilton, 1975). An initial stocking of 3955 trees per hectare was obtained from production tables (Edwards and Christie, 1981). Long-term average climatic conditions were derived from the Cambridge botanical garden meteorological station (http://badc.nerc.ac.uk/home/index.html), while mean monthly precipitation and mean daily incident solar radiation were derived from the Climatic Research Unit datasets (New et al., 2000).

Bayesian calibration

A total of 14 parameters in 3-PG, considered insensitive by Esprey et al. (2004), were fixed at constant values in this analysis. Bayesian calibration was applied to the remaining 22 parameters.

Bayes' theorem

If we define θ as a parameter vector for 3-PG, then P(θ) represents its probability distribution and P(f(θ)) the uncertainty in the model outputs f(θ) generated by the uncertainty in the parameters. In this context, Bayesian calibration is a method that allows P(θ) to be updated as new data come in. Given a dataset D, we can derive P(θ|D) from P(θ) by applying Bayes' theorem:

$$P(\theta|D) = \frac{P(\theta)\,P(D|\theta)}{P(D)}$$

where P(θ|D) is the posterior parameter distribution; P(θ) is the prior parameter distribution; P(D|θ) is the conditional probability of the data for a given parameterisation (the likelihood); and P(D) is a normalisation constant.

The prior

The prior distribution is built from marginal distributions, which reflect our current knowledge of the parameters. As no previous studies have parameterised 3-PG for Corsican pine, a uniform distribution was used for each of the 22 parameters, bounded by biophysically or biologically reasonable minimum and maximum values.

The likelihood

A total of 30 data points were used in the calibration. These comprised total above-ground carbon content (11), stem carbon content including branches (3), foliage carbon content (3), root carbon content (5) and LAI (8). The datasets were derived from Baker (1992), Baker et al. (1994) and the forestry yield tables for Corsican pine stands of yield class 14, planted at 2 m spacing under an intermediate thinning strategy (Edwards and Christie, 1981). For root carbon content, a ratio of above-ground to below-ground carbon content was derived from data provided by Ovington (1957). LAI data points were taken from Ovington (1957), combined with the growth pattern in Mencuccini and Grace (1996).
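The bounded uniform marginals described under "The prior" can be illustrated with a minimal Python sketch. The function name `log_uniform_prior` and the two pairs of bounds are hypothetical and chosen only for illustration; they are not the 3-PG parameter ranges used in this study. The log scale is an implementation choice, not something stated in the paper.

```python
import math

def log_uniform_prior(theta, lower, upper):
    """Log-density of independent uniform marginals; -inf outside the bounds."""
    log_p = 0.0
    for value, lo, hi in zip(theta, lower, upper):
        if not lo <= value <= hi:
            return -math.inf          # zero prior probability outside the range
        log_p -= math.log(hi - lo)    # each marginal has density 1 / (hi - lo)
    return log_p

# Hypothetical bounds for two illustrative parameters (not the 3-PG values).
LOWER = [10.0, 0.0]
UPPER = [30.0, 1.0]

inside = log_uniform_prior([20.0, 0.5], LOWER, UPPER)   # equals -log(20)
outside = log_uniform_prior([35.0, 0.5], LOWER, UPPER)  # first value is out of range
```

Because the density is constant inside the bounds, such a prior only restricts where the posterior can be non-zero; within the bounds, the shape of the posterior is determined by the likelihood alone.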
To calculate the probability of the data given a specific model parameterisation, P(D|θ), information about measurement error must be available. No such detailed information was available, as the data were derived from various sources. We therefore assumed that the errors were independent and Gaussian, with standard deviations (SD) of 10 t ha⁻¹ for above-ground, stem and root carbon content, 2 t ha⁻¹ for foliage biomass, and 1 for LAI. P(D|θ) then follows from comparing each data point D_i with the corresponding model output f_i(θ):

$$P(D|\theta) = \prod_{i=1}^{n} \phi\left(D_i - f_i(\theta);\, 0,\, SD_i\right)$$

where ϕ denotes a Gaussian density with mean 0 and standard deviation SD_i, and n = 30 is the number of points in the data sample.

The posterior

As 3-PG cannot be solved analytically, the posterior was sampled with a Markov chain Monte Carlo (MCMC) method, the Metropolis-Hastings random walk, which has the following steps (e.g. Jansen, 1999):

1. Propose a new candidate from the parameter space as θ' = θ_t + ε, where θ' is the proposed candidate, θ_t is the current parameter vector and ε is a random vector that enables exploration of the parameter space.
2. Calculate the ratio of posterior probabilities β, in which the normalisation constant P(D) cancels:

$$\beta = \frac{P(\theta'|D)}{P(\theta_t|D)} = \frac{P(D|\theta')\,P(\theta')}{P(D|\theta_t)\,P(\theta_t)}$$

3. Generate a uniform random variable u (0 ≤ u ≤ 1). The candidate θ' is accepted and becomes θ_{t+1} if u ≤ β, so a proposal with β ≥ 1 is always accepted. Otherwise θ' is rejected and θ_{t+1} = θ_t.

Results

After the Bayesian calibration, uncertainty was reduced for 14 of the 22 calibrated parameters, while the posterior distributions of the remaining eight stayed near uniform. Figure 1 shows the prior distribution used and the resulting posterior distribution for one 3-PG parameter: maximum temperature for growth.
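The Gaussian likelihood and the three Metropolis-Hastings steps listed under "The posterior" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: `toy_model` is a hypothetical two-parameter growth curve standing in for 3-PG, the data are synthetic, and the bounds, proposal step sizes, chain length and burn-in are arbitrary choices. Probabilities are handled on the log scale to avoid numerical underflow, which is a common practical choice rather than something stated in the paper.

```python
import math
import random

def toy_model(theta, t):
    """Hypothetical stand-in for 3-PG: saturating growth a * (1 - exp(-k t))."""
    a, k = theta
    return a * (1.0 - math.exp(-k * t))

def log_prior(theta, lower, upper):
    """Unnormalised uniform prior: constant inside the bounds, -inf outside."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(theta, lower, upper))
    return 0.0 if inside else -math.inf

def log_likelihood(theta, times, data, sds):
    """Independent Gaussian errors: sum of log phi(D_i - f_i(theta); 0, SD_i)."""
    total = 0.0
    for t, d, sd in zip(times, data, sds):
        resid = d - toy_model(theta, t)
        total += -0.5 * (resid / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
    return total

def metropolis(theta0, n_steps, step, lower, upper, times, data, sds, rng):
    """Random-walk Metropolis sampler; returns the chain and the acceptance rate."""
    theta = list(theta0)
    log_post = log_prior(theta, lower, upper) + log_likelihood(theta, times, data, sds)
    chain, accepted = [], 0
    for _ in range(n_steps):
        # Step 1: propose theta' = theta_t + epsilon (symmetric Gaussian step).
        cand = [v + rng.gauss(0.0, s) for v, s in zip(theta, step)]
        lp = log_prior(cand, lower, upper)
        if lp > -math.inf:
            cand_post = lp + log_likelihood(cand, times, data, sds)
            # Steps 2-3: accept if u <= beta, compared on the log scale.
            if math.log(1.0 - rng.random()) <= cand_post - log_post:
                theta, log_post = cand, cand_post
                accepted += 1
        # A rejected proposal repeats the current vector in the chain.
        chain.append(list(theta))
    return chain, accepted / n_steps

# Synthetic example: all numbers below are illustrative assumptions.
rng = random.Random(1)
times = [20, 30, 40, 50, 60]
sds = [10.0] * len(times)
data = [toy_model((100.0, 0.05), t) + rng.gauss(0.0, sd) for t, sd in zip(times, sds)]
lower, upper = [0.0, 0.0], [300.0, 1.0]
chain, rate = metropolis([80.0, 0.1], 5000, [5.0, 0.01], lower, upper, times, data, sds, rng)
kept = chain[1000:]  # discard an arbitrary burn-in
posterior_mean_a = sum(s[0] for s in kept) / len(kept)
```

The retained samples in `kept` play the role of the MCMC sample from which marginal posterior distributions and posterior means are computed. The acceptance rate depends strongly on the proposal step sizes, which would need tuning for a model with 22 calibrated parameters.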
[Figure 1: panel (a) plots T opt, the optimum temperature for growth (°C), against MCMC steps; panel (b) plots the frequency of the prior and posterior marginal distributions against temperature (degrees Celsius).]

Figure 1. Example of a marginal posterior distribution for a selected parameter. (a) shows approximately 30 000 MCMC steps resulting in the posterior distribution shown in (b).

In Figure 2, three principal results are presented: (i) the mean of the marginal posterior distribution of 3-PG model outputs; (ii) the 3-PG model outputs resulting from the maximum a posteriori estimate of θ, considered the single "best" parameter vector estimated from the MCMC sample (see for instance Van Oijen et al., in press); and (iii) the data points used in the likelihood (the measured data).

[Figure 2: four panels showing LAI, foliage carbon content (t ha⁻¹), stem carbon content (t ha⁻¹) and root carbon content (t ha⁻¹) against time (years); each panel shows the best fit, the measured data and the posterior mean.]

Figure 2. Mean of the marginal posterior distribution of 3-PG model outputs. Outputs from the maximum a posteriori estimate of θ, considered the single "best" parameter vector from the MCMC sample (continuous line), and the measured data are also shown. Error bars show the standard deviation of the measured data points. The saw-toothed pattern results from the thinning regime applied.

Conclusion

Commonly used error-minimisation approaches ignore uncertainty, a principal cause of the mismanagement of natural resources.
Conversely, Bayesian calibration quantifies the uncertainty in parameters, and thereby in model outputs, rather than deriving a single optimised parameter set from a goodness-of-fit criterion (e.g. the maximum-likelihood approach). In doing so, it offers a much-needed platform for expressing parameter and output uncertainty in forest-growth modelling.

References

Baker, J.R. (1992). The UK element of the Maestro-1 SAR campaign. International Journal of Remote Sensing, 13(9): 1593-1608.
Baker, J.R., Mitchell, P.L., Cordey, R.A., Groom, G.B., Settle, J.J. and Stileman, M.R. (1994). Relationships between physical characteristics and polarimetric radar backscatter for Corsican pine stands in Thetford Forest, UK. International Journal of Remote Sensing, 15(14): 2827-2849.
Edwards, P.N. and Christie, J.M. (1981). Yield models for forest management. Forestry Commission Booklet no. 48. HMSO, London, 274 pp.
Esprey, L.J., Sands, P.J. and Smith, C.W. (2004). Understanding 3-PG using a sensitivity analysis. Forest Ecology and Management, 193(1-2): 235-250.
Gertner, G.Z., Fang, S.F. and Skovsgaard, J.P. (1999). A Bayesian approach for estimating the parameters of a forest process model based on long-term growth data. Ecological Modelling, 119(2-3): 249-265.
Green, E.J., MacFarlane, D.W., Valentine, H.T. and Strawderman, W.E. (1999). Assessing uncertainty in a stand growth model by Bayesian synthesis. Forest Science, 45(4): 528-538.
Jansen, M.J.W. (1999). Data use and Bayesian statistics for model calibration. In: A. Stein and F.W.T. Penning de Vries (Editors), Data and models in action. Kluwer, Dordrecht, pp. 69-80.
Landsberg, J.J. and Waring, R.H. (1997). A generalised model of forest productivity using simplified concepts of radiation-use efficiency, carbon balance and partitioning. Forest Ecology and Management, 95: 209-228.
Mencuccini, M. and Grace, J. (1996). Hydraulic conductance, light interception and needle nutrient concentration in Scots pine stands and their relations with net primary productivity. Tree Physiology, 16(5): 459-468.
New, M., Hulme, M. and Jones, P. (2000). Representing twentieth-century space-time climate variability. Part II: Development of 1901-96 monthly grids of terrestrial surface climate. Journal of Climate, 13(13): 2217-2238.
Ovington, J.D. (1957). Dry-matter production by Pinus sylvestris L. Annals of Botany, XXI(82): 288-314.
Van Oijen, M., Rougier, J. and Smith, R. (In press). Bayesian calibration of process-based models: bridging the gap between models and data. Tree Physiology.