Quantifying uncertainty in forest carbon modelling with Bayesian statistics G. Patenaude

Quantifying uncertainty in forest carbon modelling with Bayesian statistics
G. Patenaude1, M. Van Oijen2, R. Milne2 and T.P. Dawson1
1 Environmental Change Institute, University of Oxford, 1a Mansfield Road, Oxford, OX1 3SZ, UK. E-mail: bu01gp@ceh.ac.uk
2 Centre for Ecology and Hydrology, Bush Estate, Penicuik, Midlothian, Edinburgh, UK.
_______________________________________________________________________
Abstract
Process-based models have been widely used for assessing forest above-ground
carbon content and fluxes, as they enable deeper insights into the
drivers of forest production and fluxes while providing higher flexibility than
conventional production tables. However, in spite of the numerous models that
exist, few have reached an operational status beyond that of the research realm.
The lack of data and knowledge about their reliability has hampered their
practical use. In this paper, we present a Bayesian calibration as a solution to
this setback, where the parameters and the data used in the calibration process
are presented in the form of probability distributions, reflecting our degree of
certainty about them. As further information is gained, the distributions are
updated. This approach favours the presentation of uncertainties over the
derivation of a single optimised set of parameters based on a goodness-of-fit
criterion. The approach is tested on the 3-PG model for the estimation of
above-ground carbon stocks of UK Corsican pine forests. The results show the
ability of the approach to produce uncertainty distributions for both model
outputs and parameters.
_______________________________________________________________________
Introduction
Process-based models have been widely used in the fields of forest physiology and forest
ecology, as they enable deeper insights into the drivers of forest production and offer higher
flexibility than conventional production tables (Landsberg and Waring, 1997). However, in
spite of their flexibility, few process-based models have reached an operational status as
little is known about their reliability or about the uncertainty in parameters and model outputs.
Bayesian calibration now makes it possible to parameterise models while quantifying the
uncertainty in parameters and model outputs (Gertner et al., 1999; Green et al., 1999). The
parameters and the data used in the calibration process are presented in the form of
probability distributions, reflecting our degree of certainty about them (Jansen, 1999). In this
paper, a Bayesian approach was used to calibrate the 3-PG model (Physiological Principles
Predicting Growth; Landsberg and Waring, 1997) for the quantification of the carbon held in a
UK Corsican pine plantation.
Study site, model initialisation and climate data
The Thetford plantation is located in East Anglia, UK (52°30´ N, 0°30´ E) and has been
extensively studied since its creation (e.g. Ovington, 1957; Baker, 1992). Of all stands,
Corsican pines of yield class 14 were modelled here as this is the prevalent yield class
occurring in Thetford for this species.
The 3-PG model was initialised at a tree age of 15 years using biomass data obtained from
Baker (1992) and Baker et al. (1994). Root, stem and foliage biomass were 7.1 t/ha, 22 t/ha
(including stem and branches) and 9.8 t/ha respectively. Conversion to carbon content was
achieved using species-specific conversion factors (Hamilton, 1975). An initial stocking of 3955
trees per hectare was obtained from production tables (Edwards and Christie, 1981).
Long-term average climatic conditions were derived from the Cambridge botanical garden
meteorological station (http://badc.nerc.ac.uk/home/index.html), while mean monthly
precipitation and mean daily incident solar radiation were derived from the Climatic
Research Unit datasets (New et al., 2000).
Bayesian calibration
A total of 14 parameters in 3-PG considered to be non-sensitive by Esprey et al. (2004) were
fixed at constant values in this analysis. Bayesian calibration was applied to the remaining
22 parameters.
Bayes’ theorem
If we define θ as a parameter vector for 3-PG, then P(θ) represents its probability distribution
and P(f(θ)) the uncertainty in model outputs (f(θ)) generated by the uncertainty in the
parameters. In this context, Bayesian calibration is a method enabling P(θ) to be updated as
new data come in. Given a dataset D, we can derive P(θ|D) from P(θ) by applying Bayes’
Theorem:
P(θ|D) = P(θ) P(D|θ) / P(D)
where P(θ|D) is the posterior parameter distribution; P(θ) is the prior parameter distribution;
P(D|θ) is the conditional probability of the data for a given parameterisation (the likelihood);
and P(D) is a normalisation constant.
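As an illustration only, Bayes' theorem can be evaluated numerically for a single parameter on a discrete grid. The sketch below is not part of the calibration described here: the grid, the uniform prior on [0, 10] and the single observation of 4.0 with SD 1.0 are all hypothetical, and the sum over the grid stands in for P(D).

```python
import numpy as np

def grid_posterior(prior, likelihood):
    """Return P(theta|D) on a grid as prior * likelihood / P(D)."""
    unnormalised = prior * likelihood
    evidence = unnormalised.sum()  # discrete stand-in for the constant P(D)
    return unnormalised / evidence

# Hypothetical one-parameter example: uniform prior on [0, 10].
theta_grid = np.linspace(0.0, 10.0, 101)
prior = np.full(theta_grid.size, 1.0 / theta_grid.size)

# Hypothetical likelihood: one observation of 4.0 with Gaussian error, SD = 1.0.
likelihood = np.exp(-0.5 * ((4.0 - theta_grid) / 1.0) ** 2)

posterior = grid_posterior(prior, likelihood)
```

With a flat prior the posterior simply follows the shape of the likelihood inside the prior bounds. A grid of this kind becomes impractical for the 22 parameters calibrated here, which is why a sampling method is needed.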
The prior
The prior distribution is built from marginal distributions, which reflect our current knowledge
of the parameters. As no studies on the parameterisation of 3-PG for Corsican pine have
previously been conducted, a uniform distribution, bounded by biophysically or biologically
reasonable minimum and maximum values, was used for each of the 22 calibrated parameters
in 3-PG.
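A prior made of independent uniform marginals is straightforward to evaluate in log form, as sketched below. The function name and the bounds are assumptions chosen for illustration; they are not the bounds used for the 3-PG parameters.

```python
import numpy as np

def log_uniform_prior(theta, lower, upper):
    """Log density of independent uniform marginals; -inf outside the bounds."""
    theta, lower, upper = (np.asarray(a, dtype=float) for a in (theta, lower, upper))
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    # Each marginal has density 1 / (upper - lower) inside its bounds.
    return float(-np.sum(np.log(upper - lower)))

# Hypothetical bounds for two illustrative parameters (not values from this study).
lower = np.array([10.0, 0.0])
upper = np.array([30.0, 1.0])

inside = log_uniform_prior([20.0, 0.5], lower, upper)
outside = log_uniform_prior([35.0, 0.5], lower, upper)
```

Because the density is constant inside the bounds, such a prior only restricts the parameter space; within it, the shape of the posterior is determined by the likelihood.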
The likelihood
A total of 30 data points were used in the calibration exercise. These included total above-ground carbon content (11), stem carbon content including branches (3), foliage carbon
content (3), root carbon content (5) and LAI (8). The datasets were derived from Baker
(1992), Baker et al. (1994) and from the forestry yield tables for Corsican pine stands of yield
class 14, under an intermediate thinning strategy and planted at 2 m spacing (Edwards
and Christie, 1981). For root carbon content, a ratio of above- to below-ground carbon content
was derived from data provided by Ovington (1957). LAI data points were from Ovington
(1957) combined with the growth pattern in Mencuccini and Grace (1996).
To calculate the probability of the data given a specific model parameterisation, P(D|θ),
information about measurement error must be available. No such detailed information was
available as the data were derived from various sources. We therefore assumed that the
errors were independent and Gaussian, with standard deviations (SD) of
10 t ha⁻¹ for above-ground, stem and root carbon content, 2 t ha⁻¹ for foliage biomass, and 1
for LAI.
P(D|θ) then follows from the comparison of each data point D_i with the corresponding model
output f_i(θ) as:
P(D|θ) = ∏_{i=1}^{n} ϕ(D_i − f_i(θ); 0, SD_i)
where ϕ symbolises a Gaussian function with 0 and SD_i as mean and standard deviation,
and n = 30, the number of points in the data sample.
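A minimal sketch of this likelihood is given below, computed on the log scale because a product of 30 small densities can underflow. The function name and the two example data points are hypothetical; only the Gaussian form with zero mean and one SD per data point comes from the text.

```python
import numpy as np

def log_likelihood(data, model_output, sd):
    """Sum over i of log phi(D_i - f_i(theta); 0, SD_i) for independent Gaussian errors."""
    data, model_output, sd = (np.asarray(a, dtype=float) for a in (data, model_output, sd))
    residuals = data - model_output
    log_densities = -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * (residuals / sd) ** 2
    return float(np.sum(log_densities))

# Hypothetical example: one carbon stock (SD 10) and one LAI value (SD 1).
data = [50.0, 4.0]
sd = [10.0, 1.0]
perfect_fit = log_likelihood(data, [50.0, 4.0], sd)
one_sd_off = log_likelihood(data, [40.0, 4.0], sd)  # first output is one SD too low
```

Missing a data point by exactly one SD lowers the log-likelihood by 0.5, so the SD assigned to each data type controls how strongly that data type constrains the parameters.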
The posterior
As 3-PG cannot be solved analytically, the posterior was sampled with a Markov chain Monte Carlo (MCMC)
algorithm (Metropolis-Hastings random walk), which has the following steps (e.g. Jansen, 1999):
1. Propose a new candidate from the parameter space as θ' = θ_t + ε, where θ' is the proposed
candidate, θ_t is the current parameter vector and ε is a random vector enabling the
exploration of the parameter space.
2. Calculate the ratio of posterior probabilities β, in which the normalisation constant P(D) cancels:
β = P(θ'|D) / P(θ_t|D) = [P(D|θ') P(θ')] / [P(D|θ_t) P(θ_t)]
3. Generate a uniform random variable u (0 ≤ u ≤ 1). The new candidate θ' is accepted and
becomes θ_{t+1} if u ≤ β; otherwise θ_{t+1} = θ_t. If β ≥ 1, the proposal is always accepted.
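The three steps can be sketched in a few lines, here applied to a hypothetical one-parameter target rather than to 3-PG. The Gaussian form of ε, the step size, the chain length and the toy posterior are all assumptions made for illustration. The ratio β is evaluated on the log scale, and the starting point must have a finite posterior density.

```python
import numpy as np

def metropolis(log_posterior, theta0, step_sd, n_steps, seed=0):
    """Random-walk Metropolis sampler; returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    log_p = log_posterior(theta)
    chain = np.empty((n_steps, theta.size))
    n_accepted = 0
    for t in range(n_steps):
        # Step 1: candidate = current vector + symmetric random perturbation (assumed Gaussian).
        candidate = theta + rng.normal(0.0, step_sd, size=theta.size)
        # Step 2: log(beta) = log[P(D|theta')P(theta')] - log[P(D|theta_t)P(theta_t)].
        log_p_candidate = log_posterior(candidate)
        log_beta = log_p_candidate - log_p
        # Step 3: accept if u <= beta, otherwise keep the current vector.
        if np.log(rng.uniform()) <= log_beta:
            theta, log_p = candidate, log_p_candidate
            n_accepted += 1
        chain[t] = theta
    return chain, n_accepted / n_steps

def toy_log_posterior(theta):
    """Hypothetical target: uniform prior on [0, 10], one observation of 4.0 with SD 1."""
    if np.any(theta < 0.0) or np.any(theta > 10.0):
        return -np.inf
    return float(-0.5 * np.sum((4.0 - theta) ** 2))

chain, acceptance_rate = metropolis(toy_log_posterior, [5.0], step_sd=1.0, n_steps=20000)
```

Candidates outside the prior bounds have a log posterior of minus infinity and are rejected, so the chain never leaves the region allowed by the prior. In a real calibration, `log_posterior` would run the model and add the log prior to the log-likelihood.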
Results
After the Bayesian calibration exercise, the uncertainty for 14 of the 22 calibrated parameters
was reduced, while the posterior distributions of the remaining eight parameters stayed near
uniform. Figure 1 shows the prior distribution used and the resulting
posterior distribution for one 3-PG parameter: optimum temperature for growth.
[Figure 1 panels: (a) T opt against MCMC steps; (b) frequency of the prior and posterior marginal distributions against optimum temperature for growth (°C).]
Figure 1 Example of a marginal posterior distribution for a selected parameter. (a)
shows approximately 30 000 MCMC steps resulting in the posterior distribution shown in
(b).
In Figure 2 three principal results are presented: (i) the mean from the marginal posterior
distribution of 3-PG model outputs (ii) 3-PG model outputs resulting from the maximum a
posteriori estimate of θ, considered as the single “best” parameter value estimated from the
MCMC sample (see for instance Van Oijen et al. in press), and (iii) the data points used in
the likelihood (the measured data).
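The first two quantities can be derived from a stored MCMC sample as sketched below. The function, the three-member chain, its log posterior values and the linear stand-in model are hypothetical, chosen only to show that the posterior mean of the outputs averages model runs over the sample, whereas the "best" output is a single run at the sampled vector with the highest posterior probability.

```python
import numpy as np

def summarise_chain(chain, log_posteriors, model):
    """Return the posterior mean of model outputs, the MAP vector and its output."""
    outputs = np.array([model(theta) for theta in chain])
    posterior_mean_output = outputs.mean(axis=0)      # mean over sampled model runs
    map_theta = chain[int(np.argmax(log_posteriors))]  # highest-posterior sample
    return posterior_mean_output, map_theta, model(map_theta)

# Hypothetical stand-in for a process model: output grows linearly with stand age.
ages = np.array([10.0, 20.0])

def toy_model(theta):
    return theta[0] * ages

# Hypothetical MCMC sample of a single parameter with its log posterior values.
chain = np.array([[1.0], [2.0], [6.0]])
log_posteriors = np.array([-2.0, -0.5, -1.0])

mean_output, map_theta, map_output = summarise_chain(chain, log_posteriors, toy_model)
```

The two curves differ whenever the posterior is skewed, as in this toy sample, so plotting both gives a quick visual check on the shape of the output distribution.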
[Figure 2 panels: foliage carbon content (t ha⁻¹), LAI, stem carbon content (t ha⁻¹) and root carbon content (t ha⁻¹), each plotted against time (years), showing the best fit, the posterior mean and the measured data.]
Figure 2 Mean from the marginal posterior distribution for 3-PG model outputs. Outputs
from the maximum a posteriori estimate of θ, considered as the single “best” parameter
value from the MCMC sample (continuous line) and the measured data are also shown.
Error bars show the standard deviation of the measured data points. The saw-toothed pattern
results from the thinning regime applied.
Conclusion
Commonly used error minimisation approaches ignore uncertainty, a principal cause of
mismanagement of natural resources. Conversely, Bayesian calibration advocates the
quantification of uncertainties in parameters, thereby yielding uncertainties in model outputs,
over the derivation of an optimised set of parameters based on a goodness-of-fit approach
(e.g. the maximum-likelihood approach). By doing so, Bayesian calibration provides a much-needed platform for expressing parameter and output uncertainty in forest-growth modelling.
References
Baker, J.R., (1992). The UK element of the Maestro-1 SAR campaign. International Journal of Remote
Sensing, 13(9): 1593-1608.
Baker, J.R., Mitchell, P.L., Cordey, R.A., Groom, G.B., Settle, J.J. and Stileman, M.R., (1994).
Relationships between Physical Characteristics and Polarimetric Radar Backscatter for Corsican Pine
Stands in Thetford Forest, UK. International Journal of Remote Sensing, 15(14): 2827-2849.
Edwards, P.N. and Christie, J.M., (1981). Yield models for forest management. Forestry Commission
booklet ; no. 48. HMSO, London, 274 pp.
Esprey, L.J., Sands, P.J. and Smith, C.W., (2004). Understanding 3-PG using a sensitivity analysis.
Forest Ecology and Management, 193(1-2): 235-250.
Gertner, G.Z., Fang, S.F. and Skovsgaard, J.P., (1999). A Bayesian approach for estimating the
parameters of a forest process model based on long-term growth data. Ecological Modelling, 119(2-3): 249-265.
Green, E.J., MacFarlane, D.W., Valentine, H.T. and Strawderman, W.E., (1999). Assessing
uncertainty in a stand growth model by Bayesian synthesis. Forest Science, 45(4): 528-538.
Jansen, M.J.W., (1999). Data use and Bayesian statistics for model calibration. In: A. Stein and
F.W.T. Penning de Vries (Editors), Data and models in action. Kluwer, Dordrecht, pp. 69-80.
Landsberg, J.J. and Waring, R.H., (1997). A generalised model of forest productivity using simplified
concepts of radiation-use efficiency, carbon balance and partitioning. Forest Ecology and
Management, 95: 209-228.
Mencuccini, M. and Grace, J., (1996). Hydraulic conductance, light interception and needle nutrient
concentration in Scots pine stands and their relations with net primary productivity. Tree Physiology,
16(5): 459-468.
New, M., Hulme, M. and Jones, P., (2000). Representing twentieth-century space-time climate
variability. Part II: Development of 1901-96 monthly grids of terrestrial surface climate. Journal of
Climate, 13(13):2217-2238.
Ovington, J.D., (1957). Dry-matter production by Pinus sylvestris L. Annals of Botany, XXI(82): 288-314.
Van Oijen, M., Rougier, J., Smith, R. (In press). Bayesian calibration of process-based models:
bridging the gap between models and data. Tree Physiology.