Information System for Forest Management Models and Datasets

Allan Sims, Andres Kiviste, Artur Nilson, Maris Hordo
Institute of Forestry and Rural Engineering, Estonian Agricultural University
Kreutzwaldi 5, 51014 Tartu, Estonia
Abstract
The aim of the database of forest management models is to standardize the form of
forest management models and to simplify working with them in application software.
The database contains 12 tables. There are tables for formulas, constants, authors,
papers, variables, etc. The application allows the testing of models. There are two
different ways to test, one where a user can give some values to arguments and the
results are drawn as a chart, and another where the results are calculated based on
test data. The first one can be used with all models. It is the initial test to see if model
constants and arguments are entered correctly. The second type of test calculates
results and presents them in a table using the test data. The database currently
includes 171 forest measurement models from 10 countries. The database is web-based and freely available at http://www.eau.ee/~mbaas/.
Introduction
Scientific activity as a whole involves creating, applying or checking models. In a forestry
information system it is now possible to use information on forestry in the form of models.
Thus, much forestry research produces models (Nilson 1992).
Many models were created to solve a narrow problem and were not considered important
enough to publish. Thus, many models have no published documentation. In many other
cases the model documentation leaves a lot to be desired. Worse still, there are a
number of models for which no documentation exists at all, with the effect that published
research results based on these models are not verifiable (Benz & Knorrenschild 1997).
Considering the overwhelming flood of publications on forest models, their haphazard
availability and complicated presentation, the authors found it necessary to create a handy
collection of existing models. The best environment for such a collection is a standard
database. The system should provide the possibility to administer the data and retain it in the
course of work.
To avoid reinventing the wheel when a needed model cannot be found, it is
necessary to collect the models we already have (Nilson 1992).
The creation of a database forces one to systematize and to critically check forestry models.
As a result, it is possible to use current information more efficiently and to clarify important
subjects for research (Kiviste 1994).
A wide range of forest models has been developed over the last four or more decades,
during which concerted efforts have been made at forest modelling. The need for the
conservation and sharing of these models would seem to be clear (Rennolls et al. 2002).
Nowadays, most forest management decisions are based on models. Models help to
understand forest behaviour. At the Estonian State Forest Management Centre, a computer-based decision support system is being developed. Both adequate forest stand descriptions
and stand growth and structure models are needed for the effective use of the system
(Kiviste & Hordo 2003). According to forest literature from the last decades (Hägglund 1981,
Pretzsch et al. 2002), traditional growth and yield tables and equations approximating them
are being replaced with more sophisticated stand growth simulators. It is therefore possible
to simulate different scenarios and to choose the best sustainable forest management
regimes.
Similar initiatives exist elsewhere, although fewer than one would expect.
They were started some years ago at the University of Kassel, Germany, with “The Register
of Ecological Models” (Benz et al. 2001), and at the University of Greenwich, England, with
“The Forest Model Archive” (Rennolls et al. 2002).
The objective of the present study is to create a system for collecting and testing forest
management models. The information system consists of two parts: One is the database of
forest management models and the other consists of empirical datasets. For testing
purposes the models are verified and validated.
The aim of the system is to systematize and structure information we have about models.
Models are constantly improved; hence, adding models to the system is a continual activity.
The goal is to create an online database of forest management models
(http://www.eau.ee/~mbaas/) where everyone can enter new models and comment on
models already entered.
Between its creation and its entry into the database, a model may be rewritten repeatedly, and
each rewrite is an opportunity for mistakes. To avoid them, the system should be able to apply models
directly from the database. If models are directly applicable, it is simple to test them and
improve the quality of the database.
Material and methods
Model
A model is an abstraction, or a simplified representation, of some aspect of reality. Models
may be expressed in verbal (e.g. a description) or material forms (e.g. a scale model). A
mathematical model is like a verbal model but uses mathematical language, which is more
concise and less ambiguous than natural language (Vanclay 1994). A mathematical forestry
model is intended to predict tree or forest development by the use of empirical mathematical
equations.
Basically, models serve one or both of the following purposes:
• Testing the overall understanding one has of a system, on the basis of a mathematical
representation of its subsystems and the proposed coupling of its subsystems;
• Predicting the future, to explain how and why things have worked in the past (Wallman et al.
2002).
A model may be considered a whole-stand model, a size class model, a single-tree model or
a landscape-level model. In whole-stand models no details of the individual trees in the stand
are determined. Single-tree models use the individual tree as the basic unit of modelling.
Size class models provide some information regarding the structure of the stand. Landscape
and ecosystem models take into account not only trees but also other elements of vegetation
(Vanclay 1994, Wallman et al. 2002).
A model may also be used for understanding or for prediction. The basic form of purely
predictive models is probably multiple regression; models for understanding are different and
seek to improve our ability to explain how nature works.
Principles of the database
The first part of the information system is a database of forest management models. The
database is based on the following principles (Sims 2001):
• The names of tables and fields are in English to simplify international distribution.
• The names of tables and fields are up to 8 characters long to avoid complications with
applications that do not allow longer names.
• IDs are used in fields containing language-specific terms, and every such code is explained
in a separate table. Thus it is simple to translate model information into many languages.
• Every model has to provide enough information about the author(s) and the publications in
which it has been presented. In this way it is simple to find more information about a model.
• Models in the database should be directly applicable, which simplifies the verification and
subsequent use of a model.
• Models and metadata have to be entered so it is simple to translate them into many
languages for ease of international distribution.
• Models have to contain enough information to understand their purpose and simplify
querying from the database.
The structure of the database of forest management models
The database contains 12 tables (Sims 2001, Sims 2003). Five main tables contain
information on models:
• Formulas (a table: formmain) – contains formulas plus general information, e.g. a country,
comment, type of model (static, growth, etc.)
• Argument information (a table: argsinfo) – contains descriptions of arguments, e.g. a name
of the argument, a unit, etc.
• Constants (a table: constans) – contains constants for every tree species separately.
• List of tree species (a table: formulas) – contains a list of tree species for which a model is
created.
• The adequacy of species and constants (a table: casetbl2) - a model has one formula but
different constants for different species. This table connects tables for formulas with
constants by species.
Additional tables contain metadata:
• List of publications in which models are published (tables: bibllist, mpaper) - one publication
can have many models; therefore, publications have their own table and each model is
referred to by its ID only.
• Action and time (tables: actdescr, action) - every action and the time at which it is taken
must also be documented. The actions include model creation, first entry of a model, subsequent
entries, etc.
• List of authors (a table: author) - information about authors.
• List of codes (a table: codifinf) – the list of different codes used to describe arguments.
• List of tree species (a table: species) – the names of tree species in Estonian, English and
Latin.
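The way one formula is linked to species-specific constants can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL and keeps the table names given above (formmain, species, constans, casetbl2). All column names, the sample formula and the constant values are invented for illustration and are not taken from the real schema (Sims 2001, Sims 2003).

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server used by the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formmain (formid INTEGER PRIMARY KEY, formula TEXT, country TEXT);
CREATE TABLE species  (specid INTEGER PRIMARY KEY, latin TEXT);
CREATE TABLE constans (constid INTEGER PRIMARY KEY, name TEXT, value REAL);
-- casetbl2 links one formula to the constants valid for one species.
CREATE TABLE casetbl2 (formid INTEGER, specid INTEGER, constid INTEGER);
""")

# Hypothetical sample content: one formula, two species, two constants each.
conn.execute("INSERT INTO formmain VALUES (1, 'c1 + c2 * log(p1)', 'Estonia')")
conn.executemany("INSERT INTO species VALUES (?, ?)",
                 [(1, "Pinus sylvestris"), (2, "Picea abies")])
conn.executemany("INSERT INTO constans VALUES (?, ?, ?)",
                 [(1, "c1", 1.0), (2, "c2", 2.0), (3, "c1", 1.5), (4, "c2", 2.5)])
conn.executemany("INSERT INTO casetbl2 VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4)])


def constants_for(formid, specid):
    """Return {constant name: value} for one formula and one species."""
    rows = conn.execute(
        "SELECT c.name, c.value FROM casetbl2 AS k "
        "JOIN constans AS c ON c.constid = k.constid "
        "WHERE k.formid = ? AND k.specid = ? ORDER BY c.name",
        (formid, specid),
    ).fetchall()
    return dict(rows)


pine_constants = constants_for(1, 1)
spruce_constants = constants_for(1, 2)
```

The same formula row is reused for both species; only the rows reached through casetbl2 differ.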
Tools
On the web server, the data are stored in a MySQL database. The programming
language is PHP. For each model it is possible to produce output in HTML format for
information and in plain text format for user-defined functions (Sims 2004).
Model verification and validation are performed using R, a program for statistical computing
(Dalgaard 2002). To simplify model analysis, a package has been created for the R environment.
The package is downloadable from the database homepage (http://www.eau.ee/~mbaas/).
Restrictions on collected models
Thousands of models exist already and their structures vary. Putting all models into the
database would make it overly complicated. Thus, the database has some restrictions on which
models are allowed to be entered.
As the name of the database indicates, only forest management models are collected. These
are models that are commonly used by foresters in everyday management. In most cases
such models are:
• Height and taper curves
• Volume and form factor models
• Diameter, height, basal area and volume growth models
• Crown models (e.g. crown height, crown base, etc.)
• Different base models (e.g. diameter or height at base age, etc.)
• Rule models (e.g. lowest basal area after thinning, etc.)
Because models in the database should be directly applicable, only models expressed as
equations are entered. Each model can have up to four equations, of which up to three are
subequations and one is the main equation. For example, equations 1–4 belong to one
model and each formula gives a value that can be used in the subsequent formulas.
$$ipf_1 = c_1 - 493 \cdot \ln(p_6 + 1) + 1355 \cdot p_5 \tag{1}$$
$$ipf_2 = \frac{ipf_1}{50^{c_2}} \tag{2}$$
$$ipf_3 = \sqrt{(p_4 - ipf_2)^2 + \frac{4 \cdot ipf_1 \cdot p_4}{p_2^{c_2}}} \tag{3}$$
$$op = \frac{p_4 + ipf_2 + ipf_3}{2 + 4 \cdot ipf_1 \cdot p_3^{-c_2} / (p_4 - ipf_2 + ipf_3)} \tag{4}$$
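The chaining of subequations can be sketched in Python as follows. The function follows equations 1–4 as read here, with a square root in equation 3 and the exponent c2 applied to 50, p2 and p3; that reading is an assumption about the typeset formulas. The function name and the constant values in the usage lines are hypothetical, as is the interpretation of p2 and p3 as a current and a target age and of p4 as the current height.

```python
import math


def evaluate_model(c1, c2, p2, p3, p4, p5, p6):
    """Evaluate the main equation (4) from the three subequations (1)-(3)."""
    ipf1 = c1 - 493.0 * math.log(p6 + 1.0) + 1355.0 * p5            # eq. 1
    ipf2 = ipf1 / 50.0 ** c2                                        # eq. 2
    ipf3 = math.sqrt((p4 - ipf2) ** 2 + 4.0 * ipf1 * p4 / p2 ** c2)  # eq. 3
    # eq. 4: each intermediate value feeds into the main equation.
    return (p4 + ipf2 + ipf3) / (
        2.0 + 4.0 * ipf1 * p3 ** (-c2) / (p4 - ipf2 + ipf3)
    )


# Hypothetical constants and arguments, chosen only to exercise the code.
op_same_age = evaluate_model(c1=8000.0, c2=1.5, p2=40.0, p3=40.0,
                             p4=18.0, p5=0.0, p6=5.0)
op_later_age = evaluate_model(c1=8000.0, c2=1.5, p2=40.0, p3=60.0,
                              p4=18.0, p5=0.0, p6=5.0)
```

Under this reading the equations have a useful self-check: when p3 equals p2, the result reduces algebraically to p4, so a mistyped constant or exponent tends to show up immediately.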
Initially, the database has been created with a focus on Estonia; therefore, models have
been collected from Estonia, the Baltic region and neighbouring countries. A key criterion is
that the models should be applicable to tree species growing in Estonia.
Only those models are entered into the database that have at least:
• a formula;
• constants;
• a list of the tree species to which the model is applicable;
• a description of arguments.
If any of these items is missing the model is useless for the system.
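A check for these minimum requirements could look like the sketch below. The dictionary keys and the function name are hypothetical and do not reflect how the real system stores or validates a model.

```python
# Items every model must provide, following the list above.
REQUIRED_ITEMS = ("formula", "constants", "species", "arguments")


def missing_items(model):
    """Return the required items that are absent or empty in a model record."""
    return [item for item in REQUIRED_ITEMS if not model.get(item)]


# Hypothetical model record with no constants entered yet.
incomplete = {
    "formula": "c1 + c2 * log(p1)",
    "constants": {},
    "species": ["Pinus sylvestris"],
    "arguments": {"p1": "breast height diameter, cm"},
}
problems = missing_items(incomplete)
```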
Empirical datasets
The second part of the information system is empirical datasets.
For testing purposes, data from a network of permanent sample plots are used (Kiviste & Hordo
2002). The network was initiated in 1995. Since then, 715 sample plots have been measured, of
which 380 have been re-measured.
The plots are circles holding a minimum of 100 trees each. On the plots, the polar
coordinates and the breast height diameters of all trees are measured. Additionally, the total
height and the crown length of selected sample trees are measured.
It is difficult to use such huge amounts of data for testing purposes. Instead, it is enough to
have only type series (e.g. a smoothed height curve in Fig. 1). A nonparametric approach is
used to create such type series. Nonparametric regression avoids problematic assumptions
on the nature of the true regression function other than that it is smooth (Brown & Heathcote
2002).
Figure 1. An example of a smoothed height curve
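One common nonparametric smoother is kernel regression, sketched below in Python with a Gaussian kernel. The text does not state which smoother was used for the type series, so the choice of method, the bandwidth and the sample diameters and heights are all assumptions made for illustration.

```python
import math


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of y at each grid point (Gaussian kernel)."""
    smoothed = []
    for g in grid:
        # Observations close to the grid point receive the largest weights.
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return smoothed


# Hypothetical sample trees: breast height diameter (cm) and height (m).
diameters = [8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 28.0]
heights = [9.5, 11.0, 12.8, 14.1, 16.0, 16.4, 18.2, 19.0]

# A short type series: smoothed heights at a few chosen diameters.
grid = [10.0, 15.0, 20.0, 25.0]
type_series = kernel_smooth(diameters, heights, grid, bandwidth=3.0)
```

Because each smoothed value is a weighted average of observed heights, the type series always stays within the range of the data.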
Results
Collected models
The database contains models designed for forest management, because initially we did not
want to cast too wide a net. We have chosen models of forest management because they
are most needed by foresters.
The database contains 171 models from 10 countries:
Country        Models
Estonia        56
Finland        30
Latvia         23
Germany        22
Lithuania      19
Sweden         15
Byelorussia    2
Russia         2
Austria        1
Norway         1
The database mostly contains models for calculating volume, height, increment, site index
and various kinds of growth. For volume calculation, the database comprises models for a single tree
and for a stand.
The database encompasses different types of growth models. Some of them are increment
models and some are predictive models. For prediction, the database includes models for
tree height, tree diameter, stand volume and stand basal area.
Output of the database
The database is created to simplify working with models. If someone needs a model or
further information on it, the homepage offers all the relevant information entered into the
database. There are ready-to-use models for the environments of FoxPro and Visual Basic.
For software developers a program has been created that generates user-defined functions
(UDF) directly from the database. It is simple to do because the information in the database
has been recorded in a systematic manner. A UDF contains comments, references, a
description of the arguments and the function. Currently, only UDFs suited to the FoxPro and
Visual Basic programming languages can be created dynamically through the web. The main
differences lie in the built-in function names. In FoxPro, square root and natural logarithm are
designated as “SQRT” and “LOG”, respectively, whereas in Visual Basic the corresponding
functions are labelled “SQR” and “LN”. It is simple to convert and create functions for these
two programming languages, because in FoxPro and Visual Basic a user does not have to
declare variables and their types. Most programming languages require the declaration of
all variables prior to use.
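The idea of generating function text from one stored formula can be sketched as follows. This is not the authors' generator: the function make_udf, the {sqrt} placeholder and the sample formula are hypothetical, only the square-root names quoted above are mapped, and the comments and references that a real UDF contains are omitted.

```python
# Square-root function name in each target language, as quoted in the text.
SQRT_NAME = {"foxpro": "SQRT", "vb": "SQR"}


def make_udf(name, args, expr, dialect):
    """Render a stored formula as function source text for one dialect.

    `expr` uses the placeholder {sqrt} wherever a square root is needed.
    """
    body = expr.format(sqrt=SQRT_NAME[dialect])
    arglist = ", ".join(args)
    if dialect == "foxpro":
        return f"FUNCTION {name}\nPARAMETERS {arglist}\nRETURN {body}\n"
    if dialect == "vb":
        return f"Function {name}({arglist})\n    {name} = {body}\nEnd Function\n"
    raise ValueError(f"unknown dialect: {dialect}")


# Hypothetical stored formula with two arguments.
stored = "{sqrt}(a * a + b * b)"
foxpro_text = make_udf("demo", ["a", "b"], stored, "foxpro")
vb_text = make_udf("demo", ["a", "b"], stored, "vb")
```

Other built-in functions would be handled by further placeholders of the same kind, which keeps the stored formula independent of the target language.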
R is a statistical software environment that is well suited to model testing, and the package
“fmdis” has been written for it to simplify model analysis. R can download files from the Internet.
The package contains a function dnld.model(), which downloads the requested model from
the database of forest management models. Using other functions, it is then simple
to analyze models and draw graphs visualizing them (Fig. 2, a height curve model with three
variables: mean stand height, mean stand diameter and subject tree diameter).
Figure 2. An example of a graphical output based on height curves
Model testing
As a result of the study, a package has been created for model analysis and visualisation. The
package helps to understand model behaviour. The verification of each model is based on
empirical data and helps to determine whether the model was entered correctly into the
database or whether there were errors already in the publication.
To compare models (Sims 2003):
• the residual standard error $s_e$ is calculated using the formula:
$$s_e = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - \tilde{y}_i)^2} \tag{5}$$
where $\hat{y}_i$ is the predicted value, $\tilde{y}_i$ is the empirical value and $n$ is the number of observations;
• for every model, the mean error $s$ is calculated using the formula:
$$s = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - \tilde{y}_i) \tag{6}$$
• a linear regression is performed using the formula:
$$(\hat{y}_i - \tilde{y}_i) = b_0 + b_1 \cdot y_i \tag{7}$$
where $b_0$ and $b_1$ are estimations; for the intercept $b_0$ and the slope $b_1$, probabilities are
calculated to see whether these parameter estimates are significantly different from zero.
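The first two comparison statistics can be sketched in Python as below. The sketch assumes that equation 5 is the square root of the mean squared difference between predicted and empirical values and that equation 6 is their mean difference. The function names and the sample values are hypothetical.

```python
import math


def residual_standard_error(predicted, observed):
    """Equation 5: root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mean_error(predicted, observed):
    """Equation 6: mean of the signed differences (bias)."""
    n = len(observed)
    return sum(p - o for p, o in zip(predicted, observed)) / n


# Hypothetical predicted and empirical values.
predicted = [11.0, 19.0, 31.0, 39.0]
observed = [10.0, 20.0, 30.0, 40.0]
se = residual_standard_error(predicted, observed)
s = mean_error(predicted, observed)
```

The two statistics complement each other: in this example the signed differences cancel, so the mean error is zero, while the residual standard error still reports the typical size of the deviations.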
The following terms are used in the equations:
$\hat{y}_i$ is the predicted value;
$\tilde{y}_i$ is the empirical value;
$b_0$ and $b_1$ are estimations;
$n$ is the number of observations;
$\bar{\tilde{y}}$ is the mean of the empirical values;
$\bar{\hat{y}}$ is the mean of the predicted values;
$s_e^2$ is the residual standard error calculated using the formula:
$$s_e^2 = \frac{\sum (\tilde{y}_i - y_i)^2}{n - 2} \tag{8}$$
where
$$y_i = \bar{\tilde{y}} + b_1 (\hat{y}_i - \bar{\hat{y}}) \tag{9}$$
To estimate intercept and slope together, it is not correct to use the regular test for
calculating probabilities. The present study uses the simultaneous F-test of J. B. Dent and M.
J. Blackie (1979). The method is based on the linear regression
$$(\hat{y}_i - \tilde{y}_i) = b_0 + b_1 \cdot y_i \tag{10}$$
A perfect fit for linear regression renders slope b1=1 and intercept b0=0. The null hypothesis
is b0=0 and b1=1. The simultaneous F-test for intercept and slope is used to check the hypothesis,
with the F-statistic calculated using the formula
$$F = \frac{(n - 2)\left(n b_0^2 + 2 n \bar{\tilde{y}} b_0 (b_1 - 1) + \sum \tilde{y}_i^2 (b_1 - 1)^2\right)}{2 n s_e^2} \tag{11}$$
where
$$y_i = \bar{\tilde{y}} + b_1 (\hat{y}_i - \bar{\hat{y}}). \tag{12}$$
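Equation 11 can be transcribed literally into Python, taking the regression estimates and the residual variance as given inputs. The sketch assumes that the sum and the mean in equation 11 run over the empirical values, as read here. The function name and the sample values are hypothetical, and the probability calculation is left out.

```python
def simultaneous_f(b0, b1, observed, se2):
    """F-statistic of equation 11 for given estimates b0, b1 and variance se2."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sum_sq = sum(o * o for o in observed)
    numerator = (n - 2) * (
        n * b0 ** 2
        + 2 * n * mean_obs * b0 * (b1 - 1)
        + sum_sq * (b1 - 1) ** 2
    )
    return numerator / (2 * n * se2)


# Hypothetical empirical values and estimates.
observed = [10.0, 20.0, 30.0, 40.0]
f_null = simultaneous_f(b0=0.0, b1=1.0, observed=observed, se2=1.0)
f_shift = simultaneous_f(b0=0.5, b1=1.0, observed=observed, se2=1.0)
```

When the estimates coincide with the null hypothesis (b0=0, b1=1) every term in the numerator vanishes and the statistic is zero; any departure in the intercept or the slope increases it.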
The probability for the F-statistic has been calculated with the function pf, where the parameters are
F – the F-statistic value;
n – the number of degrees of freedom in the numerator;
n-2 – the number of degrees of freedom in the denominator.
Figure 3 presents how well a model fits type series for height curve.
Figure 3. An example of a test output based on height curve for pine
Conclusion
The study describes the structure and principles of an information system consisting of two
parts: a database of forest management models and empirical datasets. The information
system allows model testing and visualisation.
The database contains 171 models from 10 countries - Estonia, Finland, Latvia, Germany,
Lithuania, Sweden, Byelorussia, Russia, Austria and Norway.
To improve quality, procedures for model testing and visualisation have been created. The
initial test is for verification and checking whether the model has been entered correctly into
the database. To analyse functionality, the system has procedures for validating models
based on empirical data.
For testing purposes, sample plot data are used.
The information system is expected to be a useful tool for modellers as well as end-users of
models. Since the number of models in the system is not limited, new models can continually
be added to the system.
Acknowledgments
This study was supported by the Estonian Science Foundation, Grant No. 7568.
References
Benz, J., Hoch, R. & Legovic, T. (2001), ‘Ecobas – modelling and documentation’,
Ecological Modelling 138, 3–15.
Benz, J. & Knorrenschild, M. (1997), ‘Call for a common model documentation etiquette’,
Ecological Modelling 97, 141–143.
Brown, S. & Heathcote, A. (2002), ‘On the use of nonparametric regression in assessing
parametric regression models’, Journal of Mathematical Psychology 46, 716–730.
Dalgaard, P. (2002), Introductory Statistics with R, Springer. ISBN 0-387-95475-9.
*http://www.biostat.ku.dk/ pd/ISwR.html
Dent, J. B. & Blackie, M. J. (1979), Systems Simulation in Agriculture, Applied Science,
London.
Hägglund, B. (1981), Forecasting growth and yield in established forests. An outline and
analysis of the outcome of subprogram within the Hugin project., Technical Report 31,
Swedish University of Agricultural Science.
Kiviste, A. (1994), ‘Kasvufunktsioonide andmebaas’, Teadustööde kogumik 173, 67–70.
Kiviste, A. & Hordo, M. (2002), ‘Eesti metsa kasvukäigu püsiproovitükkide võrgustik’,
Metsanduslikud Uurimused 37, 43–56.
Kiviste, A. & Hordo, M. (2003), The network of permanent sample plots for forest growth
modelling in Estonia, in ‘Research for Rural Development 2003’, Latvian Agricultural
University.
Nilson, A. (1992), ‘Mudelite baasi rollist eesti metsanduses’, Eesti Mets 2, 26–27.
Pretzsch, H., Biber, P. & Durský, J. (2002), ‘The single tree-based stand simulator Silva:
construction, application and evaluation’, Forest Ecology and Management 162, 3–21.
Rennolls, K., Ibrachim, M. & Smith, P. (2002), A forest models archive?, in ‘Forest
Biometry, Modelling and Information Systems’.
*http://cms1.gre.ac.uk/conferences/iufro/proceedings/RennIbrSmithFMA.pdf
Sims, A. (2001), Metsanduslike mudelite andmebaas, Bachelor’s thesis, Estonian
Agricultural University.
Sims, A. (2003), Metsanduslike mudelite andmebaas, Master’s thesis, Estonian Agricultural
University.
Sims, A. (2004), Online database of forestry models, in ‘Research for Rural Development
2004’, Latvian Agricultural University.
Vanclay, J. K. (1994), Modelling Forest Growth and Yield: Application to Mixed Tropical
Forests, CAB International.
Wallman, P., Sverdrup, H., Svensson, M. G. E. & Alveteg, M. (2002), Developing
Principles and Models for Sustainable Forestry in Sweden, Kluwer Academic Publisher,
chapter 5: Integrated modelling, pp. 57–83.