Information System for Forest Management Models and Datasets

Allan Sims, Andres Kiviste, Artur Nilson, Maris Hordo
Institute of Forestry and Rural Engineering, Estonian Agricultural University
Kreutzwaldi 5, 51014 Tartu, Estonia

Abstract

The aim of the database of forest management models is to standardize the form of forest management models and to simplify working with them in application software. The database contains 12 tables, including tables for formulas, constants, authors, papers and variables. The application allows models to be tested in two ways. In the first, the user supplies values for the arguments and the results are drawn as a chart; this test can be used with all models and is the initial check that model constants and arguments have been entered correctly. In the second, results are calculated from test data and presented in a table. The database currently includes 171 forest measurement models from 10 countries. The database is web-based and freely available at http://www.eau.ee/~mbaas/.

Introduction

Scientific activity as a whole involves creating, applying or checking models. In a forestry information system it is now possible to use information on forestry in the form of models. Thus, much forestry research produces models (Nilson 1992). A huge variety of models were created to solve some narrow problem and were not considered important enough to publish, so many models have no published documentation. In far too many other cases the model documentation leaves a lot to be desired. Worse still, there are a number of models for which no documentation exists at all, with the effect that published research results based on these models are not verifiable (Benz & Knorrenschild 1997).
Considering the overwhelming flood of publications on forest models, their haphazard availability and complicated presentation, the authors found it necessary to create a handy collection of existing models. The best environment for such a collection is a standard database. The system should make it possible to administer the data and to retain it in the course of work. To avoid reinventing the wheel when someone cannot find the model he or she needs, it is necessary to collect the models we already have (Nilson 1992). The creation of a database forces one to systematize and critically check forestry models. As a result, it is possible to use current information more efficiently and to clarify important subjects for research (Kiviste 1994).

A wide range of forest models has been developed over the last four or more decades, during which concerted efforts have been made at forest modelling. The need for the conservation and sharing of these models would seem to be clear (Rennolls et al. 2002).

Nowadays, most forest management decisions are based on models. Models help to understand forest behaviour. At the Estonian State Forest Management Centre, a computer-based decision support system is being developed. Both adequate forest stand descriptions and stand growth and structure models are needed for the effective use of the system (Kiviste & Hordo 2003). According to the forest literature of recent decades (Hägglund 1981, Pretzsch et al. 2002), traditional growth and yield tables and the equations approximating them are being replaced with more sophisticated stand growth simulators. It is therefore possible to simulate different scenarios and to choose the best sustainable forest management regimes.

Similar collections have been started elsewhere, although fewer than one might expect. They were started some years ago at the University of Kassel, Germany, with “The Register of Ecological Models” (Benz et al.
2001), and at the University of Greenwich, England, with “The Forest Model Archive” (Rennolls et al. 2002).

The objective of the present study is to create a system for collecting and testing forest management models. The information system consists of two parts: one is the database of forest management models and the other consists of empirical datasets. For testing purposes the models are verified and validated. The aim of the system is to systematize and structure the information we have about models. Models are constantly improved; hence, adding models to the system is a continual activity. The goal is to create an online database of forest management models (http://www.eau.ee/~mbaas/) where everyone can enter new models and comment on models already entered.

Between its creation and its entry into the database, a model may be rewritten several times, and each rewriting can introduce mistakes. To avoid them, the system should be able to apply models directly from the database. If models are directly applicable, it is simple to test them and so improve the quality of the database.

Material and methods

Model

A model is an abstraction, or a simplified representation, of some aspect of reality. Models may be expressed in verbal (e.g. a description) or material forms (e.g. a scale model). A mathematical model is like a verbal model but uses mathematical language, which is more concise and less ambiguous than natural language (Vanclay 1994). A mathematical forestry model is intended to predict tree or forest development by the use of empirical mathematical equations.

Basically, models serve one or both of the following purposes:
• Testing the overall understanding one has of a system, on the basis of a mathematical representation of its subsystems and the proposed coupling of its subsystems;
• Predicting the future, to explain how and why things have worked in the past (Wallman et al. 2002).
A model may be considered a whole-stand model, a size class model, a single-tree model or a landscape-level model. In whole-stand models no details of the individual trees in the stand are determined. Single-tree models use the individual tree as the basic unit of modelling. Size class models provide some information regarding the structure of the stand. Landscape and ecosystem models take into account not only trees but also other elements of vegetation (Vanclay 1994, Wallman et al. 2002).

A model may also be used for understanding or for prediction. The basic form of purely predictive models is probably multiple regression; models for understanding are different and seek to improve our ability to explain how nature works.

Principles of the database

The first part of the information system is a database of forest management models. The database is based on the following principles (Sims 2001):
• The names of tables and fields are in English to simplify international distribution.
• The names of tables and fields are up to 8 characters long to avoid complications with applications that do not allow longer names.
• IDs are used in fields containing language-specific terms, and every such code is explained in a separate table. Thus it is simple to translate model information into many languages.
• Every model has to provide enough information about the author(s) and the publications in which it has been presented. In this way it is simple to find more information about a model.
• The models in the database should be directly applicable, to simplify the verification and subsequent use of a model.
• Models and metadata have to be entered so that it is simple to translate them into many languages for ease of international distribution.
• Models have to contain enough information to make their purpose understandable and to simplify querying the database.

The structure of the database of forest management models

The database contains 12 tables (Sims 2001, Sims 2003).
Five main tables contain information on models:
• Formulas (table: formmain) – contains the formulas plus general information, e.g. country, comment, type of model (static, growth, etc.).
• Argument information (table: argsinfo) – contains descriptions of the arguments, e.g. the name of the argument, its unit, etc.
• Constants (table: constans) – contains the constants for every tree species separately.
• List of tree species per model (table: formulas) – contains the list of tree species for which a model was created.
• Correspondence of species and constants (table: casetbl2) – a model has one formula but different constants for different species. This table connects formulas with constants by species.

Additional tables contain metadata:
• List of publications in which models are published (tables: bibllist, mpaper) – one publication can contain many models; therefore, publications have their own table and each model refers to a publication by its ID only.
• Action and time (tables: actdescr, action) – every action and the time at which it was taken also have to be documented. The different actions are model creation, first entry of the model, subsequent entries, etc.
• List of authors (table: author) – information about authors.
• List of codes (table: codifinf) – the list of the different codes used to describe arguments.
• List of tree species (table: species) – the names of tree species in Estonian, English and Latin.

Tools

On the web server, the data are stored in a MySQL database. The programming language is PHP. For each model it is possible to produce output in HTML format for information and in plain text format for user-defined functions (Sims 2004).

Model verification and validation are performed using R, a program for statistical computing (Dalgaard 2002). To simplify model analysis, a package has been created for the R environment. The package is downloadable from the database homepage (http://www.eau.ee/~mbaas/).
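The relationship between a formula, the tree species and the species-specific constants can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than the MySQL server on which the system runs, and it covers only a subset of the tables. The table names are those listed above, but every column name, the helper constants_for and all rows are hypothetical, because the actual field layout is not given here.

```python
import sqlite3

# Table names follow the paper; all column names and rows are hypothetical
# and serve only to illustrate how the tables relate to each other.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formmain (formid INTEGER PRIMARY KEY, formula TEXT, country TEXT);
CREATE TABLE species  (specid INTEGER PRIMARY KEY, latin TEXT);
CREATE TABLE casetbl2 (caseid INTEGER PRIMARY KEY, formid INTEGER, specid INTEGER);
CREATE TABLE constans (caseid INTEGER, cname TEXT, cvalue REAL);
""")

# One formula, two species, and a separate set of constants for each species.
conn.execute("INSERT INTO formmain VALUES (1, 'c1 + c2 * p1', 'Estonia')")
conn.executemany("INSERT INTO species VALUES (?, ?)",
                 [(1, "Pinus sylvestris"), (2, "Picea abies")])
conn.executemany("INSERT INTO casetbl2 VALUES (?, ?, ?)",
                 [(10, 1, 1), (11, 1, 2)])
conn.executemany("INSERT INTO constans VALUES (?, ?, ?)",
                 [(10, "c1", 1.0), (10, "c2", 0.5),
                  (11, "c1", 2.0), (11, "c2", 0.25)])


def constants_for(conn, formid, latin):
    """Return {constant name: value} for one formula and one species."""
    rows = conn.execute(
        """SELECT c.cname, c.cvalue
           FROM constans c
           JOIN casetbl2 k ON k.caseid = c.caseid
           JOIN species  s ON s.specid = k.specid
           WHERE k.formid = ? AND s.latin = ?
           ORDER BY c.cname""",
        (formid, latin),
    ).fetchall()
    return dict(rows)


print(constants_for(conn, 1, "Pinus sylvestris"))
```

Keeping the constants in their own table, linked through casetbl2, means that adding a species to an existing model requires new rows only, not a copy of the formula.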
Restrictions on collected models

Thousands of models exist already and their structures vary. Putting all models into the database would make it overly complicated. Thus, the database has some restrictions on which models may be entered. As the name of the database indicates, only forest management models are collected. These are models that are commonly used by foresters in everyday management. In most cases such models are:
• Height and taper curves
• Volume and form factor models
• Diameter, height, basal area and volume growth models
• Crown models (e.g. crown height, crown base, etc.)
• Different base models (e.g. diameter or height at base age, etc.)
• Rule models (e.g. lowest basal area after thinning, etc.)

Because the models in the database should be directly applicable, only models in the form of equations are entered. Each model can have a total of four equations, of which three are subequations and one is the main equation. For example, equations 1–4 belong to one model, and each formula gives a value that can be used in the subsequent formulas.

$$\mathrm{ipf}_1 = c_1 - 493 \cdot \ln(p_6 + 1) + 1355 \cdot p_5 \tag{1}$$

$$\mathrm{ipf}_2 = \frac{\mathrm{ipf}_1}{50^{c_2}} \tag{2}$$

$$\mathrm{ipf}_3 = (p_4 - \mathrm{ipf}_2)^2 + \frac{4 \cdot \mathrm{ipf}_1 \cdot p_4}{p_2^{c_2}} \tag{3}$$

$$\mathrm{op} = \frac{p_4 + \mathrm{ipf}_2 + \mathrm{ipf}_3}{(2 + 4 \cdot \mathrm{ipf}_1 \cdot p_3^{-c_2}) / (p_4 - \mathrm{ipf}_2 + \mathrm{ipf}_3)} \tag{4}$$

Initially, the database was created with a focus on Estonia; therefore, models have been collected from Estonia, the Baltic region and neighbouring countries. A key criterion is that the models should be applicable to tree species growing in Estonia.

Only those models are entered into the database that have at least:
• a formula;
• constants;
• a list of the tree species to which the model is applicable;
• a description of arguments.

If any of these items is missing, the model is useless for the system.

Empirical datasets

The second part of the information system consists of empirical datasets. For testing purposes, sample plot data are used (Kiviste & Hordo 2002). This project was initiated in 1995.
Since then, 715 sample plots have been measured, of which 380 have been re-measured. The plots are circles holding a minimum of 100 trees each. On the plots, the polar coordinates and the breast height diameters of all trees are measured. Additionally, the total height and the crown length of selected sample trees are measured.

It is difficult to use such huge amounts of data for testing purposes. Instead, it is enough to have only type series (e.g. a smoothed height curve in Fig. 1). A nonparametric approach is used to create such type series. Nonparametric regression avoids problematic assumptions on the nature of the true regression function other than that it is smooth (Brown & Heathcote 2002).

Figure 1. An example of a smoothed height curve

Results

Collected models

The database contains models designed for forest management, because initially we did not want to cast too wide a net. We have chosen models of forest management because they are most needed by foresters. The database contains 171 models from 10 countries.

Country        Number of models
Estonia        56
Finland        30
Latvia         23
Germany        22
Lithuania      19
Sweden         15
Byelorussia    2
Russia         2
Austria        1
Norway         1

The database mostly contains models for the calculation of volume, height, increment, site index, various kinds of growth, etc. For volume calculation, the database comprises models for a single tree and for a stand. The database encompasses different types of growth models. Some of them are increment models and some are predictive models. For prediction, the database includes models for tree height, tree diameter, stand volume and stand basal area.

Output of the database

The database was created to simplify working with models. If someone needs a model or further information on it, the homepage offers all the relevant information entered into the database. There are ready-to-use models for the environments of FoxPro and Visual Basic.
For software developers a program has been created that generates user-defined functions (UDFs) directly from the database. This is simple to do because the information in the database has been recorded in a systematic manner. A UDF contains comments, references, a description of the arguments and the function itself. Currently, only UDFs for the FoxPro and Visual Basic programming languages can be created dynamically through the web. The main differences lie in the built-in function names. In FoxPro, square root and natural logarithm are designated “SQRT” and “LOG”, respectively, whereas in Visual Basic the corresponding commands are labelled “SQR” and “LN”. It is simple to convert and create functions for these two programming languages, because in FoxPro and Visual Basic a user does not have to declare variables and their types. Most programming languages require all variables to be declared prior to use.

R is a statistical computing environment that is well suited to model testing, and the package “fmdis” has been written for it to simplify model analysis. R enables downloading files from the Internet. The package contains a function dnld.model(), which downloads the requested model from the database of forest management models. Using other functions, it is subsequently simple to analyze models and draw graphs for visualizing them (Fig. 2, a height curve model with three variables: mean stand height, mean stand diameter and subject tree diameter).

Figure 2. An example of a graphical output based on height curves

Model testing

As a result of the study, a package has been created for model analysis and visualisation. The package helps to understand model behaviour. The verification of each model is based on empirical data and helps to establish whether the model was entered correctly into the database or whether there were errors already in the publication.
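As a rough illustration of how model predictions can be compared with empirical values, the sketch below computes three simple diagnostics for paired observations. It is not the code of the fmdis package: the function name residual_stats is hypothetical, the first statistic is taken as the root of the mean squared difference (whether the system applies the square root is an assumption here), and the residuals are regressed on the empirical values, which is also an assumption.

```python
import math


def residual_stats(predicted, empirical):
    """Return (rms, mean_error, b0, b1) for paired predictions and observations.

    rms        root of the mean squared difference (predicted - empirical)
    mean_error mean of the differences, i.e. the bias of the model
    b0, b1     intercept and slope of an ordinary least squares line
               fitted to the differences against the empirical values
    """
    n = len(predicted)
    resid = [p - e for p, e in zip(predicted, empirical)]

    rms = math.sqrt(sum(r * r for r in resid) / n)
    mean_error = sum(resid) / n

    # Ordinary least squares with the empirical value as the regressor.
    x_mean = sum(empirical) / n
    sxx = sum((x - x_mean) ** 2 for x in empirical)
    sxy = sum((x - x_mean) * (r - mean_error)
              for x, r in zip(empirical, resid))
    b1 = sxy / sxx
    b0 = mean_error - b1 * x_mean
    return rms, mean_error, b0, b1


# Hypothetical heights (m): the model overestimates every tree by 1 m.
empirical = [10.0, 12.0, 15.0, 19.0]
predicted = [11.0, 13.0, 16.0, 20.0]
print(residual_stats(predicted, empirical))
```

A constant offset such as this one shows up as a non-zero mean error and intercept with a zero slope, whereas an error that grows with tree size would show up in the slope.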
To compare models (Sims 2003), the following statistics are calculated.

The residual standard error $s_e$ is calculated using the formula

$$s_e = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - \tilde{y}_i)^2 \tag{5}$$

where $\hat{y}_i$ is the predicted value, $\tilde{y}_i$ is the empirical value and $n$ is the number of observations.

For every model, the mean error $s$ is calculated using the formula

$$s = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - \tilde{y}_i) \tag{6}$$

A linear regression is performed using the formula

$$(\hat{y}_i - \tilde{y}_i) = b_0 + b_1 \cdot y_i \tag{7}$$

where $b_0$ and $b_1$ are the estimated parameters. For the intercept $b_0$ and the slope $b_1$, probabilities are calculated to see whether these parameter estimates are significantly different from zero.

The following terms are used in the equations: $\hat{y}_i$ is the predicted value; $\tilde{y}_i$ is the empirical value; $b_0$ and $b_1$ are the estimated parameters; $n$ is the number of observations; $\bar{\tilde{y}}$ is the mean of the empirical values; $\bar{\hat{y}}$ is the mean of the predicted values; $s_{e2}$ is the residual standard error calculated using the formula

$$s_{e2} = \frac{\sum (\tilde{y}_i - y_i)^2}{n - 2} \tag{8}$$

where

$$y_i = \bar{\tilde{y}} + b_1 (\hat{y}_i - \bar{\hat{y}}) \tag{9}$$

To assess the intercept and the slope together, it is not correct to use the regular test for calculating the probabilities. The present study uses the simultaneous F-test of Dent & Blackie (1979). The method is based on the linear regression

$$(\hat{y}_i - \tilde{y}_i) = b_0 + b_1 \cdot y_i \tag{10}$$

A perfect fit for the linear regression renders slope $b_1 = 1$ and intercept $b_0 = 0$. The null hypothesis is $b_0 = 0$ and $b_1 = 1$. The simultaneous F-test for intercept and slope is used to check the hypothesis, with the F-statistic calculated using the formula

$$F = \frac{(n - 2)\left(n b_0^2 + 2 n \bar{\tilde{y}} b_0 (b_1 - 1) + \sum \tilde{y}_i^2 (b_1 - 1)^2\right)}{2 n s_{e2}} \tag{11}$$

where

$$y_i = \bar{\tilde{y}} + b_1 (\hat{y}_i - \bar{\hat{y}}) \tag{12}$$

The probability for the F-statistic has been calculated with the function pf, where the parameters are:
• F – the F-statistic value;
• n – the number of degrees of freedom in the numerator;
• n-2 – the number of degrees of freedom in the denominator.

Figure 3 presents how well a model fits the type series for a height curve.

Figure 3.
An example of a test output based on height curve for pine

Conclusion

The study describes the structure and principles of an information system consisting of two parts: a database of forest management models and empirical datasets. The information system allows model testing and visualisation. The database contains 171 models from 10 countries: Estonia, Finland, Latvia, Germany, Lithuania, Sweden, Byelorussia, Russia, Austria and Norway.

To improve quality, procedures for model testing and visualisation have been created. The initial test is for verification, checking whether the model has been entered correctly into the database. To analyse functionality, the system has procedures for validating models based on empirical data. For testing purposes, sample plot data are used.

The information system is expected to be a useful tool for modellers as well as end-users of models. Since the number of models in the system is not limited, new models can continually be added to the system.

Acknowledgments

This study was supported by the Estonian Science Foundation, Grant No. 7568.

References

Benz, J., Hoch, R. & Legovic, T. (2001), ‘Ecobas – modelling and documentation’, Ecological Modelling 138, 3–15.
Benz, J. & Knorrenschild, M. (1997), ‘Call for a common model documentation etiquette’, Ecological Modelling 97, 141–143.
Brown, S. & Heathcote, A. (2002), ‘On the use of nonparametric regression in assessing parametric regression models’, Journal of Mathematical Psychology 46, 716–730.
Dalgaard, P. (2002), Introductory Statistics with R, Springer. ISBN 0-387-95475-9. *http://www.biostat.ku.dk/ pd/ISwR.html
Dent, J. B. & Blackie, M. J. (1979), Systems Simulation in Agriculture, Applied Science, London.
Hägglund, B. (1981), Forecasting growth and yield in established forests. An outline and analysis of the outcome of subprogram within the Hugin project, Technical Report 31, Swedish University of Agricultural Science.
Kiviste, A.
(1994), ‘Kasvufunktsioonide andmebaas’, Teadustööde kogumik 173, 67–70.
Kiviste, A. & Hordo, M. (2002), ‘Eesti metsa kasvukäigu püsiproovitükkide võrgustik’, Metsanduslikud Uurimused 37, 43–56.
Kiviste, A. & Hordo, M. (2003), The network of permanent sample plots for forest growth modelling in Estonia, in ‘Research for Rural Development 2003’, Latvian Agricultural University.
Nilson, A. (1992), ‘Mudelite baasi rollist eesti metsanduses’, Eesti Mets 2, 26:27.
Pretzsch, H., Biber, P. & Durský, J. (2002), ‘The single tree-based stand simulator Silva: construction, application and evaluation’, Forest Ecology and Management 162, 3–21.
Rennolls, K., Ibrachim, M. & Smith, P. (2002), A forest models archive?, in ‘Forest Biometry, Modelling and Information Systems’. *http://cms1.gre.ac.uk/conferences/iufro/proceedings/RennIbrSmithFMA.pdf
Sims, A. (2001), Metsanduslike mudelite andmebaas, Bachelor’s thesis, Estonian Agricultural University.
Sims, A. (2003), Metsanduslike mudelite andmebaas, Master’s thesis, Estonian Agricultural University.
Sims, A. (2004), Online database of forestry models, in ‘Research for Rural Development 2004’, Latvian Agricultural University.
Vanclay, J. K. (1994), Modelling Forest Growth and Yield: Application to Mixed Tropical Forests, CAB International.
Wallman, P., Sverdrup, H., Svensson, M. G. E. & Alveteg, M. (2002), Developing Principles and Models for Sustainable Forestry in Sweden, Kluwer Academic Publishers, chapter 5: Integrated modelling, pp. 57–83.