Standard Forest Sampling Designs and Their Analysis Using TabGen¹

Charles T. Scott²
Scott D. Klopfer³

Abstract-Three standard sampling designs have been used for forest monitoring through the years. Simple random sampling typically is applied for small areas. Stratified random sampling often is used in mid-scale assessments when the forest areas are delineated on maps. Double sampling for stratification generally is used only for extensive surveys when strata sizes must be estimated on aerial photographs. Remote sensing (satellite imagery) can be used for stratified random sampling at larger scales, potentially reducing sampling error. The program TabGen was written in Visual Basic 2.0 to analyze surveys using each of these designs. TabGen reads files of survey data that have been expressed on a per-hectare basis. The user then selects the row and column categorical variables and the attribute of interest (continuous variable). Tables of means, totals, and areas are produced, including 95% sampling errors for each cell. Users can define multiple filters to control what data are included/excluded from each table.

Most forms of forest monitoring are based on sampling designs so that results are unbiased and of known precision. The three most commonly used designs are simple random sampling, stratified random sampling, and double sampling for stratification. The designs are listed in order of increasing efficiency but also increasing complexity. However, each is designed to provide estimates of forest-resource attributes and their precision. These monitoring results generally are provided in the form of one- and two-way tables. For example, a key table might be change estimates for abundance by species and size class.
The statistical reports of the Forest Inventory and Analysis (FIA) units of the USDA Forest Service are compilations of such tables. Although we regularly use such tables, the forest survey and sampling literature describes sampling designs and alternative estimators for a single attribute of interest rather than for tables of them.

Software for generating these tables has not been widely available. A FORTRAN program called FINSYS (Forest Inventory System) was developed in the early 1960's and was updated in the early 1980's (Born and Barnard 1983). However, the batch processing mode was retained in the revised version, so it has not been used widely. Survey sampling software packages are available but have not been well accepted by natural resource analysts largely because they focus on human population surveys. General statistical packages also are widely available, but using them to develop resource tables and their variances requires considerable statistical and programming knowledge. Spreadsheets can be used to generate tables for simple sampling designs, but variance estimators are much more difficult; thus, resource managers likely would use the data without knowing the reliability of the estimates.

¹Paper presented at the North American Science Symposium: Toward a Unified Framework for Inventorying and Monitoring Forest Ecosystem Resources, Guadalajara, Mexico, November 1-6, 1998.
²Charles T. Scott is Project Leader, USDA Forest Service, Northeastern Research Station, 359 Main Rd., Delaware, OH 43015-8640, U.S.A., Phone: 740-368-0101; Fax: 740-368-0152; e-mail: cscottlne_de@fs.fed.us
³Scott D. Klopfer is RS/GIS Coordinator, Virginia Polytechnic Institute and State University, Fish and Wildlife Information Exchange located in Blacksburg, VA. e-mail: sklopfe~.edu.

USDA Forest Service Proceedings RMRS-P-12. 1999

During a study with Mead Paper Corp.
that combined monitoring with geographic information systems (GIS), the need for an interactive forest-resource analysis program became clear. The program TabGen was written to meet that need. In this paper, we describe TabGen and discuss the estimation procedures it uses.

Estimation Methods

Typically, estimators are derived for a single attribute of interest, such as total biomass of a forest. When estimating tables, one may be interested in estimating biomass by species group and diameter class. The estimation process and the attribute of interest are the same in each cell, but different conditions are placed on the attribute of interest in each table cell. For example, if a tree is not of the first species or not in the first diameter class, its biomass is not summed into the first table cell (Cochran 1977, p. 142-144).

With simple random sampling, plots (sampling units) are located randomly across the population (sampling frame). The simple average of the plot values is computed for each cell along with the simple variance. The sample size is the same for each cell because each plot contributes to the estimate even if the only information is that the plot does not belong to the cell. This is a common source of misunderstanding (Scott 1999).

For stratified random sampling, the population is divided into homogeneous areas of known size. Each stratum is treated as a simple random sample, so the stratum means and variances are computed in the same manner. To estimate the overall mean, each stratum mean is multiplied by its stratum weight (proportionate area in each stratum) and summed. The variance of the overall mean is computed as the sum of the variance of each stratum mean multiplied by the square of its stratum weight (Cochran 1977).

Double sampling for stratification is similar except that stratum weights are estimated rather than known. In forestry applications, the weights typically are estimated using a large sample of points on aerial photographs (Bickford 1952).
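The simple random and stratified random estimators described above can be illustrated with a minimal Python sketch. It is not TabGen's code (TabGen was written in Visual Basic); the function names and sample values are hypothetical, and the finite population correction is assumed to be negligible. Plots that do not belong to a cell are entered as zeros, so the sample size is the same for every cell.

```python
def srs_cell_estimate(plot_values):
    """Mean and variance of the mean for one table cell under simple random sampling.

    plot_values holds one per-hectare value for EVERY plot in the sample;
    plots that do not belong to the cell contribute 0.0.
    Assumption: no finite population correction is applied.
    """
    n = len(plot_values)
    mean = sum(plot_values) / n
    s2 = sum((y - mean) ** 2 for y in plot_values) / (n - 1)
    return mean, s2 / n


def stratified_cell_estimate(strata):
    """Overall mean and variance for one cell under stratified random sampling.

    strata is a list of (stratum_weight, plot_values) pairs, where the
    weights are the known proportions of area in each stratum.
    """
    overall_mean = 0.0
    overall_var = 0.0
    for weight, plot_values in strata:
        mean_h, var_h = srs_cell_estimate(plot_values)
        overall_mean += weight * mean_h       # weighted sum of stratum means
        overall_var += weight ** 2 * var_h    # squared weights times variances
    return overall_mean, overall_var


# Hypothetical cell values (e.g., biomass per hectare) for two strata.
example_strata = [
    (0.6, [10.0, 0.0, 20.0, 0.0]),
    (0.4, [0.0, 0.0, 5.0, 15.0]),
]
cell_mean, cell_var = stratified_cell_estimate(example_strata)
print(cell_mean, cell_var)
```

With a single stratum of weight 1.0, `stratified_cell_estimate` returns the same values as `srs_cell_estimate`.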
Each point is classified into a stratum and weights are estimated as the proportion of photo points falling in each stratum. The estimation process is the same as for stratified random sampling except that a variance term is added to account for the errors in the stratum weights (Cochran 1977).

The formulas for all three designs for the estimation of two-way tables are given in Scott (1999). When stratum weights are known, the double sampling for stratification estimators simplify to the stratified random sampling estimators, which simplify to the simple random sampling estimators when there is only one stratum.

Program TabGen

TabGen produces two-way tables for any of the three sampling designs described. Written in Visual Basic 2.0, the program allows the user to create tables in a point-and-click environment. The current version of TabGen is described here, though work has begun on a more general version that will handle FIA data.

Program Inputs

Currently, TabGen reads a flat file for each of the following: plot (sampling unit) data, site-index trees, regeneration, overstory trees, and fields common to both regeneration and overstory trees. Other files can be substituted, though a hierarchy of one plot file and one or more files with multiple observations per plot is assumed. Each file has a header record containing a label for each data field. Each data record is comma delimited and has the plot number, observed data, and any calculated data, e.g., trees per hectare, biomass per tree, and diversity index. TabGen reads a variable library (dictionary) that describes the variables that are read from each file, whether they are continuous or categorical, and, if the latter, the labels for each category. The names of any per-unit-area fields must start with a "#" symbol, so that TabGen knows which field to multiply all the other fields on that record by to put them on a per-hectare basis.
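The "#" convention can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions, not TabGen's file reader: the field names (`Plot`, `Species`, `#TreesPerHa`, `BiomassPerTree`) are hypothetical, and the `continuous_fields` argument stands in for the information TabGen takes from its variable library.

```python
import csv
import io


def expand_to_per_hectare(text, continuous_fields):
    """Scale the continuous fields of each record by the record's "#" field.

    text is the content of a comma-delimited file with a header record.
    continuous_fields lists the (hypothetical) fields to be scaled.
    """
    reader = csv.DictReader(io.StringIO(text))
    # The per-unit-area field is identified by its leading "#".
    expansion = [name for name in reader.fieldnames if name.startswith("#")]
    if len(expansion) != 1:
        raise ValueError("expected exactly one per-unit-area field")
    factor_name = expansion[0]

    rows = []
    for record in reader:
        factor = float(record[factor_name])
        row = dict(record)
        for name in continuous_fields:
            row[name] = float(record[name]) * factor  # now on a per-hectare basis
        rows.append(row)
    return rows


# Hypothetical overstory-tree file: header record, then one record per tree.
sample = """Plot,Species,#TreesPerHa,BiomassPerTree
1,12,25.0,0.5
1,316,10.0,1.5
"""
for row in expand_to_per_hectare(sample, ["BiomassPerTree"]):
    print(row["Plot"], row["Species"], row["BiomassPerTree"])
```

Categorical fields and the plot number are left as read; only the listed continuous fields are multiplied by the expansion factor.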
TabGen also reads a control file that contains the stratum weights, the total area in the population, and a list of plots that will be included in the analysis. This list allows the analyst to include only the subset of plots of interest. GIS software can be used independently to select a subarea of interest, compute the areas by stratum and identify the plots within, and then create the control file with zeros following the plot numbers for those plots outside the area of interest. All files are assumed to be in the same directory as the control file.

Program Execution

Once the input files are created, the first step in running TabGen is to select the control file. The next step is to select the variables to be presented in a table. TabGen reads the variable library and presents the form in Figure 1. When a file type is selected, the variable list is presented. The user selects two categorical attributes for the row and column variables, and then a continuous attribute for the body of the table. The design of the table is shown at the bottom right. Once the table is set up, the Generate Tables button is hit, and one (plot) or two files (plot and one other) are read to generate the tables (Fig. 2).

Figure 1.-Form to select table variables, to create and edit new ones, and to add filters.

The results can be viewed as:
1. The percent or total area in each table cell.
2. The mean or total of the attribute of interest in each cell.
3. The ratio of the attribute total to the area estimate for each cell.
4. The mean of individual observations (generally on a per-tree basis).
5. The number of plots "falling" in each cell.

The 95% sampling errors (confidence limits) are computed for each cell using the estimators for the design indicated in Figure 1 (simple random sampling, stratified random sampling, or double sampling for stratification). The user hits a button to display the sampling errors for the table.
Hitting the same button switches back to the estimates, so it is easy for resource managers to determine the reliability of the estimates.

Figure 2.-Mean basal area (ft²/acre) by species group and size class.

Program Options

TabGen has several additional features that allow the user to customize how the data are estimated and presented. The Modify Values function allows the user to rescale table values by dividing each by 1,000 to put the values on a per-thousand basis. The user also can input a value to multiply or divide by to convert values to dollars, for example. The user also can create categorical variables from continuous ones by assigning ranges to categories, such as converting diameter to 5-cm classes. Existing categorical variables can be transformed into new variables by collapsing the full set of categories into a smaller set by combining classes, for example, when categories have too few values to stand alone in a table. Any created variable can be edited later, e.g., to change class labels.

In small populations, some strata may have insufficient numbers of plots to stand alone. TabGen allows the user to collapse the strata in any fashion. To aid in choosing which strata to collapse, stratum weights and sample sizes are displayed for original and collapsed strata.

TabGen's powerful filtering feature allows users to create tables that meet their needs. Filters can be defined to exclude from a table those observations with specified values of one or more attributes. The user selects the variable, defines ranges of its values, then checks which ranges will be used in estimates. For example, the user can select Species and then define two ranges, one for softwoods and one for hardwoods. A filter labeled "Softwood Only" would have only the first range checked. A second filter labeled "Hardwood Only" would have the second range checked. Multiple filters can be created for each variable (Fig. 3).
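The range-based filters described above can be sketched in Python as follows. This is an illustration only, not TabGen's implementation: the species codes, diameter limits, and function names are hypothetical. Each filter is built from a variable, a set of labeled ranges, and the labels that are checked.

```python
def make_filter(variable, ranges, checked):
    """Build a filter that keeps records whose value falls in a checked range.

    ranges maps a label to an inclusive (low, high) pair;
    checked is the set of labels that will be used in estimates.
    """
    kept = [ranges[label] for label in checked]

    def keep(record):
        value = record[variable]
        return any(low <= value <= high for low, high in kept)

    return keep


def apply_filters(records, filters):
    """Keep only the records that pass every filter in the list."""
    return [r for r in records if all(f(r) for f in filters)]


# Hypothetical code ranges and size limits, for illustration only.
species_ranges = {"Softwood": (1, 299), "Hardwood": (300, 999)}
size_ranges = {"Poletimber": (12.5, 27.9), "Sawtimber": (28.0, 999.0)}

softwood_only = make_filter("Species", species_ranges, {"Softwood"})
hardwood_only = make_filter("Species", species_ranges, {"Hardwood"})
sawtimber_only = make_filter("Diameter", size_ranges, {"Sawtimber"})

trees = [
    {"Species": 12, "Diameter": 35.0},
    {"Species": 12, "Diameter": 15.0},
    {"Species": 316, "Diameter": 40.0},
]
print(apply_filters(trees, [softwood_only]))
print(apply_filters(trees, [softwood_only, sawtimber_only]))
```

Passing more than one filter to `apply_filters` keeps only the observations that satisfy all of them.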
Filters can be used to create a series of two-way tables as a way of forming three-way tables, such as one table for poletimber, another for sawtimber, and one with both. Filters also can be used in combination, for example, applying the "Softwood Only" and "Sawtimber Only" filters simultaneously to yield a table containing only softwood sawtimber trees. Filters also can be used to ignore plots with missing or suspect values, such as computing average site index only on those plots where site index was observed. Filters give users tremendous power to create tables that meet their needs.

The primary output of TabGen is the on-screen display of the tables and their sampling errors, but any table can be printed to the default printer. Tables also can be sent to a comma-delimited text file suitable for importing into spreadsheets or word processors for additional formatting or for generating graphics. TabGen can generate a plot summary file for the current attribute. The summary file can be used in other analyses apart from TabGen or as an additional input field by attaching it to the input plot file.

Summary

TabGen generates two-way tables using the correct estimators and produces 95% sampling errors for each cell in the tables. Written in Visual Basic 2.0, the current version allows users to create tables using all available data in a point-and-click environment. TabGen has been tested on Windows 9+ and NT. Tables can be refined through the use of variable editing and filtering features. The program also works in conjunction with a GIS to produce tables for subareas of the population. Thus, it should prove a powerful tool for quickly and easily exploring monitoring results. Although written for a specific study with Mead Paper Corp., TabGen is public domain software and can be modified for other applications. Copies of TabGen and sample data sets are available from the author.

Acknowledgments

The development of TabGen was supported by Mead Paper Corp.
through a cooperative agreement with the USDA Forest Service, Northeastern Research Station. We thank our reviewers: Tom Frieswyk, Pat Miles, and Larry Royer, all with the Forest Service.

Figure 3.-Form to create, edit, delete, and apply filters for a variable. The Sawtimber Only filter for Tree Size is displayed.

Literature Cited

Bickford, C. Allen. 1952. The sampling design used in the forest survey of the Northeast. Journal of Forestry 50(4):290-293.
Born, David J.; Barnard, Joseph E. 1983. FINSYS-2: Subsystem TABLE-2 and OUTPUT-2. General Technical Report NE-84. Broomall, PA: U.S. Department of Agriculture, Forest Service, Northeastern Forest Experiment Station. 133 p.
Cochran, William G. 1977. Sampling Techniques. John Wiley & Sons, New York. 428 p.
Scott, Charles T. 1999. Estimating two-way tables based on forest surveys. In Hansen, Mark H.; Burk, Thomas E., eds. Integrated tools for natural resources inventories in the 21st century - an international conference on the inventory and monitoring of forested ecosystems. Gen. Tech. Rep. NC-. St. Paul, MN: U.S. Department of Agriculture, Forest Service, North Central Research Station. In press.