Standard Forest Sampling Designs and Their Analysis Using TabGen

This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Standard Forest Sampling Designs and
Their Analysis Using TabGen 1
Charles T. Scott2
Scott D. Klopfer 3
Abstract: Three standard sampling designs have been used for
forest monitoring through the years. Simple random sampling
typically is applied for small areas. Stratified random sampling
often is used in mid-scale assessments when the forest areas are
delineated on maps. Double sampling for stratification generally is
used only for extensive surveys when strata sizes must be estimated
on aerial photographs. Remote sensing (satellite imagery) can be
used for stratified random sampling at larger scales, potentially
reducing sampling error. The program TabGen was written in
Visual Basic 2.0 to analyze surveys using each of these designs.
TabGen reads files of survey data that have been expressed on a per-hectare basis. The user then selects the row and column categorical
variables and the attribute of interest (continuous variable). Tables
of means, totals, and areas are produced, including 95% sampling
errors for each cell. Users can define multiple filters to control
what data are included/excluded from each table.
Most forms of forest monitoring are based on sampling
designs so that results are unbiased and of known precision.
The three most commonly used designs are simple random
sampling, stratified random sampling, and double sampling for stratification. The designs are listed in order of
increasing efficiency but also increasing complexity. However, each is designed to provide estimates of forest-resource
attributes and their precision.
These monitoring results generally are provided in the
form of one- and two-way tables. For example, a key table
might be change estimates for abundance by species and size
class. The statistical reports of the Forest Inventory and
Analysis (FIA) units of the USDA Forest Service are compilations of such tables. Although we regularly use such
tables, the forest survey and sampling literature describes
sampling designs and alternative estimators for a single
attribute of interest rather than for tables of them.
Software for generating these tables has not been widely
available. A FORTRAN program called FINSYS (Forest
Inventory System) was developed in the early 1960's and
was updated in the early 1980's (Born and Barnard 1983).
However, the batch processing mode was retained in the
revised version, so it has not been used widely. Survey
sampling software packages are available but have not been
well accepted by natural resource analysts largely because
they focus on human population surveys. General statistical packages also are widely available but using them to
develop resource tables and their variances requires considerable statistical and programming knowledge. Spreadsheets can be used to generate tables for simple sampling
designs, but variance estimators are much more difficult;
thus, resource managers likely would use the data without
knowing the reliability of the estimates.
1Paper presented at the North American Science Symposium: Toward a
Unified Framework for Inventorying and Monitoring Forest Ecosystem
Resources, Guadalajara, Mexico, November 1-6, 1998.
2Charles T. Scott is Project Leader, USDA Forest Service, Northeastern
Research Station, 359 Main Rd., Delaware, OH 43015-8640, U.S.A., Phone:
740-368-0101; Fax: 740-368-0152; e-mail: cscottlne_de@fs.fed.us
3Scott D. Klopfer is RS/GIS Coordinator, Virginia Polytechnic Institute
and State University, Fish and Wildlife Information Exchange located in
Blacksburg, VA. e-mail: sklopfe~.edu.
USDA Forest Service Proceedings RMRS-P-12. 1999
During a study with Mead Paper Corp. that combined
monitoring with geographic information systems (GIS), the
need for an interactive forest-resource analysis program
became clear. The program TabGen was written to meet
that need. In this paper, we describe TabGen and discuss
the estimation procedures it uses.
Estimation Methods
Typically, estimators are derived for a single attribute of
interest, such as total biomass of a forest. When estimating
tables, one may be interested in estimating biomass by
species group and diameter class. The estimation process
and the attribute of interest are the same in each cell but
different conditions are placed on the attribute of interest in
each table cell. For example, if a tree is not of the first species
and not in the first diameter class, its biomass is not summed
into the first table cell (Cochran 1977, p. 142-144).
With simple random sampling, plots (sampling units) are
located randomly across the population (sampling frame).
The simple average of the plot values is computed for each
cell along with the simple variance. The sample size is the
same for each cell because each plot contributes to the
estimate even if the only information is that the plot does
not belong to the cell. This is a common source of misunderstanding (Scott 1999).
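The point about sample size can be illustrated with a minimal Python sketch. It is not TabGen code; the field names (`species`, `dclass`, `biomass_ha`) and the data are hypothetical, and the finite population correction is ignored. Every plot contributes a value to every cell, with zero standing in for plots that have no trees meeting the cell's conditions.

```python
from statistics import mean, variance

def plot_cell_values(trees, plot_ids, row, col):
    """Per-plot sum of the attribute over trees in cell (row, col).

    Plots with no qualifying trees keep a value of 0.0, so the
    sample size is the same for every cell.
    """
    totals = {p: 0.0 for p in plot_ids}
    for t in trees:
        if t["species"] == row and t["dclass"] == col:
            totals[t["plot"]] += t["biomass_ha"]
    return [totals[p] for p in plot_ids]

def srs_cell_estimate(values):
    """Simple random sampling: cell mean and variance of that mean."""
    n = len(values)
    return mean(values), variance(values) / n

# Hypothetical data: four plots, per-hectare biomass already computed.
plot_ids = [1, 2, 3, 4]
trees = [
    {"plot": 1, "species": "oak", "dclass": 1, "biomass_ha": 12.0},
    {"plot": 1, "species": "pine", "dclass": 1, "biomass_ha": 5.0},
    {"plot": 2, "species": "oak", "dclass": 1, "biomass_ha": 8.0},
    {"plot": 3, "species": "oak", "dclass": 2, "biomass_ha": 20.0},
]
values = plot_cell_values(trees, plot_ids, "oak", 1)
cell_mean, cell_var = srs_cell_estimate(values)
```

Here plots 3 and 4 enter the oak, class 1 cell as zeros, so the divisor is 4 rather than 2.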
For stratified random sampling, the population is divided
into homogeneous areas of known size. Each stratum is
treated as a simple random sample, so the stratum means and
variances are computed in the same manner. To estimate
the overall mean, each stratum mean is multiplied by its
stratum weight (the proportion of the total area in that stratum) and
the products are summed. The variance of the overall mean is computed as
the sum of the variance of each stratum mean multiplied by the square of
its stratum weight (Cochran 1977).
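The stratified calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the TabGen implementation: the weights and plot values are hypothetical, the weights are taken as known and summing to 1, and the finite population correction is ignored.

```python
from statistics import mean, variance

def stratified_estimate(strata):
    """strata: list of (weight, plot_values) pairs, weights summing to 1.

    Returns the overall mean and the variance of the overall mean.
    """
    est = 0.0
    var = 0.0
    for w, values in strata:
        n_h = len(values)
        est += w * mean(values)                   # weighted stratum mean
        var += w ** 2 * variance(values) / n_h    # weighted variance of stratum mean
    return est, var

# Hypothetical cell values for two strata.
strata = [
    (0.6, [10.0, 14.0, 12.0]),
    (0.4, [2.0, 4.0, 0.0, 2.0]),
]
overall_mean, overall_var = stratified_estimate(strata)
```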
Double sampling for stratification is similar except that
stratum weights are estimated rather than known. In forestry applications, the weights typically are estimated using
a large sample of points on aerial photographs (Bickford
1952). Each point is classified into a stratum and weights
are estimated as the proportion of photo points falling in
each stratum. The estimation process is the same as for
stratified random sampling except that a variance term is
added to account for the errors in the stratum weights
(Cochran 1977). The formulas for all three designs for the
estimation of two-way tables are given in Scott (1999). When
stratum weights are known, the double sampling for stratification estimators simplify to the stratified random sampling estimators, which simplify to the simple random sampling estimators when there is only one stratum.
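A rough Python sketch of double sampling for stratification is given below. It uses a common large-sample approximation in which a between-strata term divided by the number of photo points is added to the stratified variance. The exact estimators used by TabGen are those in Scott (1999) and are not reproduced here, so the variance form, the function name, and the data should all be read as assumptions.

```python
from statistics import mean, variance

def double_sampling_estimate(strata):
    """strata: list of (photo_point_count, plot_values) pairs.

    Weights are estimated as the proportion of photo points per stratum.
    Returns (overall mean, approximate variance, within-strata term).
    """
    n_photo = sum(count for count, _ in strata)
    weights = [count / n_photo for count, _ in strata]
    means = [mean(values) for _, values in strata]
    est = sum(w * m for w, m in zip(weights, means))
    # Same term as in stratified random sampling.
    within = sum(
        w ** 2 * variance(values) / len(values)
        for w, (_, values) in zip(weights, strata)
    )
    # Assumed extra term for the error in the estimated weights.
    between = sum(w * (m - est) ** 2 for w, m in zip(weights, means)) / n_photo
    return est, within + between, within

# Hypothetical data: 1,000 photo points classified into two strata.
strata = [
    (600, [10.0, 14.0, 12.0]),
    (400, [2.0, 4.0, 0.0, 2.0]),
]
ds_mean, ds_var, within_term = double_sampling_estimate(strata)
```

As the number of photo points grows, the added term shrinks toward zero and the result approaches the stratified random sampling variance, consistent with the simplification described above.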
Program TabGen
TabGen produces two-way tables for any of the three
sampling designs described. Written in Visual Basic 2.0, the
program allows the user to create tables in a point-and-click
environment. The current version of TabGen is described
here, though work has begun on a more general version
that will handle FIA data.
Program Inputs
Currently, TabGen reads a flat file for each of the following: plot (sampling unit) data, site-index trees, regeneration,
overstory trees, and fields common to both regeneration
and overstory trees. Other files can be substituted, though a
hierarchy of one plot file and one or more files with multiple
observations per plot is assumed. Each file has a header
record containing a label for each data field. Each data
record is comma delimited and has the plot number, observed data, and any calculated data, e.g., trees per hectare,
biomass per tree, and diversity index. TabGen reads a
variable library (dictionary) that describes the variables
that are read from each file, whether they are continuous or
categorical, and, if the latter, the labels for each category.
The names of any per-unit-area fields must start with a "#"
symbol, so that TabGen knows which field to multiply all the
other fields on that record by to put them on a per-hectare
basis.
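The "#" convention can be illustrated with a short Python sketch. The file layout, the field names (`#tph`, `biomass_per_tree`), and the `categorical` argument, which stands in for the variable library, are hypothetical; the actual TabGen file formats are not documented here.

```python
import csv
import io

# Hypothetical comma-delimited tree file with a header record.
SAMPLE = """plot,species,#tph,biomass_per_tree
1,oak,25.0,0.4
1,pine,10.0,0.2
2,oak,50.0,0.1
"""

def read_per_hectare(text, categorical=("plot", "species")):
    """Multiply each continuous field by the '#' field on the same record."""
    reader = csv.DictReader(io.StringIO(text))
    expansion = [f for f in reader.fieldnames if f.startswith("#")]
    if len(expansion) != 1:
        raise ValueError("expected exactly one per-unit-area field")
    key = expansion[0]
    records = []
    for row in reader:
        factor = float(row[key])
        rec = {}
        for name, value in row.items():
            if name in categorical:
                rec[name] = value                   # labels are left as text
            elif name == key:
                rec[name] = factor                  # the expansion factor itself
            else:
                rec[name] = float(value) * factor   # now on a per-hectare basis
        records.append(rec)
    return records

records = read_per_hectare(SAMPLE)
```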
TabGen also reads a control file that contains the stratum
weights, the total area in the population, and a list of plots
that will be included in the analysis. This list allows the
analyst to include only the subset of plots of interest. GIS
software can be used independently to select a subarea of
interest, compute the areas by stratum and identify the plots
within, and then create the control file with zeros following
the plot numbers for those plots outside the area of interest.
All files are assumed to be in the same directory as the
control file.
Program Execution
Figure 1. Form to select table variables, to create and edit new ones,
and to add filters.
Once the input files are created, the first step in running
TabGen is to select the control file. The next step is to select
the variables to be presented in a table. TabGen reads the
variable library and presents the form in Figure 1. When a
file type is selected, the variable list is presented. The user
selects two categorical attributes for the row and column variables, and then a continuous attribute for the body of the
table. The design of the table is shown at the bottom right.
Once the table is set up, the user clicks the Generate Tables button,
and one file (plot) or two files (plot and one other) are read to
generate the tables (Fig. 2). The results can be viewed as:
1. The percent or total area in each table cell.
2. The mean or total of the attribute of interest in each
cell.
3. The ratio of the attribute total to the area estimate for
each cell.
4. The mean of individual observations (generally on a
per-tree basis).
5. The number of plots "falling" in each cell.
The 95% sampling errors (confidence limits) are computed
for each cell using the estimators for the design indicated in
Figure 1 (simple random sampling, stratified random sampling, or double sampling for stratification). The user hits a
button to display the sampling errors for the table. Hitting
the same button switches back to the estimates, so it is easy
for resource managers to determine the reliability of the
estimates.
Figure 2. Mean basal area (ft²/acre) by species group and size class.
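One plausible reading of a 95% sampling error is the half-width of a 95% confidence interval, sketched below in Python. The normal approximation, the function name, and the percent-of-estimate form are assumptions made for illustration; the text does not specify how TabGen scales or reports the value.

```python
from math import sqrt
from statistics import NormalDist

def sampling_error_95(estimate, var_of_estimate):
    """Half-width of an approximate 95% confidence interval.

    Returns the half-width in the units of the estimate and as a
    percentage of the estimate (NaN when the estimate is zero).
    """
    z = NormalDist().inv_cdf(0.975)   # about 1.96 under the normal approximation
    half_width = z * sqrt(var_of_estimate)
    percent = 100.0 * half_width / estimate if estimate != 0 else float("nan")
    return half_width, percent

# Hypothetical cell: mean of 5.0 with a variance of the mean of 9.0.
half_width, percent = sampling_error_95(5.0, 9.0)
```

With few plots per stratum, a t-value would give a wider interval than the normal value used here.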
Program Options
TabGen has several additional features that allow the
user to customize how the data are estimated and presented.
The Modify Values function allows the user to rescale table
values by dividing each by 1,000 to put the values on a per-thousand basis. The user also can input a value to multiply
or divide by to convert values to dollars, for example.
The user also can create categorical variables from continuous ones by assigning ranges to categories, such as
converting diameter to 5-cm classes. Existing categorical
variables can be transformed into new variables by collapsing the full set of categories into a smaller set by combining
classes, for example, when categories have too few values to
stand alone in a table. Any created variable can be edited
later, e.g., to change class labels.
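Both kinds of derived variable can be sketched in a few lines of Python. The function names, class labels, and species groupings below are hypothetical and serve only to illustrate the idea.

```python
def diameter_class(dbh_cm, width=5.0):
    """Assign a continuous diameter (cm) to a class of the given width."""
    lower = int(dbh_cm // width) * width
    return f"{lower:g}-{lower + width:g} cm"

# Hypothetical mapping that collapses species into two broader groups.
SPECIES_GROUP = {
    "oak": "hardwood",
    "maple": "hardwood",
    "pine": "softwood",
    "spruce": "softwood",
}

def collapse(category, mapping):
    """Map an existing category to its collapsed category."""
    return mapping[category]

classes = [diameter_class(d) for d in (12.3, 15.0, 4.9)]
groups = [collapse(s, SPECIES_GROUP) for s in ("oak", "pine")]
```

In this sketch a diameter that falls exactly on a class boundary, such as 15.0 cm, is assigned to the upper class.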
In small populations, some strata may have insufficient
numbers of plots to stand alone. TabGen allows the user to
collapse the strata in any fashion. To aid in choosing which
strata to collapse, stratum weights and sample sizes are
displayed for original and collapsed strata.
TabGen's powerful filtering feature allows users to create
tables that meet their needs. Filters can be defined to
exclude observations with specified values of one or more
attributes from a table. The user selects the
variable, defines ranges of its values, then checks which
ranges will be used in estimates. For example, the user can
select Species and then define two ranges, one for softwoods
and one for hardwoods. A filter labeled "Softwood Only"
would have only the first range checked. A second filter
labeled "Hardwood Only" would have the second range
checked. Multiple filters can be created for each variable
(Fig. 3). Filters can be used to create a series of two-way
tables as a way of forming three-way tables, such as one table
for poletimber, another for sawtimber, and one with both.
Filters also can be used in combination, for example, applying the "Softwood Only" and "Sawtimber Only" filters simultaneously to yield a table containing only softwood sawtimber trees. Filters can also be used to ignore plots with
missing or suspect values, such as computing average site
index only on those plots where site index was observed.
Filters give users tremendous power to create tables that
meet their needs.
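The way filters combine can be sketched in Python. The record fields and filter names are hypothetical, and the sketch covers only the selection of observations, not how TabGen carries filtered data through estimation.

```python
def make_filter(variable, allowed):
    """Return a test that keeps records whose value is in the checked set."""
    allowed = set(allowed)
    return lambda rec: rec[variable] in allowed

def apply_filters(records, filters):
    """Keep only the records that pass every active filter."""
    return [r for r in records if all(f(r) for f in filters)]

# Hypothetical tree records.
trees = [
    {"species_group": "softwood", "tree_size": "sawtimber"},
    {"species_group": "softwood", "tree_size": "poletimber"},
    {"species_group": "hardwood", "tree_size": "sawtimber"},
]

softwood_only = make_filter("species_group", ["softwood"])
sawtimber_only = make_filter("tree_size", ["sawtimber"])

softwood_sawtimber = apply_filters(trees, [softwood_only, sawtimber_only])
```

Applying each size filter in turn to the same row and column variables would yield the series of two-way tables described above.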
The primary output of TabGen is the on-screen display of
the tables and their sampling errors, but any table can be
printed to the default printer. Tables also can be sent to a
comma-delimited text file suitable for importing into spreadsheets or word processors for additional formatting or for
generating graphics. TabGen can generate a plot summary
file for the current attribute. The summary file can be used
in other analyses apart from TabGen or as an additional
input field by attaching it to the input plot file.
Summary
TabGen generates two-way tables using the correct estimators and produces 95% sampling errors for each cell in the
tables. Written in Visual Basic 2.0, the current version
allows users to create tables using all available data in a
point-and-click environment. TabGen has been tested on
Windows 9+ and NT. Tables can be refined through the use
of variable editing and filtering features. The program also
works in conjunction with a GIS to produce tables for
subareas of the population. Thus, it should prove a powerful
tool for quickly and easily exploring monitoring results.
Although written for a specific study with Mead Paper
Corp., TabGen is public domain software and can be modified for other applications. Copies of TabGen and sample
data sets are available from the author.
Acknowledgments
The development of TabGen was supported by Mead
Paper Corp. through a cooperative agreement with the
USDA Forest Service, Northeastern Research Station. We
thank our reviewers: Tom Frieswyk, Pat Miles, and Larry
Royer, all with the Forest Service.
Figure 3. Form to create, edit, delete, and apply filters for a variable.
The Sawtimber Only filter for Tree Size is displayed.
Literature Cited
Bickford, C. Allen. 1952. The sampling design used in the forest
survey of the Northeast. Journal of Forestry 50(4):290-293.
Born, David J.; Barnard, Joseph E. 1983. FINSYS-2: Subsystem
TABLE-2 and OUTPUT-2. General Technical Report NE-84.
Broomall, PA: U.S. Department of Agriculture, Forest Service,
Northeastern Forest Experiment Station. 133 p.
Cochran, William G. 1977. Sampling Techniques. John Wiley &
Sons, New York. 428 p.
Scott, Charles T. 1999. Estimating two-way tables based on forest
surveys. In Hansen, Mark H.; Burk, Thomas E., eds. Integrated
tools for natural resources inventories in the 21st century - an
international conference on the inventory and monitoring of
forested ecosystems. Gen. Tech. Rep. NC-. St. Paul, MN: U.S.
Department of Agriculture, Forest Service, North Central Research Station. In press.