Statistics

STRATIFICATION
Stratification is the process of dividing members of the population into homogeneous subgroups (strata) before sampling.
In general, stratification is used to gain efficiency. If variability lies mainly between strata rather than within them, fewer samples may be needed to reach the same precision.
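The efficiency argument can be checked numerically. The sketch below compares the variance of the estimated mean under simple random sampling with the variance under stratified sampling using proportional allocation. It ignores the finite population correction, and the plot volumes are hypothetical values chosen so that most of the variability is between the two strata.

```python
from statistics import pvariance

# Hypothetical plot volumes in two strata that differ mostly *between*
# strata, not within them.
strata = {
    "low": [10, 12, 11, 13, 9, 11],
    "high": [50, 52, 48, 51, 49, 50],
}


def srs_variance_of_mean(strata, n):
    """Variance of the sample mean under simple random sampling (no fpc)."""
    everything = [v for values in strata.values() for v in values]
    return pvariance(everything) / n


def stratified_variance_of_mean(strata, n):
    """Variance of the stratified mean with proportional allocation (no fpc).

    With n_h = n * W_h, sum(W_h**2 * S_h**2 / n_h) simplifies to
    sum(W_h * S_h**2) / n, so only the within-stratum variance remains.
    """
    total = sum(len(values) for values in strata.values())
    pooled_within = sum(
        len(values) / total * pvariance(values) for values in strata.values()
    )
    return pooled_within / n


n = 6
print(srs_variance_of_mean(strata, n))
print(stratified_variance_of_mean(strata, n))
```

With the same total sample size, the between-strata difference adds to the simple random sampling variance but drops out of the stratified variance. That is the gain in efficiency.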
BIMODAL DISTRIBUTIONS
A bimodal distribution is a great case for stratification: each mode can be sampled as its own stratum.
STRATIFICATION STRATEGIES
• Proportionate allocation uses the same sampling fraction in each stratum, so each stratum's share of the sample matches its share of the total population. For instance, if the population X consists of m in the male stratum and f in the female stratum (where m + f = X), then the relative size of the two samples (x1 = m/K males, x2 = f/K females, with the same K for both strata) should reflect this proportion.
• Optimum allocation (or disproportionate allocation): the sampling fraction in each stratum is proportional to the standard deviation of the variable within that stratum. Larger samples are taken in the strata with the greatest variability, which gives the smallest possible sampling variance for a given total sample size.
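The two allocation rules can be sketched as follows. This assumes the optimum allocation is the standard Neyman allocation, in which the sample size n_h is proportional to N_h × S_h (stratum size times within-stratum standard deviation). That is equivalent to a sampling fraction proportional to S_h. The stratum sizes and standard deviations below are hypothetical, and the results are left unrounded.

```python
def proportionate_allocation(stratum_sizes, n):
    """n_h = n * N_h / N: the same sampling fraction n / N in every stratum.

    This matches x_h = N_h / K with K = N / n.
    """
    total = sum(stratum_sizes.values())
    return {name: n * size / total for name, size in stratum_sizes.items()}


def optimum_allocation(strata, n):
    """Neyman (optimum) allocation: n_h proportional to N_h * S_h.

    strata maps a stratum name to (N_h, S_h): the stratum size and the
    standard deviation of the variable within that stratum.
    """
    weights = {name: size * sd for name, (size, sd) in strata.items()}
    total = sum(weights.values())
    return {name: n * w / total for name, w in weights.items()}


# Hypothetical stratum sizes (and standard deviations), total sample of 50
print(proportionate_allocation({"male": 600, "female": 400}, 50))
# {'male': 30.0, 'female': 20.0}
print(optimum_allocation({"pulp": (500, 40.0), "sawtimber": (500, 160.0)}, 50))
# {'pulp': 10.0, 'sawtimber': 40.0}
```

The two hypothetical strata have equal sizes. Under optimum allocation, the sawtimber stratum still receives four times as many samples because its standard deviation is four times larger. In the field, round the allocations to whole plots or trees.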
THE NATURE OF RISK
• Statements (or inferences) about things are based on the best information at hand.
• In a forestry context, statements about stand volumes are based on a sample rather than on all of the trees in the stand. There is risk in making any statement, particularly in today's litigious society.
• Decisions are made as well as possible, considering the probability that the statement is right or wrong and the cost of being wrong. Being proactive in finding potential problems in timber sales is essential to efficient cruising.
• A forestry example would be a stand where part of the volume is in low-value pulp and the rest is in high-value sawtimber. Good information about volumes by product is necessary, rather than just a total volume. Accurate representation of what is being sold is important (within reasonable cruising cost guidelines), in fairness to both the purchaser and the seller.
Unit refers to a cutting unit (a physical piece of ground).
There are two levels of stratification: the stratum and the sub-stratum, or sample group. Stratification groups similar things together into a population. In forestry, the typical unit of observation is a plot or a tree. Volume is typically the variable of interest, so the CV (coefficient of variation) is calculated on volume.
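As a reminder of what the CV measures, here is a minimal sketch. It assumes the usual sample definition: 100 × sample standard deviation ÷ mean, applied to per-plot volumes. The plot volumes are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: 100 * sample standard deviation / mean."""
    return 100 * stdev(values) / mean(values)


plot_volumes = [800, 1000, 1200, 900, 1100]  # hypothetical CF per acre
print(round(coefficient_of_variation(plot_volumes), 1))  # 15.8
```

Because the CV is relative to the mean, strata with very different average volumes can still be compared on how variable they are.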
UNITS AND STRATA
STRATA VERSUS ATTRIBUTE
• Stratification is used to group similar individuals together into populations. These populations are the basis for statistical calculations, and the error standards in the handbook are written for these strata.
• Attributes (categorical variables) can be used to aggregate total volumes in different ways. Averages and totals are available for these groupings, but it would violate statistical principles to post-stratify on them and calculate confidence intervals about those numbers. The user always has the option of creating strata from these attributes so that individuals are placed into those populations.
RULES FOR USING
1) Use only one sampling method per stratum (if point/plot cruising, only one BAF or plot size per stratum).
2) Change the frequency of the sampling method within a stratum by defining sub-strata, or sample groups.
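The first rule lends itself to an automated check on a cruise design. The sketch below assumes a hypothetical plan layout: one tuple per sample group with its stratum, method, BAF or plot size, and sampling frequency. It flags strata that use more than one method or BAF/plot size. Frequency is allowed to differ between sample groups, per rule 2.

```python
from collections import defaultdict


def check_one_method_per_stratum(plan):
    """Return strata that break rule 1 (more than one method or BAF/plot size).

    plan: iterable of hypothetical
    (stratum, sample_group, method, baf_or_plot_size, frequency) tuples.
    Frequency may vary by sample group (rule 2), so it is ignored here.
    """
    designs = defaultdict(set)
    for stratum, _group, method, size, _frequency in plan:
        designs[stratum].add((method, size))
    return sorted(s for s, d in designs.items() if len(d) > 1)


plan = [
    ("1", "1A", "point", 20, 1),         # BAF 20, every point
    ("1", "1B", "point", 20, 3),         # same BAF, different frequency: OK
    ("2", "2A", "point", 20, 1),
    ("2", "2B", "fixed plot", 0.1, 1),   # 1/10-acre plot in same stratum
]
print(check_one_method_per_stratum(plan))  # ['2']
```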
USER-DEFINED POPULATIONS
• Defining populations is the crux of cruise design. Before a cruise can be designed effectively, the prescriptions must be finalized, because the design depends on what information is needed in the prospectus. For example, if tree value differs greatly by size or species, you probably need to stratify on those characteristics. Once the populations are defined, the next task is to decide how best to sample each population.
SAMPLING AND ATTRIBUTES
• Each individual (a tree or a plot) must be identified by the population (stratum) it belongs to and by where it is located. An individual can belong to only one population and can be located in only one unit. The unit may be used as a stratification variable to place an individual into a population. Membership in a population determines whether and when the individual is a measured sample, and the rules for selecting samples vary with the cruise method. The unit may be recorded as a stratification variable, and this attribute is used to summarize volumes by unit. Other attributes, such as species and logging method, can also be recorded and used to summarize volumes.
• The key point is that averages or totals can be calculated by these other attributes, but sampling errors and confidence intervals cannot legitimately be calculated for an attribute that is not a stratification variable.
EXPANDING SAMPLES
• Each tree sampled represents other trees that were not sampled. Since sample selection takes place at the population level, the expanded volumes, sampling errors, and statistics are also at the population level.
• For sample tree cruises, volume is apportioned to each unit in proportion to the percent of trees (tally by species) in that unit.
• For area-based sampling, the population volume per acre is multiplied by the unit acreage. This, of course, results in all units within a stratum having the same species mix and volume per acre.
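Both proration rules can be sketched in a few lines. For sample tree cruises, this assumes one reading of "tally by species": each species' expanded stratum volume is split across units by that species' tally share. For area-based sampling, the stratum's per-acre volumes are multiplied by each unit's acres. All volumes, tallies, and acreages are hypothetical.

```python
def apportion_by_tally(volume_by_species, tally):
    """Split each species' stratum volume across units by that species' tally share.

    volume_by_species: {species: expanded stratum volume}
    tally: {unit: {species: trees tallied in that unit}}
    """
    result = {unit: {} for unit in tally}
    for species, volume in volume_by_species.items():
        total = sum(counts.get(species, 0) for counts in tally.values())
        for unit, counts in tally.items():
            share = counts.get(species, 0) / total if total else 0.0
            result[unit][species] = volume * share
    return result


def area_based_unit_volumes(volume_per_acre_by_species, unit_acres):
    """Every unit in the stratum gets the same per-acre volume and species mix."""
    return {
        unit: {sp: vpa * acres for sp, vpa in volume_per_acre_by_species.items()}
        for unit, acres in unit_acres.items()
    }


print(apportion_by_tally(
    {"PP": 5000, "WF": 3000},
    {"unit 1": {"PP": 3, "WF": 3}, "unit 2": {"PP": 1, "WF": 3}},
))
print(area_based_unit_volumes({"PP": 600, "WF": 400}, {"unit 1": 40, "unit 2": 25}))
```

In the area-based output, both units carry the same 60/40 species split. Only their acreage differs.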
SIMPLE EXAMPLE
Two units, single stratum. Calculate the expansion factor as the number of counted trees divided by the number sampled: 10 divided by 2 equals 5. Each measured tree represents 5 trees in the population: itself and 4 others that were not measured.
Since stratification was not by unit, the volume
needs to be prorated back to the unit. Six out of
the 10 trees observed were in unit 1, so 60% of
the PP, WF, and total volume would be assumed
to be in unit 1. Similarly, 4 out of 10 trees
observed were in unit 2, so 40% of the PP, WF,
and total volume would be assumed to be in unit
2.
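The arithmetic of this example can be sketched as follows. The counts (10 counted, 2 sampled, 6 trees in unit 1, 4 in unit 2) come from the example. The measured PP and WF volumes are hypothetical because the example's tree volumes are not given here.

```python
counted_trees = 10
sampled_trees = 2
expansion_factor = counted_trees / sampled_trees  # 5.0 population trees per measured tree

# Hypothetical measured volumes (cubic feet) of the sample trees, by species
measured = {"PP": 40.0, "WF": 30.0}
expanded = {sp: v * expansion_factor for sp, v in measured.items()}
expanded["total"] = sum(expanded.values())

# Unit was not a stratification variable, so prorate by trees observed per unit
trees_observed = {"unit 1": 6, "unit 2": 4}
total_observed = sum(trees_observed.values())
by_unit = {
    unit: {sp: vol * n / total_observed for sp, vol in expanded.items()}
    for unit, n in trees_observed.items()
}
print(expanded)
print(by_unit)
```

Unit 1 receives 60% of the PP, WF, and total volume, and unit 2 receives 40%.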
VOLUME EXPANSION SIDE EFFECTS
PRECAUTION FOR POINT/FIXED-PLOT SAMPLING
Take a stand with two components, where the S's could represent sawtimber or a species such as spruce, and the P's could represent pulp or pine. Two sample groups could be created, an 'S' and a 'P', and the S's and P's sampled separately on the points/plots. In the S sample group, the error is based on the plot volume of S's. The variability would therefore be very high, since some plots have all of their volume in S's while others have none. Not only that, but the presence of S's on a plot means there isn't as much room left for P's, and vice versa: the volumes of S's and P's are inversely correlated.
This high variability and the resulting high CVs call for more plots to meet the sampling error standard. However, adding more plots could drive the CVs even higher if the variability increases further. That would indicate that even more plots are needed, and so on. This type of stand needs to be sampled differently. Either sample minor or highly variable species or products with a separate method in a separate stratum, or split out the highly variable portions of units and sample them separately.
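The effect described above can be illustrated with a small set of hypothetical plot volumes in which S and P mostly replace each other on a plot. The sketch computes the sample CV of each sample group and of the plot total, plus the Pearson correlation between S and P volumes.

```python
from math import sqrt
from statistics import mean, stdev


def cv_percent(values):
    """Sample coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-plot volumes (CF/acre) for the S and P sample groups
s_vol = [900, 0, 700, 100, 1000, 300]
p_vol = [100, 1100, 300, 900, 0, 700]
total = [s + p for s, p in zip(s_vol, p_vol)]

print(round(cv_percent(s_vol)), round(cv_percent(p_vol)), round(cv_percent(total)))
print(round(pearson_r(s_vol, p_vol), 2))
```

Each sample group on its own is extremely variable, while the plot totals are nearly constant. The strong negative correlation is what makes the separate S and P errors so large.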
POINT/FIXED-PLOT EXPANSION AND PRORATION
Remember, if the unit is not used as a stratification variable, some assumptions are needed to allocate volumes back to the unit level. In this example, a single stratum composed of two units is established. Plots are then placed in the population and the volume per acre is calculated for each plot. Suppose this results in an average volume of 1000 CF per acre. The unit volumes are calculated by multiplying the volume per acre by the unit acres.
In prorating point/plot sample volumes, the number of points/plots is used at the stratum level to calculate volume per acre for the stratum, so the number of points/plots is not considered at the unit level. One precaution in using sample groups with point/plot sampling: don't use sample groups to try to get unit volumes. The point/plot count and expansion are at the stratum level. If the units in the previous example were sample groups, sample group 1 would have eight plots with volume and four without, and vice versa for sample group 2. This increases the variability and also produces strange-looking expanded volumes for units.
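A sketch of both points, using hypothetical plot volumes and acreages that average 1000 CF per acre, with 8 plots in unit 1 and 4 in unit 2. Unit volumes come from the stratum's volume per acre times unit acres. Treating units as sample groups while the expansion still runs over all 12 plots puts zero-volume plots into each group and inflates its CV.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Sample coefficient of variation, in percent."""
    return 100 * stdev(values) / mean(values)


# Hypothetical: 12 plots in one stratum, 8 in unit 1 and 4 in unit 2 (CF/acre)
plots = [900, 1100, 1000, 950, 1050, 1000, 980, 1020, 1000, 990, 1010, 1000]
in_unit_1 = [True] * 8 + [False] * 4

stratum_vpa = mean(plots)                   # stratum volume per acre
unit_acres = {"unit 1": 80, "unit 2": 40}   # hypothetical acreages
unit_volumes = {u: stratum_vpa * a for u, a in unit_acres.items()}

# Misusing units as sample groups: each group's plot volume is zero on
# every plot that fell in the other unit.
group_1 = [v if u1 else 0 for v, u1 in zip(plots, in_unit_1)]
group_2 = [0 if u1 else v for v, u1 in zip(plots, in_unit_1)]

print(unit_volumes)
print(round(cv_percent(plots), 1), round(cv_percent(group_1), 1), round(cv_percent(group_2), 1))
```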
INFERENTIAL STATISTICS
2 WAYS TO USE INFERENTIAL STATISTICS
1. Estimate a population parameter
2. Test a hypothesis
• Example: 10 forest stands were thinned. The average increase in CAI (current annual increment) was 25% following thinning.
• Null hypothesis: Forest thinning has no effect on tree growth rate.
• You want to show that thinning did have an effect, so you test the opposite viewpoint, the null hypothesis.
• We can test the hypothesis by comparing these stands to 10 similar stands that were not thinned.
PREDICTING THE POPULATION
• The sample mean of a continuous variable can be used to estimate the population mean.
• A confidence interval expresses how certain we are that the true population mean falls within a range of values.
• A 95% or 99% confidence interval is usually reported.
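A minimal sketch of a confidence interval for mean volume per acre. It uses the usual mean ± t × s/√n form, with the t value taken from a Student's t table (2.262 for 95% confidence with 9 degrees of freedom, i.e., 10 plots). It also reports the half-width as a percent of the mean, a common way of expressing sampling error in cruising. The plot values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(values, t_crit):
    """Two-sided CI for the population mean: mean +/- t * s / sqrt(n).

    t_crit must match the confidence level and n - 1 degrees of freedom,
    e.g. 2.262 for 95% with 9 df, from a Student's t table.
    """
    n = len(values)
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(n)
    return m - half_width, m + half_width


plots = [820, 1150, 960, 1040, 1210, 880, 990, 1100, 930, 920]  # hypothetical CF/acre
low, high = confidence_interval(plots, t_crit=2.262)
sampling_error_pct = 100 * (high - low) / 2 / mean(plots)
print(round(low), round(high), round(sampling_error_pct, 1))
```

A 99% interval would use a larger t value for the same degrees of freedom, and so would be wider.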
ONE- OR TWO-TAILED TESTS
• If you wanted to test the tree mortality associated with an insect infestation or other damaging event, you would set a prediction level: say, greater than 60 percent dead.
• After the event, you measure the number of dead trees and determine the actual percent mortality.
• You can do a one-tailed test of whether your prediction was accurate by testing the data against the null hypothesis that mortality is less than or equal to 60 percent.
• You can do a two-tailed test by setting the level to 60% and testing whether mortality was not equal to 60%. The null hypothesis (mortality equals 60%) would be rejected if the measured mortality was much larger or much smaller than 60%.
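The difference between the two tests can be sketched with a z-test for a proportion. This uses the normal approximation to the binomial, which is an assumption that works best when the number of trees examined is large. The counts (132 dead of 200 trees, 66% mortality) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mortality_tests(dead, total, p0=0.60):
    """z-test for a proportion using the normal approximation to the binomial.

    Returns (z, one-tailed p for H0: p <= p0, two-tailed p for H0: p == p0).
    """
    p_hat = dead / total
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / total)
    one_tailed = 1 - NormalDist().cdf(z)
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, one_tailed, two_tailed


z, p_one, p_two = mortality_tests(dead=132, total=200)  # hypothetical counts
print(round(z, 2), round(p_one, 3), round(p_two, 3))
```

With these numbers the one-tailed p-value falls below 0.05 but the two-tailed one does not. The same data can support the directional prediction while failing the "not equal to 60%" test at that level.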
COMPARING MEANS OF 2 POPULATIONS
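• T-test: the groups being compared can be categorical (e.g., thinned vs. unthinned), but the measured variable should be continuous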
• In Excel: =TTEST(array1, array2, tails, type), which returns the probability (p-value) for the test
  - Array1: first set of numbers to compare
  - Array2: second set of numbers to compare
  - Tails: 1 or 2 (one- or two-sided test)
  - Type: 1 equals paired, 2 equals two-sample equal variance, 3 equals two-sample unequal variance
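The three TTEST types correspond to three t statistics. The sketch below computes the statistic and degrees of freedom for each type; it does not reproduce Excel's p-value, which also needs the Student's t distribution. For p-values, SciPy's `scipy.stats.ttest_rel` (paired) and `scipy.stats.ttest_ind` (with `equal_var=True` or `False`) are one option. The growth data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def t_statistic(a, b, test_type):
    """t statistic and degrees of freedom mirroring Excel TTEST's type argument.

    test_type 1: paired; 2: two-sample equal variance (pooled);
    3: two-sample unequal variance (Welch, Satterthwaite df).
    """
    if test_type == 1:
        if len(a) != len(b):
            raise ValueError("paired test needs equal-length arrays")
        diffs = [x - y for x, y in zip(a, b)]
        n = len(diffs)
        return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    diff = mean(a) - mean(b)
    if test_type == 2:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        return diff / sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2
    if test_type == 3:
        se2 = va / na + vb / nb
        df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
        return diff / sqrt(se2), df
    raise ValueError("test_type must be 1, 2, or 3")


# Hypothetical growth increments for thinned and unthinned stands
thinned = [12, 15, 11, 14, 13]
unthinned = [10, 11, 9, 12, 8]
print(t_statistic(thinned, unthinned, 2))
print(t_statistic(thinned, unthinned, 3))
```

With equal sample sizes, types 2 and 3 give the same t statistic and differ only in degrees of freedom. Here the variances are also equal, so the degrees of freedom match too.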