Appropriate Sampling

Ann Abbott
Rocky Mountain Research Station
Moscow Forestry Sciences Laboratory
Outline






What is Appropriate Sampling
How do we do it
Questions to Ask
Sampling Designs
Sample Size
Northern Region Protocol
What is appropriate sampling?




Meets the objectives of the research
question
Representative of the population
Feasible
Cost effective
Appropriate Sampling is the
RESULT


Of answering a series of questions
The answers to the appropriate questions
lead naturally to the appropriate Sampling
Design and Data Analysis/Interpretation
Questions to Ask






Objectives of the Research
Population for inferences
Sampling Units
Translation of the objectives
Preliminary Information
Choice of Sampling Design
Questions to Ask





Determination of Sample Size
Auxiliary Variables
Randomization
Recording Results
Analysis
Stating the Objectives


Have the objectives of the investigation
been clearly and explicitly stated, along
with the reasons for undertaking it?
Have the objectives been translated into
precise questions that sample
determinations can be expected to
answer?
Defining the Population




Has the population about which inferences
are to be made been carefully defined?
What constraints are to be placed on the
population?
Are the units to be measured or counted
representative of the population?
If not, what changes must be made to
ensure representativeness?
Defining the Population


Is there a logical framework for the choice
of sample units from the defined
population?
If not, what steps can be taken to impose
a logical sampling frame?
Sampling Units

A successful sampling scheme involves
the selection of an appropriate sampling
unit
Quadrat
 Leaves of a plant
 Individual organism
 Belt transect
 Point

Sampling Units





Are the sampling units naturally defined?
If not, how will they be defined?
Is the number of sampling units finite?
If it is finite, is the total number of units in
the population large enough to ignore finite
sampling considerations?
Is the definition of the sampling units
appropriate to the objectives?
Choice of Sampling Unit

Must be the unit upon which you wish to
make inferences and estimates
Defined to be “nonoverlapping collections
of elements from the population that cover
the entire population”
Sampled without replacement
Choice of Sampling Unit

Point versus Area
Point samples allow inferences based on the
number of observations in the sample
 Inferences are made on means or
percentages from the sample observations
 Area samples are generally measured with
densities or percent of the area covered
 Inferences are made by extrapolating the
sample density to the entire area

Choice of Sampling Unit: Point vs
Area



Point samples are quicker, can potentially
give a more cost effective coverage of the
area
Area samples can yield more detailed
information but may be more time
consuming
Area sampling assumes that counts are
made without error
Translating the Objectives





What exactly is to be estimated or tested?
Are the required estimates proportions,
totals, means, totals or means over subpopulations, or something else?
Have blank data sheets been constructed?
What is the smallest subset of data from
which estimates are to be made?
What precision is required of the estimates
for the various subsets?
Preliminary Information




Is information about the population
available that may be helpful in designing
the sampling scheme?
Are estimates of the likely variability
available?
Is a pilot study feasible or desirable?
Are there any known factors that help
stratify the population?
Variability

The variation that is inherent in soils data
must be accounted for during the design
phase of a soil sampling plan, including
Sampling design
 Data collection procedures
 Analytical procedures
 Data Analysis



“One of the key characteristics of the soil
system is its extreme variability.” (Mason
1992)
Researchers have long been cautioned
about failing to consider the variability in
soil sampling when dealing with any study
of the soils system (e.g. Cline 1944).
Accounting for Variability



Ensuring that the sample adequately
covers the entire population
Reporting variability estimates along with
central tendency estimates
Reporting interval estimates

Use an interactive approach to balance
the data quality needs and resources with
designs that will either control variation,
stratify to reduce variation, or reduce the
influence of variation on the decision
process
Precision, Bias and Accuracy

Precision is a measure of the
reproducibility of measurements of a
particular soil condition or constituent


The statistical techniques seen in soil
sampling are designed to measure precision
and not accuracy
Bias is a systematic error that contributes
to the difference between the mean of a
large number of test results and an
accepted reference value.
Precision, Bias and Accuracy

Accuracy is the correctness of the
measurement and cannot be directly
measured: it reflects the combined effect of
precision and bias




Red dots are precise but biased
Blue dots are unbiased but imprecise
Yellow dots are biased and imprecise
Green dots are unbiased, precise
and therefore accurate
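The distinction between precision and bias can be sketched numerically. The slides do not give a formula, so the decomposition used here (mean squared error about a reference value equals the variance plus the squared bias) is a standard identity brought in for illustration. The function name and the example measurements are hypothetical.

```python
import statistics

def precision_bias_summary(measurements, reference):
    """Summarize repeated measurements against an accepted reference value."""
    mean = statistics.fmean(measurements)
    bias = mean - reference                        # systematic error
    variance = statistics.pvariance(measurements)  # spread about the mean (precision)
    # Mean squared error about the reference combines both components.
    mse = statistics.fmean([(m - reference) ** 2 for m in measurements])
    return {"bias": bias, "variance": variance, "mse": mse}

# Hypothetical repeated measurements with an accepted reference value of 8.0.
summary = precision_bias_summary([9.0, 10.0, 11.0], reference=8.0)
print(summary)
```

A tight cluster far from the reference (the red dots) has a small variance but a large bias, so its mean squared error is still large.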
Sampling Designs






Simple Random Sampling
Stratified Random Sampling
Systematic Random Sampling
Cluster Sampling
Other
Combinations
Sampling Designs

Can the population as defined be broken
into naturally occurring groups, where the
grouping variable affects the measured
variable(s)?
If it cannot, Simple Random Sampling or
Systematic Sampling can be effective
If it can, Stratified Random Sampling or
Cluster Sampling can be effective

Simple Random vs Systematic

Simple Random Sampling: If there is a “list”
(sampling frame) of all sampling units in
the population


Randomly selects from units on the list
Systematic Random Sampling: If there is
no sampling frame available but there is
an estimate of the total number of
sampling units

Randomly selects starting point
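Selecting from a sampling frame can be sketched as follows. This is a minimal illustration, assuming the frame is simply a list of numbered plots. The frame, the function name, and the seed are hypothetical. Recording the seed is one way to document the randomization procedure.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement."""
    rng = random.Random(seed)          # a recorded seed makes the draw reproducible
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: 200 numbered plots.
frame = range(1, 201)
plots = simple_random_sample(frame, n=10, seed=42)
print(sorted(plots))
```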
Simple Random Sampling



Used when there is inadequate
information for developing a conceptual
model for a site or for stratifying a site
Any sample in which the probabilities of
selection are known
Sampling units are chosen by using some
method using chance to determine
selection


Simple random sampling is the basis for
all probability sampling techniques and is
the point of reference from which
modifications to increase sampling
efficiency may be made
Alone, simple random sampling may not
give the desired precision
Simple Random Sampling

Advantages
Prior information about population is not
necessary
 Easy to perform, easy to analyze


Disadvantages
May not give desired precision
 Need a sampling frame

Computation

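Simple Random Sample-continuous variable

Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Variance: $\hat{V}(\bar{y}) = \frac{s^2}{n}$, $s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}$

Confidence Interval: $\bar{y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$

Sample Size: $n = \frac{t_{\alpha/2}^{2}\, s^{2}}{w^{2}}$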
Computation

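Simple Random Sample-Binomial variable

Proportion: $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} y_i$

Variance: $\hat{V}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}$

Confidence Interval: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Sample Size: $n = \frac{z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{w^{2}}$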
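The binomial formulas above can be sketched the same way. As before, w is assumed to be the desired half-width of the confidence interval. The normal critical value comes from Python's standard library. The function names and the example data are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided standard normal critical value, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_summary(y, confidence=0.95):
    """Proportion, its estimated variance, and a normal-approximation interval."""
    n = len(y)
    p_hat = sum(y) / n
    var_p = p_hat * (1 - p_hat) / n
    half_width = z_value(confidence) * math.sqrt(var_p)
    return p_hat, var_p, (p_hat - half_width, p_hat + half_width)

def proportion_sample_size(p_hat, w, confidence=0.95):
    """Sample size so the interval half-width is about w (assumed meaning of w)."""
    return math.ceil(z_value(confidence) ** 2 * p_hat * (1 - p_hat) / w ** 2)

# Hypothetical indicator data: 6 of 30 points show the characteristic.
points = [1] * 6 + [0] * 24
print(proportion_summary(points))
print(proportion_sample_size(0.2, w=0.05))
```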
Systematic Random Sampling




Attempt to provide better coverage of the study
area or population than that provided by a
simple random sample or a stratified random
sample
Is a simple random sample based on spatial
distribution over the site
Does not require a complete list of sampling
units
Can give better coverage than a simple random
sample
Systematic Random Sampling





Requires some estimate of the total
number of sampling units in the population
Required sample size must be calculated
Determine sampling interval between units
Randomly select starting point
Transect sampling is a version of
Systematic Random Sampling
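The steps listed above (estimate the population size, fix the sample size, determine the interval, pick a random start) can be sketched as follows. This is a minimal illustration, assuming units are numbered 0 to N-1 along a transect or list. The function name and the numbers are hypothetical.

```python
import random

def systematic_sample(population_size, n, seed=None):
    """Return n unit indices at a fixed interval from a random starting point."""
    if not 0 < n <= population_size:
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    interval = population_size // n    # sampling interval between units
    start = rng.randrange(interval)    # random start within the first interval
    return [start + i * interval for i in range(n)]

# Hypothetical transect with about 100 possible sampling positions.
print(systematic_sample(100, 10, seed=1))
```

Only the starting point is random. Every later unit follows from the interval, which is why periodicity in the population matters.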
Systematic Random Sampling

Collects samples in a regular pattern over
the area in the investigation
Grid
 Line Transect


Orientation of grid or transect starting point
should be randomly selected
Systematic Random Sampling

Considerations


Sample size and population size estimates
Some knowledge of the population to avoid
sampling along periodicities
Stratified vs Cluster Sampling



Used when the population can be broken
into naturally occurring groups or
segments
Stratified Random Sampling: when there is
more variability among groups than within
groups
Cluster Sampling: when there is more
variability within groups than among groups
Stratified Random Sampling


Prior knowledge of the sampling area and
information obtained from background
data may be used to reduce the number of
observations necessary to attain specified
precision
Goal is to increase precision and control
sources of variability in the data
Stratified Random Sampling


Variability between strata must be larger
than variability within strata for any benefit to
be seen
Sampling within each stratum is done with
a Simple Random Sample
Stratified Random Sampling

Advantages
Gives estimates for subgroups
 Can be more precise than Simple Random
Sampling
 Can be more convenient to implement


Disadvantages
Requires prior information about the
population
 More complicated computation

Computation

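Stratified Random Sample-continuous variable

Mean: $\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i \bar{y}_i$

Variance: $\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{s_i^2}{n_i}$

Confidence Interval: $\bar{y}_{st} \pm t_{\alpha/2}\,\frac{s_{st}}{\sqrt{n_{st}}}$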
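The stratified mean and variance above can be sketched as follows. The sketch follows the slide formulas as written, with no finite population correction. The data layout (a list of stratum size and observation pairs), the function name, and the example values are hypothetical.

```python
import statistics

def stratified_estimate(strata):
    """strata: list of (N_i, observations) pairs, one per stratum.

    Returns the stratified mean and its estimated variance.
    """
    N = sum(N_i for N_i, _ in strata)
    # Weight each stratum mean by the stratum size N_i.
    mean_st = sum(N_i * statistics.fmean(y) for N_i, y in strata) / N
    # Sum N_i^2 * s_i^2 / n_i over strata, then divide by N^2.
    var_st = sum(N_i ** 2 * statistics.variance(y) / len(y)
                 for N_i, y in strata) / N ** 2
    return mean_st, var_st

# Hypothetical two-stratum example: 60 and 40 units, three observations each.
strata = [(60, [1.0, 2.0, 3.0]), (40, [10.0, 12.0, 14.0])]
print(stratified_estimate(strata))
```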
Stratified Random Sample

Sample Size Calculation
Requires information about the relationship
between the individuals among strata
 Can be calculated by weighting strata
 Can allocate sampling based on minimizing
the variance for a fixed cost
 Other ways to allocate sampling among strata
(optimal, Neyman)
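Two of the allocation options can be sketched briefly. Proportional allocation weights each stratum by its size. Neyman allocation weights each stratum by its size times its standard deviation, assuming equal sampling cost per unit in every stratum. The function names and the example values are hypothetical, and the results are left unrounded.

```python
def proportional_allocation(n, sizes):
    """Allocate n observations in proportion to stratum sizes N_i."""
    N = sum(sizes)
    return [n * N_i / N for N_i in sizes]

def neyman_allocation(n, sizes, sds):
    """Allocate n observations in proportion to N_i * S_i (equal costs assumed)."""
    weights = [N_i * s_i for N_i, s_i in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: sizes 600 and 400, standard deviations 1.0 and 4.0.
print(proportional_allocation(100, [600, 400]))
print(neyman_allocation(100, [600, 400], [1.0, 4.0]))
```

In this example the smaller but more variable stratum receives most of the observations under Neyman allocation, which is the point of allocating on variability rather than size alone.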

Post Stratification


Can be used when stratification is
appropriate for some key variable, but
cannot be done until after the sample is
selected
Often appropriate when a simple random
sample is not properly balanced according
to major groupings
Post Stratification


Mean: $\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i \bar{y}_i$

Variance: $\hat{V}_p(\bar{y}_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{L} W_i s_i^2 + \frac{1}{n^2}\sum_{i=1}^{L} (1 - W_i)\, s_i^2$
Cluster Sampling



Used when there is more variability within
groups than among groups
Groups are randomly sampled
Units within groups are sampled
Can sample every element within the group
 Can take a second random sample within the
group

Questions to Ask in Choosing a
Sampling Design



If there is no information on population
groupings, will simple random or
systematic random sampling better meet
the objectives?
Is Simple Random Sampling likely to be
effective?
If not, have the reasons for not using
simple random sampling been clearly
stated?
Questions to Ask in Choosing a
Sampling Design



If Systematic Random Sampling is
chosen, what interval will separate units?
Is there a likelihood that the interval will
coincide with periodicity in the data?
If so, what steps will be taken to avoid the
resulting bias in the estimates?
Questions to Ask in Choosing a
Sampling Design




If there is a grouping in the population, will
stratification improve the precision of the
estimates?
Has the efficiency of the stratification been
calculated?
What is the basis of the stratification?
How will the sampling units be allocated?
Questions to Ask in Choosing a
Sampling Design


If there is a grouping in the population, is
there an advantage to cluster sampling?
Has the efficiency been calculated?
Sample Size

Calculated based on variability (standard
deviation) within the population and desired
precision of the estimate (confidence level)

Simple Random Sample and Systematic Random Sample: $n = \frac{t_{\alpha/2}^{2}\, s^{2}}{w^{2}}$

Stratified Random Sample (complicated) but still
needs variance
Sample Size

Specific sampling design considerations
Systematic: is the sample size required to
uniformly cover the population consistent with
the expected precision?
 Stratified: has the efficiency of the
stratification been tested in reducing the
sample size or in obtaining the largest number
of observations from the part of the population
of greatest interest?

Sample Size

Sample design considerations, continued
Multistage: has the efficiency of various
combinations of sample units at different
stages been tested?
 Cluster: has the efficiency of various size
clusters been tested?

Sample Size

Cost considerations
Must the number of observations be modified
to account for variation in cost in different
parts of the sampling procedure?
 If so, can the design be improved for better
cost efficiency?

Randomization



Have the sampling units been selected by
an explicit randomization procedure?
Has the randomization procedure been
documented?
Were any constraints correctly applied?
Sample Design Example


Northern Region Soil Monitoring Protocol
Goal: Develop an easy, cost effective and
statistically defensible monitoring protocol
for disturbance

Stating the objectives: Characterize the
activity area in terms of management related
disturbance
Northern Region Protocol



Defining the population: All possible
‘points’ within the Activity Area
Sampling units defined as ‘points’
Infinite number of possible ‘points’ in the
population so finite sample correction
factors do not need to be used
Northern Region Protocol

Sample Design
Stratification may be desirable but variability
information is unavailable
 Simple Random Sampling may not give the
appropriate coverage
 Systematic Random Sampling (Transect) was
chosen to give the best coverage of the area

Northern Region Protocol
Translating the objectives
What exactly will be measured or tested:






Forest floor depth
Forest floor missing
Topsoil displacement
Mixed topsoil/subsoil
Erosion
Rutting (3 depths)




Burning (light, moderate,
severe)
Compaction (3 depths)
Platy/massive structure (3
depths)
5 forest floor variables
Northern Region Protocol

Translating the objectives: Blank data
tables
Northern Region Protocol

Translating the objectives: what exactly is
to be estimated or tested?


What proportion of points in the sample have
the characteristic of the indicator variable?
What is the variability associated with the
proportion?
Northern Region Protocol

Translating the objectives: What is the
required precision of the estimates?


Confidence intervals within ± 5% of the
estimate
Confidence levels are determined by the line
officer, with a choice from 70% to 95%
Northern Region Protocol

What preliminary information is available
about the activity area?
Approximate size and shape
 Harvest history
 Variability estimates generally unknown
 A pilot would be best
 Stratification potential exists

Northern Region Protocol

Problem:



Variability estimates are unavailable
Pilot studies are not feasible due to time and
cost constraints
Statistically valid sample sizes are required
Sequential Sampling



An alternative approach to sampling in
which the sample size is not fixed in
advance
Observations are collected individually or
in small batches
After each observation or batch, the data
are examined to determine whether or not
a decision may be made from the
accumulated data
Sequential Sampling


Combines data collection and data
analysis into a single process or sampling
plan
Can considerably reduce the sample size
requirements and data processing
overheads
Sequential Sampling


Best used in situations where classification
of a population is useful and where the
emphasis is on decision making
In the simplest and most frequently used
form, it is used to make binary
classifications but can be extended into
other applications
Northern Region Protocol


Use a combination of sequential and
systematic random sampling to obtain
variability information for sample size
calculation at the same sampling visit as
the full data collection trip
First 30 observations are used to calculate
initial sample size, then sample size is
continually updated as sampling continues
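The updating rule described above can be sketched as a loop. This is an illustration under stated assumptions, not the protocol's actual implementation. It assumes a binomial indicator, a target half-width w of 0.05, and a normal-approximation sample size formula. The function names, the observe callback, and the n_max cap are hypothetical.

```python
import math
from itertools import cycle
from statistics import NormalDist

def required_n(p_hat, w, confidence):
    """Sample size for a proportion with half-width w at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / w ** 2)

def sequential_sample(observe, w=0.05, confidence=0.80, n_initial=30, n_max=1000):
    """Collect 0/1 observations until the running sample size is sufficient."""
    data = [observe() for _ in range(n_initial)]   # first batch fixes the initial estimate
    while len(data) < n_max:
        p_hat = sum(data) / len(data)
        if len(data) >= required_n(p_hat, w, confidence):
            break                                   # enough points for the target precision
        data.append(observe())                      # otherwise take the next transect point
    return data

# Hypothetical field data: one disturbed point in every four.
stream = cycle([1, 0, 0, 0])
data = sequential_sample(lambda: next(stream))
print(len(data), sum(data) / len(data))
```

If no disturbance is seen in the first 30 points the estimated variance is zero and this simple rule stops immediately, so a real protocol would need a guard for that case.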
Northern Region Protocol



Indicator variables are binomial (0,1)
By the usual rule of thumb, the sampling distribution
of a binomial proportion is approximately normal when n ≥ 30
Attractive for sampling since the maximum
variability can be computed
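The maximum variability of a binomial indicator occurs at p = 0.5, where p(1 - p) = 0.25. That gives an upper bound on the sample size before any data are collected. The sketch below applies the normal-approximation sample size formula at p = 0.5, assuming w is the interval half-width (0.05 for ± 5%). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def worst_case_n(w, confidence):
    """Largest sample size a binomial indicator can require for half-width w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * 0.25 / w ** 2)   # p(1 - p) peaks at 0.25 when p = 0.5

# Confidence levels spanning the range offered to the line officer.
for level in (0.70, 0.80, 0.90, 0.95):
    print(level, worst_case_n(0.05, level))
```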
Northern Region Protocol


When sampling is complete for the activity
area, the estimates and confidence
intervals are computed
Protocol allows field crews to sample an
activity area with a statistically valid
sample size in one visit