The Generalized Random Tessellation Stratified Sampling Design for Selecting Spatially-Balanced Samples

Don L. Stevens, Jr.
Department of Statistics
Oregon State University
Monitoring Science & Technology Symposium
September 20 - 24, 2004
Denver, Colorado
R82-9096-01
This presentation was developed under STAR Research Assistance Agreement No. CR829096-01 Program on Designs and Models for Aquatic Resource Surveys awarded by the
U.S. Environmental Protection Agency to Oregon State University. It has not been
subjected to the Agency's review and therefore does not necessarily reflect the views of
the Agency, and no official endorsement should be inferred.
Historical Context
• GRTS design evolved from EMAP
work on global tessellations in the
early 1990s
– Scott Overton, Denis White, Jon
Kimerling developed EMAP’s
triangular grid + hexagonal tessellation
Historical Context
• EMAP began with a triangular grid & hexagonal
tessellation
– Expected to intensify grid as needed
– Triangular grid has several advantages
• More compact than square grid
• More subdivision factors
– Became clear that basic concept did not have enough
flexibility to accommodate the characteristics of
environmental resource sampling
Environmental Resource
Populations
• Point-like
– Finite population of discrete units, e.g., small- to
medium-sized lakes
• Linear
– Width is very small relative to length, e.g., streams or
riparian vegetation belts
• Extensive
– Covers large area in a more or less continuous and
connected fashion, e.g., a large estuary
Environmental Resource
Populations
• Tobler's First Law of Geography: Things that are close
together in space tend to have more similar properties than
things that are far apart.
OR
• Spatial correlation functions tend to decrease with distance
Sampling Environmental
Resource Populations
• Environmental Resource Populations exist in a
spatial matrix
• Population elements close to one another tend to
be more similar than widely separated elements
• Good sampling designs tend to spread out the
sample points more or less regularly
• Simple random sampling tends to exhibit uneven
spatial patterns
[Figure: simple random sample of a domain with 3 subdomains; labels A 28, B 28, C 15]
Sampling Environmental
Resource Populations
• Patterned response (gradients, patches,
periodic responses)
• Variable inclusion probability
• 0-, 1-, and 2-dimensional populations (points,
lines, & areas)
• Pattern in population occurrence (density)
• Unreliable frame material
• Temporal panels often needed
Environmental Resource
Populations
Ecological importance, environmental stressor
levels, scientific interest, and political
importance are not uniform over the extent
of the resource
Desirable Properties of
Environmental Resource Samples
• (1) Accommodate varying spatial sample
intensity
• (2) Spread the sample points evenly and
regularly over the domain, subject to (1)
• (3) Allow augmentation of the sample
after-the-fact, while maintaining (2)
Desirable Properties of
Environmental Resource Samples
• (4) Accommodate varying population
spatial density for finite & linear
populations, subject to (1) & (2).
• (2) + (4) ⇒ Sample spatial pattern should
reflect the (finite or linear) population
spatial pattern
Sampling Environmental
Resource Populations
• Systematic sample has substantial
disadvantages
– Well known problems with periodic response
– Less well recognized problem: patch-like
response
[Figures: two panels with subdomain labels A 26, B 24, C 15 and A 32, B 20, C 16]
Sampling Environmental
Resource Populations
• Systematic sample has substantial disadvantages
– Well known problems with periodic response
– Less well recognized problem: patch-like response
– Difficult to apply to finite populations, e.g., lakes
– Limited flexibility to change sample point density
– Difficult to accommodate variable inclusion
probability or sample adjustment for frame errors
Sample point intensity can be
changed using nested grids
[Figure: nested-grid sample; labels A 26, B 88, C 15]
RANDOM-TESSELLATION
STRATIFIED (RTS) DESIGN
• Compromise between systematic & SRS that
resolves periodic/patchy response
• Cover the population domain with a grid
– Randomly located
– Regular (square or triangular)
– Spacing chosen to give required spatial resolution
– Tile the domain with equal-sized regular polygons
containing the grid points
– Select one sample point at random from each
tessellation polygon
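The RTS steps above can be sketched in a few lines. This is a minimal illustration, assuming the domain is the unit square and the tessellation is a square grid; the function name `rts_sample` and its parameters are hypothetical and do not come from the presentation.

```python
import random

def rts_sample(nx, ny, rng=None):
    """Random-tessellation stratified sample of the unit square (sketch).

    A square grid with cell size 1/nx by 1/ny is shifted by a random
    offset, and one point is drawn uniformly from each grid cell.
    Points that fall outside the unit square are discarded.
    """
    rng = rng or random.Random()
    dx, dy = 1.0 / nx, 1.0 / ny
    # Random grid origin in (-dx, 0] x (-dy, 0]; one extra row and column
    # of cells guarantees the shifted grid still covers the square.
    ox, oy = -rng.random() * dx, -rng.random() * dy
    points = []
    for i in range(nx + 1):
        for j in range(ny + 1):
            x = ox + (i + rng.random()) * dx
            y = oy + (j + rng.random()) * dy
            if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
                points.append((x, y))
    return points

sample = rts_sample(4, 4, random.Random(1))
```

Because boundary cells are only partly inside the domain, the realized sample size varies from draw to draw; its expectation is nx * ny.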
RANDOM-TESSELLATION
STRATIFIED (RTS) DESIGN
• Solves some of systematic sample problems
– Non-zero pairwise inclusion probability
– Alignment with geographic features of
population
– Lets points get close together with low
probability
RTS DESIGN
• Does not resolve systematic sample
difficulties with
– Variable probability
– Finite & linear populations
– Pattern in population occurrence (density)
– Unreliable frame material
– Limited ability to change density
Generalized Random-Tessellation
Stratified (GRTS) Design
• Conceptual structure:
– Population indexed by points contained within a region R
– Have inclusion probability p(s) defined on R
– Select a sample by picking points
• Finite: points represent units; p(s) is the usual inclusion probability
• Linear: points on the lines; p(s) is a density: # sample points / unit length
• Extensive: points are in region area; p(s) is a density: # sample points / unit area
GRTS Design
Mechanics
• Map R into first quadrant of unit square, & add a
random offset
• Subdivide unit square into “small” grid cells
– At least small enough so that total inclusion probability
for a cell (expected number of samples in the cell) is
less than 1
– Total inclusion probability for cell is sum or integral of
p(s) over the extent of the cell
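For a finite population, the cell-size condition can be checked directly. The sketch below assumes units are given as (x, y, p) triples already mapped into the unit square, and that cells come from repeated quadrant subdivision; the helper names `cell_totals` and `levels_needed` are illustrative, not from the presentation.

```python
from collections import defaultdict

def cell_totals(units, level):
    """Sum inclusion probabilities per grid cell at one subdivision level.

    `units` is a list of (x, y, p) with x, y in [0, 1) and p the inclusion
    probability of that unit; the grid has 2**level cells per side.
    """
    side = 2 ** level
    totals = defaultdict(float)
    for x, y, p in units:
        totals[(int(x * side), int(y * side))] += p
    return totals

def levels_needed(units, max_level=16):
    """Smallest subdivision level at which every cell total is below 1."""
    for level in range(max_level + 1):
        if max(cell_totals(units, level).values()) < 1.0:
            return level
    raise ValueError("cells still too heavy at max_level")

units = [(0.1, 0.1, 0.6), (0.2, 0.2, 0.6), (0.9, 0.9, 0.6)]
level = levels_needed(units)
```

In this toy population the two units near the origin share a cell until the grid is fine enough to separate them, which is what drives the required level.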
[Figure: population region image]
[Figure: population region image + random offset]
GRTS Design
Mechanics
• Order the cells so that some 2-dimensional
proximity relationships are preserved
– Can’t preserve everything, because a 1-1, onto,
continuous map from unit square to unit
interval is impossible
– Can get 1-1, onto, & measurable, which is
good enough
– GRTS uses a quadrant-recursive function,
similar to the space-filling curve developed by
Giuseppe Peano in 1890.
• Assign each cell an address corresponding to the order of subdivision
• The address of the shaded quadrant is 0.213
• Order the cells following the address order
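Cell addressing can be illustrated as follows. The digit-to-quadrant convention used here (0 = lower-left, 1 = upper-left, 2 = lower-right, 3 = upper-right) is an assumption made for the sketch; the slides do not state which convention the figure uses, and the function names are hypothetical.

```python
def quadrant_address(x, y, levels):
    """Base-4 address digits of the cell containing (x, y) in the unit square.

    Assumed convention (not from the slides): at each split,
    digit = 2 * (x in upper half) + (y in upper half).
    """
    digits = []
    for _ in range(levels):
        x, y = 2.0 * x, 2.0 * y
        ix, iy = int(x), int(y)     # 0 or 1: which half along each axis
        digits.append(2 * ix + iy)
        x, y = x - ix, y - iy       # rescale to the chosen quadrant
    return digits

def address_to_fraction(digits):
    """Read the digits as the base-4 fraction 0.d1 d2 d3 ..."""
    return sum(d / 4.0 ** (k + 1) for k, d in enumerate(digits))

digits = quadrant_address(0.75, 0.25, 2)    # [2, 3]
position = address_to_fraction(digits)      # 2/4 + 3/16 = 0.6875
```

Sorting cells by `address_to_fraction` places all cells of one quadrant next to each other at every level, which is the proximity property the ordering is meant to keep.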
GRTS Design
Mechanics
• If we carry the process to the limit, letting
the grid cell size → 0, the result is a
quadrant-recursive function, that is, a
function that maps the unit square onto the
unit interval such that the image of every
quadrant is an interval.
• Apply a restricted randomization that
preserves “quadrant recursiveness”
HIERARCHICAL
RANDOMIZATION
Each cell address is a base 4 fraction, that is, $t = 0.t_1 t_2 t_3 \ldots$,
where each digit $t_i$ is either a 0, 1, 2, or 3. A function $h_p$ is a
hierarchical permutation if
$$h_p(t) = 0.\,p_1(t_1)\; p_2^{t_1}(t_2)\; p_3^{t_1 t_2}(t_3) \ldots$$
where $p_n^{t_1 t_2 \cdots t_{n-1}}(\cdot)$ is a possibly distinct permutation of
$\{0,1,2,3\}$ for each unique combination of digits
$t_1, t_2, \ldots, t_{n-1}$.
HIERARCHICAL
RANDOMIZATION
• If the permutations that define hp() are chosen at
random and independently from the set of all
possible permutations, we call hp() a hierarchical
randomization function, and the process of
applying hp() hierarchical randomization.
• Compose the basic q-r map with a hierarchical
randomization function
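A hierarchical randomization function can be sketched as below: one random permutation of {0, 1, 2, 3} is drawn for each distinct prefix of digits. Addresses are represented as lists of base-4 digits, and the factory name `make_hierarchical_randomization` is a hypothetical choice for this illustration.

```python
import random

def make_hierarchical_randomization(rng=None):
    """Return a function applying one random hierarchical permutation.

    A separate random permutation of {0, 1, 2, 3} is drawn lazily for each
    distinct digit prefix and cached, so two addresses that share a prefix
    are permuted identically on that prefix.
    """
    rng = rng or random.Random()
    perms = {}  # digit prefix (tuple) -> permutation (list of 4 digits)

    def h(digits):
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])
            if prefix not in perms:
                perm = [0, 1, 2, 3]
                rng.shuffle(perm)
                perms[prefix] = perm
            out.append(perms[prefix][d])
        return out

    return h

h = make_hierarchical_randomization(random.Random(0))
randomized = h([2, 1, 3])
```

Because the permutation applied at digit n depends only on the first n - 1 digits, cells in the same quadrant stay together after randomization; only their order within the quadrant changes.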
[Figure: 4 × 4 grids of the unit square (axes x, y) with cells numbered 1 to 16]
GRTS Design
Mechanics
• The result is a random order of the “small”
grid cells such that
– All grid cells in the same quadrant have
consecutive order positions
• But will be randomly ordered within those positions
– This holds for all quadrant levels
• This induces a random ordering of
population elements
GRTS Design
Mechanics
• Assign each grid cell a length equal to its total
inclusion probability
• String the lengths in the random order
– Result is a line with length equal to target sample size
• Take systematic sample along line (random start +
unit interval)
• Map back to population using the inverse of the
randomized quadrant-recursive (q-r) function
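The selection step along the line can be sketched as follows, assuming the cell lengths (total inclusion probabilities, each below 1) are already arranged in the randomized order. The function name `systematic_along_line` is illustrative only.

```python
import random

def systematic_along_line(lengths, rng=None):
    """Pick cells by a unit-interval systematic sample along a line.

    `lengths` holds each cell's total inclusion probability (each below 1),
    in the randomized cell order. Returns the indices of selected cells.
    """
    rng = rng or random.Random()
    next_pick = rng.random()      # random start in [0, 1)
    selected, upper = [], 0.0
    for index, length in enumerate(lengths):
        upper += length           # right end of this cell's segment
        if next_pick < upper:     # a sample position falls in this segment
            selected.append(index)
            next_pick += 1.0      # advance by the unit sampling interval
    return selected

# Sixteen cells of length 0.25 give a line of length 4, so 4 cells are picked.
picked = systematic_along_line([0.25] * 16, random.Random(3))
```

Since every segment is shorter than the sampling interval, no cell can be selected twice, and a cell's chance of selection equals its length.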
GRTS Design
Mechanics
• Points will be in ‘hierarchical random
order’
• Re-ordering into ‘reverse hierarchical order’
gives the sample some very useful
features
Reverse Hierarchical Order
• Illustrate for 2-levels of addressing:
First 16 addresses as base-4 fractions
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33
Reversed digits
00 10 20 30
01 11 21 31
02 12 22 32
03 13 23 33
Reversed digits as base 10 numbers
0 4 8 12
1 5 9 13
2 6 10 14
3 7 11 15
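The digit reversal illustrated above is easy to compute. The sketch below assumes the sample points arrive in hierarchical order, so that position k stands for the k-th base-4 address; the function names are hypothetical.

```python
def reverse_hierarchical_rank(index, levels):
    """Reverse the base-4 digits of `index`, read as a `levels`-digit number."""
    rank = 0
    for _ in range(levels):
        index, digit = divmod(index, 4)
        rank = rank * 4 + digit
    return rank

def reverse_hierarchical_order(points, levels):
    """Re-order points given in hierarchical order by reversed-digit rank.

    Assumes len(points) <= 4**levels.
    """
    ranked = sorted(range(len(points)),
                    key=lambda k: reverse_hierarchical_rank(k, levels))
    return [points[k] for k in ranked]

ranks = [reverse_hierarchical_rank(k, 2) for k in range(16)]
# ranks == [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

After re-ordering, the first four points come from four different top-level quadrants, the next four again cover all four quadrants, and so on.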
SPATIAL PROPERTIES OF REVERSE
HIERARCHICAL ORDERED GRTS SAMPLE
• The complete sample is nearly regular, capturing
much of the potential efficiency of a systematic
sample without the potential flaws
• Any subsample consisting of a consecutive
subsequence is almost as regular as the full
sample; in particular, the subsequence
$S_k = \{s_1, s_2, \ldots, s_k\}$, for $k \le M$, is a spatially
well-balanced sample.
• Any consecutive sequence subsample, restricted to
the accessible domain, is a spatially well-balanced
sample of the accessible domain.
[Figure: inclusion probability density surface (axes X, Y, Z); region is (0,1) x (0,0.8)]
SPATIAL PROPERTIES OF REVERSE
HIERARCHICAL ORDERED GRTS SAMPLE
• Assess spatial balance by variance of size of
Voronoi polygons, compared to SRS sample of the
same size.
• Voronoi polygons for a set of points $\{s_1, s_2, \ldots, s_k\}$:
the ith polygon is the collection of points in the
domain that are closer to $s_i$ than to any other $s_j$ in
the set.
• Estimate variance by 1000 replications of a
sample of size 256 in unit square
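The Voronoi-based balance measure can be approximated without a computational geometry library. The sketch below estimates polygon areas by assigning lattice nodes to their nearest sample point, which is a coarse stand-in for an exact Voronoi computation; the function names and the lattice resolution are assumptions for illustration, and the sizes are far smaller than the 1000 replications of 256 points used in the presentation.

```python
import random

def voronoi_area_variance(points, grid=40):
    """Approximate the variance of Voronoi polygon areas in the unit square.

    Each node of a grid x grid lattice of cell centers is assigned to its
    nearest sample point; a polygon's area is its share of lattice nodes.
    """
    counts = [0] * len(points)
    for i in range(grid):
        for j in range(grid):
            gx, gy = (i + 0.5) / grid, (j + 0.5) / grid
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - gx) ** 2 + (points[k][1] - gy) ** 2,
            )
            counts[nearest] += 1
    areas = [c / (grid * grid) for c in counts]
    mean = sum(areas) / len(areas)
    return sum((a - mean) ** 2 for a in areas) / len(areas)

def srs_sample(n, rng):
    """Simple random sample of n points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

regular = [((a + 0.5) / 4, (b + 0.5) / 4) for a in range(4) for b in range(4)]
v_regular = voronoi_area_variance(regular)                      # perfectly even
v_srs = voronoi_area_variance(srs_sample(16, random.Random(2)))
```

A variance ratio for a design would then be the average of this statistic over many replicate samples from that design, divided by the same average for simple random samples of equal size.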
[Figure: Voronoi polygons; panels: Uniform Sample, GRTS Sample]
SPATIAL PROPERTIES OF REVERSE
HIERARCHICAL ORDERED GRTS SAMPLE
• Compare regularity as points are added one at a
time, following reverse hierarchical order under 4
scenarios:
– Complete, continuous domain
– Domains with “holes” excluding 20% of the area, modeling nonresponse/access refusal
• 20 randomly-located square holes, constant size
• 20 randomly-located square holes, increasing linearly in size
• 10 randomly-located square holes, increasing exponentially in
size
[Figure: inaccessible domain patterns (20% inaccessible); panels: Constant Size, Linear Increase, Exponential Increase]
[Figure: Voronoi polygons; panels: Uniform Sample, GRTS Sample]
[Figure: polygon area variance ratio (y-axis, 0.0 to 1.0) versus point density (x-axis, 0 to 250). Curves: continuous domain with no voids; constant polygon size, total perimeter = 8; linearly increasing polygon size, total perimeter = 7.6; exponentially increasing polygon size, total perimeter = 4.2]
20-point GRTS Sample
Four 20-point GRTS Panels
Five 20-point GRTS Panels
Five 20-point GRTS Panels
+ Special Study Area
Finite Population Example
Equi-probable GRTS Sample
GRTS Sample: Probability inversely
proportional to population density
[Figure panels: Inverse density; Equi-probable]