The Generalized Random Tessellation Stratified Sampling Design for Selecting Spatially-Balanced Samples

Don L. Stevens, Jr.
Department of Statistics
Oregon State University

Monitoring Science & Technology Symposium
September 20 - 24, 2004
Denver, Colorado

R82-9096-01
This presentation was developed under STAR Research Assistance Agreement No. CR829096-01 Program on Designs and Models for Aquatic Resource Surveys awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

Historical Context
• GRTS design evolved from EMAP work on global tessellations in the early 1990s
  – Scott Overton, Denis White, Jon Kimerling developed EMAP’s triangular grid + hexagonal tessellation

Historical Context
• EMAP began with a triangular grid & hexagonal tessellation
  – Expected to intensify grid as needed
  – Triangular grid has several advantages
    • More compact than square grid
    • More subdivision factors
  – It became clear that the basic concept did not have enough flexibility to accommodate the characteristics of environmental resource sampling

Environmental Resource Populations
• Point-like
  – Finite population of discrete units, e.g., small- to medium-sized lakes
• Linear
  – Width is very small relative to length, e.g., streams or riparian vegetation belts
• Extensive
  – Covers large area in a more or less continuous and connected fashion, e.g., a large estuary

Environmental Resource Populations
• Tobler's First Law of Geography: Things that are close together in space tend to have more similar properties than things that are far apart.
OR
• Spatial correlation functions tend to decrease with distance

Sampling Environmental Resource Populations
• Environmental Resource Populations exist in a spatial matrix
• Population elements close to one another tend to be more similar than widely separated elements
• Good sampling designs tend to spread out the sample points more or less regularly
• Simple random sampling tends to exhibit uneven spatial patterns

[Figure: Simple random sample of a domain with 3 subdomains; labels: A 28, B 28, C 15]

Sampling Environmental Resource Populations
• Patterned response (gradients, patches, periodic responses)
• Variable inclusion probability
• 0, 1, and 2 dimensional populations (points, lines, & areas)
• Pattern in population occurrence (density)
• Unreliable frame material
• Temporal panels often needed

Environmental Resource Populations
Ecological importance, environmental stressor levels, scientific interest, and political importance are not uniform over the extent of the resource

Desirable Properties of Environmental Resource Samples
• (1) Accommodate varying spatial sample intensity
• (2) Spread the sample points evenly and regularly over the domain, subject to (1)
• (3) Allow augmentation of the sample after-the-fact, while maintaining (2)

Desirable Properties of Environmental Resource Samples
• (4) Accommodate varying population spatial density for finite & linear populations, subject to (1) & (2).
• (2) + (4) Sample spatial pattern should reflect the (finite or linear) population spatial pattern

Sampling Environmental Resource Populations
• Systematic sample has substantial disadvantages
  – Well known problems with periodic response
  – Less well recognized problem: patch-like response
  – Difficult to apply to finite populations, e.g., lakes
  – Limited flexibility to change sample point density
  – Difficult to accommodate variable inclusion probability or sample adjustment for frame errors

[Figure: systematic samples of the 3-subdomain region; labels: A 26, B 24, C 15 and A 32, B 20, C 16]
[Figure: Sample point intensity can be changed using nested grids; labels: A 26, B 88, C 15]

RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN
• Compromise between systematic & SRS that resolves periodic/patchy response
• Cover the population domain with a grid
  – Randomly located
  – Regular (square or triangular)
  – Spacing chosen to give required spatial resolution
  – Tile the domain with equal-sized regular polygons containing the grid points
  – Select one sample point at random from each tessellation polygon

RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN
• Solves some of the systematic sample problems
  – Non-zero pairwise inclusion probability
  – Alignment with geographic features of population
  – Lets points get close together with low probability

RTS DESIGN
• Does not resolve systematic sample difficulties with
  – Variable probability
  – Finite & linear populations
  – Pattern in population occurrence (density)
  – Unreliable frame material
  – Limited ability to change density

Generalized Random-Tessellation Stratified (GRTS) Design
• Conceptual structure:
  – Population indexed by points contained within a region R
  – Have inclusion probability p(s) defined on R
  – Select a sample by picking points
    • Finite: points represent units; p(s) is the usual inclusion probability
    • Linear: points are on the lines; p(s) is a density: # sample points / unit length
    • Extensive: points are in the region area; p(s) is a density: # sample points / unit area

GRTS Design Mechanics
• Map R into first quadrant of unit square, & add a random offset
• Subdivide unit square into “small” grid cells
  – At least small enough so that total inclusion probability for a cell (expected number of samples in the cell) is less than 1
  – Total inclusion probability for a cell is the sum or integral of p(s) over the extent of the cell

[Figure: Population region image; both axes 0.0 to 1.0]
[Figure: Population region image + random offset; both axes 0.0 to 1.0]

GRTS Design Mechanics
• Order the cells so that some 2-dimensional proximity relationships are preserved
  – Can’t preserve everything, because a 1-1, onto, continuous map from unit square to unit interval is impossible
  – Can get 1-1, onto, & measurable, which is good enough
  – GRTS uses a quadrant-recursive function, similar to the space-filling curve developed by Giuseppe Peano in 1890.
• Assign each cell an address corresponding to the order of subdivision
• Order the cells following the address order

[Figure: recursively subdivided unit square; the address of the shaded quadrant is 0.213]

GRTS Design Mechanics
• If we carry the process to the limit, letting the grid cell size → 0, the result is a quadrant-recursive function, that is, a function that maps the unit square onto the unit interval such that the image of every quadrant is an interval.
• Apply a restricted randomization that preserves “quadrant recursiveness”

HIERARCHICAL RANDOMIZATION
Each cell address is a base-4 fraction, that is, $t = 0.t_1 t_2 t_3 \ldots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p$ is a hierarchical permutation if

$$h_p(t) = 0.\,p_1(t_1)\; p_2^{t_1}(t_2)\; p_3^{t_1 t_2}(t_3) \ldots$$

where $p_n^{t_1 t_2 \ldots t_{n-1}}(\cdot)$ is a possibly distinct permutation of {0,1,2,3} for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$.
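The addressing and hierarchical permutation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenter's implementation: the names `quadrant_address` and `HierarchicalRandomization` are hypothetical, and the digit convention (digit = 2 × x-bit + y-bit) is an assumption, since the slides do not state which quadrant receives which digit.

```python
import random


def quadrant_address(x, y, levels):
    """Return the base-4 address digits (t1, ..., tn) of the grid cell
    containing the point (x, y) in the unit square [0, 1) x [0, 1)."""
    digits = []
    for _ in range(levels):
        x, y = 2 * x, 2 * y
        xbit, ybit = int(x), int(y)      # which half of the current cell
        x, y = x - xbit, y - ybit        # rescale to the chosen quadrant
        digits.append(2 * xbit + ybit)   # ASSUMED digit convention
    return tuple(digits)


class HierarchicalRandomization:
    """A hierarchical permutation: digit n is permuted by a permutation
    that depends only on the preceding digits t1 ... t(n-1)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._perms = {}                 # prefix of digits -> permutation

    def __call__(self, digits):
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])
            if prefix not in self._perms:
                perm = [0, 1, 2, 3]
                self._rng.shuffle(perm)  # independent random permutation
                self._perms[prefix] = perm
            out.append(self._perms[prefix][d])
        return tuple(out)


# Usage: address the 16 cells of a 4 x 4 grid by their centers, then
# randomize. Sorting by the randomized address gives a cell order in
# which cells of the same quadrant stay consecutive.
centers = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
h = HierarchicalRandomization(seed=2004)
ordered = sorted(centers, key=lambda c: h(quadrant_address(c[0], c[1], 2)))
```

Because the permutation applied to each digit depends only on the earlier digits, two cells that share an address prefix (that is, lie in the same quadrant at some level) still share a prefix after randomization, which is the "quadrant recursiveness" the slides ask the randomization to preserve.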
HIERARCHICAL RANDOMIZATION
• If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function, and the process of applying $h_p(\cdot)$ hierarchical randomization.
• Compose the basic q-r map with a hierarchical randomization function

[Figure: panels showing orderings 1 to 16 of a 4 × 4 grid of cells on the unit square; axes x and y]

GRTS Design Mechanics
• The result is a random order of the “small” grid cells such that
  – All grid cells in the same quadrant have consecutive order positions
    • But will be randomly ordered within those positions
  – This holds for all quadrant levels
• This induces a random ordering of population elements

GRTS Design Mechanics
• Assign each grid cell a length equal to its total inclusion probability
• String the lengths in the random order
  – Result is a line with length equal to target sample size
• Take systematic sample along line (random start + unit interval)
• Map back to population using inverse random qr function

GRTS Design Mechanics
• Points will be in ‘hierarchical random order’
• Re-ordering into ‘reverse hierarchical order’ gives some very useful features to the sample

Reverse Hierarchical Order
• Illustrate for 2 levels of addressing:
  First 16 addresses as base-4 fractions: 00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33
  Reversed digits: 00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33
  Reversed digits as base-10 numbers: 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• The complete sample is nearly regular, capturing much of the potential efficiency of a systematic sample without the potential flaws
• Any subsample consisting of a consecutive subsequence is almost as regular as the full sample; in particular, the subsequence $S_k = \{s_1, s_2, \ldots, s_k\}$, for $k \le M$, is a spatially well-balanced sample.
• Any consecutive sequence subsample, restricted to the accessible domain, is a spatially well-balanced sample of the accessible domain.

[Figure sequence: sample points plotted on the unit square; both axes 0.0 to 1.0]
[Figure: Inclusion probability density surface; axes X, Y, Z; region is (0,1) × (0,0.8)]
[Figure: plot on the unit square; both axes labeled c(0, 1)]

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• Assess spatial balance by variance of size of Voronoi polygons, compared to SRS sample of the same size.
• Voronoi polygons for a set of points $\{s_1, s_2, \ldots, s_k\}$: the $i$th polygon is the collection of points in the domain that are closer to $s_i$ than to any other $s_j$ in the set.
• Estimate variance by 1000 replications of a sample of size 256 in unit square

[Figure: Voronoi Polygons; panels: Uniform Sample, GRTS Sample]

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE
• Compare regularity as points are added one at a time, following reverse hierarchical order, under 4 scenarios:
  – Complete, continuous domain
  – Domains with “holes” excluding 20%, modeling nonresponse/access refusal
    • 20 randomly-located square holes, constant size
    • 20 randomly-located square holes, increasing linearly in size
    • 10 randomly-located square holes, increasing exponentially in size

[Figure: Inaccessible Domain Patterns (20% Inaccessible); panels: Constant Size, Linear Increase, Exponential Increase]
[Figure: Voronoi Polygons; panels: Uniform Sample, GRTS Sample]
[Figure: polygon area variance ratio (0.0 to 1.0) versus point density (0 to 250); curves: Continuous domain with no voids; Constant polygon size, total perimeter = 8; Linearly increasing polygon size, total perimeter = 7.6; Exponentially increasing polygon size, total perimeter = 4.2]
[Figures: 20-point GRTS Sample; Four 20-point GRTS Panels; Five 20-point GRTS Panels; Five 20-point GRTS Panels + Special Study Area]

Finite Population Example
[Figure: panels: Equi-probable GRTS Sample; GRTS Sample: Probability inversely proportional to population density; labels: Inverse density, Equi-probable]
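The line-based selection and the reverse hierarchical re-ordering from the GRTS Design Mechanics slides can be illustrated with a short Python sketch. The function names are hypothetical, and two choices here are assumptions rather than anything stated in the slides: the sample positions are numbered from 0 in hierarchical random order, and the number of base-4 digits is taken as the smallest that can index every sample point.

```python
import random


def systematic_sample_on_line(lengths, rng=random):
    """Systematic sample (random start, unit spacing) along a line built
    by stringing together cell lengths (total inclusion probabilities).
    Returns the indices of the cells that the sample positions fall in."""
    target = rng.random()            # random start in [0, 1)
    picks, cum = [], 0.0
    for idx, length in enumerate(lengths):
        cum += length                # right end of this cell's segment
        while target < cum:          # a sample position lands in this cell
            picks.append(idx)
            target += 1.0            # next position, one unit further on
    return picks


def reverse_hierarchical_index(i, levels):
    """Write i with `levels` base-4 digits, reverse the digits, and
    return the result as an ordinary integer."""
    out = 0
    for _ in range(levels):
        out = 4 * out + i % 4
        i //= 4
    return out


def reverse_hierarchical_order(sample):
    """Re-order a sample given in hierarchical random order.
    ASSUMPTION: positions are numbered from 0 and indexed with the
    smallest number of base-4 digits that covers the whole sample."""
    levels = 0
    while 4 ** levels < len(sample):
        levels += 1
    positions = sorted(range(len(sample)),
                       key=lambda i: reverse_hierarchical_index(i, levels))
    return [sample[i] for i in positions]


# Reproduces the last row of the Reverse Hierarchical Order slide:
print([reverse_hierarchical_index(i, 2) for i in range(16)])
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

With 16 sample points, the first four points in reverse hierarchical order are positions 0, 4, 8, and 12 of the hierarchical random order, one from each top-level quadrant, which is why an initial subsequence of the re-ordered sample stays spread over the domain.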