Areal data
Reminder about types of data
Geostatistical data:
Z(s) exists everywhere, varies continuously
Can accommodate sudden changes by a model for the mean
E.g., soil pH, two soil types with different pH
model as mean depends on soil type
error = small scale variation, assumed continuous
Areal data
Spatial data measured and reported by regions
Can be abrupt change at region boundaries
Unlike geostat data, “location” is arbitrary within region
© Philip M. Dixon (Iowa State Univ.)
Spatial Data Analysis - Part 8
Spring 2016
Areal data
Areal data often arises by aggregation
number of disease cases by county
but doesn’t have to: US states, coded by party of state Governor
Distinction between Geostat and Areal can be blurred
SST data: given to you as geostatistical data
related information: only available as single value per spatial grid cell (e.g., 1° × 1° area)
To me: relative scale, size of measurement to span of study area
Distinction between Areal and Point pattern can be blurred
Disease cases: will treat as areal data
but point pattern if have individual locations (household address)
Again, relative scale matters. What is the location of an individual?
Areal Data
Goals:
Show data
Describe spatial dependence
Predict values
Less common to predict at new locations
Assume observations are “noisy”,
i.e., measurement error at each location
Want to smooth (reduce amount of noise),
i.e., predict “true” values at observed locations
Some examples in pictures
[Figure: Infant mortality, Auckland NZ districts (legend scale 0 to 40)]
[Figure: Number of plant species in 20 cm x 20 cm patches of alpine tundra (legend scale 2 to 12)]
[Figure: Wheat uniformity trial (legend scale 3.0 to 5.0)]
[Figure: Incidence of flu in Iowa (legend scale 0.00 to 0.10)]
Spatial smoothing
Made up map of # cases of flu in central Iowa in January 2013
expressed as # cases / person
Rate in February expected to be similar to that observed in January
Q: Where do you expect February rate to be the highest?
A: One of the light areas in the middle!
Spatial Smoothing
Why an unusually large value on edge of map may be noise, not
signal
could be biological: negative correlation in subsequent months
often a consequence of population
each square in the middle is ca 10,000 people. Those on the edges are
ca 100
e.g. big city in the middle of the mapped area
sd of estimated rate is 10x higher in edge squares (N is 100x smaller, and sd scales as 1/√N)
if we assume rates are spatially correlated, nearby areas inform about
poorly estimated areas, especially if nearby areas have large N
Another difference between areal and geostatistical data
Geostat: Often reasonable to assume constant variance
Areal: Often not.
Var / sd depends on something else about each region (e.g. N)
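The 10x figure follows directly from the population sizes. A minimal sketch, assuming case counts are binomial with a common true rate; the rate 0.05 and the helper name `rate_sd` are illustrative and not from the slides:

```python
import math

def rate_sd(p, n):
    """sd of an estimated rate (cases / person), assuming a binomial count out of n people."""
    return math.sqrt(p * (1 - p) / n)

p = 0.05                          # hypothetical true rate, the same everywhere
sd_middle = rate_sd(p, 10_000)    # middle squares: ca 10,000 people
sd_edge = rate_sd(p, 100)         # edge squares: ca 100 people
ratio = sd_edge / sd_middle       # sqrt(10,000 / 100)
print(f"sd middle = {sd_middle:.4f}, sd edge = {sd_edge:.4f}, ratio = {ratio:.1f}")
```

The ratio does not depend on the assumed rate, only on the two population sizes.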
Areal data: what we’ll do
Describe spatial dependence and smoothing for normally distributed observations
Relatively straightforward computations
Talk a bit about analyzing counts
Full treatment requires hierarchical Bayes methods
Describing association
Spatial correlation between what?
Reminder: in geostats, we assumed that spatial dependence
1) was constant over the map (2nd order stationarity)
2) was same in all directions (isotropy), but could relax
3) depended on distance between two observations
described by semivariance: $E[Z(s) - Z(s+h)]^2$
or cross-variance: $E\{[Z_i(s) - Z_i(s+h)]\,[Z_j(s) - Z_j(s+h)]\}$
Consider data on a grid (species diversity picture on next slide)
[Figure: species diversity on a grid (legend scale 2 to 12)]
Describing association
Could describe in terms of distance
1 = up/down or across
$\sqrt{2}$ = diagonal
2 = up/down or across 2, and so on
But approximate, since distances between areas, not points
What about irregular regions (e.g. Auckland, on next slide)?
Distance hard to define consistently
(centroids of regions are very irregular)
Areal data analysis has traditionally focused on pairwise
“connectivity”:
how well connected are two regions?
[Figure: Auckland NZ districts (legend scale 0 to 40)]
Connectivity
On a grid: two common definitions of connectivity
rook’s move: a square in the middle connects to the two on either side
and the two above/below
queen’s move: a square in the middle connects to its 8 neighbors
(rooks + 4 diagonals)
squares on an edge have 3 (rook) or 5 (queen) neighbors; those in the corners have only 2 (rook) or 3 (queen)
Connections define the “spatial proximity” matrix, also called the “spatial
weight” or “spatial connectivity” matrix
# rows = # columns = # regions
1 if a pair of regions connect
0 otherwise
always 0 down the diagonals
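The rook and queen definitions can be sketched as a 0/1 proximity matrix for a small grid. This is an illustration only: the function name `grid_weights` and the row-by-row numbering of regions are assumptions, not something the slides specify.

```python
def grid_weights(nrow, ncol, queen=False):
    """0/1 spatial proximity matrix for an nrow x ncol grid (regions numbered row by row)."""
    n = nrow * ncol
    W = [[0] * n for _ in range(n)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # rook's move
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # add the 4 diagonals
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c][rr * ncol + cc] = 1   # diagonal of W stays 0
    return W

rook = grid_weights(3, 3)
queen = grid_weights(3, 3, queen=True)
rook_counts = [sum(row) for row in rook]     # neighbors per region
queen_counts = [sum(row) for row in queen]
print(rook_counts)
print(queen_counts)
```

On the 3 x 3 grid the row sums reproduce the neighbor counts on the slide: corners, edges, and the middle square each get a different number of neighbors.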
Connectivity
Irregular regions: two approaches
1 if two regions share a boundary
very irregular neighbor count.
Auckland: from 1 - 10 neighbors, median 4
Could also use % boundary shared
very useful for modeling transport among regions
economic flows, disease propagation, invasive spp
Can also use distance
all regions with centroids within d of target
find the k nearest neighbors (k smallest distance between centroids)
make weight a function of distance, $d^{-\alpha}$
Can use values as is, or row-standardize
so row sum = 1
if a region has 2 neighbors, each has connectivity 1/2
if a region has 4 neighbors, each has connectivity 1/4
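Row standardization can be sketched in a few lines. The three-region example (regions in a line, so the middle region has 2 neighbors and the end regions 1 each) is made up for illustration, and `row_standardize` is a hypothetical helper name.

```python
def row_standardize(W):
    """Divide each row of a weight matrix by its row sum, so every non-empty row sums to 1."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out

# Three regions in a line: 0 - 1 - 2 (shared boundary = 1)
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Ws = row_standardize(W)
for row in Ws:
    print(row)
```

Note that the standardized matrix is no longer symmetric: region 0 gives weight 1 to region 1, but region 1 gives weight 1/2 to region 0.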
Connectivity
No standard way. Most common is:
shared boundary = 0/1, perhaps with row standardization
best if connectedness measure informed by subject-matter knowledge
choice does have statistical consequences, especially when
predicting/smoothing
Some weight matrices are symmetrical
0/1 shared boundary
$d^{-\alpha}$
others are not
% shared boundary
row-standardized matrices
Spatial dependence
One very common measure, one less common
Moran’s I , dates to 1950
$I = \dfrac{1}{s^2} \, \dfrac{\sum_{i,j} w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\sum_{i,j} w_{ij}}$
$s^2 = \dfrac{1}{N} \sum_i (Y_i - \bar{Y})^2$, i.e., mle, not usual unbiased est.
wij is the ij’th element of the spatial weight matrix
I ranges from -1 to +1
0: no spatial correlation
1: perfect positive correlation
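A minimal sketch of the Moran's I formula above, using plain lists. The function name `morans_i` and the four-region example (regions in a line, 0/1 shared-boundary weights) are illustrative assumptions.

```python
def morans_i(y, W):
    """Moran's I with s^2 computed using divisor N (mle), as on the slide."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    wsum = sum(sum(row) for row in W)
    return num / (s2 * wsum)

# Four regions in a line: 0 - 1 - 2 - 3
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
i_trend = morans_i([1, 2, 3, 4], W)        # neighbors have similar values
i_alternating = morans_i([1, 4, 1, 4], W)  # neighbors have opposite values
print(i_trend, i_alternating)
```

The smooth trend gives a positive I and the alternating pattern a negative I, matching the interpretation of the sign.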
Moran’s I
Tests of correlation = 0:
1) If sufficient # regions, $I \sim N$ with mean and variance that can be computed
$E\,\hat{I} = \dfrac{-1}{N-1}$, variance formula not insightful
not clear what is “sufficient”, depends on $W$, but would like 20 or more regions
2) permutation: randomly shuffle observed values over the regions, compute I each time
enumerate all permutations: $p = \dfrac{\#\,\text{more extreme}}{\#\,\text{permutations}}$
sample (randomization test): $p = \dfrac{\#\,\text{more extreme} + 1}{\#\,\text{permutations} + 1}$
+1 accounts for the observed data (already included in all permutations)
Usually one-sided test, only interested in positive spatial dependence
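The randomization test can be sketched as follows. This is an illustration under stated assumptions: the ten-region line, the data values, the seed, and the names `morans_i` and `permutation_p` are all made up; only the shuffling scheme and the (+1)/(+1) p-value come from the slide.

```python
import random

def morans_i(y, W):
    """Moran's I with s^2 computed using divisor N."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return num / (s2 * sum(sum(row) for row in W))

def permutation_p(y, W, n_perm=999, seed=1):
    """One-sided randomization test: shuffle values over regions, recompute I each time."""
    rng = random.Random(seed)
    observed = morans_i(y, W)
    shuffled = list(y)
    more_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            more_extreme += 1
    # +1 in numerator and denominator accounts for the observed arrangement
    return observed, (more_extreme + 1) / (n_perm + 1)

# Ten regions in a line, each connected to its immediate neighbors
n = 10
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
y = list(range(1, n + 1))   # hypothetical data with a strong spatial trend
i_obs, p = permutation_p(y, W)
print(f"I = {i_obs:.3f}, one-sided p = {p:.3f}")
```

With 999 random shuffles the smallest attainable p-value is 1/1000, which is why the slide's example with 9,999 randomizations reports p = 0.0001.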
Moran’s I example
Example: species diversity data using Queen’s move neighbors
$\hat{I} = 0.751$
randomization test: 0.751 exceeds all 9,999 randomizations, p = 1/10,000 = 0.0001
normal approximation: $E\,\hat{I} = -0.0157$, $\mathrm{Var}\,\hat{I} = 0.00451$
$Z = \dfrac{\hat{I} - E\,\hat{I}}{\sqrt{\mathrm{Var}\,\hat{I}}} = 11.4$, p < 0.0001
Next slide shows distribution of Moran’s I values for randomized values
[Figure: density of Moran's I for randomized values (x-axis: Moran's I, -0.2 to 0.8; y-axis: Density, 0 to 5)]
[Figure: species diversity map (legend scale 2 to 12)]
Geary’s c
Based on squared differences, not covariance
$c = \dfrac{N-1}{2 \sum_{i,j} w_{ij}} \, \dfrac{\sum_{i,j} w_{ij} (Y_i - Y_j)^2}{\sum_i (Y_i - \bar{Y})^2}$
similar in spirit to semivariance
denominator scales to ±1
usually similar to but not same as Moran’s I
when there is a difference
I is a more global indicator, because calculate Ȳ
c is more sensitive to differences in small neighborhoods
Test $H_0$: no spatial dependence using permutations or normal approximation
Notice that both I and c sum over all pairs of points.
One number for entire region
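A minimal sketch of Geary's c in its standard form, with the same four-region line used as a toy example. The function name `gearys_c` and the data are illustrative assumptions.

```python
def gearys_c(y, W):
    """Geary's c: weighted squared differences between pairs, scaled by the total sum of squares."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    wsum = sum(sum(row) for row in W)
    num = sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2 * wsum * ss)

# Four regions in a line: 0 - 1 - 2 - 3
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
c_trend = gearys_c([1, 2, 3, 4], W)        # neighbors similar: small squared differences
c_alternating = gearys_c([1, 4, 1, 4], W)  # neighbors dissimilar: large squared differences
print(c_trend, c_alternating)
```

Small values of c go with similar neighbors, so c moves in the opposite direction to Moran's I on the same data.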
Local indicators of spatial association
rewrite I as:
$I = \dfrac{1}{s^2 \sum_{i,j} w_{ij}} \sum_i (Y_i - \bar{Y}) \sum_j w_{ij} (Y_j - \bar{Y})$
Calculate second sum separately for each region
$I_i = \dfrac{1}{s^2} (Y_i - \bar{Y}) \sum_j w_{ij} (Y_j - \bar{Y})$
global statistic is then $I = \sum_i I_i / \sum_{i,j} w_{ij}$
illustrate using species diversity data
First local Moran's $I_i$, then Moran's $I_i$ marking $Z_i > 2$
Blue areas are where that region is similar to neighboring regions
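The decomposition into local values can be sketched directly from the two formulas. The helper name `local_morans` and the four-region example are assumptions for illustration.

```python
def local_morans(y, W):
    """Local Moran's I_i for each region, with s^2 computed using divisor N."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    return [dev[i] * sum(W[i][j] * dev[j] for j in range(n)) / s2 for i in range(n)]

# Four regions in a line: 0 - 1 - 2 - 3
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
local = local_morans([1, 2, 3, 4], W)
global_i = sum(local) / sum(sum(row) for row in W)  # recover the global statistic
print(local, global_i)
```

Each local value is positive when a region and its neighbors deviate from the mean in the same direction, and the weighted total recovers the global I.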
[Figure: maps of species diversity (div) and local Moran's Ii]
[Figure: local Moran's Ii, marking regions with Zi > 2]
Moran correlogram
Define different weight matrices using different distance classes
(0, 1.5) using queen’s move neighbors
(2, 3) → regions slightly further apart
(3, 5), and so on
Or, use 1st nearest neighbor, 2nd NN, 3rd NN, . . .
Calculate Moran’s I for each weight matrix
Plot I vs distance
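A correlogram can be sketched by building one 0/1 weight matrix per distance class and computing Moran's I for each. Everything specific here is an assumption for illustration: the 5 x 5 grid, the left-to-right gradient in the data, the contiguous class boundaries (0, 1.5], (1.5, 3], (3, 5], and the helper names.

```python
import math

def morans_i(y, W):
    """Moran's I with s^2 computed using divisor N."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    s2 = sum(d * d for d in dev) / n
    num = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return num / (s2 * sum(sum(row) for row in W))

def distance_band_weights(coords, lo, hi):
    """w_ij = 1 if the distance between centers i and j falls in (lo, hi], else 0."""
    n = len(coords)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if i != j and lo < d <= hi:
                W[i][j] = 1
    return W

# 5 x 5 grid of unit squares; the value is the column index (a left-to-right gradient)
coords = [(r, c) for r in range(5) for c in range(5)]
y = [c for (_, c) in coords]
bands = [(0, 1.5), (1.5, 3), (3, 5)]       # hypothetical distance classes
correlogram = [(band, morans_i(y, distance_band_weights(coords, *band))) for band in bands]
for band, value in correlogram:
    print(band, round(value, 3))
```

For a gradient like this one, I is positive in the nearest class and negative in the farthest, which is the typical declining shape of a correlogram.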
[Figure: Moran correlogram (x-axis: Distance class, 1 to 6; y-axis: Moran's I, -0.2 to 0.8)]
Join count statistics
Moran’s I and Geary’s c are for continuous observations
c: “similar” because $(Y_i - Y_j)^2$ is small
What about categorical data?
E.g., US states, record whether governor is Republican or Democrat
Is there spatial correlation?
i.e., If your state is Republican, are neighboring states more likely to be
Republican?
Usual approach is the Black-black (BB) join count statistic
$BB = \dfrac{1}{2} \sum_{i,j} w_{ij} I_i I_j$
$I_i$ is 1 if the “event” happens in region i, 0 otherwise
If wij is 1 if neighbors, 0 otherwise, BB is the number of pairs where
both region and neighbor are events
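A minimal sketch of the BB count with 0/1 neighbor weights. The five-region line and the event pattern are made up, and `bb_join_count` is a hypothetical helper name.

```python
def bb_join_count(events, W):
    """BB = (1/2) * sum_ij w_ij * I_i * I_j, where events[i] is 1 if the event happens in region i."""
    n = len(events)
    return 0.5 * sum(W[i][j] * events[i] * events[j] for i in range(n) for j in range(n))

# Five regions in a line, each connected to its immediate neighbors
n = 5
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
events = [1, 1, 0, 1, 1]          # event regions: 0, 1, 3, 4
bb = bb_join_count(events, W)     # joins between two event regions: (0,1) and (3,4)
print(bb)
```

The factor 1/2 is needed because a symmetric 0/1 matrix counts every neighboring pair twice, once in each direction.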
Join count statistics
Compare BB to expected value if no spatial dependence
Either a Z-test (assume normal distribution): $Z = \dfrac{BB - E\,BB}{\sqrt{\mathrm{Var}\,BB}}$
or permutation:
keep same number of “event” and non-“event” regions,
keep neighbor structure,
assign “event” / non-“event” randomly to regions.
compute BB for each randomization
Example: Species diversity plot, define “rich” as 6 or more species
Rich-rich, rook's neighbors: BB = 53, E BB = 35, Var BB = 8.495
Normal approximation: $Z = (53 - 35)/\sqrt{8.495} = 6.176$, p < 0.0001
Permutation: 53 is larger than any of 9999 randomizations, p = 0.0001
[Figure: map of “rich” patches (legend: TRUE / FALSE)]
[Figure: histogram of BB for random assignments (x-axis: BB, 25 to 55; y-axis: Frequency, 0 to 2500)]
Combining multiple sites
Imagine a study of spatial dependence in Iowa that is repeated in
Indiana.
Methods above will describe that spatial dependence in each state
What if you believed the spatial dependence was similar in the two,
and wanted one result
How can you combine information from both states?
No problem, but should think about a few issues.
Combining multiple sites
1) One mean for both states, or separate means?
the issue is the scale of spatial dependence
(within state only or including between state)
not an issue if the means are similar, but they often are not
Separate means → evaluates pooled within state spatial dependence
Separate means is most common
2) Include only pairs of regions within states, or pairs crossing states?
Same spatial scale issues.
Only pairs within states is most common.
Combining multiple sites
3) Do the two states contribute equal amounts of information?
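Here, same within-state sampling, same # regions, same weighting scheme (in my story)
So, equal amounts of information
use a simple average of both states (A and B)
$\hat{I}$ for both states: $\hat{I} = (\hat{I}_A + \hat{I}_B)/2$
$E\,\hat{I} = (E\,\hat{I}_A + E\,\hat{I}_B)/2$
$\mathrm{Var}\,\hat{I} = (\mathrm{Var}\,\hat{I}_A + \mathrm{Var}\,\hat{I}_B)/4$ (treating the two states as independent)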
If unequal amounts of information, use a weighted average
Many possible ways to weight, depends on study specifics
Combining multiple sites
Hard part is getting estimate and its variance
Test $H_0$: no within-state spatial dependence
Z score for overall study, or
permuting observations within state.
BTW, same ideas can be used for semivariograms for multiple sites
Computing trick:
artificially separate sites,
make sure min distance between Iowa (IA) and Indiana (IN) regions is larger than max within-state distance
specify max semivariogram distance so that all pairs are within a site.
weights each region by number of pairs
Multiple sites: example
Iowa: $\hat{I} = 0.35$, $E\,\hat{I} = -0.0159$, $\mathrm{Var}\,\hat{I} = 0.085$
Indiana: $\hat{I} = 0.42$, $E\,\hat{I} = -0.0159$, $\mathrm{Var}\,\hat{I} = 0.085$
Individually:
Iowa: $Z = \dfrac{0.35 - (-0.0159)}{\sqrt{0.085}} = 1.25$, p = 0.10
Indiana: $Z = \dfrac{0.42 - (-0.0159)}{\sqrt{0.085}} = 1.49$, p = 0.067
Together: $\hat{I} = (0.35 + 0.42)/2 = 0.385$, $E\,\hat{I} = -0.0159$, $\mathrm{Var}\,\hat{I} = (0.085 + 0.085)/4 = 0.0425$
$Z = \dfrac{0.385 - (-0.0159)}{\sqrt{0.0425}} = 1.94$, p = 0.026
Similar patterns in both areas, aggregate the two ⇒ stronger evidence
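The arithmetic in this example can be checked with a short sketch. It assumes a one-sided normal p-value and independent states (so the variance of the average is the sum of variances divided by 4); the helper name `z_and_p` is hypothetical.

```python
import math

def z_and_p(i_hat, e_i, var_i):
    """Z score and one-sided (upper tail) normal p-value for Moran's I."""
    z = (i_hat - e_i) / math.sqrt(var_i)
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

e_i, var_i = -0.0159, 0.085            # same for both states in this example
z_iowa, p_iowa = z_and_p(0.35, e_i, var_i)
z_indiana, p_indiana = z_and_p(0.42, e_i, var_i)

# Equal information: simple average of the two estimates
i_both = (0.35 + 0.42) / 2
e_both = (e_i + e_i) / 2
var_both = (var_i + var_i) / 4         # assumes the two states are independent
z_both, p_both = z_and_p(i_both, e_both, var_both)

for name, z, p in [("Iowa", z_iowa, p_iowa), ("Indiana", z_indiana, p_indiana),
                   ("Together", z_both, p_both)]:
    print(f"{name}: Z = {z:.2f}, p = {p:.3f}")
```

Averaging halves the variance while leaving the estimate about the same size, which is why the combined Z score is larger than either individual one.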
Smoothing areal data
observations are measured with error
if obs. are spatially correlated, nearby obs. contain information
relevant to the true value / prediction for this obs.
⇒ predict Yi incorporating nearby obs.
Nearby often means “all”, i.e. global mean
Will talk about concepts with a few details
Naturally leads to Bayesian inference, won’t go that far in 406
Bivand’s example is disease mapping (Chapter 8)
applies to count data for many outcomes
We focus on normally distributed outcomes
Smoothing areal data
Will keep things simple to emphasize concepts
Observe Yi in a region, have multiple regions
Believe that each Yi observed with some random error
Want to predict “true” µi for each region
Normally distributed observations
$Y_i \sim N(\mu_i, \sigma^2)$
assume (to keep things simple) that $\sigma^2$ known
$\sigma^2$ may be different for each region
Statistical problem:
Given $Y_i$ predict $\mu_i$
Two common ways to solve:
mixed model: $Y_i = \mu + \alpha_i + \varepsilon_i$, $\alpha_i \sim N(0, \sigma_a^2)$, $\varepsilon_i \sim N(0, \sigma_e^2)$
Find BLUP of $\alpha_i$
Bayes: $Y_i \sim N(\mu_i, \sigma^2)$, find posterior distribution of $\mu_i \mid Y_i$
Bayes in one slide
Distribution: probability density for a response given parameters
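Normal distribution, known variance.
$Y \sim N(\mu, \sigma^2)$, $f(Y \mid \mu) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(Y-\mu)^2}{2\sigma^2}\right)$
Likelihood: what can we say about a parameter given data
$l(\mu \mid Y) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{(Y-\mu)^2}{2\sigma^2}\right)$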
NOT a probability density
Prior: distribution of parameter before collecting data: f (µ)
Posterior: distribution of parameter after collecting data: f (µ|Y )
Bayes Theorem: f (µ|Y ) ∝ f (Y | µ)f (µ)
posterior proportional to likelihood times the prior
Bayes for normal data
Equations simpler if use precision = 1/variance, not variance
τ = 1/σ 2
Prior: What do I believe about µ before observing Y ?
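convenient: $\mu \sim N(\mu_0, \tau_0)$
Likelihood: $Y \mid \mu \sim N(\mu, \tau)$
Posterior: $\mu \mid Y \sim N(\mu_1, \tau_1)$
$\mu_1 = \dfrac{\mu_0 \tau_0 + Y \tau}{\tau_0 + \tau}$, $\tau_1 = \tau_0 + \tau$
Can rewrite $\mu_1$ as $\mu_0 \dfrac{\tau_0}{\tau_0 + \tau} + Y \dfrac{\tau}{\tau_0 + \tau}$
Or $\mu_1 = \mu_0 \dfrac{\sigma^2}{\sigma_0^2 + \sigma^2} + Y \dfrac{\sigma_0^2}{\sigma_0^2 + \sigma^2}$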
i.e., posterior mean is a weighted average of data value and prior mean
weights are the relative precision of each
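For reference, the posterior mean and precision follow from Bayes Theorem by completing the square in µ. A short derivation, written with normal distributions parameterized by precision as on this slide and dropping every factor that does not involve µ:

```latex
\begin{align*}
f(\mu \mid Y) &\propto f(Y \mid \mu)\, f(\mu) \\
  &\propto \exp\!\left(-\tfrac{\tau}{2}(Y-\mu)^2\right)
           \exp\!\left(-\tfrac{\tau_0}{2}(\mu-\mu_0)^2\right) \\
  &\propto \exp\!\left(-\tfrac{1}{2}\left[(\tau_0+\tau)\,\mu^2
           - 2\,(\mu_0\tau_0 + Y\tau)\,\mu\right]\right) \\
  &\propto \exp\!\left(-\tfrac{\tau_0+\tau}{2}
           \left(\mu - \frac{\mu_0\tau_0 + Y\tau}{\tau_0+\tau}\right)^{2}\right)
\end{align*}
```

The last line is the kernel of a normal density in µ with mean $\mu_1 = (\mu_0\tau_0 + Y\tau)/(\tau_0+\tau)$ and precision $\tau_1 = \tau_0 + \tau$.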
Bayes for normal data
Comparison:
Frequentist: µ̂ = Y
maximum likelihood estimator: max l(µ|Y )
Bayesian:
Have the posterior distribution,
use mode, median or mean as the estimate of µ
Here, all the same
$\hat{\mu} = \mu_0 \dfrac{\tau_0}{\tau_0 + \tau} + Y \dfrac{\tau}{\tau_0 + \tau}$
Conceptually, two sources of information about µ
the prior distribution
the data
Bayes for normal data
Some numbers:
µ0   Y    σ0   σ    τ0     τ      µ1
5    9    10   1    0.01   1      8.96
5    9    1    1    1      1      7
5    9    1    10   1      0.01   5.04
5    1    1    1    1      1      3
Notice that Bayes estimate has smoothed the data towards the prior mean, µ0
Amount of smoothing depends on σ for data and σ0 for prior
Often, not clear how to choose prior
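The numbers in the table can be reproduced with a small sketch of the normal-normal update. The function name `posterior` is a hypothetical helper; the four input rows are the ones in the table.

```python
def posterior(mu0, y, sigma0, sigma):
    """Posterior mean and precision for one normal observation with a normal prior."""
    tau0 = 1 / sigma0 ** 2        # prior precision
    tau = 1 / sigma ** 2          # data precision
    tau1 = tau0 + tau
    mu1 = (mu0 * tau0 + y * tau) / tau1
    return mu1, tau1

rows = [(5, 9, 10, 1), (5, 9, 1, 1), (5, 9, 1, 10), (5, 1, 1, 1)]  # (mu0, Y, sigma0, sigma)
posterior_means = [posterior(*row)[0] for row in rows]
for row, mu1 in zip(rows, posterior_means):
    print(row, round(mu1, 2))
```

A vague prior (large σ0) leaves the estimate close to Y, while noisy data (large σ) pulls it almost all the way to µ0.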
Empirical Bayes for spatial data
Want to smooth observations towards global average (or average of
neighbors)
Bayes smooths towards the prior mean
So use global average as the prior mean
Variations allow for local averages
Approach called Empirical Bayes
Still have to choose σ0 : controls amount of smoothing
Fit mixed model: $Y_i = \mu + \alpha_i + \varepsilon_i$, $\alpha_i \sim N(0, \sigma_a^2)$, $\varepsilon_i \sim N(0, \sigma_e^2)$
Use random effect variance, $\sigma_a^2$, as the prior variance $\sigma_0^2$
[Figure: Empirical Bayes: example (legend scale 0.00 to 0.10)]
[Figure: Empirical Bayes: Var Y (legend scale 0.000 to 0.010)]
[Figure: Empirical Bayes: smoothed estimates (legend scale 0.190 to 0.210)]
[Figure: Empirical Bayes: smoothed estimates, panels phat and psmooth (legend scale 0.0 to 0.7)]
Empirical Bayes: smoothed estimates
Fit mixed model: estimated overall mean = 0.201
estimated sd αi = 0.00674
Range of sd Yi : 0.00516 to 0.103
Examples of smoothing:
Var Yi      Weight for Yi   Yi      µ̂i
0.0107      0.004           0.667   0.203
0.00034     0.118           0.223   0.204
0.0000266   0.631           0.213   0.209
Example calculation: Var Yi = 0.00034, Yi = 0.2229
$\hat{\mu}_i = 0.2229 \times \dfrac{0.00674^2}{0.00034 + 0.00674^2} + 0.2015 \times \dfrac{0.00034}{0.00034 + 0.00674^2}$
$= 0.2229 \times 0.1178 + 0.2015 \times 0.8821$
$= 0.2040$
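The smoothing examples can be reproduced with a short sketch. It uses the overall mean 0.2015 and sd of αi 0.00674 from this slide; the function name `eb_smooth` is a hypothetical helper, and the first and third Yi values are taken from the table as printed (rounded to three decimals).

```python
def eb_smooth(y, var_y, overall_mean, sd_alpha):
    """Empirical Bayes smoothing: weighted average of the observation and the overall mean."""
    weight_y = sd_alpha ** 2 / (sd_alpha ** 2 + var_y)   # weight on the observation
    return weight_y, weight_y * y + (1 - weight_y) * overall_mean

overall_mean, sd_alpha = 0.2015, 0.00674
regions = [(0.667, 0.0107), (0.2229, 0.00034), (0.213, 0.0000266)]   # (Yi, Var Yi)
results = [eb_smooth(y, v, overall_mean, sd_alpha) for y, v in regions]
for (y, v), (w, smoothed) in zip(regions, results):
    print(f"Yi = {y}, Var Yi = {v}: weight = {w:.3f}, smoothed = {smoothed:.3f}")
```

The region with the largest variance gets almost no weight on its own observation, so even an extreme raw value is pulled almost entirely to the overall mean.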
[Figure: Empirical Bayes: Weights on global average (legend scale 0.4 to 1.0)]
Empirical Bayes for spatial count data
Many aspects easier when data are counts
number of disease cases, number of deaths per region
models for counts do not have σ or σ0
huge area of research and application
Bivand, chapter 8, describes various models
Take home:
different models give very similar smoothed estimates
Big difference from raw (unsmoothed) estimates
Empirical Bayes does use data twice
General sense that resulting maps are “too smooth”
does not fully account for all sources of variability
Requires Bayes with more complicated models
Applications to count data usually require numeric approximation of
the posterior
MCMC: Markov Chain Monte Carlo