Multi-type patterns
  Marked point processes: $Z(s_i)$ observed at event locations
    e.g. height or diameter of a tree, whether a plant is dead or alive
    $Z$ only exists at event locations
  Multivariate point processes: a subset of marked processes
    marks take a small number of discrete values
    live/dead, species of plant, fatality or not at an accident location
    often 2 types; concepts often extend to > 2 types

© Philip M. Dixon (Iowa State Univ.), Spatial Data Analysis - Part 12, June 2013

Questions that might be asked
  1) Single-type questions, asked separately for each type
    we know how to do this, will not discuss
  2) Do two types tend to occur together (i.e. in the same areas)?
    Two species of tupelo in the swamp data: NX is Nyssa aquatica, NS is Nyssa sylvatica
    Do they tend to occur in the same areas?
    If they do, then two processes: Nyssa locations, then flip a coin to determine species
      $\lambda_{NS}(s) = P_{NS}\,\lambda(s)$
      $\lambda_{NX}(s) = (1 - P_{NS})\,\lambda(s)$
    Two-stage mechanism often makes a lot of sense, e.g. tree mortality
      A spatial process determines where trees are located
      A second, possibly spatial, process determines whether each tree is live or dead
    If they don't occur together, have spatial segregation
      Maybe even clusters of NX and clusters of NS

Spatial epidemiology
  Understand spatial aspects of disease
  In particular, are disease cases clustered?
    Answer provides clues to disease mechanism
  Data are locations of events (disease cases)
  Cannot just look at clustering of cases
    Often the population is clustered
    So even if P[disease] is constant, not spatially varying, still expect to find clustering of cases
  Get a random sample of controls (not diseased) from the general population
  Two types of events: case or control. Ask questions like:
    Are cases more clustered than the general population?
    Is the relative risk of disease, $\lambda_{case}/\lambda_{control}$, constant?
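The two-stage mechanism from the "Questions that might be asked" slide (locations first, then a coin flip for species) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `simulate_two_stage`, the unit-square study area, the fixed number of events, and the value 0.4 for $P_{NS}$ are all hypothetical choices, not taken from the swamp data.

```python
import random

def simulate_two_stage(n_events, p_ns, seed=0):
    """Stage 1: generate locations. Stage 2: flip a coin for each label."""
    rng = random.Random(seed)
    # Stage 1 (assumption): n_events uniform locations in the unit square.
    locations = [(rng.random(), rng.random()) for _ in range(n_events)]
    # Stage 2: each location becomes NS with probability p_ns, otherwise NX,
    # independently of where it is and of all other labels.
    labels = ["NS" if rng.random() < p_ns else "NX" for _ in locations]
    return locations, labels

locations, labels = simulate_two_stage(n_events=2000, p_ns=0.4)
frac_ns = labels.count("NS") / len(labels)
print(f"fraction labeled NS: {frac_ns:.3f}")
```

Because the labels ignore location, the NS and NX patterns produced this way share the spatial structure of the combined pattern, which is the situation the slides describe as the two types occurring together.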
Pictures: spatial segregation
  [Figure: point pattern of species A and species B]

Pictures: spatial association?
  [Figure: point pattern of cases and controls]

Pictures: equally clustered?
  [Figure: point pattern of cases and controls]

Multi-type K function
  extend definition of K(x) to multiple types
  "cross-K": $K_{AB}(x) = \frac{1}{\lambda_B}\,E[\#\,\text{B events within } x \text{ of an A}]$
    note: $K_{AB}(x) = K_{BA}(x)$, although estimates are not identical because of edge effects
  "self-K": $K_{AA}(x) = \frac{1}{\lambda_A}\,E[\#\,\text{other A events within } x \text{ of an A}]$
  look at $K_{AA}(x) - K_{AB}(x)$
    If > 0, spatial segregation: A's more common around A's
    If < 0, spatial association: B's more common around A's
  minor detail if more than 2 types: compare $K_{AA}$ to what?
    usual answer is $K_{A\cdot}$, i.e. A to any type
    $K_{AA} - K_{A\cdot}$
    gives the same answers as $K_{AA} - K_{A,-A}$ (A to any other type)
  Test significance of departure from 0 by random labeling

Random labeling
  Fix the locations of events
  Randomly reassign labels (type of event)
  Compute the statistic of interest
    pointwise difference in K functions
    some summary statistic
  Corresponds to the "two processes" view
    one process generating locations
    a second process generating labels (types)
    is that second process a random labeling of locations?
  Makes a lot of sense for live/dead or case/control labels
    less clear for multiple species
  Alternatives exist, talk about later

Why does this work?
  Why does comparing $K_{AA}(x)$ to $K_{AB}(x)$ work?
  Two processes view
    locations all have a $K_{all}(x)$ function
    imagine A is a random sample of locations, each kept with probability $\tau$, e.g. $\tau = 0.4$
    what is the relationship between $K_{all}(x)$ and $K_A(x)$?
    intensity: $\lambda_A = \tau\,\lambda_{all} = 0.4\,\lambda_{all}$
    expected count of A within x of an A: $E\,n_A = \tau\,E\,n_{all}$, where $E\,n_{all}$ is the expected count of all events within x of an A
    so $K_A(x) = \frac{E\,n_A}{\lambda_A} = \frac{\tau\,E\,n_{all}}{\tau\,\lambda_{all}} = \frac{E\,n_{all}}{\lambda_{all}} = K_{all}(x)$
  Randomly "thinning" a process does not change the K function
  So a random sample of controls gives K for the population
  If cases are also a random sample of the population, $K_{case}(x) = K_{control}(x)$
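If labels are a random thinning of the locations, $K_{AA}(x) - K_{AB}(x)$ should be close to zero, and the random labeling procedure gives a reference distribution for it. The sketch below is a minimal illustration, not the analysis used for the slides: it assumes exactly two types labeled "A" and "B", a unit-square study area, a single distance x, and naive K estimates with no edge correction. The function names and the simulated, deliberately segregated pattern are hypothetical.

```python
import math
import random

def close_pairs(points, x):
    """Unordered index pairs whose points lie within distance x."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= x]

def k_difference(pairs, labels, area=1.0):
    """Naive K_AA(x) - K_AB(x) for two types, without edge correction."""
    n_a = labels.count("A")
    n_b = labels.count("B")
    aa = sum(1 for i, j in pairs if labels[i] == "A" and labels[j] == "A")
    ab = sum(1 for i, j in pairs if labels[i] != labels[j])
    k_aa = area * 2 * aa / (n_a * (n_a - 1))  # each A-A pair counts in both directions
    k_ab = area * ab / (n_a * n_b)            # B events around A events
    return k_aa - k_ab

def random_labeling_test(points, labels, x, n_sim=199, seed=0):
    rng = random.Random(seed)
    pairs = close_pairs(points, x)            # locations stay fixed throughout
    observed = k_difference(pairs, labels)
    shuffled = list(labels)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)                 # randomly reassign the types
        sims.append(k_difference(pairs, shuffled))
    # Two-sided Monte Carlo p-value for departure from 0.
    extreme = sum(1 for d in sims if abs(d) >= abs(observed))
    return observed, (extreme + 1) / (n_sim + 1)

# Hypothetical segregated pattern: A's in the left half, B's in the right half.
rng = random.Random(1)
points = ([(0.5 * rng.random(), rng.random()) for _ in range(60)]
          + [(0.5 + 0.5 * rng.random(), rng.random()) for _ in range(60)])
labels = ["A"] * 60 + ["B"] * 60
observed, p_value = random_labeling_test(points, labels, x=0.1)
print(f"K_AA - K_AB = {observed:.4f}, p = {p_value:.3f}")
```

Because the locations never move, the set of close pairs is computed once and only the labels are reshuffled, which mirrors the "fix the locations, reassign labels" description. A real analysis would use an edge-corrected estimator and a range of distances.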
Nyssa species in the Savannah River Swamp
  [Figure: map of NS and NX tree locations]

Nyssa L(x) functions
  [Figure: $\sqrt{\hat K^{iso}(r)/\pi} - r$ against r (0 to 12) for the four type pairs $K_{ss}$, $K_{sa}$, $K_{as}$, $K_{aa}$, where s = NS (N. sylvatica) and a = NX (N. aquatica)]

More NS than NX around NS
  [Figure: $K_{ss} - K_{sa}$ against r (0 to 12)]

More NX than NS around NX
  [Figure: $K_{aa} - K_{as}$ against r (0 to 12)]

More questions
  4) Is one type of event more clustered than the other?
    i.e. for short distances, is $K_{AA}(x) < K_{BB}(x)$?
    approach in the same way as 3)
    compare the observed difference $\hat K_{AA}(x) - \hat K_{BB}(x)$ to a simulation envelope

Nyssa L(x) functions
  [Figure: the Nyssa L function plot, shown again]
Are NS more clustered than NX?
  [Figure: $K_{ss} - K_{aa}$ against r (0 to 12)]

More questions
  5) Are two processes independent?
    Not exactly the same question as 3), random labeling
    Not viewed as two stages: locations, then labels
    Instead, is P[event B in area dA] independent of the presence of A's?
    P[event B in area dA] may depend on the presence of other B's
  Turns out to be difficult to evaluate
    want to maintain all characteristics of the process for A and all characteristics of the process for B, but break any dependence between them

Testing independence
  Much harder than testing random labeling
  Have two processes: A and B. Each has whatever structure it has
  Question is whether the A pattern is placed down independently of the B pattern
  Need to maintain the A pattern, maintain the B pattern, but break any "connection" between them
  Historical approach: toroidal rotations
    randomly displace A horizontally and vertically
    maintains all features of process A and process B, but allows A to shift vis-a-vis locations of B = independence
    points that shift "out" of the study area reappear on the other side (next slide)
  Again, compare the observed difference to simulated differences

Testing independence
  [Figure: A and B points illustrating a toroidal shift]

Testing independence
  Issues / concerns:
    only feasible for rectangular study areas
    now understood to be very sensitive to the assumption of first-order stationarity (constant intensity across the study area)
    if intensity varies, gives very misleading results
  New approach:
    simulate realizations of A that match specified properties of observed A
    ditto B, both independent
    e.g.
    K function, intensity over the study area, perhaps others too
    Tscheschel and Stoyan, 2006, Comp. Stats. and Data Analysis 51:859-871.

Intensity and relative risk
  Intensity: different insights
  Can estimate intensity for each type separately (next slide)
    choose bandwidth separately for each type
  Or intensity for events, ignoring type
    Sum of the type-specific intensities

Intensity and relative risk
  [Figure: estimated intensity maps for N. sylvatica and N. aquatica, each with a scale marked at 0.02 and 0.06]

Intensity and relative risk
  In many problems, it makes sense to think about the risk of one type of event
    $r = \frac{\lambda_A}{\lambda} = \frac{\lambda_A}{\lambda_A + \lambda_B}$
    "risk" is the probability that a randomly chosen event is a type A event
  Allow everything to vary over the study area
    each intensity is
      $\lambda_A(s) = r(s)\,\lambda(s)$
      $\lambda_B(s) = (1 - r(s))\,\lambda(s)$
    Can adjust one or both if a random sample, not an enumeration
  Note: $\frac{\lambda_A(s)}{\lambda_B(s)} = \frac{r(s)}{1 - r(s)}$
    the odds of a type A event; odds and odds ratios are super important in epidemiology

Intensity and relative risk
  Estimating r(s)
    model $\lambda_A(s)$ and $\lambda_B(s)$ as functions of covariates
    model r(s) directly as a function of covariates
    or non-parametric (smoothing) estimate of r(s)
  NP smoothing: based on $\hat\lambda_A(s)$ and $\hat\lambda_B(s)$
    use the same bandwidth for both components
  Usual estimator: based on point-process likelihood

Intensity and relative risk
  [Figure: BW for relative risk. Cross-validation criterion cv(σ) (axis 0.60 to 0.80) against bandwidth σ (2 to 14)]
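The non-parametric estimate of r(s) can be sketched by smoothing each type with the same bandwidth and taking the ratio $\hat\lambda_A(s) / (\hat\lambda_A(s) + \hat\lambda_B(s))$. The code below is a rough illustration under assumptions that are not from the slides: a Gaussian kernel, no edge correction, a hand-picked bandwidth `sigma` rather than one chosen by cross-validation, and made-up point coordinates. The function names are hypothetical.

```python
import math

def kernel_intensity(s, points, sigma):
    """Gaussian-kernel intensity estimate at location s (no edge correction)."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return sum(
        norm * math.exp(-((s[0] - px) ** 2 + (s[1] - py) ** 2) / (2.0 * sigma ** 2))
        for px, py in points
    )

def relative_risk(s, points_a, points_b, sigma):
    """Estimate r(s) = lambda_A(s) / (lambda_A(s) + lambda_B(s)),
    using the same bandwidth for both types."""
    lam_a = kernel_intensity(s, points_a, sigma)
    lam_b = kernel_intensity(s, points_b, sigma)
    return lam_a / (lam_a + lam_b)

# Made-up example: A events near one corner, B events mirrored in the other.
points_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
points_b = [(9.0, 9.0), (9.0, 8.0), (8.0, 9.0)]
sigma = 2.0  # assumption: chosen by hand for the illustration

r_near_a = relative_risk((1.0, 1.0), points_a, points_b, sigma)
r_middle = relative_risk((5.0, 5.0), points_a, points_b, sigma)
r_near_b = relative_risk((9.0, 9.0), points_a, points_b, sigma)
odds_near_a = r_near_a / (1.0 - r_near_a)  # equals lambda_A(s) / lambda_B(s)
print(r_near_a, r_middle, r_near_b)
```

Using one bandwidth for both types means the kernel's normalizing constant cancels in the ratio, so r(s) depends only on the relative kernel weights of nearby A and B events.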
Intensity and relative risk
  [Figure: NS and NX tree locations, with a scale marked at 0.2 and 0.6]

Summary of multi-type point pattern analysis
  Lots of things you could do
  What question(s) is/are most important?
    Focus the analysis on answering those questions
  What if the marks are continuous (not just a few types)?
    mark correlation function
      describes the correlation between marks on two events and how it changes with the distance between events
    few examples, not well understood