The Problem with Parameter Redundancy

Parameter Redundancy and
Identifiability in Ecological Models
Diana Cole, University of Kent
Introduction
Occupancy Model example
Species present and detected: Prob = 𝜓𝑝
Species present but not detected: Prob = 𝜓(1 − 𝑝)
Species absent: Prob = 1 − 𝜓
Prob detected = 𝜓𝑝
Prob not detected = 𝜓(1 − 𝑝) + (1 − 𝜓) = 1 − 𝜓𝑝
• Parameters: 𝜓 is the probability that a site is occupied, 𝑝 is the
probability that the species is detected when present.
• Can only estimate the product 𝜓𝑝 rather than 𝜓 and 𝑝 separately.
• Model is parameter redundant, or equivalently the parameters are
non-identifiable.
2/27
Parameter Redundancy
• Suppose we have a model 𝑀(𝜃) with parameters 𝜃. A model
is globally (locally) identifiable if 𝑀(𝜃1) = 𝑀(𝜃2) implies that
𝜃1 = 𝜃2 (for a neighbourhood of 𝜃).
• A model is parameter redundant if it can be written in
terms of a smaller set of parameters. A parameter redundant
model is non-identifiable.
• There are several different methods for detecting parameter
redundancy, including:
– numerical methods (e.g. Viallefont et al, 1998),
– symbolic methods (e.g. Cole et al, 2010),
– hybrid symbolic-numeric method (Choquet and Cole,
2012).
• Generally involves calculating the rank of a matrix, which gives
the number of parameters that can be estimated.
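The rank calculation can be illustrated on the occupancy model from the introduction. This is a minimal sketch using SymPy, assuming the two cell probabilities 𝜓𝑝 and 1 − 𝜓𝑝 serve as the summary of the model. The variable names are illustrative and not taken from the slides.

```python
import sympy as sp

# psi: occupancy probability, p: detection probability.
psi, p = sp.symbols("psi p", positive=True)
theta = sp.Matrix([psi, p])

# Assumed summary of the model: Prob(detected) and Prob(not detected).
kappa = sp.Matrix([psi * p, 1 - psi * p])

# Derivative matrix: one row per parameter, one column per summary term.
D = kappa.jacobian(theta).T

rank = D.rank()
deficiency = len(theta) - rank

print("D =", D)
print("rank =", rank, "deficiency =", deficiency)
```

A rank below the number of parameters signals that only the product 𝜓𝑝 can be estimated, which matches the conclusion on the occupancy slide.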
3/27
Problems with Parameter Redundancy
• There will be a flat ridge in the likelihood of a parameter
redundant model (Catchpole and Morgan, 1997), resulting in
more than one set of maximum likelihood estimates.
• Numerical methods to find the MLE will not pick up the flat
ridge, although it could be picked up by trying multiple
starting values and looking at profile log-likelihoods.
• The Fisher information matrix will be singular (Rothenberg,
1971) and therefore the standard errors will be undefined.
• However the exact Fisher information matrix is rarely known.
Standard errors are typically approximated using a Hessian
matrix obtained numerically. Can parameter redundancy be
detected from the standard errors?
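The flat ridge and the near-singular numerical Hessian can be seen in a small sketch based on the occupancy model. The data (100 sites, 30 detections), the finite-difference step and the function names are hypothetical choices made for illustration only.

```python
import numpy as np

def log_lik(theta, n_sites=100, n_detected=30):
    """Binomial log-likelihood of the occupancy model (constant term dropped)."""
    psi, p = theta
    q = psi * p  # probability that the species is detected at a site
    return n_detected * np.log(q) + (n_sites - n_detected) * np.log(1.0 - q)

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference approximation to the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return hess

# Two different parameter sets on the ridge psi * p = 0.3 (both are MLEs).
mle_a = np.array([0.6, 0.5])
mle_b = np.array([0.5, 0.6])
print(log_lik(mle_a), log_lik(mle_b))

# Singular values of the numerical Hessian, standardised by the largest.
singular = np.linalg.svd(numerical_hessian(log_lik, mle_a), compute_uv=False)
standardised = singular / singular[0]
print(standardised)
```

The smallest standardised singular value is close to zero but not exactly zero, so a rank computed from the numerical Hessian depends on the threshold used to decide what counts as zero.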
4/27
Is example 1 parameter redundant?
Parameter   Estimate   Standard Error
𝜃1          0.39       imaginary
𝜃2          0.64       0.061
𝜃3          0.09       imaginary
𝜃4          0.18       imaginary
• Hessian (𝑯) computed numerically has rank 4 (exact Hessian
would have rank < 4 if parameter redundant).
• Singular Value Decomposition
• Write 𝑯 = 𝑼𝑺𝑽′. Matrix 𝑺 is a diagonal matrix of singular values
(for a symmetric matrix, the absolute eigenvalues); the
number of non-zero values is the rank of the matrix.
• 𝑺𝑖𝑖 = 68.65, 48.3996, 12.7670, 0.0019
• Standardised: 1, 0.71, 0.19, 0.000028
• Hybrid Symbolic-Numeric method: rank 3, only 𝜃2 is estimable.
• Symbolic Method: rank 3, estimable parameter combinations
𝜃2, (1 − 𝜃1)𝜃3, 𝜃1𝜃4.
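The effect of the threshold on an SVD-based rank test can be sketched with the singular values reported for example 1. The function name is hypothetical; the thresholds are the ones listed in the simulation study slide.

```python
import numpy as np

def estimated_rank(singular_values, threshold):
    """Count standardised singular values that exceed the threshold."""
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    standardised = s / s[0]
    return int(np.sum(standardised > threshold)), standardised

# Singular values of the numerical Hessian reported for example 1.
example1 = [68.65, 48.3996, 12.7670, 0.0019]

for threshold in (0.01, 0.001, 0.0001, 0.00001):
    rank, standardised = estimated_rank(example1, threshold)
    print(f"threshold={threshold:g}: estimated rank {rank}")
```

With a sufficiently small threshold the smallest value is treated as non-zero and the test reports full rank, which is the failure mode the example illustrates.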
5/27
Is example 2 parameter redundant?
Parameter   Estimate   Standard Error
𝜃1          0.41       0.70
𝜃2          0.83       0.07
𝜃3          0.10       0.11
𝜃4          0.19       0.33
• Hessian (𝑯) computed numerically has rank 4 (exact Hessian would have
rank < 4 if parameter redundant).
• Standardised Singular Value Decomposition:
1, 0.70, 0.045, 0.0010
• Hybrid Symbolic-Numeric method: rank 3, only 𝜃2 is estimable.
• Symbolic Method: rank 3, estimable parameter combinations
𝜃2, (1 − 𝜃1)𝜃3, 𝜃1𝜃4.
6/27
Is example 3 parameter redundant?
Parameter   Estimate   Standard Error
𝜃1          0.37       0.19
𝜃2          0.48       0.19
𝜃3          0.39       0.20
𝜃4          0.34       0.17
𝜃5          0.40       0.20
𝜃6          0.65       0.06
𝜃7          0.10       0.03
𝜃8          0.18       0.09
• Standardised Singular Value Decomposition:
[1.00 0.65 0.11 0.096 0.074 0.039 0.034 0.0011]
• Hybrid-Symbolic Numeric method: rank 8 so is not parameter
redundant.
• Symbolic method: rank 8 so is not parameter redundant, but a
further test reveals that model could be near redundant, as when
𝜃1 = 𝜃2 = 𝜃3 = 𝜃4 = 𝜃5 model is the same as example 1.
7/27
Simulation Study for Examples 1 and 2
Parameter   True Value   Average MLE   St. Dev. MLE
𝜃1          0.4          0.49          0.32
𝜃2          0.7          0.70          0.06
𝜃3          0.1          0.28          0.32
𝜃4          0.2          0.33          0.32
57% have defined standard errors.
SVD threshold   % SVD test correct
0.01            100%
0.001           75%
0.0001          15%
0.00001         7%
8/27
Mark-Recovery Models
Animals are marked and then when the animal dies its mark is
recovered. E.g. Lapwings
Ringing yr   𝑭 (number ringed)   𝑵 (number recovered), by recovery yr
                                 63    64    65
63           1147                14     4     1
64           1285                      16     4
65           1106                            11
9/27
Mark-Recovery Models
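• Data:
\[
\mathbf{F} = \begin{pmatrix} 1147 \\ 1285 \\ 1106 \end{pmatrix}, \qquad
\mathbf{N} = \begin{pmatrix} 14 & 4 & 1 \\ & 16 & 4 \\ & & 11 \end{pmatrix}
\]
• 𝜙1: 1st year survival probability, 𝜙𝑎: adult survival probability.
• 𝜆1: 1st year recovery probability, 𝜆𝑎: adult recovery probability.
• Recovery probabilities by ringing year (rows) and recovery year (columns):
\[
P = \begin{pmatrix}
(1-\phi_1)\lambda_1 & \phi_1(1-\phi_a)\lambda_a & \phi_1\phi_a(1-\phi_a)\lambda_a \\
0 & (1-\phi_1)\lambda_1 & \phi_1(1-\phi_a)\lambda_a \\
0 & 0 & (1-\phi_1)\lambda_1
\end{pmatrix}
\]
• With 𝛽1 = (1 − 𝜙1)𝜆1 and 𝛽2 = 𝜙1𝜆𝑎:
\[
P = \begin{pmatrix}
\beta_1 & \beta_2(1-\phi_a) & \beta_2\phi_a(1-\phi_a) \\
0 & \beta_1 & \beta_2(1-\phi_a) \\
0 & 0 & \beta_1
\end{pmatrix}
\]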
10/27
Symbolic Method
(Cole et al, 2010 and Cole et al, 2012)
• Exhaustive summary – unique representation of the model
• Parameters
11/27
Symbolic Method
• Form a derivative matrix 𝐃𝑖 = 𝜕𝜿𝑖/𝜕𝜽.
• Calculate the rank. Number of estimable parameters = rank(𝐃).
Deficiency = p − rank(𝐃), where p is the number of parameters.
If deficiency > 0 the model is parameter redundant.
• Rank(𝐃1) = Rank(𝐃2) = Rank(𝐃3) = 3 but there are 4
parameters, so the model is parameter redundant.
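The rank calculation for the mark-recovery model can be sketched with SymPy. The exhaustive summary used here is an assumption: the three distinct non-zero entries of the matrix P from the mark-recovery slide. The symbol names are illustrative.

```python
import sympy as sp

# phi1, phia: first-year and adult survival; lam1, lama: recovery probabilities.
phi1, phia, lam1, lama = sp.symbols("phi1 phia lam1 lama", positive=True)
theta = [phi1, phia, lam1, lama]

# Assumed exhaustive summary: distinct non-zero entries of P.
kappa = sp.Matrix([
    (1 - phi1) * lam1,
    phi1 * (1 - phia) * lama,
    phi1 * phia * (1 - phia) * lama,
])

# Derivative matrix: rows index parameters, columns index summary terms.
D = kappa.jacobian(theta).T

rank = D.rank()
deficiency = len(theta) - rank
print("rank =", rank, "deficiency =", deficiency)
```

A deficiency greater than zero reproduces the conclusion that the four-parameter model is parameter redundant.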
12/27
Estimable Parameter Combinations
• For a parameter redundant model with deficiency d, solve
𝜶′ 𝐃 = 0. There will be d solutions, 𝜶𝑗 . If 𝛼𝑖𝑗 = 0 for all j, then
𝜃𝑖 is estimable.
• Estimable parameter combinations can be found by solving a
set of PDEs:
• Estimable parameter combinations: 𝜙𝑎 , (1 − 𝜙1 )𝜆1 , 𝜙1 𝜆𝑎 .
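A minimal SymPy sketch of this step for the mark-recovery model follows. It assumes the same exhaustive summary as the matrix P entries, and it assumes that the PDEs take the first-order form Σᵢ 𝛼ᵢⱼ 𝜕f/𝜕𝜃ᵢ = 0 for j = 1, …, d, since the equation itself is not reproduced on the slide. Function and variable names are hypothetical.

```python
import sympy as sp

phi1, phia, lam1, lama = sp.symbols("phi1 phia lam1 lama", positive=True)
theta = [phi1, phia, lam1, lama]
kappa = sp.Matrix([
    (1 - phi1) * lam1,
    phi1 * (1 - phia) * lama,
    phi1 * phia * (1 - phia) * lama,
])
D = kappa.jacobian(theta).T

# alpha' D = 0 is equivalent to D' alpha = 0, so take the null space of D'.
null_vectors = D.T.nullspace(simplify=True)
alpha = null_vectors[0].applyfunc(sp.simplify)

# A zero entry in alpha marks a parameter that is estimable on its own.
estimable = [t for t, a in zip(theta, alpha) if a == 0]

def pde_residual(f):
    """Left-hand side of the assumed PDE: sum_i alpha_i * df/dtheta_i."""
    return sp.simplify(sum(a * sp.diff(f, t) for t, a in zip(theta, alpha)))

# Combinations quoted on the slide; each should satisfy the PDE.
combinations = [phia, (1 - phi1) * lam1, phi1 * lama]
residuals = [pde_residual(f) for f in combinations]
print("estimable:", estimable)
print("PDE residuals:", residuals)
```

Checking that each quoted combination gives a zero residual is a verification only; it does not replace actually solving the PDEs.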
13/27
Other uses of symbolic method
• Uses of symbolic method:
– Catchpole and Morgan (1997) exponential family models,
mostly used in ecological statistics,
– Rothenberg (1971) original general use, econometric
examples,
– Goodman (1974) latent class models,
– Shapiro (1986) non-linear regression models,
– Pohjanpalo (1982) first use for compartment models,
– Cole et al (2010) General exhaustive summary framework,
– Cole et al (2012) Mark-recovery models.
• Finding estimable parameters:
– Catchpole et al (1998) exponential family models,
– Chappell and Gunn (1998) and Evans and Chappell (2000)
compartment models,
– Cole et al (2010) General exhaustive summary framework.
Problem with Symbolic Method
• The key to the symbolic method for detecting parameter
redundancy is to find a derivative matrix and its rank.
• Models are getting more complex.
• The derivative matrix is therefore structurally more complex.
• Maple runs out of memory calculating the rank.
Wandering Albatross: multi-state models for sea birds (Hunter and Caswell, 2009; Cole, 2012)
Striped Sea Bass: tag-return models for fish (Jiang et al, 2007; Cole and Morgan, 2010)
• How do you proceed?
– Numerically: only valid for specific values of the
parameters; cannot find the combinations of parameters you
can estimate; not possible to generalise results.
– Symbolically: involves extending the theory; again it
involves a derivative matrix and its rank, but the derivative
matrix is structurally simpler.
– Hybrid-Symbolic Numeric Method.
15/27
Multi-state capture-recapture
example
Wandering Albatross
• Hunter and Caswell (2009) examine parameter redundancy of multi-state mark-recapture models, but cannot evaluate the symbolic rank
of the derivative matrix (they developed a numerical method instead).
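• 4-state breeding success model, with states 1 = success, 2 = failure, 3 = post-success, 4 = post-failure.
• Parameters: σ survival, β breeding given survival, γ successful breeding, p recapture.
\[
\log L = \sum_{r=1}^{N-1} \sum_{c=r+1}^{N} \sum_{i=1}^{4} \sum_{j=1}^{4} m_{i,j}^{(r,c)} \log \psi_{ij}^{(r,c)}
\]
\[
\Psi^{(r,c)} =
\begin{cases}
\left[\mathbf{P}_{r+1}\Phi_r\right]^{T} & c = r+1 \\
\left[\mathbf{P}_c\Phi_{c-1}(\mathbf{I}-\mathbf{P}_{c-1})\Phi_{c-2}\cdots(\mathbf{I}-\mathbf{P}_{r+1})\Phi_r\right]^{T} & c > r+1
\end{cases}
\]
\[
\Phi = \begin{pmatrix}
\sigma_1\beta_1\gamma_1 & \sigma_2\beta_2\gamma_2 & \sigma_3\beta_3\gamma_3 & \sigma_4\beta_4\gamma_4 \\
\sigma_1\beta_1(1-\gamma_1) & \sigma_2\beta_2(1-\gamma_2) & \sigma_3\beta_3(1-\gamma_3) & \sigma_4\beta_4(1-\gamma_4) \\
\sigma_1(1-\beta_1) & 0 & \sigma_3(1-\beta_3) & 0 \\
0 & \sigma_2(1-\beta_2) & 0 & \sigma_4(1-\beta_4)
\end{pmatrix},
\qquad
\mathbf{P} = \begin{pmatrix}
p_1 & 0 & 0 & 0 \\
0 & p_2 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]
16/27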
Extended Symbolic Method
Cole et al (2010)
1. Choose a reparameterisation, s, that simplifies the model
structure.
\[
\mathbf{s} = \begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ \vdots \\ s_{13} \\ s_{14} \end{pmatrix}
= \begin{pmatrix} \sigma_1\beta_1\gamma_1 \\ \sigma_2\beta_2\gamma_2 \\ \sigma_3\beta_3\gamma_3 \\ \vdots \\ p_1 \\ p_2 \end{pmatrix}
\]
2. Rewrite the exhaustive summary, κ(θ), in terms of the
reparameterisation, κ(s).
\[
\kappa(\theta) = \begin{pmatrix}
p_1\sigma_1\beta_1\gamma_1 \\
p_2\sigma_1\beta_1(1-\gamma_1) \\
p_1\sigma_2\beta_2\gamma_2 \\
p_2\sigma_2\beta_2(1-\gamma_2) \\
\vdots \\
p_1\sigma_1^2\beta_1^2\gamma_1^2(1-p_1) + \cdots \\
\vdots
\end{pmatrix},
\qquad
\kappa(\mathbf{s}) = \begin{pmatrix}
s_1 s_{13} \\
s_5 s_{14} \\
s_2 s_{13} \\
s_6 s_{14} \\
\vdots \\
s_1^2 s_{13}(1-s_{13}) + \cdots \\
\vdots
\end{pmatrix}
\]
17/27
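The reparameterised exhaustive summary terms can be checked with a short SymPy sketch. It assumes time-invariant matrices and assumes that s1 to s12 label the non-zero entries of the transition matrix row by row (s1 to s4 in row 1, s5 to s8 in row 2, s9 and s11 in row 3, s10 and s12 in row 4), with s13 = p1 and s14 = p2. Only s1, s2, s3, s13 and s14 are defined explicitly on the slide, so the remaining labels are an inference.

```python
import sympy as sp

# s[0] is s1, ..., s[13] is s14.
s = sp.symbols("s1:15", positive=True)
s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14 = s

# Assumed layout of the reparameterised transition and recapture matrices.
Phi = sp.Matrix([
    [s1, s2, s3, s4],
    [s5, s6, s7, s8],
    [s9, 0, s11, 0],
    [0, s10, 0, s12],
])
P = sp.diag(s13, s14, 0, 0)
I4 = sp.eye(4)

# Recapture one year after release, and two years after release.
first_order = P * Phi
second_order = P * Phi * (I4 - P) * Phi

print(first_order[0, 0], first_order[1, 0], first_order[0, 1], first_order[1, 1])

# Entry (1, 1) of the two-year term, and its derivatives with respect to s1, s2, s3.
term = sp.expand(second_order[0, 0])
derivatives = [sp.factor(sp.diff(term, v)) for v in (s1, s2, s3)]
print(term)
print(derivatives)
```

Under these assumptions the one-year entries reproduce the leading terms of κ(s), and the two-year entry begins with the squared term shown in the summary.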
Extended Symbolic Method
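3. Calculate the derivative matrix Ds.
\[
\mathbf{D}_s = \left[\frac{\partial \kappa_j(\mathbf{s})}{\partial s_i}\right]
= \begin{pmatrix}
s_{13} & 0 & 0 & 0 & \cdots & (2s_1 - 2s_1 s_{13})s_{13} & \cdots \\
0 & 0 & s_{13} & 0 & \cdots & (s_5 - s_5 s_{14})s_{13} & \cdots \\
0 & 0 & 0 & 0 & \cdots & s_9 s_{13} & \cdots \\
\vdots & & & & & \vdots &
\end{pmatrix}
\]
4. The no. of estimable parameters = rank(Ds), if
\[
\operatorname{Rank}\left(\left[\frac{\partial s_j}{\partial \theta_i}\right]\right) = \operatorname{Dim}(\mathbf{s}).
\]
rank(Ds) = 12, no. est. pars = 12, deficiency = 14 − 12 = 2
5. If Ds is full rank, s = sre is a reduced-form exhaustive
summary. If Ds is not full rank, solve a set of PDEs to find a
reduced-form exhaustive summary, sre.
\[
\mathbf{s}_{re} = \begin{pmatrix} s_1 & s_2 & s_5 & s_6 & s_{11} & s_{12} & s_{13} & s_{14} & s_7/s_3 & s_8/s_4 & s_3 s_9 & s_4 s_{10} \end{pmatrix}^{T}
\]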
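The rank condition on the reparameterisation in step 4 can be checked one state at a time. The sketch below assumes that, for each state i, the three reparameterised terms are σβγ, σβ(1 − γ) and σ(1 − β). The last two are inferred from the structure of the model rather than listed on the slide.

```python
import sympy as sp

sigma, beta, gamma = sp.symbols("sigma beta gamma", positive=True)
theta_i = sp.Matrix([sigma, beta, gamma])

# Assumed reparameterised terms for a single state.
s_i = sp.Matrix([
    sigma * beta * gamma,
    sigma * beta * (1 - gamma),
    sigma * (1 - beta),
])

# Jacobian of the reparameterisation for this state and its determinant.
J = s_i.jacobian(theta_i)
det_J = sp.factor(J.det())
print("det =", det_J)

# The three terms sum to sigma, which is one way to invert the map.
total = sp.simplify(sum(s_i))
print("sum of terms =", total)
```

If the determinant is non-zero for each of the four states, the full Jacobian (four such blocks plus an identity block for p1 and p2) has rank 14 = Dim(s), so the condition in step 4 holds under these assumptions.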
Extended Symbolic Method
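6. Use sre as an exhaustive summary.
\[
\mathbf{s}_{re} = \begin{pmatrix}
\sigma_1\beta_1\gamma_1 & \sigma_2\beta_2\gamma_2 & \sigma_1\beta_1(1-\gamma_1) & \sigma_2\beta_2(1-\gamma_2) & \sigma_3(1-\beta_3) & \sigma_4(1-\beta_4) & p_1 & p_2 \\
& \dfrac{1-\gamma_3}{\gamma_3} & \dfrac{1-\gamma_4}{\gamma_4} & \sigma_3\beta_3\gamma_3\,\sigma_1(1-\beta_1) & \sigma_4\beta_4\gamma_4\,\sigma_2(1-\beta_2)
\end{pmatrix}^{T}
\]
Deficiency (number of parameters in brackets) under each combination of constraints:
                        Survival constraint
Breeding constraint     σ1=σ2=σ3=σ4   σ1=σ3, σ2=σ4   σ1=σ2, σ3=σ4   σ1,σ2,σ3,σ4
β1=β2=β3=β4             0 (8)         0 (9)          1 (9)          1 (11)
β1=β3, β2=β4            0 (9)         0 (10)         0 (10)         2 (12)
β1=β2, β3=β4            0 (9)         0 (10)         1 (10)         1 (12)
β1,β2,β3,β4             0 (11)        0 (12)         0 (12)         2 (14)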
Multi-state mark–recapture
models
State 1: Breeding site 1
State 2: Breeding site 2
State 3: Non-breeding (unobservable)
Parameters: survival probability; breeding probability; probability of
breeding at site 1, with the complementary probability for breeding site 2.
20/27
Multi-state mark–recapture
models – General Model
• General multi-state model has S states, with the last U states
unobservable, and N years of data.
• Survival probabilities released in year r captured in year c:
• Φt is an S×S matrix of transition probabilities at time t with
transition probabilities φi,j(t) = ai,j(t).
• Pt is an S×S diagonal matrix of probabilities of capture pt.
• pt = 0 for an unobservable state.
21/27
General simpler exhaustive summary
Cole (2012)
r = 10N – 17
d=N+3
22/27
Hybrid Symbolic-Numeric Method
Choquet and Cole (2012)
• Calculate the derivative matrix, 𝑫 = 𝜕𝜿/𝜕𝜽, symbolically.
• Evaluate 𝑫 at a random point 𝜽𝑘 to give 𝑫𝑘.
• Calculate 𝑟𝑘, the rank of 𝑫𝑘.
• Repeat for 5 random points, then 𝑟 = max 𝑟𝑘.
• If the model is parameter redundant, for any 𝑫𝑘 with 𝑟𝑘 = 𝑟
solve 𝜶′𝑘 𝑫𝑘 = 0. The zeros in 𝜶𝑘 indicate positions of
parameters that can be estimated.
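The steps above can be sketched for the mark-recovery model using SymPy for the symbolic derivative and NumPy for the numerical rank. This is an illustration under stated assumptions, not the implementation in E-surge or M-surge: the exhaustive summary is taken as the distinct non-zero entries of P, random points are drawn uniformly on (0.1, 0.9), and 1e-8 is a hypothetical tolerance for treating an entry of 𝜶 as zero.

```python
import numpy as np
import sympy as sp

phi1, phia, lam1, lama = sp.symbols("phi1 phia lam1 lama", positive=True)
theta = [phi1, phia, lam1, lama]
kappa = sp.Matrix([
    (1 - phi1) * lam1,
    phi1 * (1 - phia) * lama,
    phi1 * phia * (1 - phia) * lama,
])

# Step 1: derivative matrix, calculated symbolically (rows index parameters).
D_sym = kappa.jacobian(theta).T
D_num = sp.lambdify(theta, D_sym, "numpy")

# Steps 2-4: evaluate at 5 random points and take the maximum rank.
rng = np.random.default_rng(1)
results = []
for _ in range(5):
    point = rng.uniform(0.1, 0.9, size=len(theta))
    Dk = np.array(D_num(*point), dtype=float)
    results.append((int(np.linalg.matrix_rank(Dk)), Dk))
r = max(rk for rk, _ in results)
deficiency = len(theta) - r

# Step 5: solve alpha' Dk = 0 via the SVD of Dk' for a point with rk = r.
_, Dk = next(item for item in results if item[0] == r)
_, _, vt = np.linalg.svd(Dk.T)
alpha = vt[r:].T  # columns span the null space

# Rows of alpha that are numerically zero mark estimable parameters.
estimable = [str(t) for t, row in zip(theta, alpha)
             if np.all(np.abs(row) < 1e-8)]
print("rank =", r, "deficiency =", deficiency, "estimable:", estimable)
```

Because the rank is evaluated at numerical points, the result applies to the model as specified and does not by itself give the estimable parameter combinations.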
23/27
Example – multi-site capture-recapture model
• The capture-recapture models can be extended to studies
with multiple sites (Brownie et al, 1993).
• Example: Canada Geese in 3 different geographical regions, T = 6 years.
• Geese tend to return to the same site (memory model).
• Initial state probabilities: π_j^(t) for j = 1,2 and t = 1,…,6 (π_3^(t) = 1 − π_1^(t) − π_2^(t)).
• Transition probabilities: φ_{*ij}^(t) for i,j = 1,2,3 and t = 1,…,5, and φ_{ijk}^(t) for i,j,k =
1,2,3 and t = 2,…,5.
• Capture probabilities: p_j^(t) for j = 1,2,3 and t = 2,…,6. (p = 180 parameters)
• (General simpler exhaustive summary, Cole et al, 2014)
24/27
Example – Occupancy models
(Hubbard et al, in prep)
• Robust design used to remove parameter redundancy (PR) in occupancy models.
• Monitoring of amphibians in the Yellowstone and Grand Teton
National Parks, USA (Gould et al, 2012).
• Two species: Columbian Spotted Frogs and Boreal Chorus Frogs.
• 𝜓 occupancy probabilities, 𝑝 detection probabilities.
• (s) dependence on site, (t) dependence on time, (∙) dependent
on neither site nor time.
Model            Rank   Deficiency   No. pars
𝜓(∙)𝑝(∙)         20     0            20
𝜓(s)𝑝(∙)         65     0            65
𝜓(∙)𝑝(s)         35     0            35
𝜓(t)𝑝(t)         59     0            59
𝜓(t,s)𝑝(∙)       161    17           178
𝜓(t,s)𝑝(t)       176    17           193
𝜓(t,s)𝑝(t,s)     236    67           303
25/27
Conclusion
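Comparison of the Numeric, Symbolic and Hybrid-Symbolic methods:
• Accurate / correct answer. Numeric: Not always. Symbolic: Yes. Hybrid-Symbolic: Yes.
• General results (e.g. any no. of years). Numeric: No. Symbolic: Yes. Hybrid-Symbolic: Work in progress.
• Easy to use (e.g. for an ecologist). Numeric: Yes. Symbolic: No, but can develop simpler ex. sum. Hybrid-Symbolic: Yes.
• Possible to add to existing computer packages. Numeric: Yes. Symbolic: No (needs symbolic algebra). Hybrid-Symbolic: Yes (E-surge and M-surge).
• Individually identifiable parameters. Numeric: No. Symbolic: Yes. Hybrid-Symbolic: Yes.
• Estimable parameter combinations. Numeric: No. Symbolic: Yes. Hybrid-Symbolic: In the future?
• Symbolic: best for intrinsic PR and general results. Hybrid-Symbolic: best for extrinsic PR and a quick result.
26/27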
References
http://www.kent.ac.uk/smsas/personal/djc24/parameterredundancy.htm
Brownie, C., Hines, J., Nichols, J. et al (1993) Biometrics, 49, p1173.
Catchpole, E. A. and Morgan, B. J. T. (1997) Biometrika, 84, 187-196.
Catchpole, E. A., Morgan, B. J. T. and Freeman, S. N. (1998) Biometrika, 85, 462-468.
Chappell, M. J. and Gunn, R. N. (1998) Mathematical Biosciences, 148, 21-41.
Choquet, R. and Cole, D. J. (2012) Mathematical Biosciences, 236, p117.
Cole, D. J. and Morgan, B. J. T. (2010) JABES, 15, 431-434.
Cole, D. J., Morgan, B. J. T. and Titterington, D. M. (2010) Mathematical Biosciences, 228, 16-30.
Cole, D. J. (2012) Journal of Ornithology, 152, S305-S315.
Cole, D. J., Morgan, B. J. T., Catchpole, E. A. and Hubbard, B. A. (2012) Biometrical Journal, 54, 507-523.
Cole, D. J., Morgan, B. J. T., McCrea, R. S., Pradel, R., Gimenez, O. and Choquet, R. (2014) Ecology and Evolution, 4, 2124-2133.
Evans, N. D. and Chappell, M. J. (2000) Mathematical Biosciences, 168, 137-159.
Gould, W. R., Patla, D. A., Daley, R., et al (2012) Wetlands, 32, p379.
Goodman, L. A. (1974) Biometrika, 61, 215-231.
Hunter, C. M. and Caswell, H. (2009) Ecological and Environmental Statistics, 3, 797-825.
Jiang, H., Pollock, K. H., Brownie, C., et al (2007) JABES, 12, 177-194.
Pohjanpalo, H. (1982) Technical Research Centre of Finland Research Report No. 56.
Rothenberg, T. J. (1971) Econometrica, 39, 577-591.
Shapiro, A. (1986) Journal of the American Statistical Association, 81, 142-149.
Viallefont, A., Lebreton, J. D., Reboulet, A. M. and Gory, G. (1998) Biometrical Journal, 40, 313-325.
27/27