Statistical modelling and latent variables (2)
Mixing latent variables and parameters in statistical inference
Trond Reitan (Division of statistics and insurance mathematics, Department of Mathematics, University of Oslo)

State spaces
We typically have a parametric model for the latent variables, which represent the true state of a system. The distribution of the observations may depend on parameters as well as on the latent variables. Observations can often be seen as noisy versions of the actual state of the system.
Examples of states could be:
1. The physical state of a rocket (position, orientation, velocity, fuel state).
2. Real water temperature (as opposed to measured temperature).
3. Occupancy in an area.
4. Carrying capacity in an area.
[Figure: graphical model connecting the parameters θ, the latent state L and the data D. Green arrows mark one-way parametric dependencies, for which no probability distribution is provided in frequentist statistics.]

Observations, latent variables and parameters – inference
Sometimes we are interested in the parameters, sometimes in the state of the latent variables, and sometimes in both. It is impossible to do inference on the latent variables without also dealing with the parameters, and vice versa. Often, the parameters that affect the latent variables are not the same as those that affect the observations.
[Figure: graphical models with separate sets of parameters pointing to the latent variables L and to the data D.]

Observations, latent variables and parameters – ML estimation
A latent variable model specifies the distribution of the latent variables given the parameters, and the distribution of the observations given both the parameters and the latent variables. This gives the distribution of data *and* latent variables:
f(D, L | θ) = f(L | θ) f(D | L, θ)
But in an ML analysis, we want the likelihood, f(D | θ)!
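As a sketch of this factorization, take the water-temperature example (state 2 above): suppose the true temperature L is normally distributed and the measurement D is L plus normal noise. All numerical values below (mu, tau, sigma, the observation) are hypothetical illustration values, not from the slides; the likelihood f(D | θ) is obtained by integrating the latent variable out, here by a crude Riemann sum.

```python
import math

def norm_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical parameters theta = (mu, tau, sigma):
mu, tau = 10.0, 2.0    # mean and sd of the true temperature L
sigma = 0.5            # measurement-noise sd of the observation D given L

D = 11.3               # one hypothetical observed temperature

# f(D, L | theta) = f(L | theta) * f(D | L, theta), evaluated at one latent value:
L = 11.0
joint = norm_pdf(L, mu, tau) * norm_pdf(D, L, sigma)

# The likelihood f(D | theta) integrates L out (law of total probability):
step = 0.001
lo, hi = mu - 8.0 * tau, mu + 8.0 * tau
n_steps = int((hi - lo) / step)
lik_numeric = sum(norm_pdf(lo + i * step, mu, tau) * norm_pdf(D, lo + i * step, sigma)
                  for i in range(n_steps)) * step

# For this linear-normal model the integral is also analytic,
# D | theta ~ N(mu, sqrt(tau^2 + sigma^2)) -- the Kalman-filter situation:
lik_analytic = norm_pdf(D, mu, math.sqrt(tau ** 2 + sigma ** 2))
```

The numerical and analytic likelihoods agree here; the numerical route is what remains available when the model is not linear-normal.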
Observations, latent variables and parameters – ML estimation
Theory (law of total probability again). For discrete latent variables:
Pr(D | θ) = Σ_L Pr(D, L | θ) = Σ_L Pr(D | L, θ) Pr(L | θ)
or, for continuous latent variables:
f(D | θ) = ∫ f(D, L | θ) dL = ∫ f(D | L, θ) f(L | θ) dL
This sum or integral is the likelihood. It can often not be obtained analytically.
In occupancy models, the sum is easy (only two possible states).
Kalman filter: for latent variables forming a linear normal Markov chain, with normal observations depending linearly on them, the integral can be done analytically.
Alternatives when analytical methods fail: numerical integration, particle filters, Bayesian statistics using MCMC.

Occupancy as a state-space model – the model in words
Assume a set of areas, i = 1, …, A. Each area has a set of n_i transects. Each transect has an independent detection probability, p, given occupancy. Occupancy, ψ_i, is a latent variable for each area i. Assume independence between the occupancy states in different areas. The probability of occupancy is labelled ψ, so that Pr(ψ_i = 1 | θ) = ψ and the parameters are θ = (p, ψ).
Start with the distribution of the observations given the latent variable:
Pr(x_{i,j} = 1 | ψ_i = 1, θ) = p,  Pr(x_{i,j} = 0 | ψ_i = 1, θ) = 1 − p,
Pr(x_{i,j} = 1 | ψ_i = 0, θ) = 0,  Pr(x_{i,j} = 0 | ψ_i = 0, θ) = 1.
So, for 5 transects with outcome 00101, we get
Pr(00101 | ψ_i = 1, θ) = (1 − p)(1 − p) p (1 − p) p = p²(1 − p)³
Pr(00101 | ψ_i = 0, θ) = 1 · 1 · 0 · 1 · 0 = 0.

Occupancy as a state-space model – graphic model
[Figure: one latent variable per area, ψ_1, ψ_2, ψ_3, …, ψ_A (area occupancy), with Pr(ψ_i = 1 | θ) = ψ and Pr(ψ_i = 0 | θ) = 1 − ψ. The data x_{1,1}, x_{1,2}, x_{1,3}, …, x_{1,n_1} are detections in single transects, depending on the occupancy of their area.]
Parameters (θ): ψ = occupancy rate, p = detection rate given occupancy.
The area occupancies are independent, and the detections are independent *conditioned* on the occupancy. It is important to keep such things in mind when modelling!
PS: What we have done so far is enough to start analyzing using WinBUGS.
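Because the latent occupancy ψ_i takes only two values, the law-of-total-probability sum has just two terms. A minimal sketch, using the 00101 transect outcome from the slide; the parameter values p = 0.6 and ψ = 0.6 are hypothetical:

```python
def pr_outcome_given_state(outcome, state, p):
    """Pr(outcome | psi_i = state, theta): transects are independent given the state.
    Given occupancy (state 1) each transect detects with probability p;
    given non-occupancy (state 0) detection is impossible."""
    prob = 1.0
    for x in outcome:
        if state == 1:
            prob *= p if x == 1 else (1.0 - p)
        else:
            prob *= 0.0 if x == 1 else 1.0
    return prob

def pr_outcome(outcome, p, psi):
    """Law of total probability: sum over the two latent states."""
    return (pr_outcome_given_state(outcome, 1, p) * psi
            + pr_outcome_given_state(outcome, 0, p) * (1.0 - psi))

p, psi = 0.6, 0.6                            # hypothetical parameter values
print(pr_outcome([0, 0, 1, 0, 1], p, psi))   # = psi * p^2 * (1-p)^3
print(pr_outcome([0, 0, 0, 0, 0], p, psi))   # = psi * (1-p)^5 + (1 - psi)
```

Note how an all-zero outcome picks up probability from both latent states, while any outcome with a detection gets its probability from the occupied state alone.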
Occupancy as a state-space model – probability distribution for a set of transects
The probability for a set of transects to give k_i > 0 detections in a given order is
Pr(k_i | ψ_i = 1, θ) = p^{k_i}(1 − p)^{n_i − k_i},  Pr(k_i | ψ_i = 0, θ) = 0,
while with no detections
Pr(k_i = 0 | ψ_i = 1, θ) = (1 − p)^{n_i},  Pr(k_i = 0 | ψ_i = 0, θ) = 1.
We can represent this more compactly if we introduce the indicator function: I(A) = 1 if A is true, I(A) = 0 if A is false. Then
Pr(k_i | ψ_i = 0, θ) = I(k_i = 0).
With no given order of the k_i detections, we pick up the binomial coefficient:
Pr(k_i | ψ_i = 1, θ) = C(n_i, k_i) p^{k_i}(1 − p)^{n_i − k_i},  Pr(k_i | ψ_i = 0, θ) = I(k_i = 0).
(The coefficient is not relevant for inference: for a given dataset, the constant is just "sitting" there.)

Occupancy as a state-space model – area-specific marginal detection probability (likelihood)
For a given area with an unknown occupancy state, the detection probability is then (law of total probability):
Pr(k_i | θ) = Pr(k_i | ψ_i = 1, θ) Pr(ψ_i = 1 | θ) + Pr(k_i | ψ_i = 0, θ) Pr(ψ_i = 0 | θ)
            = C(n_i, k_i) p^{k_i}(1 − p)^{n_i − k_i} ψ + I(k_i = 0)(1 − ψ)
[Figure: binomial distribution (p = 0.6) compared with the occupancy distribution (p = 0.6, ψ = 0.6).]
Occupancy is a zero-inflated binomial model.

Occupancy as a state-space model – full likelihood
Each area is independent, so the full likelihood is:
Pr(k_1, …, k_A | θ) = Π_{i=1}^{A} [ C(n_i, k_i) p^{k_i}(1 − p)^{n_i − k_i} ψ + I(k_i = 0)(1 − ψ) ]
We can now do inference on the parameters, θ = (p, ψ), using ML estimation (or using Bayesian statistics).

Occupancy as a state-space model – occupancy inference
Inference on ψ_i given the parameters, Pr(ψ_i = 1 | k_i, θ), uses Bayes' theorem:
Pr(ψ_i = 1 | k_i, θ) = Pr(k_i | ψ_i = 1, θ) Pr(ψ_i = 1 | θ) / [Pr(k_i | ψ_i = 1, θ) Pr(ψ_i = 1 | θ) + Pr(k_i | ψ_i = 0, θ) Pr(ψ_i = 0 | θ)]
This equals 100 % for k_i ≥ 1, while for k_i = 0:
Pr(ψ_i = 1 | k_i = 0, θ) = (1 − p)^{n_i} ψ / [(1 − p)^{n_i} ψ + (1 − ψ)]
PS: We pretend that θ is known here. However, θ is estimated from the data and is not certain at all. We are using the data twice: once to estimate θ and once to do inference on the latent variables. This is avoided in a Bayesian setting.
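The full zero-inflated binomial likelihood and the Bayes-theorem occupancy inference can be sketched as follows. The detection counts are made-up illustration data, and a crude grid search stands in for a proper ML optimizer:

```python
import math

def area_lik(k, n, p, psi):
    """Pr(k_i | theta): zero-inflated binomial term for one area."""
    binom = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
    return binom * psi + (1.0 - psi) * (1 if k == 0 else 0)

def log_lik(ks, ns, p, psi):
    """Full log-likelihood: areas are independent, so their terms multiply."""
    return sum(math.log(area_lik(k, n, p, psi)) for k, n in zip(ks, ns))

# Hypothetical detection counts k_i out of n_i = 5 transects in A = 6 areas:
ks = [3, 0, 2, 4, 0, 1]
ns = [5, 5, 5, 5, 5, 5]

# Crude ML estimation of theta = (p, psi) by grid search:
grid = [i / 100.0 for i in range(1, 100)]
p_hat, psi_hat = max(((p, psi) for p in grid for psi in grid),
                     key=lambda t: log_lik(ks, ns, t[0], t[1]))

def pr_occupied(k, n, p, psi):
    """Pr(psi_i = 1 | k_i, theta) by Bayes' theorem; equals 1 whenever k_i >= 1."""
    occupied_part = area_lik(k, n, p, psi) - (1.0 - psi) * (1 if k == 0 else 0)
    return occupied_part / area_lik(k, n, p, psi)

print(p_hat, psi_hat)
print(pr_occupied(3, 5, p_hat, psi_hat))   # 1.0: any detection proves occupancy
print(pr_occupied(0, 5, p_hat, psi_hat))   # (1-p)^5 psi / ((1-p)^5 psi + 1 - psi)
```

Plugging the estimate θ̂ = (p̂, ψ̂) into pr_occupied is exactly the "using data twice" step criticized on the slide; a Bayesian analysis would instead average Pr(ψ_i = 1 | k_i, θ) over the posterior of θ.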