Model 1: Repeatable independent trials

Model 1: Repeatable independent trials
with a free (unknown) success rate, p.
With a given success rate, p, and the data
having k successes out of n trials, the
probability of getting them in a specific
order is:
Pr(data | p, model 1) = p^{k} (1-p)^{n-k}
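As a minimal sketch of the formula above, the following Python snippet evaluates the probability of one specific ordered sequence of 0/1 outcomes for a given p. The function name `sequence_probability` and the example sequence are illustrative assumptions, not part of the original material.

```python
def sequence_probability(outcomes, p):
    """Probability of this exact ordered 0/1 sequence under
    independent trials with success rate p."""
    k = sum(outcomes)   # number of successes
    n = len(outcomes)   # number of trials
    return p**k * (1 - p)**(n - k)

# Hypothetical example: 2 successes in 5 trials with p = 0.2
example = [0, 1, 0, 0, 1]
prob = sequence_probability(example, 0.2)
print(prob)
```

Note that the order of the outcomes does not matter for the value: only k and n enter the formula.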
General outcomes:
Xi = outcome on day i (either 0 or 1 in our case)
Dataset: D={X1,X2,X3,...,Xn}
Pr(D)=Pr(X1,X2,X3,...,Xn)=
Pr(X2,X3,...,Xn|X1) * Pr(X1)=
Pr(X3,...,Xn|X2,X1)* Pr(X2|X1) * Pr(X1)=
...
Pr(X1) * Pr(X2|X1)Pr(X3|X1,X2) * ... *
Pr(Xn | Xn-1, ..., X2, X1)
Low prediction strength for the data
compared to other models
=>
Model probability goes down.
Markov chains:
Pr(Xn | Xn-1, Xn-2, .... , X2, X1) =
Pr(Xn | Xn-1)
Pr(X10, X12 | X11) =
Pr(X12 | X11, X10) Pr(X10 | X11) =    (using general conditional probability)
Pr(X12 | X11) Pr(X10 | X11)           (using the Markov property)
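The factorisation above says that, given the middle day, the day before and the day after are independent. A small numerical check, under assumed values, could look like the sketch below. The transition probabilities `p1` and `p2` and the starting probability `p_first` are hypothetical numbers chosen only for illustration.

```python
from itertools import product

# Hypothetical values, assumed for illustration only
p1 = 0.34       # Pr(rain | rain the previous day)
p2 = 0.13       # Pr(rain | no rain the previous day)
p_first = 0.5   # arbitrary Pr(rain) on the first of three days

def trans(prev, cur):
    """Pr(cur | prev) for 0/1 outcomes."""
    p_rain = p1 if prev == 1 else p2
    return p_rain if cur == 1 else 1 - p_rain

def start(a):
    """Pr(first day = a)."""
    return p_first if a == 1 else 1 - p_first

# Joint distribution of three consecutive days (a, b, c)
joint = {(a, b, c): start(a) * trans(a, b) * trans(b, c)
         for a, b, c in product((0, 1), repeat=3)}

max_gap = 0.0
for b in (0, 1):
    pr_b = sum(v for (_, b_, _), v in joint.items() if b_ == b)
    for a, c in product((0, 1), repeat=2):
        lhs = joint[(a, b, c)] / pr_b                             # Pr(a, c | b)
        pr_a_given_b = sum(joint[(a, b, cc)] for cc in (0, 1)) / pr_b
        pr_c_given_b = sum(joint[(aa, b, c)] for aa in (0, 1)) / pr_b
        max_gap = max(max_gap, abs(lhs - pr_a_given_b * pr_c_given_b))

print(max_gap)  # largest difference between the two sides
```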
[Diagram: the outcomes X1, X2, X3, ..., Xn-1, Xn drawn as a chain.]
In general:
Pr(X1,X2,X3,...,Xn) =
Pr(X1) * Pr(X2|X1) * Pr(X3|X1,X2) * ... * Pr(Xn | Xn-1, ..., X2, X1) =
Pr(X1) * Pr(X2|X1) * Pr(X3|X2) * ... * Pr(Xn | Xn-1)    (because of the Markov property)
Stationarity:
Pr(outcome X on day i | outcome Y on day i-1) =
Pr(outcome X on day j | outcome Y on day j-1)
for any two days i and j.
Can now define:
p1=Pr(rain | rain the previous day)
p2=Pr(rain | no rain the previous day)
Problem: What happens on the first day?
Start with rule 6 to get the
unconditional probability of rain on a given day:
Pr(rain on day i) =
Pr(rain on day i | rain on day i-1) Pr(rain on day i-1) +
Pr(rain on day i | no rain on day i-1) Pr(no rain on day i-1)
Assume a stationary unconditional probability,
p'=Pr(rain on any given day).
Setting Pr(rain on day i) = Pr(rain on day i-1) = p' then gives
p' = p1 * p' + p2 * (1-p')
=>
p' = p2 / (1-p1+p2)
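The closed-form solution for p' can be cross-checked by simply iterating the day-to-day update until it settles. The sketch below does this for assumed values of p1 and p2. The numbers and the function name `stationary_rain_probability` are hypothetical.

```python
def stationary_rain_probability(p1, p2):
    """Closed-form stationary probability p' = p2 / (1 - p1 + p2)."""
    return p2 / (1 - p1 + p2)

# Hypothetical transition probabilities, for illustration only
p1, p2 = 0.34, 0.13
p_stat = stationary_rain_probability(p1, p2)

# Cross-check: apply Pr(rain today) = p1*Pr(rain yesterday) + p2*Pr(no rain yesterday)
# repeatedly, starting from an arbitrary value.
p_day = 0.5
for _ in range(200):
    p_day = p1 * p_day + p2 * (1 - p_day)

print(p_stat, p_day)
```

The iteration forgets its starting value quickly because each step shrinks the distance to p' by a factor of p1-p2.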
Data likelihood under Markov model (model 2)
given the parameters p1 and p2:
Pr(X2, X3, ... , Xn | X1) =
Pr(X2 | X1) Pr(X3 | X2) ... Pr(Xn | Xn-1) =
p1^{k1} (1-p1)^{n1-k1} p2^{k2} (1-p2)^{n2-k2}
when we have k1 rainy days out of n1 days with
rain the previous day
and k2 rainy days out of n2 days with no rain the
previous day.
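To make the counts k1, n1, k2 and n2 concrete, here is a minimal sketch that extracts them from a 0/1 sequence and evaluates the conditional log-likelihood. The function names, the short example sequence and the parameter values are assumptions made for illustration. The log is used only to avoid numerical underflow for long sequences.

```python
import math

def transition_counts(x):
    """Return (k1, n1, k2, n2) for a 0/1 sequence x:
    n1 = days following a rainy day, k1 = rainy days among those,
    n2 = days following a dry day,   k2 = rainy days among those."""
    k1 = n1 = k2 = n2 = 0
    for prev, cur in zip(x, x[1:]):
        if prev == 1:
            n1 += 1
            k1 += cur
        else:
            n2 += 1
            k2 += cur
    return k1, n1, k2, n2

def conditional_log_likelihood(x, p1, p2):
    """log Pr(X2, ..., Xn | X1); requires 0 < p1 < 1 and 0 < p2 < 1."""
    k1, n1, k2, n2 = transition_counts(x)
    return (k1 * math.log(p1) + (n1 - k1) * math.log(1 - p1)
            + k2 * math.log(p2) + (n2 - k2) * math.log(1 - p2))

# Hypothetical one-week sequence (1 = rain, 0 = no rain)
x = [0, 1, 1, 0, 0, 1, 0]
counts = transition_counts(x)
loglik = conditional_log_likelihood(x, 0.34, 0.13)
print(counts, loglik)
```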
Pr(X1, X2, X3, ... , Xn) =
p' p1^{k1} (1-p1)^{n1-k1} p2^{k2} (1-p2)^{n2-k2}        if it rained on day 1
(1-p') p1^{k1} (1-p1)^{n1-k1} p2^{k2} (1-p2)^{n2-k2}    if no rain on day 1
= p'^{X1} (1-p')^{1-X1} p1^{k1} (1-p1)^{n1-k1} p2^{k2} (1-p2)^{n2-k2}
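The single expression above covers both first-day cases through the exponents X1 and 1-X1. A sketch of how it might be evaluated on the log scale follows. The function name `markov_log_likelihood` and the small example counts are hypothetical, and p' is taken to be the stationary value p2/(1-p1+p2).

```python
import math

def markov_log_likelihood(x1, k1, n1, k2, n2, p1, p2):
    """log Pr(X1, ..., Xn) under the stationary Markov model.
    Requires 0 < p1 < 1 and 0 < p2 < 1; x1 is 0 or 1."""
    p_stat = p2 / (1 - p1 + p2)   # stationary probability p'
    first_day = x1 * math.log(p_stat) + (1 - x1) * math.log(1 - p_stat)
    transitions = (k1 * math.log(p1) + (n1 - k1) * math.log(1 - p1)
                   + k2 * math.log(p2) + (n2 - k2) * math.log(1 - p2))
    return first_day + transitions

# Hypothetical counts and parameters, for illustration only
ll_rain = markov_log_likelihood(1, 1, 3, 2, 3, 0.34, 0.13)
ll_dry = markov_log_likelihood(0, 1, 3, 2, 3, 0.34, 0.13)
print(ll_rain, ll_dry)
```

The two values differ only by the first-day term, log(p') versus log(1-p').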
Data:
k=596 rainy days of n=3652 days in total
Data for the Markov chain (model 2):
X1=0 (no rain on day 1)
k1=202 rainy days out of n1=596 days when it
rained the previous day
k2=393 rainy days out of n2=3055 days with no
rain the previous day
Data likelihoods:
Pr(D | Model 1) =
∑p Pr(D | p,M1) Pr(p | M1)
Pr(D | Model 2) =
∑p1,p2 Pr(D | p1,p2,M2) Pr(p1|M2) Pr(p2|M2)
Bayes factor:
B = Pr(D|M1) / Pr(D|M2) = 2.32*10^{-29}
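A rough way to see where a Bayes factor of this order comes from is sketched below. It rests on two assumptions that the slides do not state: uniform priors on p, p1 and p2, and a Model 2 likelihood that conditions on the first day (dropping the p' term so that the integrals have closed Beta-function forms). Because of these simplifications the result should only be expected to land near the order of magnitude quoted above, not to reproduce it exactly.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n):
    """log of the integral of p^k (1-p)^(n-k) over a uniform prior on [0, 1],
    which equals log B(k+1, n-k+1)."""
    return log_beta(k + 1, n - k + 1)

# Counts taken from the data slides
k, n = 596, 3652
k1, n1 = 202, 596
k2, n2 = 393, 3055

log_m1 = log_marginal(k, n)                        # Model 1
log_m2 = log_marginal(k1, n1) + log_marginal(k2, n2)  # Model 2, conditional on day 1
log10_bf = (log_m1 - log_m2) / math.log(10)
print(log10_bf)  # base-10 logarithm of the approximate Bayes factor
```

Working with `math.lgamma` keeps everything on the log scale, since the marginal likelihoods themselves are far too small to represent as ordinary floats.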
[Figure: posterior distributions Pr(p1 | D) (rain the previous day) and Pr(p2 | D) (no rain the previous day), and the posterior distribution of the difference p1-p2, plotted over roughly 0.10 to 0.30.]
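The location and width of such posteriors can be approximated with a short calculation. The sketch below assumes uniform priors and ignores the first-day term, in which case the posteriors of p1 and p2 are independent Beta distributions. Both assumptions are mine, not something the slides state.

```python
import math

def beta_posterior_summary(k, n):
    """Mean and standard deviation of Beta(k+1, n-k+1), the posterior
    for a success rate under a uniform prior (an assumption here)."""
    a, b = k + 1, n - k + 1
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

mean_p1, sd_p1 = beta_posterior_summary(202, 596)    # rain the previous day
mean_p2, sd_p2 = beta_posterior_summary(393, 3055)   # no rain the previous day

# For independent posteriors, means subtract and variances add
diff_mean = mean_p1 - mean_p2
diff_sd = math.sqrt(sd_p1**2 + sd_p2**2)
print(mean_p1, mean_p2, diff_mean, diff_sd)
```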
Covariance between outcomes on two
consecutive days (under the Markov chain):
Cov(Xi, Xi-1) = E((Xi-E(Xi))(Xi-1-E(Xi-1))) =
E((Xi-p')(Xi-1-p')) = E(Xi*Xi-1) - E(Xi)p' - E(Xi-1)p' + p'^2 =
E(Xi*Xi-1) - p'^2 =
Pr(Xi*Xi-1=1) - p'^2 =
Pr(Xi=1 | Xi-1=1) Pr(Xi-1=1) - p'^2 =
p1*p' - p'^2 =
p' (p1-p')
Stop here if you want to go
through the algebra
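The algebra can also be checked numerically by writing out the joint distribution of two consecutive days and computing the covariance directly from its definition. The values of p1 and p2 below are hypothetical.

```python
# Hypothetical transition probabilities, for illustration only
p1, p2 = 0.34, 0.13
p_stat = p2 / (1 - p1 + p2)   # stationary probability p'

# Joint distribution of (previous day, current day) under stationarity
joint = {
    (1, 1): p_stat * p1,
    (1, 0): p_stat * (1 - p1),
    (0, 1): (1 - p_stat) * p2,
    (0, 0): (1 - p_stat) * (1 - p2),
}

mean_prev = sum(prev * pr for (prev, cur), pr in joint.items())
mean_cur = sum(cur * pr for (prev, cur), pr in joint.items())

# Covariance straight from the definition E((X - E X)(Y - E Y))
cov = sum((prev - mean_prev) * (cur - mean_cur) * pr
          for (prev, cur), pr in joint.items())

closed_form = p_stat * (p1 - p_stat)
print(cov, closed_form)
```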
Definition of correlation:
Corr(X,Y) = Cov(X,Y) / (sd(X) sd(Y))
where sd() is the standard deviation, the
square root of the variance.
In our case:
Var(Xi)=p'(1-p') (see clip 18)
Cov(Xi, Xi-1) = p' (p1-p')
=>
Corr(Xi, Xi-1) = (p1-p')/(1-p')
Note: p1=p2=p'
=> zero correlation
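Substituting p' = p2/(1-p1+p2) into (p1-p')/(1-p') appears to simplify the correlation to p1-p2, which also makes the zero-correlation case p1=p2 immediate. The sketch below checks both statements numerically. The function name `lag1_correlation` and the input values are hypothetical.

```python
def lag1_correlation(p1, p2):
    """Corr(Xi, Xi-1) = (p1 - p') / (1 - p') with p' = p2 / (1 - p1 + p2).
    Requires p1 < 1 so that 1 - p' is non-zero."""
    p_stat = p2 / (1 - p1 + p2)
    return (p1 - p_stat) / (1 - p_stat)

# Hypothetical values, for illustration only
corr = lag1_correlation(0.34, 0.13)
corr_equal = lag1_correlation(0.2, 0.2)   # p1 = p2: independence
print(corr, corr_equal)
```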
[Figure: posterior distribution of the correlation, Pr(correlation | D), with the expectancy and the 95% credibility interval marked.]
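A posterior summary of this kind can be approximated by simulation. The sketch below assumes uniform priors and independent Beta posteriors for p1 and p2 (ignoring the first-day term), draws from them, and transforms each draw into a correlation. These modelling choices, the seed and the number of draws are assumptions, so the numbers will not necessarily match the figure exactly.

```python
import random

random.seed(1)      # fixed seed so the sketch is reproducible
n_draws = 20000

draws = []
for _ in range(n_draws):
    # Beta(k+1, n-k+1) posteriors under an assumed uniform prior
    p1 = random.betavariate(202 + 1, 596 - 202 + 1)
    p2 = random.betavariate(393 + 1, 3055 - 393 + 1)
    p_stat = p2 / (1 - p1 + p2)
    draws.append((p1 - p_stat) / (1 - p_stat))   # correlation for this draw

draws.sort()
post_mean = sum(draws) / n_draws                  # posterior expectancy
lower = draws[int(0.025 * n_draws)]               # 2.5% quantile
upper = draws[int(0.975 * n_draws) - 1]           # 97.5% quantile
print(post_mean, lower, upper)
```

The interval here is the equal-tailed one, taken from the 2.5% and 97.5% quantiles of the sorted draws.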