Data analysis
Ben Graham
MA930, University of Warwick
October 19, 2015
Intro
- MLE
- Confidence intervals
- Bayesian credible intervals
- p-values
- Hypothesis testing
6.2 Sufficient statistics
- Def 6.2.1 A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.
- Thm 6.2.2 If f(x | θ)/f(T(x) | θ) is constant as a function of θ, then T(X) is sufficient.
- Thm 6.2.6 (Factorisation) T(X) is sufficient iff f(x | θ) = g(T(x) | θ) h(x) for some g, h.
- Example: Independent Xi ∼ Bernoulli(θ), θ ∈ (0, 1) (worked factorisation below).
- Example: Independent X1, . . . , XN ∼ Uniform(0, θ), θ > 0.
- Example: Independent X1, . . . , Xn ∼ N(θ, σ²), θ ∈ R.
- Example: Independent X1, . . . , Xn ∼ N(θ1, θ2²), θ1 ∈ R, θ2 > 0.
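For the Bernoulli example, a worked factorisation (a sketch of the standard argument, not spelled out on the slide), with T(x) = Σi xi:

```latex
f(x \mid \theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
                 = \underbrace{\theta^{T(x)}(1-\theta)^{\,n-T(x)}}_{g(T(x)\mid\theta)} \cdot \underbrace{1}_{h(x)},
```

so T(X) = Σi Xi is sufficient by Theorem 6.2.6.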
Minimal sufficient statistics
6.3 Likelihood principle
- Random sample X = (X1, . . . , Xn)
- Xi ∼ f(xi | θ), pmf or pdf
- X ∼ ∏i f(xi | θ) = f(x | θ)
- Likelihood function L(θ | x) = f(x | θ)
- Likelihood principle: if L(θ | x)/L(θ | y) does not depend on θ, then the conclusions drawn from x and y should be identical.
Chapter 7 Point estimation

7.2.2 Maximum Likelihood Estimator
- L(θ | x) = ∏i f(xi | θ), θ ∈ R^k
- MLE: the statistic θ̂(x) = arg maxθ L(θ | x)
- Differentiable? Solve ∂L(θ | x)/∂θi = 0, i = 1, . . . , k
- log-likelihood ℓ(θ) = log L(θ)
- Ex: N(µ, σ²) (numerical sketch below)
- Ex: Uniform(0, θ), θ > 0.
- Theorem 7.2.10 (Invariance): the MLE of τ(θ) is τ(θ̂).
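A minimal numerical sketch of the N(µ, σ²) example (my own illustration, not from the slides): maximise the log-likelihood numerically with scipy and compare with the closed-form MLEs, the sample mean and the divide-by-n variance.

```python
# Numerical MLE for N(mu, sigma^2) versus the closed-form answers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)      # simulated iid data (assumed model)

def neg_log_lik(params, data):
    mu, log_sigma = params                        # optimise log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (data - mu)**2 / sigma**2)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]), args=(x,))
print(res.x[0], np.exp(res.x[1]))                 # numerical MLE of (mu, sigma)
print(x.mean(), x.std(ddof=0))                    # closed-form MLE: mean, /n standard deviation
```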
Lagrange multipliers
- For maximizing/minimizing f : R^n → R subject to a constraint g : R^n → R, g(x) = 0.
- Example: multinomial distribution (maximise the likelihood over the cell probabilities subject to them summing to 1); a worked sketch follows.
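A worked sketch of the multinomial example (the standard derivation, not spelled out on the slide): maximise the log-likelihood Σi xi log pi subject to g(p) = Σi pi − 1 = 0. With Lagrange multiplier λ,

```latex
\frac{\partial}{\partial p_i}\Bigl[\sum_j x_j \log p_j - \lambda\bigl(\sum_j p_j - 1\bigr)\Bigr]
  = \frac{x_i}{p_i} - \lambda = 0
  \;\Rightarrow\; p_i = \frac{x_i}{\lambda},
\qquad
\sum_i p_i = 1 \;\Rightarrow\; \lambda = \sum_i x_i = n,
\qquad
\hat p_i = \frac{x_i}{n}.
```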
Newton-Raphson method
- For finding roots of an equation f(x) = 0:
  x_{n+1} = x_n − f(x_n)/f′(x_n)
- Ex: √2 is a root of the equation x² − 2 = 0
- f(x) = x² − 2
- x_{n+1} = x_n − (x_n² − 2)/(2x_n)
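A minimal code sketch of the √2 iteration (my own illustration):

```python
# Newton-Raphson for f(x) = x^2 - 2: x_{n+1} = x_n - (x_n^2 - 2) / (2 x_n)
def newton_sqrt2(x0=1.0, n_steps=6):
    x = x0
    for _ in range(n_steps):
        x = x - (x * x - 2.0) / (2.0 * x)   # one Newton step
    return x

print(newton_sqrt2())   # converges very quickly to 1.41421356...
```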
7.2.3 Bayes Estimators
- Parameter θ is random with prior distribution π(θ)
- Joint distribution π(θ)f(x | θ)
- Posterior distribution θ | X: condition the joint distribution on the observed data X.
- Example: θ ∼ Beta(α, β), π(θ) ∝ θ^(α−1)(1 − θ)^(β−1) for θ ∈ (0, 1); X1, . . . , Xn ∼ Bernoulli(θ) (posterior update sketched in code below).
- Conjugate family of priors/posteriors.
- Example 2: Normal prior, normal data.
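A minimal sketch of the Beta-Bernoulli conjugate update (my own illustration): multiplying the prior θ^(α−1)(1 − θ)^(β−1) by the likelihood θ^(Σxi)(1 − θ)^(n−Σxi) gives a Beta(α + Σxi, β + n − Σxi) posterior.

```python
# Conjugate Beta prior with Bernoulli data: the posterior is again a Beta.
import numpy as np

def beta_bernoulli_posterior(x, alpha, beta):
    """Posterior parameters after observing 0/1 data x under a Beta(alpha, beta) prior."""
    x = np.asarray(x)
    return alpha + x.sum(), beta + len(x) - x.sum()

a_post, b_post = beta_bernoulli_posterior([1, 0, 1, 1, 0, 1], alpha=2.0, beta=2.0)
print(a_post, b_post)   # Beta(6, 4): prior "pseudo-counts" plus observed successes/failures
```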
Bayes Risk
- Loss function L(θ, δ)
- Choose δ = δ(X) to minimize the posterior expected loss
  E_{θ|X}[L(θ, δ)] = ∫ L(θ, δ) f(θ | x) dθ
- This will minimize the Bayes risk E_{θ,X}[L(θ, δ)].
- Quadratic loss L(θ, δ) = (θ − δ)² → posterior mean
- Absolute value loss L(θ, δ) = |θ − δ| → posterior median
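Continuing the Beta posterior sketch above (my own illustration): the Bayes estimates under the two losses are the posterior mean and median.

```python
# Bayes estimates for a Beta(6, 4) posterior under quadratic and absolute-value loss.
from scipy.stats import beta

posterior = beta(6, 4)
print(posterior.mean())     # 0.6   -> minimises posterior expected quadratic loss
print(posterior.median())   # ~0.607 -> minimises posterior expected absolute loss
```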
7.2.4 EM algorithm
- Missing data problem: x = (xo, xm).
- xo observed
- xm missing
- Joint distribution f(xo, xm | θ)
- Want arg maxθ log L(θ | xo).
- EM algorithm: start at some initial guess θ^(0), then iterate

  θ^(r+1) = arg maxθ E_{xm | θ^(r), xo}[ log L(θ | xo, xm) ]
          = arg maxθ ( E_{xm | θ^(r), xo}[ log L(θ | xo) ] + E_{xm | θ^(r), xo}[ log f(xm | θ, xo) ] )

  The second term is maximised by θ = θ^(r), so each EM step cannot decrease log L(θ | xo).
The hard EM algorithm
The hard EM algorithm is, as its name suggests, much easier than the general EM algorithm.
- Split the data x = (xo, xm). You have only observed xo. Start at some θ^(0).
- Iterate θ^(t) → θ^(t+1) by:
  - sampling xm = xm^(t) conditional on θ^(t) and xo, and then
  - setting θ^(t+1) to be the MLE for x = (xo, xm^(t)).
7.2.17 Multiple Poisson Rates
- Parameters β; τ1, . . . , τn
- Observe Xi ∼ Poisson(τi) and Yi ∼ Poisson(βτi)
- i.e. τi = population density at place i, β = disease effect size
- Missing data: suppose X1 is missing.
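A minimal EM sketch for this model (my own illustration, not from the slides): differentiating the complete-data log-likelihood gives the M-step formulas β̂ = Σi yi / Σi xi and τ̂i = (xi + yi)/(1 + β̂), and the E-step replaces the missing x1 by its conditional mean τ1 under the current parameters.

```python
# EM for X_i ~ Poisson(tau_i), Y_i ~ Poisson(beta * tau_i), with x_1 missing.
import numpy as np

def em_poisson_rates(x_obs, y, n_iter=100):
    """x_obs: observed x_2..x_n; y: all y_1..y_n. Returns (beta, tau) estimates."""
    n = len(y)
    beta, tau = 1.0, np.ones(n)                  # arbitrary starting guess theta^(0)
    for _ in range(n_iter):
        x1_hat = tau[0]                          # E-step: E[X_1 | current parameters]
        x_full = np.concatenate(([x1_hat], x_obs))
        beta = y.sum() / x_full.sum()            # M-step: complete-data MLEs
        tau = (x_full + y) / (1.0 + beta)        # with x_1 replaced by its expectation
    return beta, tau

# Hypothetical data, for illustration only
y = np.array([12.0, 7.0, 3.0, 15.0])             # disease counts at 4 places
x_obs = np.array([6.0, 2.0, 11.0])               # population counts at places 2..4 (x_1 missing)
print(em_poisson_rates(x_obs, y))
```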
7.3.1 Mean squared error
- How to measure the quality of an estimator W of θ?
- MSE(W) = Eθ[(W − θ)²] = Varθ[W] + (Biasθ W)²
- Biasθ W = Eθ W − θ
- An estimator is unbiased if Biasθ W = 0.
- The MLE of σ² for N(µ, σ²) is biased (see the check below).
- Small MSE is more important than unbiasedness.
- A sequence of estimators is consistent if Wn → µ in probability as n → ∞.
- Wn → µ in probability if MSE(Wn) → 0.
- Markov's inequality: X ≥ 0 ⟹ P(|X| ≥ a) ≤ E[X]/a
- Chebyshev's inequality: for a r.v. X with mean µ and variance σ², P(|X − µ| ≥ kσ) ≤ k^(−2)
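A quick Monte Carlo check of the biased-MLE bullet above (my own illustration): the MLE of σ² for N(µ, σ²) divides by n rather than n − 1, and E[σ̂²] = (n − 1)σ²/n.

```python
# Simulate many samples of size n and average the /n variance estimator.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
sigma2_mle = samples.var(axis=1, ddof=0)      # divide by n: the MLE

print(sigma2_mle.mean())                      # ~ 3.6, below the true value 4.0
print((n - 1) / n * sigma2)                   # theoretical expectation of the MLE
```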
Fisher's information
- Sample distribution f(x | θ)
- Fisher's information:
  I(θ) = Eθ[ (∂/∂θ log f(X | θ))² ] = −Eθ[ ∂²/∂θ² log f(X | θ) ]
  (the second equality holds under regularity conditions)
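A worked example (standard, not on the slide): for a single Bernoulli(θ) observation, log f(x | θ) = x log θ + (1 − x) log(1 − θ), so

```latex
I(\theta) = -\,\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right]
          = \mathbb{E}_\theta\!\left[\frac{X}{\theta^2}+\frac{1-X}{(1-\theta)^2}\right]
          = \frac{1}{\theta}+\frac{1}{1-\theta}
          = \frac{1}{\theta(1-\theta)}.
```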
Theorem 7.3.9 Cramer-Rao Inequality
- Sample X1, . . . , Xn with pdf f(x | θ). Estimator W(X) with finite variance.
- Assume d/dθ Eθ W(X) = ∫ ∂/∂θ [W(x) f(x | θ)] dx
- Then
  Varθ(W(X)) ≥ (d/dθ Eθ W(X))² / Eθ[ (∂/∂θ log f(X | θ))² ]
- Special case: X1, . . . , Xn iid r.v.: the denominator becomes n I(θ), with I(θ) the Fisher information of a single observation.
Proof
- Cauchy-Schwarz inequality: Cov(X, Y)² ≤ Var(X) Var(Y).
  - Assume wlog E[X] = E[Y] = 0.
  - For all t ∈ R, E[(tX + Y)²] = E[t²X² + 2tXY + Y²] ≥ 0.
- Cramer-Rao: apply Cauchy-Schwarz to W and the score,
  Cov(W, ∂/∂θ log f(X | θ))² ≤ Var(W) Var(∂/∂θ log f(X | θ)).
  Since E[∂/∂θ log f(X | θ)] = 0 and Cov(W, ∂/∂θ log f(X | θ)) = d/dθ Eθ W(X), this gives the bound.
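As a concrete check on the bound (my own example, reusing the Bernoulli Fisher information computed earlier): for X1, . . . , Xn iid Bernoulli(θ), the sample mean X̄ is unbiased for θ and attains the bound, since

```latex
\operatorname{Var}_\theta(\bar X) = \frac{\theta(1-\theta)}{n} = \frac{1}{n\,I(\theta)}.
```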
Ch 8 Hypothesis testing
- A hypothesis is a statement about a population parameter.
- Null hypothesis H0
- Alternative hypothesis H1
- Form a statistical test.
- Can you reject H0?
- Rejecting H0 does not mean accepting H1.
- You do not accept H0.