Data analysis
Ben Graham
MA930, University of Warwick
October 19, 2015

Intro
- MLE
- Confidence intervals
- Bayesian credible intervals
- p-values
- Hypothesis testing

6.2 Sufficient statistics
- Def 6.2.1: A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.
- Thm 6.2.2: If f(x | θ)/f(T(x) | θ) is constant as a function of θ, then T(X) is sufficient.
- Thm 6.2.6 (factorisation theorem): T(X) is sufficient iff f(x | θ) = g(T(x) | θ) h(x) for some g, h.
- Example: Independent X_i ∼ Bernoulli(θ), θ ∈ (0, 1).
- Example: Independent X_1, ..., X_N ∼ Uniform(0, θ), θ > 0.
- Example: Independent X_1, ..., X_n ∼ N(θ, σ²), θ ∈ ℝ.
- Example: Independent X_1, ..., X_n ∼ N(θ_1, θ_2), θ_1 ∈ ℝ, θ_2 > 0.
- Minimal sufficient statistics.

6.3 Likelihood principle
- Random sample X = (X_1, ..., X_n), X_i ∼ f(x_i | θ) (pmf or pdf).
- X ∼ ∏_i f(x_i | θ) = f(x | θ).
- Likelihood function L(θ | x) = f(x | θ).
- Likelihood principle: if L(θ | x)/L(θ | y) does not depend on θ, then the conclusions drawn from x and y should be identical.

Chapter 7: Point estimation

7.2.2 Maximum Likelihood Estimator
- L(θ | x) = ∏_i f(x_i | θ), θ ∈ ℝ^k.
- MLE: the statistic θ̂(x) = arg max_θ L(θ | x).
- Differentiable? Solve ∂L(θ | x)/∂θ_i = 0, i = 1, ..., k.
- Log-likelihood ℓ(θ) = log L(θ).
- Ex: N(µ, σ²).
- Ex: Uniform(0, θ), θ > 0.
- Theorem 7.2.10 (invariance): the MLE of τ(θ) is τ(θ̂).

Lagrange multipliers
- For maximising/minimising f: ℝⁿ → ℝ subject to a constraint g: ℝⁿ → ℝ, g(x) = 0.
- Example: multinomial distribution.

Newton-Raphson method
- For finding roots of an equation f(x) = 0:
  x_{n+1} = x_n − f(x_n)/f′(x_n)
- Ex: √2 is a root of the equation x² − 2 = 0, so take f(x) = x² − 2:
  x_{n+1} = x_n − (x_n² − 2)/(2 x_n)

7.2.3 Bayes Estimators
- Parameter θ is random, with prior distribution π(θ).
- Joint distribution π(θ) f(x | θ).
- Posterior distribution θ | X: condition the joint distribution on the observed data X.
- Example: θ ∼ Beta(α, β), π(θ) ∝ θ^(α−1) (1 − θ)^(β−1) for θ ∈ (0, 1), and X_1, ..., X_n ∼ Bernoulli(θ).
- Conjugate family of priors/posteriors.
- Example 2: normal prior, normal data.

Bayes Risk
- Loss function L(θ, δ).
- Choose δ = δ(X) to minimise the posterior expected loss
  E_{θ|X}[L(θ, δ)] = ∫ L(θ, δ) f(θ | x) dθ.
- This also minimises the Bayes risk E_{θ,X}[L(θ, δ)].
- Quadratic loss L(θ, δ) = (θ − δ)² → posterior mean.
- Absolute value loss L(θ, δ) = |θ − δ| → posterior median.

7.2.4 EM algorithm
- Missing data problem: x = (x_o, x_m).
- x_o observed, x_m missing.
- Joint distribution f(x_o, x_m | θ).
- Want arg max_θ log L(θ | x_o).
- EM algorithm: start at some initial guess θ^(0), then iterate
  θ^(r+1) = arg max_θ E_{x_m | θ^(r), x_o}[log L(θ | x_o, x_m)]
          = arg max_θ ( log L(θ | x_o) + E_{x_m | θ^(r), x_o}[log f(x_m | θ, x_o)] ),
  where the second term is maximised by θ = θ^(r), so each iteration cannot decrease log L(θ | x_o).

The hard EM algorithm
- The hard EM algorithm is, as its name suggests, much easier than the general EM algorithm.
- Split the data x = (x_o, x_m); only x_o is observed. Start at some θ^(0).
- Iterate θ^(t) → θ^(t+1) by:
  - sampling x_m = x_m^(t) conditional on θ^(t) and x_o, and then
  - setting θ^(t+1) to be the MLE for x = (x_o, x_m^(t)).

7.2.17 Multiple Poisson Rates
- Parameters β; τ_1, ..., τ_n.
- Observe X_i ∼ Poisson(τ_i) and Y_i ∼ Poisson(β τ_i),
  i.e. τ_i = population density at place i, β = disease effect size.
- Missing data: suppose X_1 is missing.
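For this Poisson example the EM update has a closed form: setting the derivatives of the complete-data log-likelihood to zero gives β = Σ y_i / Σ x_i and τ_i = (x_i + y_i)/(1 + β), and the E-step only needs E[X_1 | θ^(r)] = τ_1^(r), because the complete-data log-likelihood is linear in the missing count apart from a term that does not involve θ. The following Python sketch is not from the notes (the function name, indexing convention and initial guess are my own choices); it simply iterates these two steps.

import numpy as np

def em_poisson_rates(x_obs, y, n_iter=100):
    """EM sketch for X_i ~ Poisson(tau_i), Y_i ~ Poisson(beta * tau_i), X_1 missing.

    x_obs : observed counts x_2, ..., x_n (length n - 1)
    y     : observed counts y_1, ..., y_n (length n)
    """
    x_obs = np.asarray(x_obs, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, tau = 1.0, np.ones(len(y))          # initial guess theta^(0)
    for _ in range(n_iter):
        # E-step: replace the missing count by its conditional expectation tau_1.
        x_full = np.concatenate(([tau[0]], x_obs))
        # M-step: complete-data MLEs (derivatives of the log-likelihood set to zero).
        beta = y.sum() / x_full.sum()
        tau = (x_full + y) / (1.0 + beta)
    return beta, tau

For example, em_poisson_rates([3, 5, 2], [4, 6, 10, 3]) returns estimates of β and (τ_1, ..., τ_4). Swapping the E-step line for a draw from Poisson(τ_1^(t)) (e.g. rng.poisson(tau[0]) with a NumPy Generator) would give the hard-EM variant described above.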
7.3.1 Mean squared error
- How do we measure the quality of an estimator W of θ?
  MSE(W) = E_θ[(W − θ)²] = Var_θ[W] + (Bias_θ W)²,   where Bias_θ W = E_θ W − θ.
- An estimator is unbiased if Bias_θ W = 0.
- The MLE for σ² for N(µ, σ²) is biased.
- Small MSE is more important than unbiasedness (see the simulation sketch at the end of these notes).
- A sequence of estimators is consistent if W_n → θ in probability as n → ∞.
- W_n → θ in probability if MSE(W_n) → 0.
- Markov's inequality: if X ≥ 0, then P(X ≥ a) ≤ E[X]/a.
- Chebyshev's inequality: for a r.v. X with mean µ and variance σ², P(|X − µ| ≥ kσ) ≤ k⁻².

Fisher's information
- Sample distribution f(x | θ).
- Fisher's information:
  I(θ) = E_θ[(∂/∂θ log f(X | θ))²] = −E_θ[∂²/∂θ² log f(X | θ)],
  the second equality holding under regularity conditions.

Theorem 7.3.9 Cramér-Rao Inequality
- Sample X_1, ..., X_n with pdf f(x | θ); estimator W(X) with finite variance.
- Assume (d/dθ) E_θ W(X) = ∫_X (∂/∂θ)[W(x) f(x | θ)] dx.
- Then
  Var_θ(W(X)) ≥ ((d/dθ) E_θ W(X))² / E_θ[(∂/∂θ log f(X | θ))²].
- Special case: X_1, ..., X_n i.i.d. r.v.s, in which case the denominator equals n I(θ).

Proof
- Cauchy-Schwarz inequality: Cov(X, Y)² ≤ Var(X) Var(Y).
- Assume wlog E[X] = E[Y] = 0.
- For all t ∈ ℝ, E[(tX + Y)²] = E[t²X² + 2tXY + Y²] ≥ 0.
- Cramér-Rao: apply Cauchy-Schwarz to W and the score,
  Cov_θ(W, ∂/∂θ log f(X | θ))² ≤ Var_θ(W) Var_θ(∂/∂θ log f(X | θ)).

Ch 8 Hypothesis testing
- A hypothesis is a statement about a population parameter.
- Null hypothesis H_0; alternative hypothesis H_1.
- Form a statistical test.
- Can you reject H_0?
- Rejecting H_0 does not mean accepting H_1.
- You do not accept H_0.
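To illustrate the point from 7.3.1 that a small MSE can matter more than unbiasedness, here is a small simulation sketch. It is not part of the notes, and the sample size, variance and repetition count are arbitrary choices; it compares the biased MLE σ̂² = (1/n) Σ(X_i − X̄)² with the unbiased estimator s² = (1/(n−1)) Σ(X_i − X̄)² on normal data.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 4.0, 10, 100_000

# Draw many samples of size n from N(mu, sigma2) and compute both variance estimators.
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
sigma2_mle = samples.var(axis=1, ddof=0)   # divide by n     (the MLE, biased)
s2 = samples.var(axis=1, ddof=1)           # divide by n - 1 (unbiased)

for name, est in [("MLE (1/n)", sigma2_mle), ("unbiased (1/(n-1))", s2)]:
    bias = est.mean() - sigma2
    mse = np.mean((est - sigma2) ** 2)
    print(f"{name:>20}: bias = {bias:+.4f}, MSE = {mse:.4f}")

For normal data the MLE has bias −σ²/n and MSE (2n − 1)σ⁴/n², which is smaller than the MSE 2σ⁴/(n − 1) of s², so the printout should show the biased estimator winning on MSE despite losing on bias.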