Data analysis
Ben Graham
MA930, University of Warwick
October 26, 2015

p-value examples
- p-value: a statistic P such that P(P ≤ t) ≤ t for t ∈ [0, 1] under H0.
- Random sample X1, . . . , Xn iidrv Exponential(θ), θ unknown, with mean µ := 1/θ.
- Under H0: µ = 1, the sample mean statistic X̄ has the Gamma distribution with mean 1 and shape n, so Q := F_{H0}(X̄) ∈ [0, 1] is uniform under H0.
- H1: µ < 1: reject small values of Q, so P = Q.
- H1: µ > 1: reject large values of Q, so P = 1 − Q.
- H1: µ ≠ 1: reject small and large values of Q, so P = 1 − |2Q − 1|. (R sketch below.)

Hypothesis testing
- Example: Lady tasting tea.
- Experiments by Ronald Fisher and Muriel Bristol.
- 8 cups of tea: 4 tea into cup first, 4 milk first.

    #Successes   Count
    0             1 = C(4,0) C(4,4)
    1            16 = C(4,1) C(4,3)
    2            36 = C(4,2) C(4,2)
    3            16 = C(4,3) C(4,1)
    4             1 = C(4,4) C(4,0)
    Σ count = 70 = C(8,4)

- Null hypothesis: no ability to taste the difference; use it to construct a p-value.
- Under H0, P(4 successes | H0) = 1/70.
- Muriel got 4/4: reject H0. (R sketch below.)

Power calculations
- Before doing an experiment, check it can do what you want.
- Example: testing a coin for bias: X ∼ Binomial(n, θ), H0: θ = 1/2 vs H1: θ ≠ 1/2.
- Ask to keep P(reject H0 | H0) ≤ 5%.
- Ask for P(reject H0 | |θ − 1/2| ≥ 0.1) ≥ 92%.
- Assume X̄ ∼ N(θ, 1/(4n)) in the range θ ∈ [0.4, 0.6], i.e. s.d. = 1/(2√n).
- Need n such that

    [F⁻¹_{N(0,1)}(1 − 5%/2) + F⁻¹_{N(0,1)}(1 − 8%)] × s.d. ≤ 0.1.

  (R sketch below.)

Confidence Intervals
- Parameter θ ∈ R.
- 95% CI: statistics L, R such that ∀θ, P_θ[L ≤ θ ≤ R] ≥ 95%.
- Not unique: one sided or two sided, etc.
- Complement of the critical regions for testing H0: θ = θ̂, i.e. that the MLE is the right parameter.
- N.B. Here θ is fixed and the statistics L, R are random.

Normal confidence intervals
- X1, . . . , Xn iidrv N(θ, 1).
- MLE θ̂ = X̄ ∼ N(θ, 1/n).
- P[θ − 1.96/√n ≤ θ̂ ≤ θ + 1.96/√n] = P[θ ∈ (θ̂ − 1.96/√n, θ̂ + 1.96/√n)] = 95%.
- P[θ̂ ≥ θ − 1.64/√n] = P[θ ∈ (−∞, θ̂ + 1.64/√n)] = 95%.
- P[θ̂ ≤ θ + 1.64/√n] = P[θ ∈ (θ̂ − 1.64/√n, ∞)] = 95%.

t confidence intervals
- X1, . . . , Xn iidrv N(µ, σ²).
- Sample mean X̄ = µ̂ ∼ N(µ, σ²/n), sample variance S².
- (X̄ − µ)/(S/√n) ∼ t_{n−1}.
- Choose q such that F_{t_{n−1}}(q) − F_{t_{n−1}}(−q) = P(−q ≤ A ≤ q | A ∼ t_{n−1}) = 95%.
- Hypothesis test: under H0: µ = µ0, F_{t_{n−1}}((X̄ − µ0)/(S/√n)) ∼ Uniform(0, 1).
- Confidence interval: ∀µ, P(µ ∈ (X̄ − qS/√n, X̄ + qS/√n)) = 95%.

Hypothesis test for contingency table
- m × n contingency table.
- H0: properties are independent, p_{i,j} = a_i × b_j with Σ_i a_i = 1 and Σ_j b_j = 1, so m + n − 2 degrees of freedom.
- H1: properties are not independent: mn − 1 degrees of freedom.
- Number of observations N = Σ_{i,j} O_{i,j}.
- Expected number of observations under H0: E_{i,j} := N × (Σ_k O_{i,k}/N) × (Σ_k O_{k,j}/N).
- Asymptotically under H0, by Wilks' theorem,

    −2 log [ Π_{i,j} (E_{i,j}/N)^{O_{i,j}} / Π_{i,j} (O_{i,j}/N)^{O_{i,j}} ] ≈ χ²_{(m−1)(n−1)}.

- Pearson's χ² test statistic Σ_{i,j} (O_{i,j} − E_{i,j})²/E_{i,j} approximates the above if min_{i,j} E_{i,j} ≥ 5.
- Large values: reject independence. Small values: faked data? (R sketch below.)
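To make the exponential p-value construction concrete, here is a minimal R sketch; the sample size, the simulated data, and the variable names are all invented for illustration.

    ## Sketch: p-values for H0: mu = 1 with Exponential data
    set.seed(1)                              # invented example data
    n    <- 20
    x    <- rexp(n, rate = 1.25)             # true mean 0.8, so H0 is false here
    xbar <- mean(x)

    ## Under H0 the sample mean is Gamma with shape n and rate n (mean 1)
    Q <- pgamma(xbar, shape = n, rate = n)   # Uniform(0, 1) under H0

    p.less     <- Q                          # H1: mu < 1
    p.greater  <- 1 - Q                      # H1: mu > 1
    p.twosided <- 1 - abs(2 * Q - 1)         # H1: mu != 1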
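The tea-tasting counts and p-value can be checked directly; a small sketch using R's built-in binomial-coefficient and hypergeometric functions:

    ## Sketch: null distribution for the lady tasting tea
    counts <- choose(4, 0:4) * choose(4, 4 - (0:4))  # 1 16 36 16 1
    sum(counts)                                      # 70 = choose(8, 4)
    dhyper(4, m = 4, n = 4, k = 4)                   # P(4 successes | H0) = 1/70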
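For the coin power calculation, a sketch of solving for the smallest n under the slide's approximation that the s.d. of X̄ is 1/(2√n); the final figure is what this normal approximation gives, not an exact binomial computation.

    ## Sketch: smallest n with size 5% and power 92% at |theta - 1/2| = 0.1
    z <- qnorm(1 - 0.05/2) + qnorm(1 - 0.08)  # approx 1.96 + 1.41
    n <- ceiling((z / (2 * 0.1))^2)           # need z / (2*sqrt(n)) <= 0.1
    n                                         # 284 under this approximation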
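For the contingency-table test, a sketch on an invented 2 × 3 table, computing both the Wilks likelihood-ratio statistic and Pearson's χ² from the formulas above:

    ## Sketch: independence test for an invented 2 x 3 table of counts
    O <- matrix(c(20, 30, 25,
                  15, 40, 20), nrow = 2, byrow = TRUE)
    N <- sum(O)
    E <- outer(rowSums(O), colSums(O)) / N  # E_ij = N * (row_i/N) * (col_j/N)

    G  <- 2 * sum(O * log(O / E))           # -2 log(likelihood ratio)
    X2 <- sum((O - E)^2 / E)                # Pearson's chi-squared statistic
    df <- (nrow(O) - 1) * (ncol(O) - 1)
    pchisq(G,  df, lower.tail = FALSE)      # p-values; X2 should agree with
    pchisq(X2, df, lower.tail = FALSE)      # chisq.test(O, correct = FALSE)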
Variance stabilizing transforms
- X ∼ f(· | θ) with E X = θ and Var_θ(X) =: V(θ).
- Y := X/√V(θ) has variance 1.
- Taylor's theorem: g(X) ≈ g(θ) + g′(θ)(X − θ), so Var(g(X)) ≈ g′(θ)² V(θ).
- Want to find g such that Var(g(X)) ≈ g′(θ)² V(θ) ≈ constant, i.e. g′(θ) ≈ constant × V(θ)^{−1/2}:

    g(θ) = ∫^θ V(u)^{−1/2} du.

- Poisson: V(θ) = θ → g(X) = √X. (R sketch below.)
- Exponential mean θ: V(θ) = θ² → g(X) = log X.

Bayesian statistics
- Parameter θ with prior belief f(θ).
- Data X ∼ f(x | θ).
- Joint distribution f(x, θ) = f(θ)f(x | θ), with ∫_θ ∫_x f(x, θ) dx dθ = 1.
- Bayes theorem, Bayes' theorem, Bayes's theorem:

    f(θ | x) = f(x, θ) / ∫_t f(x, t) dt = f(θ)f(x | θ) / Z(x),

  i.e. the posterior is proportional to the prior times the likelihood.
- Can generally ignore the normalizing constant Z(x).

Bayesian statistics
- Instead of the MLE θ̂, we can look at properties of the posterior distribution:
  - δ = posterior mean minimizes the expected square error;
  - δ = posterior median minimizes the expected absolute error.
- The prior distribution does not need to be a real probability distribution: if ∫ f(θ) dθ = ∞, call it an improper prior.
- For a random sample of size n, as n → ∞, the prior becomes less important. Asymptotically f(θ | X1, . . . , Xn) ∼ N(θ, I(θ)^{−1}/n) (just like the MLE).
- The exception to this rule is if the prior is way off, e.g. taking θ ∼ N(0, 1) or θ ∼ Uniform(0, 1) when θ is really 100.

Credible intervals
- For Bayesians, credible intervals replace confidence intervals.
- A 95% credible interval is an interval covering 95% of the posterior:

    ∫_{L(x)}^{R(x)} f(θ | x) dθ = 95% ↔ P_posterior(θ ∈ (L(x), R(x))) = 95%.

- Unlike the frequentist case, given the data, "θ ∈ (L(x), R(x))?" is still officially random.

Where do priors come from?
- Non-informative priors: make up something so broad that it is guaranteed to cover all but the most unrealistic values of θ.
- OR: ask an expert.
- Conjugate priors: some pairs work out nicely analytically, so are often used:
  - normal prior and normal likelihood
  - beta prior and binomial likelihood
  - beta prior and geometric likelihood
  - gamma prior and Poisson likelihood
  - gamma prior and normal likelihood
  - gamma prior and gamma likelihood
  - etc.
- Jeffreys prior f(θ) ∝ √I(θ) is invariant under reparametrization.
- If the prior looks a lot like the posterior, your experiment is rather questionable.

Jeffreys prior example
- Likelihood Bernoulli(θ): f(x | θ) = θ^x (1 − θ)^{1−x}.
- Could call the Uniform(0, 1) distribution an uninformative prior.
- Jeffreys prior:

    I(θ) = Var(∂/∂θ log f(X | θ)) = Var(X/θ − (1 − X)/(1 − θ)) = Var((X − θ)/(θ(1 − θ))) = 1/(θ(1 − θ)),

    f(θ) ∝ I(θ)^{1/2} = (θ(1 − θ))^{−1/2}, i.e. f(θ) = Beta(1/2, 1/2).

- Observe n samples: k 1s and n − k 0s. Posterior = Beta(1/2 + k, 1/2 + n − k).
- Credible interval: 2.5% and 97.5% quantiles of the posterior distribution [qbeta]. (R sketch below.)
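A quick simulation check of the Poisson case above: by the Taylor argument, Var(√X) ≈ (1/(2√θ))² θ = 1/4 for every θ. The sample size and the grid of θ values here are arbitrary.

    ## Sketch: sqrt as a variance stabilizing transform for Poisson counts
    set.seed(2)
    for (theta in c(4, 16, 64, 256)) {
      x <- rpois(1e5, theta)
      cat(theta, var(x), var(sqrt(x)), "\n")  # var(x) ~ theta, var(sqrt(x)) ~ 1/4
    }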
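The [qbeta] reference on the last slide presumably points at R's Beta quantile function; a sketch of the credible interval with invented data (k ones out of n):

    ## Sketch: 95% credible interval under the Jeffreys Beta(1/2, 1/2) prior
    n <- 30; k <- 21                              # invented data: 21 ones in 30
    qbeta(c(0.025, 0.975), 1/2 + k, 1/2 + n - k)  # 2.5% and 97.5% quantiles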