Data analysis
Ben Graham
MA930, University of Warwick
October 26, 2015

p-value examples

- p-value: P such that P(P ≤ t) ≤ t for t ∈ [0, 1].
- Random sample X1, . . . , Xn iidrv Exponential(θ), mean µ := 1/θ unknown
- Under H0: µ = 1, the sample mean has the Gamma distribution with mean 1 and shape n
- Statistic Q = F_{H0}(X̄) ∈ [0, 1] is uniform under H0
- Let H1: µ < 1. Reject small values of Q: P = Q
- Let H1: µ > 1. Reject large values of Q: P = 1 − Q
- Let H1: µ ≠ 1. Reject small and large values of Q: P = 1 − |2Q − 1| (all three p-values are computed in the sketch below)

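A minimal numeric sketch of the recipe above (not from the slides; the seed, sample size and true mean are invented for illustration). Under H0 the mean of n iid Exponential(1) variables is Gamma with shape n and scale 1/n, so Q is that distribution's CDF evaluated at X̄.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 50
    x = rng.exponential(scale=1.3, size=n)  # true mean 1.3, so H0: mu = 1 is false
    xbar = x.mean()

    # Under H0, X-bar ~ Gamma(shape=n, scale=1/n), which has mean 1
    Q = stats.gamma.cdf(xbar, a=n, scale=1.0 / n)
    p_less = Q                        # H1: mu < 1
    p_greater = 1 - Q                 # H1: mu > 1
    p_two_sided = 1 - abs(2 * Q - 1)  # H1: mu != 1
    print(Q, p_less, p_greater, p_two_sided)
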
Hypothesis testing

- Example: Lady tasting tea
- Experiments by Ronald Fisher and Muriel Bristol
- 8 cups of tea: 4 tea into cup first, 4 milk first

  #Successes | Count
  0          | 1
  1          | 16 = (4 choose 1)(4 choose 3)
  2          | 36 = (4 choose 2)(4 choose 2)
  3          | 16 = (4 choose 3)(4 choose 1)
  4          | 1
  Total count = 70 = (8 choose 4)

- Null hypothesis: no ability to taste the difference; construct a p-value
- Under H0, P(4 successes | H0) = 1/70 (checked numerically below)
- Muriel got 4/4: Reject H0

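The table and the p-value can be checked against the hypergeometric distribution; a minimal sketch (scipy, not part of the original slides):

    from scipy import stats

    # M = 8 cups in total, n = 4 "milk first" cups, N = 4 cups selected
    h = stats.hypergeom(M=8, n=4, N=4)
    for k in range(5):
        print(k, round(h.pmf(k) * 70))  # counts 1, 16, 36, 16, 1 out of 70

    print(h.pmf(4))  # P(4 successes | H0) = 1/70, about 0.014
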
Power calculations

- Before doing an experiment, check it can do what you want
- Example: Testing a coin for bias:
- X ∼ Binomial(n, θ)
- H0: θ = 1/2 vs H1: θ ≠ 1/2
- Ask to keep P(reject H0 | H0) ≤ 5%
- Ask for P(reject H0 | |θ − 1/2| ≥ 0.1) ≥ 92%
- Assume X̄ ∼ N(θ, 1/(4n)), i.e. s.d. 1/(2√n), in the range θ ∈ [0.4, 0.6]
- Need n such that

  [F^{−1}_{N(0,1)}(1 − 5%/2) + F^{−1}_{N(0,1)}(1 − 8%)] × s.d. ≤ 0.1

  (solved for n in the sketch below)

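A minimal sketch of solving the display above for n, assuming s.d. = 1/(2√n) as on the slide (the code is illustrative, not from the slides):

    import math
    from scipy import stats

    z_size = stats.norm.ppf(1 - 0.05 / 2)  # 1.96: two-sided 5% test
    z_power = stats.norm.ppf(1 - 0.08)     # 1.41: for 92% power
    # Require (z_size + z_power) / (2 sqrt(n)) <= 0.1 and solve for n
    n = math.ceil(((z_size + z_power) / (2 * 0.1)) ** 2)
    print(n)  # roughly 284 coin flips needed
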
Confidence Intervals

- Parameter θ ∈ R
- 95% CI: Statistics L, R such that ∀θ, P_θ[L ≤ θ ≤ R] ≥ 95%
- Not unique: one-sided or two-sided, etc.
- Complement of the critical regions for testing H0: θ = θ̂, i.e. that the MLE is the right parameter.
- N.B. Here θ is fixed and the statistics L, R are random.

Normal confidence intervals

- X1, . . . , Xn ∼ N(θ, 1) iidrv
- MLE X̄ = θ̂ ∼ N(θ, 1/n)
- P[θ − 1.96/√n ≤ θ̂ ≤ θ + 1.96/√n] = P[θ ∈ (θ̂ − 1.96/√n, θ̂ + 1.96/√n)] = 95%
- P[θ̂ ≥ θ − 1.64/√n] = P[θ ∈ (−∞, θ̂ + 1.64/√n)] = 95%
- P[θ̂ ≤ θ + 1.64/√n] = P[θ ∈ (θ̂ − 1.64/√n, ∞)] = 95% (all three intervals are computed below)

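A sketch of the two-sided and one-sided 95% intervals above on simulated data with known variance 1 (the sample and seed are invented):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, theta = 100, 2.0
    x = rng.normal(theta, 1.0, size=n)  # known variance 1
    theta_hat = x.mean()

    z2 = stats.norm.ppf(0.975)  # 1.96 for the two-sided interval
    z1 = stats.norm.ppf(0.95)   # 1.64 for the one-sided intervals
    print(theta_hat - z2 / np.sqrt(n), theta_hat + z2 / np.sqrt(n))  # two-sided
    print(-np.inf, theta_hat + z1 / np.sqrt(n))                      # upper bound
    print(theta_hat - z1 / np.sqrt(n), np.inf)                       # lower bound
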
t confidence intervals

- X1, . . . , Xn ∼ N(µ, σ²) iidrv
- Sample mean X̄ = µ̂ ∼ N(µ, σ²/n), sample variance S²
- (X̄ − µ)/(S/√n) ∼ t_{n−1}
- Choose q such that F_{t_{n−1}}(q) − F_{t_{n−1}}(−q) = P(−q ≤ A ≤ q | A ∼ t_{n−1}) = 95%
- Hypothesis test: Under H0: µ = µ0, F_{t_{n−1}}((X̄ − µ0)/(S/√n)) ∼ Uniform(0, 1)
- Confidence interval: ∀µ, P(µ ∈ (X̄ − qS/√n, X̄ + qS/√n)) = 95% (sketched below)

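A sketch of the two-sided 95% t interval (invented data; scipy's t quantile plays the role of q):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(5.0, 3.0, size=20)  # unknown mean and variance
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)  # ddof=1: sample std dev

    q = stats.t.ppf(0.975, df=n - 1)  # F_{t_{n-1}}(q) - F_{t_{n-1}}(-q) = 95%
    print(xbar - q * s / np.sqrt(n), xbar + q * s / np.sqrt(n))
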
Hypothesis test for contingency table

- m × n contingency table
- H0: properties are independent, p_{i,j} = a_i × b_j; Σ_i a_i = 1, Σ_j b_j = 1, so m + n − 2 degrees of freedom
- H1: properties are not independent; m × n − 1 degrees of freedom
- Number of observations N = Σ_{i,j} O_{i,j}
- Expected number of observations under H0 is E_{i,j} := N × (Σ_k O_{i,k}/N) × (Σ_k O_{k,j}/N)
- Asymptotically under H0, by Wilks' theorem:

  −2 log [ Π_{i,j} (E_{i,j}/N)^{O_{i,j}} / Π_{i,j} (O_{i,j}/N)^{O_{i,j}} ] ≈ χ²_{(m−1)(n−1)}

- Pearson's χ² test statistic Σ_{i,j} (O_{i,j} − E_{i,j})²/E_{i,j} approximates the above if min_{i,j} E_{i,j} ≥ 5 (see the sketch below)
- Large values: reject independence. Small values: faked data?

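A sketch with an invented 2 × 3 table; scipy computes the E_{i,j} and Pearson's statistic in one call:

    import numpy as np
    from scipy import stats

    O = np.array([[30, 14, 6],
                  [20, 26, 4]])
    chi2, p, dof, E = stats.chi2_contingency(O, correction=False)
    print(chi2, p, dof)  # dof = (m - 1)(n - 1) = 2
    print(E)             # expected counts E_{i,j} under independence
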
Variance stabilizing transforms

- X ∼ f(· | θ) with EX = θ and Var_θ(X) =: V(θ)
- Y := X/√V(θ) has variance 1
- Taylor's theorem: Var(g(X)) ≈ Var(g(θ) + g′(θ)Y√V(θ)) ≈ g′(θ)² V(θ)
- Want to find g such that Var(g(X)) ≈ g′(θ)² V(θ) ≈ constant
- g′(θ) ≈ constant × V(θ)^{−1/2}, so

  g(θ) = ∫^θ V(u)^{−1/2} du

- Poisson: V(θ) = θ → g(X) = √X
- Exponential mean θ: V(θ) = θ² → g(X) = log X (both transforms are checked empirically below)

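An empirical check (a sketch with made-up parameter values): after the transform, the variance no longer depends on θ.

    import numpy as np

    rng = np.random.default_rng(3)
    for theta in [2.0, 10.0, 50.0]:
        x = rng.poisson(theta, size=100_000)      # V(theta) = theta
        y = rng.exponential(theta, size=100_000)  # V(theta) = theta^2
        # Var(sqrt(X)) ~ 1/4 and Var(log Y) ~ pi^2/6, whatever theta is
        print(theta, np.sqrt(x).var(), np.log(y).var())
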
Bayesian statistics

- Parameter θ with prior belief f(θ)
- Data X ∼ f(X | θ)
- Joint distribution f(X, θ) = f(θ)f(X | θ), ∫_θ ∫_x f(x, θ) dx dθ = 1
- Bayes theorem, Bayes' theorem, Bayes's theorem:

  f(θ | x) = f(x, θ) / ∫_t f(x, t) dt = f(θ)f(x | θ) / Z(x)

- i.e. Posterior is proportional to prior times likelihood
- Can generally ignore the normalizing constant Z(x) (illustrated on a grid below)

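A sketch of "posterior ∝ prior × likelihood" on a grid (the Beta(2, 2) prior and the 7-out-of-10 data are invented):

    import numpy as np
    from scipy import stats

    theta = np.linspace(0.001, 0.999, 999)  # grid over a Bernoulli parameter
    prior = stats.beta.pdf(theta, 2, 2)     # prior belief f(theta)
    likelihood = theta**7 * (1 - theta)**3  # observed 7 ones and 3 zeros
    posterior = prior * likelihood          # unnormalized: ignore Z(x) ...
    posterior /= posterior.sum() * (theta[1] - theta[0])  # ... until the end
    print(theta[np.argmax(posterior)])      # mode of Beta(9, 5) = 8/12
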
Bayesian statistics

- Instead of the MLE θ̂, we can look at properties of the posterior distribution
- δ = posterior mean minimizes the expected square error
- δ = posterior median minimizes the expected absolute error (both checked by Monte Carlo below)
- The prior distribution does not need to be a real probability distribution. If ∫ f(θ) dθ = ∞, call it an improper prior.
- For a random sample of size n, as n → ∞, the prior becomes less important. Asymptotically f(θ | X1, . . . , Xn) ∼ N(θ, I(θ)^{−1}/n) (just like the MLE).
- The exception to this rule is if the prior is way off, e.g. taking θ ∼ N(0, 1) or θ ∼ Uniform(0, 1) when θ is really 100.

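A Monte Carlo check of the two loss-minimization claims (a sketch; the Beta(9, 5) posterior is an arbitrary choice):

    import numpy as np
    from scipy import stats

    post = stats.beta(9, 5)
    draws = post.rvs(size=50_000, random_state=0)
    grid = np.linspace(0.3, 0.9, 301)  # candidate point estimates delta
    sq_loss = [np.mean((draws - d) ** 2) for d in grid]
    ab_loss = [np.mean(np.abs(draws - d)) for d in grid]
    print(grid[np.argmin(sq_loss)], post.mean())    # both about 9/14
    print(grid[np.argmin(ab_loss)], post.median())  # both the posterior median
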
Credible intervals

- For Bayesians, credible intervals replace confidence intervals.
- A 95% credible interval is an interval covering 95% of the posterior:

  ∫_{L(x)}^{R(x)} f(θ | x) dθ = 95% ↔ P_posterior(θ ∈ (L(x), R(x))) = 95%

- Unlike the frequentist case, given the data, "θ ∈ (L(x), R(x))?" is still officially random.

Where do priors come from?

- Non-informative priors: make up something so broad that it is guaranteed to cover all but the most unrealistic values of θ.
- OR: ask an expert
- Conjugate priors: some pairs
  - normal prior and normal likelihood
  - beta prior and binomial likelihood
  - beta prior and geometric likelihood
  - gamma prior and Poisson likelihood
  - gamma prior and normal likelihood
  - gamma prior and gamma likelihood
  - etc.
  work out nicely analytically, so are often used (a sketch of the beta–binomial pair follows this list).
- Jeffreys prior f(θ) ∝ √I(θ) is invariant under reparametrization
- If the prior looks a lot like the posterior, your experiment is rather questionable.

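A sketch of the second conjugate pair on the list, a beta prior with a binomial likelihood (the numbers are invented):

    from scipy import stats

    a, b = 2.0, 2.0  # Beta(2, 2) prior on the success probability
    k, n = 7, 10     # observe 7 successes in 10 trials
    posterior = stats.beta(a + k, b + n - k)  # conjugacy: still a Beta
    print(posterior.mean(), posterior.interval(0.95))
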
Jeffreys prior example

- Likelihood Bernoulli(θ): f(x | θ) = θ^x (1 − θ)^{1−x}
- Could call the Uniform(0, 1) distribution an uninformative prior
- Jeffreys prior: f(θ) ∝ I(θ)^{1/2}, where

  I(θ) = Var(∂/∂θ log f(X | θ)) = Var(X/θ − (1 − X)/(1 − θ)) = Var((X − θ)/(θ(1 − θ))) = 1/(θ(1 − θ))

  → f(θ) = Beta(1/2, 1/2)

- Observe n samples: k 1s and n − k 0s. Posterior = Beta(1/2 + k, 1/2 + n − k)
- Credible interval: 2.5% and 97.5% quantiles of the posterior distribution [qbeta] (sketched below with scipy)

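A sketch of the credible interval, using scipy's beta quantile function in place of R's qbeta (the counts are invented):

    from scipy import stats

    k, n = 13, 40  # 13 ones in 40 Bernoulli samples
    posterior = stats.beta(0.5 + k, 0.5 + n - k)
    lo, hi = posterior.ppf([0.025, 0.975])
    print(lo, hi)  # 2.5% and 97.5% quantiles of the posterior
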