Chapter 5: Some Elementary Statistical Inferences

5.1 Sampling and Statistics
Definition
• (Random Sample) The random variables X1 , · · · , Xn constitute a random sample on a random variable X if X1 , · · · , Xn are iid with the same distribution as that of X.
• (Statistic) Any function of the observations that does not depend on unknown parameters is called a statistic.
If X1 , · · · , Xn are a random sample with common PDF (or PMF) f (x) and CDF F (x), then the joint PDF (or PMF) is

f_{X_1,···,X_n}(x_1 , · · · , x_n) = \prod_{i=1}^n f(x_i)

and the joint CDF is

F_{X_1,···,X_n}(x_1 , · · · , x_n) = \prod_{i=1}^n F(x_i).
In addition, if a parameter θ is contained in f (x) so that we can write f (x) = f_θ(x), then the likelihood function is defined via the joint PDF (or PMF) as

L(θ) = \prod_{i=1}^n f_θ(x_i).
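As a quick numerical illustration, here is a minimal sketch (not from the notes; the function name, toy data, and parameter values are illustrative assumptions) that evaluates L(θ) for the N (µ, σ²) model:

    import numpy as np
    from scipy.stats import norm

    def likelihood(theta, x):
        # L(theta) = prod_i f_theta(x_i) for the N(mu, sigma^2) model
        mu, sigma2 = theta
        return np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

    x = np.array([1.2, 0.8, 1.9, 1.1])   # toy data
    print(likelihood((1.0, 1.0), x))     # L(theta) at mu = 1, sigma^2 = 1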
Example: Let X1 , · · · , Xn be a random sample with PDF fθ (x). Assume fθ (x) is the PDF of N (µ, σ²), where θ = (µ, σ²).
(a) If both µ and σ² are unknown, determine whether the quantities 3, n, X̄, S², \sum_{i=1}^n (X_i − µ)²/n, X̄/σ, and \sum_{i=1}^n X_i^4 are statistics.
Answer: 3, n, X̄, S², and \sum_{i=1}^n X_i^4 are statistics.
(b) If µ is known but σ² is not, which of those in part (a) are statistics?
Answer: 3, n, X̄, S², \sum_{i=1}^n (X_i − µ)²/n, and \sum_{i=1}^n X_i^4.
(c) If σ is known but µ is not, which of those in part (a) are statistics?
Answer: 3, n, X̄, S², X̄/σ, and \sum_{i=1}^n X_i^4.
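To see the distinction concretely, here is a minimal sketch (my own illustration; the seed, sample size, and parameter values are assumptions): a statistic is something we can compute from the data alone.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=20)  # data; mu, sigma unknown to the analyst

    # Computable from the data alone, hence statistics:
    xbar = x.mean()           # sample mean
    s2 = x.var(ddof=1)        # sample variance S^2
    sum_x4 = np.sum(x**4)     # sum of X_i^4
    print(xbar, s2, sum_x4)

    # Not statistics when mu and sigma are unknown, since they require those values:
    #   np.mean((x - mu)**2)   and   xbar / sigma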
5.2 Order Statistics
• Let X1 , · · · , Xn be an iid random sample from a distribution with PDF (or PMF) f (x). Let X_(i) be the i-th smallest value of X1 , · · · , Xn . If the distribution is continuous, then the joint PDF of X_(1) , · · · , X_(n) is

g(y_1 , · · · , y_n) = n! \prod_{i=1}^n f(y_i), if y_1 ≤ y_2 ≤ · · · ≤ y_n.
• Let F (x) be the CDF of X. Then, F^{−1}(p) for 0 < p < 1 is the quantile function of X. F^{−1}(0.5) is the median.
• Sample quantile: X_([pn]) is the pth sample quantile or 100pth percentile of the sample.
• X([ n2 ]) is the sample median.
Example:
• Example 5.2.4.
• If f (x) is the density of Uniform[0, 1], then the joint PDF of the order statistics X_(1) , · · · , X_(n) is

g(y_1 , · · · , y_n) = n!, if 0 ≤ y_1 ≤ y_2 ≤ · · · ≤ y_n ≤ 1.

This is called the Dirichlet distribution.
In addition, the marginal PDF of X_(k) is

g_{X_(k)}(x) = \frac{n!}{(k − 1)!(n − k)!} x^{k−1}(1 − x)^{n−k}.

This is the density of β(k, n − k + 1); see the simulation sketch after this list.
• Let W_1 , · · · , W_{n+1} be iid Exp(1). In the above uniform example, it can further be seen that the joint density of (X_(k_1) , X_(k_2)) with 1 ≤ k_1 < k_2 ≤ n is the same as the joint density of

\left( \frac{\sum_{i=1}^{k_1} W_i}{\sum_{i=1}^{n+1} W_i} , \frac{\sum_{i=1}^{k_2} W_i}{\sum_{i=1}^{n+1} W_i} \right).

We can use this to derive the asymptotic joint distribution of order statistics.
• In general, if f (x) is a general density with CDF F , we can use the transformation Y_i = F^{−1}(X_i) and Y_(i) = F^{−1}(X_(i)), where the X_i are iid Uniform[0, 1], so that the Y_i have CDF F . We can then analytically derive the joint asymptotic distributions of order statistics.
• Find the asymptotic distribution of the sample median and derive the formula for a confidence interval.
• Find the limiting distribution of the pth sample quantile and derive the formula for a confidence interval. (A simulation sketch checking these facts follows this list.)
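The following minimal sketch (my own illustration, not part of the notes; sample sizes and seeds are arbitrary assumptions) checks two facts above by simulation: the β(k, n − k + 1) marginal of a uniform order statistic, and the asymptotic variance of the sample median.

    import numpy as np
    from scipy.stats import beta, kstest, norm

    rng = np.random.default_rng(0)

    # (1) The k-th order statistic of n iid Uniform[0,1] draws should follow
    #     Beta(k, n - k + 1); the KS statistic below should be near 0.
    n, k, reps = 10, 3, 100_000
    u_sorted = np.sort(rng.uniform(size=(reps, n)), axis=1)
    print(kstest(u_sorted[:, k - 1], beta(k, n - k + 1).cdf))

    # (2) Sample median asymptotics: sqrt(n)(median - m) is approximately
    #     N(0, 1/(4 f(m)^2)). For N(0,1) data, m = 0 and f(0) = 1/sqrt(2*pi),
    #     so n * Var(median) should be close to pi/2 = 1.5708...
    n2, reps2 = 400, 20_000
    medians = np.median(rng.standard_normal((reps2, n2)), axis=1)
    print(n2 * np.var(medians))          # simulated value
    print(1 / (4 * norm.pdf(0) ** 2))    # theoretical value, pi/2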
5.4 More on Confidence Intervals
One sample case.
Independent normal sample. If X1 , · · · , Xn are iid N (µ, σ²), then the (1 − α)100% confidence interval for µ is

x̄ ± t_{α/2,n−1} \frac{S}{\sqrt{n}}
and the (1 − α)100% confidence interval for σ² is

\left[ \frac{(n − 1)S^2}{χ^2_{α/2,n−1}} , \frac{(n − 1)S^2}{χ^2_{1−α/2,n−1}} \right].
If σ is known, then the (1 − α)100% confidence interval for µ is

x̄ ± z_{α/2} \frac{σ}{\sqrt{n}}.
iid sample when n is large. If X1 , · · · , Xn are iid with common mean µ and variance σ², then if n is large (e.g. n > 40), the (1 − α)100% confidence interval for µ is

x̄ ± z_{α/2} \frac{S}{\sqrt{n}}

and the confidence interval for σ² is the same as in the normal-sample case.
Binomial proportions. If we observe X ∼ Bin(n, p), then the (1 − α)100% confidence interval for p is

p̂ ± z_{α/2} \sqrt{\frac{p̂(1 − p̂)}{n}}.
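A minimal sketch of both one-sample intervals (my own illustration; the function names and the toy inputs in the final two lines are assumptions):

    import numpy as np
    from scipy import stats

    def t_ci(x, alpha=0.05):
        # (1 - alpha)100% t interval for mu: xbar +- t_{alpha/2, n-1} * S / sqrt(n)
        n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
        t = stats.t.ppf(1 - alpha / 2, df=n - 1)
        return xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)

    def prop_ci(x, n, alpha=0.05):
        # Wald (1 - alpha)100% interval for p: phat +- z_{alpha/2} sqrt(phat(1-phat)/n)
        phat = x / n
        z = stats.norm.ppf(1 - alpha / 2)
        half = z * np.sqrt(phat * (1 - phat) / n)
        return phat - half, phat + half

    print(t_ci(np.array([4.1, 5.2, 6.3, 5.0, 4.8])))
    print(prop_ci(x=40, n=100))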
Two sample case.
Large sample case. Suppose we observe X1 , · · · , X_{n_1} iid with common mean µ1 and variance σ1² and Y1 , · · · , Y_{n_2} iid with common mean µ2 and variance σ2². If both n1 and n2 are large, then the (1 − α)100% (large sample) confidence interval for µ1 − µ2 is

x̄ − ȳ ± z_{α/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.
Pooled t case. Suppose we observe X1 , · · · , X_{n_1} ∼ iid N (µ1 , σ²) and Y1 , · · · , Y_{n_2} ∼ iid N (µ2 , σ²). Let

S_p^2 = \frac{(n_1 − 1)S_1^2 + (n_2 − 1)S_2^2}{n_1 + n_2 − 2}.

Then, the (1 − α)100% pooled t confidence interval for µ1 − µ2 is

x̄ − ȳ ± t_{α/2,n_1+n_2−2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
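A minimal sketch of the pooled interval (my own illustration; the function name is an assumption, and equal variances are assumed as in the formula above):

    import numpy as np
    from scipy import stats

    def pooled_t_ci(x, y, alpha=0.05):
        # Pooled-variance (1 - alpha)100% CI for mu1 - mu2, equal variances assumed
        n1, n2 = len(x), len(y)
        sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) \
              / (n1 + n2 - 2)
        t = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)
        half = t * np.sqrt(sp2 * (1 / n1 + 1 / n2))
        d = np.mean(x) - np.mean(y)
        return d - half, d + half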
Q: What is the answer when σ1² ≠ σ2²?
Q: What is the confidence interval for σ1²/σ2² if we have two-sample normal data?
Two-sample binomial for the difference in proportions. If we observe X ∼ Bin(n1 , p1) and Y ∼ Bin(n2 , p2), then the (1 − α)100% confidence interval for p1 − p2 is

p̂_1 − p̂_2 ± z_{α/2} \sqrt{\frac{p̂_1(1 − p̂_1)}{n_1} + \frac{p̂_2(1 − p̂_2)}{n_2}}.
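And a matching sketch for the two-proportion interval (again my own illustration; the function name and the counts in the final call are assumptions):

    import numpy as np
    from scipy.stats import norm

    def two_prop_ci(x1, n1, x2, n2, alpha=0.05):
        # Large-sample (1 - alpha)100% CI for p1 - p2
        p1, p2 = x1 / n1, x2 / n2
        z = norm.ppf(1 - alpha / 2)
        half = z * np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        return (p1 - p2) - half, (p1 - p2) + half

    print(two_prop_ci(x1=45, n1=100, x2=30, n2=90))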
5.5 Introduction to Hypothesis Testing
Assume the PDF (or PMF) is f (x; θ), θ ∈ Ω. Assume Ω0 ∪ Ω1 = Ω and Ω0 ∩ Ω1 = ∅. Suppose we consider the hypothesis

H0 : θ ∈ Ω0 versus H1 : θ ∈ Ω1 .

We draw a conclusion based on the observations.
Look at the following 2 × 2 table.

                      Truth
Conclusion      H0               H1
Accept H0       Correct          Type II Error
Reject H0       Type I Error     Correct
We call

P (Reject H0 | H0)

the type I error probability and

P (Accept H0 | H1)

the type II error probability. The maximum type I error probability is called the significance level, usually denoted by α. That is,

α = \max_{θ∈Ω_0} P (Reject H0 | θ).

The power function of a test is defined by

P (Reject H0 | θ),

which is a function of θ.
Usually, when constructing a test, we require the type I error probability α to be controlled at a given value. Typically, we need to find a rejection region C based on the observed value of a statistic T . That is, we reject H0 if T ∈ C and accept H0 if T ∉ C.
Please understand the above concepts based on the following examples:
• Suppose X1 , · · · , Xn are iid N (µ, 1). The test is
(a) H0 : µ ≤ 0 ↔ H1 : µ > 0,
or
(b) H0 : µ ≥ 0 ↔ H1 : µ < 0,
or
(c) H0 : µ ∈ [µ1 , µ2 ] ↔ H1 : µ ∉ [µ1 , µ2 ].
• Suppose X ∼ Bin(n, p). The test is
(a) H0 : p ≤ p0 ↔ H1 : p > p0 ,
or
(b) H0 : p ≥ p0 ↔ H1 : p < p0 ,
or
(c) H0 : p ∈ [p1 , p2 ] ↔ H1 : p ∉ [p1 , p2 ].
Connection between confidence interval and test. We can reject H0 : θ = θ0 at level α if θ0 is not contained in the (1 − α)100% confidence interval for θ.
5.6 Additional Comments About Statistical Tests
We will focus on the following examples:
Example 5.6.1: Let X1 , · · · , Xn be an iid sample with mean µ and variance σ². Test

H0 : µ = µ0 ↔ H1 : µ ≠ µ0 .

Derive the power function when (a) n is large; (b) the Xi are normal samples.
Example 5.6.2: Assume X1 , · · · , X_{n_1} are iid N (µ1 , σ²) and Y1 , · · · , Y_{n_2} are iid N (µ2 , σ²). Test

H0 : µ1 = µ2 ↔ H1 : µ1 ≠ µ2 .
Derive the power function.
Example 5.6.3: Suppose X1 , · · · , Xn are iid Bernoulli(p). Test

H0 : p = p0 ↔ H1 : p ≠ p0

using (a) the exact binomial method and (b) the normal approximation method. Approximately derive the power function of (b).
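As one concrete instance, here is a minimal sketch (my own illustration; the function name and the numbers in the final call are assumptions) of the power function of the two-sided large-sample z-test of H0 : µ = µ0, the setting of Example 5.6.1(a). With δ = (µ − µ0)√n/σ, the power is Φ(δ − z_{α/2}) + Φ(−δ − z_{α/2}).

    import numpy as np
    from scipy.stats import norm

    def power_two_sided_z(mu, mu0, sigma, n, alpha=0.05):
        # gamma(mu) = Phi(delta - z_{alpha/2}) + Phi(-delta - z_{alpha/2}),
        # where delta = (mu - mu0) * sqrt(n) / sigma.
        z = norm.ppf(1 - alpha / 2)
        delta = (mu - mu0) * np.sqrt(n) / sigma
        return norm.cdf(delta - z) + norm.cdf(-delta - z)

    print(power_two_sided_z(mu=0.5, mu0=0.0, sigma=1.0, n=25))   # about 0.705

Note that at µ = µ0 this reduces to 2Φ(−z_{α/2}) = α, as it should.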
Example 5.6.4: Suppose X1 , · · · , X10 are an iid sample from Poisson(θ). Suppose we reject

H0 : θ ≤ 0.1 ↔ H1 : θ > 0.1

if

Y = \sum_{i=1}^{10} X_i ≥ 3.

Find the type I error probability, the type II error probability, and the significance level.
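A minimal numerical sketch for this example (the alternative value θ = 0.5 used for the type II error is an arbitrary assumption of mine), using the fact that a sum of iid Poissons is Poisson:

    from scipy.stats import poisson

    # Under theta, Y = X_1 + ... + X_10 ~ Poisson(10 * theta).
    # Significance level: max over theta <= 0.1 of P(Y >= 3), attained at theta = 0.1.
    alpha = 1 - poisson.cdf(2, mu=1.0)   # Y ~ Poisson(1) at the boundary
    print(alpha)                          # about 0.0803

    # Type II error at a particular alternative, e.g. theta = 0.5 (Y ~ Poisson(5)):
    print(poisson.cdf(2, mu=5.0))         # P(accept H0 | theta = 0.5), about 0.1247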
Example 5.6.5: Let X1 , · · · , X25 be an iid sample from N (µ, 4). Consider the test

H0 : µ ≥ 77 ↔ H1 : µ < 77.

Suppose we observe x̄ = 76.1. Then, the p-value is

P_{µ=77}(X̄ ≤ 76.1) = Φ\left( \frac{76.1 − 77}{\sqrt{4/25}} \right) = Φ(−2.25) = 0.012.
Remark: The observed significance level is called the p-value; it is the probability, under the null hypothesis, that the test statistic is at least as extreme as its observed value. For example, in Example 5.6.4, the p-value for an observed value y is

P (Y ≥ y), where Y ∼ Poisson(10 × 0.1) = Poisson(1) at the boundary of H0 .
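Both p-values can be checked numerically (a sketch of mine; the observed count y = 3 in the second computation is an illustrative assumption):

    from scipy.stats import norm, poisson

    # Example 5.6.5: p-value = Phi((76.1 - 77) / sqrt(4/25))
    print(norm.cdf((76.1 - 77) / (4 / 25) ** 0.5))   # about 0.0122

    # Example 5.6.4: p-value P(Y >= y) with Y ~ Poisson(1) at the H0 boundary
    y = 3
    print(1 - poisson.cdf(y - 1, mu=1.0))            # about 0.0803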
5.7 Chi-Square Tests
Consider a test
H0 : θ ∈ Θ0 ↔ H1 : θ ∈ Θ1 .
Suppose under H0 we estimate µ_i = E(X_i) by µ̂_i and we estimate σ_i² = V(X_i) by σ̂_i².
Pearson χ² statistic. The Pearson χ² statistic for independent random samples is generally defined by

Y = \sum_{i=1}^n \frac{(X_i − µ̂_i)^2}{σ̂_i^2}.
The idea comes from independent normal samples: if X1 , · · · , Xn are independent with Xi ∼ N (µi , σi²), then

X^2 = \sum_{i=1}^n \frac{(X_i − µ_i)^2}{σ_i^2} ∼ χ^2_n .
Loglikelihood ratio statistic. Let ℓ(θ) be the likelihood function. Then, the loglikelihood ratio statistic is defined by

Λ = 2 \log \frac{\sup_{θ∈Θ} ℓ(θ)}{\sup_{θ∈Θ_0} ℓ(θ)} = 2 \left[ \log \sup_{θ∈Θ} ℓ(θ) − \log \sup_{θ∈Θ_0} ℓ(θ) \right].
We can show that both X² and Λ are approximately chi-square distributed. In general, we call X² the Pearson goodness-of-fit statistic and Λ the deviance goodness-of-fit statistic. In particular, their degrees of freedom equal the difference between the degrees of freedom of Θ and Θ0 . Let us try to understand X² in the following examples. We will look at Λ in detail in Chapter 6.
Example 5.7.1 Suppose we roll a die n times. Let Xi be the number observed on the i-th roll. Find the Pearson χ² statistic X².
Example 5.7.2 Suppose we have samples X1 , · · · , Xn from a distribution taking values in [0, 1]. How do we find the Pearson χ² statistic X² to test whether the distribution is uniform? Suppose we partition [0, 1] into the four intervals [0, 1/4], (1/4, 1/2], (1/2, 3/4], and (3/4, 1].
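For Example 5.7.2, here is a minimal sketch (my own illustration; the helper name, bin count, and simulated data are assumptions) of the binned, multinomial form of the Pearson statistic, X² = Σ_j (O_j − E_j)²/E_j with E_j = n/4 under H0 and 4 − 1 = 3 degrees of freedom:

    import numpy as np
    from scipy.stats import chi2

    def pearson_uniform_test(x, bins=4):
        # Bin the data into `bins` equal-width cells over [0, 1]; under H0 each
        # cell has expected count n / bins; df = bins - 1 (no parameters estimated).
        n = len(x)
        observed, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
        expected = n / bins
        x2 = np.sum((observed - expected) ** 2 / expected)
        return x2, 1 - chi2.cdf(x2, df=bins - 1)

    rng = np.random.default_rng(0)
    print(pearson_uniform_test(rng.uniform(size=200)))  # large p-value expected under H0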