Applied Statistics Tutorial: Sample Mean & CLT

MATH2411
Applied Statistics
Tutorial Notes 3
Warm-up (Sample Mean and Sample Variance)
Referring to Example 2, eight students are randomly selected (with replacement) from
MATH2411 T1A. Their midterm scores are recorded below:
34, 29, 23, 31, 34, 28, 32, 29
(a) Find the Sample Mean \(\bar{x}\)
\[ \bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{34 + 29 + 23 + 31 + 34 + 28 + 32 + 29}{8} = \frac{240}{8} = 30 \]
(b) Find the Sample Variance \(s^2\)
\[ s^2 = \frac{\sum_{i=1}^{8} (x_i - \bar{x})^2}{8 - 1} = \frac{4^2 + (-1)^2 + (-7)^2 + 1^2 + 4^2 + (-2)^2 + 2^2 + (-1)^2}{7} = \frac{92}{7} \]
(c) Find the Sample Standard Deviation \(s\)
\[ s = \sqrt{s^2} = \sqrt{\frac{92}{7}} = \frac{2\sqrt{161}}{7} \approx 3.6253 \]
Example 1 (Sample Mean and Sample Variance)
In a bag there are 3 balls of the same size, made of the same material. One
ball is colored white and the other two balls are colored black. Billy randomly chooses
a ball from the bag. If the ball chosen is white, he gains $10, while if the ball chosen
is black, he loses $2. Use X (in dollars) to denote the payoff of the game (single round).
(a) Find \(E(X)\)
\[ E(X) = 10\left(\frac{1}{3}\right) + (-2)\left(\frac{2}{3}\right) = \frac{6}{3} = 2 \]
(b) Find \(Var(X)\)
\[ Var(X) = E(X^2) - (E(X))^2 = 10^2\left(\frac{1}{3}\right) + (-2)^2\left(\frac{2}{3}\right) - 2^2 = \frac{108}{3} - 4 = 32 \]
Suppose Billy will play the game twice (choose one ball with replacement each time).
Use \(\bar{X}\) to denote the average payoff of the two rounds.
(c) Complete the following table.
Outcome                                                    WW     WB     BW     BB
Prob. of outcome                                           1/9    2/9    2/9    4/9
Round 1 payoff x1                                          10     10     -2     -2
Round 2 payoff x2                                          10     -2     10     -2
Sample Mean, x̄ = (x1 + x2)/2                               10     4      4      -2
Sample Variance, s² = (1/(2-1)) Σ_{i=1}^{2} (xi - x̄)²      0      72     72     0
(d) Find \(E(\bar{X})\)
\[ E(\bar{X}) = \frac{1}{9}(10) + \frac{2}{9}(4) + \frac{2}{9}(4) + \frac{4}{9}(-2) = \frac{18}{9} = 2 \]
(e) Find \(Var(\bar{X})\)
\[ Var(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \left[ \frac{1}{9}(10^2) + \frac{2}{9}(4^2) + \frac{2}{9}(4^2) + \frac{4}{9}(-2)^2 \right] - 2^2 = \frac{100 + 32 + 32 + 16}{9} - 4 = 20 - 4 = 16 \]
(f) Find \(E(s^2)\)
\[ E(s^2) = \frac{1}{9}(0) + \frac{2}{9}(72) + \frac{2}{9}(72) + \frac{4}{9}(0) = 32 \]
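The table-based calculations in Example 1 can be cross-checked by enumerating every outcome with exact fractions. The following is a minimal sketch, not part of the original notes; the names `PAYOFFS` and `sampling_summary` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Single-round payoff distribution from Example 1:
# white ball (prob 1/3) pays 10, black ball (prob 2/3) pays -2.
PAYOFFS = {10: Fraction(1, 3), -2: Fraction(2, 3)}


def sampling_summary(n_rounds):
    """Return (E[sample mean], Var[sample mean], E[s^2]) for n_rounds >= 2
    independent rounds, by summing over all possible outcomes."""
    e_mean = Fraction(0)
    e_mean_sq = Fraction(0)
    e_s2 = Fraction(0)
    for outcome in product(PAYOFFS, repeat=n_rounds):
        # Probability of this ordered outcome (rounds are independent).
        prob = Fraction(1)
        for payoff in outcome:
            prob *= PAYOFFS[payoff]
        # Sample mean and sample variance (divisor n - 1) of this outcome.
        m = Fraction(sum(outcome), n_rounds)
        s2 = sum((payoff - m) ** 2 for payoff in outcome) / (n_rounds - 1)
        e_mean += prob * m
        e_mean_sq += prob * m * m
        e_s2 += prob * s2
    return e_mean, e_mean_sq - e_mean ** 2, e_s2


print(sampling_summary(2))
```

For two rounds this reproduces \(E(\bar{X}) = 2\), \(Var(\bar{X}) = 16\) and \(E(s^2) = 32\). Calling `sampling_summary(3)` handles the three-round game in the same way.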
Exercise 1
In Example 1, suppose Billy will play the game 3 times (choose one ball with replacement each time). Use \(\bar{M}\) to denote the average payoff of the 3 rounds.
(g) Complete the following table.
Outcome                                                    www    wwb    wbw    bww    wbb    bwb    bbw    bbb
Prob. of outcome                                           1/27   2/27   2/27   2/27   4/27   4/27   4/27   8/27
Round 1 payoff m1                                          10     10     10     -2     10     -2     -2     -2
Round 2 payoff m2                                          10     10     -2     10     -2     10     -2     -2
Round 3 payoff m3                                          10     -2     10     10     -2     -2     10     -2
Sample Mean, m̄ = (m1 + m2 + m3)/3                          10     6      6      6      2      2      2      -2
Sample Variance, s² = (1/(3-1)) Σ_{i=1}^{3} (mi - m̄)²      0      48     48     48     48     48     48     0
(h) Find \(E(\bar{M})\)
\[ E(\bar{M}) = \frac{1}{27}(10) + 3\left[\frac{2}{27}(6)\right] + 3\left[\frac{4}{27}(2)\right] + \frac{8}{27}(-2) = \frac{10 + 36 + 24 - 16}{27} = \frac{54}{27} = 2 \]
(i) Find \(Var(\bar{M})\)
\[ Var(\bar{M}) = E(\bar{M}^2) - (E(\bar{M}))^2 = \left\{ \frac{1}{27}(10^2) + 3\left[\frac{2}{27}(6^2)\right] + 3\left[\frac{4}{27}(2^2)\right] + \frac{8}{27}(-2)^2 \right\} - 2^2 = \frac{100 + 216 + 48 + 32}{27} - 4 = \frac{44 - 12}{3} = \frac{32}{3} \]
(j) Find \(E(s^2)\)
\[ E(s^2) = \frac{1}{27}(0) + 3\left[\frac{2}{27}(48)\right] + 3\left[\frac{4}{27}(48)\right] + \frac{8}{27}(0) = \frac{3(2 + 4)48}{27} = 32 \]
Example 2 (Central Limit Theorem)
Suppose that the amount of time a staff member at the Hang Seng Bank HKUST Branch spends serving
a customer is a random variable X with expectation E(X) = 3.2 minutes and
standard deviation \(\sigma_X = 1.6\) minutes.
If a random sample of 64 customers is to be observed, find the probability that their
mean time spent at the bank counter is at least 3.2 minutes but no more than 3.4
minutes.
\[ E(\bar{X}) = E(X) = 3.2, \qquad \sigma_{\bar{X}} = \sqrt{Var(\bar{X})} = \sqrt{\frac{Var(X)}{n}} = \sqrt{\frac{1.6^2}{64}} = \frac{1.6}{\sqrt{64}} = \frac{1}{5} \]
\[ \therefore P(3.2 \le \bar{X} \le 3.4) = P\left( \frac{3.2 - 3.2}{1/5} \le \frac{\bar{X} - 3.2}{1/5} \le \frac{3.4 - 3.2}{1/5} \right) \approx P(0 \le Z \le 1) = P(Z \le 1) - P(Z \le 0) \approx 0.8413 - 0.5 = 0.3413 \]
Exercise 2 (Central Limit Theorem)
If a certain machine makes electrical resistors having a mean resistance of 40 ohms
and a standard deviation of 2 ohms, what is the probability that a random sample of
36 of these resistors will have a combined resistance of more than 1458 ohms?
Let X (in ohms) be the resistance of a resistor. Then we have:
\[ E(\bar{X}) = E(X) = 40 \quad \text{and} \quad \sigma_{\bar{X}} = \sqrt{Var(\bar{X})} = \sqrt{\frac{1}{n} Var(X)} = \sqrt{\frac{1}{36}(2)^2} = \frac{1}{3} \]
\[ \therefore P(36\bar{X} > 1458) = P\left( \frac{\bar{X} - 40}{1/3} > \frac{(1458/36) - 40}{1/3} \right) \approx P(Z > 3(40.5 - 40)) = 1 - P(Z \le 1.5) \approx 1 - 0.9332 = 0.0668 \]
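The two Central Limit Theorem calculations (bank service time and combined resistance) can be checked numerically. This is a small sketch using only Python's standard library; the helper name `sample_mean_dist` is an assumption made for illustration, and the normal approximation is the one the CLT provides, not an exact distribution.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu, sigma, n):
    """Approximate (CLT) distribution of the mean of n observations
    drawn from a population with mean mu and standard deviation sigma."""
    return NormalDist(mu, sigma / sqrt(n))


# Example 2: mean service time of 64 customers lies between 3.2 and 3.4 minutes.
bank = sample_mean_dist(3.2, 1.6, 64)
p_bank = bank.cdf(3.4) - bank.cdf(3.2)

# Exercise 2: combined resistance of 36 resistors exceeds 1458 ohms,
# i.e. the sample mean exceeds 1458 / 36 = 40.5 ohms.
resistors = sample_mean_dist(40, 2, 36)
p_resistors = 1 - resistors.cdf(1458 / 36)

print(round(p_bank, 4), round(p_resistors, 4))
```

The printed values should agree with the table-based answers 0.3413 and 0.0668 to four decimal places.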
A brief summary of course materials:
Consider a random sample of size n, \(X_1, X_2, \dots, X_n\), from a common distribution
X with mean \(E(X) = \mu\) and variance \(Var(X) = \sigma^2\).
(1) If \(\theta\) is the underlying population parameter, then any function
\(\hat{\theta}(X_1, X_2, \dots, X_n)\) can be considered as an estimator of \(\theta\). \(\hat{\theta}\) is called
an unbiased estimator of \(\theta\) if \(E(\hat{\theta}) = \theta\).
(2) The Sample Mean \(\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i\) is an unbiased estimator of the population
mean \(\mu\). That is, \(E(\bar{X}) = \mu\).
(3) The Sample Variance \(S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\) is an unbiased estimator of the
population variance \(\sigma^2\). That is, \(E(S^2) = \sigma^2\).
(4) For the Sample Mean \(\bar{X}\): \(E(\bar{X}) = \mu\) and \(Var(\bar{X}) = \frac{\sigma^2}{n}\).
(5) If the samples are drawn from a normal population with mean \(\mu\) and variance
\(\sigma^2\), then the sample mean also follows a normal distribution: \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\).
(6) Central Limit Theorem (CLT): Consider a random sample \(X_1, X_2, \dots, X_n\)
of size n from any common distribution X with mean \(\mu\) and variance \(\sigma^2\);
it holds that \(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \to N(0, 1)\) as \(n \to \infty\) (practically for \(n \ge 30\)).
Example 3 (Distribution of Sample Mean)
The weight of a randomly selected can of soft drink STA is known to have a normal
distribution with mean 304g and a standard deviation of 2g.
Let X (in g) be the weight of a can of the soft drink.
(a) What is the probability that a random can has weight between 300g and 308g?
\[ \text{Required probability} = P(300 < X < 308) = P\left( \frac{300 - 304}{2} < \frac{X - 304}{2} < \frac{308 - 304}{2} \right) = P(-2 < Z < 2) = 0.9772 - 0.0228 = 0.9544 \]
(b) The soft drinks are randomly packed and sold in 6-can packs. What is the
largest weight (in whole grams) that can be printed on a single can so that the probability of
a customer getting an underweight 6-can pack is no more than 1%?
Let y (in g) be the printable weight so that the probability of any 6-can pack being
underweight is no more than 1%.
With \(\bar{X} \sim N\left(304, \frac{2^2}{6}\right) = N\left(304, \frac{2}{3}\right)\), we need the largest integer y satisfying
\(P(\bar{X} < y - 0.5) \le 1\%\), where the 0.5 g offset accounts for the printed weight being a whole number of grams.
\[ P\left( \frac{\bar{X} - 304}{2/\sqrt{6}} < \frac{y - 0.5 - 304}{2/\sqrt{6}} \right) \le 0.01 \approx P(Z < -2.33) \]
\[ \because \frac{\bar{X} - 304}{2/\sqrt{6}} \sim Z, \quad \therefore \sqrt{1.5}\,(y - 304.5) \le -2.33 \]
\[ y \le 304.5 + \frac{-2.33}{\sqrt{1.5}} \approx 302.60 \]
∴ the largest integer required is 302.
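Both parts of Example 3 can be reproduced with the standard library's normal distribution. The sketch below follows the working above, including the 0.5 g offset; the variable names are illustrative only.

```python
from math import floor, sqrt
from statistics import NormalDist

# (a) A single can: X ~ N(304, 2^2).
can = NormalDist(mu=304, sigma=2)
p_between = can.cdf(308) - can.cdf(300)

# (b) Mean weight of a 6-can pack: N(304, 2^2 / 6).
pack_mean = NormalDist(mu=304, sigma=2 / sqrt(6))
# Largest y with P(mean < y - 0.5) <= 0.01, i.e. y - 0.5 <= 1% quantile.
bound = pack_mean.inv_cdf(0.01) + 0.5
largest_label = floor(bound)

print(round(p_between, 4), round(bound, 2), largest_label)
```

The exact quantile differs slightly from the rounded table value -2.33, so `bound` is close to, but not exactly, the hand-computed figure; the largest integer is unaffected.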
* Amendments made at noon on 9th August are highlighted in red.
Example 4 (Unbiasedness of Estimator, 2008 Fall Final)
Let \(X_1, X_2, \dots, X_n\) be a random sample of size n drawn from a normal distribution
\(N(0, \sigma^2)\). Define \(W = \frac{1}{n} \sum_{i=1}^{n} X_i^2\). Prove that W is an unbiased estimator for \(\sigma^2\).
\[ E(W) = E\left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i^2) = \frac{1}{n} \sum_{i=1}^{n} \left\{ Var(X_i) + [E(X_i)]^2 \right\} \]
\[ = \frac{1}{n} \sum_{i=1}^{n} [\sigma^2 + (0)^2] = \frac{1}{n}(n\sigma^2) = \sigma^2 \]
∴ W is an unbiased estimator for \(\sigma^2\).
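A quick simulation can make the unbiasedness of W tangible: averaging W over many independent samples should land close to \(\sigma^2\). This is only an illustrative sketch; the function name, seed, and the particular values of \(\sigma\), n and the number of trials are arbitrary choices, and a simulation supports the proof rather than replacing it.

```python
import random


def simulate_mean_of_w(sigma, n, trials, seed=0):
    """Average W = (1/n) * sum(X_i^2) over many samples of size n
    drawn from N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0, sigma) for _ in range(n)]
        total += sum(x * x for x in sample) / n
    return total / trials


sigma = 2.0
avg_w = simulate_mean_of_w(sigma=sigma, n=5, trials=20000)
print(avg_w, sigma ** 2)  # the two numbers should be close
```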
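Exercise 3 (Unbiasedness of Estimator, 2012 Spring Final)
Consider a random variable \(X \sim Bin(n, \theta)\), where n is known but \(\theta\) is unknown. For
a function \(g(t) = nt(1 - t)\), is \(g\left(\frac{X}{n}\right)\) an unbiased estimator of \(g(\theta)\)?
\[ E\left[ g\left(\frac{X}{n}\right) \right] = E\left[ n \cdot \frac{X}{n} \left( 1 - \frac{X}{n} \right) \right] = E(X) - \frac{E(X^2)}{n} \]
\[ = n\theta - \frac{1}{n}\left[ Var(X) + (E(X))^2 \right] = n\theta - \frac{n\theta(1 - \theta) + (n\theta)^2}{n} = \theta(1 - \theta)(n - 1) \]
while \(g(\theta) = n\theta(1 - \theta) \ne E\left[ g\left(\frac{X}{n}\right) \right]\).
Hence \(g\left(\frac{X}{n}\right)\) is a biased estimator of \(g(\theta)\).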
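The bias found in Exercise 3 can be confirmed exactly for particular values by summing over the binomial probability mass function with rational arithmetic. The sketch below assumes the illustrative values n = 10 and \(\theta = 3/10\); the function name `expected_g` is made up for this example.

```python
from fractions import Fraction
from math import comb


def expected_g(n, theta):
    """Exact E[g(X/n)] for X ~ Bin(n, theta), with g(t) = n * t * (1 - t)."""
    total = Fraction(0)
    for x in range(n + 1):
        pmf = comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        t = Fraction(x, n)
        total += pmf * n * t * (1 - t)
    return total


n, theta = 10, Fraction(3, 10)
lhs = expected_g(n, theta)              # E[g(X/n)]
derived = (n - 1) * theta * (1 - theta)  # theta(1 - theta)(n - 1)
target = n * theta * (1 - theta)         # g(theta)
print(lhs, derived, target)
```

Here `lhs` equals `derived` but not `target`, matching the conclusion that the estimator is biased.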
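A brief summary of course materials:
(1) The \(\chi^2\) and T distributions are defined as follows:
\(\chi^2\) Distribution, \(\chi^2_k\)
Definition: If \(Z_1, Z_2, \dots, Z_k \overset{iid}{\sim} N(0, 1)\), then \(X^2 = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k\).
Example: \(X^2 = \frac{(n - 1)S_X^2}{\sigma_X^2} \sim \chi^2_{n-1}\)
T Distribution, \(t_k\)
Definition: If \(Z \sim N(0, 1)\), \(V \sim \chi^2_k\), and Z and V are independent, then \(T = \frac{Z}{\sqrt{V/k}} \sim t_k\).
Example: \(T_{n-1} = \frac{\sqrt{n}(\bar{X} - \mu_X)}{S_X} \sim t_{n-1}\)
(2) Let \(\alpha\) be a number with \(0 < \alpha < 1\).
For \(Z \sim N(0, 1)\), \(T \sim t_k\), \(X^2 \sim \chi^2_k\), define the quantities \(z_\alpha\), \(t_{k,\alpha}\), \(\chi^2_{k,\alpha}\) to be
numbers such that \(\alpha = P(Z > z_\alpha) = P(T > t_{k,\alpha}) = P(X^2 > \chi^2_{k,\alpha})\).
(3) \(0 < \alpha < 1\) also gives the symmetric \(1 - \alpha\) confidence intervals by:
Condition | Parameter \(\theta\) | Estimator \(\hat{\theta}\) | Lower Bound \(\hat{\theta}_L\) | Upper Bound \(\hat{\theta}_U\)
\(\sigma_X\) is known | \(\mu_X\) | \(\bar{X}\) | \(\bar{x} - z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}}\) | \(\bar{x} + z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}}\)
\(\sigma_X\) is unknown | \(\mu_X\) | \(\bar{X}\) | \(\bar{x} - t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}}\) | \(\bar{x} + t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}}\)
\(\mu_X\) is unknown | \(\sigma_X^2\) | \(S_X^2\) | \(\frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}}\) | \(\frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}}\)
Example 5 (C.I. for Mean with Known Variance)
Many cardiac patients wear implanted pacemakers to control their heartbeat. A
plastic connector module mounts on the top of the pacemaker. Assuming a standard
deviation of 0.0015 and a normal distribution, find a 95% confidence interval for
the mean of all connector modules made by a certain manufacturing company, if a
random sample of 75 modules is collected and has an average of 0.310 inch.
Let X be the length of connector modules made by the company.
\(\bar{X}\), the sample mean of n = 75 modules, is used to estimate the population mean \(\mu_X\).
The population standard deviation is known to be \(\sigma_X = 0.0015\).
A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
∵ \(\bar{x} = 0.31\), \(z_{\alpha/2} = z_{0.025} = 1.960\), \(\sigma_X = 0.0015\), \(n = 75\)
So the required interval is
\[ \left[ \bar{x} - z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}} \right] = \left[ 0.31 - 1.96 \cdot \frac{0.0015}{\sqrt{75}},\ 0.31 + 1.96 \cdot \frac{0.0015}{\sqrt{75}} \right] = [0.3097, 0.3103] \]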
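The known-variance interval from Example 5 (pacemaker connector modules, n = 75, \(\sigma_X = 0.0015\)) can be computed directly, with the normal quantile taken from the standard library instead of a table. A minimal sketch; `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for a mean when the population
    standard deviation sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.960 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


lo, hi = z_interval(xbar=0.310, sigma=0.0015, n=75, confidence=0.95)
print(round(lo, 4), round(hi, 4))
```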
Example 6 (C.I. for Mean with Unknown Variance)
Regular consumption of pre-sweetened cereals contributes to tooth decay, heart disease, and other degenerative diseases according to studies conducted by Dr. W.H. Bowen
of the National Institutes of Health and Dr. J. Yudben, Professor of Nutrition and
Dietetics at the University of London. In a random sample of 20 similar single servings
of Alpha-Bits, the average sugar content was 11.3 grams with a standard deviation of
2.45 grams.
Assuming that the sugar contents are normally distributed, construct a 95% confidence interval for the mean sugar content for single servings of Alpha-Bits.
Let X be the amount of sugar content for single servings.
\(\bar{X}\), the sample mean of n = 20 servings, is used to estimate the population mean \(\mu_X\).
The population standard deviation \(\sigma_X\) is unknown.
The sample standard deviation is \(s_X = 2.45\).
A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
\(\bar{x} = 11.3\), \(t_{n-1,\alpha/2} = t_{19,0.025} = 2.093\)
So the required interval is
\[ \left[ \bar{x} - t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}} \right] = \left[ 11.3 - 2.093 \cdot \frac{2.45}{\sqrt{20}},\ 11.3 + 2.093 \cdot \frac{2.45}{\sqrt{20}} \right] = [10.153, 12.447] \]
Example 7 (C.I. for Variance with Unknown Mean)
A sample of 7 boxes of a certain type of cereal with a nominal weight of 750g had the
following weights:
775, 780, 781, 795, 803, 810, 823
Find a 95% confidence interval for \(\sigma^2\).
\(n = 7\), \(\bar{x} = 795.3\), \(s_X^2 = 315.5714\), \(\alpha = 1 - 95\% = 0.05\)
\(\chi^2_{n-1,\alpha/2} = \chi^2_{6,0.025} = 14.45\), \(\chi^2_{n-1,1-\alpha/2} = \chi^2_{6,0.975} = 1.237\)
∴ The required interval is
\[ \left( \frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}} \right) = \left( \frac{(7 - 1)(315.5714)}{14.45},\ \frac{(7 - 1)(315.5714)}{1.237} \right) = (131.04, 1530.24) \]
So the 95% confidence interval for \(\sigma\) is (11.4, 39.1).
Exercise 4
Based on the given information in Example 6, now 30 more similar single servings
are randomly drawn. Among these 30 new samples, which are assumed to be normally
distributed, the average sugar content was 11.3 grams with a standard deviation of
1.96 grams.
With all the information you have, construct a 95% confidence interval for the mean sugar
content for single servings of Alpha-Bits.
Let Y be the amount of sugar content for single servings.
\(\bar{Y}\), the sample mean of m = 50 servings, is used to estimate the population mean \(\mu_Y\).
The population standard deviation \(\sigma_Y\) is unknown.
Both samples have the same mean 11.3, so the combined mean is \(\bar{y} = 11.3\) and the combined sum of squared deviations is the sum of the two within-sample sums. The sample standard deviation is therefore
\[ s_Y = \sqrt{\frac{19(2.45)^2 + 29(1.96)^2}{(20 + 30) - 1}} \approx 2.145 \]
A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
\(\bar{y} = 11.3\), \(t_{m-1,\alpha/2} = t_{49,0.025} = 2.01\)
So the required interval is
\[ \left[ \bar{y} - t_{m-1,\alpha/2} \frac{s_Y}{\sqrt{m}},\ \bar{y} + t_{m-1,\alpha/2} \frac{s_Y}{\sqrt{m}} \right] = \left[ 11.3 - 2.01 \cdot \frac{2.145}{\sqrt{50}},\ 11.3 + 2.01 \cdot \frac{2.145}{\sqrt{50}} \right] \approx [10.690, 11.910] \]
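The variance interval of Example 7 (cereal box weights) can be recomputed from the raw data. The sketch below takes the \(\chi^2\) critical values as inputs read from a table, because the Python standard library has no chi-square quantile function; `variance_interval` is an illustrative name.

```python
from math import sqrt
from statistics import variance


def variance_interval(data, chi2_upper, chi2_lower):
    """Confidence interval for a population variance from normal data.
    chi2_upper is chi^2_{n-1, alpha/2}; chi2_lower is chi^2_{n-1, 1-alpha/2}."""
    n = len(data)
    s2 = variance(data)  # sample variance with divisor n - 1
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


weights = [775, 780, 781, 795, 803, 810, 823]
# Table values for 6 degrees of freedom at the 95% level.
lo, hi = variance_interval(weights, chi2_upper=14.45, chi2_lower=1.237)
print(round(variance(weights), 4), round(lo, 2), round(hi, 2))
print(round(sqrt(lo), 1), round(sqrt(hi), 1))  # interval for sigma
```

With the critical values rounded as shown, the endpoints can differ from the quoted ones in the second decimal place or so, since the result is sensitive to how many digits of the table values are kept.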
Exercise 5
Each year in a university, 200 students are randomly invited to fill in a feedback questionnaire when they finish their first year of studies. However, only 5% of the forms
are finally collected. The responses are analysed and it is found that the mean
score for teaching staff is 88 with a standard deviation of 5, while the mean score for
facilities is 70 with a standard deviation of 10.
Assume that both scores are normally distributed. Find a 90% confidence interval
for:
Let X be the scores of teaching staff and Y be the scores of facilities.
Then we have \(n = 200 \times 5\% = 10\), \(\bar{x} = 88\), \(s_X = 5\), \(\bar{y} = 70\), \(s_Y = 10\)
(\(\alpha = 1 - 90\% = 0.1\), \(t_{n-1,\alpha/2} = t_{9,0.05} = 1.833\))
(a) the mean scores of teaching staff;
\[ \text{Required interval} = \left[ \bar{x} - t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2} \frac{s_X}{\sqrt{n}} \right] = \left[ 88 - 1.833 \cdot \frac{5}{\sqrt{10}},\ 88 + 1.833 \cdot \frac{5}{\sqrt{10}} \right] = [85.1018, 90.8982] \]
(b) the mean scores of facilities;
\[ \text{Required interval} = \left[ \bar{y} - t_{n-1,\alpha/2} \frac{s_Y}{\sqrt{n}},\ \bar{y} + t_{n-1,\alpha/2} \frac{s_Y}{\sqrt{n}} \right] = \left[ 70 - 1.833 \cdot \frac{10}{\sqrt{10}},\ 70 + 1.833 \cdot \frac{10}{\sqrt{10}} \right] = [64.2035, 75.7965] \]
(c) the mean scores of facilities, if it is given that the standard deviations in the
previous years were also 10;
\[ \text{Required interval} = \left[ \bar{y} - z_{\alpha/2} \frac{\sigma_Y}{\sqrt{n}},\ \bar{y} + z_{\alpha/2} \frac{\sigma_Y}{\sqrt{n}} \right] = \left[ 70 - z_{0.05} \cdot \frac{10}{\sqrt{10}},\ 70 + z_{0.05} \cdot \frac{10}{\sqrt{10}} \right] \]
\[ = [70 - (1.645)(\sqrt{10}),\ 70 + (1.645)(\sqrt{10})] = [64.7981, 75.2019] \]
(d) the mean scores of teaching staff, if it is given that the standard deviations in the
previous years were 6 instead of 5;
\[ \text{Required interval} = \left[ \bar{x} - z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}} \right] = \left[ 88 - z_{0.05} \cdot \frac{6}{\sqrt{10}},\ 88 + z_{0.05} \cdot \frac{6}{\sqrt{10}} \right] \]
\[ = \left[ 88 - (1.645) \frac{6}{\sqrt{10}},\ 88 + (1.645) \frac{6}{\sqrt{10}} \right] = [84.8788, 91.1212] \]
(e) the variance of the scores of teaching staff; and
\[ \text{Required interval} = \left[ \frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}} \right] = \left[ \frac{(10 - 1)(5)^2}{\chi^2_{9,0.05}},\ \frac{(10 - 1)(5)^2}{\chi^2_{9,0.95}} \right] = \left[ \frac{225}{16.919},\ \frac{225}{3.325} \right] = [13.2987, 67.6692] \]
(f) the variance of the scores of facilities.
\[ \text{Required interval} = \left[ \frac{(n - 1)s_Y^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_Y^2}{\chi^2_{n-1,1-\alpha/2}} \right] = \left[ \frac{(10 - 1)(10)^2}{\chi^2_{9,0.05}},\ \frac{(10 - 1)(10)^2}{\chi^2_{9,0.95}} \right] = \left[ \frac{900}{16.919},\ \frac{900}{3.325} \right] = [53.1946, 270.6767] \]
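The mean intervals in Exercise 5 all share the form "estimate ∓ critical value × standard deviation / √n", differing only in which critical value and which standard deviation are plugged in. The helper below is a sketch under that observation; `mean_interval` is a hypothetical name, and the critical values are the table values quoted in the working (1.833 for \(t_{9,0.05}\), 1.645 for \(z_{0.05}\)).

```python
from math import sqrt


def mean_interval(estimate, sd, n, critical):
    """Interval estimate -/+ critical * sd / sqrt(n).
    Pass a t critical value with the sample sd, or a z critical value
    with a known population sd."""
    half_width = critical * sd / sqrt(n)
    return estimate - half_width, estimate + half_width


n = 10
staff_t = mean_interval(88, 5, n, critical=1.833)        # part (a)
facilities_t = mean_interval(70, 10, n, critical=1.833)  # part (b)
facilities_z = mean_interval(70, 10, n, critical=1.645)  # part (c)
staff_z = mean_interval(88, 6, n, critical=1.645)        # part (d)

for interval in (staff_t, facilities_t, facilities_z, staff_z):
    print([round(bound, 4) for bound in interval])
```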
(Answers will be available at http://ihome.ust.hk/~makittylee)