MATH2411
Applied Statistics
Tutorial Notes 4
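Null Hypothesis: $H_0: \mu_X = \mu_0$    Condition: $\sigma_X$ is known
Test Statistic: $z_0 = \dfrac{\bar{x} - \mu_0}{\sigma_X/\sqrt{n}}$
Alternative Hypothesis → Rejection Criterion:
$H_1: \mu_X \neq \mu_0$ → $|z_0| > z_{\alpha/2}$
$H_1: \mu_X > \mu_0$ → $z_0 > z_{\alpha}$
$H_1: \mu_X < \mu_0$ → $z_0 < -z_{\alpha}$

Null Hypothesis: $H_0: \mu_X = \mu_0$    Condition: $\sigma_X$ is unknown
Test Statistic: $t_0 = \dfrac{\bar{x} - \mu_0}{s_X/\sqrt{n}}$
Alternative Hypothesis → Rejection Criterion:
$H_1: \mu_X \neq \mu_0$ → $|t_0| > t_{n-1,\alpha/2}$
$H_1: \mu_X > \mu_0$ → $t_0 > t_{n-1,\alpha}$
$H_1: \mu_X < \mu_0$ → $t_0 < -t_{n-1,\alpha}$

Null Hypothesis: $H_0: \sigma_X^2 = \sigma_0^2$    Condition: $\mu_X$ is unknown
Test Statistic: $\chi^2_0 = \dfrac{(n-1)s_X^2}{\sigma_0^2}$
Alternative Hypothesis → Rejection Criterion:
$H_1: \sigma_X^2 > \sigma_0^2$ → $\chi^2_0 > \chi^2_{n-1,\alpha}$
$H_1: \sigma_X^2 \neq \sigma_0^2$ → $\chi^2_0 > \chi^2_{n-1,\alpha/2}$ or $\chi^2_0 < \chi^2_{n-1,1-\alpha/2}$
$H_1: \sigma_X^2 < \sigma_0^2$ → $\chi^2_0 < \chi^2_{n-1,1-\alpha}$

Here $z_\alpha$, $t_{n-1,\alpha}$ and $\chi^2_{n-1,\alpha}$ denote upper-$\alpha$ points, i.e. the area to their right is $\alpha$.

Warm-up (Distribution of Sample Mean)
The random variable X, representing the number of cherries in a cherry puff, has the following probability distribution:

x          4    5    6    7
P(X = x)   0.2  0.4  0.3  0.1

(a) Find the expectation E(X) and the variance Var(X).
$$E(X) = 4(0.2) + 5(0.4) + 6(0.3) + 7(0.1) = 0.8 + 2 + 1.8 + 0.7 = 5.3$$
$$\begin{aligned} \mathrm{Var}(X) &= E(X^2) - (E(X))^2 \\ &= 4^2(0.2) + 5^2(0.4) + 6^2(0.3) + 7^2(0.1) - 5.3^2 \\ &= (3.2 + 10 + 10.8 + 4.9) - 28.09 \\ &= 0.81 \end{aligned}$$

(b) Suppose 36 cherry puffs are to be randomly selected, and let $\bar{X}$ denote the sample mean (the average number of cherries in the 36 puffs). Find the mean $E(\bar{X})$ and the variance $\mathrm{Var}(\bar{X})$.
$$E(\bar{X}) = E(X) = 5.3, \qquad \mathrm{Var}(\bar{X}) = \frac{1}{36}\mathrm{Var}(X) = \frac{1}{36}(0.81) = \frac{9}{400}$$

(c) Find the probability that the average number of cherries in 36 cherry puffs will be less than 5.5.
$$\sigma_{\bar{X}} = \sqrt{\frac{9}{400}} = \frac{3}{20} = 0.15$$
Since n = 36 is large, by the central limit theorem $\bar{X}$ is approximately $N(5.3,\ 0.15^2)$, so
$$P(\bar{X} < 5.5) = P\left(\frac{\bar{X} - 5.3}{0.15} < \frac{5.5 - 5.3}{0.15}\right) \approx P\left(Z < \frac{4}{3}\right) \approx 0.9082$$
(using the normal table value at z = 1.33).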
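The warm-up numbers can be reproduced with a short Python sketch (assuming Python 3.8+ for `statistics.NormalDist`). Exact fractions are used for E(X) and Var(X); the last step is the same central-limit normal approximation, but evaluated at z = 4/3 exactly rather than at the table value z = 1.33, so it gives about 0.9088 instead of 0.9082.

```python
import math
from fractions import Fraction
from statistics import NormalDist

# Distribution of X (number of cherries in one puff), from the warm-up table
pmf = {4: Fraction(2, 10), 5: Fraction(4, 10), 6: Fraction(3, 10), 7: Fraction(1, 10)}

mean = sum(x * p for x, p in pmf.items())                   # E(X)
var = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2   # E(X^2) - (E(X))^2

n = 36
var_xbar = var / n               # Var(X-bar) = Var(X) / n
sd_xbar = math.sqrt(var_xbar)    # standard deviation of X-bar

# Normal approximation (CLT): P(X-bar < 5.5) ~ P(Z < (5.5 - 5.3) / 0.15)
z = (5.5 - float(mean)) / sd_xbar
prob = NormalDist().cdf(z)

print(mean, var, var_xbar)  # 53/10 81/100 9/400
print(round(sd_xbar, 2), round(z, 4), round(prob, 4))
```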
Example 1 (Test for population mean)
The breaking strength of a fiber used in manufacturing cloth is required to be at least
160 psi. Past experience has indicated that the standard deviation of the breaking
strength is 3 psi. A random sample of 40 specimens from a certain batch is tested
and the average breaking strength is found to be 159.8 psi. For α = 0.05, should this
batch be judged acceptable or not?
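Let X be the breaking strength (in psi) of the fiber.
Then $\sigma_X = 3$, $n = 40$, $\bar{x} = 159.8$, $\alpha = 0.05$ and $z_\alpha = z_{0.05} = 1.645$.
$$\begin{cases} H_0: \mu_X = 160 \\ H_1: \mu_X < 160 \end{cases}$$
$$\therefore z_0 = \frac{159.8 - 160}{3/\sqrt{40}} = -0.421637 > -1.645 = -z_{0.05},$$
hence we do not reject $H_0$ at the 0.05 significance level, based on the given observations; that is, the batch can be judged acceptable.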
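A minimal Python sketch of the left-tailed z-test in Example 1 (assuming Python 3.8+; `z_test_mean` is a hypothetical helper name, not from the notes). The critical value comes from `NormalDist().inv_cdf` instead of the normal table.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n):
    """z statistic for H0: mu = mu0 when the population sigma is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z0 = z_test_mean(xbar=159.8, mu0=160, sigma=3, n=40)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper 5% point, about 1.645

# Left-tailed test (H1: mu < 160): reject H0 only if z0 < -z_alpha
reject = z0 < -z_alpha
print(round(z0, 6), round(z_alpha, 3), reject)
```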
Example 2 (Test for population standard deviation)
A soft-drink dispensing machine is said to be out of control if the standard deviation
of the contents exceeds 15 ml. If a random sample of 25 drinks from this machine has a
sample standard deviation of 20.3 ml, does this indicate at the 0.05 level of significance
that the machine is out of control? Assume that the contents are normally distributed.
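Let X be the contents (in ml) of one drink from the machine.
Then $s_X = 20.3$, $n = 25$, $\alpha = 0.05$ and $\chi^2_{n-1,\alpha} = \chi^2_{24,0.05} = 36.415$.
$$\begin{cases} H_0: \sigma_X = 15 \quad (\sigma_X^2 = 225) \\ H_1: \sigma_X > 15 \quad (\sigma_X^2 > 225) \end{cases}$$
$$\chi^2_0 = \frac{24 \cdot 20.3^2}{225} = 43.956 > \chi^2_{24,0.05}.$$
Hence we have strong enough evidence, at the 0.05 significance level, to reject $H_0$ based on the given observations; that is, the machine is out of control.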
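The chi-square test of Example 2 as a short Python sketch (`chi2_test_variance` is a hypothetical helper). The standard library has no chi-square quantile function, so the critical value 36.415 is copied from the table used in the notes; with SciPy installed, `scipy.stats.chi2.ppf(0.95, 24)` would compute it instead.

```python
def chi2_test_variance(s, sigma0, n):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    return (n - 1) * s ** 2 / sigma0 ** 2

chi2_0 = chi2_test_variance(s=20.3, sigma0=15, n=25)

# Upper 5% point of chi-square with 24 d.f., taken from the statistical table
chi2_crit = 36.415

# Right-tailed test (H1: sigma > 15): reject H0 if the statistic exceeds the critical value
reject = chi2_0 > chi2_crit
print(round(chi2_0, 3), reject)
```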
Exercise 1
Assume that the yield of alfalfa (in tons per acre) has a normal distribution with mean 1.5 and variance 0.09. It is hoped that a new fertilizer will increase the average yield. We shall perform a one-sided (right-tailed) test of $H_0: \mu_X = 1.5$, where $\mu_X$ is the population mean of the yield with the new fertilizer.
Assume that the population is still normal and the variance continues to equal 0.09 (so $\sigma_X = 0.3$) with the new fertilizer. Determine the unknown sample size n and the critical value c so that the Type I error probability is 0.05 and the power of the test at $\mu_X = 1.7$ is 0.95.

The test rejects $H_0$ when $\bar{X} > c$:
$$\begin{cases} H_0: \mu_X = 1.5 \\ H_1: \mu_X = 1.7 \end{cases} \qquad \begin{cases} \text{Type I error} \Rightarrow \alpha = P(\bar{X} > c \mid H_0) \\ \text{power} \Rightarrow 1 - \beta = P(\bar{X} > c \mid H_1) \end{cases}$$
Type I error:
$$P\left(\frac{\bar{X} - 1.5}{0.3/\sqrt{n}} > \frac{c - 1.5}{0.3/\sqrt{n}}\right) = 0.05 = P(Z > 1.645) \implies \frac{c - 1.5}{0.3/\sqrt{n}} = 1.645$$
Power:
$$P\left(\frac{\bar{X} - 1.7}{0.3/\sqrt{n}} > \frac{c - 1.7}{0.3/\sqrt{n}}\right) = 0.95 = P(Z > -1.645) \implies \frac{c - 1.7}{0.3/\sqrt{n}} = -1.645$$
So
$$c - 1.5 = 1.645\,\frac{0.3}{\sqrt{n}} = -(c - 1.7) \implies c = \frac{1.5 + 1.7}{2} = 1.6$$
and hence
$$\sqrt{n} = \frac{1.645(0.3)}{1.6 - 1.5} = 4.935 \implies n = 4.935^2 = 24.354225 \approx 24$$
∴ c = 1.6 and n = 24.
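The two equations for c and n can be checked numerically with the sketch below (assuming Python 3.8+ for `statistics.NormalDist`; `error_rates` is a hypothetical helper). Because it uses the exact quantile 1.64485... rather than 1.645, it gives n ≈ 24.35. The loop also shows the effect of rounding n with c = 1.6: n = 24 gives a Type I error slightly above 0.05 and power slightly below 0.95, while n = 25 gives roughly 0.048 and 0.952.

```python
import math
from statistics import NormalDist

Z = NormalDist()                     # standard normal distribution
mu0, mu1, sigma = 1.5, 1.7, math.sqrt(0.09)
z = Z.inv_cdf(0.95)                  # z_{0.05}, about 1.645

# Solve (c - mu0)/(sigma/sqrt(n)) = z and (c - mu1)/(sigma/sqrt(n)) = -z
c = (mu0 + mu1) / 2
n_exact = (z * sigma / (c - mu0)) ** 2

def error_rates(n):
    """Type I error P(X-bar > c | mu0) and power P(X-bar > c | mu1) for sample size n."""
    se = sigma / math.sqrt(n)
    alpha = 1 - Z.cdf((c - mu0) / se)
    power = 1 - Z.cdf((c - mu1) / se)
    return alpha, power

print(round(c, 4), round(n_exact, 2))
for n in (24, 25):
    alpha, power = error_rates(n)
    print(n, round(alpha, 4), round(power, 4))
```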
Let $x_1, x_2, \ldots, x_n$ be given fixed points, and let $Y_1, Y_2, \ldots, Y_n$ be the response values at $x_1, x_2, \ldots, x_n$ respectively. Under the linear assumption, we have $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the random errors $\varepsilon_i \sim N(0, \sigma^2)$ are assumed to be independent.
For the observed paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, with $\bar{x}$ and $\bar{y}$ being the means of the $x_i$ and $y_i$ respectively, we define the following:
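(1)
$$S_{XX} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{YY} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$S_{XY} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$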
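As a numerical sanity check of (1), the sketch below computes $S_{XX}$, $S_{YY}$ and $S_{XY}$ both from the deviation definitions and from the shortcut formulas, using the temperature/yield data of Example 3. The function names are hypothetical helpers, not from the notes.

```python
def sums_definitional(xs, ys):
    """S_XX, S_YY, S_XY from the deviation (definitional) formulas."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def sums_shortcut(xs, ys):
    """S_XX, S_YY, S_XY from the raw-sum (shortcut) formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sx ** 2 / n
    syy = sum(y * y for y in ys) - sy ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return sxx, syy, sxy

# Temperature (x) and yield (y) data from Example 3
xs = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
ys = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]

d = sums_definitional(xs, ys)
s = sums_shortcut(xs, ys)
print(d)
print(s)
```

The shortcut form is convenient by hand or on a calculator; in floating point, the deviation form is numerically safer when the data are large relative to their spread.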
(2)
$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{XY}}{S_{XX}}$$
is the Least Square Estimate of $\beta_1$, and $a = \bar{y} - b\bar{x}$ is the Least Square Estimate of $\beta_0$.
(3) $\hat{y} = a + bx$ is the fitted regression line, where $\hat{y}_i = a + bx_i$ is the fitted value of $Y_i$.
(4) $e_i = y_i - \hat{y}_i$ is the residual of $Y_i$.
(5)
$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{S_{YY} - b\,S_{XY}}{n-2}$$
is called the mean square error (MSE).
(6)
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
is the Least Square Estimator of $\beta_1$, and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}$ is the Least Square Estimator of $\beta_0$.
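The two expressions for $s^2$ in (5) agree, which the sketch below checks on the Example 3 data by computing the residuals from (4) directly. The value itself (about 0.903) is only an illustration; the notes do not compute $s^2$ for Example 3.

```python
# Example 3 data (temperature x, yield y)
xs = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
ys = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b = sxy / sxx          # least square estimate of beta_1, item (2)
a = ybar - b * xbar    # least square estimate of beta_0, item (2)

# e_i = y_i - y_hat_i, item (4)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

mse_from_residuals = sum(e * e for e in residuals) / (n - 2)
mse_from_sums = (syy - b * sxy) / (n - 2)
print(round(mse_from_residuals, 6), round(mse_from_sums, 6))
```

A side effect worth noticing: with an intercept in the model, the least square residuals always sum to zero.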
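(7) If σ is unknown, then the $100(1-\alpha)\%$ confidence intervals for $\beta_1$ and $\beta_0$ are
$$b \pm t_{n-2,\alpha/2}\, s\sqrt{\frac{1}{S_{XX}}} \quad\text{and}\quad a \pm t_{n-2,\alpha/2}\, s\sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n\,S_{XX}}}$$
respectively.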
(8) Given $x_{new}$, the value $\hat{y}_{new} = a + bx_{new}$ is called the predictor of $y_{new}$, and its $100(1-\alpha)\%$ prediction interval is
$$\hat{y}_{new} \pm t_{n-2,\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{XX}}}$$
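Formula (8) translates directly into a small function. In this sketch (`prediction_interval` is a hypothetical helper) the t critical value is passed in, read from a t table as in the notes; it is demonstrated with the quantities of Exercise 2 (d), where $t_{7,0.1} = 1.415$. Using the exact fractions for a and b gives $\hat{y}_{new} \approx 78.119$ instead of 78.1155 from the rounded coefficients, which does not change the interval to two decimals.

```python
import math

def prediction_interval(a, b, s2, n, xbar, sxx, x_new, t_crit):
    """100(1 - alpha)% prediction interval from (8); t_crit = t_{n-2, alpha/2} from a t table."""
    y_hat = a + b * x_new
    half_width = t_crit * math.sqrt(s2) * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width

# Exercise 2 (d): 80% interval at x_new = 85, with t_{7, 0.1} = 1.415
lo, hi = prediction_interval(a=54775 / 4541, b=3529 / 4541, s2=12051748 / 31787,
                             n=9, xbar=707 / 9, sxx=18164 / 9, x_new=85, t_crit=1.415)
print(round(lo, 2), round(hi, 2))
```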
Null Hypothesis: $H_0: \beta_1 = b_1$    Condition: σ is unknown
Test Statistic: $t_0 = \dfrac{b - b_1}{s/\sqrt{S_{XX}}}$
Alternative Hypothesis → Rejection Criterion:
$H_1: \beta_1 \neq b_1$ → $|t_0| > t_{n-2,\alpha/2}$
$H_1: \beta_1 > b_1$ → $t_0 > t_{n-2,\alpha}$
$H_1: \beta_1 < b_1$ → $t_0 < -t_{n-2,\alpha}$

Null Hypothesis: $H_0: \beta_0 = b_0$    Condition: σ is unknown
Test Statistic: $t_0 = \dfrac{a - b_0}{s\sqrt{\sum_{i=1}^{n} x_i^2 / (n\,S_{XX})}}$
Alternative Hypothesis → Rejection Criterion:
$H_1: \beta_0 \neq b_0$ → $|t_0| > t_{n-2,\alpha/2}$
$H_1: \beta_0 > b_0$ → $t_0 > t_{n-2,\alpha}$
$H_1: \beta_0 < b_0$ → $t_0 < -t_{n-2,\alpha}$

Example 3
A chemical engineer is investigating the effect of process operating temperature on product yield. The experimental results are listed below.

Temperature, x (°C)  100  110  120  130  140  150  160  170  180  190
Yield, y              45   51   54   61   66   70   74   78   85   89

(a) Plot the given data points on the coordinate paper.
Refer to the scanned answer at the link http://docs.wixstatic.com/ugd/68f78b_9f525f2ffee046cc98aeebc1b0705471.pdf

(b) Find the least square fitted linear regression line equation.
Let $y = a + bx$ be the line required. Then $\bar{x} = 145$, $\bar{y} = 67.3$,
$$S_{XX} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = 2(5^2 + 15^2 + 25^2 + 35^2 + 45^2) = 8250$$
$$S_{XY} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \left(\sum_{i=1}^{n} x_i y_i\right) - n\bar{x}\bar{y} = 101570 - (145)(673) = 3985$$
$$\therefore b = \frac{S_{XY}}{S_{XX}} = \frac{3985}{8250} = \frac{797}{1650} = 0.48\dot{3}\dot{0}$$
and
$$a = \bar{y} - b\bar{x} = \frac{673}{10} - \frac{797}{1650}(145) = -\frac{452}{165} = -2.7\dot{3}\dot{9}$$
So the line equation required is: ŷ = −2.739 + 0.4830x .
(c) Use the result in (b) to predict the yield $\hat{y}_{new}$ at the temperature $x_{new} = 122$°C.
$$\hat{y}_{new} = -2.739 + 0.4830(122) = 56.19$$
Exercise 2
In the 2013-2014 academic year, 9 students from the MATH 2411 class were sampled, and their percentage scores in the midterm (x) and final examination (y) are as follows:

x (midterm)  77  50  71  72  81  94  96  99  67
y (final)    82  66  78  34  47  85  99  99  68

(a) Find the fitted least square regression line.
Here, the sample size n is 9, so we have:
$$S_{XX} = \sum_{i=1}^{9} x_i^2 - \frac{\left(\sum_{i=1}^{9} x_i\right)^2}{9} = 57557 - \frac{707^2}{9} = \frac{18164}{9}$$
$$S_{XY} = \sum_{i=1}^{9} x_i y_i - \frac{\left(\sum_{i=1}^{9} x_i\right)\left(\sum_{i=1}^{9} y_i\right)}{9} = 53258 - \frac{707(658)}{9} = \frac{14116}{9}$$
So we have:
$$b = \frac{S_{XY}}{S_{XX}} = \frac{14116}{18164} = \frac{3529}{4541} \approx 0.7771$$
$$a = \bar{y} - b\bar{x} = \frac{658}{9} - \frac{3529}{4541}\cdot\frac{707}{9} = \frac{54775}{4541} \approx 12.062$$
∴ the fitted least square regression line is: $\hat{y} = 12.062 + 0.7771x$

(b) Evaluate the Mean Square Error (MSE) $s^2$.
$$S_{YY} = \sum_{i=1}^{9} y_i^2 - \frac{\left(\sum_{i=1}^{9} y_i\right)^2}{9} = 51980 - \frac{658^2}{9} = \frac{34856}{9}$$
so
$$s^2 = \frac{S_{YY} - b\,S_{XY}}{n-2} = \frac{\frac{34856}{9} - \frac{3529}{4541}\cdot\frac{14116}{9}}{9 - 2} = \frac{12051748}{31787} = 379.1407808$$
(c) Given that Billy got 85 in the midterm, use the result in (a) to estimate his final exam score.
Billy's estimated final score: $\hat{y}_{new} = 12.062 + 0.7771(85) = 78.1155$.
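Parts (a) to (c) can be verified with exact rational arithmetic, starting from the raw data rather than from the quoted sums. This is a sketch using only the standard library; the printed fractions should match the ones in the worked solution.

```python
from fractions import Fraction

x = [77, 50, 71, 72, 81, 94, 96, 99, 67]   # midterm scores
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]   # final exam scores
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))

# Shortcut formulas from (1), kept exact with Fraction
sxx = sum_x2 - Fraction(sum_x ** 2, n)
syy = sum_y2 - Fraction(sum_y ** 2, n)
sxy = sum_xy - Fraction(sum_x * sum_y, n)

b = sxy / sxx
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)
s2 = (syy - b * sxy) / (n - 2)   # MSE, item (5)

print(sum_x2, sum_xy, sum_y2)    # 57557 53258 51980
print(sxx, sxy, syy)             # 18164/9 14116/9 34856/9
print(b, a, s2)                  # 3529/4541 54775/4541 12051748/31787
print(round(float(a + b * 85), 4))
```

With unrounded coefficients the prediction for Billy is about 78.119; the 78.1155 in (c) comes from the rounded values 12.062 and 0.7771.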
(d) Find the 80% prediction interval for Billy's final exam score.
$$\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{XX}}} = \sqrt{1 + \frac{1}{9} + \frac{\left(85 - \frac{707}{9}\right)^2}{\frac{18164}{9}}} = \sqrt{\frac{20556}{18164}} \approx 1.063808749$$
$$\begin{aligned} \text{80\% prediction interval} &= \hat{y}_{new} \pm t_{n-2,\frac{1-80\%}{2}}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{XX}}} \\ &= 78.1155 \pm t_{9-2,\,0.1}\sqrt{379.141}\,(1.063808749) \\ &= 78.1155 \pm 1.415\,(19.47)\,(1.06381) \\ &= 78.12 \pm 29.31 \\ &= [48.81,\ 107.43] \end{aligned}$$
P.S.
If the percentage score cannot exceed 100 (e.g. there are no bonus marks), then the prediction interval would be [48.81, min(100, 107.43)] = [48.81, 100].
If, in addition, negative scores are impossible, it would be [max(0, 48.81), min(100, 107.43)] = [48.81, 100].

(e) Plot the given data points and the fitted regression line on the coordinate paper given.
For the plot, please refer to the past link at https://drive.google.com/file/d/0B-frzO-qxjYjcmp1MU0wb0l4b2M/view
Also, the EQUATION of the fitted least square regression line is required.

Example 4
A researcher wants to investigate the relationship between driving experience and the monthly auto insurance premium. A random sample of 100 auto drivers insured with a company and having similar auto insurance policies was selected. The following table summarizes their driving experience x (in years) and monthly auto insurance premium y (in dollars).

Variable   Mean    Standard Deviation
x          11.25   7.4
y          69      14.8

It is also given that $S_{XY} = -7774.6$.

(a) Find the least square regression line for predicting the monthly auto insurance premium from the years of driving experience.
$$S_{XX} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)\cdot\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = (n-1)s_X^2 = (100 - 1)(7.4^2) = 5421.24$$
Therefore,
$$b = \frac{S_{XY}}{S_{XX}} = \frac{-7774.6}{5421.24} = -1.434099948$$
and hence
$$a = \bar{y} - b\bar{x} = 69 - \frac{-7774.6}{5421.24}(11.25) = 85.13362441$$
∴ the least square regression line is $\hat{y} = 85.13 - 1.434x$.

(b) Predict the monthly auto insurance premium for a driver with 10 years of driving experience. Round your answer to the nearest dollar.
Predicted monthly auto insurance premium = 85.13 − 1.434(10) = 70.79 ≈ 71 dollars, correct to the nearest dollar.
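Example 4 works from summary statistics only. The key step is recovering $S_{XX}$ from the sample standard deviation, assuming (as the solution does) that the quoted standard deviation uses the divisor n − 1. A minimal Python sketch:

```python
n, xbar, ybar, sd_x = 100, 11.25, 69, 7.4
sxy = -7774.6                      # given S_XY

sxx = (n - 1) * sd_x ** 2          # S_XX = (n - 1) s_X^2
b = sxy / sxx                      # slope
a = ybar - b * xbar                # intercept

premium_10 = a + b * 10            # part (b): 10 years of experience
print(round(sxx, 2), round(b, 6), round(a, 4), round(premium_10))
```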
Exercise 3
A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:

Temperature, x        1.0   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   2.0
Converted sugar, y    8.1   7.8   8.5   9.8   9.5   8.9   8.6  10.2   9.3   9.2  10.5
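(a) Find the least square linear regression line.
Here n = 11, $\bar{x} = 1.5$, $\bar{y} = \frac{100.4}{11} = 9.1\dot{2}\dot{7}$, and the sample variances are $s_X^2 = 0.11$ and $s_Y^2 = 0.720\dot{1}\dot{8}$.
$$S_{XX} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)s_X^2 = 10(0.11) = 1.1$$
$$S_{XY} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \left(\sum_{i=1}^{n} x_i y_i\right) - n\bar{x}\bar{y} = 152.59 - (1.5)(100.4) = 1.99$$
$$\therefore b = \frac{S_{XY}}{S_{XX}} = \frac{1.99}{1.1} = 1.8\dot{0}\dot{9}$$
and
$$a = \bar{y} - b\bar{x} = \frac{100.4}{11} - \frac{199}{110}(1.5) = 6.41\dot{3}\dot{6}$$
So the line equation required is: $\hat{y} = 6.4136 + 1.8091x$.

(b) Predict the amount of converted sugar produced when the coded temperature is 1.75.
Amount of converted sugar predicted: $\hat{y}_{new} = 6.41\dot{3}\dot{6} + 1.8\dot{0}\dot{9}(1.75) = 9.579\dot{5}\dot{4}$

(c) Evaluate $s^2$.
$$S_{YY} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 923.58 - \frac{100.4^2}{11} = 7.20\dot{1}\dot{8}$$
$$s^2 = \frac{S_{YY} - b\,S_{XY}}{n-2} = \frac{7.20\dot{1}\dot{8} - 1.8\dot{0}\dot{9}(1.99)}{11 - 2} = \frac{39619}{99000} = 0.400\dot{1}\dot{9}$$

(d) Construct a 95% confidence interval for $\beta_0$.
$$\begin{aligned} \text{95\% confidence interval} &= a \pm t_{n-2,\frac{1-95\%}{2}}\, s\sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n\,S_{XX}}} \\ &= 6.41\dot{3}\dot{6} \pm t_{11-2,\,0.025}\sqrt{0.400\dot{1}\dot{9}}\sqrt{\frac{25.85}{11(1.1)}} \\ &= 6.41\dot{3}\dot{6} \pm 2.262\,(0.6326)\,(1.4616) \\ &= 6.4136 \pm 2.0915 \\ &= [4.322,\ 8.505] \end{aligned}$$

(e) Construct a 95% confidence interval for $\beta_1$.
$$\begin{aligned} \text{95\% confidence interval} &= b \pm t_{n-2,\frac{1-95\%}{2}}\, s\sqrt{\frac{1}{S_{XX}}} \\ &= 1.8\dot{0}\dot{9} \pm t_{11-2,\,0.025}\sqrt{0.400\dot{1}\dot{9}}\sqrt{\frac{1}{1.1}} \\ &= 1.8\dot{0}\dot{9} \pm 2.262\,(0.6326)\,(0.9535) \\ &= 1.8091 \pm 1.3644 \\ &= [0.445,\ 3.173] \end{aligned}$$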
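The confidence intervals of Exercise 3 follow formula (7). The sketch below recomputes them from the raw coded data. As in the notes, the critical value $t_{9,0.025} = 2.262$ is read from a t table; with SciPy, `scipy.stats.t.ppf(0.975, 9)` would compute it.

```python
import math

x = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]     # coded temperature
y = [8.1, 7.8, 8.5, 9.8, 9.5, 8.9, 8.6, 10.2, 9.3, 9.2, 10.5]  # converted sugar
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((u - xbar) ** 2 for u in x)
syy = sum((v - ybar) ** 2 for v in y)
sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))

b = sxy / sxx
a = ybar - b * xbar
s = math.sqrt((syy - b * sxy) / (n - 2))   # square root of the MSE

t = 2.262                                  # t_{9, 0.025} from the t table
sum_x2 = sum(u * u for u in x)

# Half-widths from (7)
hw_b0 = t * s * math.sqrt(sum_x2 / (n * sxx))
hw_b1 = t * s * math.sqrt(1 / sxx)
ci_b0 = (a - hw_b0, a + hw_b0)
ci_b1 = (b - hw_b1, b + hw_b1)
print([round(v, 3) for v in ci_b0], [round(v, 3) for v in ci_b1])
```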
(Answers will be available at http://ihome.ust.hk/~makittylee)