AMS572.01
Practice Final Exam Fall, 2013
Name ___________________________________ID ______________________Signature________________________
Instructions: This is a closed-book exam. Anyone who cheats on the exam will receive a grade of F. Please provide complete solutions for full credit. The exam runs from 11:15 am to 1:45 pm. Good luck!
1.
The following table gives the amount of additive ( x ) and the reduction in nitrogen oxides ( y ) in 7 cars.
Amount of additive (x)             1    2    3    4    5    6    7
Reduction in nitrogen oxides (y)   2.5  3.1  3.8  3.2  3.9  4.4  4.8
(a) Find the least squares regression line.
(b) Test at α = 0.05 whether there is a significant linear relationship between these two variables.
(c) What percentage of variation in nitrogen oxide is explained by the amount of additive?
(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above.
Solution: This is a simple linear regression problem.
(a) $n = 7$, $\bar{x} = 4$, $\bar{y} = 3.67$
$S_{xy} = \sum xy - n\bar{x}\bar{y} = 112.4 - 7 \cdot 4 \cdot 3.67 = 9.6$
$S_{xx} = \sum x^2 - n\bar{x}^2 = 140 - 7 \cdot 4^2 = 28$
$S_{yy} = \sum y^2 - n\bar{y}^2 = 98.15 - 7 \cdot 3.67^2 = 3.79$
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{9.6}{28} = 0.343$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3.67 - 0.343 \cdot 4 = 2.298$
The fitted least squares regression line is:
$\hat{y} = 2.298 + 0.343x$
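As a check on the arithmetic in part (a), here is a minimal Python sketch that recomputes the same sums from the table. Python and the helper name `least_squares` are assumptions of this sketch; the exam itself asks only for SAS. Carried at full precision, the intercept comes out at about 2.30. The 2.298 above results from rounding the mean and the slope before the last step.

```python
# Data from the table in Problem 1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]


def least_squares(x, y):
    """Return (b0, b1, s_xx, s_xy, s_yy) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products, as in the hand calculation.
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1, s_xx, s_xy, s_yy


b0, b1, s_xx, s_xy, s_yy = least_squares(x, y)
print(f"S_xy = {s_xy:.2f}, S_xx = {s_xx:.2f}, S_yy = {s_yy:.2f}")
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```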
(b) The mean square error estimate of σ is:
$\hat{\sigma} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SST - SSR}{n-2}} = \sqrt{\frac{S_{yy} - \hat{\beta}_1^2 S_{xx}}{n-2}} = \sqrt{\frac{3.79 - 0.343^2 \cdot 28}{7-2}} = 0.315$
The hypotheses are: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
Test statistic:
$t_0 = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{0.343}{0.315/\sqrt{28}} = 5.76 > t_{5,0.025} = 2.571$
Therefore we reject the null hypothesis at α = 0.05 and conclude that there is a significant linear relationship between these two variables.
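The slope test in part (b) can be sketched in a few lines of Python (an assumption of this sketch, since the exam uses SAS). The critical value 2.571 is copied from the solution rather than computed, because the Python standard library has no t quantile function. At full precision the statistic is about 5.72 rather than 5.76, and the conclusion is unchanged.

```python
import math

# Data from the table in Problem 1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx                      # slope estimate
sse = s_yy - b1 ** 2 * s_xx           # SSE = SST - SSR
sigma_hat = math.sqrt(sse / (n - 2))  # estimate of sigma
se_b1 = sigma_hat / math.sqrt(s_xx)   # standard error of the slope
t0 = b1 / se_b1                       # test statistic for H0: beta_1 = 0

T_CRIT = 2.571  # t_{5, 0.025}, taken from the solution text
reject = abs(t0) > T_CRIT
print(f"sigma_hat = {sigma_hat:.3f}, t0 = {t0:.2f}, reject H0: {reject}")
```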
(c)
$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{9.6^2}{28 \cdot 3.79} = 0.8684$
Therefore we claim that 86.84% of the variation in nitrogen oxide is explained by the amount of additive.
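A short Python sketch (again an assumption, not part of the exam) shows that the formula used in part (c) agrees with the definition $R^2 = 1 - SSE/SST$. Without intermediate rounding the value is about 0.867, and the small gap from 0.8684 comes from using the rounded $S_{yy} = 3.79$.

```python
# Data from the table in Problem 1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Route 1: the shortcut formula used in the solution.
r2_from_sums = s_xy ** 2 / (s_xx * s_yy)

# Route 2: the definition, 1 - SSE/SST, using the fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_residuals = 1 - sse / s_yy

print(f"R^2 = {r2_from_sums:.4f} (shortcut), {r2_from_residuals:.4f} (definition)")
```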
(d)
data nitro_ox;
input x y;
datalines;
1 2.5
2 3.1
3 3.8
4 3.2
5 3.9
6 4.4
7 4.8
;
run;
proc reg data = nitro_ox;
model y = x;
run;
2.
Based on interviews of couples seeking divorces, a social worker compiles the following data related to the period of acquaintanceship before marriage and the duration of marriage:
                                   Duration of marriage
Acquaintanceship before marriage   ≤ 4 years   > 4 years
Under 0.5 years                    11 (10)     8 (9)
0.5 – 1.5 years                    28 (28)     24 (24)
Over 1.5 years                     21 (22)     19 (18)
(a) Perform a test to determine if there is a relationship between the period of acquaintanceship before marriage and the duration of marriage. Use α = 0.05.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a two-way contingency table problem.
(a) We are performing a test for independence (multinomial sampling).
$H_0: \pi_{ij} = \pi_{i\cdot} \, \pi_{\cdot j}$ for all $i, j$
$H_a$: the above is not true
Let $e_{ij} = \frac{n_{i\cdot} \, n_{\cdot j}}{n_{\cdot\cdot}}$ for all $i, j$.
(Note: for simplicity, I rounded the expected values, shown in parentheses in the table, to integers, but in reality one does not need to do so.)
The test statistic is:
$\chi_0^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \sum_{i=1}^{3} \sum_{j=1}^{2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{1}{10} + \frac{1}{22} + \frac{1}{9} + \frac{1}{18} = 0.312 < \chi^2_{2,0.05} = 5.991$
We cannot reject the null hypothesis; we do not have enough evidence to show any relationship between the period of acquaintanceship before marriage and the duration of marriage.
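As the note above says, the expected counts do not have to be rounded. The following Python sketch (Python and the function name `chi_square_independence` are assumptions of this sketch) computes the statistic with unrounded expected counts. It gives roughly 0.15 instead of 0.312, which is still far below the critical value 5.991 quoted in the solution.

```python
# Observed counts: rows are acquaintanceship (under 0.5, 0.5-1.5, over 1.5 years),
# columns are duration of marriage (<= 4 years, > 4 years).
observed = [
    [11, 8],
    [28, 24],
    [21, 19],
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for Pearson's test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # e_ij, not rounded
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


chi_sq, df = chi_square_independence(observed)
CHI2_CRIT = 5.991  # chi-square_{2, 0.05}, taken from the solution text
print(f"chi-square = {chi_sq:.3f} on {df} d.f., reject H0: {chi_sq > CHI2_CRIT}")
```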
(b)
data marriage;
input acquint $ duration $ number;
datalines;
short le4 11
short gt4 8
med le4 28
med gt4 24
long le4 21
long gt4 19
;
run;
proc freq data=marriage;
weight number;
tables acquint*duration / chisq;
run;
3.
The following table records the observed number of births at a hospital in four consecutive quarterly periods.
Quarter            Jan-Mar   Apr-June   July-Sept   Oct-Dec
Number of births   110       57         53          80
(a) It is conjectured that twice as many babies are born during the Jan-Mar quarter as are born in any of the other three quarters. At α = 0.05, test if these data strongly contradict the stated conjecture.
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way contingency table problem.
(a) We are performing a chi-square goodness-of-fit test.
$H_0: P_{JM} = 0.4,\ P_{AJ} = P_{JS} = P_{OD} = 0.2$
$H_a$: the above is not true
The test statistic is:
$\chi_0^2 = \frac{(110 - 300 \cdot 0.4)^2}{300 \cdot 0.4} + \frac{(57 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(53 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(80 - 300 \cdot 0.2)^2}{300 \cdot 0.2} = 8.47 > \chi^2_{3,0.05} = 7.815$
We reject the null hypothesis and conclude that these data strongly contradict the stated conjecture.
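The goodness-of-fit statistic is simple enough to recompute directly. Below is a minimal Python sketch (an assumption of this sketch; the exam's own code is SAS) that builds the expected counts from the hypothesized 2:1:1:1 proportions. The critical value is copied from the solution.

```python
# Observed births per quarter and the proportions under H0 (2:1:1:1).
observed = {"Jan-Mar": 110, "Apr-Jun": 57, "Jul-Sep": 53, "Oct-Dec": 80}
null_props = {"Jan-Mar": 0.4, "Apr-Jun": 0.2, "Jul-Sep": 0.2, "Oct-Dec": 0.2}

n = sum(observed.values())  # 300 births in total

chi_sq = 0.0
for quarter, count in observed.items():
    expected = n * null_props[quarter]
    chi_sq += (count - expected) ** 2 / expected

df = len(observed) - 1
CHI2_CRIT = 7.815  # chi-square_{3, 0.05}, taken from the solution text
print(f"chi-square = {chi_sq:.2f} on {df} d.f., reject H0: {chi_sq > CHI2_CRIT}")
```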
(b)
DATA BIRTH;
INPUT QUARTER $ NUMBER;
DATALINES;
Jan-Mar 110
Apr-Jun 57
Jul-Sep 53
Oct-Dec 80
;
* HYPOTHESIZING A 2:1:1:1 RATIO;
PROC FREQ DATA=BIRTH ORDER=DATA;
WEIGHT NUMBER;
TITLE3 'GOODNESS OF FIT ANALYSIS';
TABLES QUARTER / CHISQ NOCUM TESTP=(0.4 0.2 0.2 0.2);
RUN;
4.
Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars, midsize cars, and full-size cars. It collects a sample of three cars of each type.
Compact cars   Midsize cars   Full-size cars
643            469            484
655            427            456
702            525            402
(a) Using the hypothetical data provided above, test at α = 0.05 whether the mean pressure applied to the driver's head during a crash test is equal for each type of car. What assumptions are necessary for your test?
(b) Please write up the entire SAS code necessary to answer question (a) above.
Solution: This is a one-way ANOVA problem with 3 independent samples.
(a) We need to perform an ANOVA F-test. The first assumption is that all three populations are normal. The second is that all three population variances are unknown but equal.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: the above is not true
Analysis of Variance
Source     SS         d.f.   MS         F
Car Type   86049.55   2      43024.78   25.17
Error      10254      6      1709
Total      96303.55   8
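The entries of the ANOVA table can be reproduced from the raw data. The following is a minimal Python sketch (Python and the function name `one_way_anova` are assumptions of this sketch) that computes the between-group and within-group sums of squares and the F ratio.

```python
# Crash-test pressures from the table in Problem 4.
groups = {
    "Compact": [643, 655, 702],
    "Midsize": [469, 427, 525],
    "Full-size": [484, 456, 402],
}


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(f"SS(Car Type) = {ss_between:.2f} on {df_between} d.f.")
print(f"SS(Error)    = {ss_within:.2f} on {df_within} d.f.")
print(f"F = {f_stat:.2f}")
```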
Since $F_0 = 25.17 > F_{2,6,0.05} = 5.14$, we reject the null hypothesis and conclude that the mean pressures applied to the driver's head during a crash test are NOT all equal for these three types of car.
(b)
data car;
input type $ pressure;
datalines;
Compact 643
Compact 655
Compact 702
Midsize 469
Midsize 427
Midsize 525
Fulsize 484
Fulsize 456
Fulsize 402
;
run;
proc anova data = car;
class type;
model pressure = type;
run;
5.
The length of time to recovery was recorded for patients randomly assigned and subjected to two different surgical procedures. The data (recorded in days) are as follows:
                  Procedure 1   Procedure 2
Sample mean       7.3           8.9
Sample variance   1.23          1.49
Sample size       11            13
(a) Test at α = 0.01 whether the data present sufficient evidence to indicate a difference between the mean recovery times for the two surgical procedures. What assumptions are necessary? Test the assumptions necessary if you can.
(b) Please derive the corresponding general test using the pivotal quantity method. Please derive the pivotal quantity and its distribution, list the test statistic, and derive the rejection region for a 2-sided test at the significance level of α.
(c) (extra credit) Please derive the general test using the likelihood ratio test method. Prove whether this test is equivalent to the one derived using the pivotal quantity method in part (b).
Solution:
(a) Inference on two population means. Two small and independent samples.
Procedure 1: $\bar{X} = 7.3$, $S_1^2 = 1.23$, $n_1 = 11$
Procedure 2: $\bar{Y} = 8.9$, $S_2^2 = 1.49$, $n_2 = 13$
[1] Under the normality assumption, we first test if the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is
$F_0 = \frac{S_2^2}{S_1^2} = \frac{1.49}{1.23} = 1.21 < F_{12,10,0.05,U} = 2.91$
We cannot reject $H_0$; it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.
[2] This is inference on two population means, independent samples. The first assumption is that both populations are normal. The second is the equal variance assumption which we have checked in (a) [1].
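Now we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. The pooled sample variance is $S_p^2 = \frac{10 \cdot 1.23 + 12 \cdot 1.49}{22} = 1.37$, and the test statistic is
$t_0 = \frac{\bar{X} - \bar{Y} - 0}{S_p \sqrt{1/n_1 + 1/n_2}} = \frac{7.3 - 8.9 - 0}{\sqrt{1.37} \sqrt{1/11 + 1/13}} = -3.33$
Since $|t_0| = 3.33 > t_{22,0.005} = 2.819$, we reject $H_0$ and conclude that the data present sufficient evidence to indicate a difference between the mean recovery times for the two surgical procedures at the significance level of 0.01.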
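Both tests in part (a) need only the summary statistics, so they are easy to recompute. Here is a minimal Python sketch (an assumption of this sketch, not the exam's method of record). The two critical values are copied from the solution, since the Python standard library provides no F or t quantile functions.

```python
import math

# Summary statistics from the table in Problem 5.
n1, mean1, var1 = 11, 7.3, 1.23   # Procedure 1
n2, mean2, var2 = 13, 8.9, 1.49   # Procedure 2

# [1] Equal-variance check: larger sample variance on top, as in the solution.
f0 = var2 / var1
F_CRIT = 2.91  # F_{12, 10, 0.05} upper critical value, taken from the solution text
equal_var_ok = f0 < F_CRIT

# [2] Pooled-variance t-test of H0: mu1 - mu2 = 0.
df = n1 + n2 - 2
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df        # pooled sample variance
t0 = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
T_CRIT = 2.819  # t_{22, 0.005}, taken from the solution text
reject = abs(t0) > T_CRIT

print(f"F0 = {f0:.2f}, equal variances plausible: {equal_var_ok}")
print(f"Sp^2 = {sp2:.2f}, t0 = {t0:.2f} on {df} d.f., reject H0: {reject}")
```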
(b) Derivation of the pooled-variance t-test (2-sided test) using the pivotal quantity approach
Suppose we have two independent random samples from two normal populations: $X_1, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$ and $Y_1, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$. We wish to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. Here is a simple outline of the derivation of the test using the pivotal quantity approach.
[1]. We start with the point estimator for the parameter of interest $\mu_1 - \mu_2$, which is $\bar{X} - \bar{Y}$. Its distribution is $N\left(\mu_1 - \mu_2,\ \sigma^2 (1/n_1 + 1/n_2)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2 / 2)$, and the independence properties of the random samples. From this we have
$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sigma \sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$.
Unfortunately, $Z$ cannot serve as the pivotal quantity because σ is unknown.
[2]. We next look for a way to get rid of the unknown σ, following a similar approach to the construction of the pooled-variance t-statistic. We find that
$W = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$,
using the mgf for $\chi^2_k$, which is $M(t) = \left(\frac{1}{1 - 2t}\right)^{k/2}$, and the independence properties of the random samples.
[3]. Then we find, from the theorem on sampling from the normal population and the independence properties of the random samples, that $Z$ and $W$ are independent. Therefore, by the definition of the t-distribution, we obtain our pivotal quantity:
$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p \sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$, where $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.
[4]. The rejection region is derived from $P(|T_0| \geq c \mid H_0) = \alpha$, where $T_0 = \frac{\bar{X} - \bar{Y} - 0}{S_p \sqrt{1/n_1 + 1/n_2}} \overset{H_0}{\sim} t_{n_1 + n_2 - 2}$. Thus $c = t_{n_1 + n_2 - 2,\ \alpha/2}$. Therefore, at the significance level of α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1 + n_2 - 2,\ \alpha/2}$.
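Step [3] invokes the definition of the t-distribution without writing out the algebra. Spelled out, with $\nu = n_1 + n_2 - 2$ introduced here only as shorthand, the unknown σ cancels between $Z$ and $W$:

```latex
T = \frac{Z}{\sqrt{W/\nu}}
  = \frac{\dfrac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}}
         {\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2\,\nu}}}
  = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}}
  \sim t_{\nu},
\qquad
S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\nu}.
```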
(c) Derivation of the pooled-variance t-test (2-sided test) using the likelihood ratio test approach
Suppose we have two independent random samples from two normal populations with equal but unknown variances. We now derive the likelihood ratio test for:
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
Let $\mu_1 = \mu_2 = \mu$ under $H_0$; then
$\omega = \{-\infty < \mu_1 = \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}$, $\Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$
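$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right]$, and there are two parameters.
$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$
Since it contains two parameters, we take the partial derivatives with respect to $\mu$ and $\sigma^2$ and set them equal to 0. Then we have:
$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}$
$\hat{\sigma}_\omega^2 = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$
$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right]$, and there are three parameters.
$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$
We take the partial derivatives with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:
$\hat{\mu}_1 = \bar{x}$, $\hat{\mu}_2 = \bar{y}$
$\hat{\sigma}_\Omega^2 = \frac{1}{n_1+n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$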
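At this point, we have estimated all the parameters. Then, after some cancellations/simplifications, we have:
$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\left(\frac{1}{2\pi\hat{\sigma}_\omega^2}\right)^{\frac{n_1+n_2}{2}}}{\left(\frac{1}{2\pi\hat{\sigma}_\Omega^2}\right)^{\frac{n_1+n_2}{2}}} = \left[\frac{\hat{\sigma}_\Omega^2}{\hat{\sigma}_\omega^2}\right]^{\frac{n_1+n_2}{2}}$
$= \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x} + n_2\bar{y}}{n_1+n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x} + n_2\bar{y}}{n_1+n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}} = \left[1 + \frac{t_0^2}{n_1+n_2-2}\right]^{-\frac{n_1+n_2}{2}}$
where $t_0$ is the test statistic in the pooled-variance t-test. Therefore, $\lambda \leq \lambda^*$ is equivalent to $|t_0| \geq c$. Thus, at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \geq c = t_{n_1+n_2-2,\ \alpha/2}$. This test is identical to the test we derived in part (b).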
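The identity between λ and $t_0$ can be spot-checked numerically. The Python sketch below uses two small samples that are made up purely for illustration (they are not from the exam), and the function names `lrt_lambda` and `pooled_t` are likewise assumptions of this sketch. It computes λ from the two constrained and unconstrained sums of squares and compares it with $[1 + t_0^2/(n_1+n_2-2)]^{-(n_1+n_2)/2}$.

```python
import math

# Hypothetical samples, invented only to exercise the identity.
x = [6.1, 7.4, 8.0, 7.2, 6.9]
y = [8.8, 9.5, 7.9, 9.1, 8.6, 9.9]


def lrt_lambda(x, y):
    """Likelihood ratio: (sigma_hat_Omega^2 / sigma_hat_omega^2) ** ((n1 + n2) / 2)."""
    n1, n2 = len(x), len(y)
    x_bar, y_bar = sum(x) / n1, sum(y) / n2
    mu_hat = (n1 * x_bar + n2 * y_bar) / (n1 + n2)  # MLE of the common mean under H0
    ss_full = sum((v - x_bar) ** 2 for v in x) + sum((v - y_bar) ** 2 for v in y)
    ss_null = sum((v - mu_hat) ** 2 for v in x) + sum((v - mu_hat) ** 2 for v in y)
    # The 1/(n1 + n2) factors in the two variance MLEs cancel in the ratio.
    return (ss_full / ss_null) ** ((n1 + n2) / 2)


def pooled_t(x, y):
    """Pooled-variance t statistic for H0: mu1 = mu2."""
    n1, n2 = len(x), len(y)
    x_bar, y_bar = sum(x) / n1, sum(y) / n2
    ss = sum((v - x_bar) ** 2 for v in x) + sum((v - y_bar) ** 2 for v in y)
    sp2 = ss / (n1 + n2 - 2)
    return (x_bar - y_bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


n = len(x) + len(y)
lam = lrt_lambda(x, y)
t0 = pooled_t(x, y)
rhs = (1 + t0 ** 2 / (n - 2)) ** (-n / 2)
print(f"lambda = {lam:.6g}, closed form in t0 = {rhs:.6g}")
```

Because λ is a strictly decreasing function of $|t_0|$, a small λ corresponds exactly to a large $|t_0|$, which is why the two rejection regions coincide.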
6.
People at high risk of sudden cardiac death can be identified using the change in a signal-averaged electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The method was modified in the hope of improving its accuracy. The new method was tested on 50 people and gave correct results for 46 of them.
(a) Is this convincing evidence that the new method is more accurate? Please test at α = 0.05.
(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the new method is better at α = 0.05?
(c) How many patients should be tested in order for this power to be at least 0.75?
Solution: