AMS572.01 Practice Final Exam Fall, 2013



Name ___________________________________ID ______________________Signature________________________

Instructions: This is a closed-book exam. Anyone who cheats on the exam will receive a grade of F. Please provide complete solutions for full credit. The exam runs from 11:15 am to 1:45 pm. Good luck!

1.

The following table gives the amount of additive ( x ) and the reduction in nitrogen oxides ( y ) in 7 cars.

Amount of additive (x)              1    2    3    4    5    6    7
Reduction in nitrogen oxides (y)    2.5  3.1  3.8  3.2  3.9  4.4  4.8

(a) Find the least squares regression line.

(b) Test at α = 0.05 whether there is a significant linear relationship between these two variables.

(c) What percentage of variation in nitrogen oxide is explained by the amount of additive?

(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above.

Solution: This is a simple linear regression problem.

(a) $n = 7$, $\bar{x} = 4$, $\bar{y} = 3.67$

$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 112.4 - 7 \cdot 4 \cdot 3.67 = 9.6$$

$$S_{xx} = \sum x^2 - n\bar{x}^2 = 140 - 7 \cdot 4^2 = 28$$

$$S_{yy} = \sum y^2 - n\bar{y}^2 = 98.15 - 7 \cdot 3.67^2 = 3.79$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{9.6}{28} = 0.343$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3.67 - 0.343 \cdot 4 = 2.298$$

The fitted least squares regression line is: $\hat{y} = 2.298 + 0.343x$

(b) The mean square error estimate of σ is:

$$\hat{\sigma} = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{SST - SSR}{n-2}} = \sqrt{\frac{S_{yy} - \hat{\beta}_1^2 S_{xx}}{n-2}} = \sqrt{\frac{3.79 - 0.343^2 \cdot 28}{7-2}} = 0.315$$

The hypotheses are: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$

Test statistic:

$$t_0 = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\hat{\sigma}/\sqrt{S_{xx}}} = \frac{0.343}{0.315/\sqrt{28}} = 5.76 > t_{5,0.025} = 2.571$$

Therefore we reject the null hypothesis at α = 0.05 and conclude that there is a significant linear relationship between these two variables.

(c)

$$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{9.6^2}{28 \cdot 3.79} = 0.8684$$

Therefore we claim that 86.84% of the variation in nitrogen oxide reduction is explained by the amount of additive.

(d)

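* Read the 7 (x, y) pairs;
data nitro_ox;
input x y;
datalines;
1 2.5
2 3.1
3 3.8
4 3.2
5 3.9
6 4.4
7 4.8
;
run;
* Fit y = beta0 + beta1*x; the output gives the estimates, the t-test for the slope, and R-square;
proc reg data = nitro_ox;
model y = x;
run;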
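As a cross-check on the hand calculations in (a)-(c), the same quantities can be recomputed without intermediate rounding. This is a minimal sketch in Python rather than SAS; the function name `simple_linear_regression` and the dictionary keys are illustrative assumptions, and the critical value 2.571 is the one quoted in the solution.

```python
import math

# Data from the table in Problem 1.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.5, 3.1, 3.8, 3.2, 3.9, 4.4, 4.8]


def simple_linear_regression(x, y):
    """Return the least squares fit and the summary quantities used in (a)-(c)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in x) - n * x_bar ** 2
    s_yy = sum(b * b for b in y) - n * y_bar ** 2
    beta1 = s_xy / s_xx                      # slope
    beta0 = y_bar - beta1 * x_bar            # intercept
    sse = s_yy - beta1 ** 2 * s_xx           # SSE = SST - SSR
    sigma_hat = math.sqrt(sse / (n - 2))     # square root of MSE
    t0 = beta1 / (sigma_hat / math.sqrt(s_xx))
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return {
        "beta0": beta0,
        "beta1": beta1,
        "sigma_hat": sigma_hat,
        "t0": t0,
        "r_squared": r_squared,
    }


fit = simple_linear_regression(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")

# Critical value t_{5,0.025} = 2.571 is taken from the solution text.
print("reject H0:", abs(fit["t0"]) > 2.571)
```

Carried at full precision, the slope is about 0.343 and the intercept about 2.30, while t0 and R² come out near 5.72 and 0.867, slightly below the hand-rounded 5.76 and 0.8684. The conclusions in (b) and (c) are unchanged.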

2.

Based on interviews of couples seeking divorces, a social worker compiles the following data related to the period of acquaintanceship before marriage and the duration of marriage:

                                    Duration of marriage
Acquaintanceship before marriage    ≤ 4 years    > 4 years
Under 0.5 years                     11 (10)      8 (9)
0.5 – 1.5 years                     28 (28)      24 (24)
Over 1.5 years                      21 (22)      19 (18)

(a) Perform a test to determine if there is a relationship between the period of acquaintanceship before marriage and the duration of marriage. Use α = 0.05.

(b) Please write up the entire SAS code necessary to answer question (a) above.

Solution: This is a two-way contingency table problem.

(a) We are performing a test for independence (multinomial sampling).

$H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$, for all $i, j$

$H_a$: the above is not true

Let $e_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}$, for all $i, j$

(Note: for simplicity, I rounded the expected values to integers – but in reality, one does not need to do so.)

The test statistic is:

$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{1}{10} + \frac{1}{22} + \frac{1}{9} + \frac{1}{18} = 0.312 < \chi^2_{2,0.05} = 5.991$$

We cannot reject the null hypothesis: there is not enough evidence of a relationship between the period of acquaintanceship before marriage and the duration of marriage.

(b)

data marriage;
input acquint $ duration $ number;
datalines;
short le4 11
short gt4 8
med le4 28
med gt4 24
long le4 21
long gt4 19
;
run;
proc freq data=marriage;
weight number;
tables acquint*duration / chisq;
run;
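The hand calculation in (a) uses expected counts rounded to integers, whereas software works with the unrounded values. The sketch below, in Python rather than SAS, computes the statistic both ways so the two can be compared; the function name `chi_square_independence` and its `round_expected` flag are illustrative assumptions, and the critical value 5.991 is the one quoted in the solution.

```python
# Observed counts: rows are acquaintanceship (short, medium, long),
# columns are duration of marriage (<= 4 years, > 4 years).
observed = [
    [11, 8],
    [28, 24],
    [21, 19],
]


def chi_square_independence(table, round_expected=False):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected count under H0
            if round_expected:
                e_ij = round(e_ij)                     # mimic the hand calculation
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat_rounded, df = chi_square_independence(observed, round_expected=True)
stat_exact, _ = chi_square_independence(observed)

print(f"rounded expected counts: {stat_rounded:.3f} on {df} d.f.")
print(f"exact expected counts:   {stat_exact:.3f} on {df} d.f.")
# Critical value chi-square_{2,0.05} = 5.991 is taken from the solution text.
print("reject H0:", stat_exact > 5.991)
```

With rounded expected counts the statistic reproduces the 0.312 above; with exact expected counts it is roughly 0.15. Either way it falls far below 5.991, so the conclusion does not depend on the rounding.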

3.

The following table records the observed number of births at a hospital in four consecutive quarterly periods.

Quarter             Jan-Mar    Apr-June    July-Sept    Oct-Dec
Number of births    110        57          53           80

(a) It is conjectured that twice as many babies are born during the Jan-Mar quarter as in any of the other three quarters. At α = 0.05, test if these data strongly contradict the stated conjecture.

(b) Please write up the entire SAS code necessary to answer question (a) above.

Solution: This is a one-way contingency table problem.

(a) We are performing a Chi-square goodness of fit test.

$H_0: P_{JM} = 0.4,\ P_{AJ} = P_{JS} = P_{OD} = 0.2$

$H_a$: the above is not true

The test statistic is:

$$\chi_0^2 = \frac{(110 - 300 \cdot 0.4)^2}{300 \cdot 0.4} + \frac{(57 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(53 - 300 \cdot 0.2)^2}{300 \cdot 0.2} + \frac{(80 - 300 \cdot 0.2)^2}{300 \cdot 0.2} = 8.47 > \chi^2_{3,0.05} = 7.815$$

We reject the null hypothesis and conclude that these data strongly contradict the stated conjecture.

(b)

DATA BIRTH;
INPUT QUARTER $ NUMBER;
DATALINES;
Jan-Mar 110
Apr-Jun 57
Jul-Sep 53
Oct-Dec 80
;
* HYPOTHESIZING A 2:1:1:1 RATIO;
PROC FREQ DATA=BIRTH ORDER=DATA;
WEIGHT NUMBER;
TITLE3 'GOODNESS OF FIT ANALYSIS';
TABLES QUARTER / CHISQ NOCUM TESTP=(0.4 0.2 0.2 0.2);
RUN;
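The goodness-of-fit statistic in (a) is simple enough to verify directly. The following is a minimal sketch in Python rather than SAS; the function name `chi_square_gof` is an illustrative assumption, and the critical value 7.815 is the one quoted in the solution.

```python
# Observed births per quarter and the hypothesized 2:1:1:1 proportions.
observed = [110, 57, 53, 80]
null_probs = [0.4, 0.2, 0.2, 0.2]


def chi_square_gof(observed, probs):
    """Pearson chi-square goodness-of-fit statistic for fully specified cell probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


stat = chi_square_gof(observed, null_probs)
df = len(observed) - 1

print(f"chi-square = {stat:.2f} on {df} d.f.")
# Critical value chi-square_{3,0.05} = 7.815 is taken from the solution text.
print("reject H0:", stat > 7.815)
```

Most of the statistic comes from the Oct-Dec cell, where 80 births were observed against 60 expected.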

4.

Suppose the National Transportation Safety Board (NTSB) wants to examine the safety of compact cars, midsize cars, and full-size cars. It collects a sample of three cars for each car type.

Compact cars    Midsize cars    Full-size cars
643             469             484
655             427             456
702             525             402

(a) Using the hypothetical data in the table above, test at α = 0.05 whether the mean pressure applied to the driver's head during a crash test is equal for each type of car. What assumptions are necessary for your test?

(b) Please write up the entire SAS code necessary to answer question (a) above.

Solution: This is a one-way ANOVA problem with 3 independent samples.

(a) We need to perform an ANOVA F-test. The first assumption is that all three populations are normal. The second is that all three population variances are unknown but equal.

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_a$: the above is not true

Analysis of Variance

Source      SS          d.f.    MS          F
Car Type    86049.55    2       43024.78    25.17
Error       10254       6       1709
Total       96303.55    8

Since $F_0 = 25.17 > F_{2,6,0.05} = 5.14$, we reject the null hypothesis and conclude that the mean pressures applied to the driver's head during a crash test are NOT all equal for these three types of car.

(b)

data car;
input type $ pressure;
datalines;
Compact 643
Compact 655
Compact 702
Midsize 469
Midsize 427
Midsize 525
Fulsize 484
Fulsize 456
Fulsize 402
;
run;
* One-way ANOVA with car type as the classification factor;
proc anova data = car;
class type;
model pressure = type;
run;
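The ANOVA table in (a) can be rebuilt from the raw data as a check on the sums of squares. This is a minimal sketch in Python rather than SAS; the function name `one_way_anova` and the dictionary layout are illustrative assumptions, and the critical value 5.14 is the one quoted in the solution.

```python
# Crash-test pressures by car type, as listed in Problem 4.
groups = {
    "Compact": [643, 655, 702],
    "Midsize": [469, 427, 525],
    "Fulsize": [484, 456, 402],
}


def one_way_anova(groups):
    """Return between/within sums of squares, degrees of freedom, and the F statistic."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)

print(f"Car Type: SS = {ss_between:.2f}, d.f. = {df_between}")
print(f"Error:    SS = {ss_within:.2f}, d.f. = {df_within}")
print(f"F = {f_stat:.2f}")
# Critical value F_{2,6,0.05} = 5.14 is taken from the solution text.
print("reject H0:", f_stat > 5.14)
```

The check covers only the arithmetic; the normality and equal-variance assumptions named in (a) still have to be judged separately, which is hard to do with three observations per group.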

5.

The length of time to recovery was recorded for patients randomly assigned and subjected to two different surgical procedures. The data (recorded in days) are as follows:

                   Procedure 1    Procedure 2
Sample mean        7.3            8.9
Sample variance    1.23           1.49
Sample size        11             13

(a) Test at α = 0.01 whether the data present sufficient evidence to indicate a difference between the mean recovery times for the two surgical procedures. What assumptions are necessary? Test the assumptions necessary if you can.

(b) Please derive the corresponding general test using the pivotal quantity method. Please derive the pivotal quantity and its distribution, list the test statistic, and derive the rejection region for a 2-sided test at the significance level of α.

(c) (extra credit) Please derive the general test using the likelihood ratio test method. Prove whether this test is equivalent to the one derived using the pivotal quantity method in part (b).

Solution:

(a) Inference on two population means. Two small and independent samples.

Procedure 1: $\bar{X} = 7.3$, $S_1^2 = 1.23$, $n_1 = 11$

Procedure 2: $\bar{Y} = 8.9$, $S_2^2 = 1.49$, $n_2 = 13$

[1] Under the normality assumption, we first test if the two population variances are equal. That is, $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$. The test statistic is

$$F_0 = \frac{S_2^2}{S_1^2} = \frac{1.49}{1.23} = 1.21 < F_{12,10,0.05,U} = 2.91$$

We cannot reject $H_0$; it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.

[2] This is inference on two population means, independent samples. The first assumption is that both populations are normal. The second is the equal variance assumption which we have checked in (a) [1].

Now we perform the pooled-variance t-test with hypotheses $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$. The pooled sample variance is $S_p^2 = \dfrac{10 \cdot 1.23 + 12 \cdot 1.49}{22} = 1.37$, so

$$t_0 = \frac{\bar{X} - \bar{Y} - 0}{S_p\sqrt{1/n_1 + 1/n_2}} = \frac{7.3 - 8.9 - 0}{\sqrt{1.37}\,\sqrt{1/11 + 1/13}} = -3.33$$

Since $|t_0| = 3.33 > t_{22,0.005} = 2.819$, we reject $H_0$ and conclude that the data present sufficient evidence to indicate a difference between the mean recovery times for the two surgical procedures at the significance level of 0.01.

(b) Derivation of the pooled-variance t-test (2-sided test) using the pivotal quantity approach

Suppose we have two independent random samples from two normal populations: $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma^2)$. Here is a simple outline of the derivation of the test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$ using the pivotal quantity approach.

[1]. We start with the point estimator for the parameter of interest $\mu_1 - \mu_2$: $\bar{X} - \bar{Y}$. Its distribution is $N\left(\mu_1 - \mu_2,\ \sigma^2(1/n_1 + 1/n_2)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp\left(\mu t + \sigma^2 t^2/2\right)$, and the independence properties of the random samples. From this we have

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$$

Unfortunately, $Z$ cannot serve as the pivotal quantity because σ is unknown.

[2]. We next look for a way to get rid of the unknown σ, following a similar approach to the construction of the pooled-variance t-statistic. We found that

$$W = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$$

using the mgf for $\chi^2_k$, which is $M(t) = \left(\dfrac{1}{1 - 2t}\right)^{k/2}$, and the independence properties of the random samples.

[3]. Then we found, from the theorem of sampling from the normal population and the independence properties of the random samples, that $Z$ and $W$ are independent. Therefore, by the definition of the t-distribution, we have obtained our pivotal quantity:

$$T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1 + n_2 - 2}$$

where $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.

[4]. The rejection region is derived from $P(|T_0| \geq c \mid H_0) = \alpha$, where

$$T_0 = \frac{\bar{X} - \bar{Y} - 0}{S_p\sqrt{1/n_1 + 1/n_2}} \overset{H_0}{\sim} t_{n_1 + n_2 - 2}$$

Thus $c = t_{n_1 + n_2 - 2,\ \alpha/2}$. Therefore, at the significance level α, we reject $H_0$ in favor of $H_a$ iff $|T_0| \geq t_{n_1 + n_2 - 2,\ \alpha/2}$.

(c) Derivation of the pooled-variance t-test (2-sided test) using the likelihood ratio test approach

Suppose we have two independent random samples from two normal populations with equal but unknown variances. We now derive the likelihood ratio test for:

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

Let $\mu_1 = \mu_2 = \mu$; then

$\omega = \{-\infty < \mu_1 = \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}$, $\Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$

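$$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1 + n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right]$$

and there are two parameters.

$$\ln L(\omega) = -\frac{n_1 + n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$$

Since it contains two parameters, we take the partial derivatives with respect to $\mu$ and $\sigma^2$ and set them equal to 0. Then we have:

$$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}$$

$$\hat{\sigma}_\omega^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$$

$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1 + n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right]$$

and there are three parameters.

$$\ln L(\Omega) = -\frac{n_1 + n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$$

We take the partial derivatives with respect to $\mu_1$, $\mu_2$ and $\sigma^2$ and set them all equal to 0. Then we have:

$$\hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}, \qquad \hat{\sigma}_\Omega^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$$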

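At this point we have estimated all the parameters. Then, after some cancellations/simplifications, we have:

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\left(\dfrac{1}{2\pi\hat{\sigma}_\omega^2}\right)^{\frac{n_1 + n_2}{2}}}{\left(\dfrac{1}{2\pi\hat{\sigma}_\Omega^2}\right)^{\frac{n_1 + n_2}{2}}} = \left[\frac{\hat{\sigma}_\Omega^2}{\hat{\sigma}_\omega^2}\right]^{\frac{n_1 + n_2}{2}}$$

$$= \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}\right)^2}\right]^{\frac{n_1 + n_2}{2}} = \left[1 + \frac{t_0^2}{n_1 + n_2 - 2}\right]^{-\frac{n_1 + n_2}{2}}$$

where $t_0$ is the test statistic in the pooled-variance t-test. Therefore, $\lambda \leq \lambda^*$ is equivalent to $|t_0| \geq c$. Thus, at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0| \geq c = t_{n_1 + n_2 - 2,\ \alpha/2}$. This test is identical to the test we have derived in part (b).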
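The "cancellations/simplifications" step can be spelled out as follows. This is a sketch of the standard algebra, written with the same symbols as the derivation; $\hat{\mu} = (n_1\bar{x} + n_2\bar{y})/(n_1 + n_2)$ is the restricted MLE and $S_p^2$ is the pooled sample variance from part (b).

```latex
% The exponential factors cancel: at the MLEs both exponents equal -(n_1+n_2)/2.
% Decompose the restricted sums of squares about the sample means:
\sum_{i=1}^{n_1}(x_i-\hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j-\hat{\mu})^2
  = \sum_{i=1}^{n_1}(x_i-\bar{x})^2 + \sum_{j=1}^{n_2}(y_j-\bar{y})^2
    + n_1(\bar{x}-\hat{\mu})^2 + n_2(\bar{y}-\hat{\mu})^2 .

% Since \bar{x}-\hat{\mu} = \frac{n_2(\bar{x}-\bar{y})}{n_1+n_2}
% and \bar{y}-\hat{\mu} = -\frac{n_1(\bar{x}-\bar{y})}{n_1+n_2}:
n_1(\bar{x}-\hat{\mu})^2 + n_2(\bar{y}-\hat{\mu})^2
  = \frac{n_1 n_2}{n_1+n_2}\,(\bar{x}-\bar{y})^2 .

% With \sum(x_i-\bar{x})^2 + \sum(y_j-\bar{y})^2 = (n_1+n_2-2)\,S_p^2:
\frac{\hat{\sigma}_\omega^2}{\hat{\sigma}_\Omega^2}
  = 1 + \frac{\frac{n_1 n_2}{n_1+n_2}(\bar{x}-\bar{y})^2}{(n_1+n_2-2)\,S_p^2}
  = 1 + \frac{t_0^2}{n_1+n_2-2},
\qquad
t_0 = \frac{\bar{x}-\bar{y}}{S_p\sqrt{1/n_1+1/n_2}} .
```

Because $\lambda$ is a decreasing function of $t_0^2$, small values of $\lambda$ correspond exactly to large values of $|t_0|$, which is why the two tests share the same rejection region.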

6.

People at high risk of sudden cardiac death can be identified using the change in a signal averaged electrocardiogram before and after prescribed activities. The current method is about 80% accurate. The method was modified, hoping to improve its accuracy. The new method is tested on 50 people and gave correct results on 46 patients.

(a) Is this convincing evidence that the new method is more accurate? Please test at α = 0.05.

(b) If the new method actually has 90% accuracy, what power does a sample of 50 have to demonstrate that the new method is better at α = 0.05?

(c) How many patients should be tested in order for this power to be at least 0.75?

Solution:
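One common way to approach parts (a)-(c) is the large-sample z test for a single proportion, with $H_0: p = 0.8$ versus $H_a: p > 0.8$. The sketch below assumes that approach: the normal approximation, the upper-tailed alternative, and the function names are assumptions made for illustration and may differ from the intended solution (for example, an exact binomial test could be used instead).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_test_upper(successes, n, p0):
    """(a) Large-sample z statistic and upper-tail p-value for H0: p = p0 vs Ha: p > p0."""
    p_hat = successes / n
    z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z0, 1 - Z.cdf(z0)


def power_upper(n, p0, p1, alpha):
    """(b) Approximate power of the upper-tailed z test when the true proportion is p1."""
    z_alpha = Z.inv_cdf(1 - alpha)
    cutoff = p0 + z_alpha * math.sqrt(p0 * (1 - p0) / n)   # reject when p_hat > cutoff
    return 1 - Z.cdf((cutoff - p1) / math.sqrt(p1 * (1 - p1) / n))


def sample_size_upper(p0, p1, alpha, power):
    """(c) Smallest n given by the usual normal-approximation sample size formula."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    root_n = (z_alpha * math.sqrt(p0 * (1 - p0))
              + z_beta * math.sqrt(p1 * (1 - p1))) / (p1 - p0)
    return math.ceil(root_n ** 2)


z0, p_value = z_test_upper(successes=46, n=50, p0=0.8)
power_50 = power_upper(n=50, p0=0.8, p1=0.9, alpha=0.05)
n_needed = sample_size_upper(p0=0.8, p1=0.9, alpha=0.05, power=0.75)

print(f"(a) z0 = {z0:.3f}, p-value = {p_value:.4f}")
print(f"(b) approximate power with n = 50: {power_50:.3f}")
print(f"(c) approximate n for power 0.75: {n_needed}")
```

Under these assumptions, z0 is about 2.12, which exceeds the upper 5% normal critical value of 1.645; the power with 50 patients is roughly 0.57; and about 75 patients are needed for power of at least 0.75.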
