Additional Confidence Intervals and Hypothesis Testing Problems


1. A program was tested on 30 data sets, and execution times were measured. The sample mean and standard deviation of the execution times are $\bar{x} = 65$ ms and $s = 6$ ms. Compute a 90% and a 95% confidence interval for the mean response time $\mu$.

By the CLT, the mean response time $\bar{X}$ has an approximately normal distribution.

The 90% confidence interval is:

$$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}} = 65 \pm 1.65 \cdot \frac{6}{\sqrt{30}} = 65 \pm 1.81, \text{ or } (63.19,\ 66.81)$$

The 95% confidence interval is:

$$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}} = 65 \pm 1.96 \cdot \frac{6}{\sqrt{30}} = 65 \pm 2.15, \text{ or } (62.85,\ 67.15)$$
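The two intervals above can be reproduced with a short sketch. The function name `mean_confidence_interval` is an assumption made for illustration, and the code uses the exact normal quantile (about 1.645 for 90%) rather than the rounded table value 1.65, so the 90% bounds differ slightly in the second decimal.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, level):
    """Normal-approximation CI for a mean: xbar +/- z * s / sqrt(n)."""
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


# Values from Problem 1: xbar = 65 ms, s = 6 ms, n = 30
ci90 = mean_confidence_interval(65, 6, 30, 0.90)
ci95 = mean_confidence_interval(65, 6, 30, 0.95)
print("90% CI:", ci90)
print("95% CI:", ci95)
```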

2. Symple Symon Software has a large software development group. Their statistician has verified that X , the number of lines of code per programmer per week, has a normal distribution with mean 300 and standard deviation 40. Symple programmers all write code in their proprietary programming language, Symply Super. Six months ago they adopted a new programming paradigm.

A random sample of 30 programmers showed that they now produce on average 310 lines of code per week. Does the number of lines per week show an increase at a significance level of 5%? Show all five steps of hypothesis testing.

How large would your sample have to be to show a significant increase?

(a) $H_0: \mu = 300$

(b) $H_a: \mu > 300$

(c) test statistic: $Z = \dfrac{\bar{X} - 300}{\sigma / \sqrt{n}}$ has a standard normal distribution if $H_0$ is true. A large positive value for $z$, the computed value of $Z$, counts as evidence against the null hypothesis in favor of the alternative.

(d) $z = \dfrac{310 - 300}{40 / \sqrt{30}} = 1.37$

(e) The p-value is the probability of observing a value that is as unlikely as the observation, or more unlikely:

$$P(Z > 1.37) = 1 - \Phi(1.37) = 0.0853.$$

The p-value is above 5%, so we cannot reject the null hypothesis.
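As a check on steps (c) through (e), here is a minimal sketch of the one-sided z-test, assuming the population standard deviation of 40 still applies. The helper name `one_sided_z_test` is hypothetical. Because the code does not round $z$ to two decimals before looking up $\Phi$, its p-value differs from the table value in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n):
    """Upper-tailed z-test of H0: mu = mu0 against Ha: mu > mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Upper-tail probability P(Z > z) under the standard normal
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Values from Problem 2: xbar = 310, mu0 = 300, sigma = 40, n = 30
z, p_value = one_sided_z_test(310, 300, 40, 30)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0 at 5%:", p_value < 0.05)
```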


The evidence points against the null hypothesis, however. If we increase the sample size and still get the same sample mean of 310, we may be able to show an increase in productivity.

For that, we work the example backwards: we are looking for a $z$ such that $P(Z > z) \le 0.05$; $z$ therefore is at least $\Phi^{-1}(0.95) = 1.65$.

Then

$$\frac{310 - 300}{40 / \sqrt{n}} \ge 1.65 \iff \sqrt{n} \ge 1.65 \cdot \frac{40}{10} = 6.6 \;\Rightarrow\; n \ge 43.56 \;\Rightarrow\; n_{\min} = 44.$$

We need at least 44 programmers in the sample.
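The backwards calculation can be sketched as follows, assuming the sample mean stays at 310. The function name `min_sample_size` is an illustrative assumption. The code uses the exact quantile $\Phi^{-1}(0.95) \approx 1.645$ instead of 1.65, which changes the intermediate bound on $n$ slightly but gives the same minimum sample size.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(xbar, mu0, sigma, alpha):
    """Smallest n with (xbar - mu0) / (sigma / sqrt(n)) >= z_crit."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    # Solve sqrt(n) >= z_crit * sigma / (xbar - mu0) for n
    return ceil((z_crit * sigma / (xbar - mu0)) ** 2)


# Values from Problem 2, one-sided test at the 5% level
n_min = min_sample_size(310, 300, 40, 0.05)
print("minimum sample size:", n_min)
```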

3. In sports, playing at home is considered a large advantage. For example, the Iowa State men’s basketball team is often said to benefit from “Hilton Magic” when it plays at home.

(a) A simple random sample of 115 games is taken from the 1995-1996 NBA professional basketball season (this is just under 10% of the total number of games played). It turns out that the home team won 73 of those games. Test the hypothesis that the home team’s winning proportion is 0.5. Be sure to define the parameter you are interested in, state the null and alternative hypotheses, find the P-value, and state your conclusion.

(a) $H_0: p = 0.5$

(b) $H_a: p > 0.5$

(c) test statistic: $Z = \dfrac{\hat{p} - 0.5}{\sqrt{\dfrac{0.5 \cdot (1 - 0.5)}{n}}}$ has a standard normal distribution if $H_0$ is true. A large positive value for $z$, the computed value of $Z$, counts as evidence against the null hypothesis in favor of the alternative.

(d) $z = \dfrac{\dfrac{73}{115} - 0.5}{\sqrt{\dfrac{0.5 \cdot (1 - 0.5)}{115}}} = 2.89$

(e) The p-value is the probability of observing a value that is as unlikely as the observation, or more unlikely:

$$P(Z > 2.89) = 1 - \Phi(2.89) = 0.0019.$$

It is very unlikely that we would get sample results like this if there were no home court advantage, so we reject $H_0$ and conclude that there is one.
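The one-proportion test above can be sketched in a few lines. The helper name `one_proportion_z_test` is an assumption for illustration; the standard error is computed under $H_0$, that is, with $p_0 = 0.5$ rather than with the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # Standard error uses the null value p0, not p_hat
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Values from Problem 3(a): 73 home wins in 115 games
z, p_value = one_proportion_z_test(73, 115, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```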

(b) What conditions are required for the test in part (a) to be valid?

We need a simple random sample and a large population, together with $np > 5$ and $n(1 - p) > 5$. All conditions are satisfied here.

(c) Suppose we take a simple random sample of 225 games from the 1996 professional baseball season (this is just under 10% of the total number of games played). It turns out that the home team wins 124 of those games. Test the hypothesis that the proportion of games won by the home team is the same for the two sports against the alternative that the proportions are different.


(a) $H_0: p_{bask} = p_{base}$

(b) $H_a: p_{bask} \ne p_{base}$

(c) test statistic: $Z = \dfrac{\hat{p}_{bask} - \hat{p}_{base}}{\sqrt{\hat{p}(1 - \hat{p})}\,\sqrt{\dfrac{1}{n_{bask}} + \dfrac{1}{n_{base}}}}$ with $\hat{p} = \dfrac{n_{bask}\,\hat{p}_{bask} + n_{base}\,\hat{p}_{base}}{n_{bask} + n_{base}}$ has a standard normal distribution if $H_0$ is true. Any large value for $z$, the computed value of $Z$ (positive or negative), counts as evidence against the null hypothesis in favor of the alternative.

(d) computations:

$$\hat{p} = \frac{73 + 124}{115 + 225} = \frac{197}{340} = 0.579$$

$$z = \frac{73/115 - 124/225}{\sqrt{0.579 \cdot 0.421 \cdot (1/115 + 1/225)}} = 1.48$$

(e) The p-value is the probability of observing a value that is as unlikely as the observation, or more unlikely:

$$P(Z > 1.48) + P(Z < -1.48) = 2 \cdot (1 - \Phi(1.48)) = 0.1388.$$

This p-value is not significant (it is above 5%), so we do not have enough evidence to reject the null hypothesis. We cannot say that there is a significant difference in the proportion of home wins between basketball and baseball.
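A minimal sketch of the pooled two-proportion test follows. The function name `two_proportion_z_test` is hypothetical. The code keeps full precision for $z$, so its p-value differs slightly from the value obtained by looking up $\Phi(1.48)$ in a table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Under H0 both samples share one proportion, so pool the estimate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value


# Basketball: 73 of 115 home wins; baseball: 124 of 225 home wins
pooled, z, p_value = two_proportion_z_test(73, 115, 124, 225)
print(f"pooled = {pooled:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```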

(d) Give a 95% confidence interval for the difference between the home team’s proportion of wins in basketball and baseball.

The 95% CI for $p_{bask} - p_{base}$ is:

$$\hat{p}_{bask} - \hat{p}_{base} \pm 1.96 \sqrt{\frac{(.635)(.365)}{115} + \frac{(.551)(.449)}{225}} = 0.0837 \pm 1.96 \cdot 0.0558 = (-.026,\ .193)$$

The value 0 lies inside the confidence interval, leaving us with the same conclusion as the hypothesis test.
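The interval can be reproduced with the sketch below, which uses separate (unpooled) sample proportions in the standard error. The name `diff_proportion_ci` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level):
    """Normal-approximation CI for p1 - p2 with an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Basketball: 73 of 115 home wins; baseball: 124 of 225 home wins
lower, upper = diff_proportion_ci(73, 115, 124, 225, 0.95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("0 inside interval:", lower < 0 < upper)
```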

(e) The standard error (estimated standard deviation) that you use in the confidence interval (part (d)) is different from the standard error that you use in the test (part (c)). Explain why they are different (i.e., what do you assume in carrying out the test that is not part of the confidence interval calculation).

The test assumes that both p’s are the same and uses the combined estimate of the home-field advantage (.579) rather than two separate home-field advantages.

In this case the standard error changes from .0566 to .0558 (not very much).
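To make the comparison concrete, the two standard errors can be computed side by side. This is only a sketch; the variable names are chosen for illustration.

```python
from math import sqrt

x_bask, n_bask = 73, 115    # basketball home wins, games sampled
x_base, n_base = 124, 225   # baseball home wins, games sampled

p_bask = x_bask / n_bask
p_base = x_base / n_base
p_pooled = (x_bask + x_base) / (n_bask + n_base)

# Test: H0 says both proportions are equal, so one pooled estimate is used
pooled_se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_bask + 1 / n_base))

# Confidence interval: no equality assumption, so each sample keeps its own estimate
unpooled_se = sqrt(p_bask * (1 - p_bask) / n_bask + p_base * (1 - p_base) / n_base)

print(f"pooled SE = {pooled_se:.4f}, unpooled SE = {unpooled_se:.4f}")
```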


4. Two groups of students take part in a study to evaluate two different self-study courses, A and B. 30 students take course A, and 40 students are enrolled in course B. In a final test the average scores for the two groups are 89.6 for group A and 81.9 for group B. The standard deviations are 3.6 for group A and 12.7 for group B. Do the results suggest that the courses A and B are significantly different?

Let $\mu_A$ and $\mu_B$ be the respective means for the two populations of students taking the two courses.

(a) $H_0: \mu_A = \mu_B$

(b) $H_a: \mu_A \ne \mu_B$

(c) test statistic: $Z = \dfrac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}}$ has a standard normal distribution if $H_0$ is true. Any large value for $z$, the computed value of $Z$ (positive or negative), counts as evidence against the null hypothesis in favor of the alternative.

(d) computations: $z = \dfrac{89.6 - 81.9}{\sqrt{3.6^2 / 30 + 12.7^2 / 40}} = 3.64$

(e) The p-value is the probability of observing a value that is as unlikely as the observation, or more unlikely:

$$P(Z > 3.64) + P(Z < -3.64) = 2 \cdot (1 - \Phi(3.64)) \approx 0.0003.$$

This p -value is highly significant. We can reject the null hypothesis and assume that the courses A and B are different. By looking at the numbers we can conclude that course A is better than course B .
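The two-sample comparison can be sketched as below, treating the sample standard deviations as if they were the population values, as the solution does. The function name `two_sample_z_test` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided z-test of H0: mu_A = mu_B with separate variances."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Values from Problem 4: course A (n = 30) and course B (n = 40)
z, p_value = two_sample_z_test(89.6, 3.6, 30, 81.9, 12.7, 40)
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```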
