Chapter 10 WebNotes - Part 2


Section 10.2: Estimating a Population Mean

Previously, when constructing confidence intervals, we assumed that we knew the true population standard deviation, $\sigma$. This, of course, is an unrealistic assumption. When we do not know the population standard deviation, $\sigma$, we must estimate it from our sample using the sample standard deviation, s. However, when we do so, the test statistic changes: instead of the z-statistic we used previously, we use the t-statistic, which has a new distribution (density curve) associated with it.

The new distribution is not exactly like the standard normal curve, but is very close:

It is still centered at zero

It is bell shaped

Its spread is slightly greater than that of the standard normal distribution

It has more probability in the tails

It is dependent on the sample size used to find s. This dependence on sample size is taken care of via “degrees of freedom”.
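To see "more probability in the tails" numerically, here is a minimal Python sketch (an illustration, not part of the original notes) that compares P(T > 2) for a few degrees of freedom with P(Z > 2). It assumes no statistics library is available, so the helper names `t_pdf`, `t_upper_tail`, and `z_upper_tail` are hypothetical, and the t area is approximated with Simpson's rule applied to the t density formula.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom (lgamma avoids overflow for large df)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=1000):
    """Hypothetical helper: P(T > x) for x >= 0, as 0.5 minus the area from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def z_upper_tail(x):
    """Hypothetical helper: P(Z > x) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for df in (4, 14, 100):
    print(f"df = {df:>3}: P(T > 2) = {t_upper_tail(2, df):.4f}")
print(f"normal:   P(Z > 2) = {z_upper_tail(2):.4f}")
```

As the degrees of freedom grow, the tail area should shrink toward the normal value, which is why the t curve depends on the sample size.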

If you draw an SRS of size n from a population that has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the t-statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t distribution with n − 1 degrees of freedom.
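Once $\bar{x}$, s, and n are known, the t-statistic is simple arithmetic. A minimal sketch in Python, using the DHS text-message data from the example in this section; the helper name `t_statistic` is hypothetical, and the hypothesized mean `mu0 = 100` is made up purely to show the calculation.

```python
import math
import statistics

# Data from the DHS text-message example in this section.
texts = [107, 116, 153, 56, 44, 187, 109, 36, 87, 145, 138, 107, 121, 29, 91]

def t_statistic(data, mu0):
    """Hypothetical helper: return (t, df) with t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# mu0 = 100 is a made-up hypothesized mean, used only to illustrate the arithmetic.
t, df = t_statistic(texts, mu0=100)
print(f"t = {t:.4f} with {df} degrees of freedom")
```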

Areas (probabilities) under the t density curve can be found using Table C in the back of the book or the tcdf function on the calculator.
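If neither the table nor a calculator is handy, the same areas can be approximated numerically. This sketch is a stand-in for tcdf, not a description of how the calculator computes it: it integrates the t density with Simpson's rule and then finds $t^*$ by bisection. The names `t_cdf` and `t_critical` are hypothetical, and the search interval [0, 50] is an assumption that is wide enough for the confidence levels and degrees of freedom used in this chapter.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """Hypothetical helper: P(T <= x) = 0.5 plus the signed area from 0 to x (Simpson's rule, even steps)."""
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, so the area is subtracted from 0.5
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df, upper=50.0):
    """Hypothetical helper: t* with `confidence` area between -t* and t*, by bisection on t_cdf."""
    target = 1 - (1 - confidence) / 2  # area to the left of t*
    lo, hi = 0.0, upper
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.95, 14), 3))
```

For 14 degrees of freedom and 95% confidence, the result should agree with the table value of 2.145; for very large df it approaches the familiar z value of about 1.96.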

Using the t-distribution to construct confidence intervals and conduct hypothesis tests:

Conditions/Assumptions:

1.) The sample is an SRS. (SRS)

2.) Can we use the normal curve? (Normality)

If the sample size is less than 30, make sure the original population is approximately normally distributed. We check this by looking at a normal probability plot, a stemplot, or a boxplot of the sample data; the most important thing to look for is the absence of serious outliers.

If the sample size is 30 or more, the original population need not be normally distributed, because the CLT tells us that the sampling distribution of the sample mean will be approximately normal anyway.

3.) Individual observations are independent; when sampling without replacement, the population size N is at least 10 times the sample size n. (Independence)
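The parts of these conditions that reduce to arithmetic can be screened in code. This sketch is only an illustration: the 1.5 × IQR outlier rule is a common rule of thumb that these notes do not specify, and `statistics.quantiles(..., method="inclusive")` may give slightly different quartiles than a TI-83, so treat it as a quick check alongside the plots, not a replacement for them. All helper names are hypothetical.

```python
import statistics

# Data from the DHS text-message example in this section.
texts = [107, 116, 153, 56, 44, 187, 109, 36, 87, 145, 138, 107, 121, 29, 91]

def needs_normality_check(n):
    """Condition 2: with n < 30 we must check that the population looks roughly normal."""
    return n < 30

def iqr_outliers(data):
    """Values beyond 1.5 * IQR from the quartiles (rule of thumb; quartile method is an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - fence or x > q3 + fence]

def independence_ok(population_size, n):
    """Condition 3 (10% condition): population at least 10 times the sample size."""
    return population_size >= 10 * n

print(needs_normality_check(len(texts)))  # small sample, so the plots matter
print(iqr_outliers(texts))                # values flagged as possible outliers
print(independence_ok(1300, len(texts)))  # DHS has roughly 1300 students
```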

Once the conditions above have been checked, the procedure for constructing confidence intervals and conducting hypothesis tests with the t-statistic is the same as with the z-statistic, except that we use $t^*$ with n − 1 degrees of freedom in place of $z^*$.

Example: George is attempting to find out how many text messages the average DHS student sends in a month. He takes an SRS of 15 DHS students and gets the following numbers:

107 116 153 56 44 187 109 36 87 145 138 107 121 29 91

We will construct a 95% confidence interval for the average number of text messages that a DHS student sends in a month.

Step 1: Name the confidence interval you will be constructing

1-Sample t confidence interval

Step 2: Check the Conditions

1) Sample was an SRS (given)

OK

2) Since our sample size is less than 30, we need to make sure our sample comes from an approximately normal population.

We used the TI-83 to graph a normal probability plot and a boxplot of the sample data. The boxplot indicates that our distribution is approximately symmetric with no outliers, and the normal probability plot is approximately linear. Both are good indications that our data come from an approximately normal distribution.

OK

3) We are sampling without replacement, so we need to make sure that N is at least 10 times n. The DHS population is around 1300, which is certainly larger than 10 × 15 = 150. OK

Robust Procedures: A confidence interval or hypothesis test is called robust if the confidence level or P-value does not change very much when the assumptions of the procedure are violated. The t procedure is quite robust against non-normality of the population when there are no outliers.
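A small simulation can make "robust" concrete. This sketch is an illustration with assumptions of its own, not part of the original notes: it repeatedly draws samples of n = 15 from a uniform population (non-normal but symmetric, with no outliers) and from a strongly skewed lognormal population (which produces outliers), builds 95% t intervals with $t^*$ = 2.145, and records how often they capture the true mean. The population choices, seed, and trial count are arbitrary.

```python
import math
import random
import statistics

T_STAR = 2.145  # table value for df = 14 (n = 15), 95% confidence
N = 15

def coverage(draw, true_mean, trials=4000, seed=1):
    """Fraction of 95% t intervals (samples of size N) that capture true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(N)]
        x_bar = statistics.fmean(sample)
        margin = T_STAR * statistics.stdev(sample) / math.sqrt(N)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials

# Uniform(0, 1): not normal, but symmetric with no outliers; true mean 0.5.
uniform_cov = coverage(lambda rng: rng.random(), true_mean=0.5)
# Lognormal(0, 1.5): strongly right-skewed with frequent outliers; true mean exp(1.5**2 / 2).
skewed_cov = coverage(lambda rng: rng.lognormvariate(0, 1.5), true_mean=math.exp(1.5**2 / 2))

print(f"uniform population:   {uniform_cov:.3f}")
print(f"lognormal population: {skewed_cov:.3f}")
```

Coverage for the uniform population should land close to 95%, while the skewed population typically falls noticeably short, which matches the warning about outliers.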

Step 3: Calculations

The t confidence interval formula is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

Our $\bar{x}$ = 101.73, s = 45.62, and n = 15. We can find $t^*$ by looking at Table B in our yellow stat formula packet: our sample size is 15, so our degrees of freedom are 15 − 1 = 14, and we want a 95% confidence level. The $t^*$ value we get is therefore 2.145.

Note: $\frac{s}{\sqrt{n}}$ is called the standard error of the estimate ($SE_{estimate}$ for short).

So our confidence interval is:

$$\bar{x} \pm t^* \cdot SE_{estimate} = 101.73 \pm 2.145\left(\frac{45.62}{\sqrt{15}}\right) = 101.73 \pm 2.145(11.779) = 101.73 \pm 25.266 = (76.46,\ 127.00)$$
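The Step 3 arithmetic can be checked in a few lines of Python. This is a minimal sketch: `one_sample_t_interval` is a hypothetical helper, and $t^*$ = 2.145 is taken from the table rather than computed.

```python
import math
import statistics

texts = [107, 116, 153, 56, 44, 187, 109, 36, 87, 145, 138, 107, 121, 29, 91]

def one_sample_t_interval(data, t_star):
    """Hypothetical helper: x_bar +/- t* * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # SE_estimate
    margin = t_star * se
    return x_bar - margin, x_bar + margin

low, high = one_sample_t_interval(texts, t_star=2.145)  # t* for df = 14, 95% confidence
print(f"({low:.2f}, {high:.2f})")
```

Because the code uses the unrounded $\bar{x}$ and s, the endpoints can differ from the hand calculation in the last decimal place.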

Step 4: Interpret the t-interval

We are 95% confident that the true average number of texts a Darien High School student sends per month is between about 76 and 127.

Matched Pair t Procedure:

Matched pair experiments take on one of three forms:

1) Measurements on individual subjects before and after a treatment is applied

Example: Giving the same students a pretest and a posttest to evaluate the effectiveness of an educational procedure.

2) Measurements on “naturally” occurring pairs, such as husbands and wives or twins

Example: Asking a husband and wife to rate a particular movie and looking at the difference in their ratings.

3) Measurements on pairs blocked in order to eliminate certain effects that might otherwise obscure the differences due to the treatments being applied.

Example: A pharmaceutical company claims that it has developed a new medication for lowering cholesterol. It takes 15 random subjects who have high cholesterol and measures their cholesterol level before the study and again 12 weeks after they start the new medication. Construct a 95% confidence interval to estimate the average difference in cholesterol level before and after taking the new medication.

Subject #   Before   After
1           287      230
2           290      230
3           310      258
4           305      251
5           289      230
6           281      220
7           320      270
8           299      240
9           300      244
10          288      235
11          293      230
12          281      220
13          311      252
14          290      230
15          287      236

Solution:

Here are our observed differences (Before − After):

Subject #   Difference
1           57
2           60
3           52
4           54
5           59
6           61
7           50
8           59
9           56
10          53
11          63
12          61
13          59
14          60
15          51

Step 1: Name the confidence interval

Matched pairs t-confidence interval

Step 2: Conditions

1) Sample is an SRS (given)

OK

2) Since the sample size is less than 30, we need to check that our sample differences come from an approximately normal population.

Our normal probability plot is somewhat linear. Our boxplot, though not perfectly symmetric, has no outliers. We feel somewhat comfortable proceeding based on this evidence.  OK

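Step 3: Construct the interval

Here $\bar{x}$ = 57 and s = 4.088 are the mean and standard deviation of the 15 differences, and $t^*$ = 2.145 (df = 14, 95% confidence):

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 57 \pm 2.145\left(\frac{4.088}{\sqrt{15}}\right) = 57 \pm 2.26$$

or, written as an interval, (54.74, 59.26).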
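The matched pairs interval is just a one-sample t interval applied to the differences. A minimal sketch, using the before/after values from the cholesterol table in this example, with $t^*$ = 2.145 taken from the table:

```python
import math
import statistics

# Cholesterol levels for subjects 1-15, from the table in the example.
before = [287, 290, 310, 305, 289, 281, 320, 299, 300, 288, 293, 281, 311, 290, 287]
after = [230, 230, 258, 251, 230, 220, 270, 240, 244, 235, 230, 220, 252, 230, 236]

# Matched pairs: reduce each subject to a single difference (Before - After).
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
x_bar = statistics.mean(diffs)
s = statistics.stdev(diffs)
margin = 2.145 * s / math.sqrt(n)  # t* for df = 14, 95% confidence
print(f"x_bar = {x_bar}, s = {s:.3f}, interval = ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```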

Step 4: Conclusion/Interpretation of the Interval

We are 95% confident that the true average drop in cholesterol produced by the new medication is between 54.74 and 59.26.

Section 10.3: Estimating a Population Proportion

Whereas in the last section we concentrated on a confidence interval for a population mean using the t confidence interval, we will now learn how to construct a confidence interval for a population proportion (back to z procedures).

The process of constructing a confidence interval is essentially the same for proportions as it is for means, except for some of the conditions and the calculation of the standard error.

Example: What proportion of Fairfield County high school seniors apply to college early? An SRS of 25 seniors in Fairfield County was asked whether or not they applied to college early. Fifteen of the 25 said they applied early. Construct a 95% confidence interval for the proportion of Fairfield County high school seniors who apply to college early.

Step 1: Name the confidence interval

1-proportion z confidence interval

Step 2: Conditions

1) Sample is an SRS of the population (given)

OK

2) The population is more than 10 times as large as the sample size. (There are way more than 250 seniors, so we are safe.)

OK

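3) $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$, where $\hat{p}$ is the proportion found in our sample (some books use 5 instead of 10):

$$n\hat{p} = 25\left(\frac{15}{25}\right) = 15 \ge 10 \qquad n(1-\hat{p}) = 25\left(1-\frac{15}{25}\right) = 10 \ge 10$$

OK

Step 3: Calculations

estimate ± $z^* \cdot SE_{estimate}$, or more specifically for our case:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{15}{25} \pm 1.96\sqrt{\frac{0.6(1-0.6)}{25}} = 0.6 \pm 1.96\sqrt{\frac{0.6(0.4)}{25}} = 0.6 \pm 0.192 = (0.408,\ 0.792)$$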
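A minimal sketch of the same 1-proportion z interval in Python. The success/failure condition from Step 2 is checked on the counts themselves (successes and failures), which avoids floating-point round-off in $n\hat{p}$; the helper name and the choice to raise an error when the condition fails are assumptions for illustration.

```python
import math

def one_prop_z_interval(successes, n, z_star=1.96):
    """Hypothetical helper: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    # Condition 3, checked on the counts: n*p_hat = successes, n*(1 - p_hat) = failures.
    if successes < 10 or n - successes < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(15, 25)
print(f"({low:.3f}, {high:.3f})")
```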

Step 4: Interpretation

I am 95% confident that the true proportion of high school seniors in Fairfield County who apply to college early is somewhere between 40.8% and 79.2%.

Choosing the Sample Size:

When we designed confidence intervals for population means, we often researched ahead of time how large a sample we would need in order to get a certain margin of error with a certain confidence.

This notion also holds true when designing confidence intervals for proportions. The following are the algebraic steps in deriving the formula:

1) The margin of error for a confidence interval for p is $m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

2) Since we don’t know p-hat until we actually conduct the study, we have a problem. We can do one of two things: a) use an estimated p-hat based on previous studies, or b) use a conservative p-hat = 0.5. Using a p-hat of 0.5 gives the largest possible margin of error for any given $z^*$ and n. We will use this option most of the time.

3) Substitute p-hat = 0.5 and square both sides: $m^2 = (z^*)^2 \frac{0.5(1-0.5)}{n}$

4) Divide both sides by $(z^*)^2$ and by (0.5 · 0.5) = 0.25: $\frac{m^2}{0.25\,(z^*)^2} = \frac{1}{n}$

5) Take the reciprocal of both sides: $n = 0.25\frac{(z^*)^2}{m^2}$
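Why p-hat = 0.5 is the conservative choice in step 2 can be seen from $\hat{p}(1-\hat{p}) = 0.25 - (\hat{p} - 0.5)^2$, which is largest at $\hat{p} = 0.5$. The sketch below checks this numerically; `margin_of_error` is a hypothetical helper, and the sample size n = 100 is made up purely for illustration.

```python
import math

def margin_of_error(p_hat, n, z_star=1.96):
    """Hypothetical helper: m = z* * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

n = 100  # hypothetical sample size, for illustration only
for p_hat in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p_hat = {p_hat:.1f}: m = {margin_of_error(p_hat, n):.4f}")

# Search a fine grid for the p_hat that makes the margin of error largest.
grid = [i / 100 for i in range(101)]
worst_case = max(grid, key=lambda p: margin_of_error(p, n))
print("largest margin of error at p_hat =", worst_case)
```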

Example: Before conducting nationwide polling prior to a national election, most good polling companies will decide ahead of time how many people to include in the poll based on a particular margin of error and confidence level. Let’s say that Gallup wants to conduct a poll to see what proportion of voters would vote for Bush if George Bush were running for president today against John Kerry. Gallup wants to be 95% confident, with a margin of error of no more than 3 percentage points. How many people must they randomly sample in order to achieve this?

$$n = 0.25\frac{(z^*)^2}{m^2} = 0.25\frac{(1.96)^2}{(0.03)^2} = \frac{0.9604}{0.0009} = 1067.11$$

Since the sample size must be a whole number, and rounding down would leave the margin of error slightly above 3 percentage points, we round up: n = 1068 people.
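Step 5 translates directly into code. In this sketch, `sample_size_for_proportion` is a hypothetical helper; `p_guess` defaults to the conservative 0.5 from step 2, but an estimate from a previous study (option a) could be passed instead. Rounding up with `math.ceil` reflects that the sample size must be a whole number and that rounding down would make the margin of error slightly too large.

```python
import math

def sample_size_for_proportion(margin, z_star=1.96, p_guess=0.5):
    """Hypothetical helper: smallest whole n with z* * sqrt(p(1 - p) / n) <= margin (step 5, rounded up)."""
    return math.ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2)

n = sample_size_for_proportion(0.03)
print(n)
```

If the product happened to land exactly on a whole number, floating-point round-off could push `ceil` up by one; that is harmless here, since a slightly larger sample only shrinks the margin of error.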
