Data Analysis 1016-319


Introduction to Statistics and Data Analysis

Chapter 11 – Hypothesis Test Basics

Criminal Trials in the United States

The jury is always told that the defendant is “innocent until proven guilty”.

1. What must a member of the jury assume about the defendant at the beginning of the trial? This is the null hypothesis.

H0: _______________________ (HINT: one word)

2. It is the prosecuting attorney’s job to present evidence to the jury. IF there is enough evidence (“beyond a reasonable doubt”), then the jury will convict the defendant of the crime. If the defendant is convicted, the jury is rejecting the null hypothesis (above) and saying the defendant is _________.

This is the alternative hypothesis.

Ha: _______________________

3. When the jury convicts someone of a crime, their verdict is GUILTY.

Is this “Reject H0” OR “Fail to Reject H0”?

4. If the jury fails to convict someone of a crime, their verdict is NOT GUILTY.

Is this “Reject H0” OR “Fail to Reject H0”?

How does the verdict of “not guilty” differ from “innocent”?

5. Sometimes the jury makes a correct decision and sometimes the jury makes a mistake.

a. When H0 is true, but we reject it based on the sample evidence, this is an error. We call it a Type I error. Write a sentence describing a Type I error in the U.S. criminal justice system.

b. When H0 is false, but we fail to reject it based on the sample evidence, this is also an error. We call it a Type II error. Write a sentence describing a Type II error in the U.S. criminal justice system.

Put each of the following in the correct place in the table below…

Type I Error     Type II Error     Correct Decision     Correct Decision

                                        TRUTH (Unknown)
Decision Based on Evidence (Data)       H0 is true          H0 is false (and Ha is true)
Reject H0                               ________________    ________________
Fail to Reject H0                       ________________    ________________

Criminal Trials in the United States

Solution to Hypothesis Test Basics

1. H0: Defendant Is Innocent

2. Ha: Defendant Is Guilty

3. Conviction = Guilty Verdict = “Reject H0”

4. Failure to Convict = NOT Guilty Verdict = “Fail to Reject H0”

“Not Guilty” indicates that there was not enough evidence to convict the defendant. This verdict makes no statement about whether or not the defendant committed the crime.

“Innocent” indicates that the defendant did not commit the crime.

5a. Type I Error = Guilty Verdict When Defendant Is Innocent

5b. Type II Error = NOT Guilty Verdict When Defendant Is Guilty

                                        TRUTH (Unknown)
Decision Based on Evidence (Data)       H0 is true          H0 is false (and Ha is true)
Reject H0                               Type I Error        Correct Decision
Fail to Reject H0                       Correct Decision    Type II Error

AP Statistics

Section 11.4

Type I & Type II Errors

Name: _______________________

Date: _________________

Block: ________

Medical Testing

Medical tests have been developed to detect many serious diseases (such as cancer and HIV). A medical test is designed to give correct results as often as possible, that is, to minimize the occurrence of “false positives” and “false negatives”.

A doctor starts by assuming that a patient is healthy (no disease), then looks for evidence to contradict that assumption. If the patient has a negative test result, the doctor continues to assume that the patient is healthy. If the patient has a positive test result, the doctor concludes that the patient has a disease.

A. State H0 and Ha.

B. When will the doctor Reject H0?

C. When will the doctor Fail to Reject H0?

D. What kind of an error is a “false positive”? EXPLAIN.

E. What kind of an error is a “false negative”? EXPLAIN.

F. What are the consequences of a false positive? Of a false negative?

In the Movies…

A movie critic claims that, among children’s movies that show the use of tobacco, the mean exposure time is less than 2 minutes.

i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha.
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.

Potential Side Effects…

A researcher claims that over 1% of the people who take the drug Lipitor experience flu-like symptoms.

i. Identify the population type and describe the population characteristic (in words).
ii. State H0 and Ha.
iii. Describe a Type I error IN CONTEXT.
iv. Describe a Type II error IN CONTEXT.

Solution to Type I and Type II Errors

Medical Testing

A. H0: Patient Is Healthy
   Ha: Patient Has Disease

B. Doctor rejects H0 when the test result is positive.

C. Doctor fails to reject H0 when the test result is negative.

D. A false positive is a Type I error (reject H0 when H0 is true).

E. A false negative is a Type II error (fail to reject H0 when H0 is false).

F. With a false positive, a person thinks they have a disease (and may start treatment) when they are actually healthy. With a false negative, a person doesn’t know they have a disease (and so does not start treatment).

1. Tobacco Exposure

i. Numerical, μ = mean tobacco exposure time in children’s movies that show the use of tobacco (in minutes)
ii. H0: μ = 2, Ha: μ < 2
iii. The mean tobacco exposure time is (at least) 2 minutes, but the data lead us to believe that it is less than 2 minutes.
iv. The mean tobacco exposure time is less than 2 minutes, but the data lead us to believe that it is (at least) 2 minutes.

2. Lipitor

i. Categorical (S/F = experience flu-like symptoms/don’t experience flu-like symptoms), p = proportion of all people taking Lipitor who experience flu-like symptoms
ii. H0: p = 0.01, Ha: p > 0.01
iii. (At most) 1% of all people taking Lipitor experience flu-like symptoms, but the data lead us to believe that more than 1% do.
iv. More than 1% of all people taking Lipitor experience flu-like symptoms, but the data lead us to believe that (at most) 1% do.

AP Statistics

Hypothesis Testing

Finding P-value

Practice finding probabilities for z and t… SHOW YOUR WORK!

You may use the z and t tables, OR you may use your calculator, e.g.

P(z < –1.07) = normalcdf(–∞, –1.07, 0, 1)

OR P(t with 14 df > 2.52) = tcdf(2.52, ∞, 14)
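If a calculator is not at hand, the same areas can be computed in Python. This is a sketch under stated assumptions, not a calculator feature: `normalcdf` and `tcdf` are hypothetical helper names chosen to mirror the calculator's argument order (lower bound, upper bound, then μ and σ or the degrees of freedom). Normal areas come from the standard library's `statistics.NormalDist`; because the standard library has no t distribution, `t_cdf` (also hypothetical) integrates the t density numerically with Simpson's rule.

```python
import math
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for a t distribution with df degrees of freedom."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        # Student's t density: c * (1 + u^2/df)^(-(df + 1)/2)
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    # The t curve is symmetric about 0
    return 0.5 + area if x >= 0 else 0.5 - area


def tcdf(lower, upper, df):
    """Area under the t curve (df degrees of freedom) between lower and upper."""
    return t_cdf(upper, df) - t_cdf(lower, df)


# The two calculator examples above
print(round(normalcdf(-math.inf, -1.07, 0, 1), 4))  # P(z < -1.07)
print(round(tcdf(2.52, math.inf, 14), 4))           # P(t with 14 df > 2.52)
```

Passing `math.inf` plays the role of ∞ on the calculator. With 2000 Simpson steps the numerical error is far below the four decimal places used on this worksheet.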

Name: _______________________

Date: _________________

Block: ________

1. Finding probability for z.

A. P(z > 1.65)

B. P(z < –0.94)

C. P(z < –2.59 OR z > +2.59)

2. Finding probability for t with n = 10

How many degrees of freedom should you use for t? _______

A. P(t > 2.33)

B. P(t < –1.50)

C. P(t < –2.05 OR t > +2.05)

3. Finding probability for t with n = 20

How many degrees of freedom should you use for t? _______

A. P(t > 1.86)

B. P(t < –2.45)

C. P(t < –1.37 OR t > +1.37)

Television Viewers

The demographics of television viewers are an important factor in selling advertising time. The RX pharmaceutical company would like to market a new acid-reflux medication to consumers under the age of 50. They are considering buying advertising time on the cable channel MSNBC if they find evidence that the average age of MSNBC viewers is under 50 years.

A. Determine the population type AND describe the population characteristic (in words).

B. State H0 and Ha.

C. Is this a z or a t test statistic?

D. Suppose that a random sample of 60 MSNBC viewers had a test statistic value of –1.83. Compute the p-value.

E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample likely to occur when H0 is true? EXPLAIN.

Female Students

At Rochester Institute of Technology, 34% of the students are female. The Department of Mathematics and Statistics would like to know if the Data Analysis course has a different percentage of female students.

A. Determine the population type AND describe the population characteristic (in words).

B. State H0 and Ha.

C. Is this a z or a t test statistic?

D. A random sample of 36 Data Analysis students had a test statistic value of 0.97. What is the p-value?

E. Based on your p-value from part D, is data at least as inconsistent with H0 as our sample likely to occur when H0 is true? EXPLAIN.

Solution to Computing the P-Value

Probabilities

1. z ~ N(0, 1)

A. P(z > 1.65) = normalcdf(1.65, ∞, 0, 1) = 0.0495

B. P(z < –0.94) = normalcdf(–∞, –0.94, 0, 1) = 0.1736

C. P(z < –2.59 OR z > +2.59) = 1 – normalcdf(–2.59, 2.59, 0, 1) = 0.0096

2. t with df = 9

A. P(t > 2.33) = tcdf(2.33, ∞, 9) = 0.0224

B. P(t < –1.50) = tcdf(–∞, –1.50, 9) = 0.0839

C. P(t < –2.05 OR t > +2.05) = 1 – tcdf(–2.05, 2.05, 9) = 0.0706

3. t with df = 19

A. P(t > 1.86) = tcdf(1.86, ∞, 19) = 0.0392

B. P(t < –2.45) = tcdf(–∞, –2.45, 19) = 0.0121

C. P(t < –1.37 OR t > +1.37) = 1 – tcdf(–1.37, 1.37, 19) = 0.1867
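The answer key above can be double-checked with a short Python sketch. It assumes only the standard library: the z probabilities use `statistics.NormalDist`, and `t_cdf` is a hypothetical helper that integrates the t density numerically (Simpson's rule). Two-sided probabilities are computed as twice the lower tail, which equals 1 minus the middle area used in the key.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0, 1)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# label: (computed probability, value from the answer key)
checks = {
    "1A P(z > 1.65)":            (1 - Z.cdf(1.65), 0.0495),
    "1B P(z < -0.94)":           (Z.cdf(-0.94), 0.1736),
    "1C P(|z| > 2.59)":          (2 * Z.cdf(-2.59), 0.0096),
    "2A P(t > 2.33), df = 9":    (1 - t_cdf(2.33, 9), 0.0224),
    "2B P(t < -1.50), df = 9":   (t_cdf(-1.50, 9), 0.0839),
    "2C P(|t| > 2.05), df = 9":  (2 * t_cdf(-2.05, 9), 0.0706),
    "3A P(t > 1.86), df = 19":   (1 - t_cdf(1.86, 19), 0.0392),
    "3B P(t < -2.45), df = 19":  (t_cdf(-2.45, 19), 0.0121),
    "3C P(|t| > 1.37), df = 19": (2 * t_cdf(-1.37, 19), 0.1867),
}

for label, (computed, key) in checks.items():
    print(f"{label}: {computed:.4f} (key: {key})")
```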

Television Viewers

A. Numerical, μ = mean age of all MSNBC viewers (in years)

B. H0: μ = 50, Ha: μ < 50

C. t

D. P(t with 59 df ≤ –1.83) = tcdf(–∞, –1.83, 59) = 0.0361

E. No, data at least as inconsistent with H0 as the sample in part D is not likely to occur when H0 is true (only a 3.61% chance).
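Part D can be sketched in Python as well, assuming the standard library only (`t_cdf` is a hypothetical helper based on Simpson's rule, not a library function). The degrees of freedom are n – 1 = 59, and because Ha: μ < 50, the p-value is the area to the left of the test statistic.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


n = 60          # sample size
t_stat = -1.83  # test statistic from part D
df = n - 1      # one-sample t: df = n - 1

# Ha: mu < 50, so the p-value is the area to the LEFT of the test statistic
p_value = t_cdf(t_stat, df)
print(f"p-value = {p_value:.4f}")
```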

Female Students

A. Categorical (S/F = female/male), p = proportion of all Data Analysis students who are female

B. H0: p = 0.34, Ha: p ≠ 0.34

C. z

D. P(z ≤ –0.97 OR z ≥ 0.97) = 1 – normalcdf(–0.97, 0.97, 0, 1) = 0.3320

E. Yes, data at least as inconsistent with H0 as the sample in part D is fairly likely to occur when H0 is true (a 33.20% chance).
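For part D, the two-sided p-value can be sketched with Python's `statistics.NormalDist`. Because Ha: p ≠ 0.34, the tail area beyond |z| is doubled, which is the same as 1 – normalcdf(–0.97, 0.97, 0, 1).

```python
from statistics import NormalDist

z_stat = 0.97  # test statistic from part D

# Ha: p != 0.34 is two-sided, so double the lower-tail area beyond -|z|
p_value = 2 * NormalDist().cdf(-abs(z_stat))
print(f"p-value = {p_value:.4f}")
```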

AP Statistics

Section 12.2

Testing Hypotheses about p

Name:__________________

Date: _________________

Block: ________

Exercise 1

Left-Handedness Among the Elderly

Research indicates that 10% of all people are left-handed. A study of 1650 people age 65 and older contained only 83 lefties (“British Survey of Left-Handedness”, N. Bradley, The

Graphologist, 1992). Does this data provide evidence that the proportion of elderly people who are left-handed is smaller than the proportion in the general population?

A. POPULATION

• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or p)
• Set a reasonable level for α

B. SAMPLE

• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p̂.
• Check that the sample meets the necessary assumptions.

C. STATISTICAL METHOD

• State the test
• Write the formula of the test statistic (using the hypothesized value from H0)
• Compute the value of the test statistic

D. STATISTICAL RESULTS

• Sketch and label the appropriate curve.
• Find the p-value

E. CONCLUSION

• Reject H0 OR Fail to Reject H0
• Make a concluding statement

Solution to Testing Hypotheses About p

A. Categorical (S/F = left-handed/right-handed), p = proportion of left-handed people among those age 65 and older

H0: p = 0.10

Ha: p < 0.10

B. α = 0.05

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

C. n = 1650 and p̂ = 83/1650 = 0.0503

Assume that this is a random sample of all people age 65 and older.

np0 = 1650(0.1) = 165 ≥ 10 AND n(1 – p0) = 1650(1 – 0.1) = 1485 ≥ 10, so n is large (and the sampling distribution of p̂ is approximately normal).

The sample size is small compared to the size of the population (all people age 65 and older).

D. $$z = \frac{0.0503 - 0.10}{\sqrt{0.10(1 - 0.10)/1650}} = -6.73$$

p-value = P(z < –6.73) = 8.59E-12

E. Is p-value ≤ α? YES, so we REJECT H0.

The data provides sufficient evidence to conclude that the proportion of elderly people who are left-handed is smaller than the proportion in the general population.
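The arithmetic in steps B through D can be sketched in Python (standard library only). The p-value uses `math.erfc` rather than `1 - erf`, because a tail area near 10⁻¹¹ loses precision when it is formed by subtracting from 1; the trailing digits may therefore differ slightly from a calculator's display.

```python
import math

n, lefties, p0 = 1650, 83, 0.10  # sample size, number of lefties, hypothesized p

p_hat = lefties / n
# Large-sample check uses the hypothesized value p0
large_enough = n * p0 >= 10 and n * (1 - p0) >= 10

se = math.sqrt(p0 * (1 - p0) / n)  # SD of p-hat when H0 is true
z = (p_hat - p0) / se

# Ha: p < 0.10, so the p-value is the lower tail P(Z < z) = 0.5 * erfc(-z / sqrt(2))
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"p-hat = {p_hat:.4f}, conditions met: {large_enough}")
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
```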

NOTE: Why are there fewer lefties among the elderly? Bradley (1992) says, “One interpretation of this, which has appeared in the popular press, could be that left-handers die earlier than right-handers… An alternative view, however, is that older people were at school during the period when children were often being forced into the right-handed mould, and lefthandedness was suppressed. The wisdom of this unnaturalness was being questioned at the time, and so not all children were subjected to it, but it took time for the more liberated view to prevail completely.”

Exercise 2

A recent article in Chance Magazine (L. Evans, 2006) states that, “For every age, all the way through the mid-90s, male [driving] fatalities are typically 3 to 5 times that of female fatalities.”

In other words, at least 75% of driving fatalities are male. The data for the article (www.scienceservingsociety.com/Dr.xls) indicates that, in 2003, 414 male and 120 female 20-year-old drivers were killed while traveling alone.

Consider this data to be a random sample of all fatal crashes for 20-year-old drivers traveling alone. Does the data provide sufficient evidence to conclude that more than 75% of all fatal crashes for 20-year-old drivers traveling alone involve male drivers?

A. POPULATION

• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or p)
• Set a reasonable level for α

B. SAMPLE

• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p̂.
• Check that the sample meets the necessary assumptions.

C. STATISTICAL METHOD

• State the test
• Write the formula of the test statistic (using the hypothesized value from H0)
• Compute the value of the test statistic

D. STATISTICAL RESULTS

• Sketch and label the appropriate curve.
• Find the p-value

E. CONCLUSION

• Reject H0 OR Fail to Reject H0
• Make a concluding statement

Solution to 1-PropZTest

A. Categorical (S/F = male driver/female driver), p = proportion of male drivers among all fatal crashes for 20-year-old drivers traveling alone

H0: p = 0.75

Ha: p > 0.75

B. α = 0.05

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

C. n = 414 + 120 = 534 and p̂ = 414/534 = 0.7753

Assume that this is a random sample of all such crashes.

np0 = 534(0.75) = 400.5 ≥ 10 AND n(1 – p0) = 534(1 – 0.75) = 133.5 ≥ 10, so n is large (and the sampling distribution of p̂ is approximately normal).

The sample size is small compared to the size of the population (all fatal crashes for 20-year-old drivers traveling alone over time).

D. 1-PropZTest → z = 1.35, p-value = 0.0886

E. Is p-value ≤ α? NO, so we FAIL TO REJECT H0.

The data does NOT provide sufficient evidence to conclude that the proportion of male drivers among all fatal crashes for 20-year-old drivers traveling alone is greater than 0.75.
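The calculator's 1-PropZTest bundles the formula, the test statistic and the p-value into one step. The sketch below imitates that idea with a hypothetical helper, `one_prop_z_test` (our own name, not a library or calculator function), that takes the number of successes, n, p0 and the direction of Ha, and uses `math.erfc` for the normal tail areas.

```python
import math


def one_prop_z_test(successes, n, p0, alternative):
    """Hypothetical one-proportion z test; alternative is "<", ">", or "!=" (the relation in Ha)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    if alternative == "<":
        p_value = lower
    elif alternative == ">":
        p_value = upper
    elif alternative == "!=":
        p_value = 2 * min(lower, upper)
    else:
        raise ValueError('alternative must be "<", ">", or "!="')
    return p_hat, z, p_value


# 414 male drivers among 414 + 120 = 534 crashes; H0: p = 0.75, Ha: p > 0.75
p_hat, z, p_value = one_prop_z_test(414, 414 + 120, 0.75, ">")
alpha = 0.05
print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```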

AP Statistics

Section 12.1

Testing Hypotheses about μ

Name: ________________

Date: _________________

Block: ________

Exercise 1

Standard bracelet size is 7 inches for women and 8 inches for men (according to Reed’s Jewelers). Do these sizes accommodate the average wrist size? In other words, is the average wrist size of all adults less than 7 inches (17.8 cm)?

Obtain a sample from the students in class:

• Using a metric tape measure, determine the size of each student’s wrist to the nearest 0.1 cm.
• Write the wrist sizes on the board.

Does this data provide sufficient evidence to conclude that the average wrist size for all adults is less than 17.8 cm?

A. POPULATION

• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or p)
• Set a reasonable level for α

B. SAMPLE

• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p̂.
• Check that the sample meets the necessary assumptions.

C. STATISTICAL METHOD

• State the test
• Write the formula of the test statistic (using the hypothesized value from H0)
• Compute the value of the test statistic

D. STATISTICAL RESULTS

• Sketch and label the appropriate curve.
• Find the p-value (or critical value).

E. CONCLUSION

• Reject H0 OR Fail to Reject H0
• Make a concluding statement

Solution to T-Test

An example…

Students in an introductory statistics class measured their wrists to the nearest 0.1 cm. The 34 students had an average wrist size of 16.56 cm, with a standard deviation of 1.30 cm. Does the data provide sufficient evidence to conclude that the average wrist size for all adults is less than 17.8 cm?

A. Numerical (wrist size)

μ = mean wrist size of all adults (in cm)

H0: μ = 17.8

Ha: μ < 17.8

B. α = 0.05

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

C. n = 34, X̄ = 16.56, s = 1.30

Assume that this is a random sample of all adults. n ≥ 30, so n is large (and X̄ is approximately normal).

D. T-Test → t = –5.56, p-value = 0.000002

E. Is p-value ≤ α? YES, so we REJECT H0.

The data provides sufficient evidence to conclude that the mean wrist size of all adults is less than 17.8 cm (7 inches).
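The same T-Test can be sketched from the summary statistics alone. This assumes the standard library only; `t_cdf` is a hypothetical helper that integrates the t density with Simpson's rule. With Ha: μ < 17.8, the p-value is the lower tail with n – 1 = 33 degrees of freedom.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


n, x_bar, s, mu0 = 34, 16.56, 1.30, 17.8  # summary statistics and hypothesized mean

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
# Ha: mu < 17.8, so the p-value is the lower tail with n - 1 = 33 df
p_value = t_cdf(t_stat, n - 1)
print(f"t = {t_stat:.2f}, p-value = {p_value:.1e}")
```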

Exercise 2

A nutritionist claims that ready-to-eat breakfast cereal has about 100 calories per ounce, on average. A random sample of twelve ready-to-eat cereals provided the nutritional information in the table below:

Cereal Name                   Calories Per Serving   Serving Size (grams)   Calories Per Ounce
Kellogg's Raisin Bran         190                    59                     91.14
Kellogg's Cocoa Krispies      120                    31
Kellogg's Corn Flakes         100                    28
Post Honey Bunches of Oats    130                    32
Post Shredded Wheat           170                    49
Post Honey Comb               120                    32
Quaker Life                   120                    32
Quaker Puffed Rice            50                     14
General Mills Cheerios        110                    30
General Mills Lucky Charms    110                    27
General Mills Wheaties        100                    27
General Mills Wheat Chex      160                    47

Because 1 ounce = 28.3 grams, we can compute the calories per ounce as follows:

$$\text{Calories Per Ounce} = \frac{\text{Calories Per Serving}}{\text{Serving Size (grams)}} \times \frac{28.3 \text{ grams}}{1 \text{ ounce}}$$

For example, Kellogg’s Raisin Bran has 91.14 calories per ounce:

$$\text{Calories Per Ounce} = \frac{190 \text{ calories}}{59 \text{ grams}} \times \frac{28.3 \text{ grams}}{1 \text{ ounce}} = 91.14$$

1. Compute the calories-per-ounce values for the remaining cereals in the sample. Write your results in the table above.

2. Determine if the sample provides sufficient evidence to contradict the nutritionist’s claim (using steps A – E below).

A. POPULATION

• Determine the population type
• Describe (in words) the population characteristic
• State H0 and Ha (using μ or p)
• Set a reasonable level for α

B. SAMPLE

• Describe the sample:
  o For numerical data, determine n, X̄, and s.
  o For categorical data, determine n and p̂.
• Check that the sample meets the necessary assumptions.

C. STATISTICAL METHOD

• State the test
• Write the formula of the test statistic (using the hypothesized value from H0)
• Compute the value of the test statistic

D. STATISTICAL RESULTS

• Sketch and label the appropriate curve.
• Find the p-value (or critical value).

E. CONCLUSION

• Reject H0 OR Fail to Reject H0
• Make a concluding statement

Solution to Testing Hypotheses About μ

1. Computing Calories Per Ounce

Cereal Name                   Calories Per Ounce
Kellogg's Raisin Bran         91.14
Kellogg's Cocoa Krispies      109.55
Kellogg's Corn Flakes         101.07
Post Honey Bunches of Oats    114.97
Post Shredded Wheat           98.18
Post Honey Comb               106.13
Quaker Life                   106.13
Quaker Puffed Rice            101.07
General Mills Cheerios        103.77
General Mills Lucky Charms    115.30
General Mills Wheaties        104.81
General Mills Wheat Chex      96.34
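The conversion formula can be applied to every cereal in a few lines of Python. This sketch uses `decimal.Decimal` with `ROUND_HALF_UP` so that values ending in exactly 5 (for example, 120 × 28.3 / 32 = 106.125) round up as they would by hand; Python's built-in `round` on floats sends such exact ties to the even digit instead.

```python
from decimal import Decimal, ROUND_HALF_UP

GRAMS_PER_OUNCE = Decimal("28.3")  # the worksheet's conversion: 1 ounce = 28.3 grams

# (calories per serving, serving size in grams), copied from the exercise table
cereals = {
    "Kellogg's Raisin Bran": (190, 59),
    "Kellogg's Cocoa Krispies": (120, 31),
    "Kellogg's Corn Flakes": (100, 28),
    "Post Honey Bunches of Oats": (130, 32),
    "Post Shredded Wheat": (170, 49),
    "Post Honey Comb": (120, 32),
    "Quaker Life": (120, 32),
    "Quaker Puffed Rice": (50, 14),
    "General Mills Cheerios": (110, 30),
    "General Mills Lucky Charms": (110, 27),
    "General Mills Wheaties": (100, 27),
    "General Mills Wheat Chex": (160, 47),
}

per_ounce = {}
for name, (calories, grams) in cereals.items():
    # (calories / grams) * (28.3 grams / 1 ounce), in decimal arithmetic
    value = Decimal(calories) * GRAMS_PER_OUNCE / Decimal(grams)
    # Round half up to 2 decimal places, as when rounding by hand
    per_ounce[name] = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    print(f"{name:<28} {per_ounce[name]}")
```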

2. Hypothesis Test

A. Numerical (calories per ounce)

μ = mean calories per ounce for all ready-to-eat cereals

H0: μ = 100

Ha: μ ≠ 100

B. α = 0.05

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

C. n = 12, X̄ = 104.04, s = 7.16

This is a random sample of all ready-to-eat cereals. n is small (< 30), so check a normal probability plot of the sample.

[Figure: Probability Plot of Calories Per Ounce (Normal); horizontal axis: Calories Per Ounce; Mean = 104.0, StDev = 7.158, N = 12, AD = 0.184, P-Value = 0.886]

The normal probability plot is approximately a straight line, so the distribution of the population is approximately normal (and X̄ is normal).

D. $$t = \frac{104.04 - 100}{7.16/\sqrt{12}} = 1.95$$

p-value = P(t with 11 df > 1.95 OR < –1.95) = 0.077

E. Is p-value ≤ α? NO, so we FAIL TO REJECT H0.

The data does NOT provide sufficient evidence to conclude that the mean number of calories per ounce for all ready-to-eat cereals differs from 100. (Not sufficient evidence to contradict the nutritionist’s claim.)
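The whole test can be reproduced from the table data. In this sketch (standard library only), `statistics.stdev` gives the sample standard deviation s (divisor n – 1), the two-sided p-value doubles the tail area beyond |t|, and `t_cdf` is a hypothetical helper that integrates the t density with Simpson's rule. Working from unrounded calories-per-ounce values gives t ≈ 1.95 and a p-value near 0.077, in line with the results above.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# (calories per serving, serving size in grams) for the 12 cereals in the sample
servings = [(190, 59), (120, 31), (100, 28), (130, 32), (170, 49), (120, 32),
            (120, 32), (50, 14), (110, 30), (110, 27), (100, 27), (160, 47)]
values = [cal * 28.3 / grams for cal, grams in servings]  # calories per ounce

n, x_bar, s, mu0 = len(values), mean(values), stdev(values), 100
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
# Ha: mu != 100 is two-sided, so double the tail area beyond |t|
p_value = 2 * t_cdf(-abs(t_stat), n - 1)
print(f"n = {n}, x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```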

AP Statistics Hypothesis Testing for Proportions FA11

Homework: Be prepared to present for a grade on Monday.

1. Got milk? In November 2001, the Ag Globe Trotter newsletter reported that 90% of adults drink milk. A regional farmers’ organization planning a new marketing campaign across its multi-county area polls a random sample of 750 adults living there. In this sample, 657 people said that they drink milk. Do these responses provide strong evidence that the 90% figure is not accurate for this region?

Correct the mistakes you find below in a student’s attempt to test an appropriate hypothesis:

: ˆ  0.9

H

A

: p  0.9

SRS

657

750

.876;

 

(0.88)(0.12)

0.012

750 z

  

2

0.012

 

(

 

There is more than a 97% chance that the stated percentage is correct for this region.

Choose two (3) of the following to complete. Each requires the use of a hypothesis test for proportions. Show all work, using SCAD as your guide.

2. A magazine is considering the launch of an online edition. The magazine plans to go ahead only if it’s convinced that more than 25% of current readers would subscribe. The magazine contacts a simple random sample of 500 current subscribers, and 137 of those surveyed expressed interest. What should the company do?

3. Census data for a certain county shows that 19% of the adult residents are Hispanic. Suppose 72 people are called for jury duty, and only 9 of them are Hispanic. Does this apparent underrepresentation of Hispanics call into question the fairness of the jury selection system?

4. A start-up company is about to market a new computer printer. It decides to gamble by running commercials during the Super Bowl. The company hopes that name recognition will be worth the high cost of the ads. The goal of the company is that over 40% of the public recognize its brand name and associate it with computer equipment. The day after the game, a pollster contacts 420 randomly chosen adults and finds that 181 of them know that this company manufactures printers. Would you recommend that the company continue to advertise during Super Bowls?

5. Some people are concerned that new tougher standards and high-stakes tests adopted in many states may drive up the high school dropout rate. The National Center for Education Statistics reported that the high school dropout rate for the year 2000 was 10.9%. One school district, whose dropout rate has always been very close to the national average, reports that 210 of their 1782 students dropped out last year. Is their experience evidence that the dropout rate may be increasing?

6. The National Center for Education Statistics monitors many aspects of elementary and secondary education nationwide. Their 1996 numbers are often used as a baseline to assess changes. In 1996, 34% of students had not been absent from school even once during the previous month. In the 2000 survey, responses from 8302 students showed that this figure had slipped to 33%. Officials would, of course, be concerned if student attendance were declining. Do these figures give evidence of a change in student attendance?
