Section 10.1 Goodness of Fit


Section 10.1 Objectives

• Use the chi-square distribution to test whether a frequency distribution fits a claimed distribution

“chi” - kie (rhymes with “eye”) [see page 330]

- a distribution

- area under curve = 1

- all values of χ² are greater than or equal to zero

- positively skewed

df = n -1
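The properties listed above can be checked numerically. The sketch below is an illustration only and is not part of the original slides: it assumes the standard chi-square density formula, picks df = 4 as an arbitrary example, and uses hypothetical helper names (`chi_square_pdf`, `summarize`).

```python
import math

def chi_square_pdf(x, df):
    """Standard chi-square density; this sketch assumes df >= 2."""
    if x < 0:
        return 0.0  # the distribution has no probability below zero
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def summarize(df, upper=60.0, steps=60_000):
    """Approximate the area, mean, and peak location with a simple Riemann sum."""
    width = upper / steps
    area = 0.0
    mean = 0.0
    peak_x, peak_y = 0.0, 0.0
    for i in range(steps + 1):
        x = i * width
        y = chi_square_pdf(x, df)
        area += y * width
        mean += x * y * width
        if y > peak_y:
            peak_x, peak_y = x, y
    return area, mean, peak_x

area, mean, peak = summarize(df=4)
print(f"area = {area:.3f}, mean = {mean:.3f}, peak at x = {peak:.3f}")
```

For df = 4 the area should come out close to 1, and the peak should sit to the left of the mean, which is what positive skew looks like.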

Multinomial Experiments

Multinomial experiment

• A probability experiment consisting of a fixed number of independent trials in which there are more than two possible outcomes for each trial.
• The probability for each outcome is fixed and each outcome is classified into categories.
• Recall that a binomial experiment had only two possible outcomes.


Books on a shelf

Type of book    Share
Fiction         60%
Non-fiction     30%
Reference       10%

3 categories – type of books

If distribution of books is “true” and one had a library of 50 books, how many books of fiction would there be?

60% of 50 is 30 (50*.60= 30)

Non-fiction? 50*.30 = 15

Reference? 50*.10 = 5



These are expected values: the counts we would expect in a sample of size n if the assumed distribution is true.

Expected value = sample size * assumed probability

E = n*p
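The rule E = n*p can be sketched in a few lines of Python. The function name `expected_counts` and the dictionary layout are illustrative assumptions; the percentages and the sample size of 50 come from the books example.

```python
def expected_counts(n, probabilities):
    """Return {category: n * p} for an assumed distribution."""
    return {category: n * p for category, p in probabilities.items()}

# Assumed distribution of books on the shelf (from the example above)
shelf = {"Fiction": 0.60, "Non-fiction": 0.30, "Reference": 0.10}

expected = expected_counts(50, shelf)
for category, count in expected.items():
    print(f"{category}: {count:.0f}")
```

The expected counts add up to the sample size because the assumed probabilities add up to 1.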


• A tax preparation company wants to determine the proportions of people who used different methods to prepare their taxes.
• The company can perform a multinomial experiment.
• It wants to test a previous survey’s claim concerning the distribution of proportions of people who use different methods to prepare their taxes.
• It can compare the distribution of proportions obtained in the multinomial experiment with the previous survey’s specified distribution.
• It can perform a chi-square goodness-of-fit test.

Chi-Square Goodness-of-Fit Test


• Used to test whether a frequency distribution fits an expected distribution.
• The null hypothesis states that the frequency distribution fits the specified distribution.
• The alternative hypothesis states that the frequency distribution does not fit the specified distribution.


• Results of a survey of tax preparation methods.

Distribution of tax preparation methods

Accountant                25%
By hand                   20%
Computer software         35%
Friend/family              5%
Tax preparation service   15%

Each outcome is classified into categories. The probability for each possible outcome is fixed.


• To test the previous survey’s claim, a company can perform a chi-square goodness-of-fit test using the following hypotheses.

H0: The distribution of tax preparation methods is 25% by accountant, 20% by hand, 35% by computer software, 5% by friend or family, and 15% by tax preparation service. (claim)

Ha: The distribution of tax preparation methods differs from the claimed or expected distribution.


• To calculate the test statistic for the chi-square goodness-of-fit test, the observed frequencies and the expected frequencies are used.
• The observed frequency O of a category is the frequency for the category observed in the sample data.
• The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained assuming the specified (or hypothesized) distribution. The expected frequency for the ith category is

$$E_i = n p_i$$

where n is the number of trials (the sample size) and $p_i$ is the assumed probability of the ith category.

Example: Finding Observed and Expected Frequencies

A tax preparation company randomly selects 300 adults and asks them how they prepare their taxes. The results are shown below. Find the observed frequency and the expected frequency for each tax preparation method.

Survey results (n = 300)

Accountant                 71
By hand                    40
Computer software         101
Friend/family              35
Tax preparation service    53

Solution: Finding Observed and Expected Frequencies

Observed frequency: The number of adults in the survey naming a particular tax preparation method.

Expected frequency: $E_i = n p_i$

Tax preparation method     % of people   Observed frequency   Expected frequency
Accountant                 25%           71                   300(0.25) = 75
By hand                    20%           40                   300(0.20) = 60
Computer software          35%           101                  300(0.35) = 105
Friend/family              5%            35                   300(0.05) = 15
Tax preparation service    15%           53                   300(0.15) = 45
                                         n = 300


For the chi-square goodness-of-fit test to be used, the following must be true.

1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.


• If these conditions are satisfied, then the sampling distribution for the goodness-of-fit test is approximated by a chi-square distribution with k – 1 degrees of freedom, where k is the number of categories.
• The test statistic for the chi-square goodness-of-fit test is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O represents the observed frequency of each category and E represents the expected frequency of each category.
• The test is always a right-tailed test.
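As a minimal sketch, the test statistic and its degrees of freedom can be computed directly from the two lists of frequencies. The function name `chi_square_statistic` is an illustrative assumption. The check on expected frequencies mirrors the second condition stated earlier; the random-sample condition cannot be verified in code. The numbers are the observed and expected frequencies from the tax preparation survey.

```python
def chi_square_statistic(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency must be at least 5")
    # Sum (O - E)^2 / E over all categories
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1  # d.f. = k - 1

# Observed and expected frequencies from the tax preparation survey (n = 300)
observed = [71, 40, 101, 35, 53]
expected = [75, 60, 105, 15, 45]

statistic, df = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}, d.f. = {df}")
```

Large gaps between O and E push the statistic up, which is why only large values count as evidence against the claimed distribution.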

Chi-Square Goodness-of-Fit Test

1. Identify the claim. State the null and alternative hypotheses.
   In symbols: State H0 and Ha.
2. Specify the level of significance.
   In symbols: Identify α.
3. Identify the degrees of freedom.
   In symbols: d.f. = k – 1
4. Determine the critical value.
   In symbols: Use Table 6 in Appendix B.
5. Determine the rejection region.
6. Calculate the test statistic.
   In symbols: $\chi^2 = \sum \frac{(O - E)^2}{E}$
7. Make a decision to reject or fail to reject the null hypothesis.
   In symbols: If χ² is in the rejection region, reject H0. Otherwise, fail to reject H0.
8. Interpret the decision in the context of the original claim.
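The steps above can be sketched as one small function. This is an illustration under stated assumptions, not a definitive implementation: the name `goodness_of_fit_test` and the returned dictionary are hypothetical, and the critical value is passed in by hand because it comes from a chi-square table. The value 13.277 used in the call is the standard table value for d.f. = 4 and α = 0.01, supplied here as an assumption; the counts and claimed percentages are the tax preparation figures.

```python
def goodness_of_fit_test(observed, claimed_probabilities, critical_value):
    """Run a right-tailed chi-square goodness-of-fit test."""
    n = sum(observed)                                   # sample size
    expected = [n * p for p in claimed_probabilities]   # E = n * p
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency must be at least 5")
    df = len(observed) - 1                              # d.f. = k - 1
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Right-tailed test: reject H0 only when the statistic exceeds the critical value
    return {
        "df": df,
        "statistic": statistic,
        "reject_null": statistic > critical_value,
    }

result = goodness_of_fit_test(
    observed=[71, 40, 101, 35, 53],
    claimed_probabilities=[0.25, 0.20, 0.35, 0.05, 0.15],
    critical_value=13.277,  # assumed table value for d.f. = 4, alpha = 0.01
)
print(result["df"], round(result["statistic"], 3), result["reject_null"])
```

Interpreting the decision in context (step 8) is left to the reader, since it depends on which hypothesis carries the claim.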

Example: Performing a Goodness of Fit Test

Use the tax preparation method data to perform a chi-square goodness-of-fit test to test whether the distributions are different. Use α = 0.01.

Tax preparation method     Claimed distribution   Survey results (n = 300)
Accountant                 25%                    71
By hand                    20%                    40
Computer software          35%                    101
Friend/family              5%                     35
Tax preparation service    15%                    53

Solution: Performing a Goodness of Fit Test

• H0: The distribution is 25% by accountant, 20% by hand, 35% by computer software, 5% by friend/family, and 15% by tax preparation service. (Claim)
• Ha: The distribution of tax preparation methods differs from the claimed or expected distribution.
• α = 0.01
• d.f. = 5 – 1 = 4
• Rejection region: χ² > 13.277 (the critical value for d.f. = 4 and α = 0.01)

Solution: Performing a Goodness of Fit Test

Method                     Observed (O)   Expected (E)   O - E   (O - E)²   (O - E)²/E
Accountant                 71             75             -4      16         0.213
By hand                    40             60             -20     400        6.667
Computer software          101            105            -4      16         0.152
Friend/family              35             15             20      400        26.667
Tax preparation service    53             45             8       64         1.422

So compute chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.213 + 6.667 + 0.152 + 26.667 + 1.422 = 35.121$$
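The arithmetic in this table can be reproduced with a short script, which also shows which category drives the result. The variable names are illustrative assumptions; the observed and expected frequencies are the ones used in the solution.

```python
# (method, observed O, expected E) from the tax preparation example
rows = [
    ("Accountant", 71, 75),
    ("By hand", 40, 60),
    ("Computer software", 101, 105),
    ("Friend/family", 35, 15),
    ("Tax preparation service", 53, 45),
]

contributions = {}
for method, o, e in rows:
    contributions[method] = (o - e) ** 2 / e  # this category's share of chi-square
    print(f"{method:<24}{o - e:>5}{(o - e) ** 2:>6}{contributions[method]:>9.3f}")

total = sum(contributions.values())
largest = max(contributions, key=contributions.get)
print(f"chi-square = {total:.3f}; largest contribution: {largest}")
```

Most of the statistic comes from a single category, so a quick look at the per-category terms tells you where the sample departs from the claimed distribution.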


• Test statistic: χ² ≈ 35.121
• Decision: Because χ² ≈ 35.121 exceeds the critical value 13.277 for d.f. = 4 and α = 0.01, it lies in the rejection region. Reject H0.
• Conclusion: There is enough evidence at the 1% significance level to conclude that the distribution of tax preparation methods differs from the previous survey’s claimed or expected distribution.


A researcher claims that the number of different-colored candies in bags of dark chocolate M&M’s is uniformly distributed. To test this claim, you randomly select a bag that contains 500 dark chocolate M&M’s. The results are shown in the table below. Using α = 0.10, perform a chi-square goodness-of-fit test to test the claimed or expected distribution. What can you conclude?


Color     Frequency
Brown     80
Yellow    95
Red       88
Blue      83
Orange    76
Green     78
          n = 500

Solution:

• The claim is that the distribution is uniform, so the expected frequencies of the colors are equal.
• To find each expected frequency, divide the sample size by the number of colors.
• E = 500/6 ≈ 83.3


• H0: Distribution of different-colored candies in bags of dark chocolate M&M’s is uniform. (Claim)
• Ha: Distribution of different-colored candies in bags of dark chocolate M&M’s is not uniform.
• α = 0.10
• d.f. = 6 – 1 = 5
• Rejection region: χ² > 9.236 (the area to the right of 9.236 under the χ² curve is 0.10)


Color     Observed frequency   Expected frequency   O - E    (O - E)²   (O - E)²/E
Brown     80                   83.33                -3.33    11.0889    0.133
Yellow    95                   83.33                11.67    136.189    1.634
Red       88                   83.33                4.67     21.809     0.262
Blue      83                   83.33                -0.33    0.1089     0.001
Orange    76                   83.33                -7.33    53.729     0.645
Green     78                   83.33                -5.33    28.409     0.341

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.133 + 1.634 + 0.262 + 0.001 + 0.645 + 0.341 = 3.016$$
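For a uniform claim, every category shares the same expected frequency, so the calculation shortens considerably. The sketch below uses the color counts from the example and the critical value 9.236 shown for the rejection region; the variable names are illustrative assumptions. It keeps the unrounded expected frequency 500/6 rather than 83.33.

```python
# Observed color counts from the bag of 500 dark chocolate M&M's
observed = {"Brown": 80, "Yellow": 95, "Red": 88,
            "Blue": 83, "Orange": 76, "Green": 78}

n = sum(observed.values())
expected = n / len(observed)  # uniform claim: the same E for every color

statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

critical_value = 9.236  # from the rejection region for d.f. = 5, alpha = 0.10
reject_null = statistic > critical_value

print(f"E = {expected:.2f}, chi-square = {statistic:.3f}, d.f. = {df}")
print("reject H0" if reject_null else "fail to reject H0")
```

Using the unrounded expected frequency gives the same statistic to three decimal places as the table, so the rounding to 83.33 does not affect the decision.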


• Test statistic: χ² ≈ 3.016
• Decision: Because χ² ≈ 3.016 is less than the critical value 9.236, it is not in the rejection region. Fail to reject H0.
• Conclusion: There is not enough evidence at the 10% level of significance to reject the claim that the distribution is uniform.

Section 10.1 Summary

• Used the chi-square distribution to test whether a frequency distribution fits a claimed distribution
