Ch12-Notes

MGT 2120: Chapter 12

Tests of Goodness of Fit and Independence

§12.1 Goodness of Fit: A Multinomial Population

A population in which each observation takes exactly one of a few possible values (or categories) is called a multinomial population.

Examples of multinomial populations:

• Grade students receive in a class (A, B, C, D, or F)

• Opinion about an issue (Favor, Oppose, Don’t know)

• Member of a political party (D, R, or Independent)

A goodness of fit test checks whether a given set of probabilities for the possible outcomes is valid. In other words, it tests whether a given probability distribution for a multinomial population is valid.

$H_0$: The given probability distribution is valid (state the values for $p_1, p_2, \ldots, p_k$)

$H_a$: The given probability distribution is not valid

Random sample:

n = sample size

k = number of categories

$f_i$ = observed frequency for category i in the sample data, with $\sum f_i = n$

$e_i$ = expected frequency for category i in the sample data, $e_i = n \times p_i$

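| Population value | Probability as in $H_0$ | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| Value 1 | $p_1$ | $f_1$ | $e_1 = n \times p_1$ | $(f_1 - e_1)^2 / e_1$ |
| Value 2 | $p_2$ | $f_2$ | $e_2 = n \times p_2$ | $(f_2 - e_2)^2 / e_2$ |
| …. | …. | …. | …. | …. |
| Value k | $p_k$ | $f_k$ | $e_k = n \times p_k$ | $(f_k - e_k)^2 / e_k$ |
| Total | $\sum p_i = 1$ | $n = \sum f_i$ | | $\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$ |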

NOTE: All expected frequencies ($e_i$) must be at least 5 for the test to be valid.

Test statistic: $\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

Then, p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df), where df = k - 1
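The multinomial goodness of fit calculation can be sketched in Python as a cross-check on the spreadsheet. This is a minimal illustration, not part of the original notes: the function names and the sample counts are hypothetical, and `chi_square_right_tail` is a hand-written stand-in for CHISQ.DIST.RT that assumes df is a positive whole number.

```python
import math

def chi_square_statistic(observed, probabilities):
    """Return (chi2_calc, expected, df) for a multinomial goodness of fit test."""
    n = sum(observed)                              # sample size: n = sum of f_i
    expected = [n * p for p in probabilities]      # e_i = n x p_i
    chi2_calc = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1                         # df = k - 1
    return chi2_calc, expected, df

def chi_square_right_tail(x, df):
    """Right-tail chi-square probability for whole-number df (role of CHISQ.DIST.RT)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)                # x^(r-1) / (1*3*...*(2r-1))
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)

# Hypothetical sample: opinions (Favor, Oppose, Don't know) from 100 people,
# tested against H0: p = 0.5, 0.3, 0.2. These numbers are made up.
observed = [48, 35, 17]
h0_probabilities = [0.5, 0.3, 0.2]

chi2_calc, expected, df = chi_square_statistic(observed, h0_probabilities)
p_value = chi_square_right_tail(chi2_calc, df)
print("expected frequencies:", expected)
print("chi-square:", chi2_calc, "df:", df, "p-value:", p_value)
```

A large p-value here would mean the sample gives no reason to reject the probabilities stated in the null hypothesis.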

§12.2 Test of Independence

The $\chi^2$ test is used to test whether or not two variables are independent.

Examples of tests of independence:

• Is wage rate independent of gender?

• Is smoking independent of age group?

• Is the choice of major independent of race?

$H_0$: The two variables are independent

$H_a$: The two variables are NOT independent

Prepare a two-way table, called a contingency table, for the two variables.

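Observed frequencies ($f_{ij}$ = observed frequency for row i and column j in the sample data):

| Variable 1 values (rows) | Variable 2: Value 1 | Value 2 | …. | Value m | Row total |
|---|---|---|---|---|---|
| Value 1 | $f_{11}$ | $f_{12}$ | …. | $f_{1m}$ | $\sum_j f_{1j} = T_{1.}$ |
| Value 2 | $f_{21}$ | $f_{22}$ | …. | $f_{2m}$ | $\sum_j f_{2j} = T_{2.}$ |
| …. | …. | …. | …. | …. | …. |
| Value n | $f_{n1}$ | $f_{n2}$ | …. | $f_{nm}$ | $\sum_j f_{nj} = T_{n.}$ |
| Column total | $\sum_i f_{i1} = T_{.1}$ | $\sum_i f_{i2} = T_{.2}$ | …. | $\sum_i f_{im} = T_{.m}$ | $\sum_i T_{i.} = T$ |

n = number of categories of the row variable, i.e. number of rows

m = number of categories of the column variable, i.e. number of columns

$e_{ij} = \frac{T_{i.} \times T_{.j}}{T}$, where $T_{i.}$ = sum of row i; $T_{.j}$ = sum of column j; and T = grand sum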

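Expected frequencies ($e_{ij}$ = expected frequency for row i and column j):

| Variable 1 values (rows) | Variable 2: Value 1 | Value 2 | …. | Value m | Row total |
|---|---|---|---|---|---|
| Value 1 | $e_{11} = (T_{1.} \times T_{.1})/T$ | $e_{12} = (T_{1.} \times T_{.2})/T$ | …. | $e_{1m} = (T_{1.} \times T_{.m})/T$ | $\sum_j e_{1j} = T_{1.}$ |
| Value 2 | $e_{21} = (T_{2.} \times T_{.1})/T$ | $e_{22} = (T_{2.} \times T_{.2})/T$ | …. | $e_{2m} = (T_{2.} \times T_{.m})/T$ | $\sum_j e_{2j} = T_{2.}$ |
| …. | …. | …. | …. | …. | …. |
| Value n | $e_{n1} = (T_{n.} \times T_{.1})/T$ | $e_{n2} = (T_{n.} \times T_{.2})/T$ | …. | $e_{nm} = (T_{n.} \times T_{.m})/T$ | $\sum_j e_{nj} = T_{n.}$ |
| Column total | $\sum_i e_{i1} = T_{.1}$ | $\sum_i e_{i2} = T_{.2}$ | …. | $\sum_i e_{im} = T_{.m}$ | T |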

NOTE: All expected frequencies ($e_{ij}$) must be at least 5 for the test to be valid.

Test statistic: $\chi^2_{Calc} = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$, where df = (n - 1)(m - 1)

p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df)
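As a rough illustration of the contingency-table arithmetic, the sketch below computes the row totals, column totals, expected frequencies, test statistic, and degrees of freedom in Python. The function name `independence_test` and the smoking-by-age counts are hypothetical and exist only for this example; the p-value would still come from CHISQ.DIST.RT with the resulting statistic and df.

```python
def independence_test(table):
    """table[i][j] = observed frequency f_ij for row i and column j."""
    row_totals = [sum(row) for row in table]            # T_i. (sum of row i)
    col_totals = [sum(col) for col in zip(*table)]      # T_.j (sum of column j)
    grand_total = sum(row_totals)                       # T (grand sum)

    # e_ij = (T_i. * T_.j) / T
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (f_ij - e_ij)^2 / e_ij over every cell of the table
    chi2_calc = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(table, expected)
        for f, e in zip(f_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (n - 1)(m - 1)
    return chi2_calc, expected, df

# Hypothetical counts: rows = smoker (yes, no), columns = three age groups.
observed_table = [
    [20, 30, 50],
    [30, 30, 40],
]

chi2_calc, expected, df = independence_test(observed_table)
print("expected frequencies:", expected)
print("chi-square:", chi2_calc, "df:", df)
```

Each row of `expected` sums to the same row total as the observed table, which is a quick way to catch an arithmetic slip.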

§12.3 Goodness of Fit for Poisson and Normal distributions

$\chi^2$ test procedure for Poisson distribution:

$H_0$: The population follows a Poisson distribution

$H_a$: The population does NOT follow a Poisson distribution

Example:

$H_0$: The number of customers entering the store during 5-minute intervals has a Poisson probability distribution

$H_a$: The number of customers entering the store during 5-minute intervals does not have a Poisson distribution

Take a random sample, compute the sample mean $\bar{x}$, and use it as the estimate for the mean of the Poisson distribution.

Mean = SUMPRODUCT(Range1, Range2)/SUM(Range2)

Range1 = cell range that contains the X-values

Range2 = cell range that contains the frequencies

The Poisson probability can be determined using the Excel function POISSON.DIST(X, Mean, 0). The final argument 0 (FALSE) requests the probability of exactly X rather than the cumulative probability.

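Table of observed and expected frequencies:

| X | Poisson probability with $\bar{x}$ as the mean | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| 0 | $p_1$ | $f_1$ | $e_1 = n \times p_1$ | $(f_1 - e_1)^2 / e_1$ |
| 1 | $p_2$ | $f_2$ | $e_2 = n \times p_2$ | $(f_2 - e_2)^2 / e_2$ |
| …. | …. | …. | …. | …. |
| (last value) | $p_k$ | $f_k$ | $e_k = n \times p_k$ | $(f_k - e_k)^2 / e_k$ |
| Total | $\sum p_i = 1$ | $n = \sum f_i$ | | $\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$ |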
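The Poisson goodness of fit calculation in this section can be sketched in Python as follows. The data and the function name `poisson_goodness_of_fit` are hypothetical. The sketch also makes one assumption that the notes do not spell out: the last X value is treated as "that value or more", so that the probabilities add up to 1 as the table requires. It only reports the smallest expected frequency; combining rows is left to the reader.

```python
import math

def poisson_goodness_of_fit(x_values, frequencies):
    """Return (mean, expected, chi2_calc, df); the last x is an open-ended class."""
    n = sum(frequencies)                                    # n = sum of f_i
    # Same calculation as SUMPRODUCT(Range1, Range2)/SUM(Range2)
    mean = sum(x * f for x, f in zip(x_values, frequencies)) / n

    # Same role as POISSON.DIST(x, mean, 0) for every class except the last
    probs = [math.exp(-mean) * mean ** x / math.factorial(x)
             for x in x_values[:-1]]
    probs.append(1 - sum(probs))           # assumption: last class is "x or more"

    expected = [n * p for p in probs]                       # e_i = n x p_i
    chi2_calc = sum((f - e) ** 2 / e for f, e in zip(frequencies, expected))
    df = len(x_values) - 2                 # k - 2: the mean was estimated
    return mean, expected, chi2_calc, df

# Hypothetical sample: number of arrivals in 100 five-minute intervals.
# The final class (4) stands for "4 or more" arrivals.
x_values = [0, 1, 2, 3, 4]
frequencies = [14, 27, 27, 18, 14]

mean, expected, chi2_calc, df = poisson_goodness_of_fit(x_values, frequencies)
print("estimated mean:", mean)
print("expected frequencies:", expected)
print("smallest expected frequency:", min(expected))
print("chi-square:", chi2_calc, "df:", df)
```

The statistic and df can then be passed to CHISQ.DIST.RT to obtain the p-value.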

Make sure all expected frequencies ($e_i$) are at least 5. If not, combine rows.

The p-value for this test is given by CHISQ.DIST.RT($\chi^2_{Calc}$, df), where df = k - 2. One more degree of freedom is lost than in §12.1 because the mean is estimated from the sample.

$\chi^2$ test procedure for Normal distribution:

Step 1: Set up the hypotheses:

$H_0$: The random variable (population) follows a Normal distribution

$H_a$: The random variable (population) does NOT follow a Normal distribution

Step 2: Take a random sample of size n, and use:

• Sample mean $\bar{x}$ = estimate for the population mean

• Sample standard deviation S = estimate for the population standard deviation

Step 3: Set up the table of observed frequencies and expected frequencies.

a. Find the number of classes, k = n/5 (round down).

b. Determine the probability per class interval (1/k).

c. Set up a column of cumulative probabilities for the class intervals (the cumulative probability for the last class must be = 1).

d. Using the NORM.INV(Cumulative probability, $\bar{x}$, S) function, determine the upper limit for each class. The last class, with cumulative probability 1, has no finite upper limit, because NORM.INV requires a probability strictly between 0 and 1.

Step 4:

a. Using the array function FREQUENCY, determine the observed frequencies (check that the sum = n).

b. Compute the expected frequency $e_i$ = n/k for all classes.

c. Calculate the test statistic: $\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

d. Find the p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df), with df = k - 3 (two parameters, the mean and the standard deviation, are estimated from the sample).

e. Make a conclusion.

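| Cumulative probability | Upper class limit | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| 1/k | NORM.INV(1/k, $\bar{x}$, S) | $f_1$ | $e_1$ = n/k | $(f_1 - e_1)^2 / e_1$ |
| 2/k | NORM.INV(2/k, $\bar{x}$, S) | $f_2$ | $e_2$ = n/k | $(f_2 - e_2)^2 / e_2$ |
| …. | …. | …. | …. | …. |
| k/k = 1 | none (open-ended last class) | $f_k$ | $e_k$ = n/k | $(f_k - e_k)^2 / e_k$ |
| Total | | $n = \sum f_i$ (use the array function FREQUENCY) | | $\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$ |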
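Steps 1 to 4 of the Normal procedure can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the 30 data values are made up, the function name `normal_goodness_of_fit` is hypothetical, and `NormalDist.inv_cdf` plays the role of NORM.INV. The last class is left open-ended, so only k - 1 upper limits are computed.

```python
from statistics import NormalDist, mean, stdev

def normal_goodness_of_fit(data):
    """Return (chi2_calc, observed, limits, df) using equal-probability classes."""
    n = len(data)
    k = n // 5                               # number of classes: n/5, rounded down
    x_bar = mean(data)                       # estimate of the population mean
    s = stdev(data)                          # sample standard deviation S
    fitted = NormalDist(x_bar, s)

    # Upper limits for cumulative probabilities 1/k, 2/k, ..., (k-1)/k.
    # The last class has no finite upper limit.
    limits = [fitted.inv_cdf(i / k) for i in range(1, k)]

    # Count observations per class; a value equal to a limit stays in the
    # lower class, which matches how Excel's FREQUENCY treats bin edges.
    observed = [0] * k
    for x in data:
        class_index = sum(1 for upper in limits if x > upper)
        observed[class_index] += 1

    expected = n / k                         # e_i = n/k for every class
    chi2_calc = sum((f - expected) ** 2 / expected for f in observed)
    df = k - 3                               # mean and std. deviation estimated
    return chi2_calc, observed, limits, df

# Hypothetical sample of 30 test scores (made up for this sketch).
scores = [52, 55, 58, 60, 61, 63, 64, 65, 66, 67,
          68, 69, 70, 70, 71, 72, 72, 73, 74, 75,
          76, 77, 78, 80, 81, 83, 85, 87, 90, 94]

chi2_calc, observed, limits, df = normal_goodness_of_fit(scores)
print("upper class limits:", limits)
print("observed frequencies:", observed)
print("chi-square:", chi2_calc, "df:", df)
```

With fewer than 20 observations this approach gives k < 4 and therefore df < 1, so the sketch is only meaningful for samples of at least 20 values.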
