Ch12-Notes


MGT 2120: Chapter 12

Tests of Goodness of Fit and Independence

§12.1 Goodness of Fit: A Multinomial Population

A population in which each observation takes exactly one of a small number of possible values (categories) is called a multinomial population.

Examples of multinomial populations:

Grade students receive in a class (A, B, C, D, or F)

Opinion about an issue (Favor, Oppose, Don't know)

Member of a political party (D, R, or Independent)

A goodness of fit test checks whether a hypothesized set of probabilities for the possible outcomes is valid. In other words, it tests whether a given probability distribution for a multinomial population is consistent with the sample data.

$H_0$: The given probability distribution is valid (state the hypothesized values of $p_1, p_2, \ldots, p_k$)

$H_a$: The given probability distribution is not valid

Random sample notation:

n = sample size

k = number of categories

$f_i$ = observed frequency for category i in the sample data, so that $\sum_{i=1}^{k} f_i = n$

$e_i$ = expected frequency for category i if $H_0$ is true: $e_i = n \times p_i$

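| Population value | Probability as in $H_0$ | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| Value 1 | $p_1$ | $f_1$ | $e_1 = n \times p_1$ | $(f_1 - e_1)^2/e_1$ |
| Value 2 | $p_2$ | $f_2$ | $e_2 = n \times p_2$ | $(f_2 - e_2)^2/e_2$ |
| … | … | … | … | … |
| Value k | $p_k$ | $f_k$ | $e_k = n \times p_k$ | $(f_k - e_k)^2/e_k$ |
| Total | $\sum p_i = 1$ | $n = \sum f_i$ | | $\chi^2_{Calc} = \sum_{i=1}^{k} (f_i - e_i)^2/e_i$ |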

NOTE: All expected frequencies ($e_i$) must be at least 5 for the test to be valid.
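The expected-frequency formula $e_i = n \times p_i$ and the validity check can be sketched in Python as follows. This is a minimal illustration, not part of the course material: the function names and the poll numbers are hypothetical, and the check assumes the common "5 or more" rule of thumb.

```python
def expected_frequencies(n, probs):
    """Return e_i = n * p_i for every category under H0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("The hypothesized probabilities must sum to 1.")
    return [n * p for p in probs]


def expected_counts_ok(expected, minimum=5.0):
    """True when every expected frequency meets the minimum (assumed 5)."""
    return all(e >= minimum for e in expected)


# Hypothetical H0 for an opinion poll: Favor 30%, Oppose 50%, Don't know 20%.
e = expected_frequencies(200, [0.30, 0.50, 0.20])
print(e)
print(expected_counts_ok(e))
```

If any category fails the check, the usual remedy is to combine it with a neighboring category before computing the test statistic.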

Test statistic:

$$\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

Then, p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df), where df = k - 1. Reject $H_0$ if the p-value ≤ α (the level of significance).
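A Python sketch of the whole multinomial test is shown below. To keep it free of third-party packages, Excel's CHISQ.DIST.RT is replaced by a hand-written `chi2_sf` (an assumption of this sketch); it evaluates the chi-square right-tail probability through the power series of the regularized incomplete gamma function, which is adequate for the moderate statistic values in these notes. If SciPy is installed, `scipy.stats.chi2.sf(x, df)` gives the same quantity. The observed counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square probability, the value CHISQ.DIST.RT(x, df) reports.

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated with its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def multinomial_gof(observed, probs):
    """Return (chi2_calc, df, p_value) for a multinomial goodness of fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]                  # e_i = n * p_i
    chi2_calc = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1                             # df = k - 1
    return chi2_calc, df, chi2_sf(chi2_calc, df)


# Hypothetical sample of n = 200 opinions tested against p = (0.30, 0.50, 0.20).
stat, df, p_value = multinomial_gof([48, 98, 54], [0.30, 0.50, 0.20])
print(f"chi2 = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

Here the statistic is 7.34 with df = 2 and the p-value is about 0.025, so at α = 0.05 this hypothetical sample would lead to rejecting $H_0$.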

§12.2 Test of Independence

The $\chi^2$ test of independence is used to test whether or not two categorical variables are independent.

Examples of tests of independence:

Is wage rate independent of gender?

Is smoking independent of age group?

Is the choice of major independent of race?

$H_0$: The two variables are independent

$H_a$: The two variables are NOT independent

Summarize the sample data in a two-way table, called a contingency table, for the two variables.

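Observed frequencies ($f_{ij}$ = observed frequency for row i and column j in the sample data):

| Variable 1 values \ Variable 2 values | Value 1 | Value 2 | … | Value m | Row total |
|---|---|---|---|---|---|
| Value 1 | $f_{11}$ | $f_{12}$ | … | $f_{1m}$ | $T_{1.} = \sum_j f_{1j}$ |
| Value 2 | $f_{21}$ | $f_{22}$ | … | $f_{2m}$ | $T_{2.} = \sum_j f_{2j}$ |
| … | … | … | … | … | … |
| Value n | $f_{n1}$ | $f_{n2}$ | … | $f_{nm}$ | $T_{n.} = \sum_j f_{nj}$ |
| Column total | $T_{.1} = \sum_i f_{i1}$ | $T_{.2} = \sum_i f_{i2}$ | … | $T_{.m} = \sum_i f_{im}$ | $T = \sum_i T_{i.}$ |

n = number of categories of the row variable, i.e. number of rows

m = number of categories of the column variable, i.e. number of columns

The expected frequency for row i and column j is

$$e_{ij} = \frac{T_{i.} \times T_{.j}}{T}$$

where $T_{i.}$ = sum of row i, $T_{.j}$ = sum of column j, and T = grand sum.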
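The expected-frequency formula can be applied to a whole contingency table at once, as in the Python sketch below. The function name and the 2 x 3 table are hypothetical; the table is stored as a list of rows, one row per value of Variable 1.

```python
def expected_table(observed):
    """Return e_ij = (T_i. * T_.j) / T for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # T_i.
    col_totals = [sum(col) for col in zip(*observed)]  # T_.j
    grand_total = sum(row_totals)                      # T
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 contingency table (rows: Variable 1, columns: Variable 2).
observed = [[30, 45, 25],
            [20, 55, 25]]
for row in expected_table(observed):
    print(row)
```

Because each $e_{ij}$ is proportional to both its row and column totals, the expected table keeps the same row and column totals as the observed table.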

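Expected frequencies ($e_{ij}$ = expected frequency for row i and column j):

| Variable 1 values \ Variable 2 values | Value 1 | Value 2 | … | Value m | Row total |
|---|---|---|---|---|---|
| Value 1 | $e_{11} = (T_{1.} \times T_{.1})/T$ | $e_{12} = (T_{1.} \times T_{.2})/T$ | … | $e_{1m} = (T_{1.} \times T_{.m})/T$ | $\sum_j e_{1j} = T_{1.}$ |
| Value 2 | $e_{21} = (T_{2.} \times T_{.1})/T$ | $e_{22} = (T_{2.} \times T_{.2})/T$ | … | $e_{2m} = (T_{2.} \times T_{.m})/T$ | $\sum_j e_{2j} = T_{2.}$ |
| … | … | … | … | … | … |
| Value n | $e_{n1} = (T_{n.} \times T_{.1})/T$ | $e_{n2} = (T_{n.} \times T_{.2})/T$ | … | $e_{nm} = (T_{n.} \times T_{.m})/T$ | $\sum_j e_{nj} = T_{n.}$ |
| Column total | $\sum_i e_{i1} = T_{.1}$ | $\sum_i e_{i2} = T_{.2}$ | … | $\sum_i e_{im} = T_{.m}$ | T |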

NOTE: All expected frequencies ($e_{ij}$) must be at least 5 for the test to be valid.

Test statistic:

$$\chi^2_{Calc} = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df), where df = (n - 1)(m - 1).
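Putting the expected frequencies, the statistic, and df = (n - 1)(m - 1) together gives the Python sketch of the full test of independence below. `chi2_sf` is a hand-written stand-in for Excel's CHISQ.DIST.RT (an assumption made to avoid third-party packages), and the table values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square probability, like CHISQ.DIST.RT(x, df).

    Uses 1 - P(df/2, x/2) with the series for the regularized lower
    incomplete gamma function P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def independence_test(observed):
    """Return (chi2_calc, df, p_value) for an n x m contingency table."""
    n_rows, n_cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]            # T_i.
    col_totals = [sum(col) for col in zip(*observed)]      # T_.j
    grand_total = sum(row_totals)                          # T
    chi2_calc = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            e_ij = row_totals[i] * col_totals[j] / grand_total
            chi2_calc += (observed[i][j] - e_ij) ** 2 / e_ij
    df = (n_rows - 1) * (n_cols - 1)
    return chi2_calc, df, chi2_sf(chi2_calc, df)


# Hypothetical 2 x 3 table.
stat, df, p_value = independence_test([[30, 45, 25], [20, 55, 25]])
print(f"chi2 = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For this table the statistic is 3.00 with df = 2 and the p-value is about 0.22, so at α = 0.05 the hypothesis of independence would not be rejected.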

§12.3 Goodness of Fit for Poisson and Normal distributions

$\chi^2$ test procedure for the Poisson distribution:

$H_0$: The population follows a Poisson distribution

$H_a$: The population does NOT follow a Poisson distribution

Example:

$H_0$: The number of customers entering the store during 5-minute intervals has a Poisson probability distribution

$H_a$: The number of customers entering the store during 5-minute intervals does not have a Poisson distribution

Take a random sample, compute the sample mean $\bar{x}$, and use it as the estimate of the mean of the Poisson distribution. When the data are summarized as a frequency table in Excel:

Mean = SUMPRODUCT(Range1, Range2)/SUM(Range2)

Range1 = cell range that contains the X values

Range2 = cell range that contains the frequencies
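The SUMPRODUCT/SUM formula is simply a frequency-weighted average. A minimal Python equivalent is sketched below; the function name and the customer counts are hypothetical.

```python
def frequency_table_mean(x_values, frequencies):
    """Same calculation as SUMPRODUCT(Range1, Range2)/SUM(Range2)."""
    return sum(x * f for x, f in zip(x_values, frequencies)) / sum(frequencies)


# Hypothetical counts of customers per 5-minute interval (50 intervals).
x_values = [0, 1, 2, 3, 4]
frequencies = [10, 20, 15, 4, 1]
print(frequency_table_mean(x_values, frequencies))
```

Here the weighted sum is 66 over 50 intervals, so $\bar{x}$ = 1.32.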

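The Poisson probability P(X = x) can be determined with the Excel function POISSON.DIST(X, Mean, 0); the last argument 0 (FALSE) returns the probability of exactly X rather than the cumulative probability.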
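The Python sketch below computes the same non-cumulative Poisson probabilities. It assumes, as the table requires for the probabilities to sum to 1, that the last class is "that value or more"; the function names and the mean 1.32 are hypothetical.

```python
import math


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson distribution, like POISSON.DIST(x, mean, 0)."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_class_probs(last_x, mean):
    """Probabilities for X = 0, 1, ..., last_x - 1 and 'last_x or more'."""
    probs = [poisson_pmf(x, mean) for x in range(last_x)]
    probs.append(1.0 - sum(probs))   # upper tail so the list sums to 1
    return probs


for x, p in enumerate(poisson_class_probs(4, 1.32)):
    print(x, round(p, 4))            # the last line is "4 or more"
```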

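Table of observed and expected frequencies:

| X | Poisson probability with $\bar{x}$ as the mean | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| 0 | $p_1$ | $f_1$ | $e_1 = n \times p_1$ | $(f_1 - e_1)^2/e_1$ |
| 1 | $p_2$ | $f_2$ | $e_2 = n \times p_2$ | $(f_2 - e_2)^2/e_2$ |
| … | … | … | … | … |
| Last value (or more) | $p_k$ | $f_k$ | $e_k = n \times p_k$ | $(f_k - e_k)^2/e_k$ |
| Total | $\sum p_i = 1$ | $n = \sum f_i$ | | $\chi^2_{Calc} = \sum_{i=1}^{k} (f_i - e_i)^2/e_i$ |

For the probabilities to sum to 1, the last row must cover its X value and every larger value.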

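Make sure all expected frequencies ($e_i$) are at least 5. If not, combine adjacent rows; k is then the number of categories after combining.

The p-value for this test is given by CHISQ.DIST.RT($\chi^2_{Calc}$, df), where df = k - 2 (one degree of freedom is lost because the Poisson mean is estimated from the sample).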
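The Poisson procedure can be sketched end to end in Python as follows. Several choices here are assumptions of the sketch, not rules from the notes: the function names are hypothetical, `chi2_sf` stands in for CHISQ.DIST.RT, `frequencies[x]` is taken to be the count of intervals with exactly x customers, the last class becomes "x or more", and small classes are merged left to right (one of several reasonable ways to "combine adjacent rows"). The data are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square probability, like CHISQ.DIST.RT(x, df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def combine_small_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes left to right until each expected count >= minimum.

    A leftover tail that stays below the minimum is merged into the last class.
    """
    obs_out, exp_out = [], []
    f_acc, e_acc = 0.0, 0.0
    for f, e in zip(observed, expected):
        f_acc += f
        e_acc += e
        if e_acc >= minimum:
            obs_out.append(f_acc)
            exp_out.append(e_acc)
            f_acc, e_acc = 0.0, 0.0
    if e_acc > 0:
        if exp_out:
            obs_out[-1] += f_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(f_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


def poisson_gof(frequencies):
    """Return (mean, k, chi2_calc, df, p_value) for a Poisson goodness of fit test."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in enumerate(frequencies)) / n     # x-bar
    probs = [math.exp(-mean) * mean ** x / math.factorial(x)
             for x in range(len(frequencies) - 1)]
    probs.append(1.0 - sum(probs))                               # "last x or more"
    expected = [n * p for p in probs]                            # e_i = n * p_i
    obs, exp_ = combine_small_classes(frequencies, expected)
    k = len(obs)
    df = k - 2                                                   # one estimated parameter
    if df < 1:
        raise ValueError("Too few classes remain after combining.")
    chi2_calc = sum((f - e) ** 2 / e for f, e in zip(obs, exp_))
    return mean, k, chi2_calc, df, chi2_sf(chi2_calc, df)


# Hypothetical counts for x = 0, 1, ..., 6 customers in 100 intervals.
mean, k, stat, df, p_value = poisson_gof([12, 25, 30, 20, 9, 3, 1])
print(f"mean = {mean:.2f}, k = {k}, chi2 = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

With these counts the last two classes (x = 5 and "6 or more") have expected frequencies below 5 on their own, so they are merged, leaving k = 6 classes and df = 4.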

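$\chi^2$ test procedure for the Normal distribution:

Step 1: Set up the hypotheses:

$H_0$: The random variable (population) follows a Normal distribution

$H_a$: The random variable (population) does NOT follow a Normal distribution

Step 2: Take a random sample of size n, and use:

Sample mean $\bar{x}$ = estimate of the population mean

Sample standard deviation S = estimate of the population standard deviation

Step 3: Set up the table of observed and expected frequencies.

a. Find the number of classes, k = n/5 (rounded down). This guarantees that each expected frequency n/k is at least 5.

b. Determine the probability per class interval, 1/k, so that all classes are equally likely under $H_0$.

c. Set up a column of cumulative probabilities for the class intervals: 1/k, 2/k, …, k/k (the cumulative probability for the last class must be 1).

d. Using NORM.INV(cumulative probability, $\bar{x}$, S), determine the upper limit of each class except the last. The last class is open-ended (it has no finite upper limit), because NORM.INV returns the #NUM! error for a probability of 1.

Step 4:

a. Using the array function FREQUENCY with the k - 1 upper limits as bins, determine the observed frequencies. FREQUENCY returns k counts; the extra final count (values above the highest limit) is the observed frequency of the last class. Check that the counts sum to n.

b. Compute the expected frequency $e_i$ = n/k for all classes.

c. Calculate the test statistic:

$$\chi^2_{Calc} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

d. Find the p-value = CHISQ.DIST.RT($\chi^2_{Calc}$, df), with df = k - 3 (two degrees of freedom are lost because the mean and standard deviation are estimated from the sample).

e. Make a conclusion.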
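Steps 3 and 4a can be mimicked in Python as sketched below, using the standard library's `statistics.NormalDist.inv_cdf` in place of NORM.INV and `bisect` to reproduce FREQUENCY's counting rule (values in the interval (previous limit, limit], plus one extra count above the last limit). The function names are hypothetical, and the data are made-up values used only to exercise the mechanics, not a real normal sample.

```python
import bisect
from statistics import NormalDist, mean, stdev


def normal_class_limits(data):
    """Return k and the k - 1 finite upper class limits (Step 3).

    Uses x-bar and the sample standard deviation S. The last class is
    open-ended because inv_cdf, like NORM.INV, rejects a probability of 1.
    """
    k = len(data) // 5                          # k = n/5, rounded down
    dist = NormalDist(mean(data), stdev(data))  # stdev() is the sample S
    return k, [dist.inv_cdf(i / k) for i in range(1, k)]


def frequency(data, upper_limits):
    """Mimic Excel FREQUENCY: one count per limit plus one for values above the last."""
    counts = [0] * (len(upper_limits) + 1)
    for value in data:
        counts[bisect.bisect_left(upper_limits, value)] += 1
    return counts


data = [float(v) for v in range(1, 31)]         # made-up values, n = 30
k, limits = normal_class_limits(data)
observed = frequency(data, limits)
print(k, [round(u, 2) for u in limits])
print(observed, sum(observed))
```

With n = 30 the sketch uses k = 6 classes, so each expected frequency is 30/6 = 5.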

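| Cumulative probability | Upper class limit | Observed frequency ($f_i$) | Expected frequency ($e_i$) | $\chi^2_{Calc}$ |
|---|---|---|---|---|
| 1/k | NORM.INV(1/k, $\bar{x}$, S) | $f_1$ | $e_1 = n/k$ | $(f_1 - e_1)^2/e_1$ |
| 2/k | NORM.INV(2/k, $\bar{x}$, S) | $f_2$ | $e_2 = n/k$ | $(f_2 - e_2)^2/e_2$ |
| … | … | … | … | … |
| k/k = 1 | none (open-ended; NORM.INV(1, $\bar{x}$, S) returns #NUM!) | $f_k$ | $e_k = n/k$ | $(f_k - e_k)^2/e_k$ |
| Total | | $n = \sum f_i$ (use the array function FREQUENCY) | | $\chi^2_{Calc} = \sum_{i=1}^{k} (f_i - e_i)^2/e_i$ |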
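Once the observed frequencies are known, Steps 4b to 4d reduce to a few lines. The Python sketch below assumes the counts come from equal-probability classes as in Step 3; the function name is hypothetical, `chi2_sf` is a hand-written stand-in for CHISQ.DIST.RT, and the counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square probability, like CHISQ.DIST.RT(x, df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    denom = a
    for _ in range(10_000):
        denom += 1.0
        term *= z / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def normal_gof_from_counts(observed):
    """Steps 4b-4d: e_i = n/k, the chi-square statistic, df = k - 3, p-value."""
    n, k = sum(observed), len(observed)
    expected = n / k                                   # same for every class
    chi2_calc = sum((f - expected) ** 2 / expected for f in observed)
    df = k - 3                                         # mean and S estimated
    if df < 1:
        raise ValueError("Need at least 4 classes for df = k - 3 to be positive.")
    return chi2_calc, df, chi2_sf(chi2_calc, df)


# Hypothetical observed frequencies for k = 6 classes, n = 30.
stat, df, p_value = normal_gof_from_counts([7, 4, 5, 6, 3, 5])
print(f"chi2 = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

Here the statistic is 2.00 with df = 3 and the p-value is about 0.57, so at α = 0.05 these counts give no evidence against normality.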
