Goodness of Fit Test
1. State the null and alternative hypotheses.
2. Select a random sample and record observed
frequency fi for the ith category (k categories).
3. Compute expected frequency ei for the ith category:
$e_i = n \cdot p_i$
4. Compute the value of the test statistic.
$\chi^2\text{-stat} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
if ei > 5, this has a chi-square distribution
5. Reject H0 if $\chi^2\text{-stat} > \chi^2_{\alpha}$
df = k – 1
Goodness of Fit Test
Example: Finger Lakes Homes (A)
k=4
Finger Lakes Homes manufactures four models of
prefabricated homes, a two-story colonial, a log cabin, a
split-level, and an A-frame. To help in production
planning, management would like to determine if
previous customer purchases indicate that there is a
preference in the style selected.
The number of homes sold of each model for 100
sales over the past two years is shown below.
Model     Colonial   Log   Split-Level   A-Frame
# Sold       30       20       35           15
Goodness of Fit Test
Hypotheses
H0: pC = pL = pS = pA = 1/4 = .25
Ha: the population proportions are not all equal to .25,
i.e., customers prefer a particular style
Expected frequencies: ei = (n)(pi)
e1 = (0.25)(100) = 25
e2 = (0.25)(100) = 25
e3 = (0.25)(100) = 25
e4 = (0.25)(100) = 25
$\chi^2\text{-stat} = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$
Goodness of Fit Test
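df = 4 – 1 = 3 (row), α = .05 (column): $\chi^2_{.05} = 7.815$
[Figure: chi-square distribution with the Do Not Reject H0 region
below 7.815 and the Reject H0 region (upper-tail area .05) above it;
m = 3 and χ²-stat = 10 are marked on the axis.]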
At 5% significance, the assumption that there is no
home style preference is rejected.
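The arithmetic in this example can be checked with a short script. This is a minimal sketch in plain Python, not part of the original slides; the variable names are illustrative, and the critical value 7.815 is copied from the chi-square table entry quoted above rather than computed.

```python
# Observed sales per style, from the Finger Lakes Homes (A) table.
observed = {"Colonial": 30, "Log": 20, "Split-Level": 35, "A-Frame": 15}

n = sum(observed.values())            # 100 sales
p_null = 1 / len(observed)            # H0: every style has p = .25
expected = {style: n * p_null for style in observed}

# Sum of (f_i - e_i)^2 / e_i over the k = 4 categories.
chi2_stat = sum(
    (observed[s] - expected[s]) ** 2 / expected[s] for s in observed
)

df = len(observed) - 1
CRITICAL_05_DF3 = 7.815               # table value for alpha = .05, df = 3

reject_h0 = chi2_stat > CRITICAL_05_DF3
print(f"chi2-stat = {chi2_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Hard-coding the table value keeps the sketch free of third-party dependencies; a statistics library could supply the critical value instead.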
2
Independence Test
1. State the null and alternative hypotheses.
2. Select a random sample and record observed
frequency fij for each cell of the contingency table.
3. Compute expected frequency eij for each cell
$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{n}$
4. Compute the test statistic.
$\chi^2\text{-stat} = \sum_{i} \sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$
if eij > 5 for every cell, this has a chi-square distribution
5. Reject H0 if $\chi^2\text{-stat} > \chi^2_{\alpha}$
df = (m - 1)(k - 1), where m is the number of rows and k is the
number of columns
Independence Test
Example: Finger Lakes Homes (B)
Each home sold by Finger Lakes Homes can be
classified according to price and to style. Finger Lakes’
manager would like to determine if the price of the home
and the style of the home are independent variables.
The number of homes sold for each model and price
for the past two years is shown below. For convenience,
the price of the home is listed as either $99,000 or less or
more than $99,000. There are m = 2 price categories and
k = 4 styles.
Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000       18        6       19           12
> $99,000       12       14       16            3
Independence Test
Observed Frequencies (fij)
Price      Colonial   Log   Split-Level   A-Frame   Total
≤ $99K        18        6       19           12       55
> $99K        12       14       16            3       45
Total         30       20       35           15      100
Expected Frequencies (eij)
Price      Colonial   Log   Split-Level   A-Frame   Total
≤ $99K       16.5      11      19.25        8.25      55
> $99K       13.5       9      15.75        6.75      45
Total         30       20      35           15       100
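The expected frequencies follow from the row and column totals of the observed table. The sketch below, in plain Python with illustrative variable names that are not from the slides, applies (Row i Total)(Column j Total)/n to every cell.

```python
# Observed counts: rows are price categories, columns are styles
# (Colonial, Log, Split-Level, A-Frame).
observed = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]

row_totals = [sum(row) for row in observed]         # [55, 45]
col_totals = [sum(col) for col in zip(*observed)]   # [30, 20, 35, 15]
n = sum(row_totals)                                 # 100

# e_ij = (row i total)(column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```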
Independence Test
Hypotheses
H0: Price of the home is independent of the
style of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased
Compute test statistic
$\chi^2\text{-stat} = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \frac{(19 - 19.25)^2}{19.25} + \frac{(12 - 8.25)^2}{8.25} + \frac{(12 - 13.5)^2}{13.5} + \frac{(14 - 9)^2}{9} + \frac{(16 - 15.75)^2}{15.75} + \frac{(3 - 6.75)^2}{6.75} = 9.149$
Independence Test
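df = (4 – 1)(2 – 1) = 3 (row), α = .05 (column): $\chi^2_{.05} = 7.815$
[Figure: chi-square distribution with the Do Not Reject H0 region
below 7.815 and the Reject H0 region (upper-tail area .05) above it;
m = 3 and χ²-stat = 9.149 are marked on the axis.]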
At 5% significance, we reject the assumption that the price of the
home is independent of the style of home that is purchased.
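The full independence test for this example can be reproduced as follows. This is a sketch under the same assumptions as the slides (expected frequencies from the marginal totals, critical value 7.815 read from the table), with illustrative variable names. Evaluated without intermediate rounding, the sum comes to roughly 9.149.

```python
# Observed counts: rows are price categories, columns are styles.
observed = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Double sum of (f_ij - e_ij)^2 / e_ij over all cells.
chi2_stat = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (m - 1)(k - 1)
CRITICAL_05_DF3 = 7.815    # table value for alpha = .05, df = 3

reject_h0 = chi2_stat > CRITICAL_05_DF3
print(f"chi2-stat = {chi2_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```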
Goodness of Fit Test: Poisson Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Record observed frequencies
b. Estimate mean number of occurrences m
3. Compute expected frequency of occurrences ei for each
value of the Poisson random variable.
$f(x) = \frac{m^x e^{-m}}{x!}$
$e_i = n \cdot f(x_i)$
4. Compute the value of the test statistic.
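$\chi^2\text{-stat} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
5. Reject H0 if $\chi^2\text{-stat} > \chi^2_{\alpha}$
df = k – p – 1, where p is the number of parameters estimated
from the sample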
Goodness of Fit Test: Poisson Distribution
Example: Troy Parking Garage
In studying the need for an additional entrance to a city
parking garage, a consultant has recommended an analysis
approach that is applicable only in situations where the number of
cars entering during a specified time period follows a Poisson
distribution.
A random sample of n = 100 one-minute time intervals
resulted in the customer arrivals listed below. A statistical test
must be conducted to see if the assumption of a Poisson
distribution is reasonable.
# of Arrivals   0   1   2   3   4   5   6   7   8   9  10  11  12
Frequency       0   1   4  10  14  20  12  12   9   8   6   3   1
Total Arrivals = 0(0) + 1(1) + 2(4) + 3(10) + . . . + 12(1) = 600
Total one-minute intervals = n = 0 + 1 + 4 + 10 + . . . + 1 = 100
Estimate of m = 600/100 = 6
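The estimate of m is a frequency-weighted mean of the arrival counts. A minimal sketch in plain Python, with illustrative variable names:

```python
# Arrivals per one-minute interval (0..12) and how often each was observed.
arrivals = list(range(13))
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

n = sum(frequency)                                    # number of intervals
total_arrivals = sum(x * f for x, f in zip(arrivals, frequency))
m_hat = total_arrivals / n                            # estimated Poisson mean

print(f"n = {n}, total arrivals = {total_arrivals}, m = {m_hat}")
```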
Goodness of Fit Test: Poisson Distribution
The hypothesized probability of x cars arriving during the time period is
$f(x) = \frac{6^x e^{-6}}{x!}$
For x = 0: $f(0) = \frac{6^0 e^{-6}}{0!} = 0.0025$
For x = 1: $f(1) = \frac{6^1 e^{-6}}{1!} = 0.0149$
x        f(x)      n∙f(x)
0        0.0025      0.25
1        0.0149      1.49
2        0.0446      4.46
3        0.0892      8.92
4        0.1339     13.39
5        0.1606     16.06
6        0.1606     16.06
7        0.1377     13.77
8        0.1033     10.33
9        0.0688      6.88
10+      0.0839      8.39
Total    1.0000    100.00
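The probability table can be regenerated from the Poisson formula with m = 6 and n = 100. The sketch below uses only the Python standard library; `poisson_pmf` is a helper name introduced here for illustration, and the "10+" row is taken as one minus the probabilities for 0 through 9.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """Poisson probability f(x) = m^x e^(-m) / x!."""
    return m ** x * exp(-m) / factorial(x)


m, n = 6, 100

# Probabilities for x = 0..9, then the open-ended "10+" category.
probs = {x: poisson_pmf(x, m) for x in range(10)}
probs["10+"] = 1 - sum(probs.values())

# Expected frequencies e_i = n * f(x_i).
expected = {x: n * p for x, p in probs.items()}

for x in probs:
    print(f"{x!s:>4}  {probs[x]:.4f}  {expected[x]:6.2f}")
```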
Goodness of Fit Test: Poisson Distribution
i              fi     ei      fi – ei
0 or 1 or 2     5     6.20    -1.20
3              10     8.92     1.08
4              14    13.39     0.61
5              20    16.06     3.94
6              12    16.06    -4.06
7              12    13.77    -1.77
8               9    10.33    -1.33
9               8     6.88     1.12
10 or more     10     8.39     1.61
$\chi^2\text{-stat} = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \dots + \frac{(1.61)^2}{8.39} = 3.274$
Goodness of Fit Test: Poisson Distribution
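With α = .05 (column) and df = 9 – 1 – 1 = 7 (row): $\chi^2_{.05} = 14.067$
[Figure: chi-square distribution with the Do Not Reject H0 region
below 14.067 and the Reject H0 region (upper-tail area .05) above it;
χ²-stat = 3.274 falls in the Do Not Reject H0 region.]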
At 5% significance, there is no reason to doubt the assumption
of a Poisson distribution.
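The whole Troy Parking Garage test, including the pooling of sparse categories, can be sketched in a few lines. This is an illustration rather than the slides' own procedure: the variable names are assumptions, the critical value 14.067 is copied from the table, and because the expected frequencies are not rounded to two decimals the statistic comes out near 3.27 rather than exactly 3.274.

```python
from math import exp, factorial

# Observed frequencies for x = 0..12 arrivals per one-minute interval.
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]
n = sum(frequency)
m = sum(x * f for x, f in enumerate(frequency)) / n   # estimated mean, 6.0

# Poisson probabilities for x = 0..9.
pmf = [m ** x * exp(-m) / factorial(x) for x in range(10)]

# Pool into the 9 categories used in the table:
# "0 or 1 or 2", then 3..9 individually, then "10 or more".
observed = [sum(frequency[:3])] + frequency[3:10] + [sum(frequency[10:])]
expected = (
    [n * sum(pmf[:3])]
    + [n * p for p in pmf[3:10]]
    + [n * (1 - sum(pmf))]
)

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

p = 1                                # one estimated parameter (the mean)
df = len(observed) - p - 1           # k - p - 1
CRITICAL_05_DF7 = 14.067             # table value for alpha = .05, df = 7

reject_h0 = chi2_stat > CRITICAL_05_DF7
print(f"chi2-stat = {chi2_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```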
Goodness of Fit Test: Normal Distribution
1. State the null and alternative hypotheses.
2. Select a random sample and
a. Compute the mean and standard deviation (p = 2).
b. Define intervals so that the expected frequency ei > 5 in each interval
c. For each interval, record observed frequencies fi
3. Compute ei for each interval.
4. Compute the value of the test statistic.
$\chi^2\text{-stat} = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
if ei > 5, this has a chi-square distribution
5. Reject H0 if $\chi^2\text{-stat} > \chi^2_{\alpha}$
df = k – p – 1
Goodness of Fit Test: Normal Distribution
Example: IQ Computers
IQ Computers (one better than HP?) manufactures and
sells a general purpose microcomputer. As part of a study to
evaluate sales personnel, management wants to determine, at
a 5% significance level, if the annual sales volume (number
of units sold by a salesperson) follows a normal probability
distribution.
A simple random sample of 33 of the salespeople was
taken and their numbers of units sold are below.
33   43   44   45   52   52   56   58   63   63   64
64   65   66   68   70   72   73   73   74   74   75
83   84   85   86   91   92   94   98  101  102  105
n = 33, x̄ = 71.76, s = 18.47
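The summary statistics can be confirmed from the raw data with Python's standard `statistics` module. This is a small sketch with an illustrative list name; `stdev` is the sample standard deviation (n – 1 in the denominator).

```python
from statistics import mean, stdev

# Units sold by the 33 sampled salespeople.
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 101, 102, 105,
]

n = len(units_sold)
x_bar = mean(units_sold)   # sample mean
s = stdev(units_sold)      # sample standard deviation

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}")
```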
Goodness of Fit Test: Normal Distribution
To ensure the test statistic has a chi-square distribution, the
normal distribution is divided into k intervals of equal probability.
k = 33/5 = 6.6, so use 6 equal intervals.
Expected frequency:
ei = 33/6 = 5.5
The probability of being in each interval is equal to
1/6 = .1667
Goodness of Fit Test: Normal Distribution
Find the z that corresponds to each cumulative lower-tail
probability (the red tail area in the figure):
(1)(.1667) = .1667 gives z = –.97
(2)(.1667) = .3333 gives z = –.43
(3)(.1667) = .5000 gives z = 0
Find the remaining z values using symmetry:
z = –.97, –.43, 0, .43, .97
Goodness of Fit Test: Normal Distribution
Convert the z’s to x’s
$x = \bar{x} + z \cdot s$
For example, x = 71.76 + (–.97)(18.47) = 53.84
z    –.97    –.43      0      .43     .97
x    53.84   63.81   71.76   79.70   89.68
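The interval boundaries can be recomputed with `statistics.NormalDist` from the Python standard library. This sketch rounds each z value to two decimals before converting, mirroring the table lookup used in the slides; the variable names are illustrative, and the resulting x values may differ from the slide figures in the last digit.

```python
from statistics import NormalDist

x_bar, s = 71.76, 18.47   # sample mean and standard deviation
k = 6                     # number of equal-probability intervals

std_normal = NormalDist()  # mean 0, standard deviation 1

# z values with cumulative probability 1/6, 2/6, ..., 5/6,
# rounded to two decimals as in a printed z table.
z_cuts = [round(std_normal.inv_cdf(i / k), 2) for i in range(1, k)]

# Convert each z to the original scale: x = x_bar + z * s.
x_cuts = [round(x_bar + z * s, 2) for z in z_cuts]

print("z:", z_cuts)
print("x:", x_cuts)
```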
Goodness of Fit Test: Normal Distribution
Observed and Expected Frequencies
LL       UL       fi    ei     fi – ei   (fi – ei)2/ei
–∞       53.84     6    5.5      0.5        0.05
53.84    63.81     4    5.5     -1.5        0.41
63.81    71.76     6    5.5      0.5        0.05
71.76    79.70     6    5.5      0.5        0.05
79.70    89.68     4    5.5     -1.5        0.41
89.68    ∞         7    5.5      1.5        0.41
Total             33   33                χ²-stat = 1.36
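The observed counts and the test statistic for the IQ Computers example can be reproduced by binning the data at the five boundaries from the slides. A minimal sketch in plain Python; the names are illustrative, and the critical value 7.815 is copied from the chi-square table rather than computed.

```python
# Units sold by the 33 sampled salespeople.
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 101, 102, 105,
]

# Interval boundaries taken from the slides (x scale).
cuts = [53.84, 63.81, 71.76, 79.70, 89.68]

# Count how many observations fall in each of the 6 intervals.
observed = [0] * (len(cuts) + 1)
for x in units_sold:
    interval = sum(x > c for c in cuts)   # number of boundaries below x
    observed[interval] += 1

expected = len(units_sold) / len(observed)   # 33 / 6 = 5.5 per interval

chi2_stat = sum((f - expected) ** 2 / expected for f in observed)

p = 2                                 # mean and standard deviation estimated
df = len(observed) - p - 1            # k - p - 1
CRITICAL_05_DF3 = 7.815               # table value for alpha = .05, df = 3

reject_h0 = chi2_stat > CRITICAL_05_DF3
print(observed, f"chi2-stat = {chi2_stat:.2f}", f"reject H0: {reject_h0}")
```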
Goodness of Fit Test: Normal Distribution
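df = 6 – 2 – 1 = 3 (row), α = .05 (column): $\chi^2_{.05} = 7.815$
[Figure: chi-square distribution with the Do Not Reject H0 region
below 7.815 and the Reject H0 region (upper-tail area .05) above it;
m = 3 is marked on the axis and χ²-stat = 1.36 falls in the
Do Not Reject H0 region.]
At 5% significance, there is no reason to doubt the
assumption that the population is normally distributed.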