Comparing Multiple Proportions,
Test of Independence and
Goodness of Fit
Statistics (Spring 2015)
Content
• Testing the Equality of Population Proportions
for Three or More Populations
• Test of Independence
• Goodness of Fit Test
Comparing Multiple Proportions, Test
of Independence and Goodness of Fit
• In this chapter we introduce three additional
hypothesis-testing procedures.
• The test statistic and the distribution used are
based on the chi-square (χ²) distribution.
• In all cases, the data are categorical.
Testing the Equality of Population
Proportions for Three or More Populations
• Using the notation
p1 = population proportion for population 1
p2 = population proportion for population 2
and pk = population proportion for population k
• The hypotheses for the equality of population
proportions for k ≥ 3 populations are as follows:
H0: p1 = p2 = . . . = pk
Ha: Not all population proportions are equal
• If H0 cannot be rejected, we cannot detect a
difference among the k population proportions.
• If H0 can be rejected, we can conclude that not all
k population proportions are equal.
• Further analyses can be done to conclude which
population proportions are significantly different
from others.
• Example: Finger Lakes Homes
Finger Lakes Homes manufactures three models of
prefabricated homes, a two-story colonial, a log cabin, and
an A-frame. To help in product-line planning,
management would like to compare the customer
satisfaction with the three home styles.
p1 = proportion likely to repurchase a Colonial
for the population of Colonial owners
p2 = proportion likely to repurchase a Log Cabin
for the population of Log Cabin owners
p3 = proportion likely to repurchase an A-Frame
for the population of A-Frame owners
• We begin by taking a sample of owners from
each of the three populations.
• Each sample contains categorical data indicating
whether the respondents are likely or not likely to
repurchase the home.
• Observed Frequencies (sample results)
Likely to                   Home Owner
Repurchase     Colonial     Log     A-Frame     Total
Yes               100        81        83        264
No                 35        20        41         96
Total             135       101       124        360
• Next, we determine the expected frequencies
under the assumption H0 is correct.
Expected Frequencies
Under the Assumption H0 is True
$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Total Sample Size}}$$
• If a significant difference exists between the
observed and expected frequencies, H0 can be
rejected.
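As a minimal sketch of the expected-frequency formula, the following Python snippet applies it to the observed sample results above. The function name `expected_frequencies` and the list-of-rows layout are illustrative assumptions, not part of the original slides.

```python
# Observed counts from the sample results table
# (rows: Yes, No; columns: Colonial, Log, A-Frame).
observed = [[100, 81, 83],
            [35, 20, 41]]

def expected_frequencies(table):
    """Return e_ij = (row i total)(column j total) / (total sample size)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each expected count is what the cell would hold if all three populations shared the pooled "Yes" proportion of 264/360.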
• Expected Frequencies (computed)
Likely to                   Home Owner
Repurchase     Colonial     Log     A-Frame     Total
Yes              99.00     74.07     90.93       264
No               36.00     26.93     33.07        96
Total              135       101       124       360
• Next, compute the value of the chi-square test
statistic.
$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
where:
fij = observed frequency for the cell in row i and column j
eij = expected frequency for the cell in row i and column j
under the assumption H0 is true
Note: The test statistic has a chi-square distribution with
k – 1 degrees of freedom, provided the expected
frequency is 5 or more for each cell.
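• Computation of the Chi-Square Test Statistic
Likely to    Home        Obs. Freq.   Exp. Freq.   Diff.         Sqd. Diff.      Sqd. Diff. / Exp. Freq.
Repurch.     Owner       fij          eij          (fij - eij)   (fij - eij)^2   (fij - eij)^2 / eij
Yes          Colonial     97           97.50         -0.50          0.2500          0.0026
Yes          Log Cab.     83           72.94         10.06        101.1142          1.3862
Yes          A-Frame      80           89.56         -9.56         91.3086          1.0196
No           Colonial     38           37.50          0.50          0.2500          0.0067
No           Log Cab.     18           28.06        -10.06        101.1142          3.6041
No           A-Frame      44           34.44          9.56         91.3086          2.6509
Total                    360          360                                      χ² = 8.6700
Note: This computation uses observed frequencies of 97, 83, and 80 (Yes) and
38, 18, and 44 (No), which differ from the sample results tabulated earlier
(100, 81, 83 and 35, 20, 41). The value χ² = 8.670 and the conclusions that
follow are based on the frequencies shown in this table.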
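The computation can be sketched in Python as follows, assuming the observed frequencies listed in the computation table above (97, 83, 80 and 38, 18, 44). The function name `chi_square_statistic` is an illustrative choice.

```python
# Observed counts used in the computation table
# (rows: Yes, No; columns: Colonial, Log Cabin, A-Frame).
observed = [[97, 83, 80],
            [38, 18, 44]]

def chi_square_statistic(table):
    """Sum (f_ij - e_ij)^2 / e_ij over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency under H0
            chi2 += (f - e) ** 2 / e
    return chi2

chi2 = chi_square_statistic(observed)
print(round(chi2, 3))
```

Because the code keeps unrounded expected frequencies, individual cell contributions can differ from the table in the fourth decimal place.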
• Rejection Rule
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
where α is the significance level and
there are k - 1 degrees of freedom
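• Rejection Rule (using α = .05)
– With α = .05 and k - 1 = 3 - 1 = 2 degrees of freedom,
reject H0 if p-value ≤ .05 or χ² ≥ 5.991
[Figure: chi-square distribution with 2 degrees of freedom; "Do Not Reject H0" lies to the left of 5.991 and "Reject H0" is the upper tail to the right.]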
• Conclusion Using the p-Value Approach
Area in Upper Tail     .10      .05      .025     .01      .005
χ² Value (df = 2)      4.605    5.991    7.378    9.210    10.597
• Because χ² = 8.670 is between 7.378 and 9.210,
the area in the upper tail of the distribution is
between .025 and .01.
• The p-value ≤ α. We can reject the null
hypothesis.
Actual p-value is .0131
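For 2 degrees of freedom the upper-tail area of the chi-square distribution has the closed form exp(-x/2), so the actual p-value can be checked with the standard library alone. This is a sketch; the function name is an illustrative assumption.

```python
import math

def chi2_upper_tail_df2(x):
    """Upper-tail area of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)

p_value = chi2_upper_tail_df2(8.670)
print(round(p_value, 4))
```

The closed form holds only for df = 2; other degrees of freedom need a different expression or a statistics library.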
• We have concluded that the population
proportions for the three populations of home
owners are not equal.
• To identify where the differences between
population proportions exist, we will rely on a
multiple comparisons procedure.
Multiple Comparisons Procedure
• We begin by computing the three sample
proportions.
Colonial: $\bar{p}_1 = 100/135 = .741$
Log Cabin: $\bar{p}_2 = 81/101 = .802$
A-Frame: $\bar{p}_3 = 83/124 = .669$
• We will use a multiple comparison procedure
known as the Marascuilo procedure.
• Marascuilo Procedure
• We compute the absolute value of the pairwise
difference between sample proportions.
Colonial and Log Cabin: $|\bar{p}_1 - \bar{p}_2| = |.741 - .802| = .061$
Colonial and A-Frame: $|\bar{p}_1 - \bar{p}_3| = |.741 - .669| = .072$
Log Cabin and A-Frame: $|\bar{p}_2 - \bar{p}_3| = |.802 - .669| = .133$
• Critical Values for the Marascuilo Pairwise
Comparison
• For each pairwise comparison compute a critical
value as follows:
$$CV_{ij} = \sqrt{\chi^2_{\alpha;\,k-1}}\,\sqrt{\frac{\bar{p}_i(1-\bar{p}_i)}{n_i} + \frac{\bar{p}_j(1-\bar{p}_j)}{n_j}}$$
For α = .05 and k = 3: χ² = 5.991
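• Pairwise Comparison Tests
Pairwise Comparison        |pi - pj|    CVij    Significant if |pi - pj| > CVij
Colonial vs. Log Cabin       .061       .134    Not Significant
Colonial vs. A-Frame         .072       .139    Not Significant
Log Cabin vs. A-Frame        .133       .142    Not Significant
Each critical value includes the variance terms of both samples. For example,
CV12 = √5.991 √(.741(.259)/135 + .802(.198)/101) = 2.448(.0547) = .134.
With these sample proportions, no pairwise difference exceeds its critical value.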
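The pairwise comparisons can be sketched as follows, assuming the sample counts 100/135, 81/101, and 83/124 and the tabled value χ² = 5.991 for α = .05 with 2 degrees of freedom. The function name `marascuilo` and the dictionary layout are illustrative assumptions.

```python
import math
from itertools import combinations

# (number likely to repurchase, sample size) for each home style
samples = {"Colonial": (100, 135), "Log Cabin": (81, 101), "A-Frame": (83, 124)}
chi2_critical = 5.991  # chi-square table value, alpha = .05, k - 1 = 2 d.f.

def marascuilo(samples, chi2_critical):
    """Return (name_i, name_j, |p_i - p_j|, CV_ij, significant) for every pair."""
    results = []
    for (name_i, (x_i, n_i)), (name_j, (x_j, n_j)) in combinations(samples.items(), 2):
        p_i, p_j = x_i / n_i, x_j / n_j
        diff = abs(p_i - p_j)
        # The critical value combines the variance terms of BOTH samples.
        cv = math.sqrt(chi2_critical) * math.sqrt(
            p_i * (1 - p_i) / n_i + p_j * (1 - p_j) / n_j
        )
        results.append((name_i, name_j, diff, cv, diff > cv))
    return results

results = marascuilo(samples, chi2_critical)
for name_i, name_j, diff, cv, significant in results:
    print(f"{name_i} vs. {name_j}: diff={diff:.3f}, CV={cv:.3f}, significant={significant}")
```

A pair is flagged only when its absolute difference exceeds its own critical value, so samples with smaller sizes need a larger gap to be declared different.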
Test of Independence
• 1. Set up the null and alternative hypotheses.
H0: The column variable is independent of the row variable
Ha: The column variable is not independent of the row variable
• 2. Select a random sample and record the
observed frequency, fij , for each cell of the
contingency table.
• 3. Compute the expected frequency, eij , for each
cell.
$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$
• 4. Compute the test statistic.
$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
• 5. Determine the rejection rule.
Reject H0 if p-value ≤ α or $\chi^2 \ge \chi^2_{\alpha}$,
where α is the significance level and, with n rows and m
columns, there are (n - 1)(m - 1) degrees of freedom.
• Example: Finger Lakes Homes (B)
Each home sold by Finger Lakes Homes can
be classified according to price and to style. Finger
Lakes’ manager would like to determine if the price
of the home and the style of the home are
independent variables.
The number of homes sold for each model
and price for the past two years is shown below.
For convenience, the price of the home is listed as
either $99,000 or less or more than $99,000.
Price          Colonial     Log     Split-Level     A-Frame
≤ $99,000         18          6         19             12
> $99,000         12         14         16              3
• 1. Hypotheses
H0: Price of the home is independent of the
style of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased
• 2. Test of Independence → χ² Test
• 3. α = .05
• 4. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
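With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.815$
Reject H0 if p-value ≤ .05 or χ² ≥ 7.815
• 5. Calculate Test Statistic
– Observed Frequencies with Row and Column Totals
Price       Colonial     Log     Split-Level     A-Frame     Total
≤ $99K         18          6         19             12         55
> $99K         12         14         16              3         45
Total          30         20         35             15        100
– Expected Frequencies, eij = (Row i Total)(Column j Total)/100
Price       Colonial     Log     Split-Level     A-Frame     Total
≤ $99K       16.50      11.00      19.25           8.25        55
> $99K       13.50       9.00      15.75           6.75        45
Total          30         20         35             15        100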
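Step 5 for the price-by-style counts can be sketched as follows: the code derives each expected frequency from the row and column totals, then accumulates the test statistic and the degrees of freedom. The function name `independence_test` is an illustrative assumption.

```python
# Observed sales (rows: <= $99K, > $99K;
# columns: Colonial, Log, Split-Level, A-Frame).
observed = [[18, 6, 19, 12],
            [12, 14, 16, 3]]

def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # e.g. 55 * 30 / 100 = 16.5
            chi2 += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = independence_test(observed)
print(round(chi2, 3), df)
```

The statistic is then compared with the tabled critical value for the computed degrees of freedom.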
• Test Statistic
$$\chi^2 = \frac{(18-16.5)^2}{16.5} + \frac{(6-11)^2}{11} + \cdots + \frac{(3-6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149$$
• Conclusion Using the p-Value Approach
Area in Upper Tail     .10      .05      .025     .01       .005
χ² Value (df = 3)      6.251    7.815    9.348    11.345    12.838
• Because χ² = 9.149 is between 7.815 and 9.348,
the area in the upper tail of the distribution is
between .05 and .025.
• The p-value ≤ α. We can reject the null
hypothesis.
Actual p-value is .0274
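For 3 degrees of freedom the upper-tail area also has a closed form, erfc(√(x/2)) + √(2x/π)·exp(-x/2), so the actual p-value can be checked without a statistics library. This is a sketch; the function name is an illustrative assumption, and the formula applies to df = 3 only.

```python
import math

def chi2_upper_tail_df3(x):
    """Upper-tail area of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_upper_tail_df3(9.149)
print(round(p_value, 4))
```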
• Conclusion Using the Critical Value Approach
– χ² = 9.149 ≥ 7.815
• 6. Conclusion
– We reject, at the .05 level of significance, the
assumption that the price of the home is independent
of the style of home that is purchased.
Goodness of Fit Test:
Multinomial Probability Distribution
• 1. State the null and alternative hypotheses.
H0: The population follows a multinomial
distribution with specified probabilities
for each of the k categories
Ha: The population does not follow a
multinomial distribution with specified
probabilities for each of the k categories
• 2. Select a random sample and record the
observed frequency, fi, for each of the k
categories.
• 3. Assuming H0 is true, compute the expected
frequency, ei, in each category by multiplying the
category probability by the sample size.
• 4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Note: The test statistic has a chi-square distribution
with k – 1 df provided that the expected frequencies
are 5 or more for all categories.
• 5. Rejection rule:
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
where α is the significance level and there are k - 1
degrees of freedom
Multinomial Distribution Goodness of Fit Test
• Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four
models of prefabricated homes, a two-story
colonial, a log cabin, a split-level, and an A-frame.
To help in production planning, management would
like to determine if previous customer purchases
indicate that there is a preference in the style
selected.
The number of homes sold of each model for
100 sales over the past two years is shown below.
Model      Colonial     Log     Split-Level     A-Frame
# Sold        30         20         35             15
• 1. Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not
pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame
• 2. Goodness of Fit Test → χ² Test
• 3. α = .05
• 4. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
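• Rejection Rule
– With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom,
reject H0 if p-value ≤ .05 or χ² ≥ 7.815.
[Figure: chi-square distribution with 3 degrees of freedom; "Do Not Reject H0" lies to the left of 7.815 and "Reject H0" is the upper tail to the right.]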
• 5. Calculate Test Statistic
– Expected Frequencies
e1 = .25(100) = 25     e2 = .25(100) = 25
e3 = .25(100) = 25     e4 = .25(100) = 25
– Test Statistic
$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(35-25)^2}{25} + \frac{(15-25)^2}{25} = 1 + 1 + 4 + 4 = 10$$
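The multinomial goodness of fit computation can be sketched as follows, assuming the hypothesized probabilities of .25 for each of the four home styles. The function name `goodness_of_fit` is an illustrative assumption.

```python
observed = [30, 20, 35, 15]               # Colonial, Log, Split-Level, A-Frame
probabilities = [0.25, 0.25, 0.25, 0.25]  # category probabilities under H0

def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, expected frequencies, degrees of freedom)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]  # e_i = n * p_i
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, expected, df

chi2, expected, df = goodness_of_fit(observed, probabilities)
print(chi2, expected, df)
```

The same function handles unequal hypothesized probabilities, provided they sum to 1 and every expected frequency is at least 5.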
• Conclusion Using the p-Value Approach
Area in Upper Tail     .10      .05      .025     .01       .005
χ² Value (df = 3)      6.251    7.815    9.348    11.345    12.838
• Because χ² = 10 is between 9.348 and 11.345, the
area in the upper tail of the distribution is
between .025 and .01.
• The p-value ≤ α. We can reject the null
hypothesis.
• Conclusion Using the Critical Value Approach
– χ² = 10 > 7.815
• 6. Conclusion
– We reject, at the .05 level of significance, the
assumption that there is no home style preference.
Goodness of Fit Test: Normal Distribution
• 1. State the null and alternative hypotheses.
H0: The population has a normal distribution
Ha: The population does not have a normal distribution
• 2. Select a random sample and
– a. Compute the mean and standard deviation.
– b. Define intervals of values so that the expected
frequency is at least 5 for each interval.
– c. For each interval, record the observed frequencies
• 3. Compute the expected frequency, ei, for each
interval. (Multiply the sample size by the probability
of a normal random variable being in the interval.)
• 4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
• 5. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$ (where α is the
significance level and there are k – 2 – 1 degrees
of freedom).
• Example: IQ Computers
IQ Computers (one better than HP?)
manufactures and sells a general purpose
microcomputer. As part of a study to evaluate sales
personnel, management wants to determine, at a .05
significance level, if the annual sales volume
(number of units sold by a salesperson) follows a
normal probability distribution.
A simple random sample of 30 of the
salespeople was taken and their numbers of units
sold are listed below.
33   43   44   45   52   52   56   58   63   64
64   65   66   68   70   72   73   73   74   75
83   84   85   86   91   92   94   98  102  105
(mean = 71, standard deviation = 18.54)
• 1. Hypotheses
H0: The population of number of units sold
has a normal distribution with mean 71
and standard deviation 18.54.
Ha: The population of number of units sold
does not have a normal distribution with
mean 71 and standard deviation 18.54.
• 2. Goodness of Fit Test → χ² Test
• 3. α = .05
• 4. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
• 5. Interval Definition
– To satisfy the requirement of an expected frequency
of at least 5 in each interval we will divide the normal
distribution into 30/5 = 6 equal probability intervals.
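[Figure: normal curve divided into six intervals, each with area 1.00/6 = .1667. Interval boundaries: 71 - .97(18.54) = 53.02, 71 - .43(18.54) = 63.03, 71, 71 + .43(18.54) = 78.97, and 71 + .97(18.54) = 88.98.]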
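The interval boundaries can be sketched with the standard library's `statistics.NormalDist`, assuming the sample mean 71 and standard deviation 18.54. The slides round the z-values to two decimals (.43 and .97), so the boundaries computed here differ from the slide values by a few hundredths.

```python
from statistics import NormalDist

mean, sd = 71, 18.54
k = 6  # number of equal-probability intervals (30 observations / 5 per interval)

dist = NormalDist(mu=mean, sigma=sd)
# The k - 1 interior boundaries cut the distribution into areas of 1/k each.
boundaries = [dist.inv_cdf(i / k) for i in range(1, k)]
print([round(b, 2) for b in boundaries])
```

Equal-probability intervals guarantee the same expected frequency, n/k, in every interval.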
• Observed and Expected Frequencies
Interval (i)          fi     ei     fi - ei
Less than 53.02        6      5        1
53.02 to 63.03         3      5       -2
63.03 to 71.00         6      5        1
71.00 to 78.97         5      5        0
78.97 to 88.98         4      5       -1
More than 88.98        6      5        1
Total                 30     30
• Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.
(where k = number of categories and p = number
of population parameters estimated), $\chi^2_{.05} = 7.815$
Reject H0 if p-value ≤ .05 or χ² ≥ 7.815.
• Test Statistic
$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$$
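The observed frequencies and the test statistic can be reproduced with a short sketch, assuming the 30 sales figures and the slide's interval boundaries. Using `bisect_right` to assign each value to an interval is an implementation choice, not something the slides specify.

```python
from bisect import bisect_right

units_sold = [33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
              64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
              83, 84, 85, 86, 91, 92, 94, 98, 102, 105]
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]  # from the interval definition

# Count how many observations fall in each of the six intervals.
observed = [0] * (len(boundaries) + 1)
for x in units_sold:
    observed[bisect_right(boundaries, x)] += 1

expected = len(units_sold) / len(observed)  # 30 / 6 = 5 per interval
chi2 = sum((f - expected) ** 2 / expected for f in observed)
df = len(observed) - 2 - 1  # two parameters (mean, std. dev.) were estimated
print(observed, round(chi2, 3), df)
```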
• Conclusion Using the p-Value Approach
Area in Upper Tail     .90     .10      .05      .025     .01
χ² Value (df = 3)      .584    6.251    7.815    9.348    11.345
• Because χ² = 1.600 is between .584 and 6.251 in the
Chi-Square Distribution Table, the area in the upper
tail of the distribution is between .90 and .10.
• The p-value > α. We cannot reject the null
hypothesis. There is little evidence to support
rejecting the assumption the population is normally
distributed with μ = 71 and σ = 18.54.
Actual p-value is .6594
• Conclusion Using the Critical Value Approach
– χ² = 1.6 < 7.815
• 6. Conclusion
– We cannot reject, at the .05 level of significance, the
assumption that the population is normally distributed
with μ = 71 and σ = 18.54.
Goodness of Fit Test: Poisson
Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Record the observed frequency, fi , for each of
the k values of the Poisson random variable.
b. Compute the mean number of occurrences, μ.
3. Compute the expected frequency of occurrences,
ei , for each value of the Poisson random variable.
4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
5. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
(where α is the significance level and there are k
– 1 – 1 degrees of freedom).
Example: Troy Parking Garage
• Poisson Distribution Goodness of Fit Test
In studying the need for an additional entrance
to a city parking garage, a consultant has
recommended an approach that is applicable only
in situations where the number of cars entering
during a specified time period follows a Poisson
distribution.
A random sample of 100 one-minute time intervals
resulted in the customer arrivals listed below. A
statistical test must be conducted to see if the
assumption of a Poisson distribution is reasonable.
# Arrivals    0    1    2    3    4    5    6    7    8    9   10   11   12
Frequency     0    1    4   10   14   20   12   12    9    8    6    3    1
• 1. Hypotheses
H0: Number of cars entering the garage during a one-minute interval is Poisson distributed.
Ha: Number of cars entering the garage during a one-minute interval is not Poisson distributed.
• 2. Goodness of Fit Test → χ² Test
• 3. α = .05
• 4. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
• 5. Calculation
– Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of μ = 600/100 = 6
Hence,
$$f(x) = \frac{6^x e^{-6}}{x!}$$
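The estimate of the Poisson mean and the resulting probability function can be sketched as follows, assuming the arrival frequencies from the sample of 100 one-minute intervals. The name `poisson_pmf` is an illustrative choice.

```python
import math

# Number of one-minute intervals (value) observed with each arrival count (key).
frequency = {0: 0, 1: 1, 2: 4, 3: 10, 4: 14, 5: 20, 6: 12,
             7: 12, 8: 9, 9: 8, 10: 6, 11: 3, 12: 1}

total_periods = sum(frequency.values())
total_arrivals = sum(x * f for x, f in frequency.items())
mu = total_arrivals / total_periods  # estimated mean arrivals per minute

def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

print(total_arrivals, total_periods, mu)
print(round(poisson_pmf(0, mu), 4), round(poisson_pmf(4, mu), 4))
```

Multiplying each probability by the 100 sampled intervals gives the expected frequency for that arrival count.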
– Expected Frequencies
x     f(x)     nf(x)          x       f(x)      nf(x)
0    .0025      .25           7      .1389      13.89
1    .0149     1.49           8      .1041      10.41
2    .0446     4.46           9      .0694       6.94
3    .0892     8.92          10      .0417       4.17
4    .1339    13.39          11      .0227       2.27
5    .1620    16.20          12      .0155       1.55
6    .1606    16.06          Total  1.0000     100.00
– Observed and Expected Frequencies
i               fi      ei      fi - ei
0 or 1 or 2      5     6.20     -1.20
3               10     8.92      1.08
4               14    13.39       .61
5               20    16.20      3.80
6               12    16.06     -4.06
7               12    13.89     -1.89
8                9    10.41     -1.41
9                8     6.94      1.06
10 or more      10     7.99      2.01
– Test Statistic
$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(2.01)^2}{7.99} = 3.42$$
– Rejection Rule
With α = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f. (where k =
number of categories and p = number of population
parameters estimated), $\chi^2_{.05} = 14.07$
Reject H0 if χ² > 14.07
• 6. Conclusion
We cannot reject H0. There’s no reason to doubt
the assumption of a Poisson distribution.
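The test statistic and decision can be sketched as follows, assuming the nine combined categories and the expected frequencies exactly as tabulated in the slides. Probabilities recomputed from the Poisson formula may differ slightly from the tabulated values, so this sketch uses the table as given.

```python
# Categories: "0 or 1 or 2", 3, 4, 5, 6, 7, 8, 9, "10 or more"
observed = [5, 10, 14, 20, 12, 12, 9, 8, 10]
expected = [6.20, 8.92, 13.39, 16.20, 16.06, 13.89, 10.41, 6.94, 7.99]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1   # k - p - 1, with one estimated parameter (the mean)
critical_value = 14.07       # chi-square table value, alpha = .05, 7 d.f.
reject_h0 = chi2 > critical_value
print(round(chi2, 2), df, reject_h0)
```

Combining the low and high arrival counts keeps every expected frequency at 5 or more, which the chi-square approximation requires.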