Cross Tabulation and Chi Square Test for Independence

Cross-tabulation
• Helps answer questions about whether two
or more variables of interest are linked:
– Is the type of mouthwash user (heavy or light)
related to gender?
– Is the preference for a certain flavor (cherry or
lemon) related to the geographic region (north,
south, east, west)?
– Is income level associated with gender?
• Cross-tabulation reveals association, not
causality.
Dependent and Independent Variables
• The variable being studied is called the
dependent variable or response variable.
• A variable that influences the dependent
variable is called an independent variable.
Cross-tabulation
• Cross-tabulation of two or more variables is
possible if the variables are discrete:
– The frequency of one variable is subdivided by the
other variable categories.
• Generally a cross-tabulation table has:
– Row percentages
– Column percentages
– Total percentages
• Which one is better?
It depends on which variable is treated as
independent: percentages are usually computed within
the categories of the independent variable.
Cross tabulation
GROUPINC * Gender Crosstabulation

GROUPINC           Statistic            Female    Male      Total
income <= 5        Count                10        9         19
                   % within GROUPINC    52.6%     47.4%     100.0%
                   % within Gender      55.6%     18.8%     28.8%
                   % of Total           15.2%     13.6%     28.8%
5 < income <= 10   Count                5         25        30
                   % within GROUPINC    16.7%     83.3%     100.0%
                   % within Gender      27.8%     52.1%     45.5%
                   % of Total           7.6%      37.9%     45.5%
income > 10        Count                3         14        17
                   % within GROUPINC    17.6%     82.4%     100.0%
                   % within Gender      16.7%     29.2%     25.8%
                   % of Total           4.5%      21.2%     25.8%
Total              Count                18        48        66
                   % within GROUPINC    27.3%     72.7%     100.0%
                   % within Gender      100.0%    100.0%    100.0%
                   % of Total           27.3%     72.7%     100.0%
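The three kinds of percentage in the GROUPINC * Gender crosstab can be reproduced from the raw counts alone. The following is a minimal sketch in plain Python; the function name `crosstab_percentages` and the dictionary layout are illustrative assumptions, not part of the original slides.

```python
# Counts from the GROUPINC * Gender crosstab.
# Each row holds the counts for (Female, Male).
counts = {
    "income <= 5": [10, 9],
    "5 < income <= 10": [5, 25],
    "income > 10": [3, 14],
}


def crosstab_percentages(counts):
    """Return row, column, and total percentages for a table of counts.

    Hypothetical helper for illustration only.
    """
    rows = list(counts)
    n_cols = len(next(iter(counts.values())))
    row_totals = {r: sum(counts[r]) for r in rows}
    col_totals = [sum(counts[r][j] for r in rows) for j in range(n_cols)]
    n = sum(row_totals.values())

    # Row percentage: cell count divided by its row total.
    row_pct = {r: [100 * v / row_totals[r] for v in counts[r]] for r in rows}
    # Column percentage: cell count divided by its column total.
    col_pct = {
        r: [100 * v / col_totals[j] for j, v in enumerate(counts[r])]
        for r in rows
    }
    # Total percentage: cell count divided by the sample size.
    tot_pct = {r: [100 * v / n for v in counts[r]] for r in rows}
    return row_pct, col_pct, tot_pct


row_pct, col_pct, tot_pct = crosstab_percentages(counts)
for r in counts:
    print(r, [round(p, 1) for p in row_pct[r]],
          [round(p, 1) for p in col_pct[r]],
          [round(p, 1) for p in tot_pct[r]])
```

Row percentages correspond to "% within GROUPINC", column percentages to "% within Gender", and total percentages to "% of Total".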
Contingency Table
• A contingency table shows the joint
distribution of two discrete variables.
• This distribution gives the probability
of observing a case in each cell.
– Probability is calculated as:
$P = \frac{\text{Observed cases}}{\text{Total cases}}$
Chi-square Test for Independence
• The Chi-square test for independence
determines whether two variables are
associated or not.
H0: Two variables are independent
H1: Two variables are not independent
Chi-square test results are unreliable if any expected cell count is lower than 5.
Chi-Square Test
Estimated cell frequency:
$E_{ij} = \frac{R_i C_j}{n}$
Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
Eij = estimated cell frequency
Chi-square statistic:
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell
Degrees of
Freedom
d.f.=(R-1)(C-1)
Awareness of Tire
Manufacturer’s Brand
(each cell shows observed/expected frequency)

             Men      Women    Total
Aware        50/39    10/21    60
Unaware      15/26    25/14    40
Total        65       35       100
Chi-Square Test: Differences Among
Groups Example
$\chi^2 = \frac{(50 - 39)^2}{39} + \frac{(10 - 21)^2}{21} + \frac{(15 - 26)^2}{26} + \frac{(25 - 14)^2}{14}$
$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$
$d.f. = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
Critical value of χ² with 1 d.f. at the .05 level = 3.84
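The expected frequencies, the chi-square statistic, and the degrees of freedom for the tire-brand awareness example can be checked with a short script. This is a sketch in plain Python; the function name `chi_square_independence` is a hypothetical helper, and the observed counts (50, 10, 15, 25) are the ones used in the calculation above.

```python
def chi_square_independence(observed):
    """Chi-square statistic for a table of observed counts.

    Hypothetical helper: expected frequencies are E_ij = R_i * C_j / n,
    and d.f. = (R - 1)(C - 1).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count for each cell under independence.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over all cells.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows: Aware, Unaware. Columns: Men, Women.
observed = [[50, 10], [15, 25]]
chi2, df, expected = chi_square_independence(observed)
print("expected:", expected)
print("chi-square:", round(chi2, 3), "d.f.:", df)
```

Because the computed statistic exceeds the 3.84 critical value, the sketch leads to the same decision as the hand calculation: reject H0.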
Chi-square Test for Independence
• Under H0, the test statistic approximately
follows the chi-square distribution (χ²).
[Figure: chi-square distribution with the critical value 3.84 marked; the computed χ² = 22.16 lies beyond it, so reject H0.]
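For the special case of 1 degree of freedom, the upper-tail probability of the chi-square distribution can be obtained from the standard library, since a chi-square variable with 1 d.f. is the square of a standard normal variable. The sketch below relies on that identity; the function name `chi2_p_value_1df` is an illustrative assumption, and the approach does not extend to other degrees of freedom.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail probability P(X > chi2) for chi-square with 1 d.f.

    Valid for 1 d.f. only: P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


p_critical = chi2_p_value_1df(3.84)    # tail area at the critical value
p_observed = chi2_p_value_1df(22.16)   # tail area at the computed statistic
print("tail area at 3.84:", round(p_critical, 3))
print("reject H0 at .05:", p_observed < 0.05)
```

The tail area at 3.84 is close to .05, which is why 3.84 serves as the critical value at that significance level.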
Differences Between Groups
when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is
unknown
• z-test
– When groups are large
Null Hypothesis About Mean
Differences Between Groups
$\mu_1 = \mu_2$
OR
$\mu_1 - \mu_2 = 0$
t-Test for Difference of Means
$t = \frac{\text{mean 1} - \text{mean 2}}{\text{variability of random means}}$
t-Test for Difference of Means
$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$
$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1 - \bar{X}_2}$ = the pooled or combined standard error
of difference between means.
Pooled Estimate of the
Standard Error
$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
$n_1$ = the sample size of Group 1
$n_2$ = the sample size of Group 2
Degrees of Freedom
• d.f. = n - k
• where:
– n = n1 + n2
– k = number of groups
t-Test for Difference of Means
Example
$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$
$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$
Comparing Two Groups when
Comparing Proportions
• Percentage Comparisons
• Sample Proportion - P
• Population Proportion - π
Differences Between Two Groups
when Comparing Proportions
The hypothesis is:
$H_0: \pi_1 = \pi_2$
may be restated as:
$H_0: \pi_1 - \pi_2 = 0$
Z-Test for Differences of
Proportions
$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$
Z-Test for Differences of
Proportions
$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1
minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the
difference of proportions
Z-Test for Differences of
Proportions
$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Z-Test for Differences of
Proportions
$\bar{p}$ = pooled estimate of proportion of successes in a
sample of both groups
$\bar{q}$ = $(1 - \bar{p})$, or a pooled estimate of proportion of
failures in a sample of both groups
$n_1$ = sample size for group 1
$n_2$ = sample size for group 2
Z-Test for Differences of
Proportions
$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
Z-Test for Differences of
Proportions
$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$
A Z-Test for Differences of
Proportions
$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$
Analysis of Variance
Hypothesis when comparing three groups
$\mu_1 = \mu_2 = \mu_3$
Analysis of Variance
F-Ratio
$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$
Analysis of Variance
Sum of Squares
$SS_{total} = SS_{within} + SS_{between}$
Analysis of Variance
Sum of SquaresTotal
$SS_{total} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{\bar{X}})^2$
Analysis of Variance
Sum of Squares
$X_{ij}$ = individual scores, i.e., the ith observation or
test unit in the jth group
$\bar{\bar{X}}$ = grand mean
n = number of observations or test units in a
group
c = number of groups (or columns)
Analysis of Variance
Sum of SquaresWithin
$SS_{within} = \sum_{i=1}^{n} \sum_{j=1}^{c} (X_{ij} - \bar{X}_j)^2$
Analysis of Variance
Sum of SquaresWithin
$X_{ij}$ = individual scores, i.e., the ith observation or
test unit in the jth group
$\bar{X}_j$ = mean of the jth group
n = number of observations or test units in a
group
c = number of groups (or columns)
Analysis of Variance
Sum of Squares Between
$SS_{between} = \sum_{j=1}^{c} n_j (\bar{X}_j - \bar{\bar{X}})^2$
Analysis of Variance
Sum of squares Between
$\bar{X}_j$ = mean of the jth group
$\bar{\bar{X}}$ = grand mean
$n_j$ = number of observations or test units in the
jth group
Analysis of Variance
Mean Squares Between
$MS_{between} = \frac{SS_{between}}{c - 1}$
Analysis of Variance
Mean Square Within
$MS_{within} = \frac{SS_{within}}{cn - c}$
Analysis of Variance
F-Ratio
$F = \frac{MS_{between}}{MS_{within}}$
A Test Market Experiment
on Pricing
Sales in Units (thousands)

                          Regular Price   Reduced Price   Cents-Off Coupon
                          $.99            $.89            Regular Price
Test Market A, B, or C    130             145             153
Test Market D, E, or F    118             143             129
Test Market G, H, or I    87              120             96
Test Market J, K, or L    84              131             99
Mean                      X1=104.75       X2=134.75       X3=119.25
Grand Mean                X=119.58
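The sums of squares, mean squares, and F-ratio for the test market data can be worked through in a few lines. This is a sketch in plain Python applying the formulas above to the twelve sales figures; the function name `one_way_anova` and the dictionary keys are illustrative assumptions.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of groups of observations.

    Hypothetical helper; with equal group sizes n, the within-groups
    d.f. (N - c) equals cn - c.
    """
    c = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: group size times squared deviation of each
    # group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within groups: squared deviation of each score from its group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    # Total: squared deviation of each score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    ms_between = ss_between / (c - 1)
    ms_within = ss_within / (len(all_values) - c)
    return {
        "group_means": group_means,
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Columns of the test market table: regular price, reduced price,
# cents-off coupon.
sales = [
    [130, 118, 87, 84],
    [145, 143, 120, 131],
    [153, 129, 96, 99],
]
result = one_way_anova(sales)
for name, value in result.items():
    print(name, value)
```

The printed group means and grand mean can be compared with the Mean and Grand Mean rows of the table, and the sums of squares satisfy SStotal = SSwithin + SSbetween.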
ANOVA Summary Table
Source of Variation
• Between groups
• Sum of squares
– SSbetween
• Degrees of freedom
– c-1 where c=number of groups
• Mean square: MSbetween
– SSbetween/(c-1)
ANOVA Summary Table
Source of Variation
• Within groups
• Sum of squares
– SSwithin
• Degrees of freedom
– cn-c where c=number of groups, n= number of
observations in a group
• Mean square: MSwithin
– SSwithin/(cn-c)
ANOVA Summary Table
Source of Variation
• Total
• Sum of Squares
– SStotal
• Degrees of Freedom
– cn-1 where c=number of groups, n= number of
observations in a group
$F = \frac{MS_{between}}{MS_{within}}$