Cross Tabulation and Chi Square Test for Independence

Cross-tabulation
• Helps answer questions about whether two or more variables of interest are linked:
– Is the type of mouthwash user (heavy or light) related to gender?
– Is the preference for a certain flavor (cherry or lemon) related to the geographic region (north, south, east, west)?
– Is income level associated with gender?
• Cross-tabulation determines association, not causality.

Dependent and Independent Variables
• The variable being studied is called the dependent variable or response variable.
• A variable that influences the dependent variable is called an independent variable.

Cross-tabulation
• Cross-tabulation of two or more variables is possible if the variables are discrete:
– The frequency of one variable is subdivided by the categories of the other variable.
• Generally a cross-tabulation table has:
– Row percentages
– Column percentages
– Total percentages
• Which one is better? It depends on which variable is considered independent.

Cross tabulation
GROUPINC * Gender Crosstabulation

GROUPINC            Statistic            Female    Male      Total
income <= 5         Count                10        9         19
                    % within GROUPINC    52.6%     47.4%     100.0%
                    % within Gender      55.6%     18.8%     28.8%
                    % of Total           15.2%     13.6%     28.8%
5 < income <= 10    Count                5         25        30
                    % within GROUPINC    16.7%     83.3%     100.0%
                    % within Gender      27.8%     52.1%     45.5%
                    % of Total           7.6%      37.9%     45.5%
income > 10         Count                3         14        17
                    % within GROUPINC    17.6%     82.4%     100.0%
                    % within Gender      16.7%     29.2%     25.8%
                    % of Total           4.5%      21.2%     25.8%
Total               Count                18        48        66
                    % within GROUPINC    27.3%     72.7%     100.0%
                    % within Gender      100.0%    100.0%    100.0%
                    % of Total           27.3%     72.7%     100.0%

Contingency Table
• A contingency table shows the joint distribution of two discrete variables.
• This distribution represents the probability of observing a case in each cell.
– Probability is calculated as:
$$P = \frac{\text{Observed cases}}{\text{Total cases}}$$

Chi-square Test for Independence
• The Chi-square test for independence determines whether two variables are associated or not.
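The three kinds of percentages in the GROUPINC * Gender crosstabulation can be recomputed from the cell counts alone. The following is a minimal sketch, not the procedure used to produce the original table; the function name `crosstab_percentages` and the dictionary layout are assumptions made for illustration.

```python
# Observed counts from the GROUPINC * Gender crosstabulation (rows: income group).
counts = {
    "income <= 5": {"Female": 10, "Male": 9},
    "5 < income <= 10": {"Female": 5, "Male": 25},
    "income > 10": {"Female": 3, "Male": 14},
}


def crosstab_percentages(counts):
    """Return row, column, and total percentages for a two-way table of counts.

    Hypothetical helper for illustration; keys of each result are (row, column).
    """
    row_totals = {r: sum(cells.values()) for r, cells in counts.items()}
    col_totals = {}
    for cells in counts.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand_total = sum(row_totals.values())

    row_pct, col_pct, total_pct = {}, {}, {}
    for r, cells in counts.items():
        for c, n in cells.items():
            row_pct[(r, c)] = 100 * n / row_totals[r]    # % within GROUPINC
            col_pct[(r, c)] = 100 * n / col_totals[c]    # % within Gender
            total_pct[(r, c)] = 100 * n / grand_total    # % of Total
    return row_pct, col_pct, total_pct


row_pct, col_pct, total_pct = crosstab_percentages(counts)
cell = ("income <= 5", "Female")
print(f"{row_pct[cell]:.1f}% within GROUPINC")
print(f"{col_pct[cell]:.1f}% within Gender")
print(f"{total_pct[cell]:.1f}% of Total")
```

For the first cell this should reproduce the 52.6%, 55.6%, and 15.2% shown in the table. The "% of Total" values divided by 100 are the cell probabilities P = observed cases / total cases.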
H0: Two variables are independent
H1: Two variables are not independent
Chi-square test results are unstable if any cell's expected count is lower than 5.

Chi-Square Test
Estimated cell frequency:
$$E_{ij} = \frac{R_i C_j}{n}$$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
$n$ = sample size
$E_{ij}$ = estimated cell frequency

Chi-Square statistic
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell

Degrees of Freedom
$$d.f. = (R - 1)(C - 1)$$

Awareness of Tire Manufacturer's Brand
(each cell shows observed/expected frequency)

            Men      Women    Total
Aware       50/39    10/21    60
Unaware     15/26    25/14    40
Total       65       35       100

Chi-Square Test: Differences Among Groups Example
$$\chi^2 = \frac{(50 - 39)^2}{39} + \frac{(10 - 21)^2}{21} + \frac{(15 - 26)^2}{26} + \frac{(25 - 14)^2}{14}$$
$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$
$$d.f. = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$$
$\chi^2$ with 1 d.f. at .05: critical value = 3.84

Chi-square Test for Independence
• Under H0, the test statistic approximately follows the Chi-square distribution ($\chi^2$).
[Figure: chi-square distribution with the critical value at 3.84; the computed $\chi^2$ = 22.16 falls in the rejection region, so H0 is rejected.]

Differences Between Groups when Comparing Means
• Ratio scaled dependent variables
• t-test
– When groups are small
– When population standard deviation is unknown
• z-test
– When groups are large

Null Hypothesis About Mean Differences Between Groups
$$\mu_1 = \mu_2 \quad \text{OR} \quad \mu_1 - \mu_2 = 0$$

t-Test for Difference of Means
$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{Variability of random means}}$$
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$
$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1 - \bar{X}_2}$ = the pooled or combined standard error of difference between means.
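The tire-brand awareness chi-square example earlier in this section can be checked numerically. The sketch below computes the expected frequencies from the row and column totals, then the statistic and its degrees of freedom. The function name `chi_square_independence` is a hypothetical helper, and the critical value 3.84 is taken from the slides rather than computed.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts).

    `observed` is a list of rows of observed frequencies.
    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows: Aware, Unaware. Columns: Men, Women.
observed = [[50, 10], [15, 25]]
chi2, df, expected = chi_square_independence(observed)

CRITICAL_05_DF1 = 3.84  # .05 critical value for 1 d.f., as quoted in the slides

print("expected:", expected)
print("chi-square:", round(chi2, 3), "d.f.:", df)
print("reject H0" if chi2 > CRITICAL_05_DF1 else "fail to reject H0")
```

The expected count for unaware men comes out as 40 × 65 / 100 = 26, which is the value used in the worked calculation.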
Pooled Estimate of the Standard Error
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
$n_1$ = the sample size of Group 1
$n_2$ = the sample size of Group 2

Degrees of Freedom
• d.f. = n - k
• where:
– n = n1 + n2
– k = number of groups

t-Test for Difference of Means Example
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(20)(2.1)^2 + (13)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$
$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$

Comparing Two Groups when Comparing Proportions
• Percentage Comparisons
• Sample Proportion - $p$
• Population Proportion - $\pi$

Differences Between Two Groups when Comparing Proportions
The hypothesis is:
$$H_0: \pi_1 = \pi_2$$
may be restated as:
$$H_0: \pi_1 - \pi_2 = 0$$

Z-Test for Differences of Proportions
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$
$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the difference of proportions

Z-Test for Differences of Proportions
$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$\bar{p}$ = pooled estimate of proportion of successes in a sample of both groups
$\bar{q}$ = $(1 - \bar{p})$, or a pooled estimate of proportion of failures in a sample of both groups
$n_1$ = sample size for group 1
$n_2$ = sample size for group 2
$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

A Z-Test for Differences of Proportions
$$\bar{p} = \frac{(100)(.35) + (100)(.4)}{100 + 100} = .375$$
$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$

Analysis of Variance
Hypothesis when comparing three groups
$$\mu_1 = \mu_2 = \mu_3$$

Analysis of Variance F-Ratio
$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

Analysis of Variance Sum of Squares
$$SS_{total} = SS_{within} + SS_{between}$$

Analysis of Variance Sum of Squares Total
$$SS_{total} = \sum_{i=1}^{n}\sum_{j=1}^{c}(X_{ij} - \bar{X})^2$$
$X_{ij}$ = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{X}$ = grand mean
$n$ = number of all observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Within
$$SS_{within} = \sum_{i=1}^{n}\sum_{j=1}^{c}(X_{ij} - \bar{X}_j)^2$$
$X_{ij}$ = individual scores, i.e., the ith observation or test unit in the jth group
$\bar{X}_j$ = mean of the jth group
$n$ = number of all observations or test units in a group
$c$ = number of groups (or columns)

Analysis of Variance Sum of Squares Between
$$SS_{between} = \sum_{j=1}^{c} n_j(\bar{X}_j - \bar{X})^2$$
$\bar{X}_j$ = mean of the jth group
$\bar{X}$ = grand mean
$n_j$ = number of all observations or test units in the jth group

Analysis of Variance Mean Squares Between
$$MS_{between} = \frac{SS_{between}}{c - 1}$$

Analysis of Variance Mean Square Within
$$MS_{within} = \frac{SS_{within}}{cn - c}$$

Analysis of Variance F-Ratio
$$F = \frac{MS_{between}}{MS_{within}}$$

A Test Market Experiment on Pricing
Sales in Units (thousands)

                          Regular Price    Reduced Price    Cents-Off Coupon
                          $.99             $.89             Regular Price
Test Market A, B, or C    130              145              153
Test Market D, E, or F    118              143              129
Test Market G, H, or I    87               120              96
Test Market J, K, or L    84               131              99
Mean                      104.75           134.75           119.25
Grand Mean                119.58

ANOVA Summary Table
Source of Variation
• Between groups
– Sum of squares: SSbetween
– Degrees of freedom: c-1, where c = number of groups
– Mean square: MSbetween = SSbetween/(c-1)
• Within groups
– Sum of squares: SSwithin
– Degrees of freedom: cn-c, where c = number of groups, n = number of observations in a group
– Mean square: MSwithin = SSwithin/(cn-c)
• Total
– Sum of squares: SStotal
– Degrees of freedom: cn-1, where c = number of groups, n = number of observations in a group

$$F = \frac{MS_{between}}{MS_{within}}$$
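The sums of squares, mean squares, and F-ratio defined above can be traced through the test market pricing data. This is a minimal sketch under stated assumptions: the function name `one_way_anova` and the returned dictionary keys are hypothetical, and the within-groups degrees of freedom are computed as N - c (total observations minus groups), which equals cn - c when every group has n observations, as it does here.

```python
def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a list of groups of observations.

    Hypothetical helper for illustration; returns a dict of intermediate values.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # SS_between: group size times squared distance of group mean from grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # SS_within: squared distance of each score from its own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    # SS_total: squared distance of each score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1               # c - 1
    df_within = len(all_values) - len(groups)  # N - c (= cn - c for equal groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "group_means": group_means,
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Sales in units (thousands), one list per pricing treatment.
regular_price = [130, 118, 87, 84]
reduced_price = [145, 143, 120, 131]
cents_off_coupon = [153, 129, 96, 99]

result = one_way_anova([regular_price, reduced_price, cents_off_coupon])
print("group means:", result["group_means"])
print("grand mean:", round(result["grand_mean"], 2))
print("SS between:", round(result["ss_between"], 2))
print("SS within:", round(result["ss_within"], 2))
print("F:", round(result["f"], 2))
```

The group means and grand mean should match the 104.75, 134.75, 119.25, and 119.58 in the table, and SS between plus SS within should equal SS total up to floating-point rounding.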