Cross-Tabulations

Cross-Tabs
The level of measurement used for cross-tabulations is mostly nominal. Even when continuous variables such as age and income are used, they are converted to categorical variables, and in that conversion important information (variation) is lost.

Data Types
Data
• Numerical (Quantitative)
– Discrete
– Continuous
• Categorical (Qualitative)
Prentice-Hall

Categorical Data
• Categorical random variables yield responses that classify
– Example: Gender (female, male)
• Measurement reflects the number in each category
• Nominal or ordinal scale
– Examples: Did you attend a community college? Do you live on-campus or off-campus?

Why Be Concerned about Categorical Random Variables?
• Survey data tend to be categorical: hot/comfortable/cold, sunny/cloudy/fog/rain, yes/no
• Know the limitations: the test does not reveal the nature of the relationship or causality
• Widely used in marketing for decision-making

Cross-Tabs
The chi-square (χ²) statistic is used to test the null hypothesis. [Unfortunately, chi-square, like many other statistics that indicate statistical significance, tells us nothing about the magnitude of the relation.]

χ² Test of Independence
• Shows whether a relationship exists between two categorical variables
– One sample is drawn
– Does not show the nature of the relationship
– Does not show causality
• Used widely in marketing
• Uses a contingency table

Critical Value
What is the critical χ² value if the table has 2 rows and 3 columns and α = .05?
df = (2 − 1)(3 − 1) = 2
If fo = fe in every cell, χ² = 0 and H0 is not rejected; H0 is rejected only when χ² falls in the upper-tail rejection region of area α = .05.

χ² Table (Portion), upper-tail area
DF    .995     …    .95      …    .05
1     …        …    0.004    …    3.841
2     0.010    …    0.103    …    5.991

Critical value: χ² = 5.991 (df = 2, α = .05)

χ² Test of Independence: Hypotheses & Statistic
• Hypotheses
– H0: Variables are not dependent (independent)
– H1: Variables are dependent (related)
• Test statistic
$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$
where $f_o$ is the observed frequency and $f_e$ is the expected frequency
• Degrees of freedom: (r − 1)(c − 1), where r is the number of rows and c the number of columns

χ² Test of Independence: Expected Frequencies
• Statistical independence means the joint probability equals the product of the marginal probabilities
– P(A and B) = P(A)·P(B)
• Compute the marginal probabilities
• Multiply them for the joint probability
• Expected frequency is the sample size times the joint probability

χ² Test of Independence: An Example
You're a marketing research analyst. You ask a random sample of 286 consumers whether they purchase Diet Pepsi or Diet Coke. At the 0.05 level of significance, is there evidence of a relationship?

                Diet Pepsi
Diet Coke       No     Yes    Total
No              84      32      116
Yes             48     122      170
Total          132     154      286

Expected Frequencies
$f_e = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$, with fe ≥ 1 in all cells

                          Diet Pepsi
                 No                  Yes
Diet Coke    Obs.    Exp.        Obs.    Exp.      Total
No            84     53.5         32     62.5       116
Yes           48     78.5        122     91.5       170
Total        132    132          154    154         286

Expected values: 132·116/286 = 53.5, 154·116/286 = 62.5, 132·170/286 = 78.5, 154·170/286 = 91.5 (rounded to one decimal)

χ² Test of Independence: Computation
Cell    fo     fe      fo − fe    (fo − fe)²    (fo − fe)²/fe
1,1     84     53.5    +30.5      930.25        17.3879
1,2     32     62.5    −30.5      930.25        14.8840
2,1     48     78.5    −30.5      930.25        11.8503
2,2     122    91.5    +30.5      930.25        10.1667
Total   286    286                              54.2889

χ² Test of Independence: Solution
H0: Not dependent
H1: Dependent
α = .05
df = (2 − 1)(2 − 1) = 1
Critical value: 3.841 (reject H0 if χ² > 3.841)
Test statistic: χ² = Σ (fo − fe)²/fe over all cells = 54.2889
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship.

Cross-Tabs
Please provide the requested information by checking one response in each category.
What is your:
age              ___ < 18       ___ 18–26         ___ > 26
gender           ___ male       ___ female
course load      ___ < 6 units  ___ 6–12 units    ___ > 12 units
gpa              ___ < 2.0      ___ 2.0–2.5       ___ 2.6–3.0    ___ 3.1–3.5    ___ > 3.5
annual income    ___ < $15k     ___ $15k–$40k     ___ > $40k

Cross-Tabs
The responses are coded and entered in the file student.sf: the first response in each category is recorded as 1, the second as 2, and so on.

Cross-Tabs
This hypothesis test is generally referred to as a test of dependence. The researcher wishes to determine whether the variables are dependent, that is, whether they exhibit a relationship.

Cross-Tabs
Let's investigate whether a relationship exists between a student's gpa and units attempted.
H0: GPA and UNITS are not dependent
H1: GPA and UNITS are dependent

Cross-Tabs
Chi-Square Test
Chi-Square    Df    P-Value
3.67          8     0.8853

Cross-Tabs
p-value = 0.8853 > α = 0.05: retain H0. Thus, there is no evidence that GPA and UNITS are dependent.
[Based on our data, there is no evidence to support the concept that a relationship exists between gpa and units attempted.]

Cross-Tabs
Let's investigate whether a relationship exists between a student's age and units attempted.
H0: AGE and UNITS are not dependent
H1: AGE and UNITS are dependent

Cross-Tabs
Chi-Square Test
Chi-Square    Df    P-Value
9.89          4     0.0423

Cross-Tabs
p-value = 0.0423 < α = 0.05: reject H0. Thus, AGE and UNITS are dependent.
[Based on our data, there is sufficient evidence to support the concept that a relationship exists between age and units attempted.]
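The calculations in these slides can be checked with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the method used by the software that produced the output shown: `chi2_independence` and `chi2_sf` are hypothetical helper names, and the p-value is computed from the power series of the regularized incomplete gamma function using only the standard library. Applied to the Diet Pepsi/Diet Coke counts, it gives χ² ≈ 54.15 rather than 54.2889, because the slide rounds each expected frequency to one decimal before computing; the decision (reject H0) is unchanged. The AGE by UNITS counts are taken from the frequency table that follows. The GPA p-value is recomputed from the rounded statistic 3.67, so it only approximately matches 0.8853.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) of a chi-square distribution with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated with its power series.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of the series: sum of z^n / (s(s+1)...(s+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (s + n)
        total += term
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def chi2_independence(observed):
    """Hypothetical helper: chi-square test of independence for an r x c table of counts.

    Returns (statistic, df, p_value, expected).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of (fo - fe)^2 / fe over all cells
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Diet Pepsi / Diet Coke example: rows = Diet Coke (No, Yes), columns = Diet Pepsi (No, Yes)
cola = [[84, 32], [48, 122]]
stat, df, p, expected = chi2_independence(cola)
print([[round(fe, 1) for fe in row] for row in expected])  # [[53.5, 62.5], [78.5, 91.5]]
print(round(stat, 2), df, stat > 3.841)                    # 54.15 1 True

# AGE by UNITS counts (rows: <18, 18-26, >26; columns: <6, 6-12, >12 units)
age_units = [[10, 19, 17], [24, 22, 16], [24, 50, 18]]
stat2, df2, p2, _ = chi2_independence(age_units)
print(round(stat2, 2), df2, round(p2, 4))                  # 9.89 4 0.0423

# GPA by UNITS: only the rounded statistic is available, so the p-value is approximate
print(round(chi2_sf(3.67, 8), 3))                          # 0.886
```

For df = 2 the upper-tail area reduces to exp(−x/2), which is why the tabled critical value at α = .05 equals −2 ln(0.05) ≈ 5.991.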
Cross-Tabs
Frequency Table for AGE by UNITS (each cell: count, then column percentage)

                         UNITS
AGE            < 6        6–12       > 12       Row Total
< 18           10         19         17         46
               17.24%     20.88%     33.33%     23.00%
18–26          24         22         16         62
               41.38%     24.18%     31.37%     31.00%
> 26           24         50         18         92
               41.38%     54.95%     35.29%     46.00%
Column Total   58         91         51         200
               29.00%     45.50%     25.50%     100.00%

Questions?

ANOVA