Cross Tabs

Cross-Tabulations
Cross-Tabs
The level of measurement used in cross-tabulations is mostly nominal. Even when
continuous variables (such as age and income) are used, they are converted to
categorical variables.
When continuous variables are converted to categorical variables, important
information (variation) is lost.
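A minimal Python sketch of that information loss, assuming the age categories used in the student survey later in this section (< 18, 18 - 26, > 26). The ages are hypothetical, and `age_category` is an illustrative helper name, not part of any package.

```python
# Hypothetical ages (illustrative only, not taken from the survey data)
ages = [17, 19, 22, 26, 27, 45]

def age_category(age: float) -> str:
    """Bin an age into the survey's three categories: < 18, 18 - 26, > 26."""
    if age < 18:
        return "<18"
    if age <= 26:
        return "18-26"
    return ">26"

categories = [age_category(a) for a in ages]
print(categories)  # ['<18', '18-26', '18-26', '18-26', '>26', '>26']

# 19, 22 and 26 collapse into "18-26"; 27 and 45 both become ">26".
# The variation within each category can no longer be recovered.
distinct_ages = len(set(ages))               # 6 distinct values
distinct_categories = len(set(categories))   # 3 distinct values
print(distinct_ages, distinct_categories)    # 6 3
```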
Data Types
Data
• Categorical (Qualitative)
• Numerical (Quantitative)
  – Discrete
  – Continuous
Prentice-Hall
Categorical Data
• Categorical random variables yield responses that classify
  – Example: Gender (female, male)
• Measurement reflects the number of responses in each category
• Nominal or ordinal scale
  – Examples
    • Did you attend a community college?
    • Do you live on-campus or off-campus?
Why Be Concerned about Categorical Random Variables?
• Survey data tend to be categorical: hot/comfortable/cold,
  sunny/cloudy/fog/rain, yes/no, …
• Know the limitations: a cross-tab does not reveal
  – the nature of the relationship
  – causality
• Widely used in marketing for decision-making
Cross-Tabs
The chi-square (χ²) statistic is used to test the null hypothesis that the
two variables are independent.
[Unfortunately, chi-square, like many other statistics that indicate
statistical significance, tells us nothing about the magnitude (strength)
of the relation.]
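To see why χ² says nothing about magnitude, a minimal sketch (hypothetical counts, illustrative helper name `chi_square`) computes the statistic for a table and for the same table with every count tripled. The proportions, and therefore the strength of the relation, are identical, yet χ² triples, because for fixed proportions the statistic grows in proportion to the sample size.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected count under independence
            stat += (fo - fe) ** 2 / fe
    return stat

# Hypothetical table (illustrative only): a 60/40 vs 40/60 split
small = [[30, 20], [20, 30]]
large = [[3 * x for x in row] for row in small]   # same proportions, 3x the sample

print(round(chi_square(small), 2))  # 4.0
print(round(chi_square(large), 2))  # 12.0
```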
χ² Test of Independence
• Shows whether a relationship exists between two categorical variables
  – One sample is drawn
  – Does not show the nature of the relationship
  – Does not show causality
• Used widely in marketing
• Uses a contingency table
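Critical Value
What is the critical χ² value if the table has 2 rows and 3 columns and α = .05?
df = (2 - 1)(3 - 1) = 2

χ² Table (Portion), Upper Tail Area
DF    .995    …    .95      …    .05
1     …       …    0.004    …    3.841
2     0.010   …    0.103    …    5.991

[Figure: χ² distribution with the upper-tail "Reject" region (area α = .05) to the right of the critical value and the "Do not reject H0" region to its left. If fo = fe in every cell, χ² = 0.]

Critical value: χ² = 5.991 (df = 2, α = .05).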
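The table lookup can be reproduced in code. With SciPy one would typically call `scipy.stats.chi2.ppf(1 - alpha, df)`; the sketch below instead assumes only the Python standard library. It computes the upper-tail area through the regularized incomplete gamma function, stepping up two degrees of freedom at a time from the closed forms for df = 1 and df = 2, and then inverts that area by bisection. Both helper names are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms: df = 2 gives exp(-y); df = 1 gives erfc(sqrt(y)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_critical(alpha: float, df: int) -> float:
    """Value c with P(X > c) = alpha, found by bisection (chi2_sf decreases in x)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical(0.05, 2), 3))  # 5.991
print(round(chi2_critical(0.05, 1), 3))  # 3.841
```

The other entries in the table portion (for example 0.103 for df = 2 at an upper-tail area of .95) come from the same function with a different `alpha`.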
χ² Test of Independence
Hypotheses & Statistic
• Hypotheses
  – H0: Variables are not dependent
  – H1: Variables are dependent (related)
• Test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

  where f_o = observed frequency and f_e = expected frequency
• Degrees of freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns
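A minimal sketch of the test statistic and its degrees of freedom for an r × c table given as a list of rows (the helper name is hypothetical). The usage line uses a made-up 2 × 3 table whose rows are proportional, so every fo equals its fe, giving χ² = 0 with df = (2 - 1)(3 - 1) = 2, as in the critical-value example.

```python
from typing import List, Tuple

def chi_square_test_statistic(observed: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Expected counts assume independence: fe = row total * column total / grand total.
    """
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            fe = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - fe) ** 2 / fe   # sum over all cells
    return stat, (r - 1) * (c - 1)

# Hypothetical 2 x 3 table with proportional rows, so fo = fe everywhere
stat, df = chi_square_test_statistic([[10, 20, 30], [20, 40, 60]])
print(stat, df)  # 0.0 2
```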
χ² Test of Independence
Expected Frequencies
• Statistical independence means the joint probability equals the product of
  the marginal probabilities
  – P(A and B) = P(A)·P(B)
• Compute the marginal probabilities
• Multiply them to get the joint probability
• The expected frequency is the sample size times the joint probability
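The three steps can be sketched directly, using the marginal totals from the Diet Pepsi / Diet Coke example that follows (the dictionary keys are illustrative labels). Multiplying n by P(A)·P(B) = (row total / n)(column total / n) reduces to row total × column total / grand total.

```python
# Marginal totals from the Diet Pepsi / Diet Coke example (labels are illustrative)
row_totals = {"Coke: No": 116, "Coke: Yes": 170}
col_totals = {"Pepsi: No": 132, "Pepsi: Yes": 154}
n = 286

expected = {}
for row, row_total in row_totals.items():
    p_row = row_total / n                    # marginal probability P(A)
    for col, col_total in col_totals.items():
        p_col = col_total / n                # marginal probability P(B)
        p_joint = p_row * p_col              # P(A and B) if A and B are independent
        expected[(row, col)] = n * p_joint   # expected frequency = n x joint probability

for cell, fe in expected.items():
    print(cell, round(fe, 1))
# ('Coke: No', 'Pepsi: No') 53.5
# ('Coke: No', 'Pepsi: Yes') 62.5
# ('Coke: Yes', 'Pepsi: No') 78.5
# ('Coke: Yes', 'Pepsi: Yes') 91.5
```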
χ² Test of Independence
An Example
You’re a marketing research analyst. You ask a random sample of 286 consumers
whether they purchase Diet Pepsi and whether they purchase Diet Coke. At the
0.05 level of significance, is there evidence of a relationship?

                  Diet Pepsi
Diet Coke     No      Yes     Total
No            84      32      116
Yes           48      122     170
Total         132     154     286
Expected Frequencies

$$\text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
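Expected Frequencies
fe ≥ 1 in all cells

                      Diet Pepsi
                No              Yes
Diet Coke     Obs.   Exp.     Obs.   Exp.     Total
No            84     53.5     32     62.5     116
Yes           48     78.5     122    91.5     170
Total         132    132      154    154      286

Expected frequencies (rounded to one decimal):
Diet Coke No, Diet Pepsi No:    132·116/286 = 53.5
Diet Coke No, Diet Pepsi Yes:   154·116/286 = 62.5
Diet Coke Yes, Diet Pepsi No:   132·170/286 = 78.5
Diet Coke Yes, Diet Pepsi Yes:  154·170/286 = 91.5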
χ² Test of Independence

Cell    fo     fe      fo - fe    (fo - fe)²    (fo - fe)²/fe
1,1     84     53.5    +30.5      930.25        17.3879
1,2     32     62.5    -30.5      930.25        14.8840
2,1     48     78.5    -30.5      930.25        11.8503
2,2     122    91.5    +30.5      930.25        10.1667
Total   286    286                              54.2889
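The cell-by-cell arithmetic can be checked with a short sketch (the helper name and its `round_fe` parameter are hypothetical). With unrounded expected frequencies the statistic is about 54.15; rounding each fe to one decimal, as the table does, reproduces the 54.29 total. Either way the statistic is far above the critical value, so the decision is unchanged.

```python
observed = [[84, 32],    # Diet Coke: No  (Diet Pepsi: No, Yes)
            [48, 122]]   # Diet Coke: Yes (Diet Pepsi: No, Yes)

def chi_square(observed, round_fe=None):
    """Chi-square statistic; optionally round each fe to `round_fe` decimals first."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            if round_fe is not None:
                fe = round(fe, round_fe)   # mimic the one-decimal fe in the table
            total += (fo - fe) ** 2 / fe
    return total

print(round(chi_square(observed), 2))              # 54.15 (unrounded fe)
print(round(chi_square(observed, round_fe=1), 2))  # 54.29 (fe rounded as in the table)
```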
χ² Test of Independence
H0: Not dependent
H1: Dependent
α = .05
df = (2 - 1)(2 - 1) = 1
Critical value: 3.841 (reject H0 if χ² > 3.841)

Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 54.2889$$

Decision: Reject H0 at α = .05.
Conclusion: There is evidence of a relationship between purchasing Diet Pepsi
and purchasing Diet Coke.
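The decision rule can be sketched as follows. The critical value 3.841 is taken from the χ² table; the p-value relies on the fact that, for df = 1, the upper-tail area of the χ² distribution is erfc(√(x/2)), which Python's `math.erfc` provides.

```python
import math

alpha = 0.05
critical_value = 3.841   # chi-square table, df = 1, upper-tail area .05
chi2_stat = 54.2889      # test statistic from the worked example

# For df = 1, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2.0))

reject = chi2_stat > critical_value
print("Reject H0" if reject else "Do not reject H0")  # Reject H0
print(p_value < alpha)                                 # True
```

Comparing the statistic with the critical value and comparing the p-value with α always lead to the same decision; the p-value simply shows how far past the cutoff the result lies.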
Cross-Tabs
Please provide the requested information by checking one option in each category.
What is your:
age             ___ < 18        ___ 18 - 26         ___ > 26
gender          ___ male        ___ female
course load     ___ < 6 units   ___ 6 - 12 units    ___ > 12 units
gpa             ___ < 2.0   ___ 2.0 - 2.5   ___ 2.6 - 3.0   ___ 3.1 - 3.5   ___ > 3.5
annual income   ___ < $15k      ___ $15k - $40k     ___ > $40k
Cross-Tabs
The responses are coded and entered in the file student.sf: the first option
in each category is recorded as 1, the second as 2, and so on.
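A sketch of how coded responses become a contingency table. The format of student.sf is not described here, so the sketch assumes the records are already loaded as dictionaries; the records, the field names `age` and `units`, and the `crosstab` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical coded responses (illustrative only, not the student.sf data):
# age:   1 = "< 18", 2 = "18 - 26", 3 = "> 26"
# units: 1 = "< 6",  2 = "6 - 12",  3 = "> 12"
responses = [
    {"age": 1, "units": 2},
    {"age": 2, "units": 1},
    {"age": 3, "units": 2},
    {"age": 3, "units": 2},
    {"age": 2, "units": 3},
]

def crosstab(records, row_var, col_var, n_rows, n_cols):
    """Count the records falling in each (row code, column code) cell."""
    counts = Counter((rec[row_var], rec[col_var]) for rec in records)
    # Counter returns 0 for cells that never occur
    return [[counts[(i, j)] for j in range(1, n_cols + 1)]
            for i in range(1, n_rows + 1)]

table = crosstab(responses, "age", "units", 3, 3)
for row in table:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 2, 0]
```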
Cross-Tabs
This hypothesis test is generally referred to as a test of independence:
the researcher wishes to determine whether the variables are dependent,
that is, whether they exhibit a relationship.
Cross-Tabs
Let’s investigate whether a relationship exists between a student’s GPA
and units attempted.
H0: GPA and UNITS are not dependent
H1: GPA and UNITS are dependent
Cross-Tabs
Chi-Square Test
------------------------------------------
Chi-Square        Df        P-Value
------------------------------------------
3.67              8         0.8853
------------------------------------------
Cross-Tabs
p-value = 0.8853 > α = .05, so retain H0;
thus, we cannot conclude that GPA and UNITS are dependent.
[Based on our data, there is no evidence to support the concept that a
relationship exists between GPA and units attempted.]
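The reported p-value can be checked approximately. For an even number of degrees of freedom the χ² upper-tail area has a closed form, so a standard-library sketch suffices (the helper name is hypothetical, and α = .05 is assumed, as in the earlier example). Because the printed statistic 3.67 is itself rounded, the result differs slightly from the reported 0.8853.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail area P(X > x) for a chi-square variable with an even df.

    For even df: exp(-x/2) * sum over k = 0 .. df/2 - 1 of (x/2)**k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total

alpha = 0.05
p = chi2_sf_even_df(3.67, 8)
print(round(p, 3))                                  # 0.886
print("Reject H0" if p < alpha else "Retain H0")    # Retain H0
```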
Cross-Tabs
Let’s investigate whether a relationship exists between a student’s age
and units attempted.
H0: AGE and UNITS are not dependent
H1: AGE and UNITS are dependent
Cross-Tabs
Chi-Square Test
------------------------------------------
Chi-Square        Df        P-Value
------------------------------------------
9.89              4         0.0423
------------------------------------------
Cross-Tabs
p-value = 0.0423 < α = .05, so reject H0;
thus, AGE and UNITS are dependent.
[Based on our data, there is sufficient evidence to support the concept that
a relationship exists between age and units attempted.]
Cross-Tabs
Frequency Table for AGE by UNITS
(each cell shows the count and, below it, the column percentage)

                             UNITS
AGE            <6        6-12      >12       Row Total
-------------------------------------------------------
<18            10        19        17        46
               17.24%    20.88%    33.33%    23.00%
-------------------------------------------------------
18-26          24        22        16        62
               41.38%    24.18%    31.37%    31.00%
-------------------------------------------------------
>26            24        50        18        92
               41.38%    54.95%    35.29%    46.00%
-------------------------------------------------------
Column Total   58        91        51        200
               29.00%    45.50%    25.50%    100.00%
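As a consistency check, the AGE by UNITS counts can be fed back into the χ² formula. This sketch assumes the frequency table holds exactly the data the test used; it recovers the reported statistic, degrees of freedom, p-value, and column percentages. Since df = 4 is even, the closed-form upper-tail area exp(-x/2)(1 + x/2) applies.

```python
import math

# Observed counts from the frequency table (rows: AGE, columns: UNITS)
observed = [
    [10, 19, 17],   # < 18
    [24, 22, 16],   # 18 - 26
    [24, 50, 18],   # > 26
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Column percentages, as printed under each count in the table
col_pct = [[100.0 * observed[i][j] / col_totals[j] for j in range(3)] for i in range(3)]

stat = sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(3) for j in range(3))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# df = 4 is even, so the upper-tail area is exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2.0) * (1.0 + stat / 2.0)

print(round(stat, 2), df, round(p_value, 4))  # 9.89 4 0.0423
print(round(col_pct[0][0], 2))                # 17.24
```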
Questions?