Crosstabs & Measures of Association

POL242, October 9 and 11, 2012
Jennifer Hove

Questions of Causality
Recall:
- Most causal thinking in the social sciences is probabilistic, not deterministic: as X increases, the probability of Y increases; X does not invariably produce Y
- We can observe only association (per Hume)
- We must therefore infer causation
- There is not one cause, but many possible causes

Inferring Causal Relations
1. There must be association
   - X → Y; ~X → ~Y
2. Time order must be considered
   - The presumed cause should precede the presumed effect
3. Must rule out possible rival explanations
   - Sometimes what appears to be a strong relationship between two variables is due to the influence of others
4. Must be able to identify the process by which one factor brings about change in another
   - Causal linkage

Establishing Association
- With nominal or ordinal data, relationships are usually presented in tabular form
- Why? Hypotheses rest on the core idea of comparison
  - Ex: if we compare respondents on the basis of their value on the IV, say party identification, they should also differ on the DV, say support for gay rights
- Crosstabs are a wonderful means of making comparisons
  - “God speaks to you through crosstabs!”

Using/Interpreting Crosstabs

Table 1: Support for the Afghan Mission by Perceived Impact of Taliban Resurgence, 2007

                     Fear of Taliban Resurgence
Support for                                        All
Afghan Mission       Low          High             Respondents
Low                  86.1%        52.7%            60.4%
                     (173)        (355)            (528)
High                 13.9         47.3             39.6
                     (28)         (318)            (346)
Total                100          100              100
(N)                  (201)        (673)            (874)

Tau-b = .29
Source: Strategic Counsel, CTV/Globe and Mail Survey, July 2007

- Data are arranged in side-by-side frequency distributions
- The IV (X) is presented across the top of the table, in columns
  - If ordinal, arrange from low scores (on the left) to high scores (on the right)
- The DV (Y) is presented down the left-hand side of the table, in rows
  - Again, if ordinal, arrange from low (at the top) to high (at the bottom)
- Data are presented so that the categories of the IV add to 100%
  - Percentage within categories of the IV (down the table)
- Comparisons are made across categories of the IV
  - From left to right
  - To see the effect of the IV on the DV

Rules (!) of Crosstabs
1. Make the IV define the columns and the DV define the rows of the table
2. Always percentage down within categories of the IV
3. Interpret the relationship by comparing across columns, within rows of the table

Example: 2 x 2 Crosstab
Support for Y Variable by Support for X Variable

                     Score on X Variable
Score on Y Variable  Low          High
Low                  A            C                A+C
High                 B            D                B+D
                     A+B          C+D

Diagonals
- Main diagonal: running to the right and down (cells A and D)
  - When a larger proportion of cases falls on the main diagonal, the relationship is said to be direct or positive
  - Low values on X are associated with low values on Y; high values on X are associated with high values on Y
- Off diagonal: running to the right and up (cells B and C)
  - When a larger proportion of cases falls on the off diagonal, the relationship is said to be inverse or negative
  - Low values on X are associated with high values on Y; high values on X are associated with low values on Y

Explaining Variation in Y
- Relationships between variables in the social sciences are rarely, if ever, perfectly predictable
- You are unlikely to see something like this:

Support for Y Variable by Support for X Variable

                     Score on X Variable
Score on Y Variable  Low          High
Low                  100%         0
High                 0            100%
Total                100          100

- There is likely to be more than one explanation or “cause” behind the variation in Y
- So we will generally be looking at:
  - X1 → Y
  - X2 → Y
- To compare, we want to know the relative strength of each relationship
- A variety of summary terms called measures of association are used

Measures of Association
- Compress the information that appears in a crosstab into a single number by summarizing:
  - Magnitude (strength) of the relationship
  - Direction of the relationship
- Magnitude: ranges from 0 (completely unpredictable) to 1 (perfectly predictable)
- Direction: positive (+) = cases primarily on the main diagonal; negative (-) = cases primarily on the off diagonal

Two Cautionary Notes
- Direction is not useful with nominal-level variables, since they are not ordered/ranked from low to high
- Even with ordinal measurement, interpretation of direction depends entirely on how your variables are coded
  - You should always code your variables so that high scores indicate “more” of what you want to explain

Direction & Strength
- Combining direction and strength, we get a range of possibilities:

  -1.0  -.8  -.6  -.4  -.2  0  +.2  +.4  +.6  +.8  +1.0

- All intermediary values can also occur, e.g. -.2367
- Note that equivalent positive and negative scores are equal in strength
  - Ex: +.4 and -.4 are equal in strength; they differ only in direction

Choosing among Measures
We use different measures of association for 2 main reasons:
1. There are different levels of measurement
   - Ordinal measurement offers ranking information used to calculate association, which isn't available with nominal data
2. Some measures are specific to tables of certain sizes and shapes
   - Specific measures for 2 x 2 tables; others for larger square tables; still others for rectangular tables

Phi (Φ)
- Use with dichotomous variables, 2 x 2 tables
- Applies to nominal and ordinal data
- Measures the strength of a relationship by taking the # of cases on the main diagonal minus the # of cases on the off diagonal (adjusting for the marginal distribution of cases, i.e. the sums of the columns and rows)

$$\Phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$

Examples: Phi (Φ)

                     Score on X Variable
Score on Y Variable  Low          High
Low                  75%          10%
High                 25%          90%
Total                100          100

Φ = .6

                     Score on X Variable
Score on Y Variable  Low          High
Low                  50%          20%
High                 50%          80%
Total                100          100

Φ = .2

Cramer's V
- An extension of Phi
- The logic of Cramer's V is based on percentage differences across the columns, not on the logic of diagonals
- Use with nominal data, when tables are larger than 2 x 2

Lambda
- Lambda (λ) is another measure of association for nominal data
- Its rationale of “percentage of improvement” or “proportional reduction in error” is relatively easy to explain
- Not recommended in this course
  - When the modal category of each column is in the same row, λ = 0

Measures of Association: Ordinal Data
- Measures include Tau-b, Tau-c and Gamma
- They rely on analysis of diagonals

                     Support for X
Support for Y        Low          Med          High
Low                  a            b            c
Med                  d            e            f
High                 g            h            i

Mind your Ps and Qs
- The letter P indicates the # of pairs of cases on the main diagonals (from left to right)
- The letter Q indicates the # of pairs of cases on the off diagonals (from right to left)
- If P > Q, we have a positive association
- If P < Q, we have a negative association
- The core calculation = P - Q

Gamma
- The information in P and Q can be used to calculate Gamma (γ)

$$\gamma = \frac{P - Q}{P + Q}$$

- Problems:
  - Any vacant cell produces a score of 1.0
  - Tends to overstate the strength of a relationship

Tau-b and Tau-c
- Preferable to Gamma, though built on the same logic of diagonals
- Tend to produce results similar to phi (using nominal data) or to the most important interval measure (r), to be discussed later in the year

$$\text{Tau-b} = \frac{P - Q}{\sqrt{(P + Q + X)(P + Q + Y)}}$$

  (X and Y here are counts of tied pairs: pairs tied on only one of the two variables)

- Tau-b never quite reaches 1.0 in non-square tables
- So Tau-c was developed for use with rectangular tables
- In practice, the difference between Tau-b and Tau-c when applied to the same table is not great, but keep the distinction above in mind

Example

Table 2: Approval of President Chavez by Opinion of the United States, 2007

                     Opinion of the United States
Approval of                                                      All
Chavez               Very Bad     Bad          Good         Very Good    Respondents
Disapprove           12.7%        22.8%        43.4%        67.9%        35.6%
                     (26)         (64)         (171)        (110)        (371)
Approve              87.3         77.2         56.6         32.1         64.4
                     (178)        (217)        (223)        (52)         (670)
Total                100          100          100          100          100
(N)                  (204)        (281)        (394)        (162)        (1041)

Tau-c: -.39   Tau-b: -.35
Source: Latinobarometer, 2007 (Venezuelan respondents only)

Summing Up
- With nominal data, use Phi or Cramer's V
  - Phi is used for 2 x 2 tables
  - Cramer's V is used for any other crosstab involving nominal data
  - Avoid Lambda
- With ordinal data, use Tau-c or Tau-b
  - Tau-b is used for square tables: 3 x 3, 4 x 4, etc.
  - Tau-c is used for rectangular tables
  - Avoid Gamma
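As a rough illustration of the rules and measures covered in these slides, the sketch below percentages Table 1 down within categories of the IV and then computes Phi, Gamma and Tau-b from the cell counts of Table 1 and Table 2. It is only a sketch under stated assumptions: the function names (column_percentages, pair_counts, gamma, tau_b, phi) are made up for this example, and the X and Y terms in the Tau-b formula are assumed to be the pairs tied on only one of the two variables, which is the usual textbook definition rather than something the slides spell out.

```python
from math import sqrt

# Cell counts copied from Table 1 and Table 2. Rows are the DV (low to high),
# columns are the IV (low to high); the "All Respondents" column is left out.
TABLE_1 = [[173, 355],   # low support for the Afghan mission
           [28, 318]]    # high support for the Afghan mission
TABLE_2 = [[26, 64, 171, 110],    # disapprove of Chavez
           [178, 217, 223, 52]]   # approve of Chavez


def column_percentages(table):
    """Percentage down within each category of the IV (each column)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[round(100 * cell / col_totals[j], 1) for j, cell in enumerate(row)]
            for row in table]


def pair_counts(table):
    """Count pairs of cases: P (main-diagonal direction), Q (off-diagonal
    direction), pairs tied on X only (same column, different rows) and
    pairs tied on Y only (same row, different columns)."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = ties_x = ties_y = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i, n_rows):
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if k > i and m > j:
                        p += pairs          # lower row, column to the right
                    elif k > i and m < j:
                        q += pairs          # lower row, column to the left
                    elif k > i and m == j:
                        ties_x += pairs     # same X category, different Y
                    elif k == i and m > j:
                        ties_y += pairs     # same Y category, different X
    return p, q, ties_x, ties_y


def gamma(table):
    p, q, _, _ = pair_counts(table)
    return (p - q) / (p + q)


def tau_b(table):
    # Assumption: the X and Y terms of the slide formula are the tied pairs.
    p, q, ties_x, ties_y = pair_counts(table)
    return (p - q) / sqrt((p + q + ties_x) * (p + q + ties_y))


def phi(table):
    """Phi for a 2 x 2 table laid out as on the slides (A, C / B, D)."""
    (a, c), (b, d) = table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


print(column_percentages(TABLE_1))                       # column percentages of Table 1
print(round(phi(TABLE_1), 2), round(tau_b(TABLE_1), 2))  # 0.29 0.29
print(round(gamma(TABLE_1), 2))                          # 0.69
print(round(tau_b(TABLE_2), 2))                          # -0.35
```

On Table 1, Phi and Tau-b agree at about .29, the value reported under the table, while Gamma comes out near .69 on the same counts, which illustrates the point that Gamma tends to overstate the strength of a relationship. On Table 2 the sketch gives a Tau-b of about -.35, in line with the reported figure.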
