Controlling for a third variable: Examples (with one exception) from the NES data

• Note: The example of a spurious relationship I gave in class was incorrect. It was incorrect because education was not the cause of race. (The expression “duh” comes to mind here.)
• The substantive point of work on this subject was that lower participation on the part of African Americans could be explained by their lower levels of education; i.e., the explanation was not race per se.
• That point is correct. But Professor Powell’s point, that education was an intervening variable, was also correct.
• The example of a spurious relationship that follows is corrected. (Same as before; different variables.)

Example: Spurious relationship
• Spurious relationships, where the original relationship more or less completely disappears when you control for a third variable, are quite rare.
• So, for this one example, I’m going to show you hypothetical numbers.

Spurious relationship (cont.)
• Hypothesis: People with low levels of political efficacy participate less than those with high efficacy.
• But this effect could be explained by lower levels of education, so controlling for education would make the relationship between efficacy and participation disappear.
• That is, the relationship between efficacy and participation is spurious.

The simple (uncontrolled) relationship might look like this. Tau-c = -.33

              High efficacy   Low efficacy
Low part.          65%            75%
Med. part.         25%            20%
High part.         10%             5%

The difference is not large, but it clearly shows that those with low efficacy participate less.

When we control for education, we might get something like this. (Columns are high and low efficacy; rows are low, medium, and high participation; entries are percentages.)

Low education, tau-c = -.05
         Hi    Lo
Low      73    78
Md       22    18
Hi        5     4

Medium education, tau-c = .01
         Hi    Lo
Low      68    67
Md       23    25
Hi        9     8

High education, tau-c = .04
         Hi    Lo
Low      45    47
Md       40    40
Hi       15    13

Spurious relationship (cont.)
• How do we interpret the results?
• As strong evidence that the original relationship was spurious, i.e., that the difference in participation rates between more and less efficacious people was due to their differing education levels.
• Question: Does this mean that people with low efficacy actually participate as much as those with high efficacy?

Spurious relationship (cont.)
• No. The original relationship is not wrong. But it is misleading if left unexplained.
• A sensible interpretation is that those with low efficacy lag in education and that is why they participate less. They evidently do not participate less because of various psychological or political motivations that are associated with low efficacy.

Example: Conditional (specification) relationship
• Hypothesis: People’s overall liberal-conservative views (judged by their self-placement) influence (cause) their feelings of attachment to the political parties (measured by their three-point party identification).
• However, this relationship is likely not to be as strong for African Americans (who, overall, very often consider themselves Democrats).

Conditional relationship (cont.)
• So, let’s first look at the relationship between lib-cons self-placement and partisanship (uncontrolled).

Party ID: 3 categories * Self plcmnt lib-con 3 cats Crosstabulation
% within Self plcmnt lib-con 3 cats

                   1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat           52.8%        37.4%          22.1%          34.3%
2. Independent        41.7%        55.1%          35.0%          38.7%
3. Republican          5.5%         7.5%          42.9%          27.0%
Total                100.0%       100.0%         100.0%         100.0%

Symmetric Measures (Ordinal by Ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .383          .019                 19.410           .000
Kendall's tau-c     .343          .018                 19.410           .000
N of Valid Cases    1608
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Conditional relationship (cont.)
• Now, let’s control for race (blacks vs. whites only).
Party ID: 3 categories * Self plcmnt lib-con 3 cats * Race: White/Black Crosstabulation
% within Self plcmnt lib-con 3 cats

Race: 1. White
                   1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat           49.8%        31.3%          17.6%          29.6%
2. Independent        44.0%        59.0%          34.1%          39.2%
3. Republican          6.2%         9.6%          48.3%          31.2%
Total                100.0%       100.0%         100.0%         100.0%

Race: 2. Black
                   1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat           80.6%        69.2%          59.2%          69.6%
2. Independent        18.1%        30.8%          35.5%          27.3%
3. Republican          1.4%       (blank)          5.3%           3.1%
Total                100.0%       100.0%         100.0%         100.0%

Symmetric Measures (Ordinal by Ordinal)

Race       Measure             Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
1. White   Kendall's tau-b     .418          .021                 19.170           .000
           Kendall's tau-c     .371          .019                 19.170           .000
           N of Valid Cases    1263
2. Black   Kendall's tau-b     .217          .072                  2.970           .003
           Kendall's tau-c     .163          .055                  2.970           .003
           N of Valid Cases    161
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Conditional relationship (cont.)
• Summary of relationships: Self-placement x party (tau-b)

Overall   Whites   Blacks
  .38       .42      .22

Conditional relationship (cont.)
• How do we interpret the results?
• As strong support for the hypothesis: both that people’s liberal-conservative views influence their partisanship and that this relationship is conditioned by race (i.e., is stronger for whites than blacks).
• How do we know that? Because when we control, the tau-b (.38) is strengthened for whites (.42) and considerably reduced (.22) for blacks.

Conditional relationship (cont.)
• Note that this conclusion does not say that African Americans are more often Democratic, though that is true.
• Rather, it says that the relationship between liberal-conservative views and partisanship is weaker for blacks. Presumably, blacks’ ideological views play less of a role (than for whites) in determining their partisanship.
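
A minimal sketch of what "controlling" means computationally: compute tau-c for the whole table, then again within each category of the control variable, and compare. The counts below are hypothetical (the slides report only percentages and the SPSS statistics), and the function names are illustrative assumptions, not anything SPSS exposes. The formula is Stuart's tau-c, 2m(P - Q) / (n^2 (m - 1)), where P and Q are the numbers of concordant and discordant pairs and m is the smaller of the number of rows and columns.

```python
from itertools import product


def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs in an ordered r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i, j in product(range(n_rows), range(n_cols)):
        # Compare cell (i, j) with every cell in a later row.
        for k, l in product(range(i + 1, n_rows), range(n_cols)):
            if l > j:      # higher on both variables: concordant
                p += table[i][j] * table[k][l]
            elif l < j:    # higher on one, lower on the other: discordant
                q += table[i][j] * table[k][l]
    return p, q


def tau_c(table):
    """Stuart's tau-c for an ordered contingency table given as a list of rows."""
    p, q = concordant_discordant(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return 2 * m * (p - q) / (n ** 2 * (m - 1))


# HYPOTHETICAL counts: rows = party ID (Dem, Ind, Rep),
# columns = self-placement (Lib, Mod, Con), one table per control category.
group_1 = [[40, 20, 5], [15, 30, 15], [5, 20, 40]]
group_2 = [[30, 25, 20], [8, 10, 10], [2, 3, 4]]

# The uncontrolled table is the cell-by-cell sum of the two groups.
pooled = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(group_1, group_2)]

print(f"uncontrolled tau-c: {tau_c(pooled):.2f}")
print(f"group 1 tau-c:      {tau_c(group_1):.2f}")
print(f"group 2 tau-c:      {tau_c(group_2):.2f}")
```

If the within-group values differ markedly from each other, as they do for these made-up counts, that is the pattern of a conditional (specification) relationship.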
Example: Intervening variable
• Hypothesis: Partisanship has a very strong effect on who one votes for.
• But it has this effect because partisanship causes people to have very different views of issues and candidates, which, in turn, influence the vote.
• That is, issue and candidate views intervene between partisanship and voting choices.

Intervening variable (cont.)
• So, let’s first look at the relationship between partisanship and vote choice (uncontrolled).

Vote: Gore or Bush * Party ID: 3 categories Crosstabulation
% within Party ID: 3 categories

          1. Democrat   2. Independent   3. Republican   Total
Gore         94.2%          47.1%             7.2%        52.7%
Bush          5.8%          52.9%            92.8%        47.3%
Total       100.0%         100.0%           100.0%       100.0%

Symmetric Measures (Ordinal by Ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .675          .016                 43.153           .000
Kendall's tau-c     .777          .018                 43.153           .000
N of Valid Cases    1114
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Intervening variable (cont.)
• Now, let’s control for Clinton’s handling of the economy. (Yes, I’m aware that we’re looking at 2000 and Gore was the Dem. candidate. But how Clinton handled the economy might still have been important.)

Vote: Gore or Bush * Party ID: 3 categories * Clinton's handling of economy Crosstabulation
% within Party ID: 3 categories

Clinton's handling of economy: 1. Approve strongly
          1. Democrat   2. Independent   3. Republican   Total
Gore         96.0%          68.9%            19.7%        79.3%
Bush          4.0%          31.1%            80.3%        20.7%
Total       100.0%         100.0%           100.0%       100.0%

Clinton's handling of economy: 2. Approve not strongly
          1. Democrat   2. Independent   3. Republican   Total
Gore         94.0%          48.0%             7.5%        46.7%
Bush          6.0%          52.0%            92.5%        53.3%
Total       100.0%         100.0%           100.0%       100.0%

Clinton's handling of economy: 4. Disapprove not strongly
          1. Democrat   2. Independent   3. Republican   Total
Gore         70.0%          19.2%             6.0%        17.4%
Bush         30.0%          80.8%            94.0%        82.6%
Total       100.0%         100.0%           100.0%       100.0%

Clinton's handling of economy: 5. Disapprove strongly
          1. Democrat   2. Independent   3. Republican   Total
Gore         63.6%          10.9%             1.2%         9.3%
Bush         36.4%          89.1%            98.8%        90.7%
Total       100.0%         100.0%           100.0%       100.0%

Tau-c for the four categories of handling of the economy

Approve strongly           .47
Approve not strongly       .72
Disapprove not strongly    .34
Disapprove strongly        .24

Intervening variable (cont.)
• How do we interpret the results?
• First, as (pretty) strong support for the hypothesis (there is that messy .72). It appears as if views of Clinton’s handling of the economy intervene between partisanship and voting choices.
• How do we know that? Because when we control, the tau-c’s are (with one exception) considerably reduced from the original value of .78 and because of our theory.

Intervening variable (cont.)
• Second, partisanship makes some difference above and beyond that of Clinton’s handling of the economy.
• How do we know that? The tau-c values are still quite large, even after controlling. (Also, look at the percentages in each of the sub-tables.)

Intervening variable (cont.)
• Third, it’s another example of a conditional relationship. Note that (again, with the exception of the one category), as Clinton approval goes down, the relationship between partisanship and the vote is weaker.
• NOTE: This does not simply mean that fewer people voted for Gore, though this is true, but that the relationship between partisanship and the vote was weaker.

Intervening variable (cont.)
• What do we do with that pesky .72?
• Don’t totally ignore it: the results are not perfect. Reality isn’t always simple.
• If possible, try to explain why the “oddity” occurs. (In this case, I think it would be very difficult.) Try to find support for any explanation you come up with.
• Don’t over-interpret; i.e., don’t come up with some unlikely, unsupported explanation.

Intervening variable (cont.)
• One thing you should do is to look at the n (number of cases) underlying the odd result. (Here I’ve suppressed the n’s simply to make things big enough to read.)
• In this case, the n is not small. Good try, but it doesn’t work.
• We’re left with (as is often the case) a good, but not perfect (or perfectly explicable), analysis and interpretation.

Example: Antecedent variable
• Hypothesis: Interest in the campaign causes people to be more informed about politics.
• But education is an antecedent variable; i.e., education causes people to be interested in politics and thus, indirectly, is a cause of knowledge.
• If this sounds rather like the intervening variable case, it should (as you will see).

Antecedent variable (cont.)
• What I’m going to do is simply follow the steps outlined by Professor Powell (next slide).

Using a third variable to find an antecedent cause:
[Diagram: a -> b (+), extended to c -> a -> b (+, +)]
a causes b, but we can learn more by finding that a is caused by c.
Here we start with: a -> b
We ascertain: c -> a and c -> b
With… a
Then we identify a as intervening by predicting b with c and controlling for a. To the extent the relationship is attenuated by the control, c is antecedent and works through a.

Example: Antecedent variable
• So, let’s first look at the relationship between campaign interest and political knowledge (uncontrolled). (a & b)
• For convenience of presentation, I’ve collapsed the six-item knowledge scale into three categories. Generally, I would not do this. I would normally prefer to have more, rather than fewer, categories (especially in my dependent variable).

knowl2 * Campaign interest Crosstabulation
% within Campaign interest

          Campaign interest
knowl2     High      Low      Total
1.00       37.4%     69.1%    43.2%
2.00       48.3%     29.0%    44.8%
3.00       14.2%      1.9%    12.0%
Total     100.0%    100.0%   100.0%

Symmetric Measures (Ordinal by Ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     -.246         .022                -10.270           .000
Kendall's tau-c     -.208         .020                -10.270           .000
N of Valid Cases    1424
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Antecedent variable (cont.)
• Now we need to see if education is related to interest in the campaign. (c & a)
• So, we crosstab education and interest.

Campaign interest * Education: 3 categories Crosstabulation
% within Education: 3 categories

Campaign interest   1. Less than HS   2. HS     3. More than HS   Total
High                     65.0%        70.5%         84.0%         78.2%
Low                      35.0%        29.5%         16.0%         21.8%
Total                   100.0%       100.0%        100.0%        100.0%

Symmetric Measures (Ordinal by Ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     -.174         .023                 -7.240           .000
Kendall's tau-c     -.148         .020                 -7.240           .000
N of Valid Cases    1800
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Antecedent variable (cont.)
• Note that we have a tau-c here of .15 (the output says -.15, but effectively, it’s a positive relationship: interest is listed High first, then Low, so more education going with more interest comes out with a negative sign).
• Two steps left.
• We next check to see if education is related to knowledge. (c & b)

know2 * Education: 3 categories Crosstabulation
% within Education: 3 categories

know2     1. Less than HS   2. HS     3. More than HS   Total
1.00           78.3%        59.7%         32.0%         43.1%
2.00           19.2%        35.2%         52.0%         44.8%
3.00            2.5%         5.1%         16.0%         12.0%
Total         100.0%       100.0%        100.0%        100.0%

Symmetric Measures (Ordinal by Ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .305          .022                 13.344           .000
Kendall's tau-c     .250          .019                 13.344           .000
N of Valid Cases    1421
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Antecedent variable (cont.)
• Education and knowledge are related (tau-c = .25). Note: If you are really quick-eyed, you will note that I used a tau-c here on a 3x3 table. I did so for consistency.
• Finally, we again look at the relationship between education and knowledge, but now controlling for interest.
(c & b, controlling for a)

know2 * Education: 3 categories * Campaign interest Crosstabulation
% within Education: 3 categories

Campaign interest: High
know2     1. Less than HS   2. HS     3. More than HS   Total
1.00           73.2%        55.4%         27.5%         37.4%
2.00           23.2%        38.5%         54.3%         48.3%
3.00            3.7%         6.1%         18.2%         14.3%
Total         100.0%       100.0%        100.0%        100.0%

Campaign interest: Low
know2     1. Less than HS   2. HS     3. More than HS   Total
1.00           89.5%        72.3%         60.3%         69.0%
2.00           10.5%        25.5%         37.3%         29.1%
3.00          (blank)        2.1%          2.4%          1.9%
Total         100.0%       100.0%        100.0%        100.0%

Symmetric Measures (Ordinal by Ordinal)

Campaign interest   Measure             Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
High                Kendall's tau-b     .296          .025                 11.389           .000
                    Kendall's tau-c     .235          .021                 11.389           .000
                    N of Valid Cases    1163
Low                 Kendall's tau-b     .200          .053                  3.666           .000
                    Kendall's tau-c     .155          .042                  3.666           .000
                    N of Valid Cases    258
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Antecedent variable (cont.)
• How do we interpret the results?
• As some support for the hypothesis. It appears as if education is an antecedent cause of the relationship between campaign interest and political knowledge.
• How do we know that? Because, when we controlled, the tau-c’s were reduced from the original value of .25 (to .24 for high interest and .16 for low interest) and because of our theory.

Some cautionary notes
• Be careful about using a control variable with too many categories (or recode so there aren’t too many categories).

Vote: Gore or Bush * Clinton's handling of economy Crosstabulation
% within Clinton's handling of economy

          1. Approve   2. Approve      4. Disapprove   5. Disapprove
          strongly     not strongly    not strongly    strongly        Total
Gore        79.2%         46.7%           17.4%            9.2%        54.3%
Bush        20.8%         53.3%           82.6%           90.8%        45.7%
Total      100.0%        100.0%          100.0%          100.0%       100.0%

I took this relationship and then, unthinkingly, controlled for education, with 7 categories. (Education is not a very sensible control here theoretically, but set that aside.)
Cautionary notes (cont.)
• There’s a lot of variation in the tau-c values, but no sensible pattern.
• Some of the variability may be caused by small numbers of cases.

Tau-c by education category:
8 grades or less     .45
9-11 grades          .64
Grad. high school    .58
Some college         .48
Community college    .84
BA degree            .57
Advanced degree      .53

• Similarly, in an earlier example of a specified relationship, I looked only at liberal-conservative views x partisanship for blacks and whites, not for all races. That’s because my theory told me what to expect for these two groups, not for others.
• Which leads to the next point.

Cautionary notes (cont.)
• Theory/reasoning is important.
• What it makes sense to control for, and what the interpretation is once you’ve controlled, depends heavily on your reasoning about what causes what.
• In particular, whether you have an intervening variable or an antecedent variable isn’t determined simply by the tables you run (or the measures you calculate).

An explanatory point
• As shown in the text (pp. 87-92), the same sort of reasoning (about kinds of relationships) is applicable when you have interval-level variables and use means instead of crosstabs.

A look forward
• It might occur to you to ask about controlling for more than one variable.
• Good thought. We will. But we generally do not do it by using crosstabs, for obvious reasons of complexity and interpretability.
• We will get into this soon by looking at correlation and regression.

Data Analysis #2
Due one week from today (by individuals, not pairs)
• Directions are on the syllabus.
• Reminders (unnecessary for most of you):
  Do not simply give us SPSS tables.
  Do create tables with meaningful labels, only the entries that are necessary, and so on.
  Explain your results. (More than “yes, my hypothesis is supported.”)
• Usually c. 3 pp. (double-spaced) + tables.
  Tables should go on a separate page.
• Writing is important. Use clear, straightforward prose. Proper grammar; correct spelling, punctuation, and capitalization; typo-free.