Stairway to heaven or highway to hell? A skeptical view on combining regression analysis and case studies

Draft, 3/27/2008. Paper to be presented at the ECPR Joint Sessions, Rennes, 11-16 April 2008, Workshop on "Methodological Pluralism? Consolidating Political Science Methodology".

Ingo Rohlfing, PhD, Research Associate, Department of Management, Economics and Social Sciences, University of Cologne, rohlfing@wiso.uni-koeln.de

Dr. Peter Starke, Research Associate, Collaborative Research Center 597 "Transformations of the State", University of Bremen, peter.starke@sfb597.uni-bremen.de

1. Introduction

Mixed-method approaches, triangulation, nested design – these are the terms that have been used to describe – and often advocate – methods that combine small-N case study work and large-N statistical techniques of analysis (Symposium, 2007). After years of mutual ignorance, skepticism, or even hostility between the two camps of qualitative and quantitative researchers in political science, a growing number of scholars now take the view that methodological pluralism and, indeed, the combination of various methods is the way forward (see, for example, Bäck and Dumont, 2007; Bennett, 2002; Capoccia and Freeden, 2006; Coppedge, 1999; Lieberman, 2005; Tarrow, 1995). We share this view but will, in this paper, point to a number of problems of 'mixing methods' in practice, and of the combination of process-tracing case studies and multivariate regression in particular. More specifically, this paper addresses several serious problems which affect precisely the case selection procedures that have been advanced in the methodological literature as 'good practice'. In what follows, we concentrate on three research designs in particular: the typical case, the deviant case, and the pathway case. In contrast to what is argued in the literature, we contend that there is no simple way to identify genuine deviant or typical cases.
The same goes for the recently developed pathway case technique (Gerring, 2007b), which largely derives its logic from the more long-standing deviant-case design. The bottom line of our paper is that the model-dependence of case selection should move to the center of the discussion. While there is considerable literature on the model-dependence of statistical results (e.g., Bartels, 1997; Ho, Imai, King and Stuart, 2007; King and Zeng, 2007), there is almost no reflection on how this dependence affects the choice of cases and within-case analyses. On the basis of our criticism of the existing techniques, we suggest a robust case selection technique that allows us to select deviant and typical cases with more confidence. Our point is not to argue that multi-method work is inadvisable per se, but to point to the problems and unresolved issues that may, in some cases, seriously endanger the validity of the causal inferences. Put differently, the whole may not be more, but less than the sum of its parts (Dunning, 2007; Rohlfing, 2008). The paper is structured as follows: We first present what has emerged as something like the 'canon' of case selection in multi-method designs. We particularly focus on mixed-method designs aimed at uncovering spurious empirical relationships and at identifying omitted variables. We then go on to criticize the case selection techniques that have been advised in this context. The deviant case analysis and its more recent 'enhanced version', the pathway case, are at the center of our argument. In the fourth section, we generalize our critique and look at what we call the 'model-dependence' of case selection and its various forms. We present, in section five, a procedure to select 'robust' typical or deviant cases that explicitly takes some of our criticisms into account. The last section concludes.

2. Why and how to select cases in regression analysis?
Before addressing some of the problems of case selection and analysis in mixed-methods designs, we should ask what the main reasons are for using mixed methods in the first place. Why should quantitative large-N techniques – usually multivariate regressions – be supplemented with less technical methods based on a much smaller number of units (cf., Beck, 2006)? Or, conversely, why should qualitative scholars base their case selection on large-N analyses? Don't they already have enough detailed knowledge about 'their' cases to be able to develop a workable research design without having to rely on quantitative methods and their numerous and demanding underlying assumptions (Ebbinghaus, 2005)? A variety of reasons for using mixed methods have been advanced in the literature. For instance, after having carried out an in-depth study of a single case, a scholar may have reason to believe that his or her conclusions can be generalized (cf., Rueschemeyer, 2003). The validity of the small-N conclusions can then simply be tested on a larger number of cases in a regression analysis. An exploratory case study is thus combined with a confirmatory large-N analysis (Lieberman, 2005). In this paper, however, we focus on two specific mixed-methods research designs and the case selection techniques usually associated with them. These are, first, testing for spuriousness and the identification of within-case causal processes and, second, finding omitted variables. On the first count, process tracing can be used to identify the causal processes operating at the within-case level, presuming that a causal effect is underpinned by a causal process (Tarrow, 1995: 472). Indeed, testing for spuriousness and the presence of causal links is one major purpose of process tracing (George and Bennett, 2005: 35; Lieberman, 2005: 444).
The most basic form of spuriousness is present when an X/Y-relationship identified through large-N analysis is non-causal because both X and Y depend on a common background variable Z (Simon, 1954). Process tracing thus looks for the presence and nature of causal pathways linking X and Y and, if there is no causal process, whether a third variable Z causes both X and Y. It might even be the case that X and Y correlate with each other, but there is neither a causal link between them, nor is there an antecedent variable Z accounting for the correlation. An example of this variant of spuriousness can be found in Thomson's (2007) regression analysis of the compliance of 15 European Union (EU) member states with six labor market directives. One independent variable is the degree of centralization of the EU countries. The hypothesis is that an increasing degree of decentralization makes the timely implementation of directives less likely, which is supported by the (large-N) results. However, a qualitative analysis of the directives shows that the sub-federal level was not involved at all in the implementation stage (Falkner, 2007). Thus, the correlation between the decentralization variable and the dependent variable is spurious. If there is a causal process, one can inductively generate hypotheses about how a certain X causes Y (Lijphart, 1971: 692). Alternatively, one can proceed deductively by developing theoretical expectations about causal within-case patterns that are then tested empirically through 'pattern matching', that is, by contrasting theoretical expectations with empirical observations.1 In these variants, process tracing is particularly useful for the analysis of overdetermined causal relationships at the cross-case level, that is, situations in which several theories predict the same X/Y-relationship. Ideally, we have several rival theories and can assume that all, except one, are spurious.
Small-N techniques then help to identify the one theory that causally explains the outcome. Process tracing is suitable for such purposes because rival theories are usually based on rival mechanisms that we can try to uncover at the within-case level through pattern matching.2 For instance, the reciprocal exchange of concessions in international trade can be explained through politicians' concerns about national welfare, national security, and the lobbying of private economic actors (cf., Bhagwati, 2002; Gowa and Mansfield, 1993, 2004). Thus, observing reciprocity does not suggest which of the three approaches really explains trade cooperation. According to the security account, however, the military should play an important role in the trade policy process and the lobbying of economic actors should be irrelevant. The domestic politics explanation makes the opposite predictions, thus allowing one to discriminate clearly between the two theories at the within-case level.3

1 Another purpose of process tracing is to develop comprehensive explanations for a small number of cases (Mahoney and Goertz, 2006), which is the key feature of Comparative Historical Analysis (CHA) (Mahoney and Rueschemeyer, 2003). In terms of causal perspectives, one can say that case studies in regression analysis are X-centered, i.e., one is interested in the effects of certain independent variables. Case studies in the realm of CHA, on the other hand, are Y-centered insofar as they are interested in understanding a specific outcome.

2 Obviously, this view is based on the assumption that causal mechanisms are observable. There is, however, no scholarly agreement on this point (cf., Gerring, 2007c). For the sake of our argument, that issue is not crucial.

3 One important precondition for pattern matching is that theories are sufficiently specified in terms of their underlying causal mechanisms, which is often not the case.
Strictly speaking, in order to do deductive process tracing correctly, we need to 'establish an uninterrupted causal path linking the putative causes to the observed effects, at the appropriate level(s) of analysis as specified by the theory being tested' (George and Bennett, 2005: 222). Spelling out the expected empirical implications of a theory in a fine-grained manner, however, is no trivial task and may itself be based on a number of contestable assumptions. Now, what case(s) should be selected for pattern matching in a mixed-method design? Prima facie, it is argued that one should select a 'typical case' or 'onlier' (Eckstein, 1975: 108; Gerring, 2007a: 91-97). In statistical terms, a typical case is defined by a low residual in a regression model. Statistically typical cases are believed to be theoretically typical too. This means that no omitted variables should be at work and that such cases are best suited to test the within-case implications of one or multiple theories. For example, Lieberman (2003) performs a regression analysis of tax structures and picks South Africa as a typical case for a within-case analysis (South Africa is additionally compared with Brazil, which is an outlier). A second important aim of mixed-method designs is the search for omitted variables. The most prominent case selection technique in this context is the 'deviant case analysis', which is done in an exploratory fashion (Eckstein, 1975: 110). In Arend Lijphart's classic definition (1971: 692), deviant case analyses are 'studies of single cases that are known to deviate from established generalizations'. The aim is 'to uncover relevant additional variables that were not considered previously or to refine the (operational) definitions of some or all of the variables'.4 This mode of research is at present probably more widely acknowledged as a key feature of case studies than pattern matching.
In fact, some of the more respected case-study designs are based on the idea that within-case analysis may help in the search for variables previously not considered relevant (Rogowski, 1995). A famous example of a deviant case analysis is Lijphart's (1968) own study of the Netherlands. Whereas it was previously thought that democratic consolidation in segmented societies is impossible when cross-cutting cleavages between groups are absent, Lijphart demonstrated that this is not necessarily the case. The Netherlands, a 'pillarized society' marked by high segmentation into a small number of religious/ideological groups, nonetheless had a history of stable democracy. He explained this particular outcome by highlighting the 'politics of accommodation' at the elite level, a variable which he thought might enrich the existing pluralist theories of democracy. Lijphart thus went beyond explaining the single case of the Netherlands as a historical curiosity and towards the development of more general propositions (to be tested in later studies). In statistical terms, we are thus looking for cases with a high residual, or 'outliers' (Gerring, 2007a: 105-108). Contrary to what is often believed, quantitative analysis is not blind to the issue of deviant cases and case knowledge in general. In a textbook regression, the researcher is equipped with within-case insights that inform the specification of the model (Achen, 2005; Beck, 2006). Given that deviant cases pose problems for proper regression estimation, sound knowledge of cases particularly extends to deviant cases (cf., King and Zeng, 2007).5 The rationale for the choice of statistically atypical cases, that is, cases with a high residual, is that they are expected to be theoretically deviant as well.

4 The second part of the quote refers to the refinement of concepts and indicators. This is an important task of within-case analysis (Adcock and Collier, 2001), which we cannot address in more detail in this paper.
It is precisely this theoretical deviance one aims to resolve by discerning the variables keeping the case away from the regression surface. Because of this, deviant cases are considered inappropriate for pattern-matching (George and Bennett, 2005: 20-21; Gerring, 2007a: 105-106), which is what should be done with typical cases as described above. One serious problem of both typical case and deviant case analysis is the issue of systematic and non-systematic variables, which we think is a crucial problem when it comes to integrating quantitative and case study methods. In the quantitative literature, a variable is considered important when it has a systematic effect, that is, when the causal effect is different from zero in a large number of cases (King, Keohane and Verba, 1994: 76-85).6 Thus, "systematic" and "non-systematic" are cross-case properties of variables that cannot, by definition, be identified in small-N research. Finding omitted factors should not be a problem in a case study because the empirical picture is much more complex than the one captured by the model (Eckstein, 1975: 107; Geddes, 2003: Ch. 3). However, the tricky task is to separate the idiosyncratic factors – e.g., the factors explaining stable democracy in the Netherlands and only there – from the ones that are relevant at the cross-case level – e.g., factors that explain democratic consolidation in divided societies more generally. Due to the larger number of cases, regression analysis is more suitable for separating systematic from non-systematic variables. Yet, as emphasized earlier, the systematic relationships may still be non-causal, that is, spurious. Conversely, while the advantage of small-N analysis is its ability to detect spuriousness, the corresponding disadvantage is the difficulty of detecting non-systematic variables through case-study analysis. What does this mean for the problem of selecting deviant and typical cases?
In the case of the deviant case, we may well find a cause that pulls the case away from the regression surface, but without looking at a larger number of cases, we simply cannot know whether the causal factor we have found is systematic or not. With respect to the typical case, there may equally be non-systematic factors that transform a theoretically atypical case into an empirically typical case, i.e., one with a low residual. In other words, the residual alone does not tell us whether a case is not just statistically but also theoretically typical or deviant. Notwithstanding this criticism, we still believe that it makes sense to select cases on the basis of the residual. However, we would like to sound a note of caution with respect to some of the strong theoretical assumptions regarding the status of a deviant or typical case. A deviant-case analysis should be performed with the knowledge that the acquired process-tracing evidence can only deliver clues about another model specification, the appropriateness of which needs to be determined through large-N analysis and diagnostics.

5 We readily acknowledge that the practice of handling outliers often deviates from the textbook advice, which is probably true for all methods. In the case of regression estimation, one may dispense with a deviant-case analysis by running a robust regression that diminishes the influence of the case on the results (Berk, 1990; Western, 1995) or by eliminating the case altogether. The latter strategy is viable only if it can be shown that the case does not belong to the population. However, this requires a within-case analysis in the first place.

6 Of course, a variable with a systematic effect must also be substantively relevant, meaning that it must make theoretical sense to include it in the model at hand.
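The selection rule discussed in this section – typical cases sit close to the regression surface (small residuals), deviant cases far from it (large residuals) – can be made concrete with a minimal sketch. The following Python snippet uses simulated data and NumPy's least-squares routine; all variable names, values, and the choice of three cases per set are purely illustrative and not drawn from any study cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cross-case data: 30 cases, two independent variables.
# All names and values are illustrative, not from any cited study.
n = 30
X = np.column_stack([np.ones(n),            # intercept
                     rng.normal(size=n),    # x1
                     rng.normal(size=n)])   # x2
beta_true = np.array([1.0, 2.0, -1.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Estimate the model by OLS and compute the residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Statistically typical cases have the smallest absolute residuals;
# statistically deviant cases ("outliers") have the largest.
order = np.argsort(np.abs(residuals))
typical_cases = order[:3]
deviant_cases = order[-3:]
print("typical:", typical_cases, "deviant:", deviant_cases)
```

The point of the sections that follow is precisely that the residuals this sketch ranks are only as trustworthy as the estimated model itself.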
Another implication of the inability of case studies to credibly identify theoretically typical and deviant cases is that the large-N method should provide the best possible context for a within-case analysis. This means that the estimated residuals should be as valid indicators as possible for the theoretical status of a case. In the following sections, we argue that this is less often the case than is acknowledged in the literature.

3. Outliers, pathway cases, and the statistics of case selection

In the previous section, we have detailed the standard approach toward the choice of cases in regression analysis. Recently, the pathway case procedure has been proposed as a somewhat more sophisticated variant of the conventional account (Gerring, 2007b). The rationale for the choice of pathway cases is to achieve the best possible context for the inductive development of hypotheses on causal processes and for pattern-matching.7

7 The pathway approach can be applied to classic two-case comparisons with binary variables and to continuous variables in regression analysis. We limit the following discussion to the latter type because of our interest in regression analysis and residuals as the basis for case selection.

The search for pathway cases begins with the identification of the model thought to be free of misspecification errors and displaying a good performance.8 After having selected the model, the variable one wants to make the subject of a within-case analysis is dropped from the equation. The reduced model is then estimated and the residuals for all cases are computed. In the next step, one identifies those cases that are typical in the full model and for which the absolute value of the reduced-model residual is larger than the absolute value of the full-model residual. In other words, we should look for cases for which the inclusion of the additional variable in the full model
makes a big difference in terms of pulling them towards the regression surface. The set of cases satisfying these criteria are the pathway cases. Within this set, one should choose the case with the largest difference between the residuals of the reduced and the full model (Gerring, 2007b: 242-243).9 After having selected the appropriate case, one proceeds with a process-tracing analysis of the X/Y relationship of interest. As we explained above, this can be done in an exploratory fashion, i.e., by discerning how X actually produces Y, or deductively, which requires theorizing about competing causal mechanisms in advance of the within-case analysis (George and Bennett, 2005: Ch. 10).

8 Selecting a pathway case without having a well-performing model in the first place makes no sense because there is little value in examining independent variables without considerable explanatory power. The precise criterion for model performance and selection – such as the Akaike and Bayesian Information Criteria (cf., Kuha, 2004) – is not relevant for the point we make.

As a matter of fact, the pathway case technique is a formalization of a longstanding argument about the analysis of deviant cases. When a variable previously identified in an outlier is added to a model, the case should be a typical case in the expanded model. The reason is that a variable that is relevant at the cross-case level should add explanatory power, so that the cases move closer to the regression surface on average. The pathway case technique takes the opposite starting point. It drops a variable from a model that seems well-specified and then picks the case that is most deviant in the reduced model. Because of the intimate link between the pathway case procedure and the established deviant-case analysis, our criticism of the former automatically applies to the latter as well, in ways that we detail at the end of this section.
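The selection steps just described can be sketched in a few lines. This is a hedged illustration on simulated data: a full model with two regressors, a reduced model dropping the variable of interest, and the pathway score |Res_reduced| − |Res_full| computed for cases that are typical in the full model. The one-residual-standard-deviation cutoff for "typical" is our own illustrative choice, not part of Gerring's definition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data for a full model y ~ x1 + x2 (illustrative values).
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def ols_residuals(X, dep):
    """OLS residuals; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return dep - X @ beta

ones = np.ones(n)
res_full = ols_residuals(np.column_stack([ones, x1, x2]), y)
# Drop x2, the variable of interest, and re-estimate.
res_reduced = ols_residuals(np.column_stack([ones, x1]), y)

# Candidates are typical in the full model (here: within one residual
# standard deviation, our own cutoff) and have |Res_reduced| > |Res_full|.
# Among the candidates, pick the case with the largest difference.
typical_full = np.abs(res_full) <= res_full.std()
candidate = typical_full & (np.abs(res_reduced) > np.abs(res_full))
score = np.abs(res_reduced) - np.abs(res_full)
pathway_case = int(np.argmax(np.where(candidate, score, -np.inf)))
print("pathway case index:", pathway_case)
```

Our critique below concerns not these mechanics but the interpretability of `res_reduced` once a collinear variable has been dropped.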
The rationale for the pathway case technique, and for case selection based on regression analysis more generally, is that the residuals capture the causal effect of unmeasured variables. With respect to the pathway case, this means that the difference between the full-model and the reduced-model residual can be attributed to the variable that is dropped from the model. We argue that this interpretation of differences in residuals is misleading because it ignores the adverse statistical effects of omitting variables in regression estimation. In this context, we want to emphasize that we do not claim to provide innovative statistical insight, because the statistics on which our critique is based is elementary in quantitative analysis. Instead, our argument is that in the realm of mixed-method designs, the seemingly intuitive manipulation of regression analysis through the lens of case study analysis is fallacious because of inherent incompatibilities between quantitative and qualitative research. In the best of all worlds, the included independent variables are orthogonal, that is, completely uncorrelated. In practice, however, this is almost never the case (Gujarati, 2004: 513), so we discuss the more realistic case of multicollinear independent variables here.10 Multicollinearity is present when some variance of an independent variable can be modeled through a linear combination of the other independent variables. The presence of multicollinearity is the rule in multivariate analysis, and it becomes more severe as the number of independent variables grows. With respect to case selection, collinearity is a two-fold problem. First, the estimated coefficients in the full model may change in size and may even switch signs as compared to the identical model with no multicollinearity (Fox, 1991: 11).

9 The formula summarizing the procedure is: Pathway = |Res_reduced − Res_full|, if |Res_reduced| > |Res_full| (Gerring, 2007b: 243).
Since the choice of the pathway case also depends on the accuracy of the full-model residuals, it is obvious that this problem undermines case selection. Second, the causal effect of the independent variable one drops from the full model is not fully absorbed by the residuals. The stronger the degree of multicollinearity, the larger the share of the causal effect that is absorbed by the other independent variables. Technically speaking, the omission of a multicollinear variable from a model renders the estimators of the remaining independent variables biased and inconsistent (Gujarati, 2004: 510-511). As a consequence, the estimated reduced-model coefficients are systematically different from the true coefficients that we need to know in order to obtain meaningful residuals. More specifically, the reduced-model residuals capture the causal influence of the eliminated independent variable plus a specification error with ambiguous effects on the regression output. Thus, case selection takes place under uncertainty about the interpretability of the reduced-model residuals and is prone to faulty case selection. Because of these problems of collinearity, we argue that the pathway procedure is inherently unable to serve the purpose for which it was designed.11 While multicollinearity is the more severe problem, we want to add that the pathway approach is questionable even if the variables are orthogonal. In this constellation, the estimation of the full model is not undermined and the estimators of the reduced-model coefficients are unbiased. However, the estimator of the intercept of the regression surface remains biased (Gujarati, 2004: 511). Since an accurate estimation of the intercept is as important for the identification of pathway cases as the correct estimation of the coefficients, it can be seen that the omission of a variable is a general problem.
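The omitted-variable bias at the heart of this critique is easy to demonstrate by simulation. In the sketch below (illustrative values throughout), two regressors are built to be collinear; dropping one inflates the other's estimated coefficient, so the reduced-model residuals cannot carry the dropped variable's full causal effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000  # large n so the bias, not sampling noise, dominates

# Two deliberately collinear regressors (correlation about 0.8).
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = np.column_stack([np.ones(n), x1])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
beta_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)

# The full model recovers x1's coefficient of about 1.0. In the
# reduced model, x1 absorbs part of x2's effect (about 1.8 here),
# so the residuals no longer carry x2's full causal effect.
print("full:", beta_full[1], "reduced:", beta_reduced[1])
```

With orthogonal regressors the slope estimates would survive the omission, but, as noted above, the intercept estimate would still be biased.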
10 One may argue that our critique is somewhat unfair because multicollinearity is a specification problem that is not characteristic of a "true model", the identification of which is considered a prerequisite for the pathway technique. However, we do not see much value in discussing the technique under assumptions that are too detached from real-world empirical analyses, because multicollinearity is a pervasive problem. Thus, we evaluate the pathway approach in the presence of multicollinearity.

11 On a more general level, the implications of multicollinearity show that case selection is model-dependent, which is what we address in more detail in the following section.

Ultimately, the problem for case selection is that one may choose a wrong case. We define a wrong case as a case whose observed status differs from its true status. This means that a truly typical case appears as deviant and vice versa. Because of such discrepancies, one may select a true outlier for pattern-matching in the belief that the case is typical. Similarly, it is conceivable that a truly typical case is selected for an exploratory within-case analysis searching for omitted variables. When a wrong case is chosen, process tracing will be based on false premises and may undermine the generation of valid causal inferences. Of course, not all cases have the wrong status in the pathway procedure. However, there is some unknown potential for erroneous case selection when the estimated model is not the correct one, thus introducing uncertainty into the validity of case selection. At the beginning of this section, we explained that the pathway procedure takes the reverse view on an established deviant-case argument. According to this argument, an outlier should become a typical case when the model is expanded by a variable that has been identified earlier in the within-case analysis of an outlier. We believe that this perspective is deficient, too, because a decreasing residual should not come as a surprise.
In general, more variables tend to capture more variance of the outcome, and the cases are closer to the regression surface on average. An increase in model fit may be spurious because the variable that is relevant in the deviant case may be non-systematic in the whole set of cases. Thus, it is essential to run appropriate diagnostics for overfitting, such as Hausman specification tests, on the expanded model. If the originally omitted variable is indeed systematic, the test results for overfitting should be negative. This finding can be strengthened even further by running tests for underfitting on the original model. If these tests are negative and the test for overfitting on the expanded model is positive, there is good reason to believe that the within-case evidence is particular to the outlier. An additional problem we see with the outlier approach corresponds closely to our critique of the pathway case. When a systematic variable is missing from the original model, the residuals carry the effect of this variable as well as a specification error. A case with a large residual may indeed be a theoretical outlier and therefore suitable for an exploratory within-case analysis. However, it is also conceivable that the residual is an artifact of a misspecified regression model producing misleading coefficients. In this instance, the case is theoretically typical, not atypical, and it only appears as a statistical outlier because of the adverse effects of the ignored variable on regression estimation. If one picked such a case for the search for omitted variables, one would select a wrong case and perform process tracing on false premises, because the case actually is a theoretical onlier. To conclude, we believe that the traditional deviant-case perspective and the pathway technique are less appropriate for the purposes for which they were developed than is currently argued in the literature.
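The claim that a decreasing residual should not come as a surprise rests on a mechanical property of least squares: adding any regressor, even pure noise, can never increase the residual sum of squares. A small simulated check (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def rss(X, dep):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    r = dep - X @ beta
    return float(r @ r)

ones = np.ones(n)
noise = rng.normal(size=n)  # a regressor unrelated to y by construction

rss_base = rss(np.column_stack([ones, x]), y)
rss_plus = rss(np.column_stack([ones, x, noise]), y)

# OLS can never fit worse when a regressor is added, so smaller
# residuals alone do not show that the added variable is systematic.
print(rss_plus <= rss_base)
```

This is why improved fit after adding the variable found in an outlier is, by itself, weak evidence that the variable is systematic; the diagnostics discussed above are needed.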
4. The model-dependence of case selection

The particular problem of the pathway technique and the deviant case analysis is that cases may not have the status they should have because of an inherent misspecification of the model. In this section, we generalize this argument by highlighting the model-dependence of case selection. Put simply, the problem can be summarized as one of "how the model you choose affects the cases you get". We base our discussion in this section on the analysis of welfare state expenditure data. The analyses serve to illustrate our methodological claims. They do not imply any substantive claims about welfare state development or the veracity of the models we estimate. Assume that we aim to assess the explanatory power of a simple linear-additive model with the share of elderly people (65 years and older), Gross Domestic Product (GDP), and the share of unemployed people as independent variables. The outcome we want to explain is welfare state expenditure as a share of GDP. The dataset comprises observations for 21 OECD member states for the years 1980, 1990, and 2000, totaling 63 observations. We estimate two models: one taking logged welfare state expenditure as the dependent variable and one taking the non-transformed data as the outcome. The logged expenditure model performs somewhat better according to the AIC and BIC (output not reported here), so a "mindless quantitative researcher" (Beck, 2006) would select this model without running any diagnostics. Afterwards, all cases within one standard deviation above or below the regression line are classified as typical.12 While the model performs more or less well when using the transformed dependent variable, it is a contestable choice. The original data yields better results in the Shapiro-Wilk W test for the normal distribution of the dependent variable.
The p-value for the logged data is .11 and the result for the ordinary data is .60.13 In this respect, the model drawing on the non-transformed data is superior. The result for the transformed data is not significant at the conventional .05 level, which would be a strong sign of a non-normal distribution. It is, however, sufficiently close to create strong doubts about the suitability of the transformed data and to consider the non-transformed model the better one (leaving aside all other specification issues). The decision between the original and transformed data is important because the status of a case as typical or deviant may be sensitive to the model one estimates.14 Table 1 details the cases that are outliers in the two models.15 Cases that are deviant in both analyses are marked accordingly in the table. All other listed cases are outliers in only one of the two models. Cases that are not included at all are typical irrespective of the data used.

12 The assignment of cases to the set of typical and deviant cases evidently depends on where the boundary is drawn. The sensitivity of case classification to the specification of the dividing line is an important issue and affects case selection too. However, we cannot pursue this topic further here.

13 Of course, our simple model suffers from additional specification problems. Recall that the example is purely illustrative.
Table 1: Sensitivity of case selection to the estimated model

    Deviant cases, exp model       Deviant cases, logexp model
    ------------------------       ---------------------------
    Denmark 1980 *                 Austria 1980
    Finland 1990 *                 Denmark 1980 *
    Greece 1980 *                  Finland 1990 *
    Ireland 1990 *                 Greece 1980 *
    Ireland 2000 *                 Ireland 1990 *
    Japan 1990 *                   Ireland 2000 *
    Japan 2000 *                   Japan 1990 *
    The Netherlands 1980 *         Japan 2000 *
    New Zealand 1990 *             The Netherlands 1980 *
    Sweden 1980 *                  New Zealand 1990 *
    Sweden 1990 *                  Portugal 1980
    Sweden 2000                    Spain 2000
    Switzerland 1990 *             Sweden 1980 *
    United States 1990 *           Sweden 1990 *
                                   Switzerland 1990 *
                                   United States 1990 *
    * deviant in both models

We call cases that have the same status in both regressions robust cases because their classification is insensitive to the estimated model. Thirteen of the 16 outliers in the regression operating with the transformed data are robust deviant cases, that is, they are outliers in both estimated models. Consequently, all cases that are listed in neither column of the table are robust typical cases. Furthermore, four cases are non-robust: Austria and Portugal in 1980 and Spain in 2000 are outliers in the wrong model but typical in the correct regression drawing on the non-transformed data, while the reverse holds for Sweden in 2000. Similarly to what we have discussed above, the presence of non-robust cases opens the door to the choice of wrong cases, that is, cases having the wrong status given the research question at hand. On the one hand, three of the sixteen cases appear as deviant in the wrong regression while they are typical in the correctly specified one. This means that there is a chance of nearly twenty percent that one will select a wrong case (presuming that cases are randomly chosen, cf. Lieberman, 2005: 446-448).

[14] See Deken and Kittel (2006) for a treatment of the (lack of) sensitivity of panel regression results to the type of welfare state expenditure data one uses.
[15] The regression output is not reported here because it is irrelevant for our point.
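The robust/non-robust bookkeeping behind Table 1 reduces to simple set operations. The sketch below encodes the two columns of the table and recovers the 13 robust deviant cases and the four non-robust cases:

```python
# Deviant cases under each model, as listed in Table 1.
deviant_exp = {
    ("Denmark", 1980), ("Finland", 1990), ("Greece", 1980),
    ("Ireland", 1990), ("Ireland", 2000), ("Japan", 1990),
    ("Japan", 2000), ("The Netherlands", 1980), ("New Zealand", 1990),
    ("Sweden", 1980), ("Sweden", 1990), ("Sweden", 2000),
    ("Switzerland", 1990), ("United States", 1990),
}
deviant_logexp = {
    ("Austria", 1980), ("Denmark", 1980), ("Finland", 1990),
    ("Greece", 1980), ("Ireland", 1990), ("Ireland", 2000),
    ("Japan", 1990), ("Japan", 2000), ("The Netherlands", 1980),
    ("New Zealand", 1990), ("Portugal", 1980), ("Spain", 2000),
    ("Sweden", 1980), ("Sweden", 1990), ("Switzerland", 1990),
    ("United States", 1990),
}

robust_deviant = deviant_exp & deviant_logexp  # outliers in both models
non_robust = deviant_exp ^ deviant_logexp      # status depends on the model

print(len(robust_deviant), sorted(non_robust))
# → 13 robust deviant cases; Austria 1980, Portugal 1980,
#   Spain 2000, and Sweden 2000 are non-robust.
```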
On the other hand, Sweden in 2000 appears as typical in the logged model, but should actually be classified as deviant because this is the case's status in the non-transformed model. Since only one out of 47 typical cases should not be selected for a typical-case analysis because of an incorrect status, the chance of committing a wrong case selection is rather small in our hypothetical example. In practice, however, one is often confronted with multiple and probably more severe specification problems affecting the proper estimation of the regression surface and the valid identification of cases. In some instances, it may be easy to make the correct specification decision, while in other cases it will be a rather ambiguous endeavor to determine the most appropriate form of the model. As a matter of fact, it is often not possible to single out one model as the unequivocally superior one (Bartels, 1997; Ho et al., 2007; Kittel, 1999; Kittel and Winner, 2005). Thus, we believe that the model-dependence of case selection is a pervasive problem that is currently largely neglected in the methodological literature and in empirical research selecting cases from regression analyses. While model-dependence is a widespread phenomenon, we argue in the next section that there is a way to diminish this problem.

5. A robust procedure for the choice of cases

The model-dependence of case selection is a problem because the expectations with which we approach a case crucially hinge on whether a case is statistically typical or deviant, or a 'pathway case', for that matter. Thus, it is essential to maximize the confidence in the accuracy of the residuals and the classification of cases. We argue that this can be achieved by determining the sets of robust typical and deviant cases and by picking a case from this set.
The precise case selection procedure one should follow depends on whether the estimated models differ from each other statistically or with respect to the included variables.[16] There is generally a variety of ways to estimate a model containing the same independent variables. For example, panel data can be estimated with or without a lagged dependent variable, with or without panel-corrected standard errors, and so on (cf. Beck and Katz, 1995; Kittel, 1999; Kittel and Winner, 2005). When a case is an onlier independently of the applied estimation techniques, we can be as certain as possible that this case is appropriate for theory-generating or theory-testing process tracing. A similar argument applies to robust deviant cases, i.e., cases that are outliers in whatever way the model is estimated. We can trust in the suitability of a case for the search for omitted variables when it is an outlier under all variants of estimation techniques. Robust case selection works best when the correctly specified model is among the estimated models (presuming that there is one). Of course, one generally does not know the true model, because in that case one would simply estimate it and select an onlier for process tracing.[17] However, it is often possible to identify a set of model specifications that appear to be the most appropriate without being able to single out one of them as the unambiguously superior one. The failure to determine the correct model is not a major problem for case selection as long as the correct model is among the estimated ones, since then the true model contributes to the identification of the robust cases.

[16] Robust case selection is no solution to the problems of the pathway technique. That approach presumes that one has identified the true model, an assumption we do not make because, if the true model were known, there would be no need for the robust choice of cases. The problems of the pathway procedure are inherent to the approach and cannot be fully resolved.
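For models that differ only in how they are estimated, the robust onliers are simply the cases classified as typical under every variant. A minimal sketch, using simulated panel-style data and a hypothetical choice between a static specification and one that adds a lagged dependent variable (both fit by OLS for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated series: outcome depends on one predictor and its own lag.
n = 60
x = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 1.0 * x[t] + rng.normal(scale=0.5)

def onliers(y, X):
    """Indices of cases within one residual SD of the OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return set(np.flatnonzero(np.abs(resid) <= resid.std()))

# Variant 1: static model; variant 2: adds the lagged dependent variable.
X_static = np.column_stack([np.ones(n - 1), x[1:]])
X_lagged = np.column_stack([np.ones(n - 1), x[1:], y[:-1]])

robust_typical = onliers(y[1:], X_static) & onliers(y[1:], X_lagged)
print(len(robust_typical))  # onliers under both estimation variants
```

Only cases in the intersection would be eligible for theory-testing process tracing; cases whose status flips between the variants are set aside as non-robust.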
If the correct model is run, the set of robust typical and robust deviant cases will be equal to or, more likely, a subset of the sets of typical and deviant cases one would derive from the true model. The identified robust cases are only a subset of the true sets, even if we cannot identify the true model yet, because some cases may have a wrong status in one of the incorrect models. Consequently, these cases appear as non-robust and are excluded from the empirical analysis. Figure 1 visualizes this argument with a hypothetical example involving the true model and an alternative model that is misspecified.

Figure 1: Robust case selection when the true model is estimated
[Figure: three panels, 'True model', 'Alternative model', and 'Joint perspective', each partition the sample of cases into onliers and outliers; the joint perspective divides the sample into robust onliers, robust outliers, and non-robust cases.]

Since we do not know which of the two models is the correct one, we estimate both and determine the groups of robust and non-robust cases. As can be seen, the number of robust typical cases is smaller than the true number of onliers. At the same time, the set of robust outliers is a subset of the true set of deviant cases. Consequently, there are also some non-robust cases that have a different status in the wrong and the correct model. In sum, the number of robust onliers and outliers is smaller than the number of truly typical and deviant cases. What matters, however, is that all robust cases have the same status as in the true model. This ensures that one will select the right cases for pattern-matching and the search for omitted variables, respectively.

[17] The analysis of outliers would be fruitless because we would know that there are no omitted variables. Nevertheless, theory-oriented process tracing is useful, since in most studies multiple causal processes are compatible with the same cross-case evidence (George and Bennett, 2005: Ch. 10).
Moreover, Figure 1 shows that one generalizes to fewer cases when one implements the robust case selection procedure. The causal inferences generated in pattern-matching and deviant case analyses are not generalized to non-robust cases because one does not know to which of the groups a non-robust case belongs. In this view, our approach is conservative when it comes to the generalization of causal inferences. We think that this is a beneficial aspect of robust case selection because it avoids the overgeneralization of causal insights (cf. Collier and Mahoney, 1996). This discussion implies that our robust procedure may result in the choice of a wrong case when the true model has not been estimated. The identified robust cases may be identical to the set one would obtain had the correct model been estimated as well. Yet, it is more likely that the group of robust cases is too large inasmuch as it includes some cases that have a different status in the true model and that would have been excluded as non-robust had it been estimated. Since this has not been done, there is a certain probability that one selects a wrong case even if one picks it from the set of robust onliers or outliers. However, as long as the results of the estimated models are not substantially different from the true regression output, a considerable share of the robust cases is likely to have the same, yet unknown, status as under the true model. Figure 2 captures this scenario. In this hypothetical example, we estimate two models that are misspecified in some respect and additionally fail to run the correct model. The failure to estimate the true model is a problem because a small set of cases that are truly deviant appear as robustly typical when taking a joint perspective on the cases' status in the wrong models. Thus, there is some potential for the choice of a wrong case.
Figure 2: Robust case selection when the true model is not estimated
[Figure: panels 'Wrong model 1', 'Wrong model 2', and 'True model' each partition the sample of cases into onliers and outliers; comparing the cases' joint status without and with the true model shows that, without it, some truly deviant cases end up among the robust onliers instead of the non-robust group.]

In principle, robust case selection is equally applicable to models that differ with respect to the included variables. However, some additional remarks are in order when one confronts the situation that is at the heart of the pathway technique and the classic deviant case design that we discussed in section 3. The identification of robust typical cases can be performed as described above when the variable of interest is not the variable distinguishing the estimated models from each other. Independently of whether the smaller or the expanded model is the superior one, the inferences generated in process tracing are likely to apply to all robust typical cases. As explained above, non-robust cases should be ignored when the estimated models exhibit divergent statistical specifications. In contrast, cases that lack robustness are useful targets for within-case analysis when the models differ with respect to the included variables. More specifically, cases that are outliers in the smaller model and onliers in the expanded model are suitable for the search for omitted variables. On a general level, we thus agree with the pathway procedure and the deviant-case argument. However, observing a change of status from deviant to typical alone is a rather weak basis for deciding whether to expand the smaller model or not. As explained above, models with more variables tend to capture more variance in the outcome. Thus, it is intuitive that cases move closer to the regression surface when the model is expanded even if no systematic variable has been omitted from the original model.
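A shrinking residual is thus a weak signal on its own; a formal underfitting diagnostic asks directly whether the model misses systematic structure. As a hand-rolled illustration (the quadratic data-generating process is our own, not from the paper), a Ramsey RESET test checks whether powers of the fitted values add explanatory power to the estimated model:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)

def ols_rss(y, X):
    """OLS fit; return residual sum of squares and coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid, beta

def reset_test(y, X):
    """Ramsey RESET: F-test on squared and cubed fitted values."""
    rss_r, beta = ols_rss(y, X)
    yhat = X @ beta
    X_aug = np.column_stack([X, yhat ** 2, yhat ** 3])
    rss_u, _ = ols_rss(y, X_aug)
    q = 2                                # number of added regressors
    df = len(y) - X_aug.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df)
    return F, f_dist.sf(F, q, df)

# Simulated data: the true relation is quadratic, the estimated model
# is linear, so the RESET test should flag underfitting.
n = 200
x = rng.uniform(-2, 2, n)
y = 1 + x + x ** 2 + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x])

F, p = reset_test(y, X)
print(f"RESET F = {F:.1f}, p = {p:.4f}")  # p far below .05: underfit
```

A small p-value rejects the null of correct specification, which is exactly the kind of evidence, beyond the mere movement of a residual, that we argue a deviant case analysis should rest on.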
In a similar vein, it would be fallacious to infer from the robust deviance of a case that no variable has been omitted from the reduced model. Outliers do not necessarily move closer to the regression surface when an omitted systematic variable is added. Two reasons may account for robust deviance when a systematic variable is added to the model. First, one may have ignored two (or more) variables, some of which go undetected in an exploratory within-case analysis. Second, a non-systematic variable may exert a strong effect and keep the case away from the regression surface in the expanded model. In sum, we contend that one should not draw strong conclusions from non-robustness or robust deviance. If one decides on the basis of a deviant case analysis to add a variable to a model, it is mandatory to run the appropriate diagnostics for underfitting on the original model and tests for underfitting and overfitting on the expanded equation. This is the point where our approach is sharply distinct from the existing approaches, which are solely based on the (non-)observation of a shrinking residual. Of course, diagnostics for underfitting and overfitting should always be applied in regression analysis so as to check whether the model is too narrow, too broad, or both. In the light of how the inspection of outliers is treated in the literature, however, we deem it particularly necessary to emphasize that process tracing and the change of a case's status from deviant to typical, or the lack thereof, are not a viable substitute for regression diagnostics.

6. Conclusions

The choice of cases on the basis of a regression analysis is the oldest and most widely accepted way to combine small-N and large-N research. We have shown that this intuitively plausible perspective on case selection is methodologically deficient. The basis for all problems is the impossibility of distinguishing between systematic and non-systematic variables in process tracing.
Because of this, the validity of case selection fully hinges on the quality of the estimated residuals. This is a problem because one often estimates a range of plausible models that perform equally well in the regression diagnostics. Just as the regression output may be model-dependent, so is the classification of cases as typical or deviant. Moreover, we have demonstrated that the existing perspective on the analysis of outliers and the recently suggested pathway technique are flawed in several respects. As a solution to both problems, we have proposed the choice of robust cases and the systematic application of tests for underfitting and overfitting. We believe that case selection in regression analysis will improve when our guidelines are followed.

References

Achen, Christopher H. (2005): Two Cheers for Charles Ragin. Studies in Comparative International Development 40(1):27-32.
Adcock, Robert, and David Collier (2001): Measurement Validity: A Shared Standard for Qualitative and Quantitative Research. American Political Science Review 95(3):529-546.
Bäck, Hanna, and Patrick Dumont (2007): Combining Large-N and Small-N Strategies: The Way Forward in Coalition Research. West European Politics 30(3):467-501.
Bartels, Larry M. (1997): Specification Uncertainty and Model Averaging. American Journal of Political Science 41(2):641-674.
Beck, Nathaniel (2006): Is Causal-Process Observation an Oxymoron? Political Analysis 14(3):347-352.
Beck, Nathaniel, and Jonathan N. Katz (1995): What to Do (and Not to Do) with Time-Series Cross-Section Data. American Political Science Review 89(3):634-647.
Berk, Richard A. (1990): A Primer on Robust Regression. In Modern Methods of Data Analysis, edited by John Fox, and J. Scott Long, pp. 292-324. Newbury Park: Sage.
Bhagwati, Jagdish N. (2002): Introduction: The Unilateral Freeing of Trade Versus Reciprocity. In Going Alone: The Case for Relaxed Reciprocity in Freeing Trade, edited by Jagdish N. Bhagwati, pp. 1-30.
Cambridge, Mass.: MIT Press.
Capoccia, Giovanni C., and Michael Freeden (2006): Multi-Method Research in Comparative Politics and Political Theory. Committee on Concepts and Methods Working Paper Series, no. 9.
Collier, David, and James Mahoney (1996): Insights and Pitfalls: Selection Bias in Qualitative Research. World Politics 49(1):56-91.
Coppedge, Michael (1999): Thickening Thin Concepts and Theories: Combining Large N and Small in Comparative Politics. Comparative Politics 31(4):465-476.
Deken, Johan De, and Bernhard Kittel (2006): Putting the Chain Saw into Social Expenditures: Retrenchment and the Problems of Using Aggregate Data. In Welfare Reform in Advanced Societies: Exploring the Dynamics of Reform, edited by Nico Siegel, and Jochen Clasen. Cheltenham: Edward Elgar.
Dunning, Thad (2007): The Role of Iteration in Multi-Method Research. APSA Qualitative Methods Newsletter 5(1):22-24.
Ebbinghaus, Bernhard (2005): When Less Is More: Selection Problems in Large-N and Small-N Cross-National Comparisons. International Sociology 20(2):133-152.
Eckstein, Harry (1975): Case Study and Theory in Political Science. In Strategies of Inquiry. Handbook of Political Science, Vol. 7, edited by Fred I. Greenstein, and Nelson W. Polsby, pp. 79-137. Reading, Mass.: Addison-Wesley.
Falkner, Gerda (2007): Time to Discuss: Data to Crunch or Problems to Solve? A Rejoinder to Robert Thomson. West European Politics 30(5):1009-1021.
Fox, John (1991): Regression Diagnostics. Newbury Park: Sage.
Geddes, Barbara (2003): Paradigms and Sand Castles: Theory Building and Research Design in Comparative Politics. Ann Arbor: University of Michigan Press.
George, Alexander L., and Andrew Bennett (2005): Case Studies and Theory Development in the Social Sciences. Cambridge, Mass.: MIT Press.
Gerring, John (2007a): The Case Study Method: Principles and Practices. Cambridge: Cambridge University Press.
Gerring, John (2007b): Is There a (Viable) Crucial-Case Method?
Comparative Political Studies 40(3):231-253.
Gerring, John (2007c): The Mechanismic Worldview: Thinking inside the Box. British Journal of Political Science 38:161-179.
Gowa, Joanne, and Edward D. Mansfield (1993): Power Politics and International Trade. American Political Science Review 87(2):408-420.
Gowa, Joanne, and Edward D. Mansfield (2004): Alliances, Imperfect Markets, and Major-Power Trade. International Organization 58(4):775-805.
Gujarati, Damodar N. (2004): Basic Econometrics. Toronto: McGraw-Hill.
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart (2007): Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis 15(3):199-236.
King, Gary, Robert O. Keohane, and Sidney Verba (1994): Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
King, Gary, and Langche Zeng (2007): When Can History Be Our Guide? The Pitfalls of Counterfactual Inference. International Studies Quarterly 51:183-210.
Kittel, Bernhard (1999): Sense and Sensitivity in Pooled Analysis of Political Data. European Journal of Political Research 35(4):533-558.
Kittel, Bernhard, and Hannes Winner (2005): How Reliable Is Pooled Analysis in Political Economy? The Globalization-Welfare State Nexus Revisited. European Journal of Political Research 44(2):269-293.
Kuha, Jouni (2004): AIC and BIC: Comparisons of Assumptions and Performance. Sociological Methods & Research 33(2):188-229.
Lieberman, Evan S. (2003): Race and Regionalism in the Politics of Taxation in Brazil and South Africa. Cambridge: Cambridge University Press.
Lieberman, Evan S. (2005): Nested Analysis as a Mixed-Method Strategy for Comparative Research. American Political Science Review 99(3):435-452.
Lijphart, Arend (1968): The Politics of Accommodation: Pluralism and Democracy in the Netherlands. Berkeley: University of California Press.
Lijphart, Arend (1971): Comparative Politics and the Comparative Method. American Political Science Review 65(3):682-693.
Mahoney, James, and Gary Goertz (2006): A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research. Political Analysis 14:227-249.
Mahoney, James, and Dietrich Rueschemeyer (2003): Comparative Historical Analysis in the Social Sciences. Cambridge: Cambridge University Press.
Rogowski, Ronald (1995): The Role of Theory and Anomaly in Social-Scientific Inference. American Political Science Review 89(2):467-470.
Rohlfing, Ingo (2008): What You See and What You Get: Pitfalls and Problems of Nested Analysis in Comparative Research. Comparative Political Studies.
Rueschemeyer, Dietrich (2003): Can One or a Few Cases Yield Theoretical Gains? In Comparative Historical Analysis in the Social Sciences, edited by James Mahoney, and Dietrich Rueschemeyer, pp. 305-332. Cambridge: Cambridge University Press.
Simon, Herbert A. (1954): Spurious Correlation: A Causal Interpretation. Journal of the American Statistical Association 49(267):467-479.
Symposium (2007): Multi-Method Work, Dispatches from the Front Lines. APSA Qualitative Methods Newsletter 5(1):9-28.
Tarrow, Sidney (1995): Bridging the Quantitative-Qualitative Divide in Political Science. American Political Science Review 89(2):471-474.
Thomson, Robert (2007): Time to Comply: National Responses to Six EU Labour Market Directives Revisited. West European Politics 30(5):987-1008.
Western, Bruce (1995): Concepts and Suggestions for Robust Regression Analysis. American Journal of Political Science 39(3):786-817.