WORKING PAPER SERIES

Controlling for Heterogeneity in Gravity Models of Trade

I-Hui Cheng
Howard J. Wall

Working Paper 99-010A
http://www.stls.frb.org/research/wp/99-010.html

February 1999

FEDERAL RESERVE BANK OF ST. LOUIS
Research Division
411 Locust Street
St. Louis, MO 63102

The views expressed are those of the individual authors and do not necessarily reflect official positions of the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors.

I-Hui Cheng, Birkbeck College, University of London
Howard J. Wall, Federal Reserve Bank of St. Louis

This paper argues that it is necessary to allow for country-pair heterogeneity when using the gravity model to estimate international trade flows. We propose and estimate a fixed-effects model that eliminates the heterogeneity bias inherent in standard methods. Further, we show that there is no statistical support for the restrictions necessary to obtain existing empirical models, which are special cases of our model. Because the gravity model has become the ‘workhorse’ baseline model for estimating the effects of international integration, this has important empirical implications. In particular, our results suggest that standard gravity estimates of the effects of integration can differ a great deal from what is obtained when heterogeneity is accounted for. (JEL F15, F17)

Corresponding author: Howard J. Wall, Research Division, Federal Reserve Bank of St. Louis, P.O. Box 442, St. Louis, MO 63166-0442, United States
E-mail: wall@stls.frb.org; Phone: (314)444-8533; Fax: (314)444-8731

We would like to thank Ron Smith for his insightful and helpful suggestions. We are also grateful for comments from the participants at the Midwest International Economics Conference at Purdue University, May 1999. The views expressed are those of the authors and do not necessarily represent official positions of the Federal Reserve Bank of St.
Louis, nor of the Federal Reserve System.

1. Introduction

Since the 1860s, when H. Carey first applied Newtonian physics to the study of human behavior, the so-called “gravity equation” has been widely used in the social sciences. More recently, gravity model studies have achieved empirical success in explaining various types of inter-regional and international flows, including labor migration, commuting, customers, hospital patients, and international trade.1 Gravity equations are widely used despite the fact that they have tended to lack strong theoretical bases.

The gravity model of international trade was developed independently by Tinbergen (1962) and Pöyhönen (1963). In its basic form, the amount of trade between two countries is assumed to be increasing in their sizes, as measured by their national incomes, and decreasing in the cost of transport between them, as measured by the distance between their economic centers.2 Following this work, Linnemann (1966) included population as an additional measure of country size, employing what we will call the augmented gravity model.3 It is also common to instead specify the augmented model using per capita income, which captures the same effects.4 Whichever specification of the augmented model is used, the purpose is to allow for non-homothetic preferences in the importing country, and to proxy for the capital/labor ratio in the exporting country (Bergstrand, 1989).

1 See Sen and Smith (1995) for a survey.
2 For recent examples of the basic gravity model see McCallum (1995), Helliwell (1996), and Boisso and Ferrantino (1997).
3 For recent uses of the augmented gravity model with population see Oguledo and MacPhee (1994), Boisso and Ferrantino (1997), and Bayoumi and Eichengreen (1997).
4 Examples of the augmented model with per capita income include Sanso, Cuairan, and Sanz (1993), Frankel and Wei (1998), Frankel, Stein, and Wei (1995, 1998), Eichengreen and Irwin (1998).

The gravity model has been used widely as a baseline model for estimating the impact of a variety of policy issues, such as regional trading groups, political blocs, patent rights, and various trade distortions.5 Typically, these events and policies are modeled as deviations from the volume of trade predicted by the baseline gravity model, and, in the case of regional integration, are captured by dummy variables. The recent popularity of the gravity model is highlighted by Eichengreen and Irwin (1997, p.33), who call it the “workhorse of empirical studies of (regional integration) to the virtual exclusion of other approaches.” This is despite the fact that, as Deardorff (1984) points out, most early papers were ad hoc rather than being based on theoretical foundations. Exceptions to this include Anderson (1979), Bergstrand (1985), Hummels and Levinsohn (1995), Deardorff (1998), and Feenstra, Markusen, and Rose (1998), whose models are consistent with the gravity model. See also Evenett and Keller (1998) who, along with Deardorff (1998), evaluate the usefulness of gravity models in testing alternative theoretical models of trade. The recent flurry of theoretical work has led Frankel (1998, p.2) to say that the gravity equation has “gone from an embarrassment of poverty of theoretical foundations to an embarrassment of riches.”

5 See Aitken (1973), Brada and Mendez (1983), Bikker (1987), Sanso, Cuairan, and Sanz (1993), Oguledo and MacPhee (1994), McCallum (1995), Helliwell (1996), Wei and Frankel (1997), Bayoumi and Eichengreen (1997), Mátyás (1997), Frankel and Wei (1998), Frankel, Stein, and Wei (1998), and Smith (1999).

The perceived empirical success of the gravity model has come without a great deal of analysis regarding its econometric properties, as its empirical power has usually been stated
simply on the basis of goodness of fit; i.e. a relatively high $R^2$.6 The lack of attention paid to the empirical properties of the model is despite the fact that the strength of any baseline model lies in the accuracy of its estimates.

6 See Sanso, Cuairan, and Sanz (1993) for an examination of the predictive power of various specifications of the augmented gravity model. Also see Oguledo and MacPhee (1994) for a survey of pre-1990 empirical results.

The aim of this paper is to begin to fill the gap regarding the empirical estimation of gravity models of trade. In particular, we demonstrate that standard methods for estimating the gravity model produce biased estimates, tending to overestimate trade between low-trade countries, and to underestimate it between high-trade countries. We argue that the primary source of this bias is the failure of standard methods to account for the pairwise heterogeneity of bilateral trade relationships. The solution we propose uses simple panel data methods to allow for the intercepts of the gravity equation to be specific to each trading pair. We demonstrate that with this empirical model the correlation between the residuals and the volume of trade disappears. To illustrate the empirical significance of our findings, we apply our model to the question of the effects of regional integration on trade volumes. We show that standard methods yield a strong negative relationship between membership in a trading bloc and intra-bloc trade, whereas this counterintuitive finding is eliminated when heterogeneity is controlled for.

Section 2 briefly sets out the various statistical models we examine. Section 3 presents standard empirical results for the basic and augmented gravity models, and illustrates the inherent estimation bias. In Section 4 we offer a solution to this, namely using a panel with fixed effects to estimate the bilateral trade relationship. In Section 5 we compare our model to alternatives
which also control for heterogeneity. Section 6 illustrates the importance of controlling for heterogeneity when estimating the effects of trade blocs. Concluding remarks are provided in Section 7.

2. A Statistical Overview

This section briefly sets out the various forms of gravity models that have been constructed to estimate bilateral trade flows. These models can be considered as restricted versions of a general gravity model, which has a log-linear specification,7 but places no restrictions on the parameters. In the general model, the volume of trade between countries i and j in year t can be characterized by

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \alpha_{ij} + \beta'_{ijt} Z_{ijt} + \varepsilon_{ijt}, \quad t = 1, \ldots, T; \tag{1}$$

where $X_{ijt}$ is exports from country i to country j in year t, and $Z'_{ijt} = [z_{it}\ z_{jt}\ \ldots]$ is the $1 \times k$ row vector of gravity variables (GDP, population, and distance). The intercept has three parts: one which is common to all years and country pairs, $\alpha_0$; one which is specific to year t and common to all pairs, $\alpha_t$; and one which is specific to the country pairs and common to all years, $\alpha_{ij}$. The disturbance term $\varepsilon_{ijt}$ is assumed to be normally distributed with zero mean and constant variance for all observations, i.e. $\varepsilon_{ijt} \sim IN(0, \sigma_t^2)$, $E(\varepsilon_{ijt}, \varepsilon_{ij't}) = 0$ and $E(\varepsilon_{ijt}, \varepsilon_{ijt-1}) = 0$. It is also assumed that the disturbances are pairwise uncorrelated.

7 Sanso, Cuairan, and Sanz (1993) conclude that the log-linear specification, while not optimal, is a fair and ready approximation of the optimal form.

Obviously, because (1) provides only one observation for each parameter vector $\beta_{ijt}$, it is not useful for estimation unless restrictions are imposed on the parameters. The standard single-year cross-section model (CS) imposes the restrictions that the slopes and intercepts are the same across country pairs; i.e. that $\alpha_{ij} = 0$ and $\beta_{ijt} = \beta_t$,

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \beta'_t Z_{ijt} + \varepsilon_{ijt}, \quad t = 1, \ldots, T; \tag{CS}$$

where $\alpha_0$ and $\alpha_t$ cannot be separated.
Assuming that all the classical disturbance-term assumptions hold, the CS model is estimated by ordinary least squares (OLS) for each year.

The other standard estimation method is a pooled cross-section model (PCS), which imposes the further restriction on the general model that the parameter vector is the same for all t, $\beta_1 = \beta_2 = \cdots = \beta_T = \beta$, although it normally allows for the intercepts to differ over time;

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \beta' Z_{ijt} + \varepsilon_{ijt}, \quad t = 1, \ldots, T. \tag{PCS}$$

This is estimated by OLS using data for all available years.

Virtually all estimates of the gravity model of trade use either the CS or the PCS model, which, as we show below, provide biased estimates. We attribute this to heterogeneity bias due to the restriction that the parameters are the same for all country-pairs. To address this, we remove the restriction that the country-pair intercept terms equal zero, although we maintain the restriction that the slope coefficients are constant across country pairs and over time. Specifically, we consider a fixed effects model (FE)

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \alpha_{ij} + \beta' Z_{ijt} + \varepsilon_{ijt}, \quad t = 1, \ldots, T. \tag{FE}$$

Note also that in the FE model the country-pair effects are allowed to differ according to the direction of trade, i.e. $\alpha_{ij} \neq \alpha_{ji}$. The FE model is a two-way fixed effects model in which the independent variables are assumed to be correlated with $\alpha_{ij}$, and is a classical regression model which can be estimated using OLS.

Bayoumi and Eichengreen (1997) and Mátyás (1997) have also proposed models to handle country-pair heterogeneity, each of which can be modeled as a restricted version of the FE model. In the Bayoumi and Eichengreen (BE) model the differences in the dependent and independent variables are used to eliminate the fixed variables, including the country-pair dummies and distance. Specifically,

$$\Delta \ln X_{ijt} = \gamma_0 + \gamma_t + \beta' \Delta Z_{ijt} + \mu_{ijt}, \quad t = 1, \ldots, T; \tag{BE}$$

where $\Delta$ is the difference operator and $\gamma_0 + \gamma_t = \alpha_t - \alpha_{t-1}$.
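As a short worked step, the BE form can be obtained by subtracting the FE equation at $t-1$ from the FE equation at $t$. The derivation below is a sketch under the FE assumptions stated above; reading $\mu_{ijt}$ as the differenced disturbance is our interpretation of the notation rather than something the text states explicitly.

```latex
\begin{align*}
\ln X_{ijt}   &= \alpha_0 + \alpha_t     + \alpha_{ij} + \beta' Z_{ijt}   + \varepsilon_{ijt}, \\
\ln X_{ijt-1} &= \alpha_0 + \alpha_{t-1} + \alpha_{ij} + \beta' Z_{ijt-1} + \varepsilon_{ijt-1}, \\
\Delta \ln X_{ijt} &= (\alpha_t - \alpha_{t-1}) + \beta' \Delta Z_{ijt}
                      + (\varepsilon_{ijt} - \varepsilon_{ijt-1}).
\end{align*}
```

The terms $\alpha_0$ and $\alpha_{ij}$ cancel, as does any element of $Z_{ijt}$ that is constant over time (distance, for example), which leaves $\gamma_0 + \gamma_t = \alpha_t - \alpha_{t-1}$ as the intercept of the BE model.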
In this model the intercept has two parts: $\gamma_0$ is the change in the period-specific effect that is common across years, and $\gamma_t$ is the change that is specific to year t. As is well known, when there are no time dummies, such a differencing model should yield results identical to a model with dummy variables to control for fixed effects. However, with time dummies it is necessary to impose restrictions on the time effects so as to avoid collinearity, which in turn makes the BE model a restricted form of the FE model. If the collinearity restriction is that the first time dummy in the BE model is equal to zero, this is equivalent to restricting the common component of the change in the period-specific effects to equal the difference in the first two period-specific effects, i.e. $\gamma_0 = \alpha_2 - \alpha_1$. If instead the collinearity restriction is that the sum of the time dummies in the BE model is zero, this is equivalent to restricting the common component to equal the difference between the first and last time dummies, i.e. $\gamma_0 = \alpha_T - \alpha_1$.

Mátyás (1997) proposes

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \theta_i + \omega_j + \beta' Z_{ijt} + \varepsilon_{ijt}, \quad t = 1, \ldots, T; \tag{M}$$

as the correct specification of the gravity model, where the country-specific effect when a country is an exporter is $\theta_i$, and when it is an importer is $\omega_j$. Note also that in this specification, distance, contiguity, and language are eliminated because they are fixed over time, even though they are not collinear with the country-specific effects. This model is a special case of the FE model in that it imposes arbitrary restrictions on the country-pair effects; i.e. because $\alpha_{ij} = \theta_i + \omega_j$ and $\alpha_{ik} = \theta_i + \omega_k$, it must also be true that $\alpha_{ij} = \alpha_{ik} - \omega_k + \omega_j$. These cross-pair restrictions do not change the coefficient estimates, but instead lead to odd residuals, and greatly inaccurate predictions of trade flows.

3.
An Overview of the Standard Empirical Results

This section presents regression results for the basic and the augmented versions of the standard empirical models, CS and PCS. The data set is a balanced panel with 2110 observations, and includes countries with many different levels of economic development and performance for the period 1991-1995. It includes observations on exports from 22 countries to 116 destination countries, although we do not have data on all possible pairs. Descriptions of the data and their sources are provided in the Data Appendix.

3.1. Single-year cross-section data

In the augmented version of the gravity model, the gravity variables are the countries’ GDPs, their populations, and the distance between them. Thus, the augmented CS model (CSa) assumes that in a given year trade flows from exporting country i to importing country j can be estimated using:8

$$\ln X_{ij} = \alpha + \beta_1 \ln Y_i + \beta_2 \ln Y_j + \beta_3 \ln N_i + \beta_4 \ln N_j + \delta_1 \ln D_{ij} + \delta_2 C_{ij} + \lambda L_{ij} + \varepsilon_{ij}; \tag{2}$$

where $Y_i$ and $Y_j$ are the two countries’ GDPs; $N_i$ and $N_j$ are their populations; $D_{ij}$ is the distance between their economic centers (their capital cities); $C_{ij}$ is a contiguity dummy; and $L_{ij}$ is a common-language dummy. As trade flows are expected to be positively related to national incomes, and negatively related to distance, it is expected that $\beta_1$, $\beta_2$, and $\delta_2$ are positive, and that $\delta_1$ is negative. Also, estimation typically yields a negative sign for $\beta_3$, which would indicate that exported goods tend to be capital-intensive. It is also common to obtain a negative sign for $\beta_4$, which would indicate that traded goods tend to have income-elastic demands.
Finally, because $L_{ij}$ is meant to capture cultural and historical similarities between the trading pairs, which are thought to increase the volume of trade, $\lambda$ is expected to be positive.

The basic version of the gravity model does not include the populations of the two countries, so it can be viewed as a special case of the augmented model in which the coefficients on population are restricted to zero. Thus, the basic CS model (CSb) assumes that bilateral trade can be estimated with the following regression:

$$\ln X_{ij} = \alpha + \beta_1 \ln Y_i + \beta_2 \ln Y_j + \delta_1 \ln D_{ij} + \delta_2 C_{ij} + \lambda L_{ij} + \varepsilon_{ij}. \tag{3}$$

The expected signs for the coefficients are as in the augmented model.

8 Note that because $\ln(\text{per capita income}_i) = \ln Y_i - \ln N_i$, the regression could be suitably rearranged to instead obtain the augmented model with per capita income.

Table 1 reports the results for the five yearly cross-sections of the CSa and CSb models. For each year, the coefficients on the GDPs of the two countries are statistically significant and have the expected sign for CSa and CSb. For CSa, only the coefficient on the destination populations has the usual negative sign, although the positive sign for the coefficient on origin population is not statistically different from zero for any year. For all years for CSa and CSb the coefficient on distance has the expected sign and is statistically significant. Despite having the expected sign, none of the coefficients on contiguity are statistically significant at the 5% level. The coefficient on the common-language dummy is positive and statistically significant for every year. Importantly for assessing the appropriateness of pooling the data over the five years, the results differ little from year to year for either version.

Comparing the two versions of the CS model, for CSa, exports are less elastic with respect to origin GDP, and more elastic with respect to destination GDP.
However, for none of the years is there a startling distinction between the two versions, as they have almost identical $R^2$s and log-likelihoods. In fact, for none of the years does a likelihood ratio test reject the null hypothesis that the CSa and CSb models are statistically the same.9

9 This is with a critical value of 5.99 at the 5% level, and $\chi^2(2) = 2[\text{Log-likelihood CSa} - \text{Log-likelihood CSb}]$.

3.2. Pooling cross-section and time-series data

The other standard estimation method is to pool cross-sectional and time series data so as to increase the number of observations without greatly increasing the number of variables. The regression equation for the augmented version of the pooled cross-section model (PCSa) is:

$$\ln X_{ijt} = \alpha_0 + \alpha_t + \beta_1 \ln Y_{it} + \beta_2 \ln Y_{jt} + \beta_3 \ln N_{it} + \beta_4 \ln N_{jt} + \delta_1 \ln D_{ij} + \delta_2 C_{ij} + \lambda L_{ij} + \varepsilon_{ijt}; \tag{4}$$

where $\alpha_0$ is the portion of the intercept that is common to all years and trading pairs, and $\alpha_t$ denotes the year-specific effect common to all trading pairs. Note that we omit the dummy for 1991 so as to avoid collinearity. We suppress the regression equation for the basic version of the model (PCSb), as it is the same as (4) except for the restriction that $\beta_3 = \beta_4 = 0$.

The expected signs for the coefficients are the same as for the CS models, except that the PCS models have time dummies to consider. We take the time dummies as an indicator of the extent of “globalization”, which we define as the common trend towards greater real trading volumes, independent of the sizes of the economies.

The regression results for PCSa and PCSb are reported in the first two columns of Table 2. Unsurprisingly, the results are similar to those from the single-year cross-sections. The CS and PCS models yield roughly the same elasticities on GDPs and distance, and have roughly the same predictive power as measured by $R^2$.
Note, however, that because of the large number of observations relative to the CS models, the coefficients on the countries’ populations and on contiguity are statistically significant.

Comparing the two versions of the PCS model, although PCSa and PCSb yield very similar results, a likelihood ratio test rejects the null hypothesis that they are statistically the same.10 We therefore conclude that the augmented version of the gravity model is preferred to the basic model when using a pooled cross-section.

10 This is with a critical value of 5.99 at the 5% level, and $\chi^2(2) = 14.94$.

According to the estimates of the preferred PCSa model: (i) a 10% rise in a country’s GDP should be associated with a 6.2% rise in its exports and an 8.5% rise in its imports, all else constant; (ii) exports tend to be labor-intensive and income elastic (luxury goods), as indicated by the positive sign for $\beta_3$ and the negative sign for $\beta_4$; and (iii) a country will export 82% less to a market that is twice as distant as another otherwise-identical market, 20% more to a country that is contiguous, and 70% more to a country with the same first language. Finally, we take the fact that our time dummies are not statistically different from zero to mean that globalization, as defined above, was not an important factor in increasing trading volumes during the period.

The general conclusion from our estimation of standard gravity models is that they yield stable results that do not differ a great deal over the sample period, nor between the basic and augmented versions. We also conclude that the augmented version is preferred statistically, although the basic version provides predictions of trade volumes that are nearly as accurate. Finally, the estimates we obtain are not greatly out of line with those obtained previously in the literature.

One should be cautious before concluding that there are no empirical problems with these standard methods.
This is clear from the upper-left panel of Figure 1, which plots the residuals for the PCSa model. The strong positive relationship between the residuals and the level of exports indicates that the PCSa model tends to underestimate the level of trade when the actual level is high, and to overestimate it when the actual level is low. To our knowledge, this bias has not been recognized in the literature. Because the gravity model is used to establish baseline levels of trade, it is important to have unbiased estimates of the coefficients on the gravity variables, even if one is not interested in these coefficients themselves. It is difficult to argue that a model can be useful for establishing a baseline when it yields such obviously biased predictions.

4. The Model with Pairwise Heterogeneity

a. The model

As we describe in the previous section, standard cross-section estimates of the gravity model yield biased estimates of the volume of bilateral trade. One possible source of this bias is that heterogeneity is not allowed for by the regression equations. With such heterogeneity a country may export different amounts to two countries, even though the two export markets have the same GDPs and are equidistant from the exporter. This can arise because there are historical, cultural, ethnic, political, or geographic factors that affect the level of trade, and are correlated with the gravity variables (GDP, population, distance). If so, then estimates that do not account for these factors will suffer from heterogeneity bias. Various studies have to some extent tried to control for this by including things such as whether trading partners share a common language, have had a colonial history, are in military alliance, etc. However, cultural, historical, and political factors are often difficult to observe, let alone quantify.
This is why we will control for these factors using a simple fixed-effects model that assumes that there are fixed pair-specific factors that may be correlated with levels of bilateral trade and with the right-hand-side variables.

We assume that the gravity equation for a country pair may have a unique intercept, and that it may be different for each direction of trade (i.e. $\alpha_{ij} \neq \alpha_{ji}$). However, we retain the assumptions of the PCS model that the slope coefficients are constant over time and across trading pairs. Our specification of the augmented gravity model with fixed effects (FEa) is:

$$\ln X_{ijt} = \alpha_{ij} + \alpha_t + \beta_1 \ln Y_{it} + \beta_2 \ln Y_{jt} + \beta_3 \ln N_{it} + \beta_4 \ln N_{jt} + \varepsilon_{ijt}; \tag{5}$$

where $\alpha_{ij}$ is the specific “country-pair” effect between the trading partners. The basic version (FEb) is the same as this, except for the constraint that $\beta_3 = \beta_4 = 0$. The country-pair intercepts include the effects of all omitted variables that are cross-sectionally specific but remain constant over time, such as distance, contiguity, language, culture, etc. Using the pooled data described above, we have 422 country-pair intercepts.

Because there is a long-standing problem with determining the appropriate measure of economic distance so as to capture transportation and information costs, an added benefit of the fixed effects model is that it eliminates the need to include distance in the regression. The most common method for handling distance is to do as we have above and simply measure it between the economic centers (assumed to be the capital cities) of the two countries. There are obvious problems with this, such as the implicit assumptions that overland transport costs are the same as those over sea, and that all overland/oversea distances are equally costly.
To provide just one obvious example, Los Angeles is about 1,300 km farther from Tokyo than is Moscow, but it is difficult to believe that the economic distance between Tokyo and Los Angeles is not much lower than that between Tokyo and Moscow. Our fixed-effects approach eliminates the need to include a distance variable, as it controls for all variables that do not change over time.

Another difficulty with standard measures of economic distance is the simple assumption that the capital city is a useful proxy for the economic center. While this may be useful for small countries with one major city, it is wide of the mark for countries like Canada and the US, which have major cities thousands of miles apart on different oceans, and which serve as centers for trade with completely different countries. Using Washington, DC or Ottawa to measure distance between the US or Canada and its Pacific trading partners overstates distance by the entire breadth of the North American continent. As the US has the highest GDP and the highest volume of trade, the mis-measure of economic distance can bias the estimation of the coefficients on the other variables in the gravity model.11

Another advantage of our approach is that it removes the problem of controlling for contiguity. Although contiguity is clearly important, as a great deal of trade can occur from people crossing the border to make everyday purchases, it is accounted for only sometimes. Even when it is accounted for with a dummy variable, as we do above, the specification still assumes that all contiguity is equivalent in terms of its effect on trade. Considering that Canada and the US, China and Russia, and Argentina and Chile would all be treated as equivalently contiguous pairs, this is difficult to accept.

b. The results

The middle columns of Table 2 report the estimation results for the augmented and basic versions of the fixed-effects model (FEa and FEb).
Note that for comparison with the pooled cross-section results, the year dummies are measured relative to that of 1991. Also, the estimates of the country-pair intercepts are omitted for space considerations.

11 As a practical matter this mis-measure of distance is magnified by the fact that data sets tend to have proportionally more data on US trade. For example, in our data set 700 of the 2110 observations (33%) have the US as either the importing or exporting country.

The difference between the augmented and basic versions of the FE model is not glaring, but is nonetheless clear statistically. A likelihood ratio test rejects the null hypothesis that the two models are statistically the same.12 In other words, it rejects the restriction that the coefficients on population are zero (as in FEb). We therefore conclude that FEa is the preferred version of the FE model.

According to the results for the preferred FEa model: (i) a 10% rise in a country’s GDP should be associated with a 2.9% rise in exports and a 5.2% rise in imports; (ii) exports tend to be labor-intensive, as indicated by a positive sign of origin population, and income-elastic (normal non-luxury goods), as indicated by the negative sign on destination population; and (iii) globalization increased the real volume of trade by nearly 20% between 1991 and 1995.

Comparing the results of the FEa and PCSa models, allowing for trading-pair heterogeneity, as in the FEa model, lowers the estimated income elasticities of trade, greatly increases the absolute value of the coefficients on the countries’ populations, and greatly increases the estimated role of globalization. It is obvious from the results that restricting the country-pair effects to zero, as does the PCSa model, has significant effects on the results, and this is easily confirmed by a likelihood ratio test.
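The likelihood ratio comparisons with two restrictions can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `lr_test_2df` is hypothetical, and it relies on the fact that a chi-square distribution with two degrees of freedom has survival function exp(-x/2), so the 5% critical value is -2 ln(0.05), or about 5.99. The two statistics are the ones reported in the paper's footnotes for PCSa against PCSb and FEa against FEb.

```python
import math


def lr_test_2df(stat, alpha=0.05):
    """Likelihood ratio test for two restrictions (hypothetical helper).

    stat is 2 * (unrestricted log-likelihood - restricted log-likelihood).
    With 2 degrees of freedom the chi-square survival function is exp(-x/2),
    so both the critical value and the p-value have closed forms.
    """
    critical = -2.0 * math.log(alpha)
    p_value = math.exp(-stat / 2.0)
    return critical, p_value, stat > critical


# Test statistics as reported in the footnotes of the paper.
for label, stat in [("PCSa vs PCSb", 14.94), ("FEa vs FEb", 46.42)]:
    critical, p_value, reject = lr_test_2df(stat)
    print(f"{label}: stat={stat:.2f} critical={critical:.2f} "
          f"p={p_value:.2g} reject={reject}")
```

Tests with a different number of restrictions, such as the comparison of FEa with PCSa, would need a general chi-square routine rather than this closed form.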
Further, as shown by the upper-right panel of Figure 1, there is no obvious correlation between the residuals and the log of exports, indicating that the FEa model does not suffer from the estimation bias exhibited by the PCSa model. It is also obvious from Figure 1 that the residuals from the FEa model tend to be much smaller than those from the PCSa model.

12 This is with a critical value of 5.99 at the 5% level, and $\chi^2(2) = 46.42$.

To summarize, because the PCSa model is a restricted form of the FEa model, and the restrictions are not supported statistically, we conclude that the FEa model is the preferred specification of the gravity model. Also, the FEa model does not exhibit the obvious heterogeneity bias of the PCSa model, and is preferred on the basis of traditional measures of goodness of fit in that it provides a higher $R^2$ and a lower sum of squared residuals. Note though that this improved statistical performance arises from the FEa model having 422 more independent variables than does the PCSa model. Nonetheless, on the basis of having smaller values for the Akaike Information Criterion and the Amemiya Prediction Criterion, which have more severe penalties for the number of parameters, the FEa model is still easily preferred.

In short, there is no statistical support for imposing the parameter restrictions required by the standard procedures for estimating the gravity model of trade. In the absence of any economic arguments for believing that the intercepts of the gravity equation are the same across trading pairs, we conclude that the fixed effects model is the more appropriate specification.

Oddly, Wei and Frankel (1997, p.125) reject the inclusion of country-pair dummies a priori on the basis that doing so would undermine their efforts at estimating the effects of variables that are constant over the sample period. Presumably their worry is that because these variables are subsumed into the country-pair effects they are hidden from analysis.
This is unfounded because the effects of these variables are easily estimated by regressing the country-pair effects from the FE model on them. Specifically, where the estimates of the 422 country-pair effects are denoted as $\hat{\alpha}_{ij}$, and including the log of distance and the contiguity and language dummies as independent variables, we obtain

$$\hat{\alpha}_{ij} = \underset{(2.57)}{4.45} - \underset{(0.296)}{1.04}\,\ln D_{ij} + \underset{(1.19)}{1.01}\,C_{ij} - \underset{(0.63)}{0.51}\,L_{ij}.$$

The numbers in parentheses are standard errors, and $R^2 = 0.049$. According to these results, only the distance variable is a statistically significant determinant of the country-pair effects. Contiguity and a common language do not therefore appear to be important determinants of the volume of bilateral trade. Further, the low $R^2$ indicates that very little of the country-pair effects are explained by the variables traditionally included in the standard cross-section models.

Note that these estimates are quite different from those obtained from the PCSa model, in which estimates of the effects of time-invariant factors suffer from the same heterogeneity bias as the time-variant factors. So, far from undermining estimation efforts, it is instead necessary to control for country-pair heterogeneity to obtain unbiased estimates of the importance of time-invariant factors. Curiously, though, the coefficient on the distance variable is practically the same as obtained from the PCSa model. It is not possible to tell if this is a coincidence, or if it is due to the distance variable being uncorrelated with the other independent variables. However, because the distance variable is so suspect, we would not want to push this result in either case.

5. Alternatives with Heterogeneity

As discussed earlier, two other papers have proposed empirical models for dealing with heterogeneity, and these models can be regarded as restricted forms of the FE model.
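Before turning to those alternatives, the mechanics of heterogeneity bias and of the two ways of removing pair effects can be illustrated on synthetic data. The following is a minimal sketch, not the estimation used in the paper: the panel size, the true slope `beta`, the single regressor `z`, and the helper `slope` are all hypothetical, and year effects are left out for brevity. The pair effect is built to be correlated with the regressor, which is the situation described in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_years, beta = 200, 5, 0.5  # hypothetical panel size and true slope

# Pair effect a_ij and one regressor z (think of log GDP) correlated with it.
a = rng.normal(0.0, 1.0, size=n_pairs)
z = a[:, None] + rng.normal(0.0, 1.0, size=(n_pairs, n_years))
lnx = a[:, None] + beta * z + rng.normal(0.0, 0.1, size=(n_pairs, n_years))


def slope(y, x):
    """OLS slope of y on x with an intercept, after flattening both arrays."""
    xc = x.ravel() - x.mean()
    yc = y.ravel() - y.mean()
    return float(xc @ yc / (xc @ xc))


# Pooled cross-section: a_ij is omitted, so the slope picks up Cov(z, a)/Var(z).
b_pooled = slope(lnx, z)

# Fixed effects (within estimator): subtract each pair's own mean over time.
b_fe = slope(lnx - lnx.mean(axis=1, keepdims=True),
             z - z.mean(axis=1, keepdims=True))

# First differences: a_ij cancels in ln x_t - ln x_{t-1}.
b_fd = slope(np.diff(lnx, axis=1), np.diff(z, axis=1))

print(f"true={beta:.2f} pooled={b_pooled:.3f} fe={b_fe:.3f} fd={b_fd:.3f}")
```

In this design the pooled slope is pushed well above the true value, while the within and first-difference slopes both stay close to it. The sketch says nothing about the time-dummy restrictions that separate the BE model from the FE model, since it contains no year effects.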
Using many of the same arguments we use to argue for the FE model, Bayoumi and Eichengreen (1997) propose estimating the gravity equation in first differences. This method of handling heterogeneity is nearly identical to ours, except in the restrictions that it imposes on the effects of time. An additional difference between their model and our FEa model is that their independent variables are the products of the GDPs and of the populations, which imposes the additional restrictions that $\beta_1 = \beta_2$ and $\beta_3 = \beta_4$. Because we wish to focus on their treatment of heterogeneity only, we will not estimate the model under those restrictions. However, as is clear from the results, these additional restrictions are easily rejected statistically. Taking the time difference of (5), the model that we will estimate (BEa) is

\[ \Delta \ln X_{ijt} = \gamma_0 + \gamma_t + \beta_1 \Delta \ln Y_{it} + \beta_2 \Delta \ln Y_{jt} + \beta_3 \Delta \ln N_{it} + \beta_4 \Delta \ln N_{jt} + \mu_{ijt}, \tag{6} \]

where the intercept is as defined in Section 2, $\gamma_0 + \gamma_t = \alpha_t - \alpha_{t-1}$. To prevent collinearity, we set the time dummy for 1992 equal to zero, meaning that the other time dummies are measured relative to it. In terms of the more general FEa model, this is equivalent to restricting the common component of the change in the period-specific effects to equal the difference between the first two period-specific effects, i.e. $\gamma_0 = \alpha_2 - \alpha_1$.13

13 The alternative assumption that the sum of the year dummies is zero means that $\gamma_0 = \alpha_T - \alpha_1$, and yields the same results except for the time dummies and the constant.

The empirical results are presented in Table 2. The results for the FEa and BEa models are very similar in terms of the signs and order of magnitude of the coefficients. Also, as illustrated by the lower-left panel of Figure 1, the BEa model is not subject to the obviously biased residuals of the PCSa model. Nonetheless, the FEa and BEa results differ enough to reject the restrictions needed to obtain the BEa model.
This is confirmed by a likelihood ratio test. Further, in terms of fitting the data, the FEa model is preferred in terms of the sum of squared residuals, the Akaike Information Criteria, and the Amemiya Probability Criteria. As for the further restrictions in Bayoumi and Eichengreen (1997) that $\beta_1 = \beta_2$ and $\beta_3 = \beta_4$, our results indicate that imposing them would clearly have significant effects on the estimates, and therefore they should not be imposed.

The other alternative to the FEa model is due to Mátyás (1997), who proposes using the specification

\[ \ln X_{ijt} = \alpha_0 + \alpha_t + \theta_i + \omega_j + \beta_1 \ln Y_{it} + \beta_2 \ln Y_{jt} + \beta_3 \ln N_{it} + \beta_4 \ln N_{jt} + \varepsilon_{ijt}, \tag{7} \]

where the effect when a country is an exporter is $\theta_i$, and when it is an importer is $\omega_j$. This model is a restricted form of the FEa model in that it imposes arbitrary cross-pair restrictions on the country-pair effects: each pair effect is forced to be additive, $\alpha_{ij} = \theta_i + \omega_j$, so that $\alpha_{ij} = \alpha_{ik} - \omega_k + \omega_j$. The empirical results, summarized by the last column of Table 2, show that the coefficients are virtually identical to those from the FEa model, although their standard errors are very large. In fact, they are large enough to reject the statistical significance of all but the coefficient on destination GDP. The lower-right panel of Figure 1 plots the residuals of this model against the log of exports, and shows a very peculiar pattern. The wide dispersion of the residuals also indicates a very poor fit relative to FEa. Further, a likelihood ratio test easily rejects the null hypothesis that the cross-pair restrictions do not change the results in a statistically important way. So, although this model eliminates the bias in the estimates of the coefficients on the independent variables, it has little predictive power.

6. Implications for Estimating the Effects of Integration

As we discuss in the Introduction, the gravity model has become the primary tool for estimating the effects of regional integration on trade volumes.
Up to this point, we have omitted integration variables in order to focus on the importance of controlling for country-pair heterogeneity when estimating gravity models. However, now that we have established that the FEa model is statistically preferred to standard cross-sectional methods, we introduce integration into our model and demonstrate the striking effect that heterogeneity bias has on the results. We would also like to alleviate the legitimate concern that the heterogeneity bias we detected above was due to our implicit assumption that regional integration is uncorrelated with the independent variables.

The most common and straightforward method for estimating the effects of integration in a gravity model is to include dummy variables for each integration regime in place during the sample period. Each of these dummies takes the value of 1 for each observation for which the two countries are members of the regime, with the expectation that the coefficients on these dummies are positive. We include three such dummy variables in our model, one each for the European trading bloc, the North American trading bloc, and the South American trading bloc (MERCOSUR).

Although there has been some deepening of trade integration in the European bloc, the primary change over the period was an expansion in the number of countries covered under the customs union. The twelve countries of the European Community (EC) renamed themselves the European Union (EU) in 1992, but this had relatively little effect on internal trade policy, as it was already nearly unfettered under the EC. Expansion of the bloc came in 1994 with the European Economic Area (EEA), which extended the free trade zone to include Austria, Iceland, Finland, Norway, and Sweden. To capture the effect of this trading bloc, our European bloc dummy variable takes the value of 1 when trade is between members of the EC or EU for 1991-93, and between members of the EEA for 1994-95.
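As an illustration of how such a regime dummy might be coded, the following minimal sketch builds the European bloc dummy from the definition just given. The function name, the country-name spellings, and the decision to raise an error outside 1991-95 are assumptions made for the example, not part of the original estimation; the list of the twelve EC members is standard historical background rather than something stated in the paper.

```python
# Hypothetical helper illustrating the European bloc dummy described in the text.
# Country spellings are assumptions and would need to match the data set's own labels.

# The twelve EC/EU member states during 1991-93 (historical background).
EC12 = frozenset({
    "Belgium", "Denmark", "France", "Germany", "Greece", "Ireland",
    "Italy", "Luxembourg", "Netherlands", "Portugal", "Spain", "United Kingdom",
})

# Countries the text names as brought into the free trade zone by the EEA in 1994.
EEA_ADDITIONS = frozenset({"Austria", "Iceland", "Finland", "Norway", "Sweden"})


def european_bloc(exporter: str, importer: str, year: int) -> int:
    """Return 1 if both trading partners belong to the bloc in the given year, else 0."""
    if 1991 <= year <= 1993:
        members = EC12
    elif 1994 <= year <= 1995:
        members = EC12 | EEA_ADDITIONS
    else:
        # The sample period in the paper is 1991-95; other years are out of scope here.
        raise ValueError("year outside the 1991-95 sample period")
    return int(exporter != importer and exporter in members and importer in members)


if __name__ == "__main__":
    # Sweden-France trade switches from 0 to 1 when the EEA takes effect.
    for yr in (1993, 1994):
        print(yr, european_bloc("Sweden", "France", yr))
```

Because the dummy varies over time only for pairs involving the 1994 entrants, in a model with country-pair fixed effects its coefficient is identified from those pairs alone; pairs that are in the bloc for the whole period have their membership absorbed by the pair effect.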
The North American trading bloc included only Canada and the United States for 1991-93, under the Canada-US Trade Agreement of 1988. The North American Free Trade Agreement (NAFTA) expanded the free trade zone in 1994 to include Mexico. For present purposes, we will ignore NAFTA's relatively mild deepening of US-Canada integration, and focus instead on it as an extension of the free trade bloc to Mexico. To capture the effects of North American integration, our North American bloc dummy takes the value of 1 for trade between the US and Canada for 1991-95, and for trade between Mexico and either Canada or the US for 1994-95.

The third significant trade bloc during the period was MERCOSUR, which came into force in 1995, reducing trade barriers between Argentina, Brazil, Paraguay, and Uruguay. Our MERCOSUR dummy takes the value of 1 for trade between any two of these countries in 1995.

We include these three trade bloc dummies in the PCSa and FEa models, and report the empirical results in Table 3. Note that inclusion of these dummies makes little difference for the PCSa model. Nonetheless, a likelihood ratio test rejects the null hypothesis that including the trade bloc dummies in the PCSa model does not alter the results to a statistically significant extent.14 The results for the FEa model are also not dramatically different when the trade bloc dummies are included. In fact, the null hypothesis that the inclusion of these variables has no statistically significant effect on the results cannot be rejected.15

The dramatic change in the empirical results is in the comparison of the FEa and PCSa models. The negative effects for all three trade blocs in the PCSa model are certainly contrary to expectations. Not only are the coefficients on the European bloc and MERCOSUR dummies statistically different from zero, they are also very large.
The results suggest that membership in the European trade bloc decreases trade with another bloc member by roughly the same as would occur if the distance between the countries doubled. The predicted decrease in trade due to membership in MERCOSUR is three times this. Although these predictions have interesting implications regarding the future of the world trading system, they do not hold up when the estimation allows for heterogeneity. Using the FEa model, all three trade blocs have positive effects on the volume of trade, although none is statistically different from zero. However, this is likely due to the simplistic nature of the dummies, which do not account for trade diversion effects, rather than to the absence of real effects (Cheng and Wall, 1999). For our present purposes though, the dummies we use are useful to illustrate how the FEa model at least brings the results into the realm of believability.

14 This is with a critical value of 7.81 at the 5% level, and $\chi^2(3) = 79.38$.

15 This is with a critical value of 7.81 at the 5% level, and $\chi^2(3) = 4.12$.

7. Conclusions

The objective of this paper is to argue that heterogeneity needs to be allowed for when using the gravity model to estimate bilateral trade flows. Our empirical analysis indicates that standard methods for estimating gravity models of trade suffer from heterogeneity bias due to omitted or misspecified variables. To address the problem we adopt a two-way fixed-effects model in which country-pair and period dummies are used to reflect the bilateral relationship between trading partners. The fixed effects capture those factors such as physical distance, the length of border (or contiguity), history, culture, language, etc., that are constant over the span of the data, and which are correlated with the volume of bilateral trade. We show that existing empirical models are special cases of our model, and that the restrictions necessary to obtain these special cases are not supported statistically.
We conclude that the preferred specification of the gravity model takes into account the heterogeneity of the country pairs using country-pair dummies, and includes the populations of the two countries. As the gravity model has become the "workhorse" of empirical work on the effects of integration, we also check the importance of allowing for heterogeneity when doing this work. Our results suggest that the estimates obtained when heterogeneity is allowed for can differ wildly from those obtained when it is not.

Data Appendix

1. Definitions of variables

Volume of Exports, measured in millions of US dollars, is from the International Monetary Fund's Direction of Trade Statistics. Bilateral Exchange Rates are from the IMF's International Financial Statistics. Both are downloaded from Datastream's IMF database, and deflated using the $-deflator from the World Tables 97 CD-ROM.

Gross Domestic Product is in millions of 1987 US dollars, and Population is in thousands of inhabitants. Both are from the World Bank's World Tables 97 CD-ROM.

Distance, expressed in kilometers, is the distance between capital cities, obtained from John Haveman's web site at ftp://intrepid.mgmt.purdue.edu/pub/Trade.Data/dist.txt, and from http://www.indo.com/distance/.

Contiguity is equal to 1 if two trading partners share a common border.

Common Language is equal to 1 if two trading partners share a common first language.

European Bloc is equal to 1 when both countries are members of the EC for 1991, the EU for 1992-93, or the EEA for 1994-95.

North American Bloc is equal to 1 for Canada-US trade for all years, and Canada-Mexico and US-Mexico trade for 1994-95.

MERCOSUR is equal to 1 in 1995 for trade between Argentina, Brazil, Paraguay, and Uruguay.

2. Countries included in data set

22 Exporters: Argentina, Australia, Canada, China, Finland, France, Hong Kong, Italy, Japan, Kenya, South Korea, Malaysia, The Netherlands, Philippines, Portugal, Singapore, Spain, Sweden, Switzerland, Thailand, United Kingdom, United States

116 Importers: Albania, Algeria, Angola, Argentina, Australia, Austria, Bahamas, Bahrain, Bangladesh, Barbados, Belgium, Bolivia, Botswana, Bulgaria, Burkina Faso, Burundi, Brazil, Brunei, Cameroon, Canada, Central African Republic, Chad, Chile, China, Colombia, Congo, Costa Rica, Czech Republic, Denmark, Djibouti, Dominican Republic, Ecuador, Egypt, El Salvador, Ethiopia, Finland, France, Gabon, Gambia, Germany, Ghana, Greece, Guinea, Guinea-Bissau, Guyana, Haiti, Honduras, Hong Kong, Hungary, India, Indonesia, Ireland, Israel, Italy, Jamaica, Japan, Kenya, S. Korea, Kuwait, Lebanon, Luxembourg, Macao, Madagascar, Malawi, Malaysia, Mauritania, Mexico, Moldova, Mongolia, Morocco, Mozambique, Namibia, Netherlands, New Zealand, Nicaragua, Nigeria, Norway, Oman, Pakistan, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Qatar, Romania, Russia, Rwanda, Saudi Arabia, Senegal, Sierra Leone, Singapore, Solomon Islands, South Africa, Spain, Sri Lanka, Suriname, Swaziland, Sweden, Switzerland, Syria, Tanzania, Thailand, Togo, Tunisia, Turkey, Uganda, United Kingdom, United States, Uruguay, Venezuela, Yemen, Zambia, Zimbabwe

References

Aitken, N. D., 1973, "The Effect of the EEC and EFTA on European Trade: A Temporal Cross-Section Analysis," American Economic Review, 63, 5, 881-892.

Anderson, J. E., 1979, "A Theoretical Foundation for the Gravity Equation," American Economic Review, 69, 1, 106-116.

Bayoumi, T. and B. Eichengreen, 1997, "Is Regionalism Simply a Diversion? Evidence from the Evolution of the EC and EFTA," in T. Ito and A. O. Krueger, Eds., Regionalism versus Multilateral Trade Arrangements, University of Chicago Press.

Bergstrand, J. H., 1985, "The Gravity Equation in International Trade: Some Microeconomic Foundations and Empirical Evidence," Review of Economics and Statistics, 67, 474-481.

Bergstrand, J. H., 1989, "The Generalized Gravity Equation, Monopolistic Competition, and the Factor-Proportions Theory of International Trade," Review of Economics and Statistics, 71, 143-153.

Bikker, J. A., 1987, "An International Trade Flow Model with Substitution: An Extension of the Gravity Model," Kyklos, 40, 315-337.

Boisso, D. and M. Ferrantino, 1997, "Economic Distance, Cultural Distance, and Openness in International Trade: Empirical Puzzles," Journal of Economic Integration, 12, 456-484.

Brada, J. C. and J. A. Mendez, 1983, "Regional Economic Integration and the Volume of Intra-Regional Trade: A Comparison of Developed and Developing Country Experience," Kyklos, 36, 589-603.

Cheng, I-H. and H. J. Wall, 1999, "Estimating the Effects of Regional Integration on Trade Volumes," working paper.

Deardorff, A. V., 1984, "Testing Trade Theories and Predicting Trade Flows," in R. W. Jones and P. B. Kenen, Eds., Handbook of International Economics, Vol. I, Elsevier.

Deardorff, A. V., 1998, "Determinants of Bilateral Trade: Does Gravity Work in a Neoclassical World?" in J. A. Frankel, Ed., The Regionalization of the World Economy, University of Chicago Press.

Eichengreen, B. and D. A. Irwin, 1998, "The Role of History in Bilateral Trade Flows," in J. A. Frankel, Ed., The Regionalization of the World Economy, University of Chicago Press.

Evenett, S. J. and W. Keller, 1998, "On Theories Explaining the Success of the Gravity Equation," NBER Working Paper 6529.

Feenstra, R. C., J. A. Markusen, and A. K. Rose, 1998, "Understanding the Home Market Effect and the Gravity Equation: The Role of Differentiating Goods," NBER Working Paper 6804.

Frankel, J., Stein, E. and S. Wei, 1995, "Trading Blocs and the Americas: the Natural, the Unnatural, and the Super-natural," Journal of Development Economics, 47, 61-95.
Frankel, J., Stein, E. and S. Wei, 1998, "Continental Trading Blocs: Are they Natural or Supernatural?," in J. A. Frankel, Ed., The Regionalization of the World Economy, University of Chicago Press.

Frankel, J. and S. Wei, 1998, "Regionalization of World Trade and Currencies," in J. A. Frankel, Ed., The Regionalization of the World Economy, University of Chicago Press.

Helliwell, J., 1996, "Do National Borders Matter for Quebec's Trade?" Canadian Journal of Economics, 29, 507-522.

Hummels, D. and J. Levinsohn, 1995, "Monopolistic Competition and International Trade: Reconsidering the Evidence," Quarterly Journal of Economics, 110, 799-836.

Linnemann, H., 1966, An Econometric Study of International Trade Flows, North-Holland.

McCallum, J., 1995, "National Borders Matter: Canada-U.S. Regional Trade Patterns," American Economic Review, 85, 615-623.

Mátyás, L., 1997, "Proper Econometric Specification of the Gravity Model," The World Economy, 20, 363-368.

Oguledo, V. I. and C. R. MacPhee, 1994, "Gravity Model: A Reformulation and an Application to Discriminatory Trade Arrangements," Applied Economics, 40, 315-337.

Pöyhönen, P., 1963, "A Tentative Model for the Volume of Trade Between Countries," Weltwirtschaftliches Archiv, 90, 93-100.

Sanso, M., R. Cuairan, and F. Sanz, 1993, "Bilateral Trade Flows, the Gravity Equation, and Functional Form," Review of Economics and Statistics, 75, 266-275.

Sen, A. and T. E. Smith, 1995, Gravity Models of Spatial Interaction Behavior, Springer-Verlag.

Smith, P. J., 1999, "Are Weak Patent Rights a Barrier to US Exports?" Journal of International Economics, forthcoming.

Tinbergen, J., 1962, Shaping the World Economy - Suggestions for an International Economic Policy, The Twentieth Century Fund.

Wei, S. J. and J. A. Frankel, 1997, "Open versus Closed Trading Blocs," in T. Ito and A. Krueger, Eds., Regionalism versus Multilateral Trade Arrangements, University of Chicago Press.
Table 1: Regression results for single-year cross-sections; augmented (CSa) and basic (CSb) versions; 1991-95
Dependent variable = log of exports

                       1991              1992              1993              1994              1995
                        CSa      CSb      CSa      CSb      CSa      CSb      CSa      CSb      CSa      CSb
constant            -5.154*  -4.954*  -5.380*  -5.184*  -4.913*  -4.943*  -4.798*  -4.797*  -4.686*  -4.694*
                    (1.058)  (0.997)  (1.068)  (1.014)  (1.049)  (0.993)  (1.073)  (1.014)  (1.099)  (1.036)
origin GDP           0.635*   0.670*   0.641*   0.671*   0.609*   0.658*   0.612*   0.663*   0.607*   0.662*
                    (0.050)  (0.040)  (0.051)  (0.041)  (0.050)  (0.039)  (0.053)  (0.041)  (0.055)  (0.042)
destination GDP      0.853*   0.805*   0.828*   0.784*   0.840*   0.807*   0.858*   0.820*   0.857*   0.817*
                    (0.046)  (0.034)  (0.046)  (0.034)  (0.046)  (0.034)  (0.046)  (0.034)  (0.046)  (0.034)
origin population     0.073             0.061             0.099             0.102             0.106
                    (0.063)           (0.064)           (0.064)           (0.065)           (0.066)
dest. population     -0.075            -0.069            -0.044            -0.055            -0.060
                    (0.055)           (0.055)           (0.055)           (0.056)           (0.056)
distance            -0.819*  -0.823*  -0.765*  -0.771*  -0.811*  -0.806*  -0.849*  -0.847*  -0.850*  -0.849*
                    (0.081)  (0.079)  (0.078)  (0.080)  (0.081)  (0.080)  (0.082)  (0.081)  (0.083)  (0.081)
contiguity            0.160    0.151    0.212    0.204    0.208    0.200    0.236    0.225    0.188    0.175
                    (0.316)  (0.316)  (0.320)  (0.320)  (0.318)  (0.319)  (0.322)  (0.323)  (0.323)  (0.324)
common language      0.704*   0.701*   0.729*   0.725*   0.740*   0.736*   0.661*   0.658*   0.662*   0.660*
                    (0.169)  (0.169)  (0.171)  (0.171)  (0.170)  (0.171)  (0.172)  (0.172)  (0.173)  (0.173)
log-likelihood      -709.37  -710.92  -714.69  -715.90  -712.71  -714.19  -717.65  -719.33  -718.88  -720.71
R-squared             0.642    0.641    0.625    0.624    0.635    0.634    0.641    0.640    0.638    0.637

All variables except for the contiguity and language dummies are in logs. Standard errors are in parentheses. * denotes significance at the 5% level. The basic version (CSb) is a form of the augmented version (CSa) with the coefficients on the populations restricted to zero. There are 422 observations for each year.
Table 2: Regression results for models using pooled data; 1991-95
Dependent variable = log of exports

                        pooled cross-section           fixed effects alternatives w/heterog.
                            PCSa        PCSb         FEa         FEb         BEa          Ma
constant                 -5.007*     -4.931*           -           -      0.034†      -4.505
                         (0.477)     (0.453)                             (0.020)    (10.887)
origin GDP                0.621*      0.664*      0.293*      0.323*      0.195*       0.293
                         (0.023)     (0.018)     (0.066)     (0.063)     (0.083)     (0.263)
destination GDP           0.847*      0.807*      0.520*      0.506*      0.427*      0.520*
                         (0.020)     (0.015)     (0.054)     (0.054)     (0.066)     (0.214)
origin population         0.088*                  1.762*                  2.293†       1.769
                         (0.029)                 (0.685)                 (1.195)     (2.732)
dest. population         -0.061*                 -2.014*                 -1.458*      -2.016
                         (0.024)                 (0.352)                 (0.587)     (1.403)
distance                 -0.819*     -0.819*
                         (0.036)     (0.036)
contiguity                 0.200       0.190
                         (0.142)     (0.143)
common language           0.700*      0.696*
                         (0.076)     (0.076)
1992                      -0.005      -0.004       0.030       0.021                   0.030
                         (0.091)     (0.091)     (0.019)     (0.018)                 (0.076)
1993                       0.027       0.028      0.050*      0.035*      -0.023       0.050
                         (0.091)     (0.091)     (0.023)     (0.018)     (0.022)     (0.094)
1994                       0.022       0.022      0.099*      0.073*       0.023       0.099
                         (0.091)     (0.091)     (0.029)     (0.019)     (0.021)     (0.114)
1995                       0.056       0.056      0.198*      0.164*      0.080*       0.198
                         (0.091)     (0.091)     (0.035)     (0.022)     (0.021)     (0.138)
observations                2110        2110        2110        2110        1688        2110
parameters                    12          10         430         428           8         147
log-likelihood          -3575.10    -3582.57      131.01      107.80     -331.93    -2950.43
R-squared                  0.641       0.638       0.987       0.986       0.073       0.788
Akaike Info. Crt.          3.400       3.405       0.283       0.304       0.403       2.935
Amemiya Prob. Crt.         1.755       1.764       0.078       0.080       0.088       1.102
sum of sqd. resids.      3660.23     3686.24      109.11      111.54      146.45     2024.71

All non-dummy variables are in logs. Standard errors are in parentheses. * and † denote significance at 5% and 10% levels. For the BEa model all variables are in differences from the previous year.
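The model comparisons discussed in the text can be cross-checked from the summary rows of Table 2. The sketch below is only illustrative: the function names are hypothetical, and the per-observation form of the Akaike criterion, (-2 ln L + 2k)/n, is an assumption that happens to reproduce the reported values for the four models shown to three decimals, since the paper does not state its formula. The likelihood ratio statistic for the FEb restrictions (population coefficients set to zero) against FEa reproduces the $\chi^2(2) = 46.42$ reported in footnote 12.

```python
import math

# (log-likelihood, number of parameters, observations), copied from Table 2.
TABLE2 = {
    "PCSa": (-3575.10, 12, 2110),
    "PCSb": (-3582.57, 10, 2110),
    "FEa": (131.01, 430, 2110),
    "FEb": (107.80, 428, 2110),
}


def lr_statistic(loglik_unrestricted: float, loglik_restricted: float) -> float:
    """Likelihood ratio statistic: twice the gap between the two log-likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_critical_df2(alpha: float) -> float:
    """Critical value of a chi-square with 2 degrees of freedom.

    With 2 degrees of freedom the upper-tail probability is exp(-x / 2),
    so the critical value has the closed form -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def aic_per_observation(loglik: float, n_params: int, n_obs: int) -> float:
    """Akaike criterion scaled by sample size (assumed form, see lead-in)."""
    return (-2.0 * loglik + 2.0 * n_params) / n_obs


if __name__ == "__main__":
    lr = lr_statistic(TABLE2["FEa"][0], TABLE2["FEb"][0])
    df = TABLE2["FEa"][1] - TABLE2["FEb"][1]
    print(f"LR(FEb vs FEa) = {lr:.2f} with {df} restrictions; "
          f"5% critical value = {chi2_critical_df2(0.05):.2f}")
    for name, (loglik, k, n) in TABLE2.items():
        print(f"{name}: AIC per observation = {aic_per_observation(loglik, k, n):.3f}")
```

The closed-form critical value is specific to two degrees of freedom; tests with other numbers of restrictions would need a chi-square table or a statistics library.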
Table 3: Regression results with integration dummies; 1991-95
Dependent variable = log of exports

                        pooled cross-section   fixed effects
                                        PCSa             FEa
constant                             -3.572*               -
                                     (0.490)
origin GDP                            0.649*          0.312*
                                     (0.023)         (0.067)
destination GDP                       0.870*          0.524*
                                     (0.020)         (0.054)
origin population                     0.054†          1.864*
                                     (0.028)         (0.690)
destination population               -0.083*         -1.948*
                                     (0.024)         (0.356)
distance                             -1.022*
                                     (0.041)
contiguity                             0.025
                                     (0.144)
common language                       0.622*
                                     (0.075)
European bloc                        -1.112*           0.064
                                     (0.115)         (0.044)
North American bloc                   -0.062           0.157
                                     (0.397)         (0.233)
MERCOSUR                             -3.165*           0.166
                                     (0.924)         (0.202)
1992                                  -0.006           0.027
                                     (0.089)         (0.019)
1993                                   0.027          0.047*
                                     (0.089)         (0.024)
1994                                   0.104          0.087*
                                     (0.089)         (0.030)
1995                                  0.149†          0.181*
                                     (0.089)         (0.037)
observations                            2110            2110
parameters                                15             433
log-likelihood                      -3524.73          132.97
R-squared                              0.657           0.987
Akaike Info. Crt.                      3.355           0.284
Amemiya Prob. Crt.                     1.678           0.078
sum of sqd. resids.                  3489.61          108.91

All non-dummy variables are in logs. Standard errors are in parentheses. * and † denote significance at 5% and 10% levels.

Figure 1: Plots of Residuals; Various Models
[Four panels, each plotting residuals (vertical axis, -8 to 6) against the log of exports (horizontal axis, -3 to 12): Pooled Cross-Section (PCSa), upper left; Fixed Effects (FEa), upper right; Restricted Time Effects (BEa), lower left; Restricted Fixed Effects (Ma), lower right.]
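Because the dependent variable is in logs, the bloc-dummy coefficients in Table 3 can be read as proportional effects on exports. The short sketch below converts each coefficient b into the implied percentage change, 100 (exp(b) - 1). It is offered only as a reading aid under that standard interpretation of a dummy in a log-linear model; the function name is hypothetical, and the conversion ignores the sampling uncertainty in the estimates.

```python
import math


def percent_effect(coef: float) -> float:
    """Implied percentage change in exports when a dummy switches from 0 to 1."""
    return 100.0 * (math.exp(coef) - 1.0)


# Bloc-dummy coefficients copied from Table 3 as (PCSa, FEa).
BLOC_COEFS = {
    "European bloc": (-1.112, 0.064),
    "North American bloc": (-0.062, 0.157),
    "MERCOSUR": (-3.165, 0.166),
}

if __name__ == "__main__":
    for name, (pcsa, fea) in BLOC_COEFS.items():
        print(f"{name:20s} PCSa {percent_effect(pcsa):7.1f}%   "
              f"FEa {percent_effect(fea):6.1f}%")
```

Read this way, the pooled cross-section estimates imply large falls in intra-bloc trade, while the fixed-effects estimates imply modest increases, which is the contrast the text emphasizes.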