Measuring Disagreement in Qualitative Survey Data∗

Frieder Mokinski^a, Xuguang Sheng^b† and Jingyun Yang^c

^a Centre for European Economic Research (ZEW), Germany
^b Department of Economics, American University, USA
^c The Methodology Center, Pennsylvania State University, USA

∗ This paper was presented at the 19th Federal Forecasters Conference and Society of Government Economists Annual Conference. We thank the participants in the conferences for helpful comments and suggestions. Dr. Yang's research was supported by Award Number P50DA010075-16 from the National Institute on Drug Abuse and NIH/NCI R01 CA168676. The usual disclaimer applies.

† Corresponding author. Mailing address: 4400 Massachusetts Avenue N.W., Washington, DC 20016, USA. Email: sheng@american.edu. Tel: +1 202 885 3782. Fax: +1 202 885 3790.

Measuring Disagreement in Qualitative Survey Data

Abstract

To measure disagreement among respondents in qualitative survey data, we propose new methods applicable to both univariate and multivariate comparisons. Building on prior work, our first measure quantifies the level of disagreement in predictions of a single variable. Our second method constructs an index of overall disagreement across several variables from a dynamic factor model. Using directional forecasts from the Centre for European Economic Research Financial Market Survey, we find that our measures yield levels of disagreement consistent with point forecasts from the European Central Bank's Survey of Professional Forecasters. To illustrate their usefulness, we explore the source and predictive power of forecast disagreement.

Keywords: Disagreement, Dynamic factor model, Qualitative data, Survey forecast.

1 Introduction

Disagreement among forecasters plays an increasingly important role in economic modeling, forecasting and policy. After documenting substantial disagreement in inflation forecasts, Mankiw, Reis, and Wolfers (2004) suggest that disagreement may be a key to macroeconomic dynamics.
In a similar vein, Driver, Trapani, and Urga (2012) show that disagreement, as a proxy for private information, may increase forecast accuracy for typical macroeconomic variables. Despite the growing body of literature on disagreement, the overwhelming majority of studies focus on point forecasts. However, most business and consumer surveys only provide qualitative indications of the current and expected future economic conditions.1 The expectations of consumers and firms are central to most economic analyses, as illustrated by the New Keynesian model. The joint importance of these economic agents’ expectations and disagreement motivates our development of econometric methods to analyze the level of disagreement in qualitative survey data. In this paper, we propose two measures of disagreement among the qualitative expectations of survey respondents. Building on Carlson and Parkin’s (1975) well-known probability approach, we design our first measure to extract the level of disagreement for predictions of a single variable. More specifically, we relax the restrictive assumptions of the Carlson-Parkin method by using a flexible distribution and allowing for time-varying parameters. Our second measure employs a dynamic factor model to quantify the level of overall disagreement in forecasts for several variables. To illustrate, we estimate our two measures of disagreement on directional forecasts from the Centre for European Economic Research (ZEW) Financial Market Survey, and compare the estimates to conventional disagreement measures obtained from point forecasts. We find that the two novel measures are highly correlated with the conventional measures of disagreement, and the second measure tracks the benchmark even more closely. We apply our measures to explore the source and predictive power of disagreement in qualitative survey data. 
In the first application, we employ a special survey conducted by the ZEW and study the role of heterogeneous forecasting methods in generating forecast disagreement. Among six methods - econometric modeling, fundamental analysis, technical analysis, judgment, in-house research and consensus forecasts - most respondents report that fundamental analysis and judgment are especially important forecasting techniques, while technical analysis is relatively less important. Our analysis indicates that relying heavily upon econometric modeling increases disagreement, whereas paying close attention to the consensus forecasts reduces disagreement.

1 There exist a number of qualitative business and consumer surveys across many countries, such as the European Commission Business and Consumer Surveys, the IFO Business Survey, the Confederation of British Industry Business Survey, and the University of Michigan Survey of Consumers.

In another application, we explore the economic significance of disagreement by studying whether disagreement has any predictive power for economic activity. To this end, we utilize the well-known business survey conducted by the Institute for Supply Management (ISM), available since January 1948. We find that disagreement in the ISM survey can indeed improve forecasts of industrial production. For instance, including the estimated price and employment disagreement in the model significantly decreases the mean squared forecast error at almost all forecast horizons. Our paper makes three contributions to the literature on macroeconomic forecasting. First, we propose new econometric methods for measuring disagreement in qualitative survey data. These methods apply to both univariate and multivariate comparisons. We establish the validity of our disagreement measures by comparing results from qualitative and quantitative data sets.
Second, we provide direct evidence that forecasters use differing methods when revising their predictions, which confirms the implications of the theoretical models in Lahiri and Sheng (2008) and Patton and Timmermann (2010). Third, we find that disagreement has economically meaningful predictive value. With quantitative data, Legerstee and Franses (2010) document the predictive power of disagreement measures. Our confirmation of this relationship in qualitative data firmly establishes that the degree of disagreement signals upcoming structural and temporal changes in an economic process. These contributions are especially significant because we focus on the underemphasized, yet immensely important field of qualitative economic expectations.

The paper proceeds as follows. Section 2 describes the methods for measuring disagreement in qualitative survey data. In section 3, we compare the results of our measures of disagreement for directional forecasts to those of conventional measures of disagreement for point forecasts. In section 4, we link forecast disagreement to forecasting technology. Section 5 explores the predictive power of disagreement, and section 6 concludes.

2 Measuring Disagreement in Directional Forecasts

For most qualitative surveys, the percentages of respondents who expect a variable to increase, stay the same or decrease are the only aggregate statistics that are available.2 To address this real-world limitation of the data, our methods of measuring disagreement require only aggregated data. Throughout this section, we develop our measures of disagreement in qualitative survey data.

2 When the individual responses are available, one can use the kappa statistic to measure (dis)agreement in qualitative survey data. See Song, Boulier, and Stekler (2009) for a recent application.

2.1 Disagreement in Predictions of a Single Variable

Carlson and Parkin (1975) present a method for obtaining point forecasts from qualitative survey data.
The Carlson-Parkin quantification method has been widely used in the literature; see Nardo (2003) and Pesaran and Weale (2006) for recent reviews. The method rests on two key assumptions. First, it assumes that survey respondents convert latent point forecasts to directional forecasts: if the point forecast f_it of respondent i at time t is larger than the threshold τ_{up,t}, the respondent reports "go up"; if f_it is between τ_{down,t} and τ_{up,t}, the respondent reports "stay same"; if f_it is below τ_{down,t}, the respondent reports "go down." The second assumption is that the point forecasts {f_it}, i = 1, ..., N_t, in period t are independently and identically distributed normal with mean μ_t and variance σ_t². While many studies have focused on the cross-sectional mean μ_t of the latent distribution of point forecasts, very few have analyzed the cross-sectional variance σ_t².3 We recognize the importance of the cross-sectional variance (or standard deviation) as a measure of disagreement among survey respondents. Below, we explain how to obtain estimates of μ_t and σ_t².

Let U_t and D_t be the percentages of "go up" and "go down" responses in period t. If the number of responses in t is sufficiently large, then

U_t ≈ Prob(f_it ≥ τ_{up,t}) = 1 − Φ((τ_{up,t} − μ_t)/σ_t)

and

D_t ≈ Prob(f_it ≤ τ_{down,t}) = Φ((τ_{down,t} − μ_t)/σ_t),

where Φ is the cumulative distribution function of the standard normal distribution. By assuming symmetric thresholds that do not vary over time, that is, τ_{up,t} = −τ_{down,t} = τ, we obtain

μ_t = τ [Φ^{-1}(D_t) + Φ^{-1}(1 − U_t)] / [Φ^{-1}(D_t) − Φ^{-1}(1 − U_t)],   (1)

σ_t = 2τ / [Φ^{-1}(1 − U_t) − Φ^{-1}(D_t)].   (2)

If we know the share of responses in each direction, the only unknown in equation (2) is τ. However, since τ is only a scaling constant, we can set it to an arbitrary value without losing information on the dynamics of this basic measure of disagreement. To further develop our model, we address two criticisms of the Carlson-Parkin method.
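To make the mapping from response shares to (μ_t, σ_t) concrete, the following sketch implements equations (1) and (2) with the standard-normal quantile function from Python's standard library. This is our illustration, not the authors' code; the latent parameters and τ = 1 in the round-trip check are hypothetical values.

```python
# Sketch of the basic Carlson-Parkin quantification, equations (1)-(2).
# Inputs are the aggregate shares of "go up" (U) and "go down" (D) responses;
# tau is an arbitrary scaling constant that does not affect the dynamics.
from statistics import NormalDist

def carlson_parkin(U, D, tau=1.0):
    """Return (mu_t, sigma_t) for one period from response shares."""
    a = NormalDist().inv_cdf(1.0 - U)   # equals (tau - mu) / sigma
    b = NormalDist().inv_cdf(D)         # equals (-tau - mu) / sigma
    mu = tau * (b + a) / (b - a)        # equation (1)
    sigma = 2.0 * tau / (a - b)         # equation (2)
    return mu, sigma

# Round-trip check with hypothetical values: the shares implied by a latent
# N(0.5, 2^2) distribution and tau = 1 should be quantified back exactly.
mu0, sigma0, tau = 0.5, 2.0, 1.0
U = 1.0 - NormalDist().cdf((tau - mu0) / sigma0)
D = NormalDist().cdf((-tau - mu0) / sigma0)
print(carlson_parkin(U, D, tau))  # recovers approximately (0.5, 2.0)
```

Since τ only rescales σ_t, any positive value yields the same standardized disagreement series.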
Carlson (1975) finds that the cross-sectional distribution of point inflation forecasts is non-Gaussian, and alternative distributions have been used: Dasgupta and Lahiri (1992) use the scaled t-distribution, and Batchelor (1981) experiments with skewed distributions. To alleviate this concern, we replace the normal distribution with a scaled t-distribution as an alternative to our standard measure. Similarly, others have criticized that the assumption of symmetric, time-constant thresholds may be violated in practice. Smith and McAleer (1995) estimate thresholds in a time-varying parameter framework, while Pesaran (1984) relates thresholds to observed variables. To address this issue, we estimate a simple time-varying parameter model in the spirit of Cooley and Prescott (1976). By allowing the thresholds to be asymmetric and time-varying, equations (1) and (2) become

μ_t = [τ_{up,t} Φ^{-1}(D_t) − τ_{down,t} Φ^{-1}(1 − U_t)] / [Φ^{-1}(D_t) − Φ^{-1}(1 − U_t)],   (3)

σ_t = −(τ_{up,t} − τ_{down,t}) / [Φ^{-1}(D_t) − Φ^{-1}(1 − U_t)].   (4)

We can rewrite equation (3) as

μ_t = ( Φ^{-1}(D_t)/[Φ^{-1}(D_t) − Φ^{-1}(1 − U_t)], −Φ^{-1}(1 − U_t)/[Φ^{-1}(D_t) − Φ^{-1}(1 − U_t)] ) (τ_{up,t}, τ_{down,t})′ = x_t′ β_t.   (5)

Moreover, we assume that the threshold vector β_t evolves according to a multivariate random walk

β_t = β_{t−1} + ν_t,   (6)

where Var[ν_t] = diag(σ_ν²). Equations (5) and (6) specify a state-space model in which the average point forecast μ_t is the measurement. Since μ_t is unobserved, the model cannot be estimated directly, but μ_t can be approximated in different ways. First, μ_t can be replaced by the realization of the target variable. This method implicitly assumes that there is no systematic discrepancy between the average point forecast and the actual, implying that forecasts are unbiased on average.

3 Notable exceptions include Dasgupta and Lahiri (1992) and Mankiw, Reis, and Wolfers (2004), who use σ_t² as a measure of dispersion in forecasts.
Alternatively, besides directional forecasts, questionnaires sometimes ask for a directional assessment of the past. Thresholds are then estimated by replacing x_t with the corresponding assessment and μ_t with the realization. Since the assessment of the past is not available in most surveys, we follow the first approach and replace μ_t with the realization. We introduce a measurement error, u_t, to allow for the possibility that the realization of the target variable, y_t, may not exactly equal the average point forecast. The measurement equation thus becomes

y_t = μ_t + u_t = x_t′ β_t + u_t,   (7)

where the second equality follows from substituting equation (5) for μ_t. We first estimate the model in equations (6) and (7) using the Kalman filter and then obtain filtered estimates of the time-varying thresholds. Substituting the estimated thresholds into equation (4) yields the disagreement measure.4 For more details on parameter estimation, see Section 2.3.

2.2 Disagreement in Predictions of Many Variables

We propose an econometric approach to analyzing disagreement among the growing number of qualitative forecasts for disaggregate variables. Suppose, for instance, that we are interested in measuring overall disagreement about the state of an economy from forecasts for its sectors, or overall disagreement for the Euro area from disagreeing forecasts for its member countries. We estimate such an index of overall disagreement from a dynamic factor model. Our disagreement index is related to Banternghansa and McCracken (2009) and Sinclair and Stekler (2012): using the Mahalanobis distance, the former measures overall disagreement in point forecasts across several variables, and the latter tests the difference between the vectors that contain different vintages of GDP estimates. We develop the model under the assumption that there is a single unobserved factor driving the disaggregate disagreement measures (Stock and Watson, 1991).
This assumption is justified by the observation that the pattern of estimated disagreement in empirical analyses is often quite similar across variables. This commonality is in turn reflected in the fact that most of the total variation in forecast disagreement of macro variables is well summarized by a single common factor.5

4 In another experiment, we also estimate the Carlson-Parkin model with time-constant but asymmetric thresholds.

5 While it would be possible to model disagreement using more than one factor, using a single factor brings large computational benefits and offers a simple interpretation. In a recent paper, Carriero, Clark, and Marcellino (2012) show that the cost paid in using only one factor for volatilities is more than offset by the possibility of using a larger information set in Bayesian VARs.

Let σ_it denote a disaggregate disagreement measure computed via equation (2) or (4) for variable i, i = 1, ..., n, at period t, t = 1, ..., T. Let Y_t denote an n × 1 vector of the σ_it that are assumed to move contemporaneously with overall disagreement. We decompose Y_t into two stochastic components: the common unobserved variable, which we call "overall disagreement," F_t, and an n-dimensional component, u_t, that represents idiosyncratic movements in the variable-specific disagreement series. Both F_t and u_t are modeled to have stochastic structures. This setup suggests the following model specification:

Y_t = c_0 + γ F_t + u_t,   (8)

φ(L) F_t = ξ_t,   (9)

D(L) u_t = ν_t,   (10)

where L denotes the lag operator, and φ(L) and D(L) are lag polynomials of orders p and q, respectively. In equation (8), F_t enters each of the variables contemporaneously with variable-specific weights. For the purpose of parameter identification, we normalize σ_ξ² to 0.01 and further assume that F_t and u_t are mutually uncorrelated at all leads and lags. This assumption requires that D(L) is diagonal and that the n × 1 vector of disturbances has mutually uncorrelated elements: D(L) = diag(d_1(L), . . .
, d_n(L)), and Var[(ξ_t, ν_t′)′] = diag(σ_ξ², σ_{ν_1}², ..., σ_{ν_n}²). Equations (8)-(10) form a state-space model, in which equation (8) is the measurement equation and equations (9)-(10) are the state equations. We perform maximum likelihood estimation and obtain filtered overall disagreement using the Kalman filter. The following section provides more details on the estimation process.

2.3 Estimation of State-Space Models

Both the time-varying thresholds model in Section 2.1 and the dynamic factor model in Section 2.2 are linear Gaussian state-space models. In this section we briefly discuss the Kalman filter algorithm used to estimate both models. The presentation is based on Durbin and Koopman (2012), an excellent reference for state-space modeling. We begin by describing a general linear Gaussian state-space model that nests our two models:

y_t = c + Z_t α_t + ε_t,   (11)

α_{t+1} = T_t α_t + η_t,   (12)

where equation (11) is the measurement equation, equation (12) is the state equation, y_t is a k × 1 vector of observed variables, and α_t is an m × 1 vector of state variables. The model assumes that the disturbances ε_t and η_t are serially independent and independent of each other at all leads and lags, with ε_t ∼ N(0, H_t) and η_t ∼ N(0, Q_t). Given Y_t = (y_1, ..., y_t), define a_{t|t−1} = E[α_t|Y_{t−1}], a_{t|t} = E[α_t|Y_t], P_{t|t−1} = Var[α_t|Y_{t−1}], and P_{t|t} = Var[α_t|Y_t]. Moreover, let v_t = y_t − E[y_t|Y_{t−1}] = y_t − c − Z_t a_{t|t−1} be the one-step-ahead forecast error of y_t given Y_{t−1}, and F_t = Var(v_t|Y_{t−1}) = Z_t P_{t|t−1} Z_t′ + H_t be its variance. We assume that the initial state vector has a Gaussian distribution, α_1 ∼ N(a_1, P_1), where a_1 and P_1 are known. By the law of iterated expectations, the log-likelihood is log L(Y_T) = Σ_{t=1}^T log p(y_t|Y_{t−1}), with p(y_1|Y_0) = p(y_1). Since y_t|Y_{t−1} ∼ N(c + Z_t a_{t|t−1}, F_t), the log-likelihood then becomes

log L(Y_T) = −(Tk/2) log 2π − (1/2) Σ_{t=1}^T [ log|F_t| + v_t′ F_t^{−1} v_t ].   (13)
We calculate v_t and F_t in equation (13) with the Kalman filter, which works as follows. Starting with N(a_{t|t−1}, P_{t|t−1}), the distribution of α_t given Y_{t−1}, we obtain v_t = y_t − c − Z_t a_{t|t−1} and F_t = Z_t P_{t|t−1} Z_t′ + H_t. After observing y_t, we update our inference about the state vector α_t by the equations

a_{t|t} = a_{t|t−1} + P_{t|t−1} Z_t′ F_t^{−1} v_t,   (14)

P_{t|t} = P_{t|t−1} − P_{t|t−1} Z_t′ F_t^{−1} Z_t P_{t|t−1}.   (15)

Based on the updated inference, we forecast the distribution of the state vector in period t + 1 as

a_{t+1|t} = T_t a_{t|t},   (16)

P_{t+1|t} = T_t P_{t|t} T_t′ + Q_t.   (17)

The recursions (14)-(17) constitute the Kalman filter for model (11)-(12) and enable us to update our knowledge of the system with each new observation. We use the exact initial Kalman filter of Koopman (1997). Loosely speaking, this algorithm deals with the problem of initialization by taking the limit of the log-likelihood function as P_1 approaches infinity.

Next, we show how the models in Sections 2.1 and 2.2 are special cases of the general model presented above. In terms of the notation of the general model, the time-varying threshold Carlson-Parkin model in (6)-(7) can be reformulated as

y_t = y_t, c = 0, Z_t = x_t′, ε_t = u_t, H_t = σ_u², α_{t+1} = β_{t+1}, T_t = I_2, η_t = ν_t, Q_t = diag(σ_ν², σ_ν²).

Similarly, the dynamic factor model in (8)-(10) arises as a special case of the general model if we set the vectors and matrices of the measurement and state equations as

y_t = Y_t, c = c_0, Z_t = (γ, 0_{k×(p−1)}, I_k), ε_t = 0, H_t = 0,

α_{t+1} = (F_{t+1}, F_t, ..., F_{t−p+1}, u_{1,t+1}, ..., u_{k,t+1})′, η_t = (ξ_t, 0, ..., 0, ν_t′)′,

T_t = diag(C, D_1), where C is the p × p companion matrix built from the autoregressive coefficients φ(1), ..., φ(p) of the factor and D_1 = diag(d_1(1), ..., d_k(1)),

Q_t = diag(σ_ξ², 0, ..., 0, σ_{ν_1}², . . .
, σ_{ν_k}²), where we assume that q = 1; that is, the idiosyncratic errors of the individual disagreement measures have AR(1) dynamics.

3 Estimation of Disagreement from Qualitative Survey Data

In this section, we compare measures of disagreement from directional forecasts to those from point forecasts of Euro area real GDP growth and inflation.

3.1 Data

We use two data sets: the ZEW financial market survey and the European Central Bank (ECB)'s survey. From the ZEW financial market survey, we obtain directional forecasts. Since December 1991, this survey has collected the monthly responses of roughly 300 professionals in the German financial sector. The survey asks respondents for six-month directional forecasts of economic activity, inflation, different interest rates, stock market indexes and exchange rates for Germany, Italy, France, Great Britain, the Euro area, the United States, etc. Nolte and Pohlmeier (2007) explore the predictive ability of quantified forecasts based on the ZEW data. The so-called ZEW indicator of economic sentiment is formed from the survey and receives wide media coverage. According to Entorf, Gross, and Steiner (2012), the ZEW indicator has a significant high-frequency impact on both the returns and the volatility of the German stock market index DAX.

In conjunction with the qualitative data, we obtain point forecasts from the ECB's Survey of Professional Forecasters (SPF). Since the first quarter of 1999, the ECB SPF has documented the quarterly responses of about 59 respondents on average. Respondents are professional forecasters with economic expertise, and about 75 percent of them are located in the European Union. The survey asks participants for forecasts of only three variables: Euro area inflation (Harmonized Index of Consumer Prices; HICP), real GDP growth and the unemployment rate. However, for each variable the respondents provide point as well as density forecasts at several fixed horizons.
Both surveys feature forecasts for Euro area inflation and real GDP growth. There are, however, differences in the wording of the questions. While the ECB SPF point forecasts explicitly refer to Euro area HICP inflation and real GDP growth, the ZEW directional forecasts refer to the "annual inflation rate" and the "overall macroeconomic situation" in the Euro area.6 With these differences in mind, we interpret the ZEW directional forecasts as if they referred to the same variables as the ECB SPF. In addition, the two surveys are conducted at different times. While the monthly ZEW survey is typically conducted in the first half of a month, the quarterly ECB survey - at least since 2002Q2 - is conducted in the third week of the first month of a quarter.7 To synchronize the timing of the two surveys, we match the quarterly ECB SPF forecasts with the ZEW forecasts conducted in the first month of the quarter. Below, we assess the consistency of our new measures of qualitative expectations with the more commonly studied quantitative measures. We apply our measures of disagreement to the ZEW directional forecasts of Euro area inflation and real GDP growth, and assess their consistency with benchmark disagreement measures obtained from ECB SPF point forecasts.

3.2 Disagreement about Euro Area GDP Growth

We compare the results of our measurement approach to a benchmark measure computed as the cross-sectional standard deviation of the ECB SPF point forecasts for Euro area real GDP growth in the next twelve months.8 The top panel of Figure 1 depicts the Carlson-Parkin measure of disagreement (equation 2) against the benchmark measure. Because the disagreement measure for directional forecasts is only determined up to a constant of scale, we standardize both series.9 A correlation of .60 shows that the two measures are closely related.
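The comparison underlying this correlation amounts to standardizing the two disagreement series and computing a Pearson correlation. A minimal sketch with made-up series (the actual ZEW and ECB SPF data are not reproduced here):

```python
# Standardize two (hypothetical) disagreement series and compute their
# Pearson correlation, as in the comparison with the ECB SPF benchmark.
from statistics import mean, pstdev

def standardize(x):
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

def correlation(x, y):
    # Pearson correlation as the mean cross-product of z-scores
    zx, zy = standardize(x), standardize(y)
    return mean(a * b for a, b in zip(zx, zy))

# Made-up quarterly disagreement series, for illustration only.
zew = [0.8, 1.1, 0.9, 1.6, 2.1, 1.8, 1.2, 1.0]
spf = [0.5, 0.7, 0.6, 1.1, 1.5, 1.4, 0.9, 0.8]
print(round(correlation(zew, spf), 3))
```

Because both series are standardized, the correlation is invariant to the arbitrary scaling constant τ in the quantification.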
The bottom panel of Figure 1 shows the overall disagreement index, extracted from individual disagreement measures for German, French and Italian economic activity. Although very similar to the measure based on directional forecasts for the Euro area, our measure of overall disagreement tracks the benchmark measure even more closely. The correlation between the overall disagreement and the benchmark is .70, which is .10 higher than the correlation between the disagreement constructed from the forecasts for the Euro area and the benchmark. This result implies that there are gains from estimating disagreement at the country level and then pooling the country-level disagreement measures, relative to estimating disagreement at the aggregate level. The gains arise from the similarity of disagreement across the three country-level variables, which is reflected in the high correlations (ranging from 0.76 to 0.99) of each disagreement estimate with the overall disagreement. Our finding is consistent with the result in Marcellino, Stock, and Watson (2003) that there are typically gains from forecasting economic series at the country level and then pooling the forecasts, relative to forecasting at the aggregate level.

6 The corresponding questions read, "In the medium-term (6 months), the macroeconomic situation will improve, not change or worsen;" and "In the medium-term (6 months), the annual inflation rate will rise, stay the same or fall."

7 Before 2002Q2 the ECB SPF survey had typically been conducted one or two weeks later.

8 As an alternative measure of disagreement from point forecasts, we have used the inter-quartile range. None of our results change qualitatively. For brevity, we do not report the results here.

9 The remaining figures show standardized disagreement measures as well.

We experimented with several modifications of the basic Carlson-Parkin measure.
We replaced the latent normal distribution with a scaled t-distribution with ten degrees of freedom. The top panel of Figure 2 depicts this disagreement measure against the ECB SPF benchmark. The correlation between the disagreement measure using the scaled t-distribution and the benchmark is moderately higher than that between the basic Carlson-Parkin measure and the benchmark (.65 vs. .60). Moreover, we relaxed the assumption of symmetric thresholds. Appendix A.1 describes how we constructed the actuals used to estimate the asymmetric thresholds. The middle panel of Figure 2 depicts the disagreement measure that assumes asymmetric but time-constant thresholds. It turns out that this alteration does not increase the correlation with the benchmark over the basic Carlson-Parkin measure. Finally, we estimated disagreement allowing for asymmetric and time-varying thresholds. As shown in the bottom panel of Figure 2, this measure shows no improvement over the basic Carlson-Parkin measure in terms of the correlation with the benchmark disagreement measure.

3.3 Disagreement about Euro Area Inflation

As in our analysis of Euro area GDP growth, we apply our new measures of disagreement to forecasts of Euro area inflation. We compare disagreement measures obtained from ZEW directional forecasts with a benchmark measure computed as the cross-sectional standard deviation of the 12-month HICP inflation forecasts from the ECB SPF. The upper panel of Figure 3 compares the basic Carlson-Parkin disagreement measure to the benchmark. The two series display a correlation of .40. Surprisingly, the disagreement measure from directional forecasts seems to lead the benchmark disagreement from point forecasts. Indeed, the correlation between the two rises to .54 when the Carlson-Parkin measure is lagged by one quarter. The relationship is even more pronounced when we use the approach in Section 2.2, whose results are depicted in the lower panel of Figure 3.
We first calculate Carlson-Parkin disagreement measures for the member countries Germany, Italy and France. Then, we extract overall disagreement from the three country-level inflation disagreement series using the dynamic factor model. By taking account of the similarity of disagreement across countries, this bottom-up approach increases the correlation between the disagreement measure and the benchmark from .40 to .49. Again, if we lag the overall disagreement by one period, the correlation rises markedly, to .64.

We also explored three modifications of the basic Carlson-Parkin approach, depicted in Figure 4. The first modification relies on a scaled t-distribution with ten degrees of freedom. Its correlation with the benchmark disagreement is slightly lower than that of the basic Carlson-Parkin measure (.38 vs. .40). The second modification uses asymmetric but time-constant thresholds and achieves a minor improvement over the basic Carlson-Parkin measure (.43 vs. .40). The third alternative, a model with time-varying and asymmetric thresholds, does not show a clear improvement over the simple Carlson-Parkin approach.

To summarize, the basic Carlson-Parkin approach accurately measures disagreement in predictions of a single variable. The various modifications, including using a scaled t-distribution and allowing asymmetric and/or time-varying thresholds, do not produce significant improvements over the basic approach. The overall disagreement index constructed from individual disagreement series performs very well and closely tracks the benchmark measures of disagreement. These results should nevertheless be interpreted with caution, since they are based on only thirteen years of quarterly data. As the time period covered by the data is extended, more robust analyses can be conducted.
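The bottom-up construction of Sections 2.2 and 2.3 can be illustrated on simulated data. The sketch below is ours, not the paper's estimation code: it fixes the model parameters at assumed values rather than estimating them by maximum likelihood, sets p = q = 1, and runs the filtering recursions (14)-(17) to extract a common factor from three artificial country-level disagreement series. The loadings, AR coefficients and variances are all hypothetical.

```python
# Sketch of extracting "overall disagreement" from several country-level
# disagreement series with a one-factor model, via the Kalman filter.
import numpy as np

rng = np.random.default_rng(0)

# --- simulate n = 3 disagreement series driven by one AR(1) factor ---
T, n, phi = 200, 3, 0.8
gamma = np.array([1.0, 0.8, 1.2])   # assumed factor loadings
d = np.array([0.5, 0.4, 0.6])       # assumed idiosyncratic AR(1) coefficients
F = np.zeros(T); u = np.zeros((T, n))
for t in range(1, T):
    F[t] = phi * F[t - 1] + rng.normal(scale=0.1)   # sigma_xi^2 = 0.01
    u[t] = d * u[t - 1] + rng.normal(scale=0.05, size=n)
Y = gamma * F[:, None] + u          # measurement equation (8), with c0 = 0

# --- Kalman filter: state alpha_t = (F_t, u_1t, ..., u_nt)' ---
m = n + 1
Tmat = np.diag(np.concatenate(([phi], d)))           # transition matrix
Q = np.diag(np.concatenate(([0.01], 0.05**2 * np.ones(n))))
Z = np.hstack([gamma[:, None], np.eye(n)])           # measurement matrix
a = np.zeros(m)
P = np.eye(m)                                        # rough diffuse prior
filtered_F = np.zeros(T)
for t in range(T):
    v = Y[t] - Z @ a                                 # forecast error
    Fv = Z @ P @ Z.T + 1e-8 * np.eye(n)              # its variance (+ jitter)
    K = P @ Z.T @ np.linalg.inv(Fv)
    a = a + K @ v                                    # update, eq. (14)
    P = P - K @ Z @ P                                # update, eq. (15)
    filtered_F[t] = a[0]                             # filtered overall index
    a = Tmat @ a                                     # predict, eq. (16)
    P = Tmat @ P @ Tmat.T + Q                        # predict, eq. (17)

print(np.corrcoef(filtered_F, F)[0, 1])  # high by construction in this setup
```

In the paper's application the parameters would instead be chosen to maximize the likelihood in equation (13), with the exact initial filter of Koopman (1997) handling the diffuse prior.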
4 Forecasting Technology and Forecast Disagreement

We demonstrate the usefulness of our disagreement measures in two applications: the source of disagreement and the predictive power of disagreement. We address the first issue in this section and the second in the next section.

The literature offers several prominent theories about why forecasters disagree. In the sticky information model of Mankiw and Reis (2002), forecasters only occasionally pay attention to news, and this inattention endogenously generates disagreement in aggregate expectations. In contrast, Sims (2003) and Woodford (2003) develop the noisy information model, in which forecasters continuously update their information but observe noisy signals about the true state. Andrade and Le Bihan (2010) and Coibion and Gorodnichenko (2012) consider both sticky and noisy information models for professional forecasters. While the latter find the basic noisy information model to be the best characterization of the expectations formation process of professional forecasters, the former conclude that neither model can quantitatively replicate the forecast errors and disagreement observed in the SPF data. A third explanation for the existence of disagreement focuses on the strategic behavior of forecasters, a possibility explored by Ehrbeck and Waldmann (1996), Laster, Bennett, and Geoum (1999) and Ottaviani and Sørensen (2006). These models typically assume that, after observing the same public information, forecasters have incentives to report distorted predictions out of reputational concerns. However, the strategic forecasting models seem less relevant in our case because of the anonymity of respondents in the ZEW survey.
With a focus on expectation formation, Kandel and Pearson (1995), Lahiri and Sheng (2008), Patton and Timmermann (2010) and Manzan (2011) show that forecasters may disagree due to their different prior beliefs or their differential interpretation of public information. These papers provide indirect evidence that forecasters use different models and judgment in revising their predictions. In this section, we utilize a special questionnaire, attached to the ZEW financial market survey in March 2011, to explore directly the role of heterogeneous models in generating forecast disagreement. Among other things, the questionnaire asked respondents about the importance of several methods in their directional forecasts for economic activity. Respondents used the categories "small," "medium" or "high" to assess the importance of each of the following methods: econometric modeling, fundamental analysis, technical analysis, judgment, in-house research and consensus forecasts. Table 1 summarizes the results. The importance of the methods varies widely across the panel. For instance, some respondents pay little attention to econometric modeling, in-house research and consensus forecasts, while others assign high importance to these methods. The respondents disagree less on technical analysis, which most of them do not use. On the other hand, most of the respondents rely heavily on fundamental analysis and judgment. This finding coincides with the finding in Batchelor and Dua (1990) that the single most important forecasting technique used by the Blue Chip panel was judgment. We expect that the forecasting technology of a respondent is a mixture of the methods mentioned above. In order to identify groups that use similar mixtures of methods, we apply cluster analysis. More specifically, we use the K-means clustering algorithm of Hartigan and Wong (1979) on coded data, where we denote the response "small" by 1, "medium" by 2 and "high" by 3.
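The coding-plus-clustering step can be sketched as follows. We use a plain Lloyd-style K-means as a stand-in for the Hartigan-Wong algorithm used in the paper, and the response matrix below is made up; none of the numbers come from the ZEW questionnaire.

```python
# Toy illustration of clustering coded survey responses ("small" = 1,
# "medium" = 2, "high" = 3) across six forecasting methods.
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=50):
    """Lloyd-style K-means; returns labels, centers, within-cluster SS."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):                  # guard empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
    wss = ((X - centers[labels]) ** 2).sum()         # within-cluster SS
    return labels, centers, wss

# Hypothetical coded answers: columns are econometric modeling, fundamental
# analysis, technical analysis, judgment, in-house research, consensus.
X = np.array([[1, 3, 1, 3, 1, 1], [1, 3, 1, 3, 3, 3],
              [3, 2, 1, 2, 1, 1], [3, 3, 2, 2, 3, 3],
              [1, 3, 1, 3, 1, 2], [3, 2, 1, 3, 1, 1]])
labels, centers, wss = kmeans(X, k=2)
print(labels, round(float(wss), 2))
```

Running this for increasing k and tracking how wss falls traces out the within-cluster sum-of-squares curve used to choose the number of clusters.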
We choose the number of clusters by inspecting the incremental reduction in the within-cluster sum of squares achieved by each additional cluster. Since this curve flattens out beyond four clusters, we assign the respondents to four clusters. Table 2 gives the average (coded) value of the variables and the number of respondents for each of the four clusters. The clusters differ most substantially in the importance of econometric modeling, in-house research and consensus forecasts. Respondents in clusters one and four put little weight on econometric modeling, whereas clusters two and three include respondents who rely heavily on econometrics. In clusters one and three, forecasters attach a low weight to in-house research and consensus forecasts, whereas clusters two and four are much more in favor of accounting for the forecasts of others. To summarize, the four clusters differ mainly in two dimensions: the use of econometric modeling (clusters 1 and 4 vs. 2 and 3), and the weight on the forecasts of others (clusters 1 and 3 vs. 2 and 4). We use the cluster assignments to examine the relationship between forecasting technologies and the level of disagreement. The four panels of Figure 5 depict cluster-level disagreement against full-sample disagreement. In Table 2 we also report the average level of disagreement for each cluster and the result of testing whether average cluster-level disagreement differs from average full-sample disagreement. We find that clusters one and four have below-average disagreement, cluster two does not display significantly different disagreement from the full sample, and cluster three has above-average disagreement. Linking disagreement levels to the forecasting technology of the respondents, we find that the use of econometric modeling boosts disagreement, whereas paying close attention to the forecasts of others reduces disagreement.
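The unequal-variances mean comparison of Welch (1947) used in Table 2 can be sketched as follows; the two samples in the example are made-up disagreement series, not the actual cluster-level data.

```python
import math

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for testing
    equality of two means without assuming equal variances."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)  # sample variances
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Made-up example: cluster-level vs. full-sample disagreement series
t, df = welch_t([1.1, 1.3, 1.2, 1.4], [1.3, 1.4, 1.5, 1.2, 1.6, 1.4])
```

The statistic is then compared with Student-t critical values at the approximated degrees of freedom, with a two-sided alternative as described in the Table 2 notes.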
Therefore, different forecasting technologies induce different levels of disagreement, which implies that research on disagreement should pay close attention to the forecasting technology of the group of forecasters under study. Furthermore, conflicting results in studies with similar methodologies may be explained by differences in the forecasting technologies of their samples.

5 Does Disagreement Have Predictive Value?

As a second application of our disagreement measures, in this section we examine the potential predictive power of disagreement in qualitative survey data for economic activity. Because of the relatively short time span of the ZEW survey, we use another well-known business survey, conducted by the Institute for Supply Management (ISM). Each month the ISM sends supply chain managers and business executives a questionnaire that asks about their firm's production, employment and other information for the preceding month. Respondents report whether a variable has gone “up,” “down” or “stayed the same” over the previous month. The ISM data have three main advantages for forecasting purposes. First, the survey has a long history, having been conducted since January 1948. Second, it is timely, as new ISM data come out on the first business day of every month. Third, the ISM data are subject to minimal revisions. We focus on new orders, production, employment and prices, because these series are available from January 1948; the other series begin at various dates between June 1976 and January 1997. We construct the standard Carlson-Parkin disagreement measure for each of the four variables and also compute an overall disagreement index using the dynamic factor model. We then explore the potential of these disagreement measures for forecasting U.S. industrial production.
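As a rough illustration of the probability approach behind the Carlson-Parkin measure (the paper's own version, developed in Section 2, relaxes these assumptions), the textbook setup treats individual expectations as normal and inverts the normal CDF at the observed shares of “up” and “down” answers; the implied cross-sectional standard deviation then serves as a disagreement measure. The symmetric threshold `c` below is a made-up illustration, not an estimated quantity.

```python
from statistics import NormalDist

def carlson_parkin(up, down, c=0.5):
    """Textbook Carlson-Parkin inversion: respondents answer "up" if their
    expectation exceeds +c and "down" if it falls below -c, expectations
    being N(mu, sigma^2). Given the survey fractions `up` and `down`,
    recover the implied mean and standard deviation (disagreement)."""
    inv = NormalDist().inv_cdf
    a = inv(1 - up)    # equals (c - mu) / sigma
    b = inv(down)      # equals (-c - mu) / sigma
    sigma = 2 * c / (a - b)
    mu = -c * (a + b) / (a - b)
    return mu, sigma

# Made-up monthly shares: 30% "up", 20% "down"
mu, sigma = carlson_parkin(up=0.30, down=0.20)
```

By construction, symmetric shares imply a zero mean, and more answers in the extreme categories imply a larger implied dispersion.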
We consider the following four models:

y^h_{t+h} = a y^h_t + ε_{t+h},                        (18)
y^h_{t+h} = a y^h_t + b_1 D_t + ε_{t+h},              (19)
y^h_{t+h} = a y^h_t + b_1 PMI_t + ε_{t+h},            (20)
y^h_{t+h} = a y^h_t + b_1 PMI_t + b_2 D_t + ε_{t+h},  (21)

where y^h_t = ln(IP_t) − ln(IP_{t−h}) is the log growth rate of industrial production from t−h to t, D_t is the disagreement in month t, and PMI_t is the ISM Purchasing Managers' Index (PMI) for that month. We include the PMI in the regression because several studies indicate that this index has forecasting power for GDP and the business cycle (see, inter alia, Dasgupta and Lahiri (1993) and Banerjee and Marcellino (2006)). By comparing model (19) to (18), and model (21) to (20), we measure the extent to which a disagreement measure adds value in forecasting industrial production. Each of the four models is estimated with h = 1, 3, 6, 9, 12 months using real-time data on the industrial production index from archival Federal Reserve economic data (ALFRED). Additionally, models (19) and (21) are estimated with each of the five disagreement measures separately. The forecasts are produced using an expanding estimation window. Starting in January 1970, we estimate each model using data between January 1948, when the ISM survey was first conducted, and December 1969, the latest period for which the target variable had been publicly observed in January 1970. Based on the estimated parameters, we make a one-step-ahead forecast with each model. If the forecast horizon is h = 1, the forecast refers to the log growth in industrial production from December 1969 to January 1970. Similarly, forecasts with horizon h = 12 refer to the log growth in industrial production from December 1969 to December 1970. Next, we proceed to February 1970 by extending the estimation window from January 1948 to January 1970 and making one-step-ahead forecasts again.
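The expanding-window scheme just described can be sketched as follows; this is a toy version of models (18)-(19) on made-up arrays, not the actual ALFRED/ISM data, and the function name and sample sizes are illustrative.

```python
import numpy as np

def expanding_window_forecasts(y, h, x=None, start=5):
    """Pseudo out-of-sample forecasts: at each origin t >= start, regress
    y_{s+h} on a constant, y_s and (optionally) an extra regressor x_s
    (e.g. the PMI or a disagreement measure), using only observations with
    s + h <= t, then forecast y_{t+h}."""
    preds = []
    for t in range(start, len(y) - h):
        s = np.arange(t - h + 1)                  # estimation sample: s + h <= t
        cols = [np.ones(len(s)), y[s]]
        if x is not None:
            cols.append(x[s])
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, y[s + h], rcond=None)
        row = [1.0, y[t]] + ([x[t]] if x is not None else [])
        preds.append(float(np.dot(row, coef)))    # forecast of y_{t+h}
    return np.array(preds)

# Made-up series obeying y_{t+1} = 0.5 * y_t + 0.1 exactly, so the fitted
# regression recovers the relation and the forecasts match the realizations.
y = np.empty(15)
y[0] = 1.0
for t in range(14):
    y[t + 1] = 0.5 * y[t] + 0.1
fc = expanding_window_forecasts(y, h=1, start=5)
```

In the paper's application each forecast origin uses only data publicly observed at that date, which is what the `s + h <= t` restriction mimics.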
We continue analogously until we reach January 2010, when the estimation window includes the observations from January 1948 to December 2009 and a final set of one-step-ahead forecasts is made. This procedure produces a set of forecasts for industrial production.10

10 We also repeat the experiment with rolling estimation windows of 10 and 20 years of data and find similar results. To save space, the results are not reported here.

We evaluate the forecasts against the first-release data, available one month after the month to which the industrial production index refers. We compute the mean squared forecast error (MSFE) for each model and compare the MSFEs of the models that include disagreement to those of the models without disagreement. We test for equal predictive ability for these pairs of models (with vs. without the disagreement index) using the Clark and West (2007) test statistic.11 Table 3 shows the results of our forecast evaluation. Three points are worth noting. First, comparing a simple AR(1) model (column 3) to the AR(1) model with the PMI as an additional regressor (column 5), we see that the PMI adds power to predicting industrial production at all horizons. Second, the disagreement extracted from ISM survey data can be used to forecast industrial production with varying degrees of success (columns 7 and 8). More specifically, there is not much improvement when disagreement in new orders or production is added to the regression. However, including the estimated disagreement in employment or prices as an additional regressor significantly decreases the MSFE in both the simple AR(1) model and the AR(1) model with the PMI. This substantial decrease implies that the disagreement in these two variables is useful for forecasting industrial production beyond the PMI. Third, including the overall disagreement
11 Note that when comparing nested models, the alternative hypothesis is that the larger model has higher predictive ability than the nested model, because the test refers to predictive ability at the population level. Thus, if the additional variables have predictive power, their coefficients should be non-zero, indicating an improvement in predictive ability over the nested model.

constructed from all four variables in the models does not significantly reduce the MSFE. The estimation results of the dynamic factor model make the reason clear: whereas the disagreement in new orders and production loads heavily on the common factor (factor loadings of 1.197 and 1.015, respectively), the disagreement in employment has only a moderate effect (factor loading of 0.387), and the disagreement in prices does not load significantly on the factor (factor loading of 0.048). Therefore, the disagreement in new orders and production dominates the overall disagreement, yielding an insignificant result. We assess the robustness of the predictive power of disagreement by incorporating the mean forecast in equations (18)-(21), as many studies confirm the predictive value of the mean (see, e.g., Elliott and Timmermann (2005) and Legerstee and Franses (2010)). For industrial production, we find that including the mean slightly reduces the MSFE and that disagreement still adds significant value to the forecasts.12 In thinking about why disagreement has predictive power for industrial production and possibly other macroeconomic variables, note that disagreement might pick up omitted variables (Driver, Trapani, and Urga, 2012) or proxy for forecast uncertainty (Lahiri and Sheng, 2010).

6 Conclusion

We present two methods for measuring disagreement in qualitative data on economic expectations. The first method quantifies the level of disagreement in predictions of a single variable.
The second constructs an index of overall disagreement across several target variables. Our empirical results show that our disagreement measures estimated from directional forecasts closely track the conventional disagreement measures obtained from point forecasts. We apply our disagreement measures to analyze the source and economic significance of disagreement. We find that forecasters use a wide range of techniques in interpreting and weighting public information, resulting in substantial heterogeneity in their forecasts. Furthermore, our analysis shows that disagreement estimated from qualitative survey data contains economically meaningful information for forecasting purposes. We hope to spur further research into disagreement among forecasters by providing appropriate tools for the analysis of qualitative survey data. Such tools are especially important because many surveys, in particular those with large panels of non-professional forecasters, collect directional forecasts only. Given the large cross-sectional dimension of many of these data sets, qualitative data may provide invaluable insight into the determinants and use of disagreement in economic modeling, forecasting and policy.

12 To save space, these results are not reported here. They are available upon request.

A Appendix

A.1 Construction of Realizations

Realizations are required for the estimation of the time-varying, asymmetric thresholds model and the time-constant, asymmetric thresholds model of Section 2.1. They have been constructed as follows. In the case of inflation, respondents provide six-month-ahead forecasts of the annual inflation rate of the Euro area (“annual inflation will rise/stay the same/fall”). As the observed counterpart, we use the six-month-ahead change in the annual HICP inflation rate. We account for the publication lag by lagging the actual series by one month.
The idea is to start from the latest figure available at the time the directional forecasts are made. A one-month lag is sufficient because we use flash estimates, which are available from October 2001. For example, the six-month-ahead change in the annual inflation rate for forecasts made in January 2012 is computed as the annual inflation rate in June 2012 minus the annual inflation rate in December 2011. Because flash estimates are not available before October 2001, we impute final revised data for inflation before that date. In the case of GDP, the survey questionnaire asks respondents to forecast whether the overall macroeconomic situation will improve, stay the same or worsen over the next six months. As the observed counterpart, we employ the six-month-ahead growth of real GDP relative to the latest figure available at the time of the forecast. In the third month of each quarter, we use the GDP growth of the current and next quarter as the realization. Due to the quarterly frequency of GDP, we only use directional forecasts from the last month of each quarter for estimation. Therefore, the estimates of the time-varying thresholds refer to the third month of a quarter. We adopt a pragmatic approach and use the same estimated thresholds for the first month of the following quarter.

References

Andrade, P., and H. Le Bihan (2010): “Inattentive Professional Forecasters,” Banque de France Working Paper no. 307.

Banerjee, A., and M. Marcellino (2006): “Are There Any Reliable Leading Indicators for US Inflation and GDP Growth?,” International Journal of Forecasting, 22(1), 137–151.

Banternghansa, C., and M. McCracken (2009): “Forecast Disagreement Among FOMC Members,” Federal Reserve Bank of St. Louis Working Paper 2009-059A.

Batchelor, R. (1981): “Aggregate Expectations Under the Stable Laws,” Journal of Econometrics, 16(2), 199–210.

Batchelor, R., and P.
Dua (1990): “Forecaster Ideology, Forecasting Technique, and the Accuracy of Economic Forecasts,” International Journal of Forecasting, 6(1), 3–10.

Carlson, J. A. (1975): “Are Price Expectations Normally Distributed?,” Journal of the American Statistical Association, 70(352), 749–754.

Carlson, J. A., and M. J. Parkin (1975): “Inflation Expectations,” Economica, 42, 123–138.

Carriero, A., T. E. Clark, and M. Marcellino (2012): “Common Drifting Volatility in Large Bayesian VARs,” CEPR Working Paper no. DP8894.

Clark, T. E., and K. D. West (2007): “Approximately Normal Tests for Equal Predictive Accuracy in Nested Models,” Journal of Econometrics, 138, 291–311.

Coibion, O., and Y. Gorodnichenko (2012): “What Can Survey Forecasts Tell Us about Information Rigidities?,” Journal of Political Economy, 120, 116–159.

Cooley, T. F., and E. C. Prescott (1976): “Estimation in the Presence of Stochastic Parameter Variation,” Econometrica, 44(1), 167–84.

Dasgupta, S., and K. Lahiri (1992): “A Comparative Study of Alternative Methods of Quantifying Qualitative Survey Responses Using NAPM Data,” Journal of Business and Economic Statistics, 10(4), 391–400.

Dasgupta, S., and K. Lahiri (1993): “On the Use of Dispersion Measures from NAPM Surveys in Business Cycle Forecasting,” Journal of Forecasting, 12(3-4), 239–253.

Driver, C., L. Trapani, and G. Urga (2012): “On the Use of Cross-Sectional Measures of Uncertainty,” Working paper, Cass Business School, City University London.

Durbin, J., and S. Koopman (2012): Time Series Analysis by State Space Methods, 2nd edn. Oxford University Press, Oxford.

Ehrbeck, T., and R. Waldmann (1996): “Why Are Professional Forecasters Biased? Agency versus Behavioral Explanations,” The Quarterly Journal of Economics, 111(1), 21–40.

Elliott, G., and A. Timmermann (2005): “Optimal Forecast Combination under Regime Switching,” International Economic Review, 46(4), 1081–1102.

Entorf, H., A. Gross, and C.
Steiner (2012): “Business Cycle Forecasts and their Implications for High Frequency Stock Market Returns,” Journal of Forecasting, 31(1), 1–14.

Hartigan, J. A., and M. A. Wong (1979): “A K-Means Clustering Algorithm,” Applied Statistics, 28, 100–108.

Kandel, E., and N. D. Pearson (1995): “Differential Interpretation of Public Signals and Trade in Speculative Markets,” Journal of Political Economy, 103(4), 831–872.

Koopman, S. J. (1997): “Exact Initial Kalman Filtering and Smoothing for Nonstationary Time Series Models,” Journal of the American Statistical Association, 92(440), 1630–1638.

Lahiri, K., and X. Sheng (2008): “Evolution of Forecast Disagreement in a Bayesian Learning Model,” Journal of Econometrics, 144(2), 325–340.

Lahiri, K., and X. Sheng (2010): “Measuring Forecast Uncertainty by Disagreement: The Missing Link,” Journal of Applied Econometrics, 25(2), 514–538.

Laster, D., P. Bennett, and I. S. Geoum (1999): “Rational Bias in Macroeconomic Forecasts,” The Quarterly Journal of Economics, 114(1), 293–318.

Legerstee, R., and P. H. Franses (2010): “Does Disagreement amongst Forecasters Have Predictive Value?,” Tinbergen Institute Discussion Paper 088/4.

Mankiw, N. G., and R. Reis (2002): “Sticky Information versus Sticky Prices: A Proposal to Replace the New Keynesian Phillips Curve,” The Quarterly Journal of Economics, 117, 1295–1328.

Mankiw, N. G., R. Reis, and J. Wolfers (2004): “Disagreement about Inflation Expectations,” in NBER Macroeconomics Annual 2003, ed. by M. Gertler and K. Rogoff, pp. 209–248. MIT Press, MA.

Manzan, S. (2011): “Differential Interpretation in the Survey of Professional Forecasters,” Journal of Money, Credit and Banking, 43(5), 993–1017.

Marcellino, M., J. H. Stock, and M. W. Watson (2003): “Macroeconomic Forecasting in the Euro area: Country Specific versus Area-Wide Information,” European Economic Review, 47(1), 1–18.

Nardo, M.
(2003): “The Quantification of Qualitative Survey Data: A Critical Assessment,” Journal of Economic Surveys, 17(5), 645–668.

Nolte, I., and W. Pohlmeier (2007): “Using Forecasts of Forecasters to Forecast,” International Journal of Forecasting, 23, 15–28.

Ottaviani, M., and P. N. Sørensen (2006): “The Strategy of Professional Forecasting,” Journal of Financial Economics, 81(2), 441–466.

Patton, A. J., and A. Timmermann (2010): “Why Do Forecasters Disagree? Lessons from the Term Structure of Cross-sectional Dispersion,” Journal of Monetary Economics, 57(7), 803–820.

Pesaran, M. H. (1984): “Expectations Formation and Macroeconometric Modelling,” in Contemporary Macroeconomic Modelling, ed. by P. Malgrange and P.-A. Muet, pp. 27–55. Basil Blackwell, Oxford.

Pesaran, M. H., and M. Weale (2006): “Survey Expectations,” in Handbook of Economic Forecasting, ed. by G. Elliott, C. W. Granger, and A. Timmermann, vol. 1, chap. 14, pp. 715–776. North-Holland.

Sims, C. (2003): “Implications of Rational Inattention,” Journal of Monetary Economics, 50, 665–690.

Sinclair, T. M., and H. O. Stekler (2012): “Examining the Quality of Early GDP Component Estimates,” International Journal of Forecasting, forthcoming.

Smith, J., and M. McAleer (1995): “Alternative Procedures for Converting Qualitative Response Data to Quantitative Expectations: An Application to Australian Manufacturing,” Journal of Applied Econometrics, 10, 165–185.

Song, C., B. L. Boulier, and H. O. Stekler (2009): “Measuring Consensus in Binary Forecasts: NFL Game Predictions,” International Journal of Forecasting, 25(1), 182–191.

Stock, J., and M. Watson (1991): “A Probability Model of the Coincident Economic Indicators,” in Leading Economic Indicators: New Approaches and Forecasting Records, ed. by K. Lahiri and G. Moore, pp. 63–90. Cambridge University Press.

Welch, B. L. (1947): “The Generalization of “Student's” Problem when Several Different Population Variances are Involved,” Biometrika, 34(1-2), 28–35.
Woodford, M. (2003): “Imperfect Common Knowledge and the Effects of Monetary Policy,” in Knowledge, Information, and Expectations in Modern Macroeconomics: In Honor of Edmund Phelps, ed. by P. Aghion, R. Frydman, J. Stiglitz, and M. Woodford, chap. 1, pp. 25–58. Princeton University Press, New Jersey.

Usage of method        small  medium  high
econometric modeling    0.39   0.33   0.28
fundamental analysis    0.04   0.20   0.76
technical analysis      0.67   0.27   0.07
judgment                0.09   0.34   0.57
in-house research       0.32   0.38   0.30
consensus forecasts     0.28   0.51   0.21

Table 1: Distribution of methods employed by respondents to the ZEW Financial Market Survey.

cluster     members  econ.   fund.     tech.     judgment  in-house  consens.  avg.
                     model.  analysis  analysis            research  fcasts    disagr.
1             39     1.00    2.79      1.26      2.49      1.44      1.69      1.24
2             63     2.49    2.77      1.60      2.44      2.43      1.90      1.33
3             41     2.51    2.90      1.22      2.37      1.15      1.46      1.74*
4             41     1.22    2.37      1.39      2.61      2.63      2.63      1.05*
full panel   184     1.90    2.72      1.40      2.47      1.98      1.92      1.32

Table 2: Descriptive statistics for the four clusters identified by the K-means clustering algorithm. Column “members” shows the number of respondents assigned to each cluster; the columns to the right of “members” show the average importance that the members of each cluster assign to each method. Numeric values are obtained by coding the response “small” as 1, “medium” as 2 and “high” as 3. The last column shows the average disagreement in the directional forecast for Euro area economic activity for each cluster. One asterisk denotes significance of the Welch (1947) t-test for equal sample means at the one percent significance level. The null hypothesis is that the two sub-samples have equal means, against the two-sided alternative that the two means differ.

                         MSFE                                relative MSFE
Disagr.     Horizon  AR     AR+D   AR+PMI  AR+PMI+D   (AR+D)/AR   (AR+PMI+D)/(AR+PMI)
Measure
New Ord.      1      0.42   0.42    0.37    0.37       1.01        1.00**
New Ord.      3      2.96   3.02    2.73    2.74       1.02        1.00*
New Ord.      6     10.68  10.78    9.89    9.75       1.01        0.99*
New Ord.      9     20.91  21.02   20.46   20.16       1.00        0.99*
New Ord.     12     31.73  32.26   30.89   31.30       1.02        1.01
Product.      1      0.42   0.42    0.37    0.36       1.00        0.98**
Product.      3      2.96   3.04    2.73    2.75       1.03        1.00
Product.      6     10.68  10.92    9.89    9.86       1.02        1.00*
Product.      9     20.91  21.06   20.46   20.29       1.01        0.99*
Product.     12     31.73  31.89   30.89   31.38       1.00        1.02
Employm.      1      0.42   0.42    0.37    0.36       1.00        0.97***
Employm.      3      2.96   2.95    2.73    2.62       0.99*       0.96***
Employm.      6     10.68  10.29    9.89    9.05       0.96***     0.91***
Employm.      9     20.91  19.89   20.46   18.62       0.95***     0.91***
Employm.     12     31.73  30.12   30.89   28.46       0.95***     0.92***
Prices        1      0.42   0.41    0.37    0.36       0.97***     0.98**
Prices        3      2.96   2.74    2.73    2.52       0.92***     0.92**
Prices        6     10.68   9.48    9.89    8.79       0.89***     0.89**
Prices        9     20.91  17.20   20.46   17.59       0.82***     0.86**
Prices       12     31.73  25.53   30.89   26.15       0.80***     0.85**
DI            1      0.42   0.42    0.37    0.36       1.01        0.99**
DI            3      2.96   3.07    2.73    2.74       1.03        1.00**
DI            6     10.68  10.90    9.89    9.68       1.02        0.98**
DI            9     20.91  20.96   20.46   19.94       1.00        0.97*
DI           12     31.73  31.85   30.89   31.19       1.00        1.01

Table 3: Results of the pseudo out-of-sample forecasting experiment. Column “Horizon” indicates the forecast horizon in months; column “Disagr. Measure” indicates the disagreement measure; the columns under the header “MSFE” report the mean squared forecast errors of the four models in equations (18)-(21); the last two columns under the header “relative MSFE” report (1) the MSFE of the AR(1) model with disagreement as an explanatory variable relative to the pure AR(1) model, and (2) the MSFE of the AR(1) model that includes the PMI and disagreement as explanatory variables relative to the AR(1) model that includes only the PMI as an additional regressor.
*/**/*** indicates significance of the Clark and West (2007) test for equal predictive ability at the 1/5/10 percent significance level, respectively (one-sided alternative hypothesis: the larger model is better).

Figure 1: Disagreement measures for Euro area GDP. Panel (a): standard Carlson-Parkin measure based on six-month directional forecasts for Euro area economic activity (correlation .60). Panel (b): overall disagreement based on six-month directional forecasts for German, Italian and French economic activity (correlation .70). Solid lines: measures based on directional forecasts from the ZEW financial market survey. Dotted lines: standard deviation of twelve-month point forecasts for Euro area real GDP growth from the ECB SPF.

Figure 2: Disagreement measures for Euro area GDP. Panel (a): scaled t10 Carlson-Parkin (correlation .65). Panel (b): Carlson-Parkin with asymmetric but time-constant thresholds (correlation .56). Panel (c): Carlson-Parkin with asymmetric and time-varying thresholds (correlation .50). Solid lines: measures based on directional forecasts from the ZEW financial market survey. Dotted lines: standard deviation of twelve-month point forecasts for Euro area real GDP growth from the ECB SPF.

Figure 3: Disagreement measures for Euro area inflation. Panel (a): standard Carlson-Parkin measure based on six-month directional forecasts for the Euro area inflation rate (correlation .40). Panel (b): overall disagreement based on six-month directional forecasts for the German, Italian and French inflation rates (correlation .49). Solid lines: measures based on directional forecasts from the ZEW financial market survey. Dotted lines: standard deviation of twelve-month point forecasts for Euro area HICP inflation from the ECB SPF.
Figure 4: Disagreement measures for Euro area inflation. Panel (a): scaled t10 Carlson-Parkin (correlation .38). Panel (b): Carlson-Parkin with asymmetric but time-constant thresholds (correlation .39). Panel (c): Carlson-Parkin with asymmetric and time-varying thresholds (correlation .29). Solid lines: measures based on directional forecasts from the ZEW financial market survey. Dotted lines: standard deviation of twelve-month point forecasts for Euro area HICP inflation from the ECB SPF.

Figure 5: Cluster-level disagreement. Panel (a): cluster 1 (correlation .58). Panel (b): cluster 2 (correlation .85). Panel (c): cluster 3 (correlation .78). Panel (d): cluster 4 (correlation .82). Solid lines: Carlson-Parkin disagreement for Euro area economic activity for each cluster. Dotted line: Carlson-Parkin disagreement for Euro area economic activity for the full panel.