Which Alpha?

Francisco Barillas
Goizueta Business School, Emory University

A common approach to comparing asset pricing models involves a competition in pricing test-asset returns. In contrast, we show that for models with traded factors, when the comparison is framed appropriately in terms of success in pricing both the test-asset and factor returns, the extent to which each model is able to price the factors in the other model is what matters for model comparison. Test assets are irrelevant based on several prominent criteria. For models with nontraded factors, test assets are relevant for model comparison insofar as they are needed to identify factor-mimicking portfolio returns. (JEL C12, C52, G11, G12)

Received October 29, 2015; editorial decision September 6, 2016 by Editor Andrew Karolyi.

The classic “alpha” of investment analysis is the intercept in the time-series regression of an asset’s excess returns on those of the market portfolio (Mkt). The early paper by Jensen (1968) recognized the interpretation of alpha as an asset’s deviation from the capital asset pricing model (CAPM) expected return relation of Sharpe (1964) and Lintner (1965). Over the decades, other benchmark factors have also been included in asset pricing models, notably the small minus big (SMB) size and high minus low (HML) value factors of the Fama-French (FF3, 1993) three-factor model. More recently, profitability and investment factors have also been introduced in papers by Fama and French (2015) and Hou, Xue, and Zhang (2015a). With these so-called “traded factors,” the time-series intercept can still be viewed as an asset’s deviation from the model or measure of model mispricing.
For factors like consumption growth, which are not traded, it is always possible in principle to substitute a traded mimicking portfolio in the pricing relation.1 Alpha also plays an important role in analyzing portfolio performance, as measured by the Sharpe ratio, a portfolio’s expected excess return divided by its standard deviation. A nonzero alpha indicates that the Sharpe ratio can be improved and a more efficient portfolio obtained by complementing investment in the benchmark portfolios with a position in the given asset. If the factors already span the tangency portfolio, however, such improvement is not possible and alpha is zero.

Jay Shanken
Goizueta Business School, Emory University, and National Bureau of Economic Research

Thanks to Raymond Kan, Junghoon Lee, Jon Lewellen, Cesare Robotti, and several anonymous reviewers. Special thanks to Gene Fama, Ken French, and Andrew Karolyi (the editor) for valuable suggestions and helpful questions. Send correspondence to Jay Shanken, Goizueta Business School, Emory University, 1300 Clifton Road, Atlanta, GA 30322; telephone: (404) 727-4772. E-mail: jay.shanken@emory.edu.

1 See Breeden (1979) footnote 8.

2 Stambaugh and Yuan (2016) appeal to our framework in some of their model comparison analysis.

© The Author 2016. Published by Oxford University Press on behalf of The Society for Financial Studies. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com. doi:10.1093/rfs/hhw101 Advance Access publication December 30, 2016

Downloaded from https://academic.oup.com/rfs/article/30/4/1316/2758634 by National Science & Technology Library user on 07 June 2022
The classic asset pricing challenge is then to identify a small number of observable factors that come as close as possible to achieving the equivalent objectives of correctly pricing returns and spanning an efficient portfolio. Different models have been proposed and, given that no model is perfect, it is desirable to be able to compare the models. Many papers in the empirical literature, for example, Fama and French (1993, 1996, 2015, 2016), Hou, Karolyi, and Kho (2011), Hou, Xue, and Zhang (2015a, 2015b), and Stambaugh and Yuan (2016) compare the performance of different models in pricing a wide variety of test assets using metrics like the average absolute alpha or the Gibbons, Ross, and Shanken (1989) F-statistic for the zero-alpha restriction. These papers often leave the impression that the model with smaller alphas is preferred.2 However, we show that for several important metrics simply comparing pricing performance for various sets of test assets, while it may provide some useful descriptive information, cannot serve to identify the superior model and can even be misleading in this regard. We provide examples that illustrate this surprising conclusion. Our central insight in this context is a simple one—that models should be compared in terms of their ability to price all returns, both test assets and traded factors. In principle, this could be the entire universe of returns or, in practice, just an available subset of test assets, along with the factors. Thus, when comparing models with traded factors, the test assets used to evaluate the models should be augmented by the factors from the other models. But when we adopt this perspective, it turns out that test assets tell us nothing about model comparison beyond what we learn by examining the extent to which each model prices the factors in the other models. So the test assets can be ignored for this purpose. This is the main message of our paper. 
A related paper by Fama (1998) explores the pricing of state variables in an intertemporal CAPM context. Fama argues that it is not necessary to test alternative ICAPMs on all assets for the purpose of identifying the state variables of concern to investors. He does not discuss model comparison, however. Some of the more recent papers mentioned above supplement test-asset results with evidence on each model’s ability to price excluded factors (also, see Asness and Frazzini 2013). But the issue of how to combine and interpret the two kinds of evidence is not addressed. This is important since, as we will see, test-asset results can favor one model while excluded-factor evidence favors another. In particular, a nested model will sometimes do a better job of pricing test assets than a larger model if the additional factors result in predictions that

The Review of Financial Studies / v 30 n 4 2017

$$\mathrm{Sh}^2(f_1, f_2, R) - \mathrm{Sh}^2(f_1) < \mathrm{Sh}^2(f_2, f_1, R) - \mathrm{Sh}^2(f_2), \tag{1}$$

where $\mathrm{Sh}^2(\cdot)$ denotes the maximum squared Sharpe ratio obtainable from portfolios of the given returns. Equivalently, since each side of Equation (1) is the sum of the increases from adding the other factors to the given model’s factors and from adding test assets to all the factors, the latter increment drops out of the comparison and we have the equivalent relation

$$\mathrm{Sh}^2(f_1, f_2) - \mathrm{Sh}^2(f_1) < \mathrm{Sh}^2(f_2, f_1) - \mathrm{Sh}^2(f_2). \tag{2}$$

Thus, the mispricing that matters for model comparison is the mispricing of the factors in one model by the factors in the other model. The test-asset returns $R$ are irrelevant for this comparison in that one obtains the same ranking of models and the same difference in metrics for any choice of test assets.

3 It is possible that the corresponding tangency portfolio will be inefficient and have a negative Sharpe ratio. This case is rarely encountered in practice, however, and so we assume henceforth that the Sharpe ratio is positive.
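As a purely illustrative check of Equations (1) and (2), the following Python sketch computes maximum squared Sharpe ratios from assumed population moments, using the standard formula $\mu' V^{-1} \mu$. The moment values and the helper names `solve`, `sh2`, and `improvement_gap` are hypothetical and are not taken from the paper. The sketch varies the test asset's moments and compares the resulting gap between the two models' Sharpe improvements with the factor-only quantity $\mathrm{Sh}^2(f_2) - \mathrm{Sh}^2(f_1)$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [x - k * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sh2(mu, cov, idx):
    """Maximum squared Sharpe ratio mu' V^{-1} mu for the assets listed in idx."""
    m = [mu[i] for i in idx]
    v = [[cov[i][j] for j in idx] for i in idx]
    w = solve(v, m)
    return sum(x * y for x, y in zip(m, w))


def improvement_gap(mu_r, cov_r1, cov_r2, var_r):
    """Left side minus right side of Equation (1) for one choice of test asset R."""
    mu = [0.5, 0.3, mu_r]                 # assumed mean excess returns of f1, f2, R
    cov = [[1.0, 0.2, cov_r1],
           [0.2, 1.0, cov_r2],
           [cov_r1, cov_r2, var_r]]       # assumed covariance matrix of (f1, f2, R)
    everything = sh2(mu, cov, [0, 1, 2])
    return (everything - sh2(mu, cov, [0])) - (everything - sh2(mu, cov, [1]))


# Two very different hypothetical test assets, same factors.
gap_a = improvement_gap(0.4, 0.3, 0.1, 1.0)
gap_b = improvement_gap(0.9, -0.2, 0.5, 2.0)

# Factor-only quantity: Sh^2(f2) - Sh^2(f1) with unit factor variances.
factor_only = 0.3 ** 2 / 1.0 - 0.5 ** 2 / 1.0

print(gap_a, gap_b, factor_only)
```

In this sketch the gap is negative, so the first model has the smaller Sharpe improvement and would be ranked higher under the metric, whichever test asset is used.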
4 The modification due to Kan and Robotti (2008) assumes that pricing restrictions are imposed on excess returns. The version we appeal to then imposes the further restriction that the traded factors are correctly priced by the model, as is the case in the alpha-based setting. See, in particular, their footnote 9.

are further from the actual expected test-asset returns. Ironically, in such cases, we prove that the nested model must misprice the excluded factors, implying that the larger model is actually the superior model.

To formalize these ideas, suppose there are two (or more) pricing models to be compared and the better model identified. The first model, $M_1$, is based on a vector of factors $f_1$ and the other, $M_2$, on the factors $f_2$. In addition to the two sets of factors, which may overlap, there is a set of test-asset returns, $R$ (securities or portfolios), that can be used in evaluating the models. Ideally, a model would “price,” that is, produce zero alphas for all of these investments, both test-asset and factor returns. Equivalently, the factors in the model would span the tangency portfolio for the full investment universe and, therefore, maximize the attainable Sharpe ratio.3 Let us suppose, now, that the models may be imperfect and the extent of model mispricing is measured by the quadratic form in the alphas, which Gibbons, Ross, and Shanken (1989) show gives the possible improvement in the squared Sharpe ratio. This Sharpe improvement also equals the square of a modified version of the well-known Hansen and Jagannathan (1997) distance.
This characterization of model misspecification measures how close a proposed stochastic discount factor based on a model’s factors comes to being a valid SDF and also gives the maximum pricing error of the model (over portfolios with unit second moment).4

Test-asset irrelevance for model comparison is easily established with this economic measure of model misspecification. According to this metric, $M_1$ is better than $M_2$ if the Sharpe improvement from exploiting mispricing (nonzero alphas) by $M_1$ is smaller than that based on mispricing by $M_2$, that is,

This is true regardless of the extent to which the test assets are correctly priced by either model. The inequalities in Equations (1) and (2) allow us to compare models in terms of the economic significance of model deviations. Such mispricing is the focus of much empirical asset pricing research. But clearly, Equations (1) and (2) are equivalent to a direct comparison of model Sharpe ratios

Here, $M_1$ is the better model, provided that a higher Sharpe ratio can be obtained with the investments in $f_1$ than with those in $f_2$. Thus, we can view model comparison either in terms of investment opportunities or model mispricing. Whether the better model is actually a “good” model will depend on its performance in pricing test assets. For the model with factors $f_1$, this is reflected in the total Sharpe improvement $\mathrm{Sh}^2(f_1, f_2, R) - \mathrm{Sh}^2(f_1)$ and the corresponding Gibbons, Ross, and Shanken (1989) test statistic with left-hand-side returns $f_2$ and $R$ and explanatory factors $f_1$. We refer to traditional asset pricing tests of this sort as absolute tests, in contrast to a relative test of model comparison, which only requires analysis of the factor returns.

Our test-asset irrelevance conclusion is not limited to the metric above, however. As we will see, it also holds if we rank models according to statistical likelihood.
In this context, irrelevance means that the likelihood ratio for a pair of models does not depend on the test-asset returns. The likelihood concept is, of course, central to much of classical hypothesis testing as well as Bayesian analysis. Roughly speaking, a model is preferred under this metric if it does a better job of predicting the observed return data. The key again is to compare models in terms of their ability to price all returns, both test assets and factors.

$$\mathrm{Sh}^2(f_1) > \mathrm{Sh}^2(f_2).$$

1. Direct and Indirect Model Representations

We saw in the introduction that when model comparison is framed in terms of performance in pricing all returns, then the test-asset returns are irrelevant. This was shown for the Sharpe improvement metric, which is equivalent to a modified version of a model’s Hansen and Jagannathan distance. The link between these metrics and the model alphas was alluded to, but was not explicitly provided. We present this well-known relation now and then use it to derive a representation of a nested pricing model that will provide useful insights into model comparison.

First, some definitions and notation. A factor model $M$ is a multivariate linear regression with $N$ excess returns, $R$, and $K$ traded factors, $f$:

$$R = \alpha_R + \beta f + \varepsilon, \tag{3}$$

where $R$, $\varepsilon$, and $\alpha_R$ are $N \times 1$; $\beta$ is $N \times K$; and $f$ is $K \times 1$.5 The disturbance has zero mean and an invertible covariance matrix, $\Sigma$. As shown in Gibbons, Ross, and Shanken (1989), the following relation holds:6

$$\alpha_R' \Sigma^{-1} \alpha_R = \mathrm{Sh}(f, R)^2 - \mathrm{Sh}(f)^2. \tag{4}$$

1.1 Pricing restrictions for nested models

We now derive an equivalence between what we will call the direct and indirect representations of a nested pricing model. In essence, the indirect approach characterizes a nested model as a restricted version of the larger model.
Although this is of interest in itself, it will also prove useful in analyzing the likelihood criterion for model comparison in Section 2. We start with a simple lemma.

Lemma 1. Let $M_1$ be a pricing model with factors $f_1$ nested in the model $M$ with factors $f = (f_1, f_2)$. $R$ is an $N$-vector of test-asset returns. Then the tangency portfolio $\tau(f_1)$ equals the tangency portfolio $\tau(f, R)$ if and only if $\tau(f_1) = \tau(f)$ and $\tau(f) = \tau(f, R)$. Equivalently, $\mathrm{Sh}(f_1)^2 = \mathrm{Sh}(f, R)^2$ if and only if $\mathrm{Sh}(f_1)^2 = \mathrm{Sh}(f)^2$ and $\mathrm{Sh}(f)^2 = \mathrm{Sh}(f, R)^2$.

Proof. If the factors $f_1$ already span the tangency portfolio for the investment universe consisting of $f_1$, $f_2$, and $R$, then adding the investments in $f_2$ to those in $f_1$ will not improve on this tangency portfolio, nor will adding $R$ to $f_1$ and $f_2$. Conversely, if $\tau(f_1) = \tau(f)$ and $\tau(f) = \tau(f, R)$, then, clearly, $\tau(f_1) = \tau(f, R)$. The equivalent assertions about Sharpe ratios follow from the fact that the maximum squared Sharpe ratio for a set of investments equals the squared Sharpe ratio of the corresponding tangency portfolio.

Some additional notation is needed to formulate the desired equivalence. Let $\alpha_{R1}$, $\alpha_{R2}$, and $\alpha_R$ denote the alphas of the test assets when regressed on $f_1$, $f_2$,

5 Although $R$ normally denotes test assets, it could also include factors omitted from $f$ in the present context.

6 Also see related work by Jobson and Korkie (1982).

7 In the introduction, we compared models according to this metric for model mispricing and demonstrated its equivalence to choosing the model with the highest squared Sharpe ratio. Had we started with that criterion, test-asset irrelevance would not even have been an issue. A similar argument implies that test-asset irrelevance holds for criteria like the improvement in the Sharpe ratio itself or in expected portfolio utility, possibly subject to investment constraints. However, these measures will not generally reduce to model-mispricing metrics.
For, as Pastor and Stambaugh (2000, Section 2) note, “A set of benchmarks that is correct for pricing need not be correct for investing.”

The Sharpe ratio, $\mathrm{Sh}(f, R)$, corresponds to the tangency portfolio, $\tau(f, R)$, determined by the returns in $R$ and $f$, along with the risk-free rate. The dependence on the latter is to be understood. $\mathrm{Sh}(f)$ and $\tau(f)$ leave out the returns in $R$. Thus, the improvement in the squared Sharpe ratio equals the classic quadratic form in the alphas of the additional returns.7

and $f = (f_1, f_2)$, respectively. $\alpha_{21}$ refers to the alphas for the factors $f_2$ when they are regressed on $f_1$. We now have:

Proof. By Equation (4) and positive definiteness of the quadratic forms in the alphas, $\alpha_{R1} = 0$ and $\alpha_{21} = 0$ is equivalent to $\mathrm{Sh}(f_1)^2 = \mathrm{Sh}(f, R)^2$. Similarly, $\alpha_{21} = 0$ is equivalent to $\mathrm{Sh}(f_1)^2 = \mathrm{Sh}(f)^2$, and $\alpha_R = 0$ is equivalent to $\mathrm{Sh}(f)^2 = \mathrm{Sh}(f, R)^2$. The result now follows immediately from Lemma 1.8

We refer to the joint restriction $\alpha_{21} = 0$ and $\alpha_{R1} = 0$ as the direct representation of $M_1$ and the joint restriction $\alpha_{21} = 0$ and $\alpha_R = 0$ as the indirect representation of $M_1$ since the latter involves the test-asset alphas relative to the larger model $M$. Thus, Proposition 1 shows that a model $M_1$ that is nested in a larger model $M$, in the sense that its factors are all included in $M$, is nested in the statistical sense that $M_1$ can be obtained by imposing restrictions ($\alpha_{21} = 0$) on $M$. For example, the CAPM can be viewed as a restricted version of FF3, with the CAPM alphas of HML and SMB constrained to be zero.

1.2 Implications for nested model comparison

By Proposition 1, the test-asset restriction ($\alpha_R = 0$) is common to the nested model $M_1$ and the larger model $M$, so the models cannot be distinguished by examining this constraint, whether it holds or not.
Therefore, it is natural from a statistical perspective to compare $M_1$ and $M$ by testing the excluded-factor restriction $\alpha_{21} = 0$. For example, testing the CAPM versus FF3 entails an examination of the CAPM alphas of HML and SMB. There are two possibilities. One is that $\alpha_{21} = 0$, that is, the SMB and HML CAPM alphas are both zero so that $\mathrm{Sh}^2(\text{Mkt HML SMB}) = \mathrm{Sh}^2(\text{Mkt})$. Although the models are equivalent in this dimension, the tie would be broken by the principle of parsimony, favoring the one-factor CAPM over FF3. Alternatively, if $\alpha_{21} \neq 0$, then $\mathrm{Sh}^2(\text{Mkt HML SMB}) > \mathrm{Sh}^2(\text{Mkt})$ and FF3 is the better model in this sense. In this case, the tangency portfolio has nonzero weight on HML and/or SMB.

8 This equivalence is related to a result in Sharpe (1984), who provides a multibeta interpretation of the CAPM. Sharpe describes his result as an approximation. However, one direction of our equivalence (that the usual CAPM restrictions imply the multifactor pricing model with restricted prices of risk) can be obtained using a variation on his argument. To see this, consider the case in which all factors are traded and the market is one of the factors. Then the factors will obviously span the market portfolio, so Sharpe’s Equation (4) holds exactly. Equation (5) then gives the desired result. Assuming the factors are multifactor minimum variance (MMV) portfolios in the sense of Fama (1996), Fama (1998) provides an alternative proof in this same direction and also discusses the converse. We do not impose the MMV assumption.

Proposition 1. Let $M_1$ be a pricing model with factors $f_1$ nested in the model $M$ with factors $f = (f_1, f_2)$.
Then $M_1$ holds for both the test-asset returns $R$ and the excluded-factor returns $f_2$ ($\alpha_{21} = 0$ and $\alpha_{R1} = 0$) if and only if those excluded-factor returns satisfy $M_1$ ($\alpha_{21} = 0$) and the larger model $M$ holds for the test-asset returns ($\alpha_R = 0$).

$$R = \alpha_R + \beta_1 f_1 + \beta_2 f_2 + \varepsilon. \tag{5}$$

We are interested in the relation between these parameters and those in the regression of $R$ on $f_1$ only,

$$R = \alpha_{R1} + b f_1 + e. \tag{6}$$

This relation depends on the parameters in the excluded-factor regression

$$f_2 = \alpha_{21} + d f_1 + u. \tag{7}$$

Substituting this expression for $f_2$ in Equation (5) gives

$$R = \alpha_R + \beta_1 f_1 + \beta_2 (\alpha_{21} + d f_1 + u) + \varepsilon = (\alpha_R + \beta_2 \alpha_{21}) + (\beta_1 + \beta_2 d) f_1 + (\beta_2 u + \varepsilon).$$

It follows from the standard regression orthogonality conditions that $u$ and $\varepsilon$ have means of 0 and are uncorrelated with $f_1$. Therefore, the regression on $f_1$ satisfies

$$\alpha_{R1} = \alpha_R + \beta_2 \alpha_{21}, \quad b = \beta_1 + \beta_2 d, \quad \text{and} \quad e = \beta_2 u + \varepsilon. \tag{8}$$

Now suppose $M_1$ holds for both the test-asset returns and the excluded-factor returns, that is, $\alpha_{R1} = 0$ and $\alpha_{21} = 0$. Then $\alpha_R = 0$ by Equation (8). Conversely, if $\alpha_R = 0$ and $\alpha_{21} = 0$, then $\alpha_{R1} = 0$, again establishing the proposition.9

Our equivalence result also implies the following:

Corollary 1. If $\alpha_{21} = 0$, the model predictions for expected test-asset returns are identical under $M_1$ and $M$. In this case, we also have equality of the (actual) test-asset alphas, $\alpha_R = \alpha_{R1}$.

Proof. Under $M_1$, the predicted expected return is $b E(f_1)$ by Equation (6). Under $M$, it is $\beta_1 E(f_1) + \beta_2 E(f_2)$ by Equation (5). Now $\alpha_{21} = 0$ implies that $E(f_2) = d E(f_1)$ by Equation (7), so test-asset expected returns under $M$ equal $\beta_1 E(f_1) + \beta_2 d E(f_1) = (\beta_1 + \beta_2 d) E(f_1)$. But this equals $b E(f_1)$ by the middle relation in Equation (8). $\alpha_R = \alpha_{R1}$ follows immediately from Equation (8).

9 See Pastor and Stambaugh (2002) for a very different application of Equation (8) to the estimation of mutual fund alphas.
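The relations in Equation (8) also hold exactly for in-sample OLS estimates, which makes them easy to check numerically. The Python sketch below is a minimal illustration with simulated data; the sample size, the parameter values, and the helper names `solve` and `ols` are assumptions chosen for the example and do not come from the paper. It runs the three regressions in Equations (5), (6), and (7) with one included factor, one excluded factor, and one test asset.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [x - k * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """OLS of y on a constant and the regressors in xs; returns [intercept, slope, ...]."""
    cols = [[1.0] * len(y)] + xs
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


random.seed(0)
T = 500  # hypothetical number of periods

# Simulated excess returns (all parameter values are illustrative assumptions).
f1 = [random.gauss(0.5, 1.0) for _ in range(T)]
f2 = [0.2 + 0.4 * x + random.gauss(0.0, 1.0) for x in f1]
R = [-0.1 + 0.8 * x + 0.5 * z + random.gauss(0.0, 1.0) for x, z in zip(f1, f2)]

alpha_R, beta_1, beta_2 = ols(R, [f1, f2])   # Equation (5): R on f1 and f2
alpha_R1, b = ols(R, [f1])                   # Equation (6): R on f1 only
alpha_21, d = ols(f2, [f1])                  # Equation (7): f2 on f1

# Equation (8): direct alpha = indirect alpha + beta_2 * excluded-factor alpha.
print(alpha_R1, alpha_R + beta_2 * alpha_21)
print(b, beta_1 + beta_2 * d)
```

The two printed pairs agree up to floating-point error, which is the sample counterpart of the first two relations in Equation (8).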
Thus, when $M_1$ is nested in $M$, focusing on the restriction $\alpha_{21} = 0$ in this way amounts to concluding that the nested model is the better model if and only if $\alpha_{21} = 0$, consistent with the Sharpe improvement metric. Testing the hypothesis $\alpha_{21} = 0$ is a straightforward application of the usual asset pricing tests, with the excluded factors serving as the left-hand-side returns.

Additional insight will be obtained from an alternative derivation of Proposition 1 that relates the parameters in the regression models for $M_1$ and $M$ (all regressions in this paper include a constant). Partitioning $\beta = (\beta_1, \beta_2)$ to conform to the factor partition $f = (f_1, f_2)$ gives

1.2.1 Nested models and test-asset alphas. While it might seem that when $\alpha_{21} \neq 0$ the additional factors in $M$ can only improve on pricing and produce smaller test-asset alphas than $M_1$, this need not be true. Thus, the larger model can perform worse as judged by the magnitude of test-asset alphas, even when it is the better model. For example, the expanded five-factor model (FF5) of Fama and French is superior to FF3 according to the Sharpe criterion.11 But while FF5 does about as well or better than FF3 in accounting for many anomalies, FF3 is better at pricing test-asset portfolios formed by independent sorts on size and accruals.

A stripped-down example will help us see what drives this sort of result. Suppose we want to compare the CAPM with a two-factor model with factors Mkt and SMB. The SMB CAPM alpha is positive (the size effect), and

10 Cochrane (Section 13.4) shows that “if you want to know whether factor $i$ helps to price other assets, look at $b_i$,” the coefficient on that factor in the linear specification for the SDF.
Here, the $b$ vector equals the inverse of the factor covariance matrix times the price of risk vector $\lambda$ in the expected-return/beta relation (also see Kan, Robotti and Shanken 2013). In general, identification of $\lambda$ requires test assets, not just factors. However, when the factors are traded, $\lambda$ is the vector of expected factor returns and $b$ depends on the factor moments only. In fact, $b$ is proportional to the vector of weights in the tangency portfolio based on the factors in this case. Thus, factor $i$ impacts the pricing of other assets if its tangency portfolio weight is nonzero. We discuss model comparison with nontraded factors in Section 4.

11 We ignore the fact that, strictly speaking, the models are not nested as SMB differs across models.

Earlier, we noted that the models $M_1$ and $M$ yield the same Sharpe ratio when $\alpha_{21} = 0$. Now we see that the models also provide the same predictions and make the same errors in this case, so parsimony favors the nested model $M_1$ from this perspective as well. Cochrane (2005) has a related result for nested models in the stochastic discount factor framework but does not discuss its relevance for model comparison in the alpha-based setting with traded factors, nor does he consider non-nested models as we do below.10 Thus, while the pricing relations on which some of our conclusions are based have antecedents in the theoretical literature, we extend those results along several dimensions. But, more importantly, we explore the implications for analyzing model comparison as these issues do not seem to be recognized in the asset pricing literature.
Finally, Equation (8) can also be used to formalize an idea arrived at independently by Fama and French (2016), who note that the Gibbons, Ross, and Shanken (1989) test for significance of investment- and profitability-factor alphas on the FF3 benchmark is highly significant. They assert (without proof) that if some stocks have nonzero exposures to the additional factors, those factors will “add information about expected returns to the three-factor model.” Likewise, the first relation in Equation (8) shows that if $\beta_2 \neq 0$, then a finding that $\alpha_{21} \neq 0$ implies that the test-asset alpha vectors $\alpha_{R1}$ and $\alpha_R$ differ. However, as we will see in the next section, the added information can actually be detrimental to pricing.

12 Test-asset irrelevance will hold for any metric with this sort of separability.

therefore the two-factor Sharpe ratio will be higher and the modified Hansen and Jagannathan distance lower than the corresponding CAPM measures. Say there is just one test asset, the loser decile portfolio based on past-year returns. For 1963–2013, the annualized CAPM alpha for this portfolio is −11.06%, reflecting momentum as in Jegadeesh and Titman (1993). But the two-factor loser alpha, −11.79%, is even lower (larger in magnitude) than its CAPM alpha. This is a special case of the alpha identity in Equation (8). Whether adding a mispriced factor to a model decreases or increases the test-asset alphas depends, in general, on whether $\alpha_{21}$ and $\beta_2$ happen to have the same or opposite signs. Here, the SMB CAPM alpha and the loser beta on SMB are both positive (many losers are small firms), so the loser alpha becomes more negative.
However, $\alpha_{21} \neq 0$ always implies that $M$ is the better model in terms of Sharpe ratios and in a conventional statistical sense since $M$ does not impose the false restriction on $\alpha_{21}$. Although this is not a new discovery, the potential inconsistency between fundamental metrics for model comparison and model performance on test assets has not, to our knowledge, been explored previously. Thus, by focusing on test assets in isolation, a false inference about model comparison can be obtained.

That the nested model constraint is violated ($\alpha_{21} \neq 0$) in this example and the earlier Fama-French example is not a coincidence. Corollary 1 implies that $M_1$ must fail to price some or all of the excluded factors if there is any difference between the test-asset alphas under $M_1$ and $M$, even when those alphas are smaller under $M_1$. This highlights the potential pitfalls of focusing on test assets in isolation. It is important to jointly evaluate the pricing of test assets and excluded factors, whatever metric is chosen. The surprise is that, for several important metrics, test assets drop out in model comparison when this joint perspective is adopted.

Equation (8) also offers insight as to why test assets do not drop out in model comparison with common empirical metrics based on the usual direct test-asset alphas. Squared values of these alphas, $\alpha_{R1}$, involve interactions between the indirect alphas, $\alpha_R$, and the excluded-factor alphas, $\alpha_{21}$. Consequently, two kinds of information are mixed together in the direct alphas, $\alpha_{R1}$: relevant information, $\alpha_{21}$, about the pricing of excluded factors, and irrelevant (for model comparison) information, $\alpha_R$, about the pricing of test assets by the set of all factors. Examining the combined information ends up providing a “noisy signal” that ultimately can obscure the comparison, rather than clarify it.
This is in contrast to the separable influence of excluded factors and test assets on the Sharpe metric.12 Thus, by the identity of Equation (4), the overall Sharpe improvement is the sum of the improvement from adding excluded factors (a quadratic form in $\alpha_{21}$) and that from adding test assets (a quadratic form in $\alpha_R$), where the latter is the same for each model. Note also that, for the purpose of evaluating a single model in an absolute test, we can focus on $\alpha_{21}$ and $\alpha_R$ or on $\alpha_{21}$ and $\alpha_{R1}$ since the corresponding joint restrictions are equivalent by Proposition 1.

2. Test-Asset Irrelevance under the Likelihood Metric

We saw in the introduction that, given the factor information, test assets are irrelevant for model comparison under the economic mispricing metrics (Sharpe ratio improvement/modified Hansen and Jagannathan distance). In this section, we use Proposition 1 to demonstrate that irrelevance also holds under the likelihood criterion.

Let $R$ be a set of test-asset returns, and let $M_1$ and $M_2$ be pricing models with factors $f_1$ and $f_2$, respectively. The models may be nested or non-nested, and nothing is assumed about the validity of the models. Although Proposition 1 deals with nested models, the greater generality of our irrelevance result is achieved by nesting each model in the combined model that consists of all the factors, $f = (f_1, f_2)$. Let $f_i^*$ denote the factors in $f$ that

13 In the special case that $\alpha_{21} = 0$ and $\alpha_{12} \neq 0$, $M_1$ is superior to $M_2$ regardless of the value of $\alpha_{12}$ since $\mathrm{Sh}^2(f_1) = \mathrm{Sh}^2(f_1, f_2) > \mathrm{Sh}^2(f_2)$. Thus, rejection of $\alpha_{12} = 0$ and failure to reject $\alpha_{21} = 0$ with a Gibbons, Ross, and Shanken (1989)-type test would be consistent with $M_1$ being the better model under the Sharpe metric.
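The additive decomposition of the Sharpe improvement implied by Equation (4) can be illustrated with a small population-moment calculation. The Python sketch below assumes one included factor, one excluded factor, and one test asset; the means, the covariance matrix, and the helper names `solve` and `sh2` are hypothetical values and names chosen only for the example. It computes the excluded-factor alpha, the test-asset alpha on all factors, and the two corresponding quadratic forms, and compares their sum with the total Sharpe improvement.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [x - k * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


mu = [0.5, 0.3, 0.4]            # assumed mean excess returns of f1, f2, R
V = [[1.0, 0.2, 0.3],
     [0.2, 1.0, 0.1],
     [0.3, 0.1, 1.0]]           # assumed covariance matrix of (f1, f2, R)


def sh2(idx):
    """Maximum squared Sharpe ratio mu' V^{-1} mu for the assets listed in idx."""
    m = [mu[i] for i in idx]
    v = [[V[i][j] for j in idx] for i in idx]
    return sum(x * y for x, y in zip(m, solve(v, m)))


# Excluded-factor regression of f2 on f1: slope, alpha, residual variance.
d = V[0][1] / V[0][0]
alpha_21 = mu[1] - d * mu[0]
var_u = V[1][1] - d * V[0][1]

# Test-asset regression of R on all factors (f1, f2).
beta_R = solve([row[:2] for row in V[:2]], [V[0][2], V[1][2]])
alpha_R = mu[2] - beta_R[0] * mu[0] - beta_R[1] * mu[1]
var_eps = V[2][2] - beta_R[0] * V[0][2] - beta_R[1] * V[1][2]

total = sh2([0, 1, 2]) - sh2([0])          # overall Sharpe improvement over f1
from_factors = alpha_21 ** 2 / var_u       # quadratic form in alpha_21
from_test_assets = alpha_R ** 2 / var_eps  # quadratic form in alpha_R

print(total, from_factors + from_test_assets)
```

Only `from_factors` depends on which factor is treated as excluded; `from_test_assets` is computed relative to all factors and so would be the same for any model built from a subset of them.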
1.3 Implications for non-nested model comparison

Although Proposition 1 deals with nested models, it has implications for non-nested model comparison as well since each model can be nested in the larger model that includes all of the factors. To illustrate this idea, let $M_1$ be the model with $f_1 = \{\text{Mkt HML}\}$ and excluded factor SMB, while $M_2$ is the model with factors $f_2 = \{\text{Mkt SMB}\}$ and excluded factor HML. These models are both nested in the model with $f = \{\text{Mkt HML SMB}\}$. As usual, $R$ is a vector of test-asset returns. $M_1$ and $M_2$ are non-nested in that each contains factors that are not included in the other model.

The indirect representation for $M_1$ is $\alpha_{21} = 0$ and $\alpha_R = 0$, and for $M_2$ it is $\alpha_{12} = 0$ and $\alpha_R = 0$. Here, $\alpha_{21}$ is the SMB alpha when regressed on $f_1$, while $\alpha_{12}$ is the HML alpha when regressed on $f_2$. The important thing to notice here is that the test-asset alphas on $f$ must be zero under both models. Thus, $\alpha_R = 0$ is a common restriction under $M_1$ and $M_2$. Distinguishing between the performance of the models in the indirect representation, therefore, requires that we focus on the extent of deviations from the excluded-factor restrictions $\alpha_{21} = 0$ and $\alpha_{12} = 0$. Thus, what is relevant for model comparison is not simply whether these alphas are zero (the hypothesis of the Gibbons, Ross, and Shanken 1989 test), but rather the magnitude of the true alphas.13 These values interact with the residual covariance matrices in determining the Sharpe improvements. Alternatively, as we see in the next section, the true alphas affect the likelihoods for the different model restrictions through the data.

$$L_i(\mu_{f_i}, \Sigma_{f_i} \mid D) = L_i(\mu_{f_i}, \Sigma_{f_i} \mid f_i), \tag{9}$$

which only depends on a subset of the data, the $f_i$ returns.
Next, we need to consider the regression of $f_i^*$ on $f_i$, $f_i^* = \alpha_i^* + \beta_i^* f_i + \varepsilon_i^*$, with $\mathrm{Cov}(\varepsilon_i^* \mid f_i) = \Sigma_i^*$ and corresponding likelihood term

$$L_i^*(\alpha_i^*, \beta_i^*, \Sigma_i^* \mid D) = L_i^*(\alpha_i^*, \beta_i^*, \Sigma_i^* \mid f), \tag{10}$$

which depends on the factor returns, $f = (f_i, f_i^*)$. Finally, we have the regression of $R$ on $f$, $R = \alpha_R + \beta_R f + \varepsilon_R$, with $\mathrm{Cov}(\varepsilon_R \mid f) = \Sigma_R$ and corresponding likelihood term

$$L_R(\alpha_R, \beta_R, \Sigma_R \mid D), \tag{11}$$

which depends on all the data.

We are now ready to state the likelihood of $M_i$. However, we cannot simply take the product of Equations (9), (10), and (11). While that would use the factorization of the joint density suggested by $M_i$, it does not impose the constraints $\alpha_i^* = 0$ and $\alpha_R = 0$ implied by $M_i$. Using this indirect characterization of $M_i$ from Proposition 1 yields

$$L(M_i) = L_i(\mu_{f_i}, \Sigma_{f_i} \mid f_i) \times L_i^*(0, \beta_i^*, \Sigma_i^* \mid f) \times L_R(0, \beta_R, \Sigma_R \mid D). \tag{12}$$

Although we now impose the model restrictions in the last two terms, we do not presume that these restrictions actually hold under the true return-generating

14 We work with two models to simplify the presentation, but the argument extends easily for any collection of models based on a given set of factors.

are excluded from $f_i$, and let $\alpha_i^*$ be the intercept vector in the multivariate linear regression of $f_i^*$ on $f_i$, $i = 1, 2$.14

The likelihood of a model is the joint density of the data $D$, viewed as a function of the model parameters, with the appropriate zero-alpha constraints imposed. Here $D$ consists of all the factor and test-asset returns, $(f, R)$. To economize on notation, $f$ and $R$, which previously denoted random vectors (implicitly for a given period $t$), now refer to the time series of these variables for $t = 1, 2, \ldots, T$.
Given model i, D can be partitioned as (fi , fi∗ ,R), so the joint density can be written as a product of marginal and conditional densities: (1) the marginal density for fi , times (2) the conditional density of fi∗ given fi , times (3) the conditional density of R given (fi , fi∗ ) = f , the vector of all the factors. We assume that the conditional densities are linear regressions, as is true, in particular, under the joint normality of (f, R) but also for some other members of the class of elliptical distributions (see Owen and Rabinovich 1983). We write the marginal density for fi as a function of the parameters μfi and fi . The corresponding likelihood term is Which Alpha? process from which the data are drawn. Of course, the likelihoods will tend to be higher the better the model fits the actual data. But the key observation is that the last term is the same for all models (independent of i) regardless of the fit. With this specification, we have: Proof. There will be likelihood terms like those in Equations (9)–(11) for each period t. Initially, assuming i.i.d. factor and test-asset returns, the joint likelihood over the whole sample will then equal the product from t = 1 to T of these period t likelihoods. We can group all the i, i ∗ , and R terms together to get the joint likelihood over the whole sample as a product of three terms, as in Equation (12). The conclusion then follows from the fact that the first two terms in Equation (12) do not depend on R, while the last term, which depends on R, does not vary with the model i. We work out an i.i.d. normal example in the next section.15 In the case of the multivariate t distribution, the regression densities for fi∗ given fi and for R given f will be conditionally heteroskedastic, but that poses no problem in Equation (12), which allows the second term to depend on the factor data only and the third term to depend on all the data—likewise for other elliptical distributions. 
Similarly, autocorrelation in f and in the regression disturbances for f ∗ and R could be introduced, provided that the implied likelihood terms for f and f ∗ do not depend on R. 2.1 An illustration of model comparison based on likelihoods Relating the likelihoods for each model given the observed data is a natural approach to model comparison in the presence of estimation uncertainty. This notion plays a fundamental role in classical as well as Bayesian inference. While the development of a detailed methodology of either sort is beyond the scope of this paper, we now apply the well-known Akaike information criterion (AIC), a likelihood-based heuristic often used in model selection, though not in formal hypothesis testing.16 15 Our models, characterized by zero-alpha constraints, do not restrict the other parameters. The betas and residual covariances are all functions of the covariance matrix for (f, R), which is assumed to be fixed in the likelihood comparison. A vector of means μf for the factors included in the models is fixed as well. The likelihood ratio for a pair of models can be affected by these fixed values, but it will nonetheless be true that the ratio is “independent of” the test-assets returns. 16 See Barillas and Shanken (2016) for a formal Bayesian analysis of model comparison. 1327 Downloaded from https://academic.oup.com/rfs/article/30/4/1316/2758634 by National Science & Technology Library user on 07 June 2022 Proposition 2. Assume we are given models Mi , i = 1,2, which may be nested or non-nested, and need not hold exactly. We maintain the elliptical joint distribution assumption with finite second moments for (f, R) and assume that the values of parameters that are not restricted by the models are fixed. Then the likelihood ratio for a pair of models depends on the factor returns, but is independent of the test-asset returns, R. 
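Proposition 2 and the factorization in Equation (12) can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the procedure used for the empirical results: returns are simulated i.i.d. normal, the function names (`max_gaussian_loglik`, `loglik_terms`, `n_params`) and the two hypothetical models are illustrative only, and each term is maximized by ordinary least squares with the maximum likelihood residual covariance. The parameter count covers only the model-specific terms, since the test-asset term is common to all models.

```python
import numpy as np

def max_gaussian_loglik(Y, X):
    """Maximized Gaussian log-likelihood of Y given regressors X (OLS betas, MLE covariance)."""
    T, N = Y.shape
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B                      # T x N residuals
    Sigma = U.T @ U / T                # MLE of the residual covariance
    _, logdet = np.linalg.slogdet(Sigma)
    # At the MLE, trace(Sigma^{-1} U'U) = T * N.
    return -0.5 * T * (N * np.log(2 * np.pi) + logdet + N)

def loglik_terms(F, R, included):
    """Three log-likelihood terms of Equation (12) for the model with factors F[:, included]."""
    T, K = F.shape
    excluded = [k for k in range(K) if k not in included]
    f_i, f_star = F[:, included], F[:, excluded]
    term_f = max_gaussian_loglik(f_i, np.ones((T, 1)))  # unrestricted mean and covariance
    term_star = max_gaussian_loglik(f_star, f_i)        # no constant: alpha_i* = 0 imposed
    term_R = max_gaussian_loglik(R, F)                  # no constant: alpha_R = 0 imposed
    return term_f, term_star, term_R

def n_params(k_in, k_ex):
    """Parameters in the model-specific terms (the common test-asset term is omitted)."""
    return k_in + k_in * (k_in + 1) // 2 + k_ex * k_in + k_ex * (k_ex + 1) // 2

rng = np.random.default_rng(0)
T, K = 600, 3
F = rng.normal(0.5, 2.0, size=(T, K))                   # simulated factor returns
# Two unrelated sets of simulated test assets.
R_a = F @ rng.normal(size=(K, 8)) + rng.normal(size=(T, 8))
R_b = F @ rng.normal(size=(K, 15)) + 0.3 + 3.0 * rng.normal(size=(T, 15))

models = {"M1": [0, 1], "M2": [0, 2]}                   # hypothetical non-nested models

def aic_gap(R):
    """AIC(M1) - AIC(M2) computed from all the data, including the test assets R."""
    aic = {}
    for name, inc in models.items():
        loglik = sum(loglik_terms(F, R, inc))
        aic[name] = -2.0 * loglik + 2 * n_params(len(inc), K - len(inc))
    return aic["M1"] - aic["M2"]

gap_a, gap_b = aic_gap(R_a), aic_gap(R_b)
print(gap_a, gap_b)   # the two gaps agree up to floating-point rounding
```

Swapping one set of test assets for another changes each model's likelihood but leaves the gap between the models unchanged, because the third term is computed from the same regression of the test assets on all the factors in both cases.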
Let $L^{\max}(M_i)$ be the maximized value of the likelihood function for $M_i$ given in Equation (12), and let $m_i$ be the number of parameters in $M_i$. It is easy to see that $L^{\max}(M_i)$ is obtained by maximizing each component of Equation (12), which we write as

$$L^{\max}(M_i) = L_i^{\max} \times L_{i^*}^{\max} \times L_R^{\max}. \tag{13}$$

To avoid overfitting, AIC incorporates a penalty that increases with the number of parameters,

$$\mathrm{AIC}_i = -2 \ln[L^{\max}(M_i)] + 2 m_i, \tag{14}$$

where lower values of AIC provide more support for a model. Accordingly, given several candidate models, nested or non-nested, the criterion favors the model with the lowest AIC value (highest adjusted likelihood).

We consider a comparison of the non-nested models, FF3 = {Mkt SMB HML}, and a four-factor model, 4FM = {Mkt SMB HMLm UMD}. The latter is a variation on the Carhart (1997) model, with an alternative version of the value factor, HMLm, substituted for HML. This factor is based on book-to-market rankings that use the most recent monthly stock price in the denominator, as in Asness and Frazzini (2013) and Lewellen (2015). In contrast, Fama and French (1993) use annually updated lagged prices in constructing HML. The up-minus-down factor (UMD) is a momentum factor motivated by the work of Jegadeesh and Titman (1993). This particular comparison is chosen not as the outcome of a thorough empirical search for the best model. Rather, it serves as a good example to further illustrate the logical inconsistencies that can be encountered when ranking models on test-asset metrics that are frequently used in the empirical literature, as opposed to the fundamental economic and statistical metrics analyzed above.

We nest both FF3 and 4FM in the model M = {Mkt SMB HML HMLm UMD}, which includes all the factors. Then, using Proposition 1, FF3 can be represented in terms of zero restrictions on the three-factor alphas of the excluded factors, HMLm and UMD, and the five-factor alphas of the test assets. Similarly, 4FM imposes zero restrictions on HML's four-factor alpha and the five-factor alphas of the test assets. These are the indirect model characterizations. Since the test-asset restrictions are the same for each model, that evidence will be irrelevant for comparing the models. All densities are taken to be multivariate normal in this example.17

17 For a standard multivariate linear regression, $Y = X\beta + U$, with residual covariance matrix $\Sigma_U$, the log-likelihood of the data under joint normality and serial independence of $U$ given $X$ is given by

$$\ln L(\beta, \Sigma_U \mid D) = -\frac{1}{2}\left[ NT \ln(2\pi) + T \ln|\Sigma_U| + \mathrm{trace}\left(\Sigma_U^{-1} U'U\right) \right],$$

where $Y$ and $U$ are $T \times N$, $X$ is $T \times K$, $\beta$ is $K \times N$, and $\Sigma_U$ is $N \times N$. $T$ is the number of time-series observations, $N$ the number of left-hand-side returns, and $K$ the number of independent variables. The number of right-hand-side factors is $K$ in the constrained regressions and $K - 1$ in the unconstrained regressions. In the unconstrained case, the first column of $X$ is a $T \times 1$ unit vector and the first row of $\beta$ is the $1 \times N$ vector of alphas. The likelihood is maximized at the usual ordinary least squares estimates.

In computing the likelihood for FF3, $f_1$ = {Mkt SMB HML} and $f_1^*$ = {HMLm UMD}. For 4FM, $f_2$ = {Mkt SMB HMLm UMD} and $f_2^*$ = {HML}. The test assets for this illustration are taken to be the twenty-five portfolios formed on size and momentum. Thus, we have

$$\ln[L^{\max}(\mathrm{FF3})] = \ln(L^{\max}_{\mathrm{FF3}}) + \ln(L^{\max}_{\mathrm{FF3}^*}) + \ln(L^{\max}_R) = 3622.9 + 2838.9 + 37001.0 = 43462.8$$

and

$$\ln[L^{\max}(\mathrm{4FM})] = \ln(L^{\max}_{\mathrm{4FM}}) + \ln(L^{\max}_{\mathrm{4FM}^*}) + \ln(L^{\max}_R) = 4724.4 + 1767.0 + 37001.0 = 43492.4. \tag{15}$$

Note that, for both models, the test-asset portion of the log-likelihood is the same. Therefore, $L_R^{\max}$ cancels out in the likelihood ratio for the two models and has no effect on the AIC comparison. The number of parameters is eighteen for FF3 (three factor means, six factor variances/covariances, three residual variances/covariances, and six betas) and, similarly, nineteen for 4FM. The one additional parameter in the latter case reflects the fact that there is one fewer alpha constraint. By Equations (14) and (15), the AIC values for FF3 and 4FM are −86889.4 and −86946.7, respectively, with a difference of 57.3. The relative (adjusted) likelihood of 4FM to FF3 is, therefore, exp(57.3/2), overwhelmingly in favor of 4FM, largely due to the great difficulty FF3 has in pricing momentum.

Now suppose we were unaware of the equivalence in Proposition 1 and the implied test-asset irrelevance but still wanted to compare the models above. How would we go about that and would the outcome be the same or different? Whereas the indirect alpha model representations were used above, now we would use the direct alpha model representations. As discussed earlier, the direct approach parallels the usual way of thinking about the model constraints except, perhaps, for the fact that we impose both excluded-factor and test-asset restrictions for each model. In other words, we would again calculate the maximum likelihood values, only now employing the usual parameterization for each model and the associated factorization of the joint density of returns. Hence, we start with the unrestricted density for $f_i$, as before, but now multiply by the restricted ($\alpha_i^* = 0$ and $\alpha_{R,i} = 0$) joint conditional density for $(f_i^*, R)$ given $f_i$. Recall that $\alpha_i^*$ refers to the excluded-factor alphas and $\alpha_{R,i}$ to the test-asset alphas, both in regressions on $f_i$, the model $i$ factors. The corresponding likelihood for model $M_i$ can then be written as

$$L^{\max}(M_i) = L_i^{\max} \times L_{i^*,R}^{\max}.$$

Here, $L_i^{\max}$ is, as above, the likelihood corresponding to the marginal density for $f_i$. $L_{i^*,R}^{\max}$ is the maximized value of the joint density of the excluded factors and the test assets, conditional on $f_i$. The associated log-likelihoods are

$$\ln[L^{\max}(\mathrm{FF3})] = \ln(L^{\max}_{\mathrm{FF3}}) + \ln(L^{\max}_{\mathrm{FF3}^*,R}) = 3623.0 + 39839.8 = 43462.8$$

and

$$\ln[L^{\max}(\mathrm{4FM})] = \ln(L^{\max}_{\mathrm{4FM}}) + \ln(L^{\max}_{\mathrm{4FM}^*,R}) = 4724.4 + 38768.0 = 43492.4, \tag{16}$$

identical to the values in Equation (15). This was inevitable, given the well-known invariance of the likelihood function to one-to-one transformations of the parameter space. Here, the two parameterizations for each model correspond to the different factorizations of the joint density of factor and test-asset returns, which are induced by the indirect and direct model representations. The conclusion for model comparison is that the same outcome is obtained, regardless of which model representation is employed. In the end, both approaches evaluate the likelihood for each model given all the returns. Interestingly, though, the fact that test assets have no impact on this model comparison is not apparent from Equation (16). The reason is that, in this calculation, we only observe the joint impact of test assets and the factors excluded from each model (39839.8 and 38768.0), whereas Equation (15) isolates the common impact of test assets for each model conditional on all the factors (37001.0).

3. Test-Asset Metrics and the Non-nested Example

In this section, we further explore our non-nested example using empirical test-asset metrics often employed in the literature. The goal here is to provide additional insight into how conclusions based on these metrics can conflict with those implied by the economic and statistical criteria discussed earlier. The excluded-factor time-series regression evidence is summarized in Table 1.
Table 1
Excluded factor regression coefficient estimates for the FF3 and 4FM models

Panel A: Excluded-factor regressions for the FF3 model: {Mkt SMB HML}
LHS     Alpha           Mkt             SMB             HML             R2
HMLm    −0.56 (1.09)    0.08 (0.02)     0.04 (0.03)     0.96 (0.03)     0.60
UMD     10.85 (2.07)    −0.18 (0.04)    0.01 (0.06)     −0.31 (0.06)    0.06

Panel B: Excluded-factor regressions for the 4FM model: {Mkt SMB HMLm UMD}
LHS     Alpha           Mkt             SMB             HMLm            UMD             R2
HML     −2.12 (0.68)    −0.04 (0.01)    −0.07 (0.02)    0.92 (0.02)     0.37 (0.02)     0.80

The factors are value-weighted market return in excess of the one-month T-bill rate (Mkt), the size factor small minus big (SMB) market cap, the value factor high minus low (HML) book-to-market (B/M), the more timely value factor HMLm with monthly updating of price in B/M, and the momentum factor up minus down (UMD). Returns are for July 1963 to December 2013. The alphas are in annual percentage rates. Standard errors are provided in parentheses.

Over the period July 1963 to December 2013, the relevant annualized alpha estimates are −0.56% for HMLm, 10.85% for UMD when regressed on FF3, and −2.12% for HML regressed on 4FM. The alphas are, of course, zero for the factors included in each model. Therefore, the average absolute alpha over all five factors is (0 + 0 + 0 + 0.56 + 10.85)/5 = 2.28% for FF3 and (0 + 0 + 0 + 0 + 2.12)/5 = 0.42% for 4FM. Alternatively, the square root of the average squared factor alpha is 4.86% for FF3 and 0.44% for 4FM. These informal metrics, which are typically applied to test assets rather than factors, clearly favor 4FM over FF3 (we ignore sampling variation). This is consistent with the formal finding based on likelihoods in the previous section, as well as the (annualized) Sharpe ratios of 0.76 for FF3 and 1.36 for 4FM. On the other hand, Table 2 shows measures of performance on test assets for each model and FF3 provides the better fit for six of the ten sets (indicated by boldface in the $A|a_i|$ column). For example, the annualized average absolute alpha across seventeen industry portfolios is 2.22% for FF3 and 2.40% for 4FM. Similarly, the Gibbons, Ross, and Shanken (1989) test statistic with these test assets, another basis of comparison that has been used previously, is 3.41 for FF3 and 4.96 for 4FM. Not surprisingly, when challenged to price the 25 portfolios based on independent size and momentum sorts, the 4FM that includes a momentum factor performs much better, with an average absolute alpha of 1.30% compared with 3.90% for FF3. Similar results are obtained with the alphas scaled by average deviations from the cross-sectional mean return, or if the Gibbons, Ross, and Shanken (1989) statistic is used. Here, the inability of FF3 to explain momentum drives the excluded-factor evidence (large UMD alpha), as well as the size-momentum test-asset evidence (larger FF3 alphas). As a result, including UMD in 4FM raises its Sharpe ratio, but also helps in pricing the momentum-sorted portfolios.
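The arithmetic behind the excluded-factor comparison is simple enough to reproduce directly. The sketch below recomputes the average absolute factor alphas from the Table 1 estimates and the AIC values of Equation (14) from the log-likelihood components reported in Equation (15). The variable names are illustrative, and the inputs are the rounded figures quoted in the text, so the AIC values agree with the reported −86889.4 and −86946.7 only up to rounding.

```python
import math

# Annualized factor alphas in percent: zero for included factors,
# Table 1 estimates for excluded factors.
alphas = {
    "FF3": [0.0, 0.0, 0.0, -0.56, 10.85],   # Mkt, SMB, HML | HMLm, UMD
    "4FM": [0.0, 0.0, 0.0, 0.0, -2.12],     # Mkt, SMB, HMLm, UMD | HML
}

def mean_abs(values):
    """Average absolute alpha across all five factors."""
    return sum(abs(v) for v in values) / len(values)

avg_abs_alpha = {model: mean_abs(a) for model, a in alphas.items()}

# Log-likelihood components from Equation (15): model factors, excluded factors, test assets.
loglik = {
    "FF3": 3622.9 + 2838.9 + 37001.0,
    "4FM": 4724.4 + 1767.0 + 37001.0,
}
n_params = {"FF3": 18, "4FM": 19}            # parameter counts stated in Section 2.1

# Equation (14): AIC = -2 ln L + 2 m; lower values favor a model.
aic = {model: -2.0 * loglik[model] + 2 * n_params[model] for model in loglik}
aic_gap = aic["FF3"] - aic["4FM"]
log_relative_likelihood = aic_gap / 2.0      # log of the adjusted likelihood of 4FM vs. FF3

print(avg_abs_alpha)
print(aic, aic_gap, math.exp(log_relative_likelihood))
```

Both calculations point the same way: the smaller average factor alpha and the lower AIC each favor 4FM, and the common test-asset term (37001.0) plays no role in the AIC gap.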
But for many other portfolios, the test-asset alphas point to different conclusions than do the excluded-factor alphas. Take the twenty-five BM-INV portfolios, for example, where the test-asset metrics clearly favor FF3 over 4FM, with average absolute alphas of 1.31% and 1.82%, respectively. A detailed look at the data reveals that most of this increase is due to inclusion of the UMD factor, not the substitution of HMLm for HML in 4FM. The test-asset loadings on UMD combine with a large UMD alpha on the other factors in such a way that the 4FM predictions for test-asset expected returns are mostly further from the actual values than the three-factor predictions.18 This reinforces the observations about test assets made earlier in Section 1.2.1, showing that incorrect conclusions about model comparison can also be obtained in a non-nested context, with "realistic" factors and test assets. This will not always be the case, but the essential point is that the excluded-factor evidence can always be relied on to provide the correct model rankings for the fundamental metrics we have considered, while test-asset evidence can be misleading in this regard.

18 The analysis is similar to that in Equation (8).

Table 2
Statistics for FF3 and 4FM models: July 1963 to December 2013

Test assets           Model    A|a_i|    A|a_i|/A|r_i|    GRS     p(GRS)
25 size-B/M           FF3      1.21      0.53             3.61    .0000
                      4FM      1.30      0.57             4.00    .0000
17 industries         FF3      2.22      2.18             3.41    .0000
                      4FM      2.40      2.37             4.96    .0000
25 size-UMD           FF3      3.90      1.13             4.72    .0000
                      4FM      1.30      0.41             3.78    .0000
25 size-INV           FF3      1.31      0.63             4.48    .0000
                      4FM      0.94      0.44             3.35    .0000
25 BM-INV             FF3      1.31      0.83             1.96    .0037
                      4FM      1.82      1.17             2.37    .0002
25 size-OP            FF3      1.31      0.68             2.40    .0002
                      4FM      1.36      0.71             2.94    .0000
32 size-B/M-OP        FF3      1.84      0.61             2.52    .0000
                      4FM      1.96      0.65             3.36    .0000
32 size-B/M-INV       FF3      1.54      0.63             2.75    .0000
                      4FM      1.74      0.72             3.28    .0000
32 size-OP-INV        FF3      2.20      0.80             4.36    .0000
                      4FM      1.90      0.69             3.49    .0000
16 size-B/M-OP-INV    FF3      1.66      0.62             4.88    .0000
                      4FM      1.61      0.60             4.96    .0000

This table presents statistics for the three-factor Fama and French (1993) model versus a four-factor model that adds the momentum factor up minus down (UMD) and replaces high minus low (HML) with a more timely version (HMLm). The table shows (1) the factors in each regression model; (2) the annualized average absolute value of the intercepts, $A|a_i|$; (3) the average absolute value of the intercepts over the average absolute value of the average return deviation, $A|a_i|/A|r_i|$, where $r_i$ is the average return on portfolio $i$ minus the cross-sectional average of the time-series average portfolio returns; (4) the Gibbons, Ross, and Shanken (1989) (GRS) F-statistic for testing whether the true intercepts are zero; and (5) the p-value for the Gibbons, Ross, and Shanken (1989) statistic. Test assets are taken from Ken French's website and consist of portfolios formed based on the following characteristics: size, book-to-market (B/M), momentum (UMD), investment (INV), operating profitability (OP), and industry.

4. Nontraded Factors and Cross-Sectional Regression Analysis

In this section, we again consider model comparison under the Sharpe-improvement metric (a modified Hansen and Jagannathan distance). First, we allow for the possibility that some factors are nontraded. We then discuss the relation between our alpha-based results and the cross-sectional regression methodology.
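For traded factors, the Sharpe-improvement metric can be sketched numerically. The snippet below is a minimal illustration on simulated data, with hypothetical function names and two hypothetical non-nested models. It relies on the in-sample identity associated with Gibbons, Ross, and Shanken (1989), $\alpha' \Sigma^{-1} \alpha = \mathrm{Sh}(f, R)^2 - \mathrm{Sh}(f)^2$, which holds exactly when means, covariances, and regression residual covariances are all computed with the same $1/T$ normalization.

```python
import numpy as np

def sq_sharpe(X):
    """Maximum squared Sharpe ratio attainable from the columns of X (sample moments, 1/T)."""
    mu = X.mean(axis=0)
    V = np.atleast_2d(np.cov(X, rowvar=False, ddof=0))
    return float(mu @ np.linalg.solve(V, mu))

def alpha_quadratic_form(Y, F):
    """alpha' Sigma^{-1} alpha from the time-series regression of Y on a constant and F."""
    T = len(Y)
    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    alpha = B[0]                       # first row of B holds the intercepts
    Sigma = U.T @ U / T                # residual covariance with the same 1/T normalization
    return float(alpha @ np.linalg.solve(Sigma, alpha))

rng = np.random.default_rng(1)
T = 500
F = rng.normal(0.4, 3.0, size=(T, 3))              # simulated excess returns on three factors
f1, f2 = F[:, [0, 1]], F[:, [0, 2]]                # two hypothetical non-nested models

def improvement(f_model, f_excluded, R):
    """Sharpe improvement for a model: alphas of excluded factors and test assets jointly."""
    return alpha_quadratic_form(np.column_stack([f_excluded, R]), f_model)

def make_test_assets(n, seed):
    g = np.random.default_rng(seed)
    return F @ g.normal(size=(3, n)) + g.normal(0.1, 2.0, size=(T, n))

R_a, R_b = make_test_assets(6, 10), make_test_assets(12, 20)

# Difference in Sharpe improvements between the two models, for each set of test assets.
gap_a = improvement(f1, F[:, [2]], R_a) - improvement(f2, F[:, [1]], R_a)
gap_b = improvement(f1, F[:, [2]], R_b) - improvement(f2, F[:, [1]], R_b)
gap_factors = sq_sharpe(f2) - sq_sharpe(f1)        # uses factor returns only

print(gap_a, gap_b, gap_factors)                   # all three agree up to rounding
```

Each model's improvement depends on the test assets, but the difference between the two improvements equals the difference in the models' own squared Sharpe ratios, so the ranking is the same whichever set of test assets is used.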
4.1 Models with nontraded factors

We have seen that test assets are irrelevant for model comparison in the traditional alpha-based framework with traded factors. In the case of models with nontraded factors, those factors can be replaced with traded mimicking portfolios. The weights in such a portfolio equal the slope coefficients in the regression of the factor on the available returns (and a constant), normalized to sum to one. Given this, the model with the higher squared Sharpe ratio is preferred. Here, test assets are relevant insofar as they are needed to identify the mimicking portfolio returns. Thus, the result of this comparison can depend on which test assets are used to form the mimicking portfolios.19

In some contexts, however, a researcher may be willing to impose the restriction that the mimicking portfolio is spanned by a subset of the available returns or base assets. For example, Vassalou (2003) specifies a mimicking portfolio for GDP growth as a function of the six portfolios that Fama and French (1993) use to construct their SMB and HML factors, as well as two bond return spreads (term and default). The standard twenty-five size and book-to-market sorted portfolios are taken as test-asset returns. Here, our previous analysis implies that these test-asset returns would drop out in comparing models based on the Sharpe improvement metric since the model Sharpe ratios only depend on the base assets and the model traded factors.

19 For example, let one set of test assets be $R$ and another $R^+$ consisting of $R$ plus an additional return such that $\mathrm{Sh}(R^+)^2 > \mathrm{Sh}(R)^2$. Assume a nontraded factor $f_{NT}$ is the tangency portfolio $\tau^+$ for $R^+$ plus random noise. It follows that the mimicking portfolio for $f_{NT}$ constructed from $R^+$ is $\tau^+$ and $\mathrm{Sh}(\tau^+)^2$ is higher than the squared ratio for any portfolio of $R$, including the mimicking portfolio for $f_{NT}$ formed from $R$. Finally, suppose that a traded factor $f_T$ is orthogonal to $R^+$ and $f_{NT}$ and that $\mathrm{Sh}(f_T)^2$ is between the squared ratios for the two mimicking portfolios. Then the model based on $f_T$ is preferred to the model based on $f_{NT}$ using $R$ as test assets, but the Sharpe ranking is reversed using $R^+$.

4.2 Relation to the cross-sectional regression methodology

Thus far, our focus has been on alphas, the pricing errors in the time-series methodology. Another approach often used in the literature is cross-sectional regression (CSR) analysis, where the pricing errors are average CSR residuals, the deviations from the linear expected return-beta relation.20 In this setting, the CSR $R^2$ for average returns, usually the ordinary least squares (OLS) measure, is often used to compare models. However, Lewellen, Nagel, and Shanken (2010), building on insights in Roll and Ross (1994) and Kandel and Stambaugh (1995), argue the merits of the generalized least squares (GLS) measure. Also, Shanken (1985) develops a model specification CSR test based on a quadratic form in the sample estimates of the GLS pricing errors, and Kan and Robotti (2008) show that the population counterpart is a modified Hansen and Jagannathan distance when the zero-beta rate is constrained to equal the risk-free rate.

20 We assume that the slopes do not sum exactly to zero. Note that "useless factors" with asset betas equal (or close) to zero do not pose a problem in the alpha-based framework with traded factors since the premia are just the factor means. There can be problems in connection with cross-sectional regressions and mimicking portfolios for nontraded factors, however. See, for example, Kan and Zhang (1999) and Burnside (2016).

Given these observations, one may reasonably ask how model comparison with the CSR distance measure relates to comparison in terms of the time series–based Sharpe improvement. A simple answer follows directly from algebraic results in Lewellen, Nagel, and Shanken (2010): the distance measures are identical and thus comparison is the same, provided that (1) the left-hand-side returns in the GLS regressions include the traded factors from both models, along with test assets; (2) the zero-beta rates are constrained to equal the risk-free rate; and (3) mimicking portfolios are substituted for the nontraded factors, if any, in computing the model Sharpe ratios. In particular, if all of the model factors are traded, there is no difference between the time-series and GLS CSR analyses of model comparison.

More formally, let $f$ be the factors (possibly nontraded) in a model $M$ and $R_B$ the vector of base-asset returns consisting of the test-asset excess returns $R$ plus the traded factors that are in $M$, along with those traded factors in the model to be compared with $M$. The CSR pricing restriction is then

$$E(R_B) = \beta_B \gamma, \tag{17}$$

where $\gamma$ is a vector of risk premia associated with the betas $\beta_B$ of $R_B$ on $f$ and a constant, with the excess zero-beta rate equal to zero (a constant is excluded from the CSR).21 Finally, let $f_p$ denote the vector of factors, but with mimicking portfolios formed from $R_B$ substituted for the nontraded factors, if any. When all factors are traded, $f_p = f$. As earlier, we do not presume that any model is an exact description of reality.

21 The betas for any traded factors in both $f$ and $R_B$ will be one on those factors and zero on the other factors, reflecting a perfect time-series fit.

The corresponding GLS pricing errors for $M$ are $e = E(R_B) - \beta_B \gamma_{GLS}$, where $\gamma_{GLS} = (\beta_B' V_B^{-1} \beta_B)^{-1} \beta_B' V_B^{-1} E(R_B)$ and $V_B$ denotes the covariance matrix of $R_B$, assumed to be invertible. Note that this is a GLS regression on the betas defined relative to the original factors, $f$. The quadratic form in these GLS errors, $e' V_B^{-1} e$, is the measure of model misspecification or squared distance. Now, by Results 5 and 7 in Lewellen, Nagel, and Shanken (2010), we have

$$e' V_B^{-1} e = \mathrm{Sh}(R_B)^2 - \mathrm{Sh}(f_p)^2, \tag{18}$$

where the right-hand side is the Sharpe improvement in going from investment in the model's mimicking portfolios only to investment in all the returns (test assets and traded factors).22 This is the equivalence between the time-series and GLS CSR approaches asserted above.

22 This uses the partitioned inverse formula for the covariance matrix of returns, with blocks corresponding to $f_p$ and the (unspanned) test assets. Also see related results in Balduzzi and Robotti (2008).

If a given model $M$ is true, that is, there is a vector $\gamma$ such that Equation (17) holds for all returns, then the GLS metric will be zero for any choice of test assets. Therefore, by Equation (18), the corresponding squared Sharpe ratio for $f_p$ will equal that for $R_B$. Hence the squared Sharpe ratio for $M$ will always be greater than or equal to that for any other model. However, if two models are both misspecified, then one may perform better on some test assets and worse on others, as judged by the GLS and Sharpe metrics. Note also that the GLS CSR $R^2$ measure for the zero-beta constrained model is

$$R^2_{GLS} = 1 - \frac{e' V_B^{-1} e}{E(R_B)' V_B^{-1} E(R_B)},$$

where $E(R_B)$ is the vector of actual expected returns here, not the model prediction. It then follows from Equation (18) that the numerator of the difference of $R^2$s for two models depends only on the mimicking portfolios since $R_B$ is the same in each case (it includes the traded factors for both models). Similarly, the "total sum of squares" in the denominator is the same for each model. Thus, we have essentially the same conclusion about the relevance of test assets (through the mimicking portfolio returns) as in Section 4.1, apart from this scaling effect in the denominator.

In contrast to the analysis above, the usual CSR approach of estimating the zero-beta rate as a free parameter does not restrict the level of expected returns, despite the economic motivation to do so (see Lewellen, Nagel, and Shanken 2010 and Kan, Robotti, and Shanken 2013). In this case, the zero-beta rates and the corresponding "generalized Sharpe ratios" of $R_B$ could differ across models, with the result that the impact of test assets does not drop out in model comparison, even with traded factors.

5. Conclusions

It is natural to examine the performance of a given model in pricing test assets and traded factors that are not included in the model. The question that we have addressed is how this information should be used for the specific purpose of model comparison. We have obtained a simple yet unexpected answer to this question. In comparing traded-factor models on the basis of statistical likelihood or the classic asset pricing metric related to squared Sharpe ratios and the Hansen and Jagannathan distance, test assets tell us nothing beyond what we learn from the evidence on the pricing of factors excluded from each model. In particular, when performance with test assets favors one model but the excluded-factor evidence favors another, it is only the latter that is relevant in identifying the better model. The point is that a proper evaluation of a model should consider the totality of test-asset and factor-pricing evidence. But when this is done, the test assets drop out in the model comparison. In contrast, for models with nontraded factors, test assets are needed to identify factor-mimicking portfolio returns and are, therefore, relevant for comparison.

There is certainly no guarantee that the model identified as best is a good model. To address this issue, which entails evaluation of the overall performance of the model, all information about the pricing of excluded factors and test assets can be combined in an absolute test, for example, a Gibbons, Ross, and Shanken (1989) F-test with both sets of returns serving as "left-hand-side assets." One may also want to conduct the test on various subsets of asset returns to better identify a model's strengths and weaknesses. As emphasized by Lewellen, Nagel, and Shanken (2010), the choice of test-asset portfolios can have an important impact on the power of such tests. Data-mining biases are another concern in this context, as emphasized by Lo and MacKinlay (1990), and although test assets drop out in model comparison with traded factors, understanding biases related to the selection of candidate factors in such tests is important as well. Of course, the "mining" of models is an issue more generally in asset pricing research, as emphasized by Harvey, Liu, and Zhu (2016).

Our goal in this paper has been to clarify the theoretical issues that arise in comparing pricing models in the alpha-based setting. We do not provide a methodology for carrying out statistical inference on model comparison, however. As discussed earlier, the standard Gibbons, Ross, and Shanken (1989) test can be used to compare nested models. However, inference about model comparison for non-nested models is more challenging. Barillas and Shanken (2016) focus on inference in a Bayesian setting. In essence, this amounts to a procedure for aggregating regression intercept evidence across the various candidate factor models, permitting simultaneous comparison of many such models, both nested and non-nested. Their application of this Bayesian approach is facilitated by the derivation of closed-form formulas for the various model probabilities.

Finally, although test assets have been shown to be irrelevant for model comparison based on well-established economic and statistical criteria, test assets can have a bearing on model choice with other metrics. The challenge remains for those who advocate the use of such measures, in particular, ones that place no weight on the pricing of excluded factors, to articulate the economic motivation for these alternative approaches.

References

Asness, C., and A. Frazzini. 2013. The devil in HML's details. Journal of Portfolio Management 39:49–68.

Balduzzi, P., and C. Robotti. 2008. Mimicking portfolios, economic risk premia, and test of multi-beta models. Journal of Business and Economic Statistics 26:354–68.

Barillas, F., and J. Shanken. 2016. Comparing asset pricing models. Working Paper, Emory University.

Black, F., M. C. Jensen, and M. Scholes. 1972. The capital asset pricing model: Some empirical tests. In Studies in the theory of capital markets, ed. M. C. Jensen, 79–121. New York: Praeger.

Breeden, D. 1979. An intertemporal asset pricing model with stochastic consumption and investment opportunities. Journal of Financial Economics 7:265–96.

Burnside, C. 2016. Identification and inference in linear stochastic discount factor models with excess returns. Journal of Financial Econometrics 14(2):295–330.

Carhart, M. 1997. On persistence in mutual fund performance. Journal of Finance 52:57–82.

Cochrane, J. 2005. Asset pricing. Princeton, NJ: Princeton University Press.

De Moor, L., G. Dhaene, and P. Sercu. 2015. On comparing zero-alpha tests across multifactor asset pricing models. Journal of Banking and Finance 61(Suppl 2):S235–40.

Fama, E. F. 1996. Multifactor portfolio efficiency and multifactor asset pricing. Journal of Financial and Quantitative Analysis 31:441–65.

Fama, E. F. 1998. Determining the number of priced state variables in the ICAPM. Journal of Financial and Quantitative Analysis 33:217–31.

Fama, E. F., and K. R. French. 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33:3–56.

Fama, E. F., and K. R. French. 1996. Multifactor explanations of asset pricing anomalies. Journal of Finance 51:55–84.

Fama, E. F., and K. R. French. 2015. A five-factor asset pricing model. Journal of Financial Economics 116:1–22.

Fama, E. F., and K. R. French. 2016. Dissecting anomalies with a five-factor model. Review of Financial Studies 29:69–103.

Gibbons, M. R., S. A. Ross, and J. Shanken. 1989. A test of the efficiency of a given portfolio. Econometrica 57:1121–52.

Hansen, L. P., and R. Jagannathan. 1997. Assessing specification errors in stochastic discount factor models. Journal of Finance 52:557–90.

Harvey, C. R., Y. Liu, and H. Zhu. 2016. … and the cross-section of expected returns. Review of Financial Studies 29:5–68.

Hou, K., G. A. Karolyi, and B. C. Kho. 2011. What factors drive global stock returns? Review of Financial Studies 24:2527–74.

Hou, K., C. Xue, and L. Zhang. 2015a. Digesting anomalies: An investment approach. Review of Financial Studies 28:650–705.

Hou, K., C. Xue, and L. Zhang. 2015b. A comparison of new factor models. Working Paper, National Bureau of Economic Research.

Jegadeesh, N., and S. Titman. 1993. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance 48:65–91.

Jensen, M. C. 1968. The performance of mutual funds in the period 1945–1964. Journal of Finance 23:389–416.

Jobson, D., and R. Korkie. 1982. Potential performance and tests of portfolio efficiency. Journal of Financial Economics 10:433–66.

Kan, R., and C. Robotti. 2008. Specification tests of asset pricing models using excess returns. Journal of Empirical Finance 15:816–38.

Kan, R., C. Robotti, and J. Shanken. 2013. Pricing model performance and the two-pass cross-sectional regression methodology. Journal of Finance 68:2617–49.

Kan, R., and C. Zhang. 1999. Two-pass tests of asset pricing models with useless factors. Journal of Finance 54:204–35.

Kandel, S., and R. Stambaugh. 1995. Portfolio inefficiency and the cross section of expected returns. Journal of Finance 50:157–84.

Lewellen, J. 2015. The cross-section of expected stock returns. Critical Finance Review 4:1–44.

Lewellen, J., S. Nagel, and J. Shanken. 2010. A skeptical appraisal of asset pricing tests. Journal of Financial Economics 96:175–94.

Lintner, J. 1965. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47:13–27.

Lo, A., and C. MacKinlay. 1990. Data snooping biases in tests of financial asset pricing models. Review of Financial Studies 3:431–68.

Owen, J., and R. Rabinovitch. 1983. On the class of elliptical distributions and their applications to the theory of portfolio choice. Journal of Finance 38:745–52.

Pastor, L., and R. F. Stambaugh. 2000. Comparing asset pricing models: An investment perspective. Journal of Financial Economics 56:335–81.

Pastor, L., and R. F. Stambaugh. 2002. Mutual fund performance and seemingly unrelated assets. Journal of Financial Economics 63:315–49.

Roll, R., and S. A. Ross. 1994. On the cross-sectional relation between expected returns and betas. Journal of Finance 49:101–21.

Shanken, J. 1985. Multivariate tests of the zero-beta CAPM. Journal of Financial Economics 14:327–48.

Sharpe, W. F. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance 19:425–42.

Sharpe, W. F. 1984. Factor models, CAPMs, and the APT. Journal of Portfolio Management 11:21–25.

Stambaugh, R. F., and Y. Yuan. 2016. Mispricing factors. Working Paper, Wharton School.

Vassalou, M. 2003. News related to future GDP growth as a risk factor in equity returns. Journal of Financial Economics 68:47–73.