Deep Tangency Portfolios*
Guanhao Feng, Liang Jiang, Junye Li, Yizhi Song
First version: Nov. 2021; This version: September 27, 2023
Abstract
We propose a parametric approach to directly estimate the tangency portfolio weights on high-dimensional individual assets by combining fundamental finance theory with deep learning techniques. The deep tangency portfolio combines the market factor and a deep long-short factor constructed using a large number of firm characteristics. We apply our approach to the corporate bond market. The deep factor acts as a market hedge and achieves a sizable market price of risk with an out-of-sample annualized Sharpe ratio of 1.79. The deep tangency portfolio outperforms those constructed from commonly used observable or latent factors with an out-of-sample annualized Sharpe ratio of 2.29. We also find evidence supporting the integration between the bond and equity markets.
Keywords: Tangency Portfolios, Deep Learning, Factor Models, Portfolio Optimization, Corporate Bonds.
JEL Classification: C45, G11, G12.
* We appreciate insightful comments from Doron Avramov, Tarun Bali, Zhiguo He, Hendrik Bessembinder, Yong Chen, Hong Liu, Robert Macrae, Andreas Neuhierl, Seth Pruitt, Kuntara Pukthuanthong, Jianeng Xu, Dacheng Xiu, and Mao Ye. We are also grateful for helpful comments from the seminar and conference participants at the University of Missouri, Sun Yat-Sen University, Fudan University, Xi’an Jiaotong-Liverpool University, 2022 AsianFA, 2022 International Conference on Finance & Technology, 2022 China International Risk Forum, FIRM 2022, and 2023 SoFiE annual conference. Feng (Email: gavin.feng@cityu.edu.hk) and Song (Email: yizhisong2-c@my.cityu.edu.hk) are at the City University of Hong Kong, Li (Email: li junye@fudan.edu.cn) and Jiang (Email: jiangliang@fudan.edu.cn) are at Fudan University.
1 Introduction
A fundamental theory in asset pricing is the equivalence of the mean-variance efficient (MVE) portfolio and the stochastic discount factor (SDF) (Hansen and Jagannathan, 1991). The maximum squared Sharpe ratio of the MVE portfolio equals the minimum variance of the SDF in the asset space of the economy. Markowitz (1952) pioneers modern portfolio theory, formulating an elegant solution to the tangency portfolio using only expected asset returns and covariances (\(\Sigma^{-1}\mu\)). However, it is notoriously difficult to estimate the MVE portfolio when the number of individual assets becomes large, leading Cochrane (2014, p. 7) to state, “but this formula is essentially useless in practice. The hurdles of estimating large covariance matrices, overcoming the curse of \(\sigma/\sqrt{T}\) in estimating mean returns, and dealing with parameter uncertainty and drift are not minor matters.”
Modern asset pricing relies on factor models to approximate the SDF using a small number of characteristic-managed factors (e.g., Fama and French, 1996, 2015), hoping that those factors can span the efficient frontier. However, the commonly used factors can hardly achieve the maximum Sharpe ratio of the asset universe of either basis portfolios or individual assets (e.g., Kozak et al., 2018; Daniel et al., 2020; Lopez-Lira and Roussanov, 2020). The literature proposes a large number of factors (Harvey, Liu, and Zhu, 2016; Hou, Xue, and Zhang, 2020) to improve the spanning of the efficient frontier and explain “anomalies,” leading to the issue of a “factor zoo” (Cochrane, 2011) and the curse of high dimension. A recent study by Kozak and Nagel (2023) shows that characteristics-managed factors hardly span the efficient frontier unless a large number of characteristics are used simultaneously.
This paper proposes a deep learning framework for constructing the optimal or tangency portfolio without relying on estimates of expected returns and covariance matrices.
Unlike the typical portfolio optimization literature, which uses dozens or hundreds of assets, we focus on constructing the tangency portfolio using thousands of individual assets. The key to constructing our deep tangency portfolio is to utilize high-dimensional firm characteristics containing rich information on the joint distribution of asset returns. Cochrane (2011) asserts that expected returns, variances, and covariances are stable functions of characteristics (also see, e.g., Kelly, Pruitt, and Su, 2019; Kozak, Nagel, and Santosh, 2020). Therefore, we directly parameterize the tangency portfolio weights as a nonlinear function of high-dimensional characteristics. Indeed, using a large number of characteristics and their nonlinear combinations is crucial. Existing studies on machine learning (ML) find no clear evidence of sparsity in characteristics (see, e.g., Kozak, Nagel, and Santosh, 2020; Giannone, Lenza, and Primiceri, 2021) and show that nonlinearity is important (see, e.g., Freyberger, Neuhierl, and Weber, 2020; Gu, Kelly, and Xiu, 2020).^1
When constructing the tangency portfolio, we consider a large panel of individual assets (e.g., thousands of assets) and a small number of benchmark portfolios, such as the market factor. Using a divide-and-conquer strategy, we estimate the tangency portfolio by combining a deep factor and the benchmark factors. The nonlinear neural network, which is guided by an economically motivated loss function, provides supervised dimension reduction by transforming high-dimensional characteristics into a deep characteristic for each asset, based on which a deep factor is formed as a long-short portfolio of individual assets. The endogenous deep factor construction, which relies on a nonlinear ranking scheme, mimics the commonly used characteristics-sorted factor approach in empirical asset pricing and avoids extreme positions on the long and short sides (see, e.g., Avramov, Cheng, and Metzker, 2023).
Such a deep learning framework is flexible enough to incorporate various types of benchmark factors and multiple deep factors. In addition, our deep parametric portfolio policy can easily be adapted to other economic objectives, such as the minimum variance portfolio or utility maximization, with various economic constraints. This makes it a valuable contribution to the field of asset allocation.
^1 See the latest textbook survey of Nagel (2021) and the review by Giglio, Kelly, and Xiu (2022), as well as references therein.
The economically guided deep factor plays two important roles: (i) under the maximal Sharpe ratio objective of the tangency portfolio, the deep factor has a low or even negative correlation with benchmark factors, providing a potential market hedge portfolio, and (ii) the deep factor is constructed using information from high-dimensional characteristics and may span any missing risk factors, other than benchmark factors, which should enter the pricing kernel. Our deep parametric portfolio policy relies solely on improving the Sharpe ratio over the benchmark, without utilizing any test assets. This approach is similar to the factor selection methods of Barillas and Shanken (2017, 2018) and Fama and French (2018). These features make our deep learning model more easily interpretable and largely alleviate the “black-box” criticism.
To demonstrate our methodology, we apply it to the corporate bond market, as studies on the cross-sectional pricing of corporate bonds remain limited compared to the equity market. The literature has proposed some observable factors to explain time-series comovement and cross-sectional variations in corporate bond returns. For example, Fama and French (1993) argue that two factors based on bond term and default and their three equity factors of Fama and French (1992) can capture common variation in equity and bond returns.
Bai, Bali, and Wen (2019) (BBW hereafter) propose an alternative bond factor model based on the downside, credit, and liquidity risks (also see Dickerson, Mueller, and Robotti, 2023).^2 However, these models impose strong ad hoc sparsity by only using a few characteristics, which may suffer from model misspecification and omitted factors. Therefore, observable-factor models may not compete with latent factor models considering high-dimensional characteristics (Kelly, Palhares, and Pruitt, 2022; Kelly, Malamud, and Zhou, 2022).
^2 Other corporate bond factors include liquidity (Lin, Wang, and Wu, 2011), momentum (Jostova, Nikolova, Philipov, and Stahel, 2013), volatility (Chung, Wang, and Wu, 2019), and long-term reversal (Bali, Subrahmanyam, and Wen, 2021).
Empirical Highlights. We construct monthly corporate bond returns using transaction data on corporate bond prices from the enhanced Trade Reporting and Compliance Engine (TRACE). To employ as many characteristics as possible, we consider three types of characteristics. The first type is the bond characteristics. We construct a set of 41 bond characteristics by combining TRACE and the Mergent Fixed Income Securities Database (FISD) data. Second, since both bond and stock prices are contingent on firm fundamentals, we collect 61 equity characteristics that are frequently used in the literature (see, e.g., Freyberger, Neuhierl, and Weber, 2020; Feng, He, Polson, and Xu, 2023). Lastly, it has been found in the recent literature that equity option-related variables contain information about future corporate bond returns (see, e.g., Cao, Goyal, Xiao, and Zhan, 2022; Chung, Wang, and Wu, 2019; Huang, Jiang, and Li, 2023). Therefore, we construct 30 equity option-related characteristics. Our deep learning model uses those 132 characteristics and considers the bond market factor as the only benchmark.
The sample period is from July 2004 to December 2020, with the subsample from July 2004 to June 2014 used for model training and validation and the subsample from July 2014 to December 2020 used for out-of-sample testing.
Our empirical findings can be briefly summarized as follows. First, although it earns a relatively small mean excess return compared to the bond market factor and other tradable factors, the deep factor varies only slightly over time, resulting in a higher Sharpe ratio than all other factors considered for both in-sample and out-of-sample periods. Furthermore, the deep factor negatively correlates with the bond market factor, and they rarely decrease simultaneously, providing us with a market-hedge portfolio. Figure 1 presents time series plots of excess returns on the deep and market factors over the out-of-sample period (normalized to have the same volatility). The deep factor remains positive during market downturn periods, in particular during the outbreak of the COVID-19 pandemic.
Figure 1: Bond Market Factor Versus Deep Factor. This figure presents time series plots of excess returns on the deep factor based on a nonlinear ranking scheme (bar) and the bond market factor (solid line) over the out-of-sample period ranging from July 2014 to December 2020. The deep factor returns are normalized to have the same volatility as the market factor. The shaded area represents the outbreak of the COVID-19 pandemic.
Second, the deep tangency portfolio, constructed using the market factor and the deep factor, achieves an out-of-sample annualized Sharpe ratio of 2.29, much higher than that of the market portfolio (0.86), of the tangency portfolio from the BBW four factors (0.69), and of the tangency portfolio from the Fama-French three equity factors (MKTRF, SMB, and HML) plus two bond factors (term and default factors) (1.27).
Consistently, we find that none of these observable factor models can explain excess returns on the deep factor and the deep tangency portfolio in the factor-spanning regressions. The deep factor has a negative loading on the bond market factor, and the deep tangency portfolio has minimal exposure to the bond market factor. This further emphasizes the market-hedging function of the deep factor.
Third, we further show that it is crucial to consider various types of characteristics when constructing the deep tangency portfolio. When we exclude option-related variables, the out-of-sample Sharpe ratio of the deep tangency portfolio decreases to 1.83, and it decreases further when we use only bond characteristics. To better span the efficient frontier, it is crucial to introduce as many characteristics as possible. Our finding is in stark contrast to previous studies arguing that characteristics that predict equity returns do not necessarily forecast corporate bond returns (see, e.g., Chordia et al., 2017; Bali et al., 2021). However, it offers further evidence in support of the integration of the bond and equity markets (e.g., Schaefer and Strebulaev, 2008; Kelly et al., 2022).
Finally, our deep parametric portfolio policy provides an alternative approach to constructing latent factors. We conduct additional analyses comparing the performance of our deep tangency portfolio with those constructed using two recently developed latent-factor methods: risk-premium principal component analysis (RP-PCA) by Lettau and Pelger (2020) and instrumental principal component analysis (IPCA) by Kelly, Pruitt, and Su (2019) and Kelly et al. (2022). Unlike these PCA-based approaches, our dimension reduction is implemented on firm characteristics rather than characteristics-managed portfolios, providing an interpretable deep characteristic.
Our deep tangency portfolio outperforms both: for the same out-of-sample period, the tangency portfolio from the five RP-PCA factors earns an annualized Sharpe ratio of only 0.95, and that from the five IPCA factors achieves an annualized Sharpe ratio of 1.67.
Literature. Our paper contributes to several strands of literature. First, it contributes to the literature on robust portfolio construction that sidesteps the direct estimation of expected returns and the covariance matrix. Brandt (1999) and Ait-Sahalia and Brandt (2001) propose a nonparametric approach for estimating portfolio weights from the Euler first-order conditions, thus bypassing the estimation of return covariances and averages. Brandt, Santa-Clara, and Valkanov (2009) provide a parametric approach by estimating the portfolio weights as a linear function of characteristics (size, value, and momentum), but this approach cannot handle a large number of characteristics or assets. Based on the same approach as Brandt, Santa-Clara, and Valkanov (2009), Brandt and Santa-Clara (2006) examine a market-timing problem involving stocks, bonds, and cash, and DeMiguel et al. (2020) show the economic rationale of transaction cost using multiple characteristics. Raponi, Uppal, and Zaffaroni (2021) combine an “alpha” portfolio and a “beta” portfolio relying on a factor model for the robust portfolio choice. Our paper presents a parametric approach to estimate portfolio weights directly. The distinctive design of long-short portfolio weights reflects the nonlinear risk-return relationship of the deep characteristics generated by the multi-layer deep neural network.
Second, the paper adds to the recent literature on machine learning methods that construct latent factors to approximate the SDF by considering a large number of characteristics.
Kozak, Nagel, and Santosh (2020) assume that the SDF loading is a linear function of characteristics and find no clear evidence of sparsity of characteristics in the SDF loading, and Cong, Feng, He, and Li (2023) propose an alternative local sparsity framework for heterogeneous factor models selected for different assets and macroeconomic regimes. In addition, our paper relates to recent attempts to develop nonlinear deep neural networks for latent factor models (see, e.g., Gu, Kelly, and Xiu, 2021; Chen, Pelger, and Zhu, 2022; Feng, He, Polson, and Xu, 2023). In contrast, based on the fundamental economic theory of the equivalence between the SDF and the MVE portfolio, our paper develops a flexible and interpretable methodology to create a tangency portfolio without estimating expected returns and covariances. Avramov, Cheng, and Metzker (2023) also emphasize the importance of economic restrictions when applying machine learning methods to long-short portfolio constructions.
Finally, our paper contributes to the literature that investigates the cross-sectional predictability of corporate bond returns based on characteristics.^3 Yet, most of those papers impose a strong ad hoc sparsity in modeling. Bali et al. (2021) and He et al. (2021b) are two recent works investigating corporate bond return predictability via machine learning methods. However, our method bypasses the estimation of expected returns and covariances by employing a deep nonlinear combination of characteristics to form the tangency portfolio.
^3 See, for example, Bai, Bali, and Wen (2019), Lin, Wang, and Wu (2011), Jostova et al. (2013), Chung, Wang, and Wu (2019), Huang, Jiang, and Li (2023), and He et al. (2021a).
Related to our paper, Kelly, Palhares, and Pruitt (2022) apply IPCA to the cross-sectional pricing of corporate bonds, showing that a five-factor model outperforms commonly used observable factor models on the ICE corporate bond dataset.
Besides different objectives (portfolio optimization vs. cross-sectional pricing), our method is more flexible and allows for modeling nonlinearity and interactions.
The remainder of the paper is organized as follows. Section 2 presents our model and the deep learning algorithms. Section 3 presents corporate bond returns and characteristics data. Section 4 provides empirical findings. Section 5 concludes the paper.
2 Methodology
2.1 Maximal Sharpe Ratio Portfolio
There exists a duality between the SDF variance and Sharpe ratios. We start with the minimum-variance SDF in the economy that spans \(N\) individual asset excess returns, \(r_t = [r_{1,t}, \ldots, r_{N,t}]'\), as constructed by Hansen and Jagannathan (1991),
\[ m_{t+1} = 1 - w_t'(r_{t+1} - \mu_t), \tag{1} \]
where \(\mu_t = E_t[r_{t+1}]\) represents the conditional expectation of asset excess returns. By plugging the linear SDF in (1) into the fundamental pricing relation, \(E_t[m_{t+1} r_{t+1}] = 0\), the solution to the SDF loading \(w_t\) takes the form of
\[ w_t = \Sigma_t^{-1}\mu_t, \tag{2} \]
where \(\Sigma_t\) is the conditional variance-covariance matrix of excess returns, \(\Sigma_t = \mathrm{Cov}_t(r_{t+1})\). The conditional variance of the SDF is then given by
\[ \mathrm{Var}_t(m_{t+1}) = \mu_t'\Sigma_t^{-1}\mu_t, \tag{3} \]
which equals the maximum conditional squared Sharpe ratio of the tangency portfolio,
\[ R^{opt}_{t+1} = w_t' r_{t+1}, \tag{4} \]
whose weights are the same as the SDF loadings in (2).
In practice, it is challenging to estimate expected returns and the covariance matrix. The number of individual assets, \(N\), is usually very large, making it difficult to estimate the large covariance matrix (\(\Sigma_t\)). Moreover, as a general observation, mean estimates (\(\mu_t\)) are often imprecise even with long samples and a high frequency of excess returns (see, e.g., Merton, 1980; Cochrane, 2014). Both issues yield a very inaccurate estimate of \(w_t\) in (2), resulting in the poor out-of-sample performance of optimal portfolios (see, e.g., DeMiguel, Garlappi, and Uppal, 2009).
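As a small numerical illustration of Equations (2) and (3), the sketch below computes the weights \(\Sigma^{-1}\mu\) and the implied maximum squared Sharpe ratio for two assets. The inputs are made-up numbers, and the helper names `tangency_weights` and `max_squared_sharpe` are hypothetical, chosen only for this example.

```python
import numpy as np

def tangency_weights(mu, sigma):
    """Solve sigma @ w = mu, i.e. w = sigma^{-1} mu as in Equation (2)."""
    return np.linalg.solve(sigma, mu)

def max_squared_sharpe(mu, sigma):
    """mu' sigma^{-1} mu, the right-hand side of Equation (3)."""
    return float(mu @ np.linalg.solve(sigma, mu))

# Illustrative (assumed) moments for two uncorrelated assets.
mu = np.array([0.02, 0.01])
sigma = np.array([[0.04, 0.00],
                  [0.00, 0.01]])

w = tangency_weights(mu, sigma)
sr2 = max_squared_sharpe(mu, sigma)

# Squared Sharpe ratio of the portfolio w'r, computed directly from its
# mean and variance; it should match mu' sigma^{-1} mu.
port_sr2 = (w @ mu) ** 2 / (w @ sigma @ w)
```

With known moments the calculation is trivial; the difficulty discussed in the text arises because \(\mu_t\) and \(\Sigma_t\) must be estimated when \(N\) is in the thousands.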
A common approach in the finance literature is to adopt factor pricing models to reduce the dimensionality of the SDF by approximating it with a small number of factors (e.g., Fama and French, 1996, 2015). Assume that the SDF loading \(w_t\) in (1) can be largely captured in a linear form by \(J\) characteristics, \(z_t\), an \(N \times J\) matrix with \(J \ll N\), such that
\[ w_t = \tilde{w}_t + z_t\kappa, \tag{5} \]
where, following the convention of the finance literature, \(\tilde{w}_t\) is the vector of normalized market-capitalization weights of firms, \(z_t\) is usually cross-sectionally standardized to have zero mean, and \(\kappa\) is a \(J \times 1\) vector of coefficients. Define \(R_{m,t+1} = \tilde{w}_t' r_{t+1}\), representing the market portfolio, and \(f_{t+1} = z_t' r_{t+1}\), representing \(J\) factors, which are zero-investment characteristic-managed long-short portfolios. Equation (5) suggests that the SDF takes the form of
\[ m_{t+1} = 1 - \delta'(F_{t+1} - \mu_{F,t}), \tag{6} \]
where \(\delta = [1, \kappa']'\) and \(F_t = [R_{m,t}, f_t']'\), and the maximum squared Sharpe ratio in (3) equals the maximum squared Sharpe ratio of the factors, \(\mu_{F,t}'\Sigma_{F,t}^{-1}\mu_{F,t}\).
Such a dimension reduction aims to use this small number of factors (\(F_t\)) to approximate the SDF (see (1)) and span the MVE portfolio (see (3)). Building on the intertemporal capital asset pricing model (ICAPM) of Merton (1973), Fama and French (1996, 2015) also interpret the factors in \(f_t\) as “[they] are just diversified portfolios that provide different combinations of exposure to the unknown state variables”. However, the literature has found no clear-cut evidence of sparsity of characteristics (e.g., Kozak, Nagel, and Santosh, 2020; Giannone, Lenza, and Primiceri, 2021), and many characteristics and their nonlinear combinations contain information on the joint distribution of asset returns for characterizing the cross-sectional variation (e.g., Freyberger, Neuhierl, and Weber, 2020; Gu, Kelly, and Xiu, 2020).
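The characteristic-managed factors \(f_{t+1} = z_t' r_{t+1}\) defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the toy characteristics and returns are invented, the function names are hypothetical, and scaling each characteristic to unit cross-sectional standard deviation is our choice (the text only requires zero mean).

```python
import numpy as np

def standardize_cross_section(z):
    """Demean and scale each characteristic (column) across assets.

    Zero mean makes each column a zero-investment weight vector; the
    unit-variance scaling is an assumption made for this sketch.
    """
    return (z - z.mean(axis=0)) / z.std(axis=0)

def characteristic_managed_factors(z_std, r_next):
    """f_{t+1} = z_t' r_{t+1}: one long-short factor return per characteristic."""
    return z_std.T @ r_next

# Toy cross-section: N = 3 assets, J = 2 characteristics (assumed values).
z = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])
r_next = np.array([0.01, 0.02, 0.03])

z_std = standardize_cross_section(z)
f = characteristic_managed_factors(z_std, r_next)
```

Each column of `z_std` sums to zero, so every factor in `f` is the return on a zero-cost portfolio that is long high-characteristic assets and short low-characteristic assets.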
Kozak and Nagel (2023) formally show that characteristics-managed factors span the SDF only if a large number of characteristics are used simultaneously.
Therefore, in this paper, we sidestep both the direct estimation of \(\mu_t\) and \(\Sigma_t\) and a simple dimension reduction with few characteristics. Instead, we approximate the tangency portfolio weights by parameterizing \(w_t\) as a nonlinear function of a large number, \(K\), of firm characteristics, \(z_t\), an \(N \times K\) matrix with \(K \gg J\). We formulate \(w_t\) as
\[ w_{i,t} = \tilde{w}_{i,t} + \theta\, w_d(z_{i,t}; \Phi), \quad i = 1, \ldots, N, \tag{7} \]
where, as before, \(\tilde{w}_{i,t}\) is the weight of asset \(i\) in the market portfolio, \(w_d(\cdot)\) is a function of \(z_{i,t}\) that can account for any potential nonlinear relations among a large number of characteristics of asset \(i\), \(\Phi\) denotes the parameters of this function, and \(\theta\) is a scalar controlling the relative weight in the tangency portfolio. We estimate the portfolio weights as a single function of characteristics that applies to all assets (see, e.g., Brandt, Santa-Clara, and Valkanov, 2009). As explained in the next subsection, the function \(w_d(\cdot)\) produces weights with an economically motivated target for forming a zero-cost long-short portfolio.
The tangency portfolio return in (4) can then be represented by
\[ R^{opt}_{t+1} = \sum_{i=1}^{N} \tilde{w}_{i,t}\, r_{i,t+1} + \theta \sum_{i=1}^{N} w_d(z_{i,t}; \Phi)\, r_{i,t+1} = R_{m,t+1} + \theta R_{d,t+1}, \tag{8} \]
where \(R_{m,t+1}\), as before, is the market portfolio return, and given that \(w_d(\cdot)\) cross-sectionally sums to zero, \(R_{d,t+1}\) is, in fact, the return on a long-short portfolio constructed based on nonlinear combinations of characteristics. The parameterization of (7) suggests a two-factor reduced-form SDF with factors \(R_{m,t}\) and \(R_{d,t}\). When the function \(w_d(\cdot)\) takes a linear form and the number of characteristics is small, our parameterization becomes the standard approach as in (5).
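The decomposition in Equations (7) and (8) is a simple linearity property, which the following sketch verifies on toy numbers. The weights, returns, and value of \(\theta\) are assumptions made for illustration, and `combined_weights` is a hypothetical helper name.

```python
import numpy as np

def combined_weights(w_mkt, w_ls, theta):
    """w_i = w~_i + theta * w_d,i, the asset-level weights in Equation (7)."""
    return w_mkt + theta * w_ls

# Assumed toy inputs for N = 3 assets.
w_mkt = np.array([0.5, 0.3, 0.2])     # market weights, sum to one
w_ls = np.array([1.0, 0.0, -1.0])     # long-short weights, sum to zero
theta = 0.5
r_next = np.array([0.02, 0.01, -0.01])

w = combined_weights(w_mkt, w_ls, theta)

r_m = w_mkt @ r_next   # market factor return
r_d = w_ls @ r_next    # long-short factor return
r_opt = w @ r_next     # return from asset-level weights, Equation (8)
```

Because the long-short weights sum to zero, the combined weights still sum to one, and the portfolio return computed asset by asset coincides with \(R_{m,t+1} + \theta R_{d,t+1}\).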
When we have a priori knowledge that a particular set of observable factors helps span the efficient portfolio frontier, we can introduce these factors by inserting them into (7) and construct the portfolio weights, \(w_{i,t}\), as follows,
\[ w_{i,t} = \tilde{w}_{i,t} + \tilde{w}^{p}_{i,t}\theta_p + \theta_d\, w_d(z_{i,t}; \Phi), \quad i = 1, \ldots, N, \tag{9} \]
where \(\tilde{w}^{p}_{t}\) is an \(N \times P\) matrix of weights on individual assets for constructing the \(P\) observable factors, with \(i\)-th row \(\tilde{w}^{p}_{i,t}\), and \(\theta_p\) is a \(P \times 1\) vector of coefficients. The tangency portfolio return is then given by
\[ R^{opt}_{t+1} = R_{m,t+1} + \theta_p' R_{p,t+1} + \theta_d R_{d,t+1}, \tag{10} \]
where \(R_{p,t+1}\) is a \(P \times 1\) vector of returns on the \(P\) observable factors at time \(t+1\). Now denote \(\theta = [\theta_p', \theta_d]'\).
The main objective of our model is to find the minimum-variance SDF or a tangency portfolio that delivers the maximum Sharpe ratio. For this purpose, we search for a possible functional form of \(w_d(\cdot)\) and estimate the model parameters \(\theta\) and \(\Phi\) by maximizing the average conditional squared Sharpe ratio of the portfolio \(R^{opt}_{t+1}\),
\[ \max_{\theta, \Phi} \; \frac{1}{T} \sum_{t=0}^{T-1} SR_t^2(R^{opt}_{t+1}). \tag{11} \]
The long-short portfolio, \(R_{d,t+1}\), plays two fundamental roles: (i) according to the principle of diversification, (11) suggests it should have a low or even negative correlation with the market factor (and other benchmark factors), providing us with a potential market hedge portfolio; and (ii) when the market (and other benchmark factors) alone cannot capture all systematic risk, the deep factor spans to a large extent any missing risk factors that should enter the pricing kernel, implying that it may have a sizable market price of risk.
Our approach can also be interpreted as a dimension reduction of characteristics and risk factors. Empirically, many studies have documented the failure of the CAPM. In addition to the market factor, more factors need to be introduced to the pricing kernel to explain the time-series comovement of asset returns and expected return spreads across individual assets.
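To make the objective in (11) and the diversification argument concrete, the sketch below evaluates the squared Sharpe ratio of \(R_m + \theta R_d\) on simulated data. It is a simplified illustration, not the estimation procedure of the paper: it uses the unconditional sample analogue of the objective, treats the long-short factor as given rather than learned, and relies on simulated return series whose parameters are arbitrary assumptions.

```python
import numpy as np

def squared_sharpe(x):
    """Sample squared Sharpe ratio of an excess-return series."""
    return x.mean() ** 2 / x.var(ddof=1)

# Simulated (assumed) monthly excess returns: a market factor and a
# long-short factor that loads negatively on market shocks.
rng = np.random.default_rng(0)
T = 240
r_m = 0.005 + 0.02 * rng.standard_normal(T)
r_d = 0.003 - 0.2 * (r_m - 0.005) + 0.01 * rng.standard_normal(T)

# Two-factor tangency weights are proportional to Cov^{-1} mean; fixing the
# market weight at one gives the relative weight theta on the long-short factor.
F = np.column_stack([r_m, r_d])
mu = F.mean(axis=0)
cov = np.cov(F, rowvar=False)
b = np.linalg.solve(cov, mu)
theta_star = b[1] / b[0]

sr2_market = squared_sharpe(r_m)
sr2_opt = squared_sharpe(r_m + theta_star * r_d)
sr2_bound = float(mu @ b)   # mean' Cov^{-1} mean for the two factors
```

In sample, `sr2_opt` attains `sr2_bound` and cannot fall below `sr2_market`; the gain is larger when the long-short factor earns a premium while moving against the market, which is the hedging role described in point (i).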
The most popular factors are characteristic-managed portfolios, such as the Fama-French factors (Fama and French, 1996, 2015). Our framework aims to find such characteristic-managed portfolios based on a fundamental economic theory: the MVE portfolio is equivalent to the SDF. The proposed nonlinear modeling approximates the long-short factor construction using a large number of characteristics and reflects the underlying risk-return relationship. The dimension reduction in constructing characteristic-managed portfolios relies on the Sharpe ratio improvement over the market or other benchmark factors without using test assets. Such irrelevance of test assets in factor model comparison has been discussed by Barillas and Shanken (2017, 2018) and Barillas et al. (2020).
In what follows, we propose a deep learning method for constructing the portfolio weights \(w_d(\cdot)\) in Equations (7) and (9) and a deep long-short portfolio \(R_{d,t}\). While many popular characteristic-managed factors have sidestepped the high-dimensional problem by focusing on only a few characteristics, we try to consider as many potential characteristics and their nonlinear combinations as possible.
2.2 Deep Factor and Deep Tangency Portfolio
Our long-short portfolio construction, \(R_{d,t}\), relies on a deep learning model with an economically motivated target, aiming to construct the tangency portfolio by complementing the benchmark factors. Rather than specifically relying on average returns and the covariance matrix of high-dimensional individual assets, we retain the conventional sorting scheme in our deep learning model based on information from a large number of characteristics.
Deep Characteristic. We follow the standard modeling approach of a neural network for dimension reduction of a large number of characteristics (see, e.g., Gu et al., 2020; Feng et al., 2023; Bali et al., 2021). We first clarify the notation.
A typical training observation indexed by time \(t\) includes the following types of data:
• \(\{r_{i,t}\}_{i=1}^{N}\), excess returns of \(N\) individual assets;
• \(\{z_{k,i,t-1} : 1 \le k \le K\}_{i=1}^{N}\), \(K\) characteristics of \(N\) assets observed at time \(t-1\);
• \(\{R_{b,t}\}_{b=1}^{P+1}\), a \((P+1) \times 1\) vector of excess returns on the market factor and \(P\) observable factors.
We design an \(L\)-layer neural network that transforms \(K\) characteristics into one deep characteristic that is relatively interpretable. At each time \(t\) and for each asset \(i\), \(i = 1, \ldots, N\), our deep learning model works as follows,
\[ Z^{(0)}_{i,t-1} = [z_{1,i,t-1}, \cdots, z_{K,i,t-1}]', \tag{12} \]
\[ Z^{(l)}_{i,t-1} = G\big(A^{(l)} Z^{(l-1)}_{i,t-1} + b^{(l)}\big), \tag{13} \]
for \(l = 1, \ldots, L\), where \(Z^{(l)}_{i,t-1}\) is the \(i\)-th column of the \(K_l \times N\) matrix \(Z^{(l)}_{t-1}\), for \(1 \le K_l \le K\), and \(G(\cdot)\) is a univariate activation function, which is chosen to be the tanh function in the paper, \(G(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})\). \(A^{(l)}\) and \(b^{(l)}\) are deep learning weight and bias parameters, respectively, and need to be trained in the algorithm. The algorithm performs the transformation and dimension reduction for each asset without interactions among different assets through the univariate activation function. In the end, we have a \(1 \times N\) matrix of deep characteristics, \(Z^{(L)}_{t-1}\). The parameters to be trained in this part are deep learning weights \(A\) and biases \(b\), namely,
\[ \big\{ A^{(l)}, b^{(l)} : A^{(l)} \in \mathbb{R}^{K_l \times K_{l-1}},\; b^{(l)} \in \mathbb{R}^{K_l} \big\}_{l=1}^{L}. \tag{14} \]
Deep Factors. The deep characteristics, \(Z^{(L)}_{t-1}\), are then used to form weights of a deep portfolio (factor) as follows,
\[ w_d(z_{t-1}) \equiv W_{t-1} = h\big(Z^{(L)}_{t-1}\big), \tag{15} \]
where the function \(h(\cdot)\) needs to be differentiable.
Following the literature, our first choice of the function \(h(\cdot)\) is simply a linear function, resulting in a deep characteristic-managed portfolio, i.e.,
\[ W_{t-1} = \frac{a}{N} Z^{(L)}_{t-1}, \tag{16} \]
where \(a\) is a scaling parameter.
In addition, to mimic the commonly used portfolio sort approach (i.e., a non-differentiable step function), following Feng et al. (2023), we adopt the softmax function and calculate the portfolio weights as follows. For \(x = Z^{(L)}_{t-1}\), the function \(h(\cdot)\) takes the form of
\[ h(x) = \begin{bmatrix} \mathrm{softmax}(x_1^{+}) \\ \mathrm{softmax}(x_2^{+}) \\ \vdots \\ \mathrm{softmax}(x_N^{+}) \end{bmatrix} - \begin{bmatrix} \mathrm{softmax}(x_1^{-}) \\ \mathrm{softmax}(x_2^{-}) \\ \vdots \\ \mathrm{softmax}(x_N^{-}) \end{bmatrix}, \tag{17} \]
where \(x^{+} := -a_1 e^{-a_2 x}\) and \(x^{-} := -a_1 e^{a_2 x}\), and \(a_1\) and \(a_2\) are two tuning parameters. The nonlinear softmax function is an increasing function,
\[ \mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \tag{18} \]
and \(\sum_{i=1}^{N} \mathrm{softmax}(x_i) = 1\). On the right-hand side of (17), the first term represents the long position weights of assets, and the second term is for the symmetric short position. In implementation, we choose \(a_1 = 50\) and \(a_2 = 8\) such that at each time, about 50% to 70% of assets are in the middle rank and have zero weights, similar to the traditional sorting procedure (see Figure A1 in the Internet Appendix). Furthermore, we normalize the portfolio weights such that the sum of weights in the long leg equals 1 and that in the short leg equals -1.
As discussed in Feng et al. (2023), such a nonlinear ranking scheme depends on both the cross-sectional rank information and the distributional properties of characteristics. Our construction of the deep factor avoids extreme positions in both long and short legs (Avramov, Cheng, and Metzker, 2023). In what follows, we refer to Equation (16) as linear ranking and to Equation (17) as softmax ranking.
The deep factor portfolio weights, \(W_{t-1}\), in Equation (16) or (17), sum to zero by construction. Our deep factor, \(R_{d,t}\), can then be computed as
\[ R_{d,t} = W_{t-1} r_t. \tag{19} \]
Loss Function. The deep factor in Equation (19) can be combined with the market or other benchmark factors to form the deep tangency portfolio as in (8). Note that more than one deep factor can be constructed iteratively by treating the previous one as a new benchmark factor in our algorithm. As a result, the additional deep factor may capture pricing information not contained in the previous one.
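The forward pass in Equations (12) and (13) and the softmax ranking in Equations (17) to (19) can be sketched as follows. This is a minimal illustration under stated assumptions: the network parameters are random and untrained, the characteristics and returns are simulated, the layer widths (5, 3, 1) are arbitrary, and the function names are hypothetical. The leg rescaling implements the normalization described in the text in one straightforward way, which may differ from the authors' implementation.

```python
import numpy as np

def deep_characteristic(z0, layers):
    """Forward pass Z^(l) = tanh(A^(l) Z^(l-1) + b^(l)), applied asset by asset.

    z0 is a K x N matrix of characteristics; layers is a list of (A, b) pairs
    whose last layer has width 1, so the result holds one value per asset.
    """
    z = z0
    for A, b in layers:
        z = np.tanh(A @ z + b[:, None])
    return z.ravel()

def softmax(x):
    """Numerically stable softmax over the cross-section."""
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_ranking_weights(x, a1=50.0, a2=8.0):
    """Long-short weights from Equation (17), each leg rescaled to sum to 1 or -1."""
    h = softmax(-a1 * np.exp(-a2 * x)) - softmax(-a1 * np.exp(a2 * x))
    long_leg = np.clip(h, 0.0, None)
    short_leg = np.clip(-h, 0.0, None)
    return long_leg / long_leg.sum() - short_leg / short_leg.sum()

# Toy inputs (assumptions): simulated characteristics, untrained parameters.
rng = np.random.default_rng(0)
K, N = 5, 200
z0 = rng.uniform(-1.0, 1.0, size=(K, N))
layers = [
    (rng.normal(size=(3, K)), np.zeros(3)),
    (rng.normal(size=(1, 3)), np.zeros(1)),
]

x = deep_characteristic(z0, layers)   # deep characteristic, one per asset
w = softmax_ranking_weights(x)        # deep factor weights W_{t-1}
r = rng.normal(0.0, 0.02, size=N)     # simulated excess returns r_t
r_d = w @ r                           # deep factor return, Equation (19)
```

The weights are monotone in the deep characteristic, and because every step is differentiable in `x`, the same construction can be placed inside a gradient-based training loop, which a hard portfolio sort would not allow.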
Given that all parameters in our model are time-invariant and that we implicitly assume that characteristics fully capture all aspects of expected returns and covariance relevant to optimal portfolios, the conditional model becomes an unconditional one, and the objective function in (11) can be replaced by the unconditional squared Sharpe ratio of the optimal portfolio $R^{opt}_t$ on $\tilde{F}_t = [R_{b,t}', R_{d,t}]'$,

$$
SR^2(R^{opt}_t) \equiv SR^2(\tilde{F}_t) = E(\tilde{F}_t)' \, Cov(\tilde{F}_t)^{-1} E(\tilde{F}_t).
\tag{20}
$$

There are usually a large number of parameters for modeling a multi-layer neural network. To avoid overfitting and improve the model's out-of-sample performance, we augment the objective function by introducing regularization penalties and minimizing the following loss function,

$$
\mathcal{L}_{\gamma_1,\gamma_2} = \exp\!\left(-SR^2(R^{opt}_t)\right) + \underbrace{\gamma_1 \sum_{l=1}^{L-1} \|A^{(l)}\|_1 + \gamma_2 \sum_{l=1}^{L-1} \|A^{(l)}\|_2}_{\text{penalties}},
\tag{21}
$$

where the $L_1$-norm and $L_2$-norm penalties aim to restrict the complexity of the neural network, stabilize parameters, and thus avoid overfitting. The penalty parameters $\gamma_1$ and $\gamma_2$ are selected through training and validation. Figure 2 presents a visualization of our deep learning architecture and summarizes the critical stages for constructing the deep factor and the deep tangency portfolio.

Figure 2: Deep Learning Network Architecture
This figure provides a visualization of the deep learning architecture. Different types of characteristics, $Z^{(0)}$ (e.g., equity, bond, and option characteristics), are transformed via the multilayer neural network to deep characteristics, $Z^{(L)}$, based on which the deep portfolio (factor) weights, $W$, are formed. An optimal portfolio, $R^{opt}$, is constructed by combining the deep factor, $R_d$, and the benchmark factors, $R_b$.

3 Data

To illustrate the performance of our methodology, we apply it to the corporate bond market, given that relative to the equity market, studies on the cross-sectional pricing of corporate bonds remain limited. We first construct the corporate bond returns based on the TRACE data in Subsection 3.1; we then introduce various types of characteristics that will be fed into our deep learning model in Subsection 3.2 and present the benchmark factor and competing factor models in Subsection 3.3.

3.1 Corporate Bond Returns and Summary Statistics

We obtain corporate bond intraday transaction data from the enhanced version of TRACE, which offers the best-quality data on corporate bond prices, trading volume, and buy-sell indicators. Using TRACE transaction data to measure abnormal corporate bond performance is emphasized in Bessembinder et al. (2009). We merge the TRACE dataset with the FISD to obtain bond characteristics such as offering date, offering amount, maturity date, coupon type and rate, bond type and rating, interest payment frequency, and issuer information.

Following the standard procedures in Dick-Nielsen (2009, 2014), we exclude duplicate, withdrawn, and erroneous trade entries in the TRACE data. Additionally, we follow Bai, Bali, and Wen (2019) to apply several filters to the data such that we remove: (i) bonds that are not listed or traded in the U.S. public market; (ii) bonds that are structured notes, mortgage-backed, asset-backed, agency-backed, or equity-linked; (iii) convertible bonds whose option feature distorts the return calculation and makes it impossible to compare the returns of convertible and nonconvertible bonds; (iv) bonds with time to maturity of fewer than two years; and (v) bonds that trade under $5 or above $1,000. We then calculate the daily bond price as the trading-volume-weighted average of intraday prices, as in Bessembinder et al. (2009).

In line with the literature, for each corporate bond $i$, its return at month $t$ is calculated as follows:

$$
\tilde{r}_{i,t} = \frac{Pr_{i,t} + AI_{i,t} + C_{i,t}}{Pr_{i,t-1} + AI_{i,t-1}} - 1,
\tag{22}
$$

where $Pr_{i,t}$ is its transaction price in month $t$, $AI_{i,t}$ is its accrued interest, and $C_{i,t}$ is its coupon payment in month $t$.
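As a small worked illustration of the two calculations just described, the sketch below computes a trading-volume-weighted daily price and the monthly return in Equation (22). The function and argument names are hypothetical, and the inputs are assumed to be plain numbers or arrays rather than TRACE records.

```python
import numpy as np

def volume_weighted_price(prices, volumes):
    """Daily price as the trading-volume-weighted average of intraday prices."""
    return float(np.average(np.asarray(prices, dtype=float),
                            weights=np.asarray(volumes, dtype=float)))

def monthly_bond_return(price_t, accrued_t, coupon_t, price_prev, accrued_prev):
    """Monthly bond return as in Equation (22).

    price_t, accrued_t : transaction price and accrued interest in month t
    coupon_t           : coupon paid in month t (0.0 if none)
    price_prev, accrued_prev : price and accrued interest in month t-1
    """
    return (price_t + accrued_t + coupon_t) / (price_prev + accrued_prev) - 1.0
```

For example, a bond that moves from a price of 100 with 1 of accrued interest to a price of 101 with 0.5 of accrued interest, after paying a coupon of 3, has a monthly return of 104.5/101 − 1, or about 3.47%.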
As in Bai, Bali, and Wen (2019), we identify two scenarios to calculate a realized return at the end of month $t$: (i) from the end of month $t-1$ to the end of month $t$ and (ii) from the beginning of month $t$ to the end of month $t$. The end (beginning) of the month refers to the last (first) five trading days in that month, and if there is more than one trading record in this five-day window, we use the last (first) observation of the month. If a return at the end of a month is realized in both scenarios, we use the realized return from the end of month $t-1$ to the end of month $t$. The excess bond return is then defined as the difference between the bond return and the risk-free rate, $r_{i,t} = \tilde{r}_{i,t} - r_{f,t}$, where the risk-free rate, $r_{f,t}$, is proxied by the one-month Treasury bill rate obtained from CRSP. Furthermore, as in Feng et al. (2023), we make a balanced panel by only keeping the 3,200 bonds with the largest size each month (see footnote 4). The final sample of corporate bond returns spans from July 2004 to December 2020.

Table 1 presents the summary statistics of excess corporate bond returns and typical bond characteristics. Our sample includes 16,188 corporate bonds and 633,600 bond-month return observations. As shown in Panel A, the mean monthly excess bond return is about 0.49% with a standard deviation of 4.32%. The sample contains bonds with an average size of about 809 million and an average rating of 8.78, which corresponds to a BBB+ rating (see footnote 5). Panel A also reports the cross-sectional statistics of investment grade (IG) bonds, which account for about 74.8% of all observations, and non-investment grade (NIG) bonds. Compared to the NIG bonds, the IG bonds have a smaller average return (0.43% vs. 0.69%), a lower standard deviation (2.75% vs. 7.19%), and a higher rating level (a numerical score of 7.04 vs. 13.95). The last two columns report summary statistics of the public and private bonds. The public bonds account for about 76.7% of all the bond-month observations; compared to private bonds, their returns are smaller on average and their ratings are higher on average. Both IG and public bonds have much larger average sizes than their counterparts. Panels B and C report the sample distributions by Rating & Maturity and Ownership & Rating, respectively. A general observation is that most bonds with high ratings are long-maturity bonds.

Footnote 4: To avoid the volatility and liquidity effect of small market-value bonds, we select the largest 3,200 among all available bonds each month.

Footnote 5: Ratings are represented in numerical scores, where 1 refers to an AAA rating, 2 refers to an AA+ rating, ..., and 21 refers to a C rating. Investment-grade bonds have ratings from 1 (AAA) to 10 (BBB-), and non-investment-grade bonds have ratings of 11 or above. Similar to Bai, Bali, and Wen (2019), we use the ratings of Standard & Poor's (S&P) or Moody's to determine a bond's rating. When both rating companies rate a bond, we use the average of their ratings.

Table 1: Summary Statistics
Our final data sample includes 633,600 monthly return observations of 16,188 unique corporate bonds from July 2004 to December 2020. We report the summary statistics of the whole sample (ALL) and several subsamples constructed based on Rating (Investment Grade (IG) & Non-Investment Grade (NIG)), Ownership (Public & Private), and/or Maturity.

Panel A: Cross-sectional statistics
                           ALL       IG        NIG       Public    Private
Bond-month observations    633,600   474,105   159,495   486,143   147,457
Ret mean (%)               0.49      0.43      0.69      0.45      0.64
Ret std (%)                4.32      2.75      7.19      3.43      6.44
Rating mean                8.78      7.04      13.95     8.32      10.3
Duration mean              3.97      4.26      3.09      4.07      3.63
Age mean                   4.25      4.33      4.00      4.22      4.33
Size mean (million)        809       865       644       836       719

Panel B: Sample Distribution (%) by Maturity and Rating
Maturity   AAA    AA     A       B       Junk    ALL
2          0.15   0.71   2.61    2.65    1.44    7.57
3          0.19   0.77   3.00    3.28    2.01    9.25
4          0.18   0.77   3.02    3.54    2.54    10.06
5          0.15   0.74   3.02    3.73    3.11    10.75
6          0.10   0.40   1.92    2.84    3.44    8.69
7          0.09   0.38   1.89    2.88    3.42    8.66
8          0.08   0.34   1.75    2.77    2.61    7.55
9          0.07   0.35   1.75    2.81    1.92    6.90
10         0.07   0.35   1.72    2.74    1.34    6.22
≥11        0.58   1.53   8.59    10.33   3.33    24.35
ALL        1.66   6.33   29.27   37.56   25.17   100.00

Panel C: Sample Distribution (%) by Ownership and Rating
Ownership  AAA    AA     A       B       Junk    ALL
Private    0.11   1.21   4.55    8.43    8.97    23.27
Public     1.55   5.12   24.72   29.13   16.21   76.73
ALL        1.66   6.33   29.27   37.56   25.17   100.00

3.2 Characteristics

We consider three types of characteristics that contain useful information for corporate bond return predictability. The first type of characteristics includes 41 bond characteristics that can be classified into three major categories: basis characteristics (e.g., rating, duration, liquidity), return-distribution characteristics (e.g., momentum, reversal, variance, skewness), and covariances with common risk factors (e.g., market beta, TERM beta, DEF beta).

Furthermore, given that both bond and stock prices are contingent on firm fundamentals, we also consider those equity characteristics shown to be helpful in predicting equity returns. Recent studies have shown that bond and equity markets are largely integrated. Choi and Kim (2018) argue that market integration suggests different markets should share common factors. Schaefer and Strebulaev (2008) show that bond and equity returns are related through the capital structure hedge ratio. By approximating the hedge ratio with a Merton model for debt, they find that the sensitivity of debt returns to equity is close to that predicted by the Merton model.
Building on Schaefer and Strebulaev (2008) and Choi and Kim (2018), Kelly, Palhares, and Pruitt (2022) find that debt and equity markets are more integrated than previous estimates suggest, and that these markets are substantially more integrated in terms of their systematic risks than their idiosyncratic risks. Therefore, the second type includes a total of 61 equity characteristics that cover six major categories: momentum, value, investment, profitability, frictions or size, and intangibles, which are also used in Freyberger, Neuhierl, and Weber (2020) and Feng et al. (2023).

In addition, the recent literature has found that several option-related variables have predictive power for corporate bond returns (see, e.g., Cao et al., 2022; Chung, Wang, and Wu, 2019; Huang, Jiang, and Li, 2023). We, therefore, construct a total of 30 option-related characteristics. Many of those option-related variables have been shown to have predictive power for equity returns (see, e.g., Neuhierl et al., 2021); here, we examine whether they also help forecast corporate bond returns.

Altogether, we have 132 characteristics. The bond, equity, and option characteristics are listed in Table A1, Table A2, and Table A3, respectively, in the Appendix. Before feeding those characteristics into our deep learning model, we cross-sectionally rank and standardize them each month so that they are in the [−1, 1] range and their cross-sectional averages are equal to 0. Any missing values are imputed to be 0. One advantage of using the cross-sectional ranks of characteristics is that the impact of potential data errors and outliers in individual characteristics can be largely alleviated (see, e.g., Kelly, Pruitt, and Su, 2019; Freyberger, Neuhierl, and Weber, 2020; Kozak, Nagel, and Santosh, 2020).

3.3 Benchmark Market Factor and Competing Factors

Benchmark Market Factor.
There are no well-established characteristic-managed factors in the corporate bond market. Therefore, we take the corporate bond market portfolio as our benchmark. Similar to Kelly, Palhares, and Pruitt (2022), our benchmark market portfolio is constructed simply as the equal-weighted average of excess returns of corporate bonds in our sample, i.e., $\tilde{w}_{i,t} = 1/N$.

Competing Factor Models. We consider two corporate bond observable-factor models: one is the BBW four-factor model (Bai, Bali, and Wen, 2019), and the other is a Fama-French five-factor model that combines three equity factors and two bond factors (Fama and French, 1993, 1996):

(i) The BBW four factors (BBW4). Bai, Bali, and Wen (2019) propose a four-factor model for the corporate bond market. Those factors include the bond market factor, the downside risk factor (DRF), the credit risk factor (CRF), and the liquidity risk factor (LRF). The downside risk factor is the value-weighted average return difference between the highest-VaR portfolio and the lowest-VaR portfolio within each rating portfolio; the credit risk factor is the value-weighted average return difference between the highest credit risk portfolio and the lowest credit risk portfolio within each VaR portfolio; and the liquidity risk factor is the value-weighted average return difference between the highest illiquidity portfolio and the lowest illiquidity portfolio within each rating portfolio. Following Bai, Bali, and Wen (2019) and Dickerson, Mueller, and Robotti (2023), we construct DRF, CRF, and LRF using our own sample.

(ii) The Fama-French five factors (FF5). We combine the Fama-French three equity factors, i.e., MKT, SMB, and HML (Fama and French, 1996), and two bond factors, i.e., the term and default factors (Fama and French, 1993). The term factor is the difference between the long-term government bond returns and the one-month Treasury bill rate.
The default factor is the difference between the long-term corporate bond returns and the long-term government bond returns.

4 Empirical Findings

In our empirical implementation, we split the sample into two parts: the subsample from July 2004 to June 2014 for model training and validating and the subsample from July 2014 to December 2020 for out-of-sample testing. We adopt a two-fold deterministic cross-validation scheme to determine the penalty parameters and learning rate for a given number of neural network layers ranging from 1 to 3 (see footnotes 6 and 7). See the Internet Appendix for implementation details. In what follows, we present our main empirical findings and examine how much improvement the deep factor can make over the benchmark and competing factor models.

Footnote 6: To be specific, the two-fold deterministic design divides the sample from July 2004 to June 2014 into two equal-length consecutive fold samples. We train our neural network separately on one fold and then calculate the fitted results with different tuning parameters on the other. We average the loss from the validation samples and choose the parameter pair that results in the smallest loss. Finally, we refit the model with the selected tuning parameters.

Footnote 7: We make Python code available for replicating all empirical results in the paper.

4.1 Deep Corporate Bond Factors

Table 2 presents summary statistics of deep factors regarding mean return, volatility, and annualized Sharpe ratio. We consider the equal-weighted corporate bond market factor as the benchmark when constructing deep factors, and restrict the weights of the long and short legs to sum to 1 and −1, respectively. We normalize the in-sample annualized volatility of each factor to 10%, and adjust its out-of-sample returns accordingly. All out-of-sample results are based on in-sample parameter estimates.

Table 2: Deep Corporate Bond Factors and Competing Factors
This table reports the descriptive statistics, including means (and their Newey-West t-statistics (Newey and West, 1987)), standard deviations (Std), and Sharpe ratios of the deep factors and competing factors. We normalize all factors' in-sample annualized volatility to 10% (2.89% monthly), and adjust their out-of-sample returns accordingly. We take the sample from July 2004 to June 2014 for model training and validating, and from July 2014 to December 2020 for out-of-sample testing.

          In-Sample Period (2004.7–2014.6)     Out-of-Sample Period (2014.7–2020.12)
          Mean    t-stat   Std     SR          Mean    t-stat   Std     SR
Panel A. Deep Factors: Linear Ranking
R_1^l     1.65    5.71     2.89    1.98        0.77    1.61     3.50    0.77
R_2^l     1.17    4.03     2.89    1.40        0.71    2.68     1.86    1.33
R_3^l     0.90    3.20     2.89    1.08        0.44    3.04     1.58    0.96
Panel B. Deep Factors: Softmax Ranking
R_1^s     2.34    8.50     2.89    2.81        1.80    4.41     3.49    1.79
R_2^s     1.50    5.26     2.89    1.80        0.89    3.29     2.36    1.30
R_3^s     1.36    4.71     2.89    1.63        0.35    1.11     2.44    0.50
Panel C. BBW Four Factors
MKTC      0.71    2.44     2.89    0.85        0.46    2.29     1.86    0.86
DRF       0.52    2.06     2.89    0.62        0.26    1.43     1.48    0.62
LRF       0.51    1.88     2.89    0.61        0.23    2.20     0.97    1.01
CRF       0.56    1.74     2.89    0.67        0.04    0.20     1.53    0.09
Panel D. FF Five Factors
MKTE      0.43    1.34     2.89    0.51        0.71    2.62     2.92    0.84
SMB       0.23    0.95     2.89    0.28        −0.13   0.31     3.59    −0.12
HML       0.08    0.25     2.89    0.09        −0.87   1.94     3.42    −0.88
TRM       0.43    1.71     2.89    0.51        0.47    1.62     2.46    0.66
DEF       0.04    0.15     2.89    0.05        0.10    0.40     2.42    0.14

Panels A and B present deep factors constructed from the 1-, 2-, and 3-layer neural networks based on linear ranking in (16) and softmax ranking in (17), respectively. In-sample training evidence shows that the shallow neural network works well enough because the 1-layer deep factor has the highest annualized Sharpe ratio, 1.98 from the linear ranking and 2.81 from the softmax ranking.
In the out-of-sample tests, while the softmax ranking-based deep factor continues to have the highest Sharpe ratio in the 1-layer neural network (1.79), the linear ranking-based deep factor has the highest Sharpe ratio in the 2-layer neural network (1.33). Overall, the softmax ranking-based deep factor performs much better than the linear ranking-based one both in and out of the sample.

We present the same summary statistics for the two competing factor models for comparison. Panel C is for the BBW factors. In the in-sample period, the DRF factor earns statistically significant average returns, and the LRF and CRF factors only earn marginally significant average returns; however, none earns annualized Sharpe ratios larger than 1.00. We further see that in the out-of-sample period, only the LRF factor earns a significant average return and has an annualized Sharpe ratio slightly larger than 1.00, which remains much smaller than that of the 1-layer softmax ranking-based deep factor (1.01 versus 1.79). From Panel D, we find that none of the Fama-French factors earns an annualized Sharpe ratio larger than 1.00 in either the in-sample or the out-of-sample period.

4.2 Deep Tangency Portfolios

We now move on to examine the portfolio performance. Table 3 presents the Sharpe ratios of our deep tangency portfolios and various optimal portfolios constructed from the competing factors. In Panel A, our deep tangency portfolio earns an annualized in-sample Sharpe ratio of 11.86 based on linear ranking and 11.27 based on softmax ranking in the 1-layer neural network, in stark contrast to the corresponding Sharpe ratio of the benchmark market factor (0.85). Such high in-sample Sharpe ratios are not surprising as our deep learning model is trained to maximize the Sharpe ratio of the tangency portfolio formed by the market factor and the deep factor.
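To make the link between factor correlation and the tangency Sharpe ratio concrete, the sketch below evaluates the squared Sharpe ratio $E(\tilde{F})' Cov(\tilde{F})^{-1} E(\tilde{F})$ of Equation (20) for two factors. It is a minimal illustration under stated assumptions: the function names are hypothetical, the monthly means and the correlations are made-up inputs rather than estimates from the paper, and annualization uses the usual factor of 12 on the squared monthly ratio.

```python
import numpy as np

def max_squared_sharpe(mean, cov):
    """Squared Sharpe ratio of the tangency portfolio: mean' inv(cov) mean."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(mean @ np.linalg.solve(cov, mean))

def tangency_weights(mean, cov):
    """Tangency weights, proportional to inv(cov) mean, scaled to unit gross exposure."""
    raw = np.linalg.solve(np.asarray(cov, dtype=float), np.asarray(mean, dtype=float))
    return raw / np.abs(raw).sum()

def two_factor_cov(vol_1, vol_2, rho):
    """Covariance matrix of two factors with volatilities vol_1, vol_2 and correlation rho."""
    off = rho * vol_1 * vol_2
    return np.array([[vol_1 ** 2, off], [off, vol_2 ** 2]])

# Hypothetical monthly inputs (in %): a market factor and a long-short factor,
# both scaled to 2.89% monthly volatility (10% annualized).
mean = np.array([0.7, 1.8])
vol = 2.89

sr_uncorrelated = np.sqrt(12 * max_squared_sharpe(mean, two_factor_cov(vol, vol, 0.0)))
sr_hedged = np.sqrt(12 * max_squared_sharpe(mean, two_factor_cov(vol, vol, -0.9)))
```

With the same means and volatilities, the strongly negative correlation yields a much larger annualized Sharpe ratio than the uncorrelated case, which is the diversification mechanism behind a market-hedging long-short factor.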
Panel A of Figure 3 presents the in-sample scatter plot between the benchmark market factor and the 1-layer softmax ranking-based deep factor. They are highly negatively correlated, resulting in a high Sharpe ratio of the deep tangency portfolio according to the principle of diversification. It seems that our deep factor plays the role of a market-hedge portfolio.

Table 3: Performance of Deep Tangency Portfolios
This table presents the Sharpe ratios of various tangency portfolios. For the deep learning model, the market factor is the only benchmark, and we consider 1 to 3 layers in the neural network architecture. We take the sample from July 2004 to June 2014 for model training and validating, and from July 2014 to December 2020 for out-of-sample testing. We follow Barillas and Shanken (2017) to statistically test the significance of the Sharpe ratio improvement of the deep tangency portfolio over the benchmark portfolio. ***, **, and * denote the level of significance of 1%, 5%, and 10%, respectively.

Panel A. In-Sample Period (2004.7–2014.6)
                  A.1 Deep TP: Linear Ranking         A.2 Deep TP: Softmax Ranking
        Alone     L1         L2         L3            L1         L2         L3
MKTC    0.85      11.86***   11.77***   12.09***      11.27***   12.23***   11.88***
BBW4    0.86      11.88***   11.78***   12.17***      11.30***   12.31***   11.89***
FF5     0.89      3.94***    3.09***    2.65***       5.14***    3.67***    3.38***

Panel B. Out-of-Sample Period (2014.7–2020.12)
                  B.1 Deep TP: Linear Ranking         B.2 Deep TP: Softmax Ranking
        Alone     L1         L2         L3            L1         L2         L3
MKTC    0.86      1.04***    1.49***    2.17***       2.29***    1.48***    1.13***
BBW4    0.69      1.03***    1.48***    2.11***       2.28***    1.46***    1.13***
FF5     1.27      1.19       1.57***    1.99***       2.50***    1.56***    1.38

We are more interested in the out-of-sample performance of deep tangency portfolios and other optimal portfolios. Note that the in-sample estimates determine all portfolios' weights. In Panel B, the deep tangency portfolio that combines the market portfolio and the linear ranking-based deep factor achieves the highest annualized out-of-sample Sharpe ratio of 2.17 in the 3-layer neural network, whereas it earns the highest annualized out-of-sample Sharpe ratio of 2.29 in the 1-layer neural network when the market portfolio and the softmax ranking-based deep factor are combined. The Sharpe ratios of the deep tangency portfolios are much higher than those of the market portfolio (0.86) and the deep factors. Given that the softmax ranking-based deep factor and deep tangency portfolio perform much better than the linear ranking-based ones, in what follows, we implement our empirical analysis mainly relying on the softmax ranking-based deep learning model.

Panel B of Figure 3 presents the scatter plot between the market portfolio and the 1-layer softmax ranking-based deep factor in the out-of-sample period and suggests that the deep factor negatively correlates with the market portfolio. Notably, the deep factor and the market portfolio hardly go down simultaneously, as few observations are in the lower-left quadrant (see Figure 1). To further verify this point, Panel C of Figure 3 displays the cumulative returns over time of the market portfolio and the deep factor for the in-sample and out-of-sample periods, respectively. We observe that the deep factor moves in the opposite direction during a market downturn. For instance, in the 2008 global financial crisis (in-sample) and the outbreak of the Covid-19 pandemic (out-of-sample), the cumulative returns of the deep factor keep increasing, whereas the market portfolio usually suffers losses, which is in line with the findings from the previous subsection. Those results provide further evidence supporting the deep factor as a market-hedge portfolio.
The optimal portfolios constructed from the competing factors perform much worse than the deep tangency portfolio both in the in-sample and out-of-sample periods. The optimal portfolio constructed from the BBW four factors has an annualized in-sample Sharpe ratio of only 0.86 and an annualized out-of-sample Sharpe ratio of 0.69. The portfolio constructed from the Fama-French five factors has slightly larger in-sample and out-of-sample Sharpe ratios than the optimal portfolio based on the BBW four factors. Note that the Fama-French five factors contain three equity factors (MKT, SMB, and HML) and two bond factors (Term and Default factors).

What happens when we combine the competing factors and our deep factors? Do those observable factors contain useful information not spanned by the deep factor? Table 3 also presents the Sharpe ratios of the portfolios constructed using various competing factors and a deep factor. When we combine the deep factor with the BBW four factors, both in-sample and out-of-sample Sharpe ratios are very similar to those of our deep tangency portfolios. For example, the out-of-sample Sharpe ratio of the optimal portfolio constructed from the BBW four factors and the 1-layer softmax ranking-based deep factor is 2.28, almost the same as that of our tangency portfolio (2.29). Given that our deep factor is constructed by taking the bond market factor as a benchmark and using all firm characteristics, it should already contain non-market information of the BBW factors; therefore, including those factors should not improve the Sharpe ratio over our deep tangency portfolio. We notice that the portfolio weights on the three non-market BBW factors are negligible.

A notable result is that the in-sample performance of the optimal portfolio constructed from the Fama-French five factors and a deep factor is much worse because our deep factor is constructed by taking the bond market factor, not the equity market factor, as a benchmark. However, out of sample it is only slightly worse than the 3-layer linear ranking-based deep tangency portfolio (an annualized Sharpe ratio of 1.99 versus 2.17) and slightly better than the 1-layer softmax ranking-based deep tangency portfolio (2.50 versus 2.29).

Figure 3: Correlations and Cumulative Returns
Panels A and B of the figure present scatter plots of the bond market factor and the deep factor for the in-sample and out-of-sample periods. Panel C presents cumulative returns of the deep and market factors for the in-sample (Panel C.1) and out-of-sample (Panel C.2) periods. Panel D plots the cumulative returns of the deep tangency portfolio and tangency portfolios constructed from BBW's four factors and Fama-French's five factors for the out-of-sample period. Panel D.1 is for cumulative returns of original tangency portfolios, and Panel D.2 is for cumulative returns of all tangency portfolios normalized to have 10% annualized volatility.
[Figure omitted. Panel A: In Sample and Panel B: Out of Sample (scatter plots of Rd1 Return against Market Return); Panel C1: Deep Portfolio and Market: In sample and Panel C2: Deep Portfolio and Market: Out-of sample (Cum.Ret against Date; series Rd1 and MKT); Panel D1: Tangency Portfolio and Panel D2: Tangency Portfolio (10% Annualized Volatility) (Cum.Ret against Date; series R1opt, BBW, and FF5).]
Panel D1 of Figure 3 presents the cumulative returns of our 1-layer softmax ranking-based deep tangency portfolio and optimal portfolios constructed from the competing factor models in the out-of-sample period. We see that the cumulative returns of our deep tangency portfolio increase steadily over time, and market downturns have no visible impact on its returns. In contrast, although the cumulative returns on the competing optimal portfolios increase over time, their variations are very large, and notably, those portfolios usually suffer big losses in periods of market downturns. To further examine the performance of various portfolios, we normalize all the above optimal portfolios to have the same annual volatility of 10% and present the cumulative returns of those normalized portfolio returns. We see from Panel D2 of Figure 3 that our deep tangency portfolio, benefiting from its low volatility, has much higher cumulative returns in the out-of-sample period.

While we have just used one deep factor in our previous analysis, our methodology is flexible enough to introduce multiple deep factors if necessary. This can be done by simply iterating the algorithm, treating the extracted deep factor as an additional benchmark alongside the market factor. Table 4 presents the performance of the softmax ranking-based deep tangency portfolios constructed from the benchmark market factor and 1 to 3 deep factors. In-sample training suggests a minor benefit from using more deep factors. The out-of-sample evidence shows that the first deep factor extracted from the 1-layer neural network performs very well, as the Sharpe ratio improvement from using 2 or 3 deep factors over the tangency portfolio with one deep factor is negligible and statistically insignificant. Therefore, in what follows, we focus on the first deep factor and the corresponding deep tangency portfolio from the 1-layer neural network.
Table 4: Multiple Deep Factors
This table presents Sharpe ratios of the deep tangency portfolios constructed using multiple softmax ranking-based deep factors. We take the sample from July 2004 to June 2014 for model training and validating and the sample from July 2014 to December 2020 for out-of-sample testing. We sequentially add one additional deep factor for each choice of the number of neural network layers ranging from 1 to 3. The test of the Sharpe ratio improvement by including additional deep factors over the deep tangency portfolio with one deep factor is based on Barillas and Shanken (2017). ***, **, and * denote the level of significance of 1%, 5%, and 10%, respectively.

        MKTC    D1         D2         D3
Panel A. In-Sample Period (2004.7–2014.6)
L1      0.85    11.27***   11.31      11.33
L2      0.85    12.23***   12.24      12.37
L3      0.85    11.88***   14.12***   14.12
Panel B. Out-of-Sample Period (2014.7–2020.12)
L1      0.86    2.29***    2.31       2.30
L2      0.86    1.48***    1.48       1.46
L3      0.86    1.13***    0.79       0.79

4.3 Factor-Spanning Regressions

The key findings are that our deep factor constructed from a nonlinear combination of firm characteristics captures missing risks other than the market factor and plays a role as a hedge portfolio to market downturns. Commonly used observable factors do not contain extra pricing information, in terms of Sharpe ratio improvement, when combined with the deep factor. In this part, we further examine these issues by implementing simple factor-spanning regressions of the form

$$
R_{d,t} = \alpha + \beta' f_t + \epsilon_t,
\tag{23}
$$

where $f_t$ is a set of observable factors (e.g., BBW factors), and $R_{d,t}$ is our deep factor. Given its superior performance, we focus on the 1-layer softmax ranking-based deep factor. We also run a regression of

$$
R^{opt}_t = \alpha + \beta' f_t + \epsilon_t,
\tag{24}
$$

where $R^{opt}_t$ represents the deep tangency portfolio constructed from the bond market factor and the 1-layer softmax ranking-based deep factor. Such a regression provides further evidence of whether the small number of observable factors can span the deep tangency portfolio.

Table 5: Factor-Spanning Regressions
This table reports factor-spanning regression results for the out-of-sample period. Specifically, we regress the deep factor and the deep tangency portfolios on the two competing factor models, the BBW four-factor model and the Fama-French five-factor model. Newey-West t-statistics are presented in brackets (Newey and West, 1987).

Panel A. BBW Four Factors
          α        β_MKTC    β_DRF     β_CRF     β_LRF     R²
Rd1       0.21     −0.16     0.06      0.02      0.04      19.89
          (5.88)   (−2.17)   (0.94)    (0.12)    (1.91)
R1opt     0.19     −0.06     0.05      0.01      0.04      15.90
          (5.88)   (−0.81)   (0.94)    (0.12)    (1.91)

Panel B. FF Five Factors
          α        β_MKTE    β_SMB     β_HML     β_TRM     β_DEF     R²
Rd1       0.21     0.01      −0.01     0.02      −0.05     −0.08     21.18
          (5.00)   (0.83)    (−1.04)   (1.02)    (−2.51)   (−2.33)
R1opt     0.19     0.02      −0.01     0.02      −0.00     −0.02     9.16
          (4.68)   (1.58)    (−0.95)   (1.28)    (−0.20)   (−0.54)

Table 5 presents the results of the factor-spanning regression for the out-of-sample period. Panel A reports the alphas and betas from the spanning regressions of the deep factor and the deep tangency portfolio on the BBW four factors, respectively. It can be observed that the BBW four factors can explain the excess returns on neither the deep factor nor the deep tangency portfolio. The alpha estimate is approximately 0.21% in the regression of the deep factor, and it is 0.19% in the regression of the deep tangency portfolio. Both alpha estimates are highly statistically significant. The loading of the deep factor on the bond market factor is negative and statistically significant, −0.16 (t = −2.17), and the loading of the deep tangency portfolio on the bond market factor is almost zero, again suggesting that the deep factor serves as a market-hedge portfolio. We find similar results in the spanning regressions on the Fama-French five factors (Panel B).
The alpha estimate is about 0.21% (t = 5.00) in the regression of the deep factor and is about 0.19% (t = 4.68) in the regression of the deep tangency portfolio. We find that the loadings of both the deep factor and the deep tangency portfolio on the three equity factors are insignificant. We also find that the deep factor significantly and negatively loads on both the term and default factors, while the deep tangency portfolio has almost no exposure to those two factors. The negative loadings of the deep factor and negligible loadings of the deep tangency portfolio on the term and default factors further suggest that our deep factor is a bond market hedge portfolio.

4.4 Interpreting Deep Characteristics

By combining an economically motivated loss function with deep learning and constructing the deep factor as long-short portfolio returns, we aim to improve the transparency and interpretability of our methodology. Therefore, a natural next step is understanding how different characteristics contribute to the deep factor. The nonlinear activation of the neural networks in our methodology transforms characteristics into a deep one, a highly nonlinear combination of raw characteristics whose exact functional form is unknown to us in principle. We first evaluate the linear contribution of each characteristic to the deep characteristic by running the Fama-MacBeth cross-sectional regressions (Fama and MacBeth, 1973) of the deep characteristic $Z^{(L)}_{i,t}$ on raw characteristics $z_{k,i,t}$,

$$
Z^{(L)}_{i,t} = a_t + b_{1,t} z_{1,i,t} + \cdots + b_{k,t} z_{k,i,t} + \cdots + b_{K,t} z_{K,i,t} + \epsilon_{i,t},
\tag{25}
$$

for $i = 1, \ldots, N$. Given that all characteristics are cross-sectionally normalized, we can then evaluate each characteristic's contribution by the explained variation using the time-series average of $\hat{b}_{k,t}$, for $k = 1, \ldots, K$.
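The two-step procedure behind Equation (25) can be sketched as follows: run one cross-sectional OLS regression per month, then average the slope estimates over time. This is a hedged sketch rather than the authors' implementation. The function name and array layout are assumptions, the inputs are taken to be complete (no missing values), and the t-statistic is the plain Fama-MacBeth one based on the time-series standard error of the slopes, without any autocorrelation adjustment.

```python
import numpy as np

def fama_macbeth(deep_char, raw_chars):
    """Fama-MacBeth regression of a deep characteristic on raw characteristics.

    deep_char : array of shape (T, N), deep characteristic per month and asset
    raw_chars : array of shape (T, N, K), raw characteristics
    Returns (mean_slopes, t_stats), each of shape (K,).
    """
    deep_char = np.asarray(deep_char, dtype=float)
    raw_chars = np.asarray(raw_chars, dtype=float)
    T, N, K = raw_chars.shape
    slopes = np.empty((T, K))
    for t in range(T):
        # Cross-sectional OLS with an intercept a_t for month t.
        X = np.column_stack([np.ones(N), raw_chars[t]])
        coef, *_ = np.linalg.lstsq(X, deep_char[t], rcond=None)
        slopes[t] = coef[1:]          # drop the intercept, keep b_{1,t}, ..., b_{K,t}
    mean_slopes = slopes.mean(axis=0)
    std_err = slopes.std(axis=0, ddof=1) / np.sqrt(T)
    return mean_slopes, mean_slopes / std_err
```

Ranking characteristics by the absolute value of `mean_slopes` gives one simple measure of linear importance when all regressors share the same cross-sectional scale.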
Two observations are in order: first, most characteristics significantly contribute to the deep characteristic in the above regression, suggesting no clear evidence of sparsity of characteristics, and second, the deep characteristic is not dominated by a small particular set of characteristics, as the (absolute) values of all coefficients are relatively small. Figure 4 presents the top 30 most important characteristics, with bond, equity, and option characteristics classified by the blue, yellow, and red bars, respectively. We report both the coefficient signs and significance levels. We find that all of them significantly contribute to deep characteristics. The top 10 most important variables include five bond characteristics, namely, size (SIZE), monthly turnover (TURN), illiquidity (LIQ BPW) (Bao, Pan, and Wang, 2011), downside risk beta (DRF BETA), and yield-to-maturity (YTM); two equity characteristics, namely, market equity (ME) and quarterly asset liquidity (ALM); and three option characteristics, namely, stock-option volume ratio (SO), trading volume (VOL), and implied and historical volatility spread (IVRV).

To further examine the importance of different types of characteristics, we reconstruct the softmax ranking-based deep tangency portfolios using equity and bond characteristics with option-related variables removed, or using bond characteristics alone. Table 6 presents Sharpe ratios obtained from different types of characteristics. Even though the in-sample training results are more or less similar, their out-of-sample performance differs. When we use all characteristics, as before, the 1-layer neural network works quite well, and the deep tangency portfolio earns an out-of-sample annualized Sharpe ratio of 2.29. However, when we exclude the option-related variables, even though the out-of-sample Sharpe ratio of the deep tangency portfolio still reaches the highest value from the 1-layer neural network, it becomes smaller, only 1.83.
This suggests that option-related variables contain valuable information regarding future corporate bond returns. Worse still, when we use bond characteristics alone, the performance of the deep tangency portfolio deteriorates further. Its best out-of-sample Sharpe ratio is only about 0.71 from the 1-layer neural network, which is even smaller than that of the market portfolio.

Figure 4: Variable Importance: Linear Contributions
This figure presents the variable importance via the Fama-MacBeth cross-sectional regressions of the deep characteristic $Z^{(L)}_{i,t}$ on raw characteristics $z_{k,i,t}$ over the in-sample period. We report the normalized average coefficient $\hat{b}_{k,t}$. The blue bars stand for bond characteristics, the yellow for equity characteristics, and the red for option characteristics.
[Bar chart. Bar labels with significance stars, in the order shown: SIZE***, SO***, ME***, TURN***, LIQ_BPW***, ALM***, VOL***, DRF_BETA***, YTM***, IVRV***, STR***, ACC***, ABR***, std_barQ_1mom***, LIQ_P_FHT***, MOM1M***, RNK3M***, NOPT***, VAR5***, MOM12***, RNK1M***, VARIANCE***, MOM6M***, UNC_BETA***, BM_IA***, T2M***, LIQ_P_HL***, MOM12M***, TERM_DEF_RVAR***, OP***, RSUP***, barQ***, ROA***, LIQ_TRADE***, SEAS1A***, DSO***, RE***, NOA***, ISKEW***, ILL***, PCRATIO***, CHTX***, RVAR_FF3, RNK12M***, SUE***, LIQ_RANGE_M***, RVAR_MEAN**, CHPM***, CASH***, RDM***. Legend: Bond, Equity, Option.]

To sum up, all three types of characteristics are important and weighted heavily in the deep characteristic. Therefore, they are necessary for constructing the deep tangency portfolio. This finding is, in fact, in stark contrast to previous studies arguing that characteristics that predict equity returns do not necessarily forecast corporate bond returns (see, e.g., Chordia et al., 2017; Bali et al., 2021). But it provides further empirical evidence in support of the integration between the bond and equity markets (Schaefer and Strebulaev, 2008; Kelly, Palhares, and Pruitt, 2022).
Table 6: Importance of Characteristics
This table presents annualized Sharpe ratios obtained from different types of characteristics. We consider three sets of characteristics: all 132 characteristics, 102 bond and equity characteristics, and 41 bond characteristics. The test of the Sharpe ratio improvement of the deep tangency portfolio over the market factor is based on Barillas and Shanken (2017). ∗∗∗, ∗∗, and ∗ denote significance at the 1%, 5%, and 10% levels, respectively.

        MKTC    Bond+Equity+Option    Bond+Equity    Bond
Panel A. In-Sample Period (2004.7–2014.6)
L1      0.85    11.27***              11.73***       11.05***
L2      0.85    12.23***              10.16***       11.08***
L3      0.85    11.88***              12.05***       11.55***
Panel B. Out-of-Sample Period (2014.7–2020.12)
L1      0.86    2.29***               1.83***        0.71
L2      0.86    1.48***               0.49           0.53
L3      0.86    1.13***               1.70***        0.61

4.5 Additional Analyses

4.5.1 Latent Factors and Deep Factors

A recent paper by Kelly, Palhares, and Pruitt (2022) shows that a five-factor model based on the instrumented principal component analysis (IPCA, Kelly, Pruitt, and Su, 2019) outperforms commonly used observable factor models in pricing corporate bonds. They find that a tangency portfolio constructed from their five IPCA factors using the ICE corporate bond return data can earn an annualized out-of-sample Sharpe ratio as large as 6.23. We note that in another paper, Kelly and Pruitt (2022) show that the core analysis of Kelly, Palhares, and Pruitt (2022) is robust to using the TRACE data. We follow their IPCA approach and construct five corporate bond factors using our TRACE data and all three types of characteristics. To be consistent with our primary empirical analysis, we use the same in-sample and out-of-sample split as before and extract the out-of-sample IPCA factors using in-sample model parameter estimates and out-of-sample characteristics. Panel A of Table 7 summarizes the in-sample and out-of-sample Sharpe ratios of the optimal portfolio constructed from the IPCA factors.
While both Kelly, Palhares, and Pruitt (2022) and Kelly and Pruitt (2022) find that an optimal portfolio constructed using the five IPCA factors can earn an out-of-sample Sharpe ratio larger than 6 using both ICE and TRACE corporate bond data, we find that such a portfolio earns only an in-sample Sharpe ratio of 2.95 and an out-of-sample Sharpe ratio of 1.67, both of which are smaller than the corresponding values of our deep tangency portfolio (see Table 3). There are two reasons why we find such a weaker out-of-sample Sharpe ratio. First, the sample size in our paper is much larger than that in Kelly and Pruitt (2022): the total number of bond-month observations in our paper is 633,600, whereas it is only 144,933 in Kelly and Pruitt (2022). Second, both Kelly, Palhares, and Pruitt (2022) and Kelly and Pruitt (2022) adopt an expanding window procedure to construct the out-of-sample IPCA factors, whereas we extract out-of-sample IPCA factors by fixing model parameters at the in-sample estimates to make them comparable with our methodology (see footnote 8). We further find that combining the deep factor with the five IPCA factors improves the out-of-sample Sharpe ratio of the optimal portfolio to 2.24, similar to that of our deep tangency portfolio; such a Sharpe ratio improvement over the IPCA optimal portfolio is highly statistically significant. Given that the IPCA factors are also estimated by taking into account all firm characteristics (in a linear form), and that Kelly, Palhares, and Pruitt (2022) and Kelly and Pruitt (2022) show that the IPCA factors substantially outperform popular observable factors, we examine whether they can span our deep factor and deep tangency portfolio.
Panel B presents the spanning regression results, which show that the five IPCA factors cannot explain excess returns on either the deep factor or the deep tangency portfolio, as the alpha estimates are about 0.17% and 0.16%, respectively, and are highly statistically significant in both regressions.

Footnote 8: We also implement a recursive expanding-window approach similar to Kelly, Palhares, and Pruitt (2022) and Kelly and Pruitt (2022) using our TRACE data and find an almost identical out-of-sample Sharpe ratio of the IPCA optimal portfolio. The results show a mean of 0.39 and a standard deviation of 0.81, resulting in an annualized Sharpe ratio of 1.67.

Table 7: Latent Factors and Deep Factors
Panel A of the table presents Sharpe ratios of the tangency portfolios constructed from the IPCA five factors and RP-PCA five factors. The test of Sharpe ratio improvement of the tangency portfolio constructed from latent factors and the deep factor over that from latent factors alone is based on Barillas and Shanken (2017). ∗∗∗, ∗∗, and ∗ denote significance at the 1%, 5%, and 10% levels, respectively. Panel B presents the factor-spanning regressions of the deep factor and the deep tangency portfolio on the IPCA five factors or the RP-PCA five factors over the out-of-sample period. Newey-West t-statistics are presented in brackets (Newey and West, 1987).

Panel A. Sharpe Ratios
            In Sample (2004.7–2014.6)                 Out of Sample (2014.7–2020.12)
            TP      L1         L2         L3          TP      L1        L2        L3
IPCA5       2.95    11.24***   12.34***   11.65***    1.67    2.24***   1.49      1.14
RP-PCA5     1.04    11.29***   12.06***   11.74***    0.95    2.19***   1.41***   1.07

Panel B. Spanning Regressions
            α        β1       β2        β3        β4        β5
B.1. IPCA Five Factors
R_1^d       0.17     0.03     −0.04     −0.08     0.03      0.14
            (4.51)   (0.47)   (−1.90)   (−1.88)   (0.80)    (4.36)
R_1^opt     0.16     0.05     0.01      −0.05     0.04      0.09
            (4.44)   (0.85)   (0.34)    (−1.38)   (1.08)    (3.16)
B.2. RP-PCA Five Factors
R_1^d       0.20     0.01     −0.09     0.02      −0.04     −0.05
            (7.98)   (0.67)   (−5.57)   (1.34)    (−1.60)   (−4.31)
R_1^opt     0.18     0.03     −0.08     0.02      −0.04     −0.04
            (8.03)   (2.25)   (−5.44)   (1.88)    (−1.61)   (−4.06)
R2 (%), in the order extracted from the source: 39.56 (listed under B.1), 45.89 (listed under B.2), 37.71, 42.85.

In addition, a recent paper by Lettau and Pelger (2020) proposes a risk-premium principal component analysis (RP-PCA) model for estimating latent asset pricing factors. Lettau and Pelger (2020) show that the RP-PCA performs much better than the PCA method, particularly in identifying weak factors. Table 7 also examines how the five RP-PCA factors perform compared to our deep factor. Again, we find that the out-of-sample Sharpe ratio of the RP-PCA tangency portfolio is much smaller than that of the deep tangency portfolio (0.95 vs. 2.29), and the five RP-PCA factors are unable to explain excess returns on either the deep factor or the deep tangency portfolio.

4.5.2 Importance of Nonlinearity

The deep factor in our deep learning model is formed on a deep characteristic that is a highly nonlinear combination of raw characteristics. A natural question is how important this nonlinear combination is, and how differently the deep tangency portfolio performs compared to those constructed from linear machine learning models. For this purpose, we construct long-short portfolios based on Lasso, Ridge, PCA, and PLS. To be specific, at each time t, we form the return forecasts using all characteristics, relying on these four linear machine learning models, as follows,

$$W_t \equiv E_t[r_{t+1}] = z_t b, \qquad (26)$$

where $b$ is a $K \times 1$ vector of model parameters. We then construct an equal-weighted long-short portfolio using the standard sorting approach that longs the top 30% and shorts the bottom 30% of corporate bonds based on $W_t$. As before, we use the sample from July 2004 to June 2014 for model training and validation and the sample from July 2014 to December 2020 for out-of-sample testing. See the Internet Appendix for details. Panel A of Table 8 presents summary statistics of the portfolios.
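The sorting step built on Equation (26) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the function name `long_short_return`, the argument layout, and the tie-handling by plain `argsort` are all hypothetical, and the parameter vector `b` is taken as already estimated by whichever linear model (Lasso, Ridge, PCA, or PLS) is used.

```python
import numpy as np

def long_short_return(z_t, b, r_next, frac=0.3):
    """One-period return of an equal-weighted top-minus-bottom portfolio.

    z_t:    (N, K) characteristics observed at time t (assumed layout).
    b:      (K,) estimated parameters of a linear forecasting model.
    r_next: (N,) realized excess returns at time t + 1.
    frac:   fraction of assets in each leg (0.3 means top and bottom 30%).
    """
    forecast = z_t @ b                      # W_t = z_t b, as in Equation (26)
    n_leg = max(1, int(np.floor(frac * len(forecast))))
    order = np.argsort(forecast)            # ascending forecast order
    short_leg = order[:n_leg]               # lowest forecasts
    long_leg = order[-n_leg:]               # highest forecasts
    return r_next[long_leg].mean() - r_next[short_leg].mean()
```

Applying the function month by month yields the time series of long-short returns from which means, t-statistics, and Sharpe ratios of the kind reported in Panel A of Table 8 can be computed.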
We see that while the portfolios constructed from Lasso, Ridge, and PLS deliver average returns that are statistically significant both in sample and out of sample, none achieves an annualized Sharpe ratio larger than 1. Our deep tangency portfolio performs much better than those portfolios. We further examine what would happen if we remove the nonlinear tanh activation function and use a linear combination of raw characteristics in deep learning. We see from Panel B of Table 8 that without the nonlinear activation function, we need a deeper neural network, and the out-of-sample performance of the deep tangency portfolio becomes much worse compared to the case with the nonlinear activation. To sum up, both nonlinear combinations of characteristics and nonlinear activation in deep learning play important roles in constructing the deep tangency portfolio. Such findings are largely consistent with what the literature has found on nonlinear effects of characteristics on expected returns and covariances (see, e.g., Freyberger et al., 2020; Gu et al., 2020; Cong et al., 2022).

Table 8: Importance of Nonlinearity
Panel A reports descriptive statistics of the long-short portfolios based on return forecasts from Lasso, Ridge, PCA, and PLS. We normalize the in-sample annualized volatility to 10%. Panel B presents annualized Sharpe ratios of the softmax ranking-based deep tangency portfolios constructed by replacing the nonlinear activation function with a linear one in the deep learning model. The test of Sharpe ratio improvements by including an extra deep factor in the tangency portfolio is based on Barillas and Shanken (2017). ∗∗∗, ∗∗, and ∗ denote significance at the 1%, 5%, and 10% levels, respectively. We use the sample from July 2004 to June 2014 for model training and validation and the sample from July 2014 to December 2020 for out-of-sample testing.

Panel A. Linear ML Portfolios
          In Sample (2004.7–2014.6)      Out of Sample (2014.7–2020.12)
          Mean     t-stat    SR          Mean     t-stat    SR
Lasso     0.69     2.49      0.83        0.46     2.21      0.83
Ridge     0.61     2.36      0.74        0.34     2.42      0.94
PCA       0.45     1.81      0.54        0.46     1.59      0.67
PLS       0.52     2.04      0.63        0.19     2.01      0.85

Panel B. Linear Activation
          In Sample (2004.7–2014.6)         Out of Sample (2014.7–2020.12)
          D1          D2       D3           D1       D2       D3
L1        10.10***    10.11    10.17        0.31     0.31     0.30
L2        9.21***     9.21     9.21         0.56     0.56     0.57
L3        9.71***     9.81     9.81         0.40     0.41     0.41

5 Conclusion

Constructing the tangency portfolio has economic importance: the mean-variance efficient (MVE) portfolio is equivalent to the stochastic discount factor (SDF). The solution proposed by Markowitz (1952) to the MVE portfolio is challenging to implement in practice for a large number of individual assets. Modern asset pricing relies on factor models to approximate the SDF using a small number of characteristics-managed factors. However, these commonly used factors cannot fully span the mean-variance efficient frontier of the entire asset universe, leading to the issue of a “factor zoo” and the curse of high dimension. The literature has not found clear-cut evidence of sparsity of characteristics, and Kozak and Nagel (2023) demonstrate that a large number of characteristics are necessary to span the efficient frontier.

This paper proposes a parametric approach to estimating optimal portfolio weights directly, using deep learning techniques with an economically motivated target and bypassing the need to estimate mean returns and covariances. A divide-and-conquer strategy estimates the tangency portfolio by combining a deep factor with the market factor. Using high-dimensional firm characteristics, the endogenous deep factor construction mimics the commonly used characteristic-sorted factor approach in empirical asset pricing.
The economically guided deep factor plays two important roles: (i) it has a low or even negative correlation with benchmark factors, providing a potential hedge portfolio, and (ii) it may span any missing risk factors other than benchmark factors.

We apply our method to the corporate bond market. Our deep tangency portfolio outperforms those constructed using commonly used observable or latent factors, with an out-of-sample annualized Sharpe ratio of 2.29. We further demonstrate that recently developed latent-factor models, such as RP-PCA and IPCA, cannot explain our deep factor and deep tangency portfolio. Additionally, we emphasize the importance of considering various types of characteristics in constructing the deep tangency portfolio. Excluding any type of characteristics (equity, corporate bond, or option) worsens the performance of the deep tangency portfolio. This evidence contrasts starkly with previous studies that suggest characteristics predicting equity returns may not necessarily forecast corporate bond returns (see, e.g., Chordia et al., 2017; Bali et al., 2021), but offers supporting evidence for the integration between the bond and equity markets (see, e.g., Schaefer and Strebulaev, 2008; Kelly et al., 2022).

References

Ait-Sahalia, Y. and M. W. Brandt (2001). Variable selection for portfolio choice. Journal of Finance 56, 1297–1351.
Avramov, D., S. Cheng, and L. Metzker (2023). Machine learning vs. economic restrictions: Evidence from stock return predictability. Management Science 69(5), 2587–2619.
Bai, J., T. G. Bali, and Q. Wen (2019). Common risk factors in the cross-section of corporate bond returns. Journal of Financial Economics 131(3), 619–642.
Bali, T. G., A. Goyal, D. Huang, F. Jiang, and Q. Wen (2021). The cross-sectional pricing of corporate bonds using big data and machine learning. Technical report, Georgetown University.
Bali, T. G., A. Subrahmanyam, and Q. Wen (2021). Long-term reversals in the corporate bond market.
Journal of Financial Economics 139(2), 656–677.
Bao, J., J. Pan, and J. Wang (2011). The illiquidity of corporate bonds. Journal of Finance 66, 911–946.
Barillas, F., R. Kan, C. Robotti, and J. Shanken (2020). Model comparison with Sharpe ratios. Journal of Financial and Quantitative Analysis 55(6), 1840–1874.
Barillas, F. and J. Shanken (2017). Which alpha? Review of Financial Studies 30(4), 1316–1338.
Barillas, F. and J. Shanken (2018). Comparing asset pricing models. Journal of Finance 73, 715–754.
Bessembinder, H., K. Kahle, W. Maxwell, and D. Xu (2009). Measuring abnormal bond performance. Review of Financial Studies 22, 4219–4258.
Brandt, M. W. (1999). Estimating portfolio and consumption choice: A conditional Euler equations approach. Journal of Finance 54, 1609–1646.
Brandt, M. W. and P. Santa-Clara (2006). Dynamic portfolio selection by augmenting the asset space. Journal of Finance 61(5), 2187–2217.
Brandt, M. W., P. Santa-Clara, and R. Valkanov (2009). Parametric portfolio policies: Exploiting characteristics in the cross-section of equity returns. Review of Financial Studies 22, 3411–3447.
Cao, J., A. Goyal, X. Xiao, and X. Zhan (2022). Implied volatility changes and corporate bond returns. Management Science, Forthcoming.
Chen, L., M. Pelger, and J. Zhu (2022). Deep learning in asset pricing. Management Science, Forthcoming.
Choi, J. and Y. Kim (2018). Anomalies and market (dis)integration. Journal of Monetary Economics 100, 16–34.
Chordia, T., A. Goyal, Y. Nozawa, A. Subrahmanyam, and Q. Tong (2017). Are capital market anomalies common to equity and corporate bond markets? An empirical investigation. Journal of Financial and Quantitative Analysis 52(4), 1301–1342.
Chung, K. H., J. Wang, and C. Wu (2019). Volatility and the cross-section of corporate bond returns. Journal of Financial Economics 133(2), 397–417.
Cochrane, J. H. (2011). Presidential address: Discount rates. Journal of Finance 66(4), 1047–1108.
Cochrane, J. H. (2014).
A mean-variance benchmark for intertemporal portfolio theory. Journal of Finance 69, 1–49.
Cong, L. W., G. Feng, J. He, and X. He (2022). Asset pricing with panel tree under global split criteria. Technical report, City University of Hong Kong.
Cong, L. W., G. Feng, J. He, and J. Li (2023). Sparse modeling under grouped heterogeneity with an application to asset pricing. Technical report, City University of Hong Kong.
Daniel, K., L. Mota, S. Rottke, and T. Santos (2020). The cross-section of risks and returns. Review of Financial Studies 33, 1927–1979.
DeMiguel, V., L. Garlappi, and R. Uppal (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies 22(5), 1915–1953.
DeMiguel, V., A. Martin-Utrera, F. J. Nogales, and R. Uppal (2020). A transaction-cost perspective on the multitude of firm characteristics. Review of Financial Studies 33(5), 2180–2222.
Dick-Nielsen, J. (2009). Liquidity biases in TRACE. Journal of Fixed Income 19(2), 43–55.
Dick-Nielsen, J. (2014). How to clean enhanced TRACE data. Technical report, Copenhagen Business School.
Dickerson, A., P. Mueller, and C. Robotti (2023). Priced risk in corporate bonds. Journal of Financial Economics 150, 103707.
Fama, E. F. and K. R. French (1992). The cross-section of expected stock returns. Journal of Finance 47(2), 427–465.
Fama, E. F. and K. R. French (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33(1), 3–56.
Fama, E. F. and K. R. French (1996). Multifactor explanations of asset pricing anomalies. Journal of Finance 51(1), 55–84.
Fama, E. F. and K. R. French (2015). A five-factor asset pricing model. Journal of Financial Economics 116(1), 1–22.
Fama, E. F. and K. R. French (2018). Choosing factors. Journal of Financial Economics 128(2), 234–252.
Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy 81(3), 607–636.
Feng, G., J. He, N.
Polson, and J. Xu (2023). Deep learning of characteristics-sorted factor models. Journal of Financial and Quantitative Analysis, Forthcoming.
Freyberger, J., A. Neuhierl, and M. Weber (2020). Dissecting characteristics nonparametrically. Review of Financial Studies 33, 2326–2377.
Giannone, D., M. Lenza, and G. E. Primiceri (2021). Economic predictions with big data: The illusion of sparsity. Econometrica 89, 2409–2437.
Giglio, S., B. Kelly, and D. Xiu (2022). Factor models, machine learning, and asset pricing. Annual Review of Financial Economics 14.
Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. Review of Financial Studies 33, 2223–2273.
Gu, S., B. Kelly, and D. Xiu (2021). Autoencoder asset pricing models. Journal of Econometrics 222(1), 429–450.
Hansen, L. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. Journal of Political Economy 99, 225–262.
Harvey, C. R., Y. Liu, and H. Zhu (2016). ... and the cross-section of expected returns. Review of Financial Studies 29(1), 5–68.
He, X., G. Feng, J. Wang, and C. Wu (2021a). Benchmarking individual corporate bonds. Technical report, City University of Hong Kong.
He, X., G. Feng, J. Wang, and C. Wu (2021b). Predicting individual corporate bond returns. Technical report, City University of Hong Kong.
Hou, K., C. Xue, and L. Zhang (2020). Replicating anomalies. Review of Financial Studies 33(5), 2019–2133.
Huang, T., L. Jiang, and J. Li (2023). Downside variance premium, firm fundamentals, and expected corporate bond returns. Journal of Banking and Finance, Forthcoming.
Jostova, G., S. Nikolova, A. Philipov, and C. W. Stahel (2013). Momentum in corporate bond returns. Review of Financial Studies 26(7), 1649–1693.
Kaniel, R., Z. Lin, M. Pelger, and S. Van Nieuwerburgh (2022). Machine-learning the skill of mutual fund managers. Technical report, National Bureau of Economic Research.
Kelly, B. T., S. Malamud, and K. Zhou (2022).
The virtue of complexity in return prediction. Journal of Finance, Forthcoming.
Kelly, B. T., D. Palhares, and S. Pruitt (2022). Modeling corporate bond returns. Journal of Finance, Forthcoming.
Kelly, B. T. and S. Pruitt (2022). Reconciling TRACE bond returns. Technical report, Yale University.
Kelly, B. T., S. Pruitt, and Y. Su (2019). Characteristics are covariances: A unified model of risk and return. Journal of Financial Economics 134(3), 501–524.
Kozak, S. and S. Nagel (2023). When do cross-sectional asset pricing factors span the stochastic discount factor? Technical report, University of Michigan.
Kozak, S., S. Nagel, and S. Santosh (2018). Interpreting factor models. Journal of Finance 73(3), 1183–1223.
Kozak, S., S. Nagel, and S. Santosh (2020). Shrinking the cross-section. Journal of Financial Economics 135(2), 271–292.
Lettau, M. and M. Pelger (2020). Factors that fit the time series and cross-section of stock returns. Review of Financial Studies 33, 2274–2325.
Lin, H., J. Wang, and C. Wu (2011). Liquidity risk and expected corporate bond returns. Journal of Financial Economics 99, 628–650.
Lopez-Lira, A. and N. L. Roussanov (2020). Do common factors really explain the cross-section of stock returns? Technical report, University of Pennsylvania.
Markowitz, H. (1952). Portfolio selection. Journal of Finance 7, 77–99.
Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, 867–887.
Merton, R. C. (1980). On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 8, 323–361.
Nagel, S. (2021). Machine Learning in Asset Pricing. Princeton University Press.
Neuhierl, A., X. Tang, R. Varneskov, and G. Zhou (2021). Option characteristics as cross-sectional predictors. Technical report, Washington University in St. Louis.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix.
Econometrica 55, 703–708.
Newey, W. K. and K. D. West (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61(4), 631–653.
Raponi, V., R. Uppal, and P. Zaffaroni (2021). Robust portfolio choice. Technical report, Imperial College Business School.
Schaefer, S. M. and I. Strebulaev (2008). Structural models of credit risk are useful: Evidence from hedge ratios on corporate bonds. Journal of Financial Economics 90, 1–19.

Internet Appendix for “Deep Tangency Portfolios” (not for publication)

Summary of Contents
• Section I presents detailed definitions of bond, equity, and option characteristics used in our empirical studies.
• Section II presents the implementation details of our deep learning algorithm and some robustness checks.
• Section III presents the implementation details of four linear machine learning algorithms.

I Bond, Equity, and Option Characteristics

Table A1: Description of 41 Bond Characteristics
Characteristics    Description
AGE                Time since issuance in years
RATING             Bond credit rating
T2M                The number of years to maturity
SIZE               Amount outstanding
DUR                Bond duration
VAR5               Value-at-risk 5% over past 3 years
VAR10              Value-at-risk 10% over past 3 years
LIQ_BPW            Liquidity measure of transitory price movements
LIQ_ROLL           Roll’s liquidity
LIQ_P_HL           High-low spread estimator
LIQ_P_FHT          Modified illiquidity measure based on zero returns
LIQ_AMIHUD         Amihud liquidity
LIQ_STD_AMIHUD     Standard deviation of Amihud daily liquidity
LIQ_TC_IQR         Interquartile range
MKT_BETA           Market beta
DEF_BETA           DEF factor beta
TERM_BETA          TERM factor beta
LIQ_BETA           Liquidity beta of bond illiquidity factor
DRF_BETA           Downside risk beta controlling bond market factor
CRF_BETA           Credit risk beta controlling bond market factor
LRF_BETA           Liquidity risk beta controlling bond market factor
VIX_BETA           VIX index beta
UNC_BETA           Macroeconomic uncertainty beta
STR                Short-term reversal t-1
VARIANCE           Variance of raw returns
SKEW               Skewness of raw returns
KURT               Kurtosis of raw returns
COSKEW             Systematic skewness with bond market
ISKEW              Idiosyncratic skewness
LIQ_RANGE          Simple high-low spread
LIQ_TRADE          Number of trades
MKT_RVAR           Market residual variance
TERM_DEF_RVAR      TERM DEF residual variance
TURN               Bond turnover
YTM                Yield-to-maturity
MOM6               Momentum from t-2 to t-6
MOM12              Momentum from t-7 to t-12
LTR                Long-term reversal from t-13 to t-48
barQ               Average daily dollar volume in the 1-month period
std_barQ_1mom      Standard deviation of dollar volume in the 1-month period
LIQ_RANGE_M        Simple high-low spread

Table A2: Description of 61 Equity Characteristics
Characteristics    Description
ABR                Abnormal returns around earnings announcement
ACC                Operating accruals
ADM                Advertising expense-to-market
AGR                Asset growth
ALM                Quarterly asset liquidity
ATO                Asset turnover
BASPREAD           Bid-ask spread (3 months)
BETA               Beta (3 months)
BM                 Book-to-market equity
BM_IA              Industry-adjusted book to market
CASH               Cash holdings
CASHDEBT           Cash to debt
CFP                Cashflow-to-price
CHCSHO             Change in shares outstanding
CHPM               Industry-adjusted change in profit margin
CHTX               Change in tax expense
CINVEST            Corporate investment
DEPR               Depreciation / PP&E
DOLVOL             Dollar trading volume
DY                 Dividend yield
EP                 Earnings-to-price
GMA                Gross profitability
GRLTNOA            Growth in long-term net operating assets
HERF               Industry sales concentration
HIRE               Employee growth rate
ILL                Illiquidity rolling (3 months)
LEV                Leverage
LGR                Growth in long-term debt
MAXRET             Maximum daily returns (3 months)
ME                 Market equity
ME_IA              Industry-adjusted size
MOM12M             Cumulative returns in the past (2-12) months
MOM1M              Previous month return
MOM36M             Cumulative returns in the past (13-35) months
MOM60M             Cumulative returns in the past (13-60) months
MOM6M              Cumulative returns in the past (2-6) months
NI                 Net equity issue
NINCR              Number of earnings increases
NOA                Net operating assets
OP                 Operating profitability
PCTACC             Percent operating accruals
PM                 Profit margin
PS                 Performance score
RD_SALE            R&D to sales
RDM                R&D expense-to-market
RE                 Revisions in analysts’ earnings forecasts
RNA                Return on net operating assets
ROA                Return on assets
ROE                Return on equity
RSUP               Revenue surprise
RVAR_CAPM          Residual variance - CAPM (3 months)
RVAR_FF3           Residual variance - Fama-French 3 factors (3 months)
RVAR_MEAN          Return variance (3 months)
SEAS1A             1-year seasonality
SGR                Sales growth
SP                 Sales-to-price
STD_DOLVOL         Std. of dollar trading volume (3 months)
STD_TURN           Std. of share turnover (3 months)
SUE                Unexpected quarterly earnings
TURN               Shares turnover
ZEROTRADE          Number of zero-trading days (3 months)

Table A3: Description of 30 Equity Option Characteristics
Characteristics    Description
IVSLOPE            Implied volatility slope
IVVOL              Volatility of ATM implied volatility
IVRV               Implied and historical volatility spread
IVRV_RATIO         Ratio of implied to historical volatility
ATM_CIVPIV         Implied volatility spread
SKEWIV             Implied volatility skew
IVD                Implied volatility duration
DCIV               Change of implied volatility of ATM call
DPIV               Change of implied volatility of ATM put
ATM-DCIVPIV        Change of implied volatility spread
NOPT               Number of traded options
SO                 Stock-option volume ratio
DSO                Stock-option dollar volume ratio
VOL                Option trading volume
PCRATIO            Put-call ratio
PBA                Proportional bid-ask spread
TOI                Total open interest
MFVU               Option-implied upside semivariance
MFVD               Option-implied downside semivariance
RNS1M              1-month risk-neutral skewness
RNK1M              1-month risk-neutral kurtosis
IVARUD30           Option-implied variance asymmetry
RNS3M              3-month risk-neutral skewness
RNK3M              3-month risk-neutral kurtosis
RNS6M              6-month risk-neutral skewness
RNK6M              6-month risk-neutral kurtosis
RNS9M              9-month risk-neutral skewness
RNK9M              9-month risk-neutral kurtosis
RNS12M             12-month risk-neutral skewness
RNK12M             12-month risk-neutral kurtosis

II Implementation Details

II.1 Data Cleaning

As shown in Table 1, we provide the summary statistics for all our bond-month observations (3200 monthly observations). Since the raw bond return data started in July 2002 and we have a three-year window requiring a minimum of one year of data to initialize risk characteristics, we start with the data from June 2004. After standardizing the raw characteristics, we filter the bond data whose maturity is less than two years and impute the missing values with 0. The equity and option characteristics are merged into the bond data by PERMNO. If the firm has issued multiple stocks simultaneously, we only keep and merge the earliest issued stock.

II.2 Model Training

We divide the TRACE dataset into two parts: we perform all the training on the sample from July 2004 to June 2014 and test on the sample from July 2014 to December 2020. We train the neural network structure and parameters during the in-sample period and determine the mean-variance portfolio’s weights based on the in-sample factors’ statistics. To avoid the effect of outliers during training, we also winsorize the in-sample bond returns each month within the bounds $ret_{low,t}$ and $ret_{up,t}$, the cross-sectional 2.5% and 97.5% quantiles of month t, respectively. Each month, bond returns below $ret_{low,t}$ are set to $ret_{low,t}$, and bond returns above $ret_{up,t}$ are set to $ret_{up,t}$. It is important to emphasize that winsorization is only applied to in-sample data. All out-of-sample results are tested on non-winsorized data with potential extreme values. For the factor values, the bond market factor is the equal-weighted average of the excess returns on the 3200 bonds selected each month.
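The monthly winsorization and the equal-weighted market factor described above can be sketched as follows. This is a minimal illustration, assuming returns are held in a (T, N) NumPy array with NaN marking bonds that are missing in a given month; the function names `winsorize_cross_section` and `equal_weighted_market` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def winsorize_cross_section(returns, lower=0.025, upper=0.975):
    """Clip each month's returns to that month's cross-sectional quantiles.

    returns: (T, N) array of bond returns, NaN where a bond is missing.
    Intended for in-sample data only; out-of-sample returns stay untouched.
    """
    ret_low = np.nanquantile(returns, lower, axis=1, keepdims=True)
    ret_up = np.nanquantile(returns, upper, axis=1, keepdims=True)
    # Broadcasting applies each month's own bounds; NaN entries remain NaN.
    return np.clip(returns, ret_low, ret_up)

def equal_weighted_market(excess_returns):
    """Equal-weighted average excess return across available bonds, per month."""
    return np.nanmean(excess_returns, axis=1)
```

Because the bounds are recomputed for every month, a return that is extreme in a calm month may be clipped while the same value would pass unchanged in a volatile month.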
We replicate the BBW factors using the methodology outlined in Bai et al. (2019) with our corporate bond dataset. For the IPCA/RP-PCA model, we train the PCA structure using the balanced individual bond returns from July 2004 to June 2014 and output the factor values for both in-sample and out-of-sample periods based on this trained structure. Similar to the IPCA model, our neural network takes the individual bond data from July 2004 to June 2014 as the training sample for the network structure and determines the choice of tuning parameters through a two-fold cross-validation process, as shown in Figure A2.

In implementation, we choose a1 = 50 and a2 = 8 in Equation (17) such that at each time, about 50% to 70% of assets are in the middle rank and have zero weights, similar to the traditional sorting procedure (see Figure A1). We fix the network structure as trained, then feed in the pairwise characteristics data and return data to generate the in-sample and out-of-sample estimates. Throughout Section II, all presented results are computed with PyTorch 1.10.2 and are parallelized across a server with 96 Intel(R) Xeon(R) Gold 6230 @ 2.10GHz CPUs and 314 GB of RAM.

Figure A1: Softmax Ranking: a1 = 50, and a2 = 8

II.3 Robustness to Tuning Parameters Selection

Figure A2: Two-Fold Cross Validation
This figure demonstrates the deterministic two-fold cross-validation scheme. We determine the tuning parameters using the sample from July 2004 to June 2014. Specifically, the deterministic design divides the sample into two consecutive fold samples. We train our neural network separately on one and then calculate the fitting result with different tuning parameters on the other. We average the out-of-sample loss and choose the parameter pair with the best performance on this criterion.
              July 2004 to June 2009   July 2009 to June 2014   July 2014 to December 2020
First Fold    Train                    Validation               Holdout
Second Fold   Validation               Train                    Holdout

In this subsection, we outline our procedure for selecting tuning parameters. To determine the optimal tuning parameters for our network, we employ a two-fold cross-validation approach (Figure A2) as follows. We divide the sample period from July 2004 to June 2014 into two consecutive folds of equal length. We train our neural network on one fold and evaluate the fitted results using different tuning parameters on the other fold. We compute the average loss across the validation samples and select the parameter pair resulting in the smallest loss. Finally, we retrain the model using the chosen tuning parameters. This approach ensures that the data used are completely in-sample, eliminating any look-ahead bias that may affect the out-of-sample trading results discussed in the main text. Initially, we start with a reasonable set of tuning parameters and test additional points adjacent to this set. We evaluate a total of 16 combinations of tuning parameters, which are detailed in Table A4.

Table A4: Tuning parameter selection in the empirical analysis
This table presents the network tuning parameters, selected according to the Sharpe ratio observed on our validation data.

Notation     Tuning Parameters                     Candidates                                  Chosen
a1           Value in equation (17)                50                                          50
a2           Value in equation (17)                5, 8                                        8
HDN          Number of nodes in the hidden layer   66                                          66
BTCH         Batch size, in months                 120                                         120
LR           Learning rate                         1e-3, 5*1e-3, 1e-2, 5*1e-2, 1e-1, 5*1e-1    5*1e-2
L1 Penalty   L1 penalty in objective function      1e-9, 1e-8, 1e-7, 1e-6                      1e-8
L2 Penalty   L2 penalty in objective function      1e-9, 1e-8, 1e-7, 1e-6                      1e-8
EPCH         Number of optimization epochs         400                                         400
OPT          Optimization method                   Adam                                        Adam

II.4 Alternative Sample Split

We design a complete out-of-sample test to verify the performance of the tangency portfolio in the chronological sample set.9 Inspired by Kaniel et al. (2022), we sequentially randomly split the dataset into three parts following Fama and French (2018). Thus, at every quarter (3 months), these three-month data are randomly assigned to a specific group of datasets. We keep this random split until the data for every month has been assigned to a group setting. Figure A3 shows how we split the full dataset into three groups, where different colors denote the three folds. We then implement the same process in subsection 2.2 to construct the neural network and apply three-fold validation to the sub-group data: for a specific test fold, the other two folds are used to estimate and validate the parameters. By cross-estimating the parameters over the three folds, we obtain an out-of-sample result for the whole sample period.

9 In this section, the data we use differs slightly from the main text: we have certain corporate bond characteristics that are obtained based on the BBW factors in Bai et al. (2019). In this section, we directly incorporate their factor values into our data, rather than calculating them on our dataset as described in the main text.

Figure A3: Market time series for the different cross-out-of-sample folds
This figure plots the bond market return from July 2014 to December 2020. Different colors denote the three cross-out-of-sample folds we use throughout the robustness check.
[Plot: bond market return series colored by fold; legend: dataset_1, dataset_2, dataset_3; horizontal axis: 2004 to 2020.]

Table A5: Performance on Randomly Split Datasets
We report the out-of-sample performance of our model on the whole time period (2014.07 to 2020.12) based on the sequentially three-fold random splitting dataset. Panel A reports the descriptive statistics in percentage, containing the mean of return, Newey-West standard error (Newey and West, 1994) adjusted t-statistics, standard deviation (Std), Sharpe Ratio (SR), and past twelve months' Maximal Drawdown (Max DD) of the whole-period out-of-sample deep portfolios and tangency portfolios. Panel B compares the Sharpe Ratio of the MVE portfolio between the market factor and adding one deep factor (from 1- to 3-layer models).

Panel A. Descriptive Statistics
          Rd1      Rd2      Rd3      R1opt    R2opt    R3opt
Mean      0.39     0.21     0.36     0.41     0.26     0.36
tstat     (3.34)   (4.56)   (4.32)   (4.66)   (5.17)   (5.18)
Std       1.06     1.08     0.82     1.03     0.65     0.71
SR        1.27     0.66     1.51     1.38     1.41     1.75
Max DD    1.61     11.76    2.18     0.50     6.29     1.40

Panel B. Out of Sample Sharpe Ratios
MKTC      L1        L2        L3
0.80      1.38***   1.41***   1.75***

Panel A of Table A5 lists the descriptive statistics of the out-of-sample deep portfolio on the entire time period. The 3-layer model generates the best deep portfolio with the highest Sharpe Ratio (1.51), followed by the 1-layer model (1.27). The Sharpe Ratio of the tangency portfolios shows significant improvement compared to the bond market. The results remain robust when subjected to cross-out-of-sample analysis with sequentially random sampling, indicating that our sample selection does not influence the main results.

III Machine Learning Implementations

In section 4.5.2, we present a long-short portfolio based on predicted bond returns from four machine learning methods: Lasso, Ridge, principal component analysis regression (PCA), and partial least squares (PLS). Our implementation follows He, Feng, Wang, and Wu (2021b), and we find consistently positive performance.

PCA and PLS are classic dimension reduction techniques commonly used in empirical asset pricing. They address the bias-variance trade-off by using a low-dimensional set of linearly transformed predictors to construct a predictive model.
PCA and PLS consist of a two-step procedure. The first step condenses the predictors into a small set of linear combinations that best preserve the covariance structure. The second step uses the first K components in a multiple regression. For our applications, K is set to 5 for both PCA and PLS.

Lasso and ridge are linear predictive regressions used in machine learning finance to preserve the interpretability of linear models. They add a penalty to ordinary linear regression and keep the predictors untransformed. Lasso and ridge share similar loss functions but have different regularization effects: Lasso performs variable selection, while ridge shrinks the regression coefficients of uninformative predictors toward zero. A tuning parameter controls the penalty weight, with a larger penalty weight imposing more shrinkage on the coefficients. All tuning parameters are determined by three-fold cross-validation.
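To make the two-step PCA regression and the cross-validated ridge penalty concrete, the sketch below implements both with NumPy only. It is an illustration under stated assumptions, not the implementation of He, Feng, Wang, and Wu (2021b): the function names are hypothetical, the folds are taken as contiguous blocks (the text does not say how the three folds are formed), the validation criterion is assumed to be mean squared error, and Lasso and PLS are omitted for brevity.

```python
import numpy as np


def pcr_fit_predict(X_train, y_train, X_test, k=5):
    """PCA regression: project onto the first k components, then run OLS."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:k].T                       # shape (p, k)
    Z = Xc @ loadings                         # step 1: component scores
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    beta, *_ = np.linalg.lstsq(Z1, y_train, rcond=None)  # step 2: OLS
    Z_test = (X_test - mu) @ loadings
    return beta[0] + Z_test @ beta[1:]


def ridge_fit(X, y, lam):
    """Closed-form ridge; centering leaves the intercept unpenalized."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc, yc = X - mu_x, y - mu_y
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return coef, mu_y - mu_x @ coef


def ridge_cv(X, y, lambdas, n_folds=3):
    """Return the penalty with the smallest average validation MSE."""
    all_idx = np.arange(len(y))
    folds = np.array_split(all_idx, n_folds)  # contiguous folds (assumption)
    avg_mse = []
    for lam in lambdas:
        errs = []
        for val_idx in folds:
            train_idx = np.setdiff1d(all_idx, val_idx)
            coef, intercept = ridge_fit(X[train_idx], y[train_idx], lam)
            pred = X[val_idx] @ coef + intercept
            errs.append(np.mean((y[val_idx] - pred) ** 2))
        avg_mse.append(np.mean(errs))
    return lambdas[int(np.argmin(avg_mse))]
```

A larger `lam` shrinks the coefficients more strongly, matching the role of the penalty weight described above. The default `k=5` follows the choice of K stated in the text.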