Deep Tangency Portfolios*
Guanhao Feng
Liang Jiang
Junye Li
Yizhi Song
First version: Nov. 2021; This version: September 27, 2023
Abstract
We propose a parametric approach to directly estimate the tangency portfolio
weights on high-dimensional individual assets by combining fundamental finance
theory with deep learning techniques. The deep tangency portfolio combines the
market factor and a deep long-short factor constructed using a large number of
firm characteristics. We apply our approach to the corporate bond market. The
deep factor acts as a market hedge and achieves a sizable market price of risk with
an out-of-sample annualized Sharpe ratio of 1.79. The deep tangency portfolio
outperforms those constructed from commonly used observable or latent factors
with an out-of-sample annualized Sharpe ratio of 2.29. We also find evidence supporting the integration between the bond and equity markets.
Keywords: Tangency Portfolios, Deep Learning, Factor Models, Portfolio Optimization, Corporate Bonds.
JEL Classification: C45, G11, G12.
* We appreciate insightful comments from Doron Avramov, Tarun Bali, Zhiguo He, Hendrik Bessembinder, Yong Chen, Hong Liu, Robert Macrae, Andreas Neuhierl, Seth Pruitt, Kuntara Pukthuanthong,
Jianeng Xu, Dacheng Xiu, and Mao Ye. We are also grateful for helpful comments from the seminar and conference participants at the University of Missouri, Sun Yat-Sen University, Fudan University, Xi’an Jiaotong-Liverpool University, 2022 AsianFA, 2022 International Conference on Finance
& Technology, 2022 China International Risk Forum, FIRM 2022, and 2023 SoFiE annual conference.
Feng (Email: gavin.feng@cityu.edu.hk) and Song (Email: yizhisong2-c@my.cityu.edu.hk)
are at the City University of Hong Kong, Li (Email: li junye@fudan.edu.cn) and Jiang (Email:
jiangliang@fudan.edu.cn) are at Fudan University.
1 Introduction
A fundamental theory in asset pricing is the equivalence of the mean-variance efficient (MVE) portfolio and the stochastic discount factor (SDF) (Hansen and Jagannathan, 1991). The maximum squared Sharpe ratio of the MVE portfolio equals the
minimum variance of the SDF in the asset space of the economy. Markowitz (1952)
pioneered modern portfolio theory, formulating an elegant solution to the tangency
portfolio using only expected asset returns and covariance ($\Sigma^{-1}\mu$). However, it is notoriously difficult to estimate the MVE portfolio when the number of individual assets
becomes large, leading Cochrane (2014, p.7) to state, “but this formula is essentially useless in practice. The hurdles of estimating large covariance matrices, overcoming the
curse of $\sigma/\sqrt{T}$ in estimating mean returns, and dealing with parameter uncertainty
and drift are not minor matters.”
Modern asset pricing relies on factor models to approximate the SDF using a small
number of characteristic-managed factors (e.g., Fama and French, 1996, 2015), hoping
that those factors can span the efficient frontier. However, the commonly used factors
can hardly achieve the maximum Sharpe ratio of the asset universe of either basis
portfolios or individual assets (e.g., Kozak et al., 2018; Daniel et al., 2020; Lopez-Lira
and Roussanov, 2020). The literature proposes a large number of factors (Harvey, Liu,
and Zhu, 2016; Hou, Xue, and Zhang, 2020) to improve the spanning of the efficient
frontier and explain “anomalies,” leading to the issue of a “factor zoo” (Cochrane,
2011) and the curse of high dimension. A recent study by Kozak and Nagel (2023)
shows that characteristics-managed factors hardly span the efficient frontier unless a
large number of characteristics are used simultaneously.
This paper proposes a deep learning framework for constructing the optimal or
tangency portfolio without relying on estimates of expected returns and covariance
matrices. Unlike the typical portfolio optimization literature, which uses dozens or
hundreds of assets, we focus on constructing the tangency portfolio using thousands
of individual assets. The key to constructing our deep tangency portfolio is to utilize
high-dimensional firm characteristics containing rich information on the joint distribution of asset returns. Cochrane (2011) asserts that expected returns, variances, and
covariances are stable functions of characteristics (also see, e.g., Kelly, Pruitt, and Su,
2019; Kozak, Nagel, and Santosh, 2020). Therefore, we directly parameterize the tangency portfolio weights as a non-linear function of high-dimensional characteristics.
Indeed, using a large number of characteristics and their nonlinear combinations is
crucial. Existing studies on machine learning (ML) have shown that there is no clear
evidence of sparsity of characteristics (see, e.g., Kozak, Nagel, and Santosh, 2020; Giannone, Lenza, and Primiceri, 2021), and nonlinearity is important (see, e.g., Freyberger,
Neuhierl, and Weber, 2020; Gu, Kelly, and Xiu, 2020).¹
When constructing the tangency portfolio, we consider a large panel of individual assets (e.g., thousands of assets) and a small number of benchmark portfolios,
such as the market factor. Using a divide-and-conquer strategy, we estimate the tangency portfolio by combining a deep factor and the benchmark factors. The nonlinear
neural network, which is guided by an economically motivated loss function, provides supervised dimension reduction by transforming high-dimensional characteristics into a deep characteristic for each asset, based on which a deep factor is formed as
a long-short portfolio of individual assets. The endogenous deep factor construction
that relies on a nonlinear ranking scheme mimics the commonly used characteristics-sorted factor approach in empirical asset pricing, and avoids extreme positions in long
and short sides (see, e.g., Avramov, Cheng, and Metzker, 2023). Such a deep learning
framework is flexible enough to incorporate various types of benchmark factors and
multiple deep factors. In addition, our deep parametric portfolio policy can easily
be adapted to other economic objectives, such as the minimum variance portfolio or
utility maximization, with various economic constraints. This makes it a valuable contribution to the field of asset allocation.

¹ See the latest textbook survey of Nagel (2021) and the review by Giglio, Kelly, and Xiu (2022), as
well as references therein.

The economically guided deep factor plays two important roles: (i) under the maximal Sharpe ratio objective of the tangency portfolio, the deep factor has a low or even
negative correlation with benchmark factors, providing a potential market hedge portfolio, and (ii) the deep factor is constructed using information from high-dimensional
characteristics and may span any missing risk factors, other than benchmark factors,
which should enter the pricing kernel. Our deep parametric portfolio policy relies
solely on improving the Sharpe ratio over the benchmark, without utilizing any test
assets. This approach is similar to the factor selection methods of Barillas and Shanken
(2017, 2018) and Fama and French (2018). These features make our deep learning
model more easily interpretable and largely alleviate the “black-box” criticism.
To demonstrate our methodology, we apply it to the corporate bond market, as
studies on the cross-sectional pricing of corporate bonds remain limited compared to
the equity market. The literature has proposed some observable factors to explain
time-series comovement and cross-sectional variations in corporate bond returns. For
example, Fama and French (1993) argue that two factors based on bond term and default and their three equity factors of Fama and French (1992) can capture common
variation in equity and bond returns. Bai, Bali, and Wen (2019) (BBW hereafter) propose an alternative bond factor model based on the downside, credit, and liquidity
risks (see also Dickerson, Mueller, and Robotti, 2023).² However, these models impose strong ad hoc sparsity by only using a few characteristics, which may suffer from
model misspecification and omitted factors. Therefore, observable-factor models may
not compete with latent factor models considering high-dimensional characteristics
(Kelly, Palhares, and Pruitt, 2022; Kelly, Malamud, and Zhou, 2022).

² Other corporate bond factors include liquidity (Lin, Wang, and Wu, 2011), momentum (Jostova,
Nikolova, Philipov, and Stahel, 2013), volatility (Chung, Wang, and Wu, 2019), and long-term reversal
(Bali, Subrahmanyam, and Wen, 2021).

Empirical Highlights. We construct monthly corporate bond returns using transaction data on corporate bond prices from the enhanced Trade Reporting and Compliance Engine (TRACE). To employ as many characteristics as possible, we consider
three types of characteristics. The first type is the bond characteristics. We construct a
set of 41 bond characteristics by combining TRACE and the Mergent Fixed Income Securities Database (FISD) data. Second, since both bond and stock prices are contingent
on firm fundamentals, we collect 61 equity characteristics that are frequently used
in the literature (see, e.g., Freyberger, Neuhierl, and Weber, 2020; Feng, He, Polson,
and Xu, 2023). Lastly, it has been found in the recent literature that equity option-related variables contain information about future corporate bond returns (see, e.g.,
Cao, Goyal, Xiao, and Zhan, 2022; Chung, Wang, and Wu, 2019; Huang, Jiang, and
Li, 2023). Therefore, we construct 30 equity option-related characteristics. Our deep
learning model uses those 132 characteristics and considers the bond market factor
as the only benchmark. The sample period is from July 2004 to December 2020, with
the subsample from July 2004 to June 2014 for model training and validation and the
subsample from July 2014 to December 2020 for out-of-sample testing.
Our empirical findings can be briefly summarized as follows. First, although it
earns a relatively small mean excess return compared to the bond market factor and
other tradable factors, the deep factor only varies slightly over time, resulting in a
higher Sharpe ratio than all other factors considered for both in-sample and out-of-sample periods. Furthermore, the deep factor negatively correlates with the bond market factor, and they rarely decrease simultaneously, providing us with a market-hedge
portfolio. Figure 1 presents time series plots of excess returns on the deep and market
factors over the out-of-sample period (normalized to have the same volatility). The
deep factor remains positive during market downturn periods, in particular during
the outbreak of the COVID-19 pandemic.

Figure 1: Bond Market Factor Versus Deep Factor
This figure presents time series plots of excess returns on the deep factor based on a nonlinear
ranking scheme (bar) and the bond market factor (solid line) over the out-of-sample period
ranging from July 2014 to December 2020. The deep factor returns are normalized to have the
same volatility as the market factor. The shaded area represents the outbreak of the COVID-19
pandemic.

Second, the deep tangency portfolio, constructed using the market factor and the
deep factor, achieves an out-of-sample annualized Sharpe ratio of 2.29, much higher
than that of the market portfolio (0.86), of the tangency portfolio from the BBW four
factors (0.69), and of the tangency portfolio from the Fama-French three equity factors (MKTRF, SMB, and HML) plus two bond factors (term and default factors) (1.27).
Consistent with this, we find that none of these observable factor models can explain excess
returns on the deep factor and the deep tangency portfolio in the factor-spanning regressions. The deep factor has a negative loading on the bond market factor, and the
deep tangency portfolio has minimal exposure to the bond market factor. This further
emphasizes the market-hedging function of the deep factor.
Third, we further show that it is crucial to consider various types of characteristics
when constructing the deep tangency portfolio. When we exclude option-related variables, the out-of-sample Sharpe ratio of the deep tangency portfolio decreases to 1.83,
decreasing further when we only use bond characteristics. To better span the efficient
frontier, it is crucial to introduce as many characteristics as possible. Our finding is in
stark contrast to previous studies that argue that those characteristics that predict equity returns do not necessarily forecast corporate bond returns (see, e.g., Chordia et al.,
2017; Bali et al., 2021). However, it offers further evidence in support of the integration
of the bond and equity markets (e.g., Schaefer and Strebulaev, 2008; Kelly et al., 2022).
Finally, our deep parametric portfolio policy provides an alternative approach to
constructing latent factors. We conduct additional analyses by comparing the performance of our deep tangency portfolio with those constructed using two recently developed latent-factor methods: risk-premium principal component analysis (RP-PCA)
by Lettau and Pelger (2020) and instrumental principal component analysis (IPCA)
by Kelly, Pruitt, and Su (2019) and Kelly et al. (2022). Unlike these PCA-based approaches, our dimension reduction is implemented on firm characteristics rather than
characteristics-managed portfolios, providing an interpretable deep characteristic. Our
deep tangency portfolio outperforms: for the same out-of-sample period, the tangency
portfolio from the five RP-PCA factors earns an annualized Sharpe ratio of only 0.95,
and that from the five IPCA factors achieves an annualized Sharpe ratio of 1.67.
Literature. Our paper contributes to several strands of literature. First, it contributes
to the robust portfolio construction that sidesteps the direct estimation of expected returns and covariance matrix. Brandt (1999) and Ait-Sahalia and Brandt (2001) propose
a nonparametric approach for estimating portfolio weights from the Euler first-order
conditions, thus bypassing the estimation of return covariance and averages. Brandt,
Santa-Clara, and Valkanov (2009) provide a parametric approach by estimating the
portfolio weights as a linear function of characteristics (size, value, and momentum),
but this approach cannot handle a large number of characteristics or assets. Based on
the same approach as Brandt, Santa-Clara, and Valkanov (2009), Brandt and Santa-Clara (2006) examine a market-timing problem involving stocks, bonds, and cash, and
DeMiguel et al. (2020) show the economic rationale of transaction cost using multiple
characteristics. Raponi, Uppal, and Zaffaroni (2021) combine an “alpha” portfolio and
a “beta” portfolio relying on a factor model for the robust portfolio choice. Our paper
presents a parametric approach to estimate portfolio weights directly. The distinctive
design of long-short portfolio weights reflects the nonlinear risk-return relationship of
the deep characteristics generated by the multi-layer deep neural network.
Second, the paper adds to the recent literature on machine learning methods that
construct latent factors to approximate the SDF by considering a large number of characteristics. Kozak, Nagel, and Santosh (2020) assume that the SDF loading is a linear
function of characteristics, and find no clear evidence of sparsity of characteristics
in the SDF loading, and Cong, Feng, He, and Li (2023) propose an alternative local
sparsity framework for heterogeneous factor models selected for different assets and
macroeconomic regimes. In addition, our paper also relates to recent attempts to develop nonlinear deep neural networks for latent factor models (see, e.g., Gu, Kelly, and
Xiu, 2021; Chen, Pelger, and Zhu, 2022; Feng, He, Polson, and Xu, 2023). In contrast,
based on a fundamental economic theory of the equivalence between the SDF and
MVE portfolio, our paper develops a flexible and interpretable methodology to create
a tangency portfolio without estimating expected returns and covariance. Avramov,
Cheng, and Metzker (2023) also emphasize the importance of economic restrictions
when applying machine learning methods to long-short portfolio constructions.
Finally, our paper contributes to the literature that investigates the cross-sectional
predictability of corporate bond returns based on characteristics.³ Yet, most of those
papers impose a strong ad hoc sparsity in modeling. Bali et al. (2021) and He et al.
(2021b) are two recent works investigating corporate bond return predictability via
machine learning methods. However, our method bypasses the estimation of expected
returns and covariance by employing a deep nonlinear combination of characteristics to form the tangency portfolio. Related to our paper, Kelly, Palhares, and Pruitt
(2022) apply IPCA to the cross-sectional pricing of corporate bonds, showing that a
five-factor model outperforms commonly used observable factor models on the ICE
corporate bond dataset. Besides different objectives (portfolio optimization vs. cross-sectional pricing), our method is more flexible and allows for modeling nonlinearity
and interactions.

³ See, for example, Bai, Bali, and Wen (2019), Lin, Wang, and Wu (2011), Jostova et al. (2013), Chung,
Wang, and Wu (2019), Huang, Jiang, and Li (2023), and He et al. (2021a).

The remainder of the paper is organized as follows. Section 2 presents our model
and the deep learning algorithms. Section 3 presents corporate bond returns and characteristics data. Section 4 provides empirical findings. Section 5 concludes the paper.
2 Methodology
2.1 Maximal Sharpe Ratio Portfolio
There exists a duality between the SDF variance and Sharpe ratios. We start with
the minimum-variance SDF in the economy that spans N individual asset excess returns, $r_t = [r_{1,t}, \ldots, r_{N,t}]'$, as constructed by Hansen and Jagannathan (1991),
$$m_{t+1} = 1 - w_t'(r_{t+1} - \mu_t), \tag{1}$$
where $\mu_t = E_t[r_{t+1}]$ represents the conditional expectation of asset excess returns. By
plugging the linear SDF in (1) into the fundamental pricing relation, $E_t[m_{t+1} r_{t+1}] = 0$,
the solution to the SDF loading $w_t$ takes the form of
$$w_t = \Sigma_t^{-1}\mu_t, \tag{2}$$
where $\Sigma_t$ is the conditional variance-covariance matrix of excess returns, $\Sigma_t = \mathrm{Cov}_t(r_{t+1})$.
The conditional variance of the SDF is then given by
$$\mathrm{Var}_t(m_{t+1}) = \mu_t'\Sigma_t^{-1}\mu_t, \tag{3}$$
which equals the maximum conditional squared Sharpe ratio of the tangency portfolio,
$$R^{\mathrm{opt}}_{t+1} = w_t' r_{t+1}, \tag{4}$$
whose weights are the same as the SDF loadings in (2).
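As a numerical illustration of Equations (2) and (3), the following minimal sketch checks that the weights $\Sigma^{-1}\mu$ attain the squared Sharpe ratio $\mu'\Sigma^{-1}\mu$. The three-asset moments and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def tangency_weights(mu, sigma):
    """Solve Sigma w = mu, i.e. w = Sigma^{-1} mu as in Equation (2)."""
    return np.linalg.solve(sigma, mu)

def squared_sharpe(w, mu, sigma):
    """Squared Sharpe ratio (w'mu)^2 / (w'Sigma w) of a portfolio w."""
    return float((w @ mu) ** 2 / (w @ sigma @ w))

# Hypothetical moments of three excess returns (made-up numbers).
mu = np.array([0.02, 0.03, 0.01])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

w_opt = tangency_weights(mu, sigma)
# Right-hand side of Equation (3): mu' Sigma^{-1} mu.
max_sr2 = float(mu @ np.linalg.solve(sigma, mu))

print(squared_sharpe(w_opt, mu, sigma), max_sr2)  # the two values coincide
```

With only three assets the linear solve is trivial; the difficulty discussed in the text arises when N is in the thousands and both inputs must be estimated.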
In practice, it is challenging to estimate expected returns and the covariance matrix.
The number of individual assets, N, is usually very large, making it difficult to estimate the large covariance matrix ($\Sigma_t$). Moreover, as a general observation, mean
estimates ($\mu_t$) are often imprecise even with long samples and a high frequency of
excess returns (see, e.g., Merton, 1980; Cochrane, 2014). Both issues yield a very inaccurate estimate of $w_t$ in (2), resulting in the poor out-of-sample performance of optimal
portfolios (see, e.g., DeMiguel, Garlappi, and Uppal, 2009).
A common approach in the finance literature is to adopt factor pricing models to
reduce the dimensionality of the SDF by approximating it with a small number of
factors (e.g., Fama and French, 1996, 2015). Assume that the SDF loading $w_t$ in (1) can
be largely captured in a linear form by J characteristics, $z_t$, an $N \times J$ matrix for $J \ll N$,
such that
$$w_t = \tilde{w}_t + z_t\kappa, \tag{5}$$
where, following the convention of the finance literature, $\tilde{w}_t$ denotes the normalized market-capitalization weights of firms, $z_t$ is usually cross-sectionally standardized to have zero
mean, and $\kappa$ is a $J \times 1$ vector of coefficients. Define $R_{m,t+1} = \tilde{w}_t' r_{t+1}$, representing the
market portfolio, and $f_{t+1} = z_t' r_{t+1}$, representing J factors, which are zero-investment
characteristic-managed long-short portfolios. The formula (5) suggests that the SDF
takes the form of
$$m_{t+1} = 1 - \delta'(F_{t+1} - \mu_{F,t}), \tag{6}$$
where $\delta = [1, \kappa']'$ and $F_t = [R_{m,t}, f_t']'$, and the maximum squared Sharpe ratio in (3) is equal to the
maximum squared Sharpe ratio of the factors, $\mu_{F,t}'\Sigma_{F,t}^{-1}\mu_{F,t}$.
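The linear parameterization in (5) can be sketched numerically as follows. The sketch builds cross-sectionally standardized characteristics, forms the market return and the J characteristic-managed factor returns, and checks that the portfolio return implied by (5) equals $R_{m,t+1} + \kappa' f_{t+1}$. All inputs are simulated, and the sizes and the value of `kappa` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 500, 3                              # hypothetical cross-section size and number of characteristics
raw = rng.normal(size=(N, J))              # simulated raw characteristics at time t
r_next = rng.normal(0.005, 0.02, size=N)   # simulated excess returns at time t+1
cap = rng.lognormal(size=N)                # simulated market capitalizations

# Cross-sectional standardization: each column of z has zero mean and unit variance.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
w_mkt = cap / cap.sum()                    # market-capitalization weights

R_m = w_mkt @ r_next                       # market portfolio return
f = z.T @ r_next                           # J characteristic-managed factor returns

kappa = np.array([0.1, -0.05, 0.02])       # hypothetical coefficient vector
w = w_mkt + z @ kappa                      # Equation (5)

# The portfolio return decomposes into the market plus the managed factors.
print(w @ r_next, R_m + kappa @ f)
```

Because each column of `z` sums to zero across assets, the term `z @ kappa` is a zero-investment overlay and the total weights still sum to one.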
Such a dimension reduction aims to use a small number of factors ($F_t$) to
approximate the SDF (see (1)) and span the MVE portfolio (see (3)). Building on
the intertemporal capital asset pricing model (ICAPM) of Merton (1973), Fama and
French (1996, 2015) also interpret the factors in $f_t$ as “[they] are just diversified
portfolios that provide different combinations of exposure to the unknown state variables”. However, the literature has found no clear-cut evidence
of sparsity of characteristics (e.g., Kozak, Nagel, and Santosh, 2020; Giannone, Lenza,
and Primiceri, 2021), and many characteristics and their nonlinear combinations contain information on the joint distribution of asset returns for characterizing the cross-sectional variation (e.g., Freyberger, Neuhierl, and Weber, 2020; Gu, Kelly, and Xiu,
2020). Kozak and Nagel (2023) formally show that characteristics-managed factors
span the SDF only if a large number of characteristics are used simultaneously.
Therefore, in this paper, we sidestep both the direct estimation of $\mu_t$ and $\Sigma_t$ and a simple dimension reduction based on a few characteristics. Instead, we approximate the tangency
portfolio weights by parameterizing $w_t$ as a nonlinear function of a large number of K
firm characteristics, $z_t$, an $N \times K$ matrix with $K \gg J$. We formulate $w_t$ as
$$w_{i,t} = \tilde{w}_{i,t} + \theta\, w_d(z_{i,t}; \Phi), \quad i = 1, \ldots, N, \tag{7}$$
where, as before, $\tilde{w}_{i,t}$ is the weight of asset i in the market portfolio, $w_d(\cdot)$ is a function
of $z_{i,t}$ that can account for any potential nonlinear relations among a large number of
characteristics of asset i, $\Phi$ collects the parameters of this function, and $\theta$ is a scalar controlling the
relative weight in the tangency portfolio. We estimate the portfolio weights as a single
function of characteristics that applies to all assets (see, e.g., Brandt, Santa-Clara, and
Valkanov, 2009).
As explained in the next subsection, the function $w_d(\cdot)$ produces weights
with an economically motivated target for forming a zero-cost long-short portfolio.
The tangency portfolio return in (4) can then be represented by
$$R^{\mathrm{opt}}_{t+1} = \sum_{i=1}^{N} \tilde{w}_{i,t}\, r_{i,t+1} + \theta \sum_{i=1}^{N} w_d(z_{i,t}; \Phi)\, r_{i,t+1} = R_{m,t+1} + \theta R_{d,t+1}, \tag{8}$$
where $R_{m,t+1}$, as before, is the market portfolio return, and given that $w_d(\cdot)$ cross-sectionally sums to zero, $R_{d,t+1}$ is, in fact, the return on a long-short portfolio constructed based on nonlinear combinations of characteristics. The parameterization
of (7) suggests a two-factor reduced-form SDF with factors $R_{m,t}$ and $R_{d,t}$. When
the function $w_d(\cdot)$ takes a linear form, and the number of characteristics is small, our
parameterization becomes the standard approach as in (5).
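To see how the scalar $\theta$ in (8) relates to a two-factor tangency problem, the sketch below takes simulated market and long-short return series as given, scales the two-factor tangency weights so that the market weight equals one, and reads off $\theta$. This is only an illustration under sample moments with made-up data; in the paper $\theta$ is estimated jointly with $\Phi$, and the helper names here are hypothetical.

```python
import numpy as np

def theta_from_moments(mu, cov):
    """Scale the tangency weights Sigma^{-1} mu so the market weight is one.

    mu and cov are the mean vector and covariance matrix of [R_m, R_d].
    """
    w = np.linalg.solve(cov, mu)
    return w[1] / w[0]

def sharpe(x):
    """Sample Sharpe ratio of a series of excess returns."""
    return x.mean() / x.std(ddof=1)

rng = np.random.default_rng(1)
T = 240
R_m = rng.normal(0.004, 0.02, size=T)                      # simulated market factor
R_d = 0.003 - 0.2 * R_m + rng.normal(0.0, 0.01, size=T)    # simulated factor that hedges the market

F = np.column_stack([R_m, R_d])
theta = theta_from_moments(F.mean(axis=0), np.cov(F, rowvar=False))
R_opt = R_m + theta * R_d                                  # Equation (8)

print(sharpe(R_m), sharpe(R_opt))
```

In sample, no other value of $\theta$ yields a higher squared Sharpe ratio for $R_m + \theta R_d$ than the one computed this way.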
When we have a priori knowledge that a particular set of observable factors helps
span the efficient portfolio frontier, we can introduce these factors by inserting them
into (7) and construct the portfolio weights, $w_{i,t}$, as follows,
$$w_{i,t} = \tilde{w}_{i,t} + \tilde{w}^{p}_{i,t}\theta_p + \theta_d\, w_d(z_{i,t}; \Phi), \quad i = 1, \ldots, N, \tag{9}$$
where $\tilde{w}^{p}_{t}$ is an $N \times P$ matrix of weights on individual assets for constructing the P
observable factors, with i-th row $\tilde{w}^{p}_{i,t}$, and $\theta_p$ is a $P \times 1$ vector of coefficients. The tangency portfolio
return is then given by
$$R^{\mathrm{opt}}_{t+1} = R_{m,t+1} + \theta_p' R_{p,t+1} + \theta_d R_{d,t+1}, \tag{10}$$
where $R_{p,t+1}$ is a $P \times 1$ vector of returns on P observable factors at time t + 1. Now
denote $\theta = [\theta_p', \theta_d]'$.
The main objective of our model is to find the minimum-variance SDF or a tangency portfolio that delivers the maximum Sharpe ratio. For this purpose, we search
for possible functional forms of $w_d(\cdot)$ and estimate the model parameters $\theta$ and $\Phi$ by
maximizing the average conditional squared Sharpe ratio of the portfolio $R^{\mathrm{opt}}_{t+1}$,
$$\max_{\theta,\Phi}\ \frac{1}{T}\sum_{t=0}^{T-1} SR_t^2\left(R^{\mathrm{opt}}_{t+1}\right). \tag{11}$$
The long-short portfolio, $R_{d,t+1}$, plays two fundamental roles: (i) According to the principle of diversification, (11) suggests it should have a low or even negative correlation
with the market factor (and other benchmark factors), providing us with a potential
market hedge portfolio; and (ii) when the market (and other benchmark factors) alone
cannot capture all systematic risk, the deep factor spans to a large extent any missing
risk factors that should enter the pricing kernel, implying that it may have a sizable
market price of risk.
Our approach can also be interpreted as a dimension reduction of characteristics
and risk factors. Empirically, many studies have documented the failure of the CAPM. In addition to the market factor, more factors need to be introduced to the pricing kernel
to explain the time-series comovement of asset returns and expected return spreads
across individual assets. The most popular factors are characteristic-managed portfolios, such as the Fama-French factors (Fama and French, 1996, 2015). Our framework
aims to find such characteristic-managed portfolios based on a fundamental economic
theory: the MVE portfolio is equivalent to the SDF. The proposed nonlinear modeling
approximates the long-short factor construction using a large number of characteristics and reflects the underlying risk-return relationship. The dimension reduction in
constructing characteristic-managed portfolios relies on the Sharpe ratio improvement
over the market or other benchmark factors without using test assets. Such irrelevance
of test assets in factor model comparison has been discussed by Barillas and Shanken
(2017, 2018) and Barillas et al. (2020).
In what follows, we propose a deep learning method for constructing the portfolio
weights of $w_d(\cdot)$ in Equations (7) and (9) and a deep long-short portfolio $R_{d,t}$. While
many popular characteristic-managed factors have sidestepped the high-dimensional
problem by focusing on only a few characteristics, we try to consider as many potential
characteristics and their nonlinear combinations as possible.
2.2 Deep Factor and Deep Tangency Portfolio
Our long-short portfolio construction, $R_{d,t}$, relies on a deep learning model with
an economically motivated target, aiming to construct the tangency portfolio by complementing the benchmark factors. Rather than specifically relying on average returns
and covariance matrix of high-dimensional individual assets, we retain the conventional sorting scheme in our deep learning model based on information of a large
number of characteristics.
Deep Characteristic. We follow the standard modeling approach of a neural network
for dimension reduction of a large number of characteristics (see, e.g., Gu et al., 2020;
Feng et al., 2023; Bali et al., 2021). We first clarify notations. A typical training observation indexed by time t includes the following types of data:
• $\{r_{i,t}\}_{i=1}^{N}$, excess returns of N individual assets;
• $\{z_{k,i,t-1} : 1 \le k \le K\}_{i=1}^{N}$, K characteristics of N assets observed at time t − 1;
• $\{R_{b,t}\}_{b=1}^{P+1}$, a (P + 1) × 1 vector of excess returns on the market factor and P observable factors.
We design an L-layer neural network that transforms K characteristics to one deep
characteristic that is relatively interpretable. At each time t and for each asset i, i =
1, . . . , N, our deep learning model works as follows,
$$Z^{(0)}_{i,t-1} = [z_{1,i,t-1}, \cdots, z_{K,i,t-1}]', \tag{12}$$
$$Z^{(l)}_{i,t-1} = G\left(A^{(l)} Z^{(l-1)}_{i,t-1} + b^{(l)}\right), \tag{13}$$
for $l = 1, \ldots, L$, where $Z^{(l)}_{i,t-1}$ is the i-th column of the $K_l \times N$ matrix $Z^{(l)}_{t-1}$, for $1 \le K_l \le K$, and $G(\cdot)$ is a univariate activation function, which is chosen to be the tanh
function in the paper, $G(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$. $A^{(l)}$ and $b^{(l)}$ are deep learning
weight and bias parameters, respectively, and need to be trained in the algorithm.
The algorithm performs the transformation and dimension reduction for each asset
without interactions among different assets through the univariate activation function.
In the end, we have a $1 \times N$ matrix of deep characteristics, $Z^{(L)}_{t-1}$. The parameters to be
trained in this part are deep learning weights A and biases b, namely,
$$\left\{A^{(l)}, b^{(l)} : A^{(l)} \in \mathbb{R}^{K_l \times K_{l-1}},\ b^{(l)} \in \mathbb{R}^{K_l}\right\}_{l=1}^{L}. \tag{14}$$
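The recursion in (12) and (13) can be sketched as a plain forward pass. The layer widths, the random initialization, and the function name below are hypothetical; only the tanh activation, the $K_l \times K_{l-1}$ weight shapes, and the final width of one follow the text.

```python
import numpy as np

def deep_characteristic(Z0, weights, biases):
    """Forward pass of Equations (12)-(13).

    Z0 is a K x N matrix of characteristics (one column per asset).
    weights[l] has shape (K_l, K_{l-1}) and biases[l] has length K_l.
    Returns the 1 x N matrix of deep characteristics when the last width is 1.
    """
    Z = Z0
    for A, b in zip(weights, biases):
        # The same affine map and tanh are applied to every asset column,
        # so there is no interaction across assets.
        Z = np.tanh(A @ Z + b[:, None])
    return Z

rng = np.random.default_rng(0)
K, N = 132, 1000                  # 132 characteristics as in the text; N is made up
layer_sizes = [K, 32, 8, 1]       # hypothetical widths with K_l <= K
weights = [rng.normal(scale=0.1, size=(k_out, k_in))
           for k_in, k_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(k_out) for k_out in layer_sizes[1:]]

Z0 = rng.normal(size=(K, N))      # simulated standardized characteristics
ZL = deep_characteristic(Z0, weights, biases)
print(ZL.shape)                   # (1, 1000)
```

Because tanh is the last operation, every deep characteristic lies between −1 and 1, which is convenient for the ranking functions that follow.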
Deep Factors. The deep characteristics, $Z^{(L)}_{t-1}$, are then used to form weights of a deep
portfolio (factor) as follows,
$$w_d(z_{t-1}) \equiv W_{t-1} = h\left(Z^{(L)}_{t-1}\right), \tag{15}$$
where the function, $h(\cdot)$, needs to be differentiable.
Following the literature, our first choice of the function $h(\cdot)$ is simply a linear function, resulting in a deep characteristic-managed portfolio, i.e.,
$$W_{t-1} = \frac{a}{N} Z^{(L)}_{t-1}, \tag{16}$$
where a is a scaling parameter.
In addition, to mimic the commonly used portfolio sort approach (i.e., a non-differentiable step function), following Feng et al. (2023), we adopt the softmax function and
calculate the portfolio weights as follows. For $x = Z^{(L)}_{t-1}$, the function $h(\cdot)$ takes the
form of
$$h(x) = \begin{bmatrix} \mathrm{softmax}(x_1^{+}) \\ \mathrm{softmax}(x_2^{+}) \\ \vdots \\ \mathrm{softmax}(x_N^{+}) \end{bmatrix} - \begin{bmatrix} \mathrm{softmax}(x_1^{-}) \\ \mathrm{softmax}(x_2^{-}) \\ \vdots \\ \mathrm{softmax}(x_N^{-}) \end{bmatrix}, \tag{17}$$
where $x^{+} := -a_1 e^{-a_2 x}$ and $x^{-} := -a_1 e^{a_2 x}$, and $a_1$ and $a_2$ are two tuning parameters.
The nonlinear softmax function is an increasing function,
$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \tag{18}$$
and $\sum_{i=1}^{N} \mathrm{softmax}(x_i) = 1$. On the right-hand side of (17), the first term represents
the long position weights of assets, and the second term is for the symmetric short
position. In implementation, we choose $a_1 = 50$ and $a_2 = 8$ such that at each time,
about 50% to 70% of assets are in the middle rank and have zero weights, similar to the
traditional sorting procedure (see Figure A1 in the Internet Appendix). Furthermore,
we normalize the portfolio weights such that the sum of weights in the long leg equals
1 and that in the short leg equals -1. As discussed in Feng et al. (2023), such a nonlinear ranking scheme depends on both the cross-sectional rank information and the
distributional properties of characteristics. Our construction of the deep factor avoids
extreme positions in both long and short legs (Avramov, Cheng, and Metzker, 2023).
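The softmax ranking in (17) and (18), followed by the leg normalization described above, can be sketched as follows. The values $a_1 = 50$ and $a_2 = 8$ are those stated in the text; the toy deep characteristics, the function names, and the simulated returns are hypothetical, and splitting the legs by the sign of the raw weight is one plausible reading of the normalization step.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array, Equation (18)."""
    e = np.exp(v - v.max())
    return e / e.sum()

def softmax_ranking_weights(x, a1=50.0, a2=8.0):
    """Long-short weights from deep characteristics x, Equation (17)."""
    x_plus = -a1 * np.exp(-a2 * x)     # close to zero only for large x (long side)
    x_minus = -a1 * np.exp(a2 * x)     # close to zero only for small x (short side)
    w = softmax(x_plus) - softmax(x_minus)
    # Normalize so the long leg sums to 1 and the short leg sums to -1.
    long_leg = np.where(w > 0, w, 0.0)
    short_leg = np.where(w < 0, w, 0.0)
    return long_leg / long_leg.sum() + short_leg / (-short_leg.sum())

x = np.linspace(-0.9, 0.9, 11)         # toy deep characteristics in (-1, 1)
W = softmax_ranking_weights(x)

rng = np.random.default_rng(0)
r = rng.normal(0.005, 0.02, size=x.size)   # simulated excess returns
R_d = W @ r                                # deep factor return for one period
print(np.round(W, 3))
```

In this toy cross-section the weights are concentrated in the few assets with the most extreme deep characteristics, while assets near the middle receive weights that are numerically indistinguishable from zero.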
In what follows, we refer to Equation (16) as linear ranking and to Equation (17)
as softmax ranking. The deep factor portfolio weights, $W_{t-1}$, in Equation (16) or (17),
sum to zero by construction. Our deep factor, $R_{d,t}$, can then be computed as
$$R_{d,t} = W_{t-1} r_t. \tag{19}$$
Loss Function. The deep factor in Equation (19) can be combined with the market or
other benchmark factors to form the deep tangency portfolio as in (8). Note that more
than one deep factor can be constructed iteratively by treating the previous one as a
new benchmark factor in our algorithm. As a result, the additional deep factor may
capture pricing information not contained in the previous one.
Given that all parameters in our model are time-invariant and that we implicitly
assume that characteristics fully capture all aspects of expected returns and covariance
relevant to optimal portfolios, the conditional model becomes an unconditional one,
and the objective function in (11) can be replaced by the unconditional squared Sharpe
ratio of the optimal portfolio $R^{\mathrm{opt}}_{t}$ on $\tilde{F}_t = [R_{b,t}', R_{d,t}]'$,
$$SR^2\left(R^{\mathrm{opt}}_{t}\right) \equiv SR^2(\tilde{F}_t) = E(\tilde{F}_t)'\, \mathrm{Cov}(\tilde{F}_t)^{-1} E(\tilde{F}_t). \tag{20}$$
There are usually a large number of parameters for modeling a multi-layer neural
network. To avoid overfitting and improve the model’s out-of-sample performance,
we augment the objective function by introducing the regularization penalties and
minimizing the following loss function,
$$\mathcal{L}_{\gamma_1,\gamma_2} = \exp\left(-SR^2\left(R^{\mathrm{opt}}_{t}\right)\right) + \underbrace{\gamma_1 \sum_{l=1}^{L-1} \left\|A^{(l)}\right\|_1 + \gamma_2 \sum_{l=1}^{L-1} \left\|A^{(l)}\right\|_2}_{\text{penalties}}, \tag{21}$$
where the $L_1$-norm and $L_2$-norm penalties aim to restrict the complexity of the neural network, stabilize parameters, and thus avoid overfitting. The tuning parameters,
$\gamma_1$ and $\gamma_2$, need to be tuned through training and validation. Figure 2 presents a visualization of our deep learning architecture and summarizes the critical stages for
constructing the deep factor and the deep tangency portfolio.
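A minimal sketch of how the objective in (20) and the penalized loss in (21) could be evaluated with sample moments is given below. Several choices are assumptions rather than details confirmed by the text: the entrywise $L_1$ norm and the Frobenius norm are used for the two penalties, the penalties cover all weight matrices except the last one (one reading of the sums running to $L-1$), and the data, tuning values, and function names are made up.

```python
import numpy as np

def squared_sharpe(F):
    """Sample version of Equation (20) for a T x M matrix of factor returns."""
    mu = F.mean(axis=0)
    cov = np.atleast_2d(np.cov(F, rowvar=False))
    return float(mu @ np.linalg.solve(cov, mu))

def loss(R_b, R_d, weights, gamma1, gamma2):
    """Penalized loss in the spirit of Equation (21).

    Assumptions: entrywise L1 norm and Frobenius norm, applied to every
    weight matrix except the last one.
    """
    F = np.column_stack([R_b, R_d])
    penalized = weights[:-1]
    l1 = sum(np.abs(A).sum() for A in penalized)
    l2 = sum(np.linalg.norm(A) for A in penalized)
    return float(np.exp(-squared_sharpe(F)) + gamma1 * l1 + gamma2 * l2)

rng = np.random.default_rng(2)
T = 120
R_b = rng.normal(0.004, 0.02, size=T)                      # simulated benchmark (market) factor
R_d = 0.003 - 0.2 * R_b + rng.normal(0.0, 0.01, size=T)    # simulated deep factor
weights = [rng.normal(scale=0.1, size=s) for s in [(32, 132), (8, 32), (1, 8)]]

print(loss(R_b, R_d, weights, gamma1=1e-4, gamma2=1e-4))
```

In an actual training loop the deep factor returns would be recomputed from the network at every step and the loss differentiated with an automatic differentiation library; the sketch only evaluates the loss for fixed inputs.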
3 Data
To illustrate the performance of our methodology, we apply it to the corporate bond
market, given that relative to the equity market, studies on the cross-sectional pricing
of corporate bonds remain limited. We first construct the corporate bond returns based
on the TRACE data in Subsection 3.1; we then introduce various types of characteristics that will be fed into our deep learning model in Subsection 3.2 and present the
benchmark factor and competing factor models in Subsection 3.3.

Figure 2: Deep Learning Network Architecture
This figure provides a visualization of the deep learning architecture. Different types of characteristics, $Z^{(0)}$ (e.g., equity, bond, and option characteristics), are transformed via the multilayer neural network to deep characteristics, $Z^{(L)}$, based on which the deep portfolio (factor)
weights, $W$, are formed. An optimal portfolio, $R^{\mathrm{opt}}$, is constructed by combining the deep
factor, $R_d$, and the benchmark factors, $R_b$.

3.1 Corporate Bond Returns and Summary Statistics
We obtain corporate bond intraday transaction data from the enhanced version of
TRACE, which offers the best-quality data on corporate bond prices, trading volume,
and buy-sell indicators. Using TRACE transaction data to measure abnormal corporate bond performance is emphasized in Bessembinder et al. (2009). We merge the
TRACE dataset with the FISD to obtain bond characteristics such as offering date, offering amount, maturity date, coupon type and rate, bond type and rating, interest
payment frequency, and issuer information.
Following the standard procedures in Dick-Nielsen (2009, 2014), we exclude duplicated, withdrawn, and erroneous trade entries in the TRACE data. Additionally, we follow Bai, Bali, and Wen (2019) and apply several filters to the data, removing: (i) bonds that are not listed or traded in the U.S. public market; (ii) bonds that are structured notes, mortgage-backed, asset-backed, agency-backed, or equity-linked; (iii) convertible bonds, whose option feature distorts the return calculation and makes their returns incomparable to those of nonconvertible bonds; (iv) bonds with time to maturity of fewer than two years; and (v) bonds that trade under $5 or above $1,000. We then calculate the daily bond price as the trading-volume-weighted average of intraday prices, as in Bessembinder et al. (2009). In line with the literature, for each corporate bond i, its return in month t is calculated as follows:
$$\tilde r_{i,t} = \frac{Pr_{i,t} + AI_{i,t} + C_{i,t}}{Pr_{i,t-1} + AI_{i,t-1}} - 1, \tag{22}$$
where $Pr_{i,t}$ is its transaction price in month t, $AI_{i,t}$ is its accrued interest, and $C_{i,t}$ is its coupon payment in month t. As in Bai, Bali, and Wen (2019), we identify two scenarios
to calculate a realized return at the end of the month t: (i) from the end of the month
t − 1 to the end of the month t and (ii) from the beginning of month t to the end of the
month t. The end (beginning) of the month refers to the last (first) five trading days
in that month, and if there is more than one trading record in this five-day window,
we use the last (first) observation of the month. If a return at the end of a month is
realized in both scenarios, we use the realized return from the end of the month t − 1
to the end of the month t. The excess bond return is then defined as the difference
between the bond return and the risk-free rate, ri,t = r̃i,t − rf,t , where the risk-free rate,
rf,t , is proxied by the one-month Treasury bill rate obtained from CRSP. Furthermore,
as in Feng et al. (2023), we make a balanced panel by keeping only the 3,200 bonds with the largest size each month (footnote 4). The final sample of corporate bond returns spans from July 2004 to December 2020.
4 To avoid the volatility and liquidity effects of small market-value bonds, we select the largest 3,200 among all available bonds each month.
Table 1 presents the summary statistics of excess corporate bond returns and typical bond characteristics. Our sample includes 16,188 corporate bonds and 633,600 bond-month return observations. As shown in Panel A, the mean monthly excess bond return is about 0.49% with a standard deviation of 4.32%. The sample contains bonds with an average size of about 809 million and an average rating of 8.78, which corresponds to a BBB+ rating (footnote 5). Panel A also reports the cross-sectional statistics of investment-grade (IG) bonds, which account for about 74.8% of all observations, and non-investment-grade (NIG) bonds. Compared to the NIG bonds, the IG bonds have a smaller average return (0.43% vs. 0.69%), a lower standard deviation (2.75% vs. 7.19%), and a higher rating level (7.04 vs. 13.95). The last two columns report summary statistics of the public and private bonds. The public bonds account for about 76.7% of all the bond-month observations; compared to private bonds, their returns are smaller on average and their ratings are higher on average. Both IG and public bonds have much larger average sizes than their counterparts. Panels B and C report the sample distributions by Rating & Maturity and Ownership & Rating, respectively. A general observation is that most bonds with high ratings are long-maturity bonds.
5 Ratings are represented in numerical scores, where 1 refers to an AAA rating, 2 refers to an AA+ rating, . . . , and 21 refers to a C rating. Investment-grade bonds have ratings from 1 (AAA) to 10 (BBB-), and non-investment-grade bonds have ratings of 11 or above. Similar to Bai, Bali, and Wen (2019), we use the ratings of Standard & Poor's (S&P) or Moody's to determine a bond's rating. When both rating companies rate a bond, we use the average of their ratings.
3.2 Characteristics
We consider three types of characteristics that contain useful information for corporate bond return predictability. The first type of characteristics includes 41 bond characteristics that can be classified into three major categories: basic characteristics (e.g., rating, duration, liquidity), return-distribution characteristics (e.g., momentum, reversal, variance, skewness), and covariances with common risk factors (e.g., market
beta, TERM beta, DEF beta).

Table 1: Summary Statistics
Our final data sample includes 633,600 monthly return observations of 16,188 unique corporate bonds from July 2004 to December 2020. We report the summary statistics of the whole sample (ALL) and several subsamples constructed based on Rating (Investment Grade (IG) & Non-Investment Grade (NIG)), Ownership (Public & Private), and/or Maturity.

Panel A: Cross-sectional statistics
                          ALL       IG        NIG       Public    Private
Bond-month observations   633,600   474,105   159,495   486,143   147,457
Ret mean (%)              0.49      0.43      0.69      0.45      0.64
Ret std (%)               4.32      2.75      7.19      3.43      6.44
Rating mean               8.78      7.04      13.95     8.32      10.3
Duration mean             3.97      4.26      3.09      4.07      3.63
Age mean                  4.25      4.33      4.00      4.22      4.33
Size mean (million)       809       865       644       836       719

Panel B: Sample Distribution (%) by Maturity and Rating
Maturity   AAA    AA     A       B       Junk    ALL
2          0.15   0.71   2.61    2.65    1.44    7.57
3          0.19   0.77   3.00    3.28    2.01    9.25
4          0.18   0.77   3.02    3.54    2.54    10.06
5          0.15   0.74   3.02    3.73    3.11    10.75
6          0.10   0.40   1.92    2.84    3.44    8.69
7          0.09   0.38   1.89    2.88    3.42    8.66
8          0.08   0.34   1.75    2.77    2.61    7.55
9          0.07   0.35   1.75    2.81    1.92    6.90
10         0.07   0.35   1.72    2.74    1.34    6.22
≥11        0.58   1.53   8.59    10.33   3.33    24.35
ALL        1.66   6.33   29.27   37.56   25.17   100.00

Panel C: Sample Distribution (%) by Ownership and Rating
Ownership   AAA    AA     A       B       Junk    ALL
Private     0.11   1.21   4.55    8.43    8.97    23.27
Public      1.55   5.12   24.72   29.13   16.21   76.73
ALL         1.66   6.33   29.27   37.56   25.17   100.00
Furthermore, given that both bond and stock prices are contingent on firm fundamentals, we also consider those equity characteristics shown helpful in predicting
equity returns. Recent studies have shown that bond and equity markets are largely
integrated. Choi and Kim (2018) argue that market integration suggests different markets should share common factors. Schaefer and Strebulaev (2008) show that bond and
equity returns are related through the capital structure hedge ratio. By approximating
the hedge ratio with a Merton model for debt, they find that the sensitivity of debt
returns to equity is close to that predicted by the Merton model. Building on Schaefer
and Strebulaev (2008) and Choi and Kim (2018), Kelly, Palhares, and Pruitt (2022) find
that debt and equity markets are more integrated than previous estimates suggest, and
that these markets are substantially more integrated in terms of their systematic risks
than their idiosyncratic risks. Therefore, the second type includes a total of 61 equity
characteristics that cover six major categories: momentum, value, investment, profitability, frictions or size, and intangibles, which are also used in Freyberger, Neuhierl,
and Weber (2020) and Feng et al. (2023).
In addition, the recent literature has found that several option-related variables
have predictive power for corporate bond returns (see, e.g., Cao et al. (2022), Chung,
Wang, and Wu (2019), Huang, Jiang, and Li (2023)). We, therefore, construct a total
of 30 option-related characteristics. Many of those option-related variables have been
shown to have predictive power for equity returns (see, e.g., Neuhierl et al., 2021);
here, we examine whether they also help forecast corporate bond returns.
Altogether, we have a large number of characteristics (132 in total). The bond, equity, and option characteristics are listed in Table A1, Table A2, and Table A3, respectively, in the Appendix. Before feeding those characteristics into our deep learning model, we cross-sectionally rank and standardize them each month so that they lie in the [−1, 1] range and their cross-sectional averages equal 0. Any missing values are imputed to be 0. One advantage of using the cross-sectional ranks of characteristics is that the impact of potential data errors and outliers in individual characteristics can be largely alleviated (see, e.g., Kelly, Pruitt, and Su, 2019; Freyberger, Neuhierl, and Weber, 2020; Kozak, Nagel, and Santosh, 2020).
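The monthly rank standardization described above can be sketched as follows. The exact scaling is an assumption, since the text only states that ranks are mapped to [−1, 1] with a zero cross-sectional mean and that missing values are set to 0; here ranks are rescaled linearly, ties receive their average rank, and the function name and inputs are hypothetical.

```python
def rank_standardize(values):
    """Map one month's cross-section of a characteristic to [-1, 1].

    `None` marks a missing value, which is imputed as 0 (the cross-sectional mean).
    """
    observed = sorted((v, i) for i, v in enumerate(values) if v is not None)
    n = len(observed)
    out = [0.0] * len(values)            # missing values stay at 0
    if n < 2:
        return out
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and observed[end + 1][0] == observed[pos][0]:
            end += 1                     # extend over a block of tied values
        avg_rank = (pos + end) / 2       # 0-based average rank within the tie block
        for _, idx in observed[pos:end + 1]:
            out[idx] = 2 * avg_rank / (n - 1) - 1
        pos = end + 1
    return out

# Hypothetical cross-section with one missing value and one outlier.
standardized = rank_standardize([3.0, None, 1.0, 2.0, 10.0])
```

The outlier (10.0) is mapped to 1, the same value any largest observation would receive, which illustrates why ranks dampen the effect of extreme values.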
3.3 Benchmark Market Factor and Competing Factors
Benchmark Market Factor. There do not exist well-established characteristic-managed
factors in the corporate bond market. Therefore, we take the corporate bond market
portfolio as our benchmark. Similar to Kelly, Palhares, and Pruitt (2022), our benchmark market portfolio is constructed simply as the equal-weighted average of excess
returns of corporate bonds in our sample, i.e., w̃i,t = 1/N .
Competing Factor Models. We consider two corporate bond observable-factor models: one is the BBW four-factor model (Bai, Bali, and Wen, 2019), and the other is a
Fama-French five-factor model that combines three equity factors and two bond factors (Fama and French, 1993, 1996):
(i) The BBW four factors (BBW4). Bai, Bali, and Wen (2019) propose a four-factor model for the corporate bond market. Those factors include the bond market factor, the downside risk factor (DRF), the credit risk factor (CRF), and the liquidity risk factor (LRF). The downside risk factor is the value-weighted average return difference between the highest-VaR portfolio and the lowest-VaR portfolio within each rating portfolio; the credit risk factor is the value-weighted average return difference between the highest credit risk portfolio and the lowest credit risk portfolio within each VaR portfolio; and the liquidity risk factor is the value-weighted average return difference between the highest illiquidity portfolio and the lowest illiquidity portfolio within each rating portfolio. Following Bai, Bali, and Wen (2019) and Dickerson, Mueller, and Robotti (2023), we construct DRF, CRF, and LRF using our own sample.
(ii) The Fama-French five factors (FF5). We combine the Fama-French three equity
factors, i.e., MKT, SMB, and HML (Fama and French, 1996), and two bond factors, i.e.,
the term and default factors (Fama and French, 1993). The term factor is the difference
between the long-term government bond returns and the one-month Treasury bill rate.
The default factor is the difference between the long-term corporate bond returns and
the long-term government bond returns.
4 Empirical Findings
In our empirical implementation, we split the sample into two parts: the subsample from July 2004 to June 2014 for model training and validation, and the subsample from July 2014 to December 2020 for out-of-sample testing. We adopt a two-fold deterministic cross-validation scheme to determine the penalty parameters and learning rate for a given number of neural network layers ranging from 1 to 3 (see footnotes 6 and 7). See the Internet Appendix for implementation details. In what follows, we present our main empirical findings and examine how much improvement the deep factor can make over the benchmark and competing factor models.
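The two-fold deterministic cross-validation used in this section (detailed in footnote 6) can be sketched as below. This is only an outline under stated assumptions: `fit` and `loss` are hypothetical callables standing in for network training and the validation loss, the toy "model" is a shrunken mean rather than a neural network, and refitting on the full training window is one reading of the final refit step.

```python
def two_fold_select(n_months, grid, fit, loss):
    """Pick the tuning parameter with the smallest average validation loss."""
    half = n_months // 2
    folds = [range(0, half), range(half, n_months)]   # two consecutive, equal-length folds
    best, best_loss = None, float("inf")
    for params in grid:
        losses = []
        for train, valid in ((folds[0], folds[1]), (folds[1], folds[0])):
            model = fit(train, params)                # train on one fold
            losses.append(loss(model, valid))         # evaluate on the other
        avg = sum(losses) / len(losses)
        if avg < best_loss:
            best, best_loss = params, avg
    return best, fit(range(n_months), best)           # refit with the selected parameter

# Toy stand-ins (hypothetical): a shrunken sample mean and squared-error loss.
data = [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.015, 0.025]

def fit(months, shrink):
    return (1 - shrink) * sum(data[m] for m in months) / len(months)

def loss(model, months):
    return sum((data[m] - model) ** 2 for m in months) / len(months)

best_shrink, final_model = two_fold_select(len(data), [0.0, 0.5, 0.9], fit, loss)
```

In the paper's setting the grid would contain pairs of penalty parameters (and learning rates), and the loss would be the one in Equation (21).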
6 To be specific, the two-fold deterministic design divides the sample from July 2004 to June 2014 into two equal-length consecutive fold samples. We train our neural network separately on one fold and then calculate the fitted results with different tuning parameters on the other. We average the loss from the validation samples and choose the parameter pair that results in the smallest loss. Finally, we refit the model with the selected tuning parameters.
7 We made Python codes available for replicating all empirical results in the paper.
4.1 Deep Corporate Bond Factors
Table 2 presents summary statistics of the deep factors: mean return, volatility, and annualized Sharpe ratio. We consider the equal-weighted corporate bond market factor as the benchmark when constructing deep factors, and restrict the total weights of the long and short legs to 1 and −1, respectively. We normalize the in-sample annualized volatility of each factor to 10%, and adjust its out-of-sample returns accordingly. All out-of-sample results are based on in-sample parameter estimates. Panels A and B present deep factors constructed from the 1-, 2-, and 3-layer neural networks based on linear ranking in (16) and softmax ranking in (17), respectively.
In-sample training evidence shows that a shallow neural network works well enough: the 1-layer deep factor has the highest annualized Sharpe ratio, 1.98 from the linear ranking and 2.81 from the softmax ranking. In the out-of-sample tests, while the softmax ranking-based deep factor still has its highest Sharpe ratio in the 1-layer neural network (1.79), the linear ranking-based deep factor has its highest Sharpe ratio in the 2-layer neural network (1.33). However, the softmax ranking-based deep factor performs much better than the linear ranking-based one both in and out of sample.
We present the same summary statistics for the two competing factor models for comparison. Panel C is for the BBW factors. In the in-sample period, the DRF factor earns statistically significant average returns, and the LRF and CRF factors earn only marginally significant average returns; however, none earns an annualized Sharpe ratio larger than 1.00. In the out-of-sample period, only the LRF factor earns a significant average return and has an annualized Sharpe ratio slightly larger than 1.00, which remains much smaller than that of the 1-layer softmax ranking-based deep factor (1.01 versus 1.79). From Panel D, we find that none of the Fama-French factors earns an annualized Sharpe ratio larger than 1.00 in either the in-sample or the out-of-sample period.

Table 2: Deep Corporate Bond Factors and Competing Factors
This table reports the descriptive statistics, including means (and their Newey-West t-statistics (Newey and West, 1987)), standard deviations (Std), and Sharpe ratios (SR) of the deep factors and competing factors. We normalize all factors' in-sample annualized volatility to 10% (2.89% monthly), and adjust their out-of-sample returns accordingly. We take the sample from July 2004 to June 2014 for model training and validating, and from July 2014 to December 2020 for out-of-sample testing.

         In Sample Period (2004.7–2014.6)     Out of Sample Period (2014.7–2020.12)
         Mean    t-stat   Std     SR          Mean    t-stat   Std     SR
Panel A. Deep Factors: Linear Ranking
R1l      1.65    5.71     2.89    1.98        0.77    1.61     3.50    0.77
R2l      1.17    4.03     2.89    1.40        0.71    2.68     1.86    1.33
R3l      0.90    3.20     2.89    1.08        0.44    3.04     1.58    0.96
Panel B. Deep Factors: Softmax Ranking
R1s      2.34    8.50     2.89    2.81        1.80    4.41     3.49    1.79
R2s      1.50    5.26     2.89    1.80        0.89    3.29     2.36    1.30
R3s      1.36    4.71     2.89    1.63        0.35    1.11     2.44    0.50
Panel C. BBW Four Factors
MKTC     0.71    2.44     2.89    0.85        0.46    2.29     1.86    0.86
DRF      0.52    2.06     2.89    0.62        0.26    1.43     1.48    0.62
LRF      0.51    1.88     2.89    0.61        0.23    2.20     0.97    1.01
CRF      0.56    1.74     2.89    0.67        0.04    0.20     1.53    0.09
Panel D. FF Five Factors
MKTE     0.43    1.34     2.89    0.51        0.71    2.62     2.92    0.84
SMB      0.23    0.95     2.89    0.28        −0.13   0.31     3.59    −0.12
HML      0.08    0.25     2.89    0.09        −0.87   1.94     3.42    −0.88
TRM      0.43    1.71     2.89    0.51        0.47    1.62     2.46    0.66
DEF      0.04    0.15     2.89    0.05        0.10    0.40     2.42    0.14
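The volatility normalization and annualized Sharpe ratios reported in Table 2 can be sketched as follows. Applying the in-sample scale factor unchanged to the out-of-sample returns is an assumed reading of "adjust its out-of-sample returns accordingly"; the function names and return series are hypothetical.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of monthly excess returns."""
    return mean(monthly_returns) / stdev(monthly_returns) * math.sqrt(12)

def normalize_volatility(in_sample, out_of_sample, target_annual_vol=0.10):
    """Scale both samples by the factor that sets in-sample annualized volatility to the target."""
    scale = target_annual_vol / (stdev(in_sample) * math.sqrt(12))
    return [scale * r for r in in_sample], [scale * r for r in out_of_sample]

# Hypothetical monthly factor returns.
in_sample = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
out_of_sample = [0.01, 0.02, -0.01]
scaled_in, scaled_out = normalize_volatility(in_sample, out_of_sample)

# Consistency check against Table 2 (1-layer linear factor, in sample):
# a monthly mean of 1.65% and a monthly Std of 2.89% give roughly 1.98 annualized.
table_check = 1.65 / 2.89 * math.sqrt(12)
```

Scaling leaves the Sharpe ratio unchanged, so the normalization affects only the reported means and standard deviations; a 10% annual target corresponds to 10% / sqrt(12), about 2.89% per month.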
4.2 Deep Tangency Portfolios
We now move on to examine the portfolio performance. Table 3 presents the Sharpe
ratios of our deep tangency portfolios and various optimal portfolios constructed from
the competing factors. In Panel A, our deep tangency portfolio earns an annualized
in-sample Sharpe ratio of 11.86 based on linear ranking and 11.27 based on softmax
ranking in the 1-layer neural network, in stark contrast to the corresponding Sharpe
ratio of the benchmark market factor (0.85). Such high in-sample Sharpe ratios are
not surprising as our deep learning model is trained to maximize the Sharpe ratio of
the tangency portfolio formed by the market factor and the deep factor. Panel A of
Figure 3 presents the in-sample scatter plot between the benchmark market factor and
the 1-layer softmax ranking-based deep factor. They are highly negatively correlated,
resulting in a high Sharpe ratio of the deep tangency portfolio according to the principle of diversification. It seems that our deep factor plays the role of a market-hedge
portfolio.
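The diversification argument can be made explicit in the two-factor case. As a worked identity from standard mean-variance algebra (not a result specific to this paper), write $s_b$ and $s_d$ for the Sharpe ratios of the market and deep factors and $\rho$ for their correlation; these symbols are introduced here for illustration. The squared Sharpe ratio in Equation (20) then reduces to:

```latex
SR^2(\tilde F_t) = \frac{s_b^2 + s_d^2 - 2\rho\, s_b s_d}{1 - \rho^2}.
```

With $s_b, s_d > 0$, a negative $\rho$ raises the numerator, and a value of $|\rho|$ close to one shrinks the denominator, so a strongly negatively correlated deep factor can lift the combined Sharpe ratio far above that of either factor alone.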
We are more interested in the out-of-sample performance of deep tangency portfolios and other optimal portfolios. Note that the in-sample estimates determine all portfolios' weights. In Panel B, the deep tangency portfolio that combines the market portfolio and the linear ranking-based deep factor achieves its highest annualized out-of-sample Sharpe ratio of 2.17 in the 3-layer neural network, whereas the one that combines the market portfolio and the softmax ranking-based deep factor achieves its highest annualized out-of-sample Sharpe ratio of 2.29 in the 1-layer neural network. The Sharpe ratios of the deep tangency portfolios are much higher than those of the market portfolio (0.86) and the deep factors. Given that the softmax ranking-based deep factor and deep tangency portfolio perform much better than the linear ranking-based ones, in what follows, we implement our empirical analysis mainly relying on the softmax ranking-based deep learning model.

Table 3: Performance of Deep Tangency Portfolios
This table presents the Sharpe ratios of various tangency portfolios. For the deep learning model, the market factor is the only benchmark, and we consider the 1-3 layers in the neural network architecture. We take the sample from July 2004 to June 2014 for model training and validating, and from July 2014 to December 2020 for out-of-sample testing. We follow Barillas and Shanken (2017) to statistically test the significance of the Sharpe ratio improvement of the deep tangency portfolio over the benchmark portfolio. ∗∗∗, ∗∗, and ∗ denote the level of significance of 1%, 5%, and 10%, respectively. In each row, the first column reports the Sharpe ratio of the factor set alone; the remaining columns add one deep factor.

Panel A. In Sample Period (2004.7–2014.6)
                A.1 Deep TP: Linear Ranking        A.2 Deep TP: Softmax Ranking
                L1         L2         L3           L1         L2         L3
MKTC    0.85    11.86***   11.77***   12.09***     11.27***   12.23***   11.88***
BBW4    0.86    11.88***   11.78***   12.17***     11.30***   12.31***   11.89***
FF5     0.89    3.94***    3.09***    2.65***      5.14***    3.67***    3.38***

Panel B. Out of Sample Period (2014.7–2020.12)
                B.1 Deep TP: Linear Ranking        B.2 Deep TP: Softmax Ranking
                L1         L2         L3           L1         L2         L3
MKTC    0.86    1.04***    1.49***    2.17***      2.29***    1.48***    1.13***
BBW4    0.69    1.03***    1.48***    2.11***      2.28***    1.46***    1.13***
FF5     1.27    1.19       1.57***    1.99***      2.50***    1.56***    1.38
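The out-of-sample evaluation, in which in-sample estimates determine all portfolio weights, can be sketched for the two-factor case as follows. The weights are taken proportional to Cov^{-1} E(F) and rescaled to sum to one, which is an assumed normalization (it is only meaningful when the raw weights have a positive sum); the function names and return series are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def covariance(x, y):
    """Sample covariance of two equally long return series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def tangency_weights(r_b, r_d):
    """Two-factor tangency weights, proportional to Cov^{-1} E(F), rescaled to sum to one."""
    mu_b, mu_d = mean(r_b), mean(r_d)
    var_b, var_d, cov_bd = variance(r_b), variance(r_d), covariance(r_b, r_d)
    raw_b = var_d * mu_b - cov_bd * mu_d
    raw_d = var_b * mu_d - cov_bd * mu_b
    total = raw_b + raw_d
    return raw_b / total, raw_d / total

def annualized_sharpe(monthly_returns):
    return mean(monthly_returns) / stdev(monthly_returns) * math.sqrt(12)

# Hypothetical in-sample and out-of-sample monthly excess returns.
in_b = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
in_d = [0.004, 0.012, 0.001, 0.006, 0.010, 0.000]
out_b = [0.012, -0.015, 0.008, -0.004]
out_d = [0.003, 0.011, 0.002, 0.007]

w_b, w_d = tangency_weights(in_b, in_d)                 # estimated in sample only
in_portfolio = [w_b * b + w_d * d for b, d in zip(in_b, in_d)]
out_portfolio = [w_b * b + w_d * d for b, d in zip(out_b, out_d)]
oos_sharpe = annualized_sharpe(out_portfolio)           # weights are not re-estimated
```

In sample, the resulting portfolio cannot have a lower Sharpe ratio than either factor; out of sample there is no such guarantee, which is why the out-of-sample comparison in Panel B is the more informative one.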
Panel B of Figure 3 presents the scatter plot between the market portfolio and the 1-layer softmax ranking-based deep factor in the out-of-sample period and suggests that the deep factor negatively correlates with the market portfolio. Notably, the deep factor and the market portfolio hardly ever go down simultaneously, as few observations are in the lower-left quadrant (see Figure 1). To further verify this point, Panel C of Figure 3 displays the cumulative returns over time of the market portfolio and the deep factor for the in-sample and out-of-sample periods, respectively. We observe that the deep factor moves in the opposite direction during market downturns. For instance, in the 2008 global financial crisis (in-sample) and the outbreak of the Covid-19 pandemic (out-of-sample), the cumulative returns of the deep factor keep increasing, whereas the market portfolio suffers losses, which is in line with the findings from the previous subsection. Those results provide further evidence supporting the deep factor as a market-hedge portfolio.
The optimal portfolios constructed from the competing factors perform much worse than the deep tangency portfolio in both the in-sample and out-of-sample periods. The optimal portfolio constructed from the BBW four factors has an annualized in-sample Sharpe ratio of only 0.86 and an annualized out-of-sample Sharpe ratio of 0.69. The portfolio constructed from the Fama-French five factors has larger in-sample (0.89) and out-of-sample (1.27) Sharpe ratios than the optimal portfolio based on the BBW four factors. Note that the Fama-French five factors contain three equity factors (MKT, SMB, and HML) and two bond factors (Term and Default factors).
What happens when we combine the competing factors and our deep factors? Do those observable factors contain useful information not spanned by the deep factor? Table 3 also presents the Sharpe ratios of the portfolios constructed using various competing factors and a deep factor. When we combine the deep factor with the BBW four factors, both in-sample and out-of-sample Sharpe ratios are very similar to those of our deep tangency portfolios. For example, the out-of-sample Sharpe ratio of the optimal portfolio constructed from the BBW four factors and the 1-layer softmax ranking-based deep factor is 2.28, almost the same as that of our tangency portfolio (2.29). Given that our deep factor is constructed by taking the bond market factor as a benchmark and using all firm characteristics, it should already contain the non-market information of the BBW factors; therefore, including those factors should not improve the Sharpe ratio over our deep tangency portfolio. We notice that the portfolio weights on the three non-market BBW factors are negligible.

Figure 3: Correlations and Cumulative Returns
Panels A and B of the figure present scatter plots of the bond market factor and the deep factor for the in-sample and out-of-sample periods. Panel C presents cumulative returns of the deep and market factors for the in-sample (Panel C.1) and out-of-sample (Panel C.2) periods. Panel D plots the cumulative returns of the deep tangency portfolio and tangency portfolios constructed from BBW's four factors and Fama-French's five factors for the out-of-sample period. Panel D.1 is for cumulative returns of original tangency portfolios, and Panel D.2 is for cumulative returns of all tangency portfolios normalized to have 10% annualized volatility.
[Figure not reproduced. Panel A: In Sample and Panel B: Out of Sample (scatter plots of Rd1 Return and Market Return); Panel C1: Deep Portfolio and Market: In sample and Panel C2: Deep Portfolio and Market: Out-of-sample (Cum.Ret by Date for Rd1 and MKT); Panel D1: Tangency Portfolio and Panel D2: Tangency Portfolio (10% Annualized Volatility) (Cum.Ret by Date for R1opt, BBW, and FF5).]
A notable result is that the in-sample performance of the optimal portfolio constructed from the Fama-French five factors and a deep factor is much worse, because our deep factor is constructed by taking the bond market factor, not the equity market factor, as a benchmark. Out of sample, however, this portfolio is slightly worse than the 3-layer linear ranking-based deep tangency portfolio (an annualized Sharpe ratio of 1.99 versus 2.17), but slightly better than the 1-layer softmax ranking-based deep tangency portfolio (2.50 versus 2.29).
Panel D1 of Figure 3 presents the cumulative returns of our 1-layer softmax ranking-based deep tangency portfolio and the optimal portfolios constructed from the competing factor models in the out-of-sample period. We see that the cumulative returns of our deep tangency portfolio increase steadily over time, and market downturns do not appear to affect its returns. By contrast, although the cumulative returns on the competing optimal portfolios increase over time, their variations are very large, and, notably, those portfolios usually suffer big losses in periods of market downturns. To further examine the performance of various portfolios, we normalize all the above optimal portfolios to have the same annual volatility of 10% and present the cumulative returns of those normalized portfolio returns. We see from Panel D2 of Figure 3 that our deep tangency portfolio, benefiting from its low volatility, has much higher cumulative returns in the out-of-sample period.
While we have used only one deep factor in our previous analysis, our methodology is flexible enough to introduce multiple deep factors if necessary. This can be done by iterating the algorithm, treating each extracted deep factor as an additional benchmark alongside the market factor. Table 4 presents the performance of the softmax ranking-based deep tangency portfolios constructed from the benchmark market factor and 1-3 deep factors. In-sample training suggests a minor benefit from using more deep factors. The out-of-sample evidence shows that the first deep factor extracted from the one-layer neural network performs very well, as the Sharpe ratio improvement from using 2 or 3 deep factors over the tangency portfolio with one deep factor is negligible and statistically insignificant. Therefore, in what follows, we focus on the first deep factor and the corresponding deep tangency portfolio from the 1-layer neural network.
Table 4: Multiple Deep Factors
This table presents Sharpe ratios of the deep tangency portfolios constructed using multiple softmax ranking-based deep factors. We take the sample from July 2004 to June 2014 for model training and validating and the sample from July 2014 to December 2020 for out-of-sample testing. We sequentially add one additional deep factor for each choice of the number of neural network layers ranging from 1 to 3. The test of the Sharpe ratio improvement by including additional deep factors over the deep tangency portfolio with one deep factor is based on Barillas and Shanken (2017). ∗∗∗, ∗∗, and ∗ denote the level of significance of 1%, 5%, and 10%, respectively.

        MKTC    D1         D2         D3
Panel A. In Sample Period (2004.7–2014.6)
L1      0.85    11.27***   11.31      11.33
L2      0.85    12.23***   12.24      12.37
L3      0.85    11.88***   14.12***   14.12
Panel B. Out of Sample Period (2014.7–2020.12)
L1      0.86    2.29***    2.31       2.30
L2      0.86    1.48***    1.48       1.46
L3      0.86    1.13***    0.79       0.79

4.3 Factor-Spanning Regressions
The key findings are that our deep factor, constructed from a nonlinear combination of firm characteristics, captures risks missing from the market factor and acts as a hedge portfolio against market downturns. Commonly used observable factors do not add pricing information, in terms of Sharpe ratio improvement, once they are combined with the deep factor. In this part, we further examine these issues by implementing simple factor-spanning regressions of the form,
$$R_{d,t} = \alpha + \beta' f_t + \epsilon_t, \tag{23}$$
where ft is a set of observable factors (e.g., BBW factors), and Rd,t is our deep factor.
Given its superior performance, we focus on the 1-layer softmax ranking-based deep
factor. We also run a regression of
$$R_t^{opt} = \alpha + \beta' f_t + \epsilon_t, \tag{24}$$
where $R_t^{opt}$ represents the deep tangency portfolio constructed from the bond market factor and the 1-layer softmax ranking-based deep factor. Such a regression provides further evidence of whether the small number of observable factors can span the deep tangency portfolio.

Table 5: Factor-Spanning Regressions
This table reports factor-spanning regression results for the out-of-sample period. Specifically, we regress the deep factor and the deep tangency portfolios on the two competing factor models, the BBW four-factor model and the Fama-French five-factor model. Newey-West t-statistics are presented in brackets (Newey and West, 1987).

Panel A. BBW Four Factors
          α        βMKTC     βDRF     βCRF     βLRF     R2
Rd1       0.21     −0.16     0.06     0.02     0.04     19.89
          (5.88)   (−2.17)   (0.94)   (0.12)   (1.91)
R1opt     0.19     −0.06     0.05     0.01     0.04     15.90
          (5.88)   (−0.81)   (0.94)   (0.12)   (1.91)

Panel B. FF Five Factors
          α        βMKTE    βSMB      βHML     βTRM      βDEF      R2
Rd1       0.21     0.01     −0.01     0.02     −0.05     −0.08     21.18
          (5.00)   (0.83)   (−1.04)   (1.02)   (−2.51)   (−2.33)
R1opt     0.19     0.02     −0.01     0.02     −0.00     −0.02     9.16
          (4.68)   (1.58)   (−0.95)   (1.28)   (−0.20)   (−0.54)
Table 5 presents the results of the factor-spanning regressions for the out-of-sample period. Panel A reports the alphas and betas from the spanning regressions of the deep factor and the deep tangency portfolio on the BBW four factors, respectively. The BBW four factors cannot explain the excess returns on either the deep factor or the deep tangency portfolio. The alpha estimate is approximately 0.21% in the regression of the deep factor, and it is 0.19% in the regression of the deep tangency portfolio. Both alpha estimates are highly statistically significant. The loading of the deep factor on the bond market factor is negative and statistically significant, −0.16 (t = −2.17), and the loading of the deep tangency portfolio on the bond market factor is almost zero, again suggesting that the deep factor serves as a market-hedge portfolio.
We find similar results in the spanning regressions on the Fama-French five factors (Panel B). The alpha estimate is about 0.21% (t = 5.00) in the regression of the deep factor and about 0.19% (t = 4.68) in the regression of the deep tangency portfolio. Neither the deep factor nor the deep tangency portfolio loads significantly on the three equity factors. We also find that the deep factor loads significantly and negatively on both the term and default factors, while the deep tangency portfolio has no significant exposure to those two factors. The negative loadings of the deep factor and negligible loadings of the deep tangency portfolio on the term and default factors further suggest that our deep factor is a bond market hedge portfolio.
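The spanning regressions in Equations (23) and (24) are ordinary least squares regressions with an intercept. The sketch below estimates alpha and the betas through the normal equations in plain Python; it does not compute the Newey-West t-statistics reported in Table 5, and the helper names and the synthetic data (constructed so that the true coefficients are known) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def spanning_regression(y, factors):
    """OLS of y on a constant and the factor series; returns [alpha, beta_1, ..., beta_K]."""
    t = len(y)
    x = [[1.0] + [f[s] for f in factors] for s in range(t)]
    k = len(x[0])
    xtx = [[sum(x[s][i] * x[s][j] for s in range(t)) for j in range(k)] for i in range(k)]
    xty = [sum(x[s][i] * y[s] for s in range(t)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic example: the "deep factor" is built with alpha = 0.002 and loadings (-0.5, 0.1).
mkt = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
drf = [0.003, 0.001, -0.002, 0.004, 0.000, -0.001]
deep = [0.002 - 0.5 * m + 0.1 * d for m, d in zip(mkt, drf)]

alpha, beta_mkt, beta_drf = spanning_regression(deep, [mkt, drf])
```

A positive alpha together with a negative market beta is the pattern the text interprets as a market hedge; inference on alpha would additionally require heteroskedasticity- and autocorrelation-robust standard errors.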
4.4 Interpreting Deep Characteristics
By combining an economically motivated loss function with deep learning and
constructing the deep factor as long-short portfolio returns, we aim to improve the
transparency and interpretability of our methodology. Therefore, a natural next step
is understanding how different characteristics contribute to the deep factor.
Our methodology’s nonlinear activation of neural networks transforms characteristics into a deep one, a highly nonlinear combination of raw characteristics whose
exact functional form is unknown to us in principle. We first evaluate the linear contribution of each characteristic to the deep characteristic by running the Fama-MacBeth
cross-sectional regressions (Fama and MacBeth, 1973) of the deep characteristic $Z^{(L)}_{i,t}$
on raw characteristics $z_{k,i,t}$,

$$Z^{(L)}_{i,t} = a_t + b_{1,t} z_{1,i,t} + \cdots + b_{k,t} z_{k,i,t} + \cdots + b_{K,t} z_{K,i,t} + \epsilon_{i,t}, \tag{25}$$

for $i = 1, \ldots, N$. Given that all characteristics are cross-sectionally normalized, we can
then evaluate each characteristic’s contribution by the explained variation using the
time-series average of $\hat{b}_{k,t}$, for $k = 1, \ldots, K$.
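The two-pass procedure behind Equation (25) can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: the function name `fama_macbeth`, the array layout, the simulated inputs, and the use of plain (non-HAC) t-statistics are all assumptions made for the example.

```python
import numpy as np

def fama_macbeth(Z_deep, z_raw):
    """Fama-MacBeth slopes of the deep characteristic on raw characteristics.

    Z_deep: (T, N) array, deep characteristic per period and asset.
    z_raw:  (T, N, K) array, raw characteristics.
    Returns the time-series mean of the period-by-period slopes and
    plain t-statistics (no autocorrelation adjustment; an assumption).
    """
    T, N, K = z_raw.shape
    slopes = np.empty((T, K))
    for t in range(T):
        # Cross-sectional OLS at time t: intercept a_t plus K slopes b_{k,t}
        X = np.column_stack([np.ones(N), z_raw[t]])
        coef, *_ = np.linalg.lstsq(X, Z_deep[t], rcond=None)
        slopes[t] = coef[1:]
    b_bar = slopes.mean(axis=0)
    t_stat = b_bar / (slopes.std(axis=0, ddof=1) / np.sqrt(T))
    return b_bar, t_stat

# Synthetic demo: a nonlinear "deep" characteristic built from three raw ones
rng = np.random.default_rng(0)
T, N, K = 24, 200, 3
b_true = np.array([0.3, -0.2, 0.1])  # hypothetical loadings
z_raw = rng.standard_normal((T, N, K))
Z_deep = np.tanh(z_raw @ b_true) + 0.05 * rng.standard_normal((T, N))
b_bar, t_stat = fama_macbeth(Z_deep, z_raw)
```

Because the simulated deep characteristic is a tanh transform, the averaged slopes recover the sign and ranking of the hypothetical loadings rather than their exact values, which is the sense in which the regression measures a linear contribution.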
Two observations are in order: first, most characteristics significantly contribute to
the deep characteristic in the above regression, suggesting no clear evidence of sparsity of characteristics, and second, the deep characteristic is not dominated by a small
particular set of characteristics, as the (absolute) values of all coefficients are relatively
small. Figure 4 presents the top 30 most important characteristics, with bond, equity,
and option characteristics classified by the blue, yellow, and red bars, respectively.
We report both the coefficient signs and significance levels. We find that all of them
significantly contribute to deep characteristics. The top 10 most important variables
include five bond characteristics, namely, size (SIZE), monthly turnover (TURN), illiquidity (LIQ_BPW) (Bao, Pan, and Wang, 2011), downside risk beta (DRF_BETA), and
yield-to-maturity (YTM), two equity characteristics, namely, market equity (ME), and
quarterly asset liquidity (ALM), and three option characteristics, namely, stock-option
volume ratio (SO), trading volume (VOL), and implied and historical volatility spread
(IVRV).
To further examine the importance of different types of characteristics, we reconstruct the softmax ranking-based deep tangency portfolios using equity and bond
characteristics with option-related variables removed, or using bond characteristics
alone. Table 6 presents Sharpe ratios obtained from different types of characteristics.
Even though the in-sample training results are more or less similar, their out-of-sample
performance differs. When we use all characteristics, as before, the one-layer neural
network works quite well, and the deep tangency portfolio earns an out-of-sample annualized Sharpe ratio of 2.29. However, when we exclude the option-related variables,
even though the out-of-sample Sharpe ratio of the deep tangency portfolio still reaches
the highest value from the 1-layer neural network, it becomes smaller, only 1.83. This
suggests that option-related variables contain valuable information regarding future
corporate bond returns. When we use bond characteristics alone,
the performance of the deep tangency portfolio deteriorates further: its best out-of-sample Sharpe ratio, again from the 1-layer neural network, is only about 0.71, which is even
smaller than that of the market portfolio (0.86).
To sum up, all three types of characteristics are important and weighted heavily
in the deep characteristic. Therefore, they are necessary for constructing the deep
tangency portfolio. This finding is, in fact, in stark contrast to previous studies that
argue that those characteristics that predict equity returns do not necessarily forecast
corporate bond returns (see, e.g., Chordia et al., 2017; Bali et al., 2021). But it provides
further empirical evidence in support of the integration between the bond and equity
markets (Schaefer and Strebulaev, 2008; Kelly, Palhares, and Pruitt, 2022).

Figure 4: Variable Importance: Linear Contributions
This figure presents the variable importance via the Fama-MacBeth cross-sectional regressions
of the deep characteristic $Z^{(L)}_{i,t}$ on raw characteristics $z_{k,i,t}$ over the in-sample period. We report
the normalized average coefficient $\hat{\beta}_{k,t}$. The blue bars stand for bond characteristics, the yellow
for equity characteristics, and the red for option characteristics.
[Bar chart not reproduced. Horizontal axis: normalized average coefficient, from −0.3 to 0.3. Legend: Bond, Equity, Option. Characteristics in plotted order: SIZE***, SO***, ME***, TURN***, LIQ_BPW***, ALM***, VOL***, DRF_BETA***, YTM***, IVRV***, STR***, ACC***, ABR***, std_barQ_1mom***, LIQ_P_FHT***, MOM1M***, RNK3M***, NOPT***, VAR5***, MOM12***, RNK1M***, VARIANCE***, MOM6M***, UNC_BETA***, BM_IA***, T2M***, LIQ_P_HL***, MOM12M***, TERM_DEF_RVAR***, OP***, RSUP***, barQ***, ROA***, LIQ_TRADE***, SEAS1A***, DSO***, RE***, NOA***, ISKEW***, ILL***, PCRATIO***, CHTX***, RVAR_FF3, RNK12M***, SUE***, LIQ_RANGE_M***, RVAR_MEAN**, CHPM***, CASH***, RDM***.]
Table 6: Importance of Characteristics
This table presents annualized Sharpe ratios obtained from different types of characteristics.
We consider three sets of characteristics: all 132 characteristics, 102 bond and equity characteristics, and 41 bond characteristics. The test of the Sharpe ratio improvement of the deep
tangency portfolio over the market factor is based on Barillas and Shanken (2017). ***, **,
and * denote significance at the 1%, 5%, and 10% levels, respectively.

                        MKTC    L1          L2          L3
Panel A. In Sample Period (2004.7–2014.6)
Bond+Equity+Option      0.85    11.27***    12.23***    11.88***
Bond+Equity             0.85    11.73***    10.16***    12.05***
Bond                    0.85    11.05***    11.08***    11.55***
Panel B. Out of Sample Period (2014.7–2020.12)
Bond+Equity+Option      0.86    2.29***     1.48***     1.13***
Bond+Equity             0.86    1.83***     0.49        1.70***
Bond                    0.86    0.71        0.53        0.61

4.5 Additional Analyses
4.5.1 Latent Factors and Deep Factors
A recent paper by Kelly, Palhares, and Pruitt (2022) shows that a five-factor model
based on the instrumented principal component analysis (IPCA, Kelly, Pruitt, and
Su, 2019) outperforms commonly used observable factor models in pricing corporate
bonds. They find that a tangency portfolio constructed from their five IPCA factors using the ICE corporate bond return data can earn an annualized out-of-sample Sharpe
ratio as large as 6.23. We note that in another paper, Kelly and Pruitt (2022) show
that the core analysis of Kelly, Palhares, and Pruitt (2022) is robust to using the TRACE
data. We follow their IPCA approach and construct five corporate bond factors using our TRACE data and all three types of characteristics. To be consistent with our
primary empirical analysis, we use the same in-sample and out-of-sample split as before and extract the out-of-sample IPCA factors using in-sample model parameter estimates and out-of-sample characteristics.
Panel A of Table 7 summarizes the optimal portfolio’s in-sample and out-of-sample
Sharpe ratios constructed from the IPCA factors. While both Kelly, Palhares, and Pruitt
(2022) and Kelly and Pruitt (2022) find that an optimal portfolio constructed using the
five IPCA factors can earn an out-of-sample Sharpe ratio larger than 6 using both
ICE and TRACE corporate bond data, we find that such a portfolio can only earn an
in-sample Sharpe ratio of 2.95 and an out-of-sample Sharpe ratio of 1.67, both of which
are smaller than the corresponding values of our deep tangency portfolio (see Table 3).
There are two reasons why we find a weaker out-of-sample Sharpe ratio. First,
the sample size in our paper is much larger than that in Kelly and Pruitt (2022): the
total number of bond-month observations in our paper is 633,600, whereas it is only
144,933 in Kelly and Pruitt (2022). Second, both Kelly, Palhares, and Pruitt (2022) and
Kelly and Pruitt (2022) adopt an expanding window procedure to construct the out-of-sample IPCA factors, whereas we extract out-of-sample IPCA factors by fixing model
parameters at the in-sample estimates to make them comparable with our methodology.[8]
We further find that combining the deep factor with the IPCA five factors improves the
out-of-sample Sharpe ratio of the optimal portfolio to 2.24, similar to that of our deep
tangency portfolio; such a Sharpe ratio improvement over the IPCA optimal portfolio
is highly statistically significant.
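To make the comparison concrete, the sketch below builds a tangency portfolio from in-sample factor moments and holds those weights fixed out of sample, mirroring the fixed-parameter design described above. It is a minimal illustration under stated assumptions: the simulated two-factor returns, the helper names `tangency_weights` and `annualized_sharpe`, and the normalization of weights to unit gross exposure are not taken from the paper.

```python
import numpy as np

def tangency_weights(F_in):
    """Weights proportional to inv(Sigma) @ mu from in-sample factor returns.

    F_in: (T, J) array of in-sample factor excess returns.
    The weights are scaled so that their absolute values sum to one
    (a presentation choice; the Sharpe ratio is scale invariant).
    """
    mu = F_in.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(F_in, rowvar=False))
    w = np.linalg.solve(Sigma, mu)
    return w / np.abs(w).sum()

def annualized_sharpe(r, periods_per_year=12):
    """Annualized Sharpe ratio of a series of monthly excess returns."""
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

# Synthetic demo: 120 in-sample months and 78 out-of-sample months
rng = np.random.default_rng(1)
mu_true = np.array([0.4, 0.3])  # hypothetical monthly means, in percent
F_in = mu_true + rng.standard_normal((120, 2))
F_out = mu_true + rng.standard_normal((78, 2))

w = tangency_weights(F_in)                # estimated in sample only
sr_in = annualized_sharpe(F_in @ w)
sr_out = annualized_sharpe(F_out @ w)     # weights held fixed out of sample
sr_single_in = [annualized_sharpe(F_in[:, j]) for j in range(2)]
```

In sample, the tangency portfolio's Sharpe ratio is at least as large as that of any single factor by construction; out of sample there is no such guarantee, which is why the out-of-sample figures are the informative ones.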
Given that the IPCA factors are also estimated by taking into account all firm characteristics (in a linear form), and that Kelly, Palhares, and Pruitt (2022) and Kelly and
Pruitt (2022) show that the IPCA factors substantially outperform popular observable
factors, we examine whether they can span our deep factor and deep tangency portfolio. Panel B presents the spanning regression results, which show that the five IPCA
factors cannot explain excess returns on either the deep factor or the deep tangency
portfolio, as the alpha estimates are about 0.17% and 0.16%, respectively, which are
highly statistically significant in both regressions.
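A spanning regression of this kind can be sketched as an OLS regression of the test portfolio on the candidate factors, with Newey-West standard errors for the intercept (alpha) and loadings. The code below is a minimal sketch on simulated data; the function name `spanning_regression`, the lag length of 4, and the absence of a small-sample correction are assumptions, since the section does not state the paper's exact settings.

```python
import numpy as np

def spanning_regression(y, F, lags=4):
    """OLS of y on [1, F] with Newey-West (Bartlett kernel) t-statistics.

    y: (T,) returns of the test portfolio.
    F: (T, J) returns of the spanning factors.
    Returns (coef, t_stat), where coef[0] is the alpha.
    """
    T = len(y)
    X = np.column_stack([np.ones(T), F])
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    u = y - X @ coef
    Xu = X * u[:, None]                   # moment contributions x_t * u_t
    S = Xu.T @ Xu                         # lag-0 term
    for l in range(1, lags + 1):
        weight = 1.0 - l / (lags + 1.0)   # Bartlett weights
        G = Xu[l:].T @ Xu[:-l]            # lag-l autocovariance of moments
        S += weight * (G + G.T)
    V = XtX_inv @ S @ XtX_inv             # sandwich covariance estimate
    return coef, coef / np.sqrt(np.diag(V))

# Synthetic demo: a portfolio with alpha 0.2 that two factors cannot span
rng = np.random.default_rng(2)
T = 500
F = rng.standard_normal((T, 2))
y = 0.2 + F @ np.array([0.5, -0.3]) + 0.1 * rng.standard_normal(T)
coef, t_stat = spanning_regression(y, F, lags=4)
```

A significant intercept, as in the simulated example, indicates that the factors on the right-hand side do not span the test portfolio.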
In addition, a recent paper by Lettau and Pelger (2020) proposes a risk-premium
principal component analysis (RP-PCA) model for estimating latent asset pricing factors. Lettau and Pelger (2020) show that the RP-PCA performs much better than the
PCA method, particularly in identifying the weak factors. Table 7 also examines how
the five RP-PCA factors perform compared to our deep factor. Again, we find that the
out-of-sample Sharpe ratio of the RP-PCA tangency portfolio is much smaller than that
of the deep tangency portfolio (0.95 vs. 2.29), and the five RP-PCA factors are unable
to explain excess returns on both the deep factor and the deep tangency portfolio.

Table 7: Latent Factors and Deep Factors
Panel A of the table presents Sharpe ratios of the tangency portfolios constructed from the
IPCA five factors and RP-PCA five factors. The test of Sharpe ratio improvement of the tangency portfolio constructed from latent factors and the deep factor over that from latent factors
alone is based on Barillas and Shanken (2017). ***, **, and * denote the level of significance
of 1%, 5%, and 10%, respectively. Panel B presents the factor-spanning regressions of the deep
factor and the deep tangency portfolio on the IPCA five factors or the RP-PCA five factors over
the out-of-sample period. Newey-West t-statistics are presented in brackets (Newey and West,
1987).

Panel A. Sharpe Ratios
             TP      L1          L2          L3
In Sample (2004.7–2014.6)
IPCA5        2.95    11.24***    12.34***    11.65***
RP-PCA5      1.04    11.29***    12.06***    11.74***
Out of Sample (2014.7–2020.12)
IPCA5        1.67    2.24***     1.49        1.14
RP-PCA5      0.95    2.19***     1.41***     1.07

Panel B. Spanning Regressions
             α         β1        β2         β3         β4         β5
B.1. IPCA Five Factors
Rd1          0.17      0.03      −0.04      −0.08      0.03       0.14
             (4.51)    (0.47)    (−1.90)    (−1.88)    (0.80)     (4.36)
R1opt        0.16      0.05      0.01       −0.05      0.04       0.09
             (4.44)    (0.85)    (0.34)     (−1.38)    (1.08)     (3.16)
B.2. RP-PCA Five Factors
Rd1          0.20      0.01      −0.09      0.02       −0.04      −0.05
             (7.98)    (0.67)    (−5.57)    (1.34)     (−1.60)    (−4.31)
R1opt        0.18      0.03      −0.08      0.02       −0.04      −0.04
             (8.03)    (2.25)    (−5.44)    (1.88)     (−1.61)    (−4.06)
R2 entries as they appear in the source: 39.56 (under B.1), 45.89 (under B.2), 37.71, 42.85.

[8] We also implement a recursive expanding-window approach similar to Kelly, Palhares, and Pruitt
(2022) and Kelly and Pruitt (2022) using our TRACE data and find an almost identical out-of-sample
Sharpe ratio of the IPCA optimal portfolio. The results show a mean of 0.39 and a standard deviation
of 0.81, resulting in an annualized Sharpe ratio of 1.67.
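As a check on the annualization reported in footnote 8, the arithmetic can be written out. The sketch assumes that the reported mean (0.39) and standard deviation (0.81) are monthly figures in percent and that annualization multiplies the monthly Sharpe ratio by the square root of 12; the symbols $\bar{r}$ and $\hat{\sigma}$ are introduced here for illustration.

```latex
\mathrm{SR}_{\text{annual}}
  = \sqrt{12}\,\frac{\bar{r}}{\hat{\sigma}}
  = \sqrt{12} \times \frac{0.39}{0.81}
  \approx 3.464 \times 0.481
  \approx 1.67
```

This matches the out-of-sample Sharpe ratio of 1.67 reported for the IPCA optimal portfolio.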
4.5.2 Importance of Nonlinearity
The deep factor in our deep learning model is formed on a deep characteristic that
is a highly nonlinear combination of raw characteristics. A natural question is how important this nonlinear combination is, and how differently the deep tangency portfolio
performs compared to those constructed from the linear machine learning models.
For this purpose, we construct long-short portfolios based on Lasso, Ridge, PCA,
and PLS. To be specific, at each time t, we form the return forecasts using all characteristics relying on these four linear machine learning models as follows,

$$W_t \equiv E_t[r_{t+1}] = z_t b, \tag{26}$$

where $b$ is a $K \times 1$ vector of model parameters. We then construct an equal-weighted long-short
portfolio using the standard sorting approach that longs the top 30% and shorts the bottom
30% of corporate bonds based on $W_t$. As before, we use the sample from July 2004 to June
2014 for model training and validation and the sample from July 2014 to December
2020 for out-of-sample testing. See the Internet Appendix for details.
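The sorting step that follows Equation (26) can be sketched as below. This is a minimal illustration: the function name `long_short_return`, the toy inputs, and the rounding of the 30% cutoff down to a whole number of bonds are assumptions, and the coefficient vector is taken as given rather than estimated by Lasso, Ridge, PCA, or PLS.

```python
import numpy as np

def long_short_return(z_t, b, r_next, frac=0.3):
    """Equal-weighted long-short return from linear forecasts W_t = z_t @ b.

    z_t:    (N, K) characteristics observed at time t.
    b:      (K,) coefficient vector from a fitted linear model.
    r_next: (N,) realized returns at time t + 1.
    Longs the top `frac` and shorts the bottom `frac` of assets by forecast.
    """
    W = z_t @ b
    n = int(np.floor(frac * len(W)))      # number of bonds in each leg
    order = np.argsort(W)
    short_idx, long_idx = order[:n], order[-n:]
    return r_next[long_idx].mean() - r_next[short_idx].mean()

# Toy demo with ten bonds and a single characteristic
z_t = np.arange(10.0).reshape(-1, 1)
b = np.array([1.0])
r_next = 0.01 * np.arange(10.0)
ls_ret = long_short_return(z_t, b, r_next)
```

In the toy example, the long leg holds the three bonds with the highest forecasts and the short leg the three with the lowest, so the portfolio return is the difference between their average realized returns.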
Panel A of Table 8 presents summary statistics of the portfolios. We see that while
the portfolios constructed from Lasso, Ridge, and PLS deliver average returns that are
statistically significant both in sample and out of sample, none achieves an
annualized Sharpe ratio larger than 1. Our deep tangency portfolio performs much
better than those portfolios.
We further examine what would happen if we remove the nonlinear tanh activation
function and use a linear combination of raw characteristics in deep learning. We
see from Panel B of Table 8 that without the nonlinear activation function, we need
a deeper neural network, and the out-of-sample performance of the deep tangency
portfolio becomes much worse, compared to the case with the nonlinear activation.
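The link between a linear activation and a linear combination of raw characteristics can be made explicit. The sketch below assumes a standard feed-forward recursion, which is not restated in this section; the layer matrices $A^{(l)}$, intercepts $c^{(l)}$, and activation $g$ are notation introduced here for illustration only.

```latex
Z^{(l)} = g\!\left(A^{(l)} Z^{(l-1)} + c^{(l)}\right), \quad l = 1, \ldots, L,
\qquad Z^{(0)} = z .

\text{With } g(x) = x: \qquad
Z^{(L)} = \underbrace{A^{(L)} A^{(L-1)} \cdots A^{(1)}}_{\tilde{A}}\, z + \tilde{c} .
```

Under this assumption, any depth $L$ yields a deep characteristic that is linear in the raw characteristics $z$, so depth alone cannot introduce the nonlinear combinations that the tanh activation provides.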
To sum up, both nonlinear combinations of characteristics and nonlinear activation in deep learning play important roles in constructing the deep tangency portfolio.
Such findings are largely consistent with what the literature has found on nonlinear effects of characteristics on expected returns and covariances (see, e.g., Freyberger et al.,
2020; Gu et al., 2020; Cong et al., 2022).

Table 8: Importance of Nonlinearity
Panel A reports descriptive statistics of the long-short portfolios based on return forecasts from
Lasso, Ridge, PCA, and PLS. We normalize the in-sample annualized volatility to 10%. Panel
B presents annualized Sharpe ratios of the softmax ranking-based deep tangency portfolios
constructed by replacing the nonlinear activation function with a linear one in the deep learning model. The test of Sharpe ratio improvements by including an extra deep factor in the
tangency portfolio is based on Barillas and Shanken (2017). ***, **, and * denote the level
of significance of 1%, 5%, and 10%, respectively. We use the sample from July 2004 to June
2014 for model training and validating and the sample from July 2014 to December 2020 for
out-of-sample testing.

Panel A. Linear ML Portfolios
          In Sample (2004.7–2014.6)      Out of Sample (2014.7–2020.12)
          Mean    t-stat    SR           Mean    t-stat    SR
Lasso     0.69    2.49      0.83         0.46    2.21      0.83
Ridge     0.61    2.36      0.74         0.34    2.42      0.94
PCA       0.45    1.81      0.54         0.46    1.59      0.67
PLS       0.52    2.04      0.63         0.19    2.01      0.85

Panel B. Linear Activation
          In Sample (2004.7–2014.6)      Out of Sample (2014.7–2020.12)
          D1          D2       D3        D1      D2      D3
L1        10.10***    10.11    10.17     0.31    0.31    0.30
L2        9.21***     9.21     9.21      0.56    0.56    0.57
L3        9.71***     9.81     9.81      0.40    0.41    0.41
5 Conclusion
Constructing the tangency portfolio has economic importance: the mean-variance
efficient (MVE) portfolio is equivalent to the stochastic discount factor (SDF). The solution proposed by Markowitz (1952) to the MVE portfolio is challenging to implement
in practice for a large number of individual assets. Modern asset pricing relies on factor models to approximate the SDF using a small number of characteristics-managed
factors. However, these commonly used factors cannot fully span the mean-variance
efficient frontier of the entire asset universe, leading to the issue of a “factor zoo” and
the curse of dimensionality. The literature has not found clear-cut evidence of sparsity of characteristics, and Kozak and Nagel (2023) demonstrate that a large number
of characteristics are necessary to span the efficient frontier.
This paper proposes a parametric approach to estimating optimal portfolio weights
directly, bypassing the need to estimate mean returns and covariance using deep learning techniques with an economically motivated target. A divide-and-conquer strategy
estimates the tangency portfolio by combining a deep factor with the market factor.
Using high-dimensional firm characteristics, the endogenous deep factor construction
mimics the commonly used characteristic-sorted factor approach in empirical asset
pricing. The economically-guided deep factor plays two important roles: (i) it has a
low or even negative correlation with benchmark factors, providing a potential hedge
portfolio, and (ii) it may span any missing risk factors other than benchmark factors.
We apply our method to the corporate bond market. Our deep tangency portfolio outperforms those constructed using commonly used observable or latent factors,
with an out-of-sample annualized Sharpe ratio of 2.29. We further demonstrate that recently developed latent-factor models, such as RP-PCA and IPCA, cannot explain our deep factor
and deep tangency portfolio. Additionally, we emphasize the importance of considering various types of characteristics in constructing the deep tangency portfolio. Excluding any type of characteristics (equity, corporate bond, or options) would worsen
the performance of the deep tangency portfolio. This evidence contrasts
starkly with previous studies that suggest characteristics predicting equity returns
may not necessarily forecast corporate bond returns (see, e.g., Chordia et al., 2017;
Bali et al., 2021), but offers supporting evidence for the integration between the bond
and equity markets (see, e.g., Schaefer and Strebulaev, 2008; Kelly, Palhares, and Pruitt, 2022).
References
Ait-Sahalia, Y. and M. W. Brandt (2001). Variable selection for portfolio choice. Journal of
Finance 56, 1297–1351.
Avramov, D., S. Cheng, and L. Metzker (2023). Machine learning vs. economic restrictions:
Evidence from stock return predictability. Management Science 69(5), 2587–2619.
Bai, J., T. G. Bali, and Q. Wen (2019). Common risk factors in the cross-section of corporate
bond returns. Journal of Financial Economics 131(3), 619–642.
Bali, T. G., A. Goyal, D. Huang, F. Jiang, and Q. Wen (2021). The cross-sectional pricing of
corporate bonds using big data and machine learning. Technical report, Georgetown University.
Bali, T. G., A. Subrahmanyam, and Q. Wen (2021). Long-term reversals in the corporate bond
market. Journal of Financial Economics 139(2), 656–677.
Bao, J., J. Pan, and J. Wang (2011). The illiquidity of corporate bonds. Journal of Finance 66,
911–946.
Barillas, F., R. Kan, C. Robotti, and J. Shanken (2020). Model comparison with Sharpe ratios.
Journal of Financial and Quantitative Analysis 55(6), 1840–1874.
Barillas, F. and J. Shanken (2017). Which alpha? Review of Financial Studies 30(4), 1316–1338.
Barillas, F. and J. Shanken (2018). Comparing asset pricing models. Journal of Finance 73, 715–
754.
Bessembinder, H., K. Kahle, W. Maxwell, and D. Xu (2009). Measuring abnormal bond performance. Review of Financial Studies 22, 4219–4258.
Brandt, M. W. (1999). Estimating portfolio and consumption choice: A conditional Euler equations approach. Journal of Finance 54, 1609–1646.
Brandt, M. W. and P. Santa-Clara (2006). Dynamic portfolio selection by augmenting the asset
space. Journal of Finance 61(5), 2187–2217.
Brandt, M. W., P. Santa-Clara, and R. Valkanov (2009). Parametric portfolio policies: Exploiting
characteristics in the cross-section of equity returns. Review of Financial Studies 22, 3411–3447.
Cao, J., A. Goyal, X. Xiao, and X. Zhan (2022). Implied volatility changes and corporate bond
returns. Management Science, Forthcoming.
Chen, L., M. Pelger, and J. Zhu (2022). Deep learning in asset pricing. Management Science,
Forthcoming.
Choi, J. and Y. Kim (2018). Anomalies and market (dis)integration. Journal of Monetary Economics 100, 16–34.
Chordia, T., A. Goyal, Y. Nozawa, A. Subrahmanyam, and Q. Tong (2017). Are capital market anomalies common to equity and corporate bond markets? An empirical investigation.
Journal of Financial and Quantitative Analysis 52(4), 1301–1342.
Chung, K. H., J. Wang, and C. Wu (2019). Volatility and the cross-section of corporate bond
returns. Journal of Financial Economics 133(2), 397–417.
Cochrane, J. H. (2011). Presidential address: Discount rates. Journal of Finance 66(4), 1047–1108.
Cochrane, J. H. (2014). A mean-variance benchmark for intertemporal portfolio theory. Journal
of Finance 69, 1–49.
Cong, L. W., G. Feng, J. He, and X. He (2022). Asset pricing with panel tree under global split
criteria. Technical report, City University of Hong Kong.
Cong, L. W., G. Feng, J. He, and J. Li (2023). Sparse modeling under grouped heterogeneity
with an application to asset pricing. Technical report, City University of Hong Kong.
Daniel, K., L. Mota, S. Rottke, and T. Santos (2020). The cross-section of risks and returns.
Review of Financial Studies 33, 1927–1979.
DeMiguel, V., L. Garlappi, and R. Uppal (2009). Optimal versus naive diversification: How
inefficient is the 1/n portfolio strategy? Review of Financial Studies 22(5), 1915–1953.
DeMiguel, V., A. Martin-Utrera, F. J. Nogales, and R. Uppal (2020). A transaction-cost perspective on the multitude of firm characteristics. Review of Financial Studies 33(5), 2180–2222.
Dick-Nielsen, J. (2009). Liquidity biases in TRACE. Journal of Fixed Income 19(2), 43–55.
Dick-Nielsen, J. (2014). How to clean enhanced TRACE data. Technical report, Copenhagen
Business School.
Dickerson, A., P. Mueller, and C. Robotti (2023). Priced risk in corporate bonds. Journal of
Financial Economics 150, 103707.
Fama, E. F. and K. R. French (1992). The cross-section of expected stock returns. Journal of
Finance 47(2), 427–465.
Fama, E. F. and K. R. French (1993). Common risk factors in the returns on stocks and bonds.
Journal of Financial Economics 33(1), 3–56.
Fama, E. F. and K. R. French (1996). Multifactor explanations of asset pricing anomalies. Journal
of Finance 51(1), 55–84.
Fama, E. F. and K. R. French (2015). A five-factor asset pricing model. Journal of Financial
Economics 116(1), 1–22.
Fama, E. F. and K. R. French (2018). Choosing factors. Journal of Financial Economics 128(2),
234–252.
Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of
Political Economy 81(3), 607–636.
Feng, G., J. He, N. Polson, and J. Xu (2023). Deep Learning of Characteristics-Sorted Factor
Models. Journal of Financial and Quantitative Analysis, Forthcoming.
Freyberger, J., A. Neuhierl, and M. Weber (2020). Dissecting characteristics nonparametrically.
Review of Financial Studies 33, 2326–2377.
Giannone, D., M. Lenza, and G. E. Primiceri (2021). Economic predictions with big data: The
illusion of sparsity. Econometrica 89, 2409–2437.
Giglio, S., B. Kelly, and D. Xiu (2022). Factor models, machine learning, and asset pricing.
Annual Review of Financial Economics 14.
Gu, S., B. Kelly, and D. Xiu (2020). Empirical asset pricing via machine learning. Review of
Financial Studies 33, 2223–2273.
Gu, S., B. Kelly, and D. Xiu (2021). Autoencoder asset pricing models. Journal of Econometrics 222(1), 429–450.
Hansen, L. and R. Jagannathan (1991). Implications of security market data for models of
dynamic economies. Journal of Political Economy 99, 225–262.
Harvey, C. R., Y. Liu, and H. Zhu (2016). ... and the cross-section of expected returns. Review of
Financial Studies 29(1), 5–68.
He, X., G. Feng, J. Wang, and C. Wu (2021a). Benchmarking individual corporate bonds. Technical report, City University of Hong Kong.
He, X., G. Feng, J. Wang, and C. Wu (2021b). Predicting individual corporate bond returns.
Technical report, City University of Hong Kong.
Hou, K., C. Xue, and L. Zhang (2020). Replicating anomalies. Review of Financial Studies 33(5),
2019–2133.
Huang, T., L. Jiang, and J. Li (2023). Downside variance premium, firm fundamentals, and
expected corporate bond returns. Journal of Banking and Finance, Forthcoming.
Jostova, G., S. Nikolova, A. Philipov, and C. W. Stahel (2013). Momentum in corporate bond
returns. Review of Financial Studies 26(7), 1649–1693.
Kaniel, R., Z. Lin, M. Pelger, and S. Van Nieuwerburgh (2022). Machine-learning the skill of
mutual fund managers. Technical report, National Bureau of Economic Research.
Kelly, B. T., S. Malamud, and K. Zhou (2022). The virtue of complexity in return prediction.
Journal of Finance, Forthcoming.
Kelly, B. T., D. Palhares, and S. Pruitt (2022). Modeling corporate bond returns. Journal of
Finance, Forthcoming.
Kelly, B. T. and S. Pruitt (2022). Reconciling TRACE bond returns. Technical report, Yale University.
Kelly, B. T., S. Pruitt, and Y. Su (2019). Characteristics are covariances: A unified model of risk
and return. Journal of Financial Economics 134(3), 501–524.
Kozak, S. and S. Nagel (2023). When do cross-sectional asset pricing factors span the stochastic
discount factor? Technical report, University of Michigan.
Kozak, S., S. Nagel, and S. Santosh (2018). Interpreting factor models. Journal of Finance 73(3),
1183–1223.
Kozak, S., S. Nagel, and S. Santosh (2020). Shrinking the cross-section. Journal of Financial
Economics 135(2), 271–292.
Lettau, M. and M. Pelger (2020). Factors that fit the time series and cross-section of stock
returns. Review of Financial Studies 33, 2274–2325.
Lin, H., J. Wang, and C. Wu (2011). Liquidity risk and expected corporate bond returns. Journal
of Financial Economics 99, 628–650.
Lopez-Lira, A. and N. L. Roussanov (2020). Do common factors really explain the cross-section
of stock returns? Technical report, University of Pennsylvania.
Markowitz, H. (1952). Portfolio selection. Journal of Finance 7, 77–99.
Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, 867–887.
Merton, R. C. (1980). On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 8, 323–361.
Nagel, S. (2021). Machine Learning in Asset Pricing. Princeton University Press.
Neuhierl, A., X. Tang, R. Varneskov, and G. Zhou (2021). Option characteristics as cross-sectional predictors. Technical report, Washington University in St. Louis.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and
autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Newey, W. K. and K. D. West (1994). Automatic lag selection in covariance matrix estimation.
Review of Economic Studies 61(4), 631–653.
Raponi, V., R. Uppal, and P. Zaffaroni (2021). Robust portfolio choice. Technical report, Imperial
College Business School.
Schaefer, S. M. and I. Strebulaev (2008). Structural models of credit risk are useful: Evidence
from hedge ratios on corporate bonds. Journal of Financial Economics 90, 1–19.
Internet Appendix for
“Deep Tangency Portfolios”
(not for publication)
Summary of Contents
• Section I presents detailed definitions of bond, equity, and option characteristics
used in our empirical studies.
• Section II presents the implementation details of our deep learning algorithm
and some robustness checks.
• Section III presents the implementation details of four linear machine learning
algorithms.
I Bond, Equity, and Option Characteristics
Table A1: Description of 41 Bond Characteristics

Characteristics     Description
AGE                 Time since issuance in years
RATING              Bond credit rating
T2M                 The number of years to maturity
SIZE                Amount outstanding
DUR                 Bond duration
VAR5                Value-at-risk 5% over past 3 years
VAR10               Value-at-risk 10% over past 3 years
LIQ_BPW             Liquidity measure of transitory price movements
LIQ_ROLL            Roll’s liquidity
LIQ_P_HL            High-low spread estimator
LIQ_P_FHT           Modified illiquidity measure based on zero returns
LIQ_AMIHUD          Amihud liquidity
LIQ_STD_AMIHUD      Standard deviation of Amihud daily liquidity
LIQ_TC_IQR          Interquartile range
MKT_BETA            Market beta
DEF_BETA            DEF factor beta
TERM_BETA           TERM factor beta
LIQ_BETA            Liquidity beta of bond illiquidity factor
DRF_BETA            Downside risk beta controlling bond market factor
CRF_BETA            Credit risk beta controlling bond market factor
LRF_BETA            Liquidity risk beta controlling bond market factor
VIX_BETA            VIX index beta
UNC_BETA            Macroeconomic uncertainty beta
STR                 Short-term reversal t-1
VARIANCE            Variance of raw returns
SKEW                Skewness of raw returns
KURT                Kurtosis of raw returns
COSKEW              Systematic skewness with bond market
ISKEW               Idiosyncratic skewness
LIQ_RANGE           Simple high-low spread
LIQ_TRADE           Number of trades
MKT_RVAR            Market residual variance
TERM_DEF_RVAR       TERM DEF residual variance
TURN                Bond turnover
YTM                 Yield-to-maturity
MOM6                Momentum from t-2 to t-6
MOM12               Momentum from t-7 to t-12
LTR                 Long-term reversal from t-13 to t-48
barQ                Average daily dollar volume in the 1-month period
std_barQ_1mom       Standard deviation of dollar volume in the 1-month period
LIQ_RANGE_M         Simple high-low spread
Table A2: Description of 61 Equity Characteristics

Characteristics     Description
ABR                 Abnormal returns around earnings announcement
ACC                 Operating accruals
ADM                 Advertising expense-to-market
AGR                 Asset growth
ALM                 Quarterly asset liquidity
ATO                 Asset turnover
BASPREAD            Bid-ask spread (3 months)
BETA                Beta (3 months)
BM                  Book-to-market equity
BM_IA               Industry-adjusted book to market
CASH                Cash holdings
CASHDEBT            Cash to debt
CFP                 Cashflow-to-price
CHCSHO              Change in shares outstanding
CHPM                Industry-adjusted change in profit margin
CHTX                Change in tax expense
CINVEST             Corporate investment
DEPR                Depreciation / PP&E
DOLVOL              Dollar trading volume
DY                  Dividend yield
EP                  Earnings-to-price
GMA                 Gross profitability
GRLTNOA             Growth in long-term net operating assets
HERF                Industry sales concentration
HIRE                Employee growth rate
ILL                 Illiquidity rolling (3 months)
LEV                 Leverage
LGR                 Growth in long-term debt
MAXRET              Maximum daily returns (3 months)
ME                  Market equity
ME_IA               Industry-adjusted size
MOM12M              Cumulative returns in the past (2-12) months
MOM1M               Previous month return
MOM36M              Cumulative returns in the past (13-35) months
MOM60M              Cumulative returns in the past (13-60) months
MOM6M               Cumulative returns in the past (2-6) months
NI                  Net equity issue
NINCR               Number of earnings increases
NOA                 Net operating assets
OP                  Operating profitability
PCTACC              Percent operating accruals
PM                  Profit margin
PS                  Performance score
RD_SALE             R&D to sales
RDM                 R&D expense-to-market
RE                  Revisions in analysts’ earnings forecasts
RNA                 Return on net operating assets
ROA                 Return on assets
ROE                 Return on equity
RSUP                Revenue surprise
RVAR_CAPM           Residual variance - CAPM (3 months)
RVAR_FF3            Residual variance - Fama-French 3 factors (3 months)
RVAR_MEAN           Return variance (3 months)
SEAS1A              1-year seasonality
SGR                 Sales growth
SP                  Sales-to-price
STD_DOLVOL          Std. of dollar trading volume (3 months)
STD_TURN            Std. of share turnover (3 months)
SUE                 Unexpected quarterly earnings
TURN                Shares turnover
ZEROTRADE           Number of zero-trading days (3 months)
Table A3: Description of 30 Equity Option Characteristics

Characteristics     Description
IVSLOPE             Implied volatility slope
IVVOL               Volatility of ATM implied volatility
IVRV                Implied and historical volatility spread
IVRV_RATIO          Ratio of implied to historical volatility
ATM_CIVPIV          Implied volatility spread
SKEWIV              Implied volatility skew
IVD                 Implied volatility duration
DCIV                Change of implied volatility of ATM call
DPIV                Change of implied volatility of ATM put
ATM-DCIVPIV         Change of implied volatility spread
NOPT                Number of traded options
SO                  Stock-option volume ratio
DSO                 Stock-option dollar volume ratio
VOL                 Option trading volume
PCRATIO             Put-call ratio
PBA                 Proportional bid-ask spread
TOI                 Total open interest
MFVU                Option-implied upside semivariance
MFVD                Option-implied downside semivariance
RNS1M               1-month risk-neutral skewness
RNK1M               1-month risk-neutral kurtosis
IVARUD30            Option-implied variance asymmetry
RNS3M               3-month risk-neutral skewness
RNK3M               3-month risk-neutral kurtosis
RNS6M               6-month risk-neutral skewness
RNK6M               6-month risk-neutral kurtosis
RNS9M               9-month risk-neutral skewness
RNK9M               9-month risk-neutral kurtosis
RNS12M              12-month risk-neutral skewness
RNK12M              12-month risk-neutral kurtosis
II Implementation Details
II.1 Data Cleaning
Table 1 provides the summary statistics for all our bond-month
observations (3,200 bonds per month). Since the raw bond return data start in
July 2002 and we have a three-year window requiring a minimum of one year of data
to initialize risk characteristics, we start with the data from June 2004. After standardizing the raw characteristics, we filter out bonds whose maturity is less than two
years and impute the missing values with 0. The equity and option characteristics
are merged into the bond data by PERMNO. If a firm has multiple stocks outstanding
at the same time, we keep and merge only the earliest issued stock.
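The standardize-then-impute step can be sketched as follows. The sketch assumes a cross-sectional z-score as the standardization, which this appendix does not specify (a rank-based normalization would work the same way); the function name `standardize_and_impute` and the toy data are hypothetical.

```python
import numpy as np

def standardize_and_impute(x):
    """Standardize one month's characteristics, then fill missing values with 0.

    x: (N, K) array of raw characteristics, with NaN marking missing entries.
    Each column is z-scored using its non-missing values (an assumed choice
    of standardization), so a fill value of 0 equals the cross-sectional mean.
    """
    mu = np.nanmean(x, axis=0)
    sd = np.nanstd(x, axis=0)
    z = (x - mu) / sd
    return np.where(np.isnan(z), 0.0, z)

# Toy demo: four bonds, two characteristics, two missing entries
x = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0],
              [np.nan, 50.0]])
z = standardize_and_impute(x)
```

Because imputation happens after standardization, a filled value of 0 places the bond at the cross-sectional average of that characteristic, and the column mean stays at zero.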
II.2 Model Training
We divide the TRACE dataset into two parts: we perform all the training on the
sample from July 2004 to June 2014 and test on the sample from July 2014 to December
2020. We train the neural network structure and parameters during the in-sample
period and determine the mean-variance portfolio’s weights based on the in-sample
factors’ statistics. To limit the effect of outliers during training, we also winsorize the
in-sample bond returns each month within the bounds ret_{low,t} and ret_{up,t}, which
are the cross-sectional 2.5% and 97.5% quantiles of month t, respectively.
Each month, bond returns below ret_{low,t} are set to ret_{low,t}, and bond returns above
ret_{up,t} are set to ret_{up,t}. It is important to emphasize that winsorization is applied
only to in-sample data. All out-of-sample results
are tested on non-winsorized data with potential extreme values.
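A minimal sketch of the monthly cross-sectional winsorization described above follows. The record layout `(month, bond_id, ret)` and the linearly interpolated quantile are assumptions made for illustration; the source does not state which quantile convention is used.

```python
from collections import defaultdict


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an ascending list, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize_by_month(rows, lower=0.025, upper=0.975):
    """Clip each return to its own month's cross-sectional quantile bounds.

    rows: list of (month, bond_id, ret) tuples from the in-sample period only.
    Returns a new list in the same order with clipped returns.
    """
    by_month = defaultdict(list)
    for month, _, ret in rows:
        by_month[month].append(ret)
    bounds = {}
    for month, rets in by_month.items():
        rets.sort()
        bounds[month] = (quantile(rets, lower), quantile(rets, upper))
    return [
        (month, bond, min(max(ret, bounds[month][0]), bounds[month][1]))
        for month, bond, ret in rows
    ]
```

Only the training records would be passed to `winsorize_by_month`; out-of-sample returns are left untouched, as the text emphasizes.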
For the factor values, the bond market factor is the equally weighted average of the
excess returns of the 3200 bonds selected each month. We replicate the BBW
factors using the methodology outlined in Bai et al. (2019) with our corporate bond
dataset. For the IPCA/RP-PCA model, we train the PCA structure using the balanced
individual bond returns from July 2004 to June 2014 and output the factor values for
both in-sample and out-of-sample periods based on this trained structure. Similar to
the IPCA model, our neural network is trained on the individual bond data from July 2004 to
June 2014, and the tuning parameters are chosen
through a two-fold cross-validation process as shown in Table
A2.
Figure A1: Softmax Ranking: a1 = 50, and a2 = 8
In implementation, we choose a1 = 50 and a2 = 8 in Equation (17) such that at
each time, about 50% to 70% of assets are in the middle rank and have zero weights,
similar to the traditional sorting procedure (see Figure A1). We fix the network
structure as trained, then feed in the pairwise characteristics data and return data to
generate the in-sample and out-of-sample estimates. Throughout Section II, all presented
results are computed with PyTorch 1.10.2 and are parallelized across a server
with 96 Intel(R) Xeon(R) Gold 6230 @ 2.10GHz CPUs and 314 GB of RAM.
II.3
Robustness to Tuning Parameters Selection
Figure A2: Two-Fold Cross Validation
This figure demonstrates the deterministic two-fold cross-validation scheme. We determine the tuning
parameters using the sample from July 2004 to June 2014. Specifically, the deterministic design divides
the sample into two consecutive fold samples. We train our neural network on one fold and then
calculate the fitting result with different tuning parameters on the other. We average the out-of-sample
loss and choose the parameter pair with the best performance on this criterion.
               July 2004 to June 2009    July 2009 to June 2014    July 2014 to December 2020
First Fold     Train                     Validation                Holdout
Second Fold    Validation                Train                     Holdout
In this subsection, we outline our procedure for selecting tuning parameters. To determine
the optimal tuning parameters for our network, we employ a two-fold cross-validation
approach (in Figure A2) as follows: We divide the sample period from July
2004 to June 2014 into two consecutive fold samples of equal length. We train our
neural network on one fold and evaluate the fitted results using different tuning parameters
on the other fold. We compute the average loss from the validation samples
and select the parameter pair resulting in the smallest loss. Finally, we retrain the
model using the chosen tuning parameters. This approach ensures that the data used
is completely in-sample, eliminating any look-ahead bias that may affect our out-of-sample
trading results discussed in the main text. Initially, we start with a reasonable
set of tuning parameters and test additional points adjacent to these sets. We evaluate
a total of 16 combinations of tuning parameters, which are detailed in Table A4.
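The selection procedure can be sketched as a small grid search over two consecutive folds. The callables `fit` and `loss` are hypothetical placeholders for training the network and evaluating its validation loss; neither the function names nor the grid layout come from the source.

```python
from itertools import product


def two_fold_select(months, grid, fit, loss):
    """Pick the tuning parameters with the smallest average validation loss.

    months: chronologically ordered in-sample months.
    grid: dict mapping a parameter name to its list of candidate values.
    fit(train_months, params) -> model        (hypothetical training routine)
    loss(model, validation_months) -> float   (hypothetical validation loss)
    """
    half = len(months) // 2
    first, second = months[:half], months[half:]
    # Each fold serves once as the training sample and once as the validation sample.
    folds = [(first, second), (second, first)]
    names = sorted(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        avg = sum(loss(fit(train, params), val) for train, val in folds) / len(folds)
        if avg < best_loss:
            best_params, best_loss = params, avg
    return best_params, best_loss
```

After selection, the model would be retrained on the full in-sample period with `best_params`, so the holdout period never enters the tuning step.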
II.4
Alternative Sample Split
We design a complete out-of-sample test to verify the performance of the tangency
portfolio in the chronological sample set.9 Inspired by Kaniel et al. (2022), we sequentially
randomly split the dataset into three parts following Fama and French (2018).
Thus, every quarter (3 months), the three months of data are randomly assigned to a
specific group of datasets. We keep this random split until the data for every month
have been assigned to a group. Figure A3 shows how we split the full dataset
into three groups, where different colors denote the three folds. We then implement
the same process as in subsection 2.2 to construct the neural network and apply three-fold
validation to the sub-group data: for a specific test fold, the other two folds are
used to estimate and validate the parameters. By cross-estimating the parameters over the
three folds, we obtain an out-of-sample result on the whole sample period.

9 In this section, the data we use differ slightly from the main text: certain corporate bond
characteristics are obtained based on the BBW factors in Bai et al. (2019). Here, we directly
incorporate their factor values into our data, rather than calculating them on our dataset as described
in the main text.

Table A4: Tuning parameter selection in the empirical analysis
This table presents the network tuning parameters, evaluated by the Sharpe ratio observed on our
validation data.

Notation      Tuning Parameters                      Candidates                                   Chosen
a1            Value in equation (17)                 50                                           50
a2            Value in equation (17)                 5, 8                                         8
HDN           Number of nodes in the hidden layer    66                                           66
BTCH          Batch size, in months                  120                                          120
LR            Learning rate                          1e-3, 5*1e-3, 1e-2, 5*1e-2, 1e-1, 5*1e-1     5*1e-2
L1 Penalty    L1 penalty in objective function       1e-9, 1e-8, 1e-7, 1e-6                       1e-8
L2 Penalty    L2 penalty in objective function       1e-9, 1e-8, 1e-7, 1e-6                       1e-8
EPCH          Number of optimization epochs          400                                          400
OPT           Optimization method                    Adam                                         Adam

Figure A3: Market time series for the different cross-out-of-sample folds
This figure plots the bond market return from July 2014 to December 2020. Different colors
denote the three cross-out-of-sample folds we use throughout the robustness check.
[Figure: time series of the bond market return; legend: dataset_1, dataset_2, dataset_3; x-axis: 2004 to 2020]

Table A5: Performance on Randomly Split Datasets
We report the out-of-sample performance of our model on the whole time period (2014.07
to 2020.12) based on the sequentially three-fold random splitting dataset. Panel
A reports the descriptive statistics in percentage containing the mean of return, Newey West
standard error (Newey and West, 1994), adjusted t-statistics, standard deviation (Std), Sharpe
Ratio (SR), and past twelve months’ Maximal Drawdown (Max DD) of whole period out-of-sample
deep portfolios and tangency portfolios. Panel B compares the Sharpe Ratio of the MVE
portfolio between the market factor and adding one deep factor (from 1- to 3-layer models).

Panel A. Descriptive Statistics
          Mean    tstat     Std     SR      Max DD
Rd1       0.39    (3.34)    1.06    1.27    1.61
Rd2       0.21    (4.56)    1.08    0.66    11.76
Rd3       0.36    (4.32)    0.82    1.51    2.18
R1opt     0.41    (4.66)    1.03    1.38    0.50
R2opt     0.26    (5.17)    0.65    1.41    6.29
R3opt     0.36    (5.18)    0.71    1.75    1.40

Panel B. Out of Sample Sharpe Ratios
MKTC      L1         L2         L3
0.80      1.38***    1.41***    1.75***
Panel A of Table A5 lists the descriptive statistics of the out-of-sample deep portfolio
on the entire time period. The 3-layer model generates the best deep portfolio with
the highest Sharpe Ratio (1.51), followed by the 1-layer model (1.27). The Sharpe Ratio
of the tangency portfolios shows significant improvement compared to the bond market.
The results remain robust when subjected to cross-out-of-sample analysis with
sequentially random sampling, indicating that our sample selection does not influence
the main results.
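One possible reading of the sequential random split is sketched below: within each consecutive three-month block, the months are randomly distributed across the three groups, so every group receives one month per quarter. This interpretation, the function name, and the fixed seed are assumptions; the text could also be read as assigning each whole quarter to a single group.

```python
import random


def sequential_random_split(months, n_groups=3, seed=0):
    """Assign chronologically ordered months to groups, one quarter at a time.

    Assumption: within each block of n_groups consecutive months, the months
    are randomly permuted across the groups.
    Returns a dict mapping each month to a group index in 0..n_groups-1.
    """
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(months), n_groups):
        block = months[start:start + n_groups]
        groups = list(range(n_groups))
        rng.shuffle(groups)
        for month, group in zip(block, groups):
            assignment[month] = group
    return assignment
```

With such an assignment, each group in turn serves as the test fold while the other two are used to estimate and validate the parameters, which yields out-of-sample estimates for every month of the sample.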
III
Machine Learning Implementations
In section 4.5.2, we present a long-short portfolio based on predicted bond returns
from four machine learning methods: Lasso, Ridge, principal component analysis regression (PCA), and partial least squares (PLS). Our implementation follows He, Feng,
Wang, and Wu (2021b), and we find consistently positive performance.
PCA and PLS are classic dimension reduction techniques commonly used in empirical
asset pricing. They address the bias-variance trade-off by building the predictive model
on a low-dimensional set of linearly transformed predictors.
Both follow a two-step procedure: the first step condenses the predictors
into a small set of linear combinations that best preserve the covariance structure,
and the second step uses the first K components in a multiple regression. For our
applications, K is set to 5 for both PCA and PLS.
Lasso and ridge are linear predictive regressions used in machine learning finance
to preserve the interpretability of linear models. They add a penalty to ordinary linear
regression and keep the original predictors without transforming them. Lasso and ridge
share similar loss functions but have different regularization effects. Lasso performs
variable selection, while ridge shrinks the regression coefficients of useless predictors
to very small numbers. A tuning parameter controls the penalty weight, with a
larger penalty weight imposing more shrinkage on the coefficients. A three-fold
cross-validation determines all tuning parameters.
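For reference, the standard textbook forms of the two penalized objectives are written out below. The notation is assumed for illustration: r_{i,t+1} denotes the next-period bond return, z_{i,t} the vector of P predictors, and \lambda the penalty weight; the source does not spell out its exact specification.

```latex
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \sum_{i,t} \left( r_{i,t+1} - z_{i,t}^{\top}\beta \right)^{2}
    + \lambda \sum_{j=1}^{P} \beta_{j}^{2},
\qquad
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{i,t} \left( r_{i,t+1} - z_{i,t}^{\top}\beta \right)^{2}
    + \lambda \sum_{j=1}^{P} \left| \beta_{j} \right| .
```

The two objectives share the squared-error term and differ only in the penalty: the absolute-value penalty can set coefficients exactly to zero, which is the variable selection property of lasso, whereas the squared penalty shrinks coefficients toward zero without eliminating them.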