Axioma Robust™ Risk Model Version 4 Handbook

DRAFT

October 2017
Contents

1 What is Risk?

2 Why a Multi-Factor Model?
  2.1 The Three Types of Factor Model

3 Model Factors
  3.1 Market Factor or Intercept
  3.2 Industry Factors
  3.3 Country Factors
  3.4 Style Factors
    3.4.1 Standardisation of Style Factors
    3.4.2 Stability of Style Factor Exposures
    3.4.3 Exposures for Assets with Insufficient Data (IPOs)
    3.4.4 Exposures for Assets with Missing Fundamental Data

4 Estimation Universe

5 The Cross-Sectional Returns Model
  5.1 The Least-Squares Regression Solution
    5.1.1 Assumptions of the Least-Squares Solution
  5.2 Solving the Problem of Heteroskedasticity
  5.3 Outliers in the Regression
  5.4 Robust Regression
  5.5 Regression Statistics
  5.6 Thin Factors
    5.6.1 Treatment of Thin Factors
  5.7 The Problem of Multicollinearity: Binary Factors
  5.8 The Problem of Multicollinearity: Style Factors
  5.9 Currency Factors
  5.10 Factor Mimicking Portfolios
    5.10.1 Constraints in the Regression

6 Common Model Issues
  6.1 Model Error: Specification versus Noise
  6.2 Extreme Data
  6.3 Missing Data
    6.3.1 A Non-Trading Day Problem
    6.3.2 An Illiquid Asset Problem
  6.4 Solutions

7 Statistical Approaches
  7.1 Principal Components
  7.2 Asymptotic Principal Components
  7.3 Expanded History
  7.4 Model Statistics

8 The Risk Model
  8.1 Factor Covariance Matrix
  8.2 Exponential Weighting
  8.3 Autocorrelation in the Factor Returns
  8.4 Combining Covariance Matrices
    8.4.1 The Procrustes Transform
  8.5 Market Asynchronicity
    8.5.1 Synchronizing the Markets
    8.5.2 Non-Trading Days and Returns Timing

9 Modelling Currency Risk
  9.1 Currency Covariances
  9.2 Numeraire Transformation

10 Specific Risk Calculation

11 Predicted versus Realised Betas

12 Modeling of DRs and Cross-Listings
  12.1 Modeling DR Returns
  12.2 Currency Exposure
  12.3 Market Exposure
  12.4 Issuer Specific Covariance
  12.5 Cointegration
  12.6 Some Assets Are More Cointegrated than Others
  12.7 Treatment of Style Exposures
  12.8 The Cointegration Algorithm for Specific Returns
  12.9 The Final Bit

A Appendix: Constrained Regression
  A.1 Exact Solution
  A.2 Dummy Asset Approach
  A.3 Conclusion

B DRs and Returns-Timing
1 What is Risk?
There is no single answer to the question, "what is the risk of a given asset or portfolio?". Economists, for example, typically associate risk with abstract notions of individual preference, whereas financial regulators may prefer a measure such as Value-at-Risk (VaR). And then there is the whole vexed debate of risk versus uncertainty. We won't wade into those waters here. Axioma defines risk as the standard deviation of an asset's return over time. This statistical definition is straightforward, broadly applicable, and intuitive. An asset whose return varies wildly over time is volatile, and therefore risky; another whose return remains small in magnitude is relatively predictable, and thus less risky.
Throughout the following discussion, r_{i,t} will represent the return to an asset i at time t:

    r_{i,t} = \frac{p_{i,t} + d_{i,t} - p_{i,t-1}}{p_{i,t-1}} - r_{rf,t}
where pi,t is the asset’s price at time t, di,t is any dividend payout at time t, and pi,t−1 is the price at
the previous time period, adjusted for any corporate actions (e.g. stock splits). The time increment
t used could be days, weeks, months, or any other period. The quantity rrf,t is the risk-free rate
— the return to some minimal risk entity (e.g. LIBOR rate). Returns net of the risk-free rate are
termed excess returns and are usually used for the purposes of risk modeling. Henceforth, unless
stated otherwise, the term return will be used to denote excess return.
The risk of an asset over T time periods is thus given by the square root of the sample variance of the asset returns (i.e. their standard deviation):

    \sigma_i = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (r_{i,t} - \bar{r}_i)^2}

where \bar{r}_i is the asset's mean return over time.
Investors tend to think in terms of portfolios of assets rather than individual assets in isolation.
A portfolio allows for diversification and risk reduction, as illustrated by a simple example: consider
a portfolio of two risky assets A and B, with weights wA and wB . The risk of this portfolio is
    \sigma(w_A r_A + w_B r_B) = \sqrt{w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2\rho_{AB} w_A w_B \sigma_A \sigma_B}
where −1 ≤ ρAB ≤ 1 is the correlation coefficient between the two assets. It can be shown that
σ(wA rA + wB rB ) ≤ wA σA + wB σB
with equality if ρAB = 1. In other words, the risk of the portfolio of two assets will tend to be lower
than the sum of the risks of the individual assets. This is the key to portfolio diversification. When
analyzing portfolio risk, it is therefore necessary to know not only the risk of each asset involved,
but also the interplay or co-movement of each asset with every other. This information is contained
in a covariance matrix of asset returns:

    Q = \begin{pmatrix}
    \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 & \cdots & \rho_{1n}\sigma_1\sigma_n \\
    \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 & \cdots & \rho_{2n}\sigma_2\sigma_n \\
    \rho_{31}\sigma_3\sigma_1 & \rho_{32}\sigma_3\sigma_2 & \sigma_3^2 & \cdots & \rho_{3n}\sigma_3\sigma_n \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \rho_{n3}\sigma_n\sigma_3 & \cdots & \sigma_n^2
    \end{pmatrix}
Ultimately, this is all a risk model is: a big covariance matrix1, describing the relationships of the assets to one another.
The risk of a portfolio h, where

    h = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}

and w_i, i = 1, \ldots, n are the weights, or holdings, of each asset, is then computed as

    \sigma_h = \sqrt{h^T Q h}
1 Note that the matrix is symmetric and positive-semidefinite.
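The calculations above can be sketched in a few lines of Python. The volatilities, correlations and weights below are invented purely for illustration, not taken from any Axioma model:

```python
import numpy as np

# Illustrative three-asset example: volatilities, correlations and
# holdings are made-up numbers, not model output.
vols = np.array([0.20, 0.15, 0.25])      # asset volatilities (sigma_i)
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])       # correlations (rho_ij)

# Covariance matrix: Q_ij = rho_ij * sigma_i * sigma_j
Q = np.outer(vols, vols) * corr

h = np.array([0.5, 0.3, 0.2])            # portfolio holdings
sigma_h = np.sqrt(h @ Q @ h)             # sigma_h = sqrt(h' Q h)

# Diversification: portfolio risk is at most the weighted sum of asset risks
assert sigma_h <= h @ vols
```

The final assertion is the diversification inequality from the two-asset case above, which carries over to any number of assets.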
2 Why a Multi-Factor Model?
Possessing an accurate estimate of the asset returns covariance matrix is the sine qua non of portfolio
risk management. The obvious question is: how does one calculate such a matrix in practice? The
obvious solution would seem to be to build a history of asset returns and then calculate the sample
variances and covariances directly from this.
Computing sample statistics directly from historical data, however, is fraught with danger. Historical returns are typically noisy; even in the absence of actual data errors, false signals and spurious relationships abound. Two assets may appear closely related when their seemingly-correlated
behavior is in fact an artifact of data-mining.
Weak signals and noise aside, when a new asset enters the existing asset universe there is no
reliable way of calculating its relationships with the other assets, or even its own volatility, as it does
not yet possess a returns history. One could attempt to construct proxies, to mimic its assumed
behaviour, but such an approach has no guarantee of success.
Finally, and possibly most importantly, at least as many observations as assets are required to estimate all the variances and covariances directly. For any realistic number of assets, it is extremely unlikely that sufficient observations exist. Even with a universe of only 100 assets, 1/2 · 100 · (100 + 1) = 5,050 distinct variances and covariances need to be estimated. For markets like the U.S. (over 12,000 assets), this becomes completely infeasible.
Any one of the above problems is sufficient reason not to construct an asset returns covariance matrix directly. A better approach is to reduce the amount of information to be estimated.
We achieve this by imposing structure on the asset returns: we hypothesise that at least some part
of any asset’s return is driven by a relatively small set of common factors. These factors influence
the returns of some or all of the assets in the market, and each asset will be exposed to each factor
to some extent. Asset returns can then be modeled as a function of a relatively small number
of parameters, and the estimation of hundreds of thousands of asset variances and covariances is
reduced to calculating, at most, a few thousand.
Factors used in multi-factor models fall into several broad categories:
• Fundamental factors
– Industry and country factors reflect a company’s line of business and country of domicile.
– Style factors encapsulate the financial characteristics of an asset — a company’s size,
debt levels, liquidity, etc. They are usually calculated from a mixture of market and
fundamental (i.e. balance sheet) data.
– Currency factors represent the interplay between local currencies of the various assets
within the model.
– Macroeconomic factors capture an asset’s sensitivity to variables such as GNP growth,
bond yields, inflation, etc.
• Statistical factors are mathematical constructs responsible for the observed correlations in asset returns. They are not directly connected to any observable real-world phenomena, and are not necessarily continuous from one period to the next.
An asset's return is decomposed into a portion driven by these factors (common factor return)
and a residual component (specific return), producing the following model at time t:

    \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} =
    \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ b_{21} & & b_{2m} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nm} \end{pmatrix}
    \begin{pmatrix} f_1 \\ \vdots \\ f_m \end{pmatrix} +
    \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
or more succinctly, in matrix form:
r = Bf + u
where r is the vector of asset returns at time t, f the vector of factor returns, and u, the set of
asset specific returns. B is the n × m exposure matrix: its elements denote each asset’s exposure/sensitivity/beta/loading to a particular factor.
Theoretical foundations for this model specification make up the Arbitrage Pricing Theory
(APT), proposed by Ross (1976) as a generalization of the traditional Capital Asset Pricing Model
to allow for multiple risk factors. The APT is therefore a logical starting point for building a factor
risk model.
As an example, consider a simple model with four assets, a, b, c and d, and three factors — two
industry factors, IT and Banking, and one style factor — size, which takes values of {−1, 0, 1} to
represent small, mid-cap and large companies respectively2. Assets a and b are IT companies, while
c and d are banks. a and b are small-cap, c is mid-cap and d is large-cap. The exposure matrix may
look like this:

    B = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}

where the rows correspond to assets a, b, c, d, and the columns to the IT, Banking and Size factors.
The objects r and B are known, so the system of equations can be solved for f and u.
This process is repeated over time to build two time series — one of factor returns and another
of asset specific returns. Given that there is a comparatively small number of factors, it is feasible to
estimate a factor covariance matrix directly from the factor returns time series. Moreover, assuming that asset specific returns are uncorrelated both amongst themselves and with factor returns3, the n × n asset returns covariance matrix may be computed as:
    var(r) = var(Bf + u)
    Q = B V B^T + \Delta^2
where V is the m × m factor covariance matrix and ∆2 is the diagonal matrix of specific variances.
In essence, the multi-factor model is a dimension reduction tool, simplifying the problem of
calculating an n × n asset returns covariance matrix into calculating the variances and covariances
of a much smaller number of factors, and n specific variances.
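A minimal numerical sketch of this dimension reduction, using random stand-ins for the exposures, returns and factor history (none of it real model data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3                              # toy sizes: 6 assets, 3 factors

# Exposure matrix B would be pre-computed from fundamentals; here it is random
B = rng.normal(size=(n, m))
f_true = rng.normal(scale=0.01, size=m)  # unobserved "true" factor returns
u_true = rng.normal(scale=0.02, size=n)  # unobserved specific returns
r = B @ f_true + u_true                  # observed asset returns, r = Bf + u

# With r and B known, estimate f by least squares; u is the residual
f_hat, *_ = np.linalg.lstsq(B, r, rcond=None)
u_hat = r - B @ f_hat

# Repeating this over T periods yields a factor-return history, from which
# the m x m factor covariance V is estimated; specific variances fill Delta^2
F = rng.normal(scale=0.01, size=(250, m))     # stand-in factor-return history
V = np.cov(F, rowvar=False)
spec_var = rng.uniform(1e-4, 4e-4, size=n)    # stand-in specific variances
Q = B @ V @ B.T + np.diag(spec_var)           # Q = B V B' + Delta^2

assert Q.shape == (n, n)
```

Only m(m + 1)/2 factor covariances and n specific variances are estimated, rather than n(n + 1)/2 asset covariances.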
2.1 The Three Types of Factor Model
We define our general model of returns thus:
r = Bf + u.
2 In reality, style exposures usually take a continuum of values rather than simple scores as above.
3 Note that this is arguably the single most important assumption in the entire theory of risk models.
This raises the obvious question: how do we compute the various components of the model? The asset returns r are the only part of the model known in advance with 100% certainty (and even this is questionable when we consider illiquid assets). There are, broadly speaking, three approaches:
1. Fundamental models. These are models where we pre-compute the exposures to the factors,
B at each point in time, and then estimate the factor returns f cross-sectionally.
2. Statistical models. For these, we simultaneously estimate exposures and a history of factor
returns using a corresponding history of asset returns.
3. Macroeconomic models. Here, we begin with time-series of factor returns derived from macroeconomic indicators such as oil prices, employment data, etc. We then estimate our exposures as sensitivities to these time-series.
All model approaches have their particular strengths and weaknesses. In this handbook we tend to
restrict ourselves to the first two such approaches, as the third is dealt with elsewhere in Axioma
(2013).
In the following sections we discuss in greater detail the various parts of a risk model and
the stages in its construction. We begin with detailing how we compute the exposures for our
fundamental models, as these are the foundations upon which the rest of the model stands.
Interested readers may wish to consider Grinold and Kahn (1995) or Zangari (2003) for a more
complete exposition on factor risk models and their applications.
3 Model Factors
The factors we identify and combine into our model are arguably the most critical part of the risk
model. Every asset must have exposures to its set of model factors, estimated via a number of
approaches. We here detail the types of factors and the processes whereby they are computed.
3.1 Market Factor or Intercept
Many models include a single factor that encompasses the overall market behaviour, and which is
typically by far the most statistically significant of all model factors. This could take the form of
the following:
1. Simple Intercept: Every asset has exposure of one to the market and the factor return can be
thought of as the regression intercept term.
2. Market Beta: The assumption behind the simple intercept is that each asset has a beta of one
to the market. This is obviously untrue, and so we instead compute an asset’s market exposure
as its historical beta against a suitable market return (from a benchmark for instance), viz.
    r_{i,t} = \beta_i r_{M,t} + \alpha_{i,t}, \quad i = 1, \ldots, n

where r_{M,t} is the return of the market benchmark index.
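As a sketch of the second option, the historical beta is simply the OLS slope of the asset's returns on the benchmark's returns. The simulated return series below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 250

# Simulated daily returns; in practice these would be the benchmark's
# and the asset's actual (excess) return histories
r_mkt = rng.normal(0.0, 0.01, size=T)
r_asset = 1.2 * r_mkt + rng.normal(0.0, 0.015, size=T)   # true beta = 1.2

# Historical beta: the OLS slope of r_i on r_M (with intercept),
# i.e. cov(r_i, r_M) / var(r_M)
c = np.cov(r_asset, r_mkt)
beta_hat = c[0, 1] / c[1, 1]
```

With 250 noisy observations, beta_hat recovers the true slope only approximately, which hints at the estimation noise discussed later.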
3.2 Industry Factors
Returns of assets belonging to companies with similar lines of business often move together. Industry
factors are formed from a set of mappings from each asset to one or more industries within the
market. The mapping used can be a proprietary third-party scheme such as GICS or ICB, or a
modification thereof, consolidating, splitting, or discarding industries where appropriate.
Alternatively, one could create a new scheme altogether by investigating company fundamentals,
or by statistical analysis of asset returns. Custom-tailored industry schemes may provide marginally
more explanatory power but require intensive work to design. Wherever possible, Axioma prefers
industry factor structures that correspond directly to classifications widely accepted among the
investment community, particularly if an industry scheme is already a de-facto market standard.
There are two main ways in which one may define industry exposures:
1. As binary variables: an asset has unit exposure to the industry corresponding to its company’s
main line of business, and zero to other industries.
2. As fractional values. This enables one to assign multiple industry exposures per asset. One
could attempt to compute the exposures via:
• A sensitivity or beta, using a time-series regression.
• Fundamentals: e.g. each exposure being a fraction representing the proportion of activity
within that industry.
The problem with the first approach is the vast number of false positives and negatives that would inevitably arise, together with the issue of insufficient or illiquid asset histories. The problem with the second is that for many markets this level of detail is not readily available, and researching this information, especially in less-developed markets, is a challenging problem, to say the least.
Currently, we follow the first approach, for its simplicity and robustness. While we do not rule out some form of the second approach in the future, there is little or no hard evidence that more complex industry schemes lead to measurably better models than the binary approach.
An important consideration in designing an industry factor scheme is avoiding thin industries
within the model. A thin industry contains few or no assets, or has most of its market capitalisation
concentrated in a small handful of assets. We will have more to say on this later.
3.3 Country Factors
A risk model covering more than one market needs to distinguish between assets' countries of incorporation4. These factors are similar in flavour to the industry factors, and the types of country exposure include:
1. As binary variables: an asset has unit exposure to its country of incorporation, zero otherwise.
2. As fractional values. Again, one could attempt to compute the exposures via:
• A sensitivity or beta, using a time-series regression versus one or more country market returns, for instance as β_i, where

    r_{i,t} = \beta_i r_{M,t} + \alpha_{i,t}, \quad i = 1, \ldots, n

where r_{M,t} are the returns of a local benchmark index.
• Fundamentals: e.g. each exposure being a fraction representing a company’s proportion
of sales within each country, or its assets therein, for instance.
For much the same reasons as for industries, our current approach is the first of the above. Binary variables have the advantage of simplicity: a stock is either exposed to the market or it is not. Country betas may in theory yield greater sensitivity and a better fit, but in practice any advantage is often overwhelmed by the extra noise introduced, and they are more computationally intensive. We will talk about the two major types of model error later.
Exactly as for industries, it is perfectly possible for country factors to exhibit thinness. We treat
these in exactly the same way that we treat thin industry factors (more later).
3.4 Style Factors
Whereas industry and market factors capture trends on the broad market level, style factors capture
behavior on an asset level, net of market trends.
Style factor exposures are derived from market data such as return, trading volume, capitalization, and/or balance sheet (fundamental) data. For instance, to construct a style factor encapsulating the size of an asset’s parent company relative to others in the market, one might consider a
function of the company’s market capitalization, total assets, etc. Typical varieties of style factor
include the following:
4 For some assets, e.g. ADRs, one may choose to explicitly expose the asset to the market within which it trades rather than the underlying asset's domicile.
| Style              | Description                                                | Typical Components                                                   |
|--------------------|------------------------------------------------------------|----------------------------------------------------------------------|
| Value              | Cheap or expensively priced stock relative to fundamentals | Book-to-price, earnings-to-price, sales-to-price                     |
| Leverage           | Measure of indebtedness of company                         | Debt-to-assets, debt-to-equity                                       |
| Growth             | Earnings expected to grow over time                        | Earnings growth, sales growth                                        |
| Momentum           | Consistently high or low returns over a period             | One-year cumulative return, one-month cumulative return, stock alpha |
| Size               | Large or small company                                     | Log of market capitalisation, total assets                           |
| Return sensitivity | Sensitivity of asset returns to external factors           | Market beta, oil-price sensitivity, exchange-rate sensitivity        |
| Liquidity          | Whether a stock is illiquid or not                         | ADV to market capitalisation, Amihud ratio, liquidity beta           |
| Volatility         | Stock returns large in magnitude relative to the market    | Market beta, residual historical volatility                          |
The above is a very small sample. There are many hundreds, if not thousands, of candidate measures for use as styles. A great many tend to be derived either from a history of market data such as returns, or from financial ratios such as book-to-price, debt-to-assets, etc.
In practice, any measure that can be calculated and for which sufficient data exists can be
considered as a potential style factor. Typically, when a model is first estimated, many candidate
factors are created. These are then filtered down to an “optimal” set, based on some considerations:
• Data availability — there must be sufficient depth and breadth of data to be able to calculate
a variable reliably.
• Intuitiveness — a style factor must be intuitive to users, describing a sensible characteristic of
an asset or company. The average shoe size of a company’s employees may carry explanatory
power, but may prove somewhat difficult to explain as part of a strategy.
• Empirical performance — a factor must be statistically significant, capable of explaining a
non-negligible portion of returns. We detail some of the tests and metrics used in Axioma
(2017b).
3.4.1 Standardisation of Style Factors
Assuming that we have decided on our set of style factors, at each period we compute their raw exposures using the latest available data. However, because style factor definitions are expressed in a mixture of units (betas, ratios, returns, etc.), we must standardise the raw values, converting them to a common scale, much like z-scores. We do this for two reasons: firstly, a great difference in magnitudes is likely to lead to numerical problems when we compute factor returns; secondly, making them all roughly unitary leads to factor returns that are of similar magnitude to the asset returns.
To standardise a set of factor exposures b:

1. Using only the assets in the estimation universe5 portfolio h_U, calculate the capitalization-weighted mean exposure:

    \bar{b} = b^T h_U
5 See the following section on the estimation universe.
2. Calculate the equal-weighted standard deviation of the exposure values in h_U about the capitalization-weighted mean:

    \sigma_b = \sqrt{\frac{1}{N - 1} \sum_{i \in U} (b_i - \bar{b})^2}
3. Subtract the weighted mean exposure from each asset's raw exposure and divide by the equal-weighted standard deviation:

    \hat{b}_i = \frac{b_i - \bar{b}}{\sigma_b}
This ensures that the estimation universe has a capitalization-weighted mean exposure of zero and a standard deviation of one; in other words, a typical market capitalisation-weighted "broad-market" portfolio will be approximately factor neutral. Note that we make no other assumption about the cross-sectional distribution of the exposures. It is a common fallacy that they should be, for example, normally distributed. This is not the case.
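The three standardisation steps can be sketched as follows, using invented exposures and market caps:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Invented raw exposures and market caps for an estimation universe
raw = rng.lognormal(size=n)              # raw exposures in arbitrary units
cap = rng.uniform(1.0, 100.0, size=n)
h_u = cap / cap.sum()                    # cap weights of the estimation universe

b_bar = raw @ h_u                        # 1. cap-weighted mean exposure
sigma_b = np.sqrt(((raw - b_bar) ** 2).sum() / (n - 1))   # 2. equal-weighted std
b_std = (raw - b_bar) / sigma_b          # 3. standardised exposures

# Cap-weighted mean exposure of the universe is now exactly zero
assert abs(b_std @ h_u) < 1e-9
```

Note that the centring uses cap weights while the scaling uses equal weights, exactly as in the steps above.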
Outliers in the raw exposure data can skew the standardizing statistics, so raw exposures are typically winsorized prior to standardisation. Given an upper bound b_U and lower bound b_L, for a given raw style factor exposure b_{raw}, set:

    \tilde{b}_i = \begin{cases} b_L & b_{raw} < b_L \\ b_{raw} & b_L \le b_{raw} \le b_U \\ b_U & b_{raw} > b_U \end{cases}
There are a number of different methods by which one could determine bU and bL :
• Absolute values. One may, for instance, have a prior belief that the data should lie within the
interval [-5, 5], and set the bounds accordingly.
• Standard deviation. Compute the standard deviation of the entire set of exposures, then pick
a multiple k of the standard deviations and truncate all values above or below k standard
deviations from the (possibly weighted) mean.
• Robust statistics. Rather than using the mean and standard deviation, employ the median
and median absolute deviation (MAD):
M AD(b) = median (|b − median(b)|)
Then, pick a number of MADs above and below the median, and use the corresponding values
to winsorize. We will discuss this further in a later section when we talk about outliers in
general.
The first method relies upon having some idea of what "reasonable" values are, and may be suitable for data one is familiar with, such as market betas. For more complicated factor definitions, a dynamic approach may be more appropriate. The Axioma approach is to use the median absolute deviation (MAD). This forms the workhorse for outlier detection within our models, and is used for many applications beyond exposure standardisation. We shall have more to say on this when we treat of outliers in general.
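A MAD-based winsorization might be sketched as below; the multiplier of 8 MADs is an assumption chosen for illustration, not the bound actually used in the models:

```python
import numpy as np

def mad_winsorize(b, k=8.0):
    """Clip values lying more than k MADs from the median.

    The multiplier k = 8 is illustrative; the bounds actually used
    vary by application.
    """
    med = np.median(b)
    mad = np.median(np.abs(b - med))     # MAD(b) = median(|b - median(b)|)
    return np.clip(b, med - k * mad, med + k * mad)

# One wildly outlying raw exposure gets pulled back toward the pack
b = np.array([0.1, -0.2, 0.05, 0.3, -0.15, 25.0])
clipped = mad_winsorize(b)
```

Because the median and MAD are themselves insensitive to the outlier, the bounds are not dragged upward by the bad datum, which is the advantage over mean/standard-deviation clipping.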
3.4.2 Stability of Style Factor Exposures
Many of our market-based style factors, such as momentum, market sensitivity and volatility, rely on a rolling history of asset returns for their computation. In the case of medium-term momentum, a 230-day history of asset returns is extracted for each asset, and its cumulative return computed. This gives the raw exposure for the current day. The next trading day, the history window is advanced by one day, the oldest datum is dropped from the series, and the computation repeats.
In earlier models, the returns histories used for these factors were equal-weighted — that is,
every return in the series contributes equally to the descriptor value. This, however, introduces
potential instability, as exposures may exhibit spurious jumps when a large return enters or exits
the series.
To counter this, from the fourth generation of models onwards we introduced a new weighting scheme for such descriptors. We downweight returns at the beginning and end of the series, while leaving the central portion with equal weights of one, forming a trapezoid. Downweighting is done in a linear fashion, with the first and last returns in the series having the smallest weight, the second and penultimate returns the next smallest, and so on, until the central portion of the series is reached. Figure 1 demonstrates how this works in practice for a 250-day history of data.
Figure 1: Trapezoidal Weighting Scheme
The exact number of returns downweighted in this fashion varies from descriptor to descriptor,
depending upon the total length of returns used.
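A sketch of such a weighting scheme, with an illustrative ramp length of 20 observations on a 250-day window (the ramp lengths actually used vary by descriptor):

```python
import numpy as np

def trapezoid_weights(n_obs, n_ramp):
    """Trapezoidal weights: linear ramps at each end, ones in the centre.

    n_ramp (the number of downweighted returns at each end) varies by
    descriptor; the values used here are illustrative.
    """
    ramp = np.arange(1, n_ramp + 1) / (n_ramp + 1)   # smallest weight first
    flat = np.ones(n_obs - 2 * n_ramp)
    return np.concatenate([ramp, flat, ramp[::-1]])

w = trapezoid_weights(250, 20)
# A large return now enters the window with a small weight and leaves it
# with a small weight, so the exposure no longer jumps at either event.
```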
We can see how the trapezoid weighting scheme improves the stability of asset-level exposures
from Figures 2 and 3, which depict exposures to the Market Sensitivity factor for Facebook and
Google using two Axioma models: US3 (with no trapezoidal weighting), and US4 (with trapezoidal
weighting). While the overall level of exposures is unaffected over time, the weighted exposures of
the US4 model are much less noisy.
At a model level, we measure the stability of exposures by computing the correlation coefficient
between exposures on dates t and t − 21. These correlation coefficients are sometimes referred to as
Figure 2: Facebook
Figure 3: Google
exposure “churn” or exposure turnover statistics. A correlation coefficient that is close to 1 suggests
that exposures are very similar on dates t and t − 21 (and churn is, therefore, relatively low), and
a correlation coefficient that is close to zero suggests that exposures are dissimilar on dates t and
t − 21 (and churn is, therefore, relatively high). A style exposure that has too high a churn will
yield unstable exposures over time (obviously), but also unstable risk estimates, and related metrics
such as model betas. A high churn is therefore undesirable in most situations.
Figures 4 and 5 show how exposure churn is improved with the use of trapezoid weights for the Market Sensitivity and Momentum factors, while not adversely affecting the performance of these factors in the model.6
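The churn statistic itself is just a cross-sectional correlation; a sketch with simulated exposures (the 0.95 persistence is an invented figure):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Simulated cross-sections of a style exposure one month (21 trading
# days) apart; the 0.95 persistence parameter is purely illustrative
b_prev = rng.normal(size=n)                              # exposures at t - 21
noise = rng.normal(size=n)
b_now = 0.95 * b_prev + np.sqrt(1 - 0.95 ** 2) * noise   # exposures at t

# Churn statistic: cross-sectional correlation between the two dates;
# values near 1 indicate stable exposures (low churn)
churn_corr = np.corrcoef(b_prev, b_now)[0, 1]
```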
Figure 4: Market Sensitivity Churn
Figure 5: Medium-Term Momentum Churn

3.4.3 Exposures for Assets with Insufficient Data (IPOs)
Many factor exposures, such as momentum, market sensitivity and volatility rely on a fixed history
of asset returns for their computation. In the case of an IPO, no previous history will exist. While
we can and do generate a proxy history of returns for such assets as a starting point (see later),
returns-based exposures for IPOs are still likely to suffer from instability and unintuitive values. We
therefore shrink exposures of IPOs toward the cap-weighted average sector exposure. The shrinkage
parameter is a linear function of the number of days on which actual returns are available.
To illustrate, let X denote the raw style exposure for a recently listed asset, and let M denote
the cap-weighted average exposure, computed across the sector to which the asset belongs. Let T
be the number of returns required for the style factor’s computation, and let t denote the actual
number of genuine returns available. Then the shrunk exposure, X̃, is given by

X̃ = ((T − t) M + t X) / T

Thus, at t = 0 (the IPO date), the asset’s exposure is the cap-weighted average sector exposure, while at t = T, the exposure is determined entirely by the asset’s own returns.
By way of example, Figures 6 and 7 present US3 and US4 market sensitivity and volatility exposures for Google in the 18 months following its IPO in 2004. Note that the US4 model avoids the initial, and presumably erroneous, large market sensitivity and volatility exposures of its predecessor, instead “playing it safer” by adopting much more neutral initial values.

6 Factor performance is measured by inspecting cumulative factor returns, t-statistics for factor returns, factor volatility, etc. The differences in factor performance for models estimated with trapezoid and equal weighting schemes are negligible.

Figure 6: Google Market Sensitivity
Figure 7: Google Volatility
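As a concrete sketch, the linear shrinkage described above can be written in a few lines of Python (the function name and inputs are illustrative, not Axioma’s implementation):

```python
def shrink_ipo_exposure(raw_exposure, sector_mean, t, T):
    """Blend a raw style exposure X toward the cap-weighted sector
    average M, based on how much genuine return history exists.

    t -- number of genuine returns available for the asset
    T -- number of returns the style factor's computation requires
    """
    t = min(t, T)  # once a full history exists, no shrinkage remains
    return ((T - t) * sector_mean + t * raw_exposure) / T
```

At t = 0 this returns the sector mean exactly; at t = T it returns the raw exposure unchanged.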
3.4.4 Exposures for assets with missing Fundamental Data
When a style exposure derived from fundamental data (Value, Growth, Leverage etc.) has missing
values, we employ a simple cross-sectional model to fill the missing data.
We construct an “exposure” matrix for each asset, consisting of exposures to region, sector and size; call it Z. Then, for each style exposure, we fit a cross-sectional model to the vector of non-missing values, x0, viz.
x0 = Z0 v
where Z0 is the subset of exposures corresponding to these assets. Then, given the exposure matrix
for the assets with missing style exposure, say Z1 , estimate values for the missing style exposure,
x1 via
x1 = Z1 v.
These estimates are then used to fill the missing raw exposure values prior to standardisation.
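A minimal numerical sketch of this fill, using ordinary least squares (the Z matrix and data here are illustrative):

```python
import numpy as np

def fill_missing_exposures(Z, x):
    """Fit x0 = Z0 v on assets with observed values, then estimate the
    missing entries as x1 = Z1 v.  Z is the n x k region/sector/size
    "exposure" matrix; x holds raw style exposures with NaN for gaps."""
    observed = ~np.isnan(x)
    v, *_ = np.linalg.lstsq(Z[observed], x[observed], rcond=None)
    filled = x.copy()
    filled[~observed] = Z[~observed] @ v
    return filled
```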
That concludes our overview of factors, for now. Two varieties of model factor conspicuous by
their absence are currency factors and statistical factors. It is easier to treat these when we come to look at the cross-sectional regression and the statistical models, respectively. First, we make a
slight digression and speak of the model estimation universe, a critical, but often overlooked, part
of the risk model.
4 Estimation Universe
There are two important sets of assets:
• The model universe, i.e. all of the stocks contained in a particular model, and associated with
the market(s) therein, and
• The estimation universe.
When estimating factor returns (or other key model statistics) one seldom uses the entire model
universe of asset returns. The overall market may contain many illiquid assets as well as other potentially “problematic” assets such as investment trusts, depositary receipts, and foreign listings, to name a few.
Illiquid assets lack a stable, regular price, making their volatility difficult to estimate reliably.
Composite assets such as investment trusts and ETFs are difficult to quantify in terms of their
exposure to the market — if such an instrument invests in several different sectors or markets, how
should one define its industry/country exposures? Similarly, is an ADR’s return primarily driven
by the U.S. market factors, or by the behavior of the underlying stock and its associated market?
These are difficult questions to answer and, depending on the market and the model, different
assets need to be excluded from the estimation process. Essentially, the estimation universe should
contain that which is “representative” of the market, and which is sufficiently large and liquid to
be modelled well. In this respect, the rules we use to determine the estimation universe are similar
to those used by index providers when constructing market benchmarks. Index providers typically
employ sophisticated selection criteria involving price activity, liquidity, capitalisation, free float,
and other business logic, much of which we also apply. A critical way in which we differ is that our
model estimation universe should include sufficient assets to keep the number of “thin” factors to a
minimum. These are factors to which only a small handful of assets have nonzero exposure, and are
among the worst evils one can encounter while modeling return, for reasons to be explained later.
For this reason, even if legal issues were not a significant deterrent (they are), we prefer to
create and apply our own selection criteria. While every model differs in exact detail, most follow
something akin to this broad set of rules:
• Remove foreign listings, DRs etc.
• Exclude particular asset classes, e.g. Preferred stock, ETFs, Investment Trusts.
• Exclude assets traded on particular exchanges, whether because of illiquidity or some other
undesirable characteristic.
• Exclude illiquid assets or those without sufficient returns history.
• Exclude very small assets (many of which would also be illiquid).
• Attempt to “inflate” any thin factors where possible.
• Grandfathering - add back excluded assets which had previously qualified within a given time
frame.
Of course, there are exceptions to these rules, and some models, for instance, rather than excluding
particular asset types, only include certain asset types (e.g. common stock, REITs). But as a
rough guide, the above suffices. Figure 8 shows the effect of our logic on the US market for the Axioma US4 model.

Figure 8: US4 Model estimation universe: Numbers of Assets
Figure 9: US4 Model estimation universe: Proportion of ADV

While there are anything up to 12,000 assets associated with the US market, after screening, the number of assets that find their way into the estimation universe is in the
region of 3000. This may seem a small proportion of the total in terms of asset numbers. However,
when we look at the proportion of estimation universe ADV to the total, the wisdom of this approach becomes
apparent: our US4 estimation universe accounts for the vast majority of the US market’s liquidity.
By lowering our standards, we would probably be admitting huge numbers of very illiquid (and
therefore problematic) assets.
5 The Cross-Sectional Returns Model
Recall the linear factor model of asset returns:
r = Bf + u
There are many possible solutions to this system of equations. If factor exposures B are known,
f can be estimated using cross-sectional regression analysis. In the case of what is loosely termed
macroeconomic models, however, f is observed, and it is B, rather, that needs to be estimated, typically via time-series regression for each asset. In the case of statistical factors, neither B nor f is
specified, so a rotational indeterminacy exists and both parameters are determined simultaneously,
albeit only up to a nonsingular transformation.
For a more thorough discussion of multi-factor models and the APT, the curious reader is
encouraged to consult Campbell et al. (1997). We now fix our attention on the typical approach
taken by fundamental models, whereby the exposures, B are specified at a point in time, and the
factor returns, f , estimated via cross-sectional regression.
5.1 The Least-Squares Regression Solution

The ordinary least-squares (OLS) regression solution to the factor model of returns seeks to minimize the sum of squared residuals:

f_ols = arg min_f Σ_{i=1}^{n} u_i²

whose solution is straightforward:

f̂_ols = (Bᵀ B)⁻¹ Bᵀ r

5.1.1 Assumptions of the Least-Squares Solution
The field of regression theory is vast, so only the issues most relevant to risk modeling are dealt
with here. Further details can be found in any elementary econometrics textbook, such as Greene
(2003, chap. 4,5).
1. B is an n × m matrix with full column rank ρ(B) = m ≤ n. The OLS solution requires
that B T B be invertible, which is satisfied only if the columns of B are linearly independent.
Intuitively, this means the factors should all be distinct from one another.
2. Residuals are zero-mean and independent of the factor exposures. In order for the regression estimates to be unbiased (correct “on average”), E[u] = 0 and E[Bᵀu] = 0 are required.
3. Residuals are homoskedastic and have no autocorrelation. These constitute the Gauss-Markov conditions: var(u_i) = σ² and cov(u_i, u_j) = 0 for all i = 1, . . . , n, i ≠ j (or, more compactly, E[uuᵀ] = σ² I_n), and establish the superiority of the least-squares solution over all other linear estimators. Unfortunately, large assets tend to exhibit lower volatility than smaller ones, and homoskedastic residual returns are rarely observed. Figure 10 shows the typical relationship between asset size and returns behavior.
4. Residuals are normally-distributed: strengthening the previous assumption to u ∼ N(0, Ω), where Ω = σ² I_n, is not strictly required. Nevertheless, it is a convenient assumption that simplifies testing the estimators, constructing confidence intervals, evaluating hypothesis tests, and so forth.
Figure 10: Log of Market Capitalisation versus Residual Volatility of US Assets, Dec 2016
5.2 Solving the Problem of Heteroskedasticity
Traditionally, one corrects for this phenomenon by scaling each asset’s residual by the inverse of its
residual variance, transforming the above into a weighted least-squares (WLS) problem:
W^{1/2} r = W^{1/2} B f + W^{1/2} u

f_wls = arg min_f Σ_{i=1}^{n} u_i² / σ_i²

where

W^{1/2} = diag(σ_1⁻¹, . . . , σ_n⁻¹)

The solution is easily shown to be

f̂_wls = (Bᵀ W B)⁻¹ Bᵀ W r
The challenge lies in estimating the residual variances, σi2 . One could calculate these directly from
historical data, but such estimations are noisy and require sufficient history for each asset. As a
proxy for the inverse residual variance, Axioma fundamental models use the square-root of each
asset’s market capitalization.
This proxy is intuitive (larger stocks tend to exhibit less volatile returns), robust, and simple to compute. Tried and tested over many years, it is accepted by the industry as a standard. Further,
extensive testing using the inverse residual variance as a weight has singularly failed to show any
significant improvement in the model thus obtained.
There is an additional, practical, reason for adopting the square-root of capitalization as a
weighting scheme: it “tunes” the regression estimates in favor of larger assets. Large, liquid assets
constitute the bulk of most institutional investors’ universes, so there is a compelling case for
modeling these assets most accurately, sometimes even at the expense of smaller, less visible, assets.
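The WLS estimator above, with square-root-of-cap regression weights standing in as the proxy for inverse residual variance, can be sketched as follows (shapes and names are illustrative):

```python
import numpy as np

def wls_factor_returns(B, r, cap):
    """Estimate f_wls = (B'WB)^{-1} B'Wr, with W the diagonal matrix of
    regression weights; here the square root of market capitalization
    proxies the inverse residual variance."""
    w = np.sqrt(cap)
    BW = B.T * w           # B'W: rows of B' scaled by the weights
    return np.linalg.solve(BW @ B, BW @ r)
```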
5.3 Outliers in the Regression
Data frequently contain extreme values, or outliers, arising from outright errors in the data collection process, poor or unrepresentative sampling, or genuinely aberrant behavior. In particular,
distributions of asset returns are known to exhibit “fat-tails” — large numbers of observations in
the outer edges of the distribution, most likely attributable to economic shocks, poor liquidity, etc.
Because least-squares attempts to minimize the sum of squared residuals, outlier returns produce
large residuals that have a disproportionate effect on the solution, pulling it away from the “true”
solution.
As an example, consider the regression of China market returns against Hong Kong market
returns over a period of a year. Figure 11 shows the scatter plot of these two sets of returns over
the course of 2014. We see a very significant relationship: a beta of 0.63 and an r-square of 57%.
Figure 11: China versus Hong Kong Market Returns
Then, we introduce a single, albeit very large, outlier to the same data, keeping all other returns
identical (Figure 12). Our beta of 0.63 has become a beta of 8.1, with a negligible r-square of 1.4%.
Clearly, outliers can have a severely adverse effect, both on the solution and on the numerical certainty around it.
And financial data are replete with outliers, legitimate and illegitimate, which obscure the
genuine signals one is trying to trace. A simplistic solution may be to exclude or truncate all
observations outside certain pre-determined bounds. These bounds can either be static values (e.g.
exclude returns above 200% ) or statistical-based (e.g. exclude returns beyond 3 standard deviations
from the mean).
Figure 12: China versus Hong Kong Market Returns plus Outlier
Hard-coded figures are inadvisable as they take no account of changing market conditions over
time and the resulting changes in the distribution of the returns: an extreme return during an
uneventful period may be unremarkable during a market boom or bust. Using the standard deviation as the measure is not robust either, as the standard deviation is itself very sensitive to outliers.
Further, it is not the total returns in which we wish to look for outliers, but the residual returns.
Imagine that all assets in a particular industry exhibit an average 20% return one day. From the
perspective of total returns over the entire market, these returns would all appear to be outliers
and would likely be truncated. However, it is obvious that these 20% returns should actually find
their way into the relevant industry return, and the residual returns for these assets would be
much smaller. We therefore need to identify outliers in the returns after they have already been de-meaned by the regression: something of a chicken-and-egg situation.
Axioma models use robust statistical methods to reduce the effect of outliers, described in the section below. These iterate through a series of weighted regressions, identifying outliers in the
residuals at each stage, and downweighting these observations until some convergence criterion is
reached.
5.4 Robust Regression
A comprehensive treatment of robust regression is far beyond the scope of this discussion7; rather,
focus will be limited to the techniques Axioma uses in its model construction. The weapon of choice
is a Huber M-Estimator (where M stands for Maximum-Likelihood ).
Whereas the weighted least-squares regression seeks to minimize the sum of squared residuals

f_wls = arg min_f Σ_{i=1}^{n} w_i u_i²

a robust regression estimator attempts to minimize the objective function

f_robust = arg min_f Σ_{i=1}^{n} ρ(u_i) = arg min_f Σ_{i=1}^{n} ρ(r_i − b_iᵀ f)

7 A thorough introduction to robust regression can be found in Fox (2002).
The function ρ fulfills the following criteria for all a, b:
• ρ(a) ≥ 0
• ρ(0) = 0
• ρ(a) = ρ(−a)
• ρ(a) ≥ ρ(b) for |a| > |b|
Denoting the derivative of ρ with respect to its argument by ρ′(a) = ϕ(a), the minimization program has the first-order conditions

Σ_{i=1}^{n} ϕ(r_i − b_iᵀ f) b_iᵀ = 0
Defining the weight function w(a) as w(a) = ϕ(a)/a, and writing w(u_i) = w_i, the above becomes

Σ_{i=1}^{n} w_i (r_i − b_iᵀ f) b_iᵀ = 0
which can be solved via iteratively re-weighted least-squares:
1. Using ordinary least-squares, solve the transformed system

r̂ = B̂ f + û

where r̂ = W^{1/2} r and B̂ = W^{1/2} B, obtaining an initial estimate f⁰.
2. For each iteration j, calculate residuals u_i^{j−1} and weights w_i^{j−1} = w(u_i^{j−1}).
3. Update the weighted least-squares estimator

f^j = (Bᵀ W^{j−1} B)⁻¹ Bᵀ W^{j−1} r
4. Repeat steps 2 and 3 until successive estimates converge.
There are many different possibilities for the function ρ. As stated earlier, Axioma models use
the Huber function:
ρ(u_i) = ½ u_i²           if |u_i| ≤ k
ρ(u_i) = k |u_i| − ½ k²   if |u_i| > k

with k = 1.345σ, where σ is the standard deviation of the residuals.
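The four IRLS steps with the Huber weight function w(u) = min(1, k/|u|) can be sketched in Python (a simplified illustration: plain OLS start, no regression weights, and a naive standard-deviation scale estimate):

```python
import numpy as np

def huber_irls(B, r, k_scale=1.345, max_iter=100, tol=1e-10):
    """Iteratively re-weighted least squares with Huber weights."""
    f = np.linalg.lstsq(B, r, rcond=None)[0]          # step 1: OLS start
    for _ in range(max_iter):
        u = r - B @ f                                 # step 2: residuals
        k = k_scale * (np.std(u) or 1.0)              # k = 1.345 * sigma
        w = np.ones_like(u)                           # Huber weights phi(u)/u
        big = np.abs(u) > k
        w[big] = k / np.abs(u[big])
        BW = B.T * w
        f_new = np.linalg.solve(BW @ B, BW @ r)       # step 3: WLS update
        if np.max(np.abs(f_new - f)) < tol:           # step 4: convergence
            break
        f = f_new
    return f_new
```

On data with a single large outlier, this pulls the estimate back toward the fit obtained on clean data, as in the China/Hong Kong example.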
Returning to the example in the previous section, robust regression gives the results shown in
Figure 13. Robust regression has salvaged our solution by downweighting the artificial outlier: most would agree that a beta of 0.62 is somewhat closer to the “true” value of 0.63 than is 8.1.
Figure 13: China versus Hong Kong Market Returns plus Outlier with Robust Regression
5.5 Regression statistics
There are many important statistics and metrics associated with the regression and other parts of
the factor model, such as r-squares, VIFs, t-statistics etc. Rather than vastly inflate the current
document by detailing them here, we discuss them in Axioma (2017c). That document details the many ways in which we evaluate not just the factor returns model, but the entire risk model itself.
5.6 Thin Factors
Thinness presents a very real problem for factor returns estimation. To take the example of industry
factors, an industry factor return is predominantly a weighted average of asset returns within the
industry. With sufficient assets, the specific return is averaged out, leaving something (hopefully)
close to the true factor return. If there are very few stocks, however, a large element of specific
return will likely remain, over-estimating the industry factor return and inducing correlations in the
specific returns. To see why, consider a simple model with two assets in industry k, equally-weighted
for the sake of clarity:
r1 = fk + u1
r2 = fk + u2
This has solution f_k = ½(r_1 + r_2) and residuals u_1 = −u_2, with perfect negative correlation between
the two assets’ specific returns. In general, if there are n mega-cap assets within an industry, all
with similar weights, the average correlation across their specific returns will be of the order of
ρ = −1/(n − 1).
As stated previously, uncorrelated specific returns constitute a major assumption of the factor
risk model. If this condition is violated, portfolio specific risk may be over-estimated, because the
specific return correlations are not taken into account.
Because high concentration of industry capitalization among a few assets is a common phenomenon in developed markets, the number of stocks in an industry is an inadequate measure of thinness. Axioma models rely on the “Herfindahl Index”, an economic measure of the size of firms relative to their industry, frequently invoked in antitrust applications:

HI = Σ_j w_j²

where w_j is stock j’s weight within the industry. 1/HI then represents the “effective” number of
has a Herfindahl Index of 0.9801, and effective number of stocks equal to 1.0203. It is trivial to see
that for an industry where every asset has equal weight, the effective and actual numbers of stocks
are exactly equal.
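The example above is easy to verify in code:

```python
import numpy as np

def effective_number_of_stocks(weights):
    """1/HI, where HI is the Herfindahl Index of the (normalised)
    within-industry weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                   # normalise to industry weights
    return 1.0 / np.sum(w ** 2)       # reciprocal of HI = sum of w_j^2
```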
Note that we talk here of thin industries, but the same problem can manifest in country factors
and even, in rare circumstances, in style factors. In general, any factor where most of the regression
weight is concentrated in a few assets, may qualify as thin. As a general rule of thumb, we find
empirically that an effective number of ten assets represents the boundary between what may be
considered thin and what is “sufficiently fat”.
5.6.1 Treatment of Thin Factors
Thin industries can often be avoided by merging similar industries within the same sector with
closely correlated returns, or where one has negligible weight relative to the other. This approach,
however, is not always possible for industries; as market structure evolves, a thin industry may
become more populated as time goes by, or vice versa.
For thin countries, merging factors is even more problematic; if Belgium is thin within a European model, for instance, would we really want to merge it with France? What would
Belgian or French users of the model have to say about that? One suspects they would have rather
strongly-held opinions on the matter. So thin factors are sometimes unavoidable, and we must find
some means of dealing with them.
Axioma models introduce a dummy asset within each thin factor, with a suitable prior return,8 unit exposure to the relevant factor (and the market intercept if relevant), and neutral exposure to all other factors. Its inclusion will, therefore, directly affect only the thin factor’s return estimate. In order
to reduce the impact of the other assets within the thin factor, this asset’s regression weight is
defined as
w_dummy,j = (φ − 1) · (s_j⁴ − φ⁴) w_j / ((1 − φ⁴) s_j)   if HI_j⁻¹ < φ
w_dummy,j = 0                                            if HI_j⁻¹ ≥ φ

where

• s_j = HI_j⁻¹ is the reciprocal of the Herfindahl Index,
• w_j is the total regression weight of all assets in the industry, and
• φ is the effective number of assets below which an industry is deemed “thin”.
This functional form ensures that the dummy asset’s weight decreases slowly as the effective number
of assets increases from 1, but decays rapidly as the effective number approaches the limit φ. As
mentioned before, empirical results indicate that φ = 10 is a safe threshold: beyond this number, returns diversify sufficiently, though this will also depend on the particular market. If ten proves to be impossibly high, then it is possible to reduce φ to as low as six without serious adverse effects.

8 Depending on the factor, this prior return may be the market return, or a sector or regional return.
Clearly, the thinner the factor, the greater the weight of the dummy asset. This pulls the factor
return estimate closer to the prior return as the industry becomes thinner, reflecting a decreasing
certainty over the true factor return in the face of insufficient observations. The purpose, therefore,
is not to try and guess what the “true” factor return should have been, but to steer the estimated
factor return towards a “safe” estimate and prevent its pollution by large amounts of specific return.
Applying this correction to the earlier example, the new factor returns are thus

f̃_k = (w_dummy / w_k) r_prior + (1 − w_dummy / w_k) · ½(r_1 + r_2)

with specific returns

ũ_1 = u_1 + (w_dummy / w_k) [½(r_1 + r_2) − f̃_k]
ũ_2 = −u_1 + (w_dummy / w_k) [½(r_1 + r_2) − f̃_k]
and we see that the perfect negative correlation in the specific returns has been disrupted.
5.7 The Problem of Multicollinearity: Binary Factors
If the regression contains two or more of the following:
• A market intercept
• Country binary exposures
• Industry binary exposures
then, as it stands, it will fail to find a unique solution, as the exposure matrix B is rank-deficient. To see
why this is so, consider each asset’s return. Ignoring style factors and specific return, it may be
written as
ri = 1 · fM + 1 · fIj + 1 · fCk
where fM is the market intercept return, fIj the asset’s industry return, and fCk its country return.
We may add an arbitrary amount to any one of these returns provided an equal, but negative,
amount is added to one of the other returns, and the overall model fit will be unchanged. Thus
there are an infinite number of solutions, all of which appear equally “good”. This is the issue of
multicollinearity or linear dependence between different sets of binary factors.
A unique solution can be obtained if, for each additional set of binary factors, a constraint is imposed on the regression. In theory, anything that will resolve the non-uniqueness will do. In practice, a common choice for researchers is to force one or more sets of factor returns to sum to zero. Assuming henceforth that we have a market intercept, binary industries and binary countries, one could add the constraints

Σ_{j=1}^{P} w_Ij f_Ij = 0

Σ_{j=1}^{Q} w_Cj f_Cj = 0
where wIj is the market capitalization of industry j, and wCj the market capitalization of country
j. The effect of this is to “force” the broad market return into the market intercept factor, while
the country and industry factors represent the residual behaviour of each, net of the overall market.
Since the style factor exposures are standardised about a market-capitalisation weighted mean of
zero, if one were to compute the return of a market-capitalisation weighted broad market benchmark,
one would find that its return was almost entirely explained by the market factor: the industry
and country factor return contributions are made zero by the constraints, and the style factor
contribution is made zero by the exposure standardisation. The only other contribution would
come from the specific return, and for a sufficiently-diversified benchmark, this should be small in
magnitude. Figure 14 illustrates the above point using the global model. The plot shows both the total return to a typical global benchmark, and the market factor return from the Axioma WW4 model. The returns are all rolling one-year cumulative returns. We see that the two are very close throughout the history, with a correlation of 97%.

Figure 14: Global Benchmark return versus WW model market factor return
By contrast, Figure 15 shows a typical US benchmark return versus the US factor return from
the same WW4 model. The returns are now quite dissimilar, with a lower correlation of 68%.
These constraints may be added to the regression as dummy assets and the regression performed
in the normal fashion. In the Appendix we demonstrate that under easily-met conditions the solution
asymptotically approaches the “true” solution.
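A minimal sketch of the dummy-asset approach: the sum-to-zero constraint is appended to the exposure matrix as an extra, heavily weighted observation with a return of zero (the data and penalty weight here are illustrative):

```python
import numpy as np

def constrained_factor_returns(B, r, weights, constraint, penalty=1e8):
    """Weighted least squares with one sum-to-zero constraint c'f = 0,
    imposed via a heavily weighted dummy observation."""
    B_aug = np.vstack([B, constraint])   # dummy row encodes c'f = 0
    r_aug = np.append(r, 0.0)            # dummy return of zero
    w_aug = np.append(weights, penalty)
    BW = B_aug.T * w_aug
    return np.linalg.solve(BW @ B_aug, BW @ r_aug)
```

Note that without the constraint row the normal equations are singular whenever the binary columns sum to the intercept; the dummy observation restores full column rank.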
An alternative approach is to use a multi-stage regression. For example, one could first regress
the asset returns against the market factor,
r = XM fM + u
then regress the residual u against the industry exposures XI :
u = XI fI + v
Figure 15: US Benchmark return versus WW model US factor return
then regress the residuals from this against the country exposures XC :
v = XC fC + w
The factor returns can then be collected together, with the specific return being the final residual,
w. This approach is useful when one wants a particular set of factors to extract as much power as
possible from the returns, independent of the remaining factors, which will often tend to be much
weaker than if they were all included in one regression. In addition to a more cumbersome algorithm,
however, analysis of the results becomes more complicated: different regressions may have different
weights, and their results cannot easily be meaningfully combined. This makes comparisons difficult.
Finally, if one is not careful with the weights, it is possible to induce spurious structure into the
regressions and hence the residuals.
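The multi-stage scheme above can be sketched as a sequence of residual regressions (plain OLS here for simplicity; in practice each stage may carry its own weights):

```python
import numpy as np

def multi_stage_regression(r, stages):
    """Regress r on the first exposure block, then regress each stage's
    residuals on the next block; returns the factor estimates and the
    final residual, which plays the role of the specific return."""
    resid = np.asarray(r, dtype=float).copy()
    factor_returns = []
    for X in stages:
        f = np.linalg.lstsq(X, resid, rcond=None)[0]
        factor_returns.append(f)
        resid = resid - X @ f        # pass residuals to the next stage
    return factor_returns, resid
```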
For all those reasons, we use the constrained, single-regression approach in our models, reserving
the residual regression technique for certain specific cases. One such case is the “Domestic China” factor in many regional models. Because the “International” China market (H-shares, Red chips, etc.) behaves quite distinctly from the “Local” China market (A-shares and B-shares), any regional
model covering the China market requires two “China factors”:
• The main China Market factor is exposed to all China-denominated stocks,
• The Domestic China factor is exposed only to the A and B shares.
• The China Market factor is estimated in the main regression, using the international China
assets.
• The Domestic China factor is estimated in a secondary regression, using the China A and B
shares, and the residuals from the main regression.
Axioma
28
Axioma RobustTM Risk Model Version 4 Handbook
The Domestic China factor therefore captures the difference in behaviour of the local China market.
Should the two markets converge over time, we would expect to see the significance of this factor
steadily diminish.
5.8 The Problem of Multicollinearity: Style Factors
Multicollinearity, or linear dependence, can also be an issue amongst style factors. While it is unlikely that any two styles will yield identical exposures, it may certainly be the case that two or more styles9 will look sufficiently alike to cause issues of numerical stability, manifesting in poor t-statistics for the afflicted factors and high VIFs (variance inflation factors).
While the best approach is simply to avoid factor definitions that lead to high levels of multicollinearity, sometimes this cannot be avoided. A good example is the definition of Volatility, which
is often very highly correlated with beta, or Market Sensitivity. We could simply drop one factor,
such as Volatility, but there is still useful information in this factor which is not reflected in Market
Sensitivity. Combining them is another possibility, but again, we are potentially damping asset
characteristics that we wish to measure. Our customary approach is to orthogonalise the exposures
of one relative to the other. For instance, in the case of market sensitivity and volatility, we perform
the cross-sectional regression:
x_vol = α + β x_ms + ε,
where xvol is the original (standardized) volatility exposure and xms is the standardised market
sensitivity factor. We may also weight the regression using, for instance, the square root of market
capitalisation, in order to maintain consistency with the factor regression.
Our new cross-section of volatility exposures is thus given by the values of ε. These now represent the volatility of each asset net of volatility induced by exposure to the market. Numerically,
we see from Figures 16 and 17 that the t-statistics for market sensitivity are greatly increased
by the reduction of uncertainty surrounding the estimate, while its factor return becomes more
negative and, in fact, more like that of volatility. In the extreme case where we had only these two
factors in the model, it can be shown that their factor returns would, in fact, become equal after
orthogonalisation.
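A sketch of this orthogonalisation (the weights are an assumption for illustration; square root of market cap, to match the factor regression):

```python
import numpy as np

def orthogonalise(x_vol, x_ms, weights):
    """Weighted regression x_vol = alpha + beta * x_ms + eps; the
    residuals eps become the new, net-of-market volatility exposures."""
    X = np.column_stack([np.ones_like(x_ms), x_ms])
    XW = X.T * weights
    coef = np.linalg.solve(XW @ X, XW @ x_vol)
    return x_vol - X @ coef
```

By construction the new exposures have zero weighted correlation with the market-sensitivity exposures, which is what removes the multicollinearity.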
5.9 Currency Factors
Currency factors are rather different to other factors in that they do not feature in the returns
regression. Rather, each asset has its return calculated using its local currency, ensuring that
currency effects are (as far as is possible) eliminated from the factor regression. All non-currency
factor returns are then computed via the regression, whilst currency returns are calculated directly
from exchange and risk-free rates.
Because the final risk model must be built from the viewpoint of a single numeraire currency, it
is necessary to convert our returns model to this numeraire currency. This is done per asset as
rN = rX + cXN ,
where
• N is the model numeraire currency,
• X is the asset’s local currency, i.e. that in which it is quoted,
9 or a linear combination thereof.
Figure 16: T-Statistics for Market Sensitivity and Volatility
Figure 17: Factor Returns for Market Sensitivity and Volatility
• rX is the asset’s local return,
• cXN is the return in currency N for an investor buying currency X.
This currency return, and its associated risk, represents the risk to an investor of buying and selling something in a foreign currency. Suppose you are a US-based investor who wishes to purchase a Euro-denominated stock. The following artificial example illustrates what the currency return is:
• The stock is valued at 1 euro.
• The Euro/USD exchange rate is 1 (1 USD will buy 1 Euro).
• You purchase 100 Euros using 100 USD.
• You buy 100 shares.
• Next day, the price of the asset is still 1 Euro, but the Euro/USD exchange rate is 0.5 (1 USD will buy 0.5 Euros).
• You sell your shares for 100 Euros.
• You convert your 100 Euros back to 200 USD.
Thus, independently of the asset and its local price, the fluctuation in exchange rates leads to a
return and risk due to the currencies. In the above simple example, you have made a positive return
from your investment entirely due to these currency movements.
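In code, the bookkeeping of that example is simply:

```python
# The worked example above: a US investor holds a 1-euro stock while the
# dollar weakens from 1 USD per EUR to 2 USD per EUR (1 USD buys 0.5 EUR).
shares = 100
price_eur = 1.0                                     # unchanged over the period
usd_per_eur_0, usd_per_eur_1 = 1.0, 2.0
cost_usd = shares * price_eur * usd_per_eur_0       # 100 USD in
proceeds_usd = shares * price_eur * usd_per_eur_1   # 200 USD out
currency_return = proceeds_usd / cost_usd - 1.0     # entirely from FX
```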
Suppose we have a set of m currencies, 1, 2, ..., m. Then we define the following:
• p_{i,t} is the asset price at time t in currency i,
• r_i is the asset return in local currency i from period t − 1 to t,
• r_i^f is the risk-free rate for currency i,
• x_{ij,t} is the amount of currency j which one unit of currency i fetches at time t,
• r_ij^x is the return for an investor whose numeraire is currency j in buying currency i.
We therefore have:
$$r_i = \frac{p_{i,t}}{p_{i,t-1}} - 1, \qquad r_{ij}^x = \frac{x_{ij,t}}{x_{ij,t-1}} - 1, \qquad p_{j,t} = p_{i,t}\,x_{ij,t}.$$
We now define what we mean by a currency return. The asset’s total return from a currency j
perspective is expressed exactly as
$$r_j = \frac{p_{i,t}\,x_{ij,t}}{p_{i,t-1}\,x_{ij,t-1}} - 1.$$

This may be expressed more compactly as

$$r_j = (1 + r_i)\left(1 + r_{ij}^x\right) - 1.$$
If we consider log returns, the above may be written as

$$r_j \approx r_i + r_{ij}^x.$$
This is exact for log returns, and should provide a good approximation for returns over a short
duration (e.g. daily returns). The above is written in terms of total returns. If we consider excess
returns we may write
$$r_j - r_j^f \approx \left(r_i - r_i^f\right) + \left(r_{ij}^x + r_i^f - r_j^f\right).$$
We have therefore expressed the excess return from the perspective of currency j as the sum of the
local excess return (currency i), and that which we define to be the currency factor return, viz.
$$c_{ij} = r_{ij}^x + r_i^f - r_j^f,$$
where cij represents the excess return to currency i expressed in terms of currency j. Note that
when i = j, cij ≡ 0.
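The arithmetic above can be checked with a short sketch, using the handbook’s own artificial EUR/USD example; the daily risk-free rates are invented illustrative values:

```python
# Exchange rate convention: x_ij is the amount of currency j that one
# unit of currency i buys. Rates mirror the artificial example above.
x_eur_usd_prev = 1.0   # yesterday: 1 EUR buys 1 USD
x_eur_usd_now = 2.0    # today: 1 USD buys only 0.5 EUR, so 1 EUR buys 2 USD

# r^x_ij = x_ij,t / x_ij,t-1 - 1: return, in USD, to holding EUR
r_x = x_eur_usd_now / x_eur_usd_prev - 1.0   # +100%, as in the example

# Hypothetical daily risk-free rates (assumed values, for illustration only)
rf_eur, rf_usd = 0.0001, 0.0002

# Currency factor return c_ij = r^x_ij + r^f_i - r^f_j
c_eur_usd = r_x + rf_eur - rf_usd

# For i = j the exchange-rate return is identically zero, so c_ii = 0
c_usd_usd = 0.0 + rf_usd - rf_usd
```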
In our returns model, therefore, we simply append the currency factors to the other model factors.
Every asset is exposed to a single currency (in most cases, that of the market on which it trades),
with an exposure of one, with zero exposure to all other currencies.
If we write our original (non-currency) returns model as

$$r = Bf + u,$$

then our final numeraire returns model may be written as

$$r^N = \begin{pmatrix} B & B_C \end{pmatrix} \begin{pmatrix} f \\ f_C \end{pmatrix},$$
where
• BC is our currency exposure matrix, and
• fC is the vector of returns for each currency in the model relative to the numeraire, N .
For a more complete treatment of the modelling of currencies see Axioma (2017a).
5.10 Factor Mimicking Portfolios
The solution to the weighted least-squares regression has a very useful property that we will here
describe. Given our basic factor model
$$r = Xf + u,$$
and a diagonal matrix of weights W , we know that the least-squares solution, i.e. the set of factor
returns, is given by
$$f = \left(X^T W X\right)^{-1} X^T W r.$$
Looking at the above, we see that each factor return is expressed as a weighted sum of the returns, where the weights are the rows of the matrix $\left(X^T W X\right)^{-1} X^T W$. Thus, each row of this matrix corresponds to a portfolio of asset weights such that, if we take the cross-product of the $i$th row with the exact set of returns used in the factor regression, we retrieve the $i$th factor return.
Further, if we observe that

$$\left(X^T W X\right)^{-1} X^T W X = I,$$
then we deduce that, additionally, the ith row has unit exposure to the ith factor, and zero exposure
to all other factors. Such a row is known as a factor mimicking portfolio (FMP), and we can neatly
obtain the FMPs for each factor in the model from the matrix above.
The FMPs, therefore, provide a simple way of reconstructing the operation of the factor model,
without having to perform any regressions, or any of the other complex processing that goes on in
the model. They also provide a simple way of constructing one’s own factor returns, for a bespoke
set of asset returns. They thus provide all the functionality of the factor model in the simple form
of a set of portfolio weights.
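As a sketch, with random data standing in for real exposures and returns, the FMP matrix and its unit/zero exposure property can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4                                   # illustrative sizes
X = rng.normal(size=(n, k))                    # exposure matrix
W = np.diag(rng.uniform(0.5, 2.0, size=n))     # diagonal regression weights
r = rng.normal(0.0, 0.01, size=n)              # one cross-section of returns

# Rows of (X'WX)^{-1} X'W are the factor mimicking portfolios
P = np.linalg.solve(X.T @ W @ X, X.T @ W)      # k x n matrix of FMP weights

f_regression = P @ r                           # WLS factor returns
f_fmp = np.array([P[i] @ r for i in range(k)]) # same numbers, one FMP at a time

# Each FMP has unit exposure to its own factor and zero to all others,
# so P X is the identity matrix
exposures = P @ X
```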
5.10.1 Constraints in the regression
If the model contains regression constraints and/or thin factor corrections, then the form of the
FMP becomes a little more complex, as these have the effect of implicitly altering the exposure
matrix.
With a slight change of notation, write the model regression thus:

$$r_1 = X_1 f + u,$$

with constraints on the factor returns, which may be written as

$$0 = X_2 f,$$

and dummy assets due to thin factors, which may be expressed thus:

$$r_3 = X_3 f.$$

Assume that the thin factor dummy returns $r_3$ may be written as a weighted average of the asset returns $r_1$, i.e.

$$r_3 = H r_1.$$
Then, given appropriate weights for each of the above, the entire regression can be written as the
solution to
 1
 1


 
 
W12
W12
I
X1




1
1








2
2
W2
W2

 0 r1 = 
 X2 f
1
1
H
X3
W32
W32
The least-squares solution to this is given by

$$
f = \left[
\begin{pmatrix} X_1^T & X_2^T & X_3^T \end{pmatrix}
\begin{pmatrix} W_1 & & \\ & W_2 & \\ & & W_3 \end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}
\right]^{-1}
\begin{pmatrix} X_1^T & X_2^T & X_3^T \end{pmatrix}
\begin{pmatrix} W_1 & & \\ & W_2 & \\ & & W_3 \end{pmatrix}
\begin{pmatrix} I \\ 0 \\ H \end{pmatrix} r_1.
$$
This can be simplified to

$$f = \left( X_1^T W_1 X_1 + X_2^T W_2 X_2 + X_3^T W_3 X_3 \right)^{-1} \left( X_1^T W_1 + X_3^T W_3 H \right) r_1.$$
The beauty of this is that, given the assumption that the dummy asset and constraint returns are either zero or a weighted sum of the asset returns, all of the thin factor and constraint information may be included in the FMP, yet only the original asset returns are required to retrieve the original factor returns.
Note, however, that the FMPs no longer have their unit/zero exposure structure, since the actual regression being solved is not the original (unconstrained) factor model. If we construct a new exposure matrix that contains the constraints and thin-factor corrections, the exposure properties of the FMPs hold for this matrix.
We construct a simple example to illustrate. Suppose we have a model with just two assets, and
three factors - a market and two industries. Its returns model will look thus:

$$\begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} + u,$$
with constraint $f_2 + f_3 = 0$. For simplicity, we have assumed equal weights everywhere. The regression equation we wish to solve is:

$$\begin{pmatrix} r_1 \\ r_2 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix}.$$
Some matrix algebra yields the solution as

$$\begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} r_1 + r_2 \\ r_1 - r_2 \\ r_2 - r_1 \end{pmatrix}.$$
Thus, the market factor return is given as the average of the two asset returns, while the industry
factor returns are computed from the differences between the two asset returns and are perfectly
negatively correlated. Note that the FMP for the market factor has non-zero exposure to all factors,
owing to the effect of the constraint.
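This two-asset example can be verified numerically; the asset returns below are arbitrary illustrative values:

```python
import numpy as np

# Two assets, three factors (market + two industries), equal weights.
# The constraint f2 + f3 = 0 is appended as a third "observation" with
# a zero right-hand side, exactly as in the regression equation above.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
r1, r2 = 0.03, -0.01            # arbitrary illustrative asset returns
y = np.array([r1, r2, 0.0])

# The augmented system is square and invertible, so the least-squares
# solution is just the exact solution:
f = np.linalg.solve(X, y)       # f1 = (r1+r2)/2, f2 = (r1-r2)/2, f3 = (r2-r1)/2
```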
6 Common Model Issues
Before moving on to statistical models and beyond, we here take some time to go over some practical
issues that arise in modelling, some as a consequence of the model structure itself, some as a
consequence of the data used.
6.1 Model Error: specification versus noise
By taking a small amount of mathematical liberty, it is reasonable to argue that there are, essentially,
two types of error in risk models:
1. Estimation Error, or, more simply, noise. This is the error incurred by trying to estimate model components, such as betas, in the face of noisy data, sampling error, illiquidity, data errors and so on.
2. Specification Error. This is the error incurred by imposing a particular structure onto a set
of asset returns, that may be more or less appropriate.
The former error tends to manifest in noisy estimates and general uncertainty. The latter tends to
manifest as persistent bias.
We consider the first of these and use the example of attempting to estimate an asset’s beta (sensitivity to a market return). While perfectly straightforward on paper (it is merely the time-series regression coefficient of the history of asset returns versus that of the market), there are many parameters that affect our estimate, for instance:
• What length history do we use?
• What frequency of data do we use?
• How do we treat outliers?
• Do we wish to weight the data in any way?
• Do we need to correct for autocorrelation?
All of these choices affect our estimate, giving the lie to the notion that anybody can claim, definitively, that their beta is the one true beta.
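The window-length effect can be seen in isolation with a sketch on synthetic data whose true beta shifts mid-sample; the regime change and all parameters are, of course, invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 750
mkt = rng.normal(0.0, 0.01, size=T)          # synthetic daily market returns
# An asset whose true beta drops from 1.2 to 0.6 halfway through the sample
true_beta = np.where(np.arange(T) < T // 2, 1.2, 0.6)
asset = true_beta * mkt + rng.normal(0.0, 0.005, size=T)

def rolling_beta(asset, mkt, window):
    """Trailing OLS beta of asset versus market over the last `window` days."""
    betas = []
    for t in range(window, len(mkt) + 1):
        a, m = asset[t - window:t], mkt[t - window:t]
        betas.append(np.cov(a, m, ddof=1)[0, 1] / np.var(m, ddof=1))
    return np.array(betas)

beta_1y = rolling_beta(asset, mkt, 250)      # ~1-year window
beta_2y = rolling_beta(asset, mkt, 500)      # ~2-year window

# By the end of the sample the 1-year estimate has adapted to the new
# regime, while the 2-year estimate still averages over both regimes.
```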
To take one of these points, we consider the history length used. Figure 18 shows estimates for
the beta using a large, liquid, otherwise well-behaved US asset versus the European market, using
estimation horizons of one and two years (rolling daily data) respectively. The instability of the
estimated beta is evident, particularly the sharp downwards drop in 2007. Time series regressions
are very sensitive to the data used.
Similarly, if we keep the estimation window constant, but vary the frequency of the data, we
get Figure 19. Thus, the answer to the question “What is this asset’s beta relative to the relevant market?” would appear to be: “What would you like it to be?”
And then, there’s specification error. Every model imposes structure onto asset returns. In the
case of a fundamental model, this structure is determined by us in advance, largely independently
of the underlying returns-generating mechanism.
To see what may go wrong, consider an extreme case: an asset whose returns are always,
genuinely, zero. It is obvious that the best sort of model to fit this is a model with no factors.
Figure 18: Rolling Historical Beta for Ticker BERK: Varying horizons
Figure 19: Rolling Historical Beta for Ticker BERK: Varying frequencies
However, since we impose a structure on the asset returns, we get the following decomposition of
return for the asset:
$$0 = \sum_i b_i f_i(t) + u(t).$$
It is highly unlikely that the common factor portion of return will be zero, or close to zero, all of the
time, unless we have constructed the exposures also to be zero. Instead, we have a wholly spurious
set of factor returns imposed onto the asset.
Worse still, it is obvious that the specific returns will always be exactly minus the common
factor returns, i.e. the two sets of returns are perfectly negatively correlated. This will cause
massive error in any risk measure for this asset owing to the assumption that specific returns are
generally uncorrelated with anything else over time. In cases such as this, they obviously are not,
but we throw the information away by the assumption of a perfectly diagonal specific risk matrix
and no correlation between factor and specific returns.
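A minimal sketch of this pathology, with invented factor returns and exposures, shows the perfect negative correlation directly:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 250
f = rng.normal(0.0, 0.01, size=(2, T))   # spurious factor returns (invented)
b = np.array([1.0, 0.5])                 # non-zero exposures imposed by the model

r = np.zeros(T)                          # the asset's true return is always zero
common = b @ f                           # common-factor portion of return
u = r - common                           # implied specific return

# The specific return is exactly minus the common-factor return, so the
# two series are perfectly negatively correlated
corr = np.corrcoef(common, u)[0, 1]
```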
Where we use binary factors for the market or country exposures, assets with historically low
betas will be at particular risk here. If we impose a value of one on an asset’s factor exposure, but its estimated beta to that factor (insofar as it can be determined; see above) is significantly different from one, then it stands a distinct chance of being “misspecified” in the model.
To illustrate this, we run a couple of studies on a set of “good” assets versus a set of “bad” assets.
We define our good assets as those whose predicted (model) beta and historical (time series) beta
generally agree over time. Figure 20 shows the distribution of correlations between the common factor and specific return histories (using the Axioma global fundamental model) of two equal-sized samples of assets. Those denoted “good” are those whose predicted beta is close to the realised beta on average. Those labelled “bad” exhibit significant and persistent differences in predicted versus realised beta.

Figure 20: Factor/specific return correlation distribution for selected assets
We see that the correlations for the “bad” assets tend to be skewed towards the negative, whilst
the “good” assets are rather more evenly distributed. This negative correlation is a likely indication
Figure 21: US Daily Market Returns for 2008
of the assets being less well specified by the model structure. Overall, we see the following for the correlation distributions:
• Good assets: mean: 0.005, skew: -0.018,
• Bad assets: mean: -0.111, skew: -0.850.
This type of error, due to inappropriate factor structure, is known as specification error. It is
present to a greater or lesser extent in all such structured risk models, and as we said earlier, will
tend to manifest as bias in risk estimates and other metrics.
6.2 Extreme Data
As mentioned earlier, when it comes to identification of extreme data points, or outliers, the median
absolute deviation, or MAD, is our standard tool. This is defined as

$$\mathrm{MAD} = \operatorname{med}\left|x_i - \operatorname{med}(x)\right|,$$

for a set of data $x_i$, $i = 1, \ldots, n$.
This is preferable to the two obvious alternatives: fixed bounds, or the standard deviation.
The former relies on the distribution of the data and its associated properties remaining fixed over
time. Consider Figure 21. Here we see the distribution of daily US market returns over the year
of 2008, divided into two halves: January to June, and July to December. It is clear that any
fixed bounds based on the behaviour of the returns in the first half of the year are going to differ
greatly from those that may be derived from the second half. Thus, had we tuned our model based on observations from before mid-2008, we would very likely identify far more outliers than is justified after that period. Clearly, a more dynamic measure of asset dispersion is needed.
Figure 22: AUD Currency Returns and the 3 Standard Deviation Bounds
The other obvious candidate, the standard deviation, suffers from its non-robustness. It is far
too easily affected by outliers, having, in the technical parlance, a breakdown point of 0%.
Figure 22 shows a six month distribution of recent AUD currency returns, along with the ±3
standard deviation bounds. There are two obvious data points that fall outside the bounds and
could be considered as outliers. It is likely that we would wish to clip or exclude these.
All well and good. But if we artificially stir things up, by the addition of a single, very large
outlier of 10%, we get the situation shown in Figure 23. This one change has inflated the standard
deviation bounds so that, while our artificial outlier does actually lie outside the bounds, our two
previous outliers are now “inliers”.
If we perform the same analysis again, but using 4.5 MADs in place of three standard deviations (which, under assumptions of normality, is roughly equivalent), then we get Figure 24. We see here that, using the MAD, our original outliers remain outliers, unlike with the standard deviation method.
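The comparison can be reproduced in miniature with synthetic data; the outlier sizes here are chosen for illustration, not taken from the AUD series:

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.005, size=130)   # ~6 months of daily returns
returns[10], returns[90] = 0.025, -0.024     # two genuine outliers
returns = np.append(returns, 0.10)           # one very large outlier

def mad(x):
    """Median absolute deviation."""
    return np.median(np.abs(x - np.median(x)))

# +/- 3 standard deviation bounds: the single 10% point inflates sigma
# so much that the two smaller outliers fall back inside the bounds
sd_flag = np.abs(returns - returns.mean()) > 3.0 * returns.std(ddof=1)

# 4.5 MAD bounds (roughly equivalent to 3 sigma under normality) are
# barely affected by the extra point, so the original outliers survive
mad_flag = np.abs(returns - np.median(returns)) > 4.5 * mad(returns)
```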
6.3 Missing Data
Missing data, especially returns, are a constant problem in daily risk models. Returns may be
missing for one of three major reasons:
1. Illiquidity, i.e. an asset does not trade sufficiently regularly to yield a daily return history,
2. Non-trading day, which is really just a day on which all assets in a market become 100%
illiquid, and
3. IPOs - an asset is newly listed, so no historical data exists.
While in some cases, such as a company with multiple listings, it may be possible to replace an
illiquid asset with a more liquid sibling, in general we do not have that luxury and must find ways
to deal with illiquidity. We give two examples below where missing data of one form or another
may be a problem.
Figure 23: AUD Currency Returns plus Outlier and the 3 Standard Deviation Bounds
Figure 24: AUD Currency Returns plus Outlier and 4.5 MAD Bounds
6.3.1 A non-trading day problem
Figure 25: Market returns: CA and USA

Figure 25 shows market returns for the Canadian and US markets during October 2008. Note that
the highlighted US return on October 13th, as well as being very large, occurs on a non-trading
day for Canada. Then note the following day, where both markets trade: the US market return is
insignificant, while the Canada market return is of a similar magnitude to the previous day’s US
return.
The rolling correlation between the two markets over the relevant period is shown in Figure 26.
Note how the correlation suddenly drops when this non-trading day occurs.

Figure 26: Market return Correlations: CA and USA

It scarcely needs to be said that the non-trading day in Canada is causing a temporary lag effect, and that Canada on the
14th is responding to the US market return on the 13th. Thus, thanks to the holiday in Canada,
the two markets appear to be less integrated than is truly the case.
So, at both an asset level and a market level, illiquidity can cause us to underestimate correlations, and hence misspecify our models.
6.3.2 An illiquid asset problem
In addition to underestimation of the strength of relationships between liquid and less liquid assets,
there is the difficulty inherent in computing exposures or other measures based upon illiquid data.
This problem was perfectly illustrated in 2015 when large numbers of Chinese stocks were suspended
for weeks or even months at a time.
The fact that large numbers of assets suddenly began showing missing returns is problematic for
any measure based upon returns. One such is the style factor Volatility, which depends on a history
(usually three to six months) of asset returns. If an asset’s missing returns are merely treated as
zero, then its volatility score will start to become very low.
Figure 27: Chinese Asset Volatility Exposures
Figure 27 shows the Volatility exposures in an older model for two randomly-selected suspended
Chinese stocks. It is clear to see at which point they are suspended, as the Volatility level drops
enormously, owing to the missing asset returns, which are treated here as zero returns.
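A toy calculation shows the mechanism; the window length and volatilities are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
window = 125                                     # ~6 months of daily returns
live = rng.normal(0.0, 0.02, size=window)        # normal trading history

# Suppose the asset is suspended for the last 60 days of the window and
# its missing returns are naively filled with zeros:
suspended = np.concatenate([live[:window - 60], np.zeros(60)])

vol_live = live.std(ddof=1)
vol_suspended = suspended.std(ddof=1)

# The zero-filled history drags the measured volatility down, even though
# the asset has not genuinely become less risky.
```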
6.4 Solutions
The problems of missing data (and extreme values) are legion, and Axioma models adopt a host of
solutions to these depending on the context. These include:
• Computation of imputed or proxy returns to fill in missing values in a time-series of asset returns
for use in exposures, statistical models etc.
• Shrinkage of exposures for IPOs towards a mean value.
• Imputing non-trading day market returns via a VAR model for use in returns-timing, cross-sectional regression, etc.
• Cross-sectional replacements for missing fundamental data-based exposures.
• Shrinkage of specific risk estimates for assets with insufficient data.
Some of these have already been touched upon; some will be discussed later in this document. All of the issues in this section and the previous one on extreme values are discussed far more fully in the accompanying document, Axioma (2017b).
7 Statistical Approaches
Statistical factor models typically deduce the appropriate factor structure by analyzing the sample
asset returns covariance matrix. There is no need to pre-define factors and compute exposures, as
required by fundamental factor models. The only inputs are a time-series of asset returns and the
number of desired factors. There are no concrete rules specifying the appropriate number of factors.
One can, for example, use a Scree plot to assess the variance explained by each additional factor.
Factor analysis is used to estimate the factor exposures B and factor returns f . Principal
components and maximum likelihood estimation are two popular methods of parameter estimation.
These, as well as others, are explained in detail in Johnson and Wichern (1998). Axioma risk
models involving statistical factors typically employ a variation of principal components, described
in detail below. A comparison of fundamental versus statistical models, and combinations thereof,
is presented in Connor (1995).
7.1 Principal Components
Principal components analysis (PCA) determines factors by eigendecomposition of the observed
asset returns covariance matrix. Given an n × t matrix R of historical asset returns,
$$\hat{Q} = \frac{RR^T}{t} = UDU^T.$$
Only the largest $m$ eigenvalues and corresponding eigenvectors are kept, hence

$$\hat{Q} = U_m D_m U_m^T + \Delta^2,$$

where $\Delta^2$ is the $n \times n$ diagonal matrix of specific variances.

The exposure matrix $B$ is taken to be $U_m D_m^{\frac{1}{2}}$. Factor and specific returns can then be computed by regressing, for each period, the cross-section of asset returns against the exposures using ordinary or weighted least-squares:
$$F = \left(B^T W B\right)^{-1} B^T W R, \qquad \Gamma = R - BF.$$
Unlike cross-sectional regression, the matrices (not vectors) F and Γ (m × t and n × t, respectively),
contain the entire time-series of factor and specific returns, and will subsequently be used to compute
the factor covariance matrix and specific variances.
Additionally, one could try to perform the above estimation using only a subset of assets, both
for computational efficiency and to avoid noisy historical data. Exposures and specific returns for
assets outside this universe can be backed-out via ordinary least-squares. For each asset i,
$$r_{i,t} = \sum_{j=1}^{m} b_{i,j} f_{j,t} + u_{i,t}, \qquad b_i = \left(FF^T\right)^{-1} F r_i, \qquad B = \begin{pmatrix} b_1^T \\ \vdots \\ b_n^T \end{pmatrix}.$$
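A sketch of the PCA construction on simulated returns (using ordinary least-squares, i.e. $W = I$; the sizes and factor strengths are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, m = 40, 500, 3                       # assets, periods, retained factors

# Simulated returns with a genuine 3-factor structure plus specific noise
B_true = rng.normal(size=(n, m))
F_true = rng.normal(0.0, 0.01, size=(m, t))
R = B_true @ F_true + rng.normal(0.0, 0.005, size=(n, t))

# Eigendecomposition of the n x n sample covariance matrix
Q = R @ R.T / t
evals, evecs = np.linalg.eigh(Q)           # eigenvalues in ascending order
top = np.argsort(evals)[::-1][:m]          # keep the m largest
U_m, D_m = evecs[:, top], np.diag(evals[top])

B = U_m @ np.sqrt(D_m)                     # exposures: B = U_m D_m^{1/2}
F = np.linalg.solve(B.T @ B, B.T @ R)      # factor returns, m x t
Gamma = R - B @ F                          # specific returns, n x t
```

With a genuine factor structure, the retained components should explain most of the return variance, leaving only the specific noise in Gamma.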
7.2 Asymptotic Principal Components
PCA requires that the number of assets n be smaller than the number of time periods t, in order for
Q̂ to be reliably estimated. In most capital markets, the number of assets far exceeds the number
of time periods for which data are available, especially if returns are measured in weekly or longer
intervals. If daily data were used to model the U.S. stock market, for example, one would require
well over 10,000 data points — over 40 years of data! Even if such data were available, the resulting
covariance matrix would be poorly estimated as noise and shocks from long in the past continue to
influence current estimates.
To address this inherent shortcoming of PCA, Axioma adopted the following approach:¹⁰ instead of working with the $n \times n$ covariance matrix $\hat{Q} = \frac{1}{t}RR^T$, one can shift the analysis from $n$- to $t$-space, and use the $t \times t$ covariance matrix $\tilde{Q} = \frac{1}{n}R^T R$. The raw input data, $R$, remain unchanged. The process begins with eigendecomposition of the covariance matrix $\tilde{Q}$:

$$\tilde{Q} = \tilde{U}\tilde{D}\tilde{U}^T = \tilde{U}_m \tilde{D}_m \tilde{U}_m^T + \Phi.$$

Because there is an infinity of solutions to $R = BF + \Gamma$, one can choose $F = \tilde{U}_m^T$. Choosing the eigenvectors (right singular vectors, if using singular value decomposition) is convenient because they provide orthogonal factors. Regressing asset returns against factor returns produces $B$:

$$B^T = \left(FF^T\right)^{-1} F R^T = F R^T, \qquad B = R\tilde{U}_m.$$
At this stage, the factor returns and exposure estimates $F$ and $B$ are discarded, keeping only the specific returns $\Gamma$, which are used to estimate a diagonal matrix of asset specific variances:

$$\Delta^2 = \frac{1}{t}\operatorname{diag}\left(\Gamma\Gamma^T\right).$$
The resulting specific risks are then used to scale each asset’s returns. This is conceptually analogous
to weighting assets by inverse residual variance in the earlier regression example, whereby the
influence of more volatile assets is reduced:
$$R^* = \Delta^{-1} R.$$
The scaled returns are then used to compute a new $t \times t$ covariance matrix, which again undergoes the eigendecomposition routine:

$$\tilde{Q}^* = \frac{1}{n} R^{*T} R^* = U_m^* D_m^* U_m^{*T} + \tilde{\Phi}.$$
At this point, we construct new specific risks, rescale our returns and iterate, doing so until successive estimates for specific risk are sufficiently close. The top $m$ eigenvectors $U_m^{*T}$ are chosen as the final estimates for the factor returns history. The final exposures $B$ are calculated by regressing these against the unscaled asset returns $R$.
The above estimation can be carried out on a subset of assets to prevent noisy returns from
contaminating the eigenstructure analysis. The procedure for doing so and recovering the exposures
and specific returns for the remaining assets is similar to the procedure outlined for PCA.
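The iteration can be sketched as follows. This is a simplified reading of the procedure, not Axioma’s production code; in particular, the choice of where scaled versus unscaled returns enter each step is an assumption:

```python
import numpy as np

def apca(R, m, n_iter=10, tol=1e-8):
    """Sketch of asymptotic PCA: iterate eigendecomposition of the t x t
    covariance of specific-risk-scaled returns until the specific risk
    estimates converge."""
    n, t = R.shape
    delta = np.ones(n)                       # initial specific risk guesses
    for _ in range(n_iter):
        R_star = R / delta[:, None]          # scale by inverse specific risk
        Q = R_star.T @ R_star / n            # t x t covariance matrix
        evals, evecs = np.linalg.eigh(Q)
        F = evecs[:, np.argsort(evals)[::-1][:m]].T  # top m eigenvectors
        B = R @ F.T                          # F has orthonormal rows, so the
                                             # regression (FF')^{-1}FR' is just RF'
        Gamma = R - B @ F
        delta_new = np.sqrt((Gamma ** 2).sum(axis=1) / t)
        if np.max(np.abs(delta_new - delta)) < tol:
            delta = delta_new
            break
        delta = delta_new
    return B, F, delta

# Many more assets than periods -- the situation APCA is designed for
rng = np.random.default_rng(5)
n, t, m = 300, 120, 5
R = rng.normal(size=(n, m)) @ rng.normal(0.0, 0.01, size=(m, t)) \
    + rng.normal(0.0, 0.005, size=(n, t))
B, F, delta = apca(R, m)
```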
¹⁰ This novel approach was introduced by Connor and Korajczyk (1988), specifically to address the inherent problem of applying PCA to financial markets, where assets typically outnumber time periods.
7.3 Expanded History
In constructing the statistical models, we typically use a history length of one year of returns for
the APCA step. Empirically, we find that this yields a happy medium between responsiveness and
stability. One year of asset returns obviously leads to a one-year history of factor returns, which can
then be used to construct the factor covariance matrix. However, the fundamental models generally
use longer histories (up to four years of returns) in the construction of their covariance matrices.
This leads to a problem when comparing the forecast from a fundamental model with a statistical
model: are differences in risk due to structure in the returns that one model is picking up and the
other not, or simply due to the different “memory”, i.e. length of history used, by each model?
We can address this as follows: suppose we have built our statistical model using APCA and a
t-day history of returns. We write the factor decomposition as
$$R_t = BF_t + \Delta_t,$$
where Rt is a matrix of t-days of asset returns, B is the latest matrix of statistical exposures, Ft
is the corresponding matrix of t-days of factor returns, and ∆t the matrix of residuals. Now, we
keep the exposures B and retrieve a longer history of asset returns, RT , say. We then perform the
regression
$$R_T = BF_T + \Delta_T,$$

and solve for $F_T$ via

$$F_T = \left(B^T B\right)^{-1} B^T R_T.$$
We have thus extracted a T -day history of factor returns using our statistical model exposures. This
longer history can then be used to construct the factor covariance matrix, and, if the fundamental
model also uses the same length history for its covariance matrix then we have ruled out one source
of difference in the models’ risk estimates. We can thus have more confidence that differences in
model forecasts are due to structural differences rather than the use of different returns histories.
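The extension step is a single fixed-exposure regression; a sketch with invented dimensions (the random exposure matrix stands in for the APCA output):

```python
import numpy as np

rng = np.random.default_rng(11)
n, t_long, m = 60, 1000, 4

R_long = rng.normal(0.0, 0.01, size=(n, t_long))  # longer (e.g. 4-year) history
B = rng.normal(size=(n, m))                       # fixed statistical exposures
                                                  # (stand-in for the APCA output)

# F_T = (B'B)^{-1} B' R_T: extended factor return history
F_long = np.linalg.solve(B.T @ B, B.T @ R_long)
```

By construction the residuals of this regression are orthogonal to the exposures, so the extended factor returns are the least-squares fit to the longer history.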
7.4 Model Statistics
It can be seen that this methodology (and PCA) is equivalent to performing least-squares regression of asset returns against factor return (or exposure) estimates. Therefore, regression statistics such as $\bar{R}^2$, standard error, or t-statistics are applicable, though such measures are likely to be rather high, because computing them is analogous to having look-ahead bias in a least-squares estimator. Further, they are estimates over an entire history, rather than a single cross-section. This makes it difficult to compare and contrast them with our fundamental models.
To generate more of a like-for-like comparison we adopt the following approach:
• Construct our statistical model structure R = BF + Γ using PCA or APCA at time t.
• Perform a weighted least squares regression using our statistical model exposures from time t
against a cross-section of asset returns from time t + 1.
• Compute regression statistics (R̄2 , t-statistics) from this regression.
It seems reasonable to assume that, if our statistical exposures contain any genuine information,
and are not merely the constructs of data-mining, then they should yield significant regression
statistics when regressed out-of-sample as above. We thus have regression statistics that can be
more meaningfully compared with those of the fundamental models.
8 The Risk Model
Thus far the discussion has focused entirely on modeling returns, having said nothing whatsoever
about the generation of risk forecasts. This is justified — if returns are modeled correctly and
robustly then deriving risk estimates is relatively straightforward. For the mathematically-inclined,
a more rigorous treatment of the subtleties can be found in Zangari (2003); De Santis et al. (2003).
If the model has been sensibly constructed with no important factors missed, then the specific
returns are uncorrelated with themselves and with the factors, and the factor risk model can then
be derived thus:
$$\operatorname{var}(r) = \operatorname{var}(Bf + u),$$
$$\hat{Q} = BVB^T + \Delta^2.$$

The asset returns covariance matrix $\hat{Q}$ is a combination of a common factor returns covariance matrix $V$ and a diagonal specific variance matrix $\Delta^2$.
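One practical payoff of this structure: a portfolio’s variance can be computed from the $m \times m$ factor covariance matrix and the $n$ specific variances, without ever forming the $n \times n$ asset covariance matrix. A sketch with random, purely illustrative inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100, 10

B = rng.normal(size=(n, m))                 # factor exposures (illustrative)
A = rng.normal(size=(m, m))
V = A @ A.T / m                             # a positive semi-definite factor covariance
delta2 = rng.uniform(0.01, 0.04, size=n)    # specific variances

w = np.full(n, 1.0 / n)                     # equal-weighted portfolio

# Factor-model route: w'(BVB' + Delta^2)w without forming the n x n matrix
x = B.T @ w                                 # portfolio factor exposures
var_total = x @ V @ x + w ** 2 @ delta2

# Brute-force route: build Q-hat explicitly and compute w'Qw
Q = B @ V @ B.T + np.diag(delta2)
var_direct = w @ Q @ w
```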
8.1 Factor Covariance Matrix
The factor covariance matrix is calculated directly from the time-series of factor returns. Regression
models estimate a set of factor and specific returns at each time period, eventually building up a
returns history. Statistical models, on the other hand, generate the entire time-series anew with
each iteration.

$$F = \begin{pmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,T} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,T} \\ \vdots & \vdots & \ddots & \vdots \\ f_{m,1} & f_{m,2} & \cdots & f_{m,T} \end{pmatrix}$$
However we arrive at our factor returns history, we can simply write the covariance matrix as

$$V = \frac{1}{T-1} FF^T,$$
where, for the sake of convenience, we assume a mean of zero. We simply pick the length of factor
returns history required, and compute the covariance matrix over time using a rolling history of
observations. To achieve a numerically stable covariance matrix, a long history of observations is
necessary. Given a constant volatility regime, this approach is reasonable. However, history would
indicate that such an assumption is not entirely warranted: a long history of returns is likely to
lead to older, less relevant, observations having an undue influence on the present.
8.2 Exponential Weighting
Recent events should exert more influence on the model than those long in the past, but one cannot
simply curtail the history of returns and use only the most recent observations. A sufficiently long
history is required to estimate all the covariances reliably.
Axioma models address this dilemma by weighting the returns matrix using an exponential weighting scheme:

$$w_t = \frac{2^{-(T-t)/\lambda}}{\hat{w}}, \qquad t = 0, \ldots, T,$$

where $T$ is the most recent time period, $\lambda$ is the half-life parameter (the weight of an observation $\lambda$ periods before the most recent is half that of the most recent observation), and $\hat{w} = \sum_{t=0}^{T} w_t$.
Figure 28: Typical exponential weighting scheme
Figure 28 shows how this looks for values of t from t = 0 (earliest) to t = 500 (most recent) for
a half-life of 100 days. Here weights have been rescaled so that the maximum value (at t = 500) is
one. As can be seen, at t = 400 (one half-life), the weight is 0.5, and at t = 300 (two half-lives) the
weight is 0.25.
Thus, given a half-life, weights $W = \operatorname{diag}(w_1, w_2, \ldots, w_T)$ are computed, and the factor returns history is weighted to yield a new matrix of values:

$$\tilde{F} = FW^{\frac{1}{2}} = \begin{pmatrix} w_1^{\frac{1}{2}} f_{1,1} & w_2^{\frac{1}{2}} f_{1,2} & \cdots & w_T^{\frac{1}{2}} f_{1,T} \\ w_1^{\frac{1}{2}} f_{2,1} & w_2^{\frac{1}{2}} f_{2,2} & \cdots & w_T^{\frac{1}{2}} f_{2,T} \\ \vdots & \vdots & \ddots & \vdots \\ w_1^{\frac{1}{2}} f_{m,1} & w_2^{\frac{1}{2}} f_{m,2} & \cdots & w_T^{\frac{1}{2}} f_{m,T} \end{pmatrix}$$
From this, the factor covariance matrix is estimated as

$$V = \operatorname{var}\left(\tilde{F}\right) = \frac{FWF^T}{T-1}.$$
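A sketch of the weighting scheme and the resulting covariance estimate; the normalisation details are an assumption (here the weights are simply scaled to sum to one), and the factor returns are random illustrative data:

```python
import numpy as np

def exp_weights(n_obs, half_life):
    """Exponential weights for observations t = 0 (earliest) to
    t = n_obs - 1 (most recent), normalised to sum to one."""
    t = np.arange(n_obs)
    w = 2.0 ** (-(n_obs - 1 - t) / half_life)
    return w / w.sum()

w = exp_weights(501, 100.0)

# One half-life back the weight is half the most recent weight;
# two half-lives back it is one quarter
ratio_1hl = w[400] / w[500]
ratio_2hl = w[300] / w[500]

# Exponentially weighted factor covariance V ~ F W F'
rng = np.random.default_rng(9)
F = rng.normal(0.0, 0.01, size=(6, 501))     # m = 6 factors, T = 501 periods
V = (F * w) @ F.T                            # weights applied observation-wise
```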
Selecting an appropriate half-life is a major design question to which there is no definitive answer.
This half-life indirectly affects the forecast horizon of the risk model: too short a half-life may
allow for a very responsive model but creates excessive turnover for asset managers;¹¹ too long a
half-life and the model will fail to respond sufficiently to changing market conditions. The half-life
parameter therefore treads a fine line between responsiveness and stability.
Figure 29 shows how the choice of half-life influences the computed volatility. The graph shows
risk predictions for a typical US benchmark using the US4 medium horizon and short horizon
models, which use half-lives of 125 and 60 days respectively for their volatility estimation. We show
also a 21-day trailing realised volatility for comparison. It is obvious that the short-horizon model
(US4-SH) is far more responsive to changes in the short-term volatility. But at what cost?
¹¹ Shortening the half-life also reduces the “effective” sample size available for estimating the factor correlations, which far outnumber the variances (approximately $\frac{n^2}{2}$ vs $n$).
Figure 29: US benchmark risk forecasts using 60- and 125-day half lives
Figure 30: Average forecast change numbers for US4 models
Figure 30 shows the average forecast change figures for approximately 70 US benchmarks over a
20-year period. Forecast change is a simple proxy for turnover, and it is clear that the short-horizon
model has almost twice the level of the medium-horizon model. For model users for whom greater stability and low turnover are crucial, the short-horizon model is likely to prove too responsive for their needs.
Axioma risk models use multiple half-lives for greater flexibility — a longer half-life for estimating
the factor correlations and a shorter one for the variances. Given the much larger number of
relationships to be estimated, a relatively long half-life is necessary to yield stable correlations. The
resulting correlations are then scaled by the corresponding variances to obtain the full covariance
matrix.
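The two-half-life construction described above can be sketched as follows, assuming simple exponentially weighted moments; the particular half-lives shown are illustrative:

```python
import numpy as np

def blended_cov(F, hl_var=60, hl_corr=250):
    """Correlations from a long half-life, variances from a shorter one."""
    def ew_cov(F, hl):
        k, T = F.shape
        w = 0.5 ** (np.arange(T - 1, -1, -1) / hl)
        w /= w.sum()
        Fd = F - (F * w).sum(axis=1, keepdims=True)   # weighted demeaning
        return (Fd * w) @ Fd.T

    C_long = ew_cov(F, hl_corr)
    d = np.sqrt(np.diag(C_long))
    corr = C_long / np.outer(d, d)               # stable long-horizon correlations
    sigma = np.sqrt(np.diag(ew_cov(F, hl_var)))  # responsive short-horizon volatilities
    return corr * np.outer(sigma, sigma)         # rescale to a full covariance
```

The result inherits its correlation structure from the long half-life and its diagonal from the short one.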
8.3 Autocorrelation in the Factor Returns
If factor returns were uncorrelated across time, the above procedure would be sufficient for calculating a factor covariance matrix. Unfortunately, the assumption of serial independence is much less true for daily asset returns than for returns over longer horizons. Over short time frames, market microstructure tends to create lead-lag relationships that introduce autocorrelation into the factor returns over time.
over time. This would not matter if we were simply computing daily covariance matrices and going
no further, but these effects must be taken into consideration when aggregating risk numbers to
longer horizons. For instance, one cannot simply derive a monthly risk forecast from daily numbers
via the simple relationship¹²
\[ \sigma_M = \sqrt{21} \, \sigma_d \]
Without taking autocorrelation into account, such forecasts are likely to be under- or overestimated. Put simply, if a market has a 10% return one day, followed by a -10% return the next, its daily volatility will obviously be high. However, at any horizon longer than a day these two numbers cancel: they contribute nothing to weekly or monthly volatility, for instance. Axioma adjusts its daily factor covariance matrix estimates using a technique developed by Newey and West (1987):
A factor’s return over N days may be approximated as the sum of its daily returns:
\[ f(t, t+N) = \sum_{i=t}^{t+N} f(i) \]
This approximation is exact using logarithmic returns. The predicted covariance across two factors, f and g, from t to t + N, is then
\[ \sigma_{f,g}(t : t+N) = N \sigma_f(t) \sigma_g(t) \left[ \rho_{f,g}(t) + \sum_{k=1}^{N-1} \left( 1 - \frac{k}{N} \right) \left( \varphi_{f,g}(t, t+k) + \varphi_{g,f}(t, t+k) \right) \right] \]
where \( \varphi_{f,g}(t, t+k) \) and \( \varphi_{g,f}(t, t+k) \) represent the lagged correlations:
\[ \varphi_{f,g}(t, t+k) = \operatorname{corr}\left( f(t), g(t+k) \right), \qquad \varphi_{g,f}(t, t+k) = \operatorname{corr}\left( g(t), f(t+k) \right) \]
In practice, one need not calculate lagged correlations for each lag up to N . By first analyzing serial
correlation trends in the historical returns, it is reasonable to cut off the series at a point h < N ,
beyond which any correlations are insignificant.
¹² Assuming that there are on average 21 trading days in a month.
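For a pair of daily factor return series, the truncated Newey-West aggregation above can be sketched as follows; the normalisation of the lagged correlations is a simplification:

```python
import numpy as np

def nw_horizon_cov(f, g, N=21, h=5):
    """Newey-West style N-day covariance of two daily factor return series,
    truncating the lagged correlations at lag h < N."""
    f = np.asarray(f, dtype=float) - np.mean(f)
    g = np.asarray(g, dtype=float) - np.mean(g)
    T = len(f)
    sf, sg = f.std(ddof=1), g.std(ddof=1)
    rho = (f @ g) / ((T - 1) * sf * sg)
    acc = rho
    for k in range(1, h + 1):
        phi_fg = (f[:-k] @ g[k:]) / ((T - k - 1) * sf * sg)  # corr(f(t), g(t+k))
        phi_gf = (g[:-k] @ f[k:]) / ((T - k - 1) * sf * sg)  # corr(g(t), f(t+k))
        acc += (1.0 - k / N) * (phi_fg + phi_gf)
    return N * sf * sg * acc
```

Positive serial correlation inflates the multi-day estimate relative to naive square-root-of-time scaling, while negative serial correlation deflates it.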
8.4 Combining Covariance Matrices
Different risk models will utilize different sets of factor returns, perhaps with different techniques
and parameters for computing the covariances. However, it would be desirable, for models with
identical horizons, for the relevant currency covariances to match across models. We guarantee that
this is the case by computing a single global currency covariance matrix as a distinct entity and
then merging the relevant portions of this into each model covariance matrix.
We partition our q non-currency and m currency returns and exposures as
\[ r = Xf + u = \begin{bmatrix} X_q & X_m \end{bmatrix} \begin{bmatrix} f_q \\ f_m \end{bmatrix} + u \]
First we compute a simple, all-in-one covariance matrix from this complete set of factor returns. We factorize this matrix into volatilities and correlations thus:
\[ V_S = \begin{bmatrix} D_q & 0 \\ 0 & D_m \end{bmatrix} \begin{bmatrix} C_q & C_{qm} \\ C_{qm}^T & C_m \end{bmatrix} \begin{bmatrix} D_q & 0 \\ 0 & D_m \end{bmatrix} \]
where the D blocks are diagonal matrices, each element on the diagonal being the relevant factor
volatility, and the C blocks correspond to correlations.
Next, assume that we have also calculated separate covariance matrices for the non-currency
and currency factors. Each of these sub-matrices may have been calculated by differing techniques
(e.g. different numbers of lags in autocorrelation correction, treatment of extreme values etc.). We
stack these in a block-diagonal form and factorize as follows:
\[ V_C = \begin{bmatrix} \hat{D}_q \hat{C}_q \hat{D}_q & 0 \\ 0 & \hat{D}_m \hat{C}_m \hat{D}_m \end{bmatrix} \]
Because each covariance matrix was calculated for a particular factor group, we have no off-diagonal
blocks corresponding to correlations across factor groups. Our aim, therefore, is to combine the two
covariance matrices VC and VS such that the composite matrix is symmetric positive-semidefinite,
with its diagonal blocks being identical to the diagonal blocks of VC .
To effect this, define the transformation matrix
\[ R = \begin{bmatrix} \hat{D}_q \hat{C}_q^{1/2} Q_q C_q^{-1/2} & 0 \\ 0 & \hat{D}_m \hat{C}_m^{1/2} Q_m C_m^{-1/2} \end{bmatrix} \]
where \( Q_q \) and \( Q_m \) are any orthogonal matrices of dimension q and m respectively. Then the transformed matrix
\[ V = R \begin{bmatrix} C_q & C_{qm} \\ C_{qm}^T & C_m \end{bmatrix} R^T \]
yields the desired result. Expansion will demonstrate that its diagonal blocks are equal to those of \( V_C \); that it is symmetric is obvious; and its positive semi-definiteness follows from the properties of the central term.

However, we must still define the orthogonal matrices \( Q_q \) and \( Q_m \). Since they may take any orthogonal values, we wish to choose them such that they are optimal in some sense. The following section details how that is done.
8.4.1 The Procrustes Transform
We define the “simple” covariance matrix, i.e. that computed all-in-one with every factor, thus:
\[ \begin{bmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{bmatrix} \]
Assume without loss of generality that the blocks are ordered according to some criterion (e.g. first regression, second regression, statistical, currencies etc.). Also assume that there are only two such sets of factors; it is straightforward to extend the analysis to an arbitrary number of sets. Finally, assume that for each of these factor groups we have a separately estimated covariance matrix. We write these as follows:
\[ \begin{bmatrix} \hat{C}_{11} & 0 \\ 0 & \hat{C}_{22} \end{bmatrix} \]
In the case of a regional model, we have two blocks of covariances - the first computed over the
model’s factors, the second is the currency covariance block from the standalone currency model.
Our aim, therefore, is to combine the two sets of covariance matrices such that the composite
matrix is symmetric positive semi-definite (SPSD), with its diagonal blocks being identical to the
components of our second, block-diagonal, covariance matrix. To effect this, define the transformation matrix
"
#
1/2
−1/2
Ĉ11 Q11 C11
X=
1/2
−1/2
Ĉ22 Q22 C22
Here each matrix \( Q_{ii} \) is any orthogonal matrix. Then the transformed matrix
\[ V = X \begin{bmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{bmatrix} X^T \]
yields
\[ V = \begin{bmatrix} \hat{C}_{11} & \hat{C}_{11}^{1/2} Q_{11} C_{11}^{-1/2} C_{12} C_{22}^{-T/2} Q_{22}^T \hat{C}_{22}^{T/2} \\ \hat{C}_{22}^{1/2} Q_{22} C_{22}^{-1/2} C_{12}^T C_{11}^{-T/2} Q_{11}^T \hat{C}_{11}^{T/2} & \hat{C}_{22} \end{bmatrix} \]
That this matrix is SPSD is easily shown.
So far, we have not said what the Qii should be. A simple choice might be to set them equal
to the identity matrix. However, the resulting composite covariance matrix then becomes very
unstable over time, owing to the matrix inversion necessary for blocks C11 and C22 . This leads, in
turn, to noisy covariances and risk estimates.
A more rigorous approach is to find \( Q_{ii} \) such that the product \( \hat{C}_{ii}^{1/2} Q_{ii} C_{ii}^{-1/2} \) is as close as possible to the identity matrix. This amounts to solving the following: find \( Q_{ii} \) such that \( Q_{ii}^T Q_{ii} = I \) and it minimises
\[ \left\| \hat{C}_{ii}^{1/2} Q_{ii} - C_{ii}^{1/2} \right\| \]
This is a known problem with a well-defined solution, the so-called orthogonal Procrustes problem.
The solution is as follows: form the product
\[ M = \hat{C}_{ii}^{T/2} C_{ii}^{1/2} \]
Then compute its singular value decomposition
\[ M = U \Sigma V^T \]
The optimal \( Q_{ii} \) is then given by \( Q_{ii} = U V^T \).
This, finally, gives us a stable composite covariance matrix, whose diagonal blocks are equal to our
pre-defined Ĉ11 and Ĉ22 .
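The whole combination step, including the Procrustes choice of each \( Q_{ii} \), can be sketched as below for two blocks. The symmetric eigendecomposition square root used here is one of several valid choices of matrix root:

```python
import numpy as np

def sqrtm_sym(A):
    """Symmetric positive-definite matrix square root via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(w)) @ U.T

def procrustes_blend(C_full, C11_hat, C22_hat):
    """Combine a jointly estimated covariance matrix with separately estimated
    diagonal blocks, choosing each Q_ii as the orthogonal Procrustes solution
    Q = U V^T from the SVD of C_hat^{T/2} C^{1/2}."""
    q = C11_hat.shape[0]
    m = C22_hat.shape[0]
    blocks = [(C_full[:q, :q], C11_hat), (C_full[q:, q:], C22_hat)]
    Xs = []
    for C, C_hat in blocks:
        C_half = sqrtm_sym(C)
        Ch_half = sqrtm_sym(C_hat)
        U, _, Vt = np.linalg.svd(Ch_half.T @ C_half)   # M = C_hat^{T/2} C^{1/2}
        Q = U @ Vt                                     # Procrustes optimum
        Xs.append(Ch_half @ Q @ np.linalg.inv(C_half)) # C_hat^{1/2} Q C^{-1/2}
    X = np.block([[Xs[0], np.zeros((q, m))],
                  [np.zeros((m, q)), Xs[1]]])
    return X @ C_full @ X.T
```

By construction the result is symmetric positive semi-definite with diagonal blocks exactly equal to the separately estimated ones, whatever orthogonal matrices are chosen; the Procrustes choice only improves stability.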
8.5 Market Asynchronicity
Because equity markets do not open and close simultaneously, the use of daily returns to estimate relationships across assets trading in different time zones poses a number of challenges for a daily risk model. Consider Tokyo: this market closes before the US market has even opened; thus, stocks trading in Japan will not reflect that day's events on the US market until Tokyo reopens the next day. There is, therefore, the distinct possibility of a lag across asset returns. In the worst case, if one market always goes up when the other goes down, this could give the appearance of negative correlation across markets, and hence suggest that one market is a hedge for the other. In reality, this correlation is an artifact of the time delay, and the markets are positively correlated.
To illustrate this, we compute daily returns for a selection of test markets: each is simply the
weighted sum of the asset returns within the market. We compute the correlations between these
markets and the USA using the raw daily data, along with the correlation using a Newey West
correction of 1 day’s lag. Finally, we compound daily returns to weekly (Wednesday to Wednesday)
and compute the correlations of these with the USA. This will lessen the daily timing effect and give us a picture of market relationships closer to the truth.

Figure 31: Correlations with the US market 1997-2016

Figure 31 shows all three flavors of correlation between markets. The results are unambiguous: based on weekly returns, almost all
markets show a significant positive correlation with the US market, with the developed markets
showing the strongest relationship. This is not evident when using raw daily returns, where there
is a clear East-West effect. The one-day Newey West correction improves matters somewhat, but
still vastly under-estimates the true relationship for many of the more Easterly markets. Thus, we
conclude that daily data do give a misleading picture of correlations across markets.
8.5.1 Synchronizing the Markets
To alleviate the problem of market asynchronicity we adjust our returns. Recall that our basic returns model at a point in time, t, is as follows:
\[ r_t = B f_t + u_t. \tag{1} \]
We then make the assumption that each asset return \( r_t^i \) may be written as
\[ r_t^i = r_t^{i_m} + \gamma_t^i, \tag{2} \]
where \( r^{i_m} \) is the return to i’s local market (the particular market on which the asset trades), and \( \gamma^i \) is the remaining, non-market, term.
Using the set of J local market returns, we construct a VAR model of local market returns:
\[ r_t^m = N r_{t-1}^m + \epsilon_t. \tag{3} \]
We then use the coefficient matrix N above to estimate a set of forecast market returns
\[ \hat{r}_t^m = N r_t^m, \tag{4} \]
and these projected market returns are then used to adjust each asset return, viz.
\[ \hat{r}_t^i = \hat{r}_t^{i_m} + \gamma_t^i. \tag{5} \]
Using (2) we rewrite (5) as
\[ \hat{r}_t^i = r_t^i + \Delta_t^{i_m}, \tag{6} \]
where the returns-timing adjustment factor, \( \Delta_t^{i_m} \), is defined as
\[ \Delta_t^{i_m} = \hat{r}_t^{i_m} - r_t^{i_m}. \tag{7} \]
We may then use the set of adjusted asset returns from (6) in the model factor regression
\[ \hat{r}_t = B \hat{f}_t + u_t. \tag{8} \]
For a more complete explanation of returns timing see Axioma (2017d).
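The adjustment in equations (3)-(7) can be sketched as below, with the VAR(1) coefficient matrix N fitted by ordinary least squares; the estimator is an assumption, as the handbook does not specify one:

```python
import numpy as np

def returns_timing_adjust(M):
    """Sketch of equations (3)-(7).

    M : (J, T) array of local market returns, oldest column first.
    Returns the fitted coefficient matrix N and the last day's
    returns-timing adjustment Delta = r_hat - r for each market.
    """
    X, Y = M[:, :-1], M[:, 1:]
    # VAR(1): r_t = N r_{t-1} + eps_t, solved as Y ~ N X in least squares
    N = Y @ X.T @ np.linalg.pinv(X @ X.T)
    r_t = M[:, -1]
    delta = N @ r_t - r_t        # equation (7): r_hat_t - r_t
    return N, delta
```

A market that mechanically echoes another with a one-day lag is picked up as a coefficient of one on the leading market.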
8.5.2 Non-trading Days and Returns Timing
A complication with the returns-timing algorithm above occurs when a market has a non-trading
day. If the market(s) being used to adjust the other markets (such as the USA) have a holiday, then
no returns-timing correction can be computed for any market. If the market that one is attempting
to adjust has a holiday, then a correction cannot be computed for that particular market on that
day. With this basic algorithm therefore, when a non-trading day occurs for one of the markets
involved, the returns timing adjustment is missing for that market.
While, for many markets, this is a minor inconvenience, for Middle-Eastern markets, many of which trade from Sunday to Thursday, or for those with a large number of holidays, it is a serious shortcoming. Not only is there no returns-timing adjustment on a Friday in the case of the former, but the actual adjustments computed on other days are less well modelled owing to the non-alignment of trading days, and correlations between the markets are likely to be underestimated.

Figure 32: Correlation of European Markets with the US Market - Part 1
Figure 32 illustrates the effect of non-trading days. We compare the correlations computed via
weekly returns, with those using daily adjusted returns (via the returns-timing algorithm) for a
selection of European markets. We choose these as there is the most overlap between them and the
Americas, and so the weekly returns correlation is likely to give as accurate a picture as possible.
We see that for most markets, using daily adjusted returns via the basic returns timing algorithm
(Adjusted RT1) tends to underestimate the correlations. If we look at a geographical spread of
markets (Figure 33) we see that the tendency of the weekly correlations to be higher than those
from the adjusted returns decreases as we head East. This seems intuitively reasonable, as there is
at most four days overlap in weekly returns between the Far East and the USA, thus the weekly
correlations are themselves highly likely to be underestimates, the further East we are.
To get round these issues, and to enable us to compute a returns-timing coefficient for any market on almost any day, we attempt to impute market returns on holidays prior to the returns-timing computation. Given a set of market returns \( m_t^i \) at time t for markets i = 1, ..., n, we fit a VAR model thus:
\[ m_t^i = \sum_{j \in C} \beta_{ij} m_t^j + \epsilon_t^i \]
where C denotes the set of values of j such that market \( m^j \) lies in a time zone within two hours of \( m^i \) (but does not include \( m^i \) itself). Thus, in simple terms, we estimate missing returns for a market by comparing it with its nearby peers, at least one of which is assumed to be trading. The values of \( \beta_{ij} \) are estimated from a 500-day history of returns.
The complete returns timing algorithm is described thus:
Figure 33: Correlation of Selection of Markets with the US Market - Part 1

1. Take an initial 500-day set of daily market returns, with non-trading days filled with zeros; call this \( M^0 \).
2. Estimate a matrix of betas, B, from the equation above, using this returns history.
3. Compute a set of imputed returns \( M^1 = B M^0 \).
4. Replace non-trading-day values in \( M^0 \) with the imputed values from \( M^1 \), filling and scaling in the usual way.
5. Using \( M^1 \) as our initial set of returns, repeat from step 2 until some given convergence is achieved.
6. Compute returns timing coefficients using the filled set of market returns computed above.
In practice, the returns-imputation algorithm converges rapidly, after which we have a two-year set of market returns over all markets with no missing data from non-trading days. Days immediately following non-trading days are scaled appropriately to convert their returns to daily, as described in Axioma (2017b).
Revisiting our previous results and comparing them with our improved, robust returns timing, we get Figure 34. Here we compare the correlations produced via the new returns-timing algorithm (Adjusted RT2), and we see that they tend to be higher than those of the old version, and similar to or higher than the weekly correlations. If we look again at the geographical spread of markets (Figure 35) we see that these improvements hold generally. The tendency of the new returns timing to give higher correlations than the weekly returns increases as we head East. This seems intuitively reasonable, given the at most four days of overlap in weekly returns between the Far East and the USA.
Focussing on one market - Israel - that suffers from the problem of market closure on Fridays as
well as other non-trading days, we can see first the improvement in the performance of the returns
timing algorithm. Figure 36 shows the returns timing coefficients for Israel over time (i.e. the
Figure 34: Correlation of European Markets with the US Market - Part 2
Figure 35: Correlation of Selection of Markets with the US Market - Part 2
Figure 36: Returns-Timing Coefficients for Israel Market Return
relevant component of the N matrix in equation (3)). It is clear that the components for the new returns timing (RT2) are much more stable than those of the old (RT1), many of which are missing entirely owing to the many non-trading days. Figure 37 shows the t-statistics of these same returns-timing coefficients, those for the new returns timing being unambiguously greater. Finally, Figure 38 shows the distribution of active-risk bias statistics for an Israeli market benchmark versus a global benchmark. We see that the tendency for the risk to be overestimated in an older global model (WW2), which uses the first-generation returns timing, is greatly reduced in the newer model with the latest returns-timing algorithm.
Figure 37: T-Statistics for Israeli Returns-Timing Coefficients
Figure 38: Distribution of Active Risk Bias Statistics for Israeli Benchmark
9 Modelling Currency Risk
Recall that our currency factor returns (if applicable) are computed separately from the other factor returns. We append the currency factor returns to the model factor returns:
\[ r^j = Bf + u = \begin{bmatrix} B_q & B_m \end{bmatrix} \begin{bmatrix} f_q \\ f_m^j \end{bmatrix} + u \]
Here, \( r^j \) is the vector of asset returns in numeraire currency j. \( B_q \in \Re^{N \times q} \) and \( f_q \in \Re^q \) are the non-currency exposures and factor returns respectively. The vector \( f_m^j \in \Re^m \) is the vector of m currency returns (relative to numeraire currency j):
\[ f_m^j = \begin{bmatrix} c_{1j} \\ \vdots \\ c_{mj} \end{bmatrix} \]
and \( B_m \in \Re^{N \times m} \) is the matrix of currency exposures. We then compute a combined factor covariance matrix. Henceforth we drop reference to the numeraire currency except where necessary, and assume that by “currency return” we always mean the return of a trade in a currency relative to a particular numeraire. There are a number of complications relating to currency risk. We detail those here.

9.1 Currency Covariances
Given the returns model above, to include currency risk in the risk model we could simply calculate a combined factor covariance matrix as we would any other model factor covariance matrix, viz.
\[ \operatorname{var}(r^N) = \begin{bmatrix} B_q & B_m \end{bmatrix} \operatorname{var} \begin{bmatrix} f_q \\ f_m^j \end{bmatrix} \begin{bmatrix} B_q^T \\ B_m^T \end{bmatrix} + \operatorname{var}(u) = \begin{bmatrix} B_q & B_m \end{bmatrix} \begin{bmatrix} V_q & V_{qm} \\ V_{qm}^T & V_m \end{bmatrix} \begin{bmatrix} B_q^T \\ B_m^T \end{bmatrix} + \Delta. \tag{9} \]
Here:
• Vq is the covariance matrix for the non-currency factors,
• Vqm is the covariance matrix block of non-currency/currency factor returns,
• Vm is the currency return covariance matrix.
The main problem with this simple approach is that currencies can be noisy.
Figure 39 shows a scatter plot of Euro returns versus GBP returns. Both of these are liquid,
heavily-traded currencies, and it is manifest that there is a significant statistical relationship between
them. However, if we look at Figure 40, the situation is rather different. This shows the scatter plot of Indian versus Pakistani Rupee currency returns. Despite the close geographical proximity of the two, the relationship between them is non-existent. The problem with currencies is much the same as with individual assets. Some are heavily traded and liquid, leading to a high signal-to-noise ratio and easily measurable statistical relationships. Others are illiquid, some are pegged, and any statistics we attempt to derive from them are likely to be noisy and unreliable, owing to the lack of meaningful information in their returns.
Figure 39: Scatter plot of EUR and GBP returns
Figure 40: Scatter plot of PKR and INR returns
Figure 41: A Selection of Realized Currency Correlations
Figure 41 demonstrates this. We compute realized 250-day correlations for the Euro and GBP, and contrast these with those for PKR and INR returns. There is clearly a gulf between the two pairs in terms of meaningful information. The correlation between EUR and GBP is consistent and significant over time; the correlation between PKR and INR is little more than noise, and one would be quite justified in drawing a flat line at zero to represent the true relationship.
Noisy currencies mean noisy correlations: we see similar patterns amongst other such currencies, and also between currencies and non-currency factors. The danger, of course, is that from time to time we will get apparently significant but entirely spurious correlations, which may prove detrimental to a user’s investment strategies.
How can we improve the correlations and reduce the noise? We model the currency returns via a Principal Components Analysis (PCA) statistical model. We thus decompose
\[ C = ZG + \theta, \tag{10} \]
where:
• \( C \in \Re^{m \times T} \) is a time-series matrix of currency returns, \( c_m(t),\ t = 1, \ldots, T \),
• \( Z \in \Re^{m \times p} \) is a matrix of exposures to p statistical factors,
• \( G \in \Re^{p \times T} \) is a time-series matrix of p statistical factor returns, and
• \( \theta \in \Re^{m \times T} \) is the residual from the PCA regression.
Given a time-series of currency returns we compute (10) by standard PCA decomposition.
We derive the PCA structure via an estimation universe of twelve currencies. These are:
• The big seven: USD, JPY, GBP, EUR, CHF, AUD, CAD. These account for the vast majority
of currency trades.
• A further six from Asia and the developing world: BRL, MXN, PLN, TWD, SGD, ZAR.
These are some of the next most heavily traded.¹³
The above currencies are chosen to minimise residual returns structure: some other currencies are
actually more heavily traded than the latter six, (e.g. SEK, NOK, NZD), but are so highly correlated
with currencies already in the estimation universe that for modeling purposes they scarcely count
as separate entities. Then, we allow our PCA model to have twelve statistical factors; thus we have
the same number of factors as currencies in our estimation universe, ensuring that the estimation
universe currencies are modeled exactly by the PCA, and the non-estimation universe exposures
can be thought of as sensitivities to the estimation universe currency returns.
To summarise the above: we model
\[ C_{\mathrm{estu}} = Z_{\mathrm{estu}} G, \tag{11} \]
where \( C_{\mathrm{estu}} \in \Re^{12 \times T} \) is the subset of estimation-universe currency returns. Here (11) is an exact relationship (no residual). We then “back out” the non-estimation-universe exposures, \( Z_{\mathrm{nonest}} \), via
\[ Z_{\mathrm{nonest}} = C_{\mathrm{nonest}} G^T \left( G G^T \right)^{-1}, \tag{12} \]
where \( C_{\mathrm{nonest}} \in \Re^{(m-12) \times T} \) is the subset of non-estimation-universe currency returns. We combine our “estu” and “nonest” currency components to create the combined currency model (10). We assume that the residual currency returns \( \theta \) are uncorrelated, both across currencies and across currency/non-currency pairs; hence \( \operatorname{cov}(\theta, f_q) = 0 \), \( \operatorname{cov}(\theta, ZG) = 0 \) and \( \operatorname{var}(\theta) \) is diagonal. We may thus rewrite covariance matrix (9) as
\[ \operatorname{var}(r^N) = \begin{bmatrix} X_q & X_m \end{bmatrix} \begin{bmatrix} V_q & \operatorname{cov}(f_q, ZG) \\ \operatorname{cov}(ZG, f_q) & \operatorname{var}(ZG) + \operatorname{var}(\theta) \end{bmatrix} \begin{bmatrix} X_q^T \\ X_m^T \end{bmatrix} + \Delta. \tag{13} \]
By decomposing the currency returns and imposing structure onto them, we force at least some of the noise into the residual, uncorrelated component \( \theta \), and the resulting covariance matrix will exhibit smoother and more meaningful correlations.
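Equations (10)-(12) can be sketched via an SVD-based PCA as below; demeaning the returns before the decomposition is an implementation assumption:

```python
import numpy as np

def currency_pca_model(C_estu, C_nonest, p=None):
    """Sketch of the currency PCA model of equations (10)-(12).

    C_estu   : (k, T) estimation-universe currency returns (k = 12 in the text).
    C_nonest : (m-k, T) remaining currency returns.
    With p = k the decomposition C_estu = Z_estu G is exact (no residual).
    """
    k, T = C_estu.shape
    p = k if p is None else p
    mu = C_estu.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(C_estu - mu, full_matrices=False)
    Z_estu = U[:, :p] * s[:p]                      # exposures to statistical factors
    G = Vt[:p]                                     # statistical factor returns (p, T)
    Cn = C_nonest - C_nonest.mean(axis=1, keepdims=True)
    Z_nonest = Cn @ G.T @ np.linalg.inv(G @ G.T)   # equation (12)
    theta = Cn - Z_nonest @ G                      # residual, equation (10)
    return Z_estu, Z_nonest, G, theta
```

A non-estimation-universe currency that is an exact linear combination of estimation-universe currencies leaves no residual, while a genuinely noisy currency pushes its idiosyncratic movement into theta.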
Figure 42 shows the effect of our currency model. For the correlation of EUR versus GBP there is little overall difference: this is a case where the signal-to-noise ratio in the returns is high, so the only differences between our realised and our model numbers are down to the differing parameters and settings used to compute the covariances. In the case of the correlation between PKR and INR, however, we have significantly reduced the noisy and almost certainly spurious jumps. Our overall correlation is much closer to zero, reflecting the lack of meaningful information in the two sets of returns.
9.2 Numeraire Transformation
As it stands, our combined currency matrix gives a measure of risk from the point of view of a
particular base numeraire such as USD or Euro. However, any currency could be considered as a
numeraire, and risk model users who trade in other currencies may wish to look at things from the
perspective of their own currency. The simple but costly solution is to compute a number of different covariance matrices, one for each desired currency. This, however, is inflexible and wasteful. By a simple transformation we may instead convert a covariance matrix based on one currency to the perspective of any other that we choose.

¹³ The more observant will have spotted that there are actually thirteen currencies in total; however, one of these is always the numeraire, hence its returns are zero, and it is automatically excluded from the PCA decomposition.

Figure 42: A Selection of Realized vs. Model Currency Correlations
Our aim is to transform the currency returns from the currency-j perspective to that of, say, currency k. Using the log-return approximation for currency returns, and considering our asset to be the currency return for currency j, we form the expression
\[ r_{ik}^x \approx r_{ij}^x + r_{jk}^x \]
We rebase the return to currency i, \( c_{ij} \), from j to currency k as
\[ c_{ik} = c_{ij} - c_{kj} \]
Thus everything can be written in terms of already-known quantities (j-numeraire currency returns).
The asset returns in terms of our new base currency, k, are
\[ r^k = \begin{bmatrix} X_q & X_m \end{bmatrix} \begin{bmatrix} f_q \\ f_m^k \end{bmatrix} + u \]
The re-based currency factor returns, \( f_m^k \), are given by
\[ f_m^k = \begin{bmatrix} c_{1j} - c_{kj} \\ \vdots \\ c_{mj} - c_{kj} \end{bmatrix} \]
The returns equation can be rearranged as:
\[ r^k = \begin{bmatrix} X_q & \hat{X}_m \end{bmatrix} \begin{bmatrix} f_q \\ f_m \end{bmatrix} + u \]
where the transformed currency exposure matrix is given by
\[ \hat{X}_m = X_m \left( I_m - T_m \right), \qquad T_m = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix} \]
Here \( T_m \) is simply an m-by-m matrix of zeros, save for a column of ones in the position corresponding to currency k. Our transform may then be written as
\[ r^k = \begin{bmatrix} X_q & X_m \end{bmatrix} \begin{bmatrix} I_q & 0 \\ 0 & I_m - T_m \end{bmatrix} \begin{bmatrix} f_q \\ f_m \end{bmatrix} + u \]
Then our transform may be written as
Iq
0
fq
rk = Xq Xm
+u
0 Im − Tm fm
And so, from the above equation for returns, it is simple to see that we may construct our covariance matrix from the existing factor covariance matrix. We could apply the above transformation to the asset exposure matrix, considering the factor covariance matrix as fixed, or we could view the exposures as fixed and compute a new factor covariance matrix as:
\[ \hat{F} = \begin{bmatrix} I_q & 0 \\ 0 & I_m - T_m \end{bmatrix} F \begin{bmatrix} I_q & 0 \\ 0 & I_m - T_m \end{bmatrix}^T \]
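This numeraire change can be sketched as a congruence transform of the factor covariance matrix:

```python
import numpy as np

def rebase_cov(F, q, k):
    """Re-express a factor covariance matrix in a new numeraire (sketch).

    F : (q+m, q+m) covariance with q non-currency and m currency factors.
    k : index (0-based, within the currency block) of the new numeraire.
    Computes F_hat = A F A^T with A = blockdiag(I_q, I_m - T_m), where
    T_m carries a column of ones in position k.
    """
    m = F.shape[0] - q
    Tm = np.zeros((m, m))
    Tm[:, k] = 1.0
    A = np.block([[np.eye(q), np.zeros((q, m))],
                  [np.zeros((m, q)), np.eye(m) - Tm]])
    return A @ F @ A.T
```

As expected, the new numeraire currency carries zero risk after the transform, and every other currency's variance becomes the variance of its return spread against currency k.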
This concludes our overview of the treatment of currencies. For more detail, refer to Axioma (2017a).
10 Specific Risk Calculation
The final piece of the risk model is the asset specific risk: the volatility of each asset's residual returns from the factor regression. Given the assumption of zero correlation between specific returns, we assume a diagonal specific-returns covariance matrix; thus only n specific variances require calculation. In their simplest form, these are estimated directly from the specific returns, applying exponential weighting (with a half-life typically equal to that used for the factor variances, for consistency) over a rolling history of one to two years depending on the model:
\[ \sigma_i^2 = \frac{1}{T-1} \sum_{t=1}^T w_t \left( u_{i,t} - \bar{u}_i \right)^2 \]
The story doesn’t quite end there. There are two problems to be overcome:
• Outliers: whereas factor returns tend to be fairly well-behaved, specific returns are by definition noisy, with many outliers, so a modicum of care must be taken when estimating historical variances. To prevent sporadic extreme returns from skewing the specific risk, Axioma
models employ our MAD-based outlier detection to identify extrema. Rather than using a
multi-dimensional or cross-sectional approach, we examine each asset individually, computing
the MAD along the time-dimension, and winsorising all returns within a certain number of
MADs.
• Deficient histories: IPOs and illiquid assets will not have a fully populated returns history, rendering their historical specific variances noisy and unreliable (and also outlier-prone). To remedy this, we first compute each asset’s specific risk in the usual way (as above). We then take the subset of “good” assets (call these \( \sigma_0^2 \)), i.e. those with sufficiently complete returns histories, and use these to fit a cross-sectional model of specific risk, viz.
\[ \sigma_0^2 = Z_0 v \]
where \( Z_0 \) is the subset of exposures corresponding to these assets. These exposures are to quantities that are always known for every asset, such as size, sector and region. Then, given the exposure matrix for all assets, Z, we compute an imputed specific risk \( s^2 \) for each asset via
\[ s^2 = Z v. \]
Then, if T is the maximum number of observations required for specific risk, and t is the asset’s actual number of non-missing observations, the asset’s final specific risk is computed as the weighted sum
\[ \hat{\sigma}^2 = \frac{1}{T} \left( t \sigma^2 + (T - t) s^2 \right). \]
Thus, the greater the number of observations, the greater the influence of the “true” historical specific risk.
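The two treatments above can be sketched as follows; the 8-MAD clip bound and the least-squares fit of v are illustrative assumptions:

```python
import numpy as np

def mad_winsorise(u, n_mads=8.0):
    """Clip one asset's specific-return history along time using the median
    absolute deviation (the 8-MAD bound is an illustrative choice)."""
    med = np.median(u)
    scale = 1.4826 * np.median(np.abs(u - med))  # MAD scaled to be sigma-consistent
    return np.clip(u, med - n_mads * scale, med + n_mads * scale)

def specific_risk_with_imputation(sigma2, Z, good, t_obs, T):
    """Deficient-history treatment: fit sigma0^2 = Z0 v on complete-history
    assets, impute s^2 = Z v for every asset, and blend
    sigma_hat^2 = (t sigma^2 + (T - t) s^2) / T.

    sigma2 : (n,) historical specific variances.
    Z      : (n, k) exposures (size, sector, region, ...).
    good   : (n,) boolean, True for assets with complete histories.
    t_obs  : (n,) number of non-missing observations per asset.
    """
    v, *_ = np.linalg.lstsq(Z[good], sigma2[good], rcond=None)
    s2 = Z @ v
    t = np.minimum(np.asarray(t_obs, dtype=float), T)
    return (t * sigma2 + (T - t) * s2) / T
```

An asset with a full history keeps its own historical variance; an asset with no history falls back entirely on the cross-sectional imputation.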
11 Predicted versus Realised Betas
We define a predicted beta thus:
\[ \beta_P = \frac{h^T V h_B}{h_B^T V h_B}, \]
where h is any portfolio, from a single asset to the entire universe, \( h_B \) is the benchmark against which we are computing the beta, and V is the asset covariance matrix given by a risk model.
We define a historical beta as the solution to a time-series regression:
\[ r_h(t) = \beta_H r_B(t) + \epsilon(t), \]
where \( r_h(t) \) is the time series of returns to portfolio h and \( r_B(t) \) the time series of returns to benchmark B. And so the historical beta is given by
\[ \beta_H = \frac{\operatorname{Cov}(r_h, r_B)}{\operatorname{Var}(r_B)}. \]
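Both flavours of beta can be sketched directly from their definitions. When V happens to be the sample covariance of the same return history, the two coincide exactly, which is the sense in which they are equivalent "on paper":

```python
import numpy as np

def predicted_beta(h, hB, V):
    """Model beta: beta_P = h^T V hB / (hB^T V hB)."""
    return (h @ V @ hB) / (hB @ V @ hB)

def historical_beta(rh, rB):
    """Time-series beta: Cov(rh, rB) / Var(rB)."""
    return np.cov(rh, rB)[0, 1] / np.var(rB, ddof=1)
```

In practice V comes from a structured factor model rather than the raw sample covariance, which is precisely why the two betas can diverge at the asset level.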
While, on paper, the two forms of beta are equivalent, the use of a structured risk model versus a time-series regression can yield very different results at the asset level. We discussed some of the issues involved previously, in section 6.1, so we will not repeat them here. At a portfolio level, however, the differences tend to average out, and the various flavours of beta generally agree over time.

Figure 43: Predicted (model) and realised betas for Secondary Emerging Markets Benchmark

In Figure 43 we see the realised beta computed using two years of backward-looking weekly returns for the FTSE Secondary Emerging portfolio benchmarked against the FTSE Emerging. In addition to the realised beta, we show the predicted (model) beta computed via the four flavours of the Global WW4 model. The numbers are, of course, not identical, but they track roughly the same level over time, with some betas more responsive than others.
12 Modeling of DRs and Cross-Listings
Here we consider all cases of ADRs, GDRs, foreign and cross-listed assets, assets whose domicile can be said to be outside the country of quotation (e.g. Israeli assets quoted in the US, South African stocks quoted in London), and any similar cases. Henceforth we refer to these collectively as DRs. Typically, alongside the market on which a DR is quoted, there is an underlying market, and in many, though not all, cases, a corresponding underlying asset.
12.1 Modeling DR Returns
We define the numeraire return of any asset as
\[ r^N = r^X + c_X^N, \tag{14} \]
where
• N is the model numeraire currency,
• X is the asset’s local currency, i.e. that in which it is quoted,
• \( r^X \) is the asset’s local return,
• \( c_X^N \) is the return in N for an investor trading in currency X.
This relationship is exact for log-returns, but is still a very close approximation for high-frequency
(such as daily) returns. We make use of log-return relationships extensively hereon.
If we define X to be the currency of an underlying asset, and Y to be the currency of quotation of its DR, then we model the DR’s return in terms of the underlying currency, viz.
\[ r^Y = r^X + c_X^Y. \tag{15} \]
Thus, we consider the DR return to be composed of a return expressed in the underlying asset’s
currency, plus the return in currency Y of purchasing currency X. If we ignore effects due to
differing market-hours and liquidity, this return, rX , should be equal to that of the underlying
asset. If it were not, there would be arbitrage opportunities available to the purchaser of either the
DR or the underlying.
12.2 Currency Exposure
From the above we infer that a DR has exposure to two currencies: its local currency Y , and that
of the underlying asset, X. If we write the DR return in terms of numeraire N , combining (14) and
(15) we get
r_N = r_Y + c_Y^N
    = r_X + c_X^Y + c_Y^N.    (16)
The DR would appear, therefore, to have exposure to two currency components. However, in order
to fit this into the framework of a regional model, all currency returns must be expressed in terms
of the numeraire. Using properties of logarithmic returns, we know
c_X^Y = c_X^N + c_N^Y
      = c_X^N − c_Y^N.    (17)
This says that for a holder of Y to buy currency X is equivalent to a holder of numeraire N doing
the following:
1. Buying currency X,
2. Going short on Y .
Therefore, combining (16) and (17) the DR numeraire-return becomes
r_N = r_X + c_X^N − c_Y^N + c_Y^N
    = r_X + c_X^N.    (18)
We see that from a numeraire-perspective, the DR return is the same as the local return of the
underlying asset. We therefore give DRs exposure to the underlying currency rather than the
currency in which the DR is quoted.
12.3 Market Exposure
Since we have translated our DR return’s currency to that of the underlying asset, it would seem
logical that we give the DR exposure to the underlying asset’s market rather than the market on
which the DR is quoted. Thus, an ADR on a British stock will now have exposure to GBP and the
British market, rather than USD and the US market.
To pick an example at random, Figure 44 shows the betas of the East Japan Railway ADR relative to the Japanese (home) and US (trading) markets.

Figure 44: Market Betas of East Japan Railway ADR

While the beta to the US market is distinctly non-zero, it is much lower in magnitude than that to the Japanese market. Thus, were we to
assign the USA as this asset’s market (i.e. its exposure in the risk model), we would likely introduce
significant specification error.
We can observe the effect of introducing such specification error and the harm it causes by
studying the behaviour of Royal Dutch Shell. This company has listings (A and B shares) on the
UK and Dutch markets, leading to two possible ways of treating them:
1. Assign the UK listings to the UK market, and the Dutch listings to the Netherlands.
2. Assign all listings to one of the two markets.
When we look at the betas of the various listings, any of them could plausibly be assigned to either the UK or the Netherlands: the betas are significant for both markets. However, if we look at the tracking errors between the Dutch and UK listings using each of these approaches in a risk model, we get Figure 45.
Figure 45: Tracking Errors for UK vs. NL RDS listings

It is clear, when we compare with the realised, model-independent tracking error, that assigning different markets to the various listings is exactly the wrong thing to do. The specification error
induced by their differing factor structure causes severe over-estimation of the tracking error (the
Multiple Markets case). Effectively, the model structure is saying that the listings are more different
than is, in fact, the case. Assigning all listings to just one market (whether the UK or Netherlands)
yields a tracking error of the correct order of magnitude (Single Market).
12.4 Issuer Specific Covariance
It is not hard to see that assets linked by a common issuer have rather more than model factors
in common. In addition to the characteristics modelled by their common factor structure, their
specific returns exhibit higher degrees of correlation than do those of assets that are merely similar.
No model could or should attempt to capture all of this behaviour via the factors alone, and so,
for those assets that we deem to be linked, we break the condition that an asset’s specific returns
should not be correlated with any other, and allow specific correlations between groups of linked
assets.
The specific covariance matrix is no longer diagonal; rather we allow a scattering of non-zero
off-diagonal elements corresponding to pairs of linked assets. We call these linkages Issuer Specific
Covariance, or ISC.
Therefore, in addition to an asset’s specific risk, we need to model the specific correlations between pairs of linked assets. It might be thought that all that is required to compute the
correlation between specific returns of assets A and B is to take each asset’s history of specific
returns and, assuming sufficient length and liquidity, compute the correlation between the two as
correl_AB = cov_AB / (σ_A σ_B),

where cov_AB is the sample covariance between the two. But there is more to it than meets the eye.
We shall explain...
Figure 46: Scatter plot of HSBC vs. Barclays returns
Step back to total asset returns for now. And consider the relationship between the returns
of two merely similar assets. Figure 46 shows the relationship between the returns of HSBC and
Barclays - two large UK banks. We see that there is a great deal of commonality in the returns,
and the overall correlation between them is 69%.
Now we look at a corresponding plot for BP and its ADR (Figure 47). A similarly significant
relationship exists between the two, which is unsurprising, given that they share the same issuer
and are, essentially, the same entity. The correlation between the two is 73%, so very similar to
that between HSBC and Barclays.
Figure 47: Scatter plot of BP vs. ADR returns
So, at the level of raw daily asset returns, both pairs seem to exhibit similar depths of relationships. However, when we look at long-run behaviour, as evinced by their cumulative returns, things
become somewhat different. Figure 48 shows the cumulative returns for HSBC and Barclays over
a period of several years. It is clear that they are quite different, despite the high correlation between the two. Again, this is not surprising: it only takes a small daily difference for the long-term
cumulative returns to diverge. Figure 49, on the other hand, shows the cumulative returns to BP
and its ADR. They are, to all intents and purposes, identical. And yet the correlation between the
two was little different to that between HSBC and Barclays. How can this be?
12.5 Cointegration
The reason for this more profound relationship between the assets linked by issuer is found in the
property of cointegration. While the price differences between the two may be large on a given day,
due to noise, market conditions etc., over the long term, the price of one will remain, on average, a
fixed distance (perhaps zero, perhaps not) from the price of the other. Even when a price premium
is present, there is still a tendency for the return to revert, as premia tend to remain relatively
constant over time, and so their effect on returns appears to be negligible. When two assets behave
in this way, they are said to be cointegrated. Technically, two time-series x_t and y_t are said to be cointegrated if there exists a parameter β such that

u_t = y_t − βx_t

is a stationary process. The time-series u_t is stationary if it has a constant mean, variance, and autocorrelation through time.
In plain terms, it is as if there is a gravitational attraction, or elastic band, between the two
assets so that, despite the small-scale noise affecting their daily returns, this attraction prevents their
Figure 48: Cumulative Returns of HSBC and Barclays
Figure 49: Cumulative Returns of BP and its ADR
prices, and hence, their long-term returns, from drifting apart over time. This is what differentiates
the behaviour of linked assets from merely similar assets: small differences in returns do not blow
up over time into large differences in prices - the differences remain roughly constant.
To test for cointegration, we consider the Engle-Granger (Engle and Granger, 1987) test on the
log of prices. Consider the following time-series regression:
log p_t = β log p^d_t + e_t,    (19)

where p_t and p^d_t are the prices of the underlying and DR, respectively, and e_t is an error term. The
DR and underlying are cointegrated if e_t is stationary. The augmented Dickey-Fuller (ADF) test is frequently used to test the stationarity of e_t. For now, we assume that e_t is stationary and consider the implications for the relative returns of the DR and underlying asset.
Using (19), we get the following:

log p_t − log p_{t−1} = β log p^d_t + e_t − log p_{t−1}    (20)

⇒ log(p_t / p_{t−1}) = β log p^d_t + e_t − β log p^d_{t−1} − e_{t−1}    (21)

                     = β log(p^d_t / p^d_{t−1}) + e_t − e_{t−1}.    (22)
If we apply (22) recursively, we get

log(p_T / p_0) = β log(p^d_T / p^d_0) + e_T − e_0.    (23)
Since e_t is stationary, e_T − e_0 has a mean of zero and a variance that is roughly twice the variance of e_t, independent of T. Equation (23) says that the cumulative return of the underlying over T periods equals β times the cumulative return of the DR, plus the difference between the residuals at time T and time zero.
If we make the additional simplification that β = 1 (in practice this is a reasonable assumption),
then we see that the difference in cumulative return between the underlying and DR may be written
as

c_T − c^d_T = e_T − e_0.    (24)

Defining the standard deviation of e_t as σ and assuming that the autocovariance of e_t is zero, this difference is distributed as N(0, 2σ²). The critical point is that the difference in cumulative returns over a period has a constant variance, regardless of the length of the period.
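The Engle-Granger regression step (19) can be sketched on simulated data as follows. Everything here is invented for illustration: a hypothetical DR/underlying pair is generated with β = 1 and a stationary AR(1) error band by construction, and β is recovered by ordinary least squares on the log prices. In practice the residuals would then be passed to an ADF stationarity test (for instance the `adfuller` routine in statsmodels).

```python
import random

random.seed(0)

# Simulate a hypothetical cointegrated pair: the DR's log price is a random
# walk, and the underlying's log price equals it plus a stationary AR(1)
# error band e_t, so beta = 1 in equation (19) by construction.
T = 2000
log_pd = [0.0]                      # log p^d_t: DR log prices
e = [0.0]                           # stationary error band
for _ in range(T):
    log_pd.append(log_pd[-1] + random.gauss(0.0, 0.01))
    e.append(0.5 * e[-1] + random.gauss(0.0, 0.005))
log_p = [x + u for x, u in zip(log_pd, e)]   # log p_t: underlying log prices

# Estimate beta by OLS of log p_t on log p^d_t (the Engle-Granger regression).
n = len(log_pd)
mx = sum(log_pd) / n
my = sum(log_p) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(log_pd, log_p))
        / sum((x - mx) ** 2 for x in log_pd))
resid = [y - beta * x for x, y in zip(log_pd, log_p)]

# beta should land close to its true value of 1; the residuals inherit the
# stationary error band and would be the input to the ADF test.
assert abs(beta - 1.0) < 0.1
```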
Now let us consider the realised tracking error, σ_a, between the two assets. Let

r_t = log(p_t / p_{t−1})   and   r^d_t = log(p^d_t / p^d_{t−1})

be the logarithmic returns of the underlying asset and DR, respectively. Then (22) says that

r_t = r^d_t + ε_t
where ε_t = e_t − e_{t−1}. The daily active variance is given as

σ_a² = E[ε_t²]    (25)
     = (1/T) E[ Σ_{t=1}^{T} (e_t − e_{t−1})² ]    (26)
     = (1/T) E[ e_0² + e_T² + 2 Σ_{t=1}^{T−1} e_t² ]    (27)
     = 2σ².    (28)
This is a daily variance figure; if we make the standard scaling assumption to convert this figure
to a period-variance, we get the result
σ_a² = 2Tσ².    (29)
According to this expected tracking error, the difference between the cumulative log returns at time T is distributed as N(0, 2Tσ²). However, according to equation (24), it is distributed as N(0, 2σ²). And here is the rub: the assumption by which we scale daily active variances to weekly, annual or any other period T (by multiplying by T) does not hold in the case of cointegrated assets.
In plain terms, this is saying that the overall magnitude of active returns, be they daily, weekly,
monthly or any other frequency, will be the same: one should not scale them or their associated
variances. For an arbitrary pair of assets, the magnitude of the differences between their weekly
returns will indeed be greater than that of the daily returns, and the difference in monthly returns
greater still. For cointegrated assets, (into which category our DRs and underlyings appear to fall)
the theory states that this is not the case.
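This can be seen numerically. In the sketch below, an invented AR(1) process stands in for the error band e_t of a cointegrated pair, and the k-period active return is taken to be e_{t+k} − e_t, as in equation (24); the unscaled standard deviations at daily, weekly and monthly horizons come out of the same order, while the usual √T scaling of the daily figure badly over-states the longer horizons.

```python
import random
import statistics

random.seed(1)

# A stationary AR(1) process stands in for the error band e_t between a
# hypothetical DR and its underlying (parameters invented for illustration).
e = [0.0]
for _ in range(50_000):
    e.append(0.5 * e[-1] + random.gauss(0.0, 0.005))

def active_sd(k):
    """Std. dev. of non-overlapping k-period active returns e_{t+k} - e_t."""
    diffs = [e[t + k] - e[t] for t in range(0, len(e) - k, k)]
    return statistics.stdev(diffs)

daily, weekly, monthly = active_sd(1), active_sd(5), active_sd(21)

# Unscaled, the three magnitudes are of the same order: cointegration pins
# the width of the band regardless of horizon...
assert monthly < 2.0 * daily
# ...whereas sqrt(T) scaling of the daily figure badly over-states the
# monthly figure.
assert (21 ** 0.5) * daily > 2.0 * monthly
```

For a non-cointegrated pair (independent return differences) the monthly standard deviation would instead grow like √21 times the daily one, which is exactly the HSBC/Barclays behaviour described above.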
Figure 50: Tracking Errors for Barclays vs. HSBC
Figure 50 illustrates the case where scaling of variance is appropriate. We compute tracking
errors for HSBC vs. Barclays using daily, weekly and monthly active returns. The raw tracking
errors are shown as dotted lines, and we see that their levels are quite different, obviously owing to
their being computed using data of differing frequencies. We then scale them all to an annual level
(by multiplying the variances by 250, 52 or 12). It may then be seen that the overall level of risk is
roughly the same over time.
Figure 51: Tracking Errors for BP vs. ADR
We then perform the same operation on our cointegrated pair, BP and its ADR. Now we see
that the raw, unscaled tracking errors are all of roughly the same magnitude: cointegration has
ensured their independence of frequency. When we then scale in the usual way, the tracking errors
all diverge. This is because we should not have scaled the numbers at all: the true tracking error is
given by the raw tracking errors at any frequency.
12.6 Some assets are more cointegrated than others
While our ADR versus underlying example exhibited fairly textbook cointegration behaviour, we
cannot assume that every variety of linked asset will do the same. Linked assets fall mostly into
the following categories:
• ADRs, GDRs and similar versus their underlying stocks.
• Varieties of common stock: A, B, C shares, Bearer and Registered etc.
• Common versus Preferred stock.
• Chinese A versus H shares.
There are other varieties, and combinations of the above are also common, but these are the most
numerous. There is more than one test of cointegration, but we use the Augmented Dickey Fuller
test for our purposes. We will skip details of the test, but for a given pair of assets, it returns two
important values: a test statistic (usually negative) and a critical value. When the test statistic falls below the critical value, we reject the null hypothesis (that cointegration does not occur).
At an asset level, returns being what they are, we expect to be cursed with the usual vast
numbers of false positives and negatives. Therefore, we perform our tests over all combinations of
linked assets, and take an average of the significance statistic for each type of combination (e.g.
DRs/Underlyings, Common/Preferred). We then use this to derive some rules of thumb as to when
cointegration does and does not take place in order to guide our choice of treatment for linked assets.
We perform this test over repeated intervals to build up a picture of the evolution of relationships.
We look at our results for some of the largest and most important groupings.
Figure 52: Cointegration between DRs and their underlyings
Figure 52 shows the level of cointegration over time between DRs of various forms and their
underlying stocks. We have divided the underlying assets into common and preferred. Also note
that we have rescaled the significance statistic so that a value of one or greater represents likely
cointegration. We see that both common and preferred stock tend to show significant cointegration
over time with their associated DRs.
Figure 53 shows the results for the relationship between common/common and common/preferred
types. We see that types of common stock exhibit cointegration amongst themselves, but not with
associated preferred stock. This seems intuitively reasonable as a preferred stock is a quite different
beast to a share of common stock.
Figure 54 shows the degree of relationship between Chinese A shares and their related H shares.
While we cannot say cointegration exists here, the relationship does seem to be deepening over
time. This would correspond with the opening up of the Chinese domestic market to international
investors. It may be that, in a few years from now, the situation will have changed, and these stocks
will show clear evidence of cointegration.
Finally, Figure 55 shows the overall results of our studies for all asset types in our equity models.
The solid lines represent evidence of cointegration between asset types, whilst the dotted lines show
borderline cases. We note that some of these are based upon relatively small samples. However, we
feel confident in stating some broad rules of thumb for the most important combinations.
Figure 53: Cointegration between Common and Preferred Stock
Figure 54: Cointegration between Chinese A and H Shares
Figure 55: Summary of Cointegration Relationships across all asset types
Cointegration does seem to be the norm in:
• DRs and their underlying assets, of whatever type.
• Various types of common stock.
It does not seem to be the norm for:
• Common versus preferred stock.
• Chinese A versus H shares (yet).
12.7 Treatment of Style Exposures
We now have most of the pieces we require for our modelling. We have dealt with country and currency exposure, and shown that in many cases the prices, and hence returns, of linked assets are bound by a cointegration relationship. It only remains to detail how we treat the other
important set of factor exposures: the style factors, and then how we apply our rules to the various
families of linked assets.
We may divide style exposures into two kinds: issuer-based and issue-based. Issuer-based styles are those constructed from company characteristics and include size, value, profitability, growth, leverage etc. Issue-based styles use market data, such as returns, and typically include measures
such as momentum, various betas, volatility and liquidity.
Since they represent company-wide characteristics, it makes sense that issuer-based exposures
should be identical across all assets within a common issuer. Ideally, this should arise as a natural
consequence of the factor definitions and the data. Where it does not, we can replace any outliers
with a common value.
Treatment of issue-based exposures is a little more complex:
• If two assets’ returns series are not cointegrated, then letting the data decide would seem the
most reasonable option - i.e. we compute exposures individually for each asset using its own
reported data.
• If two assets are cointegrated, then we want to make sure their factor exposures match exactly. Since, as we shall see, we attempt to model the cointegration behaviour in the specific return correlations, any differences in the common factor contribution are likely to destroy the cointegration in the specific returns.
Thus, when cointegration takes place amongst a set of assets, we match their style exposures by
taking a weighted mean of the available exposures for each asset and assigning this mean to all. The
weights we use are based upon size, liquidity, age of asset and quality of returns history. Such an
approach is preferable to assigning a “master” asset from which other assets take their exposures,
as:
1. It is by no means clear how this master asset would be chosen. It is not always one particular
asset type that is the largest and most liquid.
2. We introduce a discontinuity: were the master asset to change (as will inevitably happen),
we could see sudden jumps in exposures, owing to the new master having markedly different
values.
For these reasons, a weighted average is likely to be smoother and less arbitrary.
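As a toy illustration of the weighted-mean matching (every exposure and weight below is invented), the style exposures of a cointegrated group might be aligned as follows:

```python
# Hypothetical momentum exposures for four cointegrated listings of one
# issuer, with illustrative weights standing in for size, liquidity, age
# and returns-history quality.
momentum = {"A-share 1": 0.42, "A-share 2": 0.45, "H-share": 0.31, "ADR": 0.30}
weights  = {"A-share 1": 0.40, "A-share 2": 0.25, "H-share": 0.20, "ADR": 0.15}

# The weighted mean is assigned to every listing in the cointegrated group,
# so all four end up with identical exposures.
mean_exp = (sum(momentum[k] * weights[k] for k in momentum)
            / sum(weights.values()))
matched = {k: mean_exp for k in momentum}

assert abs(mean_exp - 0.3875) < 1e-9
```

Note that if the most liquid listing ("A-share 1") were instead used as a master asset, all four would jump to 0.42, and would jump again whenever the master changed; the weighted mean moves smoothly as the weights evolve.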
Figure 56: Style Exposures for a Set of Linked Assets
Figure 56 shows how the exposures look for a randomly-selected company with four listings.
These consist of two Chinese A-shares, an H-share and associated ADR. This group is split further
into two subgroups - the two A-shares form one group, and the H-share and its ADR form the
other. We see that all of the issuer-based exposures (Value, Profitability, Growth etc.) match for
all listings. However, the issue-level exposures such as Momentum, Volatility etc. match within,
but not between, the subgroups.
And thus, we are almost at the finish. We have determined our factor structure depending on whether or not cointegration is taking place between asset types. Where it does, we match the factor contribution exactly, so that the cointegration is passed through to the specific returns. We then use this relationship to derive our Issuer Specific Covariance (ISC).
12.8 The Cointegration Algorithm for Specific Returns
We now show how we use the theory of cointegration to model the relationships between linked
assets. Because we have matched all common factor components, we guarantee, so long as two
assets’ total returns are cointegrated, that their specific returns will also be so. Differencing equation (19), as in (20)-(22), leads us to

r_t = βr^d_t + ε_t,    (30)
where

r_t = log(p_t / p_{t−1}),   r^d_t = log(p^d_t / p^d_{t−1}),   ε_t = e_t − e_{t−1},

and p_t and p^d_t are the prices of the underlying and DR (for example), respectively, and e_t is stationary.
If we assume that ε_t is uncorrelated with r^d_t, then we may write

var(r) = β² · var(r^d) + var(ε),    (31)

and

cov(r, r^d) = β · var(r^d),    (32)
and this leads us to our algorithm, viz. given two sets of specific returns, r_t and r^d_t, where we assume for the sake of argument that r^d_t belongs to the larger/more liquid of the two assets and var(r^d) has been computed directly:

1. Using the daily specific returns, compute corresponding series of pseudo log-prices, log p_t and log p^d_t,

2. Compute β and e_t from the time-series regression (19),

3. Compute var(r) from (31),

4. Compute the specific covariance between the two assets from (32).
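The four steps can be sketched on simulated specific returns (all parameters invented, with β = 1 and a stationary error band by construction, so the pair is cointegrated by design):

```python
import random

random.seed(2)

# Simulation: daily specific returns for a liquid asset (r^d) and a linked
# asset (r) whose returns differ by the increment of a stationary AR(1)
# error band, making the pair cointegrated with beta = 1.
T = 2000
rd = [random.gauss(0.0, 0.012) for _ in range(T)]
e = [0.0]
for _ in range(T):
    e.append(0.6 * e[-1] + random.gauss(0.0, 0.004))
r = [rd[t] + (e[t + 1] - e[t]) for t in range(T)]

# Step 1: pseudo log-prices from cumulated specific returns.
def cumulate(returns):
    prices, total = [0.0], 0.0
    for x in returns:
        total += x
        prices.append(total)
    return prices

log_p, log_pd = cumulate(r), cumulate(rd)

# Step 2: beta and the residuals e_t from the regression (19).
n = len(log_pd)
mx, my = sum(log_pd) / n, sum(log_p) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(log_pd, log_p))
        / sum((x - mx) ** 2 for x in log_pd))
e_hat = [y - beta * x for x, y in zip(log_pd, log_p)]
eps = [e_hat[t + 1] - e_hat[t] for t in range(len(e_hat) - 1)]

# Steps 3 and 4: var(r) from (31) and the specific covariance from (32).
var_rd = sum(x * x for x in rd) / T
var_eps = sum(x * x for x in eps) / len(eps)
var_r = beta ** 2 * var_rd + var_eps        # equation (31)
cov_r_rd = beta * var_rd                    # equation (32)

# The implied specific correlation is high but strictly below one.
corr = cov_r_rd / (var_r * var_rd) ** 0.5
assert 0.8 < corr < 1.0
```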
12.9 The Final Bit
Finally, we summarise our approach to the various types of linked assets. When cointegration is shown to exist, we:
• Clone factor exposures across all relevant assets.
• Compute their specific covariance using a cointegration approach.
Where it is unproven, we:
• Let the factor exposures fall as they may.
• Compute their specific correlations using simple correlation of specific returns.
The exceptions we make to the above are any instances where linked assets are forced to have different country exposures. If this is so, then there is no purpose in applying the first approach; we apply the second. We also maintain the ability to override either approach for specific sets of assets.
Lastly, we illustrate our approach with a couple of examples. First we look at a case where
cointegration does occur: the tracking error between DRs and their underlying assets.

Figure 57: Unscaled Tracking Error: DRs versus Underlyings

Figure 57 shows the realised tracking error between a portfolio of DRs benchmarked against a portfolio
containing the corresponding underlying assets, one-for-one. All are equally-weighted. Tracking
errors are computed using daily, weekly and monthly returns, but are raw - no scaling (to an
annualised number, for instance) has taken place.
Note that, despite the fact that the numbers are not scaled, the overall magnitude is roughly
the same between the three different varieties of tracking error. When we do scale the numbers,
we get the results shown in Figure 58. It is clear that the scaled numbers no longer match by any
stretch of the imagination. Thus, scaling, as we noted before, is inappropriate when cointegration
occurs. The correct tracking error, therefore, is any one of the unscaled numbers from Figure 57.
Comparing the unscaled active specific risk with the active specific risk computed by the model,
we get Figure 59. We see that our cointegration-based specific covariance method gives an accurate
measure of the risk. Were we to use mere correlation to compute the Issuer Specific Covariance we
would see much higher numbers.
For our second example we compare a portfolio of common stocks benchmarked against a portfolio containing the corresponding preferred stock, again one-for-one. This time, when we look at the unscaled realised tracking errors (Figure 60), we see that each is of a completely different magnitude: an indication that these assets are not cointegrated, however highly correlated they may be.
When we scale them (to annualised tracking errors), as shown in Figure 61, the numbers do match. Thus we conclude that cointegration does not seem to occur in this case, and the scaled
numbers from Figure 61 are a more correct rendition of the truth.

Figure 58: Scaled Tracking Error: DRs versus Underlyings

Figure 59: Realised versus model active specific risk: DRs versus Underlyings

Figure 60: Unscaled Tracking Error: Common versus Preferred

Figure 61: Scaled Tracking Error: Common versus Preferred
Figure 62: Realised versus model active specific risk: Common versus Preferred
Finally, in Figure 62 we compare the realised active specific risk with that given by the model.
We see that in this case, where we did not apply the cointegration method to compute the Issuer Specific Covariance, the model gives an accurate picture. Had we applied cointegration in this case, our model numbers would have looked more like those of Figure 59, i.e. typically less than one percent.
A Appendix: Constrained Regression
Our problem is as follows: calculate

min ‖Ax − b‖_2

subject to the linear constraint

Bx = d,

where A ∈ R^{n×k} and B ∈ R^{p×k}. We assume the following:

• n ≥ k ≥ p,
• B has rank p,
• A has rank q, where p + q ≥ k.
We simultaneously decompose A and B via the generalized singular value decomposition, thus:

A = U Σ X^{−1},    B = V ∆ X^{−1},

and partition the above as follows:

A = [U_p, U_{k−p}, U_{n−k}] [Σ_p, 0; 0, Σ_{k−p}; 0, 0] [X_p^{−1}; X_{k−p}^{−1}],

B = V [∆_p, 0] [X_p^{−1}; X_{k−p}^{−1}],

where commas separate blocks within a row and semicolons separate block rows. We assume without loss of generality that the singular values of A have been sorted so that all zero singular values lie within the block Σ_p. If we make the simple transformation of variables

X^{−1} x = y = (y_p; y_{k−p}),

then we may rephrase the problem as

min ‖U_p Σ_p y_p + U_{k−p} Σ_{k−p} y_{k−p} − b‖_2

such that

V ∆_p y_p = d.
There are two ways in which we may attempt to solve this.
A.1 Exact Solution
Note that the constraint is now uniquely solvable, viz.

y_p = ∆_p^{−1} V^T d.

This may be substituted into the minimization, resulting in the unconstrained problem

min ‖U_{k−p} Σ_{k−p} y_{k−p} − b̂‖_2,

where

b̂ = b − U_p Σ_p ∆_p^{−1} V^T d.

This is easily shown to have the solution

y_{k−p} = Σ_{k−p}^{−1} U_{k−p}^T b̂,

and so we arrive at the (almost) final solution

y = ( ∆_p^{−1} V^T d ;  Σ_{k−p}^{−1} U_{k−p}^T (b − U_p Σ_p ∆_p^{−1} V^T d) ).

We convert back to the original variable x by applying the transformation X on the left, but there is no need to do that right now.

Note that if p + q = k, i.e. we have exactly the number of constraints needed to compensate for A’s rank-deficiency, then Σ_p = 0 and the solution becomes

y = ( ∆_p^{−1} V^T d ;  Σ_{k−p}^{−1} U_{k−p}^T b ).
A.2 Dummy Asset Approach
Adding the constraints as dummy assets is tantamount to solving the following unconstrained problem:

min ‖ [A; λB] x − [b; λd] ‖_2,

where [A; λB] denotes the row-stacked matrix. And the question we ask ourselves is: how closely does the solution to this problem approach the exact solution? Making the same transformation of variables as before, we are effectively solving

min ‖ [U_p Σ_p, U_{k−p} Σ_{k−p}; λV ∆_p, 0] (y_p; y_{k−p}) − (b; λd) ‖_2.

We solve this in the usual manner via the normal equations,

y = (M^T M)^{−1} M^T (b; λd),   where M = [U_p Σ_p, U_{k−p} Σ_{k−p}; λV ∆_p, 0].

After working through the linear algebra, this yields

y = ( (Σ_p² + λ²∆_p²)^{−1} Σ_p U_p^T b + λ² (Σ_p² + λ²∆_p²)^{−1} ∆_p V^T d ;  Σ_{k−p}^{−1} U_{k−p}^T b ).

Note that

λ² (Σ_p² + λ²∆_p²)^{−1} = (Σ_p²/λ² + ∆_p²)^{−1},

and so, asymptotically, we have that

y → ( ∆_p^{−1} V^T d ;  Σ_{k−p}^{−1} U_{k−p}^T b )   as λ → ∞.

This is equal to the exact solution in the event that either p + q = k or d = 0.
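A minimal numerical sketch of the dummy-asset device (pure Python, two unknowns, all data invented): the constraint x₁ + x₂ = 0 is imposed by appending a single heavily weighted dummy observation and solving the normal equations directly.

```python
# Invented 4-observation, 2-variable least-squares problem.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
b = [1.0, 2.0, 0.5, 1.0]

def lstsq2(rows, rhs):
    """Solve a 2-variable least-squares problem via its normal equations."""
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * y for r, y in zip(rows, rhs))
    t2 = sum(r[1] * y for r, y in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Dummy asset: one extra row lam * [1, 1] with target lam * 0 enforces
# B x = d with B = [1, 1], d = 0, ever more tightly as lam grows.
lam = 1e4
x = lstsq2(A + [[lam, lam]], b + [0.0])

# For large lam the constraint holds to high accuracy.
assert abs(x[0] + x[1]) < 1e-6
```

Since d = 0 here, the asymptotic argument above says the dummy-asset solution coincides with the exact constrained solution; in finite precision one simply picks λ large enough that the residual constraint violation (which decays like 1/λ²) is negligible, without making the normal equations too ill-conditioned.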
A.3 Conclusion
If the number of constraints exactly matches the rank-deficiency of our exposure matrix, or all constraints are equal to zero, then for a suitably large dummy-asset weight (the Lagrange multiplier), our solution will be consistent with the exact constrained solution (in the case of ordinary least squares).
B DRs and Returns-Timing
One unwanted side effect of the fact that we give DRs exposure to the underlying market is to
induce structure in the DR’s specific return. To take an example: BP’s ADR trades on the US
market, which closes hours after London has closed. Therefore, the ADR contains a component of
return due entirely to the US market. But, since we give the ADR exposure to the British market,
this timing-difference component will tend not to be picked up anywhere in the common factor
returns.
To see this, assume that we have converted the DR’s return to the underlying currency. We
define the underlying (i.e. home) market of DR i as im , and its trading market as iq . We model the
return of the DR as
r_i = g + f_{i_m} + Ψ_i + ∆_{i_m,i_q} + u_i,    (33)

where

• g is the global market return,
• f_{i_m} is the factor return to market m,
• Ψ_i is the remaining factor structure (styles, industries), of no relevance here,
• u_i is asset i’s specific return,
• ∆_{i_m,i_q} is the difference in market return between markets m and q.
Unfortunately, this DR returns-timing component ∆_{i_m,i_q} is not taken account of in the model factor regression, since the DR has exposure only to the underlying home market. It pollutes the residual return, rendering it not truly asset-specific:

û_i = u_i + ∆_{i_m,i_q}.    (34)

Here u_i is the true specific return, but û_i is what we are actually left with from the regression.
To deal with this structured residual component, we write the returns-timing adjustment factor for market m relative to a reference market R (such as the USA) as ∆_{i_m,R}. We then note that, in the case of log returns, the following holds:

∆_{i_m,i_q} = ∆_{i_m,R} − ∆_{i_q,R}.    (35)

We recompute the DR’s return as

r̂_i = r_i − ∆_{i_m,R} + ∆_{i_q,R},

and we have, in effect, made the DR return truly local: the timing component is removed from the total return, the factor return is genuinely that pertaining to the home market m, and the specific return should, by construction, contain no structure due to timing differences.
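The adjustment is simple arithmetic; a one-line numerical illustration (all return and adjustment values invented) follows.

```python
# Hypothetical daily figures for the ADR of a British stock, with the USA
# as the reference market R (all numbers invented for illustration).
r_adr = 0.0120            # ADR return, already expressed in the underlying currency
delta_home_ref = -0.0030  # Delta_{i_m,R}: home (UK) market vs the reference market
delta_quote_ref = 0.0000  # Delta_{i_q,R}: quotation (US) market vs the reference market

# Remove the market-asynchronicity component, leaving a truly "local" return.
r_hat = r_adr - delta_home_ref + delta_quote_ref

assert abs(r_hat - 0.0150) < 1e-9
```

With the US itself as the reference market, ∆_{i_q,R} is zero for a US-quoted ADR, so the adjustment reduces to subtracting the home market's timing factor.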
References
Axioma. Axioma macroeconomic risk model handbook. Technical report, 2013.
Axioma. Currency modeling in axioma robust risk models. Technical report, 2017a.
Axioma. Missing data, extreme data. Technical report, 2017b.
Axioma. Testing risk models. Technical report, 2017c.
Axioma. Return-timing: A solution to market asynchronicity. Technical report, 2017d.
John Y. Campbell, Andrew W. Lo, and A. Craig MacKinlay. The Econometrics of Financial
Markets. Princeton University Press, 1997.
Gregory Connor. The three types of factor models: A comparison of their explanatory power.
Financial Analysts Journal, 51:42–46, 1995.
Gregory Connor and Robert A. Korajczyk. Risk and return and an equilibrium APT. Journal of
Financial Economics, 21:255–289, 1988.
Giorgio De Santis, Bob Litterman, Adrien Vesval, and Kurt Winkelmann. Covariance matrix
estimation. In Modern Investment Management: An Equilibrium Approach, chapter 16, pages
224–248. John Wiley & Sons, Inc., 2003.
Robert F. Engle and Clive W.J. Granger. Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2):251–276, 1987.
John Fox. Web Appendix, An R and S-PLUS Companion to Applied Regression. Sage Publications,
2002. URL http://socserv.mcmaster.ca/jfox/Books/Companion/appendix.html.
William H. Greene. Econometric Analysis. Prentice Hall, fifth edition, 2003.
Richard C. Grinold and Ronald N. Kahn. Active Portfolio Management, pages 37–66. McGraw-Hill,
1995.
Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall,
fourth edition, 1998.
Whitney K. Newey and Kenneth D. West. A simple, positive semi-definite, heteroskedasticity and
autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708, 1987.
Stephen A. Ross. The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13:
341–360, 1976.
Peter Zangari. Equity factor risk models. In Modern Investment Management: An Equilibrium
Approach, chapter 20, pages 334–395. John Wiley & Sons, Inc., 2003.
Axioma Robust is a trademark of Axioma, Inc.
Americas: 212-991-4500
EMEA: +44 (0)20 3621 8241
Asia Pacific: +852-8203-2790
Atlanta
Axioma, Inc.
400 Northridge Road
Suite 550
Atlanta, GA 30350
United States
Phone: 678-672-5400
Chicago
Axioma, Inc.
2 N LaSalle Street
Suite 1400
Chicago, IL 60602
United States
Phone: 312-448-3219
San Francisco
Axioma, Inc.
201 Mission Street
Suite #2150
San Francisco, CA 94105
United States
Phone: 415-614-4170
London
Axioma, (UK) Ltd.
St. Clements House
27-28 Clements Lane
London, EC4N 7AE
United Kingdom
Phone: +44 (0)20 3621 8241
Frankfurt
Axioma Deutschland GmbH
Mainzer Landstrasse 41
D-60329 Frankfurt am Main
Germany
Phone: +49-(0)69-5660-8997
Paris
Axioma (FR)
19 Boulevard Malesherbes
75008, Paris
France
Phone: +33 (0) 1-55-27-38-38
Geneva
Axioma (CH)
Rue du Rhone 69 2nd Floor
1207 Geneva, Switzerland
Phone: +41 22 700 83 00
Hong Kong
Axioma, (HK) Ltd.
Unit B, 17/F 30 Queen’s Road
Central Hong Kong
China
Phone: +852-8203-2790
Tokyo
Axioma, Inc.
Japan Tekko Building 4F
1-8-2 Marunouchi, Chiyoda-ku
Tokyo 100-0005
Japan
Phone: +81-3-6870-7766
Singapore
Axioma, (Asia) Pte Ltd.
30 Raffles Place
#23-00 Chevron House
Singapore 048622
Phone: +65 6233 6835
Melbourne
Axioma (AU) Ltd.
31st Floor
120 Collins Street
Melbourne, VIC 3000
Australia
Phone: +61 (0) 3-9225-5296
New York
Axioma, Inc.
17 State Street
Suite 2700
New York, NY 10004
United States
Phone: 212-991-4500
Sales: sales@axioma.com
Client Support: support@axioma.com
Careers: careers@axioma.com