Axioma Robust™ Risk Model Version 4 Handbook

DRAFT October 2017

Contents

1  What is Risk?
2  Why a Multi-Factor Model?
   2.1  The Three Types of Factor Model
3  Model Factors
   3.1  Market Factor or Intercept
   3.2  Industry Factors
   3.3  Country Factors
   3.4  Style Factors
        3.4.1  Standardisation of Style Factors
        3.4.2  Stability of Style Factor Exposures
        3.4.3  Exposures for Assets with Insufficient Data (IPOs)
        3.4.4  Exposures for Assets with Missing Fundamental Data
4  Estimation Universe
5  The Cross-Sectional Returns Model
   5.1  The Least-Squares Regression Solution
        5.1.1  Assumptions of the Least-Squares Solution
   5.2  Solving the Problem of Heteroskedasticity
   5.3  Outliers in the Regression
   5.4  Robust Regression
   5.5  Regression Statistics
   5.6  Thin Factors
        5.6.1  Treatment of Thin Factors
   5.7  The Problem of Multicollinearity: Binary Factors
   5.8  The Problem of Multicollinearity: Style Factors
   5.9  Currency Factors
   5.10 Factor Mimicking Portfolios
        5.10.1 Constraints in the Regression
6  Common Model Issues
   6.1  Model Error: Specification versus Noise
   6.2  Extreme Data
   6.3  Missing Data
        6.3.1  A Non-trading Day Problem
        6.3.2  An Illiquid Asset Problem
   6.4  Solutions
7  Statistical Approaches
   7.1  Principal Components
   7.2  Asymptotic Principal Components
   7.3  Expanded History
   7.4  Model Statistics
8  The Risk Model
   8.1  Factor Covariance Matrix
   8.2  Exponential Weighting
   8.3  Autocorrelation in the Factor Returns
   8.4  Combining Covariance Matrices
        8.4.1  The Procrustes Transform
   8.5  Market Asynchronicity
        8.5.1  Synchronizing the Markets
        8.5.2  Non-trading Days and Returns Timing
9  Modelling Currency Risk
   9.1  Currency Covariances
   9.2  Numeraire Transformation
10 Specific Risk Calculation
11 Predicted versus Realised Betas
12 Modeling of DRs and Cross-Listings
   12.1 Modeling DR Returns
   12.2 Currency Exposure
   12.3 Market Exposure
   12.4 Issuer Specific Covariance
   12.5 Cointegration
   12.6 Some Assets Are More Cointegrated than Others
   12.7 Treatment of Style Exposures
   12.8 The Cointegration Algorithm for Specific Returns
   12.9 The Final Bit
A  Appendix: Constrained Regression
   A.1  Exact Solution
   A.2  Dummy Asset Approach
   A.3  Conclusion
B  DRs and Returns-Timing

1 What is Risk?
There is no single answer to the question, “what is the risk of a given asset or portfolio?”. Economists, for example, typically associate risk with abstract notions of individual preference, whereas financial regulators may prefer a measure such as Value-at-Risk (VaR). And then there is the whole vexed debate of risk versus uncertainty. We won’t wade into those waters here.

Axioma defines risk as the standard deviation of an asset’s return over time. This statistical definition is straightforward, broadly applicable, and intuitive. An asset whose return varies wildly over time is volatile, and therefore risky; another whose return remains fairly low in magnitude is relatively predictable, and thus less risky. Throughout the following discussion, r_{i,t} will represent the return to an asset i at time t:

    r_{i,t} = (p_{i,t} + d_{i,t} − p_{i,t−1}) / p_{i,t−1} − r_{rf,t}

where p_{i,t} is the asset’s price at time t, d_{i,t} is any dividend payout at time t, and p_{i,t−1} is the price at the previous time period, adjusted for any corporate actions (e.g. stock splits). The time increment t could be days, weeks, months, or any other period. The quantity r_{rf,t} is the risk-free rate — the return to some minimal-risk entity (e.g. a LIBOR rate). Returns net of the risk-free rate are termed excess returns and are usually used for the purposes of risk modeling. Henceforth, unless stated otherwise, the term return will denote excess return.

The risk of an asset over T time periods is thus given by the square root of the sample variance of the asset returns, i.e. the sample standard deviation:

    σ_i = sqrt( (1/T) Σ_{t=1}^{T} (r_{i,t} − r̄_i)² )

where r̄_i is the asset’s mean return over time.

Investors tend to think in terms of portfolios of assets rather than individual assets in isolation. A portfolio allows for diversification and risk reduction, as illustrated by a simple example: consider a portfolio of two risky assets A and B, with weights w_A and w_B.
The risk of this portfolio is

    σ(w_A r_A + w_B r_B) = sqrt( w_A² σ_A² + w_B² σ_B² + 2 ρ_AB w_A w_B σ_A σ_B )

where −1 ≤ ρ_AB ≤ 1 is the correlation coefficient between the two assets. It can be shown that

    σ(w_A r_A + w_B r_B) ≤ w_A σ_A + w_B σ_B

with equality if ρ_AB = 1. In other words, the risk of the portfolio of two assets will tend to be lower than the weighted sum of the risks of the individual assets. This is the key to portfolio diversification.

When analyzing portfolio risk, it is therefore necessary to know not only the risk of each asset involved, but also the interplay or co-movement of each asset with every other. This information is contained in a covariance matrix of asset returns:

    Q = [ σ_1²           ρ_12 σ_1 σ_2   ρ_13 σ_1 σ_3   ···   ρ_1n σ_1 σ_n ]
        [ ρ_21 σ_2 σ_1   σ_2²           ρ_23 σ_2 σ_3   ···   ···          ]
        [ ρ_31 σ_3 σ_1   ρ_32 σ_3 σ_2   σ_3²           ···   ···          ]
        [ ···            ···            ···            ···   ···          ]
        [ ρ_n1 σ_n σ_1   ···            ···            ···   σ_n²         ]

Ultimately, this is all a risk model is: a big covariance matrix (note that the matrix is symmetric and positive-semidefinite), yielding the relationships of the assets to one another. The risk of a portfolio h, where

    h = (w_1, w_2, …, w_n)^T

and w_i, i = 1, …, n are the weights, or holdings, of each asset, is then computed as

    σ_h = sqrt( h^T Q h )

2 Why a Multi-Factor Model?

Possessing an accurate estimate of the asset returns covariance matrix is the sine qua non of portfolio risk management. The obvious question is: how does one calculate such a matrix in practice? The obvious solution would seem to be to build a history of asset returns and then calculate the sample variances and covariances directly from this. Computing sample statistics directly from historical data, however, is fraught with danger. Historical returns are typically noisy; even in the absence of actual data errors, false signals and spurious relationships abound.
Two assets may appear closely related when their seemingly-correlated behavior is in fact an artifact of data-mining. Weak signals and noise aside, when a new asset enters the existing asset universe there is no reliable way of calculating its relationships with the other assets, or even its own volatility, as it does not yet possess a returns history. One could attempt to construct proxies to mimic its assumed behaviour, but such an approach has no guarantee of success.

Finally, and possibly most importantly, at least as many observations as assets are required to accurately estimate all the variances and covariances directly. For any realistic number of assets, it is extremely unlikely that sufficient observations exist. Even with a universe of only 100 assets, ½ · 100 · (100 + 1) = 5,050 distinct variances and covariances need to be estimated. For markets like the U.S. (over 12,000 assets), this becomes completely infeasible.

Any one of the above problems is sufficient reason against constructing an asset returns covariance matrix directly. A better approach is to reduce the amount of information to be estimated. We achieve this by imposing structure on the asset returns: we hypothesise that at least some part of any asset’s return is driven by a relatively small set of common factors. These factors influence the returns of some or all of the assets in the market, and each asset will be exposed to each factor to some extent. Asset returns can then be modeled as a function of a relatively small number of parameters, and the estimation of hundreds of thousands of asset variances and covariances is reduced to calculating, at most, a few thousand.

Factors used in multi-factor models fall into several broad categories:

• Fundamental factors
  – Industry and country factors reflect a company’s line of business and country of domicile.
  – Style factors encapsulate the financial characteristics of an asset — a company’s size, debt levels, liquidity, etc.
    They are usually calculated from a mixture of market and fundamental (i.e. balance sheet) data.
  – Currency factors represent the interplay between the local currencies of the various assets within the model.
  – Macroeconomic factors capture an asset’s sensitivity to variables such as GNP growth, bond yields, inflation, etc.

• Statistical factors are mathematical constructs responsible for the observed correlations in asset returns. They are not directly connected to any observable real-world phenomena, and are not necessarily continuous from one period to the next.

An asset’s return is decomposed into a portion driven by these factors (the common factor return) and a residual component (the specific return), producing the following model at time t:

    [ r_1 ]   [ b_11  ···  b_1m ] [ f_1 ]   [ u_1 ]
    [ r_2 ]   [ b_21  ···  b_2m ] [  ⋮  ]   [ u_2 ]
    [  ⋮  ] = [  ⋮          ⋮   ] [ f_m ] + [  ⋮  ]
    [ r_n ]   [ b_n1  ···  b_nm ]           [ u_n ]

or more succinctly, in matrix form:

    r = Bf + u

where r is the vector of asset returns at time t, f the vector of factor returns, and u the set of asset-specific returns. B is the n × m exposure matrix: its elements denote each asset’s exposure (also termed sensitivity, beta, or loading) to a particular factor. Theoretical foundations for this model specification make up the Arbitrage Pricing Theory (APT), proposed by Ross (1976) as a generalization of the traditional Capital Asset Pricing Model to allow for multiple risk factors. The APT is therefore a logical starting point for building a factor risk model.

As an example, consider a simple model with four assets, a, b, c and d, and three factors — two industry factors, IT and Banking, and one style factor, size, which takes values of {−1, 0, 1} to represent small, mid-cap and large companies respectively [2]. Assets a and b are IT companies, while c and d are banks. a and b are small-cap, c is mid-cap and d is large-cap.
The exposure matrix may look like this:

            IT   Bank   Size
    B = a [  1     0     −1 ]
        b [  1     0     −1 ]
        c [  0     1      0 ]
        d [  0     1      1 ]

The objects r and B are known, so the system of equations can be solved for f and u. This process is repeated over time to build two time series: one of factor returns and another of asset-specific returns. Given the comparatively small number of factors, it is feasible to estimate a factor covariance matrix directly from the factor returns time series. Moreover, assuming that asset specific returns are uncorrelated both amongst themselves and with the factor returns [3], the n × n asset returns covariance matrix may be computed as:

    var(r) = var(Bf + u)
    Q = B V B^T + Δ²

where V is the m × m factor covariance matrix and Δ² is the diagonal matrix of specific variances. In essence, the multi-factor model is a dimension-reduction tool, simplifying the problem of calculating an n × n asset returns covariance matrix into calculating the variances and covariances of a much smaller number of factors, plus n specific variances.

[2] In reality, style exposures usually take a continuum of values rather than simple scores as above.
[3] Note that this is arguably the single most important assumption in the entire theory of risk models.

2.1 The Three Types of Factor Model

We define our general model of returns thus: r = Bf + u. This begs the obvious question: how do we compute the various components of the model? The asset returns r are the only part of the model known in advance with 100% certainty (and even this is questionable when we consider illiquid assets). There are, broadly speaking, three approaches:

1. Fundamental models. Here we pre-compute the exposures to the factors, B, at each point in time, and then estimate the factor returns f cross-sectionally.

2. Statistical models. For these, we simultaneously estimate exposures and a history of factor returns using a corresponding history of asset returns.

3.
Macroeconomic models. Here, we begin with time series of returns derived from macroeconomic indicators such as oil prices, employment data, etc. We then estimate our exposures as sensitivities of asset returns to these time series.

All model approaches have their particular strengths and weaknesses. In this handbook we restrict ourselves to the first two approaches, as the third is dealt with elsewhere, in Axioma (2013). In the following sections we discuss in greater detail the various parts of a risk model and the stages in its construction. We begin by detailing how we compute the exposures for our fundamental models, as these are the foundations upon which the rest of the model stands. Interested readers may wish to consult Grinold and Kahn (1995) or Zangari (2003) for a more complete exposition of factor risk models and their applications.

3 Model Factors

The factors we identify and combine into our model are arguably its most critical part. Every asset must have exposures to the set of model factors, estimated via a number of approaches. Here we detail the types of factors and the processes whereby they are computed.

3.1 Market Factor or Intercept

Many models include a single factor that encompasses the overall market behaviour, and which is typically by far the most statistically significant of all model factors. This could take one of the following forms:

1. Simple intercept: every asset has an exposure of one to the market, and the factor return can be thought of as the regression intercept term.

2. Market beta: the assumption behind the simple intercept is that each asset has a beta of one to the market. This is obviously untrue, so we may instead compute an asset’s market exposure as its historical beta β_i against a suitable market return (from a benchmark, for instance), viz.

       r_{i,t} = β_i r_{M,t} + α_{i,t}

   for each asset i = 1, …
, n, where r_{M,t} is the return of the market benchmark index.

3.2 Industry Factors

Returns of assets belonging to companies with similar lines of business often move together. Industry factors are formed from a set of mappings from each asset to one or more industries within the market. The mapping used can be a proprietary third-party scheme such as GICS or ICB, or a modification thereof, consolidating, splitting, or discarding industries where appropriate. Alternatively, one could create a new scheme altogether by investigating company fundamentals, or by statistical analysis of asset returns. Custom-tailored industry schemes may provide marginally more explanatory power, but require intensive work to design. Wherever possible, Axioma prefers industry factor structures that correspond directly to classifications widely accepted among the investment community, particularly if an industry scheme is already a de facto market standard.

There are two main ways in which one may define industry exposures:

1. As binary variables: an asset has unit exposure to the industry corresponding to its company’s main line of business, and zero to all other industries.

2. As fractional values. This enables one to assign multiple industry exposures per asset. One could attempt to compute the exposures via:
   • A sensitivity or beta, using a time-series regression.
   • Fundamentals: e.g. each exposure being a fraction representing the proportion of the company’s activity within that industry.

The problem with the first of these is the vast number of false positives and negatives that would inevitably arise, together with the issue of insufficient or illiquid asset histories.
The problem with the second is that for many markets this level of detail is not readily available, and researching it, especially in less-developed markets, is a challenging problem, to say the least.

Currently, we follow the binary approach, for its simplicity and robustness. While we do not rule out some form of the second approach in the future, there is little or no hard evidence that more complex industry schemes lead to measurably better models than the binary approach.

An important consideration in designing an industry factor scheme is avoiding thin industries within the model. A thin industry contains few or no assets, or has most of its market capitalisation concentrated in a small handful of assets. We will have more to say on this later.

3.3 Country Factors

A risk model covering more than one market needs to distinguish between assets’ countries of incorporation [4]. Country factors are similar in flavour to industry factors, and the types of country exposure are:

1. As binary variables: an asset has unit exposure to its country of incorporation, and zero otherwise.

2. As fractional values. Again, one could attempt to compute the exposures via:
   • A sensitivity or beta, using a time-series regression versus one or more country market returns, for instance as β_i, where

         r_{i,t} = β_i r_{M,t} + α_{i,t}   for each asset i = 1, …, n

     and r_{M,t} is the return of a local benchmark index.
   • Fundamentals: e.g. each exposure being a fraction representing a company’s proportion of sales within each country, or its assets therein.

For much the same reasons as for industries, our current approach is the first. Binary variables have the advantage of simplicity: a stock is either exposed to the market or it is not. Country betas are more computationally intensive, and in theory may yield greater sensitivity and better fit.
In practice, however, any advantages are often overwhelmed by the extra noise introduced. We will discuss the two major types of model error later. Exactly as with industries, it is perfectly possible for country factors to exhibit thinness. We treat these in exactly the same way as thin industry factors (more later).

3.4 Style Factors

Whereas industry and market factors capture trends at the broad market level, style factors capture behavior at the asset level, net of market trends. Style factor exposures are derived from market data such as returns, trading volume, and capitalization, and/or from balance sheet (fundamental) data. For instance, to construct a style factor encapsulating the size of an asset’s parent company relative to others in the market, one might consider a function of the company’s market capitalization, total assets, etc. Typical varieties of style factor include the following:

[4] For some assets, e.g. ADRs, one may choose to explicitly expose the asset to the market within which it trades rather than to the underlying asset’s domicile.
• Value: cheap or expensively priced stock relative to fundamentals. Typical components: book-to-price, earnings-to-price, sales-to-price.
• Leverage: measure of a company’s indebtedness. Typical components: debt-to-assets, debt-to-equity.
• Growth: earnings expected to grow over time. Typical components: earnings growth, sales growth.
• Momentum: consistently high or low returns over a period. Typical components: one-year cumulative return, one-month cumulative return, stock alpha.
• Size: large or small company. Typical components: log of market capitalisation, total assets.
• Return sensitivity: sensitivity of asset returns to external factors. Typical components: market beta, oil-price sensitivity, exchange-rate sensitivity.
• Liquidity: whether a stock is illiquid or not. Typical components: ADV to market capitalisation, Amihud ratio, liquidity beta.
• Volatility: stock returns large in magnitude relative to the market. Typical components: market beta, residual historical volatility.

The above is a very small sample. There are many hundreds, if not thousands, of candidate measures for use as styles. A great many tend to be derived either from a history of market data such as returns, or from financial ratios such as book-to-price, debt-to-assets, etc. In practice, any measure that can be calculated, and for which sufficient data exists, can be considered a potential style factor. Typically, when a model is first estimated, many candidate factors are created. These are then filtered down to an “optimal” set, based on several considerations:

• Data availability — there must be sufficient depth and breadth of data to be able to calculate a variable reliably.

• Intuitiveness — a style factor must be intuitive to users, describing a sensible characteristic of an asset or company. The average shoe size of a company’s employees may carry explanatory power, but would prove somewhat difficult to explain as part of a strategy.

• Empirical performance — a factor must be statistically significant, capable of explaining a non-negligible portion of returns.
We detail some of the tests and metrics used in Axioma (2017b).

3.4.1 Standardisation of Style Factors

Assuming that we have decided on our set of style factors, at each period we compute their raw exposures using the latest available data. However, because style factor definitions are expressed in a mixture of units (betas, ratios, returns, etc.), we must standardise the raw values, converting them to a common scale, much like z-scores. We do this for two reasons: firstly, a great difference in magnitudes is likely to lead to numerical problems when we compute factor returns; secondly, making them all roughly unitary leads to factor returns that are of similar magnitude to the asset returns. To standardise a set of factor exposures b:

1. Using only the assets in the estimation universe portfolio h_U (see the following section on the estimation universe), calculate the capitalization-weighted mean exposure:

       b̄ = b^T h_U

2. Calculate the equal-weighted standard deviation of the exposure values in h_U about the capitalization-weighted mean:

       σ_b = sqrt( (1/(N − 1)) Σ_{i∈U} (b_i − b̄)² )

3. Subtract the weighted mean exposure from each asset’s raw exposure and divide by the equal-weighted standard deviation:

       b̂_i = (b_i − b̄) / σ_b

This ensures that the estimation universe has a capitalization-weighted mean exposure of zero and a standard deviation of one; in other words, a typical market capitalisation-weighted “broad-market” portfolio will be approximately factor neutral. Note that we make no other assumption about the cross-sectional distribution of the exposures. It is a common fallacy that they should be, for example, normally distributed. This is not the case.

Outliers in the raw exposure data can skew the standardizing statistics, so raw exposures are typically winsorized prior to standardisation.
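The three standardisation steps above can be sketched as follows. The exposures and market capitalisations are invented for illustration; they are not taken from any actual model.

```python
import numpy as np

# Hypothetical raw exposures and market caps for a five-asset
# estimation universe; all numbers are invented for illustration.
b = np.array([0.8, -1.5, 2.2, 0.3, -0.6])      # raw style exposures
cap = np.array([50.0, 10.0, 30.0, 5.0, 5.0])   # market capitalisations
h_U = cap / cap.sum()                          # cap-weighted estimation universe

# Step 1: capitalisation-weighted mean exposure
b_bar = b @ h_U

# Step 2: equal-weighted standard deviation about the weighted mean
sigma_b = np.sqrt(np.sum((b - b_bar) ** 2) / (len(b) - 1))

# Step 3: standardised exposures
b_hat = (b - b_bar) / sigma_b

# The estimation universe now has a cap-weighted mean exposure of zero
print(abs(float(b_hat @ h_U)) < 1e-12)   # True
```

Note that the cap-weighted mean is used for centering while the standard deviation is equal-weighted, exactly as in steps 1 and 2.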
Given an upper bound b_U and a lower bound b_L, for a given raw style factor exposure b_raw, set:

    b̃_i = b_L      if b_raw < b_L
          b_raw    if b_L ≤ b_raw ≤ b_U
          b_U      if b_raw > b_U

There are a number of methods by which one could determine b_U and b_L:

• Absolute values. One may, for instance, have a prior belief that the data should lie within the interval [−5, 5], and set the bounds accordingly.

• Standard deviation. Compute the standard deviation of the entire set of exposures, then pick a multiple k and truncate all values more than k standard deviations above or below the (possibly weighted) mean.

• Robust statistics. Rather than the mean and standard deviation, employ the median and the median absolute deviation (MAD):

      MAD(b) = median( |b − median(b)| )

  Then pick a number of MADs above and below the median, and use the corresponding values to winsorize. We will discuss this further in a later section when we talk about outliers in general.

The first method relies upon having some idea of what “reasonable” values are, and may be suitable for data one is familiar with, such as market betas. For more complicated factor definitions, a dynamic approach may be more appropriate. The Axioma approach is to use the median absolute deviation. This forms the workhorse for outlier detection within our models, and is used for many applications besides exposure standardisation. We shall have more to say on this when we treat of outliers in general.

3.4.2 Stability of Style Factor Exposures

Many of our market-based style factors, such as momentum, market sensitivity and volatility, rely on a rolling history of asset returns for their computation. In the case of medium-term momentum, a 230-day history of asset returns is extracted for each asset, and its cumulative return computed. This gives the raw exposure for the current day.
The next trading day, this history is advanced by one day, the oldest datum is dropped from the series, and the computation repeats.

In earlier models, the returns histories used for these factors were equal-weighted — that is, every return in the series contributed equally to the descriptor value. This, however, introduces potential instability, as exposures may exhibit spurious jumps when a large return enters or exits the series. To counter this, from the fourth generation of models onwards we introduced a new weighting scheme for such descriptors. We downweight returns at the beginning and end of the series, while leaving the central portion with equal weights of one, forming a trapezoid. The downweighting is linear: the first and last returns in the series receive the smallest weight, the second and penultimate returns the next smallest, and so on, until the central portion of the series is reached. Figure 1 demonstrates how this works in practice for a 250-day history of data.

[Figure 1: Trapezoidal Weighting Scheme]

The exact number of returns downweighted in this fashion varies from descriptor to descriptor, depending upon the total length of returns used. We can see how the trapezoid weighting scheme improves the stability of asset-level exposures from Figures 2 and 3, which depict exposures to the Market Sensitivity factor for Facebook and Google using two Axioma models: US3 (without trapezoidal weighting) and US4 (with trapezoidal weighting). While the overall level of exposures over time is unaffected, the weighted exposures of the US4 model are much less noisy.

[Figure 2: Facebook   Figure 3: Google]

At a model level, we measure the stability of exposures by computing the correlation coefficient between exposures on dates t and t − 21. These correlation coefficients are sometimes referred to as
A correlation coefficient that is close to 1 suggests that exposures are very similar on dates t and t − 21 (and churn is, therefore, relatively low), and a correlation coefficient that is close to zero suggests that exposures are dissimilar on dates t and t − 21 (and churn is, therefore, relatively high). A style exposure that has too high a churn will yield unstable exposures over time (obviously), but also unstable risk estimates, and related metrics such as model betas. A high churn is therefore undesirable in most situations. Plots 4 and 5 show how exposure churn is improved with the use of trapezoid weights for the Market Sensitivity and Momentum factors, while not adversely affecting the performance of these factors in the model. 6 Figure 4: Market Sensitivity Churn 3.4.3 Figure 5: Medium-Term Momentum Churn Exposures for Assets with Insufficient Data (IPOs) Many factor exposures, such as momentum, market sensitivity and volatility rely on a fixed history of asset returns for their computation. In the case of an IPO, no previous history will exist. While we can and do generate a proxy history of returns for such assets as a starting point (see later), returns-based exposures for IPOs are still likely to suffer from instability and unintuitive values. We therefore shrink exposures of IPOs toward the cap-weighted average sector exposure. The shrinkage parameter is a linear function of the number of days on which actual returns are available. To illustrate, let X denote the raw style exposure for a recently listed asset, and let M denote the cap-weighted average exposure, computed across the sector to which the asset belongs. Let T be the number of returns required for the style factor’s computation, and let t denote the actual number of genuine returns available. Then the shrunk exposure, X̃, is given by 6 Factor performance is measured by inspecting cumulative factor returns, t-statistics for factor returns, factor volatility, etc. 
The differences in factor performance for models estimated with trapezoid and equal weighting schemes are negligible.

X̃ = ((T − t) · M + t · X) / T

Thus, at t = 0 (the IPO date), the asset's exposure is the cap-weighted average sector exposure, while at t = T, the exposure is determined entirely by the asset's returns. By way of example, Figures 6 and 7 present US3 and US4 market sensitivity and volatility exposures for Google in the 18 months following its IPO in 2004. Note that the US4 model avoids the initial, and presumably erroneous, large market sensitivity and volatility exposures of the predecessor model, instead "playing it safer" by adopting much more neutral initial values.

Figure 6: Google Market Sensitivity    Figure 7: Google Volatility

3.4.4 Exposures for assets with missing Fundamental Data

When a style exposure derived from fundamental data (Value, Growth, Leverage, etc.) has missing values, we employ a simple cross-sectional model to fill the missing data. We construct an "exposure" matrix for each asset consisting of exposures to region, sector and size; call it Z. Then, for each style exposure, we fit a cross-sectional model to the vector of non-missing values, x₀, viz.

x₀ = Z₀ v

where Z₀ is the subset of exposures corresponding to these assets. Then, given the exposure matrix for the assets with missing style exposure, say Z₁, we estimate values for the missing style exposure, x₁, via

x₁ = Z₁ v

These estimates are then used to fill the missing raw exposure values prior to standardisation. That concludes our overview of factors, for now. Two varieties of model factor conspicuous by their absence are currency factors and statistical factors. It is easier to treat these when we come to look at the cross-sectional regression and the statistical models, respectively.
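Before moving on, the two fallback rules above — the IPO shrinkage and the regression-based filling of missing fundamental exposures — can be sketched as follows. This is a minimal illustration under our own naming; the function names and shapes are assumptions, not Axioma's implementation.

```python
import numpy as np

def shrunk_exposure(x_raw: float, m_sector: float, t: int, T: int) -> float:
    """IPO shrinkage: X_tilde = ((T - t) * M + t * X) / T, where M is the
    cap-weighted average sector exposure, T the required history length
    and t the number of genuine returns available."""
    return ((T - t) * m_sector + t * x_raw) / T

def fill_missing_exposures(Z: np.ndarray, x: np.ndarray,
                           missing: np.ndarray) -> np.ndarray:
    """Fill missing raw style exposures via a cross-sectional fit.

    Z holds each asset's region/sector/size exposures; v is fit on the
    non-missing assets (x0 = Z0 v) and used to estimate x1 = Z1 v for
    the assets with missing values."""
    Z0, x0 = Z[~missing], x[~missing]
    v, *_ = np.linalg.lstsq(Z0, x0, rcond=None)
    filled = x.copy()
    filled[missing] = Z[missing] @ v   # estimated values replace the gaps
    return filled
```

At the IPO date the shrunk exposure equals the sector average; with a full history it equals the raw exposure, exactly as the text describes.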
First, we make a slight digression and speak of the model estimation universe, a critical, but often overlooked, part of the risk model.

4 Estimation Universe

There are two important sets of assets:

• The model universe, i.e. all of the stocks contained in a particular model and associated with the market(s) therein, and
• The estimation universe.

When estimating factor returns (or other key model statistics), one seldom uses the entire model universe of asset returns. The overall market may contain many illiquid assets as well as other potentially "problematic" assets such as investment trusts, depository receipts and foreign listings, to name a few. Illiquid assets lack a stable, regular price, making their volatility difficult to estimate reliably. Composite assets such as investment trusts and ETFs are difficult to quantify in terms of their exposure to the market — if such an instrument invests in several different sectors or markets, how should one define its industry/country exposures? Similarly, is an ADR's return primarily driven by the U.S. market factors, or by the behavior of the underlying stock and its associated market? These are difficult questions to answer and, depending on the market and the model, different assets need to be excluded from the estimation process. Essentially, the estimation universe should contain that which is "representative" of the market, and which is sufficiently large and liquid to be modelled well. In this respect, the rules we use to determine the estimation universe are similar to those used by index providers when constructing market benchmarks. Index providers typically employ sophisticated selection criteria involving price activity, liquidity, free-float capitalisation, and other business logic, much of which we also apply.
A critical way in which we differ is that our model estimation universe should include sufficient assets to keep the number of "thin" factors to a minimum. These are factors to which only a small handful of assets have nonzero exposure, and they are among the worst evils one can encounter while modeling returns, for reasons to be explained later. For this reason, even if legal issues were not a significant deterrent (they are), we prefer to create and apply our own selection criteria. While every model differs in exact detail, most follow something akin to this broad set of rules:

• Remove foreign listings, DRs, etc.
• Exclude particular asset classes, e.g. preferred stock, ETFs, investment trusts.
• Exclude assets traded on particular exchanges, whether because of illiquidity or some other undesirable characteristic.
• Exclude illiquid assets or those without sufficient returns history.
• Exclude very small assets (many of which would also be illiquid).
• Attempt to "inflate" any thin factors where possible.
• Grandfathering: add back excluded assets which had previously qualified within a given time frame.

Of course, there are exceptions to these rules, and some models, rather than excluding particular asset types, only include certain asset types (e.g. common stock, REITs). But as a rough guide, the above suffices. Figure 8 shows the effect of our logic on the US market for the Axioma US4 model.

Figure 8: US4 Model estimation universe: Numbers of Assets

Figure 9: US4 Model estimation universe: Proportion of ADV

While there are anything up to 12,000 assets associated with the US market, after screening, the number of assets that find their way into the estimation universe is in the region of 3,000. This may seem a small proportion of the total in terms of asset numbers.
However, when we look at the proportion of estimation universe ADV (average daily volume) relative to the total (Figure 9), the wisdom becomes apparent: our US4 estimation universe accounts for the vast majority of the US market's liquidity. Were we to lower our standards, we would be admitting huge numbers of very illiquid (and therefore problematic) assets.

5 The Cross-Sectional Returns Model

Recall the linear factor model of asset returns:

r = Bf + u

There are many possible solutions to this system of equations. If the factor exposures B are known, f can be estimated using cross-sectional regression analysis. In the case of what are loosely termed macroeconomic models, however, f is observed, and it is B, rather, that needs to be estimated, typically via a time-series regression for each asset. In the case of statistical factors, neither B nor f is specified, so a rotational indeterminacy exists and both parameters are determined simultaneously, albeit only up to a nonsingular transformation. For a more thorough discussion of multi-factor models and the APT, the curious reader is encouraged to consult Campbell et al. (1997). We now fix our attention on the typical approach taken by fundamental models, whereby the exposures B are specified at a point in time, and the factor returns f are estimated via cross-sectional regression.

5.1 The Least-Squares Regression Solution

The ordinary least-squares (OLS) regression solution to the factor model of returns seeks to minimize the sum of squared residuals:

f_ols = argmin_f Σ_{i=1}^{n} u_i²

whose solution is straightforward:

f̂_ols = (B^T B)^{-1} B^T r

5.1.1 Assumptions of the Least-Squares Solution

The field of regression theory is vast, so only the issues most relevant to risk modeling are dealt with here. Further details can be found in any elementary econometrics textbook, such as Greene (2003, chap. 4, 5).

1. B is an n × m matrix with full column rank ρ(B) = m ≤ n.
The OLS solution requires that B^T B be invertible, which is satisfied only if the columns of B are linearly independent. Intuitively, this means the factors should all be distinct from one another.

2. Residuals are zero-mean and independent of the factor exposures. In order for the regression estimates to be unbiased — correct "on average" — E[u] = 0 and E[B^T u] = 0 are required.

3. Residuals are homoskedastic and have no autocorrelation. These constitute the Gauss-Markov conditions: var(u_i) = σ² and cov(u_i, u_j) = 0 for all i, j = 1, ..., n, i ≠ j (or, more compactly, E[uu^T] = σ² I_n), and they establish the superiority of the least-squares solution over all other linear estimators. Unfortunately, large assets tend to exhibit lower volatility than smaller ones, and homoskedastic residual returns are rarely observed. Figure 10 shows the typical relationship between asset size and returns behavior.

4. Residuals are normally distributed: strengthening the previous assumption to u ∼ N(0, Ω), where Ω = σ² I_n, is not strictly required. Nevertheless, it is a convenient assumption for testing the estimators, simplifying the construction of confidence intervals, the evaluation of hypothesis tests, and so forth.

Figure 10: Log of Market Capitalisation versus Residual Volatility of US Assets, Dec 2016

5.2 Solving the Problem of Heteroskedasticity

Traditionally, one corrects for this phenomenon by scaling each asset's residual by the inverse of its residual variance, transforming the above into a weighted least-squares (WLS) problem:

W^{1/2} r = W^{1/2} B f + W^{1/2} u

f_wls = argmin_f Σ_{i=1}^{n} u_i² / σ_i²

where

W^{1/2} = diag(σ_1^{-1}, ..., σ_n^{-1})

The solution is easily shown to be

f̂_wls = (B^T W B)^{-1} B^T W r

The challenge lies in estimating the residual variances, σ_i². One could calculate these directly from historical data, but such estimates are noisy and require sufficient history for each asset.
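The WLS estimator above is a one-liner in practice. The sketch below simulates a cross-section with known factor returns and recovers them; the simulated data and weights are purely illustrative.

```python
import numpy as np

def wls_factor_returns(B: np.ndarray, r: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted least-squares factor returns: f = (B' W B)^{-1} B' W r,
    with W = diag(w).  The weights w proxy the inverse residual
    variances of the assets."""
    W = np.diag(w)
    return np.linalg.solve(B.T @ W @ B, B.T @ W @ r)

# Recover known factor returns from a simulated cross-section.
rng = np.random.default_rng(0)
n, m = 500, 3
B = rng.normal(size=(n, m))                      # exposure matrix
f_true = np.array([0.01, -0.02, 0.005])          # "true" factor returns
r = B @ f_true + 0.001 * rng.normal(size=n)      # asset returns plus noise
f_hat = wls_factor_returns(B, r, rng.uniform(0.5, 2.0, n))
```

With 500 assets and small residual noise, `f_hat` lands within a fraction of a basis point of `f_true`, regardless of the particular (positive) weight vector used.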
As a proxy for the inverse residual variance, Axioma fundamental models use the square root of each asset's market capitalization. This proxy is intuitive (larger stocks tend to exhibit less volatile returns), robust, and simple to compute. Tried and tested over many years, it is accepted by the industry as a standard. Further, extensive testing using the inverse residual variance as a weight has singularly failed to show any significant improvement in the model thus obtained. There is an additional, practical reason for adopting the square root of capitalization as a weighting scheme: it "tunes" the regression estimates in favor of larger assets. Large, liquid assets constitute the bulk of most institutional investors' universes, so there is a compelling case for modeling these assets most accurately, sometimes even at the expense of smaller, less visible assets.

5.3 Outliers in the Regression

Data frequently contain extreme values, or outliers, arising from outright errors in the data collection process, poor or unrepresentative sampling, or genuinely aberrant behavior. In particular, distributions of asset returns are known to exhibit "fat tails" — large numbers of observations in the outer edges of the distribution, most likely attributable to economic shocks, poor liquidity, etc. Because least-squares attempts to minimize the sum of squared residuals, outlier returns produce large residuals that have a disproportionate effect on the solution, pulling it away from the "true" solution. As an example, consider the regression of China market returns against Hong Kong market returns over a period of a year. Figure 11 shows the scatter plot of these two sets of returns over the course of 2014. We see a very significant relationship: a beta of 0.63 and an r-squared of 57%.
Figure 11: China versus Hong Kong Market Returns

Then, we introduce a single, albeit very large, outlier to the same data, keeping all other returns identical (Figure 12). Our beta of 0.63 has become a beta of 8.1, with a negligible r-squared of 1.4%. Clearly, outliers can have a severely adverse effect, both on the solution and on the numerical certainty around it. And financial data are replete with outliers, legitimate and illegitimate, which obscure the genuine signals one is trying to trace. A simplistic solution would be to exclude or truncate all observations outside certain pre-determined bounds. These bounds can either be static values (e.g. exclude returns above 200%) or statistically based (e.g. exclude returns beyond 3 standard deviations from the mean).

Figure 12: China versus Hong Kong Market Returns plus Outlier

Hard-coded figures are inadvisable, as they take no account of changing market conditions over time and the resulting changes in the distribution of the returns: an extreme return during an uneventful period may be unremarkable during a market boom or bust. Use of the standard deviation as a measure is not robust either, as this is itself very sensitive to outliers. Further, it is not the total returns in which we wish to look for outliers, but the residual returns. Imagine that all assets in a particular industry exhibit an average 20% return one day. From the perspective of total returns over the entire market, these returns would all appear to be outliers and would likely be truncated. However, it is obvious that these 20% returns should actually find their way into the relevant industry factor return, and the residual returns for these assets would be much smaller. We therefore need to identify outliers in the returns after they have already been de-meaned by the regression — something of a chicken-and-egg situation.
Axioma models use robust statistical methods, described in the section below, to reduce the effect of outliers. These iterate through a series of weighted regressions, identifying outliers in the residuals at each stage and downweighting those observations until some convergence criterion is reached.

5.4 Robust Regression

A comprehensive treatment of robust regression is far beyond the scope of this discussion⁷; rather, focus will be limited to the techniques Axioma uses in its model construction. The weapon of choice is a Huber M-Estimator (where M stands for Maximum Likelihood). Whereas the weighted least-squares regression seeks to minimize the sum of squared residuals

f_wls = argmin_f Σ_{i=1}^{n} w_i u_i²

a robust regression estimator attempts to minimize the objective function

f_robust = argmin_f Σ_{i=1}^{n} ρ(u_i) = argmin_f Σ_{i=1}^{n} ρ(r_i − b_i^T f)

The function ρ fulfills the following criteria for all a, b:

• ρ(a) ≥ 0
• ρ(0) = 0
• ρ(a) = ρ(−a)
• ρ(a) ≥ ρ(b) for |a| > |b|

Denoting the derivative of ρ by ϕ, i.e. ϕ(a) = ρ′(a), the minimization program has the first-order conditions

Σ_{i=1}^{n} ϕ(r_i − b_i^T f) b_i^T = 0

Defining the weight function w(a) = ϕ(a)/a, and writing w(u_i) = w_i, the above becomes

Σ_{i=1}^{n} w_i (r_i − b_i^T f) b_i^T = 0

which can be solved via iteratively re-weighted least squares:

1. Using ordinary least squares, solve the transformed system r̂ = B̂f + û, where r̂ = W^{1/2} r and B̂ = W^{1/2} B, obtaining an initial estimate f⁰.
2. For each iteration j, calculate residuals u^{j−1} and weights w_i^{j−1} = w(u_i^{j−1}).
3. Update the weighted least-squares estimator f^j = (B^T W^{j−1} B)^{-1} B^T W^{j−1} r.
4. Repeat steps 2 and 3 until successive estimates converge.

There are many different possibilities for the function ρ.

⁷ A thorough introduction to robust regression can be found in Fox (2002).
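The iteration above can be sketched directly, here already using a Huber-type weight function w(u) = min(1, k/|u|). This is a minimal illustration, not Axioma's implementation: in particular, we estimate the residual scale with the MAD (a robust stand-in we assume for the residual standard deviation quoted in the text).

```python
import numpy as np

def huber_weight(u: np.ndarray, k: float) -> np.ndarray:
    """Huber weight function w(u) = phi(u)/u: 1 inside [-k, k], k/|u| outside."""
    au = np.maximum(np.abs(u), 1e-12)
    return np.minimum(1.0, k / au)

def robust_factor_returns(B: np.ndarray, r: np.ndarray,
                          n_iter: int = 50, tol: float = 1e-8) -> np.ndarray:
    """Huber M-estimate of factor returns via iteratively re-weighted
    least squares, following steps 1-4 in the text."""
    f = np.linalg.lstsq(B, r, rcond=None)[0]           # step 1: OLS start
    for _ in range(n_iter):
        u = r - B @ f                                   # step 2: residuals
        sigma = 1.4826 * np.median(np.abs(u - np.median(u)))  # robust scale (MAD)
        w = huber_weight(u, 1.345 * max(sigma, 1e-12))
        W = np.diag(w)
        f_new = np.linalg.solve(B.T @ W @ B, B.T @ W @ r)     # step 3: WLS update
        if np.max(np.abs(f_new - f)) < tol:             # step 4: convergence
            return f_new
        f = f_new
    return f
```

Replaying the spirit of the China/Hong Kong example with a simulated beta of 0.63 and one huge outlier, the robust estimate stays near the true beta where plain OLS is pulled far away.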
As stated earlier, Axioma models use the Huber function:

ρ(u_i) = ½ u_i²          if |u_i| ≤ k
ρ(u_i) = k|u_i| − ½ k²   if |u_i| > k

with k = 1.345σ, where σ is the standard deviation of the residuals. Returning to the example in the previous section, robust regression gives the results shown in Figure 13. Robust regression has salvaged our solution by downweighting our artificial outlier: most would agree that a beta of 0.62 is somewhat closer to the "true" value of 0.63 than is 8.1.

Figure 13: China versus Hong Kong Market Returns plus Outlier with Robust Regression

5.5 Regression statistics

There are many important statistics and metrics associated with the regression and other parts of the factor model, such as r-squared values, VIFs, t-statistics, etc. Rather than vastly inflate the current document by detailing them here, we discuss them in Axioma (2017c). That document details the many ways in which we evaluate not just the factor returns model, but the entire risk model itself.

5.6 Thin Factors

Thinness presents a very real problem for factor returns estimation. To take the example of industry factors, an industry factor return is predominantly a weighted average of asset returns within the industry. With sufficient assets, the specific return is averaged out, leaving something (hopefully) close to the true factor return. If there are very few stocks, however, a large element of specific return will likely remain, over-estimating the industry factor return and inducing correlations in the specific returns. To see why, consider a simple model with two assets in industry k, equally weighted for the sake of clarity:

r₁ = f_k + u₁
r₂ = f_k + u₂

This has solution f_k = ½(r₁ + r₂) and residuals u₁ = −u₂, with perfect negative correlation between the two assets' specific returns.
In general, if there are n assets within an industry, all with similar weights, the average correlation across their specific returns will be of the order of ρ = −1/(n − 1). As stated previously, uncorrelated specific returns constitute a major assumption of the factor risk model. If this condition is violated, portfolio specific risk may be over-estimated, because the specific return correlations are not taken into account. Because high concentration of industry capitalization in a few assets is a common phenomenon in developed markets, the number of stocks in an industry is an inadequate measure of thinness. Axioma models rely on the "Herfindahl Index", an economic measure of the size of firms relative to their industry, frequently invoked in antitrust applications:

HI = Σ_j w_j²

where w_j is stock j's weight within the industry. HI⁻¹ then represents the "effective" number of stocks within the industry. An industry with 100 assets but 99% of its capitalization in one asset has a Herfindahl Index of 0.9801, and an effective number of stocks equal to 1.0203. It is trivial to see that for an industry where every asset has equal weight, the effective and actual numbers of stocks are exactly equal. Note that we talk here of thin industries, but the same problem can manifest in country factors and even, in rare circumstances, in style factors. In general, any factor for which most of the regression weight is concentrated in a few assets may qualify as thin. As a general rule of thumb, we find empirically that an effective number of ten assets represents the boundary between what may be considered thin and what is "sufficiently fat".

5.6.1 Treatment of Thin Factors

Thin industries can often be avoided by merging similar industries within the same sector with closely correlated returns, or where one has negligible weight relative to the other.
This approach, however, is not always possible for industries; as market structure evolves, a thin industry may become more populated as time goes by, or vice versa. For thin countries, merging factors is even more problematic: if Belgium is thin within a European model, for instance, would we really want to merge it with France? What would Belgian or French users of the model have to say about that? One suspects they would have rather strongly-held opinions on the matter. So thin factors are sometimes unavoidable, and we must find some means of dealing with them. Axioma models introduce a dummy asset within each thin factor, with a suitable prior return⁸, unit exposure to the relevant factor (and the market intercept, if relevant), and neutral exposure to all other factors. Its inclusion will, therefore, directly affect only the thin factor's return estimate. To balance its influence against that of the genuine assets within the thin factor, the dummy asset's regression weight is defined as

w_dummy,j = w_j (φ − 1) · (s_j⁴ − φ⁴) / ((1 − φ⁴) · s_j)   if HI_j⁻¹ < φ
w_dummy,j = 0                                              if HI_j⁻¹ ≥ φ

where

• s_j = HI_j⁻¹ is the reciprocal of the Herfindahl Index,
• w_j is the total regression weight of all assets in the industry, and
• φ is the effective number of assets below which an industry is deemed "thin".

This functional form ensures that the dummy asset's weight decreases slowly as the effective number of assets increases from 1, but decays rapidly as the effective number approaches the limit φ. As mentioned before, empirical results indicate that φ = 10 is a safe threshold: beyond this number, returns diversify sufficiently, though this will also depend on the particular market. If ten proves to be impossibly high, it is possible to reduce φ to as low as six without serious adverse effects.

⁸ Depending on the factor, this prior return may be the market return, or a sector or regional return.
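The thinness measure and the dummy-asset weight can be sketched together. Note the piecewise formula above is our reconstruction of a garbled original, so treat this as an illustration of its qualitative behaviour rather than Axioma's exact weighting.

```python
import numpy as np

def effective_num_assets(caps) -> float:
    """Reciprocal Herfindahl index 1 / sum_j w_j^2: the 'effective'
    number of assets driving an industry."""
    w = np.asarray(caps, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def dummy_weight(w_industry: float, eff_n: float, phi: float = 10.0) -> float:
    """Regression weight of a thin factor's dummy asset: equal to
    w_industry * (phi - 1) at one effective asset, decaying to zero as
    the effective number of assets s approaches phi."""
    s = eff_n
    if s >= phi:
        return 0.0
    return w_industry * (phi - 1) * (s**4 - phi**4) / ((1 - phi**4) * s)

# The handbook's example: 100 assets, 99% of capitalization in one.
caps = [99.0] + [1.0 / 99.0] * 99
s = effective_num_assets(caps)   # roughly 1.02 effective assets
```

At one effective asset the dummy carries its maximum weight; at ten or more it vanishes, so well-diversified industries are estimated purely from their genuine members.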
Clearly, the thinner the factor, the greater the weight of the dummy asset. This pulls the factor return estimate closer to the prior return as the industry becomes thinner, reflecting a decreasing certainty about the true factor return in the face of insufficient observations. The purpose, therefore, is not to try to guess what the "true" factor return should have been, but to steer the estimated factor return towards a "safe" estimate and prevent its pollution by large amounts of specific return. Applying this correction to the earlier example, the new factor return is

f̃_k = (w_dummy / w_k) r_prior + (1 − w_dummy / w_k) · ½(r₁ + r₂)

with specific returns

ũ₁ = u₁ + ½(r₁ + r₂) − f̃_k = u₁ + (w_dummy / w_k)(½(r₁ + r₂) − r_prior)
ũ₂ = −u₁ + ½(r₁ + r₂) − f̃_k = −u₁ + (w_dummy / w_k)(½(r₁ + r₂) − r_prior)

and we see that the perfect negative correlation in the specific returns has been disrupted.

5.7 The Problem of Multicollinearity: Binary Factors

If the regression contains two or more of the following:

• a market intercept,
• country binary exposures,
• industry binary exposures,

then as it stands it will fail to find a unique solution, as the exposure matrix B is singular. To see why this is so, consider each asset's return. Ignoring style factors and specific return, it may be written as

r_i = 1 · f_M + 1 · f_Ij + 1 · f_Ck

where f_M is the market intercept return, f_Ij the asset's industry return, and f_Ck its country return. We may add an arbitrary amount to any one of these returns provided an equal, but negative, amount is added to one of the others, and the overall model fit will be unchanged. Thus there are an infinite number of solutions, all of which appear equally "good". This is the issue of multicollinearity, or linear dependence, between different sets of binary factors. A unique solution can be obtained if, for each additional set of binary factors, a constraint is imposed on the regression. In theory, anything that will resolve the non-uniqueness will do.
In practice, a common choice is to force one or more sets of factor returns to sum to zero. Assuming henceforth that we have a market intercept, P binary industries and Q binary countries, one could add the constraints

Σ_{j=1}^{P} w_Ij f_Ij = 0
Σ_{j=1}^{Q} w_Cj f_Cj = 0

where w_Ij is the market capitalization of industry j, and w_Cj the market capitalization of country j. The effect of this is to "force" the broad market return into the market intercept factor, while the country and industry factors represent the residual behaviour of each, net of the overall market. Since the style factor exposures are standardised about a market-capitalisation-weighted mean of zero, if one were to compute the return of a market-capitalisation-weighted broad market benchmark, one would find that its return was almost entirely explained by the market factor: the industry and country factor return contributions are made zero by the constraints, and the style factor contribution is made zero by the exposure standardisation. The only other contribution would come from the specific return, and for a sufficiently diversified benchmark, this should be small in magnitude. Figure 14 illustrates this point using the global model. The plot shows both the total return to a typical global benchmark and the market factor return from the Axioma WW4 model; the returns are rolling one-year cumulative returns. We see that the two are very close throughout the history, with a correlation of 97%.

Figure 14: Global Benchmark return versus WW model market factor return

By contrast, Figure 15 shows a typical US benchmark return versus the US factor return from the same WW4 model. The returns are now quite dissimilar, with a lower correlation of 68%. These constraints may be added to the regression as dummy assets and the regression performed in the normal fashion.
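One simple way to realise "constraints as dummy assets" is to append each constraint as a heavily weighted observation with a return of zero. The sketch below does this for a tiny market-plus-two-industries model; the large dummy weight is an implementation convenience we assume for illustration, not Axioma's exact mechanism.

```python
import numpy as np

def constrained_factor_returns(B, r, weights, constraints):
    """Weighted factor regression with linear constraints C f = 0,
    imposed as heavily weighted zero-return dummy observations.

    `constraints` is a k x m matrix; e.g. a single row of industry cap
    weights forces the cap-weighted industry returns to sum to zero."""
    big = 1e6 * weights.sum()                       # dominant dummy weight
    B_aug = np.vstack([B, constraints])
    r_aug = np.concatenate([r, np.zeros(len(constraints))])
    w_aug = np.concatenate([weights, np.full(len(constraints), big)])
    W = np.diag(w_aug)
    return np.linalg.solve(B_aug.T @ W @ B_aug, B_aug.T @ W @ r_aug)

# Market intercept plus two binary industries: singular without a constraint.
B = np.array([[1, 1, 0]] * 3 + [[1, 0, 1]] * 3, dtype=float)
r = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
C = np.array([[0.0, 3.0, 3.0]])                     # cap-weighted sum to zero
f = constrained_factor_returns(B, r, np.ones(6), C)
```

Here the broad market return (2.0) is forced into the intercept, while the industry factors carry the residual ±1.0 — exactly the behaviour described in the text.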
In the Appendix we demonstrate that, under easily-met conditions, the solution asymptotically approaches the "true" solution.

Figure 15: US Benchmark return versus WW model US factor return

An alternative approach is to use a multi-stage regression. For example, one could first regress the asset returns against the market factor,

r = X_M f_M + u

then regress the residual u against the industry exposures X_I:

u = X_I f_I + v

and then regress the residuals from this against the country exposures X_C:

v = X_C f_C + w

The factor returns can then be collected together, with the specific return being the final residual, w. This approach is useful when one wants a particular set of factors to extract as much explanatory power as possible from the returns, independent of the remaining factors, which will often be much weaker than if they were all included in one regression. In addition to a more cumbersome algorithm, however, analysis of the results becomes more complicated: different regressions may have different weights, and their results cannot easily be meaningfully combined. This makes comparisons difficult. Finally, if one is not careful with the weights, it is possible to induce spurious structure into the regressions, and hence the residuals. For all these reasons, we use the constrained, single-regression approach in our models, reserving the residual regression technique for certain specific cases. One such case is the "Domestic China" factor in many regional models. Because the "International" China market (H-shares, Red chips, etc.) behaves quite distinctly from the "Local" China market (A-shares and B-shares), any regional model covering the China market requires two "China factors":

• The main China Market factor is exposed to all China-denominated stocks,
• The Domestic China factor is exposed only to the A and B shares.
• The China Market factor is estimated in the main regression, using the international China assets.
• The Domestic China factor is estimated in a secondary regression, using the China A and B shares and the residuals from the main regression.

The Domestic China factor therefore captures the difference in behaviour of the local China market. Should the two markets converge over time, we would expect to see the significance of this factor steadily diminish.

5.8 The Problem of Multicollinearity: Style Factors

Multicollinearity, or linear dependence, can also be an issue amongst style factors. While it is unlikely that any two styles will yield identical exposures, it may certainly be the case that two or more styles⁹ look sufficiently alike to cause issues of numerical stability, manifesting in poor t-statistics for the afflicted factors and high VIFs (variance inflation factors). While the best approach is simply to avoid factor definitions that lead to high levels of multicollinearity, sometimes this cannot be avoided. A good example is the definition of Volatility, which is often very highly correlated with beta, or Market Sensitivity. We could simply drop one factor, such as Volatility, but there is still useful information in this factor which is not reflected in Market Sensitivity. Combining them is another possibility, but again, we would potentially be damping asset characteristics that we wish to measure. Our customary approach is to orthogonalise the exposures of one relative to the other. For instance, in the case of market sensitivity and volatility, we perform the cross-sectional regression

x_vol = α + β x_ms + ε

where x_vol is the original (standardised) volatility exposure and x_ms is the standardised market sensitivity exposure. We may also weight the regression using, for instance, the square root of market capitalisation, in order to maintain consistency with the factor regression.
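The orthogonalisation is just a weighted regression followed by taking residuals. A minimal sketch, assuming square-root-of-cap style weights:

```python
import numpy as np

def orthogonalize(x_vol: np.ndarray, x_ms: np.ndarray,
                  weights: np.ndarray) -> np.ndarray:
    """Weighted cross-sectional fit of x_vol = a + b * x_ms + eps;
    returns the residuals eps, i.e. volatility exposures net of the
    component explained by market sensitivity."""
    Z = np.column_stack([np.ones(len(x_ms)), x_ms])   # intercept + x_ms
    W = np.diag(weights)
    coef = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ x_vol)
    return x_vol - Z @ coef                            # residual exposures
```

By construction the residuals are (weighted-)uncorrelated with the market sensitivity exposures, which is what removes the multicollinearity between the two style factors.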
Our new cross-section of volatility exposures is thus given by the values of ε. These now represent the volatility of each asset net of the volatility induced by exposure to the market. Numerically, we see from Figures 16 and 17 that the t-statistics for market sensitivity are greatly increased by the reduction in uncertainty surrounding the estimate, while its factor return becomes more negative and, in fact, more like that of volatility. In the extreme case where these were the only two factors in the model, it can be shown that their factor returns would become equal after orthogonalisation.

⁹ or a linear combination thereof

Figure 16: T-Statistics for Market Sensitivity and Volatility
Figure 17: Factor Returns for Market Sensitivity and Volatility

5.9 Currency Factors

Currency factors are rather different from other factors in that they do not feature in the returns regression. Rather, each asset has its return calculated in its local currency, ensuring that currency effects are (as far as possible) eliminated from the factor regression. All non-currency factor returns are then computed via the regression, whilst currency returns are calculated directly from exchange and risk-free rates. Because the final risk model must be built from the viewpoint of a single numeraire currency, it is necessary to convert our returns model to this numeraire currency. This is done per asset as

r_N = r_X + c_XN

where

• N is the model numeraire currency,
• X is the asset's local currency, i.e. that in which it is quoted,
• r_X is the asset's local return,
• c_XN is the return in currency N for an investor buying currency X.

This currency return, and its associated risk, represents the risk to the investor of buying and selling something in a currency foreign to him or her. Suppose you are a US-based investor who wishes to purchase a Euro-denominated stock.
The following artificial example illustrates what the currency return is:

• The stock is valued at 1 Euro.
• The Euro/USD exchange rate is 1 (1 USD will buy 1 Euro).
• You purchase 100 Euros using 100 USD.
• You buy 100 shares.
• The next day, the price of the asset is still 1 Euro, but the Euro/USD exchange rate is 0.5 (1 USD will buy 0.5 Euros).
• You sell your shares for 100 Euros.
• You convert your 100 Euros back to 200 USD.

Thus, independently of the asset and its local price, the fluctuation in exchange rates leads to a return and risk due to the currencies. In the above simple example, you have made a positive return from your investment entirely due to these currency movements. Suppose we have a set of m currencies, 1, 2, ..., m. Then we define the following:

• p_{i,t} is the asset price at time t in currency i,
• r_i is the asset return in local currency i from period t − 1 to t,
• r_i^f is the risk-free rate for currency i,
• x_{ij,t} is the amount of currency j which one unit of currency i fetches at time t,
• r_{ij}^x is the return for an investor whose numeraire is currency j in buying currency i.

We therefore have:

r_i = p_{i,t} / p_{i,t−1} − 1
r_{ij}^x = x_{ij,t} / x_{ij,t−1} − 1
p_{j,t} = p_{i,t} x_{ij,t}

We now define what we mean by a currency return. The asset's total return from a currency-j perspective is expressed exactly as

r_j = (p_{i,t} x_{ij,t}) / (p_{i,t−1} x_{ij,t−1}) − 1

This may be expressed more compactly as

r_j = (1 + r_i)(1 + r_{ij}^x) − 1

If we consider log returns, the above may be written as

r_j ≈ r_i + r_{ij}^x

This is exact for log returns, and should provide a good approximation for returns over a short horizon (e.g. daily returns). The above is written in terms of total returns. If we consider excess returns, we may write

r_j − r_j^f ≈ r_i − r_i^f + r_{ij}^x + r_i^f − r_j^f
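The compounding identity r_j = (1 + r_i)(1 + r_ij^x) − 1 can be checked numerically, including against the artificial Euro example above (flat stock price, with the Euro doubling against the dollar):

```python
def total_return_in_numeraire(r_local: float, r_fx: float) -> float:
    """Compound a local asset return with the currency return:
    r_j = (1 + r_i)(1 + r_ij^x) - 1."""
    return (1.0 + r_local) * (1.0 + r_fx) - 1.0

# Handbook example: the stock's Euro price is unchanged (r_local = 0),
# while 100 EUR becomes 200 USD (r_fx = 1.0, i.e. +100%).
usd_return = total_return_in_numeraire(0.0, 1.0)   # +100% in USD terms
```

For small daily moves the additive log-return approximation r_j ≈ r_i + r_ij^x differs from the exact product only by the cross-term r_i · r_ij^x.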
We have therefore expressed the excess return from the perspective of currency j as the sum of the local excess return (currency i) and that which we define to be the currency factor return, viz.
\[ c_{ij} = r_{ij}^x + r_i^f - r_j^f, \]
where c_{ij} represents the excess return to currency i expressed in terms of currency j. Note that when i = j, c_{ij} ≡ 0.

In our returns model, therefore, we simply append the currency factors to the other model factors. Every asset is exposed to a single currency (in most cases, that of the market on which it trades) with an exposure of one, and has zero exposure to all other currencies. If we write our original (non-currency) returns model as
\[ r = Bf + u, \]
then our final numeraire returns model may be written as
\[ r^N = \begin{bmatrix} B & B_C \end{bmatrix} \begin{bmatrix} f \\ f_C \end{bmatrix} + u, \]
where
• B_C is our currency exposure matrix, and
• f_C is the vector of returns for each currency in the model relative to the numeraire, N.
For a more complete treatment of the modelling of currencies see Axioma (2017a).

5.10 Factor Mimicking Portfolios

The solution to the weighted least-squares regression has a very useful property that we describe here. Given our basic factor model r = Xf + u, and a diagonal matrix of weights W, we know that the least-squares solution, i.e. the set of factor returns, is given by
\[ f = \left( X^T W X \right)^{-1} X^T W r. \]
Looking at the above, we see that each factor return is expressed as a weighted sum of the asset returns, where the weights are the rows of the matrix \( \left( X^T W X \right)^{-1} X^T W \). Thus, each row of this matrix corresponds to a portfolio of asset weights such that, if we take the cross-product of the ith row with the exact set of returns used in the factor regression, we retrieve the ith factor return. Further, if we observe that
\[ \left( X^T W X \right)^{-1} X^T W X = I, \]
then we deduce that, additionally, the ith row has unit exposure to the ith factor, and zero exposure to all other factors.
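The two properties above can be checked numerically. In this sketch the exposures, weights and returns are randomly generated for illustration; the rows of \( P = (X^T W X)^{-1} X^T W \) are the factor-mimicking portfolios:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4
X = rng.standard_normal((n, k))   # exposure matrix
w = rng.uniform(0.5, 2.0, n)      # diagonal regression weights
r = rng.standard_normal(n)        # asset returns

# P = (X' W X)^{-1} X' W, a k x n matrix of portfolio weights
P = np.linalg.solve((X.T * w) @ X, X.T * w)
f = P @ r                         # the factor returns

# Each row has unit exposure to its own factor, zero to all others:
assert np.allclose(P @ X, np.eye(k))

# P @ r matches the weighted least-squares factor returns:
f_wls, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * r, rcond=None)
assert np.allclose(f, f_wls)
```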
Such a row is known as a factor-mimicking portfolio (FMP), and we can neatly obtain the FMPs for every factor in the model from the matrix above. The FMPs therefore provide a simple way of reconstructing the operation of the factor model, without having to perform any regressions, or any of the other complex processing that goes on in the model. They also provide a simple way of constructing one's own factor returns for a bespoke set of asset returns. They thus provide all the functionality of the factor model in the simple form of a set of portfolio weights.

5.10.1 Constraints in the regression

If the model contains regression constraints and/or thin-factor corrections, then the form of the FMP becomes a little more complex, as these have the effect of implicitly altering the exposure matrix. With a slight change of notation, write the model regression thus:
\[ r_1 = X_1 f + u, \]
with constraints on the factor returns, which may be written as
\[ 0 = X_2 f, \]
and dummy assets due to thin factors, which may be expressed thus:
\[ r_3 = X_3 f. \]
Assume that the thin-factor dummy returns r_3 may be written as a weighted average of the asset returns r_1, i.e. r_3 = H r_1. Then, given appropriate weights for each of the above, the entire regression can be written as the solution to
\[ \begin{bmatrix} W_1^{1/2} & & \\ & W_2^{1/2} & \\ & & W_3^{1/2} \end{bmatrix} \begin{bmatrix} I \\ 0 \\ H \end{bmatrix} r_1 = \begin{bmatrix} W_1^{1/2} & & \\ & W_2^{1/2} & \\ & & W_3^{1/2} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} f. \]
The least-squares solution to this is given by
\[ f = \left( X_1^T W_1 X_1 + X_2^T W_2 X_2 + X_3^T W_3 X_3 \right)^{-1} \begin{bmatrix} X_1^T W_1 & X_2^T W_2 & X_3^T W_3 \end{bmatrix} \begin{bmatrix} I \\ 0 \\ H \end{bmatrix} r_1. \]
This can be simplified to
\[ f = \left( X_1^T W_1 X_1 + X_2^T W_2 X_2 + X_3^T W_3 X_3 \right)^{-1} \left( X_1^T W_1 + X_3^T W_3 H \right) r_1. \]
The beauty of this is that, given the assumption that the dummy-asset returns are either zero or a weighted sum of the asset returns, all of the thin-factor and constraint information may be included in the FMP, but only the original asset returns are required to retrieve the original factor returns.
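The simplified form above can be checked numerically against the stacked weighted regression it is derived from. Dimensions, exposures, weights and the averaging matrix H below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, c, d = 40, 5, 2, 3           # assets, factors, constraints, dummies
X1 = rng.standard_normal((n, k))   # asset exposures
X2 = rng.standard_normal((c, k))   # constraint rows
X3 = rng.standard_normal((d, k))   # thin-factor dummy exposures
H = rng.uniform(size=(d, n))
H /= H.sum(axis=1, keepdims=True)  # dummy returns = weighted average of r1
w1 = rng.uniform(0.5, 2.0, n)
w2 = rng.uniform(0.5, 2.0, c)
w3 = rng.uniform(0.5, 2.0, d)
r1 = rng.standard_normal(n)

# Stacked weighted least squares on [X1; X2; X3], targets [r1; 0; H r1]
A = np.vstack([X1, X2, X3])
w = np.concatenate([w1, w2, w3])
b = np.concatenate([r1, np.zeros(c), H @ r1])
f_stacked = np.linalg.solve((A.T * w) @ A, (A.T * w) @ b)

# Simplified form: constraint and thin-factor information folded into a
# single FMP matrix P, applied to the original asset returns only.
M = (X1.T * w1) @ X1 + (X2.T * w2) @ X2 + (X3.T * w3) @ X3
P = np.linalg.solve(M, (X1.T * w1) + (X3.T * w3) @ H)
f_fmp = P @ r1

assert np.allclose(f_stacked, f_fmp)
```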
Note, however, that the FMPs no longer have their unit/zero exposure structure, since the actual regression being solved is not the original (unconstrained) factor model. If we construct a new exposure matrix that contains the constraints and thin-factor corrections, the exposure properties of the FMPs hold for this matrix. We construct a simple example to illustrate. Suppose we have a model with just two assets and three factors: a market factor and two industries. Its returns model will look thus:
\[ \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} + u, \]
with constraint f_2 + f_3 = 0. For simplicity, we have assumed equal weights everywhere. The regression equation we wish to solve is:
\[ \begin{bmatrix} r_1 \\ r_2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}. \]
Some matrix algebra yields the solution as
\[ \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} r_1 + r_2 \\ r_1 - r_2 \\ r_2 - r_1 \end{bmatrix}. \]
Thus, the market factor return is given as the average of the two asset returns, while the industry factor returns are computed from the differences between the two asset returns and are perfectly negatively correlated. Note that the FMP for the market factor has non-zero exposure to all factors, owing to the effect of the constraint.

6 Common Model Issues

Before moving on to statistical models and beyond, we here take some time to go over some practical issues that arise in modelling, some as a consequence of the model structure itself, some as a consequence of the data used.

6.1 Model Error: specification versus noise

By taking a small amount of mathematical liberty, it is reasonable to argue that there are, essentially, two types of error in risk models:
1. Estimation Error, or, more simply, noise. This is the error incurred when estimating model components, such as betas, in the face of noisy data, sampling error, illiquidity, data errors and so on.
2. Specification Error.
This is the error incurred by imposing a particular structure onto a set of asset returns, a structure which may be more or less appropriate.

The former error tends to manifest in noisy estimates and general uncertainty. The latter tends to manifest as persistent bias. We consider the first of these, and use the example of attempting to estimate an asset's beta (sensitivity to a market return). While perfectly straightforward on paper (it is merely the time-series regression coefficient of the history of asset returns versus that of the market), there are many parameters that affect our estimate, for instance:
• What length of history do we use?
• What frequency of data do we use?
• How do we treat outliers?
• Do we wish to weight the data in any way?
• Do we need to correct for autocorrelation?
All of these choices affect our estimate, giving the lie to the notion that anybody can claim, definitively, that their beta is the one true beta. To take one of these points, we consider the history length used. Figure 18 shows estimates of beta for a large, liquid, otherwise well-behaved US asset versus the European market, using estimation horizons of one and two years (rolling daily data) respectively. The instability of the estimated beta is evident, particularly the sharp downwards drop in 2007. Time-series regressions are very sensitive to the data used. Similarly, if we keep the estimation window constant but vary the frequency of the data, we get Figure 19. Thus, the answer to the question "what is this asset's beta relative to the relevant market?" would appear to be: "What would you like it to be?".

And then there is specification error. Every model imposes structure onto asset returns. In the case of a fundamental model, this structure is determined by us in advance, largely independently of the underlying returns-generating mechanism. To see what may go wrong, consider an extreme case: an asset whose returns are always, genuinely, zero.
It is obvious that the best sort of model to fit this asset is a model with no factors.

Figure 18: Rolling Historical Beta for Ticker BERK: Varying horizons

Figure 19: Rolling Historical Beta for Ticker BERK: Varying frequencies

However, since we impose a structure on the asset returns, we get the following decomposition of return for the asset:
\[ 0 = \sum_i b_i f_i(t) + u(t). \]
It is highly unlikely that the common factor portion of return will be zero, or close to zero, all of the time, unless we have constructed the exposures also to be zero. Instead, we have a wholly spurious set of factor returns imposed onto the asset. Worse still, it is obvious that the specific returns will always be exactly minus the common factor returns, i.e. the two sets of returns are perfectly negatively correlated. This will cause massive error in any risk measure for this asset, owing to the assumption that specific returns are generally uncorrelated with anything else over time. In cases such as this, they obviously are not, but we throw the information away by assuming a perfectly diagonal specific risk matrix and no correlation between factor and specific returns.

Where we use binary factors for the market or country exposures, assets with historically low betas will be at particular risk here. If we impose a value of one on an asset's factor exposure, but its estimated beta to that factor (insofar as it can be determined; see above) is obviously significantly different from one, then it stands a distinct chance of being "misspecified" in the model. To illustrate this, we run a couple of studies on a set of "good" assets versus a set of "bad" assets. We define our good assets as those whose predicted (model) beta and historical (time-series) beta generally agree over time.
Figure 20 shows the distribution of correlations between the common factor and specific return histories (using the Axioma global fundamental model) of two equal-sized samples of assets. Those denoted "good" are those whose predicted beta is close to the realised beta on average. Those labelled "bad" exhibit significant and persistent differences in predicted versus realised beta.

Figure 20: Factor/specific return correlation distribution for selected assets

We see that the correlations for the "bad" assets tend to be skewed towards the negative, whilst those of the "good" assets are rather more evenly distributed. This negative correlation is a likely indication of the assets being less well specified by the model structure. Overall, we see the following for the correlation distributions:
• Good assets: mean: 0.005, skew: -0.018,
• Bad assets: mean: -0.111, skew: -0.850.
This type of error, due to inappropriate factor structure, is known as specification error. It is present to a greater or lesser extent in all such structured risk models, and, as we said earlier, will tend to manifest as bias in risk estimates and other metrics.

6.2 Extreme Data

As mentioned earlier, when it comes to the identification of extreme data points, or outliers, the median absolute deviation, or MAD, is our standard tool. For a set of data x_i, i = 1, ..., n, this is defined as
\[ \mathrm{MAD} = \mathrm{med}\,\lvert x_i - \mathrm{med}(x) \rvert. \]
This is preferable to the two obvious alternatives: fixed bounds, or the standard deviation. The former relies on the distribution of the data and its associated properties remaining fixed over time. Consider Figure 21.

Figure 21: US Daily Market Returns for 2008

Here we see the distribution of daily US market returns over the year 2008, divided into two halves: January to June, and July to December.
It is clear that any fixed bounds based on the behaviour of the returns in the first half of the year are going to differ greatly from those that may be derived from the second half. Thus, had we tuned our model based on observations from earlier than mid-2008, we would very likely identify far more outliers than is warranted after that period. Obviously, a more dynamic measure of asset dispersion is warranted.

The other obvious candidate, the standard deviation, suffers from its non-robustness. It is far too easily affected by outliers, having, in the technical parlance, a breakdown point of 0%. Figure 22 shows a six-month distribution of recent AUD currency returns, along with the ±3 standard deviation bounds.

Figure 22: AUD Currency Returns and the 3 Standard Deviation Bounds

There are two obvious data points that fall outside the bounds and could be considered as outliers. It is likely that we would wish to clip or exclude these. All well and good. But if we artificially stir things up, by the addition of a single, very large outlier of 10%, we get the situation shown in Figure 23. This one change has inflated the standard deviation bounds so that, while our artificial outlier does actually lie outside the bounds, our two previous outliers are now "inliers". If we perform the same analysis again, but using 4.5 MADs in place of three standard deviations (which, under assumptions of normality, is roughly equivalent), then we get Figure 24. We see here that, using MAD, our original outliers remain outliers, unlike with the standard deviation method.

6.3 Missing Data

Missing data, especially returns, are a constant problem in daily risk models. Returns may be missing for one of three major reasons:
1. Illiquidity, i.e. an asset does not trade sufficiently regularly to yield a daily return history,
2. Non-trading day, which is really just a day on which all assets in a market become 100% illiquid, and
3.
IPOs: an asset is newly listed, so no historical data exists.
While in some cases, such as a company with multiple listings, it may be possible to replace an illiquid asset with a more liquid sibling, in general we do not have that luxury and must find ways to deal with illiquidity. We give two examples below where missing data of one form or another may be a problem.

Figure 23: AUD Currency Returns plus Outlier and the 3 Standard Deviation Bounds

Figure 24: AUD Currency Returns plus Outlier and 4.5 MAD Bounds

6.3.1 A non-trading day problem

Figure 25 shows market returns for the Canadian and US markets during October 2008.

Figure 25: Market returns: CA and USA

Note that the highlighted US return on October 13th, as well as being very large, occurs on a non-trading day for Canada. Then note the following day, when both markets trade: the US market return is insignificant, while the Canadian market return is of a similar magnitude to the previous day's US return. The rolling correlation between the two markets over the relevant period is shown in Figure 26. Note how the correlation suddenly drops when this non-trading day occurs.

Figure 26: Market return Correlations: CA and USA

It scarcely needs to be said that the non-trading day in Canada is causing a temporary lag effect, and that Canada on the 14th is responding to the US market return on the 13th. Thus, thanks to the holiday in Canada, the two markets appear to be less integrated than is truly the case. So, at both an asset level and a market level, illiquidity can cause us to underestimate correlations, and hence misspecify our models.
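The lag effect described above can be illustrated with simulated data (not the CA/US series): market B reacts to market A with a one-day delay on part of its move, which depresses the same-day correlation even though the two markets are tightly linked; aggregating to longer-horizon returns largely restores the relationship.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 2000
a = rng.standard_normal(T)
noise = 0.1 * rng.standard_normal(T)
b = 0.5 * a + 0.5 * np.roll(a, 1) + noise  # half of B's response lags a day

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

same_day = corr(a, b)          # understates the true linkage
lagged = corr(a[:-1], b[1:])   # B responding to yesterday's A

# Two-day (non-overlapping) returns capture most of the delayed response:
a2 = a.reshape(-1, 2).sum(axis=1)
b2 = b.reshape(-1, 2).sum(axis=1)
two_day = corr(a2, b2)

print(same_day, lagged, two_day)
```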
6.3.2 An illiquid asset problem

In addition to underestimating the strength of relationships between liquid and less liquid assets, there is the difficulty inherent in computing exposures or other measures based upon illiquid data. This problem was perfectly illustrated in 2015, when large numbers of Chinese stocks were suspended for weeks or even months at a time. The fact that large numbers of assets suddenly began showing missing returns is problematic for any measure based upon returns. One such measure is the style factor Volatility, which depends on a history (usually three to six months) of asset returns. If an asset's missing returns are merely treated as zero, then its volatility score will start to become very low.

Figure 27: Chinese Asset Volatility Exposures

Figure 27 shows the Volatility exposures in an older model for two randomly-selected suspended Chinese stocks. It is clear to see at which point they are suspended, as the Volatility level drops enormously, owing to the missing asset returns, which are treated here as zero returns.

6.4 Solutions

The problems of missing data (and extreme values) are legion, and Axioma models adopt a host of solutions to these, depending on the context. These include:
• Computing imputed or proxy returns to fill in missing values in a time-series of asset returns, for use in exposures, statistical models etc.
• Shrinkage of exposures for IPOs towards a mean value.
• Imputing non-trading-day market returns via a VAR model, for use in returns-timing, cross-sectional regression etc.
• Cross-sectional replacements for missing fundamental-data-based exposures.
• Shrinkage of specific risk estimates for assets with insufficient data.
Some of these have already been touched upon; some will be discussed later in this document.
All of the issues in this and the previous section on extreme values are discussed far more fully in the companion document to this one: Axioma (2017b).

7 Statistical Approaches

Statistical factor models typically deduce the appropriate factor structure by analyzing the sample asset returns covariance matrix. There is no need to pre-define factors and compute exposures, as required by fundamental factor models. The only inputs are a time-series of asset returns and the number of desired factors. There are no concrete rules specifying the appropriate number of factors. One can, for example, use a scree plot to assess the variance explained by each additional factor.

Factor analysis is used to estimate the factor exposures B and factor returns f. Principal components and maximum likelihood estimation are two popular methods of parameter estimation. These, as well as others, are explained in detail in Johnson and Wichern (1998). Axioma risk models involving statistical factors typically employ a variation of principal components, described in detail below. A comparison of fundamental versus statistical models, and combinations thereof, is presented in Connor (1995).

7.1 Principal Components

Principal components analysis (PCA) determines factors by eigendecomposition of the observed asset returns covariance matrix. Given an n × t matrix R of historical asset returns,
\[ \hat{Q} = \frac{1}{t} R R^T = U D U^T. \]
Only the largest m eigenvalues and corresponding eigenvectors are kept, hence
\[ \hat{Q} = U_m D_m U_m^T + \Delta^2, \]
where Δ² is the n × n diagonal matrix of specific variances. The exposure matrix B is taken to be \( U_m D_m^{1/2} \).
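The eigendecomposition step above can be sketched with NumPy on simulated returns from a known two-factor structure (all names and values are illustrative, not part of the Axioma model):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, m = 30, 500, 2
B_true = rng.standard_normal((n, m))
R = B_true @ rng.standard_normal((m, t)) + 0.1 * rng.standard_normal((n, t))

Q = R @ R.T / t                        # sample covariance, zero mean assumed
eigvals, U = np.linalg.eigh(Q)         # eigenvalues in ascending order
idx = np.argsort(eigvals)[::-1][:m]    # indices of the m largest
B = U[:, idx] * np.sqrt(eigvals[idx])  # exposures B = U_m D_m^{1/2}

# Sanity check: the columns of B are eigenvectors scaled by the square
# roots of their eigenvalues, so Q B = B D_m holds exactly.
assert np.allclose(Q @ B, B * eigvals[idx])
```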
Factor and specific returns can then be computed by regressing the historical asset returns against the exposures using ordinary or weighted least-squares:
\[ F = \left( B^T W B \right)^{-1} B^T W R, \qquad \Gamma = R - B F. \]
Unlike cross-sectional regression, the matrices (not vectors) F and Γ (m × t and n × t, respectively) contain the entire time-series of factor and specific returns, and will subsequently be used to compute the factor covariance matrix and specific variances.

Additionally, one could try to perform the above estimation using only a subset of assets, both for computational efficiency and to avoid noisy historical data. Exposures and specific returns for assets outside this universe can be backed out via ordinary least-squares. For each asset i,
\[ r_{i,t} = \sum_{j=1}^{m} b_{i,j} f_{j,t} + u_{i,t}, \qquad b_i = \left( F F^T \right)^{-1} F r_i, \qquad B = \begin{bmatrix} b_1^T \\ \vdots \\ b_n^T \end{bmatrix}. \]

7.2 Asymptotic Principal Components

PCA requires that the number of assets n be smaller than the number of time periods t, in order for Q̂ to be reliably estimated. In most capital markets, the number of assets far exceeds the number of time periods for which data are available, especially if returns are measured at weekly or longer intervals. If daily data were used to model the U.S. stock market, for example, one would require well over 10,000 data points (over 40 years of data)! Even if such data were available, the resulting covariance matrix would be poorly estimated, as noise and shocks from long in the past would continue to influence current estimates. To address this inherent shortcoming of PCA, Axioma adopted the following approach:¹⁰ instead of working with the n × n covariance matrix \( \hat{Q} = \frac{1}{t} R R^T \), one can shift the analysis from n- to t-space and use the t × t covariance matrix \( \tilde{Q} = \frac{1}{n} R^T R \). The raw input data, R, remain unchanged. The process begins with eigendecomposition of the covariance matrix Q̃:
\[ \tilde{Q} = \tilde{U} \tilde{D} \tilde{U}^T = \tilde{U}_m \tilde{D}_m \tilde{U}_m^T + \Phi. \]
Because there is an infinity of solutions to R = BF + Γ, one can choose \( F = \tilde{U}_m^T \). Choosing the eigenvectors (the right singular vectors, if using singular value decomposition) is convenient because they provide orthogonal factors. Regressing asset returns against factor returns produces B:
\[ B^T = \left( F F^T \right)^{-1} F R^T = F R^T, \qquad \text{i.e.} \quad B = R \tilde{U}_m, \]
since the orthonormality of the eigenvectors gives \( F F^T = I \).

At this stage, the factor return and exposure estimates F and B are discarded, keeping only the specific returns Γ, which are used to estimate a diagonal matrix of asset specific variances:
\[ \Delta^2 = \mathrm{diag}\left( \frac{1}{t} \Gamma \Gamma^T \right). \]
The resulting specific risks are then used to scale each asset's returns. This is conceptually analogous to weighting assets by inverse residual variance in the earlier regression example, whereby the influence of more volatile assets is reduced:
\[ R^* = \Delta^{-1} R. \]
The scaled returns are then used to compute a new t × t covariance matrix, which again undergoes the eigendecomposition routine:
\[ \tilde{Q}^* = \frac{1}{n} R^{*T} R^* = U_m^* D_m^* U_m^{*T} + \tilde{\Phi}. \]
At this point, we construct new specific risks, rescale our returns and iterate, doing so until successive estimates of specific risk are sufficiently close. The top m eigenvectors \( U_m^{*T} \) are chosen as the final estimates for the factor returns history. The final exposures B are calculated by regressing these against the unscaled asset returns R. The above estimation can be carried out on a subset of assets to prevent noisy returns from contaminating the eigenstructure analysis. The procedure for doing so, and for recovering the exposures and specific returns for the remaining assets, is similar to the procedure outlined for PCA.

¹⁰ This novel approach was introduced by Connor and Korajczyk (1988), specifically to address the inherent problem of applying PCA to financial markets, where assets typically outnumber time periods.

7.3 Expanded History

In constructing the statistical models, we typically use a history length of one year of returns for the APCA step.
Empirically, we find that this yields a happy medium between responsiveness and stability. One year of asset returns obviously leads to a one-year history of factor returns, which can then be used to construct the factor covariance matrix. However, the fundamental models generally use longer histories (up to four years of returns) in the construction of their covariance matrices. This leads to a problem when comparing the forecast from a fundamental model with that from a statistical model: are differences in risk due to structure in the returns that one model is picking up and the other not, or simply due to the different "memory", i.e. length of history, used by each model?

We can address this as follows. Suppose we have built our statistical model using APCA and a t-day history of returns. We write the factor decomposition as
\[ R_t = B F_t + \Delta_t, \]
where R_t is a matrix of t days of asset returns, B is the latest matrix of statistical exposures, F_t is the corresponding matrix of t days of factor returns, and Δ_t the matrix of residuals. Now, we keep the exposures B and retrieve a longer history of asset returns, R_T, say. We then perform the regression
\[ R_T = B F_T + \Delta_T, \]
and solve for F_T via
\[ F_T = \left( B^T B \right)^{-1} B^T R_T. \]
We have thus extracted a T-day history of factor returns using our statistical model exposures. This longer history can then be used to construct the factor covariance matrix, and, if the fundamental model also uses the same length of history for its covariance matrix, then we have ruled out one source of difference in the models' risk estimates. We can thus have more confidence that differences in model forecasts are due to structural differences rather than the use of different returns histories.

7.4 Model Statistics

It can be seen that this methodology (and PCA) is equivalent to performing a least-squares regression of asset returns against estimates of factor returns (or exposures).
Therefore, regression statistics such as R̄², standard error, or t-statistics are applicable, though such measures are likely to be rather high, because computing them is analogous to having look-ahead bias in a least-squares estimator. Further, they are estimates over an entire history rather than a single cross-section. This makes it difficult to compare and contrast them with those of our fundamental models. To generate more of a like-for-like comparison, we adopt the following approach:
• Construct our statistical model structure R = BF + Γ using PCA or APCA at time t.
• Perform a weighted least-squares regression using our statistical model exposures from time t against a cross-section of asset returns from time t + 1.
• Compute regression statistics (R̄², t-statistics) from this regression.
It seems reasonable to assume that, if our statistical exposures contain any genuine information, and are not merely the constructs of data-mining, then they should yield significant regression statistics when regressed out-of-sample as above. We thus have regression statistics that can be more meaningfully compared with those of the fundamental models.

8 The Risk Model

Thus far the discussion has focused entirely on modeling returns, and has said nothing whatsoever about the generation of risk forecasts. This is justified: if returns are modeled correctly and robustly, then deriving risk estimates is relatively straightforward. For the mathematically inclined, a more rigorous treatment of the subtleties can be found in Zangari (2003) and De Santis et al. (2003).
If the model has been sensibly constructed, with no important factors missed, then the specific returns are uncorrelated both with each other and with the factors, and the factor risk model can then be derived thus:
\[ \mathrm{var}(r) = \mathrm{var}(Bf + u), \]
\[ \hat{Q} = B V B^T + \Delta^2. \]
The asset returns covariance matrix Q̂ is a combination of a common factor returns covariance matrix V and a diagonal specific variance matrix Δ².

8.1 Factor Covariance Matrix

The factor covariance matrix is calculated directly from the time-series of factor returns. Regression models estimate a set of factor and specific returns at each time period, eventually building up a returns history. Statistical models, on the other hand, generate the entire time-series anew with each iteration.
\[ F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,T} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,T} \\ \vdots & \vdots & \ddots & \vdots \\ f_{m,1} & f_{m,2} & \cdots & f_{m,T} \end{bmatrix} \]
However we arrive at our factor returns history, we could simply write the covariance matrix as
\[ V = \frac{1}{T-1} F F^T, \]
where, for the sake of convenience, we assume a mean of zero. We simply pick the length of factor returns history required, and compute the covariance matrix over time using a rolling history of observations. To achieve a numerically stable covariance matrix, a long history of observations is necessary. Given a constant volatility regime, this approach is reasonable. However, history would indicate that such an assumption is not entirely warranted: a long history of returns is likely to lead to older, less relevant observations having an undue influence on the present.

8.2 Exponential Weighting

Recent events should exert more influence on the model than those long in the past, but one cannot simply curtail the history of returns and use only the most recent observations: a sufficiently long history is required to estimate all the covariances reliably. Axioma models address this dilemma by weighting the returns matrix using an exponential weighting scheme:
\[ w_t = \frac{2^{-(T-t)/\lambda}}{\hat{w}}, \qquad t = 0, \ldots, T, \]
where T is the most recent time period.
Here λ is the half-life parameter, i.e. the number of periods T − t at which the weight falls to half that of the most recent observation, and \( \hat{w} = \sum_{t=0}^{T} w_t \).

Figure 28: Typical exponential weighting scheme

Figure 28 shows how this looks for values of t from t = 0 (earliest) to t = 500 (most recent), for a half-life of 100 days. Here the weights have been rescaled so that the maximum value (at t = 500) is one. As can be seen, at t = 400 (one half-life) the weight is 0.5, and at t = 300 (two half-lives) the weight is 0.25. Thus, given a half-life, weights W = diag(w_1, w_2, ..., w_T) are computed, and the factor returns history is weighted to yield a new matrix of values:
\[ \tilde{F} = F W^{1/2} = \begin{bmatrix} w_1^{1/2} f_{1,1} & w_2^{1/2} f_{1,2} & \cdots & w_T^{1/2} f_{1,T} \\ w_1^{1/2} f_{2,1} & w_2^{1/2} f_{2,2} & \cdots & w_T^{1/2} f_{2,T} \\ \vdots & \vdots & \ddots & \vdots \\ w_1^{1/2} f_{m,1} & w_2^{1/2} f_{m,2} & \cdots & w_T^{1/2} f_{m,T} \end{bmatrix}. \]
From this, the factor covariance matrix is estimated as
\[ V = \mathrm{var}(\tilde{F}) = \frac{F W F^T}{T-1}. \]
Selecting an appropriate half-life is a major design question to which there is no definitive answer. The half-life indirectly affects the forecast horizon of the risk model: too short a half-life may allow for a very responsive model, but creates excessive turnover for asset managers;¹¹ too long a half-life, and the model will fail to respond sufficiently to changing market conditions. The half-life parameter therefore treads a fine line between responsiveness and stability. Figure 29 shows how the choice of half-life influences the computed volatility. The graph shows risk predictions for a typical US benchmark using the US4 medium-horizon and short-horizon models, which use half-lives of 125 and 60 days respectively for their volatility estimation. We also show a 21-day trailing realised volatility for comparison. It is obvious that the short-horizon model (US4-SH) is far more responsive to changes in short-term volatility. But at what cost?
¹¹ Shortening the half-life also reduces the "effective" sample size available for estimating the factor correlations, which far outnumber the variances (approximately n²/2 versus n).

Figure 29: US benchmark risk forecasts using 60- and 125-day half-lives

Figure 30: Average forecast change numbers for US4 models

Figure 30 shows the average forecast change figures for approximately 70 US benchmarks over a 20-year period. Forecast change is a simple proxy for turnover, and it is clear that the short-horizon model has almost twice the level of the medium-horizon model. For model users for whom greater stability and low turnover are crucial, the short-horizon model is likely to prove too responsive for their needs.

Axioma risk models use multiple half-lives for greater flexibility: a longer half-life for estimating the factor correlations and a shorter one for the variances. Given the much larger number of relationships to be estimated, a relatively long half-life is necessary to yield stable correlations. The resulting correlations are then scaled by the corresponding variances to obtain the full covariance matrix.

8.3 Autocorrelation in the Factor Returns

If factor returns were uncorrelated across time, the above procedure would be sufficient for calculating a factor covariance matrix. Unfortunately, the assumption of serial independence is much less true for daily asset returns than for returns over longer horizons. Over short time frames, market microstructure tends to create lead-lag relationships that induce autocorrelation in the factor returns over time. This would not matter if we were simply computing daily covariance matrices and going no further, but these effects must be taken into consideration when aggregating risk numbers to longer horizons.
For instance, one cannot simply derive a monthly risk forecast from daily numbers via the simple relationship¹²

    σ_M = √21 · σ_d

Without taking autocorrelation into account, such forecasts are likely to be under- or over-estimated. Put simply, if a market has a 10% return one day, followed by a -10% return the next day, its daily volatility will obviously be high. However, at any horizon longer than a day, these two numbers cancel: they contribute nothing to weekly or monthly volatility, for instance.

Axioma adjusts its daily factor covariance matrix estimates using a technique developed by Newey and West (1987). A factor's return over N days may be approximated as the sum of its daily returns:

    f(t, t+N) = Σ_{i=t}^{t+N} f(i)

This approximation is exact using logarithmic returns. The predicted covariance across two factors, f and g, from t to t+N, is then

    σ_{f,g}(t : t+N) = N σ_f(t) σ_g(t) [ ρ_{f,g}(t) + Σ_{k=1}^{N−1} (1 − k/N) (ϕ_{f,g}(t, t+k) + ϕ_{g,f}(t, t+k)) ]

where ϕ_{f,g}(t, t+k) and ϕ_{g,f}(t, t+k) represent the lagged correlations:

    ϕ_{f,g}(t, t+k) = corr(f(t), g(t+k))
    ϕ_{g,f}(t, t+k) = corr(g(t), f(t+k))

In practice, one need not calculate lagged correlations for each lag up to N. By first analyzing serial correlation trends in the historical returns, it is reasonable to cut off the series at a point h < N, beyond which any correlations are insignificant.

¹² Assuming that there are on average 21 trading days in a month.

8.4 Combining Covariance Matrices

Different risk models will utilize different sets of factor returns, perhaps with different techniques and parameters for computing the covariances. However, it would be desirable, for models with identical horizons, for the relevant currency covariances to match across models.
We guarantee that this is the case by computing a single global currency covariance matrix as a distinct entity and then merging the relevant portions of it into each model covariance matrix.

We partition our q non-currency and m currency returns and exposures as

    r = Xf + u = [ X_q  X_m ] [ f_q ; f_m ] + u

First we compute a simple, all-in-one covariance matrix from this complete set of factor returns. We factorize this matrix into volatilities and correlations thus:

    V_S = [ D_q  0 ; 0  D_m ] [ C_q  C_qm ; C_qm^T  C_m ] [ D_q  0 ; 0  D_m ]

where the D blocks are diagonal matrices, each element on the diagonal being the relevant factor volatility, and the C blocks correspond to correlations.

Next, assume that we have also calculated separate covariance matrices for the non-currency and currency factors. Each of these sub-matrices may have been calculated by differing techniques (e.g. different numbers of lags in the autocorrelation correction, treatment of extreme values, etc.). We stack these in block-diagonal form and factorize as follows:

    V_C = [ D̂_q Ĉ_q D̂_q  0 ; 0  D̂_m Ĉ_m D̂_m ]

Because each covariance matrix was calculated for a particular factor group, we have no off-diagonal blocks corresponding to correlations across factor groups. Our aim, therefore, is to combine the two covariance matrices V_C and V_S such that the composite matrix is symmetric positive semi-definite, with its diagonal blocks identical to the diagonal blocks of V_C. To effect this, define the block-diagonal transformation matrix

    R = [ D̂_q Ĉ_q^{1/2} Q_q C_q^{−1/2}  0 ; 0  D̂_m Ĉ_m^{1/2} Q_m C_m^{−1/2} ]

where Q_q and Q_m are any orthogonal matrices of dimension q and m respectively. Then the transformed matrix

    V = R [ C_q  C_qm ; C_qm^T  C_m ] R^T

yields the desired result. Expansion will demonstrate that its diagonal blocks are equal to those of V_C; that it is symmetric is obvious; and its positive semi-definiteness follows from the properties of the central term. It remains, however, to define the orthogonal matrices Q_q and Q_m.
Since they may take any (orthogonal) values, we wish to choose them such that they are optimal in some sense. The following section details how that is done.

8.4.1 The Procrustes Transform

We define the "simple" covariance matrix, i.e. that computed all-in-one with every factor, thus:

    [ C_11  C_12 ; C_12^T  C_22 ]

Assume without loss of generality that the blocks are ordered according to some criterion (e.g. first regression, second regression, statistical, currencies, etc.). Also assume that there are only two such sets of factors; it is straightforward to extend the analysis to an arbitrary number of sets. Finally, assume that for each of these factor groups we have a separately estimated covariance matrix. We write these as

    Ĉ_11, Ĉ_22

In the case of a regional model, we have two blocks of covariances: the first is computed over the model's factors, the second is the currency covariance block from the standalone currency model.

Our aim, therefore, is to combine the two sets of covariance matrices such that the composite matrix is symmetric positive semi-definite (SPSD), with its diagonal blocks identical to the components of our second, block-diagonal, covariance matrix. To effect this, define the transformation matrix

    X = [ Ĉ_11^{1/2} Q_11 C_11^{−1/2}  0 ; 0  Ĉ_22^{1/2} Q_22 C_22^{−1/2} ]

Here each matrix Q_ii is any orthogonal matrix. Then the transformed matrix

    V = X [ C_11  C_12 ; C_12^T  C_22 ] X^T

yields

    V = [ Ĉ_11   Ĉ_11^{1/2} Q_11 C_11^{−1/2} C_12 C_22^{−1/2} Q_22^T Ĉ_22^{1/2} ;
          (upper-right block transposed)   Ĉ_22 ]

That this matrix is SPSD is easily shown. So far, we have not said what the Q_ii should be. A simple choice might be to set them equal to the identity matrix. However, the resulting composite covariance matrix then becomes very unstable over time, owing to the matrix inversion necessary for blocks C_11 and C_22. This leads, in turn, to noisy covariances and risk estimates.
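The whole construction can be sketched end to end, with each Q_ii chosen via the orthogonal Procrustes solution discussed in the following paragraphs. This is a simplified, correlation-level sketch: the symmetric square-root helper and all names are ours, and the volatility scaling is omitted.

```python
import numpy as np

def spd_sqrt(C):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def procrustes_q(C_hat, C):
    """Orthogonal Q minimising || C_hat^{1/2} Q - C^{1/2} ||_F."""
    M = spd_sqrt(C_hat).T @ spd_sqrt(C)
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def composite_cov(C, blocks):
    """Replace the diagonal blocks of the 'simple' matrix C with the
    separately estimated blocks (in order), keeping C's off-diagonal
    structure, via the Procrustes transform V = X C X'."""
    sizes = [b.shape[0] for b in blocks]
    idx = np.cumsum([0] + sizes)
    X = np.zeros_like(C)
    for i, B in enumerate(blocks):
        s = slice(idx[i], idx[i + 1])
        Cii = C[s, s]
        Q = procrustes_q(B, Cii)
        X[s, s] = spd_sqrt(B) @ Q @ np.linalg.inv(spd_sqrt(Cii))
    return X @ C @ X.T
```

By construction the result is symmetric, positive semi-definite, and has diagonal blocks exactly equal to the supplied Ĉ_ii.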
A more rigorous approach is to find Q_ii such that the product Ĉ_ii^{1/2} Q_ii C_ii^{−1/2} is as close as possible to the identity matrix. This amounts to solving the following: find Q_ii such that Q_ii^T Q_ii = I and it minimises

    ‖ Ĉ_ii^{1/2} Q_ii − C_ii^{1/2} ‖

This is a known problem with a well-defined solution, the so-called orthogonal Procrustes problem. The solution is as follows: form the product

    M = Ĉ_ii^{T/2} C_ii^{1/2}

Then compute its singular value decomposition

    M = U Σ V^T

The optimal Q_ii is then given by Q_ii = U V^T. This, finally, gives us a stable composite covariance matrix whose diagonal blocks are equal to our pre-defined Ĉ_11 and Ĉ_22.

8.5 Market Asynchronicity

Because equity markets do not open and close simultaneously, the use of daily returns to estimate relationships across assets trading in different time zones poses a number of challenges to a daily risk model. Consider Tokyo: this market closes before the US market has even opened; thus, stocks trading in Japan will not reflect events on the US market that day until the Tokyo market reopens the next day. There is, therefore, the distinct possibility of a lag across asset returns' behaviour. In the worst case, if one market always goes up when the other goes down, this could give the appearance of negative correlation across markets, and hence the impression that one market is a hedge for the other. In reality, this correlation is an artifact of the time delay, and the markets are positively correlated.

To illustrate this, we compute daily returns for a selection of test markets: each is simply the weighted sum of the asset returns within the market. We compute the correlations between these markets and the USA using the raw daily data, along with the correlation using a Newey-West correction of one day's lag. Finally, we compound daily returns to weekly (Wednesday to Wednesday) and compute the correlations of these with the USA.
This will lessen the daily timing effect and give us a picture of market relationships closer to the truth.

Figure 31: Correlations with the US market 1997-2016

Figure 31 shows all three flavors of correlation between markets. The results are unambiguous: based on weekly returns, almost all markets show a significant positive correlation with the US market, with the developed markets showing the strongest relationship. This is not evident when using raw daily returns, where there is a clear East-West effect. The one-day Newey-West correction improves matters somewhat, but still vastly under-estimates the true relationship for many of the more easterly markets. Thus, we conclude that daily data do give a misleading picture of correlations across markets.

8.5.1 Synchronizing the Markets

To alleviate the problem of market asynchronicity we adjust our returns. Recall that our basic returns model at a point in time, t, is as follows:

    r_t = B f_t + u_t.    (1)

We then make the assumption that each asset return, r_t^i, may be written as

    r_t^i = r_t^{im} + γ_t^i,    (2)

where r^{im} is the return of asset i's local market (the particular market on which the asset trades), and γ^i is the remaining, non-market, term. Using the set of J local market returns, we construct a VAR model of local market returns:

    r_t^m = N r_{t−1}^m + ε_t.    (3)

We then use the coefficient matrix N above to estimate a set of forecast market returns

    r̂_t^m = N r_t^m,    (4)

and these projected market returns are then used to adjust each asset return, viz.

    r̂_t^i = r̂_t^{im} + γ_t^i.    (5)

Using (2) we rewrite (5) as

    r̂_t^i = r_t^i + Δ_t^{im},    (6)

where the returns-timing adjustment factor, Δ_t^{im}, is defined as

    Δ_t^{im} = r̂_t^{im} − r_t^{im}.    (7)

We may then use the set of adjusted asset returns from (6) in the model factor regression

    r̂_t = B f̂_t + u_t.    (8)

For a more complete explanation of returns timing see Axioma (2017d).
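Steps (3)-(7) of the returns-timing adjustment can be sketched in a few lines. The least-squares VAR(1) estimator and the function names are our own simplifications.

```python
import numpy as np

def estimate_var1(M):
    """M is a J x T matrix of local-market returns, columns ordered oldest
    to newest. Least-squares estimate of N in r_t = N r_{t-1} + eps."""
    X, Y = M[:, :-1], M[:, 1:]
    # N = Y X' (X X')^{-1}
    return Y @ X.T @ np.linalg.inv(X @ X.T)

def timing_adjustment(M, asset_market):
    """Per-asset adjustment Delta = N r_t^m - r_t^m for the most recent day,
    where asset_market[i] is the index of asset i's local market."""
    N = estimate_var1(M)
    r_t = M[:, -1]
    delta_m = N @ r_t - r_t
    return delta_m[asset_market]
```

If, say, one market's return is exactly the previous day's return of another, the estimated coefficient matrix N recovers that lead-lag relationship.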
8.5.2 Non-trading Days and Returns Timing

A complication with the returns-timing algorithm above occurs when a market has a non-trading day. If the market(s) being used to adjust the other markets (such as the USA) have a holiday, then no returns-timing correction can be computed for any market. If the market that one is attempting to adjust has a holiday, then a correction cannot be computed for that particular market on that day. With this basic algorithm, therefore, whenever a non-trading day occurs for one of the markets involved, the returns-timing adjustment is missing for that market.

While for many markets this is a minor inconvenience, for Middle-Eastern markets, many of which trade from Sunday to Thursday, or for markets with a large number of holidays, it is a serious shortcoming. Not only is there no returns-timing adjustment on a Friday in the case of the former, but the adjustments computed on other days are less well modelled owing to the non-alignment of trading days, and correlations between the markets are likely to be underestimated.

Figure 32: Correlation of European Markets with the US Market - Part 1

Figure 32 illustrates the effect of non-trading days. We compare the correlations computed via weekly returns with those using daily adjusted returns (via the returns-timing algorithm) for a selection of European markets. We choose these because there is the most overlap between them and the Americas, and so the weekly returns correlation is likely to give as accurate a picture as possible. We see that for most markets, using daily adjusted returns via the basic returns-timing algorithm (Adjusted RT1) tends to underestimate the correlations. If we look at a geographical spread of markets (Figure 33), we see that the tendency of the weekly correlations to be higher than those from the adjusted returns decreases as we head East.
This seems intuitively reasonable: there is at most four days' overlap in weekly returns between the Far East and the USA, so the weekly correlations are themselves increasingly likely to be underestimates the further East we go.

To get around these issues and enable us to compute a returns-timing coefficient for any market on almost any day, we impute market returns on holidays prior to the returns-timing computation. Given a set of market returns m_t^i at time t for markets i = 1, ..., n, we fit a VAR model thus:

    m_t^i = Σ_{j∈C} β_{ij} m_t^j + ε_t^i

where C denotes the set of values of j such that market m^j lies in a time zone within two hours of m^i (but does not include m^i itself). Thus, in simple terms, we estimate missing returns for a market by comparing it with its nearby peers, at least one of which is assumed to be trading. The values of β_{ij} are estimated from a 500-day history of returns.

Figure 33: Correlation of Selection of Markets with the US Market - Part 1

The complete returns-timing algorithm is described thus:

1. Take an initial 500-day set of daily market returns, with non-trading days filled with zero, say M⁰.

2. Estimate a matrix of betas, B, from the equation above, using this returns history.

3. Compute a set of imputed returns M¹ = B M⁰.

4. Replace non-trading-day values in M⁰ with the imputed values from M¹, filling and scaling in the usual way.

5. Using M¹ as our initial set of returns, repeat from step 2 until some given convergence is achieved.

6. Compute returns-timing coefficients using the filled set of market returns computed above.

In practice, the returns-imputation algorithm converges rapidly, after which we have a set of market returns over two years for all markets with no missing data from non-trading days. Days immediately following non-trading days are scaled appropriately to convert their returns to daily, as described in Axioma (2017b).
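A toy version of the imputation loop (steps 1-5) is sketched below. The peer sets "within two time-zone hours" are supplied as an input, and the filling-and-scaling details of step 4 are omitted; all names are ours.

```python
import numpy as np

def impute_market_returns(M0, traded, neighbours, n_iter=25, tol=1e-10):
    """Iteratively fill non-trading-day market returns.
    M0: n x T returns with zeros on non-trading days.
    traded: boolean n x T mask, True where the market actually traded.
    neighbours: dict mapping market i -> list of nearby peer markets.
    Each pass regresses every market on its peers over the full history
    and replaces only the non-trading-day entries with fitted values."""
    M = M0.copy()
    for _ in range(n_iter):
        M_new = M.copy()
        for i, peers in neighbours.items():
            X = M[peers, :]                              # peer returns
            beta, *_ = np.linalg.lstsq(X.T, M[i], rcond=None)
            fitted = beta @ X
            M_new[i, ~traded[i]] = fitted[~traded[i]]    # fill holidays only
        if np.max(np.abs(M_new - M)) < tol:
            break
        M = M_new
    return M
```

Because each pass re-estimates the betas from the partially filled history, the filled values and the betas converge together to a mutually consistent fixed point, which is the rapid convergence noted above.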
Revisiting our previous results and comparing them with our improved, robust returns timing, we get Figure 34. Here the correlations produced via the new returns-timing algorithm (Adjusted RT2) tend to be higher than those of the old version, and similar to or higher than the weekly correlations. If we look again at the geographical spread of markets (Figure 35), we see that these improvements hold generally. The tendency of the new returns timing to give higher correlations than the weekly returns increases as we head East. This seems intuitively reasonable, as there is at most four days' overlap in weekly returns between the Far East and the USA.

Figure 34: Correlation of European Markets with the US Market - Part 2

Figure 35: Correlation of Selection of Markets with the US Market - Part 2

Figure 36: Returns-Timing Coefficients for Israel Market Return

Focussing on one market, Israel, which suffers from the problem of market closure on Fridays as well as other non-trading days, we can see first the improvement in the performance of the returns-timing algorithm. Figure 36 shows the returns-timing coefficients for Israel over time (i.e. the relevant component of the N matrix in equation 3). It is clear that the components for the new returns timing (RT2) are much more stable than those of the old (RT1), many of which are missing entirely owing to the many non-trading days. Figure 37 shows the t-statistics of these same returns-timing coefficients, those for the new returns timing being unambiguously greater. Finally, Figure 38 shows the distribution of active risk bias statistics for an Israeli market benchmark versus a global benchmark.
We see that the tendency for the risk to be overestimated in an older global model (WW2), which uses the first-generation returns timing, is greatly reduced in the newer model with the latest returns-timing algorithm.

Figure 37: T-Statistics for Israeli Returns-Timing Coefficients

Figure 38: Distribution of Active Risk Bias Statistics for Israeli Benchmark

9 Modelling Currency Risk

Recall that our currency factor returns (if applicable) are computed separately from the other factor returns. We append the currency factor returns to the model factor returns:

    r^j = B f + u = [ B_q  B_m ] [ f_q ; f_m^j ] + u

Here, r^j is the vector of asset returns in numeraire currency j. B_q ∈ ℝ^{N×q} and f_q ∈ ℝ^q are the non-currency exposures and factor returns respectively. The vector f_m^j ∈ ℝ^m is the vector of m currency returns (relative to numeraire currency j):

    f_m^j = (c_{1j}, ..., c_{mj})^T

and B_m ∈ ℝ^{N×m} is the matrix of currency exposures. We then compute a combined factor covariance matrix. Henceforth we drop reference to the numeraire currency except where necessary, and assume that by "currency return" we always mean the return of a trade in a currency relative to a particular numeraire.

9.1 Currency Covariances

There are a number of complications relating to currency risk. We detail them here. Given the returns model above, to include currency risk in the risk model we could simply calculate a combined factor covariance matrix as we would any other model factor covariance matrix, viz.

    var(r) = [ B_q  B_m ] var( [ f_q ; f_m ] ) [ B_q  B_m ]^T + var(u)
           = [ B_q  B_m ] [ V_q  V_qm ; V_qm^T  V_m ] [ B_q  B_m ]^T + Δ.    (9)

Here:

• V_q is the covariance matrix for the non-currency factors,
• V_qm is the covariance matrix block of non-currency/currency factor returns,
• V_m is the currency return covariance matrix.

The main problem with this simple approach is that currencies can be noisy.
Figure 39 shows a scatter plot of Euro returns versus GBP returns. Both of these are liquid, heavily traded currencies, and it is clear that there is a significant statistical relationship between them. However, if we look at Figure 40, the situation is rather different. This shows the scatter plot of Indian versus Pakistani Rupee currency returns. Despite the close geographical proximity of the two, the relationship between them is non-existent.

The problem with currencies is much the same as with individual assets themselves. Some are heavily traded and liquid, with a high signal-to-noise ratio and easily measurable statistical relationships. Others are illiquid, and some are pegged; any statistics we attempt to derive from these are likely to be noisy and unreliable, owing to the lack of meaningful information in their returns.

Figure 39: Scatter plot of EUR and GBP returns

Figure 40: Scatter plot of PKR and INR returns

Figure 41: A Selection of Realized Currency Correlations

Figure 41 demonstrates this. We compute realized 250-day correlations for the Euro and GBP, and contrast these with those for PKR and INR returns. There is clearly a gulf between the two pairs in terms of meaningful information. The correlation between EUR and GBP is consistent and significant over time. The correlation between PKR and INR is little more than noise, and one would be quite justified in drawing a straight line through the origin to represent the true relationship. Noisy currencies mean noisy correlations: we see similar patterns amongst other such noisy currencies, and also between currencies and non-currency factors. The danger, of course, is that from time to time we will obtain apparently significant but entirely spurious correlations, which may prove detrimental to a user's investment strategies.
How can we improve the correlations and reduce the noise? We model the currency returns via a Principal Components Analysis (PCA) statistical model. We thus decompose

    C = Z G + θ,    (10)

where:

• C ∈ ℝ^{m×T} is a time-series matrix of currency returns, c_m(t), t = 1, ..., T,
• Z ∈ ℝ^{m×p} is a matrix of exposures to p statistical factors,
• G ∈ ℝ^{p×T} is a time-series matrix of p statistical factor returns, and
• θ ∈ ℝ^{m×T} is the residual from the PCA regression.

Given a time-series of currency returns we compute (10) by standard PCA decomposition. We derive the PCA structure via an estimation universe of twelve currencies. These are:

• The big seven: USD, JPY, GBP, EUR, CHF, AUD, CAD. These account for the vast majority of currency trades.
• A further six from Asia and the developing world: BRL, MXN, PLN, TWD, SGD, ZAR. These are some of the next most heavily traded.¹³

The above currencies are chosen to minimise residual returns structure: some other currencies are actually more heavily traded than the latter six (e.g. SEK, NOK, NZD), but are so highly correlated with currencies already in the estimation universe that for modeling purposes they scarcely count as separate entities.

We then allow our PCA model twelve statistical factors. With the same number of factors as currencies in the estimation universe, the estimation-universe currencies are modeled exactly by the PCA, and the non-estimation-universe exposures can be thought of as sensitivities to the estimation-universe currency returns. To summarise the above: we model

    C_estu = Z_estu G,    (11)

where C_estu ∈ ℝ^{12×T} is the subset of estimation-universe currency returns. Here (11) is an exact relationship (no residual). We then "back out" the non-estimation-universe exposures, Z_nonest, via

    Z_nonest = C_nonest G^T (G G^T)^{−1},    (12)

where C_nonest ∈ ℝ^{(m−12)×T} is the subset of non-estimation-universe currency returns.
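The exact-fit PCA of (11) and the backing-out of non-estimation-universe exposures in (12) can be sketched as follows. This is a small sketch with four estimation currencies instead of twelve, and the function names are ours.

```python
import numpy as np

def currency_pca(C_estu, C_nonest):
    """C_estu: p x T returns of the estimation-universe currencies.
    C_nonest: (m - p) x T returns of the remaining currencies.
    With as many factors as estimation currencies, the PCA reproduces
    the estimation universe exactly; other currencies receive
    time-series-regression exposures to the factors."""
    p, T = C_estu.shape
    # PCA exposures via eigendecomposition of the estu covariance
    vals, vecs = np.linalg.eigh(np.cov(C_estu))
    Z_estu = vecs[:, np.argsort(vals)[::-1]]       # p x p, orthogonal
    G = np.linalg.inv(Z_estu) @ C_estu             # p x T, exact fit
    # Back out non-estu exposures: Z = C G' (G G')^{-1}
    Z_nonest = C_nonest @ G.T @ np.linalg.inv(G @ G.T)
    resid = C_nonest - Z_nonest @ G                # the theta residual
    return Z_estu, Z_nonest, G, resid
```

Because Z_estu is square and invertible, Z_estu G reproduces C_estu exactly, mirroring the "no residual" property of (11); all the noise in the remaining currencies is pushed into the residual term.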
We combine our "estu" and "non-est" currency components to create the combined currency model (10). We assume that the residual currency returns θ are uncorrelated, both across currencies and across currency/non-currency pairs; hence cov(θ, f_q) = 0, cov(θ, ZG) = 0 and var(θ) is diagonal. We may thus rewrite covariance matrix (9) as

    var(r) = [ X_q  X_m ] [ V_q  cov(f_q, ZG) ; cov(ZG, f_q)  var(ZG) + var(θ) ] [ X_q  X_m ]^T + Δ.    (13)

By decomposing currency returns and imposing structure onto them, we force at least some of the noise into the residual, uncorrelated component, θ, and the resulting covariance matrix will exhibit smoother and more meaningful correlations.

Figure 42 shows the effect of our currency model. For the correlation of EUR versus GBP, there is little overall difference: this is one of the cases where the signal-to-noise ratio in the returns is high, and so the only differences between our realised and our model numbers are down to the differing parameters and settings used to compute covariances. However, in the case of the correlation between PKR and INR, we see that we have significantly reduced the noisy and almost certainly spurious jumps. Our overall correlation is much closer to zero, reflecting the lack of meaningful information in the two sets of returns.

9.2 Numeraire Transformation

As it stands, our combined currency matrix gives a measure of risk from the point of view of a particular base numeraire such as USD or EUR. However, any currency could be considered as a numeraire, and risk model users who trade in other currencies may wish to look at things from the perspective of their own currency.
The simple but costly solution to this is to compute a number of different covariance matrices, one for each desired currency. This, however, is inflexible and wasteful. By a simple transform we may convert a covariance matrix based on one currency to the perspective of any other that we choose.

¹³ The more observant will have spotted that there are actually thirteen currencies in total; however, one of these is always the numeraire, hence its returns are zero, and it is automatically excluded from the PCA decomposition.

Figure 42: A Selection of Realized vs. Model Currency Correlations

Our aim is to transform the currency returns from the currency-j perspective to that of, say, currency k. Using the log-return approximation for currency returns, and considering our "asset" to be currency i itself, we form the expression

    c_ik ≈ c_ij + c_jk

We thus rebase the return of currency i from numeraire j to numeraire k as

    c_ik = c_ij − c_kj

Thus everything can be written in terms of already-known quantities (j-numeraire currency returns). The asset returns in terms of our new base currency, k, are

    r^k = [ X_q  X_m ] [ f_q ; f_m^k ] + u

The re-based currency factor returns, f_m^k, are given by

    f_m^k = (c_{1j} − c_{kj}, ..., c_{mj} − c_{kj})^T

The returns equation can be rearranged as

    r^k = [ X_q  X̂_m ] [ f_q ; f_m ] + u

where the transformed currency exposure matrix is given by

    X̂_m = X_m (I_m − T_m)

Here T_m is an m × m matrix of zeros, save for a column of ones in the position corresponding to currency k. Since each row of X_m sums to one, the product X_m T_m is simply an N × m matrix of zeros with a column of ones in the position of currency k. Our transform may then be written as

    r^k = [ X_q  X_m ] [ I_q  0 ; 0  I_m − T_m ] [ f_q ; f_m ] + u

And so, from the above equation for returns, it is simple to see that we may construct our covariance matrix from the existing factor covariance matrix.
We could apply the above transformation to the asset exposure matrix, considering the factor covariance matrix as fixed, or we could view the exposures as fixed and compute a new factor covariance matrix:

    F̂ = [ I_q  0 ; 0  I_m − T_m ] F [ I_q  0 ; 0  I_m − T_m ]^T

This concludes our overview of the treatment of currencies. For more detail, refer to Axioma (2017a).

10 Specific Risk Calculation

The final piece of the risk model is the asset specific risk. This is the volatility of each asset's residual returns from the factor regression. Given the assumption of zero correlation between specific returns, we assume a diagonal specific returns covariance matrix. Thus, only n specific variances require calculation. In their simplest form, these are estimated directly from the specific returns, applying exponential weighting, with a half-life typically equal to that used for the factor variances (for consistency) and a rolling history of one to two years depending on the model:

    σ_i² = 1/(T−1) Σ_{t=1}^{T} w_t (u_{i,t} − ū_i)²

The story doesn't quite end there. There are two problems to be overcome:

• Outliers: whereas factor returns tend to be fairly well behaved, specific returns are by definition noisy, with many outliers, so a modicum of care must be taken when estimating historical variances. To prevent sporadic extreme returns from skewing the specific risk, Axioma models employ our MAD-based outlier detection to identify extrema. Rather than using a multi-dimensional or cross-sectional approach, we examine each asset individually, computing the MAD along the time dimension and winsorising all returns beyond a certain number of MADs.

• Deficient histories: IPOs and illiquid assets will not have a fully populated returns history, rendering their historical specific variances noisy and unreliable (and also outlier-prone). To remediate this, we first compute each asset's specific risk in the usual way (as above).
We then take the subset of "good" assets (call these σ_0²), i.e. those with sufficiently complete returns histories, and use these to compute a cross-sectional model of specific risk, viz.

    σ_0² = Z_0 v

where Z_0 is the subset of exposures corresponding to these assets. These exposures are to quantities that are always known for every asset, such as size, sector and region. Then, given the exposure matrix for all assets, Z, we compute an imputed specific risk s² for each asset via s² = Z v. Then, if T is the maximum number of observations required for specific risk, and t is the asset's actual number of non-missing observations, the asset's final specific risk is computed as the weighted sum

    σ̂² = (t σ² + (T − t) s²) / T.

Thus, the greater the number of observations, the greater the influence of the "true" historical specific risk.

11 Predicted versus Realised Betas

We define a predicted beta thus:

    β_P = (h^T V h_B) / (h_B^T V h_B),

where h is any portfolio, from a single asset to the entire universe, h_B is the benchmark against which we are computing the beta, and V is the asset covariance matrix given by a risk model.

We define a historical beta as the solution to a time-series regression:

    r_h(t) = β_H r_B(t) + ε(t),

where r_h(t) is the time series of returns to portfolio h and r_B(t) the time series of returns to benchmark B. And so the historical beta is given by

    β_H = Cov(r_h, r_B) / Var(r_B).

While, on paper, the two forms of beta are equivalent, the use of a structured risk model versus a time-series regression can yield very different results at the asset level. We discussed some of the issues involved previously, in section 6.1, so we will not repeat them here. At a portfolio level, however, the differences tend to average out, and the various flavours of beta generally agree over time.
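The equivalence of the two definitions is easy to verify: if V is taken to be the sample covariance of the same return history used in the time-series regression, the two betas coincide exactly. A minimal sketch (names ours):

```python
import numpy as np

def predicted_beta(h, hB, V):
    """Model beta: h' V hB / (hB' V hB)."""
    return (h @ V @ hB) / (hB @ V @ hB)

def historical_beta(rh, rB):
    """Time-series beta: Cov(rh, rB) / Var(rB)."""
    return np.cov(rh, rB, ddof=1)[0, 1] / np.var(rB, ddof=1)
```

In practice the risk-model V is structured (factor plus specific), not a raw sample covariance, which is exactly why the two betas diverge at the asset level.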
In Figure 43 we see the realised beta computed using two years of backwards-looking weekly returns for the FTSE Secondary Emerging portfolio benchmarked against the FTSE Emerging. In addition to the realised beta, we show the predicted (model) beta computed via the four flavours of the Global WW4 model. Of course, the numbers are not identical, but they track roughly the same level over time, with some betas more responsive than others.

Figure 43: Predicted (model) and realised betas for Secondary Emerging Markets Benchmark

12 Modeling of DRs and Cross-Listings

We here consider all cases of ADRs, GDRs, foreign and cross-listed assets, assets whose domicile can be said to be outside the country of quotation (e.g. Israeli assets quoted in the US, South African stocks quoted in London), and any similar cases not included in the above. Henceforth we refer to these collectively as DRs. Typically, alongside the market on which a DR is quoted, there can be said to be an underlying market, and in many, though not all, cases, a corresponding underlying asset.

12.1 Modeling DR Returns

We define the numeraire-return of any asset as

    r_N = r_X + c_{NX},    (14)

where

• N is the model numeraire currency,
• X is the asset's local currency, i.e. that in which it is quoted,
• r_X is the asset's local return,
• c_{NX} is the return in N for an investor trading in currency X.

This relationship is exact for log-returns, but is still a very close approximation for high-frequency (such as daily) returns. We make use of log-return relationships extensively from here on. If we define X to be the currency of an underlying asset, and Y to be the currency of quotation of its DR, then we model the DR's return in terms of the underlying currency, viz.

    r_Y = r_X + c_{YX}.    (15)

Thus, we consider the DR return to be composed of a return expressed in the underlying asset's currency, plus the return in currency Y of purchasing currency X.
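The claim that (14) is exact for log-returns can be checked with a couple of illustrative (hypothetical) numbers:

```python
import numpy as np

# Asset price in its local currency X over two days, and the X -> N
# exchange rate (units of numeraire N per unit of X); values hypothetical.
p = np.array([100.0, 103.0])        # local price
fx = np.array([1.25, 1.27])         # exchange rate

r_X = np.log(p[1] / p[0])           # local log-return
c_NX = np.log(fx[1] / fx[0])        # currency log-return for an N investor
r_N = np.log((p[1] * fx[1]) / (p[0] * fx[0]))   # numeraire log-return
# r_N = r_X + c_NX holds up to floating-point rounding
```

With simple (arithmetic) returns the identity only holds approximately, the error being the cross term r_X · c_NX, which is tiny for daily moves.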
If we ignore effects due to differing market hours and liquidity, this return, r_X, should be equal to that of the underlying asset. If it were not, there would be arbitrage opportunities available to the purchaser of either the DR or the underlying.

12.2 Currency Exposure

From the above we infer that a DR has exposure to two currencies: its local currency Y, and that of the underlying asset, X. If we write the DR return in terms of numeraire N, combining (14) and (15) we get

    r_N = r_Y + c_{NY}
        = r_X + c_{YX} + c_{NY}.    (16)

The DR would appear, therefore, to have exposure to two currency components. However, in order to fit this into the framework of a regional model, all currency returns must be expressed in terms of the numeraire. Using the properties of logarithmic returns, we know

    c_{YX} = c_{NX} + c_{YN} = c_{NX} − c_{NY}.    (17)

This says that for a holder of Y to buy currency X is equivalent to a holder of numeraire N doing the following:

1. Buying currency X,
2. Going short on Y.

Therefore, combining (16) and (17), the DR numeraire-return becomes

    r_N = r_X + c_{NX} − c_{NY} + c_{NY} = r_X + c_{NX}.    (18)

We see that from a numeraire perspective, the DR return is the same as the local return of the underlying asset. We therefore give DRs exposure to the underlying currency rather than the currency in which the DR is quoted.

12.3 Market Exposure

Since we have translated our DR return's currency to that of the underlying asset, it would seem logical that we also give the DR exposure to the underlying asset's market rather than the market on which the DR is quoted. Thus, an ADR on a British stock will now have exposure to GBP and the British market, rather than USD and the US market. To pick an example at random, Figure 44 shows the betas of the ADR of East Japan Railway relative to the Japanese (home) and US (trading) markets.
Figure 44: Market Betas of East Japan Railway ADR

While the beta to the US market is distinctly non-zero, it is much lower in magnitude than that to the Japan market. Thus, were we to assign the USA as this asset's market - i.e. its exposure in the risk model - we would likely introduce significant specification error.

We can observe the effect of introducing such specification error, and the harm it causes, by studying the behaviour of Royal Dutch Shell. This company has listings (A and B shares) on the UK and Dutch markets, leading to two possible ways of treating them:

1. Assign the UK listings to the UK market, and the Dutch listings to the Netherlands.
2. Assign all listings to one of the two markets.

When we look at the betas of the various listings, any of them could plausibly be exposed to either the UK or the Netherlands: the betas are significant for both markets. However, if we compute the tracking errors between the Dutch and UK listings using each of these approaches in a risk model, we get Figure 45.

Figure 45: Tracking Errors for UK vs. NL RDS listings

It is clear, when we compare with the realised, model-independent tracking error, that assigning different markets to the various listings is exactly the wrong thing to do. The specification error induced by their differing factor structure causes severe over-estimation of the tracking error (the Multiple Markets case). Effectively, the model structure is saying that the listings are more different than is, in fact, the case. Assigning all listings to just one market (whether the UK or the Netherlands) yields a tracking error of the correct order of magnitude (Single Market).

12.4 Issuer Specific Covariance

It is not hard to see that assets linked by a common issuer have rather more than model factors in common.
In addition to the characteristics modelled by their common factor structure, their specific returns exhibit higher degrees of correlation than do those of assets that are merely similar. No model could or should attempt to capture all of this behaviour via the factors alone, and so, for those assets that we deem to be linked, we relax the condition that an asset's specific returns should be uncorrelated with all others, and allow specific correlations between groups of linked assets. The specific covariance matrix is no longer diagonal; rather, we allow a scattering of non-zero off-diagonal elements corresponding to pairs of linked assets. We call these linkages Issuer Specific Covariance, or ISC. Therefore, in addition to an asset's specific risk, we need to model the specific correlations between pairs of linked assets.

It might be thought that all that is required to compute the correlation between the specific returns of assets A and B is to take each asset's history of specific returns and, assuming sufficient length and liquidity, compute the correlation between the two as

    correl_{AB} = cov_{AB} / (σ_A σ_B),

where cov_{AB} is the sample covariance between the two. But there is more to it than meets the eye. We shall explain.

Step back to total asset returns for now, and consider the relationship between the returns of two merely similar assets. Figure 46 shows the relationship between the returns of HSBC and Barclays - two large UK banks. We see that there is a great deal of commonality in the returns, and the overall correlation between them is 69%.

Figure 46: Scatter plot of HSBC vs. Barclays returns

Now we look at a corresponding plot for BP and its ADR (Figure 47). A similarly significant relationship exists between the two, which is unsurprising, given that they share the same issuer and are, essentially, the same entity.
The correlation between the two is 73%, so very similar to that between HSBC and Barclays.

Figure 47: Scatter plot of BP vs. ADR returns

So, at the level of raw daily asset returns, both pairs seem to exhibit similar depths of relationship. However, when we look at long-run behaviour, as evinced by their cumulative returns, things become somewhat different. Figure 48 shows the cumulative returns for HSBC and Barclays over a period of several years. It is clear that they are quite different, despite the high correlation between the two. Again, this is not surprising: it only takes a small daily difference for the long-term cumulative returns to diverge. Figure 49, on the other hand, shows the cumulative returns of BP and its ADR. They are, to all intents and purposes, identical. And yet the correlation between the two was little different from that between HSBC and Barclays. How can this be?

12.5 Cointegration

The reason for this more profound relationship between assets linked by issuer is found in the property of cointegration. While the price differences between the two may be large on a given day, due to noise, market conditions etc., over the long term the price of one will remain, on average, a fixed distance (perhaps zero, perhaps not) from the price of the other. Even when a price premium is present, there is still a tendency for the return to revert: premia tend to remain relatively constant over time, and so their effect on returns is negligible. When two assets behave in this way, they are said to be cointegrated.

Technically, two time-series, x_t and y_t, are said to be cointegrated if there exists a parameter β such that

    u_t = y_t - β x_t

is a stationary process. The time-series u_t is stationary if it has a constant mean, variance, and autocorrelation through time.
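This definition lends itself to a simple numerical check. The sketch below (synthetic data, numpy only; a plain Dickey-Fuller style regression standing in for the full ADF machinery discussed later) estimates β by least squares and tests whether the residual u_t mean-reverts:

```python
import numpy as np

def cointegration_sketch(x, y):
    """Estimate beta by OLS of y on x (no intercept), then compute a
    Dickey-Fuller style t-statistic for mean reversion of the residual
    u_t = y_t - beta * x_t.  Strongly negative values suggest u_t is
    stationary, i.e. that x and y are cointegrated."""
    beta = np.dot(x, y) / np.dot(x, x)        # OLS slope
    u = y - beta * x                          # candidate stationary residual
    du, lag = np.diff(u), u[:-1]              # regress du_t on u_{t-1}
    rho = np.dot(lag, du) / np.dot(lag, lag)
    resid = du - rho * lag
    se = np.sqrt(resid.var(ddof=1) / np.dot(lag, lag))
    return beta, rho / se                     # (beta estimate, DF t-statistic)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=2000))          # random-walk "log price"
y = x + rng.normal(scale=0.5, size=2000)      # cointegrated with x, beta = 1
beta, t_stat = cointegration_sketch(x, y)
# beta is recovered close to 1, and t_stat is strongly negative,
# consistent with a stationary residual.
```

A production test would use the augmented form (lagged differences) and proper critical values; the point here is only the structure: regress, take the residual, and test it for mean reversion.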
In plain terms, it is as if there is a gravitational attraction, or elastic band, between the two assets so that, despite the small-scale noise affecting their daily returns, this attraction prevents their prices, and hence their long-term returns, from drifting apart over time.

Figure 48: Cumulative Returns of HSBC and Barclays

Figure 49: Cumulative Returns of BP and its ADR

This is what differentiates the behaviour of linked assets from that of merely similar assets: small differences in returns do not blow up over time into large differences in prices - the differences remain roughly constant.

To test for cointegration, we consider the Engle-Granger (Engle and Granger, 1987) test on the log of prices. Consider the following time-series regression:

    log p_t = β log p^d_t + e_t,    (19)

where p_t and p^d_t are the prices of the underlying and DR, respectively, and e_t is an error term. The DR and underlying are cointegrated if e_t is stationary. The augmented Dickey-Fuller (ADF) test is frequently used to test the stationarity of e_t. For now, we assume that e_t is stationary and consider the implications for the relative returns of the DR and the underlying asset. Using (19), and substituting for log p_{t-1}, we get the following:

    log p_t - log p_{t-1} = β log p^d_t + e_t - log p_{t-1}    (20)
    ⇒ log(p_t / p_{t-1}) = β log p^d_t + e_t - β log p^d_{t-1} - e_{t-1}    (21)
                         = β log(p^d_t / p^d_{t-1}) + e_t - e_{t-1}.    (22)

If we apply (22) recursively, we get

    log(p_T / p_0) = β log(p^d_T / p^d_0) + e_T - e_0.    (23)

Since e_t is stationary, e_T - e_0 has a mean of zero and a variance that is roughly twice the variance of e_t and independent of T. Equation (23) says that the cumulative log return of the underlying over T periods equals β times the cumulative log return of the DR, plus the difference between the residuals at time T and time zero.
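A short simulation (entirely synthetic data, numpy, with β = 1 and i.i.d. noise standing in for a general stationary e_t) illustrates the consequence of (23): the variance of the cumulative-return difference is roughly 2σ² whatever the horizon:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.01                  # standard deviation of the stationary e_t
n_paths, T = 5000, 250

# Synthetic cointegrated pair with beta = 1: log p_t = log p^d_t + e_t,
# where e_t is i.i.d. noise (the simplest stationary process).
log_pd = np.cumsum(rng.normal(scale=0.02, size=(n_paths, T + 1)), axis=1)
e = rng.normal(scale=sigma, size=(n_paths, T + 1))
log_p = log_pd + e

# Variance of the difference in cumulative log return over several horizons.
horizon_var = {}
for horizon in (5, 50, 250):
    diff = (log_p[:, horizon] - log_p[:, 0]) - (log_pd[:, horizon] - log_pd[:, 0])
    horizon_var[horizon] = diff.var()

# Each entry is close to 2 * sigma**2, independent of the horizon length.
```

For an ordinary (non-cointegrated) pair, by contrast, the corresponding variance would grow roughly linearly in the horizon.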
If we make the additional simplification that β = 1 (in practice this is a reasonable assumption), then we see that the difference in cumulative log return between the underlying and the DR may be written as

    c_T - c^d_T = e_T - e_0.    (24)

Defining the standard deviation of e_t as σ, and assuming that the autocovariance of e_t is zero, this difference is distributed as N(0, 2σ²). The critical point is that the cumulative return difference over a period has a constant variance regardless of the length of the period.

Now let us consider the realised tracking error, σ_a, between the two assets. Let r_t = log(p_t / p_{t-1}) and r^d_t = log(p^d_t / p^d_{t-1}) be the logarithmic returns of the underlying asset and the DR, respectively. Then (22) says that

    r_t = r^d_t + ε_t,    (25)

where ε_t = e_t - e_{t-1}. The daily active variance is given by

    σ²_a = E[ε²_t] = (1/T) E[ Σ_{t=1}^{T} (e_t - e_{t-1})² ]    (26)
         = (1/T) E[ e²_0 + e²_T + 2 Σ_{t=1}^{T-1} e²_t ]    (27)
         = 2σ².    (28)

This is a daily variance figure; if we make the standard scaling assumption to convert it to a period variance, we get the result

    σ²_a = 2Tσ².    (29)

According to this scaled tracking error, the difference between the cumulative log returns at time T is N(0, 2Tσ²). However, according to equation (24), the difference between the cumulative log returns at time T is N(0, 2σ²). And here's the rub: the assumption by which we scale daily active variances to weekly, annual or any other period T (by multiplying by T) does not hold in the case of cointegrated assets. In plain terms, the overall magnitude of active returns, be they daily, weekly, monthly or any other frequency, will be the same: one should not scale them or their associated variances. For an arbitrary pair of assets, the magnitude of the differences between their weekly returns will indeed be greater than that of the daily returns, and the difference in monthly returns greater still.
For cointegrated assets (into which category our DRs and underlyings appear to fall), the theory states that this is not the case.

Figure 50: Tracking Errors for Barclays vs. HSBC

Figure 50 illustrates the case where scaling of variance is appropriate. We compute tracking errors for HSBC vs. Barclays using daily, weekly and monthly active returns. The raw tracking errors are shown as dotted lines, and we see that their levels are quite different, owing to their being computed from data of differing frequencies. We then scale them all to an annual level (by multiplying the variances by 250, 52 or 12). It may then be seen that the overall level of risk is roughly the same over time.

Figure 51: Tracking Errors for BP vs. ADR

We then perform the same operation on our cointegrated pair, BP and its ADR. Now we see that the raw, unscaled tracking errors are all of roughly the same magnitude: cointegration has ensured their independence of frequency. When we then scale in the usual way, the tracking errors all diverge. This is because we should not have scaled the numbers at all: the true tracking error is given by the raw tracking error at any frequency.

12.6 Some assets are more cointegrated than others

While our ADR versus underlying example exhibited fairly textbook cointegration behaviour, we cannot assume that every variety of linked asset will do the same. Linked assets fall mostly into the following categories:

- ADRs, GDRs and similar versus their underlying stocks.
- Varieties of common stock: A, B, C shares, Bearer and Registered etc.
- Common versus Preferred stock.
- Chinese A versus H shares.

There are other varieties, and combinations of the above are also common, but these are the most numerous. There is more than one test of cointegration, but we use the Augmented Dickey-Fuller test for our purposes.
We will skip details of the test, but for a given pair of assets it returns two important values: a measure of significance (usually negative) and a critical value. At significance values below this critical value, we reject the null hypothesis (that cointegration does not occur).

At an asset level, returns being what they are, we expect to be cursed with the usual vast numbers of false positives and negatives. Therefore, we perform our tests over all combinations of linked assets, and take an average of the significance statistic for each type of combination (e.g. DRs/Underlyings, Common/Preferred). We then use this to derive some rules of thumb as to when cointegration does and does not take place, in order to guide our choice of treatment for linked assets. We perform this test over repeated intervals to build up a picture of the evolution of relationships. We now look at our results for some of the largest and most important groupings.

Figure 52: Cointegration between DRs and their underlyings

Figure 52 shows the level of cointegration over time between DRs of various forms and their underlying stocks. We have divided the underlying assets into common and preferred. Note that we have rescaled the significance statistic so that a value of one or greater represents likely cointegration. We see that both common and preferred stock tend to show significant cointegration over time with their associated DRs.

Figure 53 shows the results for the relationship between common/common and common/preferred types. We see that types of common stock exhibit cointegration amongst themselves, but not with associated preferred stock. This seems intuitively reasonable, as a preferred stock is a quite different beast from a share of common stock.

Figure 54 shows the degree of relationship between Chinese A shares and their related H shares.
While we cannot say cointegration exists here, the relationship does seem to be deepening over time. This would correspond with the opening up of the Chinese domestic market to international investors. It may be that, a few years from now, the situation will have changed and these stocks will show clear evidence of cointegration.

Figure 53: Cointegration between Common and Preferred Stock

Figure 54: Cointegration between Chinese A and H Shares

Finally, Figure 55 shows the overall results of our studies for all asset types in our equity models. The solid lines represent evidence of cointegration between asset types, whilst the dotted lines show borderline cases. We note that some of these are based upon relatively small samples. However, we feel confident in stating some broad rules of thumb for the most important combinations.

Figure 55: Summary of Cointegration Relationships across all asset types

Cointegration does seem to be the norm in:

- DRs and their underlying assets, of whatever type.
- Various types of common stock.

It does not seem to be the norm for:

- Common versus preferred stock.
- Chinese A versus H shares (yet).

12.7 Treatment of Style Exposures

We now have most of the pieces we require for our modelling. We have dealt with country and currency exposure, and shown that in many cases the prices, and hence returns, of linked assets are bound by a cointegration relationship. It only remains to detail how we treat the other important set of factor exposures, the style factors, and then how we apply our rules to the various families of linked assets.

We may divide style exposures into two kinds: issuer-based and issue-based. Issuer-based styles are those constructed from company characteristics and include size, value, profitability, growth, leverage etc.
Issue-based styles use market data, such as returns, and typically include measures such as momentum, various betas, volatility and liquidity.

Since they represent company-wide characteristics, it makes sense that issuer-based exposures should be identical across all assets with a common issuer. Ideally, this should arise as a natural consequence of the factor definitions and the data. Where it does not, we can replace any outliers with a common value.

Treatment of issue-based exposures is a little more complex:

- If two assets' returns series are not cointegrated, then letting the data decide is the most reasonable option - i.e. we compute exposures individually for each asset using its own reported data.
- If two assets are cointegrated, then we want to make sure their factor exposures match exactly. Since, as we shall see, we attempt to model the cointegration behaviour in the specific return correlations, any differences in the common factor contribution are likely to destroy the cointegration in the specific returns.

Thus, when cointegration takes place amongst a set of assets, we match their style exposures by taking a weighted mean of the available exposures for each asset and assigning this mean to all. The weights we use are based upon size, liquidity, age of asset and quality of returns history. Such an approach is preferable to assigning a "master" asset from which the other assets take their exposures, as:

1. It is by no means clear how this master asset would be chosen. It is not always one particular asset type that is the largest and most liquid.
2. We would introduce a discontinuity: were the master asset to change (as will inevitably happen), we could see sudden jumps in exposures, owing to the new master having markedly different values.

For these reasons, a weighted average is likely to be smoother and less arbitrary.
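A sketch of this matching step (the listings, exposure values and weights below are purely illustrative; the production weights combine size, liquidity, asset age and returns-history quality as described above):

```python
# Hypothetical linked group: four listings of one issuer, each with its own
# issue-based exposure (e.g. Momentum) before matching.
exposures = [0.52, 0.48, 0.61, 0.55]   # per-listing raw exposures
weights = [0.40, 0.30, 0.20, 0.10]     # illustrative weights, summing to 1

# Weighted mean exposure, assigned to every listing in the group.
common = sum(w * e for w, e in zip(weights, exposures))
matched = [common] * len(exposures)
# All listings in the cointegrated group now carry an identical exposure,
# so the common-factor contribution to their returns matches exactly.
```

If a master asset were used instead, `matched` would simply copy one element of `exposures`, with the discontinuity risk noted in point 2.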
Figure 56: Style Exposures for a Set of Linked Assets

Figure 56 shows how the exposures look for a randomly-selected company with four listings. These consist of two Chinese A-shares, an H-share and an associated ADR. This group is split further into two subgroups: the two A-shares form one group, and the H-share and its ADR form the other. We see that all of the issuer-based exposures (Value, Profitability, Growth etc.) match for all listings. However, the issue-based exposures, such as Momentum and Volatility, match within, but not between, the subgroups.

And thus we are almost at the finish. We have determined our factor structure depending on whether or not cointegration is taking place between asset types. Where it does, we match the factor contribution exactly, so that the cointegration is passed through to the specific returns. We then use this relationship to derive our Issuer Specific Covariance (ISC).

12.8 The Cointegration Algorithm for Specific Returns

We now show how we use the theory of cointegration to model the relationships between linked assets. Because we have matched all common factor components, we guarantee, so long as two assets' total returns are cointegrated, that their specific returns will also be cointegrated. Differencing equation (19), as in (22), leads us to

    r_t = β r^d_t + ε_t,    (30)

where r_t = log(p_t / p_{t-1}), r^d_t = log(p^d_t / p^d_{t-1}), ε_t = e_t - e_{t-1}, and p_t and p^d_t are the prices of the underlying and DR (for example), respectively, with e_t stationary. If we assume that ε_t is uncorrelated with r^d_t, then we may write

    var(r) = β² var(r^d) + var(ε),    (31)

and

    cov(r, r^d) = β var(r^d),    (32)

and this leads us to our algorithm, viz. given two sets of returns, r_t and r^d_t, where we assume for the sake of argument that r^d_t belongs to the larger/more liquid of the two assets and var(r^d) has been computed directly:

1.
Using the daily specific returns, compute corresponding series of pseudo log prices, log p_t and log p^d_t,

2. Compute β and e_t from the time-series regression (19),

3. Compute var(r) from (31),

4. Compute the specific covariance between the two assets from (32).

12.9 The Final Bit

Finally, we summarise our approach to the various types of linked assets. Where cointegration is shown to exist, we:

- Clone factor exposures across all relevant assets.
- Compute their specific covariance using the cointegration approach.

Where it is unproven, we:

- Let the factor exposures fall as they may.
- Compute their specific correlations using the simple correlation of specific returns.

The exceptions we make to the above are any instances where linked assets are forced to have different country exposures. If this is so, then there is no purpose in applying the first approach; we apply the second. We also maintain the ability to override either approach for specific sets of assets.

Lastly, we illustrate our approach with a couple of examples. First we look at a case where cointegration does occur: the tracking error between DRs and their underlying assets.

Figure 57: Unscaled Tracking Error: DRs versus Underlyings

Figure 57 shows the realised tracking error between a portfolio of DRs benchmarked against a portfolio containing the corresponding underlying assets, one-for-one. All are equally weighted. Tracking errors are computed using daily, weekly and monthly returns, but are raw - no scaling (to an annualised number, for instance) has taken place. Note that, despite the fact that the numbers are not scaled, the overall magnitude is roughly the same across the three varieties of tracking error. When we do scale the numbers, we get the results shown in Figure 58. It is clear that the scaled numbers no longer match by any stretch of the imagination.
Thus scaling, as we noted before, is inappropriate when cointegration occurs. The correct tracking error, therefore, is any one of the unscaled numbers from Figure 57.

Figure 58: Scaled Tracking Error: DRs versus Underlyings

Comparing the unscaled active specific risk with the active specific risk computed by the model, we get Figure 59. We see that our cointegration-based specific covariance method gives an accurate measure of the risk. Were we to use mere correlation to compute the Issuer Specific Covariance, we would see much higher numbers.

Figure 59: Realised versus model active specific risk: DRs versus Underlyings

For our second example we compare a portfolio of common stocks benchmarked against a portfolio containing the corresponding preferred stock, again one-for-one. This time, when we look at the unscaled realised tracking errors (Figure 60), we see that each is of a completely different magnitude - an indication that these assets are not cointegrated, however highly correlated they may be. When we scale them (to annualised tracking errors), as shown in Figure 61, the numbers do match. Thus we conclude that cointegration does not seem to occur in this case, and the scaled numbers from Figure 61 are the more correct rendition of the truth.

Figure 60: Unscaled Tracking Error: Common versus Preferred

Figure 61: Scaled Tracking Error: Common versus Preferred

Figure 62: Realised versus model active specific risk: Common versus Preferred

Finally, in Figure 62 we compare the realised active specific risk with that given by the model. We see that in this case, where we did not apply the cointegration method to compute the Issuer Specific Covariance, the model gives an accurate picture.
Had we applied cointegration in this case, our model numbers would have looked more like those of Figure 59, i.e. typically less than one percent.

A Appendix: Constrained Regression

Our problem is as follows: calculate

    min ||Ax - b||_2  subject to the linear constraint  Bx = d,

where A ∈ R^{n×k} and B ∈ R^{p×k}. We assume the following:

- n ≥ k ≥ p,
- B has rank p,
- A has rank q, where p + q ≥ k.

We simultaneously decompose A and B via the generalized singular value decomposition, thus:

    A = U Σ X^{-1},    B = V Δ X^{-1},

and partition the factors as follows: U = (U_p, U_{k-p}, U_{n-k}), Σ = diag(Σ_p, Σ_{k-p}) (padded below with n - k zero rows), Δ = (Δ_p, 0), and X^{-1} is partitioned conformably into the row blocks X_p^{-1} and X_{k-p}^{-1}, so that

    A = U_p Σ_p X_p^{-1} + U_{k-p} Σ_{k-p} X_{k-p}^{-1},
    B = V Δ_p X_p^{-1}.

We assume without loss of generality that the singular values of A have been sorted so that all zero singular values lie within the block Σ_p. If we make the simple transformation of variables

    X^{-1} x = y = (y_p ; y_{k-p}),

where (u ; v) denotes vertical stacking, then we may rephrase the problem as

    min ||U_p Σ_p y_p + U_{k-p} Σ_{k-p} y_{k-p} - b||_2  such that  V Δ_p y_p = d.

There are two ways in which we may attempt to solve this.

A.1 Exact Solution

Note that the constraint is now uniquely solvable, viz.

    y_p = Δ_p^{-1} V^T d.

This may be substituted into the minimization, resulting in the unconstrained problem

    min ||U_{k-p} Σ_{k-p} y_{k-p} - b̂||_2,

where

    b̂ = b - U_p Σ_p Δ_p^{-1} V^T d.

This is easily shown to have the solution

    y_{k-p} = Σ_{k-p}^{-1} U_{k-p}^T b̂.

And so we arrive at the (almost) final solution

    y = ( Δ_p^{-1} V^T d ; Σ_{k-p}^{-1} U_{k-p}^T (b - U_p Σ_p Δ_p^{-1} V^T d) ).

We convert back to the original variable, x, by hitting the above on the left-hand side with the transformation X, but there is no need to do that right now. Note that if p + q = k, i.e. we have exactly the number of constraints needed to compensate for A's rank-deficiency, then the solution becomes

    y = ( Δ_p^{-1} V^T d ; Σ_{k-p}^{-1} U_{k-p}^T b ).

A.2 Dummy Asset Approach

Adding the constraints as dummy assets is tantamount to solving the following unconstrained problem:

    min || (A ; λB) x - (b ; λd) ||_2.

And the question we ask ourselves is: how closely does the solution to this problem approach the exact solution? Making the same transformation of variables as before, we are effectively solving

    min || (U_p Σ_p  U_{k-p} Σ_{k-p} ; λV Δ_p  0) (y_p ; y_{k-p}) - (b ; λd) ||_2.

Solving the normal equations in the usual manner, and working through the linear algebra, this yields

    y_p = (Σ_p² + λ² Δ_p²)^{-1} Σ_p U_p^T b + λ² (Σ_p² + λ² Δ_p²)^{-1} Δ_p V^T d,
    y_{k-p} = Σ_{k-p}^{-1} U_{k-p}^T b.

Note that

    (Σ_p² + λ² Δ_p²)^{-1} = (1/λ²) (Σ_p²/λ² + Δ_p²)^{-1},

and so, asymptotically, we have that

    y → ( Δ_p^{-1} V^T d ; Σ_{k-p}^{-1} U_{k-p}^T b )  as λ → ∞.

This is equal to the exact solution in the event that either p + q = k or d = 0.

A.3 Conclusion

If we have exactly the same number of constraints as the rank-deficiency of our exposure matrix, or all constraints are equal to zero, then for a suitably large dummy asset weight (the Lagrange multiplier), our solution will be consistent (in the case of ordinary least squares).

B DRs and Returns-Timing

One unwanted side effect of giving DRs exposure to the underlying market is to induce structure in the DR's specific return. To take an example: BP's ADR trades on the US market, which closes hours after London has closed. Therefore, the ADR contains a component of return due entirely to the US market. But, since we give the ADR exposure to the British market, this timing-difference component will tend not to be picked up anywhere in the common factor returns.

To see this, assume that we have converted the DR's return to the underlying currency. We define the underlying (i.e. home) market of DR i as i_m, and its trading market as i_q.
We model the return of the DR as

    r_i = g + f_{i_m} + Ψ_i + Δ_{i_m,i_q} + u_i,    (33)

where

- f_{i_m} is the factor return to market m,
- g is the global market return,
- Ψ_i is the remaining factor structure (styles, industries), of no relevance here,
- u_i is asset i's specific return,
- Δ_{i_m,i_q} is the difference in market return between markets m and q.

Unfortunately, this DR returns-timing component Δ_{i_m,i_q} is not taken account of in the model factor regression, since the DR has exposure only to its underlying home market. It pollutes the residual return, rendering it not truly asset-specific:

    û_i = u_i + Δ_{i_m,i_q}.    (34)

Here u_i is the true specific return, but û_i is what we are actually left with from the regression. To deal with this structured residual component, we write the returns-timing adjustment factor for market m relative to a reference market R (such as the USA) as Δ_{i_m,R}. We then note that, in the case of log returns, the following holds:

    Δ_{i_m,i_q} = Δ_{i_m,R} - Δ_{i_q,R}.    (35)

We recompute the DR's return as

    r̂_i = r_i - Δ_{i_m,R} + Δ_{i_q,R},

and we have, in effect, made the DR return truly local: the timing component is removed from the total return, the factor return is genuinely that pertaining to the home market, m, and the specific return should, by construction, contain no structure due to timing differences.

References

Axioma. Axioma macroeconomic risk model handbook. Technical report, 2013.

Axioma. Currency modeling in Axioma robust risk models. Technical report, 2017a.

Axioma. Missing data, extreme data. Technical report, 2017b.

Axioma. Testing risk models. Technical report, 2017c.

Axioma. Return-timing: A solution to market asynchronicity. Technical report, 2017d.

John Y. Campbell, Andrew W. Lo, and A. Craig MacKinlay. The Econometrics of Financial Markets. Princeton University Press, 1997.

Gregory Connor. The three types of factor models: A comparison of their explanatory power. Financial Analysts Journal, 51:42-46, 1995.

Gregory Connor and Robert A.
Korajczyk. Risk and return in an equilibrium APT. Journal of Financial Economics, 21:255-289, 1988.

Giorgio De Santis, Bob Litterman, Adrien Vesval, and Kurt Winkelmann. Covariance matrix estimation. In Modern Investment Management: An Equilibrium Approach, chapter 16, pages 224-248. John Wiley & Sons, Inc., 2003.

Robert F. Engle and Clive W.J. Granger. Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2):251-276, 1987.

John Fox. Web Appendix, An R and S-PLUS Companion to Applied Regression. Sage Publications, 2002. URL http://socserv.mcmaster.ca/jfox/Books/Companion/appendix.html.

William H. Greene. Econometric Analysis. Prentice Hall, fifth edition, 2003.

Richard C. Grinold and Ronald N. Kahn. Active Portfolio Management, pages 37-66. McGraw-Hill, 1995.

Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall, fourth edition, 1998.

Whitney K. Newey and Kenneth D. West. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703-708, 1987.

Stephen A. Ross. The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13:341-360, 1976.

Peter Zangari. Equity factor risk models. In Modern Investment Management: An Equilibrium Approach, chapter 20, pages 334-395. John Wiley & Sons, Inc., 2003.

Axioma Robust is a trademark of Axioma, Inc.

Americas: 212-991-4500 | EMEA: +44 (0)20 3621 8241 | Asia Pacific: +852-8203-2790

New York: Axioma, Inc., 17 State Street, Suite 2700, New York, NY 10004, United States. Phone: 212-991-4500
Atlanta: Axioma, Inc., 400 Northridge Road, Suite 550, Atlanta, GA 30350, United States. Phone: 678-672-5400
Chicago: Axioma, Inc., 2 N LaSalle Street, Suite 1400, Chicago, IL 60602, United States. Phone: 312-448-3219
San Francisco: Axioma, Inc., 201 Mission Street, Suite #2150, San Francisco, CA 94105, United States. Phone: 415-614-4170
London: Axioma (UK) Ltd., St. Clements House, 27-28 Clements Lane, London EC4N 7AE, United Kingdom. Phone: +44 (0)20 3621 8241
Frankfurt: Axioma Deutschland GmbH, Mainzer Landstrasse 41, D-60329 Frankfurt am Main, Germany. Phone: +49-(0)69-5660-8997
Paris: Axioma (FR), 19 Boulevard Malesherbes, 75008 Paris, France. Phone: +33 (0)1-55-27-38-38
Geneva: Axioma (CH), Rue du Rhone 69, 2nd Floor, 1207 Geneva, Switzerland. Phone: +41 22 700 83 00
Hong Kong: Axioma (HK) Ltd., Unit B, 17/F, 30 Queen's Road Central, Hong Kong, China. Phone: +852-8203-2790
Tokyo: Axioma, Inc. Japan, Tekko Building 4F, 1-8-2 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan. Phone: +81-3-6870-7766
Singapore: Axioma (Asia) Pte Ltd., 30 Raffles Place, #23-00 Chevron House, Singapore 048622. Phone: +65 6233 6835
Melbourne: Axioma (AU) Ltd., 31st Floor, 120 Collins Street, Melbourne, VIC 3000, Australia. Phone: +61 (0)3-9225-5296

Sales: sales@axioma.com | Client Support: support@axioma.com | Careers: careers@axioma.com