From: AAAI Technical Report WS-97-07. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Risk Management In The Financial Services Industry: Through A Statistical Lens

Til Schuermann
Oliver, Wyman & Company
666 Fifth Ave., New York, NY 10103
(212) 541-8100
tschuermann@owc.com

Abstract

The problem of fraud detection and risk management is first and foremost a statistical one where, in the face of overwhelming amounts of data, the investigator is well advised to impose structure. After defining three types of risk commonly encountered in the financial services industry, a generic modeling framework is presented in terms of conditional expectations. We can then impose structure on the functional form of the conditioning relationship, the stochastic distribution, as well as variable (or feature) selection. As an example we present what is rapidly becoming an industry standard for measuring and modeling market risk, characterizing the possibility of losses resulting from unfavorable market movements. Finally we tackle the difficult issue of going from risk measurement to risk management.

Introduction

While this is a conference on AI approaches to fraud detection and risk management, I will argue that the problem is first and foremost a statistical one, where areas within AI such as machine learning and knowledge representation have useful tools to offer. Certainly within the context of finance, but also within telephony, the task of detecting fraud specifically and managing risk more broadly is highly data driven and suggests a stochastic or probabilistic approach. Often the investigator is overwhelmed by the sheer amount of data and by the difficulty of placing meaningful structure on the data and on the detection and management problem itself. I will argue that structure is not only necessary to have a glimmer of hope of solving the problem but also good in that it provides useful analytical and forecasting tools. Section II provides a general definition of risk and describes the three types of risk we typically encounter in the financial services industry. Section III addresses the issue of how much structure to impose on the problem and what that structure might be. Section IV describes the challenges of translating risk measurement into risk management. Section V concludes.

Risk defined

Risk is a statistical concept

Simply defined, risk is the potential for deviation from expected results. A risky future cash flow, earnings result, or change in value is characterized by a probability distribution of potential results. The relative magnitude of risk is defined by the degree of dispersion in this distribution, an inherently statistical question. The dispersion can be measured in different ways, the most obvious of which is squared deviation, i.e. the variance or second moment. However, we may want to consider higher moments such as the third, skewness, measuring asymmetry, or the fourth, kurtosis, measuring the fatness of tails, or the relative importance of extreme events. Since risk is after all a description of the relative frequency of extreme events, modeling the kurtosis of a distribution seems appealing, especially since we know that for many traded financial instruments the return distribution is fat tailed relative to the Gaussian. However, it is well known that higher order moments are very difficult to measure accurately, largely because of sample size and their sensitivity, by construction, to outliers. Another summary description of the return distribution is a quantile, a useful concept we will revisit later.
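To make these summary measures concrete, the following minimal sketch (Python is used for illustration throughout) computes the standard deviation, skewness, kurtosis, and a lower-tail quantile of a daily return series; the Student-t draw is only a simulated stand-in for a fat-tailed instrument, not real market data.

```python
import numpy as np

# Simulated stand-in for roughly four years of daily returns on one instrument;
# a Student-t with 4 degrees of freedom is fat tailed relative to the Gaussian.
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_t(df=4, size=1000)

mu = returns.mean()
sigma = returns.std(ddof=1)          # dispersion: the second moment
z = (returns - mu) / sigma
skewness = np.mean(z ** 3)           # third moment: asymmetry
kurtosis = np.mean(z ** 4)           # fourth moment: fatness of tails (3 for a Gaussian)
q01 = np.percentile(returns, 1)      # a quantile: the 1% worst daily return

print(f"sigma={sigma:.4f}  skew={skewness:.2f}  kurt={kurtosis:.2f}  1% quantile={q01:.4f}")
```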
The focus throughout is on a statistical description of the data generating process (DGP), be it for the return of a financial instrument, the default probability of a loan or a credit card, or the order flow from a trader who may or may not be under suspicion of conducting illegal activities. It is useful therefore to consider a framework for "modeling" so that we may understand better how and where to impose some structure on the problem.

Three types of risk

In the financial services industry we typically divide risk into three categories.

Market Risk: Market risk is defined as the possibility of losses resulting from unfavorable market movements. Since such losses occur when an adverse price movement causes a decrease in the mark-to-market value of a position, market risk can also be referred to as "position risk". Market risk can be calculated at the level of a transaction, a business unit, or the bank as a whole, and can come from many different sources, including:
• Interest rate fluctuations that affect bonds and other fixed-income assets
• Foreign exchange fluctuations that affect future unhedged cash flows
• Changes in the volatility of interest rates that affect the values of options or other derivatives

Credit Risk: Credit risk, sometimes also called counterparty risk, describes the potential for loss due to non-payment by the counterparty in a financial transaction. Typical examples are defaults on:
• A business or other loan
• A credit card
• A foreign exchange swap

Operating Risk: Operating risk covers event risks from operating one's business and is often described as the residual sources of risk after market and credit risks have been taken into account. Examples include:
• System failures
• Errors and omissions
• Fraud
• Litigation
• Settlement failure

Structure: Too Much Or Too Little?

Within the financial services industry we have the apparent luxury of working with a lot of data; does this necessarily mean having a lot of information? Broadly speaking, any analysis or modeling of a problem with data must consider separating signal from noise, and this is most easily done by imposing some structure on the problem.

A Framework For Modeling

Following Mallows and Pregibon (1996), we can generally write down a model of the problem as

model = data + error

where "+" should be interpreted very broadly. More specifically, suppose we are interested in predicting the outcome of a bad event such as the default on a credit card or a business loan. Another example might be the modeling of the return on a portfolio of financial instruments. Specifically, we are interested in generating a model for a conditional expectation or a conditional probability. Let y be the random variable denoting the outcome. If the random variable is something like a loan default, then y is binary {default, no default}. If the outcome is a return, then clearly y can assume any value on the real line. Generically we write the problem as

E(y | data; parameters) = f(data; parameters; error)
E(y | x; θ) = f(x; θ; ε)

where x is the data, often in the form of vectors of characteristics (for instance, credit scores, income, geography), θ is a parameter vector, and ε is a generic error term or residual. While the goal is almost always to generate accurate and robust predictions of y, in order to get there we have to estimate θ and possibly also f(·) itself. A trivial but oft used and successful example is

y = β₀ + β₁x₁ + ... + βₖxₖ + ε

where the parameters we need to estimate are θ = (β₀, β₁, ..., βₖ, σ²), f(·) is assumed to be linear in the parameters and the error term, and σ² is the variance of the error term. For example, y could be the return on assets for a firm and the x's macro-economic factors, like the change in gross domestic product (GDP) or changes in the interest rate, which are posited to affect y.
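A minimal sketch of estimating this linear specification by ordinary least squares follows; the return-on-assets and macro-factor series, and the coefficients used to generate them, are entirely hypothetical.

```python
import numpy as np

# Hypothetical quarterly data: y = return on assets, x1 = GDP growth, x2 = change in interest rate.
rng = np.random.default_rng(1)
n = 120
gdp_growth = rng.normal(0.005, 0.010, n)
rate_change = rng.normal(0.000, 0.002, n)
y = 0.02 + 1.5 * gdp_growth - 3.0 * rate_change + rng.normal(0, 0.01, n)

# Estimate theta = (beta_0, ..., beta_k, sigma^2) under the assumption that f(.) is linear.
X = np.column_stack([np.ones(n), gdp_growth, rate_change])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
sigma2_hat = residuals.var(ddof=X.shape[1])      # estimate of the error variance

print("beta_hat =", np.round(beta_hat, 3), " sigma2_hat =", round(float(sigma2_hat), 6))
```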
Already this simple but very real example has demonstrated that the investigator needs to make several decisions:

1. What is the functional form of f(·)?
2. What is the distribution of ε?
3. What is the distribution of the data x and, moreover, what is the joint distribution of x and ε?
4. What does y look like, and how will it affect f(·) and the estimation method used to find θ? This method will be quite different if y is binary, in which case we are dealing essentially with a classification problem, than if y is continuous.
5. How do I choose my candidate x's, and how do I then choose the right set of final x's? This is ostensibly a question about knowledge representation and variable selection. This leads us to
6. What kinds of data transformations of x are appropriate? This is also very closely related to (1).
7. What is the conditional distribution of x given y?

In other words, there are many junctures where an investigator needs to impose some sort of structure on the problem, where most of the structure comes in the form of knowledge representation by considering functional form, variable definition and selection, and finally various distributional assumptions.

Large datasets: are they necessarily a good thing?

Suppose we have n observations of our data and we have k x's. Increasingly we are dealing with very large data sets, which is to say large n and large k. For small k and well behaved x and ε, one can make the form of f(·) very flexible. Moreover, one can impose fewer assumptions about the distribution of ε. An example of the former is a neural network, where f(·) is possibly highly nonlinear in both x and θ. An example of the latter is non-parametric discriminant analysis via kernel density or nearest neighbor estimators, where we make no strict assumptions about either the distribution of ε or f(x | y). An example of both a highly flexible functional form and a distribution-free approach is a Bayesian network. But as k increases, and as we wish to relax the structure on f(·) and ε, the task of estimating the model becomes very complex indeed. A large part of this estimation process involves choosing the relevant explanatory factors or variables. This can be phrased as either "What should I keep?" or "What can I afford to throw away?" In fact, the difficulties in estimating the model with large datasets are less about large n and more about large k, although even this is changing. Moreover, the conditioning dataset x is often more complex, with non-continuous or, worse yet, non-numeric data. Thus while it may often appear that one has an abundance of data and with it a very large information set, gleaning behavioral rules and models becomes harder. If there is one thing which distinguishes machine learning from an applied statistics field like econometrics or biometrics, it is that the former takes an atheoretic approach to estimating the model. There is no behavioral theory to which we can appeal in order to help reduce the search over functional forms for f(·), useful elements of the data structure x, and reasonable distributions for ε. Even with theory, the "problem reduction" is often not large enough. In these instances, machine learning can be a powerful device (see Elder and Pregibon 1996).
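To make the nearest-neighbor idea above concrete, the sketch below estimates a default probability with no parametric assumptions on f(·) or ε; the two features (a credit score and a utilization rate) and the data generating process are invented for illustration.

```python
import numpy as np

# A minimal k-nearest-neighbor classifier: a distribution-free way to estimate
# Pr(default | x) without assuming a parametric form for f(.) or for the errors.
rng = np.random.default_rng(2)
n = 500
score = rng.normal(650, 80, n)                      # hypothetical credit scores
utilization = rng.uniform(0, 1, n)                  # hypothetical credit-line utilization
p_true = 1 / (1 + np.exp(0.02 * (score - 600) - 2 * (utilization - 0.5)))
default = (rng.uniform(size=n) < p_true).astype(int)

# Standardize features so Euclidean distance treats them on a comparable scale.
X = np.column_stack([(score - score.mean()) / score.std(),
                     (utilization - utilization.mean()) / utilization.std()])

def knn_default_prob(x_new, X, y, k=25):
    """Fraction of the k nearest neighbors of x_new that defaulted."""
    dist = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(dist)[:k]
    return y[nearest].mean()

# Estimated default probability for an applicant with a 580 score and 90% utilization.
x_new = np.array([(580 - score.mean()) / score.std(),
                  (0.90 - utilization.mean()) / utilization.std()])
print("estimated default probability:", round(knn_default_prob(x_new, X, default), 2))
```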
Value At Risk

To take a very specific and highly relevant example, a fundamental problem in finance is the characterization of market risk, or the modeling of the distribution of market instruments. This is a problem faced by every major commercial and investment bank in the world, each of which often has 1,000 or more instruments in its portfolio at any one time, including equities like stocks, commercial paper, government bonds (both foreign and domestic), foreign exchange, and derivative products such as options based on any one or combination of the former. All of these are in x, but presumably there is some lower dimensional representation which can still capture all the relevant characteristics of x. For example, the risk-free interest rates for different maturities, as defined by US dollar government bonds, are highly correlated. In fact, if one considers ten typical maturities (2, 3, 4, 5, 7, 9, 10, 15, 20, and 30 year bonds), a two-dimensional factor analytic representation accounts for 98% of the covariance. Since the variance-covariance structure is a core element in modeling risk, the information loss due to dimension reduction will be minimal.

Next we can appeal to finance theory to find a mapping from the price distribution to shareholder utility (and disutility for risk). We conventionally assume that asset prices are log-normal and therefore that asset returns are Gaussian (this assumption is part of the well-known Black-Scholes option pricing model). We can relax this assumption somewhat by assuming that asset returns have a joint conditional Gaussian distribution, as opposed to the stricter assumption of unconditional normality. What does this buy us? Since covariation or stochastic dependence among the x's is critical in characterizing portfolio profit and loss, by assuming conditional Gaussianity we can model this dependence by simply measuring the pairwise correlation between any two assets: cov(x_i, x_j) does not depend on x_l for any l ≠ i, j. We need to measure only two moments: the mean vector of the relevant risk factors, which on a daily basis turns out to be very close to zero, and the variance/covariance matrix. Then the profit and loss distribution for the whole portfolio can be characterized by a quadratic form in the positions held in each risk factor (e.g., are we long Yen, are we short Italian government bonds, and how large is our stake in IBM stock) and the covariance matrix. This is the workhorse of market risk finance: value at risk (VaR); for a thorough and elegant survey, see Jorion (1996). VaR is defined as a limit on the loss of returns, the tail or quantile of the return distribution. Specifically, the probability that we see any return x_t less than VaR is equal to α%:

Pr[x_t ≤ VaR] = α%

If we rewrite this in terms of the density of returns over time, f(x_t), it is:

α% = ∫_{-∞}^{VaR} f(x_t) dx_t

In the case of x_t being normally distributed, for α = 1%, VaR = 2.33·σ_t. This is a powerful simplification, since the tail area is simply a scaling up of the second moment of the return distribution. To compute VaR we need an inventory and a price list of the current portfolio holdings, or at least of the underlying risk factors, as well as a description of their volatility and interdependence over time. For instance, it is equally important to know whether a risk factor such as the Mexican Peso / U.S. dollar exchange rate is highly volatile as it is to know whether it covaries with other risk factors or is independent of them. By combining the inventory and price list information with the covariance structure, we can estimate a daily value at risk. Only so long as loss due to severe market fluctuations can be measured by these two elements can we fully characterize risk.
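A minimal sketch of this variance-covariance calculation under the conditional Gaussian assumption follows; the positions, volatilities, and correlations are invented, and 2.33 is the 1% Gaussian tail multiplier from above.

```python
import numpy as np

# Variance-covariance ("delta-normal") VaR for a toy three-factor portfolio.
positions = np.array([5e6, -2e6, 3e6])    # long/short exposures to three risk factors, in USD

vols = np.array([0.012, 0.009, 0.015])    # daily return volatilities of the factors
corr = np.array([[1.0,  0.3,  0.1],
                 [0.3,  1.0, -0.2],
                 [0.1, -0.2,  1.0]])
cov = np.outer(vols, vols) * corr         # variance-covariance matrix of daily factor returns

# Portfolio P&L standard deviation is the quadratic form in positions and covariance,
# assuming (conditionally) Gaussian returns with roughly zero daily mean.
portfolio_sigma = np.sqrt(positions @ cov @ positions)
var_99 = 2.33 * portfolio_sigma           # 99% one-day VaR: the 1% tail of a Gaussian

print(f"one-day portfolio sigma = ${portfolio_sigma:,.0f},  99% VaR = ${var_99:,.0f}")
```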
There are two important assumptions on which the statistical structure of VaR rests:

1. Asset or security returns are linear functions of the underlying risk factors.
2. These returns follow a conditional normal distribution and are linearly correlated.

We know that both of these assumptions are often violated when one holds derivative products like options in one's portfolio. However, it is remarkable how well an otherwise simple system like VaR can still approximate one's risk profile, particularly for large, well diversified portfolios. For example, Nick Leeson's portfolio shortly before his downfall was dominated by positions in Japanese government bond futures and Nikkei futures. Not only is this portfolio highly concentrated (there are only two instruments), but it contains derivative products. Nevertheless, basic VaR would have revealed that Leeson had almost $900MM at risk! His final loss amounted to approximately $1.3B, which, given the unusually large drop in the Nikkei following the Kobe earthquake, the highly concentrated nature of the portfolio, and its high proportion of derivatives, is still remarkably close and would have been very useful to senior management. Put differently, Gaussian linear VaR would have been able to capture 70% of the potential losses. (To be clear, while we are using this example to illustrate VaR as a market risk measurement tool, the Leeson case is really an example of operating risk.)

The second assumption is key for allowing us to map the return structure of a portfolio to our subjective notions of risk. Specifically, all we need to measure and model this return structure is the variance of and the correlation between assets or instruments; we need measure nothing else. We should then be able to predict well the likelihood, if not the exact timing, of severe market fluctuations and their impact on loss. It turns out that it is easier to accommodate violations of the first assumption than of the second. Fixes run the gamut from approximations via a Taylor series expansion to full valuation using extensive Monte Carlo simulation. Stress testing, while clumsy, provides a relatively easy sanity check which allows one to relax both assumptions.

The framework has allowed us to reduce the dimensionality of the data matrix x, to characterize the functional form of f(·), and to restrict the distribution of ε. While this reduces the conceptual and computational complexity of the problem greatly, the growing presence of non-linear options (i.e. f(·) becoming nonlinear in θ and x) makes the task still very computationally intensive. For example, Citibank conducts a full valuation of its portfolio once a week, a process which occupies a Cray supercomputer for a full weekend.
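A minimal sketch of the full-valuation approach for a portfolio that violates the linearity assumption follows; the single stock, the short call position (repriced here with the standard Black-Scholes formula), and all parameter values are hypothetical, and the option's one-day time decay is ignored for simplicity.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call, used only for revaluation."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Portfolio: 10,000 shares plus 100 short call contracts (contract size 100) on the same stock.
S0, K, T, r, sigma = 100.0, 105.0, 0.25, 0.05, 0.20
shares, short_calls = 10_000, 100 * 100

def portfolio_value(S):
    return shares * S - short_calls * bs_call(S, K, T, r, sigma)

# Full revaluation under 50,000 simulated one-day moves of the underlying.
rng = np.random.default_rng(3)
S_scenarios = S0 * np.exp(rng.normal(0.0, sigma / np.sqrt(252), size=50_000))
pnl = portfolio_value(S_scenarios) - portfolio_value(S0)

var_99 = -np.percentile(pnl, 1)    # loss exceeded on roughly 1% of days
print(f"99% one-day Monte Carlo VaR ~ ${var_99:,.0f}")
```

Replacing bs_call with a first-order (delta) approximation would recover the Taylor-series shortcut mentioned above.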
From risk measurement to risk management

The discussion thus far has focused exclusively on risk measurement, yet it is highly unlikely that we will ever discover the true data generating processes. At best, all models will be approximations. Even if we could discover the DGP, because the problem is inherently stochastic, we can never control for extreme events with certainty. Value at risk, for instance, is the threshold which defines the α% tail area of the return distribution. We are most interested in the extreme tails describing the extreme(ly damaging) events, and it is precisely those tail probabilities which are the hardest to measure. Simple procedures like stop-losses are therefore critical to actually managing risk. We should stress that neither VaR nor any other more or less sophisticated risk measurement technique will safeguard against fraudulent or otherwise rogue traders. Many of the most significant "trading" losses in recent times would be more accurately considered fraud or other forms of operating risk. Strong audit and process control functions cannot be replaced by a risk measurement system.

Properties of a useful market risk measure

Generalizing across measurement-based trading management functions, we make the following observations:

Economic Rationality -- Traders and other agents will tend to "game" the system: if incentives or controls are based on a measure of return per unit of assessed risk, and their return is actually proportional to "true" risk, they will have the incentive to maximize "true" risk per unit of assessed risk. It is therefore important that risk measurement be as internally consistent and realistic as possible. Otherwise, traders may arbitrage the system, taking positions which seem much less risky than they are, or taking positions which are optimal with respect to the risk-adjusted performance measurement system but not optimal for the shareholders.

Common Currency -- Most risk measurement-based management systems rely on the measure to reflect "apples-to-apples" comparisons of risk. This consistency is required at several different levels, depending on the application:
• For monitoring and control, relative risk should be measured consistently over time and between decisions on a single trading desk or within a single type of instrument
• For limit setting and planning in the trading unit, relative risk should be measured consistently across different trading desks or different types of instruments
• For corporate strategic planning, risk should be measured on a basis consistent with measures of risk in other activities elsewhere in the financial institution

Appropriate Horizon -- The risk measure must be calculable or translatable across a variety of relevant time horizons, ranging from one day or less for effective trading management to one year or more for strategic planning and structural hedging.

Actionable -- The risk measure should be accompanied by information as to which positions or desks are responsible for the aggregate risk, and what hedges might be taken to reduce the risk. The risk measure should aggregate and disaggregate easily so that risk management can be applied at a range of levels within the organization, and so that business unit managers, desk heads, and individual traders are all aligned to the same incentives and controls.

Statistically Verifiable -- In order to generate the credibility required of an effective management tool, the risk measure should be validated by comparing estimates of risk against actual realizations of P&L (i.e. "backtesting"); a minimal sketch of such a check follows below.
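A minimal sketch of such a backtest: assuming we already have one-day-ahead 99% VaR estimates and the realized P&L for the same trading days, we simply count exceptions. Both series are simulated here; in practice they would come from the risk system and the P&L feed.

```python
import numpy as np

# Backtesting sketch: count how often realized daily P&L breaches the 99% VaR estimate.
rng = np.random.default_rng(4)
days = 250
sigma_est = np.full(days, 1.0e6)               # one-day P&L sigma estimated each morning (USD)
var_99 = 2.33 * sigma_est                      # corresponding 99% one-day VaR estimates
realized_pnl = rng.normal(0.0, 1.1e6, days)    # realized P&L, deliberately a bit more volatile

exceptions = int(np.sum(realized_pnl < -var_99))
print(f"exceptions: {exceptions} observed vs {0.01 * days:.1f} expected over {days} days")
```

Far more exceptions than expected would suggest the risk measure understates the tails; far fewer would suggest it is overly conservative.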
This efficiency is achieved at the cost of a number of unfortunate simplifying assumptions, but these are well worth it in order to process such a large set of detailed information into a useful single number which measures risk. One must wonder, however, whether it is really necessary to carry this detailed information all the way to the head of risk management, the head of capital markets, the CFO, and other very senior managers. In other words, the VaR system is currently answering questions like: "What is the impact on our aggregate risk of a short Thai Baht position hedged by a long Indonesian Rupiah position?" To suppose that this question as posed is relevant implies that very senior management might choose to hedge such a position or have it closed out. The more relevant questions for senior management are of the following nature: "What is the impact of the Asia FX desk on the aggregate risk?", "In particular, what is their impact on the net short dollar position?", and "How much of their risk is unrelated to global risk factors (such as the net short dollar position), and what does the distribution of that risk look like?" These types of macro questions can be answered with information that is much more summarized than the detail of each position or a large "position vector."

To be an effective risk management tool and management information system, a risk management system only needs to pass detailed information up to the level at which it might be acted upon. Such detail should include the sensitivity to factors against which a hedge might be taken, and the magnitude and distribution of risk outside of these factors. This allows tremendous flexibility in the techniques used to measure risk at the desk level. For derivatives desks, the focus of measurement might be the price-to-factor sensitivity, which may be highly nonlinear. In emerging markets, the focus might be on the distribution of large movements in prices and on modeling regional correlations differently for large movements than for small movements. Conceivably, the covariance matrix approach might still be appropriate where the assumptions do not fail badly or for smaller desks with insignificant risks. This approach aligns the type of risk measurement information with the type of risk management action to be taken.

Conclusion

The bulk of the discussion in this paper has focused on structured approaches to measuring and modeling market risk in the financial services industry. I have argued that in general it is useful to impose structure on these types of data analytic problems, and that this structure comes in three forms:

1. The set of conditioning variables under consideration
2. The functional form of the conditioning relationship
3. The parametric form of the distribution of the error structure

Within the financial services industry we further appeal to economic or financial theory to impose what seem like reasonable restrictions on the problem. For example, in market risk we presume that prices of financial instruments follow a log-normal distribution. In credit risk we often presume that the log-odds of a default probability are linear in the conditioning set of variables; this is known as a logistic relationship. Risk measurement and management is a hard problem; with some structure we can hope to approximate a solution.

References

Elder, J., and D. Pregibon. 1996. A Statistical Perspective on Knowledge Discovery in Databases. Ch. 4 in U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., Advances in Knowledge Discovery and Data Mining. Cambridge, Mass.: MIT Press.

Jorion, P. 1996. Value at Risk. Chicago, Ill.: Irwin.

Mallows, C. L., and D. Pregibon. 1996. Modeling with Massive Databases. Mimeo, AT&T Research.