
From: AAAI Technical Report WS-97-07. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Risk Management In The Financial Services Industry: Through A Statistical Lens
Til Schuermann
Oliver, Wyman & Company
666 Fifth Ave.
New York, NY 10103
(212) 541-8100
tschuermann@owc.com
Abstract
The problem of fraud detection and risk management is first and foremost a statistical one where, in the face of overwhelming amounts of data, the investigator is well advised to impose structure. After defining three types of risk commonly encountered in the financial services industry, a generic modeling framework is presented in terms of conditional expectations. We can then impose structure on the functional form of the conditioning relationship, the stochastic distribution, as well as variable (or feature) selection. As an example we present what is rapidly becoming an industry standard for measuring and modeling market risk, characterizing the possibility of losses resulting from unfavorable market movements. Finally we tackle the difficult issue of going from risk measurement to risk management.
Introduction
While this is a conference on AI approaches to fraud detection and risk management, I will argue that the problem is first and foremost a statistical one where areas within AI such as machine learning and knowledge representation have useful tools to offer. Certainly within the context of finance, but also within telephony, the task of detecting fraud specifically and managing risk more broadly is highly data driven and suggests a stochastic or probabilistic approach. Often the investigator is overwhelmed by the sheer amount of data and the difficulty of placing meaningful structure on the data and on the detection and management problem itself. I will argue that structure is not only necessary to have a glimmer of hope of solving the problem but also good in that it provides useful analytical and forecasting tools.
Section II provides a general definition of risk and describes the three types of risk we typically encounter in the financial services industry. Section III addresses the issue of how much structure to impose on the problem and what that structure might be. Section IV describes the challenges of translating risk measurement into risk management. Section V concludes.
Risk defined
Risk is a statistical concept
Simply defined, risk is the potential for deviation from expected results. A risky future cash flow, earnings result or change in value is characterized by a probability distribution of potential results. The relative magnitude of risk is defined by the degree of dispersion in this distribution, an inherently statistical question. The dispersion can be measured in different ways, the most obvious of which is squared deviation, i.e. the variance or the second moment. However, we may want to consider higher moments such as the third, skewness, measuring asymmetry, or the fourth, kurtosis, measuring the fatness of tails, or the relative importance of extreme events. Since risk is after all a description of the relative frequency of extreme events, modeling the kurtosis of a distribution seems appealing, especially since we know that for many traded financial instruments, the return distribution is fat tailed relative to the Gaussian. However, it is well known that higher order moments are very difficult to measure accurately. This is a function largely of sample size and their sensitivity, by construction, to outliers. Another summary description of the return distribution is a quantile, a useful concept we will revisit later.
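As a concrete illustration of these summary measures, the following minimal sketch (Python with numpy and scipy; the return series here is simulated, standing in for observed returns) computes the variance, skewness, excess kurtosis, and a 1% lower-tail quantile:

import numpy as np
from scipy import stats

# Simulated daily returns; in practice these would be observed returns or P&L.
rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=2500) * 0.01  # fat-tailed relative to the Gaussian

variance = np.var(returns, ddof=1)       # second moment: dispersion
skewness = stats.skew(returns)           # third moment: asymmetry
kurtosis = stats.kurtosis(returns)       # excess kurtosis: fatness of tails (0 for a Gaussian)
q01 = np.quantile(returns, 0.01)         # 1% quantile: a summary of the lower tail

print(f"variance {variance:.6f}, skewness {skewness:.2f}, "
      f"excess kurtosis {kurtosis:.2f}, 1% quantile {q01:.4f}")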
The focus is around a statistical description of the data generating process (DGP), be it for the return of a financial instrument, the default probability of a loan or a credit card, or the order flow from a trader who may or may not be under suspicion of conducting illegal activities. It is useful therefore to consider a framework for "modeling" so that we may understand better how and where to impose some structure on the problem.
Three types of risk
In the financial services industry we typically divide risk into three categories.
Market Risk: Market risk is defined as the possibility of losses resulting from unfavorable market movements. Since such losses occur when an adverse price movement causes a decrease in the mark-to-market value of a position, market risk can also be referred to as "position risk". Market risk can be calculated at the level of a transaction, business unit, or bank as a whole and can come from many different sources, including:
• Interest rate fluctuations that affect bonds and other fixed-income assets
• Foreign exchange fluctuations that affect future unhedged cash flows
• Changes in volatility of interest rates that affect the values of options or other derivatives
Credit Risk: Credit risk, sometimes also called counterparty risk, describes the potential for loss due to non-payment by the counterparty in the financial transaction. Typical examples are defaults on:
• A business or other loan
• A credit card
• A foreign exchange swap
Operating Risk: Operating risk covers event risks from operating one's business and is often described as the residual sources of risk after market and credit risks have been taken into account. Examples include:
• System failures
• Errors and omissions
• Fraud
• Litigation
• Settlement failure
Structure: Too Much Or Too Little?
Within the financial services industry we have the apparent luxury of working with a lot of data; does this necessarily mean having a lot of information? Broadly speaking, any analysis or modeling of a problem with data must consider separating signal from noise, and this is most easily done by imposing some structure on the problem.
A Framework For Modeling
Following (Mallows and Pregibon 1996), we can generally write down a model of the problem as
model = data + error
where "+" should be interpreted very broadly. More specifically, suppose we are interested in predicting the outcome of a bad event such as the default on a credit card or a business loan. Another example might be the modeling of the return on a portfolio of financial instruments. Specifically we are interested in generating a model for a conditional expectation or a conditional probability. Let y = the random variable denoting the outcome. If the random variable is something like a loan default, then y is binary {default, no default}. If the outcome is a return, then clearly y can assume any value on the real line. Generically we write the problem as
E(y | data; parameters) = f(data; parameters; error)
E(y | x; θ) = f(x; θ; ε)
where x is the data, often in the form of vectors of characteristics (for instance, credit scores, income, geography), θ is a parameter vector, and ε is a generic error term or residual. While the goal is almost always to generate accurate and robust predictions of y, in order to get there we have to estimate θ and possibly also f(.) itself.
A trivial but oft used and successful example is
y = β0 + β1x1 + ... + βkxk + ε
where the parameters we need to estimate are θ = (β0, ..., βk, σ²), f(.) is assumed to be linear in the parameters and the error term, and σ² is the variance of the error term. For example, y could be the return on assets for a firm and the x's are macro-economic factors like the change in the gross domestic product (GDP) or the changes in the interest rate which are posited to affect y.
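A minimal sketch of this linear specification, with simulated data standing in for a firm's return on assets and two hypothetical macro factors, estimates the β's and the error variance by ordinary least squares:

import numpy as np

# Hypothetical data: y is return on assets, the columns of X are macro factors.
rng = np.random.default_rng(1)
n = 80
gdp_growth = rng.normal(0.02, 0.01, n)         # change in GDP (simulated)
rate_change = rng.normal(0.0, 0.005, n)        # change in the interest rate (simulated)
y = 0.01 + 1.5 * gdp_growth - 2.0 * rate_change + rng.normal(0, 0.01, n)

X = np.column_stack([np.ones(n), gdp_growth, rate_change])   # intercept plus factors
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)        # OLS estimate of (b0, b1, b2)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])        # estimate of the error variance

print("beta_hat:", beta_hat, "sigma^2:", sigma2_hat)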
Already this simple but very real example has demonstrated that the investigator needs to make several decisions.
1. What is the functional form of f(.)?
2. What is the distribution of ε?
3. What is the distribution of the data x and moreover, what is the joint distribution of x and ε?
4. What does y look like and how will it affect f(.) and the estimation method to find θ? This method will be quite different if y is binary, and we are dealing essentially with a classification problem, or if y is continuous.
5. How do I choose my candidate x's, and how do I then choose the right set of final x's? This is ostensibly a question about knowledge representation and variable selection. This leads us to
6. What kind of data transformations of x are appropriate? This is also very closely related to (1).
7. What is the conditional distribution of x given y?
In other words, there are many junctures where an investigator needs to impose some sort of structure on the problem, where most of the structure comes in the form of knowledge representation by considering functional form, variable definition and selection, and finally various distributional assumptions. The binary case in decision (4) is illustrated by the sketch below.
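When y is binary, as in the credit-card or loan default example, f(.) is commonly taken to be the logistic function and θ is estimated by maximum likelihood rather than least squares. A sketch under that assumption, with simulated applicants standing in for real credit-score and income data:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated applicants: credit score and income are the conditioning variables x.
rng = np.random.default_rng(2)
n = 1000
score = rng.normal(650, 50, n)
income = rng.normal(40_000, 10_000, n)
logit = 6.0 - 0.01 * score - 0.00005 * income   # assumed true log-odds of default
p_default = 1 / (1 + np.exp(-logit))
y = rng.binomial(1, p_default)                  # 1 = default, 0 = no default

X = np.column_stack([score, income])
# Log-odds of default modeled as linear in (standardized) x; fit by maximum likelihood.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
new_applicant = np.array([[600, 30_000]])
print("estimated default probability:", model.predict_proba(new_applicant)[0, 1])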
Large datasets: are they necessarily a good thing?
Suppose we have n observations of our data and we have k x's. Increasingly we are dealing with very large data sets, which is to say large n and large k. For small k and well behaved x and ε, one can make the form of f(.) very flexible. Moreover, one can impose fewer assumptions about the distribution of ε. An example of the former is a neural network, where f(.) is possibly a highly nonlinear function of both x and θ. An example of the latter is non-parametric discriminant analysis via kernel density or nearest neighbor estimators, where we make no strict assumptions about either the distribution of ε or f(x | y). An example of both highly flexible functional form and a distribution-free approach is a Bayesian network.
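To make the nearest-neighbor case concrete, the sketch below fits a k-nearest-neighbor classifier to simulated two-class data (the features, class proportions, and parameter values are all hypothetical); no parametric form is assumed for f(x | y):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Simulated two-class problem (e.g. default / no default) with two features.
rng = np.random.default_rng(3)
X_good = rng.normal([650, 45_000], [40, 8_000], size=(500, 2))
X_bad = rng.normal([600, 35_000], [40, 8_000], size=(100, 2))
X = np.vstack([X_good, X_bad])
y = np.array([0] * 500 + [1] * 100)             # 1 = default

# Nearest-neighbor discriminant analysis: no distributional assumption on f(x | y).
# (In practice the features would be rescaled so no single one dominates the distance.)
knn = KNeighborsClassifier(n_neighbors=15).fit(X, y)
print("estimated default probability:", knn.predict_proba([[610, 36_000]])[0, 1])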
But as k increases, and as we wish to relax the structure on f(.) and ε, the task of estimating the model becomes very complex indeed. A large part of this estimation process involves choosing the relevant explanatory factors or variables. This can be phrased as either "What should I keep?" or "What can I afford to throw away?"
In fact, the difficulties in estimating the model with large datasets are less about large n and more about large k, although even this is changing. Moreover, the conditioning dataset x is often more complex, with non-continuous or, worse yet, non-numeric data. Thus while it may often appear that one has an abundance of data and with it a very large information set, gleaning behavioral rules and models becomes harder. If there is one thing which distinguishes machine learning from an applied statistics field like econometrics or biometrics, it is that the former takes an atheoretic approach to estimating the model. There is no behavioral theory which we can appeal to in order to help reduce the search over functional forms for f(.), useful elements of the data structure x, and reasonable distributions for ε. Even with theory the "problem reduction" is often not significantly large enough. In these instances, machine learning can be a powerful device.¹
Value At Risk
To take a very specific and highly relevant example, a fundamental problem in finance is the characterization of market risk, or the modeling of the distribution of market instruments. This is a problem faced by every major commercial and investment bank in the world, each of which often has 1,000 or more instruments in its portfolio at any one time, including equities like stocks, commercial paper, government bonds, both foreign and domestic, foreign exchange, and derivative products such as options based on any one or combination of the former. All of these are in x, but presumably there is some lower dimensional representation which can still capture all the relevant characteristics about x. For example, the risk-free interest rates for different maturities, as defined by US dollar government bonds, are highly correlated. In fact, if one considers ten typical maturities, 2, 3, 4, 5, 7, 9, 10, 15, 20, and 30 year bonds, the two-dimensional factor analytic representation accounts for 98% of the covariance. Since the variance-covariance structure is a core element in modeling risk, the information loss due to dimension reduction will be minimal.
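The following sketch illustrates this kind of dimension reduction on simulated yield changes at the ten maturities; the "level plus slope" structure assumed here is only a stand-in for actual government-bond data, where the first two factors typically account for the bulk of the covariance (the 98% figure above):

import numpy as np

# Simulated daily yield changes at ten maturities (2, 3, 4, 5, 7, 9, 10, 15, 20, 30 years).
# A common "level" shock plus a smaller "slope" shock induces the high correlation seen in practice.
rng = np.random.default_rng(4)
maturities = np.array([2, 3, 4, 5, 7, 9, 10, 15, 20, 30])
n_days = 1000
level = rng.normal(0, 0.05, n_days)[:, None]                  # shifts all maturities together
slope = rng.normal(0, 0.02, n_days)[:, None] * (maturities / 30)
noise = rng.normal(0, 0.01, (n_days, maturities.size))
yield_changes = level + slope + noise

cov = np.cov(yield_changes, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]                        # eigenvalues, largest first
explained = eigvals[:2].sum() / eigvals.sum()
print(f"share of covariance captured by the first two factors: {explained:.1%}")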
Next we can appeal to finance theory to find a mapping from the price distribution to shareholder utility (and dis-utility for risk). We conventionally assume that asset prices are log-normal and therefore asset returns are Gaussian.² We can relax this assumption somewhat by assuming that asset returns have a joint conditional Gaussian distribution, as opposed to the stricter assumption of unconditional normality. What does this buy us?
¹ See Elder and Pregibon 1996.
² This assumption is part of the well-known Black-Scholes option pricing model.
Since covariation or stochastic dependence among the x's is critical in characterizing portfolio profit and loss, by assuming conditional Gaussianity we can model this dependence by simply measuring pairwise correlation between any two assets. Specifically, cov(xi, xj) does not depend on xt for all t ≠ i, j. We need to measure only two moments: the mean vector of the relevant risk factors, which on a daily basis turns out to be very close to zero, and the variance/covariance matrix.
Then the profit and loss distribution for the whole portfolio can be characterized by a quadratic form in the positions a risk factor has in a portfolio (e.g., are we long Yen, are we short Italian government bonds, and how large is our stake in IBM stock) and the covariance matrix.
This is the workhorse of market risk finance: value at risk (VaR).³ VaR is defined as a limit on the loss of returns, the tail or quantile of the return distribution. Specifically, the probability that we see any return xt less than VaR is equal to α%:
Pr[xt ≤ VaR] = α%
If we rewrite this in terms of the density of returns over time, f(xt), it is:
α% = ∫_{-∞}^{VaR} f(xt) dxt
In the case of xt being normally distributed, for α = 1 (a 1% tail), VaR = 2.33·σt. This is a powerful simplification since the tail area is simply a scaling up of the second moment of the return distribution.
³ For a thorough and elegant survey, see Jorion 1996.
To compute VaR we need an inventory and a price list of the current portfolio holdings, or at least the underlying risk factors, as well as a description of their volatility and interdependence over time. For instance, it is equally important to know whether a risk factor such as the Mexican Peso / U.S. dollar exchange rate is highly volatile as it is to know whether it covaries with other risk factors or is independent of them. By combining the inventory and price list information with the covariance structure, we can estimate a daily value at risk. Only insofar as losses due to severe market fluctuations can be measured by these two elements can we fully characterize risk.
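A minimal sketch of this variance-covariance calculation, with a hypothetical position vector and assumed volatilities and correlations: the portfolio standard deviation is the quadratic form described above, and the one-day 99% VaR is 2.33 times that figure.

import numpy as np

# Hypothetical positions (in $MM) in three risk factors, e.g. JPY, Italian bonds, IBM stock.
positions = np.array([50.0, -30.0, 20.0])     # negative = short

# Assumed daily volatilities and correlations of the risk-factor returns.
vols = np.array([0.010, 0.007, 0.015])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
cov = np.outer(vols, vols) * corr             # variance-covariance matrix

portfolio_sd = np.sqrt(positions @ cov @ positions)   # quadratic form in positions and covariance
var_99 = 2.33 * portfolio_sd                          # one-day value at risk at the 1% tail
print(f"one-day 99% VaR: ${var_99:.2f}MM")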
There are two important assumptions on which the statistical structure of VaR rests:
1. Asset or security returns are linear functions of the underlying risk factors.
2. These returns follow a conditional normal distribution and are linearly correlated.
We know that both of these assumptions are often violated by having derivative products like options in one's portfolio. However, it is remarkable how well an otherwise simple system like VaR can still approximate one's risk profile, particularly for large, well diversified portfolios. For example, Nick Leeson's portfolio shortly before his downfall was dominated by positions in Japanese government bond futures and Nikkei futures. Not only was this portfolio highly concentrated (there were only two instruments), but it contained derivative products. Nevertheless, basic VaR would have revealed that Leeson had almost $900MM at risk! His final loss amounted to approximately $1.3B, which, given the unusually large drop in the Nikkei following the Kobe earthquake, the highly concentrated nature of the portfolio, and its high proportion of derivatives, is still remarkably close and would have been very useful to senior management. In other words, Gaussian linear VaR would have been able to capture 70% of the potential losses.⁴
⁴ To be clear, while we are using this example to illustrate VaR as a market risk measurement tool, the Leeson case is really an example of operating risk.
The second assumption is key for allowing us to map the return structure of a portfolio to our subjective notions of risk. Specifically, all we need to measure and model this return structure is the variance of and the correlation between assets or instruments; we need measure nothing else. Then we should be able to predict well the likelihood, if not the exact timing, of severe market fluctuations and their impact on losses.
It turns out that it is easier to accommodate violations of the first assumption than of the second. Fixes run the gamut from approximations via a Taylor series expansion to full valuation using extensive Monte Carlo. Stress testing, while clumsy, provides a relatively easy sanity check which allows one to relax both assumptions.
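To see how full valuation relaxes the linearity assumption, the sketch below revalues a single-option portfolio (all parameters hypothetical, with the Black-Scholes formula standing in for the bank's pricing model) under simulated one-day moves and reads VaR off the empirical loss quantile rather than scaling a standard deviation:

import numpy as np
from scipy.stats import norm

def bs_call(S, K, r, sigma, T):
    # Black-Scholes price of a European call, used here only as a nonlinear valuation function.
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Hypothetical portfolio: 1000 calls on a stock trading at 100 (all parameters assumed).
S0, K, r, sigma, T, n_contracts = 100.0, 105.0, 0.05, 0.25, 0.5, 1000
v0 = n_contracts * bs_call(S0, K, r, sigma, T)

# Simulate one-day moves of the underlying and revalue the portfolio in full.
rng = np.random.default_rng(5)
S1 = S0 * np.exp(rng.normal(0.0, sigma * np.sqrt(1 / 252), 100_000))
pnl = n_contracts * bs_call(S1, K, r, sigma, T - 1 / 252) - v0

var_99 = -np.quantile(pnl, 0.01)   # loss at the 1% tail of the simulated P&L distribution
print(f"one-day 99% Monte Carlo VaR: {var_99:,.0f}")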
The framework has allowed us to reduce the dimensionality of the data matrix x, characterize the functional form of f(.), and restrict the distribution of ε. While this reduces the conceptual and computational complexity of the problem greatly, the growing presence of non-linear options (i.e. f(.) becoming nonlinear in θ and x) makes the task still very computationally intensive. For example, Citibank conducts a full valuation of its portfolio once a week, a process which occupies a Cray supercomputer for a full weekend.
From risk measurement to risk management
The discussion thus far has focused exclusively on risk measurement, yet it is highly unlikely that we will ever discover the true data generating processes. At best all models will be approximations. Even if we could discover the DGP, because the problem is inherently stochastic, we can never control for extreme events with certainty. Value at risk, for instance, is the threshold which defines the α% tail area of the return distribution. We are most interested in the extreme tails describing the extreme(ly damaging) events, and it is precisely those tail probabilities which are the hardest to measure. Simple procedures like stop-losses are critical to actually managing risk. We should stress that neither VaR nor any other more or less sophisticated risk measurement technique will safeguard against fraudulent or otherwise rogue traders. Many of the most significant "trading" losses in recent times would be more accurately considered to be fraud or other forms of operating risk. Strong audit and process control functions cannot be replaced by a risk measurement system.
Properties of a useful market risk measure
Generalizing across measurement-based trading management functions, we make the following observations:
Economic Rationality -- Traders and other agents will tend to "game" the system: if incentives or controls are based on a measure of return per unit of assessed risk, and their return is actually proportional to "true" risk, they will have the incentive to maximize "true" risk per unit of assessed risk. It is then important that risk measurement be as internally consistent and realistic as possible. Otherwise, traders may arbitrage the system, taking positions which seem much less risky than they are, or taking positions which are optimal to the risk-adjusted performance measurement system, but not optimal for shareholders.
Common Currency -- Most risk measurement-based management systems rely on the measure to reflect "apples-to-apples" comparisons of risk. This consistency is required at several different levels, depending on the application:
• For monitoring and control, relative risk should be measured consistently over time and between decisions on a single trading desk or within a single type of instrument
• For limit setting and planning in the trading unit, relative risk should be measured consistently across different trading desks or different types of instruments
• For corporate strategic planning, risk should be measured on a basis consistent with measures of risk in other activities elsewhere in the financial institution
Appropriate Horizon -- The risk measure must be calculable or translatable across a variety of relevant time horizons, ranging from one day or less for effective trading management to one year or more for strategic planning and structural hedging.
Actionable -- The risk measure should be accompanied by information as to which positions or desks are responsible for the aggregate risk, and what hedges might be taken to reduce the risk. The risk measure should aggregate and disaggregate easily so that risk management can be applied at a range of levels within the organization, and so that business unit managers, desk heads, etc., and individual traders are all aligned to the same incentives and controls.
Statistically Verifiable -- In order to generate the credibility required of an effective management tool, the risk measure should be validated by comparing estimates of risk against actual realizations of P&L (i.e. "backtesting").
This efficiency is achieved at the cost of making a number of unfortunate simplifying assumptions, but these are well worth it in order to process such a large set of detailed information into a useful single number which measures risk. One must wonder, however, if it is really necessary to carry this detailed information all the way to the head of risk management, the head of capital markets, the CFO, and other very senior managers. In other words, the VaR system is currently answering questions like: "What is the impact on our aggregate risk of a short Thai Baht position hedged by a long Indonesian Rupiah position?" To suppose that this question as posed is relevant implies that very senior management might choose to hedge such a position or have the position closed out. The more relevant questions for senior management are of the following nature: "What is the impact of the Asia FX desk on the aggregate risk?", "In particular, what is their impact on the net short dollar position?" and "How much of their risk is unrelated to global risk factors (such as the net short dollar position), and what does the distribution of that risk look like?" These types of macro-questions can be answered with information that is much more summarized than the detail of each position or a large "position vector."
To be an effective risk management tool and management information system, a risk management system only needs to pass detailed information up to the level at which it might be acted upon. Such detail should include the sensitivity to factors against which a hedge might be taken, and the magnitude and distribution of risk outside of these factors. This allows tremendous flexibility in the techniques used to measure risk at the desk level. For derivatives desks, the focus of measurement might be the price-to-factor sensitivity, which may be highly nonlinear. In emerging markets, the focus might be on the distribution of large movements in prices and modeling regional correlations for large movements differentially from small movements. Conceivably, the covariance matrix approach might still be appropriate where the assumptions do not fail badly or for smaller desks with insignificant risks. This approach aligns the type of risk measurement information with the type of risk management action to be taken.
Conclusion
The bulk of the discussion in this paper has focused on structured approaches to measuring and modeling market risk in the financial services industry. I have argued that in general it is useful to impose structure on these types of data analytic problems, and that this structure comes in three forms:
1. The set of conditioning variables under consideration
2. The functional form of the conditioning relationship
3. The parametric form of the distribution of the error structure
Within the financial services industry we further appeal to economic or financial theory to impose what seem like reasonable restrictions on the problem. For example, in market risk we presume that prices of financial instruments follow a log-normal distribution. In credit risk we often presume that the log-odds of a default probability is linear in the conditioning set of variables; this is known as a logistic relationship. Risk measurement and management is a hard problem; with some structure we can hope to approximate a solution.
References
Elder, J. and D. Pregibon 1996. A Statistical Perspective on Knowledge Discovery in Databases, ch. 4 in U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy eds., Advances in Knowledge Discovery and Data Mining. Cambridge, Mass.: MIT Press.
Jorion, P. 1996. Value at Risk. Chicago, Ill.: Irwin.
Mallows, C.L. and D. Pregibon 1996. Modeling with Massive Databases, mimeo, AT&T Research.