An Introduction to the Mathematics of Value-at-Risk

James N. Grotke, Jr.
May 2010

Abstract

The financial concept of value-at-risk (VaR) plays an integral role in modern financial risk management. VaR is used by many large financial institutions to measure the riskiness of their holdings and determine safe levels of capital to hold. This paper explores the mathematics behind this fundamental concept. Specifically, it discusses the probability theory underlying VaR, along with some concepts from linear algebra and applied mathematics that are useful in the computation of VaR. The paper concludes with a sample problem that illustrates how to find the VaR for a hypothetical portfolio.

I. Introduction

Value-at-Risk (VaR) gives the financial risk manager the worst expected loss under average market conditions over a certain time interval at a given confidence level. In other words, VaR gives the risk manager a sense of what he or she can expect to lose in a given time interval, assuming "normal" market conditions.

Definition 1: Given a confidence level \( \alpha \in (0,1) \), the Value-at-Risk of a portfolio at \( \alpha \) over the time period \( t \) is the smallest number \( k \) such that the probability of a loss greater than \( k \) over the time interval \( t \) is no greater than \( \alpha \).

This concept is best illustrated through an example. Suppose a hypothetical bond portfolio has a "one-day VaR at a confidence level of 5% of $10 million." This means that over a typical one-day period, the bond portfolio will lose $10 million or more only 5% of the time. (Here \( \alpha \) denotes the tail probability; many texts quote the same figure as a VaR at the 95% confidence level.)

While the concept of VaR is not too difficult to grasp, obtaining an actual VaR is quite involved. A large amount of data on historical returns for the holdings in a portfolio is needed. From this data, the average return of each holding, the variance of each holding's return, and the covariances between the holdings' returns are calculated. Using these values, the average return and variance of the overall portfolio can be determined.
At this point, the risk manager chooses the probability distribution that the return data will be assumed to follow. Using the calculated portfolio mean return and variance together with this assumed distribution, one can solve for the VaR at a given confidence level.

II. The Mathematics Underlying Value-at-Risk

Two areas of mathematics that are relevant to understanding VaR will be discussed. First, the relevant probability theory will be covered, as this forms the underpinning of VaR. Second, some topics from linear algebra will be reviewed. While not integral to the underlying mathematics of VaR, techniques from linear algebra are very useful in the actual computation of VaR.

Part 1: Probability Theory

Perhaps the most important concept in probability theory is that of expected value.

Definition 2: The Expected Value of the random variable \( X \), denoted \( E[X] \) or \( \mu \), is given by
\[ E[X] = \sum_{i=1}^{n} x_i \, p(x_i), \]
where the \( x_i \) are the values that \( X \) can take, and \( p(x_i) \) is the probability of \( X \) taking the value \( x_i \).

The expected value can be thought of as a weighted average of the possible values that \( X \) can take. When each value of \( X \) occurs with the same probability \( p = 1/n \), the expected value takes on the more familiar form
\[ E[X] = \sum_{i=1}^{n} x_i \, p(x_i) = \sum_{i=1}^{n} p \, x_i = \frac{1}{n} \sum_{i=1}^{n} x_i. \]

Important Property of the Expected Value: If \( X \) and \( Y \) are two random variables and \( a, b \in \mathbb{R} \), then
\[ E[aX + bY] = aE[X] + bE[Y], \]
provided \( E[X] \) and \( E[Y] \) are finite. More generally,
\[ E\Big[\sum_{i=1}^{n} \alpha_i X_i\Big] = \sum_{i=1}^{n} \alpha_i E[X_i] \]
for random variables \( X_i \) and \( \alpha_i \in \mathbb{R} \). This follows from the two-variable case by mathematical induction.

In addition to having a measure of the average value of a set of data, being able to measure the spread, or variability, of a random variable is also important.
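Before moving on to variance, the definition of expected value and its linearity property can be checked on a small example. The sketch below is illustrative only: the helper `expected_value`, the four outcomes, and the coefficients `a` and `b` are made-up assumptions, not data from the paper. Exact fractions are used so the two sides of the linearity identity can be compared without rounding error.

```python
from fractions import Fraction


def expected_value(values, probs):
    """Discrete expected value: the sum of x_i * p(x_i)."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    return sum(x * p for x, p in zip(values, probs))


# Hypothetical joint outcomes (x_i, y_i), each occurring with probability 1/4.
outcomes = [(-2, 1), (0, 4), (3, -1), (5, 2)]
probs = [Fraction(1, 4)] * len(outcomes)

xs = [x for x, _ in outcomes]
ys = [y for _, y in outcomes]
a, b = 2, -3  # arbitrary real coefficients

e_x = expected_value(xs, probs)
e_y = expected_value(ys, probs)
# Expected value of the combined variable aX + bY, computed outcome by outcome.
e_combo = expected_value([a * x + b * y for x, y in outcomes], probs)

print(e_x, e_y, e_combo, a * e_x + b * e_y)
```

Because every outcome is equally likely here, `e_x` is simply the arithmetic mean of the `xs`, and the last two printed values agree, as the linearity property requires.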
Definition 3: The variance of the random variable \( X \), denoted \( \operatorname{Var}(X) \), \( \sigma_X^2 \), or \( \sigma_{XX} \), is given by
\[ \operatorname{Var}(X) = E[(X-\mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(x_i), \]
where \( \mu \) is the expected value of \( X \), the \( x_i \) are the values that \( X \) can take, and \( p(x_i) \) is the probability of \( X \) taking the value \( x_i \).

When each value of \( X \) occurs with the same probability \( p = 1/n \), the variance takes on the more familiar form
\[ \operatorname{Var}(X) = E[(X-\mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 \, p = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2. \]

Two Important Properties of Variance:
1. \( \operatorname{Var}(aX) = a^2 \operatorname{Var}(X) \) for all \( a \in \mathbb{R} \)
2. \( \operatorname{Var}(X + b) = \operatorname{Var}(X) \) for all \( b \in \mathbb{R} \)

When looking at linear combinations of random variables, an idea that arises in conjunction with the variance is that of covariance. Covariance is a measure of how two random variables move together (how they "co-vary"). It is related to the more well-known idea of correlation.

Definition 4: The covariance between random variables \( X \) and \( Y \), denoted \( \operatorname{Cov}(X,Y) \) or \( \sigma_{XY} \), is given by
\[ \operatorname{Cov}(X,Y) = E[(X-\mu)(Y-\gamma)], \]
where \( \mu = E[X] \) and \( \gamma = E[Y] \).

Using the linearity of expected value, this can be simplified as follows:
\[ \operatorname{Cov}(X,Y) = E[XY - \gamma X - \mu Y + \mu\gamma] = E[XY] - \gamma E[X] - \mu E[Y] + \mu\gamma = E[XY] - \mu\gamma = E[XY] - E[X]E[Y]. \]

If \( X \) and \( Y \) take on \( n \) equally likely pairs of values \( (x_i, y_i) \), then the covariance can also be written as
\[ \operatorname{Cov}(X,Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - E[X])(y_i - E[Y]). \]

Three Important Properties of the Covariance:
1. \( \operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X) \)
2. \( \operatorname{Cov}(X,X) = \operatorname{Var}(X) \) (i.e., the covariance of \( X \) with itself is the variance)
3. \( \operatorname{Cov}(\alpha X, Y) = \alpha \operatorname{Cov}(X,Y) \), where \( \alpha \in \mathbb{R} \)

Now that the covariance has been defined, the variance of a linear combination of random variables can be discussed.

Definition 5: The variance of \( \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n \), where the \( X_i \) are random variables and \( \alpha_i \in \mathbb{R} \) for \( i \in \{1,2,\dots,n\} \), is given as follows:
\[ \operatorname{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i^2 \operatorname{Var}(X_i) + 2 \sum_{i<j} \alpha_i \alpha_j \operatorname{Cov}(X_i, X_j), \]
where the second sum runs over all pairs \( 1 \le i < j \le n \).

Admittedly, this expression looks a bit ugly. It will become clearer when the general VaR problem is worked in Section III. This is one calculation where techniques from linear algebra prove invaluable.
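The formula in Definition 5 can be checked numerically by comparing it with the variance computed directly from the combined variable. The following is a minimal sketch under stated assumptions: the three series, the weights, and the helper names `mean`, `cov`, and `var_linear_combination` are hypothetical, and each series is treated as a set of equally likely values, so covariances divide by n.

```python
from fractions import Fraction


def mean(xs):
    """Arithmetic mean of equally likely values."""
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance of equally likely paired values (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var_linear_combination(weights, series):
    """Definition 5: sum a_i^2 Var(X_i) + 2 * sum over i<j of a_i a_j Cov(X_i, X_j)."""
    n = len(weights)
    # Cov(X, X) is Var(X), so the variance terms reuse cov().
    total = sum(weights[i] ** 2 * cov(series[i], series[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += 2 * weights[i] * weights[j] * cov(series[i], series[j])
    return total


# Hypothetical data: three random variables over four equally likely states.
series = [[1, 2, 3, 6], [2, 0, 4, 2], [5, 3, 1, 3]]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

by_formula = var_linear_combination(weights, series)

# Direct route: build the combined variable state by state, then take its variance.
combo = [sum(w * s[k] for w, s in zip(weights, series)) for k in range(4)]
direct = cov(combo, combo)

print(by_formula == direct)
```

Exact fractions are used so that the two routes can be compared with `==` rather than with a floating-point tolerance.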
Now that the expectation, variance, and covariance have been discussed, the topic of probability distributions can be given some attention. Probability distributions are essential to VaR calculations. A portfolio can be thought of as a linear combination of assets, each of whose returns is a random variable. Once the expected value and variance of the portfolio return are known, they are used as the parameters of an assumed probability distribution. Inferences about the probabilistic return characteristics of the portfolio can then be made, on the assumption that the portfolio adheres to the chosen distribution.

Thus far, random variables whose set of possible values is finite or countably infinite have been considered. This is a discrete setting. In a continuous setting, random variables whose set of possible values is uncountable are considered.

Definition 6: \( X \) is a Continuous Random Variable if there exists a non-negative function \( f \), defined for all real numbers, having the following property: for every set \( B \) of real numbers,
\[ P[X \in B] = \int_{B} f(x)\,dx. \]
\( f \) is the Probability Density Function of \( X \).

In other words, the probability that \( X \) is in the set \( B \) can be found by integrating \( f \) over the set \( B \). Since \( X \) must always take a value in the reals (i.e., \( X \in (-\infty,\infty) \)), it follows that
\[ P[X \in (-\infty,\infty)] = \int_{-\infty}^{\infty} f(x)\,dx = 1. \]

Also, any probability statement about \( X \) can be written in terms of \( f \). For example, if \( a, b \in (-\infty,\infty) \) with \( a \le b \), then
\[ P[a \le X \le b] = \int_{a}^{b} f(x)\,dx. \]

Building on this idea, the cumulative distribution function of \( X \) is defined as follows.

Definition 7: Suppose \( X \) is a continuous random variable with \( X \in (-\infty,\infty) \). The Cumulative Distribution Function of \( X \) at \( a \), denoted \( F(a) \), gives the probability that \( X \) takes a value less than or equal to \( a \). Mathematically,
\[ F(a) = P[X \le a] = \int_{-\infty}^{a} f(x)\,dx. \]

Recall that in the discrete setting the expected value of \( X \) was defined as
\[ E[X] = \sum_{i=1}^{n} x_i \, p(x_i). \]
The analogous definition in the continuous setting is
\[ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx. \]
Similarly, the variance in the continuous setting is defined as
\[ \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx, \]
where \( \mu = E[X] \).

The best-known distribution is the Gaussian, or Normal, Distribution. It is commonly referred to as the "Bell Curve."

Definition 8: \( X \) is a normal random variable with parameters \( \mu \) and \( \sigma^2 \) if it has the following probability density function:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]
One can also say \( X \) is normally distributed, or \( X \sim N(\mu, \sigma^2) \).

Note: Since \( f(x) \) is a probability density function, the area under the entire curve is 1.

If \( X \) is normally distributed with mean \( \mu \) and variance \( \sigma^2 \), we can create a new variable \( Z \) that is also normally distributed and is termed the standard normal random variable:
\[ Z = \frac{X-\mu}{\sigma}, \]
with expected value 0 and variance 1. The probability density function of the standard normal is
\[ f(z) = \frac{1}{(1)\sqrt{2\pi}} \, e^{-\frac{(z-0)^2}{2(1)^2}} = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}. \]

A result of the above is that solving \( P[a \le X \le b] \) is the same as solving
\[ P\Big[\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\Big]. \]
This process of converting \( a \) and \( b \) is called "standardization." \( \frac{a-\mu}{\sigma} \) is the Z-Score for \( a \), and \( \frac{b-\mu}{\sigma} \) is the Z-Score for \( b \).

Finally, the cumulative distribution function for the standard normal random variable must be described. The CDF, denoted \( \Phi(x) \), is defined as follows:
\[ \Phi(a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-\frac{y^2}{2}}\,dy = P[Z \le a]. \]
This can be thought of as the area under the standard normal density curve to the left of \( a \).

[Figure: standard normal density curve with the region to the left of \( a \) shaded.]

To illustrate this idea, consider the following example.

Question: Suppose \( X \) is a normal random variable with mean \( \mu \) and variance \( \sigma^2 \). What is the probability that \( X \) lies between \( a \) and \( b \)?

Solution: First, obtain Z-Scores for \( a \) and \( b \):
\[ Z_a = \frac{a-\mu}{\sigma} \quad \text{and} \quad Z_b = \frac{b-\mu}{\sigma}. \]
Now apply the cumulative distribution function for the standard normal random variable:
\[ P[a \le X \le b] = P[Z_a \le Z \le Z_b] = \frac{1}{\sqrt{2\pi}} \int_{Z_a}^{Z_b} e^{-\frac{y^2}{2}}\,dy = \Phi(Z_b) - \Phi(Z_a). \]

This covers the basic probability theory underlying VaR. Now, two methods from linear algebra useful in the computation of VaR will be examined.
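Before turning to linear algebra, the standardization example can be sketched in code. This is a minimal illustration, assuming Python's standard-library `statistics.NormalDist` (available from Python 3.8) as the source of the standard normal CDF; the function name `prob_between` and the numbers passed to it are hypothetical.

```python
from statistics import NormalDist


def prob_between(a, b, mu, sigma):
    """P[a <= X <= b] for X ~ N(mu, sigma^2), via Z-scores and the standard normal CDF."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_a = (a - mu) / sigma     # Z-score for a
    z_b = (b - mu) / sigma     # Z-score for b
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)


# Hypothetical example: X ~ N(0.01, 0.02^2), interval of one standard
# deviation on either side of the mean.
p = prob_between(-0.01, 0.03, mu=0.01, sigma=0.02)
print(round(p, 4))
```

With these inputs the Z-scores are approximately -1 and 1, so the result is the familiar one-sigma probability of roughly 0.68.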
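Part 2: Linear Algebra

As was mentioned in Part 1, the expected value of a linear combination of random variables is the linear combination of the expected values. Suppose we have random variables \( X_1, X_2, \dots, X_n \) and coefficients \( \alpha_1, \alpha_2, \dots, \alpha_n \in \mathbb{R} \). Then it follows that
\[ E[\alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n] = \alpha_1 E[X_1] + \alpha_2 E[X_2] + \dots + \alpha_n E[X_n]. \]

We can use matrices to carry out this computation. Let
\[ K = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_n \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} E[X_1] \\ E[X_2] \\ E[X_3] \\ \vdots \\ E[X_n] \end{pmatrix}. \]
Then
\[ E[\alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n] = K^T U = \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_n \end{pmatrix} \begin{pmatrix} E[X_1] \\ E[X_2] \\ E[X_3] \\ \vdots \\ E[X_n] \end{pmatrix} = \alpha_1 E[X_1] + \alpha_2 E[X_2] + \alpha_3 E[X_3] + \dots + \alpha_n E[X_n] = \sum_{i=1}^{n} \alpha_i E[X_i], \]
agreeing with the property stated in Part 1.

\( E\big[\sum_{i=1}^{n} \alpha_i X_i\big] \) can be thought of as the expected return of a portfolio of financial assets \( X_i \), where \( \alpha_i \) is the proportion of the portfolio invested in asset \( X_i \). Assuming the expected return of each asset and the proportion of the total portfolio invested in each asset are known, the expected return of the overall portfolio can be determined. Using matrices is useful computationally when dealing with a large portfolio of many assets. One simple matrix operation (\( K^T U \)) is all that is needed to find the expected return of the total portfolio.

As stated in Definition 5, the variance of a linear combination of random variables is
\[ \operatorname{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i^2 \operatorname{Var}(X_i) + 2 \sum_{i<j} \alpha_i \alpha_j \operatorname{Cov}(X_i, X_j). \]
As one can imagine, for a large portfolio (many \( X_i \)) this is a non-trivial calculation. Again, matrices are usually the best way to proceed. In this case, the matrix used is the Variance-Covariance Matrix, denoted \( \Sigma \).

Let \( \operatorname{Var}(X_i) \) be denoted \( \sigma_{ii} \) and \( \operatorname{Cov}(X_i, X_j) \) be denoted \( \sigma_{ij} \). Then the Variance-Covariance Matrix for \( n \) random variables is defined as follows:
\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & \sigma_{nn} \end{pmatrix}. \]
Since \( \operatorname{Cov}(X_i, X_j) = \sigma_{ij} \) and \( \operatorname{Cov}(X_j, X_i) = \sigma_{ji} \) are equal, \( \Sigma \) is symmetric (i.e., \( \Sigma = \Sigma^T \)).

Suppose we want to find the variance of a linear combination of random variables, \( \operatorname{Var}\big(\sum_{i=1}^{n} \alpha_i X_i\big) \).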
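Let
\[ K = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_n \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & \sigma_{nn} \end{pmatrix}. \]
Then
\[ \operatorname{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = K^T \Sigma K. \]
Since this result is not very obvious, let's look at the case of three random variables to illustrate:
\[ \operatorname{Var}\Big(\sum_{i=1}^{3} \alpha_i X_i\Big) = K^T \Sigma K = \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 \end{pmatrix} \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \]
\[ = \alpha_1^2\sigma_{11} + \alpha_1\alpha_2\sigma_{21} + \alpha_1\alpha_3\sigma_{31} + \alpha_1\alpha_2\sigma_{12} + \alpha_2^2\sigma_{22} + \alpha_2\alpha_3\sigma_{32} + \alpha_1\alpha_3\sigma_{13} + \alpha_2\alpha_3\sigma_{23} + \alpha_3^2\sigma_{33} \]
\[ = (\alpha_1^2\sigma_{11} + \alpha_2^2\sigma_{22} + \alpha_3^2\sigma_{33}) + (2\alpha_1\alpha_2\sigma_{12} + 2\alpha_1\alpha_3\sigma_{13} + 2\alpha_2\alpha_3\sigma_{23}) \]
\[ = \sum_{i=1}^{3} \alpha_i^2 \operatorname{Var}(X_i) + 2 \sum_{i<j} \alpha_i \alpha_j \operatorname{Cov}(X_i, X_j), \]
where the grouping in the third line uses the symmetry \( \sigma_{ij} = \sigma_{ji} \).

III. A General Solution to the Basic VaR Problem

Now that the requisite mathematical topics have been covered, the solution to the generalized VaR problem can be described.

Problem: Suppose there is a portfolio consisting of Assets \( 1, 2, 3, \dots, N \). \( D_i \) dollars are invested in Asset \( i \), so the total value of the portfolio is
\[ D = D_1 + D_2 + \dots + D_N = \sum_{i=1}^{N} D_i \]
dollars. Assume the one-day return \( r_i \) of Asset \( i \) is normally distributed with expected value \( E[r_i] \) and variance \( \sigma_i^2 \). Also, the covariance between the one-day returns of Assets \( i \) and \( j \) is given by \( \sigma_{ij} \). Given this information, find the one-day VaR at a confidence level of 5%.

Solution: First, determine the expected return and variance of the overall portfolio. The first step in doing this is to calculate the weighting for each asset. The proportion of the portfolio invested in Asset \( i \) is
\[ \alpha_i = \frac{D_i}{D_1 + D_2 + \dots + D_N}. \]
These are the asset weighting factors.

Now, let
\[ K = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_N \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} E[r_1] \\ E[r_2] \\ E[r_3] \\ \vdots \\ E[r_N] \end{pmatrix}. \]
We are creating a linear combination of random variables, where the random variables are the one-day returns of each asset, and the coefficients are the asset weighting factors.

Taking advantage of the property of expectations that \( E\big[\sum_{i=1}^{n} \alpha_i X_i\big] = \sum_{i=1}^{n} \alpha_i E[X_i] \) and the matrix method for finding this expectation, we obtain the following result:
\[ E[r_{\text{Portfolio}}] = E\Big[\sum_{i=1}^{N} \alpha_i r_i\Big] = K^T U = \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_N \end{pmatrix} \begin{pmatrix} E[r_1] \\ E[r_2] \\ E[r_3] \\ \vdots \\ E[r_N] \end{pmatrix} = \sum_{i=1}^{N} \alpha_i E[r_i] = \mu_p. \]

Next, we must calculate the variance of the total portfolio.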
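In other words, we need to calculate the variance of the linear combination of random variables. As stated in Definition 5, the variance of a linear combination of random variables is given by
\[ \operatorname{Var}\Big(\sum_{i=1}^{n} \alpha_i X_i\Big) = \sum_{i=1}^{n} \alpha_i^2 \operatorname{Var}(X_i) + 2 \sum_{i<j} \alpha_i \alpha_j \operatorname{Cov}(X_i, X_j). \]
We can adapt this to the conditions of our stated problem:
\[ \sigma_p^2 = \operatorname{Var}\Big(\sum_{i=1}^{N} \alpha_i r_i\Big) = \sum_{i=1}^{N} \alpha_i^2 \operatorname{Var}(r_i) + 2 \sum_{i<j} \alpha_i \alpha_j \operatorname{Cov}(r_i, r_j) = \sum_{i=1}^{N} \alpha_i^2 \sigma_i^2 + 2 \sum_{i<j} \alpha_i \alpha_j \sigma_{ij}. \]

To actually compute \( \sigma_p^2 \), let
\[ K = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_N \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1N} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \cdots & \sigma_{2N} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \cdots & \sigma_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} & \sigma_{N2} & \sigma_{N3} & \cdots & \sigma_N^2 \end{pmatrix}. \]
As was shown in Part 2 of Section II, \( \operatorname{Var}\big(\sum_{i=1}^{n} \alpha_i X_i\big) = K^T \Sigma K \). Therefore,
\[ \sigma_p^2 = \operatorname{Var}\Big(\sum_{i=1}^{N} \alpha_i r_i\Big) = K^T \Sigma K = \begin{pmatrix} \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_N \end{pmatrix} \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1N} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \cdots & \sigma_{2N} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \cdots & \sigma_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} & \sigma_{N2} & \sigma_{N3} & \cdots & \sigma_N^2 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_N \end{pmatrix}. \]

Now that we have an expected value and variance for the overall portfolio return, we can find the VaR. We assume that the portfolio return is normally distributed with mean \( \mu_p \) and variance \( \sigma_p^2 \), both of which are numbers we have calculated. (This assumption holds exactly when the asset returns are jointly normal.) Since we want the VaR at a 5% confidence level, we are solving for the return such that a worse return occurs only 5% of the time. Mathematically, we are solving for \( r^* \) such that
\[ \int_{-\infty}^{r^*} \frac{1}{\sigma_p\sqrt{2\pi}} \, e^{-\frac{(x-\mu_p)^2}{2\sigma_p^2}}\,dx = 0.05. \]

By standardization, this is equivalent to \( r^* = \mu_p + \sigma_p \Phi^{-1}(0.05) \), where \( \Phi^{-1}(0.05) \approx -1.645 \). The inverse of \( \Phi \) has no closed form, so in a real-world setting \( r^* \) is found numerically rather than analytically; many mathematical software programs have a NORMINV function for this purpose. Therefore, assume we have found \( r^* \) such that the equation above holds.

Usually, \( r^* \) is a small, negative decimal. \( 100\,|r^*| \) is a percentage and can be thought of as the one-day percent loss such that, in normal market conditions, the portfolio loses more than \( 100\,|r^*| \)% only 5% of the time. Therefore, the one-day Value-at-Risk at a 5% confidence level is \( D \cdot |r^*| \).

In the unusual event that \( r^* > 0 \), the VaR is not very useful. Recall that we found \( r^* \) such that the portfolio performs worse than \( r^* \) only 5% of the time.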
But \( r^* > 0 \), so essentially we are stating that only 5% of the time will the portfolio either earn a positive return between 0 and \( r^* \) or lose money. Such a statement says nothing about the size of potential losses, so an \( r^* > 0 \) is a useless metric. We should run a new VaR analysis with a lower confidence level (for example, 1% rather than 5%) until we obtain an \( r^* < 0 \).

IV. Some Final Observations

It is important to note the large amount of data required to undertake a VaR calculation. In the generalized problem, the expected returns and variances of each asset, as well as the covariances between the assets, were all given. With modern computing power, it is relatively easy to obtain these values. Normally, a risk manager will have access to historical return data for each asset, so only a few lines of code are required to calculate expected returns, variances, and covariances. Still, the point that a great deal of data is needed before one can even approach calculating a portfolio VaR cannot be stressed enough.

It is also important to note that VaR is a very versatile model. While this paper used a normal distribution, practically any distribution can be used. This gives the risk manager the ability to tailor a VaR model to the specific characteristics of the portfolio he or she is dealing with.

Lastly, an interesting trend in risk management has been the movement toward probability distributions that have "fatter tails" (i.e., distributions that give greater weighting to outlying, multi-sigma events). A major realization of the recent financial crisis has been that financial returns do not always approximate a normal distribution or some other benign distribution. Extreme events, often termed "Black Swans," tend to occur more frequently than such distributions would predict. Consequently, greater emphasis has been placed on using distributions with fatter tails that give a larger weighting to these extreme events.
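As a closing illustration, the general solution of Section III can be sketched end to end for a small portfolio. This is a minimal sketch under stated assumptions, not a production implementation: the two-asset dollar amounts, expected returns, and covariance matrix are made up, the function name `portfolio_var` is hypothetical, and Python's standard-library `statistics.NormalDist.inv_cdf` stands in for the NORMINV function mentioned in Section III.

```python
from statistics import NormalDist


def portfolio_var(dollars, exp_returns, cov_matrix, tail_prob=0.05):
    """One-day VaR under the normality assumption of Section III.

    Returns (mu_p, sigma_p, r_star, var_dollars).
    """
    total = sum(dollars)                      # D
    k = [d / total for d in dollars]          # weights alpha_i = D_i / D
    n = len(k)
    # Expected portfolio return: K^T U
    mu_p = sum(k[i] * exp_returns[i] for i in range(n))
    # Portfolio variance: K^T Sigma K
    var_p = sum(k[i] * cov_matrix[i][j] * k[j]
                for i in range(n) for j in range(n))
    if var_p <= 0:
        raise ValueError("portfolio variance must be positive")
    sigma_p = var_p ** 0.5
    # r* is the return with probability tail_prob of a worse outcome.
    r_star = NormalDist(mu_p, sigma_p).inv_cdf(tail_prob)
    # VaR is reported as a positive dollar loss when r* is negative.
    return mu_p, sigma_p, r_star, -total * r_star


# Hypothetical two-asset portfolio worth $10 million in total.
dollars = [6_000_000, 4_000_000]
exp_returns = [0.0005, 0.0003]                # one-day expected returns
cov_matrix = [[0.0004, 0.0001],
              [0.0001, 0.0009]]               # variance-covariance matrix

mu_p, sigma_p, r_star, var_dollars = portfolio_var(dollars, exp_returns, cov_matrix)
print(f"mu_p={mu_p:.5f} sigma_p={sigma_p:.5f} r*={r_star:.5f} VaR=${var_dollars:,.0f}")
```

With these made-up inputs, r* is negative and the resulting VaR is roughly $0.3 million. If r* came out positive, the final value would be negative, which signals the uninformative case discussed at the end of Section III.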