The Calculus of Finance
Amber Habib
Mathematical Sciences Foundation, New Delhi

Universities Press (India) Private Limited
Registered Office: 3-6-747/1/A & 3-6-754/1, Himayatnagar, Hyderabad 500 029 (Telangana), INDIA
e-mail: info@universitiespress.com

Distributed by Orient Blackswan Private Limited
Registered Office: 3-6-752 Himayatnagar, Hyderabad 500 029 (Telangana), INDIA
e-mail: info@orientblackswan.com
Other Offices: Bengaluru, Bhopal, Chennai, Guwahati, Hyderabad, Jaipur, Kolkata, Lucknow, Mumbai, New Delhi, Noida, Patna, Visakhapatnam

© Universities Press (India) Private Limited 2011
First Published in 2011
Reprinted 2018
eISBN 9789389211023
e-edition: First Published 2019
ePUB Conversion: TEXTSOFT Solutions Pvt. Ltd.

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests write to the publisher.
To my brother Faiz, for lighting my way

Contents
Preface
List of Notation
1 Basic Concepts
  1.1 Arbitrage
  1.2 Return and Interest
  1.3 The Time Value of Money
  1.4 Bonds, Shares and Indices
  1.5 Models and Assumptions
2 Deterministic Cash Flows
  2.1 Net Present Value
  2.2 Internal Rate of Return
  2.3 A Comparison of IRR and NPV
  2.4 Bonds: Price and Yield
  2.5 Clean and Dirty Price
  2.6 Price–Yield Curves
  2.7 Duration
  2.8 Term Structure of Interest Rates
  2.9 Immunisation
  2.10 Convexity
  2.11 Callable Bonds
3 Random Cash Flows
  3.1 Random Returns
  3.2 Portfolio Diagrams and Efficiency
  3.3 Feasible Set
  3.4 Markowitz Model
  3.5 Capital Asset Pricing Model
  3.6 Diversification
  3.7 CAPM as a Pricing Formula
  3.8 Numerical Techniques
4 Forwards and Futures
  4.1 Forwards and Futures
  4.2 Forward and Futures Price
  4.3 Value of a Futures Contract
  4.4 Method of Replicating Portfolios
  4.5 Hedging with Futures
  4.6 Currency Futures
  4.7 Stock Index Futures
5 Stock Price Models
  5.1 Lognormal Model
  5.2 Geometric Brownian Motion
  5.3 Suitability of GBM for Stock Prices
  5.4 Binomial Tree Model
6 Options
  6.1 Call Options
  6.2 Put Options
  6.3 Put–Call Parity
  6.4 Binomial Options Pricing Model
  6.5 Pricing American Options
  6.6 Factors Influencing Option Premiums
  6.7 Options on Assets with Dividends
  6.8 Dynamic Hedging
  6.9 Risk-Neutral Valuation
7 The Black–Scholes Model
  7.1 Risk-Neutral Valuation
  7.2 The Black–Scholes Formula
  7.3 Options on Futures
  7.4 Options on Assets with Dividends
  7.5 Black–Scholes and BOPM
  7.6 Implied Volatility
  7.7 Dynamic Hedging
  7.8 The Greeks
  7.9 The Black–Scholes PDE
  7.10 Speculating with Options
8 Value at Risk
  8.1 Definition of VaR
  8.2 Linear Model
  8.3 Quadratic Model
  8.4 Monte Carlo Simulation
  8.5 The Martingale
Appendix A: Calculus
  A.1 One Variable Calculus
  A.2 Partial Derivatives
  A.3 Lagrange Multipliers Method
  A.4 Differentiating under the Integral Sign
  A.5 Double Integrals
Appendix B: Probability and Statistics
  B.1 Basic Probability
  B.2 Random Variables
  B.3 Cumulative Distribution Function
  B.4 Binomial Random Variable
  B.5 Normal Random Variable
  B.6 Expectation and Variance
  B.7 Lognormal Random Variable
  B.8 Cauchy Random Variable
  B.9 Bivariate Distributions
  B.10 Conditional Probability
  B.11 Independence
  B.12 Multivariate Distributions
  B.13 Covariance Matrix
  B.14 Linear Regression and Least Squares
  B.15 Random Sampling
  B.16 Sample Mean, Variance and Covariance
  B.17 Central Limit Theorem
  B.18 Stable Distributions
  B.19 Data Fitting
  B.20 Monte Carlo Simulation
Appendix C: Solutions to Selected Exercises
Bibliography

Preface

Mathematics has always enjoyed a close relationship with financial matters. Early developments in arithmetic owed much to the needs of accounting, and even geometry was influenced by the State's need to measure area in order to fix taxes. While economics deals with general issues regarding money and its place in society, finance has a narrower aim: how should we invest our money to make it grow the most? This sharper focus makes it more amenable to mathematical treatment. It is exciting that relatively elementary mathematics can lead to quite deep results in finance, including work that has won the Nobel Prize. At the same time, the problems of finance have helped motivate new mathematics of the highest order. Mathematical finance offers a new solution to the perennial problem mathematicians face: convincing people that our work has some significance for society.

This book will introduce you to the basic concepts and products of modern finance. The emphasis is not so much on the details of the financial world as on the basic principles by which we seek to understand it. Thus the aim of the book is to teach you how to think about finance. This seems particularly pertinent in the context of market upheavals that appear to be caused, to a fair extent, by the careless application of mathematical tools to the creation and pricing of complicated contracts.
The carelessness stems from a lack of intuition about, or regard for, the assumptions underlying the models, and it leads to incorrect evaluation of risk. As this book will show you, the understanding and quantification of risk is the central problem of finance.

The book is based on material developed for the courses in mathematical finance of the Mathematical Sciences Foundation, Delhi. (MSF's website is www.mathscifound.org.) These courses are mainly aimed at undergraduates, but also attract students of professional courses as well as those in employment. They cover portfolio analysis and financial derivatives, with the highlights being the Markowitz Model, the Capital Asset Pricing Model and the Black–Scholes approach to options pricing. The students come from a wide variety of backgrounds, so the required mathematics is taught in parallel with the material on finance. We emphasise hands-on work, with extensive lab sessions as well as student projects in which theory is applied to (and tested by) real-life data.

I have tried to retain the flavour of the MSF programs in that the book should be accessible to undergraduates and others of varying backgrounds. (Exposure to basic calculus and probability is all that is required. No prior knowledge of economics or finance is needed.) For those who have not taken any probability or calculus after high school, the required mathematics is described in fair detail in the Appendices. The book is peppered with examples that use real-life data to ground the theory. Exercises are also scattered through the text; their purpose varies from simple practice in applying formulas to extending the ideas in the text to new situations.

The numbering of the exercises and examples needs some explanation. Exercise 1.4.2 will be found in Chapter 1, Section 4. Examples share the same numbering scheme with Exercises, so that Example 1.4.3 is found in Chapter 1, Section 4, just after Exercise 1.4.2.
As you read the book, you will notice that not many references have been provided to the original sources. One reason is that it is not always clear who first had a certain idea, or how much credit should be given to the person who put it in its final or most popular form. For instance, a technique may be used, perhaps implicitly, by traders and investors long before it gets academic treatment and acquires a provenance. A large and contentious book could be written on the many claims to originality (one already has been: Rubinstein [42]).

The best thing would be for you to follow up this book with one or more of the detailed texts on finance, for example, Bodie, Kane and Marcus [7], Brealey and Myers [8], Cuthbertson and Nitzsche [16], Hull [26], Luenberger [28], and Sharpe, Alexander and Bailey [45]. For more on Mathematical Finance, the following books are at about the same level as this one, but with varying choices of coverage: Capinski and Zastawniak [9], Ross [40], and Stampfli and Goodman [48]. To read more advanced texts, you would need to become proficient in stochastic calculus. A gentle start in this direction is provided by Mikosch [33], while the two volumes of Shreve [46, 47] are more advanced and comprehensive.

I am grateful to my colleagues at MSF for support and inspiration in countless ways. Perhaps the most striking is their commitment to creating innovative teaching and research programs centered around the interaction of mathematics with all aspects of the world we live in. I particularly thank Professor Dinesh Singh, Director, MSF, for inviting me to take part in MSF activities and for sharing his vision of mathematics. Professor Sanjeev Agrawal has been a constant source of ideas, advice, and energy. Other colleagues who have helped me refine my thoughts on finance are Charu Sharma, Divya Beri, Jatin Anand, Niteesh Sahni and Ziaur Rehman.
At Universities Press, I must thank its Director, Madhu Reddy, and editors Shubashree Desikan and Sreelatha Menon for encouragement, advice, and gentle prodding. Thanks are due to the referee for pointing out various ways of improving the text.

Two old friends have played a special role in this story. Surajit Basu added to the long list of kindnesses he has done me by commenting on an early draft. Adnan Aziz rode in like the proverbial white knight just before publication and saved me from an army of ambiguities and omissions. I appeal to the reader to help root out those that remain by writing to me at amber@mathscifound.org.

And, finally, my most heartfelt thanks to my wife Abha and son Zafar for continually showing me new ways of looking at life.

Amber Habib
Mathematical Sciences Foundation
New Delhi

List of Notation
β, B(n,p), Cov[X,Y], δ_C, DFW, DM, e, ≐, E[X], f_{S,T}, F_X, f_X, f_{X,Y}, γ_C, μ, ∇f, N(μ,σ), Ω, Φ, ℙ, p*, R, R², r, r_eff, ρ, ρ_C, S, σ, σ_X, σ_I, σ_X², σ_XY, S², s_T, S_XY, Θ_C, Var[X], V_C, X, x⁺, x_q

1 Basic Concepts

The aim of finance is to explore how money should be invested. Imagine that you have inherited a large sum of money from a rich uncle. The sum is so large that even when you have satisfied your immediate needs, you still have a considerable amount left over. What can you do with it? Some typical responses are:
1. Put it in a savings account.
2. Put it in a fixed deposit.
3. Buy bonds.
4. Buy shares.
5. Invest in a mutual fund.
6. Buy gold.
7. Buy real estate.
Of course, there are many other possibilities, but let us start with just these. The question that arises is: in which of these should we put our money? This naturally depends on how well the nature of each investment matches our requirements.
For example, the advantage of a fixed deposit over a savings account is that the former pays a higher rate of interest. The savings account, on the other hand, allows you constant access to your money, while a fixed deposit requires the money to be left with the bank for a set time such as three months or a year.

A bond provides regular payments over a set time period in return for an initial payment and is thus rather like a fixed deposit. Many bonds can be traded during their lifetime, and this provides additional flexibility to the investor. Bonds are also issued by companies, not only banks, and typically offer higher gains than fixed deposits. But there is a downside: if the company hits sufficiently bad times, it may not be able to meet its obligations, and the investor may not receive the promised payments or even get the initial payment back. In other words, the investor faces default risk, though only in exceptional circumstances. Risk also enters the picture if the investor wishes to sell the bond before it matures, since the price would be affected by prevailing market conditions and the perceived financial stability of the issuer.

Risk comes into even more prominence when we consider the remaining possibilities for investing. Share prices, for instance, vary greatly from day to day. Even if we invest in a company with an excellent record, there is no guarantee that we will gain by owning its shares over the next few months. On the other hand, if we hold on to the shares for many years, we have a good chance of making a handsome profit. It used to be thought that bonds provide the optimal way to do well in the long run, say over 20 or 30 years. The current opinion, however, is in favour of shares, provided one invests in a diverse collection of companies and thus reduces the possible loss due to one or more of them doing badly.
Mutual funds, which distribute the investor’s money over such a collection, cater to the investor who wants steady long-term growth. The investor who wishes to make money quickly would invest in just a few shares that he believes are going to do exceptionally well in the immediate future. Such an investor would naturally be exposed to high levels of risk.

RISK AND PROFIT
Our discussion has brought forth some aspects of risk and profit. In this book we shall investigate these in greater detail. The main task is to quantify the relationship between risk and profit, so that we can make well-informed and precise decisions.

The initial problem is to figure out the ‘correct’ price for a product, by which we mean a price that satisfies both buyer and seller. We will refer to it as the value of the product. The products, by the way, could be anything from commodities like cars or wheat, to bonds and shares, or even contracts about future transactions. We will use the generic term asset for the products being traded. A collection of assets will be called a portfolio.

Beyond pricing, the main decision is which assets to invest in. Naturally, we would like to invest in ones whose value seems likely to increase at a faster rate. It is almost a law of Nature, however, that bigger promises are also less reliable. In fact, less reliable promises must be bigger if they are to have any takers. Thus, there is a tradeoff between expected profit and risk: to aim for higher profit, the investor must undertake greater risk. The fundamental problem in finance is to understand the relationship between risk and profit.

The word risk is used in finance in a special way. It refers to uncertainty and does not necessarily have a purely negative connotation. Thus, consider the choice between putting money in a bank account or using it to buy shares in a company.
The second investment is riskier because it has more uncertainty, but it is not obvious how its worth compares with that of the first one. Lotteries provide an extreme instance of high-risk investments which are nevertheless popular.

Figure 1.1: Long-term behaviour of the return from some asset classes in the US over the years 1926–1999. (Data from Stocks, Bonds, Bills and Inflation 2002 Yearbook, Ibbotson Associates. Used with permission from Morningstar, Inc.) Note how the classes with greater mean return are also the ones with greater fluctuation. Treasury bills, which are considered essentially risk free, barely outperform inflation!

PROBABILITY
This discussion leads us to the role of probability in finance. The relative worth of an investment depends on the probabilities of the possible pay-offs. If higher pay-offs are perceived as more likely, its value should increase. For example, if we can model the fluctuations in the prices of a stock, we can assign probabilities to the possible pay-offs from buying that stock, and thus estimate its value to the investor.1 Specifically, we treat the future profit as a random variable. Its expectation then represents the expected profit, while its standard deviation represents fluctuations and hence risk. (See Figures 1.1 and 1.2.)2

Figure 1.2: This diagram considers the 65 stocks making up the Dow Jones Composite Index and their weekly profits over the one-year period ending November 6, 2006. The mean profit (per dollar invested) is plotted on the vertical axis, and the standard deviation of the profits (representing risk) is plotted on the horizontal axis. The curve has been drawn to emphasise the absence of stocks with high mean profit but low risk; this indicates that higher mean profit requires greater risk.

RISK-FREE ASSETS
Some assets can be viewed as free of risk. For instance, deposits in banks and bonds bought from governments are typically treated as risk-free.
Of course, both banks and governments can collapse, but such instances are rare. We shall soon see that there is good reason to expect that all risk-free assets will gain in value at the same rate, and we may therefore talk of the risk-free rate of growth. This rate is not universally fixed, but varies with market and time.

PORTFOLIOS
So far, we have considered individual assets. To design a portfolio, we need to consider not only the individual characteristics (regarding profit and risk) of the assets, but also their relationships with each other. Two assets could be linked together in certain ways; for instance, they may show a tendency to rise or fall in value together. Alternatively, one may tend to move in the opposite direction to the other. In the latter case, a rise in one would be offset by a fall in the other, and a portfolio consisting of both these assets would be less risky than a portfolio consisting of only one of them! By combining assets in various ways, we can tailor a portfolio to satisfy the risk preferences of any investor.

HEDGING
The process of reducing risk by combining assets appropriately is called hedging. By hedging we reduce risk and, therefore, also lower our expected profit. One of the goals we will pursue in this text is to see how to hedge against specific risks to which a portfolio is exposed, for instance fluctuations in the prices of stocks or in interest rates. If the hedging is complete, no risk will remain, and the portfolio will grow slowly at the risk-free rate. Therefore, we will also consider how to hedge to the right extent, so that the remaining risk just falls within acceptable levels and the portfolio is able to grow at a faster rate.

In the latter part of this book we will consider the financial instruments known as derivatives. Derivatives are contracts that fix the terms for future trades.
A simple example is a contract that binds two parties to a sale of crude oil, six months from now, at a price of $50 per barrel.3 A prime use of derivatives is to reduce uncertainty about future expenses (or profits), and they have become a very popular means of hedging. The creation and pricing of suitable derivatives is a major focus of modern finance.

1.1 ARBITRAGE
Arbitrage is the making of profit without undertaking risk. It can be earned, for instance, when a product is being sold at different prices in different markets. Then risk-free profit can be made by buying it where it is cheaper and selling it where it is costlier. A variation is when the different prices are at different times, so that it is possible to buy today at a low price and sell some days later at a higher price. For this profit to qualify as arbitrage, however, it must be absolutely certain beforehand that the price will go up.

Exercise 1.1.1 Consider the following situations. Is it possible to exploit them so that profit is certain?
1. A kind relative offers to sell you a share whose value has gone up by at least 15% every year for 50 years.
2. A valid lottery ticket is lying on the road.
3. Horses A and B race against each other. If you bet a rupee on horse A and it wins, you get back Rs 2. If you bet a rupee on horse B and it wins, you get back Rs 4. If you bet on a horse that loses, you lose your money.

Here is an early use of the term arbitrage in our sense, occurring in a description of stock and derivative trading in eighteenth-century Holland:
‘There are other arbitrages and other profitable combinations independent of gambles or events, which are executed by combining 2 or 3 simultaneous transactions.’
Traité de la Circulation et du Crédit by Isaac de Pinto, 1771. (Quoted in Poitras [36])

Deciding whether a profit has been made can be tricky. If you invest Rs 10 and after a while it becomes Rs 20, you may feel you have made a profit of Rs 10.
Suppose, however, that you are based in another country and count your gains in dollars. If, in the given time period, the value of the rupee in terms of dollars falls sufficiently far, you will perceive a loss rather than a gain. Again, suppose that in the same period a rupee put in a savings account would have more than doubled. Then it would be difficult to see the first investment as truly representing a profit.

A simple way to resolve these ambiguities is to demand that arbitrage be carried out without investing your own money (essentially, this means you start by borrowing some money and then pay off the loan by the end). If you start with zero and end up with something, you have definitely made a profit.

A basic principle is that arbitrage opportunities are short-lived: prices evolve in such a way as to eliminate them. For, as soon as it is realised that a product is under-valued and is creating an arbitrage opportunity, investors will rush to buy it. This will drive up its price, reducing and ultimately eliminating the arbitrage opportunity. Similarly, if the opportunity arises from an over-priced product, there will be a rush to sell it, and this will drive its price down.

Reflecting on this process, we are led to the formal definition of arbitrage. We start by noting that the amount of profit does not have to be known beforehand: it is enough to know that it cannot be negative and has a chance of being positive. This will suffice to attract investors and initiate the stabilisation process described above.

Definition 1.1.2 An investment strategy is said to lead to arbitrage if:
1. It does not involve an initial investment of the investor’s own money.
2. It is known that at some future time the investment will have a value which is definitely non-negative and additionally has a nonzero probability of being strictly positive.

Exercise 1.1.3 Which of the following situations provides an arbitrage opportunity?
1. A guarantee that in return for Rs 10 paid now, Rs 20 will be returned after ten years.
2. A guarantee that in return for Rs 10 paid now, Rs 20 will be returned tomorrow.
3. A lottery ticket.
4. A free lottery ticket.
5. Bank A loans money at an annual interest rate of 10%, while bank B pays 15% interest annually on deposits.
6. Bank A loans money at an annual interest rate of 15%, while bank B pays 10% interest annually on deposits.

How long an arbitrage opportunity lasts depends on the communication within the market. The better it is, the faster investors will react to the situation and eliminate the opportunity. Thus, in the idealised situation of an efficient market, in which communication is instantaneous and complete, arbitrage opportunities will die immediately. This is our main assumption and is used throughout our text. Its brief statement is:

No Arbitrage Principle: In an efficient market, there are no arbitrage possibilities.

The No Arbitrage Principle is a surprisingly powerful tool for establishing the ‘correct’ price of a product and underlies every important result in this book. It seems to have been first noticed by Louis Bachelier in his doctoral thesis in 1900, when he described ‘transactions in which one of the parties makes a profit at all prices’ and also noted that ‘these are never found in practice.’4 Its systematic use in modern finance, however, was initiated by Franco Modigliani and Merton Miller in the 1950s. Both received the Nobel Prize in economics: Modigliani in 1985 and Miller in 1990.

1.2 RETURN AND INTEREST
Consider an asset whose value evolves from $V_0$ at an initial time $t = 0$ to $V_T$ at a later time $t = T$. Then the return from this asset over the time interval $[0, T]$ is defined to be
$$\text{Return} = V_T - V_0.$$
The rate of return is defined by
$$\text{Rate of return} = \frac{V_T - V_0}{V_0}.$$
Commonly, one also writes ‘return’ for ‘rate of return’. Confusion is avoided by noting that return has a currency as unit, while the rate of return is unit-free.
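As a small illustration of these definitions, the sketch below computes the return $V_T - V_0$ and the rate of return $(V_T - V_0)/V_0$ in Python. It also applies the rate-of-return formula period by period to a price series and reports the mean and standard deviation of those rates, which is one common way of estimating from data the expected profit and the risk discussed under PROBABILITY. The function names and the price series are hypothetical, chosen only for illustration, and using the sample (rather than population) standard deviation is our assumption.

```python
from statistics import mean, stdev


def simple_return(v0: float, vt: float) -> float:
    """Return over [0, T]: V_T - V_0, measured in currency units."""
    return vt - v0


def rate_of_return(v0: float, vt: float) -> float:
    """Rate of return over [0, T]: (V_T - V_0) / V_0, a unit-free number."""
    if v0 == 0:
        raise ValueError("initial value V_0 must be nonzero")
    return (vt - v0) / v0


def period_rates(prices: list[float]) -> list[float]:
    """Rate of return over each pair of consecutive observations."""
    return [rate_of_return(a, b) for a, b in zip(prices, prices[1:])]


if __name__ == "__main__":
    # Hypothetical investment: Rs 10 grows to Rs 20.
    print(simple_return(10, 20))   # 10 (rupees)
    print(rate_of_return(10, 20))  # 1.0, i.e. 100%

    # Hypothetical weekly closing prices of a single stock.
    prices = [100.0, 102.0, 101.0, 104.0, 103.0]
    rates = period_rates(prices)
    # Sample mean: an estimate of expected profit per rupee invested.
    # Sample standard deviation: an estimate of risk.
    print(f"mean = {mean(rates):.4f}, std dev = {stdev(rates):.4f}")
```

The two functions mirror the two meanings of ‘return’: one carries a currency unit, the other does not, which is why only the second can be meaningfully compared across investments of different sizes.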
We will often express the rate of return as a percentage. Thus the phrase ‘The return was Rs 10’ refers to the first definition, while ‘The return was 10%’ refers to the second and conveys that the rate of return was 0.1.

INTEREST
We will call the income from an investment interest if it is earned regularly and in a predetermined manner, without risk. (This is a rather narrow use of the word, and we employ it for clarity at this initial stage.) Interest can be calculated according to different conventions. Consider a starting amount $P$ (called the principal) on which interest is earned over a time period $T$. The amount of interest earned is determined by the rate of interest, denoted $r$, in accordance with the adopted convention. The rate $r$ is given relative to some time interval, called its period. The most commonly used period is one year, in which case the rate is called annual. Rates are commonly given as percentages, which have to be converted to fractions for calculations.

SIMPLE INTEREST
In simple interest, the interest earned over one period is not added to the principal (e.g., it may be returned to the investor), and further interest is again earned on the principal alone. Thus, if $P$ is invested at a rate of interest $r$, the amount after one period is
$$A = P + Pr = P(1 + r).$$
During the second period, interest is again earned on $P$ alone, so that the amount after two periods is
$$A = P(1 + r) + Pr = P(1 + 2r).$$
This calculation easily extends to the general case:

Theorem 1.2.1 Suppose simple interest is earned on an investment $P$ at a rate $r$ over $n$ periods. Then the final amount $A$ is
$$A = P(1 + nr).$$
□

Example 1.2.2 A common example of simple interest is the provision of certain types of fixed deposits by banks. The interest earned on the money in such a fixed deposit account is returned to the investor, so that future interest is earned on the original principal alone.
□

DISCRETE COMPOUND INTEREST
In compound interest, interest earned over one period is added to the principal and earns interest in subsequent periods. If an amount $P$ is invested at a rate $r$, then the amount after one period is $A = P(1 + r)$, just as for simple interest. However, in the second period $P(1 + r)$ serves as the principal, so that the amount after two periods is
$$A = P(1 + r)(1 + r) = P(1 + r)^2.$$
The general case is again easy to obtain:

Theorem 1.2.3 Suppose compound interest is earned on an investment $P$ at a rate $r$ over $n$ periods. Then the final amount $A$ is
$$A = P(1 + r)^n.$$
□

Sometimes the period for which the rate is quoted is not the same as the interval at which interest is compounded. For instance, the rate may be given as an annual one, while the interest is calculated every 6 months. In this situation, the rate is adjusted linearly, as in the following example.

Example 1.2.4 Suppose you invest Rs 10,000 for one and a half years at an annual rate of 10% with semi-annual compounding (that is, the compounding is every six months). Then interest is calculated for each six-month period at half the annual rate, i.e., at 5%. Therefore, over one and a half years, the invested amount becomes
$$A = 10{,}000 \times 1.05^3 = 11{,}576.25.$$
□

If interest is compounded $m$ times during the period of the rate, then the rate per compounding period is set to $r/m$, and so we have the following result.

Theorem 1.2.5 Suppose compound interest is calculated with a rate $r$ and is compounded $m$ times per period. Then, over $n$ periods an investment $P$ grows into an amount $A$ given by
$$A = P\left(1 + \frac{r}{m}\right)^{mn}.$$
□

Example 1.2.6 Savings accounts in banks provide an example of compound interest, since the interest earned on the amount in the account is fed back into the account. □

Exercise 1.2.7 Suppose you take a loan of Rs 1000, and have to pay it back in two equal and equally spaced installments over a year. The annual rate of interest applied to this loan is 15% and the interest is compounded semi-annually.
a. What will be the size of each installment?
b. How much of each installment will go toward the principal and how much toward the interest? (Assume that each payment has to pay off the outstanding interest at the time.)

CONTINUOUS COMPOUND INTEREST
Consider a bank offering interest compounded annually at a rate $r$. Suppose it allows an investor who withdraws his money at a time $t$ before one year to earn interest at a linearly adjusted rate of $rt$. For example, an investor can withdraw his investment of $P$ after 6 months, together with the interest earned. It would total $P(1 + r/2)$. He can then immediately reinvest it for another 6 months. This strategy nets him a final amount of
$$A = P\left(1 + \frac{r}{2}\right)\left(1 + \frac{r}{2}\right) = P\left(1 + r + \frac{r^2}{4}\right),$$
which is slightly better than the $P(1 + r)$ he would have had if he had just let the money sit in the bank for the whole year.

An investor who can create this strategy will certainly think of pushing it further by using smaller and smaller investment periods. In general, if he withdraws and reinvests $m$ times, he will end up with
$$A = P\left(1 + \frac{r}{m}\right)^m.$$
For example, if $P = 100$ and $r = 10\%$, we have

m = 2 ⟹ A = 110.250
m = 4 ⟹ A = 110.381
m = 12 ⟹ A = 110.471
m = 52 ⟹ A = 110.506
m = 365 ⟹ A = 110.516

The larger the value of $m$, the greater is his profit. This naturally leads to considering the limit case $m \to \infty$. To evaluate this limit, we need to recall the number $e$, which is called Euler’s number and is defined by
$$e = \lim_{m \to \infty}\left(1 + \frac{1}{m}\right)^m.$$
Euler’s number is approximately 2.71828. Now we can calculate:
$$\lim_{m \to \infty} P\left(1 + \frac{r}{m}\right)^m = P\left[\lim_{m \to \infty}\left(1 + \frac{r}{m}\right)^{m/r}\right]^r = Pe^r.$$
This suggests creating a new kind of interest, calculated by $A = Pe^r$ for a single period. Interest calculated according to this formula is said to have been continuously compounded, and $r$ is called the continuously compounded rate of interest.

Theorem 1.2.8 Suppose continuously compounded interest is calculated with a rate $r$ per period. Then, over $n$ periods an investment $P$ grows into an amount $A$ given by
$$A = Pe^{nr}.$$
□

Again, if we take $P = 100$ and $r = 10\%$, then the continuously compounded amount after a year is $A = 100e^{0.1} = 110.517$, while daily compounding over a year gave 110.516. Thus daily compounding is barely distinguishable from continuous compounding!

INTEREST AT ARBITRARY TIMES
We have been considering the interest earned when money is invested for a full time period, or for $n$ full time periods. Now we look at what happens when an investor withdraws her money at some intermediate time. In particular, let the investor withdraw her money after a time $T$ which consists of $n$ full time periods and a final fraction $t$ of a time period (so $T = n + t$, with $0 \le t < 1$). The convention we will adopt is that during the fractional period the rate of interest is adjusted linearly to $rt$. (Note that this is consistent with the convention used when the period for the quoted rate differs from the compounding period.) Then, over the time $T$, the invested amount becomes:

Simple interest: $A = P(1 + nr) + Prt = P(1 + rT)$
Discretely compounded interest: $A = P(1 + r)^n(1 + rt)$
Continuously compounded interest: $A = Pe^{nr}e^{rt} = Pe^{rT}$

The formulas for simple and continuously compounded interest are mathematically simple, while that for discrete compounding is slightly more complicated. If we plot $A$ versus $T$, then for simple interest we get a straight line, for discretely compounded interest a sequence of straight line segments with increasing slope, and for continuously compounded interest a smooth exponential curve (Figure 1.3).

Figure 1.3: This diagram shows the growth of Rs 100 under the different kinds of interest, each with $r = 10\%$, over a period of 10 years.

Exercise 1.2.9 Consider a bank that offers to double your investment in 10 years. What is the corresponding annual rate of interest if we assume the interest is:
a. simple
b. compounded annually
c. compounded continuously?

Continuous compounding is mathematically pleasant in another way.
Withdrawing and reinvesting becomes just the same as making a single long investment, since $e^{rT_1}e^{rT_2} = e^{r(T_1 + T_2)}$. Moreover, if continuous compounding is used, the same $r$ can be used for borrowing and lending without creating arbitrage opportunities. This is not the case with discrete compounding.

Exercise 1.2.10 Suppose a bank has fixed an annual 5% discretely compounded interest rate for both deposits and loans. Show that this creates an arbitrage opportunity.

Exercise 1.2.11 Given that continuous compounding has nicer behaviour than discrete compounding, can you explain why financial institutions use the latter?

Let us now consider the possibility of different interest rates being available in the market. The differences could be of various types:
1. Use of different types of interest (simple, or compound with different periods).
2. Different rates offered by different financial institutions.
3. Different rates for deposits and loans.
4. Different rates for investments of different time durations.

Consider the first kind. Institutions typically use discretely compounded interest, so the difference lies in the frequency of compounding. The same value of $r$ with more frequent compounding leads to more interest being earned, so institutions that compound more frequently would also use slightly lower values of $r$. Continuous compounding is used more in mathematical modelling, and such models would use a value of $r$ that is essentially equivalent to the one used with discrete compounding in the real world.

EFFECTIVE RATE OF INTEREST

One way to reduce the confusion arising from different kinds of interest is to calculate, for each, the amount of interest it earns over one year. In other words, we calculate the annually compounded interest rate that would generate the same amount of interest. Thus, suppose that a principal $P$ has grown to an amount $A$ by earning interest over a year. Then the interest earned is $A - P$.
The corresponding effective rate of interest is defined to be the interest earned in one year per unit invested:

$$r_{\text{eff}} = \frac{A - P}{P}.$$

We can expect the effective rates of the various available interest-earning schemes to be the same. The quoted rate of interest is called the nominal rate, to distinguish it from the corresponding effective rate.

Example 1.2.12 Consider a nominal rate $r$ of 10% annually. If this is used with annual compounding, then the corresponding effective rate is again 10%. If the compounding is semi-annual (every 6 months), the effective rate becomes $(1 + 0.1/2)^2 - 1 = 0.1025$, i.e., 10.25%. Finally, if continuous compounding is used, the effective rate is $e^{0.1} - 1 = 0.1052$, or 10.52%. □

Exercise 1.2.13 A credit card offers a cash withdrawal facility at a “low” monthly rate of 2%. What is the corresponding effective annual rate?

Exercise 1.2.14 Consider two investments A and B of the same amount, and at the same effective annual interest rate. Suppose A earns semi-annually compounded interest and B earns continuously compounded interest.
a. Which one earns more interest if the period of the investment is 6 months, 9 months, or 1 year?
b. Suppose the invested amount is Rs 1000, and the common effective rate is 10%. What is the maximum difference in the interests earned by A and B at any point during the first 6 months? (The answer is quite small, so use a good number of decimal places in your calculations.)

Let us now look at the other kinds of variation in interest rates listed above. The second kind of variation can be expected to be small due to competition. A bank offering lower interest on deposits than its competitors would soon start losing customers and would have to raise its rates.⁵ The third kind certainly exists: a bank will offer lower interest on deposits than it will charge on loans. However, it should be noted that only the first of these rates (the deposit rate) can reasonably be seen as risk-free.
The second rate (the loan rate) involves a risk taken by the bank, which explains why it is higher.

Exercise 1.2.15 Show that the No Arbitrage Principle rules out a bank offering higher interest on deposits than on loans.

The fourth kind of difference is quite important and we will consider it in detail later (§2.8). As a general (but not universal) rule, investments for longer time durations are granted higher interest rates. The idea is that such an investment is exposed to more risk over its life and, to compensate, it must promise a higher profit.

Example 1.2.16 In April 2005, 6-month investments in Reserve Bank of India bonds were earning 5.4% interest, while 12-month investments were earning 5.6%. □

We can, therefore, expect the same effective interest rate for all risk-free investments of the same duration. Thus, we may (and do) talk of a common risk-free rate that applies to all risk-free investments over the same time period.

1.3 THE TIME VALUE OF MONEY

Consider two offers: the first promises you one rupee right away and the other, after a month. Assuming both offers are from trustworthy sources, do you have any reason to prefer one to the other? The simple answer is that it is better to get the money early, as it can be put in a bank to start earning interest. This example illustrates the important idea that the value of a transaction involves not only an amount of money but also the time at which it is undertaken.

PRESENT AND FUTURE VALUE

We have observed that holding a rupee now is not the same as holding it a year from now. What, then, is the precise difference between the two? It depends on how much the rupee could have earned in a year by means of interest.

Example 1.3.1 Suppose a rupee can be invested at a simple annual rate of 5%. Then, after a year, it becomes 1.05 rupees. In this situation, earning a rupee now is equivalent to earning 1.05 rupees after a year. □

In the above example, Rs 1.05 is the future value of Rs 1.
Conversely, Rs 1 is the present value of Rs 1.05.

In general, consider two amounts $P$ and $F$ that exist at times $t_1$ and $t_2$ respectively, with $t_1 < t_2$. Let the risk-free rate of return over the interval $[t_1, t_2]$ be $r$. Then we call $P$ the present value of $F$ (at $t_1$) if

$$P = \frac{F}{1 + r}.$$

Conversely, $F$ is the future value of $P$ (at $t_2$). The factor $C = 1/(1 + r)$ is called the discount factor for this period. If the risk-free rate is given in terms of interest, then the corresponding discount factor can be calculated as follows:
1. Suppose the annual interest rate is $r$, with compounding being done $m$ times a year (after equal periods of time). Then the discount factor over $n$ periods is $C = \dfrac{1}{(1 + r/m)^n}$.
2. If the interest is compounded continuously, then the discount factor over $T$ years is $C = e^{-rT}$.

The following cannot be over-emphasised: amounts of money existing at different times must not be compared or combined without taking into account the relevant discount factors.

INFLATION

Another way that the value of money changes through time is with respect to its purchasing power. Typically, the same amount of money can buy less and less as time progresses; this phenomenon is known as inflation. Inflation shows up as a general increase in prices. Occasionally, prices may fall, and then we have deflation. But inflation is the general trend. By averaging the rise in prices over various commodities, one can arrive at a single number, the rate of inflation $f$, which represents the annual decrease in the purchasing power of a unit of currency:

$$\text{Purchasing power after 1 year} = \frac{1}{1 + f}.$$

If an amount $A$ is invested at the risk-free rate $r$ for a year, then the effective amount one has after a year is $\dfrac{1 + r}{1 + f}A$, and so we may talk of the real risk-free rate $r'$ defined by

$$1 + r' = \frac{1 + r}{1 + f}, \qquad\text{or}\qquad r' = \frac{r - f}{1 + f}.$$

Exercise 1.3.2 If continuous rates are used for inflation as well as risk-free growth, show that the real risk-free rate is given by $r' = r - f$.
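The compounding formulas of §1.2 and the discount factors just listed are easy to check numerically. The following is a minimal Python sketch, not a prescribed implementation: the function names are hypothetical, and the figures P = 100 and r = 10% are the ones used in the text. It recomputes the table of amounts for m = 2, 4, 12, 52, 365, the continuously compounded amount, and the effective rates of Example 1.2.12.

```python
import math


def discrete_amount(P, r, m, years=1):
    """Amount after `years` years when the annual rate r is compounded m times a year."""
    return P * (1 + r / m) ** (m * years)


def continuous_amount(P, r, T):
    """Amount after time T under continuous compounding: P * e^(rT)."""
    return P * math.exp(r * T)


def effective_rate_discrete(r, m):
    """Effective annual rate of a nominal rate r compounded m times a year."""
    return (1 + r / m) ** m - 1


def effective_rate_continuous(r):
    """Effective annual rate of a continuously compounded rate r."""
    return math.exp(r) - 1


def discount_factor_discrete(r, m, n):
    """Discount factor over n compounding periods, each 1/m of a year long."""
    return 1 / (1 + r / m) ** n


def discount_factor_continuous(r, T):
    """Discount factor over T years under continuous compounding: e^(-rT)."""
    return math.exp(-r * T)


if __name__ == "__main__":
    # Table from the text: P = 100, r = 10%, compounded m times in one year.
    for m in (2, 4, 12, 52, 365):
        print(m, round(discrete_amount(100, 0.10, m), 3))
    print("continuous", round(continuous_amount(100, 0.10, 1), 3))
    # Example 1.2.12: semi-annual and continuous effective rates for r = 10%.
    print(round(effective_rate_discrete(0.10, 2), 4), round(effective_rate_continuous(0.10), 4))
```

Each discount factor here is the reciprocal of the corresponding growth factor, which is exactly the relation between present and future value described above.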
Example 1.3.3 The Reserve Bank of India calculates inflation rates from the Wholesale Price Index (WPI), which tracks the price of a certain varied mix (or ‘basket’) of commodities. The RBI also maintains sub-indices for food articles, manufactured articles, and so on. The table below shows the WPI during 1993–99:

For example, the inflation rate over 1994–95 would be calculated as follows. Since the price of the basket rose from 100 to 112.6, we could say that the purchasing power of the rupee changed by a factor of 100/112.6 = 0.888. The rate of inflation can now be obtained:

$$\frac{1}{1 + f} = \frac{100}{112.6} = 0.888 \implies f = 0.126, \text{ or } 12.6\%.$$

We can repeat these calculations to obtain the inflation rate for each period:

□

Exercise 1.3.4 If an investment (in rupees) earned an annually compounded 8% interest throughout the period 1993–99, what would be the return from it in real terms (i.e., in terms of purchasing power)?

In this book, we will not worry about inflation. The above discussion shows that, if necessary, inflation can be taken into account by a suitable modification of the risk-free rate. Indeed, the fact that inflation applies to all amounts of money makes it less relevant to financial choices than we might expect. For example, consider a choice between the following offers:
1. Rs 100 now
2. Rs 110 a year from now

Suppose the available interest rate is 8%. Then the first amount grows to Rs 108 over a year, which is a bit less than Rs 110. Most people, however, still prefer the first choice. They argue that the Rs 110 will also be subject to inflation and so they would prefer to have the Rs 100 now. Yet the point to remember is that present value is not just a theoretical notion: it has practical implications. If we are sure of receiving Rs 110 in a year, we can borrow its present value against it today. In this case that amounts to 110/1.08 = Rs 101.85. In other words, the second offer is equivalent to an offer of Rs 101.85 today, and so it wins no matter what the rate of inflation is.
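The inflation arithmetic of Example 1.3.3 and the real-rate formula can be sketched in Python as follows. The function names are illustrative assumptions; rates are treated as annual and discretely compounded, except in `real_rate_continuous`, which follows Exercise 1.3.2.

```python
import math


def inflation_from_index(old_index, new_index):
    """Inflation rate f implied by a price index rising from old_index to new_index.

    Purchasing power changes by the factor old_index / new_index = 1 / (1 + f).
    """
    return new_index / old_index - 1


def real_rate(r, f):
    """Real risk-free rate for annual (discrete) rates: 1 + r' = (1 + r) / (1 + f)."""
    return (r - f) / (1 + f)


def real_rate_continuous(r, f):
    """With continuous rates, e^(r') = e^r / e^f, so r' = r - f (Exercise 1.3.2)."""
    return r - f


def present_value(F, r, years=1):
    """Present value of an amount F received after `years` years at annual rate r."""
    return F / (1 + r) ** years


if __name__ == "__main__":
    f = inflation_from_index(100, 112.6)        # WPI step from Example 1.3.3
    print(round(f, 3), round(1 / (1 + f), 3))   # inflation and purchasing-power factor
    print(round(real_rate(0.08, f), 4))         # an 8% nominal rate against this inflation
    print(round(present_value(110, 0.08), 2))   # the Rs 110 offer, valued today
```

For the WPI step from 100 to 112.6 this should give f ≈ 0.126 and a purchasing-power factor of about 0.888, and the Rs 110 offer should come to about Rs 101.85 today, as in the text.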
1.4 BONDS, SHARES AND INDICES

Bonds and shares are the principal means by which institutions raise money for their operations, and hence they provide the chief avenues for investment.

A bond is a contract written by a company or government (called its issuer). The purchaser of the bond makes an immediate payment to the issuer and, in return, is entitled to a predetermined number of regular payments in the future. Thus, a bond is essentially a loan. Institutions use bonds to raise money when the amount needed is too large to be obtained from a single source. Investors use bonds as relatively safe investments providing a higher rate of return than a simple deposit in a bank. Judiciously used, they can provide insulation from interest rate changes as well. (We will take up this application in §2.9.)

A share represents a part of the capital of a company. Its owner is thus a part-owner of the company and shares in its fortunes. The share may give its owner certain voting rights in the affairs of the company. Many companies also make regular payments, called dividends, to their shareholders out of their profits. The term stock is also used for a share. Another usage of the word stock is for the total capital represented by the shares, or the total market capitalisation (TMC) of the company.

A stock index is a hypothetical portfolio used for keeping track of the general trends in the market. Here are some examples of stock indices from India:⁶
1. BSE Sensex is based on 30 stocks forming a sample of large, liquid and representative companies listed on the Bombay Stock Exchange (BSE).
2. S&P CNX Nifty consists of the 50 stocks with the highest TMC on the National Stock Exchange of India (NSE). It represents about 60% of the TMC on the NSE.
3. S&P CNX 500 consists of 500 stocks and covers 91% of the total turnover on the NSE and 92% of its market capitalisation.
4. CNX Realty Index represents 85% of market capitalisation and 88% of turnover in the real estate sector.
Here are some international examples:
1. Standard and Poor’s 500 Index (S&P 500) is based on 500 US stocks. It covers about 70% of the total TMC and 78% of the total traded value.
2. Dow Jones Industrial Average (DJIA) consists of 30 of the largest public companies in the US. It was created in 1896, when it consisted of 12 companies (see Figure 1.4).
3. NASDAQ 100 consists of 100 of the largest companies (including non-US ones) listed on the NASDAQ exchange. It is relatively heavy in IT companies. Infosys is one of the current components of this index.
4. NASDAQ Composite (or just ‘the NASDAQ’) includes every company listed on the NASDAQ exchange, currently more than 3000.
5. NYSE Composite includes each of the over 2000 stocks listed on the New York Stock Exchange.
6. FTSE 100 consists of the top 100 companies in terms of TMC on the London Stock Exchange. It represents about 80% of the TMC on the London Stock Exchange.
7. Nikkei 225 is based on the Tokyo Stock Exchange.
8. Hang Seng Index consists of 39 companies listed on the Hong Kong Stock Exchange, comprising 65% of its TMC.

Figure 1.4: The Dow Jones Industrial Average Index (DJIA) between 1928 and 2008

Figure 1.5: In this diagram we plot the logarithms of the DJIA values and see that they have a strong linear trend. They also enable a clearer look at the relative size of fluctuations over a long term.

Of these various stock indices, the smaller ones (with 30–50 constituents) give a summary of some particular aspect of the economy, perhaps of its largest companies, or of those belonging to a particular sector. Larger ones attempt to portray the national economy as a whole. Recently, stock indices have been created that track entire continents or the whole world.

In this book we will consider two important uses of stock indices. One is as a benchmark against which other portfolios are measured. This aspect will be prominent in the applications of the Capital Asset Pricing Model.
Another, which we take up during our study of financial derivatives, is as a tool to hedge against the general movements of the economy.

1.5 MODELS AND ASSUMPTIONS

Any mathematical treatment of ‘real life’ must start by simplifying, and thereby distorting, it. A good choice of simplification is one that distorts less and yet leads to more insight. We may even accept a choice that we know is an over-simplification if it brings clarity to some essentials. Thus, the nature of mathematical modelling is quite different from what we generally expect of mathematics. We are less concerned with the validity of our deductions than with the value of the final result.

In this text, the main mathematical assumption is the No Arbitrage Principle, which rests on the assumption of an efficient market. In the real world, arbitrage opportunities do exist and, in fact, some financial institutions invest considerable effort in quickly detecting and exploiting them. The assumption, however, is almost indispensable because of the tremendous direction it gives to our work. It is further justified by the observation that arbitrage opportunities, when they exist, do not exist for long.

We have also ignored many other aspects of real markets. Foremost among these are the various costs associated with transactions: fees charged by exchanges and brokers, as well as taxes paid to governments. These are collectively known as friction, since they slow down the financial machine by lowering profits. Friction also has the effect of obliterating small arbitrage opportunities. The result is that the No Arbitrage Principle does not yield a finely balanced single ‘correct’ price but a range of valid prices.

Yet another issue is the linearity of prices, that is, the assumption that trading k units of an asset costs or yields k times the price of one unit. This holds only for small trades, since high-volume trades themselves affect prices.
If you attempt to offload a large amount of a particular stock at today’s price, your act will lower its price, and you will likely end up with less than you had hoped for. We have not taken up such complications in this book. The material in this book, therefore, constitutes only the first few steps on the path to acquiring a detailed understanding of finance. Yet these few steps are the most crucial and bring to you tools of great power and reach.

1 We shall look at specific stock price models in Chapter 5 of this book.
2 You can ignore this discussion for now if you are unfamiliar with the terms random variable, expectation, and standard deviation. However, you should start your study of the basics of probability (Appendix B) and familiarise yourself with these concepts as they will soon become essential.
3 A barrel of oil equals 159 litres. Between January and December 2008, the price of a barrel of crude oil rose from $85 to $122 (in June) and then fell all the way to $33!
4 The English translation is by Davis and Etheridge on page 24 of [17]. The seminal contributions of Bachelier are described in more detail in the introductory remarks of our fifth chapter.
5 The rates would have to be compared after taking into account any fees charged by the banks. A bank offering better service might also get away with a lower rate. Thus, in the real world, we only expect the differences to be small, not absent.
6 For more information, consult the websites of the National Stock Exchange of India (www.nse-india.com) and the Bombay Stock Exchange (www.bseindia.com).

2 Deterministic Cash Flows

A cash flow is a sequence $x_0, x_1, \ldots, x_n$ of cash transactions occurring at corresponding times $t_0, t_1, \ldots, t_n$ (we will always take $t_0 = 0$, the present time). Earnings are represented by positive signs, and expenditures by negative signs. For example, if a cash flow has annual transactions –1000, 400, and 700, then the first entry is an expenditure while the next two are earnings.
A cash flow may, for example, represent the earnings and expenses associated with a particular investment, or a portfolio of investments, or the entire cash transactions of a company. It may also represent a single transaction. An investor or a manager needs to know how to choose between competing investments or projects. To do this, she has to be able to summarise the corresponding cash flows by one or two numbers that characterise the profits associated with them. The final decision will be made on the basis of these numbers. We will see that there isn’t any one number that works universally. Instead, there are different methods of comparison catering to the different requirements that she may have.

In general, a cash flow is random, in that the entries are not known ahead of their time. We will start, however, by imagining that the entries are known in advance, i.e., the cash flow is deterministic. This is justified on the grounds that there are important cash flows which are indeed deterministic (for example, income from government bonds, which we shall study later in this chapter). Furthermore, when choosing between two projects, we calculate their projected earnings and expenses and, for the purpose of comparison, act as if these cash flows will indeed happen.

It is often convenient to depict a cash flow by a diagram like the one given in Figure 2.1. In this diagram, the times when the transactions occur are marked along the horizontal axis. The transactions are represented by vertical arrows, which point up for earnings and down for payments, while their lengths represent the amounts of cash involved.

Figure 2.1

2.1 NET PRESENT VALUE

Consider a cash flow $x_0, x_1, \ldots, x_n$ occurring at times $t_0 = 0, t_1, \ldots, t_n$. Let $C_i$ be the discount factor for the time interval $[t_0, t_i]$. Recall that the discount factor converts an amount at the end of the interval into its present value. Thus the present value of $x_i$ is $C_i x_i$.
The net present value or NPV of the cash flow is just the sum of the present values of all the entries:

$$\text{NPV} = \sum_{i=0}^{n} C_i x_i.$$

Owning this cash flow is equivalent to an immediate earning equal to its NPV. To see this, we split the NPV into its constituent parts, $C_0x_0 = x_0, C_1x_1, \ldots, C_nx_n$. We invest each part $C_ix_i$ in a risk-free asset for the corresponding time $t_i$. Then, over this time it grows into the amount $x_i$. Thus, the NPV is the exact amount needed now to generate the given cash flow. Hence one way to compare different cash flows is to compare their NPVs.

Example 2.1.1 Consider two projects with projected annual earnings as given below:

Year      0      1      2
A      –500   –100    700
B      –500    700   –100

Let the discount factors for the first and second years be 0.9 and 0.8 respectively. Then the net present values of A and B are given by

NPV(A) = –500 – 0.9 × 100 + 0.8 × 700 = –30,
NPV(B) = –500 + 0.9 × 700 – 0.8 × 100 = 50.

Thus, on the basis of NPV, project B is better than project A. If the time value of money had not been taken into account, the projects would have appeared identical, each showing a total expenditure of 600 and an earning of 700. □

An important special case is when the cash flow occurs at regular intervals (e.g., it may consist of annual payments), and the same risk-free rate can be applied to all its entries. Thus, suppose the risk-free rate is $r$ over each interval and we use discrete compounding. Then the discount factors are

$$C_i = \frac{1}{(1 + r)^i}, \qquad i = 0, 1, \ldots, n,$$

and so

$$\text{NPV} = \sum_{i=0}^{n} \frac{x_i}{(1 + r)^i},$$

where $x_0$ is earned at time zero, $x_1$ after the first interval, and so on.

Example 2.1.2 An annuity consists of annual payments of the same amount $A$. An annuity lasting for $n$ years would be represented by Figure 2.2.

Figure 2.2

The NPV of an $n$-year annuity, with the first payment after 1 year, can be calculated as follows (assuming a constant risk-free rate $r$):

$$\text{NPV} = \frac{A}{1 + r} + \frac{A}{(1 + r)^2} + \cdots + \frac{A}{(1 + r)^n} = A\left(x + x^2 + \cdots + x^n\right).$$

Here we have a geometric sum of the form $x + x^2 + \cdots + x^n$. Fortunately, there is a formula for such a sum:

$$x + x^2 + \cdots + x^n = \frac{x(1 - x^n)}{1 - x}, \qquad\text{with } x = \frac{1}{1 + r}.$$
We substitute this formula in the previous equation and, since $x/(1 - x) = 1/r$, obtain

$$\text{NPV} = \frac{A}{r}\left(1 - \frac{1}{(1 + r)^n}\right).$$

If the annuity is from a source that can be considered risk-free, then this NPV is the fair price at which the annuity should be traded at time 0. □

Example 2.1.3 A perpetuity is an annuity that extends forever. Its NPV can be obtained by letting $n \to \infty$ in the previous calculation:

$$\text{NPV} = \frac{A}{r}.$$

The formula is more easily understood by treating the NPV as the amount which can generate annual payments of $A$ when the rate of interest is $r$. This immediately gives $\text{NPV} \times r = A$. □

Until the First World War, perpetuities were a popular way for European governments to raise money, particularly for war. Speculation centered around trade in these perpetuities was a major financial activity. In 1900, the center for such trade was the Paris Stock Exchange, and the total capital loaned through perpetuities was 70 billion gold francs (by the governments of France, Germany, Russia, etc.). In comparison, the annual budget of France was 4 billion francs! (Source: Taqqu [49])

Net present value is an easy-to-calculate technique for choosing between projects. Another nice property of NPV is additivity: the NPV of two cash flows taken together is the sum of their NPVs. In symbols, NPV(A + B) = NPV(A) + NPV(B). Thus one can build up a view of a project by separately evaluating its parts.

Example 2.1.4 Consider two cash flows, A and B, as shown in Figure 2.3(a). The cash flow A + B collects these transactions into a single cash flow, shown in Figure 2.3(b).

Figure 2.3(a)

Figure 2.3(b) □

It is important to keep in mind that net present value does not use only the intrinsic properties of the cash flows but is closely tied to market conditions in the form of interest rates. Since interest rates fluctuate with time, a project that starts off with a higher NPV may not stay that way. For instance, in Example 2.1.1, if the sequence of discount factors is changed to 0.8, 0.9, then project A becomes better than project B: its NPV becomes 50, while that of B becomes –30.
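The NPV calculations of this section can be mechanised in a few lines. In this Python sketch (the function names are hypothetical), a cash flow is a list x_0, ..., x_n, valued either with explicit discount factors C_i or with a constant per-period rate r; the annuity and perpetuity formulas of Examples 2.1.2 and 2.1.3 are included for comparison. The calls at the end revisit Example 2.1.1, first with the discount factors 0.9, 0.8 and then with the swapped factors 0.8, 0.9.

```python
def npv(cash_flow, discount_factors):
    """NPV = sum of C_i * x_i, with C_0 = 1 for the entry at time zero."""
    if len(cash_flow) != len(discount_factors):
        raise ValueError("need exactly one discount factor per cash-flow entry")
    return sum(c * x for c, x in zip(discount_factors, cash_flow))


def npv_at_rate(cash_flow, r):
    """NPV of x_0, x_1, ..., x_n at a constant per-period rate r: sum of x_i / (1 + r)^i."""
    return sum(x / (1 + r) ** i for i, x in enumerate(cash_flow))


def annuity_npv(A, r, n):
    """Closed form (A / r) * (1 - (1 + r)^(-n)) for n payments of A, the first after one period."""
    return A / r * (1 - (1 + r) ** -n)


def perpetuity_npv(A, r):
    """Limit of annuity_npv as n grows without bound: A / r."""
    return A / r


# Example 2.1.1: entries at years 0, 1, 2.
project_a = [-500, -100, 700]
project_b = [-500, 700, -100]
print(npv(project_a, [1, 0.9, 0.8]), npv(project_b, [1, 0.9, 0.8]))
# The same projects with the discount factors swapped to 0.8, 0.9.
print(npv(project_a, [1, 0.8, 0.9]), npv(project_b, [1, 0.8, 0.9]))
```

Up to floating-point rounding, the first line should show –30 and 50, and the second 50 and –30, so the ranking of the two projects flips. Because `npv` is a plain sum, the additivity NPV(A + B) = NPV(A) + NPV(B) holds automatically.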
In the next section we will study a measure which uses only the intrinsic properties of a cash flow to evaluate it.

2.2 INTERNAL RATE OF RETURN

Once again, consider a deterministic cash flow sold at time $t = 0$ for a price $P$. Its internal rate of return, or IRR, is the rate of interest $r$ which would allow the flow to be generated from its price. Another way of putting this is to say that the IRR is the rate of interest which would make the cash flow’s NPV equal to $P$. Let us look at some simple examples.

Example 2.2.1 Consider a cash flow of 2, 4 occurring after 1 and 2 years respectively. Suppose it is sold at a price $P = 5$ at time $t = 0$. For a (discretely compounded) interest rate $r$, the NPV of this flow is

$$\frac{2}{1 + r} + \frac{4}{(1 + r)^2}.$$

Therefore the IRR is obtained by solving the following equation for $r$:

$$5 = \frac{2}{1 + r} + \frac{4}{(1 + r)^2}.$$

This can be rearranged into the quadratic equation $5(1 + r)^2 - 2(1 + r) - 4 = 0$, leading to the two values $r = 0.12, -1.72$. Now, since the price paid is less than the total earned, it is clear that $r$ should be positive. So we reject the value $-1.72$ and obtain 0.12 as the IRR. □

The IRR need not always be positive. For example, suppose the cash flow in the last example was bought at a price $P = 7$. Then the two solutions for $r$ are $r = -0.09, -1.63$. Both solutions are negative, which was to be expected since the amount paid is more than that received. Can we still reject one? Since some of the amount paid does come back, it is clear that the loss does not reach 100%. Hence we should have $r > -1$. This leads us to reject the value $-1.63$ and take the IRR to be $-0.09$.

Figure 2.4: Internal rate of return for the cash flow in Example 2.2.1

In Figure 2.4 we plot the NPV of the last example as a function of $r$:

$$f(r) = \frac{2}{1 + r} + \frac{4}{(1 + r)^2}.$$

The intersection of the graph with the horizontal line at height $P$ gives the possible values of the IRR if this cash flow is sold for $P$ at time 0. Figure 2.4 shows that, for any price $P > 0$, we obtain one value below $-1$ and one value above $-1$.
For the reasons given above, the value above $-1$ is taken to be the IRR.

Consider a regular cash flow of amounts $x_1, \ldots, x_n$, occurring at times $1, \ldots, n$ and sold at time $t = 0$ for a price $P$. Its IRR is then a solution of the equation

$$P = \sum_{k=1}^{n} \frac{x_k}{(1 + r)^k}. \tag{2.1}$$

If we include $P$ as part of the cash flow, we obtain the cash flow $x_0 = -P, x_1, \ldots, x_n$, occurring at times $0, 1, \ldots, n$. In this notation the IRR equation is

$$\sum_{k=0}^{n} \frac{x_k}{(1 + r)^k} = 0. \tag{2.2}$$

The internal rate of return has the virtue of using only knowledge of the cash flow, without any reference to external and transient factors like interest rates. The characterisation of a cash flow by a rate of growth is also intuitively appealing, and so the most popular technique for attracting investors is to promise a high IRR.

The IRR is a solution of a polynomial equation, since equation (2.2) can be rearranged (by multiplying through by $(1 + r)^n$) into

$$x_0(1 + r)^n + x_1(1 + r)^{n-1} + \cdots + x_{n-1}(1 + r) + x_n = 0.$$

Unfortunately, a polynomial equation may have no real solution, and in that case the IRR will not be defined.

One of the earliest non-trivial cubics to be solved was + + 15x = 50 by Dardi of Pisa, circa 1350 AD. The equation arose out of a calculation of monthly interest on a loan (Source: Hughes [25]).

Exercise 2.2.2 Consider the cash flow $x_0 = -1, x_1 = 0, x_2 = -1$. Show that its IRR is not defined.

Exercise 2.2.3 Consider a cash flow in which $x_0 < 0$ and the net inflow is greater than the net outflow. Show that its IRR equation has a solution.

A more significant source of trouble is that a polynomial equation may have many solutions, and then we may not be able to say which is the ‘right’ IRR.

Exercise 2.2.4 Consider the cash flow $-1, 8, -17, -2, 24$ at times $0, 1, 2, 3, 4$. Its IRR equation is

$$-(1 + r)^4 + 8(1 + r)^3 - 17(1 + r)^2 - 2(1 + r) + 24 = 0.$$

You can verify by substitution that the solutions are $r = -2, 1, 2, 3$. □

There is one situation in which the IRR is defined without ambiguity. Consider a cash flow occurring at times $0, 1, \ldots, n$ such that there is exactly one sign change in the transactions.
Such a flow has the form $-x_0, -x_1, \ldots, -x_k, x_{k+1}, \ldots, x_n$, or $x_0, x_1, \ldots, x_k, -x_{k+1}, \ldots, -x_n$, with each $x_i \ge 0$. In both cases the IRR equation is (after multiplying through by $-1$ in the second case)

$$-x_0 - \frac{x_1}{1 + r} - \cdots - \frac{x_k}{(1 + r)^k} + \frac{x_{k+1}}{(1 + r)^{k+1}} + \cdots + \frac{x_n}{(1 + r)^n} = 0.$$

We multiply each term by $(1 + r)^k$ and rearrange:

$$x_0(1 + r)^k + x_1(1 + r)^{k-1} + \cdots + x_k = \frac{x_{k+1}}{1 + r} + \cdots + \frac{x_n}{(1 + r)^{n-k}}.$$

Figure 2.5

Consider the left side of this equation as a function of $r$. As $r$ varies over $(-1, \infty)$, the values of this function increase from $x_k$ to $\infty$. On the other hand, the right-hand side decreases from $\infty$ to 0. Therefore the two sides meet at exactly one point, as depicted in Figure 2.5. Hence, in this situation, there is a unique value of $r$ which satisfies the IRR equation as well as $r > -1$.

TECHNIQUES FOR CALCULATING IRR

In the situations when the internal rate of return is defined, we still have to solve equation (2.1) to find it. Usually, it is not possible to solve it exactly (in closed form), but there are techniques for finding approximate solutions. The simplest one is the following.

BISECTION METHOD

Consider the function

$$f(r) = \sum_{k=1}^{n} \frac{x_k}{(1 + r)^k}.$$

Calculate values of $f(r)$ for different values of $r$ until you find values $r_1$ and $r_2$ such that $f(r_1) > P > f(r_2)$. Then the IRR is between $r_1$ and $r_2$. Let $r_3$ be the midpoint of $r_1$ and $r_2$. If $f(r_3) > P$, the IRR is between $r_3$ and $r_2$; otherwise it is between $r_1$ and $r_3$. Take the appropriate midpoint and continue the process. The midpoints give a sequence of gradually improving approximations to the IRR.

Example 2.2.5 Consider a cash flow of 1, 2, 2, 4 at years 1, 2, 3, 4. Suppose it was sold for a price of $P = 5$ at time 0. Let

$$f(r) = \frac{1}{1 + r} + \frac{2}{(1 + r)^2} + \frac{2}{(1 + r)^3} + \frac{4}{(1 + r)^4}.$$

We calculate $f(0.1) = 6.8$ and $f(0.5) = 2.9$; hence the IRR is between 0.1 and 0.5. Their midpoint is 0.3 and $f(0.3) = 4.3$, so the IRR is between 0.1 and 0.3. The next midpoint is 0.2, with $f(0.2) = 5.3$. Proceeding in this way, we create the following sequence of approximations to the IRR:

0.3, 0.2, 0.25, 0.225, 0.2375, 0.23125, 0.228125, …

In fact, $f(0.228) = 4.979$, so 0.228 is a reasonable approximation to the IRR. □

We now present a more sophisticated technique for solving $f(r) = P$.
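Before turning to it, here is a minimal Python sketch of the bisection method, applied to the cash flow of Example 2.2.5. The names `pv` and `irr_bisection` are illustrative. The sketch assumes, as the text does, a starting bracket with f(r1) > P > f(r2); for a flow of positive payments such as this one, f decreases in r, so the update rule keeps the IRR inside the bracket.

```python
def pv(cash_flow, r):
    """f(r): present value of payments x_1, ..., x_n made at times 1, ..., n."""
    return sum(x / (1 + r) ** k for k, x in enumerate(cash_flow, start=1))


def irr_bisection(cash_flow, price, lo, hi, tol=1e-6, max_iter=100):
    """Solve f(r) = price by bisection, starting from a bracket with f(lo) > price > f(hi).

    Returns the final estimate together with the list of midpoints visited.
    """
    if not pv(cash_flow, lo) > price > pv(cash_flow, hi):
        raise ValueError("need f(lo) > price > f(hi) to bracket the IRR")
    midpoints = []
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        midpoints.append(mid)
        if pv(cash_flow, mid) > price:
            lo = mid  # f(mid) > P: the IRR lies between mid and hi
        else:
            hi = mid  # f(mid) <= P: the IRR lies between lo and mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2, midpoints


# Example 2.2.5: payments 1, 2, 2, 4 at years 1 to 4, sold for P = 5.
estimate, mids = irr_bisection([1, 2, 2, 4], price=5, lo=0.1, hi=0.5)
print([round(m, 6) for m in mids[:7]])
print(round(estimate, 4))
```

The first seven midpoints should be 0.3, 0.2, 0.25, 0.225, 0.2375, 0.23125, 0.228125, as listed in Example 2.2.5, and the final estimate should be close to 0.2262. Each step halves the bracket, so shrinking the initial width 0.4 below 10⁻⁶ takes about 19 steps.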
NEWTON–RAPHSON METHOD

We first calculate the derivative of $f(r)$:

$$f'(r) = -\sum_{k=1}^{n} \frac{k\,x_k}{(1 + r)^{k+1}}.$$

We then define

$$g(r) = r - \frac{f(r) - P}{f'(r)}.$$

The method starts with any choice of a value $r_1$ of $r$. We then define $r_2 = g(r_1)$, $r_3 = g(r_2)$, and so on. If this sequence converges to some value $s$, then by taking limits on both sides of $r_{k+1} = g(r_k)$, we see that

$$s = s - \frac{f(s) - P}{f'(s)},$$

and hence $f(s) = P$, which means that $s$ is the IRR.

Example 2.2.6 Let us apply the Newton–Raphson method to the situation of Example 2.2.5. If we start with $r_1 = 0.5$, the rule $r_{k+1} = g(r_k)$ creates the following sequence:

0.5, 0.08, 0.19, 0.224, 0.22618, …

We find that $f(0.22618) = 5.00007$, so this method has given a much better approximation than the bisection method, and in fewer steps! □

Iterative methods of this kind are built into spreadsheet programs such as Microsoft Excel and OpenOffice Calc. Both of these have a tool called Goal Seek in which we can set up the values and solve the IRR equation.

IRR AND PROJECT CHOICE

IRR attempts to capture the rate of growth of an investment. Hence, we would usually prefer a cash flow with a higher IRR.

Example 2.2.7 Consider the two projects of Example 2.1.1. The IRR of A is 8.7%. The IRR equation for B has two solutions: 23.8% and –83.9%. Since the total invested in B (600) is less than the total earned (700), we reject the negative value and take 23.8% as the IRR. So, on the basis of IRR, we would choose B. □

The next example illustrates that some caution is needed when applying this principle.

Example 2.2.8 Consider the following cash flows.

      t = 0    t = 1
A      100     –150
B     –100      150

Cash flow A represents borrowing while cash flow B represents lending; both have an IRR of 50%. While this high IRR is good for the lender, it is bad for the borrower! NPV, unlike IRR, would have distinguished between the situations of borrowing and lending. For instance, with the risk-free rate set at 5%, NPV(A) = –42.86 and NPV(B) = 42.86. □

In a cash flow with a single sign change, we can say which side of the deal we are on: whether we are giving or taking the loan.
If the cash flow has initial positive signs, we are taking the loan, and if the initial signs are negative, we are giving it. In more complicated flows we cannot say, and then a high IRR is just as likely to be bad as good!

2.3 A COMPARISON OF IRR AND NPV

We have encountered two ways of evaluating projects, with rather different characteristics. They may make opposing recommendations, so it is important to have some understanding of their relative strengths and weaknesses. The main features of NPV are:
1. It is always well defined, and easy to calculate once the interest rates are known.
2. It is linear: the NPV of the whole is the sum of the NPVs of the parts.
3. It gives a sense of the total profit over the life of the project.
4. It does not give a sense of the rate of growth of the project, either in the sense of profit per unit investment or in the sense of growth per unit time.

Let us emphasise the last point with a couple of examples.
1. Is an NPV of a million really desirable? What if it is earned from a project with a total investment of a billion?
2. Again, is an NPV of a million from a project that lasts 20 years obviously better than an NPV of 500,000 from a project that lasts 5 years?

The first of these difficulties can be handled as follows. Given a cash flow, we let
I = magnitude of the NPV of the negative entries,
N = NPV of the full cash flow.
Then the present value ratio $N/I$ gives the value per unit invested.⁷

With IRR, the situation is just the reverse.
1. It is well defined only for very simple cash flows, and even then it is not easy to calculate. On the other hand, it requires no extra information (like interest rates).
2. The IRR of the whole cannot be obtained from the IRRs of the parts.
3. It does not give a sense of the total profit over the life of the project.
4. It gives a sense of the rate of growth of the project, both in the sense of profit per unit investment and in the sense of growth per unit time.
5. Finally, let us recall from the last section that it may not be clear whether a high IRR indicates a high rate of profit or a high rate of loss!

With IRR, then, we would be troubled by questions such as:
1. Is a small project with an IRR of 30% to be preferred to a large project with an IRR of 20%?
2. Is a project with an IRR of 30% over 10 years to be preferred to a project with an IRR of 20% over 20 years?

NPV and IRR are indicators of the virtues or faults of a project. Neither should be taken as giving the final word. The information they convey must be supplemented by a careful analysis of the investor’s needs and plans. For example, if a project is a stand-alone opportunity with a clear-cut life, then NPV is better. An example of this is the construction and sale of an office building. Starting and running a car factory is a project of a very different kind. Here one would be interested in the annual growth of profit rather than the total gain over the life of the factory, so IRR is the natural fit.

MODIFIED IRR

In recent years, another measure of the rate of growth has gained popularity. It is called the modified internal rate of return (modified IRR, or MIRR). It is a sort of hybrid of NPV and IRR. In MIRR, we evaluate a project’s cash flow within the context of the firm undertaking it. All negative entries (outflows) are assumed to be funded by investing at a certain rate called the finance rate. All positive entries (inflows) are assumed to be reinvested at another rate called the reinvestment rate. Typically, the finance rate is taken to be the market risk-free rate. The reinvestment rate is usually the firm’s cost of capital, the rate at which its overall growth is taking place.

We estimate the total investment by taking the NPV of all the negative entries using the finance rate. Let us denote the magnitude of this NPV by $P$.
The gains are estimated by taking the net future value of all the positive entries at the conclusion of the cash flow, using the reinvestment rate. Let us denote this net future value by A. The MIRR μ is then defined to be the interest rate under which P would grow into A over the life of the cash flow. If the life is n time periods, then the MIRR per period is defined by

$$P(1 + \mu)^n = A.$$

Example 2.3.1 Consider the cash flow with annual payments of –1000, 2000, –1000, 2000 (at t = 0, 1, 2, 3). Suppose the relevant annually compounded rates are:

Finance rate = 10%
Reinvestment rate = 20%.

Then

$$P = 1000 + \frac{1000}{1.10^2} = 1826.45, \qquad A = 2000 \times 1.20^2 + 2000 = 4880.$$

Therefore, the MIRR μ is defined by $1826.45(1 + \mu)^3 = 4880$, which gives μ ≈ 0.388, or about 39%. On the other hand, the IRR of this flow is 100%! □

2.4 BONDS: PRICE AND YIELD

A fixed income security is a tradeable contract detailing a deterministic cash flow. The buyer of the security receives predetermined amounts of money at predetermined times over the life of the contract. The basic problem is that of pricing: how much should the buyer pay for this security? This can be tackled through NPV and IRR.

Such securities can be seen as risk-free, except for the possibility of default, where the writer of the contract is unable to make the promised payments. There is, however, an additional element of risk. The value of a cash flow depends on the interest rates in effect and will fluctuate with changes in these rates. Therefore, we need a measure of both the price and its sensitivity to market conditions. These, and related matters, will be taken up in the rest of this chapter.

The most common form of a fixed income security is a bond. While bonds come in various flavours, the simplest one consists of n equal payments (called coupon payments, or coupons) of an amount C paid at regular intervals, together with an additional payment F (called the face value) made with the last coupon payment. This is represented in Figure 2.6.
Figure 2.6

The face value is also called the maturity value or par value. The date on which it is paid is called the maturity date. Bonds with this simple structure are called straight or plain vanilla bonds. More complicated bonds may give one party the right to terminate the contract early, or may allow some fluctuation in the coupon payments.

Suppose the cash flow offered by a bond is purchased for some price P. This price depends not only on the structure of the bond itself but also on certain external factors. Two important factors are:

1. The current interest rates: Higher interest rates reduce the present value of the future payments and hence the value P of the bond.
2. The risk of default by the writer of the bond: If the perceived risk is higher, the bond has to offer a greater return to compensate. This is achieved through a lower price P.

In this book we shall ignore the second factor and treat bonds as if they are risk-free. We only note that the investor can use credit ratings from various agencies to gauge the risk of default. In the US, the popular rating agencies are Moody's, Standard and Poor's (S&P) and Fitch IBCA. In India we have CRISIL (Credit Rating Information Services of India, Ltd) and CARE (Credit Analysis and Research, Ltd). The compensation, or premium, for the default risk is calculated via a statistical study of the historical loss from default at each of the rating levels.

The following terminology is used to give a quick idea of the current status of a bond:

At par: The bond is selling at a price equal to its face value.
Discount: The bond is selling below its face value.
Premium: The bond is selling above its face value.

As we shall soon see, these terms relate the payment structure of the bond to current interest rates. They are not a judgement on the worth of the bond.

A bond which is sold at par can be visualised as a loan of the face value F for some time.
The coupons are then viewed as interest payments, and at the end we have the last interest payment as well as the return of the borrowed F.

Before starting our analysis of bonds, let us list some special cases.

Annuity: (see Example 2.1.2) An annuity has n coupon payments and a face value of zero.
Perpetuity: (see Example 2.1.3) A perpetuity has an infinite sequence of coupon payments. Since there is no final payment, the face value is zero.
Zero-Coupon Bond: A zero-coupon bond has no coupon payments. There is only the final payment of the face value. These are also called pure discount bonds.

Consider a bond with n annual coupon payments of C each and a face value F. If the annual discretely compounded risk-free rate over the life of the bond is r, then the net present value of the bond is

$$P = \sum_{i=1}^{n} \frac{C}{(1+r)^i} + \frac{F}{(1+r)^n}. \qquad (2.3)$$

If continuous compounding is used, the formula becomes

$$P = \sum_{i=1}^{n} C e^{-ri} + F e^{-rn}. \qquad (2.4)$$

This can be taken as the fair price for the bond. Note, however, that assuming a uniform r over the life of the bond is a strong assumption, especially as the life of a bond can stretch up to 30 years! This can be taken into account by using different values of r for different time spans, and we will do this later in the section on the term structure of interest rates (§2.8).

Exercise 2.4.1 Check that annuities and perpetuities are always sold at a premium, while a zero-coupon bond is always sold at a discount (for r > 0).

The coupon payment is usually given as a percentage of the face value (called the coupon rate). Thus, a '10% annual bond' is one whose coupon payments are annual and each is 10% of the face value. If the coupon payments are more frequent, then the coupon rate refers to the total annual coupon payment. For instance, many bonds have semi-annual coupon payments (i.e., every six months), and a '10% semi-annual bond' has total annual coupon payments of 10% of the face value. An individual coupon payment for this bond is therefore equal to 5% of the face value.
Figure 2.7: The price versus yield plot for an annual bond

Exercise 2.4.2 Consider a 12% monthly bond with face value Rs 1000. What will be the value of each coupon payment?

Consider a bond with face value F, coupon rate C/F, m regular payments per year and a total of N coupon payments. For this bond, formula (2.3) is modified to

$$P = \sum_{i=1}^{N} \frac{C/m}{(1 + r/m)^i} + \frac{F}{(1 + r/m)^N},$$

where r is now compounded m times a year.

The internal rate of return (IRR) is a commonly used measure of the worth of a bond. It is called the bond's yield to maturity (YTM), or just its yield. If a bond with annual coupon payments is sold at a price P, then its yield λ is obtained by solving the equation

$$P = \sum_{i=1}^{n} \frac{C}{(1+\lambda)^i} + \frac{F}{(1+\lambda)^n}. \qquad (2.5)$$

Note that the right-hand side decreases from ∞ to 0 as λ varies from –1 to ∞. So for every (positive) value of P there is a unique value λ of the yield, and the yield of a bond is always well defined. This observation also shows that a rise in P lowers the yield. Similarly, if continuous compounding is used, the yield equation is

$$P = \sum_{i=1}^{n} C e^{-\lambda i} + F e^{-\lambda n}.$$

Figure 2.7 shows the characteristic plot of an annual bond's price versus its yield. Two special points are worth noting. When the yield is 0, the price is nC + F, the sum of all the payments from the bond. And when the yield is equal to the coupon rate, the price is the face value F.

Exercise 2.4.3 Consider a bond with n years to maturity, annual coupon payments of C and a face value F.
1. Show that if its yield is λ then its price is given by
$$P = \frac{C}{\lambda}\left[1 - \frac{1}{(1+\lambda)^n}\right] + \frac{F}{(1+\lambda)^n}.$$
2. Show that if its yield equals the coupon rate C/F, then the bond is at par. If the yield is greater than the coupon rate, the bond is sold at a discount. If the yield is lower, it is sold at a premium.

Exercise 2.4.4 Consider a bond with n years to maturity, m coupon payments per year totalling C, and face value F.
1. If its yield is λ, show that its price is given by
$$P = \frac{C}{\lambda}\left[1 - \frac{1}{(1+\lambda/m)^{mn}}\right] + \frac{F}{(1+\lambda/m)^{mn}}.$$
Verify that λ = C/F implies P = F.
2. Draw the price–yield curve for this bond.

A bond has to compete with the other available financial instruments. In particular, its price has to be low enough that its yield is not below the risk-free rate.
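The yield of a bond is just the IRR of its cash flow, so the Newton–Raphson iteration of §2.2 can be used to solve equation (2.5). The Python sketch below is a minimal illustration under the assumptions of this section: level coupons, m payments a year, discrete compounding at the same frequency, and no default. The names bond_price, bond_price_slope and bond_yield are our own, not a library API.

```python
def bond_price(lam, C, F, n, m=1):
    """Price of a bond with n years to maturity, m coupons a year totalling C per year and
    face value F, at a yield lam compounded m times a year (formula (2.3) when m = 1)."""
    N = n * m                        # total number of coupon payments
    v = 1.0 / (1.0 + lam / m)        # discount factor for one coupon period
    return sum((C / m) * v ** i for i in range(1, N + 1)) + F * v ** N


def bond_price_slope(lam, C, F, n, m=1):
    """dP/dlam, obtained by differentiating bond_price term by term."""
    N = n * m
    base = 1.0 + lam / m
    coupons = sum(-(i / m) * (C / m) * base ** (-i - 1) for i in range(1, N + 1))
    return coupons - (N / m) * F * base ** (-N - 1)


def bond_yield(P, C, F, n, m=1, lam=0.05, tol=1e-12, max_iter=100):
    """Yield to maturity: solve bond_price(lam) = P by the Newton-Raphson rule
    lam <- lam - (f(lam) - P) / f'(lam), starting from the guess lam."""
    for _ in range(max_iter):
        step = (bond_price(lam, C, F, n, m) - P) / bond_price_slope(lam, C, F, n, m)
        lam -= step
        if abs(step) < tol:
            return lam
    raise RuntimeError("Newton-Raphson did not converge")


print(bond_price(0.0, 10, 100, 3))                    # zero yield: nC + F
print(round(bond_price(0.10, 10, 100, 3), 10))        # yield = coupon rate: at par
print(round(bond_yield(100, 10, 100, 3), 10))         # par price: yield = coupon rate
print(round(bond_price(0.10, 10, 100, 3, m=2), 10))   # semi-annual bond at its coupon rate
```

Because the price is a decreasing convex function of λ, the iteration converges quickly from any reasonable starting guess. The two special points of Figure 2.7 (price nC + F at zero yield, and par when the yield equals the coupon rate) serve as convenient checks.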
If the bond can be seen as risk-free, then its yield will equal the risk-free rate. This gives us the concept of the required yield for a bond. It is the level of yield the bond needs in order to be competitive in the market, taking into account the prevalent interest rates as well as the risk associated with the bond.

2.5 CLEAN AND DIRTY PRICE

The value of a bond changes during its lifetime and depends on the remaining payments as well as on the required yield. Suppose the bond is sold on the day of a coupon payment, with the payment going to the seller. Then the remaining cash flow can be seen as a fresh bond with the same face value and coupon rate, and its value can be calculated from the already established formulas. Of course, this is a rare situation, and usually the bond will be sold at a date between two coupon payments. Suppose the situation is as depicted in Figure 2.8, where the bond is being sold at a time t between the kth and (k+1)th coupon payments.

Figure 2.8

We calculate the price by calculating the NPV of the remaining payments at the required yield. We use continuous compounding, as that makes it easier to deal with arbitrary time intervals. For an annual bond with n coupon payments, continuously compounded required yield $\lambda_c$ and $k \le t < k + 1$, this gives

$$P(t) = \sum_{i=k+1}^{n} C e^{-\lambda_c (i - t)} + F e^{-\lambda_c (n - t)}.$$

Example 2.5.1 Consider a 3-year 5% annual bond with face value 100. Figure 2.9 shows how the price of this bond changes over its life, assuming a constant required yield. We take three values of the effective required yield λ: 5%, 10% and 15%. The equivalent continuous yield is then $\lambda_c = \ln(1 + \lambda)$.

Figure 2.9: Change of bond price with time for different values of the effective required yield λ (Example 2.5.1)

We note that the price curve has a typical sawtooth shape, with sharp drops when coupon payments are made. In fact, we can visualise the overall curve as a sawtooth shape combined with an almost linear trend. □

In this example, we see that the sawtooth effect tends to hide the long-term behaviour of the bond price. It is useful to identify and subtract it to reveal the underlying trend.
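The sawtooth of Example 2.5.1 can be reproduced by valuing the remaining payments at each date. The following is a minimal Python sketch. bond_value_at is a name of our own choosing, and we assume, as in the text, that a coupon falling due on the sale date goes to the seller.

```python
import math


def bond_value_at(t, C, F, n, lam_c):
    """NPV at time t of the payments still due from an annual bond (coupons C at times
    1..n, face value F at n), discounted at the continuous yield lam_c. A payment falling
    exactly at time t goes to the seller, so only payment times s > t are counted."""
    payments = [(s, C) for s in range(1, n)] + [(n, C + F)]
    return sum(a * math.exp(-lam_c * (s - t)) for s, a in payments if s > t)


# Example 2.5.1: 3-year 5% annual bond, face 100, effective required yields 5%, 10%, 15%
for eff in (0.05, 0.10, 0.15):
    lam_c = math.log(1 + eff)                       # equivalent continuous yield
    path = [bond_value_at(k / 4, 5, 100, 3, lam_c) for k in range(12)]  # t = 0, 0.25, ..., 2.75
    print(eff, [round(p, 2) for p in path])

# One "tooth": the value just before and just after the coupon at t = 1 (10% yield)
lam_c = math.log(1.10)
before = bond_value_at(1 - 1e-9, 5, 100, 3, lam_c)
after = bond_value_at(1, 5, 100, 3, lam_c)
print(round(before - after, 6))                     # the drop is the coupon, 5
```

Sampling quarterly shows the value rising between coupon dates and falling by the coupon amount at each payment. At the 5% yield, the value returns to exactly 100 on every coupon date.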
We view the rise and fall as due to the approach and delivery of the next coupon payment. So we start by defining the accrued interest, which is the part of the next coupon payment that has already been earned. Suppose the bond is sold at a time t between the kth and (k+1)th coupon payments, which are made at times $t_k$ and $t_{k+1}$. The accrued interest is then the fraction

$$I(t) = \frac{t - t_k}{t_{k+1} - t_k}\, C$$

of the (k+1)th coupon payment. We then define the clean price of the bond by

$$Q(t) = P(t) - I(t).$$

The actual price P(t) is now called the dirty price. Figure 2.10 depicts the accrued interest, clean price and dirty price plots for Example 2.5.1.

Figure 2.10: Clean price, dirty price, and accrued interest plots for Example 2.5.1

Exercise 2.5.2 A 10% 3-year bond with face value 100 is issued on 1 January, 2008. Suppose its clean price on 1 July, 2008 is 102. How much would you have to pay to buy this bond?

The clean price is seen as giving a truer picture of the value of a bond. Therefore, when stock exchanges state bond prices, these are usually the clean prices. To make an actual purchase, the accrued interest has to be added to the clean price (which is also called the quoted price).

2.6 PRICE–YIELD CURVES

We have already plotted the price of a bond against its yield. In this section we explore how this price–yield curve varies with certain parameters. First, we ask what happens if the coupon payments are increased. Then, for any given yield, the price will increase, and so the price–yield curve will shift upward. This is illustrated in Figure 2.11(a).

Now, we consider how an increase in the number of coupon payments affects the price. Although this increases the total cash delivered, it does not necessarily lead to a higher price. Suppose the required yield is lower than the coupon rate C/F. Then each coupon payment represents a higher-than-required interest payment, and it is beneficial to have more of these payments. So the price of longer-term bonds will be higher.
On the other hand, if the required yield is above the coupon rate, each coupon payment represents a loss and the longer-term bonds will have less value. Thus, in Figure 2.11(b) the longer-term bonds start off with higher prices (when λ is low) but fall below the shorter-term bonds when λ increases. The crossing occurs at the coupon rate.

Figure 2.11: (a) Variation of the price–yield curve with the coupon payments. The higher curves correspond to higher coupon payments. (b) Variation of the price–yield curve with the number of coupon payments. The arrows point in the direction of increasing number of coupon payments.

Longer-term bonds have steeper price–yield curves. This makes them more vulnerable to interest-rate changes, since a change in yield will cause a larger change in their price.

2.7 DURATION

Bonds are risk-free if held to maturity, since their cash flow is deterministic (at least if default risk can be ignored). However, at intermediate times they are not risk-free, since their price fluctuates with the prevalent interest rates (or required yield). We have just noted that longer-term bonds are more exposed to this risk. We shall now develop a way to quantify it.

MACAULAY DURATION WITH CONTINUOUS COMPOUNDING

We start with the price–yield formula for an annual bond, but expressed in terms of continuous compounding:

$$P = \sum_{i=1}^{n} C e^{-\lambda i} + F e^{-\lambda n}.$$

To see the sensitivity with respect to yield, we differentiate:

$$\frac{dP}{d\lambda} = -\sum_{i=1}^{n} i\,C e^{-\lambda i} - n F e^{-\lambda n}.$$

This measures the absolute change in P resulting from a small change in λ. To see the proportional change, we divide it by P:

$$\frac{1}{P}\frac{dP}{d\lambda} = -\frac{1}{P}\left[\sum_{i=1}^{n} i\,C e^{-\lambda i} + n F e^{-\lambda n}\right].$$

The negative of the quantity on the right is called the Macaulay duration and is denoted by $D_M$:

$$D_M = \frac{1}{P}\left[\sum_{i=1}^{n} i\,C e^{-\lambda i} + n F e^{-\lambda n}\right], \qquad \text{so that} \qquad \frac{1}{P}\frac{dP}{d\lambda} = -D_M.$$

Suppose a small change δλ in the required yield leads to a change δP in the price. Then we have the approximation $\delta P/\delta\lambda \approx dP/d\lambda$. The proportional change in price δP/P is therefore approximately

$$\frac{\delta P}{P} \approx -D_M\,\delta\lambda.$$

The form of the Macaulay duration is interesting.
It is a weighted average of the times at which payments are made, and each time instant i has a weight proportional to the present value of the corresponding payment. (Here, present value is calculated using the required yield.) In particular, Macaulay duration has units of time, and by convention it is measured in years.

MACAULAY DURATION FOR A NON-ANNUAL BOND

Consider a bond with m coupon payments per year, face value F, coupon rate C/F, and a total of N coupon payments. We can calculate as above to find its Macaulay duration. The result is

$$D_M = \frac{1}{P}\left[\sum_{i=1}^{N} \frac{i}{m}\cdot\frac{C}{m}\,e^{-\lambda i/m} + \frac{N}{m}\,F e^{-\lambda N/m}\right],$$

where λ is the yield, and the price P is given by

$$P = \sum_{i=1}^{N} \frac{C}{m}\,e^{-\lambda i/m} + F e^{-\lambda N/m}.$$

MACAULAY DURATION WITH DISCRETE COMPOUNDING

If discrete compounding is used and there are m coupon payments annually, the Macaulay duration is defined analogously:

$$D_M = \frac{1}{P}\left[\sum_{i=1}^{N} \frac{i}{m}\cdot\frac{C/m}{(1+\lambda/m)^i} + \frac{N}{m}\cdot\frac{F}{(1+\lambda/m)^N}\right].$$

Writing the times of payment as i/m keeps Macaulay duration in units of years. With discrete compounding, Macaulay duration does not measure sensitivity to yield changes as precisely as it does with continuous compounding (see Exercise 2.7.1). Nevertheless, since discrete compounding approximates continuous compounding, it can still be used as an indicator of that sensitivity.

Exercise 2.7.1 Consider a bond with face value F, coupon rate C/F and m coupon payments per year. If the yield is described by discrete compounding, show that

$$\frac{1}{P}\frac{dP}{d\lambda} = -\frac{D_M}{1 + \lambda/m},$$

where $D_M$ is the Macaulay duration. The quantity $D_M/(1 + \lambda/m)$ is called the modified duration of the bond.

Exercise 2.7.2 Consider a perpetuity with annual payments starting after 1 year. If the required yield is λ, show that its modified duration is 1/λ.

MACAULAY DURATION AND THE MATURITY PERIOD

On inspecting Figure 2.11(b), we observed that longer-term bonds are more vulnerable to interest-rate risk. Now we have a way to measure this vulnerability. Figure 2.12 illustrates what happens as we increase the maturity period n. Duration does increase with n at first. Eventually, however, the curve flattens and even drops a little before settling at a constant level.

Figure 2.12: Duration plotted against maturity date for a 10% annual bond. The three curves correspond to different required yields λ.

Exercise 2.7.3 Consider an annual bond with face value F, coupon rate C/F, and n years to maturity. Fix its yield to maturity at some λ and let $D_M$ be its Macaulay duration (with continuous compounding). Show that if C ≠ 0 and λ > 0, then

$$\lim_{n \to \infty} D_M = \frac{1}{1 - e^{-\lambda}}.$$

Thus, while duration initially increases with n, in the very long run it stabilises at a constant level.

Exercise 2.7.4 Consider Exercise 2.7.3. Work out the corresponding result using discrete compounding for a bond with m coupon payments per year.

MACAULAY DURATION OF A PORTFOLIO

Consider a portfolio comprising an assortment of bonds. This portfolio can be treated as a fixed income security, since its cash flow is predetermined. However, the payments will occur at irregular intervals. Let the cash flow consist of payments $C_1, \dots, C_N$ at times $t_1, \dots, t_N$. If all the bonds in this portfolio have the same required yield λ, then we can extend the concept of Macaulay duration to it in a natural way:

$$D_M = \frac{1}{P}\sum_{i=1}^{N} t_i\,C_i e^{-\lambda t_i},$$

where we use a continuously compounded yield. The total price of this portfolio is given by

$$P = \sum_{i=1}^{N} C_i e^{-\lambda t_i}.$$

It is easy to check that, as for a single bond, Macaulay duration measures the sensitivity of P to changes in λ:

$$\frac{1}{P}\frac{dP}{d\lambda} = -D_M.$$

Suppose the portfolio consists of k bonds, and let the price and duration of the ith bond be $P_i$ and $D_{M,i}$ respectively. Then $P = P_1 + \cdots + P_k$ and

$$D_M = \frac{P_1}{P}\,D_{M,1} + \cdots + \frac{P_k}{P}\,D_{M,k}.$$

Thus the overall duration is a weighted average of the individual durations, the contribution of each bond being weighted by the proportion invested in it.

2.8 TERM STRUCTURE OF INTEREST RATES

In all our calculations regarding bonds, we have acted as though interest rates are independent of the period of the investment.
In real life, however, they do vary with the period. For instance, fixed deposits in banks usually earn a higher rate of interest if they are for longer periods. In this section we shall develop a framework for dealing with interest rates that vary with the term, and then redo many of our earlier calculations in this general setting. We emphasise, however, that we are not taking up the random daily fluctuations in interest rates.

Figure 2.13: The interest rates on State Bank of India fixed deposits of various maturities (in years) as on January 9, 2006.

SPOT AND FORWARD RATES

Suppose that at time t = 0 a risk-free investment is made for a period T. The investment could be a fixed deposit in a bank, or a zero-coupon bond maturing at T. The interest (or yield) of this investment is called the spot rate for the period 0 to T and is denoted by $s_T$. Spot rates are expressed annually.

Example 2.8.1 Suppose the one-year spot rate is 5% and the two-year spot rate is 6% (with discrete compounding). Then, an investment of 100 for a year will grow to $100(1 + s_1) = 100 \times 1.05 = 105$. The same amount invested for two years will grow to $100(1 + s_2)^2 = 100(1.06)^2 = 112.36$. □

From this calculation, one might think that the two-year investment is better than the one-year investment because it earns a higher rate of interest. That is not necessarily so. The prevailing rates may rise after a year, and reinvesting the 105 at the new, higher rate may lead to a better result.

This discussion brings us to forward rates. Suppose an investor wants to take a loan, but after a year rather than right away. To avoid the risk from interest rate fluctuations, she wants to finalise the loan and the interest that will be charged now. The rate that is decided for this loan is called a forward rate. Thus, a forward rate is an interest rate that will be applied to a transaction in the future but is decided in the present.
The forward rate for an investment starting at time S and lasting till time T will be denoted by $f_{S,T}$. Like spot rates, forward rates are expressed annually.

Example 2.8.2 Suppose the one-year spot rate is $s_1 = 5\%$, while the forward rate for the succeeding one-year period is $f_{1,2} = 6\%$. Then we can invest 100 for one year at $s_1$ and then for another year at $f_{1,2}$. There is no risk, since the forward rate is set now. The investment will grow to 100(1.05)(1.06) = 111.3 over two years. □

Spot and forward rates are related to each other. Consider an amount A which is to be invested for 2 years, risk-free. If it is invested at the 2-year spot rate, it will grow to $A(1 + s_2)^2$. An alternative is to invest it first for one year and then reinvest for another, in which case it will grow to $A(1 + s_1)(1 + f_{1,2})$. Since both routes are risk-free, by the No Arbitrage Principle we must have

$$A(1 + s_2)^2 = A(1 + s_1)(1 + f_{1,2}).$$

We can solve this for the forward rate:

$$f_{1,2} = \frac{(1 + s_2)^2}{1 + s_1} - 1.$$

Exercise 2.8.3 Show that, under annual compounding, the forward rate $f_{m,n}$ is related to the spot rates $s_m$ and $s_n$ by

$$f_{m,n} = \left[\frac{(1 + s_n)^n}{(1 + s_m)^m}\right]^{1/(n-m)} - 1. \qquad (2.6)$$

Exercise 2.8.4 Consider spot and forward rates with discrete compounding m times a year. Let the rates be given on an annual basis. Further, let $s_n$ be the spot rate for the first n periods and $f_{k,n}$ the forward rate from the kth to the nth period. Show that forward and spot rates are related by

$$f_{k,n} = m\left[\frac{(1 + s_n/m)^n}{(1 + s_k/m)^k}\right]^{1/(n-k)} - m. \qquad (2.7)$$

Exercise 2.8.5 Show that, under continuous compounding, the forward rate $f_{m,n}$ is related to the spot rates $s_m$ and $s_n$ by

$$f_{m,n} = \frac{n\,s_n - m\,s_m}{n - m}. \qquad (2.8)$$

Some spot rates can be observed directly from the market. If there is a risk-free zero-coupon bond with maturity time n, we can take its yield to be $s_n$. Difficulties arise when no suitable zero-coupon bond is available. The yields of other bonds do not directly give spot rates, since they involve payments at different times. Consider an annual n-year bond.
Using net present value and spot rates, its fair price is

$$P = \sum_{i=1}^{n} \frac{C}{(1 + s_i)^i} + \frac{F}{(1 + s_n)^n}.$$

If P is observed from the market and the spot rates $s_1, \dots, s_{n-1}$ are known, we can solve this equation for $s_n$. This observation gives rise to a technique known as bootstrapping. First, we look at all the available zero-coupon bonds and use their yields to obtain a list of spot rates. The gaps in this list are filled by looking at the market prices of other bonds and solving for the corresponding spot rate.8

Example 2.8.6 Suppose at time 0, risk-free bonds with the following characteristics are available:

1. Zero-coupon bonds maturing in 1 year and with yield 5%.
2. Zero-coupon bonds maturing in 2 years and with yield 6%.
3. 10% annual bonds maturing in 3 years and with yield 6.5%.
4. Zero-coupon bonds maturing in 6 years and with yield 7%.

From the zero-coupon bonds, we see that $s_1 = 5\%$, $s_2 = 6\%$, and $s_6 = 7\%$. Next, we use the three-year bond to find $s_3$. Per 100 of face value, its price computed from its yield must equal its price computed from the spot rates:

$$\frac{10}{1.065} + \frac{10}{1.065^2} + \frac{110}{1.065^3} = \frac{10}{1.05} + \frac{10}{1.06^2} + \frac{110}{(1 + s_3)^3}.$$

The solution is $s_3 = 6.585\%$. Since no bonds are available for n = 4, 5, we cannot find $s_4$ and $s_5$. The best we can do is interpolate from the known values. For example, we can connect the known values by straight-line segments and use these to approximate the unknown rates. This is illustrated in Figure 2.14. (The more advanced and standard technique is to connect the known points by cubic splines. This results in a smooth curve through the known points.)

Figure 2.14: The spot rates versus time plot for Example 2.8.6 □

In this example we note that, as expected, the spot rate $s_3$ does not equal the yield of the 3-year bond. Nevertheless, the difference is quite small. On reflection, this is reasonable because most of the payment happens at 3 years. We can therefore expect bond yields to provide a starting approximation to the spot rates, which can then be refined by calculations as in the example.
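The steps of Example 2.8.6 can be sketched in a few lines of Python, assuming annual compounding throughout. The sketch fills $s_4$ and $s_5$ by straight-line interpolation (not by cubic splines) and also evaluates the forward rate of equation (2.6). All function names are hypothetical helpers written for this illustration.

```python
def price_from_yield(lam, coupon, face, n):
    """Price of an annual bond at a flat yield lam (annual compounding)."""
    return sum(coupon / (1 + lam) ** i for i in range(1, n + 1)) + face / (1 + lam) ** n


def bootstrap_next_spot(price, coupon, face, known_spots):
    """Given s_1..s_{n-1}, solve price = sum_{i<n} coupon/(1+s_i)^i + (coupon+face)/(1+s_n)^n
    for s_n (annual compounding)."""
    n = len(known_spots) + 1
    pv_known = sum(coupon / (1 + s) ** i for i, s in enumerate(known_spots, start=1))
    return ((coupon + face) / (price - pv_known)) ** (1.0 / n) - 1


def interpolate(t, t0, s0, t1, s1):
    """Straight-line interpolation of the spot curve between (t0, s0) and (t1, s1)."""
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


def forward_rate(s_m, m, s_n, n):
    """Equation (2.6): forward rate from year m to year n, annual compounding."""
    return ((1 + s_n) ** n / (1 + s_m) ** m) ** (1.0 / (n - m)) - 1


# Example 2.8.6 (per 100 of face value)
spots = {1: 0.05, 2: 0.06, 6: 0.07}                  # read off the zero-coupon bonds
p3 = price_from_yield(0.065, 10, 100, 3)             # 3-year 10% bond priced at its 6.5% yield
spots[3] = bootstrap_next_spot(p3, 10, 100, [spots[1], spots[2]])
for t in (4, 5):                                     # no bonds here: interpolate
    spots[t] = interpolate(t, 3, spots[3], 6, spots[6])

for t in sorted(spots):
    print(t, f"{spots[t]:.3%}")
print(f"f_1,2 = {forward_rate(spots[1], 1, spots[2], 2):.3%}")
```

The bootstrapped $s_3$ agrees with the 6.585% found above. The forward rate $f_{1,2}$ computed from $s_1$ and $s_2$ comes out at about 7.01%, which exceeds both spot rates, as an upward-sloping curve requires.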
A complication is that the maturity dates of available bonds may not dovetail in the exact way required for bootstrapping. We may seek zero-coupon bonds expiring in exactly 1 year but only find ones expiring in 11 and 13 months. In such a situation we would have to use an average of their yields to represent the yield a 1-year bond would have had. Once the spot rates have been established, the forward rates can be found by equations (2.6) to (2.8).

FISHER–WEIL DURATION

Figure 2.15 shows how quickly spot rates can change by significant amounts. It depicts the twenty-year spot rate curve for India as calculated by the National Stock Exchange on three successive days in September 2008. Suppose that at this time you had held some zero-coupon bonds expiring in 3 years. Then, in just two days, their value would have first increased by 1% and then decreased by 1.3%. In the longer term, much wilder swings could be expected.

This illustration shows why it is important to quantify the risk that changes in the spot rates pose to a bond portfolio. We shall only consider the risk from the simplest kind of change. Specifically, we shall study the sensitivity of bond prices to parallel shifts, in which all the spot rates change by the same amount. If we plot the spot rate curve for the new spot rates, we see a parallel shift in the curve.

Figure 2.15: Variation in the twenty-year spot rate curve for India over three consecutive days in September 2008 (based on data released by NSE)

The concept of Macaulay duration can be adapted to this scenario. Consider an annual n-year bond with face value F and coupon payments of C. If we use continuous compounding, its price is given by

$$P = \sum_{i=1}^{n} C e^{-s_i i} + F e^{-s_n n},$$

where the $s_i$ are the continuously compounded spot rates. Consider the case when all the spot rates change by the same amount λ. Then the new price is a function of λ:

$$P(\lambda) = \sum_{i=1}^{n} C e^{-(s_i + \lambda) i} + F e^{-(s_n + \lambda) n}.$$

The sensitivity of P to such changes is measured by

$$\frac{1}{P}\left.\frac{dP}{d\lambda}\right|_{\lambda=0} = -\frac{1}{P}\left[\sum_{i=1}^{n} i\,C e^{-s_i i} + n F e^{-s_n n}\right] = -D_{FW},$$

where $D_{FW}$ is called the Fisher–Weil duration.
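As a numerical sanity check, the following Python sketch computes the Fisher–Weil duration of a bond directly from the formula above. It then compares the result with a central finite-difference estimate of (1/P)dP/dλ under a parallel shift. The bond and the spot rates are hypothetical values chosen only for illustration, and the function names are our own.

```python
import math


def pv_and_fw_duration(flows, spot):
    """Present value and Fisher-Weil duration of flows = [(t, amount), ...], where
    spot[t] is the continuously compounded spot rate for maturity t."""
    pv = sum(a * math.exp(-spot[t] * t) for t, a in flows)
    return pv, sum(t * a * math.exp(-spot[t] * t) for t, a in flows) / pv


def shifted_pv(flows, spot, lam):
    """Present value after every spot rate moves by the same amount lam (a parallel shift)."""
    return sum(a * math.exp(-(spot[t] + lam) * t) for t, a in flows)


# Hypothetical data: a 3-year 10% annual bond (face 100) and an upward-sloping spot curve.
spot = {1: 0.05, 2: 0.055, 3: 0.06}
bond = [(1, 10.0), (2, 10.0), (3, 110.0)]
P, D_fw = pv_and_fw_duration(bond, spot)

# Central difference for dP/dlam at lam = 0; (1/P) dP/dlam should equal -D_FW.
h = 1e-6
slope = (shifted_pv(bond, spot, h) - shifted_pv(bond, spot, -h)) / (2 * h)
print(round(D_fw, 6), round(-slope / P, 6))
```

The spot curve enters only through the discount factors, so the same two functions apply unchanged to any cash flow, including that of a whole portfolio.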
Exercise 2.8.7 Consider two bonds whose Fisher–Weil durations are $D_1$ and $D_2$. We invest amounts $P_1$ and $P_2$ in them, respectively. Show that the Fisher–Weil duration of this portfolio is

$$D_P = \frac{P_1}{P_1 + P_2}\,D_1 + \frac{P_2}{P_1 + P_2}\,D_2.$$

If we use discrete compounding, the Fisher–Weil duration is defined by

$$D_{FW} = \frac{1}{P}\left[\sum_{i=1}^{n} \frac{i\,C}{(1 + s_i)^i} + \frac{n F}{(1 + s_n)^n}\right].$$

As with Macaulay duration, the discrete version of Fisher–Weil duration does not precisely capture the sensitivity of the price to the rate shifts. Exercise 2.8.8 describes the modification that is needed.

Exercise 2.8.8 Consider an n-year annual bond. Using annual compounding, let the yearly spot rates be $s_1, s_2$, etc. Let P be the price of the bond. If each spot rate $s_i$ shifts by the same amount λ and becomes $s_i + \lambda$, show that

$$\frac{1}{P}\left.\frac{dP}{d\lambda}\right|_{\lambda=0} = -D_Q,$$

where $D_Q$ is called the quasi-modified duration and is given by

$$D_Q = \frac{1}{P}\left[\sum_{i=1}^{n} \frac{i\,C}{(1 + s_i)^{i+1}} + \frac{n F}{(1 + s_n)^{n+1}}\right].$$

Note that $D_Q$ is a linear combination of the payment times, but it is not an average, since the weights do not add up to 1.

TYPES OF TERM STRUCTURE

When spot rates are plotted against the time to maturity, the resulting curve is called a term structure. The typical term structure is a smoothly increasing curve which also becomes gradually flatter, as in Figure 2.16(a). Such a curve is called a normal term structure. If the curve is more or less constant, we have a flat term structure; this is the structure we assumed before this section. If the curve steadily decreases instead of increasing, we have an inverted term structure. A general term structure is likely to be basically normal, with some parts that are flat or inverted.

Figure 2.16: Types of term structure: the spot rate $s_T$ is plotted against the time to maturity T.

Example 2.8.9 Figure 2.17 depicts data released by the Reserve Bank of India regarding yields of Government of India bonds on February 24, 2006. The data gives maximum and minimum yields for bonds expiring in the years 2006–07, 2007–08, etc.
If we connect the midpoints of the given intervals, we get a rough idea of how yield depends on maturity date, and this in turn serves as an approximation of the term structure. It is normal, with a bump during 2008–09.

Figure 2.17: The yield curve for Example 2.8.9 □

Two obvious questions arise out of these observations. First, why are there different term structures? Second, why is the normal structure the basic one, while the others are rare or transitory?

The answers again involve risk. Longer-term investments are exposed to more risk over their life. They also reduce the investor's flexibility by tying up his cash. For these reasons investors demand greater return on longer-term investments, which leads to an upward-sloping term structure. However, investors occasionally perceive certain times in the future as particularly risky, e.g., the period in which a general election takes place. The spot rates for these times will then rise, causing bumps in the term structure. An inverted term structure will occur when the present is turbulent, so that long-term investments are seen as safer than short-term ones.

2.9 IMMUNISATION

The simplest way to invest money for a time T in a risk-free way is to buy a zero-coupon bond maturing at T. It concentrates the payment at the end, at a predetermined rate, and so completely eliminates interest-rate risk. Unfortunately, we will only rarely find a zero-coupon bond that matures exactly at the required time. We can try to use a zero-coupon bond that matures near T, but this carries risks of its own:

1. If it matures before T, we have to reinvest the payoff, and the prevailing interest rate is not known beforehand.
2. If it matures after T, we have to sell it at T, and its value at that time is not predetermined.

Nor does it help to consider bonds with coupon payments. Even if we find one that matures at T, its coupon payments will have to be reinvested for the remaining period, and this is risky.
We have to accept that the available bonds may not allow us to make a risk-free investment for the required time. Having accepted this, we can at least try to minimise the risk. Intuitively, a reasonable strategy is to use bonds whose payments occur as near to T as possible. Since duration measures the average of the payment times of a bond, this translates into choosing bonds whose combined duration is T.

Consider a situation where two bonds with Fisher–Weil durations $D_1$ and $D_2$ are available. Suppose we create a portfolio out of these two bonds, investing amounts $P_1$ and $P_2$ in them, respectively. Then the Fisher–Weil duration of the portfolio is

$$D_P = \frac{P_1}{P_1 + P_2}\,D_1 + \frac{P_2}{P_1 + P_2}\,D_2.$$

To make the portfolio match the requirement of investing an amount P for time T, we match both amount and duration:

$$P_1 + P_2 = P, \qquad \frac{P_1}{P}\,D_1 + \frac{P_2}{P}\,D_2 = T.$$

These two equations can be solved for $P_1$ and $P_2$. (Note that we need T to be between $D_1$ and $D_2$.)

This technique is called immunisation, and it can be applied just as well to a stream of payments. Suppose the requirement is to invest $A_1$ for time $T_1$, $A_2$ for time $T_2$, etc. We calculate the duration D of this cash flow, and once again match $A_1 + A_2 + \cdots$ with $P_1 + P_2$, and D with $D_P$.

Example 2.9.1 Suppose a firm faces the following stream of obligations over the next 3 years:

Year:          1      2      3
Obligation:   100    200    300

In addition, a zero-coupon bond maturing in 2 years and a 10% annual bond maturing in 5 years are available. The spot rates for the next five years are as tabulated below. Then we have the following calculations (all the rates are assumed to be continuously compounded):

Consider a portfolio where the proportions invested in the two bonds are $w_1$ and $w_2$. Then we have the following two equations for immunisation:

$$w_1 + w_2 = 1, \qquad w_1 D_1 + w_2 D_2 = D.$$

Substituting the calculated values of D, $D_1$ and $D_2$ and solving, we obtain $w_1 = 0.86$ and $w_2 = 0.14$. The NPV of the payment stream is

$$P = 100e^{-0.04} + 200e^{-0.05 \times 2} + 300e^{-0.057 \times 3} = 529.89.$$

Hence we have to invest $P_1 = 0.86 \times 529.89 = 455.71$ in the zero-coupon bond and $P_2 = 0.14 \times 529.89 = 74.18$ in the annual bond.
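The calculation in this example can be sketched in Python. The obligations (100, 200 and 300 at years 1 to 3) and the spot rates $s_1 = 4\%$, $s_2 = 5\%$, $s_3 = 5.7\%$ are those appearing in the NPV computation above. Valuing the 5-year bond also needs $s_4$ and $s_5$, so the sketch assumes hypothetical values of 6.0% and 6.3%. With these assumed rates the weights come out similar to, but not exactly, the 0.86 and 0.14 of the example. The final loop previews the kind of check carried out in Example 2.9.2.

```python
import math


def pv_and_duration(flows, spot):
    """Present value and Fisher-Weil duration of flows [(t, amount), ...];
    spot[t] is the continuously compounded spot rate for maturity t."""
    pv = sum(a * math.exp(-spot[t] * t) for t, a in flows)
    return pv, sum(t * a * math.exp(-spot[t] * t) for t, a in flows) / pv


def shifted_pv(flows, spot, lam):
    """Present value after a parallel shift of every spot rate by lam."""
    return sum(a * math.exp(-(spot[t] + lam) * t) for t, a in flows)


# s1, s2, s3 as used in Example 2.9.1; s4 and s5 are HYPOTHETICAL values.
spot = {1: 0.04, 2: 0.05, 3: 0.057, 4: 0.060, 5: 0.063}
obligations = [(1, 100.0), (2, 200.0), (3, 300.0)]
zero_2y = [(2, 100.0)]                                                 # face 100
annual_5y = [(1, 10.0), (2, 10.0), (3, 10.0), (4, 10.0), (5, 110.0)]  # 10% bond, face 100

P, D = pv_and_duration(obligations, spot)
B1, D1 = pv_and_duration(zero_2y, spot)
B2, D2 = pv_and_duration(annual_5y, spot)

# Immunisation conditions: w1 + w2 = 1 and w1*D1 + w2*D2 = D.
w2 = (D - D1) / (D2 - D1)
w1 = 1 - w2
units1, units2 = w1 * P / B1, w2 * P / B2   # number of bonds of each kind bought today
print(f"P = {P:.2f}, D = {D:.3f}, D2 = {D2:.3f}, w1 = {w1:.3f}, w2 = {w2:.3f}")

# Re-value obligations and portfolio after parallel shifts of the spot curve.
for lam in (-0.02, -0.01, 0.0, 0.01, 0.02):
    owed = shifted_pv(obligations, spot, lam)
    held = units1 * shifted_pv(zero_2y, spot, lam) + units2 * shifted_pv(annual_5y, spot, lam)
    print(f"shift {lam:+.2f}: obligations {owed:8.2f}  portfolio {held:8.2f}")
```

Holding fixed numbers of bonds (rather than fixed amounts) is what makes the re-valuation meaningful. Both sides are then repriced on the same shifted curve.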
□

Since Fisher–Weil duration measures the change in the price of a bond or cash flow under a parallel shift of the spot curve, this process of immunisation protects against such changes. As the stream of required payments and the bond portfolio have the same duration, they respond similarly to shifts of the spot-rate curve, and hence the portfolio continues to match the requirements (closely, if not exactly).

Example 2.9.2 Let us verify that immunisation works in the manner described above. Consider the situation and calculations of the previous example. We start with a portfolio which matches the payment stream in both present value and duration: we have calculated that it consists of 455.71 invested in the zero-coupon bond and 74.18 invested in the annual bond. Now, consider how the present values of the payment stream and the portfolio vary with parallel shifts of the spot-rate curve.

Even under a shift as drastic as 2%, the portfolio moves almost exactly with the payment stream. □

We could also carry out immunisation by matching the Macaulay or modified duration in place of the Fisher–Weil duration. This would amount to assuming a flat term structure; the gain would be ease of calculation.

2.10 CONVEXITY

Our development of immunisation in the previous section was based on a linear approximation to the price–yield relationship. The technique can be further improved by using a quadratic approximation, which requires the second derivative of price with respect to yield. Consider a bond whose price is denoted by P. Then P is a function of the yield λ. The convexity of the bond is defined to be

$$\mathcal{C} = \frac{1}{P}\frac{d^2P}{d\lambda^2}. \qquad (2.9)$$

From the shape of the price–yield curve, it is evident that as the yield λ increases, the first derivative of P also increases, moving from highly negative values towards zero. Therefore, the second derivative of P is positive, and so $\mathcal{C} > 0$.
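Definition (2.9) can be checked numerically. The sketch below, a minimal illustration assuming annual cash flows and a continuously compounded yield, approximates $d^2P/d\lambda^2$ by a central difference. For a zero-coupon bond maturing at n, $P = Fe^{-n\lambda}$ gives $\mathcal{C} = n^2$ directly, so the hypothetical 5-year zero below should come out close to 25. The bonds and the 6% yield are illustrative assumptions.

```python
import math

def price(cash_flows, yld):
    """Price of cash_flows[k], paid at time k + 1, at a continuously compounded yield."""
    return sum(c * math.exp(-yld * (k + 1)) for k, c in enumerate(cash_flows))

def convexity_fd(cash_flows, yld, h=1e-4):
    """Central-difference estimate of (1/P) d^2P/d(lambda)^2, i.e. definition (2.9)."""
    p0 = price(cash_flows, yld)
    second = (price(cash_flows, yld + h) - 2 * p0 + price(cash_flows, yld - h)) / h**2
    return second / p0

zero = [0, 0, 0, 0, 100]        # hypothetical 5-year zero-coupon bond
coupon = [10, 10, 10, 10, 110]  # hypothetical 5-year 10% annual bond, face value 100
cz = convexity_fd(zero, 0.06)   # should be close to 5**2 = 25
cc = convexity_fd(coupon, 0.06) # positive, and smaller than for the zero
```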
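Equation (2.9) also defines convexity for a portfolio of bonds (assuming that all bonds have the same required yield).

Exercise 2.10.1 Consider a portfolio of two bonds, with prices P1 and P2 and convexities $\mathcal{C}_1$ and $\mathcal{C}_2$ respectively. Let P be the price of the portfolio. Show that the convexity of the portfolio is

$$\mathcal{C} = \frac{P_1}{P}\,\mathcal{C}_1 + \frac{P_2}{P}\,\mathcal{C}_2.$$

Exercise 2.10.2 Consider a bond with price P, Macaulay duration DM and convexity $\mathcal{C}$, relative to a continuously compounded yield λ. Show that

$$\frac{dD_M}{d\lambda} = D_M^2 - \mathcal{C}.$$

As always, the mathematics is simplest if we take a continuously compounded yield λ, and we shall confine ourselves to that situation. Then the price of an n-year annual bond with coupon payments C and face value F is given by

$$P = \sum_{k=1}^{n} Ce^{-k\lambda} + Fe^{-n\lambda}.$$

We differentiate twice with respect to λ and divide by P to get the convexity:

$$\mathcal{C} = \frac{1}{P}\left(\sum_{k=1}^{n} k^2Ce^{-k\lambda} + n^2Fe^{-n\lambda}\right).$$

Thus, the convexity is a weighted average of the squares of the payment times. The weights are non-negative and add up to 1. So we can express convexity in the form

$$\mathcal{C} = \sum_{k=1}^{n} k^2w_k, \qquad w_k = \frac{Ce^{-k\lambda}}{P}\ (k<n), \qquad w_n = \frac{(C+F)e^{-n\lambda}}{P}. \qquad (2.10)$$

Recall that Macaulay duration is a weighted average of the payment times, with the same weights as used by convexity:

$$D_M = \sum_{k=1}^{n} k\,w_k.$$

From these expressions for $\mathcal{C}$ and DM we can make the following calculation:

$$\mathcal{C} - D_M^2 = \sum_{k=1}^{n} k^2w_k - 2D_M\sum_{k=1}^{n} k\,w_k + D_M^2\sum_{k=1}^{n} w_k = \sum_{k=1}^{n}(k - D_M)^2w_k \ge 0.$$

If we combine this with the result of Exercise 2.10.2, we find that Macaulay duration decreases with yield: $\frac{dD_M}{d\lambda} \le 0$. Another easy consequence of equation (2.10) is that $\mathcal{C} \le n^2$. Among n-year bonds, the convexity is therefore maximum for a zero-coupon bond, when it exactly equals n².

Exercise 2.10.3 Show that under annual compounding, the convexity of an annual bond is given by

$$\mathcal{C} = \frac{1}{P(1+\lambda)^2}\left(\sum_{k=1}^{n}\frac{k(k+1)C}{(1+\lambda)^k} + \frac{n(n+1)F}{(1+\lambda)^n}\right).$$

IMMUNISATION

Under a small change δλ in the yield, the change δP in the price has the quadratic approximation

$$\delta P \doteq \frac{dP}{d\lambda}\,\delta\lambda + \frac{1}{2}\frac{d^2P}{d\lambda^2}(\delta\lambda)^2.$$

This can be expressed in terms of the Macaulay duration DM and convexity $\mathcal{C}$ as follows:

$$\frac{\delta P}{P} \doteq -D_M\,\delta\lambda + \frac{\mathcal{C}}{2}(\delta\lambda)^2.$$

Therefore, the price fluctuations of a portfolio or stream of cash obligations can be matched by a portfolio of bonds which has the same starting price, duration and convexity. Since three parameters have to be matched, three different bonds will have to be used.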
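The weighted-average form (2.10) is easy to compute directly. The sketch below computes DM and $\mathcal{C}$ from the weights wk and compares the exact price change for a 1% rise in yield with the linear (duration-only) and quadratic (duration plus convexity) approximations. The 5-year 10% bond with face value 100 and the 6% continuously compounded yield are assumptions for illustration; the function name is hypothetical.

```python
import math

def duration_convexity(C, F, n, lam):
    """Price, Macaulay duration and convexity of an n-year annual bond.

    Uses a continuously compounded yield lam and the weights w_k of (2.10).
    """
    flows = [C] * n
    flows[-1] += F                      # last payment is coupon plus face value
    pvs = [c * math.exp(-lam * k) for k, c in zip(range(1, n + 1), flows)]
    P = sum(pvs)
    w = [v / P for v in pvs]            # non-negative weights summing to 1
    D = sum(k * wk for k, wk in zip(range(1, n + 1), w))
    conv = sum(k**2 * wk for k, wk in zip(range(1, n + 1), w))
    return P, D, conv

P, D, conv = duration_convexity(C=10, F=100, n=5, lam=0.06)

# Price change for a 1% rise in yield: exact value versus the two approximations.
dlam = 0.01
exact = duration_convexity(10, 100, 5, 0.06 + dlam)[0] - P
linear = -P * D * dlam
quadratic = P * (-D * dlam + 0.5 * conv * dlam**2)
```

The quadratic estimate should land noticeably closer to the exact change than the linear one, which is the point of adding convexity to immunisation.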
Example 2.10.4 Consider the following 5-year stream of obligations:

Our task is to create a portfolio of bonds whose value will match this stream, even under significant changes in the interest rate. Suppose the available bonds all have face value 10 and are as follows:

1. A zero-coupon bond maturing in 2 years,
2. A 10% annual bond maturing in 5 years, and
3. A 10% annual bond maturing in 4 years.

Suppose the required (continuously compounded) yield is 6%. Then the price, duration and convexity of the payment stream are as follows:

Similar calculations can be carried out for the three bonds. The results are given in the table below:

Let w1 = P1/P, w2 = P2/P and w3 = P3/P. Then, to match the value, duration and convexity, these numbers should satisfy

$$w_1 + w_2 + w_3 = 1, \qquad w_1D_1 + w_2D_2 + w_3D_3 = D, \qquad w_1\mathcal{C}_1 + w_2\mathcal{C}_2 + w_3\mathcal{C}_3 = \mathcal{C}.$$

The solution of this system of linear equations is w1 = 0.14, w2 = 0.38, w3 = 0.48. Therefore the amounts invested in the three bonds should be 173.88, 457.13 and 576.20 respectively, corresponding to 19.6, 39.4 and 50.9 bonds. The table below shows the NPV of the obligation stream and the bond portfolio under various interest rates. □

Exercise 2.10.5 Verify that our convexity formulas extend to non-annual bonds.

CONVEXITY AND TERM STRUCTURE

Convexity can also be based on parallel shifts in the term structure. Consider an annual n-year bond with face value F and coupon payments C. If we use continuous compounding, its price is given by

$$P = \sum_{k=1}^{n} Ce^{-ks_k} + Fe^{-ns_n},$$

where the sk are continuously compounded spot rates. Suppose there is a parallel shift in the term structure by an amount λ. This means each spot rate sk now becomes sk + λ. The new price of the bond, as a function of λ, is

$$P(\lambda) = \sum_{k=1}^{n} Ce^{-k(s_k+\lambda)} + Fe^{-n(s_n+\lambda)}.$$

Convexity is now defined by

$$\mathcal{C} = \frac{1}{P(0)}\frac{d^2P}{d\lambda^2}\bigg|_{\lambda=0} = \frac{1}{P}\left(\sum_{k=1}^{n} k^2Ce^{-ks_k} + n^2Fe^{-ns_n}\right).$$

This version of convexity can be used to immunise against parallel shifts in the term structure, in conjunction with the Fisher–Weil duration DFW of the bond.

Exercise 2.10.6 Verify that if the spot rates are annually compounded, convexity is given by

$$\mathcal{C} = \frac{1}{P}\left(\sum_{k=1}^{n}\frac{k(k+1)C}{(1+s_k)^{k+2}} + \frac{n(n+1)F}{(1+s_n)^{n+2}}\right).$$
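The three bonds of Example 2.10.4 are fully specified, so their prices, durations and convexities at the 6% yield can be recomputed. The sketch below does this and checks that the stated amounts 173.88, 457.13 and 576.20 correspond to about 19.6, 39.4 and 50.9 bonds. Since the stream’s own duration and convexity are not listed in the text, the 3×3 solve is shown only as a round trip: targets are built from the stated amounts and the weights are recovered. Cramer’s rule is used purely to keep the example free of third-party libraries; the helper names are hypothetical.

```python
import math

def bond_stats(flows, lam):
    """Price, Macaulay duration and convexity; flows[k] is paid at time k + 1."""
    pvs = [c * math.exp(-lam * (k + 1)) for k, c in enumerate(flows)]
    P = sum(pvs)
    D = sum((k + 1) * v for k, v in enumerate(pvs)) / P
    conv = sum((k + 1) ** 2 * v for k, v in enumerate(pvs)) / P
    return P, D, conv

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Cramer's rule."""
    d = det3(A)
    return [det3([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]) / d
            for j in range(3)]

lam = 0.06
bonds = {                             # face value 10 throughout
    "zero_2y": [0, 10],
    "annual_5y": [1, 1, 1, 1, 11],    # 10% coupon = 1 per year
    "annual_4y": [1, 1, 1, 11],
}
stats = [bond_stats(f, lam) for f in bonds.values()]
amounts = [173.88, 457.13, 576.20]    # amounts stated in Example 2.10.4
counts = [a / s[0] for a, s in zip(amounts, stats)]

# Round trip: build duration/convexity targets from the stated amounts, then re-solve.
weights = [a / sum(amounts) for a in amounts]
D_t = sum(w * s[1] for w, s in zip(weights, stats))
C_t = sum(w * s[2] for w, s in zip(weights, stats))
A = [[1.0, 1.0, 1.0], [s[1] for s in stats], [s[2] for s in stats]]
recovered = solve3(A, [1.0, D_t, C_t])
```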
2.11 CALLABLE BONDS

The bonds we have considered so far are distinguished by having fixed coupon payments. Bonds are also available with variable coupon payments, or with an option for one party to cancel the contract. We will briefly consider the second of these varieties. Since their cash flows are not deterministic, they cannot be completely understood within our current framework.

A drop in prevailing interest rates represents a loss for the bond issuer, who is now paying interest (in the form of coupon payments) at a higher rate than the one currently available. The issuer can limit such losses by building in an option allowing it to buy back, or call, the bonds by paying a certain premium (called the call price) to the bondholder. In a European callable bond the call option can be exercised only at certain pre-specified dates. In an American one it can be exercised at any time after a certain interval. In either case, the time to the first date when the bond can be called is termed the call protection period.

Example 2.11.1 Consider a five-year annual bond that has face value 1000 and was issued at par when the required yield was 10%. Suppose this bond can be called back at any time by paying the face value as well as a fee of 50. Now, if after 2 years the required yield has decreased to 7%, then the price of this bond will become

$$P = \frac{100}{1.07} + \frac{100}{1.07^2} + \frac{1100}{1.07^3} = 1079.$$

Since calling the bond costs only 1000 + 50 = 1050, callback results in a saving of 29 for the issuer. □

The concepts of fair price and yield are tricky for a callable bond, since the profit from it depends on whether, and when, it is called. Given the market price and a possible call date, the corresponding yield to call or YTC is defined to be the yield if the bond is indeed called on that date. The lowest of these YTCs is called the yield to worst.

Exercise 2.11.2 Consider a European callable bond, with annual coupon payments of size C and call dates at the end of each annual period. Let the call price for the ith year be Ci.
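Example 2.11.1 can be reproduced in a few lines. The price of 1079 is consistent with annual compounding of the 7% yield, which the sketch assumes; the helper name is hypothetical.

```python
def annual_price(coupon, face, years_left, y):
    """Price of an annual-coupon bond with years_left payments remaining, at annual yield y."""
    coupons = sum(coupon / (1 + y) ** k for k in range(1, years_left + 1))
    return coupons + face / (1 + y) ** years_left

price = annual_price(coupon=100, face=1000, years_left=3, y=0.07)
call_cost = 1000 + 50          # face value plus the call fee
saving = price - call_cost     # issuer's gain from calling instead of leaving the bond outstanding
```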
Show that its YTC λ at the end of the ith year is the solution of

$$P = \sum_{k=1}^{i}\frac{C}{(1+\lambda)^k} + \frac{C_i}{(1+\lambda)^i}.$$

Example 2.11.3 Consider a five-year 10% annual bond with face value 100. Suppose it can be called back at any of the first four coupon times, at call prices of C1 = 110, C2 = 107, C3 = 104 and C4 = 102 respectively (the call price is to be paid in addition to the corresponding coupon payment). If it is called at the time of the jth coupon payment, the corresponding YTC is the solution r of the equation

$$P = \sum_{k=1}^{j}\frac{10}{(1+r)^k} + \frac{C_j}{(1+r)^j}.$$

Suppose the market price at the time of issue is P = 110. Then the YTCs at the possible call dates are 9.09%, 7.78%, 7.40% and 7.46%, while the YTM is 7.53%. Thus the yield to worst is 7.40%. □

PUTTABLE BONDS

Puttable bonds are the mirror image of callable ones. Now it is the buyer who is protected against interest rate risk by having the choice to sell the bond back to the issuer for a premium called the put price. Puttable bonds can also be European or American. The yield to put or YTP for a possible put date is the yield if the bond is put back on that date, and the yield to worst is the lowest of the YTPs.

Proper pricing of callable and puttable bonds requires the probabilistic modelling of interest rate fluctuations. As we will not take up that topic in this book, this is as far as we can go with such bonds.

A brief history of duration:

1. Frederick Macaulay [29] introduced the concept of duration as a weighted average (1938).
2. John Hicks [24] gave its interpretation as a measure of sensitivity (1939).
3. Frank Redington [38] invented immunisation via duration and convexity (1952).
4. Lawrence Fisher and Roman Weil [21] generalised it to the case of an arbitrary term structure (1971).

(Source: Poitras [37])

7 The usage of this term is not uniform in the literature. It is also used for the ratio of the NPV of the positive entries to I, or (N/I) – 1.
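Apart from the first call date, the YTC equation has no simple closed form, but its right-hand side decreases as the yield rises, so a bisection search finds it reliably. The sketch below recomputes the yields of Example 2.11.3, assuming annual compounding as in the example; the search bracket [–0.5, 1] is an arbitrary choice that comfortably contains these yields, and the function names are hypothetical.

```python
def pv(rate, coupon, j, redemption):
    """Value at yield `rate` of j annual coupons plus `redemption` paid at time j."""
    coupons = sum(coupon / (1 + rate) ** k for k in range(1, j + 1))
    return coupons + redemption / (1 + rate) ** j

def solve_yield(price, coupon, j, redemption, lo=-0.5, hi=1.0, tol=1e-12):
    """Bisection on the yield: pv decreases in rate, so keep the sign change bracketed."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pv(mid, coupon, j, redemption) > price:
            lo = mid      # value too high, so the yield must be larger
        else:
            hi = mid
    return (lo + hi) / 2

P = 110
call_prices = [110, 107, 104, 102]   # C1..C4, paid on top of that year's coupon
ytc = [solve_yield(P, 10, j, c) for j, c in enumerate(call_prices, start=1)]
ytm = solve_yield(P, 10, 5, 100)     # held to maturity: face value 100 at year 5
yield_to_worst = min(ytc)
```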
8 The term bootstrapping is derived from the phrase ‘lift yourself by your own bootstraps.’

3 Random Cash Flows

We will now study situations where the future income or expenditure is not known beforehand. In the early days of probability, mathematically aware investors evaluated an investment by considering the expectation of the possible payoffs. However, it was realised that such an evaluation is incomplete. Followed strictly, it would recommend that everyone invest in only one asset: the one offering the highest expected return. Actual investors, however, generally prefer to invest in a variety of assets to avoid overdependence on any one of them.

The situation was unravelled by Harry Markowitz [30, 31] in 1952. Markowitz realised that one must take risk into account. He measured it by the standard deviation of the returns, and created a systematic theory of portfolio analysis. His training in operations research enabled him not only to set up the problem of finding optimal combinations of assets, but also to solve it using the technique of quadratic programming. His work also formed the basis for developments by others, most notably the Capital Asset Pricing Model (CAPM). While Markowitz focussed on how an individual can invest optimally, CAPM analyses the consequences of every investor acting according to his theory. Markowitz’s other main contributions have been to develop sparse matrix techniques and a language (SIMSCRIPT) for programming large-scale simulations. He won the Nobel Prize in Economics in 1990, along with William Sharpe (one of the creators of CAPM) and Merton Miller (one of the first to systematically exploit the No Arbitrage Principle).

3.1 RANDOM RETURNS

Consider a time interval [0, T] and an asset whose value at time 0 is V0. Suppose its value VT at time T is not known initially (such assets are called risky assets). Then VT can be treated as a random variable, and so can the rate of return.
Here the rate of return is

$$r = \frac{V_T - V_0}{V_0},$$

so that VT = V0(1 + r). The expectation of r is called the mean return of the asset and is denoted by $\bar r$. The variance of r is denoted by σ², and its standard deviation by σ. We call σ² and σ the variance and standard deviation of the asset. While $\bar r$ measures the profit expected from the asset, σ measures the associated risk. Our task, therefore, is to explore the relationship between $\bar r$ and σ, both for individual assets and for their collections in portfolios.

Exercise 3.1.1 Let two assets have (random) rates of return r1 and r2. Use the No Arbitrage Principle to show that it is not possible for these random variables to have a constant and non-zero difference.

Consider a portfolio of risky assets numbered from 1 through n. Let Ai be the amount invested in the ith asset. The weight of the ith asset is defined to be the proportion invested in it:

$$w_i = \frac{A_i}{\sum_{j=1}^{n} A_j}.$$

It is easy to see that $\sum_i w_i = 1$.

SHORT SELLING

In many situations, it is possible to sell an asset that you do not own! This can be achieved if there is a gap between the sale and the time of delivery: the asset can be acquired during the gap and then delivered in time. This strategy is called short selling. Now, suppose at time t = 0 the value of the asset is V0, and you short sell it with a future delivery time. To complete your obligations, you actually buy the asset later, at time T, when its value is VT. In this case, the initial investment is –V0 (the negative sign is because there is an initial receipt rather than a payment), while the final payoff is –VT (here the negative sign is because at the end you are paying rather than receiving). The rate of return from the asset on short selling is therefore

$$\frac{-V_T - (-V_0)}{-V_0} = \frac{V_T - V_0}{V_0},$$

which is just the same as the rate calculated in the usual way. However, this return is calculated on a negative initial investment. Thus, the inclusion of short selling leads to negative weights.
Another version of short selling is that you first borrow the asset for a certain time, and then sell it. Finally, you repurchase it from another source and return it to the lender. The numerical consequences are just as above: initially you receive V0 and finally you pay VT.

Example 3.1.2 Suppose you own Rs 200. You notice two shares, S and T, whose prices are Rs 800 and Rs 1000, respectively. Your analysis leads you to expect that in the next few days the price of T will increase much more sharply than that of S. To benefit from this, you implement a strategy of using S to pay for T as follows. You start by short selling S, with delivery set for 10 days in the future. This earns you Rs 800 right away, and you pool in the Rs 200 you already possess to buy T. At this point, you have no cash, you own T and you owe S. Your net worth is therefore 0 + 1000 – 800 = 200, the same as before.

After 10 days, suppose your analysis is borne out, and the values of S and T are Rs 900 and Rs 1200 respectively. You sell T and then use part of the proceeds to buy S and deliver it. You are left with Rs 300 cash. Thus, overall, you have earned a rate of return of

$$\frac{300 - 200}{200} = 0.5 = 50\%.$$

Now let us look at the individual parts of our strategy. After the first stage we have Rs 1000 invested in T and Rs –800 invested in S (since we owe S). Therefore the weights for S and T are

$$w_S = \frac{-800}{200} = -4, \qquad w_T = \frac{1000}{200} = 5.$$

Note that the weights do add to 1. The individual rates of return are

$$r_S = \frac{900 - 800}{800} = 12.5\%, \qquad r_T = \frac{1200 - 1000}{1000} = 20\%.$$

Don’t be misled by the positive rate of return rS. It actually represents a loss, since it was earned on a negative initial investment. In this story, you have lost Rs 100 on the short sale of S, but have compensated by earning Rs 200 on the trade in T. The role of the short sale was that it raised cash that enabled you to trade in T. □

Exercise 3.1.3 What will happen in the above scenario if, after 10 days,

1. S is worth Rs 1000 and T is worth Rs 1100?
2. S is worth Rs 700 and T is worth Rs 800?
Short selling is often restricted or prohibited because, as in the example above, it is popular with speculators who wish to quickly raise cash to exploit their analysis of market trends. It is therefore seen as a contributor to instability. Moreover, short selling also increases the chances of default, since the short seller may have unexpected difficulty in buying the asset later and completing the sale.

RANGE OF WEIGHTS

An individual weight can take on any value: it will be positive if we own the corresponding asset and negative if we have short sold or borrowed it. However, if short selling is not allowed, then each weight will be non-negative, and since the weights add up to 1, this will also force each of them to be at most one: 0 ≤ wi ≤ 1.

PORTFOLIO RETURN AND RISK

Let ri be the rate of return from the ith asset, and let the corresponding mean return and variance be $\bar r_i$ and $\sigma_i^2$. The final value of the ith asset is Ai(1 + ri), and hence the final value of the portfolio is $\sum_i A_i(1 + r_i)$. Therefore, the rate of return of the portfolio is

$$r = \frac{\sum_i A_i(1 + r_i) - \sum_i A_i}{\sum_i A_i} = \frac{\sum_i A_ir_i}{\sum_i A_i} = \sum_i w_ir_i.$$

Exercise 3.1.4 Verify this relationship in the scenario of Example 3.1.2.

It follows that the mean and variance of the portfolio’s rate of return are given by (see Appendix, §B.12)

$$\bar r = \sum_i w_i\bar r_i, \qquad (3.1)$$

$$\sigma^2 = \sum_i w_i^2\sigma_i^2 + \sum_{i\ne j} w_iw_j\sigma_{ij}, \qquad (3.2)$$

where σij is the covariance between the ith and jth rates of return. These formulas can also be expressed neatly using matrix algebra. Let $\mathbf{w} = (w_1,\dots,w_n)^T$ and $\bar{\mathbf{r}} = (\bar r_1,\dots,\bar r_n)^T$, and let $\Sigma = (\sigma_{ij})$ be the covariance matrix, with $\sigma_{ii} = \sigma_i^2$. Then

$$\bar r = \mathbf{w}^T\bar{\mathbf{r}}, \qquad (3.3)$$

$$\sigma^2 = \mathbf{w}^T\Sigma\,\mathbf{w}. \qquad (3.4)$$

The matrix formulations are very useful when working with large amounts of data.

3.2 PORTFOLIO DIAGRAMS AND EFFICIENCY

In order to represent the profit–risk relationship graphically, we plot each asset on a portfolio diagram. This diagram represents each asset A by the mean $\bar r_A$ and standard deviation σA of its return. (See Figure 3.1.) You should note that risk (σ) is plotted on the horizontal axis, and expected profit ($\bar r$) on the vertical axis. The reason is that we are trying to understand how profit depends on risk.
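Formulas (3.2) and (3.4) are the same computation written two ways. The sketch below checks this on hypothetical three-asset data: the means, covariances and weights are invented for illustration, and one weight is negative (a short sale), which the formulas handle without change.

```python
def portfolio_mean_var(w, means, cov):
    """Mean (3.1) and variance (3.2) of a portfolio, written as explicit sums."""
    n = len(w)
    mean = sum(w[i] * means[i] for i in range(n))
    var = sum(w[i] ** 2 * cov[i][i] for i in range(n))
    var += sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n) if i != j)
    return mean, var

def quad_form(w, cov):
    """The matrix form (3.4): w^T Sigma w."""
    n = len(w)
    return sum(w[i] * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n))

# Hypothetical data, not taken from the text.
means = [0.08, 0.12, 0.10]
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.06]]
w = [0.5, 0.7, -0.2]          # weights sum to 1; the third asset is sold short
mean, var = portfolio_mean_var(w, means, cov)
```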
Figure 3.1: Portfolio diagram for a single asset A

Exercise 3.2.1 Consider the following portfolio diagram, which depicts four assets with varying properties. Among the assets A and B in the above diagram, which is more attractive to you? What choices would you make in all the other pairings?

The most common response is that A is best because it maximises the expected return while minimising the risk. Carrying this logic further, both B and C rank between A and D. However, there is no obvious way to choose between B and C. This choice will vary with the investor, depending on his particular combination of greed and fear of risk.

The biases exhibited above can be broken into two types. First, A is preferred to C (and B to D) because it offers more mean return at the same risk level. This bias is called non-satiation, and it is reasonable to expect that every investor will have it. Secondly, A is preferred to B (and C to D) because it offers the same mean return but at a lower risk level. Investors who think in this way are said to be risk-averse.

As a general rule, the models used in finance assume investors to be risk-averse. However, it is worth noting that there are investors with other tastes. A risk-neutral investor is one who does not care about risk: such an investor would find A and B equally attractive. An investor could even be risk-preferring. For a risk-preferring investor, the most attractive investment is B.

Why would an investor be attracted by risk? Well, suppose he is desperate for high returns, but assets with sufficiently high mean return are not available. Then his best option is to go for an asset with a high risk level. In Figure 3.2, A and B are two assets with the same mean return. The vertical bars represent the possible fluctuations in the actual return about the mean position. The higher σ for B results in a wider bar, which in turn opens up the possibility of higher returns than are available from A (as indicated by the dashed line).
On the whole, one may expect occasional risk-preferring behaviour from investors with extreme needs, but the dominant mass of investors will be risk-averse.

Figure 3.2: Some investors may like risk because it opens the possibility of higher return.

An asset or portfolio A is said to be more efficient than another, B, if their mean returns and standard deviations satisfy one of the following:

1. $\bar r_A > \bar r_B$ and $\sigma_A \le \sigma_B$
2. $\bar r_A \ge \bar r_B$ and $\sigma_A < \sigma_B$

In the situation of Exercise 3.2.1 above, A is the most efficient among the four assets, and D the least. Risk-averse investors will be attracted to more efficient assets. An asset or portfolio is called efficient if no other portfolio is more efficient than it.

Exercise 3.2.2 Identify the efficient assets in the following diagram:

3.3 FEASIBLE SET

We now move on from considering individual assets to considering their collections into portfolios. The overall task is as follows: given a full list of available assets and the basic properties of their returns (mean, variance, covariance), consider all the portfolios that can be created from them. Locate each possible portfolio on the portfolio diagram and identify the efficient ones.

Thus, consider a market in which assets A1, …, An are available. Let the rate of return of Ai have mean $\bar r_i$ and standard deviation σi. Consider a portfolio P made of a combination of these assets, and let wi be the weight of Ai in P. Then, as calculated in §3.1, we have

$$\bar r_P = \sum_i w_i\bar r_i, \qquad \sigma_P^2 = \sum_i w_i^2\sigma_i^2 + \sum_{i\ne j} w_iw_j\sigma_{ij}.$$

Note that $\bar r_P$ and σP depend only on the proportions invested in different assets and not on the total investment. That is, if we scale all the investments in P by the same factor, we will not change $\bar r_P$ and σP. The possible portfolios can be obtained by varying the weights, and the region in the portfolio diagram obtained this way is called the feasible set.
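The definition of “more efficient” translates directly into a dominance test. The sketch below uses hypothetical (mean return, standard deviation) values arranged like the four assets of Exercise 3.2.1: A and C share a risk level, as do B and D, while A and B share a mean return, as do C and D. The numbers are an illustration only, not data from the text.

```python
def more_efficient(a, b):
    """True if asset a = (mean, sigma) is more efficient than b, per the two conditions above."""
    (ra, sa), (rb, sb) = a, b
    return (ra > rb and sa <= sb) or (ra >= rb and sa < sb)

def efficient_assets(assets):
    """Names of the listed assets that no other listed asset is more efficient than."""
    return [name for name, a in assets.items()
            if not any(more_efficient(b, a)
                       for other, b in assets.items() if other != name)]

# Hypothetical values mimicking the layout of Exercise 3.2.1.
assets = {"A": (0.12, 0.10), "B": (0.12, 0.20), "C": (0.06, 0.10), "D": (0.06, 0.20)}
efficient = efficient_assets(assets)
```

Note that B and C are not comparable under this definition, which mirrors the observation that the choice between them depends on the investor.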
We shall see that the shape of the feasible set depends not only on the individual properties of the assets but also on their relations with each other, as expressed by the covariances of their returns.

Exercise 3.3.1 What will be the feasible set for one asset?

FEASIBLE SET FOR TWO ASSETS

Let us start by considering the simplest situation, that of two assets, A and B. Let ρ be the correlation coefficient of rA and rB:

$$\rho = \frac{\sigma_{AB}}{\sigma_A\sigma_B}.$$

Recall that –1 ≤ ρ ≤ 1. Now, in real life, perfect correlation ρ = ±1 is highly unlikely, and so we start with the assumption that –1 < ρ < 1. (The cases of perfect correlation are taken up in subsequent exercises.) Consider a portfolio P consisting of A and B with respective weights w and 1 – w. Then the portfolio’s mean return and variance are given by

$$\bar r_P = w\bar r_A + (1-w)\bar r_B, \qquad \sigma_P^2 = w^2\sigma_A^2 + 2w(1-w)\rho\sigma_A\sigma_B + (1-w)^2\sigma_B^2.$$

If we solve $\bar r_P = w\bar r_A + (1-w)\bar r_B$ for w (assuming $\bar r_A \ne \bar r_B$) and substitute in the expression for σP, we find a relationship of the form

$$\sigma_P^2 = a\bar r_P^2 + b\bar r_P + c.$$

Exercise 3.3.2 Verify the formula above. In particular, check that

$$a = \frac{\sigma_A^2 - 2\rho\sigma_A\sigma_B + \sigma_B^2}{(\bar r_A - \bar r_B)^2} > 0.$$

The corresponding feasible set is like a horizontal parabola, but with a more pointed nose (Figure 3.3). In fact, it is a hyperbola. The next exercise asks you to verify this and to establish some properties of the hyperbola.

Figure 3.3: A plot of the feasible set for two assets A and B, with ρ = –0.5. The solid part of the plot corresponds to both weights being positive. In the dashed parts, one weight is negative (corresponding to short selling).

Exercise 3.3.3 Consider two assets A and B with ρ ≠ ±1. Let rA–B = rA – rB be the difference of their rates of return, $\bar r_{A-B} = E[r_{A-B}]$, and σA–B the standard deviation of rA–B. Show that the feasible set for these assets is a hyperbola whose asymptotes have slopes $\pm\bar r_{A-B}/\sigma_{A-B}$ and meet on the $\bar r$-axis.

Now we take up the cases of perfect correlation.

Exercise 3.3.4 Assume A and B have different positions on the portfolio diagram and that their correlation coefficient is ρ = 1.

1. Show that σA ≠ σB.
2. Let a portfolio P consist of amounts of A and B.
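One way to see numerically that $\sigma_P^2$ is quadratic in $\bar r_P$ with leading coefficient a: equally spaced weights give equally spaced mean returns (step h), so the second difference of $\sigma_P^2$ must equal $2ah^2$. The inputs below are hypothetical, with ρ = –0.5 as in Figure 3.3, and the function name is illustrative.

```python
def two_asset_point(w, rA, rB, sA, sB, rho):
    """Mean and variance of the portfolio with weight w in A and 1 - w in B."""
    mean = w * rA + (1 - w) * rB
    var = w**2 * sA**2 + 2 * w * (1 - w) * rho * sA * sB + (1 - w) ** 2 * sB**2
    return mean, var

rA, rB, sA, sB, rho = 0.10, 0.15, 0.20, 0.30, -0.5   # hypothetical inputs
a = (sA**2 - 2 * rho * sA * sB + sB**2) / (rA - rB) ** 2

# Equally spaced weights (one of them negative, i.e. short selling) give equally
# spaced means, so the second difference of the variance equals 2*a*h^2.
ws = [-0.5, 0.25, 1.0]
pts = [two_asset_point(w, rA, rB, sA, sB, rho) for w in ws]
h = pts[1][0] - pts[0][0]
second_diff = pts[2][1] - 2 * pts[1][1] + pts[0][1]
```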
Verify that σP = 0 when the weight of A is

$$w = \frac{\sigma_B}{\sigma_B - \sigma_A}.$$

Show that this weight satisfies either w < 0 or w > 1.

Exercise 3.3.5 Show that for ρ = 1, the feasible set for assets A and B has the form given in Figure 3.4(a).

Exercise 3.3.6 Assume A and B have correlation coefficient ρ = –1. Let a portfolio P consist of amounts of A and B. Verify that σP = 0 when the weight of A is

$$w = \frac{\sigma_B}{\sigma_A + \sigma_B}.$$

Show that this weight satisfies 0 ≤ w ≤ 1.

Exercise 3.3.7 Show that for ρ = –1, the feasible set for assets A and B has the form given in Figure 3.4(b).

Figure 3.4: The feasible set for two assets with correlation (a) ρ = 1 (b) ρ = –1. The dashed and solid segments represent combinations with and without short selling, respectively.

Exercise 3.3.8 Let the returns for each pair among the assets A, B and C have the same correlation ρ. What is the feasible set if (a) ρ = 1, (b) ρ = –1?

DIVERSIFICATION

An interesting feature that is already apparent is that a combination of two assets can have a σ that is lower than that of either individual asset. For example, in Figure 3.3 there is a combination of A and B which has the least σ. The process of reducing risk by combining investments is known as diversification.

Exercise 3.3.9 Show that in the feasible set for two assets A and B, risk is minimised by the portfolio in which the weight of A is given by

$$w = \frac{\sigma_B^2 - \rho\sigma_A\sigma_B}{\sigma_A^2 - 2\rho\sigma_A\sigma_B + \sigma_B^2}.$$

Exercise 3.3.10 In the situation of the previous exercise, let σA = σB and ρAB ≠ 1. Show that the risk is then minimised by w = 0.5.

FEASIBLE SET FOR MANY ASSETS

Now we move on to the case of three assets A, B and C. Taking them in pairs, we first generate three curves of the type we just saw (Figure 3.5). In the diagrams, we assume that no pair is perfectly correlated.

Figure 3.5: Pairwise feasible curves for three assets

To obtain general combinations of A, B and C, we first take a combination of A and B by picking a point on the curve joining A and B, and then generate the feasible set of it and C.
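The minimum-variance weight of Exercise 3.3.9 can be sanity-checked against a brute-force search. The sketch assumes hypothetical values σA = 0.2, σB = 0.3 and ρ = –0.5, and allows short selling by scanning weights from –1 to 2 in steps of 0.0001; the function names are illustrative.

```python
def variance(w, sA, sB, rho):
    """Variance of the two-asset portfolio with weight w in A."""
    return w**2 * sA**2 + 2 * w * (1 - w) * rho * sA * sB + (1 - w) ** 2 * sB**2

def min_var_weight(sA, sB, rho):
    """Weight of A minimising portfolio variance (formula of Exercise 3.3.9)."""
    return (sB**2 - rho * sA * sB) / (sA**2 - 2 * rho * sA * sB + sB**2)

sA, sB, rho = 0.20, 0.30, -0.5                       # hypothetical inputs
w_star = min_var_weight(sA, sB, rho)

grid = [i / 10000 for i in range(-10000, 20001)]     # weights from -1 to 2
w_grid = min(grid, key=lambda w: variance(w, sA, sB, rho))
```

The minimum variance found this way is below both σA² and σB², which is the diversification effect described above.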
The next diagram (Figure 3.6) shows the result of carrying out this process for various initial combinations of A and B. For now, we also assume short selling is not allowed, so all weights are non-negative.

Figure 3.6: Feasible curves (dashed) obtained by combining C with points on the AB curve

If we complete this process and look at all possible combinations with non-negative weights, we sweep out the region shown in Figure 3.7. This is the feasible set when short selling is not allowed.

Figure 3.7: Feasible set for three assets with no short selling

Figure 3.8 shows what is possible when negative weights are allowed: we get a region that is slightly broader and also extends indefinitely to the right.

Figure 3.8: Feasible set for three assets with short selling

For the moment, we leave matters at this somewhat imprecise stage. In the next section, we will see how to get an exact description of the feasible set and its boundary.

The situation for many assets follows the same pattern as for three. The assets generate a feasible set which looks like a bullet heading to the left (see Figure 3.9). This characteristic shape is called the Markowitz bullet. If we consider all portfolios having the same mean return (these will be represented by a horizontal line), the one with minimum variance is on the left edge of the feasible set. This edge is therefore called the minimum variance curve. Note that there is also a unique portfolio which has the minimum variance amongst all the feasible portfolios. This is situated at the tip of the bullet and is called the minimum variance portfolio.

On the minimum variance curve, the points above the minimum variance portfolio represent efficient portfolios. The upper half of the minimum variance curve is therefore called the efficient frontier. Note that there are two minimum variance curves and two efficient frontiers, according to whether short selling is allowed or not.
We have depicted a situation where the minimum variance portfolio is the same for both cases, but it need not always be so (see Example 3.4.7).

Figure 3.9: The Markowitz bullet for an arbitrary collection of assets

3.4 MARKOWITZ MODEL

We will now embark on a more detailed investigation of the Markowitz bullet. The first thing we shall do is calculate the minimum variance portfolio for any fixed level of mean return. Similar calculations will yield the overall minimum variance portfolio across all mean returns. This will identify the minimum variance curve and the efficient frontier. Next, we shall show that, provided short selling is unrestricted, the minimum variance curve is in fact the feasible set for any two points on it. This Two Fund Theorem will give an explicit description of the minimum variance curve and the efficient frontier.

MINIMUM VARIANCE CURVE WITH SHORT SELLING

Let us fix a level $\bar r$ of the mean return, and then look for the portfolio which has this mean return along with the least possible σ². Mathematically, since σ² is a function of the weights wi, the task is to minimise the function

$$\sigma^2 = \sum_i w_i^2\sigma_i^2 + \sum_{i\ne j} w_iw_j\sigma_{ij} \qquad (3.5)$$

while satisfying the conditions

$$\sum_i w_i\bar r_i = \bar r, \qquad (3.6)$$

$$\sum_i w_i = 1. \qquad (3.7)$$

This problem is one of constrained optimisation, and the standard technique for it is the Lagrange multipliers method (see Appendix, §A.3). According to this technique, since there are two constraint equations, we should introduce two new variables, say λ and μ, and then set up the following n equations:

$$\frac{\partial\sigma^2}{\partial w_k} = \lambda\frac{\partial}{\partial w_k}\Big(\sum_i w_i\bar r_i\Big) + \mu\frac{\partial}{\partial w_k}\Big(\sum_i w_i\Big), \qquad k = 1,\dots,n. \qquad (3.8)$$

To carry out the differentiation on the left-hand side, note that in the expression $\sum_{i\ne j} w_iw_j\sigma_{ij}$ each pair i, j occurs twice: once as wiwjσij and again as wjwiσji. Also, σij = σji. Therefore,

$$\frac{\partial\sigma^2}{\partial w_k} = 2w_k\sigma_k^2 + 2\sum_{j\ne k} w_j\sigma_{kj} = 2\sum_{j=1}^{n}\sigma_{kj}w_j,$$

where σkk = σk². Substituting these calculations into equation (3.8), we get

$$2\sum_{j=1}^{n}\sigma_{kj}w_j = \lambda\bar r_k + \mu, \qquad k = 1,\dots,n.$$

We can simplify these a little more by replacing the variables λ and μ by 2λ and 2μ respectively.
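Then the last set of equations becomes

$$\sum_{j=1}^{n}\sigma_{kj}w_j - \lambda\bar r_k - \mu = 0, \qquad k = 1,\dots,n. \qquad (3.9)$$

Combining these with the two constraint equations (3.6) and (3.7) gives a system of n + 2 linear equations in n + 2 variables, w1, …, wn, λ, μ. This linear system can be expressed in matrix notation as follows:

$$\begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1n} & -\bar r_1 & -1\\
\vdots & & \vdots & \vdots & \vdots\\
\sigma_{n1} & \cdots & \sigma_{nn} & -\bar r_n & -1\\
\bar r_1 & \cdots & \bar r_n & 0 & 0\\
1 & \cdots & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix} w_1\\ \vdots\\ w_n\\ \lambda\\ \mu \end{pmatrix}
=
\begin{pmatrix} 0\\ \vdots\\ 0\\ \bar r\\ 1 \end{pmatrix} \qquad (3.10)$$

Exercise 3.4.1 Consider three assets A, B and C with the following properties:

$\bar r_A = 0.2$, $\bar r_B = 0.4$, $\bar r_C = 0.6$, σA = σB = σC = 1, σAB = σAC = σBC = 0.

Find the minimum variance portfolio amongst all their combinations with $\bar r = 0.5$.

Finding the overall minimum variance portfolio, without any constraint on $\bar r$, is slightly simpler. In this case, the constraint (3.6) is not present and we need only introduce one new variable, μ. We get the following linear system in the variables w1, …, wn, μ (again, after replacing μ by 2μ):

$$\sum_{j=1}^{n}\sigma_{kj}w_j - \mu = 0 \quad (k = 1,\dots,n), \qquad \sum_{j=1}^{n} w_j = 1.$$

This system has the following form in matrix notation:

$$\begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1n} & -1\\
\vdots & & \vdots & \vdots\\
\sigma_{n1} & \cdots & \sigma_{nn} & -1\\
1 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} w_1\\ \vdots\\ w_n\\ \mu \end{pmatrix}
=
\begin{pmatrix} 0\\ \vdots\\ 0\\ 1 \end{pmatrix} \qquad (3.11)$$

Exercise 3.4.2 Consider the assets of Exercise 3.4.1. Find their combination with the minimum variance.

Exercise 3.4.3 Consider a universe of n assets with notation as above. Let $\Sigma = (\sigma_{ij})$ be the covariance matrix (assumed invertible) and $\mathbf{1} = (1,\dots,1)^T$. Show that the weights $\mathbf{w} = (w_1, w_2, \dots, w_n)^T$ of the minimum variance portfolio are given by

$$\mathbf{w} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^T\Sigma^{-1}\mathbf{1}}.$$

Once the minimum variance portfolio is known, we can identify the efficient frontier. It consists of all the points on the minimum variance curve whose mean return is more than that of the minimum variance portfolio.

MINIMUM VARIANCE CURVE WITHOUT SHORT SELLING

When short selling is not allowed, the weights have to be non-negative. The problem of finding the minimum variance portfolio for a level $\bar r$ of the mean return becomes: minimise

$$\sigma^2 = \sum_i w_i^2\sigma_i^2 + \sum_{i\ne j} w_iw_j\sigma_{ij}$$

subject to the constraints

$$\sum_i w_i\bar r_i = \bar r, \qquad \sum_i w_i = 1, \qquad w_i \ge 0 \ (i = 1,\dots,n).$$

The constraints wi ≥ 0, being inequalities, complicate matters. The Lagrange multipliers method no longer applies. Instead, one has to resort to the methods of quadratic programming. The mathematics involved, while not very abstract, is nevertheless complicated enough for us to avoid it in this text. Therefore, in the rest of this chapter, we will only consider the situation where short selling is unrestricted.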
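Systems (3.10) and (3.11) are ordinary linear systems, so any linear solver will do. The sketch below builds them row by row: the first n rows read $\sum_j \sigma_{kj}w_j - \lambda\bar r_k - \mu = 0$ (or $\sum_j \sigma_{kj}w_j - \mu = 0$ for the overall minimum), followed by the constraint rows (3.6) and (3.7). It solves them with a small hand-written Gaussian elimination so that no third-party library is assumed. The data are those of Exercise 3.4.1; the function names are illustrative.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]            # bring largest entry up
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def min_var_fixed_return(means, cov, target):
    """Build and solve system (3.10); return the weights w_1..w_n."""
    n = len(means)
    S = [cov[k][:] + [-means[k], -1.0] for k in range(n)]
    S.append(list(means) + [0.0, 0.0])                  # constraint (3.6)
    S.append([1.0] * n + [0.0, 0.0])                    # constraint (3.7)
    return solve_linear(S, [0.0] * n + [target, 1.0])[:n]

def min_var_overall(cov):
    """Build and solve system (3.11); return the weights w_1..w_n."""
    n = len(cov)
    S = [cov[k][:] + [-1.0] for k in range(n)]
    S.append([1.0] * n + [0.0])
    return solve_linear(S, [0.0] * n + [1.0])[:n]

means = [0.2, 0.4, 0.6]                                  # data of Exercise 3.4.1
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w_fixed = min_var_fixed_return(means, cov, 0.5)
w_overall = min_var_overall(cov)
```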
Example 3.4.4 In this example, we shall give a flavour of the results of Markowitz in the absence of short selling. In fact, we will use one of his examples. We assume the availability of three assets A, B and C with the following mean returns:

$\bar r_A = 0.062$, $\bar r_B = 0.146$, $\bar r_C = 0.128$.

Their covariance matrix is taken to be:

Thus, the variance of A is σA² = 0.0146, the covariance of A and B is σAB = 0.0187, and so on.

Figure 3.10: The feasible set and efficient frontier for Example 3.4.4

The first diagram in Figure 3.10 shows the feasible set for these assets without short selling. It has been drawn by taking about 500 random combinations of the assets with positive weights. The assets themselves are marked by the solid squares. This already shows that the efficient frontier may not be completely smooth: there is at least one point where it turns sharply.

In fact, Markowitz showed that the efficient frontier has the following form. First, there are some key points on it, which we will call turning points. The efficient frontier is obtained by connecting adjoining turning points via their feasible curves. Thus, the frontier consists of pieces of hyperbolas, glued together at the turning points. The second diagram in Figure 3.10 shows the turning points (the stars) and the efficient frontier for this example. In Example 3.4.7 below, we carry out the calculations whose results are shown in the diagram. □

TWO FUND THEOREM

We return to the situation of a market based on n assets, with no restrictions on short selling. By our earlier calculations (equation (3.10)), we know that a portfolio has the minimum variance for a fixed mean return $\bar r$ if its weights wi satisfy the system SW = R, where

$$S = \begin{pmatrix}
\sigma_{11} & \cdots & \sigma_{1n} & -\bar r_1 & -1\\
\vdots & & \vdots & \vdots & \vdots\\
\sigma_{n1} & \cdots & \sigma_{nn} & -\bar r_n & -1\\
\bar r_1 & \cdots & \bar r_n & 0 & 0\\
1 & \cdots & 1 & 0 & 0
\end{pmatrix}, \qquad
W = \begin{pmatrix} w_1\\ \vdots\\ w_n\\ \lambda\\ \mu \end{pmatrix}, \qquad
R = \begin{pmatrix} 0\\ \vdots\\ 0\\ \bar r\\ 1 \end{pmatrix},$$

for some choice of λ and μ. Now, let A and B be two portfolios on the minimum variance curve. We set up notation as follows:

1. The weight of the ith asset in A is $w_i^A$.
2. The values of λ and μ for A are λA and μA.
3.
The mean and variance of the rate of return for A are denoted by rA and . 4. The quantities connected to B are denoted similarly. Let P be a combination of A and B. Let wA and wB be the weights of A and B in P. We first note that the weight of the ith asset in P is given by wiP = w iAw A + wiBw B. Further, we define λP = wAλA + wBλB and μP = wAμA + wBμB. We now calculate: Therefore, the portfolio P satisfies the system (3.12) and is on the minimum variance curve: it is the minimum variance portfolio for the mean return rP= wArA + wBrB. The situation is summed up by: Theorem 3.4.5 (Two Fund Theorem) Suppose short-selling is not restricted. Fix two portfolios A and B on the minimum variance curve. Then, any combination of A and B is also on the minimum variance curve. Conversely, every point on the minimum variance curve can be represented by a combination of A and B. □ In particular, the minimum variance curve is a hyperbola when short selling is unrestricted, since it is the feasible set for any two portfolios lying on it. The Two Fund Theorem has a simple consequence: Theorem 3.4.6 Suppose short-selling is not restricted. Then any combination of efficient portfolios, with all weights non-negative, is also efficient. The Two Fund Theorem is very useful for an investor who does not□ have the resources or inclination to do a full analysis of the available assets. Suppose such an investor identifies two portfolios A and B that he has reason to think are efficient (For example, he may consider two well managed mutual funds). Now, to create an efficient portfolio with a desired mean return r, he only has to combine A,B appropriately. Their weights wA and wB have to be chosen such that r = wArA + wBrB. This is the only calculation he has to make! Example 3.4.7 We will now use the Two Fund Theorem to derive the results of Example 3.4.4 concerning the efficient frontier in the absence of short-selling. 
Since the example has three assets, all portfolios can be described using two weights. Let $w_A$ be the weight of asset A and $w_B$ the weight of asset B. Then the weight of asset C is $1 - w_A - w_B$. Each portfolio can therefore be represented as a point in a plane, with coordinates $w_A$ and $w_B$. The variance and mean return of a portfolio are functions of $w_A$ and $w_B$:

$$\sigma^2 = w_A^2\sigma_A^2 + w_B^2\sigma_B^2 + (1 - w_A - w_B)^2\sigma_C^2 + 2w_Aw_B\sigma_{AB} + 2w_A(1 - w_A - w_B)\sigma_{AC} + 2w_B(1 - w_A - w_B)\sigma_{BC},$$

$$r = w_A r_A + w_B r_B + (1 - w_A - w_B)\, r_C.$$

Since σ² is a quadratic function of $(w_A, w_B)$ and r is a linear one, the portfolios with the same variance lie on an ellipse, while those with the same mean return lie on a line. This gives us a family of concentric ellipses and another family of parallel lines, as depicted in Figure 3.11. Figure 3.11 shows that σ² decreases as we move towards the inner ellipses. Therefore, minimum variance portfolios (at given levels of r) are located at the points where a constant-r line touches a constant-σ² ellipse tangentially.

Figure 3.11: (Example 3.4.7) The parallel lines have constant r. The concentric ellipses have constant σ², changing from 0.03 to 0.0178 to 0.015 as we move from the outermost to the innermost of the drawn ellipses. The square box marks the overall minimum variance portfolio. The thick slanting line is the minimum variance curve: each member has minimum σ² for its level of r.

We solve equation (3.11) to obtain the overall minimum variance portfolio. It has weights $w_A = 1.1023$ and $w_B = -0.0697$. (Note that it requires short selling.) Its mean return and variance are r = 0.054 and σ² = 0.0143. Minimum variance portfolios at other levels of r can be obtained by solving equation (3.10). The one at r = 0.01 has $w_A = 1.7236$ and $w_B = -0.2359$. It also has σ² = 0.0178.

Figure 3.12: (Example 3.4.7) In the absence of short selling, the minimum variance curve is AJKB.

The Two Fund Theorem tells us that the minimum variance portfolios lie on the straight line through the two portfolios we have just determined. In Figure 3.11, this is the thick slanting line. Its equation is

$$\frac{w_B - (-0.0697)}{w_A - 1.1023} = \frac{-0.2359 - (-0.0697)}{1.7236 - 1.1023}, \quad\text{or}\quad w_B = -0.2675\,w_A + 0.2252.$$
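The Two Fund Theorem is what makes this a straight line: mixing two portfolios mixes their asset weights in the same proportions, so in the $(w_A, w_B)$ plane every mixture lies on the segment through them. The short Python check below uses the rounded weights quoted above, and the helper names are ours. It recovers the slope and intercept of the line and confirms that an arbitrary mixture of the two portfolios stays on it. It also finds where the line meets the axes $w_B = 0$ and $w_A = 0$, which bound the no-short-selling region.

```python
def line_through(p, q):
    """Slope m and intercept c of the line w_B = m * w_A + c through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def mix(p, q, t):
    """Asset weights (w_A, w_B) of a portfolio holding t of fund p and 1 - t of fund q."""
    return tuple(t * a + (1 - t) * b for a, b in zip(p, q))


# Minimum variance portfolios from Example 3.4.7, as (w_A, w_B) pairs.
global_min = (1.1023, -0.0697)   # overall minimum variance portfolio, r = 0.054
at_r_001 = (1.7236, -0.2359)     # minimum variance portfolio at r = 0.01

m, c = line_through(global_min, at_r_001)
print(f"w_B = {m:.4f} w_A + {c:.4f}")

# A 30/70 mixture of the two funds (chosen arbitrarily) lies on the same line.
wa, wb = mix(global_min, at_r_001, 0.3)
print(abs(wb - (m * wa + c)) < 1e-12)

# Intercepts with the axes w_B = 0 and w_A = 0.
print("w_B = 0 at w_A =", round(-c / m, 4))
print("w_A = 0 at w_B =", round(c, 4))
```

The first intercept comes out as about 0.8417 rather than 0.8419. The difference is only rounding: 0.8419 is the value of 0.2252/0.2675, computed from the rounded coefficients.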
When short selling is not allowed, we are confined to the triangle lying between the $w_A$ and $w_B$ axes and the line $w_A + w_B = 1$ (triangle ABC in Figure 3.12). The minimum variance line intersects the edges of this triangle at two points J and K, given by J = (0.8419, 0) and K = (0, 0.2252). Inside triangle ABC, as we walk along any r = constant line, we achieve minimum σ on one of the segments AJ, JK or KB. Therefore, the minimum variance curve is now the broken line AJKB. Each of these line segments corresponds to a piece of a hyperbola in the σ–r portfolio diagram. □

Exercise 3.4.8 Consider a portfolio made up of the three assets A, B and C of the preceding example. Regulations require the manager of the portfolio to keep each asset's share of the portfolio at 20% or more. What will be the efficient frontier under these conditions?

Exercise 3.4.9 Note that a line cuts an ellipse in at most two points. This implies that for each feasible σ–r combination, in a universe of three fundamental assets, there will be one or two different portfolios with that pairing of mean return and variance. In fact, there will be one portfolio if the combination is on the minimum variance curve, and two portfolios otherwise. What will happen in a universe of four or more fundamental assets?

ONE FUND THEOREM

So far, we have considered situations where every asset is risky. Let us now consider situations where there are also risk-free assets. Such assets have σ = 0 and, by the No Arbitrage Principle, the same rate of return $r_f$. The feasible set arising out of a risky asset $A = (\sigma_A, r_A)$ and a risk-free asset $(0, r_f)$ is easily calculated. If their respective weights in a portfolio P are w and 1 − w, then

$$r_P = w r_A + (1 - w) r_f \quad\Longrightarrow\quad w = \frac{r_P - r_f}{r_A - r_f},$$

and

$$\sigma_P^2 = w^2\sigma_A^2 \quad\Longrightarrow\quad \sigma_P = |w|\,\sigma_A = \frac{\sigma_A}{|r_A - r_f|}\,|r_P - r_f|.$$

The situation is depicted in Figure 3.13.

Figure 3.13: Feasible set for a risky asset A and a risk-free asset.
The upper and lower dashed lines represent portfolios where the risk-free asset and A, respectively, have been short sold.

From this observation, it is an easy step to obtain the feasible set for n risky assets together with a risk-free asset. It is clear that the new feasible set will consist of the region between two straight lines passing through $(0, r_f)$, and that these straight lines will be symmetric about the horizontal line at height $r_f$. It is also clear that the edge of the new feasible set will touch the old feasible set tangentially at either one or two points, depending on the position of the minimum variance portfolio (with standard deviation $\sigma_V$ and mean return $r_V$) arising purely from the risky assets:
1. If $r_V = r_f$, there will be two points of tangency.
2. If $r_V < r_f$, there will be one point of tangency, and it will be on the lower edge.
3. If $r_V > r_f$, there will be one point of tangency, and it will be on the upper edge. (This is the situation illustrated in Figure 3.14.)

Figure 3.14: Feasible set when there is a risk-free asset and short selling is allowed: the situation of the One Fund Theorem.

The last case is the most important one. In line with the general expectation that riskier assets offer higher returns, we expect $r_V > r_f$ to be the typical case. Our discussion is summed up by:

Theorem 3.4.10 (One Fund Theorem) Suppose that a risk-free asset is available, with $r_V > r_f$. Then there is a unique portfolio M of risky assets such that the efficient frontier consists of the ray starting at the risk-free asset and passing through M. □

It is a fact that the One Fund Theorem is true even in the absence of short selling. We shall not formally prove this, but it is evident from the picture. The One Fund Theorem is also called the Separation Theorem and was discovered by James Tobin [50] in 1958. Tobin won the Nobel Prize in 1981. The ray giving the efficient frontier is called the Capital Market Line.
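Numerically, M is the portfolio of risky assets whose line to the risk-free asset is steepest. The calculus argument that follows reduces this to two steps: solve the linear system $r_k - r_f = \sum_j \sigma_{kj} v_j$, then rescale the $v_j$ so that they sum to 1. Below is a minimal Python sketch of that recipe. It reuses the data of Exercise 3.4.1 with a hypothetical risk-free rate $r_f = 0.1$, and the names are ours. The rescaling divides by $\sum_j v_j$, which is positive exactly when $r_V > r_f$, the case covered by the One Fund Theorem.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tangency_portfolio(cov, means, rf):
    """Weights of M: solve r_k - r_f = sum_j sigma_kj v_j, then rescale v to sum to 1."""
    v = solve(cov, [r - rf for r in means])
    total = sum(v)  # positive exactly when r_V > r_f
    return [vj / total for vj in v]


def mean_and_sd(cov, means, w):
    """Mean return and standard deviation of the portfolio with weights w."""
    mean = sum(wi * ri for wi, ri in zip(w, means))
    var = sum(w[i] * w[j] * cov[i][j] for i in range(len(w)) for j in range(len(w)))
    return mean, math.sqrt(var)


means = [0.2, 0.4, 0.6]                                   # Exercise 3.4.1 data
cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rf = 0.1                                                  # hypothetical risk-free rate

w_m = tangency_portfolio(cov, means, rf)
r_m, sd_m = mean_and_sd(cov, means, w_m)
slope = (r_m - rf) / sd_m                                 # slope of the Capital Market Line
print(w_m, slope)
```

Here $v = (0.1, 0.3, 0.5)$, so M has weights 1/9, 1/3 and 5/9, and the slope of the Capital Market Line is $\sqrt{0.35} \approx 0.5916$. No portfolio of these three assets gives a steeper line. For instance, the equally weighted portfolio gives a slope of about 0.52.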
If the point $M = (\sigma_M, r_M)$ is known, then the equation of the Capital Market Line is

$$r = r_f + \frac{r_M - r_f}{\sigma_M}\,\sigma.$$

This describes the profit–risk relationship for efficient portfolios. Two problems remain: finding M, and establishing the profit–risk relationship for all portfolios.

The portfolio M can be found by calculus. For any portfolio P composed purely of risky assets with weights $(w_i)$, the slope of the line joining it to the risk-free asset is

$$m = \frac{r_P - r_f}{\sigma_P} = \frac{\sum_i w_i r_i - r_f}{\big(\sum_{i,j} w_i w_j \sigma_{ij}\big)^{1/2}} = \frac{\sum_i w_i (r_i - r_f)}{\big(\sum_{i,j} w_i w_j \sigma_{ij}\big)^{1/2}},$$

where the last step uses $\sum_i w_i = 1$. Note that if we scale all the weights by the same positive constant, m does not change. So the constraint $\sum_i w_i = 1$ can be ignored, and we can let the weights $w_i$ vary freely. We now have a problem of unconstrained optimisation, and we solve it by setting

$$\frac{\partial m}{\partial w_k} = 0, \qquad k = 1, \ldots, n.$$

We differentiate m using the quotient rule and find

$$\frac{\partial m}{\partial w_k} = \frac{(r_k - r_f)\,\sigma_P^2 - \Big(\sum_i w_i (r_i - r_f)\Big)\sum_j \sigma_{kj} w_j}{\sigma_P^{3}} = 0,$$

that is,

$$r_k - r_f = \sum_{j=1}^{n} \sigma_{kj} v_j, \qquad\text{where } v_j = \frac{\sum_i w_i (r_i - r_f)}{\sigma_P^2}\, w_j.$$

We first solve the linear system $r_k - r_f = \sum_j \sigma_{kj} v_j$ for the $v_j$'s, and then scale them to obtain the weights $(w_j)$ of M:

$$w_j = \frac{v_j}{\sum_{i=1}^{n} v_i}.$$

Although we have obtained an equation for the "one fund" M, its practical solution may be difficult. First, in any reasonable market, n will be large. The computations will then be time-consuming, and also unreliable because round-off errors accumulate over the many individual calculations: solving a system of size 1000 × 1000 by Gaussian elimination requires of the order of $10^8$ multiplications and divisions. The final answer is also likely to be highly sensitive to even small fluctuations in the estimates of r and σ.

3.5 CAPITAL ASSET PRICING MODEL

The high point of the Markowitz model is the One Fund Theorem. However, we noted that while the model gives us a linear system that we can solve to get the "one fund" M, the actual numerical solution may be unreliable. Moreover, since M is special, there ought to be some theoretical insight into its constitution. This is provided by the Capital Asset Pricing Model (CAPM). CAPM also completes the description of the risk–profit relationship, extending it from efficient portfolios to all portfolios.
It exists in many versions, with varying assumptions, and what we shall describe here is just the simplest of these. CAPM was developed during the 1960s by various scholars. Chief amongst these were W F Sharpe (who won the Nobel Prize in Economics in 1990), J V Lintner, J Mossin and J L Treynor [27, 34, 44, 22].

MARKET PORTFOLIO

CAPM is based on the following assumptions about market conditions and the behaviour of investors:
1. Investors make their decisions only on the basis of the means and variances of the portfolio returns.
2. All investors plan for the same time horizon T and make the same estimates of the means, variances and covariances.
3. Each investor creates an efficient portfolio.
4. The same risk-free rate is applied to lending and borrowing and is available to all investors and for all amounts.

The net result is that each investor will calculate the same Capital Market Line, and will then choose a point on it according to his level of affinity for risk. (We are using the fact, noted at the end of the previous section, that the One Fund Theorem is true even in the absence of short selling of risky assets.) Each point on the Capital Market Line consists of a proportion of M and the risk-free asset, so the total investment (obtained by summing over all investors) is also a combination of the risk-free asset and M. Hence, the total investment in risky assets is just a multiple of M and can be identified with it (since scaling the size of an investment without changing its weights leads to the same rate of return). Consequently, we call M the market portfolio.

It is impossible to completely describe the market portfolio. It is difficult even to list the risky assets fully, let alone calculate the amounts invested in them. Instead, one usually settles on using a comprehensive stock index as an approximation to the market portfolio.

MARKET BETA

It is natural to explore the relationship between any one asset or portfolio and the full market portfolio M.
For example, we would like to know how fluctuations in the market correlate with those of the asset. If the market return rises by 10%, can the asset return also be expected to rise, and by how much? As a first step, we obtain a linear approximation to the relationship by the method of Ordinary Least Squares, or OLS (see Appendix, §B.14). According to this, the rate of return r of the asset is approximated by $\alpha + \beta r_M$, where $r_M$ is the market return, with

$$\beta = \frac{\operatorname{cov}(r, r_M)}{\sigma_M^2} \quad\text{and}\quad \alpha = \bar r - \beta\,\bar r_M,$$

where bars denote mean returns. The coefficient β is called the market beta, or just the beta, of the asset.

Table 3.1: Some betas calculated with respect to the NIFTY stock index over the years 2004–2005 and 2006–2007. As this table illustrates, betas are usually positive and mostly vary between 0.5 and 2. (Source: National Stock Exchange of India.)

The beta of an asset gives a quick idea of how it is related to the market. If the beta is positive, uptrends in the market will generally correspond to uptrends in the asset. If the beta is negative, uptrends in the market will correspond to downtrends in the asset. In addition to its sign, the magnitude of the beta also carries information: the larger the magnitude of beta, the greater the fluctuations in the asset that accompany a given fluctuation in the market. For example, suppose β = −2. Then the fluctuations in the asset return will generally be about twice those in the market return, and in the opposite direction. If β = 0, the asset and market returns are uncorrelated.

A useful property of β is linearity. Suppose a portfolio P is composed of two assets A and B with weights $w_A, w_B$. Let $r_A$ denote the rate of return of A, $\beta_A$ the beta of A, and so on. Then the beta of P is given by

$$\beta_P = \frac{\operatorname{cov}(w_A r_A + w_B r_B,\, r_M)}{\sigma_M^2} = w_A\frac{\operatorname{cov}(r_A, r_M)}{\sigma_M^2} + w_B\frac{\operatorname{cov}(r_B, r_M)}{\sigma_M^2} = w_A\beta_A + w_B\beta_B.$$

Figure 3.15: A plot of monthly returns r on ICICI Bank shares versus the returns $r_M$ on the S&P CNX 500 Index (which represents the market portfolio) during the years 2003–2007. There is a clear linear component to the relationship. In §3.8 we explain how to fit a line to such data.
More generally, if the assets in a portfolio P have weights $w_1, \ldots, w_n$ and betas $\beta_1, \ldots, \beta_n$, then the portfolio beta is given by

$$\beta_P = \sum_{i=1}^{n} w_i \beta_i.$$

Thus the beta of a portfolio can be determined from those of its constituents.

CAPM FORMULA

Having identified the "One Fund" with the market portfolio, let us now complete the profit–risk description. CAPM achieves this by considering the relationship of any portfolio with the market portfolio. For a given portfolio A, consider the set consisting of all combinations of A and the market portfolio M. This set forms a curve in the feasible set, as shown in Figure 3.16. The Capital Market Line meets this curve at M and, as it cannot cross it, is tangential to it. Let us consider the implications of this geometric insight.

Let P be a portfolio in which A has weight t and M has weight 1 − t. Writing $\sigma_{AM}$ for the covariance of the returns of A and M, the mean return and variance of P are given by

$$r_P = t\,r_A + (1 - t)\,r_M, \qquad \sigma_P^2 = t^2\sigma_A^2 + 2t(1 - t)\,\sigma_{AM} + (1 - t)^2\sigma_M^2.$$

Figure 3.16: Illustration for the proof of the CAPM formula

In order to calculate slopes at M, which corresponds to t = 0, we differentiate the above expressions at t = 0 to get:

$$\frac{dr_P}{dt}\Big|_{t=0} = r_A - r_M, \qquad \frac{d\sigma_P}{dt}\Big|_{t=0} = \frac{1}{2\sigma_P}\,\frac{d\sigma_P^2}{dt}\Big|_{t=0} = \frac{\sigma_{AM} - \sigma_M^2}{\sigma_M}.$$

The slope of the curve at M is therefore given by

$$\frac{dr_P}{d\sigma_P}\Big|_{t=0} = \frac{(r_A - r_M)\,\sigma_M}{\sigma_{AM} - \sigma_M^2}.$$

By the tangency condition, this equals the slope of the Capital Market Line:

$$\frac{(r_A - r_M)\,\sigma_M}{\sigma_{AM} - \sigma_M^2} = \frac{r_M - r_f}{\sigma_M}.$$

We do a final rearrangement:

$$r_A = r_f + \beta\,(r_M - r_f), \qquad\text{where}\quad \beta = \frac{\sigma_{AM}}{\sigma_M^2} = \rho_{AM}\,\frac{\sigma_A}{\sigma_M}.$$

This is the basic conclusion of CAPM. The coefficient β (beta) measures risk relative to the market (note that this is the same β as obtained earlier via OLS). It gives the change in $r_A$ corresponding to a unit change in $r_M$. If we plot r against β, CAPM predicts that the result will be the straight line plotted in Figure 3.17. This line is called the Security Market Line.

CAPM suggests that the risk–profit relationship is more clearly understood if we do not look at an asset in isolation. Instead, we should consider it relative to the overall market. Thus β, rather than σ, is the best fundamental variable. However, since every portfolio lies on the Security Market Line, CAPM does not help us choose between portfolios.
The overall picture is that CAPM helps us price portfolios, while the Markowitz model helps us choose between them. An incidental gain of shifting to β is that risk becomes linear: the beta of a portfolio is the weighted sum of the betas of its constituents, whereas σ generally is not.

Figure 3.17: According to CAPM, when r is plotted against β, all assets lie on the Security Market Line. Two special points on this line are the ones corresponding to risk-free investment (β = 0 and $r = r_f$) and the market portfolio M (β = 1 and $r = r_M$).

3.6 DIVERSIFICATION

The Markowitz model has shown the benefits of diversification, i.e., investing in a variety of assets. By doing so, we can reduce risk while keeping the expected return at a satisfactory level. Now we shall apply CAPM to this idea.

Inspired by the CAPM relationship for mean returns, we consider the (random) returns themselves. Define a new random variable ε by

$$r_A = r_f + \beta(r_M - r_f) + \varepsilon, \quad\text{or}\quad \varepsilon = r_A - r_f - \beta(r_M - r_f).$$

Taking variance on both sides, and using $\sigma_{AM} = \beta\sigma_M^2$, we find:

$$\sigma_\varepsilon^2 = \sigma_A^2 - 2\beta\,\sigma_{AM} + \beta^2\sigma_M^2 = \sigma_A^2 - \beta^2\sigma_M^2.$$

This is again rearranged as $\sigma_A^2 = \beta^2\sigma_M^2 + \sigma_\varepsilon^2$. Note that if A = M then ε = 0 and the second term vanishes. Therefore, the second term is seen as representing diversifiable risk: risk which can be avoided by diversifying. The first term cannot be avoided in this manner and is therefore called undiversifiable or systematic risk: it represents the risk due to general trends in the market.

CAPM asserts that not all risk is relevant to pricing. The part called diversifiable does not contribute to pricing (since only β shows up in the CAPM formula). The reason is that this risk can be reduced by combining the asset with others, so there is no need to demand or give compensation for it. The undiversifiable risk, however, is unavoidable, and this is what governs pricing. This analysis of risk is one of the striking achievements of CAPM.

Exercise 3.6.1 According to CAPM, diversifiable risk does not contribute to pricing. Does this mean it has no relevance to investment decisions?
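The decomposition $\sigma_A^2 = \beta^2\sigma_M^2 + \sigma_\varepsilon^2$ holds for any joint distribution of $r_A$ and $r_M$, provided β is taken to be $\sigma_{AM}/\sigma_M^2$. That choice of β is exactly what makes ε uncorrelated with $r_M$. The following Python sketch checks this on four equally likely scenarios. The returns and $r_f$ are invented for illustration, and the function names are ours.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance over equally likely scenarios (divisor N, not N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical returns of the market and of asset A in four equally likely scenarios.
r_m = [0.10, -0.05, 0.20, 0.00]
r_a = [0.14, -0.10, 0.25, 0.03]
rf = 0.05  # hypothetical risk-free rate

beta = cov(r_a, r_m) / cov(r_m, r_m)
eps = [a - rf - beta * (m - rf) for a, m in zip(r_a, r_m)]

systematic = beta ** 2 * cov(r_m, r_m)   # undiversifiable part
diversifiable = cov(eps, eps)            # variance of epsilon
print(f"beta = {beta:.3f}")
print(f"total variance             = {cov(r_a, r_a):.6f}")
print(f"systematic + diversifiable = {systematic + diversifiable:.6f}")
print(f"cov(epsilon, r_M) = {cov(eps, r_m):.2e}")
```

The last line prints a covariance that is zero up to floating-point error. That zero is why the cross term drops out of the variance of $r_A$.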
3.7 CAPM AS A PRICING FORMULA

Consider an asset with initial value $V_0$ and expected value $\bar V$ at some future time. Then, according to CAPM,

$$\frac{\bar V - V_0}{V_0} = r_f + \beta(\bar r_M - r_f).$$

This can be solved for $V_0$:

$$V_0 = \frac{\bar V}{1 + r_f + \beta(\bar r_M - r_f)}.$$

Thus CAPM provides a way to price assets based on their future payoff and associated systematic risk. Note how this formula generalises the one for the present value of a known cash flow, where we divide the future value by $1 + r_f$. If an asset with (unknown) future value V is being sold now at a price P, the above formula suggests we define its net present value by

$$\text{NPV} = -P + \frac{\bar V}{1 + r_f + \beta(\bar r_M - r_f)}.$$

Exercise 3.7.1 Consider an asset with present value $V_0$ and unknown future value V. Derive its certainty equivalent pricing formula from CAPM:

$$V_0 = \frac{1}{1 + r_f}\left(\bar V - \frac{\operatorname{cov}(V, r_M)\,(\bar r_M - r_f)}{\sigma_M^2}\right).$$

Exercise 3.7.2 Consider assets with current values $V_1$ and $V_2$. Show that the value of their combination is $V_1 + V_2$ by using:
1. the No Arbitrage Principle;
2. CAPM.

Exercise 3.7.3 Suppose that a model visualises three states for a market one year from now: boom, stagnation, and recession. The corresponding probabilities, market returns, and values of an asset are tabulated below. What should be the current price of the asset? (The risk-free rate is 5%.)

3.8 NUMERICAL TECHNIQUES

The models we have been considering involve certain properties of assets that cannot be directly observed. Specifically, they require prior knowledge of the distribution of asset prices in the future. Clearly, we cannot know exactly the expectation and variance of the future return from an asset. We can only make estimates, based on how the asset prices have varied till date as well as on our analysis of what is likely to occur in the future. We shall now describe how historical data is used to make these estimates. We first take up the Markowitz model.

MARKOWITZ MODEL

Here, the goal is to describe the feasible set, and we need to know r and σ for each relevant asset. One way to estimate them is to look at data from the past.
For example, we consider evenly spaced data over a time period and use it to form a sequence of rates of return: $r_1, r_2, \ldots, r_N$. This can be viewed as a random sample for the rate of return. Hence we can use the sample mean and sample variance as estimates for r and σ² (see §B.15):

$$\hat r = \frac{1}{N}\sum_{k=1}^{N} r_k, \qquad \hat\sigma^2 = \frac{1}{N - 1}\sum_{k=1}^{N} (r_k - \hat r)^2.$$

Generating the full feasible set requires more work. The first step is to find the covariances for each pair of assets. For this, we use the following estimator (see §B.15):

$$\hat\sigma_{ij} = \frac{1}{N - 1}\sum_{k=1}^{N} (r_{i,k} - \hat r_i)(r_{j,k} - \hat r_j),$$

where $r_{i,k}$ is the kth observed rate of return of the ith stock, and $\hat r_i$ is the mean of the $r_{i,k}$ (1 ≤ k ≤ N).

Example 3.8.1 Figure 1.2 was generated by applying this technique to the weekly returns of the 65 stocks constituting the Dow Jones Composite Index over a one-year period in 2005–06. (The data was downloaded from Yahoo! Finance.) The 65 stocks form $\binom{65}{2} = 2080$ pairs, so we have to calculate 2080 different covariances! However, if we collect the data on returns into a single spreadsheet or file, the matrix of covariances can be calculated by programs such as Excel and Mathematica. Once this is available, the program can also be used to solve the linear system for the points on the minimum variance curve.

Figure 3.18: Efficient portfolios for the data of Figure 1.2. The square box marks the Dow Jones Composite Index while the diamonds represent its constituent stocks. The filled stars show the efficient frontier when short selling is allowed, while the unfilled stars show the efficient frontier when it is not. (See Example 3.8.1.)

While we have not explained the quadratic programming techniques needed for the case when short selling is not allowed, these are already embedded in various software. Figure 3.18 shows the two efficient frontiers for the data of Figure 1.2.⁹ □

Difficulties in the application of this model centre around the estimates of r and σ, as these depend on the amount of data used.
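The estimators above translate directly into code. Below is a minimal Python sketch that computes sample means and the sample covariance matrix (divisor N − 1) from a table of observed returns. The diagonal entries of that matrix are the sample variances. The function names are ours, and the returns are hypothetical.

```python
def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_covariance(xs, ys):
    """Estimator of the covariance with divisor N - 1, as in the text."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def covariance_matrix(returns):
    """returns[i] holds the observed returns r_{i,1}, ..., r_{i,N} of asset i."""
    n = len(returns)
    return [[sample_covariance(returns[i], returns[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical weekly returns of two stocks over four weeks.
returns = [
    [0.01, 0.03, -0.02, 0.02],
    [0.02, 0.01, -0.01, 0.02],
]
means = [sample_mean(r) for r in returns]
cov = covariance_matrix(returns)
print(means)
print(cov)
```

Because $\hat\sigma_{ij} = \hat\sigma_{ji}$, a program handling a large universe of assets need only compute the entries on and above the diagonal.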
Usually, one would like to use as much data as possible so that the estimates stabilise (see the discussion of the effect of sample size on mean and variance estimates in §B.15). In our present case, however, the behaviour of the assets cannot be expected to remain the same over time, and hence older data may distort the picture instead of refining it.

SHARPE INDEX

Consider a risky asset with mean return r and standard deviation σ. Its Sharpe index is the ratio

$$S = \frac{r - r_f}{\sigma},$$

where $r_f$ is the risk-free rate. In practice, we would use estimates $\hat r$ and $\hat\sigma$ for r and σ.

Figure 3.19: The Sharpe index S of A is given by S = tan θ

Geometrically, the Sharpe index is the slope of the line joining the risky asset to the risk-free asset in the portfolio diagram (the dashed line in Figure 3.19). A higher value of the index indicates that the asset is closer to the Capital Market Line, whose slope is the largest value the index can take. Hence the index measures the efficiency of the asset.

CAPM

The flip side of the theoretical elegance of CAPM is a certain lack of solidity when it comes to numerical work. The main difficulty in its use is the elusive nature of the market portfolio M. As remarked earlier, it is not possible to fix the composition of M; in fact, it is not even feasible to list all its constituents.

ESTIMATING BETA

In any case, let us start with the standard first step of choosing a comprehensive stock index as an approximation to M. By manipulating historical data as described above for the Markowitz model, we can find estimates of $r_M$ as well as r (for any stock). We can also estimate the β of any stock relative to the stock index by substituting the relevant estimates of covariance and variance into its formula. It is worth noting that beta can also be estimated from its interpretation as the slope of the Ordinary Least Squares line.
Thus, suppose we have data $x_1, \ldots, x_N$ for the rates of return from the market portfolio (or the index representing it) over equal time intervals, and also data $y_1, \ldots, y_N$ for the corresponding rates of return of a certain stock. We try to fit a line y = a + bx to this data. Following OLS, we define the best line as the one that minimises the total squared error

$$E(a, b) = \sum_{i=1}^{N} (y_i - a - b x_i)^2.$$

We carry out the minimisation by applying the first derivative test. This gives the equations

$$\frac{\partial E}{\partial a} = -2\sum_{i=1}^{N} (y_i - a - b x_i) = 0, \qquad \frac{\partial E}{\partial b} = -2\sum_{i=1}^{N} x_i\,(y_i - a - b x_i) = 0.$$

The solutions are

$$b = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2} \quad\text{and}\quad a = \bar y - b\,\bar x,$$

where $\bar x$ and $\bar y$ are the means of the $x_i$ and the $y_i$. We shall call the line determined by these values of a, b the OLS line.

Exercise 3.8.2 Verify the above results.

The number b provides an estimate of the beta β of the asset relative to the index. The advantage of using this estimate of β is that we also get an idea of how good it is. We first define $\hat y_i = a + b x_i$. This is the y-value corresponding to $x_i$ as predicted by the OLS line. The question is: how much of the variation of the $y_i$ values from their mean can be explained by the OLS line?

Figure 3.20: An illustration of the OLS fit of a line to data

Let $\bar y$ be the mean of the y values. For any i, the deviation from the mean (in the original data) is $y_i - \bar y$. The amount of this deviation that is explained by the OLS line is $\hat y_i - \bar y$ (Figure 3.20). To get the overall picture, we compare the total variation $\sum_i (y_i - \bar y)^2$ with the explained variation $\sum_i (\hat y_i - \bar y)^2$. The coefficient of determination is defined by

$$R^2 = \frac{\sum_{i=1}^{N} (\hat y_i - \bar y)^2}{\sum_{i=1}^{N} (y_i - \bar y)^2}.$$

It represents the fraction of variation that is explained by the linear dependence of y on x. It is a fact that $0 \le R^2 \le 1$. The closer $R^2$ is to 1, the more significant is the OLS line, and the more reliable is our estimation of β by b.

Example 3.8.3 Figure 3.15 is based on monthly returns over 5 years (2003–2007) for the S&P CNX 500 Index and ICICI Bank stock. The OLS line for this data produced by the above formulas is $r = 1.036\,r_M + 0.0058$, where r is the (predicted) monthly rate of return of ICICI stock and $r_M$ is the monthly rate of return of the index. The coefficient of determination is $R^2 = 0.51$.
This is a reasonably high value of $R^2$ and suggests a strong relationship between the two sets of data. □

Now that we know how to estimate r and β for each asset, we plot the estimated (β, r) pairs on a single diagram. We cannot expect these points to lie on a straight line, since they are not based on the exact market portfolio and, further, they use estimates of r and β which may deviate significantly from the true values. Nevertheless, if CAPM is to be useful, they should lie roughly along a line.

SECURITY MARKET LINE

An approximate Security Market Line can be drawn by connecting the risk-free point to the one marking the stock index.

Example 3.8.4 Example 3.8.3 described the calculation of the β of ICICI Bank versus the S&P CNX 500 index over the years 2003–2007. Figure 3.21 shows the results of extending these calculations to a total of 34 stocks selected from the members of the Nifty index (we have picked all those which were included in Nifty throughout 2003–2007).

Figure 3.21: A plot of r versus β for 35 stocks and the S&P CNX 500 index. The box marks the index and the line through it is the estimated Security Market Line. See Example 3.8.4.

The resulting plot supports a linear relationship between r and β, as predicted by CAPM. Only one point lies dramatically far from the Security Market Line. This is the point representing the construction company Unitech, which had a remarkably high mean return in this period due to a single huge contract. □

Figure 3.22: Illustration depicting Jensen's index α

The Security Market Line marks the expected return for any level of β. The performance of an individual asset over the period can be judged by its position relative to this line. If it plots above the line, it has performed better than average at its level of systematic risk. If it plots below, it has performed worse than average.
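The estimation steps of this section fit in a few lines of Python. The sketch below uses function names of our own, and its returns and $r_f$ are invented for illustration: they are not the ICICI or S&P CNX 500 data. It fits the OLS line of a stock's returns on the index returns, takes the slope b as the estimate of β, and computes $R^2$. It then measures how far the stock's mean return lies above or below the approximate Security Market Line through $(0, r_f)$ and the index point $(1, \hat r_M)$.

```python
def ols_line(xs, ys):
    """Return (a, b, r_squared) for the OLS line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    explained = sum((a + b * x - y_bar) ** 2 for x in xs)  # variation of the fitted values
    total = sum((y - y_bar) ** 2 for y in ys)              # variation of the data
    return a, b, explained / total


def gap_from_sml(r_hat, beta_hat, r_m_hat, rf):
    """Vertical distance of (beta_hat, r_hat) above the line r = rf + beta * (r_m_hat - rf)."""
    return r_hat - rf - beta_hat * (r_m_hat - rf)


# Hypothetical monthly returns, in per cent: the index (x) and one stock (y).
index = [1.0, 2.0, 3.0, 4.0]
stock = [2.0, 3.0, 5.0, 4.0]
rf = 0.5  # hypothetical monthly risk-free rate, in per cent

a, b, r2 = ols_line(index, stock)
r_m_hat = sum(index) / len(index)
r_hat = sum(stock) / len(stock)
print(f"a = {a:.3f}, beta estimate b = {b:.3f}, R^2 = {r2:.3f}")
print(f"distance above the Security Market Line = {gap_from_sml(r_hat, b, r_m_hat, rf):.3f}")
```

Here b = 0.8, $R^2$ = 0.64 and the distance is 1.4. Because the OLS line passes through $(\bar x, \bar y)$, this distance also equals $a - r_f(1 - b)$.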
The performance of the asset can be quantified by measuring its vertical distance from the Security Market Line:

$$\alpha = \hat r - r_f - \hat\beta\,(\hat r_M - r_f).$$

This quantity is called Jensen's index or Jensen's alpha. The symbols $\hat r$, $\hat r_M$ and $\hat\beta$ represent the estimates of r, $r_M$ and β, respectively. Investors prefer assets with high α, as these have shown superior performance in the recent past.

CAPM AND REAL LIFE

Much ink has been spilled on the issue of whether CAPM is an accurate or useful model. An important early criticism, which we have already touched upon, was by Richard Roll [39]. Roll observed that the market portfolio is impossible to describe, even at the theoretical level. For example, the level of skill of its workforce is certainly an asset for a company, but how can it be given a numerical value? Even if we agree on a list of assets to be counted, it will be impossible to obtain all the relevant data. Hence, when we test CAPM, we must use a substitute for the market portfolio. If the test gives a poor result, we cannot say whether this is the fault of CAPM or of our choice of substitute. This means that CAPM cannot be refuted by experiments and so, in a philosophical sense, it is unscientific.

Roll's criticism is certainly valid. However, from a practical standpoint, the issue is not whether CAPM is correct but whether it is useful. CAPM has been used in many ways, for example in analysing risk, pricing assets and evaluating a firm's performance. Do its predictions in these matters stand up to statistical scrutiny? Here are some of the empirical criticisms of CAPM:
1. The Security Market Line is often far from the OLS line for the data, and so is not a good fit to it. Typically, the OLS line is flatter than the Security Market Line: low β stocks have higher mean returns than CAPM predicts.
2. Data suggests that market β does not completely describe the risk–profit relationship, and that other variables, such as the Total Market Capitalisation (TMC) of a stock, are also involved.
3. The r–β relationship does not always appear linear.

Naturally, many attempts have been made to create models which resolve these difficulties. A few of the better-known ones are:

Black's CAPM: Fischer Black (1972) replaced the assumption about unrestricted availability of the risk-free asset with unrestricted shorting of risky assets. This still leads to a Security Market Line on which every asset lies, but its slope is not prescribed.¹⁰

Arbitrage Pricing Theory: This model, formulated by Stephen Ross (1976), assumes only that investors will prefer the higher of two risk-free returns (in particular, they do not carry out mean–variance analysis). The model develops a formula in which an asset's mean return depends linearly on a number of factors. It does not prescribe what these factors should be, and leaves them to be determined by regression analysis of data.

Fama and French's Three-factor Model: Eugene Fama and Kenneth French (1993–96) created a model in which asset returns depend also on total market capitalisation (TMC) and book-to-market equity ratio (B/M). This has become quite popular. Its main drawback is its empirical nature: it lacks a full theoretical explanation for the choice of additional variables.

In sum, there is no completely satisfactory replacement for CAPM. Moreover, most alternative models start with a CAPM-type formulation and then try to modify or enlarge it. Thus, over forty years after its birth, it remains at the centre of modern finance.¹¹

Exercise 3.8.5 The following exercises sketch the derivation of Black's CAPM. We assume there is no risk-free asset but there is unrestricted short selling of risky assets.
1. Show that since every investor chooses a point on the efficient frontier, the market portfolio must be efficient.
2. Let Z be a portfolio whose market beta is zero. Show that the mean return of any portfolio A is then given by $r_A = r_Z + \beta(r_M - r_Z)$, where β is the market beta of A.
(HINT: In the proof of CAPM, replace the Capital Market Line with the feasible set of M and Z.)

⁹ The frontier when short selling is not allowed has been plotted using the software Efront, available from J R Varma's website: http://www.iimahd.ernet.in/~jrvarma. Another option is to use the Solver tool in Microsoft Excel and OpenOffice Calc.

¹⁰ We will encounter Fischer Black again when we study financial derivatives. He is best known for the Black–Scholes options pricing formula.

¹¹ A good reference on these matters is Fama and French [20].

4 Forwards and Futures

With this chapter we begin the second phase of our study. Earlier, we developed the general principles of how assets can be evaluated and combined into portfolios. Now we shall study financial derivatives, which provide ways of fine-tuning the risk characteristics of a portfolio, such as its systematic risk or its exposure to interest rate fluctuations. Over the last thirty-odd years, trade in derivatives has undergone exponential growth, and derivatives have become central to investment science.

In this chapter, we shall study the most basic derivatives, called forwards and futures. Besides becoming familiar with their basic structure, we shall also learn how to use them to modify the risk associated with an asset or portfolio. We now take a detailed look at forwards and futures: their pricing, their uses, and the differences between them.

4.1 FORWARDS AND FUTURES

A typical financial derivative is a contract for a future trade of a basic asset such as a stock, a bond, or a commodity like oil. The terms of the contract would clearly specify the details of the future trade, such as the time at which it will be executed, the procedure by which the price of the asset will be determined, the manner of its delivery, etc. As a simple example, consider a contract which offers you the right to buy a certain share a year from now by paying Rs 200 for it at that time. As the year expires, suppose the share price moves to Rs 250.
Then the contract clearly offers you a good deal, and this will attract the attention of other investors, who would like to buy the contract from you in order to benefit from it. Thus, the contract itself becomes an object of trade and its value changes with time, depending on various factors, such as changes in the value of the share and fluctuations in interest rates.

We introduce some relevant terminology:
1. Underlying asset: The asset on whose trade the derivative is based.
2. Spot price: The current value of the underlying asset.
3. Writer: The person or firm that offers the derivative for sale.
4. Holder: The buyer of the derivative.

The main problem that will concern us is:

Fundamental problem: Establish the value of a derivative from currently available information.

Success in solving this problem for any derivative will enable us to put it to various uses involving risk management.

FORWARD CONTRACTS

In a forward contract, the holder agrees to buy a fixed type and amount of the underlying asset from the writer at a fixed future date (the expiration date) and at a price (the exercise price) agreed upon now (see Figure 4.1).
1. On the expiration date, the holder must pay the exercise price to the writer.
2. In return, the writer must deliver the underlying asset to the holder.
3. No money is exchanged at the time of signing the contract.

Figure 4.1: The structure of a forward or futures contract signed at time t = 0 with expiration time t = T and exercise price X

The exercise price is also called the strike price or forward price. A forward contract is also simply called a forward.

Example 4.1.1 A packaged food company and a farmer will trade a certain amount of potatoes 3 months from now, after the harvest. If the crop is poor, prices will rise, and the company will face a loss. If there is a bumper crop, prices will fall, and it will be the farmer who faces a loss.
Both parties can eliminate their risk by agreeing now on the price at which they will trade in 3 months’ time. This is a forward contract. □

FUTURES CONTRACTS

The main features of a futures contract are identical to those of a forward contract. A futures contract may be referred to simply as a futures, and its exercise price may again be called its strike price or futures price. However, futures are traded through an exchange, are standardised, and can be traded further (the holder can sell to a new holder). These features make futures an easily used and flexible investment tool. The mathematical treatment of forwards and futures is almost identical, and so we shall mostly treat these terms as interchangeable.

Example 4.1.2 Consider a company that buys oil and uses it to generate electricity for supply to certain cities. The price at which it sells electricity is subject to government regulation and cannot be easily changed. Oil prices, on the other hand, are highly unstable. Therefore the company is constantly exposed to the risk of loss due to a sudden rise in oil prices. It can reduce this risk by buying oil futures, say on the New York Mercantile Exchange (NYMEX), locking in prices for the near future (up to 6 years ahead on NYMEX). Because these contracts are for very large amounts of oil, delivery is not fixed for one day, but can be executed over a month. For example, one set of oil futures on NYMEX had December 2007 listed as the expiry month. These futures could be traded until November 16, 2007, and delivery of the oil had to take place between December 1 and 31, 2007. □

The number of futures contracts traded on various exchanges has mushroomed in recent years:
NYMEX: from 68 million in 1994 to 216 million in 2006.
London International Financial Futures and Options Exchange (LIFFE): from 106 million in 2000 to 501 million in 2007.
National Stock Exchange of India (NSE): from 90,580 in 2000–01 to 139 million in 2005–06.
COMPARISON OF FUTURES WITH FORWARDS

In this section we explore the differences between futures and forwards. First of all, a forward is a contract negotiated between two fixed parties, and is usually not traded. A futures, on the other hand, is a standardised contract which is traded on an exchange. The exchange provides a physical or electronic mechanism to facilitate trading, as well as certain guarantees against default. The latter, in particular, are missing for forwards.

We now look at how an exchange provides protection from default. Consider an investor who approaches a broker in order to buy a futures contract on a certain stock. The broker creates a margin account and asks the investor to make a deposit called the initial margin. Consider the futures contracts for a certain asset, expiring at a time T. Such contracts may be written at any time before T. Let \(X_n\) represent the futures price for futures written at the end of day n. We assume that our investor bought his futures during day 0 with a futures price X. At the end of the day, if the futures price \(X_0\) is more than X, it represents a gain for the investor, as he has locked in a lower price than the one now available. On the other hand, the writer of the contract sees a corresponding loss. Therefore an amount \(X_0 - X\) is transferred from the margin account of the writer to that of the holder. If \(X_0 < X\), the opposite is done: \(X - X_0\) is transferred from the holder’s account to the writer’s account. This can also be viewed as a transfer of the negative amount \(X_0 - X\) into the holder’s account.

This process is called marking to market, and is repeated at the end of each trading day. It continually tracks the investor’s gain or loss and thus provides protection from default. The total change in the margin account over the life of the contract is given by:

\[ (X_0 - X) + (X_1 - X_0) + \cdots + (X_T - X_{T-1}) = X_T - X = S_T - X \qquad (4.1) \]

which exactly equals the holder’s profit from the contract.
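The telescoping sum in (4.1) is easy to check numerically. Below is a minimal Python sketch. It assumes a hypothetical list of end-of-day futures prices \(X_0, X_1, \ldots, X_T\) whose last entry equals the spot price \(S_T\) at expiry, and the function name `daily_settlements` is our own. It ignores interest on the margin account and the maintenance margin.

```python
def daily_settlements(contract_price, end_of_day_prices):
    """Transfers into the holder's margin account under marking to market.

    contract_price: the futures price X at which the holder bought.
    end_of_day_prices: hypothetical closing futures prices X_0, ..., X_T;
    the last entry is assumed to equal the spot price S_T at expiry.
    A positive transfer moves money from the writer to the holder,
    a negative one moves money from the holder to the writer.
    """
    transfers = []
    previous = contract_price
    for price in end_of_day_prices:
        transfers.append(price - previous)
        previous = price
    return transfers


X = 100.0
closing_prices = [101.5, 99.0, 102.25, 103.0]  # hypothetical data, S_T = 103.0
holder = daily_settlements(X, closing_prices)
writer = [-amount for amount in holder]         # the writer sees the mirror image

print(holder)        # [1.5, -2.5, 3.25, 0.75]
print(sum(holder))   # 3.0, i.e. S_T - X
print(sum(writer))   # -3.0, i.e. X - S_T
```

The intermediate prices cancel in pairs. Only X and the final price affect the total.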
Similarly, the change in the writer’s account is \(X - S_T\), which is his profit. At the end, neither party can gain by defaulting, since the value owed by the other has already been transferred!

Exercise 4.1.3 Justify the last equality in equation (4.1).

For the process to work, it is important that the amount in the margin account always be positive. So a maintenance margin is fixed and, if the amount in the margin account ever falls below it, the investor has to make a fresh deposit. Keeping money in the margin account is a cost for the investor. Some brokers allow the investor to earn interest on this money, but usually at lower than market rates. On the other hand, the investor can withdraw any surplus that accumulates above the initial margin. The possibility of intermediate withdrawals and deposits makes futures slightly different from forwards, since the cash flow is no longer concentrated at the expiry time. However, the difference this creates in pricing is slight and we shall ignore it, apart from some comments in the next section.

For the rest of this chapter, “futures” will by default mean “futures and forwards”. Whenever a statement applies only to futures (or only to forwards), this will be explicitly stated.

4.2 FORWARD AND FUTURES PRICE

In this section we shall consider the problem of determining the right exercise price for a futures contract. Before we begin, let us think about what might influence this choice. The initial spot price of the underlying asset should certainly play a role. Since a future payment is involved, the interest rate should also come into play. At first glance, it may seem that an expected rise in the future asset price would push up the exercise price, in which case the expected future asset price would also become a factor. Perhaps you can imagine other factors that may be relevant. Figure 4.2 shows the typical pattern: the exercise or futures price is very closely tied to the spot price of the underlying asset.
To bring some clarity to this situation, we turn to the No Arbitrage Principle. The following examples show how we can detect when an exercise price is too high or too low relative to the spot price, and thus determine its correct value.

Figure 4.2: Reliance stock and futures during June–August 2005. The futures prices are for contracts expiring on August 25, 2005.

Example 4.2.1 Suppose a futures is being written with an exercise price of €100, its expiration date is 3 months from now, and the current spot price of the underlying stock is €100. Suppose also that it is possible to borrow cash for this duration at an interest rate of 4% per annum, so that 1% interest accrues over the 3 months. In this situation, an investor can make a riskless profit as follows. First, she acquires the contract when it is written (no cost is attached) and also short sells the stock at a price of €100. Thus, she currently has €100. Investing this at 4%, after 3 months she has €101 in cash. With €100 of this she pays the writer of the contract, acquires the stock, and uses it to close out the short sale. With all her obligations cleared, she is left with a profit of €1. This profit is riskless: it does not depend on the future price of the stock! □

Example 4.2.2 In the example above, suppose the exercise price is €102 instead. Our previous calculation suggests this price is too high, and so an investor would attempt to profit by writing this contract. At the same time, he borrows €100 to buy a unit of stock. At the expiry time, he delivers the stock to the holder, receives €102, and uses €101 of it to pay off the loan. He is left with €1 of riskless profit. □

The examples, together with the No Arbitrage Principle, suggest that the correct exercise price for this particular futures is €101, which is the future value of the initial spot price. With this price, neither holder nor writer can make a riskless profit. We now apply this line of thinking to a general situation.
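The two strategies in Examples 4.2.1 and 4.2.2 can be written as a single check. Below is a minimal Python sketch. It assumes an underlying asset with no income and a known factor by which cash grows up to expiry. The helper `arbitrage_opportunity` is hypothetical and not part of any library.

```python
def arbitrage_opportunity(spot, exercise, growth, tol=1e-9):
    """Check a futures exercise price against the no-arbitrage value.

    spot: current spot price S of the underlying (no income assumed).
    exercise: the exercise price X written into the contract.
    growth: factor by which one unit of cash grows until expiry
            (1.01 in Examples 4.2.1 and 4.2.2).
    Returns (party, profit): party is "holder" when X is too low,
    "writer" when X is too high, and None when X is fair.
    """
    fair = spot * growth  # future value of the spot price
    if exercise < fair - tol:
        # Holder: short sell the asset, invest the proceeds, and use the
        # futures to buy back the asset at the low price X.
        return "holder", fair - exercise
    if exercise > fair + tol:
        # Writer: borrow to buy the asset now, deliver it for X at expiry,
        # and repay the loan, which has grown to spot * growth.
        return "writer", exercise - fair
    return None, 0.0


# Example 4.2.1 (X = 100), Example 4.2.2 (X = 102) and the fair price 101.
for X in (100.0, 102.0, 101.0):
    print(X, arbitrage_opportunity(100.0, X, 1.01))
```

The growth factor is left as a parameter, so other compounding conventions can be supplied without changing the logic.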
Theorem 4.2.3 Consider a futures contract with expiration date T (measured in years from the writing of the contract). Let the spot price of the underlying asset at the time of writing be S. Further, suppose one can borrow cash for this period at a continuously compounded annual rate of interest r. Then the No Arbitrage Principle implies the following exercise or futures price X for this contract:

\[ X = Se^{rT} \]

Proof. If \(X < Se^{rT}\), the holder of the contract can earn an arbitrage profit as follows. He initially short sells the asset for S. By time T, this amount grows to \(Se^{rT}\). He uses X of this to close the contract, obtain the asset, and deliver it to the buyer (closing the short sale). With no further obligations, he is left with a risk-free profit of \(Se^{rT} - X\).

If \(X > Se^{rT}\), it is the writer who can make an arbitrage profit. She initially borrows S and uses it to buy the asset. At time T she delivers the asset to the holder, receives X, and uses \(Se^{rT}\) of that to pay off the loan. She pockets a riskless profit of \(X - Se^{rT}\). □

Exercise 4.2.4 Show that the exercise price is given by \(X = S(1 + r)^n\) if the interest is compounded discretely and T equals n time periods.

One feature of this formula for X, which most people find surprising when they first encounter it, is that expectations about future prices play no role in it! The discussion above illustrates that the No Arbitrage Principle is a powerful mathematical tool for calculating the correct price of a derivative. It will provide the basis for every pricing formula that we develop in this course.

Forward and futures prices are the same if interest rates are constant. If they are variable, then differences arise due to marking to market. For instance, suppose the spot price S of the underlying asset is positively correlated with the interest rate r. If r increases, so does S, and hence X, and the holder of a futures benefits because marking to market leads to money being deposited into her margin account.
This gain can be withdrawn and reinvested. On the other hand, a fall in r leads to withdrawals from the margin account and may force her to borrow money to maintain the account. Overall, she comes out ahead because she borrows when interest rates are low and invests when they are high. However, the holder of a forward does not benefit in this way, and this creates slight differences between forward and futures prices. For short periods (up to about 3 months) this difference is negligible, but it has been observed to become significant beyond that. (For a detailed exploration of the relation between forward and futures prices, see Cox, Ingersoll and Ross [13].)

Exercise 4.2.5 Suppose the current stock price is $17, the current futures price of a contract expiring in one year is $18, r = 8%, and short selling requires a 30% security deposit attracting interest at d = 4%. Is there an arbitrage opportunity?

Exercise 4.2.6 Suppose marking to market is modified in the following way: the adjustment to the margin account at the end of each day is the present value of the change in the futures price. Show that then, even with random interest rates, forward and futures prices would be equal.

4.3 VALUE OF A FUTURES CONTRACT

Consider a futures with an exercise price X. Suppose that, as the expiration date approaches, the price of the underlying asset becomes larger than X. Then the deal offered by the futures contract becomes attractive to other investors, who would like to acquire it from the present holder. Thus, the contract acquires a certain value and may be sold off to a new holder. (If the asset price goes down, the contract represents a loss for the holder; its value is then negative.) Let the time of writing the contract be set as 0, the expiration date be T (years), and the exercise price be X. Let the annual, continuously compounded rate of interest be r. Consider some time t during the life of the contract.
The value of this contract at time t will clearly depend on the value of the asset at that time. So we set up some further notation:
\(S_t\) = value of the asset (spot price) at time t
\(V_t\) = value of the contract at time t
If we buy the asset at time t, we spend \(S_t\). On the other hand, if we buy it through the futures, we spend X at time T, which is equivalent to spending \(Xe^{-r(T-t)}\) at time t. Therefore, the profit from using the futures to buy the asset (instead of buying it from the market at time t) is \(S_t - Xe^{-r(T-t)}\). This is the value of the contract at time t.

Theorem 4.3.1 Consider a futures contract with expiration time T. Let the spot price of the underlying asset at time t be \(S_t\), and let r be the continuously compounded annual rate of interest. Then the value \(V_t\) of the futures at time t is given by

\[ V_t = S_t - Xe^{-r(T-t)}. \qquad (4.2) \]

□

Exercise 4.3.2 Show that if \(V_t\) doesn’t satisfy equation (4.2), arbitrage will be possible.

Exercise 4.3.3 Show that if the interest is discretely compounded and the time from t to expiry at T equals n compounding periods, then \(V_t = S_t - X(1 + r)^{-n}\).

It is implicit in our calculation of the value of a futures that there are no costs or profits associated with owning the underlying asset. Thus, the formula would apply to stocks which do not pay dividends, but not to those that do. Nor would it apply to commodities which have storage costs. We shall now describe a technique which will allow us to handle these other situations as well.

Exercise 4.3.4 Consider a futures expiring in 6 months, written on a share which pays no dividends and whose current price is Rs 100. Let the annually compounded interest rate be 10%. With what exercise price should the futures be written?

4.4 METHOD OF REPLICATING PORTFOLIOS

Consider two portfolios A and B. Suppose it is possible to predict with certainty that at a certain time T in the future they will have the same value.
Then the No Arbitrage Principle implies that in fact they will have the same value at all intermediate times as well! For suppose A has less value than B at some time t < T. An investor can sell B and buy A at t and pocket the difference. At time T he sells A and buys back B, thus returning to his original situation but having made a risk-free profit.

This observation leads to a methodical way of valuing financial instruments. We set up two portfolios such that we are sure they have the same final value, and one of them includes the instrument. Then they have the same initial value. Equating these initial values gives an equation we can solve for the required value. The art is in creating the right portfolios.

We illustrate this method by using it to re-derive the formula for the value of a futures contract. Consider a portfolio created at time t = 0 consisting of the following items:
1. The contract
2. An amount \(Xe^{-rT}\) of cash
The value of this portfolio at time t is \(V_t + Xe^{-r(T-t)}\), where \(V_t\) is the value of the futures at time t. At time T, the contract expires and is replaced by a unit of the asset. An expense of X is also incurred, but this is exactly compensated by the cash amount (by time T, \(Xe^{-rT}\) has become X). So the value of the portfolio at time T is \(S_T\). Thus, the portfolio replicates a unit of the asset at T, and hence at all intermediate times also. So we obtain \(S_t = V_t + Xe^{-r(T-t)}\), or \(V_t = S_t - Xe^{-r(T-t)}\).

If we are sure that one portfolio eventually has the same value as another, we call it a replicating portfolio of the other one. The method of replicating portfolios, which we have just described and illustrated, will be our basic technique for using the No Arbitrage Principle to price derivatives.

Exercise 4.4.1 Consider two futures for the same underlying asset (which may generate income), both expiring at T. One of them is written at time 0 with exercise price \(X_0\) and the other is written at time t > 0 with exercise price \(X_t\).
Use the Method of Replicating Portfolios to show that the value of the first contract at time t is \(V_t = (X_t - X_0)e^{-r(T-t)}\), where r is the continuously compounded interest rate.

FUTURES ON AN ASSET PROVIDING KNOWN INCOME

We have dealt so far with assets which do not produce any extra income. Now suppose the underlying asset provides a known income during the life of the futures, i.e., we know what income will accrue and when.

Theorem 4.4.2 Consider a futures with expiration date T and exercise price X on an asset producing a known income during the life of the futures. Let the present value at time t of the income during the remaining life of the contract be \(I_t\). Then the value \(V_t\) of the futures contract at time t is given by

\[ V_t = S_t - I_t - Xe^{-r(T-t)}. \qquad (4.3) \]

Proof. We set up the following portfolios at time t:
Portfolio A: One futures contract and a cash amount of \(Xe^{-r(T-t)}\).
Portfolio B: One unit of the asset and borrowings of \(I_t\).
(Note that X and \(I_t\) are known at time t.) At time T, both portfolios become one unit of the asset: in portfolio B, the income from the asset exactly pays off the borrowings. Hence, by the Method of Replicating Portfolios, they have the same value at all times t < T:

\[ V_t + Xe^{-r(T-t)} = S_t - I_t. \]

□

Since no money is transferred when buying a futures contract at time t = 0, we have \(V_0 = 0\). Substituting in (4.3), we find the right exercise price as well:

\[ X = e^{rT}(S_0 - I_0). \qquad (4.4) \]

Exercise 4.4.3 Consider a bond which matures in 9 months, has face value ¥10,000, and two remaining coupon payments of ¥1,000 each after 3 and 9 months. You own a futures on this bond, expiring in 4 months, with exercise price ¥10,500. What is the value of this futures? (Assume a continuously compounded interest rate of 10%.)

Exercise 4.4.4 Consider a futures expiring in 3 months on an amount of potatoes that currently costs Rs 10,000. Suppose the storage cost for this amount of potatoes is Rs 300 per month. What should be the exercise price of the futures? (Assume a continuously compounded interest rate of 10%.)
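Formulas (4.2) to (4.4) translate directly into code. Below is a minimal Python sketch that assumes continuous compounding. Income payments are supplied as hypothetical (time, amount) pairs, all assumed to fall within the remaining life of the contract. The helper names and the numbers are our own. Setting the income to zero recovers Theorem 4.2.3 and Theorem 4.3.1.

```python
import math


def present_value(payments, r, t=0.0):
    """Present value at time t of known cash payments.

    payments: list of (time_in_years, amount) pairs, all assumed to fall
    after t and before the expiry of the futures.
    r: continuously compounded annual interest rate.
    """
    return sum(amount * math.exp(-r * (s - t)) for s, amount in payments)


def futures_value(spot_t, exercise, r, T, t, income_pv=0.0):
    """V_t = S_t - I_t - X e^{-r(T-t)}, equation (4.3).

    With income_pv = 0 this reduces to equation (4.2).
    """
    return spot_t - income_pv - exercise * math.exp(-r * (T - t))


def futures_price(spot_0, r, T, income_pv_0=0.0):
    """X = e^{rT} (S_0 - I_0), equation (4.4); with I_0 = 0 this is S e^{rT}."""
    return math.exp(r * T) * (spot_0 - income_pv_0)


# Hypothetical asset: spot 50, one payment of 1 after 6 months,
# futures expiring in 1 year, r = 5% continuously compounded.
I0 = present_value([(0.5, 1.0)], r=0.05)
X = futures_price(50.0, r=0.05, T=1.0, income_pv_0=I0)
print(round(X, 4))                                                   # 51.5382
print(futures_value(50.0, X, r=0.05, T=1.0, t=0.0, income_pv=I0))    # essentially 0
```

The futures is written at its no-arbitrage price, so its value at t = 0 comes out as zero, up to rounding. This is the condition \(V_0 = 0\) used to derive (4.4).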
FUTURES ON AN ASSET WITH KNOWN DIVIDEND YIELD

We now consider the case where the income from the asset is not known in absolute terms, but is known as a fraction of the spot price. Such income will be called a dividend and the fraction will be called the dividend yield. For example, if the dividend yield is 2% and the spot price at the time of the dividend payment is 200, then the actual dividend will be 4. The insight that solves the futures pricing problem for such an asset is that, while the payment is uncertain in terms of currency, it is certain in terms of the amount of the asset. The simplest such situation is when there is one expected dividend payment.

Theorem 4.4.5 Consider a futures with expiration date T and exercise price X on an asset which will generate an income at time T′ (T′ < T) with dividend yield q. Then the value \(V_t\) of the futures contract at a time t < T′ is given by

\[ V_t = \frac{S_t}{1+q} - Xe^{-r(T-t)}. \qquad (4.5) \]

Proof. We retain portfolio A from the proof of the previous theorem, and create a new portfolio B (at time t).
Portfolio A: One futures contract and a cash amount of \(Xe^{-r(T-t)}\).
Portfolio B: \(\frac{1}{1+q}\) units of the underlying asset.
Let \(S_{T'}\) be the price of one unit of the asset at time T′. At time T′, portfolio B earns \(\frac{q}{1+q}S_{T'}\). We immediately reinvest this earning in the asset, acquiring \(\frac{q}{1+q}\) units of it. The total amount of the asset owned now becomes

\[ \frac{1}{1+q} + \frac{q}{1+q} = 1 \text{ unit}, \]

and so at time T our portfolio B is finally worth \(S_T\). This is also the final worth of portfolio A. On equating their initial values, we find

\[ V_t + Xe^{-r(T-t)} = \frac{S_t}{1+q}. \]

□

Exercise 4.4.6 Show that the exercise price of the futures contract in the previous theorem is given by

\[ X = \frac{e^{rT}}{1+q}\,S_0. \qquad (4.6) \]

Exercise 4.4.7 A share is currently priced at Rs 100 and is expected to pay a 2% dividend after 3 months. What should be the exercise price of a futures written on this share and expiring in 6 months if the continuously compounded annual interest rate is 10%?

The approach easily generalises to cover multiple payments with known dividend yield.
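The unit-counting argument in the proof of Theorem 4.4.5 can be mirrored in code. Below is a minimal Python sketch. It assumes continuous compounding for interest and the reinvestment convention of the proof, where each dividend buys more of the asset at the price on which the dividend was computed. Repeating the reinvestment step once per payment gives the product form that Exercise 4.4.8 asks you to derive. The function names and numbers are hypothetical.

```python
import math


def futures_price_with_yields(spot_0, r, T, yields):
    """Exercise price X = S_0 e^{rT} / prod(1 + q_i).

    yields: dividend yields q_i of the payments due before expiry T.
    With a single yield q this is (4.6): X = e^{rT} S_0 / (1 + q).
    """
    return spot_0 * math.exp(r * T) / math.prod(1 + q for q in yields)


def futures_value_with_yields(spot_t, exercise, r, T, t, remaining_yields):
    """V_t = S_t / prod(1 + q_i) - X e^{-r(T-t)}, for t before the first payment."""
    return (spot_t / math.prod(1 + q for q in remaining_yields)
            - exercise * math.exp(-r * (T - t)))


def units_after_reinvesting(start_units, yields):
    """Track portfolio B: each dividend is reinvested in the asset itself."""
    units = start_units
    for q in yields:
        units += q * units  # dividend q*S*units buys q*units at price S
    return units


q = 0.02  # hypothetical single dividend yield
print(units_after_reinvesting(1 / (1 + q), [q]))  # 1 unit, up to rounding
X = futures_price_with_yields(100.0, r=0.05, T=1.0, yields=[q])
print(futures_value_with_yields(100.0, X, r=0.05, T=1.0, t=0.0,
                                remaining_yields=[q]))  # essentially 0
```

`math.prod` requires Python 3.8 or later. With an empty list of yields, the formulas reduce to the no-income case \(X = S_0e^{rT}\).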
Exercise 4.4.8 Suppose an asset will have dividend yields \(q_1, q_2, \ldots, q_n\) at times \(t_1 < t_2 < \cdots < t_n\) during the interval [0, T]. Consider a futures on this asset written at t = 0 and expiring at T. Show that its exercise price is given by

\[ X = \frac{e^{rT}}{(1+q_1)(1+q_2)\cdots(1+q_n)}\,S_0, \]

where \(S_0\) is the initial spot price of the underlying asset and r is the continuously compounded interest rate.

Exercise 4.4.9 Consider the asset and futures of the previous exercise. Show that at any instant t between 0 and \(t_1\), the value of the futures is given by

\[ V_t = \frac{S_t}{(1+q_1)(1+q_2)\cdots(1+q_n)} - Xe^{-r(T-t)}. \]

The conventions associated with interest rates are extended to dividend yields. Thus, by default, all dividend yields are given as annual rates. If the dividend is actually calculated over a smaller time span, the dividend yield is adjusted accordingly. For example, suppose an annual dividend yield of 6% is quoted for a period of one month. Then the actual dividend yield used is \(\frac{6\%}{12} = 0.5\%\).

We can now introduce the concept of a continuous dividend yield as a limit of more and more frequent dividend payments. Consider an annual dividend yield of q distributed over n equally spaced dividend payments during a time interval of T years. Then each individual dividend payment has yield qT/n. If each of these is immediately reinvested in the asset, then by time T one unit of the asset grows into

\[ \left(1 + \frac{qT}{n}\right)^n \text{ units}. \]

To obtain the continuous yield we let \(n \to \infty\) and find that one unit of the asset grows into

\[ \lim_{n\to\infty}\left(1 + \frac{qT}{n}\right)^n = e^{qT} \text{ units}. \]

The notion of continuous dividend yield is useful when dealing with diversified portfolios, where the individual dividend streams may be numerous enough for the continuous approximation to become accurate. It then greatly reduces the amount of computation. Continuous dividend yield is also appropriate when dealing with commodities whose storage costs are proportional to their amount, and whose amount varies continuously.

Theorem 4.4.10 Consider a futures with expiration date T and exercise price X on an asset with continuous dividend yield q.
Then the value \(V_t\) of the futures contract at time t is given by

\[ V_t = S_te^{-q(T-t)} - Xe^{-r(T-t)}. \qquad (4.7) \]

Proof. We retain the definition of portfolio A from the proof of the previous theorem and modify portfolio B.
Portfolio B: We start with \(e^{-q(T-t)}\) units of the asset at time t, and all income from dividends is reinvested in the asset.
By our previous calculations, at time T portfolio B consists of one unit of the asset, just like portfolio A. Hence A and B have the same value at any earlier time t ≤ T as well. On equating these values, we get

\[ V_t + Xe^{-r(T-t)} = S_te^{-q(T-t)}. \]

□

Exercise 4.4.11 Show that the exercise price of the futures contract in the previous theorem is given by

\[ X = S_0e^{(r-q)T}. \qquad (4.8) \]

Exercise 4.4.12 A stock’s price is currently 400, the quoted futures price on a 4-month contract is 405, the risk-free rate is r = 10% and the dividend yield is 10% (both continuously compounded). How could you make an arbitrage profit?

4.5 HEDGING WITH FUTURES

In this section we take up the art of buying or selling futures to reduce the risk associated with an asset or portfolio. Reduction of risk by combining assets whose risks cancel is called hedging. We can approach hedging from two points of view. Either we own something and fear a drop in its value, or we wish to make a future purchase and fear a rise in its price. Our aim will be to trade futures so that the changes in their values cancel out the changes in the value of the asset concerned. At this point, let us introduce some terminology from the trading world:
1. To be long in an asset is to own it. To go long means to buy.
2. Short is the opposite of long. Thus, to short is to sell and to be short is to owe.
Consider the following sentence: “If you are long in a risky asset, you can reduce risk by shorting a futures contract for it.” What it says is that if you own a risky asset, you can reduce risk by writing a futures contract for it.
We shall now consider certain tactics for using futures to hedge. These tactics avoid actual transfer of the asset at any time, so as to eliminate transaction and delivery costs.

SHORT HEDGE

Suppose at t = 0 we have an asset with spot price \(S_0\). We fear a fall in its value by time T. So we carry out a short hedge:
At t = 0: Sell (write) a futures on that asset with exercise price \(X_0\) and expiry date T.
At t = T: Buy a futures contract on that asset with exercise price \(X_T\) and expiry time T. Then close out both contracts.
At time T, we receive \(X_0\) from the first futures and pay \(X_T\) to the writer of the second futures. The asset acquired from the writer of the second contract is delivered to the holder of the first, and the original asset remains with us. Therefore the value of our portfolio at time T is \(V_T = S_T + (X_0 - X_T)\). The No Arbitrage Principle implies \(S_T = X_T\). Hence \(V_T = X_0\). There is no uncertainty in the outcome: the risk has been completely eliminated.

Exercise 4.5.1 Show that a short hedge is equivalent to selling the asset at t = 0 and investing the proceeds at the risk-free rate.

Our approach reduces risk at intermediate times also. Suppose the second futures contract was bought at time t < T, with expiry date T and exercise price \(X_t\). Then the value of our portfolio at time t is

\[ V_t = S_t + e^{-r(T-t)}(X_0 - X_t) = e^{-r(T-t)}X_0 + \left(S_t - e^{-r(T-t)}X_t\right). \]

The term \(S_t - e^{-r(T-t)}X_t\) represents the risk, and is called the basis. Without the hedge, the basis would have been \(S_t\). Since S and X are strongly correlated (they move up or down together), the variation in \(S_t - e^{-r(T-t)}X_t\) can be expected to be much less than the variation in \(S_t\). In fact, the No Arbitrage Principle predicts that \(S_t - e^{-r(T-t)}X_t = 0\). In this way, the short hedge reduces risk.

The second step in the short hedge is optional, and can be skipped if we actually wish to sell the asset through the original futures. However, even when we wish to sell the asset at time T, we may still find it beneficial to carry out the second step.
The reason is that, typically, it is the obligation of the writer to deliver the asset to whoever is the holder at the time of expiry of the contract; in particular, any delivery costs are borne by the writer. The second step makes us a holder as well, and so we can demand delivery where the holder of the first contract resides. In this way, we avoid paying the delivery costs ourselves. We then sell our own asset in the local market at the current price.

LONG HEDGE

Suppose we expect an inflow of funds by time T and intend to use it to buy an asset. We are worried that the spot price will rise in the interim. We hedge against this scenario by carrying out a long hedge:
At t = 0: Buy a futures with exercise price \(X_0\) and expiry date T.
At t = T: Sell (write) a futures contract with exercise price \(X_T\) and expiry date T. Then close out both contracts.

Exercise 4.5.2 Show that a long hedge is equivalent to buying the asset at the price \(X_0\) at time T.

Like a short hedge, a long hedge completely eliminates risk. We can skip the second step if we wish to actually acquire the asset itself, rather than just its value. The analysis of risk associated with a long hedge exactly mirrors that for a short hedge, and we leave it to you.

CROSS HEDGE

We have seen that it is possible to use futures to completely eliminate risk, provided they are available for the asset whose value we wish to hedge. In reality, futures are only available for those shares and commodities with a high enough volume of trade (for example, on the NSE in February 2008, futures were available for only 225 out of about 1000 listed stocks). In particular, they are not available for individual assets such as a particular office building. Now suppose you are the owner of that building and wish to hedge against fluctuations in its value. What can you do?
The answer is that you can’t do anything about the risks which are specific to your building, such as the sudden discovery that it doesn’t satisfy local safety rules. But it is possible to hedge against risks arising out of general trends in the market or, more specifically, the real estate market. This can be done by using futures which are based on the stock of a prominent real estate company or on an index which tracks the real estate sector.

This discussion leads us to the notion of a cross hedge, in which the asset underlying the futures is not the same as the asset held (or desired) by the investor. Let us denote by S the spot price of the asset held (or desired) by the investor, and by \(S^*\) that of the asset underlying the futures. Then, since \(S_T^* = X_T\) by the No Arbitrage Principle, the value (or cost) at time T is:

\[ X_0 + S_T - X_T = X_0 + (S_T^* - X_T) + (S_T - S_T^*) = X_0 + (S_T - S_T^*) \]

In this case, risk is not completely eliminated. It is measured by the additional basis \(S_T - S_T^*\). It will be low if S and \(S^*\) are strongly and positively correlated.

ROLLING HEDGE

We have noticed that we may not be able to execute a perfect hedge because futures with the right underlying asset may not be available. Another difficulty is that we may not be able to exactly match the expiry time of the futures with the time of purchase, either because contracts with the right expiry date are not available, or because we are uncertain about the exact date when we will wish to trade the asset. Then we can use contracts with a later expiry date (as illustrated in the discussion on short hedges). If contracts with a long enough life are not available, we can implement a sequence of hedges using futures with shorter lives. Such a rolling hedge will reduce risk but not eliminate it. The remaining risk arises from fluctuations in interest rates, since the futures used in the later stages of the rolling hedge will not lock in today’s interest rate.

Example 4.5.3 Suppose the current interest rate is 12% per annum, so that a 3-month investment earns 3%.
We wish to hedge an asset over the next 6 months, but the maximum time to expiry of a futures is 3 months. We implement a rolling hedge by using one futures for the first 3 months and another for the next 3 months. At each stage, we lock in the current risk-free rate. Now suppose that after 3 months the interest rate drops to 8%. Then the overall growth in value of our hedged portfolio will be given by (1 + 0.03)(1 + 0.02) = 1.0506, a gain of 5.06% over the six months. This is equivalent to an annual rate of 10.12%, compared with the 12% prevailing at the start. □

OPTIMAL HEDGE RATIO

The short and long hedging strategies described so far assume that convenient futures are available, and then they work perfectly. When convenient futures are not available, we have to implement a cross or rolling hedge, in which case we are able to reduce risk but not eliminate it completely. We shall now undertake a statistical treatment of the problem with the intention of quantifying the remaining risk and using this knowledge to minimise it. The approach is very general and does not assume the No Arbitrage Principle, or any particular relationship between the hedged asset and the futures used in the hedging process. Nor do we necessarily hedge by using one futures for each unit of the asset.

Let δS be the change in the spot price of the hedged asset over the hedge period and δX the change in the exercise price of the futures contract. Consider these as random variables with respective standard deviations \(\sigma_S\) and \(\sigma_X\). Let ρ be the coefficient of correlation of δS and δX. Finally, let h be the hedge ratio: the ratio of contracts written to units of asset held (or sought). We do the calculations for a short hedge of one unit of the asset. The change in value over the hedge period is δS − hδX. The variance of this change is:

\[ v = \sigma_S^2 + h^2\sigma_X^2 - 2\rho\sigma_S\sigma_X h. \]

We wish to minimise v (since it represents risk). So we apply the first derivative test. Solving

\[ \frac{dv}{dh} = 2h\sigma_X^2 - 2\rho\sigma_S\sigma_X = 0 \]

we obtain

\[ h = \rho\,\frac{\sigma_S}{\sigma_X}. \qquad (4.9) \]

Since \(d^2v/dh^2 = 2\sigma_X^2 > 0\), this critical point is indeed a minimum. This value of h is called the optimal hedge ratio.
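Formula (4.9) can be applied once ρ, \(\sigma_S\) and \(\sigma_X\) are estimated. Below is a minimal Python sketch. It assumes paired historical changes δS and δX over hedge periods of equal length; the numbers below are made up. The sketch uses the identity \(\rho\sigma_S/\sigma_X = \mathrm{Cov}(\delta S, \delta X)/\mathrm{Var}(\delta X)\), so no separate correlation estimate is needed. The function names are our own.

```python
def optimal_hedge_ratio(spot_changes, futures_changes):
    """Estimate h = rho * sigma_S / sigma_X, equation (4.9), from paired samples.

    spot_changes, futures_changes: historical changes dS and dX over
    hedge periods of the same length (hypothetical data below).
    rho * sigma_S / sigma_X equals Cov(dS, dX) / Var(dX), which is what
    is computed here; the sample (n - 1) normalisation cancels in the ratio.
    """
    n = len(spot_changes)
    if n != len(futures_changes) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_s = sum(spot_changes) / n
    mean_x = sum(futures_changes) / n
    cov = sum((s - mean_s) * (x - mean_x)
              for s, x in zip(spot_changes, futures_changes)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in futures_changes) / (n - 1)
    return cov / var_x


def hedged_variance(h, sigma_s, sigma_x, rho):
    """v = sigma_S^2 + h^2 sigma_X^2 - 2 rho sigma_S sigma_X h."""
    return sigma_s**2 + h**2 * sigma_x**2 - 2 * rho * sigma_s * sigma_x * h


dX = [1.0, -2.0, 0.5, 3.0, -1.5]
dS = [0.8 * x for x in dX]            # perfectly correlated toy data
print(optimal_hedge_ratio(dS, dX))     # 0.8, up to rounding

# With assumed parameters, the variance at h* is smaller than at nearby ratios.
sigma_s, sigma_x, rho = 2.0, 2.5, 0.9
h_star = rho * sigma_s / sigma_x
for h in (h_star - 0.1, h_star, h_star + 0.1):
    print(round(h, 2), round(hedged_variance(h, sigma_s, sigma_x, rho), 4))
```

At the optimum, the residual variance equals \(\sigma_S^2(1-\rho^2)\). With real data, the estimate is only as reliable as the history it is based on.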
The parameters \(\rho\), \(\sigma_S\) and \(\sigma_X\) have to be estimated from historical data.

Exercise 4.5.4 Suppose the No Arbitrage Principle holds and a short hedge is carried out using a futures with the same underlying asset as the asset being hedged. Show that the optimal hedge ratio equals 1.

4.6 CURRENCY FUTURES

Futures can be based on any quantity that changes with time. In this section we will study currency futures, which are based on the exchange rate between two currencies. Such futures are used to manage the risk arising out of fluctuating exchange rates. Currency futures were created in 1972 by the Chicago Mercantile Exchange (CME). In India, both BSE and NSE introduced trading in currency futures in late 2008.

In a currency futures, one currency plays the role of the underlying asset while the other is used in pricing. For example, Indian currency futures have the US dollar as the underlying or base currency, while the Indian rupee is the variable currency used in the pricing. A sample contract could have $1000 as the underlying asset, with an exercise price of Rs 45,000 and expiry in one month. This contract effectively fixes an exchange rate of Rs 45 per dollar for the future trade. The holder of this contract is obliged to pay its writer Rs 45,000 on expiry and will then receive $1000 in return. Alternatively, the holder may receive the rupee equivalent of $1000 based on the market exchange rate at the time of expiry. Suppose the actual exchange rate after a month is Rs 46 per dollar. Then the holder receives Rs 46,000 from the writer and makes a net profit of Rs 1,000.

How should currency futures be priced? We start by noting that the underlying currency must be treated as an asset that generates income, since it can be deposited in its native country and earn interest. This income is not known in terms of the variable currency, since the exchange rate fluctuates. Fortunately, it can be treated as a case of known dividend yield.
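Treating the base currency’s interest rate \(r_b\) as a continuous dividend yield leads to the pricing rule \(X = S e^{(r_v - r_b)T}\), equation (4.10), derived next. The following is a minimal Python sketch of that rule. It assumes continuously compounded rates and uses the figures of Examples 4.6.1 and 4.6.3 in this section. The function name is hypothetical. Rounding the number of contracts down reflects the $1000 contract size mentioned in Example 4.6.3.

```python
import math


def currency_futures_rate(spot, r_variable, r_base, T):
    """Exchange rate fixed by a currency futures, X = S * exp((r_v - r_b) * T).

    spot: units of variable currency per unit of base currency today;
    r_variable, r_base: continuously compounded risk-free rates;
    T: time to expiry in years.
    """
    return spot * math.exp((r_variable - r_base) * T)


# Data of Example 4.6.1: Rs 47.71 per dollar, 6% (rupee), 3% (dollar), one month.
X = currency_futures_rate(47.71, 0.06, 0.03, 1 / 12)
print(round(X, 2))   # 47.83

# Example 4.6.3: $10 million converted to rupees, invested for a month at 6%,
# then converted back through futures on $1000 each.
rupees = 10e6 * 47.71 * math.exp(0.06 / 12)
contracts = math.floor(rupees / (1000 * X))
print(contracts)     # 10025
```

The hedged dollar amount works out to about \(10^4 e^{0.03/12}\) thousand dollars. This is exactly the dollar risk-free growth, which matches the conclusion of Example 4.6.3.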
Suppose a currency futures is written for one unit of the base currency, with expiry at time \(T\) from now. Let one unit of the base currency be worth \(S\) units of the variable currency at present. Further, let \(r_b\) be the (continuously compounded) risk-free rate applicable to the base currency, and let \(r_v\) be the corresponding rate for the variable currency. Then over the time \(T\) one unit of the base currency will become \(e^{r_b T}\) units, and so \(r_b\) plays the role of the dividend yield. Therefore, as shown by equation (4.8), the correct exercise price for the contract is
\[X = S e^{(r_v - r_b)T}. \qquad (4.10)\]
Thus the current exchange rate \(S\) and the interest rates for the two currencies determine the exchange rate \(X\) to be used in the futures.

Example 4.6.1 Consider a futures with US dollars as the base currency and Indian rupees as the variable currency. The contract is written on November 1 and expires on November 30. On November 1, we have the following information:

Exchange rate (rupees per dollar) = 47.71
US 1-month spot rate = 3%
Indian 1-month spot rate = 6%

Therefore, the contract will be based on the following rate:
\[47.71\, e^{(0.06 - 0.03)/12} = 47.83.\]
For example, a contract with $1000 as underlying would have exercise price Rs 47,830. □

Exercise 4.6.2 Suppose the spot rates are given with discrete compounding, there are \(m\) compounding periods per year, and the life of the currency futures is one compounding period. Show that equation (4.10) will be modified to
\[X = S\,\frac{1 + r_v/m}{1 + r_b/m}.\]

The relatively high interest rates in the Indian market make it attractive to foreign investors. They would, however, be concerned about whether high earnings in rupees would really translate into high earnings in their native currencies; the risk comes from a potential fall in the value of the rupee. Currency futures provide a way around this by enabling them to lock in the future exchange rate.

Example 4.6.3 A US-based company, let us call it XC, invests $10 million in Indian risk-free assets on November 1.
The prevailing interest and exchange rates are as described in Example 4.6.1. So, the first thing that happens is that the dollars are converted into Rs \(10 \times 10^6 \times 47.71 = 477.1\) million. Over a month, these will become Rs \(477.1 \times 10^6 \times e^{0.06/12} = 479.5\) million. If the exchange rate stays stable, this can be reconverted into \(\$479.5 \times 10^6/47.71 = \$10.05\) million. Of course, this is the same as earning 6% annually in dollars.

Now, suppose the exchange rate fluctuates and after a month we have Rs 50 per dollar. Then the final value in dollars is only \(\$479.5 \times 10^6/50 = \$9.59\) million: a loss!

XC can avoid the risk of loss by buying currency futures expiring in a month. Recall that these fix the final rate at Rs 47.83 per dollar. Assume each contract has to be in multiples of $1000 (this is the rule followed by BSE). So XC can invest in
\[\frac{479.5 \times 10^6}{47.83 \times 1000} \approx 10{,}025\]
contracts. Then XC will finally end up with $10.025 million, an annualised growth rate of 3%. (This was expected: if you remove all risk, you have to end up with the risk-free rate, and for dollars the risk-free rate was given to be 3%.) □

In this example, XC was able to hedge completely against exchange rate risk, but then it lost out on the hoped-for gains from the higher interest rate for rupees. To keep a chance of those gains, it has to let some element of risk remain. One possibility would be to carry out only a partial hedge with a smaller number of futures. Another approach could be to invest in high-performing but risky Indian assets while continuing to hedge against exchange rate risk.

4.7 STOCK INDEX FUTURES

A stock index futures has a stock index as the underlying asset. Since stock indices track general trends, these futures are used to hedge against systemic risk (risk due to movements of the market as a whole, rather than of individual stocks).
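A stock index futures is priced like a futures on an asset with continuous dividend yield \(q\), once the index has been scaled by its multiplier \(m\). The following minimal Python sketch reproduces the numbers in Examples 4.7.1, 4.7.3 and 4.7.4 of this section. The function names are hypothetical. The hedge sizes use \(N = (\beta - \beta')P_0/I_0\), where \(I_0\) already includes the multiplier and \(\beta' = 0\) gives the full hedge. As in Example 4.7.3, futures prices are rounded to whole rupees before the mark-to-market difference is taken.

```python
import math


def index_futures_price(index_value, multiplier, r, T, q=0.0):
    """Futures price X = m * I * exp((r - q) * T), with T the time to expiry;
    q = 0 for an index that ignores dividends."""
    return multiplier * index_value * math.exp((r - q) * T)


def contracts_to_short(beta, portfolio_value, index_value, multiplier, target_beta=0.0):
    """N = (beta - target_beta) * P0 / I0, where I0 = m * index value.
    With target_beta = 0 this is the full hedge N = beta * P0 / I0."""
    return (beta - target_beta) * portfolio_value / (multiplier * index_value)


# Example 4.7.1: Sensex at 15,000, m = 15, r = 5%, expiry in 3 months.
X = round(index_futures_price(15_000, 15, 0.05, 0.25))
print(X, 15 * 16_000 - X)          # 227830 12170

# Example 4.7.3: P0 = 100 million, beta = 1.5, index 17,000, m = 100, q = 3%.
print(round(contracts_to_short(1.5, 100e6, 17_000, 100), 2))      # 88.24
X0 = round(index_futures_price(17_000, 100, 0.06, 0.5, q=0.03))
X1 = round(index_futures_price(15_000, 100, 0.06, 0.25, q=0.03))  # 3 months on
print(X0, X1, 88 * (X0 - X1))      # 1725692 1511292 18867200

# Example 4.7.4: reduce beta from 1.5 to 0.5.
print(round(contracts_to_short(1.5, 100e6, 17_000, 100, target_beta=0.5), 2))  # 58.82
```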
A stock index futures works just like a futures with a stock as underlying asset, except that at the expiry date a multiple of the value of the stock index is delivered rather than the index itself (since it would be impractical to deliver a portfolio of stocks mirroring the index).

Example 4.7.1 On the Bombay Stock Exchange (BSE), futures on the BSE Sensex index have a multiplier of 15. This means that the Sensex is treated as an asset whose value in rupees is 15 times the index value. Thus, suppose the index value on a particular date is 15,000 and a futures is being written on it with expiry in 3 months. Assume a continuously compounded annual interest rate of 5%. Then the exercise price of the futures will be calculated by taking the current spot price to be Rs \(15 \times 15{,}000 = 225{,}000\):
\[X = 225{,}000\, e^{0.05/4} = 227{,}830.\]
Suppose that after 3 months the index stands at 16,000. Then its value is taken as Rs \(15 \times 16{,}000 = 240{,}000\). At this point, the holder owes Rs 227,830 to the writer, and the writer owes Rs 240,000 to the holder. Naturally, matters can be settled via a single payment of Rs \(240{,}000 - 227{,}830 = 12{,}170\) from the writer to the holder. □

The multiplier \(m\) by which the index is multiplied to get a currency value varies with the index as well as the exchange. We have already noted that for the Sensex we have \(m\) = Rs 15 on the BSE (we would say that Sensex futures are on ‘Rs 15 times the index’). Some other examples are:

1. For S&P 500 futures on the Chicago Mercantile Exchange, \(m\) = $500.
2. For S&P 500 futures on the London International Financial Futures Exchange (LIFFE), \(m\) = £250.
3. For Nikkei 225 futures on LIFFE, \(m\) = £5.

In pricing or valuing an index futures, we have to consider whether any income should be associated with the index. The answer depends on the particular index and how it is calculated. Some stock indices ignore the dividend payments from their members; these should be treated as being without income.
Others take the dividends into account; these should be treated as having a continuous dividend yield, since the number of dividend payments is large. We will develop our formulas for the first case. They are easily modified for the second case by replacing the continuous risk-free rate \(r\) by \(r - q\), where \(q\) is the dividend yield for the index.

If we have a portfolio, we should track how it changes relative to an index to determine how much hedging is needed. For example, if our portfolio moves just as much as the index, we need less hedging than if it moves twice as much. In fact, to hedge a portfolio by a stock index futures, we can use the optimal hedge ratio as calculated earlier (equation (4.9)):
\[h = \rho_{PX}\,\frac{\sigma_P}{\sigma_X},\]
where \(\sigma_P\) is the standard deviation of the changes in the portfolio value, \(\sigma_X\) is the standard deviation of the changes in the exercise price of the stock index futures, and \(\rho_{PX}\) is the correlation coefficient of the two changes.

We can rephrase this in terms of the index itself. Let \(I_t\) stand for the value of the index at time \(t\) (after multiplying by the multiplier \(m\)), and \(\delta I\) for the change in the value of the index over the time interval \([0,T]\). We assume index futures are available expiring at \(T\). Then, since \(X_0 = I_0 e^{rT}\) and \(X_T = I_T\),
\[\delta X = X_T - X_0 = I_T - I_0 e^{rT} = \delta I - I_0\,(e^{rT} - 1),\]
so \(\delta X\) and \(\delta I\) differ by a constant. Therefore, \(\delta I\) and \(\delta X\) have the same standard deviation: \(\sigma_I = \sigma_X\). This calculation also shows that \(\delta I\) and \(\delta X\) have the same correlation with \(\delta P\): \(\rho_{PI} = \rho_{PX}\). Hence,
\[h = \rho_{PI}\,\frac{\sigma_P}{\sigma_I}.\]
At this point, \(h\) starts to resemble the \(\beta\) in CAPM, with the stock index standing in for the market portfolio, but based on return rather than rate of return. If we adjust accordingly, we find:
\[h = \beta\,\frac{P_0}{I_0}, \qquad (4.11)\]
where \(P_0\) is the initial value of the portfolio.

Exercise 4.7.2 Verify the relationship between \(h\) and \(\beta\).

Example 4.7.3 Consider a portfolio with an initial value of 100 million. It is hedged using 6-month futures on an index whose current value is 17,000 and whose dividend yield is 3%. Each futures is on 100 times the index. The portfolio has \(\beta = 1.5\) relative to this index. The risk-free rate is 6%.
Based on this information, the optimal number of futures to be shorted is
\[N = \beta\,\frac{P_0}{I_0} = 1.5 \times \frac{100{,}000{,}000}{100 \times 17{,}000} = 88.24.\]
Therefore, the portfolio managers short 88 index futures. The exercise price of each of these futures is given by
\[X = I_0\, e^{(r - q)T} = 100 \times 17{,}000 \times e^{(0.06 - 0.03)/2} = 1{,}725{,}692.\]
Suppose that over the next 3 months there is a fall in the market, and the index falls to 15,000. Then the futures price becomes
\[X' = 100 \times 15{,}000 \times e^{(0.06 - 0.03)/4} = 1{,}511{,}292,\]
and marking to market will compensate the portfolio managers to the tune of
\[88 \times (1{,}725{,}692 - 1{,}511{,}292) = 18{,}867{,}200.\]
This will act as immediate cash compensation for the hit the portfolio must also have taken from the fall in the market, and can be reinvested in more stable assets. □

USE OF STOCK INDEX FUTURES TO ADJUST BETA

Suppose a portfolio \(P\) is hedged by shorting \(N\) stock index futures which are written at time \(t = 0\) and expire at \(t = T\). Let us investigate what effect this has on the beta of the portfolio over the time interval \([0,T]\). We shall use the following notation: \(P_0\) and \(P_T\) are the values of \(P\) at \(t = 0\) and \(t = T\), \(r = (P_T - P_0)/P_0\) is its return over \([0,T]\), and \(r_I = (I_T - I_0)/I_0\) is the return on the index (with \(I_t\) including the multiplier, as before).

Shorting \(N\) index futures creates a new portfolio \(P'\) consisting of \(P\) and the shorted futures. For this portfolio we write \(P'_0\), \(P'_T\) and \(r'\) for its initial value, final value and return. The value (to the writer) at \(t = T\) of an index futures written at \(t = 0\) is \(X - I_T\), while at \(t = 0\) it is zero. Hence,
\[P'_0 = P_0, \qquad P'_T = P_T + N(X - I_T).\]
Therefore, the return on the new portfolio is
\[r' = \frac{P'_T - P'_0}{P'_0} = \frac{P_T - P_0 + N(X - I_T)}{P_0} = r + \frac{N}{P_0}(X - I_T).\]
Let us denote the beta of the original portfolio \(P\) by \(\beta\). Since \(X\) is fixed and \(I_T = I_0(1 + r_I)\), the beta of the new portfolio is
\[\beta' = \frac{\operatorname{Cov}(r', r_I)}{\operatorname{Var}(r_I)} = \beta - \frac{N I_0}{P_0}.\]
If we wish to change \(\beta\) to a particular value \(\beta'\), we can solve the above equation for the number of contracts we should short:
\[N = (\beta - \beta')\,\frac{P_0}{I_0}. \qquad (4.12)\]
Note that the optimal number \(N\) calculated in the previous section reduces \(\beta\) to zero. Hence it eliminates systemic risk.

Example 4.7.4 Consider the portfolio in Example 4.7.3. Suppose that at the start of the hedging period its managers wish to change its beta to \(\beta' = 0.5\).
Then the number of futures to be shorted is (using equation (4.12)):
\[N = (1.5 - 0.5) \times \frac{100{,}000{,}000}{100 \times 17{,}000} = 58.82.\]
So the managers actually short 59 futures, and this adjusts the beta to
\[\beta' = 1.5 - \frac{59 \times 1{,}700{,}000}{100{,}000{,}000} = 0.497.\]
□

5 Stock Price Models

One of the remarkable features of our treatment of futures in the previous chapter was that we only used the current price of the underlying asset. Expectations of its future behaviour played no role, and so we did not need to model the future fluctuations in the asset price. As we move to more complicated derivatives, this is no longer the case. The No Arbitrage Principle has to be combined with some model of price fluctuations in order to study the pricing and behaviour of the derivative. This chapter, therefore, is devoted to describing the basic models for the evolution of stock prices.

The first mathematical treatment of continuously fluctuating prices was carried out by Louis Bachelier (1870–1946) in Paris. In his doctoral thesis titled Théorie de la Spéculation [4]¹², published in 1900, he treated bond and stock prices as continuous, but random, functions of time and developed what is now called additive Brownian motion. He applied his theory to estimate prices for options and to evaluate the risk involved in various investment strategies. His work was not immediately appreciated, but over time it has influenced important innovators in both mathematics and economics. He continued to research Brownian motion throughout his life, with his last major contribution coming in 1941. Among mathematicians who have acknowledged his influence, we may mention A. N. Kolmogorov and Kiyosi Ito. Kolmogorov developed Bachelier’s ideas in creating the general theory of Markov processes in the 1930s, while Ito was led to his famous results in stochastic calculus.
Curiously, while these further developments were not motivated by finance, they have turned out to be perfectly suited for it, and terms such as martingales, stochastic differential equations and Ito’s lemma are now part of the basic vocabulary of financial analysts. Bachelier was noticed by economists in the 1950s. In particular, his thesis was read by Paul Samuelson, who improved upon his work and suggested the Lognormal model (or geometric Brownian motion) for stock prices. Bachelier’s treatment of options in many ways also anticipates the work of Fischer Black, Myron Scholes and Robert Merton in the 1970s. In recognition of his work, Bachelier is now often called the father of mathematical finance.

This chapter makes extensive use of the Binomial, Normal and Lognormal random variables. You should revise the relevant sections of the Appendix (§B.4 to §B.7) before reading it.

5.1 LOGNORMAL MODEL

We shall develop a way to model the evolution of stock prices over a time period \([0,T]\), from an initial value \(S\) to a final value \(S_T\). The model is probabilistic: it treats \(S_T\) as a random variable and gives its probability distribution. If the price change has no randomness, we have a risk-free situation, and then the price must grow at the risk-free rate: \(S_T = S e^{rT}\). To introduce randomness into the rate of return, we consider the following model:
\[S_T = S e^{\mu T + c_T Z},\]
where \(Z\) is a standard normal variable, and \(\mu\) and \(c_T\) are some constants (parameters of the stock). The idea is that \(\mu\) represents a steady trend, to which the \(c_T Z\) term adds random fluctuations. We have included a dependence on \(T\) since the variability measured by \(c_T\) must depend on \(T\); intuitively, we expect it to grow with \(T\).

Exercise 5.1.1 Show that if \(Z\) is a standard normal variable, then \(E[e^{cZ}] = e^{c^2/2}\).

Therefore, in our model the random terms cause a steady increase in the expected return, since \(E[e^{c_T Z}] = e^{c_T^2/2} > 1\). We wish to have a model where the random terms do not contribute any regular growth.
Hence, we adjust it as follows:
\[S_T = S\, e^{\mu T} e^{c_T Z - c_T^2/2}.\]
To explore the dependence on \(T\), let us track the changes in the stock price over successive intervals, \([0,T]\) and \([T,2T]\). We let \(Z_1\) and \(Z_2\) be standard normal variables representing the random fluctuations in these two intervals. On applying our model to these intervals in succession, we find:
\[S_{2T} = S_T\, e^{\mu T + c_T Z_2 - c_T^2/2} = S\, e^{\mu(2T) + c_T(Z_1 + Z_2) - c_T^2}.\]
Now we make our main assumption: the random fluctuations over non-overlapping time intervals are given by independent random variables. (We say two intervals overlap if they have more than one point in common. Thus, \([0,1]\) overlaps with \([0.5,2]\) but not with \([2,3]\) or \([1,2]\).) Then \(Z_1 + Z_2\) is again a normal variable, with mean 0 and variance 2 (see §B.11 and specifically, Exercise B.11.4), so we may write \(Z_1 + Z_2 = \sqrt{2}\,Z\) with \(Z\) standard normal. Therefore,
\[S_{2T} = S\, e^{\mu(2T) + \sqrt{2}\,c_T Z - c_T^2}.\]
On the other hand, if we treat \([0,2T]\) as a single interval, we get
\[S_{2T} = S\, e^{\mu(2T) + c_{2T} Z - c_{2T}^2/2}.\]
To make our answers match, we must have \(c_{2T} = \sqrt{2}\,c_T\). This is achieved by setting \(c_T^2 = \sigma^2 T\), where \(\sigma^2\) is a positive constant. Thus, our final model gives the following expression for the spot price at \(T\):
\[S_T = S_0\, e^{(\mu - \sigma^2/2)T + \sigma W_T}, \qquad (5.1)\]
where \(W_T\) is normal with mean 0 and variance \(T\). The parameters \(\mu\) and \(\sigma\) are said to represent drift and volatility, respectively. The model is called Lognormal because \(S_T/S_0\) has a lognormal distribution, that is, \(\ln(S_T/S_0)\) is normal.

Exercise 5.1.2 An analyst estimates XC stock as having an annual drift \(\mu = 0.2\) and volatility \(\sigma = 0.3\). What is the probability of a 20% return over the next 6 months?

Exercise 5.1.3 Assuming the Lognormal model with parameters \(\mu\) and \(\sigma\), show that
\[\text{Expected return after } \Delta t = S(e^{\mu\Delta t} - 1),\]
\[\text{Variance of the return after } \Delta t = S^2 e^{2\mu\Delta t}(e^{\sigma^2\Delta t} - 1).\]
Therefore, to first order (see Appendix, page 197), we have
\[\text{Expected return after } \Delta t \doteq S\mu\Delta t, \qquad \text{Variance of the return after } \Delta t \doteq S^2\sigma^2\Delta t.\]

Exercise 5.1.4 Verify the first order approximations given above.

Remark.
For simplicity, we sometimes use the first order approximation to the Lognormal model:
\[S_T \doteq S\left(1 + (\mu - \sigma^2/2)T + \sigma W_T\right).\]

Figure 5.1 shows examples of data evolving according to the Lognormal model with varying values of \(\mu\) and \(\sigma\). On comparing graphs (a) and (b), we see that an increase in \(\sigma\) creates paths with larger jumps, leading to faster variation in values. A comparison of (b) and (c), on the other hand, shows that a change in \(\mu\) creates paths with the same speed of variation. The difference is that paths created when \(\mu\) is higher will, on average, show a greater long-term upward trend. In other words, when we increase \(\mu\), the possible paths are the same but upward-tending ones become more likely. The practical implication is that when we observe just one path coming out of a geometric Brownian motion (for example, the actually observed prices of a particular stock), we can hope to use it to estimate \(\sigma\), since \(\sigma\) affects the nature of individual paths. But we cannot hope to get a good estimate of \(\mu\). We say that \(\sigma\) is observable, but \(\mu\) is not (see Example 5.2.2).

5.2 GEOMETRIC BROWNIAN MOTION

The Lognormal model can be rephrased to describe the evolution of stock prices over any time interval \([a,b]\) as follows:
\[S_b = S_a\, e^{(\mu - \sigma^2/2)(b-a) + \sigma W_{[a,b]}},\]
where \(W_{[a,b]}\) is normal with mean 0 and variance \(b - a\). It is worth identifying certain features of this model:

1. \(\ln(S_b/S_a)\) is normal with mean \((\mu - \sigma^2/2)(b - a)\) and variance \(\sigma^2(b - a)\).
2. The distribution of \(\ln(S_b/S_a)\) depends only on \(b - a\), and not on \(a\) itself.

This kind of behaviour is known as Geometric Brownian Motion or GBM.

Exercise 5.2.1 Let \(X_t\) be the futures price for a contract written at time \(t\) and expiring at \(T\). If the spot price of the underlying asset follows GBM with parameters \(\mu\) and \(\sigma\), show that \(X_t\) follows GBM with parameters \(\mu - r\) and \(\sigma\).

ESTIMATING GBM PARAMETERS

Suppose we have regular samples \(S_i\) \((i = 0, 1, \ldots, n)\) of spot prices gathered over consecutive time intervals of length \(\Delta t\) each.
Then we first form the new sequence
\[U_i = \ln\frac{S_i}{S_{i-1}}, \qquad i = 1, \ldots, n.\]

Figure 5.1: Each graph shows five simulations of the Lognormal model with varying values of drift \(\mu\) and volatility \(\sigma\): (a) \(\mu = 0.1\) and \(\sigma = 0.2\), (b) \(\mu = 0.1\) and \(\sigma = 0.1\), (c) \(\mu = 0\) and \(\sigma = 0.1\).

The \(U_i\) are called log returns, since they can also be written as \(\ln S_i - \ln S_{i-1}\), i.e., as the changes in the logarithms of the prices. According to GBM,
\[U_i = (\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t},\]
where \(W_{\Delta t}\) is normal with mean 0 and variance \(\Delta t\). Therefore, \(U_i\) is normal with mean \((\mu - \sigma^2/2)\Delta t\) and standard deviation \(\sigma\sqrt{\Delta t}\). Moreover, the \(U_i\) are independent, so that \(U_1, \ldots, U_n\) can be seen as a random sample of size \(n\) of a normal variable with mean \((\mu - \sigma^2/2)\Delta t\) and variance \(\sigma^2\Delta t\).

Let \(\bar U\) and \(S^2\) denote the sample mean and variance, respectively, of the observations of the \(U_i\). If the actually observed values of the spot prices are \(s_i\), then the observed values of the log returns are \(u_i = \ln(s_i/s_{i-1})\), and the observed sample mean and variance are
\[\bar u = \frac{1}{n}\sum_{i=1}^{n} u_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (u_i - \bar u)^2.\]
Then we obtain estimates of \(\mu\) and \(\sigma\) by solving the approximate equalities
\[(\mu - \sigma^2/2)\Delta t \approx \bar u, \qquad \sigma^2\Delta t \approx s^2.\]
This gives:
\[\mu \approx \frac{\bar u}{\Delta t} + \frac{s^2}{2\Delta t}, \qquad \sigma \approx \frac{s}{\sqrt{\Delta t}}.\]

Figure 5.2: The first chart presents the daily closing prices of the BSE Sensex index over 8 years, from July 1997 to July 2005. On fitting the Lognormal model to this data, we obtain \(\mu = 0.06\) and \(\sigma = 0.29\). The second chart is a simulation of the Lognormal model with the same parameter values.

We had noted earlier that \(\sigma\) is observable, but \(\mu\) is not. This indicates that, of these two estimates, only the one for \(\sigma\) should be reliable. Consider the following example.

Example 5.2.2 Let us consider a GBM with \(\mu = 0.2\) and \(\sigma = 0.3\). (These are quite typical values for stocks.) Suppose we first simulate a path for this GBM and then apply the above estimators to this path. We hope to recover the original \(\mu\) and \(\sigma\), not perfectly but reasonably well. Table 5.1 shows the result of doing this 10 times, with paths of 520 steps each (corresponding to taking weekly prices over 10 years).
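The experiment of Example 5.2.2 can be repeated with a short simulation. The following is a minimal sketch using only Python’s standard library. The function names, the seed, the starting price of 100 and the use of the \(n-1\) sample variance are our own choices. The printed numbers will therefore differ from those in Table 5.1, but they should show the same pattern: scattered estimates of \(\mu\) and tightly clustered estimates of \(\sigma\).

```python
import math
import random


def simulate_gbm(S0, mu, sigma, dt, n, rng):
    """Return prices S_0, ..., S_n sampled at spacing dt from model (5.1)."""
    prices = [S0]
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(dt))      # W_dt: mean 0, variance dt
        step = (mu - sigma**2 / 2) * dt + sigma * w
        prices.append(prices[-1] * math.exp(step))
    return prices


def estimate_gbm(prices, dt):
    """Estimate (mu, sigma) from the log returns u_i = ln(s_i / s_{i-1})."""
    u = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(u)
    u_bar = sum(u) / n
    s2 = sum((x - u_bar) ** 2 for x in u) / (n - 1)   # sample variance
    sigma_hat = math.sqrt(s2 / dt)                      # from sigma^2 dt ~ s^2
    mu_hat = u_bar / dt + s2 / (2 * dt)                 # from (mu - sigma^2/2) dt ~ u_bar
    return mu_hat, sigma_hat


# Setup of Example 5.2.2: mu = 0.2, sigma = 0.3, weekly steps over 10 years.
rng = random.Random(2011)   # arbitrary seed, for repeatability only
dt = 1 / 52
for trial in range(1, 11):
    path = simulate_gbm(100.0, 0.2, 0.3, dt, 520, rng)
    mu_hat, sigma_hat = estimate_gbm(path, dt)
    print(f"trial {trial:2d}: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

Lengthening the path shrinks the spread of the \(\sigma\) estimates. The \(\mu\) estimates improve only when the total time span grows, not when sampling simply becomes more frequent.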
Table 5.1

Notice how the \(\mu\) estimates jump all over the place, from 0.028 to 0.305, but those for \(\sigma\) are stable and accurate. □

Data from stock exchanges has to be cleaned up before it can be used in these calculations, for the following reasons:

1. Since an exchange is closed on weekends and certain holidays, the data is not gathered at entirely regular intervals. If we wish to analyse daily data, we have to remove those \(U_i\) which correspond to a gap of more than 1 day.
2. Some changes in the stock price are not to be taken literally. For example, on July 1, 2004, Infosys shares fell by 76% on the NSE. This was caused simply by Infosys issuing 3 bonus shares for every share that already existed. The number of shares went up by a factor of 4, and so the price fell to about a quarter.
3. Another reason for a sudden drop in price can be a dividend payment. In the days before its anticipated announcement, prices rise because buyers expect an imminent extra profit. As soon as the payment is made, prices fall accordingly.

To give a true picture of changes in the value of a stock, exchanges release an adjusted price which takes bonus issues and dividend payments into account. It is this adjusted price which should follow GBM.

5.3 SUITABILITY OF GBM FOR STOCK PRICES

It is evident from its definition that the evolution of a GBM during some time span \([a,b]\) is independent of the occurrences before \(a\) (all that matters is the price \(S_a\) at \(a\)). It is not obvious that stock prices have this property, and some caution in applying GBM is indicated. Apart from the independence of jumps, the other feature of GBM is the use of the normal distribution to represent the individual jumps.

Example 5.3.1 In Figure 5.3 we plot a histogram for the frequency distribution of the daily log returns \(U_i = \ln(S_i/S_{i-1})\) for Infosys shares over 2001–2003. We have superimposed the frequency distribution of the normal distribution with the same mean and variance as this data.
The histogram is more peaked in the centre than the normal distribution, and does not die out as quickly on either side. There are some highly negative values which would be extremely unlikely if a normal distribution were indeed being followed. In fact, the probability of the corresponding normal distribution taking values less than −0.2 is only 0.000002, while in reality there were 3 such values (giving a relative frequency of about 0.005).

Figure 5.3: The histogram shows the relative frequency distribution of the daily log returns for Infosys stock from January 2001 to December 2003. This is compared with a normal distribution (solid curve) and a Cauchy distribution (dashed curve). See Example 5.3.1.

Therefore, one can consider replacing the normal distribution with one whose tails die out more slowly. Figure 5.3 also shows a Cauchy distribution with the same median and interquartile range as the given data, which is visibly a much better fit than the normal distribution. □

This example turns out to be quite typical. In general, large price fluctuations are more common than a normal distribution would suggest. A heavy-tailed distribution, such as the Cauchy distribution, would be more suitable. Unfortunately, heavy-tailed distributions are mathematically difficult to work with; for instance, their mean and variance may not exist (the Cauchy distribution has neither). (See §B.18 for more information on such distributions.) And even the Cauchy distribution fails in one respect: it is symmetric, while the data is typically slightly asymmetric, with large falls occurring more often than large increases.

A final problem with GBM is the assumption of a constant volatility. In reality, there are alternating periods of calm and turbulence, as illustrated by Figure 5.4. There are models that take this into account by letting the volatility change randomly with time, but with clustering of high and low values. Popular models of this type are the ARCH and GARCH families.
(ARCH was created by Robert Engle [18, 19] in 1982; he received the 2003 Nobel Prize in Economics.)

Figure 5.4: Twenty years of monthly log returns of the Dow Jones Industrial Average, showing extended periods of low or high volatility.

We shall, therefore, use GBM in our work with the understanding that it does not give the closest approximation to reality, but at least it gives one that we can easily manipulate to get useful insights. Despite its limitations, GBM remains a favourite model, and we shall soon apply it to obtain the Black–Scholes model for options pricing.

5.4 BINOMIAL TREE MODEL

The Binomial tree model simulates stock price movements by conceiving them as a sequence of small up or down jumps. The basic building block of this model is the following branch:

\(S \to SU\) with probability \(p\), or \(S \to SD\) with probability \(1 - p\).

Here, we have started with a spot price of \(S\) and assumed that over some small time interval \(\Delta t\) it can either move up by a factor \(U\) to \(SU\), or down by a factor \(D\) to \(SD\). Further, we let \(p\) denote the probability of an up move, and \(1 - p\) that of a down move.

The price movements over some time interval \([0,T]\) are modelled as a sequence of such steps. We break it into \(n\) equal intervals of length \(\Delta t = T/n\) each, and let the price follow one of the up or down branches over each subinterval. We assume each step is independent of the previous ones, and that the parameters \(p\), \(U\) and \(D\) are the same for each branch (see Figure 5.5). The possible prices at \(T\) are \(SU^kD^{n-k}\), with \(k = 0, 1, \ldots, n\). They follow a binomial distribution:
\[P[S_T = SU^kD^{n-k}] = \binom{n}{k} p^k (1-p)^{n-k}.\]

To estimate the parameters of the binomial tree model, we match it with geometric Brownian motion. Recall that according to that model, the spot price at time \(\Delta t\) is given by
\[S_{\Delta t} = S_0\, e^{(\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t}},\]
where \(W_{\Delta t}\) is normal with mean 0 and variance \(\Delta t\). Thus, if we set the starting price \(S_0\) to 1,
\[\ln S_{\Delta t} = (\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t}.\]
Then we have
\[E[\ln S_{\Delta t}] = \nu\Delta t, \qquad \operatorname{Var}[\ln S_{\Delta t}] = \sigma^2\Delta t,\]
where we have set \(\nu = \mu - \sigma^2/2\).
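The binomial distribution of \(S_T\) stated above can be tabulated directly for any choice of \(U\), \(D\) and \(p\). The following is a minimal sketch. The function name and the illustrative values \(U = 1.05\), \(D = 1/1.05\), \(p = 0.55\) and \(n = 4\) are hypothetical and not taken from the text. It relies on Python’s `math.comb` (Python 3.8 or later) for the binomial coefficient.

```python
from math import comb


def binomial_price_distribution(S, U, D, p, n):
    """List of (price, probability) for S_T = S * U**k * D**(n - k), k = 0, ..., n,
    with probability comb(n, k) * p**k * (1 - p)**(n - k)."""
    return [
        (S * U**k * D ** (n - k), comb(n, k) * p**k * (1 - p) ** (n - k))
        for k in range(n + 1)
    ]


# Hypothetical illustration: 4 steps with U = 1.05, D = 1/1.05, p = 0.55.
for price, prob in binomial_price_distribution(100.0, 1.05, 1 / 1.05, 0.55, 4):
    print(f"{price:8.2f}  {prob:.4f}")
```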
Now consider a one-step binomial tree for stock prices, where the step is over the time \(\Delta t\). The up move is taken with probability \(p\), and the down move with probability \(1 - p\). Let us also write \(u = \ln U\) and \(d = \ln D\). Then (see Exercise 5.4.1),
\[E[\ln S_{\Delta t}] = pu + (1-p)d, \qquad \operatorname{Var}[\ln S_{\Delta t}] = p(1-p)(u-d)^2.\]

Figure 5.5: The binomial tree model for stock prices.

Exercise 5.4.1 Consider a random variable \(X\) taking values \(a\) and \(b\) with probabilities \(p\) and \(1 - p\) respectively. Show that \(E[X] = pa + (1-p)b\) and \(\operatorname{Var}[X] = p(1-p)(a-b)^2\).

On matching the calculations from the two models, we get
\[pu + (1-p)d = \nu\Delta t, \qquad (5.4)\]
\[p(1-p)(u-d)^2 = \sigma^2\Delta t. \qquad (5.5)\]
Since we have two equations for three unknowns, we have some freedom to choose convenient solutions. We therefore set \(U = D^{-1}\), which makes the binomial tree symmetric in that an up move exactly cancels a down move. This gives \(u = -d\). Then equations (5.4) and (5.5) become
\[(2p - 1)u = \nu\Delta t, \qquad (5.6)\]
\[4p(1-p)u^2 = \sigma^2\Delta t. \qquad (5.7)\]
We square (5.6) and add it to (5.7) to get \(u^2 = \sigma^2\Delta t + (\nu\Delta t)^2\). Thus, we have the following values for the three parameters:
\[U = e^{\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}, \qquad D = e^{-\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}, \qquad p = \frac{1}{2} + \frac{\nu\Delta t}{2\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}.\]
For small \(\Delta t\), we obtain first-order approximations by keeping only the lowest power of \(\Delta t\). Thus, we neglect \((\Delta t)^2\) in favour of \(\Delta t\), and \(\Delta t\) in favour of \(\sqrt{\Delta t}\). This leads to the following popular estimates (recall that \(\Delta t = T/n\)):
\[U \approx e^{\sigma\sqrt{T/n}}, \qquad (5.8)\]
\[D \approx e^{-\sigma\sqrt{T/n}}, \qquad (5.9)\]
\[p \approx \frac{1}{2} + \frac{\nu}{2\sigma}\sqrt{T/n}. \qquad (5.10)\]
An interesting aspect of these estimates of \(U\) and \(D\) is that they do not involve the drift! This is a virtue, as the jumps in the underlying tree now depend only on \(\sigma\), which is observable. Thus, the possible paths for the price are independent of \(\mu\); its role is only in determining the probability associated with a path.

At this stage we have two models for stock prices, one continuous and the other discrete. We shall soon see that sometimes one model is convenient, sometimes the other. Therefore, we wish to be reassured that the two models are consistent with each other.
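The exact solution of (5.6) and (5.7) and the first-order estimates (5.8) to (5.10) can be compared numerically. The following is a minimal sketch. The function name and the `first_order` flag are hypothetical. The sample values \(T = 1/4\), \(\mu = 0.2\), \(\sigma = 0.3\) and \(n = 20\) are the ones quoted for Figure 5.6.

```python
import math


def tree_parameters(mu, sigma, T, n, first_order=True):
    """Return (U, D, p) for an n-step symmetric tree (U = 1/D) on [0, T]."""
    dt = T / n
    nu = mu - sigma**2 / 2
    if first_order:
        # Estimates (5.8)-(5.10): keep only the lowest power of dt.
        u = sigma * math.sqrt(dt)
        p = 0.5 + nu * math.sqrt(dt) / (2 * sigma)
    else:
        # Exact solution of (5.6)-(5.7): u^2 = sigma^2 dt + (nu dt)^2.
        u = math.sqrt(sigma**2 * dt + (nu * dt) ** 2)
        p = 0.5 + nu * dt / (2 * u)
    return math.exp(u), math.exp(-u), p


# Values quoted for Figure 5.6: T = 1/4, mu = 0.2, sigma = 0.3, n = 20.
for flag in (False, True):
    U, D, p = tree_parameters(0.2, 0.3, 0.25, 20, first_order=flag)
    print(f"first_order={flag}: U = {U:.5f}, D = {D:.5f}, p = {p:.5f}")
```

With 20 steps the two parameter sets already agree to roughly three decimal places. This illustrates why the simpler drift-free estimates of \(U\) and \(D\) are the ones usually used.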
It turns out that if we let the time step go to zero in the binomial tree model, it tends towards GBM. We shall not prove this, but Figure 5.6 illustrates it with an example. The first diagram shows that even for \(n = 20\) the probability distribution of the final stock price, under the binomial tree approach, has a distinctly lognormal look. We confirm this in the second diagram by comparing the cumulative distribution functions of the final stock price under the two models. (We have taken \(T = 1/4\). The GBM model has \(\mu = 0.2\) and \(\sigma = 0.3\). The parameters for the binomial tree have been calculated from the first-order estimates (5.8) to (5.10), using \(n = 20\).)

Figure 5.6: Comparison of binomial tree and GBM models. Graph (a) is the pdf of the price distribution under a binomial tree model with \(n = 20\). Graph (b) compares the cdf of this price distribution with a matching GBM model.

¹² English translations of Bachelier’s thesis have been published in [11, 17]. For a description of his work and its influence, see [49].

6 Options

Options have a long history. One of the early references is to the philosopher Thales of Miletus (circa 600 BC), who made a fortune through options on olive presses. According to Aristotle [2], one winter Thales paid a small fee for the first right to rent these presses during the olive season. If the harvest was good and the presses were in demand (they are used to produce olive oil), Thales planned to exercise his options and sublet the presses at a much higher rent. If the harvest was poor and nothing was to be gained from renting the presses, he could let his right lapse, losing only the initial fee. As it happened, the harvest was good, and Thales made a large profit. Incidentally, Thales was also credited by later generations with introducing deduction into geometry and providing the first proofs of basic results such as the ASA rule for congruence of triangles.
Aristotle said of Thales that for him, “The primary question was not What do we know, but How do we know it.”

The first detailed mathematical treatment of options was in Bachelier’s 1900 thesis [4], where he used his Brownian motion model to estimate the risks involved with various investment strategies based on options. His work was not significantly improved until the 1973 publications of Fischer Black, Myron Scholes and Robert Merton [6, 32]. These described what is now called the Black–Scholes Model and gave an extensive analysis of options pricing and the use of options in hedging and speculation. The model came into being just as the use of computers was becoming widespread, and the result was an explosion in the use of options. Scholes and Merton received the Nobel Prize in 1997, Black having unfortunately passed away two years earlier.

In this chapter we shall carry out a study of options and their use in hedging. However, instead of the Black–Scholes model, we shall use a simplified discrete version known as the Binomial Options Pricing Model or BOPM. BOPM was introduced in 1978 by William Sharpe and extended in 1979 by John Cox, Stephen Ross and Mark Rubinstein [15].

Figure 6.1: The structure of a European call option signed at time \(t = 0\) with expiration time \(t = T\), exercise price \(X\), and call premium \(C\). The dashed arrows at the \(t = T\) stage indicate that the trade is optional.

6.1 CALL OPTIONS

In forwards and futures, both parties are committed to a future trade. Naturally, each would desire the chance to drop out if the gain from the contract becomes less than what is available on the open market. This desire creates an opening for a new kind of contract, in which one party pays the other a fee for the right to cancel the trade. Such a contract is called an option and comes in two flavours, depending on whether it is the prospective buyer or the prospective seller of the asset who pays for the right to cancel. NSE introduced trading in options in June 2001.
The number of traded options grew from 1.2 million during 2001–02 to 18.2 million during 2005–06.

In a European Call Option (Figure 6.1), the holder buys the right to make a future purchase, without the obligation to do so:

1. The holder pays the writer an initial fee (the call premium) to buy the contract. The contract specifies an amount of the underlying asset, an expiration date, and an exercise price (or strike price).
2. On the expiration date, the holder may pay the writer the exercise price.
3. If the holder pays up, the writer must deliver the specified amount of the underlying asset.

If the exchange happens, we say the contract has been exercised.

Example 6.1.1 On June 3, 2005, the closing price of TISCO stock on NSE was Rs 349.70. Call options on this stock were available with a variety of exercise prices and expiry dates. Table 6.1 shows some of the available exercise prices (X) for calls expiring on June 30, 2005, together with the premiums (C) at which they could have been purchased. The closing price of TISCO stock on June 30 was Rs 339.75.

Table 6.1: The closing call premiums (C) on June 3, 2005, for a range of exercise prices (X) available for call options on TISCO stock. The options expired on June 30, 2005. (Source: NSE)

Suppose that on June 3 you had bought a call with exercise price Rs 320 at the closing premium of Rs 29.75. On June 30 you would be able to buy a stock worth Rs 339.75 for just Rs 320, a gain of Rs 19.75. Unfortunately, you had already paid Rs 29.75 for this opportunity, so you end up with a net loss of Rs 10. You would come out ahead only if the final stock price were above 320 + 29.75 = Rs 349.75 (ignoring the possibility of earning interest). Finally, had the June 30 price dropped below Rs 320, you would not exercise the option, since exercising it would cause a further loss. □

Exercise 6.1.2 Suppose two call options are identical, except that one has a higher exercise price. Which one will have a higher call premium?
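The arithmetic of Example 6.1.1 can be checked with a short Python sketch. The function names are illustrative only. As in the example, interest on the premium is ignored.

```python
def call_payoff(spot_at_expiry, strike):
    """Payoff to the holder of a European call at expiry: max{0, S_T - X}."""
    return max(0.0, spot_at_expiry - strike)


def call_profit(spot_at_expiry, strike, premium):
    """Holder's net result: payoff minus the premium paid (interest ignored)."""
    return call_payoff(spot_at_expiry, strike) - premium


# Figures from Example 6.1.1: TISCO call with X = Rs 320 bought for C = Rs 29.75.
X, C = 320.0, 29.75
print(call_profit(339.75, X, C))   # -10.0: the loss of Rs 10 in the example
print(X + C)                       # 349.75: the break-even final price
print(call_profit(310.0, X, C))    # -29.75: the call lapses, only the premium is lost
```

Separating payoff from profit mirrors the text. The payoff $\max\{0, S_T - X\}$ is what the contract delivers at expiry, while the profit also accounts for the premium paid at the start.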
A variation on the above structure is the American Call Option, which can be exercised at any time between its birth and its expiry. Thus the holder of an American call can either exercise it at any time by paying the exercise price X, or let it lapse. Since an American call gives the holder more rights than a European call, it will have at least as high a premium. In other words, let $C_A$ be the premium of an American call and $C_E$ the premium of a European call such that both calls have the same underlying asset, expiry time and exercise price. Then $C_A \ge C_E$.

A call option can be written on various kinds of assets, e.g., commodities, bonds, stocks, stock indices, and futures. It can be traded through an exchange or directly between the parties. When a call is traded on an exchange, the exchange standardises the expiry times, the amounts of underlying asset, and the exercise prices. The exchange also arranges a process of marking-to-market (as for futures) to protect against default risk. Like futures, options can be traded to new holders at any time during their life. The price at which they are sold is again called their premium.

The task before us is to find the correct premium at which a call should be sold. To do this, we have to establish how good a deal is being offered to the holder. Figure 6.2 illustrates how the final payoff to the holder of a European call option depends on the spot price of the underlying asset at the expiry time. If the final spot price $S_T$ is greater than the exercise price X, the holder exercises the call and profits by $S_T - X$. Otherwise, she does not exercise it (since it is cheaper for her to buy the asset by paying $S_T$ in the market) and the final payoff is zero. Thus the final payoff is given by $\max\{0, S_T - X\}$, and this is also the premium of the call at time T.
Figure 6.2: The payoff to the holder of a European call option on its expiry at T, as a function of the final spot price $S_T$ of the underlying asset.

Since the final payoff is guaranteed to be non-negative, both common sense and the No Arbitrage Principle dictate that the call premium must be non-negative:

$$C \ge 0. \qquad (6.1)$$

Another insight comes from comparing the holder of a European call with the holder of a futures that has the same asset, expiry time and exercise price. The call holder's final payoff is at least as large. Therefore, the call premium must be at least the value of the corresponding futures. If the asset generates no income or cost, we get:

$$C \ge S - Xe^{-rT}. \qquad (6.2)$$

Combining the bounds (6.1) and (6.2), we get the following lower bound for the premium of a European call:

$$C \ge \max\{0, S - Xe^{-rT}\}. \qquad (6.3)$$

Exercise 6.1.3 Write down the form the inequality (6.3) will have if the underlying asset generates either known income or a continuous dividend yield.

Exercise 6.1.4 Show that the bound (6.3) is also valid for an American call on an asset without income.

An upper bound on C can be obtained if it is known that the asset price cannot be negative. In this case, the payoff from the call is always less than or equal to $S_T$. Hence the value of the call is at most the value of owning the asset:

$$C \le S. \qquad (6.4)$$

For example, this inequality is valid when the underlying asset is a stock (whose price cannot be negative) but not when it is a futures (whose value can be negative).

Figure 6.3: This graph compares the premiums of calls on Maruti stock (stars) with the lower bound given by (6.3) (diamonds). The horizontal axis represents the time interval 23 May 2005 to 29 June 2005. The calls had an exercise or strike price of Rs 460 and expired on June 30, 2005. During this period, the price of a Maruti share ranged between Rs 435 and Rs 478.

Exercise 6.1.5 Show that the inequality (6.4) is also valid for an American call on an asset without income.
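The bounds (6.3) and (6.4) are easy to evaluate, as the following Python sketch shows. The input values below are hypothetical and are not taken from the Maruti data of Figure 6.3. The sketch assumes an asset without income whose price cannot be negative.

```python
import math


def european_call_bounds(S, X, r, T):
    """Bounds (6.3) and (6.4) for a European call on an asset without income
    whose price cannot be negative: max{0, S - X e^{-rT}} <= C <= S."""
    lower = max(0.0, S - X * math.exp(-r * T))
    upper = S
    return lower, upper


# Hypothetical inputs (not from the text): S = 100, X = 90, r = 5%, T = 6 months.
lower, upper = european_call_bounds(S=100.0, X=90.0, r=0.05, T=0.5)
print(f"{lower:.2f} <= C <= {upper:.2f}")   # 12.22 <= C <= 100.00

# A quoted premium outside these bounds would signal an arbitrage opportunity.
quote = 11.50
print(lower <= quote <= upper)              # False: 11.50 is below the lower bound
```

The wide gap between the two numbers illustrates the remark in the text that the upper bound (6.4) is too large to be useful.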
Bounds are useful if they are not too far away from the actual values. Figure 6.3 illustrates how the lower bound given by the inequality (6.3) is usually reasonably close to the actual premium values. The upper bound given by (6.4), though correct, is too large to be useful.

Here is our first surprise concerning call options:

Theorem 6.1.6 An American call has the same value as a European call with the same parameters if the underlying asset generates no income.

Proof. Consider an American call with expiry at T and exercise price X. Let $C_t$ be its premium at a time t < T, and $S_t$ the spot price of the underlying asset at that time. By (6.3), which also holds for American calls (Exercise 6.1.4), applied over the remaining time T − t, we see that $C_t \ge S_t - Xe^{-r(T-t)}$. Hence (since r > 0) $C_t > S_t - X$. Since $S_t - X$ is the payoff from exercising the call at time t, it is never optimal to exercise the call early. It would be better to sell it. Thus, an American call will never be exercised before expiry, and hence has the same value as a European call with the same parameters. □

This conclusion fails when the asset generates income, because then the early exercise of an American call would have the added benefit of bringing in a share of this income.

6.2 PUT OPTIONS

A put option is the reverse of a call option. Here, the holder buys the right to sell the asset to the writer.

Figure 6.4: The structure of a European put option signed at time t = 0 with expiration time t = T, exercise price X, and put premium P. The dashed arrows at the t = T stage indicate that the trade is optional.

In a European put option the holder buys the right to make a future sale, without the obligation to do so, in the following manner:

1. The holder pays the writer an initial fee (the put premium) to buy the contract.
2. On the expiration date, the holder may deliver the underlying asset to the writer.
3. If the holder delivers the asset, the writer must pay the exercise price (or strike price).

Our earlier remarks about calls apply to puts as well.
In particular, we have American put options, which give the holder the right to exercise the contract before the expiry time. The premium of an American put will be at least as much as the premium of a European put with the same parameters. This time there are no surprises: in general, the premium of the American put will be strictly greater, even when the asset generates no income.

Example 6.2.1 Consider an American put with X = 100 and expiry in one year. Suppose the risk-free rate is r = 15% (compounded annually). If the spot price of the underlying stock drops to 10, then immediate exercise will net 90, and over a year this will grow to 90 × 1.15 = 103.5. This is more than the maximum possible payoff from holding the put to expiry, which is X = 100 (attained only if the stock price falls to 0). □

Exercise 6.2.2 Show that if the risk-free rate r = 0, then the premium of an American put on an asset which generates no income equals that of a European put with the same parameters.

Figure 6.5: The payoff to the holder of a European put option expiring at T, as a function of the final spot price $S_T$ of the underlying asset.

Figure 6.5 shows the final payoff from a European put expiring at T and with exercise price X. The formula for this final payoff is $\max\{0, X - S_T\}$. This immediately tells us that the put premium P at time t = 0 must satisfy P ≥ 0. Next, we observe that the payoff to the put holder is always at least as much as the payoff $X - S_T$ to the writer of a futures with the same parameters. Hence the put has at least as great a value. For an asset without income, this gives:

$$P \ge Xe^{-rT} - S. \qquad (6.5)$$

Combining (6.5) with P ≥ 0, we get the following lower bound for the premium of a European put:

$$P \ge \max\{0, Xe^{-rT} - S\}. \qquad (6.6)$$

Exercise 6.2.3 Write down the form the inequality (6.6) will have if the underlying asset generates either known income or a continuous dividend yield.

An upper bound on P can be obtained if it is known that the asset price cannot be negative. In this case, the maximum possible payoff from a put is its exercise price X.
Hence the premium of a European put cannot exceed the present value of X:

$$P \le Xe^{-rT}. \qquad (6.7)$$

Exercise 6.2.4 How will you modify the bounds (6.6) and (6.7) for an American put on an asset without income whose spot price cannot be negative?

Figure 6.6: The payoffs to the holders of (from left to right) a European call, a European put and a forward, each with the same underlying asset, expiry time T and exercise price X.

6.3 PUT–CALL PARITY

We have already found it useful to compare calls and puts with forwards and futures. Figure 6.6 collects the payoff patterns for a European call, a European put and a forward, with the same underlying asset, expiry time T, and exercise price X. For $S_T \ge X$, the holders of the call and the forward receive the same payoff, $S_T - X$, while the put pays nothing. For $S_T \le X$, the writer of the put and the holder of the forward receive the same payoff, $S_T - X$, while the call pays nothing. Thus, if we simultaneously become the holder of the call and the writer of the put, our final payoff at T will exactly match that of the holder of the forward. Therefore, the initial value of our portfolio must match that of the forward:

$$C - P = S - Xe^{-rT}.$$

Theorem 6.3.1 (Put–Call Parity) Suppose call and put options are available on the same underlying asset with the same expiry date T and exercise price X. Let the continuously compounded risk-free rate be r, and let S be the spot price of the underlying asset. Then the call premium C and the put premium P are related by

$$P + S = C + Xe^{-rT}. \qquad (6.8)$$

□

It is important to note that put–call parity is only valid for European options. The form in equation (6.8) holds when the asset generates no income. It is easily adapted to the cases when the asset either generates known income or has a known dividend yield.

Exercise 6.3.2 Give a formal proof of put–call parity by means of the method of replicating portfolios, and without assuming the existence of a forward with the right features.
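Put–call parity (6.8) and the put bounds (6.6) and (6.7) can be combined in a small Python sketch. The call quote used below is hypothetical. The sketch also checks at expiry that cash $Xe^{-rT}$, one long call and one short share together pay exactly what a put pays, which is the same identity rearranged.

```python
import math


def put_from_parity(C, S, X, r, T):
    """Put premium implied by put-call parity (6.8), P + S = C + X e^{-rT},
    for European options on an asset without income."""
    return C + X * math.exp(-r * T) - S


def european_put_bounds(S, X, r, T):
    """Bounds (6.6) and (6.7): max{0, X e^{-rT} - S} <= P <= X e^{-rT}."""
    pv_strike = X * math.exp(-r * T)
    return max(0.0, pv_strike - S), pv_strike


def synthetic_put_payoff(S_T, X):
    """Value at expiry of: cash X e^{-rT} (grown to X), one long call, one short share."""
    return X + max(0.0, S_T - X) - S_T


# Hypothetical inputs: a call quoted at 10.45 with S = X = 100, r = 5%, T = 1 year.
P = put_from_parity(C=10.45, S=100.0, X=100.0, r=0.05, T=1.0)
lower, upper = european_put_bounds(S=100.0, X=100.0, r=0.05, T=1.0)
print(round(P, 2), lower <= P <= upper)   # 5.57 True

# The synthetic portfolio pays exactly max{0, X - S_T} at expiry.
for S_T in (0.0, 60.0, 100.0, 140.0):
    print(S_T, synthetic_put_payoff(S_T, 100.0), max(0.0, 100.0 - S_T))
```

The cash leg is written as X because $Xe^{-rT}$ invested at the risk-free rate grows to exactly X by time T.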
An interesting application of put–call parity is that, by rearranging it, we can create a combination of three assets that mimics the fourth one. For example, a portfolio which holds $Xe^{-rT}$ in cash, is long one call and is short one share is equivalent to being long one put.

Exercise 6.3.3 What will be the put–call parity formula when (a) the asset generates a known income, (b) the asset has a constant and continuous dividend yield?

Exercise 6.3.4 Consider options on a stock following GBM with drift parameter μ. One may reason as follows: if μ is higher, then the expected value of $S_T$ is higher, so the call premium should increase while the put premium should decrease. Evaluate this conclusion in the light of put–call parity.

Let us summarise the progress we have made:

1. We have lower bounds for the option premiums.
2. When the asset price cannot be negative, there are upper bounds as well.
3. When the asset generates no income, American and European calls have the same premium.
4. European options satisfy put–call parity.

This is about as far as we can get with the No Arbitrage Principle alone. Significant further progress is only possible by combining the No Arbitrage Principle with models of asset price fluctuations. We shall now begin this task, using the models developed in the previous chapter. It is relevant to note here that these particular models have been found to be most suitable for stock prices. (For example, Ross [40] shows that crude oil price data is not consistent with the assumption of independent jumps.) Thus, from now on, “asset” should really be read as “stock”.

6.4 BINOMIAL OPTIONS PRICING MODEL

The Binomial Options Pricing Model, or BOPM, is based on the Binomial Tree Model for price fluctuations. In this model we break up the time span [0, T] into n equal parts, and imagine that over each part the asset price can either move up by a factor U or down by a factor D.
Thus, the basic object is the following branch, in which the price S moves either up to SU or down to SD:

$$S \longrightarrow \begin{cases} SU & \text{(up)} \\ SD & \text{(down)} \end{cases}$$

We remark that typically U > 1 > D. (But see Exercise 6.4.2.)

ONE-STEP BOPM

First, consider the n = 1 case. Imagine a European call option on this asset which expires at T and has exercise price X. If its initial premium is C, then the evolution of its value over the interval [0, T] is represented by:

$$C \longrightarrow \begin{cases} C_u = \max\{0, SU - X\} \\ C_d = \max\{0, SD - X\} \end{cases}$$

Here, $C_u$ is the call payoff if the asset price moves up, and $C_d$ is the call payoff if the asset price moves down. Since U > D, we have $C_u \ge C_d$. So the call value goes up if the asset price goes up, and down if the asset price goes down. Now, if we become the writer of the call, the position reverses: profits from the asset will be offset by losses on the call. If we have the right amounts of long asset and short calls, the fluctuations can exactly cancel and we shall have a risk-free portfolio.

Thus, suppose we own h units of the asset and write 1 call. The value of this portfolio evolves according to the following branch:

$$hS - C \longrightarrow \begin{cases} hSU - C_u \\ hSD - C_d \end{cases}$$

For this portfolio to be risk-free, we need $hSU - C_u = hSD - C_d$. We solve this for h:

$$h = \frac{C_u - C_d}{S(U - D)}.$$

The ratio h is called the option’s delta, and this process of creating a risk-free portfolio is called delta hedging. Since the portfolio is risk-free, its initial value equals the present value of its value at T. Thus,

$$hS - C = e^{-rT}(hSU - C_u).$$

Substituting the value of h, we get:

$$C = e^{-rT}\left[p^* C_u + (1 - p^*) C_d\right], \qquad (6.9)$$

where $p^* = \dfrac{e^{rT} - D}{U - D}$.

Exercise 6.4.1 Verify equation (6.9).

Exercise 6.4.2 Use the No Arbitrage Principle to show that $D < e^{rT} < U$, and hence $0 < p^* < 1$.

Note that the probabilities of the up and down moves did not enter into our calculations.

TWO-STEP BOPM

We now let the above process happen twice in succession, cutting the time interval [0, T] into the equal parts [0, T/2] and [T/2, T]. Then we have the following picture for the evolution of the spot price:

$$S \longrightarrow \begin{cases} SU \longrightarrow \begin{cases} SU^2 \\ SUD \end{cases} \\ SD \longrightarrow \begin{cases} SUD \\ SD^2 \end{cases} \end{cases}$$

We label the corresponding payoffs from the call expiring at T as $C_{uu}$, $C_{ud}$ and $C_{dd}$, and the call values at the intermediate nodes as $C_u$ and $C_d$. Note that we have

$$C_{uu} = \max\{0, SU^2 - X\}, \qquad C_{ud} = \max\{0, SUD - X\}, \qquad C_{dd} = \max\{0, SD^2 - X\}.$$
Applying the one-step BOPM to the branches with nodes at $C_u$ and $C_d$ gives:

$$C_u = e^{-rT/2}\left[p^* C_{uu} + (1 - p^*) C_{ud}\right], \qquad C_d = e^{-rT/2}\left[p^* C_{ud} + (1 - p^*) C_{dd}\right],$$

where $p^* = \dfrac{e^{rT/2} - D}{U - D}$. We apply the one-step BOPM once again, to the branch with node at C, to get:

$$C = e^{-rT/2}\left[p^* C_u + (1 - p^*) C_d\right] = e^{-rT}\left[p^{*2} C_{uu} + 2p^*(1 - p^*) C_{ud} + (1 - p^*)^2 C_{dd}\right].$$

Exercise 6.4.3 Suppose we have a two-step BOPM with S = 100, X = 100, U = 1.1, D = 0.9, r = 10% and T = 1. Show that C = 10.71 if the risk-free growth factor per step is taken to be 1 + rT/2 = 1.05. (With continuous compounding, $e^{rT/2} \approx 1.0513$ and the premium comes out as about 10.87.)

MANY-STEP BOPM

A certain pattern in the call premium formulas should now be evident. First, we list all the possible final spot prices. Over n steps, these are $S_T = SU^kD^{n-k}$, k = 0, 1, …, n. The payoff from the call corresponding to the final value $SU^kD^{n-k}$ is $\max\{SU^kD^{n-k} - X, 0\}$. We multiply this payoff by the binomial probability $\binom{n}{k}p^{*k}(1 - p^*)^{n-k}$, where

$$p^* = \frac{e^{rT/n} - D}{U - D}.$$

Finally, we sum all these terms and divide the sum by $R^n = e^{rT}$, where $R = e^{rT/n}$ is the risk-free growth factor over one step. The general formula is thus obtained:

Theorem 6.4.4 (Many-Step BOPM) Consider an asset whose price follows an n-step binomial tree with parameters U and D and initial price S over the time interval [0, T]. Let the continuously compounded risk-free rate be r. Consider a European call on this asset that expires at T and has exercise price X. Then the call premium C is given by

$$C = \frac{1}{e^{rT}}\sum_{k=0}^{n}\binom{n}{k}p^{*k}(1 - p^*)^{n-k}\max\{SU^kD^{n-k} - X, 0\}, \qquad (6.10)$$

where $p^* = \dfrac{e^{rT/n} - D}{U - D}$. □

We have given this formula as a reasonable generalisation of the one-step and two-step BOPM. You may enjoy proving it by induction!

Exercise 6.4.5 Show that in an n-step BOPM for a European call, C =

The formula (6.10) for C has a very interesting form. First, the division by $e^{rT}$ represents discounting to a present value. The quantity being discounted is a weighted average of the payoffs from the call at expiry. The weights are the probabilities associated with a binomial random variable with parameters n and $p^*$. Thus, one is led to interpreting $p^*$ as a probability, which is possible since we have already determined that $0 < p^* < 1$ (Exercise 6.4.2).
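Formula (6.10) translates directly into a short loop over the number of up moves k. The Python sketch below has a hypothetical `step_growth` argument. It lets the per-step risk-free growth factor R be set explicitly, so that both readings of Exercise 6.4.3 can be compared. By default R = e^{rT/n}, as in the theorem. It uses `math.comb`, which needs Python 3.8 or later.

```python
import math


def bopm_call(S, X, U, D, r, T, n, step_growth=None):
    """n-step BOPM premium of a European call, formula (6.10).

    By default the risk-free growth factor per step is R = e^{rT/n}
    (continuous compounding). step_growth overrides it, e.g. R = 1 + rT/n.
    """
    R = math.exp(r * T / n) if step_growth is None else step_growth
    p_star = (R - D) / (U - D)                      # risk-neutral probability
    total = 0.0
    for k in range(n + 1):                          # k = number of up moves
        weight = math.comb(n, k) * p_star**k * (1 - p_star) ** (n - k)
        payoff = max(S * U**k * D ** (n - k) - X, 0.0)
        total += weight * payoff
    return total / R**n                             # R^n = e^{rT} by default


# Exercise 6.4.3: S = X = 100, U = 1.1, D = 0.9, r = 10%, T = 1, two steps.
print(round(bopm_call(100, 100, 1.1, 0.9, 0.10, 1.0, 2, step_growth=1.05), 2))  # 10.71
print(round(bopm_call(100, 100, 1.1, 0.9, 0.10, 1.0, 2), 2))                    # 10.87
```

With n = 1 the function reduces to the one-step formula (6.9). For example, S = X = 100, U = 1.1, D = 0.9, r = 5% and T = 1 give a premium of about 7.19.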
Since $p^*$ arose out of making the process risk-free, it is called the risk-neutral probability.

Figure 6.7: Premiums of a European call plotted against the initial spot price. The dots represent the predictions of a 3-step BOPM. The smooth curve is obtained from a 100-step BOPM. The dashed line is the graph of the function $S - Xe^{-rT}$, to which the premium plots are asymptotic. (See Exercise 6.4.5.)

Another interesting aspect of the BOPM is that the premium depends on T (represented by n), S, X, r and the volatility (represented by U and D). It does not depend on the real-world probabilities of a price increase or decrease.

For European puts we have a similar analysis, except that the call payoffs at expiry are replaced by the put payoffs.

Theorem 6.4.6 Consider an asset whose price follows an n-step binomial tree with parameters U and D and initial price S over the time interval [0, T]. Let the continuously compounded risk-free rate be r. Consider a European put on this asset that expires at T and has exercise price X. Then the put premium P is given by

$$P = \frac{1}{e^{rT}}\sum_{k=0}^{n}\binom{n}{k}p^{*k}(1 - p^*)^{n-k}\max\{X - SU^kD^{n-k}, 0\}, \qquad (6.11)$$

where $p^* = \dfrac{e^{rT/n} - D}{U - D}$. □

Exercise 6.4.7 Show that the BOPM formulas for European put and call premiums satisfy put–call parity. (This reassures us that BOPM correctly captures the properties of options.)

ESTIMATING BOPM PARAMETERS

To put BOPM to honest work, we need estimates of the parameters U and D. We have already seen one way of making these estimates in the previous chapter, where we matched the binomial tree to the lognormal model and obtained

$$U \approx e^{\sigma\sqrt{\Delta t}}, \qquad D \approx e^{-\sigma\sqrt{\Delta t}},$$

where σ is the volatility parameter in the lognormal model and Δt = T/n is the length of one step. In the same chapter, we also described how to obtain σ from historical data.

Example 6.4.8 Figure 6.8 compares the actual closing premiums of European call options on Maruti stock with the theoretical ones calculated by a 10-step BOPM. The options expired on June 30, 2005, and had an exercise price of Rs 460. For the BOPM calculations, we have assumed an annual risk-free rate of 5% and taken the volatility to be σ = 0.26.
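The kind of calculation used in this example can be sketched in Python as follows. The sketch takes $U = e^{\sigma\sqrt{\Delta t}}$ and $D = e^{-\sigma\sqrt{\Delta t}}$ and evaluates (6.10) and (6.11). The values of X, r, σ and n follow the example. The spot price S = 460 and the time to expiry T = 0.25 are hypothetical, since the text does not list the daily inputs. The last line checks Exercise 6.4.7 numerically.

```python
import math


def bopm_european(S, X, sigma, r, T, n, kind="call"):
    """European call or put premium from an n-step BOPM, formulas (6.10)/(6.11),
    with U = e^{sigma sqrt(dt)}, D = e^{-sigma sqrt(dt)}, dt = T/n."""
    dt = T / n
    U = math.exp(sigma * math.sqrt(dt))
    D = math.exp(-sigma * math.sqrt(dt))
    R = math.exp(r * dt)                      # risk-free growth over one step
    p_star = (R - D) / (U - D)
    total = 0.0
    for k in range(n + 1):
        S_T = S * U**k * D ** (n - k)
        payoff = max(S_T - X, 0.0) if kind == "call" else max(X - S_T, 0.0)
        total += math.comb(n, k) * p_star**k * (1 - p_star) ** (n - k) * payoff
    return total / R**n


# X, r, sigma and n follow Example 6.4.8; S = 460 and T = 0.25 are hypothetical.
S, X, sigma, r, T, n = 460.0, 460.0, 0.26, 0.05, 0.25, 10
C = bopm_european(S, X, sigma, r, T, n, "call")
P = bopm_european(S, X, sigma, r, T, n, "put")
print(round(C, 2), round(P, 2))
print(abs((P + S) - (C + X * math.exp(-r * T))))   # ~0: put-call parity (6.8)
```

Parity holds here up to rounding error, because $p^*U + (1 - p^*)D = e^{r\Delta t}$ exactly.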
Figure 6.8: A comparison of actual premiums (stars) of call options on Maruti stock with the predictions from a 10-step BOPM.

This example illustrates that the model works well if we have the right value of σ. □

The main issue, then, is the choice of σ: how do we obtain it? We have earlier given a way of estimating σ from historical prices, but a little reflection throws up some obvious problems. How much data should we use? What should be its frequency? Unfortunately, our choices in these matters can have a dramatic impact on the value that we get for σ. One popular solution is to work back from the options themselves: find the σ that makes BOPM give accurate values for one set of options, and use it in calculations for others! We will return to this idea in the next chapter.

6.5 PRICING AMERICAN OPTIONS

Recall that an American call on an asset without income has the same value as the corresponding European call. We will now see how the BOPM can be used to compute the premium of an American put. In applying BOPM to an American option, we assume that the decision to exercise or not can only be made at the nodes of the branching process. We start by considering the one-step situation:

$$S \longrightarrow \begin{cases} SU \\ SD \end{cases}$$

The corresponding diagram for the payoffs from the American put is:

$$P^* \longrightarrow \begin{cases} P_u^* = \max\{0, X - SU\} \\ P_d^* = \max\{0, X - SD\} \end{cases}$$

Here, we use $P^*$ to refer to the value of the American put. If we use P to indicate the value of the corresponding European put, we have $P_u^* = P_u$ and $P_d^* = P_d$, since an American put held till expiry is equivalent to a European put. Now, at the starting point the holder has the option of either exercising the put right away, or holding it till expiry. If he holds it till expiry, it becomes a European put whose value is given by the BOPM for European puts: $e^{-rT}\left[p^* P_u^* + (1 - p^*) P_d^*\right]$. On the other hand, exercising it immediately is worth X − S. The holder will obviously take the step which gives him more value. Therefore,

$$P^* = \max\left\{X - S,\; e^{-rT}\left[p^* P_u^* + (1 - p^*) P_d^*\right]\right\}.$$
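If we apply this one-step rule at every node, working back from the right end of the tree, we get an n-step procedure for American puts. A minimal Python sketch follows. The function name is hypothetical, and R denotes the risk-free growth factor per step, for example $e^{rT/n}$ or the value 1.05 used in Example 6.5.1. The two printed values reproduce that example for X = 100 and for X = 105.

```python
def american_put_bopm(S, X, U, D, R, n):
    """n-step BOPM for an American put; R is the risk-free growth factor per step."""
    p_star = (R - D) / (U - D)
    # Put values at expiry, indexed by the number of up moves j.
    values = [max(X - S * U**j * D ** (n - j), 0.0) for j in range(n + 1)]
    for step in range(n - 1, -1, -1):               # work back towards t = 0
        new_values = []
        for j in range(step + 1):
            hold = (p_star * values[j + 1] + (1 - p_star) * values[j]) / R
            exercise = X - S * U**j * D ** (step - j)
            new_values.append(max(exercise, hold))  # holder picks the better option
        values = new_values
    return values[0]


print(round(american_put_bopm(100, 100, 1.1, 0.9, 1.05, 1), 2))  # 2.38: hold
print(round(american_put_bopm(100, 105, 1.1, 0.9, 1.05, 1), 2))  # 5.0: exercise early
```

As the text notes, this procedure gives numbers but no closed-form formula. Each node needs only the two values to its right, so the work grows with the square of n.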
This one-step BOPM for American puts can be used to piece together an n-step BOPM in the usual way. We show below how the two-step BOPM is created, working our way back from the right end of the tree, with $p^* = \dfrac{e^{rT/2} - D}{U - D}$. We first note that:

$$P_{uu}^* = \max\{X - SU^2, 0\}, \qquad P_{ud}^* = \max\{X - SUD, 0\}, \qquad P_{dd}^* = \max\{X - SD^2, 0\}.$$

Now, we apply the one-step BOPM to the two second-stage branches:

$$P_u^* = \max\left\{X - SU,\; e^{-rT/2}\left[p^* P_{uu}^* + (1 - p^*) P_{ud}^*\right]\right\}, \qquad P_d^* = \max\left\{X - SD,\; e^{-rT/2}\left[p^* P_{ud}^* + (1 - p^*) P_{dd}^*\right]\right\}.$$

Finally, we apply the one-step BOPM to the first branch:

$$P^* = \max\left\{X - S,\; e^{-rT/2}\left[p^* P_u^* + (1 - p^*) P_d^*\right]\right\}.$$

By now it should be clear how the process will work for an n-step tree. This is an easy process to implement numerically. However, it does not lead to a closed-form solution (unlike the case of the European put), and so does not give any analytic insight.

Example 6.5.1 Consider a 1-step BOPM for an American put with T = 1, $e^{rT} = 1.05$, S = 100, U = 1.1, D = 0.9 and X = 100. Then the risk-neutral probability is given by

$$p^* = \frac{1.05 - 0.9}{1.1 - 0.9} = 0.75.$$

We have the final payoffs

$$P_u^* = \max\{0, X - SU\} = \max\{0, 100 - 110\} = 0, \qquad P_d^* = \max\{0, X - SD\} = \max\{0, 100 - 90\} = 10.$$

The put premium is therefore given by

$$P^* = \max\left\{X - S,\; \frac{1}{1.05}\left[0.75 \times 0 + 0.25 \times 10\right]\right\} = \max\{0, 2.38\} = 2.38.$$

In this case it is best not to exercise early, as that has a zero payoff. On the other hand, if we change X to 105, the payoff from exercising early is superior. Immediate exercise gives 105 − 100 = 5, while holding gives $(0.25 \times 15)/1.05 \approx 3.57$. □

Exercise 6.5.2 Verify that according to BOPM, an American call has the same premium as a European call with the same parameters.

6.6 FACTORS INFLUENCING OPTION PREMIUMS

According to BOPM, the factors affecting the value of a call or put option are the time to expiry, the exercise price, the current spot price, the volatility and the risk-free interest rate.

Expiry Time (T): An increase in T increases the range of possible profits (because the range of the final spot price $S_T$ widens), while the loss is always limited to the premium. Thus, the value of an option generally increases with T. In an American option one has the additional possibility of closing out the contract if a large profit is available early.
A longer T means a greater possibility of encountering such a profit, so this strengthens the positive connection between the value of a call or put option and T. Figure 6.9 confirms these ideas: the curve showing the premiums of American puts is both higher and steeper.

Figure 6.9: Variation of put premium with time to expiry, plotted using a 10-step BOPM. The higher curve is for an American put and the lower one for a European put with the same underlying asset and exercise price.

Current Spot Price (S): The value of a call rises with S, since a higher S suggests that $S_T$ will also be high, bringing in a greater return from the call. On the other hand, since a put brings more profit as $S_T$ decreases, the value of a put goes down when S rises.

Exercise Price (X): A higher X decreases the profit from a call and so lowers its value. The value of a put rises with X.

Volatility (σ): A higher σ increases the chance of a large profit from the option, while the loss remains limited to the premium. So, the value of an option rises with volatility.

Risk-free Interest Rate (r): Suppose r increases. Then the present value of the final payoff decreases, and this will pull down the value of the option. However, there is another factor: if r increases, one expects a general rise in prices, including those of stocks. This will pull up the value of a call, and pull down that of a put. Thus, we certainly expect the value of a put to decrease with an increase in r. For calls, we have two opposite influences and cannot immediately say which pull is stronger. It has been empirically observed (and is also a consequence of BOPM) that the second influence dominates, and thus the value of a call increases with r.

Figure 6.10: Variation of option premiums with the risk-free rate, calculated from a 10-step BOPM. All the cases have S = X = 10, σ = 30% and T = 1.

6.7 OPTIONS ON ASSETS WITH DIVIDENDS

The BOPM approach can also be applied when the asset has a known dividend yield.
For simplicity, we assume a constant and continuous dividend yield, though the approach can be easily extended to discrete or varying yields. Let us start with a one-step binomial tree for an asset which has a continuous dividend yield q. Let the up and down factors be U and D, the time interval be Δt, the initial asset price be S, and the exercise price for a European call on the asset be X. Let the continuously compounded risk-free rate be r.

As usual, we initially create a portfolio P which is long h units of the asset and short one call. All the dividends earned over the interval Δt are reinvested in the asset, so that at the end of the process the portfolio is long $he^{q\Delta t}$ asset units and short one call. We denote the final value of the call by $C_u$ (when the asset price moves up) and $C_d$ (when the asset price moves down).

Exercise 6.7.1 Show that P is risk-free if the hedge ratio is set to

$$h = e^{-q\Delta t}\,\frac{C_u - C_d}{S(U - D)}.$$

Suppose h is set to the value given in the above exercise. Then the portfolio, being risk-free, must grow at the risk-free rate. Therefore, its initial value is simply the present value of its final value:

$$hS - C = e^{-r\Delta t}\left(he^{q\Delta t}SU - C_u\right).$$

Exercise 6.7.2 Show that the call premium is given by

$$C = e^{-r\Delta t}\left[p^* C_u + (1 - p^*) C_d\right],$$

where

$$p^* = \frac{e^{(r-q)\Delta t} - D}{U - D}, \qquad C_u = \max\{0, SU - X\}, \qquad C_d = \max\{0, SD - X\}.$$

Starting with these calculations, it is easy to carry out an n-step BOPM over a time interval T.

Theorem 6.7.3 The value of a European call on an asset with dividend yield q is

$$C = e^{-rT}\sum_{k=0}^{n}\binom{n}{k}p^{*k}(1 - p^*)^{n-k}\max\{SU^kD^{n-k} - X, 0\}, \qquad (6.12)$$

where $p^* = \dfrac{e^{(r-q)T/n} - D}{U - D}$ is, again, called the risk-neutral probability. □

Exercise 6.7.4 Show that the price of a European put on an asset with dividend yield q is given by

$$P = e^{-rT}\sum_{k=0}^{n}\binom{n}{k}p^{*k}(1 - p^*)^{n-k}\max\{X - SU^kD^{n-k}, 0\},$$

where $p^* = \dfrac{e^{(r-q)T/n} - D}{U - D}$.

Exercise 6.7.5 Show that in the current context, put–call parity takes the following form:

$$P + Se^{-qT} = C + Xe^{-rT}.$$

6.8 DYNAMIC HEDGING

Dynamic hedging is a process whereby a portfolio is constantly readjusted to keep it risk-free.

Example 6.8.1 Consider a two-step BOPM for a European call with S = 100, X = 100, U = 1.1, D = 0.9 and r = 5% per step (so that money grows by a factor of 1.05 over each step).
Then we have $C_{uu} = 21$ and $C_{ud} = C_{dd} = 0$. With the aid of these numbers, we calculate $C_u = 15$ and $C_d = 0$. Moreover, the corresponding hedge ratios are $h_u = 0.9545$ and $h_d = 0$. Our next calculation is C = 10.714, and the hedge ratio for this step is h = 0.75. We see that to keep the portfolio risk-free we have to adjust the hedge ratio at each step, depending on the move that happens.

Let us follow the process through one sequence of possible events. At t = 0, we write 1000 calls and buy 750 shares, thus keeping a hedge ratio of 0.75. Our net investment is 750 × 100 − 1000 × 10.714 = 64,286.

At t = 1, suppose the spot price has gone up by the factor U = 1.1 to 110. Then the value of our portfolio becomes 750 × 110 − 1000 × 15 = 67,500. Note that the portfolio value has increased by exactly the risk-free rate of 5%. To keep the portfolio risk-free over the next stage of the tree, we have to adjust its hedge ratio to 0.9545. We can do this either by buying 204.5 more shares (so that we have a total of 954.5) or by buying back 214.3 calls (decreasing their number to 785.7). The two steps are equivalent mathematically, but the second one involves a much lower investment and is therefore more practical. Thus, we decide to buy back 214.3 calls at a price of 15 each, and we borrow 214.3 × 15 = 3214 to do so.

At t = 2, suppose the spot price again rises, to 121. Then the value of the portfolio is 750 × 121 − 785.7 × 21 − 3214 × 1.05 = 70,875. Again, the portfolio value has increased by exactly the risk-free rate. □

This is only a toy example. Real prices would not actually go up or down by the prescribed factors, so the tree has to be reconstructed after each step for the new price. This also means that we cannot expect the hedge to be perfect. In the next example we illustrate one way of handling an actual sequence of prices.

Example 6.8.2 Suppose a stock is estimated to follow GBM with drift μ = 0.2 and volatility σ = 0.3.
We wish to hedge a portfolio of 1000 shares of this stock over 3 months using call options expiring in 90 days. The initial price S of the stock is 10, and this is also the exercise price X of the options. The risk-free rate is 5%. To keep down the transaction costs associated with trading options, we decide to readjust the portfolio every 9 days. We then set up a 10-step binomial tree with each step being of 9 days. We start by hedging for the first 9 days using the first hedge ratio from this tree. After 9 days we observe the new stock price and create a corresponding 9-step binomial tree. We calculate the new hedge ratio and change the number of written call options accordingly. Thus, every 9 days, we reduce the number of steps in the tree by one.

The time step of 9 days is the same for each binomial tree, so we use the following up/down factors and risk-neutral probability throughout:

$$U = e^{\sigma\sqrt{9/365}} = 1.048, \qquad D = \frac{1}{U} = 0.954, \qquad p^* = \frac{e^{9r/365} - D}{U - D} = 0.501.$$

The initial 10-step BOPM gives the following starting hedge ratio and call premium: h = 0.561, C = 0.6388. Our first step is to write 1000/h = 1783 calls. Our portfolio now consists of 1000 shares, 1783 written calls and 0.6388 × 1783 = 1139 in cash. The cash cancels the value of the written calls, so the initial value of this portfolio is 1000 × 10 = 10,000.

Now, suppose the stock prices take the following values at 9-day intervals (they have been randomly generated to follow a GBM with the given drift and volatility):

10, 10.02, 9.53, 10.43, 10.30, 10.16, 10.30, 9.89, 10.17, 9.70, 10.05.

After 9 days, the new stock price is 10.02. The 9-step BOPM gives the hedge ratio and call premium values as h = 0.563, C = 0.6436. Therefore, we have to adjust the portfolio so that it has 1000/0.563 = 1776 written calls. We buy back 7 calls at an expense of 7 × 0.6436 = 4.51.
The position becomes:

Value of shares = 1000 × 10.02 = 10,020
Value of written calls = −1776 × 0.6436 = −1143
Cash = $1139e^{9r/365} - 4.51 = 1136$
Total value of hedged portfolio = 10,020 − 1143 + 1136 = 10,013.

Figure 6.11 shows the result of repeating this process till the expiry of the calls. The hedging is quite successful in removing the large oscillations present in the stock price. □

Figure 6.11: Dynamic hedging with calls. The diamonds represent the unhedged shares and the stars represent the hedged portfolio (Example 6.8.2).

6.9 RISK-NEUTRAL VALUATION

Let us take another look at the structure of the BOPM formulas for European call premiums. We observed earlier that the premium turns out to be the present value of the expected future payoff, calculated using the risk-neutral probability $p^*$. A little later, we saw that the premium of a European put has the same form.

Further insight into $p^*$ comes from considering risk-neutral investors. Since they are indifferent to risk, they do not demand any compensation for it. If all investors are risk-neutral, then the expected value of any asset will grow at the risk-free rate. Now consider a one-step binomial tree for an asset, over a time T. Let the up move have probability p. The expected final value of the asset is

$$E[S_T] = pSU + (1 - p)SD.$$

We assume a world of risk-neutral investors. Then we have

$$E[S_T] = e^{rT}S \;\Longrightarrow\; pSU + (1 - p)SD = e^{rT}S \;\Longrightarrow\; p = \frac{e^{rT} - D}{U - D} = p^*.$$

So the risk-neutral probability $p^*$ can also be interpreted as the probability of an up move in a risk-neutral world. This calculation also gives another description of $p^*$: it is the probability that makes today’s spot price equal to the present value of the expected future price.

Our results for European options extend immediately to any derivative whose final payoff depends only on the final spot price of the underlying asset. We shall call such a derivative a European derivative.
The extension of the BOPM formula to such a derivative is automatic, since our calculations of European option premiums do not depend on the formula for the final payoffs. We have reached the
Principle of Risk-Neutral Valuation: The value of a European derivative is the present value of the expectation of the final payoff, where the expectation is calculated using risk-neutral probabilities.
As an application of this principle, we can re-derive the formula for the value of a futures contract. The payoffs at the end of an n-step binomial tree are $SU^kD^{n-k} - X$, and hence the initial value of the futures is given by
$$V = e^{-rT}\sum_{k=0}^{n}\binom{n}{k}(p^*)^k(1-p^*)^{n-k}\left(SU^kD^{n-k} - X\right) = e^{-rT}\left[S\left(p^*U + (1-p^*)D\right)^n - X\right] = e^{-rT}\left[Se^{rT} - X\right] = S - Xe^{-rT},$$
where we have used $p^*U + (1-p^*)D = e^{rT/n}$ for each step of length $T/n$.
Exercise 6.9.1 Consider a contract which will pay you the square of the price of the underlying asset at a future time T. What is the correct price for this contract?
Exercise 6.9.2 Consider a cash-or-nothing call option. If the final spot price $S_T$ is greater than or equal to the exercise price X, the holder receives 1 unit of the currency. If $S_T < X$, the holder receives nothing. Price this contract.
We shall now take a closer look at the risk-neutral probabilities when the underlying asset actually follows geometric Brownian motion. Consider an n-step binomial tree in each step of which the stock price can move up or down by factors U and D respectively. We denote the stock price at each step as follows: $S = S_0, S_1, S_2, \ldots, S_{n-1}, S_n = S_T$. The log returns are defined by $U_i = \ln(S_i/S_{i-1})$, $i = 1, \ldots, n$. The risk-neutral probability $p^*$ gives the probability of an up move in a risk-neutral world. Its value is
$$p^* = \frac{e^{rT/n} - D}{U - D}.$$
This choice of $p^*$ ensures that our option pricing model is free of arbitrage. Each $U_i$ is an independent two-valued (Bernoulli-type) random variable taking the values $u = \ln U$ and $d = \ln D$ with probabilities $p^*$ and $1 - p^*$ respectively. Therefore, $\mathrm{Var}[U_i] = p^*(1 - p^*)(u - d)^2$. Since $\ln(S_T/S) = U_1 + \cdots + U_n$, the variance of the overall log return under the risk-neutral probability is
$$\mathrm{Var}\left[\ln\frac{S_T}{S}\right] = \mathrm{Var}\left[\sum_{i=1}^{n}U_i\right] = np^*(1 - p^*)(u - d)^2.$$
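Before taking the limit analytically, it can be reassuring to watch it numerically. The small Python sketch below (our own illustration; the parameter values are arbitrary) evaluates $p^*$ and $np^*(1-p^*)(u-d)^2$ using the choice $u = \sigma\sqrt{T/n}$, $d = -u$ that the text brings in next.

```python
import math

def log_return_variance(sigma, r, T, n):
    """Risk-neutral p* and Var[ln(S_T/S)] = n p*(1 - p*)(u - d)^2 on an n-step
    tree with u = sigma*sqrt(T/n), d = -u, i.e. U = e^u and D = e^d = 1/U."""
    dt = T / n
    u = sigma * math.sqrt(dt)
    d = -u
    p = (math.exp(r * dt) - math.exp(d)) / (math.exp(u) - math.exp(d))
    return p, n * p * (1 - p) * (u - d) ** 2

sigma, r, T = 0.3, 0.05, 1.0
for n in (10, 100, 1000, 10000):
    p, var = log_return_variance(sigma, r, T, n)
    print(f"n = {n:5d}   p* = {p:.5f}   variance = {var:.6f}")
print("sigma^2 T =", sigma**2 * T)
```

Since the variance equals $4p^*(1-p^*)\sigma^2T$, it is always slightly below $\sigma^2T$ and closes the gap as $p^*$ approaches $1/2$.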
At this stage we bring in our earlier estimates of u and d (from matching with a GBM with drift μ and volatility σ): $u = \sigma\sqrt{T/n}$, $d = -\sigma\sqrt{T/n}$. We shall look at what happens to our model as we go to the continuous limit, i.e., as we let $n \to \infty$. First, expanding the exponentials in $p^* = (e^{rT/n} - e^{d})/(e^{u} - e^{d})$ in powers of $\sqrt{T/n}$ gives
$$p^* = \frac{1}{2} + \frac{r - \sigma^2/2}{2\sigma}\sqrt{\frac{T}{n}} + O\!\left(\frac{T}{n}\right) \longrightarrow \frac{1}{2}.$$
Therefore, since $(u - d)^2 = 4\sigma^2T/n$,
$$\mathrm{Var}\left[\ln\frac{S_T}{S}\right] = 4p^*(1 - p^*)\sigma^2T \longrightarrow \sigma^2T.$$
Thus the risk-neutral probability preserves the volatility of the underlying GBM! We have obtained the basic properties of the risk-neutral probability in BOPM:
1. The expected stock price grows at the risk-free rate (this prevents arbitrage).
2. The volatility is the same as for the underlying GBM (at least in the limiting sense).
We are now ready to take the step up to the Black–Scholes model.
7 The Black–Scholes Model
In this chapter we shall take up the Black–Scholes model for the pricing of European options whose underlying asset follows Geometric Brownian Motion. The most satisfying and mathematically rigorous way of doing this is to follow the same scheme as was adopted for the BOPM: create a combination involving the derivative and the asset that evolves in a risk-free manner with time. However, this approach requires a prior study of random processes that evolve continuously with time, mathematics beyond what is usually covered at the undergraduate level. We shall proceed in a different way: we assume that the continuous model must have the same fundamental properties as the discrete one. That is, we assume that the Principle of Risk-Neutral Valuation is valid in the continuous case, and use it to price various European derivatives.
The great advantage of the Black–Scholes formula is the ease with which it can be used to examine the exact relationship of the option's price with various factors. The numerical insights from BOPM can be sharpened into explicit formulas for rates of change. This makes it much easier to carry out procedures like dynamic hedging. It also enables investors to fine-tune the exposure of their portfolios to changes in specific factors such as stock prices, volatility and interest rates.
7.1 RISK-NEUTRAL VALUATION
Let us begin by recalling the Principle of Risk-Neutral Valuation as developed in the previous chapter: The value of a European derivative is the present value of the expectation of the final payoff, where the expectation is calculated using risk-neutral probabilities.
We obtained this principle from the n-step BOPM. The risk-neutral probability was defined as the one that makes the current stock price equal to the present value of the expected final stock price. We also discovered that when the underlying asset follows Geometric Brownian Motion, the risk-neutral probability preserves its volatility (in the limit $n \to \infty$).
It is time to take up the direct pricing of European derivatives on an asset following GBM. We proceed in two steps (which we shall again term the Principle of Risk-Neutral Valuation):
1. Replace the original GBM by a risk-neutral one with the same volatility.
2. Calculate the present value of the expected payoff from the risk-neutral GBM.
The first step is easily done. Suppose the asset price follows GBM with drift μ and volatility σ:
$$S_T = Se^{(\mu - \sigma^2/2)T + \sigma\sqrt{T}Z},\qquad Z \sim N(0,1).$$
Its risk-neutral version is
$$S_T^* = Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}Z},\qquad Z \sim N(0,1).$$
Exercise 7.1.1 Verify that $S = e^{-rT}E[S_T^*]$ and $\mathrm{Var}[\ln(S_T^*/S)] = \sigma^2T$.
The hard work is in implementing the second step. Suppose the payoff from the European derivative is given by a function $f(S_T)$ of the final stock price. Then we formulate the value of the derivative as $V = e^{-rT}E[f(S_T^*)]$. Before we tackle European options, let us warm up by applying risk-neutral valuation to simpler situations.
Example 7.1.2 Consider a futures on an asset following GBM with parameters μ, σ. Let its exercise price be X and the time remaining to expiry be T. According to the Principle of Risk-Neutral Valuation, its value is $V = e^{-rT}E[S_T^* - X]$. Therefore,
$$V = e^{-rT}\left(E[S_T^*] - X\right) = e^{-rT}\left(Se^{rT} - X\right) = S - Xe^{-rT}. \;□$$
Example 7.1.3 Consider a contract which will pay you the square of the final spot price of the underlying asset. What should you pay to buy this contract?
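We assume the underlying asset follows GBM with volatility σ and calculate the value as follows:
$$V = e^{-rT}E\left[(S_T^*)^2\right] = e^{-rT}S^2e^{(2r - \sigma^2)T}E\left[e^{2\sigma\sqrt{T}Z}\right] = e^{-rT}S^2e^{(2r - \sigma^2)T}e^{2\sigma^2T} = S^2e^{(r + \sigma^2)T},$$
using $E[e^{aZ}] = e^{a^2/2}$ for $Z \sim N(0,1)$. □
7.2 THE BLACK–SCHOLES FORMULA
Theorem 7.2.1 (Black–Scholes Formula for European Call Options) Consider a European call with expiry date T and exercise price X on an asset following GBM¹³ with volatility σ. Then the call premium at t = 0 is given by
$$C = S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}), \qquad (7.1)$$
where
$$w = \frac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}. \qquad (7.2)$$
Proof. By the Principle of Risk-Neutral Valuation, the premium at t = 0 is given by
$$C = e^{-rT}E\left[(S_T^* - X)^+\right], \qquad (7.3)$$
where $x^+ = \max\{x, 0\}$ and $S_T^* = Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}Z}$, $Z \sim N(0,1)$. Now we calculate the right-hand side of equation (7.3) to obtain C. We need to identify where the function $Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}x}$ is greater than X. Since this function is monotonically increasing, we just need to identify the value $x = a$ where $Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}a} = X$. The solution is
$$a = \frac{\ln(X/S) - (r - \sigma^2/2)T}{\sigma\sqrt{T}}.$$
Therefore,
$$C = \underbrace{\frac{e^{-rT}}{\sqrt{2\pi}}\int_a^\infty Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}x}e^{-x^2/2}\,dx}_{I} \;-\; \underbrace{\frac{Xe^{-rT}}{\sqrt{2\pi}}\int_a^\infty e^{-x^2/2}\,dx}_{II}. \qquad (7.4)$$
Let us evaluate the two terms in equation (7.4) separately. First, completing the square in the exponent,
$$I = \frac{S}{\sqrt{2\pi}}\int_a^\infty e^{-(x - \sigma\sqrt{T})^2/2}\,dx = S\,\mathbb{P}[Z \ge a - \sigma\sqrt{T}],$$
where Z is a standard normal variable. Let Φ be the cumulative distribution function of Z: $\Phi(z) = \mathbb{P}[Z \le z]$. Then, by the symmetry of Z, we have $\mathbb{P}[Z \ge z] = \mathbb{P}[Z \le -z] = \Phi(-z)$. Hence $I = S\Phi(w)$, where
$$w = -a + \sigma\sqrt{T} = \frac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}.$$
Now we calculate the second term: $II = Xe^{-rT}\,\mathbb{P}[Z \ge a] = Xe^{-rT}\Phi(-a) = Xe^{-rT}\Phi(w - \sigma\sqrt{T})$. Substituting the values of I and II back into (7.4) gives the Black–Scholes formula. □
We can use the Black–Scholes formula to see how C depends on various parameters (we had done this earlier using intuition and BOPM). It is convenient to write $S_T^* = Se^W$, where W is normal with mean $(r - \sigma^2/2)T$ and standard deviation $\sigma\sqrt{T}$. We also use ↑ to indicate an increase and ↓ to indicate a decrease.
1. If S ↑, then clearly $E[(Se^W - X)^+]$ ↑ and so C ↑.
2. If X ↑, then clearly $E[(Se^W - X)^+]$ ↓ and so C ↓.
3. To see what happens when r changes, we rearrange the expression for C as follows:
$$C = E\left[\left(Se^{W - rT} - Xe^{-rT}\right)^+\right] = E\left[\left(Se^{-\sigma^2T/2 + \sigma\sqrt{T}Z} - Xe^{-rT}\right)^+\right].$$
Now, if r ↑, then $e^{-rT}$ ↓, and so C ↑.
4. The nature of the dependence on σ and T cannot be seen so directly from the formula. We will work that out a little later.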
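A Python sketch of formula (7.1), of the put formula (7.5) of Exercise 7.2.2, and, as a cross-check, of a direct numerical evaluation of the risk-neutral expectation $e^{-rT}E[f(S_T^*)]$ by the midpoint rule. The quadrature settings `m` and `zmax` are arbitrary choices of ours, not from the text; the same routine also checks the answer $S^2e^{(r+\sigma^2)T}$ of Example 7.1.3.

```python
import math

def norm_cdf(x):
    """Standard normal CDF Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    """Black-Scholes call premium, formula (7.1) with w from (7.2)."""
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return S * norm_cdf(w) - X * math.exp(-r * T) * norm_cdf(w - sigma * math.sqrt(T))

def bs_put(S, X, r, sigma, T):
    """European put premium, formula (7.5)."""
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return -S * norm_cdf(-w) + X * math.exp(-r * T) * norm_cdf(-w + sigma * math.sqrt(T))

def risk_neutral_value(payoff, S, r, sigma, T, m=20000, zmax=10.0):
    """e^{-rT} E[payoff(S_T*)], S_T* = S exp((r - sigma^2/2)T + sigma sqrt(T) Z),
    computed with the midpoint rule on [-zmax, zmax] against the N(0,1) density."""
    h = 2.0 * zmax / m
    total = 0.0
    for i in range(m):
        z = -zmax + (i + 0.5) * h
        ST = S * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
        total += payoff(ST) * math.exp(-0.5 * z * z)
    return math.exp(-r * T) * total * h / math.sqrt(2.0 * math.pi)

S, X, r, sigma, T = 10.0, 10.0, 0.05, 0.3, 90 / 365
print(bs_call(S, X, r, sigma, T))                                    # formula (7.1)
print(risk_neutral_value(lambda s: max(s - X, 0.0), S, r, sigma, T))  # expectation (7.3)
print(risk_neutral_value(lambda s: s * s, S, r, sigma, T))            # Example 7.1.3
```

The closed form is of course what one uses in practice; the quadrature is only there to show that (7.1) and (7.3) agree.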
Figure 7.1: These diagrams illustrate the call premium predictions of the Black–Scholes formula. In both we have fixed T = 0.25. In the first diagram S = 100, σ = 0.3, and C is plotted against the exercise price X for three values of r (0, 0.5 and 1, from left to right). In the second, X = 100, r = 0.1, and C is plotted against S for different values of σ (0.4, 0.2 and 0.001, from left to right).
Exercise 7.2.2 Use Put–Call Parity to derive the Black–Scholes formula for a European put:
$$P = -S\Phi(-w) + Xe^{-rT}\Phi(-w + \sigma\sqrt{T}). \qquad (7.5)$$
Recall that the value of an American call (on an asset paying no dividends) is the same as that of the corresponding European call, so it too can be obtained from the Black–Scholes model. However, the value of an American put cannot be obtained from this model.
Exercise 7.2.3 Consider the cash-or-nothing call option defined in Exercise 6.9.2. Price this contract assuming that the underlying asset follows a GBM with volatility σ.
Exercise 7.2.4 A stock-or-nothing call option delivers the stock if $S_T \ge X$ and nothing if $S_T < X$. Price this contract, assuming the underlying stock follows a GBM with volatility σ.
7.3 OPTIONS ON FUTURES
Options are also written with a futures as the underlying asset. The contract specifies the underlying asset and the expiry date of the futures. A call option, on exercise, makes its holder the holder of the futures. A put option makes him the writer of the futures. Let us create some notation for such an option:
$T_O$: expiry date of the option;
$T_F$: expiry date of the underlying futures ($T_F \ge T_O$);
X: exercise price of the option, i.e., the futures price of the futures to be delivered;
$X_t$: futures price, at time t, of a futures expiring at $T_F$.
An option on a futures could be European or American. In either case, if it is exercised at a time t, then the writer has to deliver a futures contract expiring at $T_F$ with futures price X. Since the available contracts will be marked-to-market, the writer will actually deliver a futures with futures price $X_t$ together with a compensatory cash amount of $e^{-r(T_F - t)}(X_t - X)$. Two points must be noted about this delivery:
1. The marked-to-market futures has zero value at delivery.
2.
The compensatory cash is discounted to the present since the futures price is paid only at the future time $T_F$.
The Black–Scholes approach works quite neatly for European options on futures. Let us carry out the calculations for a European call option on a futures and obtain its premium $C_F$. We assume the asset underlying the futures follows a GBM with volatility σ. Since the futures price at time $T_O$ is $X_{T_O} = S_{T_O}e^{r(T_F - T_O)}$, the final payoff at time $T_O$ has the form
$$\max\left\{e^{-r(T_F - T_O)}\left(X_{T_O} - X\right), 0\right\} = \max\left\{S_{T_O} - Xe^{-r(T_F - T_O)}, 0\right\}.$$
This is the same payoff as that of a European call on the asset underlying the futures, with expiry $T_O$ and exercise price $Xe^{-r(T_F - T_O)}$. Therefore, the value $C_F$ is given by the Black–Scholes formula for such a call:
$$C_F = S\Phi(w) - Xe^{-rT_F}\Phi(w - \sigma\sqrt{T_O}),$$
where
$$w = \frac{\ln\left(S/(Xe^{-r(T_F - T_O)})\right) + (r + \sigma^2/2)T_O}{\sigma\sqrt{T_O}} = \frac{\ln(S/X) + rT_F + \sigma^2T_O/2}{\sigma\sqrt{T_O}}.$$
American options on futures can be handled by BOPM.
7.4 OPTIONS ON ASSETS WITH DIVIDENDS
In this section we consider assets which follow GBM and also have a continuous dividend yield. In §6.7, we carried out the BOPM analysis for European options on such assets. Now we shall derive the continuous version. Let us start by reviewing our earlier work. We note that:
1. The calculations can be carried out for any European derivative.
2. The derivative premium is the present value of the expectation of the future payoff, where the expectation is calculated using the risk-neutral probability $p^*$.
3. Under the risk-neutral probability, the current asset price is the present value of the expectation of the future value of the asset with its dividends reinvested.
In other words, under BOPM, the Principle of Risk-Neutral Valuation extends to European derivatives on assets with continuous dividends.
Now we take up an asset whose price evolves from S to $S_T$ over a time interval T following a GBM with volatility σ. We approximate this by an n-step binomial tree with $U = e^{\sigma\sqrt{T/n}}$ and $D = 1/U$. Suppose also that the asset has a continuous dividend yield q.
Exercise 7.4.1 Let $p^* = (e^{(r-q)T/n} - D)/(U - D)$ be the risk-neutral probability for this asset. Show that
1. $\lim_{n\to\infty} p^* = 1/2$;
2. $\lim_{n\to\infty}\mathrm{Var}[\ln(S_T/S)] = \sigma^2T$, where the variance is computed using $p^*$.
We see that the risk-neutral probability preserves volatility. Thus, in the continuous case, the risk-neutral version of the original GBM should have the same volatility.
Its drift, on the other hand, is r − q: this choice ensures that $S = e^{-(r-q)T}E[S_T^*]$, i.e., that under the risk-neutral GBM the asset together with its reinvested dividends is expected to grow at the risk-free rate. Therefore, in applying the Principle of Risk-Neutral Valuation, we replace the original GBM
$$S_T = Se^{(\mu - \sigma^2/2)T + \sigma\sqrt{T}Z}$$
with the risk-neutral one
$$S_T^* = Se^{(r - q - \sigma^2/2)T + \sigma\sqrt{T}Z},$$
where Z has the standard normal distribution. The premium of a European derivative on this asset is now given by $V = e^{-rT}E[f(S_T^*)]$, where f is the payoff function of the derivative. In particular, the premiums of European options can be easily obtained by modifying our earlier calculations.
Exercise 7.4.2 Show that the premiums of European calls and puts on an asset with dividend yield q and volatility σ are given by
$$C = Se^{-qT}\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}),\qquad P = -Se^{-qT}\Phi(-w) + Xe^{-rT}\Phi(-w + \sigma\sqrt{T}),$$
where $w = \dfrac{\ln(S/X) + (r - q + \sigma^2/2)T}{\sigma\sqrt{T}}$.
7.5 BLACK–SCHOLES AND BOPM
An alternative derivation of the Black–Scholes formula is to start with a BOPM having time-step Δt, and then let Δt → 0 to get a continuous model. In taking this limit, the binomial distribution in the BOPM converts to the normal distribution in the Black–Scholes model. In particular, by taking a small enough Δt, we get a discrete approximation to the Black–Scholes premium. This is illustrated in Figure 7.2, which shows that even a 10-step BOPM gives a good approximation to the continuous limit.
The Black–Scholes formula has a simple form which lends itself to quick calculation as well as to analytic treatment leading to general conclusions. BOPM involves considerably more computation (if large trees are used). On the other hand, it is a much more flexible tool than Black–Scholes. For instance, BOPM can be used to price American puts whereas Black–Scholes cannot. BOPM can also easily handle the so-called exotic options, which have more complicated rules. In general, we could say that the scope of BOPM is wider but the analysis from Black–Scholes is deeper.
Figure 7.2: Comparison of Black–Scholes and BOPM. The call premium C has been plotted against the elapsed time t.
The continuous path tracks the prices predicted by Black–Scholes, while each dot has been calculated using a 10-step BOPM. (In this instance, we have taken T = 1, S = X = 100, r = 10% and σ = 0.3.)
Example 7.5.1 We consider the case of a barrier call. Such an option becomes worthless if the spot price ever crosses a certain barrier. Since the barrier can only remove payoffs that a standard call would receive, a barrier call is worth less than the corresponding standard call. We show below how BOPM can be used to find the right price for a barrier call.
Consider a European call with X = 100 and T = 3. Suppose the underlying stock has initial spot price S = 100 and can move up by a factor U = 1.1 or down by a factor D = 0.9 during a unit time period. Let the risk-free rate be r = 0.05 per unit time period. In addition, suppose there is a barrier at 95: the call will expire worthless if the spot price goes below 95. We will use a 3-step BOPM to find the call premium C. First, from the given data we calculate
$$R = 1 + r = 1.05,\qquad q = \frac{R - D}{U - D} = \frac{1.05 - 0.9}{1.1 - 0.9} = 0.75,\qquad 1 - q = 0.25.$$
The binomial tree for S is given in Figure 7.3. The dashed line represents the barrier at 95.
Figure 7.3
This call cannot be priced by setting up a tree for the call payoffs in the usual manner, since the value at any node is not completely determined by the subsequent payoffs: it also depends on the path taken earlier. If that path ever fell below the barrier, the value is zero. Hence, when calculating the risk-neutral probability for a final payoff, we have to calculate the probability of reaching it by a path that does not cross the barrier. In this example, the paths that have positive payoff are UUU, UUD and UDU. (The path DUU also ends at 108.9, but it is knocked out at the first step, where the price falls to 90.) We tabulate the probabilities and payoffs for these paths below:
Path | Probability | Final price | Payoff
UUU | 0.75³ = 0.421875 | 133.1 | 33.1
UUD | 0.75² × 0.25 = 0.140625 | 108.9 | 8.9
UDU | 0.75² × 0.25 = 0.140625 | 108.9 | 8.9
Therefore the expected payoff is
$$EP = 0.421875 \times 33.1 + 2 \times 0.140625 \times 8.9 \approx 16.467.$$
The call premium is
$$C = \text{present value of } EP = \frac{16.467}{1.05^3} \approx 14.22. \;□$$
As this example illustrates, exotic options may require us to track paths rather than nodes.
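A minimal Python sketch of the path-by-path calculation in Example 7.5.1, assuming a down-and-out barrier that is checked at every node of the tree, as in the example. It computes the risk-neutral probability directly as $q = (R - D)/(U - D)$ from the stated R, U and D. Enumerating all $2^n$ paths is only practical for small n, which is exactly the cost issue raised next; the function name is illustrative.

```python
import itertools

def barrier_call(S, X, U, D, r, n, barrier):
    """Down-and-out European call on an n-step tree, priced by enumerating
    all 2**n paths. A path pays nothing if the price ever falls below `barrier`.
    Returns the premium and the paths that end with a positive payoff."""
    R = 1.0 + r
    q = (R - D) / (U - D)                 # risk-neutral up probability
    expected, paying = 0.0, []
    for path in itertools.product("UD", repeat=n):
        price, prob, alive = S, 1.0, True
        for move in path:
            price *= U if move == "U" else D
            prob *= q if move == "U" else 1.0 - q
            if price < barrier:           # knocked out: this path is worthless
                alive = False
        payoff = max(price - X, 0.0) if alive else 0.0
        if payoff > 0:
            paying.append("".join(path))
        expected += prob * payoff
    return expected / R**n, paying

C_barrier, paths = barrier_call(S=100, X=100, U=1.1, D=0.9, r=0.05, n=3, barrier=95)
C_plain, _ = barrier_call(S=100, X=100, U=1.1, D=0.9, r=0.05, n=3, barrier=0)
print(paths, round(C_barrier, 2), round(C_plain, 2))
```

Setting `barrier=0` removes the barrier and gives the corresponding standard European call, which should come out larger.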
Now, in an n-step BOPM, the number of possible paths is $2^n$, as there are 2 choices of movement at each stage. The number of nodes is only $1 + 2 + \cdots + (n + 1) = (n + 1)(n + 2)/2 \approx n^2/2$. For example, if n = 100, the number of paths is about $1.3 \times 10^{30}$ while the number of nodes is 5151. The tremendous amount of computation involved in tracking paths may prevent us from getting accurate values out of BOPM.
Figure 7.4: The first chart shows implied volatility of the Nifty stock index, calculated from call option prices on 31 August 2005 and plotted against time to expiry (in months). The two curves represent options with exercise price 2350 and 2400 respectively. The second chart plots implied volatility of Reliance shares, measured from call options expiring on 29 September 2005, against the exercise price of the corresponding call.
A Black–Scholes kind of approach can also be applied to such options. It requires us to assign probabilities to the paths of Brownian motion and to integrate over sets of paths to calculate expected values. Calculations of this kind are important in quantum physics, and the techniques developed there find an application in finance!
7.6 IMPLIED VOLATILITY
From the data about past stock prices, we can estimate the historical volatility σ of the spot price. We can substitute this in the Black–Scholes formula to get the corresponding value of a European option. We can also reverse this process: starting with the actual price of the option in the market, we can solve the Black–Scholes formula for σ. The value we get is called the implied volatility, denoted by $\sigma_I$. The implied volatility can be seen as reflecting the market's opinion about the future volatility of the stock price. Indeed, studies suggest that implied volatility is a better forecast of future volatility than historical volatility. In terms of calculation, there are some difficulties with implied volatility.
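Because the Black–Scholes call premium increases with σ (this is the positive vega established in §7.8), the reverse process can be carried out by simple bisection. The Python sketch below is one such numerical technique, not the only one; the bracket `[lo, hi]`, the tolerance and the example inputs are arbitrary illustrative choices.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    """Black-Scholes call premium (7.1)."""
    sq = sigma * math.sqrt(T)
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / sq
    return S * norm_cdf(w) - X * math.exp(-r * T) * norm_cdf(w - sq)

def implied_vol(C_market, S, X, r, T, lo=1e-4, hi=5.0, tol=1e-10):
    """Find sigma with bs_call(S, X, r, sigma, T) = C_market by bisection.
    Bisection is valid because the call premium increases with sigma."""
    if not bs_call(S, X, r, lo, T) <= C_market <= bs_call(S, X, r, hi, T):
        raise ValueError("price is outside the range the model can produce")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, X, r, mid, T) < C_market:
            lo = mid          # model price too low: need more volatility
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip with illustrative numbers: price a call at sigma = 0.224, then recover sigma
price = bs_call(100.0, 95.0, 0.05, 0.224, 0.25)
print(implied_vol(price, 100.0, 95.0, 0.05, 0.25))
```

Newton's method using the vega converges faster, but bisection needs no derivative and cannot jump out of the bracket.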
One difficulty is that the Black–Scholes formula cannot be solved explicitly for $\sigma_I$, so some numerical technique has to be applied. A deeper problem is that different options on the same stock typically lead to different values of $\sigma_I$, so one has to take some kind of weighted average of these values. Such problems also exist with historical volatility, since the estimated value depends on the period from which the data is taken.
Example 7.6.1 The implied volatility of the Nifty stock index, as measured from call option prices on 31 August 2005, varied from 0.1 to 0.126. Its historical volatility was 0.18 over the previous year, 0.26 over the previous 5 years, and 0.31 over the previous 15 years. □
Figure 7.4 shows some examples of implied volatility patterns. The pattern we see in the second chart of that figure is common enough to have a name: the volatility smile. The Black–Scholes model does not predict this smile; its prediction is a flat line (with $\sigma_I = \sigma$). A good amount of current research is devoted to finding explanations for the smile. These fall into two categories. One focuses on the information available to traders. For instance, it might be that traders investing in options with extreme exercise prices are doing so because they expect higher volatility. The other kind of research investigates the assumptions underlying the Black–Scholes model, such as a constant risk-free rate or a fixed volatility. Models in which the risk-free rate and volatility vary randomly with time have been able to generate volatility smiles of the types observed in reality.
7.7 DYNAMIC HEDGING
DELTA
Consider a European call with exercise price X and expiry time T on a stock with initial spot price S, drift μ and volatility σ. Let the risk-free rate be r. Then, according to the Black–Scholes model, the call premium C is a function of S, X, T, σ and r. We are interested in how changes in any of these will affect C. To start, we consider the effect of changes in S.
To this end, we define the delta of the option by $\delta_C = \partial C/\partial S$. We can calculate the delta by starting from the Black–Scholes formula for C:
$$C = S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}).$$
Differentiating using the chain rule gives
$$\delta_C = \Phi(w) + \left[S\Phi'(w) - Xe^{-rT}\Phi'(w - \sigma\sqrt{T})\right]\frac{\partial w}{\partial S}.$$
The bracket vanishes: since $\Phi'(z) = e^{-z^2/2}/\sqrt{2\pi}$, we have $\Phi'(w - \sigma\sqrt{T}) = \Phi'(w)e^{w\sigma\sqrt{T} - \sigma^2T/2} = \Phi'(w)\frac{S}{X}e^{rT}$ by (7.2), so that $Xe^{-rT}\Phi'(w - \sigma\sqrt{T}) = S\Phi'(w)$. So our final result is:
$$\delta_C = \Phi(w).$$
Note that $\delta_C > 0$ and so C increases if S does. We shall now see how knowledge of delta is used in hedging.
DELTA HEDGING
Consider a portfolio which is long one share and h calls. Then its value is V = S + hC. To hedge against changes in the spot price, we set $\partial V/\partial S = 0$. This gives
$$0 = 1 + h\delta_C = 1 + h\Phi(w), \quad\text{or}\quad h = -\frac{1}{\Phi(w)}.$$
In other words, we have to short $1/\Phi(w)$ calls. Such a portfolio is called delta neutral, and this process is called delta hedging.
Example 7.7.1 The process of delta hedging again leads to dynamic hedging: a portfolio is periodically readjusted to keep it delta neutral. As an illustration, we take up the same simulated stock prices used in Example 6.8.2. Recall that the stock was assumed to follow GBM with drift μ = 0.2 and volatility σ = 0.3. The task was to hedge a portfolio of 1000 shares of this stock over 3 months using call options expiring in 90 days. The initial price S of the stock was 10, and this was also the exercise price X of the options. The risk-free rate was 5%. Finally, we had decided to adjust the hedged portfolio every 9 days. The calculations for the first time-step (with T = 90/365) are:
$w = \dfrac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}} = 0.157$,
Call premium, $C = S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}) = 0.653$,
Number of shorted calls, $N = 1000/\Phi(w) = 1778$,
Cash in portfolio, $D = N \times C = 1162$,
Hedged portfolio value = 10,000.
Figure 7.5: Delta hedging a stock portfolio with calls (Example 7.7.1)
After 9 days, the spot price is 10.02. Black–Scholes gives the new call premium to be 0.628. The value of the portfolio is now
$$10.02 \times 1000 - 1778 \times 0.628 + 1162\,e^{0.05 \times 9/365} = 10{,}067.$$
We repeat this process over the subsequent stages. Figure 7.5 shows the result. It is almost identical to the hedging carried out via BOPM.
You must have noticed, however, that it was easier to set up and faster to carry out. □
GAMMA
The most important parameter affecting C is the spot price S. For a closer look at their relationship, we involve the second derivative as well. Therefore, we define the gamma of the call by $\gamma_C = \partial^2C/\partial S^2$. It is explicitly given by the following formula:
$$\gamma_C = \frac{e^{-w^2/2}}{S\sigma\sqrt{2\pi T}}.$$
Exercise 7.7.2 Confirm the formula for the gamma of a call given above.
DELTA–GAMMA HEDGING
We have already encountered delta hedging, where the delta is set to zero by shorting an appropriate number of calls. We can further reduce risk due to changes in the spot price by also setting the gamma to zero. To do this, we create a portfolio consisting of the stock and two different calls on this stock (e.g., they could have different expiry dates or exercise prices). Suppose it has one share, $x_1$ units of the first call and $x_2$ units of the second call. We similarly use $T_i$, $X_i$, $C_i$ to represent the expiry time, exercise price and premium of the ith call, and $w_i$ for the corresponding value of w. Then the value V of the portfolio is given by
$$V = S + x_1C_1 + x_2C_2.$$
Setting $\delta_V$ and $\gamma_V$ to zero gives the following equations (the common factor $1/(S\sigma\sqrt{2\pi})$ has been cancelled from the second):
$$1 + x_1\Phi(w_1) + x_2\Phi(w_2) = 0,\qquad \frac{x_1e^{-w_1^2/2}}{\sqrt{T_1}} + \frac{x_2e^{-w_2^2/2}}{\sqrt{T_2}} = 0.$$
This is a 2 × 2 linear system and is easily solved for $x_1$ and $x_2$.
Figure 7.6: An instance of delta–gamma hedging (Example 7.7.3)
Example 7.7.3 Let us take up the situation of Example 7.7.1 again. We assume another call option on the stock is also available, with expiry in 90 days and exercise price 11. Figure 7.6 shows the result of carrying out delta–gamma hedging in this situation. Note that the hedged portfolio shows noticeably less fluctuation than was the case with delta hedging alone. □
7.8 THE GREEKS
In the last section we investigated how the price C of a European call varies with the spot price S, and used the results to form portfolios which reduced the risk arising from fluctuations in S. Similar calculations and applications can be carried out for the other parameters affecting C.
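Before turning to the other Greeks, here is a Python sketch of the delta and delta–gamma calculations of §7.7. It reproduces the first time-step of Example 7.7.1 and solves the 2 × 2 system of the delta–gamma hedge at the start of Example 7.7.3. The helper names are ours, and Cramer's rule is simply one convenient way to solve the system.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def call_price_delta_gamma(S, X, r, sigma, T):
    """Black-Scholes call premium, delta = Phi(w), gamma = Phi'(w) / (S sigma sqrt(T))."""
    sq = sigma * math.sqrt(T)
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / sq
    C = S * norm_cdf(w) - X * math.exp(-r * T) * norm_cdf(w - sq)
    return C, norm_cdf(w), norm_pdf(w) / (S * sq)

r, sigma, dt = 0.05, 0.3, 9 / 365

# Example 7.7.1, first step: 1000 shares hedged by writing 1000/delta calls
C0, delta0, _ = call_price_delta_gamma(10.0, 10.0, r, sigma, 90 / 365)
N = round(1000 / delta0)
cash = N * C0
# Nine days later (before rebalancing), with spot 10.02
C1, _, _ = call_price_delta_gamma(10.02, 10.0, r, sigma, 81 / 365)
value = 1000 * 10.02 - N * C1 + cash * math.exp(r * dt)
print(f"C0 = {C0:.3f}, N = {N}, cash = {cash:.0f}, value after 9 days = {value:.0f}")

def delta_gamma_hedge(S, calls, r, sigma):
    """Units x1, x2 of two calls making delta and gamma of S + x1*C1 + x2*C2 zero.
    `calls` is [(X1, T1), (X2, T2)]. Solves by Cramer's rule the system
        d1*x1 + d2*x2 = -1,   g1*x1 + g2*x2 = 0."""
    (_, d1, g1), (_, d2, g2) = [call_price_delta_gamma(S, X, r, sigma, T) for X, T in calls]
    det = d1 * g2 - d2 * g1
    return -g2 / det, g1 / det

# Example 7.7.3: second call with exercise price 11, both expiring in 90 days
x1, x2 = delta_gamma_hedge(10.0, [(10.0, 90 / 365), (11.0, 90 / 365)], r, sigma)
print(f"x1 = {x1:.3f}, x2 = {x2:.3f} (per share held)")
```

The text's portfolio value uses rounded intermediate values (0.628 and 1162), so the unrounded arithmetic here may differ slightly in the last digit.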
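THETA
First we consider the time factor. Let t be a time instant during the life of the call. Then the time to expiry is T − t, and the value of the call at t is given by
$$C(t) = S_t\Phi(w(t)) - Xe^{-r(T-t)}\Phi\left(w(t) - \sigma\sqrt{T - t}\right),\quad\text{where}\quad w(t) = \frac{\ln(S_t/X) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}.$$
We define the theta of the call by $\Theta_C = \left.\partial C/\partial t\right|_{t=0}$. To calculate $\Theta_C$, we first differentiate w(t), keeping in mind that, since this is a partial differentiation, we take $S_t$ to be a constant S:
$$\left.\frac{\partial w}{\partial t}\right|_{t=0} = \frac{\ln(S/X)}{2\sigma T^{3/2}} - \frac{r + \sigma^2/2}{2\sigma\sqrt{T}}.$$
Now we complete the calculation of $\Theta_C$:
$$\Theta_C = S\Phi'(w)\frac{\partial w}{\partial t} - rXe^{-rT}\Phi(w - \sigma\sqrt{T}) - Xe^{-rT}\Phi'(w - \sigma\sqrt{T})\left(\frac{\partial w}{\partial t} + \frac{\sigma}{2\sqrt{T}}\right).$$
This looks intimidating, but the identity $Xe^{-rT}\Phi'(w - \sigma\sqrt{T}) = S\Phi'(w)$ (which follows from (7.2)) makes the terms involving $\partial w/\partial t$ cancel, and we are rewarded with the following result:
$$\Theta_C = -\frac{S\sigma}{2\sqrt{2\pi T}}e^{-w^2/2} - rXe^{-rT}\Phi(w - \sigma\sqrt{T}). \qquad (7.8)$$
Note that $\Theta_C < 0$ and so the value of the call decreases with time. Alternatively, we interpret this as: the value of the call increases with the time to expiry T.
VEGA
Next, we consider the dependence on the volatility σ. We define the vega of the call by $\mathcal{V}_C = \partial C/\partial\sigma$. The calculation of the vega is quite simple and gives
$$\mathcal{V}_C = \frac{S\sqrt{T}}{\sqrt{2\pi}}e^{-w^2/2}.$$
We see that $\mathcal{V}_C > 0$ and so C increases with σ.
RHO
The parameter rho measures the dependence on the risk-free rate r, and is defined by $\rho_C = \partial C/\partial r$. Its value is given by
$$\rho_C = TXe^{-rT}\Phi(w - \sigma\sqrt{T}). \qquad (7.10)$$
We have $\rho_C > 0$ and so C increases with r.
The various parameters we have defined are collectively known as "the Greeks". (Vega is the only one which is not the name of a letter of the Greek alphabet.) More generally, consider any portfolio based on one stock and associated options. If we denote its value by V, then the Greeks for this portfolio are defined by
Delta: $\delta_V = \partial V/\partial S$, Gamma: $\gamma_V = \partial^2V/\partial S^2$, Vega: $\mathcal{V}_V = \partial V/\partial\sigma$, Theta: $\Theta_V = \partial V/\partial t$, Rho: $\rho_V = \partial V/\partial r$.
All the derivatives are evaluated at t = 0.
Exercise 7.8.1 Show that the Greeks for a European put are given by
$$\delta_P = -\Phi(-w),\qquad \gamma_P = \frac{e^{-w^2/2}}{S\sigma\sqrt{2\pi T}},\qquad \mathcal{V}_P = \frac{S\sqrt{T}}{\sqrt{2\pi}}e^{-w^2/2},$$
$$\Theta_P = -\frac{S\sigma}{2\sqrt{2\pi T}}e^{-w^2/2} + rXe^{-rT}\Phi(\sigma\sqrt{T} - w),\qquad \rho_P = -TXe^{-rT}\Phi(\sigma\sqrt{T} - w).$$
Suppose each parameter x changes by an amount Δx over a time period Δt. Then the change in the value of the portfolio is approximated by
$$\Delta V \approx \delta_V\,\Delta S + \tfrac{1}{2}\gamma_V(\Delta S)^2 + \mathcal{V}_V\,\Delta\sigma + \Theta_V\,\Delta t + \rho_V\,\Delta r.$$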
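As a numerical check of the closed forms for theta (7.8), vega and rho (7.10), the Python sketch below compares them with central finite differences of the Black–Scholes premium. The step size `h` and the parameter values are arbitrary choices for illustration.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(S, X, r, sigma, T):
    sq = sigma * math.sqrt(T)
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / sq
    return S * norm_cdf(w) - X * math.exp(-r * T) * norm_cdf(w - sq)

def call_theta_vega_rho(S, X, r, sigma, T):
    """Closed forms: theta (7.8), vega, and rho (7.10) of a European call."""
    sq = sigma * math.sqrt(T)
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / sq
    disc = X * math.exp(-r * T)
    theta = -S * sigma * norm_pdf(w) / (2 * math.sqrt(T)) - r * disc * norm_cdf(w - sq)
    vega = S * math.sqrt(T) * norm_pdf(w)
    rho = T * disc * norm_cdf(w - sq)
    return theta, vega, rho

def finite_difference_greeks(S, X, r, sigma, T, h=1e-5):
    """Central differences. Time enters only through T - t, so d/dt = -d/dT."""
    theta = -(bs_call(S, X, r, sigma, T + h) - bs_call(S, X, r, sigma, T - h)) / (2 * h)
    vega = (bs_call(S, X, r, sigma + h, T) - bs_call(S, X, r, sigma - h, T)) / (2 * h)
    rho = (bs_call(S, X, r + h, sigma, T) - bs_call(S, X, r - h, sigma, T)) / (2 * h)
    return theta, vega, rho

args = (100.0, 100.0, 0.1, 0.3, 1.0)
print(call_theta_vega_rho(*args))
print(finite_difference_greeks(*args))
```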
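V can be hedged against changes in one or more parameters by setting the corresponding Greeks to zero.
7.9 THE BLACK–SCHOLES PDE
In this section we shall obtain the partial differential equation (or PDE) that was at the centre of the original work of Black, Scholes and Merton. Consider a European derivative whose final payoff at the expiry time T is given by a function f of the final spot price $S_T$. Suppose the underlying asset price follows a GBM with volatility σ. Then the value of the derivative as a function of spot price S and time t is given by the Principle of Risk-Neutral Valuation:
$$V(S, t) = \frac{e^{-r(T-t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f\left(Se^{(r - \sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,x}\right)e^{-x^2/2}\,dx.$$
We work at t = 0 and write $g(x) = Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}x}$, so that $\partial g/\partial S = g/S$. We first differentiate with respect to S to calculate the delta and gamma of the derivative (see §A.4):
$$\delta_V = \frac{e^{-rT}}{S\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(g(x))\,g(x)\,e^{-x^2/2}\,dx,\qquad \gamma_V = \frac{e^{-rT}}{S^2\sqrt{2\pi}}\int_{-\infty}^{\infty} f''(g(x))\,g(x)^2\,e^{-x^2/2}\,dx.$$
Next, we calculate $\Theta_V$, the derivative with respect to t (recall that t enters only through T − t):
$$\Theta_V = rV - \frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(g(x))\,g(x)\left(r - \frac{\sigma^2}{2} + \frac{\sigma x}{2\sqrt{T}}\right)e^{-x^2/2}\,dx = rV - \left(r - \frac{\sigma^2}{2}\right)S\delta_V - \frac{\sigma}{2\sqrt{T}}\,I,$$
where
$$I = \frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(g(x))\,g(x)\,x\,e^{-x^2/2}\,dx.$$
We integrate I by parts, using $xe^{-x^2/2} = -\frac{d}{dx}e^{-x^2/2}$ and $g'(x) = \sigma\sqrt{T}\,g(x)$:
$$I = \frac{e^{-rT}}{\sqrt{2\pi}}\left[-f'(g(x))\,g(x)\,e^{-x^2/2}\right]_{x=-\infty}^{x=\infty} + \sigma\sqrt{T}\,\frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left(f''(g)\,g^2 + f'(g)\,g\right)e^{-x^2/2}\,dx.$$
If f′ is "nice" and does not grow too fast as x → ±∞ (for example, if it is a polynomial), then the limits in the last expression are zero and we have
$$I = \sigma\sqrt{T}\left(S^2\gamma_V + S\delta_V\right).$$
Substituting this in the expression for $\Theta_V$ gives
$$\Theta_V = rV - rS\delta_V - \tfrac{1}{2}\sigma^2S^2\gamma_V.$$
We rearrange this as follows:
$$\Theta_V + rS\delta_V + \tfrac{1}{2}\sigma^2S^2\gamma_V - rV = 0.$$
These calculations are valid for any t (not only t = 0) and so we get
$$\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2S^2\frac{\partial^2V}{\partial S^2} - rV = 0.$$
This partial differential equation is known as the Black–Scholes PDE.
Exercise 7.9.1 Verify that the Black–Scholes formula for the premium of a European call or put satisfies the Black–Scholes PDE.
As remarked earlier, we are reversing history. The original route uses the study of continuously evolving random processes (or stochastic calculus) to set up the Black–Scholes PDE. The fact that it does not involve the drift leads to risk-neutral valuation and gives one way (our way) of finding derivative premiums. (This observation was first made by John Cox and Stephen Ross in 1976 [14].)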
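As a numerical sanity check related to Exercise 7.9.1 (not a proof), the Python sketch below evaluates the left-hand side of the Black–Scholes PDE for the call formula, with the partial derivatives replaced by central finite differences. The step sizes `hS` and `ht` and the parameter values are arbitrary choices; the residual should come out close to zero.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_at(S, t, X, r, sigma, T):
    """Black-Scholes call value V(S, t) at time t < T (time to expiry T - t)."""
    tau = T - t
    sq = sigma * math.sqrt(tau)
    w = (math.log(S / X) + (r + 0.5 * sigma**2) * tau) / sq
    return S * norm_cdf(w) - X * math.exp(-r * tau) * norm_cdf(w - sq)

def pde_residual(S, t, X, r, sigma, T, hS=1e-2, ht=1e-5):
    """V_t + r S V_S + (1/2) sigma^2 S^2 V_SS - r V, with the derivatives
    approximated by central finite differences."""
    V = lambda s, u: bs_call_at(s, u, X, r, sigma, T)
    V_t = (V(S, t + ht) - V(S, t - ht)) / (2 * ht)
    V_S = (V(S + hS, t) - V(S - hS, t)) / (2 * hS)
    V_SS = (V(S + hS, t) - 2 * V(S, t) + V(S - hS, t)) / hS**2
    return V_t + r * S * V_S + 0.5 * sigma**2 * S**2 * V_SS - r * V(S, t)

for S in (80.0, 100.0, 120.0):
    print(S, pde_residual(S, t=0.0, X=100.0, r=0.1, sigma=0.3, T=1.0))
```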
Alternatively, we can directly solve the Black–Scholes PDE together with the boundary condition coming from the payoff function of the derivative:
$$V(S, T) = f(S). \qquad (7.12)$$
In this regard, it is interesting to note that the Black–Scholes PDE is essentially just the heat equation
$$\frac{\partial D}{\partial s} = \frac{\sigma^2}{2}\frac{\partial^2D}{\partial x^2}.$$
We can convert the Black–Scholes PDE to the heat equation by the following substitutions:
$$s = T - t,\qquad x = \ln(S/X) + \left(r - \frac{\sigma^2}{2}\right)(T - t),\qquad D(x, s) = e^{r(T-t)}V(S, t).$$
These substitutions also convert the boundary condition (7.12) into the initial condition $D(x, 0) = f(Xe^x)$. The benefit is that the heat equation has been extensively studied and there are standard techniques for its solution, both theoretical and numerical.
Exercise 7.9.2 Verify that the change of variables given above converts the Black–Scholes PDE to the heat equation.
The connection of Brownian motion with the heat equation is yet another discovery that was first made by Bachelier. The essential difference between the work of Bachelier and that of Black and Scholes is that Bachelier used real-world probabilities instead of risk-neutral ones. There are two objections to taking expectations calculated from real-world probabilities as prices. The first, as we emphasised in earlier parts of this book, is that risk also contributes to price. The second is that it may be impossible to consistently price all assets and derivatives by the real-world expectation of their future payoffs! A simple example (from [43]) is given below.
Example 7.9.3 Suppose every asset is priced by the expectation of its future payoff. Let $S_t$ be the price of one US dollar at time t in Indian rupees. If interest rates are zero, then the forward price for a contract to sell one US dollar at time T is X rupees, where $X = S_0 = E[S_T]$. The forward price for a contract to sell one rupee at time T is similarly Y dollars, where $Y = E[1/S_T]$. By the No Arbitrage Principle, we must have Y = 1/X, and hence
$$E\left[\frac{1}{S_T}\right] = \frac{1}{E[S_T]}.$$
However, by Jensen's inequality (a basic result in probability), for any non-constant random variable taking only positive values, such as $S_T$, we have $E[1/S_T] > 1/E[S_T]$.
So this equality is impossible. □
When we use risk-neutral probabilities, we avoid this difficulty because we then use a different assignment of probabilities for each asset, rather than a single framework of probabilities for all assets simultaneously.
Example 7.9.4 We illustrate this point by using a 1-step BOPM in the context of Example 7.9.3. Let r = 0 and let the initial exchange rate be Rs 40 per US dollar. Consider a BOPM with U = 1.1 and D = 0.9. Then the tree for dollar prices (in rupees) is
40 → 44 (up) or 36 (down).
The risk-neutral probabilities for the up and down moves in this tree are
$$p^* = \frac{1 - D}{U - D} = \frac{1 - 0.9}{1.1 - 0.9} = 0.5,\qquad 1 - p^* = 0.5.$$
Now we consider the tree for rupee prices (in dollars):
1/40 = 0.025 → 1/36 ≈ 0.0278 (up, factor 10/9) or 1/44 ≈ 0.0227 (down, factor 10/11).
The risk-neutral probabilities for the up and down moves here are
$$q^* = \frac{1 - 10/11}{10/9 - 10/11} = 0.45,\qquad 1 - q^* = 0.55.$$
Thus the state in which the dollar falls has probability 0.5 in the first tree but 0.45 in the second: each tree has a distinct risk-neutral probability distribution, and this enables us to avoid the inconsistency encountered in Example 7.9.3. □
7.10 SPECULATING WITH OPTIONS
So far we have studied derivatives as tools for reducing risk. They have another application: speculation. If an investor has information on how the market, or a particular asset, will behave in the future, she can profit by using derivatives to lock in favourable prices now. This can be done in a methodical way, and we will now look at various strategies used by speculators.
Table 7.1: Data for call options on TCS stock traded on the National Stock Exchange on August 31, 2005. All these options had 29 September 2005 as the expiry date.
Example 7.10.1 Consider an investor whose analysis leads her to expect a 30% annualised return from TCS (Tata Consultancy Services) stock during the 1-month period starting on September 1, 2005. She has to decide how best to profit from this analysis. The first possibility is to buy TCS stock. The closing price for this on August 31, 2005, was Rs 1405. Over one month, her expected gain per share is $\frac{0.30}{12} \times 1405 = 35$. To estimate the associated risk, she decides to use the implied volatility.
The relevant data for this is given in Table 7.1. From the data on the various call options, and assuming a risk-free rate of 5%, she finds that the implied volatility ranges from 0.188 to 0.235. She weights each value by the corresponding number of contracts traded to arrive at the average implied volatility $\sigma_I = 0.224$. Therefore, a first-order approximation to the standard deviation of the return after 1 month is
$$S\sigma_I\sqrt{T} = 1405 \times 0.224 \times \sqrt{1/12} = 91.$$
Thus, to first order, the return is normally distributed with standard deviation 91. Its mean, according to our analyst, is 35. If we let Z denote a standard normal variable, then the probability of a positive return is
$$\mathbb{P}[S_T - S \ge 0] = \mathbb{P}\left[Z \ge -\frac{35}{91}\right] = 65\%.$$
The second tactic would be to profit by investing in calls which expire in 1 month. For instance, the closing price on August 31 of a call on TCS stock with exercise price X = 1350 and expiry on 29 September was C = 75. Since $S_T \approx 1440 + 91Z$, the expected payoff from buying this call is
$$E\left[(S_T - 1350)^+\right] = 90\,\Phi\!\left(\frac{90}{91}\right) + \frac{91}{\sqrt{2\pi}}e^{-(90/91)^2/2} \approx 98,$$
and hence the expected return is 98 − 75 = 23, which amounts to an annualised rate of $\frac{23}{75} \times 12 \approx 368\%$! With such an incredible expected return, the associated risk must have some unpleasant features. We first find the probability of a positive return:
$$\mathbb{P}[S_T - X \ge C] = \mathbb{P}[S_T \ge 1425] = \mathbb{P}\left[Z \ge -\frac{15}{91}\right] \approx 57\%.$$
This compares well with investing in the stock. The real danger is when $S_T$ sinks below X, for then the investment in the call is completely lost. The probability of this is:
$$\mathbb{P}[S_T < 1350] = \mathbb{P}\left[Z < -\frac{90}{91}\right] \approx 16\%.$$
It is this last factor, the possibility of being totally wiped out, which is the downside of speculating with options. □
This example illustrates both the temptation and the danger associated with speculating via options. To tread a safer path, investors use a variety of combinations of options which reduce the danger at the expense of the amount of possible profit.
BULL AND BEAR SPREADS
Consider an investor who is confident that a certain stock will increase significantly in value over a time period T (i.e., the investor is "bullish" about the stock).
She can speculate accordingly by buying a call on that stock, say with expiry time T and exercise price $X_1$, at a price of $C_1$. To reduce her investment and risk, she can also sell a call on the same stock with the same expiry time T and an exercise price $X_2 > X_1$. Since a higher exercise price makes a call less valuable, the premium $C_2$ earned on the shorted call satisfies $C_2 < C_1$. This combination of calls is called a bull spread. [Figure: profit of a bull spread as a function of the final spot price $S_T$.] Ignoring the time value of the premia, the profit from a bull spread is given by
$$\text{Profit} = \begin{cases} -(C_1 - C_2), & S_T \le X_1,\\ S_T - X_1 - (C_1 - C_2), & X_1 < S_T < X_2,\\ X_2 - X_1 - (C_1 - C_2), & S_T \ge X_2. \end{cases}$$
Compared to buying a single call, a bull spread gives up the lure of unbounded profit in favour of reducing the possible loss.

Example 7.10.2 Let us return to the situation of the previous example. Suppose our analyst constructs a bull spread in which the long call has $X_1 = 1320$ and $C_1 = 96$, and the short call has $X_2 = 1440$ and $C_2 = 23$. Both calls expire on September 29. Then the profit function is
$$\text{Profit} = \begin{cases} -73, & S_T \le 1320,\\ S_T - 1393, & 1320 < S_T < 1440,\\ 47, & S_T \ge 1440. \end{cases}$$
The maximum profit has been capped at 47, and the maximum possible loss is now 73, about 75% of the loss of 96 that the investor would face by just going long in the first call. □

Exercise 7.10.3 Why is a bull spread better than simply investing in fewer calls?

Exercise 7.10.4 Show how to construct a bull spread by using puts.

An investor who is confident that a certain stock will decrease significantly in value over a time period T (i.e., the investor is “bearish” about the stock) can speculate accordingly by buying a put on that stock. Suppose it has expiry time T, exercise price $X_1$ and a price of $P_1$. To reduce his investment and risk, he can also sell a put on the same stock with the same expiry time T and an exercise price $X_2 < X_1$. It follows that the premium $P_2$ earned on the shorted put satisfies $P_2 < P_1$. This combination of puts is called a bear spread. [Figure: profit of a bear spread as a function of the final spot price $S_T$.]

Exercise 7.10.5 Show how to construct a bear spread by using calls.
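The figures in Examples 7.10.1 and 7.10.2 can be checked numerically. The sketch below is an illustration, not the author's method: it assumes, as the examples do, that $S_T$ is normal with mean $1405 + 35$ and standard deviation $1405 \times 0.224 \times \sqrt{1/12}$, it uses the standard formula $\mathbb{E}[\max(S_T - X, 0)] = (m - X)\Phi(d) + s\varphi(d)$ with $d = (m - X)/s$ for a normal $S_T$ with mean m and standard deviation s, and it ignores discounting of the premia. The helper name `bull_spread_profit` is our own.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal: cdf, pdf, inv_cdf

# Example 7.10.1 inputs (one-month horizon).
S0, X, C = 1405.0, 1350.0, 75.0
mu = S0 * 0.30 / 12                  # expected one-month gain, about 35
sd = S0 * 0.224 * math.sqrt(1 / 12)  # first-order s.d. of the gain, about 91
ST = NormalDist(S0 + mu, sd)         # S_T treated as normal (first-order model)

# Buying the stock: probability that the return S_T - S0 is non-negative.
p_stock_gain = 1 - ST.cdf(S0)

# Buying the call: E[max(S_T - X, 0)] for a normal S_T.
d = (ST.mean - X) / ST.stdev
call_payoff = (ST.mean - X) * N01.cdf(d) + ST.stdev * N01.pdf(d)
p_call_gain = 1 - ST.cdf(X + C)      # payoff exceeds the premium paid
p_call_wipeout = ST.cdf(X)           # call expires worthless


def bull_spread_profit(s_T, x1, c1, x2, c2):
    """Profit at expiry of a long call (strike x1, premium c1) plus a short
    call (strike x2 > x1, premium c2); premia are not discounted."""
    return max(s_T - x1, 0) - max(s_T - x2, 0) - (c1 - c2)


if __name__ == "__main__":
    print(f"stock: mean gain {mu:.1f}, s.d. {sd:.1f}, P[gain] {p_stock_gain:.2f}")
    print(f"call: E[payoff] {call_payoff:.1f}, P[gain] {p_call_gain:.2f}, "
          f"P[wiped out] {p_call_wipeout:.2f}")
    # Example 7.10.2: X1 = 1320, C1 = 96, X2 = 1440, C2 = 23.
    for s in (1300, 1380, 1500):
        print(s, bull_spread_profit(s, 1320, 96, 1440, 23))
```

The bull-spread helper returns −73 for any $S_T$ below 1320 and 47 for any $S_T$ above 1440, which are the floor and cap of the profit function in Example 7.10.2.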
BUTTERFLIES

Butterflies are combinations used by investors who expect a certain amount of variation in the spot price but are not sure about the direction in which it will occur. Butterflies come in two flavours, depending on whether the expected variations are small or large. Consider an investor who expects the future spot price to lie in a certain small range. He can speculate by creating a combination of options with the following profit profile. [Figure: profit profile of a butterfly.] Thus, he profits if the spot price stays in the expected range, and has a limited loss if it does not. This kind of profile can be created via the following steps:
1. Buy two call options, with exercise prices $X_1$ and $X_3$ respectively, choosing $X_3 > X_1$.
2. Sell two call options with exercise price $X_2$, choosing $X_2$ halfway between $X_1$ and $X_3$.
If $C_1, C_2, C_3$ are the corresponding call premia, we have $C_3 < C_2 < C_1$. At expiry this combination pays $\max(S_T - X_1, 0) - 2\max(S_T - X_2, 0) + \max(S_T - X_3, 0)$, which is zero outside $(X_1, X_3)$ and largest at $S_T = X_2$, and it costs $C_1 + C_3 - 2C_2$ to set up.

Exercise 7.10.6 Show how to construct a butterfly by using puts.

A reverse butterfly is used when a large fluctuation is expected in the spot price, but its direction is not known. It can be obtained by shorting a butterfly. [Figure: profit profile of a reverse butterfly.]

STRADDLES

The strategies considered so far used only calls or only puts. Now we consider combinations of puts and calls. Straddles, like butterflies, are used to speculate on the expected volatility of the spot price. However, they increase both the possible profit and the possible loss. A bottom straddle or straddle purchase is used when the investor wants to bet on a large move in the spot price. It consists of buying a call and a put with the same expiry date and exercise price. [Figure: profit profile of a bottom straddle.] A top straddle or straddle write is used to bet on a small movement in the spot price. It is constructed by shorting a bottom straddle. [Figure: profit profile of a top straddle.] This is a rather dangerous strategy, as it has the potential for unlimited loss!

STRANGLES

Strangles are like straddles, except that the put and call have different exercise prices.
Therefore the profit profile has a flat portion in the centre. [Figure: profit profile of a strangle.] A strangle would be used by an investor expecting a large jump in the spot price.

COLLAR

A collar has a similar profile to a bull spread. It is used by investors who want to protect the gains made by a stock that they own. For example, if you own a stock that has done well recently and you wish to sell it after a month, you can use a collar as protection against a subsequent drop in its price. A collar consists of the following pieces:
1. The held stock, with initial price S.
2. A long put expiring at T, with exercise price $X_P$.
3. A short call expiring at T, with exercise price $X_C$, such that $X_C > S > X_P$.
Suppose the put has initial premium P and the call has initial premium C. Then there is an initial expense of P − C in setting up the collar (and it is possible that this is a gain rather than an expense). The final value of this portfolio has the following structure as a function of $S_T$:
$$S_T + \max(X_P - S_T, 0) - \max(S_T - X_C, 0) = \begin{cases} X_P, & S_T \le X_P,\\ S_T, & X_P < S_T < X_C,\\ X_C, & S_T \ge X_C. \end{cases}$$

Exercise 7.10.7 Explain how to create combinations of options with the following profit profiles. [Figures: profit profiles for Exercise 7.10.7.]

¹³ Recall that GBM is a good model for a stock that does not pay dividends.

8 Value at Risk

Over the previous chapters we have developed an extensive range of techniques for estimating and handling fluctuations. These techniques have mainly been limited to the small fluctuations that we may encounter on a typical day. However, a firm also needs an idea of the losses it may face under sudden and extreme developments. These may be broken up into two cases. The first is a sudden breakdown of normality, such as the defaulting on loans by a large country, a war leading to a breakdown in the supply of essential commodities, or a natural calamity. In the second case, the normal relationships and trends continue to hold, but chance leads to a clustering of adverse events for the firm.
In this chapter we shall study a tool known as Value at Risk, or VaR, which has become popular for estimating the possible losses from developments of the second type. (For the first type, there is little meaningful mathematical analysis.) These estimates are used to judge the quality of a portfolio and to modify it accordingly. The concept of VaR provides a single descriptor of the risk of loss associated with a portfolio. While the Greeks give local descriptions of the risk arising from various fluctuations, VaR attempts to give a global summary by measuring the possible loss under extreme circumstances. (We emphasise, again, that it does not cover the most extreme ones.) The popularisation of VaR began in 1994 with the publications of the RiskMetrics Group of J. P. Morgan (for example, the 1996 RiskMetrics™ – Technical Document [10]). It is now popular both for internal risk management by companies and as a tool used by regulatory authorities. Its global acceptance in this regard is encapsulated in the Basel II Accord of 2004, which provided a framework, based on VaR, to regulate how much capital a bank should set aside for use on a rainy day.

8.1 DEFINITION OF VAR

Typically, the VaR of a portfolio is given as a statement of the following form: “The n-day VaR at P% is V” or “The n-day P% VaR is V”. This means that the loss over the next n days will be less than V with probability P%. P is usually of the order of 95 to 99.¹⁴ Note that V is the magnitude of the possible loss and is given as a positive number. It is also assumed that the composition of the portfolio does not change during this time. The probability P% gives an idea of how often losses beyond the VaR limit can occur. For example, if we calculate the one-day 95% VaR, we can expect losses to exceed it about once in 20 days. At the 99% level we expect this to happen about once in 100 days. On the other hand, we learn little about how large the losses will be on these bad days. Anything could happen.
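To make the definition concrete, here is a minimal sketch under a loudly hypothetical setup: a sample of daily profits (losses negative) drawn from a normal distribution with mean 0 and standard deviation 10, purely for illustration. It reads off the 95% VaR as the magnitude of the 5% sample quantile, then checks on a fresh sample that losses exceed it on roughly 5% of days, i.e. about once in 20 days. The function name `empirical_var` is our own.

```python
import random


def empirical_var(pnl, level=0.95):
    """VaR at `level` from a sample of profits (losses are negative):
    the magnitude of the (1 - level) sample quantile, as a positive number."""
    ordered = sorted(pnl)
    k = int((1 - level) * len(ordered))
    return -ordered[k]


rng = random.Random(1)  # fixed seed so the run is reproducible
# Hypothetical daily P&L, normal with mean 0 and s.d. 10 (illustration only).
history = [rng.gauss(0, 10) for _ in range(10_000)]
v = empirical_var(history, 0.95)  # should be close to 1.645 * 10

# On new days, the loss should exceed v about 5% of the time (1 day in 20).
future = [rng.gauss(0, 10) for _ in range(10_000)]
exceed_rate = sum(x < -v for x in future) / len(future)

if __name__ == "__main__":
    print(f"95% VaR estimate: {v:.2f}, exceedance rate: {exceed_rate:.3f}")
```

Note that the code says nothing about how large the losses are on the exceedance days, which is exactly the limitation described above.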
Figure 8.1: An illustration of the 95% VaR for a portfolio whose return is normally distributed. The shaded region, of area 0.05, lies below the 5% quantile.

Example 8.1.1 Consider an asset whose return R over the next month is estimated to be normally distributed with mean 100 and standard deviation 150. Then $Z = (R - 100)/150$ is standard normal. Suppose we wish to find the one-month 95% VaR. Let us denote it by V. It is defined by
$$0.95 = \mathbb{P}[R \ge -V] = \mathbb{P}\left[Z \ge \frac{-V - 100}{150}\right].$$
Therefore $\frac{-V - 100}{150} = \Phi^{-1}(0.05) = -1.645$, which gives $V = 1.645 \times 150 - 100 \approx 147$. □

In the next example we use VaR to study the different risk characteristics of shares and calls.

Example 8.1.2 We take up the scenario of TCS stock and calls of Example 7.10.1. In that example, the initial price of a TCS share was Rs 1405 and its 1-month return R was projected to be a normal variable with mean 35 and standard deviation 91. The 1-month VaR for this investment can be calculated, as in the previous example, at various levels. Now let us consider the alternative strategy of buying a call. In the example, a 1-month call on TCS stock was available with exercise price X = 1350 at a premium C = 75. The probability of a complete loss of the investment was calculated to be $\mathbb{P}[S_T \le X] \approx 16\%$, so the VaR equals the full premium of 75 at every level of 84% and above. Thus VaR shows a danger of full loss of the investment at all the usual levels of 90% and above. □

Estimating the VaR was easy in this example because we were looking at the call value at expiry, which has a simple relation with $S_T$. It would be rather more complicated to estimate VaR over a shorter period such as 10 days, as we would then have to track the change in the call premium. To do this theoretically, we could use a model such as Black–Scholes, but we would still face the obstacle of having to find the probability distribution of the call premium. This is a non-trivial problem. First of all, the return from the call will not be normal, due to the non-linear relation between the call premium and the spot price.
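The normal-model calculations of Examples 8.1.1 and 8.1.2 reduce to a single quantile. The sketch below (function names are our own) computes $V = -(\mu + \sigma\,\Phi^{-1}(1 - P))$ for a normal return, tabulates the stock's VaR at a few levels, and computes the call's VaR at expiry under the same first-order assumption that $S_T$ is normal with mean 1440 and standard deviation 91. Because the call's profit $\max(S_T - X, 0) - C$ is non-decreasing in $S_T$, its quantile is the profit evaluated at the corresponding quantile of $S_T$.

```python
from statistics import NormalDist


def normal_var(mean, sd, level):
    """VaR (positive) for a normal return R: solves P[R >= -V] = level."""
    return -NormalDist(mean, sd).inv_cdf(1 - level)


# Example 8.1.1: R ~ N(100, 150^2), one-month 95% VaR.
v_811 = normal_var(100, 150, 0.95)

# Example 8.1.2, stock: 1-month return with mean 35 and s.d. 91.
stock_var = {lvl: normal_var(35, 91, lvl) for lvl in (0.90, 0.95, 0.99)}

# Example 8.1.2, call: S_T ~ N(1440, 91^2), strike 1350, premium 75.
ST = NormalDist(1440, 91)
X, C = 1350, 75
p_wipeout = ST.cdf(X)  # probability that the whole premium is lost


def call_var(level):
    """VaR of the long call at expiry. The profit max(S_T - X, 0) - C is
    non-decreasing in S_T, so its quantile comes from the S_T quantile."""
    s_q = ST.inv_cdf(1 - level)
    return C - max(s_q - X, 0)


if __name__ == "__main__":
    print(f"Example 8.1.1: V = {v_811:.1f}")
    for lvl, v in stock_var.items():
        print(f"stock {lvl:.0%} VaR: {v:.1f}   call {lvl:.0%} VaR: {call_var(lvl):.1f}")
    print(f"P[call wiped out] = {p_wipeout:.3f}")
```

At each of the usual levels the call's VaR is the full premium of 75, while the stock's VaR grows gradually with the level, which is the contrast drawn in Example 8.1.2.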
Another difficulty lies in dealing with large portfolios. The VaR of a portfolio cannot be obtained from the VaRs of its parts but has to be calculated in one go for the full portfolio. This entails modelling the probability distribution of the full portfolio. Below, we give three approaches to this problem. In the first, we use a linear approximation to the portfolio value so that it becomes normally distributed. While this simplifies the mathematics, it introduces potentially large errors. In the second, we use a quadratic approximation, thus taking the non-linearity into account to some extent. This already makes it hard to obtain an explicit probability distribution. Finally, we sidestep this problem by using Monte Carlo simulation.

Exercise 8.1.3 Consider a portfolio consisting of Rs 100,000 investments in each of assets A and B. Assume the daily volatilities of A and B are 1% and the coefficient of correlation between their returns is 0.3. Suppose further that the two daily returns follow a bivariate normal distribution. What is the 2-day 99% VaR for this portfolio?

8.2 LINEAR MODEL

Consider a portfolio P consisting of $n_i$ units of the ith asset. If a unit of the ith asset has value $V_i$, then the value of the portfolio is
$$V_P = \sum_i n_i V_i.$$
Suppose $V_i$ depends on the value $S_i$ of some underlying variable (for example, the ith asset is an option and $S_i$ is the spot price of the underlying stock, or the ith asset is a bond and $S_i$ is its yield). The basic question is: if each $S_i$ changes by an amount $\Delta S_i$, what is the corresponding change $\Delta V_i$ in $V_i$? If we can answer this question, we can also find the change in the value of the entire portfolio:
$$\Delta V_P = \sum_i n_i\,\Delta V_i.$$
Now, to first order, we have $\Delta V_i \doteq \frac{\partial V_i}{\partial S_i}\Delta S_i$. Note that $\partial V_i/\partial S_i$ can either be obtained from a model such as Black–Scholes or be estimated from historical data. We thus have our linear model:
$$\Delta V_P \doteq \sum_i n_i \frac{\partial V_i}{\partial S_i}\Delta S_i.$$
For VaR calculations, we have to find the probability distribution of $\Delta V_P$. A reasonable assumption is that each $\Delta S_i$ is normal.
If we further assume that the collection of all the $\Delta S_i$ has a multivariate normal distribution, then $\Delta V_P$, being their linear combination, is also normal. From historical data we can estimate the means, variances and covariances of the $\Delta S_i$, and this completely specifies the distribution of $\Delta V_P$.

Example 8.2.1 Consider the TCS situation of Example 8.1.2. Recall that on 31 August 2005 the closing price for TCS stock was 1405. Its annualised return for the next month was projected to be 30% and its implied volatility was σ = 0.224. The risk-free rate was assumed to be 5%. Moreover, a call option on this stock with exercise price 1350 and expiry on 29 September 2005 was priced at 75. Suppose the analyst needs to estimate the 95% VaR for this call over the next 10 days. First, consider the stock. The expected rate of return over the next 10 days is $30\% \times 10/365 = 0.82\%$. Therefore, the expected change in the stock price after t = 10 days is $\mathbb{E}[\Delta S] = 1405 \times 0.0082 \approx 11$. The standard deviation of $\Delta S$ is approximated by $S_0\sigma\sqrt{t} = 1405 \times 0.224 \times \sqrt{10/365} \approx 52$. Therefore $\Delta S$ can be taken to be normally distributed with mean 11 and standard deviation 52. Next, we calculate the delta of the call, using $T = 1/12$ year to expiry:
$$w = \frac{\ln(S_0/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}} = 0.71, \qquad \delta_C = \Phi(w) = 0.76.$$
Hence, the change in the call premium over 10 days is approximated as $\Delta C \approx \delta_C\,\Delta S = 0.76\,\Delta S$. So $\Delta C$ is approximately normally distributed with mean $0.76 \times 11 = 8.36$ and standard deviation $0.76 \times 52 = 39.52$. The 95% VaR V for the call can now be estimated from $\mathbb{P}[\Delta C \ge -V] = 0.95$, which gives $V = 1.645 \times 39.52 - 8.36 \approx 56.6$. □

Exercise 8.2.2 Suppose the daily change in the value of a portfolio is modelled as depending linearly on two uncorrelated and normally distributed factors. The delta of the portfolio is 6 with respect to the first factor and −4 with respect to the second. The standard deviations of the factors are 20 and 8, respectively, and the means are 0. What is the 1-day 95% VaR?

Figure 8.2: Comparison of the probability density functions of Z (solid curve) and $Z^2$ (dashed), where Z is a standard normal variable.
The pdf of $Z^2$ is $f_{Z^2}(x) = \frac{e^{-x/2}}{\sqrt{2\pi x}}$ for x > 0.

8.3 QUADRATIC MODEL

The linear model has been found to work well when the portfolio includes stocks, bonds and futures, but not when options are also present. In this case, we can try to improve the quality of our approximations by making them quadratic. The second-order approximation for changes in $V_i$ is
$$\Delta V_i \doteq \frac{\partial V_i}{\partial S_i}\Delta S_i + \frac{1}{2}\frac{\partial^2 V_i}{\partial S_i^2}(\Delta S_i)^2.$$
The second derivative can be estimated from a model or from historical data. We now get the full quadratic model:
$$\Delta V_P \doteq \sum_i n_i\left(\frac{\partial V_i}{\partial S_i}\Delta S_i + \frac{1}{2}\frac{\partial^2 V_i}{\partial S_i^2}(\Delta S_i)^2\right).$$
The difficulty in dealing with this model is that even when $\Delta S_i$ is normal, $(\Delta S_i)^2$ is not (see Figure 8.2). When there is only one underlying asset, we can use the results of Exercises B.3.1–B.3.3 to model the distribution of the portfolio.

Example 8.3.1 We again take up the call of Example 8.2.1, with exercise price 1350 and price 75. We have calculated w = 0.71 and $\delta_C = 0.76$ for this call. We try to refine our earlier work in two ways: we use gamma for a second-order approximation to the dependence on the spot price, and theta to track the change in value due to the passing of time.

Figure 8.3: The pdf of the quadratic approximation to the portfolio return in Example 8.3.1. Notice how the quadratic approximation enables the modelling of an asymmetric distribution.

$$\gamma_C = \frac{e^{-w^2/2}}{S_0\sigma\sqrt{2\pi T}} = 0.0034, \qquad \Theta_C = -\frac{S_0\sigma e^{-w^2/2}}{2\sqrt{2\pi T}} - rXe^{-rT}\Phi(w - \sigma\sqrt{T}) = -218.8.$$
With $\Delta t = 10/365$, the new approximation to the change $\Delta C$ in the call premium is
$$\Delta C \approx \delta_C\,\Delta S + \tfrac{1}{2}\gamma_C(\Delta S)^2 + \Theta_C\,\Delta t = 0.76\,\Delta S + 0.0017(\Delta S)^2 - 5.99 = 0.0017\left(447\,\Delta S + (\Delta S)^2\right) - 5.99 = 0.0017\left((\Delta S + 223.5)^2 - 223.5^2\right) - 5.99 = 0.0017(\Delta S + 223.5)^2 - 90.91.$$
Combining $\Delta S \sim N(11, 52)$ with the results of Exercises B.3.1 to B.3.3, we get the approximate cdf of $\Delta C$. Writing $y(x) = \sqrt{(x + 90.91)/0.0017}$ and noting that $\Delta S + 223.5 \sim N(234.5, 52)$,
$$F_{\Delta C}(x) \approx F_{(\Delta S + 223.5)^2}\big(y(x)^2\big) = F_{\Delta S + 223.5}\big(y(x)\big) - F_{\Delta S + 223.5}\big(-y(x)\big) = \Phi\left(\frac{y(x) - 234.5}{52}\right) - \Phi\left(\frac{-y(x) - 234.5}{52}\right).$$
With this formula in hand, it is not hard to locate the 95% VaR: it is 53.17. □

When there are several underlying assets with various correlations, the distribution of the full portfolio is much harder to describe.
Rather than try, we will present a numerical technique that uses random numbers to simulate this distribution and estimate the probabilities related to it.

8.4 MONTE CARLO SIMULATION

In this section we shall use the technique of Monte Carlo simulation, as described in the Appendix (§B.20).

Example 8.4.1 Suppose the quadratic model for a portfolio takes the form $\Delta V_P \doteq \Delta S_1 + \Delta S_2 + \Delta S_3 + 2(\Delta S_1)^2 - 10(\Delta S_2)^2$, where $\Delta S_1 \sim N(-10, 0.5)$, $\Delta S_2 \sim N(5, 1)$ and $\Delta S_3 \sim N(0, 3)$. Let the correlation matrix be [matrix missing]. Using the procedure described in §B.20, we can simulate standard normal variables $Z_i$ having this correlation matrix. We then set $\Delta S_1 = -10 + 0.5Z_1$, $\Delta S_2 = 5 + Z_2$ and $\Delta S_3 = 3Z_3$, noting that such linear changes do not affect correlations. For each simulated set of values of the $Z_i$, we first calculate the corresponding values of the $\Delta S_i$, and then the value of $\Delta V_P$. Figure 8.4 shows the histogram for a set of $10^4$ values of $\Delta V_P$ obtained this way. The bottom 5% of the data set is marked by −259.66. Thus our estimate for the 95% VaR of this portfolio is 259.66. We should wonder about the reliability of this estimate: if we run the simulation again, will we see a similar value? Figure 8.5 shows the distribution of 100 values of the 95% VaR obtained by repeating this process. The range is relatively narrow, from 255 to 269, which is reassuring. We can feel confident that the mean of 262 is a reliable estimate of the VaR. □

Figure 8.4: Histogram showing the frequencies for the simulated data of Example 8.4.1. The dark region marks the 95% VaR.

If we have models for the values of all the assets in the portfolio, we need not use the linear or quadratic approximations. Instead, we can use the models to calculate exactly the change $\Delta V_P$ arising from any set of changes $\Delta S_i$. This is called full valuation, and while it does away with one set of approximations, it considerably increases the numerical work. We illustrate it for the case of a single call.
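The linear, quadratic and full-valuation approaches can be compared side by side for the TCS call. The sketch below is illustrative rather than a reproduction of the author's computations: it uses the stated inputs ($S_0 = 1405$, X = 1350, C = 75, r = 0.05, σ = 0.224, T = 1/12 for the Greeks, $\Delta S \sim N(11, 52)$ over $\Delta t = 10/365$), and it assumes $T - t = 19/365$ years remain after 10 days, with the discount factor computed from r. The results should land near the book's figures but need not match them to the last digit, since rounding and the discount factor used in Example 8.4.2 may differ. Function names are our own.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()

S0, X, C0 = 1405.0, 1350.0, 75.0   # spot, strike, observed call premium
r, sigma = 0.05, 0.224
T = 1 / 12                          # time to expiry used for the Greeks
dt = 10 / 365                       # VaR horizon: 10 days
tau_left = 19 / 365                 # assumption: time to expiry after 10 days
MU_DS, SD_DS = 11.0, 52.0           # Delta S ~ N(11, 52)


def bs_call(s, tau):
    """Black-Scholes price of the European call with time tau to expiry."""
    w = (math.log(s / X) + (r + sigma**2 / 2) * tau) / (sigma * math.sqrt(tau))
    return s * N01.cdf(w) - X * math.exp(-r * tau) * N01.cdf(w - sigma * math.sqrt(tau))


def var_from_samples(pnl, level=0.95):
    """Magnitude of the (1 - level) sample quantile of the profit."""
    ordered = sorted(pnl)
    return -ordered[int((1 - level) * len(ordered))]


# Greeks at time 0 (formulas of Examples 8.2.1 and 8.3.1).
w0 = (math.log(S0 / X) + (r + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
delta = N01.cdf(w0)
gamma = N01.pdf(w0) / (S0 * sigma * math.sqrt(T))
theta = (-S0 * sigma * N01.pdf(w0) / (2 * math.sqrt(T))
         - r * X * math.exp(-r * T) * N01.cdf(w0 - sigma * math.sqrt(T)))

# 1. Linear model: Delta C ~ delta * Delta S is normal, so VaR is closed form.
var_linear = -(delta * MU_DS + delta * SD_DS * N01.inv_cdf(0.05))

# Monte Carlo draws of Delta S shared by the next two approaches.
rng = random.Random(2005)
draws = [rng.gauss(MU_DS, SD_DS) for _ in range(100_000)]

# 2. Quadratic model: delta-gamma-theta approximation to Delta C.
quad_pnl = [delta * x + 0.5 * gamma * x * x + theta * dt for x in draws]
var_quad = var_from_samples(quad_pnl)

# 3. Full valuation: reprice the call at S0 + Delta S with 19 days left.
full_pnl = [bs_call(S0 + x, tau_left) - C0 for x in draws]
var_full = var_from_samples(full_pnl)

if __name__ == "__main__":
    print(f"delta={delta:.2f} gamma={gamma:.4f} theta={theta:.1f}")
    print(f"95% VaR  linear: {var_linear:.1f}  quadratic: {var_quad:.1f}  "
          f"full valuation: {var_full:.1f}")
```

With these inputs the linear model gives the largest VaR and the quadratic model the smallest, with full valuation in between. Sharing one set of draws between the quadratic and full-valuation runs keeps the comparison free of extra sampling noise.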
Example 8.4.2 We pay a final visit to the situation of the call on TCS stock (Examples 8.1.2 and 8.3.1). We have estimated the 10-day return from one share to be normal with mean 11 and standard deviation 52. Since the initial share price was 1405, the final share price has a N(1416, 52) distribution. We can simulate the final price $S_{10}$ and, for each value s that we obtain, calculate the new call premium c using the Black–Scholes formula. In our case, we have
$$w = \frac{\ln(s/X) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}$$
and the new call premium
$$c = s\Phi(w) - Xe^{-r(T - t)}\Phi(w - \sigma\sqrt{T - t}) = s\Phi(w) - 1349.65\,\Phi(w - 0.051).$$

Figure 8.5: Histogram of 100 simulated values of the 95% VaR.

Figure 8.6 shows $10^5$ simulated values of the return from the call over 10 days. The 95% VaR from this simulation is 56.21. □

Figure 8.6: Histogram of simulated returns from the TCS call. The dashed and solid curves represent the linear and quadratic approximations calculated in Examples 8.2.1 and 8.3.1, respectively.

TESTING VAR PERFORMANCE

It should be obvious by now that VaR may be estimated in a number of ways. Indeed, we have only touched on the simplest situations and procedures. It becomes important, therefore, to also have a way of testing the results from a particular VaR model. This is done by backtesting: we apply the VaR model to historical data and see how well it stands up. For example, a 95% VaR should not be violated much more than 5% of the time. We could also look at various other aspects of the violations and see whether they are in accord with our model: do they appear to be independent of each other? Is their average what we would expect?

8.5 THE MARTINGALE

Martingales are random processes that evolve with time, with the property that the current value is the expectation of the future values (given the present value). We will not define them more formally, but we can get an example by starting with a GBM $S_t$ with drift μ and volatility σ. Recall that if $S_a$ is known, we have $\mathbb{E}[S_{t+a}] = e^{\mu t}S_a$.
This is not a martingale, but we can use it to create one by defining $M_t = e^{-\mu t}S_t$. Note that knowing the value of $M_a$ is equivalent to knowing that of $S_a$. So we have:
$$\mathbb{E}[M_{t+a}] = e^{-\mu(t+a)}\mathbb{E}[S_{t+a}] = e^{-\mu(t+a)}e^{\mu t}S_a = e^{-\mu a}S_a = M_a.$$
This gives an indication of how a process that is not a martingale may be converted into one, and also of how martingales become relevant to finance. However, this section is not about martingales, important though they are. Instead it is about a betting strategy that is called the martingale and is at least a few centuries old. In its simplest form, it concerns a game in which bets are placed on repeated tosses of a fair coin. If you bet x on the coin showing heads and it shows tails, you lose x. Otherwise you gain x. The strategy is as follows:
1. Always bet on the coin showing heads.
2. If you win on any toss, take your winnings and leave the game.
3. If you lose, double your bet on the next toss.
Now suppose you start by betting Rs 1 and the first head appears on toss N + 1. Your stake on that toss is $2^N$ and your total gain (after deducting your previous losses) is
$$2^N - \sum_{k=0}^{N-1} 2^k = 2^N - (2^N - 1) = 1.$$
Surely a head will eventually turn up, so it appears you have a guaranteed profit of Rs 1 from this strategy. Difficulties arise, however, because a long string of tails may lead to losses that drive you out of the game before the first head turns up. So this strategy leads to a curious situation. Almost all the time you will win a small amount. But if you do lose, you will lose very badly indeed. All the risk has been concentrated into one tiny and therefore extremely toxic zone. Let us do a mean–variance analysis of this strategy over 10 tosses. There are two possibilities, win and lose, with the following payoffs and probabilities:

Scenario    Payoff          Probability
Lose        $1 - 2^{10}$    $1/2^{10}$
Win         $1$             $1 - 1/2^{10}$

Therefore, the mean and variance of the payoff are:
$$\text{Mean} = (1 - 2^{10})\cdot\frac{1}{2^{10}} + 1\cdot\left(1 - \frac{1}{2^{10}}\right) = 0, \qquad \text{Variance} = (1 - 2^{10})^2\cdot\frac{1}{2^{10}} + 1^2\cdot\left(1 - \frac{1}{2^{10}}\right) - 0^2 = 2^{10} - 1 = 1023.$$
Thus, in spite of the initial impression of giving sure profits, the martingale is a poor strategy for a risk-averse investor.
Nor is it good for risk-preferring investors, who accept risk only in exchange for a possibility of large profit. It may be fair to say that it is suitable only for the risk-ignorant. What does VaR have to say about the martingale? Over 10 tosses, the probability of a loss is $1/2^{10}$, just under 0.1%. Thus we would be justified in saying that the 99.9% VaR is zero! In other words, the usual levels of VaR are unable to detect the risks associated with this strategy. This re-emphasises our earlier comment that VaR does not help us reach the most extreme cases. The use of VaR as a regulatory tool therefore becomes problematic. The VaR value is used to decide how much capital a bank should keep safe or deposit with government authorities. This may naturally drive banks towards martingale-like strategies that have low VaR, and thus increase the chance of a single massive catastrophe relative to a sequence of small ones.

¹⁴ Sometimes VaR is defined in terms of the discounted loss. Since VaR is usually calculated for small time periods, this has little numerical impact.

Appendix A Calculus

This appendix contains the calculus that is used in this book. Some of it should be familiar to you, while the later parts may be new. You do not need to read it all at one go; refer to it as the need arises. Our treatment of calculus is admittedly superficial, and you may want to supplement it with a more detailed text. There are innumerable books that you can consult. Most of them will teach you how to calculate; Apostol [1] will teach you how to think.

A.1 ONE VARIABLE CALCULUS

DIFFERENTIAL CALCULUS

Let us start with a quick review of calculus in one variable. If we have a function $f : I \to \mathbb{R}$, where I is an open interval in ℝ, then the derivative or differential of f at $a \in I$ is defined by
$$f'(a) = \lim_{h \to 0}\frac{f(a + h) - f(a)}{h},$$
provided the limit exists. The quantity $f'(a)$ is visualised as the instantaneous rate of change of f at a.
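The limit in this definition can be watched numerically. In the small sketch below (the helper name is our own), the difference quotients of $e^x$ at a = 0 approach $f'(0) = e^0 = 1$ as h shrinks.

```python
import math


def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h; its limit as h -> 0 is f'(a) when it exists."""
    return (f(a + h) - f(a)) / h


# f(x) = e^x at a = 0, for h = 0.1, 0.01, ..., 0.000001.
approx = [difference_quotient(math.exp, 0.0, 10.0 ** -k) for k in range(1, 7)]

if __name__ == "__main__":
    for k, q in enumerate(approx, start=1):
        print(f"h = 1e-{k}: {q:.7f}")
```

The error shrinks roughly in proportion to h: for $e^x$ at 0 it is close to h/2.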
Practically, if we know $f'(a)$ and $f(a)$, then we can estimate the value of f at a nearby point $a + h$ by
$$f(a + h) \approx f(a) + f'(a)h.$$
If we draw the graph of f, then $f'(a)$ is the slope of the line that is tangent to the graph at $(a, f(a))$. (See Fig. A.1.)

Figure A.1: The graph of a function f(x) showing the tangent line at a point (a, f(a)). The tangent line has slope f′(a).

Recall that two functions f, g can be added, subtracted, multiplied, divided or composed according to the following rules:
Addition: $(f + g)(x) = f(x) + g(x)$.
Subtraction: $(f - g)(x) = f(x) - g(x)$.
Multiplication: $(fg)(x) = f(x)g(x)$.
Division: $\left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)}$ (provided $g(x) \ne 0$).
Composition: $(f \circ g)(x) = f(g(x))$.

ALGEBRA OF DERIVATIVES

For the purposes of this book, one has to be aware of the following basic rules involving derivatives. Let f, g be two functions. Assuming the existence of their derivatives at the point of concern, we have:
Linearity: If a, b are two numbers, then $(af + bg)'(x) = af'(x) + bg'(x)$.
Product rule: $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$.
Quotient rule: If $g(x) \ne 0$, then $\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$.
Chain rule: $(f \circ g)'(x) = f'(g(x))\,g'(x)$.
These rules allow us to differentiate a complicated function by breaking the problem into simpler parts. To complete the method we need a list of the derivatives of the basic functions:
Constants: If f(x) = c for some fixed number c, then f′(x) = 0.
Monomials: If $f(x) = x^n$, then $f'(x) = nx^{n-1}$.
Exponential: If $f(x) = e^x$, then $f'(x) = e^x$.
Logarithm: If $f(x) = \ln(x) = \log_e(x)$, then $f'(x) = 1/x$.

Figure A.2: The graph of a function f(x) showing local extrema at x = a, b. There is a local maximum at x = a and a local minimum at x = b.

An important use of derivatives is in estimating how large or small the values of a given function can be. We say f has a local maximum at a point a if there is an interval $I = (a - \delta, a + \delta)$ around a such that $f(x) \le f(a)$ for each $x \in I$. There is a local minimum if $f(x) \ge f(a)$ for each $x \in I$.
And we say there is a local extremum if there is either a local maximum or a local minimum. These situations are depicted in Fig. A.2. The figure serves to remind us that at a local extremum the tangent line is horizontal, i.e., the derivative is zero. To detect points of local extremum, we use the first derivative test: $f'(a) = 0$. Having found such an a, we use the second derivative test to explore the specific nature of the point: if $f''(a) < 0$ it is a local maximum, and if $f''(a) > 0$ it is a local minimum. Unfortunately, if $f''(a) = 0$, we get no information, and in fact the point may fail to be a local extremum at all! For example, the functions $x^3$, $x^4$ and $-x^4$ all have zero first and second derivatives at x = 0. For $x^3$, x = 0 is not even a local extremum, while for $x^4$ it is a local minimum and for $-x^4$ it is a local maximum. (See Fig. A.3.)

Figure A.3: The graphs of the functions $x^3$ (solid), $x^4$ (dotted) and $-x^4$ (dashed).

FIRST AND SECOND ORDER APPROXIMATIONS

The derivative of f at a gives a linear approximation to f, which is accurate when we are close to a:
$$f(x) \doteq f(a) + f'(a)(x - a).$$
The symbol ≐ signifies that the right-hand side is a first-order approximation to f near a: it has the same value and first derivative as f at a. By matching higher-order derivatives, we get higher-order approximations. To get a second-order approximation to f at a, we start with a quadratic function $g(x) = A + B(x - a) + C(x - a)^2$ and match the values of f, g as well as their first two derivatives at a:
$$g(a) = A = f(a), \qquad g'(a) = B = f'(a), \qquad g''(a) = 2C = f''(a).$$
Therefore the second-order approximation to f at a is
$$f(x) \doteq f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2.$$
Example A.1.1 Consider $f(x) = e^x$. The first-order approximation to f at 0 is 1 + x, while the second-order approximation is $1 + x + x^2/2$. □ Higher-order approximations can be obtained similarly but are not used in this book.

Exercise A.1.2 Let $f(x) \doteq A + B(x - a)$ and $g(x) \doteq C + D(x - a)$.
Show that [formula missing].

The following result is fundamental:

Mean Value Theorem Let $f : [a, b] \to \mathbb{R}$ be continuous, and let it be differentiable on (a, b). Then there is a $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Suppose we know that a function f has zero derivative everywhere on an interval. Then the Mean Value Theorem tells us that f must have a constant value on that interval.

INTEGRAL CALCULUS

If functions f, g are related by $f'(x) = g(x)$ at every x, we call f the antiderivative or indefinite integral of g and denote the relationship by
$$\int g(x)\,dx = f(x).$$
Note that the indefinite integral of a function g(x) is not unique: if $f'(x) = g(x)$, then for any constant c we also have $(f + c)'(x) = g(x)$. A very important fact is that this is the full extent of non-uniqueness: if f, h are both indefinite integrals of g, then there must be a constant c such that $h(x) = f(x) + c$ for each x. For $f' = h' = g$ implies $(h - f)' = 0$, and then it follows from the Mean Value Theorem that $h - f$ is constant.

Example A.1.3 Here is a useful application of the last fact. Suppose we know that a function $f : \mathbb{R} \to \mathbb{R}$ has the following property: $f'(x) = qf(x)$ for every x, with q a constant. Then it follows that
$$\frac{d}{dx}\left(f(x)e^{-qx}\right) = f'(x)e^{-qx} - qf(x)e^{-qx} = qf(x)e^{-qx} - qf(x)e^{-qx} = 0.$$
Hence there is a constant A such that $f(x)e^{-qx} = A$, or $f(x) = Ae^{qx}$ for every x. Substituting x = 0, we find that f(0) = A, and so f has to be of the form $f(x) = f(0)e^{qx}$. □

Now suppose $\int f(x)\,dx = g(x)$. Then we define the definite integral of f over an interval [a, b] by
$$\int_a^b f(x)\,dx = g(b) - g(a).$$
Although g is not unique, the definite integral is unique, because indefinite integrals can only differ by constants. If the number b is replaced by a variable x, the definite integral becomes a function of x:
$$F(x) = \int_a^x f(t)\,dt.$$
Since $F(x) = g(x) - g(a)$, it is immediate that $F'(x) = g'(x) = f(x)$.

Figure A.4: The area of the shaded region under the graph of f is given by $\int_a^b f(x)\,dx$.

When f ≥ 0, the Fundamental Theorem of Calculus tells us that $\int_a^b f(x)\,dx$ equals the area under the graph of f over the interval [a, b]. (See Figure A.4.)
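The statement that $\int_a^b f(x)\,dx = g(b) - g(a)$ equals the area under the graph can be checked numerically by adding up thin rectangles. The sketch below uses a midpoint Riemann sum (our choice of approximation, not one introduced in the text) for $f(x) = e^x$ on [0, 1], whose antiderivative is $g(x) = e^x$.

```python
import math


def midpoint_sum(f, a, b, n=1000):
    """Approximate the area under f on [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


area = midpoint_sum(math.exp, 0.0, 1.0)   # rectangle approximation of the area
exact = math.exp(1.0) - math.exp(0.0)     # g(b) - g(a) with antiderivative g = exp

if __name__ == "__main__":
    print(f"midpoint sum: {area:.8f}  g(1) - g(0): {exact:.8f}")
```

Doubling n cuts the gap between the two numbers by roughly a factor of four.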
For evaluating a definite integral, the following properties are useful:
Linearity: If f, g are two functions and c, d two numbers, then $\int_a^b (cf(x) + dg(x))\,dx = c\int_a^b f(x)\,dx + d\int_a^b g(x)\,dx$.
Splitting: If $a \le b \le c$, then $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$.
Change of Variables: If f, g are two functions and g is monotonic, then $\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du$.
The term monotonic indicates that the function g(x) either always increases in value (g is monotonically increasing) or always decreases in value (g is monotonically decreasing) as the value of x increases. For example, $e^x$ is a monotonically increasing function, while −x is a monotonically decreasing function. On the other hand, $x^2$ is not monotonic.

IMPROPER INTEGRALS

We have defined definite integrals over an interval [a, b] and stated that these represent the area under a curve. In some problems the curve extends over the whole real line and we are interested in the entire area under it. Naturally, we represent this area by $\int_{-\infty}^{\infty} f(x)\,dx$, but how is this defined? We first define the integral from some point a to ∞ as a limit of integrals we already know how to calculate:
$$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty}\int_a^b f(x)\,dx.$$
Similarly, we define the integral from −∞ to a:
$$\int_{-\infty}^a f(x)\,dx = \lim_{b \to -\infty}\int_b^a f(x)\,dx.$$
Now we put the pieces together:
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^{\infty} f(x)\,dx.$$
Two limits are involved in this definition, and if either of them does not exist, we say that $\int_{-\infty}^{\infty} f(x)\,dx$ diverges or does not exist. Integrals where both f and the range of integration are bounded are called proper, while those where either is unbounded are called improper. Improper integrals are very important in probability, as most calculations of expectation and variance are based on them.

Example A.1.4 Consider the exponential function $f(x) = e^{-x}$. Then $\int_0^{\infty} e^{-x}\,dx = \lim_{b \to \infty}(1 - e^{-b}) = 1$. On the other hand, $\int_{-\infty}^0 e^{-x}\,dx$ does not exist. Thus $\int_{-\infty}^{\infty} e^{-x}\,dx$ diverges (and, by symmetry, so does $\int_{-\infty}^{\infty} e^{x}\,dx$). □

Exercise A.1.5 Let f be a continuous function such that [formula missing]. Show that for any fixed number z, [formula missing]. Note that we are not assuming that [formula missing] exists!

A.2 PARTIAL DERIVATIVES

Consider a function $f : U \to \mathbb{R}$, where U is a subset of $\mathbb{R}^2$. Such a function is described by an expression of the form f(x, y), where the pair (x, y) ranges over all of U.
The basic approach to studying how the values of f change with x and y is to fix one of these variables and vary the other: this reduces the problem to the one-variable situation, and we already have techniques for studying that. Thus, we define two partial derivatives. In one of them we vary x and fix y: this is called the partial derivative with respect to x. Similarly, the partial derivative with respect to y is obtained by fixing x and varying y. The formal definitions are:
$$\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0}\frac{f(x + h, y) - f(x, y)}{h}, \qquad \frac{\partial f}{\partial y}(x, y) = \lim_{h \to 0}\frac{f(x, y + h) - f(x, y)}{h}.$$
Alternative popular notation is $f_x = \partial f/\partial x$ and $f_y = \partial f/\partial y$. Partial derivatives are easy to calculate. To find ∂f/∂x, just treat y as a constant and differentiate with respect to x in the usual manner. For example, if $f(x, y) = x^2y + e^y$, then $\partial f/\partial x = 2xy$ and $\partial f/\partial y = x^2 + e^y$.

More generally, f may be a function of n variables $x_1, \dots, x_n$, and we define the partial derivatives of f with respect to each of these n variables. Let $e_i$ denote the vector with all entries 0 except for a 1 in the ith position. The partial derivative of f at $a = (a_1, \dots, a_n)$ with respect to the variable $x_i$ is defined by
$$\frac{\partial f}{\partial x_i}(a) = \lim_{h \to 0}\frac{f(a + he_i) - f(a)}{h}.$$
The partial derivatives of f are collected in a single vector called the gradient of f and denoted by ∇f:
$$\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a)\right).$$

A.3 LAGRANGE MULTIPLIERS

We shall consider the problem of finding the extreme values of a function $f : U \to \mathbb{R}$, where U is a subset of $\mathbb{R}^n$, subject to a constraint g(x) = c, where $g : U \to \mathbb{R}$ and c is a constant. This means that we are interested in finding the maximum or minimum value of f(x), with x varying over the set $G = g^{-1}(c) = \{x : g(x) = c\}$. The Lagrange multipliers method applies to this situation. We shall not motivate the method, as that would involve developing a good amount of multivariable calculus, but we shall describe it carefully. Given f and the constraint g(x) = c, we introduce a new variable λ (called the Lagrange multiplier) and set up the equation $\nabla(f + \lambda g) = 0$.
This is an equation involving vectors and leads to $n$ equations involving scalars:
$$\frac{\partial f}{\partial x_i} + \lambda\,\frac{\partial g}{\partial x_i} = 0, \qquad i = 1,\dots,n.$$
Together with the original constraint $g(x_1,\dots,x_n) = c$, this gives $n+1$ equations for the $n+1$ variables $\lambda, x_1,\dots,x_n$. If we succeed in solving this system, we have a candidate $p = (x_1,\dots,x_n)$ for an extreme point (we also have $\lambda$, but that is superfluous knowledge). In general there will be multiple solutions, and by evaluating $f$ at each of them we find the extreme values.

We shall also have occasion to consider situations where there are two constraints instead of one. Let these be $g(x) = c$ and $h(x) = d$. Then we introduce two Lagrange multipliers $\lambda$ and $\mu$, and set up the equation $\nabla(f + \lambda g + \mu h) = 0$. On expanding in coordinates, this again gives $n$ equations, and by adding the two constraint equations we get a total of $n+2$ equations for the $n+2$ variables $\lambda, \mu, x_1,\dots,x_n$.

A.4 DIFFERENTIATING UNDER THE INTEGRAL SIGN

We have noted that differentiation is linear:
$$\frac{d}{dy}\sum_{i=1}^{n} f_i(y) = \sum_{i=1}^{n}\frac{d f_i}{dy}(y).$$
Since integration is viewed as a kind of continuous sum, it is natural to ask whether the following relationship also holds:
$$\frac{d}{dy}\int_a^b f(x,y)\,dx = \int_a^b \frac{\partial f}{\partial y}(x,y)\,dx.$$
In fact it does not always hold. The next theorem describes a broad class of situations in which it does.

Theorem A.4.1 Let $f : [a,b]\times[c,d] \to \mathbb{R}$ be a function such that:
1. For each $y \in [c,d]$, $\int_a^b f(x,y)\,dx$ is defined.
2. $\frac{\partial f}{\partial y}(x,y)$ is defined and continuous on $[a,b]\times[c,d]$.
Then $F : [c,d] \to \mathbb{R}$ given by $F(y) = \int_a^b f(x,y)\,dx$ is differentiable on $[c,d]$, and its derivative is
$$F'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y)\,dx.$$

We use this result in Chapter 7 to derive the Black-Scholes PDE for European options.

A.5 DOUBLE INTEGRALS

A function $f : U \to \mathbb{R}$, where $U$ is a subset of the Cartesian plane $\mathbb{R}^2$, can be integrated over the entire region $U$ by a process known as double integration. Double integration is used in probability to express the relationship between two different random variables.

Figure A.5: Choice of order of integration in double integration. Over the same region, the first diagram corresponds to $\int_a^b\int_{g(x)}^{h(x)} f(x,y)\,dy\,dx$ and the second to the iterated integral taken in the opposite order (first in $x$, then in $y$).

Look at the shaded region in the two diagrams in Fig. A.5. Suppose it is named $U$.
In the first diagram, we have marked its boundary as made of two curves: the lower part is the graph of $y = g(x)$ and the upper part is the graph of $y = h(x)$. In both cases, $x$ varies from $a$ to $b$. If we look at any of the vertical line segments filling out $U$, then on each of these segments $x$ is fixed while $y$ varies from $g(x)$ to $h(x)$. So we can carry out the integral
$$I(x) = \int_{g(x)}^{h(x)} f(x,y)\,dy,$$
and this represents the integral of $f$ over the vertical line segment located by $x$. The function $I(x)$ is defined for $x \in [a,b]$, and so we can integrate it too:
$$\int_a^b I(x)\,dx = \int_a^b\left(\int_{g(x)}^{h(x)} f(x,y)\,dy\right)dx.$$
The last expression represents the double integral of $f$ over $U$.

We could have used horizontal line segments instead of vertical ones. This approach is depicted by the second diagram in Fig. A.5, and leads to an iterated integral in which, for each fixed $y$, we first integrate $f(x,y)$ with respect to $x$ along the horizontal segment at height $y$, and then integrate the result with respect to $y$. It is possible that the two choices give different results, and in that case we would not have a well-defined double integral. However, if $f$ is continuous, then the two choices are guaranteed to give the same result, and we can use whichever is more convenient. In this case we may also write
$$\iint_U f(x,y)\,dx\,dy$$
for the double integral, indicating the region of integration and the integrand but not the order of integration.

CHANGE OF VARIABLES

Consider a double integral $\iint_U f(x,y)\,dx\,dy$. Suppose we wish to replace $x, y$ by new variables $u, v$ (typically to change the integrand into a more tractable form), which are related to them by $x = x(u,v)$, $y = y(u,v)$. The Jacobian of this change of variables is defined to be the following determinant:
$$J(u,v) = \det\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}.$$
The change of variables formula is
$$\iint_U f(x,y)\,dx\,dy = \iint_V f\big(x(u,v), y(u,v)\big)\,|J(u,v)|\,du\,dv,$$
where $V$ consists of the $(u,v)$ values corresponding to the $(x,y)$ values in $U$.

Example A.5.1 Let us carry out a change of variables to polar coordinates. The Cartesian coordinates $(x,y)$ and polar coordinates $(r,\theta)$ of the same point are related by:
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
The Jacobian of the change from Cartesian to polar coordinates is
$$J(r,\theta) = \frac{\partial x}{\partial r}\frac{\partial y}{\partial \theta} - \frac{\partial x}{\partial \theta}\frac{\partial y}{\partial r} = \cos\theta\cdot r\cos\theta - (-r\sin\theta)\cdot\sin\theta = r\cos^2\theta + r\sin^2\theta = r.$$
Now consider the double integral of a function $f$ over the disk with center at the origin and radius $R$.
In polar coordinates, this corresponds to a rectangle with the $r$ side going from 0 to $R$ and the $\theta$ side going from 0 to $2\pi$. Therefore the double integral can be expressed as
$$\iint_{x^2+y^2\le R^2} f(x,y)\,dx\,dy = \int_0^{2\pi}\int_0^{R} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$
Note that $r$ is non-negative, so in the change of variables we have $|J(r,\theta)| = |r| = r$. □

IMPROPER DOUBLE INTEGRALS

Improper double integrals are defined in the same way as improper integrals: as limits of proper ones.

Example A.5.2 We shall use improper double integrals to compute an improper integral which is very important in probability: $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. This is done by a rather clever trick. First we write
$$\left(\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)/2}\,dx\,dy.$$
The last expression involves a double integral over the entire $xy$-plane. We rewrite it in terms of polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$ and $J(r,\theta) = r$. Thus we get
$$\iint_{\mathbb{R}^2} e^{-(x^2+y^2)/2}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{2\pi}\left[-e^{-r^2/2}\right]_0^{\infty}d\theta = 2\pi.$$
On taking square roots, we get:
$$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$
□

Appendix B Probability and Statistics

This appendix on probability aims at giving you a compact description of those aspects of the subject which are most necessary for finance. The presentation emphasises the intuition or motive behind the definitions and results, rather than the formal details. To fill in these details you could consult Ross [41], Freund [23], or Ashenfelter, Levine and Zimmerman [3].

B.1 BASIC PROBABILITY

The sample space is the collection of objects whose statistical properties are to be studied. Each such object is called an outcome, and a collection of outcomes is called an event. The act of selecting a particular outcome is called a random experiment or just experiment. Mathematically, the sample space is a set, an outcome is a member of this set, and an event is a subset of the sample space. Typical notation is to use $S$ for the sample space, lower case letters such as $x$ for individual outcomes, and capital letters such as $E$ for events.

Example B.1.1 Suppose we are interested in those stocks listed on the National Stock Exchange of India (NSE) that have shown a return of at least 20% over the past year. Then the sample space is $S$ = the collection of stocks listed on the NSE.
Each such stock constitutes an outcome. We are interested in the outcomes constituting the following event:
$$E = \{s \in S : \text{the return on } s \text{ is at least 20\% over the past year}\}.$$
□

In a particular context, we would be interested in some specific numerical property of the outcomes. In the above example it was the return over the past year. This property allots a number to each outcome, and so it can be viewed as a function whose domain is the sample space $S$ and whose range lies in the real numbers $\mathbb{R}$. Therefore we introduce the concept of a random variable: it is a function $X : S \to \mathbb{R}$. Our interest is in taking a particular random variable $X$ and studying how its values are distributed. What is the average? How much variation is there in its values? Are very large values unlikely enough to be ignored?

We present two viewpoints on the meaning of probability. Both are relevant to finance and, interestingly, they lead to the same mathematical definition of probability!

Viewpoint 1: The probability of an event should predict the relative frequency of its occurrence. That is, suppose we say the probability of a random stock having increased in value over the last month is 0.6. Then, if we look at 100 different stocks, about 60 of them should have increased in value. The prediction should become more accurate as we look at larger numbers of stocks.

Viewpoint 2: The probability of an event reflects our (subjective) opinion about how likely it is to occur, in comparison to other events. Thus, if we allot probability 0.4 to event $A$ and 0.2 to event $B$, we are expressing the opinion that $A$ is twice as likely as $B$.

Viewpoint 1 is appropriate when we analyse historical data to predict the future. Viewpoint 2 is useful in analysing how an individual may act when faced with certain information.
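Viewpoint 1 can be illustrated with a short simulation. The sketch below assumes, purely for illustration, that each stock independently rose with probability 0.6 (an assumption not made in the text); the function name `relative_frequency` is ours. The observed fraction of risers should settle near 0.6 as the number of stocks grows.

```python
import random


def relative_frequency(p, n_stocks, seed=0):
    """Simulate n_stocks independent stocks, each of which rose with
    probability p, and return the fraction that actually rose."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    rose = sum(1 for _ in range(n_stocks) if rng.random() < p)
    return rose / n_stocks


# The relative frequency tends to approach p as the sample grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(0.6, n))
```

With only 100 stocks the fraction can easily be off by several percentage points; with 100,000 it is typically within a fraction of a percentage point of 0.6.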
Both viewpoints are captured by the following mathematical formulation. Let $\Omega$ be the collection of all events; we call it the event algebra.¹⁵ A probability function is a function $\mathbb{P} : \Omega \to [0,1]$ such that:
1. $\mathbb{P}(S) = 1$.
2. $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty}\mathbb{P}(E_i)$, if the $E_i$ are pairwise disjoint events (i.e., $i \ne j$ implies $E_i \cap E_j = \emptyset$).

Let $\mathbb{P} : \Omega \to [0,1]$ be a probability function. Then it automatically has the following properties:
1. $\mathbb{P}(\emptyset) = 0$.
2. $\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n}\mathbb{P}(E_i)$, if the $E_i$ are pairwise disjoint events.
3. $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$, where $A^c$ denotes the complement of $A$ in $S$.
4. $\mathbb{P}\left(\bigcup_{i} E_i\right) \le \sum_{i}\mathbb{P}(E_i)$ for any collection of events $E_1, E_2, \dots$.

B.2 RANDOM VARIABLES

We return to our main question: How likely are different values (or ranges of values) of a random variable $X : S \to \mathbb{R}$? If we just plug in the definition of a random variable, we realise that our question can be phrased as follows: What is the probability of the event whose outcomes correspond to a given value (or range of values) of $X$? Thus, suppose we are asked: What is the probability that $X$ takes on a value greater than 100? This is to be interpreted as: What is the probability of the event consisting of the outcomes $t$ that satisfy $X(t) > 100$? That is,
$$\mathbb{P}(X > 100) = \mathbb{P}(\{t : X(t) > 100\}).$$

It is convenient to consider two types of random variables: those whose values vary discretely (discrete random variables) and those whose values vary continuously (continuous random variables).¹⁶

Example B.2.1
1. Let us allot +1 to the stocks whose value rose on a given day, and −1 to those whose value fell. Then we have created a random variable whose possible values are ±1. This is a discrete random variable.
2. Let us allot the whole number $+n$ to the stocks whose value rose by between $n$ and $n+1$ on a given day, and $-n$ to those whose value fell by between $n$ and $n+1$. Then we have created a random variable whose possible values are all the integers. This is also a discrete random variable.
3.
If, in the previous example, we let $X$ be the actual change in value, it is still discrete (since all changes are in multiples of Rs 0.01). However, now the values are so close together that it is simpler to ignore the discreteness and model $X$ as a continuous random variable. □

DISCRETE RANDOM VARIABLES

Let $S$ be the sample space, $\Omega$ the event algebra, and $\mathbb{P} : \Omega \to [0,1]$ a probability function. Let $X : S \to \mathbb{R}$ be a discrete random variable with range $x_1, x_2, \dots$ (the range can be finite or infinite). The probability of any particular value $x$ is
$$\mathbb{P}(X = x) = \mathbb{P}(\{t \in S : X(t) = x\}).$$
These values create a function $f_X : \mathbb{R} \to [0,1]$,
$$f_X(x) = \mathbb{P}(X = x),$$
which is called the probability distribution function of $X$. We shall also refer to it by the abbreviation pdf. We can find the probability of a range of values of $X$ by just summing up the probabilities of all the individual values in that range. For instance,
$$\mathbb{P}(a \le X \le b) = \sum_{a \le x_i \le b} f_X(x_i).$$
In particular, summing over the entire range gives
$$\sum_i f_X(x_i) = 1,$$
since the total probability must be 1.

Example B.2.2 (Discrete Uniform Distribution) Consider an $X$ whose range is $\{0,1,2,\dots,n\}$, with each value equally likely. Then its pdf is given by
$$f_X(x) = \begin{cases} \dfrac{1}{n+1}, & x \in \{0,1,\dots,n\},\\ 0, & \text{otherwise.}\end{cases}$$
□

CONTINUOUS RANDOM VARIABLES

Suppose the values of a random variable $X$ vary continuously over some range, such as $[0,1]$. From a real-life viewpoint, since exact measurements of a continuously varying quantity are impossible, it is only reasonable to ask for the probability of an observation lying in a range, such as $(0.49, 0.51)$, rather than its having an exact value like 0.5. The notion of a probability distribution of a continuous random variable is developed with this in mind. Recall that in the discrete context the probability of a range of values was obtained by summing over that range. So, in the continuous case, we seek to obtain the probability of a range by integrating over it.
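The discrete recipe recalled above, summing the pdf over a range, can be sketched for the discrete uniform distribution of Example B.2.2. The helper names `discrete_uniform_pdf` and `prob_range` are illustrative; exact fractions are used so that the total probability comes out as exactly 1 rather than a rounded float.

```python
from fractions import Fraction


def discrete_uniform_pdf(x, n):
    """pdf of the discrete uniform distribution on {0, 1, ..., n} (Example B.2.2)."""
    return Fraction(1, n + 1) if x in range(n + 1) else Fraction(0)


def prob_range(pdf, values, a, b):
    """P(a <= X <= b) for a discrete X: sum pdf(x) over the values x lying in [a, b]."""
    return sum((pdf(x) for x in values if a <= x <= b), Fraction(0))


n = 9


def pdf(x):
    return discrete_uniform_pdf(x, n)


print(prob_range(pdf, range(n + 1), 2, 4))  # 3/10
print(prob_range(pdf, range(n + 1), 0, n))  # 1 (total probability)
```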
Given a continuous random variable $X$, we define its probability density function (or pdf) to be a function $f_X : \mathbb{R} \to [0,\infty)$ such that for any $a, b$ with $a \le b$,
$$\mathbb{P}(a \le X \le b) = \int_a^b f_X(x)\,dx.$$
In particular,
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
From this definition, it also follows that any individual value of a continuous random variable has zero probability:
$$\mathbb{P}(X = a) = \int_a^a f_X(x)\,dx = 0.$$

Remark Thus the number $f_X(x)$ does not represent the probability that $X = x$. Individual values of $f_X$ have no significance; only the integrals of $f_X$ do! (Contrast this with the discrete case.)

Example B.2.3 (Continuous Uniform Distribution) A continuous random variable $X$ is called uniform if its probability density function has the form
$$f_X(x) = \begin{cases}\dfrac{1}{b-a}, & a \le x \le b,\\ 0, & \text{otherwise.}\end{cases}$$
For then, with $a \le s \le t \le b$,
$$\mathbb{P}(s \le X \le t) = \int_s^t \frac{1}{b-a}\,dx = \frac{t-s}{b-a}.$$
Thus the probability of $X$ taking a value in the interval $[s,t]$ is proportional to its length, and does not depend on its location. The interval $[a,b]$ is called the range of $X$. Consider the picture below. Since $I_1$ and $I_2$ have the same length, their probabilities are equal. As $I_3$ has twice their length, its probability is also double theirs.

[Figure: intervals $I_1$, $I_2$, $I_3$ inside $[a,b]$, and the graph of $f_X$, equal to $1/(b-a)$ on $[a,b]$ and 0 elsewhere.] □

If two random variables $X, Y$ have the same pdf, we write $X \overset{d}{=} Y$ and say that they have the same distribution. Note that this does not mean the two random variables are equal. They may not even have the same sample space.

Exercise B.2.4 Let $X$ count the number of heads arising out of a single toss of a fair coin, and $Y$ the number of tails. Then $X \ne Y$ but $X \overset{d}{=} Y$.

B.3 CUMULATIVE DISTRIBUTION FUNCTION

Let $S$ be a sample space, $\mathbb{P}$ a probability function on its event algebra, and $X : S \to \mathbb{R}$ a random variable. Then the cumulative distribution function or cdf of $X$, denoted $F_X$, is defined by:
$$F_X(x) = \mathbb{P}(X \le x).$$
Thus $F_X$ is a function from $\mathbb{R}$ into the interval $[0,1]$. It is easy to see that
$$F_X(x) = \sum_{x_i \le x} f_X(x_i) \ \ (X \text{ discrete}), \qquad F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \ \ (X \text{ continuous}),$$
where $f_X$ is the pdf of $X$.

The cdf has certain advantages over the pdf. For one, it is defined in a uniform manner for all random variables, whether discrete or continuous.
It may happen that we wish to analyse whether the values taken by one random variable are likely to be close to those taken by another. If one is discrete and the other continuous, the pdfs do not provide a convenient way to test this, but the cdfs do. When $X$ is a continuous random variable, the cdf has the additional advantage of being defined explicitly, while the pdf is defined indirectly by means of a property it should possess. This makes it easier to find a formula for the cdf than for the pdf. Often we first find the cdf, and then exploit the following equation to obtain the pdf (in the continuous case):
$$f_X(x) = F_X'(x).$$
This relation follows from the Fundamental Theorem of Calculus (page 200).

Exercise B.3.1 Let $F_X$ be the cdf of a continuous random variable $X$, and let $a > 0$. Then the cdf of $aX + b$ satisfies
$$F_{aX+b}(x) = F_X\!\left(\frac{x-b}{a}\right).$$

Exercise B.3.2 Let $f_X$ be the pdf of a continuous random variable $X$, and let $a \ne 0$. Then the pdf of $aX + b$ satisfies
$$f_{aX+b}(x) = \frac{1}{|a|}\,f_X\!\left(\frac{x-b}{a}\right).$$

Exercise B.3.3 Let $f_X$ be the pdf of a continuous random variable $X$. Then the cdf and pdf of $X^2$ satisfy (for $x > 0$):
$$F_{X^2}(x) = F_X(\sqrt{x}) - F_X(-\sqrt{x}), \qquad f_{X^2}(x) = \frac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}}.$$

Example B.3.4 Let $X$ be a discrete random variable taking values 0, 1, 2, each with probability 1/3. Then its cdf is
$$F_X(x) = \begin{cases} 0, & x < 0,\\ 1/3, & 0 \le x < 1,\\ 2/3, & 1 \le x < 2,\\ 1, & x \ge 2.\end{cases}$$
The graph of $F_X$ is a staircase with jumps of size 1/3 at $x = 0, 1, 2$. □

Example B.3.5 Let $X$ be a continuous random variable whose probability density function is: The cdf of $X$ is The graph of $F_X$ is: □

Figure B.1: The first diagram shows how to read off quartiles from a cdf plot. The second shows the median and interquartile range marked against the density function.

These two examples illustrate the basic properties of the cdf $F_X$:
1. $F_X$ is an increasing (that is, non-decreasing) function: $s \ge t$ implies $F_X(s) \ge F_X(t)$.
2. $F_X$ need not be strictly increasing: $s > t$ does not imply $F_X(s) > F_X(t)$.
3. When $X$ is discrete, $F_X$ is not continuous. There are jumps at the points where $X$ has non-zero probability.
4. When $X$ is continuous, $F_X$ is continuous.
5. Since $F_X$ may have repeating or missing values, we cannot always define its inverse function. However, we can get something close to it.
Given a number $q \in (0,1)$, the $q$-quantile of $X$ is defined to be the least value $x_q$ such that $F_X(x_q) \ge q$. When $X$ is continuous, so is $F_X$, and then $x_q$ is the least value such that $F_X(x_q) = q$. The numbers $x_{0.25}$, $x_{0.5}$ and $x_{0.75}$ are called quartiles. The number $x_{0.5}$ is also called the median and is used to estimate the centre of the values of the distribution. The difference $x_{0.75} - x_{0.25}$ is called the interquartile range; it measures the spread of the values of the distribution (see Figure B.1).

Exercise B.3.6 Identify the quartiles and interquartile range for the random variable in Example B.3.4.

Exercise B.3.7 Identify the quartiles and interquartile range for the random variable in Example B.3.5.

The median and interquartile range provide a summary of the basic features of the random variable. Unfortunately, they are not so well suited to the study of combinations of random variables. For example, suppose a random variable is the sum of two random variables whose medians are known. This knowledge does not suffice to determine the median of the sum. We shall therefore develop more sophisticated measures of the centre and the spread. These will be called expectation and variance, respectively.

Now we shall look at two kinds of random variables, one discrete and one continuous, which are especially important.

B.4 BINOMIAL RANDOM VARIABLE

Consider a random variable $X$ which can take on only two values, say 0 and 1 (the choice of values is not important). Suppose the probability of the value 0 is $q$ and of 1 is $p$. Then we have:
1. $0 \le p, q \le 1$,
2. $p + q = 1$.

Suppose we observe $X$ independently $n$ times. What are the likely distributions of 0s and 1s? Specifically, we ask: What is the probability of observing 1 exactly $k$ times? We calculate as follows. Let us consider all possible combinations of $n$ 0s and 1s: the number of them containing exactly $k$ 1s is $\binom{n}{k}$. Recall that the symbol $\binom{n}{k}$ is read as "n choose k" and is given by
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$
where $n! = n(n-1)\cdots 2\cdot 1$. The probability of each individual combination with $k$ 1s is $p^k(1-p)^{n-k}$.
Therefore,
$$\mathbb{P}(\text{1 is observed exactly } k \text{ times}) = \binom{n}{k}p^k(1-p)^{n-k}.$$
We therefore say a random variable $Y$ has a binomial distribution with parameters $n$ and $p$ if it has range $0, 1, \dots, n$ and its probability distribution is:
$$f_Y(k) = \binom{n}{k}p^k(1-p)^{n-k}, \qquad k = 0, 1, \dots, n.$$
We call $Y$ a binomial random variable and write $Y \sim B(n,p)$. When $n = 1$, we say we have a Bernoulli random variable. As illustrated above, binomial distributions arise naturally wherever we are faced with a sequence of independent two-way choices. In finance, the binomial distribution is part of the Binomial Tree Model used to study stocks and options.

Figure B.2 depicts the pdfs of the binomial distributions with $p = 0.2$ and $p = 0.5$, using different types of squares for each. Both have $n = 10$.

Exercise B.4.1 In Figure B.2, identify which points correspond to $p = 0.2$ and which to $p = 0.5$.

Figure B.2: Two binomial distributions. Both have $n = 10$ while the $p$ values are 0.2 and 0.5.

B.5 NORMAL RANDOM VARIABLE

This kind of probability distribution is at once the most common in nature, among the easiest to work with mathematically, and theoretically at the heart of probability and statistical inference. Among its remarkable properties is that phenomena arising from the combined effect of many small, independent influences tend to be governed by it. When in doubt about the nature of a distribution, assume it is (nearly) normal, and you will usually get good results!

We first define the standard normal random variable. This has a probability density function of the form
$$f_Z(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$
It has the following 'bell-shaped' graph:

Exercise B.5.1 Can you explain the factor $1/\sqrt{2\pi}$? (Hint: Think about the requirement that total probability should be 1.)

Note that the graph is symmetric about the $y$-axis. The axis of symmetry can be moved to another position $m$ by replacing $x$ by $x - m$. In the following diagram, the dashed line represents the standard normal distribution:

Also, starting from the standard normal distribution, we can create one with a similar shape but bunched more tightly (or loosely) around the $y$-axis.
We achieve this by replacing $x$ with $x/s$ (and dividing the density by $s$, so that the total probability remains 1).

By combining both kinds of changes, we reach the definition of a general normal distribution: A random variable $X$ has a normal distribution with parameters $\mu, \sigma$ if its probability density function has the form
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$
We call $X$ a normal random variable and write $X \sim N(\mu,\sigma)$. The parameter $\mu$ can be any real number, while $\sigma$ has to be positive. The axis of symmetry of this distribution is determined by $\mu$, and its clustering about the axis of symmetry is controlled by $\sigma$.

Exercise B.5.2 Will increasing $\sigma$ make the graph more tightly bunched around the axis of symmetry? What will it do to the peak height of the graph?

Exercise B.3.2 shows that under shifts and scalings, normal variables stay normal.

Exercise B.5.3 Let $X \sim N(\mu,\sigma)$ and $a, b \in \mathbb{R}$ with $a \ne 0$. Then $aX + b \sim N(a\mu + b, |a|\sigma)$.

Exercise B.5.4 Show that $X \sim N(\mu,\sigma)$ if and only if
$$Z = \frac{X - \mu}{\sigma} \sim N(0,1).$$
Through this link, all questions about normal distributions can be converted to questions about the standard normal distribution. For instance, let $X, Z$ be as above. Then:
$$\mathbb{P}(a \le X \le b) = \mathbb{P}\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right).$$

In the empirical sciences, errors in observation tend to be normally distributed: they are clustered around zero, small errors are common, and very large errors are very rare. In this regard, observe from the graph that by ±3 the density of the standard normal distribution has essentially become zero: in fact, the probability of a standard normal variable taking on a value outside $[-3,3]$ is just 0.0027. In theoretical work, the normal distribution is the main tool in determining whether the gap between theoretical predictions and observed reality can be attributed solely to errors in observation.

B.6 EXPECTATION AND VARIANCE

Suppose we have some data consisting of numbers $x_i$, each occurring $f_i$ times. Then the total number of data points is
$$N = \sum_i f_i.$$
The average of this data is defined to be
$$\bar{x} = \frac{1}{N}\sum_i f_i x_i.$$
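As a small worked instance of the averaging formula, the sketch below uses made-up data (values 1, 2, 5 occurring 2, 3 and 5 times); the function names are illustrative. It also returns the relative frequencies $f_i/N$, which act as the weights in $\bar{x}$.

```python
def frequency_average(data):
    """Average of data given as a mapping {value x_i: frequency f_i}.

    Computes N = sum_i f_i and x_bar = (1/N) * sum_i f_i * x_i.
    """
    n_total = sum(data.values())
    return sum(x * f for x, f in data.items()) / n_total


def relative_frequencies(data):
    """The weights f_i / N attached to each value x_i."""
    n_total = sum(data.values())
    return {x: f / n_total for x, f in data.items()}


data = {1: 2, 2: 3, 5: 5}           # hypothetical data: value -> frequency
print(frequency_average(data))      # 3.3
print(relative_frequencies(data))   # {1: 0.2, 2: 0.3, 5: 0.5}
```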
Now, if we have a discrete random variable $X$, the probability $f_X(x_i)$ predicts the relative frequency with which $x_i$ will occur in a large number of observations of $X$; i.e., we view $f_X(x_i)$ as a prediction of $f_i/N$. And then
$$\sum_i x_i f_X(x_i)$$
becomes a predictor for the average $\bar{x}$ of the observations of $X$.

∙ The expectation of a discrete random variable $X$ is defined to be
$$E[X] = \sum_i x_i f_X(x_i).$$
On replacing the sum by an integral, we arrive at the notion of the expectation of a continuous random variable:

∙ The expectation of a continuous random variable $X$ is defined to be
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$
Expectation is also called mean and is denoted by $\mu_X$ or just $\mu$.

Exercise B.6.1 Make the following calculations:
1. $X$ has the discrete uniform distribution with range $0,\dots,n$. Then $E[X] = n/2$.
2. $X$ has the uniform distribution on $[0,1]$. Then $E[X] = 1/2$.
3. $X \sim B(n,p)$ implies $E[X] = np$.
4. $X \sim N(\mu,\sigma)$ implies $E[X] = \mu$.

Some elementary properties of expectation are:
1. $E[c] = c$, for any constant $c$. (A constant $c$ can be viewed as a discrete random variable whose range consists of the single value $c$.)
2. $E[cX] = c\,E[X]$, for any constant $c$.

Suppose $X : S \to \mathbb{R}$ is a random variable and $g : \mathbb{R} \to \mathbb{R}$ is any function. Then their composition $g \circ X : S \to \mathbb{R}$, defined by $(g \circ X)(w) = g(X(w))$, is a new random variable which we will call $g(X)$.

Example B.6.2 Let $g(x) = x^r$. Then $g \circ X$ is denoted $X^r$. □

Suppose $X$ is discrete with range $\{x_i\}$. Then the range of $g(X)$ is $\{g(x_i)\}$. Therefore we can calculate the expectation of $g(X)$ as follows:¹⁷
$$E[g(X)] = \sum_i g(x_i)\,f_X(x_i).$$
If $X$ is continuous, one has the analogous formula:
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx.$$

Example B.6.3 Let $g(x) = x^2$. Then
$$E[X^2] = \sum_i x_i^2 f_X(x_i) \ \ (\text{discrete}), \qquad E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx \ \ (\text{continuous}).$$
□

With these facts in hand, the following result is easy to prove.

Exercise B.6.4 Let $X$ be any random variable, and $g, h$ two real functions. Then $E[g(X) + h(X)] = E[g(X)] + E[h(X)]$.

VARIANCE

Given some data $x_i$ with frequencies $f_i$, its average $\bar{x}$ is seen as a central value about which the data is clustered. The significance of the average is greater if the clustering is tight, and less otherwise.
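To measure the tightness of the clustering, we use the variance of the data:
$$\frac{1}{N}\sum_i f_i\,(x_i - \bar{x})^2.$$
Variance is just the average of the squared distance from each data point to the average (of the data). Therefore, in the analogous situation where we have a random variable $X$, if we wish to know how close to its expectation its values are likely to be, we again define a quantity called the variance of $X$:
$$\mathrm{Var}[X] = E\big[(X - E[X])^2\big].$$
Alternate notation for variance is $\sigma_X^2$ or just $\sigma^2$. The quantity $\sigma_X$ or $\sigma$, the (non-negative) square root of the variance, is called the standard deviation of $X$. Its advantage is that it is in the same units as $X$.

Exercise B.6.5 Note that $\mathrm{Var}[X] \ge 0$, so $\sigma$ is defined. When can we have $\mathrm{Var}[X] = 0$?

Exercise B.6.6 Will a larger value of variance indicate tighter clustering around the mean?

Sometimes it is convenient to use the following alternative formula for variance:
$$\mathrm{Var}[X] = E[X^2] - E[X]^2.$$
This is obtained as follows. Writing $\mu = E[X]$,
$$\mathrm{Var}[X] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$$

The elementary properties of variance are:
1. $\mathrm{Var}[X + a] = \mathrm{Var}[X]$, if $a$ is any constant.
2. $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$, if $a$ is any constant.

Exercise B.6.7 Will it be correct to say that $\sigma_{aX} = a\,\sigma_X$ for any constant $a$?

Exercise B.6.8 Let $X$ be a random variable with expectation $\mu$ and standard deviation $\sigma > 0$. Then $(X - \mu)/\sigma$ has expectation 0 and standard deviation 1.

Exercise B.6.9 Suppose $X$ has the discrete uniform distribution with range $\{0,\dots,n\}$. We have seen that $E[X] = n/2$. Show that its variance is $n(n+2)/12$.

Exercise B.6.10 Suppose $X$ has the continuous uniform distribution with range $[0,1]$. We have seen that $E[X] = 1/2$. Show that its variance is $1/12$.

Example B.6.11 Suppose $X \sim B(n,p)$. We know $E[X] = np$. Moreover, using $k(k-1)\binom{n}{k} = n(n-1)\binom{n-2}{k-2}$,
$$E[X(X-1)] = \sum_{k=2}^{n} k(k-1)\binom{n}{k}p^k(1-p)^{n-k} = n(n-1)p^2\sum_{j=0}^{n-2}\binom{n-2}{j}p^j(1-p)^{n-2-j} = n(n-1)p^2.$$
Therefore,
$$E[X^2] = E[X(X-1)] + E[X] = n(n-1)p^2 + np.$$
And so,
$$\mathrm{Var}[X] = n(n-1)p^2 + np - n^2p^2 = np(1-p).$$
□

Example B.6.12 Suppose $X \sim N(\mu,\sigma)$. We know $E[X] = \mu$. Therefore, substituting $z = (x-\mu)/\sigma$,
$$\mathrm{Var}[X] = \int_{-\infty}^{\infty}(x-\mu)^2\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} z^2 e^{-z^2/2}\,dz.$$
Now we integrate by parts:
$$\int_{-\infty}^{\infty} z\cdot z e^{-z^2/2}\,dz = \left[-z e^{-z^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-z^2/2}\,dz = 0 + \sqrt{2\pi}.$$
Therefore $\mathrm{Var}[X] = \sigma^2$. □

Now we can also illustrate our earlier statement about how the normal distribution serves as a substitute for other distributions.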
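A quick numerical version of this illustration: the sketch below evaluates the binomial pdf for $n = 100$, $p = 0.2$ next to the normal density with the same mean and variance, $\mu = np = 20$ and $\sigma = \sqrt{np(1-p)} = 4$ (the case plotted in Figure B.3). It assumes Python 3.8 or later for `math.comb`; the helper names are ours, and, as in the text, $\sigma$ denotes the standard deviation.

```python
import math


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, with sigma the standard deviation."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


n, p = 100, 0.2
mu = n * p                          # np = 20
sigma = math.sqrt(n * p * (1 - p))  # sqrt(np(1 - p)) = sqrt(16) = 4

# Side by side: k, binomial probability, normal density at k.
for k in (12, 16, 20, 24, 28):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```

Here $np = 20$ and $n(1-p) = 80$, both well above 5, so the two columns should agree closely.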
Figure B.3 compares the pdf of a binomial distribution ($n = 100$ and $p = 0.2$) with that of a normal distribution with the same mean and variance ($\mu = np = 20$ and $\sigma^2 = np(1-p) = 16$). Generally, the normal distribution is a good approximation to the binomial distribution if $n$ is large. One criterion that is often used is that, for a reasonable approximation, both $np$ and $n(1-p)$ should be greater than 5.

Figure B.3: Normal approximation to a binomial distribution.

B.7 LOGNORMAL RANDOM VARIABLE

If $X \sim N(\mu,\sigma)$, then $Y = e^X$ is called a lognormal random variable with parameters $\mu$ and $\sigma$. The name comes from "the log of $Y$ is normal."

Exercise B.7.1 Let $X \sim N(\mu,\sigma)$. Then
$$E\big[e^{tX}\big] = e^{\mu t + \sigma^2 t^2/2}.$$
The $t = 1, 2$ cases of the Exercise immediately give the following: If $Y$ is a lognormal variable with parameters $\mu$ and $\sigma$, then
$$E[Y] = e^{\mu + \sigma^2/2}, \qquad \mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right).$$

Lognormal random variables are used in finance to model the variation of stock prices with time. The fact that they can only take positive values is one factor that makes them suitable for this. Figure B.4 shows the probability density functions of some lognormal random variables. Note that as $\sigma$ becomes smaller, the shape more closely approximates that of a normal random variable.

Figure B.4: Lognormal density functions: (A) $\mu = 0, \sigma = 1$, (B) $\mu = 1, \sigma = 1$, (C) $\mu = 0, \sigma = 0.4$.

B.8 CAUCHY RANDOM VARIABLE

A Cauchy random variable with parameters $\delta, \gamma$ is given by the probability density function
$$f_X(x) = \frac{\gamma}{\pi\left(\gamma^2 + (x-\delta)^2\right)}, \qquad \text{(B.1)}$$
where $\delta$ and $\gamma$ are called the location and scale parameters. The location parameter $\delta$ can be any real number, while the scale parameter $\gamma$ has to be positive.

Figure B.5: Comparison of the pdfs of a Cauchy and a normal random variable (the dashed curve). The Cauchy pdf is thinner in the middle and does not die out as quickly on the sides.

Exercise B.8.1 Verify that Equation B.1 defines a probability density function. Further, show that a Cauchy random variable has no mean or variance.

Exercise B.8.2 Let $X$ have a Cauchy distribution with parameters $\delta, \gamma$.
Show that the median of $X$ is $\delta$ while the interquartile range is $2\gamma$. (This justifies the names of the parameters.)

Exercise B.8.3 Let $X$ have a Cauchy distribution with parameters $\delta, \gamma$. Show that $(X - \delta)/\gamma$ has a Cauchy distribution with parameters 0 and 1.

Figure B.5 compares the probability density functions of a Cauchy and a normal random variable. The Cauchy probability density function is thinner in the middle and does not die out as quickly on the sides. These fat or heavy tails make it suitable for modeling phenomena where extreme events are somewhat likely. They are also the reason that its mean and variance do not exist.

B.9 BIVARIATE DISTRIBUTIONS

So far we have dealt with individual random variables. The expectation and variance of a random variable give us information on where, and to what extent, the values of the random variable are concentrated. An investor may use these to estimate likely profits from her portfolio as well as the possible fluctuations in this profit.

The next step is to study the relationships that exist between different random variables. For instance, our investor might like to know the nature and strength of the connection between her portfolio and a stock market index such as the NIFTY. If the NIFTY goes up, is her portfolio likely to do the same? How much of a rise can she hope for? This leads us to the study of pairs of random variables and the probabilities associated with their joint values. Are high values of one associated with high or low values of the other? Is there a significant connection at all?

Let $S$ be a sample space and $X, Y : S \to \mathbb{R}$ two random variables. Then we say that $X, Y$ are jointly distributed.

Let $X, Y$ be jointly distributed discrete random variables. The joint probability distribution function (or joint pdf) $f_{X,Y}$ of $X$ and $Y$ is defined by
$$f_{X,Y}(x,y) = \mathbb{P}(X = x, Y = y).$$
Since $f_{X,Y}$ is a function of two variables, we call it a bivariate distribution. It can be used to find any probability associated with $X$ and $Y$.
Let $X$ have range $\{x_i\}$ and $Y$ have range $\{y_j\}$. Then:
1. $\sum_i\sum_j f_{X,Y}(x_i,y_j) = 1$.
2. $f_X(x_i) = \sum_j f_{X,Y}(x_i,y_j)$.
3. $f_Y(y_j) = \sum_i f_{X,Y}(x_i,y_j)$.

We will be interested in various combinations of $X$ and $Y$. Therefore, consider a function $g : \mathbb{R}^2 \to \mathbb{R}$. We use it to define a new random variable $g(X,Y) : S \to \mathbb{R}$ by $g(X,Y)(w) = g(X(w), Y(w))$. The expectation of this new random variable can be obtained, as usual, by multiplying its values with their probabilities:
$$E[g(X,Y)] = \sum_i\sum_j g(x_i,y_j)\,f_{X,Y}(x_i,y_j).$$

We create analogous definitions when $X, Y$ are jointly distributed continuous random variables. Their joint probability density function (or joint pdf) $f_{X,Y}$ is a function of two variables whose integrals give the probability of $X, Y$ lying in any range:
$$\mathbb{P}(a \le X \le b,\ c \le Y \le d) = \int_a^b\int_c^d f_{X,Y}(x,y)\,dy\,dx.$$
Then the following are easy to prove:
1. $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$.
2. $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx$.

Suppose $X, Y$ are jointly distributed continuous random variables and we have a function $g : \mathbb{R}^2 \to \mathbb{R}$. Then we define $g(X,Y)$ as before, and its expectation is given by
$$E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy.$$

Suppose $X, Y$ are jointly distributed random variables. Then expectation distributes over their sum:
$$E[X + Y] = E[X] + E[Y].$$
We shall write the proof for the case when $X, Y$ are both discrete:
$$E[X+Y] = \sum_i\sum_j (x_i + y_j)\,f_{X,Y}(x_i,y_j) = \sum_i x_i\sum_j f_{X,Y}(x_i,y_j) + \sum_j y_j\sum_i f_{X,Y}(x_i,y_j) = \sum_i x_i f_X(x_i) + \sum_j y_j f_Y(y_j) = E[X] + E[Y].$$

Exercise B.9.1 Show that expectation distributes over the sum of two jointly distributed continuous random variables.

The behaviour of variance with respect to sums is slightly more complicated. We shall need to introduce the concept of covariance to describe it. Let $X, Y$ be jointly distributed random variables. Then the covariance of $X$ and $Y$ is defined to be
$$E[(X - \mu_X)(Y - \mu_Y)].$$
It is denoted by $\mathrm{Cov}[X,Y]$ or $\sigma_{XY}$.

Suppose large values of $X$ tend to go with large values of $Y$, and small values of $X$ with small values of $Y$. Then $X - \mu_X$ and $Y - \mu_Y$ will generally have the same sign, hence their product will tend to be positive, and its average, which is the covariance, will be positive. On the other hand, if large values of $X$ tend to go with small values of $Y$, and small values of $X$ with large values of $Y$, then $X - \mu_X$ and $Y - \mu_Y$ will generally have opposite signs.
In this case, covariance will be negative. A zero covariance indicates that $X$ and $Y$ have no linear relation, although they may still be related in some other way. (See Figure B.6.)

Exercise B.9.2 $\mathrm{Cov}[X,X] = \mathrm{Var}[X]$.

Exercise B.9.3 $\mathrm{Cov}[X,Y] = \mathrm{Cov}[Y,X]$.

Exercise B.9.4 $\mathrm{Cov}[aX,Y] = a\,\mathrm{Cov}[X,Y]$.

Exercise B.9.5 $\mathrm{Cov}[X+Y,Z] = \mathrm{Cov}[X,Z] + \mathrm{Cov}[Y,Z]$.

Exercise B.9.6 $\mathrm{Cov}[X,Y] = E[XY] - E[X]\,E[Y]$.

Exercise B.9.7 $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]$.

Figure B.6: Observed values of two jointly distributed standard normal variables with covariances 0.95, −0.6 and 0, respectively.

Compare the last exercise with the identity connecting the dot product and length of vectors:
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2\,u\cdot v.$$
This motivates us to think of covariance as a kind of dot product between random variables, with variance as squared length. Geometric analogies then lead to useful statistical insights, such as the next statement and its proof.

A very important fact is that $|\mathrm{Cov}[X,Y]| \le \sigma_X\sigma_Y$. This is an analogue of the geometric fact that $|u\cdot v| \le \|u\|\,\|v\|$. We first verify this when $\mathrm{Var}[X] = \mathrm{Var}[Y] = 1$:
$$0 \le \mathrm{Var}[X \pm Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] \pm 2\,\mathrm{Cov}[X,Y] = 2 \pm 2\,\mathrm{Cov}[X,Y],$$
so $-1 \le \mathrm{Cov}[X,Y] \le 1$. If $X, Y$ are arbitrary (with $\sigma_X, \sigma_Y > 0$), then $X/\sigma_X$ and $Y/\sigma_Y$ have variance 1, and so
$$1 \ge \left|\mathrm{Cov}\!\left[\frac{X}{\sigma_X}, \frac{Y}{\sigma_Y}\right]\right| = \frac{|\mathrm{Cov}[X,Y]|}{\sigma_X\sigma_Y},$$
which implies $|\mathrm{Cov}[X,Y]| \le \sigma_X\sigma_Y$.

The correlation coefficient of $X, Y$ is defined to be
$$\rho = \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sigma_X\sigma_Y}.$$
The inequality $|\mathrm{Cov}[X,Y]| \le \sigma_X\sigma_Y$ immediately implies $-1 \le \rho_{X,Y} \le 1$. The advantage of the correlation coefficient is that it is not affected by the units used in the measurement:

Exercise B.9.8 If we replace $X$ by $X' = aX + b$ and $Y$ by $Y' = cY + d$, where $a$ and $c$ are nonzero and have the same sign, we will have $\rho_{X',Y'} = \rho_{X,Y}$.

Exercise B.9.9 Suppose a die is tossed once. Let $X$ take on the value 1 if the result is at most 4 and the value 0 otherwise. Similarly, let $Y$ take on the value 1 if the result is even and the value 0 otherwise.
1. Show that the values of $f_{X,Y}$ are given by: $f_{X,Y}(0,0) = 1/6$, $f_{X,Y}(0,1) = 1/6$, $f_{X,Y}(1,0) = 1/3$, $f_{X,Y}(1,1) = 1/3$.
2. Show that $\mu_X = 2/3$ and $\mu_Y = 1/2$.
3. Show that $\mathrm{Cov}[X,Y] = 0$.
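Exercise B.9.9 can be checked by brute force. The sketch below enumerates the six faces of a fair die, builds $f_{X,Y}$ as a dictionary of exact fractions, and computes $\mu_X$, $\mu_Y$ and the covariance both from the definition and from the formula of Exercise B.9.6. The function names `die_joint_pdf` and `expectation` are our own.

```python
from fractions import Fraction


def die_joint_pdf():
    """Joint pdf f_{X,Y} for one toss of a fair die (Exercise B.9.9):
    X = 1 if the result is at most 4, else 0; Y = 1 if the result is even, else 0."""
    joint = {}
    for face in range(1, 7):
        x = 1 if face <= 4 else 0
        y = 1 if face % 2 == 0 else 0
        joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 6)
    return joint


def expectation(joint, g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * f_{X,Y}(x, y)."""
    return sum((g(x, y) * prob for (x, y), prob in joint.items()), Fraction(0))


joint = die_joint_pdf()
mu_x = expectation(joint, lambda x, y: x)
mu_y = expectation(joint, lambda x, y: y)
cov = expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))    # definition
cov_alt = expectation(joint, lambda x, y: x * y) - mu_x * mu_y    # Exercise B.9.6

print(mu_x, mu_y, cov, cov_alt)  # 2/3 1/2 0 0
```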
B.10 CONDITIONAL PROBABILITY
Consider a sample space S with event algebra Ω and a probability function $\mathbb{P}:\Omega\to[0,1]$. Let $A,B\in\Omega$ be events. If we know B has occurred, what is the probability that A has also occurred? Since we know B has occurred, in effect B has become our sample space. Therefore all probabilities of events inside B should be multiplied by $1/\mathbb{P}(B)$, to keep the total probability at 1. As for the occurrence of A, the points outside B are irrelevant, so our answer should be $\mathbb{P}(A\cap B)$ times the correcting factor $1/\mathbb{P}(B)$. The conditional probability of A, given B, is therefore defined to be
$$\mathbb{P}(A\mid B) = \frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}.$$
Note that this requires $\mathbb{P}(B)>0$.

We now apply this idea to random variables. Let X,Y be jointly distributed discrete random variables. Then we have
$$\mathbb{P}(Y=y\mid X=x) = \frac{\mathbb{P}(X=x,\ Y=y)}{\mathbb{P}(X=x)} = \frac{f_{X,Y}(x,y)}{f_X(x)}.$$
Hence, the conditional probability distribution (or conditional pdf) of Y, given X = x, is defined to be
$$f_{Y\mid X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)} \qquad (f_X(x)>0).$$
This definition also serves to define the conditional probability density function (conditional pdf) when X,Y are continuous.

The conditional pdf is a valid pdf in its own right. For example, in the discrete case, we have
1. $0\le f_{Y\mid X=x}(y)\le1$,
2. $\sum_j f_{Y\mid X=x}(y_j) = \dfrac{1}{f_X(x)}\sum_j f_{X,Y}(x,y_j) = \dfrac{f_X(x)}{f_X(x)} = 1$.

Since $f_{Y\mid X=x}$ is a pdf, we can use it to define expectations. The conditional expectation of Y, given X = x, is defined to be
$$E[Y\mid X=x] = \sum_j y_j\,f_{Y\mid X=x}(y_j)\ \text{(discrete case)}, \qquad E[Y\mid X=x] = \int_{-\infty}^{\infty} y\,f_{Y\mid X=x}(y)\,dy\ \text{(continuous case)}.$$
Note that $E[Y\mid X=x]$ is a function of x. It is also denoted by $\mu_{Y\mid X=x}$ or $\mu_{Y\mid x}$, and its graph is called the regression curve or curve of regression of Y on X.

Example B.10.1 In finance, it is often reasonable to model the rates of return of certain assets as being normally distributed. The multivariate normal distribution provides a way to model multiple normally distributed variables with desired correlations. For the time being, we shall only describe a special case of this distribution. Namely, we take two jointly distributed standard normal variables, and assume their joint pdf to be given by
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\right),$$
where $-1<\rho<1$.
X and Y are then called a bivariate normal pair. We leave the following for you to verify.

Exercise B.10.2 Verify that:
1. The function $f_{X,Y}$ is a valid joint pdf.
2. X and Y are indeed standard normal.
3. The parameter ρ in the definition is equal to the correlation coefficient of X and Y.

Figure B.7: On the left is a plot of the joint pdf of a bivariate normal pair of variables X,Y with ρ = 0.7. On the right is a contour plot of the same pdf together with the line of regression y = ρx of Y on X. The dashed vertical line shows that for a fixed value X = x, the line of regression gives the mean of Y.

The conditional probability distribution of Y, given X = x, is therefore given by:
$$f_{Y\mid X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{1}{\sqrt{2\pi(1-\rho^2)}}\exp\left(-\frac{(y-\rho x)^2}{2(1-\rho^2)}\right).$$
This is the pdf of a normal variable with mean ρx and variance $1-\rho^2$. Hence the conditional expectation is $E[Y\mid X=x] = \rho x$. The curve of regression of Y on X is the straight line y = ρx. □

Exercise B.10.3 In the above example, what is the curve of regression of X on Y? Is it the same as the curve of regression of Y on X?

Figure B.8: Regression curve for a bivariate normal pair of variables with ρ = 0.7, plotted against a large number of observations of their paired values. Note how, for any small range of x, the numbers of observations above and below the line are approximately equal.

The function $E[Y\mid X=x]$ creates a new random variable $E[Y\mid X]$ as follows: for each outcome w of the sample space, we first evaluate x = X(w), and this defines a number $E[Y\mid X=x]$ which depends on w. This is expressed by the equation
$$E[Y\mid X](w) = E[Y\mid X=X(w)].$$

Exercise B.10.4 Consider the bivariate normal pair of variables from Example B.10.1. Show that $E[Y\mid X] = \rho X$.

Below, we calculate the expectation of this new random variable in the continuous case:
$$E\big[E[Y\mid X]\big] = \int_{-\infty}^{\infty}E[Y\mid X=x]\,f_X(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,\frac{f_{X,Y}(x,y)}{f_X(x)}\,f_X(x)\,dy\,dx = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = E[Y].$$
Similar calculations can be carried out in the discrete case, so that we have the general result:
$$E[Y] = E\big[E[Y\mid X]\big].$$
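This result is useful when we deal with experiments carried out in stages, and we have information on how the results of one stage depend on those of the previous ones.

B.11 INDEPENDENCE
Let X,Y be jointly distributed random variables. We consider Y to be independent of X if knowledge of the value taken by X tells us nothing about the value taken by Y. Mathematically, this means $f_{Y\mid X=x}(y) = f_Y(y)$. This is easily rearranged to:
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y) \qquad \text{(B.2)}$$
Therefore we formally define two jointly distributed random variables X,Y to be independent if they satisfy the identity (B.2). Note that the definition is symmetric in X,Y.

Exercise B.11.1 If X,Y are independent random variables and $g:\mathbb{R}^2\to\mathbb{R}$ is any function of the form $g(x,y)=m(x)\,n(y)$, then $E[g(X,Y)] = E[m(X)]\,E[n(Y)]$.

Exercise B.11.2 If X,Y are independent, then $\mathrm{Cov}[X,Y]=0$.

A common error is to think that zero covariance implies independence; in fact it only indicates the possibility of independence. For example, if X is standard normal and $Y=X^2$, then $\mathrm{Cov}[X,Y] = E[X^3]-E[X]\,E[X^2] = 0$, although Y is completely determined by X.

Exercise B.11.3 If X,Y are independent, then $\mathrm{Var}[X+Y] = \mathrm{Var}[X]+\mathrm{Var}[Y]$.

We return again to the normal distribution. The following fact about it is key to its wide usability: if $X_i\sim N(\mu_i,\sigma_i)$, i = 1,2, are independent, then
$$X_1+X_2 \sim N\left(\mu_1+\mu_2,\ \sqrt{\sigma_1^2+\sigma_2^2}\right).$$
Of course, the mean and variance would add up like this for any collection of independent random variables. The important feature here is the preservation of normality. We shall use the cdf to demonstrate this. First, let X,Y be any two jointly distributed continuous random variables. Then the cdf of X + Y can be expressed as follows:
$$F_{X+Y}(z) = \mathbb{P}(X+Y\le z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z-x} f_{X,Y}(x,y)\,dy\,dx.$$
We differentiate to obtain the density function:
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_{X,Y}(x,z-x)\,dx.$$
Now let $X\sim N(0,\sigma)$ and $Y\sim N(0,1)$ be independent normal variables. Then, from the above calculation, and completing the square in x,
$$f_{X+Y}(z) = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/(2\sigma^2)}\,\frac{1}{\sqrt{2\pi}}e^{-(z-x)^2/2}\,dx = \frac{e^{-z^2/(2(1+\sigma^2))}}{2\pi\sigma}\int_{-\infty}^{\infty}\exp\left(-\frac{1+\sigma^2}{2\sigma^2}\Big(x-\frac{\sigma^2 z}{1+\sigma^2}\Big)^2\right)dx = \frac{1}{\sqrt{2\pi(1+\sigma^2)}}\,e^{-z^2/(2(1+\sigma^2))}.$$
Hence $X+Y\sim N\big(0,\sqrt{1+\sigma^2}\big)$. It is easy to generalise from this particular case:

Exercise B.11.4 Let $X_i\sim N(\mu_i,\sigma_i)$ be independent (i = 1,2). Then $X_1+X_2\sim N(\mu,\sigma)$ where $\mu=\mu_1+\mu_2$ and $\sigma^2=\sigma_1^2+\sigma_2^2$.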
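Exercise B.11.4 can be illustrated (not proved) by simulation. The Python sketch below draws independent normal pairs with Python's standard `random.Random.gauss`, whose second argument, like the book's N(μ,σ), is the standard deviation. The parameter values are hypothetical and chosen only for illustration. It compares the sample mean and standard deviation of $X_1+X_2$ with $\mu_1+\mu_2$ and $\sqrt{\sigma_1^2+\sigma_2^2}$, and checks one normal-distribution landmark: about 68.27% of values should fall within one standard deviation of the mean.

```python
import math
import random
import statistics

def simulate_sum(mu1, sigma1, mu2, sigma2, n, seed=0):
    """Draw n values of X1 + X2 with X1 ~ N(mu1, sigma1), X2 ~ N(mu2, sigma2) independent.
    As in the text, the second argument of N is the standard deviation."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

# Hypothetical parameters, e.g. two independent weekly returns
mu1, sigma1, mu2, sigma2 = 0.01, 0.03, 0.02, 0.04
sums = simulate_sum(mu1, sigma1, mu2, sigma2, n=200_000)

mu = mu1 + mu2                            # predicted mean: 0.03
sigma = math.sqrt(sigma1**2 + sigma2**2)  # predicted standard deviation: 0.05

sample_mean = statistics.fmean(sums)
sample_sd = statistics.stdev(sums)
# Fraction of simulated values within one predicted standard deviation of the mean
within_one_sd = sum(abs(s - mu) <= sigma for s in sums) / len(sums)
print(sample_mean, sample_sd, within_one_sd)
```

Matching one quantile like this is consistent with normality but does not establish it. The exercise above is what guarantees that the sum is exactly normal.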
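The next exercise is for those who enjoy a mathematical challenge (or at least one in integration); it is not essential to our work.

Exercise B.11.5 Let X,Y be independent random variables following a Cauchy distribution with location parameter δ = 0 and scale parameter γ = 1. Show that X + Y has a Cauchy distribution with δ = 0 and γ = 2.

B.12 MULTIVARIATE DISTRIBUTIONS
The definitions and results about pairs of jointly distributed random variables are easily extended to the case when we have an arbitrary number of jointly distributed random variables. Let $X_1,\dots,X_n:S\to\mathbb{R}$ be jointly distributed. If they are all discrete, we define their joint probability distribution function (joint pdf) by $f(a_1,\dots,a_n) = \mathbb{P}[X_1=a_1,\dots,X_n=a_n]$. If they are all continuous, their joint probability density function is a function $f:\mathbb{R}^n\to\mathbb{R}$ with the property that
$$\mathbb{P}(a_1\le X_1\le b_1,\dots,a_n\le X_n\le b_n) = \int_{a_1}^{b_1}\cdots\int_{a_n}^{b_n} f(x_1,\dots,x_n)\,dx_n\cdots dx_1.$$
The joint pdf f can be used to obtain the joint pdf of any subset of $X_1,\dots,X_n$. For instance, when the $X_i$ are discrete, the calculation
$$\mathbb{P}[X_1=a_1,\dots,X_{n-1}=a_{n-1}] = \sum_{a_n}\mathbb{P}[X_1=a_1,\dots,X_{n-1}=a_{n-1},X_n=a_n]$$
shows that the joint pdf of $X_1,\dots,X_{n-1}$ is
$$(a_1,\dots,a_{n-1})\mapsto\sum_{a_n} f(a_1,\dots,a_{n-1},a_n).$$
Similarly, if they are continuous, the joint pdf of $X_1,\dots,X_{n-1}$ is
$$(x_1,\dots,x_{n-1})\mapsto\int_{-\infty}^{\infty} f(x_1,\dots,x_{n-1},x_n)\,dx_n.$$

EXPECTATION AND VARIANCE
Any function $g:\mathbb{R}^n\to\mathbb{R}$ can be composed with the $X_i$ to create a new random variable $g(X_1,\dots,X_n):S\to\mathbb{R}$ defined by $g(X_1,\dots,X_n)(w) = g(X_1(w),\dots,X_n(w))$. The expectation of this new variable is defined by
$$E[g(X_1,\dots,X_n)] = \sum_{a_1,\dots,a_n} g(a_1,\dots,a_n)\,f(a_1,\dots,a_n)$$
if the $X_i$ are discrete, and
$$E[g(X_1,\dots,X_n)] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\,f(x_1,\dots,x_n)\,dx_1\cdots dx_n$$
if they are continuous. The expectation and variance of the sum are given by the following generalisations of the two-variable case:
$$E\Big[\sum_i X_i\Big] = \sum_i E[X_i], \qquad \mathrm{Var}\Big[\sum_i X_i\Big] = \sum_i\mathrm{Var}[X_i] + 2\sum_{i<j}\mathrm{Cov}[X_i,X_j].$$

INDEPENDENCE
Jointly distributed random variables $X_1,\dots,X_n$ are called independent if their joint pdf f is given by $f(x_1,\dots,x_n) = f_1(x_1)\cdots f_n(x_n)$, where $f_i$ is the pdf of $X_i$.

Exercise B.12.1 If $X_1,\dots,X_n$ are independent, show that
1. Each subcollection of the $X_i$ is also independent.
2. $\mathrm{Cov}[X_i,X_j]=0$ whenever $i\ne j$.
3. $\mathrm{Var}\big[\sum_i X_i\big] = \sum_i\mathrm{Var}[X_i]$.

B.13 COVARIANCE MATRIX
Let $X_1,\dots,X_n$ be jointly distributed random variables. Let $\sigma_{ij} = \mathrm{Cov}[X_i,X_j]$.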
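The covariance matrix C for these variables is the n × n matrix whose (i,j) entry is $\sigma_{ij}$ (we use the notation $\sigma_{ii}=\sigma_i^2$):
$$C = \begin{pmatrix}\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2\end{pmatrix}.$$
The correlation matrix is similarly defined: it has entries $\rho_{ij} = \sigma_{ij}/(\sigma_i\sigma_j)$. Note that if $\sigma_i=1$ for each i, then the covariance and correlation matrices coincide.

The covariance matrix is useful for conveniently arranging calculations involving multiple variables. For example, we have the identity
$$x^TCx = \mathrm{Var}\Big[\sum_i x_iX_i\Big] \qquad \text{(B.3)}$$
where x is the column vector whose entries are the numbers $x_1,\dots,x_n$. The proof of this identity is as follows:
$$x^TCx = \sum_i\sum_j x_ix_j\sigma_{ij} = \sum_i\sum_j\mathrm{Cov}[x_iX_i,\,x_jX_j] = \mathrm{Cov}\Big[\sum_i x_iX_i,\ \sum_j x_jX_j\Big] = \mathrm{Var}\Big[\sum_i x_iX_i\Big].$$

Let us now mention that an n × n matrix P is called positive-definite if it has the following two properties:
Symmetry: $P=P^T$.
Positivity: for any non-zero column vector x with n entries, $x^TPx>0$.

A covariance matrix C is clearly symmetric. Moreover, we have $x^TCx = \mathrm{Var}\big[\sum_i x_iX_i\big]\ge0$, so it almost satisfies the positivity condition. In applications, it is often a reasonable assumption that none of the $X_i$ is completely determined by the others. Under this assumption, $\sum_i x_iX_i$ can only be a constant if each $x_i$ is zero. For, suppose we have $\sum_i x_iX_i = c$, where c is a constant and one of the $x_i$ is non-zero. Let the non-zero one be $x_1$. Then $X_1$ is completely determined by the others:
$$X_1 = \frac{1}{x_1}\Big(c-\sum_{i=2}^n x_iX_i\Big).$$
This gives us the following chain of steps: if x is non-zero, then $\sum_i x_iX_i$ is not constant, hence $x^TCx = \mathrm{Var}\big[\sum_i x_iX_i\big]>0$, and C is positive-definite!

Why do we care whether C is positive-definite? One reason is that then it has a Cholesky decomposition: it can be expressed in the form $C=AA^T$, where A is a lower triangular matrix whose diagonal entries are positive. We shall not offer a proof of this result. However, once we know A exists, it is not hard to find it. First, we write the desired expression:
$$\begin{pmatrix}\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2\end{pmatrix} = \begin{pmatrix}a_{11} & 0 & \cdots & 0\\ a_{21} & a_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}a_{11} & a_{21} & \cdots & a_{n1}\\ 0 & a_{22} & \cdots & a_{n2}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{nn}\end{pmatrix}$$
On matching the (1,1) entries of the two sides, we see that $\sigma_1^2 = a_{11}^2$, so $a_{11}=\sigma_1$. Next, we match the (1,2) entries: $\sigma_{12} = a_{11}a_{21}$, so $a_{21} = \sigma_{12}/\sigma_1$.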
Proceeding in this manner, from left to right and top to bottom, we recursively find all the entries of A.

Exercise B.13.1 Verify that the entries of A satisfy
$$a_{ij} = \begin{cases}\sqrt{\sigma_i^2-\sum_{k=1}^{i-1}a_{ik}^2}, & i=j,\\[1mm] \dfrac{1}{a_{jj}}\Big(\sigma_{ij}-\sum_{k=1}^{j-1}a_{ik}a_{jk}\Big), & i>j,\\[1mm] 0, & i<j.\end{cases}$$

The Cholesky decomposition makes it easy for us to manipulate covariances. For example, suppose we start with independent standard normal variables $Z_1,\dots,Z_n$ and we wish to construct normal variables $X_1,\dots,X_n$ with a given covariance matrix $C=(\sigma_{ij})$. Let the Cholesky decomposition be $C=AA^T$. Define the $X_i$ by
$$\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix} = A\begin{pmatrix}Z_1\\ \vdots\\ Z_n\end{pmatrix}.$$
Then
$$\mathrm{Cov}[X_i,X_j] = \mathrm{Cov}\Big[\sum_k a_{ik}Z_k,\ \sum_l a_{jl}Z_l\Big] = \sum_k\sum_l a_{ik}a_{jl}\,\mathrm{Cov}[Z_k,Z_l] = \sum_k a_{ik}a_{jk} = (AA^T)_{ij} = \sigma_{ij}.$$

B.14 LINEAR REGRESSION AND LEAST SQUARES
In this section, we take up the problem of finding the best linear approximation to the true relationship between two jointly distributed random variables. Suppose the variables are called X and Y and we are looking for an expression of the form α + βX which will give the best approximation to Y. The meaning of the word 'best' is obviously the main point of contention here; while different choices are available, we shall only discuss the most popular one. This goes by the name of Ordinary Least Squares (or OLS) and seeks to minimise the expression
$$h(\alpha,\beta) = E[(Y-\alpha-\beta X)^2].$$
To find the minimum, we apply the first derivative test and set the two partial derivatives of h to zero:
$$\frac{\partial h}{\partial\alpha} = -2\,E[Y-\alpha-\beta X] = 0, \qquad \frac{\partial h}{\partial\beta} = -2\,E[X(Y-\alpha-\beta X)] = 0.$$
These two equations can be put in a standard form,
$$E[Y] = \alpha+\beta E[X], \qquad E[XY] = \alpha E[X]+\beta E[X^2].$$
The solutions are
$$\beta = \frac{\mathrm{Cov}[X,Y]}{\mathrm{Var}[X]} \quad\text{and}\quad \alpha = E[Y]-\beta E[X].$$

One nice property of the OLS line is that it matches the regression curve whenever the latter is a line. Thus, suppose we know that the regression curve is a line:
$$E[Y\mid X=x] = a+bx, \quad\text{or}\quad E[Y\mid X] = a+bX \qquad \text{(B.4)}$$
Taking expectations on both sides of (B.4), we get $E[Y] = E\big[E[Y\mid X]\big] = E[a+bX] = a+bE[X]$, which matches the first equation of OLS. Next, we multiply both sides of (B.4) by X before taking expectations, using $E[XY\mid X] = X\,E[Y\mid X]$:
$$E[XY] = E\big[E[XY\mid X]\big] = E\big[X\,E[Y\mid X]\big] = E[aX+bX^2] = aE[X]+bE[X^2].$$
Thus we obtain the second equation of OLS. This shows the regression line is the same as the one from OLS.
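The two computational recipes above can be tried together in a short sketch. The Python below is illustrative only; its function names and the covariance matrix are hypothetical. It fills in A row by row using the recursion of Exercise B.13.1, uses $X=AZ$ to generate correlated normal pairs, and then estimates the OLS line. The estimate replaces $E[\cdot]$ by averages over the simulated data, which is our plug-in assumption rather than anything stated in this section. For the chosen matrix the theoretical slope is $\sigma_{12}/\sigma_1^2 = 1.2/4 = 0.3$, and the intercept is 0.

```python
import math
import random

def cholesky(C):
    """Lower-triangular A with positive diagonal such that C = A A^T.
    Entries are filled row by row, left to right, via the recursion of
    Exercise B.13.1. C must be symmetric positive-definite."""
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                d = C[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive-definite")
                A[i][i] = math.sqrt(d)
            else:
                A[i][j] = (C[i][j] - s) / A[j][j]
    return A

def correlated_normals(A, rng):
    """One draw of X = A Z, where Z holds independent standard normals."""
    n = len(A)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # A is lower triangular, so row i only uses z[0..i]
    return [sum(A[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def ols(xs, ys):
    """Plug-in OLS: beta = sample Cov / sample Var, alpha = mean(y) - beta * mean(x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return my - beta * mx, beta

# Hypothetical covariance matrix: Var[X] = 4, Var[Y] = 1, Cov[X, Y] = 1.2
C = [[4.0, 1.2],
     [1.2, 1.0]]
A = cholesky(C)
print(A)  # entries 2, 0.6 and 0.8 below/on the diagonal (up to rounding)

rng = random.Random(1)
draws = [correlated_normals(A, rng) for _ in range(100_000)]
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]
alpha, beta = ols(xs, ys)
print(alpha, beta)  # should be close to 0 and 0.3
```

Here $a_{11}=\sigma_1=2$ and $a_{21}=\sigma_{12}/\sigma_1=0.6$, exactly as in the hand calculation, and $a_{22}=\sqrt{1-0.6^2}=0.8$. The `ValueError` guard reflects the requirement that C be positive-definite.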
Linear regression and OLS are amongst the most commonly used techniques for modelling phenomena, and the terms are sometimes used interchangeably. We first need them when studying the Capital Asset Pricing Model, and they recur frequently in later topics.

B.15 RANDOM SAMPLING
The subject of statistical inference is concerned with the task of using observed data to draw conclusions about the distribution of properties in a population. The main obstruction is that it may not be possible (or even desirable) to observe all the members of the population. We are forced to draw conclusions by observing only a fraction of the members, and these conclusions are necessarily probabilistic rather than certain.

Sampling is the act of repeatedly observing the values of a random variable. Each observation is itself represented by a random variable which has the same distribution as the original one. We will also make the usually reasonable assumptions that the observations do not disturb the process connected to the original random variable, and that each observation is independent of the others.

A random sample is a finite sequence of jointly distributed random variables $X_1,\dots,X_n$ such that
1. Each $X_i$ has the same probability density function $f_X$.
2. The $X_i$ are independent.
The common probability density function $f_X$ of all the $X_i$ is called the population density. We also say that we are sampling from a population of type X. The parameters associated to X are called the population parameters.

Example B.15.1 Consider an investor who wishes to understand the possible fluctuations in the price of a particular stock over a week. She may have data on past prices of this stock. Then she can look at the prices at 1-week intervals and calculate the corresponding rates of return. Suppose the prices are $x_1,\dots,x_n$, with one week separating successive prices. This gives her n − 1 rates of return $r_1,\dots,r_{n-1}$, with
$$r_i = \frac{x_{i+1}-x_i}{x_i}.$$
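The return calculation in Example B.15.1 is a one-line transformation of the price list. In this minimal Python sketch the function name is hypothetical and the prices are made-up illustrative numbers, not the investor's data. It shows that n prices give n − 1 returns.

```python
def weekly_returns(prices):
    """Rates of return r_i = (x_{i+1} - x_i) / x_i for consecutive weekly prices.
    n prices give n - 1 returns."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    # Pair each price with the next one: (x_1, x_2), (x_2, x_3), ...
    return [(later - earlier) / earlier for earlier, later in zip(prices, prices[1:])]

# Hypothetical weekly prices (illustrative only, not real data)
prices = [100.0, 104.0, 101.4, 106.47]
returns = weekly_returns(prices)
print([round(r, 4) for r in returns])  # [0.04, -0.025, 0.05]
```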
If the possible rate of return over a week is represented by a random variable R, then each $r_i$ can be viewed as an observation of a random variable $R_i$ which has the same distribution as R. Independence of the $R_i$ is not obvious in this case, but has been found to be a reasonable assumption in practice. □

This example should also serve to remind you of a notation we have been using. Random variables are indicated by upper-case letters and their values by the corresponding lower-case letters. Thus, if a random variable is named X, an observed value of it will be denoted by x.

Broadly, the task of sampling is to estimate the type of a random variable, as well as the associated parameters. Thus we might first try to establish that a random variable has a binomial distribution, and then estimate n and p. The population parameters can be estimated in various ways, but are most commonly approached through the mean and variance. For instance, if we have estimates $\hat\mu$ and $\hat\sigma^2$ for the mean and variance of a binomial variable, we could substitute these in $\mu=np$ and $\sigma^2=np(1-p)$ to estimate the population parameters: $\hat p = 1-\hat\sigma^2/\hat\mu$ and $\hat n = \hat\mu/\hat p$.

Notation A random sample $X_1,\dots,X_n$ represents the possibilities for a sequence of measurements of a random variable. Values of actual measurements will be represented by $x_1,\dots,x_n$ and will be collectively called an observation of the random sample.

B.16 SAMPLE MEAN, VARIANCE AND COVARIANCE
SAMPLE MEAN
Let $X_1,\dots,X_n$ be a random sample. Its sample mean is the random variable $\bar X$ defined by
$$\bar X = \frac{1}{n}\sum_{i=1}^n X_i.$$
Observed values of the sample mean are used as estimates of the population mean μ. Therefore, we need to be reassured that, on average at least, we will see the right value. This is easily done:
$$E[\bar X] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\cdot n\mu = \mu.$$
Moreover, we would like the variance of $\bar X$ to be small so that its values are more tightly clustered around μ. Using the independence of the $X_i$, we have
$$\mathrm{Var}[\bar X] = \frac{1}{n^2}\sum_{i=1}^n\mathrm{Var}[X_i] = \frac{\sigma^2}{n},$$
where $\sigma^2$ is the population variance.
Thus the variance of the sample mean goes to zero as the sample size increases, and the sample mean becomes a more reliable estimator of the population mean (see Figure B.9).

Figure B.9: An illustration of the behaviour of the sample mean. For the first diagram, 10,000 independent observations were made of a binomial population with parameters p = 0.3 and n = 30, so that the population mean was 9. We see that the observed sample mean converges towards the correct value as the sample size increases to 10,000. The second diagram shows quite different behaviour when the population follows a Cauchy distribution (in this case, with location parameter 0 and scale parameter 1). Here, the sample mean does not settle down at all.

Exercise B.16.1 If $X_1,\dots,X_n$ is a random sample from an N(μ,σ) population, then $\bar X\sim N(\mu,\sigma/\sqrt n)$.

The Cauchy distribution is an exception to the previous description: we cannot make the observed average stabilise by taking larger samples. This is not surprising, since the distribution does not have an expectation (see Figure B.9 again).

Notation Let $x_1,\dots,x_n$ be an observation of a random sample $X_1,\dots,X_n$. Then $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$ will be called an observed value of $\bar X$.

SAMPLE VARIANCE
Let $X_1,\dots,X_n$ be a random sample. Its sample variance is the random variable $S^2$ defined by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2.$$
Do the values of $S^2$ give good estimates of the population variance? We first reformulate the definition of $S^2$:
$$(n-1)S^2 = \sum_{i=1}^n X_i^2 - n\bar X^2.$$
Now we take the expectation, using $E[X_i^2] = \sigma^2+\mu^2$ and $E[\bar X^2] = \sigma^2/n+\mu^2$:
$$(n-1)\,E[S^2] = n(\sigma^2+\mu^2) - n\Big(\frac{\sigma^2}{n}+\mu^2\Big) = n(\sigma^2+\mu^2)-\sigma^2-n\mu^2 = (n-1)\sigma^2.$$
Thus, we obtain $E[S^2] = \sigma^2$. A rather longer calculation, which we do not include, shows that¹⁸
$$\mathrm{Var}[S^2] = \frac{1}{n}\Big(\mu_4-\frac{n-3}{n-1}\,\sigma^4\Big),$$
where $\mu_4 = E[(X-\mu)^4]$ is called the fourth central moment. Consequently, $\mathrm{Var}[S^2]\to0$ as $n\to\infty$.

In sum, we see that the sample variance has the right average ($\sigma^2$) and clusters more tightly around it if we use larger samples (see Figure B.10).

Figure B.10: An illustration of the convergence of the sample variance to the population value.
10,000 independent observations were made of a binomial population with parameters p = 0.3 and n = 30, so that the population variance was 6.3. The diagram shows the trend of the sample variance as the sample size increases to 10,000.

Exercise B.16.2 Let $S^2$ be the sample variance of a sample of size n from an N(μ,σ) population. Show that $\mathrm{Var}[S^2] = \dfrac{2\sigma^4}{n-1}$.

Notation Let $x_1,\dots,x_n$ be an observation of a random sample $X_1,\dots,X_n$. Then $s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2$ will be called an observed value of $S^2$. Finally, the square root of the sample variance is denoted by S and is called the sample standard deviation.

SAMPLE COVARIANCE AND CORRELATION
Suppose we have two jointly distributed variables X,Y (with respective means $\mu_X$ and $\mu_Y$) and we wish to estimate their covariance. Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be a sequence of independent observations of (X,Y). This means that $X_i$ is independent of both $X_j$ and $Y_j$ whenever $i\ne j$. These assumptions imply:
1. $E[X_iY_i] = E[XY]$ for every i.
2. $E[X_iY_j] = E[X_i]\,E[Y_j] = \mu_X\mu_Y$ whenever $i\ne j$.
We define the sample covariance to be the random variable
$$S_{XY} = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y).$$
The expectation of the sample covariance equals the covariance of X and Y. To prove this, we first obtain the following identity:
$$\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y) = \sum_{i=1}^n X_iY_i - n\bar X\,\bar Y = \Big(1-\frac1n\Big)\sum_{i=1}^n X_iY_i - \frac1n\sum_{i\ne j}X_iY_j.$$
Now take expectation on both sides:
$$E\Big[\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y)\Big] = \Big(1-\frac1n\Big)n\,E[XY] - \frac1n\,n(n-1)\,\mu_X\mu_Y = (n-1)\big(E[XY]-\mu_X\mu_Y\big) = (n-1)\,\mathrm{Cov}[X,Y],$$
and hence $E[S_{XY}] = \mathrm{Cov}[X,Y]$.

Exercise B.16.3 Show that

The variance of the sample covariance has an explicit formula, which we state without proof:¹⁹
$$\mathrm{Var}[S_{XY}] = \frac{1}{n}\Big(\mu_{2,2}-\frac{n-2}{n-1}\,\sigma_{XY}^2+\frac{1}{n-1}\,\sigma_X^2\sigma_Y^2\Big),$$
where $\mu_{2,2} = E[(X-\mu_X)^2(Y-\mu_Y)^2]$. This formula reassures us that, as $n\to\infty$, the variance of the sample covariance dies down to zero. Thus increasing the sample size leads to better estimates of the covariance of X and Y.

The sample correlation R is defined in analogy with correlation by dividing the sample covariance by the sample standard deviations:
$$R = \frac{S_{XY}}{S_X\,S_Y},$$
where $S_X^2$ and $S_Y^2$ are the sample variances of X and Y respectively.

B.17 CENTRAL LIMIT THEOREM
For a more detailed understanding of the sample mean, we have to study its probability distribution.
Naturally, this depends on the base distribution. To gain some insight, let us look at some simulations with different distributions and sample sizes. Figures B.11 to B.13 depict the results of drawing samples from certain populations. In each case, the distribution of the sample mean is approximated by a normal distribution. The figures show that for large sample sizes, the approximation is almost perfect.

By normalising, we can connect everything to the standard normal distribution. Given a sequence of independent random variables $X_i$, all with the distribution of X, define
$$Y_n = \frac{X_1+\cdots+X_n}{n} \quad\text{and}\quad Z_n = \frac{Y_n-\mu}{\sigma/\sqrt n},$$
where $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}[X]$. $Y_n$ is the sample mean of the sample $X_1,\dots,X_n$ and has mean μ and standard deviation $\sigma/\sqrt n$. Therefore $Z_n$ has mean 0 and standard deviation 1. In our examples, since the sample means $Y_n$ are becoming normal, their normalised versions $Z_n$ will become standard normal. It turns out that this is a general phenomenon.

Figure B.11: Behaviour of sample mean from a uniform distribution. Each histogram is based on a simulation of 50,000 samples. The sample size is 2 for the first figure and 6 for the second.

Figure B.12: Behaviour of sample mean from a lognormal distribution, based on simulations of 10,000 samples. The sample size is 6 for the first figure and 200 for the second.

Figure B.13: Behaviour of sample mean from a Bernoulli distribution, based on simulations of 10,000 samples. The sample size is 6 for the first figure and 100 for the second.

Theorem B.17.1 (Central Limit Theorem) Let $(X_i)$ be a sequence of independent random variables with identical distribution. Suppose their common mean is μ and standard deviation is σ. Let $Y_n$ be the sample mean of the sample $X_1,\dots,X_n$ and $Z_n = \dfrac{Y_n-\mu}{\sigma/\sqrt n}$. Then, as $n\to\infty$, the random variables $Z_n$ converge to the standard normal distribution Z in the sense that
$$\lim_{n\to\infty}\mathbb{P}(Z_n\le z) = \mathbb{P}(Z\le z) \quad\text{for every } z\in\mathbb{R}.$$
□

We have not developed, in this Appendix, the mathematical techniques required to prove this theorem.
However, the accompanying figures should have made it plausible to you. It is also useful to note that when the distribution being sampled is normal, the sample mean is exactly normally distributed.

The Central Limit Theorem is truly remarkable: whatever distribution you start with, its independent sums start looking like the normal distribution. The only caveat is that the starting distribution should possess a mean and a variance.

B.18 STABLE DISTRIBUTIONS
A distribution X is said to be stable if, for any random sample $X_1,\dots,X_n$ from it, an arbitrary linear combination is just a scale and translation of X:
$$a_1X_1+\cdots+a_nX_n \sim aX+b \quad\text{for some } a,b\in\mathbb{R},$$
where ∼ between two random variables means that they have the same distribution.

Example B.18.1 We have seen that normal distributions are stable. If $X_1,\dots,X_n$ are a random sample from an N(μ,σ) population, and X ∼ N(μ,σ), then $X_1+\cdots+X_n \sim aX+b$, where $a=\sqrt n$ and $b=(n-\sqrt n)\mu$. □

Now, suppose X is stable with mean μ and standard deviation σ. Let $(X_i)$ be a sequence of independent random variables with $X_i\sim X$, and set
$$Y_n = \frac{1}{n}(X_1+\cdots+X_n), \qquad Z_n = \frac{Y_n-\mu}{\sigma/\sqrt n}.$$
Then $E[Z_n]=0$ and $\mathrm{Var}[Z_n]=1$. Since X is stable, we have $Z_n\sim a_nX+b_n$. On equating the mean and variance, we get
$$a_n\mu+b_n = 0, \qquad a_n^2\sigma^2 = 1,$$
which implies $a_n = \pm1/\sigma$ and $b_n = \mp\mu/\sigma$. Hence the sequence $Z_n$ can consist of only two distributions: $Z_n\sim\pm\dfrac{X-\mu}{\sigma}$. By the Central Limit Theorem, the distribution of $Z_n$ converges to the standard normal distribution Z. Therefore one of these two distributions must be that of Z. Since $-Z\sim Z$, we find that $\dfrac{X-\mu}{\sigma}\sim N(0,1)$, and hence X ∼ N(μ,σ).

So we have established that the normal distribution is the only stable distribution with a mean and variance. On the other hand, the Cauchy distribution is a stable distribution without a mean or variance. In fact, there is a large family of stable distributions. They have certain common features: they are all continuous, with a single peak tapering off on either side.
Unfortunately, in general they are known only indirectly, through the Fourier transforms of their density functions, and this has made it harder to develop intuition about them. The lack of a variance (and often of a mean) is a further obstacle to developing intuition. Computer simulation, however, has made it possible to use them in applications, and their use is growing.

B.19 DATA FITTING
In choosing which distribution best fits some given data, there are two stages: first, we have to choose an appropriate type of distribution (binomial, normal, etc.), and then within that type we look for the particular member that works best. The first choice usually corresponds to our choice of model, and the second to fixing the parameters of the model. For example, if we adopt the Geometric Brownian Motion model for stock prices, we have decided to fit a normal distribution to the logarithms of the price ratios (the log returns). The model has two parameters, drift and volatility, and determining them corresponds to fitting a specific normal distribution to the data.

The simplest way to make a distribution fit the data is to match some key features. For example, by matching the mean and variance of the data to those of the distribution, we get two equations involving the parameters of the distribution. This suffices for distributions like the binomial and normal ones, which have only two parameters. If more parameters have to be fixed, we can match the higher moments. The kth central moment of a set of data $x_1,\dots,x_n$ is defined to be:
$$\frac{1}{n}\sum_{i=1}^n (x_i-\bar x)^k.$$
The corresponding kth central moment of a random variable X is $E[(X-E[X])^k]$. The third moment is seen as capturing information about the symmetry of the density function about the mean, and the fourth as describing how quickly the "tails" die down to zero. However, these interpretations are not very reliable. Moreover, higher moments of the data are very sensitive to extreme events and are therefore less dependable estimates; on the whole, it is deemed best to match only the mean and variance. A more nuanced approach is to match the quantiles.
Quantiles of a distribution were defined in §B.3. For a set of data $(x_i)$, we analogously define the q-quantile (0 < q < 1) to be the least number $x_q$ such that at least 100q% of the data is less than or equal to $x_q$.

Example B.19.1 Figure B.14 is based on the data for Infosys stock depicted in Figure 5.3. The quantiles for this data are shown as discs. The solid curve represents the normal distribution with the same mean and variance as the data. We see that it is too low on the left and too high on the right, as far as most of the data is concerned. This has happened because the extreme events at the ends have pulled it away from the centre. We can compensate by ignoring the extreme events. Then we get a near perfect fit (the dashed curve) to the middle 80% of the data.

Figure B.14: Fitting normal distributions to data using quantiles

We are not able, however, to get a good match through the full (0,1) range. This suggests that the data does not truly represent a normal distribution. □

This example illustrates how quantiles can be used to visualise the fit of a distribution to data, and to adjust it accordingly. For example, if we do not care about extreme events and just want a model that does well under normal circumstances, the distribution depicted by the dashed curve would do very well indeed. Another positive feature of this approach is that it works even when the mean and variance of the distribution do not exist. In the example at hand, if we reject the normal distribution as a good model for the data, we can look for a replacement among the class of stable distributions. The non-normal members of this class do not have a variance (and often not even a mean), so the quantile alternative will be essential.

Example B.19.2 A Cauchy distribution with parameters δ,γ has median δ and interquartile range 2γ, and this can be used to get a quick fit to the data.
For example, the Infosys data studied in this section has median 0 and interquartile range 0.04, so we fit a Cauchy distribution with δ = 0 and γ = 0.02. □

Figure B.15: Fitting a stable distribution to the data of Figure B.14. The second diagram compares the density function of the fitted stable distribution (solid) to a normal one (dashed). Note the slight asymmetry and the heavier tails of the stable distribution. (See Example B.19.3.)

Example B.19.3 It is possible to get a much closer fit to the data in the previous examples by using a general stable distribution. Figure B.15 shows one possibility.²⁰ Stable distributions are parametrised by four variables, and we mention for the record that the one shown in the figure has α = 1.53, β = 0.1665, γ = 0.021 and δ = −0.001, where we use the 0-parametrisation from Nolan [35]. □

B.20 MONTE CARLO SIMULATION
Every modern computing system (from scientific calculators upward) comes equipped with a system for generating "random numbers". A typical scientific calculator has a key marked RAN or RAND; on pressing it, a number between 0 and 1 is obtained. Repeated pressing of the key produces a sequence such as the following:
0.0391, 0.0921, 0.7406, 0.7691, 0.8491, 0.9403, 0.1396, 0.9285, 0.0607.
The numbers in this sequence are generated using some simple arithmetic, but still have interesting statistical properties. They are random in the sense that knowing many members of the sequence does not enable us to predict the next with any certainty (in other words, the rule used to generate the numbers cannot be guessed from the output). Moreover, they are chosen without any bias toward any subinterval of [0,1]. Thus, the procedure simulates sampling from a uniform distribution over [0,1].

Figure B.16: Uniformly distributed random numbers (see Example B.20.1)

Example B.20.1 In spreadsheet programs such as Microsoft Excel and OpenOffice Calc, the RAND function generates random numbers which are uniformly distributed over [0,1].
Figure B.16 shows the results of a run of 1000 consecutive evaluations of RAND. The first diagram plots the numbers consecutively along the y-axis, showing that there is no obvious pattern in their arrangement. The second diagram shows their distribution among different parts of the [0,1] interval. □

With this basic building block, one can generate sequences that simulate sampling from any distribution we wish to consider. In particular, if we can simulate random sampling of a particular distribution, we can also simulate sampling of a number of independent random variables with this distribution. For example, suppose we wish to generate 1000 pairs of values of independent random variables X,Y which are uniformly distributed over [0,1]. Then we generate 2000 values of the uniform distribution over [0,1] and allocate them alternately to X and Y.

Example B.20.2 Suppose $X_1,\dots,X_{12}$ are independent random variables, all uniformly distributed over [0,1]. Define
$$Y = X_1+X_2+\cdots+X_{12}-6.$$
The Central Limit Theorem (§B.17) tells us Y is approximately normal, and since each $X_i$ has mean 1/2 and variance 1/12, it is easily checked that $E[Y]=0$ and $\mathrm{Var}[Y]=1$. Thus Y is approximately standard normal. This observation gives us a straightforward way to simulate a standard normal distribution if we have a method of simulating uniform distributions. Specifically, suppose $x_1,\dots,x_{12}$ is one set of generated values for $X_1,\dots,X_{12}$. Then $y = x_1+\cdots+x_{12}-6$ is a value for Y.

Figure B.17: Comparison of the density functions of a sum of twelve independent uniform distributions (solid) with a standard normal distribution (dashed) (Example B.20.2).

Figure B.17 shows the approximation is almost perfect. One concern could be that the values of Y are confined to the interval [−6,6] instead of varying over the whole real line. However, the probability of a standard normal variable taking a value outside [−6,6] is a tiny $2\times10^{-9}$, so this is only an issue when simulating very large data sets.
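The twelve-uniform recipe of Example B.20.2 takes only a few lines of code. In this Python sketch, `random.Random.random` (Python's built-in generator, returning values in [0, 1)) stands in for a calculator's RAND key; that substitution, the function name and the seed are our choices, not the book's. The sketch confirms that the simulated values have mean close to 0, variance close to 1, and never leave [−6, 6].

```python
import random
import statistics

def approx_standard_normal(rng):
    """One approximately N(0, 1) value: the sum of 12 independent U[0, 1) draws minus 6.
    Mean: 12 * (1/2) - 6 = 0; variance: 12 * (1/12) = 1; the CLT makes it nearly normal."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(2024)  # seeded for reproducibility
ys = [approx_standard_normal(rng) for _ in range(100_000)]

print(statistics.fmean(ys), statistics.pvariance(ys))  # close to 0 and 1
print(min(ys) >= -6.0 and max(ys) <= 6.0)              # True: values cannot leave [-6, 6]
```

Because each draw is a sum of twelve values in [0, 1), the range restriction holds exactly and is not just a statistical tendency. Only the normal shape is approximate.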
Figure B.18 shows an example of using this procedure to simulate 1000 observations of a standard normal random variable.

Figure B.18: Normally distributed random numbers (Example B.20.2) □

The process of simulating a standard normal distribution provides a base for simulating many other distributions. Suppose $z_1,z_2,\dots$ are simulated values of the standard normal distribution. Then $w_1=\mu+\sigma z_1,\ w_2=\mu+\sigma z_2,\dots$ are simulated values of the N(μ,σ) distribution. Further, $e^{w_1},e^{w_2},\dots$ are simulated values of the lognormal distribution with parameters μ and σ.

Example B.20.3 Cauchy distributions can be easily simulated because of the following fact: if X is uniformly distributed over [−π,π], then Y = tan X has a Cauchy distribution with location parameter δ = 0 and scale parameter γ = 1. This follows from the computation below. The set of x in [−π,π] with tan x ≤ y consists of two intervals (one in each period of tan), each of length $\arctan y+\pi/2$, so
$$F_Y(y) = \mathbb{P}(\tan X\le y) = \frac{2}{2\pi}\Big(\arctan y+\frac{\pi}{2}\Big) = \frac12+\frac1\pi\arctan y.$$
Therefore the pdf of Y is
$$f_Y(y) = F_Y'(y) = \frac{1}{\pi(1+y^2)}.$$
Thus, if $x_1,x_2,\dots$ are simulated values of a uniform distribution over [−π,π], then $\tan(x_1),\tan(x_2),\dots$ are simulated values of a Cauchy distribution with δ = 0 and γ = 1. □

We will now see how to simulate correlated variables. To simulate n normal variables with zero means and covariance matrix $C=(\sigma_{ij})$, first simulate n independent standard normal variables $Z_1,\dots,Z_n$. Next, find the Cholesky decomposition $C=AA^T$ (assume C is positive-definite, so that the Cholesky decomposition exists). Finally, define new variables $X_1,\dots,X_n$ by
$$\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix} = A\begin{pmatrix}Z_1\\ \vdots\\ Z_n\end{pmatrix}.$$
The $X_i$ are normal since they are linear combinations of independent normal variables. We can easily see that each $X_i$ has zero mean:
$$E[X_i] = \sum_k a_{ik}\,E[Z_k] = 0.$$
We have already computed in §B.13 (page 240) that the covariances are $\mathrm{Cov}[X_i,X_j]=\sigma_{ij}$.

15 The event algebra does not always include all the subsets of the sample space S. But we will let this issue lie undisturbed.
16 These are not the only types. But these types contain all the ones we need.
17 Our calculation is valid if g is one-one. With slightly more effort we can make it valid for any g.
18 Few books include the proof.
You can find one version on the MathWorld website: http://mathworld.wolfram.com/SampleVarianceDistribution.html. Also, it is a special case of the formula for the variance of the sample covariance given later.
19 You can find it online at the University of Alabama's "Virtual Laboratories in Probability" website: <http://www.math.uah.edu/stat/sample/Covariance.xhtml>.
20 The computations for this example were carried out via the program STABLE, available from J. P. Nolan's website: <http://academic2.american.edu/~jpnolan>.

Appendix C Solutions to Selected Exercises

CHAPTER 1

Exercise 1.2.3 Let us compare the given situations with the requirements of an arbitrage opportunity: zero initial investment and zero probability of loss, together with a positive probability of profit. The first two situations can be arbitrage if interest rates are low enough. Indeed, we can start by borrowing Rs 10, thus avoiding an initial investment of our own money. This will earn us Rs 20 by the end of the set time (ten years or one day), which we can use to pay off the loan. If the interest on the loan is low enough, we will still have money left over, and this will constitute our risk-free profit. In the second situation, this is almost certain to happen. The lottery ticket does not constitute arbitrage, since there is no guarantee of profit (indeed, a loss is almost certain). The free lottery ticket does provide an arbitrage opportunity: there is no possibility of loss, and a very small probability of profit. In the fifth situation, we can borrow some amount from Bank A and deposit it in Bank B for a year. At the end we will earn 15% interest on it, which can be used to pay off the 10% interest on the loan. The remaining 5% is our arbitrage profit. The last situation does not, by itself, provide an arbitrage opportunity.

Exercise 1.3.7 Let each instalment be of an amount I.
a. After 6 months, the debt is 10,000 × 1.075 = 10,750, since the interest rate for 6 months is 15/2 = 7.5%.
On payment of the first instalment, a debt of 10,750 − I remains. Over the final 6 months, this becomes (10,750 − I) × 1.075. For the debt to be paid off, this amount must equal the last instalment, and we get the equation I = (10,750 − I) × 1.075. The solution is I = 5569.28.
b. Of the first instalment, Rs 750 goes towards the interest and Rs 4819.28 towards the principal. The last instalment pays off the remaining principal amount of Rs 5180.72 as well as interest amounting to Rs 5569.28 − 5180.72 = 388.56.

Exercise 1.3.9 Let us base our calculations on an investment of Rs 1. Let the rate of interest be r. Then the three cases yield the following equations:
a. 2 = 1 + 10r
b. $2 = (1 + r)^{10}$
c. $2 = e^{10r}$
The respective solutions are r = 0.1, 0.072 and 0.069, or 10%, 7.2% and 6.9%.

Exercise 1.3.10 The arbitrage strategy is to borrow an amount X for a year and deposit it in the same bank. Withdraw it after 6 months and immediately reinvest it in that bank. After 1 year, the invested amount becomes $1.025^2 X = 1.050625X$. We use 1.05X to pay off the loan and are left with a risk-free profit of 0.000625X.

Exercise 1.3.13 The annual growth factor is $1.02^{12} = 1.268$, so the effective annual rate is 26.8%.

Exercise 1.3.14
a. Let the invested amount be Rs 1, the discrete rate be $r_d$, and the continuous rate be $r_c$. Since the interest earned over 1 year is the same for both A and B, we find that
$$\left(1 + \frac{r_d}{2}\right)^2 = e^{r_c}.$$
Hence
$$1 + \frac{r_d}{2} = e^{r_c/2},$$
which shows that the interest earned over 6 months is also the same for both A and B. If we graph the interest earnings against time, we get the diagram shown on the next page. In particular, at the 9-month mark, A has earned more interest than B.
b. Since the effective rate is 10%, we obtain the following equations:
$$\left(1 + \frac{r_d}{2}\right)^2 = e^{r_c} = 1.1.$$
This yields $r_d = 0.098$ and $r_c = 0.095$. Over the first 6 months, using years as units, the difference between the interest earned by A and B (on an investment of Rs 1000) is
$$f(t) = 1000\left(1 + 0.098t - e^{0.095t}\right), \qquad 0 \le t \le 0.5.$$
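The gap f(t) can also be checked numerically at this step. The Python sketch below is an illustration under stated assumptions, not part of the original solution: it takes A to compound half-yearly at the nominal rate r_d and B to compound continuously at r_c (which is what the values 0.098 and 0.095 for a 10% effective rate indicate), and it locates the largest gap by a plain grid scan over [0, 0.5]. The function names are hypothetical.

```python
import math


def rates_for_effective(effective: float = 0.10) -> tuple[float, float]:
    """Nominal rates matching the effective annual rate:
    r_d with half-yearly compounding, r_c with continuous compounding."""
    growth = 1.0 + effective
    r_d = 2.0 * (math.sqrt(growth) - 1.0)  # (1 + r_d/2)^2 = 1 + effective
    r_c = math.log(growth)                 # e^{r_c} = 1 + effective
    return r_d, r_c


def gap(t: float, r_d: float, r_c: float, principal: float = 1000.0) -> float:
    """f(t) = principal * (1 + r_d t - e^{r_c t}), valid for 0 <= t <= 0.5."""
    return principal * (1.0 + r_d * t - math.exp(r_c * t))


def max_gap(r_d: float, r_c: float, steps: int = 50_000) -> tuple[float, float]:
    """Scan a uniform grid on [0, 0.5]; return (t, f(t)) with the largest f."""
    best_t, best_f = 0.0, gap(0.0, r_d, r_c)
    for k in range(steps + 1):
        t = 0.5 * k / steps
        f = gap(t, r_d, r_c)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


if __name__ == "__main__":
    r_d, r_c = rates_for_effective()
    for label, (a, b) in {"rounded": (0.098, 0.095), "unrounded": (r_d, r_c)}.items():
        t, f = max_gap(a, b)
        t_star = math.log(a / b) / b  # root of f'(t) = 0
        print(f"{label}: t = {t:.3f} (f' root {t_star:.3f}), gap = {f:.3f}, "
              f"f(0.5) = {gap(0.5, a, b):.3f}")
```

With the three-decimal rates the scan lands near t ≈ 0.33 with a gap of about 0.49. With unrounded rates (r_d ≈ 0.0976, r_c ≈ 0.0953) the gap is zero at t = 0.5, as part a requires, and the maximum moves to about t ≈ 0.25 with a gap of roughly 0.29, so the size of the maximum gap is sensitive to rounding r_d and r_c.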
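To locate the maximum we use the first derivative test: $0 = f'(t) = 1000\left(0.098 - 0.095\,e^{0.095t}\right)$, which gives t = 0.33. Hence the maximum gap is f(0.33) = 0.49, or a mere 49 paise!

Exercise 1.4.2 When continuous rates are used, with f the rate of inflation, the relationship between the original and final purchasing power is
$$\text{Purchasing power after 1 year} = e^{-f} \times \text{Original purchasing power}.$$
If an amount A is invested at the risk-free rate r for a year, at the end we have the effective amount (in terms of purchasing power)
$$e^{-f} \times e^{r}A = e^{r-f}A.$$
Therefore the real risk-free rate is r − f.

Exercise 1.4.3 The growth factor of purchasing power is 1.166. Therefore the return in real terms is 16.6%.

CHAPTER 2

Exercise 2.4.3 In part 1, we apply the formula for a finite geometric sum:
$$P = \sum_{k=1}^{n}\frac{C}{(1+\lambda)^k} + \frac{F}{(1+\lambda)^n} = \frac{C}{\lambda}\cdot\frac{(1+\lambda)^n - 1}{(1+\lambda)^n} + \frac{F}{(1+\lambda)^n}.$$
For part 2, we substitute C = λF in the above formula to get
$$P = \frac{(1+\lambda)^n - 1}{(1+\lambda)^n}\,F + \frac{F}{(1+\lambda)^n} = F.$$
This shows that when the yield equals the coupon rate, the bond is sold at par. If the yield is greater, the price must fall below the face value, and the bond is at a discount. If the yield is lower, the price must rise, and the bond is at a premium.

Exercise 2.4.4 Similar to the solution above.

Exercise 2.5.2 After 6 months, the accrued interest is 10/2 = 5, and so the purchase or dirty price is 102 + 5 = 107.

Exercise 2.7.1 To save ink, we write the solution for m = 1. Writing $c_k$ for the cash flow at the end of year k, the price is given by
$$P = \sum_{k=1}^{n}\frac{c_k}{(1+\lambda)^k}.$$
Hence,
$$\frac{dP}{d\lambda} = -\sum_{k=1}^{n}\frac{k\,c_k}{(1+\lambda)^{k+1}}.$$
Dividing through by P gives
$$\frac{1}{P}\frac{dP}{d\lambda} = -\frac{D}{1+\lambda}, \qquad\text{where } D = \frac{1}{P}\sum_{k=1}^{n}\frac{k\,c_k}{(1+\lambda)^k}$$
is the Macaulay duration.

Exercise 2.7.3 In the limit, the bond becomes a perpetuity with annual payments of C. Hence,
$$D = \frac{\sum_{k=1}^{\infty} kC e^{-\lambda k}}{\sum_{k=1}^{\infty} C e^{-\lambda k}} = \frac{1 + 2x + 3x^2 + \cdots}{1 + x + x^2 + \cdots},$$
where $x = e^{-\lambda}$. Now the denominator is a geometric series: $1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x}$. On differentiating both sides with respect to x we get $1 + 2x + 3x^2 + \cdots = \frac{1}{(1-x)^2}$, which gives a formula for the numerator as well. Therefore, we have obtained
$$D = \frac{1/(1-x)^2}{1/(1-x)} = \frac{1}{1-x} = \frac{1}{1 - e^{-\lambda}}.$$

Exercise 2.7.4 Go through the same steps as in the previous exercise, with $x = \frac{1}{1+\lambda}$. The final answer is
$$D = \frac{1+\lambda}{\lambda}.$$

Exercise 2.8.3 If we invest Re 1 for n years, it becomes $(1 + s_n)^n$. If we invest it for m years and then reinvest it for the remaining n − m years, it becomes $(1 + s_m)^m(1 + f_{m,n})^{n-m}$.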
Since both routes are risk-free (due to the use of the forward rate), the No Arbitrage Principle says the final sums must be the same:
$$(1 + s_n)^n = (1 + s_m)^m(1 + f_{m,n})^{n-m}.$$
We solve this for $f_{m,n}$ to get the required formula.

Exercise 2.8.4 Just as above.

Exercise 2.8.5 Proceed as in Exercise 2.8.3 and obtain
$$e^{n s_n} = e^{m s_m}\,e^{(n-m) f_{m,n}}.$$
This implies $n s_n = m s_m + (n - m) f_{m,n}$, which gives the formula for $f_{m,n}$.

Exercise 2.8.7 Use the linearity of differentiation, as done for Macaulay duration on page 42.

Exercise 2.8.8 Similar to the calculations for modified duration in Exercise 2.7.1.

Exercise 2.11.2 If the bond is called at the end of the ith year, for the purposes of calculating YTC it becomes an i-year bond with face value $C_i$. Apply Exercise 2.4.3.

CHAPTER 3

Exercise 3.1.1 Suppose $r_1 = r_2 + c$, with c > 0, over a time interval [0, T]. Then we implement the following arbitrage strategy. At time t = 0, short sell an amount of the second asset for a price V. Invest this V in the first asset. Then, at time t = T, sell the first asset. This will earn us $V(1 + r_1) = V(1 + r_2) + Vc$. We use $V(1 + r_2)$ of this to buy the required amount of the second asset and deliver it to complete the short sale. We are left with a risk-free profit of Vc. In these calculations, note that $r_1, r_2$ are random, but their difference c is not.

Exercise 3.1.3 In both cases, we are left with Rs 100 cash, that is, a loss of Rs 100.

Exercise 3.1.4 Let the weights for S and T be denoted by $w_S$ and $w_T$. Their values are $w_S = -4$ and $w_T = 5$. Combining these with the individual rates of return $r_S$ and $r_T$, the overall rate of return should be given by $w_S r_S + w_T r_T = 50\%$, as was calculated directly in the example.

Exercise 3.2.2 C and B are more efficient than A, while C and D are more efficient than E. Hence the efficient ones are B, C and D.

Exercise 3.3.1 Just that asset itself.

Exercise 3.3.2 The equation $r_P = w r_A + (1 - w) r_B$ can be solved for w. We get
$$w = \frac{r_P - r_B}{r_A - r_B} \qquad\text{and}\qquad 1 - w = \frac{r_A - r_P}{r_A - r_B}.$$
Substitute these in the formula for $\sigma_P^2$ to get
$$\sigma_P^2 = \frac{(r_P - r_B)^2\sigma_A^2 + (r_A - r_P)^2\sigma_B^2 + 2(r_P - r_B)(r_A - r_P)\sigma_{AB}}{(r_A - r_B)^2}.$$
The RHS is a quadratic in $r_P$, hence it can be put in the form $a r_P^2 + b r_P + c$. To obtain a, we collect all the coefficients of $r_P^2$:
$$a = \frac{\sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}}{(r_A - r_B)^2}.$$
To see that a > 0, note that the numerator is positive because it is the variance of $r_A - r_B$.

Exercise 3.3.3 This exercise is for the mathematically curious, so try it yourself.

Exercise 3.3.4
1. This solution is modelled on the proof that −1 ≤ ρ ≤ 1. Consider the variance of the random variable $r_A - r_B$. If $\sigma_A = \sigma_B = \sigma$, and using ρ = 1, we have
$$\mathrm{Var}[r_A - r_B] = \sigma_A^2 + \sigma_B^2 - 2\sigma_A\sigma_B = \sigma^2 + \sigma^2 - 2\sigma^2 = 0.$$
Hence $r_A - r_B$ is a constant. By Exercise 3.1.1 we must have $r_A = r_B$, and then A and B have the same coordinates in the portfolio diagram. On the other hand, it was given that their coordinates are different. So we must have $\sigma_A \ne \sigma_B$.
2. The portfolio variance is
$$\sigma_P^2 = w^2\sigma_A^2 + (1 - w)^2\sigma_B^2 + 2w(1 - w)\sigma_A\sigma_B = \big(w\sigma_A + (1 - w)\sigma_B\big)^2,$$
and so $0 = \sigma_P = |w\sigma_A + (1 - w)\sigma_B|$. We solve this to get
$$w = \frac{\sigma_B}{\sigma_B - \sigma_A}.$$
If $\sigma_B > \sigma_A$, we get w > 1. If $\sigma_B < \sigma_A$, we get w < 0.

Exercise 3.3.5 The portfolio mean return and standard deviation are:

Exercise 3.3.6 Similar to the solution of part 2 of Exercise 3.3.4.

Exercise 3.3.9 Consider the portfolio variance as a function of the weight w:
$$\sigma_P^2 = w^2\sigma_A^2 + (1 - w)^2\sigma_B^2 + 2w(1 - w)\sigma_{AB}.$$
To minimise the variance, we calculate its derivative with respect to w and set it to zero:
$$2w\sigma_A^2 - 2(1 - w)\sigma_B^2 + 2(1 - 2w)\sigma_{AB} = 0.$$
Rearrange this equation to obtain the formula for w.

Exercise 3.3.10 Let $\sigma_A = \sigma_B = \sigma$. Then we have
$$w = \frac{\sigma_B^2 - \sigma_{AB}}{\sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}} = \frac{\sigma^2 - \sigma_{AB}}{2\sigma^2 - 2\sigma_{AB}} = \frac{1}{2}.$$

Exercise 3.4.1 We have to solve the following linear system.
This system has the following block structure (indicated by the lines in the matrices above). This gives two vector equations:
$$IW + RL = 0, \qquad R^T W = R'.$$
Substituting the first in the second gives a 2 × 2 system for λ, μ:

Now, we can obtain the weights of the minimum variance portfolio with r = 0.5:

The variance of this portfolio is $\sigma^2 = u^2 + v^2 + w^2 = 0.459$.

Exercise 3.4.3 In the notation of this exercise, Equation 3.4.3 becomes
$$\begin{pmatrix} C & u \\ u^T & 0 \end{pmatrix}\begin{pmatrix} w \\ \mu \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Carrying out the matrix multiplication on the LHS, we get two equations:
$$Cw + u\mu = 0 \qquad \text{(C.1)}$$
$$u^T w = 1 \qquad \text{(C.2)}$$
We eliminate w between these equations. Equation (C.1) gives $w = -\mu C^{-1}u$, and we substitute this in (C.2) to get
$$\mu = -\frac{1}{u^T C^{-1} u}. \qquad \text{(C.3)}$$
Substituting (C.3) in the expression for w, we get the desired result:
$$w = \frac{C^{-1}u}{u^T C^{-1} u}.$$

Exercise 3.7.1 Substitute $r = \frac{\bar{V}_1 - V_0}{V_0}$ in CAPM, where $\bar{V}_1$ is the expected value of the asset at the end of the period. This gives
$$r = \frac{\bar{V}_1 - V_0}{V_0} = r_f + \beta(r_M - r_f).$$
Multiply both sides by $V_0$ and solve for $V_0$.

Exercise 3.7.3 We shall use the Certainty Equivalent Pricing Formula. We have the following calculations:

Substituting all these into the Certainty Equivalent Pricing Formula, along with $1 + r_f = 1.05$, gives $V_0$ = 7.42 million.

CHAPTER 4

Exercise 4.2.6 Reinvest the income from the margin account at the prevailing risk-free rate till the expiry date.

Exercise 4.3.4 We have T = 0.5 years, $S_0 = 100$, r = 0.1. Therefore, $X = 100 \times \left(1 + \frac{0.1}{2}\right) = 105$. We note in passing that, had r been continuously compounded, the numerical result would hardly have changed: $X = 100\,e^{0.05} = 105.13$.

Exercise 4.4.1 At time t, create two portfolios:
Portfolio A: The futures with exercise price X and $Xe^{-r(T-t)}$ cash.
Portfolio B: The futures with exercise price $X_t$ and $X_t e^{-r(T-t)}$ cash.
At time T both portfolios become the asset, and hence have equal value. Therefore they have equal value at time t:
$$V_t + Xe^{-r(T-t)} = 0 + X_t e^{-r(T-t)}.$$

Exercise 4.4.3 The spot price of the bond is $S_0 = 1000\,e^{-0.25\times 0.1} + 11000\,e^{-0.75\times 0.1} = 975.31 + 10205.18$ = ¥11,180.49. The net present value of the income from the bond during the life of the futures is $I_0 = 1000\,e^{-0.25\times 0.1}$ = ¥975.31.
Since X = ¥10,500, the value of the futures is $V = S_0 - I_0 - Xe^{-rT} = 10205.18 - 10500\,e^{-0.1\times 4/12}$ = ¥49.41.

Exercise 4.4.4 Due to the storage costs, the asset has an associated negative income: $I_0 = -300\,e^{-0.1\times 1/12} - 300\,e^{-0.1\times 2/12} - 300\,e^{-0.1\times 3/12}$ = −Rs 885.14. Therefore the exercise price is $X = (S_0 - I_0)\,e^{rT} = (10000 + 885.14)\,e^{0.1\times 3/12}$ = Rs 11,160.70.

Exercise 4.4.7 Rs 103.07.

Exercise 4.7.2 Let the rates of return from the portfolio and the index be $r_P$ and $r_I$ respectively. Computing the beta of the portfolio with respect to the index from these returns, $\beta = \mathrm{Cov}[r_P, r_I]/\mathrm{Var}[r_I]$, gives β = h.

CHAPTER 5

Exercise 5.1.1 See the solution of Exercise B.7.1.

Exercise 5.1.3 We have $S_{\Delta t} = S\,e^{(\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z}$, where Z is standard normal. Therefore,
$$E[S_{\Delta t}] = S\,e^{(\mu - \sigma^2/2)\Delta t}\,E\big[e^{\sigma\sqrt{\Delta t}\,Z}\big] = S\,e^{(\mu - \sigma^2/2)\Delta t}\,e^{\sigma^2\Delta t/2} = S\,e^{\mu\Delta t}.$$
Hence the expected return is $e^{\mu\Delta t} - 1$.

Exercise 5.1.4 Use the first order approximation $e^x \approx 1 + x$.

Exercise 5.2.1 Combining the GBM formula with the formula for the futures price, we obtain an expression of the same form, which is the formula for a GBM with drift μ − r and volatility σ.

Exercise 5.4.1 This problem can be done directly through the formulas for expectation and variance for discrete random variables. We can also do it in the following way. Let Y take the values 1 and 0 with probabilities p and 1 − p respectively, so that it is a Bernoulli variable with mean p and variance p(1 − p). Therefore, the required mean and variance follow by expressing the given two-valued variable in terms of Y.

CHAPTER 6

Exercise 6.1.3 We compare with the value of the corresponding futures to get:

Exercise 6.2.3

Exercise 6.2.4 If an American put is exercised at t = 0, there is a payoff of $X - S > Xe^{-rT} - S$. So the lower bound can be improved to P ≥ max{0, X − S}. The upper bound similarly changes to P ≤ X.

Exercise 6.3.2 Create the following portfolios at time t = 0:
Portfolio A: 1 put and 1 share.
Portfolio B: 1 call and $Xe^{-rT}$ cash.

Exercise 6.3.4 The same reasoning would suggest that put prices would fall. But put–call parity says put and call prices move together; the only resolution is that the prices must be independent of the expected future asset price.

Exercise 6.4.7 Note that for any number x, max{0, x} − max{0, −x} = x.
Therefore,
$$\max\{0, SU^kD^{n-k} - X\} - \max\{0, X - SU^kD^{n-k}\} = SU^kD^{n-k} - X.$$
Substitute this in the n-step BOPM formula for C − P:

Exercise 6.9.1 An n-step BOPM gives the value of the contract as:

If we take $U = e^{\sigma\sqrt{T/n}}$ and $D = e^{-\sigma\sqrt{T/n}}$, and then let n → ∞, we obtain the value
$$V = S^2 e^{(r + \sigma^2)T}.$$
This limit takes some effort to calculate. A hint: substitute the power series expansions and consider the first couple of terms. Using the log function helps. It is also possible to attack the problem via L'Hôpital's Rule.

CHAPTER 7

Exercise 7.2.3 Use risk-neutral valuation. The payoff function is
$$f(S_T) = \begin{cases} 1, & S_T > X, \\ 0, & S_T \le X. \end{cases}$$
And so the derivative value is $e^{-rT}$ times the risk-neutral probability that $S_T > X$, namely $e^{-rT}N(-a)$, where (see page 158) $-a = w - \sigma\sqrt{T}$.

Exercise 7.2.4 A stock-or-nothing call is equivalent to one European call and X cash-or-nothing calls.

Exercise 7.8.1 Combine put–call parity with the Greeks for a European call.

Exercise 7.10.3 Reducing the number of calls lowers the slope of the payoff function and hence the profit from moderate increases in the asset price.

CHAPTER 8

Exercise 8.1.3 The 2-day standard deviation of the return from each asset is $\sigma = 10^5 \times 0.01 \times \sqrt{2} = 1414.21$. So the standard deviation of the 2-day return from the portfolio is $\sigma_P = 1140.18$. Over a short period such as 2 days, it is reasonable to take the mean return to be zero. So we model the return as a N(0, 1140.18) variable. The 1% quantile for this is −2652.44. Hence the 99% 2-day VaR is estimated to be 2652.44.

Exercise 8.2.2 The change in portfolio value is modelled as $\Delta V \approx 6\,\Delta S_1 - 4\,\Delta S_2$, where $\Delta S_1 \sim N(0, 20)$ and $\Delta S_2 \sim N(0, 8)$. So we model $\Delta V \sim N(0, 21.54)$. The 5% quantile is −35.43. So the 1-day 95% VaR is 35.43.

APPENDIX A

Exercise A.1.5 We start by noting a consequence of the Mean Value Theorem: if f is a continuous function on [a, b], then there is a c ∈ [a, b] such that
$$\int_a^b f(x)\,dx = f(c)(b - a).$$
To prove this, let $F(x) = \int_a^x f(t)\,dt$. Then we have F′(x) = f(x), F(a) = 0 and $F(b) = \int_a^b f(x)\,dx$. The Mean Value Theorem implies the existence of c ∈ [a, b] such that
$$F(b) - F(a) = F'(c)(b - a), \qquad\text{that is,}\qquad \int_a^b f(x)\,dx = f(c)(b - a).$$
Now we can start on the exercise. We calculate the required integrals using this result, with b′ ∈ [b − z, b] and a′ ∈ [a − z, a].
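Now b → ∞ implies b′ → ∞, and a → −∞ implies a′ → −∞. Therefore, taking limits gives the required result.

APPENDIX B

Exercise B.3.2 We check that the given formula for $f_{aX+b}$ agrees with $F_{aX+b}(x) = P(aX + b \le x)$. The case a < 0 is shown below (the a > 0 case is similar and easier):

Exercise B.3.6 From the graph of $F_X$ we see that the leftmost point x at which F(x) ≥ 0.25 is x = 0. Hence $x_{0.25} = 0$. Similarly, we see that $x_{0.5} = 1$ and $x_{0.75} = 2$. Therefore the interquartile range is 2 − 0 = 2.

Exercise B.3.7 From the graph of $F_X$, we read that $x_{0.25} = 0.25$, $x_{0.5} = 0.5$, $x_{0.75} = 0.75$. Hence the interquartile range is 0.75 − 0.25 = 0.5.

Exercise B.6.5 A continuous X cannot have zero variance. A discrete X can have zero variance if and only if it is constant.

Exercise B.7.1 Note that
$$\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(x - \sigma t)^2/2}\,dx = 1,$$
since this is the integral of the probability density function of a normal variable (with mean σt and variance 1).

Exercise B.8.1 To confirm we have a pdf, we have to check that the total integral is 1:
$$\int_{-\infty}^{\infty}\frac{\gamma}{\pi\left(\gamma^2 + (x - \delta)^2\right)}\,dx = \frac{1}{\pi}\left[\arctan\frac{x - \delta}{\gamma}\right]_{-\infty}^{\infty} = \frac{1}{\pi}\left(\frac{\pi}{2} + \frac{\pi}{2}\right) = 1.$$
And, the expectation would be
$$E[X] = \int_{-\infty}^{\infty}\frac{\gamma x}{\pi\left(\gamma^2 + (x - \delta)^2\right)}\,dx.$$
Now,
$$\int_{\delta}^{b}\frac{\gamma(x - \delta)}{\pi\left(\gamma^2 + (x - \delta)^2\right)}\,dx = \frac{\gamma}{2\pi}\ln\frac{\gamma^2 + (b - \delta)^2}{\gamma^2} \to \infty \quad\text{as } b \to \infty,$$
and so the expectation integral diverges. Hence, a Cauchy random variable has no mean. So, it does not have a variance either!

Exercise B.8.2 The cdf is
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\frac{x - \delta}{\gamma}.$$
We calculate the quartiles by solving $F(x_p) = p$, which gives $x_p = \delta + \gamma\tan\big(\pi(p - \tfrac{1}{2})\big)$; thus $x_{0.25} = \delta - \gamma$, $x_{0.5} = \delta$ and $x_{0.75} = \delta + \gamma$. Hence, median = $x_{0.5} = \delta$, and interquartile range = $x_{0.75} - x_{0.25} = 2\gamma$.

Exercise B.8.3 Let Y = (X − δ)/γ. We work with the cdf of Y:
$$F_Y(t) = P(X \le \delta + \gamma t) = \frac{1}{2} + \frac{1}{\pi}\arctan t.$$
Therefore the pdf of Y is
$$f_Y(t) = \frac{1}{\pi(1 + t^2)}.$$

Exercise B.10.4 Since E[Y | X = x] = ρx, we have E[Y | X](ω) = E[Y | X = X(ω)] = ρX(ω), and hence E[Y | X] = ρX.

Exercise B.11.4 We have already worked out that if X ~ N(0, σ) and Y ~ N(0, 1) are independent, then $X + Y \sim N\big(0, \sqrt{\sigma^2 + 1}\big)$. Now let $X_1 \sim N(\mu_1, \sigma_1)$ and $X_2 \sim N(\mu_2, \sigma_2)$ be independent. Define
$$X = \frac{X_1 - \mu_1}{\sigma_2}, \qquad Y = \frac{X_2 - \mu_2}{\sigma_2},$$
so that $X \sim N(0, \sigma_1/\sigma_2)$ and Y ~ N(0, 1). Since $X_1$ and $X_2$ are independent, so are X and Y. Now,
$$X_1 + X_2 = \mu_1 + \mu_2 + \sigma_2(X + Y) \sim N\Big(\mu_1 + \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\Big).$$

Exercise B.13.1 Note that since A is lower triangular, all entries $a_{ij}$ with i < j are zero. So we are only concerned with the entries that have i ≥ j. To find $a_{ij}$, with i ≥ j, we multiply the ith row of A with the jth column of $A^T$. This is supposed to equal $\sigma_{ij}$, so we obtain the equation
$$a_{i1}a_{j1} + a_{i2}a_{j2} + \cdots + a_{ij}a_{jj} = \sigma_{ij}.$$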
When i > j, this gives
$$a_{ij} = \frac{1}{a_{jj}}\Big(\sigma_{ij} - \sum_{k=1}^{j-1} a_{ik}a_{jk}\Big).$$
When i = j, the original equation becomes $a_{i1}^2 + a_{i2}^2 + \cdots + a_{ii}^2 = \sigma_i^2$, and we solve it for $a_{ii}$:
$$a_{ii} = \sqrt{\sigma_i^2 - \sum_{k=1}^{i-1} a_{ik}^2}.$$

Exercise B.16.1 The fact that $\bar{X}$ will have mean μ and standard deviation $\sigma/\sqrt{n}$ has been proven just before this exercise. Further, we know from the previous units that the sum of independent normal variables is normal.

Exercise B.16.2 We derive it from the general formula given previously. The main task is to calculate the fourth central moment for X ~ N(μ, σ):
$$E[(X - \mu)^4] = 3\sigma^4.$$
Substitute this result in the given formula for Var[S²].

Bibliography
1. Tom M. Apostol. Calculus, Vol I and II. Wiley India, 2nd edition, 2007. Original printing 1966.
2. Aristotle. Politics. A Treatise on Government. Translated by William Ellis. J. M. Dent and Sons Ltd., London, 1912. Downloadable from Project Gutenberg: http://gutenberg.net.
3. Orley Ashenfelter, Phillip B. Levine, and David J. Zimmerman. Statistics and Econometrics: Methods and Applications. John Wiley and Sons, New York, 2003.
4. Louis Bachelier. Théorie de la spéculation. Annales Scientifiques de l'École Normale Supérieure 3e série, 17:21–86, 1900. Downloadable from http://www.numdam.org/item?id=ASENS_1900_3_17_21_0.
5. Simon Benninga. Financial Modeling. MIT Press, Cambridge, MA, 1997.
6. Fischer Black and Myron Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654, 1973.
7. Zvi Bodie, Alex Kane, and Alan J. Marcus. Investments. Irwin/McGraw-Hill, Boston, 5th edition, 2002.
8. Richard A. Brealey, Stewart C. Myers, Franklin Allen, and Pitabas Mohanty. Principles of Corporate Finance. Tata McGraw-Hill, India, 8th edition, 2007.
9. Marek Capiński and Tomasz Zastawniak. Mathematics for Finance: An Introduction to Financial Engineering. Springer Undergraduate Mathematics Series. Springer-Verlag, London, 2003.
10. Morgan Guaranty Trust Company. RiskMetrics™ – Technical Document. http://www.jpmorgan.com/RiskManagement/RiskMetrics/RiskMetrics.html, 4th edition, 1996.
11.
Paul Cootner, editor. The Random Character of Stock Market Prices. MIT Press, Cambridge, MA, 1964.
12. Jean-Michel Courtault, Yuri Kabanov, Bernard Bru, Pierre Crépel, Isabelle Lebon, and Arnaud Le Marchand. Louis Bachelier on the Centenary of Théorie de la spéculation. Mathematical Finance, 10(3):341–353, 2000.
13. John C. Cox, Jonathan E. Ingersoll Jr., and Stephen A. Ross. The relation between forward prices and futures prices. Journal of Financial Economics, 9:321–346, 1981.
14. John C. Cox and Stephen A. Ross. The valuation of options for alternative stochastic processes. Journal of Financial Economics, 3:145–66, 1976.
15. John C. Cox, Stephen A. Ross, and Mark Rubinstein. Option pricing: A simplified approach. Journal of Financial Economics, 7:229–263, 1979.
16. Keith Cuthbertson and Dirk Nitzsche. Financial Engineering. John Wiley and Sons, London, 2001.
17. Mark Davis and Alison Etheridge. Louis Bachelier's Theory of Speculation. Princeton University Press, Princeton, 2006.
18. Robert F. Engle. Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inflation. Econometrica, 50:987–1008, 1982.
19. Robert F. Engle. GARCH 101: an introduction to the use of ARCH/GARCH models in applied econometrics. NYU Working Paper No. FIN-01-030. Available at SSRN: http://ssrn.com/abstract=1294571, 2001.
20. Eugene F. Fama and Kenneth R. French. The Capital Asset Pricing Model: Theory and Evidence. SSRN eLibrary, 2003.
21. Lawrence Fisher and Roman Weil. Coping with the risk of interest rate fluctuations: Returns to bondholders from naive and optimal strategies. Journal of Business, 44:408–31, 1971.
22. Craig W. French. Jack Treynor's Toward a Theory of Market Value of Risky Assets. SSRN eLibrary, 2002.
23. John E. Freund. Mathematical Statistics. Prentice-Hall India, New Delhi, 5th edition, 2001.
24. John Hicks. Value and Capital. Clarendon Press, UK, 1939.
25. Barnabas Hughes. The earliest correct algebraic solutions of cubic equations.
In Ronald Calinger, editor, Vita Mathematica. Mathematical Association of America, 1997.
26. John C. Hull. Options, Futures and Other Derivatives. Prentice-Hall India, New Delhi, 7th edition, 2008.
27. John Lintner. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics, 47:13–37, 1965.
28. David G. Luenberger. Investment Science. Oxford University Press, New York, 1998.
29. Frederick R. Macaulay. The Movements of Interest Rates, Bond Yields and Stock Prices in the United States since 1856. National Bureau of Economic Research, New York, 1938.
30. Harry M. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
31. Harry M. Markowitz. Portfolio Selection; Efficient Diversification of Investments. John Wiley, New York, 1959.
32. Robert C. Merton. Theory of rational option pricing. Bell Journal of Economics and Management Science, 4(1):141–183, 1973.
33. Thomas Mikosch. Elementary Stochastic Calculus, with Finance in View. World Scientific Publishing, 1998.
34. Jan Mossin. Equilibrium in a capital asset market. Econometrica, 34(4):768–783, 1966.
35. John P. Nolan. Stable Distributions - Models for Heavy Tailed Data. Birkhäuser, Boston, 2010. In progress, Chapter 1 online at http://academic2.american.edu/~jpnolan.
36. Geoffrey Poitras. The Early History of Financial Economics, 1478–1776. Edward Elgar Publishing, UK, 2000.
37. Geoffrey Poitras. Frederick R. Macaulay, Frank M. Redington and the emergence of modern fixed income analysis. In Geoffrey Poitras, editor, Pioneers of Financial Economics (Vol. 2). Edward Elgar, UK, 2007.
38. Frank M. Redington. Review of the principles of life office valuations. Journal of the Institute of Actuaries, 78:286–340, 1952.
39. Richard Roll. A critique of the asset pricing theory's tests. Part I: On past and potential testability of the theory. Journal of Financial Economics, 4(2):129–176, 1977.
40. Sheldon M. Ross.
An Introduction to Mathematical Finance: Options and Other Topics. Cambridge University Press, Cambridge, 1999.
41. Sheldon M. Ross. A First Course in Probability. Prentice-Hall International, New Jersey, 6th edition, 2001.
42. Mark Rubinstein. A History of the Theory of Investments. John Wiley & Sons, Hoboken, 2006.
43. Walter Schachermayer. Introduction to the mathematics of financial markets. In Pierre Bernard, editor, Lectures on Probability Theory and Statistics, Saint-Flour summer school 2000, volume 1816 of Lecture Notes in Mathematics, pages 111–177. Springer-Verlag, Heidelberg, 2003.
44. William F. Sharpe. Capital asset prices - A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3):425–42, 1964.
45. William F. Sharpe, Gordon J. Alexander, and Jeffrey V. Bailey. Investments. Prentice-Hall India, New Delhi, 3rd edition, 2000.
46. Steven E. Shreve. Stochastic Calculus for Finance I: The Binomial Asset Pricing Model. Springer-Verlag, New York, 2004.
47. Steven E. Shreve. Stochastic Calculus for Finance II: Continuous-Time Models. Springer-Verlag, New York, 2004.
48. Joseph Stampfli and Victor Goodman. The Mathematics of Finance: Modeling and Hedging. Brooks/Cole Thomson Learning, Australia, 2001.
49. Murad S. Taqqu. Bachelier and his Times: A Conversation with Bernard Bru. In H. Geman, D. Madan, S. R. Pliska, and T. Vorst, editors, Mathematical Finance – Bachelier Congress 2000. Springer-Verlag, 2002. Downloadable from http://www.bu.edu/mathfn/people/bachelier-english43-fin.pdf.
50. James Tobin. Liquidity preference as behavior towards risk. The Review of Economic Studies, 25:65–86, 1958.