The Calculus of Finance - Amber Habib

The Calculus of Finance
Amber Habib
Mathematical Sciences Foundation
New Delhi
Universities Press (India) Private Limited
Registered Office
3-6-747/1/A & 3-6-754/1, Himayatnagar, Hyderabad 500 029
(Telangana), INDIA
e-mail: info@universitiespress.com
Distributed by
Orient Blackswan Private Limited
Registered Office
3-6-752 Himayatnagar, Hyderabad 500 029 (Telangana), INDIA
e-mail: info@orientblackswan.com
Other Offices
Bengaluru, Bhopal, Chennai, Guwahati, Hyderabad, Jaipur, Kolkata,
Lucknow, Mumbai, New Delhi, Noida, Patna, Visakhapatnam
© Universities Press (India) Private Limited 2011
First Published in 2011
Reprinted 2018
eISBN 9789389211023
e-edition:First Published 2019
ePUB Conversion: TEXTSOFT Solutions Pvt. Ltd.
All rights reserved. No part of this publication may be reproduced,
distributed, or transmitted in any form or by any means, including
photocopying, recording, or other electronic or mechanical methods,
without the prior written permission of the publisher, except in the
case of brief quotations embodied in critical reviews and certain
other noncommercial uses permitted by copyright law. For
permission requests write to the publisher.
To my brother Faiz,
for lighting my way
Contents
Preface
List of Notation
1 Basic Concepts
1.1 Arbitrage
1.2 Return and Interest
1.3 The Time Value of Money
1.4 Bonds, Shares and Indices
1.5 Models and Assumptions
2 Deterministic Cash Flows
2.1 Net Present Value
2.2 Internal Rate of Return
2.3 A Comparison of IRR and NPV
2.4 Bonds: Price and Yield
2.5 Clean and Dirty Price
2.6 Price–Yield Curves
2.7 Duration
2.8 Term Structure of Interest Rates
2.9 Immunisation
2.10 Convexity
2.11 Callable Bonds
3 Random Cash Flows
3.1 Random Returns
3.2 Portfolio Diagrams and Efficiency
3.3 Feasible Set
3.4 Markowitz Model
3.5 Capital Asset Pricing Model
3.6 Diversification
3.7 CAPM as a Pricing Formula
3.8 Numerical Techniques
4 Forwards and Futures
4.1 Forwards and Futures
4.2 Forward and Futures Price
4.3 Value of a Futures Contract
4.4 Method of Replicating Portfolios
4.5 Hedging with Futures
4.6 Currency Futures
4.7 Stock Index Futures
5 Stock Price Models
5.1 Lognormal Model
5.2 Geometric Brownian Motion
5.3 Suitability of GBM for Stock Prices
5.4 Binomial Tree Model
6 Options
6.1 Call Options
6.2 Put Options
6.3 Put–Call Parity
6.4 Binomial Options Pricing Model
6.5 Pricing American Options
6.6 Factors Influencing Option Premiums
6.7 Options on Assets with Dividends
6.8 Dynamic Hedging
6.9 Risk-Neutral Valuation
7 The Black–Scholes Model
7.1 Risk-Neutral Valuation
7.2 The Black–Scholes Formula
7.3 Options on Futures
7.4 Options on Assets with Dividends
7.5 Black–Scholes and BOPM
7.6 Implied Volatility
7.7 Dynamic Hedging
7.8 The Greeks
7.9 The Black–Scholes PDE
7.10 Speculating with Options
8 Value at Risk
8.1 Definition of VaR
8.2 Linear Model
8.3 Quadratic Model
8.4 Monte Carlo Simulation
8.5 The Martingale
Appendix A: Calculus
A.1 One Variable Calculus
A.2 Partial Derivatives
A.3 Lagrange Multipliers Method
A.4 Differentiating under the Integral Sign
A.5 Double Integrals
Appendix B: Probability and Statistics
B.1 Basic Probability
B.2 Random Variables
B.3 Cumulative Distribution Function
B.4 Binomial Random Variable
B.5 Normal Random Variable
B.6 Expectation and Variance
B.7 Lognormal Random Variable
B.8 Cauchy Random Variable
B.9 Bivariate Distributions
B.10 Conditional Probability
B.11 Independence
B.12 Multivariate Distributions
B.13 Covariance Matrix
B.14 Linear Regression and Least Squares
B.15 Random Sampling
B.16 Sample Mean, Variance and Covariance
B.17 Central Limit Theorem
B.18 Stable Distributions
B.19 Data Fitting
B.20 Monte Carlo Simulation
Appendix C: Solutions to Selected Exercises
Bibliography
Preface
Mathematics has always enjoyed a close relationship with financial
matters. Early developments in arithmetic owed much to the needs
of accounting, and even geometry was influenced by the need of the
State to measure area to fix taxes. While economics deals with the
general issues regarding money and its place in society, finance has
a narrower aim: how should we invest our money to make it grow the
most? This sharper focus makes it more tractable to mathematical
treatment. It is exciting that relatively elementary mathematics can
lead to quite deep results in finance, including work that has won
the Nobel Prize. At the same time, the problems of finance have
helped motivate new mathematics of the highest order. Mathematical
finance offers a new solution to the perennial problem
mathematicians face—convincing people that our work has some
significance for society.
This book will introduce you to the basic concepts and products of
modern finance. The emphasis is not so much on the details of the
financial world as the basic principles by which we seek to
understand it. Thus the aim of the book is to teach you how to think
about finance. This seems particularly pertinent in the context of
market upheavals that appear to be caused, to a fair extent, by the
careless application of mathematical tools to the creation and pricing
of complicated contracts. The carelessness stems from a lack of
intuition or regard for the importance of the assumptions underlying
the models, leading to incorrect evaluation of risk. As this book will
show you, the understanding and quantification of risk is the central
problem of finance.
The book is based on material developed for the courses in
mathematical finance of the Mathematical Sciences Foundation,
Delhi. (MSF’s website is www.mathscifound.org) These courses are
mainly aimed at undergraduates, but also attract students of
professional courses as well as those in employment. They are
taught portfolio analysis and financial derivatives, with the highlights
being the Markowitz Model, the Capital Asset Pricing Model and the
Black–Scholes approach to options pricing. The students come from
a wide variety of backgrounds and so the required mathematics is
also taught in parallel to the material on finance. We emphasise
hands-on work, with extensive lab work as well as student projects where
theory is applied to (and tested by) real-life data.
I have tried to retain the flavour of the MSF programs in that the
book should be accessible to undergraduates and others of varying
backgrounds. (Exposure to basic calculus and probability is all that is
required. No prior knowledge of economics or finance is needed.)
For those who have not taken any probability or calculus after high
school, the required mathematics is described in fair detail in the
Appendix. The book is peppered with examples that use real-life
data to ground the theory. Exercises are also scattered through the
text; their purpose varies from simple practice in applying formulas
to extending the ideas in the text to new situations.
The numbering of the exercises and examples needs some
explanation. Exercise 1.4.2 will be found in Chapter 1, Section 4.
Examples share the same numbering scheme with Exercises, so
that Example 1.4.3 is found in Chapter 1, Section 4, just after
Exercise 1.4.2.
As you read the book, you will notice that not many references
have been provided to the original sources. One reason is that it is
not always clear who first had a certain idea, or how much credit
should be given to the person who put it in its final or most popular
form. For instance, a technique may be used, perhaps implicitly, by
traders and investors long before it gets academic treatment and
acquires a provenance. A large and contentious book could be
written on the many claims to originality (one already has been:
Rubinstein [42]). The best thing would be for you to follow up this
book with one or more of the detailed texts on finance, for example,
Bodie, Kane and Marcus [7], Brealey and Myers [8], Cuthbertson and
Nitzsche [16], Hull [26], Luenberger [28], and Sharpe, Alexander and
Bailey [45]. For more on Mathematical Finance, the following books
are at about the same level as this one, but with varying choices of
coverage: Capinski and Zastawniak [9], Ross [40], and Stampfli and
Goodman [48]. To read more advanced texts, you would need to
become proficient in stochastic calculus. A gentle start in this
direction is provided by Mikosch [33], while the two volumes of
Shreve [46, 47] are more advanced and comprehensive.
I am grateful to my colleagues at MSF for support and inspiration
in countless ways. Perhaps the most striking is their commitment to
creating innovative teaching and research programs centered
around the interaction of mathematics with all aspects of the world
we live in. I particularly thank Professor Dinesh Singh, Director, MSF,
for inviting me to take part in MSF activities and for sharing his vision
of mathematics. Professor Sanjeev Agrawal has been a constant
source of ideas, advice, and energy. Other colleagues who have
helped me refine my thoughts on finance are Charu Sharma, Divya
Beri, Jatin Anand, Niteesh Sahni and Ziaur Rehman.
At Universities Press, I must thank its Director, Madhu Reddy, and
editors Shubashree Desikan and Sreelatha Menon for
encouragement, advice, and gentle prodding. Thanks are due to the
referee for pointing out various ways of improving the text.
Two old friends have played a special role in this story. Surajit
Basu added to the long list of kindnesses he has done me by
commenting on an early draft. Adnan Aziz rode in like the proverbial
white knight just before publication and saved me from an army of
ambiguities and omissions. I appeal to the reader to help root out
those that remain by writing to me at amber@mathscifound.org.
And, finally, my most heartfelt thanks to my wife Abha and son
Zafar for continually showing me new ways of looking at life.
Amber Habib
Mathematical Sciences Foundation
New Delhi
List of Notation
β, 82
, 217
B(n,p), 217
Cov[X,Y ], 229
δC , 166
DFW, 47
DM, 39
e, 10
, 214
≐, 197
E[X], 221
fS,T, 43
FX , 214
fX , 212, 213
fX,Y , 227
γC , 168
μ, 120
∇f, 203
N(μ,σ), 219
Ω, 210
Φ, 157
ℙ, 210
p* , 139, 141, 148
R, 247
R2 , 91
r, 59
reff, 13
ρ, 230
ρC , 171
S, 209
σ, 59, 120
σX , 222
σI , 165
σX 2, 222
σXY , 229
S2 , 244
sT , 42
SXY , 246
ΘC , 170
Var[X], 222
V C , 170
X, 243
x+ , 157
xq , 216
1 Basic Concepts
The aim of finance is to explore how money should be invested.
Imagine that you have inherited a large sum of money from a
rich uncle. The sum is so large that even when you have
satisfied your immediate needs, you still have a considerable amount
left over. What can you do with it? Some typical responses are:
1. Put it in a savings account.
2. Put it in a fixed deposit.
3. Buy bonds.
4. Buy shares.
5. Invest in a mutual fund.
6. Buy gold.
7. Buy real estate.
Of course, there are many other possibilities, but let us start with just
these. The question which arises is––in which of these should we put
our money? This naturally depends on how the nature of these
investments matches with our requirements.
For example, the advantage of a fixed deposit as opposed to a
savings account is that the former pays a higher rate of interest. The
savings account, on the other hand, allows you constant access to
your money, while a fixed deposit requires the money to be with the
bank for a set time such as three months or a year.
A bond provides regular payments over a set time period in return
for an initial payment and is thus rather like a fixed deposit. Many
bonds can be traded during their lifetime, and this provides additional
flexibility to the investor. Bonds are also issued by companies, not
only banks, and typically offer higher gains than fixed deposits. But
there is a downside––if the company hits sufficiently bad times it may
not be able to meet its obligations and the investor may not receive
the promised payments or even get the initial payment back. In other
words, the investor faces default risk, though only in exceptional
circumstances. Risk also enters the picture if the investor wishes to
sell the bond before its expiry, since the price would be affected by
prevailing market conditions and the perceived financial stability of the
issuer.
Risk comes into even more prominence when we consider the
remaining possibilities for investing. Share prices, for instance, vary
greatly from day to day. Even if we invest in a company with an
excellent record, there is no guarantee that we will gain by owning its
shares over the next few months. On the other hand, if we hold on to
the shares for many years we have a good chance of making a
handsome profit. It used to be thought that bonds provide the optimal
way to do well in the long run––say over 20 or 30 years. The current
opinion, however, is in favour of shares, provided one invests in a
diverse collection of companies and thus reduces the possible loss
due to one or more of them doing badly. Mutual funds, which
distribute the investor’s money over such a collection, cater to the
investor who wants steady long-term growth. The investor who
wishes to make money quickly would invest in just a few shares that
he believes are going to do exceptionally well in the immediate future.
Such an investor would naturally be exposed to high levels of risk.
RISK AND PROFIT
Our discussion has brought forth some aspects of risk and profit.
In this book we shall investigate these in greater detail. The main task
is to quantify the relationship between risk and profit, so we can make
well-informed and precise decisions.
The initial problem is to figure out the ‘correct’ price for a product,
by which we mean a price that satisfies both buyer and seller. We will
refer to it as the value of the product. The products, by the way, could
be anything from commodities like cars or wheat, to bonds and
shares, or even contracts about future transactions. We will use the
generic term asset for the products being traded. A collection of
assets will be called a portfolio.
Beyond pricing, the main decision is what assets to invest in.
Naturally, we would like to invest in ones whose value seems likely to
increase at a faster rate. It is almost a law of Nature, however, that
bigger promises are also less reliable. In fact, less reliable promises
must be bigger if they are to have any takers. Thus, there is a trade-off
between expected profit and risk: to aim for higher profit, the
investor must undertake greater risk.
The fundamental problem in finance is to understand the
relationship between risk and profit.
The word risk is used in finance in a special way. It refers to
uncertainty and does not necessarily have a purely negative
connotation. Thus, consider the choice between putting money in a
bank account or using it to buy shares in a company. The second
investment is riskier because it has more uncertainty, but it is not
obvious how its worth compares with that of the first one. Lotteries
provide an extreme instance of high-risk investments which are
nevertheless popular.
Figure 1.1: Long term behaviour of the return from some asset classes in the
US over the years 1926–1999. (Data from Stocks, Bonds, Bills and Inflation
2002 Yearbook, Ibbotson Associates. Used with permission from Morningstar,
Inc.) Note how the classes with greater mean return are also the ones with
greater fluctuation. Treasury bills, which are considered essentially risk free,
barely outperform inflation!
PROBABILITY
This discussion leads us to the role of probability in finance. The
relative worth of an investment depends on the probabilities of the
possible pay-offs. If higher pay-offs are perceived as more likely, its
value should increase. For example, if we can model the fluctuations
in prices of a stock, we can assign probabilities to the possible pay-offs
from buying that stock, and thus estimate its value to the
investor.1 Specifically, we treat the future profit as a random variable.
Its expectation then represents the expected profit, while its
standard deviation represents fluctuations and hence risk. (See
Figures 1.1 and 1.2.) 2
Figure 1.2: This diagram considers the 65 stocks making up the Dow Jones
Composite Index and their weekly profits over the one year period ending
November 6, 2006. The mean profit (per dollar invested) is plotted on the
vertical axis, and the standard deviation of the profits (representing risks) is
plotted on the horizontal axis. The curve has been drawn to emphasise the
absence of stocks with high mean profit but low risk – this indicates that higher
mean profit requires greater risk.
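The idea of treating future profit as a random variable can be made concrete with a small sketch. The two assets and their probabilities below are hypothetical numbers chosen only for illustration; the function names are likewise assumptions, not notation from the text.

```python
import math

def expectation(outcomes):
    """Expected value of a discrete random variable.

    outcomes: list of (probability, value) pairs whose probabilities sum to 1.
    """
    return sum(p * x for p, x in outcomes)

def standard_deviation(outcomes):
    """Standard deviation of the same discrete random variable."""
    mean = expectation(outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return math.sqrt(variance)

# Hypothetical profits per rupee invested, with assumed probabilities.
safe_asset = [(1.0, 0.05)]                   # certain profit of 5%
risky_asset = [(0.5, -0.10), (0.5, 0.30)]    # lose 10% or gain 30%

for name, asset in (("safe", safe_asset), ("risky", risky_asset)):
    print(name, expectation(asset), standard_deviation(asset))
```

In this made-up case the risky asset has the higher expected profit and also the larger standard deviation, which is the pattern the figures illustrate.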
RISK-FREE ASSETS
Some assets can be viewed as free of risk. For instance, deposits in
banks and bonds bought from governments are typically treated as
risk-free. Of course, both banks and governments can collapse, but
such instances are rare. We shall soon see that there is good reason
to expect that all risk-free assets will gain in value at the same rate,
and we may therefore talk of the risk-free rate of growth. This rate is
not universally fixed, but varies with market and time.
PORTFOLIOS
So far, we have considered individual assets. To design a portfolio,
we need to consider not only the individual characteristics (regarding
profit and risk) of the assets, but also their relationships with each
other. Two assets could be linked together in certain ways––for
instance, they may show a tendency to rise or fall in value together.
Alternatively, one may tend to move in the opposite direction to the
other. In the latter case, a rise in one would be offset by a fall in the
other, and a portfolio consisting of both these assets would be less
risky than a portfolio consisting of only one of them! By combining
assets in various ways, we can tailor a portfolio to satisfy the risk
preferences of any investor.
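A minimal numerical sketch of this effect follows. The two profit series are invented so that one asset rises when the other falls; real assets rarely offset each other this neatly, and the helper `portfolio_profits` is a hypothetical name.

```python
import statistics

# Hypothetical weekly profits per rupee invested.
# asset_b is constructed to move against asset_a.
asset_a = [0.04, -0.02, 0.03, -0.01]
asset_b = [-0.02, 0.04, -0.01, 0.03]

def portfolio_profits(weights, assets):
    """Profit series of a portfolio holding the given fractions of each asset."""
    n_weeks = len(assets[0])
    return [
        sum(w * series[week] for w, series in zip(weights, assets))
        for week in range(n_weeks)
    ]

# Half of the money in each asset.
mixed = portfolio_profits([0.5, 0.5], [asset_a, asset_b])

print("risk of A alone:", statistics.pstdev(asset_a))
print("risk of B alone:", statistics.pstdev(asset_b))
print("risk of 50/50 mix:", statistics.pstdev(mixed))
```

Here risk is measured by the standard deviation of the profits, and the mixed portfolio fluctuates far less than either asset held alone.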
HEDGING
The process of reducing risk by combining assets appropriately is
called hedging. By hedging we reduce risk and therefore, also lower
our expected profit. One of the goals we will pursue in this text is to
see how to hedge against specific risks to which a portfolio is
exposed, for instance fluctuations in the prices of stocks or in interest
rates. If the hedging is complete, no risk will remain, and the portfolio
will grow slowly at the risk-free rate. Therefore, we will also consider
how to hedge to the right extent, so that the remaining risk just falls
within acceptable levels and the portfolio is able to grow at a faster
rate.
In the latter part of this book we will consider the financial
instruments known as derivatives. Derivatives are contracts that fix
the terms for future trades. A simple example is a contract that binds
two parties to a sale of crude oil, six months from now, at a price of
$50 per barrel.3 A prime use of derivatives is to reduce uncertainty
about future expenses (or profits), and they have become a very
popular means of hedging. The creation and pricing of suitable
derivatives is a major focus of modern finance.
1.1 ARBITRAGE
Arbitrage is the making of profit without undertaking risk. It can be
earned, for instance, when a product is being sold at different prices
in different markets. Then risk-free profit can be made by selling it
where it is costlier and buying it where it is cheaper. A variation is
when the different prices are at different times, so that it is possible to
buy today at a low price and sell some days later at a higher price.
For this profit to qualify as arbitrage, however, it must be absolutely
certain beforehand that the price will go up.
Exercise 1.1.1 Consider the following situations. Is it possible to
exploit them so that profit is certain?
1. A kind relative offers to sell you a share whose value has gone
up by at least 15% every year for 50 years.
2. A valid lottery ticket is lying on the road.
3. Horses A and B race against each other. If you bet a rupee on
horse A and it wins, you get back Rs 2. If you bet a rupee on
horse B and it wins, you get back Rs 4. If you bet on a horse that
loses, you lose your money.
Here is an early use of the term arbitrage in our sense, occurring in a
description of stock and derivative trading in eighteenth-century
Holland:
‘There are other arbitrages and other profitable combinations
independent of gambles or events, which are executed by combining
2 or 3 simultaneous transactions.’ Traite de la Circulation et du Credit
by Isaac de Pinto, 1771. (Quoted in Poitras [36])
Deciding whether a profit has been made can be tricky. If you
invest Rs 10 and after a while it becomes Rs 20, you may feel you
have made a profit of Rs 10. Suppose however, that you are based in
another country and count your gains in dollars. In the given time
period, if the value of the rupee in terms of dollars falls sufficiently far
you will perceive a loss rather than a gain. Yet again, suppose that in
the same period a rupee put in a savings account would have more
than doubled. Then it would be difficult to see the first investment as
truly representing a profit.
A simple way to resolve these ambiguities is to demand that
arbitrage must be carried out without investing your own money
(essentially, this means you start by borrowing some money and then
pay off the loan by the end). If you start with zero and end up with
something, you have definitely made a profit.
A basic principle is that arbitrage opportunities are short-lived:
Prices evolve in such a way as to eliminate them. For, as soon as it is
realised that a product is under-valued and is creating an arbitrage
opportunity, investors will rush to buy it. This will drive up its price,
reducing and ultimately eliminating the arbitrage opportunity. Similarly,
if the opportunity arises from an over-priced product, there will be a
rush to sell it and this will drive its price down.
Reflecting on this process, we are led to the formal definition of
arbitrage. We start by noting that the amount of profit does not have
to be known beforehand: it is enough to know that it cannot be
negative and has a chance of being positive. This will suffice to attract
investors and initiate the stabilisation process described above.
Definition 1.1.2 An investment strategy is said to lead to arbitrage if:
1. It does not involve an initial investment of the investor’s own
money.
2. It is known that at some future time the investment will have a
value which is definitely non-negative and additionally has a non-zero
probability of being strictly positive.
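The two conditions of the definition can be checked mechanically in a simplified setting. The sketch below assumes the strategy has finitely many possible final values with known probabilities, which is a strong simplification; the function name and the sample strategies are hypothetical.

```python
def is_arbitrage(initial_investment, outcomes):
    """Check the two conditions of the arbitrage definition.

    initial_investment: the investor's own money put in at the start.
    outcomes: list of (probability, final_value) pairs at the future time.
    """
    # Condition 1: none of the investor's own money is used.
    no_own_money = initial_investment == 0
    # Condition 2a: every outcome that can occur is non-negative.
    never_negative = all(value >= 0 for prob, value in outcomes if prob > 0)
    # Condition 2b: some outcome that can occur is strictly positive.
    can_be_positive = any(prob > 0 and value > 0 for prob, value in outcomes)
    return no_own_money and never_negative and can_be_positive

# Hypothetical strategies.
print(is_arbitrage(0, [(0.9, 0.0), (0.1, 5.0)]))    # no loss possible, gain possible
print(is_arbitrage(0, [(0.5, -1.0), (0.5, 5.0)]))   # a loss is possible
print(is_arbitrage(10, [(1.0, 20.0)]))              # uses the investor's own money
```

Note that the amount of profit never enters the check, only its sign and whether it can occur.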
Exercise 1.1.3 Which of the following situations provides an
arbitrage opportunity?
1. A guarantee that in return for Rs 10 paid now, Rs 20 will be
returned after ten years.
2. A guarantee that in return for Rs 10 paid now, Rs 20 will be
returned tomorrow.
3. A lottery ticket.
4. A free lottery ticket.
5. Bank A loans money at an annual interest rate of 10%, while
bank B pays 15% interest annually on deposits.
6. Bank A loans money at an annual interest rate of 15%, while
bank B pays 10% interest annually on deposits.
How long an arbitrage opportunity lasts depends on the
communication within the market. The better it is, the faster investors
will react to the situation and eliminate the opportunity. Thus, in the
idealised situation of an efficient market, in which communication is
instantaneous and complete, arbitrage opportunities will die
immediately. This is our main assumption and is used throughout
our text. Its brief statement is:
No Arbitrage Principle: In an efficient market, there are no arbitrage
possibilities.
The No Arbitrage Principle is a surprisingly powerful tool for
establishing the ‘correct’ price of a product and underlies every
important result in this book. It seems to have been first noticed by
Louis Bachelier in his doctoral thesis in 1900 when he described
‘transactions in which one of the parties makes a profit at all prices’
and also noted that ‘these are never found in practice.’4 Its systematic
use in modern finance, however, was initiated by Franco Modigliani
and Merton Miller in the 1950s. Both received the Nobel Prize in
economics––Modigliani in 1985 and Miller in 1990.
1.2 RETURN AND INTEREST
Consider an asset whose value evolves from $V_0$ at an initial time $t = 0$
to $V_T$ at a later time $t = T$. Then the return from this asset over the
time interval $[0, T]$ is defined to be
$\text{Return} = V_T - V_0.$
The rate of return is defined by
$\text{Rate of return} = \frac{V_T - V_0}{V_0}.$
Commonly, one also writes ‘return’ for ‘rate of return’. Confusion is
avoided by noting that return has a currency as unit, while the rate of
return is unit-free. We will further express rate of return in
percentages. Thus the phrase ‘The return was Rs 10’ refers to the
first definition, while ‘The return was 10%’ refers to the second and
conveys that the rate of return was 0.1.
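As a quick illustration of the two usages, the sketch below computes both quantities for an asset that grows from 100 to 110, taking the rate of return to be the return divided by the initial value. The function names and the sample values are assumptions made for the example.

```python
def absolute_return(v0, vT):
    """Return over [0, T], in currency units."""
    return vT - v0

def rate_of_return(v0, vT):
    """Rate of return over [0, T]; a unit-free fraction."""
    return (vT - v0) / v0

v0, vT = 100.0, 110.0   # hypothetical initial and final values in rupees
print(f"The return was Rs {absolute_return(v0, vT):.0f}")
print(f"The return was {rate_of_return(v0, vT):.0%}")
```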
INTEREST
We will call the income from an investment interest if it is earned
regularly and in a predetermined manner, without risk. (This is a
rather narrow use of the word and we employ it for clarity at this initial
stage.)
Interest can be calculated according to different conventions.
Consider a starting amount P (called the principal) on which interest
is earned over a time period T. The amount of interest earned is given
by the rate of interest, denoted r, in accordance with the adopted
convention. The rate r is given relative to some time interval, called its
period. The most commonly used period is one year, in which case
the rate is called annual.
Rates are commonly given as percentages, which have to be
converted to fractions for calculations.
SIMPLE INTEREST
In simple interest, the interest earned over one period is not added to
the principal (e.g., it may be returned to the investor), and further
interest is again earned on the principal alone. Thus, if P is invested
at a rate of interest r, the amount after one period is
A = P + Pr = P(1 + r).
During the second period, interest is again earned on P alone, so
that the amount after two periods is
A = P(1 + r) + Pr = P(1 + 2r).
This calculation easily extends to the general case:
Theorem 1.2.1 Suppose simple interest is earned on an investment
P at a rate r over n periods. Then the final amount A is
A = P(1 + nr).
□
Example 1.2.2 A common example of simple interest is the provision
of certain types of fixed deposits by banks. The interest earned on the
money in such a fixed deposit account is returned to the investor so
that future interest is earned on the original principal alone.
□
DISCRETE COMPOUND INTEREST
In compound interest, interest earned over one period is added to the
principal and earns interest in subsequent periods. If an amount P is
invested at a rate r, then the amount after one period is
A = P(1 + r),
just as for simple interest. However, in the second period P(1 + r)
serves as the principal, so that the amount after two periods is
$A = P(1 + r)(1 + r) = P(1 + r)^2.$
The general case is again easy to obtain:
Theorem 1.2.3 Suppose compound interest is earned on an
investment P at a rate r over n periods. Then the final amount A is
$A = P(1 + r)^n.$
□
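The difference between the two conventions shows up only from the second period onward. The following sketch compares them for a hypothetical principal of 100 at 10% per period; the function names are illustrative.

```python
def simple_amount(principal, rate, periods):
    """Amount after n periods of simple interest: P(1 + n r)."""
    return principal * (1 + periods * rate)

def compound_amount(principal, rate, periods):
    """Amount after n periods of compound interest: P(1 + r)^n."""
    return principal * (1 + rate) ** periods

for n in (1, 2, 5, 10):
    s = simple_amount(100, 0.10, n)
    c = compound_amount(100, 0.10, n)
    print(f"n = {n:2d}  simple = {s:7.2f}  compound = {c:7.2f}")
```

After one period the two amounts agree; after that the compound amount pulls ahead because earlier interest itself earns interest.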
Sometimes the period for which the rate is quoted is not the same as
the interval at which interest is compounded. For instance, the rate
may be given as an annual one, while the interest is calculated every
6 months. In this situation, the rate is adjusted linearly, as in the
following example.
Example 1.2.4 Suppose you invest Rs 10,000 for one and a half
years at an annual rate of 10% with semiannual compounding (that is,
the compounding is every six months). Then interest is calculated for
each six-month period at half the annual rate, i.e., at 5%.
Therefore, over one-and-a-half years, the invested amount
becomes
$A = 10{,}000 \times 1.05^3 = 11{,}576.25.$
□
If interest is compounded m times during the period of the rate, then
the rate per compounding period is set to r/m and so we have the
following result.
Theorem 1.2.5 Suppose compound interest is calculated with a rate
r and is compounded m times per period. Then, over n periods an
investment P grows into an amount A given by:
$A = P\left(1 + \frac{r}{m}\right)^{mn}.$
□
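A short sketch of this rule, checked against the semiannual example above (Rs 10,000 at 10% for one and a half years), is given below. The function name is an assumption, and the code takes the number of compounding intervals, m times n, to be a whole number.

```python
def compound_amount_m(principal, rate, m, periods):
    """Amount when a rate r per period is compounded m times per period.

    Each compounding interval earns r/m, and there are m * periods intervals.
    This sketch assumes m * periods is a whole number.
    """
    return principal * (1 + rate / m) ** (m * periods)

# Semiannual compounding (m = 2) at an annual rate of 10% for 1.5 years.
amount = compound_amount_m(10_000, 0.10, 2, 1.5)
print(f"{amount:,.2f}")
```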
Example 1.2.6 Savings accounts in banks provide an example of
compound interest since the interest earned on the amount in the
account is fed back into the account.
□
Exercise 1.2.7 Suppose you take a loan of Rs 1000, and have to pay
it back in two equal and equally spaced installments over a year. The
annual rate of interest applied to this loan is 15% and the interest is
compounded semi-annually.
a. What will be the size of each installment?
b. How much of each installment will go toward the principal and
how much toward the interest? (Assume that each payment has
to pay off the outstanding interest at the time.)
CONTINUOUS COMPOUND INTEREST
Consider a bank offering interest compounded annually at a rate r.
Suppose it allows an investor who withdraws his money at a time t
before one year to earn interest at a linearly adjusted rate of rt. For
example, an investor can withdraw his investment of P after 6
months, together with the interest earned. It would total P(1 + r/2). He
can then immediately reinvest it for another 6 months. This strategy
nets him a final amount of
$A = P(1 + r/2)(1 + r/2) = P(1 + r + r^2/4),$
which is slightly better than the P(1 + r) he would have had if he had
just let the money sit in the bank for the whole year. An investor who
can create this strategy will certainly think of pushing it further by
using smaller and smaller investment periods. In general, if he
withdraws and reinvests m times, he will end up with
$A = P\left(1 + \frac{r}{m}\right)^m.$
For example, if P = 100 and r = 10%, we have
m = 2 ⟹ A = 110.250
m = 4 ⟹ A = 110.381
m = 12 ⟹ A = 110.471
m = 52 ⟹ A = 110.506
m = 365 ⟹ A = 110.516
The larger the value of m, the greater is his profit. This naturally leads
to considering the limit case m →∞. To evaluate this limit, we need to
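recall the number e, which is called Euler’s number and is defined by
$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.$
Euler’s number is approximately 2.71828. Now we can calculate:
$\lim_{m \to \infty} P\left(1 + \frac{r}{m}\right)^m = P \lim_{m \to \infty} \left[\left(1 + \frac{1}{m/r}\right)^{m/r}\right]^r = Pe^r.$
This suggests creating a new kind of interest calculated by $A = Pe^r$ for
a single period.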
Interest calculated according to this formula is said to have been
continuously compounded, and r is called the continuously
compounded rate of interest.
Theorem 1.2.8 Suppose continuously compounded interest is
calculated with a rate r per period. Then, over n periods an
investment P grows into an amount A given by
$A = Pe^{nr}.$
□
Again, if we take P = 100 and r = 10%, then the continuously
compounded amount after a year is
$A = 100\,e^{0.1} = 110.517,$
while daily compounding over a year gave 110.516. Thus daily
compounding is barely distinguishable from continuous compounding!
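The approach to the continuous limit can be reproduced with a few lines of code. This is a sketch using the same P = 100 and r = 10% as the text; the function names are assumptions.

```python
import math

def reinvested_amount(principal, rate, m):
    """Amount after one period when interest is compounded m times: P(1 + r/m)^m."""
    return principal * (1 + rate / m) ** m

def continuous_amount(principal, rate, periods=1):
    """Amount under continuous compounding: P e^(n r)."""
    return principal * math.exp(rate * periods)

for m in (2, 4, 12, 52, 365):
    print(f"m = {m:3d}  A = {reinvested_amount(100, 0.10, m):.3f}")
print(f"continuous  A = {continuous_amount(100, 0.10):.3f}")
```

The amounts increase with m but stay below the continuously compounded value, which acts as their upper limit.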
INTEREST AT ARBITRARY TIMES
We have been considering the interest earned when money is
invested for a full time period, or for n full time periods. Now we look
at what happens when an investor withdraws her money at some
intermediate time. In particular, let the investor withdraw her money
after a time T which consists of n full time periods and a final fraction t
of a time period (so 0 ≤ t < 1).
The convention we will adopt is that during the fractional period
the rate of interest is adjusted linearly to rt. (Note that this is
consistent with the convention used when the period for the quoted
rate differs from the compounding period.) Then, over the time T, the
invested amount becomes
Simple interest: $A = P(1 + nr) + Prt = P(1 + rT)$
Discretely compounded interest: $A = P(1 + r)^n(1 + rt)$
Continuously compounded interest: $A = Pe^{nr}e^{rt} = Pe^{rT}$
The formulas for simple and continuously compounded interest are
mathematically simple, while that for discrete compounding is slightly
more complicated. If we plot A versus T, then for simple interest we
get a straight line, for discretely compounded interest a sequence of
straight line segments with increasing slope, and for continuously
compounded interest a smooth exponential curve (Figure 1.3).
Figure 1.3: This diagram shows the growth of Rs 100 according to the
different interest rates, each with r = 10%, over a period of 10 years.
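The three conventions for an arbitrary time T can be sketched as follows, splitting T into n full periods and a fraction t with the linear adjustment applied to the fraction. The function names and the sample time T = 2.5 are assumptions for illustration.

```python
import math

def split_time(T):
    """Split T into n full periods and a fractional part t with 0 <= t < 1."""
    n = math.floor(T)
    return n, T - n

def simple_at(principal, rate, T):
    """Simple interest: P(1 + rT)."""
    return principal * (1 + rate * T)

def discrete_at(principal, rate, T):
    """Discrete compounding with a linearly adjusted final fraction: P(1 + r)^n (1 + rt)."""
    n, t = split_time(T)
    return principal * (1 + rate) ** n * (1 + rate * t)

def continuous_at(principal, rate, T):
    """Continuous compounding: P e^(rT)."""
    return principal * math.exp(rate * T)

T = 2.5   # hypothetical withdrawal time: two full periods and half of a third
for label, f in (("simple", simple_at), ("discrete", discrete_at), ("continuous", continuous_at)):
    print(f"{label:10s} {f(100, 0.10, T):.2f}")
```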
Exercise 1.2.9 Consider a bank that offers to double your investment
in 10 years. What is the corresponding annual rate of interest if we
assume the interest is:
a. simple
b. compounded annually
c. compounded continuously.
Continuous compounding is mathematically pleasant in another way.
Withdrawing and reinvesting becomes just the same as making a
single long investment since
$e^{rT_1}e^{rT_2} = e^{r(T_1+T_2)}$.
Moreover, if continuous compounding is used, the same r can be
used for borrowing and lending without creating arbitrage
opportunities. This is not the case with discrete compounding.
Exercise 1.2.10 Suppose a bank has fixed an annual 5% discretely
compounded interest rate for both deposits and loans. Show that this
creates an arbitrage opportunity.
Exercise 1.2.11 Given that continuous compounding has nicer
behaviour than discrete compounding, can you explain why financial
institutions use the latter?
Let us now consider the possibility of different interest rates being
available in the market. The differences could be of various types:
1. Use of different types of interest (simple or compound with
different periods).
2. Different rates offered by different financial institutions.
3. Different rates for deposits and loans.
4. Different rates for investments of different time durations.
Typically, institutions use discretely compounded interest. The
difference would be in the frequency of compounding. The same
value of r but with more frequent compounding leads to more interest
being earned. Thus institutions doing more frequent compounding
would also use slightly lower values of r. Continuous compounding is
used more in mathematical modelling, and these models would use a
value of r that is essentially equivalent to that being used for discrete
compounding in the real world.
EFFECTIVE RATE OF INTEREST
One way to reduce the confusion from different kinds of interest is to
calculate for each the amount of interest it earns over one year. In
other words, we calculate the annually compounded interest rate that
would generate the same amount of interest.
Thus, suppose that a principal P has grown to an amount A by
earning interest over a year. Then the interest earned is A–P. The
corresponding effective rate of interest is defined to be the interest
earned in one year per unit invested:
$r_{\text{eff}} = \frac{A - P}{P}$.
We can expect that the effective rates of various available
interest-earning schemes would be the same.
The quoted rate of interest is called the nominal rate, to
distinguish it from the corresponding effective rate.
Example 1.2.12 Consider a nominal rate r of 10% annually. If this is
used with annual compounding, then the corresponding effective rate
is again 10%. If the compounding is semi-annual (every 6 months),
the effective rate becomes $(1 + 0.1/2)^{2} - 1 = 0.1025$, i.e. 10.25%.
Finally, if continuous compounding is used, the effective rate is
$e^{0.1} - 1 = 0.1052$ or 10.52%.
□
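The calculations of this example can be reproduced with a short Python sketch. The function name and the convention of passing `None` for continuous compounding are assumptions made here for illustration.

```python
import math


def effective_rate(nominal, periods_per_year=None):
    """Effective annual rate for a nominal annual rate.

    periods_per_year=None stands for continuous compounding
    (a convention of this sketch, not of the text).
    """
    if periods_per_year is None:
        return math.exp(nominal) - 1
    return (1 + nominal / periods_per_year) ** periods_per_year - 1


annual = effective_rate(0.10, 1)
semi_annual = effective_rate(0.10, 2)
continuous = effective_rate(0.10)
print(round(annual, 4), round(semi_annual, 4), round(continuous, 4))
```

The three values should match the 10%, 10.25% and 10.52% obtained above.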
Exercise 1.2.13 A credit card offers a cash withdrawal facility at a
“low” monthly rate of 2%. What is the corresponding effective annual
rate?
Exercise 1.2.14 Consider two investments A and B of the same
amount, and at the same effective annual interest rate. Suppose A
earns semi-annually compounded interest and B earns continuously
compounded interest.
a. Which one earns more interest if the period of the investment is:
6 months, 9 months, 1 year?
b. Suppose the invested amount is Rs 1000, and the common
effective rate is 10%. What is the maximum difference in the
interests earned by A and B at any point during the first 6
months? (The answer is quite small, so use a good number of
decimal places in your calculations.)
Let us now look at the other kinds of variations in interest rates listed
earlier. The second kind of variation can be expected to be small
due to competition. A bank offering lower interest on deposits than its
competitors would soon start losing customers and would have to
raise its rates.5
The third kind certainly exists. Thus, a bank will offer lower interest
on deposits than it will exact on loans. However, it should be noted
that only the first rate can be reasonably seen as risk-free. The
second rate involves a risk taken by the bank, which explains why it is
higher.
Exercise 1.2.15 Show that the No Arbitrage Principle rules out a
bank offering higher interest on deposits as compared to loans.
The fourth kind of difference is quite important and we will consider it
in detail later (§2.8). As a general (but not universal) rule, investments
for longer time durations are granted higher interest rates. The idea is
that such an investment is exposed to more risk over its life and, to
compensate, it must promise a higher profit.
Example 1.2.16 In April 2005, 6-month investments in Reserve Bank
of India bonds were earning 5.4% interest, while 12-month
investments were earning 5.6%.
□
We can, therefore, expect the same effective interest rates for
investments of the same duration. Thus, we may (and do) talk of a
common risk-free rate that applies to all risk-free investments over
the same time period.
1.3 THE TIME VALUE OF MONEY
Consider two offers: the first promises you one rupee right away and
the other, after a month. Assuming both the offers are from
trustworthy sources, do you have any reason to prefer one to the
other? The simple answer is that it is better to get the money early, as
it can be put in a bank to start earning interest. This example
illustrates the important idea that the value of a transaction involves
not only an amount of money but also the time at which it is
undertaken.
PRESENT AND FUTURE VALUE
We have observed that holding a rupee now is not the same as
holding it a year from now. Well then, what is the precise difference
between the two? It depends on how much the rupee could have
earned in a year by means of interest.
Example 1.3.1 Suppose a rupee can be invested at a simple annual
rate of 5%. Then, after a year, it becomes 1.05 rupees. In this
situation, earning a rupee now is equivalent to earning 1.05 rupees
after a year.
□
In the above example, Rs 1.05 is the future value of Rs 1.
Conversely, Rs 1 is the present value of Rs 1.05.
In general, consider two amounts P and F that exist at times t1
and t2 respectively, with t1 < t2. Let the risk-free rate of return over the
interval [t1,t2] be r. Then we call P the present value of F (at t1) if
$P = \frac{F}{1 + r}$.
Conversely, F is the future value of P (at t2). The factor C = 1/(1 + r) is
called the discount factor for this period.
If the risk-free rate is given in terms of interest, then the
corresponding discount factor can be calculated as follows:
1. Suppose the annual interest rate is r, with compounding being m
times a year (after equal periods of time). Then the discount
factor over n periods is
$C = \frac{1}{(1 + r/m)^{n}}$.
2. If the interest is compounded continuously, then the discount
factor over T years is
C = e−rT.
The following cannot be over-emphasised:
Amounts of money existing at different times must not be compared
or combined without taking into account the relevant discount
factors.
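The two discount factors, and their use in computing a present value, can be sketched as follows. The function names are hypothetical; the first check reuses the figures of Example 1.3.1, while the semi-annual rate of 10% is a sample value chosen for illustration.

```python
import math


# Illustrative helper names; not from the text.
def discount_factor_discrete(r, m, n):
    """Discount factor over n periods; annual rate r compounded m times a year."""
    return (1 + r / m) ** (-n)


def discount_factor_continuous(r, T):
    """Discount factor over T years under continuous compounding."""
    return math.exp(-r * T)


def present_value(future_amount, discount_factor):
    """Present value = discount factor times future amount."""
    return discount_factor * future_amount


# Example 1.3.1: Rs 1.05 a year from now, at 5% for a single period.
pv_example = present_value(1.05, discount_factor_discrete(0.05, 1, 1))
# Sample values: 10% compounded semi-annually, over two half-year periods.
c_semi = discount_factor_discrete(0.10, 2, 2)
print(round(pv_example, 6), round(c_semi, 6))
```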
INFLATION
Another way that the value of money changes through time is with
respect to its purchasing power. Typically, the same amount of money
can buy less and less as time progresses––this phenomenon is
known as inflation. Inflation shows up as a general increase in
prices. Occasionally, prices may fall, and then we have a deflation
situation. But inflation is the general trend.
By averaging the rise in prices over various commodities, one can
arrive at a single number––the rate of inflation f––which represents
the annual decrease in purchasing power of a unit of currency.
Purchasing power after 1 year = $\frac{1}{1 + f}$.
If an amount A is invested at the risk-free rate r for a year, then the
effective amount one has after a year is
$\frac{1 + r}{1 + f}\,A$,
and so, we may talk of the real risk-free rate r′ defined by
$1 + r' = \frac{1 + r}{1 + f}$, or $r' = \frac{r - f}{1 + f}$.
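As a small illustration, the real risk-free rate r′ = (r − f)/(1 + f) can be computed as below. The function names are our own, and the rates r = 8% and f = 5% are hypothetical sample values, not data from the text.

```python
# Illustrative helpers; the names and sample rates are assumptions of this sketch.
def purchasing_power(f):
    """Purchasing power of one unit of currency after a year of inflation f."""
    return 1 / (1 + f)


def real_rate(r, f):
    """Real risk-free rate r' defined by 1 + r' = (1 + r) / (1 + f)."""
    return (r - f) / (1 + f)


r, f = 0.08, 0.05            # hypothetical risk-free and inflation rates
r_real = real_rate(r, f)
print(round(r_real, 4))
```

With discrete annual rates the real rate is slightly smaller than the difference r − f, since the gain is itself measured in depreciated currency.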
Exercise 1.3.2 If continuous rates are used for inflation as well as
risk-free growth, show that the real risk-free rate is given by r′ = r − f.
Example 1.3.3 The Reserve Bank of India calculates inflation rates
from the Wholesale Price Index (WPI) which tracks the price of a
certain varied mix (or ‘basket’) of commodities. The RBI also
maintains subindices for food articles, manufactured articles, and so
on. The table below shows the WPI during 1993−99:
For example, the inflation rate over 1994-95 would be calculated as
follows. Since the price of the basket rose from 100 to 112.6, we
could say that the purchasing power of the rupee changed by a factor
of 100/112.6 = 0.888. The rate of inflation can now be obtained:
$\frac{1}{1 + f} = \frac{100}{112.6} = 0.888$, and so
$f = 0.126$ or 12.6%.
We can repeat these calculations to obtain the inflation rate for
each period:
□
Exercise 1.3.4 If an investment (in rupees) earned an annually
compounded 8% interest throughout the period 1993–99, what
would be the return from it in real terms (i.e., in terms of purchasing
power)?
In this book, we will not worry about inflation. The above
discussion shows that, if necessary, inflation can be taken into
account by a suitable modification of the risk-free rate.
Indeed, the fact that inflation applies to all amounts of money
makes it less relevant to financial choices than we may expect. For
example, consider a choice between the following offers:
1. Rs 100 now
2. Rs 110 a year from now
Suppose the available interest rate is 8%. Then the first amount
grows to Rs 108 over a year – and this is a bit less than Rs 110. Most
people, however, still prefer the first choice. They argue that the Rs
110 will also be subject to inflation and so they would prefer to have
the Rs 100 now.
Yet the point to remember is that present value is not just a
theoretical notion––it has practical implications. If we are sure of
receiving Rs 110 in a year, we can borrow its present value against it
today. In this case that amounts to Rs 101.85. In other words, the
second offer is equivalent to an offer of Rs 101.85 today, and so it
wins no matter what the rate of inflation is.
1.4 BONDS, SHARES AND INDICES
Bonds and shares are the principal means by which institutions raise
money for their operations, and hence they provide the chief avenues
for investment.
A bond is a contract written by a company or government (called
its issuer). The purchaser of the bond makes an immediate payment
to the issuer and, in return, is entitled to a predetermined number of
regular payments in the future. Thus, a bond is essentially a loan.
Institutions use bonds to raise money when the amount needed is too
large to be obtained from a single source. Investors use bonds as
relatively safe investments providing a higher rate of return than a
simple deposit in a bank. Judiciously used, they can provide
insulation from interest rate changes as well. (We will take up this
application in §2.9.)
A share represents a part of the capital of a company. Its owner is
thus a part-owner of the company and takes part in its fortunes. The
share may offer him certain voting rights in the affairs of the company.
Many companies also release regular payments, called dividends, to
their shareholders out of their profits. The term stock is also used for
a share. Another usage of the word stock is the total capital
represented by the shares, or the total market capitalisation (TMC)
of the company.
A stock index is a hypothetical portfolio used for keeping track of
the general trends in the market.
Here are some examples of stock indices from India6:
1. BSE Sensex is based on 30 stocks forming a sample of large,
liquid and representative companies listed on the Bombay Stock
Exchange (BSE).
2. S&P CNX Nifty consists of the 50 stocks with highest TMC on the
National Stock Exchange of India (NSE). It represents about 60%
of the TMC on the NSE.
3. S&P CNX 500 consists of 500 stocks and covers 91% of the total
turnover on NSE and 92% of the market capitalisation.
4. CNX Realty Index represents 85% of market capitalisation and
88% of turnover in the real estate sector.
Here are some international examples:
1. Standard and Poor’s 500 Index (S&P 500), is based on 500 US
stocks. S&P 500 covers about 70% of the total TMC and 78% of
the total traded value.
2. Dow Jones Industrial Average (DJIA) consists of 30 of the largest
public companies in the US. It was created in 1896, when it
consisted of 12 companies (See Figure 1.4).
Figure 1.4: The Dow Jones Industrial Average Index (DJIA) between
1928 and 2008
3. NASDAQ 100 consists of 100 of the largest companies (including
non-US ones) listed on the NASDAQ exchange. It is relatively
heavy in IT companies. Infosys is one of the current components
of this index.
4. NASDAQ Composite (or just ‘the NASDAQ’) includes every
company listed on the NASDAQ exchange––currently more than
3000.
5. NYSE Composite includes each of the over 2000 stocks listed on
the New York Stock Exchange.
6. FTSE 100 consists of the top 100 companies in terms of TMC on
the London Stock Exchange. It represents about 80% of the TMC
on the London Stock Exchange.
7. Nikkei 225 is based on the Tokyo Stock Exchange.
8. Hang Seng Index consists of 39 companies listed on the Hong
Kong Stock Exchange and comprising 65% of its TMC.
Figure 1.5: In this diagram we plot the logarithms of the DJIA values and see
that they have a strong linear trend. They also enable a clearer look at the
relative size of fluctuations over a long term.
Of these various stock indices, the smaller ones (with 30–50
constituents) give a summary of some particular aspect of the
economy––perhaps of its largest companies, or of those belonging to
a particular sector. Larger ones attempt to portray the national
economy as a whole. Recently, stock indices have been created that
track entire continents or the whole world.
In this book we will consider two important uses of stock indices.
One is as a benchmark against which other portfolios are measured.
This aspect will be prominent in the applications of the Capital Asset
Pricing Model. Another, which we take up during our study of financial
derivatives, is as a tool to hedge against the general movements of
the economy.
1.5 MODELS AND ASSUMPTIONS
Any mathematical treatment of ‘real life’ must start by simplifying and
thereby distorting it. A good choice of simplification is one that distorts
less and yet leads to more insight. We may even accept a choice that
we know is an over-simplification if it brings clarity to some essentials.
Thus, the nature of mathematical modelling is quite different from
what we generally expect of mathematics. We are less concerned
with the validity of our deductions than with the value of the final
result.
In this text, the main mathematical assumption is the No Arbitrage
Principle, sitting atop the assumption of an efficient market. In the real
world, arbitrage opportunities do exist and, in fact, some financial
institutions invest considerable effort in quickly detecting and
exploiting them. The assumption, however, is almost indispensable
because of the tremendous direction it gives to our work. It is further
justified by the observation that arbitrage opportunities, when they
exist, do not exist for long.
We have also ignored many other aspects of real markets.
Foremost among these are the various costs associated with
transactions––fees charged by exchanges and brokers as well as
taxes to governments. These are collectively known as friction in the
financial world since they slow down its machine by lowering profits.
Friction also has the effect of obliterating small arbitrage
opportunities. The result is that the No Arbitrage Principle does not
yield a finely balanced single ‘correct’ price but a range of valid prices.
Yet another issue is the linearity of prices. This is, after all, true
only for small trades. Further, high volume trades affect prices. If you
attempt to offload a large amount of a particular stock at today’s price,
your act will lower its price, and you will likely end up with less than
you had hoped for. We have not taken up such complications in our
book.
The material in this book, therefore, constitutes only the first few
steps on the path to acquiring a detailed understanding of finance. Yet
these few steps are the most crucial and bring to you tools of great
power and reach.
1 We shall look at specific stock price models in Chapter 5 of this book.
2 You can ignore this discussion for now if you are unfamiliar with the terms
random variable, expectation, and standard deviation. However, you should
start your study of the basics of probability (Appendix B) and familiarise yourself
with these concepts as they will soon become essential.
3 A barrel of oil equals 159 litres. Between January and December 2008, the
price of a barrel of crude oil rose from $85 to $122 (in June) and then fell all the
way to $33!
4 The English translation is by Davis and Etheridge on page 24 of [17]. The
seminal contributions of Bachelier are described in more detail in the
introductory remarks of our fifth chapter.
5 The rates would have to be compared after taking into account any fees
charged by the banks. A bank offering better service might also get away with a
lower rate. Thus, in the real world, we only expect the differences to be small –
not absent.
6 For more information, consult the websites of the National Stock Exchange of
India (www.nse-india.com) and the Bombay Stock Exchange
(www.bseindia.com).
2 Deterministic Cash Flows
A cash flow is a sequence x0, x1, …, xn of cash transactions
occurring at corresponding times t0, t1, …, tn. (We will always
take t0 = 0, the present time.) Earnings are represented by
positive signs, and expenditures by negative signs. For example, if a
cash flow has annual transactions –1000, 400, and 700, then the first
entry is an expenditure while the next two are earnings.
A cash flow may, for example, represent the earnings and
expenses associated with a particular investment, or a portfolio of
investments, or the entire cash transactions of a company. It may also
represent a single transaction.
An investor or a manager needs to know how to choose between
competing investments or projects. To do this, she has to be able to
summarise the corresponding cash flows by one or two numbers that
characterise the profits associated to them. The final decision will be
made on the basis of these numbers. We will see that there isn’t any
one number that works universally. Instead, there are different
methods of comparison catering to the different requirements that she
may have.
In general, a cash flow is random in that the entries are not known
ahead of their time. We will start, however, by imagining that the
entries are known in advance, i.e., the cash flow is deterministic.
This is justified on the grounds that there are important cash flows
which are indeed deterministic (for example, income from government
bonds, which we shall study later on in this chapter). Furthermore,
when choosing between two projects, we calculate their projected
earnings and expenses and, for the purpose of comparison, act as if
these cash flows will indeed happen.
It is often convenient to depict a cash flow by a diagram like the
one given in Figure 2.1. In this diagram, the times when the
transactions occur are marked along the horizontal axis. The
transactions are represented by vertical arrows—these point up for
earnings and down for payments, while their lengths represent the
amounts of cash involved.
Figure 2.1
2.1 NET PRESENT VALUE
Consider a cash flow x0, x1, …, xn occurring at times t0 = 0, t1, …, tn.
Let Ci be the discount factor for the time interval [t0, ti]. Recall that the
discount factor gives the present value of an investment. Thus the
present value of xi is Ci xi .
The net present value or NPV of the cash flow is just the sum of
the present values of all the entries:
$\mathrm{NPV} = \sum_{i=0}^{n} C_i x_i$.
Owning this cash flow is equivalent to an immediate earning equal to
its NPV. To see this, we split the NPV into its constituent parts, C0x0 =
x0, C1x1, …, Cnxn. We invest each part Cixi in a risk-free asset for the
corresponding time ti. Then, over this time it grows into the amount xi.
Thus, the NPV is the exact amount needed now to generate the given
cash flow.
Hence one way to compare different cash flows is to compare
their NPVs.
Example 2.1.1 Consider two projects with projected annual earnings
as given below:
Year       0       1       2
A       –500    –100     700
B       –500     700    –100
Let the discount factors for the first and second years be 0.9 and 0.8
respectively. Then the net present values of A and B are given by
NPV(A) = –500 – 0.9 × 100 + 0.8 × 700 = –30,
NPV(B) = –500 + 0.9 × 700 – 0.8 × 100 = 50.
Thus, on the basis of NPV, project B is better than project A. If the
time value of money had not been taken into account, the projects
would have appeared identical, each showing a total expenditure of
600 and an earning of 700.
□
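The calculation in Example 2.1.1 can be sketched in Python as follows. The function name `npv` and the list layout are illustrative choices; the discount factor for time 0 is taken to be 1, in line with C0x0 = x0.

```python
# Illustrative sketch; `npv` is a name chosen here, not notation from the text.
def npv(cash_flow, discount_factors):
    """Net present value: sum of C_i * x_i over all entries."""
    if len(cash_flow) != len(discount_factors):
        raise ValueError("need one discount factor per cash flow entry")
    return sum(c * x for c, x in zip(discount_factors, cash_flow))


factors = [1.0, 0.9, 0.8]          # C_0 = 1, then the factors of Example 2.1.1
project_a = [-500, -100, 700]
project_b = [-500, 700, -100]

npv_a = npv(project_a, factors)
npv_b = npv(project_b, factors)
print(round(npv_a, 2), round(npv_b, 2))
```

Without discounting, both projects simply sum to 100, which is why they look identical if the time value of money is ignored.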
An important special case is when the cash flow occurs at regular
intervals (e.g., it may consist of annual payments), and the same
risk-free rate can be applied to all its entries. Thus, suppose the risk-free
rate is r over each interval and we use discrete compounding. Then
the discount factors are
$C_i = \frac{1}{(1 + r)^{i}}$,
and so,
$\mathrm{NPV} = \sum_{i=0}^{n} \frac{x_i}{(1 + r)^{i}}$,
where x0 is earned at time zero, x1 after the first interval, and so on.
Example 2.1.2 An annuity consists of annual payments of the same
amount A. An annuity lasting for n years would be represented by
Figure 2.2.
Figure 2.2
The NPV of an n-year annuity, with the first payment after 1 year, can
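be calculated as follows (assuming a constant risk-free rate r):
$\mathrm{NPV} = \frac{A}{1 + r} + \frac{A}{(1 + r)^{2}} + \cdots + \frac{A}{(1 + r)^{n}}$.
Here we have a geometric sum of the form
$x + x^{2} + \cdots + x^{n}$.
Fortunately, there is a formula for such a sum:
$x + x^{2} + \cdots + x^{n} = \frac{x(1 - x^{n})}{1 - x}$, with x = 1⁄(1 + r).
We substitute this formula in the previous equation and obtain
$\mathrm{NPV} = \frac{A}{r}\left(1 - \frac{1}{(1 + r)^{n}}\right)$.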
If the annuity is from a source that can be considered risk-free, then
this NPV is the fair price at which the annuity should be traded at time
0.
□
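One way to gain confidence in the closed form NPV = (A/r)(1 − (1 + r)^(−n)) is to compare it with the term-by-term sum. The sketch below does this; the function names and the sample values A = 100, r = 5%, n = 10 are hypothetical.

```python
# Illustrative sketch; names and sample values are assumptions.
def annuity_npv_sum(A, r, n):
    """NPV of an n-year annuity, adding the discounted payments one by one."""
    return sum(A / (1 + r) ** i for i in range(1, n + 1))


def annuity_npv_closed(A, r, n):
    """NPV of the same annuity via the geometric sum formula."""
    return (A / r) * (1 - (1 + r) ** (-n))


by_sum = annuity_npv_sum(100, 0.05, 10)
by_formula = annuity_npv_closed(100, 0.05, 10)
print(round(by_sum, 2), round(by_formula, 2))
```

The two values should agree up to rounding error, at about 772.17 for these sample inputs.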
Example 2.1.3 A perpetuity is an annuity that extends forever. Its
NPV can be obtained by letting n →∞ in the previous calculation:
$\mathrm{NPV} = \frac{A}{r}$.
The formula is more easily understood by treating the NPV as the
amount which can generate annual payments of A when the rate of
interest is r. This immediately gives
NPV × r = A.
□
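The limiting process can also be observed numerically. In the following sketch (hypothetical function names, sample values A = 100 and r = 5%), the NPV of an n-year annuity is seen to approach the perpetuity value A/r as n grows.

```python
# Illustrative sketch; names and sample values are assumptions.
def annuity_npv(A, r, n):
    """NPV of an n-year annuity with constant rate r (closed form)."""
    return (A / r) * (1 - (1 + r) ** (-n))


def perpetuity_npv(A, r):
    """NPV of a perpetuity paying A each year: A / r."""
    return A / r


limit = perpetuity_npv(100, 0.05)
for n in (10, 50, 100, 500):
    print(n, round(annuity_npv(100, 0.05, n), 4), "->", limit)
```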
Until the First World War, perpetuities were a popular way for
European governments to raise money, particularly for war.
Speculation centered around trade in these perpetuities was a major
financial activity. In 1900, the center for such trade was the Paris
Stock Exchange, and the total capital loaned through perpetuities was
70 billion gold francs (by the governments of France, Germany,
Russia, etc.). In comparison, the annual budget of France was 4
billion francs! (Source: Taqqu [49])
Net present value is an easy-to-calculate technique for choosing
between projects. Another nice property of NPV is additivity: the NPV
of two cash flows taken together is the sum of their NPVs. In symbols,
NPV(A + B) = NPV(A) + NPV(B).
Thus one can build up a view of a project by separately evaluating its
parts.
Example 2.1.4 Consider two cash flows, A and B, as shown in
Figure 2.3(a).
Figure 2.3(a)
The cash flow A+B collects these transactions into a single cash flow
shown in Figure 2.3(b).
Figure 2.3(b)
□
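Additivity is easy to check numerically. Since the amounts in Figure 2.3 are not reproduced here, the sketch below reuses the two projects and discount factors of Example 2.1.1 as stand-ins; the function names are our own.

```python
# Illustrative sketch; the flows are borrowed from Example 2.1.1 as an assumption.
def npv(cash_flow, discount_factors):
    """Net present value: sum of C_i * x_i."""
    return sum(c * x for c, x in zip(discount_factors, cash_flow))


def add_flows(flow_a, flow_b):
    """Combine two cash flows occurring at the same times, entry by entry."""
    return [x + y for x, y in zip(flow_a, flow_b)]


factors = [1.0, 0.9, 0.8]
flow_a = [-500, -100, 700]
flow_b = [-500, 700, -100]
combined = add_flows(flow_a, flow_b)

print(combined, round(npv(combined, factors), 2))
print(round(npv(flow_a, factors) + npv(flow_b, factors), 2))
```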
It is important to keep in mind that net present value does not use
only the intrinsic properties of the cash flows but is closely tied to
market conditions in the form of interest rates. Since interest rates
fluctuate with time, a project that starts off with higher NPV may not
stay that way. For instance, in Example 2.1.1, if the sequence of
discount factors is changed to 0.8, 0.9, then project A becomes better
than project B.
In the next section we will study a measure which uses only the
intrinsic properties of a cash flow to evaluate it.
2.2 INTERNAL RATE OF RETURN
Once again, consider a deterministic cash flow sold at time t = 0 for a
price P. Its internal rate of return or IRR is that rate of interest r
which would allow the flow to be generated from its price. Another
way of putting this is to say that IRR is the rate of interest which would
make the cash flow’s NPV equal to P.
Let us look at some simple examples.
Example 2.2.1 Consider a cash flow of 2,4 occurring after 1 and 2
years respectively. Suppose it is sold at a price P = 5 at time t = 0. For
a (discretely compounded) interest rate r, the NPV of this flow is
$\frac{2}{1 + r} + \frac{4}{(1 + r)^{2}}$.
Therefore the IRR is obtained by solving the following equation for r:
$5 = \frac{2}{1 + r} + \frac{4}{(1 + r)^{2}}$.
This can be rearranged into a quadratic equation
$5(1 + r)^{2} - 2(1 + r) - 4 = 0$,
leading to the two values
r = 0.12, –1.72.
Now, since the price paid is less than the total earned, it is clear that r
should be positive. So we reject the value –1.72 and obtain 0.12 as
the IRR.
□
IRR need not always be positive. For example, suppose the cash flow
in the last example was bought at a price P = 7. Then the two
solutions for r are
r = –0.09, –1.63.
Both solutions are negative, which was to be expected since the
amount paid is more than that received. Can we still reject one? Since
some of the amount paid does come back, it is clear that the loss
does not reach 100%. Hence we should have r > –1. This leads us to
reject the value –1.63 and take the IRR to be –0.09.
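Both cases can be reproduced with the quadratic formula. The following sketch writes y = 1 + r, so that the IRR equation becomes P·y² − x1·y − x2 = 0; the function name is hypothetical, and positive values of P and x2 are assumed so that the discriminant is positive.

```python
import math


def irr_roots_two_payments(price, x1, x2):
    """Both solutions r of price = x1/(1+r) + x2/(1+r)^2.

    Assumes price > 0 and x2 > 0 (an assumption of this sketch),
    which makes the discriminant positive.
    """
    disc = math.sqrt(x1 ** 2 + 4 * price * x2)
    y_plus = (x1 + disc) / (2 * price)    # root with 1 + r > 0
    y_minus = (x1 - disc) / (2 * price)   # root with 1 + r < 0
    return y_plus - 1, y_minus - 1


for price in (5, 7):
    r_plus, r_minus = irr_roots_two_payments(price, 2, 4)
    print(price, round(r_plus, 2), round(r_minus, 2))
```

In each case the first root satisfies r > –1 and is the one taken as the IRR.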
Figure 2.4: Internal rate of return for the cash flow in Example 2.2.1
In Figure 2.4 we plot the NPV of the last example as a function of
r:
$f(r) = \frac{2}{1 + r} + \frac{4}{(1 + r)^{2}}$.
The intersection of the graph with the horizontal line at height P gives
the possible values of IRR if this cash flow is sold for P at time 0.
Figure 2.4 shows that we always obtain one value below –1 and one
value above –1. Due to the reasons given above, the value which is
above –1 is taken to be the IRR.
Consider a regular cash flow of amounts x1, …, xn, occurring at
times 1, …, n and sold at time t = 0 for a price P. Its IRR is then a
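solution of the equation
$P = \frac{x_1}{1 + r} + \frac{x_2}{(1 + r)^{2}} + \cdots + \frac{x_n}{(1 + r)^{n}}$. (2.1)
If we include P as part of the cash flow, we obtain the cash flow
x0 = –P, x1, …, xn, occurring at times 0, 1, …, n. In this notation the IRR
equation is
$\sum_{i=0}^{n} \frac{x_i}{(1 + r)^{i}} = 0$. (2.2)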
The internal rate of return has the virtue of using only knowledge of
the cash flow, without any reference to external and transient factors
like interest rates. The characterisation of a cash flow by a rate of
growth is also intuitively appealing, and so, the most popular
technique for attracting investors is to promise a high IRR.
The IRR is a solution of a polynomial equation since equation (2.2)
can be rearranged into
$x_0(1 + r)^{n} + x_1(1 + r)^{n-1} + \cdots + x_n = 0$.
Unfortunately, a polynomial equation may have no real solution, and in
that case IRR will not be defined.
One of the earliest non-trivial cubics to be solved was
$\frac{x^{3}}{80} + \frac{3x^{2}}{4} + 15x = 50$
by Dardi of Pisa, circa 1350 AD. The equation arose out of a
calculation of monthly interest on a loan (Source: Hughes[25]).
Exercise 2.2.2 Consider the cash flow x0 = –1,x1 = 0,x2 = –1. Show
its IRR is not defined.
Exercise 2.2.3 Consider a cash flow in which x0 < 0 and the net
inflow is greater than the net outflow. Show its IRR equation has a
solution.
A more significant source of trouble is that a polynomial equation may
have many solutions, and then we may not be able to say which is the
‘right’ IRR.
Exercise 2.2.4 Consider the cash flow –1, 8, –17, –2, 24 at times
0, 1, 2, 3, 4. Its IRR equation is
$-(1 + r)^{4} + 8(1 + r)^{3} - 17(1 + r)^{2} - 2(1 + r) + 24 = 0$.
You can verify by substitution that the solutions are r = –2, 1, 2, 3.
□
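The substitution can be carried out mechanically. The sketch below evaluates the left-hand side of the polynomial form of the IRR equation at each claimed solution; the function name is an illustrative choice.

```python
# Illustrative sketch; `irr_polynomial` is a name chosen here.
def irr_polynomial(cash_flow, r):
    """Evaluate x0(1+r)^n + x1(1+r)^(n-1) + ... + xn."""
    n = len(cash_flow) - 1
    return sum(x * (1 + r) ** (n - i) for i, x in enumerate(cash_flow))


flow = [-1, 8, -17, -2, 24]
residuals = {r: irr_polynomial(flow, r) for r in (-2, 1, 2, 3)}
print(residuals)
```

All four residuals are zero. Even after discarding r = –2 on the grounds that r > –1, three candidate values remain, so the IRR is ambiguous for this cash flow.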
There is one situation in which the IRR is defined without ambiguity.
Consider a cash flow occurring at times 0, 1, …, n such that there is
exactly one sign change in the transactions. Such a flow has the form
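–x0, –x1, …, –xk, xk+1, …, xn,
or
x0, x1, …, xk, –xk+1, …, –xn,
with each xi ≥ 0. In both cases the IRR equation is
$-x_0 - \frac{x_1}{1 + r} - \cdots - \frac{x_k}{(1 + r)^{k}} + \frac{x_{k+1}}{(1 + r)^{k+1}} + \cdots + \frac{x_n}{(1 + r)^{n}} = 0$.
We multiply each term by $(1 + r)^{k}$ and rearrange:
$x_0(1 + r)^{k} + x_1(1 + r)^{k-1} + \cdots + x_k = \frac{x_{k+1}}{1 + r} + \cdots + \frac{x_n}{(1 + r)^{n-k}}$.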
Figure 2.5
Consider the left side of this equation as a function of r. As r varies
over (–1,∞), the values of this function increase from xk to ∞. On the
other hand, the right-hand side decreases from ∞ to 0. Therefore the
two sides meet at exactly one point, as depicted in Figure 2.5.
Hence, in this situation, there is a unique value of r which satisfies
the IRR equation as well as r > –1.
TECHNIQUES FOR CALCULATING IRR
In the situations when the internal rate of return is defined, we still
have to solve equation (2.1) to find it. Usually, it is not possible to
solve it exactly, but there are techniques for finding approximate
solutions. The simplest one is the following.
BISECTION METHOD
Consider the function
$f(r) = \frac{x_1}{1 + r} + \frac{x_2}{(1 + r)^{2}} + \cdots + \frac{x_n}{(1 + r)^{n}}$.
Calculate values of f(r) for different values of r until you find values r1
and r2 such that f(r1) > P > f(r2). Then the IRR is between r1 and r2.
Let r3 be the midpoint of r1 and r2. If f(r3) > P, the IRR is between r3
and r2; otherwise it is between r1 and r3. Take the appropriate
midpoint and continue the process. The midpoints give a sequence of
gradually improving approximations to the IRR.
Example 2.2.5 Consider a cash flow of 1,2,2,4 at years 1,2,3,4.
Suppose it was sold for a price of P = 5 at time 0. Let
$f(r) = \frac{1}{1 + r} + \frac{2}{(1 + r)^{2}} + \frac{2}{(1 + r)^{3}} + \frac{4}{(1 + r)^{4}}$.
We calculate f(0.1) = 6.8 and f(0.5) = 2.9; hence the IRR is between
0.1 and 0.5. Their midpoint is 0.3 and f(0.3) = 4.3, so the IRR is
between 0.1 and 0.3. The next midpoint is 0.2, with f(0.2) = 5.3.
Proceeding in this way, we create the following sequence of
approximations to IRR:
0.3, 0.2, 0.25, 0.225, 0.2375, 0.23125, 0.228125, …
In fact, f(0.228) = 4.979, so 0.228 is a reasonable approximation to
the IRR.
□
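The bisection steps of this example can be sketched in Python as follows. The function names are hypothetical, and the code assumes, as in the description of the method, that the starting values satisfy f(r1) > P > f(r2).

```python
# Illustrative sketch of the bisection method; names are assumptions.
def present_value(payments, r):
    """f(r): payments x1, ..., xn at times 1, ..., n discounted at rate r."""
    return sum(x / (1 + r) ** i for i, x in enumerate(payments, start=1))


def bisection_midpoints(payments, price, r1, r2, steps):
    """Return the successive midpoints, assuming f(r1) > price > f(r2)."""
    if not present_value(payments, r1) > price > present_value(payments, r2):
        raise ValueError("starting values do not bracket the IRR")
    midpoints = []
    for _ in range(steps):
        mid = (r1 + r2) / 2
        midpoints.append(mid)
        if present_value(payments, mid) > price:
            r1 = mid    # IRR lies between mid and r2
        else:
            r2 = mid    # IRR lies between r1 and mid
    return midpoints


midpoints = bisection_midpoints([1, 2, 2, 4], 5, 0.1, 0.5, 7)
print(midpoints)
```

The first seven midpoints should reproduce the sequence 0.3, 0.2, 0.25, 0.225, 0.2375, 0.23125, 0.228125 listed in the example, up to floating-point rounding.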
We now present a more sophisticated technique for solving f(r) = P.
NEWTON–RAPHSON METHOD
We first calculate the derivative of f(r):
$f'(r) = -\frac{x_1}{(1 + r)^{2}} - \frac{2x_2}{(1 + r)^{3}} - \cdots - \frac{n x_n}{(1 + r)^{n+1}}$.
We then define
$g(r) = r - \frac{f(r) - P}{f'(r)}$.
The method starts with any choice of a value r1 of r. We then define r2
= g(r1), r3 = g(r2), and so on. If this sequence converges to some
value s, then by taking limits on both sides of the definition of g, we
see that
$s = s - \frac{f(s) - P}{f'(s)}$, and hence f(s) = P,
which means that s is the IRR.
Example 2.2.6 Let us apply the Newton–Raphson method to the
situation of Example 2.2.5. If we start with r1 = 0.5, the rule rk+1 = g(rk)
creates the following sequence:
0.5, 0.08, 0.19, 0.224, 0.22618, …
We find that f(0.22618) = 5.00007, so this method has given a much
better approximation than the bisection method, and in fewer steps! □
These methods are built into spreadsheet programs such as Microsoft
Excel and OpenOffice Calc. Both of these have a tool called Goal Seek
in which we can set up the values and solve the IRR equation.
IRR AND PROJECT CHOICE
IRR attempts to capture the rate of growth of an investment. Hence,
we would usually prefer a cash flow with a higher IRR.
Example 2.2.7 Consider the two projects of Example 2.1.1.
The IRR of A is 8.7%. The IRR equation for B has two solutions:
23.8% and –83.9%. Since the investment in B is less than the gain,
we reject the negative value and take 23.8% as the IRR. So, on the
basis of IRR, we would choose B.
□
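For a cash flow with three entries, the IRR equation x0(1 + r)² + x1(1 + r) + x2 = 0 is a quadratic, so the candidate values can be found directly. The following sketch (with a hypothetical function name) does this for the two projects.

```python
import math


def irr_candidates(x0, x1, x2):
    """Real solutions r of x0(1+r)^2 + x1(1+r) + x2 = 0, in increasing order.

    Assumes x0 != 0 (an assumption of this sketch).
    """
    disc = x1 ** 2 - 4 * x0 * x2
    if disc < 0:
        return []                      # no real solution: IRR undefined
    root = math.sqrt(disc)
    ys = [(-x1 + root) / (2 * x0), (-x1 - root) / (2 * x0)]
    return sorted(y - 1 for y in ys)


candidates_a = irr_candidates(-500, -100, 700)
candidates_b = irr_candidates(-500, 700, -100)
print([round(r, 3) for r in candidates_a])
print([round(r, 3) for r in candidates_b])
```

For project A the only candidate with r > –1 is about 8.7%. For project B both candidates exceed –1 and agree with the two quoted values to within rounding, which is why an additional argument is needed to select the positive one.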
The next example illustrates that some caution is needed while
applying this principle.
Example 2.2.8 Consider the following cash flows.
       t=0     t=1
A      100    –150
B     –100     150
Cash flow A represents borrowing while cash flow B represents
lending; both have an IRR of 50%. While this high IRR is good for the
lender, it is bad for the borrower!
NPV, unlike IRR, would have distinguished between the situations
of borrowing and lending. For instance, with the risk-free rate set at
5%, NPV(A) = –42.86, and NPV(B) = 42.86.
□
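The contrast between the two measures can be checked with a few lines of Python. The function name below is an illustrative choice; the rates 50% and 5% are those of the example.

```python
# Illustrative sketch; `npv_at_rate` is a name chosen here.
def npv_at_rate(cash_flow, r):
    """NPV of entries x0, x1, ... at times 0, 1, ... with a constant rate r."""
    return sum(x / (1 + r) ** i for i, x in enumerate(cash_flow))


borrowing = [100, -150]   # cash flow A
lending = [-100, 150]     # cash flow B

# Both flows have zero NPV at r = 50%, so both have an IRR of 50%.
print(npv_at_rate(borrowing, 0.5), npv_at_rate(lending, 0.5))
# At the risk-free rate of 5% the NPVs have opposite signs.
print(round(npv_at_rate(borrowing, 0.05), 2), round(npv_at_rate(lending, 0.05), 2))
```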
In a cash flow with a single sign change, we can say which side of the
deal we are on: whether we are giving or taking the loan. If the cash
flow has initial positive signs we are taking the loan and if the initial
signs are negative we are giving it. In more complicated flows we
cannot say, and then a high IRR is just as likely to be bad as good!
2.3 A COMPARISON OF IRR AND NPV
We have encountered two ways of evaluating projects, with rather
different characteristics. They may make opposing recommendations,
so it is important to have some understanding of their relative
strengths and weaknesses. The main features of NPV are:
1. It is always well defined, and easy to calculate once the interest
rates are known.
2. It is linear: the NPV of the whole is the sum of the NPVs of the
parts.
3. It gives a sense of the total profit over the life of the project.
4. It does not give a sense of the rate of growth of the project—
either in the sense of profit per unit investment or in the sense of
growth per unit time.
Let us emphasise the last point with a couple of examples.
1. Is an NPV of a million really desirable? What if it is earned from a
project with a total investment of a billion?
2. Again, is an NPV of a million from a project that lasts 20 years
obviously better than an NPV of 500,000 from a project that lasts
5 years?
The first of these difficulties can be handled as follows. Given a cash
flow, we let
I = magnitude of the NPV of the negative entries,
N = NPV of the full cash flow.
Then the present value ratio (N⁄I) gives the value per unit invested.7
With IRR, the situation is just the reverse.
1. It is only well defined for very simple cash flows, and even then it
is not easy to calculate. On the other hand, it requires no extra
information (like interest rates).
2. The IRR of the whole cannot be obtained from the IRRs of the
parts.
3. It does not give a sense of the total profit over the life of the
project.
4. It gives a sense of the rate of growth of the project—both in the
sense of profit per unit investment and in the sense of growth per
unit time.
5. Finally, let us recall from the last section that it may not be clear
whether a high IRR indicates a high rate of profit or a high rate of
loss!
Here, we would be troubled by questions such as:
1. Is a small project with an IRR of 30% to be preferred to a large
project with an IRR of 20%?
2. Is a project with an IRR of 30% over 10 years to be preferred to a
project with an IRR of 20% over 20 years?
NPV and IRR are indicators of the virtues or faults of a project.
Neither should be taken as giving the final word. The information they
convey must be supplemented by a careful analysis of the investor’s
needs and plans. For example, if a project is a stand-alone
opportunity with a clear-cut life, then NPV is better. An example of this
is the construction and sale of an office building. Starting and running
a car factory is a project of a very different kind. Here one would be
interested in the annual growth of profit rather than the total gain over
the life of the factory—here IRR is the natural fit.
MODIFIED IRR
In recent years, another measure of the rate of growth has gained
popularity. It is called the modified internal rate of return or
modified IRR or MIRR. It is a sort of hybrid of NPV and IRR. In
MIRR, we evaluate a project’s cash flow within the context of the firm
undertaking it. All negative entries (outflows) are assumed to be
generated by investing at a certain rate called the finance rate. All
positive entries (inflows) are assumed to be reinvested at another rate
called the reinvestment rate. Typically, the finance rate is taken to be
the market risk-free rate. The reinvestment rate is usually the firm’s
cost of capital—the rate at which its overall growth is taking place.
We estimate the total investment by taking the NPV of all the
negative entries using the finance rate. Let us denote the magnitude
of this NPV by P. The gains are estimated by taking the net future
value of all the positive entries using the reinvestment rate at the
conclusion of the cash flow. Let us denote this net future value by A.
The MIRR μ is then defined to be the interest rate under which P
would grow into A over the life of the cash flow. If the life is n time
periods, then the MIRR per period is defined by
$$P(1 + \mu)^n = A.$$
Example 2.3.1 Consider the cash flow with annual payments of –1000,
2000, –1000, 2000. Suppose the relevant annually compounded rates are:
Finance rate = 10%
Reinvestment rate = 20%.
Then,
$$P = 1000 + \frac{1000}{1.10^2} = 1826.45,$$
$$A = 2000 \times 1.20^2 + 2000 = 4880.$$
Therefore, the MIRR μ is defined by
$$1826.45\,(1 + \mu)^3 = 4880,$$
and this gives MIRR = 0.388 or 38.8%. On the other hand, the IRR of
this flow is 100%!
□
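The MIRR recipe can be written out as a short calculation. This is only a sketch under the stated definitions (payments at t = 0, 1, ..., n, discrete compounding); the function name mirr and its argument names are our own choices.

```python
def mirr(cash_flow, finance_rate, reinvest_rate):
    """MIRR per period for payments at t = 0, 1, ..., n (discrete compounding)."""
    n = len(cash_flow) - 1
    # P: magnitude of the present value of the outflows, at the finance rate.
    p = -sum(c / (1 + finance_rate) ** t
             for t, c in enumerate(cash_flow) if c < 0)
    # A: value at time n of the inflows, grown at the reinvestment rate.
    a = sum(c * (1 + reinvest_rate) ** (n - t)
            for t, c in enumerate(cash_flow) if c > 0)
    # Solve P (1 + mu)^n = A for mu.
    return (a / p) ** (1 / n) - 1

flow = [-1000, 2000, -1000, 2000]
mu = mirr(flow, finance_rate=0.10, reinvest_rate=0.20)
print(mu)
```

Unlike the IRR, this quantity is always well defined once the flow has at least one outflow and one inflow.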
2.4 BONDS: PRICE AND YIELD
A fixed income security is a tradeable contract detailing a
deterministic cash flow. The buyer of the security receives
predetermined amounts of money at predetermined times over the life
of the contract. The basic problem is of pricing: how much should the
buyer pay for this security? This can be tackled through NPV and
IRR. Such securities can be seen as risk-free, except for the
possibility of default, where the writer of the contract is unable to
make the promised payments. However, there is an additional
element of risk: the value of a cash flow depends on the interest rates
in effect and will fluctuate with changes in these rates. Therefore, we
need a measure not only of the price but of the sensitivity of the price
to market conditions. These, and related matters, will be taken up in
the rest of this chapter.
The most common form of a fixed income security is a bond.
While bonds come in various flavours, the simplest one consists of n
equal payments (called coupon payments or coupons) of an
amount C paid at regular intervals, together with an additional
payment F (called the face value) made with the last coupon
payment. This is represented in Figure 2.6.
Figure 2.6
The face value is also called the maturity value or par value. The
date on which it is paid is called the maturity date. Bonds with this
simple structure are called straight or plain vanilla bonds. More
complicated bonds may offer one party the right to terminate the
contract early or allow for some fluctuation in the coupon payments.
Suppose the cash flow offered by a bond is purchased for some
price P. This price depends not only on the structure of the bond itself
but also on certain external factors. Two important factors are:
1. The current interest rates: Higher interest rates reduce the
present value of the future payments and hence the value P of
the bond.
2. The risk of default by the writer of the bond: If the perceived risk
is higher, the bond has to offer greater return to compensate.
This is achieved by a decrease of P.
In this book we shall ignore the second factor and treat bonds as if
they are risk-free. We only note that the investor can use credit
ratings from various agencies to gauge the risk of default. In the US,
the popular rating agencies are Moody’s, Standard and Poor’s (S&P)
and Fitch IBCA. In India we have CRISIL (Credit Rating Information
Services of India, Ltd) and CARE (Credit Analysis and Research,
Ltd). The compensation or premium for the default risk is calculated
via a statistical study of the historical loss from default at each of the
rating levels.
The following terminology is used to give a quick idea of the current
status of a bond:
At par: The bond is selling at a price equal to its face value.
Discount: The bond is selling below its face value.
Premium: The bond is selling above its face value.
These terms, as we shall soon see, relate the payment structure of
the bond to current interest rates and are not a judgement on the
worth of the bond.
A bond which is sold at par can be visualised as a loan of the face
value F for some time. The coupons are then viewed as interest
payments, and at the end we have the last interest payment as well
as the return of the borrowed F.
Before starting our analysis of bonds, let us list some special
cases.
Annuity: (see Example 2.1.2) An annuity has n coupon payments
and a face value of zero.
Perpetuity: (see Example 2.1.3) A perpetuity has an infinite
sequence of coupon payments. Since there is no final payment, the
face value is zero.
Zero-Coupon Bond: A zero-coupon bond has no coupon payments.
There is only the final payment of the face value. These are also
called pure discount bonds.
Consider a bond with n annual coupon payments of C each and a
face value F. If the annual discretely-compounded risk-free rate over
the life of the bond is r, then the net present value of the bond is
$$P = \sum_{i=1}^{n} \frac{C}{(1+r)^i} + \frac{F}{(1+r)^n}. \tag{2.3}$$
If continuous compounding is used, the formula becomes
$$P = \sum_{i=1}^{n} C e^{-ri} + F e^{-rn}. \tag{2.4}$$
This can be taken as the fair price for the bond. It should be noted,
however, that the assumption of a uniform r over the life of the bond is
a strong one (especially as the life of a bond can stretch up to 30
years!). This can be taken into account by using different values of r
for different time spans, and we will do this later in the section on the
term structure of interest rates (§2.8).
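As an illustration, the two pricing formulas can be evaluated directly. The sketch below assumes a uniform rate r and annual coupons; the function name bond_price and the sample bond are hypothetical.

```python
import math

def bond_price(coupon, face, n, r, continuous=False):
    """Present value of n annual coupons plus the face value paid with the last one."""
    def discount(t):
        # Discount factor for a payment at time t under the chosen convention.
        return math.exp(-r * t) if continuous else (1 + r) ** (-t)
    return sum(coupon * discount(i) for i in range(1, n + 1)) + face * discount(n)

# Hypothetical 3-year 5% annual bond with face value 100.
print(bond_price(5, 100, 3, 0.05))                              # discrete, r = 5%
print(bond_price(5, 100, 3, math.log(1.05), continuous=True))   # equivalent continuous rate
print(bond_price(5, 100, 3, 0.08))                              # higher rate, lower price
```

The first two prices agree because ln(1.05) is the continuously compounded rate equivalent to 5% compounded annually.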
Exercise 2.4.1 Check that annuities and perpetuities are always sold
at a premium, while a zero-coupon bond is always sold at a discount.
The coupon payment is usually given as a percentage of the face
value (called the coupon rate). Thus, a ‘10% annual bond’ is one
whose coupon payments are annual and each is 10% of the face
value. If the coupon payments are more frequent, then the coupon
rate refers to the total annual coupon payment. For instance, many
bonds have semi-annual coupon payments (i.e., every six months)
and a ‘10% semi-annual bond’ has annual coupon payments of 10%
of the face value. An individual coupon payment for this bond is
therefore equal to 5% of the face value.
Figure 2.7: The price versus yield plot for an annual bond
Exercise 2.4.2 Consider a 12% monthly bond with face value Rs
1000. What will be the value of each coupon payment?
For a bond with face value F, coupon rate C⁄F, m regular payments
per year and a total of N coupon payments, formula (2.3) is modified
to
$$P = \sum_{i=1}^{N} \frac{C/m}{(1+r/m)^i} + \frac{F}{(1+r/m)^N}. \tag{2.4}$$
Internal rate of return (IRR) is a commonly used measure of the worth
of a bond, and is called its yield to maturity (YTM) or just yield. If a
bond with annual coupon payments is sold at a price P, then its yield
λ is obtained by solving the equation
$$P = \sum_{i=1}^{n} \frac{C}{(1+\lambda)^i} + \frac{F}{(1+\lambda)^n}. \tag{2.5}$$
Note that the right-hand side decreases from ∞ to 0 as λ varies from
–1 to ∞, so for every (positive) value of P there is a unique value λ of
the yield. Thus the yield of a bond is always well-defined. This
observation also shows that a rise in P lowers the yield.
Similarly, if continuous compounding is used, the yield equation is
$$P = \sum_{i=1}^{n} C e^{-\lambda i} + F e^{-\lambda n}.$$
Figure 2.7 shows the characteristic plot of an annual bond’s price
versus its yield. Two special points are worth noting. When the yield is
0, the price is nC + F, the sum of all the payments from the bond. And
when the yield is equal to the coupon rate, the price is the face value
F.
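Since the price is a decreasing function of the yield, the yield equation can be solved by bisection. The following is a rough sketch for an annual bond; the function names and the search bracket (–0.99 to 10) are assumptions of ours and would need adjusting for extreme prices.

```python
def price_from_yield(coupon, face, n, lam):
    """Right-hand side of the yield equation for an annual bond."""
    return sum(coupon / (1 + lam) ** i for i in range(1, n + 1)) + face / (1 + lam) ** n

def bond_yield(price, coupon, face, n, lo=-0.99, hi=10.0, tol=1e-10):
    """Solve the yield equation by bisection, using that price decreases with yield."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if price_from_yield(coupon, face, n, mid) > price:
            lo = mid   # computed price too high, so the yield must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# The two special points of the price-yield curve, for a 3-year 5% bond, face 100:
print(bond_yield(115, 5, 100, 3))   # price nC + F, so the yield is close to 0
print(bond_yield(100, 5, 100, 3))   # price F, so the yield is close to the coupon rate
```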
Exercise 2.4.3 Consider a bond with n years to maturity, annual
coupon payments of C and a face value F.
1. Show that if its yield is λ then its price is given by
$$P = \frac{C}{\lambda}\left(1 - \frac{1}{(1+\lambda)^n}\right) + \frac{F}{(1+\lambda)^n}.$$
2. Show that if its yield equals the coupon rate C⁄F, then the bond is
at par. If the yield is greater than the coupon rate, the bond is
sold at a discount. If the yield is lower, it is sold at a premium.
Exercise 2.4.4 Consider a bond with n years to maturity, m coupon
payments per year totalling to C, and face value F.
1. If its yield is λ, show its price is given by
$$P = \frac{C}{\lambda}\left(1 - \frac{1}{(1+\lambda/m)^{mn}}\right) + \frac{F}{(1+\lambda/m)^{mn}}.$$
Verify that λ = C⁄F implies P = F.
2. Draw the price–yield curve for this bond.
A bond has to compete with the other available financial instruments.
In particular, its price has to be low enough so that the yield is not
below the risk-free rate. If the bond can be seen as risk-free, then the
yield will equal the risk-free rate. Thus we have the concept of the
required yield for a bond: this is the level of yield required for it to be
competitive in the market after taking into account the prevalent
interest rates as well as the risk associated with the bond.
2.5 CLEAN AND DIRTY PRICE
The value of a bond changes during its lifetime and depends on the
remaining payments as well as on the required yield. If the bond is
sold on the day of a coupon payment (with the payment going to the
seller) then the remaining cash flow can be seen as a fresh bond with
the same face value and coupon rate. Its value can therefore be
calculated from the already established formulas. Of course, this is a
rare situation, and usually the bond will be sold at a date between two
coupon payments.
Suppose the situation is as depicted in Figure 2.8, where the bond
is being sold at a time t between the kth and (k + 1)th coupon payments.
Figure 2.8
We calculate the price by calculating the NPV of the remaining
payments at the required yield. We use continuous compounding as
that makes it easier to deal with arbitrary time intervals.
Example 2.5.1 Consider a 3-year 5% annual bond with face value
100. Figure 2.9 shows how the price of this bond changes over its life,
assuming a constant required yield. We take three values of the
effective required yield λ: 5%, 10% or 15%. Then the equivalent
continuous yield is given by ln(1 + λ).
Figure 2.9: Change of bond price with time for different values of the
effective required yield λ (Example 2.5.1)
We note that the price curve has a typical sawtooth shape, with
sharp drops when coupon payments are made. In fact, we can
visualise the overall curve as a sawtooth shape combined with an
almost linear trend.
□
In this example, we see that the sawtooth effect tends to hide the long
term behaviour of the bond price. It is useful to identify and subtract it
to reveal the underlying trend. We view the rise and fall as due to the
approach and delivery of the next coupon payment. So we start by
defining the accrued interest, which is the fraction of the next
coupon payment which is already due. If the bond is sold at a time t
between the kth and (k + 1)th coupon payments, made at times t_k and
t_{k+1}, this is the fraction
$$I(t) = \frac{t - t_k}{t_{k+1} - t_k}\, C$$
of the (k + 1)th coupon payment.
We then define the clean price of the bond by
Q(t) = P(t) – I(t).
The actual price P(t) is now called the dirty price. Figure 2.10 depicts
the accrued interest, clean price and dirty price plots for Example
2.5.1.
Figure 2.10: Clean price, dirty price, and accrued interest plots for Example
2.5.1
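The quantities plotted for Example 2.5.1 can be computed along the following lines. This is a sketch only: it assumes annual coupons paid at times 1, ..., n, accrued interest that grows linearly between coupon dates, and function names of our own choosing.

```python
import math

def dirty_price(t, coupon, face, n, eff_yield):
    """NPV at time t of the payments still due, using continuous compounding."""
    y = math.log(1 + eff_yield)   # continuous yield equivalent to the effective one
    payments = [(i, coupon) for i in range(1, n + 1)]
    payments[-1] = (n, coupon + face)
    # A coupon paid exactly at time t goes to the seller, so only later payments count.
    return sum(c * math.exp(-y * (i - t)) for i, c in payments if i > t)

def accrued_interest(t, coupon):
    """Part of the next annual coupon already earned at time t (linear accrual)."""
    return coupon * (t - math.floor(t))

def clean_price(t, coupon, face, n, eff_yield):
    return dirty_price(t, coupon, face, n, eff_yield) - accrued_interest(t, coupon)

# 3-year 5% annual bond, face value 100, effective required yield 5%.
for t in (0.0, 0.5, 0.99, 1.0):
    print(t, dirty_price(t, 5, 100, 3, 0.05), clean_price(t, 5, 100, 3, 0.05))
```

The dirty price climbs towards each coupon date and drops just after it, while the clean price stays close to the face value when the required yield equals the coupon rate.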
Exercise 2.5.2 A 10% 3-year bond with face value 100 is issued on 1
January, 2008. Suppose its clean price on 1 July, 2008 is 102. How
much would you have to pay to buy this bond?
The clean price is seen as giving a truer picture of the value of a
bond. Therefore, when bond prices are stated by stock exchanges,
these are usually the clean prices. To make an actual purchase, the
accrued interest has to be added to the clean price (which is also
called the quoted price).
2.6 PRICE–YIELD CURVES
We have already plotted the price of a bond against its yield. In this
section we will explore how this price–yield curve varies with certain
parameters.
First, we ask what happens if the coupon payments are increased.
Then, for any given yield, the price will increase, and so the
price–yield curve will shift upward. This is illustrated in Figure 2.11(a).
Now, we consider how an increase in the number of coupon
payments affects the price. Although this increases the total cash
delivered, it doesn’t necessarily lead to a higher price. If the required
yield is lower than the coupon rate C⁄F, then each coupon payment
represents a higher than required interest payment and then it is
beneficial to have more of these payments. So the price of longer-
term bonds will be higher. On the other hand, if the required yield is
above the coupon rate, each coupon payment represents a loss and
the longer-term bonds will have less value. Thus, in Figure 2.11(b) the
longer-term bonds start off with higher prices (when λ is low) but fall
below the shorter-term bonds when λ increases. The crossing occurs
at the coupon rate.
Figure 2.11: (a) Variation of the price–yield curve with the coupon payments.
The higher curves correspond to higher coupon payments. (b) Variation of the
price–yield curve with the number of coupon payments. The arrows point in the
direction of increasing number of coupon payments.
The fact that longer-term bonds have steeper price-yield curves
implies that they are more vulnerable to interest-rate changes, since a
change in yield will cause a larger change in their price.
2.7 DURATION
Bonds are risk-free if held to maturity since their cash flow is
deterministic, at least if default risk can be ignored. However, at
intermediate times they are not risk-free since their price fluctuates
with the prevalent interest-rates (or required yield). We have just
noted that longer-term bonds are more exposed to this risk. We shall
now develop a way to quantify this risk.
MACAULAY DURATION WITH CONTINUOUS
COMPOUNDING
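We start with the price–yield formula for an annual bond, but
expressed in terms of continuous compounding:
$$P = \sum_{i=1}^{n} C e^{-\lambda i} + F e^{-\lambda n}.$$
To see the sensitivity with respect to yield, we differentiate:
$$\frac{dP}{d\lambda} = -\sum_{i=1}^{n} i\, C e^{-\lambda i} - n F e^{-\lambda n}.$$
This measures the absolute change in P resulting from a small
change in λ. To see the proportional change, we divide it by P:
$$\frac{1}{P}\frac{dP}{d\lambda} = -\frac{\sum_{i=1}^{n} i\, C e^{-\lambda i} + n F e^{-\lambda n}}{P}.$$
The negative of the quantity on the right is called the Macaulay
duration and is denoted by DM,
$$D_M = \frac{\sum_{i=1}^{n} i\, C e^{-\lambda i} + n F e^{-\lambda n}}{P},$$
so that
$$\frac{1}{P}\frac{dP}{d\lambda} = -D_M.$$
Suppose a small change δλ in the required yield leads to a change δP
in the price. Then we have the approximation
$$\frac{\delta P}{\delta\lambda} \approx \frac{dP}{d\lambda},$$
and so, the proportional change in price δP⁄P is approximated as
follows:
$$\frac{\delta P}{P} \approx -D_M\, \delta\lambda.$$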
The form of the Macaulay duration is interesting. It is a weighted
average of the times at which payments are made, and each time
instant i has a weight proportional to the present value of the
corresponding payment. (Here, present value is calculated using the
required yield.) In particular, the units of Macaulay duration are of
time, and by convention are measured in years.
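To see the approximation at work, we can compare the actual proportional price change with the duration estimate. The sketch below uses continuous compounding; the function names and the sample bond (10-year 10% annual bond, face value 100, yield 8%) are hypothetical.

```python
import math

def price(coupon, face, n, lam):
    """Price of an annual bond at continuously compounded yield lam."""
    return sum(coupon * math.exp(-lam * i) for i in range(1, n + 1)) + face * math.exp(-lam * n)

def macaulay_duration(coupon, face, n, lam):
    """Average payment time, weighted by the present value of each payment."""
    weighted = sum(i * coupon * math.exp(-lam * i) for i in range(1, n + 1))
    weighted += n * face * math.exp(-lam * n)
    return weighted / price(coupon, face, n, lam)

c, f, n, lam, shift = 10, 100, 10, 0.08, 0.001
d = macaulay_duration(c, f, n, lam)
actual = price(c, f, n, lam + shift) / price(c, f, n, lam) - 1
print(d, actual, -d * shift)   # duration, true relative change, duration estimate
```

For a zero-coupon bond the only payment is at maturity, so the same function returns n.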
MACAULAY DURATION FOR A NON-ANNUAL BOND
Consider a bond with m coupon payments per year, face value F,
coupon rate C⁄F, and a total of N coupon payments. Then we can
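calculate as above to find its Macaulay duration. The result is:
$$D_M = \frac{1}{P}\left[\sum_{i=1}^{N} \frac{i}{m}\,\frac{C}{m}\, e^{-\lambda i/m} + \frac{N}{m}\, F e^{-\lambda N/m}\right],$$
where λ is the yield, and the price P is given by
$$P = \sum_{i=1}^{N} \frac{C}{m}\, e^{-\lambda i/m} + F e^{-\lambda N/m}.$$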
MACAULAY DURATION WITH DISCRETE COMPOUNDING
If discrete compounding is used and there are m coupon payments
annually, the Macaulay duration is analogously defined by
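$$D_M = \frac{1}{P}\left[\sum_{i=1}^{N} \frac{i}{m}\,\frac{C/m}{(1+\lambda/m)^i} + \frac{N}{m}\,\frac{F}{(1+\lambda/m)^N}\right].$$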
By writing the times of payment as i⁄m, the units of Macaulay duration
are kept as years. In this context, Macaulay duration does not have
as precise an interpretation in terms of the sensitivity to yield changes
as in continuous compounding (see Exercise 2.7.1). Nevertheless,
since discrete compounding approximates continuous compounding,
it can still be used as an indicator of that sensitivity.
Exercise 2.7.1 Consider a bond with face value F, coupon rate C⁄F
and m coupon payments per year. If the yield is described by discrete
compounding, show that
$$\frac{1}{P}\frac{dP}{d\lambda} = -\frac{D_M}{1+\lambda/m},$$
where DM is the Macaulay duration. The quantity DM ⁄(1 + λ⁄m) is
called the modified duration of the bond.
Exercise 2.7.2 Consider a perpetuity with annual payments starting
after 1 year. If the required yield is λ, show that its modified duration is
1⁄λ.
MACAULAY DURATION AND THE MATURITY PERIOD
On inspecting Figure 2.11(b), we had observed that longer-term
bonds are more vulnerable to interest-rate risk. Now we have a way
to measure this vulnerability: Figure 2.12 illustrates what happens as
we increase the maturity period n. While duration indeed increases
with n initially, the curve eventually flattens and even drops a little
before settling at a constant level.
Figure 2.12: Duration plotted against maturity date for a 10% annual
bond. The three curves correspond to different required yields λ.
Exercise 2.7.3 Consider an annual bond with face value F, coupon
rate C⁄F, and n years to maturity. Fix its yield to maturity at some λ
and let DM be its Macaulay duration (with continuous compounding).
Show that if C ≠ 0, λ > 0, and we use continuous compounding, then
$$\lim_{n\to\infty} D_M = \frac{1}{1 - e^{-\lambda}}.$$
Thus, while duration initially increases with n, in the very long run it
stabilises at a constant level.
Exercise 2.7.4 Consider Exercise 2.7.3. Work out the corresponding
result using discrete compounding for a bond with m coupon
payments per year.
MACAULAY DURATION OF A PORTFOLIO
Consider a portfolio comprising an assortment of bonds. This portfolio
can be treated as a fixed income security, since its cash flow is
predetermined. However, the payments will occur at irregular
intervals. Let the cash flow consist of payments C1,…,CN at times t1,
…,tN. If all the bonds in this portfolio have the same required yield λ,
then we can extend the concept of Macaulay duration to it in a natural
way:
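$$D_M = \frac{\sum_{i=1}^{N} t_i\, C_i e^{-\lambda t_i}}{\sum_{i=1}^{N} C_i e^{-\lambda t_i}},$$
where we use a continuously compounded yield. The total price of
this portfolio is given by
$$P = \sum_{i=1}^{N} C_i e^{-\lambda t_i}.$$
It is easy to check that, as for a single bond, Macaulay duration
measures the sensitivity of P to changes in λ:
$$\frac{1}{P}\frac{dP}{d\lambda} = -D_M.$$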
Suppose the portfolio consists of k bonds, and let the price and
duration of the ith bond be Pi and DM,i respectively. Then P = P1 +
⋯ + Pk and
$$D_M = \sum_{i=1}^{k} \frac{P_i}{P}\, D_{M,i}.$$
Thus the overall duration is a weighted average of the individual
durations, the contribution of each bond being weighted by the
proportion invested in it.
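The weighted-average property is easy to confirm on a small case. In the sketch below a cash flow is a list of (time, amount) pairs; the two bonds, the common yield of 6% and the function names are all hypothetical.

```python
import math

def pv(flows, lam):
    """Price of a cash flow given as (time, amount) pairs, continuous yield lam."""
    return sum(c * math.exp(-lam * t) for t, c in flows)

def duration(flows, lam):
    """Macaulay duration of a cash flow with irregular payment times."""
    return sum(t * c * math.exp(-lam * t) for t, c in flows) / pv(flows, lam)

lam = 0.06
bond_a = [(1, 5), (2, 5), (3, 105)]          # 3-year 5% annual bond, face 100
bond_b = [(0.5, 4), (1.0, 4), (1.5, 104)]    # 18-month bond with half-yearly coupons of 4
portfolio = bond_a + bond_b                  # combined, irregularly spaced cash flow

p_a, p_b = pv(bond_a, lam), pv(bond_b, lam)
weighted = (p_a * duration(bond_a, lam) + p_b * duration(bond_b, lam)) / (p_a + p_b)
print(duration(portfolio, lam), weighted)    # the two values coincide
```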
2.8 TERM STRUCTURE OF INTEREST RATES
In all our calculations regarding bonds, we have acted as though
interest rates are independent of the period of the investment. In real
life however, they do vary with the period: for instance, fixed deposits
in banks usually earn a higher rate of interest if they are for longer
periods.
In this section we shall develop a framework for dealing with
interest rates that vary with the term and then we shall redo many of
our earlier calculations in this general setting. We emphasise,
however, that we are not taking up the random daily fluctuations in
interest rates.
Figure 2.13: The interest rates on State Bank of India fixed deposits of various
maturities (in years) as on January 9, 2006.
SPOT AND FORWARD RATES
Suppose that at time t = 0 a risk-free investment is made for a period
T. The investment could be a fixed-deposit in a bank, or a
zero-coupon bond maturing at T. The interest (or yield) of this investment
is called the spot rate for the period 0 to T and is denoted by sT.
Spot rates are expressed annually.
Example 2.8.1 Suppose the one-year spot rate is 5% and the
two-year spot rate is 6% (with discrete compounding). Then, an
investment of 100 for a year will grow to 100(1 + s1) = 100 × 1.05 =
105. The same amount invested for two years will grow to
$100(1 + s_2)^2 = 100(1.06)^2 = 112.36$.
□
One might think from this calculation that the two-year investment is
better than the one-year because it earns a higher rate of interest, yet
it is not necessarily so. For, the prevailing rates may increase after a
year and reinvesting the 105 at the new high rate may lead to a better
result.
This discussion brings us to forward rates. Suppose an investor
wants to take a loan, but after a year rather than right away. To avoid
the risk from interest rate fluctuations, she wants to finalise the loan
and the interest that will be charged. The rate that is decided for this
loan will be called a forward rate.
Thus, a forward rate is an interest rate that will be applied to a
transaction in the future but is to be decided in the present. The
forward rate for an investment starting at time S and lasting till time T
will be denoted by fS,T. Like spot rates, forward rates are expressed
annually.
Example 2.8.2 Suppose the one-year spot rate is s1 = 5%, while the
forward rate for the succeeding one-year period is f1,2 = 6%. Then, we
can invest 100 initially for one-year at s1 and then for another year at
f1,2 (without risk since the forward rate is set now). The investment will
grow to 100(1.05)(1.06) = 111.3 over two years.
□
Spot and forward rates are related to each other. Consider an amount
A which is to be invested for 2 years, risk-free. If it is invested using
the 2-year spot rate, it will grow to $A(1 + s_2)^2$. An alternative is to
invest it first for one year and then reinvest for another, in which case
it will grow to $A(1 + s_1)(1 + f_{1,2})$. Since both routes are risk-free, by the
No Arbitrage Principle we must have
$$A(1 + s_2)^2 = A(1 + s_1)(1 + f_{1,2}).$$
We can solve this for the forward rate:
$$f_{1,2} = \frac{(1+s_2)^2}{1+s_1} - 1.$$
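As a quick check of this relation, we can take the spot rates of Example 2.8.1 (s1 = 5%, s2 = 6%) and compute the implied forward rate. The variable names in this sketch are ours.

```python
s1, s2 = 0.05, 0.06   # one-year and two-year spot rates, annual compounding

# Forward rate for the second year implied by the No Arbitrage relation.
f12 = (1 + s2) ** 2 / (1 + s1) - 1
print(f12)

# Both risk-free routes must give the same value after two years.
direct = 100 * (1 + s2) ** 2
rolled = 100 * (1 + s1) * (1 + f12)
print(direct, rolled)
```

The implied forward rate is above both spot rates, since the second year must make up for the lower first-year rate.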
Exercise 2.8.3 Show that, under annual compounding, the forward
rate fm,n is related to the spot rates sm and sn by
$$f_{m,n} = \left[\frac{(1+s_n)^n}{(1+s_m)^m}\right]^{1/(n-m)} - 1. \tag{2.6}$$
Exercise 2.8.4 Consider spot and forward rates with discrete
compounding m times a year. Let the rates be given on an annual
basis. Further, let sn be the spot rate for the first n periods and fk,n the
forward rate from the kth to the nth period. Show that forward and spot
rates are related by
$$f_{k,n} = m\left\{\left[\frac{(1+s_n/m)^n}{(1+s_k/m)^k}\right]^{1/(n-k)} - 1\right\}. \tag{2.7}$$
Exercise 2.8.5 Show that, under continuous compounding, the
forward rate fm,n is related to the spot rates sm and sn by
$$f_{m,n} = \frac{n}{n-m}\, s_n - \frac{m}{n-m}\, s_m. \tag{2.8}$$
Some spot rates can be observed directly from the market. If there is
a risk-free zero-coupon bond with maturity time n, we can take its
yield to be sn. Difficulties arise when a suitable zero-coupon bond is
not present. The yields from other bonds do not directly give
spot rates since they involve payments at different times.
Consider an annual n-year bond. Using net present value and spot
rates, its fair price is
$$P = \sum_{i=1}^{n} \frac{C}{(1+s_i)^i} + \frac{F}{(1+s_n)^n}.$$
If P is observed from the market and the spot rates s1,…,sn–1 are
known, we can solve this equation for sn. This observation gives rise
to a technique known as bootstrapping. First, we look at all the
available zero-coupon bonds and use their yields to obtain a list of
spot rates. The gaps in this list are filled by looking at the market
prices of other bonds and solving for the corresponding spot rate.8
Example 2.8.6 Suppose at time 0, risk-free bonds with the following
characteristics are available:
1. Zero-coupon bonds maturing in 1 year and with yield 5%.
2. Zero-coupon bonds maturing in 2 years and with yield 6%.
3. 10% annual bonds maturing in 3 years and with yield 6.5%.
4. Zero-coupon bonds maturing in 6 years and with yield 7%.
From the zero-coupon bonds, we see that s1 = 5%, s2 = 6%, and s6 =
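7%. Next, we use the three-year bond to find s3 by setting up its
price–yield equation (taking the face value to be 100):
$$\frac{10}{1.05} + \frac{10}{1.06^2} + \frac{110}{(1+s_3)^3} = \frac{10}{1.065} + \frac{10}{1.065^2} + \frac{110}{1.065^3}.$$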
The solution is s3 = 6.585%. Since no bonds are available for n = 4,5
we cannot find s4 and s5. The best we can do is interpolate from the
known values. For example, we connect the known values by
straight-line segments, and use the values from these to approximate
the unknown rates. This is illustrated in Figure 2.14. (The more
advanced and standard technique is to connect the known points by
cubic splines. This results in a smooth curve through the known
points.)
Figure 2.14: The spot rates versus time plot for Example 2.8.6
□
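The bootstrapping step and the straight-line interpolation of this example can be sketched as follows. We assume a face value of 100 for the 3-year bond (the result does not depend on this choice); the function name bootstrap_last_spot is hypothetical.

```python
def bootstrap_last_spot(price, coupon, face, known_spots):
    """Solve the price equation for s_n, given s_1, ..., s_{n-1} (annual compounding)."""
    n = len(known_spots) + 1
    # Present value of the coupons whose spot rates are already known.
    pv_known = sum(coupon / (1 + s) ** i for i, s in enumerate(known_spots, start=1))
    # The final payment (coupon + face) must account for the rest of the price.
    return ((coupon + face) / (price - pv_known)) ** (1 / n) - 1

# Market price of the 3-year 10% bond, recovered from its 6.5% yield.
price = sum(10 / 1.065 ** i for i in (1, 2, 3)) + 100 / 1.065 ** 3
s3 = bootstrap_last_spot(price, 10, 100, [0.05, 0.06])

# Straight-line interpolation between s3 (year 3) and s6 = 7% (year 6).
s6 = 0.07
s4 = s3 + (s6 - s3) * (4 - 3) / (6 - 3)
s5 = s3 + (s6 - s3) * (5 - 3) / (6 - 3)
print(s3, s4, s5)
```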
In this example we note that, as expected, the spot rate s3 does not
equal the yield of the 3-year bond. Nevertheless, the difference is
quite small. On reflection, this is reasonable because most of the
payment happens at 3 years. Therefore, we can expect bond yields to
provide a starting approximation to the spot rates which can then be
refined by calculations as in the example.
A complication is that the maturity dates of available bonds may
not dovetail in the exact way required for bootstrapping. We may seek
zero-coupon bonds expiring in exactly 1 year but may only find ones
expiring in 11 and 13 months. In such a situation we would have to
use an average of their yields to represent the yield a 1-year bond
would have had.
Once the spot rates have been established, the forward rates can
be found by equations (2.6) to (2.8).
FISHER–WEIL DURATION
Figure 2.15 shows how quickly the spot rates can change by
significant amounts. It depicts the twenty-year spot rate curve for
India as calculated by the National Stock Exchange on three
successive days in September 2008.
Suppose, at this time, you had held some zero-coupon bonds
expiring in 3 years. Then, in just two days their value would have first
increased by 1% and then decreased by 1.3%. In the longer term,
much wilder swings could be expected.
The illustration above points to the importance of quantifying the
risk to a bond portfolio from changes in the spot rates. We shall only
consider the risk from the simplest kind of change. Specifically, we
shall study the sensitivity of bond prices to parallel shifts—when all
the spot rates change by the same amount. If we plot the spot rate
curve for the new spot rates, we see a parallel shift in the curve.
Figure 2.15: Variation in the twenty-year spot rate curve for India over three
consecutive days in September 2008 (based on data released by NSE)
The concept of Macaulay duration can be adapted to this scenario.
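Consider an annual n-year bond with face value F and coupon
payments of C. If we use continuous compounding, its price is given
by
$$P = \sum_{i=1}^{n} C e^{-s_i i} + F e^{-s_n n},$$
where the si are the continuously compounded spot rates.
Consider the case when all the spot rates change by the same
amount λ. Then the new price is a function of λ:
$$P(\lambda) = \sum_{i=1}^{n} C e^{-(s_i+\lambda) i} + F e^{-(s_n+\lambda) n}.$$
The sensitivity of P to such changes is measured by
$$\frac{1}{P(0)}\left.\frac{dP}{d\lambda}\right|_{\lambda=0} = -\frac{\sum_{i=1}^{n} i\, C e^{-s_i i} + n F e^{-s_n n}}{P(0)} = -D_{FW},$$
where DFW is called the Fisher–Weil duration.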
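The relation between the Fisher–Weil duration and a parallel shift can be verified numerically. The sketch below uses a central finite difference for the derivative; the spot rates, the bond and the function names are hypothetical.

```python
import math

def price_shifted(coupon, face, spots, shift=0.0):
    """Annual bond price when every continuously compounded spot rate moves by `shift`."""
    n = len(spots)
    total = sum(coupon * math.exp(-(s + shift) * i) for i, s in enumerate(spots, start=1))
    return total + face * math.exp(-(spots[-1] + shift) * n)

def fisher_weil(coupon, face, spots):
    """Payment times averaged with present-value weights taken from the spot curve."""
    n = len(spots)
    weighted = sum(i * coupon * math.exp(-s * i) for i, s in enumerate(spots, start=1))
    weighted += n * face * math.exp(-spots[-1] * n)
    return weighted / price_shifted(coupon, face, spots)

spots = [0.05, 0.055, 0.06]   # spot rates for years 1 to 3
d_fw = fisher_weil(10, 100, spots)

h = 1e-6                      # size of the parallel shift used for the derivative
slope = (price_shifted(10, 100, spots, h) - price_shifted(10, 100, spots, -h)) / (2 * h)
print(d_fw, -slope / price_shifted(10, 100, spots))
```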
Exercise 2.8.7 Consider two bonds whose Fisher–Weil durations are
D1 and D2 . We invest amounts P1 and P2 in them, respectively. Show
that the Fisher–Weil duration of this portfolio is
$$D_P = \frac{P_1}{P_1+P_2}\, D_1 + \frac{P_2}{P_1+P_2}\, D_2.$$
If we use discrete compounding, the Fisher–Weil duration is
defined by
$$D_{FW} = \frac{1}{P}\left[\sum_{i=1}^{n} \frac{i\, C}{(1+s_i)^i} + \frac{n F}{(1+s_n)^n}\right].$$
As with Macaulay duration, the discrete version of Fisher–Weil
duration does not precisely capture the sensitivity of the price to the
rate shifts. Exercise 2.8.8 describes the modification that is needed.
Exercise 2.8.8 Consider an n-year annual bond. Using annual
compounding, let the yearly spot rates be s1,s2, etc. Let P be the price
of the bond. If each spot rate si shifts by the same amount λ and
becomes si + λ, show that
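$$\frac{1}{P}\left.\frac{dP}{d\lambda}\right|_{\lambda=0} = -D_Q,$$
where DQ is called the quasi-modified duration and is given by:
$$D_Q = \frac{1}{P}\left[\sum_{i=1}^{n} \frac{i\, C}{(1+s_i)^{i+1}} + \frac{n F}{(1+s_n)^{n+1}}\right].$$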
Note that DQ is a linear combination of the payment times but it is not
an average since the weights do not add up to 1.
TYPES OF TERM STRUCTURE
When spot rates are plotted against the time to maturity, the resulting
curve is called a term structure. The typical term structure is a
smoothly increasing curve which also becomes gradually flatter, as in
Figure 2.16(a). Such a curve is called a normal term structure. If the
curve is more or less constant, we have a flat term structure—this is
the structure we assumed before this section. If the curve steadily
decreases instead of increasing, we have an inverted term
structure. A general term structure is likely to be basically normal
with some parts that are flat or inverted.
Figure 2.16: Types of term structure: the spot rate sT is plotted against the time
to maturity T .
Example 2.8.9 Figure 2.17 depicts data released by the Reserve
Bank of India regarding yields of Government of India bonds on
February 24, 2006. The data gives maximum and minimum yields for
bonds expiring in the years 2006–07, 2007–08, etc. If we connect the
midpoints of the given intervals we get a rough idea of the
dependence of yield on maturity date, and in turn this serves as an
approximation of the term structure. It is normal with a bump during
2008–09.
Figure 2.17: The yield curve for Example 2.8.9
□
Two obvious questions arise out of these observations. The first is:
Why are there different term structures? The second is: Why is the
normal structure the basic one while the others are rare or transitory?
The answers again involve risk. Longer term investments are
exposed to more risk over their life. They also reduce the investor’s
flexibility by tying up his cash. For these reasons investors demand
greater return on longer term investments, leading to an upward
sloping term structure. However, occasionally investors perceive
certain times in the future as particularly risky, e.g., the period in
which a general election takes place. Then the spot rates for these
times will rise, causing bumps in the term structure. An inverted term
structure will occur when the present is turbulent, so that long term
investments are seen as safer than short term ones.
2.9 IMMUNISATION
The simplest way to invest money for a time T in a risk-free way is to
buy a zero-coupon bond maturing at T. By concentrating the payment
at the end, and at a predetermined rate, it completely eliminates
interest-rate risk. Unfortunately, only on rare occasions will we find a
zero-coupon bond that matures exactly at the required time. We can
try to use a zero-coupon bond that matures near T, but this will have
attached risks:
1. If it matures before T, we have to reinvest the payoff and the
prevailing interest rate is not known beforehand.
2. If it matures after T, we have to sell it at T and its value at that
time is not predetermined.
Nor does it help to consider bonds with coupon payments. Even if we
find one that matures at T, its coupon payments will have to be
reinvested for the remaining period and this is risky.
We have to accept that the available bonds may not allow us to
make a risk-free investment for the required time. Having accepted
this, we can at least try to minimise the risk. Intuitively, a reasonable
strategy is to use bonds whose payments occur as near to T as
possible. Since duration measures the average of the payment times
of a bond, this translates into choosing bonds whose combined
duration is T.
Consider a situation where two bonds with Fisher–Weil duration
D1 and D2 are available. If we create a portfolio out of these two
bonds, where amounts P1 and P2 are invested in them, respectively,
then the Fisher–Weil duration of the portfolio is
\[ D_P = \frac{P_1}{P_1 + P_2}\, D_1 + \frac{P_2}{P_1 + P_2}\, D_2. \]
To make the portfolio match the requirement of investing an amount P
for time T, we match both amount and duration:
\[ P_1 + P_2 = P, \qquad \frac{P_1}{P}\, D_1 + \frac{P_2}{P}\, D_2 = T. \]
These two equations can be solved for P1 and P2. (Note that we need
T to be between D1 and D2.) This technique is called immunisation
and can be applied just as well to a stream of payments. Suppose the
requirement is to invest A1 for time T1, A2 for time T2, etc. We
calculate the duration D of this cash flow and once again match A1 +
A2 + ⋯ with P1 + P2 and D with DP.
Example 2.9.1 Suppose a firm faces the following stream of
obligations over the next 3 years:
Year         1     2     3
Obligation   100   200   300
In addition, a zero-coupon bond maturing in 2 years and a 10%
annual bond maturing in 5 years, are available. The spot rates for the
next five years are as tabulated below.
Then we have the following calculations (all the rates are assumed to
be for continuous annual compounding):
Consider a portfolio where the proportions invested in the two bonds
are w1 and w2. Then we have the following two equations for
immunisation:
\[ w_1 + w_2 = 1, \qquad w_1 D_1 + w_2 D_2 = D. \]
Substituting the calculated values of D, D1 and D2 and solving, we
obtain:
\[ w_1 = 0.86, \qquad w_2 = 0.14. \]
The NPV of the payment stream is
\[ P = 100e^{-0.04} + 200e^{-0.05 \times 2} + 300e^{-0.057 \times 3} = 529.89. \]
Hence we have to invest P1 = 0.86 × 529.89 = 455.71 in the zero-coupon bond and P2 = 0.14 × 529.89 = 74.18 in the annual bond.
□
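The calculation in Example 2.9.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the spot rates for years 1 to 3 are the ones appearing in the NPV formula of the example, while the rates for years 4 and 5, the face values and all function names are hypothetical choices made here. The resulting weights therefore need not reproduce 0.86 and 0.14 exactly.

```python
import math

def fisher_weil(cash_flows, spot):
    """Return (price, Fisher-Weil duration) under continuous compounding.

    cash_flows maps payment time (years) to amount; spot maps time to spot rate.
    """
    pv = {t: c * math.exp(-spot[t] * t) for t, c in cash_flows.items()}
    price = sum(pv.values())
    duration = sum(t * v for t, v in pv.items()) / price
    return price, duration

# Years 1-3: rates used in the example's NPV formula.
# Years 4-5: ASSUMED values, for illustration only.
spot = {1: 0.04, 2: 0.05, 3: 0.057, 4: 0.061, 5: 0.064}

obligations = {1: 100, 2: 200, 3: 300}
zero_coupon = {2: 100}                               # face value 100 is an assumption
annual_bond = {1: 10, 2: 10, 3: 10, 4: 10, 5: 110}   # 10% coupon on an assumed face of 100

P, D = fisher_weil(obligations, spot)
_, D1 = fisher_weil(zero_coupon, spot)
_, D2 = fisher_weil(annual_bond, spot)

# Immunisation conditions: w1 + w2 = 1 and w1*D1 + w2*D2 = D.
w1 = (D2 - D) / (D2 - D1)
w2 = 1 - w1

print(f"P = {P:.2f}, D = {D:.4f}, D1 = {D1:.4f}, D2 = {D2:.4f}")
print(f"w1 = {w1:.2f}, w2 = {w2:.2f}")
print(f"invest {w1 * P:.2f} in the zero-coupon bond and {w2 * P:.2f} in the annual bond")
```

The closed form for w1 is just the solution of the two linear conditions; it requires D1 ≠ D2, and both weights are positive only when D lies between D1 and D2.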
Since Fisher–Weil duration measures the change in the price of a
bond or cash flow under a parallel shift of the spot curve, this process
of immunisation protects against such changes. As the stream of
required payments and the bond portfolio have the same duration,
they have similar responses to shifts of the spot rate curve and
hence, the portfolio continues to match the requirements (closely, if
not exactly).
Example 2.9.2 Let us verify that immunisation works in the manner
described above. Consider the situation and calculations of the
previous example. We start with a portfolio which matches the
payment stream in both present value and duration: we have
calculated that it consists of 455.71 invested in the zero-coupon bond
and 74.18 invested in the annual bond. Now, consider how the
present values of the payment stream and the portfolio vary with
parallel shifts of the spot-rate curve:
Even under a shift as drastic as 2%, the portfolio moves almost
exactly with the payment stream.
□
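A check of this kind can be sketched in Python. As before, the spot rates for years 4 and 5 and the face values are assumptions made for illustration, and the function names are hypothetical, so the printed figures illustrate the behaviour rather than reproduce the example's numbers.

```python
import math

def value(cash_flows, spot, shift=0.0):
    """Present value when every spot rate is moved by `shift` (continuous compounding)."""
    return sum(c * math.exp(-(spot[t] + shift) * t) for t, c in cash_flows.items())

def fisher_weil_duration(cash_flows, spot):
    weighted = sum(t * c * math.exp(-spot[t] * t) for t, c in cash_flows.items())
    return weighted / value(cash_flows, spot)

# Rates for years 4 and 5 and the face values are ASSUMED for illustration.
spot = {1: 0.04, 2: 0.05, 3: 0.057, 4: 0.061, 5: 0.064}
obligations = {1: 100, 2: 200, 3: 300}
zero_coupon = {2: 100}
annual_bond = {1: 10, 2: 10, 3: 10, 4: 10, 5: 110}

P = value(obligations, spot)
D = fisher_weil_duration(obligations, spot)
D1 = fisher_weil_duration(zero_coupon, spot)
D2 = fisher_weil_duration(annual_bond, spot)
w1 = (D2 - D) / (D2 - D1)

# Number of units of each bond bought today; these stay fixed when rates move.
units_zero = w1 * P / value(zero_coupon, spot)
units_annual = (1 - w1) * P / value(annual_bond, spot)

def portfolio_value(shift):
    return (units_zero * value(zero_coupon, spot, shift)
            + units_annual * value(annual_bond, spot, shift))

shifts = [-0.02, -0.01, 0.0, 0.01, 0.02]
rows = [(s, value(obligations, spot, s), portfolio_value(s)) for s in shifts]
for s, stream, port in rows:
    print(f"shift {s:+.2%}: stream {stream:8.2f}  portfolio {port:8.2f}  gap {port - stream:6.3f}")
```

With these assumed inputs the stream's value moves by more than 20 under a 2% shift, while the gap between portfolio and stream stays well below 0.1.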
We could also execute immunisation by matching the Macaulay or
modified duration in place of the Fisher–Weil duration. This would
amount to assuming a flat term structure. The gain would be in the
ease of calculation.
2.10 CONVEXITY
Our development of immunisation in the previous section was based
on a linear approximation to the price–yield relationship. The
technique can be further improved by using a quadratic approximation
—this requires the use of the second derivative of price with respect
to yield.
Consider a bond whose price is denoted by P. Then P is a
function of the yield λ. The convexity 𝒞 of the bond is defined to be
\[ \mathcal{C} = \frac{1}{P} \frac{d^2 P}{d\lambda^2}. \tag{2.9} \]
From the shape of the price–yield curve, it is evident that as the yield
λ increases, the first derivative of P also increases, moving from
highly negative values towards zero. Therefore, the second derivative
of P is positive, and so 𝒞 > 0.
Equation (2.9) also defines convexity for a portfolio of bonds
(assuming that all bonds have the same required yield).
Exercise 2.10.1 Consider a portfolio of two bonds, with prices P1 and
P2, and convexity 𝒞1 and 𝒞2, respectively. Let P be the price of the
portfolio. Show that the convexity of the portfolio is
\[ \mathcal{C} = \frac{P_1}{P}\, \mathcal{C}_1 + \frac{P_2}{P}\, \mathcal{C}_2. \]
Exercise 2.10.2 Consider a bond with price P, Macaulay duration DM
and convexity 𝒞, relative to a continuously compounded yield λ. Then
\[ \frac{dD_M}{d\lambda} = D_M^2 - \mathcal{C}. \]
As always, the mathematics is simplest if we take a continuously
compounded yield λ, and we shall confine ourselves to that situation.
Then the price of an n-year annual bond with coupon payments C and
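face value F is given by
\[ P = \sum_{i=1}^{n} C e^{-\lambda i} + F e^{-\lambda n}. \]
We differentiate twice with respect to λ and divide by P to get the
convexity:
\[ \mathcal{C} = \frac{1}{P} \left( \sum_{i=1}^{n} i^2\, C e^{-\lambda i} + n^2 F e^{-\lambda n} \right). \]
Thus, the convexity is a weighted average of the squares of the
payment times. The weights are non-negative and add up to 1. So we
can express convexity in the form:
\[ \mathcal{C} = \sum_{i=1}^{n} w_i\, i^2, \qquad w_i = \frac{C e^{-\lambda i}}{P} \ (i < n), \quad w_n = \frac{(C + F) e^{-\lambda n}}{P}. \tag{2.10} \]
Recall that Macaulay duration is a weighted average of the payment
times, with the same weights as used by convexity:
\[ D_M = \sum_{i=1}^{n} w_i\, i. \]
From these expressions for 𝒞 and DM we can make the following
calculation:
\[ \mathcal{C} - D_M^2 = \sum_{i=1}^{n} w_i\, i^2 - \Big( \sum_{i=1}^{n} w_i\, i \Big)^2 = \sum_{i=1}^{n} w_i\, (i - D_M)^2 \ge 0. \]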
If we combine this with the result of Exercise 2.10.2, we find that
Macaulay duration decreases with yield:
\[ \frac{dD_M}{d\lambda} \le 0. \]
Another easy consequence of equation (2.10) is that 𝒞 ≤ n². The
convexity is therefore maximum for a zero-coupon bond, when it
exactly equals n².
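These properties can be checked numerically. The sketch below uses a hypothetical 5-year bond (coupon 10, face value 100, yield 6%, all chosen for illustration) and compares a finite-difference estimate of the derivative of duration with the difference between squared duration and convexity. The function name bond_stats is an assumption of this sketch.

```python
import math

def bond_stats(coupon, face, n, y):
    """Price, Macaulay duration and convexity of an n-year annual bond
    at a continuously compounded yield y."""
    flows = {i: float(coupon) for i in range(1, n + 1)}
    flows[n] += face
    pv = {i: c * math.exp(-y * i) for i, c in flows.items()}
    price = sum(pv.values())
    weights = {i: v / price for i, v in pv.items()}   # non-negative, sum to 1
    duration = sum(w * i for i, w in weights.items())
    convexity = sum(w * i * i for i, w in weights.items())
    return price, duration, convexity

# Hypothetical bond: 5 years, coupon 10, face value 100, yield 6%.
n, y = 5, 0.06
price, D, C = bond_stats(10, 100, n, y)

# Central difference approximation to dD/dy, to compare with D^2 - C.
h = 1e-5
dD_dy = (bond_stats(10, 100, n, y + h)[1] - bond_stats(10, 100, n, y - h)[1]) / (2 * h)

# Zero-coupon bond with the same maturity: all weight sits on the last payment.
_, D_zero, C_zero = bond_stats(0, 100, n, y)

print(f"D = {D:.4f}, C = {C:.4f}, D^2 - C = {D * D - C:.4f}, dD/dy ~ {dD_dy:.4f}")
print(f"zero-coupon: D = {D_zero:.1f}, C = {C_zero:.1f}")
```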
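Exercise 2.10.3 Show that under annual compounding, the convexity
of an annual bond is given by
\[ \mathcal{C} = \frac{1}{P (1 + \lambda)^2} \left( \sum_{i=1}^{n} \frac{i(i+1)\, C}{(1 + \lambda)^i} + \frac{n(n+1)\, F}{(1 + \lambda)^n} \right). \]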
IMMUNISATION
Under a small change δλ in the yield, the change δP in the price has
the quadratic approximation
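\[ \delta P \doteq \frac{dP}{d\lambda}\, \delta\lambda + \frac{1}{2} \frac{d^2 P}{d\lambda^2}\, (\delta\lambda)^2. \]
This can be expressed in terms of the Macaulay duration DM and
convexity 𝒞 as follows:
\[ \frac{\delta P}{P} \doteq -D_M\, (\delta\lambda) + \frac{\mathcal{C}}{2}\, (\delta\lambda)^2. \]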
Therefore, the price fluctuations of a portfolio or stream of cash
obligations can be matched by a portfolio of bonds which has the
same starting price, duration and convexity. Since three parameters
have to be matched, three different bonds will have to be used.
Example 2.10.4 Consider the following 5 year stream of obligations:
Our task is to create a portfolio of bonds whose value will match this
stream, even under significant changes in the interest rate. Suppose
the available bonds all have face value 10 and are as follows:
1. A zero-coupon bond maturing in 2 years,
2. A 10% annual bond maturing in 5 years, and
3. A 10% annual bond maturing in 4 years.
Suppose the required (continuously compounded) yield is 6%. Then
the price, duration and convexity of the payment stream are
Similar calculations can be carried out for the three bonds. The
results are given in the table below:
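Let w1 = P1 ⁄ P, w2 = P2 ⁄ P and w3 = P3 ⁄ P. Then, to match the
duration and convexity these numbers should satisfy:
\[ w_1 + w_2 + w_3 = 1, \qquad w_1 D_1 + w_2 D_2 + w_3 D_3 = D, \qquad w_1 \mathcal{C}_1 + w_2 \mathcal{C}_2 + w_3 \mathcal{C}_3 = \mathcal{C}, \]
where Di and 𝒞i are the duration and convexity of the ith bond.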
The solution of this system of linear equations is
w1 = 0.14, w2 = 0.38, w3 = 0.48.
Therefore the amounts invested in the three bonds should be 173.88,
457.13 and 576.20 respectively, corresponding to 19.6, 39.4 and
50.9 bonds.
The table below shows the NPV of the obligation stream and the
bond portfolio under various interest rates.
□
Exercise 2.10.5 Verify that our convexity formulas extend to non-annual bonds.
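The bond data of Example 2.10.4 can be checked numerically. The sketch below recomputes the price, duration and convexity of the three bonds at the 6% continuously compounded yield, and converts the invested amounts quoted in the example into weights and numbers of bonds. The function and variable names are hypothetical.

```python
import math

def bond_stats(coupon, face, n, y):
    """Price, duration and convexity of an n-year annual bond at continuous yield y."""
    flows = {i: float(coupon) for i in range(1, n + 1)}
    flows[n] += face
    pv = {i: c * math.exp(-y * i) for i, c in flows.items()}
    price = sum(pv.values())
    duration = sum(i * v for i, v in pv.items()) / price
    convexity = sum(i * i * v for i, v in pv.items()) / price
    return price, duration, convexity

y = 0.06
bonds = {
    "2-year zero": bond_stats(0, 10, 2, y),
    "5-year 10%": bond_stats(1, 10, 5, y),   # coupon = 10% of face value 10
    "4-year 10%": bond_stats(1, 10, 4, y),
}
# Amounts invested, as quoted in the example.
amounts = {"2-year zero": 173.88, "5-year 10%": 457.13, "4-year 10%": 576.20}

total = sum(amounts.values())
weights = {k: a / total for k, a in amounts.items()}
units = {k: amounts[k] / bonds[k][0] for k in bonds}

# Duration and convexity of the resulting portfolio (weighted averages).
D_port = sum(weights[k] * bonds[k][1] for k in bonds)
C_port = sum(weights[k] * bonds[k][2] for k in bonds)

for k, (price, dur, conv) in bonds.items():
    print(f"{k}: price {price:.4f}, duration {dur:.4f}, convexity {conv:.4f}, "
          f"weight {weights[k]:.2f}, units {units[k]:.1f}")
print(f"portfolio: value {total:.2f}, duration {D_port:.4f}, convexity {C_port:.4f}")
```

By construction of the immunisation conditions, the portfolio duration and convexity printed at the end should be close to those of the obligation stream, up to the rounding of the quoted amounts.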
CONVEXITY AND TERM STRUCTURE
Convexity can also be based on parallel shifts in the term structure.
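Consider an annual n-year bond with face value F and coupon payments C.
If we use continuous compounding its price is given by
\[ P = \sum_{i=1}^{n} C e^{-s_i i} + F e^{-s_n n}, \]
where the si are continuously compounded spot rates. Suppose there
is a parallel shift in the term structure by an amount λ. This means
each spot rate si now becomes si + λ. The new price of the bond, as a
function of λ, is
\[ P(\lambda) = \sum_{i=1}^{n} C e^{-(s_i + \lambda) i} + F e^{-(s_n + \lambda) n}. \]
Convexity is now defined by
\[ \mathcal{C} = \frac{1}{P(0)} \left. \frac{d^2 P}{d\lambda^2} \right|_{\lambda = 0} = \frac{1}{P} \left( \sum_{i=1}^{n} i^2\, C e^{-s_i i} + n^2 F e^{-s_n n} \right). \]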
This version of convexity can be used to immunise against parallel
shifts in the term structure, in conjunction with the Fisher-Weil
duration DFW of the bond.
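Exercise 2.10.6 Verify that if the spot rates are annually
compounded, convexity is given by
\[ \mathcal{C} = \frac{1}{P} \left( \sum_{i=1}^{n} \frac{i(i+1)\, C}{(1 + s_i)^{i+2}} + \frac{n(n+1)\, F}{(1 + s_n)^{n+2}} \right). \]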
2.11 CALLABLE BONDS
The bonds we have considered so far are distinguished by having
fixed coupon payments. Bonds are also available with variable
coupon payments as well as the option of one party cancelling the
contract. We will briefly consider the second of these varieties. Since
their cash flows are not deterministic they cannot be completely
understood within our current framework.
A drop in prevailing interest rates represents a loss for the bond
issuer, who is now paying interest (in the form of coupon payments) at
a higher rate than the one now available. The issuer can limit such
losses by building in an option allowing it to buy back or call the
bonds by paying a certain premium (called the call price) to the
bondholder.
In a European callable bond the call option can be exercised
only at certain pre-specified dates. In an American one it can be
exercised at any time after a certain interval. In either case, the time
to the first date when the bond can be called is termed the call
protection period.
Example 2.11.1 Consider a five year annual bond that has face
value 1000 and was issued at par when the required yield was 10%.
Suppose this bond can be called back at any time by paying the face
value as well as a fee of 50. Now, if after 2 years the required yield
has decreased to 7%, then the price of this bond will become
\[ P = \frac{100}{1.07} + \frac{100}{1.07^2} + \frac{1100}{1.07^3} = 1079. \]
Calling the bond costs the issuer 1000 + 50 = 1050, a saving of 29.
□
The concepts of fair price and yield are tricky for a callable bond,
since the profit from it depends on whether, and when, it is called.
Given the market price, and a possible call date, the corresponding
yield to call or YTC is defined to be the yield if the bond is indeed
called on that date. The lowest of these YTC’s is called the yield to
worst.
Exercise 2.11.2 Consider a European callable bond, with annual
coupon payments of size C, and call dates at the end of each annual
period. Let the call price for the ith year be Ci. Show that its YTC λ at
the end of the ith year is the solution of
\[ P = \sum_{k=1}^{i} \frac{C}{(1 + \lambda)^k} + \frac{C_i}{(1 + \lambda)^i}. \]
Example 2.11.3 Consider a five year 10% annual bond with face
value 100. Suppose it can be called back at any of the first four
coupon times, at call prices of C1 = 110, C2 = 107, C3 = 104, and C4 =
102 respectively (the call price is to be paid in addition to the
corresponding coupon payment). If it is called at the time of the jth
coupon payment, the corresponding YTC is the solution r of the
equation
\[ P = \sum_{i=1}^{j} \frac{10}{(1 + r)^i} + \frac{C_j}{(1 + r)^j}. \]
Suppose the market price at the time of issue is P = 110. Then the
YTC’s at the possible call dates are 9.09%, 7.78%, 7.40%, and
7.46%, while the YTM is 7.53%. Thus the yield to worst is 7.40%.
□
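The yields in Example 2.11.3 can be recomputed with a simple root finder. The sketch below uses bisection, which is one possible choice and not necessarily how the figures in the example were obtained. The function names are hypothetical, and the bracket [0, 1] for the rate is an assumption that holds for this data.

```python
def pv_if_redeemed(r, j, coupon, redemption):
    """Present value at annual rate r of j annual coupons plus `redemption` paid at time j."""
    return sum(coupon / (1 + r) ** i for i in range(1, j + 1)) + redemption / (1 + r) ** j

def solve_yield(price, j, coupon, redemption, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection on [lo, hi]; valid because the present value decreases as r increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pv_if_redeemed(mid, j, coupon, redemption) > price:
            lo = mid      # rate too low: present value still above the market price
        else:
            hi = mid
    return (lo + hi) / 2

price, coupon, face = 110, 10, 100
call_prices = {1: 110, 2: 107, 3: 104, 4: 102}

ytc = {j: solve_yield(price, j, coupon, cj) for j, cj in call_prices.items()}
ytm = solve_yield(price, 5, coupon, face)
yield_to_worst = min(ytc.values())

for j, r in ytc.items():
    print(f"YTC if called at coupon {j}: {100 * r:.2f}%")
print(f"YTM: {100 * ytm:.2f}%  yield to worst: {100 * yield_to_worst:.2f}%")
```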
PUTTABLE BONDS
Puttable bonds are the mirror image of callable ones. Now it is the
buyer who is protected against interest rate risk by having the choice
to sell the bond back to the issuer for a premium called the put price.
Puttable bonds can also be European or American. The yield to put
or YTP for a possible put date is the yield if the bond is put back on
that date, and the yield to worst is the lowest of the YTP’s.
Proper pricing of callable and puttable bonds requires the
probabilistic modelling of interest rate fluctuations. As we will not take
up that topic in this book, this is as far as we can go with such bonds.
A brief history of duration:
1. Frederick Macaulay [29] introduced the concept of duration as a
weighted average (1938).
2. John Hicks [24] gave its interpretation as a measure of sensitivity
(1939).
3. Frank Redington [38] invented immunisation via duration and
convexity (1952).
4. Lawrence Fisher and Roman Weil [21] generalised to the case of
an arbitrary term structure (1971).
(Source: Poitras [37])
7 The usage of this term is not uniform in the literature. It is also used for the
ratio of the NPV of the positive entries to I, or (N/I) – 1.
8 The term bootstrapping is derived from the phrase ‘lift yourself by your own
bootstraps.’
3 Random Cash Flows
We will now study situations where the future income or
expenditure is not known beforehand. In the early days of
probability, mathematically aware investors evaluated an
investment by considering the expectation of the possible payoffs.
However, it was realised that such an evaluation is incomplete.
Followed strictly, it would recommend that everyone invest in only one
asset—the one offering the most expected return. Actual investors,
however, generally prefer to invest in a variety of assets to avoid over-dependence on any one of them.
The situation was unravelled by Harry Markowitz [30, 31] in 1952.
Markowitz realised that one must take risk into account. He measured
it by the standard deviation of the returns, and created a systematic
theory of portfolio analysis. His training in operations research
enabled him to not only set up the problem of finding optimal
combinations of assets, but to solve it using the technique of
quadratic programming. His work also formed the basis for
developments by others—most notably the Capital Asset Pricing
Model (CAPM). While Markowitz focussed on how an individual can
invest optimally, CAPM analyses the consequences of every investor
acting according to his theory.
Markowitz’s other main contributions have been to develop sparse
matrix techniques and a language (SIMSCRIPT) for programming
large-scale simulations. He won the Nobel Prize in Economics in
1990, along with William Sharpe (one of the creators of CAPM) and
Merton Miller (one of the first to systematically exploit the No
Arbitrage Principle).
3.1 RANDOM RETURNS
Consider a time interval [0,T] and an asset whose value at time 0 is
V0. Suppose its value VT at time T is not known initially (such assets
are called risky assets). Then VT can be treated as a random
variable. The rate of return
\[ r = \frac{V_T - V_0}{V_0} \]
is then also a random variable. Note that VT = V0(1 + r). The
expectation of r is called the mean return of the asset and is denoted
by r̄. The variance of r is denoted by σ², and its standard deviation by
σ. We call σ² and σ the variance and standard deviation of the
asset. While r̄ measures the profit expected from the asset, σ
measures the associated risk. Our task, therefore, is to explore the
relationship between r̄ and σ for both individual assets and their
collections in portfolios.
Exercise 3.1.1 Let two assets have (random) rates of return r1 and
r2. Use the No Arbitrage Principle to show that it is not possible for
these random variables to have a constant and non-zero difference.
Consider a portfolio of risky assets numbered from 1 through n. Let Ai
be the amount invested in the ith asset. The weight of the ith asset is
defined to be the proportion invested in it:
\[ w_i = \frac{A_i}{\sum_{j=1}^{n} A_j}. \]
It is easy to see that ∑i wi = 1.
SHORT SELLING
In many situations, it is possible to sell an asset that you do not own!
This can be achieved if there is a gap between the sale and the time
of delivery—the asset can be acquired during the gap and then
delivered in time. This strategy is called short selling.
Now, suppose at time t = 0 the value of the asset is V0, and you
short sell it with a future delivery time. To complete your obligations,
you actually buy the asset later at time T, when its value is VT.
In this case, the initial investment is –V0 (there is a negative sign
because there is an initial receipt rather than a payment), while the
final payoff is –VT (here the negative sign is because at the end you
are paying rather than receiving). The rate of return from the asset on
short selling is therefore
\[ \frac{(-V_T) - (-V_0)}{-V_0} = \frac{V_T - V_0}{V_0}, \]
which is just the same as the rate calculated in the usual way.
However, this return is calculated on a negative initial investment.
Thus, the inclusion of short selling leads to negative weights.
Another version of short-selling is that you first borrow the asset
for a certain time, and then sell it. Finally, you repurchase it from
another source and return it to the lender. The numerical
consequences are just as above—initially you receive V0 and finally
you pay VT.
Example 3.1.2 Suppose you own Rs 200. You notice two shares, S
and T, whose prices are Rs 800 and Rs 1000, respectively. Your
analysis leads you to expect that in the next few days the price of T
will increase much more sharply than that of S. To benefit from this,
you implement a strategy of using S to pay for T as follows. You start
by short-selling S, with delivery set for 10 days in the future. This
earns you Rs 800 right away, and you pool in the Rs 200 you already
possess to buy T.
At this point, you have no cash, you own T and you owe S. Your
net worth is therefore 0 + 1000 – 800 = 200, the same as before.
After 10 days, suppose your analysis is borne out, and the values
of S and T are Rs 900 and Rs 1200 respectively. You sell T and then
use part of the gains to buy S and deliver it. You are left with Rs 300
cash. Thus, overall, you have earned a rate of return of
\[ \frac{300 - 200}{200} = 0.5 = 50\%. \]
Now let us look at the individual parts of our strategy. After the first
stage we have Rs 1000 invested in T and Rs –800 invested in S
(since we owe S). Therefore the weights for S and T are:
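\[ w_S = \frac{-800}{200} = -4, \qquad w_T = \frac{1000}{200} = 5. \]
Note that the weights do add to 1. The individual rates of return are
\[ r_S = \frac{(-900) - (-800)}{-800} = \frac{-100}{-800} = 12.5\%, \qquad r_T = \frac{1200 - 1000}{1000} = \frac{200}{1000} = 20\%. \]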
Don’t be misled by the positive rate of return rS. It actually represents
a loss, since it was earned on a negative initial investment.
In this story, you have lost Rs 100 on the short sale of S, but have
compensated by earning Rs 200 on the trade in T. The role of the
short sale was that it raised cash that enabled you to trade in T.
□
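The bookkeeping of Example 3.1.2 can be written out as a short Python sketch. The function name outcome and its parameters are hypothetical; the default numbers are those of the example.

```python
def outcome(S1, T1, S0=800, T0=1000, cash=200):
    """Short S and buy T at time 0, then close both positions at prices S1 and T1."""
    # Signed initial investments: short S (receive S0), long T (pay T0).
    invested = {"S": -S0, "T": T0}
    net_worth_start = sum(invested.values())
    assert net_worth_start == cash, "the short sale plus own cash must pay for T"
    weights = {k: v / net_worth_start for k, v in invested.items()}
    # Individual rates of return, computed in the usual way.
    returns = {"S": (S1 - S0) / S0, "T": (T1 - T0) / T0}
    # Close out: sell T, then buy S and deliver it.
    net_worth_end = T1 - S1
    overall_return = (net_worth_end - net_worth_start) / net_worth_start
    return weights, returns, net_worth_end, overall_return

weights, returns, net_worth_end, overall_return = outcome(S1=900, T1=1200)
print("weights:", weights)
print("individual returns:", returns)
print("final cash:", net_worth_end, "overall return:", overall_return)
```

Calling outcome with other final prices shows how the same strategy behaves when the analysis is not borne out.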
Exercise 3.1.3 What will happen in the above scenario if, after 10
days,
1. S is worth Rs 1000 and T is worth Rs 1100?
2. S is worth Rs 700 and T is worth Rs 800?
Short selling is often restricted or prohibited because, as in the
example above, it is popular with speculators who wish to quickly
raise cash to exploit their analysis of the market trends. Therefore it is
seen as a contributor to instability. Moreover, short selling also
increases the chances of default since the short seller may have
unexpected difficulty in buying the asset later and completing the
sale.
RANGE OF WEIGHTS
An individual weight can take on any value: it will be positive if we
own the corresponding asset and negative if we have short sold or
borrowed it. However, if short selling is not allowed, then each weight
will be non-negative and this will also force all of them to be at most
one: 0 ≤ wi ≤ 1.
PORTFOLIO RETURN AND RISK
Let ri be the rate of return from the ith asset, and let the corresponding
mean return and variance be r̄i and σi². The final value of the ith asset
is Ai(1 + ri), and hence the final value of the portfolio is
\[ \sum_{i} A_i (1 + r_i). \]
Therefore, the rate of return of the portfolio is
\[ r = \frac{\sum_i A_i (1 + r_i) - \sum_i A_i}{\sum_i A_i} = \frac{\sum_i A_i r_i}{\sum_i A_i} = \sum_i w_i r_i. \]
Exercise 3.1.4 Verify this relationship in the scenario of Example
3.1.2.
It follows that the mean and variance for the portfolio’s rate of return
are given by (see Appendix, §B.12)
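\[ \bar{r} = \sum_{i} w_i \bar{r}_i, \tag{3.1} \]
\[ \sigma^2 = \sum_{i} w_i^2 \sigma_i^2 + \sum_{i \ne j} w_i w_j \sigma_{ij}, \tag{3.2} \]
where σij is the covariance between the ith and jth rates of return.
These formulas can also be expressed neatly using matrix algebra:
\[ \bar{r} = \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \begin{pmatrix} \bar{r}_1 \\ \vdots \\ \bar{r}_n \end{pmatrix}, \tag{3.3} \]
\[ \sigma^2 = \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}, \qquad \sigma_{ii} = \sigma_i^2. \tag{3.4} \]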
The matrix formulations are very useful when working with large
amounts of data.
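As a small illustration of the matrix form, the following Python sketch computes the mean and variance of a portfolio's return from a weight vector, a vector of mean returns and a covariance matrix. The two-asset data (standard deviations 0.2 and 0.3 with correlation 0.5) and the function names are hypothetical. Plain lists are used so that the sketch needs no libraries; with large data sets a linear algebra library would be the natural choice.

```python
def portfolio_mean(w, means):
    """Mean return: the weight vector times the vector of mean returns."""
    return sum(wi * mi for wi, mi in zip(w, means))

def portfolio_variance(w, cov):
    """Variance: the quadratic form w^T (cov) w, with cov[i][i] the variances."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

# Hypothetical data for two assets.
means = [0.10, 0.15]
cov = [[0.04, 0.03],
       [0.03, 0.09]]      # covariance 0.5 * 0.2 * 0.3 = 0.03
w = [0.6, 0.4]

mean_p = portfolio_mean(w, means)
var_p = portfolio_variance(w, cov)

# The same variance written as the two sums: squares plus cross terms.
squares = sum(w[i] ** 2 * cov[i][i] for i in range(2))
cross = sum(w[i] * w[j] * cov[i][j] for i in range(2) for j in range(2) if i != j)

print(f"mean = {mean_p:.4f}, variance = {var_p:.4f}, sd = {var_p ** 0.5:.4f}")
```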
3.2 PORTFOLIO DIAGRAMS AND EFFICIENCY
In order to represent the profit-risk relationship graphically, we plot
each asset on a portfolio diagram. This diagram represents each
asset A by the mean r̄A and standard deviation σA of its return. (See
Figure 3.1.) You should note that risk (σ) is plotted on the horizontal
axis, and expected profit (r̄) on the vertical axis. The reason is that we
are trying to understand how profit depends on risk.
Figure 3.1: Portfolio diagram for a single asset A
Exercise 3.2.1 Consider the following portfolio diagram which
depicts four assets with varying properties.
Among the assets A and B in the above diagram, which is more
attractive to you? What choices would you make in all the other
pairings?
The most common response is that A is best because it
maximises the expected return while minimising the risk. Carrying this
logic further, both B and C rank between A and D. However there is
no obvious way to choose between B and C. This choice will vary with
the investor, depending on his particular combination of greed and
fear of risk.
The biases exhibited above can be broken into two types. First, A
is preferred to C (and B to D) because it offers more mean return at
the same risk level. This bias is called non-satiation, and it is
reasonable to expect that every investor will have this bias.
Secondly, A is preferred to B (and C to D) because it offers the
same mean return but at a lower risk level. Investors who think in this
way are said to be risk-averse. As a general rule, the models used in
finance assume investors to be risk-averse. However, it is worth
noting that there are investors with other tastes. A risk-neutral
investor is one who does not care about risk: such an investor would
find A and B equally attractive. An investor could even be risk-preferring. For a risk-preferring investor, the most attractive
investment is B.
Why would an investor be attracted by risk? Well, suppose he is
desperate for high returns, but assets with sufficiently high mean
return are not available. Then his best option is to go for an asset with
a high risk level.
In Figure 3.2, A and B are two assets with the same mean return.
The vertical bars represent the possible fluctuations in the actual
return about the mean position. The higher σ for B results in a wider
bar, which in turn opens up the possibility for higher returns than are
available from A (as indicated by the dashed line).
On the whole, one may expect occasional risk-preferring
behaviour from investors with extreme needs, but the dominant mass
of investors will be risk-averse.
Figure 3.2: Some investors may like risk because it opens the
possibility of higher return.
An asset or portfolio A is said to be more efficient than another,
B, if their mean returns and standard deviations satisfy one of the
following:
1. r̄A > r̄B and σA ≤ σB
2. r̄A ≥ r̄B and σA < σB
In the situation of Exercise 3.2.1 above, A is the most efficient among
the four assets, and D the least. Risk-averse investors will be
attracted to more efficient assets.
An asset or portfolio is called efficient if no other portfolio is more
efficient than it.
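The definition translates directly into code. In the sketch below, each asset is a pair (mean return, standard deviation); the four sample assets are hypothetical positions arranged like those discussed for Exercise 3.2.1, and efficiency is judged only within the given finite list of candidates.

```python
def more_efficient(a, b):
    """True if asset a = (mean, sigma) is more efficient than asset b."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return (mean_a > mean_b and sd_a <= sd_b) or (mean_a >= mean_b and sd_a < sd_b)

def efficient_assets(assets):
    """Names of assets for which no other listed asset is more efficient."""
    return [name for name, point in assets.items()
            if not any(more_efficient(other, point)
                       for other_name, other in assets.items() if other_name != name)]

# Hypothetical positions: A and B share a mean, A and C share a risk level.
assets = {
    "A": (0.12, 0.10),
    "B": (0.12, 0.20),
    "C": (0.06, 0.10),
    "D": (0.06, 0.20),
}

print(efficient_assets(assets))
print(more_efficient(assets["B"], assets["C"]), more_efficient(assets["C"], assets["B"]))
```

Neither of B and C is more efficient than the other, which matches the remark that the choice between them depends on the investor.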
Exercise 3.2.2 Identify the efficient assets in the following diagram:
3.3 FEASIBLE SET
We now move on from considering individual assets to considering
their collections into portfolios. The overall task is as follows: given a
full list of available assets and the basic properties of their returns
(mean, variance, covariance), consider all the portfolios that can be
created from them. Locate each possible portfolio on the portfolio
diagram and identify the efficient ones.
Thus, consider a market in which assets A1,…,An are available.
Let the rate of return of Ai have mean r̄i and standard deviation σi.
Consider a portfolio P made of a combination of these assets, and let
wi be the weight of Ai in P. Then, as calculated in §3.1, we have
\[ \bar{r}_P = \sum_{i} w_i \bar{r}_i, \qquad \sigma_P^2 = \sum_{i} w_i^2 \sigma_i^2 + \sum_{i \ne j} w_i w_j \sigma_{ij}. \]
Note that r̄P and σP depend only on the proportions invested in
different assets and not on the total investment. That is, if we scale all
the investments in P by the same factor, we will not change r̄P and
σP.
The possible portfolios can be obtained by varying the weights,
and the region in the portfolio diagram obtained this way is called the
feasible set. We shall see that the shape of the feasible set depends
not only on the individual properties of the assets but on their
relations with each other as expressed by the covariances of their
returns.
Exercise 3.3.1 What will be the feasible set for one asset?
FEASIBLE SET FOR TWO ASSETS
Let us start by considering the simplest situation, that of two assets, A
and B. Let ρ be the correlation coefficient of rA and rB:
\[ \rho = \frac{\sigma_{AB}}{\sigma_A \sigma_B}. \]
Recall that –1 ≤ ρ ≤ 1. Now, in real life, perfect correlation ρ = ±1 is
highly unlikely and so we start with the assumption that –1 < ρ < 1.
(The cases of perfect correlation are taken up in subsequent
exercises.)
Consider a portfolio P consisting of A and B with respective
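weights w and 1 – w. Then the portfolio’s mean return and variance
are given by
\[ \bar{r}_P = w \bar{r}_A + (1 - w) \bar{r}_B, \qquad \sigma_P^2 = w^2 \sigma_A^2 + (1 - w)^2 \sigma_B^2 + 2 w (1 - w) \rho\, \sigma_A \sigma_B. \]
If we solve r̄P = w r̄A + (1 – w) r̄B for w and substitute in the expression
for σP², we find a relationship of the form
\[ \sigma_P^2 = a \bar{r}_P^2 + b \bar{r}_P + c, \]
where a, b and c are constants determined by the two assets (we assume r̄A ≠ r̄B).
Exercise 3.3.2 Verify the formula above. In particular, check that
\[ a = \frac{\sigma_A^2 + \sigma_B^2 - 2 \rho\, \sigma_A \sigma_B}{(\bar{r}_A - \bar{r}_B)^2} > 0. \]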
The corresponding feasible set is like a horizontal parabola, but
with a more pointed nose (Figure 3.3). In fact, it is a hyperbola. The
next exercise asks you to verify this and to establish some properties
of the hyperbola.
Figure 3.3: A plot of the feasible set for two assets A and B, with ρ =
–0.5. The solid part of the plot corresponds to both weights being
positive. In the dashed parts, one weight is negative (corresponding
to short-selling).
Exercise 3.3.3 Consider two assets A and B with ρ ≠ ±1. Let rA–B =
rA – rB be the difference of their rates of return, r̄A–B = E[rA–B], and σA–B
the standard deviation of rA–B. Show that the feasible set for these
assets is a hyperbola whose asymptotes have slopes
\[ \pm \frac{\bar{r}_{A-B}}{\sigma_{A-B}} \]
and meet on the r̄-axis.
Now we take up the cases of perfect correlation.
Exercise 3.3.4 Assume A and B have different positions on the
portfolio diagram and that their correlation coefficient is ρ = 1.
1. Show that σA ≠ σB.
2. Let a portfolio P consist of amounts of A and B. Verify that σP = 0
when the weight of A is
\[ w = \frac{\sigma_B}{\sigma_B - \sigma_A}. \]
Show that this weight satisfies either w < 0 or w > 1.
Exercise 3.3.5 Show that for ρ = 1, the feasible set for assets A and
B has the form given in Figure 3.4(a).
Exercise 3.3.6 Assume A and B have correlation coefficient ρ = –1.
Let a portfolio P consist of amounts of A and B. Verify that σP = 0
when the weight of A is
\[ w = \frac{\sigma_B}{\sigma_A + \sigma_B}. \]
Show that this weight satisfies 0 ≤ w ≤ 1.
Exercise 3.3.7 Show that for ρ = –1, the feasible set for assets A and
B has the form given in Figure 3.4(b).
Figure 3.4: The feasible set for two assets with correlation (a) ρ = 1
(b) ρ = –1. The dashed and solid segments represent combinations
with and without short selling, respectively.
Exercise 3.3.8 Let the returns for each pair among the assets
A,B and C have the same correlation ρ. What is the feasible set if (a)
ρ = 1, (b) ρ = –1?
DIVERSIFICATION
An interesting feature that is already apparent is that a combination of
two assets can have a σ that is lower than for either individual asset.
For example, in Figure 3.3 there is a combination of A and B which
has least σ. The process of reducing risk by combining investments is
known as diversification.
Exercise 3.3.9 Show that in the feasible set for two assets A and B,
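risk is minimised by the portfolio in which the weight of A is given by
\[ w = \frac{\sigma_B^2 - \rho\, \sigma_A \sigma_B}{\sigma_A^2 + \sigma_B^2 - 2 \rho\, \sigma_A \sigma_B}. \]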
Exercise 3.3.10 In the situation of the previous exercise, let σA = σB
and ρAB ≠ 1. Show that the risk then is minimised by w = 0.5.
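Diversification for two assets can be explored numerically. The sketch below traces the feasible curve for two hypothetical assets with correlation –0.5 and compares a grid search for the least risky combination with the closed-form weight (σB² – ρσAσB) / (σA² + σB² – 2ρσAσB). The data and function names are assumptions made for illustration.

```python
import math

def two_asset_point(w, mean_a, mean_b, sd_a, sd_b, rho):
    """(sigma, mean) of the portfolio with weight w in A and 1 - w in B."""
    mean = w * mean_a + (1 - w) * mean_b
    var = (w * sd_a) ** 2 + ((1 - w) * sd_b) ** 2 + 2 * w * (1 - w) * rho * sd_a * sd_b
    return math.sqrt(max(var, 0.0)), mean

def min_risk_weight(sd_a, sd_b, rho):
    """Weight of A in the least risky combination of A and B."""
    return (sd_b ** 2 - rho * sd_a * sd_b) / (sd_a ** 2 + sd_b ** 2 - 2 * rho * sd_a * sd_b)

# Hypothetical assets.
mean_a, mean_b, sd_a, sd_b, rho = 0.10, 0.20, 0.20, 0.30, -0.5

w_star = min_risk_weight(sd_a, sd_b, rho)
sigma_star, mean_star = two_asset_point(w_star, mean_a, mean_b, sd_a, sd_b, rho)

# Grid over weights from -1 to 2; values outside [0, 1] involve short selling.
grid = [k / 100 for k in range(-100, 201)]
curve = [two_asset_point(w, mean_a, mean_b, sd_a, sd_b, rho) for w in grid]
sigma_grid_min = min(sigma for sigma, _ in curve)

print(f"w* = {w_star:.4f}, sigma* = {sigma_star:.4f}, mean* = {mean_star:.4f}")
print(f"smallest sigma on the grid = {sigma_grid_min:.4f}")
```

Here the least risky combination has a smaller standard deviation than either asset alone.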
FEASIBLE SET FOR MANY ASSETS
Now we move on to the case of three assets A, B and C. Taking them
in pairs, we first generate three curves of the type we just saw (Figure
3.5). In the diagrams, we assume that no pair is perfectly correlated.
Figure 3.5: Pairwise feasible curves for three assets
To obtain general combinations of A,B and C, we first take a
combination of A, B by picking a point on the curve joining A and B,
and then generate the feasible set of it and C. The next diagram
(Figure 3.6) shows the result of carrying out this process for various
initial combinations of A,B. Initially, we also assume short selling is
not allowed and so all weights are non-negative.
Figure 3.6: Feasible curves (dashed) obtained by combining C with
points on the AB curve
If we complete this process and look at all possible combinations,
but allowing only non-negative weights, we sweep out the region
shown in Figure 3.7. This is the feasible set when short selling is not
allowed.
Figure 3.7: Feasible set for three assets with no short selling
Figure 3.8 shows what is possible when negative weights are
allowed: we get a region that is slightly broader and also
extends indefinitely to the right.
Figure 3.8: Feasible set for three assets with short selling
For the moment, we leave matters at this somewhat imprecise
stage. In the next section, we will see how to get an exact description
of the feasible set and its boundary.
The situation for many assets follows the same pattern as for
three. The assets generate a feasible set which looks like a bullet
heading to the left (see Figure 3.9). This characteristic shape is called
the Markowitz bullet. If we consider all portfolios having the same
mean return (these will be represented by a horizontal line), the one
with minimum variance is on the left edge of the feasible set. This
edge is therefore called the minimum variance curve. Note that
there is further a unique portfolio which has the minimum variance
amongst all the feasible portfolios. This is situated on the tip of the
bullet and is called the minimum variance portfolio.
On the minimum variance curve, the points above the minimum
variance portfolio represent efficient portfolios. The upper half of the
minimum variance curve is therefore called the efficient frontier.
Note that there are two minimum variance curves and two efficient
frontiers, according to whether short selling is allowed or not. We
have depicted a situation where the minimum variance portfolio is the
same for both cases, but it need not always be so (see Example
3.4.7).
Figure 3.9: The Markowitz bullet for an arbitrary collection of assets
3.4 MARKOWITZ MODEL
We will now embark on a more detailed investigation of the Markowitz
bullet. The first thing we shall do is calculate the minimum variance
portfolio for any fixed level of mean return. Similar calculations will
yield the overall minimum variance portfolio for all mean returns. This
will identify the minimum variance curve and the efficient frontier.
Next, we shall show that the minimum variance curve is in fact the
feasible set for any two points on it, provided that short selling is
unrestricted. This Two Fund Theorem will give an explicit description
of the minimum variance curve and the efficient frontier.
MINIMUM VARIANCE CURVE WITH SHORT SELLING
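Let us fix a level r̄ of the mean return, and then look for the portfolio
which has this mean return along with the least possible σ².
Mathematically, since σ² is a function of the weights wi, the task is to
minimise the function
\[ \sigma^2 = \sum_{i} w_i^2 \sigma_i^2 + \sum_{i \ne j} w_i w_j \sigma_{ij} \tag{3.5} \]
while satisfying the conditions
\[ \sum_{i} w_i \bar{r}_i = \bar{r}, \tag{3.6} \]
\[ \sum_{i} w_i = 1. \tag{3.7} \]
This problem is one of constrained optimisation and the
standard technique for this is the Lagrange multipliers method (see
Appendix, §A.3). According to this technique, since there are two
constraint equations, we should introduce two new variables, say λ and μ,
and then set up the following n equations:
\[ \frac{\partial \sigma^2}{\partial w_i} = \lambda\, \frac{\partial}{\partial w_i} \Big( \sum_{j} w_j \bar{r}_j \Big) + \mu\, \frac{\partial}{\partial w_i} \Big( \sum_{j} w_j \Big), \qquad i = 1, \dots, n. \tag{3.8} \]
To carry out the differentiation on the left-hand side, note that in
the expression ∑i≠j wi wj σij each i,j pair occurs twice: once as wi wj σij
and again as wj wi σji. Also, σij = σji. Therefore,
\[ \frac{\partial \sigma^2}{\partial w_i} = 2 w_i \sigma_i^2 + 2 \sum_{j \ne i} w_j \sigma_{ij} = 2 \sum_{j} \sigma_{ij} w_j, \qquad \sigma_{ii} = \sigma_i^2. \]
Substituting these calculations into equation (3.8), we get
\[ 2 \sum_{j} \sigma_{ij} w_j = \lambda \bar{r}_i + \mu, \qquad i = 1, \dots, n. \]
We can simplify these a little more by replacing the variables λ
and μ by 2λ and 2μ respectively. Then the last set of equations
becomes
\[ \sum_{j} \sigma_{ij} w_j - \lambda \bar{r}_i - \mu = 0, \qquad i = 1, \dots, n. \tag{3.9} \]
Combining these with the two constraint equations (3.6) and (3.7)
gives a system of n + 2 linear equations in n + 2 variables, w1,
…, wn, λ, μ. This linear system can be expressed in matrix notation as
follows:
\[ \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} & -\bar{r}_1 & -1 \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} & -\bar{r}_n & -1 \\ \bar{r}_1 & \cdots & \bar{r}_n & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \lambda \\ \mu \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \bar{r} \\ 1 \end{pmatrix}. \tag{3.10} \]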
Exercise 3.4.1 Consider three assets A,B and C with the following
properties.
r̄A = 0.2, r̄B = 0.4, r̄C = 0.6,
σA = σB = σC = 1,
σAB = σAC = σBC = 0.
Find the minimum variance portfolio amongst all their combinations
with r̄ = 0.5.
Finding the overall minimum variance portfolio, without any constraint
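on r̄, is slightly simpler. In this case, the constraint (3.6) is not present
and we need only introduce one new variable, μ. We get the following
linear system in the variables w1, …, wn, μ (again, after replacing μ by
2μ):
\[ \sum_{j} \sigma_{ij} w_j - \mu = 0 \quad (i = 1, \dots, n), \qquad \sum_{i} w_i = 1. \]
This system has the following form in matrix notation:
\[ \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} & -1 \\ \vdots & \ddots & \vdots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} & -1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \mu \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \tag{3.11} \]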
Exercise 3.4.2 Consider the assets of Exercise 3.4.1. Find their
combination with the minimum variance.
Exercise 3.4.3 Consider a universe of n assets with notation as
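above. Define:
$$\Sigma = (\sigma_{ij})_{n \times n} \ \text{(with } \sigma_{ii} = \sigma_i^2\text{)}, \qquad u = (1, 1, \dots, 1)^T.$$
Show that the weights w = (w1,w2,…,wn)T of the minimum variance
portfolio are given by
$$w = \frac{\Sigma^{-1} u}{u^T \Sigma^{-1} u}.$$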
Once the minimum variance portfolio is known, we can identify the
efficient frontier. It consists of all the points on the minimum variance
curve whose mean return is more than that for the minimum variance
portfolio.
MINIMUM VARIANCE CURVE WITHOUT SHORT SELLING
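When short selling is not allowed, the weights have to be non-negative. The problem of finding the minimum variance portfolio for a
level r of the mean return becomes:
$$\text{minimise} \quad \sigma^2 = \sum_{i} w_i^2 \sigma_i^2 + \sum_{i \neq j} w_i w_j \sigma_{ij}$$
subject to the constraints,
$$\sum_i w_i r_i = r, \qquad \sum_i w_i = 1, \qquad w_i \geq 0 \quad (i = 1,\dots,n).$$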
The constraints wi ≥ 0, being inequalities, complicate matters. The
Lagrange multipliers method no longer applies. Instead, one has to
resort to the methods of quadratic programming. The mathematics
involved, while not very abstract, is yet complicated enough for us to
avoid it in this text. Therefore, in the rest of this chapter, we will only
consider the situation where short selling is unrestricted.
Example 3.4.4 In this example, we shall give a flavour of the results
of Markowitz in the absence of short-selling. In fact, we will use one of
his examples. We assume the availability of three assets A,B and C
with the following mean returns:
rA = 0.062, rB = 0.146, rC = 0.128.
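Their covariance matrix is taken to be:
$$\begin{pmatrix} 0.0146 & 0.0187 & 0.0145 \\ 0.0187 & 0.0854 & 0.0104 \\ 0.0145 & 0.0104 & 0.0289 \end{pmatrix}$$
Thus, the variance of A is σA² = 0.0146, the covariance of A and B is
σAB = 0.0187, and so on.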
Figure 3.10: The feasible set and efficient frontier for Example 3.4.4
The first diagram in Figure 3.10 shows the feasible set for these
assets without short selling. It has been drawn by taking about 500
random combinations of the assets with positive weights. The assets
themselves are marked by the solid squares. This already shows that
the efficient frontier may not be completely smooth. There is at least
one point where it turns sharply.
In fact, Markowitz showed that the efficient frontier has the
following form. First, there are some key points on it, which we will
call turning points. The efficient frontier is obtained by connecting
adjoining turning points via their feasible curves. Thus, the frontier
consists of pieces of hyperbolas, glued together at the turning points.
The second diagram in Figure 3.10 shows the turning points (the
stars) and efficient frontier for this example. In Example 3.4.7, below,
we carry out the calculations whose results are shown in the diagram.
□
TWO FUND THEOREM
We return to the situation of a market based on n assets, with no
restrictions on short selling. By our earlier calculations (equation
(3.10)),we know that a portfolio has the minimum variance for a fixed
mean return r if its weights (wi) satisfy the system
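$$SW = R \qquad (3.12)$$
where
$$S = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1n} & -r_1 & -1 \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 & -r_n & -1 \\ r_1 & \cdots & r_n & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{pmatrix}, \quad W = \begin{pmatrix} w_1 \\ \vdots \\ w_n \\ \lambda \\ \mu \end{pmatrix}, \quad R = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ r \\ 1 \end{pmatrix}$$
for some choice of λ and μ.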
Now, let A and B be two portfolios on the minimum variance curve.
We set up notation as follows:
1. The weight of the ith asset in A is wiA.
2. The values of λ and μ for A are λA and μA.
3. The mean and variance of the rate of return for A are denoted by
rA and σA².
4. The quantities connected to B are denoted similarly.
Let P be a combination of A and B. Let wA and wB be the weights of A
and B in P. We first note that the weight of the ith asset in P is given
by
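$$w_i^P = w_i^A w_A + w_i^B w_B.$$
Further, we define λP = wAλA + wBλB and μP = wAμA + wBμB. We now
calculate, for each i (writing σii for σi²):
$$\sum_j \sigma_{ij} w_j^P = w_A \sum_j \sigma_{ij} w_j^A + w_B \sum_j \sigma_{ij} w_j^B = w_A(\lambda_A r_i + \mu_A) + w_B(\lambda_B r_i + \mu_B) = \lambda_P r_i + \mu_P,$$
$$\sum_i w_i^P r_i = w_A r_A + w_B r_B, \qquad \sum_i w_i^P = w_A + w_B = 1.$$
Therefore, the portfolio P satisfies the system (3.12) and is on the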
minimum variance curve: it is the minimum variance portfolio for the
mean return rP= wArA + wBrB.
The situation is summed up by:
Theorem 3.4.5 (Two Fund Theorem) Suppose short-selling is not
restricted. Fix two portfolios A and B on the minimum variance curve.
Then, any combination of A and B is also on the minimum variance
curve. Conversely, every point on the minimum variance curve can be
represented by a combination of A and B.
□
In particular, the minimum variance curve is a hyperbola when short
selling is unrestricted, since it is the feasible set for any two portfolios
lying on it.
The Two Fund Theorem has a simple consequence:
Theorem 3.4.6 Suppose short-selling is not restricted. Then any
combination of efficient portfolios, with all weights non-negative, is
also efficient.
□
The Two Fund Theorem is very useful for an investor who does not
have the resources or inclination to do a full analysis of the available
assets. Suppose such an investor identifies two portfolios A and B
that he has reason to think are efficient (for example, he may
consider two well-managed mutual funds). Now, to create an efficient
portfolio with a desired mean return r, he only has to combine A,B
appropriately. Their weights wA and wB have to be chosen such that r
= wArA + wBrB. This is the only calculation he has to make!
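The investor's calculation can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the four-asset means and covariances are hypothetical values made up for the illustration, and `frontier_weights` is a hypothetical helper that solves the linear system for a minimum variance portfolio, written with the sign convention Σw − λr − μ1 = 0. Two portfolios on the minimum variance curve are mixed so that the mean return hits a target, and the mixture is compared with a direct solve.

```python
import numpy as np

def frontier_weights(cov, means, target):
    """Minimum variance weights for mean return `target` (short selling allowed)."""
    n = len(means)
    system = np.zeros((n + 2, n + 2))
    system[:n, :n] = cov          # covariance block
    system[:n, n] = -means        # column multiplying lambda
    system[:n, n + 1] = -1.0      # column multiplying mu
    system[n, :n] = means         # mean return constraint
    system[n + 1, :n] = 1.0       # weights sum to one
    rhs = np.zeros(n + 2)
    rhs[n], rhs[n + 1] = target, 1.0
    return np.linalg.solve(system, rhs)[:n]

# HYPOTHETICAL data, not taken from the text.
means = np.array([0.05, 0.08, 0.11, 0.14])
cov = np.array([
    [0.04, 0.01, 0.00, 0.00],
    [0.01, 0.09, 0.02, 0.00],
    [0.00, 0.02, 0.16, 0.03],
    [0.00, 0.00, 0.03, 0.25],
])

# Two "funds" on the minimum variance curve.
fund_a = frontier_weights(cov, means, 0.06)
fund_b = frontier_weights(cov, means, 0.12)
r_a, r_b = fund_a @ means, fund_b @ means

# Choose w_a, w_b with w_a + w_b = 1 and w_a r_a + w_b r_b = target.
target = 0.10
w_a = (target - r_b) / (r_a - r_b)
w_b = 1.0 - w_a
combined = w_a * fund_a + w_b * fund_b

# Direct solve at the same target, for comparison.
direct = frontier_weights(cov, means, target)
print(combined)
print(direct)
```

The two printed weight vectors should agree up to rounding, which is the content of the Two Fund Theorem for this data.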
Example 3.4.7 We will now use the Two Fund Theorem to derive the
results of Example 3.4.4 concerning the efficient frontier in the
absence of short-selling. Since the example has three assets, all
portfolios can be described using 2 weights. Let wA be the weight of
asset A and wB the weight of asset B. Then the weight of asset C is 1
– wA – wB. Each portfolio can therefore be represented as a point on
a plane, with coordinates wA and wB . The variance and mean return
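of a portfolio are functions of wA and wB. Writing wC = 1 – wA – wB,
$$\sigma^2 = w_A^2\sigma_A^2 + w_B^2\sigma_B^2 + w_C^2\sigma_C^2 + 2 w_A w_B \sigma_{AB} + 2 w_A w_C \sigma_{AC} + 2 w_B w_C \sigma_{BC},$$
$$r = w_A r_A + w_B r_B + w_C r_C = 0.128 - 0.066\, w_A + 0.018\, w_B.$$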
The portfolios with the same variance therefore lie on an ellipse, while
those with the same mean return lie on a line. This gives us a family
of concentric ellipses and another family of parallel lines, as depicted
in Figure 3.11. Figure 3.11 shows that σ² decreases as we move
towards the inner ellipses. Therefore, minimum variance portfolios (at
given levels of r) are located at the points where a constant r line
tangentially touches a constant σ² ellipse.
Figure 3.11: (Example 3.4.7) The parallel lines have constant r. The
concentric ellipses have constant σ², changing from 0.03 to 0.0178 to
0.015 as we move from the outermost to the innermost of the drawn
ellipses. The square box marks the overall minimum variance
portfolio. The thick slanting line is the minimum variance curve: each
member has minimum σ² for its level of r.
We solve equation (3.11) to obtain the overall minimum variance
portfolio. It has weights wA = 1.1023 and wB = –0.0697. (Note that it
requires short selling.) Its mean return and variance are r = 0.054 and
σ² = 0.0143.
Minimum variance portfolios at other levels of r can be obtained by
solving equation (3.10). The one at r = 0.01 has wA = 1.7236 and wB
= –0.2359. It also has σ² = 0.0178.
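These two solves can be checked numerically. The sketch below is an illustration only, assuming NumPy is available. The helper names `global_min_variance` and `min_variance_for_mean` are hypothetical, and the systems are written with the sign convention Σw − λr − μ1 = 0. The covariance entries are an assumption: they are the values usually quoted for Markowitz's three-security example, and they agree with the two entries stated in Example 3.4.4 (σA² = 0.0146 and σAB = 0.0187).

```python
import numpy as np

# Mean returns of A, B, C from Example 3.4.4.
r = np.array([0.062, 0.146, 0.128])

# ASSUMED covariance matrix (see the note above).
cov = np.array([
    [0.0146, 0.0187, 0.0145],
    [0.0187, 0.0854, 0.0104],
    [0.0145, 0.0104, 0.0289],
])

def global_min_variance(cov):
    """Overall minimum variance weights: the (n+1) x (n+1) system in w and mu."""
    n = cov.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = cov
    system[:n, n] = -1.0      # column multiplying mu
    system[n, :n] = 1.0       # weights sum to one
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    return np.linalg.solve(system, rhs)[:n]

def min_variance_for_mean(cov, r, target):
    """Minimum variance weights at mean return `target`: the (n+2) x (n+2) system."""
    n = cov.shape[0]
    system = np.zeros((n + 2, n + 2))
    system[:n, :n] = cov
    system[:n, n] = -r        # column multiplying lambda
    system[:n, n + 1] = -1.0  # column multiplying mu
    system[n, :n] = r         # mean return constraint
    system[n + 1, :n] = 1.0   # weights sum to one
    rhs = np.zeros(n + 2)
    rhs[n], rhs[n + 1] = target, 1.0
    return np.linalg.solve(system, rhs)[:n]

w_min = global_min_variance(cov)
w_low = min_variance_for_mean(cov, r, 0.01)

for label, w in (("overall minimum", w_min), ("r = 0.01", w_low)):
    print(label, "weights:", np.round(w, 4),
          "mean:", round(float(w @ r), 4),
          "variance:", round(float(w @ cov @ w), 4))
```

Under the assumed covariances, the computed weights, means and variances match the figures quoted in this example to about three decimal places.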
Figure 3.12: (Example 3.4.7) In the absence of short selling, the
minimum variance curve is AJKB.
The Two Fund Theorem informs us that the minimum variance
portfolios lie on the straight line through the two portfolios we have
just determined. In Figure 3.11, this is the thick slanting line. Its
equation is
$$\frac{w_B + 0.0697}{w_A - 1.1023} = \frac{-0.2359 + 0.0697}{1.7236 - 1.1023}$$
or wB = –0.2675 wA + 0.2252.
When short selling is not allowed, we are confined to the triangle
lying between the wA and wB axes and the line wA + wB = 1
(triangle ABC in Figure 3.12). The minimum variance line intersects
the edges of this triangle at two points J and K, given by
J = (0.8419,0) and K = (0,0.2252).
Inside triangle ABC, as we walk down any r = constant line, we will
achieve minimum σ at one of the edges AJ, JK or KB. Therefore, the
minimum variance curve is now the broken line AJKB. Each of these
line segments corresponds to a piece of a hyperbola in the σ – r
portfolio diagram.
□
Exercise 3.4.8 Consider a portfolio constituted of the three assets A,
B and C in the preceding example. Regulations restrict the manager
of the portfolio to keeping each asset’s share of the portfolio to at
least 20%. What will be the efficient frontier under these conditions?
Exercise 3.4.9 Note that a line cuts an ellipse in at most two points.
This implies that for each feasible σ–r combination, in a universe of
three fundamental assets, there will be one or two different portfolios
with that pairing of mean return and variance. In fact, there will be one
portfolio if the combination is on the minimum-variance curve, and two
portfolios otherwise. What will happen in a universe of four or more
fundamental assets?
ONE FUND THEOREM
So far, we have considered situations where every asset is risky. Let
us now consider situations where there are also risk-free assets.
Such assets have σ = 0 and the same rate of return rf by the No
Arbitrage Principle.
The feasible set arising out of a risky asset A = (σA,rA) and a risk-free asset (0, rf) is easily calculated. If their respective weights in a
portfolio P are w and 1 – w, then
$$r_P = w r_A + (1 - w) r_f \quad\Longrightarrow\quad w = \frac{r_P - r_f}{r_A - r_f}$$
and
$$\sigma_P^2 = w^2 \sigma_A^2 \quad\Longrightarrow\quad \sigma_P = |w| \sigma_A = \frac{\sigma_A}{|r_A - r_f|}\, |r_P - r_f|.$$
The situation is depicted in Figure 3.13.
Figure 3.13: Feasible set for a risky asset A and a risk-free asset.
The upper and lower dashed lines represent portfolios where the risk-free asset and A have been short sold, respectively.
From this observation, it is an easy step to obtaining the feasible set
for n risky assets together with a risk-free asset. It is clear that the
new feasible set will consist of the region between two straight lines
passing through (0,rf), and that these straight lines will be symmetric
about the horizontal line at height rf. It is also clear that the edge of
the new feasible set will tangentially touch the old feasible set at
either one or two points, depending on the position of the minimum
variance portfolio (with standard deviation σV and mean return rV )
arising purely from the risky assets:
1. If rV = rf, there will be two points of tangency.
2. If rV < rf, there will be one point of tangency, and it will be on the
lower edge.
3. If rV > rf, there will be one point of tangency, and it will be on the
upper edge. (This is the situation illustrated in Figure 3.14.)
Figure 3.14: Feasible set when there is a risk-free asset and short
selling is allowed: the situation of the One Fund Theorem.
The last case is the most important one. This is because, in line
with the general expectation that riskier assets offer higher returns,
we expect rV > rf to be the typical case. Our discussion is summed up
by:
Theorem 3.4.10 (One Fund Theorem) Suppose that a risk-free
asset is available with rV > rf. Then there is a unique portfolio M of
risky assets such that the efficient frontier consists of the ray starting
at the risk-free asset and passing through M.
□
It is a fact that the One Fund Theorem is true even in the absence of
short selling. We shall not formally prove this, but it is evident from the
picture.
The One Fund Theorem is also called the Separation Theorem and
was discovered by James Tobin [50] in 1958. Tobin won the Nobel
Prize in 1981.
The ray giving the efficient frontier is called the capital market line. If
the point M = (σM,rM) is known, then the equation of this line is
$$r = r_f + \frac{r_M - r_f}{\sigma_M}\, \sigma.$$
This describes the profit-risk relationship for efficient portfolios.
Two problems remain: finding M, and establishing the profit–risk
relationship for all portfolios.
The portfolio M can be found by calculus. For any portfolio P
composed purely of risky assets with weights (wi), the slope of the
line joining it to the risk-free asset is
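$$m = \frac{r_P - r_f}{\sigma_P} = \frac{\sum_i w_i r_i - r_f}{\big(\sum_{i,j} w_i w_j \sigma_{ij}\big)^{1/2}} = \frac{\sum_i w_i (r_i - r_f)}{\big(\sum_{i,j} w_i w_j \sigma_{ij}\big)^{1/2}}.$$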
Note that if we scale all the weights by the same positive constant, m
does not change. So the constraint ∑ iwi = 1 can be ignored, and we
can let the weights wi vary freely. We now have a problem of
unconstrained optimisation and we solve it by setting
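$$\frac{\partial m}{\partial w_k} = 0, \quad k = 1,\dots,n.$$
We differentiate m using the quotient rule and find
$$r_k - r_f = c \sum_j \sigma_{kj} w_j = \sum_j \sigma_{kj} v_j, \quad\text{where } c = \frac{\sum_i w_i (r_i - r_f)}{\sum_{i,j} w_i w_j \sigma_{ij}} \text{ and } v_j = c\, w_j.$$
We first solve the linear system rk – rf = ∑jσkjvj for the vj’s, and then
scale them to obtain the weights (wj) of M:
$$w_j = \frac{v_j}{\sum_i v_i}.$$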
Although we have obtained an equation for the “one fund” M, its
practical solution may be difficult. First, in any reasonable market, n
will be large and the computations will be time-consuming as well as
unreliable due to the accumulation of roundoff errors over the many
individual calculations (solving a system of size 1000 × 1000 requires
roughly 10⁸ multiplications and divisions). The final answer is also
likely to be highly sensitive to even small fluctuations in the estimates
of r and σ.
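The two-step recipe for M (solve a linear system, then rescale) can be sketched as follows. This is a minimal illustration assuming NumPy is available; the means, covariances and risk-free rate are hypothetical values, and `tangency_portfolio` and `slope` are hypothetical helper names.

```python
import numpy as np

def tangency_portfolio(cov, means, risk_free):
    """Solve sum_j cov[k, j] v_j = r_k - r_f, then scale v so the weights sum to 1."""
    v = np.linalg.solve(cov, means - risk_free)
    return v / v.sum()

def slope(weights, cov, means, risk_free):
    """Slope m of the line joining the risk-free asset to the portfolio.

    Written in the scale-invariant form sum w_i (r_i - r_f) / sigma, which
    equals (r_P - r_f) / sigma_P when the weights sum to 1.
    """
    return (weights @ (means - risk_free)) / np.sqrt(weights @ cov @ weights)

# HYPOTHETICAL data, not taken from the text.
means = np.array([0.05, 0.08, 0.11, 0.14])
cov = np.array([
    [0.04, 0.01, 0.00, 0.00],
    [0.01, 0.09, 0.02, 0.00],
    [0.00, 0.02, 0.16, 0.03],
    [0.00, 0.00, 0.03, 0.25],
])
risk_free = 0.03

market_weights = tangency_portfolio(cov, means, risk_free)
best_slope = slope(market_weights, cov, means, risk_free)
print("weights of M:", np.round(market_weights, 4))
print("slope of the capital market line:", round(float(best_slope), 4))
```

For a problem of this size the solve is harmless. The warning in the text applies to large n, and to covariance estimates that are nearly singular.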
3.5 CAPITAL ASSET PRICING MODEL
The high point of the Markowitz model is the One Fund Theorem.
However, we noted that while the model gives us a linear system that
we can solve to get the “one fund” M, the actual numerical solution
may be unreliable. Moreover, since M is special, there ought to be
some theoretical insight into its constitution. This is provided by the
Capital Asset Pricing Model (CAPM). CAPM also completes the
description of the risk–profit relationship, extending it from efficient
portfolios to all portfolios. It exists in many versions, with varying
assumptions, and what we shall describe here is just the simplest of
these.
CAPM was developed during the 1960s by various scholars. Chief
amongst these were W F Sharpe (he won the Nobel Prize in
Economics in 1990), J V Lintner, J Mossin and J L Treynor [27, 34,
44, 22].
MARKET PORTFOLIO
CAPM is based on the following assumptions about market conditions
and the behaviour of investors:
1. Investors make their decisions only on the basis of the means
and variances of the portfolio returns.
2. All investors plan for the same time horizon T and make the
same estimates of the means, variances and covariances.
3. Each investor creates an efficient portfolio.
4. The same risk-free rate is applied to lending and borrowing and
is available to all investors and for all amounts.
The net result is that each investor will calculate the same Capital
Market Line, and will then choose a point on it, according to his level
of affinity for risk. (We are using the fact, noted at the end of the
previous section, that the One-Fund Theorem is true even in the
absence of short selling of risky assets.) Each point on the Capital
Market Line consists of a proportion of M and the risk-free asset, so
the total investment (obtained by summing over all investors) is also a
combination of the risk-free asset and M. Hence, the total investment
in risky assets is just a multiple of M and can be identified with it
(since scaling the size of an investment without changing its weights
leads to the same rate of return). Consequently, we call M the market
portfolio.
It is impossible to completely describe the market portfolio. It is
difficult to even list the risky assets fully, let alone calculate the
amounts invested in them. Instead, one usually settles on using a
comprehensive stock index as an approximation to the market
portfolio.
MARKET BETA
It is natural to explore the relationship between any one asset or
portfolio and the full market portfolio M. For example, we would like to
know how fluctuations in the market would correlate with those of the
asset. If the market return rises by 10%, can the asset return also be
expected to rise—and by how much? As a first step, we obtain a
linear approximation to the relationship by the method of Ordinary
Least Squares or OLS (see Appendix, §B.14).
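According to this, the rate of return r of the asset is approximated
by
$$\alpha + \beta r_M,$$
where rM is the market return, with
$$\beta = \frac{\mathrm{Cov}(r, r_M)}{\sigma_M^2} \quad\text{and}\quad \alpha = \bar r - \beta\, \bar r_M.$$
Here σM² is the variance of the market return and the bars denote mean values.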
The coefficient β is called the market beta or just beta of the asset.
Table 3.1: Some betas calculated with respect to the NIFTY stock
index over the years 2004–2005 and 2006–2007. As this table
illustrates, betas are usually positive and mostly vary between 0.5 and
2. (Source: National Stock Exchange of India.)
The beta of an asset gives a quick idea of how it is related to the
market. If the beta is positive, uptrends in the market will generally
correspond to uptrends in the asset. If the beta is negative, uptrends
in the market will correspond to downtrends in the asset. In addition to
its sign, the magnitude of the beta also carries information. The larger
the magnitude of beta, the greater the fluctuations in the asset. For
example, suppose β = –2. Then the fluctuations in the asset return
will generally be about twice those in the market return, and in the
opposite direction. If β = 0, the asset and market returns are
uncorrelated.
A useful property of β is linearity. Suppose a portfolio P is
composed of two assets A and B with weights wA,wB. Let rA denote
the rate of return of A, βA denote the beta of A, and so on. Then the
beta of P is given by
$$\beta_P = w_A \beta_A + w_B \beta_B.$$
Figure 3.15: A plot of monthly returns on ICICI Bank shares (r)
versus the returns on the S&P CNX 500 Index (which represents the
market portfolio) rm during the years 2003–2007. There is a clear
linear component to the relationship. In §3.8 we explain how to fit a
line to such data.
More generally, if the assets in a portfolio P have weights w1,…,wn
and betas β1,…,βn, then the portfolio beta is given by
$$\beta_P = \sum_{i=1}^{n} w_i \beta_i.$$
Thus the beta of a portfolio can be determined from those of its
constituents.
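The linearity of beta is easy to confirm on data. The sketch below uses only the Python standard library; the return series and weights are hypothetical numbers made up for the illustration, and beta is computed as sample covariance with the market divided by the sample variance of the market.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta(asset, market):
    """Beta of an asset relative to the market: Cov(r, r_M) / Var(r_M)."""
    return covariance(asset, market) / covariance(market, market)

# HYPOTHETICAL per-period returns.
market = [0.020, -0.010, 0.030, 0.015, -0.020, 0.010]
asset_a = [0.030, -0.020, 0.050, 0.010, -0.030, 0.020]
asset_b = [0.010, 0.000, 0.015, 0.012, -0.005, 0.004]

w_a, w_b = 0.6, 0.4
portfolio = [w_a * a + w_b * b for a, b in zip(asset_a, asset_b)]

beta_direct = beta(portfolio, market)                               # from portfolio returns
beta_linear = w_a * beta(asset_a, market) + w_b * beta(asset_b, market)
print(beta_direct, beta_linear)
```

The two printed values agree up to rounding, because covariance is linear in its first argument.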
CAPM FORMULA
Having identified the “One Fund” with the market portfolio, let us now
complete the profit–risk description. CAPM achieves this by
considering the relationship of any portfolio with the market portfolio.
For a given portfolio A, consider the set consisting of all combinations
of A and the market portfolio M. This set forms a curve in the feasible
set, as shown in Figure 3.16. The Capital Market Line meets this
curve at M and, as it cannot cross it, is tangential to it. Let us consider
the implications of this geometric insight.
Let P be a portfolio in which A has weight t and M has weight 1 – t.
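Then its mean return and variance are given by
$$r_P = t\, r_A + (1 - t)\, r_M, \qquad \sigma_P^2 = t^2 \sigma_A^2 + 2t(1 - t)\, \sigma_{AM} + (1 - t)^2 \sigma_M^2.$$
Figure 3.16: Illustration for the proof of the CAPM formula
In order to calculate slopes at M, which corresponds to t = 0, we
differentiate the above expressions at t = 0 to get:
$$\frac{dr_P}{dt}\Big|_{t=0} = r_A - r_M, \qquad \frac{d\sigma_P}{dt}\Big|_{t=0} = \frac{\sigma_{AM} - \sigma_M^2}{\sigma_M}.$$
The slope of the curve at M is therefore given by
$$\frac{dr_P}{d\sigma_P}\Big|_{t=0} = \frac{(r_A - r_M)\, \sigma_M}{\sigma_{AM} - \sigma_M^2}.$$
By the tangency condition, this equals the slope of the Capital Market
Line:
$$\frac{(r_A - r_M)\, \sigma_M}{\sigma_{AM} - \sigma_M^2} = \frac{r_M - r_f}{\sigma_M}.$$
We do a final rearrangement:
$$r_A = r_f + \beta (r_M - r_f), \quad\text{where } \beta = \frac{\sigma_{AM}}{\sigma_M^2} = \rho_{AM} \frac{\sigma_A}{\sigma_M}.$$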
This is the basic conclusion of CAPM. The coefficient β (beta)
measures risk relative to the market (note that this is the same β as
obtained earlier via OLS). It gives the change in rA corresponding to a
unit change in rM. If we plot r against β, CAPM predicts the result will
be the straight line plotted in Figure 3.17. This line is called the
Security Market line.
CAPM suggests that the risk-profit relationship is more clearly
understood if we do not look at an asset in isolation. Instead, we
should consider it relative to the overall market. Thus β, rather than σ,
is the best fundamental variable. However, since every portfolio lies
on the Security Market Line, CAPM does not help us choose between
portfolios. The overall picture is that CAPM helps us price portfolios,
while the Markowitz Model helps us choose between them.
An incidental gain of shifting to β is that risk becomes linear.
Figure 3.17 According to CAPM, when r is plotted against β, all
assets lie on the Security Market Line. Two special points on this line
are the ones corresponding to risk-free investment (β = 0 and r = rf)
and the market portfolio M (β = 1 and r= rM).
3.6 DIVERSIFICATION
The Markowitz model has shown the benefits of diversification, i.e.,
investing in a variety of assets. By doing so, we can reduce risk while
keeping the expected return at a satisfactory level. Now we shall
apply CAPM to this idea. Inspired by the CAPM relationship for mean
returns, we consider the original returns themselves. Define a new
random variable ε by
rA = rf + β(rM – rf) + ε,
or
ε = rA – rf – β(rM – rf).
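Taking variance on both sides, and using σAM = βσM², we find:
$$\sigma_\varepsilon^2 = \sigma_A^2 + \beta^2 \sigma_M^2 - 2\beta \sigma_{AM} = \sigma_A^2 - \beta^2 \sigma_M^2.$$
This is again rearranged as
σA² = β²σM² + σε².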
Note that if A = M then ε = 0 and the second term vanishes.
Therefore, the second term is seen as representing diversifiable
risk: risk which can be avoided by diversifying. The first term cannot
be avoided in this manner and is therefore called undiversifiable or
systemic risk: it represents the risk due to general trends in the
market.
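The split of total risk into the two terms can be verified on data. The sketch below uses only the Python standard library; the return series and the risk-free rate are hypothetical numbers made up for the illustration. With β computed as sample covariance over sample variance, the residual ε is uncorrelated with the market in the sample, so the decomposition holds exactly for the sample quantities.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

# HYPOTHETICAL per-period returns and risk-free rate.
risk_free = 0.004
market = [0.021, -0.013, 0.034, 0.008, -0.027, 0.015, 0.002, -0.006]
asset = [0.035, -0.010, 0.041, -0.004, -0.045, 0.030, 0.012, -0.015]

beta = covariance(asset, market) / covariance(market, market)

# epsilon = r_A - r_f - beta (r_M - r_f), period by period.
epsilon = [ra - risk_free - beta * (rm - risk_free) for ra, rm in zip(asset, market)]

total_risk = covariance(asset, asset)                      # sigma_A^2
systemic_risk = beta ** 2 * covariance(market, market)     # beta^2 sigma_M^2
diversifiable_risk = covariance(epsilon, epsilon)          # sigma_eps^2

print(total_risk, systemic_risk + diversifiable_risk)
```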
CAPM asserts that not all risk is relevant to pricing. The part called
diversifiable does not contribute to pricing (since only β shows up in
the CAPM formula). The reason is that this risk can be tempered by
combining the asset with others—hence there is no need to demand
or give compensation for it. The undiversifiable risk, however, is
unavoidable and this is what governs pricing. This analysis of risk is
one of the striking achievements of CAPM.
Exercise 3.6.1 According to CAPM, diversifiable risk does not
contribute to pricing. Does this mean it has no relevance to
investment decisions?
3.7 CAPM AS A PRICING FORMULA
Consider an asset with initial value V0 and expected value V̄ at some
future time. Then, according to CAPM,
$$\frac{\bar V - V_0}{V_0} = r_f + \beta (r_M - r_f).$$
This can be solved for V0:
$$V_0 = \frac{\bar V}{1 + r_f + \beta (r_M - r_f)}.$$
Thus CAPM provides a way to price assets based on their future
payoff and associated systemic risk. Note how this formula
generalises that for the present value of a known cash flow, where we
divide the future value by 1 + rf .
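As a small worked illustration of this pricing rule, the sketch below discounts an expected future value at the CAPM rate rf + β(rM − rf). It uses only the Python standard library; the function names and all numerical inputs are hypothetical.

```python
def capm_price(expected_value, beta, risk_free, market_return):
    """Present value of an expected payoff, discounted at r_f + beta (r_M - r_f)."""
    return expected_value / (1.0 + risk_free + beta * (market_return - risk_free))

def capm_npv(price, expected_value, beta, risk_free, market_return):
    """Net present value of buying the asset now at `price`."""
    return -price + capm_price(expected_value, beta, risk_free, market_return)

# HYPOTHETICAL inputs: expected payoff 1110, beta 1.2, r_f = 5%, r_M = 10%.
value_now = capm_price(1110.0, 1.2, 0.05, 0.10)   # discount factor 1 + 0.05 + 1.2 * 0.05 = 1.11
npv = capm_npv(950.0, 1110.0, 1.2, 0.05, 0.10)

# With beta = 0 the rule reduces to ordinary discounting at the risk-free rate.
riskless_value = capm_price(1110.0, 0.0, 0.05, 0.10)
print(value_now, npv, riskless_value)
```

A higher beta lowers the price paid today for the same expected payoff, since the payoff is discounted at a higher rate.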
If an asset with (unknown) future value V, whose expected value is V̄,
is being sold now at a price P, the above formula suggests we define
its net present value by
$$\mathrm{NPV} = -P + \frac{\bar V}{1 + r_f + \beta (r_M - r_f)}.$$
Exercise 3.7.1 Consider an asset with present value V0 and
unknown future value V, with expected value V̄. Derive its certainty
equivalent pricing formula from CAPM:
$$V_0 = \frac{\bar V - \mathrm{Cov}(V, r_M)\,(r_M - r_f)/\sigma_M^2}{1 + r_f}.$$
Exercise 3.7.2 Consider assets with current values V1 and V2. Show
the value of their combination is V1 + V2 by using:
1. No Arbitrage Principle
2. CAPM
Exercise 3.7.3 Suppose that a model visualises 3 states for a market
one year from now: boom, stagnation, and recession. The
corresponding probabilities, market returns, and value of an asset are
tabulated below. What should be the current price of the asset? (The
risk-free rate is 5%.)
3.8 NUMERICAL TECHNIQUES
The models we have been considering involve certain properties of
assets that cannot be directly observed. Specifically, they involve prior
knowledge of the distribution of asset prices in the future. Clearly, we
cannot exactly know the expectation and variance of the future return
from an asset. We can only make estimates, based on how the asset
prices have varied till date as well as our analysis of what is likely to
occur in the future. We shall now describe how historical data is used
to make these estimates.
We first take up the Markowitz Model.
MARKOWITZ MODEL
Here, the goal is to describe the feasible set and we need to know r
and σ for each relevant asset. One way to estimate them is to look at
data from the past. For example, we consider evenly spaced data
over a time period and use it to form a sequence of rates of return:
r1,r2,…,rN. This can be viewed as a random sample for the mean rate
of return r. Hence we can use sample mean and sample variance as
estimates for r and σ² (see §B.15):
$$\hat r = \frac{1}{N} \sum_{k=1}^{N} r_k, \qquad \hat\sigma^2 = \frac{1}{N-1} \sum_{k=1}^{N} (r_k - \hat r)^2.$$
Generating the full feasible set requires more work. The first step
is to find the covariances for each pair of assets. For this, we use the
following estimator (see §B.15):
$$\hat\sigma_{ij} = \frac{1}{N-1} \sum_{k=1}^{N} (r_{i,k} - R_i)(r_{j,k} - R_j),$$
where ri,k is the k th value of the observed rate of return of the i th
stock, and Ri is the mean of the ri,k (1 ≤ k ≤ N).
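These estimators take only a few lines to compute. The sketch below assumes NumPy is available and uses a small table of hypothetical prices (made up for the illustration). It forms the rates of return, then computes the sample means and the covariance matrix with the N − 1 divisor, once from the formula and once with `numpy.cov`.

```python
import numpy as np

# HYPOTHETICAL weekly closing prices: rows are weeks, columns are three stocks.
prices = np.array([
    [100.0, 50.0, 20.0],
    [102.0, 49.5, 20.4],
    [101.0, 51.0, 20.1],
    [104.0, 50.0, 20.9],
    [103.0, 52.0, 21.0],
    [106.0, 51.5, 20.8],
])

# Rates of return r_{i,k} over consecutive weeks (one row per period k).
returns = prices[1:] / prices[:-1] - 1.0
N = returns.shape[0]

# Sample means R_i, one per stock.
mean_returns = returns.mean(axis=0)

# Covariance estimator written out from the formula.
deviations = returns - mean_returns
cov_manual = deviations.T @ deviations / (N - 1)

# The same estimate from NumPy (columns are variables, default divisor N - 1).
cov_numpy = np.cov(returns, rowvar=False)

print(np.round(mean_returns, 5))
print(np.round(cov_manual, 6))
```

The diagonal of the covariance matrix holds the sample variances, so a single matrix supplies every σ² and σij needed for the feasible set.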
Example 3.8.1 Figure 1.2 was generated by applying this technique
to the weekly returns of the 65 stocks constituting the Dow Jones
Composite Index over a one-year period in 2005–06. (The data was
downloaded from Yahoo!Finance.)
The 65 stocks form 65 × 64/2 = 2080 pairs, and we have to
calculate 2080 different covariances! However, if we collect the data
on returns into a single spreadsheet or file, the matrix of covariances
can be calculated by programs such as Excel and Mathematica. Once
this is available, the program can also be used to solve the linear
system for the points on the minimum variance curve.
Figure 3.18: Efficient portfolios for the data of Figure 1.2. The square
box marks the Dow Jones Composite Index while the diamonds
represent its constituent stocks. The filled stars show the efficient
frontier when short selling is allowed, while the unfilled stars show the
efficient frontier when it is not. (See Example 3.8.1.)
While we have not explained the quadratic programming
techniques needed for the case when short selling is not allowed,
these are already embedded in various software. Figure 3.18 shows
the two efficient frontiers for the data of Figure 1.2.⁹
□
Difficulties in the application of this model centre around the estimates
of r and σ, as these depend on the amount of data used. Usually, one
would like to use as much data as possible so that the estimates
stabilise (see the discussion of the effect of sample size on mean and
variance estimates in §B.15). In our present case, however, the
behaviour of the assets cannot be expected to remain the same over
time and hence older data may distort the picture instead of refining it.
SHARPE INDEX
Consider a risky asset with mean return r and standard deviation σ.
Its Sharpe index is the ratio
$$S = \frac{r - r_f}{\sigma},$$
where rf is the risk-free rate. In practice, we would use estimates
r̂ and σ̂ for r and σ.
Figure 3.19: The Sharpe index S of A is given by S = tanθ
Geometrically, the Sharpe index is the slope of the line joining the
risky asset to the risk-free asset in the portfolio diagram (the dashed
line in Figure 3.19). A higher value of the index indicates that the
asset is closer to the Capital Market Line. Hence the index measures
the efficiency of the asset.
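A minimal sketch of the calculation from a sample of returns, using only the Python standard library. The return figures and the risk-free rate are hypothetical, and the risk-free rate is assumed to be quoted for the same period length as the returns.

```python
from statistics import mean, stdev

def sharpe_index(returns, risk_free):
    """Estimated Sharpe index: (sample mean - r_f) / sample standard deviation."""
    return (mean(returns) - risk_free) / stdev(returns)

# HYPOTHETICAL per-period returns for two assets, and a per-period risk-free rate.
asset_a = [0.02, 0.04, 0.06]
asset_b = [-0.01, 0.05, 0.08]
risk_free = 0.01

index_a = sharpe_index(asset_a, risk_free)
index_b = sharpe_index(asset_b, risk_free)
print(index_a, index_b)
```

Here `statistics.stdev` uses the N − 1 divisor, matching the usual sample variance.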
CAPM
The flip side of the theoretical elegance of CAPM is a certain lack of
solidity when it comes to numerical work. The main difficulty in its use
is the elusive nature of the market portfolio M. As remarked earlier, it
is not possible to fix the composition of M—in fact it is not even
feasible to list all its constituents.
ESTIMATING BETA
In any case, let us start with the standard first step of choosing a
certain comprehensive stock index as an approximation to M. By
manipulating historical data as in the previous section, we can find
estimates of rM as well as r (for any stock). We can also estimate the
β of any stock relative to the stock index by substituting the relevant
estimates of covariance and variance into its formula.
It is worth noting that beta can also be estimated from its
interpretation as the slope of the Ordinary Least Squares line. Thus,
suppose we have data x1,…,xN for the rates of return from the market
portfolio (or the index representing it) over equal time intervals, and
also data y1,…,yN for the corresponding rates of return of a certain
stock. We try to fit a line y = a + bx to this data. Following OLS, we
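define the best line as the one that minimises the total squared error
$$E(a, b) = \sum_{i=1}^{N} (y_i - a - b x_i)^2.$$
We carry out the minimisation by applying the first derivative test. This
gives the equations
$$\sum_{i=1}^{N} (y_i - a - b x_i) = 0, \qquad \sum_{i=1}^{N} x_i (y_i - a - b x_i) = 0.$$
The solutions are
$$b = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}$$
and
$$a = \bar y - b\, \bar x,$$
where x̄ and ȳ are the means of the xi and the yi.
We shall call the line determined by these values of a,b the OLS line.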
Exercise 3.8.2 Verify the above results.
The number b provides an estimate of the beta β of the asset relative
to the index. The advantage of using this estimate of β is that we also
get an idea of how good it is. We first define
ŷi = a + bxi.
This is the y-value corresponding to xi as predicted by the OLS line.
The question is: How much of the variation of the yi values from their
mean can be explained by the OLS line?
Figure 3.20: An illustration of the OLS fit of a line to data
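Let ȳ = (y1 + … + yN)/N be the mean of the y values. For any i, the
deviation from the mean (in the original data) is yi – ȳ. The amount of
this deviation that is explained by the OLS line is ŷi – ȳ (Figure 3.20).
To get the overall picture, we define:
$$\text{Total variation} = \sum_{i=1}^{N} (y_i - \bar y)^2, \qquad \text{Explained variation} = \sum_{i=1}^{N} (\hat y_i - \bar y)^2.$$
The coefficient of determination is defined by
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}.$$
It represents the fraction of variation that is explained by the linear
dependence of y on x. It is a fact that 0 ≤ R² ≤ 1. The closer R² is to 1,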
the more significant is the OLS line, and the more reliable is our
estimation of β by b.
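The OLS estimate of beta and its coefficient of determination can be computed directly from these definitions. The sketch below uses only the Python standard library; `ols_line` is a hypothetical helper name and the return series are hypothetical numbers made up for the illustration.

```python
def ols_line(xs, ys):
    """Return (a, b, r_squared) for the least squares line y = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx                      # slope: the estimate of beta
    a = y_bar - b * x_bar                # intercept
    fitted = [a + b * x for x in xs]     # values predicted by the OLS line
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return a, b, explained / total

# HYPOTHETICAL monthly returns: x = index, y = stock.
index_returns = [0.021, -0.013, 0.034, 0.008, -0.027, 0.015, 0.002, -0.006]
stock_returns = [0.035, -0.010, 0.041, -0.004, -0.045, 0.030, 0.012, -0.015]

a, b, r_squared = ols_line(index_returns, stock_returns)
print("alpha:", a, "beta:", b, "R^2:", r_squared)
```

If the data lie exactly on a line, the function returns that line with R² equal to 1 (up to rounding); scattered data give a value strictly between 0 and 1.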
Example 3.8.3 Figure 3.15 is based on monthly returns over 5 years
(2003–2007) for the S&P CNX 500 Index and ICICI Bank stock. The
OLS line for this data produced by the above formulas is
r = 1.036 rM + 0.0058,
where r is the (predicted) monthly rate of return of ICICI stock and rM
is the monthly rate of return of the index. The coefficient of
determination is
R² = 0.51.
This is a reasonably high value of R² and suggests a strong
relationship between the two sets of data.
□
Now that we know how to estimate r and β for each asset, we plot
the estimated (β, r ) pairs on a single diagram. We cannot expect
these points to lie on a straight line since they are not based on the
exact market portfolio, and further, they use estimates of r and β
which may deviate significantly from the true values. Nevertheless, if
CAPM is to be useful, they should lie roughly along a line.
SECURITY MARKET LINE
An approximate Security Market Line can be drawn by connecting the
risk-free point to the one marking the stock index.
Example 3.8.4 Example 3.8.3 described the calculation of the β of
ICICI Bank versus the S&P CNX 500 index over the years 2003-2007.
Figure 3.21 shows the results of extending these calculations to a
total of 34 stocks selected from the members of the Nifty index (we
have picked all those which were included in Nifty throughout 2003–
2007).
Figure 3.21: A plot of r versus β for 35 stocks and the S&P CNX 500
index. The box marks the index and the line through it is the
estimated Security Market Line. See Example 3.8.4.
The resulting plot is supportive of a linear relationship between r
and β, as predicted by CAPM. There is only one point which is
dramatically away from the Security Market Line. This is the point
representing the construction company Unitech, which had a
remarkably high mean return in this period due to a single huge
contract.
□
Figure 3.22: Illustration depicting Jensen’s index α
The Security Market Line marks the expected return for any level of β.
The performance of an individual asset over the period can be judged
by its position relative to this line. If it plots above the line, it has
performed better than average at its level of systemic risk. If it plots
below, it is below average.
Thus, the performance of the asset can be quantified by
measuring the vertical distance from the Security Market Line:
$$\alpha = \hat r - r_f - \hat\beta\, (\hat r_M - r_f).$$
This quantity is called Jensen’s Index or Jensen’s alpha. The
symbols r̂, r̂M and β̂ represent the estimates of r, rM and β,
respectively.
Investors prefer assets with high α as these have shown superior
performance in the recent past.
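A minimal sketch of the calculation, using only the Python standard library. The function name and the numerical inputs are hypothetical; in practice the inputs would be the estimates obtained from historical data.

```python
def jensen_alpha(mean_return, beta, mean_market_return, risk_free):
    """Vertical distance of the point (beta, mean_return) from the Security Market Line."""
    return mean_return - risk_free - beta * (mean_market_return - risk_free)

# HYPOTHETICAL estimates: r_f = 5%, market mean 12%.
risk_free = 0.05
market_mean = 0.12

alpha_above = jensen_alpha(0.17, 1.5, market_mean, risk_free)    # plots above the line
alpha_on_line = jensen_alpha(0.155, 1.5, market_mean, risk_free) # lies on the line
alpha_market = jensen_alpha(market_mean, 1.0, market_mean, risk_free)
print(alpha_above, alpha_on_line, alpha_market)
```

The index standing in for the market has β = 1 and therefore α = 0 by construction.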
CAPM AND REAL LIFE
Much ink has been spilled on the issue of whether CAPM is an
accurate or useful model. An important early criticism, which we have
already touched upon, was by Richard Roll [39]. Roll observed that
the market portfolio is impossible to describe, even at the theoretical
level. For example, the level of skill of its work-force is certainly an
asset for a company—but how can it be given a numerical value?
Even if we agree on a list of assets to be counted, it will be impossible
to obtain all the relevant data. Hence, when we test the CAPM, we
must use a substitute for the market portfolio. If the test gives a poor
result, we cannot say whether this is the fault of CAPM or of our
choice of substitute. This means that CAPM cannot be refuted by
experiments and so, in a philosophical sense, it is unscientific.
Roll’s criticism is certainly valid. However, from a practical
standpoint, the issue is not if CAPM is correct but whether it is useful.
CAPM has been used in many ways—for example, analyzing risk,
pricing assets and evaluating a firm’s performance. Do its predictions
in these matters stand up to statistical scrutiny?
Here are some of the empirical criticisms of CAPM:
1. The Security Market Line is often far from the OLS line for the
data, and so is not a good fit to it. Typically, the OLS line is flatter
than the Security Market Line—low β stocks have higher mean
returns than CAPM predicts.
2. Data suggests that market β does not completely describe the
risk–profit relationship and other variables, such as the Total
Market Capitalisation (TMC) of a stock, are also involved.
3. The r–β relationship does not always appear linear.
Naturally, many attempts have been made to create models which
resolve these difficulties. A few of the better-known ones are:
Black’s CAPM: Fischer Black (1972) replaced the assumption about
unrestricted availability of the risk-free asset with unrestricted shorting
of risky assets. This still leads to a Security Market Line on which
every asset lies, but its slope is not prescribed.10
Arbitrage Pricing Theory: This model, formulated by Stephen Ross
(1976), assumes only that investors will prefer the higher of two risk-free returns (in particular, they do not carry out mean–variance
analysis). The model develops a formula in which an asset’s mean
return depends linearly on a number of factors. It does not prescribe
what these factors should be, and leaves them to be determined by
regression analysis of data.
Fama and French’s Three-factor Model: Eugene Fama and
Kenneth French (1993–96) created a model in which asset returns
depend also on total market capitalisation (TMC) and book-to-market
equity ratio (B/M). This has become quite popular. Its main drawback
is its empirical nature, lacking a full theoretical explanation for the
choice of additional variables.
In sum, the story is that there is no completely satisfactory
replacement for CAPM. Moreover, most alternate models start with a
CAPM type formulation and then try to modify or enlarge it. Thus,
over forty years after its birth, it remains at the center of modern
finance.11
Exercise 3.8.5 The following exercises sketch the derivation of
Black’s CAPM. We assume there is no risk-free asset but there is
unrestricted short selling of risky assets.
1. Show that since every investor chooses a point on the efficient
frontier, the market portfolio must be efficient.
2. Let Z be a portfolio whose market beta is zero. Show that the
mean return of any portfolio A is then given by
rA = rZ + β(rM – rZ),
where β is the market beta of A. (HINT: In the proof of CAPM,
replace the Capital Market Line with the feasible set of M and Z.)
9 The frontier when short selling is not allowed has been plotted using the
software Efront, available from J R Varma's website:
http://www.iimahd.ernet.in/~jrvarma. Another option is to use the Solver tool in
Microsoft Excel and OpenOffice Calc.
10 We will encounter Fischer Black again when we study financial derivatives.
He is best known for the Black−Scholes options pricing formula.
11 A good reference on these matters is Fama and French [20].
4 Forwards and Futures
With this chapter we begin the second phase of our study.
Earlier, we developed the general principles of how assets
can be evaluated and combined into portfolios. Now we
shall study financial derivatives, which provide ways of fine-tuning the
risk characteristics of a portfolio, such as its systemic risk or its
exposure to interest rate fluctuations. Over the last thirty odd years,
trade in derivatives has undergone exponential growth and they have
become central to investment science.
In this chapter, we shall study the most basic derivatives called
forwards and futures. Besides becoming familiar with their basic
structure, we shall also learn how to use them to modify the risk
associated to an asset or portfolio.
We now take a detailed look at forwards and futures: their pricing,
their uses, and the differences between them.
4.1 FORWARDS AND FUTURES
A typical financial derivative is a contract for a future trade of a basic
asset such as a stock, a bond, or a commodity like oil. The terms of
the contract would clearly specify the details of the future trade, such
as the time at which it will be executed, the procedure by which the
price of the asset will be determined, the manner of its delivery, etc.
As a simple example, consider a contract which offers you the
right to buy a certain share a year from now by paying Rs 200 for it at
that time. As the year expires, suppose the share price moves to Rs
250. Then the contract clearly offers you a good deal and this will
attract the attention of other investors, who become desirous of
buying the contract from you in order to benefit from it. Thus, the
contract itself becomes an object of trade and its value changes with
time, depending on various factors, such as changes in the value of
the share and fluctuations in interest rates.
We introduce some relevant terminology:
1. Underlying Asset: The asset on whose trade the derivative is
based.
2. Spot price: The current value of the underlying asset.
3. Writer: The person or firm that offers the derivative for sale.
4. Holder: The buyer of the derivative.
The main problem that will concern us is:
Fundamental problem: Establish the value of a derivative from
currently available information.
Success in solving this problem for any derivative will enable us to
put it to various uses involving risk management.
FORWARD CONTRACTS
In a forward contract, the holder agrees to buy a fixed type and
amount of the underlying asset from the writer at a fixed future date
(expiration date) and at a price (exercise price) agreed upon now
(see Figure 4.1).
1. On the expiration date, the holder must pay the exercise price to
the writer.
2. In return, the writer must deliver the underlying asset to the
holder.
3. No money is exchanged at the time of signing the contract.
Figure 4.1: The structure of a forward or futures contract signed at time t = 0
with expiration time t = T and exercise price X
The exercise price is also called the strike price or forward
price. A forward contract is also just called a forward.
Example 4.1.1 A packaged food company and a farmer will trade in
a certain amount of potatoes 3 months from now, after the harvest. If
the crop is poor, prices will rise, and the company will face a loss. If
there is a bumper crop, prices will fall, and it will be the farmer who
will face a loss. Both parties can mutually eliminate their risk by
agreeing now on the price at which they will trade in 3 months' time. This is a
forward contract.
□
FUTURES CONTRACT
The main features of a futures contract are identical to a forward
contract. A futures contract may be referred to simply as a futures,
and its exercise price may again be called its strike price or futures
price.
However, futures are traded through an exchange, are
standardised, and can be traded further (the holder can sell to a new
holder). These features make futures an easily used and flexible tool
for investment. The mathematical treatment of forwards and futures is
almost identical, and so we shall mostly treat these terms as
interchangeable.
Example 4.1.2 Consider a company that buys oil and uses it to
generate electricity for supply to certain cities. The price at which it
sells electricity is subject to government regulation and cannot be
easily changed. Oil prices, on the other hand, are highly unstable.
Therefore it is constantly exposed to the risk of loss due to a sudden
rise in oil prices.
The company can reduce this risk by buying oil futures, say on the
New York Mercantile Exchange (NYMEX), locking in the prices for the
near future (up to 6 years on NYMEX). Because these contracts are
for very large amounts of oil, delivery is not fixed for one day, but can
be executed over a month. For example, one set of oil futures on
NYMEX had December 2007 listed as the expiry month. These
futures could be traded until November 16, 2007, and delivery of the
oil had to take place between December 1 and 31, 2007.
□
The number of futures traded on various exchanges has mushroomed
in recent years:
NYMEX: From 68 million in 1994 to 216 million in 2006.
London International Financial Futures and Options Exchange
(LIFFE): From 106 million in 2000 to 501 million in 2007.
National Stock Exchange of India (NSE): From 90,580 in 2000–
01 to 139 million in 2005–06.
COMPARISON OF FUTURES WITH FORWARDS
In this section we introduce and explore the differences between
futures and forwards. First of all, a forward is a contract negotiated
between two fixed parties, and is usually not traded. A futures, on the
other hand, is a standardised contract which is traded on an
exchange. The exchange provides a physical or electronic
mechanism to facilitate the trading as well as certain guarantees
against default. The latter, particularly, are missing for forwards.
We now look at how an exchange provides protection from
default. Consider an investor who approaches a broker in order to buy
a certain futures for a stock. The broker creates a margin account
and asks the investor to make a deposit called the initial margin.
Consider the futures contracts for a certain asset, expiring at a
time T. Such contracts may be written at any time before T. Let Xn
represent the futures price for futures written at the end of day n. We
assume that our investor bought his futures during day 0 with a
futures price X. At the end of the day, if the futures price X0 is more
than X it represents a gain for the investor, as he has locked in a
lower price than the one now available. On the other hand, the writer
of the contract sees a corresponding loss. Therefore an amount X0 –
X is transferred from the margin account of the writer to that of the
holder. If X0 < X, the opposite is done: X – X0 being transferred from
the holder’s account to the writer’s account. This can also be viewed
as a transfer of the negative amount X0 – X into the holder’s account.
This process is called marking to market, and is repeated at the
end of each trading day. It continually tracks the investor’s gain or
loss and thus provides protection from default. The total change in the
margin account over the life of the contract is given by:
(X_0 - X) + (X_1 - X_0) + ... + (X_T - X_{T-1}) = X_T - X = S_T - X    (4.1)
which exactly equals the holder’s profit from the contract. Similarly,
the change in the writer’s account is X – ST, which is his profit. At the
end, neither party can gain by defaulting since the value owed by the
other has already been transferred!
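The telescoping sum in the margin account can be checked with a short calculation. The following is a minimal sketch in Python; the function name mark_to_market and the price series are hypothetical, chosen only for illustration, and the last settlement price plays the role of the futures price at expiry.

```python
def mark_to_market(entry_price, settlement_prices):
    """Return the daily transfers into the holder's margin account.

    entry_price is the futures price X locked in by the holder;
    settlement_prices are the end-of-day futures prices X_0, X_1, ..., X_T.
    """
    transfers = []
    previous = entry_price
    for price in settlement_prices:
        # A rise in the futures price is a gain for the holder.
        transfers.append(price - previous)
        previous = price
    return transfers


# Hypothetical data: the holder enters at 100; the final price is 104.5.
entry_price = 100.0
settlement_prices = [101.5, 99.0, 102.0, 104.5]

transfers = mark_to_market(entry_price, settlement_prices)
total = sum(transfers)

print(transfers)
print(total, settlement_prices[-1] - entry_price)
```

The individual transfers go up and down, but their total depends only on the first and last prices, which is the content of the sum displayed above.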
Exercise 4.1.3 Justify the last equality in equation (4.1).
For the process to work, it is important that the amount in the
margin account always be positive. So a maintenance margin is
fixed and, if the amount in the margin account ever falls below it, then
the investor has to make a fresh deposit.
Keeping money in the margin account is a cost for the investor.
Some brokers allow the investor to earn interest on this money, but
even this is usually at lower than market rates. On the other hand, the
investor can withdraw any surplus that accumulates above the initial
margin. The possibility of intermediate withdrawals/deposits makes
futures slightly different from forwards, since the cash flow is no
longer concentrated at the expiry time. However, the difference this
creates in pricing is slight and we shall ignore it, apart from some
comments in the next section.
For the rest of this chapter, “futures” will have the default meaning
of “futures and forwards”. Whenever some statement points only to
futures (or only to forwards), this will be explicitly stated.
4.2 FORWARD AND FUTURES PRICE
In this section we shall consider the problem of determining the right
exercise price for a futures contract. Before we begin, let us think
about what might influence this choice. The initial spot price of the
underlying asset should certainly have a role. Since a future payment
is involved, the interest rate should also come into play. At first
glance, it may seem that an expected rise in the future asset price
would push up the exercise price—then the expected future asset
price would become a factor. Perhaps you can imagine other factors
that may be relevant.
Figure 4.2 shows the typical pattern: the exercise or futures price
is very closely tied to the spot price of the underlying asset. To bring
some clarity to this situation, we take recourse to the No Arbitrage
Principle. The following examples show how we can detect when an
exercise price is too high or too low relative to the spot price, and thus
determine its correct value.
Figure 4.2: Reliance stock and futures during June–August 2005. The futures
prices are for contracts expiring on August 25 2005.
Example 4.2.1 Suppose a futures is being written with an exercise
price of € 100, its expiration date is 3 months from now, and the
current spot price of the underlying stock is € 100. Suppose also that
it is possible to borrow cash for this duration at an annually
compounded rate of interest of 4% per annum.
In this situation, an investor can make a riskless profit as follows.
First, she acquires the contract when it is written (no cost is attached)
and also short sells the stock at a price of € 100. Thus, she currently
has € 100. Investing this at 4%, after 3 months she has about € 101 cash.
With € 100 of this she pays the writer of the contract, acquires the
stock, and uses this to make the delivery on the short sale. With all
her obligations cleared, she is left with a profit of € 1. This profit is
riskless––it does not depend on the future price of the stock!
□
Example 4.2.2 In the example above, suppose the exercise price is
€ 102 instead. Our previous calculation suggests this price is too high
and so an investor would attempt to profit by writing this contract. At
the same time, he borrows € 100 to buy a unit of stock. At the expiry
time, he delivers the stock to the holder, earns € 102, and uses € 101
of it to pay off the loan. He is left with € 1 riskless profit.
□
The examples, together with the No Arbitrage Principle, suggest
that the correct exercise price for that particular futures is € 101,
which is the future value of the initial spot price. With this price,
neither holder nor writer can make a riskless profit. We now apply this
line of thinking to a general situation.
Theorem 4.2.3 Consider a futures contract with expiration date T
(measured in years from the writing of the contract). Let the spot price
of the underlying asset at the time of writing be S. Further, suppose
one can borrow cash for this period at a continuously compounded
annual rate of interest r. Then the No Arbitrage Principle implies the
following exercise or futures price X for this contract:
X = S e^{rT}.
Proof. If X < S e^{rT}, the holder of the contract can earn arbitrage as
follows: He initially short sells the asset for S. By time T, this amount
grows to S e^{rT}. He uses X of this to close the contract, get the asset,
and deliver to the buyer (through the short sale). With no further
obligations, he is left with a risk-free profit of S e^{rT} - X.
If X > S e^{rT}, it is the writer who can make an arbitrage profit: She
initially borrows S and uses it to buy the asset. At time T she delivers
the asset to the holder, earns X and uses S e^{rT} of that to pay off the
loan. She pockets a riskless profit of X - S e^{rT}.
□
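The two cases of the proof can be turned into a small calculation. The sketch below is only illustrative: the function names futures_price and arbitrage_side are our own, and the numbers echo Examples 4.2.1 and 4.2.2 but with continuous compounding, so the fair price comes out slightly above the rounded figure of 101 used there.

```python
import math


def futures_price(spot, rate, years):
    """No-arbitrage exercise price X = S e^{rT} (continuous compounding)."""
    return spot * math.exp(rate * years)


def arbitrage_side(spot, rate, years, quoted_price, tolerance=1e-9):
    """Say which party can lock in a riskless profit, and how much, at expiry."""
    fair = futures_price(spot, rate, years)
    gap = quoted_price - fair
    if gap < -tolerance:
        # Quoted price too low: hold the contract and short sell the asset.
        return "holder", -gap
    if gap > tolerance:
        # Quoted price too high: write the contract and buy the asset on credit.
        return "writer", gap
    return "none", 0.0


fair = futures_price(100.0, 0.04, 0.25)
low_side, low_profit = arbitrage_side(100.0, 0.04, 0.25, 100.0)
high_side, high_profit = arbitrage_side(100.0, 0.04, 0.25, 102.0)

print(round(fair, 4))
print(low_side, round(low_profit, 4))
print(high_side, round(high_profit, 4))
```

The sketch ignores transaction costs and restrictions on short selling, in line with the assumptions of the theorem.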
Exercise 4.2.4 Show that the exercise price is given by X = S(1 + r)^n
if the interest is compounded discretely and T equals n time periods.
One feature of this formula for X, which most people find surprising
when they first encounter it, is that expectations about future prices
play no role in it!
The discussion above illustrates that the No Arbitrage Principle is
a powerful mathematical tool for calculating the correct price of a
derivative. It will provide the base for every pricing formula that we
develop in this course.
Forward and futures prices are the same if interest rates are
constant. If they are variable, then differences arise due to marking to
market. For instance, suppose the spot price S of the underlying
asset is positively correlated to the interest rate r. If r increases, so
does S, and hence X, and the holder of a futures benefits because
marking to market leads to money being deposited into her margin
account. This gain can be withdrawn and reinvested. On the other
hand, a fall in r leads to withdrawals from the margin account and
may lead to her having to borrow money to maintain the account.
Overall, she comes out ahead because she borrows when interest
rates are low and invests when they are high.
However, the holder of a forward does not benefit in this way and
this creates slight differences between forward and futures prices. For
short periods (up to about 3 months) this difference is negligible but
has been observed to become significant beyond that. (For a detailed
exploration of the relation between forward and futures prices, see
Cox, Ingersoll and Ross [13].)
Exercise 4.2.5 Suppose the current stock price is $17, the current
futures price of a contract expiring in one year is $18, r = 8%, and
short-selling requires a 30% security deposit attracting interest at d =
4%. Is there an arbitrage opportunity?
Exercise 4.2.6 Suppose marking to market is modified in the
following way: The adjustment to the margin account at the end of
each day is the present value of the change in the futures price. Then,
even with random interest rates, forward and futures prices would be
equal.
4.3 VALUE OF A FUTURES CONTRACT
Consider a futures with an exercise price X. Suppose that, as the
expiration date approaches, the price of the underlying asset
becomes larger than X. Then, the deal offered by the futures contract
becomes attractive to other investors who would like to acquire it from
the present holder. Thus, the contract acquires a certain value and
may be sold off to a new holder. (If the asset price goes down, the
contract represents a loss for the holder; its value is then negative.)
Let the time of writing the contract be set as 0, the expiration date
be T (years), and the exercise price be X. Let the annual,
continuously compounded rate of interest be r. Consider some time t
during the life of the contract.
The value of this contract at time t will clearly depend on the value
of the stock at that time. So we set up some further notation:
St = Value of asset (spot price) at time t
Vt = Value of contract at time t
If we buy the asset at time t, we spend St. On the other hand, if we
buy it through the futures, we spend X at time T, which is equivalent
to spending X e^{-r(T-t)} at time t. Therefore, the profit in using the futures
to buy the asset (instead of buying it at time t from the market) is
S_t - X e^{-r(T-t)}.
This is the value of the contract at time t.
Theorem 4.3.1 Consider a futures contract with expiration time T. Let
the spot price of the underlying asset at time t be St, and let r be the
continuously compounded annual rate of interest. Then the value Vt
of the futures at time t is given by
V_t = S_t - X e^{-r(T-t)}.    (4.2)
□
Exercise 4.3.2 Show that if V_t doesn't satisfy equation (4.2),
arbitrage will be possible.
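The formula of Theorem 4.3.1 is easy to evaluate. Here is a minimal sketch in Python, with a hypothetical function name and made-up numbers: a contract is written at the no-arbitrage exercise price, so its initial value is zero, and it gains value when the spot price later rises.

```python
import math


def futures_value(spot_t, exercise_price, rate, time_to_expiry):
    """Value V_t = S_t - X e^{-r(T-t)}, where time_to_expiry is T - t in years."""
    return spot_t - exercise_price * math.exp(-rate * time_to_expiry)


# Hypothetical contract: spot 100 at writing, r = 5%, expiry in one year.
rate = 0.05
exercise_price = 100.0 * math.exp(rate * 1.0)

value_at_writing = futures_value(100.0, exercise_price, rate, 1.0)
value_at_half_year = futures_value(110.0, exercise_price, rate, 0.5)

print(round(value_at_writing, 6))
print(round(value_at_half_year, 4))
```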
Exercise 4.3.3 If the interest is discretely compounded and the time
from t to expiry at T equals n compounding periods, then
V_t = S_t - X(1 + r)^{-n}.
It is implicit in our calculation of the value of a futures that there are
no costs or profits associated with owning the underlying asset. Thus,
the formula would apply to stocks which do not pay dividends, but not
to those that do. Nor would it apply to commodities which have
storage costs. We shall now describe a technique which will allow us
to handle these other situations as well.
Exercise 4.3.4 Consider a futures expiring in 6 months, written on a
share which pays no dividends and whose current price is Rs 100. Let
the annually compounded interest rate be 10%. With what exercise
price should the futures be written?
4.4 METHOD OF REPLICATING PORTFOLIOS
Consider two portfolios A and B. Suppose it is possible to predict with
certainty that at a certain time T in the future they will have the same
value. Then the No Arbitrage Principle implies that in fact they will
have the same value at all intermediate times as well! For, suppose A
has less value than B at some time t < T. An investor can sell B and
buy A at t and pocket the difference. At time T he sells A and buys
back B, thus returning to his original situation but having made a risk-free profit.
This observation leads to a methodical way of valuing financial
instruments. We set up two portfolios such that we are sure they have
the same final value, and one of them includes the instrument. Then
they have the same initial value. Equating these initial values gives an
equation we can solve for the required value. The art is in creating the
right portfolios.
We illustrate this method by using it to re-derive the formula for the
value of a futures contract.
Consider a portfolio created at time t = 0 consisting of the
following items:
1. The contract
2. An amount X e^{-rT} of cash
The value of this portfolio at time t is V_t + X e^{-r(T-t)}, where V_t is the
value of the futures at time t.
At time T, the contract expires and is replaced by a unit of the
asset. An expense of X is also incurred, but this is exactly
compensated by the cash amount (by time T, X e^{-rT} has become X).
So the value of the portfolio at time T is S_T. Thus, the portfolio
replicates a unit of the asset, at T and hence at all intermediate times
also. So we obtain
S_t = V_t + X e^{-r(T-t)},
or,
V_t = S_t - X e^{-r(T-t)}.
If we are sure that one portfolio eventually has the same value as
another, we call it a replicating portfolio of the other one. The
method of replicating portfolios, which we have just described and
illustrated, will be our basic technique for using the No Arbitrage
Principle to price derivatives.
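As a numerical sanity check on the replication argument, we can confirm that a portfolio of one futures contract plus cash worth the discounted exercise price ends up worth exactly one unit of the asset, whatever the final spot price. This is a sketch only; the function name and the sample prices are hypothetical.

```python
import math


def portfolio_value_at_expiry(spot_T, exercise_price, rate, T):
    """Final value of: one futures contract plus cash X e^{-rT} set aside at time 0."""
    cash_at_start = exercise_price * math.exp(-rate * T)
    cash_at_expiry = cash_at_start * math.exp(rate * T)   # grows to X
    contract_payoff = spot_T - exercise_price             # receive the asset, pay X
    return contract_payoff + cash_at_expiry


final_spots = [50.0, 100.0, 150.0]
final_values = [portfolio_value_at_expiry(s, 105.0, 0.06, 0.75) for s in final_spots]

for spot, value in zip(final_spots, final_values):
    print(spot, round(value, 6))
```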
Exercise 4.4.1 Consider two futures for the same underlying asset
(which may generate income), both expiring at T. One of them is
written at time 0 with exercise price X0 and the other is written at time
t > 0 with exercise price Xt. Use the Method of Replicating Portfolios
to show that the value of the first contract at time t is
V_t = (X_t - X_0) e^{-r(T-t)},
where r is the continuously compounded interest rate.
FUTURES ON AN ASSET PROVIDING KNOWN INCOME
We have dealt so far with assets which do not produce any extra
income. Now, suppose the underlying asset provides a known income
during the life of the futures, i.e., we know what income will accrue
and when.
Theorem 4.4.2 Consider a futures with expiration date T and
exercise price X on an asset producing a known income during the
life of the futures. Let the present value at time t of the income during
the remaining life of the contract be It. Then the value Vt of the futures
contract at time t is given by
V_t = S_t - I_t - X e^{-r(T-t)}.    (4.3)
Proof. We set up the following portfolios at time t:
Portfolio A: One futures contract and a cash amount of X e^{-r(T-t)}.
Portfolio B: One unit of the asset and borrowings of I_t.
(Note that X and I_t are known at time t.) At time T, both portfolios
become one unit of the asset. Hence by the Method of Replicating
Portfolios, they have the same value at all times t < T:
V_t + X e^{-r(T-t)} = S_t - I_t.
□
Since no money is transferred when buying a futures contract at
time t = 0, we have V0 = 0. Substituting in (4.3), we find the right
exercise price as well:
X = e^{rT}(S_0 - I_0).    (4.4)
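Formulas (4.3) and (4.4) can be sketched in Python as follows. The helper names and the numbers are hypothetical: an asset priced at 50 pays a single known income of 2 after 3 months, the futures expires in 6 months, and the continuously compounded rate is 8%.

```python
import math


def present_value(cash_flows, rate):
    """Present value of known cash flows given as (time_in_years, amount) pairs."""
    return sum(amount * math.exp(-rate * time) for time, amount in cash_flows)


def futures_value_known_income(spot_t, income_pv, exercise_price, rate, time_to_expiry):
    """Equation (4.3): V_t = S_t - I_t - X e^{-r(T-t)}."""
    return spot_t - income_pv - exercise_price * math.exp(-rate * time_to_expiry)


def exercise_price_known_income(spot_0, income_pv_0, rate, T):
    """Equation (4.4): X = e^{rT}(S_0 - I_0)."""
    return math.exp(rate * T) * (spot_0 - income_pv_0)


rate, T = 0.08, 0.5
income_pv_0 = present_value([(0.25, 2.0)], rate)
exercise_price = exercise_price_known_income(50.0, income_pv_0, rate, T)
initial_value = futures_value_known_income(50.0, income_pv_0, exercise_price, rate, T)

print(round(income_pv_0, 4), round(exercise_price, 4), round(initial_value, 6))
```

With the exercise price chosen by (4.4), the initial value of the contract is zero, as it should be.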
Exercise 4.4.3 Consider a bond which matures in 9 months, has
face value ¥10,000, and two remaining coupon payments of ¥1,000
each after 3 and 9 months. You own a futures on this bond, expiring in
4 months, and with exercise price ¥10,500. What is the value of this
futures? (Assume a continuously compounded interest rate of 10%.)
Exercise 4.4.4 Consider a futures, expiring in 3 months on an
amount of potatoes that currently costs Rs 10,000. Suppose the
storage cost for this amount of potatoes is Rs 300 per month. What
should be the exercise price of the futures? (Assume a continuously
compounded interest rate of 10%.)
FUTURES ON AN ASSET WITH KNOWN DIVIDEND YIELD
We now consider the case where income from the asset is not known
in absolute terms. However, it is known as a fraction of the spot price.
Such income will be called dividend and the fraction will be called the
dividend yield. For example, if the dividend yield is set to 2% and the
spot price at the time of the dividend payment is 200, then the actual
dividend will be 4.
The insight that solves the futures pricing problem for such an
asset is that while the payment is uncertain in terms of currency, it is
certain in terms of amount of asset.
The simplest such situation is when there is one expected
dividend payment.
Theorem 4.4.5 Consider a futures with expiration date T and
exercise price X on an asset which will generate an income at time T′
(T′< T) with dividend yield q. Then the value Vt of the futures contract
at a time t < T′ is given by
V_t = S_t/(1 + q) - X e^{-r(T-t)}.    (4.5)
Proof. We retain portfolio A from the proof of the previous theorem,
and create a new portfolio B (at time t).
Portfolio A: One futures contract and a cash amount of X e^{-r(T-t)}.
Portfolio B: 1/(1 + q) units of the underlying asset.
Let S_{T′} be the price of one unit of the asset at time T′. At time T′,
portfolio B earns q S_{T′}/(1 + q). We immediately reinvest this earning in the
asset, acquiring q/(1 + q) units of it. The total amount of asset owned now
becomes
1/(1 + q) + q/(1 + q) = 1 unit,
and so at time T our portfolio B is finally worth S_T. This is
also the final worth of portfolio A. On equating their initial values, we
find
V_t + X e^{-r(T-t)} = S_t/(1 + q).
□
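The key step in the proof is that reinvesting the dividend turns a holding of 1/(1 + q) units into exactly one unit, whatever the price at the dividend date. The following sketch checks this for a few hypothetical prices; the function name is our own.

```python
def units_after_reinvestment(initial_units, dividend_yield, price_at_dividend):
    """Units held after a dividend of yield q is paid and reinvested in the asset."""
    cash = initial_units * dividend_yield * price_at_dividend
    extra_units = cash / price_at_dividend
    return initial_units + extra_units


q = 0.02
starting_units = 1 / (1 + q)

# The price at the dividend date is unknown in advance; try several values.
results = [units_after_reinvestment(starting_units, q, price)
           for price in (150.0, 200.0, 250.0)]

print(results)
```

The cash amount of the dividend changes with the price, but the number of units it buys does not, which is the sense in which the payment is certain in terms of amount of asset.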
Exercise 4.4.6 Show that the exercise price of the futures contract in
the previous theorem is given by
X = e^{rT} S_0/(1 + q).    (4.6)
Exercise 4.4.7 A share is currently priced at Rs 100 and is expected
to pay a 2% dividend after 3 months. What should be the exercise
price of a futures written on this share and expiring in 6 months if the
continuously compounded annual interest rate is 10%?
The approach easily generalises to cover multiple payments with
known dividend yield.
Exercise 4.4.8 Suppose an asset will have dividend yields q_1, q_2, ..., q_n
at times t_1 < t_2 < ... < t_n during the interval [0,T]. Consider a
futures on this asset written at t = 0 and expiring at T. Show that its
exercise price is given by
X = e^{rT} S_0 / [(1 + q_1)(1 + q_2)...(1 + q_n)],
where S0 is the initial spot price of the underlying asset and r is the
continuously compounded interest rate.
Exercise 4.4.9 Consider the asset and futures of the previous
exercise. Show that at any instant t between 0 and t1, the value of the
futures is given by
V_t = S_t / [(1 + q_1)(1 + q_2)...(1 + q_n)] - X e^{-r(T-t)}.
The conventions associated with interest rates are extended to
dividend yields. Thus, by default, all dividend yields are given as
annual rates. If the dividend is actually calculated over a smaller time
span, the dividend yield is adjusted accordingly. For example,
suppose an annual dividend yield of 6% is quoted for a period of one
month. Then the actual dividend yield used is 6%/12 = 0.5%.
We can now introduce the concept of a continuous dividend
yield as a limit of more and more frequent dividend payments.
Consider an annual dividend yield of q distributed over n equally
spaced dividend payments during a time interval of T years. Then,
each individual dividend payment has yield qT/n. If each of these is
immediately reinvested in the asset, then by time T one unit of the
asset grows into
(1 + qT/n)^n units.
To obtain the continuous yield we let n → ∞ and find that one unit of
the asset grows into
e^{qT} units.
The notion of continuous dividend yield is useful when dealing with
diversified portfolios where the various individual dividend streams
may be numerous enough for the continuous approximation to
become accurate. It will then greatly reduce the amount of
computation. Continuous dividend yield is also appropriate when
dealing with commodities whose storage costs are proportional to
their amount, and their amount varies continuously.
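The passage to the continuous limit can be seen numerically. In the sketch below (function name and parameter values are hypothetical), one unit of the asset with n reinvested dividends of yield qT/n each grows into (1 + qT/n)^n units, and this approaches e^{qT} as n increases.

```python
import math


def growth_with_n_payments(q, T, n):
    """Units of asset grown from one unit after n reinvested dividends of yield qT/n."""
    return (1 + q * T / n) ** n


q, T = 0.06, 2.0
limit = math.exp(q * T)
growth = {n: growth_with_n_payments(q, T, n) for n in (1, 12, 365, 10**6)}

for n, units in growth.items():
    print(n, round(units, 8), round(limit - units, 8))
```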
Theorem 4.4.10 Consider a futures with expiration date T and
exercise price X on an asset with continuous dividend yield q. Then
the value Vt of the futures contract at time t is given by
V_t = S_t e^{-q(T-t)} - X e^{-r(T-t)}.    (4.7)
Proof. We retain the definition of portfolio A from the proof of the
previous theorem and modify portfolio B.
Portfolio B: We start with e^{-q(T-t)} units of the asset at time t, and all
income from dividends is reinvested in the asset.
By our previous calculations, we find that at time T, portfolio B
consists of one unit of the asset, just like portfolio A. Hence A and B
have the same value at any earlier time t ≤ T as well. On equating
these values, we get
V_t + X e^{-r(T-t)} = S_t e^{-q(T-t)}.
□
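The value formula for a continuous dividend yield, and the exercise price obtained from it by setting the initial value to zero, can be sketched as follows. The function names and the numbers (spot 250, r = 7%, q = 3%, expiry in 6 months) are hypothetical.

```python
import math


def futures_value_continuous_yield(spot_t, exercise_price, r, q, time_to_expiry):
    """V_t = S_t e^{-q(T-t)} - X e^{-r(T-t)}, with time_to_expiry = T - t."""
    return (spot_t * math.exp(-q * time_to_expiry)
            - exercise_price * math.exp(-r * time_to_expiry))


def exercise_price_continuous_yield(spot_0, r, q, T):
    """Exercise price that makes the contract worth zero at writing."""
    return spot_0 * math.exp((r - q) * T)


r, q, T = 0.07, 0.03, 0.5
exercise_price = exercise_price_continuous_yield(250.0, r, q, T)
initial_value = futures_value_continuous_yield(250.0, exercise_price, r, q, T)

# Three months later the spot price has moved to 260 (a made-up figure).
later_value = futures_value_continuous_yield(260.0, exercise_price, r, q, 0.25)

print(round(exercise_price, 4), round(initial_value, 6), round(later_value, 4))
```

Setting q = 0 recovers the formulas for an asset that pays no income.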
Exercise 4.4.11 Show that the exercise price of the futures contract
in the previous theorem is given by
X = S_0 e^{(r-q)T}.    (4.8)
Exercise 4.4.12 A stock’s price is currently 400, the quoted futures
price on a 4 month contract is 405, the risk free rate is r = 10% and
the dividend yield is 10% (both continuously compounded). How
could you make an arbitrage profit?
4.5 HEDGING WITH FUTURES
In this section we take up the art of buying or selling futures to reduce
the risk associated with an asset or portfolio. Reduction of risk by
combining assets whose risks cancel is called hedging. We can
approach hedging from two points of view. Either we own something
and fear a drop in its value, or we wish to make a future purchase and
fear a rise in its price. Our aim will be to suitably trade futures so that
the changes in their values will cancel out the changes in the value of
the concerned asset.
At this point, let us introduce some terminology from the trading
world:
1. To be long in an asset is to own it. To go long means to buy.
2. Short is the opposite of long. Thus, to short is to sell and to be
short is to owe.
Consider the following sentence: “If you are long in a risky asset,
you can reduce risk by shorting a futures contract for it.” What it says is
that if you own a risky asset, you can reduce risk by writing a futures
contract for it.
We shall now consider certain tactics for using futures to hedge.
The tactics avoid actual transfer of the asset at any time, to eliminate
transaction or delivery costs.
SHORT HEDGE
Suppose at t = 0 we have an asset with spot price S0. We fear a fall in
its value by time T. So we carry out a short hedge:
At t = 0: Sell (write) a futures on that asset with exercise price X0
and expiry date T.
At t = T: Buy a futures contract on that asset with exercise price
XT and expiry time T. Then close out both contracts.
At time T, we earn X0 from the first futures and pay XT to the writer of
the second futures. The asset acquired from the writer of the second
contract is delivered to the holder of the first, and the original asset
remains with us. Therefore the value of our portfolio at time T is
VT = ST + (X0 – XT).
The No Arbitrage Principle implies ST = XT. Hence
VT = X0.
There is no uncertainty in the outcome—the risk has been completely
eliminated.
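The outcome of the short hedge can be tabulated for different final spot prices. The sketch below uses hypothetical numbers and relies on the no-arbitrage relation that a futures written at the expiry time has exercise price equal to the spot price.

```python
def short_hedge_value(spot_T, x0):
    """Portfolio value at T: the asset, plus X_0 received, minus X_T paid."""
    x_T = spot_T  # no arbitrage: a futures written at expiry has X_T = S_T
    return spot_T + (x0 - x_T)


x0 = 105.0
final_spots = [80.0, 100.0, 130.0]
hedged_values = [short_hedge_value(s, x0) for s in final_spots]

for spot, value in zip(final_spots, hedged_values):
    print(spot, value)
```

Whatever the final spot price, the hedged position is worth the exercise price fixed at t = 0.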
Exercise 4.5.1 Show that a short hedge is equivalent to selling the
asset at t = 0 and investing the proceeds at the risk-free rate.
Our approach reduces risk at intermediate times also. Suppose the
second futures contract was bought at time t < T, with expiry date T
and exercise price X_t. Then the value of our portfolio at time t is
V_t = S_t + (X_0 - X_t) e^{-r(T-t)} = X_0 e^{-r(T-t)} + (S_t - e^{-r(T-t)} X_t).
The term S_t - e^{-r(T-t)} X_t represents the risk, and is called the basis.
Without the hedge, the basis would have been S_t. Since S and X are
strongly correlated (move up or down together), the variation in
S_t - e^{-r(T-t)} X_t can be expected to be much less than the variation in S_t (in
fact the No Arbitrage Principle predicts that S_t - e^{-r(T-t)} X_t = 0). In this
way, the short hedge reduces risk.
The second step in the short hedge is optional, and can be
skipped if we actually wish to sell the asset through the original
futures. However, even when we wish to sell the asset at time T, we
may still find it beneficial to carry out the second step. The reason is
that, typically, it is the obligation of the writer to deliver the asset to
whoever is the holder at the time of expiry of the contract—in
particular, any delivery costs are borne by the writer. The second step
makes us a holder and so we can demand delivery where the holder
of the first contract resides. In this way, we avoid paying the delivery
costs ourselves. Further, we proceed to sell the asset itself in the local
market at the current price.
LONG HEDGE
Suppose we expect an inflow of funds by time T and intend to use it
to buy an asset. We are worried the spot price will rise in the interim.
We hedge against this scenario by carrying out a long hedge:
At t = 0: buy a futures with exercise price X0 and expiry date T.
At t = T: sell (write) a futures contract with exercise price XT and
expiry date T. Then close out both contracts.
Exercise 4.5.2 Show that a long hedge is equivalent to buying the
asset at the price X0 at time T.
Like a short hedge, a long hedge completely eliminates risk. We
can skip the second step if we wish to actually acquire the security
itself, rather than just its value.
The analysis of risk associated with a long hedge exactly mirrors
that for a short hedge, and we leave it to you.
CROSS HEDGE
We have seen that it is possible to use futures to completely eliminate
risk, provided they are available for the asset whose value we wish to
hedge. In reality, futures are only available for those shares and
commodities with a high enough volume of trade (for example, on the
NSE in February 2008, futures were available for only 225 of
about 1000 listed stocks). In particular, they are not available for
individual assets such as a particular office building. Now, suppose
you are the owner of that building and wish to hedge against
fluctuations in its value. What can you do?
The answer is that you can’t do anything about the risks which are
specific to your building – such as the sudden discovery that it doesn’t
satisfy local safety rules. But it is possible to hedge against risks
arising out of general trends in the market or, more specifically, the
real estate market. This can be done by using futures which are
based on the stock of a prominent real estate company or on an index
which tracks the real estate sector.
This discussion leads us to the notion of a cross hedge, in which
the asset underlying the futures is not the same as the asset held (or
desired) by the investor. Let us denote by S the spot price of the asset
held (or desired) by the investor, and by S* that of the asset
underlying the futures.
Then the value (or cost) at time T is:
\[X_0 + S_T - X_T = X_0 + (S_T^* - X_T) + (S_T - S_T^*) = X_0 + (S_T - S_T^*).\]
In this case, risk is not completely eliminated. It is measured by the
additional basis \(S_T - S_T^*\). It will be low if S and S* are strongly and
positively correlated.
ROLLING HEDGE
We have noticed that we may not be able to execute a perfect hedge
because futures with the right underlying asset may not be available.
Another difficulty is that we may not be able to exactly match the
expiry time of the futures with the time of purchase, either because
contracts with the right expiry date are not available, or because we
are uncertain as to the exact date when we will wish to trade the
asset. Then we can use contracts with a longer expiry date (as
illustrated in the discussion on short hedges).
If contracts with a long enough life are not available, we can
implement a sequence of hedges using futures with shorter lives.
Such a rolling hedge will reduce risk but not eliminate it. The risk
arises from fluctuations in interest rates, since the futures used in the
later stages of the rolling hedge will not lock in today’s interest rate.
Example 4.5.3 Suppose the current annually compounded interest
rate is 12%. We wish to hedge an asset over the next 6 months, but
the maximum time to expiry of a futures is 3 months. We implement a
rolling hedge by using one futures for the first 3 months and another
for the next 3 months. At each stage, we lock in the current risk-free
rate. Now, suppose after 3 months the interest rate drops to 8%. Then
the overall growth of value of our hedged portfolio will be given by
(1 + 0.03)(1 + 0.02) = 1.0506.
This is equivalent to an annual growth of 10.12%.
□
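The arithmetic of Example 4.5.3 can be checked with a short sketch. It assumes, as the example does, that each quoted annual rate is applied pro rata over a quarter; the function name `rolling_growth` is ours and is only illustrative.

```python
def rolling_growth(annual_rates, period_years):
    """Growth factor from locking in each quoted annual rate for one period in turn.

    Rates are applied pro rata (rate * period), which is the convention
    used in Example 4.5.3.
    """
    growth = 1.0
    for rate in annual_rates:
        growth *= 1.0 + rate * period_years
    return growth


# Two consecutive 3-month hedges: 12% for the first quarter, 8% for the second.
growth = rolling_growth([0.12, 0.08], 0.25)

# Scale the 6-month gain to a year in the same simple (non-compounded) way.
annualised = (growth - 1.0) * 2

print(f"growth over 6 months: {growth:.4f}")
print(f"equivalent annual growth: {annualised:.2%}")
```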
OPTIMAL HEDGE RATIO
The short and long hedging strategies described so far assume that
convenient futures are available, and then they work perfectly. When
convenient futures are not available, we have to implement a cross or
rolling hedge, in which case we are able to reduce risk but not
eliminate it completely. We shall now undertake a statistical treatment
of the problem with the intention of quantifying the remaining risk and
using this knowledge to minimise it.
The approach is very general and does not assume the No
Arbitrage Principle, or any particular relationship between the hedged
asset and the futures used in the hedging process. Nor do we
necessarily hedge by using one futures for each unit of the asset.
Let δS be the change in the spot price of the hedged asset over
the hedge period and δX the change in the exercise price of the
futures contract. Consider these as random variables with respective
standard deviations σS and σX. Let ρ be the coefficient of correlation
of δS and δX. Finally, let h be the hedge ratio–the ratio of contracts
written to units of asset held (or sought). We do the calculations for a
short hedge for one unit of the asset.
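The change in value over the hedge period is \(\delta S - h\,\delta X\). The
variance of this change is:
\[\nu = \sigma_S^2 + h^2\sigma_X^2 - 2\rho\sigma_S\sigma_X h.\]
We wish to minimise \(\nu\) (since it represents risk). So we apply the first
derivative test and solve \(d\nu/dh = 2h\sigma_X^2 - 2\rho\sigma_S\sigma_X = 0\) to obtain
\[h = \rho\,\frac{\sigma_S}{\sigma_X}. \qquad (4.9)\]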
This value of h is called the optimal hedge ratio. The parameters ρ,
σS and σX have to be determined from the historical data.
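As an illustration of how equation (4.9) might be applied to historical data, here is a minimal sketch using NumPy. The data are simulated, not real market data, and the function names are our own. The sketch also checks numerically that moving h away from the estimated value does not lower the sample variance of the hedged position.

```python
import numpy as np


def optimal_hedge_ratio(dS, dX):
    """Estimate h = rho * sigma_S / sigma_X from paired historical changes."""
    dS = np.asarray(dS, dtype=float)
    dX = np.asarray(dX, dtype=float)
    sigma_S = dS.std(ddof=1)
    sigma_X = dX.std(ddof=1)
    rho = np.corrcoef(dS, dX)[0, 1]
    return rho * sigma_S / sigma_X


def hedged_variance(h, dS, dX):
    """Sample variance of the change in value, dS - h*dX, of the hedged position."""
    return np.var(np.asarray(dS) - h * np.asarray(dX), ddof=1)


# Illustrative (made-up) data: changes in the spot price partly driven by
# changes in the futures price, plus asset-specific noise.
rng = np.random.default_rng(0)
dX = rng.normal(0.0, 2.0, size=500)
dS = 0.8 * dX + rng.normal(0.0, 1.0, size=500)

h = optimal_hedge_ratio(dS, dX)
print("estimated optimal hedge ratio:", h)
print("variance unhedged:", np.var(dS, ddof=1))
print("variance at optimal h:", hedged_variance(h, dS, dX))
```

The estimate is only as good as the assumption that past co-movement of the two prices persists over the hedge period.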
Exercise 4.5.4 Suppose the No Arbitrage Principle holds and a short
hedge is carried out using a futures with the same underlying asset as
the asset being hedged. Show that the optimal hedge ratio equals 1.
4.6 CURRENCY FUTURES
Futures can be based on any quantity that changes with time. In this
section we will study currency futures, which are based on the
exchange rate of two currencies. Such futures are used to manage
the risk arising out of fluctuating exchange rates.
Currency futures were created in 1972 by the Chicago Mercantile
Exchange (CME). In India, both BSE and NSE introduced trading in
currency futures in late 2008.
In a currency futures, one currency plays the role of the underlying
asset while the other is used in pricing. For example, Indian currency
futures have the US Dollar as the underlying or base currency while
the Indian rupee is the variable currency used in the pricing. A
sample contract could have $1000 as the underlying asset with an
exercise price of Rs 45,000 and expiry in one month. This contract
effectively fixes an exchange rate of Rs 45 per dollar for the future
trade. The holder of this contract is obliged to pay its writer Rs 45,000
on expiry and will then receive $1000 in return. Alternately, the holder
may receive the Rupee equivalent of $1000 based on the market
exchange rate at the time of expiry. Suppose the actual exchange
rate after a month is Rs 46 per dollar. Then the holder receives Rs
46,000 from the writer and makes a net profit of Rs 1,000.
How should currency futures be priced? We start by noting that
the underlying currency must be treated as an asset that generates
income, since it can be deposited in its native country and earn
interest. This income is not known in terms of the variable currency
since the exchange rate fluctuates. Fortunately, it can be treated as a
case of known dividend yield.
Suppose a currency futures is written for one unit of the base
currency, with expiry at time T from now. Let one unit of the base
currency be worth S units of the variable currency at present. Further,
let \(r_b\) be the (continuously compounded) risk-free rate applicable to
the base currency, and let \(r_v\) be the corresponding rate for the
variable currency. Then over the time T one unit of the base currency
will become \(e^{r_b T}\) units, and so \(r_b\) is the dividend yield. Therefore, as
shown by equation (4.8), the correct exercise price for the contract is
\[X = S e^{(r_v - r_b)T}. \qquad (4.10)\]
Thus the current exchange rate S and the interest rates for the two
currencies determine the exchange rate X to be used in the futures.
Example 4.6.1 Consider a futures with US dollars as the base
currency and Indian rupees as the variable currency. The contract is
written on November 1 and expires on November 30. On November
1, we have the following information:
Exchange rate (rupees per dollar) = 47.71
US 1-month spot rate = 3%
Indian 1-month spot rate = 6%
Therefore, the contract will be based at the following rate:
\[47.71\,e^{(0.06-0.03)/12} = 47.83.\]
For example, a contract with $1000 as underlying would have
exercise price Rs 47,830.
□
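Equation (4.10) is easy to evaluate directly. The sketch below reproduces the rate of Example 4.6.1; the function name `currency_futures_price` is ours, and the rates are taken to be continuously compounded, as in the text.

```python
import math


def currency_futures_price(spot, r_variable, r_base, T):
    """Exercise price X = S * exp((r_v - r_b) * T) from equation (4.10).

    spot       : current price of one unit of base currency, in variable currency
    r_variable : continuously compounded risk-free rate for the variable currency
    r_base     : continuously compounded risk-free rate for the base currency
    T          : time to expiry in years
    """
    return spot * math.exp((r_variable - r_base) * T)


# Example 4.6.1: rupees per dollar, one month to expiry.
X = currency_futures_price(spot=47.71, r_variable=0.06, r_base=0.03, T=1 / 12)
print(f"futures exchange rate: Rs {X:.2f} per dollar")
print(f"exercise price for a $1000 contract: Rs {1000 * X:,.0f}")
```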
Exercise 4.6.2 Suppose the spot rates are given with discrete
compounding, there are m compounding periods per year, and the life
of the currency futures is one compounding period. Show that
equation (4.10) will be modified to
\[X = S\,\frac{1 + r_v/m}{1 + r_b/m}.\]
The relatively high interest rates in the Indian market make it
attractive to foreign investors. They would, however, be concerned
about whether high earnings in rupees would really translate into high
earnings in their native currencies—the risk would come from a
potential fall in the value of the Rupee. Currency futures provide a
way around this by enabling them to lock in the future exchange rate.
Example 4.6.3 A US-based company, let us call it XC, invests $10
million in Indian risk-free assets on November 1. The prevailing
interest and exchange rates are as described in Example 4.6.1. So,
the first thing that happens is that the dollars are converted into
Rs \(10 \times 10^6 \times 47.71\) = Rs 477.1 million. Over a month, these will become
Rs \(477.1 \times 10^6 \times e^{0.06/12}\) = Rs 479.5 million. If the exchange rate stays
stable, this can be reconverted into \(479.5 \times 10^6 / 47.71 = 10.05\) million dollars.
Of course, this is the same as earning 6% annually in dollars. Now,
suppose the exchange rate fluctuates and after a month we have Rs
50 per dollar. Then the final value in dollars is only
\(479.5 \times 10^6 / 50 = 9.59\) million: a loss!
XC can avoid the risk of loss by buying currency futures expiring in
a month. Recall that these fix the final rate as Rs 47.83 per dollar.
Assume each contract has to be in multiples of $1000 (this is the rule
followed by BSE). So XC can invest in
\[\frac{479.5 \times 10^6}{47.83 \times 1000} = 10,025\]
contracts. Then XC will finally end up with $10.025 million, an
annualised growth rate of 3%. (This was expected: if you remove all
risk, you have to end up with the risk-free rate, and for dollars the
risk-free rate was given to be 3%.)
□
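The numbers in Example 4.6.3 can be reproduced with the following sketch. The variable and function names are ours, and the hedge is assumed to be placed in whole contracts of $1000, ignoring the small rupee remainder.

```python
import math

SPOT = 47.71                 # rupees per dollar on November 1
R_INR, R_USD = 0.06, 0.03    # continuously compounded annual rates
T = 1 / 12                   # one month
CONTRACT_SIZE = 1000         # dollars per futures contract

# Rupee value of the $10 million investment after one month at the Indian rate.
rupees_at_expiry = 10e6 * SPOT * math.exp(R_INR * T)

# Exchange rate locked in by the futures, from equation (4.10).
locked_rate = SPOT * math.exp((R_INR - R_USD) * T)

# Whole number of contracts that the rupee proceeds can pay for.
contracts = int(rupees_at_expiry // (locked_rate * CONTRACT_SIZE))
dollars_hedged = contracts * CONTRACT_SIZE


def dollars_unhedged(final_rate):
    """Dollar value if the rupees are converted at the market rate at expiry."""
    return rupees_at_expiry / final_rate


print("contracts bought:", contracts)
print(f"hedged outcome: ${dollars_hedged:,.0f}")
for rate in (SPOT, 50.0):
    print(f"unhedged outcome at Rs {rate} per dollar: ${dollars_unhedged(rate):,.0f}")
```

Comparing the hedged figure with the two unhedged ones shows the trade-off discussed in the text: the hedge removes the loss at Rs 50 per dollar, but also gives up the extra gain available if the rate stays at Rs 47.71.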
In this example, XC was able to completely hedge against exchange
rate risk, but then it lost out on the hoped for gains from the higher
interest rate for Rupees. It has to let some element of risk remain.
One option for it would be to carry out only a partial hedge with a
smaller number of futures. Another approach could be to invest in
high-performing but risky Indian assets while continuing to hedge
against exchange rate risk.
4.7 STOCK INDEX FUTURES
A stock index futures has a stock index as the underlying asset.
Since stock indices track general trends, these futures are used to
hedge against systemic risk (risk due to movements of the market as
a whole, rather than of individual stocks).
A stock index futures works just like a futures with a stock as
underlying asset, except that at the expiry date a multiple of the value
of the stock index is delivered rather than the index itself (since it
would be impractical to deliver a portfolio of stocks mirroring the
index).
Example 4.7.1 On the Bombay Stock Exchange (BSE), futures on
the BSE Sensex index have a multiplier of 15. This means that the
Sensex is treated as an asset whose value in rupees is 15 times the
index value. Thus, suppose the index value on a particular date is
15,000 and a futures is being written on it with expiry in 3 months.
Assume a continuously compounded annual interest rate of 5%. Then
the exercise price of the futures will be calculated taking the current
spot price to be Rs 15 × 15,000 = 225,000.
\[X = 225,000\,e^{0.05/4} = 227,830.\]
Suppose that after 3 months the index stands at 16,000. Then its
value is taken as Rs 15 × 16,000 = 240,000. At this point, the holder
owes Rs 227,830 to the writer, and the writer owes Rs 240,000 to the
holder. Naturally, matters can be settled via a single payment of Rs
240,000 – 227,830 = 12,170 from the writer to the holder.
□
The multiplier m by which the index is multiplied to get a currency
value varies with the index as well as the exchange. We have already
noted that for the Sensex we have m = Rs 15 on the BSE (we would
say that Sensex futures are on ‘Rs 15 times the index’). Some other
examples are:
1. For S&P 500 futures on the Chicago Mercantile Exchange, m =
$500.
2. For S&P 500 futures on the London International Financial
Futures Exchange (LIFFE), m = £250.
3. The Nikkei 225 futures on LIFFE have m = £5.
In pricing or valuing an index futures, we have to consider whether
any income should be associated with the index. The answer
depends on the particular index and how it is calculated. Some stock
indices ignore the dividend payments from their members—these
should be treated as being without income. Others take the dividends
into account—these should be treated as having continuous dividend
yield, since the number of dividend payments is large. We will
develop our formulas for the first case. They are easily modified for
the second case by replacing the continuous risk-free rate r by r – q,
where q is the dividend yield for the index.
If we have a portfolio, we should track how it changes relative to
an index to determine how much hedging is needed. For example, if
our portfolio moves just as much as the index, we need less hedging
than if it moves twice as much.
In fact, to hedge a portfolio by a stock index futures, we can use
the optimal hedge ratio as calculated earlier (equation (4.9)):
\[h = \frac{\sigma_P}{\sigma_X}\,\rho_{PX},\]
where \(\sigma_P\) is the standard deviation of the changes in the portfolio
value, \(\sigma_X\) is the standard deviation of the changes in the exercise
price of the stock index futures, and \(\rho_{PX}\) is the correlation coefficient of
the two changes.
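We can rephrase this in terms of the index itself. Let \(I_t\) stand for
the value of the index at time t (after multiplying by the multiplier m),
and \(\delta I\) for the change in the value of the index over the time interval
[0,T]. We assume index futures are available expiring at T. Then
\(X_0 = I_0 e^{rT}\) and \(X_T = I_T\), so
\[\delta X = X_T - X_0 = I_T - I_0 e^{rT} = \delta I - I_0\,(e^{rT} - 1).\]
Therefore, \(\delta I\) and \(\delta X\) differ by a constant and have the same
standard deviation: \(\sigma_I = \sigma_X\). This
calculation also shows that \(\delta I\) and \(\delta X\) have the same correlation with
\(\delta P\): \(\rho_{PI} = \rho_{PX}\). Hence,
\[h = \frac{\sigma_P}{\sigma_I}\,\rho_{PI}.\]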
At this point, h starts to resemble the β in CAPM, with the stock index
standing in for the market portfolio, but based on return rather than
rate of return. If we adjust accordingly, we find:
\[h = \beta\,\frac{P_0}{I_0}, \qquad (4.11)\]
where \(P_0\) is the initial value of the portfolio.
Exercise 4.7.2 Verify the relationship between h and β.
Example 4.7.3 Consider a portfolio with a starting worth of 100
million. It is hedged using 6 month futures on an index whose current
value is 17,000 and dividend yield is 3%. Each futures is on 100 times
the index. The portfolio has β = 1.5 relative to this index. The risk-free
rate is 6%.
Based on this information, the optimal number of futures to be
shorted is
\[N = \beta\,\frac{P_0}{I_0} = 1.5 \times \frac{100,000,000}{100 \times 17,000} \approx 88.24.\]
Therefore, the portfolio managers short 88 index futures. The
exercise price of each of these futures is given by
\[X = I_0\,e^{(r-q)T} = 100 \times 17000 \times e^{(0.06-0.03)/2} = 1,725,692.\]
Suppose that over the next 3 months there is a fall in the market,
and the index falls to 15,000. Then the futures price becomes
\[X' = 100 \times 15000 \times e^{(0.06-0.03)/4} = 1,511,292,\]
and marking to market will compensate the portfolio managers to the
tune of
\[88 \times (1725692 - 1511292) = 18,867,200.\]
This will act as immediate cash compensation for the hit the portfolio
must also have taken from the fall in the market, and can be
reinvested in more stable assets.
□
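The calculations of Example 4.7.3 can be reproduced as follows. This is only a sketch: the function names are ours, the index value is multiplied by the contract multiplier as in the text, and the number of contracts is rounded down to 88 as in the example.

```python
import math


def contracts_to_short(beta, portfolio_value, index_level, multiplier):
    """Optimal number of index futures, N = beta * P0 / I0 (equation (4.11)),
    where I0 is the index level times the contract multiplier."""
    return beta * portfolio_value / (multiplier * index_level)


def index_futures_price(index_level, multiplier, r, q, T):
    """Exercise price of an index futures with dividend yield q and time T to expiry."""
    return multiplier * index_level * math.exp((r - q) * T)


N_exact = contracts_to_short(beta=1.5, portfolio_value=100e6,
                             index_level=17000, multiplier=100)
N = math.floor(N_exact)  # the example shorts 88 whole contracts

# Futures price at the start (6 months to expiry) and after the fall
# in the index (3 months to expiry).
X0 = index_futures_price(17000, 100, r=0.06, q=0.03, T=0.5)
X1 = index_futures_price(15000, 100, r=0.06, q=0.03, T=0.25)

# Gain to the short side from marking to market.
gain = N * (X0 - X1)

print(f"contracts: {N_exact:.2f} -> short {N}")
print(f"initial futures price: {X0:,.0f}")
print(f"futures price after the fall: {X1:,.0f}")
print(f"marking-to-market gain: {gain:,.0f}")
```

The gain computed here differs from the figure in the example by a few units, because the example rounds the two futures prices before subtracting.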
USE OF STOCK INDEX FUTURES TO ADJUST BETA
Suppose a portfolio P is hedged by shorting N stock index futures
which are written at time t = 0 and expire at t = T. Let us investigate
what effect this has on the beta of the portfolio over the time interval
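[0,T]. We shall use the following notation:
\(P_0, P_T\) = values of P at t = 0 and t = T,
\(I_0, I_T\) = values of the index (times the multiplier) at t = 0 and t = T,
\(r = (P_T - P_0)/P_0\) = rate of return on P,
\(r_I = (I_T - I_0)/I_0\) = rate of return on the index,
X = exercise price of the index futures.
Shorting N index futures creates a new portfolio P′ consisting of P and
the shorted futures. For this portfolio we create the following notation:
\(P'_0, P'_T\) = values of P′ at t = 0 and t = T,
r′ = rate of return on P′.
The value (to the writer) at t = T of an index futures written at t = 0 is
\(X - I_T\), while at t = 0 it is zero. Hence,
\[P'_0 = P_0, \qquad P'_T = P_T + N(X - I_T).\]
Therefore, the return on the new portfolio is
\[r' = \frac{P'_T - P'_0}{P'_0} = \frac{P_T - P_0 + N(X - I_T)}{P_0} = r + \frac{N}{P_0}(X - I_T).\]
Let us denote the beta of the original portfolio P by β. Since
\(I_T = I_0(1 + r_I)\) and X is a constant, the beta of the new portfolio is
\[\beta' = \frac{\mathrm{Cov}(r', r_I)}{\mathrm{Var}(r_I)} = \beta - N\,\frac{I_0}{P_0}.\]
If we wish to change β to a particular value β′, we can solve the above
equation for the number of contracts we should short:
\[N = (\beta - \beta')\,\frac{P_0}{I_0}. \qquad (4.12)\]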
Note that the optimal number N calculated in the previous section
reduces β to zero. Hence it eliminates systemic risk.
Example 4.7.4 Consider the portfolio in Example 4.7.3. Suppose that
at the start of the hedging period its managers wish to change its beta
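to β′ = 0.5. Then the number of futures to be shorted is (using
equation (4.12)):
\[N = (1.5 - 0.5) \times \frac{100,000,000}{100 \times 17,000} \approx 58.82.\]
So the managers actually short 59 futures, and this adjusts the beta
to
\[\beta' = 1.5 - 59 \times \frac{100 \times 17,000}{100,000,000} = 0.497.\]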
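The adjustment of beta in Example 4.7.4 can be sketched in code. We assume that equation (4.12) has the form N = (β − β′)P0/I0, where I0 is the index value times the multiplier. This form is an assumption on our part, but it reproduces the 59 contracts quoted in the example. The function names are ours.

```python
def contracts_for_target_beta(beta, target_beta, portfolio_value, index_value):
    """Number of index futures to short to move beta to target_beta.

    Assumes N = (beta - beta') * P0 / I0, with I0 the index value
    already multiplied by the contract multiplier.
    """
    return (beta - target_beta) * portfolio_value / index_value


def resulting_beta(beta, n_short, portfolio_value, index_value):
    """Beta of the portfolio after shorting n_short index futures."""
    return beta - n_short * index_value / portfolio_value


P0 = 100e6            # starting worth of the portfolio in Example 4.7.3
I0 = 100 * 17000      # index value times the multiplier

N_exact = contracts_for_target_beta(1.5, 0.5, P0, I0)
N = round(N_exact)    # only whole contracts can be shorted
beta_new = resulting_beta(1.5, N, P0, I0)

print(f"contracts: {N_exact:.2f} -> short {N}")
print(f"resulting beta: {beta_new:.3f}")
```

Setting the target beta to zero in `contracts_for_target_beta` gives back the optimal number of contracts of Example 4.7.3.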
5 Stock Price Models
One of the remarkable features of our treatment of futures in
the previous chapter was that we only used the current price
of the underlying asset. Expectations of its future behaviour
played no role, and so we did not need to model the future
fluctuations in the asset price. As we move to more complicated
derivatives, this is no longer the case. The No Arbitrage Principle
has to be combined with some model of price fluctuations in order to
study the pricing and behaviour of the derivative. This chapter,
therefore, is devoted to describing the basic models for the evolution
of stock prices.
The first mathematical treatment of continuously fluctuating
prices was carried out by Louis Bachelier (1870–1946) in Paris. In
his doctoral thesis titled Théorie de la Spéculation [4], published in
1900, he treated bond and stock prices as continuous, but random,
functions of time and developed what is now called additive
Brownian motion. He applied his theory to estimate prices for options
and to evaluate the risk involved in various investment strategies.
His work was not immediately appreciated, but over time it has
influenced important innovators in both mathematics and economics.
He continued his research on Brownian motion throughout his life, with
his last major contribution coming in 1941.
Among mathematicians who have acknowledged his influence,
we may mention A. N. Kolmogorov and Kiyosi Ito. Kolmogorov
developed Bachelier’s ideas in creating the general theory of Markov
processes in the 1930s, while Ito was led to his famous results in
stochastic calculus. Curiously, while these further developments
were not motivated by finance, they have turned out to be perfectly
suited for it and terms such as Martingales, stochastic differential
equations, and Ito’s lemma, are now part of the basic vocabulary of
financial analysts.
Bachelier was noticed by economists in the 1950s. In particular,
his thesis was read by Paul Samuelson, who improved upon his
work and suggested the Lognormal model (or geometric Brownian
motion) for stock prices. Bachelier’s treatment of options in many
ways also anticipates the work of Fischer Black, Myron Scholes and
Robert Merton in the 1970s. In appreciation of Bachelier’s work, he
is now often called the father of mathematical finance.
This chapter makes extensive use of the Binomial, Normal and
Lognormal random variables. You should revise the relevant
sections of the Appendix (§B.4 to B.7) prior to reading it.
5.1 LOGNORMAL MODEL
We shall develop a way to model the evolution of stock prices over a
time period [0,T], from an initial value S to a final value ST. The
model is probabilistic: it treats ST as a random variable and gives its
probability distribution.
If the price change has no randomness, we have a risk-free
situation, and then the price must grow at the risk-free rate:
\[S_T = S e^{rT}.\]
To introduce randomness into the rate of return, we consider the
following model:
\[S_T = S e^{\mu T + c_T Z},\]
where Z is a standard normal variable, and μ and \(c_T\) are some
constants (parameters of the stock). The idea is that μ represents a
steady trend, to which the \(c_T Z\) term adds random fluctuations. We
have included a dependence on T since the variability measured by
\(c_T\) must depend on T; intuitively, we expect it to grow with T.
Exercise 5.1.1 If Z is a standard normal variable, then \(E[e^{cZ}] = e^{c^2/2}\).
Therefore, in our model the random terms cause a steady increase
in the expected return, since
\[E[e^{c_T Z}] = e^{c_T^2/2} > 1.\]
We wish to have a model where the random terms do not contribute
any regular growth. Hence, we adjust it as follows:
\[S_T = S\,e^{\mu T}\,e^{c_T Z - c_T^2/2}.\]
To explore the dependence on T, let us track the changes in the
stock price over successive intervals, [0,T] and [T,2T]. We let Z1 and
Z2 be standard normal variables representing the random
fluctuations in these two intervals. On applying our model to these
intervals in succession, we find:
\[S_{2T} = S_T\,e^{\mu T + c_T Z_2 - c_T^2/2} = S\,e^{\mu(2T) + c_T(Z_1 + Z_2) - c_T^2}.\]
Now we make our main assumption: the random fluctuations over
non-overlapping time intervals are given by independent random
variables. (We say two intervals overlap if they have more than one
point in common. Thus, [0,1] overlaps with [0.5, 2] but not with [2,3]
or [1,2].)
Then, \(Z_1 + Z_2\) is again a normal variable, with mean 0 and
variance 2 (see §B.11 and specifically, Exercise B.11.4). Therefore,
\[S_{2T} = S\,e^{\mu(2T) + \sqrt{2}\,c_T Z - c_T^2},\]
where Z is standard normal. On the other hand, if we treat [0,2T] as
a single interval, we get
\[S_{2T} = S\,e^{\mu(2T) + c_{2T} Z - c_{2T}^2/2}.\]
To make our answers match, we must have \(c_{2T} = \sqrt{2}\,c_T\). This is
achieved by setting
\[c_T^2 = \sigma^2 T,\]
where \(\sigma^2\) is a positive constant. Thus, our final model gives the
following expression for the spot price at T:
\[S_T = S_0\,e^{(\mu - \sigma^2/2)T + \sigma W_T}, \qquad (5.1)\]
where WT is normal with mean 0 and variance T. The parameters μ
and σ are said to represent drift and volatility, respectively. The
model is called Lognormal because it is based on the lognormal
distribution.
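To get a feel for equation (5.1), one can sample the final price directly. The sketch below uses NumPy with illustrative parameter values of our own choosing. It checks two consequences of the model: the log of the price ratio has mean (μ − σ²/2)T and variance σ²T, and the adjustment term −σ²T/2 makes the expected price grow exactly at the rate μ.

```python
import numpy as np


def simulate_terminal_prices(S0, mu, sigma, T, n_paths, seed=0):
    """Draw samples of S_T = S0 * exp((mu - sigma^2/2) T + sigma W_T),
    where W_T is normal with mean 0 and variance T."""
    rng = np.random.default_rng(seed)
    W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
    return S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)


# Illustrative parameters (not taken from market data).
S0, mu, sigma, T = 100.0, 0.2, 0.3, 0.5
S_T = simulate_terminal_prices(S0, mu, sigma, T, n_paths=200_000)

log_ratio = np.log(S_T / S0)
print("mean of ln(S_T/S0):", log_ratio.mean(), "model:", (mu - 0.5 * sigma**2) * T)
print("variance of ln(S_T/S0):", log_ratio.var(), "model:", sigma**2 * T)
print("mean of S_T:", S_T.mean(), "model:", S0 * np.exp(mu * T))
```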
Exercise 5.1.2 An analyst estimates XC stock as having an annual
drift μ = 0.2 and volatility σ = 0.3. What is the probability of a 20%
return over the next 6 months?
Exercise 5.1.3 Assuming the Lognormal model with parameters μ
and σ, show that
Expected return after Δt = \(S(e^{\mu\Delta t} - 1)\),
Variance of the return after Δt = \(S^2 e^{2\mu\Delta t}(e^{\sigma^2\Delta t} - 1)\).
Therefore, to first order (see Appendix, page 197), we have
Expected return after Δt ≐ \(S\mu\Delta t\),
Variance of the return after Δt ≐ \(S^2\sigma^2\Delta t\).
Exercise 5.1.4 Verify the first order approximations given above.
Remark.
For simplicity, we sometimes use the first order
approximation to the Lognormal model:
\[S_T \doteq S\,(1 + (\mu - \sigma^2/2)T + \sigma W_T).\]
Figure 5.1 shows examples of data evolving according to the
Lognormal model with varying values of μ and σ. On comparing the
graphs (a) and (b), we see that an increase in σ creates paths with
larger jumps, leading to faster variation in values. A comparison of
(b) and (c), on the other hand, shows that a change in μ creates
paths with the same speed of variation. The difference is that paths
created when μ is higher will, on average, show a greater long-term
upward trend. In other words, when we increase μ, the possible
paths are the same but upward tending ones become more likely.
The practical implication of this is that when we observe just one
path coming out of a geometric Brownian motion (for example, the
actually observed prices of a particular stock), we can hope to use it
to estimate σ, since σ affects the nature of individual paths. But we
cannot hope to get a good estimate of μ. We say that σ is
observable, but μ is not (see Example 5.2.2).
5.2 GEOMETRIC BROWNIAN MOTION
The Lognormal model can be rephrased to describe the evolution of
stock prices over any time interval [a,b] as follows:
\[S_b = S_a\,e^{(\mu - \sigma^2/2)(b-a) + \sigma W_{[a,b]}},\]
where \(W_{[a,b]}\) is normal with mean 0 and variance b – a.
It is worth identifying certain features of this model:
1. \(\ln(S_b/S_a)\) is normal with mean \((\mu - \sigma^2/2)(b-a)\) and variance \(\sigma^2(b-a)\).
2. \(\ln(S_b/S_a)\) depends only on b – a, and not on a itself.
This kind of behaviour is known as Geometric Brownian Motion or
GBM.
Exercise 5.2.1 Let Xt be the futures price for a contract written at
time t and expiring at T. If the spot price of the underlying asset
follows GBM with parameters μ and σ, show Xt follows GBM with
parameters μ – r and σ.
ESTIMATING GBM PARAMETERS
Suppose we have regular samples \(S_i\) (i = 0,1,…,n) of spot prices
gathered over consecutive time intervals of length Δt each. Then, we
first form the new sequence
\[U_i = \ln\frac{S_i}{S_{i-1}}, \qquad i = 1,\dots,n.\]
Figure 5.1: Each graph shows five simulations of the Lognormal
model with varying values of drift μ and volatility σ: (a) μ = 0.1 and σ
= 0.2, (b) μ = 0.1 and σ = 0.1, (c) μ = 0 and σ = 0.1.
The \(U_i\) are called log returns, since they can also be written as
\(\ln(S_i) - \ln(S_{i-1})\), i.e., as the returns of the logs of the prices.
According to GBM,
\[U_i = (\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t},\]
where \(W_{\Delta t}\) is normal with mean 0 and variance Δt. Therefore,
\[U_i \sim N\big((\mu - \sigma^2/2)\Delta t,\ \sigma\sqrt{\Delta t}\big).\]
Moreover, the \(U_i\) are independent so that \(U_1,\dots,U_n\) can be seen as a
random sample of size n of a normal variable with mean \((\mu - \sigma^2/2)\Delta t\)
and variance \(\sigma^2\Delta t\).
Let \(\bar{U}\) and \(S^2\) denote the sample mean and variance, respectively,
of the observations of the \(U_i\). If the actually observed values of the
spot prices are \(s_i\), then the observed values of the log returns are
\(u_i = \ln(s_i/s_{i-1})\) and the observed sample mean and variance are
\[\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i - \bar{u})^2.\]
Then we obtain estimates of μ and σ by solving the approximate
equalities:
\[(\mu - \sigma^2/2)\Delta t \approx \bar{u}, \qquad \sigma^2\Delta t \approx s^2.\]
This gives:
\[\mu \approx \frac{\bar{u} + s^2/2}{\Delta t}, \qquad \sigma \approx \frac{s}{\sqrt{\Delta t}}.\]
Figure 5.2: The first chart presents the daily closing prices of the
BSE Sensex index over 8 years from July 1997 to July 2005. On
fitting the Lognormal model to this data, we obtain μ = 0.06 and σ =
0.29. The second chart is a simulation of the lognormal model with
the same parameter values.
We had noted earlier that σ is observable, but μ is not. This indicates
that of these two estimates, only the one of σ should be reliable.
Consider the following example.
Example 5.2.2 Let us consider a GBM with μ = 0.2 and σ = 0.3.
(These are quite typical values for stocks.) Suppose we first simulate
a path for this GBM and then apply the above estimators to this path.
We hope to recover the original μ and σ, not perfectly but reasonably
well. Table 5.1 shows the result of doing this 10 times, with paths of
520 steps each (corresponding to taking weekly prices over 10
years).
Table 5.1
Notice how the μ estimates jump all over the place from 0.028 to
0.305, but those for σ are stable and accurate.
□
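An experiment of the kind described in Example 5.2.2 can be sketched as follows. This is not the computation behind Table 5.1: the random seed and the function names are ours, so the individual numbers will differ. The qualitative behaviour, however, should be the same: the σ estimates cluster tightly around 0.3 while the μ estimates scatter widely.

```python
import numpy as np


def simulate_gbm_path(S0, mu, sigma, dt, n_steps, rng):
    """Simulate prices S_0, ..., S_n of a GBM sampled at intervals of length dt."""
    log_steps = (mu - 0.5 * sigma**2) * dt \
        + sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    return S0 * np.exp(np.concatenate(([0.0], np.cumsum(log_steps))))


def estimate_gbm(prices, dt):
    """Estimate (mu, sigma) from the sample mean and variance of the log returns."""
    u = np.diff(np.log(prices))        # log returns
    u_bar = u.mean()                   # sample mean
    s2 = u.var(ddof=1)                 # sample variance
    sigma_hat = np.sqrt(s2 / dt)
    mu_hat = (u_bar + 0.5 * s2) / dt
    return mu_hat, sigma_hat


rng = np.random.default_rng(1)
dt = 1 / 52                            # weekly prices
estimates = [
    estimate_gbm(simulate_gbm_path(100.0, 0.2, 0.3, dt, 520, rng), dt)
    for _ in range(10)                 # ten independent 10-year paths
]
mu_hats, sigma_hats = (np.array(column) for column in zip(*estimates))

for m, s in zip(mu_hats, sigma_hats):
    print(f"mu estimate: {m:6.3f}   sigma estimate: {s:.3f}")
```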
Data from stock exchanges has to be cleaned up before it can be
used in these calculations for the following reasons:
1. Since an exchange is closed on the weekends and on certain
holidays, the data is not entirely gathered at regular intervals. If
we wish to analyze daily data, we have to remove those Ui
which correspond to a gap of more than 1 day.
2. Some changes in the stock price are not to be taken literally. For
example, on July 1, 2004, Infosys shares fell by 76% on the
NSE. This was caused simply by Infosys issuing 3 bonus shares
for every share that already existed. The number of shares went
up by a factor of 4 and so the price fell to a quarter.
3. Another reason for a sudden drop in price can be the
announcement of a dividend payment. In the days before its
anticipated announcement prices rise because buyers expect an
imminent extra profit. As soon as the payment is made, prices
fall accordingly.
To give a true picture of changes in the value of a stock,
exchanges release an adjusted price which takes bonus issues and
dividend payments into account. It is this adjusted price which should
follow GBM.
5.3 SUITABILITY OF GBM FOR STOCK PRICES
It is evident from its definition that the evolution of a GBM during
some time span [a, b] is independent of the occurrences before a (all
that matters is the price Sa at a). It is not obvious that stock prices
have this property, and some caution in applying GBM is indicated.
Apart from independence of jumps, the other feature of GBM is
the use of the normal distribution to represent the individual jumps.
Example 5.3.1 In Figure 5.3 we plot a histogram for the frequency
distribution of the daily log returns \(U_i = \ln(S_i/S_{i-1})\) for Infosys shares
over 2001–2003. We have superimposed the frequency distribution
of the normal distribution with the same mean and variance as this
data. The histogram is more peaked in the centre than the normal
distribution, and does not die out as quickly on either side. There are
some highly negative values which would be extremely unlikely if a
normal distribution were indeed being followed. In fact, the
probability of the corresponding normal distribution taking values
less than –0.2 is only 0.000002, while in reality there were 3 such
values (giving a relative frequency of about 0.005).
Figure 5.3: The histogram shows the relative frequency distribution
of the daily log returns for Infosys stock from January 2001 to
December 2003. This is compared with a normal distribution (solid
curve) and a Cauchy distribution (dashed curve). See Example
5.3.1.
Therefore, one can consider replacing the normal distribution with
one which does not die so quickly on each side. Figure 5.3 also
shows a Cauchy distribution with the same median and interquartile
range as the given data, which is visibly a much better fit than the
normal distribution.
□
This example turns out to be quite typical. In general, large price
fluctuations are more common than a normal distribution would
suggest. A heavy tailed distribution, such as the Cauchy
distribution, would be more suitable. Unfortunately, heavy tailed
distributions are mathematically difficult to work with—for instance,
their mean and variance generally do not exist. (See §B.18 for more
information on such distributions.) And even the Cauchy distribution
fails in one respect. It is symmetric, while the data is typically slightly
asymmetric—large falls occur more often than large increases.
A final problem with GBM is the assumption of a constant
volatility. In reality, there are alternating periods of calm and
turbulence, as illustrated by Figure 5.4. There are models that take
this into account by letting the volatility change randomly with time,
but with clustering of high and low values. Popular models of this
type are the ARCH and GARCH families. (ARCH was created by
Robert Engle[18, 19] in 1982; he received the 2003 Nobel Prize).
Figure 5.4: Twenty years of monthly log returns of the Dow Jones
Industrial Average, showing extended periods of low or high volatility
We shall, therefore, use GBM in our work with the understanding
that it doesn’t give the closest approximation to reality, but at least it
gives one that we can easily manipulate to get useful insights.
Overall, while GBM has its limitations, it remains a favourite model.
We shall soon apply it to obtain the Black–Scholes model for options
pricing.
5.4 BINOMIAL TREE MODEL
The Binomial tree model simulates stock price movements by
conceiving them as a sequence of small up or down jumps. The
basic building block of this model is the following branch:
Here, we have started with a spot price of S and assumed that
over some small time interval Δt it can either move up by a factor U
to SU, or down by a factor D to SD. Further, we let p denote the
probability of an up move, and 1 – p that of a down move.
The price movements over some time interval [0,T] are modelled
as a sequence of such steps. We break it into n equal intervals of
length Δt = T⁄n each, and let the price follow one of the up or down
branches over each subinterval. We assume each step is
independent of the previous ones, and that the parameters p, U and
D are the same for each branch (see Figure 5.5).
The possible prices at T are \(S U^k D^{n-k}\), with k = 0,1,…,n. They
follow a binomial distribution:
\[P[S_T = S U^k D^{n-k}] = \binom{n}{k} p^k (1-p)^{n-k}.\]
To estimate the parameters of the binomial tree model, we match
it with Geometric Brownian motion. Recall that according to that
model, the spot price at time Δt is given by
\[S_{\Delta t} = S_0\,e^{(\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t}},\]
where \(W_{\Delta t}\) is normal with mean 0 and variance Δt. Thus,
\[\ln S_{\Delta t} = (\mu - \sigma^2/2)\Delta t + \sigma W_{\Delta t}\]
if we set the starting price \(S_0\) to 1. Then we have,
\[E[\ln S_{\Delta t}] = \nu\Delta t, \qquad \mathrm{Var}[\ln S_{\Delta t}] = \sigma^2\Delta t,\]
where we have set \(\nu = \mu - \sigma^2/2\). Now consider a one-step binomial
tree for stock prices where the step is over the time Δt. The up move
is taken with a probability p, and the down move with a probability 1
– p. Let us also write u = ln U and d = ln D. Then (see Exercise
5.4.1),
\[E[\ln S_{\Delta t}] = pu + (1-p)d, \qquad \mathrm{Var}[\ln S_{\Delta t}] = p(1-p)(u-d)^2.\]
Figure 5.5: The binomial tree model for stock prices
Exercise 5.4.1 Consider a random variable X taking values a and b
with probabilities p,1 – p respectively. Show that E[X] = pa + (1 – p)b
and Var[X] = p(1 – p)(a – b)2.
On matching the calculations from the two models, we get
$$pu + (1-p)d = \nu\Delta t, \qquad (5.4)$$
$$p(1-p)(u-d)^2 = \sigma^2\Delta t. \qquad (5.5)$$
Since we have two equations for three variables, we have some
freedom to choose convenient solutions. Thus, we set $U = D^{-1}$,
which makes the binomial tree symmetric in that an up move exactly
cancels a down move. This gives u = –d. Then equations (5.4) and
(5.5) become
$$(2p-1)u = \nu\Delta t, \qquad (5.6)$$
$$4p(1-p)u^2 = \sigma^2\Delta t. \qquad (5.7)$$
We square (5.6) and add it to (5.7) to get
$$u^2 = \sigma^2\Delta t + (\nu\Delta t)^2.$$
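Thus, we have the following values for the three parameters:
$$U = e^{\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}, \qquad D = e^{-\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}, \qquad p = \frac{1}{2} + \frac{1}{2}\,\frac{\nu\Delta t}{\sqrt{\sigma^2\Delta t + (\nu\Delta t)^2}}.$$
For small Δt, we obtain first-order approximations by keeping only
the lowest power of Δt. Thus, we neglect $(\Delta t)^2$ in favour of Δt, and Δt
in favour of $\sqrt{\Delta t}$. This leads to the following popular estimates (recall
that Δt = T⁄n):
$$U \approx e^{\sigma\sqrt{T/n}}, \qquad (5.8)$$
$$D \approx e^{-\sigma\sqrt{T/n}}, \qquad (5.9)$$
$$p \approx \frac{1}{2} + \frac{1}{2}\,\frac{\nu}{\sigma}\sqrt{T/n}. \qquad (5.10)$$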
An interesting aspect of these estimates of U and D is that they do
not involve the drift! This is a virtue as the jumps in the underlying
tree now depend only on σ, which is observable. Thus, the possible
paths for the price are independent of μ; its role is only in
determining the probability associated with a path.
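As a numerical check of the matching argument, the Python sketch below computes both the exact parameters (from $u^2 = \sigma^2\Delta t + (\nu\Delta t)^2$ and (5.6)) and the first-order estimates. It assumes the first-order probability has the form $p \approx 1/2 + (\nu/2\sigma)\sqrt{\Delta t}$, which is what (5.6) gives when u is replaced by $\sigma\sqrt{\Delta t}$. The function name and the sample inputs are illustrative choices.

```python
from math import exp, log, sqrt

def tree_parameters(mu, sigma, T, n):
    """Return ((U, D, p) exact, (U, D, p) first-order) for an n-step tree."""
    dt = T / n
    nu = mu - sigma**2 / 2                    # drift of ln S under GBM
    u = sqrt(sigma**2 * dt + (nu * dt)**2)    # exact log-jump
    exact = (exp(u), exp(-u), 0.5 + 0.5 * nu * dt / u)
    u1 = sigma * sqrt(dt)                     # first-order log-jump
    approx = (exp(u1), exp(-u1), 0.5 + 0.5 * (nu / sigma) * sqrt(dt))
    return exact, approx

# Sample inputs (assumed for illustration): T = 1/4, mu = 0.2, sigma = 0.3, n = 20.
exact, approx = tree_parameters(mu=0.2, sigma=0.3, T=0.25, n=20)
U, D, p = exact
u = log(U)
mean_log = (2 * p - 1) * u          # left side of (5.6); should equal nu * dt
var_log = 4 * p * (1 - p) * u**2    # left side of (5.7); should equal sigma^2 * dt
print("exact      :", exact)
print("first-order:", approx)
```

With the exact parameters the two moments are matched to rounding error; the first-order estimates differ from the exact ones only in later decimal places when Δt is small.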
At this stage we have two models for stock prices, one
continuous and the other discrete. We shall soon see that
sometimes one model is convenient, sometimes the other.
Therefore, we wish to be reassured that the two models are
consistent with each other. It turns out that if we let the time step go
to zero in the binomial tree model, it tends towards GBM. We shall
not prove this. Figure 5.6 illustrates this by an example. The first
diagram shows that even for n = 20 the probability distribution of the
final stock price, under the binomial tree approach, has a distinctly
lognormal look. We confirm this in the second diagram by comparing
the cumulative distribution functions of the final stock price under the
two models. (We have taken T = 1⁄4. The GBM model has μ = 0.2
and σ = 0.3. The parameters for the Binomial Tree have been
calculated from the first-order estimates (5.8) to (5.10), using n =
20.)
Figure 5.6: Comparison of binomial tree and GBM models. Graph
(a) is the pdf of the price distribution under a binomial tree model
with n = 20. Graph (b) compares the cdf of this price distribution with
a matching GBM model.
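The comparison of the two cumulative distribution functions can be sketched in Python as follows. The code evaluates both cdfs at the terminal prices of the tree and reports the largest gap; it uses the first-order estimates for U, D and p, with p taken as $1/2 + (\nu/2\sigma)\sqrt{\Delta t}$ (an assumed form of the estimate). The function names are hypothetical, and the second run with n = 200 is added only to illustrate the gap shrinking as the time step decreases.

```python
from math import comb, erf, exp, log, sqrt

def gbm_cdf(x, S0, mu, sigma, T):
    """P[S_T <= x] under GBM, where ln S_T is normally distributed."""
    z = (log(x / S0) - (mu - sigma**2 / 2) * T) / (sigma * sqrt(T))
    return 0.5 * (1 + erf(z / sqrt(2)))

def max_cdf_gap(mu, sigma, T, n, S0=1.0):
    """Largest |binomial cdf - GBM cdf| over the terminal prices of the tree."""
    dt = T / n
    nu = mu - sigma**2 / 2
    U = exp(sigma * sqrt(dt))
    D = exp(-sigma * sqrt(dt))
    p = 0.5 + 0.5 * (nu / sigma) * sqrt(dt)
    gap, cumulative = 0.0, 0.0
    for k in range(n + 1):                     # prices increase with k
        cumulative += comb(n, k) * p**k * (1 - p)**(n - k)
        price = S0 * U**k * D**(n - k)
        gap = max(gap, abs(cumulative - gbm_cdf(price, S0, mu, sigma, T)))
    return gap

gap_20 = max_cdf_gap(mu=0.2, sigma=0.3, T=0.25, n=20)
gap_200 = max_cdf_gap(mu=0.2, sigma=0.3, T=0.25, n=200)
print(gap_20, gap_200)
```

Because the binomial cdf is a step function, the gap at any lattice point cannot vanish for finite n, but it decreases as n grows.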
12 English translations of Bachelier’s thesis have been published in [11, 17].
For a description of his work and its influence, see [49].
6 Options
Options have a long history. One of the early references is to
the philosopher Thales of Miletus (circa 600 BC), who made a
fortune through options on olive presses. According to
Aristotle [2], one winter Thales paid a small fee for the first right to
rent these presses during the olive season. If the harvest was good
and the presses were in demand (they are used to produce olive oil),
Thales planned to exercise his options and sublet the presses at a
much higher rent. If the harvest was poor and nothing was to be
gained from renting the presses, he could let his right lapse losing
only the initial fee. As it happened, the harvest was good, and Thales
made a large profit.
Incidentally, Thales was also credited by later generations with
introducing deduction into geometry and providing the first proofs of
basic results such as the ASA rule for congruence of triangles.
Aristotle said of Thales that for him, “The primary question was not
What do we know, but How do we know it.”
The first detailed mathematical treatment of options was in
Bachelier’s 1900 thesis [4], where he used his Brownian motion
model to estimate the risks involved with various investment
strategies based on options. His work was not significantly improved
till the 1973 publications of Fischer Black, Myron Scholes and Robert
Merton [6, 32]. These described what is now called the
Black−Scholes Model and gave an extensive analysis of options
pricing and the use of options in hedging and speculation. The model
came into being just as the use of computers was becoming
widespread and the result was an explosion in the use of options.
Scholes and Merton received the Nobel Prize in 1997, Black having
unfortunately passed away two years earlier.
In this chapter we shall carry out a study of options and their use
in hedging. However, instead of the Black–Scholes model, we shall
use a simplified discrete version known as the Binomial Options
Pricing Model or BOPM. BOPM was introduced in 1978 by William
Sharpe and extended in 1979 by John Cox, Stephen Ross and Mark
Rubinstein [15].
Figure 6.1: The structure of a European call option signed at time t = 0 with
expiration time t = T, exercise price X, and call premium C. The dashed arrows
at the t = T stage indicate that the trade is optional.
6.1 CALL OPTIONS
In forwards and futures, both parties are committed to a future trade.
Naturally, each would desire the chance to drop out if the gain from
the contract becomes less than is available on the open market. This
desire creates an opening for a new kind of contract, in which one
party pays the other a fee for the right to cancel the trade. Such a
contract is called an option and comes in two flavours, depending on
which party buys the right to cancel.
NSE introduced trading in options in June 2001. The number of
traded options grew from 1.2 million during 2001–02 to 18.2 million
during 2005–06.
In a European Call Option (Figure 6.1), the holder buys the right to
make a future purchase, without the obligation to do so:
1. The holder pays the writer an initial fee (the call premium) to buy
the contract. The contract details an amount of the underlying
asset, an expiration date, and an exercise price (or strike
price).
2. On the expiration date, the holder may pay the writer the exercise
price.
3. If the holder pays up, the writer must deliver the specified amount
of the underlying asset.
If the exchange happens, we say the contract has been exercised.
Example 6.1.1 On June 3, 2005, the closing price for TISCO stock on
NSE was Rs 349.70. Call options on this stock were available with a
variety of exercise prices and expiry dates. Table 6.1 shows some of
the available exercise prices (X) for calls expiring on June 30, 2005,
as well as the premium (C) at which they could have been purchased.
The closing price for TISCO stock on June 30 was Rs 339.75.
Table 6.1: The table shows the closing call premiums (C) on June 3,
2005, for a range of exercise prices (X) that were available for call
options on TISCO stock. The options expired on June 30, 2005.
(Source: NSE)
Suppose that on June 3 you had bought a call with exercise price
Rs 320 at the closing price of Rs 29.75. On June 30 you would be
able to buy a stock worth Rs 339.75 for just Rs 320. Unfortunately,
you had already paid Rs 29.75 for this opportunity, so you end up with
a total loss of Rs 10.
You would come out ahead if the final stock price were above Rs
349.75 (ignoring the possibility of earning interest). Finally, had the
June 30 price dropped to below Rs 320, you would not exercise the
option, since exercising it would cause further loss.
□
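The arithmetic of this example can be written as a small Python sketch. The function name is a hypothetical choice, and, as in the example, interest on the premium is ignored.

```python
def call_profit(S_T, X, premium):
    """Net profit to the call holder at expiry, ignoring interest on the premium."""
    payoff = max(0.0, S_T - X)      # the call is exercised only if S_T > X
    return payoff - premium

# Figures from the example: X = 320, premium 29.75, final price 339.75.
loss = call_profit(339.75, 320.0, 29.75)
break_even_price = 320.0 + 29.75    # final price at which the profit is zero
print(loss, break_even_price)
```

For a final price below Rs 320 the function returns the negative of the premium, which is the most the holder can lose.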
Exercise 6.1.2 Suppose two call options are identical, except that
one has a higher exercise price. Which one will have a higher call
premium?
A variation on the above structure is an American Call Option –
such a contract can be exercised at any time between its birth and
expiry. Thus the holder of an American call can either exercise it at
any time by paying the exercise price X, or let it lapse. Since an
American call gives the holder more rights than a European call, it is
obvious that it will have at least as high a premium. In other words, let
CA be the premium of an American call and CE the premium of a
European call such that both calls have the same underlying asset,
expiry time and exercise price. Then CA ≥ CE.
A call option can be written on various kinds of assets, e.g.,
commodities, bonds, stocks, stock indices, and futures. It can be
traded through an exchange or directly. When a call is traded on an
exchange, the exchange standardises the expiry times, the amounts
of underlying asset, and the exercise price. The exchange also
arranges a process of marking-to-market (as for futures) to protect
against default risk.
Like futures, options can be traded at any time of their life to new
holders. The price at which they are sold is again called their
premium. The task before us is to find the correct premium at which
a call should be sold. To do this, we have to establish how good a
deal is being offered to the holder.
Figure 6.2 illustrates how the final payoff to the holder of a
European call option depends on the spot price of the underlying
asset at the expiry time. If the final spot price ST is greater than the
exercise price X, the holder exercises the call and profits by ST – X.
Otherwise, she does not exercise it (since it is cheaper for her to buy
the asset by paying ST in the market) and the final payoff is zero.
Thus the final payoff is given by
max{0, ST − X},
and this is also the premium of the call at time T.
Figure 6.2: The payoff to the holder of a European call option on its expiry at T
as a function of the final spot price ST of the underlying asset
Since the final payoff is guaranteed to be non-negative, both
common sense and the No Arbitrage Principle dictate that the call
premium must be non-negative:
C ≥ 0.
(6.1)
Another insight comes from observing that the holder of a European
call has a superior final payoff as compared to the holder of a futures
with the same asset, expiry time and exercise price. Therefore, the
call premium must be at least as high as the value of the corresponding
futures. If the asset generates no income or cost, we get:
$$C \ge S - Xe^{-rT}. \qquad (6.2)$$
Combining the bounds (6.1) and (6.2) we get the following lower bound for
the premium of a European call:
$$C \ge \max\{0,\, S - Xe^{-rT}\}. \qquad (6.3)$$
Exercise 6.1.3 Write down the form the inequality (6.3) will have if the
underlying asset generates either known income or a continuous
dividend yield.
Exercise 6.1.4 Show that the bound (6.3) is also valid for an
American call on an asset without income.
An upper bound on C can be obtained if it is known that the asset
price cannot be negative. In this case, the payoff from the call is
always less than or equal to ST . Hence the value of the call is less
than the value of owning the asset:
C ≤ S.
(6.4)
For example, this inequality is valid when the underlying asset is a
stock (whose price cannot be negative) but not when it is a futures
(whose value can be negative).
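The two bounds can be collected in a short Python sketch. It assumes an asset without income whose price cannot be negative, so that both the lower bound $\max\{0, S - Xe^{-rT}\}$ and the upper bound S apply; the function name and the sample inputs are illustrative, not from the text.

```python
from math import exp

def european_call_bounds(S, X, r, T):
    """Return (lower, upper) no-arbitrage bounds for a European call premium.

    Assumes the asset pays no income and its price cannot be negative.
    """
    lower = max(0.0, S - X * exp(-r * T))
    upper = S
    return lower, upper

# Hypothetical inputs: spot 100, exercise price 90, r = 5%, six months to expiry.
lower, upper = european_call_bounds(S=100.0, X=90.0, r=0.05, T=0.5)
print(lower, upper)
```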
Figure 6.3: This graph compares the premiums of calls on Maruti stock (stars)
with the lower bound given by (6.3) (diamonds). The horizontal axis represents
the time interval 23 May 2005, to 29 June 2005. The calls had an exercise or
strike price of Rs 460 and expired on June 30, 2005. During this period, the
price of a Maruti share ranged between Rs. 435 and Rs. 478.
Exercise 6.1.5 Show that the inequality (6.4) is also valid for an
American call on an asset without income.
Bounds are useful if they are not too far away from the actual values.
Figure 6.3 illustrates how the lower bound given by the inequality
(6.3) is usually reasonably close to the actual premium values. The
upper bound given by (6.4), though correct, is too large to be useful.
Here is our first surprise concerning call options:
Theorem 6.1.6 An American call has the same value as a European
call with the same parameters if the underlying asset generates no
income.
Proof. Consider an American call with expiry at T and exercise price
X. Let Ct be its premium at a time t < T, and St the spot price of the
underlying asset at that time. From (6.3), we see that
$$C_t \ge S_t - Xe^{-r(T-t)},$$
and hence $C_t > S_t - X$ whenever r > 0. Since $S_t - X$ is the payoff from exercising the
call, we see that it is never optimal to exercise the call early. It would be
better to sell it. Thus, an American call will never be exercised before
expiry, and hence has the same value as a European call with the
same parameters.
□
This conclusion fails when the asset generates income, because then
the early exercise of an American call would have the added benefit
of bringing in a share of this income.
6.2 PUT OPTIONS
A put option is the reverse of a call option. Here, the holder buys the
right to sell the asset to the writer.
Figure 6.4: The structure of a European put option signed at time t = 0 with
expiration time t = T, exercise price X, and put premium P. The dashed arrows
at the t = T stage indicate that the trade is optional.
In a European put option the holder buys the right to make a
future sale, without the obligation to do so, in the following manner:
1. The holder pays the writer an initial fee (the put premium) to buy
the contract.
2. On the expiration date the holder may deliver the underlying
asset to the writer.
3. If the holder delivers the asset, the writer must pay the exercise
price (or strike price).
Our earlier remarks about calls apply as well to puts. In particular,
we have American put options, which give the holder the right to
exercise the contract before the expiry time. The premium of an
American put will be at least as much as the premium of a European
put with the same parameters. And this time there are no surprises—
in general, the premium of the American put will be strictly greater
(even when the asset generates no income).
Example 6.2.1 Consider an American put with X = 100 and expiry in
one year. Suppose the risk-free rate is r = 15%. If the spot price of the
underlying stock drops to 10, then immediate exercise will net 90 and
over a year this will become 90 × 1.15 = 103.5. This is more than the
maximum possible profit from holding the put to expiry.
□
Exercise 6.2.2 Show that if the risk-free rate r = 0, then the premium
of an American put on an asset which generates no income equals
that of a European put with the same parameters.
Figure 6.5: The payoff to the holder of a European put option expiring at T as a
function of the final spot price ST of the underlying asset
Figure 6.5 shows the final payoff from a European put expiring at
T and with exercise price X. The formula for this final payoff is
max{0, X – ST}.
This immediately tells us that the put premium P at time t = 0 must
satisfy P ≥ 0. Next, we observe that the payoff to the put holder is
always at least as much as that to the writer of a futures with the
same parameters. Hence the put has greater value. For an asset
without income, this gives:
$$P \ge Xe^{-rT} - S. \qquad (6.5)$$
Combining (6.5) with P ≥ 0, we get the following lower bound for the
premium of a European put:
$$P \ge \max\{0,\, Xe^{-rT} - S\}. \qquad (6.6)$$
Exercise 6.2.3 Write down the form the inequality (6.6) will have if the
underlying asset generates either known income or a continuous
dividend yield.
An upper bound on P can be obtained if it is known that the asset
price cannot be negative. In this case, the maximum possible payoff
from a put is its exercise price X. Hence the premium of a European
put cannot exceed the present value of X:
$$P \le Xe^{-rT}. \qquad (6.7)$$
Exercise 6.2.4 How will you modify the bounds (6.6) and (6.7) for an
American put on an asset without income whose spot price cannot be
negative?
Figure 6.6: The payoffs to the holders of (from left to right) a European call, a
European put and a forward, each with the same underlying asset, expiry time
T and exercise price X
6.3 PUT—CALL PARITY
We have already found it useful to compare calls and puts with
forwards and futures. Figure 6.6 collects the payoff patterns for a
European call, a European put and a forward, with the same
underlying asset, expiry time T, and exercise price X. The comparison
shows that for $S_T \ge X$ the holders of the call and the forward receive
the same payoff. For $S_T \le X$, the writer of the put and the holder of the
forward receive the same payoff. Thus, if we simultaneously become
the holder of the call and the writer of the put, our final payoff at T will
exactly match that of the holder of the forward. Therefore, the initial
value of our portfolio must match that of the forward:
$$C - P = S - Xe^{-rT}.$$
Theorem 6.3.1 (Put−Call Parity) Suppose call and put options are
available on the same underlying asset with the same expiry date T
and exercise price X. Let the continuously compounded risk-free rate
be r, and let S be the spot price of the underlying asset. Then the call
premium C and the put premium P are related by
$$P + S = C + Xe^{-rT}. \qquad (6.8)$$
□
It is important to note that put–call parity is only valid for
European options. The form in equation (6.8) holds when the asset
generates no income. It is easily adapted to the cases when the asset
either generates known income or has a known dividend yield.
Exercise 6.3.2 Give a formal proof of put–call parity by means of the
method of replicating portfolios, and without assuming the existence
of a forward with the right features.
An interesting application of the put–call parity is that by rearranging
it, we can create a combination of three assets that mimics the fourth
one. For example, we see that a portfolio which has $Xe^{-rT}$ cash and is
also long one call and short one share is equivalent to being long one
put.
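The payoff argument behind put–call parity can be checked directly at expiry. In the Python sketch below, the cash position is taken at its value X at time T, and the sample final prices are arbitrary illustrative choices.

```python
def call_payoff(S_T, X):
    return max(0.0, S_T - X)

def put_payoff(S_T, X):
    return max(0.0, X - S_T)

X = 100.0
rows = []
for S_T in (60.0, 100.0, 140.0):               # hypothetical final spot prices
    long_call_short_put = call_payoff(S_T, X) - put_payoff(S_T, X)
    forward = S_T - X                          # payoff to the holder of a forward
    # Cash worth X at expiry, long one call, short one share:
    synthetic_put = X + call_payoff(S_T, X) - S_T
    rows.append((S_T, long_call_short_put, forward, synthetic_put, put_payoff(S_T, X)))

for row in rows:
    print(row)
```

In every row the second and third entries agree, as do the fourth and fifth, whichever side of X the final price falls on.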
Exercise 6.3.3 What will be the put–call parity formula when (a) the
asset generates a known income, (b) the asset has a constant and
continuous dividend yield?
Exercise 6.3.4 Consider options on a stock following GBM with drift
parameter μ. One may reason as follows: If μ is higher, then the
expected value of ST is higher, so the call premium should increase
while the put premium should decrease. Evaluate this conclusion in
light of put-call parity.
Let us summarise the progress we have made:
• We have lower bounds for the option premiums.
• When the asset price must be positive, there are upper bounds
  as well.
• When the asset generates no income, American and European
  calls have the same premium.
• European options satisfy put–call parity.
This is about as far as we can get with the No Arbitrage Principle
alone. Significant further progress is only possible by combining the
No Arbitrage Principle with models of asset price fluctuations. We
shall now begin this task, using the models developed in the previous
chapter. It is relevant to note here that these particular models have
been found to be most acceptable for stock prices. (For example,
Ross [40] shows that crude oil price data is not consistent with the
assumption of independent jumps.) Thus, from now on, asset should
really be read as stock.
6.4 BINOMIAL OPTIONS PRICING MODEL
The Binomial Options Pricing Model, or BOPM, is based on the
Binomial Tree Model for price fluctuations. In this model we break up
the time span [0,T] into n equal parts, and imagine that over each part
the asset price can either move up by a factor U or down by a factor
D. Thus, the basic object is the following branch:
We remark that typically U > 1 > D. (But see Exercise 6.4.2.)
ONE-STEP BOPM
First, consider the n = 1 case. Imagine a European call option on this
asset which expires at T and has exercise price X. If its initial
premium is C, then the evolution of its value over the interval [0,T] is
represented by:
Here, Cu is the call payoff if the asset price moves up, and Cd is
the call payoff if the asset price moves down.
It is clear that Cu > Cd. So, the call value goes up if the asset price
goes up, and down if the asset price goes down. Now, if we become
the writer of the call, the position reverses. Profits from the asset will
be cancelled by losses from the call. If we have the right amounts of
long asset and short calls, the fluctuations can exactly cancel and we
shall have a risk-free portfolio. Thus, suppose we own h units of the
asset and write 1 call. The value of this portfolio evolves according to
the following branch:
For this portfolio to be risk-free, we need $hSU - C_u = hSD - C_d$. We
solve this for h:
$$h = \frac{C_u - C_d}{SU - SD}.$$
The ratio h is called the option’s delta, and this process of creating
a risk-free portfolio is called delta hedging.
Since the portfolio is risk-free, its initial value equals the present value
of its value at T. Thus,
$$hS - C = e^{-rT}(hSU - C_u).$$
Substituting the value of h, we get:
$$C = e^{-rT}\left[p^* C_u + (1-p^*) C_d\right], \qquad (6.9)$$
where
$$p^* = \frac{e^{rT} - D}{U - D}.$$
Exercise 6.4.1 Verify equation (6.9).
Exercise 6.4.2 Use the No Arbitrage Principle to show that $D < e^{rT} < U$,
and hence 0 < p* < 1.
Note that the probabilities of the up and down moves did not enter
into our calculations.
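A minimal Python sketch of the one-step calculation follows. It takes the risk-free growth factor over the step as a single input R (playing the role of $e^{rT}$), uses the hedge ratio $h = (C_u - C_d)/(SU - SD)$ and the weight $p^* = (R - D)/(U - D)$, and then checks that the hedged portfolio has the same value on both branches. The function name and the sample numbers are illustrative assumptions.

```python
def one_step_call(S, X, U, D, R):
    """One-step BOPM for a European call.

    R is the risk-free growth factor over the step. Returns (C, h, p_star).
    """
    Cu = max(0.0, S * U - X)            # payoff after an up move
    Cd = max(0.0, S * D - X)            # payoff after a down move
    h = (Cu - Cd) / (S * U - S * D)     # hedge ratio (delta)
    p_star = (R - D) / (U - D)          # risk-neutral weight of the up move
    C = (p_star * Cu + (1 - p_star) * Cd) / R
    return C, h, p_star

# Hypothetical inputs: S = X = 100, U = 1.1, D = 0.9, growth factor 1.05.
S, X, U, D, R = 100.0, 100.0, 1.1, 0.9, 1.05
C, h, p_star = one_step_call(S, X, U, D, R)
value_up = h * S * U - max(0.0, S * U - X)      # portfolio value after an up move
value_down = h * S * D - max(0.0, S * D - X)    # portfolio value after a down move
print(C, h, p_star, value_up, value_down)
```

Note that no real-world probability appears among the inputs.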
TWO-STEP BOPM
We now let the above process happen twice in succession, cutting
the time interval [0, T] into the equal parts [0,T/2] and [T/2,T]. Then
we have the following picture for the evolution of the spot price:
We label the corresponding payoffs from the call expiring at T as
follows:
Note that we have
$$C_{uu} = \max\{0,\, SU^2 - X\},$$
$$C_{ud} = \max\{0,\, SUD - X\},$$
$$C_{dd} = \max\{0,\, SD^2 - X\}.$$
Applying the one-step BOPM to the branches with nodes at $C_u$ and
$C_d$ gives:
$$C_u = e^{-rT/2}\left[p^* C_{uu} + (1-p^*) C_{ud}\right],$$
$$C_d = e^{-rT/2}\left[p^* C_{ud} + (1-p^*) C_{dd}\right],$$
where $p^* = \dfrac{e^{rT/2} - D}{U - D}$. We apply the one-step BOPM, once again, to
the branch with node at C to get:
$$C = e^{-rT/2}\left[p^* C_u + (1-p^*) C_d\right] = e^{-rT}\left[p^{*2} C_{uu} + 2p^*(1-p^*) C_{ud} + (1-p^*)^2 C_{dd}\right].$$
Exercise 6.4.3 Suppose we have a two-step BOPM with S = 100, X =
100, U = 1.1, D = 0.9, r = 10% and T = 1. Show that C = 10.71.
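The two-step recursion can be sketched in Python as below. The code takes the risk-free growth factor per step as an input R; reading r = 10% over T = 1 as a growth factor of 1.05 for each half-year step is an assumption about the intended compounding, and under that reading the sketch gives C ≈ 10.71. The function name is hypothetical.

```python
def two_step_call(S, X, U, D, R):
    """Two-step BOPM for a European call; R is the growth factor per step.

    Returns the premium computed by backward induction and by the
    expanded two-step formula, which should agree.
    """
    p = (R - D) / (U - D)
    Cuu = max(0.0, S * U * U - X)
    Cud = max(0.0, S * U * D - X)
    Cdd = max(0.0, S * D * D - X)
    Cu = (p * Cuu + (1 - p) * Cud) / R      # value at the upper middle node
    Cd = (p * Cud + (1 - p) * Cdd) / R      # value at the lower middle node
    C = (p * Cu + (1 - p) * Cd) / R
    expanded = (p**2 * Cuu + 2 * p * (1 - p) * Cud + (1 - p)**2 * Cdd) / R**2
    return C, expanded

C, expanded = two_step_call(S=100.0, X=100.0, U=1.1, D=0.9, R=1.05)
print(round(C, 2))
```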
MANY-STEP BOPM
A certain pattern in the call premium formulas should now be evident.
First, we list all the possible final spot prices. Over n steps, these are
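$$S_T = SU^kD^{n-k}, \quad k = 0,1,…,n.$$
The payoff from the call corresponding to the final value $SU^kD^{n-k}$ is
$\max\{SU^kD^{n-k} - X,\, 0\}$. We multiply this payoff by the binomial
expression
$$\binom{n}{k}\, p^{*k}(1-p^*)^{n-k}, \quad \text{where } p^* = \frac{e^{rT/n} - D}{U - D}.$$
Finally, we sum all these terms and divide the sum by $R^n = e^{rT}$,
where $R = e^{rT/n}$ is the risk-free growth factor over one step.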
The general formula is thus obtained:
Theorem 6.4.4 (Many-Step BOPM) Consider an asset whose price
follows an n-step binomial tree with parameters U and D and initial
price S over the time interval [0,T]. Let the continuously compounded
risk-free rate be r. Consider a European call on this asset that expires
at T and has exercise price X. Then the call premium C is given by
$$C = \frac{1}{e^{rT}} \sum_{k=0}^{n} \binom{n}{k}\, p^{*k}(1-p^*)^{n-k} \max\{0,\, SU^kD^{n-k} - X\}, \qquad (6.10)$$
where $p^* = \dfrac{e^{rT/n} - D}{U - D}$.
□
We have given this formula as a reasonable generalisation of the one
and two-step BOPM. You may enjoy proving it by induction!
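The agreement between the summation formula and step-by-step backward induction can be checked numerically. The Python sketch below implements both for a European call, taking the per-step growth factor R (that is, $e^{rT/n}$) as an input; the function names and test values are illustrative assumptions.

```python
from math import comb

def bopm_call(S, X, U, D, R, n):
    """European call premium from the n-step summation formula."""
    p = (R - D) / (U - D)
    expected = sum(
        comb(n, k) * p**k * (1 - p)**(n - k) * max(0.0, S * U**k * D**(n - k) - X)
        for k in range(n + 1)
    )
    return expected / R**n

def bopm_call_backward(S, X, U, D, R, n):
    """The same premium obtained by applying the one-step BOPM repeatedly."""
    p = (R - D) / (U - D)
    # Payoffs at expiry, indexed by the number of up moves k.
    values = [max(0.0, S * U**k * D**(n - k) - X) for k in range(n + 1)]
    for step in range(n, 0, -1):
        values = [(p * values[k + 1] + (1 - p) * values[k]) / R for k in range(step)]
    return values[0]

formula_value = bopm_call(100.0, 100.0, 1.1, 0.9, 1.05, 5)
induction_value = bopm_call_backward(100.0, 100.0, 1.1, 0.9, 1.05, 5)
print(formula_value, induction_value)
```

The summation needs only n + 1 terms, whereas the backward pass visits every node of the tree; the latter generalises more readily to options with early exercise.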
Exercise 6.4.5 Show that in an n-step BOPM for a European call,
C=
The formula (6.10) for C has a very interesting form. First, the division
by $e^{rT}$ represents discounting to a present value. What is being
discounted is a weighted average of the payoffs from the call at expiry.
The weights are the probabilities associated with a binomial random
variable with parameters n and p*. Thus, one is led to interpreting p*
as a probability, which is possible since we have already determined
that 0 < p* < 1 (Exercise 6.4.2). Since it arose out of making the
process risk-free, it is called the risk-neutral probability.
Figure 6.7: Premiums of a European call plotted against the initial spot price.
The dots represent the predictions of a 3-step BOPM. The smooth curve is
obtained from a 100-step BOPM. The dashed line is the graph of the function
$S - Xe^{-rT}$, which is asymptotic to the premium plots. (See Exercise 6.4.5)
Another interesting aspect of the BOPM is that the premium
depends on T (represented by n), S, X, r and volatility (represented by
U,D), but not on the real-world probabilities of price increase or
decrease.
For European puts we have a similar analysis, except that the call
payoffs at expiry are replaced by the put payoffs.
Theorem 6.4.6 Consider an asset whose price follows an n-step
binomial tree with parameters U and D and initial price S over the
time interval [0,T]. Let the continuously compounded risk-free rate be
r. Consider a European put on this asset that expires at T and has
exercise price X. Then the put premium P is given by
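$$P = \frac{1}{e^{rT}} \sum_{k=0}^{n} \binom{n}{k}\, p^{*k}(1-p^*)^{n-k} \max\{0,\, X - SU^kD^{n-k}\}, \qquad (6.11)$$
where $p^* = \dfrac{e^{rT/n} - D}{U - D}$.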
□
Exercise 6.4.7 Show that the BOPM formulas for European put and
call premiums satisfy put–call parity. (This reassures us that BOPM
correctly captures the properties of options.)
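A numerical illustration of this consistency is sketched below in Python (it is a check on sample numbers, not a proof). Both premiums are computed from the n-step summation with a per-step growth factor R, so the present value of X is $X/R^n$; the function names and inputs are assumptions made for the illustration.

```python
from math import comb

def bopm_european(S, X, U, D, R, n, payoff):
    """n-step BOPM premium for a European option with the given expiry payoff."""
    p = (R - D) / (U - D)
    expected = sum(
        comb(n, k) * p**k * (1 - p)**(n - k) * payoff(S * U**k * D**(n - k), X)
        for k in range(n + 1)
    )
    return expected / R**n

def call_payoff(s, x):
    return max(0.0, s - x)

def put_payoff(s, x):
    return max(0.0, x - s)

# Hypothetical inputs: S = X = 100, U = 1.1, D = 0.9, growth factor 1.05 per step, n = 2.
S, X, U, D, R, n = 100.0, 100.0, 1.1, 0.9, 1.05, 2
C = bopm_european(S, X, U, D, R, n, call_payoff)
P = bopm_european(S, X, U, D, R, n, put_payoff)
parity_gap = (P + S) - (C + X / R**n)    # should vanish up to rounding
print(C, P, parity_gap)
```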
ESTIMATING BOPM PARAMETERS
To put BOPM to honest work, we need estimates of the parameters U
and D. We have already seen one way of making these estimates in
the previous chapter where we matched the binomial tree to the
lognormal model and obtained
$$U \doteq e^{\sigma\sqrt{T/n}}, \qquad D \doteq e^{-\sigma\sqrt{T/n}},$$
where σ is the volatility parameter in the Lognormal model. In the
same chapter, we also described how to obtain σ from historical data.
Example 6.4.8 Figure 6.8 compares the actual closing premiums of
European call options on Maruti stock with the theoretical ones
calculated by a 10-step BOPM. The options expired on June 30,
2005, and had an exercise price of Rs 460. For the BOPM
calculations, we have assumed an annual risk-free rate of 5% and
taken the volatility to be σ = 0.26.
□
Figure 6.8: A comparison of actual premiums (stars) of call options on Maruti
stock with the predictions from a 10-step BOPM
The main issue here is the choice of σ. This example illustrates that
the model works well if we have the right value of σ, but how do we
obtain it? It is true that we have earlier given a way of estimating σ
from historical prices, but a little reflection throws up some obvious
problems. How much data should we use? What should be its
frequency? Unfortunately, our choices in these matters can have a
dramatic impact on the value that we get for σ.
One solution that has become popular is to work back from the
options themselves. Find the σ that makes BOPM give accurate
values for one set of options and use it in calculations for others! We
will return to this idea in the next chapter.
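One way to work back from an observed premium to σ is a bisection search, sketched below in Python. This is an illustration under stated assumptions, not a method given in the text: the tree uses $U = e^{\sigma\sqrt{T/n}}$, $D = 1/U$ and a per-step growth factor $e^{rT/n}$, the search interval [0.01, 2.0] is arbitrary, and the "observed" premium is generated from the model itself so that the recovered σ can be checked.

```python
from math import comb, exp, sqrt

def bopm_call_sigma(S, X, r, T, sigma, n):
    """n-step BOPM call premium with U and D derived from the volatility sigma."""
    dt = T / n
    U = exp(sigma * sqrt(dt))
    D = 1.0 / U
    R = exp(r * dt)                          # risk-free growth factor per step
    p = (R - D) / (U - D)
    expected = sum(
        comb(n, k) * p**k * (1 - p)**(n - k) * max(0.0, S * U**k * D**(n - k) - X)
        for k in range(n + 1)
    )
    return expected / R**n

def implied_sigma(premium, S, X, r, T, n, lo=0.01, hi=2.0, tol=1e-8):
    """Bisection for the sigma whose BOPM premium matches `premium`.

    Assumes the premium lies between the model values at lo and hi,
    and that the model value does not decrease as sigma increases.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bopm_call_sigma(S, X, r, T, mid, n) < premium:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip on hypothetical data: price with sigma = 0.26, then recover it.
target = bopm_call_sigma(100.0, 100.0, 0.05, 0.25, 0.26, 10)
recovered = implied_sigma(target, 100.0, 100.0, 0.05, 0.25, 10)
print(recovered)
```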
6.5 PRICING AMERICAN OPTIONS
Recall that an American call has the same value as the corresponding
European call. We will now see how the BOPM can be used to
compute the premium of an American put. In applying BOPM to an
American option, we assume that the decision to exercise or not can
only be made at the nodes of the branching process.
We start by considering the one-step situation:
The corresponding diagram for the payoffs from the American put
is:
Here, we use P* to refer to the value of the American put. If we
use P to indicate the value of the corresponding European put, we
have, Pu* = Pu and Pd * = Pd, since an American put held till expiry is
equivalent to a European put.
Now, at the starting point the holder has the option of either
exercising the put right away, or holding it till expiry. If he holds it till
expiry, it becomes a European put whose value is given by the BOPM
for European puts: $e^{-rT}\left[p^* P_u^* + (1-p^*) P_d^*\right]$. On the other hand,
exercising it immediately is worth X – S. The holder will obviously take
that step which gives him more value. Therefore,
$$P^* = \max\left\{X - S,\; e^{-rT}\left[p^* P_u^* + (1-p^*) P_d^*\right]\right\}.$$
This one-step BOPM for American puts can be used to piece
together an n-step BOPM in the usual way. We show below how the
two-step BOPM is created.
We work our way back from the right end of the tree. We first note
that:
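$$P_{uu}^* = \max\{X - SU^2,\, 0\},$$
$$P_{ud}^* = \max\{X - SUD,\, 0\},$$
$$P_{dd}^* = \max\{X - SD^2,\, 0\}.$$
Now, we apply the one-step BOPM to the two second-stage
branches:
$$P_u^* = \max\left\{X - SU,\; e^{-rT/2}\left[p^* P_{uu}^* + (1-p^*) P_{ud}^*\right]\right\},$$
$$P_d^* = \max\left\{X - SD,\; e^{-rT/2}\left[p^* P_{ud}^* + (1-p^*) P_{dd}^*\right]\right\}.$$
Finally, we apply the one-step BOPM to the first branch:
$$P^* = \max\left\{X - S,\; e^{-rT/2}\left[p^* P_u^* + (1-p^*) P_d^*\right]\right\}.$$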
By now it should be clear how the process will work for an n-step tree.
This is an easy process to implement numerically. However, it does
not lead to a closed form solution (unlike the case of the European put),
and so does not give any analytic insight.
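The backward pass for an n-step tree can be sketched in Python as follows. At every node the holder is assumed to take the larger of the immediate exercise value and the discounted risk-neutral average of the two successor values; R denotes the risk-free growth factor per step, and the function name and sample inputs are illustrative assumptions.

```python
def american_put(S, X, U, D, R, n):
    """American put premium on an n-step binomial tree.

    Exercise decisions are allowed only at the nodes of the tree.
    R is the risk-free growth factor over one step.
    """
    p = (R - D) / (U - D)
    # Values at expiry, indexed by the number of up moves k.
    values = [max(0.0, X - S * U**k * D**(n - k)) for k in range(n + 1)]
    for step in range(n - 1, -1, -1):
        values = [
            max(
                X - S * U**k * D**(step - k),                  # exercise now
                (p * values[k + 1] + (1 - p) * values[k]) / R  # hold one more step
            )
            for k in range(step + 1)
        ]
    return values[0]

# One-step illustrations with S = 100, U = 1.1, D = 0.9, growth factor 1.05.
hold_case = american_put(100.0, 100.0, 1.1, 0.9, 1.05, 1)      # holding is better
exercise_case = american_put(100.0, 105.0, 1.1, 0.9, 1.05, 1)  # early exercise is better
print(round(hold_case, 2), round(exercise_case, 2))
```

The work grows like n², since step j of the pass evaluates j + 1 nodes.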
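Example 6.5.1 Consider a 1-step BOPM for an American put with T =
1, $e^{rT}$ = 1.05, S = 100, U = 1.1, D = 0.9 and X = 100. Then the
risk-neutral probability is given by
$$p^* = \frac{e^{rT} - D}{U - D} = \frac{1.05 - 0.9}{1.1 - 0.9} = 0.75.$$
We have the final payoffs
$$P_u^* = \max\{0,\, X - SU\} = \max\{0,\, 100 - 110\} = 0,$$
$$P_d^* = \max\{0,\, X - SD\} = \max\{0,\, 100 - 90\} = 10.$$
The put premium is therefore given by
$$P^* = \max\left\{X - S,\; e^{-rT}\left[p^* P_u^* + (1-p^*) P_d^*\right]\right\} = \max\left\{0,\; \frac{1}{1.05} \times 0.25 \times 10\right\} = 2.38.$$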
In this case it is best not to exercise early as that has a zero payoff.
On the other hand, if we change X to 105, the payoff from exercising
early is superior.
□
Exercise 6.5.2 Verify that according to BOPM, an American call has
the same premium as a European call with the same parameters.
6.6 FACTORS INFLUENCING OPTION PREMIUMS
According to BOPM, the factors affecting the value of a call or put
option are the time to expiry, exercise price, current spot price,
volatility and the risk-free interest rate.
Expiry Time (T): An increase in T increases the range of possible
profits (because of the increase in range of the final spot price ST),
while the loss is always limited to the premium. Thus, the value of an
option increases with T. In an American option one has the additional
possibility of closing out the contract if a large profit is available early.
A longer T means a greater possibility of encountering such a profit,
so this strengthens the positive connection between the value of a call
or put option and T.
Figure 6.9 confirms these ideas. The curve showing the premiums
of American puts is both higher and has greater slope.
Figure 6.9: Variation of put premium with time to expiry plotted using a 10-step
BOPM. The higher curve is for an American put and the lower one for a
European put with the same underlying asset and exercise price.
Current Spot Price (S): The value of a call rises with S since this
suggests that ST will also be high, bringing in greater return from the
call. On the other hand, since a put brings more profit as ST
decreases, the value of a put goes down when S rises.
Exercise Price (X): A higher X decreases the profit from a call and so
lowers its value. The value of a put will rise with X.
Volatility (σ): A higher σ indicates chances of greater profit from the
option. So, the value of an option rises with volatility.
Risk-free Interest-rate (r): Suppose r increases. Then the present
value of the final payoff decreases and this will pull down the value of
the option. However, there is another factor: if r increases, one
expects a general rise in prices, including those of stocks. This will
pull up the value of a call, and pull down that of a put. Thus, we
certainly expect the value of a put to decrease with an increase in r.
For calls, we have two opposite influences and cannot immediately
say which pull is stronger. It has been empirically observed (and is
also a consequence of BOPM) that the second influence dominates
and thus the value of a call increases with r.
Figure 6.10: Variation of option premiums with the risk-free rate, calculated
from a 10-step BOPM. All the cases have S = X = 10, σ = 30% and T = 1.
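The dependence on the risk-free rate can be explored with a short Python sketch. It prices European calls and puts on a 10-step tree with S = X = 10, σ = 30% and T = 1, taking $U = e^{\sigma\sqrt{T/n}}$, $D = 1/U$ and a per-step growth factor $e^{rT/n}$; the three rates tried are arbitrary sample values and the function name is hypothetical.

```python
from math import comb, exp, sqrt

def bopm_price(S, X, r, T, sigma, n, kind):
    """European call or put premium on an n-step tree built from sigma."""
    dt = T / n
    U = exp(sigma * sqrt(dt))
    D = 1.0 / U
    R = exp(r * dt)                          # risk-free growth factor per step
    p = (R - D) / (U - D)
    total = 0.0
    for k in range(n + 1):
        s_T = S * U**k * D**(n - k)
        payoff = max(0.0, s_T - X) if kind == "call" else max(0.0, X - s_T)
        total += comb(n, k) * p**k * (1 - p)**(n - k) * payoff
    return total / R**n

rates = (0.02, 0.05, 0.10)                   # sample rates, chosen for illustration
calls = [bopm_price(10.0, 10.0, r, 1.0, 0.3, 10, "call") for r in rates]
puts = [bopm_price(10.0, 10.0, r, 1.0, 0.3, 10, "put") for r in rates]
for r, c, p_val in zip(rates, calls, puts):
    print(f"r = {r:.2f}  call = {c:.4f}  put = {p_val:.4f}")
```

For these inputs the call premiums rise and the put premiums fall as r increases, in line with the discussion above.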
6.7 OPTIONS ON ASSETS WITH DIVIDENDS
The BOPM approach can also be applied when the asset has a
known dividend yield. For simplicity, we assume a constant and
continuous dividend yield, though the approach can be easily
extended to discrete or varying yields.
Let us start with a one-step binomial tree for an asset which also
has a continuous dividend yield q. Let the up and down factors be U
and D, the time interval be Δt, the initial asset price be S, and the
exercise price for a European call on the asset be X. Let the
continuously compounded risk-free rate be r.
As usual, we initially create a portfolio P which is long h units of
the asset and short one call. All the dividend earned over the Δt
interval is reinvested in the asset so that at the end of the process the
portfolio is long $he^{q\Delta t}$ asset units and short one call. We denote the
final value of the call by Cu (when the asset price moves up) and Cd
(when the asset price moves down).
Exercise 6.7.1 Show that P is risk-free if the hedge ratio is set to
$$h = e^{-q\Delta t}\, \frac{C_u - C_d}{SU - SD}.$$
Suppose h is set to the value given in the above exercise. Then the
portfolio, being risk-free, must grow at the risk-free rate. Therefore, its
initial value is simply the present value of its final value:
$$hS - C = e^{-r\Delta t}\left(he^{q\Delta t}SU - C_u\right).$$
Exercise 6.7.2 The call premium is given by
$$C = e^{-r\Delta t}\left(p^* C_u + (1-p^*) C_d\right),$$
where
$$p^* = \frac{e^{(r-q)\Delta t} - D}{U - D},$$
$$C_u = \max\{0,\, SU - X\}, \qquad C_d = \max\{0,\, SD - X\}.$$
Starting with these calculations, it is easy to carry out an n-step
BOPM over a time interval T.
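Theorem 6.7.3 The value of a European call on an asset with
dividend yield q is
$$C = e^{-rT}\sum_{k=0}^{n}\binom{n}{k}(p^*)^k(1-p^*)^{n-k}\max\{0,\, SU^kD^{n-k} - X\}, \tag{6.12}$$
where
$$p^* = \frac{e^{(r-q)T/n} - D}{U - D}$$
is, again, called the risk-neutral probability.
□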
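Exercise 6.7.4 Show that the price of a European put on an asset
with dividend yield q is given by
$$P = e^{-rT}\sum_{k=0}^{n}\binom{n}{k}(p^*)^k(1-p^*)^{n-k}\max\{0,\, X - SU^kD^{n-k}\},$$
where
$$p^* = \frac{e^{(r-q)T/n} - D}{U - D}.$$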
Exercise 6.7.5 Show that in the current context, put–call parity takes
the following form:
$$P + Se^{-qT} = C + Xe^{-rT}.$$
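The n-step calculation with a dividend yield can be sketched in a few lines of Python. This is only an illustration: the function name `bopm_european`, the parameter values, and the choice $U = e^{\sigma\sqrt{\Delta t}}$, $D = 1/U$ for the up and down factors are assumptions made for the example. The last line compares the two sides of the put–call parity relation.

```python
from math import comb, exp, sqrt

def bopm_european(S, X, r, q, sigma, T, n, kind="call"):
    """n-step BOPM premium of a European option on an asset with dividend yield q."""
    dt = T / n
    U = exp(sigma * sqrt(dt))               # assumed up factor (matched to volatility)
    D = 1.0 / U
    p = (exp((r - q) * dt) - D) / (U - D)   # risk-neutral probability with dividends
    total = 0.0
    for k in range(n + 1):                  # k = number of up moves
        ST = S * U**k * D**(n - k)
        payoff = max(0.0, ST - X) if kind == "call" else max(0.0, X - ST)
        total += comb(n, k) * p**k * (1 - p)**(n - k) * payoff
    return exp(-r * T) * total

# Illustrative (hypothetical) parameter values
S, X, r, q, sigma, T, n = 10.0, 10.0, 0.05, 0.02, 0.3, 1.0, 50
C = bopm_european(S, X, r, q, sigma, T, n, "call")
P = bopm_european(S, X, r, q, sigma, T, n, "put")
# Put-call parity with dividends: the two printed numbers should agree.
print(P + S * exp(-q * T), C + X * exp(-r * T))
```

Setting q = 0 recovers the ordinary BOPM premium, and a positive q lowers the call premium.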
6.8 DYNAMIC HEDGING
Dynamic hedging is a process whereby a portfolio is constantly
readjusted to keep it risk-free.
Example 6.8.1 Consider a two-step BOPM for a European call with S
= 100, X = 100, U = 1.1, D = 0.9 and r = 5% per step. Then, we have
$C_{uu} = 21$ and $C_{ud} = C_{dd} = 0$. With the aid of these numbers, we
calculate $C_u = 15$ and $C_d = 0$. Moreover, the corresponding hedge ratios
are $h_u = 0.9545$ and $h_d = 0$. Our next calculation is C = 10.714, and the
hedge ratio for this step is h = 0.75.
We see that to keep the portfolio risk-free we have to adjust the
hedge ratio at each step, depending on the move that happens. Let’s
follow the process through one sequence of possible events.
At t = 0, we write 1000 calls and buy 750 shares, thus keeping a
hedge ratio of 0.75. Our net investment is
750 × 100 − 1000 × 10.714 = 64,285.
At t = 1, suppose the spot price has gone up by U = 1.1 to 110.
Then the value of our portfolio becomes
750 × 110 − 1000 × 15 = 67,500.
Note that the portfolio value has increased by exactly the risk-free
rate of 5%. To keep the portfolio risk-free over the next stage of the
tree, we have to adjust its hedge ratio to 0.9545. We can do this by
either buying 204.5 shares (so we have a total of 954.5) or by buying
back 214.3 calls (decreasing their number to 785.7). While the steps
are equivalent mathematically, the second one involves a much lower
investment and is therefore more practical. Thus, we decide to buy
back 214.3 calls at a price of 15 each, and we borrow 214.3 × 15 =
3214 to do so.
At t = 2, suppose the spot price again rises. Then the value of the
portfolio is
750 × 121 − 785.7 × 21 − 3214 × 1.05 = 70,875.
Again, the portfolio value has increased by exactly the risk-free rate. □
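The bookkeeping of this example can be retraced with a short script. It is a sketch under the example's own assumptions (growth factor R = 1 + r = 1.05 per step, two up moves); the helper name `step` and the variable names are introduced here only for illustration.

```python
U, D, R = 1.1, 0.9, 1.05            # R = 1 + r is the risk-free growth factor per step
S0, X = 100.0, 100.0
p = (R - D) / (U - D)               # risk-neutral probability of an up move

def step(S, Cu, Cd):
    """One-step call premium and hedge ratio from the two successor call values."""
    C = (p * Cu + (1 - p) * Cd) / R
    h = (Cu - Cd) / (S * (U - D))
    return C, h

# Final payoffs, then work backwards through the tree
Cuu, Cud, Cdd = [max(0.0, s - X) for s in (S0 * U * U, S0 * U * D, S0 * D * D)]
Cu, hu = step(S0 * U, Cuu, Cud)
Cd, hd = step(S0 * D, Cud, Cdd)
C, h = step(S0, Cu, Cd)

calls = 1000.0
shares = h * calls                          # shares bought at t = 0
V0 = shares * S0 - calls * C                # net investment at t = 0
V1 = shares * S0 * U - calls * Cu           # value at t = 1 after an up move
new_calls = shares / hu                     # buy back calls to reach hedge ratio hu
loan = (calls - new_calls) * Cu             # borrowed to pay for the buy-back
V2 = shares * S0 * U * U - new_calls * Cuu - loan * R   # value at t = 2 after a second up move
print(round(V0), round(V1), round(V2))
print(V1 / V0, V2 / V1)                     # growth factor over each step
```

Both ratios in the last line should equal the risk-free growth factor, which is what makes the rebalanced portfolio risk-free along this path.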
This is only a toy example in that the real prices would not actually go
up or down by the prescribed factors, and so the tree has to be
reconstructed after each step for the new price. This also means that
we cannot expect the hedge to be perfect. In the next example we
illustrate one way of handling an actual sequence of prices.
Example 6.8.2 Suppose a stock is estimated to follow GBM with drift
μ = 0.2 and volatility σ = 0.3. We wish to hedge a portfolio of 1000
shares of this stock over 3 months using call options expiring in 90
days. The initial price S of the stock is 10 and this is also the exercise
price X of the options. The risk-free rate is 5%.
To keep down the transaction costs associated with trading
options, we decide to readjust the portfolio every 9 days. We then set
up a 10-step binomial tree with each step being of 9 days. We start by
hedging for the first 9 days by using the first hedge ratio from this
tree. After 9 days we observe the new stock price and create a
corresponding 9-step binomial tree. We calculate the new hedge ratio
and accordingly change the number of written call options. Thus,
every 9 days, we shorten the number of steps in the tree by one.
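The time-step of 9 days is the same for each binomial tree, so we
use the following up/down factors and risk-neutral probability throughout
(with Δt = 9⁄365):
$$U = e^{\sigma\sqrt{\Delta t}} = 1.048, \qquad D = \frac{1}{U} = 0.954, \qquad p^* = \frac{e^{r\Delta t} - D}{U - D} = 0.501.$$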
The initial 10-step BOPM gives the following starting hedge ratio and
call premium:
h = 0.561,
C = 0.6388.
Our first step is to write 1000/h = 1783 calls. Our portfolio now
consists of 1000 shares, 1783 shorted calls and 0.6388 × 1783 =
1139 in cash. The cash cancels the value of the shorted calls, so the
initial value of this portfolio is 1000 × 10 = 10,000.
Now, suppose the stock prices take the following values at 9-day
intervals (they have been randomly generated to follow a GBM with
the given drift and volatility):
10, 10.02, 9.53, 10.43, 10.30, 10.16, 10.30, 9.89, 10.17, 9.70, 10.05.
After 9 days, the new stock price is 10.02. The 9-step BOPM gives
the hedge ratio and call premium values as:
h = 0.563,
C = 0.6436.
Therefore, we have to adjust the portfolio so that it has 1000/0.563 =
1776 written calls. We buy back 7 calls at an expense of 7 × 0.6436 =
4.51. The position becomes:
Value of shares                 =  1000 × 10.02 = 10020
Value of written calls          =  −1776 × 0.6436 = −1143
Cash                            =  1139 $e^{9r/365}$ − 4.51 = 1136
Total value of hedged portfolio =  10020 − 1143 + 1136 = 10013.
Figure 6.11 shows the result of repeating this process till the expiry of
the calls. The hedging is quite successful in removing the large
oscillations present in the stock price.
□
Figure 6.11: Dynamic hedging with calls. The diamonds represent the unhedged
shares and the stars represent the hedged portfolio (Example 6.8.2).
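A rough sketch of how the hedge ratio and premium in Example 6.8.2 might be computed is given below. The function names `bopm_call` and `hedge_ratio` are hypothetical, and the hedge ratio is taken to be $(C_u - C_d)/(S(U - D))$ with $C_u$, $C_d$ obtained from trees that are one step shorter. Small differences from the printed values can arise from rounding conventions.

```python
from math import comb, exp, sqrt

def bopm_call(S, X, r, sigma, dt, n):
    """European call premium from an n-step tree with time-step dt (in years)."""
    U = exp(sigma * sqrt(dt))
    D = 1.0 / U
    p = (exp(r * dt) - D) / (U - D)          # risk-neutral probability
    total = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                * max(0.0, S * U**k * D**(n - k) - X)
                for k in range(n + 1))
    return exp(-r * n * dt) * total

def hedge_ratio(S, X, r, sigma, dt, n):
    """First-step hedge ratio h = (Cu - Cd) / (S (U - D))."""
    U = exp(sigma * sqrt(dt))
    D = 1.0 / U
    Cu = bopm_call(S * U, X, r, sigma, dt, n - 1)
    Cd = bopm_call(S * D, X, r, sigma, dt, n - 1)
    return (Cu - Cd) / (S * (U - D))

dt = 9 / 365                                  # rebalancing interval of 9 days
h = hedge_ratio(10.0, 10.0, 0.05, 0.3, dt, 10)
C = bopm_call(10.0, 10.0, 0.05, 0.3, dt, 10)
print(round(h, 3), round(C, 4), round(1000 / h))   # hedge ratio, premium, calls to write
```

After each 9-day interval one would call the same functions with the observed spot price and one fewer step, as described in the example.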
6.9 RISK-NEUTRAL VALUATION
Let us take another look at the structure of the BOPM formulas for
European call premiums. We observed earlier that the premium turns
out to be the present value of the expected future payoffs, calculated
using the risk-neutral probability p*. A little later, we saw that the
premium of a European put has the same form.
Further insight into p* comes by considering risk-neutral investors.
Since they are blind to risk, they do not demand any compensation for
it. If all investors are risk-neutral, then the expected value of any asset
will grow at the risk-free rate. Now consider a one-step binomial tree
for an asset, over a time T. Let the up move have probability p. The
expected final value of the asset is
$$E[S_T] = pSU + (1 - p)SD.$$
We assume a world of risk-neutral investors. Then we have
$$E[S_T] = e^{rT}S \;\Longrightarrow\; pSU + (1 - p)SD = e^{rT}S \;\Longrightarrow\; p = \frac{e^{rT} - D}{U - D} = p^*.$$
So the risk-neutral probability p* can also be interpreted as the
probability of an up move in a risk-neutral world. Yet another
description of p* that emerges from this calculation is that it is the
probability that makes today’s spot price equal to the present value of
the expected future payoff.
Our results for European options extend immediately to any
derivative whose final payoff depends only on the final spot price of
the underlying asset. We shall call such a derivative a European
derivative. The extension of the BOPM formula to such a derivative
is automatic since our calculations of European options premiums are
independent of the formula for the final payoffs.
We have reached the Principle of Risk-Neutral Valuation:
The value of a European derivative is the present value of the
expectation of the final payoff, where the expectation is calculated
using risk-neutral probabilities.
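As an application of this principle, we can re-derive the formula for
the value of a futures contract. The payoffs at the end of an n-step
binomial tree are $SU^kD^{n-k} - X$, and hence the initial value of the
futures is given by
$$\begin{aligned}
V &= e^{-rT}\sum_{k=0}^{n}\binom{n}{k}(p^*)^k(1-p^*)^{n-k}\left(SU^kD^{n-k} - X\right)\\
&= e^{-rT}\left[S\left(p^*U + (1 - p^*)D\right)^n - X\right]\\
&= e^{-rT}\left[Se^{rT} - X\right] = S - Xe^{-rT}.
\end{aligned}$$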
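The principle translates directly into a small routine that accepts any final payoff function. The sketch below is illustrative only: the name `bopm_value`, the parameter values, and the choice $U = e^{\sigma\sqrt{T/n}}$, $D = 1/U$ are assumptions. It checks the futures value derived above.

```python
from math import comb, exp, sqrt

def bopm_value(payoff, S, r, sigma, T, n):
    """Present value of the risk-neutral expectation of payoff(S_T) on an n-step tree."""
    U = exp(sigma * sqrt(T / n))
    D = 1.0 / U
    p = (exp(r * T / n) - D) / (U - D)       # risk-neutral probability
    expected = sum(comb(n, k) * p**k * (1 - p)**(n - k) * payoff(S * U**k * D**(n - k))
                   for k in range(n + 1))
    return exp(-r * T) * expected

S, X, r, sigma, T, n = 10.0, 10.0, 0.05, 0.3, 1.0, 50   # hypothetical values
futures = bopm_value(lambda ST: ST - X, S, r, sigma, T, n)
print(futures, S - X * exp(-r * T))          # the two numbers should agree
```

Any other European derivative is priced by passing a different payoff function.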
Exercise 6.9.1 Consider a contract which will pay you the square of
the price of the underlying asset at a future time T. What is the correct
price for this contract?
Exercise 6.9.2 Consider a cash-or-nothing call option. If the final
spot price $S_T$ is greater than or equal to the exercise price X, the
holder receives 1 unit of the currency. If $S_T < X$, the holder receives
nothing. Price this contract.
We shall take a closer look at the risk-neutral probabilities when the
underlying asset actually follows geometric Brownian motion.
Consider an n-step binomial tree in each step of which the stock price
can move up or down by factors U and D respectively. We denote the
stock price at each step as follows: $S = S_0, S_1, S_2, \dots, S_{n-1}, S_n = S_T$.
The log returns are defined by
$$U_i = \ln\frac{S_i}{S_{i-1}}, \qquad i = 1,\dots,n.$$
The risk-neutral probability p* gives the probability of an up move in a
risk-neutral world. Its value is
$$p^* = \frac{e^{rT/n} - D}{U - D}.$$
This choice of p* ensures that our options pricing model will be free of
arbitrage.
Each $U_i$ is an independent Bernoulli variable taking values u =
ln(U), d = ln(D), with probabilities p* and 1 – p* respectively.
Therefore,
$$\mathrm{Var}[U_i] = p^*(1 - p^*)(u - d)^2.$$
We can now describe the variance of the overall log return under the
risk-neutral probability:
$$\mathrm{Var}\left[\ln\frac{S_T}{S}\right] = \mathrm{Var}\left[\sum_{i=1}^{n}U_i\right] = np^*(1 - p^*)(u - d)^2.$$
At this stage we bring in our earlier estimates of u,d (from matching
with a GBM with drift μ and volatility σ):
$$u = \sigma\sqrt{T/n}, \qquad d = -\sigma\sqrt{T/n}.$$
We shall look at what happens to our model as we go to the
continuous limit, i.e., we let n →∞. First,
$$\lim_{n\to\infty}p^* = \lim_{n\to\infty}\frac{e^{rT/n} - e^{-\sigma\sqrt{T/n}}}{e^{\sigma\sqrt{T/n}} - e^{-\sigma\sqrt{T/n}}} = \frac{1}{2}.$$
Therefore,
$$\lim_{n\to\infty}\mathrm{Var}\left[\ln\frac{S_T}{S}\right] = \lim_{n\to\infty}4\sigma^2T\,p^*(1 - p^*) = \sigma^2T.$$
Thus the risk-neutral probability preserves the volatility of the
underlying GBM! We have obtained the basic properties of the risk-neutral probability in BOPM:
1. The expected stock price grows at the risk-free rate (this
prevents arbitrage).
2. The volatility is the same as for the underlying GBM (at least in
the limiting sense).
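The limiting behaviour can also be observed numerically. The following sketch (function name and parameter values are illustrative assumptions) evaluates p* and the risk-neutral variance $np^*(1 - p^*)(u - d)^2$ for increasing n, with $u = \sigma\sqrt{T/n}$ and d = –u.

```python
from math import exp, sqrt

def risk_neutral_variance(r, sigma, T, n):
    """Return (p*, variance of the log return under p*) for an n-step tree."""
    u = sigma * sqrt(T / n)
    d = -u
    p = (exp(r * T / n) - exp(d)) / (exp(u) - exp(d))
    return p, n * p * (1 - p) * (u - d) ** 2

r, sigma, T = 0.05, 0.3, 1.0          # hypothetical values; sigma^2 T = 0.09
for n in (10, 100, 1000, 10000):
    p, var = risk_neutral_variance(r, sigma, T, n)
    print(n, p, var)
```

As n grows, the first printed value should approach 1/2 and the second should approach $\sigma^2T$.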
We are now ready to take the step up to the Black–Scholes model.
7 The Black–Scholes Model
In this chapter we shall take up the Black–Scholes model for the
pricing of European options whose underlying asset follows
Geometric Brownian Motion. The most satisfying and
mathematically rigorous way of doing this is to follow the same
scheme as was adopted for the BOPM: create a combination
involving the derivative and the asset that evolves in a risk-free
manner with time. However, this approach requires a prior study of
random processes that evolve continuously with time—mathematics
beyond what is usually covered at the undergraduate level. We shall
proceed in a different way: we assume that the continuous model
must have the same fundamental properties as the discrete one.
That is, we assume that the Principle of Risk-Neutral Valuation must
be valid in the continuous case, and use it to price various European
derivatives.
The great advantage of the Black-Scholes formula is the ease
with which it can be used to examine the exact relationship of the
option’s price with various factors. The numerical insights from
BOPM can be improved to formulas of rates of change. This makes
it much easier to carry out procedures like dynamic hedging. It also
enables investors to fine-tune the exposure of their portfolios to
changes in specific factors such as stock prices, volatility, and
interest rates.
7.1 RISK-NEUTRAL VALUATION
Let us begin by recalling the Principle of Risk-Neutral Valuation as
developed in the previous chapter:
The value of a European derivative is the present value of the
expectation of the final payoff, where the expectation is calculated
using risk-neutral probabilities.
We obtained this principle from the n-step BOPM. The risk-neutral
probability was defined as the one that makes the current stock price
equal to the present value of the expected final stock price. We also
discovered that when the underlying asset follows Geometric
Brownian Motion, the risk-neutral probability preserves its volatility
(in the limit n →∞).
It is time to take up the direct pricing of European derivatives on an
asset following GBM. We proceed in two steps (which we shall again
term the Principle of Risk-Neutral Valuation):
1. Replace the original GBM by a risk-neutral one with the same
volatility.
2. Calculate the present value of the expected payoff from the risk-neutral GBM.
The first step is easily done. Suppose the asset price follows GBM
with drift μ and volatility σ:
$$S_T = Se^{(\mu - \sigma^2/2)T + \sigma\sqrt{T}Z}, \qquad Z \sim N(0,1).$$
Its risk-neutral version is $S_T^*$:
$$S_T^* = Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}Z}, \qquad Z \sim N(0,1).$$
Exercise 7.1.1 Verify that
$$S = e^{-rT}E[S_T^*], \qquad \mathrm{Var}\left[\ln\frac{S_T^*}{S}\right] = \sigma^2T.$$
The hard work is in implementing the second stage. Suppose the
payoff from the European derivative is given by a function $f(S_T)$ of
the final stock price. Then we formulate the value of the derivative as
$$V = e^{-rT}E[f(S_T^*)].$$
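When the expectation cannot be worked out by hand, it can be approximated numerically. The sketch below is one possible approach, not part of the original development: it applies the trapezoid rule to the integral of $f(S_T^*)$ against the standard normal density. The function name, the truncation range `zmax` and the number of steps are all assumptions chosen for illustration.

```python
from math import exp, pi, sqrt

def risk_neutral_value(f, S, r, sigma, T, steps=4000, zmax=8.0):
    """Approximate e^{-rT} E[f(S_T*)] by the trapezoid rule over z in [-zmax, zmax]."""
    h = 2 * zmax / steps
    total = 0.0
    for i in range(steps + 1):
        z = -zmax + i * h
        weight = 0.5 if i in (0, steps) else 1.0      # trapezoid end-point weights
        ST = S * exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * z)
        total += weight * f(ST) * exp(-0.5 * z * z)   # payoff times (unnormalised) density
    return exp(-r * T) * total * h / sqrt(2 * pi)

S, X, r, sigma, T = 10.0, 10.0, 0.05, 0.3, 1.0        # hypothetical values
print(risk_neutral_value(lambda ST: ST - X, S, r, sigma, T), S - X * exp(-r * T))
```

For the payoff $S_T - X$ the two printed numbers should agree closely, which is a useful sanity check before trying other payoff functions.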
Before we tackle European options, let us warm up by applying risk-neutral valuation to simpler situations.
Example 7.1.2 Consider a futures on an asset following GBM with
parameters μ,σ. Let its exercise price be X and the time remaining to
expiry be T. According to the Principle of Risk-Neutral Valuation, its
value is
$$V = e^{-rT}E[S_T^* - X].$$
Therefore,
$$V = e^{-rT}E[S_T^*] - Xe^{-rT} = S - Xe^{-rT}.$$
□
Example 7.1.3 Consider a contract which will pay you the square
of the final spot price of the underlying asset. What should you pay
to buy this contract? We assume the underlying asset follows GBM
with volatility σ and calculate the value as follows:
$$V = e^{-rT}E[(S_T^*)^2] = e^{-rT}S^2e^{(2r - \sigma^2)T}E\left[e^{2\sigma\sqrt{T}Z}\right] = e^{-rT}S^2e^{(2r - \sigma^2)T}e^{2\sigma^2T} = S^2e^{(r + \sigma^2)T}.$$
□
7.2 THE BLACK–SCHOLES FORMULA
Theorem 7.2.1 (Black–Scholes Formula for European Call
Options) Consider a European call with expiry date T and exercise
price X on an asset following GBM¹³ with volatility σ. Then the call
premium at t = 0 is given by
$$C = S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}), \tag{7.1}$$
where
$$w = \frac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}. \tag{7.2}$$
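Proof. By the Principle of Risk-Neutral Valuation, the premium at t =
0 is given by
$$C = e^{-rT}E[(S_T^* - X)^+], \tag{7.3}$$
where $x^+ = \max\{x,0\}$ and
$$S_T^* = Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}Z}, \qquad Z \sim N(0,1).$$
Now we calculate the right hand side of equation (7.3) to obtain C.
We need to identify where the function $Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}x}$ is greater
than X. Since this function is monotonically increasing, we need to
just identify the value x = a where $Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}a} = X$. The solution
is
$$a = \frac{\ln(X/S) - (r - \sigma^2/2)T}{\sigma\sqrt{T}}.$$
Therefore,
$$C = \frac{e^{-rT}}{\sqrt{2\pi}}\int_a^\infty\left(Se^{(r - \sigma^2/2)T + \sigma\sqrt{T}x} - X\right)e^{-x^2/2}\,dx = \underbrace{\frac{Se^{-\sigma^2T/2}}{\sqrt{2\pi}}\int_a^\infty e^{\sigma\sqrt{T}x - x^2/2}\,dx}_{I} - \underbrace{\frac{Xe^{-rT}}{\sqrt{2\pi}}\int_a^\infty e^{-x^2/2}\,dx}_{II}. \tag{7.4}$$
Let us separately evaluate the two terms in equation (7.4). First, we
calculate, by completing the square in the exponent,
$$I = \frac{S}{\sqrt{2\pi}}\int_a^\infty e^{-(x - \sigma\sqrt{T})^2/2}\,dx = \frac{S}{\sqrt{2\pi}}\int_{a - \sigma\sqrt{T}}^\infty e^{-y^2/2}\,dy = S\,\mathbb{P}[Z \ge a - \sigma\sqrt{T}],$$
where Z is a standard normal variable. Let Φ be the cumulative
distribution function of Z: Φ(z) = ℙ[Z ≤ z]. Then, by the symmetry of
Z, we have ℙ[Z ≥ z] = ℙ[Z ≤ –z] = Φ(–z). Hence,
$$I = S\Phi(w),$$
where
$$w = -a + \sigma\sqrt{T} = \frac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}.$$
Now we calculate the second term:
$$II = Xe^{-rT}\,\mathbb{P}[Z \ge a] = Xe^{-rT}\Phi(-a) = Xe^{-rT}\Phi(w - \sigma\sqrt{T}).$$
Substituting the values of I and II back into our original equation for
the call premium gives the Black–Scholes formula.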
□
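The formula is easy to evaluate on a computer. The sketch below uses only the Python standard library; the function names `Phi`, `bs_call` and `bs_put` are chosen here for illustration, and Φ is computed from the error function `math.erf`. The put is obtained from the call by put–call parity.

```python
from math import erf, exp, log, sqrt

def Phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    """Black-Scholes premium of a European call, formulas (7.1) and (7.2)."""
    w = (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * Phi(w) - X * exp(-r * T) * Phi(w - sigma * sqrt(T))

def bs_put(S, X, r, sigma, T):
    """European put premium obtained from the call by put-call parity."""
    return bs_call(S, X, r, sigma, T) - S + X * exp(-r * T)

# Data of Example 6.8.2: S = X = 10, r = 5%, sigma = 0.3, 90 days to expiry
print(bs_call(10.0, 10.0, 0.05, 0.3, 90 / 365), bs_put(10.0, 10.0, 0.05, 0.3, 90 / 365))
```

The inputs T and σ must be strictly positive, since the formula divides by $\sigma\sqrt{T}$.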
We can use the Black–Scholes formula to see how C depends
on various parameters (we had done this earlier using intuition and
BOPM). It is convenient to write $S_T^* = Se^W$, where W is normally
distributed with mean $(r - \sigma^2/2)T$ and standard deviation $\sigma\sqrt{T}$.
We also use ↑ to indicate an increase and ↓ to indicate a
decrease.
1. If S ↑, then clearly $E[(Se^W - X)^+]$ ↑ and so C ↑.
2. If X ↑, then clearly $E[(Se^W - X)^+]$ ↓ and so C ↓.
3. To see what happens when r changes, we rearrange the
expression for C as follows:
$$C = e^{-rT}E[(Se^W - X)^+] = E[(Se^{W - rT} - Xe^{-rT})^+],$$
where the distribution of W – rT does not depend on r.
Now, if r ↑, then $e^{-rT}$ ↓, and so C ↑.
4. The nature of the dependence on σ and T cannot be seen so
directly from the formula. We will work that out a little later.
Figure 7.1: These diagrams illustrate the call premium predictions of the
Black–Scholes formula. In both we have fixed T = 0.25. Further, in the first
diagram S = 100, σ = 0.3, and C is plotted against the exercise price X for
three different values of r (0, 0.5 and 1 from left to right). In the second, X =
100, r = 0.1, and C is plotted against S for different values of σ (0.4, 0.2 and
0.001 from left to right).
Exercise 7.2.2 Use Put–Call Parity to derive the Black–Scholes
formula for a European put:
$$P = -S\Phi(-w) + Xe^{-rT}\Phi(-w + \sigma\sqrt{T}). \tag{7.5}$$
Recall that the value of an American call is the same as that of a
corresponding European call, so that too can be obtained from the
Black–Scholes model. However, the value of an American put
cannot be obtained from this model.
Exercise 7.2.3 Consider the cash-or-nothing call option defined in
Exercise 6.9.2. Price this contract assuming that the underlying
asset follows a GBM with volatility σ.
Exercise 7.2.4 A stock-or-nothing call option delivers the stock if
ST ≥ X and nothing if ST < X. Price this contract, assuming the
underlying stock follows a GBM with volatility σ.
7.3 OPTIONS ON FUTURES
Options are also written with a futures as the underlying asset. The
contract details the underlying asset and the expiry date of the
futures. A call option, on exercise, would make its holder the holder
of the futures. A put option would make him the writer of the futures.
Let us create some notation for such an option:
An option on a futures could be European or American. In either
case, if it is exercised at a time t, then the writer has to deliver a
futures contract expiring at $T_F$ and with futures price X. Since the
available contracts will be marked-to-market, the writer will actually
deliver a futures with futures price $X_t$ and a compensatory cash
amount of $e^{-r(T_F - t)}(X_t - X)$. Two points must be noted about this
delivery:
1. The marked-to-market futures has zero value at delivery.
2. The compensatory cash is discounted to the present since the
futures price is a future payment.
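The Black–Scholes approach works quite neatly for European
options on futures. Let us carry out the calculations for a European
call option on a futures and obtain its premium $C_F$. We assume the
asset underlying the futures follows a GBM with volatility σ. In this
case, the final payoff at time $T_O$ has the form:
$$\max\{0,\ e^{-r(T_F - T_O)}(X_{T_O} - X)\} = \max\{0,\ S_{T_O} - Xe^{-r(T_F - T_O)}\},$$
since the futures price at $T_O$ is $X_{T_O} = S_{T_O}e^{r(T_F - T_O)}$.
This is the same payoff as that for a European call on the asset
underlying the futures and with exercise price $Xe^{-r(T_F - T_O)}$. Therefore,
the value $C_F$ is given by the Black–Scholes formula for such a call:
$$C_F = S\Phi(w) - Xe^{-rT_F}\Phi(w - \sigma\sqrt{T_O}),$$
where
$$w = \frac{\ln\left(S/(Xe^{-r(T_F - T_O)})\right) + (r + \sigma^2/2)T_O}{\sigma\sqrt{T_O}} = \frac{\ln(S/X) + rT_F + \sigma^2T_O/2}{\sigma\sqrt{T_O}}.$$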
American options on futures can be handled by BOPM.
7.4 OPTIONS ON ASSETS WITH DIVIDENDS
In this section we consider assets which follow GBM and also have a
continuous dividend yield. In §6.7, we had carried out the BOPM
analysis for European options on such assets. Now, we shall derive
the continuous version. Let us start by reviewing our earlier work.
We note that:
1. The calculations can be carried out for any European derivative.
2. The derivative premium is the present value of the expectation
of the future payoff, where the expectation is calculated using
the risk-neutral probability p*.
3. Under the risk-neutral probability, the current asset price is the
present value of the expectation of the future asset price.
In other words, under BOPM, the Principle of Risk-Neutral Valuation
extends to European derivatives on assets with continuous
dividends.
Now we take up an asset whose price evolves from S to $S_T$ over
a time interval T following a GBM with volatility σ. We approximate
this by an n-step binomial tree with $U = e^{\sigma\sqrt{T/n}}$ and D = 1⁄U.
Suppose also that the asset has a continuous dividend yield q.
Exercise 7.4.1 Let p* be the risk-neutral probability for this asset.
Show that
1.
2.
We see that the risk-neutral probability preserves volatility. Thus,
in the continuous case, the risk-neutral version of the original GBM
should have the same volatility. Its drift, on the other hand, is r – q:
this choice ensures that $Se^{(r - q)T}$ is the expected value of $S_T$ under the
risk-neutral GBM (equivalently, a holding whose dividends are reinvested
grows on average at the risk-free rate). Therefore, in applying the principle
of risk-neutral valuation, we replace the original GBM
$$S_T = Se^{(\mu - \sigma^2/2)T + \sigma\sqrt{T}Z}$$
with the risk-neutral one
$$S_T^* = Se^{(r - q - \sigma^2/2)T + \sigma\sqrt{T}Z},$$
where Z has the standard normal distribution. The premium of a
European derivative on this asset is now given by
$$V = e^{-rT}E[f(S_T^*)],$$
where f is the payoff function of the derivative. In particular, the
premiums of European options can be easily obtained by modifying
our earlier calculations.
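Exercise 7.4.2 Show that the premiums of European puts and calls
on an asset with dividend yield q and volatility σ are given by
$$C = Se^{-qT}\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}), \qquad P = -Se^{-qT}\Phi(-w) + Xe^{-rT}\Phi(-w + \sigma\sqrt{T}),$$
where
$$w = \frac{\ln(S/X) + (r - q + \sigma^2/2)T}{\sigma\sqrt{T}}.$$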
7.5 BLACK–SCHOLES AND BOPM
An alternate derivation of the Black–Scholes equation is to start with
a BOPM having time-step Δt, and then let Δt → 0 to get a
continuous model. In taking this limit, the binomial distribution in the
BOPM converts to the normal distribution in the Black–Scholes. In
particular, by taking a small enough Δt, we get a discrete
approximation of the Black–Scholes. This is illustrated in Figure 7.2,
which shows that even a 10-step BOPM gives a good approximation
to the continuous limit.
The Black–Scholes formula has a simple form which lends itself
to quick calculation as well as analytic treatment leading to general
conclusions. BOPM involves considerably more computation (if large
tree sizes are used). On the other hand, it is a much more flexible
tool than Black–Scholes. For instance, BOPM can be used to price
American puts whereas Black–Scholes cannot. BOPM can also
easily handle the so-called exotic options, which have more
complicated rules. In general, we could say that the scope of BOPM
is wider but the analysis from Black–Scholes is deeper.
Figure 7.2: Comparison of Black–Scholes and BOPM. The call premium C
has been plotted against the elapsed time t. The continuous path tracks the
prices predicted by Black–Scholes, while each dot has been calculated using
a 10-step BOPM. (In this instance, we have taken T = 1, S = X = 100, r =
10% and σ = 0.3.)
Example 7.5.1 We consider the case of a barrier call. Such an
option becomes worthless if the spot price ever crosses a certain
barrier. A barrier call places limits on the profits available to the
holder, so its value is obviously less than that of a standard call. We
show below how BOPM can be used to find the right price for a
barrier call.
Consider a European call with X = 100 and T = 3. Suppose the
underlying stock has initial spot price S = 100 and can move up by a
factor U = 1.1 or down by a factor D = 0.9 during a unit time period.
Let the risk-free rate be r = 0.05 during a unit time period. In addition,
suppose there is a barrier at 95: the call will expire worthlessly if the
spot price goes below 95. We will use a 3-step BOPM to find the call
premium C.
First, from the given data we calculate
$$R = 1 + r = 1.05, \qquad q = \frac{R - D}{U - D} = \frac{1.05 - 0.9}{1.1 - 0.9} = 0.75, \qquad 1 - q = 0.25.$$
The binomial tree for S is given in Figure 7.3.
The dashed line represents the barrier at 95. This call cannot be
priced by setting up a tree for the call payoffs in the usual manner,
since the value at any location is not completely determined by the
subsequent payoffs: it also depends on the path taken earlier. If that
path ever fell below the barrier, the value is zero.
Hence, when calculating the risk-neutral probability for a final
payoff, we have to actually calculate the probability of reaching it by
a path that doesn’t cross the barrier. In this example, the paths that
have positive payoff are UUU, UUD, and UDU. We tabulate the
probabilities and payoffs for these paths below:
Path   Probability                 Payoff
UUU    0.75³ = 0.421875            133.1 − 100 = 33.1
UUD    0.75² × 0.25 = 0.140625     108.9 − 100 = 8.9
UDU    0.75² × 0.25 = 0.140625     108.9 − 100 = 8.9
Figure 7.3
Therefore the expected payoff is
EP = 0.421875 × 33.1 + 2 × 0.140625 × 8.9 = 16.467.
The call premium is
C = Present value of EP = 16.467 ⁄ 1.05³ = 14.22.
□
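Path-by-path pricing of this kind is straightforward to sketch in code. The function below enumerates all $2^n$ paths of an n-step tree and discards those that fall below the barrier. The name `barrier_call` is hypothetical, and the risk-neutral probability q of an up move is passed in as a parameter rather than fixed, so that the effect of different values can be explored.

```python
from itertools import product

def barrier_call(S, X, U, D, R, q, n, barrier):
    """European call that becomes worthless if the spot ever falls below `barrier`.

    q is the risk-neutral probability of an up move, R the growth factor per step.
    """
    expected = 0.0
    for path in product("UD", repeat=n):      # all 2**n sequences of moves
        price, prob, alive = S, 1.0, True
        for move in path:
            price *= U if move == "U" else D
            prob *= q if move == "U" else 1 - q
            if price < barrier:               # barrier crossed: path pays nothing
                alive = False
                break
        if alive:
            expected += prob * max(0.0, price - X)
    return expected / R**n                    # present value of the expected payoff

U, D, R = 1.1, 0.9, 1.05
q = (R - D) / (U - D)
print(barrier_call(100.0, 100.0, U, D, R, q, 3, 95.0))
print(barrier_call(100.0, 100.0, U, D, R, q, 3, 0.0))   # barrier at 0: ordinary call
```

Comparing the two printed values shows how much the barrier reduces the premium. The running time doubles with every extra step, which is the practical limitation of working with paths.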
As this example illustrates, exotic options may require us to track
paths rather than nodes. Now, in an n-step BOPM, the number of
possible paths is $2^n$, as there are 2 choices of movement at each
stage. The number of nodes is only 1 + 2 + ⋯ + (n + 1) = (n + 1)(n +
2) ⁄ 2 ≈ $n^2/2$. For example, if n = 100, the number of paths is 1.3 ×
$10^{30}$ while the number of nodes is 5151. The tremendous amount of
computation involved may hinder us from getting accurate values out
of BOPM.
Figure 7.4: The first chart shows implied volatility of the Nifty stock index,
calculated from call option prices on 31 August 2005 and plotted against time
to expiry (in months). The two curves represent options with exercise price
2350 and 2400 respectively. The second chart plots implied volatility of
Reliance shares, measured from call options expiring on 29 September 2005,
against the exercise price of the corresponding call.
A Black–Scholes kind of approach can also be applied to such
options. It requires us to assign probabilities to the paths of Brownian
motion and to integrate over sets of paths to calculate expected
values. Calculations of this kind are important in quantum physics,
and the techniques developed there find an application in finance!
7.6 IMPLIED VOLATILITY
From the data about past stock prices, we can estimate the
historical volatility σ of the spot price. We can substitute this in the
Black–Scholes formula to get the corresponding value of a European
option. Also, we can reverse this process: starting with the actual
price of the option in the market, we can solve the Black–Scholes
equation for σ. The value we get is called the implied volatility,
denoted by $\sigma_I$.
The implied volatility can be seen as reflecting the opinion of the
market about the future volatility of the stock price. Indeed, studies
suggest that implied volatility is better than historical volatility for
forecasting the future.
In terms of calculation, there are some difficulties with implied
volatility. One is that the Black–Scholes equation cannot be explicitly
solved for $\sigma_I$, so some numerical technique has to be applied. A
deeper problem is that different options for the same stock typically
lead to different values of $\sigma_I$, so one has to take some kind of
weighted average of these values. Such problems also exist with
historical volatility since the estimated value depends on the period
from which the data is taken.
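One simple numerical technique is bisection, which works here because the call premium increases with σ. The sketch below is illustrative rather than a recommended production method: the function names, the search interval [1e-6, 5] and the tolerance are assumptions, and the market price is taken to lie within the range of premiums attainable on that interval.

```python
from math import erf, exp, log, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    w = (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * Phi(w) - X * exp(-r * T) * Phi(w - sigma * sqrt(T))

def implied_vol(price, S, X, r, T, lo=1e-6, hi=5.0, tol=1e-8):
    """Solve bs_call(S, X, r, sigma, T) = price for sigma by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, X, r, mid, T) < price:
            lo = mid                  # premium too small: volatility must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip with hypothetical data: price a call at sigma = 0.3, then recover sigma.
market_price = bs_call(100.0, 105.0, 0.05, 0.3, 0.5)
print(implied_vol(market_price, 100.0, 105.0, 0.05, 0.5))
```

Faster root-finding methods exist, but bisection has the advantage of never leaving the bracketing interval.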
Example 7.6.1 The implied volatility of the Nifty stock index, as
measured from call option prices on 31 August 2005, varied from 0.1
to 0.126. Its historical volatility was 0.18 over the previous year, 0.26
over the previous 5 years, and 0.31 over the previous 15 years.
□
Figure 7.4 shows some examples of implied volatility patterns. The
pattern we see in the second chart in that figure is common enough
to have a name: the volatility smile. The Black–Scholes model does
not predict this smile. Its prediction is of a flat line (with $\sigma_I = \sigma$).
A good amount of current research is devoted to finding
explanations for the smile. These fall into two categories. One
focusses on the information available to traders. For instance, it
might be that traders investing in options with extreme exercise
prices are doing so because they expect higher volatility. The other
kind of research investigates the assumptions underlying the Black–
Scholes model, such as a constant risk-free rate or a fixed volatility.
Models in which the risk-free rate and volatility vary randomly with
time have been able to generate volatility smiles of the types
observed in reality.
7.7 DYNAMIC HEDGING
DELTA
Consider a European call with exercise price X and expiry time T on
a stock with initial spot price S, drift μ and volatility σ. Let the risk-
free rate be r. Then, according to the Black–Scholes model, the call
premium C is a function of S, X, T, σ and r. We are interested in how
changes in any of these will affect C. To start, we consider the effect
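of changes in S. To this end, we define the delta of the option by
$$\delta_C = \frac{\partial C}{\partial S}.$$
We can calculate the delta by starting from the Black–Scholes
formula for C:
$$C = S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T}).$$
Differentiate it using the chain rule:
$$\frac{\partial C}{\partial S} = \Phi(w) + \left[S\Phi'(w) - Xe^{-rT}\Phi'(w - \sigma\sqrt{T})\right]\frac{\partial w}{\partial S}.$$
The bracket vanishes, because the definition of w gives
$\Phi'(w - \sigma\sqrt{T}) = \Phi'(w)\,e^{w\sigma\sqrt{T} - \sigma^2T/2} = \Phi'(w)\,(S/X)e^{rT}$.
So our final result is:
$$\delta_C = \Phi(w).$$
Note that $\delta_C > 0$ and so C increases if S does. We shall now see
how knowledge of delta is used in hedging.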
DELTA HEDGING
Consider a portfolio which is long one share and h calls. Then its
value is V = S + hC. To hedge against changes in the spot price, we
set
$$\frac{\partial V}{\partial S} = 0.$$
This gives:
$$0 = 1 + h\delta_C = 1 + h\Phi(w), \quad\text{or}\quad h = -\frac{1}{\Phi(w)}.$$
In other words, we have to short 1 ⁄ Φ(w) calls. Such a portfolio is
called delta neutral and this process is called delta hedging.
Example 7.7.1 The process of delta hedging again leads to
dynamic hedging: a portfolio would be periodically readjusted to
keep it delta-neutral. As an illustration, we take up the same
simulated stock prices used in Example 6.8.2. Recall that the stock
was assumed to follow GBM with drift μ = 0.2 and volatility σ = 0.3.
The task was to hedge a portfolio of 1000 shares of this stock over 3
months using call options expiring in 90 days. The initial price S of
the stock was 10 and this was also the exercise price X of the
options. The risk-free rate was 5%. Finally, we had decided to adjust
the hedged portfolio every 9 days.
The calculations for the first time-step are:
w = $\dfrac{\ln(S/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}$ = 0.157.
Call premium, C = $S\Phi(w) - Xe^{-rT}\Phi(w - \sigma\sqrt{T})$ = 0.653.
Number of shorted calls, N = 1000 ⁄ Φ(w) = 1778.
Cash in portfolio, D = N × C = 1162.
Hedged portfolio value = 10,000.
Figure 7.5: Delta hedging a stock portfolio with calls (Example 7.7.1)
After 9 days, the spot price is 10.02. Black–Scholes gives the new
call premium to be 0.628. The value of the portfolio is now
10.02 × 1000 – 1778 × 0.628 + 1162 $e^{0.05 \times 9/365}$ = 10,067.
We repeat this process over the subsequent stages. Figure 7.5
shows the result. It is almost identical to the hedging carried out via
BOPM. You must have noticed, however, that it was easier to set up
and faster to carry out.
□
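The first step of this delta hedge can be retraced with the short script below. It is a sketch using the data of the example; the function and variable names are chosen here for illustration, and the unrounded number of calls is carried through, so the final figure may differ slightly from one computed with rounded intermediate values.

```python
from math import erf, exp, log, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def w_value(S, X, r, sigma, T):
    return (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))

def bs_call(S, X, r, sigma, T):
    w = w_value(S, X, r, sigma, T)
    return S * Phi(w) - X * exp(-r * T) * Phi(w - sigma * sqrt(T))

S0, X, r, sigma, T0 = 10.0, 10.0, 0.05, 0.3, 90 / 365
w0 = w_value(S0, X, r, sigma, T0)
C0 = bs_call(S0, X, r, sigma, T0)
N = 1000 / Phi(w0)                 # calls to short for 1000 shares (delta neutral)
cash = N * C0                      # premium received for the written calls

# Nine days later: new spot price, 81 days left to expiry
S1, T1 = 10.02, 81 / 365
C1 = bs_call(S1, X, r, sigma, T1)
value = 1000 * S1 - N * C1 + cash * exp(r * 9 / 365)
print(round(w0, 3), round(C0, 3), round(N), round(C1, 3), round(value))
```

Repeating the last three lines with each new spot price, after recomputing N from the new delta, carries the hedge forward through the remaining stages.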
GAMMA
The most important parameter affecting C is the spot price S. For a
closer look at their relationship, we involve the second derivative as
well. Therefore, we define the gamma of the call by
$$\gamma_C = \frac{\partial^2 C}{\partial S^2}.$$
It is explicitly given by the following formula:
$$\gamma_C = \frac{1}{S\sigma\sqrt{2\pi T}}\,e^{-w^2/2}.$$
Exercise 7.7.2 Confirm the formula for the gamma of a call given
above.
DELTA–GAMMA HEDGING
We have already encountered delta hedging, where the delta is set
to zero by shorting an appropriate number of calls. We can further
reduce risk due to changes in the spot price by also setting the
gamma to zero.
To do this, we create a portfolio consisting of the stock and two
different calls on this stock (e.g. they could have different expiry
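dates or exercise prices). Suppose it has one share, $x_1$ units of the
first call and $x_2$ units of the second call. We similarly use $T_i$, $X_i$, $C_i$
and $w_i$ to represent the expiry time, exercise price, premium and value of w
for the ith call.
Then the value V of the portfolio is given by:
$$V = S + x_1C_1 + x_2C_2.$$
Setting $\delta_V$ and $\gamma_V$ to zero gives the following equations (after
cancelling the common factor $1/(S\sigma\sqrt{2\pi})$ in the second one):
$$1 + x_1\Phi(w_1) + x_2\Phi(w_2) = 0,$$
$$\frac{x_1e^{-w_1^2/2}}{\sqrt{T_1}} + \frac{x_2e^{-w_2^2/2}}{\sqrt{T_2}} = 0.$$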
This is a 2 × 2 linear system and is easily solved for x1 and x2.
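A minimal sketch of the solution by Cramer's rule is given below. The function names are hypothetical, and the two calls used in the usage lines (exercise prices 10 and 11, both expiring in 90 days, on a stock with S = 10, r = 5%, σ = 0.3) are only an illustration.

```python
from math import erf, exp, log, pi, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def delta_gamma(S, X, r, sigma, T):
    """Delta and gamma of a European call under Black-Scholes."""
    w = (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return Phi(w), exp(-0.5 * w * w) / (S * sigma * sqrt(2 * pi * T))

def delta_gamma_hedge(S, r, sigma, call1, call2):
    """Units (x1, x2) of two calls making S + x1*C1 + x2*C2 delta and gamma neutral.

    call1 and call2 are (exercise price, time to expiry) pairs.
    """
    d1, g1 = delta_gamma(S, call1[0], r, sigma, call1[1])
    d2, g2 = delta_gamma(S, call2[0], r, sigma, call2[1])
    det = d1 * g2 - d2 * g1          # must be non-zero: the calls must really differ
    return -g2 / det, g1 / det       # solves d1 x1 + d2 x2 = -1, g1 x1 + g2 x2 = 0

x1, x2 = delta_gamma_hedge(10.0, 0.05, 0.3, (10.0, 90 / 365), (11.0, 90 / 365))
print(x1, x2)
```

A negative value means the corresponding call is shorted, a positive one that it is bought.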
Figure 7.6: An instance of delta–gamma hedging (Example 7.7.3)
Example 7.7.3 Let us take up the situation of Example 7.7.1 again.
We assume another call option on the stock is also available with
expiry in 90 days and exercise price 11. Figure 7.6 shows the result
of carrying out delta–gamma hedging in this situation. Note that the
hedged portfolio shows noticeably less fluctuation than was the case
with delta hedging alone.
□
7.8 THE GREEKS
In the last section we investigated how the price C of a European call
varies with the spot price S and used the results to form portfolios
which reduced the risk arising from fluctuations in S. Similar
calculations and applications can be carried out for the other
parameters affecting C.
THETA
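First we consider the time factor. Let t be a time instant during the life
of the call. Then the time to expiry is T – t, and the value of the call at
t is given by
$$C(t) = S_t\Phi(w(t)) - Xe^{-r(T - t)}\Phi\left(w(t) - \sigma\sqrt{T - t}\right),$$
where
$$w(t) = \frac{\ln(S_t/X) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}.$$
We define the theta of the call by
$$\Theta_C = \left.\frac{\partial C}{\partial t}\right|_{t=0}.$$
To calculate $\Theta_C$, we first differentiate w(t):
$$\left.\frac{\partial w}{\partial t}\right|_{t=0} = \frac{\ln(S/X)}{2\sigma T^{3/2}} - \frac{r + \sigma^2/2}{2\sigma\sqrt{T}},$$
keeping in mind that, since this is a partial differentiation, we take $S_t$
to be a constant S. Now we complete the calculation of $\Theta_C$:
$$\Theta_C = S\Phi'(w)\left.\frac{\partial w}{\partial t}\right|_{t=0} - rXe^{-rT}\Phi(w - \sigma\sqrt{T}) - Xe^{-rT}\Phi'(w - \sigma\sqrt{T})\left(\left.\frac{\partial w}{\partial t}\right|_{t=0} + \frac{\sigma}{2\sqrt{T}}\right).$$
This looks intimidating but if we substitute the value of ∂w⁄∂t|t=0 and
patiently look for cancellations, we are rewarded with the following
result:
$$\Theta_C = -\frac{S\sigma}{2\sqrt{2\pi T}}\,e^{-w^2/2} - rXe^{-rT}\Phi(w - \sigma\sqrt{T}). \tag{7.8}$$
Note that $\Theta_C < 0$ and so the value of the call decreases with time.
Alternately, we interpret this as: the value of the call increases with
the time to expiry T.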
VEGA
Next, we consider the dependence on the volatility σ. We define the
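vega of the call by
$$\mathcal{V}_C = \frac{\partial C}{\partial\sigma}.$$
The calculation of the vega is quite simple and gives
$$\mathcal{V}_C = S\sqrt{\frac{T}{2\pi}}\,e^{-w^2/2}.$$
We see that $\mathcal{V}_C > 0$ and so C increases with σ.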
RHO
The parameter rho measures the dependence on the risk-free rate r,
and is defined by
$$\rho_C = \frac{\partial C}{\partial r}.$$
Its value is given by
$$\rho_C = TXe^{-rT}\Phi(w - \sigma\sqrt{T}). \tag{7.10}$$
We have $\rho_C > 0$ and so C increases with r.
The various parameters we have defined are collectively known as
“The Greeks”. (Vega is the only one which is not named after a
Greek letter.) More generally, consider any portfolio based on
one stock and associated options. If we denote its value by V, then
the Greeks for this portfolio are defined by
Delta: $\delta_V = \partial V/\partial S$
Gamma: $\gamma_V = \partial^2 V/\partial S^2$
Vega: $\mathcal{V}_V = \partial V/\partial\sigma$
Theta: $\Theta_V = \partial V/\partial t$
Rho: $\rho_V = \partial V/\partial r$
All the derivatives are evaluated at t = 0.
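Exercise 7.8.1 Show that the Greeks for a European put are given
by
$$\begin{aligned}
\delta_P &= -\Phi(-w),\\
\gamma_P &= \frac{1}{S\sigma\sqrt{2\pi T}}\,e^{-w^2/2},\\
\mathcal{V}_P &= S\sqrt{\frac{T}{2\pi}}\,e^{-w^2/2},\\
\Theta_P &= -\frac{S\sigma}{2\sqrt{2\pi T}}\,e^{-w^2/2} + rXe^{-rT}\Phi(\sigma\sqrt{T} - w),\\
\rho_P &= -TXe^{-rT}\Phi(\sigma\sqrt{T} - w).
\end{aligned}$$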
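Suppose each parameter x changes by a value Δx over a time
period Δt. Then the change in the value of the portfolio is
approximated by
$$\Delta V \doteq \delta_V\,\Delta S + \tfrac{1}{2}\gamma_V\,(\Delta S)^2 + \mathcal{V}_V\,\Delta\sigma + \Theta_V\,\Delta t + \rho_V\,\Delta r.$$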
V can be hedged against changes in one or more parameters by
setting the corresponding Greeks to zero.
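The closed-form Greeks of a call can be cross-checked against finite differences of the Black–Scholes premium. The sketch below does this for one set of hypothetical parameter values; the function names and step sizes are assumptions. Theta is the derivative with respect to elapsed time t, so it equals minus the derivative with respect to the time to expiry.

```python
from math import erf, exp, log, pi, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    w = (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * Phi(w) - X * exp(-r * T) * Phi(w - sigma * sqrt(T))

def call_greeks(S, X, r, sigma, T):
    """Closed-form Greeks of a European call at t = 0."""
    w = (log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    pdf = exp(-0.5 * w * w) / sqrt(2 * pi)          # standard normal density at w
    disc = X * exp(-r * T) * Phi(w - sigma * sqrt(T))
    return {"delta": Phi(w),
            "gamma": pdf / (S * sigma * sqrt(T)),
            "vega": S * sqrt(T) * pdf,
            "theta": -S * sigma * pdf / (2 * sqrt(T)) - r * disc,
            "rho": T * disc}

S, X, r, sigma, T = 100.0, 95.0, 0.1, 0.3, 0.25      # hypothetical values
g = call_greeks(S, X, r, sigma, T)
h = 1e-4
fd_delta = (bs_call(S + h, X, r, sigma, T) - bs_call(S - h, X, r, sigma, T)) / (2 * h)
fd_vega = (bs_call(S, X, r, sigma + h, T) - bs_call(S, X, r, sigma - h, T)) / (2 * h)
fd_rho = (bs_call(S, X, r + h, sigma, T) - bs_call(S, X, r - h, sigma, T)) / (2 * h)
fd_theta = -(bs_call(S, X, r, sigma, T + h) - bs_call(S, X, r, sigma, T - h)) / (2 * h)
print(g["delta"], fd_delta)
print(g["vega"], fd_vega)
print(g["rho"], fd_rho)
print(g["theta"], fd_theta)
```

Each printed pair should agree to several decimal places; a mismatch usually signals a sign or a factor error in the closed form.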
7.9 THE BLACK–SCHOLES PDE
In this section we shall obtain the partial differential equation (or
PDE) that was at the centre of the original work of Black, Scholes
and Merton. Consider a European derivative whose final payoff at
the expiry time T is given by a function f of the final spot price ST.
Suppose the underlying asset price follows a GBM with volatility σ.
Then the value of the derivative as a function of spot price S and
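time t is given by the Principle of Risk-Neutral Valuation:
\[ V(S,t) = \frac{e^{-r(T-t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f\Bigl(Se^{(r-\sigma^2/2)(T-t) + \sigma\sqrt{T-t}\,x}\Bigr)e^{-x^2/2}\,dx. \]
We first differentiate with respect to S to calculate the delta and
gamma of the derivative (see §A4). Writing A = A(x) = Se^{(r–σ²⁄2)T + σ√T x}
for the argument of f at t = 0, we have
\[ \delta_V = \frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(A)\,\frac{A}{S}\,e^{-x^2/2}\,dx, \qquad \gamma_V = \frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f''(A)\,\frac{A^2}{S^2}\,e^{-x^2/2}\,dx. \]
Next, we calculate ΘV , the derivative with respect to t. Using
Ae^{–x²⁄2} = Se^{rT}e^{–(x–σ√T)²⁄2}, we get
\[ \Theta_V = rV - \frac{e^{-rT}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(A)\,A\Bigl(r - \frac{\sigma^2}{2} + \frac{\sigma x}{2\sqrt{T}}\Bigr)e^{-x^2/2}\,dx = rV - rS\,\delta_V - \frac{\sigma S}{2\sqrt{2\pi T}}\int_{-\infty}^{\infty} f'(A)\,(x - \sigma\sqrt{T})\,e^{-(x-\sigma\sqrt{T})^2/2}\,dx. \]
We denote the last integral by I and integrate by parts:
\[ I = \Bigl[-f'(A)\,e^{-(x-\sigma\sqrt{T})^2/2}\Bigr]_{x=-\infty}^{x=\infty} + \sigma\sqrt{T}\int_{-\infty}^{\infty} f''(A)\,A\,e^{-(x-\sigma\sqrt{T})^2/2}\,dx. \]
If f′ is “nice” and does not grow too fast as x → ±∞, for example if
it is a polynomial, then the limits in the last expression are zero and
we have I = σS√(2πT) γV . Substituting this in the expression for ΘV
gives
\[ \Theta_V = rV - rS\,\delta_V - \tfrac{1}{2}\sigma^2S^2\,\gamma_V. \]
We rearrange this as follows:
\[ \Theta_V + rS\,\delta_V + \tfrac{1}{2}\sigma^2S^2\,\gamma_V - rV = 0. \]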
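These calculations are valid for any t (not only t = 0) and so we get
\[ \frac{\partial V}{\partial t} + rS\,\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2S^2\,\frac{\partial^2 V}{\partial S^2} - rV = 0. \]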
This partial differential equation is known as the Black–Scholes
PDE.
Exercise 7.9.1 Verify that the Black–Scholes formula for the
premium of a European call or put satisfies the Black–Scholes PDE.
As remarked earlier, we are reversing history. The original route uses
the study of continuously evolving random processes (or stochastic
calculus) to set up the Black-Scholes PDE. The fact that it does not
involve the drift leads to risk-neutral valuation and gives one way
(our way) of finding derivative premiums. (This observation was first
made by John Cox and Stephen Ross in 1976 [14].) Alternatively, we can
directly solve the Black-Scholes PDE together with the boundary
condition coming from the payoff function of the derivative:
\[ V(S,T) = f(S). \tag{7.12} \]
In this regard, it is interesting to note that the Black–Scholes PDE is
essentially just the heat equation:
\[ \frac{\partial D}{\partial s} = \frac{\sigma^2}{2}\,\frac{\partial^2 D}{\partial x^2}. \]
We can convert the Black–Scholes PDE to the heat equation by the
following substitutions:
\[ \begin{aligned}
s &= T - t, \\
x &= \ln(S/X) + (r - \sigma^2/2)(T - t), \\
D(x,s) &= e^{r(T-t)}\,V(S,t).
\end{aligned} \]
These substitutions also convert the boundary condition (7.12) into
the initial condition
\[ D(x,0) = f(Xe^{x}). \]
The benefit is that the heat equation has been extensively studied
and there are standard techniques for its solution, both theoretical
and numerical.
Exercise 7.9.2 Verify that the change of variables given above
convert the Black–Scholes PDE to the heat equation.
The connection of GBM with the heat equation is yet another
discovery that was first made by Bachelier. The essential difference
between the work of Bachelier and Black–Scholes is that Bachelier
used real-world probabilities instead of risk-neutral ones. There are
two objections to taking the expectations calculated from a real-world
probability as prices. The first, as we emphasised in earlier parts of
this book, is that risk also contributes to price. The second is that it
may be impossible to consistently price all assets and derivatives by
the real-world expectation of their future payoffs! A simple example
(from [43]) is given below.
Example 7.9.3 Suppose every asset is priced by its expected
return. Let St be the price of one US Dollar at time t in Indian rupees.
If interest rates are zero, then the futures price for a contract to sell
$1 at time T is Rs X = S0 = E[ST]. The forward price for a contract to
sell Re 1 at time T is similarly $Y = E[1 ⁄ ST]. By the No Arbitrage
Principle, we must have Y = 1 ⁄ X, and hence
\[ E\left[\frac{1}{S_T}\right] = \frac{1}{E[S_T]}. \]
However, by Jensen’s inequality (a basic result in probability), for
any non-constant random variable X taking only positive values, E[1 ⁄
X] > 1 ⁄ E[X]. So this equality is impossible.
□
When we use risk-neutral probabilities, we avoid this difficulty
because then we use a different assignation of probabilities for each
asset rather than a single framework of probabilities for all assets
simultaneously.
Example 7.9.4 We illustrate the point made above by using a 1-step
BOPM in the context of Example 7.9.3. Let r = 0 and the initial
exchange rate be Rs 40 per US dollar. Consider a BOPM model with
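U = 1.1 and D = 0.9. Then the tree for dollar prices (in rupees) takes
the initial price 40 to either 44 (up) or 36 (down).
The risk-neutral probabilities for the up and down moves in this tree are
\[ q = \frac{1 - D}{U - D} = \frac{1 - 0.9}{1.1 - 0.9} = 0.5 \quad\text{and}\quad 1 - q = 0.5. \]
Now we consider the tree for rupee prices (in dollars): the initial
price 1⁄40 moves to either 1⁄36 (up) or 1⁄44 (down).
The risk-neutral probabilities for the up and down moves here are
\[ q^* = \frac{1/40 - 1/44}{1/36 - 1/44} = 0.45 \quad\text{and}\quad 1 - q^* = 0.55. \]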
Thus, each tree has a distinct risk-neutral probability distribution, and
this enables us to avoid the inconsistency encountered in Example
7.9.3.
□
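The two trees of Example 7.9.4 can be checked with a few lines of Python. This is only a sketch: the helper name `risk_neutral_up_prob` is ours, and it assumes the usual one-step formula q = (1 – D)⁄(U – D) with r = 0.

```python
def risk_neutral_up_prob(up, down, growth=1.0):
    """Up-move probability that makes the expected gross return equal `growth`."""
    return (growth - down) / (up - down)

U, D = 1.1, 0.9

# Dollar priced in rupees: 40 -> 44 or 36.
q = risk_neutral_up_prob(U, D)

# Rupee priced in dollars: 1/40 -> 1/36 (up factor 1/D) or 1/44 (down factor 1/U).
q_star = risk_neutral_up_prob(1.0 / D, 1.0 / U)

# Pricing the rupee with the dollar tree's probabilities gives the wrong forward price.
wrong_forward = q * (1.0 / 44.0) + (1.0 - q) * (1.0 / 36.0)
right_forward = q_star * (1.0 / 36.0) + (1.0 - q_star) * (1.0 / 44.0)

print(q, q_star, wrong_forward, right_forward, 1.0 / 40.0)
```

Only the rupee tree's own probabilities reproduce the forward price 1⁄40, which is the point of the example.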
7.10 SPECULATING WITH OPTIONS
So far we have studied derivatives as tools for reducing risk. They
have another application: speculation. If an investor has information
on how the market, or a particular asset, will behave in the future,
she can profit by using derivatives to lock in favourable prices now.
This can be done in a methodical way, and we will now look at
various strategies used by speculators.
Table 7.1: Data for call options on TCS stock traded on the National
Stock Exchange on August 31, 2005. All these options had 29
September 2005 as the expiry date.
Example 7.10.1 Consider an investor whose analysis leads her to
expect 30% annualised return from TCS (Tata Consultancy Services)
stock during the 1-month period starting on September 1, 2005. She
has to decide how best to profit from this analysis.
The first possibility is to buy TCS stock. The closing price for this
on August 31, 2005, was Rs 1405. Over one month, her expected
gain per share is
\[ \frac{0.30}{12} \times 1405 \approx 35. \]
To estimate the associated risk, she decides to use the implied
volatility. The relevant data for this is given in Table 7.1. From the
data on various call options and assuming a risk free rate of 5%, she
finds that the implied volatility ranges from 0.188 to 0.235. She
weights each value by the corresponding number of contracts traded
to arrive at the average implied volatility:
σI = 0.224.
Therefore, a first order approximation to the standard deviation of the
return after 1 month is
\[ S\sigma_I\sqrt{T} = 1405 \times 0.224 \times \sqrt{1/12} \approx 91. \]
Thus, to first order, the return is normally distributed with standard
deviation 91. Its mean, according to our analyst, is 35. If we let Z
denote a standard normal variable, then the probability of a positive
return is
\[ \mathbb{P}[S_T - S_0 \ge 0] = \mathbb{P}\left[Z \ge -\frac{35}{91}\right] = 65\%. \]
The second tactic would be to profit by investing in calls which
expire in 1 month. For instance, the closing price on August 31 of a
call on TCS stock with exercise price X = 1350 and expiry on 29
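September, was C = 75. The expected payoff from buying this call is
\[ E[\max(S_T - X,\, 0)] \approx 98 \]
(taking ST to be normal with mean 1405 + 35 = 1440 and standard deviation 91),
and hence the expected return is 98 – 75 = 23, which amounts to an
annualised rate of 368%! With such an incredible expected return,
the associated risk must have some unpleasant features. We first
find the probability of a positive return:
\[ \mathbb{P}[S_T - X \ge C] = \mathbb{P}[S_T \ge 1425] = \mathbb{P}\left[Z \ge -\frac{15}{91}\right] \approx 57\%. \]
This compares well with investing in the stock. The real danger is
when ST sinks below X, for then the investment in the call is
completely lost. The probability of this is:
\[ \mathbb{P}[S_T \le X] = \mathbb{P}\left[Z \le -\frac{90}{91}\right] = 16\%. \]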
It is this last factor, the possibility of being totally wiped out, which is
the downside of speculating with options.
□
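The figures in Example 7.10.1 can be reproduced, up to rounding, with Python's standard library. The sketch below assumes, as in the example, that the final price is normal with mean 1405 + 35 and standard deviation 91; the helper `expected_call_payoff` is our own name and uses the standard formula for E[max(Y, 0)] of a normal variable Y.

```python
from statistics import NormalDist

S0, X, C = 1405.0, 1350.0, 75.0
# Rounded inputs from the example: mean gain 35, standard deviation 91.
final_price = NormalDist(mu=S0 + 35.0, sigma=91.0)
Z = NormalDist()

def expected_call_payoff(dist, strike):
    """E[max(S_T - strike, 0)] when S_T is normally distributed."""
    d = (dist.mean - strike) / dist.stdev
    return (dist.mean - strike) * Z.cdf(d) + dist.stdev * Z.pdf(d)

p_stock_gain = 1.0 - final_price.cdf(S0)        # stock ends above its purchase price
payoff = expected_call_payoff(final_price, X)   # expected payoff of the call
annualised_return = (payoff - C) / C * 12       # simple annualisation over 1 month
p_call_gain = 1.0 - final_price.cdf(X + C)      # call payoff exceeds the premium
p_total_loss = final_price.cdf(X)               # call expires worthless

print(round(p_stock_gain, 3), round(payoff, 1), round(annualised_return, 2),
      round(p_call_gain, 3), round(p_total_loss, 3))
```

Because the example rounds the expected payoff to 98 before annualising, the annualised rate printed here is slightly below the quoted 368%.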
This example illustrates both the temptation and the danger
associated with speculating via options. To tread a safer path,
investors use a variety of combinations of options which reduce the
danger at the expense of the amount of possible profit.
BULL AND BEAR SPREADS
Consider an investor who is confident that a certain stock will
increase significantly in value over a time period T (i.e., the investor
is “bullish” about the stock). She can speculate accordingly by
buying a call in that stock, say with expiry time T and exercise price
X1 at a price of C1. To reduce her investment/risk she can also sell a
call on the same stock with the same expiry time T and an exercise
price X2 > X1. It follows that the premium C2 earned on the shorted
call satisfies C2 < C1. This combination of calls is called a bull spread
and has the following profit pattern (as a function of the final spot
price ST ):
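The profit from a bull spread is given by
\[ \text{Profit} = \begin{cases} C_2 - C_1 & \text{if } S_T \le X_1, \\ S_T - X_1 + C_2 - C_1 & \text{if } X_1 < S_T \le X_2, \\ X_2 - X_1 + C_2 - C_1 & \text{if } S_T > X_2. \end{cases} \]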
Compared to buying a single call, a bull spread gives up the lure of
unbounded profit in favour of reducing the possible loss.
Example 7.10.2 Let us return to the situation of the previous
example. Suppose our analyst constructs a bull spread in which the
long call has X1 = 1320 and C1 = 96, and the short call has X2 =
1440 and C2 = 23. Both calls expire on September 29. Then the
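profit function is
\[ \text{Profit} = \begin{cases} -73 & \text{if } S_T \le 1320, \\ S_T - 1393 & \text{if } 1320 < S_T \le 1440, \\ 47 & \text{if } S_T > 1440. \end{cases} \]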
The maximum profit has been capped at 47, and the possible loss is
now about 75% of what it would be if the investor just went long in
the first call.
□
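The profit function of Example 7.10.2 is easy to tabulate. A minimal Python sketch follows; the function names are ours, and premiums are treated as paid or received up front with no interest adjustment.

```python
def call_payoff(spot, strike):
    """Payoff at expiry of a long call."""
    return max(spot - strike, 0.0)

def bull_spread_profit(spot, x1, c1, x2, c2):
    """Long call (strike x1, premium c1) plus short call (strike x2 > x1, premium c2)."""
    return call_payoff(spot, x1) - call_payoff(spot, x2) - (c1 - c2)

# Data of Example 7.10.2: X1 = 1320, C1 = 96, X2 = 1440, C2 = 23.
for spot in (1300, 1320, 1393, 1440, 1500):
    print(spot, bull_spread_profit(spot, 1320, 96, 1440, 23))
```

The break-even point is at ST = 1393, the loss is capped at 73 and the profit at 47.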
Exercise 7.10.3 Why is a bull spread better than simply investing in
fewer calls?
Exercise 7.10.4 Show how to construct a bull spread by using puts.
An investor who is confident that a certain stock will decrease
significantly in value over a time period T (i.e., the investor is
“bearish” about the stock), can speculate accordingly by buying a put
in that stock. Suppose it has expiry time T, exercise price X1 and a
price of P1. To reduce his investment/risk he can also sell a put on
the same stock with the same expiry time T and an exercise price X2
< X1. It follows that the premium P2 earned on the shorted put
satisfies P2 < P1. This combination of puts is called a bear spread
and has the following profit pattern (as a function of the final spot
price ST ):
Exercise 7.10.5 Show how to construct a bear spread by using
calls.
BUTTERFLIES
Butterflies are combinations used by investors who expect a certain
amount of variation in the spot price but are not sure about the
direction in which it will occur. Butterflies come in two flavours
depending on whether the expected variations are small or large.
Consider an investor who expects the future spot price to lie in a
certain small range. He can speculate by creating a combination of
options whose profit profile is as follows:
Thus, he profits if the spot price stays in the expected range, and
has limited loss if it does not. This kind of profile can be created via
the following steps:
1. Buy two call options with exercise prices X1 and X3 respectively,
choosing X3 > X1.
2. Sell two call options with exercise price X2, choosing X2 halfway
between X1 and X3.
If C1, C2, C3 are the corresponding call premia, we have C3 < C2 <
C1.
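The two steps above translate directly into a profit function. In the Python sketch below, the strikes and premiums are hypothetical numbers chosen only to satisfy C3 < C2 < C1; they do not come from the text.

```python
def call_payoff(spot, strike):
    """Payoff at expiry of a long call."""
    return max(spot - strike, 0.0)

def butterfly_profit(spot, x1, x2, x3, c1, c2, c3):
    """Long calls at x1 and x3, two short calls at x2 (midway between x1 and x3)."""
    payoff = call_payoff(spot, x1) - 2.0 * call_payoff(spot, x2) + call_payoff(spot, x3)
    net_cost = c1 - 2.0 * c2 + c3
    return payoff - net_cost

# Hypothetical data: strikes 90, 100, 110 and premiums 12, 5, 1.5.
for spot in (80, 90, 95, 100, 105, 110, 120):
    print(spot, butterfly_profit(spot, 90, 100, 110, 12.0, 5.0, 1.5))
```

The profit peaks at the middle strike and the loss outside the range [X1, X3] is limited to the net cost of setting up the position. Negating the function gives the profile of a reverse butterfly.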
Exercise 7.10.6 Show how to construct a butterfly by using puts.
A reverse butterfly is used when a large fluctuation is expected in
the spot price, but its direction is not known. It can be obtained by
shorting a butterfly. Its profit profile is:
STRADDLES
The strategies considered so far used only calls or only puts.
Now we consider combinations of puts and calls. Straddles, like
butterflies, are used to speculate on the expected volatility of the
spot price. However they raise both the possible profit as well as
loss.
A bottom straddle or straddle purchase is used when the
investor wants to bet on a large move in the spot price. It consists of
buying a call and put with the same expiry date and exercise price.
A top straddle or straddle write is used to bet on a small
movement in the spot price. It is constructed by shorting a bottom
straddle. Therefore its profile is:
This is a rather dangerous strategy, as it has the potential of
unlimited loss!
STRANGLES
Strangles are like straddles, except that the put and call have
different exercise prices. Therefore the profit profile has a flat portion
in the centre:
A strangle would be used by an investor expecting a large jump
in the spot price.
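Straddles and strangles can be handled by one function, since a straddle is a strangle whose two exercise prices coincide. The Python sketch below uses hypothetical strikes and premiums (not taken from the text) and describes the long positions; negating the result gives the corresponding written positions.

```python
def call_payoff(spot, strike):
    """Payoff at expiry of a long call."""
    return max(spot - strike, 0.0)

def put_payoff(spot, strike):
    """Payoff at expiry of a long put."""
    return max(strike - spot, 0.0)

def strangle_profit(spot, put_strike, call_strike, put_premium, call_premium):
    """Long put and long call with the same expiry; put_strike <= call_strike."""
    payoff = put_payoff(spot, put_strike) + call_payoff(spot, call_strike)
    return payoff - (put_premium + call_premium)

def straddle_profit(spot, strike, put_premium, call_premium):
    """Bottom straddle: a strangle with equal exercise prices."""
    return strangle_profit(spot, strike, strike, put_premium, call_premium)

# Hypothetical data: straddle at 100 costing 4 + 5; strangle at 95/105 costing 2 + 2.5.
for spot in (80, 95, 100, 105, 120):
    print(spot, straddle_profit(spot, 100, 4.0, 5.0),
          strangle_profit(spot, 95, 105, 2.0, 2.5))
```

The strangle is cheaper but needs a larger move to become profitable, which shows up as the flat portion between the two exercise prices.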
COLLAR
A collar has a similar profile to a bull spread. It is used by investors
who want to protect the gains made by a stock that they own. For
example, if you own a stock that has done well recently and you wish
to sell it after a month, you can use a collar as protection against a
subsequent drop in its price. A collar consists of the following pieces:
1. The held stock, with initial price S.
2. A long put expiring at T, with exercise price XP.
3. A short call expiring at T with exercise price XC, such that XC >
S > XP.
Suppose the put has initial premium P and the call has initial
premium C. Then there is an initial expense of P – C in setting up the
collar (and it is possible that this is a gain rather than an expense).
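The final value of this portfolio has the following structure as a
function of ST:
\[ \text{Final value} = \begin{cases} X_P & \text{if } S_T \le X_P, \\ S_T & \text{if } X_P < S_T < X_C, \\ X_C & \text{if } S_T \ge X_C. \end{cases} \]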
Exercise 7.10.7 Explain how to create combinations of options with
the following profit profiles.
13 Recall that GBM is a good model for stock which does not pay dividends.
8 Value at Risk
Over the previous chapters we have developed an extensive
range of techniques for estimating and handling fluctuations.
These techniques have mainly been limited to small
fluctuations that we may encounter on a typical day. On the other
hand, a firm needs to have an idea of the losses it may face under
sudden and extreme developments. These may be broken up into
two cases. The first is when there is a sudden breakdown of
normality, such as the defaulting on loans by a large country, a war
leading to a breakdown in the supply of essential commodities, or a
natural calamity. In the second case, the normal relationships and
trends continue to hold, but chance leads to a clustering of adverse
events for the firm.
In this chapter we shall study a tool known as Value at Risk, or
VaR, which has become popular for estimating the possible losses
from developments of the second type. (For the first type, there is
little meaningful mathematical analysis.) These estimates are used
to judge the quality of a portfolio and to modify it accordingly.
The concept of VaR provides a single descriptor of the risk of loss
associated with a portfolio. While the Greeks give local descriptions
of risk arising from various fluctuations, VaR attempts to give a
global summary by measuring the possible loss under extreme
circumstances. (We emphasise, again, that it does not cover the
most extreme.)
The popularisation of VaR began in 1994 with the publications of
the RiskMetrics Group of J. P. Morgan (for example, the 1996
RiskMetrics™ – Technical Document [10]). It is now popular both for
internal risk management by companies as well as a tool used by
regulatory authorities. Its global acceptance in this regard is
encapsulated in the Basel II Accord of 2004, which provided a
framework, based on VaR, to regulate how much capital a bank
should set aside for use on a rainy day.
8.1 DEFINITION OF VAR
Typically, the VaR of a portfolio is given as a statement of the
following form:
The n-day VaR at P% is V
or
The n-day P% VaR is V .
This means that the loss over the next n days will be less than V with
probability P%. P is usually of the order of 95 to 99.¹⁴ Note that V is
the magnitude of the possible loss and is given as a positive number.
It is also assumed that the composition of the portfolio does not
change during this time.
The probability P% gives an idea of how often losses beyond the
VaR limit can occur. For example, if we calculate the one-day 95%
VaR, we can expect losses to exceed it once in 20 days. At the 99%
level we expect this to happen about once in 100 days. On the other
hand, we do not learn much about how large the losses will be on
these bad days. Anything could happen.
Figure 8.1: An illustration of the 95% VaR for a portfolio whose return is
normally distributed. The shaded region is the 5% quantile.
Example 8.1.1 Consider an asset whose return R over the next
month is estimated to be normally distributed with mean 100 and
standard deviation 150. Then,
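\[ Z = \frac{R - 100}{150} \]
is standard normal. Suppose we wish to find the one-month 95%
VaR. Let us denote it by V . It is defined by
\[ 0.95 = \mathbb{P}[R \ge -V] = \mathbb{P}\left[Z \ge \frac{-V - 100}{150}\right]. \]
Therefore,
\[ \frac{-V - 100}{150} = \Phi^{-1}(0.05) = -1.645 \quad\Longrightarrow\quad V = 147. \]
□
In the next example we use VaR to study the different risk
characteristics of shares and calls.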
Example 8.1.2 We take up the scenario of TCS stock and calls of
Example 7.10.1. In this example, the initial price of a TCS share was
Rs 1405 and its 1-month return R was projected to be a normal
variable with mean 35 and standard deviation 91. The 1-month VaR
for this investment can be calculated, as in the previous example, at
various levels.
Now, let us consider the alternate strategy of buying a call. In the
example, a 1-month call on TCS stock was available with exercise
price X = 1350 at a premium C = 75. The probability of a complete
loss of the investment was calculated to be ℙ[ST ≤ X] = 16%, so
that the 84% VaR is 75. Thus VaR shows a danger of full loss of the
investment at all the usual levels of 90% and above.
□
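When the return is taken to be normal, as in Examples 8.1.1 and 8.1.2, the VaR is a one-line quantile calculation. The Python sketch below uses `statistics.NormalDist` from the standard library; the function name `normal_var` is ours.

```python
from statistics import NormalDist

def normal_var(mean, sd, level):
    """VaR at the given confidence level for a normally distributed return.

    The VaR V solves P[R >= -V] = level, so -V is the (1 - level) quantile of R.
    """
    return -(mean + sd * NormalDist().inv_cdf(1.0 - level))

# Example 8.1.1: mean 100, standard deviation 150.
print(round(normal_var(100.0, 150.0, 0.95)))

# TCS share of Example 8.1.2: mean 35, standard deviation 91, at several levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(normal_var(35.0, 91.0, level), 2))
```

A negative result would mean that even the (1 – level) quantile of the return is a gain.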
Estimating the VaR was easy in this example because we were
looking at the call value at expiry, which has a simple relation with
ST. It would be rather more complicated to estimate VaR over a
shorter period such as 10 days, as we would then have to track the
change in the call premium. To do this theoretically, we could use a
model such as Black–Scholes, but we would still face the obstacle of
having to find the probability distribution of the call premium.
We start by noting that this is a non-trivial problem. First of all, the
return from the call will not be normal due to the non-linear relation
between the call premium and the spot price.
Another difficulty is in dealing with large portfolios. The VaR of the
portfolio cannot be obtained from the VaR of its parts but has to be
calculated in one go for the full portfolio. This entails modelling the
probability distribution for the full portfolio.
Below, we give three approaches to this problem.
In the first, we use a linear approximation to the portfolio value
so that it becomes normally distributed. While this simplifies the
mathematics, it introduces potentially large errors.
In the second, we use a quadratic approximation, thus taking
into account the non-linearity to some extent. This already
makes it hard to obtain an explicit probability distribution.
Finally, we sidestep this problem by using Monte Carlo
simulation.
Exercise 8.1.3 Consider a portfolio consisting of Rs 100,000
investments in each of assets A and B. Assume the daily volatilities
of A and B are 1% and the coefficient of correlation between their
returns is 0.3. Suppose further that the two daily returns follow a
bivariate normal distribution. What is the 2-day 99% VaR for this
portfolio?
8.2 LINEAR MODEL
Consider a portfolio P consisting of ni assets of type i. If a unit of the
ith asset has value Vi, then the value of the portfolio is
\[ V_P = \sum_i n_i V_i. \]
Suppose Vi depends on the value Si of some underlying variable (for
example, the ith asset is an option and Si is the spot price of the
underlying stock, or the ith asset is a bond and Si is the yield). The
basic question is:
If each Si changes by an amount △Si, what is the corresponding
change △Vi in Vi?
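If we can answer this question, we can also find the change in the
value of the entire portfolio:
\[ \triangle V_P = \sum_i n_i\,\triangle V_i. \]
Now, to first order, we have
\[ \triangle V_i \doteq \frac{\partial V_i}{\partial S_i}\,\triangle S_i. \]
Note that ∂Vi ⁄ ∂Si can either be obtained from a model such as
Black–Scholes or estimated from historical data. We thus have our
linear model:
\[ \triangle V_P \doteq \sum_i n_i\,\frac{\partial V_i}{\partial S_i}\,\triangle S_i. \]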
For VaR calculations, we have to find the probability distribution of
△ VP. A reasonable assumption is that each △ Si is normal. If we
further assume that the collection of all the △Si’s has a multivariate
normal distribution, then △VP, being their linear combination, is also
normal. From the historical data, we can estimate the means,
variances, and covariances of the △ Si’s, and this will completely
specify the distribution of △VP.
Example 8.2.1 Consider the TCS situation of Example 8.1.2. Recall
that on 31 Aug, 2005 the closing price for TCS stock was 1405. Its
annualised return for the next month was projected to be 30% and its
implied volatility was σ = 0.224. The risk free rate was assumed to
be 5%. Moreover, a call option on this stock with exercise price 1350
and expiry on 29 September 2005 was priced at 75. Suppose the
analyst needs to estimate the 95% VaR for this call over the next 10
days.
First, consider the stock. The expected rate of return over the
next 10 days is 30% × 10⁄365 = 0.82%. Therefore, the expected
change in the stock price after t = 10 days is
E[△S] = 1405 × 0.0082 = 11.
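The standard deviation of △S is approximated by
\[ S_0\,\sigma\sqrt{t} = 1405 \times 0.224 \times \sqrt{10/365} \approx 52. \]
Therefore △S can be taken to be normally distributed with mean 11
and standard deviation 52. Next, we calculate the delta of the call,
taking T = 1⁄12:
\[ w = \frac{\ln(S_0/X) + (r + \sigma^2/2)T}{\sigma\sqrt{T}} = 0.71, \qquad \delta_C = \Phi(w) = 0.76. \]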
Hence, the change in the call premium over 10 days is approximated
as
△C ≈ δC△S = 0.76△S
So △C is approximately normally distributed with mean 0.76 × 11 =
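8.36 and standard deviation 0.76 × 52 = 39.52. The 95% VaR V for
the call can now be estimated:
\[ \frac{-V - 8.36}{39.52} = \Phi^{-1}(0.05) = -1.645 \quad\Longrightarrow\quad V \approx 56.65. \]
□
Exercise 8.2.2 Suppose the daily change in the value of a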
portfolio is modelled as depending linearly on two uncorrelated and
normally distributed factors. The delta of the portfolio is 6 with
respect to the first factor and –4 with respect to the second. The
standard deviations of the factors are 20 and 8, respectively, and the
means are 0. What is the 1-day 95% VaR?
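The linear estimate of Example 8.2.1 can be reproduced with a short Python sketch. It assumes a single normally distributed factor and uses the rounded inputs of the example (δC = 0.76, mean 11, standard deviation 52); the function name `linear_var` is ours.

```python
from statistics import NormalDist

def linear_var(delta, mean_dS, sd_dS, level):
    """VaR of delta * dS when dS is normal with the given mean and standard deviation."""
    mean_dV = delta * mean_dS
    sd_dV = abs(delta) * sd_dS          # a standard deviation is never negative
    return -(mean_dV + sd_dV * NormalDist().inv_cdf(1.0 - level))

# Example 8.2.1: 10-day 95% VaR of the TCS call under the linear model.
var_call = linear_var(0.76, 11.0, 52.0, 0.95)
print(round(var_call, 2))
```

With several factors, the same idea applies once the mean and variance of the linear combination have been computed from the factor means, variances and covariances.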
Figure 8.2: Comparison of the probability density functions of Z (solid curve)
and Z² (dashed), where Z is a standard normal variable. The pdf of Z² is
\( f_{Z^2}(x) = e^{-x/2}/\sqrt{2\pi x} \) for x > 0.
8.3 QUADRATIC MODEL
The linear model has been found to work well when the portfolio
includes stocks, bonds and futures, but not when options are also
present. In this case, we can try to improve the quality of our
approximations by making them quadratic. The second order
approximation for changes in Vi is
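\[ \triangle V_i \doteq \frac{\partial V_i}{\partial S_i}\,\triangle S_i + \frac{1}{2}\,\frac{\partial^2 V_i}{\partial S_i^2}\,(\triangle S_i)^2. \]
The second derivative can be estimated from a model or from
historical data. We now get the full quadratic model:
\[ \triangle V_P \doteq \sum_i n_i\left(\frac{\partial V_i}{\partial S_i}\,\triangle S_i + \frac{1}{2}\,\frac{\partial^2 V_i}{\partial S_i^2}\,(\triangle S_i)^2\right). \]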
The difficulty in dealing with this model is that even when △ Si is
normal, ( △ Si)2 is not—see Figure 8.2. When there is only one
underlying asset, we can use the results of Exercises B.3.1–B.3.3 to
model the distribution of the portfolio.
Example 8.3.1 We again take up the call of Example 8.2.1, with
exercise price 1350 and price 75. We have calculated w = 0.71 and
δC = 0.76 for this call. We try to refine our earlier work in two ways:
We use gamma for a second order approximation to the dependence
on the spot price, and theta to track the change in value due to the
passing of time.
Figure 8.3: The pdf of the quadratic approximation to the portfolio return in
Example 8.3.1. Notice how the quadratic approximation enables the modelling
of an asymmetric distribution.
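\[ \gamma_C = \frac{e^{-w^2/2}}{S\sigma\sqrt{2\pi T}} = 0.0034, \]
\[ \Theta_C = -\frac{S\sigma}{2\sqrt{2\pi T}}\,e^{-w^2/2} - rXe^{-rT}\,\Phi(w - \sigma\sqrt{T}) = -218.8. \]
The new approximation to the call premium change △C is
\[ \begin{aligned}
\triangle C &\approx \delta_C\,\triangle S + \tfrac{1}{2}\gamma_C\,(\triangle S)^2 + \Theta_C\,\triangle t \\
&= 0.76\,\triangle S + 0.0017\,(\triangle S)^2 - 5.99 \\
&= 0.0017\,\bigl(447\,\triangle S + (\triangle S)^2\bigr) - 5.99 \\
&= 0.0017\,\bigl((\triangle S + 223.5)^2 - 223.5^2\bigr) - 5.99 \\
&= 0.0017\,(\triangle S + 223.5)^2 - 90.91.
\end{aligned} \]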
Combining △ S ~ N(11,52) with the results of Exercises B.3.1 to
B.3.3, we get the approximate cdf of △C:
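\[ \begin{aligned}
F_{\triangle C}(x) &\approx F_{(\triangle S + 223.5)^2}\!\left(\frac{x + 90.91}{0.0017}\right) \\
&= F_{\triangle S + 223.5}\!\left(\sqrt{\frac{x + 90.91}{0.0017}}\right) - F_{\triangle S + 223.5}\!\left(-\sqrt{\frac{x + 90.91}{0.0017}}\right) \\
&= \Phi\!\left(\frac{\sqrt{(x + 90.91)/0.0017} - 234.5}{52}\right) - \Phi\!\left(\frac{-\sqrt{(x + 90.91)/0.0017} - 234.5}{52}\right).
\end{aligned} \]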
With this formula in hand, it is not hard to locate the 95% VaR: it is
53.17.
□
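One way to locate the 95% VaR from the approximate cdf of Example 8.3.1 is to solve F△C(x) = 0.05 by bisection. The Python sketch below hard-codes the rounded constants of the example (0.0017, 223.5, 90.91, and △S normal with mean 11 and standard deviation 52); the function names and the bracketing interval are our own choices.

```python
import math
from statistics import NormalDist

A, SHIFT, CONST = 0.0017, 223.5, 90.91          # dC ~ A * (dS + SHIFT)**2 - CONST
shifted_dS = NormalDist(mu=11.0 + SHIFT, sigma=52.0)

def cdf_dC(x):
    """Approximate cdf of the 10-day change in the call premium."""
    y = (x + CONST) / A
    if y <= 0.0:
        return 0.0                               # dC cannot fall below -CONST
    root = math.sqrt(y)
    return shifted_dS.cdf(root) - shifted_dS.cdf(-root)

def quantile_dC(p, lo=-CONST, hi=1000.0, steps=200):
    """Solve cdf_dC(x) = p by bisection; cdf_dC is increasing on [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if cdf_dC(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

var_95 = -quantile_dC(0.05)
print(round(var_95, 2))
```

The result should agree with the value quoted in the example up to the rounding of the constants.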
When there are several underlying assets with various correlations,
the distribution of the full portfolio is much harder to describe. Rather
than try, we will present a numerical technique that uses random
numbers to simulate this distribution and estimate the probabilities
related to it.
8.4 MONTE CARLO SIMULATION
In this section we shall use the technique of Monte Carlo simulation,
as described in the Appendix (§B.20).
Example 8.4.1
Suppose the quadratic model for a portfolio takes the form
△VP ≐ △S1 + △S2 + △S3 + 2(△S1)² – 10(△S2)²,
where △S1 ~ N(–10,0.5), △S2 ~ N(5,1) and △S3 ~ N(0,3). Let the
correlation matrix be
.
Using the procedure described in §B.20, we can simulate
standard normal variables Zi having this correlation matrix. We then
set △S1 = –10 + 0.5Z1, △S2 = 5 + Z2 and △S3 = 3Z3, noting that
linear changes don’t affect correlations. For each simulated set of
values of the Zi, we first calculate the corresponding value of the
△Si, and then the value of △VP.
Figure 8.4 shows the histogram for a set of 10⁴ values of △VP
obtained this way. The bottom 5% of the data set is marked by –
259.66. Thus our estimate for the 95% VaR of this portfolio is
259.66.
We should wonder about the reliability of this estimate—if we run
the simulation again, will we see a similar value? Figure 8.5 shows
the distribution of 100 values of the 95% VaR obtained by repeating
this process. The range is relatively narrow, from 255 to 269, which
is reassuring. We can feel confident that the mean of 262 is a
reliable estimate of the VaR.
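The simulation can be sketched in pure Python as follows. Two caveats apply. First, the correlation matrix `CORR` below is a hypothetical stand-in, so the output will not reproduce the figures quoted above. Second, correlated normals are generated here by a Cholesky factorisation, which is one standard method and may differ from the procedure of §B.20.

```python
import math
import random

# Hypothetical correlation matrix, used only to make the sketch runnable.
CORR = [[1.0, 0.3, 0.0],
        [0.3, 1.0, 0.2],
        [0.0, 0.2, 1.0]]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate_dV(rng, L):
    """One draw of the quadratic model of Example 8.4.1."""
    g = [rng.gauss(0.0, 1.0) for _ in range(3)]              # independent N(0,1)
    z = [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(3)]
    dS1, dS2, dS3 = -10.0 + 0.5 * z[0], 5.0 + z[1], 3.0 * z[2]
    return dS1 + dS2 + dS3 + 2.0 * dS1**2 - 10.0 * dS2**2

def monte_carlo_var(n, level, seed):
    """Empirical VaR: minus the (1 - level) sample quantile of dV."""
    rng = random.Random(seed)
    L = cholesky(CORR)
    sample = sorted(simulate_dV(rng, L) for _ in range(n))
    return -sample[int((1.0 - level) * n)]

print(round(monte_carlo_var(10_000, 0.95, seed=1), 2))
```

Repeating the call with different seeds gives a feel for the sampling variability of the estimate.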
Figure 8.4: Histogram showing the frequencies for the simulated data of
Example 8.4.1. The dark region marks the 95% VaR.
□
If we have models for the values of all the assets in the portfolio,
we need not use the linear or quadratic approximations. Instead, we
can use the models to exactly calculate the change △VP arising from
any set of changes △ Si. This is called full valuation and while it
does away with one set of approximations, it considerably increases
the numerical work. We illustrate it for the case of a single call.
Example 8.4.2 We pay a final visit to the situation of the call on TCS
stock (Examples 8.1.2 and 8.3.1). We have estimated the 10-day
return from one share to be normal with mean 11 and standard
deviation 52. Since the initial share price was 1405, the final share
price has a N(1416,52) distribution.
We can simulate the final price S10 and for each value s that we
obtain, we calculate the new call premium c using the Black-Scholes
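Formula. In our case, we have
\[ w = \frac{\ln(s/X) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}} \]
and the new call premium
\[ \begin{aligned}
c &= s\,\Phi(w) - Xe^{-r(T-t)}\,\Phi\bigl(w - \sigma\sqrt{T-t}\bigr) \\
&= s\,\Phi(w) - 1349.65\,\Phi(w - 0.051).
\end{aligned} \]
Figure 8.5: Histogram of 100 simulated values of the 95% VaR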
Figure 8.6 shows 10⁵ simulated values of the return from the call
over 10 days. The 95% VaR from this simulation is 56.21.
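A full-valuation run along these lines can be sketched in Python. The inputs are assumptions made for illustration: the remaining life of the call after 10 days is taken as 19⁄365 years, the discount factor is computed from r = 5% rather than taken from the rounded constant above, and the seed and sample size are arbitrary. The estimate therefore lands in the mid-50s but need not match the quoted value.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()
X, R, SIGMA, C0 = 1350.0, 0.05, 0.224, 75.0
TAU = 19.0 / 365.0        # assumed time to expiry after 10 days

def bs_call(s, strike, r, sigma, tau):
    """Black-Scholes call premium for spot s and time to expiry tau."""
    w = (math.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return s * N.cdf(w) - strike * math.exp(-r * tau) * N.cdf(w - sigma * math.sqrt(tau))

def full_valuation_var(n, level, seed):
    """Simulate S_10 ~ N(1416, 52), reprice the call, and read off the empirical VaR."""
    rng = random.Random(seed)
    returns = sorted(bs_call(rng.gauss(1416.0, 52.0), X, R, SIGMA, TAU) - C0
                     for _ in range(n))
    return -returns[int((1.0 - level) * n)]

print(round(full_valuation_var(20_000, 0.95, seed=0), 2))
```

Unlike the linear and quadratic models, this approach never lets the simulated loss exceed the premium paid, because the repriced call is always non-negative.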
Figure 8.6: Histogram of simulated returns from TCS call. The
dashed and solid curves represent the linear and quadratic
approximations calculated in Examples 8.2.1 and 8.3.1, respectively.
□
TESTING VAR PERFORMANCE
It should be obvious by now that VaR may be estimated in a number
of ways. Indeed, we have only touched on the simplest situations
and procedures. It becomes important, therefore, to also have a way
of testing the results from a particular VaR model. This is done by
backtesting: we apply the VaR model to historical data and see how
well it stands up. For example, a 95% VaR should not be violated
much more than 5% of the time. We could also look at various other
aspects of the violations and see if they are in accord with our
model: Do they appear to be independent of each other? Is their
average what we would expect?
8.5 THE MARTINGALE
Martingales are random processes which evolve with time, with the
property that the current value is the expectation of the future values
(given the present value). We will not define them more formally, but
we can get an example by starting with a GBM St with drift μ and
volatility σ. Recall that if Sa is known, we have
\[ E[S_{t+a}] = e^{\mu t}S_a. \]
This is not a martingale, but we use it to create one by defining
\[ M_t = e^{-\mu t}S_t. \]
Note that knowing the value of Ma is equivalent to knowing that of
Sa. So we have:
\[ E[M_{t+a}] = e^{-\mu(t+a)}E[S_{t+a}] = e^{-\mu(t+a)}e^{\mu t}S_a = e^{-\mu a}S_a = M_a. \]
This gives an indication of how a process which is not a martingale
may be converted into one, and also of how martingales become
relevant to finance.
However, this section is not about martingales, important though
they are. Instead it is about a betting strategy which is called the
martingale and is at least a few centuries old. In its simplest form, it
concerns a game in which bets are placed on repeated tosses of a
coin. If you bet x on the coin showing heads and it shows tails, you
lose x. Otherwise you gain x. The strategy is as follows:
1. Always bet on the coin showing heads.
2. If you win on any toss, take your winnings and leave the game.
3. If you lose, double your bet on the next toss.
Now suppose you start by betting Rs 1 and the first head appears on
the (N + 1)th toss. Your stake on that throw is 2^N and your total gain
(after deducting your previous losses) is
\[ 2^N - \sum_{k=0}^{N-1} 2^k = 2^N - (2^N - 1) = 1. \]
Surely a head will eventually turn up – it appears you have a
guaranteed profit of Rs 1 from this strategy. Difficulties arise,
however, because a long string of tails may lead to losses that drive
you out of the game before the first head turns up.
So this strategy leads to a curious situation. Almost all the time you
will win a small amount. But if you do lose, you will lose very badly
indeed. All the risk has been concentrated into one tiny and therefore
extremely toxic zone.
Let us do a mean–variance analysis of this strategy over 10
tosses. There are two possibilities, win and lose, with the following
payoffs and probabilities:
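Scenario    Payoff      Probability
Lose        1 – 2¹⁰     1⁄2¹⁰
Win         1           1 – 1⁄2¹⁰
Therefore, the mean and variance of the payoff are:
\[ \text{Mean} = (1 - 2^{10})\,\frac{1}{2^{10}} + 1\cdot\Bigl(1 - \frac{1}{2^{10}}\Bigr) = 0, \qquad \text{Variance} = (1 - 2^{10})^2\,\frac{1}{2^{10}} + 1^2\cdot\Bigl(1 - \frac{1}{2^{10}}\Bigr) = 2^{10} - 1 = 1023. \]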
Thus, in spite of the initial impression of giving sure profits, the
martingale is a poor strategy for a risk-averse investor. Nor is it good
for risk-preferring investors who only accept risk in exchange for a
possibility of large profit. It may be fair to say that it is suitable only
for the risk ignorant.
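The mean and variance of the martingale strategy can be computed exactly with rational arithmetic. The Python sketch below follows the two-scenario table above for a game limited to n tosses; the function name is ours.

```python
from fractions import Fraction

def martingale_mean_variance(n):
    """Exact mean and variance of the doubling strategy over at most n tosses."""
    p_lose = Fraction(1, 2**n)                   # n tails in a row
    outcomes = [(1 - 2**n, p_lose),              # lose 1 + 2 + ... + 2**(n-1)
                (1, 1 - p_lose)]                 # win Rs 1
    mean = sum(x * p for x, p in outcomes)
    variance = sum((x - mean) ** 2 * p for x, p in outcomes)
    return mean, variance

mean, variance = martingale_mean_variance(10)
print(mean, variance, float(Fraction(1, 2**10)))
```

The expected payoff is zero for every n, while the variance 2ⁿ – 1 grows exponentially with the number of tosses allowed.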
What does VaR have to say about the martingale? Over 10
tosses, the probability of a loss is 1⁄2¹⁰ or just under 0.1%. Thus we
would be justified in saying that the 99.9% VaR is zero! In other
words, the usual levels of VaR are unable to detect the risks
associated with this strategy. This re-emphasises our earlier
comment that VaR does not help us reach the most extreme cases.
The use of VaR as a regulatory tool therefore becomes
problematic. The VaR value is used to decide how much capital a
bank should keep safe or deposit with government authorities. This
may naturally drive banks towards martingale-like strategies which
have low VaR, and thus increase the chance of a single massive
catastrophe relative to a sequence of small ones.
14 Sometimes VaR is defined by the discounted loss. Since VaR is usually
calculated for small time periods, this has little numerical impact.
Appendix A
Calculus
This chapter contains the calculus that is used in this book. Some of
it should be familiar to you, while the latter parts may be new. You do
not need to read it all at one go. Refer to it as the need arises.
Our treatment of calculus is admittedly superficial and you may
want to supplement it with a more detailed text. There are
innumerable books that you can consult. Most of them will teach you
how to calculate—Apostol [1] will teach you how to think.
A.1 ONE VARIABLE CALCULUS
DIFFERENTIAL CALCULUS
Let us start with a quick review of calculus in one variable. If we have
a function f : I → ℝ, where I is an open interval in ℝ, then the
derivative or differential of f at a ∈ I is defined by
\[ f'(a) = \lim_{h \to 0}\frac{f(a + h) - f(a)}{h}, \]
provided the limit exists. The quantity f ′(a) is visualised as the
instantaneous rate of change of f at a. Practically, if we know f ′(a)
and f(a), then we can estimate the value of f at a nearby point a + h by
f(a + h) ≈ f(a) + f ′(a)h.
If we draw the graph of f, then f ′(a) provides the slope of the line
which is tangent to the graph at (a, f(a)). (See Fig. A.1.)
Recall that two functions f, g can be added, subtracted,
multiplied, divided or composed according to the following rules:
Addition: (f + g)(x) = f(x) + g(x).
Subtraction: (f – g)(x) = f(x) – g(x).
Multiplication: (fg)(x) = f(x)g(x).
Division: (f ⁄ g)(x) = f(x) ⁄ g(x) (provided g(x) ≠ 0).
Composition: (f ∘ g)(x) = f(g(x)).
Figure A.1: The graph of a function f(x) showing the tangent line at a point (a,
f(a)). The tangent line has slope f ′ (a).
ALGEBRA OF DERIVATIVES
For the purposes of this book, one has to be aware of the following
basic rules involving derivatives.
Let f,g be two functions. Assuming the existence of their derivatives
at the point of concern, we have:
Linearity: If a,b are two numbers, then
(af + bg) ′ (x) = af ′ (x) + bg ′ (x).
Product rule: (fg) ′ (x) = f ′ (x)g(x) + f(x)g ′ (x).
Quotient rule: If g(x) ≠ 0,
\[ \left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}. \]
Chain rule:
(f ∘ g) ′ (x) = f ′ (g(x))g ′ (x).
These rules allow us to differentiate a complicated function by
breaking the problem into simpler parts. To complete the method we
need a list of the derivatives of the basic functions:
Figure A.2: The graph of a function f(x) showing local extrema at x = a,b.
There is a local maximum at x = a and a local minimum at x = b.
Constants: If f(x) = c for some fixed number c, then f ′ (x) = 0.
Monomials: If $f(x) = x^n$, then $f'(x) = nx^{n-1}$.
Exponential: If $f(x) = e^x$, then $f'(x) = e^x$.
Logarithm: If $f(x) = \ln(x) = \log_e(x)$, then $f'(x) = 1/x$.
An important use of derivatives is in estimating how large or small
the values of a given function can be. We say f has a local
maximum at a point a if there is an interval I = (a–δ,a + δ) around a
such that f(x) ≤ f(a) for each x ∈ I. There is a local minimum if f(x) ≥
f(a) for each x ∈ I. And we say there is a local extremum if there is
either a local maximum or a local minimum. These situations are
depicted in Fig. A.2. The figure serves to remind us that at a local
extremum the tangent line is horizontal, i.e., the derivative is zero.
To detect points of local extremum, we use the first derivative
test: f ′ (a) = 0. Having found such an a, we use the second
derivative test to explore the specific nature of the point: If f ′′ (a) < 0
it is a local maximum, and if f ′′ (a) > 0 it is a local minimum.
Unfortunately, if f ′′ (a) = 0, we get no information and in fact the point
may fail to be a local extremum at all!
For example, the functions $x^3$, $x^4$ and $-x^4$ all have zero first and
second derivatives at x = 0. For $x^3$, x = 0 is not even a local
extremum, while for $x^4$ it is a local minimum and for $-x^4$ it is a local
maximum. (See Fig. A.3.)
FIRST AND SECOND ORDER APPROXIMATIONS
The derivative of f at a gives a linear approximation to it, which is
accurate when we are close to a:
f(x) ≐ f(a) + f ′ (a)(x – a).
The symbol ≐ signifies that the left-hand side is a first order
approximation to f near a: it has the same value and first derivative
as f at a. By matching higher order derivatives, we get higher order
approximations. To get a second order approximation to f at a, we
start with a quadratic function $g(x) = A + B(x - a) + C(x - a)^2$ and
match the values of f,g as well as their first two derivatives at a:
$$f(a) = g(a) = A, \qquad f'(a) = g'(a) = B, \qquad f''(a) = g''(a) = 2C.$$
Figure A.3: The graphs of the functions $x^3$ (solid), $x^4$ (dotted) and $-x^4$
(dashed)
Therefore the second order approximation to f at a is
$$f(x) \doteq f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2.$$
Example A.1.1 Consider $f(x) = e^x$. The first order approximation to f
at 0 is $1 + x$, while the second order approximation is $1 + x + x^2/2$. □
Higher order approximations can be similarly obtained but are not
used in this book.
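To see how much the second order term helps, one can tabulate the errors of the two approximations to the exponential function near 0. This is a small illustrative sketch; the function names `first_order` and `second_order` are chosen here and are not notation from the text.

```python
import math

def first_order(x):
    """First order approximation to e^x at 0."""
    return 1 + x

def second_order(x):
    """Second order approximation to e^x at 0."""
    return 1 + x + x**2 / 2

for x in (0.5, 0.1, 0.01):
    exact = math.exp(x)
    print(f"x={x}: first order error {exact - first_order(x):.2e}, "
          f"second order error {exact - second_order(x):.2e}")
```

As x approaches 0 the first order error falls off like x squared and the second order error like x cubed.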
Exercise A.1.2. Let f(x) ≐ A + B(x – a) and g(x) ≐ C + D(x – a).
Show that
The following result is fundamental:
Mean Value Theorem Let f : [a,b] → ℝ be continuous, and let it be
differentiable on (a,b). Then there is a c ∈ (a,b) such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
Suppose we know that a function f has zero derivative
everywhere on an interval. Then the Mean Value Theorem informs
us that f must have a constant value on that interval.
INTEGRAL CALCULUS
If functions f,g are related by f ′ (x) = g(x) at every x we call f an
antiderivative or indefinite integral of g and denote the relationship by
$$\int g(x)\,dx = f(x).$$
Note that the indefinite integral of a function g(x) is not unique: If f ′
(x) = g(x), then for any constant c we also have (f + c) ′ (x) = g(x).
A very important fact is that this is the full extent of
non-uniqueness: if f,h are both indefinite integrals of g then there
must be a constant c such that h(x) = f(x) + c for each x. For f ′ = h ′ =
g implies (h – f) ′ = 0 and then it follows from the Mean Value
Theorem that h – f = c.
Example A.1.3 Here is a useful application of the last fact. Suppose
we know that a function f : ℝ → ℝ has the following property: f ′ (x) =
q f(x) for every x and with a constant q. Then it follows that
$$\frac{d}{dx}\left(f(x)\,e^{-qx}\right) = f'(x)\,e^{-qx} + f(x)\,\frac{d}{dx}e^{-qx} = q f(x)\,e^{-qx} - q f(x)\,e^{-qx} = 0.$$
Hence there is a constant A such that $f(x)e^{-qx} = A$, or $f(x) = Ae^{qx}$ for
every x. Substituting x = 0, we find that f(0) = A, and so f has to be of
the form
$$f(x) = f(0)\,e^{qx}.$$
□
Now suppose $\int f(x)\,dx = g(x)$. Then we define the definite
integral of f over an interval [a,b] by
$$\int_a^b f(x)\,dx = g(b) - g(a).$$
Although g is not unique, the definite integral is unique due to the
fact that indefinite integrals can only differ by constants. If the
number b is replaced by a variable x, the definite integral becomes a
function of x:
$$F(x) = \int_a^x f(t)\,dt.$$
Since F(x) = g(x) – g(a), it is immediate that F′(x) = g′(x) = f(x).
Figure A.4: The area of the shaded region under the graph of f is given by $\int_a^b f(x)\,dx$.
When f ≥ 0, the Fundamental Theorem of Calculus tells us that
$\int_a^b f(x)\,dx$ equals the area under the graph of f over the interval [a,b].
(See Figure A.4.)
For evaluating a definite integral, the following properties are
useful:
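Linearity: If f,g are two functions and c,d two numbers, then
$$\int_a^b \bigl(c\,f(x) + d\,g(x)\bigr)\,dx = c\int_a^b f(x)\,dx + d\int_a^b g(x)\,dx.$$
Splitting: If a ≤ b ≤ c, then
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$
Change of Variables: If f,g are two functions and g is monotonic,
then
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$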
The term monotonic indicates that the function g(x) either
always increases in value (g is monotonically increasing) or
always decreases in value (g is monotonically decreasing) as the
value of x increases. For example, $e^x$ is a monotonically increasing
function, while –x is a monotonically decreasing function. On the
other hand, $x^2$ is not monotonic.
IMPROPER INTEGRALS
We have defined definite integrals over an interval [a,b] and stated
that these represent the area under a curve. In some problems the
curve extends over the whole real line and we are interested in the
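entire area under it. Naturally, we represent this area by
$$\int_{-\infty}^{\infty} f(x)\,dx,$$
but how is this defined? Well, we first define the integral from some
point a to ∞ as a limit of ones we already know how to calculate:
$$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx.$$
Similarly, we define the integral from – ∞ to a:
$$\int_{-\infty}^{a} f(x)\,dx = \lim_{b \to -\infty} \int_b^a f(x)\,dx.$$
Now we put the pieces together:
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_a^{\infty} f(x)\,dx.$$
Two limits are involved in this definition and if either of them does not
exist, we say that $\int_{-\infty}^{\infty} f(x)\,dx$
diverges or does not exist.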
Integrals where both f and the range of integration are bounded
are called proper while those where either is unbounded are called
improper. Improper integrals are very important in probability, as
most calculations of expectation and variance are based on them.
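Example A.1.4 Consider the exponential function $f(x) = e^{-x}$. Then
$$\int_0^{\infty} e^{-x}\,dx = \lim_{b \to \infty} \int_0^b e^{-x}\,dx = \lim_{b \to \infty} \left(1 - e^{-b}\right) = 1.$$
On the other hand,
$$\int_{-\infty}^{0} e^{-x}\,dx = \lim_{b \to -\infty} \left(e^{-b} - 1\right)$$
does not exist. Thus $\int_{-\infty}^{0} e^{-x}\,dx$ and $\int_{-\infty}^{\infty} e^{-x}\,dx$
both diverge.
□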
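The limits that define an improper integral can be watched numerically for the function e^(–x) of Example A.1.4. The sketch below uses a simple midpoint rule; the helper `integrate`, the choice of 0 as the splitting point and the cut-off values 1, 10, 30 are assumptions made only for illustration.

```python
import math

def integrate(f, a, b, n=100_000):
    """Composite midpoint rule for the proper integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return math.exp(-x)

# Integrals over [0, b] settle down as b grows ...
right = [integrate(f, 0.0, b) for b in (1, 10, 30)]
# ... while integrals over [-b, 0] grow without bound.
left = [integrate(f, -b, 0.0) for b in (1, 10, 30)]
print(right)
print(left)
```

The first list approaches 1, while the second keeps increasing, in line with the integral from 0 to ∞ converging and the integral from –∞ to 0 diverging.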
Exercise A.1.5 Let f be a continuous function such that
Show that for any fixed number z,
Note that we are not assuming that
exists!
A.2 PARTIAL DERIVATIVES
Consider a function f : U → ℝ where U is a subset of ℝ2. Such a
function is described by an expression of the form f(x,y), where the
pair (x,y) ranges over all of U. The basic approach to studying how
the values of f change with x and y is to fix one of these variables
and vary the other: this reduces the problem to the one-variable
situation and we already have techniques for studying that.
Thus, we define two partial derivatives. In one of them we vary
x and fix y: this is called the partial derivative with respect to x.
Similarly, the partial derivative with respect to y is obtained by fixing
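x and varying y. The formal definitions are:
$$\frac{\partial f}{\partial x}(x,y) = \lim_{h \to 0} \frac{f(x+h,y) - f(x,y)}{h}, \qquad \frac{\partial f}{\partial y}(x,y) = \lim_{h \to 0} \frac{f(x,y+h) - f(x,y)}{h}.$$
Alternate popular notation is
$f_x = \dfrac{\partial f}{\partial x}$ and $f_y = \dfrac{\partial f}{\partial y}$.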
Partial derivatives are easy to calculate. To find ∂f ⁄ ∂x, just treat y as
a constant and differentiate with respect to x in the usual manner.
For example,
More generally, f may be a function of n variables $x_1,\dots,x_n$, and we
define the partial derivatives of f with respect to each of these n
variables. Let $e_i$ denote the vector with all entries 0 except for a 1 in
the ith position. The partial derivative of f at $a = (a_1,\dots,a_n)$ with
respect to the variable $x_i$ is defined by
$$\frac{\partial f}{\partial x_i}(a) = \lim_{h \to 0} \frac{f(a + h e_i) - f(a)}{h}.$$
The partial derivatives of f are collected in a single vector called the
gradient of f and denoted by ∇f :
$$\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a)\right).$$
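The recipe "vary one variable, hold the others fixed" translates directly into a numerical approximation of the gradient. The sketch below is illustrative only: the names `partial` and `gradient`, the symmetric difference quotient and the test function x²y + y³ are all choices made for this example.

```python
def partial(f, a, i, h=1e-6):
    """Approximate the partial derivative of f at the point a with respect to
    the i-th variable, moving only coordinate i (i.e. along the vector e_i)."""
    up, down = list(a), list(a)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)

def gradient(f, a):
    """Collect all partial derivatives of f at a into one vector."""
    return [partial(f, a, i) for i in range(len(a))]

def f(v):
    x, y = v
    return x**2 * y + y**3

# Treating y as a constant gives 2xy; treating x as a constant gives x^2 + 3y^2.
print(gradient(f, [1.0, 2.0]))  # close to [4, 13]
```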
A.3 LAGRANGE MULTIPLIERS
We shall consider the problem of finding the extreme values of a
function f : U → ℝ, where U is a subset of $\mathbb{R}^n$, subject to a constraint
g(x) = c, where g : U → ℝ and c is a constant. This means that we
are interested in finding the maximum or minimum value of f(x), with
x varying over the set $G = g^{-1}(c) = \{x : g(x) = c\}$.
The Lagrange multipliers method applies to this situation. We
shall not motivate the method as that would involve developing a
good amount of multi-variable calculus, but shall describe it carefully.
Given f and the constraint g(x) = c we introduce a new variable λ
(called the Lagrange multiplier) and set up the equation ∇ ( f + λg)
= 0. This is an equation involving vectors and leads to n equations
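involving scalars:
$$\frac{\partial f}{\partial x_i} + \lambda \frac{\partial g}{\partial x_i} = 0, \qquad i = 1,\dots,n.$$
Together with the original constraint $g(x_1,\dots,x_n) = c$, this gives n + 1
equations for the n + 1 variables $\lambda, x_1,\dots,x_n$. If we succeed in solving
this system, we have a candidate for $p = (x_1,\dots,x_n)$ (we also have λ,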
but that is superfluous knowledge). In general, there would be
multiple solutions and by evaluating f at each of these we would find
the extreme values.
We shall also have occasion to consider situations where there
are two constraints instead of one. Let these be g(x) = c and h(x) =
d. Then we introduce two Lagrange multipliers λ and μ, and set up
the equation
∇( f + λg + μh) = 0.
On expanding in coordinates, this again gives n equations, and by
adding the two constraint equations we get a total of n + 2 equations
for the n + 2 variables $\lambda, \mu, x_1,\dots,x_n$.
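As a small worked illustration of the one-constraint recipe (the particular functions are chosen here for simplicity and do not come from the text), consider finding the extreme value of f(x,y) = xy subject to x + y = 1:

```latex
\begin{align*}
f(x,y) &= xy, \qquad g(x,y) = x + y, \qquad c = 1, \\
\nabla(f + \lambda g) = 0 \quad &\Longleftrightarrow \quad y + \lambda = 0 \ \text{ and } \ x + \lambda = 0, \\
\text{so } x = y = -\lambda, \quad &\text{and } x + y = 1 \text{ gives } x = y = \tfrac{1}{2}, \quad \lambda = -\tfrac{1}{2}, \\
f\left(\tfrac{1}{2}, \tfrac{1}{2}\right) &= \tfrac{1}{4}.
\end{align*}
```

Here there are n + 1 = 3 equations for the three unknowns λ, x, y. Substituting y = 1 – x shows that xy = x(1 – x) on the constraint line, so the candidate found is a maximum.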
A.4 DIFFERENTIATING UNDER THE INTEGRAL SIGN
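We have noted that differentiation is linear:
$$\frac{d}{dy} \sum_i f_i(y) = \sum_i \frac{d}{dy} f_i(y).$$
Since integration is viewed as a kind of continuous sum, it is natural
to ask whether the following relationship also holds:
$$\frac{d}{dy} \int_a^b f(x,y)\,dx = \int_a^b \frac{\partial f}{\partial y}(x,y)\,dx.$$
In fact it does not always hold. The next theorem describes a broad
class of situations when it holds.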
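Theorem A.4.1 Let f : [a,b] × [c,d] → ℝ be a function such that:
1. For each y ∈ [c,d], $\int_a^b f(x,y)\,dx$ is defined.
2. $\dfrac{\partial f}{\partial y}(x,y)$ is defined and continuous on [a,b] × [c,d].
Then F : [c,d] → ℝ given by
$$F(y) = \int_a^b f(x,y)\,dx$$
is differentiable on [c,d], and its derivative is
$$F'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y)\,dx.$$
□
We use this result in Chapter 7 to derive the Black-Scholes PDE for
European options.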
A.5 DOUBLE INTEGRALS
A function f : U → ℝ, where U is a subset of the Cartesian plane ℝ2,
can be integrated over the entire region U by a process known as
double integration. Double integration is used in probability to
express the relationship between two different random variables.
Figure A.5: Choice of order of integration in double integration. Over the same
region, the first diagram corresponds to integrating first over y and then
over x, and the second to integrating first over x and then over y.
Look at the shaded region in the two diagrams in Fig. A.5. Suppose
it is named U. In the first diagram, we have marked its boundary as
made of two curves: the lower part is the graph of y = g(x) and the
upper part is the graph of y = h(x). In both cases, x varies from a to
b. If we look at any of the vertical line segments filling out U, then on
each of these segments, x is fixed while y varies from g(x) to h(x). So
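we can carry out the integral
$$I(x) = \int_{g(x)}^{h(x)} f(x,y)\,dy,$$
and this represents the integral of f over the vertical line segment
located by x. The function I(x) is defined for x ∈ [a,b] and so we can
integrate it too:
$$\int_a^b I(x)\,dx = \int_a^b \left( \int_{g(x)}^{h(x)} f(x,y)\,dy \right) dx.$$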
The last expression represents the double integral of f over U. We
could have used horizontal line segments instead of vertical ones.
This approach is depicted by the second diagram in Fig. A.5 and
leads to an analogous expression, in which we integrate first over x
along each horizontal segment and then over y.
It is possible that the two choices give different results and in that
case we would not have a well-defined double integral. However, if f
is continuous then it is guaranteed that they will give the same result,
and we can use the one which is more convenient. In this case, we
may also write $\iint_U f(x,y)\,dx\,dy$ for the double integral, indicating the region of
integration and the integrand but not the choice of order of
integration.
CHANGE OF VARIABLES
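Consider a double integral
$$\iint_U f(x,y)\,dx\,dy.$$
Suppose we wish to replace x,y by new variables u,v (probably to
change the integrand to a more tractable form), which are related to
them by
x = x(u, v),
y = y(u, v).
The Jacobian of this change of variables is defined to be the
following determinant:
$$J(u,v) = \det \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix}.$$
The change of variables formula is
$$\iint_U f(x,y)\,dx\,dy = \iint_V f\bigl(x(u,v),\,y(u,v)\bigr)\,|J(u,v)|\,du\,dv,$$
where V consists of the (u,v) values corresponding to the (x,y)
values in U.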
Example A.5.1 Let us carry out a change of variables to polar
coordinates. The Cartesian coordinates (x,y) and polar coordinates
(r,θ) of the same point are related by:
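x = r cos θ,
y = r sin θ.
The Jacobian of the change from Cartesian to polar coordinates
is
$$J(r,\theta) = \frac{\partial x}{\partial r} \cdot \frac{\partial y}{\partial \theta} - \frac{\partial x}{\partial \theta} \cdot \frac{\partial y}{\partial r} = r\cos^2\theta + r\sin^2\theta = r.$$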
Now consider the double integral of a function f over the disk with
center at origin and radius R. In polar coordinates, this corresponds
to a rectangle with the r side going from 0 to R and the θ side going
from 0 to 2π. Therefore the double integral can be expressed as
$$\int_0^{2\pi} \int_0^R f(r\cos\theta,\, r\sin\theta)\, r\,dr\,d\theta.$$
Note that r is already positive, so in the change of variables we have
|r| = r.
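The role of the Jacobian factor r can be checked numerically by integrating over the rectangle 0 ≤ r ≤ R, 0 ≤ θ ≤ 2π. The following is a rough sketch using a midpoint rule in both variables; the function name `polar_disk_integral` and the grid sizes are assumptions of this example.

```python
import math

def polar_disk_integral(f, R, n_r=400, n_theta=400):
    """Midpoint-rule estimate of the double integral of f over the disk of
    radius R, computed in polar coordinates with the Jacobian factor r."""
    dr = R / n_r
    dt = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            t = (j + 0.5) * dt
            total += f(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt

# Integrating the constant 1 should give the area of the disk, pi * R^2.
area = polar_disk_integral(lambda x, y: 1.0, R=2.0)
# Integrating x^2 + y^2 over the unit disk should give pi / 2.
moment = polar_disk_integral(lambda x, y: x * x + y * y, R=1.0)
print(area, 4 * math.pi)
print(moment, math.pi / 2)
```

Leaving out the factor r in the integrand would give 2πR instead of πR² for the area, which is a quick way to see that the Jacobian cannot be dropped.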
□
IMPROPER DOUBLE INTEGRALS
Improper double integrals are defined in the same way as improper
integrals: as limits of proper ones.
Example A.5.2 We shall use improper double integrals to compute
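an improper integral which is very important in probability:
$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. This is done by a rather clever trick. First we write
$$\left( \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right)^2 = \left( \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right) \left( \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.$$
The last expression involves a double integral over the entire
xy-plane. We rewrite it in terms of polar coordinates: x = r cos θ, y = r
sin θ and J(r,θ) = r. Thus we get
$$\left( \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right)^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = 2\pi \left[ -e^{-r^2/2} \right]_0^{\infty} = 2\pi.$$
On taking square roots, we get:
$$\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$$
□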
Appendix B
Probability and Statistics
This appendix on probability aims at giving you a compact
description of those aspects of the subject which are most necessary
for finance. The presentation emphasises the intuition or motive
behind the definitions and results, rather than the formal details. To
fill in these details you could consult Ross [41], Freund [23], or
Ashenfelter, Levine and Zimmerman [3].
B.1 BASIC PROBABILITY
The sample space is the collection of objects whose statistical
properties are to be studied. Each such object is called an outcome,
and a collection of outcomes is called an event. The act of selecting
a particular outcome is called a random experiment or just
experiment.
Mathematically, the sample space is a set, an outcome is a
member of this set, and an event is a subset of the sample space.
Typical notation is to use S for the sample space, lower
case letters such as x for individual outcomes, and capital letters
such as E for events.
Example B.1.1 Suppose we are interested in those stocks listed on
the National Stock Exchange of India (NSE) that have shown a
return of at least 20% over the past year. Then the sample space is
S = The collection of stocks listed on the NSE.
Each such stock constitutes an outcome. We are interested in the
outcomes constituting the following event:
E = {S ∈ S : The return on S is at least 20% over the past year}.
□
In a particular context, we would be interested in some specific
numerical property of the outcomes. In the above example it was the
return over the past year. This property allots a number to each
outcome, and so it can be viewed as a function whose domain is the
sample space S and range is in the real numbers ℝ.
Therefore, we introduce the concept of a random variable: it is a
function X : S → ℝ. Our interest is in taking a particular random
variable X and studying how its values are distributed. What is the
average? How much variation is there in its values? Are very large
values unlikely enough to be ignored?
We present two viewpoints on the meaning of probability. Both
are relevant to finance and, interestingly, they lead to the same
mathematical definition of probability!
Viewpoint 1: The probability of an event should predict the relative
frequency of its occurrence. That is, suppose we say the probability
of a random stock having increased in value over the last month is
0.6. Then, if we look at 100 different stocks, about 60 of them should
have increased in value. The prediction should be more accurate if
we look at larger numbers of stocks.
Viewpoint 2: The probability of an event reflects our (subjective)
opinion about how likely it is to occur, in comparison to other events.
Thus, if we allot probability 0.4 to event A and 0.2 to event B, we are
expressing the opinion that A is twice as likely as B.
Viewpoint 1 is appropriate when we are analyzing the historical data
to predict the future. Viewpoint 2 is useful in analyzing how an
individual may act when faced with certain information. Both
viewpoints are captured by the following mathematical formulation:
Let Ω be the collection of all events: we call it
the event algebra.15
A probability function is a function ℙ : Ω → [0,1]
such that:
1. ℙ(S) = 1.
2. $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(E_i)$, if the $E_i$ are pairwise disjoint events (i.e.,
i ≠ j implies $E_i \cap E_j = \emptyset$).
Let ℙ : Ω → [0,1] be a probability function. Then it automatically has
the following properties:
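1. ℙ(∅) = 0.
2. $\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} \mathbb{P}(E_i)$, if the $E_i$ are pairwise disjoint events.
3. $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$, where $A^c$ denotes the complement of A in S.
4. $\mathbb{P}\left(\bigcup_i E_i\right) \le \sum_i \mathbb{P}(E_i)$ for any collection of events $E_1, E_2, \dots$ .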
B.2 RANDOM VARIABLES
We return to our main question: How likely are different values (or
ranges of values) of a random variable X : S → ℝ?
If we just plug in the definition of a random variable, we realise
that our question can be phrased as follows: What is the probability
of the event whose outcomes correspond to a given value (or range
of values) of X?
Thus, suppose we are asked: What is the probability that X takes
on a value greater than 100? This is to be interpreted as: What is the
probability of the event whose outcomes t all satisfy X(t) > 100? That
is,
ℙ(X > 100) = ℙ({t : X(t) > 100}).
It is convenient to consider two types of random variables, ones
whose values vary discretely (discrete random variables) and those
whose values vary continuously (continuous random variables).16
Example B.2.1
1. Let us allot +1 to the stocks whose value rose on a given day,
and –1 to those whose value fell. Then we have created a
random variable whose possible values are ±1. This is a
discrete random variable.
2. Let us allot the whole number +n to the stocks whose value rose
by between n and n + 1 on a given day, and –n to those whose
value fell by between n and n + 1. Then we have created a
random variable whose possible values are all the integers. This
is also a discrete random variable.
3. If, in the previous example, we let X be the actual change in
value, it is still discrete (since all changes are in multiples of Rs
0.01). However, now the values are so close that it is simpler to
ignore the discreteness and model X as a continuous random
variable.
□
DISCRETE RANDOM VARIABLES
Let S be the sample space, Ω the event algebra, and ℙ : Ω → [0,1] a
probability function. Let X : S → ℝ be a discrete random variable
with range x1,x2,… (the range can be finite or infinite). The
probability of any particular value x is
ℙ(X = x) = ℙ({t ∈ S : X(t) = x}).
These values create a function fX : ℝ → [0,1]:
fX(x) = ℙ(X = x),
which is called the probability distribution function of X. We shall
also refer to it by the abbreviation pdf. We can find the probability of
a range of values of X by just summing up the probabilities of all the
individual values in that range. For instance,
$$\mathbb{P}(a \le X \le b) = \sum_{a \le x_i \le b} f_X(x_i).$$
In particular, summing over the entire range gives
$$\sum_i f_X(x_i) = 1,$$
since the total probability must be 1.
Example B.2.2 (Discrete Uniform Distribution) Consider an X
whose range is {0,1,2,…,n} and each value is equally likely. Then its
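pdf is given by:
$$f_X(x) = \begin{cases} \dfrac{1}{n+1} & \text{if } x \in \{0,1,\dots,n\}, \\ 0 & \text{otherwise.} \end{cases}$$
□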
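For a discrete random variable, finding the probability of a range of values is just a matter of summing the pdf. The sketch below stores a pdf as a Python dictionary and uses exact fractions; the helper names `uniform_pmf` and `prob_range` and the choice n = 5 are illustrative assumptions.

```python
from fractions import Fraction

def uniform_pmf(n):
    """pdf of the discrete uniform distribution on {0, 1, ..., n}."""
    return {x: Fraction(1, n + 1) for x in range(n + 1)}

def prob_range(pmf, lo, hi):
    """P(lo <= X <= hi), obtained by summing the pdf over that range."""
    return sum(p for x, p in pmf.items() if lo <= x <= hi)

pmf = uniform_pmf(5)
total = sum(pmf.values())       # the total probability
p_1_3 = prob_range(pmf, 1, 3)   # three of the six equally likely values
print(total, p_1_3)
```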
CONTINUOUS RANDOM VARIABLES
Suppose the values of a random variable X vary continuously over
some range, such as [0,1]. From a real life viewpoint, since exact
measurements of a continuously varying quantity are impossible, it is
only reasonable to ask for the probability of an observation lying in a
range, such as (0.49,0.51), rather than its having an exact value like
0.5.
The notion of a probability distribution of a continuous random
variable is developed with this in mind. Recall that in the discrete
context the probability of a range of values was obtained by
summing over that range. So, in the continuous case, we seek to
obtain probability of a range by integrating over it.
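Given a continuous random variable X, we define its probability
density function (or pdf) to be a function $f_X : \mathbb{R} \to [0,\infty)$
such that for any a,b with a ≤ b,
$$\mathbb{P}(a \le X \le b) = \int_a^b f_X(x)\,dx.$$
In particular,
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
From this definition, it also follows that
any individual value of a continuous random variable has zero
probability:
$$\mathbb{P}(X = a) = \int_a^a f_X(x)\,dx = 0.$$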
Remark Thus the number fX(x) does not represent the probability
that X = x. Individual values of fX have no significance, only the
integrals of fX do! (Contrast this with the discrete case.)
Example B.2.3 (Continuous Uniform Distribution) A continuous
random variable X is called uniform if its probability density function
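has the form
$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases}$$
For then, with a ≤ s ≤ t ≤ b,
$$\mathbb{P}(s \le X \le t) = \int_s^t \frac{1}{b-a}\,dx = \frac{t-s}{b-a}.$$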
proportional to its length, and does not depend on its location. The
interval [a,b] is called the range of X.
Consider the picture below. Since I1 and I2 have the same length,
their probabilities are equal. As I3 has twice their length, its
probability is also double of theirs.
The graph of fX is:
□
If two random variables X,Y have the same pdf, we write $X \overset{d}{=} Y$
and say that they have the same distribution. Note that this
does not mean the two random variables are equal. They may not
even have the same sample space.
Exercise B.2.4 Let X count the number of heads arising out of a
single toss of a fair coin, and Y the number of tails. Then X ≠ Y but
$X \overset{d}{=} Y$.
B.3 CUMULATIVE DISTRIBUTION FUNCTION
Let S be a sample space, ℙ a probability function on its event
algebra, and X : S → ℝ a random variable. Then the cumulative
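distribution function or cdf of X, denoted $F_X$, is defined
by:
$$F_X(x) = \mathbb{P}(X \le x).$$
Thus $F_X$ is a function from ℝ into the interval [0,1]. It is easy to see
that
$$F_X(x) = \begin{cases} \sum_{x_i \le x} f_X(x_i) & \text{if } X \text{ is discrete,} \\[1ex] \int_{-\infty}^{x} f_X(t)\,dt & \text{if } X \text{ is continuous,} \end{cases}$$
where $f_X$ is the pdf of X.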
The cdf has certain advantages over the pdf. For one, it is
defined in a uniform manner for all random variables, whether
discrete or continuous. It may happen that we wish to analyse
whether the values taken by one random variable are likely to be
close to those taken by another. If one is discrete and the other
continuous, the pdf’s do not provide a convenient way to test this,
but the cdf’s do.
When X is a continuous random variable, the cdf has the
additional advantage of being defined explicitly, while the pdf is
defined indirectly by means of a property it should possess. This
makes it easier to find a formula for the cdf as compared to the pdf.
Often, we first find the cdf, and then exploit the following equation to
obtain the pdf (in the continuous case):
$$f_X(x) = F_X'(x).$$
This relation follows from the Fundamental Theorem of Calculus
(page 200).
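Exercise B.3.1 Let $F_X$ be the cdf of a continuous random variable X.
Then the cdf of aX + b satisfies
$$F_{aX+b}(x) = \begin{cases} F_X\!\left(\dfrac{x-b}{a}\right) & \text{if } a > 0, \\[2ex] 1 - F_X\!\left(\dfrac{x-b}{a}\right) & \text{if } a < 0. \end{cases}$$
Exercise B.3.2 Let $f_X$ be the pdf of a continuous random variable X.
Then the pdf of aX + b satisfies
$$f_{aX+b}(x) = \frac{1}{|a|}\, f_X\!\left(\frac{x-b}{a}\right).$$
Exercise B.3.3 Let $f_X$ be the pdf of a continuous random variable X.
Then the cdf and pdf of $X^2$ satisfy (for x > 0):
$$F_{X^2}(x) = F_X(\sqrt{x}) - F_X(-\sqrt{x}), \qquad f_{X^2}(x) = \frac{1}{2\sqrt{x}} \left( f_X(\sqrt{x}) + f_X(-\sqrt{x}) \right).$$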
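Example B.3.4 Let X be a discrete random variable taking values
0,1,2, each with probability 1⁄3. Then its cdf is
$$F_X(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1/3 & \text{if } 0 \le x < 1, \\ 2/3 & \text{if } 1 \le x < 2, \\ 1 & \text{if } x \ge 2. \end{cases}$$
The graph of $F_X$ is a step function with jumps of size 1⁄3 at x = 0, 1, 2.
□
Example B.3.5 Let X be a continuous random variable whose
probability density function is:
The cdf of X is
The graph of FX is:
□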
Figure B.1: The first diagram shows how to read off quartiles from a
cdf plot. The second shows the median and interquartile range
marked against the density function.
These two examples illustrate the basic properties of the cdf FX:
1. FX is an increasing function: s ≥ t implies FX(s) ≥ FX(t).
2. FX need not be strictly increasing: s > t does not imply FX(s) >
FX(t).
3. When X is discrete, FX is not continuous. There are jumps at the
points where X has non-zero probability.
4. When X is continuous, FX is continuous.
5. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
Since FX may have repeating or missing values, we cannot
always define its inverse function. However, we can get something
close to it.
Given a number q ∈ (0,1), the q-quantile of X is defined to be the
least value $x_q$ such that $F_X(x_q) \ge q$. When X is continuous, so is $F_X$,
and then $x_q$ is the least value such that $F_X(x_q) = q$.
The numbers $x_{0.25}$, $x_{0.5}$ and $x_{0.75}$ are called quartiles. The
number $x_{0.5}$ is also called the median and is used to estimate the
centre of the values of the distribution. The difference $x_{0.75} - x_{0.25}$ is
called the interquartile range: it measures the amount of spread of
the values of the distribution (see Figure B.1).
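For a discrete random variable the definition of the q-quantile can be applied mechanically: accumulate probabilities in increasing order of the values and stop at the first value where the running total reaches q. The sketch below does this with exact fractions; the function name `quantile` and the four-valued distribution are hypothetical choices made for this illustration.

```python
from fractions import Fraction

def quantile(pmf, q):
    """Least value x with F_X(x) >= q, for a discrete pdf {value: probability}."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]          # cumulative is now F_X(x)
        if cumulative >= q:
            return x
    raise ValueError("probabilities do not add up to q")

# A hypothetical distribution on the values 1, 2, 3, 4.
pmf = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
q1, median, q3 = (quantile(pmf, Fraction(k, 4)) for k in (1, 2, 3))
iqr = q3 - q1
print(q1, median, q3, iqr)
```

Here the cdf takes the values 1⁄10, 3⁄10, 6⁄10 and 1, so it jumps over 0.25, 0.5 and 0.75 rather than attaining them. This is why the definition asks for the least value where the cdf is at least q.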
Exercise B.3.6 Identify the quartiles and interquartile range for the
random variable in Example B.3.4.
Exercise B.3.7 Identify the quartiles and interquartile range for the
random variable in Example B.3.5.
The median and interquartile range provide a summary of the
basic features of the random variable. Unfortunately, they are not so
well suited to the study of combinations of random variables. For
example, suppose a random variable is the sum of two random
variables whose medians are known. This knowledge does not
suffice to give the median of the sum. We shall, therefore, develop
more sophisticated measures of the centre and the spread. These
will be called expectation and variance, respectively.
Now we shall look at two kinds of random variables, one discrete
and one continuous, which are especially important.
B.4 BINOMIAL RANDOM VARIABLE
Consider a random variable X which can take on only two values,
say 0 and 1 (the choice of values is not important). Suppose the
probability of the value 0 is q and of 1 is p. Then we have:
1. 0 ≤ p, q ≤ 1,
2. p + q = 1
Suppose we observe X n times. What are the likely distributions of
0’s and 1’s? Specifically, we ask: What is the probability of observing
1 k times?
We calculate as follows. Let us consider all possible
combinations of n 0’s and 1’s:
The number of combinations with k 1’s = $\binom{n}{k}$.
Recall that $\binom{n}{k}$ is read as “n choose k” and is given by
the formula
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \quad \text{where } n! = n(n-1) \cdots 2 \cdot 1.$$
The probability of each individual combination with k 1’s = $p^k(1-p)^{n-k}$.
Therefore,
$$\mathbb{P}(\text{1 is observed } k \text{ times}) = \binom{n}{k} p^k (1-p)^{n-k}.$$
We therefore say a random variable Y has a binomial distribution
with parameters n and p if it has range 0,1,…,n and its probability
distribution is:
$$f_Y(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0,1,\dots,n.$$
We call Y a binomial random variable and write
Y ~ B(n,p). When n = 1, we say we have a Bernoulli random
variable.
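The binomial probabilities are easy to compute directly from the counting argument: the number of arrangements with k 1’s, times the probability of each such arrangement. The following sketch uses Python's built-in `math.comb` for "n choose k"; the function name `binomial_pmf` and the parameter values are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf[5])      # 252 arrangements out of 1024 equally likely ones
print(sum(pmf))    # total probability, equal to 1 up to rounding
```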
As illustrated above, binomial distributions arise naturally
wherever we are faced with a sequence of choices. In finance, the
binomial distribution is part of the Binomial Tree Model used to study
stocks and options.
Figure B.2 depicts the pdf’s of the binomial distributions with p =
0.2 and p = 0.5, using different types of squares for each. Both have
n = 10.
Exercise B.4.1 In Figure B.2, identify which points correspond to p
= 0.2 and which to p = 0.5.
Figure B.2: Two binomial distributions. Both have n = 10 while the p
values are 0.2 and 0.5.
B.5 NORMAL RANDOM VARIABLE
This kind of probability distribution is at once the most common in
nature, among the easiest to work with mathematically, and
theoretically at the heart of probability and statistical inference.
Among its remarkable properties is that any phenomenon occurring
on a large scale tends to be governed by it. When in doubt about the
nature of a distribution, assume it is (nearly) normal, and you will
usually get good results!
We first define the standard normal random variable. This has
a probability density function of the form
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
Its graph is ‘bell-shaped’.
Exercise B.5.1 Can you explain the factor $1/\sqrt{2\pi}$? (Hint: Think about
the requirement that total probability should be 1.)
Note that the graph is symmetric about the y-axis. The axis of
symmetry can be moved to another position m, by replacing x by x –
m. In the following diagram, the dashed line represents the standard
normal distribution:
Also, starting from the standard normal distribution, we can create
one with a similar shape but bunched more tightly (or loosely)
around the y-axis. We achieve this by replacing x with x⁄s.
By combining both kinds of changes, we reach the definition of a
general normal distribution:
A random variable X has a normal distribution with parameters
μ,σ, if its probability density function has the form:
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$
We call X a normal random variable and write X ~
N(μ,σ). The parameter μ can be any real number, while σ has to be
positive.
The axis of symmetry of this distribution is determined by μ and
its clustering about the axis of symmetry is controlled by σ.
Exercise B.5.2 Will increasing σ make the graph more tightly
bunched around the axis of symmetry? What will it do to the peak
height of the graph?
Exercise B.3.2 shows that under shifts and scalings, normal
variables stay normal.
Exercise B.5.3 Let X ~ N(μ,σ) and a,b ∈ ℝ with a≠0. Then
aX + b ~ N(aμ + b,|a|σ).
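Exercise B.5.4 Show that X ~ N(μ,σ) if and only if
$$Z = \frac{X - \mu}{\sigma} \sim N(0,1).$$
Through this link, all questions about normal distributions can be
converted to questions about the standard normal distribution. For
instance, let X,Z be as above. Then:
$$\mathbb{P}(a \le X \le b) = \mathbb{P}\left( \frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma} \right).$$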
In the empirical sciences, errors in observation tend to be normally
distributed: they are clustered around zero, small errors are
common, and very large errors are very rare. Regarding this,
observe from the graph that by ±3 the density of the standard normal
distribution has essentially become zero: in fact the probability of a
standard normal variable taking on a value outside [–3,3] is just
0.0027. In theoretical work, the normal distribution is the main tool in
determining whether the gap between theoretical predictions and
observed reality can be attributed solely to errors in observation.
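Probabilities for normal variables can be computed by first converting to the standard normal distribution, whose cdf can be written in terms of the error function available as `math.erf` in Python. The sketch below is illustrative: the function names, and the example values μ = 10 and σ = 2, are assumptions made here.

```python
import math

def std_normal_cdf(z):
    """cdf of the standard normal distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma), by standardising the end points."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

# Probability that a standard normal variable falls outside [-3, 3].
tail = 1 - normal_prob(-3, 3, 0, 1)
# Probability that X ~ N(10, 2) lies within one standard deviation of its centre.
within_one = normal_prob(8, 12, 10, 2)
print(round(tail, 4), round(within_one, 4))
```

The first value agrees with the figure 0.0027 quoted above.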
B.6 EXPECTATION AND VARIANCE
Suppose we have some data consisting of numbers $x_i$, each
occurring $f_i$ times. Then the total number of data points is $N = \sum_i f_i$.
The average of this data is defined to be:
$$\bar{x} = \frac{1}{N} \sum_i f_i x_i = \sum_i x_i\, \frac{f_i}{N}.$$
Now, if we have a discrete random variable X, the probability $f_X(x_i)$
predicts the relative frequency with which $x_i$ will occur in a large
number of observations of X, i.e., we view $f_X(x_i)$ as a prediction of
$f_i/N$.
And then, $\sum_i x_i f_X(x_i)$
becomes a predictor for the average $\bar{x}$ of the observations of X.
∙ The expectation of a discrete random variable X is defined to be
$$E[X] = \sum_i x_i f_X(x_i).$$
On replacing the sum by an integral we arrive at the notion of
expectation of a continuous random variable:
∙ The expectation of a continuous random variable is defined to be
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx.$$
Expectation is also called mean and is denoted by $\mu_X$ or just μ.
Exercise B.6.1 Make the following calculations:
1. X has the discrete uniform distribution with range 0,…,n. Then
E[X] = n⁄2.
2. X has the uniform distribution on [0,1]. Then E[X] = 1⁄2.
3. X ~ B(n,p) implies E[X] = np.
4. X ~ N(μ,σ) implies E[X] = μ.
Some elementary properties of expectation are:
1. E[c] = c, for any constant c. (A constant c can be viewed as a
discrete random variable whose range consists of the single
value c.)
2. E[cX] = c E[X], for any constant c.
Suppose X : S → ℝ is a random variable and g : ℝ → ℝ is any
function. Then their composition g ∘ X : S → ℝ, defined by (g ∘ X)(w)
= g(X(w)), is a new random variable which we will call g(X).
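Example B.6.2 Let $g(x) = x^r$. Then g ∘ X is denoted $X^r$.
□
Suppose X is discrete with range $\{x_i\}$. Then, the range of g(X) is
$\{g(x_i)\}$. Therefore we can calculate the expectation of g(X) as
follows:17
$$E[g(X)] = \sum_i g(x_i)\, f_X(x_i).$$
If X is continuous, one has the analogous formula:
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx.$$
Example B.6.3 Let $g(x) = x^2$. Then
$$E[X^2] = \sum_i x_i^2\, f_X(x_i) \quad \text{or} \quad E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx,$$
according as X is discrete or continuous.
□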
With these facts in hand, the following result is easy to prove.
Exercise B.6.4 Let X be any random variable, and g,h two real
functions. Then
E[g(X) + h(X)] = E[g(X)] + E[h(X)].
VARIANCE
Given some data $x_1,\dots,x_n$, its average $\bar{x}$ is seen as a central value
about which the data is clustered. The significance of the average is
greater if the clustering is tight, less otherwise. To measure the
tightness of the clustering, we use the variance of the data:
$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Variance is just the average of the squared distance from each data
point to the average (of the data).
Therefore, in the analogous situation where we have a random
variable X, if we wish to know how close to its expectation its values
are likely to be, we again define a quantity called the variance of X:
$$\mathrm{Var}[X] = E[(X - E[X])^2].$$
Alternate notation for variance is $\sigma_X^2$ or just $\sigma^2$.
The quantity $\sigma_X$ or σ, the (non-negative) square root
of the variance, is called the standard deviation of X. Its advantage
is that it is in the same units as X.
Exercise B.6.5 Note that Var[X] ≥ 0, so σ is defined. When can we
have Var[X] = 0?
Exercise B.6.6 Will a larger value of variance indicate tighter
clustering around the mean?
Sometimes, it is convenient to use the following alternative formula
for variance:
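Var[X] = E[X²] – E[X]².
This is obtained as follows. Write μ = E[X]. Then
$$\mathrm{Var}[X] = E[(X-\mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$$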
The elementary properties of variance are:
1. Var[X + a] = Var[X], if a is any constant.
2. Var[aX] = a² Var[X], if a is any constant.
Exercise B.6.7 Will it be correct to say that σ_{aX} = a σ_X for any
constant a?
Exercise B.6.8 Let X be a random variable with expectation μ and
standard deviation σ. Then (X – μ) ⁄ σ has expectation 0 and
standard deviation 1.
Exercise B.6.9 Suppose X has the discrete uniform distribution with
range {0,…,n}. We have seen that E[X] = n ⁄ 2. Show that its variance
is n(n + 2) ⁄ 12.
Exercise B.6.10 Suppose X has the continuous uniform distribution
with range [0,1]. We have seen that E[X] = 1 ⁄ 2. Show that its
variance is 1 ⁄ 12.
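Example B.6.11 Suppose X ~ B(n,p). We know E[X] = np.
Therefore,
$$E[X(X-1)] = \sum_{k=0}^{n} k(k-1)\binom{n}{k} p^k (1-p)^{n-k} = n(n-1)p^2 \sum_{k=2}^{n} \binom{n-2}{k-2} p^{k-2} (1-p)^{n-k} = n(n-1)p^2.$$
And so,
$$\mathrm{Var}[X] = E[X(X-1)] + E[X] - E[X]^2 = n(n-1)p^2 + np - n^2p^2 = np(1-p).$$
□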
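Example B.6.12 Suppose X ~ N(μ,σ). We know E[X] = μ. Therefore,
$$\mathrm{Var}[X] = E[(X-\mu)^2] = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x-\mu)^2 e^{-(x-\mu)^2/2\sigma^2}\, dx = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, dz,$$
where we have substituted z = (x – μ) ⁄ σ. Now we integrate by parts:
$$\int_{-\infty}^{\infty} z \cdot z e^{-z^2/2}\, dz = \Big[-z e^{-z^2/2}\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-z^2/2}\, dz = 0 + \sqrt{2\pi}.$$
Therefore
Var[X] = σ².
□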
Now we can also illustrate our earlier statement about how the
normal distribution serves as a substitute for other distributions.
Figure B.3 compares the pdf of a binomial distribution (n = 100 and p
= 0.2) with that of a normal distribution with the same mean and
variance (μ = np = 20 and σ² = np(1 – p) = 16, so that σ = 4).
Generally, the normal distribution is a good approximation to the
binomial distribution if n is large. One criterion that is often used is
that, for a reasonable approximation, we should have both np and
n(1 – p) greater than 5.
Figure B.3: Normal approximation to a binomial distribution
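The comparison in Figure B.3 can be repeated numerically. The following is a minimal sketch, assuming only the Python standard library; the function names binomial_pmf and normal_pdf are illustrative and do not come from the text. It evaluates both functions at the integers 0,…,100 and records the largest gap between them.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.2
mu = n * p                           # mean of B(n, p)
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of B(n, p)

# Largest pointwise gap between the binomial pmf and the normal pdf
max_gap = max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
              for k in range(n + 1))

for k in (12, 16, 20, 24, 28):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
print("largest gap:", max_gap)
```

Here np = 20 and n(1 – p) = 80 are both well above 5, so the rule of thumb stated above is satisfied.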
B.7 LOGNORMAL RANDOM VARIABLE
If X ~ N(μ,σ) then Y = e^X is called a lognormal random variable
with parameters μ and σ. The name comes from “The log of Y is
normal.”
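Exercise B.7.1 Let X ~ N(μ,σ). Then
$$E[e^{tX}] = e^{\mu t + \sigma^2 t^2/2}.$$
The t = 1,2 cases of the Exercise immediately give the following: If Y
is a lognormal variable with parameters μ and σ, then
$$E[Y] = e^{\mu + \sigma^2/2}, \qquad \mathrm{Var}[Y] = E[Y^2] - E[Y]^2 = e^{2\mu + \sigma^2}\big(e^{\sigma^2} - 1\big).$$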
Lognormal random variables are used in finance to model the
variation of stock prices with time. The fact that they can only take
positive values is one factor that makes them suitable for this. Figure
B.4 shows the probability density functions of some lognormal
random variables. Note that as σ becomes smaller, the shape more
closely approximates that of a normal random variable.
Figure B.4: Lognormal density functions: (A) μ = 0,σ = 1, (B) μ = 1,σ
= 1, (C) μ = 0, σ = 0.4.
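The standard closed forms for the mean and variance of a lognormal variable, E[Y] = exp(μ + σ²/2) and Var[Y] = exp(2μ + σ²)(exp(σ²) – 1), can be checked by simulation. The sketch below is only an illustration: the function names, the seed and the sample size are assumptions made for the example, and it uses curve (C) of Figure B.4 (μ = 0, σ = 0.4).

```python
import math
import random

def lognormal_mean(mu, sigma):
    """E[Y] for Y = e^X with X ~ N(mu, sigma)."""
    return math.exp(mu + sigma**2 / 2)

def lognormal_variance(mu, sigma):
    """Var[Y] for Y = e^X with X ~ N(mu, sigma)."""
    return math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)

def simulate(mu, sigma, size, seed=0):
    """Return (sample mean, sample variance) of simulated lognormal values."""
    rng = random.Random(seed)
    ys = [math.exp(rng.gauss(mu, sigma)) for _ in range(size)]
    m = sum(ys) / size
    v = sum((y - m) ** 2 for y in ys) / (size - 1)
    return m, v

sim_mean, sim_var = simulate(0.0, 0.4, 100_000)
print("mean:", sim_mean, "vs", lognormal_mean(0.0, 0.4))
print("variance:", sim_var, "vs", lognormal_variance(0.0, 0.4))
```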
B.8 CAUCHY RANDOM VARIABLE
A Cauchy random variable with parameters δ,γ is given by the
probability density function
$$f(x) = \frac{1}{\pi}\, \frac{\gamma}{\gamma^2 + (x-\delta)^2} \qquad \text{(B.1)}$$
where δ and γ are called the location and scale parameters. The
location parameter δ can be any real number, while the scale
parameter γ has to be positive.
Figure B.5: Comparison of the pdf’s of a Cauchy and a normal
random variable (the dashed curve). The Cauchy pdf is thinner in the
middle and does not die as quickly on the sides.
Exercise B.8.1 Verify that Equation B.1 defines a probability density
function. Further, show that a Cauchy random variable has no mean
or variance.
Exercise B.8.2 Let X have a Cauchy distribution with parameters
δ,γ. Show that the median of X is δ while the interquartile range is
2γ. (This justifies the names of the parameters.)
Exercise B.8.3 Let X have a Cauchy distribution with parameters δ,
γ. Show that (X – δ) ⁄ γ has a Cauchy distribution with parameters 0 and
1.
Figure B.5 compares the probability density functions of a
Cauchy and a normal random variable. The Cauchy probability
density function is thinner in the middle and does not die as quickly
on the sides. These fat or heavy tails make it suitable for modeling
phenomena where extreme events are somewhat likely. They are
also the reason that its mean and variance do not exist.
B.9 BIVARIATE DISTRIBUTIONS
So far we have dealt with individual random variables. The
expectation and variance of a random variable give us information
on where, and to what extent, the values of the random variable are
concentrated. An investor may use these to estimate likely profits
from her portfolio as well as the possible fluctuations in this profit.
The next step is to study the relationships that exist between
different random variables. For instance, our investor might like to
know the nature and strength of the connection between her portfolio
and a stock market index such as the NIFTY. If the NIFTY goes up,
is her portfolio likely to do the same? How much of a rise can she
hope for?
This leads us to the study of pairs of random variables and the
probabilities associated with their joint values. Are high values of one
associated with high or low values of the other? Is there a significant
connection at all?
Let S be a sample space and X,Y : S → ℝ two random variables.
Then we say that X,Y are jointly distributed.
Let X,Y be jointly distributed discrete random variables. The joint
probability distribution function (or joint pdf) fX,Y of X
and Y is defined by
fX,Y (x,y) = ℙ(X = x,Y = y).
Since fX,Y is a function of two variables, we call it a bivariate
distribution. It can be used to find any probability associated with X
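and Y. Let X have range {x_i} and Y have range {y_j}. Then:
1. $\mathbb{P}(a \le X \le b,\ c \le Y \le d) = \sum_{a \le x_i \le b}\ \sum_{c \le y_j \le d} f_{X,Y}(x_i, y_j)$.
2. $f_X(x) = \sum_j f_{X,Y}(x, y_j)$.
3. $f_Y(y) = \sum_i f_{X,Y}(x_i, y)$.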
We will be interested in various combinations of X and Y . Therefore,
consider a function g : ℝ2 → ℝ. We use it to define a new random
variable g(X,Y ) : S → ℝ by
g(X,Y )(w) = g(X(w),Y (w)).
The expectation of this new random variable can be obtained, as
usual, by multiplying its values with their probabilities:
$$E[g(X,Y)] = \sum_i \sum_j g(x_i, y_j)\, f_{X,Y}(x_i, y_j).$$
We create analogous definitions when X,Y are jointly distributed
continuous random variables.
Now let X,Y be jointly distributed continuous random variables.
Their joint probability density function (or joint pdf) fX,Y is a
function of two variables whose integrals give the probability of X,Y
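lying in any range:
$$\mathbb{P}(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x,y)\, dy\, dx.$$
Then the following are easy to prove:
1. $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy$.
2. $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx$.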
Suppose X,Y are jointly distributed continuous random variables and
we have a function g : ℝ2 → ℝ. Then we define g(X,Y ) as before,
and its expectation is given by
$$E[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, dx\, dy.$$
Suppose X,Y are jointly distributed random variables. Then
expectation distributes over their sum:
E[X + Y ] = E[X] + E[Y ].
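We shall write the proof for the case when X,Y are both discrete.
$$E[X+Y] = \sum_i \sum_j (x_i + y_j) f_{X,Y}(x_i,y_j) = \sum_i x_i \sum_j f_{X,Y}(x_i,y_j) + \sum_j y_j \sum_i f_{X,Y}(x_i,y_j) = \sum_i x_i f_X(x_i) + \sum_j y_j f_Y(y_j) = E[X] + E[Y].$$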
Exercise B.9.1 Show expectation distributes over the sum of two
jointly distributed continuous random variables.
The behaviour of variance with respect to sums is slightly more
complicated. We shall need to introduce the concept of covariance to
describe it.
Let X,Y be jointly distributed random variables. Then the
covariance of X and Y is defined to be
E[(X – μX)(Y – μY)].
It is denoted by Cov[X,Y] or σ_XY.
Suppose large values of X tend to go with large values of Y , and
small values of X with small values of Y . Then X – μX and Y – μY will
generally have the same sign, hence the product will tend to be
positive, and its average – which is covariance – will be positive. On
the other hand, if large values of X tend to go with small values of Y ,
and small values of X with large values of Y , then X – μX and Y – μY
will generally have opposite signs. In this case, covariance will be
negative. A zero covariance indicates that X and Y do not have a
simple relation (if they have any relation at all). (See Figure B.6.)
Exercise B.9.2 Cov[X,X] = Var[X].
Exercise B.9.3 Cov[X,Y ] = Cov[Y,X].
Exercise B.9.4 Cov[aX,Y ] = a Cov[X,Y ].
Exercise B.9.5 Cov[X + Y,Z] = Cov[X,Z] + Cov[Y,Z].
Exercise B.9.6 Cov[X,Y ] = E[XY ] – E[X] E[Y ].
Exercise B.9.7 Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov[X,Y ].
Figure B.6: Observed values of two jointly distributed standard
normal variables with covariances 0.95, – 0.6 and 0 respectively
Compare the last exercise with the identity connecting the dot
product and length of vectors:
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2\, u \cdot v.$$
This motivates us to think of covariance as a kind of dot product
between different random variables, with variance as squared
length. Geometric analogies then lead to useful statistical insights,
such as the next statement and its proof.
A very important fact is that
| Cov[X,Y ]|≤ σXσY .
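This is an analogue of the geometric fact that |u⋅v| ≤ ||u|| ||v||. We first
verify this when Var[X] = Var[Y] = 1:
$$0 \le \mathrm{Var}[X \pm Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] \pm 2\,\mathrm{Cov}[X,Y] = 2 \pm 2\,\mathrm{Cov}[X,Y], \quad \text{hence} \quad |\mathrm{Cov}[X,Y]| \le 1.$$
If X, Y are arbitrary, then X ⁄ σ_X and Y ⁄ σ_Y have variance 1, and so
$$1 \ge \left| \mathrm{Cov}\!\left[ \frac{X}{\sigma_X}, \frac{Y}{\sigma_Y} \right] \right| = \frac{1}{\sigma_X \sigma_Y}\, |\mathrm{Cov}[X,Y]|$$
implies
|Cov[X,Y]| ≤ σ_X σ_Y.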
The correlation coefficient of X,Y is defined to be
$$\rho = \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sigma_X \sigma_Y}.$$
The inequality |Cov[X,Y ]|≤ σXσY immediately implies:
–1 ≤ ρX,Y ≤ 1.
The advantage of the correlation coefficient is that it is not affected
by the units used in the measurement:
Exercise B.9.8 If we replace X by X ′ = aX + b and Y by Y ′ = cY +
d, where a and c have the same sign, we will have
ρX ′,Y ′ = ρX,Y .
Exercise B.9.9. Suppose a die is tossed once. Let X take on the
value 1 if the result is ≤ 4 and the value 0 otherwise. Similarly, let Y
take on the value 1 if the result is even and the value 0 otherwise.
1. Show that the values of fX,Y are given by:
            Y = 0    Y = 1
   X = 0    1 ⁄ 6    1 ⁄ 6
   X = 1    1 ⁄ 3    1 ⁄ 3
2. Show that μX = 2 ⁄ 3 and μY = 1⁄2.
3. Show that Cov[X,Y ] = 0.
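The numbers in Exercise B.9.9 can be confirmed by enumerating the six faces of the die. This is a small sketch using exact fractions; the helper names (joint, expectation) are illustrative and not part of the text.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)          # each face of a fair die

def X(w):
    """1 if the result is at most 4, else 0."""
    return 1 if w <= 4 else 0

def Y(w):
    """1 if the result is even, else 0."""
    return 1 if w % 2 == 0 else 0

# Joint pdf: f(x, y) = P(X = x, Y = y), built by summing over outcomes
joint = {}
for w in faces:
    key = (X(w), Y(w))
    joint[key] = joint.get(key, 0) + prob

def expectation(g):
    """E[g(X, Y)] = sum of g(x, y) f(x, y) over the joint range."""
    return sum(g(x, y) * pr for (x, y), pr in joint.items())

mu_x = expectation(lambda x, y: x)
mu_y = expectation(lambda x, y: y)
cov = expectation(lambda x, y: (x - mu_x) * (y - mu_y))

print(joint)
print(mu_x, mu_y, cov)
```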
B.10 CONDITIONAL PROBABILITY
Consider a sample space S with event algebra Ω and a probability
function ℙ : Ω → [0,1]. Let A,B ⊂ Ω be events. If we know B has
occurred, what is the probability that A has also occurred? We
reason that since we know B has occurred, in effect B has become
our sample space.
Therefore all probabilities of events inside B should be multiplied by
1 ⁄ ℙ(B), to keep the total probability at 1. As for the occurrence of A,
the points outside B are irrelevant, so our answer should be ℙ(A ∩
B) times the correcting factor 1 ⁄ ℙ(B).
The conditional probability of A, given B, is therefore defined to
be
ℙ(A|B) = ℙ(A ∩ B) ⁄ ℙ(B).
Note that this requires ℙ(B) > 0.
We now apply this idea to random variables. Let X,Y be jointly
distributed discrete random variables. Then we have
$$\mathbb{P}(Y = y \mid X = x) = \frac{\mathbb{P}(X = x,\, Y = y)}{\mathbb{P}(X = x)} = \frac{f_{X,Y}(x,y)}{f_X(x)}.$$
Hence, the conditional probability distribution (or conditional pdf)
of Y, given X = x, is defined to be
$$f_{Y|X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)}.$$
This definition also serves to define the conditional probability
density function (conditional pdf) when X,Y are continuous.
The conditional pdf is a valid pdf in its own right. For example, in
the discrete case, we have
1. 0 ≤ fY |X=x(y) ≤ 1,
2. $\sum_j f_{Y|X=x}(y_j) = 1$.
Since fY |X=x is a pdf, we can use it to define expectations.
The conditional expectation of Y , given X = x, is defined to be
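$$E[Y \mid X = x] = \sum_j y_j\, f_{Y|X=x}(y_j) \ \text{(discrete case)}, \qquad E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f_{Y|X=x}(y)\, dy \ \text{(continuous case)}.$$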
Note that E[Y |X = x] is a function of x. It is also denoted by μY |X=x or
μY |x, and its graph is called the regression curve or curve of
regression of Y on X.
Example B.10.1 In finance, it is often reasonable to model the rates
of return of certain assets as being normally distributed. The
multivariate normal distribution provides a way to model multiple
normally distributed variables with desired correlations. For the time
being, we shall only describe a special case of this distribution.
Namely, we take two jointly distributed standard normal variables,
and assume their joint pdf to be given by
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left( -\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \right),$$
where –1 < ρ < 1. X and Y are then called a bivariate normal pair.
We leave the following for you to verify.
Exercise B.10.2 Verify that:
1. The function fX,Y is a valid joint pdf.
2. X and Y are indeed standard normal.
3. The parameter ρ in the definition is equal to the correlation
coefficient of X and Y .
Figure B.7: On the left is a plot of the joint pdf of a bivariate normal
pair of variables X,Y with ρ = 0.7. On the right is a contour plot of the
same pdf together with the line of regression y = ρx of Y on X. The
dashed vertical line shows that for a fixed value X = x, the line of
regression gives the mean of Y .
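The conditional probability distribution of Y, given X = x, is
therefore defined by:
$$f_{Y|X=x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\!\left( -\frac{(y - \rho x)^2}{2(1-\rho^2)} \right).$$
This is the pdf of a normal variable with mean ρx and variance 1
– ρ². Hence the conditional expectation is
E[Y |X = x] = ρ x.
The curve of regression of Y on X is the straight line y = ρ x.
□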
Exercise B.10.3 In the above example, what is the curve of
regression of X on Y ? Is it the same as the curve of regression of Y
on X?
Figure B.8: Regression curve for a bivariate normal pair of variables
with ρ = 0.7, plotted against a large number of observations of their
paired values. Note how, for any small range of x, the number of
observations above and below the line are approximately equal.
The function E[Y |X = x] creates a new random variable E[Y |X] as
follows: For each outcome w of the sample space, we first evaluate x
= X(w), and this defines a number E[Y |X = x] which depends on w.
This is expressed by the following equation:
E[Y |X](w) = E[Y |X = X(w)].
Exercise B.10.4 Consider the bivariate normal pair of variables
from Example B.10.1. Show that E[Y |X] = ρX.
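Below, we calculate the expectation of this new random variable
in the continuous case.
$$E[E[Y|X]] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y\, f_{Y|X=x}(y)\, dy \right) f_X(x)\, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\, f_{X,Y}(x,y)\, dy\, dx = E[Y].$$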
Similar calculations can be carried out in the discrete case, so that
we have the general result:
E[Y ] = E[E[Y |X]].
This result is useful when we deal with experiments carried out in
stages, and we have information on how the results of one stage
depend on those of the previous ones.
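As an illustration of an experiment carried out in stages, consider the following hypothetical setup (it is not taken from the text): a fair die is tossed to give X, and then X fair coins are tossed, with Y the number of heads. The sketch below computes E[Y] in two ways, once as E[E[Y |X]] and once directly from the marginal distribution of Y, using exact fractions.

```python
from fractions import Fraction
from math import comb

# Stage 1: X is the result of a fair die.
f_X = {x: Fraction(1, 6) for x in range(1, 7)}

def f_Y_given_X(y, x):
    """Conditional pdf of Y given X = x: y heads among x fair coins."""
    return Fraction(comb(x, y), 2**x)

def cond_expectation(x):
    """E[Y | X = x], computed from the conditional pdf."""
    return sum(y * f_Y_given_X(y, x) for y in range(x + 1))

# E[E[Y|X]]: average the conditional expectations over the values of X.
tower = sum(cond_expectation(x) * f_X[x] for x in f_X)

# Direct route: build the marginal pdf of Y, then take its expectation.
f_Y = {}
for x, px in f_X.items():
    for y in range(x + 1):
        f_Y[y] = f_Y.get(y, 0) + f_Y_given_X(y, x) * px
direct = sum(y * pr for y, pr in f_Y.items())

print(tower, direct)
```

In this setup E[Y |X = x] = x ⁄ 2, so the first route only needs E[X], which is why conditioning on the earlier stage is convenient.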
B.11 INDEPENDENCE
Let X,Y be jointly distributed random variables. We consider Y to be
independent of X, if knowledge of the value taken by X tells us
nothing about the value taken by Y . Mathematically, this means:
fY |X=x(y) = fY (y).
This is easily rearranged to:
fX,Y (x,y) = fX(x)fY (y)
(B.2)
Therefore we formally define two jointly distributed random variables
X,Y to be independent if they satisfy the identity (B.2). Note that the
definition is symmetric in X,Y .
Exercise B.11.1 If X,Y are independent random variables and g :
ℝ2 → ℝ is any function of the form g(x,y) = m(x)n(y), then
E[g(X,Y )] = E[m(X)]E[n(Y )].
Exercise B.11.2 If X,Y are independent, then Cov[X,Y ] = 0.
A common error is to think zero covariance implies independence–in
fact it only indicates the possibility of independence.
Exercise B.11.3 If X,Y are independent, then
Var[X + Y ] = Var[X] + Var[Y ].
We return again to the normal distribution. The following fact about it
is key to its wide usability: If Xi ~ N(μi,σi) with i = 1,2 are
independent, then
$$X_1 + X_2 \sim N\!\left( \mu_1 + \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2} \right).$$
Of course, the mean and variance would add up like this for any
collection of independent random variables. The important feature
here is the preservation of normality. We shall use the cdf to
demonstrate this. First, let X,Y be any two jointly distributed random
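variables. Then the cdf of X + Y can be expressed as follows:
$$F_{X+Y}(z) = \mathbb{P}(X + Y \le z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x,y)\, dy\, dx.$$
We differentiate to obtain the density function:
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\, dx.$$
Now let X ~ N(0,σ) and Y ~ N(0,1) be independent normal
variables. Then, from the above calculation,
$$f_{X+Y}(z) = \frac{1}{2\pi\sigma} \int_{-\infty}^{\infty} \exp\!\left( -\frac{x^2}{2\sigma^2} - \frac{(z-x)^2}{2} \right) dx = \frac{1}{\sqrt{2\pi}\sqrt{1+\sigma^2}} \exp\!\left( -\frac{z^2}{2(1+\sigma^2)} \right).$$
Hence X + Y ~ N(0, √(1 + σ²)). It is easy to generalise from this
particular case: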
Exercise B.11.4 Let Xi ~ N(μi , σi) be independent (i = 1,2). Then
X1 + X2 ~ N(μ,σ)
where μ = μ₁ + μ₂ and σ² = σ₁² + σ₂².
The next exercise is for those who enjoy a mathematical
challenge (or at least one in integration) — it is not essential to our
work.
Exercise B.11.5 Let X,Y be independent random variables following
a Cauchy distribution with location parameter δ = 0 and scale
parameter γ = 1. Show X + Y has a Cauchy distribution with δ = 0
and γ = 2.
B.12 MULTIVARIATE DISTRIBUTIONS
The definitions and results about pairs of jointly distributed random
variables are easily extended to the case when we have an arbitrary
number of jointly distributed random variables.
Let X1,…,Xn : S → ℝ be jointly distributed. If they are all discrete,
we define their joint probability distribution function (joint pdf) by
f(a1,…,an) = ℙ[X1 = a1,…,Xn = an].
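If they are all continuous, their joint probability density function is
a function f : ℝ^n → ℝ with the property that
$$\mathbb{P}(a_1 \le X_1 \le b_1, \dots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1,\dots,x_n)\, dx_n \cdots dx_1.$$
The joint pdf f can be used to obtain the joint pdf
of any subset of X1,…,Xn. For instance, when the Xi are discrete, the
calculation
$$\mathbb{P}[X_1 = a_1, \dots, X_{n-1} = a_{n-1}] = \sum_{a_n} \mathbb{P}[X_1 = a_1, \dots, X_n = a_n] = \sum_{a_n} f(a_1,\dots,a_n)$$
shows that the joint pdf of X1,…,Xn–1 is
$$f_{1,\dots,n-1}(a_1,\dots,a_{n-1}) = \sum_{a_n} f(a_1,\dots,a_n).$$
Similarly, if they are continuous, the joint pdf of X1,…,Xn–1 is
$$f_{1,\dots,n-1}(x_1,\dots,x_{n-1}) = \int_{-\infty}^{\infty} f(x_1,\dots,x_n)\, dx_n.$$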
EXPECTATION AND VARIANCE
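Any function g : ℝ^n → ℝ can be composed with the Xi to create a
new random variable g(X1,…,Xn) : S → ℝ defined by
g(X1,…,Xn)(w) = g(X1(w),…,Xn(w)).
The expectation of this new variable is defined by
$$E[g(X_1,\dots,X_n)] = \sum_{a_1} \cdots \sum_{a_n} g(a_1,\dots,a_n)\, f(a_1,\dots,a_n)$$
if the Xi are discrete, and
$$E[g(X_1,\dots,X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\, dx_1 \cdots dx_n$$
if they are continuous.
The expectation and variance of the sum are given by the following
generalisations of the two variable case:
$$E\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{i=1}^{n} E[X_i], \qquad \mathrm{Var}\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2 \sum_{i<j} \mathrm{Cov}[X_i, X_j].$$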
INDEPENDENCE
Jointly distributed random variables X1,…,Xn are called independent
if their joint pdf f is given by f(x1,…,xn) = f1(x1) ⋯ fn(xn), where fi is the
pdf of Xi .
Exercise B.12.1 If X1,…, Xn are independent, show that
1. Each subcollection of the Xi is also independent.
2. Cov[Xi , Xj] = 0 whenever i ≠ j.
3. $\mathrm{Var}\big[\sum_i X_i\big] = \sum_i \mathrm{Var}[X_i]$.
B.13 COVARIANCE MATRIX
Let X1,…,Xn be jointly distributed random variables. Let σij = Cov[Xi ,
Xj]. The covariance matrix C for these variables is the n × n matrix
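whose (i, j) entry is σ_ij (we use the notation σ_ii = σ_i²):
$$C = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix}.$$
The correlation matrix is similarly defined: it has entries ρ_ij = σ_ij ⁄
(σ_i σ_j). Note that if σ_i = 1 for each i, then the covariance and correlation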
matrices coincide.
The covariance matrix is useful for conveniently arranging
calculations involving multiple variables. For example, we have the
identity
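$$x^T C x = \mathrm{Var}\Big[\sum_i x_i X_i\Big] \qquad \text{(B.3)}$$
where x is the column vector whose entries are the numbers x1,
…,xn. The proof of this identity is as follows:
$$\mathrm{Var}\Big[\sum_i x_i X_i\Big] = \mathrm{Cov}\Big[\sum_i x_i X_i,\ \sum_j x_j X_j\Big] = \sum_i \sum_j x_i x_j\, \mathrm{Cov}[X_i, X_j] = \sum_i \sum_j x_i \sigma_{ij} x_j = x^T C x.$$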
Let us now mention that an n × n matrix P is called positive-definite
if it has the following two properties:
1. Symmetry: P = P^T.
2. Positivity: For any non-zero column vector x with n entries, x^T P x > 0.
A covariance matrix C is clearly symmetric. Moreover, we have
x^T C x = Var[∑_i x_i X_i] ≥ 0, so it almost satisfies the positivity condition. In
applications, it is often a reasonable assumption that none of the X_i
is completely determined by the others. Under this assumption, ∑_i x_i X_i
can only be a constant if each x_i is zero. For, suppose we have
∑_i x_i X_i = c, where c is a constant and one of the x_i is non-zero. Let the
non-zero one be x_1. Then X_1 is completely determined by the others:
$$X_1 = \frac{1}{x_1} \Big( c - \sum_{i=2}^{n} x_i X_i \Big).$$
This gives us the following chain of steps: If x is non-zero, then ∑_i x_i X_i
is not constant, hence x^T C x = Var[∑_i x_i X_i] > 0, and C is
positive-definite.
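Why do we care whether C is positive-definite? One reason is
that then it has a Cholesky decomposition: it can be expressed in
the form C = AA^T, where A is a lower triangular matrix and the
diagonal entries of A are positive. We shall not offer a proof of this
result. However, once we know A exists, it is not hard to find it. First,
we write the desired expression:
$$\begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ 0 & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$
On matching the (1,1) entries of the two sides, we see that
$$\sigma_1^2 = a_{11}^2 \implies a_{11} = \sigma_1.$$
Next, we match the (1,2) entries:
$$\sigma_{12} = a_{11} a_{21} \implies a_{21} = \frac{\sigma_{12}}{\sigma_1}.$$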
Proceeding in this manner, from left to right, and top to bottom, we
recursively find all the entries of A.
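Exercise B.13.1 Verify that the entries of A satisfy
$$a_{ij} = \begin{cases} \sqrt{\sigma_i^2 - \sum_{k=1}^{i-1} a_{ik}^2} & \text{if } i = j, \\[4pt] \dfrac{1}{a_{jj}} \Big( \sigma_{ij} - \sum_{k=1}^{j-1} a_{ik} a_{jk} \Big) & \text{if } i > j, \\[4pt] 0 & \text{if } i < j. \end{cases}$$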
The Cholesky decomposition makes it easy for us to manipulate
covariances. For example, suppose we start with independent
standard normal variables Z1,…,Zn and we wish to construct normal
variables X1,…,Xn with a given covariance matrix C = (σij). Let the
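Cholesky decomposition be C = AA^T. Define the X_i by
$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} = A \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}.$$
Then
$$\mathrm{Cov}[X_i, X_j] = \mathrm{Cov}\Big[\sum_k a_{ik} Z_k,\ \sum_l a_{jl} Z_l\Big] = \sum_k a_{ik} a_{jk} = (AA^T)_{ij} = \sigma_{ij}.$$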
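The recursive matching of entries, row by row and from left to right, translates directly into a short program. The following is a sketch in plain Python, assuming the input matrix is symmetric and positive-definite (no checks are made); the function names and the sample matrix are illustrative only.

```python
import math
import random

def cholesky(C):
    """Lower triangular A with positive diagonal such that C = A A^T."""
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):                 # top to bottom
        for j in range(i + 1):         # left to right, up to the diagonal
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(C[i][i] - s)
            else:
                A[i][j] = (C[i][j] - s) / A[j][j]
    return A

def times_transpose(A):
    """Return the product A A^T (used here only to check the result)."""
    n = len(A)
    return [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def correlated_normals(A, rng):
    """One draw of X = A Z, with Z a vector of independent N(0,1) values."""
    z = [rng.gauss(0.0, 1.0) for _ in A]
    return [sum(A[i][k] * z[k] for k in range(i + 1)) for i in range(len(A))]

C = [[4.0, 2.0], [2.0, 3.0]]           # hypothetical covariance matrix
A = cholesky(C)
print(A)
print(correlated_normals(A, random.Random(0)))
```

The variables produced this way have mean 0; a desired mean can be added to each X_i afterwards without changing the covariances.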
B.14 LINEAR REGRESSION AND LEAST SQUARES
In this section, we take up the problem of finding the best linear
approximation to the true relationship between two jointly distributed
random variables. Suppose the variables are called X and Y and we
are looking for an expression of the form α + βX which will give the
best approximation to Y . The meaning of the word ‘best’ is obviously
the main point of contention here–while different choices are
available, we shall only discuss the most popular one. This goes by
the name of Ordinary Least Squares (or OLS) and seeks to
minimise the expression
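h(α,β) = E[(Y – α – βX)²].
To find the minimum, we apply the first derivative test and set the two
partial derivatives of h to zero:
$$\frac{\partial h}{\partial \alpha} = -2\,E[Y - \alpha - \beta X] = 0, \qquad \frac{\partial h}{\partial \beta} = -2\,E[X(Y - \alpha - \beta X)] = 0.$$
These two equations can be put in a standard form,
$$\alpha + \beta E[X] = E[Y], \qquad \alpha E[X] + \beta E[X^2] = E[XY].$$
The solutions are
$$\beta = \frac{E[XY] - E[X]E[Y]}{E[X^2] - E[X]^2} = \frac{\mathrm{Cov}[X,Y]}{\mathrm{Var}[X]}$$
and α = E[Y] – βE[X].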
One nice property of the OLS line is that it matches the regression
curve whenever the latter is a line. Thus, suppose we know that the
regression curve is a line:
E[Y |X = x] = a + bx, or E[Y |X] = a + bX
(B.4)
Taking expectations on both sides of (B.4), we get
E[Y ] = E[E[Y |X]] = E[a + bX] = a + bE[X],
which matches the first equation of OLS. Next, we multiply both
sides of (B.4) by X before taking expectations:
E[XY] = E[E[XY |X]] = E[X E[Y |X]] = E[aX + bX²] = aE[X] + bE[X²].
Thus we obtain the second equation of OLS. This shows the
regression line is the same as the one from OLS.
Linear regression and OLS are amongst the most commonly
used techniques in modeling phenomena, and the terms are
sometimes used interchangeably. We need them first while studying
the Capital Asset Pricing Model, as well as in later topics (and
frequently).
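When only paired observations are available, a common practical step is to replace each expectation in the OLS solution by the corresponding average over the data. The sketch below does this for the formulas β = Cov[X,Y] ⁄ Var[X] and α = E[Y] – βE[X]; treating the data points as equally likely values is an assumption of the example, and the function name is illustrative.

```python
def ols_line(xs, ys):
    """Return (alpha, beta) minimising the average of (y - alpha - beta*x)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xx = sum(x * x for x in xs) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    beta = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Points lying exactly on y = 1 + 2x are recovered exactly
alpha, beta = ols_line([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta)
```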
B.15 RANDOM SAMPLING
The subject of statistical inference is concerned with the task of
using observed data to draw conclusions about the distribution of
properties in a population. The main obstruction is that it may not be
possible (or even desirable) to observe all the members of the
population. We are forced to draw conclusions by observing only a
fraction of the members, and these conclusions are necessarily
probabilistic rather than certain.
Sampling is the act of repeatedly observing the values of a
random variable. Each observation is itself represented by a random
variable which has the same distribution as the original one. We will
also make the usually reasonable assumptions that the observations
do not disturb the process connected to the original random variable,
and that each choice is independent of the others.
A random sample is a finite sequence of jointly distributed
random variables X1,…,Xn such that
1. Each Xi has the same probability density function fX.
2. The Xi are independent.
The common probability density function fX for all the Xi is called the
population density. We also say that we are sampling from a
population of type X. The parameters associated to X are called
the population parameters.
Example B.15.1 Consider an investor who wishes to understand the
possible fluctuations in the price of a particular stock over a week.
She may have available data for past prices of this stock. Then she
can look at the prices at 1 week intervals, and calculate the
corresponding rates of return. Suppose the prices are x1 , … , xn,
with one week separating successive prices. This gives her n – 1
rates of return r1,…,rn–1, with
$$r_i = \frac{x_{i+1} - x_i}{x_i}.$$
If the possible rate of return over a week is represented by a random
variable R, then each ri can be viewed as an observation of a
random variable Ri which has the same distribution as R.
Independence of the Ri is not obvious in this case, but has been
found to be a reasonable assumption in practice.
□
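The passage from prices to rates of return in Example B.15.1 is easily mechanised. This is a minimal sketch, assuming the usual simple rate of return (change in price divided by the starting price); the price list is made up for illustration.

```python
def rates_of_return(prices):
    """From n successive prices, compute the n - 1 simple rates of return."""
    return [(later - earlier) / earlier
            for earlier, later in zip(prices, prices[1:])]

weekly_prices = [100.0, 110.0, 99.0, 103.95]   # hypothetical data
print(rates_of_return(weekly_prices))
```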
This example should also serve to remind you of a notation we have
been using. Random variables are indicated by upper-case letters
and their values by the corresponding lower-case letters. Thus, if a
random variable is named X, an observed value of it will be denoted
by x.
Broadly, the task of sampling is to estimate the type of a random
variable, as well as the associated parameters. Thus we might first
try to establish that a random variable has a binomial distribution,
and then estimate n and p. The population parameters can be
estimated in various ways, but are most commonly approached
through the mean and variance. For instance, if we have estimates
μ̂ and σ̂² for the mean and variance of a binomial variable, we could
then substitute these in μ = np and σ² = np(1 – p) to estimate the
population parameters n and p.
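Solving μ = np and σ² = np(1 – p) for the parameters gives p = 1 – σ² ⁄ μ and n = μ ⁄ p. The sketch below applies this to given estimates of the mean and variance; the function name is illustrative, and in practice the estimate of n would still have to be rounded to a whole number.

```python
def binomial_from_moments(mean, variance):
    """Estimate (n, p) of a binomial variable from its mean and variance."""
    if not 0 < variance < mean:
        raise ValueError("a binomial variable needs 0 < variance < mean")
    p = 1 - variance / mean      # from sigma^2 / mu = 1 - p
    n = mean / p                 # from mu = n p
    return n, p

n_hat, p_hat = binomial_from_moments(9.0, 6.3)
print(n_hat, p_hat)
```

The input values 9 and 6.3 are the mean and variance of a binomial variable with n = 30 and p = 0.3, so the estimates should come out close to these.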
Notation A random sample X1,…,Xn represents the possibilities for a
sequence of measurements of a random variable. Values of actual
measurements will be represented by x1,…,xn and will be collectively
called an observation of the random sample.
B.16 SAMPLE MEAN, VARIANCE AND COVARIANCE
SAMPLE MEAN
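Let X1,…,Xn be a random sample. Its sample mean is the random
variable X̄ defined by
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Observed values of the sample mean are used as estimates of the
population mean μ. Therefore, we need to be reassured that, on
average at least, we will see the right value. This is easily done:
$$E[\bar{X}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu) = \mu.$$
Moreover, we would like the variance of X̄ to be small so that its
values are more tightly clustered around μ. We have
$$\mathrm{Var}[\bar{X}] = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}[X_i] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n},$$
where σ² is the population variance. Thus the variance of the sample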
mean goes to zero as the sample size increases, and the sample
mean becomes a more reliable estimator of the population mean
(see Figure B.9).
Figure B.9: An illustration of the behaviour of the sample mean. For
the first diagram, 10,000 independent observations were made of a
binomial population with parameters p = 0.3 and n = 30, so that the
population mean was 9. We see that the observed sample mean
converges towards the correct value as the sample size increases to
10,000. The second diagram shows quite different behaviour when
the population follows a Cauchy distribution (in this case, with
location parameter 0 and scale parameter 1). Here, the sample
mean does not settle down at all.
Exercise B.16.1 If X1,…,Xn is a random sample from an N(μ,σ)
population then X̄ ~ N(μ, σ ⁄ √n).
The Cauchy distribution is an exception to the previous description–
we cannot make the observed average stabilise by taking larger
samples. This is not surprising, since the distribution does not have
an expectation (see Figure B.9 again).
Notation Let x1,…,xn be an observation of a random sample X1,
…,Xn. Then
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
will be called an observed value of X̄.
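The contrast shown in Figure B.9 can be reproduced with a short simulation. The sketch below is an illustration only: the seed and the number of observations are arbitrary choices, the binomial draws are built from Bernoulli trials, and the Cauchy draws use the standard transformation δ + γ tan(π(U – 1⁄2)) of a uniform variable U.

```python
import math
import random

def running_means(values):
    """Sample mean after 1, 2, 3, ... observations."""
    means, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means

def binomial_draw(rng, n, p):
    """One observation of B(n, p), as a count of successes in n trials."""
    return sum(rng.random() < p for _ in range(n))

def cauchy_draw(rng, delta=0.0, gamma=1.0):
    """One observation of a Cauchy variable with location delta, scale gamma."""
    return delta + gamma * math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(1)
binomial_means = running_means(binomial_draw(rng, 30, 0.3) for _ in range(5000))
cauchy_means = running_means(cauchy_draw(rng) for _ in range(5000))

for size in (10, 100, 1000, 5000):
    print(size, binomial_means[size - 1], cauchy_means[size - 1])
```

The binomial column should settle near the population mean 9, while the Cauchy column need not settle at all.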
SAMPLE VARIANCE
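Let X1,…,Xn be a random sample. Its sample variance is the
random variable S² defined by
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Do the values of S² give good estimates of the population variance?
We first reformulate the definition of S²:
$$(n-1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2.$$
Now we take the expectation:
$$(n-1)E[S^2] = \sum_{i=1}^{n} E[X_i^2] - nE[\bar{X}^2] = n(\sigma^2 + \mu^2) - n\Big(\frac{\sigma^2}{n} + \mu^2\Big) = n(\sigma^2 + \mu^2) - \sigma^2 - n\mu^2 = (n-1)\sigma^2.$$
Thus, we obtain E[S²] = σ².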
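A rather longer calculation, which we do not include, shows that (see footnote 18)
$$\mathrm{Var}[S^2] = \frac{1}{n} \Big( \mu_4 - \frac{n-3}{n-1}\, \sigma^4 \Big),$$
where μ_4 = E[(X - μ)⁴] is called the fourth central moment.
Consequently,
$$\lim_{n \to \infty} \mathrm{Var}[S^2] = 0.$$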
In sum, we see that sample variance has the right average (σ2)
and clusters more tightly around it if we use larger samples (See
Figure B.10).
Figure B.10: An illustration of the convergence of the sample
variance to the population value. 10,000 independent observations
were made of a binomial population with parameters p = 0.3 and n =
30, so that the population variance was 6.3. The diagram shows the
trend of the sample variance as the sample size increases to 10,000.
Exercise B.16.2 Let S² be the sample variance of a sample of size
n from an N(μ, σ) population. Show that Var[S²] = 2σ⁴ ⁄ (n – 1).
Notation Let x1,…,xn be an observation of a random sample X1,
…,Xn. Then
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
will be called an observed value of S².
Finally, the square root of the sample variance is denoted by S and
is called the sample standard deviation.
SAMPLE COVARIANCE AND CORRELATION
Suppose we have two jointly distributed variables X,Y (with
respective means μX and μY ) and we wish to estimate their
covariance. Let (X1,Y1),…,(Xn,Yn) be a sequence of independent
observations of (X,Y ). This means that Xi is independent of both Xj
and Yj whenever i≠j. These assumptions imply:
1. E[XiYi] = E[XY ] for every i.
2. E[XiYj] = E[Xi]E[Yj] = μXμY whenever i≠j.
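We define the sample covariance to be the random variable
$$S_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}).$$
The expectation of the sample covariance equals the covariance of
X and Y. To prove this, we first obtain the following identity:
$$(n-1)S_{XY} = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = \sum_{i=1}^{n} X_i Y_i - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} X_i Y_j.$$
Now take expectation on both sides:
$$(n-1)E[S_{XY}] = nE[XY] - \frac{1}{n}\big( nE[XY] + n(n-1)\mu_X \mu_Y \big) = (n-1)\big( E[XY] - \mu_X \mu_Y \big),$$
and hence E[S_XY] = Cov[X,Y].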
Exercise B.16.3 Show that
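The variance of the sample covariance has an explicit formula,
which we state without proof (see footnote 19):
$$\mathrm{Var}[S_{XY}] = \frac{1}{n}\, \mu_{2,2} - \frac{n-2}{n(n-1)}\, \sigma_{XY}^2 + \frac{1}{n(n-1)}\, \sigma_X^2 \sigma_Y^2,$$
where μ_{2,2} = E[(X - μ_X)²(Y - μ_Y)²]. This formula reassures us that, as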
n → ∞, the variance of the sample covariance dies down to zero.
Thus increasing the sample size leads to better estimates of the
covariance of X and Y .
The sample correlation R is defined in analogy with correlation
by dividing the sample covariance by the sample standard
deviations:
$$R = \frac{S_{XY}}{S_X S_Y},$$
where S_X² and S_Y² are the sample variances of X and Y
respectively.
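For observed data, the sample statistics of this section can be computed as follows. This is a sketch assuming the usual definitions with the divisor n – 1 for the sample variance and sample covariance; the function names are illustrative.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_covariance(xs, ys):
    """Observed value of S_XY, using the divisor n - 1."""
    n = len(xs)
    mean_x, mean_y = sample_mean(xs), sample_mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_variance(xs):
    """Observed value of S^2; it is the sample covariance of xs with itself."""
    return sample_covariance(xs, xs)

def sample_correlation(xs, ys):
    """Observed value of R = S_XY / (S_X S_Y)."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_variance(xs) * sample_variance(ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(sample_variance(xs), sample_covariance(xs, ys), sample_correlation(xs, ys))
```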
B.17 CENTRAL LIMIT THEOREM
For a more detailed understanding of the sample mean, we have to
study its probability distribution. Naturally, this depends on the base
distribution. To gain some insight, let us look at some simulations
with different distributions and sample sizes.
Figures B.11 to B.13 depict the results of drawing samples from
certain populations. In each case, the distribution of the sample
mean is approximated by a normal distribution. The figures show
that for large sample sizes, the approximation is almost perfect.
By doing some normalising, we can connect everything to the
standard normal distribution. Given a sequence of independent
random variables X_i, all with the distribution X, define
$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad Z_n = \frac{Y_n - \mu}{\sigma / \sqrt{n}},$$
where μ = E[X] and σ² = Var[X]. Y_n is the sample
mean of the sample X1,…,Xn and has mean μ and standard
deviation σ ⁄ √n. Therefore Z_n has mean 0 and standard deviation 1.
In our examples, since the sample means Yn are becoming normal,
their normalised versions Zn will become standard normal. It turns
out that this is a general phenomenon.
Figure B.11: Behaviour of sample mean from a uniform distribution.
Each histogram is based on a simulation of 50,000 samples. The
sample size is 2 for the first figure and 6 for the second.
Figure B.12: Behaviour of sample mean from a lognormal
distribution, based on simulations of 10,000 samples. The sample
size is 6 for the first figure and 200 for the second.
Figure B.13: Behaviour of sample mean from a Bernoulli
distribution, based on simulations of 10,000 samples. The sample
size is 6 for the first figure and 100 for the second.
Theorem B.17.1 (Central Limit Theorem) Let (Xi) be a sequence of
independent random variables, with identical distribution. Suppose
their common mean is μ and standard deviation is σ. Let Yn be the
sample mean of the sample X1 , … , Xn and
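$$Z_n = \frac{Y_n - \mu}{\sigma / \sqrt{n}}.$$
Then, as n → ∞, the random variables Zn converge to the standard
normal distribution Z in the sense that
$$\lim_{n \to \infty} \mathbb{P}(Z_n \le z) = \mathbb{P}(Z \le z) \quad \text{for every } z \in \mathbb{R}.$$
□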
We have not developed, in this Appendix, the mathematical
techniques required to prove this theorem. However, the
accompanying figures should have served to make it appear
plausible to you. It is also useful to note that when the distribution
being sampled is normal, the sample mean is exactly normally
distributed.
The Central Limit Theorem is truly remarkable: whatever
distribution you start with, its independent sums start looking like the
normal distribution. The only caveat is that the starting distribution
should possess a mean and a variance.
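The theorem can be explored with a simulation of the kind behind Figure B.11. The sketch below is a rough illustration, with an arbitrary seed and sample sizes: it draws samples of size 30 from the uniform distribution on [0,1] (mean 1 ⁄ 2, variance 1 ⁄ 12), normalises each sample mean, and compares the observed proportion of values below z with the standard normal cdf.

```python
import math
import random

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def simulate_z(n, trials, rng):
    """Normalised sample means Z_n for samples of size n from U[0,1]."""
    mu, sigma = 0.5, math.sqrt(1 / 12)
    zs = []
    for _ in range(trials):
        y = sum(rng.random() for _ in range(n)) / n      # sample mean Y_n
        zs.append((y - mu) / (sigma / math.sqrt(n)))     # normalise
    return zs

def empirical_cdf(values, z):
    """Proportion of the values that are at most z."""
    return sum(v <= z for v in values) / len(values)

zs = simulate_z(30, 10_000, random.Random(0))
for z in (-1.0, 0.0, 1.0):
    print(z, empirical_cdf(zs, z), standard_normal_cdf(z))
```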
B.18 STABLE DISTRIBUTIONS
A distribution X is said to be stable if for any random sample X1,
…,Xn from it, an arbitrary linear combination is just a scale and
translation of X:
a₁X₁ + ⋯ + aₙXₙ ∼ aX + b,
for some a,b ∈ ℝ. (Here ∼ indicates that the two sides have the same distribution.)
Example B.18.1 We have seen normal distributions are stable. If X1,
…,Xn are a random sample from an N(μ,σ) population, and X ~
N(μ,σ), then
X₁ + ⋯ + Xₙ ∼ aX + b,
where a = √n and b = (n – √n)μ.
□
Now, suppose X is stable with mean μ and standard deviation σ. Let
(X_i) be a sequence of independent random variables with X_i ∼ X, and
set
$$Y_n = \frac{1}{n}(X_1 + \cdots + X_n), \qquad Z_n = \frac{Y_n - \mu}{\sigma / \sqrt{n}}.$$
Then E[Zn] = 0 and Var[Zn] = 1.
Since X is stable, we have Z_n ∼ a_n X + b_n. On equating the mean and
variance, we get
a_n μ + b_n = 0, a_n² σ² = 1
which implies a_n = ±1 ⁄ σ, b_n = ∓μ ⁄ σ.
Hence the sequence Zn can consist of only two distributions:
$$Z_n \sim \pm \frac{X - \mu}{\sigma}.$$
By the Central Limit Theorem, the distribution of Zn converges to the
standard normal distribution Z. Therefore, the Zn’s must eventually
have the distribution Z. Since Z ∼ –Z, we find that
(X – μ) ⁄ σ ~ N(0,1), and hence X ~ N(μ,σ).
So we have established that the normal distribution is the only stable
distribution with mean and variance. On the other hand, the Cauchy
distribution is a stable distribution without mean or variance. In fact,
there is a large family of stable distributions. They have certain
common features: they are all continuous, and with a single peak
tapering off on either side. Unfortunately, they are known indirectly
through the Fourier transform of their density functions, and this has
made it harder to develop intuition about them. The lack of mean or
variance is also an obstacle to developing intuition. Computer
simulation, however, has made it possible to use them in
applications and their use is growing.
B.19 DATA FITTING
In choosing which distribution best fits some given data, there are
two stages: First, we have to choose an appropriate type of
distribution (binomial, normal, etc.) and then within the type we look
for the particular member that works best. The first choice usually
corresponds to our choice of model, and the second to fixing the
parameters of the model. For example, if we adopt the Geometric
Brownian Motion model for stock prices, we have decided to fit a
normal distribution to the log of the returns. The model has two
parameters, drift and volatility, and determining them corresponds
to fitting a specific normal distribution to the data.
The simplest way to make a distribution fit the data is to match
some key features. For example, by matching the mean and
variance of the data to those of the distribution, we get two equations
involving the parameters of the distribution. This suffices for
distributions like the binomial and normal ones, which have only two
parameters. If more parameters have to be fixed, we can match the
higher moments. The kth central moment of a set of data (xi) is
defined to be:
The corresponding kth central moment of a random variable X is
$E[(X - E[X])^k]$.
The third moment is seen as capturing information about the
symmetry of the density function about the mean, and the fourth as
describing how quickly the “tails” die to zero. However, these
representations are not very reliable. Higher moments of the data
are very sensitive to extreme events and therefore less reliable - on
the whole, it is deemed best to only match the mean and variance.
A more nuanced approach is to match the quantiles. Quantiles of a
distribution were defined in §B.3. For a set of data (xi), we
analogously define the q-quantile (0 < q < 1) to be the least number
xq such that at least 100q% of the data is less than or equal to xq.
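The definition of a data quantile can be tried out directly. The following is a minimal sketch, assuming the data is held in a Python list; the function name `data_quantile` and the sample values are hypothetical and chosen only for illustration.

```python
import math

def data_quantile(data, q):
    """Least data value x_q such that at least 100q% of the data is <= x_q."""
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    # Smallest count k with k/n >= q; the tiny tolerance guards against
    # floating-point round-off in q * n.
    k = max(1, math.ceil(q * n - 1e-9))
    return xs[k - 1]

# Hypothetical daily returns, not taken from the Infosys data.
sample = [0.03, -0.01, 0.00, 0.02, -0.04, 0.01, -0.02, 0.05]
median = data_quantile(sample, 0.5)
iqr = data_quantile(sample, 0.75) - data_quantile(sample, 0.25)
```

Other conventions for sample quantiles (for example, ones that interpolate between neighbouring data points) exist and can give slightly different values on small data sets.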
Example B.19.1 Figure B.14 is based on the data for Infosys stock
depicted in Figure 5.3. The quantiles for this data are shown as
discs. The solid curve represents the normal distribution with the
same mean and variance as the data. We see that it is too low on
the left and too high on the right, as far as most of the data is
concerned. This has happened because the extreme events at the
ends have pulled it away from the centre. We can compensate by
ignoring the extreme events. Then we get a near perfect fit (the
dashed curve) to the middle 80% of the data.
Figure B.14: Fitting normal distributions to data using quantiles
We are not able, however, to get a good match through the full (0,1)
range. This suggests that the data does not truly represent a normal
distribution.
□
This example illustrates how quantiles can be used to visualise the fit
of a distribution to data, and to adjust it accordingly. For example, if
we do not care about extreme events and just want a model that
does well under normal circumstances, the distribution depicted by
the dashed curve would do very well indeed.
Another positive feature of this approach is that it works even when
the mean and variance of the distribution do not exist. In the
example at hand, if we reject the normal distribution as a good model
for the data, we can look for a replacement among the class of
stable distributions. The non-normal members of this class will not
have a variance (and often not even a mean) so the quantile
alternative will be essential.
Example B.19.2 A Cauchy distribution with parameters δ,γ has
median δ and interquartile range 2γ and this can be used to get a
quick fit to the data. For example, the Infosys data studied in this
section has median 0 and interquartile range 0.04, so we fit a
Cauchy distribution with δ = 0 and γ = 0.02.
□
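The quick fit of Example B.19.2 amounts to two assignments. The sketch below is only an illustration: the function names are hypothetical, and the quantile formula is obtained by inverting the Cauchy cdf F(x) = 1/2 + arctan((x − δ)/γ)/π.

```python
import math

def cauchy_from_quartiles(median, iqr):
    """Return (delta, gamma) for the Cauchy law with this median and IQR."""
    return median, iqr / 2.0

def cauchy_quantile(q, delta, gamma):
    """q-quantile of a Cauchy(delta, gamma) distribution, 0 < q < 1."""
    return delta + gamma * math.tan(math.pi * (q - 0.5))

# Median 0 and interquartile range 0.04, as quoted in the example.
delta, gamma = cauchy_from_quartiles(median=0.0, iqr=0.04)
q25 = cauchy_quantile(0.25, delta, gamma)
q75 = cauchy_quantile(0.75, delta, gamma)
```

Comparing further quantiles of the fitted distribution with those of the data, as in Figure B.14, is one way to judge how good the quick fit is.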
Figure B.15: Fitting a stable distribution to the data of Figure B.14.
The second diagram compares the density function of the fitted
stable distribution (solid) to a normal one (dashed). Note the slight
asymmetry and the heavier tails of the stable distribution. (See
Example B.19.3)
Example B.19.3 It is possible to get a much closer fit to the data in
the previous examples by using a general stable distribution. Figure
B.15 shows one possibility.20 Stable distributions are parametrised
by four variables, and we mention for the record that the one shown
in the figure has α = 1.53, β = 0.1665, γ = 0.021 and δ = -0.001,
where we use the 0-parametrisation from Nolan [35].
□
B.20 MONTE CARLO SIMULATION
Every modern computing system (from scientific calculators upward)
comes equipped with a system for generating “random numbers”. A
typical scientific calculator has a key marked RAN or RAND, on
pressing which a number between 0 and 1 is obtained. Repeated
pressing of the key produces a sequence such as the following:
0.0391, 0.0921, 0.7406, 0.7691, 0.8491, 0.9403, 0.1396, 0.9285,
0.0607.
The numbers in this sequence are generated using some simple
arithmetic, but still have interesting statistical properties. They are
random in the sense that knowing many members of the sequence
does not enable us to predict the next with any certainty (in other
words, the rule used to generate the numbers cannot be guessed
from the output). Moreover, they are chosen without any bias toward
any subinterval of [0,1]. Thus, the procedure simulates sampling
from a uniform distribution over [0,1].
Figure B.16: Uniformly distributed random numbers (See Example B.20.1.)
Example B.20.1 In spreadsheet programs such as Microsoft Excel
and OpenOffice Calc, the RAND function generates random numbers
which are uniformly distributed over [0,1]. Figure B.16 shows the
results of a run of 1000 consecutive evaluations of RAND. The first
diagram plots the numbers consecutively along the y-axis, showing
that there is no obvious pattern in their arrangement. The second
diagram shows their distribution among different parts of the [0, 1]
interval.
□
With this basic building block, one can generate sequences that
simulate sampling from any distribution we wish to consider. In
particular, let us note that if we can simulate random sampling of a
particular distribution, we can also simulate sampling of a number of
independent random variables with this distribution. For example,
suppose we wish to generate 1000 pairs of values of independent
random variables X,Y which are uniformly distributed over [0,1].
Then we generate 2000 values of the uniform distribution over [0,1]
and allocate them alternately to X and Y .
Example B.20.2 Suppose X1,…,X12 are independent random
variables, all uniformly distributed over [0,1]. Define
$$Y = X_1 + \cdots + X_{12} - 6.$$
The Central Limit Theorem (§B.17) tells us Y is approximately
normal, and it is easily checked that E[Y] = 0, Var[Y] = 1. Thus Y is
approximately standard normal. This observation gives us a
straightforward way to simulate a standard normal distribution if we
have a method of simulating uniform distributions. Specifically,
suppose x1,…,x12 is one set of generated values for X1,…,X12. Then
$$y = x_1 + \cdots + x_{12} - 6$$
is a value for Y.
Figure B.17: Comparison of the density functions of a sum of twelve
independent uniform distributions (solid) with a standard normal distribution
(dashed) (Example B.20.2).
Figure B.17 shows the approximation is almost perfect. One
concern could be that the values of Y are confined to the interval
[-6,6] instead of varying over all of the real line. However, the
probability of a standard normal variable taking a value outside of
[-6,6] is a tiny $2 \times 10^{-9}$, so this is only an issue when simulating very
large data sets.
Figure B.18 shows an example of using this procedure to simulate
1000 observations of a standard normal random variable.
Figure B.18: Normally distributed random numbers (Example B.20.2)
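A possible implementation of this procedure is sketched below. It assumes that Y is the sum of the twelve uniform values shifted by 6, which is what E[Y] = 0, Var[Y] = 1 and the range [-6,6] require. The function name and the seed are arbitrary choices made for the illustration.

```python
import random
import statistics

def approx_standard_normal(rng):
    """One draw of Y = X1 + ... + X12 - 6 with each Xi uniform on [0, 1]."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(2011)  # arbitrary seed, only for reproducibility
draws = [approx_standard_normal(rng) for _ in range(10_000)]
sample_mean = statistics.fmean(draws)      # should be close to 0
sample_var = statistics.pvariance(draws)   # should be close to 1
```

In practice one would normally call a library routine such as `random.gauss`; the point of the sketch is only to show that the uniform generator is enough.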
□
The process of simulating a standard normal distribution provides
a base for simulating many other distributions. Suppose z1,z2,… are
simulated values of the standard normal distribution. Then w1 = μ +
σz1, w2 = μ + σz2, … are simulated values of the N(μ,σ) distribution.
Further, $e^{w_1}, e^{w_2}, \ldots$ are simulated values of the lognormal
distribution with parameters μ and σ.
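These two transformations are one line each in code. In the sketch below the function names and the parameter values are hypothetical, and a short fixed list stands in for simulated standard normal values so that the output is easy to check by hand.

```python
import math

def to_normal(z_values, mu, sigma):
    """Map standard normal values z to w = mu + sigma * z."""
    return [mu + sigma * z for z in z_values]

def to_lognormal(z_values, mu, sigma):
    """Map standard normal values z to exp(mu + sigma * z)."""
    return [math.exp(w) for w in to_normal(z_values, mu, sigma)]

z = [0.0, 1.0, -1.0]  # stand-ins for simulated N(0, 1) values
w = to_normal(z, mu=0.1, sigma=0.2)
v = to_lognormal(z, mu=0.1, sigma=0.2)
```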
Example B.20.3 Cauchy distributions can be easily simulated due
to the following fact: If X is uniformly distributed over [-π,π] then Y =
tan X has a Cauchy distribution with location parameter δ = 0 and scale
parameter γ = 1. This follows from the computation below.
Therefore the pdf of Y is
$$f_Y(y) = F_Y'(y) = \frac{1}{\pi(1 + y^2)}.$$
Thus, if x1,x2,… are simulated values of a uniform distribution over [-π,π], then
tan(x1),tan(x2),… are simulated values of a Cauchy distribution with
δ = 0 and γ = 1.
□
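The recipe of Example B.20.3 can be checked against the quantiles of the Cauchy distribution, since a Cauchy variable with δ = 0 and γ = 1 has median 0 and interquartile range 2. The following sketch uses an arbitrary seed and sample size; the function name is hypothetical.

```python
import math
import random

def standard_cauchy(rng):
    """tan of a uniform draw from [-pi, pi]: a Cauchy(0, 1) value."""
    x = rng.uniform(-math.pi, math.pi)
    return math.tan(x)

rng = random.Random(35)  # arbitrary seed
ys = sorted(standard_cauchy(rng) for _ in range(20_001))
sample_median = ys[10_000]
sample_iqr = ys[15_000] - ys[5_000]
```

The sample mean is deliberately not computed: it does not settle down as the sample grows, because the Cauchy distribution has no mean.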
We will now see how to simulate correlated variables. To simulate n
standard normal variables with covariance matrix C = (σij), first
simulate n independent standard normal variables Z1,…,Zn. Next,
find the Cholesky decomposition C = AA^T (assume C is positive-definite
so that the Cholesky decomposition exists). Finally, define
new variables X1,…,Xn by
$$\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} = A \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}.$$
The Xi are normal since they are linear combinations of independent
normal variables. We can easily see that each Xi has zero mean:
$$E[X_i] = \sum_{j} a_{ij}E[Z_j] = 0.$$
We have already computed on page 240 that the covariances are
Cov[Xi,Xj] = σij.
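For two variables with unit variances and correlation ρ, the Cholesky factor can be written down by hand, which gives a compact illustration of the method. The sketch below assumes C = [[1, ρ], [ρ, 1]]; the function name, the seed and the value ρ = 0.6 are arbitrary.

```python
import math
import random

def correlated_pair(rng, rho):
    """One draw (X1, X2) = A (Z1, Z2) for C = [[1, rho], [rho, 1]]."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Lower-triangular A = [[1, 0], [rho, sqrt(1 - rho^2)]] satisfies A A^T = C.
    a21, a22 = rho, math.sqrt(1.0 - rho * rho)
    return z1, a21 * z1 + a22 * z2

rng = random.Random(240)  # arbitrary seed
rho = 0.6
pairs = [correlated_pair(rng, rho) for _ in range(20_000)]
n = len(pairs)
mean1 = sum(x for x, _ in pairs) / n
mean2 = sum(y for _, y in pairs) / n
sample_cov = sum((x - mean1) * (y - mean2) for x, y in pairs) / n
```

The sample covariance should come out close to ρ, in line with Cov[Xi,Xj] = σij.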
15 The event algebra does not always include all the subsets of the sample
space S. But we will let this issue lie undisturbed.
16 These are not the only types. But these types contain all the ones we need.
17 Our calculation is valid if g is one-one. With slightly more effort we can
make it valid for any g.
18 Few books include the proof. You can find one version on the MathWorld
website: http://mathworld.wolfram.com/SampleVarianceDistribution.html. Also,
it is a special case of the formula for the variance of the sample covariance
given later.
19 You can find it online at the University of Alabama’s “Virtual Laboratories in
Probability” website: <http://www.math.uah.edu/stat/sample/Covariance.xhtml>.
20 The computations for this example were carried out via the program STABLE,
available from J. P. Nolan’s website: <http://academic2.american.edu/~jpnolan>.
Appendix C
Solutions to Selected Exercises
CHAPTER 1
Exercise 1.2.3 Let us compare the given situations with the
requirements of an arbitrage opportunity: zero initial investment and
zero probability of loss together with a positive probability of profit.
The first two situations can be arbitrage if interest rates are low
enough. For, we can start by borrowing Rs 10, thus avoiding an
initial investment of our own money. This will earn us Rs 20 by the
end of the set time (ten years or one day), which we can use to pay
off the loan. If the interest on the loan is low enough, we will still
have money left over and this will constitute our risk-free profit. In the
second situation, this is almost certain to happen.
The lottery ticket does not constitute arbitrage since there is no
guarantee of profit (indeed, a loss is almost certain).
The free lottery ticket does provide an arbitrage opportunity.
There is no possibility of loss, and a very small one of success.
In the fifth situation, we can borrow some amount from Bank A and
deposit it in Bank B for a year. At the end we will earn 15% interest
on it, which can be used to pay off the 10% interest on the loan. The
remaining 5% is our arbitrage profit.
The last situation does not, by itself, provide an arbitrage
opportunity.
Exercise 1.3.7 Let each instalment be of an amount I.
a. After 6 months, the debt is 10,000 × 1.075 = 10,750, since the
interest rate for 6 months is 15/2=7.5%. On payment of the first
instalment, a debt of 10,750 – I remains. Over the final 6
months, this becomes (10,750 – I)×1.075. For the debt to be
paid off, this amount must equal the last instalment and we get
the equation
I = (10,750 – I) × 1.075.
The solution is I = 5569.28.
b. Of the first instalment, Rs 750 goes towards the interest and Rs
4819.28 towards the principal. The last instalment pays off the
remaining principal amount of Rs 5180.72 as well as interest
amounting to Rs 5569.28 – 5180.72 = 388.56.
Exercise 1.3.9 Let us base our calculations on an investment of Rs
1. Let the rate of interest be r. Then the three cases yield the
following equations:
a. 2 = 1 + 10r
b. $2 = (1 + r)^{10}$
c. $2 = e^{10r}$
The respective solutions are r = 0.1, 0.072 and 0.069, or 10%, 7.2%
and 6.9%.
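The three rates can be reproduced with a few lines of arithmetic. The sketch below simply solves each equation for r; the variable names are our own.

```python
import math

years = 10
simple_rate = (2 - 1) / years            # from 2 = 1 + 10 r
compound_rate = 2 ** (1 / years) - 1     # from 2 = (1 + r)^10
continuous_rate = math.log(2) / years    # from 2 = e^(10 r)
```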
Exercise 1.3.10 The arbitrage strategy is to borrow an amount X for
a year and deposit it in the same bank. Withdraw it after 6 months
and immediately reinvest it in that bank. After 1 year, the invested
amount becomes
$1.025^2 X = 1.050625X$.
We use 1.05X to pay off the loan and are left with a risk-free profit of
0.000625X.
Exercise 1.3.13 The annual growth factor is $1.02^{12} = 1.268$, so the
effective annual rate is 26.8%.
Exercise 1.3.14
a. Let the invested amount be Rs 1, the discrete rate be rd, and the
continuous rate be rc. Since the interest earned over 1 year is
the same for both A and B, we find that
$$\left(1 + \frac{r_d}{2}\right)^2 = e^{r_c}.$$
Hence
$$1 + \frac{r_d}{2} = e^{r_c/2},$$
which shows that the interest earned over 6 months is also the
same for both A and B. If we graph the interest earnings against
time, we get the diagram shown on the next page.
In particular, at the 9 month mark, A has earned more interest
than B.
b. Since the effective rate is 10%, we obtain the following
equations:
$$\left(1 + \frac{r_d}{2}\right)^2 = e^{r_c} = 1.1.$$
This yields rd = 0.098 and rc = 0.095. Over the first 6 months, using
years as units, the difference between the interests earned by A and
B is
$$f(t) = 1000(1 + 0.098t - e^{0.095t}), \quad 0 \le t \le 0.5.$$
To locate the maximum we use the first derivative test:
$$0 = f'(t) = 1000(0.098 - 0.095e^{0.095t}),$$
which gives t = 0.33. Hence the maximum gap is f(0.33) = 0.49, or a
mere 49 paise!
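The calculation is sensitive to how the rates are rounded, which a short numerical check makes visible. The sketch below assumes that A compounds half-yearly, so that (1 + rd/2)^2 = e^rc = 1.1 (this reproduces the rounded values 0.098 and 0.095), and that A accrues interest linearly within the first half-year. The names `gap`, `t_star` and `max_gap` are our own.

```python
import math

r_d = 2 * (math.sqrt(1.1) - 1)   # from (1 + r_d / 2)^2 = 1.1
r_c = math.log(1.1)              # from e^(r_c) = 1.1

def gap(t, principal=1000.0):
    """Interest earned by A minus interest earned by B at time t <= 0.5."""
    return principal * (1 + r_d * t - math.exp(r_c * t))

# First derivative test: r_d = r_c * e^(r_c * t).
t_star = math.log(r_d / r_c) / r_c
max_gap = gap(t_star)
```

With unrounded rates the maximum falls near t = 0.25 and the gap is about 0.29, somewhat below the figures obtained from the three-decimal rates. Either way the conclusion stands: the difference is a fraction of a rupee.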
Exercise 1.4.2 When continuous rates are used, the relationship
between the original and final purchasing power is
$$\text{Purchasing power after 1 year} = \frac{\text{Original purchasing power}}{e^{f}}.$$
If an amount A is invested at the risk-free rate r for a year, at the end
we have the effective amount (in terms of purchasing power):
$$\frac{e^{r}A}{e^{f}} = e^{r-f}A.$$
Therefore the real risk-free rate is r – f.
Exercise 1.4.3 The growth in purchasing power is
= 1.166.
Therefore the return in real terms is 16.6%.
CHAPTER 2
Exercise 2.4.3 In part 1, we apply the formula for a finite geometric
sum:
For part 2, we substitute C = λF in the above formula to get:
$$P = \frac{(1+\lambda)^n - 1}{(1+\lambda)^n}F + \frac{F}{(1+\lambda)^n} = F.$$
This shows that when the yield equals the coupon rate, the bond is
sold at par. If the yield is greater, the price must fall below the face
value, and the bond is at a discount. If the yield is lower, the price
must rise and the bond is at a premium.
Exercise 2.4.4 Similar to the solution above.
Exercise 2.5.2 After 6 months, the accrued interest is 10/2 = 5, and
so the purchase or dirty price is 102 + 5 = 107.
Exercise 2.7.1 To save ink, we write the solution for m = 1. The
price is given by
Hence,
Dividing through by P gives
=–
.
Exercise 2.7.3 In the limit, the bond becomes a perpetuity with
annual payments of C. Hence,
where $x = e^{-\lambda}$. Now the denominator is a geometric series:
$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x}.$$
On differentiating both sides with respect to x we get
$$1 + 2x + 3x^2 + \cdots = \frac{1}{(1-x)^2},$$
which gives a formula for the numerator as well. Therefore, we have
obtained
Exercise 2.7.4 Go through the same steps as in the previous
exercise, with x =
. The final answer is
Exercise 2.8.3 If we invest Re 1 for n years, it becomes $(1 + s_n)^n$. If
we invest it for m years and then reinvest it for the remaining n – m
years, it becomes $(1 + s_m)^m(1 + f_{m,n})^{n-m}$. Since both routes are risk-free
(due to the use of the forward rate), the No Arbitrage Principle
says the final sums must be the same:
$$(1 + s_n)^n = (1 + s_m)^m(1 + f_{m,n})^{n-m}.$$
We solve this for fm,n to get the required formula.
Exercise 2.8.4 Just as above.
Exercise 2.8.5 Proceed as in Exercise 2.8.3 and obtain:
$$e^{n s_n} = e^{m s_m}e^{(n-m)f_{m,n}}.$$
This implies $n s_n = m s_m + (n - m)f_{m,n}$, which gives the formula for fm,n.
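Both forward-rate relations are easy to evaluate numerically. The sketch below solves the no-arbitrage equation of Exercise 2.8.3 for annually compounded rates and the relation nsn = msm + (n – m)fm,n for continuously compounded ones. The function names and the spot rates of 5% and 6% are hypothetical.

```python
def forward_rate_discrete(s_m, m, s_n, n):
    """Solve (1 + s_n)^n = (1 + s_m)^m (1 + f)^(n - m) for f."""
    growth = (1 + s_n) ** n / (1 + s_m) ** m
    return growth ** (1 / (n - m)) - 1

def forward_rate_continuous(s_m, m, s_n, n):
    """Solve n s_n = m s_m + (n - m) f for f."""
    return (n * s_n - m * s_m) / (n - m)

# Hypothetical spot rates: 5% for one year, 6% for two years.
f_disc = forward_rate_discrete(0.05, 1, 0.06, 2)
f_cont = forward_rate_continuous(0.05, 1, 0.06, 2)
```

In both conventions the forward rate for the second year comes out close to 7%, above either spot rate, as one expects when the spot curve slopes upward.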
Exercise 2.8.7 Use the linearity of differentiation, as done for
Macaulay duration on page 42.
Exercise 2.8.8 Similar to the calculations for modified duration in
Exercise 2.7.1.
Exercise 2.11.2 If the bond is called at the end of the ith year, for
the purposes of calculating YTC it becomes an i year bond with face
value Ci. Apply Exercise 2.4.3.
CHAPTER 3
Exercise 3.1.1 Suppose r1 = r2 + c, with c > 0, over a time interval
[0,T]. Then we implement the following arbitrage strategy. At time t =
0, short sell an amount of the second asset for a price V . Invest this
V in the first asset. Then, at time t = T, sell the first asset. This will
earn us V (1 + r1) = V (1 + r2) + V c. We use V (1 + r2) of this to buy
the required amount of the second asset and deliver it to complete
the short sale. We are left with a risk-free profit of V c.
In these calculations, note that r1,r2 are random, but their difference
c is not.
Exercise 3.1.3 In both cases, we are left with Rs 100 cash – a loss
of Rs 100.
Exercise 3.1.4 Let the weights for S and T be denoted by wS and
wT. Their values are:
wS = –
= –4 and wT =
= 5.
The individual rates of return are:
rS =
=
and rT =
= .
Hence the overall rate of return should be given by as was
calculated directly in the example.
wS rS + wT rT = –4 ×
+5×
= 50%,
Exercise 3.2.2 C and B are more efficient than A, while C and D are
more efficient than E. Hence the efficient ones are B, C and D.
Exercise 3.3.1 Just that asset itself.
Exercise 3.3.2 The equation rP = wrA + (1 – w)rB can be solved for
w. We get
w=
and 1 – w =
.
Substitute these in the formula for σP2 to get
σ2P =
.
The RHS is a quadratic in rP, hence it can be put in the form a rP2 +
b rP + c. To obtain a, we collect all the coefficients of rP2:
a=
.
To see a > 0 note that the numerator is positive because it is the
variance of rA – rB.
Exercise 3.3.3 This exercise is for the mathematically curious, so
try it yourself.
Exercise 3.3.4
1. This solution is modelled on the proof that –1 ≤ ρ ≤ 1. Consider
the variance of the random variable rA – rB. If σA = σB = σ, and
using ρ = 1, we have
$$\mathrm{Var}[r_A - r_B] = \sigma_A^2 + \sigma_B^2 - 2\sigma_A\sigma_B = \sigma^2 + \sigma^2 - 2\sigma^2 = 0.$$
Hence rA – rB is a constant. By Exercise
1.3.14 we must have rA = rB, and then A and B have the same
coordinates in the portfolio diagram. On the other hand, it was
given that their coordinates are different. So we must have σA ≠
σB.
2. The portfolio variance is
$$\sigma_P^2 = w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\sigma_A\sigma_B = (w\sigma_A + (1-w)\sigma_B)^2,$$
and so 0 = σP = |wσA + (1 – w)σB|. We solve this to get
$$w = \frac{\sigma_B}{\sigma_B - \sigma_A}.$$
If σB > σA, we get w > 1. If σB < σA, we get w < 0.
Exercise 3.3.5 The portfolio mean return and standard deviation are
Exercise 3.3.6 Similar to the solution of part 2 of Exercise 3.3.4.
Exercise 3.3.9 Consider the portfolio variance as a function of the
weight w:
$$\sigma_P^2 = w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\sigma_{AB}.$$
To minimise the variance, we calculate its derivative with respect to
w and set it to zero.
$$\frac{d\sigma_P^2}{dw} = 2w\sigma_A^2 - 2(1-w)\sigma_B^2 + 2(1-2w)\sigma_{AB} = 0.$$
Rearrange this equation to obtain the formula for w.
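Exercise 3.3.10 Let σA = σB = σ. Then we have
$$w = \frac{\sigma_B^2 - \sigma_{AB}}{\sigma_A^2 + \sigma_B^2 - 2\sigma_{AB}} = \frac{\sigma^2 - \sigma_{AB}}{2\sigma^2 - 2\sigma_{AB}} = \frac{1}{2}.$$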
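Setting the derivative of the portfolio variance to zero and solving for w gives w = (σB² − σAB)/(σA² + σB² − 2σAB), where w is the weight of A. The sketch below evaluates this weight and the resulting variance; the function names and the variances and covariance used are hypothetical.

```python
def min_variance_weight(var_a, var_b, cov_ab):
    """Weight of asset A in the minimum variance two-asset portfolio."""
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

def portfolio_variance(w, var_a, var_b, cov_ab):
    """Variance of the portfolio with weight w in A and 1 - w in B."""
    return w**2 * var_a + (1 - w)**2 * var_b + 2 * w * (1 - w) * cov_ab

# Equal variances: the weight should be one half, as in Exercise 3.3.10.
w_equal = min_variance_weight(0.04, 0.04, 0.01)
# Unequal variances: more weight goes to the less risky asset A.
w_star = min_variance_weight(0.04, 0.09, 0.01)
v_star = portfolio_variance(w_star, 0.04, 0.09, 0.01)
```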
Exercise 3.4.1 We have to solve the following linear system.
=
This system has the following block structure (indicated by the lines
in the matrices above):
=
This gives two vector equations:
IW + RL = 0
RTW = R′
Substituting the first in the second gives a 2 × 2 system for λ,μ:
Now, we can obtain the weights of the minimum variance portfolio
with r = 0.5:
The variance of this portfolio is
$\sigma^2 = u^2 + v^2 + w^2 = 0.459$
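Exercise 3.4.3 In the notation of this exercise, Equation 3.4.3
becomes
$$\begin{pmatrix} C & u \\ u^T & 0 \end{pmatrix}\begin{pmatrix} w \\ \mu \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Carrying out matrix multiplication on the LHS, we get two equations:
$$Cw + u\mu = 0 \tag{C.1}$$
$$u^Tw = 1 \tag{C.2}$$
We eliminate w between these equations. Equation (C.1) gives
$$w = -\mu C^{-1}u \tag{C.3}$$
and we substitute this in (C.2) to get
$$\mu = -\frac{1}{u^TC^{-1}u}.$$
Substituting this in (C.3), we get the desired result.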
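Eliminating μ from (C.1) and (C.2) leads to w = C⁻¹u/(uᵀC⁻¹u), which can be computed by solving one linear system. The following sketch assumes u is the vector of ones and uses a small hand-written Gaussian elimination so that it needs no external library. The function names and the covariance matrix are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def min_variance_weights(cov):
    """w = C^{-1} u / (u^T C^{-1} u), with u the vector of ones."""
    ones = [1.0] * len(cov)
    y = solve(cov, ones)   # y = C^{-1} u
    total = sum(y)         # u^T C^{-1} u
    return [v / total for v in y]

weights = min_variance_weights([[4.0, 1.0], [1.0, 9.0]])
```

For this 2 × 2 example the weights are 8/11 and 3/11, which agrees with the two-asset formula w = (σB² − σAB)/(σA² + σB² − 2σAB).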
Exercise 3.7.1 Substitute in CAPM. This gives
r=
= rf +
(rM – rf).
Multiply both sides by V0 and solve for V0.
Exercise 3.7.3 We shall use the Certainty Equivalent Pricing
Formula. We have the following calculations:
Substituting all these into the Certainty Equivalent Pricing Formula,
along with rf = 1.05, gives V0 = 7.42 million.
CHAPTER 4
Exercise 4.2.6 Reinvest the income from the margin account at the
prevailing risk-free rate till the expiry date.
Exercise 4.3.4 We have T = 0.5 years, S0 = 100, r = 0.1. Therefore,
$$X = 100 \times \left(1 + \frac{0.1}{2}\right) = 105.$$
We note in passing that, had r been continuously compounded, the
numerical result would hardly have changed:
$$X = 100\,e^{0.05} = 105.13.$$
Exercise 4.4.1 At time t, create two portfolios:
Portfolio A: The futures with exercise price X and $Xe^{-r(T-t)}$
cash.
Portfolio B: The futures with exercise price Xt and $X_te^{-r(T-t)}$
cash.
At time T both portfolios become the asset, hence have equal value.
Therefore they have equal value at time t:
$$V_t + Xe^{-r(T-t)} = 0 + X_te^{-r(T-t)}.$$
Exercise 4.4.3 The spot price of the bond is
$$S_0 = 1000\,e^{-0.25\times 0.1} + 11000\,e^{-0.75\times 0.1} = 975.31 + 10205.18 = \text{¥}11{,}180.49.$$
The net present value of the income from the bond during the life of
the futures is:
$$I_0 = 1000\,e^{-0.25\times 0.1} = \text{¥}975.31.$$
Since X = ¥10,500, the value of the futures is
$$V = S_0 - I_0 - Xe^{-rT} = 10205.18 - 10500\,e^{-0.1\times 4/12} = \text{¥}49.41.$$
Exercise 4.4.4 Due to the storage costs, the asset has an
associated negative income:
$$I_0 = -300\,e^{-0.1\times 1/12} - 300\,e^{-0.1\times 2/12} - 300\,e^{-0.1\times 3/12} = -\text{Rs } 885.14.$$
Therefore the exercise price is
$$X = (S_0 - I_0)\,e^{rT} = (10000 + 885.14)\,e^{0.1\times 3/12} = \text{Rs } 11{,}160.70.$$
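The arithmetic of this solution can be reproduced as follows. The sketch takes the figures from the solution itself (spot price Rs 10,000, rate 10%, expiry in 3 months, storage cost Rs 300 at the end of each month); the variable names are our own.

```python
import math

spot, rate, expiry = 10_000.0, 0.10, 3 / 12
storage_cost = 300.0
payment_times = [1 / 12, 2 / 12, 3 / 12]

# Storage costs count as negative income, discounted to time 0.
income_pv = -sum(storage_cost * math.exp(-rate * t) for t in payment_times)
# Exercise price X = (S0 - I0) e^(rT).
futures_price = (spot - income_pv) * math.exp(rate * expiry)
```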
Exercise 4.4.7 Rs 103.07
Exercise 4.7.2 Let the rates of return from the portfolio and the
index be rP and rI respectively. Then
rP =
rI =
.
Therefore
β=
=
=
h.
CHAPTER 5
Exercise 5.1.1 See the solution of Exercise B.7.1.
Exercise 5.1.3 We have
$$S_{\Delta t} = S\,e^{(\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z},$$
where Z is standard normal. Therefore,
Hence the expected return is
Exercise 5.1.4 Use the first order approximation ex ≐ 1 + x.
Exercise 5.2.1 Combining the GBM formula with the formula for
futures price, we obtain which is the formula for a GBM with drift μ –
r and volatility σ.
Exercise 5.4.1 This problem can be done directly through the
formulas for expectation and variance for discrete random variables.
We can also do it in the following way. Let Y take the values 1 and
0 with probabilities p and 1 – p respectively — so it is a Bernoulli
variable and has mean p and variance p(1 – p). Therefore,
Y=
.
CHAPTER 6
Exercise 6.1.3 We compare with the value of the corresponding
futures to get:
Exercise 6.2.3
Exercise 6.2.4 If an American put is exercised at t = 0, there is a
payoff of $X - S > Xe^{-rT} - S$. So the lower bound can be improved to
P ≥ max{0, X – S}.
The upper bound similarly changes to P ≤ X.
Exercise 6.3.2 Create the following portfolios at time t = 0:
Portfolio A: 1 put and 1 share.
Portfolio B: 1 call and $Xe^{-rT}$ cash.
Exercise 6.3.4 The same reasoning would suggest that put prices
would fall. But put–call parity says put and call prices move together
—the only resolution is that the prices must be independent of the
expected future asset price.
Exercise 6.4.7 Note that for any number x,
max{0,x}– max{0,–x} = x.
Therefore,
$$\max\{0, SU^kD^{n-k} - X\} - \max\{0, X - SU^kD^{n-k}\} = SU^kD^{n-k} - X.$$
Substitute this in the n-step BOPM formula for C – P:
Exercise 6.9.1 An n-step BOPM gives the value of the contract as
If we take $U = e^{\sigma\sqrt{T/n}}$ and $D = e^{-\sigma\sqrt{T/n}}$, and then let n → ∞, we
obtain the value
$$V = S^2e^{(r+\sigma^2)T}.$$
This limit takes some effort to calculate. A hint: substitute the power
series expansions and consider the first couple of terms. Using the
log function helps. It is also possible to attack the problem via
L’Hôpital’s Rule.
CHAPTER 7
Exercise 7.2.3 Use risk-neutral valuation. The payoff function is
f(ST) =
And so the derivative value is
where (see page 158)
–a =
=w–σ
.
Exercise 7.2.4 A stock-or-nothing call is equivalent to one
European call and X cash-or-nothing calls.
Exercise 7.8.1 Combine put–call parity with the Greeks for a
European call.
Exercise 7.10.3 Reducing the number of calls lowers the slope of
the payoff function and hence the profit from moderate increases in
the asset price.
CHAPTER 8
Exercise 8.1.3 The 2-day standard deviation of the return from each
asset is
$$\sigma = 10^5 \times 0.01 \times \sqrt{2} = 1414.21.$$
So the standard deviation of the 2-day return from the portfolio is
σP =
= 1140.18.
Over a short period such as 2 days, it is reasonable to take the mean
return to be zero. So we model the return as a N(0,1140.18) variable.
The 1% quantile for this is –2652.44. Hence the 99% 2-day VaR is
estimated to be 2652.44.
Exercise 8.2.2 The change in portfolio value is modelled as
△V ≐ 6S1 – 4S2
where 1 ~ N(0,20) and S2 ~ N(0,8). So we model
dV ~ N(0,
) = N(0, 21.54).
The 5% quantile is –35.43. So the 1-day 95% VaR is 35.43.
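The two VaR figures in this chapter's solutions are quantiles of zero-mean normal distributions, so they can be checked with the standard library. In the sketch below the standard deviations 1140.18 and 21.54 are taken as given from the solutions; the function name `normal_var` is hypothetical.

```python
from statistics import NormalDist

def normal_var(sigma, confidence, mean=0.0):
    """VaR at the given confidence level for a N(mean, sigma) change in value."""
    quantile = NormalDist(mean, sigma).inv_cdf(1.0 - confidence)
    return -quantile

var_99_two_day = normal_var(1140.18, 0.99)   # Exercise 8.1.3
var_95_one_day = normal_var(21.54, 0.95)     # Exercise 8.2.2
```

Small differences in the last decimal place can arise from rounding the standard deviation before taking the quantile.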
APPENDIX A
Exercise A.1.5 We start by noting a consequence of the Mean
Value Theorem: If f is a continuous function on [a,b] then there is a c
∈ [a,b] such that
To prove this, let
.
Then we have F′(x) = f(x), F(a) = 0 and
The Mean
Value Theorem implies the existence of c ∈ [a,b] such that
Now we can start on the exercise. We calculate
where b′∈ [b – z,b] and a′∈ [a – z,a]. Now b →∞ implies b′→∞ and a
→–∞ implies a′→–∞. Therefore,
APPENDIX B
Exercise B.3.2 We check that the given formula for faX+B satisfies
FaX+b
(x)
=
The
case a < 0 is shown below (The a > 0 case is similar and easier):
Exercise B.3.6 From the graph of FX we see that the leftmost point
x at which F(x) ≥ 0.25 is x = 0. Hence x0.25 = 0. Similarly, we see that
x0.5 = 1 and x0.75 = 2. Therefore the interquartile range is 2 – 0 = 2.
Exercise B.3.7 From the graph of FX, we read that
x0.25 = 0.25, x0.5 = 0.5, x0.75 = 0.75.
Hence the interquartile range is 0.75 – 0.25 = 0.5.
Exercise B.6.5 A continuous X cannot have zero variance. A
discrete X can have zero variance if and only if it is constant.
Exercise B.7.1
Note that
= 1 since
this is the integral of the probability density function of a normal
variable (with mean σt and variance 1).
Exercise B.8.1 To confirm we have a pdf, we have to check that the
total integral is 1:
And,
Now, and so the expectation integral diverges. Hence, a Cauchy
random variable has no mean. So, it does not have a variance
either!
Exercise B.8.2 The cdf is
We calculate the quartiles:
Hence, median= x0.5 = δ, and interquartile range = x0.75 – x0.25 = 2γ.
Exercise B.8.3 Let Y = (X – δ)/γ. We work with the cdf of Y :
Therefore the pdf of Y is fY (t) =
.
Exercise B.10.4 Since E[Y |X = x] = ρ x, we have
E[Y |X](w) = E[Y |X = X(w)] = ρX(w),
and hence E[Y |X] = ρX.
Exercise B.11.4 We have already worked out that if X ~ N(0,σ) and
Y ~ N(0,1) are independent, then X + Y ~ N(0, $\sqrt{\sigma^2 + 1}$). Now let X1 ~
N(μ1, σ1) and X2 ~ N(μ2, σ2) be independent. Define
Since X1 and X2 are independent, so are X and Y . Now,
Exercise B.13.1 Note that since A is lower triangular, all entries aij
with i < j are zero. So we are only concerned with the entries that
have i ≥ j. To find aij , with i ≥ j, we multiply the ith row of A with the jth
column of AT. This is supposed to equal σij, so we obtain the
equation
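$$a_{i1}a_{j1} + a_{i2}a_{j2} + \cdots + a_{ij}a_{jj} = \sigma_{ij}.$$
When i > j, this gives
$$a_{ij} = \frac{1}{a_{jj}}\left(\sigma_{ij} - \sum_{k=1}^{j-1} a_{ik}a_{jk}\right).$$
When i = j, the original equation becomes
$$a_{i1}^2 + a_{i2}^2 + \cdots + a_{ii}^2 = \sigma_i^2,$$
and we solve it for aii:
$$a_{ii} = \sqrt{\sigma_i^2 - \sum_{k=1}^{i-1} a_{ik}^2}.$$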
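Solving the row-times-column equation for the last unknown entry, one row at a time, gives a complete procedure for the Cholesky factor. The sketch below implements it for a covariance matrix stored as a list of lists; the function name and the example matrix are hypothetical, and no check for positive-definiteness is made (a matrix that fails it would trigger an error in the square root or a division by zero).

```python
import math

def cholesky(c):
    """Lower-triangular A with A A^T = c, for a positive-definite matrix c."""
    n = len(c)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Sum of a_ik * a_jk over the entries already known (k < j).
            partial = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][i] = math.sqrt(c[i][i] - partial)
            else:
                a[i][j] = (c[i][j] - partial) / a[j][j]
    return a

cov = [[4.0, 2.0], [2.0, 3.0]]
a = cholesky(cov)
```

For this matrix the factor has rows (2, 0) and (1, √2), and multiplying it by its transpose recovers the original entries.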
Exercise B.16.1 The fact that the sample mean X̄ will have mean μ and standard
deviation $\sigma/\sqrt{n}$ has been proven just before this exercise. Further,
we know from the previous units that the sum of independent normal
variables is normal.
Exercise B.16.2 We derive it from the general formula given
previously. The main task is to calculate the fourth central moment
for X ~ N(μ, σ):
Substitute this result in the given formula for Var[S2].
Bibliography
1. Tom M. Apostol. Calculus, Vol I and II. Wiley India, 2nd edition,
2007. Original printing 1966.
2. Aristotle. Politics. A Treatise on Government. Translated by
William Ellis. J. M. Dent and Sons Ltd., London, 1912.
Downloadable from Project Gutenberg: http://gutenberg.net.
3. Orley Ashenfelter, Phillip B. Levine, and David J. Zimmerman.
Statistics and Econometrics: Methods and Applications.
John Wiley and Sons, New York, 2003.
4. Louis Bachelier. Théorie de la spéculation. Annales
Scientifiques de l’École Normale Supérieure 3e série, 17:21–86,
1900. Downloadable from
http://www.numdam.org/item?id=ASENS_1900_3_17_21_0 .
5. Simon Benninga. Financial Modeling. MIT Press, Cambridge,
MA, 1997.
6. Fischer Black and Myron Scholes. The pricing of options and
corporate liabilities. Journal of Political Economy, 81(3):637–
654, 1973.
7. Zvi Bodie, Alex Kane, and Alan J. Marcus. Investments.
Irwin/McGraw-Hill, Boston, 5th edition, 2002.
8. Richard A. Brealey, Stewart C. Myers, Franklin Allen, and
Pitabas Mohanty. Principles of Corporate Finance. Tata
McGraw-Hill, India, 8th edition, 2007.
9. Marek Capiński and Tomasz Zastawniak. Mathematics for
Finance: An Introduction to Financial Engineering. Springer
Undergraduate Mathematics Series. Springer-Verlag, London,
2003.
10. Morgan Guaranty Trust Company. RiskMetrics™ – Technical
Document.
http://www.jpmorgan.com/RiskManagement/RiskMetrics/RiskMetrics.html,
4th edition, 1996.
11. Paul Cootner, editor. The Random Character of Stock Market
Prices. MIT Press, Cambridge, MA, 1964.
12. Jean-Michel Courtault, Yuri Kabanov, Bernard Bru, Pierre
Crépel, Isabelle Lebon, and Arnaud Le Marchand. Louis
Bachelier on the Centenary of Théorie de la spéculation.
Mathematical Finance, 10(3):341–353, 2000.
13. John C. Cox, Jonathan E. Ingersoll Jr., and Stephen A. Ross.
The relation between forward prices and futures prices. Journal
of Financial Economics, 9:321–346, 1981.
14. John C. Cox and Stephen A. Ross. The valuation of options for
alternative stochastic processes. Journal of Financial
Economics, 3:145–66, 1976.
15. John C. Cox, Stephen A. Ross, and Mark Rubinstein. Option
pricing: A simplified approach. Journal of Financial Economics,
7:229–263, 1979.
16. Keith Cuthbertson and Dirk Nitzsche. Financial Engineering.
John Wiley and Sons, London, 2001.
17. Mark Davis and Alison Etheridge. Louis Bachelier’s Theory of
Speculation. Princeton University Press, Princeton, 2006.
18. Robert F. Engle. Autoregressive conditional heteroscedasticity
with estimates of variance of United Kingdom inflation.
Econometrica, 50:987–1008, 1982.
19. Robert F. Engle. GARCH 101: an introduction to the use of
ARCH/GARCH models in applied econometrics. NYU Working
Paper No. FIN-01-030. Available at SSRN:
http://ssrn.com/abstract=1294571, 2001.
20. Eugene F. Fama and Kenneth R. French. The Capital Asset
Pricing Model: Theory and Evidence. SSRN eLibrary, 2003.
21. Lawrence Fisher and Roman Weil. Coping with the risk of
interest rate fluctuations: Returns to bondholders from naive and
optimal strategies. Journal of Business, 44:408–31, 1971.
22. Craig W. French. Jack Treynor’s Toward a Theory of Market
Value of Risky Assets. SSRN eLibrary, 2002.
23. John E. Freund. Mathematical Statistics. Prentice-Hall India,
New Delhi, 5th edition, 2001.
24. John Hicks. Value and Capital. Clarendon Press, UK, 1939.
25. Barnabas Hughes. The earliest correct algebraic solutions of
cubic equations. In Ronald Calinger, editor, Vita Mathematica.
Mathematical Association of America, 1997.
26. John C. Hull. Options, Futures and Other Derivatives. Prentice-Hall
India, New Delhi, 7th edition, 2008.
27. John Lintner. The valuation of risk assets and the selection of
risky investments in stock portfolios and capital budgets. Review
of Economics and Statistics, 47:13–37, 1965.
28. David G. Luenberger. Investment Science. Oxford University
Press, New York, 1998.
29. Frederick R. Macaulay. The Movements of Interest Rates. Bond
Yields and Stock Prices in the United States since 1856.
National Bureau of Economic Research, New York, 1938.
30. Harry M. Markowitz. Portfolio selection. Journal of Finance,
7(1):77– 91, 1952.
31. Harry M. Markowitz. Portfolio Selection; Efficient Diversification
of Investments. John Wiley, New York, 1959.
32. Robert C. Merton. Theory of rational option pricing. Bell Journal
of Economics and Management Science, 4(1):141–183, 1973.
33. Thomas Mikosch. Elementary Stochastic Calculus, with Finance
in View. World Scientific Publishing, 1998.
34. Jan Mossin. Equilibrium in a capital asset market.
Econometrica, 34(4):768–783, 1966.
35. John P. Nolan. Stable Distributions - Models for Heavy Tailed
Data. Birkhäuser, Boston, 2010. In progress, Chapter 1 online at
http://academic2.american.edu/~jpnolan.
36. Geoffrey Poitras. The Early History of Financial Economics,
1478–1776. Edward Elgar Publishing, UK, 2000.
37. Geoffrey Poitras. Frederick R. Macaulay, Frank M. Redington
and the emergence of modern fixed income analysis. In
Geoffrey Poitras, editor, Pioneers of Financial Economics
(Vol.2). Edward Elgar, UK, 2007.
38. Frank M. Redington. Review of the principles of life office
valuations. Journal of the Institute of Actuaries, 78:286–340,
1952.
39. Richard Roll. A critique of the asset pricing theory’s tests. Part I:
On past and potential testability of the theory. Journal of
Financial Economics, 4(2):129–176, 1977.
40. Sheldon M. Ross. An Introduction to Mathematical Finance:
Options and Other Topics. Cambridge University Press,
Cambridge, 1999.
41. Sheldon M. Ross. A First Course in Probability. Prentice-Hall
International, New Jersey, 6th edition, 2001.
42. Mark Rubinstein. A History of the Theory of Investments. John
Wiley & Sons, Hoboken, 2006.
43. Walter Schachermayer. Introduction to the mathematics of
financial markets. In Pierre Bernard, editor, Lectures on
Probability Theory and Statistics, Saint-Flour summer school
2000, volume 1816 of Lecture Notes in Mathematics, pages
111–177. Springer-Verlag, Heidelberg, 2003.
44. William F. Sharpe. Capital asset prices - A theory of market
equilibrium under conditions of risk. Journal of Finance,
19(3):425–42, 1964.
45. William F. Sharpe, Gordon J. Alexander, and Jeffrey V. Bailey.
Investments. Prentice-Hall India, New Delhi, 3rd edition, 2000.
46. Steven E. Shreve. Stochastic Calculus for Finance I: The
Binomial Asset Pricing Model. Springer-Verlag, New York, 2004.
47. Steven E. Shreve. Stochastic Calculus for Finance II:
Continuous-Time Models. Springer-Verlag, New York, 2004.
48. Joseph Stampfli and Victor Goodman. The Mathematics of
Finance: Modeling and Hedging. Brooks/Cole Thomson
Learning, Australia, 2001.
49. Murad S. Taqqu. Bachelier and his Times: A Conversation with
Bernard Bru. In H. Geman, D. Madan, S. R. Pliska, and T. Vorst,
editors, Mathematical Finance–Bachelier Congress 2000.
Springer-Verlag, 2002. Downloadable from
http://www.bu.edu/mathfn/people/bachelier-english43-fin.pdf.
50. James Tobin. Liquidity preference as behavior towards risk. The
Review of Economic Studies, 25:65–86, 1958.