Statistics for Finance

1. Lecture 4: Regression.
The theme of this lecture is how to fit data to certain functions. Suppose that
we collect data (xi , yi ), for i = 1, 2, . . . , n, and we would like to fit them to a straight
line of the form y = β0 + β1 x. Then, essentially, we need to estimate the intercept
β0 and the slope β1 . A popular method to do this is the Least Squares
Method, which amounts to finding β0 , β1 that minimise the quantity

S(β0 , β1 ) = Σ_{i=1}^n (yi − β0 − β1 xi )².
To do so we differentiate with respect to β0 , β1 :

∂S/∂β0 = −2 Σ_{i=1}^n (yi − β0 − β1 xi )

∂S/∂β1 = −2 Σ_{i=1}^n xi (yi − β0 − β1 xi )
Setting the derivatives equal to zero we get

Σ_{i=1}^n yi = n β̂0 + β̂1 Σ_{i=1}^n xi

Σ_{i=1}^n xi yi = β̂0 Σ_{i=1}^n xi + β̂1 Σ_{i=1}^n xi²
and solving for β̂0 , β̂1 we get

(1)    β̂0 = ( Σ_{i=1}^n xi² Σ_{i=1}^n yi − Σ_{i=1}^n xi Σ_{i=1}^n xi yi ) / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

(2)    β̂1 = ( n Σ_{i=1}^n xi yi − Σ_{i=1}^n xi Σ_{i=1}^n yi ) / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

and it can be shown that

(3)    β̂0 = ȳ − β̂1 x̄

(4)    β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²
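As a quick illustration, here is a minimal Python sketch (the function name ols_line and the use of NumPy are our choices, not part of the lecture) that evaluates equations (1), (2) directly and checks them against the centred forms (3), (4):

```python
import numpy as np

def ols_line(x, y):
    """Least squares fit of y = b0 + b1*x using equations (1)-(4)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    denom = n * np.sum(x**2) - np.sum(x)**2
    b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom  # eq. (1)
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom             # eq. (2)
    # The centred forms (3), (4) give the same answer:
    b1_c = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0_c = y.mean() - b1_c * x.mean()
    assert np.allclose([b0, b1], [b0_c, b1_c])
    return b0, b1
```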
The x variables are often called the predictor variables and the y variables are
called the response variables.
More complicated examples arise when we want to fit data to linear
functions of more than one variable, e.g.

y = β0 + β1 x1 + β2 x2 + β3 x3 .

Be careful not to confuse the variables x1 , x2 , x3 that appear above with the data
values xi , for i = 1, 2, . . . , that appeared before. The data values, now, should be
represented as xij , for i = 1, 2, . . . , n and j = 1, 2, 3. Following the least squares
method the problem amounts to finding the constants β̂0 , β̂1 , β̂2 , β̂3 that minimise
the functional

S(β0 , β1 , β2 , β3 ) = Σ_{i=1}^n (yi − β0 − β1 xi1 − β2 xi2 − β3 xi3 )²
This can also be done by following the previous procedure, i.e. differentiating
with respect to the parameters and solving the resulting equations.
Often one would like to fit data to nonlinear functions, such as

f (t) = A e^{−at} + B e^{−bt} .

The least squares functional in this case would be

S(A, B, a, b) = Σ_{i=1}^n (yi − A e^{−a ti} − B e^{−b ti} )²
and in this case the equations that one is led to are nonlinear and usually cannot
be solved explicitly. One then needs to resort to an iterative procedure.
For our purposes we will focus on fitting data to linear functions. Let us remark,
though, that fitting data to nonlinear functions may sometimes be reduced
to fitting data to linear functions. For example, suppose we want to fit data to the
function

y = β0 e^{−β1 x}

where the unknown parameters are β0 , β1 . This nonlinear function can be turned
into a linear one by taking logarithms,

log y = log β0 − β1 x,

and then we can use the data (xi , log yi ) to find the parameters from the functional

S(β0 , β1 ) = Σ_{i=1}^n (log yi − log β0 + β1 xi )²
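A minimal sketch of this log-linearisation in Python, on synthetic data (the parameter values 2.0 and 0.7 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 50)
# Synthetic data from y = b0 * exp(-b1 * x) with multiplicative noise.
y = 2.0 * np.exp(-0.7 * x) * np.exp(0.05 * rng.standard_normal(50))

# Fit a straight line to (x_i, log y_i): log y = log(b0) - b1 * x.
slope, intercept = np.polyfit(x, np.log(y), 1)
b0_hat, b1_hat = np.exp(intercept), -slope   # roughly 2.0 and 0.7
```

Note that least squares on the log scale minimises the functional above, not the sum of squared errors of the original y values; the two fits generally differ.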
Definition 1. A linear function

f (x1 , x2 , . . . , xp−1 ) = β0 + β1 x1 + · · · + βp−1 xp−1

is called a linear regression of y on x1 , . . . , xp−1 .
1.1. Statistical Properties.
Almost always there is some “noise” in the data which affects the reliability of
the estimation of the parameters of the linear regression. A simple way to model
the presence of noise is by considering independent random variables ei , i = 1, 2, . . . ,
with mean zero and variance σ². Then we can assert that the observed value of y is a
linear function of x plus the noise, i.e.

yi = β0 + β1 xi + ei ,    i = 1, 2, . . . , n.

This is known as the standard statistical model. These equations can be written
as

(5)    yi − ei = β0 + β1 xi ,    i = 1, 2, . . . , n,

and therefore we can use equations (1), (2) to derive the estimators for β0 , β1 .
Notice that because of the presence of the noise ei , i = 1, 2, . . . , the estimators β̂0
and β̂1 will be random variables.
Theorem 1. Under the assumptions of the standard statistical model the least squares
estimates are unbiased, i.e. E[β̂j ] = βj , for j = 0, 1.
Proof. From the assumption that E[ei ] = 0 we have that E[yi ] = β0 +β1 xi . Therefore
from equation (1) we get that
E[β̂0 ] = ( Σ_{i=1}^n xi² Σ_{i=1}^n E[yi ] − Σ_{i=1}^n xi Σ_{i=1}^n xi E[yi ] ) / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

       = ( Σ_{i=1}^n xi² (n β0 + β1 Σ_{i=1}^n xi) − Σ_{i=1}^n xi (β0 Σ_{i=1}^n xi + β1 Σ_{i=1}^n xi²) ) / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

       = β0
The proof corresponding to β1 is similar (Exercise).
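Unbiasedness can also be checked by simulation. The following sketch (the true parameter values are chosen arbitrarily) averages the estimates over many synthetic data sets generated from the standard statistical model:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 0.5      # true parameters, chosen arbitrarily
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])

estimates = np.empty((10_000, 2))
for k in range(10_000):
    y = beta0 + beta1 * x + sigma * rng.standard_normal(x.size)
    estimates[k] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))            # close to [1.0, 2.0]
```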
Theorem 2. Under the assumptions of the standard statistical model we have
Var(β̂0 ) = σ² Σ_{i=1}^n xi² / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

Var(β̂1 ) = n σ² / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )

Cov(β̂0 , β̂1 ) = −σ² Σ_{i=1}^n xi / ( n Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)² )
In the previous theorem we see that the variances depend on the xi ’s and on the
variance σ². We therefore need to estimate σ². Writing

ei = yi − β0 − β1 xi ,

it is natural to try to estimate the variance σ² by the average squared deviation
of the data from the fitted line, i.e. in terms of the residuals yi − β̂0 − β̂1 xi .
We define the residual sum of squares (RSS) by

RSS = Σ_{i=1}^n (yi − β̂0 − β̂1 xi )².
It can be shown that the quantity

(6)    s² = RSS / (n − 2)

is an unbiased estimator of σ². The divisor n − 2 is because two parameters (β̂0 , β̂1 )
have been estimated from the data, therefore reducing the degrees of freedom to
n − 2.
Once we estimate σ² we can estimate the variances of β̂0 , β̂1 by the formulae
of Theorem 2, where we replace σ² by s². The estimators for the variances will be
denoted by s²_{β̂0} , s²_{β̂1} .
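In code, the plug-in estimates of Theorem 2 can be computed as in the following sketch (function name ours):

```python
import numpy as np

def variance_estimates(x, y):
    """Return s^2 = RSS/(n-2) and the estimated variances of b0_hat, b1_hat."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    rss = np.sum((y - b0 - b1 * x)**2)       # residual sum of squares
    s2 = rss / (n - 2)                       # equation (6)
    denom = n * np.sum(x**2) - np.sum(x)**2
    s2_b0 = s2 * np.sum(x**2) / denom        # Theorem 2 with sigma^2 -> s^2
    s2_b1 = n * s2 / denom
    return s2, s2_b0, s2_b1
```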
1.2. Assessing the Fit.
The residuals introduced in the previous section, i.e.

êi = yi − β̂0 − β̂1 xi ,

can be used in assessing the quality of the fit. Often we plot the residuals versus the
x values. Such plots may reveal a systematic misfit. Since the noise ei is considered
to be a collection of independent random variables, the residuals should bear no
relation to the x values and ideally the plot should look like a horizontal blur.
For example, let us look at the following data examining the relationship between
the depth of a stream and the rate of its flow:
Depth   Flow
.32     .636
.29     .319
.28     .734
.42     1.327
.29     .487
.41     .924
.76     7.350
.73     5.890
.46     1.979
.40     1.124
The following diagrams show the least squares line as well as the residual plot.

[Figure: scatterplot of flow versus depth with the fitted least squares line, and the corresponding residual plot.]

The above diagrams indicate some deviations from a linear fit. This is a bit more
apparent from the residual plot.
One can attempt to investigate a possible nonlinear dependence and therefore
plot the log values of the data. The diagrams we get in this case are:

[Figure: scatterplot of log flow versus log depth with the fitted line, and the corresponding residual plot.]

In this case the data seem to fit the line better, and the residuals are fairly
well scattered.
Normal probability plots can also be used to assess the fit. For examples, see
the book of Rice, page 556.
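The residual plots can be reproduced with a short script; here is a sketch using NumPy and matplotlib (assumed available) on the stream data above, for both the raw and the log-log fits:

```python
import numpy as np
import matplotlib.pyplot as plt

depth = np.array([.32, .29, .28, .42, .29, .41, .76, .73, .46, .40])
flow  = np.array([.636, .319, .734, 1.327, .487, .924, 7.350, 5.890, 1.979, 1.124])

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, x, y, title in [(axes[0], depth, flow, "raw data"),
                        (axes[1], np.log(depth), np.log(flow), "log-log data")]:
    b1, b0 = np.polyfit(x, y, 1)          # least squares line
    ax.scatter(x, y - (b0 + b1 * x))      # residuals versus predictor
    ax.axhline(0.0, linestyle="--")
    ax.set_title(f"residuals, {title}")
plt.show()
```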
1.3. Correlation and Regression.
We will explore the relation between correlation and fitting data by the least
squares method. First we have

(7)    sxx = (1/n) Σ_{i=1}^n (xi − x̄)²

(8)    syy = (1/n) Σ_{i=1}^n (yi − ȳ)²

(9)    sxy = (1/n) Σ_{i=1}^n (xi − x̄)(yi − ȳ)

the sample variances and covariance between the predictor and the response. We
also have the sample correlation

r = sxy / √(sxx syy ).

As you may have seen from the homework, the least squares slope is given by

β̂1 = sxy / sxx

and so the sample correlation is given by

r = β̂1 √(sxx / syy ).
If we denote the regression response variable by ŷ, i.e. ŷ = β̂0 + β̂1 x, then we have
that (exercise)

(10)    (ŷ − ȳ) / √syy = r (x − x̄) / √sxx .

The interpretation of this equation is as follows: suppose that r > 0 and that
the predictor variable is one standard deviation greater than its average. Then the
response deviates from its mean by r standard deviations. Notice that always
|r| ≤ 1, therefore the response tends to deviate from its mean less than the
predictor does.
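The identity r = β̂1 √(sxx / syy ) is easy to verify numerically; a minimal sketch with arbitrary synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100)
y = 0.5 * x + rng.standard_normal(100)           # arbitrary linear-plus-noise data

sxx, syy = np.var(x), np.var(y)                  # 1/n convention, as in (7), (8)
sxy = np.mean((x - x.mean()) * (y - y.mean()))   # equation (9)
r = sxy / np.sqrt(sxx * syy)
b1 = sxy / sxx                                   # least squares slope

assert np.isclose(r, b1 * np.sqrt(sxx / syy))
```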
1.4. Matrix Approach to Least Squares.
Matrix analysis offers a convenient way to represent and analyse linear equations.
Suppose that the linear model

y = β0 + β1 x1 + · · · + βp−1 xp−1

is to be fitted to the data

yi , xi1 , . . . , xi,p−1 ,    i = 1, . . . , n.

Then the observations (y1 , . . . , yn ) will be represented by a vector Y and the
unknowns (β0 , . . . , βp−1 ) will be represented by a vector β. Finally we will have
the n × p matrix

        ( 1   x11   . . .   x1,p−1 )
        ( 1   x21   . . .   x2,p−1 )
    X = ( ⋮    ⋮             ⋮     )
        ( 1   xn1   . . .   xn,p−1 )
Then the vector of fitted or predicted values Ŷ can be written as

Ŷ = Xβ.

The least squares problem can then be phrased as finding the vector β that minimises

S(β) = Σ_{i=1}^n (yi − β0 − β1 xi1 − · · · − βp−1 xi,p−1 )²
     = ‖Y − Xβ‖²
     = ‖Y − Ŷ‖²

where we used the notation ‖u‖² = Σ_{i=1}^n ui² for a vector u. If A is a matrix then Aᵀ
is the transpose, meaning (Aᵀ)ij = Aji . We also have that

S(β) = ‖Y − Xβ‖² = (Y − Xβ)ᵀ (Y − Xβ)
     = Yᵀ Y − (Xβ)ᵀ Y − Yᵀ (Xβ) + (Xβ)ᵀ (Xβ).
To find the minimiser β we need to solve the equation

∇β S(β) = 0,

which reads

Xᵀ X β̂ = Xᵀ Y.

If Xᵀ X is nonsingular, i.e. invertible, then the solution of the above equation is

β̂ = (Xᵀ X)⁻¹ Xᵀ Y.
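In code, the normal equations can be solved directly, as in the following sketch (in practice np.linalg.lstsq is preferred for numerical stability):

```python
import numpy as np

def lstsq_normal_equations(X, Y):
    """Solve X^T X beta = X^T Y, assuming X^T X is nonsingular."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Example with n = 6 observations and p = 3 (intercept plus two predictors).
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(6), rng.standard_normal((6, 2))])
Y = rng.standard_normal(6)

beta_hat = lstsq_normal_equations(X, Y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])
```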
We can also incorporate in the above formulation the noise ei , i = 1, 2, . . . , n, as a
noise vector e = (e1 , . . . , en )ᵀ. Equations (5) then take the form

Y = Xβ + e.

The covariance matrix of the vector e is

Σ = σ² I,

where I is the identity matrix and σ² = Var(ei ), while we assume that the ei ’s are
i.i.d. normal with mean zero.
We can re-prove that the least squares estimator β̂ is unbiased, since

β̂ = (Xᵀ X)⁻¹ Xᵀ Y
   = (Xᵀ X)⁻¹ Xᵀ (Xβ + e)
   = β + (Xᵀ X)⁻¹ Xᵀ e

and therefore

E[β̂] = β + (Xᵀ X)⁻¹ Xᵀ E[e] = β.
Using this formulation we can also compute the covariance matrix of the least squares
estimator. To do this we need the following theorem.

Theorem 3. Let Z = c + AY, where Y is a random vector, A is a fixed, nonrandom
matrix and c a constant, nonrandom vector. Let ΣYY be the covariance matrix of Y;
then the covariance matrix of Z is

ΣZZ = A ΣYY Aᵀ.

Using this theorem we can prove that the covariance matrix of the least squares
estimator is

(11)    Σβ̂β̂ = σ² (Xᵀ X)⁻¹.
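A sketch of the plug-in estimate of (11), where σ² is replaced by s² = RSS/(n − p) (the general-p analogue of equation (6); for the straight line, p = 2):

```python
import numpy as np

def beta_covariance_estimate(X, Y):
    """Estimate sigma^2 (X^T X)^{-1} with sigma^2 replaced by RSS/(n - p)."""
    n, p = X.shape
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - X @ beta_hat)**2)
    s2 = rss / (n - p)                   # generalises equation (6)
    return s2 * np.linalg.inv(X.T @ X)   # equation (11) with s2 plugged in
```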
Some Financial Applications: Regression Hedging.
We will present an application of regression to determining the optimal hedge
of a bond position. This paragraph follows the exposition in Ruppert, p. 181, to
which we refer for more details and applications.
Market makers buy securities at a bid price and make a profit by selling them at
an ask price. Suppose a market maker has just purchased a bond from a pension
fund, which he would ideally like to sell immediately. On the other hand, he might
be bound to hold it for a certain time. The market maker is at risk that
the bond price could drop due to a change in interest rates. In order to minimise
this risk the market maker will resort to hedging. This means that he is willing
to assume another risk that is likely to be in the opposite direction from the risk
due to holding the bonds. In this way the two risks may cancel. To
hedge the interest rate risk of the bond being held, the market maker can sell short
other, more liquid bonds. For example, he might decide to sell a 30-year Treasury
bond, which is quite liquid and can be sold immediately. Regression hedging
determines the optimal amount of the 30-year Treasury to sell short in order to
hedge the risk of the bond just purchased. In this way he hopes that the price of
the portfolio of the long position in the first bond and the short position in the
30-year Treasury changes little as yields change.
Suppose the first bond has a maturity of 25 years. Let y30 be the yield on the
30-year bonds, i.e. the interest rate. Let P30 be the price of $1 in face amount of
30-year bonds, the face amount being what is paid to the holder at maturity, in
this case 30 years from now.
There is also a relevant quantity called the duration, DUR30 , such that the change
in price, ∆P30 , and the change in yield, ∆y30 , are related by

∆P30 ≈ −P30 DUR30 ∆y30

for small values of ∆y30 . (Duration is the weighted average of the maturities of the
future cash flows, with weights proportional to the net present values of the cash
flows. For a zero-coupon bond, duration equals time to maturity. Duration is a
measure of interest rate risk.) A similar equation holds for the 25-year bond.
Consider, now, a portfolio that holds F25 in face amount of 25-year bonds and is
short F30 in face amount of 30-year bonds. The value of the portfolio then is

F25 P25 − F30 P30 .
If ∆y30 , ∆y25 are the changes in the yields, then the change in the value of the
portfolio is approximately

F30 P30 DUR30 ∆y30 − F25 P25 DUR25 ∆y25 .

Suppose that the regression of ∆y30 on ∆y25 is

∆y30 = β̂0 + β̂1 ∆y25 .

We also adopt the usual assumption that β̂0 ≈ 0. Then we get that the change in
value is approximately

(F30 P30 DUR30 β̂1 − F25 P25 DUR25 ) ∆y25 .

This change is approximately zero if

F30 = F25 (P25 DUR25 ) / (P30 DUR30 β̂1 )

and this tells us how much face value of the 30-year bond we need to sell short in
order to hedge F25 face value of the 25-year bond.
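As a numerical illustration, the hedge formula is a one-liner; all input values below are hypothetical:

```python
# Hypothetical inputs (illustrative only, not market data).
F25 = 10_000_000           # face amount of the 25-year bond held long
P25, P30 = 0.95, 0.92      # prices per $1 of face amount
DUR25, DUR30 = 16.0, 18.0  # durations in years
beta1_hat = 1.05           # estimated slope of the regression of dy30 on dy25

# Face amount of 30-year bonds to sell short to hedge the position:
F30 = F25 * (P25 * DUR25) / (P30 * DUR30 * beta1_hat)
print(f"Sell short F30 = {F30:,.0f} in face amount of 30-year bonds")
```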
1.5. Exercises.
1. Derive relations (3), (4).
2. Finish the proof of Theorem 1.
3. Prove Theorem 2. You may find the fact that Σ_{i=1}^n (xi − x̄) = 0, as well as
Exercise 1, useful.
4. Prove relation (10).
5. Prove Theorem 3.
6. Prove relation (11).
7. A study of commercial bank branches obtains data on a number of independent
businesses. It records the amount of money (in GBP 1000) each business deposits
in a year (x) and the amount (in GBP 1000) it saves within this year (y).
The data from the research are summarised below:
x : 31.5 33.1 27.4 24.5 27.0 27.8 23.3 24.7 16.9 18.1
y : 18.1 20.0 20.8 21.5 22.0 22.4 22.9 24.0 25.4 27.3
(I) Identify the least squares estimates for β0 , β1 in the model y = β0 + β1 x.
(II) Predict y for x = 19.5.
(III) Identify the sample standard deviation about the regression line, i.e. the
residual standard deviation.