BU7527 - Mathematics of Contingent Claims
Mike Peardon
School of Mathematics
Trinity College Dublin
Michaelmas Term, 2015
Sampling
Sample mean
Consider taking n independent samplings of a stochastic variable:

    {X_1, X_2, X_3, …, X_n}

X̄(n), the mean of the sample, is itself a stochastic variable.
Sample mean
For a sequence of n random numbers {X_1, X_2, X_3, …, X_n}, the sample mean is

    X̄(n) = (1/n) ∑_{i=1}^{n} X_i

X̄(n) is also a random number.
If all entries have the same mean µ_X, then

    E[X̄(n)] = (1/n) ∑_{i=1}^{n} E[X_i] = µ_X
Variance of sample mean
If X_i and X_j are independent then E[X_i X_j] = E[X_i] E[X_j]. This means

    ν[X̄(N)] = E[X̄(N)²] − E[X]²
             = (1/N²) ∑_{i,j=1}^{N} E[X_i X_j] − E[X]²
             = (1/N²) ( ∑_{i=1}^{N} E[X_i²] + ∑_{i=1}^{N} ∑_{j=1, j≠i}^{N} E[X_i X_j] ) − E[X]²
             = (1/N) E[X²] − (1/N) E[X]² = (1/N) ν[X]
Variance of the sample mean

    ν[X̄(N)] = (1/N) ν[X]
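A quick numerical check of this 1/N scaling (a sketch in Python, not part of the original slides): draw many samples of size N from a uniform distribution on [0, 1], where ν[X] = 1/12, and compare the empirical variance of the sample means with ν[X]/N.

```python
import random

# Empirical check that var[sample mean] = var[X] / N
# for X uniform on [0, 1], where var[X] = 1/12.
random.seed(1)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

for N in (1, 10, 100):
    means = [sample_mean(N) for _ in range(20000)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / (len(means) - 1)
    print(f"N = {N:3d}   empirical var = {v:.6f}   prediction = {1 / 12 / N:.6f}")
```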
Chebyshev’s inequality
Chebyshev’s inequality
For any ε > 0,

    P(|X − E[X]| ≥ ε) ≤ ν[X]/ε²

Proof: with the short-hand µ = E[X],

    ν[X] = ∫ dx (x − µ)² p_X(x) ≥ ∫_{|x−µ|≥ε} dx (x − µ)² p_X(x) ≥ ∫_{|x−µ|≥ε} dx ε² p_X(x)

so

    ν[X] ≥ ε² ∫_{|x−µ|≥ε} dx p_X(x) = ε² P(|X − µ| ≥ ε)
Weak law of large numbers
The Chebyshev inequality implies, for any ε > 0,

    lim_{N→∞} P(|X̄(N) − E[X]| ≥ ε) = 0
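A numerical illustration (a sketch, not from the slides): for an exponential random number with E[X] = 1 and ν[X] = 1, the observed tail probabilities sit comfortably below the Chebyshev bound 1/ε².

```python
import random

# Chebyshev: P(|X - E[X]| >= eps) <= var[X] / eps^2.
# X exponential with mean 1, so E[X] = 1 and var[X] = 1.
random.seed(2)
samples = [random.expovariate(1.0) for _ in range(100000)]

for eps in (1.0, 2.0, 3.0):
    p = sum(abs(x - 1.0) >= eps for x in samples) / len(samples)
    print(f"eps = {eps}:  P(|X-1| >= eps) = {p:.4f}  <=  bound {1.0 / eps ** 2:.4f}")
```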
The law of large numbers
Jakob Bernoulli: “Even the stupidest man — by some instinct of nature per se and by no previous instruction (this is truly amazing) — knows for sure that the more observations that are taken, the less the danger will be of straying from the mark” (Ars Conjectandi, 1713).
But the strong law of large numbers was only proved in the 20th century (Kolmogorov, Chebyshev, Markov, Borel, Cantelli, …).
The strong law of large numbers
If X̄(n) is the sample mean of n independent, identically distributed random numbers with well-defined expected value µ_X and variance, then X̄(n) converges almost surely to µ_X:

    P( lim_{n→∞} X̄(n) = µ_X ) = 1
Example: exponential random numbers
    X           X̄(2)        X̄(4)        X̄(8)        X̄(16)
    0.299921
    1.539283    0.919602
    1.084130
    1.129681    1.106906    1.013254
    0.001301
    1.238275    0.619788
    4.597920
    0.679552    2.638736    1.629262    1.321258
    0.528081
    1.275064    0.901572
    0.873661
    1.018920    0.946290    0.923931
    0.980259
    1.115647    1.047953
    1.664513
    0.340858    1.002685    1.025319    0.974625    1.147942
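A sketch reproducing this construction (the slides do not give code; exponential samples with mean 1 are assumed, consistent with the table values). Fresh random draws are used, so the numbers differ, but each column of pairwise averages narrows towards µ_X = 1.

```python
import random

# 16 exponential samples with mean 1, then repeated pairwise
# averaging builds the columns Xbar(2), Xbar(4), Xbar(8), Xbar(16).
random.seed(3)
col = [random.expovariate(1.0) for _ in range(16)]
for label in ("X", "Xbar(2)", "Xbar(4)", "Xbar(8)", "Xbar(16)"):
    print(label, [round(v, 6) for v in col])
    if len(col) > 1:
        col = [(col[i] + col[i + 1]) / 2 for i in range(0, len(col), 2)]
```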
The central limit theorem
As the sample size n grows, the sample mean looks more and more like a normally distributed random number with mean µ_X and standard deviation σ_X/√n.
The central limit theorem (de Moivre, Laplace, Lyapunov, …)
The sample mean of n independent, identically distributed random numbers, each drawn from a distribution with expected value µ_X and standard deviation σ_X, obeys

    lim_{n→∞} P( −aσ_X/√n < X̄(n) − µ_X < +aσ_X/√n ) = (1/√(2π)) ∫_{−a}^{+a} e^{−x²/2} dx
The central limit theorem (2)
The law of large numbers tells us we can find the expected value of a random number by repeated sampling.
The central limit theorem tells us how to estimate the uncertainty in our determination when we use a finite (but large) sampling.
The uncertainty falls with increasing sample size like 1/√n.
The central limit theorem
An example: histograms of the sample mean of a random number X for sample sizes n = 1, 2, 5, 50.
[Figure: four histograms of the sample mean over [0, 1], for panels n = 1, n = 2, n = 5 and n = 50; the distribution narrows and looks increasingly normal as n grows.]
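A sketch of the experiment behind the figure (the slides do not specify the distribution; uniform X on [0, 1] is assumed here): the spread of the sample mean shrinks like σ_X/√n.

```python
import random

# Spread of the sample mean for n = 1, 2, 5, 50, assuming
# X uniform on [0, 1] so that sigma_X = sqrt(1/12).
random.seed(4)
sigma_X = (1.0 / 12.0) ** 0.5

for n in (1, 2, 5, 50):
    means = [sum(random.random() for _ in range(n)) / n for _ in range(20000)]
    mu = sum(means) / len(means)
    sd = (sum((m - mu) ** 2 for m in means) / (len(means) - 1)) ** 0.5
    print(f"n = {n:2d}   sd of sample mean = {sd:.4f}   CLT: {sigma_X / n ** 0.5:.4f}")
```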
Confidence intervals
The central limit theorem tells us that for sufficiently large sample sizes, all sample means are normally distributed. We can use this to estimate probabilities that the true expected value of a random number lies in a range.
One sigma
What is the probability that a sample mean X̄ lies within one standard deviation σ_X̄ = σ_X/√n of the expected value µ_X? If n is large, we have

    P(−σ_X̄ < X̄ − µ_X < σ_X̄) = (1/√(2π)) ∫_{−1}^{1} e^{−x²/2} dx = 68.3%
These ranges define confidence intervals.
Most commonly seen are the 95% and 99% intervals.
Confidence intervals (2)
Most commonly seen are the 95% (2σ) and 99% (3σ) intervals.

    P(−σ_X̄  < X̄ − µ_X <  σ_X̄)   = 68.2%
    P(−2σ_X̄ < X̄ − µ_X < 2σ_X̄)   = 95.4%
    P(−3σ_X̄ < X̄ − µ_X < 3σ_X̄)   = 99.7%
    P(−4σ_X̄ < X̄ − µ_X < 4σ_X̄)   = 99.994%
    P(−5σ_X̄ < X̄ − µ_X < 5σ_X̄)   = 99.99994%
    P(−10σ_X̄ < X̄ − µ_X < 10σ_X̄) = 99.9999999999999999999985%
The standard deviation is usually measured from the sample variance.
Beware - the “variance of the variance” is usually large.
Five-sigma events have been known ...
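These entries follow from the normal integral: P(within kσ) = erf(k/√2). A sketch reproducing them (using erfc for the tiny tails, which would otherwise round away in double precision):

```python
import math

# For a normal sample mean, P(within k sigma) = erf(k / sqrt(2)).
# erfc gives the two-sided tail accurately even for large k.
for k in (1, 2, 3, 4, 5, 10):
    inside = math.erf(k / math.sqrt(2.0))
    tail = math.erfc(k / math.sqrt(2.0))  # = 1 - inside
    print(f"{k:2d} sigma:  inside = {100 * inside:.5f}%   tail = {tail:.2e}")
```

For k = 10 the tail is about 1.5e-23, matching the last row of the table.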
Sample variance
With data alone, we need a way to estimate the variance of a distribution. This can be done by measuring the sample variance:
Sample variance
For n > 1 independent, identically distributed samples of a random number X, with sample mean X̄, the sample variance is

    σ̄²_X = (1/(n−1)) ∑_{i=1}^{n} (X_i − X̄)²
Now we quantify fluctuations without reference to (or without knowing) the expected value µ_X.
Note the n − 1 factor: one “degree of freedom” is absorbed into “guessing” the expected value of X.
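The n − 1 factor makes the estimator unbiased; a quick numerical check (a sketch, assuming uniform X with ν[X] = 1/12):

```python
import random

# Averaging the (n-1)-normalised sample variance over many
# samples recovers var[X] = 1/12 for X uniform on [0, 1].
random.seed(5)
n, trials, acc = 4, 50000, 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    acc += sum((x - xbar) ** 2 for x in xs) / (n - 1)
print(f"mean sample variance = {acc / trials:.5f}   true var = {1 / 12:.5f}")
```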
Student’s t-distribution
In 1908, William Gosset, while working for Guinness at St. James's Gate, published under the pseudonym “Student”.
His distribution gives the scaling needed to define a confidence interval when the variance and mean of the underlying distribution are unknown and have been estimated.
Student’s t-distribution

    f_T(t) = Γ(n/2) / ( √(π(n−1)) Γ((n−1)/2) ) × (1 + t²/(n−1))^(−n/2)
Used to find the scaling factor c(γ, n) that gives the γ confidence interval for the sample mean:

    P(−cσ̄ < X̄ − µ_X < cσ̄) = γ

For n > 10, the t-distribution looks very similar to the normal distribution.
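A direct evaluation of this density (a sketch; n counts samples, so the usual “degrees of freedom” is n − 1):

```python
import math

def student_t_pdf(t, n):
    """Density above, with nu = n - 1 degrees of freedom."""
    nu = n - 1
    norm = math.gamma(n / 2.0) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2.0))
    return norm * (1.0 + t * t / nu) ** (-n / 2.0)

# With n = 2 this is the Cauchy density 1/(pi (1 + t^2)):
print(student_t_pdf(0.0, 2), 1.0 / math.pi)  # both ~ 0.31831
```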
Student’s t-distribution (2)
[Figure: probability densities f_X(x) for −3 ≤ x ≤ 3; blue: normal distribution, red: Student’s t with n = 2.]
Student’s t-distribution (3)
For example, with just 2 samples, the sample mean and variance can be computed, but now the confidence levels are:
    P(−σ̄_X  < X̄ − µ_X <  σ̄_X)   = 50%
    P(−2σ̄_X < X̄ − µ_X < 2σ̄_X)   = 70.5%
    P(−3σ̄_X < X̄ − µ_X < 3σ̄_X)   = 79.5%
    P(−4σ̄_X < X̄ − µ_X < 4σ̄_X)   = 84.4%
    P(−5σ̄_X < X̄ − µ_X < 5σ̄_X)   = 87.4%
    P(−10σ̄_X < X̄ − µ_X < 10σ̄_X) = 93.7%
“Confidences” are much lower because the variance is very poorly determined with only two samples.
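With n = 2 the t-distribution has one degree of freedom, i.e. it is the Cauchy distribution, so the table values reduce to P(|T| < k) = (2/π) arctan k. A sketch reproducing the column:

```python
import math

# n = 2 samples -> one degree of freedom -> Cauchy distribution:
# P(|T| < k) = (2 / pi) * arctan(k).
for k in (1, 2, 3, 4, 5, 10):
    print(f"{k:2d} sigma-bar:  {200.0 * math.atan(k) / math.pi:.1f}%")
```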
Modelling statistical (Monte Carlo) data
Often, we carry out experiments to test a hypothesis. Since the result is a stochastic variable, the hypothesis can never be proved or disproved.
Need a way to assign a probability that the hypothesis is false. One place to begin: the χ² statistic.
Suppose we have n measurements Ȳ_i, i = 1…n, each with standard deviation σ_i. Also, we have a model which predicts each measurement, giving y_i.
The χ² statistic

    χ² = ∑_{i=1}^{n} (Ȳ_i − y_i)² / σ_i²
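A minimal sketch of this statistic in code (the function name and example data are illustrative, not from the slides):

```python
def chi_squared(Ybar, y, sigma):
    """chi^2 = sum_i (Ybar_i - y_i)^2 / sigma_i^2."""
    return sum((Yb - yi) ** 2 / s ** 2 for Yb, yi, s in zip(Ybar, y, sigma))

# Three measurements versus a model prediction:
print(chi_squared([1.1, 2.0, 2.9], [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]))  # -> 2.0
```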
Goodness of fit
χ² ≥ 0, and χ² = 0 implies Ȳ_i = y_i for all i = 1…n (i.e. the model and the data agree perfectly).
Bigger values of χ² imply the model is less likely to be true.
Note χ² is itself a stochastic variable.
Rule-of-thumb
χ² ≈ n for a good model
Models with unknown parameters - fitting
The model may depend on parameters α_p, p = 1…m.
Now χ² is a function of these parameters: χ²(α).
If the parameters are not known a priori, the “best fit” model is described by the set of parameters α* that minimise χ²(α), so

    ∂χ²(α)/∂α_p |_{α=α*} = 0

For linear models, y_i = ∑_{p=1}^{m} α_p q_i^(p), finding α* is equivalent to solving a linear system.
For more general models, finding minima of χ² can be a challenge…
Example - one parameter fit
Fit a straight line through the origin
Consider the following measured data Y_i ± σ_i, i = 1…5, for inputs x_i:

    i    x_i    Y_i     σ_i
    1    0.1    0.25    0.05
    2    0.5    0.90    0.10
    3    0.7    1.20    0.05
    4    0.9    1.70    0.10
    5    1.0    2.20    0.20
Fit this to a straight line through the origin, so our model is

    y(x) = αx

with α an unknown parameter we want to determine.
Result: α = 1.8097 and χ² = 8.0.
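Minimising χ²(α) = ∑ (Y_i − αx_i)²/σ_i² gives α* = ∑(x_i Y_i/σ_i²) / ∑(x_i²/σ_i²); a sketch reproducing the quoted numbers:

```python
# One-parameter fit y = alpha * x to the tabulated data.
x = [0.1, 0.5, 0.7, 0.9, 1.0]
Y = [0.25, 0.90, 1.20, 1.70, 2.20]
s = [0.05, 0.10, 0.05, 0.10, 0.20]

A = sum(xi ** 2 / si ** 2 for xi, si in zip(x, s))         # = 331
b = sum(xi * Yi / si ** 2 for xi, Yi, si in zip(x, Y, s))  # = 599
alpha = b / A
chi2 = sum((Yi - alpha * xi) ** 2 / si ** 2 for xi, Yi, si in zip(x, Y, s))
print(f"alpha = {alpha:.4f}   chi^2 = {chi2:.1f}")  # alpha = 1.8097, chi^2 = 8.0
```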
Example - one parameter fit (2)
[Figure: the measured data Y_i ± σ_i plotted against x, with the best-fit straight line through the origin.]
Models with unknown parameters - fitting (2)
Example: fitting data to a straight line
Suppose for a set of inputs x_i, i = 1…n we measure outputs Ȳ_i ± σ_i.
If Y is modelled by a simple straight-line function, y_i = α_1 + α_2 x_i, what values of {α_1, α_2} minimise χ²?
χ²(α_1, α_2) is given by

    χ²(α_1, α_2) = ∑_{i=1}^{n} (Ȳ_i − α_1 − α_2 x_i)² / σ_i²

The minimum is at

    α_1* = (A_22 b_1 − A_12 b_2) / (A_11 A_22 − A_12²)
    α_2* = (A_11 b_2 − A_12 b_1) / (A_11 A_22 − A_12²)
Models with unknown parameters - fitting (3)
Example: fitting data to a straight line

    A_11 = ∑_{i=1}^{n} 1/σ_i²     A_12 = ∑_{i=1}^{n} x_i/σ_i²     A_22 = ∑_{i=1}^{n} x_i²/σ_i²
    b_1 = ∑_{i=1}^{n} Ȳ_i/σ_i²    b_2 = ∑_{i=1}^{n} x_i Ȳ_i/σ_i²

The best-fit parameters α_1,2* are themselves stochastic variables, and so have a probabilistic distribution.
A range of likely values must be given; the width is approximated by

    σ_α1 = √( A_22 / (A_11 A_22 − A_12²) )     σ_α2 = √( A_11 / (A_11 A_22 − A_12²) )
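Applying these formulas to the five data points from the one-parameter example (a sketch; the resulting χ² is the 7.1 quoted on the next slide):

```python
# Two-parameter fit y = alpha1 + alpha2 * x via the normal equations.
x = [0.1, 0.5, 0.7, 0.9, 1.0]
Y = [0.25, 0.90, 1.20, 1.70, 2.20]
s = [0.05, 0.10, 0.05, 0.10, 0.20]

A11 = sum(1 / si ** 2 for si in s)
A12 = sum(xi / si ** 2 for xi, si in zip(x, s))
A22 = sum(xi ** 2 / si ** 2 for xi, si in zip(x, s))
b1 = sum(Yi / si ** 2 for Yi, si in zip(Y, s))
b2 = sum(xi * Yi / si ** 2 for xi, Yi, si in zip(x, Y, s))

det = A11 * A22 - A12 ** 2
a1 = (A22 * b1 - A12 * b2) / det
a2 = (A11 * b2 - A12 * b1) / det
chi2 = sum((Yi - a1 - a2 * xi) ** 2 / si ** 2 for xi, Yi, si in zip(x, Y, s))
print(f"alpha1 = {a1:.4f}   alpha2 = {a2:.4f}   chi^2 = {chi2:.1f}")
# -> alpha1 = 0.0551, alpha2 = 1.7290, chi^2 = 7.1
```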
Example - two parameter fit (2)
[Figure: the same data with the best-fit line y = α_1 + α_2 x.]
Now χ² goes down from 8.0 to 7.1.
Example - try both fits again ...
[Figure: a new data set plotted with error bars against x, together with both fitted lines.]
Now χ² is 357 for the y = αx model but still 7.1 for the y = α_1 + α_2 x model. The first model should be ruled out.
Uncertainty propagates
The best-fit parameter(s) α* have been determined from statistical data, so we must quote an uncertainty. How precisely have they been determined?
α* is a function of the statistical data Ȳ. A statistical fluctuation in Ȳ of dȲ would result in a fluctuation in α* of (dα*/dȲ) dȲ.
All the measured Y values fluctuate, but if they are independent the fluctuations add in quadrature, so:
Error in the best fit parameters

    σ²_α* = ∑_{i=1}^{n} (dα*/dY_i)² σ_i²
Uncertainty propagates (2)
Back to our example:
One-parameter fit
We found α* = b/A with

    A = ∑_{i=1}^{n} x_i²/σ_i²   and   b = ∑_{i=1}^{n} x_i y_i/σ_i²

So

    dα*/dy_i = (1/A) db/dy_i

since A is fixed. We get

    σ²_α* = (1/A²) ∑_{i=1}^{n} (x_i/σ_i²)² σ_i² = 1/A

Back to our first example: we quote α* = 1.81 ± 0.05.
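A one-line check (a sketch) using A = 331 from the fit above:

```python
# sigma_alpha = sqrt(1 / A) for the one-parameter fit, with A = 331.
A = 331.0
print(f"sigma_alpha = {A ** -0.5:.3f}")  # -> 0.055, quoted as +/- 0.05
```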