BU7527 - Mathematics of Contingent Claims
Mike Peardon, School of Mathematics, Trinity College Dublin
Michaelmas Term, 2015

Sampling
========

Sample mean
-----------
Consider taking n independent samplings of a stochastic variable: {X_1, X_2, X_3, ..., X_n}. The mean of the sample, \bar{X}_{(n)}, is itself a stochastic variable.

For a sequence of n random numbers {X_1, X_2, ..., X_n}, the sample mean is

    \bar{X}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} X_i

\bar{X}_{(n)} is also a random number. If all entries have the same mean \mu_X, then

    E[\bar{X}_{(n)}] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \mu_X

Variance of the sample mean
---------------------------
If X_i and X_j are independent, then E[X_i X_j] = E[X_i] E[X_j]. This means

    \nu[\bar{X}_{(N)}] = E[\bar{X}_{(N)}^2] - E[X]^2
                       = \frac{1}{N^2} \sum_{i,j=1}^{N} E[X_i X_j] - E[X]^2
                       = \frac{1}{N^2} \left( \sum_{i=1}^{N} E[X_i^2] + \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} E[X_i] E[X_j] \right) - E[X]^2
                       = \frac{1}{N} E[X^2] - \frac{1}{N} E[X]^2 = \frac{1}{N} \nu[X]

Variance of the sample mean:

    \nu[\bar{X}_{(N)}] = \frac{1}{N} \nu[X]

Chebyshev's inequality
----------------------
For any \epsilon > 0,

    P(|X - E[X]| \geq \epsilon) \leq \frac{\nu[X]}{\epsilon^2}

Proof: with the shorthand \mu = E[X],

    \nu[X] = \int dx\, (x - \mu)^2 p_X(x) \geq \int_{|x-\mu| \geq \epsilon} dx\, (x - \mu)^2 p_X(x) \geq \int_{|x-\mu| \geq \epsilon} dx\, \epsilon^2 p_X(x)

so

    \nu[X] \geq \epsilon^2 \int_{|x-\mu| \geq \epsilon} dx\, p_X(x) = \epsilon^2 P(|X - \mu| \geq \epsilon)

Weak law of large numbers: applying Chebyshev's inequality to \bar{X}_{(N)}, whose variance is \nu[X]/N, gives P(|\bar{X}_{(N)} - E[X]| \geq \epsilon) \leq \nu[X]/(N\epsilon^2), so for any \epsilon > 0,

    \lim_{N \to \infty} P(|\bar{X}_{(N)} - E[X]| \geq \epsilon) = 0

The law of large numbers
------------------------
Jakob Bernoulli: "Even the stupidest man - by some instinct of nature per se and by no previous instruction (this is truly amazing) - knows for sure that the more observations that are taken, the less the danger will be of straying from the mark" (Ars Conjectandi, 1713). But the strong law of large numbers was only proved in the 20th century (Kolmogorov, Chebyshev, Markov, Borel, Cantelli, ...).

The strong law of large numbers: if \bar{X}_{(n)} is the sample mean of n independent, identically distributed random numbers with well-defined expected value \mu_X and variance, then \bar{X}_{(n)} converges almost surely to \mu_X:

    P(\lim_{n \to \infty} \bar{X}_{(n)} = \mu_X) = 1

Example: exponential random numbers
-----------------------------------
Sixteen independent samples X drawn from an exponential distribution, with successive pairwise block means \bar{X}_{(2)}, \bar{X}_{(4)}, \bar{X}_{(8)}, \bar{X}_{(16)}:

    X          X̄(2)       X̄(4)       X̄(8)       X̄(16)
    0.299921
    1.539283   0.919602
    1.084130
    1.129681   1.106906   1.013254
    0.001301
    1.238275   0.619788
    4.597920
    0.679552   2.638736   1.629262   1.321258
    0.528081
    1.275064   0.901572
    0.873661
    1.018920   0.946290   0.923931
    0.980259
    1.115647   1.047953
    1.664513
    0.340858   1.002685   1.025319   0.974625   1.147942

As the block size grows, the means cluster ever more tightly around the expected value.

The central limit theorem
-------------------------
As the sample size n grows, the sample mean looks more and more like a normally distributed random number with mean \mu_X and standard deviation \sigma_X / \sqrt{n}.

The central limit theorem (de Moivre, Laplace, Lyapunov, ...): the sample mean of n independent, identically distributed random numbers, each drawn from a distribution with expected value \mu_X and standard deviation \sigma_X, obeys

    \lim_{n \to \infty} P\left( \frac{-a\sigma_X}{\sqrt{n}} < \bar{X}_{(n)} - \mu_X < \frac{+a\sigma_X}{\sqrt{n}} \right) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-x^2/2}\, dx

The central limit theorem (2)
-----------------------------
The law of large numbers tells us we can find the expected value of a random number by repeated sampling. The central limit theorem tells us how to estimate the uncertainty in our determination when we use a finite (but large) sampling. The uncertainty falls with increasing sample size like 1/\sqrt{n}.
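Both results are easy to see numerically. The following is a minimal Python sketch (not part of the original slides) that draws exponential random numbers with mean 1, as in the example above; the seed and sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Draw 10000 independent realisations of the sample mean X-bar(n) of n
# exponential random numbers (mean 1, so sigma_X = 1). The average of
# X-bar(n) converges to mu_X = 1 (law of large numbers) and its spread
# shrinks like sigma_X / sqrt(n) (central limit theorem).
for n in [2, 4, 8, 16, 100, 1000]:
    means = rng.exponential(scale=1.0, size=(10000, n)).mean(axis=1)
    print(f"n = {n:5d}:  mean(X-bar) = {means.mean():.4f},  "
          f"std(X-bar) = {means.std():.4f},  1/sqrt(n) = {1.0/np.sqrt(n):.4f}")
```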
The central limit theorem - an example
--------------------------------------
[Figure: histograms of the distribution of sample means of a random number X for sample sizes n = 1, 2, 5 and 50; as n grows, the distribution narrows and approaches a Gaussian.]

Confidence intervals
--------------------
The central limit theorem tells us that for sufficiently large sample sizes, all sample means are normally distributed. We can use this to estimate probabilities that the true expected value of a random number lies in a range.

One sigma: what is the probability that a sample mean \bar{X} lies within one standard deviation \sigma_{\bar{X}} = \sigma_X / \sqrt{n} of the expected value \mu_X? If n is large, we have

    P(-\sigma_{\bar{X}} < \bar{X} - \mu_X < \sigma_{\bar{X}}) = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-x^2/2}\, dx = 68.3\%

These ranges define confidence intervals. Most commonly seen are the 95% and 99% intervals.

Confidence intervals (2)
------------------------
Most commonly seen are the 95% (2\sigma) and 99% (3\sigma) intervals.

    P(-\sigma_{\bar{X}}   < \bar{X} - \mu_X <   \sigma_{\bar{X}})    68.3%
    P(-2\sigma_{\bar{X}}  < \bar{X} - \mu_X <  2\sigma_{\bar{X}})    95.4%
    P(-3\sigma_{\bar{X}}  < \bar{X} - \mu_X <  3\sigma_{\bar{X}})    99.7%
    P(-4\sigma_{\bar{X}}  < \bar{X} - \mu_X <  4\sigma_{\bar{X}})    99.994%
    P(-5\sigma_{\bar{X}}  < \bar{X} - \mu_X <  5\sigma_{\bar{X}})    99.99994%
    P(-10\sigma_{\bar{X}} < \bar{X} - \mu_X < 10\sigma_{\bar{X}})    99.9999999999999999999985%

The standard deviation is usually measured from the sample variance. Beware: the "variance of the variance" is usually large. Five-sigma events have been known ...

Sample variance
---------------
With data alone, we need a way to estimate the variance of a distribution. It can be estimated by measuring the sample variance: for n > 1 independent, identically distributed samples of a random number X, with sample mean \bar{X},

    \bar{\sigma}_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

Now we quantify fluctuations without reference to (or without knowing) the expected value \mu_X. Note the n - 1 factor: one "degree of freedom" is absorbed into "guessing" the expected value of X.

Student's t-distribution
------------------------
In 1908, William Gosset, while working for Guinness at St. James's Gate, published under the pseudonym "Student". His result computes the scaling needed to define a confidence interval when the variance and mean of the underlying distribution are unknown and have been estimated:

    f_T(t) = \frac{\Gamma(\frac{n}{2})}{\sqrt{\pi(n-1)}\, \Gamma(\frac{n-1}{2})} \left( 1 + \frac{t^2}{n-1} \right)^{-n/2}

It is used to find the scaling factor c(\gamma, n) to compute the \gamma confidence interval for the sample mean:

    P(-c\bar{\sigma} < \bar{X} - \mu_X < c\bar{\sigma}) = \gamma

For n > 10, the t-distribution looks very similar to the normal distribution.

Student's t-distribution (2)
----------------------------
[Figure: probability density f_X(x) for -3 < x < 3; blue - normal distribution, red - Student t with n = 2.]

Student's t-distribution (3)
----------------------------
For example, with just 2 samples the sample mean and variance can be computed, but now the confidence levels are:

    P(-\bar{\sigma}_X   < \bar{X} - \mu_X <   \bar{\sigma}_X)    50%
    P(-2\bar{\sigma}_X  < \bar{X} - \mu_X <  2\bar{\sigma}_X)    70.5%
    P(-3\bar{\sigma}_X  < \bar{X} - \mu_X <  3\bar{\sigma}_X)    79.5%
    P(-4\bar{\sigma}_X  < \bar{X} - \mu_X <  4\bar{\sigma}_X)    84.4%
    P(-5\bar{\sigma}_X  < \bar{X} - \mu_X <  5\bar{\sigma}_X)    87.4%
    P(-10\bar{\sigma}_X < \bar{X} - \mu_X < 10\bar{\sigma}_X)    93.7%

"Confidences" are much lower because the variance is very poorly determined with only two samples.
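The n = 2 table can be checked numerically. A short Python sketch (not from the original slides), assuming SciPy's stats.t for the t-distribution CDF with n - 1 = 1 degree of freedom:

```python
from scipy import stats

# Confidence levels for a two-sample mean: the t statistic
# T = (X-bar - mu_X) / sigma-bar has n - 1 = 1 degree of freedom,
# and P(|T| < c) is read off from the t-distribution CDF.
n = 2
for c in [1, 2, 3, 4, 5, 10]:
    gamma = stats.t.cdf(c, df=n - 1) - stats.t.cdf(-c, df=n - 1)
    print(f"P(|T| < {c:2d}) = {100 * gamma:.1f}%")
# Reproduces the table above: 50.0%, 70.5%, 79.5%, 84.4%, 87.4%, 93.7%
```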
Modelling statistical (Monte Carlo) data
----------------------------------------
Often we carry out experiments to test a hypothesis. Since the result is a stochastic variable, the hypothesis can never be proved or disproved outright; we need a way to assign a probability that the hypothesis is false. One place to begin is the \chi^2 statistic.

Suppose we have n measurements \bar{Y}_i, i = 1..n, each with standard deviation \sigma_i, and a model which predicts each measurement, giving y_i. The \chi^2 statistic is

    \chi^2 = \sum_{i=1}^{n} \frac{(\bar{Y}_i - y_i)^2}{\sigma_i^2}

Goodness of fit
---------------
\chi^2 \geq 0, and \chi^2 = 0 implies \bar{Y}_i = y_i for all i = 1..n (i.e. the model and the data agree perfectly). Bigger values of \chi^2 imply the model is less likely to be true. Note that \chi^2 is itself a stochastic variable.

Rule of thumb: \chi^2 \approx n for a good model.

Models with unknown parameters - fitting
----------------------------------------
The model may depend on parameters \alpha_p, p = 1...m. Now \chi^2 is a function of these parameters: \chi^2(\alpha). If the parameters are not known a priori, the "best fit" model is described by the set of parameters \alpha^* that minimise \chi^2(\alpha), so

    \left. \frac{\partial \chi^2(\alpha)}{\partial \alpha_p} \right|_{\alpha^*} = 0

For linear models, y_i = \sum_{p=1}^{m} \alpha_p q_i^{(p)}, finding \alpha^* is equivalent to solving a linear system. For more general models, finding minima of \chi^2 can be a challenge ...

Example - one-parameter fit
---------------------------
Fit a straight line through the origin. Consider the following measured data Y_i ± \sigma_i, i = 1..5, for inputs x_i:

    i      1      2      3      4      5
    x_i    0.1    0.5    0.7    0.9    1.0
    Y_i    0.25   0.90   1.20   1.70   2.20
    σ_i    0.05   0.10   0.05   0.10   0.20

Fit this to a straight line through the origin, so our model is y(x) = \alpha x with \alpha an unknown parameter we want to determine. Result: \alpha = 1.8097 and \chi^2 = 8.0.

Example - one-parameter fit (2)
-------------------------------
[Figure: the five data points with error bars and the best-fit line y = \alpha x through the origin.]

Models with unknown parameters - fitting (2)
--------------------------------------------
Example: fitting data to a straight line. Suppose for a set of inputs x_i, i = 1..n we measure outputs \bar{Y}_i ± \sigma_i. If Y is modelled by a simple straight-line function, y_i = \alpha_1 + \alpha_2 x_i, what values of {\alpha_1, \alpha_2} minimise \chi^2? \chi^2(\alpha_1, \alpha_2) is given by

    \chi^2(\alpha_1, \alpha_2) = \sum_{i=1}^{n} \frac{(\bar{Y}_i - \alpha_1 - \alpha_2 x_i)^2}{\sigma_i^2}

The minimum is at

    \alpha_1^* = \frac{A_{22} b_1 - A_{12} b_2}{A_{11} A_{22} - A_{12}^2}, \qquad
    \alpha_2^* = \frac{A_{11} b_2 - A_{12} b_1}{A_{11} A_{22} - A_{12}^2}

Models with unknown parameters - fitting (3)
--------------------------------------------
where

    A_{11} = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}, \quad
    A_{12} = \sum_{i=1}^{n} \frac{x_i}{\sigma_i^2}, \quad
    A_{22} = \sum_{i=1}^{n} \frac{x_i^2}{\sigma_i^2}, \quad
    b_1 = \sum_{i=1}^{n} \frac{\bar{Y}_i}{\sigma_i^2}, \quad
    b_2 = \sum_{i=1}^{n} \frac{x_i \bar{Y}_i}{\sigma_i^2}

The best-fit parameters \alpha_{1,2}^* are themselves stochastic variables, and so have a probabilistic distribution. A range of likely values must be given; the width is approximated by

    \sigma_{\alpha_1} = \sqrt{\frac{A_{22}}{A_{11} A_{22} - A_{12}^2}}, \qquad
    \sigma_{\alpha_2} = \sqrt{\frac{A_{11}}{A_{11} A_{22} - A_{12}^2}}

Example - two-parameter fit (2)
-------------------------------
[Figure: the same five data points with the best-fit straight line y = \alpha_1 + \alpha_2 x.]

Now \chi^2 goes down from 8.0 to 7.1.

Example - try both fits again ...
---------------------------------
[Figure: a second data set shown with both fitted models.]

Now \chi^2 is 357 for the y = \alpha x model but still 7.1 for the y = \alpha_1 + \alpha_2 x model. The first model should be ruled out.
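Both fits are easy to reproduce. Here is a short Python sketch (not part of the original slides) applying the formulas above to the data from the one-parameter example:

```python
import numpy as np

# Data from the one-parameter fit example above
x = np.array([0.1, 0.5, 0.7, 0.9, 1.0])
Y = np.array([0.25, 0.90, 1.20, 1.70, 2.20])
sigma = np.array([0.05, 0.10, 0.05, 0.10, 0.20])
w = 1.0 / sigma**2                       # weights 1/sigma_i^2

# One-parameter model y = alpha * x: minimising chi^2 gives
# alpha* = sum(w x Y) / sum(w x^2)
alpha = np.sum(w * x * Y) / np.sum(w * x**2)
chi2 = np.sum(w * (Y - alpha * x) ** 2)
print(f"alpha = {alpha:.4f}, chi^2 = {chi2:.1f}")   # alpha = 1.8097, chi^2 = 8.0

# Two-parameter model y = a1 + a2 * x, using the A and b sums above
A11, A12, A22 = np.sum(w), np.sum(w * x), np.sum(w * x**2)
b1, b2 = np.sum(w * Y), np.sum(w * x * Y)
det = A11 * A22 - A12**2
a1 = (A22 * b1 - A12 * b2) / det
a2 = (A11 * b2 - A12 * b1) / det
chi2_2 = np.sum(w * (Y - a1 - a2 * x) ** 2)
print(f"a1 = {a1:.4f}, a2 = {a2:.4f}, chi^2 = {chi2_2:.1f}")  # chi^2 = 7.1
```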
Uncertainty propagates
----------------------
The best-fit parameter(s) \alpha^* have been determined from statistical data, so we must quote an uncertainty. How precisely have they been determined? \alpha^* is a function of the statistical data \bar{Y}: a statistical fluctuation in \bar{Y} of d\bar{Y} would result in a fluctuation in \alpha^* of (d\alpha^*/d\bar{Y})\, d\bar{Y}.

All the measured Y values fluctuate, but if they are independent, the fluctuations add in quadrature, so the error in the best-fit parameters is

    \sigma_{\alpha^*}^2 = \sum_{i=1}^{n} \left( \frac{d\alpha^*}{dY_i} \right)^2 \sigma_i^2

Uncertainty propagates (2)
--------------------------
Back to our example, the one-parameter fit. We found \alpha^* = b/A with

    A = \sum_{i=1}^{n} \frac{x_i^2}{\sigma_i^2} \quad \text{and} \quad b = \sum_{i=1}^{n} \frac{x_i Y_i}{\sigma_i^2}

so

    \frac{d\alpha^*}{dY_i} = \frac{1}{A} \frac{db}{dY_i} = \frac{x_i}{A \sigma_i^2}

since A is fixed. We get

    \sigma_{\alpha^*}^2 = \frac{1}{A^2} \sum_{i=1}^{n} \left( \frac{x_i}{\sigma_i^2} \right)^2 \sigma_i^2 = \frac{1}{A}

Back to our first example: we quote \alpha^* = 1.81 ± 0.05.
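As a sanity check (again a sketch, not from the original slides), the propagation sum and the closed form 1/A agree numerically for the example data:

```python
import numpy as np

# Check the error-propagation result sigma_alpha^2 = 1/A for the
# one-parameter fit, using the same data as above.
x = np.array([0.1, 0.5, 0.7, 0.9, 1.0])
sigma = np.array([0.05, 0.10, 0.05, 0.10, 0.20])
A = np.sum(x**2 / sigma**2)

# Direct propagation: sum over (d alpha*/d Y_i)^2 sigma_i^2
# with d alpha*/d Y_i = x_i / (A sigma_i^2)
dalpha_dY = x / (A * sigma**2)
var_direct = np.sum(dalpha_dY**2 * sigma**2)

print(f"1/A = {1.0/A:.6f}, direct sum = {var_direct:.6f}")
print(f"sigma_alpha = {np.sqrt(1.0/A):.3f}")  # ~0.055, so alpha = 1.81 +/- 0.05
```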