MATH5315 Applied Statistics and Probability 2011-2012

Lecturer: Andrew J. Baczkowski – room 9.21i – email: sta6ajb@leeds.ac.uk

Regularly updated information about the module is available on the internet at:
http://www.maths.leeds.ac.uk/∼sta6ajb/math5315/math5315.html

Module Objective: The aim of the module is to provide a grounding in the aspects of statistics, in particular statistical modelling, that are of relevance to actuarial and financial work. The module introduces and develops the fundamental concepts of probability and statistics used in applied financial analysis.

Provisional Detailed Syllabus:
Part I: Fundamentals of Probability (11 lectures). Summarising data; Introduction to probability; Random variables; Probability distributions; Generating functions; Joint distributions; The central limit theorem; Conditional expectation.
Part II: Fundamentals of Statistics (9 lectures). Sampling and statistical inference; Point estimation; Confidence intervals; Hypothesis testing.
Part III: Applied Statistics (10 lectures). Correlation and regression (OLS); Analysis of variance (ANOVA); Univariate time series analysis and forecasting (ARMA); Multivariate time series analysis (VAR); Cointegration; Volatility models (ARCH/GARCH).

Booklist: Sections from the following two books will form the course notes for this module.
1. “Subject CT3 Probability and Mathematical Statistics Core Technical, Core Reading”, published by the Institute of Actuaries, price ≈ £45. Referred to as CT3.
2. “Introductory Econometrics for Finance (2nd edition)” by C. Brooks, published by Cambridge University Press, 2008, price ≈ £40. Referred to as IEF.
It is ESSENTIAL that you have access to these books. You MUST prepare the material BEFORE the lectures, which will consist of examples and further explanation to illustrate the book material.

Timetable:
Lectures (weeks 1-5): Tuesdays 10-11 in RSLT14 and Tuesdays 12-1 in RSLT08.
Lectures (weeks 1-4, 6-11): Fridays 1-3 in RSLT08.
Seminar (weeks 1-4, 6): Fridays 3-4 in E. C. Stoner Building, room 9.90.
Practical (weeks 7-11): Fridays 3-4 in the Irene Manton North cluster.
(RSLT is the Roger Stevens Lecture Theatre Block.)

Assessment: 70% of marks for a two-hour examination at the end of the semester. 30% of marks for continuously assessed practical work.

Examination Paper: The format currently planned for the TWO hour paper is as follows. Eleven section A questions each worth TWO marks. Eleven section B questions each worth THREE marks. Eleven section C questions each worth FIVE marks. You attempt TEN section A questions, TEN section B questions, and TEN section C questions.

Exercise Sheets for MATH5315: None. I will introduce examples as we need them. There are some questions (and answers) available on the module web-page.

MATH5315 Applied Statistics and Probability
Lecture 1: Summarising Data
References: CT3 Unit 1. (CT3 denotes “Subject CT3 Probability and Mathematical Statistics Core Technical, Core Reading”, published by the Institute of Actuaries, price ≈ £45.)

§2 Tabular and graphical methods.
§2.1 Types of data. Discrete and continuous data.
§2.2 Frequency distribution. A line chart is better for discrete data than a bar chart!
§2.3 Histograms.
§2.5 Lineplots. Dotplots. Cumulative frequency at x is the number of observations ≤ x.
§3 Measures of Location.
§3.1 The mean. Sample mean x̄.
§3.2 The median.
§4 Measures of spread.
§4.1 The standard deviation. Sample standard deviation s, sample variance s².
§4.2 Moments. Sample moments m′_k = (1/n) Σ_{i=1}^n x_i^k and m_k = (1/n) Σ_{i=1}^n (x_i − x̄)^k.
§4.3 The range.
§4.4 The interquartile range. More often people use the semi-interquartile range SIQ = (Q3 − Q1)/2.
§5 Symmetry and skewness.
§5.1 Boxplots.
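The Lecture 1 summaries are all one-line commands in R (the software used later in the module's practical work). A minimal sketch using only base R functions; the data vector is made up purely for illustration:

```r
# Illustrative data: ten made-up claim amounts
x <- c(24, 31, 18, 45, 27, 52, 33, 29, 38, 41)
n <- length(x)

mean(x)                      # sample mean x-bar
median(x)                    # sample median
sd(x); var(x)                # sample standard deviation s and variance s^2 (divisor n - 1)
diff(range(x))               # range
IQR(x) / 2                   # semi-interquartile range (Q3 - Q1)/2
sum((x - mean(x))^3) / n     # third sample central moment m_3 (divisor n)
hist(x)                      # histogram
boxplot(x)                   # boxplot
```

Note that var() and sd() use the divisor n − 1, matching the sample variance S² of Lecture 13, whereas the sample moments m_k above use the divisor n.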
MATH5315 Applied Statistics and Probability
Lecture 2: Introduction to Probability
References: CT3 Unit 2.

§1 Introduction to sets. Sample space S, event A.
§1.1 Complementary sets. A′. More usually A^c might be used.
§1.2 Set operations. Union ∪ and intersection ∩.
§2 Probability axioms and the addition rule.
§2.1 Basic probability axioms. P{S} = 1, 0 ≤ P{A} ≤ 1. If A and B are mutually exclusive, P{A ∪ B} = P{A} + P{B}.
§2.2 The addition rule. In general P{A ∪ B} = P{A} + P{B} − P{A ∩ B}.
§3 Conditional probability. P{A|B} = P{A ∩ B}/P{B}.
§3.1 Independent events. A and B are independent if and only if P{A ∩ B} = P{A} P{B}.
§3.2 Theorem of total probability. P{A} = Σ_{j=1}^n P{A ∩ Ej}.
§3.3 Bayes’ Theorem. P{Ei|A} = P{A|Ei} P{Ei} / Σ_{j=1}^n P{A|Ej} P{Ej}.

MATH5315 Applied Statistics and Probability
Lecture 3: Random Variables
References: CT3 Unit 3.

§1 Discrete random variables. Random variable X. Probability function fX(x) = P{X = x}; the notation pX(x) would be better! Cumulative distribution function (cdf) FX(x) = P{X ≤ x} = Σ_{xi ≤ x} fX(xi).
§2 Continuous random variables. Probability density function (pdf) fX(x). Cdf FX(x) = ∫_{−∞}^x fX(t) dt, so fX(x) = dFX(x)/dx. P{a ≤ X ≤ b} = ∫_a^b fX(x) dx.
§3 Expected values.
§3.1 Mean. E[X] or µ. More generally E[g(X)].
§3.2 Variance and standard deviation. Variance V[X], often denoted σ². (I prefer Var[X] as notation!) Standard deviation σ.
§3.3 Linear functions of X. E[aX + b] = aµ + b. V[aX + b] = a²σ².
§3.4 Moments. µk = E[(X − µ)^k] is the kth central moment of X about µ. Can measure skewness using µ3/σ³.
§4 Functions of a random variable.
§4.1 Discrete random variables. Y = u(X). If we have a 1–1 mapping, P{Y = y1} = P{X = x1} where y1 = u(x1).
§4.2 Continuous random variables. Y = u(X), so X = w(Y) and fY(y) = fX(x) |dx/dy|.

MATH5315 Applied Statistics and Probability
Lecture 4: Probability Distributions I
References: CT3 Unit 4.

§2 Discrete distributions.
§2.1 Uniform distribution. P{X = x} = 1/k for x = 1, 2, . . . , k.
§2.2 Bernoulli distribution. Bernoulli trial.
§2.3 Binomial distribution. If X is the number of successes in n Bernoulli trials, X ∼ Bin(n, θ); P{X = x} = (n choose x) θ^x (1 − θ)^(n−x) for x = 0, 1, 2, . . . , n.
§2.4 Geometric distribution. If X is the number of Bernoulli trials until the first success, X ∼ geometric(θ); P{X = x} = θ(1 − θ)^(x−1) for x = 1, 2, 3, . . ..
§2.7 Poisson distribution. If X ∼ Poisson(λ), then P{X = x} = λ^x e^(−λ)/x! for x = 0, 1, 2, . . ..

MATH5315 Applied Statistics and Probability
Lecture 5: Probability Distributions II
References: CT3 Unit 4.

§3 Continuous distributions.
§3.1 Uniform distribution. If X ∼ uniform(α, β), fX(x) = 1/(β − α) for α < x < β.
§3.2 Gamma distribution. Gamma function Γ(α): Γ(1) = 1, Γ(α) = (α − 1)Γ(α − 1), Γ(n) = (n − 1)! for positive integer n. If X ∼ gamma(α, λ), fX(x) = λ^α x^(α−1) e^(−λx)/Γ(α) for x > 0. E[X] = α/λ, V[X] = α/λ². Exponential distribution: X ∼ exponential(λ) ≡ gamma(1, λ). Chi-squared distribution: X ∼ χ²_ν ≡ gamma(α = ν/2, λ = 1/2).
§3.3 Beta distribution. (NOT needed for the exam.) fX(x) = x^(α−1) (1 − x)^(β−1)/B(α, β) for 0 < x < 1, where B(α, β) = Γ(α)Γ(β)/Γ(α + β). Mean is µ = α/(α + β). Variance is σ² = αβ/{(α + β)²(α + β + 1)}.
§3.4 Normal distribution. If X ∼ N(µ, σ²), fX(x) = (1/√(2πσ²)) exp{−(x − µ)²/(2σ²)} for −∞ < x < ∞. If Z = (X − µ)/σ, then Z ∼ N(0, 1). Values P{Z < z} = Φ(z) are tabulated.
§3.6 t-distribution. If X ∼ χ²_ν and Z ∼ N(0, 1) independently, then T = Z/√(X/ν) ∼ t_ν.
§3.7 F-distribution. If X ∼ χ²_{n1} and Y ∼ χ²_{n2} are independent, then F = (X/n1)/(Y/n2) ∼ F_{n1,n2}.
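All of the standard distributions of Lectures 4 and 5 are built into base R through the d/p/q/r function families. A minimal sketch; the numerical values below are chosen only for illustration:

```r
# Binomial: P{X = 3} and P{X <= 3} for X ~ Bin(10, 0.25)
dbinom(3, size = 10, prob = 0.25)
pbinom(3, size = 10, prob = 0.25)

# Poisson: P{X = 2} for X ~ Poisson(1.5)
dpois(2, lambda = 1.5)

# Normal: P{X < 130} for X ~ N(100, 15^2), via Phi(z) with z = (130 - 100)/15
pnorm((130 - 100) / 15)          # standardised
pnorm(130, mean = 100, sd = 15)  # equivalent, without standardising

# Gamma / chi-squared link of section 3.2: chi^2_4 is gamma(shape = 2, rate = 1/2)
pchisq(5, df = 4)
pgamma(5, shape = 2, rate = 0.5)
```

The last two lines return the same probability, illustrating the identity χ²_ν ≡ gamma(ν/2, 1/2).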
MATH5315 Applied Statistics and Probability
Lecture 6: Generating Functions
References: CT3 Unit 5.

§1 Probability generating functions. If X is discrete, taking value k with probability pk (k = 0, 1, 2, . . .), then GX(t) = E[t^X] = Σ_{k=0}^∞ pk t^k. GX(1) = 1, GX(0) = p0.
§1.1 Important examples. Uniform. Binomial. Geometric (as a special case of the negative binomial). Poisson.
§1.2 Evaluating moments. G′X(t) = Σ_{k=1}^∞ k pk t^(k−1), so G′X(1) = Σ_{k=1}^∞ k pk = E[X]. G″X(t) = Σ_{k=2}^∞ k(k − 1) pk t^(k−2), so G″X(1) = Σ_{k=2}^∞ k(k − 1) pk = E[X(X − 1)].
§2 Moment generating function. mX(t) = E[e^(tX)]. In the continuous case mX(t) = ∫ e^(tx) fX(x) dx. m^(r)(t) = ∫ x^r e^(tx) fX(x) dx, so m^(r)(0) = ∫ x^r fX(x) dx = E[X^r].
§2.1 Important examples. gamma(α, λ). N(µ, σ²).
§4 Linear functions. If Y = a + bX, GY(t) = E[t^Y] = E[t^(a+bX)] = t^a E[(t^b)^X] = t^a GX(t^b). Similarly, mY(t) = e^(at) mX(bt).

MATH5315 Applied Statistics and Probability
Lecture 7: Joint Distributions I
References: CT3 Unit 6.

§1 Joint distributions.
§1.1 Joint probability (density) functions. Discrete case f(x, y) = P{X = x, Y = y}, though a better notation might be pX,Y(x, y) as in §1.3! Continuous case f(x, y) is the joint pdf: P{x1 < X < x2, y1 < Y < y2} = ∫_{y1}^{y2} ∫_{x1}^{x2} f(x, y) dx dy.
§1.2 Marginal probability (density) functions. Discrete case fX(x) = P{X = x} = Σ_y f(x, y), though a better notation might be pX(x) = P{X = x} as in §1.3! Continuous case: the marginal pdf is fX(x) = ∫_y f(x, y) dy.
§1.3 Conditional probability (density) functions. Recall P{A|B} = P{A ∩ B}/P{B}. Discrete case P{X = x|Y = y} = pX,Y(x, y)/pY(y). Continuous case fX|Y=y(x|y) = fX,Y(x, y)/fY(y).
§1.4 Independence of random variables. If X and Y are independent, then fX,Y(x, y) = fX(x) fY(y) for all x and y. If X and Y are independent, then g(X) and h(Y) will be independent.
§2 Expectations of functions of two random variables.
§2.1 Expectations. Discrete case E[g(X, Y)] = Σ_x Σ_y g(x, y) P{X = x, Y = y}. Continuous case E[g(X, Y)] = ∫_x ∫_y g(x, y) fX,Y(x, y) dx dy.
§2.2 Expectations of sums and products. E[a g(X) + b h(Y)] = a E[g(X)] + b E[h(Y)]. If X and Y are independent, E[g(X) h(Y)] = E[g(X)] E[h(Y)].

MATH5315 Applied Statistics and Probability
Lecture 8: Joint Distributions II
References: CT3 Unit 6.

§2.3 Covariance and correlation coefficient. cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − µX µY. cov(X, X) = V[X]. corr(X, Y) = cov(X, Y)/√(V[X] V[Y]), often denoted ρ, with −1 ≤ ρ ≤ 1. If ρ = 0, X and Y are uncorrelated.
§2.3.1 Useful results on handling covariances. cov(aX + b, cY + d) = ac cov(X, Y). cov(X, Y + Z) = cov(X, Y) + cov(X, Z). If X and Y are independent, cov(X, Y) = 0; the converse is not necessarily true.
§2.4 Variance of a sum. V[X + Y] = V[X] + V[Y] + 2 cov(X, Y).
§3 Convolutions. Suppose Z = X + Y. Discrete case P{Z = z} = Σ_x P{X = x, Y = z − x}. Continuous case fZ(z) = ∫_x fX,Y(x, z − x) dx. This simplifies in the independence case.
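The covariance identities in §2.3–§2.4 are easy to check empirically. A minimal R sketch with an arbitrarily constructed dependent pair (all constants are illustrative only):

```r
set.seed(1)
n <- 1e6

# Build a correlated pair: Y = 0.5 X + noise (purely for illustration)
x <- rnorm(n, mean = 2, sd = 3)
y <- 0.5 * x + rnorm(n, sd = 1)

cov(x, y)                          # sample covariance
mean(x * y) - mean(x) * mean(y)    # E[XY] - mu_X mu_Y (divisor n, so essentially the same here)
cor(x, y)                          # sample correlation, cov / sqrt(V[X] V[Y])

# Variance of a sum: V[X + Y] = V[X] + V[Y] + 2 cov(X, Y)
var(x + y)
var(x) + var(y) + 2 * cov(x, y)

# cov(aX + b, cY + d) = ac cov(X, Y), here with a = 2, b = 1, c = -3, d = 4
cov(2 * x + 1, -3 * y + 4)
2 * (-3) * cov(x, y)
```

Because var() and cov() use the same divisor (n − 1), the variance-of-a-sum identity holds exactly for the sample quantities, not just in expectation.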
MATH5315 Applied Statistics and Probability
Lecture 9: More on generating functions
References: CT3 Units 5 and 6.

§3 (Unit 5) Cumulant generating function. CX(t) = log mX(t). C′X(0) = E[X], C″X(0) = V[X]. CX(t) = κ1 t + κ2 t²/2! + κ3 t³/3! + · · ·.
§3.1 (Unit 6) Moments of linear combinations of random variables. E[Σ_{i=1}^n ci Xi] = Σ_{i=1}^n ci E[Xi]. V[Σ_{i=1}^n ci Xi] = Σ_{i=1}^n ci² V[Xi] + 2 Σ_{1≤i<j≤n} ci cj cov(Xi, Xj). A special case is the (mutual) independence case.
§3.2 (Unit 6) Distributions of linear combinations of independent random variables.
Discrete case via the probability generating function (pgf). Let S = c1X + c2Y. GS(t) = E[t^(c1X+c2Y)] = E[(t^c1)^X] E[(t^c2)^Y] = GX(t^c1) GY(t^c2).
Binomial case: if X ∼ Bin(m, θ) and Y ∼ Bin(n, θ), then X + Y ∼ Bin(m + n, θ).
Poisson case: if X ∼ Poisson(λ) and Y ∼ Poisson(γ), then X + Y ∼ Poisson(λ + γ).
Continuous case via moment generating functions (mgf). Let S = c1X + c2Y. mS(t) = E[e^((c1X+c2Y)t)] = E[e^((c1t)X)] E[e^((c2t)Y)] = mX(c1t) mY(c2t).
Exponential case: if Xi ∼ exponential(λ) independently for i = 1, 2, . . . , k, then Σ_{i=1}^k Xi ∼ gamma(k, λ).
Gamma case: if X ∼ gamma(α, λ) and Y ∼ gamma(δ, λ), then X + Y ∼ gamma(α + δ, λ).
Chi-square case: if X ∼ χ²_m and Y ∼ χ²_n, then X + Y ∼ χ²_{m+n}.
Normal case: if X ∼ N(µX, σX²) and Y ∼ N(µY, σY²), then X + Y ∼ N(µX + µY, σX² + σY²).

MATH5315 Applied Statistics and Probability
Lecture 10: Central Limit Theorem
References: CT3 Unit 7.

§1 The central limit theorem. For X1, X2, . . . , Xn iid with common mean µ and common variance σ² (< ∞), (X̄ − µ)/(σ/√n) ≈ N(0, 1) for large n.
§2 Normal approximations.
§2.1 Binomial distribution. If X ∼ Bin(n, θ) and n is large, X ≈ N(nθ, nθ(1 − θ)).
§2.2 Poisson distribution. If Xi ∼ Poisson(λ) independently then, from lecture 9, Σ_{i=1}^n Xi ∼ Poisson(nλ). For large n, this is approximately N(nλ, nλ).
§2.3 Gamma distribution. From lecture 9, if Xi ∼ exponential(λ) independently, Σ_{i=1}^n Xi ∼ gamma(n, λ). For large n, this is approximately N(n/λ, n/λ²). Since χ²_k ≡ gamma(k/2, 1/2), χ²_k ≈ N(k, 2k) for large k.
§3 The continuity correction. If X ∼ Poisson(µ) and µ is large, X ≈ N(µ, σ² = µ). P{X ≤ x} ≈ P{Z ≤ (x + ½ − µ)/σ}, where Z ∼ N(0, 1).

MATH5315 Applied Statistics and Probability
Lecture 11: Conditional Expectation
References: CT3 Unit 14.

§1 The conditional expectation E[Y|X = x].
§2 The random variable E[Y|X]. If g(x) = E[Y|X = x], then consider this as the observed value of a random variable g(X). E[E[Y|X]] = E[Y].
§3 The random variable V[Y|X] and the “E[V] + V[E]” result. V[Y|X = x] = E[{Y − g(x)}²|X = x] = E[Y²|X = x] − g(x)². V[Y|X] = E[{Y − g(X)}²|X] = E[Y²|X] − g(X)². E[V[Y|X]] = E[E[Y²|X]] − E[g(X)²] = E[Y²] − E[g(X)²], so E[Y²] = E[V[Y|X]] + E[g(X)²]. V[Y] = E[Y²] − {E[Y]}² = E[V[Y|X]] + E[g(X)²] − {E[g(X)]}² = E[V[Y|X]] + V[g(X)], so that V[Y] = E[V[Y|X]] + V[E[Y|X]].
§5 Compound distributions. S = X1 + X2 + · · · + XN. E[S|N = n] = nµX, V[S|N = n] = nσX². E[S] = E[E[S|N]] = E[NµX] = E[N]µX. V[S] = E[V[S|N]] + V[E[S|N]] = E[NσX²] + V[NµX] = µN σX² + σN² µX². Mgf of S: mS(t) = E[e^(tS)] = E[E[e^(tS)|N]]. E[e^(tS)|N = n] = {mX(t)}^n, so mS(t) = E[{mX(t)}^N] = GN(mX(t)) in terms of the pgf of N.
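The compound-distribution formulae in §5 can be checked by simulation. A rough R sketch, taking (purely for illustration) N ∼ Poisson(4) and Xi ∼ exponential(rate 0.5):

```r
set.seed(2)
nsim <- 1e5

# Compound sum S = X1 + ... + XN with N ~ Poisson(4), X ~ exponential(rate 0.5)
lambdaN <- 4
rateX   <- 0.5
N <- rpois(nsim, lambdaN)
S <- sapply(N, function(k) sum(rexp(k, rate = rateX)))  # a sum over zero terms is 0

muX <- 1 / rateX;  sigma2X <- 1 / rateX^2   # exponential mean and variance
muN <- lambdaN;    sigma2N <- lambdaN       # Poisson mean and variance

mean(S); muN * muX                          # E[S] = E[N] mu_X
var(S);  muN * sigma2X + sigma2N * muX^2    # V[S] = mu_N sigma_X^2 + sigma_N^2 mu_X^2
```

The simulated mean and variance should sit close to the theoretical values on the same lines.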
MATH5315 Applied Statistics and Probability
Lecture 12: Poisson process and simulating random variables
References: CT3 Unit 4.

§4 The Poisson process. Point-like events occur randomly and independently in time at an average rate λ per unit time. Let N(t) be the number of events in the interval [0, t], with N(0) = 0, and let pn(t) = P{N(t) = n}. Then pn(t + h) ≈ pn−1(t)[λh] + pn(t)[1 − λh], so pn(t + h) − pn(t) ≈ λh[pn−1(t) − pn(t)]. As h → 0, p′n(t) = λ[pn−1(t) − pn(t)]. Similarly p′0(t) = −λp0(t). Define G(s, t) = Σ_{n=0}^∞ s^n pn(t). Then ∂G(s, t)/∂t = λsG(s, t) − λG(s, t), so log G(s, t) = λt(s − 1) since G(s, 0) = 1. Thus G(s, t) = exp{λt(s − 1)}, so N(t) ∼ Poisson(λt).
The time T1 to the first event satisfies T1 ∼ exponential(λ). The time between successive events has the same exponential distribution.
§5 Random number simulation.
§5.1 Basic simulation method. Generate U ∼ uniform(0, 1), then use the inverse transformation method.
§5.2 Continuous distributions. If F(x) = P{X ≤ x}, let x = F⁻¹(u). X ∼ exponential(λ) example.
§5.3 Discrete distributions.

MATH5315 Applied Statistics and Probability
Lecture 13: Sampling and Statistical Inference I
References: CT3 Unit 8.

§1 Basic definitions.
§2 Moments of the sample mean and variance.
§2.1 The sample mean. X̄ = (1/n) Σ_{i=1}^n Xi. E[X̄] = µ, V[X̄] = σ²/n.
§2.2 The sample variance. S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)² = (1/(n − 1)) (Σ_{i=1}^n Xi² − nX̄²). E[S²] = σ².
§3 Sampling distributions for the normal.
§3.1 The sample mean. If Xi ∼ N(µ, σ²) independently, then Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1) for all n. In general Z = (X̄ − µ)/(σ/√n) ≈ N(0, 1) for large n.
§3.2 The sample variance. If Xi ∼ N(µ, σ²) independently, then (n − 1)S²/σ² ∼ χ²_{n−1}. Tabulated values χ²_k(α) satisfy P{U > χ²_k(α)} = α for U ∼ χ²_k.
§3.3 Independence of the sample mean and variance. Xi ∼ N(µ, σ²) case.

MATH5315 Applied Statistics and Probability
Lecture 14: Sampling and Statistical Inference II
References: CT3 Unit 8.

§4 The t distribution. tk = N(0, 1)/√(χ²_k/k) for independent N(0, 1) and χ²_k distributions. If Xi ∼ N(µ, σ²) independently, then t = (X̄ − µ)/(S/√n) ∼ t_{n−1}. Properties of the tk distribution. Tabulated values tk(α) satisfy P{T > tk(α)} = α for T ∼ tk; also P{T < −tk(α)} = α.
§5 The F result for variance ratios. If U ∼ χ²_{ν1} and V ∼ χ²_{ν2} are independent, then F = (U/ν1)/(V/ν2) ∼ F_{ν1,ν2}. If S1² and S2² are based on samples of size n1 and n2 respectively from normal populations with variances σ1² and σ2² respectively, then F = (S1²/σ1²)/(S2²/σ2²) ∼ F_{n1−1,n2−1}. Tabulated values F_{ν1,ν2}(α) satisfy P{F > F_{ν1,ν2}(α)} = α for F ∼ F_{ν1,ν2}; also P{F < 1/F_{ν2,ν1}(α)} = α.

MATH5315 Applied Statistics and Probability
Lecture 15: Point Estimation I
References: CT3 Unit 9.

§1 The method of moments.
§1.1 The one-parameter case.
§1.2 The two-parameter case.
§3 Unbiasedness. Let g(X) be an estimator of a parameter θ. The bias is Bias(g(X)) = E[g(X)] − θ. Unbiasedness.
§4 Mean square error. Let g(X) be an estimator of a parameter θ. MSE(g(X)) = E[(g(X) − θ)²]. MSE(g(X)) = V[g(X)] + Bias(g(X))².

MATH5315 Applied Statistics and Probability
Lecture 16: Point Estimation II
References: CT3 Unit 9.

§2 The method of maximum likelihood.
§2.1 The one-parameter case. Likelihood L(θ) = Π_{i=1}^n f(xi; θ).
§2.1.1 Example.
§2.2 The two-parameter case.
§5 Asymptotic distribution of the MLE. For large n, θ̂ ≈ N(θ, v), where v = 1/(n E[(∂ log f(X; θ)/∂θ)²]).
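The exponential distribution ties together the inverse transformation method of Lecture 12 (§5.2) and maximum likelihood estimation from Lecture 16. A minimal R sketch, taking λ = 2 and the search interval purely for illustration:

```r
set.seed(3)
lambda <- 2          # true rate, chosen for illustration
n <- 500

# Lecture 12: inverse transformation method for X ~ exponential(lambda).
# F(x) = 1 - exp(-lambda x), so F^{-1}(u) = -log(1 - u)/lambda.
u <- runif(n)
x <- -log(1 - u) / lambda

# Lecture 16: maximum likelihood for lambda.
# log L(lambda) = n log(lambda) - lambda * sum(x); the closed-form MLE is 1/xbar.
loglik <- function(lam) n * log(lam) - lam * sum(x)
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum   # numerical MLE
1 / mean(x)                                                        # closed-form MLE

# Lecture 15: the method of moments gives the same estimator here, since E[X] = 1/lambda.
```

The numerically maximised log-likelihood and the closed-form estimator 1/x̄ should agree to several decimal places.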
MATH5315 Applied Statistics and Probability
Lecture 17: Confidence Intervals I
References: CT3 Unit 10.

§1 Confidence intervals in general. Want values θ̂1 and θ̂2 such that P{θ̂1 < θ < θ̂2} = 0.95.
§2 Distribution of confidence intervals.
§2.1 The pivotal method. Look for a pivotal quantity g(X, θ) such that, for example, if g(X, θ) increases as θ increases, then g(X, θ) < g2 ⇔ θ < θ2 and g1 < g(X, θ) ⇔ θ1 < θ. For example, if Xi ∼ N(µ, σ²) independently, then g(X, µ) = (X̄ − µ)/(σ/√n).
§2.2 Confidence limits. The interval (X̄ − 1.96σ/√n, X̄ + 1.96σ/√n) can be written as X̄ ± 1.96σ/√n.
§2.3 Sample size.
§3 Confidence intervals for the normal distribution.
§3.1 The mean. Recall t = (X̄ − µ)/(S/√n) ∼ t_{n−1}, so a 95% confidence interval for µ is X̄ ± t_{n−1}(2.5%) S/√n.
§3.2 The variance. Recall (n − 1)S²/σ² ∼ χ²_{n−1}, so a 95% confidence interval for σ² is (n − 1)S²/χ²_{n−1}(2.5%) < σ² < (n − 1)S²/χ²_{n−1}(97.5%).

MATH5315 Applied Statistics and Probability
Lecture 18: Confidence Intervals II
References: CT3 Unit 10.

§4 Confidence intervals for binomial and Poisson.
§4.1 The binomial. The course text is a bit muddled here I think! If P{h1(θ) < X < h2(θ)} ≥ 0.95, then P{X ≥ h2(θ)} ≤ 0.025 and P{X ≤ h1(θ)} ≤ 0.025. Now X < h2(θ) if θ > θ2(X), and similarly X > h1(θ) if θ < θ1(X). Thus the interval is θ2 < θ < θ1, where, for example, P{X ≥ x | θ = θ2} = 0.025.
§4.1.1 The normal approximation.
§5 Confidence intervals for two sample problems.
§5.1 Two normal means. The confidence interval with known variances is based on the fact that X̄1 − X̄2 ∼ N(µ1 − µ2, σ1²/n1 + σ2²/n2), so (X̄1 − X̄2 − (µ1 − µ2))/√(σ1²/n1 + σ2²/n2) ∼ N(0, 1). If σ1² = σ2² = σ² is unknown, it can be estimated using sp² = {(n1 − 1)s1² + (n2 − 1)s2²}/(n1 + n2 − 2), and then (X̄1 − X̄2 − (µ1 − µ2))/√(sp²(1/n1 + 1/n2)) ∼ t_{n1+n2−2}.
§5.2 Two population variances. Recall (S1²/σ1²)/(S2²/σ2²) ∼ F_{n1−1,n2−1}, so (S1²/S2²)/(σ1²/σ2²) ∼ F_{n1−1,n2−1}. Also, if F ∼ F_{ν1,ν2} with P{F > F_{ν1,ν2}(α)} = α, then P{F_{n1−1,n2−1}(0.975) < (S1²/S2²)/(σ1²/σ2²) < F_{n1−1,n2−1}(0.025)} = 0.95 re-arranges to give (S1²/S2²) · 1/F_{n1−1,n2−1}(0.025) < σ1²/σ2² < (S1²/S2²) · 1/F_{n1−1,n2−1}(0.975), where 1/F_{n1−1,n2−1}(0.975) = F_{n2−1,n1−1}(0.025).
§5.3 Two population proportions. X1 ∼ Bin(n1, θ1) ≈ N(n1θ1, n1θ1(1 − θ1)) and X2 ∼ Bin(n2, θ2) ≈ N(n2θ2, n2θ2(1 − θ2)). Thus θ̂i = Xi/ni ≈ N(θi, θi(1 − θi)/ni) for i = 1, 2, and so θ̂1 − θ̂2 ≈ N(θ1 − θ2, θ1(1 − θ1)/n1 + θ2(1 − θ2)/n2). In practice we assume θ̂1 − θ̂2 ≈ N(θ1 − θ2, θ̂1(1 − θ̂1)/n1 + θ̂2(1 − θ̂2)/n2), so we can obtain a confidence interval for θ1 − θ2 by assuming the variance is known.
§6 Paired data. (NOT needed for the exam.) Form pairs Di = X1i − X2i. Then (D̄ − µD)/(SD/√n) ∼ t_{n−1}.

MATH5315 Applied Statistics and Probability
Lecture 19: Hypothesis Testing I
References: CT3 Unit 11.

§1 Hypotheses, test statistics, decisions and errors. Null hypothesis H0. Alternative hypothesis H1. Critical region. α = P{Type I error} = P{reject H0 when H0 is true}. β = P{Type II error} = P{accept H0 when H0 is false}. Power = 1 − β = P{reject H0 when the parameter is θ}.
§2 Classical testing, significance and P-values.
§2.1 “Best” tests. Neyman-Pearson Lemma: for H0: θ = θ0 vs. H1: θ = θ1, the best test is based on the likelihood ratio, with critical region C satisfying L0/L1 ≤ k.
§2.2 P-values. P = P{a value occurs as or more extreme than the one observed | H0 true}.
§3 Basic tests – single parameter.
§3.1 Testing the value of a population mean. Testing H0: µ = µ0. Test based on Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1) or T = (X̄ − µ0)/(S/√n) ∼ t_{n−1} if H0 is true.
§3.2 Testing the value of a population variance. Testing H0: σ² = σ0². Test based on (n − 1)S²/σ0² ∼ χ²_{n−1} if H0 is true.
§3.3 Testing the value of a population proportion. Testing H0: θ = θ0. Test based on X ∼ Bin(n, θ0) ≈ N(nθ0, nθ0(1 − θ0)) if H0 is true.
§4 Basic tests – two independent samples.
§4.1 Testing the value of the difference between two population means. Testing H0: µ1 − µ2 = δ. Test based on Z = (X̄1 − X̄2 − δ)/√(σ1²/n1 + σ2²/n2) ∼ N(0, 1) or T = (X̄1 − X̄2 − δ)/(Sp√(1/n1 + 1/n2)) ∼ t_{n1+n2−2} if H0 is true.
§4.2 Testing the value of the ratio of two population variances. Testing H0: σ1² = σ2². Test based on S1²/S2² ∼ F_{n1−1,n2−1} if H0 is true.
§4.3 Testing the value of the difference between two population proportions. Testing H0: θ1 = θ2 (= θ). Test based on (θ̂1 − θ̂2)/√(θ̂(1 − θ̂)/n1 + θ̂(1 − θ̂)/n2) if H0 is true, where θ̂ is the pooled estimate of θ.
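Most of the intervals and tests of Lectures 17–19 are available directly in base R. A brief sketch with made-up data (all numbers are illustrative only):

```r
# Two made-up samples
x <- c(12.1, 11.4, 13.2, 12.8, 11.9, 12.5, 13.0, 12.2)
y <- c(11.2, 10.9, 11.8, 12.0, 11.5, 10.7, 11.9)

t.test(x, mu = 12)                  # one-sample t test of H0: mu = 12, with a 95% CI for mu
t.test(x, y, var.equal = TRUE)      # two-sample t test (pooled variance), CI for mu1 - mu2
var.test(x, y)                      # F test of H0: sigma1^2 = sigma2^2, CI for the variance ratio
prop.test(c(34, 19), c(120, 95))    # approximate test and CI for a difference of two proportions
```

Each call returns the test statistic, the P-value and the associated confidence interval, matching the pivotal quantities listed above.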
MATH5315 Applied Statistics and Probability
Lecture 20: Hypothesis Testing II
References: CT3 Unit 11.

§7 χ² tests. The test statistic is Σ_i (fi − ei)²/ei, where the ei are expected values under H0 and the fi are observed values. (I prefer the notation Σ_i (Oi − Ei)²/Ei!)
§7.1 Goodness of fit.
§7.1.1 Degrees of freedom. Number of groups − number of constraints on the ei − number of fitted parameters.
§7.1.2 The “accuracy” of the χ² approximation. Ensure all ei > 5 by combining groups (cells).
§7.1.3 Example.
§7.2 Contingency tables. For an r × c table, the degrees of freedom is (r − 1)(c − 1). Expected frequencies for any cell are (row total × column total)/grand total. Tests of homogeneity and independence are not clearly distinguished.

MATH5315 Applied Statistics and Probability
Lecture 21: Correlation and Regression I
References: CT3 Unit 12; IEF chapter 2, pages 27-38, 44-51.

§0 (CT3) Introduction. Scatter plot and summary statistics Sxx, Syy, Sxy.
§1 (CT3) Correlation analysis.
§1.1 (CT3) Data summary. Sample correlation r = Sxy/√(Sxx Syy).
§1.2 (CT3) The normal model and inference. If ρ = 0, r√(n − 2)/√(1 − r²) ∼ t_{n−2}. If W = ½ log{(1 + r)/(1 − r)}, then W ≈ N(½ log{(1 + ρ)/(1 − ρ)}, 1/(n − 3)). Can re-write this as W = tanh⁻¹ r, so that W ≈ N(tanh⁻¹ ρ, 1/(n − 3)).
§2 (CT3) Regression analysis – the simple linear model. Yi = α + βxi + ei, i = 1, 2, . . . , n. E[ei] = 0, V[ei] = σ².
§2.1 (CT3) Introduction.
§2.2 (CT3) Fitting the model. α̂ and β̂ minimise q = Σ_{i=1}^n ei² = Σ_{i=1}^n (yi − α − βxi)². Least squares derivation. The fitted line is ŷ = α̂ + β̂x, where α̂ = ȳ − β̂x̄ and β̂ = Sxy/Sxx. σ̂² = (1/(n − 2)) Σ_{i=1}^n (yi − ŷi)². E[β̂] = β, V[β̂] = σ²/Sxx.
§2.3 (CT3) Partitioning the variability of the responses. Σ_i (yi − ȳ)² = Σ_i (yi − ŷi)² + Σ_i (ŷi − ȳ)², i.e., SSTOT = SSRES + SSREG. SSTOT = Syy. SSREG = Sxy²/Sxx. E[SSTOT] = (n − 1)σ² + β²Sxx, E[SSREG] = σ² + β²Sxx, E[SSRES] = (n − 2)σ². Coefficient of determination R² = SSREG/SSTOT. Cases where the line closely fits the data and where the line is a poor fit to the data.

MATH5315 Applied Statistics and Probability
Lecture 22: Correlation and Regression II
References: CT3 Unit 12; IEF chapter 2, pages 38-39, 51-66.

§2.4 (CT3) The full normal model and inference. Assumptions. β̂ ∼ N(β, σ²/Sxx), independently of (n − 2)σ̂²/σ² ∼ χ²_{n−2}.
§2.5 (CT3) Inferences on the slope parameter β. {(β̂ − β)/√(σ²/Sxx)}/√{((n − 2)σ̂²/σ²)/(n − 2)} = (β̂ − β)/√(σ̂²/Sxx) ∼ t_{n−2}.
§2.6 (CT3) Estimating a mean response and predicting an individual response. µ0 = E[Y|x0] = α + βx0, estimated by µ̂0 = α̂ + β̂x0. E[µ̂0] = µ0, V[µ̂0] = σ²{1/n + (x0 − x̄)²/Sxx}. With σ̂² replacing σ², (µ̂0 − µ0)/√(V[µ̂0]) ∼ t_{n−2}. To predict an individual response y0 = α + βx0 + e0, use ŷ0 = α̂ + β̂x0. Since E[y0 − ŷ0] = 0 and V[y0 − ŷ0] = σ² + V[µ̂0], (y0 − ŷ0)/√(σ²{1 + 1/n + (x0 − x̄)²/Sxx}) ∼ t_{n−2}, again with σ̂² replacing σ².
§2.7 (CT3) Checking the model. Residuals êi = yi − ŷi. Residual plot of êi vs. xi.
§2.8 (CT3) Extending the scope of the linear model. (NOT needed for the exam.) Transformations to give linearity.
§3 (CT3) The multiple linear regression model. (NOT needed for the exam.) Yi = α + β1xi1 + · · · + βkxik + ei, i = 1, 2, . . . , n.
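In R the simple linear model of Lectures 21–22 is fitted with lm(). A small sketch on simulated data; the model y = 2 + 0.5x + e and all numbers are made up for illustration:

```r
set.seed(4)

# Made-up data from a simple linear model
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30, sd = 1)

fit <- lm(y ~ x)
summary(fit)            # alpha-hat, beta-hat, sigma-hat, R^2 and the t test of H0: beta = 0
anova(fit)              # partition of SSTOT into SSREG and SSRES
confint(fit)            # 95% confidence intervals for alpha and beta

new <- data.frame(x = 5)
predict(fit, new, interval = "confidence")   # interval for the mean response at x0 = 5
predict(fit, new, interval = "prediction")   # wider interval for an individual response

plot(x, resid(fit))     # residual plot for checking the model
cor(x, y)               # sample correlation r
cor.test(x, y)          # t-based test of rho = 0
```

The prediction interval is wider than the confidence interval because V[y0 − ŷ0] = σ² + V[µ̂0], exactly as in §2.6.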
MATH5315 Applied Statistics and Probability
Lecture 23: Analysis of Variance
References: CT3 Unit 13.

§0 (CT3) Introduction.
§1 (CT3) One-way analysis of variance.
§1.1 (CT3) The model. Yij = µ + τi + eij for i = 1, 2, . . . , k, j = 1, 2, . . . , ni, with n = Σ_{i=1}^k ni. eij ∼ N(0, σ²) independently, so Yij ∼ N(µ + τi, σ²). If Σ_{i=1}^k ni τi = 0, then τi is the ith treatment effect and µ is the overall mean.
§1.2 (CT3) Estimation of the parameters. Minimise q = Σ_{i=1}^k Σ_{j=1}^{ni} (Yij − µ − τi)² to give µ̂ = Ȳ and τ̂i = Ȳi − Ȳ. If Si² = (1/(ni − 1)) Σ_{j=1}^{ni} (Yij − Ȳi)², then (ni − 1)Si²/σ² ∼ χ²_{ni−1} independently. Hence σ̂² = (1/(n − k)) Σ_{i=1}^k (ni − 1)Si² satisfies (n − k)σ̂²/σ² ∼ χ²_{n−k}.
§1.3 (CT3) Partitioning the variability. Σ_{i=1}^k Σ_{j=1}^{ni} (Yij − Ȳ)² = Σ_{i=1}^k Σ_{j=1}^{ni} (Yij − Ȳi)² + Σ_{i=1}^k ni(Ȳi − Ȳ)², i.e., SST = SSR + SSB, where SST is the total sum of squares, SSR is the residual (within-treatments) sum of squares, and SSB is the between-treatments sum of squares. If H0: τ1 = τ2 = · · · = τk = 0 is true, then MSB/MSR ∼ F_{k−1,n−k}, where MSB = SSB/(k − 1) is the mean square between treatments and MSR = SSR/(n − k) is the residual mean square.
§1.4 (CT3) Example.
§1.5 (CT3) Checking the model. Residuals are rij = êij = Yij − µ̂ − τ̂i = Yij − Ȳi.
§1.6 (CT3) Estimating the treatment means. A 95% confidence interval for µ + τi is Ȳi ± t_{n−k}(2.5%) σ̂/√ni. A 95% confidence interval for τi − τj is Ȳi − Ȳj ± t_{n−k}(2.5%) σ̂√(1/ni + 1/nj).
§1.7 (CT3) Further comments. The linear regression model Yi = a + bxi + ei, i = 1, 2, . . . , n, can be analysed in the same way as Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n (Yi − Ŷi)² + Σ_{i=1}^n (Ŷi − Ȳ)², i.e., SST = SSR + SSREG. If H0: b = 0 is true, SSREG/(SSR/(n − 2)) ∼ F_{1,n−2}.
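One-way ANOVA is carried out in R with aov(). A minimal sketch on three made-up treatment groups (the group means and sizes are illustrative only):

```r
set.seed(5)

# Three made-up treatment groups in a one-way layout
y <- c(rnorm(6, mean = 10), rnorm(8, mean = 12), rnorm(7, mean = 11))
treatment <- factor(rep(c("A", "B", "C"), times = c(6, 8, 7)))

fit <- aov(y ~ treatment)
summary(fit)                       # ANOVA table: SSB, SSR, MSB, MSR and the F test
model.tables(fit, type = "means")  # estimated treatment means Ybar_i
plot(fitted(fit), resid(fit))      # residuals r_ij = Y_ij - Ybar_i for model checking

# Equivalent F test via the regression framework of section 1.7
anova(lm(y ~ treatment))
```

The F statistic reported by summary(fit) is MSB/MSR, compared against F with k − 1 and n − k degrees of freedom.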
MATH5315 Applied Statistics and Probability
Lecture 24: Univariate Time Series Analysis and Forecasting I
References: IEF chapter 5, pages 206-223.

§5.1 Introduction.
§5.2 Some notation and concepts.
§5.2.2 A weakly stationary process. E[Yt] = µ, V[Yt] = σ², cov(Yt1, Yt2) = γ_{t1−t2}. The autocovariance function is cov(Yt, Yt−s) = γs. The autocorrelation function is τs = γs/γ0.
§5.2.3 A white noise process. E[Yt] = µ, V[Yt] = σ², γs = 0 for s ≠ 0. If Yt ∼ N(µ, σ²) independently for t = 1, 2, . . . , T, then τ̂s ≈ N(0, 1/T). Box-Pierce test that τ1 = · · · = τm = 0: if H0 is true, Q = T Σ_{k=1}^m τ̂k² ∼ χ²_m.
§5.3 Moving average processes. MA(q) is Yt = µ + Ut + θ1Ut−1 + · · · + θqUt−q with Ut ∼ (0, σ²) independently. The backshift operator L (most textbooks would use B!) satisfies LYt = Yt−1. Thus Yt = µ + θ(L)Ut with θ(L) = 1 + θ1L + · · · + θqL^q. V[Yt] = γ0 = (1 + θ1² + · · · + θq²)σ², and γs = (θs + θs+1θ1 + · · · + θqθq−s)σ² for s = 1, 2, . . . , q. Example 5.2.
§5.4 Autoregressive processes. AR(p) is Yt = µ + φ1Yt−1 + φ2Yt−2 + · · · + φpYt−p + Ut. Can write as φ(L)Yt = µ + Ut with φ(L) = 1 − φ1L − · · · − φpL^p.
§5.4.1 The stationarity condition. The AR(p) process is stationary if the roots of φ(z) = 0 lie outside the unit circle. Can then write the AR(p) process as an MA(∞) process Yt = φ⁻¹(L)Ut. Example 5.3.
§5.4.2 Wold’s decomposition theorem. (NOT needed for the exam.) All we really need here is that (1 − φ1 − φ2 − · · · − φp)E[Yt] = µ and that the autocorrelation function satisfies the Yule-Walker equations τr = τ_{1−r}φ1 + τ_{2−r}φ2 + · · · + τ_{p−r}φp for r = 1, 2, . . . , p, with τ_{−s} = τs. Example 5.4.
§5.5 The partial autocorrelation function. The pacf τkk can be found from fitting the model Yt = µ + τ_{k,1}Yt−1 + · · · + τ_{k,k−1}Yt−k+1 + τkkYt−k + Ut.
§5.5.1 The invertibility condition. The MA(q) process is invertible if the roots of θ(z) = 0 lie outside the unit circle. The process Yt = θ(L)Ut can then be written as an AR(∞) process θ⁻¹(L)Yt = Ut.

MATH5315 Applied Statistics and Probability
Lecture 25: Univariate Time Series Analysis and Forecasting II
References: IEF chapter 5, pages 223-238, 247-251.

§5.6 ARMA processes. The ARMA(p, q) process is φ(L)Yt = µ + θ(L)Ut. The mean satisfies (1 − φ1 − φ2 − · · · − φp)E[Yt] = µ. AR(p) process: acf shows (oscillatory) geometric decay, pacf is zero after lag p. MA(q) process: acf is zero after lag q, pacf shows (oscillatory) geometric decay. ARMA(p, q) process: acf like AR, pacf like MA.
§5.6.1 Sample acf and pacf plots for standard processes. The correlogram (acf plot) has lines drawn at ±1.96/√n to indicate significant τ̂k; the pacf plot also has lines drawn at ±1.96/√n.
§5.7 Building ARMA models: the Box-Jenkins approach. Determine the model order; estimate the parameters; check model validity. Parsimonious models are best!
§5.7.1 Information criteria for ARIMA model selection. AIC is widely used (with SBIC).
§5.7.3 ARIMA modelling. Data are differenced to give stationarity.
§5.8 Constructing ARMA models in EViews. (NOT needed for the exam.) We use R.
§5.11.4 Forecasting with time series versus structural models. (NOT needed for the exam.) The conditional expectation is E[Yt+1|Ωt] = E[Yt+1|Y1, Y2, . . . , Yt].
§5.11.5 Forecasting with ARMA models. (NOT needed for the exam.) ARMA(p, q) model Yt = Σ_{i=1}^p aiYt−i + Σ_{j=1}^q bjUt−j. The forecast at time t + s is Ŷt+s = Σ_{i=1}^p aiŶt+s−i + Σ_{j=1}^q bjÛt+s−j, where Ŷk = Yk for k ≤ t, Ûk = Uk for k ≤ t, and Ûk = 0 for k > t. IEF uses the notation f_{t,s} ≡ Ŷt+s.
§5.11.6 Forecasting the future value of an MA(q) process. (NOT needed for the exam.) MA(3) process Yt = µ + Ut + θ1Ut−1 + θ2Ut−2 + θ3Ut−3. E.g., Yt+2 = µ + Ut+2 + θ1Ut+1 + θ2Ut + θ3Ut−1, so E[Yt+2|Ωt] = µ + θ2Ut + θ3Ut−1.
§5.11.7 Forecasting the future value of an AR(p) process. (NOT needed for the exam.) AR(2) process Yt = µ + φ1Yt−1 + φ2Yt−2 + Ut. E.g., Yt+2 = µ + φ1Yt+1 + φ2Yt + Ut+2, so E[Yt+2|Ωt] = µ + φ1Ŷt+1 + φ2Yt.

MATH5315 Applied Statistics and Probability
Lecture 26: Multivariate Models I
References: IEF chapter 3, pages 88-93; chapter 6, pages 265-271, 276-277.

§3.1 Generalising the simple model to multiple linear regression.
§3.2 The constant term. Writing the linear model in the form y = Xβ + u.
§3.3 How are the parameters (the elements of the β vector) calculated? β̂ = (X′X)⁻¹X′y, found by minimising (y − Xβ)′(y − Xβ).
§6.1 Motivations. Structural equations. Reduced form equations.
§6.2 Simultaneous equations bias.
§6.3 So how can simultaneous equations models be validly estimated?
§6.4 Can the original coefficients be retrieved from the πs?
§6.4.1 What determines whether an equation is identified or not?
§6.8 Estimation procedures for simultaneous equations systems.
§6.8.1 Indirect least squares (ILS). (NOT needed for the exam.)
§6.8.2 Estimation of just identified and overidentified systems using 2SLS. (NOT needed for the exam.) Using the R systemfit command.

MATH5315 Applied Statistics and Probability
Lecture 27: Multivariate Models II
References: IEF chapter 6, pages 290-293, 294-296, 298, 308-315.

§6.11 Vector autoregressive models.
§6.11.1 Advantages of VAR modelling.
§6.11.2 Problems with VARs.
§6.11.3 Choosing the optimal lag length for a VAR.
§6.11.5 Information criteria for VAR lag length selection.
§6.12 Does the VAR include contemporaneous terms?
§6.14 VARs with exogenous variables.
§6.17 VAR estimation in EViews. (NOT needed for the exam.) We use the R package vars.

MATH5315 Applied Statistics and Probability
Lecture 28: Cointegration
References: IEF chapter 7, pages 318-329, 335-341.

§7.1 Stationarity and unit root testing.
§7.1.1 Why are tests for non-stationarity necessary?
§7.1.2 Two types of non-stationarity.
§7.1.3 Some more definitions and terminology.
§7.1.4 Testing for a unit root.
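The Box-Jenkins cycle of Lectures 24–25, and the idea of differencing a non-stationary series, can be sketched in base R. The AR coefficient, candidate orders and series lengths below are chosen only for illustration:

```r
set.seed(6)

# Simulate an AR(1) process with phi = 0.7
y <- arima.sim(model = list(ar = 0.7), n = 400)

acf(y)    # geometric decay expected for an AR process
pacf(y)   # roughly zero after lag 1

# Box-Jenkins: fit candidate ARMA orders and compare by AIC
fit1 <- arima(y, order = c(1, 0, 0))
fit2 <- arima(y, order = c(2, 0, 0))
fit3 <- arima(y, order = c(1, 0, 1))
c(AIC(fit1), AIC(fit2), AIC(fit3))

predict(fit1, n.ahead = 5)      # forecasts E[Y_{t+s} | Omega_t] with standard errors

# A random walk is non-stationary; first differencing restores stationarity
rw <- cumsum(rnorm(400))
plot.ts(rw)
acf(diff(rw))
```

The dashed lines on the acf and pacf plots are the ±1.96/√n significance bounds mentioned in §5.6.1.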
§7.3 Cointegration.
§7.3.1 Definition of cointegration.
§7.4 Equilibrium correction or error correction models.
§7.5 Testing for cointegration in regression: a residuals-based approach.
§7.6 Methods of parameter estimation in cointegrated systems. (NOT needed for the exam.)
§7.6.1 The Engle-Granger 2-step method. (NOT needed for the exam.)

MATH5315 Applied Statistics and Probability
Lecture 29: ARCH Models
References: IEF chapter 8, pages 379-381, 383-384, 385-389.

§8.1 Motivations: an excursion into non-linearity land.
§8.1.1 Types of non-linear models.
§8.2 Models for volatility.
§8.3 Historical volatility.
§8.6 Autoregressive volatility models.
§8.7 Autoregressive conditionally heteroscedastic (ARCH) models.
§8.7.1 Another way of expressing ARCH models.
§8.7.2 Non-negativity constraints.
§8.7.3 Testing for ‘ARCH effects’.
§8.7.5 Limitations of ARCH(q) models.

MATH5315 Applied Statistics and Probability
Lecture 30: GARCH Models
References: IEF chapter 8, pages 392-399.

§8.8 Generalised ARCH (GARCH) models.
§8.8.1 The unconditional variance under a GARCH specification.
§8.9 Estimation of ARCH/GARCH models.
§8.9.1 Parameter estimation using maximum likelihood. (NOT needed for the exam.)
§8.9.2 Non-normality and maximum likelihood. (NOT needed for the exam.)
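A small base-R sketch of the ARCH ideas in Lectures 29–30, using the standard ARCH(1) form u_t = σ_t z_t with σ_t² = a0 + a1 u²_{t−1} (generic notation rather than IEF's; the coefficients a0 = 0.1 and a1 = 0.6 are made up for illustration). It shows the volatility clustering that motivates these models and a portmanteau check for ARCH effects; fitting an actual GARCH model in the practicals would typically use a dedicated R package:

```r
set.seed(7)

# Simulate an ARCH(1) process: u_t = sigma_t * z_t, sigma_t^2 = a0 + a1 * u_{t-1}^2
# (a0 > 0 and 0 <= a1 < 1 are needed for a valid, stationary specification)
a0 <- 0.1; a1 <- 0.6
n  <- 1000
u  <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- a0 / (1 - a1)                 # start at the unconditional variance
u[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- a0 + a1 * u[t - 1]^2
  u[t] <- sqrt(sigma2[t]) * rnorm(1)
}

plot.ts(u)                                 # volatility clustering is visible
acf(u)                                     # u_t itself is close to white noise
acf(u^2)                                   # but u_t^2 is autocorrelated: the 'ARCH effect'
Box.test(u^2, lag = 12, type = "Ljung-Box")  # portmanteau test for ARCH effects
```

The contrast between acf(u) and acf(u^2) is exactly the signature that the tests for ‘ARCH effects’ in §8.7.3 look for.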