Statistics for Finance. Lecture 5: Confidence Intervals, Hypothesis Testing.

1.1. Confidence Intervals. Suppose that we have a normal $N(\mu, \sigma^2)$ distribution with unknown mean and standard deviation. We have seen so far how to produce estimators for these quantities. These estimators tell us what the most likely value of each parameter is. However, it is very unlikely that an estimator will produce the exact value of the unknown parameter. It would be more desirable to produce a range of values such that the unknown parameter lies within this range with high probability. This is achieved by the construction of confidence intervals. We will defer a formal definition until later, after we have seen the philosophy behind the construction of a confidence interval.

1.1.1. Confidence Interval for the Mean with Known Standard Deviation. Suppose we have a distribution, not necessarily normal, with known variance $\sigma^2$ but unknown mean $\mu$, and suppose that we draw a sample $X_1, X_2, \ldots, X_n$ from this distribution. Then the sample mean
\[
\bar{X} = \frac{X_1 + \cdots + X_n}{n}
\]
has variance $\mathrm{Var}(\bar{X}) = \sigma^2/n$ and mean $E[\bar{X}] = \mu$, while the Central Limit Theorem tells us that the distribution of
\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
\]
is approximately standard normal, that is,
\[
P\left( -z < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z \right) \simeq \Phi(z) - \Phi(-z) = 2\Phi(z) - 1,
\]
where $\Phi(z) = \int_{-\infty}^{z} e^{-x^2/2}\,dx/\sqrt{2\pi}$. By a simple manipulation of the above we get that
\[
P\left( \bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}} \right) \simeq 2\Phi(z) - 1.
\]
Suppose now that we choose $z = z_{\alpha/2}$ such that $2\Phi(z_{\alpha/2}) - 1 = 1 - \alpha$ (this equation cannot be solved explicitly, but there are tables that give the values $z_{\alpha/2}$ for different values of $\alpha$, or you can use statistical software). Then
\[
P\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) \simeq 1 - \alpha.
\]
So, for example, if $\alpha = 0.05$, we obtain from the tables that $z_{0.025} \simeq 1.96$, and therefore the interval $\left[\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$ will contain the unknown value of the mean $\mu$ with probability $0.95$.

1.1.2. Confidence Interval for the Mean with Unknown Standard Deviation.
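To fix ideas before treating the unknown-variance case, here is a small numerical sketch of the known-$\sigma$ interval just derived. The sample values and $\sigma = 3$ are invented purely for illustration:

```python
import math
from statistics import NormalDist

# Hypothetical sample from a distribution whose sigma = 3 is assumed known
sample = [9.2, 11.5, 8.7, 10.9, 12.1, 9.8, 10.4, 8.9, 11.0]
sigma = 3.0
n = len(sample)
xbar = sum(sample) / n

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

half_width = z * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI for the mean: [{lower:.3f}, {upper:.3f}]")
```

The quantile is obtained from software rather than tables, but plays exactly the role of $z_{\alpha/2}$ in the derivation above.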
Suppose now that we want to construct a confidence interval for the mean of a distribution, but this time we do not know its standard deviation. In this case the standard deviation $\sigma$ in the $(1-\alpha)$-confidence interval $\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ should be replaced by an estimator, which we choose to be the sample variance $\hat{s}^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$. In other words, we would be tempted to say that the $(1-\alpha)$-confidence interval for the mean $\mu$ is
\[
\left[ \bar{X} - z_{\alpha/2}\frac{\hat{s}}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\hat{s}}{\sqrt{n}} \right].
\]
This is not exactly correct, though. The reason is that when we replace $\sigma$ by $\hat{s}$, i.e. when we consider the fraction
\[
\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}},
\]
the correct approximation of the distribution of this random variable is not the standard normal distribution, but rather $t_{n-1}$, the $t$-distribution with $(n-1)$ degrees of freedom. Therefore the normal quantiles $z_{\alpha/2}$ should be replaced with the corresponding quantiles $t_{n-1,\alpha/2}$ of the $t$-distribution with $(n-1)$ degrees of freedom. The correct $(1-\alpha)$-confidence interval in this case is
\[
\left[ \bar{X} - t_{n-1,\alpha/2}\frac{\hat{s}}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{\hat{s}}{\sqrt{n}} \right].
\]
We will now attempt to explain why the $t$-distribution, rather than the normal distribution, is the correct distribution to consider. Assume that the underlying distribution is normal $N(\mu, \sigma^2)$, with unknown mean and standard deviation. Suppose that a sample of size $n$, $X_1, X_2, \ldots, X_n$, is drawn from this distribution and consider the fraction
\[
\frac{\bar{X} - \mu}{\hat{s}/\sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\hat{s}/\sigma}. \tag{1}
\]
We claim that the distribution of this random variable is exactly the $t_{n-1}$-distribution with $(n-1)$ degrees of freedom. To prove this we need the following very interesting lemma.

Lemma 1. Consider a sequence $X_1, X_2, \ldots, X_n$ of i.i.d. standard normal variables. Then the sample average $\bar{X}$ is independent of the random vector $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$.

The proof of this fact is not difficult, but we will skip it since it is a bit lengthy. The detailed proof can be found in the book of Rice, Section 6.3.
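Returning to the practical side for a moment, a sketch of computing the corrected $t$-based interval; the data are invented, and the quantile $t_{8,0.025} = 2.306$ is taken from standard $t$-tables (hard-coded here to keep the example dependency-free):

```python
import math
from statistics import mean, stdev

sample = [5.1, 6.3, 4.8, 5.9, 6.7, 5.2, 4.5, 6.0, 5.6]
n = len(sample)                  # n = 9, so 8 degrees of freedom
xbar = mean(sample)
s_hat = stdev(sample)            # sample st. dev. with the (n - 1) divisor

t_q = 2.306                      # t_{8, 0.025} from the t-tables (alpha = 0.05)

half_width = t_q * s_hat / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"95% t-based CI for the mean: [{lower:.3f}, {upper:.3f}]")
```

Note that for the same data the $t$-interval is wider than the naive $z$-interval, reflecting the extra uncertainty from estimating $\sigma$.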
Let us just say that to prove the statement it is enough to show that for any numbers $u, u_1, \ldots, u_n$ it holds that
\[
E\left[ e^{u\bar{X} + \sum_{i=1}^{n} u_i (X_i - \bar{X})} \right] = E\left[ e^{u\bar{X}} \right] E\left[ e^{\sum_{i=1}^{n} u_i (X_i - \bar{X})} \right].
\]
Now, clearly $\sqrt{n}(\bar{X} - \mu)/\sigma$ is a standard normal random variable. On the other hand we have

Lemma 2. If $X_1, X_2, \ldots, X_n$ are i.i.d. normal $N(\mu, \sigma^2)$, then the distribution of
\[
(n-1)\frac{\hat{s}^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2
\]
is a $\chi^2_{n-1}$ distribution, with $(n-1)$ degrees of freedom.

Proof. Note that
\[
\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n,
\]
as a sum of the squares of $n$ i.i.d. standard normals. Moreover,
\[
\frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}\big((X_i - \bar{X}) + (\bar{X} - \mu)\big)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})^2 + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \right)^2,
\]
where we also used the fact that $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$. The above equation is of the form $W = U + V$, where $U, V$ are independent by the previous lemma. Also, $W$ and $V$ have distributions $\chi^2_n$ and $\chi^2_1$, respectively. If $M_W(t)$ denotes the moment generating function of $W$, and similarly for $U, V$, we have by independence that
\[
M_U(t) = \frac{M_W(t)}{M_V(t)} = \frac{(1-2t)^{-n/2}}{(1-2t)^{-1/2}} = (1-2t)^{-(n-1)/2},
\]
where we used the fact that the moment generating function of a $\chi^2_n$ with $n$ degrees of freedom is $(1-2t)^{-n/2}$. $\square$

From the above lemma, as well as the definition of a $t$-distribution (recall that if $Z$ is standard normal, $U \sim \chi^2_r$, and $Z, U$ are independent, then $Z/\sqrt{U/r} \sim t_r$), it follows that the distribution of (1) is exactly the $t_{n-1}$-distribution.

1.1.3. Confidence Intervals for the Variance. Let us consider the particular case of an i.i.d. normal sample $X_1, X_2, \ldots, X_n$. Let $\hat{\sigma}^2$ be the maximum likelihood estimator of the variance, i.e.
\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2.
\]
Then by Lemma 2 we have that
\[
\frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-1}.
\]
Let us denote by $\chi^2_{m,\alpha}$ the chi-square quantile, i.e. the point beyond which the chi-square distribution with $m$ degrees of freedom has probability $\alpha$. Then we have
\[
P\left( \chi^2_{n-1,1-\alpha/2} < \frac{n\hat{\sigma}^2}{\sigma^2} < \chi^2_{n-1,\alpha/2} \right) = 1 - \alpha,
\]
and solving for $\sigma^2$ we get that
\[
P\left( \frac{n\hat{\sigma}^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{n\hat{\sigma}^2}{\chi^2_{n-1,1-\alpha/2}} \right) = 1 - \alpha.
\]
Therefore the $(1-\alpha)$-confidence interval for the variance is
\[
\left[ \frac{n\hat{\sigma}^2}{\chi^2_{n-1,\alpha/2}},\ \frac{n\hat{\sigma}^2}{\chi^2_{n-1,1-\alpha/2}} \right].
\]

1.1.4. Confidence Intervals for General Parameters. Suppose now that we want to construct confidence intervals for some parameter $\theta$ of a distribution. For a general parameter, other than the mean or the variance, the construction of confidence intervals as above is more difficult, since rather detailed information on the distribution is required. We can get around this difficulty by constructing approximate confidence intervals with the help of maximum likelihood. In particular, we know from Theorem 1 of Lecture 3 that for a parameter $\theta$ with MLE $\hat{\theta}$, the quantity $\sqrt{nI(\theta)}\,(\hat{\theta} - \theta)$ is approximately standard normal. Therefore, if $z_{\alpha/2}$ is the corresponding quantile of the standard normal, we have that
\[
P\left( -z_{\alpha/2} < \sqrt{nI(\theta)}\,(\hat{\theta} - \theta) < z_{\alpha/2} \right) \simeq 1 - \alpha.
\]
The difficulty in this equation is to solve for $\theta$. We can make our life easier by assuming that the distribution of $\sqrt{nI(\hat{\theta})}\,(\hat{\theta} - \theta)$ is also approximately standard normal. It then follows that the $(1-\alpha)$-confidence interval is approximately
\[
\left[ \hat{\theta} - \frac{z_{\alpha/2}}{\sqrt{nI(\hat{\theta})}},\ \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{nI(\hat{\theta})}} \right].
\]

1.1.5. What is the Confidence Interval? We have seen several ways of constructing confidence intervals. Let us now discuss how we should interpret a confidence interval. The confidence interval should itself be interpreted as a random object. It is a random interval, e.g., in the case of the mean, of the form
\[
\left[ \bar{X} - t_{n-1,\alpha/2}\frac{\hat{s}}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{\hat{s}}{\sqrt{n}} \right], \tag{2}
\]
but $\bar{X}$ and $\hat{s}$ are functions of the sample, and so they should themselves be considered random variables. An interval like the one above should be interpreted as a realization of a random interval which, with probability $(1-\alpha)$, contains the unknown parameter (in this case the mean). As an example we do the following experiment.
We generate 20 independent samples of size 9 each from a normal distribution with mean $\mu = 10$ and variance $\sigma^2 = 9$. For each of these samples we form the resulting 0.9-confidence interval for the mean, which will be of the form
\[
\left[ \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right] = \left[ \bar{X} - 1.64\,\frac{3}{\sqrt{9}},\ \bar{X} + 1.64\,\frac{3}{\sqrt{9}} \right] = \left[ \bar{X} - 1.64,\ \bar{X} + 1.64 \right],
\]
where $\bar{X}$ is the corresponding sample mean in each of the 20 samples. Once we generate these intervals, we expect that 90% of them, that is, about 18 of them, will contain the value 10, which corresponds to the real population mean. Be careful: we expect that about 18 will contain the mean! This does not mean that exactly 18 of the intervals will surely contain the actual mean, since, as we said, the outcome of each interval is itself random and depends on the realisation of the sample.

1.2. Hypothesis Testing. Let us start with an example. Suppose that $X_1, X_2, \ldots, X_n$ is a sample drawn from a normal distribution with unknown mean $\mu$ and variance $\sigma^2$. Consider testing the following hypotheses:
\[
H_0: \mu = \mu_0 \qquad H_A: \mu \neq \mu_0.
\]
The hypothesis $H_0$ is called the null hypothesis, while the hypothesis $H_A$ is called the alternative hypothesis. The idea is that one starts by assuming that the mean of the normal distribution is $\mu_0$, and then proceeds to check whether this assumption should be accepted as true, or whether it should be rejected in favor of the alternative hypothesis $H_A$, which claims that the mean $\mu \neq \mu_0$. We would like to construct a test, based on which we will reject or accept the null hypothesis. Of course, since we deal with random events, there will always be a probability of a false decision, that is, of accepting the null hypothesis as correct when it is not, or of accepting the alternative hypothesis as correct when it is not. The former type of error is called a Type II error, while the latter is called a Type I error. We will come back to this point in a minute. First, we need to construct the test.
Again, as on many occasions so far, there are several ways to construct an appropriate test. Here we will present the test that is dual to confidence intervals. We start with the assumption that the null hypothesis is correct, i.e. that the mean of the distribution is $\mu_0$. Then, as before, the random variable
\[
Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}
\]
is standard normal. The random variable $Z$ is called the test statistic that we use. Suppose that the actual value of the random variable $Z$, as this emerges from the sampling, is such that
\[
\left| \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \right| > z_{\alpha/2}, \tag{3}
\]
where $z_{\alpha/2}$ is the $\alpha/2$ standard normal quantile. The probability that something like this happens is $P(|Z| > z_{\alpha/2}) = \alpha$. Therefore, if $\alpha$ is sufficiently small, the probability of obtaining sample data that result in a test statistic satisfying (3) is very small (and equal to $\alpha$). It is, therefore, unlikely that we simply got "strange data", and we prefer to say that our null hypothesis was wrong and reject it in favor of the alternative hypothesis. Of course, there is always the possibility that we really did get "strange data" and falsely rejected the null hypothesis. In that case we have committed a Type I error. The probability of this happening is $\alpha$, and it is called the significance level of our test. We finally say that the region
\[
\left\{ x : \left| \frac{x - \mu_0}{\sigma/\sqrt{n}} \right| > z_{\alpha/2} \right\}
\]
is the rejection region for the test statistic (3) at significance level $\alpha$. In other words, we will reject the null hypothesis if the data produce a sample mean that falls into the rejection region. The above type of hypothesis testing is called two-sided. We could also have a one-sided hypothesis test, which would consist of
\[
H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0.
\]
In this case it is easy to see that the rejection region at significance level $\alpha$ should be
\[
\left\{ x : \frac{x - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha} \right\}.
\]
One proceeds similarly in the case where $>$ is replaced by $<$. Since we were dealing with normal distributions, the computations of the above probabilities were exact.
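As a sketch, the two-sided $z$-test just described can be carried out numerically as follows; the sample, $\mu_0$, and $\sigma$ are invented for illustration:

```python
import math
from statistics import NormalDist

# Hypothetical data: test H0: mu = 10 against HA: mu != 10, with sigma known
sample = [11.2, 12.8, 9.5, 13.1, 12.4, 11.9, 10.7, 12.2, 13.0]
mu_0, sigma = 10.0, 3.0
n = len(sample)
xbar = sum(sample) / n

z = (xbar - mu_0) / (sigma / math.sqrt(n))      # observed test statistic

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}

reject = abs(z) > z_crit                        # two-sided rejection rule (3)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
print(f"z = {z:.3f}, reject H0 at alpha=0.05: {reject}, p-value = {p_value:.4f}")
```

For these particular numbers the statistic falls just short of the critical value, so $H_0$ is not rejected at the 5% level, illustrating that a sample mean visibly above $\mu_0$ need not be statistically significant.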
In the case where we want to test the mean of a general distribution, we make use of the Central Limit Theorem and then proceed similarly. The only thing that changes is that the equation $P(|Z| > z_{\alpha/2}) = \alpha$ is replaced by $P(|Z| > z_{\alpha/2}) \simeq \alpha$. As in the case of confidence intervals with unknown variance, when the variance of the distribution is unknown, we have to replace it with the sample variance $\hat{s}^2$. Then we also need to use the $t$-distribution instead of the normal. In this case the test statistic that we use is
\[
\frac{\bar{X} - \mu_0}{\hat{s}/\sqrt{n}}.
\]
The rejection region at significance level $\alpha$ (in the case of two-sided hypothesis testing) will be
\[
\left\{ x : \left| \frac{x - \mu_0}{\hat{s}/\sqrt{n}} \right| > t_{n-1,\alpha/2} \right\},
\]
where $t_{n-1,\alpha/2}$ is the quantile of the $t$-distribution with $(n-1)$ degrees of freedom (if our sample size is $n$). The smallest significance level at which the null hypothesis would be rejected is called the p-value.

Example 1. A stock trading company institutes a new system in order to reduce the trade time of a stock. The mean waiting time under the specific conditions with the previous system was 6.1 minutes. A sample of 14 stock trades is taken. The times are measured at widely separated times, so as to eliminate the possibility of dependent observations. The resulting sample mean is 5.043 and the sample standard deviation is 2.266. Test the null hypothesis of no change against an appropriate research hypothesis using $\alpha = .10$.

We are interested in the value of the mean trading time and whether the new system reduces it. Since the current mean waiting time is 6.1, we can formulate the null and alternative hypotheses as
\[
H_0: \mu = 6.1 \qquad H_1: \mu < 6.1.
\]
Since we use the sample standard deviation, we will use the quantiles of the $t$-distribution.
We form the $t$-test statistic
\[
t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{5.043 - 6.1}{2.266/\sqrt{14}} = -1.75.
\]
For $\alpha = .10$ and 13 degrees of freedom, the quantile is $t_{.10} = 1.350$ and the rejection region is
\[
\{ t : t < -1.350 \}.
\]
This is because we are dealing with one-sided hypothesis testing. Since the observed value belongs to the rejection region, we reject the null hypothesis in favor of the alternative hypothesis. The p-value is equal to $P(T_{13} < -1.75)$, where $T_{13}$ is a random variable with a $t$-distribution and 13 degrees of freedom. The exact value is found using the appropriate tables or software.

We summarise hypothesis testing for the mean, when the standard deviation is unknown, in the following table:

$H_0$: $\mu = \mu_0$
$H_1$: 1. $\mu > \mu_0$  2. $\mu < \mu_0$  3. $\mu \neq \mu_0$
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Rejection region: for a given probability $\alpha$ of Type I error, reject $H_0$ if 1. $t > t_\alpha$  2. $t < -t_\alpha$  3. $|t| \geq t_{\alpha/2}$, where $t_\alpha$ cuts off a right-tail area of $\alpha$ in a $t$-distribution with $n-1$ degrees of freedom
p-value: 1. $P(T_{n-1} > t_{\text{actual}})$  2. $P(T_{n-1} < t_{\text{actual}})$  3. $2P(T_{n-1} > |t_{\text{actual}}|)$

1.3. Exercises.

1. A random sample of 20 vice executives of Fortune 500 firms is taken. The amount each executive paid in federal income taxes, as a percentage of gross income, is determined. The data are

16.0 18.1 18.6 20.2 21.7 22.4 22.4 23.1 23.2 23.5
24.1 24.3 24.7 25.2 25.9 26.3 27.9 28.0 30.4 33.7

A. Compute the sample mean and the sample standard deviation.
B. Calculate a 95% confidence interval for the (population) mean.
C. Calculate a 99% confidence interval for the (population) mean.
D. Calculate a 95% confidence interval for the (population) variance.
E. Calculate a 99% confidence interval for the (population) variance.
F. Give a careful verbal interpretation of the above confidence intervals.

2. In the above exercise repeat questions A. and B., assuming that you know that the population standard deviation is $\sigma = 4.0$.

3. Use Minitab to compute the confidence intervals in Exercise 1.

4.
Often we are interested in how large a sample we need to take in order to obtain a sufficiently accurate confidence interval. This is outlined in the following statement: the sample size required to obtain a $100(1-\alpha)\%$ confidence interval of the form $\bar{X} \pm E$ for a population mean $\mu$ (assuming that the population standard deviation $\sigma$ is known) is
\[
n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}.
\]
A. Derive the above statement, i.e. prove it!
B. What would be the corresponding statement if the population standard deviation $\sigma$ is unknown? Prove it!
Note: often $2E$ is called the width of the confidence interval.

5. Union officials are concerned about reports of inferior wages being paid to employees of a company under their jurisdiction. How large a sample is needed to obtain a 90% confidence interval for the population mean hourly wage $\mu$ with width equal to £1.00? Assume that $\sigma = £4.00$.

6. The manager of a health maintenance organization has set as a target that the mean waiting time of nonemergency patients should not exceed 30 minutes. In spot checks, the manager finds the waiting times for 22 patients. The patients are selected randomly on different days. Assume that the population standard deviation of waiting times is 10 minutes.
A. What is the relevant parameter to be tested?
B. Formulate the null and alternative hypotheses.
C. State the test statistic and the rejection region corresponding to $\alpha = .05$.

7. The battery pack of a hand calculator is supposed to perform 20,000 calculations before needing a recharge. The quality control manager for the manufacturer is concerned that the pack may not be working for as long as the specifications state. A test of 114 battery packs gives an average of 19,695 calculations and a standard deviation of 1103.
A. Formulate the null and alternative hypotheses.
B. Calculate the appropriate test statistic and p-value.
C. Calculate a 95% confidence interval.

8. Use Minitab to confirm the example given in Section 1.1.5.
That is, generate 20 samples of size 9 each from a normal distribution with mean 10 and variance 9. Construct (using Minitab or by hand) the 20 corresponding 90% confidence intervals for the mean. How many contain the actual value $\mu = 10$? Do the same thing by constructing the confidence intervals for the variance.
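For readers without Minitab, here is a sketch of the coverage experiment of Section 1.1.5 (and Exercise 8) in Python; the random seed is arbitrary and only fixes the realisation:

```python
import random
import statistics

random.seed(1)  # arbitrary seed, chosen only for reproducibility

mu, sigma, n, n_reps = 10.0, 3.0, 9, 20
z = 1.64        # z_{0.05}, so each interval is X_bar +/- 1.64 * sigma / sqrt(n)

covered = 0
for _ in range(n_reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    half = z * sigma / n ** 0.5          # equals 1.64 here, since sigma = 3, n = 9
    if xbar - half < mu < xbar + half:
        covered += 1

print(f"{covered} of {n_reps} intervals contain mu = {mu}")  # about 18 expected
```

Different seeds will give different counts; around 18 of the 20 intervals should contain the true mean on average, but any particular run may give more or fewer.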