Statistical Inference and Tests of Hypotheses

Article by Pat McGillion, current Examiner - Formation 1 Business Mathematics & Quantitative Methods.

Introduction.

Section 5 of the syllabus relates to sampling theory. A question on statistical inference is frequently attempted incorrectly. The following discussion addresses some of the more salient points.

Statistical Inference and Tests of Hypotheses.

Problems of statistical inference arise just about everywhere – in business, in science, in everyday life. In business, a HR manager may want to know how much variability there is in the absentee level of employees; in medicine, a researcher may wish to determine the average time it takes an adult to react to a certain drug; an auditor may wish to test the variability of errors in the company accounts; in everyday life, the CEO of the Safety Council may want to find out what percentage of one-car accidents is due to driver fatigue. Among these illustrations, the HR manager's and auditor's problems concern a measure of variation (a standard deviation, perhaps), the researcher's problem concerns a population mean, and the safety problem concerns a percentage. Conceptually, all such problems are treated in the same way, although there are differences in the particular methods employed. These are examples of statistical inference that are problems of estimation. Statistical inference is concerned with the way sample results are used to estimate or infer values for the population; that is, a sample mean, x̄, is used to estimate the population mean, μ. However, these become tests of hypotheses if the HR manager wants to check whether the standard deviation of the absentee level of employees is really 3 days; if the researcher wishes to decide whether the average time it takes a patient to respond to drug treatment is 2.3 weeks; or if the Safety Council wants to confirm its belief that 35% of all one-car accidents are due to driver fatigue.
These decision problems are referred to as tests of hypotheses. To develop procedures for testing statistical hypotheses, it is necessary to know what to expect when a hypothesis is true and, for this reason, we often hypothesise the opposite of what we hope to prove. For example, if we wish to prove that one method of teaching is more effective than another, we hypothesise that the two methods are equally effective; if we wish to show that one car is more fuel efficient than another, we hypothesise that they are equally efficient. That is, we hypothesise that there is no difference between the two teaching methods or the two cars. Such hypotheses are called null hypotheses and are denoted by H0. In effect, the term 'null hypothesis' is used for any hypothesis set up primarily to see whether it can be rejected. The concept is also common in non-statistical settings; for example, in a court case an accused is presumed innocent until guilt is established beyond a reasonable doubt.

The null hypothesis puts forward the view that the true mean, μ, is equal to a specified value, μ0, and is stated as H0: μ = μ0. This view can only be rejected on the basis of significant statistical evidence, and its rejection necessarily implies the acceptance of another view. This is the alternative hypothesis, denoted by H1, and it is a statement of the view we are prepared to accept if we reject H0. If the test is concerned only with whether the population mean is equal to a certain value (μ = μ0) or different from it (μ ≠ μ0), the alternative hypothesis takes the form H1: μ ≠ μ0.

In testing the null hypothesis, two types of error can occur: (i) the null hypothesis may be rejected when it is true – a Type 1 error; (ii) the null hypothesis may be accepted when it is false – a Type 2 error. Avoiding a Type 1 error is the main concern of problems in this area.
When testing the null hypothesis we state the maximum risk we are willing to accept of committing a Type 1 error, that is, the probability of a Type 1 error. This is the level of significance and is typically either 5% (0.05) or 1% (0.01). With a 95% confidence interval, there is a 95% chance of any sample mean (x̄) lying within 1.96 standard errors of the true population mean. However, there is still a 5% chance of a single sample mean lying outside this 95% confidence interval. A confidence interval can therefore be regarded as a set of acceptable hypotheses. If we are dealing with a Normal distribution, this 5% chance can be split evenly between the two tails of the distribution. If a sample mean lies outside the confidence limits (x1 or x2), the decision will be to reject H0, even though we might be wrong in rejecting it. However, the chance of being wrong is less than 5%, since there was a less than 5% chance of obtaining a sample mean outside the confidence limits when μ is the true population mean. The confidence limits for the population mean, μ, are regarded as the critical values for tests of hypotheses: the values outside of which we are willing to reject the null hypothesis – the values that are critical to the decision.

The size of a Type 2 error can be reduced by careful definition of the rejection region. If the level of significance is reduced (say from 5% to 1%), we are less willing to reject the null hypothesis when it is in fact true: a 1% significance level reduces the chance of a Type 1 error, since it excludes only the most extreme 1% of observations, though it increases the chance of a Type 2 error. If the z score, that is, [(sample mean – hypothesised mean)/standard error], calculated for the sample data falls outside the z critical value, then H0 is rejected. In the case of a two-sided test, either the population mean as specified is accepted or the alternative, that the true mean is larger or smaller, is accepted.
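The decision rule just described can be sketched in Python. This is illustrative only; the function name and parameters are my own, not from the article. It computes the z score and compares it with the critical value from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist  # standard library (Python 3.8+)

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z statistic, critical value, whether H0: mu = mu0 is rejected)."""
    se = sigma / sqrt(n)                     # standard error of the mean
    z = (sample_mean - mu0) / se
    if alternative == "two-sided":
        # split alpha evenly between the two tails, e.g. 2.5% in each for alpha = 5%
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif alternative == "less":              # H1: mu < mu0, reject in the left tail
        z_crit = -NormalDist().inv_cdf(1 - alpha)
        reject = z < z_crit
    else:                                    # "greater": H1: mu > mu0, right tail
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > z_crit
    return z, z_crit, reject
```

For a two-sided test at the 5% level the computed critical value is 1.96, and for a one-sided test it is 1.645, matching the figures quoted in the text.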
If the alternative hypothesis states that the population mean is less than the specified value (H1: μ < μ0) or greater than it (H1: μ > μ0), the test is one-sided. With a one-sided test there is a greater chance that the null hypothesis will be rejected. Therefore, to test a hypothesis, the summarised steps are:

- state the hypotheses H0 and H1
- state the significance level
- state the critical values
- calculate the z score, the test statistic [(x̄ – μ)/(σ/√n)], for the sample; the sample standard deviation, s, may be used for σ, provided the sample size is reasonably large
- compare this sample z score with the z critical value(s)
- come to a conclusion: accept or reject H0
- state the conclusion in words, that is, the sample evidence does (or does not) support the null hypothesis at the stated significance level.

Application.

The HR manager's situation gives a number of examples of how this can operate.

(i) From the company annual report, the HR manager believes that the average employee weekly wage for the company is €1,000 with a standard deviation of €300. To check this, a sample of 225 employees finds the sample average wage to be €950. To test this belief the hypotheses are stated as: H0: μ = €1,000; H1: μ ≠ €1,000. This is a two-tailed test, since the alternative hypothesis has been expressed as ≠: we are interested both in the possibility that μ > €1,000 and that μ < €1,000. If the hypothesis is tested at a 5% level of significance, 2.5% is placed in each tail of the distribution, and the critical z values are ±1.96. The z statistic = [(950 – 1,000)/(300/√225)] = –2.5. Since –2.5 lies outside the critical values ±1.96, H0 is rejected: the evidence does not support the HR manager's belief that the average wage is €1,000 per week.

(ii) In negotiations with the unions, the HR manager claims that the company's electricians are paid an average weekly wage of €800.
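The arithmetic in example (i) can be checked directly. A minimal sketch using only the figures given in the text:

```python
from math import sqrt

# Example (i): H0: mu = 1000, H1: mu != 1000 (two-tailed test)
# sigma = 300, n = 225, sample mean = 950, 5% significance level
z = (950 - 1000) / (300 / sqrt(225))   # standard error = 300/15 = 20, so z = -2.5

# |z| = 2.5 exceeds the two-tailed critical value of 1.96, so H0 is rejected
reject_h0 = abs(z) > 1.96
print(z, reject_h0)                    # -2.5 True
```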
The unions reject the claim, believing that the average wage is less. They sample 100 employees and obtain an average weekly wage of €770 with a standard deviation of €200. The hypotheses are stated as: H0: μ = €800; H1: μ < €800. This is a one-tailed test. The test statistic is [(770 – 800)/(200/√100)] = –1.5. The rejection region corresponding to a 5% significance level is defined by z = –1.645 or less. Since –1.5 does not fall in this region, H0, and with it the manager's claim, is accepted: on the basis of the sample there is insufficient evidence to reject it, even though the sample showed a difference of €30.

(iii) The manager states that the average weekly wage of middle managers in the company is €1,200, but the financial accountant believes that the figure is in excess of this level. This again is a one-tailed test and the hypotheses are stated as: H0: μ = €1,200; H1: μ > €1,200. Based on a sample of 100 managers with an average wage of €1,300 and standard deviation of €300, the test statistic is [(1,300 – 1,200)/(300/√100)] = +3.33. The rejection region corresponding to a 5% significance level is defined by z = 1.645 or more. Since the test statistic falls in the rejection region (in the right-hand tail of the distribution) and is greater than the critical value, the null hypothesis is rejected and the claim that the average wage is in excess of €1,200 is accepted.

Conclusion.

In practice, the difference between the two calculated values for the standard error may be small, but it is good practice to remain consistent with the null hypothesis when carrying out the test. It is assumed that the null hypothesis is correct, and the statement is rejected only if the test statistic produces an extreme value – a large number of standard errors from the estimate. In the case of small samples it is not accurate to assume that the distribution of sample means follows a normal distribution, and the z tables can no longer be used to derive probabilities. Instead, the distribution of sample means follows the t distribution, which changes as the sample size changes.
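The two one-tailed tests in examples (ii) and (iii) can be verified the same way. This is a sketch; the helper name z_stat is my own, not from the article.

```python
from math import sqrt

def z_stat(x_bar, mu0, s, n):
    """z test statistic: (sample mean - hypothesised mean) / standard error."""
    return (x_bar - mu0) / (s / sqrt(n))

# Example (ii): H0: mu = 800, H1: mu < 800; left-tail critical value -1.645
z2 = z_stat(770, 800, 200, 100)        # -30 / 20 = -1.5
reject2 = z2 < -1.645                  # False: H0 is not rejected

# Example (iii): H0: mu = 1200, H1: mu > 1200; right-tail critical value +1.645
z3 = z_stat(1300, 1200, 300, 100)      # 100 / 30 = +3.33
reject3 = z3 > 1.645                   # True: H0 is rejected
```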
Hence the earlier comments about the need to establish that the sample size is large.

[Figure: graphical representation of the acceptance region (accept H0) and the two rejection regions for H0 (H1: μ < μ0 in the left tail, H1: μ > μ0 in the right tail), with z critical values subject to the level of significance.]
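To illustrate why sample size matters, the snippet below contrasts the standard normal critical value with the textbook t critical value for a small sample. The specific sample size (n = 10) and test statistic (1.7) are illustrative assumptions, not figures from the article.

```python
# One-tailed test at the 5% significance level.
z_crit = 1.645        # standard normal upper 5% point (large samples)
t_crit_df9 = 1.833    # t distribution upper 5% point, 9 degrees of freedom (n = 10)

# A test statistic 1.7 standard errors above the hypothesised mean would
# reject H0 under the z test but not under the small-sample t test:
# the t distribution demands stronger evidence when n is small.
test_statistic = 1.7
print(test_statistic > z_crit)      # True  (reject with z)
print(test_statistic > t_crit_df9)  # False (do not reject with t)
```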