Statistical Inference and Tests of Hypotheses

Article by Pat McGillion, current Examiner - Formation 1 Business Mathematics &
Quantitative Methods.
Introduction. Section 5 of the syllabus relates to sampling theory. Questions on statistical inference
are frequently attempted incorrectly. The following discussion addresses some of the more salient
points.
Statistical Inference and Tests of Hypotheses. Problems of statistical inference arise just about
everywhere – in business, in science, in everyday life. In business a HR manager may want to know
how much variability there is in the absentee level of employees; in medicine a researcher may wish
to determine the average time that it takes an adult to react to a certain drug; an auditor may wish to
test the variability of errors in the company accounts; in everyday life the CEO of the Safety Council
may want to find out what percentage of one-car accidents are due to driver fatigue. Among the
above illustrations the HR manager’s and auditor’s problems concern a measure of variation (a
standard deviation, perhaps), the researcher’s problem concerns a population mean and the
everyday safety problem concerns a percentage. Conceptually all such problems are treated in the
same way but there are differences in the particular methods employed.
These are examples of statistical inference that are problems of estimation. Statistical inference is
concerned with the way sample results are used to estimate or infer values for the population; that is,
a sample mean, x̄, is used to estimate the population mean, μ. However, these become tests of
hypotheses if the HR manager wants to check whether the standard deviation of the absentee level
of employees is really 3 days; if the researcher wishes to decide whether the average time it takes a
patient to respond to drug treatment is 2.3 weeks; or if the Safety Council wants to confirm its belief
that 35% of all one-car accidents are due to driver fatigue. Such decision problems are referred to as
tests of hypotheses.
To develop procedures for testing statistical hypotheses, it is necessary to know what to expect when
a hypothesis is true and, for this reason, we often hypothesise the opposite to what we hope to
prove. For example, if we wish to prove that one method of teaching is more effective than another
we hypothesise that the two methods are equally effective; if we wish to show that one car is more
fuel efficient than another, we hypothesise that they are equally efficient, that is, we hypothesise that
there is no difference between the two teaching methods or between the two cars. These hypotheses
are called null hypotheses and are denoted by H0. In effect, the concept ‘null hypothesis’ is used for
any hypothesis set up primarily to see whether it can be rejected. The concept is also common in
non-statistical settings; for example, in a court case an accused is presumed innocent until guilt is
established beyond a reasonable doubt.
The null hypothesis challenges us with the view that the true mean, μ, is equal to a specified value,
μ0, and can be stated as H0: μ = μ0. This view can only be rejected on the basis of significant
statistical evidence. The rejection of this view necessarily implies the acceptance of another. This is
an alternative hypothesis denoted by H1 and is a statement of the view that we are prepared to
accept if we reject H0. If the test were concerned only with the view that the population mean were
equal to a certain value (that is, μ = μ0) or different (μ ≠ μ0), then the alternative hypothesis would
take the form H1: μ ≠ μ0. In testing the null hypothesis, there are two types of error that might occur:
(i) the null hypothesis may be rejected when it is true – a Type 1 error;
(ii) the null hypothesis may be accepted when it is false – a Type 2 error.
Avoiding a Type 1 error is the main concern of problems in this area. When testing the null
hypothesis we state the maximum risk we are willing to accept in committing a Type 1 error, that is,
the probability of a Type 1 error. This is the level of significance and is typically either 5% (0.05) or
1% (0.01). With a 95% confidence interval, there is a 95% chance of any sample mean, x̄, lying
within 1.96 standard errors of the true population mean. However, there is still a 5% chance of a
single sample mean lying outside this 95% confidence interval. Therefore, a confidence interval can
be regarded as a set of acceptable hypotheses. If we are dealing with a normal distribution, this 5%
chance can be split evenly between the two tails of the distribution. If a sample mean lies outside the
confidence limits (below x̄1 or above x̄2), then the decision will be to reject H0 – even though we
might be wrong in rejecting it. However, the chance of being wrong is less than 5%, since there was
a less than 5% chance of getting a sample mean outside the confidence limits when μ is the
population mean.
The confidence limits for the population mean, μ, are regarded as the critical values for tests of
hypotheses. These are the values outside of which we are willing to reject the null hypothesis – the
values that are critical to the decision. The size of a Type 2 error can be influenced by careful
definition of the rejection region. If the level of significance is reduced (say from 5% to 1%) we are
less willing to reject the null hypothesis when it is in fact true: only the most extreme 1% of
observations now leads to rejection, so the chance of a Type 1 error falls. The trade-off is that the
chance of a Type 2 error rises.
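To illustrate, the two-tailed critical values used throughout this article can be recovered from the standard normal distribution. A minimal sketch using Python's standard library (the helper name z_critical is our own, not part of the syllabus):

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Two-tailed critical value: half of alpha is placed in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.05), 2))  # 1.96 at the 5% level
print(round(z_critical(0.01), 2))  # 2.58 at the 1% level
```

Note that tightening the significance level from 5% to 1% pushes the critical value outwards, which is exactly why rejection becomes harder and a Type 2 error becomes more likely.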
If the z score, that is, [(sample mean – hypothesised mean)/standard error] is calculated for the
sample data and found to be outside the z critical value, then H0 is rejected.
In the case of a two-sided confidence interval the population mean as specified is accepted or the
alternative, that the true mean is larger or smaller, is accepted. If we wish to test the null hypothesis
that the population mean is less than the specified value (H1: μ < μ0 ) or greater (H1: μ > μ0 ), the test
would be one-sided. If we construct one-sided tests, there is a greater chance that the null
hypothesis will be rejected.
Therefore, to test a hypothesis, the summarised steps are:
- state the hypotheses H0 and H1
- state the significance level
- state the critical values
- calculate the z score, the test statistic [(x̄ – μ)/(σ/√n)], for the sample; the sample standard
deviation, s, is used for σ, provided that the sample size is reasonably large
- compare this sample z score with the critical value(s) of z
- come to a conclusion: accept or reject H0
- state the conclusion in words, that is, the sample evidence does (or does not) support the null
hypothesis at the stated significance level.
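The steps above can be sketched as a single routine. This is a minimal illustration in Python's standard library; the function name z_test and its return values are our own naming, not part of the syllabus:

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Compute the z statistic and compare it with the critical value(s).

    tail: "two" for H1: mu != mu0, "left" for H1: mu < mu0,
          "right" for H1: mu > mu0.
    Returns (z statistic, whether H0 is rejected).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))       # test statistic
    if tail == "two":
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_crit
    elif tail == "left":
        z_crit = NormalDist().inv_cdf(alpha)    # negative critical value
        reject = z < z_crit
    else:
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > z_crit
    return z, reject
```

For instance, a two-tailed test of H0: μ = 1000 with x̄ = 950, σ = 300 and n = 225 gives z = –2.5 and a rejection of H0 at the 5% level.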
Application. The HR manager gives a number of examples of how this can operate.
(i) From the company annual report, the HR manager believes that the average employee weekly
wage for the company is €1,000 with a standard deviation of €300. To test this belief, a sample of
225 employees finds the sample average wage to be €950. The hypotheses are stated as:
H0: μ = €1000; H1: μ ≠ €1000.
This is a two-tailed test since the alternative hypothesis (H1) has been expressed as ≠. Since
the alternative hypothesis (H1) says that μ ≠ €1000, we are also interested in situations where μ
> €1000 and μ < €1000. If the hypothesis is tested at a 5% level of significance, 2.5% is placed
in each tail of the distribution. H0 is rejected if we believe that μ > €1000 or μ < €1000.
The z score (critical value) for 5% is ± 1.96.
The z statistic = [(950 – 1000)/(300/√225)] = –2.5. Since –2.5 lies outside ±1.96, H0 is rejected.
Therefore, the evidence does not support the HR manager’s claim that the average wage is
€1,000 per week.
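The arithmetic of this two-tailed test can be checked directly (a small sketch; the variable names are our own):

```python
from math import sqrt

standard_error = 300 / sqrt(225)    # 300 / 15 = 20
z = (950 - 1000) / standard_error   # -50 / 20 = -2.5
reject_h0 = abs(z) > 1.96           # 2.5 > 1.96, so H0 is rejected
print(z, reject_h0)                 # -2.5 True
```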
(ii) In negotiations with the unions, the HR manager claims that the company’s electricians
are paid an average weekly wage of €800. The unions reject the claim, believing that the
average wage is less. They sample 100 employees and obtain a mean weekly wage of €770
with a standard deviation of €200.
The hypotheses are stated as:
H0: μ = €800; H1: μ < €800.
This is a one-tailed test. The test statistic is [(770 – 800)/(200/√100)] = –1.5. The rejection region
corresponding to a 5% significance level is defined by z = –1.645 or less. Since –1.5 does not fall in
the rejection region, H0 and the manager’s claim are accepted: based on the sample there is
insufficient evidence to reject it, even though the sample showed a difference of €30.
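The same check for this left-tailed test (again a small sketch with our own variable names):

```python
from math import sqrt

standard_error = 200 / sqrt(100)    # 200 / 10 = 20
z = (770 - 800) / standard_error    # -30 / 20 = -1.5
reject_h0 = z < -1.645              # -1.5 is not below -1.645, so H0 stands
print(z, reject_h0)                 # -1.5 False
```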
(iii) The manager states that the average weekly wage of middle managers in the company is
€1,200 but the financial accountant believes that the figure is in excess of this level. Based on a
sample of 100 managers with an average wage of €1,300 and standard deviation of €300, the
test statistic is [(1300 – 1200)/(300/√100)] = +3.3. This again is a one-tailed test and the
hypotheses are stated as:
H0: μ = €1200; H1: μ > €1200.
The rejection region corresponding to a 5% significance level is defined by z = 1.645 or more.
Since the test statistic is in the rejection region (in the right-hand tail of the distribution) and is
greater than the critical value, the null hypothesis is rejected and the claim that the average
wages are in excess of €1,200 is accepted.
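And the corresponding check for this right-tailed test (sketch only; variable names are our own):

```python
from math import sqrt

standard_error = 300 / sqrt(100)      # 300 / 10 = 30
z = (1300 - 1200) / standard_error    # 100 / 30, approximately +3.3
reject_h0 = z > 1.645                 # well inside the rejection region
print(round(z, 1), reject_h0)         # 3.3 True
```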
Conclusion. In practice, the difference between the standard error calculated from the population
standard deviation, σ, and that calculated from the sample standard deviation, s, may be small, but it
is good practice to remain consistent with the null hypothesis when carrying out the test. It is
assumed that the null hypothesis is correct, and the statement is rejected only if the test statistic
produces an extreme value – a large number of standard errors from the estimate. In the case of
small samples it is not accurate to assume that the distribution of sample means follows a normal
distribution, and the z tables can no longer be used to derive probabilities. Instead, the distribution of
sample means follows the t distribution, which changes as the sample size changes. Hence the
earlier comments about the need to establish that the sample size is large.
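The effect of sample size on the t distribution can be seen by comparing its critical values with the normal value of 1.96. A brief sketch assuming the third-party scipy library is available (the sample sizes chosen are illustrative, not from the article):

```python
from statistics import NormalDist
from scipy.stats import t

# Two-tailed 5% critical values: the t value exceeds the z value for
# small samples and approaches it as the sample size grows.
z_crit = NormalDist().inv_cdf(0.975)      # approximately 1.96
for n in (10, 30, 100):
    t_crit = t.ppf(0.975, df=n - 1)
    print(n, round(t_crit, 3))
```

With n = 10 the critical value is roughly 2.26, noticeably wider than 1.96, which is why the z tables are unsafe for small samples.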
Graphical representation of acceptance and rejection regions
[Figure: the acceptance region for H0 lies between the critical values on the z axis; the rejection
regions for H0 in the tails are set by the level of significance, with the one-sided alternatives
H1: μ < μ0 and H1: μ > μ0 corresponding to the left-hand and right-hand tails respectively.]