Significance testing

Applied Quantitative Methods
MBA course, Montenegro
Peter Balogh, PhD
baloghp@agr.unideb.hu
Significance testing
• Significance testing aims to make statements about a
population parameter, or parameters, on the basis of
sample evidence.
• It is also known as hypothesis testing.
• The emphasis here is on testing whether a set of
sample results support, or are consistent with, some
fact or supposition about the population.
• We are thus concerned to get a 'Yes' or 'No' answer
from a significance test; either the sample results do
support the supposition, or they do not.
• As you might expect, the two ideas of significance
testing and confidence intervals are so closely linked that
all of the assumptions we made in Chapter 11
about the underlying sampling distribution remain the
same for significance tests.
Significance testing
• Since we are dealing with samples, and not a
census, we can never be 100% sure of our results
(since sample results will vary from sample to
sample and from the population parameter in a
known way - see Figure 11.5).
• However, by testing an idea (or hypothesis) and
showing that it is very, very unlikely to be true we
can make useful statements about the population
which can form the basis for business decision
making.
Significance testing
• Significance testing is about putting forward an idea or
testable proposition that captures the properties of
the problem that we are interested in.
• The idea we are testing may have been obtained from a
past census or survey, or it may be an assertion that we
put forward, or has been put forward by someone else,
for example, an assertion that a majority of people
support legislation to enhance animal welfare.
• Similarly, it could be a critical value in terms of product
planning; for example, if a new machine is to be
purchased but the process is only viable with a certain
level of throughput, we would want to be sure that the
company can sell, or use, that quantity before
commitment to the purchase of that particular type of
machine.
Significance testing
• Advertising claims could also be tested using this
method by taking a sample and testing if the
advertised characteristics, for example strength,
length of life, or percentage of people preferring
this brand, are supported by the evidence of the
sample.
• In developing statistical measures we may want to
test if a certain parameter is significantly different
from zero; this has useful applications in the
development of regression and correlation models
(see Chapters 15-17).
Significance testing
• Taking the concept a stage further, we may have a
set of sample results from two regions giving the
percentage of people purchasing a product, and
want to test whether there is a significant
difference between the two percentages.
• Similarly, we may have a survey which is carried out
on an annual basis and wish to test whether there
has been a significant shift in the figures since last
year.
• In the same way that we created a confidence
interval for the differences between two samples,
we can also test for such differences.
Significance testing
• Finally we will also look at the situation where only
small samples are available and consider the
application of the t-distribution to significance
testing; another concept to which we will return in
Chapters 15 and 16.
12.1 Significance testing using
confidence intervals
• Significance testing is concerned with accepting or
rejecting ideas.
• These ideas are known as hypotheses.
• If we wish to test one in particular, we refer to it as
the null hypothesis.
• The term 'null' can be thought of as meaning no
change or no difference from the assertion.
• As a procedure, we would first state a null
hypothesis; something we wish to judge as true or
false on the basis of statistical evidence.
12.1 Significance testing using
confidence intervals
• We would then check whether or not the null
hypothesis was consistent with the confidence
interval.
• If the null hypothesis is contained within the confidence interval it will be accepted; otherwise it will be rejected.
• A confidence interval can be regarded as a set of
acceptable hypotheses.
• To illustrate significance testing consider an
example from Chapter 11.
12.1 Significance testing using
confidence intervals
Case study
• In the Arbour Housing Survey, it was found that the
average monthly rent was £221.25 and that the
standard deviation was £89.48, on the basis of 180
responses (see Section 11.3.1).
12.1 Significance testing using
confidence intervals
The 95% confidence interval was constructed as
follows:
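(The calculation itself is not reproduced on the slide; a sketch using the large-sample formula from Chapter 11, x̄ ± 1.96 × s/√n:)
221.25 ± 1.96 × 89.48/√180 = 221.25 ± 13.07
i.e. from £208.18 to £234.32.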
12.1 Significance testing using
confidence intervals
• Now suppose the purpose of the survey was to test the
view that the amount paid in monthly rent was about
£200.
• Given that the confidence interval can be regarded as a
set of acceptable hypotheses, the null hypothesis
(denoted by H0) would be written as:
H0 : µ = £200
• As the value of £200 is not included within the
confidence interval the null hypothesis must be
rejected.
• The values of the null hypothesis that we can accept or
reject are shown on Figure 12.2.
12.1 Significance testing using
confidence intervals
• In rejecting the view that the average rent is £200
we must also accept that there is a chance that our
decision could be wrong.
• There is a 2.5% chance that the average is less than
£208.18 (which would include an average of £200)
and a 2.5% chance that the average is greater than
£234.32.
• As we shall see (Section 12.4) there is a probability
of making a wrong decision, which we need to
balance against the probability of making the
correct decision.
12.2 Hypothesis testing for single samples
• Hypothesis testing is merely an alternative name
for significance tests, the two being used
interchangeably.
• This name does stress that we are testing some
supposition about the population, which we can
write down as a hypothesis.
• In fact, we will have two hypotheses whenever we
conduct a significance test: one relating to the
supposition that we are testing, and one that
describes the alternative situation.
12.2 Hypothesis testing for single samples
• The first hypothesis relates to the claim, supposition
or previous situation and is usually called the null
hypothesis, labelled as H0.
• It implies that there has been no change in the
value of the parameter that we are testing from
that which previously existed; for example, if the
average spending per week on beer last year among
18-25 year olds was £15.53, then the null
hypothesis would be that it is still £15.53.
• If it has been claimed that 75% of consumers prefer
a certain flavour of ice cream, then the null
hypothesis would be that the population
percentage preferring that flavour is equal to 75%.
12.2 Hypothesis testing for single samples
• The null hypothesis could be written out as a
sentence, but it is more usual to abbreviate it to:
H0: µ = µ0
• for a mean, where µ0 is the claimed or previous
population mean. For a percentage:
H0: π = π0
• where π0 is the claimed or previous population
percentage.
12.2 Hypothesis testing for single samples
• The second hypothesis summarizes what will be the
case if the null hypothesis is not true.
• It is usually called the alternative hypothesis (fairly
obviously!), and is labelled as HA or as H1 depending on
which text you follow: we will use the H1 notation here.
• This alternative hypothesis is usually not specific, in
that it does not specify the exact alternative
value for the population parameter; rather, it just
says that some other value is appropriate on the basis
of the sample evidence; for example, the mean amount
spent is not equal to £15.53, or the percentage
preferring this flavour of ice cream is not 75%.
12.2 Hypothesis testing for single samples
• As before, this hypothesis could be written out as a
sentence, but a shorter notation is usually
preferred:
H1: µ ≠ µ0
• for a mean, where µ0 is the claimed or previous
population mean.
• For a percentage:
H1: π ≠ π0
• where π0 is the claimed or previous population
percentage.
12.2 Hypothesis testing for single samples
• Whenever we conduct a hypothesis test, we assume
that the null hypothesis is true while we are doing the
test, and then come to a conclusion on the basis of the
figures that we calculate during the test.
• Since the sampling distribution of a mean or a
percentage (for large samples) is given by the Normal
distribution, to conduct a test, we compare the sample
evidence (sample statistic) with the null hypothesis
(what is assumed to be true).
• This difference is measured by a test statistic. A test
statistic of zero or reasonably near to zero would
suggest the null hypothesis is correct.
• The larger the value of the test statistic the larger the
difference between the sample evidence and the null
hypothesis.
12.2 Hypothesis testing for single samples
• If the difference is sufficiently large we conclude that
the null hypothesis is incorrect and accept the
alternative hypothesis.
• To make this decision we need to decide on a critical
value or critical values.
• The critical value or values define the point beyond which
a sample result would occur with only a small,
predetermined probability if the null hypothesis were
true, usually 5% or 1% (called the significance level).
• In this chapter you will see the z-value being used as a
test statistic.
• It is more usual to divide the Normal distribution
diagram into sections or areas, and to see whether the
z-value falls into a particular section; then we may
either accept or reject the null hypothesis.
12.2 Hypothesis testing for single samples
• However, it is worth noting that some statistical
packages simply report a probability (a p-value) when
you use them to conduct tests: the chance of obtaining
a result at least as extreme as the one observed if the
null hypothesis were true.
• The reason for doing this is to see how likely or
unlikely the result is.
• If the result is particularly unlikely when the null
hypothesis is true, then we might begin to question
whether this is, in fact, the case.
• Obviously we will need to define more exactly the
phrase 'particularly unlikely' in this context before
we can proceed with hypothesis tests.
12.2 Hypothesis testing for single samples
• Most tests are conducted at the 5% level of
significance, and you should recall that in the
normal distribution, the z-values of +1.96 and -1.96
cut off a total of 5% of the distribution, 2.5% in each
tail.
• If a calculated z-value is between -1.96 and +1.96,
then we accept the null hypothesis; if the calculated
z-value is below -1.96 or above +1.96, we reject the
null hypothesis in favour of the alternative
hypothesis.
• This situation is illustrated in Figure 12.3.
12.2 Hypothesis testing for single samples
• If we were to conduct the test at the 1% level of
significance, then the two values used to cut off the
tail areas of the distribution would be +2.576 and -2.576.
• For each test that we wish to conduct, the basic
layout and procedure will remain the same,
although some of the details will change, depending
on what exactly we are testing.
• A proposed layout is given below, and we suggest
that by following this you will present clear and
understandable significance tests (and not leave out
any steps).
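(The layout itself is not reproduced on the slide; the following is a reconstruction consistent with the steps referred to in the worked examples that follow.)
• Step 1: State the null and alternative hypotheses (H0 and H1).
• Step 2: State the significance level to be used.
• Step 3: State the critical value or values.
• Step 4: Calculate the test statistic (z).
• Step 5: Compare the test statistic with the critical value(s).
• Step 6: Come to a decision: reject or do not reject the null hypothesis.
• Step 7: Put the conclusion into words in the context of the problem.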
12.2 Hypothesis testing for single samples
• While the significance level may change, which will
lead to a change in the critical values, the major
difference as we conduct different types of
hypothesis test is in the way in which we calculate
the z-value at step 4.
12.2.1 A test statistic for a population mean
• In the case of testing for a particular value for a
population mean, the formula used to calculate z in
step 4 is:
z = (x̄ - µ) / (σ/√n)
• As we saw in Chapter 11, the population standard
deviation (σ) is rarely known, and we use the
sample standard deviation (s) in its place.
• We are assuming that the null hypothesis is true,
and so µ = µ0.
12.2.1 A test statistic for a population mean
• The formula will become:
z = (x̄ - µ0) / (s/√n)
• We are now in a position to carry out a test.
12.2.1 A test statistic for a population mean
Example
• A production manager claims that an average of 50
boxes per hour are filled with finished goods at the
final stage of a production line.
• A random sampling of 48 different workers, at
different times, working at the end of identical
production lines shows an average number of boxes
filled as 47.5 with a standard deviation of 0.7 boxes.
• Does this evidence support the assertion by the
production manager at the 5% level of significance?
12.2.1 A test statistic for a population mean
• Step 1: The null hypothesis is based on the
production manager's assertion:
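H0: µ = 50 boxes per hour, H1: µ ≠ 50 boxes per hour
(The remaining working is not reproduced on the slide; a reconstruction following the layout above, using the large-sample z formula:)
• Step 2: Significance level: 5%.
• Step 3: Critical values: -1.96 and +1.96.
• Step 4: z = (47.5 - 50)/(0.7/√48) = -2.5/0.101 ≈ -24.7
• Step 5: The calculated value falls well below -1.96.
• Step 6: We therefore reject the null hypothesis.
• Step 7: The sample evidence does not support the production manager's assertion of an average of 50 boxes per hour.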
12.2.1 A test statistic for a population mean
• Although there is a difference of only 2.5 boxes between the
claim and the sample mean in this example, the test
shows that the sample result is significantly
different from the claimed value.
• In all hypothesis testing, it is not the absolute
difference between the values which is important,
but the number of standard errors that the sample
value is away from the claimed value.
• Statistical significance should not be confused with
the notion of business significance or importance.
12.2.2 A test for a population percentage
• Here the process will be identical to the one followed
above, except that the formula used to calculate the z-value will need to be changed.
• (You should recall from Chapter 11 that the standard
error for a percentage is different from the standard
error for a mean.)
• The formula will be:
z = (p - π0) / √(π0(100 - π0)/n)
where p is the sample percentage and π0 is the claimed
population percentage (remember that we are
assuming that the null hypothesis is true).
12.2.2 A test for a population percentage
Example
• An auditor claims that 10% of invoices for a
company are incorrect.
• To test this claim a random sample of 100 invoices
are checked, and 12 are found to be incorrect.
• Test, at the 5% significance level, if the auditor's
claim is supported by the sample evidence.
12.2.2 A test for a population percentage
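(The working is not reproduced on the slide; a reconstruction following the layout above:)
• Step 1: H0: π = 10%, H1: π ≠ 10%
• Step 2: Significance level: 5%.
• Step 3: Critical values: -1.96 and +1.96.
• Step 4: p = 12/100 × 100 = 12%, so
z = (12 - 10)/√(10 × 90/100) = 2/3 ≈ 0.67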
12.2.2 A test for a population percentage
• Step 5: The calculated value falls between the two
critical values.
• Step 6: We therefore cannot reject the null
hypothesis.
• Step 7: The evidence from the sample is consistent
with the auditor's claim that 10% of the invoices are
incorrect.
12.2.2 A test for a population percentage
• In this example, when the calculated value falls
between the two critical values.
• The answer is to not reject the claim of 10% rather
than to accept the claim.
• This is because we only have the sample evidence to
work from, and we are aware that it is subject to
sampling error.
• The only way to firmly accept the claim would be to
check every invoice (i.e. carry out a census) and work
out the actual population percentage which are
incorrect.
• When we calculated a confidence interval for the
population percentage (in Section 11.4) we used a
slightly different formulation of the sampling error.
12.2.2 A test for a population percentage
• There we used
√(p(100 - p)/n)
• because we did not know the value of the
population percentage; we were using the sample
percentage as our best estimate, and also as the
only available value.
• When we come to significance tests, we have
already made the assumption that the null
hypothesis is true (while we are conducting the
test), and so we can use the hypothesized value of
the population percentage in the calculations.
12.2.2 A test for a population percentage
• The formula for the sampling error will thus be
√(π0(100 - π0)/n)
• In many cases there would be very little difference
in the answers obtained from the two different
formulae, but it is good practice (and shows that
you know what you are doing) to use the correct
formulation.
12.2.2 A test for a population percentage
Case study
• It has been claimed on the basis of census results
that 87% of households in Tonnelle now have
exclusive use of a fixed bath or shower with a hot
water supply.
• In the Arbour Housing Survey of this area, 246
respondents out of the 300 interviewed reported
this exclusive usage.
• Test at the 5% significance level whether this claim
is supported by the sample data.
12.2.2 A test for a population percentage
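(The working is not reproduced on the slide; a reconstruction following the layout above:)
• Step 1: H0: π = 87%, H1: π ≠ 87%
• Step 2: Significance level: 5%.
• Step 3: Critical values: -1.96 and +1.96.
• Step 4: p = 246/300 × 100 = 82%, so
z = (82 - 87)/√(87 × 13/300) = -5/1.94 ≈ -2.58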
12.2.2 A test for a population percentage
• Step 5: The calculated value falls below the critical value of
-1.96.
• Step 6: We therefore reject the null hypothesis at the 5%
significance level.
• Step 7: The sample evidence does not support the view that
87% of households in the Tonnelle area have exclusive use of
a fixed bath or shower with a hot water supply.
• Given a sample percentage of 82% we could be tempted to
conclude that the percentage was lower - the sample does
suggest this, but the test was not structured in this way and
we must accept the alternative hypothesis that the
population percentage is not likely to be 87%.
• We will consider this issue again when we look at one-sided
tests.
12.3 One-sided significance tests
• The significance tests shown in the last section may
be useful if all we are trying to do is test whether or
not a claimed value could be true, but in most cases
we want to be able to go one step further.
• In general, we would want to specify whether the
real value is above or below the claimed value in
those cases where we are able to reject the null
hypothesis.
12.3 One-sided significance tests
• One-sided tests will allow us to do exactly this.
• The method employed, and the appropriate test
statistic which we calculate, will remain exactly the
same; it is the alternative hypothesis and the
interpretation of the answer which will change.
• Suppose that we are investigating the purchase of
cigarettes, and know that the percentage of the adult
population who regularly purchased last year was 34%.
• If a sample is selected, we do not want to know only
whether the percentage purchasing has changed, but
rather whether it has decreased (or increased).
• Before carrying out the test it is necessary to decide
which of these two propositions you wish to test.
12.3 One-sided significance tests
• If we wish to test whether or not the percentage
has decreased, then our hypotheses would be:
H0: π = π0
H1: π < π0
where π0 is the actual percentage in the population
last year.
• This may be an appropriate hypothesis test if you
were working for a health lobby.
12.3 One-sided significance tests
• If we wanted to test if the percentage had
increased, then our hypotheses would be:
H0: π = π0
H1: π > π0
• This could be an appropriate test if you were
working for a manufacturer in the tobacco industry
and were concerned with the effects of improved
packaging and presentation.
12.3 One-sided significance tests
• To carry out the test, we will want to concentrate the
chance of rejecting the null hypothesis at one end of
the Normal distribution.
• Where the significance level is 5%, then the critical
value will be -1.645 (i.e. the cut-off value taken from
the Normal distribution tables) for the hypotheses
H0: π = π0, H1: π < π0; and +1.645 for the hypotheses
H0: π = π0, H1: π > π0. (Check these figures from
Appendix C.)
• Where the significance level is set at 1%, then the
critical value becomes either -2.33 or +2.33.
• In terms of answering examination questions, it is
important to read the wording very carefully to
determine which type of test you are required to
perform.
12.3 One-sided significance tests
• The interpretation of the calculated z-value is now
merely a question of deciding into which of two
sections of the Normal distribution it falls.
12.3.1 A one-sided test for a population mean
• Here the hypotheses will be in terms of the
population mean, or the claimed value.
• Consider the following example.
12.3.1 A one-sided test for a population mean
Example
• A manufacturer of batteries has assumed that the average expected life is 299 hours.
• As a result of recent changes to the filling of the batteries, the manufacturer now wishes to test if the average life has increased.
• A sample of 200 batteries was taken at random from the production line and tested. Their average life was found to be 300 hours with a standard deviation of 8 hours.
• You have been asked to carry out the appropriate hypothesis test at the 5% significance level.
12.3.1 A one-sided test for a population mean
(Note that we are still assuming the null hypothesis
to be true while the test is conducted.)
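(The working is not reproduced on the slide; a reconstruction following the layout above:)
• Step 1: H0: µ = 299 hours, H1: µ > 299 hours
• Step 2: Significance level: 5%.
• Step 3: Critical value: +1.645 (one-sided).
• Step 4: z = (300 - 299)/(8/√200) = 1/0.566 ≈ 1.77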
12.3.1 A one-sided test for a population mean
• Step 5: The calculated value is larger than the
critical value.
• Step 6: We may therefore reject the null hypothesis.
• (Note that had we been conducting a two-sided
hypothesis test, then we would have been unable
to reject the null hypothesis, and so the conclusion
would have been that the average life of the
batteries had not changed.)
• Step 7: The sample evidence supports the
supposition that the average life of the batteries has
increased by a significant amount.
12.3.1 A one-sided test for a population mean
• Although the significance test has shown that there
has been a significant increase in the average length
of life of the batteries, this may not be an
important conclusion for the manufacturer.
• For instance, it would be quite misleading to use it
to back up an advertising campaign which claimed
that 'our batteries now last even longer!'.
• Whenever hypothesis tests are used it is important
to distinguish between statistical significance and
importance.
• This distinction is often ignored.
12.3.1 A one-sided test for a population mean
Case study
• It has been argued that, because of the types of property and the age of the population in Tonnelle, average mortgages are likely to be around £200 a month.
• However, the Arbour Housing Trust believe the figure is higher because many of the mortgages are relatively recent and many were taken out during the house price boom.
• Test this proposition, given the result from the Arbour Housing Survey that 100 respondents were paying an average monthly mortgage of £254.
• The standard deviation calculated from the sample is £72.05 (it was assumed to be £70 in an earlier example).
• Use a 5% significance level.
12.3.1 A one-sided test for a population mean
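(The detailed working is not shown on the slide; a sketch using the one-sided z-test described above:)
H0: µ = £200, H1: µ > £200; 5% significance level, critical value +1.645.
z = (254 - 200)/(72.05/√100) = 54/7.205 ≈ 7.49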
• The test statistic value of 7.49 is greater than the
critical value of +1.645.
• On this basis, we can reject the null hypothesis that
monthly mortgages are around £200 in favour of the
alternative that they are likely to be more.
12.3.1 A one-sided test for a population mean
• A test provides a means to clarify issues and resolve
different arguments.
• In this case the average monthly mortgage payment
is higher than had been argued but this does not
necessarily mean that the reasons put forward by
the Arbour Housing Trust for the higher levels are
correct.
• In most cases, further investigation is required.
• It is always worth checking that the sample
structure is the same as the population structure.
12.3.1 A one-sided test for a population mean
• In an area like Tonnelle we would expect a range of
housing and it can be difficult for a small sample to
reflect this.
• The conclusions also refer to the average.
• The pattern of mortgage payments could vary
considerably between different groups of residents.
• It is known that the Victorian housing is again
attracting a more affluent group of residents, and
they could have the effect of pulling the mean
upwards.
• Finally, the argument for the null hypothesis could
be based on dated information, e.g. the last census.
12.3.2 A one-sided test for a population percentage
• Here the methodology is exactly the same as that
employed above, and so we just provide two examples
of its use.
Example
• A small political party expects to gain 20% of the vote in
by-elections.
• A particular candidate from the party is about to stand
in a by-election in Derbyshire South East, and has
commissioned a survey of 200 randomly selected
voters in the constituency.
• If 44 of those interviewed said that they would vote for
this candidate in the forthcoming by-election, test
whether this would be significantly above the national
party's claim.
• Use a test at the 5% significance level.
12.3.2 A one-sided test for a population percentage
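(The working is not reproduced on the slide; a reconstruction using the one-sided test for a percentage:)
H0: π = 20%, H1: π > 20%; 5% significance level, critical value +1.645.
Sample percentage: p = 44/200 × 100 = 22%
z = (22 - 20)/√(20 × 80/200) = 2/2.83 ≈ 0.71
Since 0.71 is below the critical value of +1.645, the null hypothesis cannot be rejected: the result is not significantly above the national party's claim.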
12.3.2 A one-sided test for a population percentage
• Here there may not be evidence of a statistically
significant vote above the national party's claim (a
sample size of 1083 would have made this 2%
difference statistically significant), but if the
candidate does manage to achieve 22% of the vote,
and it is a close contest over who wins the seat
between two other candidates, then the extra 2%
could be very important.
12.3.2 A one-sided test for a population percentage
Example
• A fizzy drink manufacturer claims that over 40% of teenagers prefer its drink to any other.
• In order to test this claim, a rival manufacturer commissions a market research company to conduct a survey of 200 teenagers and ascertain their preferences in fizzy drinks.
• Of the randomly selected sample, 75 preferred the first manufacturer's drink.
• Test at the 5% level if this survey supports the claim.
• Sample percentage is: p = 75/200 × 100 = 37.5%
12.3.2 A one-sided test for a population percentage
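(The working is not reproduced on the slide; a reconstruction, assuming the one-sided alternative of interest to the rival, H1: π < 40%:)
H0: π = 40%, H1: π < 40%; 5% significance level, critical value -1.645.
z = (37.5 - 40)/√(40 × 60/200) = -2.5/3.46 ≈ -0.72
Since -0.72 is not below -1.645, the null hypothesis cannot be rejected at the 5% level: the sample does not provide significant evidence against the manufacturer's claim.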
12.3.2 A one-sided test for a population percentage
Case study
• The Arbour Housing Survey was an attempt to describe local housing conditions on the basis of responses to a questionnaire.
• Many of the characteristics will be summarized in percentage terms, e.g. type of property, access to baths, showers, flush toilets, kitchen facilities.
• Such enquiries investigate whether these percentages have increased or decreased over time, and whether they are higher or lower than in other areas.
• If this increase or decrease, or higher or lower value, is specified in terms of an alternative hypothesis, the appropriate test will be one-sided.
12.3.3 Producers’ risk and consumers’ risk
• One-sided tests are sometimes referred to as
testing producers' risks or consumers' risks.
• If we are looking at the amount of a product in a
given packet or lot size, then the reaction to
variations in the amount may be different for the
two groups.
• Say, for instance, that the packet is sold as
containing 100 items.
• If there are more than 100 items per packet, then
the producer is effectively giving away some items,
since only 100 are being charged for.
12.3.3 Producer’s risk and consumers’ risk
• The producer, therefore, is concerned not to overfill the
packets but still meet legal requirements.
• In this situation, we might presume that the consumer is
quite happy, since some items come free of charge.
• In the opposite situation of less than 100 items per packet,
the producer is supplying less than 100 but being paid for 100
(this, of course, can have damaging consequences for the
producer in terms of lost future sales), while the consumers
receive less than expected, and are therefore likely to be less
than happy.
• Given this scenario, one would expect the producer to
conduct a one-sided test using H1: µ > µ0, and a consumer
group to conduct a one-sided test using H1: µ < µ0.
• There may, of course, be legal implications for the producer in
selling packets of 100 which actually contain less than 100.
• (In practice most producers will, in fact, play it safe and aim to
meet any minimum requirements.)
12.4 Types of error
• We have already seen, throughout this section of
the book, that samples can only give us a partial
view of a population; there will always be some
chance that the true population value really does lie
outside of the confidence interval, or that we will
come to the wrong decision when conducting a
significance test.
• In fact these probabilities are specified in the names
that we have already used - a 95% confidence
interval and a test at the 5% level of significance.
• Both imply a 5% chance of being wrong.
12.4 Types of error
• You might want to argue that this number is too big,
since it gives a 1 in 20 chance of making the wrong
decision, but it has been accepted in both business
and social research over many years.
• One reason is that we are dealing with a situation
where we rely on people telling us the 'truth' which
they may not always do!
• Using a smaller number, say 1%, would also imply a
spurious level of accuracy in our results.
• (Even in medical research, the figure of 5% may be
used.)
12.4 Types of error
• If you consider significance tests a little further,
however, you will see that there are, in fact, two
different ways of getting the wrong answer. You
could throw out the claim when it is, in fact, true; or
you could fail to reject it when it is, in fact, false.
• It becomes important to distinguish between these
two types of error.
• As you can see from Table 12.1, the two different
types of error are referred to as Type I and Type II.
12.4 Types of error
• Table 12.1 Possible results of a hypothesis test
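(The table itself is not reproduced in these notes; its standard form is:)

                  H0 is true           H0 is false
Accept H0         Correct decision     Type II error
Reject H0         Type I error         Correct decision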
• A Type I error is the rejection of a null hypothesis when it is
true.
• The probability of this is known as the significance level of
the test.
• This is usually set at either 5% or 1% for most business
applications, and is decided on before the test is conducted.
• A Type II error is the failure to reject a null hypothesis which is
false.
• The probability of this error cannot be determined before the
test is conducted.
12.4 Types of error
• Consider a more everyday situation.
• You are standing at the curb, waiting to cross a busy main
road.
• You have four possible outcomes:
• If you decide not to cross at this moment, and there is
something coming, then you have made a correct decision.
• If you decide not to cross, and there is nothing coming, you
have wasted a little time, and made a Type II error.
• If you decide to cross and the road is clear, you have made a
correct decision.
• If you decide to cross and the road isn't clear, as well as the
likelihood of injury, you have made a Type I error.
• The error which can be controlled is the Type I error; and this
is the one which is set before we conduct the test.
12.4.1 Background theory
• Consider the sampling distribution of z (which we
calculate in step 4), consistent with the null hypothesis,
H0: µ = µ0 illustrated as Figure 12.5.
• The two values for the test-statistic, A and B, shown in
Figure 12.5, are both possible, with B being less likely
than A.
• The construction of this test, with a 5% significance
level, would mean the acceptance of the null
hypothesis when A was obtained and its rejection when
B was obtained.
• Both values could be attributed to an alternative
hypothesis, H1: µ=µ1, which we accept in the case of B
and reject for A.
12.4.1 Background theory
• It is worth noting that if the test statistic follows a
sampling distribution we can never be certain
about the correctness of our decision.
• What we can do, having fixed a significance level
(Type I error), is construct the rejection region to
minimize Type II error.
• Suppose the alternative hypothesis was that the
mean for the population, µ, was not µ0 but a larger
value µ1.
12.4.1 Background theory
• This we would state as:
• H1: µ = µ1, where µ1 > µ0 or, in the more general
form:
• H1: µ > µ0.
• If we keep the same acceptance and rejection
regions as before (Figure 12.5), the Type II error is
as shown in Figure 12.6.
• Figure 12.6 The Type II error resulting from a two-tailed test.
12.4.1 Background theory
• As we can see, the probability of accepting H0 when H1
is correct can be relatively large.
• If we test the null hypothesis H0: µ = µ0 against an
alternative hypothesis of the form H1: µ < µ0 or
H1: µ > µ0 , we can reduce the size of the Type II error
by careful definition of the rejection region.
• If the alternative hypothesis is of the form H1: µ < µ0, a
critical value of z = -1.645 will define a 5% rejection
region in the left-hand tail, and if the alternative
hypothesis is of the form H1: µ > µ0, a critical value of
z = 1.645 will define a 5% rejection region in the right-hand tail.
• The reduction in Type II error is illustrated in Figure
12.7.
12.4 Types of error
• It can be seen from Figure 12.7 that the test statistic
now rejects the null hypothesis in the range 1.645-1.96 as well as values greater than 1.96.
• If we construct one-sided tests, there is more
chance that we will reject the null hypothesis in
favour of a more radical alternative that the
parameter has a larger value (or has a smaller
value) than specified when that alternative is true.
• Note that if the alternative hypothesis was of the
form H1: µ < µ0, the rejection region would be
defined by the critical value z = -1.645 and we
would reject the null hypothesis if the test statistic
took this value or less.
12.5 Hypothesis testing with two samples
• All of the tests so far have made comparisons between
some known, or claimed, population value and a set of
sample results.
• In many cases we may know little about the actual
population value, but will have available the results of
another survey.
• Here we want to find if there is a difference between
the two sets of results.
• This could be a situation where we have two surveys
conducted in different parts of the country, or at
different points in time.
• Our concern now is to determine if the two sets of
results are consistent with each other (come from the
same parent population) or whether there is a
difference between them.
12.5 Hypothesis testing with two samples
• Such comparisons may be important in a marketing
context, for instance, looking at whether or not
there are different perceptions of the product in
different regions of the country.
• They could be used in performance appraisal of
employees to answer questions on whether the
average performance in different regions is, in fact,
different.
• They could be used in assessing a forecasting
model, to see if the values have changed
significantly from one year to the next.
12.5 Hypothesis testing with two samples
• Again we can use the concepts and methodology
developed in previous sections of this chapter.
• Tests may be two-sided to look for a difference, or
one-sided to look for a significant increase (or
decrease).
• Similarly, tests may be carried out at the 5% or the
1% level, and the interpretation of the results will
depend upon the calculated value of the test
statistic and its comparison to a critical value.
• It will be necessary to rewrite the hypotheses to
take account of the two samples and to find a new
formulation for the test statistic (step 4 of the
process).
12.5.1 Tests for a difference of means
• In order to be clear about which sample we are
talking about, we will use the suffixes 1 and 2 to
refer to sample 1 and sample 2.
• Our assumption is that the two samples do, in fact,
come from the same population, and thus the
population means associated with each sample
result will be the same.
• This assumption will give us our null hypothesis:
H0: µ1 = µ2
12.5.1 Tests for a difference of means
• The alternative hypothesis will depend on whether
we are conducting a two-sided test or a one-sided
test.
• If it is a two-sided test, the alternative hypothesis
will be:
H1: µ1 ≠ µ2
• and if it is a one-sided test, it would be either
H1: µ1 > µ2 or H1: µ1 < µ2
12.5.1 Tests for a difference of means
• We could also test whether the difference between
the means could be assigned to a specific value, for
instance that the difference in take-home pay
between two groups of workers was £25; here we
would use the hypotheses:
H0: µ1 - µ2 = 25
H1: µ1 - µ2 ≠ 25
• The test statistic used is closely related to the
formula developed in Section 11.5.1 for the
confidence interval for the difference between two
means.
12.5.1 Tests for a difference of means
• It will be:
z = ((x̄1 - x̄2) - (µ1 - µ2)) / √(s1²/n1 + s2²/n2)
• but if we are using a null hypothesis which says that
the two population means are equal, then the
second bracket on the top line will be equal to zero,
and the formula for z will be simplified to:
z = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
12.5.1 Tests for a difference of means
• The example below illustrates the use of this test.
Example
• A union claims that the standard of living for its members in the
UK is below that of employees of the same company in Spain.
• A survey of 60 employees in the UK showed an average income of
£895 per week with a standard deviation of £120.
• A survey of 100 workers in Spain, after making adjustments for
various differences between the two countries and converting to
sterling, gave an average income of £914 with a standard deviation
of £ 90.
• Test at the 1% level if the Spanish workers earn more than their
British counterparts.
12.5.1 Tests for a difference of means
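(The working is not reproduced on the slide; a reconstruction, treating the UK figures as sample 1:)
H0: µ1 = µ2, H1: µ1 < µ2; 1% significance level, critical value -2.33.
z = (895 - 914)/√(120²/60 + 90²/100) = -19/√(240 + 81) = -19/17.9 ≈ -1.06
Since -1.06 is not below -2.33, the null hypothesis cannot be rejected: at the 1% level the sample does not show that the Spanish workers earn significantly more than their British counterparts.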
12.5.2 Tests for a difference in population percentage
• Again, adopting the approach used for testing the
difference of means, we will modify the hypotheses, using
π instead of µ for the population values, and redefine the
test statistic.
• Both one- and two-sided tests may be carried out.
• The test statistic will be:
z = ((p1 - p2) - (π1 - π2)) / √(p1(100 - p1)/n1 + p2(100 - p2)/n2)
• but if the null hypothesis is π1 - π2 = 0, then this is
simplified to:
z = (p1 - p2) / √(p1(100 - p1)/n1 + p2(100 - p2)/n2)
12.5.2 Tests for a difference in population percentage
Example
• A manufacturer believes that a new promotional
campaign will increase favourable perceptions of
the product by 10%.
• Before the campaign, a random sample of 500
consumers showed that 20% had favourable
attitudes towards the product.
• After the campaign, a second random sample of
400 consumers found favourable reactions amongst
28%.
12.5.2 Tests for a difference in population percentage
• Use an appropriate test at the 5% level of
significance to find if there has been a significant
increase in favourable perceptions.
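(The working is not reproduced on the slides; a reconstruction testing for a significant increase, with the "before" survey as sample 1:)
H0: π1 = π2, H1: π2 > π1; 5% significance level, critical value +1.645.
z = (28 - 20)/√(20 × 80/500 + 28 × 72/400) = 8/√(3.2 + 5.04) = 8/2.87 ≈ 2.79
Since 2.79 exceeds +1.645, the null hypothesis is rejected: there has been a significant increase in favourable perceptions. (Testing the specific claim of a 10 percentage point increase, H0: π2 - π1 = 10, would instead give z = (8 - 10)/2.87 ≈ -0.70, which is not significant, so the results are also consistent with the claimed increase.)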
12.6 Hypothesis testing with small samples
• As we saw in Chapter 11, the basic assumption that
the sampling distribution for the sample parameters
is a Normal distribution only holds when the
samples are large (n > 30).
• Once we turn to small samples, we need to use a
different sampling distribution: the t-distribution
(see Section 11.7).
• Apart from this difference, which implies a change
to the formula for the test statistic (step 4 again!)
and also to the table of critical values, all of the
tests developed so far in this chapter may be
applied to small samples.
12.6.1 Single samples
• For a single sample, the test statistic for a population
mean is calculated by using:
t = (x̄ - µ0) / (s/√n)
with (n - 1) degrees of freedom.
• For a single sample, the test statistic for a population
percentage will be calculated using:
t = (p - π0) / √(π0(100 - π0)/n)
• and, again, there will be (n - 1) degrees of freedom.
• (Note that, as with the large sample test, we use the
null hypothesis value of the population percentage in
the formula.)
• Below are two examples to illustrate the use of the t-distribution in hypothesis testing.
12.6.1 Single samples
Example
• A lorry manufacturer claims that the average annual maintenance cost for its vehicles is £500.
• The maintenance department of one of their customers believes it to be higher, and to test this randomly selects a sample of six lorries from their large fleet.
• From this sample, the mean annual maintenance cost was found to be £555, with a standard deviation of £75.
• Use an appropriate hypothesis test, at the 5% level, to find if the manufacturer's claim is valid.
12.6.1 Single samples
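(The working is not shown on the slide; a reconstruction using the small-sample t-test above:)
H0: µ = £500, H1: µ > £500; 5% significance level, 5 degrees of freedom, critical value t = 2.015.
t = (555 - 500)/(75/√6) = 55/30.6 ≈ 1.80
Since 1.80 is below 2.015, the null hypothesis cannot be rejected at the 5% level: the sample does not provide sufficient evidence that the true average cost is above the manufacturer's claim.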
12.6.1 Single samples
Example
• A company had a 50% market share for a newly developed product last year.
• It believes that as more entrants start to produce this type of product, its market share will decline, and in order to monitor this situation, decides to select a random sample of 15 customers.
• Of these, six have bought the company's product.
• Carry out a test at the 5% level to find if there is sufficient evidence to suggest that their market share is now below 50%.
12.6.1 Single samples
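(The working is not shown on the slide; a reconstruction using the small-sample test for a percentage:)
p = 6/15 × 100 = 40%
H0: π = 50%, H1: π < 50%; 5% significance level, 14 degrees of freedom, critical value t = -1.761.
t = (40 - 50)/√(50 × 50/15) = -10/12.9 ≈ -0.77
Since -0.77 is not below -1.761, the null hypothesis cannot be rejected: there is not yet sufficient evidence that the market share has fallen below 50%.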
12.6.2 Two samples
• Where we have two samples, we will again use
tests similar to those in Section 12.5, but with formulae
based on the sampling errors developed in
Chapter 11.
• In the case of estimating a confidence interval for
the difference between two means, the pooled
standard error (assuming that both samples have
similar variability) was given by:
sp√(1/n1 + 1/n2), where sp = √(((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2))
12.6.2 Two samples
• Using this, we have the test statistic:
t = (x̄1 - x̄2) / (sp√(1/n1 + 1/n2))
with (n1 + n2 - 2) degrees of freedom.
12.6.2 Two samples
Example
• A company has two factories, one in the UK and one in Germany.
• The head office staff feel that the German factory is more efficient than the British one, and to test this select two random samples.
• The British sample consists of 20 workers who take an average of 25 minutes to complete a standard task. Their standard deviation is 5 minutes.
• The German sample has 10 workers who take an average of 20 minutes to complete the same task, and the sample has a standard deviation of 4 minutes.
• Use an appropriate hypothesis test, at the 1% level, to find if the German workers are more efficient.
12.6.2 Two samples
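(The working is not shown on the slide; a reconstruction using the pooled t-test above, with the British workers as sample 1:)
H0: µ1 = µ2, H1: µ1 > µ2 (the British times are longer); 1% significance level, 20 + 10 - 2 = 28 degrees of freedom, critical value t = 2.467.
sp = √((19 × 5² + 9 × 4²)/28) = √(619/28) ≈ 4.70
t = (25 - 20)/(4.70 × √(1/20 + 1/10)) = 5/1.82 ≈ 2.75
Since 2.75 exceeds 2.467, the null hypothesis is rejected: the sample supports the view that the German workers complete the task significantly more quickly.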
12.6.3 Matched pairs
• A special case arises when we are considering tests
of the difference of means if the two samples are
related in such a way that we may pair the
observations.
• This may arise when two different people assess the
same series of situations, for example different
interviewers assessing the same group of
candidates for a job.
• The example below illustrates the situation of
matched pairs.
12.6.3 Matched pairs
Example
• Seven applicants for a job were interviewed by two
personnel officers who were asked to give marks on
a scale of 1 to 10 to one aspect of the candidates'
performance.
• A summary of the marks given is shown below.
Marks given to candidates:

Candidate        I    II   III   IV   V    VI   VII
Interviewer A    8    7    6     9    7    5    8
Interviewer B    7    4    6     8    5    6    7
12.6.3 Matched pairs
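(The hypotheses and test statistic are not reproduced on the slide; the standard paired approach is to work with the differences d between each pair of marks:)
H0: µd = 0 (no difference between the interviewers), H1: µd ≠ 0
t = d̄ / (sd/√n), with (n - 1) degrees of freedom, where d̄ and sd are the mean and standard deviation of the differences.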
12.6.3 Matched pairs
Interviewer A   Interviewer B   Difference (x)   (x - x̄)²
8               7                1               0
7               4                3               4
6               6                0               1
9               8                1               0
7               5                2               1
5               6               -1               4
8               7                1               0
                       Total:    7              10
12.6.3 Matched pairs
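(The working is not shown on the slide; a reconstruction from the table above, assuming the two-sided test stated earlier:)
d̄ = 7/7 = 1, sd = √(10/6) ≈ 1.29
t = 1/(1.29/√7) = 1/0.488 ≈ 2.05
With 6 degrees of freedom, the two-sided 5% critical value is 2.447, so the null hypothesis cannot be rejected: the marks given by the two interviewers do not differ significantly. (Had a one-sided alternative been specified, the critical value would have been 1.943 and the conclusion would have been reversed.)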
12.6.3 Matched pairs
• The use of such matched pairs is common in market
research and psychology.
• If, for reasons of time and cost, we need to work
with small samples, results can be improved by using
paired samples, since this reduces the between-sample variation.
12.7 Conclusions
• The ideas developed in this chapter allow us to
think about statistical data as evidence.
• If we take the practice of law as an analogy: a
set of circumstances requires a case to be made, an
idea is proposed (that of innocence), an alternative
is constructed (that of guilt) and the evidence is
weighed to arrive at a decision.
• In statistical testing, a null hypothesis is proposed,
usually in terms of no change or no difference, an
alternative is constructed and data is used to accept
or reject the null hypothesis.
• Strangely, perhaps, these tests are not necessarily
acceptable in law.
12.7 Conclusions
• Given the size of many data sets, we could propose
many hypotheses and soon begin to find that we
were rejecting the more conservative null
hypotheses and accepting the more radical
alternatives.
• This does not, in itself, mean that there are
differences or that we have identified change or
that we have followed good statistical practice.
• We need to accept that with a known probability
(usually 5%), we can reject the null hypothesis
when it is correct - Type I error.
12.7 Conclusions
• We also need to recognize that if we keep looking
for relationships we will eventually find some.
• Testing is not merely about meeting the technical
requirements of the test but constructing
hypotheses that are meaningful in the problem
context.
• In an ideal world, before the data or evidence was
collected, the problem context would be clarified
and meaningful hypotheses specified.
12.7 Conclusions
• It is true that we can discover the unexpected from
data and identify important and previously
unknown relationships.
• The process of enquiry or research is therefore
complex, requiring a balance between the testing of
ideas we already have and allowing new ideas to
emerge.
• It should also be noted that a statistically significant
result may not have business significance.
• It could be that statistically significant differences
between regions, for example, were acceptable
because of regional differences in product
promotion and competitor activity.
12.7 Conclusions
• Since we are inevitably dealing with sample data
when we conduct tests of significance, we can
never be 100% sure of our answer - remember
Type I and Type II error (and other types of error).
• What significance testing does do is provide a way
of quantifying error and informing judgement.