Applied Quantitative Methods MBA course Montenegro
Peter Balogh PhD
baloghp@agr.unideb.hu

Significance testing
• Significance testing aims to make statements about a population parameter, or parameters, on the basis of sample evidence.
• It is sometimes called hypothesis testing.
• The emphasis here is on testing whether a set of sample results support, or are consistent with, some fact or supposition about the population.
• We are thus concerned to get a 'Yes' or 'No' answer from a significance test; either the sample results do support the supposition, or they do not.
• As you might expect, the two ideas of significance testing and confidence intervals are closely linked, and all of the assumptions that we made in Chapter 11 about the underlying sampling distribution remain the same for significance tests.
• Since we are dealing with samples, and not a census, we can never be 100% sure of our results (since sample results will vary from sample to sample and from the population parameter in a known way - see Figure 11.5).
• However, by testing an idea (or hypothesis) and showing that it is very, very unlikely to be true, we can make useful statements about the population which can form the basis for business decision making.
• Significance testing is about putting forward an idea or testable proposition that captures the properties of the problem that we are interested in.
• The idea we are testing may have been obtained from a past census or survey, or it may be an assertion that we put forward, or that has been put forward by someone else, for example, an assertion that a majority of people support legislation to enhance animal welfare.
• Similarly, it could be a critical value in terms of product planning; for example, if a new machine is to be purchased but the process is only viable with a certain level of throughput, we would want to be sure that the company can sell, or use, that quantity before committing to the purchase of that particular type of machine.
• Advertising claims could also be tested using this method by taking a sample and testing whether the advertised characteristics, for example strength, length of life, or percentage of people preferring this brand, are supported by the evidence of the sample.
• In developing statistical measures we may want to test if a certain parameter is significantly different from zero; this has useful applications in the development of regression and correlation models (see Chapters 15-17).
• Taking the concept a stage further, we may have a set of sample results from two regions giving the percentage of people purchasing a product, and want to test whether there is a significant difference between the two percentages.
• Similarly, we may have a survey which is carried out on an annual basis and wish to test whether there has been a significant shift in the figures since last year.
• In the same way that we created a confidence interval for the differences between two samples, we can also test for such differences.
• Finally, we will also look at the situation where only small samples are available and consider the application of the t-distribution to significance testing; another concept to which we will return in Chapters 15 and 16.

12.1 Significance testing using confidence intervals
• Significance testing is concerned with accepting or rejecting ideas.
• These ideas are known as hypotheses.
• If we wish to test one in particular, we refer to it as the null hypothesis.
• The term 'null' can be thought of as meaning no change or no difference from the assertion.
• As a procedure, we would first state a null hypothesis; something we wish to judge as true or false on the basis of statistical evidence.
• We would then check whether or not the null hypothesis was consistent with the confidence interval.
• If the null hypothesis is contained within the confidence interval it will be accepted; otherwise, it will be rejected.
• A confidence interval can be regarded as a set of acceptable hypotheses.
• To illustrate significance testing consider an example from Chapter 11.

Case study
• In the Arbour Housing Survey, it was found that the average monthly rent was £221.25 and that the standard deviation was £89.48, on the basis of 180 responses (see Section 11.3.1).
• The 95% confidence interval was constructed as follows:
221.25 ± 1.96 × 89.48/√180 = 221.25 ± 13.07
i.e. £208.18 to £234.32
• Now suppose the purpose of the survey was to test the view that the amount paid in monthly rent was about £200.
• Given that the confidence interval can be regarded as a set of acceptable hypotheses, the null hypothesis (denoted by H0) would be written as:
H0: µ = £200
• As the value of £200 is not included within the confidence interval, the null hypothesis must be rejected.
• The values of the null hypothesis that we can accept or reject are shown in Figure 12.2.
• In rejecting the view that the average rent is £200 we must also accept that there is a chance that our decision could be wrong.
• There is a 2.5% chance that the average is less than £208.18 (which would include an average of £200) and a 2.5% chance that the average is greater than £234.32.
• As we shall see (Section 12.4), there is a probability of making a wrong decision, which we need to balance against the probability of making the correct decision.
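The case-study arithmetic can be checked with a short script (a sketch only; the variable names are ours, not from the text):

```python
import math

# Arbour Housing Survey figures quoted in the text
n, xbar, s = 180, 221.25, 89.48

# 95% confidence interval for the mean (large sample, z = 1.96)
se = s / math.sqrt(n)
lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (208.18, 234.32)

# H0: mu = 200 is rejected because 200 lies outside the interval
print("reject H0" if not (lower <= 200 <= upper) else "accept H0")  # reject H0
```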
12.2 Hypothesis testing for single samples
• Hypothesis testing is merely an alternative name for significance tests, the two being used interchangeably.
• This name does stress that we are testing some supposition about the population, which we can write down as a hypothesis.
• In fact, we will have two hypotheses whenever we conduct a significance test: one relating to the supposition that we are testing, and one that describes the alternative situation.
• The first hypothesis relates to the claim, supposition or previous situation and is usually called the null hypothesis and labelled as H0.
• It implies that there has been no change in the value of the parameter that we are testing from that which previously existed; for example, if the average spending per week on beer last year among 18-25 year olds was £15.53, then the null hypothesis would be that it is still £15.53.
• If it has been claimed that 75% of consumers prefer a certain flavour of ice-cream, then the null hypothesis would be that the population percentage preferring that flavour is equal to 75%.
• The null hypothesis could be written out as a sentence, but it is more usual to abbreviate it to:
H0: µ = µ0
for a mean, where µ0 is the claimed or previous population mean. For a percentage:
H0: π = π0
where π0 is the claimed or previous population percentage.
• The second hypothesis summarizes what will be the case if the null hypothesis is not true.
• It is usually called the alternative hypothesis (fairly obviously!), and is labelled as HA or as H1 depending on which text you follow: we will use the H1 notation here.
• This alternative hypothesis is usually not specific, in that it does not usually specify the exact alternative value for the population parameter; rather, it just says that some other value is appropriate on the basis of the sample evidence; for example, the mean amount spent is not equal to £15.53, or the percentage preferring this flavour of ice-cream is not 75%.
• As before, this hypothesis could be written out as a sentence, but a shorter notation is usually preferred:
H1: µ ≠ µ0
for a mean, where µ0 is the claimed or previous population mean. For a percentage:
H1: π ≠ π0
where π0 is the claimed or previous population percentage.
• Whenever we conduct a hypothesis test, we assume that the null hypothesis is true while we are doing the test, and then come to a conclusion on the basis of the figures that we calculate during the test.
• Since the sampling distribution of a mean or a percentage (for large samples) is given by the Normal distribution, to conduct a test we compare the sample evidence (sample statistic) with the null hypothesis (what is assumed to be true).
• This difference is measured by a test statistic. A test statistic of zero, or reasonably near to zero, would suggest the null hypothesis is correct.
• The larger the value of the test statistic, the larger the difference between the sample evidence and the null hypothesis.
• If the difference is sufficiently large we conclude that the null hypothesis is incorrect and accept the alternative hypothesis.
• To make this decision we need to decide on a critical value or critical values.
• The critical value or values define the point at which the chance of the null hypothesis being true is at a small, predetermined level, usually 5% or 1% (called the significance level).
• In this chapter you will see the z-value being used as a test statistic.
• It is more usual to divide the Normal distribution diagram into sections or areas, and to see whether the z-value falls into a particular section; then we may either accept or reject the null hypothesis.
• However, it is worth noting that some statistical packages just calculate the probability of the null hypothesis being true when you use them to conduct tests.
• The reason for doing this is to see how likely or unlikely the result is.
• If the result is particularly unlikely when the null hypothesis is true, then we might begin to question whether this is, in fact, the case.
• Obviously we will need to define more exactly the phrase 'particularly unlikely' in this context before we can proceed with hypothesis tests.
• Most tests are conducted at the 5% level of significance, and you should recall that in the Normal distribution, the z-values of +1.96 and -1.96 cut off a total of 5% of the distribution, 2.5% in each tail.
• If a calculated z-value is between -1.96 and +1.96, then we accept the null hypothesis; if the calculated z-value is below -1.96 or above +1.96, we reject the null hypothesis in favour of the alternative hypothesis.
• This situation is illustrated in Figure 12.3.
• If we were to conduct the test at the 1% level of significance, then the two values used to cut off the tail areas of the distribution would be +2.576 and -2.576.
• For each test that we wish to conduct, the basic layout and procedure will remain the same, although some of the details will change, depending on what exactly we are testing.
• A proposed layout is given below, and we suggest that by following this you will present clear and understandable significance tests (and not leave out any steps):
Step 1: State the null hypothesis, H0.
Step 2: State the alternative hypothesis, H1.
Step 3: State the significance level and the critical value(s).
Step 4: Calculate the test statistic, z.
Step 5: Compare the calculated value with the critical value(s).
Step 6: Come to a conclusion (accept or reject H0).
Step 7: Put the conclusion into words in the context of the problem.
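The two-sided decision rule described above can be sketched as a small function (illustrative only; the function name and layout are our own, not from the text):

```python
# Critical values for a two-sided test, taken from the Normal tables:
# +/-1.96 cuts off 5% in total, +/-2.576 cuts off 1% in total
CRITICAL = {0.05: 1.96, 0.01: 2.576}

def two_sided_decision(z, level=0.05):
    """Return 'reject H0' if |z| exceeds the critical value, else 'accept H0'."""
    return "reject H0" if abs(z) > CRITICAL[level] else "accept H0"

print(two_sided_decision(0.67))               # accept H0 at the 5% level
print(two_sided_decision(-2.40))              # reject H0 at the 5% level
print(two_sided_decision(-2.40, level=0.01))  # accept H0 at the 1% level
```

Note that the same z-value can be significant at the 5% level but not at the 1% level, which is why the significance level must be stated before the test is conducted.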
• While the significance level may change, which will lead to a change in the critical values, the major difference as we conduct different types of hypothesis test is in the way in which we calculate the z-value at step 4.

12.2.1 A test statistic for a population mean
• In the case of testing for a particular value for a population mean, the formula used to calculate z in step 4 is:
z = (x̄ - µ)/(σ/√n)
• As we saw in Chapter 11, the population standard deviation (σ) is rarely known, and we use the sample standard deviation (s) in its place.
• We are assuming that the null hypothesis is true, and so µ = µ0.
• The formula will become:
z = (x̄ - µ0)/(s/√n)
• We are now in a position to carry out a test.

Example
• A production manager claims that an average of 50 boxes per hour are filled with finished goods at the final stage of a production line.
• A random sample of 48 different workers, at different times, working at the end of identical production lines shows an average number of boxes filled as 47.5 with a standard deviation of 0.7 boxes.
• Does this evidence support the assertion by the production manager at the 5% level of significance?
• Step 1: The null hypothesis is based on the production manager's assertion:
H0: µ = 50, H1: µ ≠ 50
• Step 4: z = (47.5 - 50)/(0.7/√48) = -2.5/0.101 = -24.74
• This falls well below the critical value of -1.96, so the null hypothesis is rejected.
• Although there is a difference of only 2.5 boxes between the claim and the sample mean in this example, the test shows that the sample result is significantly different from the claimed value.
• In all hypothesis testing, it is not the absolute difference between the values which is important, but the number of standard errors that the sample value is away from the claimed value.
• Statistical significance should not be confused with the notion of business significance or importance.
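The step 4 formula can be applied to the production manager's figures as follows (a sketch; the function name is illustrative, not from the text):

```python
import math

def z_test_mean(xbar, mu0, s, n):
    """Test statistic for a single mean: z = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Production manager example: claim 50 boxes per hour,
# sample of 48 workers with mean 47.5 and standard deviation 0.7
z = z_test_mean(47.5, 50, 0.7, 48)
print(round(z, 2))  # -24.74, far outside +/-1.96, so H0 is rejected
```

The very large z-value illustrates the point made above: a small absolute difference (2.5 boxes) is many standard errors away from the claim because the standard error is small.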
12.2.2 A test for a population percentage
• Here the process will be identical to the one followed above, except that the formula used to calculate the z-value will need to be changed.
• (You should recall from Chapter 11 that the standard error for a percentage is different from the standard error for a mean.)
• The formula will be:
z = (p - π0)/√(π0(100 - π0)/n)
where p is the sample percentage and π0 is the claimed population percentage (remember that we are assuming that the null hypothesis is true).

Example
• An auditor claims that 10% of invoices for a company are incorrect.
• To test this claim a random sample of 100 invoices is checked, and 12 are found to be incorrect.
• Test, at the 5% significance level, if the auditor's claim is supported by the sample evidence.
• Step 4: with p = 12%, z = (12 - 10)/√(10 × 90/100) = 2/3 = 0.67
• Step 5: The calculated value falls between the two critical values.
• Step 6: We therefore cannot reject the null hypothesis.
• Step 7: The evidence from the sample is consistent with the auditor's claim that 10% of the invoices are incorrect.
• In this example the calculated value falls between the two critical values, so the answer is to not reject the claim of 10% rather than to accept the claim.
• This is because we only have the sample evidence to work from, and we are aware that it is subject to sampling error.
• The only way to firmly accept the claim would be to check every invoice (i.e. carry out a census) and work out the actual population percentage which are incorrect.
• When we calculated a confidence interval for the population percentage (in Section 11.4) we used a slightly different formulation of the sampling error.
• There we used
√(p(100 - p)/n)
because we did not know the value of the population percentage; we were using the sample percentage as our best estimate, and also as the only available value.
• When we come to significance tests, we have already made the assumption that the null hypothesis is true (while we are conducting the test), and so we can use the hypothesized value of the population percentage in the calculations.
• The formula for the sampling error will thus be
√(π0(100 - π0)/n)
• In many cases there would be very little difference in the answers obtained from the two different formulae, but it is good practice (and shows that you know what you are doing) to use the correct formulation.

Case study
• It has been claimed on the basis of census results that 87% of households in Tonnelle now have exclusive use of a fixed bath or shower with a hot water supply.
• In the Arbour Housing Survey of this area, 246 respondents out of the 300 interviewed reported this exclusive usage.
• Test at the 5% significance level whether this claim is supported by the sample data.
• Step 4: with p = 246/300 × 100 = 82%, z = (82 - 87)/√(87 × 13/300) = -5/1.94 = -2.58
• Step 5: The calculated value falls below the critical value of -1.96.
• Step 6: We therefore reject the null hypothesis at the 5% significance level.
• Step 7: The sample evidence does not support the view that 87% of households in the Tonnelle area have exclusive use of a fixed bath or shower with a hot water supply.
• Given a sample percentage of 82% we could be tempted to conclude that the percentage was lower - the sample does suggest this, but the test was not structured in this way and we must accept the alternative hypothesis that the population percentage is not likely to be 87%.
• We will consider this issue again when we look at one-sided tests.

12.3 One-sided significance tests
• The significance tests shown in the last section may be useful if all we are trying to do is test whether or not a claimed value could be true, but in most cases we want to be able to go one step further.
• In general, we would want to specify whether the real value is above or below the claimed value in those cases where we are able to reject the null hypothesis.
• One-sided tests will allow us to do exactly this.
• The method employed, and the appropriate test statistic which we calculate, will remain exactly the same; it is the alternative hypothesis and the interpretation of the answer which will change.
• Suppose that we are investigating the purchase of cigarettes, and know that the percentage of the adult population who regularly purchased last year was 34%.
• If a sample is selected, we do not want to know only whether the percentage purchasing has changed, but rather whether it has decreased (or increased).
• Before carrying out the test it is necessary to decide which of these two propositions you wish to test.
• If we wish to test whether or not the percentage has decreased, then our hypotheses would be:
H0: π = π0, H1: π < π0
where π0 is the actual percentage in the population last year.
• This may be an appropriate hypothesis test if you were working for a health lobby.
• If we wanted to test if the percentage had increased, then our hypotheses would be:
H0: π = π0, H1: π > π0
• This could be an appropriate test if you were working for a manufacturer in the tobacco industry and were concerned with the effects of improved packaging and presentation.
• To carry out the test, we will want to concentrate the chance of rejecting the null hypothesis at one end of the Normal distribution.
• Where the significance level is 5%, the critical value will be -1.645 (i.e. the cut-off value taken from the Normal distribution tables) for the hypotheses H0: π = π0, H1: π < π0; and +1.645 for the hypotheses H0: π = π0, H1: π > π0. (Check these figures from Appendix C.)
• Where the significance level is set at 1%, then the critical value becomes either -2.33 or +2.33.
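The percentage test statistic from Section 12.2.2 and the critical values quoted above can be reproduced with the standard library (a sketch; the function names are ours, not from the text):

```python
import math
from statistics import NormalDist

def z_test_percentage(p, pi0, n):
    """z = (p - pi0) / sqrt(pi0 * (100 - pi0) / n), percentages on a 0-100 scale."""
    return (p - pi0) / math.sqrt(pi0 * (100 - pi0) / n)

# One-sided 5% critical value from the Normal distribution (about 1.645)
z_crit = NormalDist().inv_cdf(0.95)
print(round(z_crit, 3))  # 1.645

# Auditor example from Section 12.2.2: claim 10%, 12 of 100 invoices incorrect
z = z_test_percentage(12, 10, 100)
print(round(z, 2))  # 0.67, inside +/-1.96, so the claim is not rejected
```

For a 1% one-sided test, `NormalDist().inv_cdf(0.99)` gives approximately 2.33, matching the value above.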
• In terms of answering examination questions, it is important to read the wording very carefully to determine which type of test you are required to perform.
• The interpretation of the calculated z-value is now merely a question of deciding into which of two sections of the Normal distribution it falls.

12.3.1 A one-sided test for a population mean
• Here the hypotheses will be in terms of the population mean, or the claimed value.
• Consider the following example.

Example
• A manufacturer of batteries has assumed that the average expected life is 299 hours.
• As a result of recent changes to the filling of the batteries, the manufacturer now wishes to test if the average life has increased.
• A sample of 200 batteries was taken at random from the production line and tested. Their average life was found to be 300 hours with a standard deviation of 8 hours.
• You have been asked to carry out the appropriate hypothesis test at the 5% significance level.
• Step 1: H0: µ = 299, H1: µ > 299
• Step 4: z = (300 - 299)/(8/√200) = 1/0.566 = 1.77
• (Note that we are still assuming the null hypothesis to be true while the test is conducted.)
• Step 5: The calculated value is larger than the critical value.
• Step 6: We may therefore reject the null hypothesis.
• (Note that had we been conducting a two-sided hypothesis test, then we would have been unable to reject the null hypothesis, and so the conclusion would have been that the average life of the batteries had not changed.)
• Step 7: The sample evidence supports the supposition that the average life of the batteries has increased by a significant amount.
• Although the significance test has shown that there has been a significant increase in the average length of life of the batteries, this may not be an important conclusion for the manufacturer.
• For instance, it would be quite misleading to use it to back up an advertising campaign which claimed that 'our batteries now last even longer!'.
• Whenever hypothesis tests are used it is important to distinguish between statistical significance and importance.
• This distinction is often ignored.

Case study
• It has been argued that, because of the types of property and the age of the population in Tonnelle, average mortgages are likely to be around £200 a month.
• However, the Arbour Housing Trust believe the figure is higher because many of the mortgages are relatively recent and many were taken out during the house price boom.
• Test this proposition, given the result from the Arbour Housing Survey that 100 respondents were paying an average monthly mortgage of £254.
• The standard deviation calculated from the sample is £72.05 (it was assumed to be £70 in an earlier example).
• Use a 5% significance level.
• Step 4: z = (254 - 200)/(72.05/√100) = 54/7.205 = 7.49
• The test statistic value of 7.49 is greater than the critical value of +1.645.
• On this basis, we can reject the null hypothesis that monthly mortgages are around £200 in favour of the alternative that they are likely to be more.
• A test provides a means to clarify issues and resolve different arguments.
• In this case the average monthly mortgage payment is higher than had been argued, but this does not necessarily mean that the reasons put forward by the Arbour Housing Trust for the higher levels are correct.
• In most cases, further investigation is required.
• It is always worth checking that the sample structure is the same as the population structure.
• In an area like Tonnelle we would expect a range of housing and it can be difficult for a small sample to reflect this.
• The conclusions also refer to the average.
• The pattern of mortgage payments could vary considerably between different groups of residents.
• It is known that the Victorian housing is again attracting a more affluent group of residents, and they could have the effect of pulling the mean upwards.
• Finally, the argument for the null hypothesis could be based on dated information, e.g. the last census.

12.3.2 A one-sided test for a population percentage
• Here the methodology is exactly the same as that employed above, and so we just provide two examples of its use.

Example
• A small political party expects to gain 20% of the vote in by-elections.
• A particular candidate from the party is about to stand in a by-election in Derbyshire South East, and has commissioned a survey of 200 randomly selected voters in the constituency.
• If 44 of those interviewed said that they would vote for this candidate in the forthcoming by-election, test whether this would be significantly above the national party's claim.
• Use a test at the 5% significance level.
• With H0: π = 20, H1: π > 20, the sample percentage is p = 44/200 × 100 = 22%, and
z = (22 - 20)/√(20 × 80/200) = 2/2.83 = 0.71
• This is below the critical value of +1.645, so we cannot reject the null hypothesis.
• Here there may not be evidence of a statistically significant vote above the national party's claim (a sample size of 1083 would have made this 2% difference statistically significant), but if the candidate does manage to achieve 22% of the vote, and it is a close contest over who wins the seat between two other candidates, then the extra 2% could be very important.

Example
• A fizzy drink manufacturer claims that over 40% of teenagers prefer its drink to any other.
• In order to test this claim, a rival manufacturer commissions a market research company to conduct a survey of 200 teenagers and ascertain their preferences in fizzy drinks.
• Of the randomly selected sample, 75 preferred the first manufacturer's drink.
• Test at the 5% level if this survey supports the claim.
• The sample percentage is p = 75/200 × 100 = 37.5%, and
z = (37.5 - 40)/√(40 × 60/200) = -2.5/3.46 = -0.72

Case study
• The Arbour Housing Survey was an attempt to describe local housing conditions on the basis of responses to a questionnaire.
• Many of the characteristics will be summarized in percentage terms, e.g. type of property, access to baths, showers, flush toilets, kitchen facilities.
• Such enquiries investigate whether these percentages have increased or decreased over time, and whether they are higher or lower than in other areas.
• If this increase or decrease, or higher or lower value, is specified in terms of an alternative hypothesis, the appropriate test will be one-sided.

12.3.3 Producer's risk and consumer's risk
• One-sided tests are sometimes referred to as testing producers' risks or consumers' risks.
• If we are looking at the amount of a product in a given packet or lot size, then the reaction to variations in the amount may be different for the two groups.
• Say, for instance, that the packet is sold as containing 100 items.
• If there are more than 100 items per packet, then the producer is effectively giving away some items, since only 100 are being charged for.
• The producer, therefore, is concerned not to overfill the packets but still meet legal requirements.
• In this situation, we might presume that the consumer is quite happy, since some items come free of charge.
• In the opposite situation of less than 100 items per packet, the producer is supplying less than 100 but being paid for 100 (this, of course, can have damaging consequences for the producer in terms of lost future sales), while the consumers receive less than expected, and are therefore likely to be less than happy.
• Given this scenario, one would expect the producer to conduct a one-sided test using H1: µ > µ0, and a consumer group to conduct a one-sided test using H1: µ < µ0.
• There may, of course, be legal implications for the producer in selling packets of 100 which actually contain less than 100.
• (In practice most producers will, in fact, play it safe and aim to meet any minimum requirements.)

12.4 Types of error
• We have already seen, throughout this section of the book, that samples can only give us a partial view of a population; there will always be some chance that the true population value really does lie outside of the confidence interval, or that we will come to the wrong decision when conducting a significance test.
• In fact these probabilities are specified in the names that we have already used - a 95% confidence interval and a test at the 5% level of significance.
• Both imply a 5% chance of being wrong.
• You might want to argue that this number is too big, since it gives a 1 in 20 chance of making the wrong decision, but it has been accepted in both business and social research over many years.
• One reason is that we are dealing with a situation where we rely on people telling us the 'truth', which they may not always do!
• Using a smaller number, say 1%, would also imply a spurious level of accuracy in our results.
• (Even in medical research, the figure of 5% may be used.)
• If you consider significance tests a little further, however, you will see that there are, in fact, two different ways of getting the wrong answer. You could throw out the claim when it is, in fact, true; or you could fail to reject it when it is, in fact, false.
• It becomes important to distinguish between these two types of error.
• As you can see from Table 12.1, the two different types of error are referred to as Type I and Type II.

Table 12.1 Possible results of a hypothesis test
              H0 true             H0 false
Accept H0     correct decision    Type II error
Reject H0     Type I error        correct decision

• A Type I error is the rejection of a null hypothesis when it is true.
• The probability of this is known as the significance level of the test.
• This is usually set at either 5% or 1% for most business applications, and is decided on before the test is conducted.
• A Type II error is the failure to reject a null hypothesis which is false.
• The probability of this error cannot be determined before the test is conducted.
• Consider a more everyday situation.
• You are standing at the kerb, waiting to cross a busy main road.
• You have four possible outcomes:
• If you decide not to cross at this moment, and there is something coming, then you have made a correct decision.
• If you decide not to cross, and there is nothing coming, you have wasted a little time, and made a Type II error.
• If you decide to cross and the road is clear, you have made a correct decision.
• If you decide to cross and the road isn't clear, as well as the likelihood of injury, you have made a Type I error.
• The error which can be controlled is the Type I error; and this is the one which is set before we conduct the test.

12.4.1 Background theory
• Consider the sampling distribution of z (which we calculate in step 4), consistent with the null hypothesis H0: µ = µ0, illustrated in Figure 12.5.
• The two values for the test statistic, A and B, shown in Figure 12.5, are both possible, with B being less likely than A.
• The construction of this test, with a 5% significance level, would mean the acceptance of the null hypothesis when A was obtained and its rejection when B was obtained.
• Both values could be attributed to an alternative hypothesis, H1: µ = µ1, which we accept in the case of B and reject for A.
• It is worth noting that if the test statistic follows a sampling distribution we can never be certain about the correctness of our decision.
• What we can do, having fixed a significance level (Type I error), is construct the rejection region to minimize the Type II error.
• Suppose the alternative hypothesis was that the mean for the population, µ, was not µ0 but a larger value µ1.
• This we would state as:
H1: µ = µ1, where µ1 > µ0
or, in the more general form:
H1: µ > µ0
• If we keep the same acceptance and rejection regions as before (Figure 12.5), the Type II error is as shown in Figure 12.6.
• Figure 12.6 The Type II error resulting from a two-tailed test.
• As we can see, the probability of accepting H0 when H1 is correct can be relatively large.
• If we test the null hypothesis H0: µ = µ0 against an alternative hypothesis of the form H1: µ < µ0 or H1: µ > µ0, we can reduce the size of the Type II error by careful definition of the rejection region.
• If the alternative hypothesis is of the form H1: µ < µ0, a critical value of z = -1.645 will define a 5% rejection region in the left-hand tail, and if the alternative hypothesis is of the form H1: µ > µ0, a critical value of z = 1.645 will define a 5% rejection region in the right-hand tail.
• The reduction in Type II error is illustrated in Figure 12.7.
• It can be seen from Figure 12.7 that the test statistic now rejects the null hypothesis in the range 1.645 to 1.96 as well as values greater than 1.96.
• If we construct one-sided tests, there is more chance that we will reject the null hypothesis in favour of a more radical alternative that the parameter has a larger value (or has a smaller value) than specified, when that alternative is true.
• Note that if the alternative hypothesis was of the form H1: µ < µ0, the rejection region would be defined by the critical value z = -1.645 and we would reject the null hypothesis if the test statistic took this value or less.

12.5 Hypothesis testing with two samples
• All of the tests so far have made comparisons between some known, or claimed, population value and a set of sample results.
• In many cases we may know little about the actual population value, but will have available the results of another survey.
• Here we want to find if there is a difference between the two sets of results.
• This could be a situation where we have two surveys conducted in different parts of the country, or at different points in time.
• Our concern now is to determine if the two sets of results are consistent with each other (come from the same parent population) or whether there is a difference between them.
• Such comparisons may be important in a marketing context, for instance in looking at whether or not there are different perceptions of the product in different regions of the country.
• They could be used in performance appraisal of employees, to answer questions on whether the average performance in different regions is, in fact, different.
• They could be used in assessing a forecasting model, to see if the values have changed significantly from one year to the next.
• Again we can use the concepts and methodology developed in previous sections of this chapter.
• Tests may be two-sided, to look for a difference, or one-sided, to look for a significant increase (or decrease).
• Similarly, tests may be carried out at the 5% or the 1% level, and the interpretation of the results will depend upon the calculated value of the test statistic and its comparison to a critical value.
• It will be necessary to rewrite the hypotheses to take account of the two samples and to find a new formulation for the test statistic (step 4 of the process).

12.5.1 Tests for a difference of means
• In order to be clear about which sample we are talking about, we will use the suffixes 1 and 2 to refer to sample 1 and sample 2.
• Our assumption is that the two samples do, in fact, come from the same population, and thus the population means associated with each sample result will be the same.
• This assumption will give us our null hypothesis:
H0: µ1 = µ2
• The alternative hypothesis will depend on whether we are conducting a two-sided test or a one-sided test.
• If it is a two-sided test, the alternative hypothesis will be:
H1: µ1 ≠ µ2
• and if it is a one-sided test, it would be either:
H1: µ1 > µ2  or  H1: µ1 < µ2
• We could also test whether the difference between the means could be assigned to a specific value, for instance that the difference in take-home pay between two groups of workers was £25; here we would use the hypotheses:
H0: µ1 - µ2 = 25 and H1: µ1 - µ2 ≠ 25
• The test statistic used is closely related to the formula developed in Section 11.5.1 for the confidence interval for the difference between two means.
• It will be:
z = ((x̄1 - x̄2) - (µ1 - µ2)) / √(s1²/n1 + s2²/n2)
• but if we are using a null hypothesis which says that the two population means are equal, then the second bracket on the top line will be equal to zero, and the formula for z simplifies to:
z = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
• The example below illustrates the use of this test.

Example
• A union claims that the standard of living for its members in the UK is below that of employees of the same company in Spain.
• A survey of 60 employees in the UK showed an average income of £895 per week with a standard deviation of £120.
• A survey of 100 workers in Spain, after making adjustments for various differences between the two countries and converting to sterling, gave an average income of £914 with a standard deviation of £90.
• Test at the 1% level if the Spanish workers earn more than their British counterparts.

12.5.2 Tests for a difference in population percentage
• Again, adopting the approach used for testing the difference of means, we will modify the hypotheses, using π instead of µ for the population values, and redefine the test statistic.
• Both one- and two-sided tests may be carried out.
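Before turning to percentages, the difference-of-means test just described can be worked through for the union example above. This is a sketch of the arithmetic only, using the figures given in the text and Python's standard library:

```python
from math import sqrt
from statistics import NormalDist

# Union example: H0: mu_Spain = mu_UK against H1: mu_Spain > mu_UK,
# tested at the 1% level (one-sided).  Figures as given in the text.
n_uk, mean_uk, sd_uk = 60, 895.0, 120.0
n_sp, mean_sp, sd_sp = 100, 914.0, 90.0

# Standard error of the difference between the two sample means.
se = sqrt(sd_uk ** 2 / n_uk + sd_sp ** 2 / n_sp)
z = (mean_sp - mean_uk) / se
crit = NormalDist().inv_cdf(0.99)   # one-sided 1% critical value, about 2.33

print(f"z = {z:.3f}, critical value = {crit:.3f}")
print("reject H0" if z > crit else "cannot reject H0")
```

On these figures z is about 1.06, well below the 1% one-sided critical value of about 2.33, so the sample evidence would not lead us to reject the null hypothesis at the 1% level.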
• The test statistic will be:
z = ((p1 - p2) - (π1 - π2)) / √(p1(100 - p1)/n1 + p2(100 - p2)/n2)
• but if the null hypothesis is π1 - π2 = 0, then this simplifies to:
z = (p1 - p2) / √(p1(100 - p1)/n1 + p2(100 - p2)/n2)

Example
• A manufacturer believes that a new promotional campaign will increase favourable perceptions of the product by 10%.
• Before the campaign, a random sample of 500 consumers showed that 20% had favourable attitudes towards the product.
• After the campaign, a second random sample of 400 consumers found favourable reactions amongst 28%.
• Use an appropriate test at the 5% level of significance to find if there has been a significant increase in favourable perceptions.

12.6 Hypothesis testing with small samples
• As we saw in Chapter 11, the basic assumption that the sampling distribution for the sample parameters is a Normal distribution only holds when the samples are large (n > 30).
• Once we turn to small samples, we need to use a different sampling distribution: the t-distribution (see Section 11.7).
• Apart from this difference, which implies a change to the formula for the test statistic (step 4 again!) and also to the table of critical values, all of the tests developed so far in this chapter may be applied to small samples.

12.6.1 Single samples
• For a single sample, the test statistic for a population mean is calculated by using:
t = (x̄ - µ) / (s/√n)
with (n - 1) degrees of freedom.
• For a single sample, the test statistic for a population percentage will be calculated using:
t = (p - π) / √(π(100 - π)/n)
• and, again, there will be (n - 1) degrees of freedom.
• (Note that, as with the large sample test, we use the null hypothesis value of the population percentage in the formula.)
• Below are two examples to illustrate the use of the t-distribution in hypothesis testing.

Example
• A lorry manufacturer claims that the average annual maintenance cost for its vehicles is £500.
• The maintenance department of one of their customers believes it to be higher and, to test this, randomly selects a sample of six lorries from their large fleet.
• From this sample, the mean annual maintenance cost was found to be £555, with a standard deviation of £75.
• Use an appropriate hypothesis test, at the 5% level, to find if the manufacturer's claim is valid.

Example
• A company had a 50% market share for a newly developed product last year.
• It believes that, as more entrants start to produce this type of product, its market share will decline, and in order to monitor this situation, decides to select a random sample of 15 customers.
• Of these, six have bought the company's product.
• Carry out a test at the 5% level to find if there is sufficient evidence to suggest that their market share is now below 50%.

12.6.2 Two samples
• Where we have two samples, we will again use tests similar to those of Section 12.5, but with formulae based on the sampling errors developed in Chapter 11.
• In the case of estimating a confidence interval for the difference between two means, the pooled standard error (assuming that both samples have similar variability) was given by:
sp√(1/n1 + 1/n2), where sp = √(((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2))
• Using this, we have the test statistic:
t = (x̄1 - x̄2) / (sp√(1/n1 + 1/n2))
with (n1 + n2 - 2) degrees of freedom.

Example
• A company has two factories, one in the UK and one in Germany.
• The head office staff feel that the German factory is more efficient than the British one and, to test this, select two random samples.
• The British sample consists of 20 workers who take an average of 25 minutes to complete a standard task. Their standard deviation is 5 minutes.
• The German sample has 10 workers who take an average of 20 minutes to complete the same task, and the sample has a standard deviation of 4 minutes.
• Use an appropriate hypothesis test, at the 1% level, to find if the German workers are more efficient.
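The two small-sample t tests above (the lorry example and the two-factory example) can be sketched as follows. The critical values are taken from standard t tables rather than computed, since the Python standard library has no t-distribution; this is a check on the arithmetic, not the text's own worked solution.

```python
from math import sqrt

# Single-sample t test: lorry maintenance example (5% level, one-sided).
n, xbar, s, mu0 = 6, 555.0, 75.0, 500.0
t_single = (xbar - mu0) / (s / sqrt(n))   # (n - 1) = 5 degrees of freedom
t_crit_5 = 2.015                          # t(5), 5% one-sided, from tables
print(f"lorries:   t = {t_single:.3f} vs critical value {t_crit_5}")

# Pooled two-sample t test: UK vs Germany task times (1% level, one-sided).
n1, x1, s1 = 20, 25.0, 5.0   # UK: size, mean (minutes), s.d.
n2, x2, s2 = 10, 20.0, 4.0   # Germany
sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
t_two = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))   # 28 degrees of freedom
t_crit_1 = 2.467                          # t(28), 1% one-sided, from tables
print(f"factories: t = {t_two:.3f} vs critical value {t_crit_1}")
```

On these figures t is about 1.80 for the lorries, below 2.015, so the manufacturer's claim cannot be rejected at the 5% level; for the factories t is about 2.75, above 2.467, supporting the view that the German workers complete the task more quickly.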
12.6.3 Matched pairs
• A special case arises when we are considering tests of the difference of means if the two samples are related in such a way that we may pair the observations.
• This may arise when two different people assess the same series of situations, for example different interviewers assessing the same group of candidates for a job.
• The example below illustrates the situation of matched pairs.

Example
• Seven applicants for a job were interviewed by two personnel officers who were asked to give marks on a scale of 1 to 10 to one aspect of the candidates' performance.
• A summary of the marks given is shown below.

Marks given to candidate:   I   II  III  IV   V   VI  VII
Interviewer A:              8    7    6   9   7    5    8
Interviewer B:              7    4    6   8   5    6    7

Interviewer A   Interviewer B   Difference (X)   (X - X̄)²
      8               7               1              0
      7               4               3              4
      6               6               0              1
      9               8               1              0
      7               5               2              1
      5               6              -1              4
      8               7               1              0
                         Total:       7      Total: 10

• The mean difference is X̄ = 7/7 = 1.
• The use of such matched pairs is common in market research and psychology.
• If, for reasons of time and cost, we need to work with small samples, results can be improved by using paired samples, since this reduces the between-sample variation.

12.7 Conclusions
• The ideas developed in this chapter allow us to think about statistical data as evidence. If we consider the practice of law as an analogy, then a set of circumstances requires a case to be made: an idea is proposed (that of innocence), an alternative is constructed (that of guilt) and evidence is weighed to arrive at a decision.
• In statistical testing, a null hypothesis is proposed, usually in terms of no change or no difference, an alternative is constructed and data is used to accept or reject the null hypothesis.
• Strangely, perhaps, these tests are not necessarily acceptable in law.
• Given the size of many data sets, we could propose many hypotheses and soon begin to find that we were rejecting the more conservative null hypotheses and accepting the more radical alternatives.
• This does not, in itself, mean that there are differences, or that we have identified change, or that we have followed good statistical practice.
• We need to accept that, with a known probability (usually 5%), we can reject the null hypothesis when it is correct - a Type I error.
• We also need to recognize that if we keep looking for relationships we will eventually find some.
• Testing is not merely about meeting the technical requirements of the test but about constructing hypotheses that are meaningful in the problem context.
• In an ideal world, the problem context would be clarified and meaningful hypotheses specified before the data or evidence was collected.
• It is true that we can discover the unexpected from data and identify important and previously unknown relationships.
• The process of enquiry or research is therefore complex, requiring a balance between the testing of ideas we already have and allowing new ideas to emerge.
• It should also be noted that a statistically significant result may not have business significance.
• It could be that statistically significant differences between regions, for example, were only to be expected, given regional differences in product promotion and competitor activity.
• Since we are inevitably dealing with sample data when we conduct tests of significance, we can never be 100% sure of our answer - remember Type I and Type II errors (and other types of error).
• What significance testing does do is provide a way of quantifying error and informing judgement.