Hypothesis testing
Chapter 6
Statistics for Marketing & Consumer Research
Copyright © 2008 - Mario Mazzocchi

Sampling and inference
1) Exploit the probabilistic nature of sampling to show how statements on the population parameters can be based on sample estimates
2) Consider the degree of uncertainty (level of confidence) in such an operation
3) Build a confidence interval, i.e. a range of values around the sample mean that is expected to include the true population mean at a given probability (confidence) level
4) Compute the probability associated with a given statement on the population parameters (or on parameters from different populations)
5) Decide whether to reject the statement depending on its probability level (hypothesis testing)

Direct and indirect problems
• Direct problem
  – If we knew the population mean and the standard error, we would know the exact probability of any sample mean
• Indirect problem
  – Confidence interval: given the sample mean, we can exploit the normal probability curve to find a range of values which will contain the true population mean at a given confidence level
  – Hypothesis testing: given the sample mean, we can exploit the normal probability curve to check the probability of a hypothesis on the true mean; we can then decide whether to discard or retain that hypothesis

Example
• Extract a simple random sample of 100 units out of a population of 400 pub customers
• Measure the average expenditure on beer in the sample
• The number of potential samples is huge ($2.24 \times 10^{96}$) and we observe only one sample
• However, if extraction is random, the probability distribution of all sample means is a normal curve centered around the true population mean

The normal distribution of the
sample means
• Once more the normal distribution...
• The central point is the true population mean and is the most likely sample mean
• The larger the standard error, the flatter the curve
• 95% of all possible sample means fall within a range of about two standard errors from the mean (1.96 to be precise)
[Figure: normal curve of the sample means, with 95% of probability in the central area and 2.5% of probability in each tail]

Indirect problem
• We have only one sample mean ($\bar{x}$)
• 95% of sample means fall within the range $[\mu - 1.96\,\sigma_{\bar{x}} ; \mu + 1.96\,\sigma_{\bar{x}}]$
• If 100% of the sample means were in that range, we could state with certainty that the true mean falls in the range $[\bar{x} - 1.96\,\sigma_{\bar{x}} ; \bar{x} + 1.96\,\sigma_{\bar{x}}]$
• However, we can affirm that only for 95% of the sample means, thus we refer to a 95% confidence level

Confidence level
[Figure: normal curve with 95% of probability in the central range and 2.5% of probability in each tail]
• If the single sample we extract falls in this range, then the true population mean is also included in the interval between $\bar{x} - 1.96\,\sigma_{\bar{x}}$ and $\bar{x} + 1.96\,\sigma_{\bar{x}}$. In fact, even if we get one of the extremes, it will still be at a distance of $1.96\,\sigma_{\bar{x}}$ from $\mu$
• However, only 95% of the sample means fall in this range; thus we can state that the true population mean is included in the interval between $\bar{x} - 1.96\,\sigma_{\bar{x}}$ and $\bar{x} + 1.96\,\sigma_{\bar{x}}$ with a 95% confidence level

Confidence levels and critical values
• The critical value 1.96 corresponds to a probability of 95%, but it is possible to know exact critical values for any confidence level
• For example, if we want a 99% confidence level, the critical value based on the normal curve is 2.58
• Most packages, including Microsoft Excel, allow computation of critical values

A further complication
• The standard error of the mean is unknown and has to be estimated from the sample
• This adds some further uncertainty on top of the sampling error
• Hence, we use an approximation of the normal distribution which is more conservative: it is flatter and assigns higher probabilities to the tails (extreme values) compared to the normal distribution
• This distribution is the so-called Student-t distribution and its critical values are different from those of the normal distribution

The Student-t distribution
[Figure: density curves of the standard normal distribution and of t(1), t(5) and t(20), plotted from -4 to 4]
• Dotted lines show the Student t distribution with different degrees of freedom (1, 5 and 20)
• The bold line represents the standard normal distribution (with mean zero and standard deviation equal to one)
• The degrees of freedom are equal to the sample size minus one

Critical values and sample size
                       Student t values according to sample size    Normal value
Level of confidence    10       20       30       40                z
99%                    3.17     2.85     2.75     2.70              2.58
95%                    2.23     2.09     2.04     2.02              1.96
90%                    1.81     1.72     1.70     1.68              1.64

How to build a confidence interval from a sample
1. Compute the sample mean $\bar{x}$ and standard deviation $s$
2. Estimate the standard error of the mean $s_{\bar{x}}$
3. Choose a confidence level $1-\alpha$
4. Choose the appropriate coefficient for critical values (see the table above), using the Student t approximation instead of the Normal (z) value if the sample size is below fifty
5. Compute the lower and upper confidence limits as:
   $[\bar{x} - z_{\alpha/2}\, s_{\bar{x}} ; \bar{x} + z_{\alpha/2}\, s_{\bar{x}}]$ for the Normal distribution, or
   $[\bar{x} - t_{\alpha/2}\, s_{\bar{x}} ; \bar{x} + t_{\alpha/2}\, s_{\bar{x}}]$ for the Student t approximation

SPSS confidence intervals
[Screenshot of the SPSS dialog]

Confidence interval
Descriptives: In a typical week how much do you spend on fresh or frozen chicken (Euro)?
                                                  Statistic    Std. Error
Mean                                              5.6677       .19640
95% Confidence Interval for Mean   Lower Bound    5.2817
                                   Upper Bound    6.0537
5% Trimmed Mean                                   5.2958
Median                                            5.0000
Variance                                          17.089
Std. Deviation                                    4.13383
Minimum                                           .00
Maximum                                           30.00
Range                                             30.00
Interquartile Range                               4.50
Skewness                                          2.084        .116
Kurtosis                                          8.005        .231
The Lower Bound and Upper Bound rows are the boundaries of the c.i.

Hypothesis testing (HT)
• HT is a form of inference
• HT is a statistical tool to decide whether or not to reject a statement about the target population on the basis of statistics computed on a sample
• This is only possible if the sampling distribution is known!

Example (Trust data-set)
• The sample mean expenditure for chicken is £ 5.67
• Considering only the sub-sample of those belonging to consumer organizations the mean is £ 5.04
• May we safely conclude that those who belong to consumer organizations spend less?
• This depends on the precision of the estimates.
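To make the notion of precision concrete, the confidence-interval steps listed earlier can be sketched in Python. This is a minimal illustration, not the SPSS procedure: it takes the mean (5.6677) and standard error (0.19640) from the Descriptives output and uses the normal critical value, on the assumption that the sample is large enough for z to stand in for t. The function name `confidence_interval` is made up for this example.

```python
from statistics import NormalDist

def confidence_interval(mean, std_error, confidence=0.95):
    """Return (lower, upper) limits using the normal critical value z_{alpha/2}."""
    alpha = 1.0 - confidence
    # Critical value leaving alpha/2 of probability in each tail
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean - z * std_error, mean + z * std_error

# Mean and standard error taken from the Descriptives output
lower, upper = confidence_interval(5.6677, 0.19640)
print(f"95% c.i.: [{lower:.2f} ; {upper:.2f}]")
```

The limits should land close to the SPSS bounds; small differences are to be expected if the package relies on the Student t critical value rather than on z.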
• The upper limit of the 99% confidence interval for the sub-sample of respondents belonging to consumer organizations is 6.84
• The upper limit of the confidence interval for the overall sample is 6.18
• There is a chance that those who belong to consumer organizations actually spend more than other people
• Statistical tests based on the sample help in deciding whether a hypothesis should be rejected; for example, the hypothesis that there is no difference between respondents who belong to a consumer organization and those who don't.

Hypothesis testing
• The difference between the two means follows a known statistical distribution
• If the hypothesis of mean equality is true, the difference between the two means should be zero
• The actual difference is £ 0.63 (£ 5.67 minus £ 5.04), but this difference may be generated by sampling error
• When no difference exists at the population level, what is the probability that a sample with a difference of £ 0.63 is extracted?
• If the probability is very high (say 90%) then it is wise not to reject the hypothesis of equality
• Instead, a very low probability (e.g. 0.001%) means that it is very unlikely that the difference is due to sampling error, so one should choose to reject the null hypothesis and conclude that belonging to a consumer organization actually makes a difference to chicken expenditure.

The hypotheses
• The initial hypothesis is called the null hypothesis, denoted by H0
• At the same time, the researcher sets an alternative hypothesis (H1) which is complementary to H0 and remains valid if H0 is rejected.
• Two-tailed tests are those where the alternative hypothesis can go in either direction
• When the alternative hypothesis is formulated in a unique direction (for example $\mu > 6$), the test is one-tailed

Significance and confidence
• The threshold probability level below which the null hypothesis is rejected is the significance level, arbitrarily set by the researcher (usually at the 5% or 1% level).
• It is denoted by $\alpha$ and its complement $1-\alpha$ is the confidence level of a test
• The smaller the significance level, the smaller the rejection region (the set of values that lead to rejection of the hypothesis) and the larger the acceptance region (the set of values that lead to non-rejection of the hypothesis)

Errors, confidence and power
The statistical power of a test is the probability of correctly rejecting the null hypothesis when it is false and it is equal to $1-\beta$. It can be estimated on sample data and depends on
• sample size
• significance level $\alpha$
• effect size (a measure of "how wrong" the null hypothesis is)
When the null hypothesis is not rejected and power is above 80% then the conclusion is usually regarded as robust.

                           Reject H0                                        Non-reject H0
H0 is true                 Type I Error                                     Correct
                           prob. = significance level ($\alpha$)            prob. = confidence level ($1-\alpha$)
H0 is false, H1 is true    Correct                                          Type II Error
                           prob. = power level ($1-\beta$)                  prob. = $\beta$

Hypothesis testing
1. Formulate the null hypothesis (H0) and the alternative hypothesis (H1)
2. Determine the distribution of the sample test statistic under H0
3. Choose a significance level $\alpha$ (i.e. a confidence level $1-\alpha$)
4. Compute the sample test statistic and its probability level (p-value)
5. When p-value < $\alpha$, reject H0

One mean test
• The null hypothesis is made on the population mean
• Example:
  – UK Department of Environment, Food and Rural Affairs (DEFRA) figure on average weekly household chicken consumption: 1.15 Kg
  – UK household average in the Trust data-set: 1.75 Kg on 92 respondents
  – The DEFRA figure is likely to be more representative; we test the hypothesis that the population mean is actually 1.15
• Null H0: $\mu = 1.15$ versus the alternative H1: $\mu \neq 1.15$

Probability distribution
• As for confidence intervals, the probabilities of the sample means follow the normal distribution
• Under the null hypothesis H0, the sample mean is distributed as a normal with mean 1.15 and unknown variance
• The sample standard error of the mean is 0.28
• With a sample size of 92 we can use the normal curve (with fewer than 50 observations we should have referred to the Student t distribution)
• We set the significance level $\alpha = 0.05$ (which means that the confidence level is 95%)

Testing the hypothesis on a mean (unknown variance)
• The test statistic is built as follows:
  $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
  where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean under H0 and $s_{\bar{x}}$ is the standard error of the mean (estimated through the sample)
• Under H0 the standardized test statistic follows (for large samples) the Standard Normal distribution

Critical values
The test statistic is the sample mean, which needs to be standardized:
  $z = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{1.75 - 1.15}{0.28} = 2.14$
• This value needs to be compared with the critical values which separate the acceptance and the rejection regions
• This is a two-tailed test, since the alternative hypothesis is formulated in both directions: $\mu \neq 1.15$ holds for either $\mu < 1.15$ or $\mu > 1.15$.
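The standardization above can be reproduced in a few lines of Python. The sketch is illustrative only: the function name `one_mean_z_test` is hypothetical, it assumes the normal approximation the slides adopt for samples above 50 observations, and it uses the rounded inputs quoted in the slides (1.75, 1.15 and 0.28), so its output need not match figures a statistics package computes on the raw data.

```python
from statistics import NormalDist

def one_mean_z_test(sample_mean, mu0, std_error, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0, using the normal approximation."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / std_error
    # Two-tailed p-value: probability of a statistic at least as extreme as |z|
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value z_{alpha/2}, about 1.96 when alpha = 0.05
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return z, p_value, abs(z) > critical

z, p_value, reject = one_mean_z_test(1.75, 1.15, 0.28)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

Returning both the p-value and the comparison with the critical value shows that the two decision rules (p-value below the significance level, or statistic beyond the critical value) lead to the same conclusion.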
• Critical values are obtained as for confidence intervals:
  – $-z_{\alpha/2}$ defines the left rejection region (negative values, with probability below 2.5%)
  – $+z_{\alpha/2}$ defines the right rejection region (positive values, with probability below 2.5%)
• For a 5% significance level (or a 95% confidence level): $-z_{0.025} = -1.96$ and $+z_{0.025} = +1.96$

Acceptance and rejection areas
[Figure: acceptance and rejection areas under the normal curve]
• The test statistic lies in the rejection area

Hypothesis testing in SPSS
[Screenshot of the SPSS dialog, with the field where the test value is entered]

SPSS output
One-Sample Test (Test Value = 1.15)
Variable: In a typical week how much fresh or frozen chicken do you buy for your household consumption (Kg.)?
  t                                            2.183
  df                                           91
  Sig. (2-tailed)                              .032   (p value)
  Mean Difference                              .60152
  95% Confidence Interval of the Difference    Lower .0541    Upper 1.1490
• The null hypothesis is rejected (as the p-value is smaller than 0.05)
• We reject the null hypothesis that the average weekly consumption is Kg.
1.15

One sided hypothesis (one tailed test)
• We want to test whether average consumer evaluation of animal welfare is larger than 4.9 (on a 7-point scale)
• It is convenient to formulate the hypothesis as follows:
  H0: $\mu \leq 4.9$
  H1: $\mu > 4.9$
• This is a one-tailed test, as the alternative hypothesis is expressed directionally: the rejection area lies on the right of the critical value (all values on the left are consistent with the null hypothesis)
• Sample mean: 5.01
• Standard deviation: 1.65
• Standard error of the mean: 0.074
  $z = \frac{5.01 - 4.90}{0.074} = 1.49$
• Instead of two critical values with $\alpha/2 = 0.025$ as in the two-tailed test, we only require a single critical value for $\alpha = 0.05$ (which corresponds to the $\alpha = 0.10$ two-tailed critical values), which is 1.64

One-tailed test
[Figure: acceptance and rejection areas for the one-tailed test]
• The test statistic lies in the acceptance area

One mean one-tailed test in SPSS
• Exactly like two-tailed tests, but when interpreting the output:
  – One should only consider values larger than the positive critical value for rejection (or smaller than the negative critical value if the null hypothesis is formulated with $\geq$)
  – To get the correct critical value one should consider $\alpha$ instead of $\alpha/2$ (for example, $z_{0.05}$ rather than $z_{0.025}$)
  – Thus, the critical value for a 5% significance level in a one-tailed test corresponds to the critical value for a 10% two-tailed test. For example $z_{0.05} = 1.64$
  – Similarly, when looking at the two-tailed test p-value in SPSS, one should consider a "double" threshold, i.e. reject the null at the 95% confidence level when p<0.10 and the test statistic has the sign indicated by the alternative hypothesis (since SPSS always assumes two-tailed tests)

SPSS output and one-sided test
One-Sample Test (Test Value = 4.9)
Variable: Animal welfare
  t                                            1.545
  df                                           496
  Sig. (2-tailed)                              .123
  Mean Difference                              .114
  95% Confidence Interval of the Difference    Lower -.03    Upper .26
• The null hypothesis is not rejected at the 5% significance level because Sig>0.10
• Note that, differently from the two-tailed test, here the threshold is twice the chosen significance level.

Test on two means
• Tests of equality on two means follow directly from the single-mean test
• The difference of two normally distributed sample means is still normally distributed, provided that the samples are unrelated or paired
• The mean of the difference distribution is zero under the null hypothesis of mean equality

Unrelated samples
• The sampled units belong to different populations and are randomly extracted from each of the populations. The key condition is that the sampled units are randomly assigned to the two groups
• This excludes the case where:
  a) a single sample is drawn
  b) the units are subdivided into two groups according to some variable (gender, living in urban versus rural areas, etc.)
  c) the sampling process might have some selection bias creating dependency between the two groups
• Most social studies consider the samples to be unrelated if the units are randomly extracted and the groups are mutually exclusive.

Related and paired samples
• In related samples the same subjects may belong to both groups.
• For example, if the same individual is interviewed in two waves, the two samples are said to be related.
• Two sub-sets from the same sample are generally related (except under stratified sampling)
• Paired samples are a special case where exactly the same units appear in both samples
• In this case it is possible to compute the difference for each of the sampled units and the result corresponds to a single sample
• Matched samples are artificially paired samples, where the two samples are matched according to some characteristics.

Example (Trust data-set)
• 71% of respondents are female
• The targeted population is people in charge of food purchases; males and females in the Trust data-set are associated by the fact that they are responsible for food purchases
• This excludes those females and males who are not; any gender comparison is conditional on this external factor
• We could not be conclusive in testing, for example, whether males like chicken more than females

Test for unrelated samples
• Example (Trust data-set)
• Italian versus UK respondents (extracted independently)
• Does the attitude towards chicken (question q9, "In my household we like chicken") differ between the two countries?
  H0: $\mu_{UK} = \mu_{ITA}$ vs. H1: $\mu_{UK} \neq \mu_{ITA}$

Unrelated samples
• The two means are normally distributed
• Under the null hypothesis their difference is also normally distributed
• Consider the difference variable $D = \mu_{UK} - \mu_{ITA}$
• The test becomes identical to the one mean test for D = 0
• However, a measure of the standard error for D is necessary

Standard error for mean comparison testing
• In the (unlikely) event that the true standard errors are known the joint standard error is:
  $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• Everything proceeds as for the one mean test, thus the test statistic is:
  $t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$

Test statistics (unknown standard errors)
• With unknown but equal standard errors, the test statistic is
  $t = \frac{\bar{x}_1 - \bar{x}_2}{s_x \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
• Given that the standard error is estimated, additional uncertainty requires the use of the t distribution with $n_1 + n_2 - 2$ degrees of freedom
• With different standard errors the test statistic is
  $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_{x_1}^2}{n_1} + \frac{s_{x_2}^2}{n_2}}}$
• This statistic can only be applied to large samples and the standard normal distribution is the reference

How to decide whether the standard errors are equal or different?
• There are appropriate hypothesis tests for the equality of two variances (discussed later)
• SPSS shows the p-value for Levene's test (Brown and Forsythe, 1974)
• SPSS provides the outcomes of both tests, with and without assuming equality of standard errors.
• In most cases these two tests provide consistent outcomes.

Mean comparison with unrelated samples
• The mean for the 100 UK respondents is 6.12 (standard error 0.15)
• The mean for the 100 Italian respondents is 5.62 (standard error 0.11)
• Is a mean difference of 0.5 significantly different from 0?
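The question above can be explored numerically. The following Python sketch is an illustration under stated assumptions, not the SPSS routine: it treats 0.15 and 0.11 as the standard errors of the two sample means, combines them as the square root of the sum of their squares (the large-sample statistic for different standard errors) and refers to the standard normal distribution. The function name `two_mean_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_test(mean1, se1, mean2, se2):
    """Large-sample z test of H0: mu1 = mu2 for two unrelated samples."""
    # Each squared standard error of a mean equals s_i^2 / n_i
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    z = (mean1 - mean2) / se_diff
    # Two-tailed p-value from the standard normal distribution
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return se_diff, z, p_value

# UK: mean 6.12, s.e. 0.15; Italy: mean 5.62, s.e. 0.11
se_diff, z, p_value = two_mean_z_test(6.12, 0.15, 5.62, 0.11)
print(f"std. error of the difference = {se_diff:.3f}")
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

Because the inputs are rounded to two decimals, the statistic can differ slightly from the value a package computes on the raw responses.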
• Are the standard errors equal?

SPSS example
[Screenshot of the SPSS dialog: the target variable is entered in one field and the sub-groups are defined through a grouping variable]

SPSS output
Independent Samples Test: In my household we like chicken

Levene's Test for Equality of Variances
  F       Sig.
  .243    .622

t-test for Equality of Means
                                t        df         Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% C.I. of the Difference (Lower, Upper)
Equal variances assumed         2.682    198        .008              .500              .186                    .132, .868
Equal variances not assumed     2.682    183.426    .008              .500              .186                    .132, .868

• The null hypothesis of equality of variance (Levene's test) is not rejected at the 5% s.l. (as the p-value is larger than 0.05), thus the two standard errors could be regarded as equal
• At any rate, the null hypothesis of mean equality is rejected in both cases (as the p-value is smaller than 0.05)
• CONCLUSION: There is a statistically significant difference between the UK and Italy in terms of attitudes towards chicken (as measured by q9)

Paired samples
• As a second case study, consider the situation where two measures are taken on the same respondents.
• For example, all respondents were asked a second question on their general attitude towards chicken, "A good diet should always include chicken" (q10)
• Do the two questions measure the same item (general attitude towards chicken)?
• Can we assume that the results are, on average, equal?
• In this case the samples are paired and it is possible to compute a difference between the responses to q9 and q10 for each of the sampled households.
• Everything goes back to the one mean test discussed earlier

Paired samples
• One may compute a third variable for each of the respondents as the difference between q9 and q10, then compute the mean and the standard error for this variable
• SPSS does this automatically
  – mean of 5.73 for q9 versus a mean of 5.50 for q10
  – average difference 0.23
  – estimated standard error 0.06
  – the t test statistic is 3.82, largely above the two-tailed 99% critical value
• The null hypothesis of mean equality should be rejected.
• It is not safe to assume that the two questions are measuring the same construct

Paired samples
[Screenshot of the SPSS dialog]
• Select two variables from the same data-set

Output
Paired Samples Test
Pair 1: In my household we like chicken - A good diet should include chicken
  Paired Differences
    Mean                                         .231
    Std. Deviation                               1.348
    Std. Error Mean                              .060
    95% Confidence Interval of the Difference    Lower .112    Upper .350
  t                                              3.824
  df                                             497
  Sig. (2-tailed)                                .000
• Note how mean comparison tests can be used in the post-editing phase to check consistency (see lecture 4):
  1) Try again with q9 and q12e; the two questions are targeted to measure the same construct with slightly different wording
  2) If the null hypothesis is not rejected, one can compute the difference between q9 and q12 and look for outliers
  3) Cases with outliers show an inconsistency

Other types of tests
• Independent samples
  – Standard t-test (as seen)
• Paired samples
  – It becomes a one-sample t-test
• Related samples
  – Non-parametric tests

Related samples tests and non-parametric statistics
• Do not require knowledge of the underlying distributions (see Gibbons, 1993 for a good introduction to nonparametric testing).
• Non-parametric tests are also used in situations where the variables to be tested are qualitative

Non-parametric testing (NPT)
• Parametric methods assume knowledge of the distribution (usually normal) and its parameters
• Problems: different distributions (income), small sample sizes, etc.
• NPT
  – No prior assumption on the distribution (and its parameters)

Some non-parametric tests
One group
• Runs (Wald-Wolfowitz)
• Kolmogorov-Smirnov
• Chi-square test
• Binomial test
• Wilcoxon signed-rank
Two groups
• Runs (Wald-Wolfowitz) on two groups
• Mann-Whitney test (Wilcoxon rank-sum test)
• Kolmogorov-Smirnov on two groups
• Wilcoxon paired sample test (on two paired samples)

Runs test
• On one group
  – Check whether the order in the sample is random
  – Check whether observations below and above a cut-off point (e.g.
median) follow a random order
• On two groups
  – Check whether observations from two groups follow a random order (after being sorted according to some variable)

One-group tests

Runs test with a single group
• NULL HYPOTHESIS: the cases in a sample are ordered in a random fashion
• ALTERNATIVE HYPOTHESIS: the order is not random
• CRITERION:
  – A dichotomous variable (e.g. gender) OR
  – A cut-off point for a metric variable (e.g. income) which generates a dichotomous variable
• SEQUENCE:
  – 111111111122222222221111111111 unlikely to be random
  – 112122112122121211212212111212 more likely to be random
  – Concentrations and sequences of ones and twos highlight non-randomness
• For large samples, it is possible to test for randomness in the sequence using the normal distribution.

Kolmogorov-Smirnov
• The null hypothesis is that a random sample is drawn from a given (user-specified) distribution
• SPSS allows one to test whether a variable is distributed according to a normal distribution, a Poisson curve, a uniform or an exponential distribution
• Once the distribution and its parameters are known, it becomes possible to estimate the population parameters

Chi-square test
• Compare the frequency of the observed distribution with the expected frequency from a theoretical distribution (e.g.
the normal curve)
• The more different the frequencies are, the less likely it is that the empirical observations come from the theoretical distribution
• The Chi-square statistic summarizes the distance between the observed and expected frequencies and is compared to critical values from the Chi-square distribution

Binomial test
• Applied to dichotomous variables, e.g. gender (males and females)
• NULL HYPOTHESIS: the proportion of males and females in the sample is attributable to sampling error only
• E.g. if we get 71% females and 29% males in the Trust data-set, could this depend on sampling error only?
• Assuming a 50%-50% true distribution between males and females, a 71-29 outcome leads to rejection of the null hypothesis

Wilcoxon signed rank test
• Non-parametric counterpart of the one-mean t-test
• NULL HYPOTHESIS: the median (or the difference between two medians) is equal to some specified value, provided that the distribution is symmetric
• CRITERION: based on ranking, which consists of assigning increasing discrete values (one, two, etc.) to the cases once they have been sorted in ascending order according to the variable to be tested
• The observations which differ from the hypothesized median are ranked according to their distance from the median
• Ranks above and below the assumed median are summed to build the test statistic

Two groups tests

Runs test with two groups
1. NULL HYPOTHESIS: the cases in a dataset are extracted from two independent samples from the same population
2. ALTERNATIVE HYPOTHESIS: the samples are not independent OR they do not belong to the same population
3. It corresponds to a mean comparison test, but it tests whether they belong to the same population
4. CRITERION:
   1) Merge the samples into a single data-set and
   2) Order the cases according to a metric variable and
   3) Proceed as for one group, using a dichotomous variable or a cut-off point
5. SEQUENCE: (as before)
6. 111111111122222222221111111111 unlikely to be random
7. 112122112122121211212212111212 more likely to be random

Mann-Whitney test
• Also known as the Wilcoxon rank-sum test
• ASSUMPTION: the two samples are random and come from populations that follow the same distribution apart from a translation k
• NULL HYPOTHESIS: k=0 (which means that the samples are extracted from the same population)
• The null here is stronger than the null of mean equality

Example (Trust) (1)
• Trust in the European food safety authority for French (F) and German (G) respondents (measured on a one to seven scale)
• Trust of Germans = Trust of French + k
• NULL HYPOTHESIS H0: k = 0

Trust (2)
1. All cases are ranked by trust level independently of the country and a rank number is assigned
2. Ranks for each group (F and G) are summed
3. The sums are compared (allowing for different group sizes) through the U statistic
4. The U statistic is based on the frequency with which the first sample has a higher rank than the second sample
5. If the two samples come from the same distribution the frequency should be random (similarly to the Runs test)

Other tests
Kolmogorov-Smirnov for two groups
• Two distributions are compared, instead of simply comparing the theoretical distribution with the empirical one
Wilcoxon paired sample test
• Wilcoxon signed rank test on the median difference between the two samples, provided that the differences are symmetrically distributed

Non-parametric tests in SPSS
[Screenshot]

Other tests
• Test on proportions
  – Proceed as for means
  – For large samples they correspond to tests on means (use the normal distribution)
• Test on variances
  – F-test for the equality of variances
  – Levene's F-test

Test on variances: F distribution
Under the null hypothesis of variance equality, if the two samples are extracted from normal distributions or the sample sizes are large enough, the ratio between two variances follows the F distribution.
[Figure: density curve of the F distribution]
The F-distribution is characterized by two values for degrees of freedom:
• Number of obs of the first variance - 1
• Number of obs of the second variance - 1
Notation: $F(n_1 - 1; n_2 - 1)$

F-test
• Note that the F distribution only includes positive values; it is non-symmetrical and for two-tailed tests the critical values are
defined in a slightly different manner when compared to the t and normal distributions
• For a given significance level $\alpha$, the critical value for the right rejection region (the first variance is larger than the second) is denoted by $F_{\alpha/2}$ as usual, since we want to exclude F values larger than $F_{\alpha/2}$ because their probability is below $\alpha/2$.
• Instead, the critical value for the left rejection area is denoted as $F_{1-\alpha/2}$, since we set the probability that F is larger than the critical value to $1-\alpha/2$, which means that the probability that F is smaller than the critical value is actually $\alpha/2$, as desired.
• This difference is also relevant to one-tailed tests where the alternative hypothesis is $\sigma_1 < \sigma_2$, where the critical value will be $F_{1-\alpha}$

Example (Trust data)
• Is the variance in chicken consumption for males equal to the variance for females?
  H0: $\sigma_M = \sigma_F$ vs H1: $\sigma_M \neq \sigma_F$
• Variance males: 0.87
• Variance females: 2.60
• F-statistic: 0.87/2.60=0.33
• Critical values: $F_{0.975}(129,313) = 0.74$ and $F_{0.025}(129,313) = 1.33$
• The null hypothesis is rejected, since 0.33 is smaller than the left critical value 0.74

Other tests on variances
• The standard F test is valid under the assumption of normal distribution of the populations
• Generalization for non-normal populations (SPSS and SAS):
  – Levene's test for homogeneity of variances (slightly different F statistic).
  – Levene's test can be biased by the presence of outliers
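
The decision rule for the two-tailed F-test in the Trust example can be written out as a short Python sketch. It is only an illustration: the function name `f_test_decision` is hypothetical, and the critical values 0.74 and 1.33 are taken from the example rather than computed, since the Python standard library does not provide the F distribution.

```python
def f_test_decision(var1, var2, lower_critical, upper_critical):
    """Two-tailed F-test decision for H0: equal variances.

    lower_critical is the left critical value and upper_critical the right
    one, both for (n1 - 1, n2 - 1) degrees of freedom.
    """
    f_stat = var1 / var2
    # Reject H0 when the ratio falls in either tail of the F distribution
    reject = f_stat < lower_critical or f_stat > upper_critical
    return f_stat, reject

# Values from the Trust example: variance 0.87 for males, 2.60 for females
f_stat, reject = f_test_decision(0.87, 2.60, lower_critical=0.74, upper_critical=1.33)
print(f"F = {f_stat:.2f}, reject H0: {reject}")
```

Keeping the two critical values as explicit arguments reflects the asymmetry of the F distribution: unlike the z and t cases, the left critical value is not simply the negative of the right one.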