Statistical Significance Testing

The concept of statistical significance is that some variation in the results of research findings is large enough that it cannot be explained simply by chance. If a survey were given to 100 people and then given to a completely different group of 100 people, the results of the two surveys are going to be somewhat different. The question is, does this difference arise because of sampling error in choosing the participants for one of the surveys, or is there a significant difference between the two groups?
Statistical significance testing is a method of determining if marketing research findings are
significant or incidental. There are several different types of significance tests. They include:
1. Chi-square tests.
2. z-tests.
3. t-tests.
4. F-tests.
Each of these methods of significance testing is described in this tutorial, along with when to apply the test and how to interpret its results.
Chi-Square Test
The chi-square statistical test studies the relationship between two categorical variables. As explained in the Association Cross Tabulation tutorial, the association between two categorical variables is examined by creating a table of all the possible combinations of responses to the two different variables. This table can be created in SPSS by a process called crosstabs. A chi-square test enables you to determine whether an observed pattern of frequencies in a crosstabs table corresponds to or fits an "expected" pattern.
The simplest way to conduct a chi-square test is to use SPSS. A chi-square test using SPSS and
a chi-square distribution table can be done by following these steps:
1. Enter the data into SPSS and perform a crosstabs analysis as explained in the SPSS Tool
Kit tutorial.
2. Look at the table in the SPSS viewer labeled "Chi-Square Tests." There is a row in this table labeled "Pearson Chi-Square." The two numbers in this row that you need to pay attention to are in the columns labeled "Value" and "df." The "df" stands for degrees of freedom.
3. Look up the chi-square test value in a chi-square test table. This table can be found in the
appendix in the back of the textbook. You will notice in the test table there is a column
labeled “Degrees of freedom” and one or more columns labeled with different levels of
significance. For this class we will always use a 0.10 or 0.05 level of significance. These
are the significance levels most commonly used in practice. The chi-square test table
corresponding to these levels of significance is duplicated at the end of this tutorial.
Look up the number in the chi-square test table that corresponds to the “df” number given
in the SPSS crosstabs analysis and the desired level of significance.
4. Compare the “Value” number given in the SPSS crosstabs analysis to the number looked
up in the table. If the “Value” number given in the SPSS crosstabs analysis is greater
than the number looked up in the table, then the results of the analysis are statistically
significant at the level chosen. For a level of significance of .10, this means you are 90%
confident the results are statistically significant.
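The table lookup in steps 3 and 4 can also be sketched in Python with scipy, whose `chi2.ppf` function returns the same critical values the printed chi-square table lists. The observed value below is taken from the worked example later in this tutorial; this is an illustrative alternative, not part of the SPSS procedure itself.

```python
# Looking up a chi-square critical value with scipy instead of the
# printed table, then comparing the observed statistic against it.
from scipy.stats import chi2

df = 2          # "df" from the SPSS "Chi-Square Tests" table
alpha = 0.10    # chosen level of significance
critical = chi2.ppf(1 - alpha, df)
print(round(critical, 3))    # 4.605, matching the table row for df = 2

observed = 5.125             # "Value" from the Pearson Chi-Square row
print(observed > critical)   # True -> significant at the 0.10 level
```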
For example, suppose a survey asked a question about the respondent’s gender and another
question about the respondent’s frequency of visits to a particular store. After collecting the data
and running a crosstabs analysis in SPSS, you have the following two tables:
frequency of visits * gender Crosstabulation (Count)

                         gender
frequency of visits    male   female   Total
  1-5                    14       26      40
  6-14                   16       34      50
  15 and above           15       11      26
Total                    45       71     116
Chi-Square Tests

                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             5.125a    2    .077
Likelihood Ratio               5.024     2    .081
Linear-by-Linear Association   2.685     1    .101
N of Valid Cases               116

a. 0 cells (.0%) have expected count less than 5. The minimum expected
   count is 10.09.
The crosstabs analysis yields two degrees of freedom. Looking up the chi-square statistic for
two degrees of freedom and for a 0.10 level of significance gives you the value 4.605. Since the
Pearson chi-square value of 5.125 is bigger than the value from the table, you would conclude
with 90% confidence that there is a statistically significant difference in frequency of visits to the
store between males and females. Alternatively, you can determine the significance level by looking at the value in the "Pearson Chi-Square" row under the column labeled "Asymp. Sig. (2-sided)". From the example, this value is .077. This means we are 92.3% confident (1.0 - .077 = 0.923, or 92.3%) that there is a statistically significant difference in frequency of visits to the store between males and females.
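The worked example can be reproduced in Python with scipy's chi-square test of independence, run on the same crosstab counts. This is a sketch of what SPSS computes, not SPSS itself.

```python
# Chi-square test of independence on the example crosstab counts.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[14, 26],    # 1-5 visits:   male, female
                     [16, 34],    # 6-14 visits
                     [15, 11]])   # 15 and above

stat, p, df, expected = chi2_contingency(observed)
print(round(stat, 3), df, round(p, 3))   # 5.125 2 0.077
```

The statistic, degrees of freedom, and significance value match the "Pearson Chi-Square" row of the SPSS output above.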
One other item to notice is the footnote immediately below the “Chi-Square Tests” output table.
This footnote tells you how many cells in the crosstabs table have an expected count less than 5.
If more than 20% of the cells have an expected count less than 5, or if any cell has an expected
count less than 1, then the results of the Chi-square test should not be used to test for statistical
significance. The reason is that cells with low expected counts throw off the calculation of the chi-square statistic. If too many cells have a low expected count, the calculated chi-square value is no longer accurate and should not be used to test for statistical significance.
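This rule of thumb can be sketched in Python using the expected counts scipy returns for the example crosstab: the chi-square result is trusted only if at most 20% of cells have an expected count below 5 and no cell has an expected count below 1.

```python
# Checking the expected-count rule of thumb for the example table.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[14, 26], [16, 34], [15, 11]])
_, _, _, expected = chi2_contingency(observed)

share_below_5 = np.mean(expected < 5)           # fraction of low-count cells
usable = share_below_5 <= 0.20 and expected.min() >= 1
print(round(expected.min(), 2), usable)         # 10.09 True
```

The minimum expected count of 10.09 matches the footnote in the SPSS output.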
z-Test
There are a couple of different types of z-tests you can conduct. Note that for any type of z-test
you need to have at least 30 data points for the test results to be valid. One type of z-test is to
test the statistical significance of a survey’s results when used to estimate the characteristics of
an entire population represented by the sample. Your marketing research textbook contains
instructions for conducting and evaluating a z-test for this purpose. However, this is generally a
less useful type of z-test and is therefore not explained any further in this tutorial.
A more useful type of z-test is one that can be conducted to test the statistical significance of the
difference in means between two sets of data. For example, suppose you asked a group of
respondents a question about how often they make purchases from a particular store. You want
to know if there is any difference between average frequency of purchases at the store for men
and women. The statistical significance of the difference in means can be checked using a z-test.
This can be done in Microsoft Excel by following these steps:
1. Enter a data set into one column. There must be at least 30 data points in this data set for
the z-test to be valid.
2. Enter a second data set into another column. There also must be at least 30 data points in
this data set for the z-test to be valid. However, the two data sets don’t have to have the
same number of data points.
3. Calculate the variance of the first data set. This is done by selecting a cell and using
Excel’s “var” function. Select all data points in the set when calculating the variance.
You can also calculate the standard deviation of the data set and then square the standard
deviation to get the variance.
4. Calculate the variance of the second data set using the same procedure used to calculate
variance of the first data set.
5. Click on the “Tools” menu at the top of the Excel screen.
6. Select “Data Analysis” from the drop down menu. If you don’t have the “Data Analysis”
option you need to perform the following steps (you shouldn’t have this problem using
one of the computers in the Tanner Building computer labs):
a. Click on the “Tools” menu at the top of the Excel screen.
b. Select “Add-Ins” from the drop down menu.
c. In the window that pops up, check the box next to “Analysis Toolpak”.
d. Click on “OK”. Depending on how Excel was originally installed on your
computer, you may need your original Excel installation CD to complete this
process.
7. Select “z-test: Two Sample for Means” from the list on the pop-up screen (it is the very
last choice).
8. Click “OK”.
9. A new box will pop up asking for the information to use for conducting the z-test. For
“Variable 1 Range” select the data set in the first column.
10. For “Variable 2 Range” select the data set in the second column.
11. For “Hypothesized Mean Difference” enter 0. This means you want to test the statistical
significance of there being any difference between the means of the two data sets.
12. For “Variable 1 Variance (known)” enter the variance calculated in step 3 for the first
data set.
13. For “Variable 2 Variance (known)” enter the variance calculated in step 4 for the second
data set. Note that the two variance values must be manually entered. You can’t just
select a cell containing the values.
14. If you included column labels in the variable ranges selected in steps 9 and 10, check the
“Labels” box. Otherwise leave this box blank.
15. For “Alpha” enter the level of significance you wish to use. In this class, and in most
instances in business, you will either use 0.10 or 0.05.
16. Under “Output options” either choose “Output Range” and select a cell or choose “New
Worksheet Ply” and type the name of a new worksheet. You can also select “New
Workbook” but you may find it more helpful to keep track of your results by placing the
output as close to the data as possible.
17. Click on “OK”
After following these steps, an output table will appear with the results of the z-test. The table
consists of three columns and several rows. The first column contains the labels for each of the
rows after the first three header rows. The following steps will help you interpret the output
results:
1. The first row of the table (after the table header) is labeled “Mean”. The 2nd and 3rd
columns contain the calculated mean value for the two sets of data.
2. The second row is labeled “Known Variance”. The 2nd and 3rd columns contain the
values you input for the variance of each set of data on steps 12 and 13 above.
3. The third row is labeled “Observations”. The 2nd and 3rd columns contain a count of the
number of data points included in each data set.
4. The fourth row is labeled “Hypothesized Mean Difference”. The 2nd column contains the
value you input on step 11 above.
5. The fifth row is labeled “z” and contains the z-score for the test you conducted.
6. The sixth and seventh rows are labeled "P(Z<=z) one-tail" and "z Critical one-tail". Ignore the information in these two rows. You are interested in the information for a two-tail test. You want to look at the two-tail test results because they test whether one mean is either higher or lower than the other mean. The one-tail test only checks one of these two possibilities.
7. The eighth row is labeled "P(Z<=z) two-tail". The 2nd column should contain a number
between 0 and 1. This number is the probability there is no statistically significant
difference between the two means. If you take one minus this number, it will give you
the statistical significance of the test. For example, if the value shown were 0.025 then
you would be 97.5% (1 - 0.025 = 0.975) certain that there is a statistically significant
difference in the means of the two data sets.
8. The last row is labeled "z Critical two-tail". The 2nd column contains the minimum z-score necessary for the test to be statistically significant at your chosen level. If this
number is less than the absolute value of the z-score indicated in the fifth row, then the
results of your test are statistically significant at the level you chose. For example, if you
chose a 0.05 level of significance, the last row should have a value of 1.96 (rounded). If
the z-score in the fifth row is 2.05, then you would be 95% confident that there is a
statistically significant difference in the means of the two data sets because z (2.05) is
greater than z critical (1.96).
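The computation behind Excel's "z-Test: Two Sample for Means" output can be sketched in Python under the same setup: a hypothesized mean difference of 0 and the sample variances from steps 3 and 4 entered as the "known" variances. The helper name `z_test_two_sample` is illustrative, not an Excel or scipy function.

```python
# A rough Python equivalent of Excel's "z-Test: Two Sample for Means".
import math
from statistics import mean, variance
from scipy.stats import norm

def z_test_two_sample(data1, data2, hypothesized_diff=0.0):
    """Return the z-score and the "P(Z<=z) two-tail" value."""
    z = (mean(data1) - mean(data2) - hypothesized_diff) / math.sqrt(
        variance(data1) / len(data1) + variance(data2) / len(data2))
    p_two_tail = 2 * norm.sf(abs(z))   # probability of no real difference
    return z, p_two_tail

# "z Critical two-tail" at the 0.05 level of significance
print(round(norm.ppf(1 - 0.05 / 2), 2))   # 1.96
```

As in the interpretation steps above, the result is significant at the chosen level when the absolute z-score exceeds the critical value, or equivalently when one minus the two-tail probability exceeds the chosen confidence level.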
t-Test
A t-test is similar to a z-test. The difference is that a t-test is used if the sample size is 30 or less.
A t-test can be used for the same purposes as a z-test (determine statistical significance of a
sample projected onto an entire population or determine statistical significance of a difference in
sample means). Methods for conducting a t-test are not described in this tutorial because you
will almost always have sample sizes bigger than 30 to analyze and therefore will use z-tests far
more often than t-tests.
Sometimes an SPSS analysis will provide results that include a t-statistic as well as a “Sig”
value. SPSS never provides a z-score as part of the results of an analysis. This is because when the number of degrees of freedom is large enough, a t-test and a z-test provide essentially the same analysis of statistical significance. Therefore, SPSS will not conduct z-tests; it will only conduct t-tests. It is typically easier to conduct a z-test in Excel than to conduct a t-test in SPSS.
However, when SPSS provides a t-statistic, it is not necessary to know the number of degrees of
freedom because the “Sig” value associated with the t-score provides the level of confidence.
For example, if SPSS provided a t-score with an associated “Sig” value of 0.046 then we would
be 95.4% confident (1 – .046 = .954 or 95.4%) in the results of the analysis and would say the
results are statistically significant at this level.
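Although this tutorial does not walk through conducting a t-test, the kind of small-sample comparison SPSS runs can be sketched in Python with scipy's independent two-sample t-test; the "Sig" value SPSS reports corresponds to the p-value returned here. The two data sets below are hypothetical small samples (fewer than 30 points each).

```python
# An independent two-sample t-test on small, illustrative samples.
from scipy.stats import ttest_ind

men   = [2, 4, 3, 5, 4, 6, 3, 4, 5, 2]   # purchases per month (illustrative)
women = [5, 6, 4, 7, 6, 5, 8, 6, 7, 5]

t_stat, p_value = ttest_ind(men, women)
confidence = 1 - p_value    # the tutorial's "1 - Sig" reading
print(p_value < 0.05)       # True -> significant at the 0.05 level
```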
F-Test
Another common test of statistical significance that you will come across when doing analysis
with SPSS is the F-test. For example, if you run a linear regression test in SPSS you get a table
in the SPSS viewer labeled, “ANOVA.” The last two columns of this table are labeled “F” and
Sig." If you know the degrees of freedom associated with this F-score you can compare the F-score to a table of F-values like the one in the back of the textbook. However, this is beyond the scope of this class. The important thing to remember is that with an F-score, like a z-score or a
t-score, the bigger the number, the higher the level of statistical significance. The “Sig” value
listed in the last column of the ANOVA table tells you the level of significance associated with
the F-score. For example, if the "Sig" value is .21, then you are 79% confident (1 - .21 = .79, or 79%) that the results of your analysis are statistically significant. A result at this level of confidence is generally considered statistically insignificant. A general rule of thumb is that an SPSS
“Sig” value must be 0.10 or less (90% or more confident) to be considered statistically
significant.
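Where the ANOVA table's "Sig" value comes from can be sketched in Python: it is the upper-tail probability of the F distribution at the reported F-score and its degrees of freedom. The F-score and degrees of freedom below are hypothetical, chosen to give a "Sig" value near the .21 example.

```python
# Computing an ANOVA-style "Sig" value from an F-score with scipy.
from scipy.stats import f

f_score = 1.62             # "F" column of a hypothetical ANOVA table
df_model, df_error = 1, 28
sig = f.sf(f_score, df_model, df_error)   # the "Sig" column value

# Rule of thumb from the text: significant only if "Sig" is 0.10 or less.
print(sig <= 0.10)   # False -> not statistically significant
```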
Chi-Square Test Table for 0.10 and 0.05 Level of Significance

Degrees of   Test Value at 0.10      Test Value at 0.05
Freedom      Level of Significance   Level of Significance
  1            2.70554                 3.84146
  2            4.60517                 5.99147
  3            6.25139                 7.81473
  4            7.77944                 9.48773
  5            9.23635                11.0705
  6           10.6446                 12.5916
  7           12.0170                 14.0671
  8           13.3616                 15.5073
  9           14.6837                 16.9190
 10           15.9871                 18.3070
 11           17.2750                 19.6751
 12           18.5494                 21.0261
 13           19.8119                 22.3621
 14           21.0642                 23.6848
 15           22.3072                 24.9958
 16           23.5418                 26.2962
 17           24.7690                 27.5871
 18           25.9894                 28.8693
 19           27.2036                 30.1435
 20           28.4120                 31.4104
 21           29.6151                 32.6705
 22           30.8133                 33.9244
 23           32.0069                 35.1725
 24           33.1963                 36.4151
 25           34.3816                 37.6525
 26           35.5631                 38.8852
 27           36.7412                 40.1133
 28           37.9159                 41.3372
 29           39.0875                 42.5569
 30           40.2560                 43.7729
 40           51.8050                 55.7585
 50           63.1671                 67.5048
 60           74.3970                 79.0819
 70           85.5271                 90.5312
 80           96.5782                101.879
 90          107.565                 113.145
100          118.498                 124.342
t-Test Table for 0.10 and 0.05 Level of Significance

Degrees of   Test Value at 0.10      Test Value at 0.05
Freedom      Level of Significance   Level of Significance
  1           3.078                   6.314
  2           1.886                   2.920
  3           1.638                   2.353
  4           1.533                   2.132
  5           1.476                   2.015
  6           1.440                   1.943
  7           1.415                   1.895
  8           1.397                   1.860
  9           1.383                   1.833
 10           1.372                   1.812
 11           1.363                   1.796
 12           1.356                   1.782
 13           1.350                   1.771
 14           1.345                   1.761
 15           1.341                   1.753
 16           1.337                   1.746
 17           1.333                   1.740
 18           1.330                   1.734
 19           1.328                   1.729
 20           1.325                   1.725
 21           1.323                   1.721
 22           1.321                   1.717
 23           1.319                   1.714
 24           1.318                   1.711
 25           1.316                   1.708
 26           1.315                   1.706
 27           1.314                   1.703
 28           1.313                   1.701
 29           1.311                   1.699
 30           1.310                   1.697
 40           1.303                   1.684
 60           1.296                   1.671
120           1.289                   1.658
 ∞            1.282                   1.645