Statistics MINITAB - Lab 15

1. CHI-SQUARED TESTS FOR UNIVARIATE CATEGORICAL DATA

Often experimental or survey work results in data that consist of counts of responses falling into a number of classifications or categories. Such data do not lend themselves to much of the analysis covered in earlier classes, because count data cannot be assumed to be normally distributed. Univariate categorical data are most conveniently summarised in a one-way frequency table.

EXAMPLE: A study was carried out to identify the causes of Adverse Drug Effects (ADEs). The table below summarises the causes of 95 ADEs that resulted from the wrong dose being prescribed or dispensed. Using α = 0.01, determine whether the true percentages of ADEs in the five categories are different.

Wrong Dosage Cause             Number of ADEs
Lack of Knowledge of Drug      29
Rule Violation                 17
Faulty Dose Checking           13
Slips                          9
Other                          27

Calculating the expected number of outcomes

Under the null hypothesis we assume that the causes of wrong dosage are equally divided among all of these categories. There were 95 ADEs and five categories, so we would expect 95/5 = 19 in each.

Wrong Dosage Cause             Observed   Expected
Lack of Knowledge of Drug      29         19
Rule Violation                 17         19
Faulty Dose Checking           13         19
Slips                          9          19
Other                          27         19

The χ² Test in MINITAB

Ho: $p_1 = \frac{19}{95},\ p_2 = \frac{19}{95},\ p_3 = \frac{19}{95},\ p_4 = \frac{19}{95},\ p_5 = \frac{19}{95}$
Ha: Ho is not true

TO CALCULATE A TEST STATISTIC AND P-VALUE

1. Enter the observed and expected values (in columns C1 and C2).
2. Name the columns by typing Observed and Expected in the name cells.
3. Calculate the test statistic

   $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

   (summed over the five categories) by first enabling commands and then typing commands at the MTB > prompt to build up the test statistic.
   Hints:
   (a) Store the (Observed - Expected) values in C3
   (b) Store the (Observed - Expected)² values in C4
   (c) Store the (Observed - Expected)² / Expected values in C5
   (d) Then sum the values in C5 and store the result under some name, e.g. k1
   (e) Print k1; this is the value of your test statistic
4.
Calculate the P-Value by:
   Choosing Calc > Probability Distributions > Chi-Square.
   Choose Cumulative probability.
   Enter the degrees of freedom for the test.
   Choose Input constant, and enter the name you gave the test statistic.
   In Optional storage, enter a name for the cumulative probability, e.g. k2.
5. Look this value up in the tables. How would you calculate the p-value from this figure?
________________________________________________________________________
6. The p-value is calculated as 1 minus the cumulative probability. Calculate this using an MTB command.

Using α = .05, summarise this analysis. You should state the Ho, Ha, α, the value of the test statistic, p and your conclusions.

2. NON-PARAMETRIC TESTS - 1 SAMPLE SIGN TEST

Often we are required to analyse small-sample data that cannot be considered to be normally distributed. In these cases a t or F test (ANOVA) cannot be performed. However, there is a class of tests that do not require the data to follow any particular probability distribution; these are called non-parametric tests. One such test, which is based on the binomial distribution, is called the Sign Test.

The Sign Test tests hypotheses concerning the median of a population, using a small sample. We know from the definition of a median that 50% of the distribution should lie below and 50% above the true median. Therefore, if we specify a median under the null hypothesis, we can analyse the sample as a binomial experiment in which each observation below the hypothesised median counts as a success, with p = 0.5 being the probability of a success. (We can equally define a success as an observation above the hypothesised median.)

Sign Test for a Population Median

Note: η (the Greek letter eta) will be the symbol used here for the population median.
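The binomial reasoning above can be cross-checked outside MINITAB. The following is a minimal Python sketch, not part of the lab procedure: the function name binomial_upper_tail and the counts used (8 successes out of 10 observations) are hypothetical, chosen only to illustrate how an upper-tail binomial probability with p = 0.5 is obtained as 1 minus a cumulative probability.

```python
from math import comb


def binomial_upper_tail(s, n, p=0.5):
    """Return P(X >= s) for X ~ Binomial(n, p), computed as 1 - P(X <= s - 1)."""
    # Cumulative probability of 0, 1, ..., s - 1 successes.
    cdf_below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(s))
    return 1.0 - cdf_below


# Hypothetical illustration: 8 of 10 observations fall below the hypothesised median.
p_value = binomial_upper_tail(8, 10)
print(round(p_value, 4))
```

For a two-tailed test the same quantity would be doubled, with s taken as the larger of the two counts.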
ONE-TAILED TEST
   Ho: η = η0
   Ha: η < η0 (or Ha: η > η0)
   Test statistic: S = number of sample measurements less than η0 [or S = number of measurements greater than η0]
   Observed significance level: p-value = P(x ≥ S)

TWO-TAILED TEST
   Ho: η = η0
   Ha: η ≠ η0
   Test statistic: S = larger of S1 and S2, where S1 is the number of measurements less than η0 and S2 is the number of measurements greater than η0
   Observed significance level: p-value = 2P(x ≥ S)

where x has a binomial distribution with parameters n and p = 0.5.

Since P(x ≥ S) = 1 - P(x < S) = 1 - P(x ≤ S - 1), the one-tailed P-value is computed as:

   $\text{P-value} = 1 - \sum_{i=0}^{S-1} \binom{n}{i} 0.5^{n}$

The quantity $\sum_{i=0}^{S-1} \binom{n}{i} 0.5^{n}$ may be computed in the MINITAB session window as follows:

   MTB > cdf S-1;            (where S-1 is replaced by the actual integer of interest)
   SUBC > binomial n 0.5.    (where n is replaced with the actual number of trials)

Rejection region: reject Ho if P-value ≤ α.

Assumption: The sample is selected randomly from a continuous probability distribution. [Note: No assumptions need to be made about the shape of the probability distribution.]

Note: when the sample size is 10 or more, the normal approximation to the binomial may be used; refer to the lecture notes and textbook for more information on these formulae.

EXAMPLE: The following data are the survival times, in months, of heart and lung transplant patients at a particular hospital. Test the hypothesis that the median survival time is 10 months against the alternative that it is less than 10 months.

Survival times (months): 8.4, 16.9, 15.8, 12.5, 10.3, 4.9, 12.9, 9.8, 23.7, 7.3

Go to Stat > Nonparametrics > 1-Sample Sign...
1. Select the test variable here
2. Enter the median under the Ho
3. Select the correct alternative hypothesis

Summarise your analysis and conclusion here.

Is there any evidence that the median survival time is greater than 20 months? Summarise your analysis and conclusions.

Assignment:

An experiment was carried out to investigate whether the different colours of M&M's are equally distributed in the large bags.
The observed values were 9, 15, 13, 17, 26, 4 for the red, green, orange, brown, yellow and blue M&M's respectively. Perform an appropriate hypothesis test on these results using α = 0.01. Report your answers here:

REVISION SUMMARY

After this lab you should be able to:
- Perform a chi-squared test for univariate categorical data (to do this you must be familiar with the test statistic for this test)
- Perform a sign test
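As a cross-check on the first revision point, the chi-squared statistic and p-value for the ADE example in Section 1 can be sketched in Python, outside MINITAB. This is a minimal sketch under stated assumptions, not the lab's MINITAB procedure: the function names are hypothetical, and the tail probability uses a closed form that holds only when the degrees of freedom are even (here 5 - 1 = 4).

```python
from math import exp, factorial


def chi_squared_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_upper_tail(x, df):
    """P(X > x) for a chi-squared variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch assumes a positive, even number of degrees of freedom")
    half = x / 2.0
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


# Observed counts from the ADE example; equal expected counts under Ho.
observed = [29, 17, 13, 9, 27]
expected = [sum(observed) / len(observed)] * len(observed)  # 19 in each category

statistic = chi_squared_statistic(observed, expected)
# p-value = 1 - cumulative probability, with df = number of categories - 1.
p_value = chi_squared_upper_tail(statistic, df=len(observed) - 1)
print(round(statistic, 4), round(p_value, 4))
```

The result can be compared with k1 and with 1 minus the cumulative probability obtained in MINITAB. A test with an odd number of degrees of freedom would need a different tail calculation.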