Σtatistics αt KΣU
Review of Non-Parametric Tests
STAT 3130

Scenario 1

Objective: You need to determine whether a single variable is the same as or different from a target value.

Example: A community wants to determine whether the average household recycles more than 5 pounds of plastic a week.

Data: (A single quantitative variable, Pounds, shown in two rows to save space)

    1.1   1.2   2.1   2.6   2.7   2.9   3.6   3.9   4.2   4.3   4.5   4.7   5.3
    5.6   5.8   6.5   6.7   6.7   7.8   7.8   14.2  25.9  29.5  34.8  43.8

Parametric or Non-Parametric? With 25 observations, you might be able to get away with a parametric test (a one-sample t-test) if the data were normal, which they are not. With fewer than 30 observations and a skewed distribution, you are better off with a non-parametric test.

Appropriate test: Sign Test

Hypothesis Statements: H0: η ≤ 5 and Ha: η > 5 (where η is the population median and η0 = 5 is the target value).

Process:
1. Count the number of values in the dataset greater than η0 (in this case, 5). This test statistic is referred to as S+.
2. Count the number of values in the dataset less than η0. This test statistic is referred to as S-. Values exactly equal to η0 are not counted.
3. If the null hypothesis is true, S+ and S- should be roughly equal, because each observation is equally likely to fall above or below η0 (a binomial distribution with p = 0.5). Reject the null only when S+ is so much larger than S- that the binomial p-value is small. S+ merely exceeding S- is not enough: here S+ = 13 and S- = 12, which gives a one-sided p-value of 0.5, so the null is not rejected.

SAS Code:

    proc univariate data=recycle mu0=5 loccount;
      var plastic;
    run;

In addition to the S+ and S- counts, you will get a p-value for the sign test computed from the binomial distribution. This p-value should be interpreted like any other p-value: if it is less than .05 (or .1), reject the null. The p-value SAS reports is two-sided; for a one-sided alternative in the observed direction, halve it.

Scenario 2

Objective: You need to determine whether observations from paired samples are different.

Example: Distance runners believe that moderate exposure to ozone increases lung capacity. To investigate this hypothesis, researchers exposed 12 rats to ozone at a rate commensurate with human exposure. Their lung capacity was measured before and after exposure.
Data: (Two quantitative variables representing one population: Pre/Post)

    Rat    Capacity PRE    Capacity POST
    1      8.7             9.4
    2      7.9             9.8
    3      8.3             9.9
    4      8.4             10.3
    5      9.2             8.9
    6      9.1             8.8
    7      8.2             9.8
    8      8.1             8.2
    9      8.9             9.4
    10     8.2             9.9
    11     8.9             12.2
    12     7.5             9.3

Parametric or Non-Parametric? With only 12 observations, this most certainly calls for a non-parametric test.

Appropriate test: Wilcoxon Signed Rank Test

Hypothesis Statements: H0: η = 0 and Ha: η ≠ 0 (where η is the median of the differences).

Process:
Step 1: Calculate the difference for each pair (for example, Post minus Pre).
Step 2: Subtract η0 from each difference. Here η0 = 0, so the differences are unchanged.
Step 3: Take the absolute value of all the deviations calculated in Step 2.
Step 4: Delete all of the 0 values and let n be the number of values which remain.
Step 5: Rank the absolute deviations from the smallest to the largest. Assign the average of the ranks in cases of ties.
Step 6: Let T+ be the total of ranks given to deviations that were originally positive and T- be the total of ranks given to deviations that were originally negative.

If testing a two-tailed hypothesis, select the smaller of the two Ts and compare it to the critical value (see attached table). If the selected T is less than the critical value, reject the null.
If testing a one-tailed hypothesis (>), test T- against the critical value.
If testing a one-tailed hypothesis (<), test T+ against the critical value.

SAS Code:

    proc univariate data=ozone mu0=0 loccount;
      var ozone;
    run;

Here the variable ozone is assumed to hold each rat's difference in lung capacity. PROC UNIVARIATE does not print T+ and T- directly. Its Tests for Location table reports the signed rank statistic S = T+ - n(n+1)/4 together with a p-value (exact when n is 20 or fewer, a t approximation otherwise). The binomial p-value on the Sign row of that table belongs to the sign test, not to this test. The signed rank p-value should be interpreted like any other p-value: if it is less than .05 (or .1), reject the null.

Scenario 3

Objective: You need to determine whether observations from two independent samples are different.

Example: A plumbing contractor was interested in making her operation more efficient by decreasing her drive time/mileage between service calls. She hired a dispatcher to make a logistics plan for organizing service calls.
She compared the number of miles driven by two plumbers in her company: one used the dispatcher and one did not.

Data: (Two quantitative variables representing two independent populations)

    Dispatcher Plumber    88.2   94.7   101.8  102.6  89.3   95.7   78.2   80.1
    Control Plumber       105.8  117.6  119.5  126.8  108.2  114.7  90.2   95.6

Parametric or Non-Parametric? With only 8 observations in each group, this most certainly calls for a non-parametric test.

Appropriate test: Wilcoxon Rank Sum Test (also known as the Mann-Whitney Test)

Hypothesis Statements:
H0: The distribution for the dispatcher plumber is not shifted from the distribution for the control plumber.
Ha: The distribution for the dispatcher plumber is shifted to the left of the control plumber.

Process:
Step 1: List the data values from both samples in a single list arranged from smallest to largest.
Step 2: In the next column, assign the numbers 1 to N (where N = n1 + n2). These are the ranks of the observations. As before, if there are ties, assign the average of the ranks the values would receive to each of the tied values.
Step 3: Let W denote the sum of the ranks for the observations from Population 1.

Note that if there is no difference between the two distributions (the null is true), the value of W will be close to its expected value, n1(N + 1)/2. When the two samples are the same size, this is half the sum of all the ranks. Determine whether the sum of the ranks for the first population is larger or smaller than this value.

In the present example, the sum of all ranks is 1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16 = 136, and the expected value of W is 8(17)/2 = 68, which is half of 136. A dispatcher rank sum below 68 points in the direction of Ha, but being below 68 is not enough to reject H0. W must fall far enough below 68 that the p-value is small (or that W is beyond the tabled critical value) before we conclude that the distribution is shifted.

SAS Code:

    proc sort data=plumbers;
      by group;
    run;

    proc boxplot data=plumbers;
      plot time*group;
    run;

    proc npar1way wilcoxon data=plumbers;
      class group;
      var time;
    run;

As before, you will get an approximate p-value, here based on a normal approximation to the distribution of W rather than on the binomial distribution. PROC NPAR1WAY prints both one-sided and two-sided versions; use the one-sided p-value for the Ha above. This p-value should be interpreted like any other p-value: if it is less than .05 (or .1), reject the null.
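
The Scenario 3 SAS code assumes a dataset named plumbers already exists. The following is a minimal sketch of one way to build it and to reproduce Steps 1 to 3 of the hand calculation. The variable names group and time come from the handout's code. The group labels (Dispatcher, Control), the output dataset ranked, and the variable rank_time are assumptions made for illustration.

```sas
/* Sketch only: the group labels, the dataset ranked, and the variable  */
/* rank_time are illustrative assumptions, not part of the handout.     */
data plumbers;
  length group $ 10;           /* long enough to hold "Dispatcher" */
  input group $ time @@;       /* @@ reads several pairs per line  */
  datalines;
Dispatcher 88.2 Dispatcher 94.7 Dispatcher 101.8 Dispatcher 102.6
Dispatcher 89.3 Dispatcher 95.7 Dispatcher 78.2 Dispatcher 80.1
Control 105.8 Control 117.6 Control 119.5 Control 126.8
Control 108.2 Control 114.7 Control 90.2 Control 95.6
;
run;

/* Steps 1 and 2: rank all 16 values together; ties get the average rank */
proc rank data=plumbers out=ranked ties=mean;
  var time;
  ranks rank_time;
run;

/* Step 3: sum the ranks within each group to obtain W */
proc means data=ranked n sum;
  class group;
  var rank_time;
run;
```

Working the same steps by hand, the dispatcher values receive ranks 1, 2, 3, 4, 6, 8, 9 and 10, so W = 43, and the control values receive the remaining ranks, which sum to 93 (43 + 93 = 136). W = 43 is well below the expected value of 68, and the PROC NPAR1WAY p-value indicates whether that gap is large enough to reject H0.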