Σtatistics αt KΣU Review of Non-Parametric Tests STAT 3130

Scenario 1
Objective: You need to determine if a single variable is the same or different from a
target value.
Example: A community wants to determine if the average household recycles more
than 5 pounds of plastic a week.
Data: (A single quantitative variable – I just put it into two columns to save space)
Pounds    Pounds
1.1       5.6
1.2       5.8
2.1       6.5
2.6       6.7
2.7       6.7
2.9       7.8
3.6       7.8
3.9       14.2
4.2       25.9
4.3       29.5
4.5       34.8
4.7       43.8
5.3
Parametric or Non Parametric? With only 25 observations, you might be able to
get away with a parametric test (a one-sample t-test) if the data were normal, which they
are not. With fewer than 30 observations and a skewed distribution, you are better off
with a non-parametric test.
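A quick way to see the skew is to compare the mean and the median. The following is a minimal Python sketch for checking this by hand; it is not part of the SAS workflow, and the variable names are only illustrative.

```python
from statistics import mean, median

# Weekly pounds of plastic recycled, copied from the Scenario 1 data.
pounds = [1.1, 1.2, 2.1, 2.6, 2.7, 2.9, 3.6, 3.9, 4.2, 4.3, 4.5, 4.7, 5.3,
          5.6, 5.8, 6.5, 6.7, 6.7, 7.8, 7.8, 14.2, 25.9, 29.5, 34.8, 43.8]

n = len(pounds)
sample_mean = mean(pounds)
sample_median = median(pounds)

# A mean far above the median is a quick sign of right skew.
print(n, round(sample_mean, 3), sample_median)  # 25 9.528 5.3
```

The mean (about 9.5) is pulled well above the median (5.3) by the few very large values, which is what a right-skewed distribution looks like.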
Appropriate test: Sign Test
Hypothesis Statements: H0: η ≤ 5 and Ha: η > 5 (where η is the population median).
Process:
1. Count the number of values in the dataset greater than η0 (in this case, 5).
This test statistic is referred to as S+.
2. Then, count the number of values in the dataset less than η0. This test statistic is
referred to as S-.
3. If S+ is sufficiently greater than S- (as judged by the binomial p-value, using
n = S+ + S- trials and a success probability of 0.5), then we reject the null hypothesis.
SAS Code:
Proc Univariate data=recycle mu0=5 loccount;
var plastic;
Run;
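In addition to the S+ and S- counts, you will get a p-value for the sign test computed
from the binomial distribution. SAS reports the two-sided p-value, so for a one-tailed
test halve it when the counts fall in the direction of the alternative. This p-value should
be interpreted like any other p-value: if it is less than .05 (or .1), reject the null.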
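To check the sign test by hand, the counting steps can be sketched in a few lines of Python. This is only an illustrative sketch (the names s_plus, s_minus and p_value are not from SAS), and it computes the one-sided binomial p-value for the alternative η > 5 directly.

```python
from math import comb

# Weekly pounds of plastic recycled, copied from the Scenario 1 data.
pounds = [1.1, 1.2, 2.1, 2.6, 2.7, 2.9, 3.6, 3.9, 4.2, 4.3, 4.5, 4.7, 5.3,
          5.6, 5.8, 6.5, 6.7, 6.7, 7.8, 7.8, 14.2, 25.9, 29.5, 34.8, 43.8]
eta0 = 5  # hypothesized median

s_plus = sum(1 for x in pounds if x > eta0)   # values above eta0
s_minus = sum(1 for x in pounds if x < eta0)  # values below eta0
n = s_plus + s_minus                          # values equal to eta0 are dropped

# One-sided p-value for Ha: eta > 5, i.e. P(X >= S+) with X ~ Binomial(n, 0.5).
p_value = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n

print(s_plus, s_minus, p_value)  # 13 12 0.5
```

Here S+ = 13 and S- = 12, so the counts are almost evenly split and the one-sided p-value is 0.5. The data give no evidence that the median household recycles more than 5 pounds a week.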
Scenario 2
Objective: You need to determine if observations from paired samples are different.
Example: Distance runners believe that moderate exposure to ozone increases lung
capacity. To investigate this hypothesis, researchers exposed 12 rats to ozone at a rate
commensurate with human exposure. Their lung capacity was measured before and
after exposure.
Data: (Two quantitative variables representing one population – Pre/Post)
RAT    Capacity – PRE    Capacity – POST
1      8.7               9.4
2      7.9               9.8
3      8.3               9.9
4      8.4               10.3
5      9.2               8.9
6      9.1               8.8
7      8.2               9.8
8      8.1               8.2
9      8.9               9.4
10     8.2               9.9
11     8.9               12.2
12     7.5               9.3
Parametric or Non Parametric? With only 12 observations, this most certainly calls
for a non-parametric test.
Appropriate test: Wilcoxon Signed Rank Test
Hypothesis Statements: H0: η = 0 and Ha: η ≠ 0 (where η is the median of the differences).
Process:
Step 1: Calculate the differences.
Step 2: Subtract η0 from each difference.
Step 3: Take the absolute value of each of the deviations calculated in Step 2.
Step 4: Delete all of the 0 values and let n be the number of values which remain.
Step 5: Rank the absolute deviations from the smallest to the largest. Assign the average
of the ranks in cases of ties.
Step 6: Let T+ be the total of the ranks given to deviations that were originally positive
and T- be the total of the ranks given to deviations that were originally negative.
If testing a two-tailed hypothesis, select the smaller of the two Ts and compare it to
the critical value (see attached table). If the selected T is less than the critical value,
reject the null. If testing a one-tailed (>) hypothesis, compare T- against the critical
value. If testing a one-tailed (<) hypothesis, compare T+ against the critical value.
SAS Code:
Proc Univariate data=ozone mu0=0 loccount;
var ozone; /* assumed to hold each rat's PRE/POST difference */
Run;
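In the Tests for Location table, the Signed Rank row gives the test statistic and its
p-value. SAS reports the statistic as S = T+ - n(n+1)/4 rather than listing T+ and T-
separately, and the binomial distribution is used only for the Sign row. This p-value
should be interpreted like any other p-value: if it is less than .05 (or .1), reject the null.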
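The ranking steps are easy to get wrong by hand when there are ties, so here is a minimal Python sketch of the signed rank calculation for the rat data. It makes two assumptions that are not stated in the handout: the differences are taken as POST minus PRE, and the PRE value for rat 8 is read as 8.1. The helper avg_rank is illustrative only.

```python
# Lung capacity before and after ozone exposure, from the Scenario 2 data.
# Assumption: the PRE value for rat 8 is 8.1.
pre = [8.7, 7.9, 8.3, 8.4, 9.2, 9.1, 8.2, 8.1, 8.9, 8.2, 8.9, 7.5]
post = [9.4, 9.8, 9.9, 10.3, 8.9, 8.8, 9.8, 8.2, 9.4, 9.9, 12.2, 9.3]
eta0 = 0  # hypothesized median difference

# Differences (assumed POST - PRE) minus eta0, rounded to one decimal so that
# floating-point noise does not break ties such as 1.6 and 1.6.
devs = [round(b - a - eta0, 1) for a, b in zip(pre, post)]

# Drop zero deviations; n is the number of values that remain.
devs = [d for d in devs if d != 0]
n = len(devs)

# Rank the absolute deviations, giving tied values the average of their ranks.
abs_sorted = sorted(abs(d) for d in devs)

def avg_rank(value):
    first = abs_sorted.index(value) + 1  # 1-based rank of the first tied value
    count = abs_sorted.count(value)      # how many values share this rank
    return first + (count - 1) / 2

t_plus = sum(avg_rank(abs(d)) for d in devs if d > 0)
t_minus = sum(avg_rank(abs(d)) for d in devs if d < 0)

print(n, t_plus, t_minus)  # 12 73.0 5.0
```

Only rats 5 and 6 show a decrease, and they share ranks 2 and 3, so T- = 2.5 + 2.5 = 5 and T+ = 73. As a check, T+ + T- should equal n(n+1)/2 = 78. For the two-tailed test, the smaller value (T- = 5) is the one to compare with the critical value for n = 12 in the table.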
Scenario 3
Objective: You need to determine if observations from two independent samples are
different.
Example: A plumbing contractor was interested in making her operation more
efficient by decreasing her drive time/mileage between service calls. She hired a
dispatcher to make a logistics plan for organizing service calls. She compared the number
of miles driven by two plumbers in her company: one used the dispatcher and one
did not.
Data: (Two quantitative variables representing two independent populations)
Dispatcher Plumber    Control Plumber
88.2                  105.8
94.7                  117.6
101.8                 119.5
102.6                 126.8
89.3                  108.2
95.7                  114.7
78.2                  90.2
80.1                  95.6
Parametric or Non Parametric? With only 8 observations in each group, this
most certainly calls for a non-parametric test.
Appropriate test: Wilcoxon Rank Sum Test or Mann Whitney Test
Hypothesis Statements: H0: The distribution for the dispatcher plumber is not
shifted from the distribution for the control plumber. Ha: The distribution for the
dispatcher plumber is shifted to the left of the control plumber.
Process:
Step 1: List the data values from both samples in a single list arranged from smallest to
largest.
Step 2: In the next column, assign the numbers 1 to N (where N = n1+n2). These are the
ranks of the observations. As before, if there are ties, assign the average of the ranks the
values would receive to each of the tied values.
Step 3: Let W denote the sum of the ranks for the observations from Population 1.
Note that if there is no difference between the two distributions (the null is true), the
value of W will be around n1(N + 1)/2, which is half the sum of all ranks when the two
samples are the same size.
Determine if the sum of the ranks for the first population is larger or smaller than this
expected value. In the present example, the sum of all ranks is
1+2+3+4+5+6+7+8+9+10+11+12+13+14+15+16 = 136. Half would be 68. If the sum of
the dispatcher ranks is far enough below 68 (as judged by the p-value or a critical value),
we would reject H0 and conclude that the distribution is shifted to the left.
SAS Code:
Proc sort data = plumbers;
By group;
Run;
Proc boxplot data=plumbers;
Plot time*group;
Run;
proc npar1way wilcoxon data=plumbers;
class group;
var time;
run;
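As before, you will get approximate p-values. For the Wilcoxon rank sum test these come
from normal and t approximations to the distribution of the rank sum, not from the
binomial distribution. The p-value should be interpreted like any other p-value: if it is
less than .05 (or .1), reject the null.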
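The rank sum for the plumber data can also be worked out directly. The sketch below is a Python hand-check, not the method SAS uses: instead of an approximation, it gets an exact one-sided p-value by listing every way of choosing 8 ranks out of 16, which is only practical for small samples like these. The variable names are illustrative.

```python
from itertools import combinations

# Miles driven, copied from the Scenario 3 data.
dispatcher = [88.2, 94.7, 101.8, 102.6, 89.3, 95.7, 78.2, 80.1]
control = [105.8, 117.6, 119.5, 126.8, 108.2, 114.7, 90.2, 95.6]

# Pool both samples and sort. These data have no ties, so each value's rank
# is simply its position in the sorted list.
combined = sorted(dispatcher + control)
assert len(set(combined)) == len(combined), "ties would need average ranks"
rank = {value: i + 1 for i, value in enumerate(combined)}

n1, N = len(dispatcher), len(combined)
w = sum(rank[value] for value in dispatcher)  # rank sum for the dispatcher plumber
expected_w = n1 * (N + 1) / 2                 # value of W expected under H0

# Exact one-sided p-value for a left shift: the share of all possible sets of
# n1 ranks whose sum is as small as, or smaller than, the observed W.
all_sums = [sum(c) for c in combinations(range(1, N + 1), n1)]
p_value = sum(1 for s in all_sums if s <= w) / len(all_sums)

print(w, expected_w, round(p_value, 4))  # 43 68.0 0.0035
```

The dispatcher plumber's rank sum is W = 43, well below the 68 expected under H0, and only 45 of the 12,870 possible rank assignments give a sum that small (p of about 0.0035). That supports the conclusion that the dispatcher plumber's distribution is shifted to the left.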