Mann-Whitney and Hodges-Lehmann in R

STT 430/530
R#4
Fall, 2007
#let's first do example 2.6.1 on page 43
#calculate the Wilcoxon rank-sum statistic to compare the distributions
#of battery lifetimes (in hours) of two brands of laptops
brand1=c(3.6,3.9,4.0,4.3)
brand2=c(3.8,4.1,4.5,4.8)
bothbrands=c(brand1,brand2) #combine to rank
W=sum(rank(bothbrands)[1:4]) ; W #so W=14
#compare this to the critical values in Table A.3, p.340
#note they are 11 (lower) and 25 (upper) and that 14 falls
#in between, so at the 10% level we would not reject the
#null hypothesis of no difference in distributions of battery lifetimes
#RECALL the table gives 5% in each tail. Thus our p-value in this
#problem is >.10
#
#
#now check the results with the built-in function wilcox.test
wilcox.test(brand1,brand2)
#note that W is given as 4 and p=.3429 (two-sided alternative is the default)
#
#So what's going on here? wilcox.test actually reports an equivalent
#statistic, the Mann-Whitney U statistic, where U = W - n(n+1)/2 and
#n = the size of the group whose ranks were summed to get W.
#Here U = 14 - 4(5)/2 = 4.
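#
#A quick check of this relationship in R (a sketch, not from the text; the
#names n1, W.check and U.check are made up here for illustration)
brand1=c(3.6,3.9,4.0,4.3)   #data repeated so this chunk runs on its own
brand2=c(3.8,4.1,4.5,4.8)
n1=length(brand1)
W.check=sum(rank(c(brand1,brand2))[1:n1])   #rank sum for brand1
U.check=W.check-n1*(n1+1)/2 ; U.check       #14 - 4*5/2 = 4
#this should match the statistic that wilcox.test reports
unname(wilcox.test(brand1,brand2)$statistic)
#with no ties, the exact two-sided p-value is twice the lower-tail probability
2*pwilcox(U.check,n1,length(brand2))        #2*(12/70) = .3429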
#
#define U=# of pairs (Xi,Yj) s.t. Xi < Yj
#first compute all the pairwise differences (pwds) between the X's & Y's
#Then see how many are negative to get the value of U
#create a matrix to contain all the pwds and fill it with 0's
k=matrix(0, nrow=4, ncol=4) ; k
#now compute the pairwise differences and store them in k
for (i in 1:4) { k[i,]=brand1[i]-brand2 }
#NOTE: be careful about this subtraction - why is there a [i] attached to
#brand1 and not to brand2??
#now see how many are negative
length(k[k<0]) #note this gives U=12, agreeing with Table 2.6.1, p.44
#Why is this not equal to the value of W=4 we found in wilcox.test?
#
#R has a built-in function, outer, that does all of these matrix
#computations when given "-" as its third argument. Check
help(outer)
pwds=outer(brand1,brand2,"-")
#this gives the pwds; now count the ones with brand1<brand2,
#i.e., count the pwds < 0
length(pwds[pwds<0])
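#
#A sketch (not from the text) relating the two counts: each of the 4*4=16
#pairs has either brand1<brand2 or brand1>brand2 (there are no ties in these
#data), so the two counts add to 16. The names neg and pos are made up here.
brand1=c(3.6,3.9,4.0,4.3)   #data repeated so this chunk runs on its own
brand2=c(3.8,4.1,4.5,4.8)
pwds=outer(brand1,brand2,"-")
neg=sum(pwds<0) ; neg   #pairs with brand1<brand2, the U=12 found above
pos=sum(pwds>0) ; pos   #pairs with brand1>brand2
neg+pos                 #16 = 4*4
#wilcox.test(brand1,brand2) reports the second count, pos, as its W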
#
#bound the p-value for U=12 using Table A.4: U=12 falls between the
#critical values 1 and 15, so we don't reject the null hypothesis of no
#difference in battery life distributions between the two brands, again p>.1
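#
#A sketch (not from the text) of where critical values like these come from,
#assuming the table lists the most extreme values whose tail probability is
#at most 5%; pwilcox(q,4,4) gives P(U<=q) under the null hypothesis.
pwilcox(1,4,4)      #P(U<=1) = 2/70, which is below .05
pwilcox(2,4,4)      #P(U<=2) = 4/70, which is above .05
1-pwilcox(14,4,4)   #P(U>=15) = 2/70 by symmetry
#U=12 is in neither tail, so we do not reject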
#
#Often, if you reject the null hypothesis, then you will want to be able to
#estimate the difference between the distributions (since you're saying
#there is one!). Use the "shift alternative" and go over the steps on page 46
#for constructing a CI for delta ...
#
#
#
#
#
#Now let's look at example 2.6.2 on pages 46-47
granite=c(33.63,39.86,69.32,42.13,58.36,74.11)
basalt=c(26.15,18.56,17.55,9.84,28.29,34.15)
#first check that you're rejecting the null hypothesis
wilcox.test(granite,basalt,alternative="greater")
#note p=.002165 so we reject the null hypothesis of equal distributions.
#let's get the pwds either by doing your own matrix computations or by using outer
pwds=outer(granite,basalt,"-")
#the Hodges-Lehmann estimate of the shift delta is the median of these pwds
median(pwds)
#now get the quantiles of the Mann-Whitney distribution in order to
#compute CI's for delta
qwilcox(.05,6,6); qwilcox(.95,6,6)
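#
#A sketch of the hand computation (not necessarily the exact steps in the
#text): sort the 36 pwds and count in from each end. The names d, k, lower
#and upper are made up here, and the indexing is chosen to mirror what
#wilcox.test does for exact intervals, which is an assumption about how the
#text sets up its interval.
granite=c(33.63,39.86,69.32,42.13,58.36,74.11)   #data repeated so this chunk
basalt=c(26.15,18.56,17.55,9.84,28.29,34.15)     #runs on its own
d=sort(outer(granite,basalt,"-"))
k=qwilcox(.05,6,6)                    #lower 5% quantile of the Mann-Whitney distribution
lower=d[k] ; upper=d[length(d)+1-k]   #k-th smallest and k-th largest pwd
c(lower,upper)                        #interval with at least 90% confidence
1-2*pwilcox(k-1,6,6)                  #the coverage actually achieved
#compare with the interval reported by wilcox.test
wilcox.test(granite,basalt,conf.int=TRUE,conf.level=.90)$conf.int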
#or you may get the H-L estimate and the confidence interval directly
wilcox.test(granite,basalt,conf.int=TRUE,conf.level=.90)
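#
#The individual pieces can also be pulled out of the result (a sketch; res is
#a name made up here)
granite=c(33.63,39.86,69.32,42.13,58.36,74.11)   #data repeated so this chunk
basalt=c(26.15,18.56,17.55,9.84,28.29,34.15)     #runs on its own
res=wilcox.test(granite,basalt,conf.int=TRUE,conf.level=.90)
res$estimate   #H-L estimate; for these data it equals median(outer(granite,basalt,"-"))
res$conf.int   #the confidence interval for delta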