R workshop #4 - University of British Columbia

Workshop in R and GLMs: #4
Diane Srivastava
University of British Columbia
srivast@zoology.ubc.ca
Exercise
1. Fit the binomial glm survival = size*treat
2. Fit the binomial glm parasitism = size*treat
3. Predict what size has 50% parasitism in treatment “0”
Predicting size for p=0.5, treat=0
Output from logistic regression with logit link: predicted log_e(p/(1-p)) = a+bx
So when p=0.5, solve log(1) = 0 = a+bx, which gives x = -a/b
What is the equation for treat 0? For treat 1?
Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -2.38462  0.16780     -14.211  <2e-16 ***
size          0.76264  0.04638      16.442  <2e-16 ***
treat         0.28754  0.23155       1.242  0.214
size:treat   -0.09477  0.06357      -1.491  0.136
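The calculation asked for in the exercise can be sketched directly from the coefficient table. This is a minimal sketch, assuming treat is coded 0/1 so that the treat and size:treat estimates are added to the intercept and slope for treatment 1; the variable names a, b, a1 and b1 are labels chosen here, not names from the workshop.

```r
# Coefficients copied from the summary table (names a, b, a1, b1 are illustrative)
a  <- -2.38462   # (Intercept)
b  <-  0.76264   # size
a1 <-  0.28754   # treat
b1 <- -0.09477   # size:treat

# At p = 0.5 the logit is log(0.5/0.5) = 0, so 0 = intercept + slope * size
x50_treat0 <- -a / b                  # treat 0: intercept a, slope b
x50_treat1 <- -(a + a1) / (b + b1)    # treat 1: intercept a + a1, slope b + b1

round(c(treat0 = x50_treat0, treat1 = x50_treat1), 2)
```

For treatment 0 this comes out just under 3.13, in line with the 3.12 marked on the slide's plot.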
Rlecture.csv
[Figures: Growth vs. Size; Survival vs. Size; Parasitism (%) vs. Size, with 3.12 marked on a Size axis]
Model simplification
1. Parsimonious/logical sequence (e.g. highest-order interactions first)
2. Stepwise sequence
3. Bayesian comparison of candidate models (not covered)
ANCOVA: Difference between categories…
size*treat ns: constant, doesn’t depend on size
size*treat sig: depends on size
[Figure: two panels of logit parasitism vs. plant size, one for size*treat ns and one for size*treat sig]
Deletion tests
How to change your model quickly:
model2<-update(model1,~.-size:treat)
How to do a deletion test:
anova(reduced_model, full_model, test="Chi")
1. Test for interaction in logit parasitism ANCOVA. If not sig, remove and continue. If sig, STOP!
2. Test covariate. If not sig, remove and continue. If sig, put back and continue.
3. Test main effect.
Code for “parasitism” analysis
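> ds<-read.table(file.choose(), sep=",", header=TRUE); ds
> attach(ds)
> # two-column response: % parasitised, % not parasitised
> # (named para rather than par to avoid confusion with the graphics function par())
> para<-cbind(parasitism, 100-parasitism); para
> m1<-glm(para~size*treat, data=ds, family=binomial)
> summary(m1)
> # 1. deletion test for the interaction
> m2<-update(m1, ~.-size:treat)
> summary(m2)
> anova(m2,m1, test="Chi")
> # 2. deletion test for the covariate (size)
> m3<-update(m2, ~.-size)
> anova(m3,m2, test="Chi")
> # 3. deletion test for the main effect (treat)
> m4<-update(m2, ~.-treat)
> anova(m4,m2, test="Chi")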
Context (often) matters!
What is the p-value for treat in:
size+treat?
treat?
Stepwise regression:
step(model)
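The question about context can be tried out without the workshop file. The following is a minimal sketch on simulated data: the data frame sim, its coefficients and the sample size are all hypothetical, chosen only so the code runs on its own. It compares the deletion-test p-value for treat with and without size in the model, then lets step() simplify the full model by AIC.

```r
set.seed(1)
n     <- 200
size  <- runif(n, 1, 7)
treat <- rbinom(n, 1, 0.5)
# Hypothetical truth: strong size effect, modest treat effect, no interaction
eta   <- -2.4 + 0.75 * size + 0.5 * treat
y     <- rbinom(n, 1, plogis(eta))
sim   <- data.frame(y, size, treat)

# Deletion test for treat when size is already in the model
m_add  <- glm(y ~ size + treat, family = binomial, data = sim)
m_size <- update(m_add, ~ . - treat)
p_with_size <- anova(m_size, m_add, test = "Chi")[2, "Pr(>Chi)"]

# Deletion test for treat on its own
m_treat <- glm(y ~ treat, family = binomial, data = sim)
m_null  <- update(m_treat, ~ . - treat)
p_alone <- anova(m_null, m_treat, test = "Chi")[2, "Pr(>Chi)"]

c(with_size = p_with_size, alone = p_alone)

# Automated stepwise simplification of the full model (AIC-based)
full    <- glm(y ~ size * treat, family = binomial, data = sim)
stepped <- step(full, trace = 0)
formula(stepped)
```

The two p-values for treat generally differ because each test asks what treat explains beyond the other terms in that particular model. Note that step() selects on AIC, which is not the same criterion as the chi-square deletion tests.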
Jump height (how high ball can be raised off the ground)
[Figure: scale of feet off ground, 8 to 11]
Total SS = 11.11
[Figure: Jump (ft) vs. Height (ft)]
X variable        parameter  SS    F(1,13)  p
Height of player  +0.943     9.96  112      <0.0001
[Figure: Jump (ft) vs. Weight (lbs)]
X variable        parameter  SS    F(1,13)  p
Weight of player  +0.040     7.92  32       <0.0001
Why do you think weight is positively correlated with jump height?
An idea
Perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce ability to jump high…
[Figure: Jump (ft) vs. Height (ft), with lighter and heavier players distinguished]
X variable  parameter  SS     F    p
Height      +2.133     9.956  803  <0.0001
Weight      -0.059     1.008  81   <0.0001
Tall people can jump higher
Heavy people often tall (tall people often heavy)
People light for their height can jump a bit more
[Diagram: Height, Weight and Jump, with two links marked +]
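The sign change for weight can be reproduced with simulated data. This is a sketch under stated assumptions: the numbers below are hypothetical, not the players' data from the slides. Weight is built to increase with height, and jump is built to increase with height but decrease with weight at a given height.

```r
set.seed(42)
n      <- 200
height <- runif(n, 5, 8)                      # feet (hypothetical)
weight <- 20 * height + rnorm(n, sd = 10)     # taller people tend to be heavier
jump   <- 2 * height - 0.05 * weight + rnorm(n, sd = 0.2)

# Weight on its own: it stands in for height, so its slope is positive
m_weight <- lm(jump ~ weight)
coef(m_weight)["weight"]

# Weight with height in the model: slope is negative at a given height
m_both <- lm(jump ~ height + weight)
coef(m_both)[c("height", "weight")]
```

The single-predictor slope for weight is positive only because weight carries information about height. Once height is in the model, the weight coefficient describes people who are light or heavy for their height.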
Species.txt
Rothamsted Park Grass experiment started in 1856
Exercise (species.txt)
diane<-read.table(file.choose(), header=TRUE); diane
attach(diane)
Univariate trends:
plot(Species~Biomass)
plot(Species~pH)
Combined trends:
plot(Species~Biomass, type="n");
points(Species[pH=="high"]~Biomass[pH=="high"]);
points(Species[pH=="mid"]~Biomass[pH=="mid"], pch=16);
points(Species[pH=="low"]~Biomass[pH=="low"], pch=0)
Exercise (species.txt)
1. With a normal distribution, fit pH*Biomass
• check model diagnostics
• test interaction for significance
2. With a Poisson distribution, fit pH*Biomass
• check model diagnostics
• test interaction for significance
Moral of the story: Make sure you KNOW what you are modelling!
[Figure: panels of log(Species) vs. Biomass and Species vs. Biomass]
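One way to see what a Poisson glm is modelling is to compare its predictions on the two scales. This sketch assumes the point of the slide is the contrast between the log(Species) and Species panels. The data are simulated, and the names sim, ph, biomass and species, as well as the coefficients, are hypothetical stand-ins for species.txt.

```r
set.seed(7)
biomass <- runif(90, 0, 10)
ph      <- factor(rep(c("low", "mid", "high"), each = 30))
# Hypothetical truth: straight lines on the log scale, one intercept per pH level
log_mu  <- c(low = 2.9, mid = 3.4, high = 3.8)[as.character(ph)] - 0.12 * biomass
species <- rpois(90, exp(log_mu))
sim     <- data.frame(species, biomass, ph)

m_pois <- glm(species ~ ph + biomass, family = poisson, data = sim)

new <- data.frame(ph = factor("mid", levels = levels(ph)),
                  biomass = c(0, 5, 10))
on_link     <- predict(m_pois, newdata = new, type = "link")      # log(expected species)
on_response <- predict(m_pois, newdata = new, type = "response")  # expected species

diff(on_link)       # equal steps in biomass give equal steps on the log scale
diff(on_response)   # but unequal steps in species counts
```

With the default log link, the model is a straight line in log(Species), so the fitted relationship on the Species scale is a curve, and parallel lines on the log scale are not parallel on the count scale.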
Exercise (species.txt)
1. Fit glm: Species~pH, family=gaussian
2. Test if low and mid pH have the same effect
• this is a planned comparison
Further reading
Crawley, M.J. Statistics: An Introduction Using R. Wiley.
Faraway, J.J. Extending the Linear Model with R. Chapman & Hall/CRC.
Code for “Species” analysis
> # 1. normal errors: test the interaction, then check diagnostics
> m1<-glm(Species~pH*Biomass, family=gaussian, data=diane)
> summary(m1)
> m2<-update(m1, ~.-pH:Biomass)
> anova(m2,m1, test="Chi")
> par(mfrow=c(2,2)); plot(m1)
> # 2. Poisson errors: same steps
> m3<-glm(Species~pH*Biomass, family=poisson, data=diane)
> m4<-update(m3, ~.-pH:Biomass)
> anova(m4,m3, test="Chi")
> par(mfrow=c(2,2)); plot(m3)
> # planned comparison: PH is 1 for low or mid pH and 0 for high pH (needs attach(diane))
> PH<-(pH!="high")+0
> m5<-glm(Species~pH, family=gaussian, data=diane)
> m6<-update(m5, ~.-pH+PH)
> anova(m6,m5, test="Chi")