The data set sleeplevels contains the following variables:

Sleep: the number of hours of sleep for a randomly selected night
Humidity: the humidity in the room (units not specified)
Temperature: the temperature of the room (in degrees Fahrenheit)
Noise: the ambient noise (in decibels)
Age: the age of the person being measured
The goal is to predict the amount of sleep a person will get from the other variables.
For each part below that asks you to plot something (usually part a), discuss what you see in the plot.
When you plot residuals by variable (usually part b), show the plot for each of the four variables.
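The per-variable residual plots in each part can be drawn with one loop instead of four separate plot() calls. This is a minimal sketch. It assumes the column names used in the R code at the end of this document (humidity, temperature, noise, age) and no missing values. The helper name plot_resid_by_var is hypothetical, and the added lowess smooth is an illustrative extra that is not part of the assignment.

```r
# Hypothetical helper (not from the original code): plot a fitted model's
# residuals against each predictor in a 2x2 grid. Assumes the data have no
# missing values, so the residuals line up row-for-row with `data`.
plot_resid_by_var <- function(fit, data,
                              vars = c("humidity", "temperature", "noise", "age")) {
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))                      # restore the previous layout afterwards
  res <- residuals(fit)
  for (v in vars) {
    plot(data[[v]], res, xlab = v, ylab = "Residuals",
         main = paste("Residuals vs", v))
    abline(h = 0, lty = 2)               # reference line at zero
    lines(lowess(data[[v]], res), col = "red")  # smooth to highlight curvature
  }
  invisible(res)
}

# Usage, assuming the sleeplevels data frame and a fitted model fit1:
# plot_resid_by_var(fit1, sleeplevels)
```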
1) Run a model predicting sleep based on the other variables
a. What in the residual plots suggests your model could be improved?
Curvature in the residual plot.
b. Plot the residuals against each variable to find which variables have an interaction
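The assignment asks for residual plots to check for interactions. As a numeric complement, which is a swapped-in technique rather than the assignment's method, you can add each candidate pairwise interaction to the main-effects model one at a time and test it with a nested-model F-test via anova(). This is a sketch only. The helper screen_interactions is hypothetical, and the default response and predictor names are assumptions chosen to match this data set.

```r
# Hypothetical helper: add each pairwise interaction, one at a time, to the
# main-effects model and report the nested-model F-test p-value from anova().
screen_interactions <- function(data, response = "sleep",
                                vars = c("humidity", "temperature", "noise", "age")) {
  base_fit <- lm(reformulate(vars, response = response), data = data)
  pairs <- combn(vars, 2, simplify = FALSE)
  pvals <- vapply(pairs, function(p) {
    term <- paste(p, collapse = ":")
    bigger <- lm(reformulate(c(vars, term), response = response), data = data)
    anova(base_fit, bigger)[2, "Pr(>F)"]  # p-value for the single added term
  }, numeric(1))
  names(pvals) <- vapply(pairs, paste, character(1), collapse = ":")
  sort(pvals)                            # strongest evidence (smallest p) first
}

# Usage, assuming the sleeplevels data frame:
# screen_interactions(sleeplevels)
```

With about 2000 observations, even small effects can reach significance. The residual plots should therefore remain the main guide.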
2) Run a new model predicting sleep adding a temperature*humidity interaction
a. What in the residual plots suggests your model could still be improved?
Poor normality of the residuals.
b. Plot the residuals against each variable to find which new variables have an interaction
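In R's formula syntax, a*b is shorthand for a + b + a:b. The step 2 model therefore keeps both main effects and adds their product. You can confirm the expansion quickly with terms(). This sketch uses only base R and the variable names from the data set.

```r
# Show how the '*' operator in the step 2 formula expands: main effects for
# both variables plus their product (the ':' interaction term).
f2 <- sleep ~ humidity * temperature + noise + age
labels2 <- attr(terms(f2), "term.labels")
labels2
# labels2 holds "humidity", "temperature", "noise", "age", "humidity:temperature"
```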
3) Run a new model predicting sleep adding a noise*age interaction
a. What in the residual plot shows there is still a problem?
The residuals are still not normal.
b. Plot the residuals against each variable to get hints about how to fix the problem
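The normality judgements in steps 2 and 3 come from the normal Q-Q panel of plot(fit). The minimal sketch below redraws that panel and adds a Shapiro-Wilk test (shapiro.test), which is an extra not used in the original code. The name check_normality is hypothetical. With roughly 2000 residuals, the test flags even small departures from normality, so the plot should still drive the decision.

```r
# Hypothetical helper: normal Q-Q plot of a model's residuals plus a
# Shapiro-Wilk test as a numeric companion to the visual check.
check_normality <- function(fit) {
  res <- residuals(fit)
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  qqline(res, col = "red")               # line through the first and third quartiles
  shapiro.test(res)                      # accepts between 3 and 5000 values
}

# Usage, assuming fit3 from the code at the end of this document:
# check_normality(fit3)
```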
4) Run a new model with temperature*humidity, but give the other two variables (noise and age) a
quadratic interaction, with x = noise and y = age: ($xy$, $x^2$, $y^2$, $x^2y$, $xy^2$, $x^2y^2$)
a. The residual plots should now look good, but plot the residuals against each variable to
get hints as to how the model can be improved.
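In step 4, take x = noise and y = age. The terms xy, x^2, y^2, x^2y, xy^2 and x^2y^2 then map onto noise:age, I(noise^2), I(age^2), age:I(noise^2), noise:I(age^2) and I(noise^2):I(age^2). The I() wrapper makes ^ mean arithmetic squaring rather than formula crossing. The sketch below builds the design matrix on a few made-up values to show the columns R creates. The values are purely illustrative and are not from the sleeplevels data.

```r
# Made-up values, used only to build a design matrix for illustration
d <- data.frame(noise = c(10, 20, 30, 40, 50), age = c(2, 8, 14, 20, 26))
# Step 4's quadratic noise-age surface (x = noise, y = age)
f4 <- ~ noise * age + I(noise^2) * I(age^2) + I(noise^2) * age + noise * I(age^2)
mm <- model.matrix(f4, data = d)
colnames(mm)
# Each interaction column is the elementwise product of its parts,
# e.g. the x^2 y^2 column equals noise^2 * age^2
all.equal(unname(mm[, "I(noise^2):I(age^2)"]), d$noise^2 * d$age^2)
```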
5) Based on part 4, find the term that is missing from your model (it should be a cubic term).
a. All residual plots (including those by variable) should now look good. Run the
summary and find which term is not needed in the model.
                        Estimate Std. Error   t value Pr(>|t|)
(Intercept)            1.231e+01  1.787e-03  6889.291   <2e-16 ***
humidity               2.066e-03  2.957e-05    69.878   <2e-16 ***
temperature            2.593e-03  2.488e-05   104.239   <2e-16 ***
noise                 -3.186e-02  5.860e-05  -543.647   <2e-16 ***
age                    1.583e-02  1.201e-04   131.756   <2e-16 ***
I(noise^2)            -2.110e-04  1.045e-06  -201.942   <2e-16 ***
I(age^2)              -7.915e-04  3.562e-06  -222.187   <2e-16 ***
I(noise^3)            -1.230e-06  6.882e-09  -178.774   <2e-16 ***
humidity:temperature  -1.847e-03  4.829e-07 -3824.823   <2e-16 ***
noise:age              1.782e-03  6.986e-06   255.048   <2e-16 ***
I(noise^2):I(age^2)    1.193e-09  2.534e-09     0.471    0.638
age:I(noise^2)         1.976e-05  8.522e-08   231.885   <2e-16 ***
noise:I(age^2)        -7.920e-05  2.074e-07  -381.851   <2e-16 ***
The unneeded term is I(noise^2):I(age^2), the noise^2*age^2 interaction (p = 0.638).
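In the table above, every term except I(noise^2):I(age^2) has p < 2e-16. You can pull that decision out of the Pr(>|t|) column programmatically. The sketch below assumes a 0.05 cutoff, which the assignment does not state. The name nonsignificant_terms is hypothetical.

```r
# Hypothetical helper: names of coefficients whose t-test p-value exceeds
# alpha (0.05 is an assumed cutoff, not one stated in the assignment).
nonsignificant_terms <- function(fit, alpha = 0.05) {
  coefs <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  p <- coefs[, "Pr(>|t|)"]
  names(p)[p > alpha]
}

# Usage, assuming the step 5 model is stored as fit5:
# nonsignificant_terms(fit5)
```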
6) Run the final model (it should have an F-statistic of 6.819e+07).
a. Include the residual plot in your paper.
b. Include the residual plots by variable in your paper.
c. Include the summary of the model in your paper.
Call:
lm(formula = sleep ~ humidity * temperature + noise * age + I(noise^2) *
    age + noise * I(age^2) + I(noise^3), data = sleeplevels)
Residuals:
       Min         1Q     Median         3Q        Max
-0.0054581 -0.0025217  0.0000534  0.0024694  0.0052014
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)           1.231e+01  1.690e-03  7283.0   <2e-16 ***
humidity              2.066e-03  2.956e-05    69.9   <2e-16 ***
temperature           2.593e-03  2.487e-05   104.3   <2e-16 ***
noise                -3.184e-02  3.974e-05  -801.2   <2e-16 ***
age                   1.587e-02  8.155e-05   194.6   <2e-16 ***
I(noise^2)           -2.113e-04  8.858e-07  -238.6   <2e-16 ***
I(age^2)             -7.927e-04  2.351e-06  -337.1   <2e-16 ***
I(noise^3)           -1.230e-06  6.879e-09  -178.9   <2e-16 ***
humidity:temperature -1.847e-03  4.827e-07 -3826.1   <2e-16 ***
noise:age             1.779e-03  2.267e-06   784.7   <2e-16 ***
age:I(noise^2)        1.980e-05  1.800e-08  1099.9   <2e-16 ***
noise:I(age^2)       -7.911e-05  5.160e-08 -1533.1   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002896 on 1988 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 6.819e+07 on 11 and 1988 DF, p-value: < 2.2e-16
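For parts 6a and 6b, you can write the plots straight to a file for the paper. This is a sketch under stated assumptions. The helper save_residual_plots is hypothetical, PDF output is an arbitrary choice, and the predictor names match this data set.

```r
# Hypothetical helper: write the diagnostic plots and the residual-by-variable
# plots for a fitted model into a two-page PDF for inclusion in a report.
save_residual_plots <- function(fit, data, file,
                                vars = c("humidity", "temperature", "noise", "age")) {
  pdf(file, width = 8, height = 8)
  on.exit(dev.off())                     # close the file even if plotting fails
  par(mfrow = c(2, 2))
  plot(fit)                              # page 1: the four standard diagnostics
  res <- residuals(fit)
  for (v in vars) {                      # page 2: residuals against each predictor
    plot(data[[v]], res, xlab = v, ylab = "Residuals")
    abline(h = 0, lty = 2)
  }
  invisible(file)
}

# Usage, assuming fit6 and sleeplevels from the R code in this document:
# save_residual_plots(fit6, sleeplevels, "final_model_residuals.pdf")
# summary(fit6)$fstatistic  # value, numdf, dendf; compare with 6.819e+07 on 11 and 1988 DF
```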
7) Explain the complicated interactions
a. Explain the effect of each of the four variables. To do so you will need to
i. create a prediction equation
ii. Set the interacting variable to a low value
iii. Plot the effect
iv. Increase the interacting variable until it is high
v. Add the lines for each level of the interacting variable
vi. Give a legend which has different colors for each level
You should have four plots, each with several colored lines, one for each level of the
interacting variable.
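The R code at the end of this document types the fitted coefficients into its own prediction function. An alternative sketch draws the same kind of plot directly from the fitted model with stats::predict(), so the equation cannot drift from the fit. It calls stats::predict() explicitly because the script defines its own predict(), which masks the generic. The helper plot_effect and its argument names are hypothetical.

```r
# Hypothetical helper: predicted response against one "focal" variable, with
# one coloured line per value of the interacting ("moderator") variable and
# every other predictor held at the values given in the list `fixed`.
plot_effect <- function(fit, focal, focal_range, moderator, mod_values, fixed,
                        n = 200, cols = rainbow(length(mod_values))) {
  x <- seq(focal_range[1], focal_range[2], length.out = n)
  curves <- sapply(mod_values, function(m) {
    nd <- as.data.frame(fixed)[rep(1, n), , drop = FALSE]  # n copies of the fixed values
    nd[[focal]] <- x
    nd[[moderator]] <- m
    stats::predict(fit, newdata = nd)
  })
  matplot(x, curves, type = "l", lty = 1, col = cols,
          xlab = focal, ylab = "Predicted sleep")
  legend("bottomleft", legend = paste0(moderator, "=", mod_values),
         col = cols, lty = 1)
  invisible(curves)
}

# Usage, assuming fit6 from the R code in this document:
# plot_effect(fit6, "noise", c(0, 80), "age", seq(0, 30, by = 5),
#             fixed = list(humidity = 30, temperature = 40))
```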
b. For each plot include a sentence or two explaining what you see as the effect of the
variable.
As noise increases, sleep decreases. The decrease is most dramatic for low or high ages and less
pronounced for ages in the middle range (10-20).
As age increases, sleep rises until it peaks and then declines slightly. The peak is around age 15
for lower noise levels (0-40) and shifts to about age 20 for higher noise levels.
As humidity goes up, sleep drops, and the drop is more dramatic at higher temperatures.
As temperature goes up, sleep drops, and the drop is more dramatic at higher humidity.
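The two humidity and temperature statements follow directly from the fitted humidity:temperature term. Differentiate the final model's prediction equation with respect to humidity and only the terms containing it remain, so the slope depends on temperature. The coefficients come from the step 6 summary. The slopes are evaluated at the ends of the ranges used in the plots and are in hours of sleep per unit of humidity or per degree:

$$
\frac{\partial\,\widehat{\text{sleep}}}{\partial\,\text{humidity}}
= \hat\beta_{\text{hum}} + \hat\beta_{\text{hum:temp}}\,\text{temperature}
\approx 2.066\times10^{-3} - 1.847\times10^{-3}\,\text{temperature}
$$

This is about $-0.0718$ at temperature 40 and $-0.1457$ at temperature 80. By the same argument,

$$
\frac{\partial\,\widehat{\text{sleep}}}{\partial\,\text{temperature}}
= \hat\beta_{\text{temp}} + \hat\beta_{\text{hum:temp}}\,\text{humidity}
\approx 2.593\times10^{-3} - 1.847\times10^{-3}\,\text{humidity}
$$

This is about $-0.0528$ at humidity 30 and $-0.1267$ at humidity 70.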
# Read the data (the temporary path below is specific to the original machine)
sleeplevels<-read.table("C:\\Users\\scrawfo8\\AppData\\Local\\Temp\\RtmpGAZdjX\\data75073e19be",
header=TRUE, quote="\"")
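par(mfrow=c(2,2))
# Part 1: main effects only; plot() draws the four standard diagnostic plots,
# followed by the residuals against each predictor
fit1<-lm(sleep~humidity+temperature+noise+age,data=sleeplevels)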
plot(fit1)
plot(fit1$residuals~sleeplevels$humidity)
plot(fit1$residuals~sleeplevels$temperature)
plot(fit1$residuals~sleeplevels$age)
plot(fit1$residuals~sleeplevels$noise)
# Part 2: add the humidity x temperature interaction
fit2<-lm(sleep~humidity*temperature+noise+age,data=sleeplevels)
plot(fit2)
plot(fit2$residuals~sleeplevels$humidity)
plot(fit2$residuals~sleeplevels$temperature)
plot(fit2$residuals~sleeplevels$age)
plot(fit2$residuals~sleeplevels$noise)
# Part 3: add the noise x age interaction
fit3<-lm(sleep~humidity*temperature+noise*age,data=sleeplevels)
plot(fit3)
plot(fit3$residuals~sleeplevels$humidity)
plot(fit3$residuals~sleeplevels$temperature)
plot(fit3$residuals~sleeplevels$age)
plot(fit3$residuals~sleeplevels$noise)
# Part 4: full quadratic interaction surface in noise and age
fit4<-lm(sleep~humidity*temperature+noise*age+I(noise^2)*I(age^2)+I(noise^2)*age+noise*I(age^2),data=sleeplevels)
plot(fit4)
plot(fit4$residuals~sleeplevels$humidity)
plot(fit4$residuals~sleeplevels$temperature)
plot(fit4$residuals~sleeplevels$age)
plot(fit4$residuals~sleeplevels$noise)
# Part 5: add the cubic noise term
fit5<-lm(sleep~humidity*temperature+noise*age+I(noise^2)*I(age^2)+I(noise^2)*age+noise*I(age^2)+I(noise^3),data=sleeplevels)
plot(fit5)
plot(fit5$residuals~sleeplevels$humidity)
plot(fit5$residuals~sleeplevels$temperature)
plot(fit5$residuals~sleeplevels$age)
plot(fit5$residuals~sleeplevels$noise)
# Part 6 (final model): drop the non-significant I(noise^2):I(age^2) term
fit6<-lm(sleep~humidity*temperature+noise*age+I(noise^2)*age+noise*I(age^2)+I(noise^3),data=sleeplevels)
plot(fit6)
plot(fit6$residuals~sleeplevels$humidity)
plot(fit6$residuals~sleeplevels$temperature)
plot(fit6$residuals~sleeplevels$age)
plot(fit6$residuals~sleeplevels$noise)
summary(fit6)
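# Part 7: coefficients of the final model, in the order printed by summary(fit6)
c<-fit6$coefficients
par(mfrow=c(1,1))
# Prediction equation from the final model (argument order: humidity, age, temperature, noise).
# Note: defining predict here masks stats::predict() for the rest of the session.
predict<-function(hum,age,tem,noi){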
c[1]+c[2]*hum+c[3]*tem+c[4]*noi+
c[5]*age+c[6]*noi^2+c[7]*age^2+c[8]*noi^3+
c[9]*hum*tem+c[10]*noi*age+c[11]*age*noi^2+c[12]*noi*age^2
}
colors=c("red","orange","yellow","green","blue","purple","brown",
"black","pink")
age<-0;noi<-0;hum<-30;tem<-40
plot(-1,-1,xlim=c(0,80),ylim=c(0,10),main="Effect of Noise",
ylab="Sleep",xlab="Noise")
noi<-seq(0,80,length=1000)
lines(predict(hum,0,tem,noi)~noi,col=colors[1])
lines(predict(hum,5,tem,noi)~noi,col=colors[2])
lines(predict(hum,10,tem,noi)~noi,col=colors[3])
lines(predict(hum,15,tem,noi)~noi,col=colors[4])
lines(predict(hum,20,tem,noi)~noi,col=colors[5])
lines(predict(hum,25,tem,noi)~noi,col=colors[6])
lines(predict(hum,30,tem,noi)~noi,col=colors[7])
legend("bottomleft",col=colors,lty=1,
legend=paste("age=",seq(0,30,by=5),sep=""))
age<-0;noi<-0;hum<-30;tem<-40
plot(-1,-1,xlim=c(0,30),ylim=c(0,10),main="Effect of Age",
ylab="Sleep",xlab="Age")
age<-seq(0,30,length=1000)
lines(predict(hum,age,tem,0)~age,col=colors[1])
lines(predict(hum,age,tem,10)~age,col=colors[2])
lines(predict(hum,age,tem,20)~age,col=colors[3])
lines(predict(hum,age,tem,30)~age,col=colors[4])
lines(predict(hum,age,tem,40)~age,col=colors[5])
lines(predict(hum,age,tem,50)~age,col=colors[6])
lines(predict(hum,age,tem,60)~age,col=colors[7])
lines(predict(hum,age,tem,70)~age,col=colors[8])
lines(predict(hum,age,tem,80)~age,col=colors[9])
legend("bottomleft",col=colors,lty=1,
legend=paste("noise=",seq(0,80,by=10),sep=""))
age<-0;noi<-0;hum<-30;tem<-40
plot(-1,-1,xlim=c(30,70),ylim=c(0,10),main="Effect of Humidity",
ylab="Sleep",xlab="Humidity")
hum<-seq(30,70,length=1000)
lines(predict(hum,age,40,noi)~hum,col=colors[1])
lines(predict(hum,age,45,noi)~hum,col=colors[2])
lines(predict(hum,age,50,noi)~hum,col=colors[3])
lines(predict(hum,age,55,noi)~hum,col=colors[4])
lines(predict(hum,age,60,noi)~hum,col=colors[5])
lines(predict(hum,age,65,noi)~hum,col=colors[6])
lines(predict(hum,age,70,noi)~hum,col=colors[7])
lines(predict(hum,age,75,noi)~hum,col=colors[8])
lines(predict(hum,age,80,noi)~hum,col=colors[9])
legend("bottomleft",col=colors,lty=1,
legend=paste("Temperature=",seq(40,80,by=5),sep=""))
age<-0;noi<-0;hum<-30;tem<-40
plot(-1,-1,xlim=c(40,80),ylim=c(0,10),main="Effect of Temperature",
ylab="Sleep",xlab="Temperature")
tem<-seq(40,80,length=1000)
lines(predict(30,age,tem,noi)~tem,col=colors[1])
lines(predict(35,age,tem,noi)~tem,col=colors[2])
lines(predict(40,age,tem,noi)~tem,col=colors[3])
lines(predict(45,age,tem,noi)~tem,col=colors[4])
lines(predict(50,age,tem,noi)~tem,col=colors[5])
lines(predict(55,age,tem,noi)~tem,col=colors[6])
lines(predict(60,age,tem,noi)~tem,col=colors[7])
lines(predict(65,age,tem,noi)~tem,col=colors[8])
lines(predict(70,age,tem,noi)~tem,col=colors[9])
legend("bottomleft",col=colors,lty=1,
legend=paste("Humidity=",seq(30,70,by=5),sep=""))
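The prediction function in the code above picks coefficients by position, so it silently gives wrong answers if the term order ever changes. The sketch below writes the same equation with coefficients indexed by name instead. It assumes the coefficient names printed in the step 6 summary. The name sleep_eq is hypothetical. It keeps the original argument order (humidity, age, temperature, noise), so you can check it against stats::predict().

```r
# Hypothetical sketch: the final-model prediction equation with coefficients
# looked up by name (names as printed in the step 6 summary), so the result
# does not depend on the order in which lm() stores the terms.
sleep_eq <- function(b, hum, age, tem, noi) {
  unname(
    b["(Intercept)"] + b["humidity"] * hum + b["temperature"] * tem +
      b["noise"] * noi + b["age"] * age +
      b["I(noise^2)"] * noi^2 + b["I(age^2)"] * age^2 + b["I(noise^3)"] * noi^3 +
      b["humidity:temperature"] * hum * tem + b["noise:age"] * noi * age +
      b["age:I(noise^2)"] * age * noi^2 + b["noise:I(age^2)"] * noi * age^2
  )
}

# Usage, assuming fit6 from the code above:
# sleep_eq(coef(fit6), hum = 30, age = 10, tem = 40, noi = seq(0, 80, by = 10))
```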