Lab 5:

To access R in Unix:
In the computer lab, open a PuTTY window and log in to Unity. Then click
on the icon named XWin32 (a blue X icon).
% add R
% R
EXERCISES:
1: HOW TO CONVERT A SPLUS OBJECT TO R
2: INTRO TO LM (LINEAR MODELS)
3: STATISTICAL MODELS: ANOVA, GLM, Poisson regression
#We will use the CAR library: Companion to Applied Regression
# for more information:
library(car)
library(help=car)
EXERCISE 1:
How do you convert a Splus object to an R object?
For example, the data object "car.test.frame" will be converted from a Splus
object to an R object.
In Splus6:
% add Splus
% Splus6
# (note: the Splus6 license may have expired)
> car.test.frame
# this data object is built into Splus6
> dump("car.test.frame")
# car.test.frame.q will be generated in the current working directory.
Open R in the current working directory
> source("car.test.frame.q")
> car.test.frame
# This is now exactly the same as "car.test.frame" in Splus
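As an aside (not part of the original Splus session), a single object can also be
transferred with dput()/dget(), which write and read a text representation of the object:
# dput(car.test.frame, file="car.test.frame.txt")   # in Splus
# car.test.frame <- dget("car.test.frame.txt")      # in R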
-----------------------EXERCISE 2: INTRO TO LM
Linear Models
The function lm() returns an R object called a fitted linear
least-squares model object, or lm object for short:
lm(formula, data)
where formula is the structural formula that specifies the model
and data is the data frame containing the data (omitted if the
data is not stored in a data frame).
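Schematically, with placeholder names (y, x1, x2, and mydata are hypothetical, just to show the pattern):
# fit <- lm(y ~ x1 + x2, data=mydata)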
library(rpart)
> data(car.test.frame)
# load the data set (it comes with the rpart package)
The 'car.test.frame' data frame has 60 rows and 8 columns, giving
data on makes of cars taken from the April, 1990 issue of
_Consumer Reports_.
> fuel.fit <- lm(100/Mileage ~ Weight + Disp., car.test.frame)
> fuel.fit
Call:
lm(formula = 100/Mileage ~ Weight + Disp., data = car.test.frame)
Coefficients:
 (Intercept)       Weight        Disp.
   0.4789733  0.001241421 0.0008543589
Degrees of freedom: 60 total; 57 residual
Residual standard error: 0.3900812
A list of the components of an lm object and of the arguments that can be
passed to the lm() function can be found at the end of this section.
The lm object fuel.fit can be used as an argument to other functions to
obtain summaries of the data or to extract specific
information about the model. The function summary() gives a more detailed
description of the model than the one obtained by
printing out the lm object. The components of a summary.lm object are listed
at the end of this section.
> fuel.summary<-summary(fuel.fit)
> fuel.summary
Call:
lm(formula = 100/Mileage ~ Weight + Disp., data = car.test.frame)
Residuals:
     Min       1Q   Median      3Q     Max
 -0.8109  -0.2559  0.01971  0.2673  0.9812
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 0.4790     0.3418   1.4014   0.1665
Weight      0.0012     0.0002   7.2195   0.0000
Disp.       0.0009     0.0016   0.5427   0.5895
Residual standard error: 0.3901 on 57 degrees of freedom
Multiple R-Squared: 0.7438
F-statistic: 82.75 on 2 and 57 degrees of freedom, the p-value is 0
Correlation of Coefficients:
       (Intercept)  Weight
Weight     -0.8968
Disp.       0.4720 -0.8033
Specific information about the model can be printed using the functions
coef(), resid() and fitted(). The same information can be
obtained using fuel.fit$coef, fuel.fit$resid, and fuel.fit$fitted.
> coef(fuel.fit)
 (Intercept)       Weight        Disp.
   0.4789733  0.001241421 0.0008543589
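Similarly (output not reproduced here), resid() and fitted() return the residuals and
fitted values of the model:
> head(resid(fuel.fit))
> head(fitted(fuel.fit))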
> formula(fuel.fit)
100/Mileage ~ Weight + Disp.
# the formula() function extracts the model formula from the lm object fuel.fit
The update() function is used to modify the model formula used in fuel.fit:
> fuel.fit2<-lm(update(fuel.fit, . ~ . + Type))
#Type is a factor with 6 levels (we need 5 degrees of freedom)
> is.factor(car.test.frame$Type)
[1] TRUE
>
> levels(car.test.frame$Type)
[1] "Compact" "Large"
"Medium" "Small"
"Sporty" "Van"
>
> fuel.fit2
Call:
lm(formula = update(fuel.fit, . ~ . + Type))
Coefficients:
 (Intercept)       Weight        Disp.       Type1       Type2
    2.960617  4.16396e-05  0.007594073  -0.1452804  0.09808411
       Type3        Type4        Type5
  -0.1240942  -0.04358047    0.1915062
Degrees of freedom: 60 total; 52 residual
Residual standard error: 0.3139246
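To see the component names referred to above (a quick check; output not reproduced here):
> names(fuel.fit)      # components of the lm object, e.g. coefficients, residuals, fitted.values
> names(fuel.summary)  # components of the summary object, e.g. coefficients, sigma, r.squared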
-----------------------
# Data sets used in Exercise 3 (all from the car package): Prestige, Moore, Mroz, Ornstein
EXERCISE 3:
STATISTICAL MODELS IN R
# multiple regression (Prestige data)
# The 'Prestige' data frame has 102 rows and 6 columns. The
# observations are occupations.
# This data frame contains the following columns:
#   education  Average education of occupational incumbents, in 1971.
#   income     Average income of incumbents, dollars, in 1971.
#   women      Percentage of incumbents who are women.
#   prestige   Pineo-Porter prestige score for occupation, from a social
#              survey conducted in the mid-1960s.
#   census     Canadian Census occupational code.
#   type       Type of occupation. A factor with levels (note: out of
#              order): 'bc', Blue Collar; 'prof', Professional, Managerial,
#              and Technical; 'wc', White Collar.
library(car)
data(Prestige)
names(Prestige)
attach(Prestige)
prestige.mod <- lm(prestige ~ income + education + women,Prestige)
summary(prestige.mod)
# dummy regression
type
# a factor
detach(Prestige)
Prestige.2 <- na.omit(Prestige) # filter out missing data
attach(Prestige.2)
type
length(type)
class(type)
unclass(type)
# generating contrasts from factors
options("contrasts")
contrasts(type)
contrasts(type) <- contr.treatment(levels(type), base=2)
# changing the baseline category
contrasts(type)
contrasts(type) <- 'contr.helmert'
# Helmert contrasts
contrasts(type)
contrasts(type) <- 'contr.sum'
# "deviation" contrasts
contrasts(type)
contrasts(type) <- NULL
# back to default
'contr.helmert' returns Helmert contrasts, which contrast the
second level with the first, the third with the average of the
first two, and so on. 'contr.poly' returns contrasts based on
orthogonal polynomials. 'contr.sum' uses "sum to zero contrasts".
'contr.treatment' contrasts each level with the baseline level
(specified by 'base'): the baseline level is omitted. Note that
this does not produce "contrasts" as defined in the standard
theory for linear models as they are not orthogonal to the
constant.
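To see what these coding schemes look like for a three-level factor (output not shown here):
> contr.treatment(3)   # dummy (baseline) coding
> contr.helmert(3)     # Helmert contrasts
> contr.sum(3)         # deviation ("sum to zero") contrasts
> contr.poly(3)        # orthogonal polynomial contrasts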
type.ord <- ordered(type, levels=c("bc", "wc", "prof")) # ordered factor
type.ord
round(contrasts(type.ord), 3)
# orthogonal polynomial contrasts
prestige.mod.1 <- lm(prestige ~ income + education + type)
summary(prestige.mod.1)
anova(prestige.mod.1)
# sequential ("type-I") sums of squares
prestige.mod.0 <- lm(prestige ~ income + education)
summary(prestige.mod.0)
anova(prestige.mod.0, prestige.mod.1)
# incremental F-test
Anova(prestige.mod.1)
# "type-II" sums of squares
prestige.mod.2 <- lm(prestige ~ income + education + type - 1)
# suppressing the constant
summary(prestige.mod.2)
Anova(prestige.mod.2)
# note: test for type is that all intercepts are 0 (not sensible)
prestige.mod.3 <- update(prestige.mod.1,
    . ~ . + income:type + education:type)
# adding interactions to the model
summary(prestige.mod.3)
Anova(prestige.mod.3)
lm(prestige ~ income*type + education*type) # equivalent specification
lm(prestige ~ type + (income+education) %in% type, data=Prestige) # nesting
lm(prestige ~ type + (income+education) %in% type - 1, data=Prestige)
# separate intercepts and slopes
detach(Prestige.2)
# Anova Models
data(Moore)
attach(Moore)
Moore
The 'Moore' data frame has 45 rows and 4 columns. The data are for
subjects in a social-psychological experiment, who were faced with
manipulated disagreement from a partner of either low or high
status. The subjects could either conform to the partner's
judgment or stick with their own judgment.
fcategory <- factor(fcategory, levels=c("low","medium","high"))
#reorder levels
fcategory
means <- tapply(conformity, list(fcategory, partner.status), mean)
means
# table of means
tapply(conformity, list(fcategory, partner.status), sd) # standard deviations
tapply(conformity, list(fcategory, partner.status), length) # counts
Fcat <- as.numeric(fcategory)
# plot the means
#USEFUL GRAPH
plot(c(0.5, 3.5), range(conformity), xlab="F category",
ylab="Conformity", type="n", axes=F)
axis(1, at=1:3, labels=c("low", "medium", "high"))
axis(2)
box()
points(jitter(Fcat[partner.status=="low"]),
conformity[partner.status=="low"], pch="L")
points(jitter(Fcat[partner.status=="high"]),
conformity[partner.status=="high"], pch="H")
lines(1:3, means[,1], lty=1, lwd=3, type="b", pch=19, cex=2)
lines(1:3, means[,2], lty=3, lwd=3, type="b", pch=1, cex=2)
legend(locator(1), c("high","low"), lty=c(1,3), lwd=c(3,3), pch=c(19,1))
#jitter: to add a little bit of noise
#lty: line type, 1 solid, 3 dotted.
#lwd: The line width.
#pch: either an integer specifying a symbol or a single character
#     to be used as the default in plotting points.
#type: the plot type ("b" = both points and lines)
identify(Fcat, conformity)
# identify unusual points; press ESC to finish
options(contrasts=c("contr.sum", "contr.poly"))
# contr.sum = deviation contrasts
moore.mod <- lm(conformity ~ fcategory*partner.status)
summary(moore.mod)
Anova(moore.mod)
# type II sums of squares
Anova(moore.mod, type="III")
# type III sums of squares
# more on lm
args(lm)
# Subset: observation selection
lm(prestige ~ income + education, data=Prestige, subset=-c(6, 16))
lm(conformity ~ partner.status*fcategory, # specifying contrasts
contrasts=list(partner.status=contr.sum, fcategory=contr.poly))
lm(100*conformity/40 ~ partner.status*fcategory, data=Moore, subset=-c(6,16))
# data argument; note the computation of the response on the left-hand side
# Generalized Linear Models (GLM)
# binary logit model
data(Mroz)
attach(Mroz)
Mroz[sort(sample(753,10)),]
The 'Mroz' data frame has 753 rows and 8 columns. The
observations, from the Panel Study of Income Dynamics (PSID), are
married women.
mod.mroz <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc,
    data=Mroz, family=binomial)
summary(mod.mroz)
anova(update(mod.mroz, . ~ . - k5), mod.mroz, test="Chisq")
# likelihood-ratio test
Anova(mod.mroz)
# analysis-of-deviance table
detach(Mroz)
# Poisson regression
data(Ornstein)
Ornstein[sort(sample(248,10)),]
attach(Ornstein)
tab <- table(interlocks)
tab
x <- sort(unique(interlocks))
par(mfrow=c(1,1))
plot(x, tab, type='h', xlab='Number of Interlocks',
ylab='Frequency')
points(x, tab, pch=16)
mod.ornstein <- glm(interlocks ~ assets + nation + sector, family=poisson)
summary(mod.ornstein)
Anova(mod.ornstein)
detach(Ornstein)
Assignment: Fit a linear model for the climate dataset,
with rain as the response and lon, lat, and elev as regressors (a possible skeleton follows).
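A possible skeleton, assuming the climate data are already loaded as a data frame named
'climate' with columns rain, lon, lat, and elev (names taken from the assignment, not verified):
# climate.mod <- lm(rain ~ lon + lat + elev, data=climate)
# summary(climate.mod)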
Assignment: Fit a linear model to the Prestige dataset. Eliminate the
observations with missing values using na.omit.
Response: prestige; predictors: income and education. Compare the models with
and without the variable type using an incremental F-test (using anova). Is type
categorical? Choose a contrast. Update the model to add second-order
interactions with the variable type.
Refit without using observations #6 and #16 (use the subset argument).
anova() gives type-I (sequential) sums of squares.
Anova() gives type-II sums of squares by default (but type-III can be requested).