To access R in Unix: In the computer-lab click on the Putty window and login in Unity. Then click on an ICON # named XWin32 (A blue X icon) % add R % R EXERCISES: 1: HOW TO CONVERT A SPLUS OBJECT TO R 2: INTRO TO LM (LINEAR MODELS) 3: STATISTICAL MODELS: ANOVA, GLM, Poisson regression #We will use the CAR library: Companion to Applied Regression # for more information: library(car) library(help=car) EXERCISE 1: How to convert a Splus object to R object? For example, the data object "car.test.frame" will be converted from a Splus object to R object. In Splus6: % add Splus % Splus6 (license has expired) > car.test.frame # this is a file available in Splus6 > dump("car.test.frame") # car.test.frame.q will be generated in the current working directory. Open R in the current working directory > source("car.test.frame.q") > car.test.frame # This would be exactly same as "car.test.frame" in Splus -----------------------EXERCISE 2: INTRO TO LM Linear Models The function lm() returns an R object called a fitted linear least-squares model object, or lm object for short: lm(formula, data) where formula is the structural formula that specifies the model # and data is the data frame containing the data (omitted if the data is not stored in a data frame). library(rpart) >data(car.test.frame) # get the data file The 'car.test.frame' data frame has 60 rows and 8 columns, giving data on makes of cars taken from the April, 1990 issue of _Consumer Reports_. > fuel.fit<-lm(100/Mileage ~ Weight + Disp., car.test.frame) > fuel.fit Call: lm(formula=100/Mileage~Weight + Disp.,data=car.test.frame) Coefficients: (Intercept) Weight Disp. 0.4789733 0.001241421 0.0008543589 Degrees of freedom: 60 total; 57 residual Residual standard error: 0.3900812 A list of the components of an lm object and of the arguments which can be passed through the lm() function can be found at the end of this section. The lm object fuel.fit can be used as an argument to other functions to obtain summaries of the data or to extract specific information about the model. The function summary() gives a more detailed description of the model than the one obtained by printing out the lm object. The components of an lm.summary object are listed at the end of this section. > fuel.summary<-summary(fuel.fit) > fuel.summary Call: lm(formula=100/Mileage~Weight + Disp.,data = car.test.frame) Residuals: Min 1Q Median 3Q Max -0.8109 -0.2559 0.01971 0.2673 0.9812 Coefficients: Value Std. Error (Intercept) 0.4790 0.3418 Weight 0.0012 0.0002 Disp. 0.0009 0.0016 t value 1.4014 7.2195 0.5427 Pr(>|t|) 0.1665 0.0000 0.5895 Residual standard error: 0.3901 on 57 degrees of freedom Multiple R-Squared: 0.7438 F-statistic: 82.75 on 2 and 57 degrees of freedom, the p-value is 0 Correlation of Coefficients: (Intercept) Weight Weight -0.8968 Disp. 0.4720 -0.8033 Specific information about the model can be printed using the functions coef(), resid() and fitted(). The same information can be obtained using fuel.fit$coef, fuel.fit$resid, and fuel.fit$fitted. > coef(fuel.fit) (Intercept) Weight Disp. 0.4789733 0.001241421 0.0008543589 > formula(fuel.fit) 100/Mileage ~ Weight + Disp. * the formula() function extracts the model formula from the lm object fuel.fit The update() function is used to modify the model formula used in fuel.fit: > fuel.fit2<-lm(update(fuel.fit, . ~ . + Type)) #Type is a factor with 6 levels (we need 5 degrees of freedom) > is.factor(car.test.frame$Type) [1] T > > levels(car.test.frame$Type) [1] "Compact" "Large" "Medium" "Small" "Sporty" "Van" > > fuel.fit2 Call: lm(formula = update(fuel.fit, . ~ . + Type)) Coefficients: (Intercept) Weight Disp. Type1 Type2 2.960617 4.16396e-05 0.007594073 -0.1452804 0.09808411 Type3 Type4 Type5 -0.1240942 -0.04358047 0.1915062 Degrees of freedom: 60 total; 52 residual Residual standard error: 0.3139246 -----------------attach(Ornstein) attach(Mroz) attach(Moore) attach(Prestige) EXERCISE 3: STATISTICAL MODELS IN R # multiple regression (Prestige data) # The 'Prestige' data frame has 102 rows and 6 columns. The # observations are occupations. # This data frame contains the following columns: # education Average education of occupational incumbents, in 1971. # income Average income of incumbents, dollars, in 1971. # women Percentage of incumbents who are women. # prestige Pineo-Porter prestige score for occupation, from a social # survey conducted in the mid-1960s. # census Canadian Census occupational code. # type Type of occupation. A factor with levels (note: out of # order): 'bc', Blue Collar; 'prof', Professional, Managerial, # and Technical; 'wc', White Collar. library(car) data(Prestige) names(Prestige) attach(Prestige) prestige.mod <- lm(prestige ~ income + education + women,Prestige) summary(prestige.mod) # dummy regression type # a factor detach(Prestige) Prestige.2 <- na.omit(Prestige) # filter out missing data attach(Prestige.2) type length(type) class(type) unclass(type) # generating contrasts from factors options("contrasts") contrasts(type) contrasts(type) <- contr.treatment(levels(type), base=2) # changing the baseline category contrasts(type) contrasts(type) <- 'contr.helmert' contrasts(type) contrasts(type) <- 'contr.sum' # Helmert contrasts # "deviation" contrasts contrasts(type) contrasts(type) <- NULL # back to default 'cont.helmert' returns Helmert contrasts, which contrast the second level with the first, the third with the average of the first two, and so on. 'contr.poly' returns contrasts based on orthogonal polynomials. 'contr.sum' uses "sum to zero contrasts". 'contr.treatment' contrasts each level with the baseline level (specified by 'base'): the baseline level is omitted. Note that this does not produce "contrasts" as defined in the standard theory for linear models as they are not orthogonal to the constant. type.ord <- ordered(type, levels=c("bc", "wc", "prof")) # ordered factor type.ord round(contrasts(type.ord), 3) # orthogonal polynomial contrasts prestige.mod.1 <- lm(prestige ~ income + education + type) summary(prestige.mod.1) anova(prestige.mod.1) # sequential ("type-I") sums of squares prestige.mod.0 <- lm(prestige ~ income + education) summary(prestige.mod.0) anova(prestige.mod.0, prestige.mod.1) # incremental F-test Anova(prestige.mod.1) # "type-II" sums of squares prestige.mod.2 <- lm(prestige ~ income + education + type - 1) # suppressing the constant summary(prestige.mod.2) Anova(prestige.mod.2) # note: test for type is that all intercepts are 0 (not sensible) prestige.mod.3 <- update(prestige.mod.1, .~. + income:type + education:type) summary(prestige.mod.3) Anova(prestige.mod.3) # adding interactions to the model lm(prestige ~ income*type + education*type) # equivalent specification lm(prestige ~ type + (income+education) %in% type,data=Prestige) # nesting lm(prestige ~ type + (income+education) %in% type – 1,data=Prestige) # separate intercepts and slopes detach(Prestige.2) # Anova Models data(Moore) attach(Moore) Moore The 'Moore' data frame has 45 rows and 4 columns. The data are for subjects in a social-psychological experiment, who were faced with manipulated disagreement from a partner of either of low or high status. The subjects could either conform to the partner's judgment or stick with their own judgment. fcategory <- factor(fcategory, levels=c("low","medium","high")) #reorder levels fcategory means <- tapply(conformity, list(fcategory, partner.status), mean) means # table of means tapply(conformity, list(fcategory, partner.status), sd) # standard deviations tapply(conformity, list(fcategory, partner.status), length) # counts Fcat <- as.numeric(fcategory) # plot the means #USEFUL GRAPH plot(c(0.5, 3.5), range(conformity), xlab="F category", ylab="Conformity", type="n", axes=F) axis(1, at=1:3, labels=c("low", "medium", "high")) axis(2) box() points(jitter(Fcat[partner.status=="low"]), conformity[partner.status=="low"], pch="L") points(jitter(Fcat[partner.status=="high"]), conformity[partner.status=="high"], pch="H") lines(1:3, means[,1], lty=1, lwd=3, type="b", pch=19, cex=2) lines(1:3, means[,2], lty=3, lwd=3, type="b", pch=1, cex=2) legend(locator(1), c("high","low"), lty=c(1,3), lwd=c(3,3), pch=c(19,1)) #jitter: to add a little bit of noise #lty: line type, 1 solid, 3 dotted. #lwd: The line width. #pch= Either an integer specifying a symbol or a single character # to be used as the default in plotting points. #Type: the default plot type desired (“b” for overplotted lines) identify(Fcat, conformity) #press ESC to finish # identify unusual points options(contrasts=c("contr.sum", "contr.poly")) # contr.sum = deviation contrasts moore.mod <- lm(conformity ~ fcategory*partner.status) summary(moore.mod) Anova(moore.mod) # type II sums of squares Anova(moore.mod, type="III") # type III sums of squares # more on lm args(lm) # Subset: observation selection lm(prestige ~ income + education, data=Prestige, subset=-c(6, 16)) lm(conformity ~ partner.status*fcategory, # specifying contrasts contrasts=list(partner.status=contr.sum, fcategory=contr.poly)) lm(100*conformity/40 ~ partner.status*fcategory, data=Moore,subset=-c(6,16)) # data argument # note computation of y # Generalized Linear Models (GLM) # binary logit model data(Mroz) attach(Mroz) Mroz[sort(sample(753,10)),] The 'Mroz' data frame has 753 rows and 8 columns. The observations, from the Panel Study of Income Dynamics (PSID), are married women. attach(Mroz) mod.mroz <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc, data=Mroz,fam=binomial) summary(mod.mroz) anova(update(mod.mroz, . ~ . - k5), mod.mroz, test="Chisq") # likelihood-ratio test Anova(mod.mroz) # analysis-of-deviance table detach(Mroz) # Poisson regression data(Ornstein) Ornstein[sort(sample(248,10)),] attach(Ornstein) tab <- table(interlocks) tab x <- sort(unique(interlocks)) par(mfrow=c(1,1)) plot(x, tab, type='h', xlab='Number of Interlocks', ylab='Frequency') points(x, tab, pch=16) mod.ornstein <- glm(interlocks ~ assets + nation + sector, family=poisson) summary(mod.ornstein) Anova(mod.ornstein) detach(Ornstein) Assingment: Fit a linear model for the climate dataset, Having as response rain, and as regressors, lon, lat, and elev. Assingment: Fit a linear model to the Prestige dataset. Eliminate the observations with missing values using na.omit. Response: prestige, and predictors: income and education. Compare models with a without the variable type using an incremental F-test (using anova). Is Type categorical? Choose a contrast. Update the model to add second order interactions with the variable type. Refit but without using observations #6 and #16 (use subset command). anova gives type I Anova gives by default type II (but one can get type III)