Stat404 Fall 2009
Lab 8

This lab will make use of output from the following program:

    data list list file='c:time.txt' / length content year.
    compute d1=content.
    recode d1(1=1)(2,3,4=0).
    compute d2=content.
    recode d2(2=1)(1,3,4=0).
    compute d3=content.
    recode d3(3=1)(1,2,4=0).
    compute e1=content.
    recode e1(1=1)(2,3=0)(4=-19).
    compute e2=content.
    recode e2(2=1)(1,3=0)(4=-11).
    compute e3=content.
    recode e3(3=1)(1,2=0)(4=-6).
    frequencies vars=content.
    frequencies vars=d1,d2,d3 / statistics=mean.
    regression vars=length,d1,d2,d3 / dep=length / enter.
    frequencies vars=e1,e2,e3 / statistics=mean.
    regression vars=length,e1,e2,e3 / dep=length / enter.

1. In the file, 'c:time.txt', you will find data on the following three variables:

    LENGTH  = the length (in number of paragraphs) of an article in Time magazine
    CONTENT = the primary type of content of an article in Time magazine
              (values are 1 = political, 2 = economic, 3 = cultural, 4 = other)
    YEAR    = the year in which an article in Time magazine was written
              (values range from 1976 to 1995)

   a. Create three dummy variables for CONTENT. You may wish to name these D1, D2, and D3. Regress LENGTH on D1, D2, and D3.

   b. Transferring numbers from your output, write down the unstandardized regression equation.

   c. Based on this regression equation, interpret the constant and three slopes in words. (Hint: With dummy variables this interpretation should NOT make use of the "adjust" terminology that we have used thus far.)

   d. Verify that the dummy variables are NOT contrasts.

2. Reanalyze the data in 'c:time.txt' as follows:

   a. Create three weighted effect variables for CONTENT. You may wish to name these E1, E2, and E3. Regress LENGTH on E1, E2, and E3.

   b. Transferring numbers from your output, write down the unstandardized regression equation.

   c. Based on this regression equation, interpret the constant and three slopes in words. (Hint: As with dummy variables, you should NOT make use of "adjust" terminology in this interpretation.)

   d. Verify that the weighted effect variables ARE contrasts.

   e. In the program, the constants -19, -11, and -6 are used in creating the effect variables. Show how these constants were calculated.

   f. On the computer output, slopes for the effect variables represent the "effects" of three of the four types of article content. What is the effect of the fourth type of article content? Also demonstrate that the four effects sum to zero as indicated in equation 1.

Below please find R and SAS code for these problems:

    # R code
    time <- read.table("C:/time.txt")
    colnames(time) <- c("length","content","year")

    # Start every coded variable at 0 (columns 4-9 of the data frame)
    time$d1 <- 0
    time$d2 <- 0
    time$d3 <- 0
    time$e1 <- 0
    time$e2 <- 0
    time$e3 <- 0

    # Recode d1, d2, d3 (dummy variables: columns 4-6)
    time[time$content == 1, 4] <- 1
    time[time$content == 2, 5] <- 1
    time[time$content == 3, 6] <- 1

    # Recode e1, e2, e3 (weighted effect variables: columns 7-9)
    time[time$content == 1, 7] <- 1
    time[time$content == 4, 7] <- -19
    time[time$content == 2, 8] <- 1
    time[time$content == 4, 8] <- -11
    time[time$content == 3, 9] <- 1
    time[time$content == 4, 9] <- -6

    # Freqs and means
    attach(time)
    table(content)
    table(d1)
    table(d2)
    table(d3)
    summary(time[,4:6])
    table(e1)
    table(e2)
    table(e3)
    summary(time[,7:9])

    # Regression
    reg1 <- lm(length ~ d1+d2+d3, data=time)
    summary(reg1)
    reg2 <- lm(length ~ e1+e2+e3, data=time)
    summary(reg2)

    * SAS code;
    DATA time;
      INFILE 'C:\time.txt';
      INPUT length content year;
      d1 = 0; d2 = 0; d3 = 0;
      e1 = 0; e2 = 0; e3 = 0;
      IF content = 1 THEN d1 = 1;
      IF content = 2 THEN d2 = 1;
      IF content = 3 THEN d3 = 1;
      IF content = 1 THEN e1 = 1;
      IF content = 4 THEN e1 = -19;
      IF content = 2 THEN e2 = 1;
      IF content = 4 THEN e2 = -11;
      IF content = 3 THEN e3 = 1;
      IF content = 4 THEN e3 = -6;
    RUN;

    PROC FREQ;
      TABLES content d1 d2 d3 e1 e2 e3;
    RUN;

    PROC MEANS;
      VAR d1 d2 d3 e1 e2 e3;
    RUN;

    PROC REG;
      MODEL length = d1 d2 d3;
    RUN;

    PROC REG;
      MODEL length = e1 e2 e3;
    RUN;
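
The coding schemes used in the programs above can also be sketched in a language-neutral way. The Python example below is only an illustration under stated assumptions: the category counts (8, 6, 4, 2) are hypothetical and are NOT taken from 'c:time.txt', the function names dummy_codes and weighted_effect_codes are made up for this sketch, and "contrast" is assumed to mean a variable whose values sum to zero over all cases (which is what the means requested in the programs would show). Under weighted effect coding, cases in the reference category receive minus the ratio of the coded category's count to the reference category's count.

```python
from collections import Counter

# Hypothetical content codes (1-4); NOT the Time magazine data.
# Counts: 8 cases of type 1, 6 of type 2, 4 of type 3, 2 of type 4.
content = [1] * 8 + [2] * 6 + [3] * 4 + [4] * 2
REFERENCE = 4  # the category that gets no variable of its own


def dummy_codes(values, reference):
    """One 0/1 column per non-reference category."""
    cats = sorted(set(values) - {reference})
    return {c: [1.0 if v == c else 0.0 for v in values] for c in cats}


def weighted_effect_codes(values, reference):
    """Like dummy coding, but reference cases get -n_c / n_reference."""
    counts = Counter(values)
    columns = {}
    for c in sorted(set(values) - {reference}):
        ref_code = -counts[c] / counts[reference]
        columns[c] = [
            1.0 if v == c else ref_code if v == reference else 0.0
            for v in values
        ]
    return columns


dummy = dummy_codes(content, REFERENCE)
effect = weighted_effect_codes(content, REFERENCE)

# A column that sums to zero has mean zero, i.e. it is a contrast
# under the assumption stated above.
for c in sorted(effect):
    print(f"category {c}: dummy sum = {sum(dummy[c])}, "
          f"effect sum = {sum(effect[c])}, "
          f"reference code = {min(effect[c])}")
```

With these made-up counts the reference codes come out as -4, -3, and -2, each effect column sums to zero, and each dummy column sums to its category count. The same check on the real data is what the frequencies and means in the programs above provide.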