Stat401E Fall 2010
Lab 9

1. Undoubtedly, someone who wins a beauty pageant must be someone who smiles a lot. In fact, your research has found a correlation of .14 between the amount of time the 121 contestants in the last Miss Universe competition smiled and the "beauty scores" they were given by the judges. You have also calculated the following statistics:

                                            Mean   Standard deviation
Smiling behavior (in smiles per minute)      10            4
Beauty score (from 0 to 100)                 55           20

a. Given your findings, what beauty score were the judges giving to contestants who smiled 15 times per minute?
b. Find a 95% confidence interval for the score found in part a.
c. Using the .05 significance level, test the hypothesis that smiling enhanced contestants' beauty scores at the contest.
d. If b = 1, what would one's power be in a test of the hypotheses H0: b = 0 versus HA: b > 0?

2. Using the 1984 National Opinion Research Center data (i.e., GSS84 or lab5data), determine whether people are poorer if they get married at a young age.

a. Using these data, have SPSS, R, or SAS calculate the regression equation that describes the number of dollars that respondents' incomes (rincome) increase for each additional year of age at which they were first married (agewed). State in words what the regression equation says about the relation between respondents' incomes and their age when first married. (I.e., state the meanings of $\hat{a}$ and $\hat{b}$ in words.)
b. Make a box plot for the relation estimated in part a. Draw the OLS regression line on this box plot. Do "rincome" and "agewed" meet the assumptions of linear regression? (Explain your answer.)
c. Find means and standard deviations for "rincome" and "agewed". Also find the mean of "newvar", a variable whose values equal "rincome" multiplied by "agewed".
Using these means and standard deviations (plus the sample size), calculate the following 15 numbers from your output: estimates, standard errors, and t-values for the unstandardized slope and constant, R-squared, F, plus degrees of freedom, sums of squares, and mean squares for regression and error. Also calculate the standardized slope. Explain how the p-values associated with the two t-values and the F were arrived at.
d. Imagine that you are 23 years old and that you are contemplating getting married before your next birthday. However, your lifestyle is so important to you that the wedding will only happen if you can be 95% confident that your annual salary will be at least $15,000 (in 1984 inflation-adjusted dollars). Based on your findings in part a, would you cancel the wedding?

Some help:

recode agewed (12 thru 16=17)(25=26)(27 thru 50=27).
recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
  (7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).
select if ((rincome ne 99) and (agewed ne 99)).
compute newvar = rincome * agewed.
regression vars=agewed,rincome/dependent=rincome/enter.
examine vars=rincome by agewed/plot=boxplot/statistics=none/nototal.
frequencies vars=agewed,rincome/statistics=mean,stddev.
frequencies vars=newvar/statistics=mean.

NOTE: The "select if" statement in the program ensures that people with data missing on either "rincome" or "agewed" are eliminated from your calculations. Because in a regression SPSS excludes all cases for which data are missing on any variable in the regression model, the means and standard deviations (requested in part c) must be computed based on cases for which data exist on both variables.

3. Show algebraically that if $Z_X$ and $Z_Y$ are standardized, then $\hat{Z}_Y = r_{XY} Z_X$ and $\hat{Z}_X = r_{XY} Z_Y$, where $r_{XY}$ is the correlation coefficient between X and Y. (Hints: To show that $\hat{Z}_Y = r_{XY} Z_X$, start with the formulas for $\hat{a}$ (Lecture Notes, p. 133) and for the relation between $\hat{b}$ and $r_{XY}$ (p. 144). Substitute the formulas for $\hat{a}$ and $\hat{b}$ into the equation $\hat{Z}_Y = \hat{a} + \hat{b} Z_X$. Since for standardized variables the mean equals zero and the variance equals one, the equation will simplify to $\hat{Z}_Y = r_{XY} Z_X$. Repeat this for $\hat{Z}_X = r_{XY} Z_Y$. No, this problem is not as difficult as you might expect.)

Below please find R and SAS code for problem 2:

#
# R
#
# Directions: Copy the R code below into the "R Editor" window (opened by
# selecting "New script" under the "File" pull-down menu), select (highlight)
# the code, and press F5.
#
# Code:

# read lab5data.txt into "gss"
gss<-read.table('http://www.public.iastate.edu/~carlos/401/labs/lab5data.txt')

# read gss into gssnew without missing data codes for rincome
# (var2=13 [refused] or 99) and agewed (var8=99)
gssnew<-gss[gss[,2]!=13 & gss[,2]!=99 & gss[,8]!=99,]

# assign new values to rincome so that the data are in dollar units
gssnew[gssnew[,2]==1,2]=500
gssnew[gssnew[,2]==2,2]=2000
gssnew[gssnew[,2]==3,2]=3500
gssnew[gssnew[,2]==4,2]=4500
gssnew[gssnew[,2]==5,2]=5500
gssnew[gssnew[,2]==6,2]=6500
gssnew[gssnew[,2]==7,2]=7500
gssnew[gssnew[,2]==8,2]=9000
gssnew[gssnew[,2]==9,2]=12500
gssnew[gssnew[,2]==10,2]=17500
gssnew[gssnew[,2]==11,2]=22500
gssnew[gssnew[,2]==12,2]=35000

# collapse values of agewed in gssnew
gssnew[gssnew[,8]>=12 & gssnew[,8]<=16,8]=17
gssnew[gssnew[,8]==25,8]=26
gssnew[gssnew[,8]>=27 & gssnew[,8]<=50,8]=27

# Results
# regress rincome on agewed
reg1<-lm(gssnew[,2]~gssnew[,8])
summary(reg1)
# generate anova table
anova(reg1)
# find the standardized slope (the correlation coefficient when k=1)
cor(gssnew[,2],gssnew[,8])
# obtain boxplot; at= draws each box at its actual agewed value (instead of
# at 1, 2, 3, ...) so that the OLS line from reg1 lands on the same axis
boxplot(gssnew[,2]~gssnew[,8], at=sort(unique(gssnew[,8])))
abline(reg1)
# get means and standard deviations for rincome and agewed
mean(gssnew[,2])
sd(gssnew[,2])
mean(gssnew[,8])
sd(gssnew[,8])
# generate newvar=rincome*agewed
newvar<-gssnew[,2]*gssnew[,8]
# get mean for newvar
mean(newvar)
# sample size (number of cases used in the regression)
length(newvar)

* SAS;
* Directions: Copy lab5data.txt into the root of the C drive (i.e., into C:\);
* Copy the SAS code below into the "Editor" window
  and press the button with the figure of a little guy running;
* Code;

* read lab5data.txt into "gss";
data gss;
  infile 'C:\lab5data.txt';
  input age rincome sex fear papres16 prestige educ agewed xnorcsiz;
run;

* copy "gss" into "gssnew" without missing data and with new values
  assigned to new variables called income and agewed1;
data gssnew;
  set gss;
  if (rincome=1) then income=500;
  if (rincome=2) then income=2000;
  if (rincome=3) then income=3500;
  if (rincome=4) then income=4500;
  if (rincome=5) then income=5500;
  if (rincome=6) then income=6500;
  if (rincome=7) then income=7500;
  if (rincome=8) then income=9000;
  if (rincome=9) then income=12500;
  if (rincome=10) then income=17500;
  if (rincome=11) then income=22500;
  if (rincome=12) then income=35000;
  if (rincome=13) then delete;
  if (rincome=99) then delete;
  * collapse values of agewed yielding agewed1;
  if agewed>=12 and agewed<=16 then agewed1=17;
  else if agewed>=27 and agewed<=50 then agewed1=27;
  else if agewed=25 then agewed1=26;
  else if agewed=99 then delete;
  else agewed1=agewed;
  * create newvar;
  newvar=income*agewed1;
run;

* Results;
* regress income (the recoded rincome) on agewed1 and save the OLS
  predicted values as mpred so that the line can be drawn on the box plot;
proc reg data=gssnew;
  model income=agewed1;
  output out=gsspred p=mpred;
run;
quit;

* find the standardized slope (the correlation coefficient when k=1);
proc corr data=gssnew;
  var income agewed1;
run;

* obtain boxplot with the OLS predicted values overlaid
  (PROC BOXPLOT expects the data sorted by the group variable);
proc sort data=gsspred;
  by agewed1;
run;
proc boxplot data=gsspred;
  plot income*agewed1 / overlay=(mpred);
run;

* get means and standard deviations for income, agewed1, and newvar;
proc means data=gssnew;
  var income agewed1 newvar;
run;
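For problem 1, parts a and b, the slope and intercept can be recovered from the summary table alone. The slope is b̂ = r·s_Y/s_X and the intercept is â = Ȳ - b̂·X̄. The Python sketch below illustrates that arithmetic; it is not an official solution. It rests on three assumptions:

- The standard deviations in the table are sample (n - 1) standard deviations.
- "The score found in part a" means the mean score of contestants who smile 15 times per minute, so the interval is a confidence interval for a conditional mean.
- The two-sided critical value t(.975, 119 df), about 1.980, is taken from a t table rather than computed.

```python
import math

# Summary statistics from problem 1 (X = smiles per minute, Y = beauty score).
n = 121
r = 0.14
mean_x, sd_x = 10.0, 4.0
mean_y, sd_y = 55.0, 20.0

# OLS slope and intercept computed from summary statistics.
b_hat = r * sd_y / sd_x
a_hat = mean_y - b_hat * mean_x

# Part a: predicted (mean) beauty score at 15 smiles per minute.
x0 = 15.0
y_hat = a_hat + b_hat * x0

# Part b: 95% CI for the conditional mean of Y at x0.
# Assumes sd_x and sd_y are n - 1 sample standard deviations.
ss_x = (n - 1) * sd_x ** 2                  # sum of squared deviations of X
sse = (1 - r ** 2) * (n - 1) * sd_y ** 2    # error sum of squares
mse = sse / (n - 2)                         # error mean square
se_mean = math.sqrt(mse * (1 / n + (x0 - mean_x) ** 2 / ss_x))
t_crit = 1.980  # t(.975, df = 119) read from a t table (assumption)
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)

print(f"b = {b_hat:.3f}, a = {a_hat:.3f}, predicted score = {y_hat:.2f}")
print(f"95% CI for the mean score: ({ci[0]:.2f}, {ci[1]:.2f})")
```

A prediction interval for a single contestant would add 1 inside the square root in `se_mean`.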
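Parts c and d of problem 1 can be sketched the same way. The test statistic is t = b̂ / SE(b̂), with SE(b̂) = sqrt(MSE / Σ(X - X̄)²). The power calculation below uses a normal approximation to the sampling distribution of b̂ and treats SE(b̂) as known, so it may differ slightly from the approach in the lecture notes. The one-tailed critical value t(.95, 119 df), about 1.658, is taken from a t table and is an assumption of this sketch.

```python
import math
from statistics import NormalDist

# Summary statistics from problem 1.
n, r = 121, 0.14
sd_x, sd_y = 4.0, 20.0

b_hat = r * sd_y / sd_x
ss_x = (n - 1) * sd_x ** 2
mse = (1 - r ** 2) * (n - 1) * sd_y ** 2 / (n - 2)
se_b = math.sqrt(mse / ss_x)

# Part c: one-tailed test of H0: b = 0 against HA: b > 0 at alpha = .05.
t_stat = b_hat / se_b
t_crit = 1.658  # t(.95, df = 119) read from a t table (assumption)
reject = t_stat > t_crit

# Part d: power if the true slope is b = 1 (normal approximation).
b_true = 1.0
b_cutoff = t_crit * se_b  # smallest estimated slope that rejects H0
power = 1 - NormalDist().cdf((b_cutoff - b_true) / se_b)

print(f"SE(b) = {se_b:.4f}, t = {t_stat:.3f}, reject H0: {reject}")
print(f"approximate power when b = 1: {power:.3f}")
```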
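Problem 2(c) asks you to rebuild the regression output from the means, standard deviations, sample size, and mean of newvar. The key step is that the mean of X·Y gives the covariance: cov(X, Y) = n/(n - 1)·(mean(XY) - X̄·Ȳ). Everything else follows from the bivariate OLS formulas.

The Python function below is a hypothetical helper written for illustration. It assumes the standard deviations are sample (n - 1) values, as reported by SPSS FREQUENCIES, R's `sd()`, and SAS PROC MEANS. The small dataset at the bottom is made up and is used only to exercise the formulas; it is not the GSS data.

```python
import math
from statistics import mean, stdev

def regression_from_summaries(n, mean_x, sd_x, mean_y, sd_y, mean_xy):
    """Bivariate OLS statistics from summary statistics (hypothetical helper).

    Assumes sd_x and sd_y are sample (n - 1) standard deviations.
    """
    # Sample covariance recovered from the mean of the product x*y.
    cov_xy = n / (n - 1) * (mean_xy - mean_x * mean_y)
    b = cov_xy / sd_x ** 2
    a = mean_y - b * mean_x
    r = cov_xy / (sd_x * sd_y)  # standardized slope when k = 1

    ss_x = (n - 1) * sd_x ** 2
    ss_total = (n - 1) * sd_y ** 2
    ss_reg = b ** 2 * ss_x
    ss_err = ss_total - ss_reg
    df_reg, df_err = 1, n - 2
    ms_reg, ms_err = ss_reg / df_reg, ss_err / df_err

    se_b = math.sqrt(ms_err / ss_x)
    se_a = math.sqrt(ms_err * (1 / n + mean_x ** 2 / ss_x))
    return {
        "a": a, "se_a": se_a, "t_a": a / se_a,
        "b": b, "se_b": se_b, "t_b": b / se_b,
        "r_squared": ss_reg / ss_total, "F": ms_reg / ms_err,
        "df_reg": df_reg, "df_err": df_err,
        "ss_reg": ss_reg, "ss_err": ss_err,
        "ms_reg": ms_reg, "ms_err": ms_err,
        "standardized_slope": r,
    }

# Made-up toy data (NOT the GSS data), used only to check the formulas.
x = [17, 19, 20, 22, 24, 26, 27]
y = [3500, 5500, 4500, 9000, 7500, 12500, 9000]
stats = regression_from_summaries(
    len(x), mean(x), stdev(x), mean(y), stdev(y),
    mean([xi * yi for xi, yi in zip(x, y)]),
)
print(stats["b"], stats["a"], stats["F"], stats["t_b"] ** 2)
```

With one predictor, F equals the squared t-value of the slope. This is why the p-value for F matches the two-tailed p-value for the slope's t: the first comes from an F distribution with 1 and n - 2 degrees of freedom, the second from a t distribution with n - 2 degrees of freedom.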
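Problem 2(d) can be read as asking for a one-sided 95% lower prediction bound for a single new respondent with agewed = 23. That differs from a confidence bound for the mean income at that age, because your own salary is one individual case.

Under that assumption, the Python sketch below computes the bound. The function name and parameters are hypothetical. The one-sided critical value t(.95, n - 2 df) must be looked up in a t table and passed in; it is not computed here.

```python
import math

def lower_prediction_bound(a, b, ms_err, n, mean_x, ss_x, x0, t_crit):
    """One-sided lower prediction bound for a single new Y at X = x0.

    ss_x is the sum of squared deviations of X, i.e. (n - 1) * sd_x**2.
    t_crit is the .95 quantile of t with n - 2 df (looked up, not computed).
    """
    y_hat = a + b * x0
    # Prediction (not confidence) standard error: the leading 1 adds the
    # scatter of an individual case around the regression line.
    se_pred = math.sqrt(ms_err * (1 + 1 / n + (x0 - mean_x) ** 2 / ss_x))
    return y_hat - t_crit * se_pred
```

Fill in the arguments from your part a and part c output:

- `a` and `b`: $\hat{a}$ and $\hat{b}$.
- `ms_err`: the error mean square.
- `n`: the sample size.
- `mean_x`: the mean of agewed.
- `ss_x`: (n - 1) times the variance of agewed.
- `x0`: 23.
- `t_crit`: the one-sided critical value.

Under this reading, the wedding goes ahead only if the returned bound is at least 15000.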
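For problem 3, the substitution described in the hint can be written out as below. This sketch assumes the lecture-note formulas are the standard ones, $\hat{a} = \bar{Y} - \hat{b}\bar{X}$ and $\hat{b} = r_{XY}\, s_Y / s_X$, applied with $Z_Y$ in place of Y and $Z_X$ in place of X. It also uses the fact that standardized variables have mean 0 and standard deviation 1, and that standardizing does not change the correlation.

```latex
\begin{aligned}
\hat{b} &= r_{XY}\,\frac{s_{Z_Y}}{s_{Z_X}} = r_{XY}\cdot\frac{1}{1} = r_{XY},\\
\hat{a} &= \bar{Z}_Y - \hat{b}\,\bar{Z}_X = 0 - r_{XY}\cdot 0 = 0,\\
\hat{Z}_Y &= \hat{a} + \hat{b}\,Z_X = r_{XY}\,Z_X .
\end{aligned}
```

Regressing $Z_X$ on $Z_Y$ swaps the roles of the two variables. Since $r_{YX} = r_{XY}$, the same steps give $\hat{Z}_X = r_{XY} Z_Y$.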