Lab 9

Stat401E
Fall 2010
1. Undoubtedly someone who wins a beauty pageant must be
someone who smiles a lot. In fact, your research has found a
correlation of .14 between the amount of time the 121 contestants in
the last Miss Universe competition smiled and the "beauty scores" they
were given by the judges. You have calculated the following statistics
as well:
                                           mean    standard deviation
Smiling behavior (in smiles per minute)     10             4
Beauty score (from 0 to 100)                55            20
a. Given your findings, what beauty score were the judges giving to
contestants who smiled 15 times per minute?
b. Find a 95% confidence interval for the score found in part a.
c. Using the .05 significance level, test the hypothesis that
smiling enhanced contestants' beauty scores at the contest.
d. If b = 1, what would one's power be in a test of the following
hypotheses:
Ho: b = 0
HA: b > 0
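The summary statistics in problem 1 are enough to recover a fitted line. The sketch below assumes that beauty score is regressed on smiling rate and relies on two standard identities for simple OLS regression: slope = r * (sd of Y / sd of X) and intercept = mean of Y - slope * mean of X. The variable names are illustrative only.

```r
# Summary statistics given in problem 1.
r  <- 0.14
n  <- 121
mx <- 10; sx <- 4     # smiling behavior (smiles per minute)
my <- 55; sy <- 20    # beauty score (0 to 100)

b     <- r * sy / sx      # slope: 0.7 points per additional smile per minute
a     <- my - b * mx      # intercept: 48
y_hat <- a + b * 15       # predicted score at 15 smiles per minute: 58.5

# t statistic for the slope (with one predictor it equals the t statistic for r).
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_crit <- qt(0.95, df = n - 2)   # one-tailed critical value at the .05 level

print(c(b = b, a = a, y_hat = y_hat, t = t_stat, crit = t_crit))
```

Comparing t_stat with t_crit gives the decision for the one-tailed test in part c.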
2. Using the 1984 National Opinion Research Center data (i.e., GSS84 or
lab5data), determine whether people are poorer if they get married at a
young age.
a. Using these data, have SPSS, R, or SAS calculate the regression
equation that describes the number of dollars that respondents'
incomes (rincome) increase for each additional year-of-age at which
they were first married (agewed). State in words what the
regression equation says about the relation between respondents'
incomes and their age when first married. (I.e., state the meanings
of â and b̂ in words.)
b. Make a box plot for the relation estimated in part a. Draw the
OLS regression line on this box plot. Do "rincome" and "agewed"
meet the assumptions of linear regression? (Explain your answer.)
c. Find means and standard deviations for "rincome" and "agewed".
Also find the mean of "newvar", a variable that takes values equal
to the product of multiplying "rincome" by "agewed".
Using these means and standard deviations (plus the
sample size), calculate the following 15 numbers from your
output: estimates, standard errors, and t-values for the
unstandardized slope and constant, R-squared, F, plus
degrees of freedom, sums of squares, and mean squares for
regression and error. Also calculate the standardized
slope. Explain how p-values associated with the 2 t-values
and the F were arrived at.
d. Imagine that you are 23 years old and that you are contemplating
getting married before your next birthday. However, your lifestyle
is so important to you that the wedding will only happen if you can
be 95% confident that your annual salary will be at least $15,000
(in 1984 inflation-adjusted dollars). Based on your findings in part
a, would you cancel the wedding?
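Part c asks for the regression output to be rebuilt from two means, two standard deviations, the mean of the product, and the sample size. The following is a minimal sketch of that arithmetic, not the required solution. It uses simulated data (hypothetical values, not the GSS file) so that the hand calculations can be checked against lm(). The names x, y, and the simulation parameters are assumptions made for illustration.

```r
# Simulated stand-ins for agewed (x) and rincome (y); NOT the GSS data.
set.seed(1)
n <- 200
x <- sample(17:27, n, replace = TRUE)
y <- 5000 + 400 * x + rnorm(n, sd = 6000)

# Inputs allowed in part c: two means, two SDs, the mean of x*y, and n.
mx <- mean(x); my <- mean(y)
sx <- sd(x);   sy <- sd(y)
mxy <- mean(x * y)

# Sample covariance recovered from the mean of the product ("newvar").
sxy <- (n / (n - 1)) * (mxy - mx * my)
sxx <- (n - 1) * sx^2          # sum of squared deviations of x

b    <- sxy / sx^2             # unstandardized slope
a    <- my - b * mx            # constant
beta <- b * sx / sy            # standardized slope (equals r when k = 1)

ss_tot <- (n - 1) * sy^2
ss_reg <- b^2 * sxx
ss_err <- ss_tot - ss_reg
df_reg <- 1
df_err <- n - 2
ms_reg <- ss_reg / df_reg
ms_err <- ss_err / df_err
r2 <- ss_reg / ss_tot
f  <- ms_reg / ms_err

se_b <- sqrt(ms_err / sxx)
se_a <- sqrt(ms_err * (1 / n + mx^2 / sxx))
t_b  <- b / se_b
t_a  <- a / se_a

# p-values: two-tailed areas under t(n - 2); upper-tail area under F(1, n - 2).
p_b <- 2 * pt(abs(t_b), df_err, lower.tail = FALSE)
p_a <- 2 * pt(abs(t_a), df_err, lower.tail = FALSE)
p_f <- pf(f, df_reg, df_err, lower.tail = FALSE)

# Cross-check the hand calculations against R's own fit.
fit <- summary(lm(y ~ x))
print(rbind(by_hand = c(a, b, se_a, se_b), lm = c(fit$coefficients[, 1:2])))
```

The factor n / (n - 1) is needed because sd() divides by n - 1 while mean(x * y) divides by n.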
Some help:
recode agewed(12 thru 16=17)(25=26)(27 thru 50=27).
recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
(7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).
select if ((rincome ne 99) and (agewed ne 99)).
compute newvar = rincome * agewed.
regression vars=agewed,rincome/dependent=rincome/enter.
examine vars=rincome by agewed/plot=boxplot/statistics=none/nototal.
frequencies vars=agewed,rincome/statistics=mean,stddev.
frequencies vars=newvar/statistics=mean.
NOTE: The "select if" statement in the program ensures that people with
data missing on either "rincome" or "agewed" are eliminated from your
calculations. Because in a regression SPSS excludes all cases for
which data are missing on any variable in the regression model, the
means and standard deviations (requested in part c) must be computed
based on cases for which data exist on both variables.
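The point of the NOTE can be seen on a toy example. The sketch below uses a small made-up data frame (hypothetical values, with 99 as the missing-data code as in the lab) and compares the mean of "agewed" over all cases with a valid "agewed" to the mean over cases valid on both variables. Only the second matches what a regression would use.

```r
# Hypothetical toy data; 99 marks missing data, as in the lab.
d <- data.frame(rincome = c(500, 2000, 99, 9000, 12500),
                agewed  = c(18,  99,   21, 24,   26))

# Keep only cases with data on BOTH variables (what "select if" does).
keep <- d$rincome != 99 & d$agewed != 99
both <- d[keep, ]

n_both        <- nrow(both)                       # 3 cases enter the regression
mean_any      <- mean(d$agewed[d$agewed != 99])   # 22.25, uses 4 cases
mean_complete <- mean(both$agewed)                # 68/3, uses the same 3 cases as the regression

print(c(n = n_both, mean_any = mean_any, mean_complete = mean_complete))
```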
3. Show algebraically that if $Z_X$ and $Z_Y$ are standardized, then
$\hat{Z}_Y = r_{XY} Z_X$ and $\hat{Z}_X = r_{XY} Z_Y$, where $r_{XY}$ is the
correlation coefficient between X and Y. (Hints: To show that
$\hat{Z}_Y = r_{XY} Z_X$, start with the formulas for $\hat{a}$ (Lecture
Notes, p. 133) and for the relation between $\hat{b}$ and $r_{XY}$ (p. 144).
Substitute the formulas for $\hat{a}$ and $\hat{b}$ into the equation
$\hat{Z}_Y = \hat{a} + \hat{b} Z_X$. Since for standardized variables the
mean equals zero and the variance equals one, the equation will simplify to
$\hat{Z}_Y = r_{XY} Z_X$. Repeat this for $\hat{Z}_X = r_{XY} Z_Y$. No, this
problem is not as difficult as you might expect.)
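A numerical check can confirm the result of problem 3 before or after working through the algebra. It is not a substitute for the proof. The sketch below simulates two variables (hypothetical values), standardizes them with scale(), and regresses each standardized variable on the other. Under the claim in problem 3, both intercepts should be zero and both slopes should equal the correlation.

```r
# Simulated data for illustration only.
set.seed(3)
x <- rnorm(50, mean = 10, sd = 4)
y <- 55 + 0.7 * (x - 10) + rnorm(50, sd = 20)

# Standardize: subtract the mean and divide by the standard deviation.
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))

coef_yx <- coef(lm(zy ~ zx))   # intercept and slope for predicting Z_Y from Z_X
coef_xy <- coef(lm(zx ~ zy))   # intercept and slope for predicting Z_X from Z_Y
r_xy    <- cor(x, y)

print(rbind(zy_on_zx = coef_yx, zx_on_zy = coef_xy))
print(r_xy)
```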
Below please find R and SAS code for problem 2:
# R
# Directions:
#   Copy the R code below into the "R Editor" window (accessed
#   by selecting "New script" under the "File" pull-down menu),
#   highlight the code, and press F5.
# Code:
# read lab5data.txt into "gss"
gss<-read.table('http://www.public.iastate.edu/~carlos/401/labs/lab5data.txt')
# read gss into gssnew without missing data codes for rincome
#   (var2=13 [refused] or 99) and agewed (var8=99)
gssnew<-gss[gss[,2]!=13 & gss[,2]!=99 & gss[,8]!=99,]
# assign new values to rincome so that the data are in dollar units
gssnew[gssnew[,2]==1,2]=500
gssnew[gssnew[,2]==2,2]=2000
gssnew[gssnew[,2]==3,2]=3500
gssnew[gssnew[,2]==4,2]=4500
gssnew[gssnew[,2]==5,2]=5500
gssnew[gssnew[,2]==6,2]=6500
gssnew[gssnew[,2]==7,2]=7500
gssnew[gssnew[,2]==8,2]=9000
gssnew[gssnew[,2]==9,2]=12500
gssnew[gssnew[,2]==10,2]=17500
gssnew[gssnew[,2]==11,2]=22500
gssnew[gssnew[,2]==12,2]=35000
# collapse values of agewed in gssnew
gssnew[gssnew[,8]>=12 & gssnew[,8]<=16,8]=17
gssnew[gssnew[,8]==25,8]=26
gssnew[gssnew[,8]>=27 & gssnew[,8]<=50,8]=27
# Results
# regress rincome on agewed
reg1<-lm(gssnew[,2]~gssnew[,8])
summary(reg1)
# generate anova table
anova(reg1)
# find the standardized slope (the correlation coefficient when k=1)
cor(gssnew[,2],gssnew[,8])
# obtain boxplot
boxplot(gssnew[,2]~gssnew[,8])
# get means and standard deviations for rincome and agewed
mean(gssnew[,2])
sd(gssnew[,2])
mean(gssnew[,8])
sd(gssnew[,8])
# generate newvar=rincome*agewed
newvar<-gssnew[,2]*gssnew[,8]
# get mean for newvar
mean(newvar)
# get the sample size (n), which the hand calculations in part c require
length(newvar)
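The R code above produces the box plot but does not draw the regression line on it (part b) or compute a bound for part d. The sketch below shows one way to do both, using simulated data in place of gssnew (hypothetical values, not the GSS file) and named columns so that predict() can accept new data. It reads part d as a one-sided 95% lower prediction bound for an individual, which is an interpretation and not something the lab states.

```r
# Simulated stand-in for gssnew with named columns; NOT the GSS data.
set.seed(2)
d <- data.frame(agewed = sample(c(17:24, 26, 27), 150, replace = TRUE))
d$rincome <- 4000 + 500 * d$agewed + rnorm(150, sd = 7000)

fit <- lm(rincome ~ agewed, data = d)

# Part b: by default boxplot() places boxes at 1, 2, ..., k. Placing them at
# the actual agewed values (via "at") lets abline() draw the OLS line in the
# same units as the x-axis.
ages <- sort(unique(d$agewed))
boxplot(rincome ~ agewed, data = d, at = ages,
        xlim = range(ages) + c(-0.5, 0.5),
        xlab = "agewed", ylab = "rincome")
abline(fit)

# Part d: a one-sided 95% lower bound is the lower limit of a two-sided
# 90% prediction interval.
pi90 <- predict(fit, newdata = data.frame(agewed = 23),
                interval = "prediction", level = 0.90)
lower95 <- pi90[1, "lwr"]

# The same bound by hand, from the standard error of a predicted individual.
n   <- nrow(d)
sxx <- sum((d$agewed - mean(d$agewed))^2)
se_pred <- summary(fit)$sigma *
  sqrt(1 + 1 / n + (23 - mean(d$agewed))^2 / sxx)
by_hand <- pi90[1, "fit"] - qt(0.95, n - 2) * se_pred

print(c(predict = lower95, by_hand = by_hand))
```

With the lab's data, the same calls would use gssnew after giving its columns names. The bound would then be compared with $15,000.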
* SAS
* Directions:
* Copy lab5data.txt into the C-drive's root (i.e., into "C:/").
* Copy the below SAS code into the "Editor" window,
*   and press the button with the figure of a little guy running.
* Code:
* read lab5data.txt into "gss";
data gss;
infile 'C:\lab5data.txt';
input age rincome sex fear papres16 prestige educ agewed xnorcsiz;
run;
* copy "gss" into "gssnew" without missing data and with new values
*   assigned to new variables called income and agewed1;
data gssnew;
set gss;
if (rincome=1) then income=500;
if (rincome=2) then income=2000;
if (rincome=3) then income=3500;
if (rincome=4) then income=4500;
if (rincome=5) then income=5500;
if (rincome=6) then income=6500;
if (rincome=7) then income=7500;
if (rincome=8) then income=9000;
if (rincome=9) then income=12500;
if (rincome=10) then income=17500;
if (rincome=11) then income=22500;
if (rincome=12) then income=35000;
if (rincome=13) then delete;
if (rincome=99) then delete;
* collapse values of agewed yielding agewed1;
if agewed>=12 and agewed<=16 then agewed1=17;
else if agewed>=27 and agewed<=50 then agewed1=27;
else if agewed=25 then agewed1=26;
else if agewed=99 then delete;
else agewed1=agewed;
* create newvar;
newvar=income*agewed1;
run;
* Results
* regress rincome on agewed1;
proc reg data=gssnew;
model income=agewed1;
run;
* find the standardized slope (the correlation coefficient when k=1);
proc corr data=gssnew;
var income agewed1;
run;
* obtain boxplot;
proc boxplot data=gssnew;
plot income*agewed1/overlay=(mpred agewed1);
run;
* get means and standard deviations for rincome, agewed1, and newvar;
proc means data=gssnew;
var income agewed1 newvar;
run;