Lab 8

Stat401E
Fall 2010
1. You wish to examine the relation between racial prejudice
and annual church visits. In a pilot study you developed a
prejudice measure that ranges from 0 = not prejudiced to 100 =
extremely prejudiced. You sample 12 people at random from among the
residents of Ames, Iowa, and your data on them are as follows:
Church Visits (X)   Racial Prejudice (Y)
11                  36
46                  33
3                   6
16                  42
41                  49
21                  51
23                  61
10                  23
34                  57
48                  18
28                  65
55                  3
a. Plot the data on graph paper (or use a spreadsheet program, such
as Excel, to plot them for you).
b. Fit the model $\hat{Y} = \hat{a}_1 + \hat{b}_1 X$ by calculating $\hat{a}_1$ and
$\hat{b}_1$ as discussed in class.
c. Plot the resulting regression line (or draw the line on your
plot).
d. Calculate $\hat{Y}_i$ for each $X_i$ and $Y_i - \hat{Y}_i$ for each $Y_i$.
e. Verify that $\sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0$.
f. Partition the total sum of squares into two parts: that due to
regression and that due to error.
g. Repeat steps "a" through "f", this time using the following
model:
$\hat{Y} = \hat{a}_2 + \hat{b}_2 (X - \bar{X})^2$
h. Express $\hat{a}_1$ and $\hat{b}_1$ in words.
i. Which model comes closest to describing the relation between
prejudice and church visits? How can you tell?
NOTE: You can save a lot of hand calculations by getting a computer to
do the work for you. (Hint: The best--and strongly recommended--method
for doing this problem is with a spreadsheet program such as Excel. If
you choose this method, be sure to include a printout of your
spreadsheet as part of your homework.)
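If you prefer a script to a spreadsheet, the calculations in parts b, d, e, f, and g can be sketched in Python with only the standard library. This is a minimal sketch, not the method discussed in class; it assumes the (X, Y) pairing shown in the table above, and the helper names `fit_simple` and `partition_ss` are hypothetical.

```python
# Problem 1 data, paired as (church visits X, racial prejudice Y).
X = [11, 46, 3, 16, 41, 21, 23, 10, 34, 48, 28, 55]
Y = [36, 33, 6, 42, 49, 51, 61, 23, 57, 18, 65, 3]


def fit_simple(x, y):
    """OLS fit of y = a + b*x; returns (a, b, fitted values, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-products / sum of squares of x (deviation form).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    resid = [yi - yh for yi, yh in zip(y, y_hat)]
    return a, b, y_hat, resid


def partition_ss(y, y_hat):
    """Split the total sum of squares into regression and error parts."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ss_total, ss_reg, ss_err


# Model 1 uses X itself; model 2 uses the squared deviation (X - X-bar)^2.
x_bar = sum(X) / len(X)
predictors = {
    "model 1": X,
    "model 2": [(xi - x_bar) ** 2 for xi in X],
}

results = {}
for name, x in predictors.items():
    a, b, y_hat, resid = fit_simple(x, Y)
    ss_total, ss_reg, ss_err = partition_ss(Y, y_hat)
    results[name] = (a, b, ss_reg / ss_total)
    print(f"{name}: a = {a:.4f}, b = {b:.4f}")
    print(f"  sum of residuals = {sum(resid):.2e}")  # zero up to rounding error
    print(f"  SS total {ss_total:.2f} = SS reg {ss_reg:.2f} + SS err {ss_err:.2f}")
    print(f"  R^2 = {ss_reg / ss_total:.4f}")
```

The same `fit_simple` function serves both models because model 2 is still a straight-line fit, only on the transformed predictor $(X - \bar{X})^2$.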
2. Return to problem 2 on Lab 7 in which a t-test was
performed for the difference between two means. Imagine
that the grouping variable is coded as a dummy variable, D
(0 = no relatives killed, 1 = relatives killed), and that
you are to perform the analysis as a regression.
a. Calculate the constant and slope for the regression of
Y = “the number of terrorist acts during 3 years” on this dummy
variable. (Hints: Use the two group means to compute the overall
mean, and use your knowledge about the dummy variable to obtain its
mean and its sum of squares. Thinking about what your data matrix
would look like should make it clear how you can get the value of
the sum of Y*D.)
b. Express the constant and slope (calculated in part a) in words.
c. What proportion of the variance in the number of terrorist acts
is explained by whether or not the terrorist had relatives killed in
the massacre? (Hints: If done correctly, the regression equation in
part a should show the OLS estimate for each level of the dummy
variable to equal the mean value of the dependent variable among
subjects within that level. Knowing this, you can use the two group
variances in computing the residual, or unexplained, sum of squares
(a.k.a. the error sum of squares). The explained, or regression,
sum of squares can be calculated given your knowledge of the overall
mean and the two OLS estimates--one for each level of the dummy
variable. As always, the total sum of squares equals the sum of the
unexplained plus explained sums of squares.)
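The hints in parts a and c can be checked with a short Python sketch that works only from group summary statistics. The function name `dummy_regression` is hypothetical, and the sketch assumes the Lab 7 group variances are sample variances (divisor n - 1); if they were computed with divisor n, replace `(n0 - 1) * var0 + (n1 - 1) * var1` with `n0 * var0 + n1 * var1`.

```python
def dummy_regression(n0, mean0, var0, n1, mean1, var1):
    """Regress Y on a 0/1 dummy D using only group summary statistics.

    Group 0 has D = 0 and group 1 has D = 1. var0 and var1 are assumed to be
    sample variances (divisor n - 1). Returns (constant, slope, R^2).
    """
    n = n0 + n1
    y_bar = (n0 * mean0 + n1 * mean1) / n   # overall mean of Y
    d_bar = n1 / n                          # mean of D = proportion coded 1
    # Sum of squares of D: sum(D^2) - n * d_bar^2, with sum(D^2) = n1 for a 0/1 dummy.
    ss_d = n1 - n * d_bar ** 2
    # Sum of Y*D is just the total of Y within group 1 (D = 0 rows contribute 0).
    sum_yd = n1 * mean1
    sp_yd = sum_yd - n * y_bar * d_bar      # cross-products in deviation form
    slope = sp_yd / ss_d                    # algebraically equals mean1 - mean0
    constant = y_bar - slope * d_bar        # algebraically equals mean0
    # Fitted values are the group means, so the error SS pools the within-group SS.
    ss_err = (n0 - 1) * var0 + (n1 - 1) * var1
    ss_reg = n0 * (mean0 - y_bar) ** 2 + n1 * (mean1 - y_bar) ** 2
    r_squared = ss_reg / (ss_reg + ss_err)
    return constant, slope, r_squared
```

Called with the Lab 7 group sizes, means, and variances, the function should return a constant equal to the mean of the D = 0 group and a slope equal to the difference between the two group means, which is what the hint in part c leads you to expect.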
3. You are interested in investigating whether more rapes occur in
states in which a lot of pornography is read than in states in which
little pornography is read. Although each of the 50 states in the U.S.
has a different method of recording instances of rape, you identify 24
states that have similar methods. You decide to use data from these 24
states to generalize to the population of all 50 United States. You
have obtained data on each state's annual sales of Playboy, Oui,
Playgirl, and Penthouse from the publishers of each of these magazines.
You enter your data on the following two variables into SPSS:
PORNPT  = the number of copies of a pornographic magazine (from the
          above four) sold annually in a state per 1,000 population
RAPESPM = the number of rapes reported annually in a state per
          1,000,000 population
To analyze these data you use the following SPSS commands:
compute pornrape = pornpt * rapespm.
frequencies variables = pornpt rapespm pornrape
  /statistics = mean variance.
Parts of your output look as follows:
Statistics
            N (Valid)     Mean    Variance
PORNPT         24        24.90      13.70
RAPESPM        24         4.90       2.80
PORNRAPE       24       123.29    4856.79
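The PORNRAPE variable is what lets you work from this output alone: its mean gives the sum of cross-products, since $\sum XY = n \cdot \overline{XY}$. A minimal Python sketch of that step follows, assuming SPSS's variances use the n - 1 divisor (its default). Because the reported means are rounded to two decimals, values computed this way carry some rounding error.

```python
import math

# SPSS summary statistics from the output above (n = 24 states).
n = 24
mean_x, var_x = 24.90, 13.70    # PORNPT
mean_y, var_y = 4.90, 2.80      # RAPESPM
mean_xy = 123.29                # PORNRAPE = PORNPT * RAPESPM

# Cross-products in deviation form: sum(xy) - n * x_bar * y_bar, where
# sum(xy) = n * mean_xy. Dividing by n - 1 gives the sample covariance,
# matching the n - 1 divisor used for the variances.
cov_xy = n / (n - 1) * (mean_xy - mean_x * mean_y)

r = cov_xy / math.sqrt(var_x * var_y)   # Pearson correlation
r_squared = r ** 2                      # proportion of variance explained
b = cov_xy / var_x                      # unstandardized slope
a = mean_y - b * mean_x                 # unstandardized constant
beta = b * math.sqrt(var_x / var_y)     # standardized slope (equals r here)

print(f"cov = {cov_xy:.4f}, r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"RAPESPM-hat = {a:.4f} + {b:.4f} * PORNPT")
print(f"z_RAPESPM-hat = {beta:.4f} * z_PORNPT")
```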
a. How much of the variance in the number of rapes is explained by
the number of pornographic magazines sold?
b. Give a 95% confidence interval for the correlation between PORNPT
and RAPESPM.
c. Give the unstandardized regression equation appropriate to your
research problem and say in words what each regression coefficient
means.
d. Give the standardized regression equation appropriate to your
research problem and say in words what each regression coefficient
means.
e. Based on your findings, how many rapes per 1,000,000 population
would you predict for Iowa, a state in which 30 copies of the above
four pornographic magazines are sold annually per 1,000 population?
f. In part c you were asked to find the unstandardized regression
equation in which RAPESPM was regressed on PORNPT. Recalculate the
equation twice more: First, find the unstandardized regression
equation if you were to change RAPESPM to be a measure of “the
number of rapes reported annually in a state per 50,000 population.”
Second, find the unstandardized regression equation if you were to
change PORNPT to be a measure of “the number of copies of a
pornographic magazine sold annually in a state per 50,000
population.” (Hints: You only need to convert the slope and constant
found in part c. Try drawing the regression line along two axes, and
then note how the regression equation would change if the units
along each axis were changed.)
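Parts b and f can also be sketched in Python. This assumes the Fisher r-to-z method for the confidence interval (standard error $1/\sqrt{n - 3}$), which may or may not be the method used in class, and treats each change of units in part f as rescaling one axis of the part c equation. `fisher_ci` and `rescale` are hypothetical helper names; r, a, and b are recomputed from the SPSS output so the block runs on its own.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z (assumed method)."""
    z = math.atanh(r)                              # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def rescale(a, b, y_factor=1.0, x_factor=1.0):
    """Constant and slope after changing units to Y' = y_factor*Y and X' = x_factor*X.

    Y' = y_factor * (a + b*X) = y_factor*a + (y_factor*b / x_factor) * X'
    """
    return y_factor * a, y_factor * b / x_factor


# Values recomputed from the SPSS summary statistics (n = 24).
n = 24
mean_x, var_x, mean_y, var_y, mean_xy = 24.90, 13.70, 4.90, 2.80, 123.29
cov_xy = n / (n - 1) * (mean_xy - mean_x * mean_y)
r = cov_xy / math.sqrt(var_x * var_y)
b = cov_xy / var_x
a = mean_y - b * mean_x

print("95% CI for the correlation:", fisher_ci(r, n))
# Rapes per 50,000 instead of per 1,000,000: Y' = Y * (50,000 / 1,000,000).
print("RAPESPM per 50,000:", rescale(a, b, y_factor=50_000 / 1_000_000))
# Copies per 50,000 instead of per 1,000: X' = X * (50,000 / 1,000).
print("PORNPT per 50,000:", rescale(a, b, x_factor=50_000 / 1_000))
```

Rescaling Y multiplies both the constant and the slope by the same factor, while rescaling X divides only the slope, which matches the hint about redrawing the line along relabeled axes.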