Stat401E Fall 2010 Lab 8

1. You wish to examine the relation between racial prejudice and annual church visits. In a pilot study you developed a prejudice measure that ranges from 0 = not prejudiced to 100 = extremely prejudiced. You sample 12 people at random from among the residents of Ames, Iowa, and your data on them are as follows:

    Church Visits (X)    Racial Prejudice (Y)
           11                    36
           46                    33
            3                     6
           16                    42
           41                    49
           21                    51
           23                    61
           10                    23
           34                    57
           48                    18
           28                    65
           55                     3

a. Plot the data on graph paper (or use a spreadsheet program, such as Excel, to plot them for you).
b. Fit the model $\hat{Y} = \hat{a}_1 + \hat{b}_1 X$ by calculating $\hat{a}_1$ and $\hat{b}_1$ as discussed in class.
c. Plot the resulting regression line (or draw the line on your plot).
d. Calculate $\hat{Y}_i$ for each $X_i$ and $Y_i - \hat{Y}_i$ for each $Y_i$.
e. Verify that $\sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0$.
f. Partition the total sum of squares into two parts: that due to regression and that due to error.
g. Repeat steps "a" through "f", this time using the following model: $\hat{Y} = \hat{a}_2 + \hat{b}_2 (X - \bar{X})^2$
h. Express $\hat{a}_1$ and $\hat{b}_1$ in words.
i. Which model comes closest to describing the relation between prejudice and church visits? How can you tell?

NOTE: You can save a lot of hand calculations by getting a computer to do the work for you. (Hint: The best--and strongly recommended--method for doing this problem is with a spreadsheet program such as Excel. If you choose this method, be sure to include a printout of your spreadsheet as part of your homework.)

2. Return to problem 2 on Lab 7 in which a t-test was performed for the difference between two means. Imagine that the grouping variable is coded as a dummy variable, D, (with 0 = no relatives killed and 1 = relatives killed) and that you are to perform the analysis as a regression.

a. Calculate the constant and slope for the regression of Y = "the number of terrorist acts during 3 years" on this dummy variable.
(Hints: Use the two group means to compute the overall mean, and use your knowledge about the dummy variable to obtain its mean and its sum of squares. Thinking about what your data matrix would look like should make it clear how you can get the value of the sum of Y*D.)
b. Express the constant and slope (calculated in part a) in words.
c. What proportion of the variance in the number of terrorist acts is explained by whether or not the terrorist had relatives killed in the massacre?
(Hints: If done correctly, the regression equation in part a should show the OLS estimate for each level of the dummy variable to equal the mean value of the dependent variable among subjects within that level. Knowing this, you can use the two group variances in computing the residual, or unexplained, sum of squares (a.k.a. the error sum of squares). The explained, or regression, sum of squares can be calculated given your knowledge of the overall mean and the two OLS estimates--one for each level of the dummy variable. As always, the total sum of squares equals the sum of the unexplained plus explained sums of squares.)

3. You are interested in investigating whether more rapes occur in states in which a lot of pornography is read than in states in which little pornography is read. Although each of the 50 states in the U.S. has a different method of recording instances of rape, you identify 24 states that have similar methods. You decide to use data from these 24 states to generalize to the population of all 50 United States. You have obtained data on each state's annual sales of Playboy, Oui, Playgirl, and Penthouse from the publishers of each of these magazines.
You enter your data on the following two variables into SPSS:

    PORNPT  = the number of copies of a pornographic magazine (from the above four) sold annually in a state per 1,000 population
    RAPESPM = the number of rapes reported annually in a state per 1,000,000 population

To analyze these data you use the following SPSS commands:

    compute pornrape = pornpt * rapespm.
    frequencies general = pornpt,rapespm,pornrape
      / statistics = mean,variance.

Parts of your output look as follows:

    Statistics
                 N (Valid)     Mean     Variance
    PORNPT          24        24.90       13.70
    RAPESPM         24         4.90        2.80
    PORNRAPE        24       123.29     4856.79

a. How much of the variance in the number of rapes is explained by the number of pornographic magazines sold?
b. Give a 95% confidence interval for the correlation between PORNPT and RAPESPM.
c. Give the unstandardized regression equation appropriate to your research problem and say in words what each regression coefficient means.
d. Give the standardized regression equation appropriate to your research problem and say in words what each regression coefficient means.
e. Based on your findings, how many rapes would occur in Iowa--a state in which 30 copies of the above four pornographic magazines are sold annually per 1,000 population?
f. In part c you were asked to find the unstandardized regression equation in which RAPESPM was regressed on PORNPT. Recalculate the equation twice more: First, find the unstandardized regression equation if you were to change RAPESPM to be a measure of "the number of rapes reported annually in a state per 50,000 population." Second, find the unstandardized regression equation if you were to change PORNPT to be a measure of "the number of copies of a pornographic magazine sold annually in a state per 50,000 population."
(Hints: You only need to convert the slope and constant found in part c. Try drawing the regression line along two axes, and then note how the regression equation would change if the units along each axis were changed.)
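
The calculations in problem 3 need only the summary statistics, because the mean of the product PORNRAPE gives the covariance. The Python sketch below shows one way to organize that arithmetic and the unit conversion from part f. It is an illustration under stated assumptions, not the method required for the lab: the function names `ols_from_summaries` and `rescale` are made up for this example, and the variances are assumed to be sample variances with an n - 1 denominator.

```python
import math


def ols_from_summaries(n, mean_x, mean_y, var_x, var_y, mean_xy):
    """OLS of Y on X from summary statistics only (hypothetical helper).

    Assumes var_x and var_y are sample variances (n - 1 denominator)
    and mean_xy is the plain mean of the products X*Y.
    """
    # Sample covariance: (sum of XY - n * mean_x * mean_y) / (n - 1).
    cov_xy = n * (mean_xy - mean_x * mean_y) / (n - 1)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    r = cov_xy / math.sqrt(var_x * var_y)
    return {"cov": cov_xy, "slope": slope, "intercept": intercept, "r": r}


def rescale(intercept, slope, x_factor=1.0, y_factor=1.0):
    """Coefficients after replacing X by x_factor*X and Y by y_factor*Y."""
    return y_factor * intercept, y_factor * slope / x_factor


# Summary values copied from the SPSS output in problem 3.
fit = ols_from_summaries(n=24, mean_x=24.90, mean_y=4.90,
                         var_x=13.70, var_y=2.80, mean_xy=123.29)
print(fit)
print("r squared:", fit["r"] ** 2)

# Part f, first change: rapes per 50,000 instead of per 1,000,000,
# so every Y value is multiplied by 50,000 / 1,000,000.
print(rescale(fit["intercept"], fit["slope"], y_factor=50_000 / 1_000_000))

# Part f, second change: magazines per 50,000 instead of per 1,000,
# so every X value is multiplied by 50,000 / 1,000.
print(rescale(fit["intercept"], fit["slope"], x_factor=50_000 / 1_000))
```

Changing the units of Y multiplies both the constant and the slope by the same factor, whereas changing the units of X divides the slope by the factor and leaves the constant alone. This matches the hint about redrawing the line against relabeled axes.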