5811 Lab 10

Agenda
Soc 5811 Lab #10
11.14.05
I. Welcome
1. Proposals due tomorrow!
2. Review last lab.
3. Lab handouts, datasets, and other information can be found at:
http://www.tc.umn.edu/~long0324/
II. Objectives
1. Learn how to calculate correlation coefficients in SPSS.
2. Learn how to conduct bivariate regression analyses in SPSS.
3. Learn one way to construct an index for the final paper.
4. If time remains, we can address remaining questions about your proposals.
III. Review scatterplots.
1. Last week we looked at the various options for constructing simple scatterplots.
SPSS has three other options. Overlay scatterplots display more than one x-y pair,
i.e., you can plot two y-variables against the same x-variable. Matrix scatterplots
show multiple bivariate scatterplots in the same diagram, with one row and one
column for each variable chosen. In matrix scatterplots, all possible pairs of
variables are represented. 3-D scatterplots plot three variables in three
dimensions.
2. Construct an overlay scatterplot with GDP per capita (gdppou00), illiteracy
rates (illrat00), and foreign aid per capita (aidpc00). To construct an overlay
scatterplot, you must select your variables as x-y pairs. What can we interpret
from the scatterplot? When are overlay plots useful?
3. Construct a matrix scatterplot with the same variables. How do we interpret a
matrix scatterplot? What do the rows and columns mean?
4. Finally, construct a 3-D scatterplot with the same variables. Double-click on
the scatterplot in the Output window to open the Chart editor; here you can rotate
the dimensions of the graph.
5. In addition to adding a regression line, we can also add a Mean of Y Reference
Line. Construct a simple scatterplot with illiteracy (illrat00) and infant mortality
(morrai00). Add a regression line and mean of y reference line. In the
scatterplot, what is the explained variance? What is the error variance? How can
we use this information to calculate R²?
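For reference, here is the decomposition behind item 5's questions, written in standard notation. This is a sketch: \hat{Y}_i is the value the least-squares line predicts for case i, and the identity assumes the line is fitted with an intercept. Each case's distance from the Mean of Y Reference Line splits into the part explained by the regression line plus the leftover error:

```latex
\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{\text{total}}
  = \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{explained (regression)}}
  + \underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{error (residual)}},
\qquad
R^2 = \frac{\sum_{i} (\hat{Y}_i - \bar{Y})^2}{\sum_{i} (Y_i - \bar{Y})^2}
    = 1 - \frac{\sum_{i} (Y_i - \hat{Y}_i)^2}{\sum_{i} (Y_i - \bar{Y})^2}
```

Dividing each sum by the same number of cases turns these sums of squares into the explained and error variances; the ratio, and therefore R², is unchanged.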
IV. Review correlation.
1. The Pearson correlation coefficient, r, is a measure of linear association. In a
bivariate regression model, the R reported by SPSS equals the absolute value of r.
When squared, the correlation coefficient becomes the coefficient of determination, R².
2. We can calculate several correlation coefficients at once in SPSS.
Calculate correlation coefficients for infant mortality (morrai00), illiteracy
(illrat00), and GDP per capita (gdppou00). Interpret the results.
3. We have not discussed how we can conduct hypothesis tests with correlation
coefficients, but it can be done. Any ideas? What would be the null hypothesis?
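A minimal sketch of items 1 and 3 in Python, using only the standard library. The data are made-up toy values (not the lab dataset), and the function names are our own. The sketch shows that R = |r| and R² = r², and it gives one common answer to the hypothesis-test question: test H0: ρ = 0 (no linear association in the population) with t = r√(n − 2) / √(1 − r²) on n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: cross-products of deviations divided by the
    square root of the product of the two sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_test(r, n):
    """t statistic for H0: rho = 0. Compare |t| with a t table that uses
    n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical toy values with a negative association (not lab data).
x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 4, 2]

r = pearson_r(x, y)
big_r = abs(r)        # the R in a bivariate regression is |r|
r_squared = r ** 2    # coefficient of determination
t, df = correlation_t_test(r, len(x))
print(f"r = {r:.3f}, R = {big_r:.3f}, R2 = {r_squared:.3f}, t = {t:.2f} on {df} df")
# prints: r = -0.775, R = 0.775, R2 = 0.600, t = -2.12 on 3 df
```

In a bivariate regression this t is the same as the t for the slope, so the correlation test and the slope test lead to the same conclusion.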
V. Bivariate regression.
1. Bivariate regression refers to the linear relationship of two variables, including
how well the two variables are associated and the slope of the line that best fits
the data. Before conducting a regression analysis, it is important to first look at
the scatterplot to see whether a linear relationship exists, and to clean the data
(e.g., take care of missing values).
2. Also, before running a regression analysis it is important to check whether
the data meet the assumptions of regression. What are the assumptions of
bivariate regression? First, we must be using a random sample with a sufficient
N. Second, the relationship between the independent and dependent variable
must be linear. The third assumption is conditional normality, or a normal
distribution of Y at each value of X. Finally, the conditional variance of Y (the
error variance) must be homoskedastic, or equal across values of X.
3. Construct a scatterplot with illiteracy and infant mortality. Check to make sure
the assumptions of regression are met. Is it a random sample? Is there a linear
relationship? Is Y normally distributed for every value of X? How can we
determine this? Are the variances of the error equal for every value of X? How
can we check this?
4. Conduct a linear regression analysis with the same two variables, and interpret
the output.
a. First, how well are the two variables correlated? What statistic tells us
this?
b. What is the R²? Is the line a good fit for the data? This depends…
c. The ANOVA table partitions the variation in Y into regression and error
(residual) sums of squares. Recall that R² is calculated by dividing the
regression sum of squares by the total sum of squares around Y-bar. Which
values in the ANOVA output can we use to calculate R²?
d. The most important statistics in the output are the coefficients. What is
the coefficient for illiteracy? How can we interpret this? How do we
interpret the constant? How do we interpret the significance levels of each
(i.e., how do we do hypothesis tests for the slope)? Although we haven’t
discussed it in lecture yet, what do you think the standard error for the
coefficient is?
5. For now, it is important that you understand the meaning of R² and the slope
and constant estimates. We will discuss what standardized coefficients are later.
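To see where the numbers in the Model Summary, ANOVA, and Coefficients tables come from, here is a minimal hand computation in Python (standard library only). The data are hypothetical toy values, and the function name `bivariate_ols` is ours, not an SPSS name. It uses the usual textbook formulas: slope b = Sxy / Sxx, constant a = Y-bar − b·X-bar, and the slope's standard error √(MS_residual / Sxx).

```python
import math


def bivariate_ols(x, y):
    """Least-squares fit of y = a + b*x, plus the pieces reported in the
    Model Summary, ANOVA, and Coefficients tables (computed by hand)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = s_xy / s_xx          # slope: change in predicted y per 1-unit change in x
    a = y_bar - b * x_bar    # constant: predicted y when x = 0
    y_hat = [a + b * xi for xi in x]

    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    df_residual = n - 2
    ms_residual = ss_residual / df_residual   # estimate of the error variance
    f = (ss_regression / 1) / ms_residual     # ANOVA F on 1 and n - 2 df
    se_b = math.sqrt(ms_residual / s_xx)      # standard error of the slope
    t_b = b / se_b                            # t for H0: slope = 0

    return {
        "slope": b,
        "constant": a,
        "r_squared": ss_regression / ss_total,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "F": f,
        "se_slope": se_b,
        "t_slope": t_b,
    }


# Hypothetical toy values (not the illrat00/morrai00 data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = bivariate_ols(x, y)
for name, value in result.items():
    print(f"{name:>13}: {value:.3f}")
```

With these toy values the slope is 0.6, the constant 2.2, R² = 0.6, and F = 4.5. In a bivariate regression F always equals the square of the slope's t, so the ANOVA test and the slope test agree.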
VI. One way to construct an index…
1. It is important that you have interval variables for a regression analysis,
although ordinal variables with many categories can also be used. Unfortunately,
many surveys do not have ordinal variables with many categories. One solution is
to construct an index.
2. If multiple variables measure the general phenomenon you are interested in, and
they are all measured on the same scale, the variables can simply be added to
construct one variable.
3. Open the 2002 GSS subset, which includes three 3-category variables
(conlegis, conjudge, and confed) measuring confidence in different branches of
the government. Run frequencies on each to check the scales and missing
values. Are they each coded on the same scale? What do you notice about the
scale (i.e., what do big values mean)? Recode the variables so that higher values
represent more confidence in the government.
4. To add the three variables together into one index, construct a new variable in
the Compute window. The Compute command allows you to use mathematical
functions with variables to construct new ones. Construct a new variable, congov,
by adding the three variables together. If each variable is coded 1 to 3, congov
will range from 3 to 9, for seven possible values.
5. Check the frequencies of your new variable. Does it range from 3 to 9?
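The recode-then-add steps in items 3 to 5 can be sketched in plain Python. This is an illustration under assumptions, not the GSS coding itself: we ASSUME each item is coded 1 = a great deal, 2 = only some, 3 = hardly any, and treat any other code as missing; check the real codes in your frequencies. The respondents below are made up. Reversing a 1 to 3 scale means new = 1 + 3 − old, and three such items sum to values from 3 to 9.

```python
from collections import Counter


def reverse_code(value, low=1, high=3):
    """Flip a low..high scale so larger numbers mean MORE confidence.
    Any code outside low..high (e.g., a don't-know code) becomes None."""
    if value is None or not low <= value <= high:
        return None
    return low + high - value


def additive_index(*items):
    """Sum the items for one respondent; missing if any item is missing."""
    if any(v is None for v in items):
        return None
    return sum(items)


# Hypothetical raw (conlegis, conjudge, confed) codes for five respondents,
# ASSUMING 1 = a great deal, 2 = only some, 3 = hardly any, 8 = don't know.
raw = [(1, 2, 2), (3, 3, 3), (1, 1, 1), (2, 8, 3), (2, 2, 1)]

recoded = [tuple(reverse_code(v) for v in row) for row in raw]
congov = [additive_index(*row) for row in recoded]

freq = Counter(v for v in congov if v is not None)
for value in range(3, 10):    # three 1-3 items can sum to 3..9
    print(value, freq.get(value, 0))
print("missing", congov.count(None))
```

Treating the whole index as missing when any item is missing mirrors adding variables with the + operator in Compute; SPSS's SUM() function instead adds whatever values are valid, which changes the meaning of the scale for respondents with missing items.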
VII. Any questions about the proposals?
Hint: See p. 256 for the assumptions of multiple regression (BLUE).
SPSS INSTRUCTIONS
I. Scatterplots
1. Click on Graphs, Scatter.
2. Choose a Simple, Overlay, Matrix, or 3-D scatterplot. Steps 3 and 4 describe a
simple scatterplot; for an overlay scatterplot, select your variables as x-y pairs.
3. Place your independent variable into the x-axis box and your dependent
variable into the y-axis box.
4. If your cases have labels (such as country names), put the label variable into
the Label cases by box.
5. To add a title to your scatterplot, click on Title.
6. Double-click on the scatterplot in the Output window to open the Chart
Editor.
II. Correlation
1. Click on Analyze, Correlate, Bivariate.
2. Place the variables into the Variables box.
3. Check the Pearson correlation coefficient box.
4. Paste and Run.
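The Bivariate Correlations dialog produces a matrix of every pairwise correlation. A rough standard-library Python equivalent is sketched below. The numbers are made up, `None` stands in for a missing value, and missing cases are excluded pairwise (we believe this matches the dialog's default, with listwise exclusion available under Options, but check your version). Each cell also records the N used, as the SPSS table does.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from deviation sums of squares and cross-products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Return {(a, b): (r, n)} for every pair of columns, dropping a case
    only from the pairs where one of its two values is missing (None)."""
    matrix = {}
    for a in columns:
        for b in columns:
            pairs = [(u, v) for u, v in zip(columns[a], columns[b])
                     if u is not None and v is not None]
            xs = [u for u, _ in pairs]
            ys = [v for _, v in pairs]
            matrix[(a, b)] = (pearson_r(xs, ys), len(pairs))
    return matrix


# Hypothetical values for six countries; None marks a missing value.
data = {
    "morrai00": [20, 35, 40, 52, 60, None],
    "illrat00": [5, 12, 18, 22, 30, 41],
    "gdppou00": [9000, 4000, 3500, None, 1200, 800],
}

matrix = correlation_matrix(data)
names = list(data)
print(" " * 10 + "".join(f"{name:>10}" for name in names))
for a in names:
    print(f"{a:>10}" + "".join(f"{matrix[(a, b)][0]:>10.3f}" for b in names))
```

As in the SPSS output, the matrix is symmetric and has 1s on the diagonal, so only one triangle needs to be read.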
III. Bivariate Regression
1. Click on Analyze, Regression, Linear.
2. Place the dependent and independent variables into the appropriate boxes.
3. As a default, SPSS provides the model summary statistics, ANOVA statistics,
and coefficients. For information on additional options, consult the Norusis text,
pp. 451-461. We will explore some of these options later in the semester.
4. To check conditional normality, select a value, or range of values, for x (e.g.,
with Data, Select Cases), and look at the histogram of the y variable for those
cases. The assumption is met if the distribution of the y variable appears
roughly normal.
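5. To check for homoskedasticity, look at the bivariate scatterplot of the x and y
variables: the vertical spread of the points around the regression line should be
roughly the same across all values of x.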
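Both visual checks can be imitated in a few lines of standard-library Python. This is a rough sketch with made-up data and our own function names. The first part keeps y only for cases in a chosen x range and prints a crude text histogram. The second part compares the spread of the regression residuals for cases at or below the median of x with cases above it. That median split is an informal heuristic, not a formal test; with real data, look for clearly different spreads rather than any exact cutoff.

```python
import statistics
from collections import Counter


def y_given_x_range(x, y, x_min, x_max):
    """Keep y for cases whose x lies in [x_min, x_max], like selecting
    cases before asking for a histogram of y."""
    return [yi for xi, yi in zip(x, y) if x_min <= xi <= x_max]


def text_histogram(values, bin_width):
    """Crude text histogram: one row of asterisks per bin of width bin_width."""
    if not values:
        return []
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        low = b * bin_width
        lines.append(f"{low:>6}-{low + bin_width:<6} {'*' * counts.get(b, 0)}")
    return lines


def residual_spread(x, y):
    """Variance of the least-squares residuals for cases at or below the
    median of x versus cases above it."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    cut = statistics.median(x)
    low = [e for xi, e in zip(x, resid) if xi <= cut]
    high = [e for xi, e in zip(x, resid) if xi > cut]
    return statistics.pvariance(low), statistics.pvariance(high)


# Hypothetical toy values in which the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 3, 5, 5, 4, 10, 6, 12]

subset = y_given_x_range(x, y, 3, 6)
lines = text_histogram(subset, bin_width=2)
for line in lines:
    print(line)

low_var, high_var = residual_spread(x, y)
print(f"residual variance: x <= median {low_var:.2f}, x > median {high_var:.2f}")
```

Here the residual variance for the high-x cases is many times larger than for the low-x cases, the numerical counterpart of a scatterplot whose points fan out as x increases.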
IV. Computing new variables
1. Click on Transform, Compute.
2. Type the name of the new variable being created in the Target Variable box.
3. Drag variables from the left into the computation window, and use
mathematical symbols or the embedded functions to construct the equation for
the new variable.
4. If the computation only applies to certain cases, use the If… option to set up
selection criteria.
5. Paste and Run.