CHAPTER SIXTEEN
Regression
NOTE TO INSTRUCTORS
This chapter includes a number of complex concepts
that may seem intimidating to students. Encourage
students to focus on the big picture through some
of the discussion questions and classroom
activities. You can ease students' concerns about
multiple regression by describing it as similar to
simple linear regression except that researchers
examine multiple variables rather than only one.
OUTLINE OF RESOURCES
I. Simple Linear Regression
 Discussion Question 16-1
 Discussion Question 16-2
 Classroom Activity 16-1: Make It Your Own
 Discussion Question 16-3
 Discussion Question 16-4
 Discussion Question 16-5
 Classroom Activity 16-2: Finding the Regression Line
II. Interpretation and Prediction
 Classroom Activity 16-3: Make It Your Own
 Discussion Question 16-6
 Discussion Question 16-7
III. Multiple Regression
 Discussion Question 16-8
 Discussion Question 16-9
IV. Next Steps: Structural Equation Modeling (SEM)
 Classroom Activity 16-4: Careers in Prediction
 Classroom Activity 16-5: SEM in Context
V. Handouts
 Handout 16-1: Finding the Regression Line
 Handout 16-2: Careers in Prediction
 Handout 16-3: Examining SEM in Context
 Additional Readings
 Online Resources
CHAPTER GUIDE
I. Simple Linear Regression
1. Simple linear regression is a statistical tool
that enables us to predict an individual’s
score on the dependent variable from his or her
score on one independent variable.
2. Regression allows us to make quantitative
predictions that more precisely explain relations
among variables.
> Discussion Question 16-1
What is simple linear regression, and why is it useful?
Your students’ answers should include:
 Simple linear regression is a tool that allows us
to make predictions.
 Simple linear regression is useful as an
extension of correlation that allows us to
quantify the relationship among variables with
greater precision and accuracy.
3. Because simple linear regression helps us to
find the equation for a line, we must have data
that are linearly related to use it.
4. We can use z scores when making these
predictions. Specifically, the formula is zY^ =
(rXY)(zX). The first z score is for the
dependent variable and the second z score is
for the independent variable. The ^ symbol
signals that the z score is predicted rather
than being the actual score.
5. The tendency for scores that are particularly
high or low to drift toward the mean over time
is known as regression to the mean.
6. Usually, we want to predict a raw score from a
raw score. We will first need to convert a raw
score on one variable to a z score. We can then
predict a z score for the second variable.
Finally, we convert the z score from the second
variable to a raw score.
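These three steps can be sketched in a few lines of Python. The summary statistics below (the means, standard deviations, and correlation) are invented purely for illustration:

```python
# Hypothetical example: predicting an exam score (Y) from hours studied (X).
# All summary statistics are made up for illustration.

M_X, SD_X = 10.0, 2.0   # mean and standard deviation of X (hours studied)
M_Y, SD_Y = 75.0, 8.0   # mean and standard deviation of Y (exam score)
r_XY = 0.5              # correlation between X and Y

X = 13.0                # the raw score we want to predict from

# Step 1: convert the raw X to a z score.
z_X = (X - M_X) / SD_X            # (13 - 10) / 2 = 1.5

# Step 2: predict the z score on Y: zY^ = (rXY)(zX).
z_Y_hat = r_XY * z_X              # 0.5 * 1.5 = 0.75

# Step 3: convert the predicted z score back to a raw score.
Y_hat = z_Y_hat * SD_Y + M_Y      # 0.75 * 8 + 75 = 81.0

print(Y_hat)  # 81.0
```

Notice that the predicted score sits only 0.75 standard deviations above the mean of Y even though X sat 1.5 standard deviations above its mean; this illustrates regression to the mean (item 5 above).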
> Discussion Question 16-2
How would we predict a raw score from a raw score?
Your students’ answers should include:
 In order to predict a raw score from a raw score,
we must first transform one raw score into a z
score. Then we multiply that z score by the
correlation coefficient to get the predicted z
score for the second variable. Finally, we
transform that z score back into a raw score and
make our prediction.
Classroom Activity 16-1
Make It Your Own
 Use your students' weight and height as measures
for this exercise, or use height and age if you
think using weight would be a sensitive issue.
 Have your students anonymously submit their
weight and height or height and age.
 Load these data into SPSS and run the analysis as
a correlation and simple regression.
7. The intercept is the predicted value for Y
when X is equal to 0, which is the point at
which the line crosses, or intercepts, the y-axis.
8. The slope is the amount that Y is predicted to
increase for an increase of 1 in X.
> Discussion Question 16-3
What is the difference between the intercept and the slope?
Why do we calculate them in simple linear regression?
Your students’ answers should include:
 The difference between the intercept and the
slope is that the intercept is the predicted
value for Y when X is equal to 0, while the slope
is the amount that Y is predicted to increase for
an increase of 1 unit in X.
 We calculate them in simple linear regression
because they allow us to develop a raw-score
regression equation for predicting the raw score
for Y.
9. Both the intercept and the slope are needed to
calculate the equation of a line: Y^ = a + b(X).
10. To calculate the intercept, we calculate the z
score for X when X = 0 by using the formula: zX
= (X – MX)/SDX. We then use the z-score
regression equation to calculate the predicted
z score for Y: zY^ = (rXY)(zX). We then convert
this z score to the predicted raw score for Y
using the formula: Y^ = zY^(SDY) + MY.
11. To calculate the slope, we repeat the previous
steps that we used to calculate the intercept
but use an X of 1 rather than 0. We then
determine the change in Y^ as X increases from
0 to 1. It is important to include the
appropriate sign based on whether there is an
increase or decrease in Y^.
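Steps 10 and 11 can be sketched together by running X = 0 and X = 1 through the same z-score prediction; the summary statistics below are hypothetical:

```python
# Sketch of steps 10-11: find the intercept (a) and slope (b) by running
# X = 0 and then X = 1 through the z-score regression equation.
# All summary statistics are made up for illustration.

M_X, SD_X = 10.0, 2.0
M_Y, SD_Y = 75.0, 8.0
r_XY = 0.5

def predict_raw_Y(X):
    """Raw X -> z score -> predicted z score on Y -> predicted raw Y."""
    z_X = (X - M_X) / SD_X
    z_Y_hat = r_XY * z_X
    return z_Y_hat * SD_Y + M_Y

a = predict_raw_Y(0)        # intercept: predicted Y when X = 0
b = predict_raw_Y(1) - a    # slope: change in predicted Y as X goes 0 -> 1

print(a, b)  # 55.0 2.0
```

With these numbers the regression equation is Y^ = 55.0 + 2.0(X). The slope also agrees with the identity b = r(SDY/SDX) = 0.5 × 8/2 = 2, a useful check.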
> Discussion Question 16-4
How do we calculate the slope of the regression line? How
is it different from calculating the intercept?
Your students’ answers should include:
 We calculate the slope of the regression line by
first calculating a z score for X when X = 1 by
using the formula: zX = (X – MX)/SDX. We then use
the z-score regression equation to calculate the
predicted z score for Y: zY^ = (rXY)(zX). We then
convert this z score to the predicted raw score
for Y using the formula: Y^ = zY^(SDY) + MY. The
slope is the change from the intercept to this
predicted score as X increases from 0 to 1.
 Calculating the slope of the regression line is
different from calculating the intercept of the
regression line because for calculating the slope
we use an X of 1 rather than an X of 0.
12. With both the intercept and slope calculated,
we can now use our formula to predict the raw
score for Y.
13. If we find at least three other predicted
values for Y, we can use these values to draw a
regression line. This is also known as the line
of best fit.
14. A negative slope means that the regression
line starts in the upper left of the graph and
ends in the lower right. A positive slope means
that the regression line starts in the lower
left of the graph and ends in the upper right.
> Discussion Question 16-5
How can you tell whether a slope is positive or negative?
Your students’ answers should include:
 You can tell whether a slope is positive or
negative by first drawing a regression line
through the dots on a graph corresponding to
pairs of scores for X and Y^. A negative slope
means that the line looks like it's going
downhill as we move from left to right, while a
positive slope means that the line looks like
it's going uphill as we move from left to right.
Classroom Activity 16-2
Finding the Regression Line
 Have students use data created from Classroom
Activity 16-3, "Creating Correlations," from the
previous chapter.
 Have students use the data collected to determine
the regression line. Handout 16-1, found at the
end of this chapter, can be used to aid in this
process.
15. The standardized regression coefficient (also
known as beta weight), a standardized version
of the slope in a regression equation, is the
predicted change in the dependent variable in
terms of standard deviations for a 1 standard
deviation increase in the independent variable.
16. The standardized regression coefficient is
symbolized by β and pronounced "beta," or called
the beta weight. It is calculated using the
formula: β = (b)(√SSX/√SSY).
II. Interpretation and Prediction
1. The number that best describes how far away,
on average, the data points are from the line
of best fit is called the standard error of the
estimate. In other words, it is a statistic
indicating the typical distance between a
regression line and the actual data points.
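A short sketch of the standard error of the estimate on invented data. Note that textbooks differ on the denominator (N versus N – 2); this sketch divides by N:

```python
# Sketch: the standard error of the estimate as the typical distance
# between the regression line and the actual data points (made-up data).
from math import sqrt

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]

n = len(X)
M_X, M_Y = sum(X) / n, sum(Y) / n
SS_X = sum((x - M_X) ** 2 for x in X)
SP = sum((x - M_X) * (y - M_Y) for x, y in zip(X, Y))

b = SP / SS_X            # slope
a = M_Y - b * M_X        # intercept

# Squared distances between each actual Y and the line's prediction:
SS_error = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Dividing by N here; some texts use N - 2 (degrees of freedom) instead.
see = sqrt(SS_error / n)
print(round(see, 3))  # 0.693
```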
Classroom Activity 16-3
Make It Your Own
In this activity, use SAT scores and overall GPAs
to demonstrate simple regression.
 Again, anonymously collect the data from the
students.
 Have the students frame the research question for
a correlation and for a simple regression.
 After running the analysis, have the students
discuss the results. Your data will likely suffer
from a restricted range, but that is a good point
for class discussion because real data are messy.
2. The proportionate reduction in error is a
statistic that quantifies how much more accurate
our predictions are when we use the regression
line instead of the mean as a prediction tool.
> Discussion Question 16-6
Why do you think that we would use the mean as a basis of
comparison with the regression line? Why would we use the
mean instead of some other number from our sample?
Your students’ answers should include:
 We use the mean as a basis of comparison with the
regression line because, with limited
information, the mean is a fair predictor. By
using the mean, we can calculate the coefficient
of determination and measure how accurate our
predictions are when using the regression line
compared to the mean.
 We use the mean instead of some other number from
a sample because the mean is involved in
calculating the regression equation and, as a
result, we can quantify the improvement in
prediction that results from using the regression
line over the mean.
3. If we were to subtract the mean score of the
sample from each person’s score, square that
value, and sum all of the values, we would
obtain the sum of squared errors, or the sum of
squares total (SStotal). This is the error that
results if we were to predict the mean as the
score for each person.
4. We want our regression equation to be a
substantial improvement over just using the
mean as our prediction.
5. To determine how much better our regression
equation predicts over the mean, we plug each X
value into the regression equation.
6. To find the sum of squared errors, or SSerror,
we subtract each predicted score from the actual
score, square these errors, and sum them.
7. To find the amount of error we’ve reduced, we
subtract the sum of squared errors from the sum
of squares total. This number is divided by the
sum of squares total to obtain a proportion.
8. The proportionate reduction in error is
symbolized as r2 and is calculated using the
formula: r2 = (SStotal – SSerror)/SStotal.
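Steps 3 through 8 can be sketched on a small invented data set; the last lines confirm that the result matches the square of the correlation coefficient:

```python
# Sketch: proportionate reduction in error on made-up data.
from math import sqrt

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]

n = len(X)
M_X, M_Y = sum(X) / n, sum(Y) / n
SS_X = sum((x - M_X) ** 2 for x in X)
SP = sum((x - M_X) * (y - M_Y) for x, y in zip(X, Y))
b = SP / SS_X            # slope
a = M_Y - b * M_X        # intercept

# Error if we simply predict the mean for everyone:
SS_total = sum((y - M_Y) ** 2 for y in Y)
# Error if we predict with the regression line instead:
SS_error = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

r2 = (SS_total - SS_error) / SS_total
print(round(r2, 3))  # 0.727

# Same answer as squaring the correlation coefficient:
r = SP / sqrt(SS_X * SS_total)
print(round(r * r, 3))  # 0.727
```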
> Discussion Question 16-7
What is the difference between the SStotal and the SSerror?
What is the purpose of calculating them?
Your students’ answers should include:
 The difference between the SStotal and the SSerror is
that the SStotal represents the error in
prediction from the mean, while the SSerror
represents the error from predicting Y with our
regression equation.
 The purpose of calculating them is to quantify
the amount of error that we've reduced by using
the regression equation instead of the mean.
9. We could also calculate r2 by squaring the
correlation coefficient.
III. Multiple Regression
1. An orthogonal variable is an independent
variable that makes a separate and distinct
contribution in the prediction of a dependent
variable, as compared to another independent
variable.
2. Multiple regression is a statistical technique
that includes two or more predictor variables
in a prediction equation.
3. Multiple regression is more widely used than
simple linear regression because most dependent
variables are best explained by using more than
one independent variable.
> Discussion Question 16-8
Why is multiple regression an improvement over simple
linear regression?
Your students’ answers should include:
 Multiple regression is an improvement over simple
linear regression because it provides greater
predictive accuracy by incorporating two or more
predictor variables into the regression equation.
4. Compared to using averages, multiple
regression represents a significant advance in
our ability to predict human behavior.
5. When calculating the proportionate reduction
in error for multiple regression, its symbol is
R2 rather than r2 to indicate that the error is
based on more than one independent variable.
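A sketch of a two-predictor multiple regression, fit by solving the normal equations on mean-centered, made-up data and ending with R2:

```python
# Sketch: multiple regression with two hypothetical predictors, fit by
# solving the two-predictor normal equations on mean-centered data.

X1 = [2, 3, 5, 7, 9]   # e.g., hours studied (made up)
X2 = [1, 4, 3, 6, 8]   # e.g., a second predictor (made up)
Y  = [4, 7, 8, 11, 14]

n = len(Y)
m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n
x1 = [v - m1 for v in X1]
x2 = [v - m2 for v in X2]
y  = [v - my for v in Y]

# Normal equations for two centered predictors:
#   b1*S11 + b2*S12 = S1y
#   b1*S12 + b2*S22 = S2y
S11 = sum(v * v for v in x1)
S22 = sum(v * v for v in x2)
S12 = sum(u * v for u, v in zip(x1, x2))
S1y = sum(u * v for u, v in zip(x1, y))
S2y = sum(u * v for u, v in zip(x2, y))

det = S11 * S22 - S12 ** 2
b1 = (S1y * S22 - S2y * S12) / det
b2 = (S2y * S11 - S1y * S12) / det
a = my - b1 * m1 - b2 * m2

# R^2: proportionate reduction in error with both predictors in the equation.
SS_total = sum(v * v for v in y)
SS_error = sum((yv - (a + b1 * u + b2 * v)) ** 2
               for u, v, yv in zip(X1, X2, Y))
R2 = (SS_total - SS_error) / SS_total
print(round(R2, 3))  # 0.997
```

In practice, of course, software such as SPSS does this fitting; the sketch just makes visible what the prediction equation and R2 are made of.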
6. In stepwise multiple regression, computer
software determines the order in which
independent variables are included in the
equation.
7. Stepwise multiple regression is frequently
used because it is the default in many software
programs and is useful in the absence of a
clear, predictive theory.
8. Another approach is to use hierarchical
multiple regression, whereby the researcher adds
independent variables into the equation in an
order determined by theory.
9. In order to use hierarchical multiple
regression, we need to have a specific
predictive theory that we are testing.
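The hierarchical idea can be sketched by fitting nested models on made-up data and comparing R2 at each step; the predictors and entry order below are purely illustrative:

```python
# Sketch of the hierarchical idea with two made-up predictors: enter X1
# first (theory-driven), then add X2 and examine the increase in R^2.

X1 = [2, 3, 5, 7, 9]
X2 = [1, 4, 3, 6, 8]
Y  = [4, 7, 8, 11, 14]

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

x1, x2, y = center(X1), center(X2), center(Y)
ssy = sum(v * v for v in y)

# Step 1: X1 alone. Its R^2 is the squared correlation of X1 with Y.
s11 = sum(v * v for v in x1)
s1y = sum(u * v for u, v in zip(x1, y))
r2_step1 = s1y ** 2 / (s11 * ssy)

# Step 2: X1 and X2 together, via the two-predictor normal equations
# (on centered data the intercept drops out).
s22 = sum(v * v for v in x2)
s12 = sum(u * v for u, v in zip(x1, x2))
s2y = sum(u * v for u, v in zip(x2, y))
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
ss_error = sum((yv - b1 * u - b2 * v) ** 2
               for u, v, yv in zip(x1, x2, y))
r2_step2 = (ssy - ss_error) / ssy

# The increase from step 1 to step 2 is the unique contribution of X2.
print(round(r2_step1, 3), round(r2_step2, 3))  # 0.968 0.997
```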
> Discussion Question 16-9
What is the difference between stepwise and hierarchical
multiple regression? When would you want to use one
technique rather than the other?
Your students’ answers should include:
 The difference between stepwise and hierarchical
multiple regression is that, in stepwise
regression, the computer software program
determines the order of variable entry, while in
hierarchical regression, the researcher
determines the order of variable entry in light
of theory.
 A stepwise regression can be used in the absence
of theory, such as in model building, while
hierarchical regression can be used to test a
specific theory.
Classroom Activity 16-4
Careers in Prediction
The chapter refers to many opportunities for using
prediction within certain careers. In this
activity, students will expand on this topic using
Handout 16-2. The goal of this activity is for
students to observe the relevance and usefulness
of regression in their daily experience.
IV. Next Steps: Structural Equation Modeling (SEM)
1. Structural equation modeling (SEM) is one of
several statistical techniques (and one of the
most sophisticated statistical approaches) that
quantifies how well sample data fit a theoretical
model that hypothesizes a set of relations among
multiple variables.
2. When using SEM, statisticians will refer to a
statistical (or theoretical) model, which is a
hypothesized network of relations, often
portrayed graphically, among multiple variables.
3. When creating a model that hypothesizes the
relation among factors being tested, we create
paths that describe the connection between two
variables in a statistical model. We can conduct
a path analysis to examine a hypothesized model
by conducting a series of regression analyses
that quantify the paths at each succeeding step
in the model.
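A toy path analysis for a hypothesized X → M → Y model, run as a series of regressions on standardized scores. The variable labels and data are invented; with z scores, each path coefficient is a standardized regression slope:

```python
# Sketch: a tiny path analysis for the hypothesized model X -> M -> Y,
# quantified as a series of regressions on z-scored (standardized) data.
# All variable names and data are made up for illustration.
from math import sqrt

X = [1, 2, 3, 4, 5]          # hypothetical predictor (manifest variable)
M = [2, 2, 4, 5, 6]          # hypothetical intervening variable
Y = [3, 4, 6, 6, 8]          # hypothetical outcome

def z_scores(v):
    n = len(v)
    m = sum(v) / n
    sd = sqrt(sum((x - m) ** 2 for x in v) / n)
    return [(x - m) / sd for x in v]

zx, zm, zy = z_scores(X), z_scores(M), z_scores(Y)

def corr(a, b):
    # With z scores, the mean cross-product is the Pearson correlation.
    return sum(u * v for u, v in zip(a, b)) / len(a)

r_xm, r_xy, r_my = corr(zx, zm), corr(zx, zy), corr(zm, zy)

# Regression 1: the X -> M path is just the correlation (standardized slope).
p_xm = r_xm

# Regression 2: standardized slopes from the two-predictor regression of Y
# on X and M give the remaining two paths.
p_my = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)   # M -> Y, controlling for X
p_xy = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)   # direct X -> Y path

print(round(p_xm, 2), round(p_my, 2), round(p_xy, 2))  # 0.97 0.26 0.72
```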
4. In SEM, we refer to variables that we observe
and are measured as manifest variables.
5. In contrast, latent variables are ideas that
we want to research but cannot directly
measure.
We will try to indirectly observe
such variables using appropriate measurement
tools.
6. When encountering a model such as SEM, it is
important to first figure out what variables
the researcher is studying. Next, look at the
numbers to see what variables are related and
the signs of the numbers to see the direction
of the relation.
Classroom Activity 16-5
SEM in Context
In this activity, students will try to understand
how path analysis is used in context. To do this,
students will download or be given copies of the
article: Kim, Y. M., & Neff, J. A. (2010). Direct
and indirect effects of parental influence upon
adolescent alcohol use: A structural equation
modeling analysis. Journal of Child & Adolescent
Substance Abuse, 19(3), 244–260. Students will
use Handout 16-3 in their analysis of the article.
Additional Readings
Harrell, F. E. (2001). Regression Modeling
Strategies. New York: Springer.
Beyond discussing regression, this book also
explores when and how to use this statistic. It is
geared toward graduate students and researchers.
Cohen, J., & Cohen, P. (2002). Applied Multiple
Regression/Correlation Analysis for the Behavioral
Sciences. Mahwah, NJ: Lawrence Erlbaum Publishers.
This book is data oriented and presents an
excellent nonmathematical approach to data
analysis. It is aimed at least at a graduate-level
course in statistics, but is also an invaluable
reference for those wanting more depth in this
area.
Online Resources
The following site provides you with simulations
or demonstrations for almost all topics found in
the textbook, as well as additional information
about each topic:
http://onlinestatbook.com/stat_sim/index.html. The
"regression by eye" simulation is a good support
for students as they learn to visually grasp
regression.
The following is the award-winning Web
Experimental Psychology Lab site, home of the
"Magic" experiment:
http://www.psychologie.uzh.ch/sowi/Ulf/Lab/WebExpPsyLab.html.
There are a number of fun experiments that your
students can explore, including ranking
probability terms and learning via tutorial
dialogues.
PLEASE NOTE: Due to formatting, the Handouts are only available in Adobe
PDF®.