Lab #12
TESTS FOR RELATIONSHIPS:
MULTIPLE REGRESSION
A natural extension of bivariate regression is the step to multiple regression. In this
application, the word multiple means more than one independent variable, or multi-factor.
As you are well aware by now, most effects in the social world have more than one cause,
and most of the time those multiple causes are not additive; instead they interact with one
another to produce an overall outcome. For instance, a country may have a very high per
capita gross domestic product, possibly due to a recently discovered natural resource (such
as oil), but if the dietary habits of its citizens are very poor, then the high per capita GDP
will not have a strong effect in reducing infant mortality rates. SPSS can easily handle
additional independent variables in the OLS regression test, and the reporting is only slightly
adjusted to accommodate the additional variables.
As always, state your independent variables first. Let's expand a research question
and read the output for a model that includes multiple independent variables:
"Is there a relationship between a country's population,
per capita gross domestic product, and daily calories per person, with the
infant mortality rate?"
The independent variables are listed in series; the dependent variable is stated last.
Go to the dialog box for linear regression as you did before, but add all of the
independent variables into the box:
One part of the output looks like this:
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .793a   .630       .614                24.0680

a. Predictors: (Constant), Gross domestic product / capita, Population in thousands,
Daily calorie intake
R SQUARE: Your report on R² will sound much the same as it does in a bivariate regression,
except that you will be reporting on the Adjusted R² in a multiple regression model. The
adjusted R² corrects R² for the number of independent variables in the research model, so
we'll refer to all of them in our report: "Population, per capita GDP, and daily calorie
intake per person together explain about 61% of the variance in infant mortality rates of
the world's nations." In a multiple regression you are stating how much of the variance in
the dependent variable is explained by all of the independent variables in the model.
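To see what the adjustment does, here is a minimal Python sketch of the standard adjusted R² formula. The R Square value (.630) and the number of predictors (3) come from the Model Summary table. The sample size of 74 is a hypothetical value, chosen only because it is consistent with the reported figures; the output shown here does not report the number of countries.

```python
def adjusted_r_squared(r_squared: float, n_cases: int, n_predictors: int) -> float:
    """Shrink R-squared to account for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# r_squared and n_predictors are read from the Model Summary table.
# n_cases=74 is HYPOTHETICAL: the SPSS output shown does not report it.
adjusted = adjusted_r_squared(0.630, n_cases=74, n_predictors=3)
print(round(adjusted, 3))
```

Notice that the adjusted value is always a little lower than R Square, and the gap grows as you add independent variables without adding cases.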
Coefficients(a)

                                   Unstandardized Coefficients   Standardized Coefficients
Model 1                            B           Std. Error        Beta                        t        Sig.
(Constant)                         166.171     18.323                                        9.069    .000
Population in thousands            2.669E-6    .000              .012                        .166     .869
Gross domestic product / capita    -.001       .001              -.242                       -2.208   .030
Daily calorie intake               -.041       .007              -.594                       -5.429   .000

a. Dependent Variable: Infant mortality (deaths per 1000 live births)
SLOPE: We report on the slope for each independent variable separately, but we have to
add an important qualifier each time we state it. In this example we will also have
to unravel the scientific notation in the first regression coefficient. Simply move the decimal
point the number of places noted after the "+" or "-" sign following the 'E' in the regression
coefficient: to the right for "+" and to the left for "-" (in the case of Population in
thousands, -6 means six places to the left).
2.669E-6 = .000002669
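If you would rather not count decimal places by hand, the conversion can be checked with a few lines of Python. This is only a sketch using Python's standard decimal module; the function name expand_scientific is made up for this illustration.

```python
from decimal import Decimal


def expand_scientific(text: str) -> str:
    """Write a number given in E-notation as an ordinary decimal string."""
    return format(Decimal(text), "f")


# The B value for "Population in thousands", exactly as SPSS prints it.
print(expand_scientific("2.669E-6"))
```

The printed value has five zeros between the decimal point and the digits 2669, matching the hand conversion above.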
Also, this is where it is necessary to know your units of measurement. In this case,
population is measured in thousands, so one unit of the variable is 1,000 people.
The multiplication rule: It would be nearly impossible to state the slope for population the
way it is currently presented: For each additional thousand people in a nation, it is
predicted there would be a .0000027 increase in infant deaths per 1000 births. That is just
unmanageable. We can solve this problem by multiplying both variables by powers of 10
until the sentence has a more useful interpretation. The important rule to follow is:
By whatever amount one variable is multiplied, the other variable must also be multiplied
by that same amount. Another more general rule is to examine the range of values for the
independent variable and make a judgment about reasonable increases. So, we start
multiplying until we have numbers that make more sense:
Multiplier    I.V.                  D.V.
x10           10,000 people         .000027 infant deaths
x100          100,000               .00027
x1000         1,000,000             .0027
x10,000       10,000,000            .027
x100,000      100,000,000           .27
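The multiplication rule can be sketched in Python as follows. The slope of .0000027 is the rounded value used in the text and the unit of 1,000 people comes from the variable's label; the function name rescale is made up for this illustration. Decimal is used instead of ordinary floating point so the small numbers print exactly.

```python
from decimal import Decimal

SLOPE_PER_UNIT = Decimal("0.0000027")  # infant deaths per 1000 births (rounded slope)
UNIT_SIZE = 1_000                      # one unit of the I.V. is 1,000 people


def rescale(multiplier: int):
    """Multiply the I.V. increment and the slope by the same amount."""
    return UNIT_SIZE * multiplier, SLOPE_PER_UNIT * multiplier


for multiplier in (10, 100, 1_000, 10_000, 100_000):
    people, deaths = rescale(multiplier)
    # normalize() only drops trailing zeros before printing
    print(f"x{multiplier:,}: {people:,} people -> {deaths.normalize():f} infant deaths")
```

Because both sides are multiplied by the same amount, every row describes exactly the same relationship; only the size of the step changes.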
If we were to look at the X axis of a histogram of the population variable, we could see
that increments of 10 million or 100 million make sense.
So, it would sound like this: "For each additional 10 million people in the population of a
nation, there is predicted to be a .027 increase in infant deaths per 1000 births (p = .869);" or,
"For each additional 100 million people in the population of a nation, there is predicted to be
a .27 increase in infant deaths per 1000 births (p = .869)" ... but there's more.
There is an important addition we must include in our statement for a multiple
regression model:
It sounds like this: "For each additional 10 million people in the population of a nation, there
is predicted to be a .027 increase in infant deaths per 1000 births, holding constant for per
capita GDP and daily calories per person (p= .869)." The qualifier, "holding constant
for..." is the verbal way of accounting for the addition of extra independent variables in the
regression model.
Each independent variable is handled separately in the same fashion. In the next case, per
capita gross domestic product is measured in dollars. If we read the output as it is, we would
state, "For each additional dollar in a country's per capita gross domestic product it is
predicted there would be a .001 decrease in infant deaths per 1000 births." Even though the
relationship is statistically significant, .001 deaths per 1000 births appears almost
imperceptible. This is because the independent variable, per capita gross domestic product,
is being measured in $1 increments, which is also imperceptibly small. In this case (not
otherwise shown in the output), per capita gross domestic product has a range of $122
(Ethiopia) to $23,474 (United States) with a mean of $5860. Based on this range, it is
reasonable to use $1000 increments to report the slope. So let's restate our slope value in
this more reasonable way: "For each additional $1000 in a country's per capita gross domestic
product it is predicted there would be a decrease of 1.0 infant deaths per 1000 births,
holding constant for population and daily calorie intake (p = .03)."
We arrived at the number "1.0" by multiplying the regression coefficient by 1000 (1000 x
.001 = 1.0), the same amount we used to enhance the independent variable. This satisfies
another goal of demographers and others who use numbers like this: Use the multiplication
rule to move the decimal point until you get a whole number in one or both of the
reported variables. It isn't always possible to get whole numbers, but it is a general goal to
approach.
And finally: "For each additional 100 calories in the daily diets of the citizens in a nation,
the number of infant deaths is predicted to decrease by 4.1 per 1000 births, holding constant
for per capita GDP and population (p < .001)." We multiplied both calories and the slope for
infant mortality by 100 (100 x .041 = 4.1).
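To see what "holding constant" means in practice, here is a minimal Python sketch that plugs the rounded coefficients from the Coefficients table into the regression equation. The country values used below (50 million people, $5,000 per capita GDP, 2,500 daily calories) are hypothetical and chosen only for illustration, and the function name is made up for this example.

```python
# Coefficients as printed (rounded) in the Coefficients table.
INTERCEPT = 166.171
B_POPULATION = 2.669e-6  # per 1,000 people
B_GDP = -0.001           # per $1 of per capita GDP
B_CALORIES = -0.041      # per daily calorie per person


def predicted_infant_mortality(population_thousands, gdp_per_capita, daily_calories):
    """Predicted infant deaths per 1000 live births from the fitted equation."""
    return (INTERCEPT
            + B_POPULATION * population_thousands
            + B_GDP * gdp_per_capita
            + B_CALORIES * daily_calories)


# HYPOTHETICAL country: 50 million people, $5,000 per capita GDP, 2,500 calories.
base = predicted_infant_mortality(50_000, 5_000, 2_500)
richer = predicted_infant_mortality(50_000, 6_000, 2_500)      # +$1000, others unchanged
better_fed = predicted_infant_mortality(50_000, 5_000, 2_600)  # +100 calories, others unchanged

print(round(richer - base, 1))      # change for +$1000 in per capita GDP
print(round(better_fed - base, 1))  # change for +100 daily calories
```

Only one independent variable changes in each comparison while the other two keep the same values, which is exactly what the qualifier "holding constant for..." describes.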
Another value of multiple regression models is that we can determine which independent
variables are strongest and which are weakest in predicting the outcome. We do this by
referring to the Probability (Sig.) column. In this case we can judge that population has the
weakest predictive value; in fact, its regression coefficient is not even statistically significant
(p = .869). We can further determine that daily calories per person has the strongest
predictive value because its p value is less than .001 (remember that there is no probability
that equals zero, so a Sig. of .000 is reported as p < .001). So the strength of an independent
variable in a regression model can be compared to any other independent variable by
comparing the significance of their regression coefficients. The Standardized Coefficients
(Beta) column gives the same ordering here: daily calorie intake (-.594) is largest in absolute
size, per capita GDP (-.242) is next, and population (.012) is smallest.
Twelfth (and last) Lab Assignment
(worth 5 points)
Go to "PASW 17.0 for Windows" and open any data file that you find interesting and that has
the appropriate variables to complete this assignment. Produce a table of "Descriptives" so
that you can study the continuous variables in the data set. Be certain that you are clear on
the units of analysis and on your units of measurement. Use only continuous variables.
1. Produce 1 multiple regression using at least 3 independent variables.
2. Include the ‘Descriptives’ table; the ‘Model Summary’ table; and the ‘Coefficients’
table.
3. Report on the R2 for the whole model.
4. Report separately for each slope.
This assignment is due on Tuesday, March 12th at 3 pm (on Turnitin.com). There is a penalty
for late assignments. No labs accepted after March 19th.
YOU MUST DO YOUR OWN WORK ON ALL LAB ASSIGNMENTS. You are
welcome to ask for assistance on assignments, and you may discuss the course material with
other students; however, when you begin to follow the guidelines for the assignment you
must work alone and hand in a work product that you accomplished by yourself. Handing in
another person's work will constitute cheating and could result in expulsion from the course.