Linear Regression - Lancaster University

Introduction to SPSS 6
Linear Regression
Version: 1.0 September 2015
Table of Contents
1 Introduction
2 Simple Linear Regression Example
3 Simple Linear Regression on SPSS
3.1 Scatter Plots
3.2 Adding a regression line to your scatterplot
3.3 Regression Analysis
4 Multiple Linear Regression
4.1 Assumptions
4.2 Multiple Linear Regression Model
4.3 Multiple Linear Regression on SPSS
1 Introduction
Linear regression is used to examine the relationship between a dependent variable and an
independent or predictor variable. Linear regression enables you to find the equation by
which you can best predict scores on the dependent variable from scores on the predictor
variable.
2 Simple Linear Regression Example
Linear regression is defined as (from Wikipedia): “an approach for modelling the
relationship between a scalar dependent variable y and one or more explanatory variables
denoted X. The case of one explanatory variable is called simple linear regression.”
A company wants to investigate the relationship between monthly income and age, and has
invited 30 participants. The data are saved in the file ‘r_1.sav’.
Practice files that accompany the guide are available on the ISS website at:
www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip - you can download them and
extract them to your computer.
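To make the idea of a best-predicting equation concrete, the least-squares line for one predictor can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure SPSS runs internally; the function name and the age and income values are hypothetical and are not taken from ‘r_1.sav’.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical ages and monthly incomes (NOT the values in 'r_1.sav').
age = [25, 30, 35, 40, 45]
income = [2000, 2400, 2700, 3100, 3300]

intercept, slope = fit_simple_regression(age, income)
# Use the fitted equation to predict income for a 50-year-old.
predicted_income_at_50 = intercept + slope * 50
print(f"income = {intercept:.1f} + {slope:.1f} * age")
```

The slope is the predicted change in the dependent variable for a one-unit change in the predictor, and the intercept is the predicted value when the predictor is zero.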
3 Simple Linear Regression on SPSS
3.1 Scatter Plots
It is always useful to examine the data visually: you may notice some obvious
relationships between variables before you build your mathematical models.
To produce a scatterplot of two variables with the dependent variable on the Y-axis
(vertical axis) and the independent variable on the X-axis (horizontal axis), do the following:
1. Open ‘r_1’ from your workspace – can be accessed from
www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip
2. From the top menu, select ‘Graphs’, ‘Legacy Dialogs’ and then ‘Scatter/Dot’
3. From the dialog box, select ‘Simple Scatter’ and click ‘Define’:
4. Select the variable you want to plot on the Y axis (dependent variable) by clicking on
the variable and click the first right arrow to put it into ‘Y-axis’ on the right.
5. Select the variable you want to plot on the X axis (independent variable) by clicking
on the variable and then click the second right arrow to put it into ‘X-axis’.
(If you want to graph onto a z-axis as well, change the "2-D Coordinate" setting to
"3-D Coordinate". Then, drag the variable you want to plot on the z-axis into the
white box inside of the diagonal line.)
6. Click ‘OK’
7. The scatter plot will then appear in the Output window.
3.2 Adding a regression line to your scatterplot
1. Double click on your scatterplot and maximize the window.
2. Right click on the scatter plot, and you will see a dropdown menu. Click on ‘Add fit
line at total’
3. A regression line will then appear on the scatter plot. At the same time, a Properties
window will appear, in which you can edit the colours of the regression line and
scatter dots.
3.3 Regression Analysis
1. From ‘Analyze’, select ‘Regression’ and then ‘Linear’
You will be presented with a dialog box listing the variables in your data set in a
column on the left.
2. Select the dependent variable by clicking on it, and move it to the box labelled
‘Dependent’ by clicking the right arrow. Select the independent or predictor
variable you want to use to predict the dependent variable by clicking on it, and
move it to the box labelled ‘Independent’ by clicking the right arrow. Then click
‘OK’
3. The regression results will be shown in the ‘Statistics Viewer’ window; the output
is discussed in the next section.
4 Multiple Linear Regression
The previous section introduced the simplest case of linear regression, where there are only
two variables in the data set. In this section, we extend the analysis to a data set that
contains more than two variables: multiple linear regression.
Multiple linear regression is probably one of the most popular statistical analyses. It is
used to determine the importance of variables, and to understand how two or more
variables are related in the context of a model.
4.1 Assumptions
A regression model works best when the following assumptions hold:
1. Multiple regression works best under the condition of proper model specification;
essentially, you should have all the important variables in the model and no unimportant variables in the model. Literature reviews on the theory and variables of
interest pay big dividends when conducting regression.
2. Regression works best when there is a lack of multicollinearity (multicollinearity
means that predictor variables are too strongly related to one another, which
degrades regression's ability to discern which variables are important to the model).
3. Regression is designed to work best with linear relationships.
4. Regression is designed to work with continuous or nearly continuous data.
5. Categorical predictors need to be coded using special strategies in order to be
included into a regression model and produce meaningful interpretive output.
6. Regression works best when outliers are not present. Thorough initial data analysis
should be used to review the data, identify outliers (both univariate and
multivariate), and take appropriate action.
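Point 5 above mentions coding categorical predictors. One common strategy is dummy (indicator) coding, sketched here in Python as a minimal illustration. The function name, the ‘region’ variable and its levels are hypothetical, and dummy coding is only one of several possible coding schemes.

```python
def dummy_code(values, reference):
    """Return {level: 0/1 indicator list} for every level except the reference level."""
    # The reference level gets no column: it is the case where all indicators are 0.
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical categorical predictor; 'region' is not a variable from this guide.
region = ["north", "south", "east", "south", "north"]
indicators = dummy_code(region, reference="north")

for level, column in indicators.items():
    print(level, column)
```

A predictor with k categories becomes k - 1 indicator variables, each of which can be entered into the regression as an ordinary numeric predictor.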
4.2 Multiple Linear Regression Model
A company wants to investigate the productivity of the sales managers in the corporation.
70 sales managers were evaluated, and the evaluation includes 6 variables that quantify
different aspects of managerial performance. The manager wants you to use this
information, stored in ‘multiple.sav’, to develop a model that predicts a manager’s evaluation.
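As a rough sketch of what such a model involves, the Python below fits a multiple regression by solving the normal equations with plain lists. It is an illustration under stated assumptions, not the algorithm SPSS uses: the function names are hypothetical, and the data are made up (two predictors, five cases) rather than read from ‘multiple.sav’.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple_regression(predictors, outcome):
    """Return [b0, b1, ..., bk] minimising squared error, via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 gives the constant term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical scores on two predictors for five cases (not from 'multiple.sav').
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
# Outcome constructed as 1 + 2*x1 + 3*x2, so the true coefficients are known.
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefficients = fit_multiple_regression(predictors, outcome)
print([round(c, 6) for c in coefficients])
```

Because the outcome was built from a known equation with no noise, the fitted constant and slopes should recover that equation up to rounding error.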
4.3 Multiple Linear Regression on SPSS
1. Open ‘multiple.sav’ from your workspace – if needed, extract from the ISS website
at www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip
2. From the top menu, select ‘Analyze’, ‘Regression’ and then ‘Linear…’
3. From the ‘Linear Regression’ dialogue box, highlight your dependent variable, and
use the top arrow button to move it to the ‘Dependent’ box.
4. Then highlight all the predictor (independent) variables and use the second arrow to
move them into the ‘Independent’ box.
5. Then, from the ‘Linear Regression’ dialogue box, click the ‘Statistics’ button on the
right. Select ‘Confidence Interval’, ‘Covariance matrix’, ‘Descriptives’ and ‘Part and
partial correlations’. Then click the ‘Continue’ button at the bottom to exit this
dialogue box.
6. In order to understand the performance of this model, we can ask SPSS to draw
some diagrams. From the ‘Linear Regression’ dialogue box, click the ‘Plots’ button on
the right, highlight ‘*ZRESID’ and put it into ‘Y’, then highlight ‘DEPENDNT’ and put
it into ‘X’.
7. Click on the ‘Next’ button, then select ‘Histogram’ and ‘Normal probability plot’.
Then click ‘Continue’ button.
8. In the ‘Linear Regression’ dialogue box, click the ‘Save’ button. Select the results you
would like to save in the ‘Linear Regression: Save’ dialogue box. For example, under
‘Predicted Values’ we select ‘Unstandardized’ and ‘Standardized’, and under
‘Residuals’ we select ‘Standardized’.
In the ‘Distances’ section, we can select ‘Mahalanobis’ to check for multivariate
outliers.
In the ‘Prediction Intervals’ section, you can enter the confidence level you
need for your analysis.
Then click ‘Continue’.
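For readers curious about what a Mahalanobis distance measures, here is a minimal Python sketch for the two-variable case. It computes the squared distance of each case from the centroid, taking the spread and correlation of the variables into account. The function name and data are hypothetical, and whether these values match the ones SPSS saves exactly is an assumption that has not been checked here.

```python
from statistics import mean

def squared_mahalanobis_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d, with the 2x2 inverse covariance matrix written out explicitly.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical cases; the last one has an unusually large second value.
points = [(1, 2), (2, 3), (3, 4.5), (4, 5), (5, 6.5), (3, 12)]
d2 = squared_mahalanobis_2d(points)
most_extreme = points[d2.index(max(d2))]
print(most_extreme, round(max(d2), 2))
```

Cases with a large distance relative to the rest are candidates for multivariate outliers and deserve a closer look before the model is interpreted.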
9. Click ‘OK’. In the ‘Statistics Viewer’ window, you should see results similar to those
described below:

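Variables Entered/Removed and Model Summary tables
The Variables Entered/Removed table lists the predictors included in the model. The
Model Summary table reports the multiple correlation (R) and the multiple correlation
squared (R Square). The multiple correlation is the combined correlation of the
predictors with the outcome; the multiple correlation squared represents the
proportion of variance in the outcome that the predictors account for.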
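The relationship between the multiple correlation and the variance accounted for can be sketched in Python as follows. The numbers are hypothetical: the predicted values are the least-squares fitted values for made-up data (x = 1 to 5, y = 2, 4, 5, 4, 5), not output from ‘multiple.sav’.

```python
from math import sqrt
from statistics import mean

def r_squared(observed, predicted):
    """Proportion of outcome variance accounted for by the fitted values."""
    y_bar = mean(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_residual / ss_total

# Hypothetical outcome values and their least-squares fitted values.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(observed, predicted)
multiple_r = sqrt(r2)  # multiple correlation R is the square root of R squared
print(round(multiple_r, 3), round(r2, 3))
```

This identity between 1 minus the residual share and the squared correlation holds for a least-squares model that includes a constant term, which is the default model discussed in this guide.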

The Coefficients table gives the information you need to construct your regression
equation.
The first column, labelled B, gives your regression coefficients (the slope for each
predictor) and your constant term (the Y intercept).


The column labelled Beta gives the coefficients that would be used if all scores were
converted to standardized scores (z-scores). In that case the constant is always zero and,
in simple regression with a single predictor, the regression coefficient equals the
correlation between the two variables.
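The link between B, Beta and the correlation can be checked numerically for the single-predictor case. The sketch below uses hypothetical data and hypothetical function names; it rescales the unstandardized slope by the ratio of standard deviations and compares the result with Pearson's r.

```python
from math import sqrt
from statistics import mean, stdev

def slope_and_beta(x, y):
    """Return the unstandardized slope B and the standardized coefficient Beta."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b = sxy / sxx
    # Beta is the slope after both variables are converted to z-scores.
    beta = b * stdev(x) / stdev(y)
    return b, beta

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with one predictor.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, beta = slope_and_beta(x, y)
r = pearson_r(x, y)
print(round(b, 3), round(beta, 3), round(r, 3))
```

With more than one predictor, each Beta is still a slope on the z-score scale, but it generally no longer equals the simple correlation between that predictor and the outcome.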
Correlation matrix table
The correlation matrix table gives the correlation, p-value and number of observations
for each pair of variables in the model. If the number of observations differs between
pairs because of missing values, SPSS will remove cases from the regression analysis.
This table helps us judge whether we have multicollinearity: very high correlations
between predictors are a warning sign.
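A simple way to screen a correlation matrix for this problem is to list the pairs of predictors whose correlation is very high. The Python sketch below does this for hypothetical data; the variable names and the 0.8 cutoff are assumptions chosen for illustration, not values given in this guide.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def high_correlations(columns, threshold=0.8):
    """List (name_a, name_b, r) for predictor pairs with |r| at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical predictors; x2 is nearly a multiple of x1.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [5, 1, 4, 2, 6, 3],
}
flagged = high_correlations(columns)
print(flagged)
```

A flagged pair suggests that the two predictors carry largely the same information, so it may be worth dropping or combining one of them.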
The histogram of the standardised residuals gives us a rough idea of whether the model
is a good one. We would expect the residuals to be normally, or approximately
normally, distributed around a mean of zero.
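The quantity behind this histogram can be sketched in Python: each residual is divided by the standard error of the estimate, and the results are then counted in unit-wide bins. The data are hypothetical (least-squares fitted values for a made-up single-predictor example), and the function name is an assumption for illustration.

```python
from collections import Counter
from math import floor, sqrt

def standardized_residuals(observed, predicted, n_predictors):
    """Residuals divided by the standard error of the estimate."""
    residuals = [y - p for y, p in zip(observed, predicted)]
    degrees_of_freedom = len(observed) - n_predictors - 1
    standard_error = sqrt(sum(e ** 2 for e in residuals) / degrees_of_freedom)
    return [e / standard_error for e in residuals]

# Hypothetical outcome values and their least-squares fitted values (one predictor).
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

z = standardized_residuals(observed, predicted, n_predictors=1)
mean_z = sum(z) / len(z)
# Coarse histogram: how many standardized residuals fall in each unit-wide bin.
bin_counts = Counter(floor(value) for value in z)
print([round(value, 3) for value in z], dict(bin_counts))
```

With only five cases the bin counts say little about normality; with a realistic sample the counts should pile up around zero and thin out symmetrically on both sides.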
Finally, we have the Normal P-P Plot of Regression Standardized Residual values. We expect
the values to be close to the reference line.
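The points on a P-P plot can be sketched in Python: each standardized residual is ranked, and its observed cumulative proportion is paired with the cumulative probability expected under a standard normal distribution. The residual values are hypothetical, and the plotting position (rank - 0.5) / n is an assumption; SPSS may use a different formula for the observed proportions.

```python
from statistics import NormalDist

def pp_plot_points(standardized_residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    z_sorted = sorted(standardized_residuals)
    n = len(z_sorted)
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    points = []
    for rank, z in enumerate(z_sorted, start=1):
        observed_cum_prob = (rank - 0.5) / n  # assumed plotting position
        expected_cum_prob = normal.cdf(z)
        points.append((observed_cum_prob, expected_cum_prob))
    return points

# Hypothetical standardized residuals.
z = [-0.894, 0.671, 1.118, -0.671, -0.224]
points = pp_plot_points(z)
# Largest vertical distance between a point and the reference line y = x.
largest_gap = max(abs(observed - expected) for observed, expected in points)
for observed, expected in points:
    print(round(observed, 2), round(expected, 3))
```

If the residuals are close to normal, the observed and expected values in each pair are similar, which is what it means for the points to lie near the reference line.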
Exercise
Can you write down the model for evaluation?
For further training materials, see the ISS Training website:
http://www.lancaster.ac.uk/iss/training/materials/spss
Information System Services (ISS), Lancaster University