Introduction to SPSS
6 Linear Regression
Version: 1.0
September 2015

Table of Contents
1 Introduction
2 Simple Linear Regression Example
3 Simple Linear Regression on SPSS
3.1 Scatter Plots
3.2 Adding a regression line to your scatterplot
3.3 Regression Analysis
4 Multiple Linear Regression
4.1 Assumptions
4.2 Multiple Linear Regression Model
4.3 Multiple Linear Regression on SPSS

1 Introduction

Linear regression is used to examine the relationship between a dependent variable and an independent (predictor) variable. Linear regression enables you to find the equation with which you can best predict scores on the dependent variable from scores on the predictor variable.

2 Simple Linear Regression Example

Linear regression is defined as (from Wikipedia): “an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple linear regression.”

A company wants to investigate the relationship between monthly income and age, and has invited 30 participants; the data are saved in the file ‘r_1.sav’. Practice files that accompany this guide are available on the ISS website at www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip - you can download them and extract them to your computer.

3 Simple Linear Regression on SPSS

3.1 Scatter Plots

It is always useful to examine the data visually; you will often notice obvious relationships between variables before you build your model. To produce a scatterplot of two variables with the dependent variable on the Y axis (vertical axis) and the independent variable on the X axis (horizontal axis), do the following:

1. Open ‘r_1.sav’ from your workspace - it can be downloaded from www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip
2. From the top menu, select ‘Graphs’, ‘Legacy Dialogs’ and then ‘Scatter/Dot’.
3. From the dialog box, select ‘Simple Scatter’ and click ‘Define’.
4. Select the variable you want to plot on the Y axis (the dependent variable) by clicking on it, then click the first right arrow to move it into the ‘Y Axis’ box on the right.
5. Select the variable you want to plot on the X axis (the independent variable) by clicking on it, then click the second right arrow to move it into the ‘X Axis’ box. (If you want to plot a Z axis as well, change the ‘2-D Coordinate’ setting to ‘3-D Coordinate’, then drag the variable you want on the Z axis into the white box inside the diagonal line.)
6. Click ‘OK’.
7. The scatter plot will appear in the Output window.

3.2 Adding a regression line to your scatterplot

1. Double click on your scatterplot to open it in the Chart Editor and maximize the window.
2. Right click on the scatter plot and you will see a dropdown menu. Click on ‘Add Fit Line at Total’.
3. A regression line will appear on the scatter plot. At the same time, a Properties window will open; you can edit the colours of the regression line and the scatter dots from this window.

3.3 Regression Analysis

1. From ‘Analyze’, select ‘Regression’ and then ‘Linear’. You will be presented with a dialogue box listing the variables in your data set in a column on the left.
2. Select the dependent variable by clicking on it, and move it to the box labelled ‘Dependent’ by clicking the right arrow. Select the independent (predictor) variable you want to use to predict the dependent variable by clicking on it, and move it to the box labelled ‘Independent’ by clicking the right arrow. Then click ‘OK’.
3. The regression results will be shown in the ‘Statistics Viewer’ window; the output is discussed later in this guide.
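If you prefer to work with SPSS syntax, the ‘Paste’ button in each dialogue box writes the equivalent commands to a Syntax Editor window instead of running them straight away. As a rough sketch, the syntax for the scatterplot and the simple regression above would look something like the commands below; the variable names age and income are assumptions made here for illustration, so substitute the names actually used in ‘r_1.sav’.

* Scatterplot with age on the X axis and income on the Y axis (assumed variable names).
GRAPH
  /SCATTERPLOT(BIVAR)=age WITH income.

* Simple linear regression predicting income from age (assumed variable names).
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT income
  /METHOD=ENTER age.

Running these commands (Run, then All, in the Syntax Editor) produces the same scatterplot and regression tables as the dialogue boxes.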
4 Multiple Linear Regression

The previous section introduced the simplest case of linear regression, where there are only two variables in the data set. In this section, we extend the analysis to a data set that contains more than two variables: multiple linear regression. Multiple linear regression is probably one of the most popular statistical analyses. It is used to determine the importance of variables and to understand how two or more variables are related in the context of a model.

4.1 Assumptions

A regression model works best when its assumptions are met:

1. Multiple regression works best under the condition of proper model specification; essentially, you should have all the important variables in the model and no unimportant variables in the model. Literature reviews of the theory and variables of interest pay big dividends when conducting regression.
2. Regression works best when there is a lack of multicollinearity (multicollinearity means that predictor variables are too strongly related to one another, which degrades regression’s ability to discern which variables are important to the model).
3. Regression is designed to work best with linear relationships.
4. Regression is designed to work with continuous or nearly continuous data.
5. Categorical predictors need to be coded using special strategies in order to be included in a regression model and produce meaningful, interpretable output.
6. Regression works best when outliers are not present. Thorough initial data analysis should be used to review the data, identify outliers (both univariate and multivariate), and take appropriate action.

4.2 Multiple Linear Regression Model

A company wants to investigate the productivity of the sales managers in the corporation. 70 sales managers were evaluated, and the evaluation includes 6 variables that quantify different aspects of managerial performance. You are asked to use this information, stored in ‘multiple.sav’, to develop a model that predicts a manager’s evaluation.

4.3 Multiple Linear Regression on SPSS

1. Open ‘multiple.sav’ from your workspace - if needed, download it from the ISS website at www.lancaster.ac.uk/iss/info/IThandouts/spss/SPSSfiles.zip
2. From the top menu, select ‘Analyze’, ‘Regression’ and then ‘Linear…’.
3. From the ‘Linear Regression’ dialogue box, highlight your dependent variable and use the top arrow button to move it to the ‘Dependent’ box.
4. Then highlight all the predictor variables and use the second arrow to move them into the ‘Independent(s)’ box.
5. From the ‘Linear Regression’ dialogue box, click the ‘Statistics’ button on the right. Select ‘Confidence intervals’, ‘Covariance matrix’, ‘Descriptives’ and ‘Part and partial correlations’, then click the ‘Continue’ button at the bottom to close this dialogue box.
6. In order to understand the performance of the model, we can ask SPSS to draw some diagnostic plots. From the ‘Linear Regression’ dialogue box, click the ‘Plots’ button on the right, highlight ‘*ZRESID’ and move it into ‘Y’, and highlight ‘DEPENDENT’ and move it into ‘X’.
7. Click the ‘Next’ button, then select ‘Histogram’ and ‘Normal probability plot’. Then click the ‘Continue’ button.
8. In the ‘Linear Regression’ dialogue box, click the ‘Save’ button. Select the results you would like to save in the ‘Linear Regression: Save’ dialogue box. For example, for predicted values we select ‘Unstandardized’ and ‘Standardized’, and for ‘Residuals’ we select ‘Standardized’. In the ‘Distances’ section, we can select ‘Mahalanobis’ to check for multivariate outliers. In the ‘Prediction Intervals’ section, you can enter the confidence level you need for your analysis. Then click ‘Continue’.
9. Click ‘OK’. In the ‘Statistics Viewer’ window you should see results similar to those described below.
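Equivalently, clicking ‘Paste’ rather than ‘OK’ in the ‘Linear Regression’ dialogue box writes the whole analysis to a Syntax Editor window. With the options chosen in steps 3 to 8, the pasted syntax would look roughly like the sketch below; evaluation and x1 to x6 are placeholder names for the dependent variable and the six performance variables in ‘multiple.sav’, and the exact subcommands may differ slightly between SPSS versions.

* Multiple regression predicting evaluation from six performance variables (placeholder names).
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /STATISTICS COEFF OUTS CI(95) BCOV R ANOVA ZPP
  /DEPENDENT evaluation
  /METHOD=ENTER x1 x2 x3 x4 x5 x6
  /SCATTERPLOT=(*ZRESID ,DEPENDNT)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SAVE PRED ZPRED ZRESID MAHAL.

Here the /STATISTICS keywords correspond to the tick boxes in step 5 (CI for confidence intervals, BCOV for the covariance matrix, ZPP for part and partial correlations), /SCATTERPLOT and /RESIDUALS correspond to the plots requested in steps 6 and 7, and /SAVE corresponds to the values saved in step 8.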
Variables Entered/Removed and Model Summary tables
The Variables Entered/Removed table lists the predictors entered into the model. The Model Summary table reports the multiple correlation (R) and the multiple correlation squared (R Square); the multiple correlation squared represents the proportion of variance in the outcome accounted for by the predictors combined.

Coefficients table
The Coefficients table gives the information you need to construct your regression equation. The column labelled B gives your regression coefficients (the slopes) and your constant term (the Y intercept). The column labelled Beta gives the coefficients that would be used if all scores were converted to standardized scores (z-scores); in simple linear regression the standardized constant is always zero and the standardized regression coefficient equals the correlation between the two variables.

Correlation matrix table
The correlation matrix table gives the correlation, p-value and number of observations for each pair of variables in the model. If you have unequal numbers of observations for the pairs, SPSS will remove cases from the regression analysis. This table tells us whether we have multicollinearity.

The histogram of the standardised residual values gives us a rough idea of whether the model is a good one. We would expect the histogram to be normally, or approximately normally, distributed around a mean of zero.

Finally, we have the Normal P-P Plot of Regression Standardized Residuals. We expect the values to lie close to the reference line.

Exercise
Can you write down the model for evaluation? (A sketch of the general form is given in the note at the end of this guide.)

For further training materials, see the ISS Training website: http://www.lancaster.ac.uk/iss/training/materials/spss

Information System Services (ISS), Lancaster University
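Note on the exercise: the model is assembled from the B column of the Coefficients table and, with six predictors, takes the general form

Evaluation = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5 + b6*X6

where b0 is the value in the (Constant) row and b1 to b6 are the B values for each of the six performance variables. X1 to X6 are placeholders for the variable names in ‘multiple.sav’; the actual coefficient values come from your own output.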