12. SIMPLE LINEAR REGRESSION

INTRODUCTION
Regression analysis enables you to develop a model that predicts the value of a numerical variable from the values of other variables. In simple linear regression, there is a single such variable.
Dependent variable = The variable you are trying to predict.
Independent variable = The variable you are using to predict the dependent variable.

REGRESSION EQUATION
Y = β0 + β1X1 + ε
Y = The predicted value of the dependent variable
β0 = The intercept
β1 = The slope
X1 = The value of the independent variable
ε = The error

EXAMPLE
Store sales by size in square feet.
Y = β0 + β1X1
β0 = 0.9645
β1 = 1.6699
Square feet are given in thousands in the data, so X1 is entered in thousands. Annual sales are given in millions in the data, so Y is in millions of dollars.

Suppose you want to estimate sales for a new store with 2,000 square feet.
X1 = 2 (2,000 square feet)
Y = 0.9645 + 1.6699*2 = 4.3043 ($4,304,300 per year)

Suppose you want to estimate sales for a new store with 3,000 square feet.
X1 = 3 (3,000 square feet)
Y = 0.9645 + 1.6699*3 = 5.9742 ($5,974,200 per year)

Suppose you want to estimate sales for a new store with 4,000 square feet.
X1 = 4 (4,000 square feet)
Y = 0.9645 + 1.6699*4 = 7.6441 ($7,644,100 per year)

EXAMPLE: SUNFLOWERS APPAREL
We wish to predict annual sales from store size in square feet, based on a sample of 14 stores.
Data in SITE.XLS.
[Figure: scatter plot of annual sales vs. store size]
[Figure: the data in Excel]

INTERPRETATION
Interpreting the Y intercept (β0) and the slope (β1)
β0: If a store had 0 square feet, the model would predict annual sales of $0.9645 million ($964,500). Of course, this is meaningless because no store can have 0 square feet. The stores in the sample ranged from 1.1 thousand (1,100) to 5.8 thousand (5,800) square feet, and using the equation outside that range is dangerous.
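The predictions above can be sketched as a small Python function. It assumes the fitted Sunflowers Apparel coefficients (β0 = 0.9645, β1 = 1.6699) and the sampled range of 1.1 to 5.8 thousand square feet. The names `predict_sales`, `INTERCEPT`, `SLOPE`, `MIN_SIZE` and `MAX_SIZE` are hypothetical, chosen for illustration. The function converts the input to the data's units (thousands of square feet in, millions of dollars out) and refuses to extrapolate beyond the sample.

```python
# Fitted coefficients from the Sunflowers Apparel output (SITE.XLS).
INTERCEPT = 0.9645  # beta0, in millions of dollars
SLOPE = 1.6699      # beta1, millions of dollars per thousand square feet

# Range of store sizes in the sample, in thousands of square feet.
MIN_SIZE, MAX_SIZE = 1.1, 5.8


def predict_sales(square_feet: float) -> float:
    """Return predicted annual sales in dollars for a store of the given size.

    Converts square feet to thousands, applies Y = b0 + b1*X, then converts
    the result from millions of dollars to dollars.
    """
    x = square_feet / 1000  # the data are in thousands of square feet
    if not MIN_SIZE <= x <= MAX_SIZE:
        # Outside the sampled range the straight-line fit is not supported.
        raise ValueError(f"{square_feet} sq ft is outside the sampled range")
    y_millions = INTERCEPT + SLOPE * x
    return y_millions * 1_000_000


for size in (2000, 3000, 4000):
    print(f"{size:,} sq ft -> ${predict_sales(size):,.0f} per year")
```

Raising an error, rather than silently returning a number, is one way to enforce the warning about using the equation outside the range of the sample.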
β1: For every additional thousand square feet, predicted annual sales increase by $1.6699 million ($1,669,900).

PREDICTING DEPENDENT VARIABLE VALUES
Predict sales for a store of 4,000 square feet.
Y = 0.9645 + 1.6699X
X = 4 (4,000 square feet)
Y = 0.9645 + 1.6699*4 = 7.6441
So the predicted annual sales are $7,644,100.

HOW STRONG IS THE ASSOCIATION?
Measured by the R² value. Perfect prediction gives an R² of 1.0000. Here, R² is 0.9042: about 90.42% of the variation in annual sales is explained by store size. This is very high.

INFERENCES ABOUT THE SLOPE AND CORRELATION COEFFICIENT
Are the coefficients statistically significant? The p-values tell you. Suppose the significance level, α, is 0.05.
The p-value for the intercept is 0.0917, which is greater than 0.05. So the intercept is not statistically significant.
The p-value for the slope is 0.0000 (after rounding to four decimal places), which is less than 0.05. So the slope is statistically significant.

13. MULTIPLE REGRESSION

INTRODUCTION
Multiple regression analysis enables you to develop a model that predicts the value of a numerical variable from the values of other variables. In multiple regression, there are two or more such variables.
Dependent variable = The variable you are trying to predict.
Independent variables = The variables you are using to predict the dependent variable.

REGRESSION EQUATION
Y = β0 + β1X1 + β2X2 + ... + βiXi + ε
Y = The predicted value of the dependent variable
β0 = The intercept
β1 = The change in Y for a one-unit increase in X1, holding the other variables constant
β2 = The change in Y for a one-unit increase in X2, holding the other variables constant
βi = The change in Y for a one-unit increase in Xi, holding the other variables constant
X1 = The value of the first independent variable
X2 = The value of the second independent variable
Xi = The value of the i-th independent variable
ε = The error

OMNIPOWER CASE
OmniPower is a sports energy bar. It is sold in 34 stores of a chain. In one period, the price of OmniPower bars was varied by store, and so was the in-store promotion budget. Price was measured in cents, and the monthly promotion budget was measured in dollars. Here is the data [OMNIPOWER.XLS].
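Excel's Regression tool estimates the coefficients by ordinary least squares, choosing the β values that minimize the sum of squared errors. A minimal pure-Python sketch of that computation follows. It assumes the observations are already in Python lists (reading OMNIPOWER.XLS is not shown). `fit_least_squares` is a hypothetical helper, and the toy data in the usage lines are invented to check the arithmetic, not taken from either case. It solves the normal equations (XᵀX)b = Xᵀy and reports R² = 1 - SSE/SST. It does not compute standard errors or p-values.

```python
def fit_least_squares(rows, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows: one tuple of predictor values (x1, ..., xk) per observation.
    Returns (coefficients [b0, b1, ..., bk], r_squared).
    """
    # Design matrix with a leading column of 1s for the intercept.
    X = [[1.0, *map(float, r)] for r in rows]
    p = len(X[0])

    # Augmented matrix for the normal equations (X^T X) b = X^T y.
    A = []
    for i in range(p):
        row = [sum(xr[i] * xr[j] for xr in X) for j in range(p)]
        row.append(sum(xr[i] * yi for xr, yi in zip(X, y)))
        A.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    b = [A[i][p] / A[i][i] for i in range(p)]

    # R^2 = 1 - SSE/SST: share of the variation in y explained by the model.
    y_bar = sum(y) / len(y)
    fitted = [sum(bj * xj for bj, xj in zip(b, xr)) for xr in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1 - sse / sst


# Toy data (invented for illustration): y = 10 + 2*x1 - 3*x2 exactly.
coefs, r2 = fit_least_squares([(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)], [6, 11, 1, 9, 8])
print([round(c, 4) for c in coefs], round(r2, 4))
```

The same function covers simple regression: pass one-element tuples as `rows`. Solving the normal equations directly is adequate for small, well-conditioned problems like these. For larger or nearly collinear data, a library routine based on a QR or SVD decomposition is more robust.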
[Figure: the OmniPower data]
[Figure: scatter plot]

USING REGRESSION
The Y range (Input Y Range in Excel's Regression dialog) is the dependent variable: Bars.
The X range (Input X Range) covers ALL independent variables, in this case two (Price and Promotion).
THE ONLY DIFFERENCE FROM SINGLE-VARIABLE REGRESSION IS THE ADDITIONAL DATA IN THE X RANGE.

USING THE MODEL
Y = β0 + β1X1 + β2X2 + ε
With X1 = price (in cents) and X2 = promotion budget (in dollars):
Y (bars sold) = 5837.5208 + (-53.2173)*Price + 3.6131*Promotion

For price = 69 (cents) and promotion = 300 (dollars):
Y = 5837.5208 + (-53.2173)*69 + 3.6131*300 = 3249.457 bars

For price = 59 (cents) and promotion = 200 (dollars):
Y = 5837.5208 + (-53.2173)*59 + 3.6131*200 = 3420.32 bars

For price = 59 (cents) and promotion = 400 (dollars):
Y = 5837.5208 + (-53.2173)*59 + 3.6131*400 = 4142.94 bars

For price = 79 (cents) and promotion = 200 (dollars):
Y = 5837.5208 + (-53.2173)*79 + 3.6131*200 = 2355.974 bars
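The four OmniPower predictions can be reproduced with a short function. It assumes the fitted coefficients shown (5837.5208, -53.2173, 3.6131). `predict_bars` and the constant names are hypothetical. Comparing scenarios also makes the "holding the other variables constant" interpretation of each β concrete.

```python
# Coefficients from the OmniPower regression output (OMNIPOWER.XLS).
B0 = 5837.5208       # intercept, in bars
B_PRICE = -53.2173   # change in bars per 1-cent price increase, promotion held constant
B_PROMO = 3.6131     # change in bars per $1 of promotion, price held constant


def predict_bars(price_cents: float, promotion_dollars: float) -> float:
    """Predicted bars sold: Y = b0 + b1*price + b2*promotion."""
    return B0 + B_PRICE * price_cents + B_PROMO * promotion_dollars


scenarios = [(69, 300), (59, 200), (59, 400), (79, 200)]
for price, promo in scenarios:
    print(f"price {price} cents, promotion ${promo}: {predict_bars(price, promo):.2f} bars")
```

At a price of 59 cents, raising promotion from $200 to $400 adds 200 × 3.6131 = 722.62 bars. At $200 of promotion, raising the price from 59 to 79 cents removes 20 × 53.2173 = 1,064.346 bars.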