Chapters 12 and 13

12. SIMPLE LINEAR REGRESSION
INTRODUCTION
Regression analysis enables you to develop a model to predict the values of a numerical variable based on the
value of other variables—in the case of simple linear regression, a single variable.
Dependent variable = The variable you are trying to predict.
Independent variable = The variable you are using to predict the dependent variable.
REGRESSION EQUATION
Y = β0 + β1X1 + ε
Y = The predicted value of the dependent variable
β0 = The intercept
β1 = The slope
X1 = The value of the independent variable
ε = The error
EXAMPLE
Store sales by size in square feet.
Y = β0 + β1*X1
β0 = 0.945
β1 = 1.6699
Suppose you want to estimate sales for a new store with 2,000 square feet.
X1 = 2 (2,000 square feet, because square feet are given in thousands in the data)
Y = 0.945 + 1.6699*2
= 4.2848
= $4,284,800 per year, because annual sales are given in millions of dollars in the data.
Suppose you want to estimate sales for a new store with 3,000 square feet.
X1 = 3 (3,000 square feet)
Y = 0.945 + 1.6699*3
= 5.9547
= $5,954,700 per year
Suppose you want to estimate sales for a new store with 4,000 square feet.
X1 = 4 (4,000 square feet)
Y = 0.945 + 1.6699*4
= 7.6246
= $7,624,600 per year
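The arithmetic above can be reproduced with a short Python sketch. The function name predict_sales_millions is hypothetical, and the coefficients are the ones used in the worked calculations (0.945 and 1.6699); the only steps shown are the unit conversions and the substitution into the equation.

```python
# Coefficients as used in the worked calculations above.
B0 = 0.945    # intercept, in millions of dollars
B1 = 1.6699   # slope, in millions of dollars per thousand square feet


def predict_sales_millions(square_feet):
    """Predicted annual sales, in millions of dollars, for a store of this size."""
    x1 = square_feet / 1000  # the data record size in thousands of square feet
    return B0 + B1 * x1


for size in (2000, 3000, 4000):
    millions = predict_sales_millions(size)
    dollars = round(millions * 1_000_000)  # the data record sales in millions
    print(f"{size:,} sq ft: Y = {millions:.4f} (${dollars:,} per year)")
```

Converting units inside the function keeps the caller from having to remember that the model expects thousands of square feet.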
EXAMPLE: SUNFLOWERS APPAREL
We wish to predict annual sales from store size in square feet, based on a sample of 14 stores.
DATA IN SITE.XLS
SCATTER PLOT
THE DATA IN EXCEL
INTERPRETATION
Interpreting the Y intercept (β0) and the slope (β1)
β0
If a building has 0 square feet, predicted sales are $0.9645 million annually ($964,500). Of course, this is
meaningless because no store can have 0 square feet. The stores in the sample varied from 1.1 thousand
square feet (1,100) to 5.8 thousand square feet (5,800). Using the equation outside that range is dangerous.
β1
For every additional thousand square feet, predicted annual sales increase by 1.6699 million dollars ($1,669,900).
PREDICTING DEPENDENT VARIABLE VALUES
Predict sales for a store of 4,000 square feet.
Y = 0.9645 + 1.6699 X
X = 4 (4,000)
Y = 0.9645 + 1.6699 *4
= 7.6441
So the predicted annual sales are $7,644,100.
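The prediction, together with the warning about using the equation outside the sampled range, might be sketched in Python as follows. The function name predict_sales and the choice to return a flag instead of raising an error are assumptions made for illustration; the coefficients and the range (1.1 to 5.8 thousand square feet) come from the notes.

```python
B0 = 0.9645   # intercept, in millions of dollars
B1 = 1.6699   # slope, in millions of dollars per thousand square feet

# Store sizes observed in the sample, in thousands of square feet.
MIN_X, MAX_X = 1.1, 5.8


def predict_sales(x_thousands):
    """Return (predicted sales in millions, True if x is inside the sample range)."""
    in_range = MIN_X <= x_thousands <= MAX_X
    return B0 + B1 * x_thousands, in_range


for x in (4, 0, 10):
    y, in_range = predict_sales(x)
    note = "" if in_range else " (outside the sampled range, treat with caution)"
    print(f"X = {x}: Y = {y:.4f}{note}")
```

Returning a flag rather than refusing to predict leaves the judgment to the reader, which matches the notes' wording that extrapolation is dangerous rather than impossible.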
HOW STRONG IS THE ASSOCIATION?
Measured by the R2 value.
Perfect prediction gives an R2 of 1.0000.
Here, R2 is 0.9042, so about 90% of the variation in annual sales is explained by store size.
This is very high.
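As a rough illustration of what R2 measures, the sketch below computes it with the usual definition, 1 minus the ratio of the sum of squared prediction errors to the total sum of squares. The function name r_squared and the small data lists are made up for this example; they are not the Sunflowers Apparel data.

```python
def r_squared(observed, predicted):
    """R2 = 1 - (sum of squared errors) / (total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_error / ss_total


# Hypothetical values, for illustration only.
observed = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]    # predictions that match exactly
close = [3.5, 4.5, 7.5, 8.5]      # predictions that are each off by 0.5

print(f"perfect prediction: R2 = {r_squared(observed, perfect):.4f}")
print(f"close prediction:   R2 = {r_squared(observed, close):.4f}")
```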
INFERENCES ABOUT THE SLOPE AND CORRELATION COEFFICIENT
Are they statistically significant?
The p-values tell you.
Suppose the significance level, α, is .05.
The p-value for the intercept is 0.0917, which is greater than .05. So the intercept value is not statistically significant.
The p-value for the slope is 0.0000 (to four decimal places), which is less than .05. So the slope value is statistically significant.
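The decision rule is simply a comparison of each p-value with α. A minimal Python sketch, in which the helper name is_significant is hypothetical and the p-values are the ones quoted in the notes (the slope's is entered as displayed, rounded to four decimals):

```python
ALPHA = 0.05  # significance level

# p-values as reported in the regression output.
p_values = {"intercept": 0.0917, "slope": 0.0000}


def is_significant(p_value, alpha=ALPHA):
    """A coefficient is statistically significant when its p-value is below alpha."""
    return p_value < alpha


for name, p in p_values.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{name}: p = {p:.4f}, {verdict} at alpha = {ALPHA}")
```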
13. MULTIPLE REGRESSION
INTRODUCTION
Multiple regression analysis enables you to develop a model to predict the values of a numerical variable based on
the value of other variables—in the case of multiple regression, multiple variables.
Dependent variable = The variable you are trying to predict.
Independent variables = The variables you are using to predict the dependent variable.
REGRESSION EQUATION
Y = β0 + β1X1 + β2X2 + ... + βiXi + ε
Y = The predicted value of the dependent variable
β0 = The intercept
β1 = The change in Y for a one-unit change in Variable 1, if other variables are held constant
β2 = The change in Y for a one-unit change in Variable 2, if other variables are held constant
βi = The change in Y for a one-unit change in Variable i, if other variables are held constant
X1 = The value of the first independent variable
X2 = The value of the second independent variable
Xi = The value of the i-th independent variable
ε = The error
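To make "held constant" concrete, the sketch below evaluates a two-variable equation twice, raising only the first variable by one unit. The function name predict and all coefficient and X values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def predict(intercept, slopes, xs):
    """Evaluate intercept + slopes[0]*xs[0] + slopes[1]*xs[1] + ..."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one X value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients, for illustration only.
b0 = 10.0
slopes = [2.0, -0.5]   # β1 and β2

base = predict(b0, slopes, [4.0, 6.0])
bumped = predict(b0, slopes, [5.0, 6.0])   # X1 up by one unit, X2 unchanged

# The difference equals β1, because X2 did not move.
print(f"base = {base}, bumped = {bumped}, change = {bumped - base}")
```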
OMNIPOWER CASE
OmniPower is a sports energy bar. It is sold in 34 stores of a chain. In one period, the price of OmniPower bars was
varied by store, and so were in-store promotion expenses. Price was measured in cents, and monthly promotion
budget was measured in dollars. Here is the data [OMNIPOWER.XLS].
THE DATA
SCATTERPLOT
USING REGRESSION
The Y range is the dependent variable—Bars
The X range covers ALL independent variables—in this case two (Price and Promotion)
THE ONLY DIFFERENCE FROM SINGLE-VARIABLE REGRESSION IS ADDITIONAL DATA
USING THE MODEL
Y = β0 + β1X1 + β2X2 + ε
Y (Bars sold) = 5837.5208 + (-53.2173)*price (in cents) + 3.6131*promotion budget (in dollars)
For price of 69 (cents) and promotion = 300 (dollars)
Y = 5837.5208 + (-53.2173)*69 + 3.6131*300
Y = 3249.457 bars
For price of 59 (cents) and promotion = 200 (dollars)
Y = 5837.5208 + (-53.2173)*59 + 3.6131*200
Y = 3420.32 bars
For price of 59 (cents) and promotion = 400 (dollars)
Y = 5837.5208 + (-53.2173)*59 + 3.6131*400
Y = 4142.94 bars
For price of 79 (cents) and promotion = 200 (dollars)
Y = 5837.5208 + (-53.2173)*79 + 3.6131*200
Y = 2355.974 bars
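The four scenarios above can be checked with a few lines of Python. The function name predict_bars is hypothetical; the coefficients are the ones given in the fitted equation, with price in cents and promotion budget in dollars.

```python
B0 = 5837.5208       # intercept
B_PRICE = -53.2173   # change in bars sold per one-cent increase in price
B_PROMO = 3.6131     # change in bars sold per extra dollar of promotion


def predict_bars(price_cents, promotion_dollars):
    """Predicted number of bars sold for a given price and promotion budget."""
    return B0 + B_PRICE * price_cents + B_PROMO * promotion_dollars


scenarios = [(69, 300), (59, 200), (59, 400), (79, 200)]
for price, promo in scenarios:
    bars = predict_bars(price, promo)
    print(f"price = {price} cents, promotion = ${promo}: {bars:.2f} bars")
```

Comparing the second and third scenarios shows the promotion effect with price held at 59 cents: the extra $200 of promotion adds 200 * 3.6131 = 722.62 bars.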