Relationships
Estimating Multiple Linear Functions
Sources of Variability
Assumptions
Assumption Checks
Testing for the effect of all the variables
Inferences about each coefficient
Testing the effect of some of the variables
Prediction and Estimation
NCSS
What you should be able to do when you finish the notes
o Discuss differences in the types of multiple linear relationships
o Define and put in English the effects of variables in the different models
o Define and put in English the standard error of the estimate and the coefficient of determination
o Discuss the required assumptions
o Know how to check the validity of the assumptions
o Test the hypothesis about the effect of the model
o Test hypotheses and construct confidence intervals for the slopes
o Construct confidence intervals for an average value of Y and prediction intervals for a value of Y (for specific values of X)
o Compare the similarities and differences in simple linear and multiple linear regression
o Get NCSS to calculate the required estimates and tests
1. Relationships
1.1 Types of relationships
First Order: additional distinct variables are added to the SLR model
Second Order: squared terms are added to the model
Interaction: product terms are added to the model
1.2 Effects of each variable
First Order Model: each coefficient is the change in the average value of the dependent variable when its independent variable increases by one, holding all the other independent variables constant.
Second Order Model: the change in the average value of the dependent variable when an independent variable increases by one depends on the current value of that independent variable, holding all the others constant.
Interaction Model: the change in the average value of the dependent variable when an independent variable increases by one depends on the value of another independent variable, holding all the others constant.
1.3 Examples
Based on trying to predict exam grade, Y, based on hours studied, X1, and GPA, X2.
First Order Model: β1 is the change in the average exam grade when you study one additional hour, for people with the same GPA.
E(ExamGrade) = β0 + β1(HoursStudied) + β2(GPA)
or E(Y) = β0 + β1X1 + β2X2
Second Order Model:
E(ExamGrade) = β0 + β1(HoursStudied) + β2(HoursStudied)² + β3(GPA)
The change in the average exam grade when you study an additional hour depends on how long you have studied, controlling for the effect of GPA.
To find the change, substitute “HoursStudied + 1” into the above equation and examine the difference:
E(ExamGrade) = β0 + β1(HoursStudied + 1) + β2(HoursStudied + 1)² + β3(GPA)
minus
E(ExamGrade) = β0 + β1(HoursStudied) + β2(HoursStudied)² + β3(GPA)
which equals the following, as long as GPA is the same value in both equations:
β1 + β2(1 + 2·HoursStudied)
Interaction Model: The change in the average exam grade for each additional hour studied depends on the value of GPA.
E(ExamGrade) = β0 + β1(HoursStudied) + β2(GPA) + β3(HoursStudied * GPA) or
E(Y) = β0 + β1X1 + β2X2 + β3X1X2
E(Y) = β0 + (β1 + β3X2)X1 + β2X2
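To see the interaction numerically, here is a minimal Python sketch with made-up coefficients (β0 = 20, β1 = 3, β2 = 10, β3 = 1.5 are hypothetical illustration values, not estimates from any data):

# Hypothetical interaction-model coefficients (illustration only, not estimated from data)
b0, b1, b2, b3 = 20.0, 3.0, 10.0, 1.5

def expected_grade(hours, gpa):
    # E(ExamGrade) = b0 + b1*Hours + b2*GPA + b3*Hours*GPA
    return b0 + b1 * hours + b2 * gpa + b3 * hours * gpa

for gpa in (2.0, 3.0, 4.0):
    # change in the average grade for one extra hour of study, at this GPA
    change = expected_grade(6, gpa) - expected_grade(5, gpa)
    print(f"GPA {gpa}: one extra hour adds {change:.1f} points (= b1 + b3*GPA)")

The printed change grows with GPA, which is exactly the point of the (β1 + β3X2) slope on X1.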
Other examples:
Baseball: http://baseballanalysts.com/archives/2007/05/the_value_of_th.php
Accounting http://www.nysscpa.org/cpajournal/2005/105/essentials/p56.htm
Business: http://www.cdc.gov/MMWR/preview/mmwrhtml/00037061.htm
U.S. Weather Forecasts: http://www.weather.gov/ost/NWS_TIP.pdf
Create your own examples of interpretations by changing the words and numbers in
First Order Model: http://wweb.uta.edu/faculty/eakin/busa5325/BetaInter1stOrderModel.xls
Interaction Model: http://wweb.uta.edu/faculty/eakin/busa5325/BetaInterInteractionModel.xls
Quadratic Model: http://wweb.uta.edu/faculty/eakin/busa5325/BetaInterQuadraticModel.xls
2. Estimating Multiple Linear Functions
Same as for SLR: Least squares
Interpretation of estimates: just insert the word “estimated” into the interpretations above and use the values from the estimated equation
Examples for a first order two variable model: http://wweb.uta.edu/faculty/eakin/busa5325/EstbiMLR.xls
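For readers who prefer code to the worksheet, here is a minimal Python sketch of the same least-squares idea for a two-variable first-order model; the data below are made up purely for illustration:

import numpy as np

# Hypothetical data: hours studied, GPA, and exam grade (made-up numbers)
hours = np.array([2, 5, 8, 3, 10, 6, 4, 7])
gpa   = np.array([2.5, 3.0, 3.8, 2.8, 3.9, 3.2, 3.0, 3.5])
grade = np.array([62, 75, 93, 66, 97, 80, 71, 88])

# Design matrix with an intercept column: E(Y) = b0 + b1*X1 + b2*X2
X = np.column_stack([np.ones_like(hours, dtype=float), hours, gpa])

# Least squares estimates, exactly as in SLR but with more columns
b, *_ = np.linalg.lstsq(X, grade, rcond=None)
print("b0, b1, b2 =", np.round(b, 3))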
3. Sources of Variation
Total: The variability of the y values around their mean
Regression: the variability of the estimated (fitted) y values around the mean of the y values
Error: the variability of the y values around the mean given the values of the X’s
Standard Error of Estimate, Sy|x1,x2,…,xk or Se: an estimate of the typical error that occurs when predicting y with this estimated model on the sample data. This is the regression analogue of what we earlier called the sample standard deviation.
Coefficient of Determination, R²: the percent of the sample variability of Y that can be associated with variation in the independent variables
Example: Sales, size, and ad budget of a store. Se = 966,000 and R² = 0.904. Interpret the meaning of the standard error of the estimate and the coefficient of determination.
Solution: As always replace Y and X with the names from the example
Standard Error of Estimate, Sy|x: If the estimated line is used, there is a typical error of $966,000 when predicting sales in this sample.
Coefficient of Determination, R²: 90.4% of the sample variability of sales can be associated with variation in the size of the store and the advertising budget.
Click on the following link to load an Excel worksheet that will allow you to create new examples. When asked for the independent variables, list all the variables separated by commas: http://wweb.uta.edu/faculty/eakin/busa3321/VariationExplanation.xls
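In code, the three sums of squares, the standard error of the estimate, and R² can be computed directly from a fitted model. This is a sketch only, reusing the made-up exam-grade data from the estimation sketch above (not the store example):

import numpy as np

# Hypothetical data (same made-up exam-grade example as above)
hours = np.array([2, 5, 8, 3, 10, 6, 4, 7])
gpa   = np.array([2.5, 3.0, 3.8, 2.8, 3.9, 3.2, 3.0, 3.5])
y     = np.array([62, 75, 93, 66, 97, 80, 71, 88])

X = np.column_stack([np.ones_like(hours, dtype=float), hours, gpa])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

n, k = len(y), X.shape[1] - 1          # k = number of independent variables
sst = np.sum((y - y.mean()) ** 2)      # Total: y values around their mean
sse = np.sum((y - y_hat) ** 2)         # Error: y values around the fitted values
ssr = sst - sse                        # Regression: fitted values around the mean of y

se = np.sqrt(sse / (n - k - 1))        # standard error of the estimate
r2 = ssr / sst                         # coefficient of determination
print(f"Se = {se:.3f}, R^2 = {r2:.3f}")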
4. Assumptions
Same as in simple linear regression but more independent variables to list
Example: Based on trying to predict exam grade, Y, based on hours studied, X1, and GPA, X2.
Linearity : the average exam grade has the specified relationship with hours studied and GPA
Independence: the students are independently selected and the values of the exam grades are not related for given values of hours studied and GPA
Normality: the exam grades are normally distributed for students with the same values of hours studied and GPA
Equal Variation: the variation in the exam grades for students with the same values of hours studied and GPA is the same (equal), regardless of what those values of hours studied and GPA are.
Examples: Interpreting the Assumptions of MLR
5. Assumption Checks
Residual plots versus estimated average
Residual plots versus each variable
Normal probability plots of the residuals
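A minimal Python sketch of these three checks, using matplotlib and scipy with the same made-up exam-grade data as above (not real course data):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical fitted model (see the least-squares sketch above)
hours = np.array([2, 5, 8, 3, 10, 6, 4, 7])
gpa   = np.array([2.5, 3.0, 3.8, 2.8, 3.9, 3.2, 3.0, 3.5])
y     = np.array([62, 75, 93, 66, 97, 80, 71, 88])
X = np.column_stack([np.ones_like(hours, dtype=float), hours, gpa])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(X @ b, resid)            # residuals vs. estimated average
axes[0].set(xlabel="fitted values", ylabel="residuals")
axes[1].scatter(hours, resid)            # residuals vs. one of the X's
axes[1].set(xlabel="hours studied", ylabel="residuals")
stats.probplot(resid, plot=axes[2])      # normal probability plot of the residuals
plt.tight_layout()
plt.show()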
6. Overall effects:
6.1 Test of the model: E(Y) = β0 + β1X1 + β2X2 + ... + βkXk
Note that any one of the X’s above could be a function of one or more other independent variables; e.g., the square of another variable or the product of two variables. Therefore find the value of k by looking at the number on the last β in the model. So if our model was
E(Y) = β0 + β1X1 + β2X1² + β3X1X2 + β4X2² + β5X2 then k = 5
6.1.1 Form of the test
Hypothesis: Ho: β1 = β2 = … = βk = 0 (no variable has an effect)
H1: at least one has an effect
Rejection region: One sided F with k and n-k-1 degrees of freedom
Test Statistic: This is an application of the 5th Building Block of the course: if the sample slopes vary enough from zero, we can conclude that the population slopes vary from zero. We will use the F form of the test:
F = variation of coefficient estimates / variability of randomness
Conclusion: We can (not) say that changes in the values of at least one independent variable are associated with changes in the average value of the dependent variable.
6.1.2 Example: Based on trying to predict exam grade, Y, based on hours studied, X1, and GPA, X2. You are given that n = 30, MSR = 200 and MSE = 10 using the model: E(ExamGrade) = β0 + β1(HoursStudied) + β2(GPA)
Hypothesis: Ho: β1 = β2 = 0 (neither hours studied nor GPA has an effect on the average exam grade)
H1: at least one has an effect
Rejection region: Reject Ho if F > F(2, 27) = 3.354
Test Statistic : variation of coefficient estimates/ variability of randomness
F = MSR/MSE = 200/10 = 20
Conclusion: We can say that changes in GPA and/or hours studied are useful in predicting the exam grade.
To create other examples, click on the link below and change numbers and names http://wweb.uta.edu/faculty/eakin/busa5325/MLRTestOfModel.xls
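The same F test can also be checked in Python with scipy, using only the numbers given in the example:

from scipy import stats

n, k = 30, 2
msr, mse = 200.0, 10.0

f_stat = msr / mse                              # variation of coefficient estimates / variability of randomness
f_crit = stats.f.ppf(0.95, k, n - k - 1)        # upper-tail F with k and n-k-1 degrees of freedom

print(f"F = {f_stat:.2f}, critical value = {f_crit:.3f}")
print("Reject Ho" if f_stat > f_crit else "Do not reject Ho")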
7. Test of each coefficient
We will only consider the test for the first-order model
The test is the same as in SLR but with n-k-1 degrees of freedom and adding the phrase “adjusting for all other variables”
Example: You are given that n = 30, with estimated model:
Estimated ExamGrade = 30 + 1.67(HoursStudied) + 20(GPA), where Sb1 = 0.157
o Hypothesis: H0: β1 = 0
H1: β1 > 0
o Rejection region: The degrees of freedom are n-k-1 = 27. Since this is a right-sided t-test, find the t-table value of 1.7033 in row 27, column 0.05. Therefore reject H0 if t > 1.7033
o Test Statistic: t = (1.67 − 0) / 0.157 = 10.64
o Conclusion: After controlling for the effect of GPA, we can say that increases in hours studied are associated with increases in the average exam grade
To create other examples, click on the link below and change numbers and names
http://wweb.uta.edu/faculty/eakin/busa5325/MLRTestOfBeta1.xls
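A scipy check of this t test, using only the numbers given in the example (b1 = 1.67, Sb1 = 0.157, 27 degrees of freedom):

from scipy import stats

n, k = 30, 2
b1, se_b1 = 1.67, 0.157

t_stat = (b1 - 0) / se_b1                     # sample slope compared with the hypothesized zero
t_crit = stats.t.ppf(0.95, df=n - k - 1)      # right-tail critical value with 27 d.f.

print(f"t = {t_stat:.2f}, critical value = {t_crit:.4f}")
print("Reject H0" if t_stat > t_crit else "Do not reject H0")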
7.2 Confidence interval for slope
Same as in SLR but with n-k-1 degrees of freedom and controlling for all other variables.
Click on the following link to load an Excel worksheet that will allow you to create new examples for a two variable first-order model.
http://wweb.uta.edu/faculty/eakin/busa3321/C.I.slope.xls
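As a short sketch, the matching 95% confidence interval for β1 reuses the numbers from the test above (the two-sided t value for 27 degrees of freedom is about 2.052):

from scipy import stats

n, k = 30, 2
b1, se_b1 = 1.67, 0.157

t_two_sided = stats.t.ppf(0.975, df=n - k - 1)   # 95% two-sided, 27 d.f.
margin = t_two_sided * se_b1
print(f"b1 = {b1} +/- {margin:.3f}  ->  ({b1 - margin:.3f}, {b1 + margin:.3f})")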
8. Testing a subset of the coefficients.
8.1 Variation of the subset of variables over and above the effect of all other variables.
Find the sum of squares of regression of all variables (Full Model)
Find the sum of squares regression of the other variables (Reduced Model).
The difference is the variation in the subset over and above the others
Estimate of variation is this difference divided by the number of variables, c, being tested.
8.2 Example: Based on trying to predict exam grade, Y, based on hours studied, X1, and GPA, X2. You are given that n = 30 and
For the Full model: E(ExamGrade) = β0 + β1(HoursStudied) + β2(HoursStudied)² + β3(GPA). The MSE = 5 with d.f. = n-k-1 = 26 and the SSRf = 250
You want to test the effect of Hours Studied. So after fitting the reduced model: E(ExamGrade) = β0 + β3(GPA) you find SSRr = 50
Hypothesis: Ho: β1 = β2 = 0 (hours studied has no effect on the average exam grade after adjusting for GPA)
H1: it does have an effect
Rejection region: Reject Ho if F > F(2, 26) = 3.369
Test Statistic: variation of coefficient estimates/ variability of randomness
F = [(SSRf − SSRr) / c] / MSEf = [(250 − 50) / 2] / 5 = 20
Conclusion: We can say that hours studied is useful for predicting the exam grade after adjusting for the effect of GPA.
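A scipy sketch of this partial (subset) F test, using the numbers given in the example:

from scipy import stats

ssr_full, ssr_reduced = 250.0, 50.0
mse_full, df_full = 5.0, 26          # d.f. of the full-model MSE: n - k - 1 = 26
c = 2                                # number of coefficients being tested

f_stat = ((ssr_full - ssr_reduced) / c) / mse_full
f_crit = stats.f.ppf(0.95, c, df_full)

print(f"F = {f_stat:.2f}, critical value = {f_crit:.3f}")
print("Reject Ho" if f_stat > f_crit else "Do not reject Ho")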
9. Estimating the average and predicting an individual
9.1 Estimating the average value of all Y values for observations with the same value of X1, the same value of X2, …, and the same value of Xk
Formula: estimated mean plus and minus the margin of error: ŷ ± t(n−k−1) (SEmean)
Conclusion: We can say with ___ confidence that the average value of the dependent variable is _____ with a margin of error of _________ for all observations with values of the independent variables of ____
Example: Find the average sales for all stores that have 4,000 square feet and a $100,000 advertising budget. You are given that the estimated average sales = 964,000 + 1,670,000(size, in thousands of square feet) + 1.45(ad budget) and that SEmean, the standard error of the mean estimate, = 309,000
Solution: Substitute the values of the X’s into the estimated equation to obtain the estimated average sales:
Estimate of average sales = 964,000 + 1,670,000 (4)+1.45(100,000) = 7,789,000
Next substitute values into the confidence interval: ŷ ± t(n−k−1)(SEmean) = 7,789,000 ± 2.2010(309,000) = 7,789,000 ± 680,109
Conclusion: For all stores that have 4,000 square feet and a $100,000 advertising budget, we can say with 95% confidence that the average sales are $7,789,000 with a margin of error of ± $680,109
Click on the following link to load an Excel worksheet that will allow you to create new examples (Special case of two independent variables with the
estimated standard error of estimating the average value of Y already calculated)
http://wweb.uta.edu/faculty/eakin/busa5325/CIMuYMLR.xls
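A Python sketch of this interval follows. Note that n is not stated in the example, so the sketch assumes n = 14 (which gives n − k − 1 = 11 degrees of freedom and reproduces the t value of 2.2010 used above); that sample size is an assumption, not something given in the notes.

from scipy import stats

# Numbers from the example; n = 14 is an assumption that makes n - k - 1 = 11
# and reproduces the t value of 2.2010 used in the notes.
n, k = 14, 2
size_thousands, ad_budget = 4, 100_000
se_mean = 309_000

y_hat = 964_000 + 1_670_000 * size_thousands + 1.45 * ad_budget   # 7,789,000
t_val = stats.t.ppf(0.975, df=n - k - 1)                           # about 2.2010
margin = t_val * se_mean

print(f"estimated average sales = {y_hat:,.0f}")
print(f"95% CI: {y_hat:,.0f} +/- {margin:,.0f}")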
9.2 Predicting the value of Y for an observation with a given value of X
Formula: predicted value plus and minus the margin of error: ŷ ± t(n−k−1) (SEindividual)
Conclusion: We can say with ___ confidence that the value of the dependent variable is _____ with a margin of error of _________ for an observation with values of the independent variables of ____
Example: Find the sales for a store that has 4,000 square feet and a $100,000 advertising budget. You are given that the predicted sales = 964,000 + 1,670,000(size) + 1.45(ad budget) and that its estimated standard error, SEindividual, is 1,104,000.
Solution: Substitute the values of the X’s into the estimated equation to obtain the value of the predicted sales:
predicted sales = 964,000 + 1,670,000 (4)+1.45(100,000) = 7,789,000
Next substitute values into the prediction interval: ŷ ± t(n−k−1)(SEindividual) = 7,789,000 ± 2.2010(1,104,000) = 7,789,000 ± 2,429,904
Conclusion: For a store that has 4,000 square feet and has a $100,000 advertising budget, we can say with 95% confidence that the sales will be $7,789,000 with a margin of error of ± $2,429,904
Click on the following link to load an Excel worksheet that will allow you to create new examples (Special case of two independent variables with the
estimated standard error of predicting an individual value of Y already calculated)
http://wweb.uta.edu/faculty/eakin/busa5325/CIYMLR.xls
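The same sketch with SEindividual in place of SEmean gives the (wider) prediction interval; n = 14 is again only an assumption used to reproduce the t value of 2.2010.

from scipy import stats

n, k = 14, 2                       # n = 14 is an assumption (gives 11 d.f., t of about 2.2010)
se_individual = 1_104_000          # standard error for predicting one store's sales

y_hat = 964_000 + 1_670_000 * 4 + 1.45 * 100_000          # 7,789,000
margin = stats.t.ppf(0.975, df=n - k - 1) * se_individual

print(f"95% prediction interval: {y_hat:,.0f} +/- {margin:,.0f}")
# The margin is much larger than for the average because SE_individual > SE_mean.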
10. SAS (to be constructed)