REGRESSIONS IN STATPLUS
Here’s how to run a regression in StatPlus.
Open Excel. Label cell A1 X and cell B1 Y. Type some arbitrary numbers into cells A2 through
A10 and B2 through B10. Here’s an example.
Graph the data as a ‘scatter graph.’ Notice there is a slight negative slope to the data as
you look from left to right. This tips us off that there might be a negative correlation in
the data between Y and X.
Click on the StatPlus icon and you’ll see it in the menu bar at the top of the screen. Pull
down the ‘Statistics’ menu until you see ‘Regression’ and select that. Click on ‘Linear Regression.’
You’ll see a dialog box like the one below. To the right of ‘Dependent Variable’ is a
small box with a red arrow as depicted. Click it and then go to the data and select the Y
column B1 – B10.
The data location will now appear in the dialog box below ‘Dependent Variable.’ Notice
the next field, ‘Independent Variables.’ It also has a small box to the right with a
red arrow. Click it, then go to the data and select the X column, A1 – A10. Click back on the
Linear Regression dialog box and you’ll see both data ranges listed. You’re almost
ready. Notice that the box ‘Labels in First Row’ is checked. This tells StatPlus to treat the first row
as variable labels in the regression output rather than as data.
Click ‘Advanced Options.’ Notice the last box. If you check it, your regression will not
have a constant. Since we want a constant in the equation, DO NOT check it; click ‘Cancel’ instead.
Later we will want to run regressions without a constant.
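To see what that last option changes, here is a minimal Python sketch (not StatPlus output) that fits one small data set with and without a constant. The data values and the function names are made up for illustration; only the least-squares formulas are standard.

```python
# Illustrative data only (hypothetical values, not from this handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 8.5, 7.0, 6.5, 5.0]

def fit_with_constant(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def fit_without_constant(x, y):
    """Least-squares fit of y = b*x (line forced through the origin); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

a, b = fit_with_constant(x, y)
b_origin = fit_without_constant(x, y)
print(f"with constant:    Y = {a:.3f} + ({b:.3f}) X")
print(f"without constant: Y = {b_origin:.3f} X")
```

With these made-up numbers the slope is negative when a constant is included but positive when the line is forced through the origin, which is why the choice matters.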
Now we’re ready. Click ‘OK’ in the Linear Regression dialog box. The output will be in
a separate spreadsheet. Here’s what it will look like.
Isn’t this cool? The estimated equation is,
Y = 12.3 – 0.5332X.
Notice the standard error for the variable X is large, 0.335. Dividing the coefficient by its
standard error gives a t-stat of only about -1.59 (-0.5332 / 0.335), so this estimate is not very
precise. We would not have much confidence in this
equation. It is not explaining the Y data very well at all.
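The t-stat arithmetic can be checked by hand or with a few lines of Python. This is only a sketch: the coefficient and standard error are the ones quoted above, while the degrees of freedom and the 5% critical value are assumptions based on having 9 observations (rows 2 to 10) and two estimated parameters.

```python
coefficient = -0.5332   # estimated slope on X, from the equation above
std_error = 0.335       # standard error reported for X

# The t-stat is the coefficient divided by its standard error.
t_stat = coefficient / std_error

# Assumption: 9 observations minus 2 estimated parameters (constant and
# slope) leaves 7 degrees of freedom. The two-sided 5% critical value of
# the t distribution with 7 degrees of freedom is about 2.365.
critical_value = 2.365
significant = abs(t_stat) > critical_value

print(f"t-stat = {t_stat:.2f}")
print(f"significant at the 5% level? {significant}")
```

Because 1.59 is below 2.365 in absolute value, the slope is not statistically different from zero at the 5% level under these assumptions.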
EXERCISE 1
Type the following data into Excel.
X     Y
1     33
2     31
3     32
4     29
5     25
6     24
7     22
8     23
9     18
10    16
11    15
12    14
13    12
14    11
15    7
Graph the data. What does it look like?
Run a regression of Y on X as we just did. What did you get? What is the standard error
on the coefficient? What is the t-stat? Is it statistically significant? Here’s what I got:
The t-stat is huge in absolute value so this estimate is very significant.
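If you want to cross-check the numbers StatPlus reports, the following Python sketch computes the same quantities from scratch for a regression of Y on a constant and X. The function name and the sample data are hypothetical; replace the two lists with the columns you typed in.

```python
import math

def simple_regression(x, y):
    """Return intercept, slope, slope standard error, and slope t-stat
    for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    return intercept, slope, se_slope, slope / se_slope

# Hypothetical data for illustration; replace with your own X and Y columns.
x = [1, 2, 3, 4, 5, 6]
y = [20, 18, 17, 14, 13, 10]

a, b, se_b, t_b = simple_regression(x, y)
print(f"Y = {a:.3f} + ({b:.3f}) X")
print(f"standard error of slope = {se_b:.3f}, t-stat = {t_b:.2f}")
```

A t-stat that is large in absolute value, as a rough rule of thumb above about 2, is what the regression output means by a statistically significant coefficient.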
EXERCISE 2
Now we want to run a multiple regression with several variables that might influence Y
according to some relationship Y = F(X, Z), where X and Z are independent variables
and Y is the dependent variable. We assume there is a linear relationship among the
variables,
Y = a + bX + cZ,
and we want the data to tell us what the parameters a, b, and c are. So we regress Y on a
constant, X, and Z.
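As a rough illustration of what "regress Y on a constant, X, and Z" computes, here is a pure-Python sketch that solves the least-squares normal equations for a, b, and c. It is not how StatPlus is implemented, as far as this handout says; the function names are hypothetical, and the data are constructed so the true answer is known in advance.

```python
def solve_linear_system(a_matrix, b_vector):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b_vector)
    # Build the augmented matrix [A | b] as a fresh copy.
    aug = [list(row) + [b_vector[i]] for i, row in enumerate(a_matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def multiple_regression(y, x, z):
    """Estimate a, b, c in y = a + b*x + c*z from the normal equations."""
    rows = [[1.0, xi, zi] for xi, zi in zip(x, z)]  # the column of ones is the constant
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical data built so that y = 2 + 3*x - 1*z holds exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2 + 3 * xi - zi for xi, zi in zip(x, z)]

a, b, c = multiple_regression(y, x, z)
print(f"Y = {a:.3f} + ({b:.3f}) X + ({c:.3f}) Z")
```

Because the made-up Y values follow the equation exactly, the sketch should recover a = 2, b = 3, and c = -1 up to rounding. Real data will not fit exactly, which is why the standard errors and t-stats in the StatPlus output matter.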
Here’s the data
[Data table: three columns, one each for the variables Y, X, and Z, with ten observations per column. The numeric values are not legible in this copy.]
In StatPlus select the Y data for the ‘Dependent Variable’ and now select both the X and Z columns
for the ‘Independent Variables.’ Are both independent variables statistically significant?
Here’s what I got:
Both variables are significant, although X has more influence than Z.
The sign of each coefficient is also important. If b > 0, an increase in X increases Y, holding Z
constant. If b < 0, an increase in X reduces Y.