Regression Tricks of the Trade


Coding and Interpreting Regression Models

Xiao Chen

UCLA Academic Technology Services

Philip B. Ender

UCLA Department of Education

UCLA Academic Technology Services

Michael N. Mitchell

UCLA Academic Technology Services

© 2003 UCLA Academic Technology Services

Draft 8/7/03 – Do Not Distribute


Chapter 2

Regression Without Predictors

At first glance, it doesn’t seem that studying regression without predictors would be very useful. Certainly, we are not suggesting that using regression without predictors is a major data analysis tool. We do think that it is worthwhile to look at regression models without predictors to see what they can tell us about the nature of the constant. Understanding the regression constant in these simpler models will help us to understand both the constant and the other regression coefficients in later, more complex models.

The regression constant is also known as the intercept; thus, regression models without predictors are also known as intercept only models. As such, we will begin with intercept only models for OLS regression and then move on to logistic regression models without predictors.

2.1 About the data

In this section we will use a sample of 200 observations taken from the High School and Beyond (HSB) study (?????, 1986?). We have selected the variable write as our response or dependent variable. The values of write represent standardized writing test scores from a test that was normalized to have a mean equal to 50 and standard deviation of 10. Table 2.1 gives the summary statistics for write.



Table 2.1: Summary statistics

Variable   N     Mean     Std. Dev.   Var.     Min     Q1      Mdn     Q3      Max
write      200   52.775   9.479       89.843   31.00   45.50   54.00   60.00   67.00
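The statistics in Table 2.1 can be reproduced with a few lines of code. The sketch below uses Python with pandas; the file name hsb2.csv and the column name write are assumptions about how the 200-case HSB sample might be stored locally, not details given in the text.

    # A minimal sketch for reproducing Table 2.1 (the file name is an assumption).
    import pandas as pd

    hsb = pd.read_csv("hsb2.csv")       # hypothetical path to the 200-case HSB sample
    write = hsb["write"]

    print(write.count())                # N = 200
    print(write.mean())                 # Mean = 52.775
    print(write.std())                  # Std. Dev. = 9.479 (sample standard deviation)
    print(write.var())                  # Var. = 89.843 (sample variance)
    print(write.quantile([0, 0.25, 0.5, 0.75, 1.0]))   # Min, Q1, Mdn, Q3, Max

The later snippets in this chapter assume the same write variable is still in memory.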

2.2 OLS regression without predictors

Regression models are designed to estimate the expected means of a response (dependent) variable conditional on values of a set of predictor variables. An ordinary least squares regression equation with a single predictor variable can be written as

$$Y_i = a + bX_i + \epsilon_i,$$

where $Y$ is the response variable, $X$ is a predictor variable, and $\epsilon$ is the residual or error. The coefficient $b$ is the regression slope coefficient and $a$ is the constant or intercept. In the case where there are no predictors, this equation reduces to

$$Y_i = a + \epsilon_i.$$

In this chapter, we are only interested in understanding and interpreting the constant.

If we use the standard assumption that the residuals are normally distributed with mean zero and variance $\sigma^2$, i.e., $\epsilon_i \sim N(0, \sigma^2)$, then the expected value of the response variable is

$$E(Y_i) = E(a + \epsilon_i) = a + E(\epsilon_i) = a + 0 = a,$$

so the expected value of the response is just the constant, and its least squares estimate is the sample mean, $\hat{a} = \bar{Y}$. That is, the constant in the regression model is a mean; in particular, in an intercept only model the constant is the mean of the response variable.

Now, let’s review how the sums of squares (SS) are partitioned into SS for the regression model and SS for the residual.

$$SS_{total} = SS_{model} + SS_{residual}$$

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where $Y$ is the response or observed variable, $\hat{Y}$ is the predicted score, and $\bar{Y}$ is the mean. From now on we won’t include all of the subscripts since it will be understood that the summation is over one to $n$.


In an intercept only model the predicted score equals the mean, that is, $\hat{Y} = \bar{Y}$. Therefore, we can replace $\hat{Y}$ in the sums of squares equation, leading to

$$\sum (Y - \bar{Y})^2 = \sum (\bar{Y} - \bar{Y})^2 + \sum (Y - \bar{Y})^2$$
$$\sum (Y - \bar{Y})^2 = \sum (0)^2 + \sum (Y - \bar{Y})^2$$
$$\sum (Y - \bar{Y})^2 = \sum (Y - \bar{Y})^2$$

This demonstrates that with an intercept only model there is only residual variability; there is no variability due to the regression model because there are no predictors. Now, let’s run an intercept only model and see what it looks like (Table 2.2).
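A minimal sketch of fitting this intercept only model, using statsmodels and the write variable loaded earlier (this is one possible way to fit the model, not necessarily the package used to produce the tables):

    # Intercept only model: regress write on a column of ones.
    import numpy as np
    import statsmodels.api as sm

    X = np.ones(len(write))             # the design matrix is just the constant
    fit = sm.OLS(write, X).fit()

    print(fit.params)                   # the constant: 52.775, the mean of write
    print(fit.bse)                      # its standard error: 0.670
    print(fit.ess)                      # SS_model, essentially 0
    print(fit.ssr)                      # SS_residual = SS_total = 17878.875

The coefficient and standard error should match Table 2.2 below.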

Table 2.2: Model: write = constant

Variable   Coefficient   Std. Err.   t
constant   52.775        0.670       78.741 **

N = 200    R² = 0    F(0, 199) = 0
Significance levels: †: 10%   *: 5%   **: 1%

What can we tell from this regression output?

First, the constant is equal to 52.775, which is the mean of the variable write. It is also the expected or predicted value for every observation. The standard error of the constant is simply the standard error of the mean of write, which we can compute manually as $9.48/\sqrt{200} = 0.67$. There is no $R^2$ and no overall F-ratio for the model, both of which can be predicted from working through the partitioning of the sums of squares. The $R^2$ is equal to $SS_{model}/SS_{total}$, and since $SS_{model} = 0$, the $R^2 = 0$. Technically, the F-ratio doesn’t exist at all. Consider the equation,

$$F = \frac{SS_{model}/df_{model}}{SS_{residual}/df_{residual}} = \frac{0/0}{17878.875/199}.$$

Since the degrees of freedom for the model are zero (no predictors) and division by zero is undefined, the F-ratio is also undefined, not zero.

The t-test of the constant tests whether the constant is significantly different from zero. It is also possible to test whether the constant is significantly different from any value, say 50. It is just a matter of subtracting the hypothesized value from the constant and dividing by the standard error of the constant.

$$t_{(df)} = \frac{\text{constant} - 50}{\text{Std. Err.}}$$
$$t_{(199)} = \frac{52.775 - 50}{0.670} = 4.14$$

This is equivalent to doing a single-sample t-test. Most statistics packages will let you do the test that the constant equals 50, so that you won’t have to do the computation by hand.
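For instance, here is a sketch of this test in Python, done both by hand and with scipy’s one-sample t-test (assuming write is the array of scores used above):

    # Test whether the mean of write differs from 50.
    import numpy as np
    from scipy import stats

    se = write.std(ddof=1) / np.sqrt(len(write))    # standard error of the mean, about 0.670
    t_manual = (write.mean() - 50) / se             # (52.775 - 50) / 0.670 = 4.14
    print(t_manual)

    t_stat, p_value = stats.ttest_1samp(write, 50)  # the same t on 199 degrees of freedom
    print(t_stat, p_value)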

It is also possible to do the above t-test using an intercept only regression model. Let’s create a new dependent variable called write50, in which we take the response variable, write, and subtract 50 from each value. We can then run an intercept only model with write50 as the response variable.

The value of t given in the output below (Table 2.3) is the same value that we computed manually above. The constant term, 2.775, in this model is the difference between the overall mean of write and the value 50. This is because we have set up our regression equation to be $Y - 50 = a + \epsilon$, and this leads to $E(Y) - 50 = a$.

Table 2.3: Model: write50 = constant

Variable   Coefficient   Std. Err.   t
constant   2.775         0.670       4.140 **

N = 200    R² = 0    F(0, 199) = 0
Significance levels: †: 10%   *: 5%   **: 1%
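A sketch of the write50 model just described, again using statsmodels; the constant and t statistic should match Table 2.3:

    # Shift the response by 50 and refit the intercept only model.
    import numpy as np
    import statsmodels.api as sm

    write50 = write - 50
    fit50 = sm.OLS(write50, np.ones(len(write50))).fit()

    print(fit50.params)                 # 2.775 = 52.775 - 50
    print(fit50.tvalues)                # 4.14, the same t computed by hand above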

2.3 Creating a constant

It is possible for us to create a constant of our own. To do this we make a new variable called one that is equal to one for every observation. Then we run a model in which we use one as a predictor. We will have to tell the program we are using not to automatically include a constant in the model. This is the so-called “no constant” model. The model and output are shown in Table 2.4.
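Before turning to that table, here is a sketch of how such a “no constant” fit might be run. In statsmodels, the hasconst=False argument is one way to emulate the behavior described here (other packages have their own no-constant options); the coefficient, R-squared, and F should match Table 2.4:

    # "No constant" model: the variable one is the only predictor, and the
    # program is told that the model contains no constant.
    import numpy as np
    import statsmodels.api as sm

    one = np.ones(len(write))
    fit_nc = sm.OLS(write, one, hasconst=False).fit()

    print(fit_nc.params)                # 52.775, same as the constant in Table 2.2
    print(fit_nc.bse)                   # 0.670, same standard error
    print(fit_nc.rsquared)              # about 0.969 -- the inflated, uncentered R-squared
    print(fit_nc.fvalue)                # about 6200 -- the inflated overall F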

There are many items that are the same in this model and the intercept only model above. The value of the coefficient for the predictor one is the same as for the constant in Table 2.2, and it has the same standard error and t. But there are also several things that are funny or “wrong” about this model.


Table 2.4: Model: write = one / no-constant

Variable   Coefficient   Std. Err.   t
one        52.775        0.670       78.741 **

N = 200    R² = 0.969    F(1, 199) = 6200.110
Significance levels: †: 10%   *: 5%   **: 1%

There are values for the overall F-ratio and for $R^2$. Further, the values of these two statistics are wrong. To see why this has happened we need to understand more about the no constant model.

The reason for these differences lies in the fact that the “no constant” model assumes that the mean of the response variable is zero. This will become clearer when we look once again at the partitioning of the sums of squares, substituting zero for the value of the mean.

$$SS_{total} = SS_{model} + SS_{residual}$$
$$\sum (Y - 0)^2 = \sum (\hat{Y} - 0)^2 + \sum (Y - \hat{Y})^2$$
$$\sum Y^2 = \sum \hat{Y}^2 + \sum (Y - \hat{Y})^2$$

Recall that the predicted score for each observation is the mean of the response variable, which is 52.775; thus the sum of squares for the model would be

$$\sum \hat{Y}^2 = \sum 52.775^2 = n(52.775^2) = 557{,}040.125,$$

while the sum of squares total is

$$\sum (Y - 0)^2 = \sum Y^2 = 574{,}919.$$

Thus,

$$R^2 = 557{,}040.125 / 574{,}919 = 0.969$$

and

$$F = (557{,}040.125 / 1) / (17{,}878.875 / 199) = 6200.11.$$

Both of these are bogus because the sums of squares for the model are artificially inflated due to using zero instead of the actual mean.
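These numbers can be checked directly from the sums of squares; the quick verification below assumes write is the array of scores used throughout:

    # Verify the inflated R-squared and F from the sums of squares.
    import numpy as np

    ss_model    = len(write) * write.mean() ** 2        # n * 52.775^2 = 557,040.125
    ss_residual = np.sum((write - write.mean()) ** 2)   # 17,878.875
    ss_total    = np.sum(write ** 2)                     # 574,919 = ss_model + ss_residual

    print(ss_model / ss_total)                           # 0.969
    print((ss_model / 1) / (ss_residual / 199))          # about 6200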

Some statistics packages allow you to run the model with an option to indicate that you have included a constant. With this “has constant” option the program does not compute a constant but uses the constant predictor as the constant for the model. The “has constant” model computes sums of squares and degrees of freedom correctly. These results (Table 2.5) look, in fact, exactly like the results in our intercept only model above (Table 2.2).

Table 2.5: Model: write = one / has-constant

Variable   Coefficient   Std. Err.   t
one        52.775        0.670       78.741 **

N = 200    R² = 0    F(0, 199) = 0
Significance levels: †: 10%   *: 5%   **: 1%
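Here is a sketch of the “has constant” variant: when the supplied column of ones is treated as the constant, the sums of squares are centered correctly and the output matches Tables 2.2 and 2.5. In statsmodels, leaving hasconst at its default lets the constant column be detected automatically; this detection behavior is an assumption about that particular package, and other programs use an explicit option instead.

    # "Has constant" model: the column of ones is recognized as the constant.
    import numpy as np
    import statsmodels.api as sm

    one = np.ones(len(write))
    fit_hc = sm.OLS(write, one).fit()   # constant column is detected automatically

    print(fit_hc.params)                # 52.775
    print(fit_hc.bse)                   # 0.670
    print(fit_hc.rsquared)              # essentially 0 -- centered R-squared, as in Table 2.5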

With both the “no constant” and the “has constant” approaches you are not restricted to creating a constant equal to one. You could, for example, set the constant variable equal to two, in which case the value of the coefficient is half the value of the mean and the standard error is also half as large, so that the t-test is identical to the previous models. Table 2.6 shows the results of this model.

Table 2.6: Model: write = two / has-constant

Variable   Coefficient   Std. Err.   t
two        26.388        0.335       78.741 **

N = 200    R² = 0    F(0, 199) = 0
Significance levels: †: 10%   *: 5%   **: 1%

Please note, we said you could set the constant equal to two, but we are not sure why anyone would want to do so.

2.4 Conclusion

What we have established, thus far, is that the constant in an OLS regression model has something to do with the mean of the response variable. In particular, in intercept only models the constant is equal to the mean of the response variable.

[more concluding remarks go here.]
