Exercise 1: Understanding Regression Output

Instructions: Read this document and complete the exercises. You are NOT to work together, but you can e-mail me any questions you may have. The assignment is due Tuesday, March 1st, in class.

Exercise 1: The T-distribution.

You should have studied and used the t-distribution in your previous statistics course. The actual theory behind the t-distribution is fairly advanced. Your econometrics textbook in Chapter 5 presents some of this material; it discusses how a t random variable is formed as a ratio involving a standard normal random variable and a Chi-squared random variable. For our purposes, we don’t need to go into these details. Instead, I want you to understand which random variables have a t-distribution and how to use the t-table (it works quite differently from the Z-table).

First, remember what a standard normal random variable is. Take a normally distributed random variable, subtract its mean, and divide by its standard deviation. This gives you a standard normal random variable, which has a mean of 0 and a standard deviation of 1. In notation, this is:

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
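As a quick check of the standardization above, the following minimal sketch draws values of a normal random variable and standardizes them. The values chosen for the mean and standard deviation are hypothetical, picked only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

mu, sigma = 50.0, 8.0  # hypothetical mean and standard deviation of X

# Draw X ~ N(mu, sigma^2) and standardize each draw: Z = (X - mu) / sigma
x_draws = [random.gauss(mu, sigma) for _ in range(100_000)]
z_draws = [(x - mu) / sigma for x in x_draws]

z_mean = statistics.fmean(z_draws)
z_sd = statistics.pstdev(z_draws)

print(f"mean of Z: {z_mean:.3f}")  # should be close to 0
print(f"sd of Z:   {z_sd:.3f}")    # should be close to 1
```

Whatever mean and standard deviation you start from, the standardized draws should center on 0 with a spread of about 1.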

However, if $\sigma$ is unknown, then it must be estimated. The best estimator to use is the sample standard deviation, $s$, where

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T - 1}}.$$

If we then standardize $X$ using $s$ instead of $\sigma$, we add an additional source of uncertainty to the new standardized random variable. This is because the sample standard deviation $s$ is also a random variable: the sample variance $s^2$, suitably scaled, has a Chi-squared distribution. Therefore, a new random variable built as the ratio of a normal random variable, $X$, to a random variable based on a Chi-squared variable, $s$, has a t-distribution, *not* a normal distribution. This distribution is pictured on page 94 of your textbook. It is bell-shaped like the standard normal, but it has “wider” tails to allow for the additional uncertainty introduced by dividing by the sample standard deviation.
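The "wider tails" can be seen in a small simulation. This is only a sketch under stated assumptions: it uses the sample mean of a small sample as the normal random variable being standardized, and the population values, sample size, and number of repetitions are hypothetical choices made for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

mu, sigma = 0.0, 1.0   # hypothetical population mean and standard deviation
T = 5                  # small sample, so estimating sigma matters a lot
reps = 20_000
cutoff = 1.96          # leaves 2.5% in each tail of N(0, 1)

beyond_with_sigma = 0
beyond_with_s = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(T)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor T - 1
    z = (x_bar - mu) / (sigma / math.sqrt(T))  # standardized with known sigma
    t = (x_bar - mu) / (s / math.sqrt(T))      # standardized with estimated s
    beyond_with_sigma += abs(z) > cutoff
    beyond_with_s += abs(t) > cutoff

share_sigma = beyond_with_sigma / reps
share_s = beyond_with_s / reps
print(f"share beyond +/-1.96 using sigma: {share_sigma:.3f}")
print(f"share beyond +/-1.96 using s:     {share_s:.3f}")
```

The share of extreme values should be close to 5% when the known standard deviation is used, and noticeably larger when the sample standard deviation is used instead.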

As with the normal distribution, we use a table to get probabilities for t values. However, the t-table works very differently. It appears inside the front cover of your textbook (or possibly the back cover, depending on the date of your book’s printing).

Each t-distribution has an associated number of degrees of freedom. For our simple regression models it will be $T - 2$, where $T$ is the sample size. Go to the table and look up DF (“degrees of freedom”) of 10 (it could be any value; I chose 10 just for demonstration). You will find this in the first column of the table. Go down to the value 10. In this row, you are given t-values (not probabilities). The probabilities appear instead at the top of each column, and there are only 5 of them: 0.10, 0.05, 0.025, 0.01 and 0.005. In the row for DF = 10 you find:

DF    α = 0.10   α = 0.05   α = 0.025   α = 0.01   α = 0.005
…     …          …          …           …          …
10    1.372      1.812      2.228       2.764      3.169

How to interpret these values: The probability that the random variable t is greater than or equal to 1.372 is 0.10 (10%). We write this as $P(t \geq 1.372) = 0.10$.

The top of the table tells us that the table contains “Right-tail critical values for the t distribution”. Because the t-distribution is symmetric, we also know that $P(t \leq -1.372) = 0.10$.

[Figure: t density with the horizontal axis labeled "t values" and marked at 0 and 1.372; the area under the curve to the right of 1.372 is equal to 0.10.]

What happens if we want the probability that the t random variable with 10 degrees of freedom exceeds a value that is not in the table, such as $P(t \geq 2.00)$? The table only allows us to bracket it.

The table tells us that $P(t \geq 1.812) = 0.05$ and $P(t \geq 2.228) = 0.025$ (verify this by examining the table).

Because 2.00 lies between 1.812 and 2.228, the probability lies between 2.5% and 5%.

That is: $0.025 < P(t \geq 2.00) < 0.05$.

Alternatively, we could use Excel to get exact probabilities instead of approximating. The formula in Excel is =TDIST( ), and in the brackets you need to enter 3 values. The first is the value for t; in our example, this would be 2.00. The second value is the degrees of freedom, and the third value is either a 1 or a 2, for one or two tails. To find $P(t \geq 2.00)$ with 10 degrees of freedom, we would enter the formula =TDIST(2.00,10,1). Hit return and Excel returns the value 0.036694, which lies between 0.025 and 0.05 as we indicated above. If you wanted to find the probability that t is greater than 2.00 OR less than -2.00, this would be the sum of the two tails. In Excel, the formula would be =TDIST(2.00,10,2) and Excel would return the value 0.073388. (Verify these results using Excel before proceeding.)
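If Excel is not at hand, the same tail probabilities can be approximated by integrating the t density numerically. The sketch below is not how Excel computes TDIST; it is a plain Simpson's-rule approximation, and the function names `t_pdf` and `t_right_tail` are made up for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t_value, df, steps=2000):
    """P(t >= t_value) for t_value >= 0, via Simpson's rule on [0, t_value]."""
    h = t_value / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # by symmetry, half the area lies right of 0


one_tail = t_right_tail(2.00, 10)      # compare with =TDIST(2.00,10,1)
two_tails = 2 * one_tail               # compare with =TDIST(2.00,10,2)
table_check = t_right_tail(1.372, 10)  # the table says about 0.10

print(f"one tail:    {one_tail:.6f}")
print(f"two tails:   {two_tails:.6f}")
print(f"table check: {table_check:.3f}")
```

The two-tail value is simply twice the one-tail value, which is the symmetry argument used above.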

Answer these questions:

a) Approximate $P(t \geq 2.00)$ for 38 degrees of freedom using the table.

b) Find $P(t \geq 2.00)$ for 38 degrees of freedom using Excel.

c) Find $t_c$ where $P(t \geq t_c) = 0.05$ for 38 degrees of freedom using the t-table.

d) Find $t_c$ where $P(t \leq -t_c \text{ or } t \geq t_c) = 0.05$ for 38 degrees of freedom. For this one, the following diagram will help:

[Figure: t density with the horizontal axis marked at $-t_c$, 0 and $t_c$, and an area of 0.025 in each tail. You want the sum of the two tail areas to be 0.05, so you need to know what $t_c$ value leaves 0.025 in each tail.]
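Finding a critical value is the reverse of finding a tail probability: you fix the tail area and search for the t value that produces it. The sketch below does this by bisection for the DF = 10 row of the table, so you can compare against the values 1.812 and 2.228 quoted earlier. The helper names are hypothetical and the tail probability is a Simpson's-rule approximation, not a library routine.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t_value, df, steps=2000):
    """P(t >= t_value) for t_value >= 0, via Simpson's rule on [0, t_value]."""
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(tail_area, df, lo=0.0, hi=50.0):
    """Find t_c with P(t >= t_c) = tail_area by bisection (0 < tail_area < 0.5)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > tail_area:
            lo = mid  # too much area to the right, so t_c is larger
        else:
            hi = mid  # too little area to the right, so t_c is smaller
    return (lo + hi) / 2


one_tail_tc = t_critical(0.05, 10)      # all 0.05 in the right tail
two_tail_tc = t_critical(0.05 / 2, 10)  # 0.05 split evenly across both tails

print(f"one-tail t_c: {one_tail_tc:.3f}")
print(f"two-tail t_c: {two_tail_tc:.3f}")
```

Note that the two-tail case is handled by halving the total area before the search, which is exactly the reasoning the diagram for question d) asks for.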

Exercise 2: Confidence Intervals

Read slides 5.1 – 5.9 and pages 90 – 98, only skimming pages 92, 93 and 94 (which go into great detail concerning the t-distribution). These slides and pages explain how to construct a confidence interval and how to interpret it. Slide 5.7 demonstrates how we move from a statement concerning the t-distribution to a confidence interval for $\beta_2$. Think of the confidence interval as an estimator for the true, unknown parameter. We know from Chapter 4 that $b_2$ is our estimator for $\beta_2$, but now we recognize it as a point estimator. By itself, it tells us nothing about its precision. For that, we need to take the standard deviation of $b_2$ into consideration. That is where confidence intervals come into play. We take our point estimator, $b_2$, and construct an interval around it using the estimated standard deviation of the estimator, $se(b_2)$. The interval is expressed in terms of a degree of confidence. We get to choose the degree of confidence, with 95% being the most popular choice, and 90% and 99% also being useful levels.

In the example from the slides (slides 5.8 and 5.9) and book (bottom of page 97 – 98), we have a sample of 40 observations, implying $T - 2 = 40 - 2 = 38$ degrees of freedom. We used the t-table to determine that 95% of the possible t values lie between $-2.024$ and $+2.024$. From this we infer that 95% of the possible $b_2$ values lie within $\pm 2.024$ standard deviations, $se(b_2)$:

$$b_2 \pm 2.024 \, se(b_2) = 0.1283 \pm 0.0617 = [0.0665, 0.1901]$$
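The arithmetic behind this interval can be sketched in a few lines. The point estimate, critical value, and margin come from the example; the standard error is not stated in this handout, so the sketch backs it out from the reported margin, which is an assumption made only for illustration. The function name `confidence_interval` is likewise hypothetical.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper, margin) for estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin, margin


b2 = 0.1283    # point estimate from the example
t_c = 2.024    # 95% critical value for 38 degrees of freedom
# ASSUMPTION: se(b2) is recovered from the reported margin of 0.0617,
# because the handout does not state the standard error directly.
se_b2 = 0.0617 / t_c

lower, upper, margin = confidence_interval(b2, se_b2, t_c)
print(f"margin of error: {margin:.4f}")
print(f"interval: [{lower:.4f}, {upper:.4f}]")
```

Because the inputs here are already rounded, the endpoints may differ from the interval quoted above in the fourth decimal place. To build an interval at a different confidence level, only `t_c` changes.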

We say that the margin of error is 0.0617. We are 95% confident that this interval contains the true $\beta_2$ value. Notice that we do not say that there is a 95% probability that the true $\beta_2$ is in this interval. The difference between the two statements concerns what the probability refers to. The $\beta_2$ parameter is not random, so we do not make probabilistic statements about it. The interval is constructed using $b_2$ and $se(b_2)$, which are both random variables, so we make the probabilistic statement about the interval.
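One way to see that the probability statement is about the interval is to hold the true slope fixed and draw many samples. In the sketch below the true parameters, the error standard deviation, and the regressor values are all hypothetical; only the sample size of 40 and the critical value 2.024 are taken from the example. Each repetition estimates the simple regression by least squares and checks whether its interval contains the fixed true slope.

```python
import math
import random

random.seed(2)  # fixed seed so the run is reproducible

T = 40                      # sample size, so T - 2 = 38 degrees of freedom
t_c = 2.024                 # 95% critical value for 38 degrees of freedom
beta1, beta2 = 40.0, 0.13   # hypothetical true parameters, fixed across samples
sigma = 35.0                # hypothetical error standard deviation
x = [100.0 + 25.0 * i for i in range(T)]  # hypothetical fixed regressor values
x_bar = sum(x) / T
sxx = sum((xi - x_bar) ** 2 for xi in x)

reps = 5000
covered = 0
for _ in range(reps):
    # New random errors give a new sample, hence a new b2 and se(b2)
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = sum(y) / T
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    se_b2 = math.sqrt(sse / (T - 2) / sxx)
    if b2 - t_c * se_b2 <= beta2 <= b2 + t_c * se_b2:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing beta2: {coverage:.3f}")
```

The true slope never moves; the intervals do. Roughly 95% of them should contain it.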

The interval can be constructed for any level of confidence, where the critical t-value $t_c$ will differ depending on the level of confidence and the degrees of freedom. The general form of a confidence interval is:

$$b_2 \pm t_c \, se(b_2)$$
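The step from a statement about the t-distribution to this interval can be sketched as follows. This is a standard rearrangement, written here with $1 - \alpha$ for the confidence level (a notation assumed for this sketch), and it may differ in presentation from the slides.

```latex
P\left(-t_c \le \frac{b_2 - \beta_2}{se(b_2)} \le t_c\right) = 1 - \alpha
\quad\Longrightarrow\quad
P\bigl(b_2 - t_c \, se(b_2) \le \beta_2 \le b_2 + t_c \, se(b_2)\bigr) = 1 - \alpha
```

Multiplying through by $se(b_2)$ and isolating $\beta_2$ in the middle leaves the random quantities $b_2$ and $se(b_2)$ at the endpoints, which is why the probability refers to the interval.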

Answer these questions:

1) For the example on slides 5.8 and 5.9, construct the 99% and 90% confidence intervals for $\beta_2$. Show all of your work. This is very straightforward as long as you know how to use the t-table.

2) For the elections regression that you estimated in computer assignment #2, demonstrate how the 95% confidence interval for $\beta_2$ was constructed. Excel prints this out for you [0.499459, 1.27633], but you need to show exactly how it was constructed.

3) Suppose that a political pundit argues that the growth rate of the economy has NO effect on how people vote. Explain how your results in 2) above can refute this statement. (Hint: if this statement were true, what would the value of $\beta_2$ be?)

4) Repeat 2) for the mortgage rate/housing starts regression that you estimated for problem set #3.

5) Suppose that a housing market analyst predicts that a 1 percentage point increase in the 30 year fixed mortgage rate will cause a reduction in housing starts of 40,000 units. Does your confidence interval from 4) support or refute this prediction? Explain.
