70-208 Regression Analysis
Week 3
Dielman: Ch 4 (skip Sub-Sec 4.4.2, 4.6.2, and 4.6.3 and Sec 4.7), Sec 7.1
Multiple Independent Variables
• We believe that both education and
experience affect the salary you earn. Can
linear regression still be used to capture this
idea?
• Yes, of course
• The “linear” part of “linear regression” means
that the regression coefficients cannot enter
the eq’n in a nonlinear way (such as β1² * x1)
Multiple Independent Variables
• Salaryi = β0 + β1 * Educi + β2 * Experi + μi
• Graphing this equation requires the use of 3
dimensions, so the usefulness of graphical
methods such as scatterplots and best-fit lines is
somewhat limited now
• As the number of explanatory variables increases,
the formulas for computing the estimates of the
regression coefficients become increasingly
complex
– So we will not cover how to solve them by hand
Multiple Independent Variables
• Equation that “best” describes the relationship btwn a
dependent variable y and K independent variables x1,
x2, … , xK can be written as:
– y = β0 + β1 * x1 + β2 * x2 + … + βK * xK + μ
– Note that I will mostly drop the “i” subscript moving fwd
• The criterion for “best” is the same as it was for simple
(i.e. K = 1) regression – the sum of the squared
differences btwn the true values of y and the predicted
values yhat should be as small as possible
• β0,hat, β1,hat, β2,hat, … , βK,hat ensure that the sum of
squared errors is minimized
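• If you want to see what “minimizing the sum of squared errors” looks like computationally, here is a minimal Python sketch (the data and variable names below are made up purely for illustration):

```python
import numpy as np

# Made-up illustration: salary explained by education and experience.
educ = np.array([12, 16, 14, 18, 12, 16, 20, 14])
exper = np.array([10, 4, 8, 2, 15, 6, 3, 12])
salary = np.array([42, 55, 50, 61, 48, 58, 70, 52])

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones_like(educ), educ, exper])

# Least squares picks the coefficients that minimize sum((y - X @ beta)**2).
beta_hat, *_ = np.linalg.lstsq(X, salary, rcond=None)
sse = np.sum((salary - X @ beta_hat) ** 2)
print("beta_0,hat, beta_educ,hat, beta_exper,hat:", beta_hat)
print("minimized sum of squared errors:", sse)
```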
Labeling β
• Sometimes we just use β0, β1, β2, … , βK to label the
coefficients
• Other times, it is useful to be more specific. For example, if
x1 represents “education level”, it is better to write β1 as
βeduc.
– β0 is always written the same
• The first regression below is more helpful in seeing and
presenting your work than the second regression, even if
we knew that y was salary, x1 was education, etc
– Salary = β0 + βeduc * Educ + βexper * Exper + μ
– y = β0 + β1 * x1 + β2 * x2 + μ
• I will go back and forth with my labeling throughout the
course. I just wanted you to understand the difference and
why one way might be better in practice.
Multiple Independent Variables
• Ceteris paribus – all else equal
• In the case of simple regression, we interpreted
the regression coefficient estimate as meaning
how much the dependent variable increased
when the independent variable went up one unit
• Implicit was the concept that the error term for
any two individuals was equally distributed, in
other words, that all else was equal
Multiple Independent Variables
• It is very possible that that is a bad implicit
assumption
• That is one reason we like to add multiple
explanatory variables. Once they are added,
they are not part of the error term and can be
explicitly accounted for when we interpret
coefficient estimates
• What the hell do I mean by all of this?
Multiple Independent Variables
• Go back to the salary example
• Hopefully you all agree that education and
experience are both highly likely to explain
salary in statistically significant ways
• But what if we didn’t have experience data, so
we just ran the regression on salary and
education?
Multiple Independent Variables
• What we would like to run:
– Salary = β0 + β1 * Educ + β2 * Exper + μ
• What we do run:
– Salary = β0 + β1 * Educ + μ
• Which means that experience has now been sucked into
the error term. If experience levels (conditional on
education) differ in our sample data set, the implicit
assumption that the errors are equally distributed across
all observations is wrong!
• If we ran the 2nd regression written above, we would
interpret β1,hat as the amount by which salary increases
when education increases by one unit (implicitly saying
all else, i.e. the “errors”, are equal, which I just argued is
probably a poor assumption)
Multiple Independent Variables
• So now say we have the experience data and we
can run the regression with 2 explanatory
variables
• Now we would interpret β1,hat as the amount by
which salary increases when education increases
by one unit AND EXPERIENCE IS THE SAME (plus
the remaining information captured by the errors
is the same across all observations)
• So we explicitly take experience out of the error
term and can now condition on it being the same
when we interpret the education coefficient
Multiple Independent Variables
• But how well does the implicit, ceteris
paribus, “error” assumption hold up even
when both educ and exper are included?
• Maybe still not very good. Everything you can
think of is still being captured by the error
terms except for education and experience
levels. If these somehow differ systematically
across observations, the assumption of equal
error distributions is still wrong!
Multiple Independent Variables
• What do I mean by “everything you can think of”?
Very simply, anything else that might (or might not!)
affect salary.
– Years of experience at current company
– Number of extended family members that work at same
company
– Intelligence
– How many sick days you took over the past 5 years
– How many kids you have
– How many siblings you have
– How many different cities you’ve lived in
– How many hot dogs you eat each year
– Etc, etc, etc, blah, blah, blah
Multiple Independent Variables
• Let’s look at those closer
– Years of experience at current company – Probably would have significant
effect on salary. We should include this in the regression if we can get the
data.
– Number of extended family members that work at same company – Might or
might not have an effect on salary.
– Intelligence – Tough to measure, but could proxy for it using an IQ score. Very
likely to affect salary, so it should be included in the regression, too.
– How many sick days you took over the past 5 years – Kind of a measure of
effort, so I think it would matter.
– How many kids you have – Could matter, especially for women.
– How many siblings you have – Doubtful it would be significant.
– How many different cities you’ve lived in – Very unlikely to be significant.
– How many hot dogs you eat each year – I’m literally just making stuff up at
this point, so I doubt this would affect salary (unless we are measuring the
salaries of competitive eaters, so note that context can matter when
“answering” these questions)
Multiple Independent Variables
• So what happens if we think intelligence matters but it
wasn’t included in the regression as a separate explanatory
variable?
• Then intelligence is rolled up into the error term. But if
education and intelligence are highly correlated (smarter
people have more years of education), then the errors are
not the same across the individuals in the sample (E(μi|X) ≠
0). In fact, those with higher education have “higher” error,
by which I mean one component of the error term is
systematically bigger for some individuals
• This would make our ceteris paribus assumption false and
we would end up with biased estimators!
Multiple Independent Variables
• What if we include insignificant variables because
we are afraid of getting biased estimates if we
don’t throw everything in?
• Not really a problem. We will see how to
evaluate whether there are any relevant gains to
including additional variables. If there are, they
should be kept in the regression. If the gains are
negligible or even negative, drop those
insignificant variables and fear not the
repercussions of bias.
Multiple Independent Variables –
Output
• Look at and interpret output
• Sales are dependent on Advertising and Bonus
• Run the regression:
– Saleshat = -516.4 + 2.47 * Adv + 1.86 * Bonus
• This equation can be interpreted as providing an
estimate of mean sales for a given level of advertising
and bonus payment.
• If advertising is held constant, mean sales tend to rise
by $1860 (1.86 thousands of dollars) for each unit
increase in Bonus. If bonus is held fixed, mean sales
tend to rise by $2470 (2.47 thousands of dollars) for
each unit increase in Adv.
Multiple Independent Variables –
Output
• Notice in the Excel output that the dof of the
Regression is now 2 (always used to be 1). This is
because there are 2 explanatory variables. The
SSE, MSR, F, etc are calculated basically the same
way as before, which we will go over very soon.
• Look at Fig 4.7b on pg 141 of Dielman to see how
Excel outputs all the regression information when
multiple independent variables are included
Multiple Independent Variables –
Prediction
• As in simple regression, when we run a multiple
regression we can then predict, or estimate,
values for y when we have values for every
explanatory variable by solving for yhat
• Back to sales example with Adv and Bonus only
– Saleshat = -516.4 + 2.47 * Adv + 1.86 * Bonus
• Say Adv = 200 and Bonus = 150. What would we
predict for Sales (i.e. what is Saleshat)?
– Plug in Adv = 200 and Bonus = 150
– Saleshat = -516.4 + 2.47 * 200 + 1.86 * 150 = 256.6
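• Here is the same plug-in as a tiny Python check (the coefficients are the ones from the fitted equation above):

```python
# Fitted coefficients from the sales regression (Saleshat equation above).
b0, b_adv, b_bonus = -516.4, 2.47, 1.86

# Prediction is just the fitted equation evaluated at the new x values.
adv, bonus = 200, 150
sales_hat = b0 + b_adv * adv + b_bonus * bonus
print(sales_hat)  # about 256.6 (in thousands of dollars)
```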
Confidence Intervals and Hypothesis
Testing
• The confidence interval on βk,hat when K
explanatory variables are included is
– (βk,hat – tα/2,N-K-1 * sβk, βk,hat + tα/2,N-K-1 * sβk)
– Notice the dof change on the t-value
• Hypothesis testing on any one independent
variable is the same as before. The default
Excel test is shown below.
– H0 : βk = 0
– Ha : βk ≠ 0
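• As a hedged sketch, the interval above can be computed with scipy’s t distribution; the numbers passed in at the bottom are placeholders, not output from any regression in these slides:

```python
from scipy import stats

def coef_conf_int(beta_hat, se_beta, N, K, alpha=0.05):
    """(1 - alpha) confidence interval using N - K - 1 degrees of freedom."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - K - 1)
    return beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta

# Placeholder values; substitute the estimate, std error, N, and K from your output.
print(coef_conf_int(beta_hat=2.0, se_beta=0.8, N=30, K=3))
```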
Hypothesis Testing
• If the null on the previous slide is not rejected,
then the conclusion is that, once the effects of
all other variables in the regression are
included, xk is not linearly related to y. In
other words, adding xk to the regression eq’n
is of no help in explaining any additional
variation in y left unexplained by the other
explanatory variables. You can drop xk from
the regression and still have the same “fit”.
Hypothesis Testing
• If the null is rejected, then there is evidence
that xk and y are linearly related and that xk
does help explain some of the variation in y
not accounted for by the other variables
Hypothesis Testing
• Are Sales and Bonus linearly related?
• Use t-test
– H0 : βBON = 0
– Ha : βBON ≠ 0
– Dec rule → reject null if test stat more extreme than t-value and do not reject otherwise
– βBON,hat = 1.856 and sβBON = 0.715, so test stat = 1.856 /
0.715 = 2.593
– The t value with 22 dof (from N-K-1) for a two-tailed test
with α = 0.05 is 2.074.
– Since 2.593 > 2.074, reject null
– Yes, they are linearly related (even when Advertising is
also accounted for)
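• A quick Python check of that test, using the numbers from this slide:

```python
from scipy import stats

# From the slide: beta_BON,hat, its std error, and N - K - 1 = 22 dof.
beta_hat, se, df = 1.856, 0.715, 22

test_stat = beta_hat / se                  # about 2.59
t_crit = stats.t.ppf(1 - 0.05 / 2, df=df)  # about 2.074

# Two-tailed decision: reject H0: beta_BON = 0 if |test stat| > t critical value.
print(abs(test_stat) > t_crit)             # True -> reject the null
```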
Hypothesis Testing
• Could have used p-value or CI to answer the
question on previous slide
– Would have reached same conclusion
– Don’t use full F when testing just one variable
(more explanation later)
Assessing the Fit
• Recall SST, SSR, and SSE
– SST = ∑ (yi – ybar)²
– SSR = ∑ (yi,hat – ybar)²
– SSE = ∑ (yi – yi,hat)²
• For SSR, dof is equal to number of explanatory
variables K
• For SSE, dof is N – K – 1
• So SST has N – 1 dof
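• A short sketch of those three sums of squares in Python (the data are made up; fitting by least squares first so that SST = SSR + SSE holds exactly):

```python
import numpy as np

# Made-up data: y and a single explanatory variable x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit by least squares (intercept plus one slope) to get the fitted values.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sst = np.sum((y - y.mean()) ** 2)      # total variation,       N - 1 dof
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation,   K dof
sse = np.sum((y - y_hat) ** 2)         # unexplained variation, N - K - 1 dof
print(sst, ssr, sse, ssr + sse)        # ssr + sse matches sst
```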
Assessing the Fit
• Recall that R2 = SSR / SST = 1 – (SSE / SST)
• It was a measure of the goodness of fit of the
regression line and ranged from 0 to 1. If R2 was
multiplied by 100, it represented the percentage
of the variation in y explained by the regression.
• Drawback to R2 in multiple regression → As more
explanatory variables are added, the value of R2
will never decrease even if the additional
variables are explaining an insignificant
proportion of the variation in y
Assessing the Fit
• From R2 = 1 – (SSE / SST), you can see that R2 gets
increasingly closer to 1 since SSE falls any time
any little tiny bit more variation in y is explained
• Addition of unnecessary explanatory variables,
which add little, if anything, to the explanation of
the variation in y, is not desirable
• An alternative measure is called adjusted R2, or
Radj2
– “Adjusted” because it adjusts for the dof
Assessing the Fit
• Radj2 = 1 – (SSE / (N – K – 1)) / (SST / (N – 1))
• Now suppose an explanatory variable is added
to the regression model that produces only a
very small decrease in SSE. The divisor N-K-1
also falls since K has been increased by 1. So
SSE / (N – K – 1), the numerator of the ratio being
subtracted, may actually increase if the decrease in
SSE from adding the variable is not great enough to
overcome the decrease in N-K-1. When that
happens, Radj2 falls.
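• The made-up numbers in this sketch show exactly that situation: adding a nearly useless variable (K goes from 2 to 3) barely lowers SSE, so R2 creeps up while Radj2 falls:

```python
def r_squared(sse, sst):
    return 1 - sse / sst

def adj_r_squared(sse, sst, N, K):
    return 1 - (sse / (N - K - 1)) / (sst / (N - 1))

# Hypothetical numbers, chosen only to illustrate the point.
print(r_squared(100, 400), adj_r_squared(100, 400, N=25, K=2))  # 0.750, ~0.727
print(r_squared(98, 400), adj_r_squared(98, 400, N=25, K=3))    # 0.755, ~0.720
```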
Assessing the Fit
• Radj2 no longer represents the proportion of
variation in y explained by the regression (that
is still captured only by R2), but it is useful
when comparing two regressions with
different numbers of explanatory variables. A
decrease in Radj2 from the addition of one or
more explanatory variables signals that the
added variable(s) was of little importance in
the regression, so it can be dropped.
Assessing the Fit
• F = MSR / MSE
• MSR = SSR / K
• MSE = SSE / (N – K – 1)
• Full F statistic is used to test the following hypothesis:
– H0 : β1 = β2 = … = βK = 0
– Ha : At least one coefficient above is not equal to 0
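• A minimal sketch of the full F calculation and decision in Python (the SSR and SSE values at the bottom are placeholders for your own output):

```python
from scipy import stats

def full_f_test(ssr, sse, N, K, alpha=0.05):
    msr = ssr / K            # mean square regression
    mse = sse / (N - K - 1)  # mean square error
    f_stat = msr / mse
    f_crit = stats.f.ppf(1 - alpha, dfn=K, dfd=N - K - 1)
    return f_stat, f_crit

# Placeholder values; plug in SSR and SSE from your regression output.
f_stat, f_crit = full_f_test(ssr=300.0, sse=100.0, N=25, K=2)
print(f_stat > f_crit)  # True -> reject H0 that all slope coefficients equal 0
```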
Assessing the Fit
• Decision rule → reject null if F > fcrit(α; K, N-K-1) and do not rej otherwise
• Failing to reject the null implies that the
explanatory variables in the regression
equation are of little or no use in explaining
the variation in y. Rejection of the null implies
that at least one (but not necessarily all) of
the explanatory variables helps explain the
variation in y.
Assessing the Fit
• Rejection of the null does not mean that all pop’n
regression coefficients are different from 0 (though this
may be true), just that the regression is useful overall
in explaining y.
• The full F test can be thought of as a global test
designed to assess the overall fit of the model.
• That’s why full F cannot be used for hypothesis testing
on a single variable in multiple regression, but it could
be used for the hypothesis testing on the single
explanatory variable in simple regression (since that
variable was the whole, “global” model)
Sales Example
• Show the calculation of F on the Excel sheet
– Using SSE and SSR
– Using MSE and MSR
• Would we reject the null that all coefficients
are equal to 0?
– YES
Comparing Two Regression Models
• Remember that the t-test can check whether
each individual regression coefficient is
significant and the full F test can check the
overall fit of the regression by asking whether
any coefficient is significant
• Partial F test is in between – it answers the
question of whether some subset of
coefficients is jointly significant or not
Comparing Two Regression Models
• Want to test whether variables xL+1, … , xK are
useful in explaining any variation in y after
taking into account variation already
explained by x1, … , xL variables
• Full model has all K variables:
– y = β0 + β1 * x1 + β2 * x2 + … + βL * xL + βL+1 * xL+1 +
… + βK * xK + μ
• Reduced model only has L variables:
– y = β0 + β1 * x1 + β2 * x2 + … + βL * xL + μ
Comparing Two Regression Models
• Is the full model significantly better than the
reduced model at explaining the variation in
y?
• H0 : βL+1 = … = βK = 0
• Ha : at least one of them isn’t equal to 0
• If null is not rejected, choose the reduced
model
• If null is rejected, xL+1, … , xK contribute to
explaining y, so use the full model
Comparing Two Regression Models
• To test the hypothesis, use the following partial F
statistic
– Fpart = ((SSER – SSEF) / (K – L)) / ((SSEF) / (N – K – 1)),
where the “R” stands for reduced model and “F”
stands for full model
• SSER – SSEF is always greater than or equal to 0
– Full model includes K – L extra variables which, at
worst, explain none of variation in y and in all
likelihood explain at least a little of it, so SSE falls
– This difference represents the additional amount of
variation in y explained by adding xL+1, … , xK to the
regression
Comparing Two Regression Models
• This measure of improvement is then divided
by the number of additional variables
included, K – L
– Thus the numerator of Fpart is the additional
variation in y explained per additional explanatory
variable used
• Reject null if Fpart > fcrit(α; K – L, N – K – 1) and
do not reject otherwise
Sales Example Revisited
• Example 4.4, pg 152 of Dielman
• Let’s add two more variables to the sales
example from earlier
• x3 is mkt share held by company in each
territory and x4 is largest competitor’s sales in
each territory
• So the “reduced” model results we already
have. They were shown earlier when just x1
(Adv) and x2 (Bonus) were included
Sales Example Revisited
• We need to see the full model results
• Notice that R2 is higher for the full model
(remember, R2 can never fall when more
variables are added) but Radj2 is actually lower
– This should be a clue that we will probably not
reject the null on β3 and β4 when comparing the
full and reduced models
Sales Example Revisited
• SSER = 181176, SSEF = 175855, K – L = 2, N – K – 1
= 20
– Note that this last value is the dof of SSE in the full
model
• So Fpart = ((181176 – 175855) / 2) / (175855 / 20)
= 0.303
• fcrit(0.05; 2, 20) = 3.49
• Since 0.303 < 3.49, do not reject null
• Conclude that β3 = β4 = 0, so x3 and x4 should not
be included in the regression
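• The same calculation in a few lines of Python, using the numbers from this slide:

```python
from scipy import stats

# From the slide: reduced (Adv, Bonus) vs full (plus Mkt_Shr and Compet).
sse_r, sse_f = 181176, 175855
k_minus_l, df_full = 2, 20  # K - L extra variables, N - K - 1 for the full model

f_part = ((sse_r - sse_f) / k_minus_l) / (sse_f / df_full)
f_crit = stats.f.ppf(0.95, dfn=k_minus_l, dfd=df_full)

print(round(f_part, 3), round(f_crit, 2))  # about 0.303 and 3.49
print(f_part > f_crit)                     # False -> do not reject the null
```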
Sales Example Revisited
• Notice that the values for β0,hat, βADV,hat, and βBON,hat
changed when we added additional variables
– Saleshat = -516.4 + 2.47 * Adv + 1.86 * Bonus
– Saleshat = -593.5 + 2.51 * Adv + 1.91 * Bonus + 2.65 *
Mkt_Shr – 0.121 * Compet
• This should not surprise you. Some of what was
previously rolled up into μ has now been explicitly
accounted for, and that changes the way the initial set
of explanatory variables relate to Sales.
• Note that the inclusion of additional observations (i.e.
we gather more data) could also adjust the estimates
of β0,hat, etc
• Every regression is different! (like snowflakes.......)
Sales Example Revisited
• If we chose to stick with the “full” sales model, we
would include the x3 and x4 variables in predicting
Saleshat
– Even though they are insignificant, because the β0,hat,
βADV,hat, and βBON,hat values changed with their inclusion, it
would be wrong to make predictions without them
(unless we re-ran the original regression where they were
not even included)
• So what is Saleshat for Adv = 500, Bonus = 150,
Mkt_Shr = 0.5, and Compet = 100?
– Saleshat = -593.5 + 2.51 * 500 + 1.91 * 150 + 2.65 * 0.5 –
0.121 * 100 = 937.2
Limits to K?
• There are K + 1 coefficients that need to be estimated
(β0, β1, … , βK)
• We need at least K + 1 observations to estimate that
many coefficients
• Normally written as K ≤ N – 1
• This is a similar concept from an algebra class you’d
have taken in middle school, where we needed at least
M equations to solve for X unknowns (i.e. M ≥ X)
– Here, you can think of N as the number of equations
and K + 1 as the number of unknowns (the regression
coefficients) to be solved for
Multicollinearity
• For a regression of y on K explanatory variables, it
is hoped that the explanatory variables are highly
correlated with the dependent variable
• However, it is not desirable for strong
relationships to exist among the explanatory
variables themselves
• When explanatory variables are correlated with
one another, the problem of multicollinearity is
said to exist
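• One quick, informal first check (not from Dielman, just common practice) is to inspect the pairwise correlations among the explanatory variables; the data and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical explanatory variables only (the dependent variable is excluded).
X = pd.DataFrame({
    "adv":     [200, 250, 180, 300, 220, 260],
    "bonus":   [150, 160, 140, 170, 155, 165],
    "mkt_shr": [0.40, 0.50, 0.30, 0.60, 0.45, 0.55],
})

# Off-diagonal correlations near +1 or -1 are a warning sign of multicollinearity.
print(X.corr())
```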
Multicollinearity
• Seriousness of problem depends on degree of
correlation
• Some books list an additional assumption of OLS that
the sample data X is not all the same value, and a
follow-up assumption that X1 cannot directly
determine X2
– A violation of the first assumption (all sample values of X identical) hardly
ever happens. As long as X varies in the population, the sample data will
almost always vary unless the pop’n variation is minimal or
the sample size is very small.
– The second assumption expressly forbids
perfect multicollinearity from occurring between any 2
explanatory variables
Biggest Problem for MultiC
• The std errors of regression coefficients are large
when there is high multicollinearity among
explanatory variables
• The null hypo that the coefficients are 0 may not be
rejected even when the associated variable is
important in explaining variation in y
• Summary: Perfect collinearity is fatal for a
regression. Any small degree of multicollinearity
increases std errors and is thus somewhat
undesirable, though basically unavoidable.
– We will look at one strategy for investigating
multicollinearity and using it to inform our regression
choices next (free preview: Fpart is useful)
Baseball Example
• Example comes from the Wooldridge text
• I believe baseball player salaries are determined by
years in the league, avg games played per year, career
batting average, avg home runs per year, and avg RBIs
per year
• So the following regression is run:
– log(salary) = β0 + β1 * years + β2 * games_yr + β3 * cavg + β4
* hr_yr + β5 * rbi_yr + μ
– Ignore the log for now, that’s for next week. I just wanted
to stay kosher with the example from my other book. Just
think of it as “salary” if it really bothers you.
Baseball Example
• Results (standard errors in parentheses):
– β0 = 11.19 (0.29)
– β1 = 0.0689 (0.0121)
– β2 = 0.0126 (0.0026)
– β3 = 0.00098 (0.00110)
– β4 = 0.0144 (0.0161)
– β5 = 0.0108 (0.0072)
• Plus N = 353 and SSEF = 183.186
Baseball Example
• Simple t-test on the last three coefficients would
say they are insignificant in explaining log(salary)
• But any baseball fan knows that batting avg,
home runs, and RBIs definitely are big factors in
determining player salaries (and team
performance for that matter)
• So let’s run the reduced model where we drop
out those three variables and check to see what
the partial F statistic reveals
Baseball Example
• Results (standard errors in parentheses):
– β0 = 11.22 (0.11)
– β1 = 0.0713 (0.0125)
– β2 = 0.0202 (0.0013)
• Plus N = 353 and SSER = 198.311
Baseball Example
• So Fpart = 9.55 (do the math yourself later, you
have everything you need, or see the quick check
at the end of this slide), and we reject the null
that β3 = β4 = β5 = 0 (and thus the claim that batting
avg, home runs, and RBIs have no effect on salary)
• That may seem surprising in light of
insignificant t-stats for all 3 in the full model
regression
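• Here is the quick check of Fpart mentioned above, using only numbers already given on these slides:

```python
from scipy import stats

# From the baseball slides: full vs reduced SSE, N = 353, K = 5, L = 2.
sse_f, sse_r = 183.186, 198.311
N, K, L = 353, 5, 2

f_part = ((sse_r - sse_f) / (K - L)) / (sse_f / (N - K - 1))
f_crit = stats.f.ppf(0.95, dfn=K - L, dfd=N - K - 1)

print(round(f_part, 2))  # about 9.55
print(f_part > f_crit)   # True -> reject H0: beta_3 = beta_4 = beta_5 = 0
```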
Baseball Example
• What is happening is that two variables, hr_yr and
rbi_yr, are highly correlated (and less so for cavg), and
this multicollinearity makes it difficult to uncover the
partial effect of each variable
– This is reflected in individual t-stats
• Fpart stat tests whether all 3 variables above are jointly
significant, and multicollinearity between them is
much less relevant for testing this hypo
• If we drop one of those variables, we would see the t-stats
of the others increase by a lot (even to the point of
significance). The point estimates might change up or
down, but the standard errors would definitely fall.
Dummy Variables
• Dummy variables, or indicator variables, take on
only two values → 0 or 1
• They indicate whether a sample observation from
our data does (1) or does not (0) belong in a certain
category
– You can think of them as “yes” (1) or “no” (0) variables
• Examples:
– Gender – 1 if female, 0 otherwise
– Race – 1 if white, 0 otherwise
– Employment – 1 if employed, 0 otherwise
– Education – 1 if college graduate, 0 otherwise
Dummy Variables
• Can also be used to capture deeper qualitative
information
– Is person A a US citizen? (1 if yes, 0 if no)
– Is person A a baseball fan? (1 if yes, 0 if no)
– Does person A own a computer? (1 if yes, 0 if no)
– Is summer the favorite season of person A? (1 if yes, 0 if no)
– Does firm Z sell video games? (1 if yes, 0 if no)
– Has country Z signed a free trade agreement with
Canada? (1 if yes, 0 if no)
Dummy Variables
• In regression analysis, we must always “leave
out” one part of the indicator
• Use gender as the example here
– So Xmale = 1 if male, 0 otherwise might be included in
the regression as an independent variable
– But we cannot also include Xfemale = 1 if female, 0
otherwise in the regression
– One “part” (here, female indicator) must be left out
– Why is this? Think back to the perfect collinearity
problem discussed earlier. We can always define
“female” completely in terms of “male” (Xfemale = 1 –
Xmale). So both cannot be included in the regression or
we get an error.
Dummy Variables
• The group whose indicator is omitted from the
regression serves as the base-level group for
comparison
• In the gender example, say I ran the following
regression:
– Salary = β0 + β1 * Educ + β2 * Male + μ
Dummy Variables
• Then the base-level group is females
• The intercept for females is β0, while for males
it is β0 + β2
• From where?
– Indicated group (males) → Salary = β0 + β1 * Educ
+ β2 * Male + μ = β0 + β1 * Educ + β2 + μ = (β0 + β2)
+ β1 * Educ + μ
– Non-indicated group (females) → Salary = β0 + β1
* Educ + β2 * Male + μ = (β0) + β1 * Educ + μ
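• A hedged sketch of the same idea in Python with statsmodels; the data are made up and the variable names are my own, but it shows the base-level (female) intercept and the shifted male intercept:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: salary, years of education, and a Male indicator (1 = male).
educ = np.array([12, 16, 14, 18, 12, 16, 20, 14, 13, 17])
male = np.array([1, 0, 1, 0, 1, 1, 0, 0, 1, 0])
salary = np.array([45, 52, 50, 60, 47, 55, 63, 51, 46, 58])

X = sm.add_constant(np.column_stack([educ, male]))  # columns: 1, Educ, Male
fit = sm.OLS(salary, X).fit()

b0, b_educ, b_male = fit.params
print("female (base group) intercept:", b0)
print("male intercept:", b0 + b_male)  # shifted by the dummy coefficient
```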
Dummy Variables
• If we wanted to answer the question of whether
or not men and women earn the same salary
once education has been accounted for, a simple
t-test would do the trick
– H0 : β2 = 0
– Ha : β2 ≠ 0
– If we reject the null, then men and women earn
different salaries even when education levels are
accounted for (remember there’s all kinds of other
stuff in μ though)
Dummy Variables
• How about a more complicated example of
indicator variables?
• Suppose firms in a sample are categorized
according to the exchange on which they are
listed (NYSE, AMEX, or NASDAQ). We believe the
exchange they are on may have some predictive
power over the value of the firm.
– D1 = 1 if listed on NYSE, 0 otherwise
– D2 = 1 if listed on AMEX, 0 otherwise
– D3 = 1 if listed on NASDAQ, 0 otherwise
Dummy Variables
• Let NYSE be the base level, so leave its dummy
out of the regression equation
• Include firm-level assets and number of
employees as additional independent
variables
• Value = β0 + β1 * D2 + β2 * D3 + β3 * Assets + β4
* Employees + μ
• Then the NYSE intercept is β0, AMEX is β0 + β1,
and NASDAQ is β0 + β2
Dummy Variables
• When using indicator variables, the partial F statistic is
used to test whether the variables are important as a
group. The t-test on individual coefficients should not be
used to decide whether individual indicator variables
should be retained or dropped (except when there are
only two groups represented, and thus only one indicator
variable, such as the male/female salary regression a few
slides back).
• The indicator variables are designed to have meaning as a
group, and are either all retained or all dropped as a
group. Dropping individual indicators changes the
meaning of the remaining ones.
– Imagine dropping just D2 (AMEX) in the previous regression.
Then D3 (NASDAQ) is kept, while the base-level group switches
from D1 (NYSE) to simply “not D3” (which would include both
NYSE and AMEX)
Dummy Variables – Sales Example
• This is example 7.3 on pg 279 of Dielman
• Look at relationship between dependent variable
(Sales) and a few independent variables
(Advertising, Bonus).
• Let’s add variables indicating the region of the US
in which Sales are made.
– South = 1 if territory is in the South, 0 otherwise
– West = 1 if territory is in the West, 0 otherwise
– Midwest = 1 if territory is in the Midwest, 0 otherwise
Dummy Variables – Sales Example
• Let Midwest be the base level group
– So leave it out of the regression
• Regression:
– Sales = β0 + β1 * Adv + β2 * Bonus + β3 * South + β4 *
West + μ
• We find β3,hat = -258 and β4,hat = -210. What do
those mean?
– Since β3,hat = -258, Sales in the South are 258 units
lower than sales in the Midwest (since Midwest is our
comparison group) even if we condition on Adv and
Bonus being the same (similar for β4,hat)
Dummy Variables – Sales Example
• It would be inappropriate to run simple t-tests on
those coefficients to determine their significance. We
need to use partial F. Think about how the
interpretation of all indicators would change if we ran a
t-test and decided to drop only β4 * West from the
regression.
• To determine whether there is a significant difference
in sales for territories in different regions, the following
hypotheses should be tested:
– H0 : β3 = β4 = 0
– Ha : at least one of them isn’t equal to 0
Dummy Variables – Sales Example
• The larger (full) model is again:
– Sales = β0 + β1 * Adv + β2 * Bonus + β3 * South + β4
* West + μ
• So the null stating that both indicators are 0, if
not rejected, would mean no differences in
sales across the regions exist, and the indicators
can be dropped
• The simpler (reduced) model:
– Sales = β0 + β1 * Adv + β2 * Bonus + μ
Dummy Variables – Sales Example
• So Fpart = ((SSER – SSEF) / (K – L)) / MSEF = 17.3
• Decision is to reject null since 17.3 > fcrit(0.05;
2, 20)
• Thus, at least one of the coefficients of the
indicator variables is not 0. There are
differences in average sales levels btwn the
three regions. Keep the indicator variables in
the regression.
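• If the data were loaded into Python, the whole comparison could be done with statsmodels; the miniature data set and column names below are made up (they are not Dielman's file), so only the mechanics carry over:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in for the sales data with 0/1 region dummies (Midwest omitted).
df = pd.DataFrame({
    "sales": [960, 1050, 900, 1200, 980, 1100, 870, 1150, 1020, 940],
    "adv":   [380, 420, 350, 480, 390, 440, 340, 460, 410, 370],
    "bonus": [150, 160, 140, 175, 152, 168, 138, 172, 158, 146],
    "south": [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
    "west":  [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
})

full = smf.ols("sales ~ adv + bonus + south + west", data=df).fit()
reduced = smf.ols("sales ~ adv + bonus", data=df).fit()

# compare_f_test returns the partial F statistic, its p-value, and K - L.
f_part, p_value, df_diff = full.compare_f_test(reduced)
print(f_part, p_value, df_diff)
```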
Suggested Problems from Dielman
• Pg 148, #1
• Pg 158, #3
• Pg 169, #7
• Pg 170, #11
• Pg 173, #17
• Pg 285, #1