Chapter 3
Learning to Use Regression Analysis
Copyright © 2011 Pearson Addison-Wesley.
All rights reserved.
Slides by Niels-Hugo Blunch
Washington and Lee University
Steps in Applied
Regression Analysis
• The first step is choosing the dependent variable – this step is
determined by the purpose of the research (see Chapter 11 for
details)
• After choosing the dependent variable, it’s logical to follow
this sequence:
1. Review the literature and develop the theoretical model
2. Specify the model: Select the independent variables and the
functional form
3. Hypothesize the expected signs of the coefficients
4. Collect the data. Inspect and clean the data
5. Estimate and evaluate the equation
6. Document the results
Step 1: Review the Literature and
Develop the Theoretical Model
• Perhaps counterintuitively, a strong theoretical foundation
is the best start for any empirical project
• Reason: main econometric decisions are determined by the
underlying theoretical model
• Useful starting points:
– Journal of Economic Literature or a business-oriented publication of
abstracts
– Internet search, including Google Scholar
– EconLit, an electronic bibliography of economics literature (for more
details, go to www.EconLit.org)
Step 2: Specify the Model: Independent
Variables and Functional Form
• After selecting the dependent variable, the
specification of a model involves choosing the
following components:
1. the independent variables and how they should be
measured,
2. the functional (mathematical) form of the variables,
and
3. the properties of the stochastic error term
Step 2: Specify the Model:
Independent Variables and
Functional Form (cont.)
• A mistake in any of the three elements results in a specification error
• For example, only theoretically relevant explanatory variables should
be included
• Even so, researchers frequently have to make choices, also known as
imposing their priors
• Example:
• When estimating a demand equation, theory informs us that prices of
complements and substitutes of the good in question are important
explanatory variables
• But which complements—and which substitutes?
Step 3: Hypothesize the Expected
Signs of the Coefficients
• Once the variables are selected, it’s important to
hypothesize the expected signs of the regression
coefficients
• Example: demand equation for a final consumption good
• First, state the demand equation as a general function:
(3.2)
• The signs above the variables indicate the hypothesized
sign of the respective regression coefficient in a linear
model
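Once an equation has been estimated, the hypothesized signs can be compared with the estimated ones. The following is a minimal sketch of that comparison. The variable names, the hypothesized signs, and the coefficient values are illustrative assumptions, not taken from Equation 3.2.

```python
def check_signs(expected, estimated):
    """Compare hypothesized coefficient signs with estimated coefficients.

    expected:  dict mapping variable name -> '+' or '-'
    estimated: dict mapping variable name -> estimated coefficient
    Returns a dict mapping variable name -> True if the sign is as hypothesized.
    """
    result = {}
    for name, sign in expected.items():
        coef = estimated[name]
        result[name] = (coef > 0) if sign == "+" else (coef < 0)
    return result


# Hypothetical demand equation: own price, price of a substitute, income.
expected = {"price": "-", "substitute_price": "+", "income": "+"}

# Made-up estimates, for illustration only.
estimated = {"price": -1.8, "substitute_price": 0.6, "income": -0.2}

report = check_signs(expected, estimated)
for name, ok in report.items():
    print(f"{name}: {'as expected' if ok else 'unexpected sign'}")
```

Writing the expected signs down before estimation, as the dictionary does here, keeps the hypothesis from being adjusted after the results are seen.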
Step 4: Collect the Data & Inspect
and Clean the Data
• A general rule regarding sample size is “the more
observations the better”
• as long as the observations are from the same general
population!
• The reason for this goes back to the notion of degrees of
freedom (first mentioned in Section 2.4)
• When there are more degrees of freedom:
• Every positive error is likely to be balanced by a negative error
(see Figure 3.2)
• The regression coefficients are estimated with greater
precision
Figure 3.1 Mathematical Fit of a
Line to Two Points
Figure 3.2 Statistical Fit of a Line
to Three Points
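The contrast between the two figures can be sketched numerically. In this minimal example (the data points are made up, and the helper name `fit_line` is an assumption), a line through two points fits exactly and leaves zero degrees of freedom, while a line through three points leaves one degree of freedom and nonzero errors that balance out.

```python
import numpy as np


def fit_line(x, y):
    """Fit y = b0 + b1*x by OLS.

    Returns the coefficients, the residuals, and the degrees of freedom
    (observations minus estimated coefficients).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    return beta, residuals, dof


# Two points: a purely mathematical fit, nothing left over to estimate error.
_, res2, dof2 = fit_line([1, 2], [2, 5])

# Three points: a statistical fit with positive and negative errors.
_, res3, dof3 = fit_line([1, 2, 3], [2, 5, 4])

print("two points:  ", dof2, "degrees of freedom, residuals", np.round(res2, 3))
print("three points:", dof3, "degrees of freedom, residuals", np.round(res3, 3))
```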
Step 4: Collect the Data & Inspect
and Clean the Data (cont.)
• Estimate model using the data in Table 2.2 to get:
• Inspecting the data—obtain a printout or plot (graph)
of the data
• Reason: to look for outliers
– An outlier is an observation that lies outside the range of the rest of
the observations
• Examples:
– Does a student have a 7.0 GPA on a 4.0 scale?
– Is consumption negative?
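Checks like the two examples above can be automated before estimation. The sketch below flags values that fall outside a plausible range. The column names, the bounds, and the sample rows are hypothetical.

```python
def flag_out_of_range(rows, bounds):
    """Return (row_index, column, value) for each value outside its range.

    rows:   list of dicts, one per observation
    bounds: dict mapping column -> (low, high); None means no limit on that side
    """
    flagged = []
    for i, row in enumerate(rows):
        for col, (low, high) in bounds.items():
            value = row[col]
            too_low = low is not None and value < low
            too_high = high is not None and value > high
            if too_low or too_high:
                flagged.append((i, col, value))
    return flagged


# Hypothetical observations, two of which contain impossible values.
rows = [
    {"gpa": 3.2, "consumption": 1450.0},
    {"gpa": 7.0, "consumption": 1310.0},   # GPA above the 4.0 scale
    {"gpa": 2.9, "consumption": -50.0},    # negative consumption
]
bounds = {"gpa": (0.0, 4.0), "consumption": (0.0, None)}

suspect = flag_out_of_range(rows, bounds)
for index, column, value in suspect:
    print(f"row {index}: {column} = {value} is outside the plausible range")
```

A flagged value is a prompt to go back to the source of the data, not an automatic reason to drop the observation.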
Step 5: Estimate and Evaluate
the Equation
• Once steps 1–4 have been completed, the estimation part
is quick
– Using EViews or Stata to estimate an OLS regression takes less
than a second!
• The evaluation part is trickier, however, and involves
answering the following questions:
– How well did the equation fit the data?
– Were the signs and magnitudes of the estimated coefficients as
expected?
• Afterwards, sensitivity analysis may be added (see Section 6.4
for details)
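The slides use EViews and Stata. As a rough equivalent, the following Python sketch estimates an OLS equation with NumPy and reports the two things the evaluation questions ask about: the fit (R-squared and adjusted R-squared) and the coefficient signs. The data are simulated and the function name `ols_fit` is an assumption.

```python
import numpy as np


def ols_fit(X, y):
    """Estimate y = b0 + X @ b by OLS; return (beta, r2, adj_r2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                           # n observations, k regressors
    Xc = np.column_stack([np.ones(n), X])    # add the intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    rss = resid @ resid                      # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()        # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    return beta, r2, adj_r2


# Simulated data with a known positive and a known negative effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

beta, r2, adj_r2 = ols_fit(X, y)
print("coefficients:", np.round(beta, 3))
print("R-squared:", round(r2, 3), "adjusted R-squared:", round(adj_r2, 3))
```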
Step 6: Document the Results
• A standard format is usually used to present estimated
regression results:
(3.3)
• The number in parentheses under each estimated coefficient
is its estimated standard error, and the t-value is the one
used to test the hypothesis that the true value of the
coefficient is different from zero (more on this later!)
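The quantities in that format can be computed directly. The sketch below is one way to obtain the standard errors and t-values for an OLS fit and print them coefficient by coefficient. The data are made up, and the function name `ols_with_t` and the print layout are assumptions rather than the format of Equation 3.3.

```python
import numpy as np


def ols_with_t(X, y):
    """Return OLS coefficients, their standard errors, and t-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])       # add the intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    s2 = (resid @ resid) / (n - k - 1)          # estimated error variance
    cov = s2 * np.linalg.inv(Xc.T @ Xc)         # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))
    t = beta / se                               # tests H0: coefficient = 0
    return beta, se, t


# Made-up data: one regressor, five observations.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

beta, se, t = ols_with_t(x, y)
for name, b, s, tv in zip(["const", "x"], beta, se, t):
    print(f"{name:>5}: {b:8.3f}  ({s:.3f})  t = {tv:.2f}")
```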
Case Study: Using Regression Analysis
to Pick Restaurant Locations
• Background:
• You have been hired to determine the best location
for the next Woody’s restaurant (a moderately priced,
24-hour, family restaurant chain)
• Objective:
• How do you decide on a location using the six basic steps of
applied regression analysis discussed earlier?
Step 1: Review the Literature and
Develop the Theoretical Model
• Background reading about the restaurant industry
• Talking to various experts within the firm
– All the chain’s restaurants are identical and located in
suburban, retail, or residential environments
– So, there is a lack of variation in potential explanatory variables
to help determine location
– The number of customers is most important for the locational decision
→ Dependent variable: number of customers (measured by
the number of checks or bills)
Step 2: Specify the Model: Independent
Variables and Functional Form
• More discussions with in-house experts
reveal three major determinants of sales:
– Number of people living near the location
– General income level of the location
– Number of direct competitors near the location
Step 2: Specify the Model: Independent
Variables and Functional Form (cont.)
• Based on this, the exact definitions of the independent
variables you decide to include are:
– N = Competition: the number of direct competitors within a
two-mile radius of the Woody’s location
– P = Population: the number of people living within a three-mile
radius of the location
– I = Income: the average household income of the population
measured in variable P
• With no reason to suspect anything other than a linear
functional form and a typical stochastic error term,
that’s what you decide to use
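Written out, a linear specification in the three variables defined above would take roughly the following form. The coefficient labels and the observation subscript i are assumed notation; Y is the dependent variable (the number of checks).

```latex
Y_i = \beta_0 + \beta_N N_i + \beta_P P_i + \beta_I I_i + \epsilon_i
```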
Step 3: Hypothesize the Expected
Signs of the Coefficients
• After talking some more with the in-house
experts and thinking some more, you
come up with the following:
(3.4)
Step 4: Collect the Data &
Inspect and Clean the Data
• You manage to obtain data on the dependent and
independent variables for all 33 Woody’s restaurants
• Next, you inspect the data
• The data quality is judged as excellent because:
• Each manager measures each variable identically
• All restaurants are included in the sample
• All information is from the same year
• The resulting data are given in Tables 3.1 and 3.3 in the
book (using EViews and Stata, respectively)
Step 5: Estimate and Evaluate
the Equation
• You take the data set and enter it into the computer
• You then run an OLS regression (after thinking the model over one
last time!)
• The resulting model is:
(3.5)
• The estimated coefficients are as expected and the fit is reasonable
• Values for N, P, and I for each potential new location are then
obtained and plugged into (3.5) to predict Y
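The prediction step can be sketched as follows. The coefficients below are round placeholders and are NOT the estimates from Equation 3.5. The candidate sites and their values for N, P, and I are also hypothetical.

```python
def predict_checks(coefs, n, p, i):
    """Predicted Y for a site with competition n, population p, income i."""
    return coefs["const"] + coefs["N"] * n + coefs["P"] * p + coefs["I"] * i


# Placeholder coefficients for illustration only (not Equation 3.5).
coefs = {"const": 100.0, "N": -10.0, "P": 0.5, "I": 2.0}

# Hypothetical candidate locations: (N, P, I) for each site.
candidates = {
    "Site A": (2, 100, 20),
    "Site B": (5, 150, 15),
    "Site C": (1, 80, 25),
}

predictions = {
    site: predict_checks(coefs, n, p, i) for site, (n, p, i) in candidates.items()
}
best = max(predictions, key=predictions.get)

for site, y_hat in sorted(predictions.items(), key=lambda kv: -kv[1]):
    print(f"{site}: predicted Y = {y_hat:.1f}")
print("highest predicted Y:", best)
```

With the real estimates in place of the placeholders, the site with the highest predicted Y would be the natural candidate for the next restaurant.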
Step 6: Document the Results
• The results summarized in Equation 3.5
meet our documentation requirements
• Hence, you decide that there’s no need to
take this step any further
Table 3.1 Data for the Woody’s Restaurants Example (Using the EViews Program)
Table 3.2 Actual Computer Output (Using the EViews Program)
Table 3.3 Data for the Woody’s Restaurants Example (Using the Stata Program)
Table 3.4 Actual Computer Output (Using the Stata Program)
Key Terms from Chapter 3
• The six steps in applied regression analysis
• Dummy variable
• Cross-sectional data set
• Specification error
• Degrees of freedom