Two-Variable Regression Analysis: Some Basic Ideas

Jamie Monogan
University of Georgia
Intermediate Political Methodology
Jamie Monogan (UGA)
Two-Variable Regression: Basic Ideas
POLS 7014
Objectives
By the end of this meeting, participants should be able to:
Define the population and sample regression models and identify the
components of each.
Classify whether a model is linear in variables and whether it is linear
in parameters.
Explain the role of the stochastic disturbance in the population
regression model and describe how this term might be interpreted.
Estimate a two-variable linear model in R & Stata and graph the
results over the raw data.
The Population Regression Model
Regression models give us the conditional expectation of Y given X,
$E(Y \mid X)$. This should be more informed than the unconditional
expected value $E(Y)$.
At the population level, if we connect all of the conditional expected
values, we obtain a population regression curve.
Generally speaking, we try to model the population regression function,
$E(Y \mid X_i) = f(X_i)$. In other words, what function can represent the
conditional expectation in the population?
Source: Gujarati & Porter 2009, 37
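As a rough illustration of the difference between $E(Y)$ and $E(Y \mid X)$, the sketch below averages Y within each value of X. The X = 80 group reuses the five outcomes quoted on the disturbance slide; every other row is invented for this example, and the name `conditional_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (X, Y) pairs. The X = 80 rows match the values quoted in these slides;
# the X = 100 and X = 120 rows are hypothetical, made up for illustration.
data = [
    (80, 55), (80, 60), (80, 65), (80, 70), (80, 75),
    (100, 65), (100, 70), (100, 80), (100, 85),
    (120, 80), (120, 90), (120, 100),
]


def conditional_means(pairs):
    """Return {x: mean of Y among observations with that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}


unconditional = mean(y for _, y in data)   # one number for everyone
conditional = conditional_means(data)      # one number per value of X

print(unconditional)
print(conditional)
```

Connecting the three conditional means traces out the regression curve for this toy data set, while the unconditional mean ignores X entirely.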
A Linear Population Regression Function
It falls on the researcher to specify the functional form of the
population regression function. (Recall: Attributes of a population are
unknown.)
A common specification is the linear population regression function:
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$.
Equivalent representation: $Y_i = \beta_1 + \beta_2 X_i + u_i$.
$\beta_1$ is the population intercept, or the conditional expectation when
$X_i = 0$.
$\beta_2$ is the population slope coefficient.
$\beta_1$ and $\beta_2$ are parameters.
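A minimal sketch of how the two parameters are read, assuming made-up values for $\beta_1$ and $\beta_2$ (true population parameters are unknown, so the numbers below are purely hypothetical, as is the function name `prf`):

```python
def prf(x, beta1, beta2):
    """Linear population regression function: E(Y | X = x)."""
    return beta1 + beta2 * x


# Hypothetical parameter values, chosen only for illustration.
BETA1, BETA2 = 25.0, 0.5

# The intercept is the conditional expectation when X = 0.
intercept = prf(0, BETA1, BETA2)

# The slope is the change in E(Y | X) for a one-unit increase in X.
slope = prf(81, BETA1, BETA2) - prf(80, BETA1, BETA2)

print(intercept)  # 25.0
print(slope)      # 0.5
```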
The Meaning of the Term Linear
A model is linear in the variables if Y is a linear function of every X
variable.
A model is linear in the parameters if each parameter is only raised to
the power 1 and is not multiplied or divided by any other parameter.
(I.e., $\beta_1^2$, $\beta_1 \times \beta_2$, and $\beta_1 / \beta_2$ are all prohibited.)
The linear regression model is linear in the parameters.
A linear regression model might be linear in the variables, or it might
not be.
Hence, the linear regression model can produce a variety of geometric
shapes.
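One way to make "linear in the parameters" concrete is a numerical spot check of superposition: if the systematic part of a model is linear in its parameter vector, then evaluating it at a weighted combination of two parameter vectors gives the same weighted combination of the two outputs. The sketch below is only a check at a single point, not a proof, and the function names and numbers are hypothetical.

```python
def quadratic_in_x(x, b):
    """b1 + b2 * x^2: non-linear in the variable, linear in the parameters."""
    return b[0] + b[1] * x ** 2


def squared_parameter(x, b):
    """b1^2 + b2 * x: linear in the variable, NOT linear in the parameters."""
    return b[0] ** 2 + b[1] * x


def is_linear_in_parameters(model, x, b, d, a=2.0, c=-3.0, tol=1e-9):
    """Spot-check model(x, a*b + c*d) == a*model(x, b) + c*model(x, d)."""
    combo = tuple(a * bi + c * di for bi, di in zip(b, d))
    lhs = model(x, combo)
    rhs = a * model(x, b) + c * model(x, d)
    return abs(lhs - rhs) < tol


# Arbitrary illustrative inputs (hypothetical values).
x0, b0, d0 = 3.0, (1.0, 2.0), (0.5, -1.0)

print(is_linear_in_parameters(quadratic_in_x, x0, b0, d0))     # True
print(is_linear_in_parameters(squared_parameter, x0, b0, d0))  # False
```

The first model curves when plotted against X yet still passes the check, which is why a linear regression model can produce non-linear shapes.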
Examples of Linear and Non-Linear Models
Which models are linear in the parameters?
Income as a Function of Age
$Y_i = \beta_1 + \beta_2 X_i + u_i$
Income as a Function of Age Squared
$Y_i = \beta_1 + \beta_2 X_i^2 + u_i$
Probability of Voting as a Function of Income (MLE)
$\Pr(Y_i = 1) = \dfrac{\exp(\beta_1 + \beta_2 X_i)}{1 + \exp(\beta_1 + \beta_2 X_i)}$
Alternate form (Generalized linear model): $\Lambda^{-1}(\Pr(Y_i = 1)) = \beta_1 + \beta_2 X_i$
Moving Average Model (Time Series)
$z_t = \theta_0 - \theta_1 a_{t-1} + a_t$
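For the voting example, the probability itself is a non-linear function of the parameters, but applying the inverse of $\Lambda$ (the logit) recovers the linear index $\beta_1 + \beta_2 X_i$. The sketch below checks that round trip numerically, assuming hypothetical parameter values and income values invented for illustration.

```python
import math


def logistic(eta):
    """Lambda(eta) = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def logit(p):
    """Inverse of the logistic function: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters and income values, for illustration only.
BETA1, BETA2 = -1.0, 0.5
incomes = [0.0, 1.0, 2.0, 3.0, 4.0]

linear_index = [BETA1 + BETA2 * x for x in incomes]
probabilities = [logistic(eta) for eta in linear_index]
recovered = [logit(p) for p in probabilities]

for x, p, r in zip(incomes, probabilities, recovered):
    print(x, round(p, 3), round(r, 3))
```

The recovered values rise by the same amount for each one-unit step in X, while the probabilities do not.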
The Role of the Disturbance Term
We model the conditional expectation, but that does not mean we
can perfectly predict the outcome.
Hence, the model of the outcome must include a disturbance term:
$u_i$ in the equation $Y_i = E(Y \mid X_i) + u_i$.
The disturbance takes on a different and unpredictable value for each
observation. For example, consider several outcomes with the same
input value (from Gujarati & Porter 2009, 40):
$Y_1 = 55 = \beta_1 + \beta_2 (80) + u_1$
$Y_2 = 60 = \beta_1 + \beta_2 (80) + u_2$
$Y_3 = 65 = \beta_1 + \beta_2 (80) + u_3$
$Y_4 = 70 = \beta_1 + \beta_2 (80) + u_4$
$Y_5 = 75 = \beta_1 + \beta_2 (80) + u_5$
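To see what the disturbances look like in this example, the sketch below assumes that the five outcomes listed above are the entire population at X = 80, so that their average is the conditional expectation. Under that assumption, each disturbance is simply the outcome minus the conditional mean.

```python
from statistics import mean

# The five outcomes at X = 80 from the example above.
y_at_80 = [55, 60, 65, 70, 75]

# Assumption: these five values are the whole population at X = 80,
# so their mean is E(Y | X = 80).
conditional_mean = mean(y_at_80)

# u_i = Y_i - E(Y | X_i)
disturbances = [y - conditional_mean for y in y_at_80]

print(conditional_mean)   # 65
print(disturbances)       # [-10, -5, 0, 5, 10]
print(sum(disturbances))  # 0
```

The disturbances differ across observations even though X is identical, and they average to zero around the conditional expectation.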
The Meaning of the Disturbance Term
What information might be captured by the disturbance term?
1. Vagueness of theory.
2. Unavailability of data.
3. Core variables versus peripheral variables. (Careful here.)
4. Intrinsic randomness in human behavior.
5. Poor proxy variables. (Poses a substantial problem.)
6. Principle of parsimony. (Occam’s razor.)
7. Wrong functional form. (Poses a substantial problem.)
The Sample Regression Model
We usually have to estimate our models with a sample from the
population.
Hence, the sample regression function is: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$.
$\hat{\beta}_1$ and $\hat{\beta}_2$ are estimators or statistics. We use these to estimate
the population parameters $\beta_1$ and $\beta_2$.
The numerical values we obtain from our estimators are called
estimates.
The estimate of $\beta_1$ is the sample intercept and the estimate of $\beta_2$ is
the sample slope coefficient.
$\hat{u}_i$ is the residual or error term: $\hat{u}_i = Y_i - \hat{Y}_i$, so that
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$.
In contrast to p. 44, we don’t really estimate $u_i$ so much as we
predict it.
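The slides leave the choice of estimator to the next chapter. As a sketch only, the code below assumes ordinary least squares as the estimator and applies its closed-form two-variable formulas to a small sample invented for illustration, then computes fitted values and residuals. The function name `ols` and all data values are hypothetical.

```python
def ols(x, y):
    """Two-variable least squares: return (intercept, slope) estimates."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1_hat, b2_hat = ols(x, y)                       # estimates
fitted = [b1_hat + b2_hat * xi for xi in x]      # Y-hat for each observation
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # u-hat for each observation

print(round(b1_hat, 3), round(b2_hat, 3))  # 2.2 0.6
print([round(r, 3) for r in residuals])
```

A different sample drawn from the same population would generally give different estimates, which is why the estimators are treated as statistics rather than fixed parameters.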
For Next Time
Read Gujarati & Porter chapter 3 (Two-Variable Regression Analysis:
The Problem of Estimation).
From Gujarati & Porter, pp. 48-49, answer questions 2.3, 2.5, & 2.13.
Open stateImmig0511.tab in R or Stata (source:
http://hdl.handle.net/1902.1/16471). Again, you may want
the file codebookStateImmig.txt to look up variable descriptions.
Using these data, please complete the following:
Write down a linear population regression model in which immigrant
policy is a function of public ideology. (Same measures as last week.)
Estimate this regression model using the data.
Report the results of your model in equation form as a sample
regression model.
Create a scatterplot with immigrant policy on the vertical axis, public
ideology on the horizontal axis, and the estimated regression line
through the data.
Write a sentence or two describing how this line compares to the line
you approximated by hand last time.
Include your code at the back of the document.