Quantitative Methods

Bivariate Regression (OLS)
- We'll start with OLS regression. Stands for Ordinary Least Squares regression.
- Relatively basic multivariate regression.
- Purpose: Prediction? Explanation? Assessing the effects of various independent variables on a dependent variable.
- Limits: Consider the construction of the dependent variable. There are more appropriate methods to predict or explain the number of things that happen, when something happens, the likelihood that something will happen, and so on.
- (But robustness is a plus, and the discussion can serve as a foundation for other methods.)

Bivariate Regression (OLS)
- How many pieces of information do we need to define a line? Look at Figure A.
- Slope? Intercept?
- Different ways to interpret "slope".

Bivariate Regression (OLS)
- Figure A is a "perfect linear relationship". Is this likely what our data look like?
- Error....

Perfect Linear Relationships, and Error
- What about Figure B?
- So, in OLS regression, we "fit" a line to the data.
- What is the equation for the line represented in Figure A?
- How do we indicate that there is some error?

Notation
- A detour about standard notation....
- Predicted versus population or "true" slope, intercept, and error.
- Apply this to Figure C.

Fitting a Line
- So, we fit a line to our data in order to predict, and to explain / assess the effects of different variables....
- But how can we decide which "line" represents the best fit?

Fitting a Line
- Minimize sample errors (see Figures D1 and D2).
- Minimize absolute errors (see Figures D3 and D4).
- Minimize squared errors.
- (We'll talk next week about estimating a "principal components line", which is what can be used in other multivariate methods.)

OLS Assumptions
- Violating any OLS assumption generally has one of two consequences:
- Biased estimates; what's important here is to begin to understand what is meant by "bias".
- Incorrect confidence intervals (standard errors that are inflated or deflated).
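The "minimize squared errors" criterion above has a closed-form solution in the bivariate case. The sketch below (a hypothetical toy example, not from the lecture) computes the least-squares slope and intercept directly from the textbook formulas and checks them against numpy's built-in fit:

```python
import numpy as np

# Hypothetical toy data: a true line y = 2 + 0.5x plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

# Closed-form bivariate OLS: the slope b and intercept a that
# minimize the sum of squared errors sum((y - a - b*x)**2).
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Cross-check against numpy's degree-1 polynomial fit.
b_check, a_check = np.polyfit(x, y, 1)

print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

With noisy data the estimates land near, but not exactly on, the true intercept (2.0) and slope (0.5): that gap is the sampling error that the standard-error and confidence-interval machinery quantifies.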
- In other words, you are more or less sure of your results than you otherwise would (or "should") be.

OLS Assumptions
- Measurement error: consequences.
- Specification error.
- 1. Linear relationship between X and Y. See Graphs E1 and E2. What will happen if we fit a line to those data?
- Non-linear relationships are very common, and can be easily addressed.

OLS Assumptions
- 2. Include all relevant variables in the model.
- 3. Do not include irrelevant variables in the model. (Think of this as a degrees-of-freedom or information issue, but it's also just useful to have parsimonious models.)
- (But how do we decide what variables to include in our model?)

OLS Assumptions: Error Term Assumptions
- The error terms should average out to zero, in the long run.
- Note that the average residual value will always be zero, as an artifact of OLS regression calculations.
- The variance of the error terms is constant across observations. That is, we have homoskedastic errors; we do not have heteroskedasticity. See Graphs F1 through F4.
- (Note, however, that what appears to be heteroskedasticity could be specification error.)

OLS Assumptions: Error Term Assumptions
- Any one residual is not correlated with any other residual. That is, our error terms are not autocorrelated. When is this most common?
- And we assume that the error term is uncorrelated with each independent variable. (Again, this often reflects specification error.)
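The distinction above between the unobserved error terms and the observed residuals can be shown numerically. In this sketch (hypothetical data, with deliberately heteroskedastic noise), the fitted residuals still sum to zero and are uncorrelated with X, because those are algebraic artifacts of the OLS calculation, true whether or not the population error-term assumptions hold:

```python
import numpy as np

# Hypothetical data whose noise variance grows with x
# (i.e., the homoskedasticity assumption is violated).
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y = 1.0 + 0.8 * x + rng.normal(0, 0.3 * x)

# Bivariate OLS fit (closed form).
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

# Artifacts of the OLS calculation: the residuals average to zero
# and are orthogonal to the regressor, by construction.
print(residuals.mean())        # numerically ~0
print(np.dot(residuals, x))    # numerically ~0
```

So a residual mean of zero tells us nothing about whether the error-term assumptions are satisfied; diagnosing heteroskedasticity requires looking at the spread of the residuals across values of X, as in Graphs F1 through F4.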