
Econ 309 Final Study-Guide

4. REGRESSION AS A BEST FITTING LINE: A tool economists use to understand the relationship between two or more variables. Particularly
useful in cases where there are many variables and complex interactions between them (ex. unemployment and interest rates, money supply,
exchange rates, inflation, etc.). A regression that involves two variables (X, Y) is considered a simple regression; multiple
regression involves many variables. XY-plots reveal a great deal about the relationship between X and Y. A straight line drawn through the
points on the XY-plot provides a convenient summary of the relationship between them (ex. Y = house price and X = lot size). The linear
relationship between them is Y = alpha + betaX, where alpha is the intercept of the line and beta is the slope. This equation is referred to as
the regression line. In the real world there are no data points that lie precisely on a straight line, so the linear regression model is only an
approximation of the true relationship. Many factors that affect Y depend on data that are not practical to collect; omitting them means
the model makes an error, which we call the error (e). With the error included, the equation
is Y = alpha + betaX + e. Y is the dependent variable, X is the explanatory variable, and alpha and beta are the coefficients. A model specifies how different
variables interact. We can treat regression as a technique for generalizing correlation and interpret the numbers the regression model
produces purely as reflecting the association between the variables. The implicit assumption of causality can be a problem and motivates the
development of new methods. Alpha(hat) and beta(hat) are estimates of the unknown true values alpha and beta. We find these
estimates by drawing the straight line through the points on an XY-plot that fits best. The error is the distance between a data point and
the true regression line. If we replace alpha and beta with their hatted versions, we get a straight line which is generally a little different from the
true regression line. The deviations from the estimated regression line are called the residuals (u) (errors and residuals are essentially the same
idea). Residuals are the vertical differences between the drawn line and the points (u1, u2, u3, ...). A good fitting line will have small residuals.
The usual way of measuring the size of the residuals is by means of the SUM OF SQUARED RESIDUALS (SSR), which is given by: SSR =
Σui^2. We want to find the best fitting line, which minimizes the sum of squared residuals. For this reason, estimates found in this way are
called ORDINARY LEAST SQUARES (OLS) estimates. INTERPRETING OLS ESTIMATES: Beta(hat) is the slope of the best fitting straight line through the XY-plot. If beta(hat) is positive, X and Y are positively correlated. Beta(hat) can also be interpreted as the marginal effect of X on Y and is a measure of how
much X influences Y, or how much Y tends to change when X is changed by one unit. Unusual (large) observations are called outliers. FITTED
VALUES & R^2: The most common measure of fit is referred to as R^2 (for the simple regression model it is the correlation squared, but not for the
multiple regression model). The fitted line does not pass precisely through each point on a plot (i.e. for each data point an error is made). The
fitted value for observation i is the value that lies on the regression line corresponding to the Xi value for that particular observation. If you
draw a straight vertical line through a particular point in the XY-plot, the intersection between this vertical line and the regression line is the
fitted value corresponding to the point that you chose. Adding an i subscript indicates that we are referring to a particular observation. By
looking at actual Yi and fitted Y(hat)i, we can gain a rough impression of the "goodness of fit" of the regression model. This helps measure how well
the regression model fits and allows you to examine individual observations to determine which ones are close to the regression line and which
ones are not. The difference between the actual and fitted value of Y is another way to express the residual (ui = Yi – Y(hat)i). Sometimes big
outliers are of interest for the information they carry. A concept closely related to R^2 is the TOTAL SUM OF SQUARES (TSS): TSS = Σ(Yi – Y(bar))^2.
TSS measures the total variability of Y (ignoring the explanatory variable X). The total variability of Y can be broken down into two parts:
TSS = RSS + SSR, where RSS = Σ(Y(hat)i – Y(bar))^2 is the REGRESSION SUM OF SQUARES. SSR is the sum of squared residuals, and a good fitting regression model will make the SSR
very small. Combining these equations yields a measure of fit: R^2 = 1 – (SSR/TSS), or equivalently, R^2 = RSS/TSS. RSS, SSR, AND TSS ARE ALL
POSITIVE (TSS ≥ RSS and TSS ≥ SSR). This means that 0 ≤ R^2 ≤ 1. A regression line that fits all data points perfectly in the XY-plot will have no
errors and hence SSR = 0 and R^2 = 1. In summary, high values of R^2 imply a good fit and low values a bad fit. If RSS is near TSS, that means
the fit is good because the regression accounts for almost all the variability in the dependent variable.
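The quantities above can be sketched in code. A minimal Python example (the lot-size/house-price numbers are invented for illustration, not from the text) that computes the OLS estimates, residuals, SSR, RSS, TSS, and R^2 directly from the formulas:

```python
import numpy as np

X = np.array([5000., 6000., 7000., 8000., 9000.])   # lot size (sq ft)
Y = np.array([150., 160., 175., 180., 200.])        # house price ($ thousands)

# OLS formulas: beta(hat) = sum((Xi-Xbar)(Yi-Ybar)) / sum((Xi-Xbar)^2),
#               alpha(hat) = Ybar - beta(hat)*Xbar
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()

Y_fit = alpha_hat + beta_hat * X         # fitted values Y(hat)i
u = Y - Y_fit                            # residuals ui = Yi - Y(hat)i
SSR = np.sum(u ** 2)                     # sum of squared residuals
TSS = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
RSS = np.sum((Y_fit - Y.mean()) ** 2)    # regression sum of squares

R2 = 1 - SSR / TSS                       # equals RSS/TSS since TSS = RSS + SSR
print(beta_hat, alpha_hat, R2)
```

Note that TSS = RSS + SSR holds exactly for the OLS line, which is why the two expressions for R^2 agree.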
5. STATISTICAL ASPECTS OF REGRESSION: You can think of a point estimate as your best guess at what beta is. We can obtain confidence
intervals of different levels; for example, a 95% CI says that "we are 95% confident that beta lies in the interval." The degree of confidence is referred to
as the confidence level. WHICH FACTORS AFFECT THE ACCURACY OF THE ESTIMATE BETA(HAT)?: A line fitted through points that are narrow and bunched together is the
most accurate. 1. Having more data points improves the accuracy of estimation. 2. Having smaller errors improves the accuracy of
estimation. 3. Having a larger spread of values (i.e. a larger variance) of the explanatory variable (X) improves the accuracy of estimation. You
want your data to be diverse and cover a broad spectrum, so you want X to have a high variance. Having a large spread of values (i.e. a
larger variance) for the error (e), however, is not desirable. CALCULATING A CONFIDENCE INTERVAL FOR BETA: The factors mentioned before feed into
the interval estimate for beta: the confidence interval, [beta(hat) – tb*sb, beta(hat) + tb*sb], where tb is a critical value from the Student-t distribution. sb is the standard deviation of beta(hat) and is often referred to as the standard error.
Note that the more confident you wish to be about your interval, the wider it becomes. For example, a 99% confidence interval is wider than a
95% confidence interval. tb decreases with N (the more data points you have, the smaller the CI). tb increases with the level of confidence
you choose. TESTING WHETHER BETA = 0: One way to test whether beta = 0 is to look at the CI and see whether it contains 0. If you use the CI
approach to hypothesis testing, then your significance level is 100% minus your confidence level (so if the CI is 95%, you can say "I reject the hypothesis that beta = 0 at the 5%
level of significance"). The alternative way of carrying out hypothesis testing is to calculate the test statistic (t-stat): t = beta(hat)/sb. The P-value
provides a direct measure of whether t is "large" or "small".
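These calculations can be sketched on simulated data. The true coefficients below are invented, and 1.96 is used as the approximate large-N 95% critical value rather than an exact Student-t value:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
X = rng.uniform(0, 10, N)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, N)   # true alpha = 2, true beta = 0.5

# OLS point estimates
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
u = Y - (alpha_hat + beta_hat * X)        # residuals

s2 = np.sum(u ** 2) / (N - 2)             # estimate of the error variance
sb = np.sqrt(s2 / np.sum((X - X.mean()) ** 2))   # standard error of beta(hat)

t_stat = beta_hat / sb                    # t-stat for testing H0: beta = 0
tc = 1.96                                 # approximate 95% critical value (large N)
ci = (beta_hat - tc * sb, beta_hat + tc * sb)    # 95% confidence interval
print(t_stat, ci)
```

Here the CI excludes 0 and the t-stat is far above the critical value, so the hypothesis beta = 0 is rejected at the 5% level of significance.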
6. MULTIPLE REGRESSION: Not much changes in the statistical techniques between multiple and simple regression. Since multiple
regression implies the existence of more than two variables, we cannot draw an XY-plot on a two-dimensional graph. If we have three
explanatory variables, multiple regression involves fitting a line through a four-dimensional graph in which Y is plotted on the 1st axis, X1
on the 2nd, X2 on the 3rd, and X3 on the 4th. OLS ESTIMATION AS A BEST FITTING LINE: The multiple regression model with k explanatory
variables is written:
Y = alpha + beta1X1 + beta2X2 + ... + betakXk + e. The SSR is: SSR = Σ(Yi – alpha(hat) – beta1(hat)X1i – ... – betak(hat)Xki)^2. STATISTICAL ASPECTS OF
MULTIPLE REGRESSION: The statistical aspects of multiple regression are essentially identical to the simple regression case. INTERPRETING
CONTEXT: For the bedroom example, in a simple regression model you only look at the number of bedrooms to figure out the house price.
Most of the time, though, there are other variables in play, and with multiple regression analysis we can include those. For example,
bathrooms might matter more to the buyer than bedrooms. The simple regression combines the contribution of all these factors and allocates
it to the only explanatory variable it can: bedrooms. Hence, beta(hat) is very big. The multiple regression model allows us to disentangle the
individual contributions of however many explanatory variables are assumed to affect the house price. OMITTED VARIABLE BIAS: Omitted
variable bias affects the results. If we omit explanatory variables that should be present in the regression, and if these omitted variables are
correlated with the explanatory variables that are included, then the coefficients on the included variables will be wrong. You will almost
always have some omitted variable bias, and there is little that can be done about it.
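Omitted variable bias can be demonstrated on simulated data (all numbers below are invented): X2 affects Y and is correlated with X1, so dropping X2 from the regression inflates the coefficient on X1.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
X1 = rng.normal(0, 1, N)
X2 = 0.8 * X1 + rng.normal(0, 0.5, N)     # X2 is correlated with X1
Y = 1.0 + 2.0 * X1 + 3.0 * X2 + rng.normal(0, 1, N)   # true effects: 2 and 3

# Full model: regress Y on a constant, X1, and X2 via least squares
Z_full = np.column_stack([np.ones(N), X1, X2])
b_full, *_ = np.linalg.lstsq(Z_full, Y, rcond=None)

# Omit X2: the X1 coefficient absorbs X2's effect (roughly 2 + 3*0.8 = 4.4)
Z_short = np.column_stack([np.ones(N), X1])
b_short, *_ = np.linalg.lstsq(Z_short, Y, rcond=None)

print(b_full[1], b_short[1])   # full-model slope near 2; short-model slope biased upward
```

The full regression recovers coefficients close to the true values, while the short regression's coefficient on X1 is badly biased; this is exactly the "allocates it to the only explanatory variable it can" problem described above.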
7. REGRESSION WITH DUMMY VARIABLES: Dummy variables are a way of turning qualitative variables into quantitative variables. Once this
change occurs, we can continue to use the formulas from the previous chapters. Formally, a dummy variable is a variable that can take on
only two values: 0 and 1. SIMPLE REGRESSION WITH A DUMMY VARIABLE: The one-dummy-variable regression model is: Y = alpha + betaD + e. If we
carry out OLS estimation of the above regression model, we obtain alpha(hat) and beta(hat). The straight-line relationship between Y and
D gives a fitted value for the i-th observation of: Y(hat)i = alpha(hat) + beta(hat)Di. Since Di is either 0 or 1, Y(hat)i = alpha(hat) or Y(hat)i =
alpha(hat) + beta(hat). MULTIPLE REGRESSION WITH DUMMY VARIABLES: The multiple regression model with several dummy explanatory
variables works in the same way. In practice, you may have a mix of different types of explanatory variables. The simplest such case is where there is one dummy variable (D) and
one quantitative explanatory variable (X) in a regression: Y = alpha + beta1D + beta2X + e. We can extend this to multiple dummy variables and non-dummy explanatory variables. The following example has two dummy and two non-dummy explanatory variables: Y = alpha + beta1D1 + beta2D2 +
beta3X1 + beta4X2 + e. The interpretation of results from this regression model combines elements from the previous examples in this chapter.
CALCULATING A STANDARD DEVIATION:
1. Work out the mean (the simple average of the numbers)
2. Then for each number: subtract the Mean and square the result
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!
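The four steps above compute the (population) standard deviation; a direct sketch in Python, with made-up numbers:

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)                  # step 1: the mean
sq_diffs = [(x - mean) ** 2 for x in data]    # step 2: squared deviations
variance = sum(sq_diffs) / len(sq_diffs)      # step 3: mean of squared deviations
std_dev = math.sqrt(variance)                 # step 4: square root
print(std_dev)   # → 2.0
```

(Dividing by len(data) gives the population standard deviation; sample-based formulas divide by N - 1 instead.)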
The residual (u1) measures the difference between the predicted value of the dependent variable and the mean value of the dependent variable (Y(bar))? -
FALSE: It is the difference between Y1 and Y(hat)1.
The correlation coefficient should be between –1 and 1? - TRUE
The estimated intercept in a regression model must always be positive? - FALSE: It can be a negative number or 0.
The estimated regression line is obtained by finding the values of alpha(hat) and beta(hat) that minimize the sum of the residuals?
FALSE: alpha(hat) and beta(hat) minimize the sum of the squared residuals.
Why would someone want to include a dummy variable in a regression? - We might have qualitative factors whose effects on the dependent
variable we want to explore; a dummy variable lets us include them in the regression model.