Lecture 7: Introduction to regression analysis
BUEC 333, Professor David Jacks

Where we have been and where we are going

The first week focused our attention on a single RV X and its relation to population quantities like μ. We also considered the critical difference between samples and populations; in particular: 1.) how to compute sample statistics to estimate population quantities; 2.) the properties of estimators with respect to their bias, efficiency, and consistency; 3.) how to form and test hypotheses about the population parameter of interest.

The second week expanded our attention to two or more RVs, our baseline case. We generally care about the relationship between or among variables; for example, the role of the VIX index in driving changes in LIBOR. We can also use one or more variables to predict a variable of interest; for example, variables from monthly purchasing managers’ surveys to forecast quarterly inventories and, thus, GDP.

We can get a lot of mileage from studying the conditional expectation of a variable of interest, Y, given values of other variables, X. The rest of the term will be spent learning about regression analysis based on E(Y|X), a very powerful tool for analyzing economic data.

What is regression analysis?

Regression analysis is the foundation for almost all econometric techniques used to measure and explain relationships between variables. Example: casual observation will reveal that more educated individuals tend to have higher incomes. Regression methods can be used to measure the rate of return to an extra year of education and/or estimate the relationship between income and education, gender, labour market experience, etc.

An Econ 101 example: theory tells us that if the price of a good increases, individuals will consume less of it; that is, demand curves slope down. But theory does not predict how big the change in consumption will be for a given price change. We can use regression analysis to measure how much individuals reduce their consumption in response to a price increase; in other words, to estimate the price elasticity of demand.

The regression model

The goal of regression analysis is to explain the value of one variable of interest (the dependent variable) as a function of the values of other variables (the independent or explanatory variables). Usually, the dependent variable is denoted Y and the independent variables are X1, X2, X3, ... Often we want to explain “movement” in Y as a function of “movement” in the X variables.

We always use (at least) one equation to specify the relationship between Y and the X variables. This equation is known as the regression model:

E(Y|X1, X2, X3) = f(X1, X2, X3)

In the example from before: Y is income, X1 is years of education, X2 is gender, X3 is years of experience, and f is some function.

Simple linear regression

The simplest example of a regression model is the case where the regression function f is linear and where there is only one X:

E[Y|X] = β0 + β1X

This says that the conditional expected value of Y is a linear function of the independent variable X. We call β0 and β1 the regression coefficients.

β0 is called the intercept or constant term; it tells us the expected value of Y when X is zero. β1 is the slope coefficient; it measures the amount that the expected value of Y changes for a unit change in X; that is, it is simply the slope of the line relating X and E[Y|X].

A word on linearity

There are two kinds of linearity present in the regression model from the previous slide, E[Y|X] = β0 + β1X. First, this regression function is linear in X; a counter-example is E[Y|X] = β0 + β1X². Second, this regression function is linear in the coefficients, β0 and β1; a counter-example is E[Y|X] = β0 + X^β1.

In general, neither kind of linearity is necessary. However, we will focus our attention mostly on what is known as the linear regression model. The linear regression model requires linearity in the coefficients, but not linearity in X. So for our purposes, E[Y|X] = β0 + β1X² is OK.
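To make the distinction concrete, here is a minimal Python sketch of the two kinds of linearity. The coefficient values and the grid of X values are made up for illustration, not taken from the lecture. The point is that the first function can still be written as known "features" times coefficients, which is all the linear regression model requires, while the second cannot.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 2.0, 0.5

x = np.linspace(1.0, 5.0, 9)

# Linear in the coefficients but not in X: E[Y|X] = b0 + b1 * X**2.
# It is a dot product of the coefficient vector (b0, b1) with the
# known feature vector (1, X**2).
features = np.column_stack([np.ones_like(x), x**2])
ey_linear_in_coefs = features @ np.array([b0, b1])

# Not linear in the coefficients: E[Y|X] = b0 + X**b1.
# No rearrangement expresses this as coefficients times known features.
ey_nonlinear_in_coefs = b0 + x**b1

print(ey_linear_in_coefs)
print(ey_nonlinear_in_coefs)
```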
The stochastic error term

We must recognize that the regression function is never an exact representation of the relationship between the dependent and independent variables. For example, there is no exact relationship between income (Y) and education, gender, experience… because of things like luck. That is, there is always some variation in Y that we cannot explain.

There are many possible reasons: important explanatory variables left out of the model; the wrong functional form (f); variables measured with error; or maybe just some randomness in outcomes. These are all sources of error. To reflect these kinds of error, we include a stochastic (random) error term in the model.

More about the error term

After adding the error term, our simple linear regression model is:

Y = β0 + β1X + ε

The regression model now has two components: 1.) a deterministic (non-random) component, β0 + β1X; 2.) a stochastic (random) component, ε.

The right way to think about it: β0 + β1X is the conditional mean of Y given X. That is, Y = E(Y|X) + ε where E(Y|X) = β0 + β1X. Remember, “regression function” = E(Y|X). This gives us another way to think about errors: ε = Y − E(Y|X).

A further example

Think about starting salaries for new university graduates (Y). There is a lot of variation in starting salaries among individuals. Some of this variation is predictable; starting salary depends on university, field of study, industry, firm size... Call all of this X.

The predictable part of starting salary goes into the deterministic component of the regression: E(Y|X). There is no need to impose that X enters linearly, but we will require E(Y|X) to be linear in the β’s. We choose the specific functional form of E(Y|X) when we build the model.

Starting salary also depends on luck, nepotism, interview skill, and a host of other unobservables. We cannot measure these things, so we cannot include them in E(Y|X). Thus, the unpredictable/unobservable part ends up in the error term, ε.

Notes on notation

We need to extend our notation of the regression function to accurately reflect the number of observations. As usual, we will work with an iid random sample of n observations. If we use the subscript i to indicate a particular observation in our sample, our regression function with one independent variable is:

Yi = β0 + β1Xi + εi

What we really have are n equations, one for each observation:

Y1 = β0 + β1X1 + ε1
Y2 = β0 + β1X2 + ε2
...
Yn = β0 + β1Xn + εn

Note: the coefficients β0 and β1 are the same in each equation.

Further notes on notation

If we have more (say k) independent variables, then we need to extend our notation further. We could use a different letter for each independent variable (X, Z, W, ...); instead, we usually just introduce another subscript on the X. Now, we have two subscripts on the X: one for the variable number (first subscript) and one for the observation number (second subscript).

Before, we said β1 was the marginal effect of X on Y:

β1 = dY/dX = dE[Y|X]/dX

What do the regression coefficients measure now? They are partial derivatives. That is,

β1 = ∂Yi/∂X1i, β2 = ∂Yi/∂X2i, ..., βk = ∂Yi/∂Xki

so β1 measures the effect on Yi of a one-unit increase in X1i, holding the other X’s constant.
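A short sketch of this partial-derivative interpretation, with hypothetical coefficient values (b0, b1, b2 are assumptions, not lecture numbers): bumping one regressor by one unit while holding the other fixed moves the regression function by exactly that regressor's coefficient.

```python
# Hypothetical population coefficients, chosen only for illustration.
b0, b1, b2 = 1.0, 3.0, -2.0

def e_y(x1, x2):
    """Regression function E[Y | X1, X2] = b0 + b1*X1 + b2*X2."""
    return b0 + b1 * x1 + b2 * x2

# A one-unit increase in X1, holding X2 constant, moves E[Y|X] by b1;
# likewise for X2 and b2.
x1, x2 = 4.0, 7.0
print(e_y(x1 + 1, x2) - e_y(x1, x2))  # 3.0, i.e. b1
print(e_y(x1, x2 + 1) - e_y(x1, x2))  # -2.0, i.e. b2
```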
That which is known, unknown, and assumed

A summary of what is known, unknown, and assumed/hypothesized:
Known: the data, Yi and X1i, X2i, ..., Xki.
Unknown: the coefficients and errors, β0, β1, β2, ..., βk and εi.
Hypothesized: the functional form of the regression, E(Yi|Xi) = β0 + β1X1i + β2X2i + ... + βkXki.

We want to use what is known to learn about the unknown using the hypothesized functional form of the regression function. We might learn a lot about the β’s because they are the same for each observation. We might also learn about the functional form.

Estimated regression coefficients

Think of the regression function we have developed so far as the population regression function. As always, we (have to) collect a sample of data to learn about the population. We do not know the population regression coefficients (β), so we must estimate them. The details of how these are estimated are the subject of next week’s lecture; for now, we need to develop the intuition of what they really are.

The goal is to end up with a set of estimated coefficients: β̂0, β̂1, β̂2, ..., β̂k. The estimated coefficients are sample statistics that we compute from our data. Because they are sample statistics, they are RVs: a different sample contains different observations, different observations take different values, and so the value of the statistic would be different.

Predicted values and residuals

With the estimated regression coefficients in hand, we can calculate the predicted value of Yi. This predicted value is a sample estimate of the conditional expectation of Yi given all the X’s:

Ŷi = β̂0 + β̂1X1i + β̂2X2i + ... + β̂kXki

It is our “best guess” of the value of Yi given all the X’s.

Of course, Yi and its predicted value are rarely equal. We call the difference between Yi and its predicted value the residual, ei = Yi − Ŷi. We can then write the estimated regression function as:

Yi = β̂0 + β̂1X1i + β̂2X2i + ... + β̂kXki + ei

Errors and residuals

One big lesson: the (unknown) stochastic error term, εi = Yi − E(Yi|Xi), is a population quantity, as it depends on the population quantities, β. Residuals are the sample counterpart to εi. Luckily, they are computable, but they are RVs themselves.
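A minimal sketch of these mechanics, assuming some toy data and estimates (none of the numbers below come from the lecture): predicted values come from plugging the X's into the estimated regression function, and residuals are what is left over.

```python
import numpy as np

# Toy data and estimated coefficients; every value here is assumed
# for illustration only.
y = np.array([10.0, 12.5, 9.0, 15.0])
x = np.array([1.0, 2.0, 1.5, 3.0])
b0_hat, b1_hat = 6.0, 2.8  # sample estimates of beta0 and beta1

# Predicted values: our best guess of each Y_i given X_i.
y_hat = b0_hat + b1_hat * x

# Residuals: the computable, sample counterpart of the unobservable errors.
e = y - y_hat
print(y_hat)  # [ 8.8 11.6 10.2 14.4]
print(e)      # [ 1.2  0.9 -1.2  0.6]
```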
A first example

Here, we expand on the example of the weight-guessing job given in the book (pp. 18-20). Recap: customers pay $2 each, which you get to keep if you guess their weight within 10 pounds. If you miss by more than 10 pounds, you return the $2 and give them a prize which costs $3. The only information given to you is their height, but you know there is a positive relationship between height and weight.

Time to get your regression on… Initially, you have a collection of the heights of 20 males. Going one step further, we have a scatter-plot of the 20 (male) observations with weight in pounds on the vertical axis and height above five feet in inches on the horizontal axis.

Estimating the regression function Ŷi = β̂0 + β̂1X1i on these data yields:

Ŵi = 103.40 + 6.38Hi

This suggests that our best guess of the weight of someone who is five feet tall (Hi = 0) is 103.40 pounds. Likewise, our best guess of the weight of someone who is six feet tall is 103.40 + 6.38*(72 − 60) = 179.96 pounds.

Again, our predicted values lie on the estimated regression line by construction. When we plot the residuals, the three instances where the estimate is off by more than 10 pounds are clear.

Now suppose you collected data on the heights and weights of 20 different male customers and estimated the following:

Ŵi = 125.1 + 4.03Hi

The coefficients are not the same as before because the sample is different; when the sample changes, so will the estimates. The first regression has a steeper slope while the second has a higher intercept; the lines intersect at only one point, where H = 9.23 inches and Ŵ = 162.3 pounds.

Finally, suppose you could run a regression on the whole population…
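The numbers on these slides can be checked with a short sketch. The heights and weights below are placeholders, since the 20 actual observations are not reproduced on the slides, but the two fitted lines and the intersection calculation use the lecture's reported coefficients.

```python
import numpy as np

# Heights (inches above five feet) and weights (pounds) for a handful
# of hypothetical customers; these data points are assumptions.
h = np.array([2.0, 5.0, 8.0, 11.0, 14.0])
w = np.array([120.0, 130.0, 160.0, 185.0, 190.0])

# The two fitted lines reported on the slides.
w_hat_1 = 103.40 + 6.38 * h  # first sample of 20 males
w_hat_2 = 125.10 + 4.03 * h  # second sample of 20 males

# A guess "pays off" only when the residual is within 10 pounds.
misses = np.abs(w - w_hat_1) > 10
print(misses.sum(), "guess(es) off by more than 10 pounds")

# The two lines cross where 103.40 + 6.38*H = 125.10 + 4.03*H.
h_star = (125.10 - 103.40) / (6.38 - 4.03)
print(h_star)                  # about 9.23 inches
print(103.40 + 6.38 * h_star)  # about 162.3 pounds
```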
A further example

Also remember: if an equation has more than one independent variable, we have to be careful when interpreting the regression coefficients. For example, consider the amount of spending per public school student across Canada. Any regression for this variable should include at least two variables: income in a province (since this is the level of government where school funding is determined) and enrollment growth:

Si = β0 + β1Yi + β2Gi + εi

where
Si = educational dollars spent per public school student in province i
Yi = per capita income in province i
Gi = the percent growth of public school enrollment in province i

We should always think not only about what the coefficients tell us (given the units of measurement) but also about their expected signs.

First, β1 is the change in dollars spent per public school student associated with a one-unit increase in provincial income, holding the percent growth of provincial public school enrollment constant. Likewise, β2 is the change in dollars spent per student associated with a one-unit increase in the percent growth of provincial public school enrollment, holding provincial income constant.

And what about their expected signs? β1 should be positive, since the more income a province has, the more it probably spends on schools (if enrollment growth is held constant). But β2 should be negative, since the faster enrollment is growing, the less there will be to spend on each student (if income is held constant).

A final example

Suppose we estimate a regression model that predicts a student’s grade in BUEC 333 (Y) as a function of the number of hours per week they spend studying (X1) and working off-campus (X2). The regression model is given by:

Yi = β0 + β1X1i + β2X2i + εi for i = 1, 2, ..., n

Before estimation, consider what your expectations on the coefficients of X1 and X2 are…

You then estimate this model on a random sample of 100 BUEC students and obtain the following estimates:

β̂0 = 45, β̂1 = 2.5, β̂2 = −1

What do these estimates tell us?
1.) The expected grade of a student who studies zero hours per week and works zero hours per week off-campus is 45.
2.) A student’s expected grade increases by 2.5 points for each additional hour that they study, holding constant the number of hours they work off-campus.
3.) A student’s expected grade decreases by 1.0 point for each additional hour that they work off-campus, holding constant the number of hours they study.

Given these estimates, we can predict the grade of a student who studies 12 hours per week and works 10 hours per week off-campus as:

Ŷ = 45 + 2.5(12) − 1(10) = 65
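The arithmetic of this final prediction can be confirmed in a few lines of Python, using the estimates reported above.

```python
# Estimated coefficients from the final example on the slides.
b0_hat, b1_hat, b2_hat = 45.0, 2.5, -1.0

study_hours, work_hours = 12.0, 10.0

# Predicted grade: 45 + 2.5*12 - 1*10 = 65.
grade_hat = b0_hat + b1_hat * study_hours + b2_hat * work_hours
print(grade_hat)  # 65.0
```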