Lecture 10: Regression analysis in practice
BUEC 333
Professor David Jacks

Regression is “easy”

The lectures of the past two weeks may give you the impression that regression analysis is simply the mechanical application of a few formulas. Since computers do the dirty work for us, all you are required to do is tell them what your model is, locate some data, and push the button. Thus, it is easy to think that regression is easy and requires little thought.

At the same time, making money in financial markets is easy as well: “buy low, sell high”. The art or craft of regression lies in developing a good model to address an interesting question, finding convincing data to estimate it with, and providing plausible interpretations of your results…things to keep in mind when vegging out in front of a computer.

The six steps

There are six basic steps to regression analysis in real world applications:
1.) Review the literature and develop a model to explain your dependent variable.
2.) Specify the regression equation by choosing the independent variables and functional form.
3.) Form expectations about the signs of the coefficients.
4.) Collect and clean the data.
5.) Estimate and evaluate the regression.
6.) Document the results.

Some of these require a little elaboration; some decidedly do not…

1.) Developing a model

Again, regression is dominant in empirical work in economics and is used to answer specific questions.

These questions could be related to measurement; e.g. what is the price elasticity of demand for cigarettes, or how large is the gender wage gap?

Or these questions could be more subjective; e.g. is there a role for government in reducing the duration of unemployment, or is buying shares of Apple a good investment?

In either case, the first step is to educate yourself about the problem at hand:
a.) Does economic intuition or theory have anything to say about my question and does it suggest a particular regression model?
b.) What have other researchers done?
For example, Mincer (1974) borrowed from a theory of human capital used to explain the sources of economic growth. His application of this theory related to the formation of wages in perfect competition:

$\ln(W_i) = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2$

where $W_i$ = wages, $S_i$ = years of education, and $X_i$ = years of experience. This is the basis of almost all studies of education and wages.

2.) Model specification

With any luck, economic theory provides guidance on an appropriate specification. In the Mincer model, the dependent variable should be measured in logarithms, education should enter linearly, and experience should enter linearly and as a square.

You need to ensure that all variables that are important for explaining your dependent variable are included. If you leave out important variables, there is a real risk that OLS estimates of the β’s will be biased. The intuition: if the included and the omitted variables are correlated, we may over- or under-estimate the importance of the included variables.

Again, using theory as a guide is helpful, but intuition can be just as valuable…even without Mincer’s theory…

You should also exclude variables that are not important for explaining your dependent variable. Remember: we like concise models and will be penalized for…

Again, theory hopefully guides us here, but intuition is just as important…we need no stinking theory…

The process of model specification involves choosing three (separate but related) components:
a.) the independent variables and how they should be measured;
b.) the functional (mathematical) form of the variables;
c.) the properties of the stochastic error term.

3.) Forming expectations

Before going any further, you should think about what kind of results you expect from your model. In particular, which coefficients do you expect to be positive or negative?

In the Mincer model, you should expect $\beta_1 > 0$, $\beta_2 > 0$, and $\beta_3 < 0$.

Whether or not you are surprised by any coefficient estimates…

4.) Collecting and cleaning data

Data comes in two varieties: hand-made and off-the-shelf. For most purposes, you will likely collect it off-the-shelf from sources like StatCan, the TSX, Datastream….

In either case (but especially for hand-made), you must clean your data.

[Table: “data” inspired by the Woody’s example from Chapter 3.2; Y = gross sales volume, N = number of competitors, …]

A general rule regarding the appropriate sample size is “the more, the merrier”…provided the observations are from the same population.

This relates back to the notion of degrees of freedom; when there are more degrees of freedom:
a.) every positive error is likely to be balanced by a negative error;
b.) the estimated regression coefficients are…

5.) & 6.) Estimation, evaluation, and documentation

We understand the mechanics of estimation and know how to let the computer do its thing; estimation is easy after the first four steps…but the first four steps can take a long, long time!

Next, we need to decide whether we are happy with the output:
a.) do the coefficients have the expected sign?
b.) is the $R^2$ acceptable?

[Figure: regression line (observed versus predicted values) on top and plot of residuals on bottom, from the height-weight guessing game in Chapter 1.]

Whenever there is a large residual, we call the particular observation an outlier. Questions to ask: Are there many outliers (bad) or few (better)? Can we modify the model to reduce the number of outliers (i.e. improve the fit)?

When you are happy with a model, always report:
a.) coefficient estimates and their standard errors
b.) sample size, $R^2$, and adjusted $R^2$
c.) qualitative information on data & sampling

An extended example

“Modern” portfolio theory (MPT) is the backbone of a tremendous amount of financial analysis. Its roots lie in the 1950s and 1960s with the work of Markowitz and others in defining the efficient frontier of investment options. Basic framework: positive correlation between risk and return.

It is also an example where economic theory suggests a very specific regression model, one used daily in financial institutions worldwide. This model is called the Capital Asset Pricing Model (or CAPM).

The driving idea behind CAPM is that any asset’s risk can be decomposed into two sub-types of risk:
1.) systematic risk
2.) specific risk

Systematic risk is common to all assets: it is the risk you would face if you held some of every asset…hence it is also called “market risk”.

Specific risk is particular to an individual asset: it is that part of an asset’s risk that is unrelated to the broader risk of the market.

MPT: investors are compensated for taking on systematic risk…

Let:
$Z_s$ denote the rate of return on asset s.
$Z_m$ denote the market rate of return.
$Z_f$ denote the risk-free rate of return, the return on a hypothetically risk-free asset.

In practice, $Z_m$ is the rate of return on some broad market index like the S&P 500.

CAPM theoretically establishes that the expected rate of return on asset s is:

$E(Z_s) = Z_f + \beta[E(Z_m) - Z_f]$

Because of the linearity of the expectations operator, we can aggregate the CAPM model up to a given portfolio p (a set of assets whose return is the weighted sum of the returns of each asset):

$E(Z_p) = Z_f + \beta[E(Z_m) - Z_f]$

Thus, β measures the sensitivity of the expected return of portfolio p to systematic risk (as measured by the expected market rate of return).

The expected return of the portfolio:
1.) tracks the broader market’s return if $\beta = 1$
2.) moves in the same direction as the broader market’s return but is more volatile if $\beta > 1$
3.) moves in the same direction as the broader market’s return but is less volatile if $0 < \beta < 1$

The CAPM model given before lends itself nicely to regression analysis because of its linearity. We rewrite it as

$E(Z_p) - Z_f = \beta[E(Z_m) - Z_f]$

Now, suppose we have the actual (historical) rate of return on our portfolio over some period t ($Z_{p,t}$) and similarly for a risk-free asset ($Z_{f,t}$) and some measure of the market rate of return ($Z_{m,t}$).

Let
1.) $Y_t = Z_{p,t} - Z_{f,t}$ (“excess return” of portfolio p)
2.) $X_t = Z_{m,t} - Z_{f,t}$ (“excess return” of the market)

We can estimate a portfolio’s β (its sensitivity to systematic risk) from the regression:

$Y_t = \alpha + \beta X_t + \varepsilon_t$

All we have done here is add an intercept term α…

Monthly data on returns for 3 stocks for 10 years: Apple, Rogers, and Shell.
rprogers = $Z_{R,t} - Z_{f,t}$ (“excess return” of Rogers)
rpmarket = $Z_{m,t} - Z_{f,t}$ (“excess return” of the market)

Now, suppose an equally weighted portfolio of Apple, Rogers, and Shell.
rpportfolio = $(Z_{A,t}/3 + Z_{R,t}/3 + Z_{S,t}/3) - Z_{f,t}$ (“excess return” of portfolio)
rpmarket is defined as before.
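The regression of portfolio excess returns on market excess returns described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure used to produce the lecture's results: the function names (estimate_capm, equally_weighted_excess_return, simulate_excess_returns) are hypothetical, and the returns are simulated with an assumed "true" β because the Apple, Rogers, and Shell data are not reproduced here. The fit uses ordinary least squares through NumPy and reports the items the lecture says to document: coefficient estimates, standard errors, sample size, and R-squared.

```python
import numpy as np


def equally_weighted_excess_return(returns, rf):
    """Excess return of an equally weighted portfolio.

    returns: array of shape (T, k), one column per asset (Z_A,t, Z_R,t, ...)
    rf:      array of shape (T,), the risk-free rate Z_f,t
    """
    returns = np.asarray(returns, dtype=float)
    rf = np.asarray(rf, dtype=float)
    return returns.mean(axis=1) - rf  # (Z_A,t/3 + Z_R,t/3 + Z_S,t/3) - Z_f,t when k = 3


def estimate_capm(y, x):
    """OLS fit of Y_t = alpha + beta * X_t + eps_t.

    y: excess returns of the portfolio, Z_p,t - Z_f,t
    x: excess returns of the market,    Z_m,t - Z_f,t
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), x])           # intercept column plus regressor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates (alpha, beta)
    resid = y - X @ coef
    rss = float(resid @ resid)                     # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    sigma2 = rss / (n - 2)                         # error variance, n - 2 degrees of freedom
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return {
        "alpha": float(coef[0]),
        "beta": float(coef[1]),
        "se_alpha": float(se[0]),
        "se_beta": float(se[1]),
        "r2": 1.0 - rss / tss,
        "n": n,
    }


def simulate_excess_returns(n=120, true_beta=1.2, seed=0):
    """Simulated monthly excess returns (assumption: NOT real market data)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.005, 0.04, n)   # hypothetical market excess returns
    eps = rng.normal(0.0, 0.03, n)   # hypothetical asset-specific noise
    y = true_beta * x + eps          # CAPM with alpha = 0 by construction
    return y, x


if __name__ == "__main__":
    y, x = simulate_excess_returns()
    fit = estimate_capm(y, x)
    print(f"n = {fit['n']}, R2 = {fit['r2']:.3f}")
    print(f"alpha = {fit['alpha']:.4f} (se {fit['se_alpha']:.4f})")
    print(f"beta  = {fit['beta']:.4f} (se {fit['se_beta']:.4f})")
```

With real data, y would be a series such as rpportfolio and x would be rpmarket. An estimated β above 1 would suggest the portfolio is more volatile than the market, and an estimate between 0 and 1 would suggest it is less volatile.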