ADVANCED ECONOMETRICS LECTURE SLIDES, 2012 SPRING

The distinction between qualitative and quantitative data
The microeconomist's data on sales will have a number corresponding to each firm surveyed (e.g. last month's sales in the first company surveyed were £20,000). This is referred to as quantitative data. The labor economist, when asking whether or not each surveyed employee belongs to a union, receives either a Yes or a No answer. These answers are referred to as qualitative data. Such data arise often in economics when choices are involved (e.g. the choice to buy or not buy a product, to take public transport or a private car, to join or not to join a club). Economists will usually convert these qualitative answers into numeric data. For instance, the labor economist might set Yes = 1 and No = 0. Hence, Y1 = 1 means that the first individual surveyed does belong to a union, Y2 = 0 means that the second individual does not. When variables can take on only the values 0 or 1, they are referred to as dummy (or binary) variables.

What is Descriptive Research?
Descriptive research involves gathering data that describe events and then organizing, tabulating, depicting, and describing the data. It uses description as a tool to organize data into patterns that emerge during analysis, and it often uses visual aids such as graphs and charts to aid the reader. Descriptive research takes a "what is" approach: What is the best way to provide access to computer equipment in schools? Do teachers hold favorable attitudes toward using computers in schools? What have been the reactions of school administrators to technological innovations in teaching?

We will often want to see whether a value or sample comes from a known population. That is, if I were to give a new cancer treatment to a group of patients, I would want to know if their survival rate, for example, was different from the survival rate of those who do not receive the new treatment. What we are testing, then, is whether the sample patients who receive the new treatment come from the population we already know about (cancer patients without the treatment).

Economic and Econometric Models
The model of economic behavior that has been derived from a theory is the economic model. After the unknown parameters have been estimated by using economic data for the variables and an appropriate econometric estimation method, one has obtained an econometric model. It is common to use Greek characters to denote the parameters.
C = f(Y)  economic model
C = β1 + β2·Y, e.g. C = 15.4 + 0.81·Y  econometric model

Economic data: time series (historical) data, cross-sectional data, panel data.

Variables of an economic model
The dependent variable, and the independent variables (also called explanatory or control variables). The nature of an economic variable can be endogenous, exogenous or lagged dependent. A variable is an endogenous variable if it is determined in the model; therefore the dependent variable is always an endogenous variable. The exogenous variables are determined outside of the model. Lagged dependent (lagged endogenous) variables are predetermined in the model. The model does not necessarily consist of only one equation: if more equations have been specified to determine other endogenous variables of the system, the system is called a simultaneous equation model (SEM). If the number of equations is identical to the number of endogenous variables, that system of equations is called complete. A complete model can be solved for the endogenous variables.
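As a small illustration of both ideas (coding qualitative answers as a dummy variable and turning the economic model C = β1 + β2·Y into an econometric one), here is a minimal Python sketch using statsmodels; all data values are invented for illustration.

```python
# Minimal sketch (invented data): a dummy variable and a simple consumption
# function C = b1 + b2*Y estimated by OLS.
import numpy as np
import statsmodels.api as sm

union = np.array(["Yes", "No", "No", "Yes"])   # qualitative survey answers
y_dummy = (union == "Yes").astype(int)         # dummy variable: Yes = 1, No = 0
print(y_dummy)                                 # [1 0 0 1]

income = np.array([100.0, 120.0, 140.0, 160.0, 180.0])       # Y
consumption = np.array([96.0, 113.0, 128.0, 146.0, 161.0])   # C

X = sm.add_constant(income)          # adds the intercept column
ols = sm.OLS(consumption, X).fit()   # econometric model C = b1 + b2*Y
print(ols.params)                    # estimates of b1 and b2
```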
Static model
A change of an explanatory variable in period t is fully reflected in the dependent variable in the same period t.

Dynamic model
The equation is called a dynamic model when lagged variables have been specified.

Structural equations
The equations of the economic model, as specified in the economic theory, are called the structural equations.

Reduced form model
A complete SEM can be solved for the endogenous variables. The solution is called the reduced form model. The reduced form will be used to simulate a policy or to compute forecasts for the endogenous variables.

Parameters and elasticities
The parameters are estimated by an estimator, and the result is called an estimate. The log transformation is a popular transformation in econometric research, because it removes non-linearities to some extent.

Stochastic term
A disturbance term is included at the right-hand side of the equation and is not observed. The right-hand side of the equation then has two parts: the systematic part, which concerns the specification of variables based on the economic theory, and the non-systematic part, which is the remaining random, non-systematic variation.

Applied quantitative economic research
The deterministic assumptions concern the specification of the economic model, which is the formulation of the null hypothesis about the relationship between the economic variables of interest. The basic specification of the model originates from the economic theory. An important decision is made about the size of the model, i.e. whether one or more equations have to be specified. The choice of which variables have to be included in the model stems from the economic theory. The availability and frequency of the data can influence the assumptions that have to be made. The mathematical form of the model has to be determined: linear or nonlinear. Linear is more convenient to analyze.

Evaluation of the estimation results
The evaluation concerns the verification of the validity and reliability of all the assumptions that have been made. A first evaluation is obtained by using common sense and economic knowledge. This is followed by testing the stochastic assumptions using a normality test, autocorrelation tests, heteroscedasticity tests, etc. Looking at a plot of the residuals can be very informative about cycles or outliers. If the stochastic assumptions have not been rejected, the deterministic assumptions can be tested by using statistical tests of restrictions on parameters: the t-test and F-test can be performed, and the coefficient of determination R2 can be interpreted.

Hypothesis Testing
Purpose: make inferences about a population parameter by analyzing differences between observed sample statistics and the results one expects to obtain if some underlying assumption is true. We state a null hypothesis H0 and an alternative hypothesis H1, and compute the test statistic Z = (X̄ − μ0) / (σ/√n). If the null hypothesis is rejected, the alternative hypothesis is accepted.

Example: a sample of 50 files from a file system is selected. The sample mean is 12.3 Kbytes; the standard deviation is known to be 0.5 Kbytes.
H0: μ ≥ 12.5 Kbytes
H1: μ < 12.5 Kbytes
Confidence: 0.95. Critical value = NORMINV(0.05, 0, 1) = −1.645, so the region of non-rejection is Z ≥ −1.645. Here Z = (12.3 − 12.5)/(0.5/√50) ≈ −2.83, which falls below the critical value, so H0 is rejected.

Steps in Hypothesis Testing
1. State the null and alternative hypothesis.
2. Choose the level of significance α.
3. Choose the sample size n. Larger samples allow us to detect even small differences between sample statistics and true population parameters. For a given α, increasing n decreases β (the probability of a Type II error).
4. Choose the appropriate statistical technique and test statistic to use (Z or t).
5. Determine the critical values that divide the regions of rejection and non-rejection.
6. Collect the data and compute the sample mean and the appropriate test statistic (e.g., Z).
7. If the test statistic falls in the non-rejection region, H0 cannot be rejected; else H0 is rejected.
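The file-size example can be reproduced in a few lines; this is a minimal sketch of the one-tailed z-test using scipy for the normal quantile.

```python
# Sketch of the one-tailed z-test above: H0: mu >= 12.5 vs H1: mu < 12.5.
from math import sqrt
from scipy.stats import norm

n, xbar, mu0, sigma, alpha = 50, 12.3, 12.5, 0.5, 0.05
z = (xbar - mu0) / (sigma / sqrt(n))     # test statistic, about -2.83
z_crit = norm.ppf(alpha)                 # critical value, -1.645 for alpha = 0.05
print(z, z_crit)
print("reject H0" if z < z_crit else "do not reject H0")
```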
Z test versus t test
1. A Z-test is a statistical hypothesis test that follows a normal distribution, while a t-test follows a Student's t-distribution.
2. A t-test is appropriate when you are handling small samples (n < 30), while a Z-test is appropriate when you are handling moderate to large samples (n > 30).
3. The t-test is more adaptable than the Z-test, since the Z-test often requires certain conditions to be reliable. Additionally, the t-test has many variants that will suit almost any need.
4. t-tests are more commonly used than Z-tests.
5. Z-tests are preferred to t-tests when standard deviations are known.

One tail versus two tail
So far we were only looking at one "tail" of the distribution at a time (either on the positive side above the mean or the negative side below the mean). With two-tail tests we look for unlikely events on both sides of the mean (above and below) at the same time. So we have learned four critical values:

           1-tail    2-tail
α = .05    1.64      ±1.96
α = .01    2.33      ±2.58

Notice that you have two critical values for a two-tail test, both positive and negative, and only one critical value for a one-tail test (which could be negative).

Which factors affect the accuracy of the estimate β̂?
1. Having more data points improves accuracy of estimation.
2. Having smaller errors improves accuracy of estimation. Equivalently, if the SSR is small or the variance of the errors is small, the accuracy of the estimation will be improved.
3. Having a larger spread of values (i.e. a larger variance) of the explanatory variable X improves accuracy of estimation.

Calculating a confidence interval for β
If the confidence interval is small, it indicates accuracy. Conversely, a large confidence interval indicates great uncertainty over β's true value. The confidence interval for β is
[β̂ − tb·sb , β̂ + tb·sb]
where sb is the standard error of β̂ and tb is the relevant critical value. Large values of sb imply large uncertainty about β.

Choosing a test:

                       Nonparametric tests                                    Parametric tests
                       Nominal data            Ordinal data                   (interval, ratio data)
One group              Chi-square goodness     Wilcoxon signed rank test      One-group t-test
                       of fit
Two unrelated groups   Chi-square test         Wilcoxon rank sum test,        Student's t-test
                                               Mann-Whitney test
Two related groups     McNemar's test          Wilcoxon signed rank test      Paired Student's t-test
K unrelated groups     Chi-square test         Kruskal-Wallis one-way         ANOVA
                                               analysis of variance
K related groups                               Friedman matched samples       ANOVA with repeated
                                                                              measurements

Hypothesis testing involving R2: the F-statistic
R2 is a measure of how well the regression line fits the data or, equivalently, of the proportion of the variability in Y that can be explained by X. If R2 = 0, then X does not have any explanatory power for Y. The test of the hypothesis R2 = 0 can therefore be interpreted as a test of whether the regression explains anything at all:
F = (N − 2)·R2 / (1 − R2)
We use the P-value to decide what is "large" and what is "small" (i.e. whether R2 is significantly different from zero or not).
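A confidence interval of exactly this form is what statsmodels reports; here is a minimal sketch with invented data.

```python
# Sketch (invented data): 95% confidence interval [b_hat - t*s_b, b_hat + t*s_b]
# for the slope in y = a + b*x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.params[1], res.bse[1])      # slope estimate b_hat and its standard error s_b
print(res.conf_int(alpha=0.05)[1])    # the 95% confidence interval for the slope
```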
F-Test
We use the F-test to evaluate hypotheses that involve multiple parameters. Let's use a simple setup:
Yi = β0 + β1·X1i + β2·X2i + β3·X3i + εi
For example, if we wanted to know how economic policy affects economic growth, we may include several policy instruments (balanced budgets, inflation, trade openness, etc.) and see if all of those policies are jointly significant. After all, our theories rarely tell us which single variable is important, but rather a broad category of variables.

F-statistic
The test is performed according to the following strategy:
1. If Significance F is less than 5% (i.e. 0.05), we conclude R2 ≠ 0.
2. If Significance F is greater than 5% (i.e. 0.05), we conclude R2 = 0.

Multiple regression model
The model is interpreted as describing the conditional expected wage of an individual given his gender, years of schooling and experience:
wagei = β1 + β2·malei + β3·schooli + β4·experi + εi
The coefficient β2 for malei measures the difference in expected wage between a male and a female with the same schooling and experience. The coefficient β3 for schooli gives the expected wage difference between two individuals with the same experience and gender where one has one additional year of schooling. The coefficients in a multiple regression model can only be interpreted under a ceteris paribus condition.

Estimation by OLS gives the following results:

Dependent variable: wage
Variable   Estimate   Standard error   t-ratio
constant   −3.38      0.4650           −7.2692
male        1.3444    0.1077           12.4853
school      0.6388    0.0328           19.4780
exper       0.1248    0.0238            5.2530
s = 3.0462   R2 = 0.1326   F = 167.63

The coefficient for malei suggests that if we compare an arbitrary male and female with the same years of schooling and experience, the expected wage differential is $1.34, with standard error 0.1077, which is statistically highly significant. The null hypothesis that schooling has no effect on a person's wage, given gender and experience, can be tested using the t-test with a test statistic of 19.48. The estimated wage increase from one additional year of schooling, keeping years of experience fixed, is $0.64. The joint hypothesis that all three partial slope coefficients are zero, that is, that wages are not affected by gender, schooling or experience, has to be rejected as well. R2 is 0.1326, which means that the model is able to explain 13.3% of the within-sample variation in wages.

A joint test of the hypothesis that the two variables schooling and experience both have zero coefficients can be performed with the F-test. With R2 = 0.0317 for the restricted model that includes only gender:
F = ((0.1326 − 0.0317)/2) / ((1 − 0.1326)/(3294 − 4)) = 191.35
With a 5% critical value of 3.0, the null hypothesis is obviously rejected. We can thus conclude that the model that includes gender, schooling and experience performs significantly better than a model which only includes gender.
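In the spirit of the wage example, a joint F-test on several slope coefficients can be run directly in statsmodels; this sketch uses simulated data and hypothetical variable names, not the data behind the table above.

```python
# Sketch (invented data): joint F-test that the school and exper coefficients
# are both zero, keeping the male dummy in the model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
male = rng.integers(0, 2, n)
school = rng.normal(12, 2, n)
exper = rng.normal(10, 5, n)
wage = 1.0 + 1.3 * male + 0.6 * school + 0.12 * exper + rng.normal(0, 3, n)

X = sm.add_constant(np.column_stack([male, school, exper]))  # columns: const, x1, x2, x3
res = sm.OLS(wage, X).fit()
print(res.f_test("x2 = 0, x3 = 0"))   # H0: school (x2) and exper (x3) are jointly zero
```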
Hypothesis testing
When a hypothesis is statistically tested, two types of errors can be made. The first is that we reject the null hypothesis while it is actually true (a Type I error). The second is that the null hypothesis is not rejected while the alternative is true (a Type II error). The probability of a Type I error is directly controlled by the researcher through his choice of the significance level α: when a test is performed at the 5% level, the probability of rejecting the null hypothesis while it is true is 5%. The probability of a Type II error depends upon the true parameter values. If the truth deviates much from the stated null hypothesis, the probability of such an error will be relatively small, while it will be quite large if the null hypothesis is close to the truth. The probability of rejecting the null hypothesis when it is false is known as the power of the test. It indicates how powerful a test is in finding deviations from the null hypothesis.

Ordinary Least Squares (OLS)
Yi = β0 + β1·X1i + β2·X2i + … + βK·XKi + εi
Objective of OLS: minimize the sum of squared residuals,
min Σi=1..n ei², where ei = Yi − Ŷi
Remember that OLS is not the only possible estimator of the βs. But OLS is the best estimator under certain assumptions.

CAPM
Expected returns on individual assets are linearly related to the expected return on the market portfolio:
rjt − rf = βj·(rmt − rf) + εjt
a regression without an intercept.

CAPM regression without intercept (dependent variable: excess industry portfolio returns)

Industry               food      durables   construction
excess market return   0.790     1.113      1.156
                       (0.028)   (0.029)    (0.025)
Adj. R2                0.601     0.741      0.804
s                      2.902     2.959      2.570

CAPM regression with intercept (dependent variable: excess industry portfolio returns)

Industry               food      durables   construction
constant               0.339     0.064      −0.053
                       (1.128)   (0.131)    (0.114)
excess market return   0.783     1.111      1.157
                       (0.028)   (0.029)    (0.025)
Adj. R2                0.598     0.739      0.803
s                      2.885     2.961      2.572

CAPM regression with intercept and January dummy (dependent variable: excess industry portfolio returns)

Industry               food      durables   construction
constant               0.417     0.069      −0.094
                       (0.133)   (0.137)    (0.118)
January dummy          −0.956    −0.063     0.498
                       (0.456)   (0.473)    (0.411)
excess market return   0.788     1.112      1.155
                       (0.028)   (0.029)    (0.025)
Adj. R2                0.601     0.739      0.804
s                      2.876     2.964      2.571

OLS Assumptions
OLS is the best estimator under the following assumptions:
1. Regression is linear in parameters.
2. Error term has zero population mean.
3. Error term is not correlated with the X's (exogeneity).
4. No serial correlation.
5. No heteroskedasticity.
6. No perfect multicollinearity.
and we usually add:
7. Error term is normally distributed.

Exogeneity
All explanatory variables are uncorrelated with the error term: E(εi | X1i, X2i, …, XKi) = 0. The explanatory variables are determined outside of the model (they are exogenous). What happens if assumption 3 is violated? Suppose we have the model Yi = β0 + β1·Xi + εi, and suppose Xi and εi are positively correlated: when Xi is large, εi tends to be large as well.

Suppose we estimate the relationship salesi = β0 + β1·pricei + εi. What's the problem? What else determines sales of hamburgers? How would you decide between buying a burger at McDonald's ($0.89) or a burger at TGI Fridays ($9.99)? Quality differs, but quality isn't an X variable even though it should be: it becomes part of εi. Price and quality are highly positively correlated, therefore X and ε are also positively correlated. This means that the estimate of β1 will be too high. This is called "omitted variables bias" (more in Chapter 6).

Serial Correlation
Serial correlation: the error terms across observations are correlated with each other, i.e. ε1 is correlated with ε2, etc. This is most important in time series. If errors are serially correlated, an increase in the error term in one time period affects the error term in the next.

Homoskedasticity: the error has a constant variance. This is what we want, as opposed to heteroskedasticity, where the variance of the error depends on the values of the Xs.
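A regression without an intercept, as in the first CAPM table, is just OLS with no constant column; this is a minimal sketch with simulated excess returns, not the industry data above.

```python
# Sketch (simulated monthly excess returns): CAPM regression through the
# origin, r_j - r_f = beta_j * (r_m - r_f) + e.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
excess_market = rng.normal(0.5, 4.0, 240)                    # invented market excess returns
excess_food = 0.8 * excess_market + rng.normal(0, 2.9, 240)  # invented industry returns

res = sm.OLS(excess_food, excess_market).fit()  # no add_constant: no intercept
print(res.params, res.bse)                      # beta estimate and its standard error
```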
Perfect Multicollinearity
Two variables are perfectly collinear if one can be determined perfectly from the other (i.e. if you know the value of x, you can always find the value of z). Example: if we regress income on age, and include both age in months and age in years.

Adjusted/Corrected R2
R2 = SSR/SST. As before, R2 measures the proportion of the sum of squares of deviations of Y that can be explained by the relationship we have fitted using the explanatory variables. Note that adding regressors can never cause R2 to decrease, even if the regressors do not seem to have a significant effect on the response of Y. Adjusted (sometimes called "corrected") R2 takes into account the number of regressors included in the model; in effect, it penalizes us for adding in regressors that don't "contribute their part" to explaining the response variable. Adjusted R2 is given by the following, where k is the number of regressors:
Adjusted R2 = ((n − 1)·R2 − k) / (n − k − 1)
which is equivalent to 1 − (1 − R2)(n − 1)/(n − k − 1).

Interpreting Regression Results
In the model Yi = x'i·β + εi, a coefficient βk measures the expected change in Yi if Xik changes by one unit while all other variables in xi do not change. In a multiple regression model, single coefficients can only be interpreted under ceteris paribus conditions: it is not possible to interpret a single coefficient without knowing what the other variables in the model are. The other variables in the model are called control variables.

Economists are often interested in elasticities rather than marginal effects. An elasticity measures the relative change in the dependent variable due to a relative change in one of the xi variables. The linear model implies that elasticities are nonconstant and vary with xi, while the loglinear model imposes constant elasticities. Explaining log Yi rather than Yi may also help to reduce heteroscedasticity problems. If Xik is a dummy variable (or another variable that may take negative values) we cannot take its logarithm, and we include the original variable in the model; thus we estimate
log Yi = x'i·β + εi
It is possible to include some explanatory variables in logs and some in levels. The interpretation of a coefficient β is then the relative change in Yi due to an absolute change of one unit in Xik. If Xik is a dummy variable for males, β is the (ceteris paribus) relative wage differential between men and women.

Selecting regressors
What happens when a relevant variable is excluded from the model, and what happens when an irrelevant variable is included in the model? Excluding a relevant variable leads to omitted variable bias. Including irrelevant variables in your model, even though they have a zero coefficient, will typically increase the variance of the estimators for the other model parameters. So including too few variables has the danger of biased estimates, while including too many inflates the variances. To find potentially relevant variables we can use economic theory. The general-to-specific modelling approach (the LSE methodology) starts by estimating a general unrestricted model (GUM), which is subsequently reduced in size and complexity by testing restrictions. In presenting your estimation results, it is not a sin to have insignificant variables included in your specification. But be careful: including many variables in your model can cause multicollinearity, so that in the end almost none of the variables appears individually significant.
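The adjusted R2 formula above is a one-liner; this sketch plugs in the wage-equation numbers from earlier purely as an illustration.

```python
# Sketch: adjusted R^2 from R^2, sample size n, and number of regressors k
# (the formula above; statsmodels reports the same value as rsquared_adj).
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return ((n - 1) * r2 - k) / (n - k - 1)

# Wage example: R^2 = 0.1326, n = 3294 observations, k = 3 regressors.
print(adjusted_r2(0.1326, 3294, 3))   # about 0.132
```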
Another way to select a set of regressors: R2 measures the proportion of the sample variation in Yi that is explained by variation in xi. If we were to extend the model by including zi in the set of regressors, the explained variation would never decrease, so R2 will never decrease when we include additional variables in the model. Selecting on R2 is therefore not optimal: with too many variables we will not be able to say very much about the model's coefficients, because R2 does not punish the inclusion of many variables. It is better to use a criterion that makes a trade-off between goodness of fit and the number of regressors employed in the model, such as the adjusted R2:
Adj. R2 = 1 − [1/(N − K) Σi=1..N ei²] / [1/(N − 1) Σi=1..N (yi − ȳ)²]
Akaike's Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC) serve the same purpose; models with lower AIC and BIC are preferred.

It is possible to test whether the increase in R2 is statistically significant, with the appropriate F-statistic as follows (J is the number of added variables, N − K the degrees of freedom):
F = ((R1² − R0²)/J) / ((1 − R1²)/(N − K))
If we want to drop two variables from the model at the same time, we should be looking at a joint test (F or Wald test) rather than at two separate t-tests: once the first variable is omitted from the model, the second one may appear significant. This is of importance if collinearity exists between the two variables.

Comparing non-nested models
Both models are interpreted as describing the conditional expectation of yi given xi and zi:
yi = x'i·β + εi versus yi = z'i·γ + vi
The two models are non-nested if zi includes a variable that is not in xi and vice versa. Because both models explain the same endogenous variable, it is possible to use the adjusted R2, AIC and BIC criteria. An alternative and more formal way to compare the two models is that of encompassing: if model A is believed to be the correct model, it must be able to encompass model B, that is, it must be able to explain model B's results. If model A is unable to do so, it has to be rejected. There are two alternatives, the non-nested F-test and the J-test. Compared to the non-nested F-test, the J-test involves only one restriction. This means that the J-test may be more powerful if the number of additional regressors in the non-nested F-test is large. If the non-nested F-test involves only one additional regressor, it is equivalent to the J-test. A related problem is the choice between a linear and a loglinear functional form: because the dependent variables are different, comparison of R2, AIC and BIC is inappropriate. One way to test is the Box-Cox transformation.

Misspecifying the functional form
Nonlinearity can arise in two ways. First, the model may still be linear in the parameters but nonlinear in its explanatory variables: we include nonlinear functions of xi as additional explanatory variables, for example the variables agei² and agei·malei could be included in an individual wage equation. The resulting model is still linear in the parameters and can still be estimated by OLS. Second, genuinely nonlinear models can be estimated by a nonlinear version of the least squares method.

Testing the functional form
To test whether the additional nonlinear terms in xi are significant, one can use a t-test, F-test or Wald test, or the RESET test (regression equation specification error test).
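The trade-off between fit and parsimony is easy to see numerically; this sketch compares a model with and without an irrelevant regressor on simulated data.

```python
# Sketch (invented data): comparing specifications by adjusted R^2, AIC and
# BIC; lower AIC/BIC is preferred, and x2 is irrelevant by construction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + rng.normal(size=n)

for cols in ([x1], [x1, x2]):
    X = sm.add_constant(np.column_stack(cols))
    r = sm.OLS(y, X).fit()
    print(len(cols), "regressors:", r.rsquared_adj, r.aic, r.bic)
```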
Testing for a structural break
The coefficients in the model may be different before and after a major change in macroeconomic policy. Such a change in the regression coefficients is referred to as a structural break. A convenient way to express the general specification is
yi = x'i·β + gi·x'i·γ + εi
where the specification consists of two groups, indicated by gi = 0 and gi = 1. The null hypothesis is γ = 0, in which case the model reduces to the restricted model. A first way to test γ = 0 is the F-test:
F = ((SR − SUR)/K) / (SUR/(N − 2K))
where K is the number of regressors in the restricted model (including the intercept), and SUR and SR are the residual sums of squares of the unrestricted and restricted models. This F-test is referred to as the Chow test for structural change.

OLS results hedonic price function (dependent variable: log(price))

Variable          Estimate   Standard error   t-ratio
constant          7.094      0.232            30.636
log(lot size)     0.400      0.028            14.397
bedrooms          0.078      0.015             5.017
bathrooms         0.216      0.023             9.386
air conditioning  0.212      0.024             8.923
s = 0.2456   R2 = 0.5674   Adj. R2 = 0.5642   F = 177.41

OLS results hedonic price function (dependent variable: log(price))

Variable           Estimate   Standard error   t-ratio
constant           7.745      0.216            35.801
log(lot size)      0.303      0.027            11.356
bedrooms           0.034      0.014             2.410
bathrooms          0.166      0.020             8.154
air conditioning   0.166      0.021             7.799
driveway           0.110      0.028             3.904
recreational room  0.058      0.026             2.225
full basement      0.104      0.022             4.817
gas for hot water  0.179      0.044             4.079
garage places      0.048      0.011             4.178
preferred area     0.132      0.023             5.816
stories            0.092      0.013             7.268
s = 0.2104   R2 = 0.6865   Adj. R2 = 0.6801   F = 106.33

F = ((0.6865 − 0.5674)/7) / ((1 − 0.6865)/(546 − 12)) = 28.99

OLS results hedonic price function (dependent variable: price)

Variable           Estimate   Standard error   t-ratio
constant           −4038.35   3409             −1.184
lot size           3.546      0.350            10.124
bedrooms           1832       1047              1.75
bathrooms          14335      1489              9.622
air conditioning   12632      1555              8.124
driveway           6687       2045              3.27
recreational room  4511       1899              2.374
full basement      5452       1588              3.433
gas for hot water  12831      3217              3.988
garage places      4244       840               5.050
preferred area     9369       1669              5.614
stories            6556       925               7.086
s = 15423   R2 = 0.6731   Adj. R2 = 0.6664   F = 99.97

Heteroscedasticity
Remember the Gauss-Markov assumptions (OLS is BLUE). There are three ways of handling the problems of heteroscedasticity and autocorrelation:
1. Derive an alternative estimator that is best linear unbiased.
2. Stick to the OLS estimator but adjust the standard errors to allow for heteroscedasticity and/or autocorrelation.
3. Modify the model specification.

Deriving an alternative estimator
We transform the model such that it satisfies the Gauss-Markov conditions again, i.e. we obtain error terms that are homoscedastic and exhibit no autocorrelation. This estimator is referred to as the generalized least squares (GLS) estimator. (Hypothesis testing: pp. 84-85.)

Testing for heteroscedasticity
Testing equality of two unknown variances: H0: σA² = σB² against H1: σA² ≠ σB², or the one-sided alternative H1: σA² > σB².
The Goldfeld-Quandt test.
The Breusch-Pagan test: a Lagrange multiplier test for heteroscedasticity.
The White test: the most common one, but it has limited power against a large number of alternatives.
Multiplicative heteroscedasticity: has more power, but only against a limited number of alternatives.
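The Breusch-Pagan test is available directly in statsmodels; this sketch uses simulated data in which the error variance grows with the regressor, so the test should reject.

```python
# Sketch (invented data): Breusch-Pagan LM test; the statistic is N * R^2
# of the auxiliary regression of the squared OLS residuals on the regressors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, x)   # error standard deviation grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(lm_stat, lm_pval)                # large statistic: reject homoscedasticity
```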
Illustration
Labour: total employment. Capital: total fixed assets. Wage: total wage costs divided by the number of workers. Output: value added.

OLS results linear model (dependent variable: labour)

Variable   Estimate   St. error   t-ratio
constant   287.72     19.64        14.648
wage       −6.742     0.501       −13.446
output     15.40      0.356        43.304
capital    −4.590     0.269       −17.067
s = 156.26   R2 = 0.9352   Adj. R2 = 0.9348   F = 2716.02

Auxiliary regression Breusch-Pagan test (dependent variable: ei²)

Variable   Estimate    St. error   t-ratio
constant   −22719.51   11838.88    −1.919
wage       228.86      302.22       0.757
output     5362.21     214.35       25.015
capital    −3543.51    162.12      −21.858
s = 94182   R2 = 0.5818   Adj. R2 = 0.5796   F = 262.05

High t-ratios and a relatively high R2 indicate that the error variance is unlikely to be constant. The Breusch-Pagan test statistic is N × R2 = 569 × 0.5818 = 331.0. Its asymptotic distribution under the null hypothesis is chi-squared with 3 degrees of freedom, so this implies rejection of homoscedasticity.

Use logarithms to alleviate this problem: a first step in handling the heteroscedasticity problem is to consider a loglinear model, which arises if the production function is of the Cobb-Douglas type, Q = A·K^α·L^β.

OLS results loglinear model (dependent variable: log(labour))

Variable       Estimate   St. error   t-ratio
constant       6.117      0.246        25.089
log(wage)      −0.928     0.071       −12.993
log(output)    0.990      0.026        37.487
log(capital)   −0.004     0.019       −0.197
s = 0.465   R2 = 0.8430   Adj. R2 = 0.8421   F = 1011.02

A more general test is the White test: run an auxiliary regression of the squared OLS residuals upon all original regressors, their squares and their cross products.

Auxiliary regression White test (dependent variable: ei²)

Variable                   Estimate   St. error   t-ratio
constant                   2.545      3.003        0.847
log(wage)                  −1.299     1.753       −0.741
log(output)                −0.904     0.560       −1.614
log(capital)               1.142      0.376        3.039
log²(wage)                 0.193      0.259        0.744
log²(output)               0.138      0.036        3.877
log²(capital)              0.090      0.014        6.401
log(wage)·log(output)      0.138      0.163        0.849
log(wage)·log(capital)     −0.252     0.105       −2.399
log(output)·log(capital)   −0.192     0.037       −5.197
s = 0.851   R2 = 0.1029   Adj. R2 = 0.0884   F = 7.12

Auxiliary regression multiplicative heteroscedasticity (dependent variable: log ei²)

Variable       Estimate   St. error   t-ratio
constant       −3.254     1.185       −2.745
log(wage)      −0.061     0.344       −0.178
log(output)    0.267      0.127        2.099
log(capital)   −0.331     0.090       −3.659
s = 2.241   R2 = 0.0245   Adj. R2 = 0.0193   F = 4.73

The F-values lead to rejection of the null hypothesis of homoscedasticity.

Autocorrelation
Autocorrelation arises when two or more consecutive error terms are correlated. OLS remains unbiased, but it becomes inefficient and its standard errors are estimated in the wrong way. It occurs only when using time series data.

First order autocorrelation
The error term is assumed to depend upon its predecessor as follows:
εt = ρ·εt-1 + ut
where ut is an error term with mean zero and constant variance that exhibits no serial correlation. If |ρ| < 1, the first order autoregressive process is stationary (the means, variances and covariances of εt do not change over time).

Testing for first order autocorrelation
When ρ = 0, no autocorrelation is present. Tests: the Breusch-Godfrey Lagrange multiplier test, and the Durbin-Watson test,
dw = Σt=2..T (et − et-1)² / Σt=1..T et² ≈ 2 − 2ρ̂
A dw close to 2 indicates that the first order autocorrelation coefficient ρ is close to 0; a dw much smaller than 2 is an indication of positive autocorrelation.
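The Durbin-Watson statistic is one function call on the OLS residuals; this sketch simulates AR(1) errors so that dw comes out well below 2.

```python
# Sketch (invented data): Durbin-Watson statistic on OLS residuals; values
# near 2 suggest no first-order autocorrelation, values well below 2 suggest
# positive autocorrelation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
T = 200
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):              # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))    # roughly 2 - 2*0.7, i.e. well below 2
```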
What to do when you find autocorrelation?
In many cases a finding of autocorrelation is an indication that the model is misspecified. The most natural response is then not to change your estimator from OLS to EGLS, but to change your model. Three types of misspecification may lead to a finding of autocorrelation in your OLS residuals: dynamic misspecification, omitted variables, and functional form misspecification.

Endogeneity
Some of the explanatory variables can be correlated with the error term, such that the OLS estimator is biased and inconsistent. Consider autocorrelation combined with a lagged dependent variable:
yt = β1 + β2·xt + β3·yt-1 + εt, with εt = ρ·εt-1 + vt
If ρ ≠ 0, OLS no longer yields consistent estimators. A possible solution is the use of maximum likelihood or instrumental variables techniques. The Durbin-Watson test is not valid in this case; use the Breusch-Godfrey Lagrange multiplier test for autocorrelation instead.

Simultaneity
Simultaneity arises when the model of interest contains behavioural parameters, usually measuring the causal effects of changes in the explanatory variables, and one or more of these explanatory variables are jointly determined with the left-hand side variable.

Illustration: the instrumental variables estimator
It is clear that people with more education have higher wages. It is less clear whether this positive correlation reflects a causal effect of schooling, or whether individuals with a greater earnings capacity have chosen more years of schooling. If the latter possibility is true, the OLS estimates of the return to schooling simply reflect differences in unobserved characteristics of working individuals, and an increase in a person's schooling due to an exogenous shock will have no effect on this person's wage. This is the problem of estimating the causal effect of schooling upon earnings.

Most studies are based upon the human capital earnings function:
wi = β1 + β2·Si + β3·Ei + β4·Ei² + εi
where wi denotes the log of individual earnings, Si denotes years of schooling and Ei denotes years of experience. In the absence of information on actual experience, Ei is sometimes replaced by potential experience, measured as age − Si − 6.

Which variable can serve as an instrument? An instrument is thought of as a variable that affects the cost of schooling, and thus the choice of schooling, but not earnings. There is a long tradition of using family background variables, e.g. parents' education, as instruments. The table below reports the results of an OLS regression of an individual's log hourly wage upon years of schooling, experience and experience squared, and three dummy variables indicating whether the individual was black, lived in a metropolitan area (smsa) and lived in the south.

Wage equation estimated by OLS (dependent variable: log(wage))

Variable    Estimate   Standard error   t-ratio
constant    4.7337     0.0676            70.022
schooling   0.0740     0.0035            21.113
exper       0.0836     0.0066            12.575
exper²      −0.0022    0.0003            −7.050
black       −0.1896    0.0176           −10.758
smsa        0.1614     0.0156            10.365
south       −0.1249    0.0151            −8.259
s = 0.374   R2 = 0.2905   Adj. R2 = 0.2891   F = 204.93

Reduced form for schooling estimated by OLS (dependent variable: schooling)

Variable          Estimate   Standard error   t-ratio
constant          −1.8695    4.2984            −0.435
age               1.0614     0.3014             3.522
age²              −0.0188    0.0052            −3.386
black             −1.4684    0.1154           −12.719
smsa              0.8354     0.1093             7.647
south             −0.4597    0.1024            −4.488
lived near coll.  0.3471     0.1070             3.244
s = 2.5158   R2 = 0.1185   Adj. R2 = 0.1168   F = 67.29
Wage equation estimated by IV (dependent variable: log(wage))

Variable    Estimate   Standard error   t-ratio
constant    4.0656     0.6085            6.682
schooling   0.1329     0.0514            2.588
exper       0.0560     0.0260            2.153
exper²      −0.0008    0.0013           −0.594
black       −0.1031    0.0774           −1.133
smsa        0.1080     0.0050            2.171
south       −0.0982    0.0288           −3.413
Instruments: age, age², lived near college; used for exper, exper² and schooling.

Hence OLS underestimates the true causal effect of schooling. (The estimator used here is 2SLS.)

GMM
This approach estimates the model parameters directly from the moment conditions that are imposed by the model. These conditions can be linear in the parameters but quite often are nonlinear. The number of moment conditions should be at least as large as the number of unknown parameters. The great advantages of GMM are that (1) it does not require distributional assumptions, like normality, (2) it can allow for heteroscedasticity of unknown form, and (3) it can estimate parameters even if the model cannot be solved analytically from the first order conditions.

Consider a linear model yi = x'i·β + εi. With instrument vector zi, the moment conditions are
E{εi·zi} = E{(yi − x'i·β)·zi} = 0
If εi is i.i.d., the optimal GMM estimator is the instrumental variables estimator.

Consider the consumption-based asset pricing model, and assume that there are risky assets as well as a riskless asset with a certain return. One riskless asset and ten risky portfolios provide eleven moment conditions with only two parameters to estimate. These parameters can be estimated using the identity matrix as a suboptimal weighting matrix, using the efficient two-step GMM estimator, or using the iterated GMM estimator. The following table presents the estimation results on the basis of the monthly returns, using one-step GMM and iterated GMM.

GMM estimation results consumption-based asset pricing model

           One-step GMM              Iterated GMM
           Estimate    s.e.          Estimate    s.e.
δ          0.6996      (0.1436)      0.8273      (0.1162)
γ          91.4097     (38.1178)     57.3992     (34.2203)
ξ (df=9)   4.401       (p=0.88)      5.685       (p=0.77)

The γ estimates are huge and rather imprecise. For the iterated GMM procedure, for example, a 95% confidence interval for γ based on the approximate normal distribution is as large as (−9.67, 124.47). The estimated risk aversion coefficients of 57.4 and 91.4 are much higher than what is considered economically plausible. To investigate the economic value of the model, it is possible to compute pricing errors: one can directly compute the average expected excess return according to the model, simply by replacing the population moments by the corresponding sample moments and using the estimated values for δ and γ; on the other hand, the average excess return on asset i can be computed from the data. (Verbeek, p. 157)

Maximum Likelihood
Maximum likelihood is stronger than the GMM approach: it assumes knowledge of the entire distribution, not just of a number of moments. If the distribution of a variable yi conditional upon a number of variables xi is known up to a small number of unknown coefficients, we can use this to estimate these unknown parameters by choosing them in such a way that the resulting distribution corresponds as well as possible to what is observed. That is, the conditional distribution of an observed phenomenon (the endogenous variable) is known except for a finite number of unknown parameters, and these parameters are estimated by taking those values for them that give the observed values the highest probability, the highest likelihood.
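To make the maximum likelihood idea concrete, here is a minimal sketch that maximizes the normal log-likelihood of a linear model numerically; the data and starting values are invented, and in this special case the ML estimates coincide with OLS.

```python
# Sketch (invented data): maximum likelihood for y = b0 + b1*x + e with
# normal errors, maximizing the log-likelihood numerically with scipy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=300)

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)          # parameterize via log(sigma) to keep sigma > 0
    resid = y - b0 - b1 * x
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (resid / sigma) ** 2)

result = minimize(neg_loglik, x0=np.zeros(3))
print(result.x[:2], np.exp(result.x[2]))   # ML estimates of b0, b1 and sigma
```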
Specification Tests
Three principles: the Wald, the likelihood ratio and the Lagrange multiplier principle. The Wald test is generally applicable to any estimator that is consistent and asymptotically normal. The likelihood ratio (LR) principle provides an easy way to compare two alternative nested models. Lagrange multiplier (LM) tests allow one to test restrictions that are imposed in estimation. This makes the LM approach particularly suited for misspecification tests, where a chosen specification of the model is tested for misspecification in several directions (like heteroscedasticity, non-normality or omitted variables).

Suppose that we have a sample of N = 100 balls, of which 44% are red. If we test the hypothesis that p = 0.5, we obtain Wald, LR and LM test statistics of 1.46, 1.44 and 1.44 respectively. The 5% critical value taken from the asymptotic chi-squared distribution with one degree of freedom is 3.84, so the null hypothesis is not rejected at the 5% level with any of the three tests.

Testing for omitted variables in non-linear models
yi = x'i·β + z'i·γ + εi
where zi is a J-dimensional vector of explanatory variables, independent of εi. The null hypothesis states H0: γ = 0. Note that under the assumptions above the F-test provides an exact test for γ = 0, and there is no real need to look at asymptotic tests. For nonlinear models we can compute an LM test.

Testing for heteroscedasticity: based on the squared OLS or maximum likelihood residuals.
Testing for autocorrelation: based on an auxiliary regression of the OLS or ML residuals.
Quasi-maximum likelihood and moment conditions tests: the conditional moment test, the information matrix test, testing for normality.

Models with limited dependent variables: binary choice models
To overcome the problems with the linear model, there exists a class of binary choice models:
the probit model, the logit model, and the linear probability model.

An underlying latent model
It is possible to derive a binary choice model from underlying behavioural assumptions. This leads to a latent variable representation of the model:
yi* = x'i·β + εi
Because yi* is unobserved, it is referred to as a latent variable. Our assumption is that an individual chooses to work if the utility difference exceeds a certain threshold level, which can be set to zero without loss of generality. Consequently, we observe yi = 1 (job) if and only if yi* > 0, and yi = 0 (no job) otherwise. Thus we have
P{yi = 1} = P{yi* > 0} = P{x'i·β + εi > 0} = P{−εi ≤ x'i·β} = F(x'i·β)
For the probit model, εi ~ NID(0,1), independent of all xi; for the logit model, the normal distribution is replaced by the standard logistic one. Most commonly, the parameters in binary choice models (or limited dependent variable models in general) are estimated by the method of maximum likelihood.

Goodness of fit
There is no single measure of goodness of fit in binary choice models. One class of measures compares the estimated model with a model that contains only a constant as explanatory variable; two such goodness-of-fit measures are defined in the literature. An alternative way to evaluate goodness of fit is comparing correct and incorrect predictions:
Rp² = 1 − wr1/wr0
where wr1 and wr0 denote the proportions of incorrect predictions of the estimated model and of the constant-only model, respectively. Because it is possible that the model predicts worse than the simple model, one can have wr1 > wr0, in which case Rp² becomes negative; this is not a good sign for the predictive quality of the model. Illustration: Verbeek, p. 197.
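A probit estimated by maximum likelihood, together with its table of correct and incorrect predictions, takes only a few lines in statsmodels; the data below are simulated from the latent variable model itself.

```python
# Sketch (invented data): probit model P(y=1) = F(x'beta), generated from the
# latent variable representation y* = x'beta + e and estimated by ML.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
y_star = 0.5 + 1.0 * x + rng.normal(size=n)   # latent variable, e ~ N(0,1)
y = (y_star > 0).astype(int)                  # observed binary choice

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit()
print(probit.params)                          # estimates of beta
print(probit.pred_table())                    # correct vs incorrect predictions
```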
Specification tests in binary choice models
The likelihood function has to be correctly specified: we must be sure about the entire distribution that we impose upon our data. Deviations will cause inconsistent estimators, and in binary choice models this typically arises when the probability that yi = 1 is misspecified as a function of xi. Usually such misspecifications are motivated from the latent variable model and reflect heteroscedasticity or non-normality of εi. In addition, we may want to test for omitted variables without having to re-estimate the model. The most convenient framework for such tests is the Lagrange multiplier framework.

Multi-response models
1. Ordered response models:
yi* = x'i·β + εi, with yi = j if γj-1 < yi* ≤ γj
The probability that alternative j is chosen is the probability that the latent variable yi* lies between the two boundaries γj-1 and γj. Assuming that εi is i.i.d. standard normal results in the ordered probit model; the logistic distribution gives the ordered logit model. For M = 2 we are back at the binary choice model. Illustration: married females, how much would you like to work? (Verbeek, p. 203). Illustration: willingness to pay for natural areas.
2. Multinomial models: Verbeek, p. 208.

Models for count data
In certain applications we would like to explain the number of times a given event occurs, for example how often a consumer visits a supermarket in a given week, or the number of patents a firm has obtained in a given year. The outcome might be zero for a substantial part of the population. While the outcomes are discrete and ordered, there are two important differences with ordered response outcomes. First, the values of the outcome have a cardinal rather than just an ordinal meaning (4 is twice as much as 2, and 2 is twice as much as 1). Second, there is no natural upper bound to the outcomes. See the Poisson regression model (p. 211) and the illustration on patents and R&D expenditures (p. 215).

Tobit models
The dependent variable is continuous, but its range may be constrained. Most commonly this occurs when the dependent variable is zero for a substantial part of the population but positive for the rest: expenditures on durable goods, hours of work, the amount of foreign direct investment of a firm. Estimation is usually done through maximum likelihood.

Extensions of tobit models
It is conceivable that households with many children are less likely to have positive expenditures, while if a holiday is taken up, the expected level of expenditures for such households is higher.

The tobit II model
The traditional model to describe sample selection problems is the tobit II model, also referred to as the sample selection model. In this context it consists of a linear wage equation
wi* = x'1i·β1 + ε1i
where x1i denotes a vector of exogenous characteristics (age, education, gender, …) and wi* denotes person i's wage. The wage wi* is not observed for people who are not working. To describe whether a person is working or not, a second equation of the binary choice type is specified:
hi* = x'2i·β2 + ε2i
with the following observation rule:
wi = wi*, hi = 1 if hi* > 0
wi not observed, hi = 0 if hi* ≤ 0
where wi denotes person i's actual wage, and the binary variable hi simply indicates working or not working. The model is completed by a distributional assumption on the unobserved errors (ε1i, ε2i), usually a bivariate normal distribution with expectations zero, variances σ1², σ2² and covariance σ12. The second equation is a standard probit model, describing the choice between working and not working.

Illustration: expenditures on alcohol and tobacco, Verbeek, p. 233.
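Since statsmodels has no built-in tobit, the standard (type I) tobit likelihood is easy to hand-roll; this is a minimal sketch on data censored at zero by construction, not a production implementation.

```python
# Sketch (invented data): tobit (type I) log-likelihood for y censored at
# zero, maximized numerically; uncensored observations contribute a normal
# density, censored ones contribute P(y* <= 0).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(9)
n = 1000
x = rng.normal(size=n)
y_star = 0.5 + 1.0 * x + rng.normal(size=n)
y = np.maximum(y_star, 0.0)            # zero for a substantial part of the sample

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    xb = b0 + b1 * x
    ll_pos = norm.logpdf(y, loc=xb, scale=s)   # positive observations
    ll_zero = norm.logcdf(-xb / s)             # censored observations
    return -np.sum(np.where(y > 0, ll_pos, ll_zero))

res = minimize(neg_loglik, x0=np.zeros(3))
print(res.x[:2], np.exp(res.x[2]))     # estimates of b0, b1 and sigma
```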
Sample selection bias
When the sample used in a statistical analysis is not randomly drawn from a larger population, selection bias may occur. If you interview people in a restaurant and ask how often they visit it, those who go there every day are much more likely to end up in the sample than those who visit every two weeks. Nonresponse may also result in selection bias: people who refuse to report their income are typically those with relatively high or low income levels. A third source is self-selection of economic agents: individuals select themselves into a certain state (working, union member, public sector employment) in a nonrandom way, on the basis of economic arguments. In general, those who benefit most from being in a certain state will be more likely to be in that state.

Estimating treatment effects
Another area where sample selection plays an important role is the estimation of treatment effects. A treatment effect refers to the impact of receiving a certain treatment upon a particular outcome variable, for example the effect of participating in a job training programme on future earnings. This effect may be different across individuals, and selection into the training programme may be nonrandom. In the simplest case, the treatment effect is the coefficient on a treatment dummy variable in a regression model. Because interest is in the causal effect of the treatment, we need to worry about the potential endogeneity of the treatment dummy, i.e. about selection into treatment.

Duration models
Duration models explain, for example, the time it takes for an unemployed person to find a job. They are often used in labour market studies. Illustration: duration of the firm-bank relationship, Verbeek, p. 250.

Univariate time series models
Yt depends linearly on its previous value Yt-1:
Yt = δ + θ·Yt-1 + εt
where εt denotes a serially uncorrelated innovation with a mean of zero and a constant variance. This process is called a first order autoregressive process, or AR(1): the current value Yt equals a constant plus θ times its previous value plus an unpredictable component εt. Assume that |θ| < 1. The process for εt is an important building block of time series models and is referred to as a white noise process; in this chapter it will always denote such a process, homoscedastic and exhibiting no autocorrelation.

The expected value of Yt can be solved from E{Yt} = δ + θ·E{Yt-1}. Assuming that E{Yt} does not depend on t, this gives μ ≡ E{Yt} = δ/(1 − θ). Defining yt = Yt − μ allows us to write
yt = θ·yt-1 + εt, and, by repeated substitution, yt = Σj=0..∞ θ^j·εt-j
Writing time series models in terms of yt rather than Yt is often notationally more convenient. The joint distribution of all values of Yt is characterized by the so-called autocovariances, the covariances between Yt and one of its lags, Yt-k. If we impose that variances and autocovariances do not depend on the index t, this is the so-called stationarity assumption.

Another simple time series model is the first order moving average process, or MA(1) process, given by
yt = εt + α·εt-1
This says that y1 is a weighted average of ε1 and ε0, y2 is a weighted average of ε2 and ε1, and so on. The values of yt are defined in terms of drawings from the white noise process εt. The simple moving average structure implies that observations that are two or more periods apart are uncorrelated.

The expression yt = Σj=0..∞ θ^j·εt-j is referred to as the moving average representation of the autoregressive process: the AR process is written as an infinite order moving average process. We can do so provided that |θ| < 1. For some purposes a moving average representation is more convenient than an autoregressive one. We assume that the process for Yt is stationary.
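Simulating an AR(1) process and inspecting its sample autocorrelations is a useful way to see the θ^j pattern that the moving average representation implies; this sketch uses statsmodels' acf on invented data.

```python
# Sketch: simulate an AR(1) process y_t = theta*y_{t-1} + e_t and inspect its
# sample autocorrelation function, which should decay roughly like theta^k.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(10)
T, theta = 1000, 0.7
y = np.zeros(T)
for t in range(1, T):
    y[t] = theta * y[t - 1] + rng.normal()

print(acf(y, nlags=5))   # roughly 1, 0.7, 0.49, 0.34, ... (lags 0 through 5)
```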
Stationarity
A stochastic process is said to be strictly stationary if its properties are unaffected by a change of time origin. We will only be concerned with the means, variances and covariances of the series, and it is then sufficient to impose that these moments are independent of time, rather than the entire distribution. This is referred to as weak stationarity or covariance stationarity. Strict stationarity is stronger, as it requires that the whole distribution is unaffected by a change in time horizon, not just the first and second order moments.

Autocovariance and autocorrelation
Define the autocorrelation ρk as
ρk = cov{Yt, Yt-k} / V{Yt} = γk/γ0
The autocorrelations considered as a function of k are referred to as the autocorrelation function (ACF), or sometimes the correlogram, of the series Yt. The ACF plays a major role in modelling the dependencies among observations, because it characterizes the process describing the evolution of Yt over time. From the ACF we can infer the extent to which one value of the process is correlated with previous values, and thus the length and strength of the memory of the process. It indicates how long and how strongly a shock in the process εt affects the values of Yt.

For the AR(1) process yt = θ·yt-1 + εt we have autocorrelation coefficients ρk = θ^k, while for the MA(1) process yt = εt + α·εt-1 we have
ρ1 = α/(1 + α²) and ρk = 0 for k = 2, 3, 4, …
Consequently, a shock in an MA(1) process affects Yt in two periods only, while a shock in the AR(1) process affects all future observations with a decreasing effect. Illustration: Verbeek, pp. 260-261.

General ARMA processes
We define more general autoregressive and moving average processes:
MA(q): yt = εt + α1·εt-1 + … + αq·εt-q
AR(p): yt = θ1·yt-1 + θ2·yt-2 + … + θp·yt-p + εt
ARMA(p,q): yt = θ1·yt-1 + … + θp·yt-p + εt + α1·εt-1 + … + αq·εt-q

Often it is convenient to use the lag operator, denoted by L and defined by L·yt = yt-1. Most of the time the lag operator can be manipulated just as if it were a constant: L²yt = L(L·yt) = L·yt-1 = yt-2, and more generally L^p·yt = yt-p. Using the lag operator allows us to write ARMA models in a concise way:
AR(1): (1 − θL)·yt = εt, with moving average representation yt = Σj=0..∞ θ^j·εt-j
MA(1): yt = (1 + αL)·εt, with autoregressive representation Σj=0..∞ (−α)^j·yt-j = εt
For a more parsimonious representation, we may want to work with an ARMA model that contains both an AR and an MA part. The general ARMA model can be written as
θ(L)·yt = α(L)·εt
Lag polynomials and common roots: Verbeek, pp. 264-265.

Stationarity and unit roots
Stationarity of a stochastic process requires that the variances and autocovariances are finite and independent of time (see Verbeek, pp. 266-267). A series which becomes stationary after first differencing is said to be integrated of order one, denoted I(1). If Δyt is described by a stationary ARMA(p,q) model, we say that yt is described by an autoregressive integrated moving average (ARIMA) model of order (p, 1, q), in short an ARIMA(p,1,q) model. First differencing can quite often transform a nonstationary series into a stationary one. If a series must be differenced twice before it becomes stationary, it is said to be integrated of order two, denoted I(2), and it must have two unit roots.

Testing for unit roots
Dickey-Fuller test: to test the null hypothesis of a unit root, H0: θ = 1, it is possible to use the standard t-statistic
DF = (θ̂ − 1) / se(θ̂)
ADF test: testing for unit roots in higher order autoregressive models.
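The augmented Dickey-Fuller test is available in statsmodels; this sketch applies it to a simulated random walk, which is I(1) by construction, so the unit root should not be rejected.

```python
# Sketch: ADF unit root test on a simulated random walk; H0 is a unit root.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(11)
y = np.cumsum(rng.normal(size=500))   # random walk: integrated of order one

stat, pvalue = adfuller(y)[:2]
print(stat, pvalue)                   # large p-value: cannot reject the unit root
print(adfuller(np.diff(y))[:2])       # first difference: unit root clearly rejected
```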
KPSS test: the null hypothesis of trend stationarity specifies that the variance of the random walk component is zero. The test is actually a Lagrange multiplier test, and its statistic is given by
KPSS = Σt=1..T St² / (T²·σ̂²)
where St is the partial sum of the residuals. Illustrations: Verbeek, p. 276.

Estimation of ARMA models
Least squares; maximum likelihood.

Choosing a model
Before estimating any model it is common to estimate autocorrelation and partial autocorrelation coefficients directly from the data. Often this gives some idea of which model might be appropriate. After one or more models have been estimated, their quality can be judged by checking whether the residuals are more or less white noise, and by comparing them with alternative specifications. These comparisons can be based on statistical significance tests or on particular model selection criteria. The autocorrelation function (ACF) describes the correlation between Yt and its lag Yt-k as a function of k; the partial autocorrelation function (PACF) is used alongside it.

Diagnostic checking
As a last step in the model building cycle, some checks on model adequacy are required. Possibilities are a residual analysis and overfitting the specified model. For example, if an ARMA(p,q) model is chosen on the basis of the sample ACF and PACF, we could also estimate an ARMA(p+1,q) and an ARMA(p,q+1) model and test the significance of the additional parameters. A residual analysis is usually based on the fact that the residuals of an adequate model should be approximately white noise. A plot of the residuals can be a useful tool in checking for outliers. Moreover, the estimated residual autocorrelations are usually examined: for a white noise series the autocorrelations are zero, so the significance of the residual autocorrelations is often checked by comparing them with the approximate two-standard-error bounds ±2/√T. To check the overall acceptability of the residual autocorrelations, the Ljung-Box portmanteau test statistic is used:
QK = T(T + 2) Σk=1..K rk² / (T − k)
If a model is rejected at this stage, the model building cycle has to be repeated.

Criteria for model selection
Akaike's Information Criterion (AIC): AIC = log σ̂² + 2(p + q + 1)/T
Bayesian Information Criterion (BIC): BIC = log σ̂² + (p + q + 1)·log T / T
Both criteria are likelihood based and represent a different trade-off between fit, as measured by the loglikelihood value, and parsimony, as measured by the number of free parameters p + q + 1. Usually the model with the smallest AIC and BIC value is preferred. Illustration: Verbeek, p. 286.

Predicting with ARMA models
A main goal of building a time series model is predicting the future path of economic variables.
The optimal predictor: our criterion for choosing a predictor from the many possible ones is to minimize the expected quadratic prediction error.
Prediction accuracy: it is important to know how accurate the prediction is, and the accuracy of the prediction decreases when we predict further into the future. The MA(1) model gives more efficient predictors only if one predicts one period ahead; more general ARMA models, however, will yield efficiency gains also in further-ahead predictions. For the computation of the predictor, the autoregressive representation is most convenient. The informational value contained in an AR(1) process slowly decays over time. For a random walk, with θ = 1, the forecast error variance increases linearly with the forecast horizon. In practical cases, the parameters in ARMA models are unknown and we replace them by their estimated values. This introduces additional uncertainty in the predictors; however, this uncertainty is usually ignored.
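The whole cycle — estimate, compare by AIC/BIC, forecast — fits in a short statsmodels sketch; an ARMA(p,q) is specified as an ARIMA of order (p, 0, q), and the data here are simulated.

```python
# Sketch (simulated ARMA(1,1) data): estimation, information criteria and
# out-of-sample point forecasts with statsmodels' ARIMA.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(12)
T = 500
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):                 # y_t = 0.6*y_{t-1} + e_t + 0.3*e_{t-1}
    y[t] = 0.6 * y[t - 1] + e[t] + 0.3 * e[t - 1]

res = ARIMA(y, order=(1, 0, 1)).fit()
print(res.aic, res.bic)               # for comparison with rival specifications
print(res.forecast(steps=5))          # predictions 1 to 5 periods ahead
```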
Illustration: the expectations theory of the term structure.

Autoregressive conditional heteroscedasticity
In financial time series one often observes what is referred to as volatility clustering: big shocks (residuals) tend to be followed by big shocks in either direction, and small shocks tend to follow small shocks.

ARCH (Engle, 1982): the variance of the error term at time t depends on the squared error terms from previous periods. The simplest form is
σt² = E{εt² | It-1} = ϖ + α·ε²t-1
This specification does not imply that the process for εt is nonstationary; it just says that the squared values ε²t and ε²t-1 are correlated. The unconditional variance of εt is given by σ² = E{εt²} = ϖ + α·E{ε²t-1} and has the stationary solution
σ² = ϖ/(1 − α)
provided that 0 ≤ α < 1. Note that the unconditional variance does not depend on t.

The ARCH(1) model is easily extended to an ARCH(p) process:
σt² = ϖ + α1·ε²t-1 + α2·ε²t-2 + … + αp·ε²t-p = ϖ + α(L)·ε²t-1
The effect of a shock j periods ago on current volatility is determined by the coefficient αj. In an ARCH(p) model, old shocks of more than p periods ago have no effect on current volatility.

GARCH models
ARCH models have been generalized in many different ways. The GARCH(p,q) model can be written as
σt² = ϖ + Σj=1..p αj·ε²t-j + Σj=1..q βj·σ²t-j, or σt² = ϖ + α(L)·ε²t-1 + β(L)·σ²t-1
where α(L) and β(L) are lag polynomials. In practice a GARCH(1,1) specification often performs very well. It can be written as
σt² = ϖ + α·ε²t-1 + β·σ²t-1
which has only three parameters to estimate. Non-negativity of σt² requires that ϖ, α and β are non-negative. If we define the surprise in the squared innovations as υt ≡ εt² − σt², the GARCH(1,1) process can be rewritten as
εt² = ϖ + (α + β)·ε²t-1 + υt − β·υt-1
an ARMA(1,1) process for εt². It implies that the effect of a shock on current volatility decreases over time.

An important restriction of the ARCH and GARCH specifications is their symmetry: only the absolute values of the innovations matter, not their sign. That is, a big negative shock has the same impact on future volatility as a big positive shock of the same magnitude. An interesting extension is toward asymmetric volatility models, in which good news and bad news have a different impact on future volatility. The distinction between good and bad news is more sensible for stock markets than for exchange rates. An asymmetric model should allow for the possibility that an unexpected drop in price has a larger impact on future volatility than an unexpected increase in price of similar magnitude. A fruitful approach to capture such asymmetry is provided by Nelson's exponential GARCH, or EGARCH, model:
log σt² = ϖ + β·log σ²t-1 + γ·(εt-1/σt-1) + α·(|εt-1|/σt-1)
where α, β and γ are constant parameters. The EGARCH model is asymmetric as long as γ ≠ 0; when γ < 0, positive shocks generate less volatility than negative shocks. It is possible to extend the EGARCH model by including additional lags.

Estimation: (quasi-) maximum likelihood estimation; feasible GLS. In financial markets, GARCH models are frequently used to forecast the volatility of returns, which is an important input for investment, option pricing, risk management and financial market regulation. Illustration: volatility in daily exchange rates, Verbeek, p. 303.

Multivariate time series models: dynamic models with stationary variables
Let us consider two stationary variables Yt and Xt, and assume that
Yt = δ + θ·Yt-1 + φ0·Xt + φ1·Xt-1 + εt
We can think of Yt as company sales and Xt as advertising, both in month t, and we assume that εt is a white noise process, independent of current and lagged values of Xt and of lagged values of Yt. The above relation is referred to as an autoregressive distributed lag (ARDL) model. We use OLS to estimate it.
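In practice GARCH models are usually estimated with a dedicated library; this sketch assumes the third-party `arch` package (pip install arch), which is not mentioned in the slides, and fits a GARCH(1,1) to simulated returns.

```python
# Sketch (simulated returns): fitting a GARCH(1,1) with the third-party
# `arch` package; the recovered parameters should be near (0.1, 0.1, 0.8).
import numpy as np
from arch import arch_model

rng = np.random.default_rng(13)
T = 2000
sigma2 = np.ones(T)
e = np.zeros(T)
for t in range(1, T):   # simulate sigma2_t = 0.1 + 0.1*e_{t-1}^2 + 0.8*sigma2_{t-1}
    sigma2[t] = 0.1 + 0.1 * e[t - 1] ** 2 + 0.8 * sigma2[t - 1]
    e[t] = np.sqrt(sigma2[t]) * rng.normal()

res = arch_model(e, mean="Zero", vol="GARCH", p=1, q=1).fit(disp="off")
print(res.params)       # omega, alpha[1], beta[1]
```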
Multivariate Time Series Models
Dynamic models with stationary variables: let us consider two stationary variables Yt and Xt, and assume that it holds that

Yt = δ + θ Yt−1 + φ0 Xt + φ1 Xt−1 + εt

We can think of Yt as company sales and Xt as advertising, both in month t, and we assume that εt is a white noise process, independent of Xt, Xt−1, ... and Yt−1, Yt−2, .... The above relation is referred to as an autoregressive distributed lag (ARDL) model. We can use OLS to estimate it.
The interesting element in the model is that it describes the dynamic effects of a change in Xt upon current and future values of Yt. Taking partial derivatives, we can derive that the immediate response is given by

∂Yt/∂Xt = φ0

This is referred to as the impact multiplier: an increase in X of one unit has an immediate impact on Y of φ0 units. The effect after one period is

∂Yt+1/∂Xt = θ ∂Yt/∂Xt + φ1 = θφ0 + φ1

and after two periods

∂Yt+2/∂Xt = θ ∂Yt+1/∂Xt = θ(θφ0 + φ1)

This shows that after the first period the effect is decreasing if |θ| < 1. Imposing this so-called stability condition allows us to determine the long-run effect of a unit change in Xt. It is given by the long-run multiplier (or equilibrium multiplier)

φ0 + (θφ0 + φ1) + θ(θφ0 + φ1) + ... = φ0 + (1 + θ + θ² + ...)(θφ0 + φ1) = (φ0 + φ1)/(1−θ)

This says that if advertising Xt increases with one unit, the expected cumulative increase in sales is given by (φ0 + φ1)/(1−θ). If the increase in Xt is permanent, the long-run multiplier also has the interpretation of the expected long-run permanent increase in Yt.
An alternative way to formulate the ARDL model is

ΔYt = φ0 ΔXt − (1−θ)[Yt−1 − α − β Xt−1] + εt

where β = (φ0 + φ1)/(1−θ) is the long-run multiplier. This formulation is an example of an error-correction model (ECM). It says that the change in Yt is due to the current change in Xt, plus an error-correction term. If Yt−1 is above the equilibrium value that corresponds to Xt−1, that is, if the equilibrium error in square brackets is positive, an additional negative adjustment in Yt is generated. The speed of adjustment is determined by 1−θ, the adjustment parameter (1−θ > 0).
It is also possible to estimate the ECM by OLS. Both the ARDL model and the ECM assume that the values of Xt can be treated as given, that is, as being uncorrelated with the equation's error terms. The ARDL model describes the expected value of Yt given its own history and conditional upon current and lagged values of Xt. If Xt is simultaneously determined with Yt, OLS in the ARDL model and the ECM would be inconsistent.
Partial adjustment model: see Verbeek, page 312
Generalized ARDL: see Verbeek, page 312

MODELS WITH NON-STATIONARY VARIABLES
Spurious regression
Consider two variables, Yt and Xt, generated by two independent random walks,

Yt = Yt−1 + ε1t
Xt = Xt−1 + ε2t

A researcher may want to estimate a regression model such as

Yt = α + β Xt + εt

The results from this regression are likely to be characterized by a fairly high R² statistic, highly autocorrelated residuals and a significant value of β. This is the well-known problem of nonsense or spurious regression: two independent nonstationary series are spuriously related due to the fact that they are both trending. Including lagged variables in the regression is sufficient to solve many of the problems associated with spurious regression.

COINTEGRATION
Consider two series integrated of order one, Yt and Xt, and suppose that a linear relationship exists between them. This is reflected in the proposition that there exists some value β such that Yt − βXt is I(0), although Yt and Xt are both I(1). In such a case it is said that Yt and Xt are cointegrated and share a common trend. The presence of a cointegrating vector can be interpreted as the presence of a long-run equilibrium relationship. If Yt and Xt are cointegrated the error term is I(0); if not, the error term is I(1). Hence we can test for the presence of a cointegrating relationship by testing for a unit root in the OLS residuals.
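The spurious regression problem and the residual-based cointegration test are easy to reproduce by simulation; the sketch below (Python with statsmodels, illustrative and not from the slides) regresses one simulated random walk on another, independent one.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(1)
T = 200
y = np.cumsum(rng.standard_normal(T))   # Yt = Yt-1 + e1t
x = np.cumsum(rng.standard_normal(T))   # Xt = Xt-1 + e2t, independent of y

# Level regression of one random walk on the other: typically a
# "significant" beta and fair R2, although the series are unrelated
res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.tvalues, res.rsquared)
print(durbin_watson(res.resid))   # far below 2: autocorrelated residuals

# Residual-based (Engle-Granger type) cointegration test: with two
# independent random walks the null of no cointegration is not rejected
t_stat, p_value, _ = coint(y, x)
print(t_stat, p_value)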
In practice this is done by applying the Dickey-Fuller test to the residuals. An alternative test for cointegration is based on the usual Durbin-Watson statistic, referred to as the cointegrating regression Durbin-Watson (CRDW) test (page 316).
The presence of a cointegrating relationship implies that there exists an error-correction model that describes the short-run dynamics consistently with the long-run relationship.

COINTEGRATION AND ECM
The Granger representation theorem states that if a set of variables is cointegrated, then there exists a valid error-correction representation of the data, for example

ΔYt = δ + φ1 ΔXt−1 − γ(Yt−1 − βXt−1) + εt

If Yt and Xt are both I(1) but have a long-run relationship, there must be some force which pulls the equilibrium error back towards zero. The ECM describes how Yt and Xt behave in the short run consistent with a long-run cointegrating relationship. The representation theorem also holds conversely: if Yt and Xt are both I(1) and have an error-correction representation, then they are necessarily cointegrated. The concept of cointegration can be applied to nonstationary time series only.
ILLUSTRATION: long-run purchasing power parity

VECTOR AUTOREGRESSIVE MODELS
A VAR describes the dynamic evolution of a number of variables from their common history. If we consider two variables, Yt and Xt, the VAR consists of two equations. A first-order VAR would be given by

Yt = δ1 + θ11 Yt−1 + θ12 Xt−1 + ε1t
Xt = δ2 + θ21 Yt−1 + θ22 Xt−1 + ε2t

where ε1t and ε2t are two white noise processes (independent of the history of Y and X) that may be correlated. If, for example, θ12 ≠ 0, the history of X helps explaining Y.
The VAR model implies univariate ARMA models for each of its components. The advantages of considering the components simultaneously are that the model may be more parsimonious and include fewer lags, and that more accurate forecasting is possible, because the information set is extended to include also the history of the other variables.
Sims (1980) has advocated the use of VAR models instead of structural simultaneous equations models because the distinction between endogenous and exogenous variables does not have to be made a priori, and arbitrary constraints to ensure identification are not required.
We can use a VAR model for forecasting in a straightforward way. To estimate a VAR we can simply use OLS equation by equation, which is consistent because the white noise terms are assumed to be independent of the history of Yt.
Impulse Response Function: it measures the response of Yj,t+s to an impulse in Y1t, keeping constant all other variables dated t and before.

COINTEGRATION: The Multivariate Case
If we have a set of k I(1) variables, there may exist up to k−1 independent linear relationships that are I(0), while any linear combination of these relationships is also I(0).
Vector error-correction model: Johansen developed a maximum likelihood estimation procedure which allows one to test for the number of cointegrating relations: the trace test and the maximum eigenvalue test. They are actually likelihood ratio tests.
ILLUSTRATION: long-run purchasing power parity, page 331
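A sketch of estimating a first-order VAR equation by equation and computing impulse responses, using Python's statsmodels (an assumed tool; the slides do not prescribe software, and the simulated data are hypothetical):

import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Two jointly dependent stationary series (simulated for illustration)
rng = np.random.default_rng(2)
T = 300
y = np.zeros(T); x = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()
    x[t] = 0.2 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()
data = pd.DataFrame({'y': y, 'x': x})

# OLS equation by equation; lag order chosen by AIC
res = VAR(data).fit(maxlags=4, ic='aic')
print(res.summary())   # theta_12 != 0: the history of x helps explain y

# Impulse responses: effect of a shock in one variable on both variables
irf = res.irf(10)
print(irf.irfs.shape)  # (horizons+1, k, k)

# Forecasting is straightforward
print(res.forecast(data.values[-res.k_ar:], steps=5))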
Illustration
mt: log of real M1 money balances
inflt: quarterly inflation rate (in % per year)
cprt: commercial paper rate
yt: log real GDP (in billions of 1987 dollars)
tbrt: treasury bill rate
We can think of three possible cointegrating relationships governing the long-run behavior of these variables. First, we can specify an equation for money demand as

mt = β1 + β14 yt + β15 tbrt + ε1t

where β14 denotes the income elasticity. It can be expected that β14 is close to unity, corresponding to a unitary income elasticity, and that β15 < 0.
Second, if real interest rates are stationary, we can expect that

inflt = β2 + β25 tbrt + ε2t

corresponds to a cointegrating relationship with β25 = 1. This is referred to as the Fisher relation, where we are using actual inflation as a proxy for expected inflation.
Third, it can be expected that the risk premium, as measured by the difference between the commercial paper rate and the treasury bill rate, is stationary, such that

cprt = β3 + β35 tbrt + ε3t

with β35 = 1.
Before proceeding to the vector process of these five variables, let us consider the OLS estimates of the above three regressions.

Univariate cointegrating regressions by OLS, intercept estimates not reported, standard errors in parentheses

          Money Demand     Fisher Equation   Risk Premium
mt        -1               0                 0
inflt     0                -1                0
cprt      0                0                 -1
yt        0.423 (0.016)    0                 0
tbrt      -0.031 (0.002)   0.558 (0.053)     1.038 (0.010)
R²        0.815            0.409             0.984
dw        0.199            0.784             0.705
ADF(6)    -3.164           -1.188            -3.975

Note that the OLS standard errors are inappropriate if the variables in the regression are integrated. Except for the risk premium equation, the R²s are not close to unity, which is an informal requirement for a cointegrating regression. The empirical evidence for the existence of the suggested cointegrating relationships between the five variables is somewhat mixed. Only for the risk premium equation do we find an R² close to unity, a sufficiently high dw statistic and a significant rejection of the ADF test for a unit root in the residuals. For the two other regressions there is little reason to reject the null hypothesis of no cointegration. Potentially this is caused by the lack of power of the tests that we employ, and it is possible that a multivariate vector analysis provides stronger evidence for the existence of cointegrating relationships between these five variables.
Plot of the residuals: Verbeek, page 335
The first step in the Johansen approach involves testing for the cointegrating rank r. To compute these tests we need to choose the maximum lag length p in the vector autoregressive model. Choosing p too small will invalidate the tests, and choosing p too large may result in a loss of power.

Trace and maximum eigenvalue tests for cointegration, test statistics (see pages 337-338)

Null hypothesis   Alternative   λtrace statistic           5% critical value
                                p=5         p=6
H0: r=0           H1: r≥1       108.723     127.801        75.98
H0: r≤1           H1: r≥2       59.189      72.302         53.48
H0: r≤2           H1: r≥3       29.201      35.169         34.87
H0: r≤3           H1: r≥4       13.785      16.110         20.18

                                λmax statistic
                                p=5         p=6
H0: r=0           H1: r=1       49.534      55.499         34.40
H0: r≤1           H1: r=2       29.988      37.133         28.27
H0: r≤2           H1: r=3       15.416      19.059         22.04
H0: r≤3           H1: r=4       9.637       11.860         15.87
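Trace and maximum eigenvalue statistics of this kind can be computed with the Johansen procedure in statsmodels; a minimal sketch, assuming a VAR with p=6 in levels. The DataFrame df is a simulated stand-in here, since the original data set is not part of the slides.

import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# In the application, df would hold the five observed series
# (m, infl, cpr, y, tbr); a simulated stand-in keeps the sketch runnable
rng = np.random.default_rng(4)
trend = np.cumsum(rng.standard_normal(200))   # one common stochastic trend
df = pd.DataFrame({
    'm':    trend + rng.standard_normal(200),
    'infl': 0.5 * trend + rng.standard_normal(200),
    'cpr':  trend + rng.standard_normal(200),
    'y':    np.cumsum(rng.standard_normal(200)),  # independent I(1) series
    'tbr':  trend + rng.standard_normal(200),
})

# det_order=0 includes a constant; k_ar_diff=5 matches a VAR with p=6 in levels
jres = coint_johansen(df, det_order=0, k_ar_diff=5)
print(jres.lr1)   # trace statistics for H0: r=0, r<=1, ...
print(jres.cvt)   # trace critical values (90%, 95%, 99%)
print(jres.lr2)   # maximum eigenvalue statistics
print(jres.cvm)   # their critical values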
Assuming that the money demand and risk premium relationships are the most likely candidates, we impose two restrictions: mt and cprt have coefficients of (-1, 0) and (0, -1), respectively, in the two cointegrating vectors. Economically we expect that inflt does not enter in any of the cointegrating vectors. With these two restrictions, the cointegrating vectors are estimated by maximum likelihood, jointly with the coefficients in the vector error-correction model. The results for the cointegrating vectors are presented in the table below.

ML estimates of cointegrating vectors (after normalization) based on a VAR with p=6 (standard errors in parentheses)

          Money Demand     Risk Premium
mt        -1               0
inflt     -0.023 (0.006)   0.041 (0.031)
cprt      0                -1
yt        0.425 (0.033)    -0.037 (0.173)
tbrt      -0.028 (0.005)   1.017 (0.026)

Likelihood value: 808.2770

It is possible to test our a priori cointegrating vectors by using likelihood ratio tests. These tests require that the model is re-estimated imposing some additional restrictions on the cointegrating vectors. This way we can test the following hypotheses:

H0a: β12 = 0, β14 = 1
H0b: β22 = β24 = 0, β25 = 1
H0c: β12 = β22 = β24 = 0, β14 = β25 = 1

The loglikelihood values for the complete model, estimated imposing H0a, H0b and H0c respectively, are given by 782.3459, 783.7761 and 782.3196. The likelihood ratio test statistics, defined as twice the difference in loglikelihood values, for the three null hypotheses are thus given by 51.86, 49.00 and 51.91. Compared with the chi-squared critical values with 3, 2 or 5 degrees of freedom, each of the hypotheses is clearly rejected.
As a last step we consider the VECM for this system. This corresponds to a VAR of order p−1 = 5 for the first-differenced series, with the inclusion of two error-correction terms in each equation, one for each cointegrating vector. The number of parameters estimated in this VECM is well above 100, so we shall concentrate on a limited part of the results only. The two error-correction terms are given by

ecm1,t = mt + 0.023 inflt − 0.425 yt + 0.028 tbrt − 3.362
ecm2,t = cprt − 0.041 inflt + 0.037 yt − 1.017 tbrt − 0.687

The adjustment coefficients in the 5×2 matrix, with their associated standard errors, are reported in the table below. The long-run money demand equation contributes significantly to the short-run movements of both money demand and income. The short-run behaviour of money demand, inflation and the commercial paper rate appears to be significantly affected by the long-run risk premium relationship.

Estimated matrix of adjustment coefficients (standard errors in parentheses; * indicates significance at the 5% level)

Equation   ecm1,t-1            ecm2,t-1
Δmt        0.0276* (0.0104)    0.0090* (0.0024)
Δinflt     1.4629 (2.3210)     -1.1618* (0.5287)
Δcprt      -2.1364 (1.1494)    0.6626* (0.2618)
Δyt        0.0687* (0.0121)    0.0013 (0.0028)
Δtbrt      -1.2876 (1.0380)    0.3195 (0.2365)

There is no statistical evidence that the treasury bill rate adjusts to any deviation from the long-run equilibria, so it could be treated as weakly exogenous.
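The corresponding VECM can be estimated as sketched below, again with statsmodels (an assumed tool) and a simulated stand-in for the data; res.alpha plays the role of the adjustment coefficient matrix in the table above, although the normalization of the unrestricted vectors will generally differ from the restricted ML results reported there.

import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM

# Stand-in data so the sketch runs; in the application df would hold the
# five observed series m, infl, cpr, y, tbr
rng = np.random.default_rng(5)
trend = np.cumsum(rng.standard_normal(200))
df = pd.DataFrame({k: trend + rng.standard_normal(200)
                   for k in ['m', 'infl', 'cpr', 'y', 'tbr']})

# A VAR of order p=6 in levels corresponds to k_ar_diff=5 lagged differences;
# coint_rank=2 imposes two cointegrating vectors; deterministic='co' places a
# constant outside the cointegration relations
res = VECM(df, k_ar_diff=5, coint_rank=2, deterministic='co').fit()
print(res.beta)    # estimated cointegrating vectors
print(res.alpha)   # 5x2 matrix of adjustment coefficients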
Models based on panel data
An important advantage of panel data compared to time series or cross-sectional data sets is that they allow the identification of certain parameters or questions without the need to make restrictive assumptions. For example, panel data make it possible to analyse changes on an individual level. Consider a situation in which the average consumption level rises by 2% from one year to another. Panel data can identify whether this rise is the result of an increase of 2% for all individuals or an increase of 4% for approximately one half of the individuals and no change for the other half (or any other combination). Panel data are not only suitable to model or explain why individual units behave differently, but also to model why a given unit behaves differently at different time periods.
We shall in the sequel index all variables by an i for the individual (i = 1, 2, ..., N) and a t for the time period (t = 1, 2, ..., T). We can specify a linear model as

yit = αit + x'it βit + εit

where βit measures the partial effects of xit in period t for unit i, and xit is a K-dimensional vector of explanatory variables, not including a constant. In the standard model the coefficients are assumed constant over individuals and time except for the intercept, i.e. βit = β and αit = αi, so that

yit = αi + x'it β + εit

This means that the effects of a change in x are the same for all units and all periods, but that the average level for unit i may be different from that for unit j. The αi thus capture the effects of those variables that are peculiar to the i-th individual and that are constant over time. In the standard case, εit is assumed to be independent and identically distributed over individuals and time, with mean 0 and variance σε². If we treat the αi as N fixed unknown parameters, the model is referred to as the standard fixed effects model.
An alternative approach assumes that the intercepts of the individuals are different but that they can be treated as drawings from a distribution with mean μ and variance σα². The essential assumption here is that these drawings are independent of the explanatory variables in xit. This leads to the random effects model, where the individual effects αi are treated as random.
Most panel data models are estimated under either the fixed effects or the random effects assumption.

Efficiency of parameter estimation
Because panel data sets are typically larger than cross-sectional or time series data sets, and explanatory variables vary over two dimensions (individuals and time) rather than one, estimators based on panel data are quite often more accurate than those from other sources. Even with identical sample sizes, the use of a panel data set will often yield more efficient estimators than a series of independent cross-sections (where different units are sampled in each period).

Identification of parameters
A second advantage of the availability of panel data is that they reduce identification problems. Estimating the model under the fixed effects assumption eliminates αi from the error term and, consequently, eliminates all endogeneity problems relating to it.

THE STATIC LINEAR MODEL
1. The fixed effects model
This is simply a linear regression model in which the intercept terms vary over the individual units i:

yit = αi + x'it β + εit

where it is usually assumed that all xit are independent of all εit. We can write this in the usual regression framework by including a dummy variable for each unit i in the model:

yit = Σ_{j=1}^{N} αj dij + x'it β + εit

where dij = 1 if i = j and 0 otherwise. We thus have a set of N dummy variables in the model. The parameters can be estimated by OLS; the implied estimator for β is referred to as the least squares dummy variable (LSDV) estimator.
Alternatively, we can eliminate the individual effects αi first by transforming the data:

yit − ȳi = (xit − x̄i)' β + (εit − ε̄i)

This is a regression model in deviations from individual means and does not include the individual effects αi. The transformation that produces observations in deviation from individual means is called the within transformation. The OLS estimator for β obtained from this transformed model is often called the within estimator or fixed effects estimator, and it is exactly identical to the LSDV estimator.
We call xit strictly exogenous: a strictly exogenous variable is not allowed to depend on current, future and past values of the error term.
The fixed effects model concentrates on differences within individuals: it explains to what extent yit differs from ȳi, and does not explain why ȳi is different from ȳj.
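Because the within transformation eliminates αi, the fixed effects estimator can be computed directly; a minimal sketch with numpy/pandas on a simulated panel (all names and parameter values are hypothetical), showing that pooled OLS is biased when xit is correlated with αi while the within estimator is not.

import numpy as np
import pandas as pd

# Simulated balanced panel; x is correlated with the individual effect
rng = np.random.default_rng(3)
N, T = 100, 5
alpha = np.repeat(rng.standard_normal(N), T)       # individual effects
x = alpha + rng.standard_normal(N * T)
y = alpha + 2.0 * x + rng.standard_normal(N * T)   # true beta = 2
df = pd.DataFrame({'id': np.repeat(np.arange(N), T), 'y': y, 'x': x})

# Within transformation: deviations from individual means eliminate alpha_i
dm = df.groupby('id').transform(lambda s: s - s.mean())
beta_fe = (dm['x'] @ dm['y']) / (dm['x'] @ dm['x'])

# Pooled OLS ignores alpha_i and is biased here, since x and alpha correlate
beta_ols = (df['x'] @ df['y']) / (df['x'] @ df['x'])
print(beta_fe, beta_ols)   # beta_fe close to 2, beta_ols biased upwards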
2. The random effects model
It is commonly assumed in regression analysis that all factors that affect the dependent variable, but that have not been included as regressors, can be appropriately summarized by a random error term. In our case this leads to the assumption that the αi are random factors, independently and identically distributed over individuals. Thus we write the random effects model as

yit = μ + x'it β + αi + εit

where uit = αi + εit is treated as an error term consisting of two components: an individual-specific component αi, which does not vary over time, and a remainder component εit, which is assumed to be uncorrelated over time. That is, all correlation of the error terms over time is attributed to the individual effects αi. It is assumed that αi and εit are independent of xjs (for all j and s). This implies that the OLS estimator is unbiased and consistent.
Estimators for the parameter vector β:
1. The between estimator, exploiting the between dimension of the data (differences between individuals), determined as the OLS estimator in a regression of individual averages of y on individual averages of x. It requires that the explanatory variables are strictly exogenous and uncorrelated with the individual-specific effect αi.
2. The fixed effects (within) estimator, exploiting the within dimension of the data (differences within individuals), determined as the OLS estimator in a regression in deviations from individual means.
3. The OLS estimator, exploiting both dimensions (within and between), but not efficiently. It requires the explanatory variables to be uncorrelated with αi, but does not impose that they are strictly exogenous.
4. The random effects (EGLS) estimator, combining the information from the between and within dimensions in an efficient way.

Fixed effects or random effects?
The fixed effects approach is conditional upon the values for αi. It is appropriate if the individuals in the sample are one of a kind and cannot be viewed as a random draw from some underlying population, for example when i denotes countries, large companies or industries, and the predictions we want to make are for a particular country, company or industry. In contrast, the random effects approach is not conditional upon the individual αi's, but integrates them out. In this case we are usually not interested in the particular value of some person's αi; we just focus on arbitrary individuals that have certain characteristics. The random effects approach allows one to make inferences with respect to the population characteristics.
Hausman test: tests whether the fixed effects and random effects estimators are significantly different (a sketch of this test follows at the end of this section).

Goodness of fit
The computation of goodness-of-fit measures in panel data applications is somewhat uncommon. One reason is the fact that one may attach different importance to explaining the within and between variation in the data. Another reason is that the usual R² or adjusted R² criteria are only appropriate if the model is estimated by OLS. The goodness-of-fit measures are not adequate for choosing between alternative estimators.
Robust inference: page 355
Testing for heteroscedasticity and autocorrelation: page 357
ILLUSTRATION: explaining individual wages, page 358

Dynamic linear models
An autoregressive panel data model
Dynamic models with exogenous variables
ILLUSTRATION: wage elasticities of labour demand
Non-stationarity, unit roots and cointegration
Panel data unit root tests
Panel data cointegration tests

Models with limited dependent variables
Binary choice models
The fixed effects logit model
The random effects probit model
Tobit models
Incomplete panels and selection bias
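A sketch of the Hausman test mentioned above, comparing the fixed effects and random effects estimates of a single slope coefficient; it assumes the third-party Python package linearmodels and a simulated panel in which the random effects assumption fails by construction (all names and values hypothetical).

import numpy as np
import pandas as pd
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

# Simulated panel with x correlated with the individual effect alpha_i,
# so the random effects assumption is violated by construction
rng = np.random.default_rng(6)
N, T = 200, 5
alpha = np.repeat(rng.standard_normal(N), T)
x = alpha + rng.standard_normal(N * T)
y = alpha + 1.0 * x + rng.standard_normal(N * T)
idx = pd.MultiIndex.from_product([range(N), range(T)], names=['id', 'time'])
df = pd.DataFrame({'y': y, 'x': x}, index=idx)

fe = PanelOLS(df['y'], df[['x']], entity_effects=True).fit()
re = RandomEffects(df['y'], df[['x']]).fit()

# Hausman statistic for the slope: under H0 (random effects assumption holds)
# both estimators are consistent and RE is efficient, so the difference is small
b_diff = fe.params['x'] - re.params['x']
v_diff = fe.cov.loc['x', 'x'] - re.cov.loc['x', 'x']
H = b_diff ** 2 / v_diff
print(H, 1 - stats.chi2.cdf(H, 1))   # small p-value: prefer fixed effects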