Matrix algebra: A matrix of dimension (a x b) has a rows and b columns. A vector of order n has n elements (n rows for a column vector, n columns for a row vector). The transpose of a product reverses the order, (AB)' = B'A', and transposing turns rows into columns. A matrix is symmetric if it equals its transpose. The trace is the sum of the diagonal elements. If AB = BA = I (the identity), then B is the inverse of A.

Data structures: Cross-sectional data are observations taken at the same point in time. Time series data are taken at different points in time. Pooled cross-sections are cross-sections taken at different points in time. Panel data follow the same cross-sectional units at different points in time (e.g. earnings of the same workers over time).

Probability: P(A|B) = P(A and B)/P(B) is the probability of A given B. A discrete random variable can take only a countable set of values; a continuous one can take infinitely many values within a range. The expected value is the sum of the outcomes multiplied by their probabilities, or the corresponding integral for a continuous variable. The variance σ² measures how tightly values cluster around the mean. The standard deviation is the square root of the variance and always has the same units as the variable. Covariance measures how strongly two variables move together; it equals 0 if the variables are independent. Correlation is covariance scaled by the standard deviations, Corr(x, y) = Cov(x, y)/(sd(x)·sd(y)), and can only lie between or equal to -1 and 1.

OLS estimator: The best coefficients are found by partially differentiating the sum of squared residuals and setting the derivatives to zero, giving the values that make the squared error lowest. In the simple regression, β̂1 = Cov(x, y)/Var(x). When using OLS, properties arise: 1. the sample OLS residuals sum to zero; 2. the residuals are orthogonal to x; 3. the sample mean of y and the mean of the fitted values are equal. SST = SSE + SSR, where SST is the total sum of squares, SSE is the amount explained by the regressors, and SSR is the residual (unexplained) sum of squares.

Basic t test: 1. State the null and alternative hypotheses. 2. State the significance level of the test. 3. State the t statistic and its null distribution, t(n - k - 1), where k = number of regressors. 4. Calculate the t statistic. 5. Find t crit from the tables. 6. State the decision rule and decision: "since X > x, we reject the null of (null) in favour of the alternative that (alternative)". For H1: β < 0 (one-sided), reject if t calc < -t crit; for H1: β ≠ 0 (two-sided), reject if |t calc| > t crit, with half the significance level in each tail. P values in EViews assume a two-tailed test, so if a one-sided test is used the p value must be halved to account for this. A hypothesis involving two coefficients (e.g. β1 = β2) can be tested by defining θ = β1 - β2, substituting β1 = θ + β2 into the equation, and testing the significance of θ. (A numpy sketch of these mechanics follows the dummy-variable notes below.)

Confidence interval: Using β̂j ± c·se(β̂j), where c is the critical value from the textbook tables and α is the significance level, a confidence interval can be found from the estimated equation. This can also be used to test whether a coefficient could be a certain value: reject values that fall outside the interval.

F test: 1. Find the restrictions and impose them, making an unrestricted model and a restricted model (e.g. β1 = β2, substitute into the equation). 1a. State the significance level. 2. Estimate both equations and get the SSR of both models. 3. Form the test statistic F = [(SSRr - SSRur)/q] / [SSRur/(n - k - 1)], where q is the number of restrictions (= signs) and k is the number of regressors in the unrestricted model. 4. Find F crit from the tables and calculate F calc. 5. Reject if F calc > F crit. Used to find whether all restrictions hold jointly or whether one or more of them doesn't: e.g. β1 = β2 = β3 = 0 would fail the F test if β1 ≠ 0. (A sketch also follows below.)

Dummy variables: Intercept dummy variables stand on their own, whereas slope dummy variables are attached to another variable. E.g. an intercept dummy would be a female dummy variable with a coefficient attached, whereas a slope dummy would be a female dummy multiplied by an education variable, with a coefficient. With a slope dummy, the dependent variable does not change just based on male or female; the effect of gendered education on the dependent variable changes too.
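A minimal numpy sketch of the OLS and t-test mechanics above: β̂ = (X'X)⁻¹X'y, σ̂² = SSR/(n - k - 1), and standard errors from the diagonal of σ̂²(X'X)⁻¹. The data-generating process, coefficient values, and seed are illustrative assumptions, not from the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)  # assumed true model

X = np.column_stack([np.ones(n), x1, x2])   # intercept plus k = 2 regressors
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                # OLS estimator
resid = y - X @ beta_hat                    # sum to ~0 and are orthogonal to X
k = X.shape[1] - 1
sigma2_hat = resid @ resid / (n - k - 1)    # estimator of Var(u)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv)) # se(beta_hat_j)
t_stats = beta_hat / se                     # for H0: beta_j = 0
print(beta_hat, se, t_stats)                # compare |t| with t_crit(n - k - 1)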
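A companion sketch of the F-test recipe, testing the joint restriction β1 = β2 = 0 (q = 2) by comparing restricted and unrestricted SSRs; again the data are simulated purely for illustration.

import numpy as np

def ssr(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit
    e = y - X @ b
    return e @ e                              # sum of squared residuals

rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X_ur = np.column_stack([np.ones(n), x1, x2])  # unrestricted: k = 2 regressors
X_r = np.ones((n, 1))                         # restricted: intercept only
q, k = 2, 2                                   # restrictions, regressors
F = ((ssr(X_r, y) - ssr(X_ur, y)) / q) / (ssr(X_ur, y) / (n - k - 1))
print(F)                                      # reject H0 if F > F_crit(q, n - k - 1)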
Perfect multicollinearity: Arises when there is an exact linear relationship between some or all of the regressors. Typically occurs due to mistakes in including dummy variables. OLS cannot be used if it is present. This can potentially be solved by omitting certain dummy variables to form a base category, or by omitting the intercept.

Near multicollinearity: Occurs when there is not an exact linear relationship, but some regressors are still highly correlated. Results in the standard errors for t tests being too large, making it more likely that the null is not rejected. However, F tests can still be used.

Regression with logs: Log-level coefficients (x100) measure the % change in y given a one-unit change in an x variable. A log-log (also called log-linear or double-log) model is when both an x and the y are logged; it measures the percentage change in y from a one percent change in x, i.e. an elasticity. A level-log model measures the change in the level of y when a one percent change in the level of an x variable is made. When using logs, the variable must have a strictly positive range, and consider whether it is more helpful to have variables in % form or level form: variables measured in % are not logged.

Quadratic regression: The marginal effect of an x variable that enters with a power is the derivative with respect to that variable. E.g. for y = β0 + β1x + β2x², the marginal effect is β1 + 2β2x, so the marginal effect varies with x. Setting the derivative to 0 and solving gives the turning point, x* = -β1/(2β2). E.g. an estimated sleep equation can show that minutes of sleep per week decrease up until age 42.5, where they reach a minimum, then start to increase. (A worked example follows at the end of this section.)

R²: R² will always increase when more explanatory variables are added, so adjusted R² is used to compare models with different numbers of regressors. Adjusted R² cannot be used to compare models with different dependent variables, e.g. y versus log(y).

Information criteria: Using the general formula IC = ln(SSR/n) + C·P(k), where C and P(k) are a constant and a penalty function defined by the type of information criterion: AIC (Akaike information criterion), HQ (Hannan-Quinn), SIC/BIC (Schwarz/Bayes). The preferred model is the one with the lowest value. Penalties when n > 16 rank SIC > HQ > AIC. (A sketch follows at the end of this section.)

Statistical properties: An estimator of a coefficient is unbiased if the mean (expected value) of the estimator equals the true value. For this to apply to an estimated model, the following assumptions are required. 1. Linear in parameters: y = Xβ + u. 2. Columns of X are linearly independent: no column can be written as a linear function of another. 3. The zero conditional mean assumption holds: the expected value of u given X is 0, i.e. all correlation between the errors and the x variables is 0.

Variance-covariance matrix: For the standard linear regression, Var(β̂|X) = σ²(X'X)⁻¹. These variances are used to find the standard errors and confidence intervals of the coefficients: Var(β̂j) = σ̂²·a(j+1, j+1), where a(j+1, j+1) is the element in row j+1 and column j+1 of the matrix (X'X)⁻¹, and σ̂² = SSR/(n - k - 1) is the estimator of the variance of the error term u. Taking the square root of this variance gives the SE of β̂j.

Gauss-Markov theorem: Two estimators that are unbiased can be compared by their variance; the lower variance makes the better estimator. The theorem states that when the three assumptions above hold for unbiased estimators, plus the additional assumption Var(u|X) = σ²In = Var(y|X), i.e. that errors are homoskedastic and serially uncorrelated, the OLS estimator is BLUE (best linear unbiased estimator).
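A worked turning-point example for the quadratic section above. The coefficient values are hypothetical, chosen only so the minimum lands at age 42.5 as in the notes.

b1, b2 = -85.0, 1.0              # hypothetical estimates for sleep = b0 + b1*age + b2*age^2
turning_point = -b1 / (2 * b2)   # = 42.5; a minimum because b2 > 0
me_at_30 = b1 + 2 * b2 * 30      # marginal effect at age 30: negative, sleep still falling
me_at_50 = b1 + 2 * b2 * 50      # positive after the turning point
print(turning_point, me_at_30, me_at_50)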
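A sketch of the information-criteria comparison, assuming the common per-observation forms AIC = ln(SSR/n) + 2k/n, HQ = ln(SSR/n) + 2k·ln(ln n)/n, and SIC = ln(SSR/n) + k·ln(n)/n; the exact C and P(k) should be checked against the course definitions, and the SSR values below are hypothetical.

import numpy as np

def criteria(ssr, n, k):
    base = np.log(ssr / n)
    return {"AIC": base + 2 * k / n,           # assumed AIC penalty
            "HQ":  base + 2 * k * np.log(np.log(n)) / n,
            "SIC": base + k * np.log(n) / n}

print(criteria(ssr=250.0, n=100, k=3))  # smaller model
print(criteria(ssr=246.0, n=100, k=5))  # 2 extra regressors barely lower the SSR
# prefer the model with the lowest value; all three criteria pick the smaller model here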
Heteroskedasticity: When the variance differs between error terms. Can be caused by different kinds of things, e.g. greater variance in the amount of food consumed depending on income level. This results in the OLS estimator no longer being BLUE, which means the default standard errors are incorrect and t and F tests will be incorrect also. To detect it, view a scatterplot of the residuals for changing spread, or run a formal test.

Breusch-Pagan test: 1. The null is that all coefficients in the auxiliary regression = 0; the alternative is that at least one does not. 2. Estimate the model to obtain the residuals. 3. Create an auxiliary regression using the same variables but with û² as the dependent variable, and obtain the R² from that regression. 4. Create the test statistic BP = n·R², compared with χ²(q), where n is the sample size and q is the number of regressors (χ is chi in the tables). 5. Reject if calc is higher than crit from the tables. Otherwise, estimating the auxiliary regression and doing an F test also works. (A sketch follows at the end of this section.)

White test: The same process, but the alternative hypothesis is different: H1 is that the variance is a smooth function of xi1…xik. In the auxiliary regression include the values, the squared values, and the cross products; q is the number of regressors in the auxiliary regression. Use White standard errors and the Wald statistic in place of the default standard errors and F stat for tests; however, in small samples these can be misleading, as they are asymptotic.

Serial correlation: When there is correlation between error terms, common in time series data. Same consequences as heteroskedasticity. Can be observed from a line graph of the errors by looking for a pattern. A correlogram shows Corr(ut, ut-j): if column 1 has values outside the bands, reject the null that Corr(ut, ut-j) = 0, and use column 3 to denote the order of the process (ut-1, ut-2, etc.) from whatever lies outside the bands.

Breusch-Godfrey test: The null is that the coefficients on the lags of the error term all equal 0, with the alternative stating that at least one does not. The number of lags used depends on the data, e.g. use 4 with quarterly data. Estimate the model equation to obtain the OLS residuals, then obtain the R² from the auxiliary regression, which estimates ût with the original regressors and the designated lags of ût. Test statistic: BG = n·R², compared with χ²(q), where q is the number of lags; if BG is bigger than the chi value from the table, reject the null. HAC standard errors are used over heteroskedasticity-only errors here, as they use a different formula for the variance matrix.

Modelling dynamics: Time series typically grow over time and the trend can be exponential, which can be handled by logging the dependent variable. Seasonality is where data shows patterns in certain time frames; it is often removed from the data, which can be done using dummy variables and testing them for significance. Structural change occurs when an event causes the trend to switch, e.g. 2008. Static models do not include lags of the regressors, whereas dynamic models do.

Autoregressive models: Simply an intercept, lags of the dependent variable with coefficients, and an error term that is white noise (mean = 0, Var = σ², no correlation with its lags) or iid (independent errors). AR(p) has p lags of itself. AR(1) is the simplest: yt = φ0 + φ1·yt-1 + ut. A stationary series has a constant mean, constant variance, and constant covariance between points a set distance apart. For a stationary AR(1) (|φ1| < 1): E(y) = φ0/(1 - φ1); Var(y) = σ²/(1 - φ1²); Cov(yt, yt-j) = φ1^j·Var(yt); Corr(yt, yt-j) = φ1^j. (A simulation check follows below.) An ARDL(p, q) model is one with p lags of the dependent and q lags of the x. The long-run effect of a one-unit change in x is the sum of the coefficients on x and its lags divided by one minus the sum of the coefficients on the lags of y: (β0 + β1 + … + βq)/(1 - φ1 - … - φp).
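A manual numpy sketch of the Breusch-Pagan steps above: regress the squared OLS residuals on the original regressors, form BP = n·R² from the auxiliary regression, and compare with χ²(q). The data are simulated with an error standard deviation that grows with x, an assumption made so that BP comes out large.

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 5, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=x)    # heteroskedastic: error sd grows with x

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u2 = (y - X @ b) ** 2                      # squared OLS residuals

g = np.linalg.lstsq(X, u2, rcond=None)[0]  # auxiliary regression of u^2 on X
r2_aux = 1 - np.sum((u2 - X @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
BP = n * r2_aux                            # compare with chi^2 crit; q = 1 regressor here
print(BP)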
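A simulation check of the stationary AR(1) moments above, for yt = φ0 + φ1·yt-1 + ut with φ0 = 1, φ1 = 0.6, σ = 1 (parameter values chosen arbitrarily for the demo).

import numpy as np

rng = np.random.default_rng(2)
phi0, phi1, sigma = 1.0, 0.6, 1.0
T = 200_000
y = np.empty(T)
y[0] = phi0 / (1 - phi1)                    # start at the stationary mean
for t in range(1, T):
    y[t] = phi0 + phi1 * y[t - 1] + rng.normal(scale=sigma)

print(y.mean(), phi0 / (1 - phi1))          # both ~= 2.5
print(y.var(), sigma**2 / (1 - phi1**2))    # both ~= 1.5625
print(np.corrcoef(y[1:], y[:-1])[0, 1])     # ~= phi1, i.e. Corr(y_t, y_t-1)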
Large sample properties of OLS: Time series data violate E(u|X) = 0, so OLS is not unbiased, but it is consistent: as the sample size grows, the distribution of the estimator collapses on the true parameter value. This creates a new assumption 3 (E(ut|xt) = 0), which does not always hold when there is correlation between lags of the dependent variable and lags of the error term. Additionally, a new assumption 4 (Var(ut|xt) = σ² and E(ut·us|xt, xs) = 0 for all t ≠ s) allows the OLS estimator to be asymptotically normal. This allows the usage of OLS inference if n is large; otherwise use HAC standard errors.

Nonstationary time series: Any time series with a trend is non-stationary. A deterministic trend depends on time, and the mean is time dependent. The detrended series is the residuals from a model with a time trend; if the detrended series is stationary, the series does not include a unit root. A unit root/stochastic trend follows an AR(1) process with a lag coefficient of 1, where the mean = the initial y value and Var(yt) = σ²·t. A random walk model is an AR(1) process with no intercept and a lag coefficient equal to one, so the trend is purely stochastic; a random walk with drift is the same but with an intercept constant, adding a deterministic trend as well. (A simulation sketch follows the MLR list below.)

MLR1-5: 1. Linear in parameters. 2. Random sampling. 3. No perfect collinearity: implies the columns of X are linearly independent and X'X is invertible. 4. Zero conditional mean: E(u|X) = 0 implies Cov(x, u) = 0, u and x are uncorrelated and orthogonal to one another, and also results in E(X'u) = 0. 5. Homoskedasticity.
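A simulation sketch of the random walk properties above: across many simulated paths starting at y0 = 0, the mean of yt stays at the initial value while Var(yt) grows like σ²·t. The path count and horizon are arbitrary choices for the demo.

import numpy as np

rng = np.random.default_rng(3)
paths, T, sigma = 5000, 100, 1.0
u = rng.normal(scale=sigma, size=(paths, T))
y = np.cumsum(u, axis=1)                    # random walk: y_t = y_t-1 + u_t

for t in (10, 50, 100):
    print(t, y[:, t - 1].mean(), y[:, t - 1].var())  # mean ~0 = y0, variance ~ sigma^2 * t
# adding a constant each period (drift) would add a deterministic trend to the mean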