Cross-Sectional, Time Series, and Panel Data
Cross-sectional data consists of multiple entities observed at a single time period. Time series data consists of a single entity observed at multiple time periods. Panel data (also known as longitudinal data) consists of multiple entities, where each entity is observed at two or more periods.

Expected Value and the Mean
Suppose the random variable Y takes on k possible values, $y_1, \ldots, y_k$, where $y_1$ denotes the first value, $y_2$ denotes the second value, and so on, and the probability that Y takes on $y_1$ is $p_1$, the probability that Y takes on $y_2$ is $p_2$, and so forth. The expected value of Y, denoted $E(Y)$, is
$$E(Y) = y_1 p_1 + y_2 p_2 + \cdots + y_k p_k = \sum_{i=1}^{k} y_i p_i.$$
The expected value of Y is also called the mean of Y or the expectation of Y and is denoted $\mu_Y$.

Variance and Standard Deviation
The variance of the discrete random variable Y, denoted $\sigma_Y^2$, is
$$\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i.$$
The standard deviation of Y is $\sigma_Y$, the square root of the variance. The units of the standard deviation are the same as the units of Y.

Means, Variances, and Covariances of Sums of Random Variables
Let X, Y, and V be random variables, let $\mu_X$ and $\sigma_X^2$ be the mean and variance of X, let $\sigma_{XY}$ be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. The following facts follow from the definitions of the mean, variance, and covariance:
$E(a + bX + cY) = a + b\mu_X + c\mu_Y$,
$\mathrm{var}(a + bY) = b^2 \sigma_Y^2$,
$\mathrm{var}(aX + bY) = a^2 \sigma_X^2 + 2ab\,\sigma_{XY} + b^2 \sigma_Y^2$,
$E(Y^2) = \sigma_Y^2 + \mu_Y^2$,
$\mathrm{cov}(a + bX + cV,\ Y) = b\,\sigma_{XY} + c\,\sigma_{VY}$,
$E(XY) = \sigma_{XY} + \mu_X \mu_Y$, and
$|\mathrm{corr}(X, Y)| \le 1$ and $|\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2}$ (the correlation inequality).

Estimators and Estimates
An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.

Bias, Consistency, and Efficiency
Let $\hat{\mu}_Y$ be an estimator of $\mu_Y$. Then:
The bias of $\hat{\mu}_Y$ is $E(\hat{\mu}_Y) - \mu_Y$.
$\hat{\mu}_Y$ is an unbiased estimator of $\mu_Y$ if $E(\hat{\mu}_Y) = \mu_Y$.
$\hat{\mu}_Y$ is a consistent estimator of $\mu_Y$ if $\hat{\mu}_Y \xrightarrow{\,p\,} \mu_Y$.
Let $\tilde{\mu}_Y$ be another estimator of $\mu_Y$, and suppose that both $\hat{\mu}_Y$ and $\tilde{\mu}_Y$ are unbiased. Then $\hat{\mu}_Y$ is said to be more efficient than $\tilde{\mu}_Y$ if $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.

The Terminology of Hypothesis Testing
The prespecified rejection probability of a statistical hypothesis test under the null hypothesis is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null is the rejection region, and the set of values of the test statistic for which it does not reject the null hypothesis is the acceptance region. The probability that the test incorrectly rejects the null hypothesis when the null is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test. The p-value is the probability of obtaining a test statistic, by random sampling variation, at least as adverse to the null hypothesis value as the statistic actually observed, assuming the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at which you can reject the null hypothesis.
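The sketch below is a minimal Python illustration of how a p-value is computed for a two-sided test of the population mean using the large-sample normal approximation. The sample y, its distribution, and the null value mu_0 are made-up assumptions for illustration only; the formal testing procedure is stated in the next section.

```python
# Hedged sketch: large-sample p-value for a two-sided test of E(Y) = mu_0.
# The sample y and the null value mu_0 are hypothetical.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(loc=0.4, scale=2.0, size=150)   # hypothetical sample of Y
mu_0 = 0.0                                     # null hypothesis: E(Y) = 0

se_ybar = y.std(ddof=1) / np.sqrt(len(y))      # standard error of the sample mean
t_stat = (y.mean() - mu_0) / se_ybar           # t-statistic
p_value = 2 * norm.sf(abs(t_stat))             # prob. of a statistic at least as adverse to the null

# The p-value is the smallest significance level at which the null can be rejected;
# at the 5% level, reject when p_value < 0.05 (equivalently, when |t| > 1.96).
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```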
Example: Testing the Hypothesis $E(Y) = \mu_{Y,0}$ Against the Alternative $E(Y) \neq \mu_{Y,0}$
1. Compute the standard error of $\bar{Y}$, $SE(\bar{Y})$.
2. Compute the t-statistic, $t = \dfrac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$.
3. Reject the null hypothesis if $|t| > 1.96$.

Confidence Interval for the Population Mean
A 95% two-sided confidence interval for $\mu_Y$ is an interval constructed so that it contains the true value of $\mu_Y$ in 95% of its applications. When the sample size n is large, the 90%, 95%, and 99% confidence intervals for $\mu_Y$ are:
90% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.64\,SE(\bar{Y})\}$,
95% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.96\,SE(\bar{Y})\}$,
99% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 2.58\,SE(\bar{Y})\}$.

Terminology for the Linear Regression Model with a Single Regressor
The linear regression model is
$$Y_i = \beta_0 + \beta_1 X_i + u_i,$$
where:
the subscript i runs over observations, i = 1, ..., n;
$Y_i$ is the dependent variable, the regressand, or simply the left-hand variable;
$X_i$ is the independent variable, the regressor, or simply the right-hand variable;
$\beta_0 + \beta_1 X_i$ is the population regression line or population regression function;
$\beta_0$ is the intercept of the population regression line;
$\beta_1$ is the slope of the population regression line; and
$u_i$ is the error term.

The OLS Estimator, Predicted Values, and Residuals
The OLS estimators of the slope $\beta_1$ and the intercept $\beta_0$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{s_{XY}}{s_X^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}.$$
The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, \ldots, n, \qquad \hat{u}_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$
The estimated intercept ($\hat{\beta}_0$), slope ($\hat{\beta}_1$), and residuals ($\hat{u}_i$) are computed from a sample of n observations of $X_i$ and $Y_i$, i = 1, ..., n. These are estimates of the unknown true population intercept ($\beta_0$), slope ($\beta_1$), and error terms ($u_i$).

The Least Squares Assumptions
$Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1, ..., n, where:
1. The error term $u_i$ has conditional mean zero given $X_i$; that is, $E(u_i \mid X_i) = 0$;
2. $(X_i, Y_i)$, i = 1, ..., n, are independent and identically distributed (i.i.d.) draws from their joint distribution; and
3. $(X_i, u_i)$ have nonzero finite fourth moments.

General Form of the t-Statistic
In general, the t-statistic has the form
t = (estimator − hypothesized value) / (standard error of the estimator).

Testing the Hypothesis $\beta_1 = \beta_{1,0}$ Against the Alternative $\beta_1 \neq \beta_{1,0}$
1. Compute the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}}$, where
$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}.$$
2. Compute the t-statistic $t = \dfrac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$.
3. Compute the p-value. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if the absolute value of the t-statistic is greater than 1.96.

Confidence Interval for $\beta_1$
A 95% two-sided confidence interval for $\beta_1$ is an interval that contains the true value of $\beta_1$ with 95% probability; that is, it contains the true value of $\beta_1$ in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of $\beta_1$ that cannot be rejected by a 5% two-sided hypothesis test. The 95% confidence interval for $\beta_1$ is
$$\left(\hat{\beta}_1 - 1.96\,SE(\hat{\beta}_1),\ \hat{\beta}_1 + 1.96\,SE(\hat{\beta}_1)\right).$$

The $R^2$
The regression $R^2$ is the fraction of the sample variance of $Y_i$ explained by $X_i$. The definitions of the predicted value and the residual (Key Concept 10) allow us to write the dependent variable $Y_i$ as the sum of the predicted value, $\hat{Y}_i$, plus the residual, $\hat{u}_i$:
$$Y_i = \hat{Y}_i + \hat{u}_i.$$
Define the explained sum of squares as
$$ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
and the total sum of squares as
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2,$$
so that the sum of squared residuals is
$$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = TSS - ESS.$$
The $R^2$ is the ratio of the explained sum of squares to the total sum of squares:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}.$$
Note that the $R^2$ ranges between 0 and 1.
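The short sketch below applies the single-regressor formulas above directly: the OLS slope and intercept, predicted values and residuals, the heteroskedasticity-robust standard error of $\hat{\beta}_1$, the t-statistic for $\beta_1 = 0$, the 95% confidence interval, and the $R^2$. It assumes only NumPy; the simulated data and variable names are illustrative, not part of the original text.

```python
# Hedged sketch: single-regressor OLS computed from the formulas above, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)       # hypothetical data with beta_0 = 2, beta_1 = 3

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = beta0_hat + beta1_hat * x            # predicted values
u_hat = y - y_hat                            # residuals

# Heteroskedasticity-robust variance of beta1_hat (the formula given above)
var_beta1 = (1 / n) * (
    (1 / (n - 2)) * np.sum((x - x_bar) ** 2 * u_hat ** 2)
) / (np.mean((x - x_bar) ** 2) ** 2)
se_beta1 = np.sqrt(var_beta1)

t_stat = (beta1_hat - 0.0) / se_beta1        # test of beta_1 = 0
ci_95 = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)

tss = np.sum((y - y_bar) ** 2)
ssr = np.sum(u_hat ** 2)
r2 = 1 - ssr / tss                           # R^2 = 1 - SSR/TSS

print(f"beta1_hat = {beta1_hat:.3f}, SE = {se_beta1:.3f}, t = {t_stat:.2f}")
print(f"95% CI for beta1: ({ci_95[0]:.3f}, {ci_95[1]:.3f}), R^2 = {r2:.3f}")
```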
The Standard Error of the Regression
The standard error of the regression (SER) is an estimator of the standard deviation of the regression error $u_i$. Because the regression errors $u_1, \ldots, u_n$ are not observed, the SER is computed using the OLS residuals $\hat{u}_1, \ldots, \hat{u}_n$. The formula for the SER is
$$SER = s_{\hat{u}} = \sqrt{s_{\hat{u}}^2}, \quad \text{where } s_{\hat{u}}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{SSR}{n-2}.$$

Omitted Variable Bias in Regression with a Single Regressor
Omitted variable bias is the bias in the OLS estimator that arises when the regressor, X, is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be true:
1. X is correlated with the omitted variable; and
2. the omitted variable is a determinant of the dependent variable Y.
(The Mozart Effect: Omitted Variable Bias?)

The Multiple Regression Model
The multiple regression model is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n,$$
where
a. $Y_i$ is the i-th observation on the dependent variable; $X_{1i}, X_{2i}, \ldots, X_{ki}$ are the i-th observations on each of the k regressors; and $u_i$ is the error term.
b. The population regression line is the relationship that holds between Y and the X's on average in the population:
$$E(Y \mid X_{1i} = x_1, X_{2i} = x_2, \ldots, X_{ki} = x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k.$$
c. $\beta_1$ is the slope coefficient on $X_1$, $\beta_2$ is the slope coefficient on $X_2$, and so on. The coefficient $\beta_1$ is the expected change in $Y_i$ resulting from a one-unit change in $X_{1i}$, holding constant $X_{2i}, \ldots, X_{ki}$. The coefficients on the other X's are interpreted similarly.
d. The intercept $\beta_0$ is the expected value of Y when all the X's equal zero. The intercept can be thought of as the coefficient on a regressor, $X_{0i}$, that equals one for all i.

The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model
The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are the values of $b_0, b_1, \ldots, b_k$ that minimize the sum of squared prediction mistakes
$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - \cdots - b_k X_{ki})^2.$$
The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}, \quad i = 1, \ldots, n,$$
and
$$\hat{u}_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$
The OLS estimators $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ and residuals $\hat{u}_i$ are computed from a sample of n observations of $(X_{1i}, \ldots, X_{ki}, Y_i)$, i = 1, ..., n. These are estimators of the unknown true coefficients $\beta_0, \beta_1, \ldots, \beta_k$ and error term, $u_i$.

The Least Squares Assumptions in the Multiple Regression Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$, i = 1, ..., n, where
a. $u_i$ has conditional mean zero given $X_{1i}, X_{2i}, \ldots, X_{ki}$; that is, $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$;
b. $(X_{1i}, X_{2i}, \ldots, X_{ki}, Y_i)$, i = 1, ..., n, are independently and identically distributed draws from their joint distribution;
c. $(X_{1i}, X_{2i}, \ldots, X_{ki}, u_i)$ have nonzero finite fourth moments; and
d. there is no perfect multicollinearity.

Testing the Hypothesis $\beta_j = \beta_{j,0}$ Against the Alternative $\beta_j \neq \beta_{j,0}$
1. Compute the standard error of $\hat{\beta}_j$, $SE(\hat{\beta}_j)$.
2. Compute the t-statistic $t = \dfrac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)}$.
3. Compute the p-value. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if the absolute value of the t-statistic is greater than 1.96.
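As a concrete illustration of this testing procedure, here is a minimal sketch assuming the statsmodels package and simulated data. The regressor names, the data-generating process, and the choice of HC1 heteroskedasticity-robust standard errors are illustrative assumptions, not part of the original text.

```python
# Hedged sketch: multiple regression with robust standard errors, t-statistics, and p-values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                    # correlated regressors, but no perfect multicollinearity
u = rng.normal(scale=1 + 0.5 * np.abs(x1), size=n)    # heteroskedastic error term
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u                     # "true" beta_0 = 1, beta_1 = 2, beta_2 = -1

X = sm.add_constant(np.column_stack([x1, x2]))        # adds the intercept regressor X_0 = 1
fit = sm.OLS(y, X).fit(cov_type="HC1")                # heteroskedasticity-robust standard errors

# Each t-statistic is (estimator - 0) / SE; reject beta_j = 0 at the 5% level if |t| > 1.96.
print("coefficients:", fit.params)
print("robust SEs:  ", fit.bse)
print("t-statistics:", fit.tvalues)
print("p-values:    ", fit.pvalues)
print("R^2:", fit.rsquared)
```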
The $R^2$ and the Adjusted $R^2$
In multiple regression, the $R^2$ increases whenever a regressor is added, unless the new regressor is perfectly multicollinear with the original regressors. To see this, think about starting with one regressor and then adding a second. When you use OLS to estimate the model with both regressors, OLS finds the values of the coefficients that minimize the sum of squared residuals. If OLS happens to choose the coefficient on the new regressor to be exactly zero, then the SSR will be the same whether or not the second variable is included in the regression. But if OLS chooses any value other than zero, then it must be that this value reduces the SSR relative to the regression that excludes this regressor. In practice, it is extremely unusual for an estimated coefficient to be exactly zero, so in general the SSR will decrease when a new regressor is added. But this means that the $R^2$ generally increases when a new regressor is added.
An increase in the $R^2$ does not mean that adding a variable actually improves the fit of the model. One way to correct for this is to deflate the $R^2$ by some factor, and this is what the adjusted $R^2$, or $\bar{R}^2$, does. The $\bar{R}^2$ is a modified version of the $R^2$ that does not necessarily increase when a new regressor is added. The $\bar{R}^2$ is
$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS} = 1 - \frac{s_{\hat{u}}^2}{s_Y^2}.$$
There are three useful things to know about $\bar{R}^2$.
First, $\frac{n-1}{n-k-1}$ is always greater than 1, so $\bar{R}^2$ is always less than $R^2$.
Second, adding a regressor has two opposite effects on the $\bar{R}^2$. On the one hand, the SSR falls, which increases $\bar{R}^2$. On the other hand, the factor $\frac{n-1}{n-k-1}$ increases. Whether the $\bar{R}^2$ increases or decreases depends on which of these two effects is stronger.
Third, the $\bar{R}^2$ can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor $\frac{n-1}{n-k-1}$.

$R^2$ and $\bar{R}^2$: What They Tell You – and What They Don't
$R^2$ and $\bar{R}^2$ tell you whether the regressors are good at predicting, or "explaining," the values of the dependent variable in the sample of data on hand. If the $R^2$ (or $\bar{R}^2$) is nearly 1, then the regressors produce good predictions of the dependent variable in that sample, in the sense that the variance of the OLS residuals is small compared to the variance of the dependent variable. If the $R^2$ (or $\bar{R}^2$) is nearly 0, the opposite is true.
The $R^2$ and $\bar{R}^2$ do NOT tell you whether:
1. an included variable is statistically significant;
2. the regressors are a true cause of the movements in the dependent variable;
3. there is omitted variable bias; or
4. you have chosen the most appropriate set of regressors.

Logarithms in Regression: Three Cases
Logarithms can be used to transform the dependent variable Y, an independent variable X, or both (but the transformed variables must be positive). The three cases and the interpretation of the regression coefficient $\beta_1$ are summarized below. In each case, $\beta_1$ can be estimated by applying OLS after taking the logarithm of the dependent and/or independent variable.
Case I: $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$. A 1% increase in X is associated with a change in Y of $0.01\beta_1$.
Case II: $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$. A change in X by one unit ($\Delta X = 1$) is associated with a $100\beta_1\%$ change in Y.
Case III: $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$. A 1% increase in X is associated with a $\beta_1\%$ change in Y, so $\beta_1$ is the elasticity of Y with respect to X.
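To make Case III concrete, here is a minimal sketch assuming statsmodels. The data are simulated with a known elasticity of 0.8 so that the estimated $\beta_1$ can be compared with it; all names and values are illustrative assumptions.

```python
# Hedged sketch: log-log (Case III) regression recovering an elasticity from simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = np.exp(rng.normal(size=n))                           # positive regressor, as logs require
y = 5.0 * x ** 0.8 * np.exp(0.1 * rng.normal(size=n))    # constant-elasticity relationship

X = sm.add_constant(np.log(x))
fit = sm.OLS(np.log(y), X).fit()                         # ln(Y_i) = beta_0 + beta_1 ln(X_i) + u_i

beta1_hat = fit.params[1]
# Interpretation: a 1% increase in X is associated with roughly a beta1_hat % change in Y.
print(f"estimated elasticity beta_1 = {beta1_hat:.3f}  (true value used above: 0.8)")
```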