Given name:____________________ Student #:______________________
Family name:___________________ Section #:______________________

BUEC 333 MIDTERM

Multiple Choice (2 points each)

1.) The Gauss-Markov Theorem says that when the 6 classical assumptions are satisfied:
a.) The least squares estimator is unbiased
b.) The least squares estimator has the smallest variance of all linear unbiased estimators
c.) The least squares estimator has an approximately normal sampling distribution
d.) The least squares estimator is consistent
e.) None of the above

2.) Which of the following is not a linear regression model:
a.) Yi X i X i2 i 2
b.) Yi cos( X i ) exp( X i ) i
c.) $\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
d.) $Y_i = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
e.) none of the above

3.) The distribution of X when Y is known is called the _____ distribution of X, and is written as _____. These blanks are best filled with the following:
a.) conditional, p(X)
b.) conditional, p(X|Y)
c.) marginal, p(X)
d.) marginal, p(X|Y)
e.) none of the above

4.) In the linear regression model, the degrees of freedom
a.) affects the precision of the coefficient estimates
b.) is equal to the number of observations (n) minus 1
c.) affects the value of the coefficient estimates
d.) all of the above
e.) none of the above

5.) The power of a test statistic should become larger as the
a.) the probability of a type II error becomes smaller
b.) null becomes closer to being true
c.) significance level becomes larger
d.) sample size becomes larger
e.) none of the above
(note: typo in exam; credit given for either answer)

6.) The central limit theorem tells us that the sampling distribution of the sample mean:
a.) is always normal
b.) is always normal in large samples
c.) approaches normality as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above

7.) Suppose [L(X), U(X)] is a 90% confidence interval for a population mean. Which of the following is/are true?
a.) Pr L X U X 0.90
b.) Pr L X U X 0.90
c.) 
Pr L X Pr U X 0.10
d.) a and c
e.) none of the above

8.) The sampling variance of the slope coefficient in the regression model with one independent variable:
a.) will be smaller when there is less variation in ε
b.) will be larger when there is less variation in ε
c.) will be smaller when there is less variation in X
d.) will be larger when there is less co-variation in ε and X
e.) none of the above

9.) The central limit theorem tells us that the sampling distribution of a least squares regression coefficient:
a.) is always normal
b.) is always normal in large samples
c.) approaches a uniform distribution as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above

10.) In order for our independent variables to be labelled “exogenous”, which of the following must be true:
a.) $E(\varepsilon_i) = 0$
b.) $\mathrm{Cov}(Y_i, \varepsilon_i) = 0$
c.) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$
d.) $\mathrm{Var}(\varepsilon_i) = \sigma^2$
e.) none of the above

11.) Which of the following statements is false regarding the Central Limit Theorem:
a.) when the sample size is large, the mean of X-bar is approximately equal to the mean of X.
b.) when the sample size is large, X-bar is approximately normally distributed.
c.) when the sample size is large, the standard deviation of X-bar is approximately the same as the standard deviation of X.
d.) all of the above
e.) none of the above

12.) If the covariance between two random variables X and Y is zero then
a.) X and Y are independent
b.) Knowing the value of X provides no information about the value of Y
c.) E(X) = E(Y) = 0
d.) a and b are true
e.) none of the above

13.) If two random variables X and Y are independent,
a.) their joint distribution equals the product of their conditional distributions
b.) the conditional distribution of X given Y equals the joint distribution of X
c.) their covariance is zero
d.) a and c
e.) a, b, and c

14.) If a random variable X has a normal distribution with mean $\mu$ and variance $\sigma^2$ then:
a.) X takes positive values only
b.) 
$(X - \mu)/\sigma^2$ has a standard normal distribution
c.) $(X - \mu)/(s/\sqrt{n})$ has a t distribution with n-1 degrees of freedom
d.) $(X - \mu)^2/\sigma^2$ has a chi-squared distribution with n degrees of freedom
e.) none of the above

15.) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following is/are true?
a.) the probability of a Type I error is 0.05
b.) the probability of a Type I error is 0.025
c.) the t statistic for this test has a t distribution with n degrees of freedom
d.) a and c
e.) b and c

Short Answer #1 (10 points; show your work!)

Consider the case of a uniformly distributed random variable where each outcome (1, 2, 3, 4) has an equal chance of occurring. It can be easily shown that the population mean and variance of this random variable are 2.50 and 1.25, respectively.

a.) Suppose that a random number generator provides the following sequence of numbers, 2-1-4-1. What are the mean and variance of this sample?
b.) What is the sampling distribution of the sample mean calculated above? Provide a verbal interpretation of the sampling distribution of the sample mean.
c.) Compute the value of the t-statistic for testing the null hypothesis that μ = 2.5. Hint: the square root of two is approximately equal to 1.40.
d.) The critical value for a t distribution with 3 degrees of freedom and a 0.20 level of significance in the presence of a two-sided alternative is equal to 1.638. Can you reject the null hypothesis that μ = 2.5 at the 20% level of significance? What about at the 10% level of significance?

a.)
$$\bar{X} = \frac{1}{n}\sum_i X_i = \frac{2 + 1 + 4 + 1}{4} = 2$$
$$s^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2 = \frac{(2-2)^2 + (1-2)^2 + (4-2)^2 + (1-2)^2}{3} = \frac{(0)^2 + (-1)^2 + (2)^2 + (-1)^2}{3} = \frac{0 + 1 + 4 + 1}{3} = \frac{6}{3} = 2$$

b.) The mean of the sampling distribution is equal to the population mean of 2.50. The sampling variance is equal to the population variance divided by the number of observations, or 1.25/4 = 0.3125.
The sampling distribution represents the set of possible values that the statistic might take, and the probabilities associated with each of them. It measures the uncertainty over the value that the statistic might take in repeated samples from the same population.

c.)
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{2.0 - 2.5}{\sqrt{2}/\sqrt{4}} = \frac{-0.5}{1.4/2} = \frac{-1.0}{1.4} \approx -0.7 \sim t_3$$

d.) In this case, we fail to reject the null that μ = 2.5 at the 20% level of significance, since |t| ≈ 0.7 is smaller than the critical value of 1.638. And because the critical value only increases as the significance level falls, we fail to reject at the 10% level as well.

Page intentionally left blank. Use this space for rough work or the continuation of an answer.

Short Answer #2 (20 points; show your work!)

A researcher is using data for a sample of 1,000 male wage-earners to investigate the relationship between hourly wage rates, $Y_i$ (measured in dollars per hour), and length of work experience with a particular firm, $X_i$ (measured in years). Analysis of the data in Excel produces the following sample information:

$$n = 1{,}000 \qquad \sum_i Y_i = 12{,}500 \qquad \sum_i X_i = 3{,}500 \qquad \sum_i e_i^2 = 15{,}000$$
$$\mathrm{Cov}(X_i, Y_i) = 36 \qquad \mathrm{Var}(X_i) = 60 \qquad \mathrm{Var}(Y_i) = 60$$

Use the information above to answer the following questions. Show all formulas and calculations, using the following approximation: n ≈ (n − 1) ≈ (n − 2).

a.) What are the OLS estimates of the constant term ($\beta_0$) and the slope coefficient ($\beta_1$)?
b.) Interpret the estimate of the slope coefficient you calculated in part a.).
c.) Calculate an estimate of the variance of the error term in the population regression model.
d.) Calculate an estimate of the variance of the estimated slope coefficient.
e.) Compute the value of $R^2$ and briefly explain what the calculated value of $R^2$ means.

a.) The easiest way forward is in remembering that
$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})/(n-1)}{\sum_i (X_i - \bar{X})^2/(n-1)} = \frac{\mathrm{Cov}(X_i, Y_i)}{\mathrm{Var}(X_i)} = \frac{36}{60} = 0.6$$
From the formula sheet, we also know that
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = \frac{\sum_i Y_i}{n} - 0.6 \cdot \frac{\sum_i X_i}{n} = \frac{12{,}500}{1{,}000} - 0.6 \cdot \frac{3{,}500}{1{,}000}$$
$$\hat{\beta}_0 = 12.5 - 0.6 \cdot 3.5 = 12.5 - 2.1 = 10.4$$

b.) 
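The estimate of 0.6 means that an increase in length of work experience of one year is associated, on average, with an increase in a male wage-earner's hourly wage rate of 0.60 dollars per hour, or 60 cents per hour. Critically, we are not holding constant the variation in other potential independent variables.

c.) Here, we make use of the equation for $s^2$ with $k = 1$:
$$s^2 = \frac{\sum_i e_i^2}{n - k - 1} = \frac{15{,}000}{1{,}000 - 2} \approx \frac{15{,}000}{1{,}000} = 15.0$$

d.) Here, we make use of the equation for
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_i e_i^2/(n - k - 1)}{\sum_i (X_i - \bar{X})^2} = \frac{15.0}{\sum_i (X_i - \bar{X})^2}$$
where
$$\sum_i (X_i - \bar{X})^2 = (n - 1) \cdot \mathrm{Var}(X_i) \approx 1{,}000 \cdot 60 = 60{,}000$$
so that
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{15}{60{,}000} = \frac{1}{4{,}000}$$

e.) Here, we make use of the equation for
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2}$$
where
$$\sum_i (Y_i - \bar{Y})^2 = (n - 1) \cdot \mathrm{Var}(Y_i) \approx 1{,}000 \cdot 60 = 60{,}000$$
so that
$$R^2 = 1 - \frac{15{,}000}{60{,}000} = 1 - 0.25 = 0.75$$
The value of 0.75 indicates that 75% of the observed variation in $Y_i$ is “explained” by variation in workers' experience with a particular firm.

Short Answer #3 (20 points; show your work!)

Consider the standard univariate population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
Assume that all of the classical assumptions are satisfied. Show that the OLS estimator $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$. Hint: you should make use of the fact that
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \;\Rightarrow\; \bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{\varepsilon}$$

From the formula sheet, we know that
$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$
Here, we can use the hint to substitute for the second factor in the numerator:
$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(\beta_0 + \beta_1 X_i + \varepsilon_i - \beta_0 - \beta_1 \bar{X} - \bar{\varepsilon})}{\sum_i (X_i - \bar{X})^2}$$
Collecting terms, and noting that $\sum_i (X_i - \bar{X})\bar{\varepsilon} = \bar{\varepsilon}\sum_i (X_i - \bar{X}) = 0$, we find that
$$\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})\left[\beta_1 (X_i - \bar{X}) + \varepsilon_i\right]}{\sum_i (X_i - \bar{X})^2}$$
$$\hat{\beta}_1 = \beta_1 \frac{\sum_i (X_i - \bar{X})^2}{\sum_i (X_i - \bar{X})^2} + \frac{\sum_i (X_i - \bar{X})\varepsilon_i}{\sum_i (X_i - \bar{X})^2}$$
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_i (X_i - \bar{X})\varepsilon_i}{\sum_i (X_i - \bar{X})^2}$$
Finally, we know that unbiasedness entails that the expected value of an estimator should equal the true parameter value we are interested in. In this case,
$$E(\hat{\beta}_1) = E(\beta_1) + E\left[\frac{\sum_i (X_i - \bar{X})\varepsilon_i}{\sum_i (X_i - \bar{X})^2}\right]$$
The first term is, of course, the expectation of a constant, so it simply returns $\beta_1$.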
The second term can be expressed as
$$E\left[\frac{\sum_i (X_i - \bar{X})\varepsilon_i}{\sum_i (X_i - \bar{X})^2}\right] = E\left[\frac{\sum_i (X_i - \bar{X})\varepsilon_i/(n-1)}{\sum_i (X_i - \bar{X})^2/(n-1)}\right] = E\left[\frac{\mathrm{Cov}(X_i, \varepsilon_i)}{\mathrm{Var}(X_i)}\right]$$
Since all of the classical assumptions, including #3 regarding exogeneity, are satisfied, we have
$$E(\hat{\beta}_1) = \beta_1 + E\left[\frac{\mathrm{Cov}(X_i, \varepsilon_i)}{\mathrm{Var}(X_i)}\right] = \beta_1 + E\left[\frac{0}{\mathrm{Var}(X_i)}\right] = \beta_1$$

Short Answer #4 (20 points; show your work!)

You wish to determine if the application of fertilizer and water affect plant growth. To that end, you run an experiment where you randomly apply different amounts of fertilizer and water to your hemp plants. You then use regression analysis to determine how they affect the yield of a plant. Yield is measured in grams. Fertilizer is measured in kilograms and ranges in value from 0.0 to 1.0, while water is measured in liters per week. The standard errors of the regression coefficients are reported in parentheses. You get the following results:

$$Yield_i = \underset{(0.4)}{12.1} + \underset{(0.5)}{5.5} \cdot Fertilizer_i + \underset{(2.7)}{12} \cdot Water_i$$
$$n = 140 \qquad RSS = 1234 \qquad R^2 = 0.76$$

a.) Do you think this type of analysis will give you an unbiased estimate of how much adding fertilizer increases your crop yield? Why or why not?
b.) How do you interpret the constant in this case? Explain why in most instances we ignore such results.
c.) How much does fertilizer increase plant growth?
d.) What is the regression’s predicted yield for a plant exposed to 50 kilograms of fertilizer? Do you think this prediction is reliable? Why or why not?

a.) The unbiasedness of our estimator will depend on the first three assumptions of the classical linear regression model being satisfied. That is, we need our model to be correctly specified, the error terms to have a zero mean, and our independent variables to be exogenous with respect to the error term. It seems to be the case that our estimates will be unbiased given the controlled setting of the experiment: because fertilizer and water are applied at random, any omitted variables are unlikely to be correlated with our independent variables.

b.) 
The constant suggests a predicted yield of 12.1 grams in the case where zero fertilizer is applied and a plant is completely deprived of water. Even our basic understanding of biology as economists would suggest that this type of interpretation should not be pushed too far: water, in particular, is necessary for plant growth, so it is unreasonable to expect any yield in its absence. This reflects the fact that the constant term absorbs the constant effect of any omitted variables.

c.) The results suggest that applying an entire kilogram of fertilizer in a week leads on average to an additional 5.5 grams of yield from hemp plants.

d.) The predicted value would be 12.1 plus 5.5 * 50, which equals 287.1 grams. But we probably should not take this at face value. First of all, this is for a level of fertilizer which is well outside of our sample of values for X, which is bounded by zero and one. Second of all, we would expect some sort of diminishing marginal returns to set in above some threshold level. It is unlikely, then, that this would be a very accurate prediction. Finally, this also implicitly assumes no application of water which, even not being botanists, we can assume is bad for plant yields.
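The prediction in part d.) can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name `predict_yield` and the in-sample flag are hypothetical, the water coefficient of 12 is assumed to enter with a positive sign, and water is set to zero to match the answer above.

```python
# Coefficients reported in the Short Answer #4 regression output.
INTERCEPT = 12.1        # grams
B_FERTILIZER = 5.5      # grams per kilogram of fertilizer
B_WATER = 12.0          # grams per liter of water per week (positive sign assumed)
FERTILIZER_RANGE = (0.0, 1.0)  # sampled fertilizer range stated in the question


def predict_yield(fertilizer_kg, water_l=0.0):
    """Return (predicted yield in grams, whether fertilizer is inside the sampled range)."""
    yhat = INTERCEPT + B_FERTILIZER * fertilizer_kg + B_WATER * water_l
    low, high = FERTILIZER_RANGE
    return yhat, low <= fertilizer_kg <= high


yhat, in_sample = predict_yield(50.0)
print(f"predicted yield: {yhat:.1f} grams, inside sampled range: {in_sample}")
```

The flag makes the first objection in the answer concrete: 50 kilograms lies far outside the 0.0 to 1.0 range on which the coefficients were estimated, so the fitted line is being extrapolated.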
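Useful Formulas:

$$E(X) = \sum_{i=1}^{k} p_i x_i \qquad \mathrm{Var}(X) = E\left[(X - \mu_X)^2\right] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$
$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \qquad \Pr(X = x) = \sum_{i=1}^{k} \Pr(X = x, Y = y_i)$$
$$E(Y) = \sum_{i=1}^{m} E(Y \mid X = x_i)\Pr(X = x_i) \qquad E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$$
$$\mathrm{Var}(Y \mid X = x) = \sum_{i=1}^{k} \left[y_i - E(Y \mid X = x)\right]^2 \Pr(Y = y_i \mid X = x)$$
$$E(a + bX + cY) = a + bE(X) + cE(Y)$$
$$\mathrm{Cov}(X, Y) = \sum_{i=1}^{k}\sum_{j=1}^{m} (x_j - \mu_X)(y_i - \mu_Y)\Pr(X = x_j, Y = y_i)$$
$$\mathrm{Corr}(X, Y) = \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}$$
$$\mathrm{Cov}(a + bX + cV, Y) = b\,\mathrm{Cov}(X, Y) + c\,\mathrm{Cov}(V, Y)$$
$$E(XY) = \mathrm{Cov}(X, Y) + E(X)E(Y)$$
$$\mathrm{Var}(a + bY) = b^2\mathrm{Var}(Y)$$
$$\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$
$$E(Y^2) = \mathrm{Var}(Y) + \left[E(Y)\right]^2$$
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad r_{XY} = \frac{s_{XY}}{s_X s_Y}$$

For the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \& \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}$$
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2}$$
$$\bar{R}^2 = 1 - \frac{\sum_i e_i^2/(n - k - 1)}{\sum_i (Y_i - \bar{Y})^2/(n - 1)}$$
$$s^2 = \frac{\sum_i e_i^2}{n - k - 1} \quad \text{where } E(s^2) = \sigma^2$$
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_i e_i^2/(n - k - 1)}{\sum_i (X_i - \bar{X})^2}$$
$$Z = \frac{\hat{\beta}_j - \beta_H}{\sqrt{\mathrm{Var}[\hat{\beta}_j]}} \sim N(0, 1) \qquad t = \frac{\hat{\beta}_1 - \beta_H}{s.e.(\hat{\beta}_1)} \sim t_{n-k-1}$$
$$\Pr\left[\hat{\beta}_j - t^*_{\alpha/2}\, s.e.(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t^*_{\alpha/2}\, s.e.(\hat{\beta}_j)\right] = 1 - \alpha$$
$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho)$$
$$F = \frac{ESS/k}{RSS/(n - k - 1)} = \frac{ESS\,(n - k - 1)}{RSS \cdot k}$$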
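As a numerical check, the formulas above can be applied to the data given in Short Answer #1 and Short Answer #2. The following Python sketch is an illustration only: the variable names are hypothetical, and the regression part uses the exam's stated approximation n ≈ (n − 1) ≈ (n − 2).

```python
import math
from statistics import mean, variance

# Short Answer #1: the sample 2-1-4-1 and the null hypothesis mu = 2.5.
sample = [2, 1, 4, 1]
n = len(sample)
xbar = mean(sample)          # sample mean
s2 = variance(sample)        # sample variance, with n - 1 in the denominator
t_stat = (xbar - 2.5) / math.sqrt(s2 / n)

# Short Answer #2: summary statistics given in the question.
N = 1000
sum_y, sum_x = 12_500, 3_500
cov_xy, var_x, var_y = 36, 60, 60
rss = 15_000                 # sum of squared residuals

b1 = cov_xy / var_x                  # slope: Cov(X, Y) / Var(X)
b0 = sum_y / N - b1 * sum_x / N      # intercept: Ybar - b1 * Xbar
s2_err = rss / N                     # error variance, using n - 2 ≈ n
var_b1 = s2_err / (N * var_x)        # sum of squared deviations of X ≈ n * Var(X)
r2 = 1 - rss / (N * var_y)           # TSS ≈ n * Var(Y)

print(xbar, s2, round(t_stat, 3))
print(b1, round(b0, 2), s2_err, var_b1, r2)
```

The t-statistic comes out slightly different from the hand calculation because the code uses the exact square root of two rather than the hint's 1.40; the conclusion about rejection is unchanged.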