SAMPLE MULTIPLE CHOICE QUESTIONS FOR MIDTERM

1.) Suppose the monthly demand for tomatoes (a perishable good) in a small town is random. With probability 1/2, demand is 50; with probability 1/2, demand is 100. You are the only producer of tomatoes in this town. Tomatoes sell for a fixed price of $1, cost $0.50 to produce, and can only be sold in the local market. If you produce 60 tomatoes, your expected profit is:
a) $15
b) $25
c) $45
d) $50
e) none of the above

If demand is 50, you sell only 50 tomatoes; if demand is 100, you sell all 60. Either way, production costs 60 × $0.50 = $30. So:
E(PROFIT) = E(REVENUE − COST) = ½*($50 − $30) + ½*($60 − $30) = ½*$20 + ½*$30 = $25
The correct answer is b).

2.) Suppose you have the following information about the cdf of a random variable X, which takes one of 4 possible values:

Value of X    cdf
1             0.25
2             0.40
3             0.80
4             1.00

Which of the following is/are true?
a) Pr(X = 2) = 0.4
b) E(X) = 2.5
c) Pr(X = 4) = 0.2
d) all of the above
e) none of the above

The cdf tells you the cumulative probability of observing a value of X less than or equal to Xi. Thus, there is an 80% chance of observing a value of X less than or equal to 3, and a 100% chance of observing a value less than or equal to 4. The 0.4 in the table is a cumulative probability, not Pr(X = 2), so a) is incorrect. By subtracting the cumulative probabilities from one another, you can construct a pdf: Pr(X = 1) = 0.25, Pr(X = 2) = 0.15, Pr(X = 3) = 0.40, and Pr(X = 4) = 0.20. With the pdf in hand, you can calculate E(X) = 0.25*1 + 0.15*2 + 0.40*3 + 0.20*4 = 2.55. So b) is incorrect and the only correct answer is c).

3.) If the covariance between two random variables X and Y is zero, then
a) X and Y are independent
b) knowing the value of X provides no information about the value of Y
c) E(X) = E(Y) = 0
d) a and b are true
e) none of the above

Remember that independence implies zero covariance but not the other way around, so it cannot be a). Likewise, b) is the very definition of independence, and c) is just nonsense. The correct answer is e).

4.)
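Both of the calculations in questions 1 and 2 can be double-checked with a short Python sketch; the prices and probabilities below are exactly the ones given in the questions:

```python
# Question 1: produce 60 tomatoes, sell min(demand, 60) at $1 each, cost $0.50/unit.
price, unit_cost, produced = 1.0, 0.50, 60
demand_dist = {50: 0.5, 100: 0.5}
expected_profit = sum(p * (price * min(d, produced) - unit_cost * produced)
                      for d, p in demand_dist.items())
print(expected_profit)  # 25.0

# Question 2: recover the pdf from the cdf by differencing, then compute E(X).
values = [1, 2, 3, 4]
cdf = [0.25, 0.40, 0.80, 1.00]
pdf = [round(cdf[i] - (cdf[i - 1] if i else 0.0), 2) for i in range(len(cdf))]
expected_x = sum(x * p for x, p in zip(values, pdf))
print(pdf)                   # [0.25, 0.15, 0.4, 0.2]
print(round(expected_x, 2))  # 2.55
```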
If two random variables X and Y are independent,
a) their joint distribution equals the product of their marginal distributions
b) the conditional distribution of X given Y equals the marginal distribution of X
c) their covariance is zero
d) a and c
e) a, b, and c

All three statements follow from the definition of independence given in Lecture 3a, so the correct answer is e).

5.) Suppose you have a random sample of 10 observations from a normal distribution with mean = 10 and variance = 2. The sample mean (x-bar) is 8 and the sample variance is 3. The sampling distribution of x-bar has
a) mean 8 and variance 3
b) mean 8 and variance 0.3
c) mean 10 and variance 0.2
d) mean 10 and variance 2
e) none of the above

The correct answer is c). The mean and variance of the sampling distribution of x-bar are given by the population quantities, not the sample characteristics: the mean is μ = 10 and the variance is σ²/n = 2/10 = 0.2.

6.) If q is an unbiased estimator of Q, then:
a) Q is the mean of the sampling distribution of q
b) q is the mean of the sampling distribution of Q
c) Var[q] = Var[Q] / n where n = the sample size
d) q = Q
e) a and c

We define an unbiased estimator as one for which E(q) − Q = 0, or E(q) = Q. So a) is correct whereas b) is incorrect. The third statement is incorrect for a number of reasons, one of which is that Q is a constant and has zero variance. Finally, d) is nonsense. The correct answer is a).

7.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the following is/are false?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a) 2 only
b) 3 only
c) 2 and 3
d) 2, 3, and 4
e) 1, 2, 3, and 4

Since Q simply exists and is a fixed number or constant, it has no variance. So [1] is a true statement, not a false one. Again, [2] is nonsense, whereas [3] confuses what comes from the sample with what comes from the population and is incorrect.
Finally, [4] misinterprets what a confidence interval captures. So d) is the correct answer.

8.) The law of large numbers says that:
a) the sample mean is a biased estimator of the population mean in small samples
b) the sampling distribution of the sample mean approaches a normal distribution as the sample size approaches infinity
c) the behaviour of large populations is well approximated by the average
d) the sample mean is an unbiased estimator of the population mean in large samples
e) none of the above

The law of large numbers states that as n approaches infinity, q approaches Q. It has some implications for the likely biasedness of an estimator, but it is really a statement about the consistency of an estimator (where consistency is defined by the condition that q approaches Q in the limit). None of the listed options states this, so the correct answer is e). (Note that b) describes the central limit theorem, not the law of large numbers.)

9.) Suppose you draw a random sample of n observations, X1, X2, …, Xn, from a population with unknown mean μ. Which of the following estimators of μ is/are biased?
a) the first observation you sample, X1
b) √(X̄²)
c) √(X̄²) − s²/n
d) b and c
e) a, b, and c

We have seen before that a) is actually unbiased. Option b) seems like it might be unbiased, since by taking the square root of a square you seem to arrive back at X-bar. A problem comes up, though, if X-bar is negative, say −10: squaring −10 and then taking the square root gives 10. This makes b) a biased estimator, because we cannot say that its expected value is indeed μ. Likewise for c), which compounds the problem of b) by also subtracting s²/n, a quantity whose expectation is nonzero, which imparts further bias. So the correct answer is d).

10.) The significance level of a test is the probability that you:
a) reject the null when it is true
b) fail to reject the null when it is false
c) reject the null when it is false
d) fail to reject the null when it is true
e) none of the above

This is simply the definition of the significance level of a test, so the correct answer is a).

11.)
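The bias in question 9 is easy to see by simulation. The sketch below uses made-up population values (μ = −2, σ = 1, n = 25, none of which come from the question itself): across many samples, the sample mean and the first observation average out near μ, while √(X̄²) = |X̄| averages out near +2 instead of −2.

```python
import random

random.seed(0)
mu, sigma, n, reps = -2.0, 1.0, 25, 2000  # made-up illustrative values

xbar_avg = abs_avg = first_avg = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    xbar_avg += xbar / reps                # sample mean: unbiased
    abs_avg += (xbar ** 2) ** 0.5 / reps   # sqrt of squared mean = |xbar|: biased
    first_avg += sample[0] / reps          # first observation: unbiased, noisy

print(round(xbar_avg, 1))   # close to -2.0
print(round(first_avg, 1))  # close to -2.0
print(round(abs_avg, 1))    # close to +2.0, i.e. badly biased
```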
Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following is/are true?
a) the probability of a Type I error is 0.05
b) the probability of a Type I error is 0.025
c) the t statistic for this test has a t distribution with n−1 degrees of freedom
d) a and c
e) b and c

Again, a) follows from the definition of the significance level of a test. The second option is eliminated as it contradicts a). The third statement draws from Lecture 4, Homework #1, and Tutorial #3; it is also true, so the correct answer is d).

12.) Suppose [L(X), U(X)] is a 95% confidence interval for a population mean. Which of the following is/are true?
a) Pr(L(X) ≤ X̄ ≤ U(X)) = 0.90
b) Pr(L(X) ≤ X̄ ≤ U(X)) = 0.95
c) Pr(X̄ < L(X)) + Pr(U(X) < X̄) = 0.05
d) a and c
e) none of the above

The key here is that X̄ is the sample mean, not the population mean, so our confidence interval has nothing to say about it. Therefore, e) is the correct answer.

13.) Which of the following is a linear regression model:
a) Yi = β1Xi + β2Xi² + εi
b) log(Yi) = β0 + β1 log(Xi) + εi
c) Yi = β0 + β1 e^(Xi) + εi
d) all of the above
e) none of the above

The general rule is that if you can substitute a generic expression Z for your independent variables (but not your parameters) so that the model looks like
Yi = β0 + β1 Zi + εi
then your regression model is linear. So in the first case, you can substitute Z for X squared; in the second case, you can substitute Z for the log of X; and in the third case, you can substitute Z for the exponential of X. Therefore, they are all acceptable linear regression models, and the correct answer is d).

14.) In the linear regression model, the stochastic error term:
a) measures the difference between the dependent variable and its predicted value
b) measures the difference between the independent variable and its predicted value
c) is unbiased
d) a and c
e) none of the above

The stochastic error term measures the difference between Y and the conditional expectation of Y.
So a) and b) are incorrect (a) describes the residual, not the error term), whereas c) is nonsense (the error term is not an estimator). The correct answer is e).

15.) In the linear regression model, the least squares estimator
a) minimizes the sum of squared residuals
b) is unbiased
c) is most efficient among the class of linear estimators
d) maximizes the value of R2
e) all of the above

We know by definition that the OLS estimator minimizes the sum of squared residuals (thus, it produces the "least squares"). We have also noted that it has the desirable properties of being both unbiased and most efficient. Finally, we have seen in Lecture 7b that since it minimizes the sum of squared residuals (in other words, RSS), it automatically maximizes the value of R2. So the answer must be e).

16.) Suppose that in the simple linear regression model Yi = β0 + β1Xi + εi on 100 observations, you calculate that R2 = 0.5, the sample covariance of X and Y is 10, and the sample variance of X is 15. Then the least squares estimator of β1 is:
a) not calculable using the information given
b) 1/3
c) 1 / 3
d) 2/3
e) none of the above

In Top Hat Monocle, we saw how the least squares estimator of beta-one in this case of a single independent variable can be expressed as the ratio of the covariance of X and Y to the variance of X. Thus, 10/15 = 2/3 and the correct answer is d).

17.) Suppose upon running a regression, EViews reports a value of the explained sum of squares as 1648 and an R2 of 0.80. What is the value of the residual sum of squares in this case?
a) 0
b) 412
c) 1318.4
d) unknown as it is incalculable
e) none of the above

Since R2 is defined as ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS, we can solve for RSS by substitution. That is, 0.80 = 1648/TSS, which implies that TSS = 2060, and 0.80 = 1 − RSS/2060, which implies that RSS = 412. So the correct answer is b).

18.)
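Questions 16 and 17 are simple arithmetic, so they can be verified directly; this sketch uses only the numbers given in the two questions:

```python
# Question 16: with a single regressor, the OLS slope equals cov(X, Y) / var(X).
cov_xy, var_x = 10.0, 15.0
beta1_hat = cov_xy / var_x
print(round(beta1_hat, 4))  # 0.6667, i.e. 2/3

# Question 17: R^2 = ESS/TSS and TSS = ESS + RSS, so back out TSS and then RSS.
ess, r2 = 1648.0, 0.80
tss = ess / r2   # 2060
rss = tss - ess  # 412
print(round(tss), round(rss))  # 2060 412
```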
In the linear regression model, adjusted R2 measures
a) the proportion of variation in Y explained by X
b) the proportion of variation in X explained by Y
c) the proportion of variation in Y explained by X, adjusted for the number of independent variables
d) the proportion of variation in X explained by Y, adjusted for the number of independent variables
e) none of the above

This is simply the definition of adjusted R-squared, so the correct answer is c).

19.) In the linear regression model, the degrees of freedom
a) is equal to the number of observations (n) minus 1
b) affects the precision of the coefficient estimates
c) affects the value of the coefficient estimates
d) all of the above
e) none of the above

We know that the degrees of freedom in the linear regression model are actually equal to n − k − 1, where k equals the number of slope parameters (the coefficients on the independent variables), so a) is incorrect. We also know that the value of the coefficient estimates does not depend on the degrees of freedom. This leaves us with b), which is a claim made in Lecture 8.

20.) In the Capital Asset Pricing Model (CAPM),
a) β measures the sensitivity of the expected return of a portfolio to systematic risk
b) β measures the sensitivity of the expected return of a portfolio to specific risk
c) β is greater than one
d) α is less than zero
e) R2 is meaningless

This is simply the definition of beta in the CAPM, so the correct answer is a).

SAMPLE SHORT ANSWER QUESTIONS FOR MIDTERM

1.) Suppose the monthly demand (x) for a perishable good is a random variable that takes one of six possible values. The pdf of monthly demand is f(x):

x      f(x)
100    0.10
200    0.10
300    0.20
400    0.35
500    0.20
600    0.05

The good sells for a fixed price of $15 per unit, and production costs are $10 per unit. Therefore, the firm earns $5 profit on each unit sold and loses $10 on each unit that goes unsold. If the firm brings 400 units to market, what is the expected profit? What is the variance of profit?
Note: expected revenue is not the same thing as expected demand times price. It is constrained by the fact that you can never sell more than 400 units, no matter what the demand.

E(PROFIT) = E(REVENUE − COST)
E(REVENUE − COST) = 0.10*($15*100 − $4000) + 0.10*($15*200 − $4000) + 0.20*($15*300 − $4000) + 0.35*($15*400 − $4000) + 0.20*($15*400 − $4000) + 0.05*($15*400 − $4000)
E(REVENUE − COST) = 0.10*(−$2500) + 0.10*(−$1000) + 0.20*($500) + 0.35*($2000) + 0.20*($2000) + 0.05*($2000)
E(REVENUE − COST) = −$250 − $100 + $100 + $700 + $400 + $100 = $950

Equivalently, E(REVENUE − COST) = E(REVENUE) − E(COST) = E(REVENUE) − COST(Q=400)
E(REVENUE) − $4000 = 0.10*($15*100) + 0.10*($15*200) + 0.20*($15*300) + 0.35*($15*400) + 0.20*($15*400) + 0.05*($15*400) − $4000
E(REVENUE) − $4000 = $150 + $300 + $900 + $2100 + $1200 + $300 − $4000 = $950

Var(PROFIT) = 0.10*(−$2500 − $950)² + 0.10*(−$1000 − $950)² + 0.20*($500 − $950)² + 0.35*($2000 − $950)² + 0.20*($2000 − $950)² + 0.05*($2000 − $950)²
Var(PROFIT) = 1190250 + 380250 + 40500 + 385875 + 220500 + 55125 = 2272500 (in dollars squared)

Note: the standard deviation, $1507.48, is a much more reasonable and easily interpreted number.

2.) Suppose the price of a stock X is a random variable. On any day, its value may increase, decrease, or not change at all. The distribution of daily price changes is as follows:

Price Change ($)    Probability
−1.00               0.283
0.00                0.25
0.50                ?
1.00                0.10

a) What is the probability of a $0.5 increase in price?
b) Draw the pdf and cdf of price changes.
c) What is the expected price change?
d) What is the variance of the price changes?
e) Suppose the stock's price today is $10. What is the expected value of tomorrow's price? What is its variance?
a) Pr($0.5) = 1 − 0.283 − 0.25 − 0.10 = 0.367

b) [Charts omitted. The pdf is a bar chart with bars of height 0.283, 0.25, 0.367, and 0.10 at price changes −1.00, 0.00, 0.50, and 1.00. The cdf is a step function that rises to 0.283, 0.533, 0.900, and 1.000 at those same points.]

c) E(Price change) = −1.00*0.283 + 0.00*0.25 + 0.50*0.367 + 1.00*0.10
E(Price change) = −0.283 + 0.00 + 0.1835 + 0.10 = $0.0005

d) Var(Price change) = (−1.00 − 0.0005)²*0.283 + (0.00 − 0.0005)²*0.25 + (0.50 − 0.0005)²*0.367 + (1.00 − 0.0005)²*0.10
Var(Price change) = 0.2833 + 0.0000 + 0.0916 + 0.0999 = 0.4748 (in dollars squared)
Again, the variance dwarfs the mean.

e) For the expected value:
E(Tomorrow's price) = E($10 + price change) = E($10) + E(price change) = $10.0005
For the variance:
Var(Tomorrow's price) = (9.00 − 10.0005)²*0.283 + (10.00 − 10.0005)²*0.25 + (10.50 − 10.0005)²*0.367 + (11.00 − 10.0005)²*0.10
Var(Tomorrow's price) = 0.2833 + 0.0000 + 0.0916 + 0.0999 = 0.4748 (in dollars squared)
Note: this is exactly what we had in part d), as adding a constant changes nothing about the underlying variance.

Now suppose the distribution above only applies when the weather is sunny. When the weather is rainy, the distribution is:

Price Change ($)    Probability
−1.00               0.50
0.00                0.20
0.50                0.20
1.00                0.10

f) What is the expected price change on a rainy day?
g) Suppose the probability of rain is 0.4, and the probability of sun is 0.6. What is the expected price change?

f) E(Price change | Rainy day) = −1.00*0.50 + 0.00*0.20 + 0.50*0.20 + 1.00*0.10
E(Price change | Rainy day) = −0.50 + 0.00 + 0.10 + 0.10 = −$0.30

g) From the law of iterated expectations, an unconditional expectation is just a weighted average of conditional expectations, where the weights are the probabilities of the outcomes on which we are conditioning. So:
E(Price change) = E(Price change | sunny day) * Pr(sunny day) + E(Price change | rainy day) * Pr(rainy day)
E(Price change) = $0.0005 * 0.6 + (−$0.30 * 0.4) = −$0.1197

3.)
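The sunny-day numbers in short-answer question 2 can be reproduced in a few lines of Python (a sketch; note that summing the unrounded variance terms gives ≈ 0.4747, while the worked answer's 0.4748 comes from rounding each term before summing):

```python
changes = [-1.00, 0.00, 0.50, 1.00]
probs = [0.283, 0.25, None, 0.10]

# a) probabilities sum to one, so the missing entry is the remainder
probs[2] = 1.0 - (0.283 + 0.25 + 0.10)
print(round(probs[2], 3))  # 0.367

# c) and d): expected price change and its variance
mean = sum(c * p for c, p in zip(changes, probs))
var = sum(p * (c - mean) ** 2 for c, p in zip(changes, probs))
print(round(mean, 4))  # 0.0005
print(round(var, 4))   # 0.4747

# g) law of iterated expectations with Pr(rain) = 0.4, Pr(sun) = 0.6
rainy_mean = -1.00 * 0.50 + 0.00 * 0.20 + 0.50 * 0.20 + 1.00 * 0.10
overall = 0.6 * mean + 0.4 * rainy_mean
print(round(overall, 4))  # -0.1197
```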
Suppose you collect the following data that are a random sample from a N(μ, σ²) population:
4.37 6.99 7.85 2.60 3.34 5.94 4.21 5.99 8.53 4.92
a) Compute the t-statistic for testing the hypothesis:
H0: μ = 4
H1: μ ≠ 4
b) What is the sampling distribution of the test statistic you computed in part a)?
c) Can you reject the null hypothesis of part a) at the 5% level of significance? Explain.

a) We need to form
T = (X̄ − μ0) / (s/√n) ~ t(n−1)
First, X̄ = (4.37 + 6.99 + 7.85 + 2.60 + 3.34 + 5.94 + 4.21 + 5.99 + 8.53 + 4.92) / 10
X̄ = 5.4740
Next, s² = 1/9 * [(4.37 − 5.474)² + (6.99 − 5.474)² + (7.85 − 5.474)² + (2.60 − 5.474)² + (3.34 − 5.474)² + (5.94 − 5.474)² + (4.21 − 5.474)² + (5.99 − 5.474)² + (8.53 − 5.474)² + (4.92 − 5.474)²]
s² = 3.7448, which implies s = 1.9351, and
T = (5.4740 − 4.00) / (1.9351/√10) = 1.4740 / 0.6119 = 2.4088

b) The sampling distribution is defined as the set of possible values that the statistic might take, together with the probabilities associated with each of them. It measures uncertainty over the value the statistic might take in repeated samples from the same population. In this case, the sampling distribution is simply t(n−1) = t(9).

c) The critical value for a t distribution with 9 degrees of freedom at the 0.05 level of significance against a two-sided alternative is 2.262. Since |2.4088| > 2.262, we can safely reject the null that μ = 4.

4.) Suppose you have a random sample of 100 SFU students. In response to a short survey, each student reported the usual number of hours per week that he/she spent working off-campus (Xi), and the usual number of hours per week that he/she spent engaged in social activities (Yi). The university has hired you to analyze these data. Their main interest is the total number of hours per week that SFU students engage in non-academic activities. Define a new variable, Zi = Xi + Yi, to measure the number of hours that SFU students engage in non-academic activities.
Suppose that in the population of SFU students:
E[Xi] = μX, E[Yi] = μY, Var(Xi) = σX², Var(Yi) = σY², Cov(Xi, Yi) = σXY.
a) The university has asked you to estimate the average number of hours that SFU students engage in non-academic activities, i.e., E[Zi] = μZ. Your roommate, who has already taken BUEC 333, says "That's easy! Just take the first person in your sample. Their value, Z1, is an unbiased estimator of μZ." Is your roommate right? Explain.
b) What is the variance of your roommate's estimator in part a)?
c) Give a more efficient, but still unbiased, estimator of μZ. Show that it is unbiased and that it is more efficient than your roommate's estimator.
d) What is the variance of the sampling distribution of your estimator in part c)? Explain what the variance of the sampling distribution measures.

a) Yes, your roommate is right. Since E(V + W) = E(V) + E(W), it follows that E(Zi) = E(Xi + Yi) = E(Xi) + E(Yi) = μX + μY = μZ. And since E(Z1) = E(Zi), E(Z1) = μZ.

b) A good place to start is to remember that Var(X̄) = Var((1/n) Σ Xi) = σ²/n. Using Z1 is just the special case of a sample mean where n = 1, so Var(Z1) = σZ², where σZ² = Var(Zi) = σX² + σY² + 2σXY.

c) If Z1 is unbiased, then it stands to reason that (Z1 + Z2)/2 is unbiased as well, since E((Z1 + Z2)/2) = ½*(E(Z1) + E(Z2)) = ½*(μZ + μZ) = μZ. Furthermore, Var((Z1 + Z2)/2) = ¼*(σZ² + σZ²) = σZ²/2, which is half the variance of your roommate's estimator, so it is more efficient.

d) See above: the variance is σZ²/2. The variance of the sampling distribution measures the dispersion of the sample statistic of interest and reflects the fact that different samples drawn from the same population will necessarily generate different values of the sample statistic, as different observations take different values.

5.) Suppose we have a linear regression model with one independent variable and no intercept:
Yi = βXi + εi
a) Verbally explain the steps necessary to derive the least squares estimator (hint: this should entail four distinct steps).

The least squares estimator seeks to minimize the sum of the squared residuals from the estimating equation.
1.)
Thus, we first have to define our residual as the difference between that which is observed and that which is predicted by the regression. (In this way, the residual is best thought of as a prediction error, that is, something we would like to make as small as possible. Because these residuals will likely be both positive and negative, simply considering their sum is unsatisfactory, as it will likely be close to zero. A better way forward is to consider the sum of the squared "prediction errors," which will definitely not be zero and which will penalize us for making big errors.)
2.) Next, we need to set up the minimization problem: choose beta-hat to minimize the sum of squared residuals.
3.) We must take the derivative of the sum of squared residuals with respect to beta-hat and set it equal to zero.
4.) Finally, we must solve for beta-hat.

b) Formally derive an expression for this estimator given your answer in part a).

The residual is ei = Yi − β̂Xi, so the problem is (all sums run from i = 1 to n):
min over β̂ of Σ ei² = Σ (Yi − β̂Xi)² = Σ (Yi² − 2β̂XiYi + β̂²Xi²)
This allows us to derive the following first-order condition:
∂(Σ ei²)/∂β̂ = −2 Σ XiYi + 2β̂ Σ Xi² = 0
⇒ Σ XiYi − β̂ Σ Xi² = 0
⇒ β̂ Σ Xi² = Σ XiYi
⇒ β̂ = Σ XiYi / Σ Xi²
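The derivation can be sanity-checked numerically. The sketch below uses made-up data (the X and Y values are purely illustrative, not from the question) and confirms that the closed-form slope Σ XiYi / Σ Xi² yields a smaller sum of squared residuals than nearby slopes:

```python
# No-intercept least squares: beta_hat = sum(Xi*Yi) / sum(Xi^2).
# X and Y are made-up data purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_hat = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)

def ssr(b):
    """Sum of squared residuals when the slope is b."""
    return sum((y - b * x) ** 2 for x, y in zip(X, Y))

print(round(beta_hat, 4))  # 2.0036
# The first-order condition puts the minimum exactly at beta_hat:
print(ssr(beta_hat) < ssr(beta_hat + 0.01), ssr(beta_hat) < ssr(beta_hat - 0.01))  # True True
```

Because the SSR is a parabola in the slope, any perturbation away from beta_hat increases it, which is exactly what the first-order condition guarantees.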