Economic Statistics – 2011-12 – Gozzi

Here is a collection of exercises on non-linear models. For solutions not shown, ask the teacher.

EXERCISE 1

Consider how cloth consumption is related to income and price.

LN_Cloth: natural log of textile consumption per capita
LN_Income: natural log of real income per capita
LN_Price: natural log of the relative price of textiles

Here is a partial Gretl output for the regression of LN_Cloth on LN_Income and LN_Price:

R² = 97.4%

Analysis of Variance
Source        DF    SS        MS    F
Regression     ?    0.51734    ?    ?
Error          ?    ?          ?
Total         15

Variable     Coef. Est.   SE        t-stat
Intercept     3.1636      0.7047      ?
LN_Income     1.1432      0.1560      ?
LN_Price     -0.82886     0.03611     ?

(a) Fill in the blanks. Test the statistical significance of the model at α = 0.05.

Source        DF       SS        MS       F
Regression   ______   0.51734   ______   ______
Error        ______   ______    ______
Total         15

(b) Write the regression line. Test the estimated coefficients at α = 0.05.

(c) Interpret the estimated coefficient for LN_Income. How is the type of data, observational or experimental, relevant to the interpretation?

(d) Consider the hypothesis that the demand for cloth is price inelastic. Test the hypothesis at the 5% level.

(e) Write the regression equation in terms of the original variables. What type of model do we obtain?

(f) Sketch, on two separate graphs, the relationship between Cloth and Income holding Price constant, and between Cloth and Price holding Income constant.

SOLUTION

(a) Solution:

ANOVA
Source        DF    SS        MS        F
Regression     2    0.51734   0.25867   243.5
Error         13    0.01381   0.00106
Total         15

(Note: the total DF is 15 = n − 1, hence the sample size is n = 16.)

At the 0.05 significance level, reject H0 if F ≥ 3.81; do not reject H0 if F < 3.81.

F(k−1, n−k) = [R² / (k−1)] / [(1−R²) / (n−k)] = (0.974/2) / ((1 − 0.974)/13) = 243.5

F = 243.5 falls into the rejection region.
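The filled-in ANOVA values and the F statistic can be checked with a short Python sketch (standard library only; every input figure is taken from the partial output above):

```python
# Recomputing the Exercise 1 ANOVA table from the quantities given in the
# partial output: R^2 = 0.974, regression SS = 0.51734, n = 16 observations,
# k = 3 estimated coefficients (including the intercept).
R2 = 0.974
SS_reg = 0.51734
n, k = 16, 3

SS_err = SS_reg * (1 - R2) / R2            # residual sum of squares
MS_reg = SS_reg / (k - 1)                  # regression mean square
MS_err = SS_err / (n - k)                  # error mean square
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))  # equals MS_reg / MS_err

print(f"SS_err = {SS_err:.5f}")  # 0.01381
print(f"MS_reg = {MS_reg:.5f}")  # 0.25867
print(f"MS_err = {MS_err:.5f}")  # 0.00106
print(f"F      = {F:.1f}")       # 243.5
```

Note that writing F in terms of R² alone works because SS_err/SS_total = 1 − R², so the error sum of squares can be recovered without the raw data.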
Reject H0 and conclude that the model is statistically significant.

(b) Solution:

LN_Cloth = 3.1636 + 1.1432 LN_Income − 0.82886 LN_Price

H0: βi = 0 vs H1: βi ≠ 0, for i = LN_Income, LN_Price

At the 0.05 significance level (df = 13), reject H0 if t ≥ 2.160 or t ≤ −2.160; do not reject H0 if −2.160 < t < 2.160.

Variable     Coef. Est.   SE        t-stat
Intercept     3.1636      0.7047      4.4893
LN_Income     1.1432      0.1560      7.3282
LN_Price     -0.82886     0.03611   -22.954

All three t statistics fall in the rejection region, so each coefficient is statistically significant at the 5% level.

(c) Solution:

The estimated coefficient, 1.1432, is the income elasticity of cloth purchases: a one percent increase in income is associated with a 1.1432 percent increase in per capita cloth consumption, holding price constant. However, interpreting the coefficient as a causal elasticity requires that we can infer causality. With observational data, which is typical for economic analyses, we cannot make that causal interpretation. If the data were experimental (for example, if income were randomly varied and we observed what happened to cloth purchases), then we could interpret the relationship as causal and hence the coefficient as an elasticity.

(d) Solution:

H0: βprice = −1
H1: βprice > −1

t = [−0.82886 − (−1)] / 0.03611 = 4.7394

From the table, the critical value is t = 1.771 (df = 16 − 3 = 13), so we reject the null and infer that demand is inelastic. BUT if the data are observational, we cannot make this inference, because we cannot conclude that the relationship is causal.

EXERCISE 2

For each of the 29 airline companies in Western Europe in 1987, the following variables were collected:

Q = output (supply of transport, ton-kilometres)
L = the work force
K = weight of the fleet (in tons)

We consider the following log-log model (log in base e):

(1) log Q = β1 + β2·log L + β3·log K + u

where u is a normal random variable.

1) Comment briefly on this model, explaining the meaning of the coefficients β2 and β3. (0.5 points)
2) Comment (see Table 1 and Table 2) on:
a. the significance of the three parameters (at α = 5%); (0.5 points)
b.
the goodness of fit. (0.5 points)
3) We know that if β2 + β3 = 1, the production function has constant returns to scale:
a. What do constant returns to scale mean? (0.5 points)
b. What happens in our case, keeping in mind that the estimates obtained are point estimates, and given the result of the F test on the restriction (at α = 5%; see Table 3)? (0.5 points)
4) Rewrite the production function in its original non-linear form. For the parameter values, take into account the conclusion of the restriction F test. (0.5 points)

Table 1 – OLS, using observations 1-29
Dependent variable: logQ

           coefficient   std. error   t-ratio   p-value
  const      6.47719     0.248886     26.02     3.80e-020  ***
  logL       0.230860    0.123851      1.864    0.0737     *
  logK       0.747993    0.124123      6.026    2.30e-06   ***

Mean dependent var   15.06329   S.D. dependent var   1.127402
Sum squared resid    0.629542   S.E. of regression   0.155606
R-squared            0.982311   Adjusted R-squared   0.980950

Table 2 – Analysis of Variance

             Sum of squares    df   Mean square
Regression       34.9595        2   17.4797
Residual          0.629542     26    0.0242132
Total            35.589        28    1.27104

Table 3 – Restriction: b[logL] + b[logK] = 1
Test statistic: F(1, 26) = 0.67253, with p-value = 0.419626

Restricted estimates:

           coefficient   std. error   t-ratio   p-value
  const      6.29574     0.113260     55.59     2.20e-029  ***
  logL       0.240363    0.122558      1.961    0.0602     *
  logK       0.759637    0.122558      6.198    1.26e-06   ***

EXERCISE 3

Suppose that the scatterplot of (log x, log y) shows a strong positive correlation close to 1. Which of the following are true? (0.5 points)

I. The variables x and y also have a correlation close to 1.
II. A scatterplot of (x, y) shows a strong nonlinear pattern.
III. The residual plot of the variables x and y shows a random pattern.
(a) I only
(b) II only
(c) III only
(d) I and II
(e) I, II, and III

EXERCISE 4

Figure 2.4 shows that the Nusselt number (y) can be correlated with the Reynolds number (x) on a log-log plot such that y = b1·x^b2.

(a) Fit b1 and b2 to at least 10 points using nonlinear regression.
(b) Take the log of both sides of y = b1·x^b2 and fit b1 and b2 using linear regression. Compare the fitted values for the two approaches. Are the results different? Why or why not?

Part (a): The data points used in the regression are given below.

Re:   0.2    1.0    2.0    20    100   400   2000   6000   20000   200000
Nu:   0.55   0.90   1.10   2.90    5    10     23     40      80      400

Using nonlinear regression to minimize the sum of squared errors between the model prediction and the data, Gretl gives the following results:

Model 1: NLS, using observations 1-10
Nu = b1*Re^(b2)

        estimate   std. error   t-ratio   p-value
  b1    0.107838   0.0168529     6.399    0.0002      ***
  b2    0.673219   0.0129341    52.05     2.06e-011   ***

Mean dependent var   56.34500   S.D. dependent var   123.3411
Sum squared resid    81.70629   S.E. of regression   3.195823
R-squared            0.999403   Adjusted R-squared   0.999329

# Gretl script for the NLS Nusselt example
genr scalar b1 = 0.001
genr scalar b2 = 0.5
nls Nu = b1*Re^(b2)
    deriv b1 = Re^(b2)
    deriv b2 = b1 * Re^(b2) * log(Re)
    params b1 b2
end nls

[Figure: the NLS model fit against the data, shown on a regular plot of Nu versus Re and on a log-log plot.]

Part (b): Transforming the data and using linear regression gives the following results.

Model 7: OLS, using observations 1-10
Dependent variable: l_Nu

          Coefficient   Std. Error   t-ratio
  const   -0.217756     0.140901     -1.5455
  l_Re     0.46479      0.021012     22.1202

Mean dependent var   2.156671
Sum squared resid    0.666473
R-squared            0.983913
F(1, 8)              489.3052
p-values: const 0.16082; l_Re <0.00001 ***
S.D. dependent var   2.145539
S.E. of regression   0.288633
Adjusted R-squared   0.981902
P-value(F)           1.84e-08

b1 = exp(−0.217756) = 0.804321
b2 = 0.46479

[Figure: the transformed-model fit against the data, shown on a log-log plot and on a regular plot of Nu versus Re.]

You can see that the nonlinear regression fits the points at higher Re and Nu better, while the linear regression on the transformed data fits the points at low Re and Nu better. This is because the logarithm makes large errors look smaller: linear regression on the log-transformed model weights points at low Re and Nu more heavily than points at high Re and Nu. While the transformed fit appears better on the log-log plot, its large error at high Re and Nu is apparent on the regular plot. Conversely, the error visible on the log-log plot for the nonlinearly regressed model is actually not very large when viewed on the regular plot.

EXERCISE 5

Consider the following five equations.

(i) y = 3 + 2x
(ii) y = 3 + 2(1/x)
(iii) y = 3 + 2 ln(x)
(iv) ln(y) = 3 + 2x
(v) ln(y) = 3 + 2 ln(x)

For which equation is each of the following statements true? (No justification is necessary.)

a. A one percent increase in x causes a two percent increase in y.
b. The equation relating y and x is a straight line.
c. As x approaches infinity, y approaches 3.
d. A one-unit increase in x causes a two percent increase in y.
e. An increase in x causes a decrease in y.
f. The elasticity of y with respect to x is constant and equal to 2.

EXERCISE 6

In the equation ln(y) = 7 + 0.3 ln(x), which of the following is correct?

a) If x increases by 1 unit, then y increases by 0.3 units.
b) If x increases by 1%, then y increases by 0.3 units.
c) If x increases by 1 unit, then y increases by 0.3%.
d) The elasticity of y with respect to x is 0.3.

EXERCISE 7

Suppose Q = quantity demanded, P = price of the good, and I = consumer income.
The price elasticity of demand equals −0.8 in which equation below?

a. Q = 64.2 − 0.8 ln(P) + 1.2 ln(I)
b. Q = 64.2 − 0.8 P + 1.2 I
c. Q = 64.2 − 0.8 (P/I)
d. ln(Q) = 3.5 − 0.8 P + 1.2 I
e. ln(Q) = 3.5 − 0.8 ln(P) + 1.2 ln(I)

EXERCISE 8

When using the linearized data model to find the parameters of the regression model y = β1·e^(β2·x) to best fit (x1, y1), (x2, y2), …, (xn, yn), the sum of squared residuals that is minimized is

i.   Σ_{i=1..n} [ y_i − β1·e^(β2·x_i) ]²
ii.  Σ_{i=1..n} [ ln(y_i) − ln(β1) − β2·x_i ]²
iii. Σ_{i=1..n} [ y_i − ln(β1) − β2·x_i ]²
iv.  Σ_{i=1..n} [ ln(y_i) − ln(β1) − β2·ln(x_i) ]²

EXERCISE 9

The linearized data model for the model curve Y = β1·e^(β2·X) between Y and X is

1. ln(Y) = ln(β1) + β2·X
2. ln(Y/X) = ln(β1) + β2·X
3. ln(Y/X) = ln(β1) + β2/X
4. ln(Y) = ln(β1)·β2·X

EXERCISE 10

The quadratic equation Y = a + bX + cX² can be estimated using linear regression by estimating

1. Y = a + ZX where Z = (b + c)²
2. Y = a + bZ where Z = X²
3. Y = a + ZX where Z = (b + c)
4. Y = a + bX + ZX where Z = c²
5. none of the above will work

EXERCISE 11

In the nonlinear function Y = a·X^b·Z^c, the parameter c measures

a) ΔY / ΔZ
b) the percent change in Y for a 1 percent change in Z
c) the elasticity of Y with respect to Z
d) both b and c
e) all of the above

EXERCISE 12

Suppose that the scatterplot of (log x, log y) shows a strong positive correlation close to 1. Which of the following are true?

I. The variables x and y also have a correlation close to 1.
II. A scatterplot of (x, y) shows a strong nonlinear pattern.
III. The residual plot of the variables x and y shows a random pattern.

(a) I only
(b) II only
(c) III only
(d) I and II
(e) I, II, and III

EXERCISE 13

If the model for the relationship between the score on Economic Statistics (Y) and the number of hours spent preparing for the test (X) was ln(Y_hat) = 1.10 + 1.5 ln(X), determine the residual if a student studied 9 hours and earned a score of 85 out of 100.
(a) 6.53
(b) 3.89
(c) 15.23
(d) 0
(e) −4.86

Solution:

The original model was: Y = β1·X^(β2)·e^u

Y_hat = b1·X^(b2) = e^(1.10) · 9^(1.5) = 3.0042 · 27 = 81.11

Residual = Y − Y_hat = 85 − 81.11 = 3.89, so the answer is (b).

EXERCISE 14

A regression model in which β1 represents the expected change in Y in response to a 1-unit increase in X1 is

a. Y = β0 + β1·X1 + u
b. ln(Y) = β0 + β1·X1 + u
c. Y = β0 + β1·ln(X1) + u
d. ln(Y) = β0 + β1·ln(X1) + u

EXERCISE 15

A regression model in which β1 represents the expected percentage change in Y in response to a 1% increase in X1 is

a. Y = β0 + β1·X1 + u
b. ln(Y) = β0 + β1·X1 + u
c. Y = β0 + β1·ln(X1) + u
d. ln(Y) = β0 + β1·ln(X1) + u

EXERCISE 16

The quadratic equation Y = a + bX + cX² can be estimated using linear regression by estimating

1. Y = a + ZX where Z = (b + c)²
2. Y = a + bZ where Z = X²
3. Y = a + ZX where Z = (b + c)
4. Y = a + bX + ZX where Z = c²
5. none of the above will work

EXERCISE 17

Consider this constant-elasticity demand model:

log(Q) = β1 + β2·log(P) + u

Suppose that experimental data on 56 consumers are available. With these data we obtain the following OLS parameter estimates and standard errors:

log_Q_hat = 6.01 − 1.47·log_P
            (1.23)   (0.38)

1) Test whether we can infer that demand is downward sloping.
2) Test whether we can infer that demand is elastic.
3) Write the model in terms of the original variables Q and P and provide a graphic representation of the model.

Solution

1) Test whether we can infer that demand is downward sloping.

H0: β2 = 0
H1: β2 < 0

t = (−1.47 − 0) / 0.38 = −3.87

The 5% critical value is t = −1.67 (df = 54), so we reject the null hypothesis in favor of the research hypothesis. We do have sufficient evidence to conclude, at conventional significance levels, that demand is downward sloping (i.e. has a negative elasticity).

2) Test whether we can infer that demand is elastic.

H0: β2 = −1
H1: β2 < −1

t = (−1.47 − (−1)) / 0.38 = −1.24

The 5% critical value is t = −1.67 (df = 54). Hence we fail to reject the null hypothesis.
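The two one-sided t statistics in this solution can be reproduced with a minimal Python sketch (the estimate, standard error, and the 5% critical value t = −1.67 with df = 54 are taken from the text above):

```python
# Exercise 17: one-sided t tests on the price coefficient of the
# constant-elasticity demand model, using the OLS output in the text.
b2, se = -1.47, 0.38
t_crit = -1.67  # 5% one-sided critical value, df = 54

# 1) Downward sloping: H0: beta2 = 0 vs H1: beta2 < 0
t_slope = (b2 - 0) / se
print(round(t_slope, 2), t_slope < t_crit)      # -3.87 True  -> reject H0

# 2) Elastic: H0: beta2 = -1 vs H1: beta2 < -1
t_elastic = (b2 - (-1)) / se
print(round(t_elastic, 2), t_elastic < t_crit)  # -1.24 False -> fail to reject
```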
We do not have sufficient evidence to conclude, at conventional significance levels, that demand is elastic.

3) The linearized model is double logarithmic, so in terms of the original variables we obtain the following multiplicative model:

Q_hat = exp(6.01) · P^(−1.47) = 407.48 · P^(−1.47)
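The back-transformed model can be verified numerically. The sketch below (the helper name q_hat is illustrative, not from the original) also shows the constant-elasticity property that motivates the log-log form:

```python
import math

# Exercise 17, part 3: back-transforming log(Q) = 6.01 - 1.47*log(P)
# into the multiplicative form Q_hat = exp(6.01) * P**(-1.47).
b1, b2 = 6.01, -1.47
scale = math.exp(b1)
print(round(scale, 2))  # 407.48

def q_hat(p):
    """Fitted demand at price p (p > 0); illustrative helper."""
    return scale * p ** b2

# Constant elasticity: doubling the price always multiplies the
# predicted quantity by the same factor, 2**(-1.47) ~= 0.36,
# regardless of the starting price.
print(round(q_hat(2.0) / q_hat(1.0), 4))  # 0.361
```

Plotted with Q on the vertical axis and P on the horizontal axis, q_hat traces a downward-sloping convex curve, which is the graphic representation the question asks for.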