Economic Statistics – 2011-12 – Gozzi
Here is a collection of exercises on non-linear models. For solutions not shown, ask the teacher.
EXERCISE 1
Consider how cloth consumption is related to income and price.
LN_Cloth: natural log of the textile consumption per capita
LN_Income: natural log of the real income per capita
LN_Price: natural log of the relative price of textile
Below is partial Gretl output for the regression of LN_Cloth on LN_Income and
LN_Price:
R2 = 97.4%
Analysis of Variance (ANOVA)

Source       DF   SS        MS   F
Regression   ?    0.51734   ?    ?
Error        ?    ?         ?
Total        15

Variable     Coef. Est.   SE        t-stat
Intercept    3.1636       0.7047    ?
LN_Income    1.1432       0.1560    ?
LN_Price     -0.82886     0.03611   ?
(a) Fill in the blanks. Test the statistical significance of the model at α = 0.05.
Source       DF       SS        MS        F
Regression   ______   0.51734   ______    ______
Error        ______   ______    ______
Total        15
(b) Write the regression line. Test the estimated coefficients at α= 0.05.
Solution:
LN_Cloth = 3.1636 + 1.1432 LN_Income – 0.82886 LN_Price
H0 : βi = 0, i = LN_Income, LN_Price
H1 : βi ≠ 0, i = LN_Income, LN_Price
(c) Interpret the estimated coefficient for LN_Income. How is the type of
data, observational or experimental, relevant to the interpretation?
(d) Consider the hypothesis that the demand for cloth is price inelastic. Test
the hypothesis at the 5% level.
(e) Write the regression equation in terms of the original variables. What type of model do
we obtain?
(f) Sketch, on two separate graphs, the relationship between Cloth and Income holding
Price constant, and between Cloth and Price holding Income constant.
SOLUTION
(a) Solution:
ANOVA

Source       DF   SS        MS        F
Regression   2    0.51734   0.25867   243.5
Error        13   0.01381   0.00106
Total        15
At the 0.05 significance level, reject H0 if F ≥ 3.81; do not reject H0 if F < 3.81. (Note:
the total DF is 15 = n – 1, hence the sample size is n = 16.)
F(k–1, n–k) = [R²/(k–1)] / [(1–R²)/(n–k)] = (0.974/2) / ((1–0.974)/13) = 243.5
Since F = 243.5 falls in the rejection region, reject H0 and conclude that the model is
statistically significant.
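The F statistic above can be recomputed from R² alone. A minimal Python check (mine, not part of the original Gretl output):

```python
# F-test of overall significance from R^2.
# k = 3 estimated coefficients (incl. intercept), n = 16 observations.
r2, k, n = 0.974, 3, 16
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(round(f_stat, 1))  # 243.5, matching the ANOVA table
```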
(b) Solution: LN_Cloth = 3.1636 + 1.1432 LN_Income – 0.82886 LN_Price
H0 : βi = 0, i = LN_Income, LN_Price
H1 : βi ≠ 0, i = LN_Income, LN_Price
At the 0.05 significance level, reject H0 if t ≥ 2.160 or t ≤ –2.160. Do not reject H0 if
–2.160 < t < 2.160.
Variable     Coef. Est.   SE        t-stat
Intercept    3.1636       0.7047    4.4893
LN_Income    1.1432       0.1560    7.3282
LN_Price     -0.82886     0.03611   -22.954
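Each t-statistic is just the coefficient divided by its standard error; a quick Python check of the filled-in column (variable names are mine):

```python
# Recompute t = coefficient / SE for each regressor.
estimates = {
    "Intercept": (3.1636, 0.7047),
    "LN_Income": (1.1432, 0.1560),
    "LN_Price": (-0.82886, 0.03611),
}
t_stats = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```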
(c) Solution: The estimated coefficient, 1.1432, is the income elasticity of cloth
purchases: a one percent increase in income is associated with a 1.1432 percent increase
in per capita cloth consumption, holding price constant.
However, interpreting the coefficient as an elasticity requires that we can infer causality.
If we have observational data, which is typical for economic analyses, then we
cannot interpret the coefficient as an elasticity. If the data are experimental (for example, if
income were randomly varied and we observed what happened to cloth purchases), then we
could interpret this as a causal relationship and hence interpret the coefficient as an
elasticity.
(d) Solution:
H0 : βprice = -1
H1 : βprice > -1
t = [-0.82886 - (-1)]/0.03611 = 4.7394
From the table, the critical value is t = 1.771 (df = 16 – 3 = 13) => reject the null. Infer that demand is
inelastic, BUT if we have observational data, we cannot make this inference because we cannot
conclude that we have a causal relationship.
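The test statistic in (d) can be verified in a couple of lines of Python (using the SE 0.03611 from the Gretl output):

```python
# One-sided test of H0: beta_price = -1 against H1: beta_price > -1.
b, se, beta_null = -0.82886, 0.03611, -1.0
t = (b - beta_null) / se
print(round(t, 4))  # 4.7394 > 1.771, so reject H0
```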
EXERCISE 2
For each of the 29 airline companies in Western Europe in 1987, we collected the
following variables:
Q = Output (supply of transport ton kilometer)
L = The work force
K = Weight of the fleet (in tons)
We consider the following log-log model (log base e):

(1)  log Q = β1' + β2*log L + β3*log K + u

where u is a normal random variable.
1) Comment briefly on this model, explaining the meaning of the coefficients β2 and
β3. (0.5 points)
2) Comment (see Table 1 and Table 2) on:
   a. the significance of the three parameters (at α = 5%); (0.5 points)
   b. the goodness of fit. (0.5 points)
3) We know that if β2 + β3 = 1, the production function has constant returns to scale:
   a. What does constant returns to scale mean? (0.5 points)
   b. What happens in our case, keeping in mind that the estimates obtained are
      point estimates, and given the F-test result (at α = 5%) on the restriction (see Table 3)? (0.5
      points)
4) Rewrite the production function in the original non-linear form. Choose the values of
   the parameters taking into account the conclusion of the restriction
   F-test. (0.5 points)
Table 1 - OLS, using observations 1-29
Dependent variable: logQ

         coefficient   std. error   t-ratio   p-value
const    6.47719       0.248886     26.02     3.80e-020   ***
logL     0.230860      0.123851     1.864     0.0737      *
logK     0.747993      0.124123     6.026     2.30e-06    ***

Mean dependent var   15.06329   S.D. dependent var   1.127402
Sum squared resid    0.629542   S.E. of regression   0.155606
R-squared            0.982311   Adjusted R-squared   0.980950
Table 2 - Analysis of Variance

             Sum of squares   df   Mean square
Regression   34.9595          2    17.4797
Residual     0.629542         26   0.0242132
Total        35.589           28   1.27104
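The entries in Tables 1 and 2 are mutually consistent, which can be checked in Python (variable names are mine, not Gretl's):

```python
# Consistency checks between Table 1 and Table 2.
ss_reg, ss_resid, ss_total = 34.9595, 0.629542, 35.589
df_reg, df_resid = 2, 26
r_squared = ss_reg / ss_total   # should match Table 1's R-squared
ms_reg = ss_reg / df_reg        # mean square, regression
ms_resid = ss_resid / df_resid  # mean square, residual
print(r_squared, ms_reg, ms_resid)
```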
Table 3 - Restriction:
b[logL] + b[logK] = 1
Test statistic: F(1, 26) = 0.67253, with p-value = 0.419626
Restricted estimates:

         coefficient   std. error   t-ratio   p-value
const    6.29574       0.113260     55.59     2.20e-029   ***
logL     0.240363      0.122558     1.961     0.0602      *
logK     0.759637      0.122558     6.198     1.26e-06    ***
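For question 3, the point estimates can be summed directly; a small Python check (labels are mine):

```python
# Sum of the output elasticities: unrestricted vs. restricted estimates.
unrestricted_sum = 0.230860 + 0.747993   # from Table 1
restricted_sum = 0.240363 + 0.759637     # from Table 3
print(unrestricted_sum)  # close to, but below, 1
print(restricted_sum)    # equals 1, as the restriction imposes
```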
EXERCISE 3
Suppose that the scatterplot of (log x, log y) shows a strong positive correlation close to 1.
Which of the following are true? (0.5 points).
I.   The variables x and y also have a correlation close to 1.
II.  A scatterplot of (x, y) shows a strong nonlinear pattern.
III. The residual plot of the variables x and y shows a random pattern.

(a) I only
(b) II only
(c) III only
(d) I and II
(e) I, II, and III
EXERCISE 4
Figure 2.4 shows that the Nusselt number (y) can be correlated with the Reynolds number (x)
on a log-log plot such that y = b1·x^b2.
(a) Fit b1 and b2 to at least 10 points using nonlinear regression.
(b) Take the log of both sides of y = b1·x^b2 and fit b1 and b2 using linear
regression. Compare the fitted values for the two approaches. Are the
results different? Why or why not?
Part (a): The data points I used in my regression are given below
Re        Nu
0.2       0.55
1.0       0.90
2.0       1.10
20        2.90
100       5
400       10
2000      23
6000      40
20000     80
200000    400
Using nonlinear regression to minimize the sum of squared error between the model
prediction and the data, with Gretl I get the following results:
Model 1: NLS, using observations 1-10
Nu = b1*Re^(b2)

      estimate   std. error   t-ratio   p-value
b1    0.107838   0.0168529    6.399     0.0002      ***
b2    0.673219   0.0129341    52.05     2.06e-011   ***

Mean dependent var   56.34500   S.D. dependent var   123.3411
Sum squared resid    81.70629   S.E. of regression   3.195823
R-squared            0.999403   Adjusted R-squared   0.999329
# Gretl script for the NLS Nusselt example
scalar b1 = 0.001
scalar b2 = 0.5
nls Nu = b1*Re^(b2)
    deriv b1 = Re^(b2)
    deriv b2 = b1*Re^(b2)*log(Re)
end nls
The model fit looks like this on a regular plot:
[Figure: Nu vs. Re on linear axes, data points and fitted model]
And like this on a log/log plot:
[Figure: Nu vs. Re on log-log axes, data points and fitted model]
Part (b): Transforming the data and using linear regression gives the following results
Model 7: OLS, using observations 1-10
Dependent variable: l_Nu

        Coefficient   Std. Error   t-ratio    p-value
const   -0.217756     0.140901     -1.5455    0.16082
l_Re    0.46479       0.021012     22.1202    <0.00001   ***

Mean dependent var   2.156671   S.D. dependent var   2.145539
Sum squared resid    0.666473   S.E. of regression   0.288633
R-squared            0.983913   Adjusted R-squared   0.981902
F(1, 8)              489.3052   P-value(F)           1.84e-08

b1 = exp(-0.217756) = 0.804321
b2 = 0.46479
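Part (b)'s log-log OLS can be reproduced in a few lines of plain Python (a check on the Gretl run, using the ten data points from part (a)):

```python
import math

# Log-transform both variables and run simple OLS by hand.
Re = [0.2, 1.0, 2.0, 20, 100, 400, 2000, 6000, 20000, 200000]
Nu = [0.55, 0.90, 1.10, 2.90, 5, 10, 23, 40, 80, 400]
x = [math.log(v) for v in Re]
y = [math.log(v) for v in Nu]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))   # slope = exponent estimate
b1 = math.exp(ybar - b2 * xbar)              # exp(intercept)
print(round(b1, 4), round(b2, 5))  # close to 0.8043 and 0.46479
```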
The model fit looks like this on a log-log plot:
[Figure: Nu vs. Re on log-log axes, data points and fitted model]
And like this on a regular plot:
[Figure: Nu vs. Re on linear axes, data points and fitted model]
You can see that the nonlinear regression fits the points at higher Re and Nu better, while the
linear regression fits the points at low Re and Nu better. This is because the
logarithm makes large errors seem smaller (the log of a large number is much smaller
than the original number), so linear regression on the log-transformed model weights
points at low Re and Nu more than points at high Re and Nu. While the linear fit appears
better on the log-log plot, its large error at high Re and Nu is apparent on the regular
plot. Conversely, the error visible on the log-log plot for the nonlinearly regressed model is
actually not very large when viewed on the regular plot.
EXERCISE 5
Consider the following five equations.
(i)   y = 3 + 2 x
(ii)  y = 3 + 2 (1/x)
(iii) y = 3 + 2 ln(x)
(iv)  ln(y) = 3 + 2 x
(v)   ln(y) = 3 + 2 ln(x)

For which equation is each of the following statements true? (No justification is
necessary.)
a. A one percent increase in x causes a two percent increase in y.
b. The equation relating y and x is a straight line.
c. As x approaches infinity, y approaches 3.
d. A one-unit increase in x causes a two percent increase in y.
e. An increase in x causes a decrease in y.
f. The elasticity of y with respect to x is constant and equal to 2.
EXERCISE 6
In the equation
ln(y) = 7 + 0.3 ln(x),
which of the following is correct?
a) If x increases by 1 unit, then y increases by 0.3 units.
b) If x increases by 1%, then y increases by 0.3 units.
c) If x increases by 1 unit, then y increases by 0.3%.
d) The elasticity of y with respect to x is 0.3.
EXERCISE 7
Suppose Q = quantity demanded, P = price of the good, and I = consumer income. The
price elasticity of demand equals –0.8 in which equation below?
a. Q = 64.2 – 0.8 ln(P) + 1.2 ln(I)
b. Q = 64.2 – 0.8 P + 1.2 I
c. Q = 64.2 – 0.8 (P/I)
d. ln(Q) = 3.5 – 0.8 P + 1.2 I
e. ln(Q) = 3.5 – 0.8 ln(P) + 1.2 ln(I)
EXERCISE 8
When using the linearized data model to find the parameters of the regression model
y = β1·e^(β2·x) that best fits (x1, y1), (x2, y2), ..., (xn, yn), the sum of squared residuals
that is minimized is

i.   Σ_{i=1}^{n} [ y_i – β1·e^(β2·x_i) ]²
ii.  Σ_{i=1}^{n} [ ln(y_i) – ln(β1) – β2·x_i ]²
iii. Σ_{i=1}^{n} [ y_i – ln(β1) – β2·x_i ]²
iv.  Σ_{i=1}^{n} [ ln(y_i) – ln(β1) – β2·ln(x_i) ]²
EXERCISE 9
The linearized data model for the model curve Y = β1·e^(–β2·X) between Y and X is
1. ln(Y) = ln(β1) – β2·X
2. ln(Y/X) = ln(β1) – β2·X
3. ln(Y/X) = ln(β1) + β2·X
4. ln(Y) = ln(β1) + β2·X
EXERCISE 10
The quadratic equation, Y = a + bX + cX2, can be estimated using linear regression by
estimating
1. Y = a + ZX where Z = (b + c)²
2. Y = a + bZ where Z = X²
3. Y = a + ZX where Z = (b + c)
4. Y = a + bX + ZX where Z = c²
5. none of the above will work
EXERCISE 11
In the nonlinear function Y = a·X^b·Z^c, the parameter c measures
a) ΔY / ΔZ
b) the percent change in Y for a 1 percent change in Z
c) the elasticity of Y with respect to Z
d) both b and c
e) all of the above
EXERCISE 12
Suppose that the scatterplot of (log x, log y) shows a strong positive correlation close to 1.
Which of the following are true?
I.   The variables x and y also have a correlation close to 1.
II.  A scatterplot of (x, y) shows a strong nonlinear pattern.
III. The residual plot of the variables x and y shows a random pattern.

(a) I only
(b) II only
(c) III only
(d) I and II
(e) I, II, and III
EXERCISE 13
If the model for the relationship between the score on Economic Statistics (Y) and the
number of hours spent preparing for the test (X) was ln Y_hat = 1.10 + 1.5 ln X,
determine the residual if a student studied 9 hours and earned a score of 85 out of 100.
(a) 6.53
(b) 3.89
(c) 15.23
(d) 0
(e) –4.86
Solution:
The original model is Y = β1·X^β2·e^u, so the fitted equation in original units is
Y_hat = b1·X^b2 = e^1.10 * 9^1.5 = 3.0042 * 27 = 81.11
Residual = Y – Y_hat = 85 – 81.11 = 3.89, so the answer is (b).
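The arithmetic can be confirmed in a couple of lines of Python:

```python
import math

# Fitted value and residual for X = 9 study hours, observed score Y = 85.
y_hat = math.exp(1.10) * 9 ** 1.5   # e^1.10 * 9^1.5 = 3.0042 * 27
residual = 85 - y_hat
print(round(residual, 2))  # 3.89 -> answer (b)
```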
EXERCISE 14
A regression model in which β1 represents the expected change in Y in response to a 1-unit increase in X1 is
a. Y = β0 + β1X1 + u.
b. ln(Y) = β0 + β1X1 + u.
c. Y = β0 + β1 ln(X1) + u.
d. ln(Y) = β0 + β1 ln(X1) + u
EXERCISE 15
A regression model in which β1 represents the expected percentage change in Y in
response to a 1% increase in X1 is
a. Y = β0 + β1X1 + u.
b. ln(Y) = β0 + β1X1 + u.
c. Y = β0 + β1 ln(X1) + u.
d. ln(Y) = β0 + β1 ln(X1) + u.
EXERCISE 16
The quadratic equation, Y = a + bX + cX2, can be estimated using linear regression by
estimating
1) Y = a + ZX where Z = (b + c)²
2) Y = a + bZ where Z = X²
3) Y = a + ZX where Z = (b + c)
4) Y = a + bX + ZX where Z = c²
5) none of the above will work
EXERCISE 17
Consider this constant-elasticity demand model:
log(Q) = β1 + β2*log(P) + u
Suppose that experimental data on 56 consumers are available. With these data we obtain
the following OLS parameter estimates and standard errors (in parentheses):
log_Q_hat = 6.01 – 1.47*log_P
            (1.23)   (0.38)
1) Test whether we can infer that demand is downward sloping.
2) Test whether we can infer that demand is elastic.
3) Write the model in terms of the original variables Q and P and provide a graphical
representation of the model.
Solution
1) Test whether we can infer that demand is downward sloping.
H0 : β2 = 0
H1 : β2 < 0
t = (-1.47-0)/0.38 = -3.87
Prob(t < -1.67, df =54) = 0.05
Hence reject the null hypothesis in favor of the research hypothesis. We do have
sufficient evidence to conclude at conventional significance levels that demand is
downward sloping (i.e. has a negative elasticity).
2) Test whether we can infer that demand is elastic.
Test if demand is elastic:
H0 : β2 = -1
H1 : β2 < -1
t = (-1.47- (-1))/0.38 = - 1.24
Prob(t < -1.67, df =54) = 0.05
Hence fail to reject the null hypothesis. We do not have sufficient evidence to
conclude at conventional significance levels that demand is elastic.
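Both test statistics can be reproduced in Python (the critical value –1.67 for df = 54 comes from the t-table, as in the solution):

```python
# t statistics for the two one-sided tests on the price elasticity.
b2_hat, se = -1.47, 0.38
t_slope = (b2_hat - 0) / se       # H0: beta2 = 0  (downward sloping?)
t_elastic = (b2_hat - (-1)) / se  # H0: beta2 = -1 (elastic?)
print(round(t_slope, 2), round(t_elastic, 2))  # -3.87 -1.24
```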
3) The linearized model is double logarithmic, so in terms of the original variables we
obtain the following multiplicative model:
Q_hat = exp(6.01) * P^(–1.47) = 407.48 * P^(–1.47)