Exam 2004 - Universitetet i Oslo

UNIVERSITY OF OSLO
DEPARTMENT OF ECONOMICS
Exam: ECON4135 - Applied statistics and econometrics, fall 2004
Date of exam: Wednesday, December 1, 2004
Time for exam: 14:30 – 17:30
The problem set covers 6 pages
Resources allowed:
 All written and printed resources, as well as calculators, are allowed
Grades given: A (best), B, C, D, E and F, with E as the weakest passing grade.
Comments given in arial font
“Broken limits to life expectancy” by Oeppen and Vaupel (Science, VOL 296, 10 May 2002)
showed that many previous claims of upper limits to expected life length for a newborn have
been broken, and also that expected life length has shown a remarkable linear development
since 1840. We shall look at some of these data for females.
For each of the years 1840-2000 Oeppen and Vaupel looked at observed life expectancy in
the best practicing country, called record life expectancy. The best practicing country is defined as the
country with the highest life expectancy in the actual year. Life expectancy in a country, often
denoted by e(0), is calculated from the observed age-specific mortality rates in the actual
year, and is the expected life length for a newborn under the hypothesis that it is subject
throughout its life to the mortality rates observed in its year of birth. Record life expectancy
in a given year is denoted by $Y_{year}$.
Problems
1. Female life expectancy in best practicing country is plotted against year in Figure 1. A
linear regression model $Y_{year} = \beta_0 + \beta_1 \cdot year + U_{year}$ is fitted to the data by ordinary least
squares, see Stata output in Exhibit 1. What is the interpretation of $\hat\beta_1$? Is the intercept
estimate directly meaningful? Give a 95% confidence interval for the gain in life
expectancy in one calendar year, and also in 10 calendar years. What is the 99%
confidence interval for yearly gain in expected life length?
$\beta_1$ is the yearly gain in female life expectancy in the best practicing country in the
model, and $\hat\beta_1$ is its estimate from the 1840-2000 series. Since year is
measured in years after Christ, year=0 is far outside the observed data span, and
extrapolation to the value at year 0, the intercept, is risky business. The
estimated intercept is large and negative, which is nonsense for life expectancy.
95% CI for $\beta_1$: (0.238, 0.248); the 95% CI for $10\beta_1$ is 10 times that for $\beta_1$, (2.38,
2.48); 99% CI for $\beta_1$: (0.237, 0.249).
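The intervals above can be reproduced from the slope estimate and robust standard error in Exhibit 1. The sketch below is an illustration only. It assumes normal quantiles from Python's standard library in place of the t quantiles with 159 degrees of freedom that Stata uses, which makes no difference at the reported precision. The function name `conf_interval` is hypothetical.

```python
from statistics import NormalDist

# Slope estimate and robust standard error, copied from Exhibit 1.
BETA1_HAT = 0.2429773
SE_BETA1 = 0.0024385


def conf_interval(estimate, std_err, level):
    """Two-sided interval estimate +/- z * std_err (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_err, estimate + z * std_err


ci95 = conf_interval(BETA1_HAT, SE_BETA1, 0.95)
ci95_decade = tuple(10 * bound for bound in ci95)  # gain over 10 calendar years
ci99 = conf_interval(BETA1_HAT, SE_BETA1, 0.99)

print("95% CI, one year: ", [round(b, 3) for b in ci95])
print("95% CI, ten years:", [round(b, 2) for b in ci95_decade])
print("99% CI, one year: ", [round(b, 3) for b in ci99])
```

Scaling the interval endpoints by 10 is valid because $10\hat\beta_1$ has exactly 10 times the standard error of $\hat\beta_1$.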
2. From Figure 1 there was clearly more variation in the data in the first part of the period
than in the remaining period, and there were outlying observations in the period 1916-1919.
What could have caused these patterns? Another pattern is that record life
expectancy is flat over several periods around 1900. Why could that be? The regression
results in Exhibit 1 were calculated with robust standard errors. Why is it a good idea to
calculate robust standard errors in this case?
Few countries gathered statistical data on death rates in the early part of the
period, and those that did produced vital statistics more prone to error and
variability than in the 20th century. 1916-1919: war and the Spanish flu. Flats
over periods: around 1900, several countries, including Norway, published
vital statistics every fifth year. With the best practicing country in this group,
the series has flats between publication years.
The standard errors for regression coefficients are biased if computed with the
classical method rather than the robust method when there is
heteroscedasticity such as that observed.
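The difference between the two methods can be sketched for a simple regression. The code below is a minimal illustration, not the computation behind Exhibit 1. The function name `slope_standard_errors` is hypothetical, the robust formula uses the small-sample factor n/(n-2) usually labelled HC1, and the series is simulated with made-up error spreads rather than taken from Oeppen and Vaupel.

```python
import math
import random


def slope_standard_errors(x, y):
    """OLS slope of y on x with classical and HC1 robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: one common error variance, estimated with n - 2 degrees of freedom.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # Robust: each squared residual is weighted by its own (x - x_bar)^2,
    # with the small-sample factor n / (n - 2).
    meat = sum(d * d * e * e for d, e in zip(dx, resid))
    robust = math.sqrt(n / (n - 2) * meat / sxx**2)
    return slope, classical, robust


# Simulated series (not the exam data): the error spread shrinks after 1900.
rng = random.Random(0)
years = list(range(1840, 2001))
life = [45 + 0.243 * (t - 1840) + rng.gauss(0, 3.0 if t < 1900 else 0.5) for t in years]
slope, se_classical, se_robust = slope_standard_errors(years, life)
print(f"slope={slope:.4f} classical se={se_classical:.5f} robust se={se_robust:.5f}")
```

The classical formula pools all squared residuals into one variance estimate, while the robust formula lets observations far from the mean year contribute their own residual variance.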
3. For a given year from 1841 on, the first difference in record life expectancy is
$D_{year} = Y_{year} - Y_{year-1}$. Figure 2 shows first differences in record life expectancy versus year,
and Exhibit 2 gives summary information for this variable. Comment briefly on Figure 2
in view of Figure 1, and explain how the regression result relates to the mean in Exhibit 2.
Figure 2 shows more variability early in the period (due to more variation
around the regression line), several flats at zero around 1900 (due to flats in
Figure 1), and variation around a constant level slightly above zero (due to
linearity in Figure 1). The mean 0.243 in Exhibit 2 estimates the yearly gain in
record life expectancy, which is modelled as $\beta_1$. It agrees with the regression
estimate of 0.243. The standard error obtained from Exhibit 2, $1.097/\sqrt{160}$, is
0.087, which does not match the standard error in Exhibit 1 (0.0024). The
discrepancy is due to the former being based on all differences having the
same variance, which certainly is not the case, while the latter is calculated by
the robust method not relying on this assumption.
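As a quick check of the arithmetic, the standard error of the mean can be computed from the summary statistics in Exhibit 2. This is a minimal sketch that assumes the usual formula for uncorrelated observations with a common variance. All numbers are copied from the exhibits.

```python
import math

# Summary statistics for D, copied from Exhibit 2.
N_OBS = 160
MEAN_D = 0.2431875
SD_D = 1.096574

# Standard error of a mean under equal variance and no correlation.
se_mean = SD_D / math.sqrt(N_OBS)

# Robust standard error of the regression slope, copied from Exhibit 1.
SE_SLOPE = 0.0024385

print(f"mean of D          = {MEAN_D:.3f}")
print(f"se of mean of D    = {se_mean:.3f}")
print(f"ratio to slope se  = {se_mean / SE_SLOPE:.1f}")
```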
4. From 1946 on D appears to have a rather stable development. An auto-regressive model
of order 1 was estimated for this period. What could a rationale be for this model? The
Stata result (in condensed form) is given in Exhibit 3, which means that the estimated
model is $\hat D_{year} = 0.257 - 0.135\, D_{year-1}$. What is the standard error for the autoregressive
coefficient? Is there a significant first order autoregression in the first differences in
record life expectancy?
The standard error for the auto-regressive coefficient is 0.148. With a two-sided
p-value of 0.36 (from the exhibit) when testing for no auto-correlation,
the auto-regression coefficient is not significantly different from zero.
The first differences in record life expectancy might thus very well be
uncorrelated.
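The test statistic and p-value reported in Exhibit 3 follow directly from the coefficient and its standard error. The sketch below assumes the standard normal reference distribution that the z column of the Stata output indicates.

```python
from statistics import NormalDist

# AR(1) coefficient and its standard error, copied from Exhibit 3.
AR1_COEF = -0.1353576
AR1_SE = 0.1480772

z = AR1_COEF / AR1_SE
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```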
5. It is puzzling that record life expectancy has been growing nearly linearly over such a long
period, and indeed seems to continue to grow at about the same pace. Figure 3 is taken
from Oeppen and Vaupel (2000). In the first half of the period, only a few countries had
life expectancy close to record life expectancy (or, in fact, adequate vital statistics). In
more recent years, more and more countries, such as Chile, are getting their vital statistics
in shape, and are catching up with the leading group. That the group of nations with nearly
record life expectancy is growing in number is due to economic and other development in
many nations. Discuss whether the continued growth in record life expectancy could
partly be a statistical consequence of the fact that more and more countries belong to the
group of leading nations regarding female life expectancy.
In a hypothetical situation with life conditions (underlying mortality) not
changing in the group of best practicing countries, but with new countries
joining this group, estimated record life expectancy will tend to grow simply
because the record is the maximum of a larger and larger number of largely
independent random variables. If, say, country i in the group of size $n_t$ in year
t has observed life expectancy $X_{it}$, and these are iid with cumulative distribution
function F, the record $Y_t = \max(X_{1t}, \dots, X_{n_t t})$ has distribution
$$P(Y_t \le y) = P(\forall i: X_{it} \le y) = \prod_{i=1}^{n_t} P(X_{it} \le y) = F(y)^{n_t}$$
due to independence. As $n_t$ increases, this certainly decreases for each y such that 0 < F(y) < 1.
The distribution of record life expectancy is thus moving to the right from year to year. This
formal argument was not required at the exam.
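The effect of group size on the distribution of the record can be illustrated numerically. The sketch below assumes, purely for illustration, that every country's observed life expectancy is normal with mean 80 and standard deviation 1. These values and the function name `prob_record_at_most` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical common distribution F for every country: normal with
# mean 80 and standard deviation 1 (illustrative values, not estimates).
F = NormalDist(mu=80.0, sigma=1.0)


def prob_record_at_most(y, n_countries):
    """P(Y_t <= y) = F(y)^n for the maximum of n iid draws from F."""
    return F.cdf(y) ** n_countries


group_sizes = [1, 5, 20]
probs = [prob_record_at_most(81.0, n) for n in group_sizes]
for n, p in zip(group_sizes, probs):
    print(f"n = {n:2d}: P(record <= 81) = {p:.3f}")
```

The probability that the record stays below a fixed level falls as the group grows, even though no single country's distribution has changed.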
6. Scholars have made claims of upper limits to female life expectancy. These claims have
been based on a variety of biological, demographic and other grounds. A claim of an
upper limit, say 64.8 years, is that no country will ever have a female life expectancy
above that limit. Oeppen and Vaupel (2000) identify 19 independent such claims or
asserted ceilings on female life expectancy, see Figure 4. The first claim was made by
Dublin in 1928, and the claim was that female life expectancy could not exceed 64.8
years. This was a failure even when it was made, since record life expectancy had
exceeded the limit already in 1921 (New Zealand had 65.9). Of the 19 claims, 14 have
come out as failures by 2002. For claim i let $t_i$ be the year the claim was made, and let $F_i$
be the binary variable (coded 1 for failure) recording whether the claim has come out as a
failure, i.e. has been beaten by record life expectancy by 2002. Exhibit 4 shows output
from two logistic regressions, both with F as the dependent variable. The first logistic
regression had only t as regressor, while in the second case both year of claim and lapse
time $x = 2002 - t$ were entered as regressors. Interpret the two sets of results.
In the second case t was dropped by Stata. Why?
The two results agree since the linear predictors in the two logistic regressions
are identical: $a + bt = (a + 2002b) - bx$. Here, $b = -0.5955806$ is the regression
estimate and a is the intercept in the first regression of Exhibit 4. t and $x = 2002 - t$ are perfectly
collinear, and Stata rightly refuses to include both terms in a linear logistic
regression. Otherwise the regression coefficients would not have been identifiable.
The regression is logistic, and the regression coefficient is therefore
interpreted as a log odds ratio. With $p_t = P(F_t = 1)$ as the failure probability, the
model is $\ln(p_t/(1 - p_t)) = \ln(O_t) = \beta_0 + \beta_1 t$. This gives $\beta_1 = \ln(O_{t+1}/O_t)$ for all t.
The result is thus that the odds of failure for a claim made in year t are
reduced by an estimated factor of $\exp(\hat\beta_1) = 0.55$ relative to the odds of failure for a claim made
in year $t - 1$. The second regression gives exactly the same result. The
confidence interval for this log odds ratio is however wide and contains 0. One
should expect a reduction in the odds, and thus in the failure probability, the
closer to 2002 the claim was made. The result is in agreement with this, but
because the data are weak, the estimated decrease in failure probability is not statistically
significant. From Figure 4 it seems that only
4 or 5 of the 19 claims held water by 2002, and they were mixed with failed
claims made in recent years. The large standard error is due to the low
number of data points (19) and the relatively low resolution in the timing of
non-failed claims.
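The link between the two regressions and the odds-ratio interpretation can be checked from the numbers in Exhibit 4. This is a small sketch of the arithmetic only. The intercept on the x scale is recovered only up to the rounding of the printed estimates.

```python
import math

# Estimates from the first logistic regression in Exhibit 4 (regressor t).
A_HAT = 1184.766
B_HAT = -0.5955806
CI_B = (-1.472476, 0.2813149)

# Rewriting a + b*t with t = 2002 - x gives (a + 2002*b) - b*x.
intercept_x = A_HAT + 2002 * B_HAT
slope_x = -B_HAT

# Odds ratio for a claim made one year later, with its 95% interval.
odds_ratio = math.exp(B_HAT)
odds_ratio_ci = tuple(math.exp(bound) for bound in CI_B)

print(f"intercept on x scale = {intercept_x:.3f}, slope on x scale = {slope_x:.7f}")
print(f"odds ratio = {odds_ratio:.2f}")
print(f"95% CI for odds ratio = ({odds_ratio_ci[0]:.2f}, {odds_ratio_ci[1]:.2f})")
```

An odds-ratio interval that contains 1 corresponds to a log odds ratio interval that contains 0.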
7. To what extent is life expectancy determined by economic variables like GDP per capita?
Suppose you had data on life expectancy e(0) and GDP per capita, Z , for your own
country, or say Norway. Would you think that a regression of the form
$e(0)_{year} = \beta_0 + \beta_1 Z_{year} + U_{year}$ would yield valid results regarding the posed question?
Regard your hypothetical data as the outcome of a quasi-experiment, and discuss potential
threats to internal and external validity.
The suggested regression will probably be subject to omitted variable
bias since both GDP and mortality are likely to depend on common
variables. One might also have validity problems with the regression
since GDP and life expectancy are likely to be mutually dependent in the
sense that they are determined endogenously. These are threats to
internal validity. The dependency between the two variables need not be
the same in different parts of the world. One might, for example, have
egalitarian societies like the Norwegian, which had high life expectancy
while being among the poorer European nations in the 19th Century.
Differences in relationship between the two variables are threats to
external validity: results from Norway might not be valid for U.A.R. etc.
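The omitted variable argument can be made concrete with the textbook large-sample formula. The block below is a sketch under assumptions that are not part of the exam text: a single omitted variable $W$ (for instance education), with coefficient $\gamma$ and error term $\varepsilon$, and data for which sample moments converge to population moments.

```latex
% Assumed true model with one omitted variable W
e(0)_{year} = \beta_0 + \beta_1 Z_{year} + \gamma W_{year} + \varepsilon_{year},
\qquad \operatorname{Cov}(Z_{year}, \varepsilon_{year}) = 0 .

% Probability limit of the OLS slope when W is left out of the regression
\hat\beta_1 \;\xrightarrow{p}\; \beta_1 + \gamma \, \frac{\operatorname{Cov}(Z, W)}{\operatorname{Var}(Z)} .
```

The slope is consistent only if $\gamma = 0$ or if $W$ is uncorrelated with GDP per capita. Neither seems plausible for variables that drive both income and mortality.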
Exhibit 1
Regression with robust standard errors                 Number of obs =     161
                                                       F(  1,   159) = 9928.61
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9821
                                                       Root MSE      =  1.5325

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .2429773   .0024385    99.64   0.000     .2381613    .2477933
       _cons |  -401.4199   4.754271   -84.43   0.000    -410.8096   -392.0303
------------------------------------------------------------------------------

Exhibit 2
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
           D |       160    .2431875    1.096574   -4.209999   5.060001

Exhibit 3
Sample:  1946 to 2000                           Number of obs      =        55
                                                Wald chi2(1)       =      0.84
                                                Prob > chi2        =    0.3607

------------------------------------------------------------------------------
           D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .2566701   .0598694     4.29   0.000     .1393282    .3740119
-------------+----------------------------------------------------------------
ar           |
          L1 |  -.1353576   .1480772    -0.91   0.361    -.4255836    .1548684
-------------+----------------------------------------------------------------

Exhibit 4
Logit estimates                                   Number of obs   =         19
                                                  LR chi2(1)      =      14.17
                                                  Prob > chi2     =     0.0002
Log likelihood = -3.865933                        Pseudo R2       =     0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |  -.5955806   .4474039    -1.33   0.183    -1.472476    .2813149
       _cons |   1184.766   889.9979     1.33   0.183    -559.5983    2929.129
------------------------------------------------------------------------------

note: t dropped

Logit estimates                                   Number of obs   =         19
                                                  LR chi2(1)      =      14.17
                                                  Prob > chi2     =     0.0002
Log likelihood = -3.865933                        Pseudo R2       =     0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5955806   .4474039     1.33   0.183    -.2813149    1.472476
       _cons |  -7.586783   5.772359    -1.31   0.189     -18.9004    3.726832
------------------------------------------------------------------------------

[Figure 1 plot: Y (vertical axis, 40 to 90) against year (horizontal axis, 1850 to 2000).]
Figure 1. Female life expectancy (in years) in best practicing country by calendar year.
Source: Oeppen and Vaupel (2000).

[Figure 2 plot: D (vertical axis, -4 to 6) against year (horizontal axis, 1850 to 2000).]
Figure 2. First differences in record life expectancy versus year.

Figure 3. Female life expectancy in five countries compared with the trend in record life
expectancy. Source: Oeppen and Vaupel (2000).
Figure 4. Record female life expectancy from 1840 to the present. The linear-regression trend
is depicted by a bold black line and the extrapolated trend by a dashed gray line. The
horizontal black lines show asserted ceilings on life expectancy, with a short vertical line
indicating the year of publication. The three dashed red lines denote projections of female life
expectancy in Japan published by the United Nations in 1986, 1999, and 2001: It is
encouraging that the U.N. altered its projection so radically between 1999 and 2001.
Source: Oeppen and Vaupel (2001).