Methodology employed in the calculation of the mortality tables of the population in Spain 1992-2005
Madrid, July 2007
Index
1 Introduction
2 Obtaining death probability series
3 Obtaining derivative series
4 Obtaining prospective series
5 Synthesis of the smoothing procedure employed
6 Tables calculated and base information used
Glossary of symbols
1 Introduction
Mortality tables are compiled to measure the incidence of mortality on the population under study, independently of its age structure.
The type of table used here is based on a cross-sectional (period) analysis of mortality, which examines how mortality affects the population, classified by age or age group, at a given moment in time.
Since mortality usually evolves without abrupt changes, these tables provide an acceptable description of the phenomenon for short periods of time close to the moment they refer to.
To calculate the functions of a complete mortality table, information is needed on both deaths and population, classified by age and referring to the same time period.
Since the numbers of deaths classified by age are quite small (except for the oldest age groups), not only in provinces and Autonomous Communities but also at national level, counting errors and occasional disturbances that may exceptionally affect mortality in a given year have a notable bearing on this information. These anomalies must therefore be eliminated, since if they remained in the data they would give an incorrect picture of the phenomenon under study. This is done in a first stage: to calculate the mortality table for a given moment in time, the deaths for each age group are averaged over a specific number of years (generally two to four) around that moment.
In a second stage, the distortions caused by age misstatement must be eliminated, both in the number of deaths and in the population. These errors inflate the values observed at certain ages to the detriment of the neighbouring ages, distorting the series of death probabilities in the mortality table. This problem is usually avoided by applying a smoothing procedure to the original data.
2 Obtaining death probability series
Death probability at age x, $q_x$, is defined as the probability that a person of a given generation who is exactly x years old dies before reaching age x+1. It is therefore necessary to consider both the possible cases, that is, the persons who could die, and the actual events, that is, the persons of that age and generation who have actually died. The possible cases are the persons aged x, calculated as the inhabitants of that age at the end of the year plus half of the persons who died aged x during the year in question, since deaths are assumed to be distributed uniformly throughout the year. Accepting the hypothesis that the deaths of a given generation at age x occur half in one year and half in the following year, the death probability is expressed as:
$$q_x = \frac{\tfrac{1}{2}\left(D^{z}_{x} + D^{z+1}_{x}\right)}{P^{z}_{x} + \tfrac{1}{2}D^{z}_{x}}$$
where:
$D^{z}_{x}$ represents deaths that occurred in year z aged x.
$D^{z+1}_{x}$ represents deaths that occurred in year z+1 aged x.
$P^{z}_{x}$ population on December 31st of year z aged x.
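As an illustration of the formula above, a minimal Python sketch; the function name and the sample figures are hypothetical, and the three inputs are assumed to be already averaged and smoothed counts for a single age x:

```python
def death_probability(deaths_z: float, deaths_z1: float, population_dec31_z: float) -> float:
    """q_x = (1/2)(D^z_x + D^{z+1}_x) / (P^z_x + (1/2) D^z_x), used for ages 2..90."""
    # Possible cases: population at the end of year z plus half the deaths of year z.
    exposed = population_dec31_z + 0.5 * deaths_z
    if exposed <= 0:
        raise ValueError("no persons exposed to risk")
    # Half of the generation's deaths at age x falls in year z and half in year z+1.
    return 0.5 * (deaths_z + deaths_z1) / exposed
```

For example, `death_probability(10, 14, 995)` returns 0.012, that is, 12 expected deaths over 1,000 possible cases.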
This expression has been used to calculate $q_x$ for all ages from two to ninety years old, both inclusive.
Since the deaths of children under one year old occur mainly during the first weeks of life, the hypothesis of a uniform distribution over the year cannot be applied to them. Therefore, for this age, the death probability has been calculated using:
$$q_0 = \frac{D^{z}_{0,g(z)} + D^{z+1}_{0,g(z)}}{P^{z}_{0} + D^{z}_{0,g(z)}}$$
where:
$D^{z}_{0,g(z)}$ deaths that occurred in year z, aged 0, among the generation born that year.
$D^{z+1}_{0,g(z)}$ deaths that occurred in year z+1, aged 0, among the generation born the previous year.
$P^{z}_{0}$ population on December 31st of year z aged 0.
Likewise, for children aged one year old, $q_1$ has been calculated using:
$$q_1 = \frac{D^{z}_{1,g(z-1)} + D^{z+1}_{1,g(z-1)}}{P^{z}_{1} + D^{z}_{1,g(z-1)}}$$
where:
$D^{z}_{1,g(z-1)}$ deaths in year z, aged 1, from generation z-1.
$D^{z+1}_{1,g(z-1)}$ deaths in year z+1, aged 1, from generation z-1.
$P^{z}_{1}$ population on December 31st of year z aged 1.
The low number of deaths registered for persons over ninety years old, together with the greater impact of age misstatement at these ages, distorts the death probability series for those ages. Therefore, their death probabilities have been estimated by fitting a third-degree (cubic) parabola, by least squares, to the $q_x$ calculated with the previous expression for x = 90, 91, 92, 93 and 94.
The following conditions were imposed on this adjustment: a) the cubic parabola passes through the point $q_{90}$, which ensures continuity between the adjusted $q_x$ and those calculated for ages under 90; b) $q_{110} = 1$, the parabola being truncated from this point, which means that, a priori, there are no survivors over one hundred and ten years old; and c) the cubic parabola has a tangent parallel to the x axis at x = 110, which implies an accelerated increase in mortality from the point of inflection of the cubic parabola onwards, given the high mortality at ages around one hundred and ten.
3 Obtaining derivative series
The death probability series provides the mortality table functions described below.
PROBABILITY OF LIFE OR SURVIVAL AT AGE x, $p_x$
The probability of surviving from exact age x to exact age x+1. Therefore, for each age x,
$$p_x = 1 - q_x$$
SURVIVORS AGED x YEARS OLD, $l_x$
Number of persons, out of the initial $l_0$ in the mortality table, who reach exact age x. Therefore, for each age x,
$$l_x = l_{x-1}\, p_{x-1}$$
As is customary, the tables use $l_0 = 100{,}000$.
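As a minimal sketch (not the production code behind these tables), the Python below completes the $q_x$ series above age 90 with the constrained cubic described for ages over ninety and then derives $p_x$ and $l_x$. It assumes the observed $q_{90}, \ldots, q_{94}$ are supplied in a dict keyed by age; the names `fit_old_age_q` and `survivorship` are hypothetical. Writing the cubic as $q(x) = 1 + A t^2 + d t^3$ with $t = x - 110$ enforces conditions b) and c) by construction; condition a) then fixes $A$, leaving a single coefficient $d$ to estimate by least squares.

```python
def fit_old_age_q(q_obs: dict[int, float]) -> dict[int, float]:
    """Constrained least-squares cubic for ages 90..110 (hypothetical helper).

    With t = x - 110, q(x) = 1 + A*t**2 + d*t**3 gives q(110) = 1 and q'(110) = 0.
    Requiring q(90) = q_obs[90] yields A = (q90 - 1)/400 + 20*d, i.e.
    q(x) = 1 + (q90 - 1)*t**2/400 + d*t**2*(t + 20).
    """
    q90 = q_obs[90]

    def base(x: int) -> float:
        t = x - 110
        return 1.0 + (q90 - 1.0) * t * t / 400.0

    def shape(x: int) -> float:
        t = x - 110
        return float(t * t * (t + 20))  # zero at x = 90 and at x = 110

    fit_ages = range(90, 95)  # observed q_x used for the fit: ages 90..94
    num = sum((q_obs[x] - base(x)) * shape(x) for x in fit_ages)
    den = sum(shape(x) ** 2 for x in fit_ages)
    d = num / den  # one-parameter least-squares estimate
    # Evaluate for ages 90..110; the curve is truncated beyond 110.
    return {x: base(x) + d * shape(x) for x in range(90, 111)}


def survivorship(q: list[float], radix: float = 100_000.0) -> tuple[list[float], list[float]]:
    """p_x = 1 - q_x and l_x = l_{x-1} * p_{x-1}, starting from l_0 = radix."""
    p = [1.0 - qx for qx in q]
    l = [radix]
    for x in range(1, len(q)):
        l.append(l[x - 1] * p[x - 1])
    return p, l
```

To assemble a full series, one would take $q_0$, $q_1$ and $q_2, \ldots, q_{90}$ from the expressions of Section 2, append the fitted values for ages 91 to 110, and pass the resulting list to `survivorship`.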
THEORETICAL DEATHS AGED x YEARS OLD, $d_x$
Deaths occurring between the exact ages x and x+1 in the mortality table. Therefore, for each age x,
$$d_x = l_x\, q_x = l_x - l_{x+1}$$
LIFE EXPECTANCY AT AGE x, $e_x$
Average number of years that a person of exact age x is expected to live, considering all the survivors who reach that age and assuming that the total number of years they live is shared equally among them.
Assuming that the persons who die at a given age live, on average, half of the year in which they die, life expectancy is calculated as
$$e_x = \frac{1}{2} + \frac{1}{l_x}\sum_{i=x+1}^{\omega} l_i$$
with $\omega$ representing the oldest age, for which there are supposedly no survivors.
4 Obtaining prospective series
In addition to the classical biometric series or functions above, it was considered very important to include the two prospective series specified below.
SURVIVORS AGED x YEARS OLD, $L_x$
Represents the number of survivors in the mortality table who are x years old, that is, aged between the exact ages x and x+1. This function has been estimated using the following formula (see Keyfitz, Introduction to the Mathematics of Population, Addison-Wesley):
$$L_x = \frac{13}{24}\left(l_x + l_{x+1}\right) - \frac{1}{24}\left(l_{x-1} + l_{x+2}\right), \qquad x = 1, 2, \ldots, 98.$$
For the remaining ages:
$$L_0 = a_0\, l_0 + a_1\, l_1, \qquad \text{where } a_0 + a_1 = 1,$$
in which
$$a_0 = \frac{D^{z+1}_{0,g(z)}}{D^{z+1}_{0,g(z)} + D^{z+1}_{0,g(z+1)}}$$
where $D^{z+1}_{0,g(z)}$ represents the deaths of children under one year old that occurred in year z+1 among those born in generation g(z), and $D^{z+1}_{0,g(z+1)}$ the corresponding deaths among those born in generation g(z+1).
For x = 99 and x = 100:
$$L_{99} = e_{99}\, l_{99} - e_{100}\, l_{100}$$
$$L_{100} = e_{100}\, l_{100}$$
where $L_{100}$ refers to the survivors aged 100 years old and older.
PROBABILITY OF SURVIVAL AT x YEARS OLD, $T_x$
The probability that persons aged x years old survive to be aged x+1. It is easily obtained from the $L_x$ series using
$$T_x = \frac{L_{x+1}}{L_x}$$
and, for the population aged 99 years old and older, the probability of reaching 100 years old or over is
$$T_{99} = \frac{L_{100}}{L_{99} + L_{100}}$$
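All of the remaining functions follow from $l_x$. The sketch below is a hypothetical helper written for illustration: it assumes `l` holds $l_x$ for exact ages 0 to $\omega$ (at least up to 100) and that $a_0$ has already been computed from the infant deaths. It applies $d_x = l_x - l_{x+1}$, the half-year formula for $e_x$, the Keyfitz formula for $L_x$ with the special cases for ages 0, 99 and 100 and over, and the ratios for $T_x$.

```python
import math


def life_table_functions(l: list[float], a0: float):
    """Hypothetical helper: d_x, e_x, L_x and T_x from l_x for exact ages 0..omega.

    Assumes len(l) >= 101 (ages 0..100 at least) and that a1 = 1 - a0.
    """
    if len(l) < 101:
        raise ValueError("l must cover exact ages 0..100 at least")
    omega = len(l) - 1

    # d_x = l_x - l_{x+1}; nobody survives beyond the last age in the list.
    d = [l[x] - (l[x + 1] if x < omega else 0.0) for x in range(omega + 1)]

    # e_x = 1/2 + (1/l_x) * sum_{i=x+1}^{omega} l_i (undefined where l_x = 0).
    e = [0.5 + sum(l[x + 1:]) / l[x] if l[x] > 0 else math.nan
         for x in range(omega + 1)]

    L = [0.0] * 101
    L[0] = a0 * l[0] + (1.0 - a0) * l[1]  # L_0 = a0*l0 + a1*l1
    for x in range(1, 99):  # Keyfitz formula for x = 1..98
        L[x] = 13 / 24 * (l[x] + l[x + 1]) - 1 / 24 * (l[x - 1] + l[x + 2])
    L[99] = e[99] * l[99] - e[100] * l[100]
    L[100] = e[100] * l[100]  # open interval: aged 100 and over

    # T_x = L_{x+1} / L_x for x = 0..98, and T_99 = L_100 / (L_99 + L_100).
    T = [L[x + 1] / L[x] for x in range(99)]
    T.append(L[100] / (L[99] + L[100]))
    return d, e, L, T
```

With a hypothetical linear survivor curve $l_x = 1000\,(110 - x)$, the sketch gives $e_0 = 55$ and $L_{50} = 59{,}500$, the midpoint of $l_{50}$ and $l_{51}$, as expected for a straight line.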
5 Synthesis of the smoothing procedure employed
Both the population stocks obtained from population censuses and register renewals and the data on deaths obtained from the Vital Statistics sometimes contain errors caused by flaws in the ages stated by respondents. These errors increase the values at some ages to the detriment of neighbouring ages, which distorts the death probability series calculated. To avoid this problem, the original data must be smoothed before they are used.
The smoothing procedure applied to the original data was the Variate Difference Method, which the National Statistics Institute had already used to compile the previous complete mortality tables. A complete explanation of its application, with an extensive bibliography, can be found in G. Tintner, The Variate Difference Method (1940), in the Cowles Commission series. The following paragraphs briefly explain the foundations of the procedure.
The basic hypothesis of the method is that the observed series is the additive superposition of two other series: one expresses the correct or expected value for each age x, and the other the random distortion that alters the observed value. In this case, the latter is the sum of all the causes and circumstances that lead persons to state an incorrect age.
Therefore, the model is:
$$y_x = u_x + e_x$$
where, for each age x:
$y_x$ is the observed value.
$u_x$ is the expected or correct value.
$e_x$ is the error or random distortion.
In this application, the values $u_x$ are assumed to follow a slow trend, without sharp zigzags, and the random errors are assumed to be uncorrelated. This hypothesis could be relaxed, given the non-correlation of the random errors.
A second essential hypothesis, which is what allows the Variate Difference Method to be applied, is that the expected value $u_x$ is a polynomial of degree n, where n is unknown. The Variate Difference Method determines n; once n has been obtained, a polynomial of that degree is fitted to the observed data $y_x$. In this respect, there is a close relationship between the moving average method and the variate difference method.
Specifically, M. G. Kendall (A Theorem in Trend Analysis, Biometrika, vol. 48, 1961; Advanced Theory of Statistics) has proven that moving average calculations result from applying the variate difference method to a linear combination of some of the successive observed values $y_x$. More precisely, every moving average formula that fits a polynomial of degree p - 1 to 2K + 1 successive terms can be written with 2K - p + 1 numbers $b_i$ (with $p - K \le i \le K$), so that
$$\hat{u}_x = y_x - \Delta^p\left(\sum_{i=p-K}^{K} b_i\, y_{x+i}\right)$$
where:
$\Delta^p$ difference of order p.
$\hat{u}_x$ the estimated value of $u_x$.
$b_i$ coefficients of the Sheppard smoothing formula.
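To make Kendall's result concrete, the sketch below checks the simplest case relevant here: a straight line (degree p - 1 = 1, so p = 2) fitted to five terms (K = 2), which needs 2K - p + 1 = 3 weights. It assumes that $\Delta^p$ is read as a backward difference, so that the indices $i = p-K, \ldots, K$ keep the window inside the five terms; the weights $b_0 = b_2 = -1/5$ and $b_1 = -3/5$ were derived here by matching coefficients and are not taken from a published Sheppard table. Because differencing is linear, $\Delta^2\left(\sum b_i y_{x+i}\right) = \sum b_i \Delta^2 y_{x+i}$, which is what the code evaluates. Exact fractions make the comparison with the plain five-term mean exact.

```python
from fractions import Fraction


def backward_second_difference(y: list[int], j: int) -> int:
    """Backward difference of order 2 at position j: y_j - 2*y_{j-1} + y_{j-2}."""
    return y[j] - 2 * y[j - 1] + y[j - 2]


def five_term_mean(y: list[int], x: int) -> Fraction:
    """Plain centred mean of y_{x-2}, ..., y_{x+2}."""
    return Fraction(sum(y[x - 2 : x + 3]), 5)


def difference_form(y: list[int], x: int) -> Fraction:
    """u_hat_x = y_x - sum_{i=0}^{2} b_i * (second difference at x+i), p = 2, K = 2."""
    b = [Fraction(-1, 5), Fraction(-3, 5), Fraction(-1, 5)]  # b_0, b_1, b_2 (derived here)
    return y[x] - sum(b[i] * backward_second_difference(y, x + i) for i in range(3))
```

For any integer series and any interior position x, `five_term_mean(y, x)` and `difference_form(y, x)` return the same fraction.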
If the expected value $u_x$ follows a polynomial of degree n, the task is to determine n. To do so, an iterative process calculates the successive finite differences of the series. At some point in this process the expected value $u_x$ disappears, because differencing cancels the polynomial of degree n: its n-th difference is constant, so all subsequent differences are zero. However, since the calculations are performed on the observed values $y_x$, it is necessary to know at which step of this process of successive finite differences the expected value has supposedly been cancelled, leaving only a residue due to the random errors $e_x$. This question can be answered using the following consideration: if a time series contains only a random element, the variances of its successive series of finite differences are equal once each is corrected by dividing by a binomial coefficient (for the k-th difference, $\binom{2k}{k}$), because the series, being random, has no ordering in time. Consequently, the corrected variances of the first and second differences are the same as the variance of the original series.
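The variance comparison can be sketched as follows. This is an illustrative helper, not the code used for the tables: it computes corrected variances of the form $V_k = \sum (\Delta^k y)^2 / \left((N-k)\binom{2k}{k}\right)$ for k = 0, 1, 2, ..., in the spirit of Tintner's method, using deviations from the mean for k = 0 (a choice made for this sketch). For a purely random series all $V_k$ should be roughly equal; for a noise-free polynomial trend of degree n, $V_k$ is exactly zero from k = n + 1 onwards.

```python
from math import comb


def corrected_variances(y: list[float], max_order: int) -> list[float]:
    """V_k = sum((diff^k y)^2) / ((N - k) * C(2k, k)) for k = 0..max_order.

    For k = 0 the deviations from the mean are used (a choice made for this sketch).
    """
    n = len(y)
    if not 0 <= max_order < n:
        raise ValueError("max_order must be between 0 and len(y) - 1")
    mean = sum(y) / n
    series = [v - mean for v in y]
    variances = []
    for k in range(max_order + 1):
        # At this point len(series) == n - k.
        variances.append(sum(s * s for s in series) / (len(series) * comb(2 * k, k)))
        series = [b - a for a, b in zip(series, series[1:])]  # next finite difference
    return variances
```

For uncorrelated noise with unit variance (for example, draws from `random.gauss(0, 1)`), the returned values should all be close to 1. The sketch does not implement the three-standard-error test described in the text; it only produces the variances that test compares.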
This provides a criterion for determining when the expected value $u_x$ has disappeared. If the variance calculated for a certain difference k is equal to that of difference k+1, and to that of difference k+2, and so on, it can be said that the expected value $u_x$ has been cancelled at the k-th difference. In practice, exact equality between two variances is never reached, since there is always a residue of random variation. However, since the criterion is probabilistic, it can be shown that it is sufficient for the difference between the variances of two successive series of finite differences to be smaller than three times its standard error.
When this criterion is applied to the compilation of the mortality tables, the series of expected values always disappears in the first or second differences. This means that the original series are always smoothed with moving averages equivalent to fitting a straight line (n = 1).
Once the degree of the polynomial to be fitted has been determined, it only remains to apply the corresponding weighted moving average, whose weights are the coefficients of Sheppard's smoothing formula.
The type of moving average is determined as follows: if the non-random element or expected value $u_x$ is more or less cancelled in the first or second difference, n = 1 is used, that is, a moving average equivalent to fitting a straight line to a certain number (not determined by the method) of consecutive observed values $y_x$. If the expected value is cancelled in the third or fourth finite differences, n = 2, and a moving average equivalent to fitting a second-degree parabola to a certain number of consecutive observed values is selected. If the non-random element is cancelled in the fifth or sixth differences, n = 3, and a weighted average equivalent to fitting a third-degree (cubic) polynomial to a selected number of consecutive observed values is used, and so on. In general, if the non-random element is cancelled in the k-th finite difference, n = k/2 when k is even, or n = (k+1)/2 when k is odd.
As mentioned above, the moving averages are applied to a specific number of consecutive observed values, appropriately centred. However, this number is not determined by the method; the choice is left to experience and to the specific nature of the problem to be solved. Nevertheless, the number of values included in each average should take into account the length of the main cycle to be cancelled. In this case, moving averages of five consecutive observed values $y_x$ have been used.
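For the case actually used (n = 1 over five consecutive values), fitting a least-squares straight line to five equally spaced points and reading it at the central point gives exactly their arithmetic mean, so the smoother reduces to a centred five-term mean. The sketch below assumes that the two values at each end of the series are left unsmoothed, a simplification made here because the end treatment is not described; the helper names are hypothetical, and `polynomial_degree` restates the rule for n given above.

```python
def polynomial_degree(k: int) -> int:
    """Degree n when the non-random element vanishes at the k-th finite difference."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k // 2 if k % 2 == 0 else (k + 1) // 2


def moving_average_5(y: list[float]) -> list[float]:
    """Centred five-term mean (the n = 1 smoother over five values).

    The first two and last two values are returned unchanged: an assumption of
    this sketch, since the end treatment is not described in the document.
    """
    smoothed = list(y)
    for x in range(2, len(y) - 2):
        smoothed[x] = sum(y[x - 2 : x + 3]) / 5
    return smoothed
```

Applied to a series with a single spike, such as `[0, 0, 10, 0, 0]`, the centre value becomes 2.0, while a series that is already a straight line is left unchanged.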
6 Tables calculated and base information used
The mortality tables have been calculated annually for the population of Spain and its Autonomous Communities for the period 1992-2005. For Ceuta and Melilla, the tables have been obtained for the two cities together, although since 2002 they have been presented separately as two autonomous cities. For every geographical area considered, tables have been obtained for males, females and the total population.
The deaths used each year in the calculation of the mortality tables (national and for the Autonomous Communities) have been obtained, for the male, female and total population tables, as the average of the figures registered by age in the Vital Statistics over two consecutive years: the reference year and the previous year.
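A minimal sketch of this two-year averaging, assuming deaths are stored per calendar year as lists indexed by single year of age (the data layout and the function name are hypothetical):

```python
def two_year_average_deaths(deaths_by_year: dict[int, list[float]], year: int) -> list[float]:
    """Average deaths by age over the reference year and the previous year."""
    current = deaths_by_year[year]
    previous = deaths_by_year[year - 1]
    if len(current) != len(previous):
        raise ValueError("both years must cover the same ages")
    return [(a + b) / 2 for a, b in zip(previous, current)]
```

For instance, with 10 and 14 deaths at some age in the previous and reference years, the averaged figure for that age is 12.0; the averaged series is then smoothed as described in the previous section.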
The irregularities in these original death figures, caused by possible errors in the classification by age, have been removed using the smoothing procedure explained in the previous section.
The populations used, by Autonomous Community, sex and single year of age, referring to 1 January of each year, correspond to the Intercensal Population Estimates between the 1991 and 2001 Population Censuses and to the Current Population Estimates calculated from the latter census.
The figures used are published alongside the
biometric functions of the mortality tables
calculated.
Glossary of symbols
Q(X) = Risk or probability of death between the exact ages X and X+1.
L(X) = Survivors aged exactly X among 100,000 initial persons.
D(X) = Theoretical deaths occurring between the exact ages X and X+1.
E(X) = Life expectancy at exact age X.
LL(X) = Survivors aged X years old.
T(X) = Probability of survival between ages X and X+1 for persons aged X.