A method for estimating the age-specific mortality pattern in
limited populations of small areas
Anastasia Kostaki
Athens University of Economics and Business
email: kostaki@aueb.gr
Byron Kotzamanis
University of Thessaly, Greece
email: bkotz@prd.uth.gr
The need for detailed and reliable mortality data, e.g.
- for providing population projections
- for the construction of complete life tables
- for the calculation of net reproduction rates
- for age-specific mortality comparisons
- for the construction of complete multiple-decrement tables
Problems in mortality data of small population areas
A. Limited information
- Aggregated data, or
- Incomplete data sets
B. Misleading information (data of low quality)
- Data affected by sources of systematic error (age misstatement, e.g. heaping; under-registration of deaths; etc.)
C. Unstable documentation (small exposed-to-risk populations and low age-specific death counts)
A. Limited information
Incomplete data sets
POSSIBLE WAYS OUT:
- Fit a parametric model to the existing one-year values in order to produce estimates for the missing values and also to smooth the rates, providing closer estimates of the true probabilities underlying the empirical rates.
e.g. 1992: Kostaki, A.: "A Nine-Parameter Version of the Heligman-Pollard Formula". Mathematical Population Studies, Vol. 3, No. 4, pp. 277-288.
or
- Apply a nonparametric graduation technique to the existing one-year values.
e.g.
2005: Kostaki, A., Peristera, P.: "Graduating mortality data using Kernel techniques: Evaluation and comparisons". Journal of Population Research, Vol. 22(2), pp. 185-197.
2009: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S.: "Graduating the age-specific fertility pattern using Support Vector Machines". Demographic Research, Vol. 20(25), pp. 599-622.
2010: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S.: "Support Vector Machines as tools for Mortality Graduations", to appear in Canadian Studies in Population 38, No. 3–4, pp. 37–58.
B. Misleading information
(data of low quality)
A typical problem:
HEAPING: when declaring their age, respondents tend to round it off to a multiple of five.
[Figure: Greece, males 1960. Death counts by age (ages 45-80)]
A way out:
Group the death counts into five-year age groups centred on the multiples of five. Form the five-year death rates for these groups. Then apply an expansion technique in order to estimate the correct one-year rates and/or counts.
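A minimal Python sketch of the grouping step, assuming single-year death counts are held in a dict keyed by age. The function name and the counts (with artificial heaping at ages 50 and 55) are hypothetical. Each age is assigned to the group [c-2, c+2] around the nearest multiple of five c, so deaths pulled towards an age ending in 0 or 5 from its neighbours stay inside the same group.

```python
def group_centred_on_fives(deaths_by_age: dict[int, float]) -> dict[int, float]:
    """Sum single-year death counts into five-year groups [c-2, c+2] centred on multiples of five c."""
    grouped: dict[int, float] = {}
    for age, count in deaths_by_age.items():
        # Each age goes to the group whose centre is the nearest multiple of five
        # (no ties can occur for integer ages).
        centre = 5 * round(age / 5)
        grouped[centre] = grouped.get(centre, 0.0) + count
    return grouped


# Hypothetical single-year counts with heaping at ages 50 and 55.
deaths = {48: 90, 49: 85, 50: 160, 51: 80, 52: 85,
          53: 95, 54: 90, 55: 170, 56: 95, 57: 100}
grouped = group_centred_on_fives(deaths)   # {50: 500.0, 55: 550.0}
```

The grouped counts are the input from which the five-year death rates are formed before expansion.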
TECHNIQUES FOR EXPANSION:
- 1991: Kostaki, A.: "The Heligman-Pollard Formula as a Tool for Expanding an Abridged Life Table". Journal of Official Statistics, Vol. 7, No. 3, pp. 311-323.
- 2000: Kostaki, A., Lanke, J.: "Degrouping mortality data for the elderly". Mathematical Population Studies, Vol. 7(4), pp. 331-341.
- 2000: Kostaki, A.: "A relational technique for estimating the age-specific mortality pattern from grouped data". Mathematical Population Studies, Vol. 9(1), pp. 83-95.
- 2001: Kostaki, A., Panousis, E.: "Methods of expanding abridged life tables: Evaluation and Comparisons". Demographic Research, Vol. 5(1), pp. 1-15. http://www.demographic-research.org/volumes/vol5/1/
These techniques can also be used if the data are provided in five-year age groups.
C. Small exposed-to-risk populations - low death counts
(a sneaky problem)
Unlike problems of types A and B (incompleteness, bad quality), problems of type C are easy for the researcher to overlook, although their impact can lead to seriously misleading results.
Let us consider this problem, which is highly relevant when we deal with spatial (small-area) population analysis or limited population samples.
Denote by ${}_nD_x$ the observed death count in the age interval $[x, x+n)$.
${}_nD_x$ is a binomially distributed random variable with
$$E({}_nD_x) = E_x \cdot {}_nq_x \quad \text{and} \quad \mathrm{Var}({}_nD_x) = E_x \cdot {}_nq_x \cdot {}_np_x,$$
where $E_x$ is the exposed-to-risk population at age x, ${}_nq_x$ is the unknown probability of dying in $[x, x+n)$, and ${}_np_x = 1 - {}_nq_x$.
Let us now consider the observed death rate
$${}_nQ_x = {}_nD_x / E_x.$$
Since ${}_nD_x \sim \mathrm{Bin}(E_x, {}_nq_x)$, while $E_x$ is large and ${}_nq_x$ very small, ${}_nD_x$ can also be considered approximately Poisson distributed, with
$$E({}_nD_x) = \mathrm{Var}({}_nD_x) = E_x \cdot {}_nq_x,$$
and also asymptotically normally distributed. Therefore ${}_nQ_x$ can also be considered asymptotically normally distributed, with
$$E({}_nQ_x) = {}_nq_x \quad \text{and} \quad \mathrm{Var}({}_nQ_x) = \frac{{}_nq_x(1 - {}_nq_x)}{E_x}.$$
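To see how close the Poisson approximation is for counts of the size discussed here, the following Python sketch compares the binomial and Poisson probabilities directly. The exposure $E_x = 1464$ and the rate 6/1464 are borrowed from the age-55 example of this section; the helper names are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(D = k) for D ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(D = k) for D ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


exposed = 1464        # E_x (age-55 example)
q = 6 / 1464          # illustrative value of nq_x
lam = exposed * q     # Poisson mean E_x * nq_x

# Largest pointwise gap between the two distributions over counts 0..30.
max_gap = max(abs(binomial_pmf(k, exposed, q) - poisson_pmf(k, lam)) for k in range(31))
# Binomial-based variance of nQ_x divided by the Poisson-based one: q(1-q)/E over q/E.
var_ratio = (q * (1.0 - q) / exposed) / (q / exposed)
```

The pointwise gap stays below 0.01 and the variance ratio equals $1 - {}_nq_x$, which is why the two forms of $\mathrm{Var}({}_nQ_x)$ are practically interchangeable when ${}_nq_x$ is small.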
Hence, the unknown probability of dying in the age interval $[x, x+n)$, ${}_nq_x$, is expected to have a value in the interval
$${}_nQ_x \pm 3\sqrt{\frac{{}_nQ_x(1 - {}_nQ_x)}{E_x}}$$
or, more simply,
$${}_nQ_x \pm 3\sqrt{\frac{{}_nQ_x}{E_x}}.$$
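The interval above can be evaluated with a few lines of Python. This is a minimal sketch (the function name is hypothetical) of the normal-approximation interval ${}_nQ_x \pm z\sqrt{{}_nQ_x(1-{}_nQ_x)/E_x}$, with $z = 3$ as in the formula and $z = 1.96$ for a 95% interval, applied to the Eurytania counts used in the examples.

```python
import math


def wald_interval(deaths: int, exposed: int, z: float = 3.0) -> tuple[float, float]:
    """Normal-approximation interval Q +/- z*sqrt(Q(1-Q)/E) for the probability of dying."""
    q_hat = deaths / exposed                              # observed rate nQ_x
    half_width = z * math.sqrt(q_hat * (1.0 - q_hat) / exposed)
    return q_hat - half_width, q_hat + half_width


low, high = wald_interval(6, 1464)                 # age 55, z = 3
low95, high95 = wald_interval(6, 1464, z=1.96)     # age 55, 95%
zero_case = wald_interval(0, 1888)                 # age 10, no deaths observed
```

For age 55 this gives roughly (-0.0009, 0.0091) with z = 3 and (0.0008, 0.0074) with z = 1.96; with zero deaths the interval collapses to (0.0, 0.0), so it says nothing about ${}_5q_{10}$.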
What does all this mean in practice?
Consider a real example: in Eurytania, Greece, the exposed population at age 10 is E10 = 1888, while the death count at age 10 is 5D10 = 0. Thus the observed death rate 5Q10 at age 10 is zero!
Does this mean that 5q10 = 0? Unfortunately not!
=> We are not able to provide estimates for the value of 5q10.
Another example from the same population:
The death count at age 55 is D55 = 6 and the exposed population is E55 = 1464.
Thus Q55 = 6/1464 => Q55 = 0.0041.
Calculating the confidence interval above, we conclude that:
-0.0009 < q55 < 0.0091 !!!
i.e. the interval even extends to negative values.
Qx can be a highly inaccurate estimator of qx when Dx is small.
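The variance formula makes this point quantitative: the relative standard error of ${}_nQ_x$ is $\sqrt{(1-{}_nq_x)/(E_x\,{}_nq_x)} \approx 1/\sqrt{E_x\,{}_nq_x}$, i.e. roughly one over the square root of the expected number of deaths. A minimal Python sketch (function name hypothetical), plugging in the observed rate for the unknown ${}_nq_x$:

```python
import math


def coefficient_of_variation(deaths: int, exposed: int) -> float:
    """Relative standard error of Q = D/E, using Var(Q) = Q(1 - Q)/E with Q = D/E plugged in."""
    if deaths == 0:
        raise ValueError("undefined when no deaths are observed")
    q_hat = deaths / exposed
    return math.sqrt(q_hat * (1.0 - q_hat) / exposed) / q_hat


cv_55 = coefficient_of_variation(6, 1464)   # age-55 example
approx = 1 / math.sqrt(6)                   # 1/sqrt(D) rule of thumb
```

For D55 = 6 both give about 0.41: the standard error is roughly 40% of the estimate itself, and about 400 deaths would be needed to bring it down to around 5%.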
Alternatively, a more properly defined CI can be derived as follows. We know that
$$P\left(\frac{(D_x - E_x q_x)^2}{E_x q_x (1 - q_x)} \le \chi^2_{1,\,1-a}\right) = 1 - a,$$
where $\chi^2_{1,\,1-a}$ is the $(1-a)$ quantile of the chi-square distribution with 1 degree of freedom, so that $\chi^2_{1,\,1-a} = z^2_{1-a/2}$. This is equivalent to
$$P\left(E_x q_x^2 \left(E_x + z^2_{1-a/2}\right) - E_x q_x \left(2D_x + z^2_{1-a/2}\right) + D_x^2 \le 0\right) = 1 - a \;\Leftrightarrow\; P\left(\phi(q_x) \le 0\right) = 1 - a.$$
The inequality in the probability statement is satisfied for those values of $q_x$ which lie between the roots $q_{x,1}$, $q_{x,2}$ of the quadratic equation $\phi(q_x) = 0$.
We therefore have
$$P\left(q_{x,1} \le q_x \le q_{x,2}\right) = 1 - a,$$
and $[q_{x,1}, q_{x,2}]$ forms a CI for $q_x$, where $q_{x,1}$, $q_{x,2}$ are given by
$$q_{x,1},\; q_{x,2} = \frac{\left(2D_x + z^2_{1-a/2}\right) \mp z_{1-a/2}\left[z^2_{1-a/2} + 4D_x\left(1 - D_x/E_x\right)\right]^{1/2}}{2\left(E_x + z^2_{1-a/2}\right)}.$$
2002: Garthwaite, P.H., Jolliffe, I.T., Jones, B.: Statistical Inference, 2nd edition. Oxford Science Publications.
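The roots above are those of what is usually called the Wilson score interval. A minimal Python sketch (function name hypothetical) evaluating them, with $z = 3$ by default to match the earlier normal-approximation interval:

```python
import math


def score_interval(deaths: int, exposed: int, z: float = 3.0) -> tuple[float, float]:
    """Roots of phi(q) = E q^2 (E + z^2) - E q (2D + z^2) + D^2 = 0 (Wilson score interval)."""
    z2 = z * z
    centre = 2 * deaths + z2
    spread = z * math.sqrt(z2 + 4 * deaths * (1 - deaths / exposed))
    denom = 2 * (exposed + z2)
    return (centre - spread) / denom, (centre + spread) / denom


low, high = score_interval(6, 1464)                 # age 55, z = 3
zero_low, zero_high = score_interval(0, 1888)       # age 10, no deaths observed
```

For age 55 the bounds come out at about 0.00129 and 0.0130, both positive. Unlike the normal-approximation interval, the score interval remains informative when no deaths are observed: for E10 = 1888 and 5D10 = 0 it gives 0 as the lower bound and about 0.0047 as the upper bound.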
Considering the previous example:
The count of deaths at age 55, D55, is 6 and the exposed population, E55, is equal to 1464, so
Q55 = 6/1464 => Q55 = 0.0041.
Putting these values in Q55 ± z1-a/2 √(Q55(1-Q55)/E55), with z1-a/2 = 3, we obtained:
-0.0009 < q55 < 0.0091 (95% CI: 0.0008, 0.0074),
while now, using the CI defined above, we obtain:
0.0012 < q55 < 0.0130 (95% CI: 0.0014, 0.0112).
Possible ways out:
- Use wide age intervals (five-year or ten-year ones)
- Consider wide periods of investigation (up to ten-year periods)
- Utilize an expansion technique for estimating the unknown age-specific probabilities from grouped data.
The latter can also be a cure for problems of types A and B.
In what follows, a technique for estimating the age-specific death counts from data given in age groups is presented.
A technique for estimating age-specific death counts from data given in age groups
Consider the empirical death count ${}_5d_x$ for the five-year age interval $[x, x+5)$, $x = 0, 5, 10, \ldots, \omega - 5$.
Then the five-year death probabilities ${}_5q_x$, $x = 0, 5, 10, \ldots$, are calculated by
$${}_5q_x = \frac{{}_5d_x}{\sum_{y \ge x} {}_5d_y},$$
where the summation is restricted to multiples of five.
Taking the exposed-to-risk population at a given age to be the sum of all deaths at and above that age is strictly valid only when the data concern a closed cohort. However, as will be demonstrated, this procedure produces excellent results.
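A minimal Python sketch of this step, assuming the grouped counts ${}_5d_x$ are given as a list ordered by age with the last group closing the table (and containing at least one death). The function name and the counts for ages 60-64, ..., 85-89 are hypothetical.

```python
def five_year_q(grouped_deaths: list[float]) -> list[float]:
    """5q_x = 5d_x / sum_{y >= x} 5d_y for consecutive five-year groups (ages in steps of five)."""
    # Deaths at and above the start of each group, used as the exposed-to-risk population.
    totals_at_or_above = [sum(grouped_deaths[i:]) for i in range(len(grouped_deaths))]
    return [d / total for d, total in zip(grouped_deaths, totals_at_or_above)]


# Hypothetical grouped deaths for ages 60-64, 65-69, ..., 85-89.
q5 = five_year_q([100, 150, 200, 250, 200, 100])
```

The first value is 100/1000 = 0.1 and the last group always gets ${}_5q_x = 1$, as expected when everyone in the table eventually dies.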

Let us consider the set of abridged ${}_5q_x$ values as calculated above. The next step is to expand them. For that, let us also consider a set of one-year probabilities $q_x^{(S)}$ (S for Standard) of a standard complete life table.
Under the assumption that the force of mortality $\mu(x)$ underlying the target abridged life table is, at each age of the five-year age interval $[x, x+5)$, a constant multiple of the one underlying the standard life table in the same age interval, $\mu^{(S)}(x)$, i.e.
$$\mu(x) = {}_5K_x \cdot \mu^{(S)}(x),$$
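the one-year probabilities $q_{x+i}$, $i = 0, 1, \ldots, 4$, for each age in the five-year age interval satisfy $1 - q_{x+i} = \exp\left\{-\int_{x+i}^{x+i+1} \mu(t)\,dt\right\} = \left(1 - q^{(S)}_{x+i}\right)^{{}_5K_x}$ and can therefore be calculated using
$$q_{x+i} = 1 - \left(1 - q^{(S)}_{x+i}\right)^{{}_5K_x}, \qquad (1)$$
where
$${}_5K_x = \frac{\ln\left(1 - {}_5q_x\right)}{\sum_{i=0}^{4} \ln\left(1 - q^{(S)}_{x+i}\right)}.$$
An inherent property of the new technique is that its results fulfill the desired relation
$$1 - \prod_{i=0}^{4} \left(1 - q_{x+i}\right) = {}_5q_x.$$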
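Equation (1) and the definition of ${}_5K_x$ can be sketched in Python as follows. This is a minimal illustration, assuming $0 < {}_5q_x < 1$ (so it does not apply to the closing group, where ${}_5q_x = 1$); the function name, the standard probabilities for ages 60-64 and the value ${}_5q_{60} = 0.08$ are all hypothetical.

```python
import math


def expand_five_year_q(q5: float, q_standard: list[float]) -> list[float]:
    """Split 5q_x into q_{x+i}, i = 0..4, assuming mu(x) = 5K_x * mu_S(x) within [x, x+5)."""
    if len(q_standard) != 5:
        raise ValueError("need the five standard one-year probabilities q^(S)_{x+i}")
    if not 0.0 < q5 < 1.0:
        raise ValueError("5q_x must lie strictly between 0 and 1")
    # 5K_x = ln(1 - 5q_x) / sum_i ln(1 - q^(S)_{x+i})
    k = math.log(1.0 - q5) / sum(math.log(1.0 - qs) for qs in q_standard)
    # Equation (1): q_{x+i} = 1 - (1 - q^(S)_{x+i})^(5K_x)
    return [1.0 - (1.0 - qs) ** k for qs in q_standard]


# Hypothetical standard one-year probabilities for ages 60..64 and a hypothetical 5q_60.
standard = [0.010, 0.011, 0.012, 0.013, 0.014]
q_one_year = expand_five_year_q(0.08, standard)
survival = math.prod(1.0 - q for q in q_one_year)   # should equal 1 - 5q_60
```

Because ${}_5K_x$ is chosen from ${}_5q_x$ itself, $1 - \prod(1 - q_{x+i})$ reproduces 0.08 up to rounding error whatever standard is used; the standard only shapes how the five-year probability is shared among the single ages.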
[Figure: Italy, males 1990-91, plotted by age x (ages 0-100)]
[Figure: Norway, males 1951-55, plotted by age x (ages 0-80)]
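Now, for the ages that are multiples of five, $\hat d_x$ is calculated using
$$\hat d_x = \hat q_x \sum_{y \ge x} {}_5d_y,$$
while for the remaining ages we calculate $\hat d_{x+i}$ using
$$\hat d_{x+i} = \prod_{j=0}^{i-1} \left(1 - \hat q_{x+j}\right) \hat q_{x+i} \sum_{y \ge x} {}_5d_y, \qquad i = 1, 2, 3, 4,$$
where, as before, the summation is restricted to multiples of five.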
It is interesting to observe that the resulting $\hat d_x$ fulfill the desirable property
$$\sum_{i=0}^{4} \hat d_{x+i} = {}_5d_x.$$
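The whole chain for one five-year group (grouped rate, expansion with a standard table, single-year deaths) can be sketched in Python. This is a minimal, self-contained illustration assuming $0 < {}_5d_x < \sum_{y \ge x} {}_5d_y$; the function name, the standard probabilities and the counts are hypothetical.

```python
import math


def expand_group_deaths(d5: float, total_at_or_above: float,
                        q_standard: list[float]) -> list[float]:
    """Estimate single-year deaths d_hat_{x+i}, i = 0..4, from the group count 5d_x."""
    q5 = d5 / total_at_or_above                                             # 5q_x
    k = math.log(1.0 - q5) / sum(math.log(1.0 - qs) for qs in q_standard)   # 5K_x
    q_hat = [1.0 - (1.0 - qs) ** k for qs in q_standard]                    # q_hat_{x+i}
    estimates: list[float] = []
    surviving = 1.0   # prod_{j<i} (1 - q_hat_{x+j}); the empty product for i = 0
    for q in q_hat:
        estimates.append(surviving * q * total_at_or_above)
        surviving *= 1.0 - q
    return estimates


# Hypothetical: 1000 deaths at ages >= 60, of which 100 in [60, 65).
d_hat = expand_group_deaths(100.0, 1000.0, [0.010, 0.011, 0.012, 0.013, 0.014])
```

The five estimates add up to the observed group count of 100 (up to rounding error), which is the property stated above.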
[FIGURE 1: Swedish males, 1976. Number of deaths by age (60-110): observed vs. estimated]
[FIGURE 2: Swedish males, 1990. Number of deaths by age (60-110): observed vs. estimated]
[FIGURE 3: Swedish females, 1949. Number of deaths by age (60-110): observed vs. estimated]
[FIGURE 4: Swedish females, 1984. Number of deaths by age (60-110): observed vs. estimated]
Comments and guidelines
- The results are good in the sense that they are very close to the true values and also fulfill desirable properties.
- The choice of the standard table does not affect the results. Use any complete life table you have at hand!
- The technique is easy to apply, even with the simplest software.
Thank you!
References
2002: Garthwaite, P.H., Jolliffe, I.T., Jones, B.: Statistical Inference, 2nd edition. Oxford Science Publications.
2001: Karlis, D., Kostaki, A.: "Bootstrap techniques for mortality models". Biometrical Journal, Vol. 44(7), pp. 850-866.
1992: Kostaki, A.: "A Nine-Parameter Version of the Heligman-Pollard Formula". Mathematical Population Studies, Vol. 3, No. 4, pp. 277-288.
2005: Kostaki, A., Peristera, P.: "Graduating mortality data using Kernel techniques: Evaluation and comparisons". Journal of Population Research, Vol. 22(2), pp. 185-197.
2009: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S.: "Graduating the age-specific fertility pattern using Support Vector Machines". Demographic Research, Vol. 20(25), pp. 599-622.
2010: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S.: "Support Vector Machines as tools for Mortality Graduations", to appear in Canadian Studies in Population 38, No. 3–4, pp. 37–58.