A method for estimating the age-specific mortality pattern in limited populations of small areas

Anastasia Kostaki
Athens University of Economics and Business
email: kostaki@aueb.gr

Byron Kotzamanis
University of Thessaly, Greece
email: bkotz@prd.uth.gr

Need for analytical and reliable mortality data, e.g.
- for providing population projections
- for construction of complete life tables
- for calculation of net reproduction rates
- for age-specific mortality comparisons
- for construction of complete multiple decrement tables

Problems in mortality data of small population areas

A. Limited information
   Aggregated data or incomplete data sets
B. Misleading information (data of low quality)
   Data affected by sources of systematic errors (age misstatements such as heaping, under-registration of deaths, etc.)
C. Unstable documentation (small exposed-to-risk population and low age-specific death counts)

A. Limited information: incomplete data sets

POSSIBLE WAYS OUT:

Fit a parametric model to the existing one-year values in order to produce estimates for the missing values and also smooth the rates, providing estimates closer to the true probabilities underlying the empirical rates, e.g.
1992: Kostaki, A. "A Nine-Parameter Version of the Heligman-Pollard Formula". Mathematical Population Studies, Vol. 3, No. 4, pp. 277-288.

or

Apply a nonparametric graduation technique to the existing one-year values, e.g.
2005: Kostaki, A., Peristera, P. "Graduating mortality data using Kernel techniques: Evaluation and comparisons". Journal of Population Research, Vol. 22(2), pp. 185-197.
2009: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S. "Graduating the age-specific fertility pattern using Support Vector Machines". Demographic Research, Vol. 20(25), pp. 599-622.
2010: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S. "Support Vector Machines as tools for Mortality Graduations", to appear in Canadian Studies in Population 38, No. 3–4, pp. 37–58.

B. Misleading information (data of low quality)

A typical problem: HEAPING. When declaring an age, respondents tend to round it to a multiple of five.

[Figure: Greece, males 1960. Death counts by age, ages 45 to 80.]

A way out: Group the death counts in five-year groups whose central ages are the multiples of five. Form the five-year death rates for these groups. Then apply an expanding technique in order to estimate the correct rates and/or counts.

TECHNIQUES FOR EXPANSION:
1991: Kostaki, A. "The Heligman-Pollard Formula as a Tool for Expanding an Abridged Life Table". Journal of Official Statistics, Vol. 7, No. 3, pp. 311-323.
2000: Kostaki, A., Lanke, J. "Degrouping mortality data for the elderly". Mathematical Population Studies, Vol. 7(4), pp. 331-341.
2000: Kostaki, A. "A relational technique for estimating the age-specific mortality pattern from grouped data". Mathematical Population Studies, Vol. 9(1), pp. 83-95.
2001: Kostaki, A., Panousis, E. "Methods of expanding abridged life tables: Evaluation and Comparisons". Demographic Research, Vol. 5(1), pp. 1-15. http://www.demographic-research.org/volumes/vol5/1/

These techniques can also be used if the data are provided in five-year age groups.

C. Small exposed-to-risk populations, low death counts (a sneaky problem)

In contrast to problems of types A and B (incompleteness, bad quality), problems of type C are easy for the researcher to overlook, though they can lead to seriously misleading results.

Let us consider this problem, which is highly relevant when we deal with spatial (small-area) population analysis or limited population samples.

Denote by ${}_nD_x$ the observed death count in the age interval $[x, x+n)$.

${}_nD_x$ is a binomially distributed random variable with

$$\mathrm{E}({}_nD_x) = E_x \, {}_nq_x \quad \text{and} \quad \mathrm{Var}({}_nD_x) = E_x \, {}_nq_x \, {}_np_x$$

where $E_x$ is the exposed-to-risk population at age $x$, ${}_nq_x$ is the unknown probability of dying in $[x, x+n)$, and ${}_np_x = 1 - {}_nq_x$.
Let us now consider the observed death rate, ${}_nQ_x = {}_nD_x / E_x$.

Since ${}_nD_x \sim \mathrm{Bin}(E_x, {}_nq_x)$, while $E_x$ is large and ${}_nq_x$ very small, ${}_nD_x$ can also be considered approximately $\mathrm{Po}(E_x \, {}_nq_x)$, with $\mathrm{E}({}_nD_x) = \mathrm{Var}({}_nD_x) = E_x \, {}_nq_x$, and also asymptotically normally distributed.

Therefore ${}_nQ_x$ can also be considered asymptotically normally distributed, with

$$\mathrm{E}({}_nQ_x) = {}_nq_x \quad \text{and} \quad \mathrm{Var}({}_nQ_x) = \frac{{}_nq_x (1 - {}_nq_x)}{E_x}$$

Hence, the unknown probability of dying in the age interval $[x, x+n)$, ${}_nq_x$, is expected to have a value in the interval

$${}_nQ_x \pm 3 \sqrt{\frac{{}_nQ_x (1 - {}_nQ_x)}{E_x}}$$

or, more simply,

$${}_nQ_x \pm 3 \sqrt{\frac{{}_nQ_x}{E_x}}$$

What does all this mean in practice?

Consider a real example: in Eurytania in Greece, the exposed population at age 10 is $E_{10} = 1888$, while the death count at age 10 is ${}_5D_{10} = 0$. Thus the observed death rate ${}_5Q_{10}$ at age 10 is zero! Does this mean that ${}_5q_{10} = 0$? Unfortunately not!
=> We are not able to provide an estimate for the value of ${}_5q_{10}$.

Another example from the same population: the death count at age 55 is $D_{55} = 6$ and the exposed population is $E_{55} = 1464$. Thus $Q_{55} = 6/1464 = 0.0041$. Calculating the interval above, we conclude that $-0.0009 < q_{55} < 0.0091$ !!! The lower limit is negative, which is impossible for a probability.
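The two Eurytania calculations above can be reproduced with a few lines of Python. This is only a minimal sketch of the normal-approximation interval with the multiplier 3; the function name `normal_interval` and its parameter names are illustrative assumptions, not part of the original material.

```python
import math


def normal_interval(deaths, exposed, z=3.0):
    """Return (low, high) for Q +/- z * sqrt(Q * (1 - Q) / E).

    deaths  : observed death count D_x
    exposed : exposed-to-risk population E_x
    z       : multiplier of the standard error (3 in the text above)
    """
    rate = deaths / exposed  # observed death rate Q_x
    half_width = z * math.sqrt(rate * (1.0 - rate) / exposed)
    return rate - half_width, rate + half_width


# Age 55: D = 6, E = 1464 (values taken from the example above)
low, high = normal_interval(6, 1464)
print(f"{low:.4f} < q55 < {high:.4f}")  # -0.0009 < q55 < 0.0091

# Age 10: D = 0, E = 1888. The interval collapses to the single point 0,
# so it carries no usable information about q10.
print(normal_interval(0, 1888))
```

With zero deaths the estimated standard error is itself zero, which is one way to see why the observed rate cannot be used to estimate the probability at age 10.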
$Q_x$ can be a highly inaccurate estimator of $q_x$ when $D_x$ is small.

Alternatively, a more properly defined CI might be derived from the following. We know that

$$P\left[ \frac{(D_x - E_x q_x)^2}{E_x q_x (1 - q_x)} \le \chi^2_{1-a}(1) \right] = 1 - a$$

Since $\chi^2_{1-a}(1) = z^2_{1-a/2}$, this is equivalent to

$$P\left[ E_x q_x^2 \left(E_x + z^2_{1-a/2}\right) - E_x q_x \left(2 D_x + z^2_{1-a/2}\right) + D_x^2 \le 0 \right] = 1 - a$$

that is, $P\left[ f(q_x) \le 0 \right] = 1 - a$, where $f(q_x)$ denotes the quadratic expression in $q_x$.

The inequality in the probability statement is satisfied for those values of $q_x$ which lie between the roots $q_{x,1}, q_{x,2}$ of the quadratic equation $f(q_x) = 0$.

We therefore have

$$P\left[ q_{x,1} \le q_x \le q_{x,2} \right] = 1 - a$$

and $[q_{x,1}, q_{x,2}]$ forms a CI for $q_x$, where $q_{x,1}, q_{x,2}$ are given by

$$\frac{\left(2 D_x + z^2_{1-a/2}\right) \pm z_{1-a/2} \left[ z^2_{1-a/2} + 4 D_x \left(1 - D_x / E_x\right) \right]^{1/2}}{2 \left(E_x + z^2_{1-a/2}\right)}$$

2002: Garthwaite, P.H., Jolliffe, I.T., Jones, B., Statistical Inference, 2nd edition, Oxford Science Publications.

Considering the previous example: the count of deaths at age 55 is $D_{55} = 6$ and the exposed population is $E_{55} = 1464$, so $Q_{55} = 6/1464 = 0.0041$.

Putting these values into

$$Q_{55} \pm z_{1-a/2} \sqrt{Q_{55} (1 - Q_{55}) / E_{55}}$$

we obtained, with the multiplier 3 used earlier, $-0.0009 < q_{55} < 0.0091$ (95% CI: 0.0008, 0.0074),

while using the CI defined above we obtain $0.0012 < q_{55} < 0.0130$ (95% CI: 0.0014, 0.0112).

Possible ways out…
- Use wide age intervals (five-year or ten-year ones).
- Consider wide periods of investigation (up to ten-year periods).
- Utilize an expansion technique for estimating the unknown age-specific probabilities from grouped data.

The latter can be a cure for problems of types A and B too.

At the outset, a technique for estimating the age-specific death counts from data given in age groups is presented.

A technique for estimating age-specific death counts from data given in age groups

Consider the empirical death count for the five-year age interval $[x, x+5)$, ${}_5d_x$, $x = 0, 5, 10, \ldots, \omega - 5$.

Then the five-year death rates ${}_5q_x$, $x = 0, 5, 10, \ldots$, are calculated by

$${}_5q_x = \frac{{}_5d_x}{\sum_{y \ge x} {}_5d_y}$$

where the summation is restricted to multiples of five.
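As an illustration of the formula for the five-year death rates, the following minimal Python sketch divides each grouped death count by the sum of the counts at that age group and above. The function name `abridged_rates` and the death counts are hypothetical and serve only to show the arithmetic.

```python
def abridged_rates(grouped_deaths):
    """Five-year death rates 5qx = 5dx / sum of 5dy over y >= x.

    grouped_deaths : death counts for the age groups [0,5), [5,10), ...
                     in increasing order of age
    """
    rates = []
    remaining = sum(grouped_deaths)  # deaths at age x and above
    for d in grouped_deaths:
        # Guard against an empty tail (no deaths at or above this age).
        rates.append(d / remaining if remaining > 0 else float("nan"))
        remaining -= d
    return rates


# Hypothetical death counts for [0,5), [5,10), [10,15) and [15,20)
deaths = [10, 2, 3, 85]
rates = abridged_rates(deaths)
print(rates)
```

By construction the rate of the last age group is always 1, because all remaining deaths fall in it.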
Obviously, taking the exposed-to-risk population of a given age to be the sum of deaths at and after that age is precisely valid only when the data concern a closed cohort. However, as will be demonstrated, this procedure produces excellent results.

Let us consider the set of the abridged ${}_5q_x$-values as calculated before. The next step is to expand them. For that, let us also consider a set of one-year probabilities, $q_x^{(S)}$ (S for Standard), of a standard complete life table.

Under the assumption that the force of mortality, $\mu(x)$, underlying the target abridged life table is, at each age of the five-year age interval $[x, x+5)$, a constant multiple of the one underlying the standard life table in the same age interval, $\mu^{(S)}(x)$, i.e.

$$\mu(x) = {}_5K_x \cdot \mu^{(S)}(x)$$

the one-year probabilities $q_{x+i}$, $i = 0, 1, \ldots, 4$, for each age in the five-year age interval can be calculated using

$$q_{x+i} = 1 - \left(1 - q^{(S)}_{x+i}\right)^{{}_5K_x} \qquad (1)$$

where

$${}_5K_x = \frac{\ln(1 - {}_5q_x)}{\sum_{i=0}^{4} \ln\left(1 - q^{(S)}_{x+i}\right)}$$

An inherent property of the new technique is that its results fulfill the desired relation:

$$1 - \prod_{i=0}^{4} (1 - q_{x+i}) = {}_5q_x$$

[Figure: Italy, males 1990-91. Horizontal axis: age x, 0 to 100.]

[Figure: Norway, males 1951-55. Horizontal axis: age x, 0 to 80.]

Now, for the ages $x$ that are multiples of five, $\hat{d}_x$ will be calculated using

$$\hat{d}_x = \hat{q}_x \sum_{y \ge x} {}_5d_y$$

while for the rest of the ages we calculate $\hat{d}_{x+i}$ using

$$\hat{d}_{x+i} = \prod_{j=0}^{i-1} (1 - \hat{q}_{x+j}) \; \hat{q}_{x+i} \sum_{y \ge x} {}_5d_y, \qquad i = 1, 2, 3, 4$$

It is interesting to observe that the resulting $\hat{d}_x$ fulfill the desirable property

$$\sum_{i=0}^{4} \hat{d}_{x+i} = {}_5d_x$$

FIGURE 1: Swedish males, 1976 [number of deaths by age, ages 60 to 110; estimated vs. observed number by age]

FIGURE 2: Swedish males, 1990 [number of deaths by age, ages 60 to 110; estimated vs. observed number by age]
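The expansion step and the allocation of deaths to single ages, whose results these figures compare with observed counts, could be sketched in Python as follows. This is an illustrative sketch only: the function names `expand_group` and `allocate_deaths`, the standard-table values and the other inputs are hypothetical, and the code assumes a five-year probability strictly between 0 and 1 and standard probabilities below 1.

```python
import math


def expand_group(q5, q_std):
    """Split a five-year probability q5 into five one-year probabilities.

    q_std : the five one-year probabilities of the standard life table
            for the same ages x, x+1, ..., x+4
    """
    # 5Kx = ln(1 - 5qx) / sum_i ln(1 - q_std[x+i])
    k = math.log(1.0 - q5) / sum(math.log(1.0 - q) for q in q_std)
    # q[x+i] = 1 - (1 - q_std[x+i]) ** 5Kx
    return [1.0 - (1.0 - q) ** k for q in q_std]


def allocate_deaths(q1, at_risk):
    """One-year death counts for ages x, ..., x+4.

    q1      : one-year probabilities from expand_group
    at_risk : sum of the grouped death counts 5dy over y >= x
    """
    deaths = []
    surviving_share = 1.0  # product of (1 - q) over the earlier ages
    for q in q1:
        deaths.append(surviving_share * q * at_risk)
        surviving_share *= 1.0 - q
    return deaths


# Hypothetical inputs for one five-year age group
q5 = 0.05
q_std = [0.010, 0.011, 0.012, 0.013, 0.014]
at_risk = 2000.0

q1 = expand_group(q5, q_std)
d1 = allocate_deaths(q1, at_risk)
print(q1)
print(d1, sum(d1))
```

In this sketch the five estimated counts add up to `q5 * at_risk`, which is the grouped death count of the interval, in line with the property stated for the estimated death counts.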
FIGURE 3: Swedish females, 1949 [number of deaths by age, ages 60 to 110; estimated vs. observed number by age]

FIGURE 4: Swedish females, 1984 [number of deaths by age, ages 60 to 110; estimated vs. observed number by age]

Comments and guidelines

- The results are nice in the sense that they are very close to the true values and also fulfill desirable properties.
- The choice of the standard table does not affect the results. Use any complete life table you have at hand!
- Easy to apply using the simplest software.

Thank you!

References

2002: Garthwaite, P.H., Jolliffe, I.T., Jones, B. Statistical Inference, 2nd edition, Oxford Science Publications.
2001: Karlis, D., Kostaki, A. "Bootstrap techniques for mortality models". Biometrical Journal, Vol. 44(7), pp. 850-866.
1992: Kostaki, A. "A Nine-Parameter Version of the Heligman-Pollard Formula". Mathematical Population Studies, Vol. 3, No. 4, pp. 277-288.
2005: Kostaki, A., Peristera, P. "Graduating mortality data using Kernel techniques: Evaluation and comparisons". Journal of Population Research, Vol. 22(2), pp. 185-197.
2009: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S. "Graduating the age-specific fertility pattern using Support Vector Machines". Demographic Research, Vol. 20(25), pp. 599-622.
2010: Kostaki, A., Moguerza, M.J., Olivares, A., Psarakis, S. "Support Vector Machines as tools for Mortality Graduations", to appear in Canadian Studies in Population 38, No. 3–4, pp. 37–58.