Appendix: Estimating the prevalence of breast cancer using a disease model: data problems and trends

The steady state IPM model

Assuming a steady state situation and equal other-cause mortality for cases and non-cases, prevalence at exact age $n$ can be calculated from incidence and cause-specific mortality data, using [1]:

$$p_n = \frac{p_{n-1} - m_{n-1} + i_{n-1}\,(1 - p_{n-1})}{1 - m_{n-1}},$$ (1a)

where $m_{n-1}$ is the mortality probability of age group $n-1$, and $i_{n-1}$ is the incidence probability of susceptibles of that age group. The formula gives an exact solution of a continuous-time Markov process. The input parameters for this formula are the mortality probability of the age group and the incidence probability of susceptibles in that age group.

We interpolated incidence and mortality rates from five-year to one-year age groups using the cubic-spline method, adopting a mean age of 90.8 for women aged 85 and over that was derived from a life table corresponding to general mortality. We assumed these rates approximate the hazards fairly well. Hazards were then converted to probabilities:

$$\mathrm{probability}_n = 1 - \exp(-\mathrm{hazard}_n).$$ (1b)

The incidence rate among susceptibles was calculated from national incidence hazards using the prevalence of the previous age group, resulting in a slight deviation:

$$IR_{n,\mathrm{susceptibles}} = \frac{IR_{n,\mathrm{national}}}{1 - p_{n-1}},$$ (1c)

where $IR_{n,\mathrm{susceptibles}}$ is the incidence rate among susceptibles, and $IR_{n,\mathrm{national}}$ the incidence rate within the national population.

Confidence interval for prevalence data

Assuming prevalence data are binomially distributed, their 95% confidence interval can be calculated from [2]:

$$CI95_n = \left[\sin\left(\arcsin\sqrt{p_n} \pm 1.96\sqrt{\frac{1}{4N_n}}\right)\right]^2,$$ (2)

with $N_n$ representing the total number of women in age group $n$: $N_n = C_n + S_n$.

A dynamic IPM model

The baseline IPM model is extended to incorporate the parameter $y$, the number of years prior to the reference year:

$$p_{n,y} = \frac{prev_{n-1,y+1} - mort_{n-1} + inci_{n-1}\,(1 - prev_{n-1,y+1})}{1 - mort_{n-1}}.$$ (3)
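The steady-state recursion in (1a) to (1c) and the interval in (2) can be illustrated with a minimal Python sketch. It assumes that yearly hazards have already been interpolated to one-year age groups and that each step from age a to a+1 uses the rates of age group a. The function names, the starting prevalence of zero, the example rates and the clamping of the arcsine bounds to [0, π/2] are illustrative assumptions and not part of the original analysis.

```python
import math


def hazard_to_prob(hazard):
    """Eq. (1b): convert a yearly hazard into a one-year probability."""
    return 1.0 - math.exp(-hazard)


def steady_state_prevalence(inc_hazard_national, mort_hazard, p0=0.0):
    """Apply Eq. (1a) age by age.

    inc_hazard_national, mort_hazard: yearly hazards per one-year age group.
    p0: prevalence at the first exact age (assumed zero here).
    Returns prevalence at each exact age, one element longer than the inputs.
    """
    prev = [p0]
    for ir_national, mort in zip(inc_hazard_national, mort_hazard):
        p = prev[-1]
        ir_susceptibles = ir_national / (1.0 - p)   # Eq. (1c)
        i = hazard_to_prob(ir_susceptibles)         # Eq. (1b)
        m = hazard_to_prob(mort)                    # Eq. (1b)
        prev.append((p - m + i * (1.0 - p)) / (1.0 - m))  # Eq. (1a)
    return prev


def arcsine_ci95(p, n_women):
    """Eq. (2): 95% interval on the arcsine square-root scale.

    The bounds are clamped to [0, pi/2] before back-transforming; this
    clamping is an added safeguard, not something stated in the text.
    """
    centre = math.asin(math.sqrt(p))
    half_width = 1.96 * math.sqrt(1.0 / (4.0 * n_women))
    lower = math.sin(max(centre - half_width, 0.0)) ** 2
    upper = math.sin(min(centre + half_width, math.pi / 2.0)) ** 2
    return lower, upper


if __name__ == "__main__":
    # Hypothetical hazards for three consecutive one-year age groups.
    incidence = [0.0010, 0.0015, 0.0020]
    mortality = [0.0002, 0.0003, 0.0004]
    for age, p in enumerate(steady_state_prevalence(incidence, mortality)):
        print(age, round(p, 6))
    print(arcsine_ci95(0.02, 10000))
```

Because (1a) subtracts population-level mortality from prevalence, inputs in which mortality exceeds prevalence plus new incidence would yield a negative value; the sketch does not guard against this.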

Because incidence was not measured more than 23 years before the reference year, incidence, and thus prevalence, was set to zero for $y > 23$. To correct for a 1% trend in incidence, incidence was decreased by 1% for each additional year prior to the reference year (maximum 95):

$$p_{n,y} = \frac{prev_{n-1,y+1} - mort_{n-1} + inci_{n-1} \cdot 0.99^{\,y+1}\,(1 - prev_{n-1,y+1})}{1 - mort_{n-1}}.$$ (4)

Describing survival from breast cancer as lognormally distributed with a proportion cured

We estimated excess mortality from breast cancer using duration-specific relative survival data, assuming lognormal survival with a proportion cured [3]. When $RSurv_{n,d}$ is the relative survival in age group $n$, $d$ years after incidence, $c_n$ is the proportion cured for that age group, and $\mu_n$ and $\sigma_n$ are the parameters of the cumulative lognormal distribution (Logndist), the model can be described as:

$$RSurv_{n,d} = (1 - c_n)\,\bigl(1 - \mathrm{Logndist}(\mu_n, \sigma_n, d)\bigr) + c_n.$$ (5a)

Fitting this model to the survival data allows the estimation of the cumulative mortality at different years $d$ after incidence. Non-cumulative mortality at exact $d$ years after incidence is calculated by:

$$mortyr_{n,d} = (1 - c_n)\,\bigl(\mathrm{Logndist}(\mu_n, \sigma_n, d+1) - \mathrm{Logndist}(\mu_n, \sigma_n, d)\bigr).$$ (5b)

A duration specific IPM model

Prevalence at age $n$, $d$ years after incidence, can be calculated from the prevalence at the previous age and year, using:

$$p_{n,0} = i_n \quad \text{for } d = 0, \text{ and}$$

$$p_{n+1,d+1} = \frac{p_{n,d}\,\bigl(1 - mortyr_{n,d}\,(1 - c_n)\bigr)}{1 - p_{n,d}\,mortyr_{n,d}\,(1 - c_n)} \quad \text{for } d > 0.$$ (6a)

Summation across all years $d$ results in an estimate of age-specific prevalence:

$$p_n = \sum_{d \ge 0} p_{n,d}.$$ (6b)

References

1. JJ Barendregt, GJ van Oortmarssen, BA van Hout, JM van den Bosch, L Bonneux. Coping with multiple morbidity in a life table. Mathematical Population Studies 1998, 7:29-49.
2. C Radhakrishna Rao. Linear statistical inference and its applications, 2nd edn. New York: Wiley, 1973.
3. LE Rutqvist. On the utility of the lognormal model for analysis of breast cancer survival in Sweden 1961-1973.
Br J Cancer 1985, 52:875-883.
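
The lognormal cure model in (5a) and (5b) can be sketched in Python as follows. This is a minimal illustration, assuming that the lognormal parameters are expressed on the natural-log scale of duration in years and that the cumulative distribution is zero at d = 0. The function names and the parameter values in the example are hypothetical and are not fitted estimates from the survival data.

```python
import math


def lognormal_cdf(d, mu, sigma):
    """Cumulative lognormal distribution (Logndist) at duration d in years."""
    if d <= 0:
        return 0.0
    z = (math.log(d) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def relative_survival(d, cured, mu, sigma):
    """Eq. (5a): relative survival d years after incidence."""
    return (1.0 - cured) * (1.0 - lognormal_cdf(d, mu, sigma)) + cured


def yearly_excess_mortality(d, cured, mu, sigma):
    """Eq. (5b): non-cumulative mortality between d and d + 1 years."""
    return (1.0 - cured) * (
        lognormal_cdf(d + 1, mu, sigma) - lognormal_cdf(d, mu, sigma)
    )


if __name__ == "__main__":
    # Hypothetical parameters for one age group.
    cured, mu, sigma = 0.6, 1.5, 1.0
    for d in range(0, 6):
        print(
            d,
            round(relative_survival(d, cured, mu, sigma), 4),
            round(yearly_excess_mortality(d, cured, mu, sigma), 4),
        )
```

With these definitions the yearly mortality in (5b) equals the drop in relative survival between d and d + 1, and relative survival approaches the cured proportion for long durations.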