Supplementary Material

Interpreting sero-epidemiological studies for influenza in a context of non-bracketing sera

Contents

1 Model for estimating cumulative incidence of infection
1.1 First level: Distribution of antibody titer levels before the epidemic
1.2 Second level: The probability of infection between two consecutive titer measurements
1.3 Third level: the antibody titer boosting after infection or waning without infection
1.4 Inference
1.4.1 Likelihood function
1.4.2 Priors
1.4.3 Algorithms
2 Model validation
3 References

1 Model for estimating cumulative incidence of infection

For every individual, we introduce a vector $(AT(t_0), AT(t_1), AT(t_2), AT(t_3), y(t_1), y(t_2), y(t_3), a)$, where $AT(t_i)$ is the HAI titer level of the serum drawn at time $t_i$ in round $i$ and $a$ is the age group of the participant. We denote age group 1 as children (aged 18 years or younger), age group 2 as adults (aged 19 to 50 years) and age group 3 as older adults (aged over 50 years). In some cases, $AT(\cdot)$ may be missing. We set $y(t_i) = 1$ if infection occurred in $(t_{i-1}, t_i]$ and $y(t_i) = 0$ otherwise.
We build a 3-level Bayesian hierarchical model that incorporates serological data and surveillance data to obtain a reliable estimate of the infection risk in a given epidemic. Details for each level are as follows.

1.1 First level: Distribution of antibody titer levels before the epidemic

This level describes the distribution of the antibody titer level in the first round, denoted by $P(AT(t_0))$. We use multinomial distributions to model the pre-epidemic titer level:

$$P(AT(t_0) = 2^k \times 5) = \alpha_{0k}, \quad k = 0, 1, 2, \ldots, 9.$$

We use two multinomial distributions, one for children and one for adults, to account for potential differences in their pre-epidemic titer levels. $AT(t_0)$ was observed for only a subset of participants. Hence, we treat $AT(t_0)$ as augmented data when it is unavailable.

1.2 Second level: The probability of infection between two consecutive titer measurements

This level describes the risk of infection during the epidemic, using the influenza proxy to approximate influenza activity. Details are given in the surveillance data section of the main text. A 2-week delay between infection and rise in antibody titers is assumed. The hazard of infection at time $t$ is

$$\lambda(t \mid a) = \beta_{a,c} \, f_t,$$

where $\beta_{a,c}$ is the scaling factor for the risk of influenza infection for age group $a$ (with separate values before and after the change point, as listed in eTable 5) and $f_t$ is the influenza activity proxy at time $t$ based on local surveillance data for H1N1pdm09. Hence, the probability of infection in the time period $(t_{j-1}, t_j)$ is

$$P(y(t_j) = 1 \mid a) = 1 - \exp\Big\{-\sum_{t = t_{j-1}}^{t_j} \lambda(t \mid a)\Big\}.$$

If participants were infected in the time period $(t_0, t_1)$, they are no longer at risk of infection after $t_1$.

1.3 Third level: the antibody titer boosting after infection or waning without infection

This level describes the antibody titer boosting after infection or waning without infection. Consider the first pair of sera $(AT(t_0), AT(t_1))$.
We use multinomial distributions to model the boosting distribution after infection and the waning distribution without infection. Since we define infection as a 4-fold or greater rise in paired sera, we assume that there is no waning in paired sera with a 4-fold or greater rise. For the boosting distribution, we model $AT(t_1) \mid AT(t_0), y(t_1) = 1$ as follows:

$$P\left(\frac{AT(t_1)}{AT(t_0)} = 2^k \,\middle|\, AT(t_0), y(t_1) = 1\right) = \gamma_k, \quad k = 2, 3, 4, \ldots, 9.$$

For the waning distribution, we model $AT(t_1) \mid AT(t_0), y(t_1) = 0$ as follows:

$$P\left(\frac{AT(t_1)}{AT(t_0)} = 2^k \,\middle|\, AT(t_0), y(t_1) = 0\right) = \delta_k, \quad k = -9, -8, \ldots, 0, 1.$$

The same boosting and waning distributions apply to subsequent pairs of sera. We use different multinomial distributions for children and adults to account for potential differences in boosting and waning between age groups.

One issue concerns censoring of boosting or waning, because the observed titer takes only 10 levels, namely <1:10, 1:10, 1:20, …, 1:1280 and ≥1:2560. For example, if the antibody titer drops from 1:40 to <1:10, we only know that there was a >4-fold drop but cannot quantify the exact fold of the drop. We treat these unobserved exact values as augmented data, with the restriction that the drop is greater than 4-fold. In our estimation, we impute them by drawing from $2^{-3}, 2^{-4}, \ldots, 2^{-9}$ with probabilities proportional to $\delta_{-3}, \delta_{-4}, \ldots, \delta_{-9}$.

1.4 Inference

Parameters are estimated in a Bayesian framework. Data augmentation is used to account for the antibody titer levels that are missing for some participants.

1.4.1 Likelihood function

A given individual contributes the following term to the likelihood:

$$P(AT(t_0), AT(t_1), AT(t_2), AT(t_3), y(t_1), y(t_2), y(t_3) \mid a) \propto P(AT(t_0)) \prod_{i=1}^{3} \Big\{ P\big(y(t_i) \mid AT(t_{i-1})\big) \, P\big(AT(t_i) \mid y(t_i), AT(t_{i-1})\big) \Big\}$$

Each component is modeled as described above.

1.4.2 Priors

For all parameters, we use a vague Uniform(0, 1000) prior.
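As an illustration of the second and third levels, the following minimal Python sketch computes the probability of infection between two serum draws from a scaling factor and an activity proxy, and then the likelihood factor contributed by one pair of sera. It is a sketch under stated assumptions, not the authors' implementation: the function names, the weekly proxy values, the scaling factor and the boosting and waning probabilities are all hypothetical, and the 2-week delay and the censoring of titers are ignored.

```python
import math


def infection_probability(scale, proxy, start, end):
    """Return 1 - exp(-sum of scale * proxy[t] for t = start..end)."""
    cumulative_hazard = sum(scale * proxy[t] for t in range(start, end + 1))
    return 1.0 - math.exp(-cumulative_hazard)


def pair_contribution(log2_fold, infected, p_infect, boost, wane):
    """Likelihood factor P(y) * P(fold change | y) for one pair of sera.

    boost maps k = 2..9 to boosting probabilities (infected),
    wane maps k = -9..1 to waning probabilities (not infected).
    """
    if infected:
        return p_infect * boost.get(log2_fold, 0.0)
    return (1.0 - p_infect) * wane.get(log2_fold, 0.0)


# Hypothetical proxy values and scaling factor, for illustration only.
proxy = [0.0, 0.1, 0.4, 0.3, 0.1]
p = infection_probability(0.5, proxy, start=1, end=3)

# Hypothetical boosting and waning probabilities, for illustration only.
boost = {2: 0.5, 3: 0.3, 4: 0.2}
wane = {-1: 0.3, 0: 0.6, 1: 0.1}
infected_4fold = pair_contribution(2, True, p, boost, wane)
uninfected_same = pair_contribution(0, False, p, boost, wane)
```

In this sketch a 4-fold rise has zero probability for an uninfected participant, which mirrors the assumption that paired sera with a 4-fold or greater rise show no waning.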
1.4.3 Algorithms

To impute the antibody titer levels that are missing for some participants, we use the participant's age, the available antibody titer levels and the surveillance data. We consider the following cases.

Case 1: the pre-epidemic antibody titer level $AT(t_0)$ is missing but $AT(t_1)$ is available.

The joint posterior distribution of the pre-epidemic antibody titer level and the parameters is

$$P(AT(t_0), \theta \mid AT(t_1), y(t_1), a) \propto P(AT(t_1) \mid AT(t_0), y(t_1), a, \theta) \, P(y(t_1) \mid AT(t_0), a, \theta) \, P(AT(t_0) \mid a, \theta) \, P(\theta)$$

Hence, at each MCMC step, we sample the pre-epidemic antibody titer level of each such individual from its posterior distribution: we compute the unscaled posterior density for every titer level (from <1:10 to 1:2560) and draw the imputed value accordingly.

Case 2: the antibody titer level during the epidemic, $AT(t_k)$, is missing but $AT(t_{k-1})$ and $AT(t_{k+1})$ are available, where $k > 0$.

The joint posterior distribution of the missing antibody titer level and the parameters is

$$P(AT(t_k), \theta \mid AT(t_{k-1}), AT(t_{k+1}), y(t_k), y(t_{k+1}), a) \propto P(AT(t_{k+1}) \mid AT(t_k), y(t_{k+1}), a, \theta) \, P(y(t_{k+1}) \mid AT(t_k), a, \theta) \, P(AT(t_k) \mid AT(t_{k-1}), y(t_k), a, \theta) \, P(y(t_k) \mid AT(t_{k-1}), a, \theta) \, P(\theta)$$

Hence, we impute the missing antibody titer level with the same method as in Case 1.

We impute missing antibody titers only up to the last available serum, since there is no information after the time that serum was drawn. For example, if only $AT(t_0)$ and $AT(t_2)$ are available for a given person, then only $AT(t_1)$ is imputed, but not $AT(t_3)$, since there is no information in $(t_2, t_3)$. If only the pre-epidemic titer is available, the participant contributes only to the estimation of the pre-epidemic titer distribution. After imputing the missing titers, we update the parameters using a random walk Metropolis-Hastings update [1].
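The imputation of Case 1 and the parameter update can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: all function and variable names are hypothetical, the infection probability is passed in as a fixed number, titers are coded as 5, 10, ..., 2560 (5 standing for <1:10), censoring at the extreme levels is ignored, and a single scalar parameter with a Gaussian proposal stands in for the full parameter vector.

```python
import math
import random

# Titer codes 5 * 2^k for k = 0..9; 5 stands for "<1:10".
TITERS = [5 * 2**k for k in range(10)]


def impute_pre_epidemic_titer(at1, y1, p_infect, alpha, boost, wane, rng):
    """Draw AT(t0) from its unscaled posterior given AT(t1) and y(t1).

    alpha[k] is the pre-epidemic probability of TITERS[k]; boost and wane
    map log2 fold changes to probabilities. Raises ValueError if no titer
    level is compatible with the observed data.
    """
    weights = []
    for k, at0 in enumerate(TITERS):
        log2_fold = round(math.log2(at1 / at0))
        if y1 == 1:
            factor = p_infect * boost.get(log2_fold, 0.0)
        else:
            factor = (1.0 - p_infect) * wane.get(log2_fold, 0.0)
        weights.append(alpha[k] * factor)
    return rng.choices(TITERS, weights=weights, k=1)[0]


def metropolis_step(theta, log_post, step_sd, rng):
    """One random walk Metropolis-Hastings update of a scalar parameter."""
    proposal = theta + rng.gauss(0.0, step_sd)
    log_ratio = log_post(proposal) - log_post(theta)
    # Accept with probability min(1, exp(log_ratio)).
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return theta
```

With a log-posterior that returns minus infinity outside the support of the prior, proposals outside that support are always rejected, so the chain never leaves it.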
The algorithm runs for 15000 iterations after a burn-in of 5000 iterations. Convergence is assessed visually. One run takes about 90 minutes on a desktop computer.

2 Model validation

To assess the performance of our estimation procedure, we performed a simulation study. We simulated 50 data sets with a structure identical to that of the observed data (in terms of age, time of serum drawing and availability of serum in each round) and simulated infections with parameters equal to their posterior medians. We then applied our estimation procedure to the simulated data sets and tested whether the parameters could be recovered. The results of the simulation are summarized in eTable 5. In general, the credible interval covered the true value in about 86% to 98% of the simulated data sets for the 6 parameters governing the cumulative incidence of infection. This suggests that our estimation procedure provides reasonable estimates of the parameters.

3 References

1. Gilks WR, Richardson S, Spiegelhalter D (1996) Markov Chain Monte Carlo in Practice. London: Chapman & Hall.

FIGURE LEGENDS

eFigure 1. Model fit for models with different change points. Model fit was measured by the difference between the observed number of 4-fold rises and the expected number of 4-fold rises predicted from the model. The X-axis is on a log10 scale and the dotted lines represent the threshold of the chi-squared statistic with p-value <0.95. Models with a chi-squared statistic under the threshold were considered adequate.

eTable 1.
Observed number of 4-fold rise and estimated cumulative incidence of infection for H1N1pdm09 epidemic from the main model

| | Round 1+2 | Round 2+3 | Round 3+4 | Round 2+4¹ |
|---|---|---|---|---|
| Number of pairs | 246 | 698 | 676 | 1126 |
| Collection period, median (range) | | | | |
| First serum | 04/15/09 (04/02/09, 04/29/09) | 10/31/09 (08/29/09, 01/29/10) | 04/24/10 (04/16/10, 05/15/10) | 11/14/09 (09/18/09, 02/20/10) |
| Second serum | 09/26/09 (08/29/09, 10/17/09) | 04/30/10 (04/16/10, 05/15/10) | 11/06/10 (07/22/10, 12/11/10) | 11/13/10 (09/25/10, 12/11/10) |
| Proportion of 4-fold rise | | | | |
| 0-18 yrs | 17/74 (23%) | 46/213 (22%) | 14/208 (7%) | 81/378 (21%) |
| 19-50 yrs | 10/150 (7%) | 45/435 (10%) | 17/417 (4%) | 38/649 (6%) |
| >50 yrs | 0/22 (0%) | 5/50 (10%) | 2/51 (4%) | 6/99 (6%) |
| Predicted number of 4-fold rise based on the fitted model | | | | |
| 0-18 yrs | 15 (8, 23) | 44 (32, 57) | 13 (6, 20) | 82 (63, 103) |
| 19-50 yrs | 10 (5, 18) | 37 (25, 50) | 12 (5, 19) | 47 (32, 64) |
| >50 yrs | 1 (0, 3) | 3 (0, 8) | 1 (0, 4) | 7 (2, 14) |

¹ Only participants with missing serum in Round 3 were included in this group.

eTable 2. Sensitivity analysis on the results from the assumption on days from infection to sero-conversion

| Days from infection to sero-conversion | 10 days | 14 days | 21 days |
|---|---|---|---|
| Model fit (model with change point at 2009/11/21) | | | |
| Chi-square statistic | 8.5 | 8.9 | 9.3 |
| p-value | 0.74 | 0.71 | 0.67 |
| Cumulative incidence of infection for H1N1pdm09 epidemic | | | |
| 0-18 yrs | 0.47 (0.42, 0.51) | 0.46 (0.41, 0.51) | 0.48 (0.43, 0.52) |
| 19-50 yrs | 0.17 (0.14, 0.21) | 0.17 (0.13, 0.2) | 0.18 (0.14, 0.21) |
| >50 yrs | 0.12 (0.07, 0.18) | 0.12 (0.06, 0.18) | 0.12 (0.07, 0.19) |

eTable 3.
Estimated pre-epidemic antibody distribution

| Titer | Children | Adults |
|---|---|---|
| <1:10 | 86.3% (81%, 90.5%) | 88.6% (84.6%, 91.9%) |
| 1:10 | 3.3% (1.5%, 6.4%) | 1.9% (0.9%, 3.6%) |
| 1:20 | 3.6% (1.6%, 6.5%) | 2.5% (1.3%, 4.1%) |
| 1:40 | 1.4% (0%, 5.2%) | 2.3% (0.7%, 4.9%) |
| 1:80 | 0.7% (0%, 4.1%) | 1.2% (0.1%, 4%) |
| 1:160 | 0.5% (0%, 4%) | 2% (0.3%, 4.4%) |
| 1:320 | 1.8% (0%, 4.7%) | 0.6% (0%, 2.1%) |
| 1:640 | 0.2% (0%, 3.3%) | 0% (0%, 0.5%) |
| 1:1280 | 0.1% (0%, 1.2%) | 0% (0%, 0.4%) |
| 1:2560 | 0.1% (0%, 0.7%) | 0% (0%, 0.3%) |

eTable 4. Estimated distribution for antibody waning for 6 months

| Fold-drop | Children | Adults |
|---|---|---|
| 0.5 | 4.2% (3.1%, 5.6%) | 1.2% (0.8%, 1.7%) |
| 1 | 48.5% (43.6%, 53.0%) | 37.3% (31.8%, 41.1%) |
| 2 | 36.5% (31.5%, 41.0%) | 29.0% (24.5%, 34.4%) |
| 4 | 4.6% (2.5%, 7.4%) | 6.1% (3.2%, 9.2%) |
| 8 | 0.9% (0.2%, 2.6%) | 5.9% (4.3%, 10.4%) |
| 16 | 0.4% (0.0%, 2.0%) | 4.8% (3.8%, 6.9%) |
| 32 | 0.1% (0.0%, 0.9%) | 4.3% (3.3%, 5.6%) |
| 64 | 2.0% (1.0%, 4.2%) | 3.8% (2.7%, 4.9%) |
| 128 | 1.2% (0.5%, 2.2%) | 3.2% (2.0%, 4.2%) |
| 256 | 0.7% (0.1%, 1.4%) | 2.5% (1.2%, 3.6%) |
| 512 | 0.3% (0.0%, 1.0%) | 1.7% (0.4%, 3.0%) |

eTable 5. Simulation study for the model for estimating the cumulative incidence of infection

| Parameter | True value | Number of credible intervals that cover the true value |
|---|---|---|
| Scale parameter of children before the change point | 0.737 | 43 |
| Scale parameter of adults before the change point | 0.231 | 48 |
| Scale parameter of older adults before the change point | 0.130 | 49 |
| Scale parameter of children after the change point | 3.059 | 45 |
| Scale parameter of adults after the change point | 0.784 | 48 |
| Scale parameter of older adults after the change point | 0.737 | 49 |
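The coverage range quoted in the model validation section (about 86% to 98%) follows from the counts in eTable 5 divided by the 50 simulated data sets. The short Python sketch below reproduces that arithmetic. The dictionary keys are hypothetical labels, and the pairing of each count with a parameter assumes the rows of eTable 5 appear in the order in which the parameter names are listed.

```python
N_SIMULATIONS = 50  # number of simulated data sets reported in Section 2

# Number of credible intervals covering the true value, from eTable 5.
covered = {
    "children_before": 43,
    "adults_before": 48,
    "older_adults_before": 49,
    "children_after": 45,
    "adults_after": 48,
    "older_adults_after": 49,
}

# Coverage proportion for each scale parameter.
coverage = {name: count / N_SIMULATIONS for name, count in covered.items()}
lowest = min(coverage.values())
highest = max(coverage.values())
```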