Supplementary Material

Interpreting sero-epidemiological studies for influenza in a context of non-bracketing sera
Contents
1 Model for estimating cumulative incidence of infection
1.1 First level: Distribution of antibody titer levels before the epidemic
1.2 Second level: The probability of infection between two consecutive titer measurements
1.3 Third level: The antibody titer boosting after infection or waning without infection
1.4 Inference
1.4.1 Likelihood function
1.4.2 Priors
1.4.3 Algorithms
2 Model validation
3 References
1 Model for estimating cumulative incidence of infection
For every individual, we introduce a vector $(AT(t_0), AT(t_1), AT(t_2), AT(t_3), y(t_1), y(t_2), y(t_3), a)$, where $AT(t_i)$ is the HAI titer level for the serum drawn at time $t_i$ in round $i+1$ and $a$ is the age group of the participant. Age group 1 denotes children (aged 18 or younger), age group 2 adults (aged 19 to 50) and age group 3 older adults (aged over 50). In some cases, $AT(\cdot)$ may be missing. We set $y(t_i) = 1$ if infection occurred in $(t_{i-1}, t_i]$.
We built a three-level Bayesian hierarchical model that incorporates serological and surveillance data to obtain reliable estimates of the infection risk in a given epidemic. Details of each level are as follows.
1.1 First level: Distribution of antibody titer levels before the epidemic
This level describes the distribution of the antibody titer level in the first round, $P(AT(t_0))$. We used multinomial distributions to model the pre-epidemic titer level:

$$P(AT(t_0) = 5 \times 2^k) = \alpha_{0k}, \quad k = 0, 1, 2, \ldots, 9,$$

so that $k = 0$ corresponds to <1:10 and $k = 9$ to ≥1:2560. We used separate multinomial distributions for children and adults to account for potential differences in their pre-epidemic titer levels.

$AT(t_0)$ was observed for only a subset of participants, so we treated it as augmented data when it was unavailable.
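As an illustration of this first level, the Python sketch below codes the ten titer levels as $5 \times 2^k$ and samples from, or evaluates, one group's multinomial distribution. The names `TITER_LEVELS`, `sample_baseline_titer`, `baseline_log_lik` and `alpha0` are hypothetical; in the model, children and adults would each have their own `alpha0` vector.

```python
import math
import random

# Assumed coding: level k (k = 0..9) is stored as 5 * 2**k,
# so 5 stands for "<1:10" and 2560 for ">=1:2560".
TITER_LEVELS = [5 * 2**k for k in range(10)]


def sample_baseline_titer(alpha0, rng=random):
    """Draw AT(t0) from the multinomial with probabilities alpha0[k]."""
    if len(alpha0) != len(TITER_LEVELS):
        raise ValueError("alpha0 needs one probability per titer level")
    return rng.choices(TITER_LEVELS, weights=alpha0, k=1)[0]


def baseline_log_lik(observed_titers, alpha0):
    """Sum of log P(AT(t0) = 5 * 2^k) over observed pre-epidemic titers."""
    prob = dict(zip(TITER_LEVELS, alpha0))
    return sum(math.log(prob[t]) for t in observed_titers)
```

Treating the levels as a finite list keeps the later imputation steps simple, since every candidate titer can be enumerated.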
1.2 Second level: The probability of infection between two consecutive titer measurements
This level describes the risk of infection during the epidemic, using an influenza activity proxy to approximate influenza activity (details are given in the surveillance data section of the main text). We assumed a 2-week delay between infection and the rise in antibody titers. The hazard of infection at time $t$ is

$$\lambda(t \mid a) = \psi_{c,a} P_t,$$

where $\psi_{c,a}$ is the scaling factor for the risk of influenza infection in age group $a$ (the index $c$ distinguishes the periods before and after the change point; cf. eTable 5), and $P_t$ is the influenza activity proxy at time $t$ based on local surveillance data for H1N1pdm09. Hence the probability of infection in the period $(t_{j-1}, t_j]$ is

$$P(y(t_j) = 1 \mid a) = 1 - \exp\Big\{-\sum_{t=t_{j-1}}^{t_j} \lambda(t \mid a)\Big\}.$$

Participants infected in $(t_0, t_1]$ were assumed to be no longer at risk of infection after $t_1$.
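A minimal sketch of the second level, assuming a daily proxy series indexed by day number and one pair of scaling factors (before and after the change point) for a single age group. All names are hypothetical. The sum runs over days $t_{j-1}+1, \ldots, t_j$ so that adjacent intervals do not share a day, which is our reading of the half-open interval. The 2-week delay could be handled by shifting both serum dates back by 14 days before calling the function; this is likewise an assumption about implementation.

```python
import math


def infection_probability(proxy, psi_before, psi_after, change_day, t_prev, t_next):
    """P(y(t_next) = 1 | a) = 1 - exp(-sum of psi * P_t) over (t_prev, t_next].

    proxy      -- daily influenza activity proxy P_t, indexed by day number
    psi_*      -- hypothetical scaling factors for one age group
    change_day -- first day on which psi_after applies
    """
    cumulative_hazard = 0.0
    # max(..., 0) avoids negative indices (which Python would wrap around)
    for t in range(max(t_prev + 1, 0), t_next + 1):
        psi = psi_before if t < change_day else psi_after
        cumulative_hazard += psi * proxy[t]
    return 1.0 - math.exp(-cumulative_hazard)
```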
1.3 Third level: The antibody titer boosting after infection or waning without infection
This level describes antibody titer boosting after infection and waning without infection. Consider the first pair of sera, $(AT(t_0), AT(t_1))$. We used multinomial distributions to model the boosting distribution after infection and the waning distribution without infection. Since infection was defined as a 4-fold or greater rise in paired sera, we assumed that a rise of 4-fold or greater cannot arise without infection.

For the boosting distribution, we modeled $AT(t_1) \mid AT(t_0), y(t_1) = 1$ as

$$P\left(\frac{AT(t_1)}{AT(t_0)} = 2^k \,\middle|\, AT(t_0), y(t_1) = 1\right) = \gamma_k, \quad k = 2, 3, \ldots, 9.$$

For the waning distribution, we modeled $AT(t_1) \mid AT(t_0), y(t_1) = 0$ as

$$P\left(\frac{AT(t_1)}{AT(t_0)} = 2^k \,\middle|\, AT(t_0), y(t_1) = 0\right) = \delta_k, \quad k = -9, -8, \ldots, 0, 1.$$

The same boosting and waning distributions were applied to subsequent pairs of sera. We used separate multinomial distributions for children and adults to account for potential differences in boosting and waning between age groups.
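The sketch below shows how a single pair of titers could be scored under the boosting ($\gamma_k$) and waning ($\delta_k$) distributions. It assumes titers on the two-fold grid, stores each distribution as a hypothetical dict from $k$ to probability, and ignores censoring at the ends of the titer range. Function names are hypothetical.

```python
import math


def fold_exponent(at0, at1):
    """Return k with AT(t1) / AT(t0) == 2**k (titers on a two-fold grid)."""
    k = math.log2(at1 / at0)
    if not math.isclose(k, round(k), abs_tol=1e-9):
        raise ValueError("titers should differ by a power of two")
    return round(k)


def pair_prob(at0, at1, infected, gamma, delta):
    """P(AT(t1)/AT(t0) = 2^k | AT(t0), y(t1)), ignoring censoring.

    gamma -- dict k -> probability for k = 2..9 (boosting after infection)
    delta -- dict k -> probability for k = -9..1 (waning without infection)
    Fold changes outside a distribution's support get probability 0.
    """
    k = fold_exponent(at0, at1)
    table = gamma if infected else delta
    return table.get(k, 0.0)
```

Under this coding a 2-fold rise ($k = 1$) is only possible without infection, and a 4-fold or greater rise only with infection, matching the supports given above.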
One issue is censoring in boosting or waning, because the observed titer takes only 10 levels: <1:10, 1:10, 1:20, …, 1:1280 and ≥1:2560. For example, if a titer drops from 1:40 to <1:10, we know only that the drop was greater than 4-fold; we cannot quantify its exact size. We treated these exact fold changes as augmented data, subject to the restriction that the drop was greater than 4-fold. In estimation, we imputed them by drawing from $2^{-3}, 2^{-4}, \ldots, 2^{-9}$ with probabilities proportional to $\delta_{-3}, \delta_{-4}, \ldots, \delta_{-9}$.
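For the censored case, the imputation step might look as follows. This is a sketch: it generalizes the 1:40 to <1:10 example by letting `observed_k` be the exponent implied by the recorded titers (-3 in that example) and drawing the exact exponent from `observed_k` down to -9 with probability proportional to $\delta_k$. The function name and arguments are hypothetical.

```python
import random


def impute_censored_drop(delta, observed_k, rng=random):
    """Impute the exact fold-change exponent for a drop censored at <1:10.

    delta      -- dict k -> waning probability delta_k for k = -9..1
    observed_k -- exponent implied by the recorded titers, e.g. -3 for
                  1:40 -> <1:10; the true exponent is observed_k or lower.
    random.choices normalises the weights, so each candidate is drawn
    with probability proportional to delta[k].
    """
    candidates = list(range(observed_k, -10, -1))  # observed_k, ..., -9
    weights = [delta[k] for k in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```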
1.4 Inference
Parameters were estimated in a Bayesian framework. Data augmentation was used to account for missing antibody titer levels for some participants.
1.4.1 Likelihood function
For a given individual, the contribution to the likelihood is

$$P(AT(t_0), AT(t_1), AT(t_2), AT(t_3), y(t_1), y(t_2), y(t_3) \mid a) \propto P(AT(t_0)) \prod_{i=1}^{3} \Big\{ P(y(t_i) \mid AT(t_{i-1}))\, P(AT(t_i) \mid y(t_i), AT(t_{i-1})) \Big\}$$

Each component was modeled as described above.
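A sketch of one individual's log-likelihood contribution, assuming all titers have already been imputed and that the component probabilities are supplied by the caller (all names hypothetical). Following Section 1.2, it also assumes that once infected a participant is no longer at risk, so later intervals use an infection probability of 0.

```python
import math


def _log(p):
    """Logarithm that returns -inf for zero-probability events."""
    return math.log(p) if p > 0 else -math.inf


def individual_log_lik(titers, infected, p_pre, p_inf, pair_prob):
    """log P(AT(t0)) + sum_i [log P(y(t_i)) + log P(AT(t_i) | y(t_i), AT(t_{i-1}))].

    titers    -- [AT(t0), AT(t1), AT(t2), AT(t3)]
    infected  -- [y(t1), y(t2), y(t3)] as booleans
    p_pre     -- dict titer -> pre-epidemic probability for the age group
    p_inf     -- infection probabilities for each interval, for the age group
    pair_prob -- function (at_prev, at_next, y) -> boosting/waning probability
    """
    ll = _log(p_pre[titers[0]])
    already_infected = False
    for i in range(1, len(titers)):
        y = infected[i - 1]
        p1 = 0.0 if already_infected else p_inf[i - 1]  # no risk after infection
        ll += _log(p1 if y else 1.0 - p1)
        ll += _log(pair_prob(titers[i - 1], titers[i], y))
        already_infected = already_infected or y
    return ll
```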
1.4.2 Priors
For all parameters, we use a vague Uniform(0,1000) prior.
1.4.3 Algorithms
We imputed missing antibody titer levels using each participant's age, their available antibody titer levels and the surveillance data. We considered the following cases.
Case 1: the pre-epidemic antibody titer level $AT(t_0)$ was missing but $AT(t_1)$ was available. The joint posterior distribution of the pre-epidemic antibody titer level and the parameters $\theta$ is

$$P(AT(t_0), \theta \mid AT(t_1), y(t_1), a) \propto P(AT(t_1) \mid AT(t_0), y(t_1), \theta, a)\, P(y(t_1) \mid AT(t_0), \theta, a)\, P(AT(t_0) \mid \theta, a)\, P(\theta)$$
Hence, at each MCMC step, we sampled the pre-epidemic antibody titer level of each individual from its posterior distribution: we computed the unnormalized posterior density for every titer level (from <1:10 to 1:2560) and drew the imputed value from the resulting discrete distribution.
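A minimal sketch of the Case 1 imputation step, assuming the $5 \times 2^k$ titer coding of the first level and caller-supplied component probabilities (all names hypothetical). Because the infection probability in this model does not depend on $AT(t_0)$, the factor $P(y(t_1) \mid \cdot)$ is the same for every candidate and only rescales the weights; it is kept to mirror the formula.

```python
import random

TITER_LEVELS = [5 * 2**k for k in range(10)]  # assumed coding: 5 = "<1:10"


def sample_missing_baseline(at1, y1, p_pre, p_inf1, pair_prob, rng=random):
    """Case 1: draw a missing AT(t0) given AT(t1), y(t1) and current parameters.

    Each candidate level is weighted by the unnormalised posterior
    P(AT(t1) | AT(t0), y(t1)) * P(y(t1)) * P(AT(t0)).
    """
    p_y = p_inf1 if y1 else 1.0 - p_inf1
    weights = [pair_prob(at0, at1, y1) * p_y * p_pre[at0] for at0 in TITER_LEVELS]
    return rng.choices(TITER_LEVELS, weights=weights, k=1)[0]
```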
Case 2: an antibody titer level measured during the epidemic, $AT(t_k)$ with $k > 0$, was missing but $AT(t_{k-1})$ and $AT(t_{k+1})$ were available. The joint posterior distribution of the missing antibody titer level and the parameters is

$$\begin{aligned} P(AT(t_k), \theta \mid AT(t_{k-1}), AT(t_{k+1}), y(t_k), y(t_{k+1}), a) \propto\ & P(AT(t_{k+1}) \mid AT(t_k), y(t_{k+1}), \theta, a)\, P(y(t_{k+1}) \mid AT(t_k), \theta, a) \\ & \times P(AT(t_k) \mid AT(t_{k-1}), y(t_k), \theta, a)\, P(y(t_k) \mid AT(t_{k-1}), \theta, a)\, P(\theta) \end{aligned}$$

Hence we imputed the missing antibody titer level by the same method as in Case 1.
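In Case 2 each candidate for the missing $AT(t_k)$ is scored against both neighbouring titers. A sketch, assuming the $5 \times 2^k$ titer coding of the first level and caller-supplied component probabilities; all names are hypothetical.

```python
import random

TITER_LEVELS = [5 * 2**k for k in range(10)]  # assumed coding: 5 = "<1:10"


def sample_missing_middle(at_prev, at_next, y_k, y_next, p_inf_k, p_inf_next,
                          pair_prob, rng=random):
    """Case 2: draw a missing AT(t_k) given AT(t_{k-1}) and AT(t_{k+1}).

    Each candidate is weighted by
    P(AT(t_{k+1}) | AT(t_k), y(t_{k+1})) * P(y(t_{k+1}))
      * P(AT(t_k) | AT(t_{k-1}), y(t_k)) * P(y(t_k)).
    """
    p_yk = p_inf_k if y_k else 1.0 - p_inf_k
    p_ynext = p_inf_next if y_next else 1.0 - p_inf_next
    weights = [
        pair_prob(at_k, at_next, y_next) * p_ynext
        * pair_prob(at_prev, at_k, y_k) * p_yk
        for at_k in TITER_LEVELS
    ]
    return rng.choices(TITER_LEVELS, weights=weights, k=1)[0]
```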
We imputed missing antibody titers only up to the last available serum, since there was no information after that serum was drawn. For example, if only $AT(t_0)$ and $AT(t_2)$ were available for a given person, then $AT(t_1)$ was imputed but $AT(t_3)$ was not, since there was no information in $(t_2, t_3]$. If only the pre-epidemic titer was available, it contributed only to the estimation of the pre-epidemic titer distribution.
After imputing the missing values, we updated the parameters using random-walk Metropolis-Hastings updates [1].
The algorithm was run for 15,000 iterations after a burn-in of 5,000 iterations. Convergence was assessed visually. One run took about 90 minutes on a desktop computer.
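The parameter update can be sketched as a one-dimensional random-walk Metropolis-Hastings sampler with a symmetric Gaussian proposal, using the iteration counts stated above as defaults. The step size, starting value and `log_post` function are assumptions for illustration. With several parameters, one common approach (assumed here, not stated in the text) is to apply such an update to each parameter in turn within every MCMC step.

```python
import math
import random


def rw_metropolis(log_post, init, step, n_iter=15000, burn_in=5000, seed=None):
    """One-dimensional random-walk Metropolis-Hastings sampler.

    log_post -- log of the unnormalised posterior; should return -math.inf
                outside the support (e.g. outside a Uniform(0, 1000) prior)
    init     -- starting value, which must lie inside the support
    Returns the n_iter draws kept after discarding burn_in iterations.
    """
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    draws = []
    for it in range(burn_in + n_iter):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, exp(lp_prop - lp)); a proposal with
        # log density -inf has acceptance probability 0
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        if it >= burn_in:
            draws.append(x)
    return draws
```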
2 Model validation
To assess the performance of our estimation procedure, we performed a simulation study. We simulated 50 data sets with the same structure as the observed data (in terms of age, timing of serum collection and availability of serum in each round), simulating infections with parameters set equal to their posterior medians. We then applied our estimation procedure to the simulated data sets and assessed whether the true parameter values were recovered.
The simulation results are summarized in eTable 5. In general, the true value lay within the credible interval in about 86% to 98% of the simulated data sets for the 6 parameters of the model for cumulative incidence of infection. This suggests that our estimation procedure provides reasonable parameter estimates.
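The coverage range quoted above follows from simple arithmetic on the counts in eTable 5, as in this sketch (the helper `coverage` is hypothetical):

```python
def coverage(intervals, true_value):
    """Fraction of credible intervals (lower, upper) that contain true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Coverage counts out of 50 simulated data sets, as reported in eTable 5
counts = [43, 48, 49, 45, 48, 49]
rates = [c / 50 for c in counts]
print(min(rates), max(rates))  # 0.86 0.98
```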
3 References
1. Gilks WR, Richardson S, Spiegelhalter D (1996) Markov Chain Monte Carlo in Practice. London: Chapman & Hall.
FIGURE LEGENDS
eFigure 1. Model fit for models with different change points. Model fit was measured by the difference between the observed number of 4-fold rises and the number predicted by the model. The x-axis is on a log10 scale, and dotted lines represent the threshold of the chi-squared statistic with p-value <0.95. Models with a chi-squared statistic below the threshold were considered adequate.
eTable 1. Observed number of 4-fold rise and estimated cumulative incidence of infection for H1N1pdm09 epidemic from the main model

| | Round 1+2 | Round 2+3 | Round 3+4 | Round 2+4¹ |
|---|---|---|---|---|
| Number of pairs | 246 | 698 | 676 | 1126 |
| Collection period, first serum: median (range) | 04/15/09 (04/02/09, 04/29/09) | 10/31/09 (08/29/09, 01/29/10) | 04/24/10 (04/16/10, 05/15/10) | 11/14/09 (09/18/09, 02/20/10) |
| Collection period, second serum: median (range) | 09/26/09 (08/29/09, 10/17/09) | 04/30/10 (04/16/10, 05/15/10) | 11/06/10 (07/22/10, 12/11/10) | 11/13/10 (09/25/10, 12/11/10) |
| Proportion of 4-fold rise, 0-18 yrs | 17/74 (23%) | 46/213 (22%) | 14/208 (7%) | 81/378 (21%) |
| Proportion of 4-fold rise, 19-50 yrs | 10/150 (7%) | 45/435 (10%) | 17/417 (4%) | 38/649 (6%) |
| Proportion of 4-fold rise, >50 yrs | 0/22 (0%) | 5/50 (10%) | 2/51 (4%) | 6/99 (6%) |
| Predicted number of 4-fold rise based on the fitted model, 0-18 yrs | 15 (8, 23) | 44 (32, 57) | 13 (6, 20) | 82 (63, 103) |
| Predicted number of 4-fold rise based on the fitted model, 19-50 yrs | 10 (5, 18) | 37 (25, 50) | 12 (5, 19) | 47 (32, 64) |
| Predicted number of 4-fold rise based on the fitted model, >50 yrs | 1 (0, 3) | 3 (0, 8) | 1 (0, 4) | 7 (2, 14) |

¹ Only participants with missing serum in Round 3 were included in this group.
eTable 2. Sensitivity analysis on the results from the assumption on days from infection to sero-conversion

| Days from infection to sero-conversion | 10 days | 14 days | 21 days |
|---|---|---|---|
| Model fit (model with change point at 2009/11/21) | | | |
| Chi-square statistic | 8.5 | 8.9 | 9.3 |
| p-value | 0.74 | 0.71 | 0.67 |
| Cumulative incidence of infection for H1N1pdm09 epidemic | | | |
| 0-18 yrs | 0.47 (0.42, 0.51) | 0.46 (0.41, 0.51) | 0.48 (0.43, 0.52) |
| 19-50 yrs | 0.17 (0.14, 0.21) | 0.17 (0.13, 0.2) | 0.18 (0.14, 0.21) |
| >50 yrs | 0.12 (0.07, 0.18) | 0.12 (0.06, 0.18) | 0.12 (0.07, 0.19) |
eTable 3. Estimated pre-epidemic antibody distribution

| Titer | Children | Adults |
|---|---|---|
| <1:10 | 86.3% (81%, 90.5%) | 88.6% (84.6%, 91.9%) |
| 1:10 | 3.3% (1.5%, 6.4%) | 1.9% (0.9%, 3.6%) |
| 1:20 | 3.6% (1.6%, 6.5%) | 2.5% (1.3%, 4.1%) |
| 1:40 | 1.4% (0%, 5.2%) | 2.3% (0.7%, 4.9%) |
| 1:80 | 0.7% (0%, 4.1%) | 1.2% (0.1%, 4%) |
| 1:160 | 0.5% (0%, 4%) | 2% (0.3%, 4.4%) |
| 1:320 | 1.8% (0%, 4.7%) | 0.6% (0%, 2.1%) |
| 1:640 | 0.2% (0%, 3.3%) | 0% (0%, 0.5%) |
| 1:1280 | 0.1% (0%, 1.2%) | 0% (0%, 0.4%) |
| 1:2560 | 0.1% (0%, 0.7%) | 0% (0%, 0.3%) |
eTable 4. Estimated distribution for antibody waning for 6 months

| Fold-drop | Children | Adults |
|---|---|---|
| 0.5 | 4.2% (3.1%, 5.6%) | 1.2% (0.8%, 1.7%) |
| 1 | 48.5% (43.6%, 53.0%) | 37.3% (31.8%, 41.1%) |
| 2 | 36.5% (31.5%, 41.0%) | 29.0% (24.5%, 34.4%) |
| 4 | 4.6% (2.5%, 7.4%) | 6.1% (3.2%, 9.2%) |
| 8 | 0.9% (0.2%, 2.6%) | 5.9% (4.3%, 10.4%) |
| 16 | 0.4% (0.0%, 2.0%) | 4.8% (3.8%, 6.9%) |
| 32 | 0.1% (0.0%, 0.9%) | 4.3% (3.3%, 5.6%) |
| 64 | 2.0% (1.0%, 4.2%) | 3.8% (2.7%, 4.9%) |
| 128 | 1.2% (0.5%, 2.2%) | 3.2% (2.0%, 4.2%) |
| 256 | 0.7% (0.1%, 1.4%) | 2.5% (1.2%, 3.6%) |
| 512 | 0.3% (0.0%, 1.0%) | 1.7% (0.4%, 3.0%) |
eTable 5. Simulation study for the model for estimating the cumulative incidence of infection

| Parameter | True value | Number of credible intervals that cover the true value (out of 50) |
|---|---|---|
| Scale parameter of children before the change point | 0.737 | 43 |
| Scale parameter of adults before the change point | 0.231 | 48 |
| Scale parameter of older adults before the change point | 0.130 | 49 |
| Scale parameter of children after the change point | 3.059 | 45 |
| Scale parameter of adults after the change point | 0.784 | 48 |
| Scale parameter of older adults after the change point | 0.737 | 49 |