Estimation of the proportion of infected people with the aid of two data sets: repeated q-PCR tests and a cohort survey

Ezer Miller, Amit Huppert, Ilya Novikov, Laurence Freedman
The Hebrew University of Jerusalem
Addis Ababa University
Gertner Institute for Epidemiology and Health Policy Research

Background

Visceral Leishmaniasis (VL), or Kala Azar, is a vector-borne infectious disease that is responsible for 500,000 infections annually. In the last 5 years there has been a disease outbreak in northern Ethiopia. The statistical model presented here is part of a larger study that aims to explore the disease transmission dynamics in northern Ethiopia.

[Figures: Leishmania promastigotes in the sand fly gut; female sand fly feeding on blood. Scale bars: 2.0 mm, 2.5 µm.]

Leishmania parasites are transmitted by blood-sucking phlebotomine sand flies. Female sand flies take blood, which is a source of protein for maturing their eggs. Leishmania amastigotes ingested with the blood transform into promastigotes and multiply in the gut of the infected sand fly.

Cohort survey results: table

The parasitemia level (number of parasites per ml) obtained via q-PCR from the cohort study in northern Ethiopia during spring 2011, n = 4756.

$e_i$ = experimental result of test $i$: 1 (infected) or 0 (uninfected)
$j$ = group index

| Group | PCR result (parasites per ml) | Number of participants | Joint probability |
|---|---|---|---|
| j=1 | 0 | 4076 | $P(e_1=0, j=1) = 0.857$ |
| j=2 | 1-10 | 468 | $P(e_1=1, j=2) = 0.0984$ |
| j=3 | 11-100 | 93 | $P(e_1=1, j=3) = 0.0196$ |
| j=4 | 101-1000 | 96 | $P(e_1=1, j=4) = 0.0202$ |
| j=5 | >1000 | 23 | $P(e_1=1, j=5) = 0.0048$ |

Cohort survey results: pie chart

[Pie chart: the same parasitemia distribution, with slices for 0, 1-10, 11-100, 101-1000 and >1000 parasites per ml.]

The proportion of infected people is ~14%.

Study goal

The q-PCR has limited accuracy: the probabilities of a false-negative (FN) result and of a false-positive (FP) result are not zero.
The goal of the study is to estimate q = the probability of being infected = the proportion of infected people within the population.

To achieve this, the cohort participants were divided into five groups according to their parasitemia level, and a repeated assay was performed on selected members of each group.

Repeated test results

$e_i = 1$ → infected, $e_i = 0$ → uninfected

| Group | 1st measurement (parasites per ml) | n retested | Number positive in the 2nd test | Conditional probability from the repeated test results |
|---|---|---|---|---|
| j=1 | 0 | 107 | 9 | $P(e_2=1 \mid e_1=0, j=1) = 0.084$ |
| j=2 | 1-10 | 108 | 64 | $P(e_2=1 \mid e_1=1, j=2) = 0.593$ |
| j=3 | 11-100 | 48 | 41 | $P(e_2=1 \mid e_1=1, j=3) = 0.854$ |
| j=4 | 101-1000 | 24 | 23 | $P(e_2=1 \mid e_1=1, j=4) = 0.9583$ |
| j=5 | >1000 | 19 | 19 | $P(e_2=1 \mid e_1=1, j=5) = 1$ |

Statistical approach

1. Calculate the estimated probabilities of two separate measurements, where $e_i$ are the experimental measurements: $P(e_1=0, e_2=0)$, $P(e_1=0, e_2=1 \cup e_1=1, e_2=0)$ and $P(e_1=1, e_2=1)$.
2. Express these probabilities as a function of: the true probability of being infected = q; the probability of a false-negative result = p; the probability of a false-positive result = r.
3. Calculate q*, p* and r*.
4. Calculate the variance of q*.

1. Calculation of the estimated probabilities of two separate measurements, using the cohort and the repeated test results

$$P(e_1=0, e_2=0) = P(e_2=0 \mid e_1=0, j=1)\,P(e_1=0, j=1) = 0.785$$

$$P(e_1=0, e_2=1) = P(e_2=1 \mid e_1=0, j=1)\,P(e_1=0, j=1) = 0.07208$$

$$P(e_1=1, e_2=0) = \sum_{k=2}^{5} P(e_2=0 \mid e_1=1, j=k)\,P(e_1=1, j=k) = 0.04375$$

$$P(e_1=1, e_2=1) = \sum_{k=2}^{5} P(e_2=1 \mid e_1=1, j=k)\,P(e_1=1, j=k) = 0.0992$$

2. The relationship between the probabilities of two separate measurements and q, p and r

q = P(T=1) = the proportion of infected people
p = P(e=0 | T=1) = probability of a false-negative result
r = P(e=1 | T=0) = probability of a false-positive result
T = true status
e = measured status

$$P(e_1=0, e_2=0) = P(e_1=0, e_2=0 \mid T=1)P(T=1) + P(e_1=0, e_2=0 \mid T=0)P(T=0) = p^2 q + (1-r)^2 (1-q)$$

$$P(e_1=0, e_2=1 \cup e_1=1, e_2=0) = P(\cdot \mid T=1)P(T=1) + P(\cdot \mid T=0)P(T=0) = 2qp(1-p) + 2r(1-r)(1-q)$$

$$P(e_1=1, e_2=1) = P(\cdot \mid T=0)P(T=0) + P(\cdot \mid T=1)P(T=1) = r^2 (1-q) + q(1-p)^2$$

3. Calculation of q*, p* and r*

For zero false-positive probability, r = 0:

$$P(e_1=1, e_2=1) = (1-p)^2 q = 0.099$$
$$P(e_1=1, e_2=0 \cup e_1=0, e_2=1) = 2qp(1-p) = 0.116$$
$$P(e_1=0, e_2=0) = p^2 q + 1 - q = 0.785$$

p* = the probability of a false-negative result = 0.369
q* = the infection probability = 0.249

For zero false-negative probability, p = 0:

$$P(e_1=1, e_2=1) = r^2 (1-q) + q = 0.099$$
$$P(e_1=1, e_2=0 \cup e_1=0, e_2=1) = 2r(1-r)(1-q) = 0.116$$
$$P(e_1=0, e_2=0) = (1-r)^2 (1-q) = 0.785$$

r* = the probability of a false-positive result = 0.06
q* = the infection probability = 0.11

4. Calculation of the variance of q* through an ad hoc method

q* is a function of u* and s*, where

$$u = P(e_1=0, e_2=0), \qquad s = P(e_1=1, e_2=1), \qquad q^* = f(u^*, s^*)$$

$$\mathrm{Var}[f(u^*, s^*)] \approx \left(\frac{\partial f}{\partial u}\right)^2 \mathrm{Var}(u^*) + \left(\frac{\partial f}{\partial s}\right)^2 \mathrm{Var}(s^*) + 2\,\frac{\partial f}{\partial u}\,\frac{\partial f}{\partial s}\,\mathrm{Cov}(u^*, s^*)$$

Example: calculating the variance of q* in the case r = FP = 0

$$u = p^2 q + 1 - q, \qquad s = (1-p)^2 q$$

Eliminating p gives $q = (1 - u + s)^2 / (4s)$, so that

$$\frac{\partial q}{\partial s} = \frac{1}{1-p} - \frac{1}{(1-p)^2} = -\frac{p}{(1-p)^2}, \qquad \frac{\partial q}{\partial u} = \frac{1}{p-1}$$

$$\mathrm{Var}(q^*) \approx \left[\frac{1}{1-p} - \frac{1}{(1-p)^2}\right]^2 \mathrm{Var}(s^*) + \left[\frac{1}{p-1}\right]^2 \mathrm{Var}(u^*) + \frac{2p}{(1-p)^3}\,\mathrm{Cov}(u^*, s^*)$$

q* = 0.249, Var(q*) = 0.0011, σ* = 0.033

Summary

A cohort survey and repeated test results can be used together to estimate the proportion of infected people within the population. Our analysis shows that:

- The probability of a false-negative result is high, 37%, when there are no false-positive tests.
- The proportion of infected people ranges between 11% (FN = 0) and 25% (FP = 0).
- Under the assumption of zero false-positive results, the proportion of infected people in the population is much higher than recorded in the cohort survey (~14%).
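The r = 0 estimates can be reproduced directly from the two data tables. The following is a minimal Python sketch, not the authors' own code: the function names `joint_probabilities` and `solve_zero_false_positive` are hypothetical, and the closed form used for p follows from dividing the discordant-pair equation, 2qp(1-p), by the double-positive equation, q(1-p)^2.

```python
# Cohort counts per parasitemia group j = 1..5 (cohort survey table).
cohort_counts = [4076, 468, 93, 96, 23]
# Repeated-test data per group: (number retested, number positive on the 2nd test).
retest = [(107, 9), (108, 64), (48, 41), (24, 23), (19, 19)]


def joint_probabilities(cohort_counts, retest):
    """Return P(e1=0,e2=0), P(e1=0,e2=1), P(e1=1,e2=0), P(e1=1,e2=1)."""
    total = sum(cohort_counts)
    p_group = [count / total for count in cohort_counts]   # P(e1, j)
    p_pos2 = [positive / n for n, positive in retest]       # P(e2=1 | e1, j)
    # Group j=1 is the only group with a negative first test.
    p00 = (1 - p_pos2[0]) * p_group[0]
    p01 = p_pos2[0] * p_group[0]
    # Groups j=2..5 all have a positive first test, so sum over them.
    p10 = sum((1 - c) * g for c, g in zip(p_pos2[1:], p_group[1:]))
    p11 = sum(c * g for c, g in zip(p_pos2[1:], p_group[1:]))
    return p00, p01, p10, p11


def solve_zero_false_positive(discordant, p11):
    """Solve for (p, q) assuming r = 0.

    With r = 0: p11 = q(1-p)^2 and discordant = 2qp(1-p),
    so discordant / p11 = 2p / (1-p), i.e. p = discordant / (discordant + 2*p11).
    """
    p = discordant / (discordant + 2 * p11)
    q = p11 / (1 - p) ** 2
    return p, q


p00, p01, p10, p11 = joint_probabilities(cohort_counts, retest)
p_star, q_star = solve_zero_false_positive(p01 + p10, p11)
print(f"P(0,0)={p00:.4f}  P(0,1)={p01:.4f}  P(1,0)={p10:.4f}  P(1,1)={p11:.4f}")
print(f"p* = {p_star:.3f}, q* = {q_star:.3f}")
```

Because the three pair probabilities sum to one, only two of the three equations are independent; the sketch solves the discordant and double-positive equations, and the remaining equation, u = p²q + 1 - q, serves as a consistency check.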
These results have important implications for the analysis of a dynamic infectious disease model we are currently developing.

Thank you for listening!

Acknowledgements

The model development and analysis could not have been performed without the valuable help and support of:
Prof. Laurence Freedman
Dr. Amit Huppert
Dr. Ilya Novikov
Dr. Asrat Hailu
Dr. Ibrahim Abassi
Prof. Alon Warburg

The study is funded by the Bill and Melinda Gates Foundation (BMGF).