Sequential probability ratio tests based on grouped observations

Karl-Heinz Eger
Chemnitz University of Technology, Chemnitz, Germany
eger@mathematik.tu-chemnitz.de

Evgeny Borisovich Tsoy
Novosibirsk State Technical University, Novosibirsk, Russia
ebcoi@nstu.ru

Abstract - This paper deals with sequential likelihood ratio tests based on grouped observations. It is demonstrated that the method of conjugated parameter pairs known from the non-grouped case can be extended to the grouped case, yielding Wald-like approximations for the OC- and ASN-function. For near hypotheses, so-called F-optimal groupings are recommended. As an example, an SPRT based on grouped observations for the parameter of an exponentially distributed random variable is considered.¹

Index Terms - Hypotheses testing, probability ratio test, classified observations, grouped observations, sequential test, sequential analysis.

¹ Proceedings of 'The Second International Forum on Strategic Technology - IFOST 2007', 3-5 October 2007, Ulaanbaatar, Mongolia, pp. 284-287.

1 Introduction

Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random variables with a density function $f_\theta(x)$, $\theta \in \Theta$, with respect to some measure $\mu$. Our aim is to discriminate between two simple hypotheses

$$H_0: \theta = \theta_0 \quad\text{and}\quad H_1: \theta = \theta_1, \qquad \theta_0 \neq \theta_1, \tag{1}$$

by means of a sequential probability ratio test (SPRT).

In this context we suppose that the random variables $\{X_i\}_{i=1}^{\infty}$ can be observed only in a restricted manner, as follows. Let $G$ be a partition of the domain $\mathcal{X}$ of the random variables $\{X_i\}_{i=1}^{\infty}$ into disjoint subsets $\mathcal{X}_1, \dots, \mathcal{X}_m$, $m \ge 2$, such that at each observation stage $i$, $i = 1, 2, \dots$, instead of $X_i$ only a random variable $X_i^G$ can be observed, defined by

$$X_i^G = k \iff X_i \in \mathcal{X}_k, \qquad k = 1, \dots, m,$$

with

$$p_\theta^G(k) = P_\theta(X_i^G = k) = P_\theta(X_i \in \mathcal{X}_k) > 0, \qquad \theta \in \Theta.$$

That means that instead of a specific measured value we observe only the corresponding group number, and we have a so-called grouped or classified observation scheme. The partition $G$ of $\mathcal{X}$ is called a grouping.

In the following we consider a Wald SPRT for the hypotheses (1) based on observations of the random variables $\{X_i^G\}_{i=1}^{\infty}$. This test is defined as follows (see, e.g., [5], [3] or [4]). Let

$$L^G_{n,\theta_0,\theta_1} = \prod_{i=1}^{n} \frac{p^G_{\theta_1}(X_i^G)}{p^G_{\theta_0}(X_i^G)}$$

be, for $n = 1, 2, \dots$, the likelihood ratio of the sample $\vec{X}_n^G = (X_1^G, \dots, X_n^G)$. Then, for given stopping bounds $0 < B < 1 < A < \infty$, the sample size $N_G$ and the termination rule $\delta_G$ of a Wald SPRT for the hypotheses (1) are defined as follows:

$$N_G = \inf\{n \ge 1 : L^G_{n,\theta_0,\theta_1} \notin (B, A)\} \quad\text{and}\quad \delta_G = \mathbf{1}\{L^G_{N_G,\theta_0,\theta_1} \le B\}.$$

That means we continue observing for $n = 1, 2, \dots$ as long as the critical inequality $B < L^G_{n,\theta_0,\theta_1} < A$ holds. If at observation stage $n$ for the first time $L^G_{n,\theta_0,\theta_1} \notin (B, A)$, then we accept the hypothesis $H_0$ if $L^G_{n,\theta_0,\theta_1} \le B$ and the hypothesis $H_1$ if $L^G_{n,\theta_0,\theta_1} \ge A$. We denote this SPRT by $S_G(B, A)$.

The most important characteristics for evaluating the statistical properties of our test are the operating characteristic function (OC-function) $Q_G(\theta) = E_\theta \delta_G$, $\theta \in \Theta$, and the average sample number function (ASN-function) $E_\theta N_G$, $\theta \in \Theta$. If $P_\theta(L^G_{1,\theta_0,\theta_1} = 1) < 1$, then we have $P_\theta(N_G < \infty) = 1$ and $E_\theta N_G < \infty$. Moreover, the Wald-Wolfowitz theorem holds. That means the test $S_G(B, A)$ minimises the average sample number function for $\theta = \theta_0$ and $\theta = \theta_1$ among all tests whose error probabilities are not greater than the error probabilities of Wald's SPRT at $\theta = \theta_0$ and $\theta = \theta_1$.

The general problem of Wald's SPRT consists in the computation of its characteristics, e.g., the OC-function or the ASN-function. This holds especially for the grouped case considered here. We will demonstrate that the so-called method of conjugated parameter pairs known from the non-grouped case (see [3]) can be extended to the grouped case, yielding Wald-like approximations for the OC- and ASN-function. Moreover, we will discuss some possibilities for the determination of optimal groupings. In this context the Fisher information and the so-called F-optimal groupings will play an important part. As an example we consider an SPRT based on grouped observations for the parameter of an exponential distribution and present corresponding F-optimal groupings.

2 The Wald approximations

The OC- and ASN-function of the test $S_G(B, A)$ can be computed approximately, in the sense of the so-called Wald approximations, by means of conjugated parameter pairs as follows [3].

Definition. Two parameter pairs $(\theta', \theta'')$ and $(\theta_0, \theta_1) \in \Theta \times \Theta$ are said to be conjugated if a real number $h$, $h \neq 0$, exists such that

$$L^G_{n,\theta',\theta''} = \left(L^G_{n,\theta_0,\theta_1}\right)^h, \qquad n = 1, 2, \dots,$$

holds. We write $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$.

If $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$, the OC-function $Q_G(\theta)$ and the power function $M_G(\theta) = E_\theta(1 - \delta_G)$, $\theta \in \Theta$, of the test $S_G(B, A)$ satisfy the relations

$$\frac{Q_G(\theta'')}{Q_G(\theta')} = E_{\theta'}\!\left(\left(L^G_{N_G,\theta_0,\theta_1}\right)^h \,\middle|\, H_0 \text{ is accepted}\right) \le B^h \tag{2}$$

and

$$\frac{M_G(\theta'')}{M_G(\theta')} = E_{\theta'}\!\left(\left(L^G_{N_G,\theta_0,\theta_1}\right)^h \,\middle|\, H_1 \text{ is accepted}\right) \ge A^h, \tag{3}$$

where the equals signs hold in the case

$$P_{\theta'}\!\left(L^G_{N_G,\theta_0,\theta_1} = B \mid H_0 \text{ is accepted}\right) = P_{\theta'}\!\left(L^G_{N_G,\theta_0,\theta_1} = A \mid H_1 \text{ is accepted}\right) = 1. \tag{4}$$

We remark that in the case $P_\theta(N_G < \infty) = 1$ (closed test) moreover $M_G(\theta) = 1 - Q_G(\theta)$ holds. A sufficient condition for closedness is, for instance, $P_\theta(L^G_{1,\theta_0,\theta_1} = 1) < 1$.

2.1 The OC-function

For a closed test $S_G(B, A)$ we get, under the condition (4) and $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$, by (2) and (3) for the OC-function

$$Q_G(\theta') = Q_G^*(\theta') = \frac{A^h - 1}{A^h - B^h} \quad\text{and}\quad Q_G(\theta'') = Q_G^*(\theta'') = B^h\, Q_G^*(\theta').$$

If condition (4) holds only approximately, that is, if the excess over the stopping bounds is negligible, then we have $Q_G(\theta') \approx Q_G^*(\theta')$ and $Q_G(\theta'') \approx Q_G^*(\theta'') = B^h Q_G^*(\theta')$. These are the famous Wald approximations for the OC-function. If for a given $\theta'$ no $h \neq 0$ and $\theta'' \neq \theta'$ exist such that $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$, e.g., in the case $E_{\theta'} Z^G_{1,\theta_0,\theta_1} = E_{\theta'} \ln L^G_{1,\theta_0,\theta_1} = 0$, we can extend the Wald approximation for the OC-function by

$$Q_G(\theta') \approx Q_G^*(\theta') = \frac{\ln A}{\ln A - \ln B} \quad\text{for } E_{\theta'} Z^G_{1,\theta_0,\theta_1} = 0.$$

2.2 The stopping bounds

Under the condition (4) we obtain a test $S_G(B, A)$ of size $(\alpha, \beta)$, $0 < \alpha, \beta < 1$, $\alpha + \beta < 1$, that is, $Q_G(\theta_0) = 1 - \alpha$ and $Q_G(\theta_1) = \beta$, if the stopping bounds $B$ and $A$ satisfy the condition

$$B = B^* = \frac{\beta}{1 - \alpha} \quad\text{and}\quad A = A^* = \frac{1 - \beta}{\alpha}. \tag{5}$$

The values $B^*$ and $A^*$ are the so-called Wald approximations for the stopping bounds.

A sufficient condition for an admissible test for the hypotheses (1) of size $(\alpha, \beta)$ is $B = \beta$ and $A = 1/\alpha$. Then we have $Q_G(\theta_0) \ge 1 - \alpha$ and $Q_G(\theta_1) \le \beta$.

2.3 The ASN-function

Let $Z^G_{n,\theta_0,\theta_1} = \ln L^G_{n,\theta_0,\theta_1}$. By means of the moment equation $E_\theta Z^G_{N_G,\theta_0,\theta_1} = E_\theta N_G \cdot E_\theta Z^G_{1,\theta_0,\theta_1}$, which holds for our tests if, e.g., $P_\theta(L^G_{1,\theta_0,\theta_1} = 1) < 1$, we get in the case $E_\theta Z^G_{1,\theta_0,\theta_1} \neq 0$ for the average sample number

$$E_\theta N_G = \frac{E_\theta\!\left(Z^G_{N_G,\theta_0,\theta_1} \mid H_0 \text{ is accepted}\right) Q_G(\theta) + E_\theta\!\left(Z^G_{N_G,\theta_0,\theta_1} \mid H_1 \text{ is accepted}\right) \left(1 - Q_G(\theta)\right)}{E_\theta Z^G_{1,\theta_0,\theta_1}}.$$

If we again assume that condition (4) holds approximately, we obtain the so-called Wald approximation $E_\theta^* N_G$ for the average sample number $E_\theta N_G$:

$$E_\theta N_G \approx E_\theta^* N_G = \frac{\ln B \; Q_G^*(\theta) + \ln A \,\left(1 - Q_G^*(\theta)\right)}{E_\theta Z^G_{1,\theta_0,\theta_1}}. \tag{6}$$

In the case $E_\theta Z^G_{1,\theta_0,\theta_1} = 0$ we get analogously, by means of the moment equation $E_\theta (Z^G_{N_G,\theta_0,\theta_1})^2 = E_\theta N_G \cdot E_\theta (Z^G_{1,\theta_0,\theta_1})^2$, the approximation

$$E_\theta N_G \approx E_\theta^* N_G = \frac{-\ln B \, \ln A}{E_\theta (Z^G_{1,\theta_0,\theta_1})^2}.$$

2.4 Conjugated parameter pairs

According to our definition of conjugated parameter pairs we have in the i.i.d. case the following criterion. It holds $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$ if for a given parameter value $\theta'$ a real number $h \neq 0$ and a parameter value $\theta'' \neq \theta'$ exist such that

$$\frac{p^G_{\theta''}(x)}{p^G_{\theta'}(x)} = \left(\frac{p^G_{\theta_1}(x)}{p^G_{\theta_0}(x)}\right)^h \tag{7}$$

holds for $x \in \{1, \dots, m\}$.
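Once an exponent $h$ belonging to a parameter value $\theta'$ is available, the approximations (5), (6) and the OC formulas of Section 2.1 reduce to a few lines of arithmetic. The following is a minimal Python sketch of that arithmetic; the function names `wald_bounds`, `oc_wald` and `asn_wald` are illustrative assumptions and do not come from the paper, and the caller is assumed to supply $h$, $E_\theta Z^G_{1,\theta_0,\theta_1}$ and, where needed, $E_\theta (Z^G_{1,\theta_0,\theta_1})^2$.

```python
import math


def wald_bounds(alpha, beta):
    """Wald approximations (5) for the stopping bounds: returns (B*, A*)."""
    return beta / (1.0 - alpha), (1.0 - beta) / alpha


def oc_wald(h, A, B):
    """Wald approximation Q*(theta') for the exponent h belonging to theta'."""
    if h == 0.0:
        # Limiting case E Z = 0 from Section 2.1.
        return math.log(A) / (math.log(A) - math.log(B))
    return (A ** h - 1.0) / (A ** h - B ** h)


def asn_wald(q, A, B, ez, ez2=None):
    """Wald approximation of the ASN: formula (6), or the E Z = 0 variant."""
    if ez != 0.0:
        return (math.log(B) * q + math.log(A) * (1.0 - q)) / ez
    if ez2 is None:
        raise ValueError("ez2 = E(Z^2) is required when ez == 0")
    return -math.log(B) * math.log(A) / ez2


# Illustrative call with alpha = beta = 0.05 (hypothetical usage).
B_star, A_star = wald_bounds(0.05, 0.05)
print(round(B_star, 6), round(A_star, 6))        # 0.052632 19.0
print(round(oc_wald(1.0, A_star, B_star), 6))    # 0.95  (h = 1 belongs to theta_0)
print(round(oc_wald(-1.0, A_star, B_star), 6))   # 0.05  (h = -1 belongs to theta_1)
```

The values $h = 1$ and $h = -1$ correspond to $\theta' = \theta_0$ and $\theta' = \theta_1$, so the sketch returns $1 - \alpha$ and $\beta$ there, in agreement with Section 2.2.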
By criterion (7), a necessary existence condition for conjugated parameter pairs is that the function $q^G_{\theta'}(x)$ defined by

$$q^G_{\theta'}(x) = \left(\frac{p^G_{\theta_1}(x)}{p^G_{\theta_0}(x)}\right)^h p^G_{\theta'}(x), \qquad x \in \{1, \dots, m\},$$

is a probability mass function. Since $q^G_{\theta'}(x) \ge 0$ for $x \in \{1, \dots, m\}$, we can compute a value $h$, $-\infty < h < \infty$, such that

$$\varphi_{\theta'}(h) = \sum_{x=1}^{m} q^G_{\theta'}(x) = \sum_{x=1}^{m} \left(\frac{p^G_{\theta_1}(x)}{p^G_{\theta_0}(x)}\right)^h p^G_{\theta'}(x) = E_{\theta'}\, e^{h Z^G_{1,\theta_0,\theta_1}} = 1$$

holds. The function $\varphi_{\theta'}(h)$ is, as a function of $h$, $-\infty < h < \infty$, the moment-generating function of the random variable $Z^G_{1,\theta_0,\theta_1} = \ln L^G_{1,\theta_0,\theta_1}$. It holds $\varphi_{\theta'}(0) = 1$, $\lim_{h \to \pm\infty} \varphi_{\theta'}(h) = \infty$, $\varphi'_{\theta'}(0) = E_{\theta'} Z^G_{1,\theta_0,\theta_1}$ as well as $\varphi''_{\theta'}(h) = E_{\theta'}\!\left((Z^G_{1,\theta_0,\theta_1})^2\, e^{h Z^G_{1,\theta_0,\theta_1}}\right) > 0$. This means that $\varphi_{\theta'}(h)$ is a convex function of $h$. Hence, in the cases $E_{\theta'} Z^G_{1,\theta_0,\theta_1} < 0$ and $E_{\theta'} Z^G_{1,\theta_0,\theta_1} > 0$ the equation $\varphi_{\theta'}(h) = 1$ has, besides the trivial solution $h = 0$, always a unique solution $h > 0$ and $h < 0$, respectively.

The case m = 2: In this case our test becomes an SPRT for discriminating between two probabilities. Then, in the case $E_{\theta'} Z^G_{1,\theta_0,\theta_1} \neq 0$, besides the solution $h \neq 0$ of $\varphi_{\theta'}(h) = 1$ there always exists a parameter value $\theta'' \neq \theta'$ such that condition (7) holds for $x \in \{1, 2\}$. This implies $(\theta', \theta'') \overset{h}{\sim} (\theta_0, \theta_1)$, and we obtain the usual Wald approximations for the OC- and ASN-function.

The case m > 2: Examples show that, as a rule, for a given solution $h \neq 0$ of $\varphi_{\theta'}(h) = 1$ there is no parameter value $\theta''$ such that $p^G_{\theta''}(x) = q^G_{\theta'}(x)$, $x \in \{1, \dots, m\}$. However, we can always find a value $\theta''$ such that this relation holds approximately. Hence we then have

$$p^G_{\theta''}(x) \approx p^G_{\theta'}(x) \left(\frac{p^G_{\theta_1}(x)}{p^G_{\theta_0}(x)}\right)^h, \qquad x \in \{1, \dots, m\},$$

and in the sense of this approximation we can compute corresponding modified Wald approximations for the OC- and ASN-function if we now use the left-hand approximation for the OC-function from Section 2.1,

$$Q_G(\theta') \approx Q_G^*(\theta') = \frac{A^h - 1}{A^h - B^h}.$$

An explicit determination of the parameter value $\theta''$ is then not necessary.

3 Optimal groupings

The Wald approximations for the ASN-function provide hints as to how a grouping influences the average sample size. While the numerator in (6) is independent of the grouping $G$, the denominator depends on $G$ via the expectation value $E_\theta Z^G_{1,\theta_0,\theta_1}$. This dependence can be used to optimise the average sample number by means of an appropriate grouping for a given parameter value $\theta$. In particular, we get correspondingly small average sample sizes for $\theta_0$, $\theta_1$ or $\theta^*$ in the sense of the Wald approximations if a grouping $G$ is chosen which maximises $|E_{\theta_0} Z^G_{1,\theta_0,\theta_1}|$, $E_{\theta_1} Z^G_{1,\theta_0,\theta_1}$ or, in the case $E_{\theta^*} Z^G_{1,\theta_0,\theta_1} = 0$, the expectation value $E_{\theta^*} (Z^G_{1,\theta_0,\theta_1})^2$ with respect to $G$, respectively.

An interesting case is that of near hypotheses: if $\Delta\theta = |\theta_1 - \theta_0|$ is small, then it can be shown that, to leading order as $\Delta\theta \to 0$,

$$E_{\theta_0} Z^G_{1,\theta_0,\theta_1} \approx -\tfrac{1}{2}\, I_F^G(\theta_0)\, (\Delta\theta)^2, \qquad E_{\theta^*} (Z^G_{1,\theta_0,\theta_1})^2 \approx I_F^G(\theta^*)\, (\Delta\theta)^2, \qquad E_{\theta_1} Z^G_{1,\theta_0,\theta_1} \approx \tfrac{1}{2}\, I_F^G(\theta_1)\, (\Delta\theta)^2,$$

where

$$I_F^G(\theta) = E_\theta \left(\frac{\partial \ln p^G_\theta(X_1^G)}{\partial \theta}\right)^2$$

denotes the Fisher information of a single observation of the random variable $X_1^G$ as a function of the parameter value $\theta$. This underlines the importance of the Fisher information with respect to optimal groupings in this context.

Definition. Let $G = \{\mathcal{X}_1, \dots, \mathcal{X}_m\}$ be an interval grouping such that $\mathcal{X}_i = [x^*_{i-1}, x^*_i)$ for $i = 1, \dots, m$ and $\inf_{x \in \mathcal{X}} x = x^*_0 < x^*_1 < \dots < x^*_m = \sup_{x \in \mathcal{X}} x$ holds. An interval grouping $G_0$ is said to be F-optimal for $\theta$ if

$$I_F^{G_0}(\theta) = \max_{x^*_1, \dots, x^*_{m-1}} I_F^G(\theta)$$

holds. The F-efficiency of such a grouping can be measured by the ratio $F_{\mathrm{eff}}(G, \theta) = I_F^G(\theta) / I_F(\theta)$, where $I_F(\theta) = E_\theta \left(\partial \ln f_\theta(X) / \partial \theta\right)^2$ denotes the Fisher information of a non-grouped observation of the random variable $X_1$. It holds $0 \le F_{\mathrm{eff}}(G, \theta) \le 1$.

Numerical studies show that F-optimal groupings are quite robust with respect to their efficiencies against modifications of the group bounds. For instance, a simplification of the group bounds by moderate rounding does not lead to a significant loss of F-efficiency or discrimination information.

Table 1. F-optimal group bounds of interval groupings $G_m$ for $\theta = 1$, $m = 2, \dots, 9$, and F-efficiencies $F_{\mathrm{eff}}(G_m, \theta)$ [%], exponential distribution.

m   x*_1     x*_2     x*_3     x*_4     x*_5     x*_6     x*_7     x*_8     F_eff [%]
2   1.5936   -        -        -        -        -        -        -        64.76
3   1.0176   2.6112   -        -        -        -        -        -        82.03
4   0.7540   1.7716   3.3652   -        -        -        -        -        89.10
5   0.6004   1.3545   2.3720   3.9657   -        -        -        -        92.69
6   0.4993   1.0997   1.8538   2.8714   4.4650   -        -        -        94.76
7   0.4276   0.9269   1.5273   2.2813   3.2989   4.8925   -        -        96.06
8   0.3739   0.8015   1.3008   1.9012   2.6553   3.6729   5.2665   -        96.93
9   0.3323   0.7062   1.1338   1.6331   2.2336   2.9876   4.0052   5.5988   97.54

4 Example

Let $\{X_i\}_{i=1}^{\infty}$ be independent exponentially distributed random variables with density function

$$f_\theta(x) = \begin{cases} \theta e^{-\theta x} & \text{for } x \ge 0, \\ 0 & \text{for } x < 0, \end{cases} \qquad 0 < \theta < \infty.$$

Our aim is to discriminate between the hypotheses $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$, $\theta_0 < \theta_1$, where $0 < \theta_0 < 1 < \theta_1 < \infty$ and

$$E_\theta Z_{1,\theta_0,\theta_1} = \ln\frac{\theta_1}{\theta_0} - \frac{\theta_1 - \theta_0}{\theta} = 0 \quad\text{for } \theta = 1$$

holds. This side condition is no restriction, since the parameter $\theta$ is a scale parameter here and other simple hypotheses can be reduced to this one by an appropriate transformation of the random variables $\{X_i\}_{i=1}^{\infty}$.

Since $E_\theta Z_{1,\theta_0,\theta_1} < 0$ for $\theta < 1$ and $E_\theta Z_{1,\theta_0,\theta_1} > 0$ for $\theta > 1$, the test should prefer the hypothesis $H_0$ in the first case and the hypothesis $H_1$ in the second one. That means the test should be most selective for parameter values in the neighbourhood of $\theta = 1$. This can be achieved by means of F-optimal groupings for $\theta = 1$.

Table 1 presents the corresponding F-optimal group bounds $x^*_1, \dots, x^*_{m-1}$ of interval groupings $G_m$ for $\theta = 1$ and $m = 2, \dots, 9$, as well as the relative efficiencies reached (in percent), $F_{\mathrm{eff}}(G_m, \theta) = I_F^{G_m}(\theta) / I_F(\theta)$, with $I_F(\theta) = E_\theta (\partial \ln f_\theta(x) / \partial \theta)^2 = 1/\theta^2$, for $\theta = 1$.

We now consider the special hypotheses $H_0: \theta_0 = 0.85$ and $H_1: \theta_1 = 1.166687$. Then we have $E_\theta Z_{1,\theta_0,\theta_1} = 0$ for $\theta = 1$. Let $\alpha = 0.05$ and $\beta = 0.05$ be the given risks of an error of the first and second kind, respectively. Then we get by (5) the following Wald approximations for the stopping bounds $B$ and $A$: $B^* = 0.052632$ and $A^* = 19$.

Figure 1 shows the Wald approximations of the OC-functions obtained by the method of conjugated parameter pairs of Section 2.4 for the tests $S_{G_m}(B^*, A^*)$, $m = 2, \dots, 9$, based on the F-optimal interval groupings $G_m$ of Table 1. We see that grouping has only a slight influence on the Wald approximation of the OC-function. That means the OC-function of Wald's SPRT remains almost unaltered if we switch from non-grouped to grouped observations.

Figure 1. Wald approximations $Q^*_{G_m}(\theta)$ of the OC-functions of the tests $S_{G_m}(B^*, A^*)$ for $m = 2, \dots, 9$ and $Q^*(\theta)$ of the non-grouped test $S(B^*, A^*)$. [Plot not reproduced; horizontal axis $\theta$ from 0.7 to 1.3, vertical axis from 0.0 to 1.0.]

Figure 2 presents the corresponding Wald approximations of the ASN-functions of the tests $S_{G_m}(B^*, A^*)$, $m = 2, \dots, 9$, as well as the Wald approximation $E_\theta^* N$ of the ASN-function for the non-grouped test $S(B^*, A^*)$ (bold line). Here we can see how grouped observations increase the Wald approximations of the ASN-functions of our tests, depending on the F-efficiency of a grouping or the number of groups, respectively.

Figure 2. Wald approximations $E_\theta^* N_{G_m}$ of the ASN-functions of the tests $S_{G_m}(B^*, A^*)$, $m = 2, \dots, 9$, and $E_\theta^* N$ of the non-grouped test $S(B^*, A^*)$ (bold). [Plot not reproduced; horizontal axis $\theta$ from 0.7 to 1.3, vertical axis from 0 to 150.]

5 Conclusions

The method of conjugated parameter pairs is an effective method for obtaining Wald-like approximations for the OC- and ASN-function of sequential likelihood ratio tests based on grouped observations. Because of their high efficiencies, F-optimal groupings are recommended.
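The computations behind the example of Section 4 can be retraced numerically. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the last group is taken to be the open interval $[x^*_{m-1}, \infty)$, and the non-zero root of $\varphi_{\theta}(h) = 1$ is located by plain bisection (a choice made here for self-containedness). It evaluates the group probabilities of the exponential distribution, the F-efficiency of a grouping, and the modified Wald approximations of the OC- and ASN-function.

```python
import math


def group_probs(theta, bounds):
    """p_theta^G(k) for density theta*exp(-theta*x) and interval groups
    [0, x1), [x1, x2), ..., [x_{m-1}, inf)."""
    edges = [0.0] + list(bounds) + [math.inf]
    surv = [math.exp(-theta * x) for x in edges]      # P(X >= x), 0.0 at inf
    return [surv[k] - surv[k + 1] for k in range(len(edges) - 1)]


def f_efficiency(theta, bounds):
    """F_eff(G, theta) = I_F^G(theta) / I_F(theta) with I_F(theta) = 1/theta^2."""
    edges = [0.0] + list(bounds) + [math.inf]
    # d/dtheta exp(-theta*x) = -x*exp(-theta*x); the term vanishes at x = inf.
    dsurv = [0.0 if math.isinf(x) else -x * math.exp(-theta * x) for x in edges]
    p = group_probs(theta, bounds)
    info = sum((dsurv[k] - dsurv[k + 1]) ** 2 / p[k] for k in range(len(p)))
    return info * theta ** 2


def wald_oc_asn(theta, theta0, theta1, bounds, alpha, beta):
    """Return (h, Q*, ASN*) at theta for the grouped SPRT with bounds (5)."""
    B, A = beta / (1.0 - alpha), (1.0 - beta) / alpha
    p = group_probs(theta, bounds)
    p0, p1 = group_probs(theta0, bounds), group_probs(theta1, bounds)
    z = [math.log(b / a) for a, b in zip(p0, p1)]     # values of Z_1^G per group
    ez = sum(pk * zk for pk, zk in zip(p, z))
    if abs(ez) < 1e-9:                                # case E Z = 0
        ez2 = sum(pk * zk * zk for pk, zk in zip(p, z))
        q = math.log(A) / (math.log(A) - math.log(B))
        return 0.0, q, -math.log(B) * math.log(A) / ez2

    def phi_minus_1(h):
        return sum(pk * math.exp(h * zk) for pk, zk in zip(p, z)) - 1.0

    # The non-zero root is positive if E Z < 0 and negative if E Z > 0.
    lo, hi = 0.0, (1.0 if ez < 0 else -1.0)
    while phi_minus_1(hi) <= 0.0:                     # expand the bracket
        lo, hi = hi, 2.0 * hi
    for _ in range(200):                              # bisection
        mid = 0.5 * (lo + hi)
        if phi_minus_1(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    h = 0.5 * (lo + hi)
    q = (A ** h - 1.0) / (A ** h - B ** h)
    asn = (math.log(B) * q + math.log(A) * (1.0 - q)) / ez
    return h, q, asn


if __name__ == "__main__":
    bounds_m2 = [1.5936]                              # Table 1, m = 2
    print("F-efficiency, m = 2:", f_efficiency(1.0, bounds_m2))
    for theta in (0.85, 1.0, 1.166687):
        print(theta, wald_oc_asn(theta, 0.85, 1.166687, bounds_m2, 0.05, 0.05))
```

For the two-group bound 1.5936 of Table 1 the sketch gives an F-efficiency of about 0.6476, in line with the tabulated 64.76 %. At $\theta = \theta_0$ and $\theta = \theta_1$ the root is $h = 1$ and $h = -1$ for any grouping, so the OC values there are $1 - \alpha$ and $\beta$.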
References

[1] Denisov, V.I., Eger, K.-H., Lemesko, B.Yu., Tsoi, E.B. (2004). Design of experiments and statistical analysis for grouped observations. NSTU Publishing House, Novosibirsk.
[2] Eger, K.-H. (2003). Likelihood ratio tests for grouped observations. Preprint 2003-10, Faculty of Mathematics, Chemnitz University of Technology.
[3] Eger, K.-H. (1985). Sequential tests. Teubner, Leipzig.
[4] Ghosh, B.K., Sen, P.K. (eds.) (1991). Handbook of Sequential Analysis. Marcel Dekker, Inc., New York.
[5] Wald, A. (1947). Sequential Analysis. Wiley, New York.