Theoretical Mathematics & Applications, vol.4, no.1, 2014, 55-69 ISSN: 1792- 9687 (print), 1792-9709 (online) Scienpress Ltd, 2014 Sequential Estimation of the Mean of a Class of Skewed Distributions Mohamed Tahir 1 Abstract In this paper, we propose a sequential procedure (t , µˆ t ) for estimating the mean, µ, of a class of skewed probability density functions, subject to the loss function La = a 2 ( µˆ t − µ ) 2 + t , where a is a given positive number, t is a stopping time of the type proposed by Robbins (1959) and µ̂ t is a bias-corrected estimator of µ. We provide a second-order asymptotic expansion, as a → ∞, for the regret with respect to the loss La. For the Pareto and Skew-uniform distributions, the proposed sequential procedure (t , µˆ t ) performs better than the procedure (t , X t ), in the sense that it has a lower asymptotic regret. Moreover, the regret is negative for large values of a under the Gamma, Pareto, Rayleigh and Skew-uniform distributions. Using the loss considered by Chow and Yu (1981) and Martinsek (1988), we propose a bias-corrected estimator of µ and provide a second-order asymptotic expansion, as a → ∞, for the incurred regret. 1 Department of Mathematics, University of Sharjah, United Arab Emirates. E-mail: mtahir@sharjah.ac.ae Article Info: Received: January 3, 2014. Revised: March 1, 2014. Published online : March 31, 2014. 56 Sequential estimation of the mean Mathematics Subject Classification: 62F10; 62L10; 62L12 Keywords: Asymptotic expansion, bias-corrected estimator, excess over the stopping boundary, loss function, regret, skewness, stopping time. 1 Introduction Let X1, X2, … be independent random variables with common probability density function f θ (x ) , where the value of θ is unknown, but lies in some interval Ω ⊂ (-∞, ∞). Suppose that X1, X2, … are to be observed sequentially up to stage n at a cost of one unit per observation and that when observation is terminated, the population mean µ= ∞ ∫ xfθ ( x)dx −∞ is estimated by an appropriate estimator, µ̂ n , and the loss incurred is of the form La ( µˆ n ,θ ) = a 2 ( µˆ n − µ ) 2 + n, (1) where a is a known positive number, determined by the cost of estimation relative to the cost of a single observation. Robbins (1959) proposed the sequential procedure (t , X t ), which stops the sampling process after observing X1, …, Xt and estimates µ by µˆ t = X t , where t = inf n ≥ ma : n > a n ∑ (X i =1 i − X n )2 n (2) with ma being a positive integer. Let C denote the class of skewed probability density functions, fθ (x), θ ∈ Ω, for which the skewness is independent of θ. This class contains, among Mohamed Tahir 57 others, the density functions of the following distributions: 1- GAMMA(α, θ): the Gamma distribution with known shape parameter α and scale parameter β f θ ( x) = = θ. Its density function is 1 x α −1 e x / θ , x > 0 and its skewness is γ = 2 . α θ Γ(α ) α 2- PARETO(α, θ): the Pareto distribution with known shape parameter α > 0 and scale parameter β = θ. skewness is γ = Its density function is f θ ( x) = 2(1 + α ) α − 2 α −3 α αθ α , x ≥ θ and its x α +1 for α > 3. 3- RAYLEIGH(θ): the Rayleigh distribution with shape parameter α =θ. density function is f θ ( x) = x θ 2 − e x2 2θ 2 , x > 0 and its skewness is γ = Its 2 π (π − 3) . (4 − π ) 3 / 2 4- SKEWUNIFORM(λ,θ): the Skew-uniform distribution with known λ and unknown θ. Its density function is f θ ( x) = for − θ < x < θ and its skewness is γ = 1 θ2 [max{min{λx, θ },−θ } + θ ], 2λ (5λ2 − 9) for − 3 < λ < 3. 5(3 − λ2 ) 3 / 2 In this paper, we propose a bias-corrected estimator µ̂t of µ. and provide a second-order asymptotic expansion, as a → ∞, for the regret ra (t , µˆ t ) with respect to the loss defined by (1). It is seen that the asymptotic regret is negative for the Gamma, Pareto, Rayleigh and Skew-uniform distributions. We also provide second-order asymptotic expansion, as a → ∞, for the regret with respect to the more general loss function considered by Chow and Yu (1981) and Martinsek (1988). 58 Sequential estimation of the mean In the Normal case, Starr and Woodroofe (1969) showed that ra (t , X t ) = O (1) as a → ∞. Woodroofe (1977) showed that ra (t , X t ) = 0.5 + o(1) as a → ∞ if ma ≥ 4. For the Gamma and Poisson cases, Starr and Woodroofe (1972) and Vardi (1979) obtained bounded regret using stopping times different from the one in (2). For the distribution-free case, Ghosh and Mukhopadhyay (1979) and Chow and Yu (1981) established asymptotic risk efficiency based on (2) under certain conditions. Tahir (1989) proposed a class of bias-reduction estimators of the mean of the one-parameter exponential family and provided a second order approximation for the regret. 2 Preliminary Notes Let t be as in (2). Martinsek (1988) indicated that E[ X t ] = µ − γ 1 + o 2a a (3) as a → ∞, provided that E[|X1|8+p] < ∞ for some p > 0, where γ denotes the population skewness; that is, γ = σ −3 E[( X 1 − µ ) 3 ] , where σ is the population standard deviation. Thus, X t is an asymptotically biased estimator of µ if f θ (x) ∈ C.. Consider the bias-corrected estimator µˆ n = X n + for n ≥ 1. γ (4) 2a Then, E[ µˆ t ] = µ + o(1) as a → ∞, by (3). In order to define the regret incurred by the sequential procedure (t , µˆ t ) under the loss (1), we first assume that X1, …, Xn have been observed sequentially up to a predetermined stage n from a population with density function f θ (x) ∈ C. The risk incurred by estimating µ by (4), subject to the loss (1), is Mohamed Tahir 59 Ra (n, θ ) = E[ La (n, µˆ n )] = E[a 2 ( X n − µ ) 2 ] + a 2γ γ2 )] [( E X µ − + +n n 4 a2 a 2σ 2 γ 2 = + + n, 4 n This risk is minimized with respect to n by choosing n as the greatest integer less than or equal to na = aσ. The minimum risk is Ra* (θ ) = Ra (na , θ ) = 2aσ + for a > 0. γ2 4 Since σ is unknown, there is no fixed-sample-size procedure that attains the minimum risk in practice. Therefore, we propose to use the sequential procedure (t , µˆ t ), where t be as in (2). The performance of this procedure is measured by its regret, which is defined below. Definition 2.1 The regret of the procedure (t , µˆ t ) under the loss (2) is defined as ra (t , µˆ t ) = E[ La (t , µˆ t )] − Ra* (θ ) = E[a 2 ( µˆ t − µ ) 2 + t ] − 2aσ − γ2 4 (5) for a > 0. The stopping time t in (2) can be rewritten as −1 / 2 V t = inf n ≥ ma : n n > a , n n Vn = ∑ ( X i − X n ) 2 where (6) i =1 for n ≥ 1. Let U a = t (Vt / t ) −1 / 2 − a denote the excess over the stopping boundary. Chang and Hsiung (1979) showed that Ua converges in distribution to a random variable U as a → ∞. 60 Sequential estimation of the mean Lemma 2.2. Let t be as in (2). Then, t →σ a w.p.1 as a → ∞. Moreover, If E[|X1|8+p] < ∞ for some p > 0, then 3 E[t ] = a + ν − 0.5 − σ 4 (κ − 1) + o(1) 8 as a → ∞, where ν =E[U] is the asymptotic mean of the excess over the boundary and κ = σ −4 E [( X 1 − µ ) 4 ] is the population kurtosis. Proof: The first assertion follows from Lemma 1 of Chow and Robbins (1965). The second assertion is adopted from Chang and Hsiung (1979). 3 Main Results 3.1 Asymptotic regret under the loss (1) Let X1, X2, … be as in Section 1. The following theorem provides a second-order asymptotic expansion for the regret in (5). Theorem 3.1. Let t be defined by (2) with ma being such that δ√a ≤ ma = o(a) as a → ∞ for some δ > 0. For any probability density function f θ (x) ∈C with respect to which E[|X1|8+p] < ∞ for some p > 0, ra (t , µˆ t ) = 2.75 − 0.75κ + 2γ 2 − γ 2 + o(1) as a → ∞. Proof: Substituting (4) in (5) yields ra (t , µˆ t ) = E[a 2 ( X t − µ ) 2 + t − 2aσ ] + aγE[( X t − µ )] = ra (t , X t ) + aγE[( X t − µ )] for a > 0. (7) Moreover, 2 aE [( X t − µ )] = −γ / 2 + o(1) and ra (t , X t ) = 2.75 − 0.75κ + 2γ + o(1) (8) Mohamed Tahir 61 as a → ∞, by (3) and Martinsek (1983). Take the limit as a → ∞ in (7) and use (8) to complete the proof. The distributions considered in Tables 1-5 in Section 4 below are positively skewed, except for the Skew-uniform distribution with − 3 < λ < − 3 and 5 Skew-Laplace distribution with λ = 0.5. For Table 1, the minimum value of ρ* is 75/28 ≈ 2.68, which is attained when α = 49. The tables show that 1- the sequential procedure (t , µˆ t ) is a clear improvement over the procedure (t , X t ) since its asymptotic regret is lower, except for the Skew-uniform distribution with λ = -1.4. 2- the asymptotic regret of the procedure (t , µˆ t ) under the PARETO(5, θ) and SKEWUNIFORM(λ, θ) distributions is negative; which means that, for large values of a that the procedure (t , µˆ t ) performs better for these distributions than the best fixed-sample-size procedure. 3.2 Asymptotic regret under a more general loss function Let X1, X2, … be as in Section 1 and suppose that the loss function for estimating µ is of the form considered by Chow and Yu (1981) and Martinsek (1988); that is, La ( µ n* , θ ) = a 2σ 2 β − 2 ( µ n* − µ ) 2 + n for a > 0, where β is a given positive number and µ n* is an estimator of µ. (9) If * θ is estimated by µ n = X n , Martinsek (1988) proposed to use the stopping time 62 Sequential estimation of the mean β /2 1 n 2 T = inf n ≥ ma : n > a ∑ ( X i − X n ) n i =1 (10) and showed that the regret of the procedure (T , X T ) under the loss (9) is ra* (T , X T ) = E[a 2σ 2 β − 2 ( X T − µ ) 2 + T ] − 2aσ β = 3β − β2 β2 + − β κ + ( β 2 + β )γ 2 + o(1) 4 4 as a → ∞, provided that E[|X1|8+p] < ∞ for some p > 0. (11) Straightforward calculations yield that, for large values of a, 1) ra* (T , X T ) is negative under the Gamma distribution with α = 0.5 if 0 < β < 0.1. 2) ra* (T , X T ) is negative under the Pareto distribution with α = 5 if 0 < β < 1.24. Martinsek (1988) also indicated that E[ X T ] = µ − as a → ∞. βγ 1 + o β −1 2aσ a (12) Thus, if the distribution of X1 is not symmetric, then X T is biased for large values of a. Proposition 3.2: Suppose that γ does not depend on θ and let µ n* = X n + for n ≥ 1, where β > 1. βγ 2a 1/ β n1−1 / β Let T be defined by (10) with ma being such that δ√a ≤ ma = o(a) as a → ∞ for some δ > 0. For any probability density function f θ (x) ∈C with respect to which E[|X1|8+p] < ∞ for some p > 0, E[ µT* ] = µ +o(1) as a → ∞. Proof: For a > 0, T − (1−1 / β ) E . 2 a βγ (13) Next, E[(T / a ) − (1−1 / β ) ] → σ 1− β as a → ∞ if β > 1, by the fact that T/a → σβ aE [ µT* − µ ] = aE [ X T − µ ] + Mohamed Tahir 63 w.p.1 as a → ∞ and (2.2) of Martinsek (1983). Taking the limit as a → ∞ in (13), using this fact and (12) yields the desired result. Let ra* (T , µT* ) denote the regret of the biased-corrected procedure (T , µT* ) under the loss (9). Then, = ra* (T , µT* ) = 1 E[a 2σ 2 β − 2 ( X T − µ ) 2 + T − 2aσ β ] + βγσ 2 β − 2 a 2−1/ β E 1−1/ β ( X T − µ ) T γ 2σ 2 β − 2 a 2− 2/ β (14) + E 2− 2/ β 4 T a1−1/ β γ 2σ 2 β − 2 a 2− 2/ β ra* (T , X T ) + βγσ 2 β − 2 E 1−1/ β a ( X T − µ ) + E 2− 2/ β 4 T T Lemma 3.3: Let T be as in (3.2) with β > 1. If E[|X1|8+p] < ∞ for some p > 0, then a1−1 / β 2( β − 1) βγ o(1) E 1−1 / β a (X T − µ ) = 2 β +1 2σ 2( β -1) σ T as a → ∞. Proof: First, observe that a 1−1 / β a 1−1 / β 1 1 E 1−1 / β a (X T − µ ) = E 1−1 / β − β −1 a (X T − µ ) + β −1 aE[ X T − µ ] σ T T σ for a > 0. Moreover, aE[ X T − µ ] = − as a → ∞, (15) by (12) . Next, βγ 2σ β −1 + o(1) (16) expand g(y) = 1/y 1-1/β at y = σβ, substitute y = a/T and multiply by a (X T − µ ) to obtain a1−1 / β 1 1 T 1−1 / β − β −1 a( X T − µ ) = − 1T*1 / β − 2 − σ β a( X T − µ ), β a T σ (17) where T* is a random variable such that | T* - σβ| ≤ |T/a - σβ|. Next, rewrite T in as T = inf{ n ≥ ma: n(Vn/n)-β/2 > a}, where Vn is as in (6), and let 64 Sequential estimation of the mean V U a* T T = T − β /2 −a denote the excess over the stopping boundary. Expanding h(y) = y-β/2 at y = σ2, substituting y = VT/T and multiplying by T yields V T T T − β /2 T = β − σ β β ( β + 2) (VT − T σ 2 ) 2 2 ( V T σ ) − + T 2σ β + 2 8λTβ /2+ 2 T for a > 0, where λT is a random variable between VT/T and σ2. Furthermore, T write V T = ∑ ( X i − µ ) 2 − T ( X T − µ ) 2 to obtain i =1 U a*= T σβ −a− β β β ( β + 2) (VT − T σ 2 ) 2 2 2 ( W − T σ ) + T ( X − µ ) + T T 2σ β + 2 2σ β + 2 8λTβ /2+ 2 T T for a > 0, where WT = ∑ ( X i − µ ) 2 . It follows from easily that i =1 T σβ * β β − σ= (U a − ξT ) + (WT − T σ 2 ) a a 2aσ 2 (18) for a > 0, where ξt = β ( β + 2) (VT − Tσ 2 ) 2 β 2 T ( X − µ ) + . T T 2σ β + 2 8λTβ / 2 + 2 Substituting (18) in (17) yields a1−1/ β 1 β 1/ β − 2 1 (U a − ξT )( X T − µ ) 1−1/ β − β −1 a ( X T − µ ) = − 1 σ T* σ β T 1 β + − 1 2 T*1/ β − 2 (WT − T σ 2 )( X T − µ ) β 2σ 1 1− β I 2 (a ), = − 1 σ β I1 (a ) + 2σ 2 β say. Let S n = X 1 + + X n , n ≥ 1. Then, (19) Mohamed Tahir 65 T*1/ β − 2 ( ) |] (U a − ξT )( ST −= E[| I= a E µT ) 1 T σβ ≤ a σβ aσ β ( S − µT ) a E (U a − ξT ) T*1/ β − 2 T T aσ β 2 2 2/ β − 4 a ST − µT E[(U a − ξT ) ] E T* T aσ β 2 2 2 1 2 2 2/ β − 4 a ST − µT β β 2σ E[U a ] + 2σ E[ξT ] E T* ≤ a T aσ β →0 β β (20) β as a → ∞, by Hölder’s inequality, the fact that T* → σ (| T* - σ | ≤ |T/a - σ |→ 0 w.p.1 since T/a → σβ, as in the first assertion of Lemma 1), ST − µT aσ β converges in distribution to a Standard Normal random variable by Anscombe’s theorem, the facts that E[U a2 ] → E[U 2 ] < ∞ and E[ξ T2 ] = O(1) as a → ∞ and (2.3), (2.8) and (2.9) of Martinsek (1983). I 2 (a) To evaluate E[I2(a)], observe that 2 ST − µT 2aσ β 1/ β − 2 (WT − T σ 2 )( ST − µT ) β a 1/ β − 2 WT − σ T 2 T* T σ = + * T aσ β T aσ β aσ β 2 2 2 S − µT a 1/ β − 2 WT − σ 2T a T* − 2σ − 2σ β T*1/ β − 2 T β T T aσ β aσ in distribution → 2σ 1− 2 β (2 Z ) 2 − 2σ 1− 2 β Z 2 − 2σ 1− 2 β Z 2 = 4σ 1− 2 β Z 2 β β as a → ∞, by Anscombe’s theorem and the fact that T* → σ w.p.1 as where Z is a random variable having the Standard Normal distribution. E[ I= 4σ 1− 2 β + o(1) 2 ( a )] as a → ∞, by (21) and (2.3) and (2.4) of Martinsek (1983). ( 21) a → ∞, Thus, (22) Taking expectation in (19) and using (20) and (22) yields a1−1/ β 2(1 − β ) 1 E 1−1/ β − β −1 a ( X T − µ= + o(1) ) σ σ 2 β +1 T as a → ∞. (23) The lemma follows by taking the limit, as a → ∞, in (15) and using (23) and (16). 66 Sequential estimation of the mean Theorem 3.4: Let T be defined by (3.2) with ma being such that δ√a ≤ ma = o(a) as a → ∞ for some δ > 0 and β > 1. f θ (x) ∈ C Then, for any probability density function with respect to which E[|X1|8+p] < ∞ for some p > 0, ra* (T , µT* ) = 3β − β2 β2 2 β ( β − 1)γ β 2γ 2 γ 2 − + + o(1) + − β κ + ( β 2 + β )γ 2 + σ3 4 4 2 4 as a → ∞. Proof: The theorem follows by taking the limit, as a → ∞, in (14) and using (11), Lemma 3.3 and the fact that a 2−2 / β E 2−2 / β T as a → ∞ if β 1 = 2 β − 2 + o(1) σ > 1, by the fact that T/a → σβ w.p.1 as a → ∞ (see the first assertion of Lemma 2.2) and (2.2) of Martinsek (1983) . 4 Tables The tables below show the values of ρ and ρ* for certain skewed distributions, where ρ * = ρ − γ is the asymptotic regret incurred by the procedure 2 (t , µˆ t ) and ρ = 2.75 − 0.75κ + 2γ 2 represents the asymptotic regret incurred by the procedure (t , X t ) . Table 1: GAMMA(α, θ) with known α γ κ ρ 2 6 2.75 + α α ρ* 3.5 α 2.75 + 3.5 α − 1 α Mohamed Tahir 67 Table 2: PARETO(5, θ) γ κ ρ ρ* 2(1 + α ) α − 2 = 4.6476 α −3 α 6(α 3 + α 2 − 6α − 2) + 3 = 73.8 α (α − 3)(α − 4) -9.4 -11.7238 Table 3: RAYLEIGH(θ) γ 2 π (π − 3) = 0.6311 (4 − π ) 3 / 2 κ 3− 6π 2 − 24π + 16 = 3.2451 (4 − π ) 2 ρ ρ* 1.11245 0.7969 Table 4: SKEW-UNIFORM(λ, θ) with λ = -1.4 and λ = 1.35 γ = 2λ (5λ2 − 9) 5(3 − λ2 ) 3 / 2 κ= 2λ (5λ2 − 9) 5(3 − λ2 ) 3 / 2 ρ ρ* γ<0 κ >0 -29.9109 -29.6997 3 3 if λ ∈ − 3,− ∪ 0, 5 5 if − 3 < λ < 3 (λ = -1.4) (λ = -1.4) 3 <λ<0 γ > 0 if − 5 -0.7671 (λ = 1.35) -0.7909 (λ = 1.35) 68 Sequential estimation of the mean 5 Conclusion We have proposed a bias-corrected estimator of the mean of a class of skewed probability density functions and provided a second-order asymptotic expansion for the regret under the squared error loss. The results indicate that the proposed procedure performs better than the best fixed-sample-size procedure when the observations are taken from the Gamma, Pareto, Rayleigh or Skew-uniform distribution. For a more general loss function, we have proposed bias-corrected estimator of the mean and provided a second-order asymptotic expansion for the incurred regret. References [1] F. Anscombe, Large sample theory of sequential estimation, Proceedings Cambridge Philos. Soc., 48, (1952), 600 – 607, 1952. [2] C. Chang, A.K. Gupta and W.J. Huang, Some skew-symmetric models, Random Operators and Stochastic Equations, 10, (2002), 133 – 140. [3] I.S. Chang and C.A. Hsiung, Approximations to the expected sample size of certain sequential procedures, Proceedings of the Conference on Recent Developments in Statistical Methods and Applications, Taipei, December 1979, Inst. Of Math. Acad. Sinica, (1979), 71 – 82. [4] Y.S. Chow and K.F.Yu, The performance of a sequential procedure for the estimation of the mean, Annals of Statistics, 9, (1981), 189 – 198. [5] M. Ghosh and N. Mukhopadhyay, Sequential point estimation of the mean when the distribution is unspecified, Commununications in Statistics -Theory & Methods, A8, (1979), 637 – 652. [6] A.T. Martinsek, Second order approximation to the risk of a sequential procedure, Annals of Statistics, 11, (1983), 827 – 836. [7] A.T. Martinsek. Negative regret, optional stopping and the elimination of Mohamed Tahir 69 outliers. J. A. S. A., 83, (1988), 160 – 163. [8] H. Robbins, Sequential estimation of the mean of a normal population, Probability and Statistics, The Harold Cramer Volume, (1959), 235 – 245. [9] N. Starr and M. Woodroofe, Remarks on sequential point estimation. Proceedings Nat. Acad. Sci., 63, (1969), 285 – 288. [10] M. Tahir, An asymptotic lower bound for the local minimax regret in sequential point estimation, Annals of Statistics, 17, (1989), 1335 – 1346. [11] Y. Vardi, Asymptotic optimal sequential estimation: the Poisson case. Annals of Statistics, 7, (1979), 1040 – 1051. [12] M. Woodroofe, Second order approximations for sequential point and interval estimation, Annals of Statistics, 5, (1977), 984 – 995. [13] M. Woodroofe, Non Linear Renewal Theory in Sequential Analysis, Society for Industrial and Applied Mathematics, Philadelphia, 1982.