STATISTICS: MODULE 12122

CHAPTER 5: DESIRABLE PROPERTIES OF POINT ESTIMATORS

As we said in Chapter 4, in estimation we are concerned with obtaining the best possible estimates of unknown population parameters such as $\mu$, $\tilde{\mu}$, $\sigma$, $\sigma^2$, $p$, or functions of these parameters. To help us decide which estimator is best in any given situation, we need to know the properties of these estimators and choose those which have 'good' properties such as unbiasedness, efficiency and consistency. We therefore need to know what these terms mean, and we also need to know about the sampling distributions of certain estimators.

5.1 Point estimators

A point estimator is a statistic obtained from the sample which is used to estimate an unknown parameter or a function of the parameter. It is therefore a random variable, since it varies from sample to sample, and so it has a sampling (probability) distribution.

Example 5.1
Suppose $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a population which has mean $\mu$ and variance $\sigma^2$.

Estimating $\mu$
If we wish to estimate the population mean $\mu$, we could use the sample mean
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$
and we know from Chapter 4 that the sampling distribution of $\bar{X}$ is exactly Normal if the population is Normal (see 4.7), or approximately Normal if the population is non-Normal, the approximation to normality improving as the sample size $n$ increases (see 4.12). We will see in this chapter that $\bar{X}$ is an unbiased, efficient and consistent estimator of $\mu$, and so it is a good point estimator of $\mu$.

Estimating $\sigma^2$
If we wish to estimate the population variance $\sigma^2$, we can use
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \qquad\text{or}\qquad S^{*2} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
We will see in this chapter that $S^2$ is an unbiased, consistent point estimator of $\sigma^2$, whereas $S^{*2}$ is a biased (though consistent) estimator of $\sigma^2$; on the grounds of unbiasedness, the 'best' estimator of $\sigma^2$ is $S^2$. We will consider the sampling distribution of $S^2$ later on; it involves a distribution called the $\chi^2$ (chi-square) distribution.

5.2 Desirable Properties of Point Estimators

Certain point estimators are better than others because they have good properties. We consider these next.

1. Unbiasedness

Definition
Suppose $\hat{\theta}$ is a point estimator of the parameter $\theta$. Then $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$. The bias of an estimator is given by
$$b(\hat{\theta}) = E(\hat{\theta}) - \theta.$$

Asymptotic Bias
The asymptotic bias is $\lim_{n \to \infty} b(\hat{\theta})$.

Example 5.2
Suppose $X_1, X_2, X_3, \ldots, X_n$ is a random sample from a Normal population which has mean $\mu$ and variance $\sigma^2$. Show that if $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, then $E(S^2) = \sigma^2$ and $\operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}$. Is $S^2$ an unbiased estimator of $\sigma^2$?

Solution
We need to know about the sampling distribution of $S^2$.

Sampling distribution of the sample variance $S^2$
Suppose we take many, many random samples of size $n$ from this Normal population and for each sample we compute the sample variance $S^2$. The sample variance $S^2$ will vary from sample to sample. What can we say about its sampling distribution? In theory,
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),$$
i.e. the sampling distribution of $S^2$ is some form of chi-square distribution with parameter (or degrees of freedom) equal to $n-1$. Since a $\chi^2(n-1)$ random variable has mean $n-1$ and variance $2(n-1)$, writing $S^2 = \frac{\sigma^2}{n-1}\cdot\frac{(n-1)S^2}{\sigma^2}$ gives
$$E(S^2) = \frac{\sigma^2}{n-1}(n-1) = \sigma^2 \qquad\text{and}\qquad \operatorname{Var}(S^2) = \frac{\sigma^4}{(n-1)^2}\cdot 2(n-1) = \frac{2\sigma^4}{n-1}.$$
Hence $E(S^2) = \sigma^2$, so $S^2$ is an unbiased estimator of $\sigma^2$.
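The chi-square result above is easy to check empirically. The following is a minimal simulation sketch (assuming NumPy is available; the population parameters, sample size and number of replications are arbitrary choices for illustration): it draws repeated Normal samples, computes $S^2$ for each, and compares the empirical mean and variance of $S^2$ with the theoretical values $\sigma^2$ and $2\sigma^4/(n-1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0      # population parameters (arbitrary)
n, reps = 10, 100_000     # sample size and number of replicated samples

# Draw reps samples of size n; ddof=1 gives the (n-1)-divisor estimator S^2
samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)

print("empirical mean of S^2:", s2.mean())             # close to sigma^2 = 4
print("theoretical E(S^2)   :", sigma**2)
print("empirical Var of S^2 :", s2.var())              # close to 2*sigma^4/(n-1)
print("theoretical Var(S^2) :", 2 * sigma**4 / (n - 1))
```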
Example 5.3
In a Binomial experiment (e.g. suppose you are looking at whether there is discrimination against women in the police force with regard to promotion, or ascertaining how many consumers prefer Goldtaste to other brands of coffee in a consumer survey), suppose $Y$ is the number of successes in a random sample of size $n$ (e.g. $Y$ = the number of women police officers who get promoted, or the number of consumers who prefer Goldtaste). Assume $Y \sim \text{Bin}(n, p)$, where $p$ is the probability of success (e.g. the probability of promotion, or the probability of a consumer preferring Goldtaste). The following two statistics are proposed as estimators of $p$:
$$\hat{p}_1 = \frac{Y}{n} \qquad\text{and}\qquad \hat{p}_2 = \frac{Y+1}{n+2}.$$
(a) Show that $\hat{p}_1$ is unbiased but $\hat{p}_2$ is not. What can you say about the asymptotic bias of the two estimators?
(b) Suppose a random sample of 240 policewomen is taken and the number of them who have been promoted is 36. Suppose also that a random sample of 960 policemen is taken and the number of them who have been promoted is 288. Obtain point estimates of the probabilities that policemen and policewomen get promoted using both point estimators defined in (a). Which estimator is likely to give the best estimates?

2. Efficiency

The mean squared error of a point estimator $\hat{\theta}$ is defined as
$$\text{M.S.E.}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big].$$
Estimator $\hat{\theta}_1$ is said to be more efficient than estimator $\hat{\theta}_2$ if
$$\text{M.S.E.}(\hat{\theta}_1) < \text{M.S.E.}(\hat{\theta}_2).$$
We can express the mean squared error in terms of the variance and the bias of $\hat{\theta}$ as follows:
$$\text{M.S.E.}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = E\big[\hat{\theta}^2 - 2\theta\hat{\theta} + \theta^2\big] = E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2 = \operatorname{Var}(\hat{\theta}) + \big[E(\hat{\theta}) - \theta\big]^2 = \operatorname{Var}(\hat{\theta}) + \big[b(\hat{\theta})\big]^2.$$
If an estimator is unbiased, then $\text{M.S.E.}(\hat{\theta}) = \operatorname{Var}(\hat{\theta})$, so an unbiased estimator $\hat{\theta}_1$ is said to be more efficient than another unbiased estimator $\hat{\theta}_2$ if $\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$.

Example 5.4
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a population with probability density function
$$f(x) = \lambda x^{\lambda - 1}, \quad 0 < x < 1,\ \lambda > 0; \qquad f(x) = 0 \text{ elsewhere}.$$
It is proposed that $\bar{X}$ be used as an estimator of $\frac{\lambda}{\lambda + 1}$. Obtain an expression for $\text{M.S.E.}(\bar{X})$.

3. Consistency

Definition
Let $\hat{\theta}$ be an estimator of parameter $\theta$ based on a random sample of size $n$. Then $\hat{\theta}$ is a consistent estimator of $\theta$ if
$$\lim_{n \to \infty} P\big(|\hat{\theta} - \theta| \le \varepsilon\big) = 1 \quad\text{for all } \varepsilon > 0,$$
or equivalently $\operatorname{plim} \hat{\theta} = \theta$ (here $\varepsilon$ may be very small, e.g. $10^{-20}$); i.e. the (sampling) probability distribution of $\hat{\theta}$ gets more and more concentrated around $\theta$ as the sample size $n$ increases towards infinity. We say that $\hat{\theta}$ converges in probability to $\theta$ as $n \to \infty$.

As you can see, this definition is not an easy one to check directly, but fortunately there is a sufficient condition for consistency (a sufficient condition is one which guarantees the truth of something, see Chapter 1, QM1), and this can be used to prove consistency in many cases.

Sufficient condition for consistency
A sufficient but not necessary condition for estimator $\hat{\theta}$ to be a consistent estimator of $\theta$ is that
$$\text{M.S.E.}(\hat{\theta}) \to 0 \text{ as } n \to \infty.$$
N.B. It is important to realise that if $\text{M.S.E.}(\hat{\theta})$ does not tend to 0 as $n \to \infty$, it does not follow that $\hat{\theta}$ is inconsistent. There are estimators, which you will meet in Econometrics, whose M.S.E.s do not tend to 0 as $n \to \infty$ but which are nevertheless consistent.

Example 5.5
Show that the estimator $\bar{X}$ in Example 5.4 is a consistent estimator of $\frac{\lambda}{\lambda + 1}$.

Example 5.6
(a) Suppose we have a random sample from a Normal population $N(\mu, \sigma^2)$. For the sample mean $\bar{X}$,
$$E(\bar{X}) = \mu \qquad\text{and}\qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n},$$
while for the sample median $\tilde{X}$,
$$E(\tilde{X}) = \mu \qquad\text{and}\qquad \operatorname{Var}(\tilde{X}) \approx \frac{\pi\sigma^2}{2n} \text{ for large } n.$$
$\bar{X}$ and $\tilde{X}$ are both unbiased estimators of $\mu$, but $\operatorname{Var}(\bar{X}) < \operatorname{Var}(\tilde{X})$, so $\bar{X}$ is a more efficient estimator of $\mu$ than $\tilde{X}$. Hence $\bar{X}$ is the 'best' estimator to use when estimating $\mu$.
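The efficiency comparison in Example 5.6(a) can be made concrete with a short simulation. This is an illustrative sketch (assuming NumPy; the population parameters, sample size and replication count are arbitrary choices): it approximates $\operatorname{Var}(\bar{X})$ and $\operatorname{Var}(\tilde{X})$ by Monte Carlo and compares them with $\sigma^2/n$ and $\pi\sigma^2/(2n)$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0
n, reps = 101, 100_000    # odd n so the median is a single order statistic

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

print("Var(mean)   empirical:", means.var(),   " theory:", sigma**2 / n)
print("Var(median) empirical:", medians.var(), " theory:", np.pi * sigma**2 / (2 * n))
# The mean's variance is the smaller one: the sample mean is the more
# efficient of the two unbiased estimators of mu, as claimed in 5.6(a).
```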
(b) From Example 5.2, $S^2$ is an unbiased estimator of $\sigma^2$, and hence
$$\text{M.S.E.}(S^2) = \operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$
So as $n \to \infty$, $\text{M.S.E.}(S^2) \to 0$, and therefore $S^2$ is a consistent estimator of $\sigma^2$.

Considering $S^{*2} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{(n-1)S^2}{n}$, we have
$$E(S^{*2}) = \frac{(n-1)E(S^2)}{n} = \frac{(n-1)\sigma^2}{n} \ne \sigma^2,$$
hence $S^{*2}$ is a biased estimator of $\sigma^2$. Although $S^{*2}$ is a consistent estimator of $\sigma^2$, it is biased, so on the grounds of unbiasedness $S^2$ is the 'best' estimator to use when estimating $\sigma^2$.

C. Osborne, March 2000

Example 5.3
In a Binomial experiment, suppose $Y$ is the number of successes in a random sample of size $n$. Assume $Y \sim \text{Bin}(n, p)$, where $p$ is the probability of success. The following two statistics are proposed as estimators of $p$:
$$\hat{p}_1 = \frac{Y}{n} \qquad\text{and}\qquad \hat{p}_2 = \frac{Y+1}{n+2}.$$
(a) Show that $\hat{p}_1$ is unbiased but $\hat{p}_2$ is not. What can you say about the asymptotic bias of the two estimators?
(b) Suppose a random sample of 240 policewomen is taken and the number of them who have been promoted is 36. Suppose also that a random sample of 960 policemen is taken and the number of them who have been promoted is 288. Obtain point estimates of the probabilities that policemen and policewomen get promoted using both point estimators defined in (a). Which estimator is likely to give the best estimates?
(c) Show that the two estimators are consistent.

Solution
Note that I have added part (c) to the original question.

(a) Here the parameter being estimated is $p$ and the estimators are $\hat{p}_1$ and $\hat{p}_2$; i.e., using the notation of this chapter, $\theta = p$ and $\hat{\theta} = \hat{p}_1$ or $\hat{p}_2$. We need to look at $E(\hat{p}_1)$ and $E(\hat{p}_2)$.

As $Y \sim \text{Bin}(n, p)$, $E(Y)$ = mean of the Binomial = $np$.
$$E(\hat{p}_1) = E\left(\frac{Y}{n}\right) = \frac{1}{n}E(Y) = \frac{1}{n}\,np = p,$$
so $\hat{p}_1$ is an unbiased estimator of $p$.
$$E(\hat{p}_2) = E\left(\frac{Y+1}{n+2}\right) = \frac{1}{n+2}E(Y+1) = \frac{E(Y)+1}{n+2} = \frac{np+1}{n+2} \ne p,$$
hence $\hat{p}_2$ is a biased estimator of $p$. The bias of $\hat{p}_2$ is
$$b(\hat{p}_2) = E(\hat{p}_2) - p = \frac{np+1}{n+2} - p = \frac{(np+1) - p(n+2)}{n+2} = \frac{1-2p}{n+2}.$$
Asymptotic bias of $\hat{p}_1$: $\lim_{n \to \infty} b(\hat{p}_1) = 0$, since $b(\hat{p}_1) = 0$ ($\hat{p}_1$ is unbiased).
Asymptotic bias of $\hat{p}_2$: $\lim_{n \to \infty} b(\hat{p}_2) = 0$, since $\frac{1-2p}{n+2} \to 0$ as $n \to \infty$.

(b) Let $p_W = P(\text{a policewoman is promoted})$ and $p_M = P(\text{a policeman is promoted})$.
Using $\hat{p}_1 = \frac{Y}{n}$: $\hat{p}_W = \frac{36}{240} = 0.1500$ and $\hat{p}_M = \frac{288}{960} = 0.3000$.
Using $\hat{p}_2 = \frac{Y+1}{n+2}$: $\hat{p}_W = \frac{37}{242} = 0.1529$ and $\hat{p}_M = \frac{289}{962} = 0.3004$.

Which is the best estimator of $p$, $\hat{p}_1$ or $\hat{p}_2$? This is not an easy question to answer. On the basis of unbiasedness, you would theoretically choose $\hat{p}_1$, since it is unbiased whereas $\hat{p}_2$ is biased. However, as the sample size $n \to \infty$, $b(\hat{p}_2) \to 0$, and in this particular example on police promotion you can see that for large $n$ the estimates obtained using $\hat{p}_1$ and $\hat{p}_2$ agree to at least 2 decimal places. So, practically speaking, there is not much to choose between the two estimators when considering unbiasedness.

As $Y \sim \text{Bin}(n, p)$, we know that $\operatorname{Var}(Y) = np(1-p)$. Now
$$\operatorname{Var}(\hat{p}_1) = \operatorname{Var}\left(\frac{Y}{n}\right) = \frac{1}{n^2}\operatorname{Var}(Y) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$
and
$$\operatorname{Var}(\hat{p}_2) = \operatorname{Var}\left(\frac{Y+1}{n+2}\right) = \frac{1}{(n+2)^2}\operatorname{Var}(Y) = \frac{np(1-p)}{(n+2)^2}, \quad\text{since } \operatorname{Var}(Y+1) = \operatorname{Var}(Y).$$
Comparing these two variances, $\operatorname{Var}(\hat{p}_2) < \operatorname{Var}(\hat{p}_1)$, so $\hat{p}_2$ is a less variable estimator than $\hat{p}_1$.
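The arithmetic in part (b) is easy to check in a few lines. Here is a minimal sketch in plain Python (the function names are mine; only the counts given in the question are used):

```python
def p_hat_1(y, n):
    """Unbiased estimator p1 = Y/n."""
    return y / n

def p_hat_2(y, n):
    """Biased but less variable estimator p2 = (Y+1)/(n+2)."""
    return (y + 1) / (n + 2)

# Policewomen: 36 promoted out of 240; policemen: 288 promoted out of 960.
for label, y, n in [("women", 36, 240), ("men", 288, 960)]:
    print(label, round(p_hat_1(y, n), 4), round(p_hat_2(y, n), 4))
# women 0.15 0.1529
# men   0.3  0.3004
```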
(c) In order to show consistency we need to calculate $\text{M.S.E.}(\hat{p}_1)$ and $\text{M.S.E.}(\hat{p}_2)$ and, hopefully, show that both tend to 0 as $n \to \infty$.
$$\text{M.S.E.}(\hat{p}_1) = \operatorname{Var}(\hat{p}_1) + \big[b(\hat{p}_1)\big]^2 = \operatorname{Var}(\hat{p}_1) = \frac{p(1-p)}{n}, \quad\text{since } b(\hat{p}_1) = 0.$$
As $n \to \infty$, $\text{M.S.E.}(\hat{p}_1) \to 0$.
$$\text{M.S.E.}(\hat{p}_2) = \operatorname{Var}(\hat{p}_2) + \big[b(\hat{p}_2)\big]^2 = \frac{np(1-p)}{(n+2)^2} + \left(\frac{1-2p}{n+2}\right)^2, \quad\text{from parts (a) and (b)}.$$
Both terms tend to 0 as $n \to \infty$, so $\text{M.S.E.}(\hat{p}_2) \to 0$.
Hence both $\hat{p}_1$ and $\hat{p}_2$ are consistent estimators of $p$.
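To see this consistency result numerically, the following simulation sketch (assuming NumPy; the true value $p = 0.3$ and the grid of sample sizes are arbitrary choices) approximates the M.S.E. of both estimators for increasing $n$ and prints the theoretical values alongside; all four columns shrink towards 0.

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 0.3, 200_000

print("n, MSE(p1) sim/theory, MSE(p2) sim/theory")
for n in (10, 100, 1000, 10000):
    y = rng.binomial(n, p, size=reps)
    mse1 = np.mean((y / n - p) ** 2)
    mse2 = np.mean(((y + 1) / (n + 2) - p) ** 2)
    theory1 = p * (1 - p) / n
    theory2 = n * p * (1 - p) / (n + 2) ** 2 + ((1 - 2 * p) / (n + 2)) ** 2
    print(n, round(mse1, 6), round(theory1, 6), round(mse2, 6), round(theory2, 6))
# Both M.S.E. columns tend to 0 as n grows, illustrating that p1 and p2
# are consistent estimators of p.
```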