Proof of results given in Equations (7.1.11) and (7.1.12) of the text:

To prove (7.1.11) and (7.1.12), consider the sample sum $T = X_1 + \cdots + X_n$ for all $\binom{N}{n}$ possible samples. We wish to find the mean of all these $T$'s, which is what is meant by $E(T)$. We have

$$E(T) = \frac{T_1 + \cdots + T_{\binom{N}{n}}}{\binom{N}{n}}, \qquad (1)$$

where $T_1, \ldots, T_{\binom{N}{n}}$ are the $\binom{N}{n}$ possible sample sums arranged in some order. Now each of the $X_i$'s, that is, $X_1, \ldots, X_N$, occurs in $\binom{N-1}{n-1}$ of the $T$'s. Hence we may write (1) in terms of $X_1, \ldots, X_N$ as follows:

$$E(T) = \frac{1}{\binom{N}{n}}\binom{N-1}{n-1}(X_1 + \cdots + X_N). \qquad (2)$$

But $\binom{N-1}{n-1} = \binom{N}{n}\dfrac{n}{N}$ and $X_1 + \cdots + X_N = N\mu$; hence

$$E(T) = n\mu, \qquad (3)$$

thus verifying (7.1.11).
Now consider (7.1.12). We can write

$$\operatorname{Var}(T) = \frac{1}{\binom{N}{n}}\left[\left(T_1 - n\mu\right)^2 + \cdots + \left(T_{\binom{N}{n}} - n\mu\right)^2\right]. \qquad (4)$$

Using (1) and (3), and the fact that $\operatorname{Var}(T) = E(T^2) - (E(T))^2$, we find that

$$\operatorname{Var}(T) = \frac{1}{\binom{N}{n}}\left[T_1^2 + \cdots + T_{\binom{N}{n}}^2\right] - n^2\mu^2. \qquad (5)$$
Now for any sample $X_1, \ldots, X_n$, the $T$ for this sample is $(X_1 + \cdots + X_n)$ and $T^2$ is $(X_1^2 + \cdots + X_n^2 + 2X_1X_2 + \cdots + 2X_{n-1}X_n)$. Thus, any $X_i^2$ occurs in $\binom{N-1}{n-1}$ of the $T^2$'s, and any product of two $X$'s, say $X_iX_j$, occurs in $\binom{N-2}{n-2}$ of the $T^2$'s. Hence, we can express $\operatorname{Var}(T)$ in terms of $X_1^2, \ldots, X_N^2, X_1X_2, \ldots, X_{N-1}X_N$ as follows:

$$\operatorname{Var}(T) = \frac{1}{\binom{N}{n}}\left[\binom{N-1}{n-1}(X_1^2 + \cdots + X_N^2) + 2\binom{N-2}{n-2}(X_1X_2 + \cdots + X_{N-1}X_N)\right] - n^2\mu^2, \qquad (6)$$
which can be written as

$$\operatorname{Var}(T) = \frac{1}{\binom{N}{n}}\left[\left(\binom{N-1}{n-1} - \binom{N-2}{n-2}\right)(X_1^2 + \cdots + X_N^2) + \binom{N-2}{n-2}(X_1 + \cdots + X_N)^2\right] - n^2\mu^2. \qquad (7)$$

This equation results from adding and subtracting $\binom{N-2}{n-2}(X_1^2 + \cdots + X_N^2)$ inside the bracket in (6) and noting that

$$2\binom{N-2}{n-2}(X_1X_2 + \cdots + X_{N-1}X_N) + \binom{N-2}{n-2}(X_1^2 + \cdots + X_N^2) = \binom{N-2}{n-2}(X_1 + \cdots + X_N)^2.$$
But since

$$\binom{N-1}{n-1} = \binom{N}{n}\frac{n}{N} \qquad \text{and} \qquad \binom{N-2}{n-2} = \binom{N}{n}\frac{n(n-1)}{N(N-1)},$$

and $X_1 + \cdots + X_N = N\mu$, we find that (7) reduces to

$$\operatorname{Var}(T) = \frac{n(N-n)}{N(N-1)}\left[(X_1^2 + \cdots + X_N^2) - N\mu^2\right].$$

Now using

$$\sigma^2 = \frac{1}{N}\left(X_1^2 + X_2^2 + \cdots + X_N^2 - N\mu^2\right),$$

we obtain

$$\operatorname{Var}(T) = \left(\frac{N-n}{N-1}\right)n\sigma^2,$$

which proves (7.1.12). Note that if $N \to \infty$ then $\operatorname{Var}(T) = n\sigma^2$, and consequently $\operatorname{Var}(\bar{X}) = \sigma^2/n$.
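The identities $E(T) = n\mu$ and $\operatorname{Var}(T) = \frac{N-n}{N-1}\,n\sigma^2$ can be checked exactly by enumerating every one of the $\binom{N}{n}$ samples, just as the proof does. The following Python sketch does so for a small, arbitrary illustrative population (the values are hypothetical):

```python
# Exact check of (7.1.11) and (7.1.12) by enumerating all C(N, n) samples
# drawn without replacement from a small population.
import itertools

population = [2.0, 3.0, 5.0, 7.0, 11.0]   # X_1, ..., X_N (hypothetical values)
N, n = len(population), 3

mu = sum(population) / N                              # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

# All C(N, n) sample sums T_1, ..., T_{C(N,n)}
sums = [sum(s) for s in itertools.combinations(population, n)]
ET = sum(sums) / len(sums)
VarT = sum((t - ET) ** 2 for t in sums) / len(sums)

print(ET, n * mu)                              # E(T) = n * mu
print(VarT, (N - n) / (N - 1) * n * sigma2)    # Var(T) = ((N-n)/(N-1)) n sigma^2
```

Because the enumeration is exhaustive, the two sides agree exactly (up to floating-point rounding), not merely approximately.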
7.2.1 The Central Limit Theorem

Theorem 7.2.1 (Central Limit Theorem) If $X_1, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$ (both finite), then the limiting form of the distribution of $Z_n = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ as $n \to \infty$ is that of the standard normal, that is, normal with mean 0 and variance 1.

Proof: We prove this theorem for the case when the moment generating function of the $X_i$'s exists. We write $Z_n$ as

$$Z_n = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} = \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma\sqrt{n}} = \sum_{i=1}^{n} \frac{U_i}{\sigma\sqrt{n}},$$

where $U_i = X_i - \mu$.
Because the $X_i$'s are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, the $U_i$, $i = 1, 2, \ldots, n$, are independent and identically distributed with mean 0 and variance $\sigma^2$. Furthermore, from Chapter 6, we have that the moment generating function of a linear combination of independent random variables, the $U_i$'s, is the product of their moment generating functions, so that the moment generating function of $Z_n$ is given by

$$M_{Z_n}(t) = \prod_{i=1}^{n} M_{U_i}\!\left(\frac{t}{\sigma\sqrt{n}}\right). \qquad (1)$$

Again, from (4.2.12), we know that for any random variable, say $X$,

$$M_X(t) = 1 + \mu_1 t + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \cdots. \qquad (2)$$

Thus, from (1) and (2), it follows that

$$M_{Z_n}(t) = \prod_{i=1}^{n}\left[1 + \frac{\mu_2 t^2}{2!\,\sigma^2 n} + \frac{\mu_3 t^3}{3!\,\sigma^3 n^{3/2}} + \cdots\right], \qquad (3)$$

where $\mu_r = E(U_i^r) = E((X - \mu)^r)$; note that here $\mu_1 = 0$ and $\mu_2 = \sigma^2$.
Now (3) may be rewritten in the form

$$M_{Z_n}(t) = \prod_{i=1}^{n}\left[1 + \frac{t^2}{2n} + \frac{\epsilon_n}{n}\right] = \left[1 + \frac{t^2}{2n} + \frac{\epsilon_n}{n}\right]^n, \qquad (4)$$

where $\epsilon_n = \epsilon_n(t) = \sum_{j=3}^{\infty}\dfrac{\mu_j t^j}{j!\,\sigma^j n^{j/2-1}}$, so that $\epsilon_n$ approaches zero as $n \to \infty$. Thus, letting $n \to \infty$, we have

$$\lim_{n\to\infty} M_{Z_n}(t) = e^{t^2/2},$$

which is the m.g.f. of the standard normal distribution. Thus, by Theorem 6.3.1, as $n \to \infty$ the limiting distribution of $Z_n$ is the standard normal, as Theorem 7.2.1 states. That is, for $n$ large enough, $M_{Z_n}(t)$ is approximately the m.g.f. of the standard normal, so that, approximately, the distribution of $Z_n$ for $n$ large enough is that of the standard normal. This in turn says that for large $n$, $\sum_{i=1}^{n} X_i$ is approximately normally distributed with mean $n\mu$ and variance $n\sigma^2$, and $\bar{X} = \sum_{i=1}^{n} X_i / n$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$. In fact, algebraically,

$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
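A Monte Carlo sketch makes the theorem concrete: standardized means of decidedly non-normal (uniform) draws behave like a standard normal $Z$. The sample size, replication count, and seed below are arbitrary choices for the illustration:

```python
# CLT illustration: standardize means of n uniform(0,1) draws and compare
# the resulting Z_n values with the standard normal.
import random, math, statistics

random.seed(1)
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and s.d. of uniform(0,1)
n, reps = 50, 20000

zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

print(statistics.mean(zs))                   # close to 0
print(statistics.pstdev(zs))                 # close to 1
print(sum(z <= 1.96 for z in zs) / reps)     # close to Phi(1.96) ~ 0.975
```

Increasing $n$ or the replication count tightens the agreement, as the proof's $\epsilon_n \to 0$ argument predicts.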
Theorem 7.3.6 If $X$ and $Y$ are independent random variables having the normal distribution $N(0,1)$ and the chi-square distribution with $n$ degrees of freedom, respectively, then the random variable

$$T = \frac{X}{\sqrt{Y/n}} \qquad (1)$$

has the following probability density function:

$$f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty. \qquad (2)$$

Proof: To prove this we note that, since $X$ and $Y$ are independent, $(X, Y)$ has the joint probability density function

$$g(x, y) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,\frac{(y/2)^{n/2-1}e^{-y/2}}{2\Gamma(n/2)}. \qquad (3)$$

Now consider the transformation

$$T = \frac{X}{\sqrt{Y/n}}, \qquad U = Y,$$

which has as its inverse transformation

$$X = T\sqrt{\frac{U}{n}}, \qquad Y = U.$$

We then find that the joint probability density function of $(T, U)$ is

$$f(t, u) = g(x, y)\,|J|, \qquad (4)$$

where the absolute value of the Jacobian, $|J|$, is easily seen to be $\sqrt{u/n}$, so that

$$f(t, u) = \frac{1}{2\sqrt{\pi n}\,\Gamma(n/2)}\,(u/2)^{[(n+1)/2]-1}\exp\left[-\frac{1}{2}u\left(1 + \frac{t^2}{n}\right)\right]. \qquad (5)$$

Now integrating (5) with respect to $u$ for a fixed $t$, we obtain (2).
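The density (2) can be sanity-checked numerically: it should integrate to 1, and values simulated directly from the construction $T = X/\sqrt{Y/n}$ should follow it. The degrees of freedom, grid, and seed below are arbitrary illustrative choices:

```python
# Numeric and Monte Carlo check of the t density (2) with n = 5 d.f.
import random, math

n = 5
const = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))

def t_pdf(t):
    # density (2)
    return const * (1 + t * t / n) ** (-(n + 1) / 2)

# The density integrates to 1 (left Riemann sum over [-50, 50]).
steps = 100_000
h, lo = 100.0 / steps, -50.0
total = sum(t_pdf(lo + i * h) for i in range(steps)) * h
print(total)   # ~ 1.0

# Simulate T = X / sqrt(Y/n): X ~ N(0,1), Y = sum of n squared N(0,1)'s.
random.seed(2)
reps = 20000
ts = [random.gauss(0, 1) / math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n)
      for _ in range(reps)]

# P(T <= 1): numeric integral of (2) vs. simulation -- the two should agree.
p_num = sum(t_pdf(lo + i * h) for i in range(51_000)) * h
p_sim = sum(t <= 1 for t in ts) / reps
print(p_num, p_sim)
```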
Theorem 7.3.10 The probability density function of the Snedecor F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom is

$$h(f) = \frac{\Gamma[(\nu_1+\nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} f^{\,\nu_1/2-1}\left(1 + \frac{\nu_1 f}{\nu_2}\right)^{-(\nu_1+\nu_2)/2}, \qquad f > 0, \qquad (1)$$

and $h(f) = 0$ otherwise.

Proof: To prove Theorem 7.3.10, we begin with the p.d.f. of $(X_1, X_2)$, where $X_1$ and $X_2$ are two independent random variables having chi-square distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. The joint p.d.f. is given by

$$g(x_1, x_2) = \frac{(x_1/2)^{(\nu_1/2)-1}(x_2/2)^{(\nu_2/2)-1}}{4\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\,e^{-(x_1+x_2)/2}, \qquad x_1 > 0,\ x_2 > 0.$$

Let us use the transformation

$$F = \frac{X_1/\nu_1}{X_2/\nu_2}, \qquad U = X_2,$$

which has the inverse transformation given by

$$X_1 = \frac{\nu_1}{\nu_2}\,UF, \qquad X_2 = U.$$

Now the joint p.d.f. of $(F, U)$ is

$$h(f, u) = g(x_1, x_2)\,|J|,$$

where the Jacobian $|J|$ is easily seen to be equal to $(\nu_1/\nu_2)u$. Thus, the joint p.d.f. of $(F, U)$ is given by

$$h(f, u) = \frac{(\nu_1/\nu_2)^{\nu_1/2}(u/2)^{(\nu_1+\nu_2)/2-1}\,f^{(\nu_1/2)-1}}{2\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\exp\left[-\frac{1}{2}u\left(1 + \frac{\nu_1}{\nu_2}f\right)\right], \qquad f > 0,\ u > 0. \qquad (2)$$

Integrating (2) with respect to $u$ from 0 to $\infty$ for a fixed $f$ gives (1). Also, at this point, we may remark that $t_n^2$ has the $F_{1,n}$ distribution, as the reader may easily verify.
7.4 Order Statistics (Optional)
In this section we shall consider probability distributions of statistics that are obtained if one
orders the n elements of a sample from least to greatest, and if sampling is done on a
continuous random variable X whose p.d.f. is f(x). Suppose we let X 1 ,..., X n be a random
sample from a population having continuous p.d.f. f(x). We note that since X is a continuous
random variable, the probability of X assuming a specific value is 0. In fact, by a
straightforward conditional probability argument, we can show that for any two of
( X 1 ,..., X n ), the probability of their having the same value is zero.
Consider then the observations $(X_1, \ldots, X_n)$ from a population having a p.d.f. $f(x)$. Let

$X_{(1)}$ = smallest of $(X_1, \ldots, X_n)$,
$X_{(2)}$ = second smallest of $(X_1, \ldots, X_n)$,
$\vdots$
$X_{(k)}$ = $k$th smallest of $(X_1, \ldots, X_n)$,
$\vdots$
$X_{(n)}$ = largest of $(X_1, \ldots, X_n)$.

Note that $X_{(1)} < X_{(2)} < \cdots < X_{(k)} < \cdots < X_{(n)}$. The quantities $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are random variables and are called the order statistics of the sample. $X_{(1)}$ is called the smallest element in the sample and $X_{(n)}$ the largest; $X_{(k)}$ is the $k$th-order statistic; $X_{(m+1)}$ is the sample median when the sample size is odd, so that $n = 2m+1$; and $R = X_{(n)} - X_{(1)}$ is called the sample range.
7.4.1 Distribution of the Largest Element in a Sample
As we have just stated, X ( n ) is the largest element in the sample X 1 ,..., X n . If the sample is
drawn from a population having p.d.f. f (x ) , let F(x) be the c.d.f. of the population defined
by
$$F(x) = \int_{-\infty}^{x} f(u)\,du = P(X \le x). \qquad (7.4.1)$$

Then the c.d.f. of $X_{(n)}$ is given by

$$P(X_{(n)} \le x) = P(X_1, \ldots, X_n \text{ are all} \le x) = (F(x))^n, \qquad (7.4.2)$$

because the $X_i$'s are independent and $P(X_i \le x) = F(x)$ for $i = 1, 2, \ldots, n$. If we denote the c.d.f. of the largest value by $G(x)$, we have

$$G(x) = (F(x))^n. \qquad (7.4.3)$$

The above result says that if we take a random sample of $n$ elements from a population whose p.d.f. is $f(x)$ [or whose c.d.f. is $F(x)$], then the c.d.f. $G(x)$ of the largest element $X_{(n)}$ in the sample is given by (7.4.3). If we denote the p.d.f. of the largest element by $g_{X_{(n)}}(x)$, we have

$$g_{X_{(n)}}(x) = \frac{d}{dx}G(x) = n(F(x))^{n-1}f(x). \qquad (7.4.4)$$
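For a uniform(0,1) population, $F(x) = x$ on $(0,1)$, so (7.4.3) gives $G(x) = x^n$ and the mean of $X_{(n)}$ works out to $n/(n+1)$. A short simulation (arbitrary $n$, seed, and check point) confirms this:

```python
# Simulation check of (7.4.3) for uniform(0,1): the largest of n draws has
# c.d.f. x^n and mean n/(n+1).
import random

random.seed(4)
n, reps = 5, 20000
maxima = [max(random.random() for _ in range(n)) for _ in range(reps)]

x0 = 0.8
emp_cdf = sum(v <= x0 for v in maxima) / reps
print(emp_cdf, x0 ** n)                      # G(0.8) = 0.8^5 = 0.32768
print(sum(maxima) / reps, n / (n + 1))       # E[X_(n)] = n/(n+1)
```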
Example 7.4.1 (Distribution of Last Bulb to Fail) Suppose the mortality of a certain type of
mass-produced light bulb is such that a bulb of this type, taken at random from production,
burns out in time T. Further, suppose that T is distributed as exponential with parameter  ,
so that the p.d.f. of T is given by
$$f(t) = \lambda e^{-\lambda t}, \quad t > 0; \qquad f(t) = 0, \quad t \le 0, \qquad (7.4.5)$$

where $\lambda$ is some positive constant. If $n$ bulbs of this type are taken at random, let their lives be $T_1, \ldots, T_n$. If the order statistics are $T_{(1)}, \ldots, T_{(n)}$, then $T_{(n)}$ is the life of the last bulb to burn out. We wish to determine the p.d.f. of $T_{(n)}$.

Solution: To solve this problem we may think of a population of bulbs whose p.d.f. of length of life is given by (7.4.5). We first determine that the c.d.f. of $T$ is given by

$$F(t) = \int_0^t f(u)\,du = \int_0^t \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda t}. \qquad (7.4.6)$$

Applying (7.4.4), we therefore have as the p.d.f. of $T_{(n)}$

$$g_{T_{(n)}}(t) = n\lambda(1 - e^{-\lambda t})^{n-1}e^{-\lambda t}, \quad t > 0; \qquad g_{T_{(n)}}(t) = 0, \quad t \le 0. \qquad (7.4.7)$$

In other words, the probability that the last bulb to burn out expires during the time interval $(t, t + dt)$ is given by $g(t)\,dt$, where

$$g(t)\,dt = n\lambda(1 - e^{-\lambda t})^{n-1}e^{-\lambda t}\,dt. \qquad (7.4.8)$$
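Although the text does not derive it, a known property of exponential order statistics is that the density (7.4.7) has mean $(1/\lambda)(1 + 1/2 + \cdots + 1/n)$; a simulation with arbitrary illustrative values of $\lambda$ and $n$ agrees:

```python
# Simulation check of (7.4.7): the mean life of the last of n exponential
# bulbs matches (1/lambda) * H_n, where H_n is the nth harmonic number.
import random

random.seed(5)
lam, n, reps = 2.0, 4, 20000
last = [max(random.expovariate(lam) for _ in range(n)) for _ in range(reps)]

harmonic = sum(1 / k for k in range(1, n + 1))
print(sum(last) / reps, harmonic / lam)   # both ~ 1.04
```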
7.4.2 Distribution of the Smallest Element in a Sample
We now wish to find the expression for the c.d.f. of the smallest element $X_{(1)}$ in the sample $X_1, \ldots, X_n$. That is, we want to determine $P(X_{(1)} \le x)$ as a function of $x$. Denoting this function by $G(x)$, we have

$$G(x) = P(X_{(1)} \le x) = 1 - P(X_{(1)} > x). \qquad (7.4.9)$$

But

$$P(X_{(1)} > x) = P(X_1, \ldots, X_n \text{ are all} > x) = [1 - F(x)]^n, \qquad (7.4.10)$$

because the $X_i$'s are independent and $P(X_i > x) = 1 - F(x)$, $i = 1, 2, \ldots, n$. Therefore, the c.d.f. $G(x)$ of the smallest element in the sample is given by

$$G(x) = 1 - [1 - F(x)]^n. \qquad (7.4.11)$$

The p.d.f. of the smallest element in the sample is obtained by taking the derivative of the right-hand side of (7.4.11) with respect to $x$. That is, the p.d.f. of $X_{(1)}$ is given by

$$g_{X_{(1)}}(x) = n[1 - F(x)]^{n-1}f(x). \qquad (7.4.12)$$
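Mirroring the check for the largest element, for a uniform(0,1) population (7.4.11) gives $G(x) = 1 - (1-x)^n$ for the smallest element, with mean $1/(n+1)$. A short simulation (arbitrary $n$, seed, and check point) confirms this:

```python
# Simulation check of (7.4.11) for uniform(0,1): the smallest of n draws has
# c.d.f. 1 - (1-x)^n and mean 1/(n+1).
import random

random.seed(6)
n, reps = 5, 20000
minima = [min(random.random() for _ in range(n)) for _ in range(reps)]

x0 = 0.2
emp_cdf = sum(v <= x0 for v in minima) / reps
print(emp_cdf, 1 - (1 - x0) ** n)        # G(0.2) = 1 - 0.8^5 = 0.67232
print(sum(minima) / reps, 1 / (n + 1))   # E[X_(1)] = 1/(n+1)
```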
Example 7.4.2 (Probability Distribution of the Weakest Link of a Chain) Suppose links of a
certain type used for making chains are such that the population of individual links has
breaking strengths X with p.d.f.
$$f(x) = \frac{(m+1)(m+2)}{c^{m+2}}\,x^m(c - x), \quad 0 < x < c; \qquad f(x) = 0, \text{ otherwise}, \qquad (7.4.13)$$

where $c$ and $m$ are certain positive constants. If a chain is made up of $n$ links of this type taken at random from the population of links, what is the probability distribution of the breaking strength of the chain?

Since the breaking strength of a chain is equal to the breaking strength of its weakest link, the problem reduces to finding the p.d.f. of the smallest element $X_{(1)}$ in a sample of size $n$ from the p.d.f. $f(x)$ given in (7.4.13).

First we find the c.d.f. $F(x)$ of breaking strengths of individual links by performing the following integration:

$$F(x) = \int_0^x f(u)\,du = \int_0^x \frac{(m+1)(m+2)}{c^{m+2}}\,u^m(c - u)\,du, \qquad (7.4.14)$$

that is,

$$F(x) = (m+2)\left(\frac{x}{c}\right)^{m+1} - (m+1)\left(\frac{x}{c}\right)^{m+2}. \qquad (7.4.15)$$

With the use of (7.4.12) and (7.4.13) we obtain the p.d.f. of the breaking strength $X$ of an $n$-link chain made from a random sample of $n$ of these links:

$$g(x) = n\,\frac{(m+1)(m+2)\,x^m}{c^{m+2}}\left[1 - (m+2)\left(\frac{x}{c}\right)^{m+1} + (m+1)\left(\frac{x}{c}\right)^{m+2}\right]^{n-1}(c - x), \qquad (7.4.16)$$

for $0 < x < c$, and $g(x) = 0$ otherwise.
7.4.3 Distribution of the Median of a Sample and of the kth Order Statistic
Suppose we have a sample of $2m+1$ elements $X_1, \ldots, X_{2m+1}$ from a population having p.d.f. $f(x)$ [and c.d.f. $F(x)$]. If we form the order statistics $X_{(1)}, \ldots, X_{(2m+1)}$ of the sample, then $X_{(m+1)}$ is called the sample median. We want to determine the probability distribution function for the median. Let us divide the $x$-axis into the following three disjoint intervals:

$$I_1 = (-\infty, x], \qquad I_2 = (x, x + dx], \qquad I_3 = (x + dx, \infty). \qquad (7.4.17)$$

Then the probabilities $p_1, p_2, p_3$ that an element $X$ drawn from the population with p.d.f. $f(x)$ will lie in the intervals $I_1, I_2, I_3$ are given, respectively, by

$$p_1 = F(x), \qquad p_2 = F(x + dx) - F(x), \qquad p_3 = 1 - F(x + dx). \qquad (7.4.18)$$

If we take a sample of size $2m+1$ from the population with p.d.f. $f(x)$, the median of the sample will lie in $(x, x + dx]$ if, and only if, $m$ sample elements fall in $I_1 = (-\infty, x]$, one sample element falls in $I_2 = (x, x + dx]$, and $m$ sample elements fall in $I_3 = (x + dx, \infty)$. The probability that all of this occurs is obtained by applying the multinomial probability distribution discussed in Section 4.7. This gives

$$\frac{(2m+1)!}{(m!)^2}\,(p_1)^m(p_2)^1(p_3)^m. \qquad (7.4.19)$$

Substituting the values of $p_1, p_2, p_3$ from (7.4.18) into (7.4.19), we obtain

$$\frac{(2m+1)!}{(m!)^2}\,F^m(x)\,[F(x + dx) - F(x)]\,[1 - F(x + dx)]^m. \qquad (7.4.20)$$

Now we may write

$$F(x + dx) - F(x) \approx f(x)\,dx. \qquad (7.4.21)$$

Substituting this expression into (7.4.20), we find that (ignoring terms of order $(dx)^2$ and higher)

$$P(x < X_{(m+1)} \le x + dx) = \frac{(2m+1)!}{(m!)^2}\,F^m(x)\,[1 - F(x)]^m\,f(x)\,dx. \qquad (7.4.22)$$

The p.d.f. $g_{X_{(m+1)}}(x)$ of the median is the coefficient of $dx$ on the right-hand side of (7.4.22), and the probability that the sample median $X_{(m+1)}$ falls in the interval $(x, x + dx]$ is given by

$$g_{X_{(m+1)}}(x)\,dx = \frac{(2m+1)!}{(m!)^2}\,F^m(x)\,[1 - F(x)]^m\,f(x)\,dx. \qquad (7.4.23)$$

We note that the sample space of the median $X_{(m+1)}$ is the same as the sample space of $X$, where $X$ has the (population) c.d.f. $F(x)$.
Example 7.4.3 (Probability Distribution of Median) Suppose $2m+1$ points are taken "at random" on the interval $(0,1)$. What is the probability that the median of the $2m+1$ points falls in $(x, x + dx]$?

In this example the p.d.f. of a point $X$ taken at random on $(0,1)$ is defined as

$$f(x) = 1, \quad 0 < x < 1; \qquad f(x) = 0 \text{ for all other values of } x.$$

Then

$$F(x) = 0, \quad x \le 0; \qquad F(x) = x, \quad 0 < x < 1; \qquad F(x) = 1, \quad x \ge 1.$$

Therefore, the p.d.f. $g_{X_{(m+1)}}(x)$ of the median in a sample of $2m+1$ points is given by

$$g_{X_{(m+1)}}(x) = \frac{(2m+1)!}{(m!)^2}\,x^m(1 - x)^m, \quad \text{if } 0 < x < 1,$$

and zero otherwise. Hence, the probability that the median of the $2m+1$ points falls in $(x, x + dx]$ is given by

$$g_{X_{(m+1)}}(x)\,dx = \frac{(2m+1)!}{(m!)^2}\,x^m(1 - x)^m\,dx.$$
More generally, if we have a sample of n elements, say X 1 ,..., X n , from a population having
p.d.f. f (x ) and if X ( k ) is the kth-order statistic of the sample (the kth smallest of X 1 ,..., X n ),
then we can show, as in the case of the median, that
$$P(x < X_{(k)} \le x + dx) = \frac{n!}{(k-1)!\,(n-k)!}\,F^{k-1}(x)\,[1 - F(x)]^{n-k}\,f(x)\,dx. \qquad (7.4.24)$$

Therefore, the p.d.f. of the $k$th-order statistic of the sample is given by

$$g_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,F^{k-1}(x)\,[1 - F(x)]^{n-k}\,f(x). \qquad (7.4.25)$$

Note that the functional form of the p.d.f. on the right of (7.4.25) reduces to that on the right of (7.4.12) if $k = 1$, and to that on the right of (7.4.4) if $k = n$, as one would expect, since in these two cases the $k$th-order statistic $X_{(k)}$ becomes the smallest element $X_{(1)}$ and the largest element $X_{(n)}$, respectively.

Example 7.4.4 (Distribution of the kth Order Statistic) If $n$ points $X_1, \ldots, X_n$ are taken "at random" on the interval $(0,1)$, what is the p.d.f. of the $k$th order statistic $X_{(k)}$?

Using (7.4.25), the p.d.f. of $X_{(k)}$ is given by

$$g_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1 - x)^{n-k}, \quad \text{if } 0 < x < 1,$$

and zero otherwise, since

$$F(x) = \int_0^x 1\,du = x, \quad \text{if } 0 < x < 1.$$
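The density in Example 7.4.4 is that of a Beta$(k, n-k+1)$ distribution, whose mean is $k/(n+1)$; both numeric integration of the density and direct simulation of the $k$th smallest of $n$ uniform points reproduce this. The values $n = 10$, $k = 3$, and the seed are arbitrary choices:

```python
# Check of Example 7.4.4: the kth order statistic of n uniform(0,1) points
# has mean k/(n+1) (a Beta(k, n-k+1) distribution).
import random, math

n, k = 10, 3
coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))

# Mean of the density by numeric integration, against k/(n+1).
h = 1e-4
mean_num = sum((i * h) * coef * (i * h) ** (k - 1) * (1 - i * h) ** (n - k)
               for i in range(int(1 / h))) * h
print(mean_num, k / (n + 1))   # both ~ 0.2727

# Simulated kth smallest of n uniforms.
random.seed(8)
reps = 20000
kths = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]
print(sum(kths) / reps)   # ~ 0.2727
```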
7.4.4 Other Uses of Order Statistics

• The Range as an Estimate of $\sigma$ in Normal Samples

Suppose a random variable $X$ has the normal distribution with unknown standard deviation $\sigma$. If a sample of $n$ independent observations is taken on $X$, then $R = X_{(n)} - X_{(1)}$ may be used to estimate $\sigma$. This estimate is not good for large $n$, but for small $n$ ($n \le 10$) it is deemed adequate. The estimate $\hat{\sigma}$ is made using the formula

$$\hat{\sigma} = c(n)\,R, \qquad (7.5.1)$$

where $c(n)$ is tabulated in Table 7.5.1.

TABLE 7.5.1

 n   c(n)     n   c(n)
 3   .591     7   .370
 4   .486     8   .351
 5   .430     9   .337
 6   .395    10   .325
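The table can be spot-checked by simulation: with standard normal data ($\sigma = 1$) and $n = 5$, the average of $\hat{\sigma} = c(5)\,R = .430\,R$ over many samples should be close to 1. The replication count and seed are arbitrary choices for the sketch:

```python
# Simulation check of sigma-hat = c(n) * R for n = 5 with sigma = 1.
import random

random.seed(9)
n, reps, c5 = 5, 20000, 0.430   # c(5) from Table 7.5.1

est = 0.0
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    est += c5 * (max(xs) - min(xs))   # sigma-hat for one sample
print(est / reps)   # ~ 1.0
```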
Practice Problems for Section 7.4
1. A continuous random variable, say X, has the uniform distribution function on (0,1), so that the p.d.f. of X is given by

$$f(x) = \begin{cases} 0, & x \le 0 \\ 1, & 0 < x < 1 \\ 0, & x \ge 1 \end{cases}$$

If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the order statistics of $n$ independent observations all having this distribution function, give the expression for the density $g(x)$ for
(a) The largest of these n observations.
(b) The smallest of these n observations.
(c) The rth smallest of these n observations.
2. If ten points are picked independently and at random on the interval (0,1):
(a) What is the probability that the point nearest 1 (i.e., the largest of the ten numbers selected)
will lie between .9 and 1.0?
(b) The probability is ½ that the point nearest 0 will exceed what number?
3. Assume that the cumulative distribution function of breaking strengths (in pounds) of links
used in making a certain type of chain is given by
F ( x)  1  e   x ,
=0,
x  0,
x0,
where  is a positive constant. What is the probability that a 100-link chain made from these
links would have a breaking strength exceeding y pounds?
4. Suppose F ( x) is the fraction of objects in a very large lot having weights less than or equal to
x pounds. If ten objects are drawn at random from the lot:
(a) What is the probability that the heaviest of these ten objects will have a weight less than
or equal to u pounds?
(b) What is the probability that the lightest of the objects will have a weight less than or
equal to v pounds?
5. The time, in minutes, taken by a manager of a company to drive from one plant to another is uniformly distributed over the interval [15, 30]. Let $X_1, X_2, \ldots, X_n$ denote her driving times on n randomly selected days, and let $X_{(n)} = \max(X_1, X_2, \ldots, X_n)$. Determine
(a) The probability density function of $X_{(n)}$.
(b) The mean of $X_{(n)}$.
6. The lifetimes, in years, $X_1, X_2, \ldots, X_n$ of n randomly selected power steering pumps manufactured by a subsidiary of a car company are exponentially distributed with mean $1/\lambda$. Find the probability density function of $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$, and find its mean and variance.
7. In Problem 5, assume that n = 21.
(a) Find the probability density function of the median time taken by the manager to drive from one plant to another.
(b) Find the expected value of $X_{(21)}$.
8. Consider a system of n identical components operating independently. Suppose the lifetime of each component, in months, is exponentially distributed with mean $1/\lambda$. These components are installed in series,
so that the system fails as soon as the first component fails. Find the probability density function
of the life of the system and then find its mean and variance.