Statistics 305 Parameters Used to Describe Characteristics of Population Elements

S05
Every population consists of a set of numbers. To describe the characteristics of a population,
we use scalar numerical summary measures. These are called parameters.
The mean of population elements is the parameter named the population mean. A measure of
the amount of variability in population elements is a parameter named population variance.
A number Q(p) such that the proportion p (for any 0 < p < 1) of population elements is less
than or equal to Q(p) and the proportion 1 − p is greater than or equal to Q(p) is a parameter
called the p quantile of the population. Thus, for example, Q(1/2) is the number such that half
of the population elements are less than or equal to Q(1/2), and half are greater than or equal to
Q(1/2). One-half is a rather special proportion, and this parameter is given the name Median.
Other special proportions, tenths and quarters, also have names: the quantiles Q(0.1), Q(0.2),
Q(0.3), ..., Q(0.9) are the first, second, third, ..., ninth Deciles, and Q(0.25), Q(0.5) and
Q(0.75) are the first, second and third Quartiles. The second quartile is, of course, also the
Median.
Quantiles in theoretical continuous-type populations are found, for any given p, by solving
the integral equation
$$F(Q(p)) = p \tag{1}$$
for Q(p), where F(x) is the cumulative distribution function for the population. For example,
the p = 0.5 quantile in the Exponential E(1) population is computed by solving the equation
$$\int_0^{Q(0.5)} e^{-y}\,dy = 0.5$$
or
$$1 - e^{-Q(0.5)} = 0.5$$
Thus Q(0.5) = −ln(0.5) is the Median of the E(1) population.
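When the CDF cannot be inverted by hand, Equation (1) can be solved numerically. The following is a minimal sketch, not a prescribed method: it assumes the E(α) population has CDF F(x) = 1 − exp(−x/α) (consistent with the E(1) density used above), and the function names `exponential_cdf` and `quantile_by_bisection` are illustrative only.

```python
import math

def exponential_cdf(x, alpha=1.0):
    """CDF of the Exponential population with mean alpha (assumed form)."""
    return 1.0 - math.exp(-x / alpha) if x > 0 else 0.0

def quantile_by_bisection(cdf, p, lo=0.0, hi=1.0, tol=1e-12):
    """Solve cdf(q) = p for q, assuming cdf is continuous and increasing for x >= lo."""
    # Widen the upper end of the bracket until it contains the solution.
    while cdf(hi) < p:
        hi *= 2.0
    # Halve the bracket until it is shorter than tol.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

median = quantile_by_bisection(exponential_cdf, 0.5)
print(median, -math.log(0.5))  # the numerical and closed-form medians should agree
```

Bisection is used here only because it needs nothing beyond the CDF itself; any root finder would do.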
Quantiles in discrete type populations are a nuisance to deal with. To do so we normally modify
the definition of Q(p) to say “the proportion 1 − p is greater than Q(p)”, rather than “greater than
or equal to Q(p)”. Consider for example the Binomial B(3, 0.1) population considered in a
previous handout. The cumulative distribution function values at unique population elements are
the following:
x      F(x)
0      0.729
1      0.729 + 0.243 = 0.972
2      0.729 + 0.243 + 0.027 = 0.999
3      1.0
By the modified definition, the p = 0.729 quantile, Q(0.729), is zero. The p = 0.972 quantile is
Q(0.972) = 1. In other words, the proportion 0.972 of population elements is less than or equal to
1 (i.e., the elements are either 0 or 1). But what about the p = 0.96 quantile or the p = 0.97
quantile? A correct but not very pleasing answer, which satisfies the definition of quantile, will
be found by interpolating between x = 0 and x = 1. So we might find, for example (maybe), that
Q(0.96) = 0.94. Obviously, x = 0.94 is not a population element. Note also that x = 0.94 satisfies
the definition as a p = 0.729 quantile, as do other x values (hence the nuisance of dealing with
discrete population quantiles). Our solution for discrete populations will be to ignore p values
for which Q(p) in the solution to Equation (1) is not a population element. For the remaining p
values, the solution to Equation (1) will be taken as the p quantile.
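The CDF values for B(3, 0.1), and the rule of keeping only p values attained by the CDF, can be sketched as follows. This is an illustration under stated assumptions: the helper names `binomial_cdf_table` and `discrete_quantile` are hypothetical, and the tolerance used to compare floating-point CDF values is a choice made for this sketch.

```python
from math import comb

def binomial_cdf_table(n, r):
    """Return [(x, F(x))] for x = 0..n in the Binomial B(n, r) population."""
    table = []
    cumulative = 0.0
    for x in range(n + 1):
        cumulative += comb(n, x) * r**x * (1 - r) ** (n - x)
        table.append((x, cumulative))
    return table

def discrete_quantile(table, p, tol=1e-9):
    """Return x with F(x) = p, or None when p is not one of the attained CDF values."""
    for x, cdf_value in table:
        if abs(cdf_value - p) < tol:
            return x
    return None

table = binomial_cdf_table(3, 0.1)
for x, cdf_value in table:
    print(x, round(cdf_value, 3))

print(discrete_quantile(table, 0.972))  # a p value attained by the CDF
print(discrete_quantile(table, 0.96))   # a p value that is ignored (None)
```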
When we consider computing quantiles in finite datasets (not theoretical populations) such as
random samples from continuous populations, we will interpolate when necessary and thus find a
quantile value which is not one of the dataset numbers. This will be described later in the
handout and on page 80 in your textbook.
In the following families of theoretical populations, the parameters Mean, Variance and p
quantiles are given as derived from mathematical theory. For example, you can see that the
mean of the Binomial B(4, 1/3) population is µ = 4(1/3) and the variance is σ² = 4(1/3)(2/3).
(It is customary to use the notation µ and σ² for these parameters.) For the p value
$$p_0 = \binom{4}{0} (1/3)^0 (2/3)^4 = (2/3)^4$$
the p_0 = (2/3)^4 quantile is taken as Q(16/81) = 0. Also, for example, the Standard Normal
population, N(0, 1), has mean µ = 0 and variance σ² = 1. Quantiles (z) in N(0, 1) are given in
the margins of Table B.3 on pages 788-789 for corresponding (rounded) p values in the body of
the table. Thus (approximately) Q(0.0003) = −3.40, Q(0.0002) = −3.49, Q(0.5398) = 0.10 and
Q(0.5753) = 0.19 for the N(0, 1) population.
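Table lookups of this kind can be checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); it stands in for Table B.3 only as an illustration, and the rounding to two and four decimals mimics the table layout described above.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Quantile for a given p: the z that solves Phi(z) = p.
for p in (0.5398, 0.5753):
    print(p, round(standard_normal.inv_cdf(p), 2))

# The reverse lookup: the (rounded) p value for a z in the table margin.
for z in (-3.40, -3.49, 0.10, 0.19):
    print(z, round(standard_normal.cdf(z), 4))
```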
1. Binomial B(n, r).
The unique elements in these populations are the nonnegative integers zero through n, where
0 < r < 1. Denote these as x_0 = 0, x_1 = 1, …, x_n = n. The mean and variance parameters of
B(n, r) can be shown to have value
Mean: µ = nr
Variance: σ² = nr(1 − r)
Quantiles: Only quantiles for p-values p_0, p_1, …, p_n, defined as follows, are considered
in this population. The p_i-th quantile (0 < p_i < 1) is x_i ≡ Q(p_i), where p_i and x_i satisfy
(F(x) is the CDF)
$$F(x_i) = p_i$$
2. Geometric G(r).
The unique elements in these populations are all the positive integers. Denote these as x_1 = 1,
x_2 = 2, … . The mean and variance of these populations can be shown to have value
Mean: µ = 1/r
Variance: σ² = (1 − r)/r²
Quantiles: Only quantiles for p-values p_1, p_2, …, defined as follows, are considered in
this population. The p_i-th quantile (0 < p_i < 1) is x_i ≡ Q(p_i), where p_i and x_i satisfy
(F(x) is the CDF)
$$F(x_i) = p_i$$
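The stated mean and variance of G(r) can be checked numerically, and the admissible pairs (p_i, x_i) listed. This sketch assumes the usual probability function on the positive integers, P(X = x) = (1 − r)^(x − 1) r, which the handout does not write out; the function names and the truncation at 10,000 terms are choices made for the illustration.

```python
def geometric_cdf(x, r):
    """F(x) = 1 - (1 - r)**x for x = 1, 2, ..., under the assumed probability function."""
    return 1.0 - (1.0 - r) ** x

def geometric_moments(r, terms=10_000):
    """Approximate the mean and variance by summing over the first `terms` elements."""
    mean = sum(x * (1 - r) ** (x - 1) * r for x in range(1, terms + 1))
    second_moment = sum(x * x * (1 - r) ** (x - 1) * r for x in range(1, terms + 1))
    return mean, second_moment - mean**2

r = 0.25
mean, variance = geometric_moments(r)
print(mean, 1 / r)               # numerical mean vs. 1/r
print(variance, (1 - r) / r**2)  # numerical variance vs. (1 - r)/r^2

# The first few admissible p values p_i = F(x_i), paired with their quantiles x_i.
print([(geometric_cdf(x, r), x) for x in range(1, 5)])
```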
3. Normal N(µ, σ²).
The quantile for every p-value (0 < p < 1) is the number Q(p) ≡ x_p which satisfies the
equation (Φ(x) is the CDF and φ(x) is the density function)
$$\Phi(x_p) = p \quad\text{or}\quad \int_{-\infty}^{x_p} \phi(y)\,dy = p$$
The mean and variance of the N(µ, σ²) population are, respectively, µ and σ². Quantiles
for certain p-values in N(0, 1) are given in Table B.3.
4. Exponential E(α).
Mean: µ = α
Variance: σ² = α²
Quantiles: (0 < p < 1) The p-th quantile for a given p-value is the real number x_p
satisfying (F(x) is the CDF)
$$F(x_p) = p$$
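For the Exponential family the quantile equation can be solved in closed form. The short derivation below assumes that E(α) has density (1/α) e^{−y/α} for y > 0, the parameterization that gives mean α and reduces to the E(1) density e^{−y} used earlier; the handout itself does not write this density out.

```latex
F(x_p) = \int_0^{x_p} \frac{1}{\alpha}\, e^{-y/\alpha}\,dy
       = 1 - e^{-x_p/\alpha} = p
\quad\Longrightarrow\quad
x_p = Q(p) = -\alpha \ln(1 - p), \qquad 0 < p < 1 .
```

With α = 1 and p = 0.5 this gives Q(0.5) = −ln(0.5), in agreement with the Median of the E(1) population.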
Quantiles in Finite Datasets as Defined by
Textbook and JMP
The general definition of the p quantile of a random sample (or any dataset), for a given real
number 0 < p < 1, is a number Q(p) such that the fraction p of the elements are less than or equal
to Q(p) and the proportion 1 − p of the elements are greater than or equal to Q(p). For finite
datasets this general definition cannot be satisfied for many given p values. To exhibit the
problem with the general definition for finite datasets, consider the set containing the numbers
{1, 5, 9}. The quantile Q(1/3) is a number such that p = 1/3 of the numbers are less than or equal
to Q(1/3) and 1 − p = 2/3 of the elements are greater than Q(1/3). What is the value of Q(1/3)?
Well, Q(1/3) = 2 satisfies the definition. Q(1/3) = 3 also satisfies the definition, as do 2.5, 3.5,
3.7, etc. There is not a unique answer, and for other p values such as p = 1/4 the choice is not
obvious. Thus, alternative definitions which produce approximate p quantiles are used. Your
textbook gives one on page 78, and JMP uses a slightly different one. This permits us to
compute approximations to population p quantiles in a random sample from a continuous
population.
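The non-uniqueness in the {1, 5, 9} example is easy to confirm by direct counting. The sketch below checks the version of the definition used in that example (a fraction p of the elements at or below Q, and a fraction 1 − p strictly above it); the function name is hypothetical, and exact fractions are used so that 1/3 is compared without rounding error.

```python
from fractions import Fraction

def satisfies_quantile_definition(data, p, q):
    """True if a fraction p of data is <= q and a fraction 1 - p is > q."""
    n = len(data)
    at_or_below = Fraction(sum(1 for y in data if y <= q), n)
    above = Fraction(sum(1 for y in data if y > q), n)
    return at_or_below == p and above == 1 - p

data = [1, 5, 9]
p = Fraction(1, 3)
candidates = [2, 2.5, 3, 3.5, 3.7]

# Every candidate listed in the text qualifies as a 1/3 quantile.
print([q for q in candidates if satisfies_quantile_definition(data, p, q)])

# For p = 1/4 no candidate qualifies: the attainable fractions are 0, 1/3, 2/3 and 1.
print(any(satisfies_quantile_definition(data, Fraction(1, 4), q) for q in [0, 1, 2, 5, 9]))
```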
For any ordered dataset y_1 ≤ y_2 ≤ ⋯ ≤ y_n, the p quantile (0 < p < 1) is the number Q(p) which
satisfies:
Textbook definition (see pages 78-81).
1. If i = np + 0.5 is an integer i ≤ n, then Q(p) = y_i.
2. If i = np + 0.5 is not an integer and if 0.5/n < p < (n − 0.5)/n, then let j = [i] (largest
integer < i). The p quantile is computed by first computing
$$d = \frac{p - (j - 0.5)/n}{(j + 0.5)/n - (j - 0.5)/n},$$
then
$$Q(p) = (1 - d)\,Q((j - 0.5)/n) + d\,Q((j + 0.5)/n).$$
If p ≤ 0.5/n then Q(p) = y_1, and if p ≥ (n − 0.5)/n then Q(p) = y_n.
JMP definition
1. If i = np + p is an integer (i ≤ n), then Q(p) = y_i.
2. If i = np + p is not an integer, then let j = [i] (integer part of i) and f = i − j (fractional
part of i). Then if 0 < j < n, compute the p quantile as
$$Q(p) = (1 - f)\,y_j + f\,y_{j+1}.$$
If j = 0 or j = n, use y_1 for y_0 or y_n for y_{n+1}, so we have Q(p) = y_1 if j = 0 and
Q(p) = y_n if j = n.
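The two definitions can be written out as short functions. This is a sketch under stated assumptions, not the textbook's or JMP's own code: the names `textbook_quantile` and `jmp_quantile` are hypothetical, a small tolerance decides whether i counts as an integer in floating point, and Q((j − 0.5)/n) and Q((j + 0.5)/n) are evaluated as y_j and y_{j+1}, which follows from rule 1 of the textbook definition.

```python
import math

def _is_integer(value, tol=1e-9):
    """Treat value as an integer if it is within tol of one (floating-point safeguard)."""
    return abs(value - round(value)) < tol

def textbook_quantile(data, p):
    """p quantile by the textbook rule based on i = n*p + 0.5."""
    y = sorted(data)
    n = len(y)
    if p <= 0.5 / n:
        return y[0]
    if p >= (n - 0.5) / n:
        return y[-1]
    i = n * p + 0.5
    if _is_integer(i):
        return y[round(i) - 1]             # y_i in 0-based indexing
    j = math.floor(i)
    d = i - j                              # same value as the formula for d
    return (1 - d) * y[j - 1] + d * y[j]   # (1 - d) y_j + d y_{j+1}

def jmp_quantile(data, p):
    """p quantile by the rule based on i = n*p + p."""
    y = sorted(data)
    n = len(y)
    i = n * p + p
    if _is_integer(i) and 1 <= round(i) <= n:
        return y[round(i) - 1]
    j = math.floor(i)
    f = i - j
    if j <= 0:
        return y[0]                        # y_1 stands in for y_0
    if j >= n:
        return y[-1]                       # y_n stands in for y_{n+1}
    return (1 - f) * y[j - 1] + f * y[j]   # (1 - f) y_j + f y_{j+1}

data = [1, 5, 9]
print(textbook_quantile(data, 1 / 3), jmp_quantile(data, 1 / 3))
print(textbook_quantile(data, 0.5), jmp_quantile(data, 0.5))
```

For the dataset {1, 5, 9} the two rules give different 1/3 quantiles, while both return the middle value for p = 0.5.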
For example, consider the dataset given in Example 5 on page 79. For p = 0.93, your textbook
definition gives
$$i = 10(0.93) + 0.5 = 9.8$$
Since this is not an integer, compute j = [i] = 9. Then
$$d = \frac{0.93 - (9 - 0.5)/10}{(9 + 0.5)/10 - (9 - 0.5)/10} = \frac{0.93 - 0.85}{0.95 - 0.85} = 0.8$$
$$Q(0.93) = (1 - 0.8)\,Q(0.85) + 0.8\,Q(0.95) = (0.2)(9614) + (0.8)(10688) = 10{,}473.2$$
The JMP definition gives i = (10) (0.93) + 0.93 = 10.23. Thus, j = 10 and f = 0.23. Since j = n,
Q(0.93) = 10,688. Both numbers are p = 0.93 quantiles of the dataset.
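The arithmetic of this example can be reproduced in a few lines. The sketch uses only the numbers quoted above (n = 10, p = 0.93, and the two dataset values 9614 and 10688, taken here to be y_9 and y_10); the rest of the Example 5 dataset is not needed and is not assumed.

```python
import math

n, p = 10, 0.93
y9, y10 = 9614, 10688  # the two dataset values quoted in the worked example

# Textbook definition: i = np + 0.5, then interpolate between y_j and y_{j+1}.
i = n * p + 0.5
j = math.floor(i)
d = (p - (j - 0.5) / n) / ((j + 0.5) / n - (j - 0.5) / n)
textbook_q = (1 - d) * y9 + d * y10

# JMP definition: i = np + p; here j = n, so the quantile is y_n.
i_jmp = n * p + p
j_jmp = math.floor(i_jmp)
jmp_q = y10 if j_jmp >= n else None

print(j, round(d, 4), round(textbook_q, 1))
print(j_jmp, jmp_q)
```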