4.5 The Mean and Median

STAT 421 Lecture Notes
Definition 4.5.1. Let X be a random variable. Every m ∈ R such that
Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2
is said to be a median. A median divides the distribution into two portions; the lower is
L = {x|x ≤ m} and the upper is U = {x|x ≥ m}, and they possess the property that
.5 ≤ Pr(X ∈ L) and .5 ≤ Pr(X ∈ U).
Suppose that X is continuous and the c.d.f. F is strictly increasing. Then m = F^{−1}(.5).
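As a minimal sketch of the relation m = F^{−1}(.5), the following Python snippet solves F(m) = .5 by bisection. The exponential distribution with rate 1 is an illustrative choice that does not appear in these notes, and the helper names `exp_cdf` and `median_by_bisection` are hypothetical. Its c.d.f. is continuous and strictly increasing on [0, ∞), so the numerical root should agree with the closed form ln 2.

```python
import math

def exp_cdf(x, rate=1.0):
    """c.d.f. of an exponential distribution (illustrative choice, not from the notes)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def median_by_bisection(cdf, lo, hi, tol=1e-12):
    """Solve cdf(m) = 0.5 on [lo, hi], assuming cdf is continuous and strictly increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < 0.5:
            lo = mid   # the median lies to the right of mid
        else:
            hi = mid   # the median lies at or to the left of mid
    return (lo + hi) / 2.0

m = median_by_bisection(exp_cdf, 0.0, 50.0)
print(m, math.log(2))  # the two values should agree closely
```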
Suppose that X ∼ Binom(4, .4). The table shows the p.f., the c.d.f., and the upper-tail probabilities:

x            0      1      2      3      4
f(x)         .1296  .3456  .3456  .1536  .0256
Pr(X ≤ x)    .1296  .4752  .8208  .9744  1
Pr(X ≥ x)    1      .8704  .5248  .1792  .0256

The median of this distribution is 2, since Pr(X ≤ 2) = .8208 ≥ .5 and Pr(X ≥ 2) = .5248 ≥ .5. Because Pr(X > 2) = Pr(X ≥ 3) = .1792 < .5 and Pr(X < 2) = Pr(X ≤ 1) = .4752 < .5, no other choice of m satisfies the necessary conditions and hence the median is unique.
Suppose that X ∼ Binom(3, .5). The table below shows probabilities.

x            0      1      2      3
f(x)         .125   .375   .375   .125
Pr(X ≤ x)    .125   .500   .875   1
Pr(X ≥ x)    1      .875   .500   .125

Any x such that 1 ≤ x ≤ 2 is a median.
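The two binomial examples can be checked directly against Definition 4.5.1. The sketch below is only an illustration: the helper names `binom_pf` and `is_median`, the half-integer grid of candidate values, and the small tolerance `eps` (added to guard against floating-point rounding) are all assumptions, not part of the notes.

```python
from math import comb

def binom_pf(n, p):
    """Return the Binom(n, p) p.f. as a list [f(0), ..., f(n)]."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def is_median(m, pf, eps=1e-12):
    """Check Definition 4.5.1: Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2."""
    lower = sum(f for x, f in enumerate(pf) if x <= m)
    upper = sum(f for x, f in enumerate(pf) if x >= m)
    return lower >= 0.5 - eps and upper >= 0.5 - eps

# Candidate values on a half-integer grid covering each support.
pf4 = binom_pf(4, 0.4)
medians4 = [k / 2 for k in range(9) if is_median(k / 2, pf4)]

pf3 = binom_pf(3, 0.5)
medians3 = [k / 2 for k in range(7) if is_median(k / 2, pf3)]

print(medians4)  # grid points that are medians of Binom(4, .4)
print(medians3)  # grid points that are medians of Binom(3, .5)
```

On this grid, the first list should contain only 2 and the second should contain 1, 1.5, and 2, matching the unique and non-unique cases described above.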
Example Suppose that X has a distribution with the following c.d.f.:

F(x) = 0,    x < 0,
       1/2,  0 ≤ x < 1,
       1,    x ≥ 1.

For any m ∈ (0, 1), Pr(X ≤ m) = 1/2 and Pr(X ≥ m) = 1 − Pr(X < m) = 1/2, so every m ∈ (0, 1) is a median.
Theorem 4.5.1. Suppose that X is a random variable with support on some interval I ⊂ R,
and Y = r(X) where r is a one-to-one function on I. If m is a median of the distribution of
X, then r(m) is a median of the distribution of Y.
The proof uses the fact that r, being one-to-one on I (and assumed continuous), must be strictly increasing or strictly decreasing. Suppose, for simplicity, that r is strictly increasing. Then, if m is a median of X,
Pr[Y ≤ r(m)] = Pr(r^{−1}(Y) ≤ m) = Pr(X ≤ m) ≥ .5, and
Pr[Y ≥ r(m)] = Pr(r^{−1}(Y) ≥ m) = Pr(X ≥ m) ≥ .5.
Thus, r(m) is a median of the distribution of Y. When r is strictly decreasing, applying r^{−1} reverses the inequalities, so that Pr[Y ≤ r(m)] = Pr(X ≥ m) ≥ .5 and Pr[Y ≥ r(m)] = Pr(X ≤ m) ≥ .5, and the conclusion is the same.
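Theorem 4.5.1 can be illustrated on the empirical distribution of a sample. The sketch below rests on several assumptions that are not from the notes: the exponential sample, the seed, the odd sample size (chosen so that the sample median is a single observed value), and the two transformations r(x) = √x (strictly increasing) and r(x) = 1/(1 + x) (strictly decreasing on x ≥ 0).

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed, used only for reproducibility

# An odd sample size makes the sample median one of the observed values.
xs = [random.expovariate(1.0) for _ in range(1001)]
m = statistics.median(xs)

# Strictly increasing r: the median of r(X) should be r(m).
median_inc = statistics.median([math.sqrt(x) for x in xs])

# Strictly decreasing r: the ordering flips, but r(m) is still the median.
median_dec = statistics.median([1.0 / (1.0 + x) for x in xs])

print(median_inc == math.sqrt(m))
print(median_dec == 1.0 / (1.0 + m))
```

Both transformations preserve or exactly reverse the ordering of the sample, so the middle observation maps to the middle observation in each case.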
Prediction is often a goal of statistics. For example, if X is the price of a stock at the end of a trading day on the NASDAQ exchange, many people want to predict this price with minimal error at some point earlier in the day.
A best predictor (given certain assumptions) can readily be determined. First consider
the mean squared error measure.
Definition 4.5.2. The mean squared error (m.s.e.) of the prediction d is defined to be
E[(X − d)^2].
The next theorem determines the best predictor when the measure of error is mean squared error (or, equivalently, root mean squared error, √E[(X − d)^2]).
Theorem 4.5.2. Let X be a random variable with finite variance σ^2 = E[(X − µ)^2] and mean µ = E(X). For every d ∈ R,
E[(X − µ)^2] ≤ E[(X − d)^2].
Furthermore, E[(X − µ)^2] = E[(X − d)^2] if and only if d = µ.
To prove the theorem, expand E[(X − d)^2] as follows:
E[(X − d)^2] = E[(X − µ + µ − d)^2]
= E[(X − µ)^2] + 2(µ − d)E(X − µ) + (µ − d)^2
= σ^2 + (µ − d)^2,
where the last step uses E(X − µ) = 0. Since (µ − d)^2 ≥ 0 with equality only at d = µ, the choice d = µ minimizes E[(X − d)^2] and the minimum value is σ^2. Prediction error cannot be reduced below the intrinsic variability of the random variable.
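The decomposition E[(X − d)^2] = σ^2 + (µ − d)^2 can be verified numerically. The sketch below uses the Binom(4, .4) distribution from earlier in this section. The particular values of d and the function name `mse` are arbitrary illustrative choices.

```python
from math import comb

n, p = 4, 0.4
pf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mu = sum(x * f for x, f in enumerate(pf))               # E(X)
var = sum((x - mu) ** 2 * f for x, f in enumerate(pf))  # E[(X - mu)^2]

def mse(d):
    """Mean squared error E[(X - d)^2] of the prediction d."""
    return sum((x - d) ** 2 * f for x, f in enumerate(pf))

ds = [0.0, 1.0, mu, 2.0, 3.5]
for d in ds:
    # The last two columns should agree up to floating-point rounding.
    print(d, mse(d), var + (mu - d) ** 2)
```

In every row the direct computation and the decomposition should match, and the smallest value in the second column should occur at d = µ.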
Example Suppose that an event occurs with probability .05 on each replication of an experiment. Let X count the number of times the event occurs in 100 independent replications, so that X ∼ Binom(100, .05). If the criterion of goodness is mean squared error, then the best prediction of X is E(X) = np = 100(.05) = 5.
An alternative objective function to mean squared error, E[(X − d)^2], is mean absolute error, E[|X − d|].
Theorem 4.5.3. Let X denote a random variable with finite mean µ and let m denote a median of the distribution of X. For every number d,
E(|X − m|) ≤ E(|X − d|).
Furthermore, E(|X − m|) = E(|X − d|) if and only if d is also a median of the distribution of X; when the median is unique, equality holds only for d = m.
A partial proof proceeds as follows. Consider the case of continuous X with p.d.f. f and suppose that m < d. Then
E(|X − d|) − E(|X − m|) = ∫_{−∞}^{∞} (|x − d| − |x − m|) f(x) dx
= ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (d + m − 2x) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx,
since, when x < m < d,
|x − d| − |x − m| = d − x − (m − x) = d − m,
when m < x < d,
|x − d| − |x − m| = d − x − (x − m) = d + m − 2x,
and when m < d < x,
|x − d| − |x − m| = x − d − (x − m) = m − d.
Notice that for m < x < d, d + m − 2x > m − d, so
E(|X − d|) − E(|X − m|) ≥ ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (m − d) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx
= ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{∞} (m − d) f(x) dx.
Then
E(|X − d|) − E(|X − m|) ≥ ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{∞} (m − d) f(x) dx
= (d − m)[Pr(X ≤ m) − Pr(X > m)].
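Because m is a median and d − m > 0,
Pr(X ≤ m) ≥ 1/2 ≥ Pr(X > m) ⇒ Pr(X ≤ m) − Pr(X > m) ≥ 0
⇒ E(|X − d|) ≥ E(|X − m|).
The case d < m is handled by a symmetric argument. Equality is achieved if and only if d is also a median; in particular, only for d = m when the median is unique.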
Example Recall that the median of the distribution of X ∼ Binom(4, .4) is 2, and so m = 2
minimizes mean absolute error. In comparison, µ = 1.6 minimizes mean squared error.
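The contrast between the two criteria can be seen with a simple grid search over candidate predictions for X ∼ Binom(4, .4). This is a rough sketch. The grid spacing of 0.1 and the function names `mae` and `mse` are assumptions made for illustration. A finer grid or an exact argument would be needed in general.

```python
from math import comb

n, p = 4, 0.4
pf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def mae(d):
    """Mean absolute error E(|X - d|) of the prediction d."""
    return sum(abs(x - d) * f for x, f in enumerate(pf))

def mse(d):
    """Mean squared error E[(X - d)^2] of the prediction d."""
    return sum((x - d) ** 2 * f for x, f in enumerate(pf))

# Candidate predictions 0.0, 0.1, ..., 4.0.
grid = [k / 10 for k in range(41)]
best_mae = min(grid, key=mae)
best_mse = min(grid, key=mse)

print(best_mae, mae(best_mae))  # minimizer of mean absolute error on the grid
print(best_mse, mse(best_mse))  # minimizer of mean squared error on the grid
```

On this grid the search should select the median, 2, under mean absolute error and the mean, 1.6, under mean squared error.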