STAT 421 Lecture Notes

4.5 The Mean and Median

Definition 4.5.1. Let X be a random variable. Every m ∈ R such that Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2 is said to be a median.

A median divides the distribution into two portions, the lower L = {x | x ≤ m} and the upper U = {x | x ≥ m}, which satisfy Pr(X ∈ L) ≥ .5 and Pr(X ∈ U) ≥ .5. If X is continuous and the c.d.f. F is strictly increasing, then the median is unique and m = F^{-1}(.5).

Example. Suppose that X ∼ Binom(4, .4). The table shows the p.f. and c.d.f.:

    x            0       1       2       3       4
    f(x)         .1296   .3456   .3456   .1536   .0256
    Pr(X ≤ x)    .1296   .4752   .8208   .9744   1
    Pr(X ≥ x)    1       .8704   .5248   .1792   .0256

The median of this distribution is 2. Since Pr(X > 2) = Pr(X ≥ 3) < .5 and Pr(X < 2) = Pr(X ≤ 1) < .5, no other choice of m satisfies the two conditions, and hence the median is unique.

Example. Suppose that X ∼ Binom(3, .5). The table below shows the probabilities:

    x            0      1      2      3
    f(x)         .125   .375   .375   .125
    Pr(X ≤ x)    .125   .500   .875   1
    Pr(X ≥ x)    1      .875   .500   .125

Any x such that 1 ≤ x ≤ 2 is a median.

Example. Suppose that X has a distribution with the following c.d.f.:

    F(x) = 0 for x < 0,    F(x) = 1/2 for 0 ≤ x < 1,    F(x) = 1 for x ≥ 1.

For any m ∈ (0, 1), Pr(X ≤ m) = F(m) = 1/2 and Pr(X ≥ m) = 1 − Pr(X < m) = 1/2, so every m ∈ (0, 1) is a median.

Theorem 4.5.1. Suppose that X is a random variable with support on some interval I ⊂ R, and let Y = r(X), where r is a one-to-one function on I. If m is a median of the distribution of X, then r(m) is a median of the distribution of Y.

The proof argues that because r is one-to-one on I, it must be strictly increasing or strictly decreasing. Suppose, for simplicity, that r is strictly increasing. Then, if m is a median of X,

    Pr[Y ≤ r(m)] = Pr[r^{-1}(Y) ≤ m] = Pr(X ≤ m) ≥ .5,   and   Pr[Y ≥ r(m)] = Pr(X ≥ m) ≥ .5.

Thus, r(m) is a median of the distribution of Y. The argument must be revised slightly for the case that r is strictly decreasing.

Prediction often is a goal of statistics.
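Before turning to prediction, the binomial median tables above can be reproduced numerically. The following sketch uses only the standard library; the variable names are illustrative, and any binomial routine (e.g., scipy.stats.binom) would serve equally well.

```python
from math import comb

# p.f. of X ~ Binom(4, .4): f(k) = C(4, k) * .4^k * .6^(4-k)
n, p = 4, 0.4
f = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# m is a median iff Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2
medians = [
    m for m in range(n + 1)
    if sum(f[: m + 1]) >= 0.5 and sum(f[m:]) >= 0.5
]
print(medians)  # the unique median, 2
```

Running the same scan with n, p = 3, 0.5 returns both 1 and 2, matching the second table, where every x in [1, 2] is a median.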
For example, if X is the price of a stock at the end of a trading day on the NASDAQ exchange, many people want to predict this price with minimal error at some earlier point in the day. A best predictor (given certain assumptions) can readily be determined. First consider the mean squared error measure.

Definition 4.5.2. The mean squared error (m.s.e.) of the prediction d is defined to be E[(X − d)^2].

The next theorem determines the best predictor when the measure of error is mean squared error (or, equivalently, root mean squared error, sqrt(E[(X − d)^2])).

Theorem 4.5.2. Let X be a random variable with mean µ = E(X) and finite variance σ^2 = E[(X − µ)^2]. For every d ∈ R,

    E[(X − µ)^2] ≤ E[(X − d)^2].

Furthermore, E[(X − µ)^2] = E[(X − d)^2] if and only if d = µ.

To prove the theorem, expand E[(X − d)^2] as follows:

    E[(X − d)^2] = E[(X − µ + µ − d)^2]
                 = E[(X − µ)^2] + 2(µ − d)E(X − µ) + (µ − d)^2
                 = σ^2 + (µ − d)^2,

where the middle term vanishes because E(X − µ) = 0. Clearly, d = µ minimizes E[(X − d)^2], and the minimum value is σ^2. Prediction error cannot be reduced beyond the intrinsic variability of the random variable.

Example. Suppose that an event occurs with probability .05. Then the best prediction of the number of times that the event will happen in 100 independent replications of the experiment is E(X) = np = 5, where X is a random variable counting the number of events that occur among the 100 replications and the criterion of goodness is mean squared error.

An alternative objective function to mean squared error, E[(X − d)^2], is mean absolute error, E[|X − d|].

Theorem 4.5.3. Let X denote a random variable with finite mean µ and let m denote a median of the distribution of X. For every number d,

    E(|X − m|) ≤ E(|X − d|).

Furthermore, E(|X − m|) = E(|X − d|) if and only if d is also a median of the distribution of X.

A partial proof proceeds as follows. Consider the case of continuous X with p.d.f. f, and suppose that m < d.
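The identity E[(X − d)^2] = σ^2 + (µ − d)^2 from the proof of Theorem 4.5.2, and the example with n = 100 and p = .05, can be checked numerically. This is a sketch using only the standard library; the helper name mse is introduced here for illustration.

```python
from math import comb

# p.f. of X ~ Binom(100, .05); mean np = 5, variance np(1-p) = 4.75
n, p = 100, 0.05
f = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mu = sum(k * f[k] for k in range(n + 1))
var = sum((k - mu) ** 2 * f[k] for k in range(n + 1))

def mse(d):
    """Mean squared error E[(X - d)^2] of the constant prediction d."""
    return sum((k - d) ** 2 * f[k] for k in range(n + 1))

# The decomposition from the proof: mse(d) = sigma^2 + (mu - d)^2,
# so d = mu minimizes m.s.e. and the minimum value is sigma^2.
assert abs(mse(mu) - var) < 1e-6
assert abs(mse(7.0) - (var + (mu - 7.0) ** 2)) < 1e-6
```

The assertions confirm that the excess error over σ^2 is exactly the squared bias (µ − d)^2, so the best prediction of the count is µ = np = 5.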
Then

    E(|X − d|) − E(|X − m|) = ∫_{−∞}^{∞} (|x − d| − |x − m|) f(x) dx.

The integrand is piecewise linear in x:

    when x ≤ m < d,   |x − d| − |x − m| = (d − x) − (m − x) = d − m;
    when m < x < d,   |x − d| − |x − m| = (d − x) − (x − m) = d + m − 2x;
    when m < d ≤ x,   |x − d| − |x − m| = (x − d) − (x − m) = m − d.

Hence

    E(|X − d|) − E(|X − m|) = ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (d + m − 2x) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx.

Notice that for m < x < d, d + m − 2x > d + m − 2d = m − d, so

    E(|X − d|) − E(|X − m|) ≥ ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{d} (m − d) f(x) dx + ∫_{d}^{∞} (m − d) f(x) dx
                            = ∫_{−∞}^{m} (d − m) f(x) dx + ∫_{m}^{∞} (m − d) f(x) dx
                            = (d − m)[Pr(X ≤ m) − Pr(X > m)].

Because m is a median,

    Pr(X ≤ m) ≥ 1/2 ≥ Pr(X > m)  ⇒  Pr(X ≤ m) − Pr(X > m) ≥ 0  ⇒  E(|X − d|) ≥ E(|X − m|).

Equality is achieved only if d is itself a median of the distribution. The case d < m is handled symmetrically.

Example. Recall that the median of the distribution of X ∼ Binom(4, .4) is 2, and so m = 2 minimizes mean absolute error. In comparison, µ = 1.6 minimizes mean squared error.
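The closing example can be verified with a grid search over candidate predictors d. This is a sketch assuming the Binom(4, .4) distribution from the earlier table; the helper names mae and mse are introduced here for illustration.

```python
from math import comb

# p.f. of X ~ Binom(4, .4), as in the earlier table
n, p = 4, 0.4
f = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mu = sum(k * f[k] for k in range(n + 1))  # 1.6

def mae(d):
    """Mean absolute error E[|X - d|] of the constant prediction d."""
    return sum(abs(k - d) * f[k] for k in range(n + 1))

def mse(d):
    """Mean squared error E[(X - d)^2] of the constant prediction d."""
    return sum((k - d) ** 2 * f[k] for k in range(n + 1))

# Scan d over a fine grid on [0, 4] and keep the minimizer of each criterion.
grid = [i / 100 for i in range(401)]
best_mae = min(grid, key=mae)  # the median, 2.0
best_mse = min(grid, key=mse)  # the mean, 1.6
```

The two minimizers differ: mean absolute error is smallest at the median m = 2, while mean squared error is smallest at the mean µ = 1.6, exactly as Theorems 4.5.2 and 4.5.3 predict.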