Detecting Outliers

• An outlier is a measurement that appears to be markedly different from neighboring observations.
• In the univariate case with adequate sample sizes, and assuming that normality holds, an outlier can be detected by:
1. Standardizing the n measurements so that they are approximately N(0, 1).
2. Flagging observations with standardized values below −3.5 or above 3.5, or thereabouts (see the sketch after this list).
• In p dimensions, detecting outliers is not so easy. A sample unit that does not appear to be an outlier in any of the marginal distributions can still be an outlier relative to the multivariate distribution.
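
A minimal sketch of the univariate procedure in Python (the function name, the 3.5 cutoff default, and the simulated data are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def flag_univariate_outliers(x, cutoff=3.5):
    """Standardize the measurements and flag |z| values beyond the cutoff."""
    z = (x - x.mean()) / x.std(ddof=1)   # approximately N(0, 1) under normality
    return np.flatnonzero(np.abs(z) > cutoff)

# Illustrative usage with simulated data
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=100)
print(flag_univariate_outliers(x))   # indices of any flagged observations
```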
Detecting outliers (cont’d)

[Figure omitted from the original slides]
Steps for detecting outliers
1. Investigate all univariate marginal distributions visually by constructing the standardized values $z_{ij} = (x_{ij} - \bar{x}_j)/\sqrt{s_{jj}}$ for the i-th sample unit and j-th variable.
2. If p is moderate, construct all bivariate scatter plots. There are p(p − 1)/2 of them.
3. For each sample unit, calculate the squared distance $d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$, where $x_i$ is the p × 1 vector of measurements on the i-th sample unit.
4. To decide whether $d_i^2$ is ’extreme’, recall that the $d_i^2$ are approximately $\chi^2_p$. For example, if n = 100, we would expect to observe about 5 squared distances larger than the 0.95 percentile of the $\chi^2_p$ distribution (see the sketch below).
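
A sketch of steps 3 and 4 in Python (function names and the simulated data are illustrative; `scipy.stats.chi2.ppf` supplies the $\chi^2_p$ percentile):

```python
import numpy as np
from scipy import stats

def squared_distances(X):
    """Squared distance d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar) for each unit."""
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse sample covariance
    return np.einsum('ij,jk,ik->i', diff, S_inv, diff)

def flag_multivariate_outliers(X, alpha=0.05):
    """Flag units whose d_i^2 exceeds the (1 - alpha) percentile of chi^2_p."""
    d2 = squared_distances(X)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return np.flatnonzero(d2 > cutoff), d2

# Illustrative usage: with n = 100 and alpha = 0.05, expect about 5 flags
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3), size=100)
flagged, d2 = flag_multivariate_outliers(X)
print(flagged)
```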
Example: Stiffness of lumber boards
• Recall that four measurements were taken on each of 30
boards.
• Data are shown in Table 4.4 along with the four columns of
standardized values and the 30 squared distances.
• Note that boards 9 and 16 have unusually large $d^2$ values of 12.26 and 16.85, respectively. The 0.975 and 0.995 percentiles of a $\chi^2_4$ distribution are 11.14 and 14.86.
• In a sample of size 30, we would expect about 0.75 observations with $d^2 > 11.14$ and about 0.15 observations with $d^2 > 14.86$.
• Unit 16, with $d^2 = 16.85$, is not flagged as an outlier when we consider only the univariate standardized measurements.
Example: Stiffness of boards (cont’d)

[Figure omitted from the original slides]
Transformations to near normality
• If observations show gross departures from normality, it might
be necessary to transform some of the variables to near
normality.
• The following are some suggestions that were originally proposed to stabilize variances, but they are also used to transform data to near-normality (a code sketch follows the table).
Original scale              Transformed scale
Right-skewed data           log(x)
x are counts                √x
x are proportions p̂         arc sine transformation: arcsin(√p̂)
x are correlations r        Fisher’s z(r) = ½ log[(1 + r)/(1 − r)]
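
A minimal sketch applying each transformation in the table (the data values are made up for illustration; note that `np.arctanh` computes Fisher’s z directly):

```python
import numpy as np

skewed = np.array([1.2, 3.5, 8.1, 40.0])   # right-skewed data
counts = np.array([0, 3, 5, 7, 12])        # counts
props  = np.array([0.10, 0.45, 0.80])      # proportions p-hat
corrs  = np.array([-0.30, 0.00, 0.70])     # correlations r

log_x    = np.log(skewed)
sqrt_x   = np.sqrt(counts)
arcsin_p = np.arcsin(np.sqrt(props))                 # arc sine transformation
fisher_z = 0.5 * np.log((1 + corrs) / (1 - corrs))   # equals np.arctanh(corrs)
```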
The Box-Cox transformation
• Proposed by Box and Cox in a 1964 JRSS(B) article.
• The Box-Cox transformation is a member of the family of power transformations:
$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\ \log(x), & \lambda = 0. \end{cases}$$
• To estimate λ we do the following:
1. Assume that $x^{(\lambda)} \sim N(\mu, \sigma^2)$ for some unknown λ.
2. Do a change of variables to obtain the likelihood with
respect to x.
3. Maximize the resulting log likelihood for λ.
Box-Cox Transformation (cont’d)
• If $x^{(\lambda)} \sim N(\mu, \sigma^2)$, then with respect to the untransformed observations the resulting log-likelihood is
$$L(\mu, \sigma^2, \lambda) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(x_i^{(\lambda)} - \mu\right)^2 + (\lambda - 1)\sum_{i=1}^{n}\log(x_i),$$
where the last term is the logarithm of the Jacobian of the transformation, $|dx_i^{(\lambda)}/dx_i| = x_i^{\lambda - 1}$.
• Substituting the MLEs for µ and σ², we get, as a function of λ alone,
$$\ell(\lambda) = -\frac{n}{2}\log\left[\frac{1}{n}\sum_{i}\left(x_i^{(\lambda)} - \bar{x}^{(\lambda)}\right)^2\right] + (\lambda - 1)\sum_{i}\log(x_i).$$
• The best power transformation $\hat{\lambda}$ is the one that maximizes the expression above.
Computing the Box-Cox transformation in practice
• One way to find the MLE of λ (or an approximation to it) is to evaluate the log-likelihood from the previous transparency at a sequence of discrete values of λ ∈ (−3, 3), or in some other range.
• We then pick the λ for which the log-likelihood is maximized.
• SAS will also compute the MLE of λ and so will R.
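
A sketch of this grid search in Python, using the profile log-likelihood $\ell(\lambda)$ from the previous transparency (the made-up data and the grid endpoints are illustrative; `scipy.stats.boxcox` returns SciPy’s own MLE for comparison):

```python
import numpy as np
from scipy import stats

def boxcox_loglik(x, lam):
    """Profile log-likelihood l(lambda) from the previous transparency."""
    xl = np.log(x) if np.isclose(lam, 0) else (x**lam - 1) / lam
    n = len(x)
    return -0.5 * n * np.log(np.mean((xl - xl.mean())**2)) + (lam - 1) * np.log(x).sum()

# Evaluate over a grid of lambda values and keep the maximizer
x = stats.expon.rvs(size=42, random_state=0) + 0.01   # made-up positive data
grid = np.arange(-3, 3.01, 0.1)
lam_hat = max(grid, key=lambda lam: boxcox_loglik(x, lam))
print(lam_hat)

# SciPy's built-in MLE for comparison
_, lam_mle = stats.boxcox(x)
print(lam_mle)
```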
Radiation in microwave ovens example
• Recall example 4.10. Radiation measurements on 42 ovens
were obtained.
• The log-likelihood was evaluated for 26 values of λ ∈ (−1, 1.5)
in steps of 0.10.
Microwave example (cont’d)

[Figures omitted from the original slides]
Another way to find a transformation
• If data are normally distributed, the ordered observations plotted against the quantiles expected under normality will fall approximately on a straight line.
• Consider a sequence of values of λ: $\lambda_1, \lambda_2, \ldots, \lambda_k$, and for each $\lambda_j$ fit the regression
$$x_{(i)}^{(\lambda_j)} = \beta_0 + \beta_1 q_{(i),j} + e_i,$$
where the $x_{(i)}^{(\lambda_j)}$ are the transformed and ordered sample values, the $q_{(i),j}$ are the quantiles corresponding to each transformed, ordered observation under the assumption of normality, and $\beta_0$, $\beta_1$ are the usual regression coefficients.
Alternative estimation method (cont’d)
• For each of the k regressions (one for each $\lambda_j$), compute the MSE.
• The best-fitting model will have the lowest MSE. Therefore, the λ that minimizes the MSE of the regression of sample quantiles on normal quantiles will be ’best’ (see the sketch below).
• It can be shown that this is also the λ that maximizes the log-likelihood function shown earlier.
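
A sketch of this recipe in Python. Two details are assumptions not stated on the slides: the plotting positions $(i - 0.5)/n$ used for the normal quantiles, and the rescaling of the transform by the geometric-mean Jacobian factor, which puts the MSEs on a comparable scale across different λ:

```python
import numpy as np
from scipy import stats

def scaled_boxcox(x, lam):
    """Box-Cox transform rescaled by the geometric-mean Jacobian factor,
    so that MSEs can be compared across different values of lambda (assumed)."""
    g = stats.gmean(x)
    if np.isclose(lam, 0):
        return g * np.log(x)
    return (x**lam - 1) / (lam * g**(lam - 1))

def qq_mse(x, lam):
    """MSE from regressing the transformed, ordered values on normal quantiles."""
    xl = np.sort(scaled_boxcox(x, lam))
    n = len(x)
    q = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # assumed plotting positions
    beta1, beta0 = np.polyfit(q, xl, 1)                  # usual regression coefficients
    return np.mean((xl - (beta0 + beta1 * q))**2)

# Pick the lambda with the smallest MSE over a grid (made-up positive data)
x = stats.expon.rvs(size=42, random_state=0) + 0.01
grid = np.arange(-1, 1.51, 0.1)
lam_best = min(grid, key=lambda lam: qq_mse(x, lam))
print(lam_best)
```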
Transforming multivariate observations
• By transforming each of the p measurements individually, we
approximate normality of the marginals but not necessarily
of the joint p-dimensional distribution.
• It is possible to proceed as in the Box-Cox approach and maximize the log-likelihood jointly over all p values of λ, but in most cases this is not worth the effort. See the discussion on page 199 of the textbook. A sketch of the joint approach is given below.
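
A sketch of the joint approach in Python, assuming the natural multivariate analogue of the univariate profile log-likelihood, in which the variance is replaced by the determinant of the MLE covariance matrix of the transformed data (the data and grid are made up for illustration):

```python
import numpy as np
from itertools import product

def joint_boxcox_loglik(X, lams):
    """Assumed joint profile log-likelihood for p per-variable lambdas:
    -(n/2) log|S(lambda)| + sum_j (lambda_j - 1) sum_i log(x_ij)."""
    n, p = X.shape
    Xl = np.column_stack([
        np.log(X[:, j]) if np.isclose(lam, 0) else (X[:, j]**lam - 1) / lam
        for j, lam in enumerate(lams)
    ])
    S = np.cov(Xl, rowvar=False, ddof=0)   # MLE covariance (divide by n)
    _, logdet = np.linalg.slogdet(S)
    jacobian = sum((lam - 1) * np.log(X[:, j]).sum() for j, lam in enumerate(lams))
    return -0.5 * n * logdet + jacobian

# Bivariate grid search, as in the microwave example
rng = np.random.default_rng(0)
X = rng.lognormal(size=(42, 2))            # made-up positive bivariate data
grid = np.arange(-1, 1.51, 0.1)
lam1, lam2 = max(product(grid, grid), key=lambda lams: joint_boxcox_loglik(X, lams))
print(lam1, lam2)
```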
Bivariate transformation in microwave example
• The bivariate log-likelihood was maximized jointly over $\lambda_1$ and $\lambda_2$; the optimum is near (0.16, 0.16).

[Figure omitted from the original slides: contours of the bivariate log-likelihood]