Lecture 7
Sections 2.3 – 2.4
•More Detailed Summary Quantities
− Quartiles and IQR
− Boxplots
− Quantile Plots
More Detailed Summary Quantities
The median divides a data set into two equal parts. A finer partition
can be obtained by dividing a data set into more than two parts.
The (100p)th percentile separates the smallest 100p% of the data or
distribution from the remaining values.
Quartiles and the Interquartile Range
Certain percentiles are particularly important. The quartiles (first
quartile, median, third quartile) separate a data set or distribution
into four equal parts:
25th percentile = first quartile or lower quartile, denoted by Q1.
50th percentile = median.
75th percentile = third quartile or upper quartile, denoted by Q3.
Sample quartiles
Separate the n ordered sample observations into a lower half and an
upper half. If n is odd, include the median in each half. Then,
Q1=median of the lower half of the data
Q3=median of the upper half of the data
Note that there are several different sensible ways to define the
sample quartiles. R's quantile() function implements several of these
definitions, so its results may differ slightly from the rule above.
Example. n = 15
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
Find Q1, the median, and Q3.
Example. n=14
20 25 25 27 28 31 33 34 36 37 44 50 59 85
Find Q1, Median and Q3.
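The lower-half / upper-half rule can be sketched in Python as follows. This is a minimal illustration of the rule stated in this lecture, not of R's defaults; the function name sample_quartiles is made up for this sketch.

```python
from statistics import median

def sample_quartiles(data):
    """Return (Q1, median, Q3) using the lower-half / upper-half rule."""
    xs = sorted(data)
    n = len(xs)
    # Size of each half; when n is odd the median is included in both halves.
    half = (n + 1) // 2
    lower = xs[:half]        # smallest `half` observations
    upper = xs[n - half:]    # largest `half` observations
    return median(lower), median(xs), median(upper)

sample_15 = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
sample_14 = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85]

print(sample_quartiles(sample_15))
print(sample_quartiles(sample_14))
```

Under this rule the n = 15 sample gives Q1 = 27.5, median = 34, Q3 = 47, and the n = 14 sample gives Q1 = 27, median = 33.5, Q3 = 44. Software that uses another quartile definition may report slightly different values.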
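Population Quartiles
For a continuous distribution with density f(x), the population quartiles Q1 and Q3 satisfy
$\int_{-\infty}^{Q_1} f(x)\,dx = 0.25$
$\int_{Q_3}^{\infty} f(x)\,dx = 0.25$
and the (100p)th percentile $x_p$ satisfies
$\int_{-\infty}^{x_p} f(x)\,dx = p$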
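As a small numerical illustration of population quartiles, the sketch below uses the standard normal distribution, chosen here only as an example. It relies on NormalDist from Python's standard library, whose inv_cdf method returns the value below which a given proportion of the distribution lies.

```python
from statistics import NormalDist

# Standard normal distribution, used here purely as an example population.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Population quartiles: the values with 25% and 75% of the area to their left.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)

# Check the defining areas: 25% below Q1 and 25% above Q3.
area_below_q1 = std_normal.cdf(q1)
area_above_q3 = 1.0 - std_normal.cdf(q3)

print(round(q1, 4), round(q3, 4))
print(round(area_below_q1, 4), round(area_above_q3, 4))
```

For the standard normal the quartiles are roughly -0.674 and 0.674, so the population IQR is about 1.35 standard deviations.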
IQR and Outlier Detection
Interquartile range (IQR)
IQR = Q3 - Q1
•Resistant to the effect of outliers.
•Useful for the estimation of the variability when the distribution is
Determining outliers
Suspected (mild) outlier: an observation is a suspected outlier if it
is farther than 1.5 IQR from the closest quartile (i.e., it falls below Q1 - 1.5 IQR or above Q3 + 1.5 IQR).
Highly suspected (extreme) outlier: an observation is an
extreme outlier if it is farther than 3 IQR from the nearest quartile
(i.e., it falls below Q1 - 3 IQR or above Q3 + 3 IQR).
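The two outlier rules can be sketched as a small classifier. The function name classify and the quartile values used in the demonstration (Q1 = 10, Q3 = 20) are hypothetical and chosen only to make the fences easy to follow.

```python
def classify(x, q1, q3):
    """Label x as 'extreme', 'mild', or 'not an outlier' given the quartiles."""
    iqr = q3 - q1
    # Extreme: more than 3 IQR beyond the nearest quartile.
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    # Mild: more than 1.5 IQR (but at most 3 IQR) beyond the nearest quartile.
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "not an outlier"

# Hypothetical quartiles: IQR = 10, mild fences at -5 and 35,
# extreme fences at -20 and 50.
q1, q3 = 10, 20
for value in (12, 40, 55, -6):
    print(value, classify(value, q1, q3))
```

With these made-up quartiles, 12 is not an outlier, 40 and -6 are mild outliers, and 55 is an extreme outlier.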
A boxplot is a visual display of data based on the following five-number summary:
Min, Q1, Median, Q3, Max
Note: Boxplots always run from bottom to top or from left to right. A central box spans
Q1 and Q3, and a line in the box marks the
median. Outliers are marked with “o”. In a
boxplot the upper whisker extends to the
largest data value within the upper limit, Q3
+ 1.5 IQR, and the lower whisker extends to
the smallest value within the lower limit, Q1 - 1.5 IQR.
Boxplot Examples
Ultrasound was used to gather the accompanying corrosion data on the
thickness of the floor plate of an aboveground tank used to store crude
oil (“Statistical Analysis of UT Corrosion Data from Floor Plates of a
Crude Oil Aboveground Storage Tank”, Material Eval., 1994: 846-849).
Each observation is the largest pit depth in the plate, expressed in milli-inches.
40 52 55 60 70 75 85 85 90 90 92 94 94 95 98 100 115 125 125
Find the five-number summary and plot the boxplot.
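One way to check a hand calculation for the corrosion data is the sketch below. It applies the lecture's lower-half / upper-half quartile rule; the function name five_number_summary is an assumption of this sketch.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the lower-half / upper-half rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # the median belongs to both halves when n is odd
    q1 = median(xs[:half])
    q3 = median(xs[n - half:])
    return xs[0], q1, median(xs), q3, xs[-1]

pit_depths = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90,
              92, 94, 94, 95, 98, 100, 115, 125, 125]

summary = five_number_summary(pit_depths)
print(summary)
```

This gives Min = 40, Q1 = 72.5, Median = 90, Q3 = 96.5, Max = 125. With IQR = 24, the limits Q1 - 1.5 IQR = 36.5 and Q3 + 1.5 IQR = 132.5 contain every observation, so the boxplot has no outliers and the whiskers reach 40 and 125.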
The effects of partial discharges on the degradation of insulation cavity
material have important implications for the lifetimes of high-voltage
components. Consider the following sample of n=25 pulse widths from
slow discharges in a cylindrical cavity made of polyethylene:
5.3 8.2 13.8 74.1 85.3 88.0 90.2 91.5 92.4 92.9 93.6 94.3 94.8 94.9
95.5 95.8 95.9 96.6 96.7 98.1 99.0 101.4 103.7 106.0 113.5
Find the five-number summary and plot the boxplot.
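For the pulse-width data the outlier rules matter, so the sketch below also reports the whisker ends and the flagged observations. It assumes the lecture's quartile rule and outlier limits; the function name boxplot_stats and the dictionary keys are made up for illustration.

```python
from statistics import median

pulse_widths = [5.3, 8.2, 13.8, 74.1, 85.3, 88.0, 90.2, 91.5, 92.4, 92.9,
                93.6, 94.3, 94.8, 94.9, 95.5, 95.8, 95.9, 96.6, 96.7, 98.1,
                99.0, 101.4, 103.7, 106.0, 113.5]

def boxplot_stats(data):
    """Quartiles, whisker ends, and mild/extreme outliers for a boxplot."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    q1, q3 = median(xs[:half]), median(xs[n - half:])
    iqr = q3 - q1
    lo_mild, hi_mild = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_ext, hi_ext = q1 - 3 * iqr, q3 + 3 * iqr
    # Whiskers stop at the most extreme observations inside the 1.5 IQR limits.
    inside = [x for x in xs if lo_mild <= x <= hi_mild]
    extreme = [x for x in xs if x < lo_ext or x > hi_ext]
    mild = [x for x in xs
            if (x < lo_mild or x > hi_mild) and lo_ext <= x <= hi_ext]
    return {"q1": q1, "median": median(xs), "q3": q3,
            "whiskers": (inside[0], inside[-1]),
            "mild": mild, "extreme": extreme}

stats = boxplot_stats(pulse_widths)
for key, value in stats.items():
    print(key, value)
```

Here Q1 = 90.2, the median is 94.8, and Q3 = 96.7, so IQR is 6.5. The whiskers end at 85.3 and 106.0; 74.1 and 113.5 are mild outliers, and 5.3, 8.2, and 13.8 are extreme outliers.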
Comparative Boxplots
Comparative boxplot (or side-by-side boxplot) provides a very
effective way of revealing similarities and differences between two or
more data sets consisting of observations on the same variable.
Example. The article “Compression of Single-Wall Corrugated
Shipping Containers Using Fixed and Floating Test Platens” (J. of
Testing and Evaluation, 1992: 318-320) describes an experiment in
which several different types of boxes were compared with respect to
compression strength. Consider the following observations on four
different types of boxes:
Type of Box    Compression Strength (lb)
1              655.5 788.3 734.3 721.4 679.1 699.4
2              789.2 772.5 786.9 686.1 732.1 774.8
3              737.1 639.0 696.3 671.7 717.2 727.1
4              535.1 628.7 542.4 559.0 586.9 520.0
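A comparative boxplot is built from one set of quartiles per group. The sketch below computes them for the four box types, assuming the rows of data correspond to types 1 through 4 in order and using the lecture's quartile rule; the names strength and quartiles are made up for this sketch.

```python
from statistics import median

# Assumption: the four rows of data are box types 1, 2, 3, 4 in order.
strength = {
    1: [655.5, 788.3, 734.3, 721.4, 679.1, 699.4],
    2: [789.2, 772.5, 786.9, 686.1, 732.1, 774.8],
    3: [737.1, 639.0, 696.3, 671.7, 717.2, 727.1],
    4: [535.1, 628.7, 542.4, 559.0, 586.9, 520.0],
}

def quartiles(data):
    """Return (Q1, median, Q3) using the lower-half / upper-half rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs), median(xs[n - half:])

summaries = {box_type: quartiles(values) for box_type, values in strength.items()}
for box_type, (q1, med, q3) in summaries.items():
    print(f"type {box_type}: Q1={q1:.1f} median={med:.2f} Q3={q3:.1f} IQR={q3 - q1:.1f}")
```

Type 2 has the highest median and type 4 the lowest. Every type 4 observation lies below the smallest observation of the other three types, which is the kind of difference a side-by-side boxplot makes visible at a glance.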
Quantile Plots
An investigator frequently wishes to know whether data was selected
from a particular type of population distribution (e.g., normal distribution).
For one thing, many inferential procedures are based on the assumption
that the underlying distribution is of a specified type. The use of such
procedures is inappropriate if the actual distribution differs greatly from
the assumed type. Additionally, understanding the underlying distribution
can sometimes give insight into the physical mechanisms involved in
generating the data. An effective way to check a distributional assumption
is to construct a quantile plot (or probability plot).
Idea: Plot the sample quantiles vs. the theoretical quantiles (population
quantiles). If the data come from the correct distribution, the points in the
plot will fall close to a straight line. If the actual distribution is quite
different from the one used to construct a plot, the points should depart
substantially from a linear pattern.
Normal Quantile Plot
A Normal Quantile Plot is a plot of the (z quantile, sample quantile) pairs.
Example. The accompanying sample consisting of n=20
observations on dielectric breakdown voltage of a piece of epoxy
resin appeared in the article “Maximum Likelihood Estimation in the
3-Parameter Weibull Distribution” (IEEE Trans on Dielectrics and
Elec. Insul., 1996: 43-55).
24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88
Is the population distribution of dielectric breakdown voltage approximately normal?
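The (z quantile, sample quantile) pairs for this sample can be computed as in the sketch below. It assumes the i-th smallest observation is paired with the standard normal quantile at probability (i - 0.5)/n. That plotting position is one common convention and is not stated in these slides. The correlation helper is an extra check on straightness added for this sketch, not part of the plot itself.

```python
from math import sqrt
from statistics import NormalDist, mean

voltages = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
            27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

def normal_quantile_pairs(data):
    """Pair each ordered observation with a standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

def correlation(pairs):
    """Pearson correlation of the plotted points (near 1 means nearly linear)."""
    zs = [z for z, _ in pairs]
    xs = [x for _, x in pairs]
    z_bar, x_bar = mean(zs), mean(xs)
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in pairs)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_zx / sqrt(s_zz * s_xx)

pairs = normal_quantile_pairs(voltages)
r = correlation(pairs)
for z, x in pairs:
    print(f"{z:7.3f}  {x:6.2f}")
print("correlation of plotted points:", round(r, 3))
```

Plotting the second column against the first gives the normal quantile plot. Points that lie close to a straight line, with a correlation close to 1, are consistent with a normal population.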
Review of Concepts