V045desc

Lesson Objectives
- Learn what percentiles are and how to calculate quartiles.
- Learn to find the five number summary.
- Learn how to construct and use boxplots.
 Department of ISM, University of Alabama, 1995-2003
M08-Numerical Summaries 2
Sample 100p-th percentile:
If x = the 100p-th percentile, then
at least 100p% of data is ≤ x,
at least 100(1-p)% of data is ≥ x.
Example: You are told you scored 47; then you hear “47” is at the 82nd percentile.
At least 82% of the sample have scores ≤ 47, AND at least 18% have scores ≥ 47.
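The two conditions in the definition can be checked directly by counting. The sketch below is only an illustration: the function name `satisfies_percentile_definition` and the list of scores are made up here and do not come from the slides.

```python
def satisfies_percentile_definition(data, x, p, tol=1e-9):
    """Return True if x meets both conditions for the 100p-th percentile."""
    n = len(data)
    at_or_below = sum(1 for v in data if v <= x)  # number of values <= x
    at_or_above = sum(1 for v in data if v >= x)  # number of values >= x
    # Compare counts rather than percentages; tol absorbs floating-point error.
    return (at_or_below >= p * n - tol) and (at_or_above >= (1 - p) * n - tol)


# Hypothetical scores, invented for illustration only.
scores = [31, 35, 38, 40, 41, 43, 44, 45, 47, 52]

print(satisfies_percentile_definition(scores, 47, 0.82))  # 9 of 10 are <= 47, 2 of 10 are >= 47
print(satisfies_percentile_definition(scores, 40, 0.82))  # only 4 of 10 are <= 40
```

With these made-up scores, 47 satisfies both conditions for the 82nd percentile, while 40 fails the first one.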
Finding the 100p-th percentile:
1. Order the data.
2. Calculate np.
3a. If np is NOT an integer, round up; find the obs. in this position.
Example: n = 25, p = 1/3; np = 8.333, so the 9th position will be the 33.333rd percentile.
Finding the 100p-th percentile:
1. Order the data.
2. Calculate np.
3b. If np IS an integer, say k, then average the kth and (k+1)th ordered values.
Example: n = 25, p = .40; np = _____ , so the average of the ______ & ____ positions will be the 40th percentile.
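Steps 1, 2, 3a, and 3b can be sketched as one small function. This is a minimal illustration, not a standard library routine: the name `percentile_np_rule`, the tolerance used to decide whether np is an integer, and the sample data (the integers 1 through 25) are all assumptions made here.

```python
import math


def percentile_np_rule(data, p, tol=1e-9):
    """100p-th percentile using the 'calculate np' rule, for 0 < p < 1."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ordered = sorted(data)           # Step 1: order the data
    position = len(ordered) * p      # Step 2: calculate np
    k = round(position)
    # Step 3b: np is (numerically) an integer k, so average the
    # k-th and (k+1)-th ordered values (1-based positions).
    if 1 <= k < len(ordered) and abs(position - k) < tol:
        return (ordered[k - 1] + ordered[k]) / 2
    # Step 3a: np is not an integer, so round up to the next position.
    return ordered[math.ceil(position) - 1]


# Illustrative data only: 25 observations whose values equal their positions.
data = list(range(1, 26))
print(percentile_np_rule(data, 1 / 3))   # np = 8.333, so the 9th value is used
print(percentile_np_rule(data, 0.40))    # np is an integer, so two values are averaged
```

Because each value in the illustrative data equals its position, the printed results show which positions the rule picked.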
Five Number Summary
1. Maximum
2. 3rd Quartile, Q3 = 75th percentile
3. Median
4. 1st Quartile, Q1 = 25th percentile
5. Minimum
Quartiles:
1st Quartile (25th percentile): at least 25% of the data values lie at or below it.
3rd Quartile (75th percentile): at least 75% of the data values lie at or below it.
Method 1: Percentile method
Q1 located at position (n+1)*1/4
Q2 located at position (n+1)*2/4
Q3 located at position (n+1)*3/4

 n    Q1    Q2    Q3
 5    __    __    __
 8    __    __    __
11    __    __    __
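The position formulas for Method 1 are plain arithmetic, so a short sketch can tabulate them for the three sample sizes in the table. The function name `quartile_positions` is invented for this illustration, and the sketch only computes positions; it does not decide how a fractional position is turned into a data value.

```python
def quartile_positions(n):
    """Positions of Q1, Q2, Q3 in the ordered data under the (n+1) rule."""
    return tuple((n + 1) * k / 4 for k in (1, 2, 3))


for n in (5, 8, 11):
    q1, q2, q3 = quartile_positions(n)
    print(f"n = {n:2d}: Q1 at position {q1}, Q2 at position {q2}, Q3 at position {q3}")
```

A fractional position such as 1.5 falls between two ordered observations, which is why some sample sizes require combining neighboring values.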
Example 6
Step 1: Order the data:
12, 14, 16, 18, 19, 21, 22, 25, 27
Max    =
Q3     =
Median =
Q1     =
Min    =
Method 2: Median method
Q1 = median of observations below the median’s position.
Q3 = median of observations above the median’s position.
Example 6
Ordered data:
12, 14, 16, 18, 19, 21, 22, 25, 27
Max    =
Q3     =
Median =
Q1     =
Min    =
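The median method can be sketched in a few lines using Python's standard `statistics.median`. The function name `five_number_summary` and the dictionary keys are choices made for this illustration; the data are the ordered values from Example 6.

```python
from statistics import median


def five_number_summary(data):
    """Five number summary with quartiles from the median method."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    # Observations below the median's position.
    lower = ordered[:half]
    # Observations above the median's position; when n is odd the
    # median itself sits at index `half` and is left out.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "Min": ordered[0],
        "Q1": median(lower),
        "Median": median(ordered),
        "Q3": median(upper),
        "Max": ordered[-1],
    }


example_6 = [12, 14, 16, 18, 19, 21, 22, 25, 27]
for name, value in five_number_summary(example_6).items():
    print(f"{name:6s} = {value}")
```

For this data set the median sits at the 5th position, so Q1 is the median of the first four values and Q3 is the median of the last four.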
4. Interquartile Range (IQR)
IQR = Q3 - Q1
- IQR is the range of the middle 50% of the data.
- Observations more than 1.5 IQRs beyond the quartiles are considered outliers.
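The IQR and the 1.5 IQR outlier rule amount to a few lines of arithmetic. In the sketch below the function names are invented for illustration, and the quartiles are the values these slides give for Example 6 (Q1 = 15.0, Q3 = 23.5).

```python
def outlier_limits(q1, q3):
    """Return (IQR, lower limit, upper limit) for the 1.5 IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Observations MORE than 1.5 IQRs beyond the quartiles."""
    _, lower, upper = outlier_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]


example_6 = [12, 14, 16, 18, 19, 21, 22, 25, 27]
iqr, lower, upper = outlier_limits(15.0, 23.5)
print(f"IQR = {iqr}, limits = ({lower}, {upper})")
print("Outliers:", find_outliers(example_6, 15.0, 23.5))
```

The comparison is strict, so an observation that lands exactly on a limit is not flagged in this sketch.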
Which summary statistics should I use?
Shape?
Location?
Variation?
Boxplot
A graphical display of the five number summary (also called a box-and-whiskers plot).
Example 6
Ordered data:
12, 14, 16, 18, 19, 21, 22, 25, 27
Max    = 27.0
Q3     = 23.5
Median = 19.0
Q1     = 15.0
Min    = 12.0
IQR = 8.5
Example 6A
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
What if . . . . 19, 19, 19,
Example 6B
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
What if . . . . X
[Figure: boxplot of the Example 6 data on a vertical axis running from 12 to 28]
Max    = 27.0
Q3     = 23.5
Median = 19.0
Q1     = 15.0
Min    = 12.0
IQR = 8.5
Note: Middle 50% of data are within the range of the box.
Use side-by-side boxplots to display two variables when one is quantitative and one is categorical.
Useful tool for comparing distributions.
Part Suppliers; who is best?
[Figure: side-by-side boxplots for suppliers A, B, and C on a vertical axis running from 14.960 to 15.040]
Modified Boxplot
- More accurate picture of data.
- Useful in detecting outliers: observations more than 1.5 IQRs beyond the quartiles are considered outliers.
- Available in Minitab (boxplot); not in Excel.
Example 7
13, 24, 26, 26, 27, 28, 36, 46
Maximum      = 46.0
3rd Quartile = 32.0
Median       = 26.5
1st Quartile = 25.0
Minimum      = 13.0
IQR = 7.0
1.5 IQR = 1.5 • 7.0 = 10.5
Data: 13, 24, 26, 26, 27, 28, 36, 46
[Figure: modified boxplot under construction on a vertical axis running from 12 to 48, with outliers marked *]
Q3 + 1.5 • IQR = 42.5
Q3 = 32.0
Q1 = 25.0
Q1 - 1.5 • IQR = 14.5
Note: Whiskers go to the most extreme value within the limits, not to the limits.
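The pieces of a modified boxplot (limits, whisker ends, and plotted outliers) can be computed before anything is drawn. The sketch below uses the Example 7 data and the quartiles given for it in these slides; the function name `modified_boxplot_parts` and the dictionary keys are invented for illustration.

```python
def modified_boxplot_parts(data, q1, q3):
    """Limits, whisker ends, and outliers for a modified boxplot."""
    ordered = sorted(data)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations INSIDE the limits.
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    if not inside:
        raise ValueError("no observations fall inside the outlier limits")
    # Observations beyond the limits are plotted individually.
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return {
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


example_7 = [13, 24, 26, 26, 27, 28, 36, 46]
parts = modified_boxplot_parts(example_7, q1=25.0, q3=32.0)
for name, value in parts.items():
    print(f"{name:12s} = {value}")
```

Here the whiskers end at 24 and 36, the most extreme values inside the limits 14.5 and 42.5, while 13 and 46 are plotted as individual points.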
Data: 13, 24, 26, 26, 27, 28, 36, 46
[Figure: finished box plot on a vertical axis running from 12 to 48, with an outlier marked * at each end]
Formula Sheet Example
Box Plot: drawn from Min, Q1, M, Q3, and Max.
Modified Box Plot: outlier limits at Q1 - 1.5 IQR and Q3 + 1.5 IQR.
- Lines extend to the smallest & largest obs. inside of limits.
- Plot each obs. that is beyond the “outlier limits” on each end.
Note: For this problem, no data are below the lower “outlier limit”.
Match each of the following descriptions to one of the following histograms.
1. Scores on an EASY Math exam.
2. Heights of a group of students.
3. Number of medals won by medal winning countries in the 1996 Winter Olympics.
4. SAT scores for some college students.
5. Last digit in SSN for 100 people.
[Figure: five histograms labeled A, B, C, D, and E]
Match each of the following boxplots (1, 2, 3, 4, 5) to one of the histograms (A-E) above.
BoxPlots for Schaeffer Examples
[Figure: five boxplots numbered 1 through 5]
Descriptive Statistics
Variable     N    Mean   Median   Range
A          100    50.6     51.0    20.0
B          100    49.9     50.1    42.6
C          100    49.9     50.6    12.9
D          100    54.1     32.9   415.4
E          100    50.4     49.8    32.9
[Figure: frequency histograms for variables A through E, shown alongside boxplots 1 through 5 and the Descriptive Statistics table]
[Figure: boxplots of variables A through E]