Chapter 2c: Stem-and-leaf displays, crosstabulations, and scatter diagrams

• Exploratory data analysis
• Crosstabulations and scatter diagrams
Exploratory data analysis consists of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly.
The Stem and Leaf Display
• A stem-and-leaf display shows both the rank order and the shape of the distribution of the data.
• It is similar to a histogram turned on its side, but it has the advantage of showing the actual data values.
• The leading digits of each data item (the stem) are arranged to the left of a vertical line.
• To the right of the vertical line, we record the last digit of each item (the leaf) in rank order.
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
Stretched Stem and Leaf
• If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).
• Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.
Sample parts cost ($) for 50 tune-ups
 91   71  104   85   62   78   69   74   97   82
 93   72   62   88   98   57   89   68   68  101
 75   66   97   83   79   52   75  105   68  105
 99   79   77   71   79   80   75   65   69   69
 97   72   80   67   62   62   76  109   74   73
A Stem and Leaf Display for the Auto Parts Cost data
Stem | Leaf
   5 | 2 7
   6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
   7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
   8 | 0 0 2 3 5 8 9
   9 | 1 3 7 7 7 8 9
  10 | 1 4 5 5 9
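The construction rule described earlier (leading digits as the stem, last digit as the leaf, leaves in rank order) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides; the function name `stem_and_leaf` is made up for this example, and it assumes whole-dollar values with a leaf unit of 1.

```python
from collections import defaultdict

# Parts costs (dollars) for the 50 Hudson Auto tune-ups, in the order listed.
costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

def stem_and_leaf(values):
    """Return {stem: [leaves in rank order]} for a leaf unit of 1."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting puts the leaves in rank order
        stem, leaf = divmod(value, 10)    # leading digit(s) and last digit
        rows[stem].append(leaf)
    return dict(rows)

display = stem_and_leaf(costs)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Counting the leaves in each row is a quick check that every one of the 50 observations appears in the display.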
Stretched Stem and Leaf for Hudson Auto parts data
   5 | 2
   5 | 7
   6 | 2 2 2 2
   6 | 5 6 7 8 8 8 9 9 9
   7 | 1 1 2 2 3 4 4
   7 | 5 5 5 6 7 8 9 9 9
   8 | 0 0 2 3
   8 | 5 8 9
   9 | 1 3
   9 | 7 7 7 8 9
  10 | 1 4
  10 | 5 5 9
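The stretching rule (leaves 0-4 on the first copy of a stem, leaves 5-9 on the second) can be sketched as follows. This is an illustrative example only; `stretched_stem_and_leaf` is a hypothetical helper name, and the sketch lists only the stem halves that contain at least one value.

```python
# Parts costs (dollars) for the 50 Hudson Auto tune-ups, in the order listed.
costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

def stretched_stem_and_leaf(values):
    """Return a list of (stem, leaves) rows, with up to two rows per stem."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1      # 0 = leaves 0-4, 1 = leaves 5-9
        rows.setdefault((stem, half), []).append(leaf)
    # Sorting the (stem, half) keys keeps the lower half above the upper half.
    return [(stem, leaves) for (stem, _), leaves in sorted(rows.items())]

stretched = stretched_stem_and_leaf(costs)
for stem, leaves in stretched:
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```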
Leaf Units
A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. But it does not have to be 1. The leaf unit can be 0.1, 10, or 100.
Example: Leaf Unit = 0.1
Suppose we have the following data:
8.6  11.7  9.4  10.2  11.0  8.8
The leaf unit is 0.1. Thus:
   8 | 6 8
   9 | 4
  10 | 2
  11 | 0 7
Example: Leaf Unit = 10
If we have data with values such as
1806  1717  1974  1791  1682  1910  1838
a stem-and-leaf display of these data will be
Leaf Unit = 10
  16 | 8
  17 | 1 9
  18 | 0 3
  19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
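Both leaf-unit examples follow the same recipe: divide each value by the leaf unit, drop anything left over, and then split off the last digit as the leaf. A minimal sketch, assuming non-negative data; the helper name `stem_and_leaf_with_unit` is hypothetical, and `Decimal` is used only so that a value such as 8.6 divided by 0.1 comes out as exactly 86.

```python
from decimal import Decimal

def stem_and_leaf_with_unit(values, leaf_unit):
    """Return {stem: [leaves in rank order]} for the given leaf unit."""
    rows = {}
    for value in sorted(values):
        # Express the value in leaf units and discard the remainder,
        # e.g. 1682 with leaf unit 10 becomes 168 (the 82 is cut to 80).
        scaled = int(Decimal(str(value)) // Decimal(str(leaf_unit)))
        stem, leaf = divmod(scaled, 10)
        rows.setdefault(stem, []).append(leaf)
    return rows

tenths = stem_and_leaf_with_unit([8.6, 11.7, 9.4, 10.2, 11.0, 8.8], 0.1)
tens = stem_and_leaf_with_unit([1806, 1717, 1974, 1791, 1682, 1910, 1838], 10)

for rows in (tenths, tens):
    for stem in sorted(rows):
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in rows[stem])}")
    print()
```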
Crosstabulations and Scatter Diagrams
So far we have considered only ONE variable (parts cost, audit time). But often we are interested in tabular and graphical methods that uncover the relationship between TWO variables.
Crosstabulations
A crosstabulation is a tabular method for summarizing the data for two variables simultaneously.
Crosstabulations can be used when
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.
Example: Finger Lakes Homes
Crosstabulation
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Home style is the qualitative variable; price range is the quantitative variable.

                          Home Style
Price Range   Colonial    Log    Split   A-Frame   Total
< $99,000           18      6       19        12      55
> $99,000           12     14       16         3      45
Total               30     20       35        15     100
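The margins of a crosstabulation are just sums of its cells, which gives a quick consistency check. The sketch below is illustrative only: the cell counts are read from the Finger Lakes table, and the variable names are chosen for this example.

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
# Each list follows the column order in `styles`.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row totals: homes sold in each price range.
row_totals = {price: sum(row) for price, row in counts.items()}

# Column totals: homes sold of each style.
column_totals = {
    style: sum(row[i] for row in counts.values())
    for i, style in enumerate(styles)
}

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```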
Crosstabulation: Row or Column Percentages
• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
Crosstabulation: Row Percentages

                          Home Style
Price Range   Colonial    Log    Split   A-Frame   Total
< $99,000        32.73  10.91    34.55     21.82     100
> $99,000        26.67  31.11    35.56      6.67     100

Note: row totals are actually 100.01 due to rounding.
Example: (Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67
Crosstabulation: Column Percentages

                          Home Style
Price Range   Colonial    Log    Split   A-Frame
< $99,000        60.00  30.00    54.29     80.00
> $99,000        40.00  70.00    45.71     20.00
Total              100    100      100       100

Example: (Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
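Row and column percentages both come from the same cell counts; only the divisor changes (the row total or the column total). A small sketch of both calculations, with hypothetical function names and percentages rounded to two decimals as on the slides:

```python
# Cell counts from the Finger Lakes Homes crosstabulation
# (columns: Colonial, Log, Split, A-Frame).
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

def row_percentages(table):
    """Each cell as a percentage of its row total."""
    return {
        label: [round(100 * cell / sum(row), 2) for cell in row]
        for label, row in table.items()
    }

def column_percentages(table):
    """Each cell as a percentage of its column total."""
    column_totals = [sum(column) for column in zip(*table.values())]
    return {
        label: [round(100 * cell / total, 2)
                for cell, total in zip(row, column_totals)]
        for label, row in table.items()
    }

row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
print(row_pct["> $99,000"][0])   # Colonial share of homes over $99K: 12/45
print(col_pct["> $99,000"][0])   # share of Colonials over $99K: 12/30
```

The two printed values answer different questions about the same cell, which is why the choice of divisor needs to be stated whenever percentages are reported.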
Using Excel’s PivotTable Report to Construct a Crosstabulation
(Chapter 2 file: Restaurant.xlsx)
Step 1: Click on the Insert tab on the ribbon
Step 2: In the Tables group, click the icon above PivotTable
Step 3: When the Create PivotTable dialog box appears:
    Choose Select a table or range
    Enter A1:C301 in the Table/Range box
    Select New Worksheet
    Click OK
Using the PivotTable Field List
• Step 1: In the PivotTable Field List, go to
Choose Fields to add to report:
– Drag the Quality Rating Field to the Row Labels area.
– Drag the ($)Meal Price field to the Column Labels area.
– Drag the Restaurant field to the Values area.
• Step 2: Click Sum of Restaurant in the Values area
– Select Value Field Settings.
• Step 3: When the Value Field Settings dialog box appears:
– Under Summarize value field by, choose Count
– Click OK
Finalizing the PivotTable Report
• Step 1: Right-click in cell B4 (or any other cell containing meal prices)
– Select Group
• Step 2: When the Grouping dialog box appears:
– Enter 10 in the Starting at box
– Enter 49 in the Ending at box
– Enter 10 in the By box
• Step 3: Right-click on Excellent in cell A5
– Choose Move
– Select Move “Excellent” to End
• Step 4: Close the PivotTable Field List box
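For readers working outside Excel, the same two operations (group meal prices into classes of width 10 starting at 10, then count restaurants by rating and price class) can be sketched in Python. The records below are hypothetical and invented purely for illustration; the actual 300 rows are in Restaurant.xlsx and are not reproduced here. The helper name `price_group` is also made up for this sketch.

```python
from collections import Counter

# HYPOTHETICAL (quality rating, meal price) records, for illustration only.
records = [
    ("Good", 18),
    ("Very Good", 22),
    ("Excellent", 47),
    ("Good", 12),
    ("Very Good", 33),
]

def price_group(price, start=10, width=10):
    """Label the class containing `price`, e.g. 18 -> '10-19'."""
    low = start + ((price - start) // width) * width
    return f"{low}-{low + width - 1}"

# Count restaurants for each (rating, price class) pair,
# which is what the Count setting does in the Values area.
crosstab = Counter((rating, price_group(price)) for rating, price in records)

for (rating, group), count in sorted(crosstab.items()):
    print(f"{rating:<10} {group}  {count}")
```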
Crosstabulation for the LA Restaurant Example

                           Meal Price ($)
Quality Rating   10-19   20-29   30-39   40-49   Grand Total
Good                42      40       2       0            84
Very Good           34      64      46       6           150
Excellent            2      14      28      22            66
Grand Total         78     118      76      28           300
Crosstabulation: Simpson’s Paradox
• Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
• We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
• Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
                      Judge
Verdict      Kendall       Luckett       Total
Upheld       129 (86%)     110 (88%)       239
Reversed      21 (14%)      15 (12%)        36
Total (%)    150 (100%)    125 (100%)      275
You might think Luckett is the better judge, because a larger share of Luckett’s verdicts were upheld (88% versus 86%). However, a larger share of Kendall’s cases were in municipal court, where the likelihood of being overturned on appeal is higher.
Scatter Diagram and Trendline
• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the overall relationship between the variables.
• A trendline is an approximation of the relationship.
[Figure: A Positive Relationship (y plotted against x)]
[Figure: A Negative Relationship (y plotted against x)]
[Figure: No Apparent Relationship (y plotted against x)]
Example: Panthers Football Team
• Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions    y = Number of Points Scored
            1                              14
            3                              24
            2                              18
            1                              17
            3                              30
[Figure: Scatter Diagram. Horizontal axis: Number of Interceptions (x), 0 to 4. Vertical axis: Number of Points Scored (y), 0 to 35.]
Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
Using Excel’s Chart Wizard to Construct a Scatter Diagram and Trendline
Formula Worksheet (showing data entered)

     A                          B                          C
1    Number of Interceptions    Number of Points Scored
2    1                          14
3    3                          24
4    2                          18
5    1                          17
6    3                          30
7
Using Excel’s Chart Wizard to Construct a Scatter Diagram
Step 1: Select cells A1:B6
Step 2: Click the Chart Wizard button on the standard toolbar
Step 3: When the Chart Wizard - Step 1 of 4 - Chart Type dialog box appears:
    Choose XY (Scatter) in the Chart Type list
    Choose Scatter from the Chart subtype display
    Click Next >
Step 4: When the Chart Wizard - Step 2 of 4 - Chart Source Data dialog box appears:
    Click Next >
Step 5: When the Chart Wizard - Step 3 of 4 - Chart Options dialog box appears:
    Select the Titles tab and then
        Type Scatter Diagram for the Panthers in the Chart title: box
        Type Number of Interceptions in the Value (X) axis: box
        Type Number of Points Scored in the Value (Y) axis: box
    Select the Legend tab and then
        Remove the check in the Show Legend box
    Click Next >
Step 6: When the Chart Wizard - Step 4 of 4 - Chart Location dialog box appears:
    Specify a location for the new chart
    Click Finish
[Figure: Excel worksheet containing the chart "Scatter Diagram for the Panthers". Horizontal axis: Number of Interceptions, 0 to 4. Vertical axis: Number of Points Scored, 0 to 35.]
Adding a Trendline
Step 1: Position the mouse pointer over any data point in the scatter diagram and right-click
Step 2: Choose the Add Trendline option
Step 3: When the Add Trendline dialog box appears:
    Select the Type tab and then
        Choose Linear from the Trend/Regression type display
    Click OK
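Excel’s linear trendline is a least-squares line. For readers who want to see the arithmetic behind it, the sketch below fits that line to the five Panthers observations by hand. The function name `linear_trendline` is chosen for this example; the slope and intercept formulas are the standard least-squares ones rather than anything stated on the slides.

```python
# Panthers data: interceptions (x) and points scored (y).
interceptions = [1, 3, 2, 1, 3]
points = [14, 24, 18, 17, 30]

def linear_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = linear_trendline(interceptions, points)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

A positive slope agrees with the positive relationship seen in the scatter diagram. With only five games, the line is a rough summary and not a reliable prediction rule.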
[Figure: Excel worksheet containing the chart "Scatter Diagram for the Panthers" with a linear trendline added. Horizontal axis: Number of Interceptions, 0 to 4. Vertical axis: Number of Points Scored, 0 to 35.]
Scatter Diagram for the Stereo and Sound Equipment Store Example
[Figure: Scatter Diagram for Stereo and Sound Equipment Store. Horizontal axis: Commercials, 0 to 6. Vertical axis: Sales Volume, 0 to 70.]
Scatter Diagram for the Stereo and Sound Equipment Store Example, with a Trendline
[Figure: Scatter Diagram for Stereo and Sound Equipment Store, with a trendline. Horizontal axis: Commercials, 0 to 6. Vertical axis: Sales Volume, 0 to 70.]