Doing Statistics for Business
Data, Inference, and Decision Making
Marilyn K. Pelosi
Theresa M. Sandifer
Chapter 5
Analyzing Bivariate Data
Chapter 5 Objectives
• Bivariate Qualitative Data:
  ◦ Contingency Tables
  ◦ Clustered & Stacked Bar Charts
• Identifying Quantitative Relationships:
  ◦ Scatter Plots
  ◦ The Least-Squares Line
A Contingency Table is a table whose rows represent the possible values of one variable and whose columns represent the possible values for a second variable. The entries in the table are the number of times that each pair of values occurs.
TRY IT NOW!
Quality Problems
Creating a Contingency Table
A company that manufactures cardboard boxes is trying to understand some of its quality problems. It has analyzed some data and determined that its major defects are Printing, Color, and Skewness (how square the box is). Further attempts to pinpoint the problems have resulted in many opinions and much finger-pointing about responsibility. The company decides to collect additional data on defect type and the shift during which production occurred.
TRY IT NOW!
Quality Problems
Creating a Contingency Table (con’t)
The data (sorted by defect type) are as follows:
Defect     Shift    Defect     Shift    Defect     Shift
Color      1        Color      2        Printing   3
Color      2        Color      1        Printing   1
Color      1        Color      1        Printing   2
Color      3        Color      3        Printing   1
Color      3        Color      1        Printing   2
Color      2        Printing   3        Printing   1
Color      1        Printing   2        Printing   2
Color      1        Printing   1        Printing   1
Color      1        Printing   2        Skewness   2
Color      2        Printing   1        Skewness   2
TRY IT NOW!
Quality Problems
Creating a Contingency Table (con’t)
Create a relative frequency contingency table for the data.
What percentage of the defects were color?
Does there appear to be any credence to the claim that the majority of the defects occur on the third shift?
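The tally asked for above can be sketched in Python. This is a minimal illustration under stated assumptions, not the chapter's method (the chapter builds the table with an Excel PivotTable): the (defect, shift) pairs are retyped from the data listed above, `collections.Counter` counts each pair, and every cell is divided by the grand total of 30 defects. The helper names `contingency_table` and `relative_frequencies` are hypothetical.

```python
from collections import Counter

SHIFTS = [1, 2, 3]
DEFECTS = ["Color", "Printing", "Skewness"]

# (defect, shift) pairs retyped from the Quality Problems data above
observations = (
    [("Color", s) for s in [1, 2, 1, 3, 3, 2, 1, 1, 1, 2, 2, 1, 1, 3, 1]]
    + [("Printing", s) for s in [3, 2, 1, 2, 1, 3, 1, 2, 1, 2, 1, 2, 1]]
    + [("Skewness", s) for s in [2, 2]]
)


def contingency_table(pairs):
    """Return counts[shift][defect]: how many times each (defect, shift) pair occurs."""
    counts = Counter(pairs)
    return {s: {d: counts[(d, s)] for d in DEFECTS} for s in SHIFTS}


def relative_frequencies(table):
    """Divide every cell by the grand total, so all cells together sum to 1."""
    n = sum(sum(row.values()) for row in table.values())
    return {s: {d: c / n for d, c in row.items()} for s, row in table.items()}


counts = contingency_table(observations)
rel = relative_frequencies(counts)
col_totals = {d: sum(rel[s][d] for s in SHIFTS) for d in DEFECTS}

# One row per shift plus row and column totals, printed as percentages
print(f"{'Shift':<7}" + "".join(f"{d:>10}" for d in DEFECTS) + f"{'Total':>8}")
for s in SHIFTS:
    cells = "".join(f"{rel[s][d]:>10.0%}" for d in DEFECTS)
    print(f"{s:<7}" + cells + f"{sum(rel[s].values()):>8.0%}")
print(f"{'Total':<7}" + "".join(f"{col_totals[d]:>10.0%}" for d in DEFECTS) + f"{1.0:>8.0%}")
```

The Color column total is 15/30 = 50% of all defects, and the shift 3 row total is 5/30, about 17%; these two totals are the numbers to look at for the questions above.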
Figure 5.1 Two bar charts representing a contingency table (panels: Non-Transfer Students Taking Introductory Statistics Course; Transfer Students Taking Introductory Statistics Course; x axis: Year, F, S, J, Sr.; y axis: Percent of Students)
In a Clustered Bar Chart, the bars for one variable are grouped according to the values of the other qualitative variable.
Figure 5.2 Example of a Clustered Bar Chart (y axis: Relative Frequency; x axis: Variable 2, Categories 1 to 4; each cluster has one bar for each of Variable 1 Categories 1 to 3)
TRY IT NOW!
Quality Problems
Creating a Clustered Bar Chart
The management of the company that manufactures cardboard boxes decides to hold a meeting with the crews about the quality problems. The managers decide that a graphical display of the data is the best way to convey the information they have obtained, and they want to display the crew data for each of the quality defects.
TRY IT NOW!
Quality Problems
Creating a Clustered Bar Chart (con’t)
                  Defect
Shift    Color    Printing    Skewness    Total
1        27%      20%         0%          47%
2        13%      17%         7%          37%
3        10%      7%          0%          17%
Total    50%      43%         7%          100%
Create a clustered bar chart that displays the percentage of defects for each shift, using the defect types as the categories for the x axis.
TRY IT NOW!
Quality Problems
Creating a Clustered Bar Chart (con’t)
Which shift contributes the largest percentage of color defects? The smallest percentage?
What do you notice about the skewness defects?
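Excel's Chart Wizard draws the clustered chart itself; the Python sketch below only shows how the clusters are organized. To stay dependency-free it prints text bars instead of using a plotting library (an assumption of this sketch, not a method from the chapter). The defect types are the x-axis categories, each cluster holds one bar per shift, and each bar is that cell's share of all defects, computed from the counts by shift and defect type in the Quality Problems data. The function name `clustered_bars` is hypothetical.

```python
# Counts from the Quality Problems data: COUNTS[shift][defect]
COUNTS = {
    1: {"Color": 8, "Printing": 6, "Skewness": 0},
    2: {"Color": 4, "Printing": 5, "Skewness": 2},
    3: {"Color": 3, "Printing": 2, "Skewness": 0},
}
DEFECTS = ["Color", "Printing", "Skewness"]


def clustered_bars(counts, width=40):
    """Return text lines: one cluster per defect type, one bar per shift.

    Each bar's length is proportional to that cell's share of ALL defects,
    scaled so that 100% would be `width` characters long.
    """
    total = sum(sum(row.values()) for row in counts.values())
    lines = []
    for defect in DEFECTS:  # defect types are the x-axis categories
        lines.append(defect)
        for shift, row in counts.items():  # one bar per shift inside the cluster
            share = row[defect] / total
            lines.append(f"  Shift {shift} {'#' * round(share * width)} {share:.0%}")
    return lines


print("\n".join(clustered_bars(COUNTS)))
```

Swapping the two loops would cluster by shift instead, with one bar per defect type; deciding which variable supplies the x-axis categories is the main design choice for a clustered chart.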
In a Stacked Bar Chart, the data for the selected variable are represented as a percentage of the total for each category of the second variable. Each value of the selected variable is shown in a different color or pattern, and the bars are “stacked” to total 100%.
Figure 5.3 Stacked Bar Chart for Bivariate Data (y axis: Percent, 0% to 100%; x axis: Variable 2, Categories 1 to 4; each bar is divided among Variable 1 Categories 1 to 3)
TRY IT NOW!
Quality Problems
Creating a Stacked Bar Chart
The company that manufactures cardboard boxes decides to create a graphical display of the data to show to the employees. The company wants to display each type of defect as a percentage of the total defects for each shift, in the hope that these data will help each shift concentrate on its own priorities. The contingency table of defect type by shift follows.
TRY IT NOW!
Quality Problems
Creating a Stacked Bar Chart (con’t)
                  Defect
Shift    Color    Printing    Skewness    Total
1        8        6           0           14
2        4        5           2           11
3        3        2           0           5
Total    15       13          2           30
Modify the contingency table to display defect types as a percentage of
total defects for each shift.
TRY IT NOW!
Quality Problems
Creating a Stacked Bar Chart (con’t)
Create a stacked bar chart for the data.
Which defect type should each crew concentrate on? Why?
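The modification asked for above divides each cell by its own shift's total rather than by the grand total, so every shift's row sums to 100%. The Python sketch below illustrates that step and draws each 100% bar as a row of letters (C, P, S), a text stand-in for Excel's 100% Stacked Column chart assumed here only to keep the example dependency-free. The names `row_percentages`, `stacked_bar`, and `FILL` are hypothetical.

```python
# Counts from the contingency table above: COUNTS[shift][defect]
COUNTS = {
    1: {"Color": 8, "Printing": 6, "Skewness": 0},
    2: {"Color": 4, "Printing": 5, "Skewness": 2},
    3: {"Color": 3, "Printing": 2, "Skewness": 0},
}
DEFECTS = ["Color", "Printing", "Skewness"]
FILL = {"Color": "C", "Printing": "P", "Skewness": "S"}  # one character per defect type


def row_percentages(counts):
    """Express each defect as a share of that shift's own total, so each row sums to 1."""
    shares = {}
    for shift, row in counts.items():
        shift_total = sum(row.values())
        shares[shift] = {d: row[d] / shift_total for d in DEFECTS}
    return shares


def stacked_bar(row_shares, width=50):
    """Draw one 100% bar; segment lengths are rounded, so the total can be off by a character."""
    return "".join(FILL[d] * round(row_shares[d] * width) for d in DEFECTS)


shares = row_percentages(COUNTS)
for shift, row in shares.items():
    labels = ", ".join(f"{d} {row[d]:.0%}" for d in DEFECTS)
    print(f"Shift {shift} |{stacked_bar(row)}| {labels}")
```

Because every bar is scaled to its own shift's total, the chart compares the mix of defects across shifts, not how many defects each shift produced; shift 3's bar is as long as shift 1's even though it has far fewer defects.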
In a Scatter Plot, each variable is represented by one axis, and the data are plotted as points on the graph. Typically, the independent variable is plotted on the x axis and the dependent variable on the y axis.
Figure 5.4 Scatter Plot (x axis: X Variable; y axis: Y Variable)
TRY IT NOW!
Airline Fares
Creating a Scatter Plot
A travel agency was interested in knowing how airline fares were related to the length of the flight in miles. The agency hypothesized that the longer the flight, the higher the airfare. The following data were collected:

Miles    Airfare ($)
2375     430
1400     272
1250     252
2325     422
985      207
2025
TRY IT NOW!
Airline Fares
Creating a Scatter Plot (con’t)
Which variable is the dependent variable and which is the independent
variable?
Make a scatter plot of the data.
From your plot, do you think that the travel agency’s hypothesis was
correct? Why or why not?
Figure 5.6 Scatter Plot Indicating No Relationship Between Two Variables (x axis: X; y axis: Y)
TRY IT NOW!
Starting Salaries & Math Courses
Finding the Least-Squares Line
The Career Planning office of a large university is interested in knowing whether there is a relationship between the starting salary of graduates and the number of mathematics courses the graduates had taken as students. The Career Planning office goes through the records for the last year and finds the following data.
TRY IT NOW!
Starting Salaries & Math Courses
Finding the Least-Squares Line (con’t)
Number of Math Courses    Starting Salary ($)
1                         26,284
1                         25,470
2                         26,777
3                         27,269
4                         28,553
6                         30,054
Which variable is the independent variable? The dependent variable?
TRY IT NOW!
Starting Salaries & Math Courses
Finding the Least-Squares Line (con’t)
Use a grid to create a scatter plot of the data.
Do you think that there is a linear relationship between the number of
math courses taken and the starting salary? If so, describe the
relationship.
Discovery Exercise 5.1
Discovering Relationships
1. The data below give the number of weeks that each student was enrolled in a speed-reading program and the speed gain, in words per minute, that the student experienced:

WEEKS    SPEED GAIN (words per minute)
2        49
3        86
4        109
6        164
8        193
Discovery Exercise 5.1
Discovering Relationships (con’t)
2. Plot the data on graph paper. Be sure to label your axes and to construct the graph so that it uses most of the paper.
3. Draw a straight line through the points that you think best represents the relationship between x and y. What criteria did you use to draw the line you selected?
The Least Squares technique finds the equation of the line that minimizes the sum of the squared errors between the actual data points and the line.
Figure 5.7 Deviations Between the Data Points and the Line (a brace marks the deviation between a data point and the line)
The distance between an actual value of $y$ and the value predicted by the line, $\hat{y}$, is called the deviation or error. The deviation, $e$, is equal to $y - \hat{y}$.
TRY IT NOW!
Starting Salaries and Math Courses
Finding the Least-Squares Line
The Career Planning office at the university wants to use least squares to find the equation that relates the number of math courses taken to starting salary. The data are:

Number of Math Courses    Starting Salary ($)
1                         26,284
1                         25,470
2                         26,777
3                         27,269
4                         28,553
6                         30,054
TRY IT NOW!
Starting Salaries and Math Courses
Finding the Least-Squares Line (con’t)
(a) Find the equation for the least-squares line for the data.
(b) Explain what the least-squares line tells the Career Planning Office
about math courses and starting salary.
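Part (a) can be checked with a short Python sketch that applies the slope and intercept formulas directly to the six data points. This is an illustration, not the chapter's procedure (the chapter's Excel steps add the line as a chart trendline); the function name `least_squares_line` is hypothetical.

```python
# Career Planning data: math courses taken (x) and starting salary in dollars (y)
courses = [1, 1, 2, 3, 4, 6]
salary = [26284, 25470, 26777, 27269, 28553, 30054]


def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x that minimizes the SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares_line(courses, salary)
print(f"predicted salary = {b0:,.2f} + {b1:,.2f} * (number of math courses)")
```

For these data the sketch gives approximately 25,019.49 + 840.59 × (number of math courses) dollars: each additional math course is associated with a starting salary about $841 higher. The intercept is the line's value at zero courses, which lies outside the observed data.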
TRY IT NOW!
Starting Salaries and Math Courses
Finding the Least-Squares Line
Once the least-squares equation is obtained, the Career Planning Office wants to see whether the equation does a good job of predicting starting salary for a given number of math courses.
Plot the data and the least-squares line on the same graph.
TRY IT NOW!
Starting Salaries and Math Courses
Finding the Least-Squares Line
Use the equation you just found to predict the starting salary
of a person who takes 2 mathematics courses. Do the same thing
for the starting salary of a person who takes 5 mathematics courses.
Do you think the predictions from the least-squares line are useful? Why
or why not?
Is what you just did interpolation or extrapolation?
TRY IT NOW!
Starting Salaries and Math Courses
Finding the Least-Squares Line (con’t)
Now use the equation of the least-squares line to predict the
starting salary of a person who takes no math courses. Do the
same thing for a person who takes 10 math courses.
Do you think these predictions are valid? Why or why not?
Creating Contingency Tables in Excel
1. From the Data menu, select PivotTable and PivotChart Report.
2. Since you have already highlighted the range that contains your data, you can just proceed by clicking Next>.
3. Indicate where you want the PivotTable to appear and click the button labeled Layout.
Creating Contingency Tables in Excel (con’t)
4. Drag the field box for Rank to the area marked Row and the field box for School to the area marked Column. Then drag the field box for either Rank or School into the area marked Data.
5. Click Next and tell Excel where you want the table to appear: either in the current worksheet or in a new worksheet.
6. Finally, click Finish.
Figure 5.10 Contingency Table of Rank and School
The basics of creating a chart in Excel using the Chart Wizard:
1. Highlight the data (the frequency table) that you want to graph.
2. Invoke the Chart Wizard by clicking its icon on the toolbar.
3. Follow the directions and hints from the Chart Wizard.
4. Edit the graph to include any other features or changes you want.
Creating Clustered Bar Charts in Excel
1. Copy the pivot table to another location in the worksheet using the Paste Special command, so that only its values are copied. To do this, click on the lower right-hand corner of the table and highlight it. Select Copy from the Edit menu. Then click on the location for the copy and select Paste Special. In the Paste section, click the radio button for Values, and in the Operation section, select None. Click OK.
Creating Clustered Bar Charts in Excel (con’t)
2. Highlight the copy of the table, being careful not to highlight the Grand Total row or column. Then start the Chart Wizard and select Column as the chart type and Clustered Column as the sub-type.
3. Click Next twice to get to the Options step; add titles and make any desired formatting changes.
4. Click Next to indicate where you want the chart to be located, and then click Finish.
Figure 5.14 Clustered Bar Chart
Creating Stacked Bar Charts in Excel
1. Highlight the rows of the pivot table, being careful not to include the Grand Total row and column, and start the Chart Wizard.
2. Select Column as the chart type and 100% Stacked Column as the sub-type.
3. Click Next twice to get to the Options step; put in titles and any other formatting changes that you want.
4. Click Next, enter the location for the chart, and click Finish to display the chart.
Figure 5.17 Stacked Bar Chart
Analyzing Quantitative Bivariate Data in Excel
1. Highlight the range of the data and start the Chart Wizard. Select XY (Scatter) as the chart type and Scatter (no lines) as the sub-type.
2. Select Next> twice to get to the Chart Options dialog box; put in titles and any other formatting changes that you want.
3. Select Next> to tell Excel where to locate the chart, and click Finish to display the chart.
Figure 5.20 Finished Scatter Plot
Adding the Least-Squares Line to Your Scatter Plot
1. Click on any one of the points in the scatter plot to highlight them all.
2. From the Chart menu, select Add Trendline. The Add Trendline dialog box will open.
3. In the Trend/Regression section, highlight Linear.
4. Click on the Options tab and click the box labeled Display equation on chart.
5. Click OK.
Figure 5.24 Scatter Plot with Least-Squares Line
Chapter 5 Summary
In this chapter you have learned:
• Some methods for looking at how two variables are related.
• Graphical methods, such as stacked bar charts, clustered bar charts, and scatter plots.
• Quantitative methods, such as least-squares analysis.