Presenting Data in Tables & Charts

Organizing Numerical Data
Data with 20 or more observations should be organized.
The Ordered Array: arranges raw data in order from the smallest observation to the largest observation.
Raw Data Arranged in an Ordered Array
Raw data, 5. Auto Cost ($):
18000 18000 21000 1000 24000 15000 30000 22000 12000 30000 17000 2300
27000 5000 18000 18000 46000 20000 26500 21000
Ordered array, 5. Auto Cost ($):
1000 2000 2300 5000 6000 9000 10000 12000 12000 14000 15000 16000 17000 18000
The Ordered Array makes it easy to identify:
• extreme values
• typical values
• range where the majority of values are concentrated
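As a small illustration of the ordered array, the sketch below sorts the first twelve raw auto costs listed above and reads off the extremes and the range. The variable names are chosen here for illustration only.

```python
# First twelve raw auto costs ($) from the raw-data column above.
raw = [18000, 18000, 21000, 1000, 24000, 15000,
       30000, 22000, 12000, 30000, 17000, 2300]

# The ordered array: smallest observation first, largest last.
ordered = sorted(raw)

low, high = ordered[0], ordered[-1]   # extreme values
value_range = high - low              # spread of the data

print(ordered)
print("smallest:", low, "largest:", high, "range:", value_range)
```

Once the values are sorted, the extremes sit at the two ends and repeated or closely spaced values show where the data are concentrated.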
Stem and Leaf Display: shows where raw data clusters over a range of observations.
EXAMPLE: the following data represent the weekly salary checks earned by a sample of eight secretaries:
$555 $490 $648 $832
$710 $590 $576 $623
First, put the values in ascending order. Then use the 100s column as the stems and the 10s column as the leaves. Either ignore the units column, or round each value to the nearest 10 first and then use the 10s column as the leaves.
$555 $490 $648 $832 $710 $590 $576 $623
4 | 9
5 | 579
6 | 24
7 | 1
8 | 3
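The construction can be sketched in a few lines of Python. This is a minimal sketch that assumes the "ignore the units column" option (truncation rather than rounding); the function name stem_and_leaf and its parameters are made up for this illustration.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=100, leaf_unit=10):
    """Return {stem: [leaves]}; digits below leaf_unit are ignored."""
    rows = defaultdict(list)
    for v in sorted(values):                 # ascending order first
        stem = v // stem_unit                # 100s column
        leaf = (v % stem_unit) // leaf_unit  # 10s column, units dropped
        rows[stem].append(leaf)
    return dict(rows)

# Weekly salary checks for the eight secretaries in the example.
salaries = [555, 490, 648, 832, 710, 590, 576, 623]
display = stem_and_leaf(salaries)

for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(d) for d in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Looping over every stem between the smallest and largest keeps empty stems visible, which matters when the display is read as a picture of the distribution.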
To further illustrate how we can organize data to present, analyze, and interpret findings, we will study data from a previous QBA questionnaire:
1) USD students’ auto costs
2) USD students’ maximum auto speeds
Raw Data from student questionnaire (partial)
1. Age | 2. Gender (0=M, 1=F) | 3. Live Campus (0=Off, 1=On) | 4. Study / Week (hrs) | 5. Auto Cost ($) | 6. Alch bev / wk (#) | 7. Sodas / wk (#) | 8. Hrs. Paid / wk (hrs) | 9. No. units this sem (#) | 10. TV / video game / wk (hrs)
19 | 1 | 0 | 15 | 18000 | 4 | 5 | 13.5 | 14 | 2
18 | 1 | 1 | 33 | 18000 | 0 | 6.5 | 14 | 16 | 4
24 | 0 | 1 | 12 | 21000 | 10 | 3 | 4 | 23 | 4
20 | 1 | 1 | 12.5 | 1000 | 1 | 12.5 | 12 | 16.5 | 10
19 | 1 | 0 | 12.5 | 24000 | 2 | 0 | 20 | 17 | 5
20 | 0 | 0 | 10 | 15000 | 50 | 1 | 0 | 13 | 3
19 | 1 | 1 | 5 | (blank) | 21 | 0 | 18 | 16 | 14
19 | 1 | 1 | 20 | (blank) | 0 | 2 | 0 | 17 | 2
19 | 0 | 1 | 15 | 30000 | 2 | 13 | 0 | 16 | 20
21 | 0 | 1 | 10 | 22000 | 20 | 0 | 5 | 16 | 15
RAW, 5. Auto Cost ($): 18000 18000 21000 1000 24000 15000 30000
ARRAY, 5. Auto Cost ($): 1000 2000 2300 5000 6000 9000 10000 12000 12000
Stem & Leaf Auto Costs
Stem unit: 10000
0 | 122569
1 | 0224567888
2 | 0124577
3 | 00
4 | 56
5 |
6 | 06
Stem & Leaf MPH
Stem unit: 10
Stems: 7 8 9 10 11 12 13 14 15 16 17 18
Leaves: 0, 0, 005, 0000000558, 0, 0, 0, 0, 0, 0, 00, 05, 5, 0027, 0
And just for fun, let’s look at GPA
GPA
Stem unit: 1
2 | 4556889
3 | 00001122333334445567889
4 | 0
How Else Can We Organize
our Data?
Numerical Data
• Frequency Distribution
• Relative Frequency Distribution
• Percentage Frequency Distribution
• Cumulative Frequency Distribution
Frequency Distribution
Freq. Dist. MPH (22. fast car (mph))
interval | Frequency | Percentage
70 | 1 | 3.2%
80 | 1 | 3.2%
90 | 2 | 6.5%
100 | 8 | 25.8%
110 | 3 | 9.7%
120 | 3 | 9.7%
130 | 2 | 6.5%
140 | 2 | 6.5%
150 | 4 | 12.9%
160 | 4 | 12.9%
170 | 0 | 0.0%
180 | 1 | 3.2%
Freq. Dist. MPH (22. fast car (mph))
interval | Frequency | Percentage
60 | 0 | 0.0%
80 | 2 | 6.5%
100 | 10 | 32.3%
120 | 6 | 19.4%
140 | 4 | 12.9%
160 | 8 | 25.8%
180 | 1 | 3.2%
[Figure: Histogram MPH, fast car (mph), plotted at class midpoints]
Frequency Distribution
Frequency Distribution for Numerical Data (5. Auto Cost ($))
interval | Frequency | Percentage
0 | |
10000 | 7 | 24.14%
20000 | 10 | 34.48%
30000 | 8 | 27.59%
40000 | 0 | 0.00%
50000 | 2 | 6.90%
60000 | 1 | 3.45%
70000 | 1 | 3.45%
Selecting the Number of Classes
• There is no “correct” number of classes (K) to use in a frequency distribution.
• However, the frequency distribution should have at least 5 classes, but no more than 20.
Caution!
• If you have too “FEW” classes (K), a large portion of your data lies in one class.
• However, if there are a number of empty classes, or too many classes with a frequency of 1 or 2, this may indicate too “MANY” classes (K).
Approximate Number of Classes in Frequency Distribution
# Observations | # Classes
Less than 50 | 5–7
50 – 200 | 7–9
200 – 500 | 9–10
500 – 1,000 | 10–11
1,000 – 5,000 | 11–13
5,000 – 50,000 | 13–17
More than 50,000 | 17–20
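The guideline table can be written as a small lookup. This is only a sketch: the function name suggested_classes is hypothetical, and because adjacent bands in the table share an endpoint (for example 200 appears in two rows), the code assumes a shared boundary belongs to the larger band, except 50,000, which the table places in the 13–17 band.

```python
def suggested_classes(n):
    """Return (fewest, most) classes suggested by the table for n observations."""
    if n < 50:
        return (5, 7)
    bands = [(200, (7, 9)), (500, (9, 10)), (1_000, (10, 11)), (5_000, (11, 13))]
    for upper, classes in bands:
        if n < upper:          # assumption: a shared boundary goes to the larger band
            return classes
    if n <= 50_000:            # "More than 50,000" starts strictly above 50,000
        return (13, 17)
    return (17, 20)

print(suggested_classes(27))   # a small sample such as the 27 questionnaire rows
```

The result is a starting range, not a rule; the cautions about too few or too many classes still apply.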
What do you gain by organizing your data in a Frequency Distribution?
Hint! From pages of raw data
Answer
• Reduce large numbers of data points to a workable number of classes and frequencies.
• Study the frequency distribution and learn a great deal about the shape of the data set.
Constructing a Frequency Distribution
• Gather the sample data
• Arrange data in an Ordered Array
• Select the number of classes to be used
• Determine class width: range / # of classes
• Determine the class limits for each class so that the distribution is easy to interpret
• Count the number of data values in each class (the raw frequencies)
• Determine the Relative Frequencies
Relative Frequency = (Raw frequency count in each class) / (Total number of observations (n))
Relative Frequency is essential for comparing two data sets, particularly when they contain different numbers of observations.
To Convert Relative Frequency to Percent Frequency:
Multiply Relative Frequency × 100
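The construction steps can be sketched end to end in Python. The example below uses the 27 mph values from the questionnaire data that appear later in these notes. The choices of 6 classes, a width of 20, and a starting limit of 70 are illustrative assumptions, not values taken from the slides.

```python
# Maximum speeds (mph) for the 27 students in the questionnaire data.
mph = [120, 70, 145, 160, 150, 100, 105, 100, 150, 135, 152, 160, 150, 100,
       105, 90, 100, 180, 120, 157, 140, 120, 100, 130, 108, 130, 100]

ordered = sorted(mph)                         # ordered array
k = 6                                         # number of classes (assumed)
raw_width = (ordered[-1] - ordered[0]) / k    # range / # of classes, about 18.3
width = 20                                    # rounded up to an easy-to-read width (assumed)
start = 70                                    # lower limit of the first class (assumed)

# Each class holds values with lower <= x < upper.
classes = [(start + i * width, start + (i + 1) * width) for i in range(k)]
counts = [sum(lo <= x < hi for x in ordered) for lo, hi in classes]

n = len(ordered)
relative = [c / n for c in counts]            # raw frequency / n
percent = [100 * r for r in relative]         # relative frequency x 100

for (lo, hi), c, r, p in zip(classes, counts, relative, percent):
    print(f"{lo:>3} to under {hi:<3}  {c:>2}  {r:.3f}  {p:.1f}%")
```

Rounding the computed width up to a convenient number is what keeps the class limits easy to interpret; the price is that the last class may extend past the largest observation.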
Example
15. A doctor's office staff has studied the waiting times for patients who arrive at the office with a request for emergency service. The following data were collected over a one-month period (the waiting times are in minutes).
2 5 10 12 4 4 5 17 11 8 9 8 12 21 6 8 7 13 18 3
Use classes of 0 - 4, 5 - 9, and so on.
a. Show the frequency distribution.
b. Show the relative frequency distribution.
c. Show the cumulative frequency distribution.
d. Show the relative cumulative frequency distribution.
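One way to work this example is sketched below. It assumes the classes continue in steps of five minutes (0 - 4, 5 - 9, 10 - 14, ...) until the largest waiting time is covered; the variable names are illustrative.

```python
from itertools import accumulate

# Waiting times in minutes, as given in the example.
waits = [2, 5, 10, 12, 4, 4, 5, 17, 11, 8, 9, 8, 12, 21, 6, 8, 7, 13, 18, 3]

width = 5                                   # classes 0-4, 5-9, ...
n_classes = max(waits) // width + 1         # enough classes to reach the maximum
freq = [0] * n_classes
for w in waits:
    freq[w // width] += 1                   # integer division picks the class

n = len(waits)
rel = [f / n for f in freq]                 # relative frequency
cum = list(accumulate(freq))                # cumulative frequency
rel_cum = [c / n for c in cum]              # relative cumulative frequency

print("class    freq  rel   cum  rel cum")
for i in range(n_classes):
    lo, hi = i * width, i * width + width - 1
    print(f"{lo:>2} - {hi:<2}  {freq[i]:>4}  {rel[i]:.2f}  {cum[i]:>3}  {rel_cum[i]:.2f}")
```

The last cumulative frequency must equal the number of observations, and the last relative cumulative frequency must equal 1.00; both are quick checks on the tally.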
How Else Can We Organize
our Data?
Graphic Techniques to
Describe Numerical Data
1) Histogram (continuous data)
2) Polygon
3) Ogive
4) Scattergram
Histogram
• Uni-modal
• Bi-modal
• Skewed:
i) right or positively skewed
ii) left or negatively skewed
Histogram Auto Costs
[Figure: Histogram of Auto Cost. Horizontal axis: Midpoints ($); vertical axis: Frequency]
Histogram MPH
[Figure: Histogram. Horizontal axis: Midpoints, 70 to 170; vertical axis: Frequency]
Negative or Left Skewed
Positive or Right Skewed
Quiz
Would incomes of employees in large firms tend to be positively or negatively skewed? Why?
Quiz
Do exam grades tend to be positively or negatively skewed? Why?
A Scatter Diagram
Graphs bivariate data to examine whether a relationship exists between two numerical variables.
Is there a relationship between the price of a USD student’s auto and the maximum MPH that student has driven?
Scatter Diagram Speed vs. Cost
[Figure: scatter diagram. Horizontal axis: ($), 0 to 80000; vertical axis: (MPH), 0 to 200]
Is there a relationship between the number of alcoholic beverages consumed per week and the number of hours studied per week?
Scattergram: Weekly Alcohol Amount vs Hours Studied
[Figure: scatter diagram. Horizontal axis: Hours, 0 to 35; vertical axis: No. of Drinks, 0 to 50]
$Wk Entertainment vs #Alcoholic Bev/wk
[Figure: scatter diagram. Horizontal axis: Alcohol Bev/wk, 0 to 60; vertical axis: $Wk Entertain, 0 to 200]
GPA vs mph
[Figure: scatter diagram. Horizontal axis: mph, 50 to 190; vertical axis: GPA, 0 to 4]
MPH vs Alcoholic Beverages/wk
[Figure: scatter diagram. Horizontal axis: MPH, 50 to 190; vertical axis: Alcoholic Beverages/wk, 0 to 60]
Alch bev / wk (#) | mph
0 | 70
0 | 80
4 | 90
21 | 95
8 | 100
4 | 100
15 | 100
1 | 100
7 | 100
0 | 100
2 | 100
6 | 105
12 | 105
3 | 108
50 | 120
0 | 120
2 | 120
0 | 130
10 | 130
2 | 135
Scatter Diagram: Entertainment Vs. Cash On Hand
[Figure: scatter diagram. Horizontal axis: Entertainment $ per Week, 0 to 200; vertical axis: Typical $ On-Hand, 0 to 120]
Hours Paid Vs. Cash on Hand ($)
[Figure: scatter diagram. Horizontal axis: Hours per week work, 0 to 50; vertical axis: Cash on Hand $, 0 to 120]
Tables & Charts for Categorical Data
1) Summary Table: similar to Frequency Distribution.
2) Contingency Table for Crosstabulation of Bivariate Categorical Data.
3) Bar Chart: graphical representation of frequency of occurrence.
4) Pie Chart: graphical emphasis of proportion.
5) Pareto Diagram
6) Side-by-Side Bar Charts: for bivariate categorical data.
Summary Table
Presentation of Categorical Data
Coke Classic
Diet Coke
Pepsi-Cola
Diet Coke
Coke Classic
Coke Classic
Dr. Pepper
Diet Coke
Pepsi-Cola
Pepsi-Cola
Coke Classic
Dr. Pepper
Sprite
Coke Classic
Diet Coke
Coke Classic
Coke Classic
Sprite
Coke Classic
Diet Coke
Coke Classic
Diet Coke
Coke Classic
Sprite
Pepsi-Cola
Coke Classic
Coke Classic
Coke Classic
Pepsi-Cola
Coke Classic
Sprite
Dr. Pepper
Pepsi-Cola
Diet Coke
Pepsi-Cola
Coke Classic
Coke Classic
Coke Classic
Pepsi-Cola
Dr. Pepper
Coke Classic
Diet Coke
Pepsi-Cola
Pepsi-Cola
Pepsi-Cola
Pepsi-Cola
Coke Classic
Dr. Pepper
Pepsi-Cola
Sprite
Show:
a. Freq distribution
b. Relative Freq
c. Percent Freq
d. Bar graph
e. Pareto diagram
f. Pie chart.
Soft drink | Freq. | Relative Freq
Coke Classic | 19 | .38 or 38%
Diet Coke | 8 | .16 or 16%
Dr. Pepper | 5 | .10 or 10%
Pepsi-Cola | 13 | .26 or 26%
Sprite | 5 | .10 or 10%
Total | 50 | 1.00 or 100%
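A summary table like this one can be produced by counting category labels. In the sketch below the list of purchases is rebuilt from the tallies in the summary table rather than retyped in its original order, which does not affect the counts; the variable names are illustrative.

```python
from collections import Counter

# 50 soft drink purchases, rebuilt from the tallies in the summary table.
purchases = (["Coke Classic"] * 19 + ["Diet Coke"] * 8 + ["Dr. Pepper"] * 5
             + ["Pepsi-Cola"] * 13 + ["Sprite"] * 5)

freq = Counter(purchases)                   # frequency of each category
n = sum(freq.values())

# drink -> (frequency, relative frequency, percent frequency)
summary = {drink: (count, count / n, 100 * count / n)
           for drink, count in sorted(freq.items())}

for drink, (count, rel, pct) in summary.items():
    print(f"{drink:<13}{count:>3}  {rel:.2f}  {pct:.0f}%")
print(f"{'Total':<13}{n:>3}")
```

The same three columns feed the bar chart (frequencies), the pie chart (relative frequencies), and the Pareto diagram (frequencies in descending order).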
Contingency Table for Crosstabulation of Bivariate Categorical Data
CROSSTABULATION OF QUALITY RATING AND MEAL PRICE FOR 300 LOS ANGELES RESTAURANTS
Meal Price
Quality Rating | $10 – 19 | $20 – 29 | $30 – 39 | $40 – 49 | Total
Good | 42 | 40 | 2 | 0 | 84
Very Good | 34 | 64 | 46 | 6 | 150
Excellent | 2 | 14 | 28 | 22 | 66
Total | 78 | 118 | 76 | 28 | 300
ROW PERCENTAGES FOR EACH QUALITY RATING CATEGORY
Meal Price
Quality Rating | $10 – 19 | $20 – 29 | $30 – 39 | $40 – 49 | Total
Good | 50.0 | 47.6 | 2.4 | 0.0 | 100
Very Good | 22.7 | 42.7 | 30.6 | 4.0 | 100
Excellent | 3.0 | 21.2 | 42.4 | 33.4 | 100
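Row percentages divide each cell by its row total. The sketch below recomputes them from the restaurant counts; the dictionary layout is an assumption made for illustration. Plain rounding to one decimal gives 30.7 and 33.3 where the table shows 30.6 and 33.4, which suggests the printed values were adjusted so that each row sums to exactly 100.

```python
# Counts per quality rating, in meal-price order $10-19, $20-29, $30-39, $40-49.
counts = {
    "Good":      [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

# Row percentage = 100 * cell count / row total.
row_pct = {rating: [round(100 * c / sum(cells), 1) for c in cells]
           for rating, cells in counts.items()}

# Column totals, for comparison with the Total row of the crosstabulation.
col_totals = [sum(col) for col in zip(*counts.values())]

for rating, pcts in row_pct.items():
    print(f"{rating:<10}", pcts)
print("Column totals:", col_totals)
```

Reading across a row answers questions such as "of the restaurants rated Good, what share fall in the lowest price class?".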
Gender vs. Number of Alcoholic Drinks per Week
Crosstabs Gender vs Alcohol/wk
Count of Gender 0=M, 1=F
Alcohol/wk | Gender 0 | Gender 1 | Grand Total
0 | 2 | 2 | 4
1 | 9 | 7 | 16
2 | 4 | 1 | 5
4 | 0 | 1 | 1
5 | 1 | 0 | 1
Grand Total | 16 | 11 | 27
Contingency Table for Gender vs. Auto Costs
Crosstabs of Gender vs Auto Cost
Count of Gender 0=M, 1=F
Auto Cost | Gender 0 | Gender 1 | Grand Total
1 | 5 | 2 | 7
2 | 4 | 6 | 10
3 | 4 | 2 | 6
5 | 2 | 0 | 2
6 | 1 | 0 | 1
7 | 0 | 1 | 1
Grand Total | 16 | 11 | 27
Contingency Table of Gender vs. MPH
Crosstabs Gender vs mph
Count of Gender 0=M, 1=F
mph | Gender 0 | Gender 1 | Grand Total
70-90 | 0 | 2 | 2
91-110 | 3 | 6 | 9
111-130 | 3 | 1 | 4
131-150 | 3 | 1 | 4
5 | 7 | 1 | 8
Grand Total | 16 | 11 | 27
Contingency Table of Live on/off Campus by Gender
On (1) Off (0) Campus; Gender: Male 0, Female 1
 | 0 | 1 | Totals
0 | 9 | 5 | 14
1 | 9 | 10 | 19
Totals | 18 | 15 | 33
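Gender (0=M, 1=F) | Auto Cost ($) | Alch bev / wk (#) | mph | Auto Cost (code) | Alcohol/wk (code) | mph (code)
0 | 15000 | 50 | 120 | 2 | 5 | 3
1 | 18000 | 0 | 70 | 2 | 0 | 1
0 | 21000 | 10 | 145 | 3 | 1 | 4
0 | 9000 | 1 | 160 | 1 | 1 | 5
0 | 26500 | 6 | 150 | 3 | 1 | 5
0 | 2000 | 4 | 100 | 1 | 1 | 2
1 | 18000 | 6 | 105 | 2 | 1 | 2
1 | 20000 | 15 | 100 | 2 | 2 | 2
1 | 27000 | 36 | 150 | 3 | 4 | 5
1 | 24000 | 2 | 135 | 3 | 1 | 4
0 | 12000 | 15 | 152 | 2 | 2 | 5
0 | 16000 | 6 | 160 | 2 | 1 | 5
0 | 30000 | 2 | 150 | 3 | 1 | 5
1 | 1000 | 1 | 100 | 1 | 1 | 2
0 | 5000 | 12 | 105 | 1 | 2 | 2
1 | 18000 | 4 | 90 | 2 | 1 | 1
0 | 46000 | 7 | 100 | 5 | 1 | 2
0 | 6000 | 12 | 180 | 1 | 2 | 5
0 | 25000 | 0 | 120 | 3 | 0 | 3
0 | 60000 | 6 | 157 | 6 | 1 | 5
0 | 17000 | 20 | 140 | 2 | 2 | 4
1 | 66000 | 2 | 120 | 7 | 1 | 3
1 | 12000 | 0 | 100 | 2 | 0 | 2
0 | 45000 | 0 | 130 | 5 | 0 | 3
1 | 2300 | 3 | 108 | 1 | 1 | 2
0 | 10000 | 10 | 130 | 1 | 1 | 4
1 | 14000 | 2 | 100 | 2 | 1 | 2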
The following data are for 30 observations on two qualitative variables, X and Y. The categories for X are A, B, and C; the categories for Y are 1 and 2.
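Obs. | x | y
1 | A | 1
2 | B | 1
3 | B | 1
4 | C | 2
5 | B | 1
6 | C | 2
7 | B | 1
8 | C | 2
9 | A | 1
10 | B | 1
11 | A | 1
12 | B | 1
13 | C | 2
14 | C | 2
15 | C | 2
16 | B | 2
17 | C | 1
18 | B | 1
19 | C | 1
20 | B | 1
21 | C | 2
22 | B | 1
23 | C | 2
24 | A | 1
25 | B | 1
26 | C | 2
27 | C | 2
28 | A | 1
29 | B | 1
30 | B | 2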
a. Develop a crosstabulation for the data with x in the rows and y in the columns.
b. Compute the row percentages.
c. Compute the column percentages.
d. What is the relationship, if any, between x and y?
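Parts a to c can be checked with a short tally of (x, y) pairs. This is a sketch, not the only way to build a crosstabulation; the 30 observations are packed into two strings in observation order, and the names table, row_pct, and col_pct are chosen for illustration.

```python
from collections import Counter

# Observations 1-30 in order, ten per string segment.
x = "ABBCBCBCAB" "ABCCCBCBCB" "CBCABCCABB"
y = "1112121211" "1122221111" "2121122112"

pair_counts = Counter(zip(x, y))            # how often each (x, y) pair occurs
rows, cols = "ABC", "12"

# a. Crosstabulation with x in the rows and y in the columns.
table = {r: [pair_counts[(r, c)] for c in cols] for r in rows}

# b. Row percentages: each cell divided by its row total.
row_pct = {r: [round(100 * v / sum(vals), 1) for v in vals]
           for r, vals in table.items()}

# c. Column percentages: each cell divided by its column total.
col_totals = [sum(table[r][j] for r in rows) for j in range(len(cols))]
col_pct = {r: [round(100 * table[r][j] / col_totals[j], 1)
               for j in range(len(cols))] for r in rows}

for r in rows:
    print(r, table[r], row_pct[r], col_pct[r])
print("Column totals:", col_totals)
```

Comparing the row percentages across A, B, and C is the usual starting point for part d.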
Side-by-side Bar Charts
[Figure: Crosstab Gender vs Auto $. Count of Gender (0=M, 1=F) by Auto Cost category, with separate bars for Male and Female]
Pareto Diagram
Separates the “vital few”
from the “trivial many”.
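A Pareto diagram orders the categories from most to least frequent and usually adds a cumulative percentage. The sketch below prepares that ordering for the soft drink frequencies from the summary table; it computes the numbers behind the chart rather than drawing it.

```python
from itertools import accumulate

# Frequencies from the soft drink summary table.
freq = {"Coke Classic": 19, "Diet Coke": 8, "Dr. Pepper": 5,
        "Pepsi-Cola": 13, "Sprite": 5}

# Bars in descending order of frequency (ties keep their original order).
ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)

total = sum(freq.values())
running = accumulate(count for _, count in ordered)
cum_pct = [100 * c / total for c in running]   # cumulative percentage line

for (name, count), cp in zip(ordered, cum_pct):
    print(f"{name:<13}{count:>3}  {cp:5.1f}%")
```

Here the first two brands already account for 64% of purchases, which is the "vital few" reading of the chart.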