3A-1
Chapter
3A
Describing Data Visually (Part 1)
Visual Description
Dot Plots
Frequency Distributions and Histograms
Line Charts
Bar Charts
McGraw-Hill/Irwin
© 2008 The McGraw-Hill Companies, Inc. All rights reserved.
3A-3
Visual Description
• Methods of organizing, exploring and summarizing
data include:
- Visual (charts and graphs)
provides insight into characteristics of a data set
without using mathematics.
- Numerical (statistics or tables)
provides insight into characteristics of a data set
using mathematics.
3A-4
Visual Description
• Begin with univariate data (a set of n observations
on one variable) and consider the following:
Characteristic      Interpretation
Measurement         What are the units of measurement?
                    Are the data integer or continuous?
                    Any missing observations? Any concerns with
                    accuracy or sampling methods?
Central Tendency    Where are the data values concentrated? What
                    seem to be typical or middle data values?
3A-5
Visual Description
Characteristic      Interpretation
Dispersion          How much variation is there in the data?
                    How spread out are the data values?
                    Are there unusual values?
Shape               Are the data values distributed symmetrically?
                    Skewed? Sharply peaked? Flat? Bimodal?
3A-6
Visual Description
■ Example: Price/Earnings Ratios
• P/E ratios are current stock price divided by earnings
per share in the last 12 months. For example:
3A-7
Visual Description
■ Measurement
• Look at the data and visualize how it was collected
and measured.
■ Sorting
• Sort the data and then summarize in a graphical
display. Here are the sorted P/E ratios:
 8 10 10 10 13 13 14 14 15 15
16 16 17 18 19 19 20 20 21 22
23 26 26 27 29 29 34 48 55 68
• A histogram graphically displays sorted data.
3A-8
Visual Description
■ Sorting
• Sorting allows you to observe central tendency,
dispersion and shape as well as minimum, maximum
and range.
• What else do you observe?
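As a rough illustration of what sorting makes visible, the following sketch computes the minimum, maximum, range, and median of the 30 P/E ratios listed on the earlier slide. The variable names are illustrative only and do not come from the slides.

```python
from statistics import median

# The 30 P/E ratios from the slide.
pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

data = sorted(pe_ratios)            # ascending order; the ends are now easy to read off
n = len(data)
minimum, maximum = data[0], data[-1]
data_range = maximum - minimum      # one simple indication of dispersion
middle = median(data)               # one simple indication of central tendency

print(f"n={n}, min={minimum}, max={maximum}, range={data_range}, median={middle}")
```

The large gap between the median and the maximum hints at the long right tail that the later histogram slides discuss.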
3A-9
Dot Plots
• A dot plot is the simplest graphical display of n
individual values of numerical data.
- Easy to understand
- Not good for large samples (e.g., > 5,000).
■ Steps in Making a Dot Plot
1. Make a scale that covers the data range
2. Mark the axes and label them
3. Plot each data value as a dot above the scale at
its approximate location
If more than one data value lies at about the same
axis location, the dots are piled up vertically.
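The steps above can be sketched as a plain-text dot plot. This is only a minimal illustration: it assumes integer data that all lie between `lo` and `hi`, and the function name `text_dot_plot` and its parameters are hypothetical, not part of MegaStat or any other package.

```python
from collections import Counter

def text_dot_plot(values, lo, hi, tick_step=10):
    """Return text rows: dots piled vertically above an integer scale lo..hi."""
    counts = Counter(values)
    width = hi - lo + 1
    rows = []
    # Top row first: a dot is drawn wherever the pile reaches this level.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts[x] >= level else " "
                      for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * width)                     # step 1: a scale covering the range
    labels = [" "] * (width + len(str(hi)))      # leave room for the last tick label
    for tick in range(lo, hi + 1, tick_step):    # step 2: label the axis
        for i, ch in enumerate(str(tick)):
            labels[tick - lo + i] = ch
    rows.append("".join(labels).rstrip())
    return rows

pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

plot = text_dot_plot(pe_ratios, 0, 70)
print("\n".join(plot))
```

Repeated values (for example, the three ratios equal to 10) appear as vertical piles, which is how the plot shows clustering.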
3A-10
Creating a dot plot in MegaStat
3A-11
Dot Plots
• Range of data shows dispersion.
• Clustering shows central tendency.
• Dot plots do not reveal much about the shape of the distribution.
• Can add annotations (text boxes) to call attention
to specific features.
3A-12
Dot Plots
■ Small Sample: Home Prices
• Consider the following median home prices for nine U.S. cities.

Metropolitan Area       Median Home Price ($000)
Akron OH                        119.6
Bergen-Passaic NJ               363.0
Bradenton FL                    170.4
Colorado Springs CO             181.7
Hartford CT                     198.5
Milwaukee WI                    186.2
Raleigh-Durham NC               173.8
San Francisco CA                560.2
Topeka KS                       100.7
3A-13
Dot Plots
■ Small Sample: Home Prices
• A dot plot is useful to realtors as they discuss
patterns in home selling prices within their
community.
3A-14
Dot Plots
■ Comparing Groups
• A stacked dot plot compares two or more groups
using a common X-axis scale.
Frequency Distributions
and Histograms
3A-15
■ Bins and Bin Limits
• A frequency distribution is a table formed by
classifying n data values into k classes (bins).
• Bin limits define the values to be included in each
bin. Widths must all be the same.
• Frequencies are the number of observations within
each bin.
• Express as relative frequencies (frequency divided
by the total) or percentages (relative frequency
times 100).
Frequency Distributions
and Histograms
■ Constructing a Frequency Distribution
1. Sort data in ascending order (e.g., P/E ratios)
 8 10 10 10 13 13 14 14 15 15
16 16 17 18 19 19 20 20 21 22
23 26 26 27 29 29 34 48 55 68
2. Choose the number of bins (k)
- k should be much smaller than n.
- Too many bins results in sparsely populated
bins, too few and dissimilar data values are
lumped together.
3A-16
3A-17
Frequency Distributions
and Histograms
■ Constructing a Frequency Distribution
- Herbert Sturges proposed the following rule:

Sample Size (n)    Number of Bins (k)
       16                  5
       32                  6
       64                  7
      128                  8
      256                  9
      512                 10
     1024                 11
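The table follows the pattern k = 1 + log2(n), which is the usual statement of Sturges' rule. A minimal sketch is given below. The function name `sturges_bins` is illustrative, and rounding to the nearest integer is an assumption, since some texts round up instead.

```python
import math

def sturges_bins(n):
    """Suggested number of bins, k = 1 + log2(n), rounded to the nearest integer.

    The rounding convention is an assumption; for sample sizes that are
    powers of 2 (as in the table) no rounding is needed.
    """
    return round(1 + math.log2(n))

for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, sturges_bins(n))

# For the 30 P/E ratios the rule suggests about 6 bins.
k_pe = sturges_bins(30)
```

The rule is only a starting point. The analyst may still adjust k to obtain convenient bin limits.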
Frequency Distributions
and Histograms
■ Constructing a Frequency Distribution
3. Set the bin limits:
$$\text{Bin width} \approx \frac{X_{\max} - X_{\min}}{k}$$
For example, for k = 7 bins, the approximate bin
width is:
$$\text{Bin width} \approx \frac{68 - 8}{7} = \frac{60}{7} \approx 8.57$$
To obtain “nice” limits, we round the width to 10
and start the first bin at 0 to get bin limits:
0, 10, 20, 30, 40, 50, 60, 70
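The arithmetic in this step can be checked with a short sketch. The "nice" width of 10 and the starting point of 0 are the judgment calls made on the slide and are not computed. The variable names are illustrative.

```python
x_min, x_max, k = 8, 68, 7

raw_width = (x_max - x_min) / k      # 60 / 7, roughly 8.57
nice_width = 10                      # rounded to a convenient value, as on the slide
start = 0                            # lower limit of the first bin, as on the slide

# Enough bins that the largest value falls below the last upper limit.
n_bins = (x_max - start) // nice_width + 1
limits = [start + i * nice_width for i in range(n_bins + 1)]

print(f"raw width = {raw_width:.2f}")
print("bin limits:", limits)
```

Rounding the width up rather than down keeps the number of bins at seven while still covering the whole data range.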
3A-18
Frequency Distributions
and Histograms
3A-19
■ Constructing a Frequency Distribution
4. Put the data values in the appropriate bin.
In general, the lower limit is included in the bin
while the upper limit is excluded.
5. Create the table. You can include:
Frequencies – counts for each bin.
Relative frequencies – frequency divided by the
total number of data values.
Cumulative relative frequencies – accumulated relative
frequency values as bin limits increase.
Frequency Distributions
and Histograms
What are the bin limits for the P/E ratio data?

Bin Range               Frequency   Relative    Cumulative Relative
                                    Frequency   Frequency
 0 ≤ P/E Ratio < 10         1        0.0333       0.0333
10 ≤ P/E Ratio < 20        15        0.5000       0.5333
20 ≤ P/E Ratio < 30        10        0.3333       0.8667
30 ≤ P/E Ratio < 40         1        0.0333       0.9000
40 ≤ P/E Ratio < 50         1        0.0333       0.9333
50 ≤ P/E Ratio < 60         1        0.0333       0.9667
60 ≤ P/E Ratio < 70         1        0.0333       1.0000
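A frequency table like this one can be built with a few lines of code. The sketch below assumes the bin limits 0, 10, ..., 70 and the convention that each bin includes its lower limit and excludes its upper limit. The function name `frequency_table` is illustrative. Cumulative values are computed from running counts rather than by adding rounded relative frequencies, so the last entry is exactly 1.

```python
from bisect import bisect_right

def frequency_table(values, limits):
    """Return rows of (lower, upper, frequency, relative, cumulative relative)."""
    counts = [0] * (len(limits) - 1)
    for v in values:
        # bisect_right sends a value equal to a limit into the bin that starts there,
        # so the lower limit is included and the upper limit is excluded.
        i = bisect_right(limits, v) - 1
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} lies outside the bin limits")
        counts[i] += 1
    n = len(values)
    rows, running = [], 0
    for lower, upper, f in zip(limits, limits[1:], counts):
        running += f
        rows.append((lower, upper, f, f / n, running / n))
    return rows

pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

table = frequency_table(pe_ratios, [0, 10, 20, 30, 40, 50, 60, 70])
for lower, upper, f, rel, cum in table:
    print(f"{lower:>2} <= P/E < {upper:<2}  {f:>2}  {rel:.4f}  {cum:.4f}")
```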
3A-20
Frequency Distributions
and Histograms
■ Histograms
• A histogram is a graphical representation of a
frequency distribution.
Y-axis shows frequency within each bin.
• A histogram is a bar chart.
X-axis ticks show the end points of each bin.
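To make the link between a frequency distribution and its histogram concrete, here is a small text rendering. It draws the bars horizontally for simplicity, whereas the slide's histogram uses vertical bars. The bin limits and frequencies are those of the P/E ratio example.

```python
limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [1, 15, 10, 1, 1, 1, 1]

# One text bar per bin; the bar length equals the bin frequency.
lines = [f"{lo:>2}-{hi:<2} | {'#' * f} {f}"
         for lo, hi, f in zip(limits, limits[1:], freqs)]

print("\n".join(lines))
```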
3A-21
Frequency Distributions
and Histograms
■ Histograms
• Consider 3 histograms for the P/E ratio data with
different bin widths. What do they tell you?
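The effect of bin width can also be explored numerically. The widths 5, 10, and 20 used below are illustrative choices and are not necessarily the widths shown in the slide's three histograms. The sketch assumes integer data and bins starting at 0.

```python
def counts_for_width(values, width, start=0):
    """Bin counts for equal-width bins [start, start+width), [start+width, ...)."""
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1   # lower limit included, upper excluded
    return counts

pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

for width in (5, 10, 20):
    print(width, counts_for_width(pe_ratios, width))
```

Narrow bins leave several empty or nearly empty classes in the right tail, while wide bins lump most of the data into one or two classes.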
3A-22
3A-23
Obtaining a histogram in Excel
Frequency Distributions
and Histograms
3A-24
■ Excel Histograms
• Specify a range of cells containing the bin limits or
accept Excel’s default.
Frequency Distributions
and Histograms
■ Excel Histograms
• Once created, you can modify the resulting
histogram to make it more attractive.
3A-25
3A-26
In MegaStat, you can specify the interval width and lower
limit of the first interval or accept the default.
Frequency Distributions
and Histograms
■ MegaStat Histograms
• MegaStat shows percents on the Y-axis instead of
frequencies.
3A-27
Frequency Distributions
and Histograms
■ MegaStat Histograms
• MegaStat also provides a frequency distribution
including cumulative frequencies.
3A-28
3A-29
In MINITAB, choose Graphs > Histograms
and accept all defaults.
Frequency Distributions
and Histograms
■ MINITAB Histograms
• Right-click the X-axis to adjust the bins, axis tick
marks, etc.
3A-30
Frequency Distributions
and Histograms
3A-31
■ Modal Class
• A modal class is a histogram bar that is higher than
those on either side.
• Monomodal – a single modal class.
• Bimodal – two modal classes.
• Multimodal – more than two modal classes.
• Modal classes may be artifacts of the way bin
limits are chosen.
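The definition of a modal class translates directly into a neighbor comparison. In the sketch below, a bar at either end of the histogram is compared only with its single neighbor, which is an assumption the slide does not spell out. The function name `modal_classes` is illustrative, and the width-5 bin counts are an illustrative rebinning of the same P/E data.

```python
def modal_classes(freqs):
    """Indices of bars strictly higher than the bars on either side."""
    modes = []
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else -1              # no left neighbor at the start
        right = freqs[i + 1] if i < len(freqs) - 1 else -1  # no right neighbor at the end
        if f > left and f > right:
            modes.append(i)
    return modes

# P/E ratio frequencies with bins of width 10 (from the frequency table).
width_10 = [1, 15, 10, 1, 1, 1, 1]
# The same data rebinned with width 5 (illustrative choice of bins).
width_5 = [0, 1, 7, 8, 5, 5, 1, 0, 0, 1, 0, 1, 0, 1]

print(modal_classes(width_10))
print(modal_classes(width_5))
```

With width 10 there is a single modal class, but with width 5 the isolated values in the right tail each register as a separate "mode", which shows how modal classes can be artifacts of the bin limits.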
Frequency Distributions
and Histograms
3A-32
■ Shape
• A histogram suggests the shape of the population.
• It is influenced by number of bins and bin limits.
• Skewness – indicated by the direction of the longer
tail of the histogram.
Left-skewed – (negatively skewed) a longer left
tail.
Right-skewed – (positively skewed) a longer right
tail.
Symmetric – both tail areas approximately the
same.
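Skewness is judged visually on this slide. As a rough numerical cross-check, one common rule of thumb (not stated on the slide, so treat it as an assumption) compares the mean with the median: a long right tail tends to pull the mean above the median. The function name `skew_hint` is illustrative.

```python
from statistics import mean, median

def skew_hint(values):
    """Rule-of-thumb direction of skew from the mean and median (not a formal test)."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed"
    if m < md:
        return "left-skewed"
    return "symmetric"

pe_ratios = [8, 10, 10, 10, 13, 13, 14, 14, 15, 15,
             16, 16, 17, 18, 19, 19, 20, 20, 21, 22,
             23, 26, 26, 27, 29, 29, 34, 48, 55, 68]

pe_mean, pe_median = mean(pe_ratios), median(pe_ratios)
print(pe_mean, pe_median, skew_hint(pe_ratios))
```

For the P/E ratios the mean exceeds the median, which agrees with the long right tail visible in the histogram.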
3A-33
3A-34
Line Charts
■ Simple Line Charts
• Used to display a time series or spot trends, or to
compare time periods.
• Can display several variables at once.
3A-35
Line Charts
■ Simple Line Charts
• Two-scale line chart – used to compare variables
that differ in magnitude or are measured in
different units.
3A-36
Line Charts
■ Grid Lines
• A line graph usually has no vertical grid lines.
Horizontal lines can be added to make it easier to
establish the y value. Which is easier to read?
3A-37
Line Charts
■ Log Scales
• Arithmetic scale – distances on the Y-axis are
proportional to the magnitude of the variable being
displayed.
• Logarithmic scale – (ratio scale) equal distances
represent equal ratios.
• Use a log scale for the vertical axis when data vary
over a wide range, say, by more than an order of
magnitude.
• This will reveal more detail for small data values.
3A-38
Line Charts
■ Log Scales
• Log scale is only suited for positive data values.
• Reveals whether the quantity is growing at an
increasing percent (concave upward),
constant percent (straight line), or
declining percent (concave downward)
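The "constant percent gives a straight line" claim can be verified numerically. The series below is hypothetical (a value of 100 growing 10% per period) and is not data from the slides. On the arithmetic scale the period-to-period steps keep growing, while on the log scale every step is the same size.

```python
import math

# Hypothetical series: starts at 100 and grows 10% per period.
series = [100 * 1.10 ** t for t in range(6)]
log_values = [math.log10(y) for y in series]

# Period-to-period changes on each scale.
steps_arith = [b - a for a, b in zip(series, series[1:])]
steps_log = [b - a for a, b in zip(log_values, log_values[1:])]

print([round(s, 2) for s in steps_arith])   # steps get larger each period
print([round(s, 4) for s in steps_log])     # steps are equal: a straight line
```

Equal steps on the log scale correspond to equal ratios between successive values, which is why a log scale is also called a ratio scale.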
Arithmetic scale
Log scale
3A-39
Line Charts
■ Example: U.S. Trade
Arithmetic scale
Log scale
• What does the log scale graph tell you about
growth rate for both series?
3A-40
Line Charts
■ When to Use Log Scales
• Useful for
- time series data that might be expected to
grow at a compound annual percentage rate
(e.g., GDP, national debt, future income)
- financial charts that cover long periods of
time or data that grow rapidly (e.g., revenues)
3A-41
Line Charts
■ Tips for Effective Line Charts
1. Line charts are used for time series data (never
for cross-sectional data).
2. Y-axis shows numerical variable while X-axis
shows time units with time increasing left to right.
3. Use a zero origin on the Y-axis unless more detail
is needed.
3A-42
Line Charts
■ Tips for Effective Line Charts
4. Omit numerical labels on a line chart to avoid
clutter. Use gridlines if needed.
5. Use data markers (squares, triangles, circles) if
they don’t clutter the graph.
6. Don’t make lines too thick.
3A-43
Bar Charts
■ Plain Bar Charts
• Most common way to display attribute data.
- Bars represent categories or attributes.
- Lengths of bars represent frequencies.
Vertical Bar Chart
Horizontal Bar Chart
3A-44
Bar Charts
■ 3-D and Novelty Bar Charts
3-D Bar Chart
Pyramid Chart
3A-45
Bar Charts
■ Pareto Charts
• Special type of bar chart used in quality management to
display the frequency of defects or errors of different types.
• Categories are displayed in descending order of frequency.
• Focus on the significant few (i.e., the few categories that
account for most defects or errors).
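The ordering behind a Pareto chart can be sketched as follows. The defect categories and counts are hypothetical and serve only to illustrate the technique. The cumulative percentage column is a common companion to the bars, although the slide does not mention it.

```python
# Hypothetical defect counts by type (not data from the slides).
defects = {"Scratch": 12, "Dent": 45, "Misalignment": 8,
           "Paint flaw": 30, "Other": 5}

# Pareto ordering: categories sorted by descending frequency.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)
total = sum(defects.values())

pareto = []
running = 0
for category, count in ordered:
    running += count
    pareto.append((category, count, 100 * running / total))

for category, count, cum_pct in pareto:
    print(f"{category:<13} {count:>3}  {cum_pct:5.1f}%")
```

In this made-up example the first two categories account for 75% of all defects, which is the "significant few" that the chart is meant to highlight.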
3A-46
Bar Charts
■ Stacked Bar Chart
• Bar height is the sum of several subtotals.
• Areas may be compared by color to show patterns in the
subgroups and total.
3A-47
Bar Charts
■ Bar Charts for Time Series Data
• Bar charts can be used for time series data
although it may be harder to compare trends.
Line Chart
Bar Chart
3A-48
Bar Charts
■ Tips for Effective Bar Charts
1. Show the numerical variable of interest with
vertical bars on the Y-axis, category labels on the
X-axis.
2. For time series quantities, display the category
labels on the horizontal X-axis with time
increasing from left to right.
3. The height or length of each bar should be
proportional to the quantity displayed.
4. Put numerical values at the top of each bar,
except if too cluttered.