Chart Deception: Part 2
Slide 1
In part two of this lecture I’ll talk about X, Y or scatter charts and then provide many examples of
poorly constructed graphics. In many cases, I’ll present alternative and better constructions of
the same graphics.
Slide 2
Some of this advice is critical to designing non-deceptive scatter charts. It’s important to label
the data points.
Slide 3
This graph is essentially meaningless for communicating real information because there are no
data points and no axis labels.
Slide 4
In contrast to the previous graph, this graph includes both labeled points and axes.
Slide 5
You’ll notice, as you start looking at the graphics more critically, that graphical displays like
those in USA Today exaggerate differences or changes or trends. USA Today does this by not
starting its axes at the origin (the 0,0 point). The next four slides illustrate this problem.
Slide 6
Here the Y axis starts at 3 and ends at 9, so there appears to be a huge shift in average orders
by month. People will remember the shape of this graph, and a quick glance suggests there
were no orders in October, November, and December, but substantial orders in July, August,
and September.
Slide 7
In contrast to the previous slide, this slide suggests a much more stable pattern of average
orders per salesperson over the six month period. The first slide suggests tremendous
variability, which is incorrect. The true message is that orders were a bit higher in July through
September and a bit lower in October through December. A graph that suggests otherwise is
deceptive. Also notice that the Y axis starts at the zero point and extends beyond the top-most
value, which occurred in August.
Slide 8
Here’s another example of the failure to use the 0,0 origin as the starting point. A quick look at
this graph suggests that the average orders per salesperson are highly variable and that
September was a disastrous month.
Slide 9
In fact, just the opposite is true. The average orders have been extraordinarily stable across this
six month period. The first graph leaves viewers with a distorted impression of the average
orders over this six month period.
Slide 10 (No Audio)
Slide 11
In this graph, distortion is caused by treating unequal time intervals as equal. The graph on
the right, B, shows a huge area associated with the period between 1975 and 1980. As a result,
it seems as if there was a long takeoff period before the increase in dollars. Alternatively, the
graph on the left, A, presents an undistorted X axis. In A, dollars seem relatively stable over the
first half of the graph, but from 1985-2000 there was a meaningful increase in dollars. You
should avoid graphs like B, in which the period from 1975-1980 and the period from 1995-2000
are distorted.
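To make the unequal-interval problem concrete, here is a minimal Python sketch. The list of years is an assumption for illustration (annual observations at first, then five-year gaps), not the data behind the slide. It compares where each year would sit on a properly scaled X axis with where it sits when every observation gets the same slot width.

```python
# Hypothetical observation years: annual at first, then every five years.
years = [1975, 1976, 1977, 1978, 1979, 1980, 1985, 1990, 1995, 2000]

def proportional_positions(values):
    """Place each value on a 0-1 axis in proportion to elapsed time."""
    lo, hi = values[0], values[-1]
    return [(v - lo) / (hi - lo) for v in values]

def equal_slot_positions(values):
    """Give every observation the same slot width, ignoring the real gaps."""
    last_index = len(values) - 1
    return [i / last_index for i in range(len(values))]

true_pos = proportional_positions(years)
distorted_pos = equal_slot_positions(years)

idx_1980 = years.index(1980)
print(f"1975-1980 share of a true axis:      {true_pos[idx_1980]:.0%}")
print(f"1975-1980 share of a distorted axis: {distorted_pos[idx_1980]:.0%}")
```

Under these assumed years, the first five years cover one fifth of a true time axis but more than half of the equal-slot axis, which is the kind of stretched takeoff period graph B shows.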
Slide 12
Even if you depict equal intervals identically in your graph, you can still do things to change the
visual impression. The top left graph shows the original scaled arrangement; you can see what
happens as you start expanding and contracting the X and Y axes. In some cases, you can
make changes seem much smaller; in other cases, you can make the changes seem much
greater. You shouldn’t distort the message by arbitrarily expanding or contracting the horizontal
and vertical scales of your graph. X, Y plots are meant to indicate trends and variability. You can
influence that message by arbitrarily expanding or contracting the X and Y axes.
Slide 13 (No Audio)
Slide 14
As I mentioned earlier, avoid broken axes. Here, the Y axis jumps from 0 to 9, so it seems as if
the designer properly started the Y axis at the zero (0) point, but in fact the designer has distorted the
message. Thus, it appears that this line graph depicts a huge increase from 1930 to 1970. The
numbers in the graph indicate a roughly 50% increase. Do you believe anyone viewing this
graph casually will see only a 50% increase when the graph suggests—due to the broken Y
axis—that there’s a 400 to 500% increase? Using discontinuous axes to depict data will only
confuse and distort the information.
Slide 15
Here are three more examples of why you shouldn’t distort a chart by using broken axes and
why you should start from the 0 point on the X and Y axes.
Slide 16 (No Audio)
Slide 17
There are certain conventions when people view graphs, and one of those conventions is that
an upward sloping line means the quantity is increasing over time, especially if time is on the X
axis and the amount in question is on the Y axis. This cumulative rainfall graph seems to
indicate that rainfall increased markedly from June to December. In fact, the individual amounts
for each month listed at the bottom of the slide indicate no increase. Avoid using cumulative
charts unless you have a good reason for doing so because cumulative charts tend to be
inconsistent with people’s chart expectations.
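A short Python sketch shows why a cumulative chart slopes upward even when nothing is increasing. The monthly amounts are hypothetical (a flat two inches per month from June through December), chosen only to illustrate the point.

```python
from itertools import accumulate

# Hypothetical monthly rainfall in inches, June through December: perfectly flat.
months = ["Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthly = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

# Running total: this is what a cumulative chart actually plots.
cumulative = list(accumulate(monthly))

for month, amount, total in zip(months, monthly, cumulative):
    print(f"{month}: monthly={amount:.1f}  cumulative={total:.1f}")
```

The monthly column never changes, yet the cumulative column rises every month, so the plotted line climbs steadily and a casual viewer reads it as increasing rainfall.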
Slide 18
Although this is a bar chart instead of an X, Y chart, using cumulative bar charts presents a
similar problem. Avoid using cumulative charts.
Slide 19
Occasionally, people seem compelled to put multiple graphs into the same single graph.
Somehow, this arrangement is meant to depict vital information about how the two graphs are
related. This format can only confuse people, as you’ll see in the next three slides.
Slide 20
Someone looking at this graph is supposed to conclude that sales are dropping because
inventory is dropping; however, those things could be independent. Graphing those things
together suggests a relationship when none may exist. If it’s necessary to show sales and
inventories, you should use two graphs, not one.
Slide 21
Here’s another example of graphing two things together that suggests these things are related.
It’s doubtful that consumption triples when the outdoor temperature increases from 80 to 95
degrees, yet that’s what this slide suggests. These things may be barely or strongly
related, but this slide indicates they’re strongly related, which may or may not be the case.
Slide 22
This graph is meant to indicate newspaper readership over time. The Daily News and The Post
are two New York City newspapers. A quick look at this graph suggests that readership is
converging, which isn’t true, at least not as dramatically as suggested. Part of the problem is
that this graph is discontinuous; the Y axis suddenly jumps from 800,000 readers per day to 1.5 million
readers per day. In other words, there are two graphs (the Daily News graph on the top and the
Post graph on the bottom) that someone stuck together. Although the readership of the Post is
increasing and the readership of the Daily News is decreasing, they are not converging as
rapidly as suggested by this graph.
Slide 23
As I mentioned in an earlier lecture, many marketing relationships are non-linear, and one way
to linearize a relationship is to transform the data. Converting data into its log can be useful for
scientific audiences used to semi-log charts. However, such transformations will deceive
non-scientists who don’t understand them. In this case, the use of semi-log charts minimizes the
appearance of a trend.
Slide 24
Here’s a logarithmic transformation. Equal intervals denote increases by a power of 10. One is
10 to the zero power, 10 is 10 to the first power, 100 is 10 to the second power, and 1000 is 10
to the third power. Seemingly, there’s not much of a trend from 1995-1997, and the extrapolated
area suggests a mild increase over time. In fact, if this graph is correct, then the increase from
1995 to 1999 is from 10 to 1000, or a 100-fold increase. A casual look at the graph suggests
only a tripling.
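The arithmetic behind that mismatch can be sketched in a few lines of Python. The values 1, 10, 100, and 1000 come from the slide narration; the variable names are only illustrative.

```python
import math

# Tick values on the logarithmic Y axis described above.
values = [1, 10, 100, 1000]

# On a log10 axis, each value is drawn at the height of its exponent.
heights = [math.log10(v) for v in values]

actual_ratio = 1000 / 10                             # what really happened
visual_ratio = math.log10(1000) / math.log10(10)     # what the eye compares

print("plotted heights:", [round(h, 2) for h in heights])
print(f"actual increase: {actual_ratio:.0f}-fold")
print(f"apparent increase: about {visual_ratio:.0f}-fold")
```

Because the plotted height is the exponent rather than the value, a move from 10 to 1000 is drawn as a move from height 1 to height 3, which is why a casual viewer sees a tripling instead of a 100-fold increase.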
Slide 25
This is another one of those examples of people’s cultural norms regarding the reading of
graphs. If the X axis isn’t the time axis then connecting the dots can mistakenly suggest a trend
because that’s what people expect when they see X, Y charts, a bunch of dots, and a line that
connects those dots. Don’t connect the dots unless the X axis is the time axis.
Slide 26 (No Audio)
Slide 27
Finally, regarding X, Y charts, I want to ensure you’re clear about interpolation and
extrapolation. With interpolation you’ve got two points on the graph and you’re trying to guess
the midpoint between those two points. With extrapolation, you’re looking at all the points up to
the end of the graph and then trying to guess subsequent points beyond the current points. Both
interpolation and extrapolation are subjective assessments. Interpolation may seem safer
because you’re guesstimating a midpoint. If you have a large series of points, you’ll feel
comfortable that there won’t be some dramatic change right at the midpoint or somewhere
between two existing points. With extrapolation, it’s impossible to know whether trends will
continue or not; there could be a dramatic increase or decrease relative to the current trend not
suggested by theory, sophisticated forecasting methods, and the like. Just remember that
extrapolations are highly subjective and interpolations are somewhat subjective.
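As a rough illustration, the Python sketch below uses the same straight-line formula for both cases; the only difference is whether the X value you ask about falls inside or outside the observed points. The two data points are hypothetical, and a straight line is just one assumed way to make the guess.

```python
def estimate_on_line(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

def is_interpolation(x0, x1, x):
    """True when x lies between the two observed X values."""
    return min(x0, x1) <= x <= max(x0, x1)

# Hypothetical observations: period 1 -> 10.0 units, period 3 -> 14.0 units.
midpoint_guess = estimate_on_line(1, 10.0, 3, 14.0, 2)   # between the points
future_guess = estimate_on_line(1, 10.0, 3, 14.0, 5)     # beyond the last point

print(midpoint_guess, is_interpolation(1, 3, 2))
print(future_guess, is_interpolation(1, 3, 5))
```

The midpoint guess is bounded by two observed values, whereas the future guess rests entirely on the assumption that the line keeps going, which is why the second is the riskier of the two.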
Slide 28 (No Audio)
Slide 29
Here’s what I mean by a radar chart. Although these are popular in Japan, they are problematic.
In this example, the axes are not identical; although acceleration goes from 0 to 15 and handling
goes from 0 to 8, these two axes are the same length. Fuel economy goes from 0 to 10, while riding
and styling go from 5 to 15. Even if you put multiple plots on the same graphic, you’ve got all
the problems that I mentioned earlier about the use of profile analysis and semantic differentials:
you can’t know what’s important and you can distort people’s perceptions by changing the
relative size of the different and unrelated axes. I urge you to avoid radar charts. Fortunately, no
current spreadsheet, graphics, or statistical package uses this approach for plotting data.
Slide 30
People seem to focus on point estimates—modes, medians, and means—which are the single
best summary numbers. For metric—interval- or ratio-scaled—data, that point is an estimate
based on the sample you drew. If you drew subsequent samples, you may find different point
estimates. It’s important to give the viewers of your graphs a sense for the range of likely point
estimates for repeated samples. That’s what you depict when you show point estimates and
confidence intervals around those estimates. The next three slides show confidence intervals as
well as the point estimates.
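For a sense of what goes into such an interval, here is a minimal Python sketch for a sample mean. The sample values are hypothetical, and the interval uses a normal approximation, which is an assumption; with a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # z value that leaves (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_err, mean + z * std_err

# Hypothetical sample of average orders per salesperson.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]

mean, low, high = mean_confidence_interval(sample)
print(f"point estimate: {mean:.2f}")
print(f"95% interval:   {low:.2f} to {high:.2f}")
```

Plotting the point estimate together with the lower and upper bounds tells viewers how much the estimate could move if you drew another sample.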
Slide 31 to Slide 33 (No Audio)
Slide 34
You always want to use footnotes to indicate the source of the information depicted in any table
that you create. Otherwise, you can be accused of plagiarism, and that’s a serious charge. In
this Internet era, many people believe that borrowing liberally from other sources is okay, as the
true creativity is in the remixing of existing sources. That’s not a good mindset. If it’s obvious
that you have borrowed something but haven’t indicated the source, then that’s plagiarism. Try
to avoid plagiarism by using footnotes.
You should use proportional fonts and you shouldn’t mix fonts. I use Arial fonts for everything. If
you import a JPG file, then it’s difficult to manipulate fonts because you’re starting from a picture.
Don’t use all uppercase lettering; instead, mix upper and lower case letters. Using uppercase
only is equivalent to shouting. If you need to emphasize, then use underline or italics. Finally,
consider how you depict numbers. If the numbers represent data points, then use digits;
otherwise, spell out the numbers one through nine and use digits for 10 and above.
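The number rule can be written as a tiny Python helper. The function name and the `is_data_point` flag are hypothetical, introduced only to illustrate the rule as stated in the lecture.

```python
# Words for the numbers the rule says to spell out.
SPELLED = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine",
}

def format_number(n, is_data_point=False):
    """Spell out one through nine in running text; use digits otherwise."""
    if is_data_point or n not in SPELLED:
        return str(n)
    return SPELLED[n]

print(format_number(7))                      # running text
print(format_number(7, is_data_point=True))  # a value from the data
print(format_number(12))                     # 10 and above
```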
Slide 35 (No Audio)
Slide 36
The next four slides depict bad graphics but suggest no fix. The text indicates what’s
inappropriate about each graph, and what makes it bad suggests the improvements.
Slide 37 to Slide 38 (No Audio)
Slide 39
This graph is lousy for two reasons. First, the Y axis is logarithmic. Second, the graph depicts
two different things, which suggests a relationship when none may exist.
Slide 40
This is another poor graph. Notice the effort to compare males to females over the same period,
1968-1976. Between the two graphs are dotted lines that imply a precipitous drop from males to
females. Although there’s a drop, it’s not as much as implied by this graph. By comparing, in
that dotted line, 1976 data for males to 1968 data for females, the graph creates a distorted
impression.
Slide 41 (No Audio)
Slide 42
Here’s an example of stacked 3-D bar charts to reveal a trend from 1971 to 2000 across five
different sources of electricity.
Slide 43
The real message from the previous chart is that there are profound increases predicted for
petroleum and nuclear energy, but only modest increases for other sources. That point is
obscured in the previous slide but becomes obvious when the data are shown in a scatter plot
instead of a stacked 3-D bar chart.
Slide 44 (No Audio)
Slide 45
In the previous slide, the Y axis did not begin at the origin. Here’s what happens when you take
that same data and plot it with the Y axis for expenditures per pupil starting at the origin. In this
case, it’s clear that expenditures have been relatively stable. The previous slide implies that
upward trends in expenditures have caused upward trends in SAT scores. The data, if depicted
properly, suggests just the opposite.
Slide 46 (No Audio)
Slide 47
The problem with the first graph in this two graph series is that 1978 data was only partial data.
As a result, it suggests that there was a downward trend from 1976 to 1978. In fact, a careful
look at the 1978 data indicates an upward trend in commission payments.
Slide 48 (No Audio)
Slide 49
The previous stacked bar chart version of this data obscured that different countries were
pulling or not pulling their weight in this regard. Dividing the one stacked bar chart into four
separate graphs shows that the U.S. has added production, Japanese production is flat,
West German reserves have fluctuated, and the stock for all other OECD countries has
declined. The first oil stocks graph obscures whether or not U.S. stocks have increased, the
stocks of the two other countries have remained stable, and the stocks of the remaining OECD
countries have declined.
Slide 50
Here’s an excellent example of why you should avoid the chart junk that appears in
publications like USA Today. Notice that the graph in the background contains barrels of beer
and this is supposed to artistically indicate changes in beer sales from 1970 to 1978. In 1970,
the number of barrels was 120 million. This graph doesn’t start from the zero point; rather, it
starts from 100 million. In 1978, total sales were 160 million barrels. The difference between 120
million and 160 million is 40 million, or a 33 1/3% increase. Because the graph starts from a
non-zero point on the Y axis and amounts sold are associated with barrel sizes, the 1978 barrel
appears huge relative to the 1970 barrel. A quick glance suggests that U.S. beer consumption
increased tremendously in the eight year period. The graph in the foreground indicates the
millions of barrels sold by Schlitz, and the brewer’s market share seems in rapid decline.
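The beer numbers quoted above make the effect of the non-zero starting point easy to check. This Python sketch uses only the figures from the lecture (120 and 160 million barrels, with the axis starting at 100 million); the variable names are illustrative.

```python
# Figures from the lecture, in millions of barrels.
sales_1970 = 120
sales_1978 = 160
axis_start = 100   # where the graph's Y axis begins

# What actually happened.
pct_increase = (sales_1978 - sales_1970) / sales_1970 * 100

# What the drawing shows: heights are measured from the axis start, not from zero.
height_1970 = sales_1970 - axis_start
height_1978 = sales_1978 - axis_start
height_ratio = height_1978 / height_1970

print(f"actual increase: {pct_increase:.1f}%")
print(f"1978 symbol is {height_ratio:.0f} times as tall as the 1970 symbol")
```

A one-third increase is drawn as a symbol three times as tall, before any extra exaggeration from the barrel's width.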
Slide 51
However, as this next graph shows, that’s not the case. In fact, U.S. beer sales grew steadily
throughout the 1970s. This graph indicates what I mentioned earlier, that sales are up one-third, and that
Schlitz’s original market share of 15% first grew to 25%, and then declined to 20%. The
previous graph suggests that the market exploded and Schlitz sales not only declined but its
share of the market dropped far more than 5% from its peak.
Slide 52
This graphic shows someone’s artistic approach to illustrating the declining purchasing power of
the dollar. Starting with 1958 as the base year, when Eisenhower was President, the dollar was
worth a dollar. By Jimmy Carter’s time in 1978, the value of a dollar dropped to $0.44.
Slide 53
As this graph shows, the purchasing power of the dollar has declined since the Eisenhower
Administration through at least half of the Carter Administration. The dollar on this graph is
worth less than half as much in 1978 as it was worth in 1958. That’s the correct interpretation. The
problem with the previous graph and all the chart junk is that dollar amounts were depicted as areas,
each drawn as a picture of a dollar. If you were to take a ruler and measure the length and width of
the dollars from 1958 and 1978, the ratio is roughly 1 to 0.44. However, the area taken by the
1978 dollar is nowhere near half of the area taken by the 1958 dollar. Once again, as was the case
with multiple pie charts, the relative areas are supposed to indicate relative quantities.
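The gap between the length ratio and the area ratio is simple arithmetic, sketched here in Python. The 0.44 figure is from the lecture; the suggested side scaling at the end is an assumption about how one could redraw the picture so that area matches value.

```python
import math

value_ratio = 0.44   # 1978 dollar relative to the 1958 dollar, from the lecture

# If both length and width are scaled by 0.44, area shrinks by 0.44 squared.
area_ratio_as_drawn = value_ratio ** 2

# To make the drawn area equal the value ratio, scale each side by the square root.
side_scale_for_honest_area = math.sqrt(value_ratio)

print(f"value ratio:            {value_ratio:.2f}")
print(f"area ratio as drawn:    {area_ratio_as_drawn:.2f}")
print(f"side scale for honesty: {side_scale_for_honest_area:.2f}")
```

Scaling both sides by 0.44 leaves the 1978 dollar with about 19% of the original area, so the picture overstates the decline for anyone who compares areas.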
Slide 54 (No Audio)
Slide 55
As the headings for these two revised graphs show, the message of the previous slide
was that during the 1970s there was an increasingly positive balance of trade with China but a
worsening trade deficit with Taiwan. Because of the mixed metaphor in the previous graph,
that was difficult to discern.
Slide 56
As a final example, here’s a table that relates male versus female life expectancies in different
countries. One might think this a reasonable way to present this information because the
countries are listed alphabetically. What you’re supposed to discern from this table is that females
outlive males.
Slide 57
As this alternative table shows, a cross-country comparison is easier when life expectancies are
organized from highest to lowest rather than listed alphabetically by country name. Notice the
even groupings. The first grouping of six countries shows the life expectancies for females
exceed or equal 70. The next grouping shows two countries and the life expectancies are
between 60 and 69. For the last grouping, the life expectancy is between 40 and 49.
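The reorganization described here amounts to a sort followed by a grouping, sketched below in Python. The country names and life expectancies are placeholders, not the values from the slide, and the ten-year bands are an assumed grouping rule.

```python
# Hypothetical rows: (country, female life expectancy, male life expectancy).
rows = [
    ("Country A", 74, 68),
    ("Country B", 45, 43),
    ("Country C", 66, 61),
    ("Country D", 77, 71),
]

# Order from highest to lowest female life expectancy instead of alphabetically.
by_female = sorted(rows, key=lambda row: row[1], reverse=True)

def band(age):
    """Label the ten-year band an age falls into, for example '70-79'."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"

# Group the sorted countries by band; dicts keep insertion order.
groups = {}
for country, female, male in by_female:
    groups.setdefault(band(female), []).append(country)

for label, countries in groups.items():
    print(label, countries)
```

Sorting first means each group comes out in rank order, so the reader sees both the ranking and the natural breaks between groups at a glance.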
Slide 58
Something like this might be more powerful. Women are shown on the left-hand side and men
on the right-hand side. These aren’t horizontal bar charts; they’re just indications. The only
important thing is the age number in the middle of this display, in which the oldest is at the top
and the youngest is at the bottom. This graph shows a rank order and grouping of countries by
life expectancy at birth by sex. This is a powerful way to summarize the data in the previous
table. The bottom line is to think about the point you’re trying to make and be certain that your
graphical displays or tables make that point in a way that’s not deceptive.