2 Graphing

2.1 Introduction

Most people respond better to pictures than to words and numbers, so even a table of frequencies is not the best way to present data. Graphs show trends and differences in data very quickly, and different types of graphs are available to show the data in various ways.

GENERAL PRINCIPLES FOR CONSTRUCTING GRAPHS

You should apply the following general rules when constructing a graph:
• label the axes clearly
• give the graph an informative title
• keep the axis scale the same all the way along – if 0-10 covers 1 cm, then so should 90-100 etc
• show the axis scales
• where multiple series are plotted on the same graph, use a legend to make it clear which data belongs to which line/column
• don’t over-complicate the graph with too many colours, gridlines or 3-D effects (see below)
• plot the measured variable on the vertical axis

FORMATTING ISSUES – Excel charting in general

Excel’s graphing function has so many options that it becomes a mixed blessing. Some of the appearance options tend to swamp the graph with unnecessary “bells and whistles”. Do not accept the default settings without thinking “can I make this look better?”

2D vs 3D columns etc
There is no doubt that “3-dimensional” columns look better than plain rectangles. However, they can make it difficult to judge where the top of the “column” is, relative to the vertical axis. Gridlines (see below) can help in this regard. 3D pie charts, however, are to be avoided like the plague. This will be explained further later in this chapter.

Gridlines
As noted above, gridlines help with 3D columns, but are less needed with 2D columns, and are totally distracting with line-based graphs.

Legend
A legend is crucial where there is more than one data series, but totally useless when there is only one (e.g. a calibration graph). When the legend is necessary, it is made useless if the labels are simply the default Series 1, Series 2 etc.
Make sure you fix this using the Series/Name setting (how to do this will depend on which version of Excel you have).

Background
Older versions of Excel defaulted to a mid-grey graph background. On the printed page this becomes very distracting, and it should be changed to no background (or white if you are copying the graph to PowerPoint). Fortunately, newer versions no longer do this.

Colours & patterns
If you have to change the colours of lines, columns, points etc, make sure they don’t end up with similar brightness, because black & white printing will mask the difference. Avoid patterns – they are too hard on the eyes.

Sci. Info Skills

TYPES OF GRAPHS

Excel gives you access to numerous graph (chart) types, but be warned that not every type is usable for every type of data set. Table 2.1 provides some guidance as to what each can be used for (in the context of this subject), but the best advice is to ask yourself the following question when you have created a graph:

Does this graph actually tell me what I want it to?

If the answer is anything but a resounding yes, it is back to the drawing board!
TABLE 2.1 Excel chart types

Excel chart type | Used for
Column, bar, cylinder, cone, pyramid | Plotting frequencies for category data; the only difference between them is appearance, which is a matter of personal preference
Line | Only useful for plotting frequencies for tallied numerical data where an XY scatter doesn’t work (eg group values on the horizontal axis); must not be used for category data
Pie, doughnut | Showing relative proportions of different categories; doughnut allows multiple different sets to be compared
XY scatter | Numerical data where two variables have some relationship to each other, eg concentration vs time, temperature vs CO2, absorbance vs concentration
Area | A line graph where the space underneath is filled with a solid colour; can be used for multiple related sets (eg classes of wastes)
Radar | Allows more than 2 variables to be plotted for different data sets; not widely used
Bubble | Allows a 3rd variable to be plotted on an XY chart using the diameter of the bubble
Stock | Designed specifically for stock market graphs
Line-Column | A mixed-format graph, as the name indicates

COLUMN TYPE GRAPHS

This includes the bar, cylinder, cone and pyramid charts offered by Excel. Column type graphs plot frequency (or relative frequency) on the vertical axis and the category value on the horizontal axis; bar graphs are the exception, with the two axes swapped (see Figure 2.1).

[Figure: “Population Density”; vertical axis: No. of countries; horizontal axis: Popn/sq. km]
FIGURE 2.1(a) Column chart

[Figure: “Population Density”; vertical axis: Popn/sq. km; horizontal axis: No. countries]
FIGURE 2.1(b) Bar chart (same data used)

CLASS EXERCISE 2.1
Does either of the above graphs illustrate the data better?

Where you have multiple data sets with the same categories (such as Table 2.3), they work better when plotted together rather than separately, just as the combined table does.
There are two ways of approaching this, depending on the data:
• multiple columns/bars per category (Figure 2.2a)
• stacked columns (Figure 2.2b) - only appropriate where the data makes sense when it is added together

[Figure 2.2(a): side-by-side columns; horizontal axis: Native, Introduced, Not identified; legend: Urban, Undeveloped]
[Figure 2.2(b): stacked columns; horizontal axis: Urban, Undeveloped; legend: Native, Introduced, Not identified]
FIGURE 2.2 Multiple set column graphs (a) side-by-side and (b) stacked

CLASS EXERCISE 2.2
Which graph in Figure 2.2 is better at portraying the data?

Figure 2.3 shows a different way of plotting the same data as multiple columns. It is not better or worse, simply different. You need to consider what you are trying to illustrate.

[Figure: column graph; horizontal axis: Urban, Undeveloped; legend: Native, Introduced, Not identified]
FIGURE 2.3 Alternative to 2.2(a)

[Figure: column graph; horizontal axis: Native, Introduced, Not identified; legend: Undeveloped, Urban]
FIGURE 2.4 Inappropriate alternative to 2.2(b)

CLASS EXERCISE 2.3
What is wrong with Figure 2.4?

FORMATTING ISSUES – Column graphs

Since there are at least 5 different styles (column, bar, cone etc), think first about which one to use. Then consider whether a 3D graph will be more distracting than useful.

LINE GRAPHS

It is easy to confuse line and scatter charts in Excel, and imagine they are the same thing. In reality, line charts are basically for the same type of data as column type graphs, except that they must not be used for category data, only tallied numerical data, as shown in Figure 2.5 (which uses the same data as the column & bar graphs above).

[Figure: “Population Density”; vertical axis: No. countries; horizontal axis: Popn/sq. km, with group labels <50, 50-100, 100-150, 150-200, 200-250, 250-300, 300-350, >400]
FIGURE 2.5 Line graph showing frequencies of tallied groups (using the same data as in Figure 2.1)

CLASS EXERCISE 2.4
How does the line graph compare to the column/bar graphs for this data?

CLASS EXERCISE 2.5
Why is a line graph wrong for plotting category data?
FORMATTING ISSUES – Line graphs

The only point of using a line graph is to put a line in, so don’t leave it as dots. Since it has to be a “join-the-dots” line (there is no choice in this), it isn’t necessary to plot both the line and the data points. Make sure the line is thicker than the default. But probably the best advice is to think first: should I use a category or scatter graph instead?

SCATTER GRAPHS

When you want to draw a graph with a line in it, this should be your first port of call. To make a sensible line-based graph, both axes need a number associated with them. In other words, two measurements should have been made about each item, eg absorbance and concentration, or pH and time. If there is only one measurement, a scatter graph is not possible.

The scatter graph shows the correlation between the two variables, ie whether one changes in a consistent way when the other changes. This doesn’t have to be a perfect straight line, as it is in calibration graphs.

Excel offers the option of plotting the graph with or without the line, and with or without the points. You might think the line should always be shown, but that is not the case. Sometimes the points alone are sufficient.

If there is an obvious connection between the points, eg the measurements have been made at different times or at different distances, but on the same basic population, then a line is a reasonable way of showing that they are connected (see Figure 2.6). If the two measures have been made on totally different but related items, then a line is not appropriate, and it is best to simply show the points (see Figure 2.7).

If you do include the line, you need to decide whether to show the points as well. If the line is a join-the-dots, it is not so important to show the points (Figure 2.6), but if it is a best-fit line, it is essential; otherwise the graph might give the impression that the points are perfectly in line (Figure 2.8).
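To see why the points matter when a best-fit line is drawn, it can help to calculate the line by hand. The sketch below uses Python rather than Excel, purely for illustration. The calibration readings are hypothetical (they are not the data behind Figure 2.8), and the function name best_fit is made up for this example. It fits a least-squares straight line and then lists how far each point sits from it.

```python
# Hypothetical calibration data (not taken from Figure 2.8):
# concentration in mg/L and the measured absorbance.
conc = [0.0, 5.0, 10.0, 15.0, 20.0]
absorbance = [0.02, 0.24, 0.52, 0.71, 0.98]


def best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = best_fit(conc, absorbance)

# Residual = measured value minus the value the line predicts.
residuals = [y - (slope * x + intercept) for x, y in zip(conc, absorbance)]

print(f"best-fit line: Abs = {slope:.4f} x Conc + {intercept:.4f}")
for x, r in zip(conc, residuals):
    print(f"  conc {x:>4}: point is {r:+.3f} away from the line")
```

None of the residuals is zero, so no point lies exactly on the line. A graph showing the line alone would hide that scatter, which is the reason for plotting the points with it.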
[Figure: vertical axis: Concentration (ppm), 300 to 340; horizontal axis: year, 1958 to 1974]
FIGURE 2.6 Scatter graph with a line showing the time connection of the measurements

[Figure: vertical axis: Nitrate (mg/L), 0.5 to 2.3; horizontal axis: pH, 6 to 9]
FIGURE 2.7 Scatter graph with points only

[Figure: vertical axis: Abs., 0 to 1; horizontal axis: Conc. (mg/L), 0 to 20]
FIGURE 2.8 Scatter graph showing dots and best-fit line

FORMATTING ISSUES – Scatter graphs

Make sure the line is thick enough to be clearly visible. If you have more than one line on the graph, make sure that they differ in style (solid, dotted, dashed) and are not light colours, so that the graph is readable when printed. When using join-the-dots, avoid the auto-smoothed line that Excel defaults to.

PIE AND DOUGHNUT GRAPHS

Pie charts can also be used for category data, where the categories are represented by segments of a circle. The size of each segment is proportional to the relative frequency (or proportion) of that category (see Figure 2.9). A pie chart, therefore, displays information in a similar way to each of the stacked columns in Figure 2.2(b).

Only data collected from the same population should be grouped into a pie chart, for example weights of the different types of recycled materials collected. If related measurements come from different populations – for example masses of paper in recycling from different suburbs – then a pie chart is incorrect.

[Figure: pie chart with segments Not identified, Introduced, Native]
FIGURE 2.9 Typical pie chart

Likewise, it would be inappropriate to leave some categories from a measurement out entirely, as that would artificially increase the importance of the others.

Doughnut graphs allow two related data sets to be plotted together, but can be confusing to read, and are not recommended. Stick to stacked column graphs.

FORMATTING ISSUES – Pie charts

Firstly, do not use 3D pie charts under any circumstances (explained later).
If you intend to print the chart in monochrome, be careful to avoid adjoining segments of the same colour intensity (two dark segments next to each other will merge). Placing the labels next to the segments, rather than in a box-type legend, will make identification easier. It is not necessary to include values, and especially percentages, in the label. If you want to emphasise a particular category, it is common practice to displace that segment slightly from the main pie (like a serve of pie that has been lifted out slightly) (as shown).

Assignment 2
Go to the website and download the two files - data and questions.

2.2 Bad graphs

The basic intention of a graph is to display data pictorially in a sensible and meaningful way. However, many graphs in the public domain are constructed so that they fail to show the true meaning of the data. There are two basic reasons why this happens:
• poor design
• intentional deception

POORLY DESIGNED

Apart from the various aspects described in the previous section, poor graph design can come about through:
• plotting the wrong data
• carelessness
• over-complication
• duplication of information

EXAMPLE 2.1
Fortune magazine is one of the biggest magazines for people in big business. This proves that a lot of money doesn’t automatically make for a good graph. Before your teacher gives you the answers, think about the following:
(i) what information is the graph trying to provide?
(ii) what problem(s) do you see with the graph?
(iii) how would you fix the problem(s)?

EXERCISE 2.6
For each of the following graphs, all of which have been published:
(i) what information is the graph trying to provide?
(ii) what problem(s) do you see with the graph?
(iii) how would you fix the problem(s)?

(a)
(b) Note that the original was in monochrome. This one can’t be printed because of the colours. Assume you could actually read the text.
Source: http://lilt.ilstu.edu/gmklass/pos138/datadisplay/chart_clutter_examples.htm

(c)
(d)
(e)
(f) If you haven’t printed this out in colour, the lighter un-dotted line is distinguished by colour and relates to the left hand vertical axis.
(g)
(h) Another that may not print very well, relying on colour darkness variations.

DELIBERATELY DECEIVING

An English prime minister of the late nineteenth century said that there were three kinds of lies: lies, damned lies and statistics. Graphs are powerful story-tellers, particularly for people in a hurry. It is quite possible for the truth to be hidden in the detail of a graph, and for the picture to distort this truth. There are various ways that this can be done:
• hiding elements of the graph
• distorting elements of the graph
• overemphasising one element of the graph
• making unfair comparisons

If we were doing magic tricks on stage, this would be called “sleight of hand”, or in politics, just everyday business. Remember that the key aspect of a graph is the picture, and that is what the graph liar relies on.

Hiding elements

If a key piece of information that the viewer needs to understand what the data is saying is taken away, then all that is left is to look at the picture and get the story from it. The graph element most commonly removed is the title, but this is less serious than you might imagine. Why? The axis labels or the text surrounding the graph might tell you enough anyway. Far more serious is the removal of the scale from one or both axes.

EXAMPLE 2.2
The graph below shows the trend in carbon dioxide levels above Hawaii from 1958 to 1974. It will be referred to a number of times.

[Figure: vertical axis: ppm CO2, 280 to 400; horizontal axis: year, 1950 to 2010]
EXAMPLE 2.2 (CONT’D)
(a) Is it a problem that the graph has no title?
Yes. While we can tell what is being graphed, we don’t know where the measurements have been taken from (global average?).

(b) How about the same data, plotted without a vertical scale?

[Figure: vertical axis: ppm CO2, no scale shown; horizontal axis: year, 1950 to 2010]

Yes, the picture is the same, but without anything to compare to, the graph is really meaningless. Different people looking at this would interpret it quite differently. Some would be concerned, others not. A graph should tell the same story to everyone.

THE LESSON TO BE LEARNED
If there is no title and nothing else tells you what it is about, ignore the graph entirely! If there is no axis scale, again ignore it!

Distorting elements

Distorting one of the elements of the graph leads to a false picture because something has become exaggerated. The most commonly exaggerated element is the axis scale, even when it is shown. Remember that people look at the picture. Also in this category are the fancy graphical images where a picture of the actual item being measured is used in the graph.

EXAMPLE 2.3
Let’s fiddle with the vertical scale, but show it this time, just not very distinctly.

(a) Now the same data plotted by someone who wants to reduce the apparent rise in CO2 levels.

[Figure: vertical axis: ppm CO2, 0 to 600; horizontal axis: year, 1950 to 2010]

Just looking at the picture (with no scale to refer to) makes it seem that CO2 levels aren’t rising very fast at all.

(b) Now the same data plotted by someone who wants to make the rise in CO2 levels look very serious.

[Figure: vertical axis: ppm CO2, 310 to 390; horizontal axis: year, 1950 to 2010]

Now we’re worried!

With line-based graphs, there are no right or wrong ranges for axis scales as long as the scale is shown. You don’t have to start at 0, but the two examples above are extreme cases and clearly intended to deceive. The original graph in Example 2.2, with some white space above and below, seems a reasonable compromise.
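The effect of the three vertical scales in Examples 2.2 and 2.3 can be put into rough numbers. The sketch below uses Python rather than Excel, purely for illustration. It works out what fraction of the axis height the rise in CO2 occupies for each axis range. The axis ranges are the ones printed on the graphs. The start and end readings (315 and 385 ppm) are assumed values picked for this example, not figures taken from a data table, and fraction_of_axis is a made-up helper name.

```python
def fraction_of_axis(first, last, axis_min, axis_max):
    """Fraction of the axis height covered by the change from first to last."""
    return abs(last - first) / (axis_max - axis_min)


# Assumed readings in ppm CO2 (illustrative only, not measured data).
start_ppm, end_ppm = 315.0, 385.0

# Axis ranges as printed on the graphs in Examples 2.2 and 2.3.
axis_ranges = {
    "Example 2.2 (280 to 400)": (280, 400),
    "Example 2.3(a) (0 to 600)": (0, 600),
    "Example 2.3(b) (310 to 390)": (310, 390),
}

for name, (lo, hi) in axis_ranges.items():
    share = fraction_of_axis(start_ppm, end_ppm, lo, hi)
    print(f"{name}: the rise fills {share:.0%} of the axis height")
```

With these assumed readings the same rise fills a little over half of the axis on the original graph, only a small sliver of the 0 to 600 axis, and nearly all of the 310 to 390 axis. The data has not changed; only the picture has.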
There mightn’t be a rule for axis scales for line/scatter graphs, but there definitely is one for column type graphs: you must start the vertical axis at 0, since the size of the column is proportional to the value. Twice the value means the column must be twice as long. Not helping here is the fact that Excel can default to the wrong scale.

EXERCISE 2.7
The graph below shows the energy consumption in the USA at 5 yearly intervals from 1980. This is properly constructed.

[Figure: column graph; vertical axis: trillion Btu, 0 to 100; horizontal axis: 1980 to 2005 in 5 year steps]

Here is the same data, plotted by someone with an interest in distorting the data. What are they trying to imply to the viewer of the graph? How have they done it?

[Figure: column graph; vertical axis: trillion Btu, 75 to 100; horizontal axis: 1980 to 2005 in 5 year steps]

The author of these notes has heard a professional speaker say that some measure had doubled from the previous year on the basis of a column chart. A close examination of the vertical axis showed that the scale started at 40. This caused the exaggeration, because the previous year had a value of 42 and the current year 44!

There is one very bad distortion of the axis scale that was in the list of no-no’s at the start of this chapter: inconsistent scaling. The scale should be consistent all the way along. For example, if 1 cm = 1 year at the left hand end, it should be the same at the right hand end. Just because it is a general graph no-no doesn’t stop people using it to deceive.

EXAMPLE 2.4
Let’s go back to the “good” vertical scale (and make sure people can actually see it), but monkey around with the horizontal. What does this graph suggest?

[Figure: vertical axis: ppm CO2, 280 to 400; horizontal axis labels: 1958, 1968, 1978, 1994]

It looks like there has been a rapid rise in CO2 from the late 80s, but what really has happened is that the scale on the right hand end is compressed. And yes, this has been done with Excel! Anyone like to suggest how?
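Returning to the column chart anecdote above (values of 42 and 44 on an axis starting at 40), the size of the exaggeration can be checked with a little arithmetic. The sketch below uses Python rather than Excel, purely for illustration, and drawn_height_ratio is a made-up helper name. It compares how tall the two columns are drawn relative to each other when the axis starts at 0 and when it starts at 40.

```python
def drawn_height_ratio(previous, current, axis_start):
    """Ratio of the two drawn column heights when the axis starts at axis_start."""
    # A column is drawn from the bottom of the axis up to its value,
    # so its drawn height is (value - axis_start).
    return (current - axis_start) / (previous - axis_start)


# Values from the anecdote in the notes.
previous_year, current_year = 42.0, 44.0

true_ratio = drawn_height_ratio(previous_year, current_year, axis_start=0.0)
truncated_ratio = drawn_height_ratio(previous_year, current_year, axis_start=40.0)

print(f"axis starting at 0:  current column is {true_ratio:.2f} times the previous one")
print(f"axis starting at 40: current column is {truncated_ratio:.2f} times the previous one")
```

With the axis starting at 40, the drawn columns are 2 and 4 units tall, so the second looks twice the size of the first even though the value has risen by less than 5%. Starting the axis at 0 keeps the drawn ratio equal to the real one, which is the reason for the rule.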
One very popular recent trend – not available to you in Excel – is the use of relevant pictures instead of bars or columns. Instead of a plain old coloured rectangle showing the increase in oil production, you use a picture of an oil drilling rig and make it larger as the value of oil production increases. But there is a problem.

EXAMPLE 2.5
In principle, this is a visually interesting way of presenting the growth in oil production over time. But there is something wrong with it – what?

[Figure: oil rig pictures of increasing size; value labels: 1.4 billion, 2.9 billion, 5.8 billion; year labels: 1960, 1975, 1990]

This may not look as good, but it is not misleading. It is better than stretching the graphic in one direction only.

[Figure: alternative version of the oil production graphic; year labels: 1960, 1975, 1990]

THE LESSON TO BE LEARNED
Always look at the axis scales before making a judgement! Look carefully at fancy graphic column charts.

Over-emphasising one component

If you want to “sell” one particular item in a graph, make it stand out more than it should. 3D pie charts are notorious for this – even if you want to be objective and even-handed, you can’t – which is why you should never use them. 3D pie charts always over-emphasise the importance of the front sector, because you mentally perceive the front edge of the pie to be part of the sector, which makes it seem larger.

EXAMPLE 2.6
Here gas looks more important than coal.

[Figure: 3D pie chart with segments Coal, Oil, Renewables, Nuclear, Gas]

Here it is the other way round.

[Figure: 3D pie chart with segments Nuclear, Gas, Oil, Renewables, Coal]

They are actually of equal value.

A line graph in which one line is much darker (more prominent) than the others emphasises that line. This can be done for good reasons, but it can also be misused.

EXERCISE 2.8
What do you think is going on here?

[Figure: line graph; vertical axis: mg/L effluent, 0 to 9; horizontal axis: 2001 to 2007; lines: Zn, Pb]

THE LESSON TO BE LEARNED
Look past the overly obvious and don’t miss the detail. Ignore 3D pie charts – try to find the data they came from.

Unfair comparisons

Any time you see two separate graphs compared side by side, be wary.
Differences in numerical value will be masked, because each graph is scaled (in terms of physical size) to fit its own data, which hides the differences in value. This is mainly a problem with column and pie charts. When two items being compared are in the same graph, they will be scaled according to their relative amounts. However, when they are in separate graphs, then unless the scaling is exactly the same, differences will disappear.

EXERCISE 2.9
What are the following graphs trying to tell you about energy output in the US?

[Figure: two separate graphs, labelled Renewables and Coal, each covering 1960 to 2000; one vertical axis runs from 0 to 7, the other from 0 to 25]

How should they have been plotted?

EXERCISE 2.10
Now for some real graphs. For each:
1. describe what the PICTURE ONLY indicates
2. explain where the lying is going on
3. describe how the graph should have been done to be accurate

Source: http://junkcharts.typepad.com/junk_charts/

(a) Here is Steve Jobs, the co-founder of Apple, marketing a new product (not possible to print this one).
(b) Some more computer-related deception.
(c) Surely someone’s having a joke here?
(d) Crime figures from the US. The three columns per city are for the 3 different years.
(e) A report on steroid use in American baseballers. Bear in mind that most teams have 3-4 times as many pitchers as any other position player.
(f) The effect of inflation – top graphic represents 1958, then each 5 years.
(g) ODA is an acronym for Overseas Development Assistance.

Interpreting graphs

There is a skill in creating a “good” graph which tells the story that you want it to, objectively and clearly. Likewise, gaining information from other people’s graphs is equally important.

EXERCISE 2.11
The graphs for these exercises are on separate pages following.

(a) Monthly rainfall data - Newcastle
1. What type of graph is this?
2. What information is the graph providing?
3. Can you see any problems with the graph?
4. Which month has the highest average rainfall?
5. What is the rainfall for that month?
6. Which month has the lowest average rainfall?
7. What is the rainfall for that month?
8. Which month has the highest average number of raindays?
9. What is the number of days for that month?
10. Which month has the lowest average number of raindays?
11. What is the number of days for that month?

(b) Trend in number of hot days
1. Give an example of a region showing an increase of 5 hot days/10 years.
2. Give an example of a region showing an increase of 2 hot days/10 years.
3. Give an example of a region showing a decrease in hot days.
4. Give FIVE important pieces of general information to be gained from this graph.

(c) Average annual thunder days
1. Where in Australia has the greatest amount of thunder?
2. Where in Australia has the least amount of thunder?
3. Is there a general trend in thunder behaviour from north to south? What is it?
4. Give TWO examples of regions that are exceptions to the basic trend.

(d) Air temperature vs wind speed
1. What type of graph is this?
2. Which axis belongs to windspeed? Air temperature?
3. What is the (i) lowest and (ii) highest windspeed recorded?
4. What is the (i) lowest and (ii) highest temperature recorded?
5. What conclusions can you draw from the graph?

EXERCISE 2.11 - GRAPHS
(a)
(b)
(c)
(d) Air Temperature (deg C) vs Windspeed (knots). Data measured every 10 minutes over the last 5 days – Casey station, Antarctica

EXERCISE 2.12
Below is a “map” showing ozone concentrations in the atmosphere above an Antarctic monitoring station in 2006. The colours are important. Use it to answer the questions on the next page.
EXERCISE 2.12 (CONT’D)
(a) Given the ozone layer is a zone of the atmosphere with relatively high concentrations of ozone, what is its altitude range?
(b) Given the ozone hole is a loss in concentration in the layer, in which months does it occur?
(c) You are required to draw a graph showing the change in ozone concentration in the ozone layer across the year. How could you do it?
(d) You are required to draw a graph showing the change in ozone concentration in the atmosphere at two times during the year to demonstrate the difference between “normal” and the ozone hole period. How could you do it?
(e) You are required to answer the question “Has the ozone hole decreased?”. How could you use information from this 2006 map to help answer this question?
(f) What other information would you need?

EXERCISE 2.13
You are provided with graphs and tables summarising a year’s worth of pollution data for Sydney measured in three sub-regions: Central East, North West and South West. You are required to use these summaries to answer the following questions.
1. How does air pollution in Sydney vary throughout the year?
2. How does air pollution in Sydney vary geographically?

Look at each summary and find evidence relating to the two questions above (note that you may not find evidence for both questions from an individual summary).

Summary 1
Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Summary 2
Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Summary 3
Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Summary 4
Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Summary 5 - Low/Medium/High Readings Monthly By Region
Low (L), medium (M) and high (H) are gradings (categories) given based on the actual numerical pollution values.

Central East
Grade   J   F   M   A   M   J   J   A   S   O   N   D
L      26  22  25  25  28  26  26  29  27  22  21  24
M       4   5   5   5   3   4   5   2   3   6   7   5
H       0   1   1   0   0   0   0   0   0   1   2   1

North West
Grade   J   F   M   A   M   J   J   A   S   O   N   D
L      22  19  22  29  30  23  29  26  28  21  16  18
M       9   8   9   1   1   7   2   4   2   8  12  11
H       0   1   0   0   0   0   0   1   0   0   2   1

South West
Grade   J   F   M   A   M   J   J   A   S   O   N   D
L      18  17  23  27  30  29  30  29  27  17  15  14
M       9   8   8   3   1   1   0   1   3  12  10  14
H       3   3   0   0   0   0   1   1   0   0   5   2

Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Summary 6 - Daily Comparison of Readings Between Regions
The values in these graphs are the number of days on which a particular sub-region has the highest/higher value that day.

[Figures: “ALL THREE” and “Pairs of regions”]

Does this summary provide evidence to help answer Q1? If so, what is it?
Does this summary provide evidence to help answer Q2? If so, what is it?

Now use this evidence to draw some conclusions.
Variation across the year
Variation across Sydney
What does Summary 6 tell you?

Assignment 3
Download the files (pdf) containing the graphs and questions from the Assignment webpage.