1 Line Graphs

1.1 Introduction

Data collected from experiments can be presented in different ways, such as tables, charts or graphs. Tables are useful because they can represent the original data precisely. Extracting trends by inspecting tables of data is, however, difficult, and for this purpose charts and graphs are, in practice, much more useful. The human eye is very good at spotting trends or irregularities in a visual representation. The problem with a graphical representation, however, is that it is only an approximate representation of the data. Nevertheless, depicting data in graphical form is a widely used tool for describing data. There are many different ways to represent data graphically; examples include bar charts, pie charts and, of particular interest to this book, line graphs. Line graphs are used to depict the relationship of one variable against another and are by far the most common type of visual representation used in reporting scientific data.

When constructing line graphs, data are plotted with respect to two axes, called the X-axis and the Y-axis. The X-axis is usually used to depict the independent variable, that is, the variable under the control of the experimenter. Examples of independent variables include the voltage applied to a circuit, the amount of substance added to a reaction, or soil types in a plant growth experiment. A special kind of independent variable is time: we cannot easily control time except to select the start and end times of an experiment. Time is nevertheless a common independent variable; for example, we might measure the distance an object travels over time. Whatever independent variable we choose, the Y-axis is used to depict the dependent variable, that is, the variable under observation that depends on the independent variable.
For example, we might measure the current through an electrical circuit as we control the input voltage, or we might measure the amount of gas emitted from a chemical reaction as a function of the starting amount of reactant. It is possible to have multiple dependent variables plotted with respect to one independent variable, or multiple dependent variables plotted with respect to separate independent variables.

Table 1.1 shows a typical set of observations from an experiment where the current through part of an electrical circuit is measured as a function of the input voltage. The left column (voltage) represents the independent variable and the right column (current) represents the dependent variable.

Voltage (Volts)    Current (mA)
0.5                20
1.0                38
1.5                67
2.0                80
2.5                102

Table 1.1: A single independent variable in the left column (Volts) and a single dependent variable in the right column (mA).

Additional columns representing further dependent variables can be included, as can further independent variables. For example, two voltages may be applied to a circuit and multiple currents measured; see Table 1.2.

Voltage A    Current A    Voltage B    Current B
0.5          20           4.5          120
1.0          38           5.5          129
1.5          67           6.5          141
2.0          80           7.5          142
2.5          102          8.5          151

Table 1.2: An example of multiple independent and dependent variables. All voltages are expressed in units of volts and all currents in units of mA.

Most common are data that comprise one independent variable with one or more dependent variables. By convention, values increase from left to right on the X-axis and from bottom to top on the Y-axis. Graphs will often have a zero origin, but this is not always necessary, though its absence can sometimes give a misleading impression of the data (see later section). To the left of and below the origin the axis graduations are negative (Figure 1.2). The graduation scales on the X and Y axes are often linear, but this need not always be the case.
In a later chapter a common alternative scale, the logarithmic scale, will be discussed. For now we will assume all scales are linear.

Figure 1.1: Data plotted from Table 1.2 illustrating multiple independent and dependent variables.

Figure 1.2: Graph quadrants.

1.1.1 Graph Paper

Although many graphs are today generated by computer, it is still instructive, particularly for the new student, to draw graphs manually using traditional graph paper. Graph paper comes in a great variety of types and can be purchased either as loose leaf or bound into notebooks. When purchasing graph paper it is important to distinguish between quadrille paper (sometimes called quad paper) and graph paper. Quad paper is simply a coarse grid in gray or light blue that extends to the very edge of the paper. Although often called graph paper, quad paper is not suitable for plotting data and should be avoided. Proper graph paper is sometimes sold as engineering paper and is often printed in light green. The grid is printed precisely, with bold lines to indicate major divisions. The grid does not extend to the edge of the paper, leaving a clear margin around the edge. Sometimes the grid is drawn on both sides of the page. When the grid is drawn on only one side of the page, it is often possible to draw the graph on the clear side because the grid lines will show faintly through the paper.

Sizes vary, but a suitable grid size for making graphs is the common ten inch by seven inch grid with each inch square divided into ten divisions. If possible, avoid the coarse five by five grid within the inch square. The same applies to metric graph paper, which can be found with a 27 cm by 19 cm grid. Each major line delineates a 1 cm by 1 cm square with a ten by ten 1 mm grid within each 1 cm square.
When drawing the graph manually it is sometimes more convenient to orient the graph paper horizontally.

Another important class of graph paper is logarithmic paper. Logarithmic axes will be discussed in more detail in a later chapter, but logarithmic graph paper comes in two types: semi-logarithmic and full logarithmic paper. Semi-logarithmic paper is usually logarithmic along the vertical axis while the horizontal axis is linear. Full logarithmic paper is logarithmic along both the horizontal and vertical axes.

Figure 1.3: Typical metric engineering graph paper. The resolution is 1 mm with major divisions at 1 cm intervals. The major divisions are often drawn using a bolder line.

1.2 Parts of a Graph

The parts of a graph shown in Figure 1.4 are probably well known to the reader. Here we make some stylistic comments on the different parts.

1.2.1 Main Title

The main title should clearly state the purpose of the graph. It is insufficient and redundant to state in the title that the graph represents some dependent variable plotted against some independent variable. For example, a poor title would be “Velocity vs. Time”. The title should express some meaning relevant to the data; an improved title might be “The velocity of a cannon ball fired from a Mark-II cannon”. While descriptive, a title should also be succinct. Additional information on the graph can be placed either in the main text of the hosting document or in the graph caption. If possible, excessive detail should not be placed in the main text in order to avoid any unnecessary distraction.

1.2.2 Axes Titles

The X and Y axes titles should use complete words and include the units. Unless it is not possible, avoid axes titles such as “y” or “t”; instead use “Displacement (cm), d” or “time (secs), t”.
The Y axis title can be oriented either horizontally or vertically depending on the size of the text and the available space, but the vertical orientation is generally preferred. The axes will also include labels that indicate the scale; these should invariably be oriented horizontally next to the major axis divisions. The X and Y axes are also called the abscissa and the ordinate respectively.

Figure 1.4: Parts of a graph: 1. Main Title; 2. Y-axis Title; 3. X-axis Title; 4. Lines Between Points; 5. Point Markers; 6. Legend; 7. Tick Marks; 8. Origin; 9. Caption to Figure. Another useful part that isn’t shown in the figure is a grid.

1.2.3 Scale and Tick Marks

The scales on the axes should always be chosen so that they can be easily read. Good choices for the major divisions include multiples of 1, 2 and 5, since these values are easy to read and will fall on the easily read divisions (see Figure 1.4). The scale will usually be assigned a unit, and it is recommended that the appropriate standard prefix, such as kilo or micro, be used instead of an exponent. For example, it is best to avoid a scale such as $\times 10^{-4}$ because of ambiguity in the notation. Thus if the X axis indicates length in meters, then the units $10^{-4}$ m should be replaced if possible with $\mu$m so that $5 \times 10^{-4}$ m becomes 500 $\mu$m.

Tick marks should be placed next to both the major divisions and the subdivisions. A grid that overlays the graph can also be very useful when taking readings off fitted lines or computing slopes from curves (see later). Grids can involve lines along the major and minor divisions. If a graph will be used to display general trends, as is often the case with economic data for example, then the grid can be omitted or restricted to lines on the major divisions.
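
The guideline above on choosing major divisions can be sketched in code. The following is a minimal Python illustration rather than a method given in the text: the function names `nice_step` and `major_ticks` and the `max_divisions` parameter are assumptions introduced for the example. It picks the smallest step of the form 1, 2 or 5 times a power of ten that keeps the number of major divisions within a chosen limit.

```python
import math


def nice_step(span, max_divisions=10):
    """Return the smallest step of the form 1, 2 or 5 times a power of ten
    that covers `span` with at most `max_divisions` major divisions.

    Hypothetical helper for illustration only.
    """
    if span <= 0 or max_divisions < 1:
        raise ValueError("span must be positive and max_divisions at least 1")
    raw = span / max_divisions  # smallest step that stays within the limit
    exponent = math.floor(math.log10(raw))
    # Try candidates in increasing order across two decades.
    for power in (exponent, exponent + 1):
        for multiple in (1, 2, 5):
            step = multiple * 10.0 ** power
            if step >= raw * (1 - 1e-9):  # small tolerance for rounding error
                return step
    return 10.0 ** (exponent + 2)  # defensive fallback, not normally reached


def major_ticks(axis_max, max_divisions=10):
    """Major tick values from zero up to at least `axis_max`."""
    step = nice_step(axis_max, max_divisions)
    count = math.ceil(axis_max / step - 1e-9)
    return [i * step for i in range(count + 1)]


print(major_ticks(50, max_divisions=8))
print(major_ticks(60, max_divisions=4))
```

For an axis running from 0 to 50 with at most eight major divisions, this sketch selects steps of 10 rather than a fractional step such as 50/8 = 6.25. Restricting the candidates to 1, 2 and 5 is what keeps the tick labels easy to read.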
In all cases, grid lines should be of a lighter color than the axes, border and lines between the data markers.

1.2.4 The Origin

Including an origin on a graph is generally good practice. One of the significant issues in starting an axis at a value other than zero is that it can easily give a false impression of trends in the data. For example, Figure 1.5 shows two representations of the same data, which describe the evolution of hydrogen from the reaction of hydrochloric acid with zinc metal. The graph on the left starts the Y-axis origin at 40 and suggests at first glance that the evolution of hydrogen is quite rapid. The graph on the right shows the same data but with the Y-axis now starting at zero. From this perspective, the rate of evolution of hydrogen is much more modest. It is very easy to give a false impression of a trend by adjusting the starting point on the axes, and readers have to take great care to note the origin before drawing any immediate conclusions.

Figure 1.5: Misleading effects of using non-zero origins. The graph on the left, a), has the Y-axis origin starting at 40. The steepness of the line implies a rapid production of hydrogen. However, in graph b), the Y-axis starts at zero and the rate of production now looks much more modest.

Although an absent zero origin can be a disadvantage, if the presence of an origin means that the data points are grouped into one small corner of the graph then the graphing space is not being used effectively (see Figure 1.7). In such a situation it is possible to use graphical cues to help the reader quickly understand the scale. One such device is shown in Figure 1.6. Other devices include a simple break in the axis marked by two break lines.

1.2.5 Markers

Data markers indicate the location of the individual data points.
In the example in Figure 1.4, open circles are used. Such symbols are adequate if only general trends need to be discerned from the graph. However, if more detailed information such as slopes or distance readings needs to be measured, then empty markers with a central point to mark the actual data are extremely useful. With computer generated graphs this distinction is no longer so important. However, hardcopy printouts of graphs would still benefit from this.

Figure 1.6: Two different visual cues that can be used to help the reader identify

Figure 1.7: Ineffective use of the graphing space.

1.2.6 Lines

There is some difference of opinion on how lines should be drawn between markers in a line graph. One opinion is that markers should never be joined with straight lines. However, there are situations when this approach is useful and situations when it is misleading. The key consideration is whether there is evidence to suggest a definite mathematical trend. For example, if a set of data is reasonably described by a straight line then a straight line fit should be used. The same applies to other possible trends, such as exponential. If no reasonable trend is known for the data, joining markers with straight lines is acceptable.

If there is a large number of data points then a scatter plot can be a better choice than connecting points with straight lines. Compare, for example, Fig 1.8 and Fig 1.9. In Fig 1.8, 200 data points are plotted as a scatter plot and the trend in the data can be seen clearly. By comparison, Fig 1.9 shows the same plot with the data points joined by straight lines. The lines joining the points add nothing to the graph and, if anything, may sometimes obscure the data trend.
Lines should always be drawn first so that the data point markers appear drawn above the line. One of the worst things that novice students will invariably do is to fit nth order polynomials to the data so that a line goes through every point. In fact there is invariably no justification for assuming that the data follow such a trend. The rule is: if a theory exists for the data that suggests a particular fit, then use it; otherwise join the markers with straight lines. There is a temptation to use bar charts in these situations, but if the variables are continuous then a line graph should be used where possible. Finally, any lines joining markers or curve fits should be at least twice as thick as the axis lines.

1.2.7 Legend

If the graph shows more than one data set then a legend must be shown. Usually a legend will display a segment of the line and marker, together with a very brief description. There is some debate on whether the legend should be placed inside or outside the graphing area. The decision should depend on the particular graph. If there is ample room within the graphing area then the legend can safely be located there. The danger is that if the graphing area is already crowded then adding a legend to it will only confuse the reader further, and in these instances the legend can be placed outside the graph area. Figures 1.1 and 1.16 illustrate legends placed inside the graphing area.

Figure 1.8: Scatter plot of 200 data points.

1.2.8 Error Bars

No measurement in the real world can ever be exact; that is, any measurement will include some level of uncertainty. Such uncertainty can be due to many factors, which will be discussed in more detail in chapter 2.
It is convention that any uncertainties in the data be expressed in the form of error bars on the data markers. Error bars can express uncertainty in either the X or Y axis direction. Often an experimentalist will assume that the uncertainty in the independent variable is so small as to be considered negligible. It should always be made absolutely clear in the figure caption what the error bars represent because there are various ways to measure uncertainty. Error bars could represent the range, the standard deviation or the standard error. Each measure of uncertainty is different and it is important to state which measure is employed.

Figure 1.9: A graph of 200 data points where each point is connected by a straight line.

Figure 1.10: Overenthusiastic fitting of data to a 7th order polynomial. Unless there is evidence, such fitting should be avoided at all costs.

1.3 Slopes and Straight Line Fitting

The Universe is full of things that change. A gas from a chemical reaction is generated at a certain rate of volume per unit time, or a car moves at a given rate of distance per unit time. As scientists we are always interested in how fast things change. To measure a rate of change we record the property of interest (distance, volume etc.) over a given time period. For example, if a car moves 10 miles in 20 minutes then we say that the rate of change of distance of the car is 10/20 miles per minute, or 30 miles per hour. A graph is an ideal place to measure the rate of change of some property. For example, consider the position of a car on a road over time. Table 1.3 shows the position of the car over a 50 minute period. The graph shown in Fig 1.12 plots the data from Table 1.3.
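
The rate-of-change arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `average_rate` is an assumption made for the example, and the time and distance values are those listed in Table 1.3.

```python
def average_rate(change, elapsed):
    """Average rate of change: amount of change divided by elapsed time.

    Hypothetical helper for illustration only.
    """
    if elapsed == 0:
        raise ValueError("elapsed time must be non-zero")
    return change / elapsed


# The car example from the text: 10 miles in 20 minutes.
miles_per_minute = average_rate(10, 20)
miles_per_hour = miles_per_minute * 60  # 60 minutes in an hour
print(miles_per_minute, miles_per_hour)

# Average speed over each 10 minute interval of Table 1.3.
times = [0, 10, 20, 30, 40, 50]               # minutes
distances = [0, 3.4, 11.2, 14.2, 21.3, 23.9]  # km
points = list(zip(times, distances))
interval_speeds = [
    average_rate(d2 - d1, t2 - t1)
    for (t1, d1), (t2, d2) in zip(points, points[1:])
]
print([round(v, 2) for v in interval_speeds])  # km per minute
```

The interval speeds computed this way range from roughly 0.26 to 0.78 km per minute, which gives a sense of how much the individual readings scatter around a single overall rate.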
The graph shows a fairly consistent trend as the car travels the distance over time. The data shows some variation that might be attributed to road signals, heavy traffic and other unpredictable events. On an ideal trip with all signal lights set to green and not a single other car on the road we might expect our car to drive at a relatively constant speed. In this situation we would expect all the data points to lie on a straight line starting at zero. Even though the actual data is variable, we can get a good idea of this ideal speed by drawing the “best” straight line through the points. There are various ways to do this; two common methods are drawing the line midway between two extreme slopes, and running a computer program that computes the best line by minimizing the distances between the data points and the line. The computer method will be described in a later chapter. Here we will briefly discuss a manual method for estimating the best line by plotting two lines that correspond to the steepest and shallowest lines on the graph.

Figure 1.11: Two sets of data plotted with error bars. Graphs that show error bars must always include a statement on what the error bars represent, for example, the range, standard deviation or standard error.

Time (mins)    Distance (km)
0              0
10             3.4
20             11.2
30             14.2
40             21.3
50             23.9

Table 1.3: Distance traveled by a car as a function of time.

Figure 1.13 shows our attempt at drawing the steepest and shallowest lines through the data. This method is somewhat subjective but is a good first approach to estimating a best line. From the two slopes we draw a mid line between the slopes; this mid line is deemed the best fit. From the best straight line we can now compute the rate of change of distance over time, that is, the velocity. Figure 1.14 shows the same graph but with the steepest and shallowest lines removed.
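
As a rough sketch of the computer-based approach mentioned above, the following Python code fits a straight line to the Table 1.3 data by ordinary least squares, which minimizes the sum of squared vertical distances between the data points and the line. The choice of least squares and the function name `fit_line` are assumptions made for this illustration; the text does not say which minimization it has in mind, and the method described in the later chapter may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x about its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data from Table 1.3.
times = [0, 10, 20, 30, 40, 50]               # minutes
distances = [0, 3.4, 11.2, 14.2, 21.3, 23.9]  # km
slope, intercept = fit_line(times, distances)
print(f"slope = {slope:.2f} km per minute, intercept = {intercept:.2f} km")
```

For these data the fitted slope comes out at about 0.50 km per minute, which can be compared with the value read off the manually drawn mid line. Note that this fit does not force the line through the origin; constraining the intercept to zero would be a different, equally simple calculation.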
The slope of any line is given by the distance traversed vertically divided by the distance traversed horizontally. That is:

$$\text{Slope} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$$

The points can be read directly off the graph and the slope computed. For example, from the graph we can record the x and y values that correspond to the dotted lines on the graph. That is: $x_1 = 20.25$; $x_2 = 36$; $y_1 = 10$ and $y_2 = 18$. Inserting these into the slope formula yields:

$$\text{slope} = \frac{18 - 10}{36 - 20.25} = 0.51 \ \text{km min}^{-1}$$

Figure 1.12: Data plotted from Table 1.3.

Figure 1.13: Data plotted from Table 1.3 illustrating estimated steepest and shallowest slopes.

Figure 1.14: Data plotted from Table 1.3 with straight line through points. The slope of the line is given by $\Delta y / \Delta x$.

1.4 Poor Layout

The following figures give examples of poor layout. Figure 1.15 shows some typical errors made in drawing a graph. The first layout issue is the title (1), which has three problems. First, the title font is too small. Secondly, the title simply states what can already be seen from the axes titles; the title should be more descriptive. Finally, even if the text of the title were appropriate, the title is in fact wrong: it describes a graph of t vs. s rather than s vs. t. The axes titles (2) are also too small and are largely non-descriptive. In addition, no units are given in the axes titles. The divisions on the x axis (3) are completely inappropriate. To begin with, there are too many major divisions, and secondly, the divisions should be on even values rather than fractional values as they are in the graph.
Finally, the markers (4) used to indicate the data points are much too small.

Figure 1.15: Poorly laid out graph showing (1) poor choice of main title; (2) poor axes titles; (3) inappropriate x axis divisions; and (4) data markers that are much too small. See text for details.

Figure 1.16 shows the same graph as Figure 1.15 but with significant improvements. The title is now bigger and more descriptive. The axes titles are also much more descriptive and give the units. The x axis major divisions are now much more reasonable and readable. The marker for the data points has been enlarged and a single point has been placed in the center of each marker to indicate the actual location of the coordinate. Finally, to make it easier to read information off the graph, a grid has been added with major and minor divisions.

Figure 1.16: The same graph shown in Fig 1.15 but with significant improvements to titles, axis divisions and data markers, and with a grid overlay to assist in reading data off the graph.

Figure 1.18 shows the same graph as Figure ?? but with significant improvements. In particular the y axis title is now present, the x range has been reduced to make better use of the graphing space, and the line and marker styles have been changed to allow the two data sets to be distinguished. In addition, a legend has been added to make the discrimination easier.

The one issue that hasn’t been addressed relates to the line styles, in particular the use of color. Traditionally, color has tended to be avoided because there is no guarantee that the graph will be displayed or printed in color, thereby making it difficult to distinguish different data sets.
If it is known that the graph will be displayed on a color device, such as a computer monitor, or in a full color book, then color is appropriate, although some individuals are color blind and the use of red/green combinations can therefore be problematic. If it is likely that the graph will be viewed in black and white then the data sets can be distinguished by marker symbols, line thickness or even whether the lines are dotted or solid. In either case, care should be taken when using color; it is easy to assume that everyone has the capacity to see color when the opposite may be true.

Figure 1.17: Poorly laid out graph showing (1) two data sets using the same line and marker styles, so that the data sets cannot be distinguished; (2) missing y axis title; (3) inappropriate use of space, since the x axis range extends far too much to the right; (4) missing legend, although a legend wouldn’t help much here because of the inappropriate use of line and marker styles. See text for details.

Figure 1.18: The same graph shown in Fig 1.17 but with significant improvements to title axis divisions, data markers and a grid overlay