1 Line Graphs

1.1 Introduction
Data collected from experiments can be presented in different ways such as tables, charts or graphs. Data in the form of tables is useful because a table can precisely represent the original data. Extracting data trends by inspection from tables of data is, however, difficult and for this purpose charts and graphs are, in practice, much more useful. The human eye is very good at spotting trends or irregularities in a visual representation. The problem with a graphical representation, however, is that it is only an approximate representation. Nevertheless, depicting data in graphical form is a widely used tool for describing data.
There are many different ways to graphically represent data. Examples include bar charts, pie charts and, of particular interest to this book, line graphs. Line graphs are used to depict the relationship of one variable against another and are by far the most common type of visual representation used in reporting scientific data. When constructing line graphs, data are plotted with respect to two axes, called the X-axis and the Y-axis. The X-axis is usually used to depict the independent variable, that is the variable under the control of the experimenter. Examples of independent variables include the voltage applied to a circuit, the amount of substance added to a reaction, or soil types in a plant growth experiment. A special kind of independent variable is time: we cannot easily control time except to select the start and end times in an experiment. Time however is a common independent variable, for example we might measure the distance an object travels in time. Whatever independent variable we choose, the Y-axis will be used to depict the dependent variable, that is the variable under observation that depends on the independent variable. For example, we might measure the current through an electrical circuit as we control the input voltage or we might measure the amount of gas emitted from a chemical reaction as a function of the starting amount of reactant. It is possible to have multiple dependent variables plotted with respect to one independent variable or multiple dependent variables plotted with respect to separate independent variables.
Table 1.1 shows a typical set of observations from an experiment where the current through part of an electrical circuit is measured as a function of the input voltage. The first column on the left (voltage) represents the independent variable and the second column on the right (current) represents the dependent variable.

Voltage (Volts)    Current (mA)
0.5                20
1.0                38
1.5                67
2.0                80
2.5                102

Table 1.1: Table showing a single independent variable in the left column (Volts) and a single dependent variable in the right column (mA)
Additional columns representing new dependent variables can be included in addition to further independent columns. For example two voltages may be applied to a circuit and multiple currents measured, see Table 1.2.

Voltage A    Current A    Voltage B    Current B
0.5          20           4.5          120
1.0          38           5.5          129
1.5          67           6.5          141
2.0          80           7.5          142
2.5          102          8.5          151

Table 1.2: An example of multiple independent and dependent variables. All voltages are expressed in units of volts and all currents in units of mA.
Most common are data that comprise one independent variable with
one or more dependent variables.
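As a rough illustration, the layout of Table 1.2 could be held in a program as one pair of columns per data set, so that each dependent variable stays attached to its own independent variable. The sketch below is in Python; the names `Series` and `table_1_2` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Series:
    """One dependent variable recorded against its own independent variable."""
    label: str
    x: List[float]  # independent variable (voltage, volts)
    y: List[float]  # dependent variable (current, mA)

    def points(self) -> List[Tuple[float, float]]:
        """Return the (x, y) pairs in the order they would be plotted."""
        if len(self.x) != len(self.y):
            raise ValueError("each x value needs exactly one y value")
        return list(zip(self.x, self.y))


# Values copied from Table 1.2.
table_1_2 = [
    Series("Current A", [0.5, 1.0, 1.5, 2.0, 2.5], [20, 38, 67, 80, 102]),
    Series("Current B", [4.5, 5.5, 6.5, 7.5, 8.5], [120, 129, 141, 142, 151]),
]

for series in table_1_2:
    print(series.label, series.points())
```

A single independent variable with several dependent variables would simply reuse the same `x` list in every series.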
By convention, values increase from left to right on the X-axis and from bottom to top on the Y-axis. Graphs will often have a zero origin but this is not always necessary, though its absence can sometimes give a misleading impression of the data (see Section 1.2.4). To the left of and below the origin the axis gradations are negative (Figure 1.2).
The graduation scales on the X and Y axes are often linear but this need not always be the case. In a later chapter a common alternative scale, the logarithmic scale, will be discussed. For now we will assume all scales are linear.
1.1.1 Graph Paper
Although many graphs are today generated by computer, it is still instructive, particularly for the new student, to draw graphs manually
using traditional graph paper. Graph paper comes in a great variety
of types and can be purchased as either loose leaf or bound into notebooks.

[Figure: Current (mA) plotted against Voltage (volts) for the two data sets Current A and Current B.]
Figure 1.1: Data plotted from Table 1.2 illustrating multiple independent and dependent variables.
When purchasing graph paper it is important to distinguish between
quadrille paper (sometimes called quad paper) and graph paper. Quad
paper is simply a coarse grid in gray or light blue that extends to the
very edge of the paper. Although often called graph paper, quad paper
is not suitable for plotting data and should be avoided. Proper graph
paper is sometimes sold as engineering paper and is often printed in
light green. The grid is printed precisely, with bold lines to indicate
major divisions. The grid does not extend to the edge of the paper,
leaving a clear margin surrounding the edge. Sometimes the grid is
drawn on both sides of the page. When the grid is only drawn on one
side of the page it is often possible to draw the graph on the clear side
because the grid lines will show faintly through the paper.
Sizes vary but a suitable grid size for making graphs is the common ten inch by seven inch grid with each inch square divided into ten divisions. If possible avoid the coarse five by five grid within the inch square. The same applies to metric graph paper, which can be found as 27 cm by 19 cm grids. Each major line delineates a 1 cm by 1 cm square with a ten by ten 1 mm grid within each 1 cm square.

[Figure: X and Y axes crossing at the origin, each graduated from -5 to 5.]
Figure 1.2: Graph quadrants
When drawing the graph manually it is sometimes more convenient
to orientate the graph paper horizontally.
Another important class of graph paper is logarithmic paper. Logarithmic axes will be discussed in more detail in a later chapter but
logarithmic graph paper comes in two types, semi-logarithmic and
full logarithmic paper. Semi-logarithmic paper is usually logarithmic
along the vertical axis while the horizontal axis is linear. Full logarithmic paper has logarithmic axes on both the horizontal and vertical
axes.
1.2 Parts of a Graph
The parts of a graph shown in Figure 1.4 are probably well known to
the reader. Here we make some stylistic comments on the different parts.

Figure 1.3: Typical metric engineering graph paper. Resolution is one mm with major divisions at 1 cm intervals. The major division is often drawn using a bolder line.
1.2.1 Main Title
The main title should clearly state the purpose of the graph. It is insufficient and redundant to state in the title that the graph represents some
dependent variable plotted against some independent variable. For
example a poor title would be “Velocity vs. Time”. The title should
express some meaning relevant to the data, for example an improved
title might be: “The velocity of a cannon ball fired from a Mark-II
cannon”. While descriptive, a title should also be succinct. Additional information on the graph can be either placed in the main text
of the hosting document or in the graph caption. If possible, excessive detail should not be placed in the main text in order to avoid any
unnecessary distraction.
1.2.2 Axes Titles
The X and Y axes titles should use complete words and include the units. Unless it is not possible, avoid axes titles such as “y” or “t”; instead use “Displacement (cm), d” or “time (secs), t”. The Y axis title can be oriented either horizontally or vertically depending on the size of the text and available space but the vertical orientation is generally preferred. The axes will also include labels that indicate the scale and these should invariably be oriented horizontally next to the major axis divisions. The X and Y axes are also called the abscissa and the ordinate respectively.
[Figure: example graph titled “Graph Title” with “X Axis” and “Y Axis” titles, a “Y1 Data” legend entry and numbered callouts 1 to 9.]
Figure 1.4: Parts of a graph: 1. Main Title; 2. Y-axis Title; 3. X-axis Title; 4. Lines Between Points; 5. Point Markers; 6. Legend; 7. Tick Marks; 8. Origin; 9. Caption to Figure. Another useful part that isn’t shown in the figure is a grid.
1.2.3 Scale and Tick Marks
The scales on the axes should always be chosen so that they can be easily read. Good choices for the major divisions include multiples of 1, 2 and 5 since these values are easy to read and will fall on the easily read divisions (See Figure 1.4). The scale will usually be assigned a unit and it is recommended that the appropriate standard prefix such as kilo or micro be used instead of an exponent. For example it is best to avoid a scale such as $\times 10^{-4}$ because of ambiguity in the notation. Thus if the X axis indicates length in meters, then the units $10^{-4}$ m should be replaced if possible with µm so that $5 \times 10^{-4}$ m becomes 500 µm.
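The two recommendations above, major divisions in multiples of 1, 2 or 5 and a standard prefix in place of an exponent, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `nice_step` and `with_prefix`, the default of about five major divisions, and the small prefix table are all choices made for this example and are not taken from the text.

```python
import math


def nice_step(data_range: float, target_ticks: int = 5) -> float:
    """Pick a major-division size of the form 1, 2 or 5 times a power of ten."""
    if data_range <= 0 or target_ticks <= 0:
        raise ValueError("data_range and target_ticks must be positive")
    raw = data_range / target_ticks
    power = 10.0 ** math.floor(math.log10(raw))
    fraction = raw / power  # normally lies in [1, 10)
    for candidate in (1, 2, 5, 10):
        if fraction <= candidate:
            return candidate * power
    return 10 * power


# Only a handful of prefixes are listed; this table is an assumption of the sketch.
SI_PREFIXES = {-6: "µ", -3: "m", 0: "", 3: "k", 6: "M"}


def with_prefix(value: float, unit: str) -> str:
    """Rewrite a value so that a standard prefix replaces the power of ten."""
    if value == 0:
        return f"0 {unit}"
    exponent = int(math.floor(math.log10(abs(value)) / 3) * 3)
    exponent = max(-6, min(6, exponent))  # stay inside the prefix table
    scaled = value / 10.0 ** exponent
    return f"{scaled:g} {SI_PREFIXES[exponent]}{unit}"


print(nice_step(25))            # axis spanning 0 to 25
print(nice_step(8))             # axis spanning 0 to 8
print(with_prefix(5e-4, "m"))   # a length of 0.0005 m
```

The step size is rounded up to the next value in the 1, 2, 5 sequence, so the number of major divisions never exceeds the target.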
Tick marks should be placed next to both the major divisions and the subdivisions. A grid that overlays the graph can also be very useful when taking readings off fitted lines or computing slopes from curves (see later). Grids can involve lines along the major and minor divisions. If a graph will be used to display general trends, as is often the case in economic data for example, then a grid can be omitted or grid lines used on the major divisions only. In all cases, grid lines should be of a lighter color than the axes, border and lines between the data markers.
1.2.4 The Origin
Including a zero origin on a graph is generally good practice. One of the significant issues in starting an axis at a value other than zero is that it can easily give a false impression of trends in the data. For example, Figure 1.5 shows two representations of the same data that describes the evolution of Hydrogen from a reaction of Hydrochloric acid with Zinc metal. The graph on the left starts the Y-axis origin at 40 and suggests at first glance that the evolution of Hydrogen is quite rapid. However, the graph on the right shows the same data but with the Y-axis now starting at the origin. From this perspective, the rate of evolution of hydrogen is much more modest. It is very easy to give a false
impression of a trend by adjusting the starting point on the axes and readers must take great care to note the origin before drawing any immediate conclusions.

[Figure: two panels, each plotting Hydrogen gas (ml) against Time (secs).]
Figure 1.5: Misleading effects of using non-zero origins. The graph on the left, a), has the Y-axis origin starting at 40. The steepness of the line implies a rapid production of Hydrogen. However in graph b), the Y-axis starts at zero and the rate of production now looks much more modest.
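The visual effect can be put into numbers by asking what fraction of the axis height the plotted line climbs. The short Python sketch below does this for two axis ranges. The readings and axis limits are hypothetical values invented for the illustration, and `fraction_of_axis` is a name made up for this example.

```python
def fraction_of_axis(first, last, axis_min, axis_max):
    """Fraction of the axis height covered by the change from first to last."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (last - first) / (axis_max - axis_min)


# Hypothetical readings: hydrogen volume rises from 165 ml to 195 ml.
first_ml, last_ml = 165.0, 195.0

# Same data, two choices of Y-axis range (both ranges are assumptions).
truncated = fraction_of_axis(first_ml, last_ml, axis_min=160.0, axis_max=200.0)
zero_based = fraction_of_axis(first_ml, last_ml, axis_min=0.0, axis_max=200.0)

print(f"truncated axis: line climbs {truncated:.0%} of the axis height")
print(f"zero origin:    line climbs {zero_based:.0%} of the axis height")
print(f"apparent steepness ratio: {truncated / zero_based:.1f}")
```

With these made-up numbers the same 30 ml rise looks five times steeper on the truncated axis, even though the underlying rate has not changed.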
Although an absent zero origin can be a disadvantage, if the presence of an origin means that the data points are grouped into one small corner of the graph then the graphing space is not being used effectively (See Figure 1.7).
In such a situation it is possible to use graphical cues to help the reader
quickly understand the scale. One such device is shown in Figure 1.6.
Other devices include a simple break in the axis separated by two
break lines.
1.2.5 Markers
Data markers indicate the location of the individual data points. In
the example in Figure 1.4, open circles are used. Such symbols are
adequate if only general trends need to be discerned from the graph.
However, if more detailed information such as slopes or distance readings need to be measured, then empty markers with a central point to mark the actual data are extremely useful. With computer generated graphs this distinction is no longer so important. However, hardcopy printouts of graphs would still benefit from this.

[Figure: two panels, each plotting Hydrogen gas (ml) against Time (secs).]
Figure 1.6: Two different visual cues that can be used to help the reader identify
1.2.6 Lines
There is some difference of opinion on how lines should be drawn between markers in a line graph. One opinion is that markers should never be joined with straight lines. However, there are situations where this approach is useful and others where it is misleading. The key consideration is whether there is evidence to suggest a definite mathematical trend. For example, if a set of data is reasonably described by a straight line then a straight line fit should be used. The same applies to other possible trends such as exponential. If no reasonable trend is known for the data, joining markers with straight lines is acceptable.
If there is a large number of data points then a scatter plot can be a better choice than connecting points with straight lines. Compare for example Fig 1.8 and Fig 1.9. In Fig 1.8, 200 data points are plotted as a scatter plot and the trend in the data can be seen clearly. By comparison, Fig 1.9 shows the same plot with the data points joined by straight lines. The lines joining the points add nothing to the graph and if anything may sometimes obscure the data trend. Lines should always be drawn first so that the data point markers appear drawn above the line.

Figure 1.7: Ineffective use of the graphing space.
One of the worst things that novice students will invariably do is to
fit nth order polynomials to the data in order that a line goes through
every point. In fact there is invariably no justification that the data
should follow such a trend. The rule is: if a theory exists for the data
that suggests a particular fit, then use it, otherwise join the markers
with straight lines. There is a temptation to use bar charts in these
situations but if the variables are continuous then a line graph should
be used where possible.
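To see why a polynomial forced through every point is rarely justified, the following Python sketch builds the unique 5th order polynomial through the six car measurements listed in Table 1.3 and then asks it for the distance ten minutes after the last reading. Lagrange interpolation is used here simply because it needs no external library; the function name `lagrange` is chosen for this example only.

```python
def lagrange(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


times = [0, 10, 20, 30, 40, 50]                # minutes, from Table 1.3
distances = [0, 3.4, 11.2, 14.2, 21.3, 23.9]   # km, from Table 1.3

# The polynomial reproduces every measurement to within rounding error.
for t, d in zip(times, distances):
    assert abs(lagrange(times, distances, t) - d) < 1e-9

# Ten minutes past the last reading it predicts a negative distance.
print(round(lagrange(times, distances, 60), 1))
```

The curve passes through every marker, yet its prediction at 60 minutes is about -39.7 km, which is physically meaningless for a car that has been moving steadily forward.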
Finally, any lines joining markers or curve fits should be at least twice as thick as the axis lines.
1.2.7 Legend
If the graph shows more than one data set then a legend must be
shown. Usually a legend will display a segment of the line and marker,
together with a very brief description. There is some debate on whether
the legend should be placed inside or outside the graphing area. The decision should depend on the particular graph. If there is ample room within the graphing area then the legend can be safely located in the graphing area. The danger is that if the graphing area is already crowded then adding a legend to the graphing area will only confuse the reader further and in these instances the legend can be placed outside the graph area. Figures 1.1 and 1.16 illustrate legends placed inside the graphing area.

[Figure titled “A Large Number of Data Points is Best Served by a Scatter Plot”.]
Figure 1.8: Scatter plot of 200 data points.
1.2.8 Error Bars
[Figure titled “A Large Number of Data Points Connected by Straight Lines”.]
Figure 1.9: A graph of 200 data points where each point is connected by a straight line.

No measurement in the real world can ever be exact, that is any measurement will include some level of uncertainty. Such uncertainty can be due to many factors which will be discussed in more detail in chapter 2. It is convention that any uncertainties in the data be expressed in the form of error bars on the data markers. Error bars can express uncertainty either in the X or Y axis directions. Often an experimentalist will assume that the uncertainty in the independent variable is so small as to be considered negligible. It should always be made absolutely clear in the figure caption what the error bars represent because there are various ways to measure uncertainty. Error bars could represent the range, the standard deviation or the standard error. Each measure of uncertainty is different and it is important to state which measure is employed.
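The three measures named above give noticeably different bar lengths for the same data, which is why the caption must say which one is used. The Python sketch below computes all three for a set of repeated readings. The readings are hypothetical, the function name `error_bar_measures` is invented for this example, and the standard deviation is assumed to be the sample (n - 1) form.

```python
import math
import statistics


def error_bar_measures(readings):
    """Return the mean and three common choices of error bar size."""
    n = len(readings)
    if n < 2:
        raise ValueError("at least two readings are needed")
    spread = max(readings) - min(readings)   # full range of the readings
    sd = statistics.stdev(readings)          # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)                  # standard error of the mean
    return {
        "mean": statistics.mean(readings),
        "range": spread,
        "standard deviation": sd,
        "standard error": sem,
    }


# Hypothetical repeated current readings (mA) at a single voltage setting.
readings = [98, 100, 102, 104, 96]

measures = error_bar_measures(readings)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

For these five made-up readings the range is 8 mA, the standard deviation is about 3.16 mA and the standard error is about 1.41 mA, so the same marker could carry three quite different error bars.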
1.3 Slopes and Straight Line Fitting
The Universe is full of things that change. A gas from a chemical reaction is generated at a certain rate of volume per unit time, or a car moves at a given rate of distance per unit time. As scientists we are always interested in how fast things change. To measure a rate of change we record the property of interest (distance, volume etc) over a given time period. For example, if a car moves 10 miles in 20 minutes then we say that the rate of change of distance of the car is 10/20 miles per minute or 30 miles per hour.

[Figure titled “Over enthusiastic fit of the data”.]
Figure 1.10: Overenthusiastic fitting of data to a 7th order polynomial. Unless there is evidence, such fitting should be avoided at all costs.
Plotting data on a graph serves as an ideal way to measure the rate of change of some property. For example consider the position of a car on a road over time. Table 1.3 shows the position of the car over a 50 minute period.
The graph shown in Fig 1.12 plots the data from Table 1.3. The graph
shows a fairly consistent trend as the car travels the distance over time.
The data shows some variation that might be attributed to road signals,
heavy traffic and other unpredictable events.
Figure 1.11: Two sets of data plotted with error bars. Graphs that show error bars must always include a statement on what the error bars represent, for example, the range, standard deviation or standard error.

Time (mins)    Distance (km)
0              0
10             3.4
20             11.2
30             14.2
40             21.3
50             23.9

Table 1.3: Distance traveled by a car as a function of time.

On an ideal trip with all signal lights set to green and not a single other car on the road we might expect our car to drive at a relatively constant speed. In this situation we would expect all the data points to lie on a straight line starting at zero. Even though the actual data is variable we can get a good idea of this ideal speed by drawing the “best” straight line through the points. There are various ways to do this. Two common methods are plotting the line between two extreme slopes, or running a computer program that computes the best line based on minimizing the distances between the data points and the line. The computer method will be described in a later chapter. Here we will briefly discuss a manual method for estimating the best line by plotting two lines that correspond to the steepest and shallowest lines on the graph. Figure 1.13 shows our attempt at drawing the steepest and shallowest lines through the data. This method is somewhat subjective but is a good first approach to estimating a best line. From the two slopes we draw a mid line between the slopes; this mid line is deemed the best fit.
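For comparison with the manual method, the computer approach mentioned above can be sketched briefly. The Python example below assumes that “minimizing the distances” means ordinary least squares, that is minimizing the sum of squared vertical distances between the points and the line. That is one common choice and may differ from the method described in the later chapter. The function name `least_squares_line` is invented for this example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares straight line."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


times = [0, 10, 20, 30, 40, 50]                # minutes, from Table 1.3
distances = [0, 3.4, 11.2, 14.2, 21.3, 23.9]   # km, from Table 1.3

slope, intercept = least_squares_line(times, distances)
print(f"slope = {slope:.3f} km/min, intercept = {intercept:.3f} km")
```

For the Table 1.3 data this gives a slope of roughly 0.50 km per minute, close to the value obtained from the hand-drawn mid line.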
From the best straight line we can now compute the rate of change of
distance over time, that is the velocity. Figure 1.14 shows the same
graph but with the steepest and shallowest lines removed. The slope
of any line is given by the distance traversed vertically divided by the
distance traversed horizontally. That is:
\[ \text{Slope} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} \]
The points can be directly read off the graph and the slope computed. For example, from the graph we can record the x and y values that correspond to the dotted lines on the graph. That is: $x_1 = 20.25$; $x_2 = 36$; $y_1 = 10$ and $y_2 = 18$. Inserting these into the slope formula yields:
\[ \text{slope} = \frac{18 - 10}{36 - 20.25} = 0.51 \text{ km min}^{-1} \]
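The same calculation can be checked with a few lines of Python. The function name `slope` is chosen for this sketch, and the conversion to kilometres per hour is added here only for comparison with everyday speeds.

```python
def slope(x1, y1, x2, y2):
    """Rise over run between two points read off a straight line."""
    if x2 == x1:
        raise ValueError("the two points must have different x values")
    return (y2 - y1) / (x2 - x1)


# Points read from the dotted lines on the graph (values quoted in the text).
velocity = slope(20.25, 10, 36, 18)  # km per minute

print(f"{velocity:.2f} km/min = {velocity * 60:.1f} km/h")
```

Choosing two points that are far apart on the line, as done here, reduces the effect of small reading errors on the computed slope.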
[Figure titled “Distance traveled by Car”: Distance (km) plotted against Time (mins).]
Figure 1.12: Data plotted from Table 1.3.

[Figure annotations: Steepest Slope; Shallowest Slope; Mid Line Best Slope; best slope from mid line: 0.51 km min$^{-1}$.]
Figure 1.13: Data plotted from Table 1.3 illustrating estimated steepest and shallowest slopes

[Figure annotations: $x_1 = 20.25$, $x_2 = 36$, $y_1 = 10$, $y_2 = 18$, with $\Delta x$ and $\Delta y$ marked.]
Figure 1.14: Data plotted from Table 1.3 with straight line through points. The slope of the line is given by $\Delta y / \Delta x$.
1.4 Poor Layout
The following figures give examples of poor layout. Figure 1.15 shows some typical errors made in drawing a graph. The first layout issue is the title (1), which has three problems. The first is that the title font is too small. Secondly, the title itself simply states what can already be seen from the axes titles; the title should be more descriptive. Finally, even if the text of the title were appropriate, the title is in fact wrong. It describes a graph of t vs. s rather than s vs. t.
The axes titles (2) are also too small and are largely non-descriptive. In addition, no units are given in the axes titles. The divisions on the x axis (3) are completely inappropriate. To begin with there are too many major divisions and secondly the divisions should be on even values rather than fractional as they are in the graph. Finally the markers (4) used to indicate the data points are much too small.
[Figure: graph titled “t vs. s” with axes labelled s and t; x axis divisions at 0, 6.25, 12.5, 18.75, 25, 31.25, 37.5, 43.75 and 50.]
Figure 1.15: Poorly laid out graph showing (1) poor choice of main title and axes titles (2); inappropriate x axis divisions (3) and the data markers are much too small (4). See text for details.
Figure 1.16 shows the same graph as Figure 1.15 but with significant improvements. The title is now bigger and more descriptive. The axes titles are also much more descriptive and also give the units. The x axis major divisions are now much more reasonable and readable. The marker for the data points has been enlarged and a single point has been placed in the center of each marker to indicate the actual location of the coordinate. Finally, to make it easier to read off information from the graph, a grid has been added with major and minor divisions.
Figure 1.18 shows the same graph as Figure 1.17 but with significant improvements. In particular the y axis title is now present, the x range has been reduced to make better use of the graphing space and the line and marker styles have been changed to allow the two data sets to be distinguished. In addition, a legend has been added to make the discrimination easier.
The one issue that hasn’t been addressed relates to the line styles, in particular the use of color. Traditionally, color has tended to be avoided because there is no guarantee that the graph will be displayed or printed in color, thereby making it difficult to distinguish different data sets. If it is known that the graph will be displayed on a color device, such as a computer monitor, or in a full color book, then color is appropriate, although some individuals are color blind and therefore the use of red/green combinations can be problematic. If it is likely that the graph will be viewed in black and white then the marker symbols can be used to distinguish the data types, or line thickness or even whether the lines are dotted or solid. In either case, care should be taken when using color. It is so easy to assume everyone has the capacity to see color when the opposite may be true.

[Figure titled “Distance traveled by object in time”: Distance (km) plotted against Time (mins).]
Figure 1.16: The same graph shown in Fig 1.15 but with significant improvements to titles, axis divisions, data markers and a grid overlay to assist in reading data off the graph.
[Figure titled “Distance traveled by object in time”: no y axis title; x axis Time (mins) running from 0 to 120.]
Figure 1.17: Poorly laid out graph showing (1) two data sets using the same line style and marker styles, therefore data sets cannot be distinguished; (2) missing y axis title; (3) inappropriate use of space, the x axis range extends far too much to the right; (4) missing legend, however a legend wouldn’t help much here because of the inappropriate use of line and marker styles. See text for details.

[Figure titled “Distance traveled by object in time”: Distance (km) plotted against Time (mins), with legend entries Distance Car A and Distance Car B.]
Figure 1.18: The same graph shown in Fig 1.17 but with significant improvements to title, axis divisions, data markers and a grid overlay