The “appendix” is really these grey boxes throughout this document; skip to them if you are already
comfortable with this subject. Otherwise, you might want to read around them too; for example, you might
want to go through the examples on rounding rules or propagating errors if your math is a bit rusty.
Precision and Accuracy
Random Errors
Averages
1D and 2D Averages
Linear correlation coefficient, R²
1D and 2D Standard Errors
Reliability
Formulas for Propagating Errors
Propagated Standard Errors of 1D and 2D Averages
Theoretical Convergences
Fractional Errors
Rounding Rules
Systematic Errors
Comparing Precision with Accuracy
If you know the Way broadly, you will see it in everything.
-Miyamoto Musashi, Go Rin No Sho
“Appendix” on Measurement and Error Analysis
Introduction
A political scientist walks into a bar. Today is voting day and a news station hired the scientist to make
an early determination as to just who will become the next President of the United States! After polling
a patron, the first measurement yields the following result: “Ah yes, after carefully analyzing the
candidates, I have decided to vote for Mickey Mouse.” After that, the political scientist signals the
camera crew, is put on live TV, and announces to the public, “I have measured the voting and
determined that Mickey Mouse will be the next president of the United States. I would like to
congratulate the Mouse Campaign for providing us all with a shocking upset victory. Thank you.”
Feels wrong, doesn’t it? Somewhere in your intuition you understand that such a report is totally unreliable. Even if you can’t quite put your finger on just why yet, you know the following needs to happen to increase reliability: more than one person needs to be polled, and the best estimate of the actual winner will be an average. And this average’s reliability increases as more and more people are polled.
A tourist walks into a bar and asks the bartender, “How long does it take to get to the Tsukiji Fish
Market?” The bartender replies, “Well, if you go by car, there are a lot of things that could make the trip
faster or slower, such as traffic and red lights, so I would say it will take you about an hour, give or take a
half an hour.” The bartender pauses for a moment, then continues, “If you’re trying to better plan your
day, you might want to take the subway. It might take a bit longer, but the trains tend to run on time
and not linger at stations; so that should take you about 70 minutes, give or take 10 minutes.”
Feels fine, doesn’t it? Somewhere in your intuition you understand these two things, even if you might
not yet describe them quite like this. The second determination is more precise than the first. And the
first answer’s lack of precision is related to the increased level of randomness with things like traffic and
red lights.
Boozer Biff starts chastising the bartender! “Hey! How can you say ‘give or take a half an hour’?
There’s just soooooo much that can go wrong, you know? Like, you know, like, what if this tourist
happens to get a police escort? Or what if there’s a horrible accident causing massive detours? ‘Give or
take a half an hour’ seems way too conservative now, doesn’t it!”
Biff pulls down another swig and continues, “Or, hey, like, wait a second, what if a, like, big-time
massive, huge helicopter picks up the car and then drops it off at the Fish Market? Or what if a circus
elephant decides to sit on this tourist’s car for hours?”
Biff nearly drops his mug when saying his next revelations! “Oh, whoa, wait a second. What if, like, you
know, like, this tourist’s car is, like, that one from that movie and it can travel back in time?! Or, hey,
you know we are not alone, right? I’ve heard of aliens grabbing whole cars from time to time!”
Proudly, Biff concludes, “What you really should say is ‘give or take infinity’ duuuuuuude!”
Feels silly, doesn’t it? Somewhere in your intuition you understand that making a cutoff in judging what
to include in a “give or take” estimation does not necessarily make the bartender’s estimation
unreliable. The bartender tacitly understood that there are just some random events that are too rare
to include and still have a useful answer.
The Wandering Wise One approaches the tourist and confidently states, “Get on this bus right outside.
Its final stop is the Tsukiji Fish Market and it will take you about 68.2 minutes give or take 1.7 minutes to
get there.”
Feels weird, doesn’t it? Somewhere in your intuition you understand just how unusual giving such a
precise answer would be, especially for a bus’s final stop! The Wandering Wise One must travel a lot!
At the bus’s final stop, the tourist indeed notes that the trip took 67.4 minutes! However, alas, the
tourist then discovers the final stop is the airport. The Tsukiji Fish Market is nowhere in sight!
Feels cruel, doesn’t it! Somewhere in your intuition you know there are other things that can go wrong that are not part of the randomness in making the journey itself. The advice was very precise, true, but it was not accurate. Apparently, The Wandering Wise One enjoys messing with tourists!
Try to never let go of your intuition. For most students, unfortunately, this is easier said than done.
Soon you will be presented with all sorts of mathematics—such as summations, square roots,
inequalities, and absolute values—and sometimes students allow themselves to forget just what they
represent. What we are attempting here is challenging, true, but it is worthwhile. We are attempting to
turn intuition into science.
Measurements
Making measurements and properly interpreting their relationships are essential skills in most academic
fields. From physics to psychology, from carpentry to dentistry, from economics to education, all rely on
quantifying properties, combining various quantities statistically, and determining precision and
accuracy. Measurements fuel experimental science, and measurements provide the means for
understanding, exploring, and modifying theoretical science.
A measurement quantifies something—such as length, mass, and time—by comparing it to preset units
that are widely accepted by the scientific community—such as meters, kilograms, and seconds. We live
in a privileged time where we readily have a large variety of measurement devices with various accepted
units carefully marked and calibrated for us—such as rulers, scales, and watches.
Every culture has independently developed, and continues to develop, new methods and new units in
attempting to quantify various parameters. The examples are numerous and varied: attending a class at
a certain time that runs a certain period of time, buying shoes of a certain size with currency mutually
agreed to represent a certain value, buying milk and soda at certain volumes, turning on lights at a
certain wattage, using an outlet’s electricity set at a certain frequency and stepped down to a certain
voltage by nearby transformers set at a certain inductance, judging someone’s influence by a certain
level of fiefdom’s favor, and assessing someone’s popularity by how many Facebook friends or Twitter
followers they have! Measurements and units are everywhere!
Measurements can be nicely discrete, such as the flip of a coin or the roll of a die. Generally though,
measurements fall into a continuum of values. No matter how precise your ruler is, whether the
smallest divisions are meters, centimeters, millimeters, and so on, there will theoretically always be an
infinite number of possible measurements between any two divisions on a ruler. There will always be
some point where uncertainty begins. Oftentimes a measurement’s uncertainty begins when the precision of the measuring device used reaches its limits. This guess at the measurement’s final digit is either made by you in analog measurements, like judging where in between two marked divisions you believe the measurement lies, or it is made by a machine in digital measurements.
Measurements themselves can be measured! They are normally judged as having two tendencies:
precision and accuracy. Understanding, quantifying, and comparing these two tendencies are the main
goals of error analysis.
Precision is how deeply, how closely, and how reliably we determine a quantity. Precision is how
far or how close subsequent determinations tend to be from each other; it is affected by the
randomness of four broad categories: the measurement device, the measurer, the measured, and
the measurement’s environment. Precision is quantified with the standard error.
Accuracy is determined by seeing just how close our experimental determination of some
quantity compares to what is considered that quantity’s accepted, true, value. Since an accepted,
true, value might be unknown, testing accuracy will not always be possible. When it is possible,
analyzing this tendency can reveal systematic, non-random, effects that might have thrown off
our determination that are not properly accounted for in the standard error.
Our initial focus will be on precision.
Consider analog (not digital) measurements, such as using a ruler to measure the length of some widget.
What do we know for certain? Well, we know the length is after the 2 cm division, so L=2 cm so far.
Then we can see it falls after the 4th marked division, L=2.4 cm so far. Now is where the uncertainty
begins, we must guess as best we can as to where in-between the divisions the length lies. L=2.46 cm?
L=2.47 cm? L=2.48 cm? With what we can see, reporting more precision by carrying any more digits
beyond this guess is misleading. In other words, without some way to improve our precision, like using
a Vernier caliper and/or a magnifying glass, there is no way we could legitimately claim something like
L=2.472 cm.
So when realizing this, if the best our eyes can estimate is L = 2.47 cm, could we tell the difference between 2.4699 or 2.4701 cm? 2.4698 or 2.4702 cm? 2.4697 or 2.4703 cm? Such randomness is beyond our ability to detect. How about 2.469 or 2.471 cm? 2.468 or 2.472 cm? 2.467 or 2.473 cm? Again, even this level of randomness is undetectable unless we find some way to improve our precision. How about 2.46 or 2.48 cm? 2.45 or 2.49 cm? 2.44 or 2.50 cm? Ah, now we are in the range where we would hope to be able to detect such randomness. We can see how the limitations of a measurement device can affect precision; yet for a measurement like this, precision is also personal. For example, can we reliably tell the difference within 2.47 ± 0.02 cm? Some can and some cannot. The key to reliability is in repeatability. The ideal way to quantify your precision will be based on how your
measurements tend to deviate from each other.
There has also been an unspoken assumption about this measurement. The reality of such a
measurement is that there are two measurements, for we must properly zero it first. Consider this
situation.
This is very undesirable for a variety of reasons. For the moment, consider what the previous
conversation would be like if we did not catch that this was not properly zero’ed. Suppose we read this as L = 2.52 cm; then our discussion of randomness would be the same, but shifted by this kind of non-random error. That last range of potential randomness would look like this: “How about 2.51 or 2.53 cm? 2.50 or 2.54 cm? 2.49 or 2.55 cm?... For example, can we reliably tell the difference within 2.52 ± 0.02 cm?” In just our discussion of trying to quantify our potential randomness, we would not
catch this error. If we somehow knew the accepted, true, value for the length of this widget, perhaps
from the box the widget came in, then we might catch this shift in comparing what we measured to
what is accepted—a test of accuracy.
Let us now shine some formality on what we have thus far discussed loosely.
Errors
Broadly speaking, errors fall into two categories: random and systematic. We will thoroughly discuss
random errors first, along with how to treat them with statistics. This will provide a valuable context
when discussing systematic errors.
Random Errors
If a measurement is only affected by random errors, it will just as likely be too big or too
small around the theoretical accepted, true, value of what is being measured.
Mathematically, for true value $\mu$,

$x_{\text{measured}} = \mu + (\text{random error})$

with the random error having an equal chance of being positive or negative. How far the
measurement result deviates—the size of a particular instance of random error—is also
random. The standard error will be an attempt to quantify the typical, average, size of
these deviations and will be how we represent precision.
Random errors influence all real measurements due to the measurement device, the
measurer, the measured, and the measurement’s environment. The impact of random
errors can be reduced with the averaging process.
Returning to our discussion in trying to include some range of reliability when finding the length of some
widget, what might make a claim of L=2.47±0.02cm more or less realistic? Let us consider altering a
variety of conditions:
Consider the measurement device. Suppose the device used to measure the widget’s length is now some fancy laser/mirror system used by top-notch scientists. On the other hand, consider the measurement being performed with a partially used crayon generously provided by a wandering six-year-old child.
Consider the measurer. Perhaps he or she spends all day carefully setting up the measurement and
performs it with conscientiousness, mindfulness, and extraordinary technique. On the other hand,
consider the measurer wandering out of a bar and then attempting to quickly measure the widget’s
length right before passing out on the sidewalk.
Consider the measured. Perhaps the widget is made of sturdy metal carefully manufactured to be
consistently straight throughout. On the other hand, consider the widget to be made of a gelatinous
substance that awkwardly wiggles and curves when handled.
Consider the measurement’s environment. Perhaps the widget is measured in a peaceful, temperature
and humidity controlled laboratory and is rigidly clamped to an air-table in order to minimize
inadvertent jostling and vibrations during the measurement. On the other hand, consider the
measurement occurring on a roller coaster during a series of jarring loop-the-loops.
Depending on the influences of random error, a claim of L=2.47±0.02 cm might be reasonable. On the
other hand, it might be unreasonable.
Random error can never be fully removed from any real measurement. No measurement device has
infinite precision. No measurer has perfect use of the five senses. No measured physical quantity can
ever escape the random nature of matter. No measurement’s environment can ever be truly removed
from the random nature of the universe.
Yet this very randomness is vital for scientific development and exploration, for its very nature allows
the use of statistical techniques to understand, quantify, and most importantly, to reduce its influence.
The importance of this can never be overstated.
Statistics
For most of the upcoming statistical operations, you only need to know how to get a computer program
to perform the calculations for you, and the Excel commands will be presented and highlighted for your
convenience. But you must always understand what the results represent.
Averages are everywhere in our culture. Your grade for this course will be an average! In the election
example, we intuitively know the Mickey Mouse result is an outlying datum that can be properly
identified as such when more data is gathered. And even when you get some of the more expected
measurements, like for Obama or Romney, you might get data clusters which disproportionally favor
one candidate as you take more measurements. The hope is that this will balance out as you tend to get
data clusters disproportionally favoring the other candidate. The averaging process allows these
clusters to cancel some of each other out, in the hopes that such cancelations produce a better
estimate to the true result of who will win the election.
We will deal with what are called one-dimensional averages, where multiple determinations of the same
quantity are considered. We will also deal with two-dimensional, linear averages, where two quantities’
relationship will be determined by considering how the change in one affects the other. Both these
averages stem from the same theoretical justifications, and they both share the same properties.
So suppose you are trying to determine the true value of some physical quantity, like the acceleration
due to gravity, the speed of sound in air, the resistance of a circuit element, or the magnetic
permeability of free space. If we only consider random error, due to the nature of random error, any determination you make will be higher or lower than what you are trying to determine. Suppose you had one that is higher and one that is lower; add them together, and divide by two. Some of the “too bigness” of one will cancel out some of the “too smallness” of the other one, and your result will be a little closer to what you are trying to determine.
However, you cannot assume any pair of measurements has one being too big and the other being too small; that outcome is just as likely as both being too big or both being too small. Like if you flip a coin twice, you are just as likely to get two heads or two tails as getting one head and one tail. However, if you keep flipping more
and more, as you keep taking more and more measurements, it will start becoming more and more
likely that you are getting closer to a 50-50 split. Making more and more determinations will improve an
average’s approximation to whatever it is trying to approximate.
The other issue of randomness in a general continuous measurement is how far each determination
tends to deviate from what you are actually trying to determine. This is intimately linked to precision.
This too will have a different theoretical size, which we will try to approximate later on with another
average called the standard error. For the moment, we need to understand that as more or fewer random errors impact our measurements (the less or more precision), more or fewer measurements are needed to earn the same reliability in the approximation. In other words, suppose there is a certain level of trust in an average: the greater the typical deviations can swing, the greater the potential for “too bigness” and “too smallness,” which implies a greater need for more measurements to achieve the same level of canceling in order to earn the same level of trust.
An average reduces the influence of random error. If systematic errors are
reduced to negligible levels, an average becomes your best estimate to the
theoretical true value you seek. And as long as systematic errors continue to be
negligible, this estimate approximates this theoretical true value better, more
reliably, as the number of measurements increases.
Formalizing our discussion, here is a one-dimensional average:
If there is a set of $N$ independent determinations of $x$: $x_1, x_2, \ldots, x_N$,
then their average is

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Excel Command: =average(values)
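If you would rather see the arithmetic outside of Excel, here is a minimal Python sketch; the data values are made up for illustration.

measurements = [2.47, 2.46, 2.48, 2.47, 2.45]  # hypothetical lengths (cm)
average = sum(measurements) / len(measurements)
print(average)  # 2.466 cm, the best estimate of the true length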
Here are the two linear, two-dimensional averages:
If there is a set of $N$ independent determinations of $(x, y)$ pairs: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$,
that are suspected as having a linear relationship $y = mx + b$,
with slope $m$ and intercept $b$,
then the average values for this slope and this intercept are

$m = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left[\sum x_i\right]^2}$

$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{N\sum x_i^2 - \left[\sum x_i\right]^2}$
Excel Commands: =slope(y-values, x-values)
=intercept(y-values, x-values)
The averages are determined in a process often called linear regression.
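For those curious what =slope and =intercept are doing under the hood, here is a short Python sketch of those same sum formulas; the (x, y) data are hypothetical.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical independent values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]          # hypothetical dependent values
N = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sxx = sum(x * x for x in xs)
denom = N * Sxx - Sx ** 2
m = (N * Sxy - Sx * Sy) / denom          # slope
b = (Sxx * Sy - Sx * Sxy) / denom        # intercept
print(m, b)                              # about 1.99 and 0.05 for this data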
Alas, it just happened, didn’t it? You were probably fine with the 1D average, but now, “What the hell are those?!? Those are averages?!? (Is he crazy?)”
Yes (and yes!). Remember, in practice you normally calculate none of those averages “the long way,”
and you get a computer program, like Excel, to do them for you. However, you must know and
understand that the results are averages and behave like averages.
Ah, but understanding these 2D averages is trickier than the 1D case. Consider this statement in that definition: “that are suspected as having a linear relationship.”
What if $x$ and $y$ are actually not related? Meaning that as you change $x$, $y$ is not affected at all by this change. Ah, but you still might be determining different values for $y$ and just not realizing these differences are due only to the random errors you would normally face if all you did was try to measure $y$ over and over again without ever considering $x$ at all! In other words, since these are averages, the 2D case must reduce to the 1D case. The true value for the slope must be zero, and the true value for the intercept must be the 1D average of $y$. Realistically, however, if you just enter these pairs in a computer program, it will do what it’s told and spit back the calculated averages for you. The problem is that just
having those averages might not be enough to judge the actual relationship. For example, is the slope
just really small, or are these two variables actually independent of each other?
Furthermore, and what more frequently happens in science, the variables might indeed be related to each other, but not linearly. For example, what if $y$ is really dependent on the logarithm of $x$ or the inverse square of $x$? You will still get a set of $(x, y)$ data that you can perform a linear regression on! And worse, you might get a “slope” and “intercept” that appear legitimate to you!
Graphing the data can certainly help here, but we need a better judge of relationships beyond our ability
to make and read graphs. We need some more help in interpreting these 2D averages from linear
regression!
Our help comes in the form of R².
The linear correlation coefficient R² is a number between zero and one and helps us interpret the relationship between two quantities.
R² tending toward 1 implies linear dependence.
R² tending toward 0 implies independence.
R² tending toward some number in-between 0 and 1 implies dependence, but not necessarily linear.
How reliably we know where R² is heading depends on the influence of random error; therefore, this reliability improves as the number of data points increases. Another consequence of this is that R² also implies precision; even if you have a linear relationship, R² can still drift away from 1 depending on, and implying, the precision of your data.
Excel Commands: =RSQ(y-values, x-values)
$R^2 = \frac{\left(\sum (x_i - \bar{x})(y_i - \bar{y})\right)^2}{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$
Yup, I deliberately didn’t include the calculation of R² in the grey box and even made the font very small. Believe it or not, I am trying to deter the less careful students from bothering to read it. I still need this calculation somewhere for, you know, “the sake of completeness,” but there’s something dangerous in it that I’m fearful of displaying more openly. R² happens to use the mathematical process that also produces 1D averages. This should be considered just a mathematical convenience here though, for in general, it does not make sense to perform 1D statistics on 2D data that is supposed to be different. Still though, what’s the big deal here? Listen! Do you know how many classmates and students I’ve seen mindlessly performing 1D statistics on 2D columns of data? You’d try to hide it too!
(If you are curious, if we considered R and not R², a positive R implies a positive slope and a negative R implies a negative slope… but you will see that for yourself when you find the slope!)
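Here is a small Python sketch of the R² formula above, run on the same hypothetical data as before; note how nearly linear data pushes R² toward 1.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]          # nearly linear in x
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
Sxx = sum((x - xbar) ** 2 for x in xs)
Syy = sum((y - ybar) ** 2 for y in ys)
print(Sxy ** 2 / (Sxx * Syy))            # about 0.997, close to 1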
Let us illuminate this with an example. Suppose you carefully draw a line with a set value of $r_1$, and then you use that to sweep out and measure a distance $C_1$. Then you increase $r_1$ to $r_2$, use $r_2$ to sweep out $C_2$, and measure $C_2$. And then you increase again and again to find $N$ $(r, C)$ pairs.
Now we know the true values that govern this relationship: C is indeed linearly related to r, C = 2πr. Therefore a linear regression should produce a slope that is a best estimate to 2π, meaning that if you wanted a best estimate to π, just divide the slope by 2. (Seeing yet how incredibly powerful this can be?) The average value for the intercept should be close to zero, and R² should be close to 1 because they are indeed linearly dependent on each other.
Ah, but what if instead we found the area of the region swept out, giving us N (r, A) pairs? And then we performed a linear regression on it? We would get nonsensical results (it is a quadratic relationship) and R² will not tend to one. (You will explore this further in 181’s first synthesis.)
Now if you linearize your data, perhaps using a set of (r, A/r) data instead, then you will be able to use linear regression and, once again, have useful results from the produced averages, like getting π from the slope.
(I *really* hope the awesome power that such flexibility grants is starting to become apparent! Think
about it, you can have two sets of data that are clearly not linearly related, and then you can perform
some algebraic tricks and *still* use linear regression to extract useful information! Don’t worry, I will
say it for you, “that’s amazing!”)
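Here is a hedged Python sketch of that whole idea: simulated (r, C) data with a little random error, a linear fit whose slope estimates 2π, and the linearized (r, A/r) data whose slope estimates π. All the numbers are simulated, not real measurements.

import math, random

random.seed(0)
rs = [1.0, 2.0, 3.0, 4.0, 5.0]                              # chosen radii (cm)
Cs = [2 * math.pi * r + random.gauss(0, 0.1) for r in rs]   # C = 2*pi*r plus noise

def fit(xs, ys):
    # Least-squares slope and intercept, as in the earlier formulas.
    N, Sx, Sy = len(xs), sum(xs), sum(ys)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    Sxx = sum(x * x for x in xs)
    d = N * Sxx - Sx ** 2
    return (N * Sxy - Sx * Sy) / d, (Sxx * Sy - Sx * Sxy) / d

slope, intercept = fit(rs, Cs)
print(slope / 2)                                     # a best estimate of pi

# Linearizing the quadratic case: A = pi*r^2, so A/r is linear in r.
As = [math.pi * r ** 2 + random.gauss(0, 0.1) for r in rs]
slope2, _ = fit(rs, [A / r for A, r in zip(As, rs)])
print(slope2)                                        # also estimates pi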
Standard Errors
Now we come to another set of averages needed to answer some of our still unanswered questions.
Such as, if I produce an average, just how reliable is it? Hopefully it is more reliable than the individual measurements used to produce it; otherwise, what is the point in producing an average? Well,
even before that, just how might we quantify the precision of what went into the average in the first
place? In other words, looking back at that ruler, we certainly want to believe that anyone using such a
device can also reasonably report measurements to the second decimal place, but we can also imagine
conditions where such precision is actually unreasonable. And the word “reliable” keeps getting thrown
around and that too has not been firmly established yet! We will address all these questions in the
following sections.
In quantifying experimental precision, remembering that each measurement tends to be randomly
distributed around its theoretical true value reveals both the problem and the solution for us. If we look
at just one measurement’s deviation from what we think is the true value, random error will make such
a determination equally likely to be too high or too low around this true value, but not necessarily by
the same amount. The size of these deviations themselves will also have randomness to them;
therefore, considering just one deviation is totally unreliable (let alone knowing what to deviate from if
we do not know the true value is to begin with). Ah, but because these deviations will tend to be too big
or too small randomly we can again use the awesome power of the averaging process on multiple
measurements, which allows some of this “too bigness” in deviations of some to cancel some of this
“too smallness” in deviations in others, helping us expose the more standard size of these deviations.
How to get this average, though, is tricky.
Consider a spread of 1D data, $x_1, x_2, \ldots, x_N$, all trying to measure the true value of $x$ and subject to only random error. Suppose we actually know this true value; call it $\mu$. (It is traditional to use Greek letters for true values and Latin letters for the estimates of them.) So the first deviation from this true value is $(x_1 - \mu)$, the second one is $(x_2 - \mu)$, and so on. So we could try to just average those.

$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)$

Alas, this is problematic. If each measurement is truly an attempt to measure its true value, then each $(x_i - \mu)$ is like an attempt to measure zero! And we already know an average like this will ultimately become the best estimate of trying to find zero. We seek the average size of deviations, and if the average size was zero, that would be saying, on average, there is no randomness!

Therefore we need to make sure this average does not converge to zero as we make more and more measurements. Making every term positive would do this. We could take the absolute value of each term, and that would work in a meaningful way to quantify precision (called the absolute deviation). However, this is seldom used over the vastly more popular way to make each term positive: squaring!

$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$

Indeed, this is much closer to what we seek. The average of the squared terms will produce a squared result, so we take the square root in order to get back to the same units. And let us now give it a proper label, $s_x$, as representing the average deviations of $x$ from its true value, $\mu$,

$s_x \stackrel{?}{=} \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

Why have we not let go of the question mark and put a nice box around it? The issue now lies in just what we are deviating from. In practice, we generally do not know the true value of what we are trying to determine; we use averages to produce best estimates for them. In the 1D case, this is $\bar{x}$.

$s_x = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N?}}$

Still no pretty box? And now the question mark moved to the $N$? What’s the problem now?!?
Alas, this is a very peculiar issue as it is not one most people need to consider in the averages we
typically encounter. Consider this situation for a moment. This is not a direct analogy, and the rigorous
proof is beyond our scope, but this will hopefully be enough to see that there is a problem we need to
address now.
Suppose you are trying to average 5 independent measurements, say 5 rolls of two six-sided dice, which
has a well-defined mean of 7. Say you roll 4, 11, 8, 6, 10. So you would just add up the numbers and
divide by five, no problems so far. But what if I then add a sixth number that is a copy of the final
number, giving you 4, 11, 8, 6, 10, 10. Errrrr, is finding the average the same still? Divide by six this time? Well, what if you just copied the five original numbers and called them five new measurements, divide by ten now? Hopefully these two situations are making you question just what to divide by; having all the numbers independent of each other is important.
The change from the true value $\mu$ to a best estimate $\bar{x}$ creates a problem for this average that the previous averages did not face: an issue of independence. You can see this superficially if you try to use the above formula with just one measurement, which would make $s_x = 0$. Well, that certainly cannot be true! Why make any more measurements if you already have infinite precision?!
More specifically, the best estimate $\bar{x}$ came from our data; it depended on the data. This costs us what is called a degree of freedom. Dividing by just $N$ creates a bias that artificially reduces the size of $s_x$. We correct this by reducing $N$ to $N-1$. Now we are ready for a pretty box!
If there is a set of $N$ independent determinations of $x$: $x_1, x_2, \ldots, x_N$,
a single determination of $x$’s standard error in one-dimensional statistics is

$s_x = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1}}$

This is often referred to as the standard deviation.
This is also a best estimate to its own theoretical true value in quantifying precision, $\sigma_x$.
Excel Command: =stdev(values)
Note that in Excel, this is the same as the =stdev.s(values) command. There is also a =stdev.p(values)
command that you will not be using in this course. (I will explain why later.)
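To see the N−1 at work, here is a small Python sketch; statistics.stdev divides by N−1 (like =stdev.s), while statistics.pstdev divides by N (like =stdev.p). The data are hypothetical.

import math, statistics

data = [2.47, 2.46, 2.48, 2.47, 2.45]                # hypothetical lengths (cm)
xbar = sum(data) / len(data)
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))
print(s)                         # the standard error, "the long way"
print(statistics.stdev(data))    # same N-1 result, like =stdev.s
print(statistics.pstdev(data))   # divides by N instead, like =stdev.p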
For the linear, two-dimensional situation, we generally consider the independent variable's imprecision
to be negligible compared to the dependent variable’s imprecision. This is analogous to carefully
zero’ing one end of a measurement in a 1D measurement and considering that imprecision negligible
compared to the imprecision in determining whatever interval the other side ends up in, like when using
a ruler. Because you are setting the independent variable, the assumption is that you can carefully set
those to clear edges of lines and that imprecision will be negligible compared to the imprecision of
measuring the dependent variables in whatever intervals they end up in. Thus we determine the
precision in the dependent variable only.
If there is a set of $N$ independent determinations of $(x, y)$ pairs: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$,
that are linearly related with slope, $m$, and intercept, $b$, produced by linear regression, the best estimate for the standard error in the dependent variable, $s_y$, is

$s_y = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - (m x_i + b)\right)^2}{N-2}}$

Excel Command: =STEYX(y-values, x-values)
(It is dangerous to call this a “standard deviation,” so please don’t even though it is basically the same thing. Yet more I will explain later on!)
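Here is a Python sketch of what =steyx computes, using the hypothetical data and fit from earlier: the spread of the residuals around the fitted line, with N−2 in the denominator.

import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = 1.99, 0.05                # slope and intercept from the earlier fit
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
s_y = math.sqrt(ss_res / (len(xs) - 2))
print(s_y)                       # about 0.19 for this data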
Hopefully this N−2 does not surprise you now. Since the best estimate of where we predict each $y_i$ will be is determined with two averages now, $m$ and $b$, and both are determined from the data, not theoretical true values, we lose two degrees of freedom this time. In form, this is essentially the same thing as the 1D standard error that came before.

$s_y = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - (m x_i + b)\right)^2}{N-2}}$
Now we are ready to discuss just what we mean by “reliable.” Up until now this has been a vague
notion based mainly on personal judgment, like the bartender’s advice from earlier. In our scientific
context, we desire a specific, commonly agreed upon notion of just what “reliable” means. This range is
conveniently provided by one of the mathematical properties of the standard error. If we consider true
values, $\mu$ and $\sigma$, the probability of any measurement falling in the range $\mu \pm \sigma$ is 68%. So any future measurement has a 68% chance of being anything in that range; values like $\mu + 0.5\sigma$ or $\mu - 0.8\sigma$ all fall in that range. When we have true values, the range determined by $\mu \pm \sigma$ is what is considered the reliable range.
If you are curious, beyond this, it is 95.4% likely that measurements fall in the range of $\mu \pm 2\sigma$, 99.7% likely that measurements fall in the range of $\mu \pm 3\sigma$, and a steep 99.99% chance that all measurements will be within $\mu \pm 4\sigma$. Or, another way to say that last statement: there is a 0.01% chance a future measurement will be outside $\mu \pm 4\sigma$.
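You can watch these percentages emerge with a quick simulation; a rough Python sketch, assuming normally distributed random error.

import random

random.seed(1)
mu, sigma = 10.0, 0.5                        # hypothetical true value and spread
trials = [random.gauss(mu, sigma) for _ in range(100_000)]
for k in (1, 2, 3):
    inside = sum(1 for t in trials if abs(t - mu) <= k * sigma)
    print(k, inside / len(trials))           # about 0.68, 0.954, 0.997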
What about the difference in wording between saying “68%” and “roughly 7/10”? For this, we need to
remember three things:
Firstly, when we calculate an experimental standard error, this is indeed our “best estimate” to its
theoretical true value, but it is still just an estimate. Not only that, but standard errors converge much
slower to their theoretical true values than the means they represent (which will influence our formal
rounding rules later on). So in practice, with the number of measurements we can reasonably make in
the blocks of lab time allowed to us, we will always have plenty of uncertainty in our standard error
estimates. Furthermore, many of our standard errors, including the ones representing averages, will
come from the upcoming technique called the propagation of errors. This technique includes yet
another layer of estimation in how it is derived; in other words, a propagated error is an estimation of
an estimation!
Secondly, sure, we can predict how likely a subsequent measurement is to fall within a certain range.
However, if we actually make that subsequent measurement, it technically updates its own standard
error, which means that initial prediction is also updated!
Thirdly, realistically a standard error is a fluid quantity. Remember how this all started, we were trying
to provide some range of reliability with a reported value, this being how we are quantifying precision.
So consider the four broad categories that influence precision: the measurement device, the measurer,
the measured, and the measurement’s environment. These all can be fluid elements that might change
during the course of an actual experiment. Plus or minus a cup of coffee can change some scientists’
experimental precision! Weather, mood, alertness, medication, on and on, these all can alter the “true
value” a standard error calculation is trying to estimate.
An experimentally determined value is considered reliable within plus or minus one
standard error of that value. This predicts a roughly 7/10th chance that a subsequent
determination of the same quantity will fall within this range. Equivalently, this predicts
a roughly 7/10th chance that the current determination falls within this range around
the best estimate to the theoretical true value of what is being determined.
This is how you express the experimental precision of the value, the quantification of
the impact of random errors.
Reading Errors
When you do not have multiple measurements to calculate standard errors, you make a best guess at
this range of reliability in a single measurement. This guess is called a reading error.
Suppose you are trying to measure an unknown mass. You put it on a scale and read 137.4 kg. Then
you take it off, make sure the scale is properly zero’ed, put it back on the scale, read the mass, and you
again read 137.4 kg. Then suppose you do this again and again with the same diligence in maintaining
high precision, and well, you just keep getting the same number over and over again! Well the average
value is clearly 137.4 kg, no problems there, but when you try to calculate its standard error, you get zero! Well, surely this does not mean that you have infinite precision in weighing this mass; you are not justified in recording any one of those measurements as 137.4000000… kg!
Now consider a situation where only one measurement is possible. Perhaps the experiment itself is
dangerous, such as making some measurement while skydiving or in outer space. Perhaps the
experiment is expensive, such as needing to use very rare chemicals or having limited funding.
In such cases you must make a guess as to the standard error of that measurement. This guess is called
a reading error.
How “educated” this guess is depends on your judgment when making an analog measurement or how
much you know about the device being used when making a digital measurement.
Let us revisit this analog measurement.
So remember what we are trying to guess at here. What is our range of reliability? If we were to make
a second measurement, it would have a roughly 7/10th chance to fall in what range around what we just
measured? Or equivalently, what range around the theoretical accepted, true, value does our
measurement have a roughly 7/10th chance to be around? Also, this is a guess as to where the
uncertainty in the actually measurement begins. In other words, if you feel you can legitimately
measure a length to 2 decimal places, this implies that the uncertainty is in the second decimal place.
Therefore, your standard error should also reveal where this uncertainty starts.
When making such a guess, you must remember what goes into determining the precision of such an
analog measurement: the measuring device, the measurer, the measured, and the measurement’s
environment.
Let’s address the personal element first. So I’m looking at that picture. I have my glasses on. And I believe I can legitimately read this as $L = 2.47$ cm. I certainly cannot see a third decimal place there. So I’m looking for $s_L = 0.0\_$ cm, and I just need to guess at what should go into that blank spot.
Well, I believe $s_L$ should be around 0.03 or 0.04 cm. However, when guessing at a reading error, overestimating is usually safer than underestimating. So, for me and my eyes, I guess $s_L = 0.04$ cm. If I am reporting this length, I would report it as $L = 2.47 \pm 0.04$ cm.
Now if I could get a better look at this, my precision can get better. For example, if I used a magnifying glass, well, now I surely know my measurement more reliably! This increase in reliability should then be reflected in my reading error guess. Perhaps I can now claim $s_L = 0.02$ cm or even 0.01 cm!
If I could magnify this even more, or perhaps I have a great eye for mentally entering in further divisions, could I enter the realm of $s_L = 0.009$ cm? Claim to know $L$ to three decimal places? Again, I personally
cannot obtain such precision even with a magnifying glass, others can though. Precision in science,
especially when using an analog device, is a skill and a talent just like precision in any other situation,
like sports, taste-testing, and shooting.
Moving on to another aspect of precision, consider a measuring device with much coarser divisions. The uncertainty increases. The best I can do is $L = 2.5 \pm 0.1$ cm. Perhaps the better measurers can go after that second decimal place, but their reading error guess should be relatively high, like 0.09 or 0.08 cm.
The measured should factor into what your reading error guess should be. Is what you are trying to
measure stationary? For example, perhaps you are trying to record a starting and stopping time of
something moving quite quickly. Is what you are trying to measure in an awkward place? For example,
perhaps you are trying to measure the length of a pendulum string attached to a ceiling far above your
head.
And finally, the measurement’s environment should be considered when trying to guess at your
precision. Even if you believe you can normally reliably measure something to two decimal places,
this does not mean you can achieve the same precision if you are measuring in a helicopter. Perhaps
other experimenters are unknowingly shaking your shared table and oscillating what you are trying to
measure? Perhaps a strobe light is flickering in the background?
Digital reading errors differ in the personal judgment area as we have less insight into just how close the
measurement really is. Suppose you have a digital meter reading that length to one decimal place.
Because the digital meter rounded this for you, you cannot see just how close it is to any divisions. We
know—from before—this was close to 2.47 cm and seeing that allowed us a visual feel for just how likely
the adjacent determinations were. Sure this could have been 2.46 or 2.48 cm. But you know it is less
likely this length is 2.45 or 2.50 cm. But because you cannot see this nature, you must assume that all
the numbers that round to 2.5 are equally likely. (You should revisit this section after the first
synthesis, we essentially need to assume a uniform distribution because we cannot see the Gaussian
nature, in other words, digital reading errors will generally be overestimations when considering only
the personal element of precision.)
Here are the three methods in trying to determine a digital reading error guess.
Method One: Look it up in the manual.
Method Two: Try to determine how the device rounds.
If it rounds as we normally do, the reading error is one half of the smallest displayed unit. For example, if all values such that 2.45 ≤ L < 2.55 cm round to 2.5 cm, then consider $s_L = 0.05$ cm. And then, to keep consistent with reporting our values to where the uncertainty starts, you may report this as 2.50 ± 0.05 cm. But once again, remember how broad a dispersion this is; we are saying this is reliably in the range of 2.45 to 2.55 cm. Looking back at the analog measurement, such a range is probably bigger than what you would have assumed to be the reliable range if you could get a closer look at the (Gaussian) nature of just where the measurement lies.
Or if it always rounds up or down, the reading error is the smallest displayed unit. For example, if all values such that 2.4 < L ≤ 2.5 cm round to 2.5 cm, then consider $s_L = 0.1$ cm. So this would be reported as 2.5 ± 0.1 cm. This type of rounding might be awkward for you. For example, if the device always rounds up, this means something like 2.41 gets kicked all the way up to 2.5 and not 2.4, but that is indeed how some digital devices round.
One way you can try to determine how it rounds, sans manual of course, is to change the scale.
For example, here is a measurement of a resistance of a circuit element: the display reads 0.99828 kΩ. Then I change the scale, and the display reads 0.9983 kΩ.
OK, so far I know it is not rounding down. If I move down in scale once more, I will know how this rounds. If I see 0.998 kΩ, then I know it rounds as we normally do. If I see 0.999 kΩ, then I know it is rounding up.
There we have it! With only one measurement, I would report my original resistance as R = 0.998280 ± 0.000005 kΩ.
Method Three: Err on the side of overestimation and use the smallest unit displayed.
So if you do not have access to the device’s manual (or the manual fails to mention it), and you cannot determine how it rounds, just assume it rounds up or down. So that digital length would be reported as 2.5 ± 0.1 cm and that digital resistance would be reported as R = 0.99828 ± 0.00001 kΩ.
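As a tiny sketch of this bookkeeping (the function and its labels are my own, not standard), in Python:

def reading_error(smallest_unit, rounding):
    # Half the smallest displayed unit for normal rounding;
    # the full unit if the device always rounds up or down, or if unknown.
    if rounding == "normal":
        return smallest_unit / 2
    return smallest_unit

print(reading_error(0.1, "normal"))    # 0.05, so report 2.50 +/- 0.05 cm
print(reading_error(0.1, "unknown"))   # 0.1, so report 2.5 +/- 0.1 cm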
Of course, remember the broad categories that comprise precision: the measurement device, the
measurer, the measured, and the measurement’s environment. We have discussed the former two with
the tacit assumption that the latter two were not a factor, which is usually the case if you can redo a
measurement and keep getting the same result over and over again. But suppose you were trying to make an analog measurement on something that is wiggling; you would consider that situation when trying to determine the reading error (only when one measurement is possible, as it would be unlikely to keep getting the same result over and over again in this case). The same goes for digital measurements,
if some of the final numbers are fluctuating, those are not certain. This happens all the time in
electronics as stray wiggling signals can interfere with your precision. For example you might be trying
to measure a sensitive voltage and start picking up interference from the 60Hz electrical signal coming
from the wall outlets. Or when trying to measure photo-intensity, you might pick up noise from other
light sources around the room.
Remember, one more time, in every way, making more than one measurement is ideal when possible.
You only make reading error guesses when you must. And as method three implies, it is generally safer
to overestimate reading errors. Yet this can be dangerous too. We are about to discuss propagation of
errors, a method of estimating a standard error of a calculated quantity from the standard errors that
went into the calculation. But this method tacitly assumes the standard errors are quite small compared
to what they are the standard errors of, and this is where overestimations can cause problems.
Indeed, our next topic will be finally addressing some of our long standing unanswered issues! Sure, it is
nice that we are able to quantify the precision of our single measurements. But we want to report
averages! Our best estimates! We want to report something like $\bar{x}$, and not just a single $x$; thus, while $s_x$ is nice, it is not $s_{\bar{x}}$! And the same thing goes for 2D statistics: we assume the independent variable’s precision is negligible, $s_x \approx 0$, compared to the dependent variable’s precision we have already discussed, $s_y$. However, we do a linear regression to find the averages for what determines their relationship, slope and intercept. So while we could just get a slope and intercept from any two points, this is not taking advantage of the random-error-reducing effects the averaging process provides us! So if we get an $m$ and a $b$ from linear regression, what are $s_m$ and $s_b$? Furthermore, what if you want to use those best estimates to calculate something else? Then what is that result’s standard error? Or say we have a 1D average, $\bar{x}$, subtracted from a quantity, $q$, for which we only have a reading error guess for its standard error, $s_q$; then what is $s_{q-\bar{x}}$?
All of those questions will be answered in the next section! Since these are all formulas, we need a
means to quantify the standard error of a value calculated with other quantities that each have their
own standard errors.
Propagation of Errors
In general, you will want to combine your various measurements and averages mathematically—adding,
multiplying, raising to a power, and so on—and you need a way to properly combine the standard errors
of these quantities to find the standard error of the final quantity.
Alas, the way to properly show this involves multivariable calculus. However, here is a general feel for
how the process goes.
Did you ever see a first-order approximation to the sine function? $\sin\theta \approx \theta$. This works nicely for small θ in radians. Consider θ = 0.01 radians; then $\sin\theta$ = 0.0100000 to seven decimal places! A great approximation! How about θ = 1 radian? Then $\sin\theta$ ≈ 0.84, not nearly as good. How about θ = 10 radians? Then $\sin\theta$ ≈ −0.54, a truly horrible approximation.
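You can check those numbers yourself with a quick Python sketch.

import math

for theta in (0.01, 1.0, 10.0):
    # Compare sin(theta) to its first-order approximation, theta itself.
    print(theta, math.sin(theta))    # 0.0099998..., 0.8414..., -0.5440...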
How the upcoming error propagation formulas are found is with a first order approximation that
assumes your standard errors are small compared to the typical values they represent the errors of.
Then this first order expansion is entered into the standard error formula itself. The following results
come out of that process.
If $f = x + y - z + \cdots$,
then

$s_f = \sqrt{s_x^2 + s_y^2 + s_z^2 + \cdots}$

If $f = \frac{x^a\, y^b}{z^c} \cdots$,
then

$\frac{s_f}{|f|} = \sqrt{\left(a\,\frac{s_x}{x}\right)^2 + \left(b\,\frac{s_y}{y}\right)^2 + \left(c\,\frac{s_z}{z}\right)^2 + \cdots}$
You can propagate any errors you will find in this course with these two formulas.
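Here is a minimal Python sketch of those two formulas; the helper functions are my own, not a standard library.

import math

def propagate_sum(*errors):
    # f = x + y - z + ... : add the squared standard errors.
    return math.sqrt(sum(s ** 2 for s in errors))

def propagate_power(f, *terms):
    # f = x^a * y^b / z^c ... : each term is a (value, error, exponent) triple.
    return abs(f) * math.sqrt(sum((a * s / x) ** 2 for x, s, a in terms))

# Hypothetical usage: f = x + y with s_x = 0.2 and s_y = 0.3,
print(propagate_sum(0.2, 0.3))
# and f = x^2 / y with x = 3.0 +/- 0.1 and y = 2.0 +/- 0.1.
x, y = 3.0, 2.0
f = x ** 2 / y
print(propagate_power(f, (x, 0.1, 2), (y, 0.1, 1)))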
We can find two immediate corollaries as our first examples.
If $f = x + c$ for a constant $c$ (we know it with certainty, i.e. $s_c = 0$), then

$s_f = \sqrt{s_x^2 + s_c^2} = \sqrt{s_x^2 + 0^2} = s_x$

Now consider $f = c\,x$. Then

$\frac{s_f}{|f|} = \sqrt{\left(\frac{s_x}{x}\right)^2 + \left(\frac{s_c}{c}\right)^2} = \sqrt{\left(\frac{s_x}{x}\right)^2} = \frac{s_x}{|x|}$

so, substituting $f = c\,x$,

$s_f = |c\,x|\,\frac{s_x}{|x|} = |c|\,s_x$

Note the interesting distinction in adding (or subtracting) a constant over multiplying (or dividing) by a constant. One has no effect on the standard error and the other scales it. Also notice the step right before $f$ was substituted in: the $c$ was still there, but contained in $f$. In other words, when just using the formula, the last example would normally come out to $\frac{s_f}{|f|} = \frac{s_x}{|x|}$.
We will get to propagating errors through the averaging formulas in a moment, but we can still use the previously discussed cases for two more quick examples.
If $f = \frac{1}{x}$:
The numerator 1 is a constant, its standard error is zero, and the variable $x$ has an exponent of −1. So, applying the second error propagation formula,

$\frac{s_f}{|f|} = \sqrt{\left(\frac{s_1}{1}\right)^2 + \left((-1)\,\frac{s_x}{x}\right)^2} = \frac{s_x}{|x|}$

A nice property of propagating errors is the squaring part that automatically lets you rewrite a negative exponent as being positive. (Analogous to the formula for adding and subtracting being the same, the formula for multiplying and dividing is essentially the same too.)

$\left((-1)\,\frac{s_x}{x}\right)^2 = \left(\frac{s_x}{x}\right)^2$
For another example, consider a quantity $q$ with a 1D average $\bar{x}$ subtracted from it, $f = q - \bar{x}$. Then

$s_f = \sqrt{s_q^2 + s_{\bar{x}}^2}$
Here is an example finding a propagated error you will actually use in lab,

$g = \left(\frac{4\pi^2}{\bar{T}^2}\right)\ell$

Applying the second formula, with $\ell$ having an exponent of 1 and $\bar{T}$ having an exponent of −2,

$\frac{s_g}{|g|} = \sqrt{\left(\frac{s_\ell}{\ell}\right)^2 + \left(2\,\frac{s_{\bar{T}}}{\bar{T}}\right)^2}$

Remember, you can have a term with the constants if you like, but since you know them with certainty, their standard error will go to zero. (In other words, that number 2 in that formula is a legitimate 2. We are not entertaining any uncertainty in it, neither 2.000001 nor 1.999999; that there is a bloody 2! This is the same as saying its quantified uncertainty, $s_2$, is zero. Same with $\pi$ and any other values we are treating as constants.)

$s_g = g\,\sqrt{\left(\frac{s_\ell}{\ell}\right)^2 + \left(2\,\frac{s_{\bar{T}}}{\bar{T}}\right)^2}$
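With hypothetical pendulum numbers, say ℓ = 1.000 ± 0.002 m and T̄ = 2.007 ± 0.004 s, a quick Python sketch of this propagation:

import math

ell, s_ell = 1.000, 0.002        # hypothetical length (m) and its error
Tbar, s_T = 2.007, 0.004         # hypothetical mean period (s) and its error
g = 4 * math.pi ** 2 * ell / Tbar ** 2
# Fractional errors: exponent 1 on ell, exponent -2 on Tbar (the sign squares away).
s_g = g * math.sqrt((s_ell / ell) ** 2 + (2 * s_T / Tbar) ** 2)
print(g, s_g)                    # about 9.80 +/- 0.04 m/s^2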
For a more general example, suppose

$f = \frac{x + y}{\sqrt{z}}$

This mixes addition with powers, so apply the formulas in stages. Treating the numerator as a single quantity, the second formula gives

$\frac{s_f}{|f|} = \sqrt{\left(\frac{s_{x+y}}{x+y}\right)^2 + \left(\frac{1}{2}\,\frac{s_z}{z}\right)^2}$

while the first formula gives the standard error of that numerator,

$s_{x+y} = \sqrt{s_x^2 + s_y^2}$

Putting it all together,

$s_f = |f|\,\sqrt{\frac{s_x^2 + s_y^2}{(x+y)^2} + \left(\frac{s_z}{2z}\right)^2}$
A less abstract way to do this is to first simplify by adding in different variables.

$f = \frac{x + y}{\sqrt{z}}$

Let $u = x + y$ and $v = \sqrt{z}$.

Find $s_u$ and $s_v$ first,

$s_u = \sqrt{s_x^2 + s_y^2}$ and $s_v = |v|\left(\frac{1}{2}\,\frac{s_z}{z}\right)$

Now

$f = \frac{u}{v}$

Find $s_f$,

$s_f = |f|\,\sqrt{\left(\frac{s_u}{u}\right)^2 + \left(\frac{s_v}{v}\right)^2}$

You can then substitute the $u$’s and $v$’s back for your final result. However, it can be ideal to leave it like this, especially when using MS Excel to propagate errors. You just make cells for $u$ and $v$, then $s_u$ and $s_v$, and then you can more easily click on them when finding $s_f$, rather than trying to enter in the longer, and more error-prone, expression for $s_f$ found first.
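Here is the same staged approach in Python, mirroring what you would do with Excel cells; the values and their errors are hypothetical.

import math

x, s_x = 4.0, 0.2     # hypothetical values and standard errors
y, s_y = 6.0, 0.3
z, s_z = 9.0, 0.5

u = x + y                                  # stage one: the numerator
s_u = math.sqrt(s_x ** 2 + s_y ** 2)
v = math.sqrt(z)                           # stage two: the square root
s_v = v * (0.5 * s_z / z)
f = u / v                                  # final stage: the ratio
s_f = abs(f) * math.sqrt((s_u / u) ** 2 + (s_v / v) ** 2)
print(f, s_f)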
Let us, at long last, find $s_{\bar{x}}$.

The formula for the 1D mean is

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\left(x_1 + x_2 + \cdots + x_N\right)$

Keeping it squared, using the addition formula, as well as the multiplying-by-a-constant corollary found above,

$s_{\bar{x}}^2 = \frac{1}{N^2}\left(s_{x_1}^2 + s_{x_2}^2 + \cdots + s_{x_N}^2\right)$

We already established that our best estimate for the precision error in each measurement is just $s_x$,

$s_{\bar{x}}^2 = \frac{1}{N^2}\left(N\,s_x^2\right) = \frac{s_x^2}{N}$

The estimated quantification of the reliability of a calculated one-dimensional average—the standard error of a 1D mean—is

$s_{\bar{x}} = \frac{s_x}{\sqrt{N}}$
Sometimes this is called the standard deviation of the mean.
There is no pre-set Excel command to find this. However, if all your
data is in column A, then you can use this:
=stdev(A:A)/sqrt(count(A:A))
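A quick Python sketch of this formula and the convergence it implies: as N grows, s_x settles near its true value while the standard error of the mean shrinks. The simulated data assume a normal distribution.

import math, random, statistics

random.seed(2)
mu, sigma = 10.0, 0.5                     # hypothetical true value and spread
for N in (10, 100, 10_000):
    data = [random.gauss(mu, sigma) for _ in range(N)]
    s = statistics.stdev(data)            # settles near sigma = 0.5
    print(N, s, s / math.sqrt(N))         # the second number shrinks toward zero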
The same can be done with the 2D averages, propagating $s_y$ through the formulas determining the slope and intercept from linear regression.

The estimated quantifications of the reliability of the two-dimensional averages from linear regression—the standard error of the slope, $s_m$, and the standard error of the intercept, $s_b$—are

$s_m = s_y\sqrt{\frac{N}{N\sum x_i^2 - \left[\sum x_i\right]^2}}$

$s_b = s_y\sqrt{\frac{\sum x_i^2}{N\sum x_i^2 - \left[\sum x_i\right]^2}}$

Don’t worry, you will be given a very easy way to find these on Excel. Your job is understanding what they represent.

However, if you prefer to find them on your own, for your copying and pasting pleasures, if your independent variables are in column A and your dependent variables are in column B, then the two commands are

$s_m$ is =STEYX(B:B,A:A)*SQRT(COUNT(A:A)/(COUNT(A:A)*SUMSQ(A:A)-SUM(A:A)^2))

$s_b$ is =STEYX(B:B,A:A)*SQRT(SUMSQ(A:A)/(COUNT(A:A)*SUMSQ(A:A)-SUM(A:A)^2))
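And here is a matching Python sketch of those two commands, reusing the earlier hypothetical data and the s_y found before.

import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
N = len(xs)
Sx, Sxx = sum(xs), sum(x * x for x in xs)
d = N * Sxx - Sx ** 2
s_y = 0.19                         # the residual standard error from earlier
s_m = s_y * math.sqrt(N / d)       # standard error of the slope
s_b = s_y * math.sqrt(Sxx / d)     # standard error of the intercept
print(s_m, s_b)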
Now we have our general set of standard errors along with a means to propagate them into any
formulas we will encounter in this course. So let us discuss their expected behavior as the number of
measurements gets larger and larger.
Theoretical Convergences
If we have only random errors, then one-dimensional statistics predicts the following as the number of measurements, $N$, gets larger and larger, theoretically approaching infinity.

If $\bar{x}$ and $s_x$ are the best estimates of their theoretical true values, $\mu$ and $\sigma_x$, then

$\bar{x} \Rightarrow \mu$ and $s_x \Rightarrow \sigma_x$ and $s_{\bar{x}} = \frac{s_x}{\sqrt{N}} \Rightarrow 0$

If we have only random errors, then two-dimensional linear statistics predicts the following as the number of measurements, $N$, gets larger and larger, theoretically approaching infinity.

If $m$, $b$, and $s_y$ are the best estimates of their theoretical true values of slope, $\mu_m$, intercept, $\mu_b$, and standard error of the dependent variable, $\sigma_y$, then

$m \Rightarrow \mu_m$ and $b \Rightarrow \mu_b$ and $s_y \Rightarrow \sigma_y$ and $s_m \Rightarrow 0$ and $s_b \Rightarrow 0$
Most students are OK with the averages of the values themselves approaching their theoretical true values as the number of measurements gets higher and higher. Yet some students tend to have a hard time connecting this with the equivalent statement of its uncertainty getting lower and lower. As a matter of fact, the previous table is redundant! $\bar{x} \Rightarrow \mu$ automatically implies $s_{\bar{x}} \Rightarrow 0$, and vice versa. Furthermore, you must appreciate how powerful such a statement is, for it is concisely displaying the awesome power of the averaging process; it reduces the impact of random errors. This is also concisely affirming, in mathematical terms, the uncomfortable vibe you hopefully got when reading that a political scientist claims he knows for certain that Mickey Mouse will be the next president after only one measurement. Knowing such a thing with any certainty can only be done with an average, and the certainty of this average improves as the number of measurements increases.
Another common issue is believing the 1D $s_x$ (or its 2D analog $s_y$) also approaches zero. This is often due to seeing that N−1 (or N−2) in the denominator,

$s_x = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1}}$
and thinking “Ah! In the limit as N goes to infinity, this must go to zero!”
But $s_x$ converging to zero does not make sense on many levels. Mathematically, yes indeed, the denominator is of order N, but the numerator is a sum of N positive values; therefore, the numerator is also of order N. (Well, they are both of order $\sqrt{N}$ once the square root is taken, since we’re talking about the standard errors here, but you hopefully get the point.)
You can see this visually after learning histograms in the first synthesis. As N increases, you can see that the histograms are not becoming thinner and taller; they are converging to a definite size and shape. And since $s_x$ represents the width of such a histogram, this means that $s_x$ is converging to a set non-zero value.
And thinking about this experimentally, $s_x$ represents the precision of your measurements of $x$, a quantity affected by countless factors due to the measuring device, the measurer, the measured, and the measurement environment. Now you can certainly improve upon $s_x$, and indeed, the more measurements that are made can provide valuable experience allowing a now wiser experimenter to improve upon his or her technique. And people can improve their precision in other ways too, such as using a magnifying glass, playing calming music, being well rested, firmly clamping certain things down, and so on. But claiming that $s_x$ converges to zero is a very different thing. As you make more measurements, as you keep updating $s_x$, you are starting to know your experimental precision more precisely, but that does not mean you are becoming infinitely precise! Think about that literally. So as you make more and more measurements with the same ruler, it will magically start developing smaller and smaller nicely calibrated divisions? And then your eyeballs must start developing super magnifying vision? And then as you use the ruler even more, the laboratory and earth will start to vibrate less?
(Believe me, I wish this were true! It would be like developing super powers due to being an extreme
nerd! Darth Vector would be a legitimate Sith Lord! Muhahahaha!
Alas, all I’ve gotten after years of physics is myopia.)
Quantifying random error does not reduce the impact of random error on your data. Averaging reduces the impact of random error. That is why the standard errors of averages do indeed theoretically converge to zero as you make more and more measurements. This is fundamentally why averaging is a monumental
process for science and humanity. Its importance can never be overstated. Averages are everywhere.
As mentioned before, even your lab grade will be an average! Your grade will not be determined from
just one single measurement like one lab report grade or one pre-lab score. Several measurements are
made in various ways to try to reduce the potential randomness of your performance. Then an average
30
of these measurements becomes the best estimate in quantifying the true value of your knowledge of
physics, reporting ability, experimental technique, and ability to work in a laboratory environment.
Now there is also another reason why even the standard error of averages will never actually be able to reach the theoretical limit of zero. Notice there needed to be this qualification first: "If we have only random errors" in the above boxes. As we will discuss later, systematic errors will gladly pour some cold water on any attempts to actually reach those theoretical limits. This is why the only time we will truly be able to see this behavior is when we can simulate an environment with zero systematic errors, as in the first synthesis.
Alright, more examples of propagated errors!

Suppose we have two measured averages, $\bar{a} \pm S_{\bar{a}}$ and $\bar{b} \pm S_{\bar{b}}$, with $S_{\bar{a}}$ much smaller than $S_{\bar{b}}$ in absolute size, and we want $c = a + b$. What is $S_{\bar{c}}$?

$$S_{\bar{c}} = \sqrt{\left(S_{\bar{a}}\right)^2 + \left(S_{\bar{b}}\right)^2}$$

In general, we only keep one significant figure when rounding a standard error, and this tells us where the uncertainty in the quantity begins. Say $S_{\bar{c}}$ comes out telling us that uncertainty begins in the first decimal place; then that is where we should round $\bar{c}$.

Therefore, we report $\bar{c} \pm S_{\bar{c}}$, both rounded at the first decimal place.

This is an interesting result: if we had just ignored the imprecision in $\bar{a}$, we would have gotten the same reported result. It is as if $S_{\bar{a}} \approx 0$ and might have been ignored from the start, considered negligible compared to the precision error in $b$.
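To make that concrete, here is the same calculation carried through with illustrative numbers of my own (these particular values are invented for this sketch; note how small $S_{\bar{a}}$ is in absolute terms):

$$\bar{a} = 2.230 \pm 0.03\ \text{g}, \qquad \bar{b} = 32.2 \pm 0.4\ \text{g}$$

$$S_{\bar{c}} = \sqrt{(0.03)^2 + (0.4)^2} = \sqrt{0.0009 + 0.16} \approx 0.401 \to 0.4\ \text{g}$$

$$\bar{c} = 2.230 + 32.2 = 34.430 \to \bar{c} = 34.4 \pm 0.4\ \text{g}$$

Dropping $S_{\bar{a}}$ entirely would have given $S_{\bar{c}} = 0.4$ g and the very same report.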
As you might have noticed, this issue of negligibility comes up quite often in error analysis. You've been doing it for years and didn't know it! Consider using a ruler: one end is matched up to an edge, usually the zero line's edge, and then the second end is measured in whatever interval it happens to fall in.

So when you read the less precise end, say as 2.47 cm, you are ignoring the subtraction that is actually happening, 2.47 - 0 cm, and you also never bothered to propagate the error in that subtraction! In other words, say you guess the standard error in making one measurement is $s$; then you would report the widget's length with a standard error of just $s$.
However, look at this situation again: can you see now, in terms of quantifying precision, just how undesirable this is? This is actually two interval measurements now. Therefore, if you guess the standard error in one measurement is $s$, this must be propagated through the subtraction now:

$$S_L = \sqrt{s^2 + s^2} = \sqrt{2}\,s$$

Thus you would report the widget's length with standard error $\sqrt{2}\,s$.
This same reasoning applies to the independent versus dependent variables. It is assumed that you are carefully setting up the independent variable measurements, conscientiously lining up to edges of lines, and then you measure the dependent variables in whatever intervals they happen to end up in. This is why the precision error in the independent variable was considered negligible compared to the precision error in the dependent variable.
Ah, but we need to be careful now! The story of negligibility can get more complex! We must also consider how the error compares to what it is the error of!

Let the same $\bar{a}$ and $\bar{b}$ from before be used, but this time consider their being multiplied, $c = a \cdot b$. Then

$$S_{\bar{c}} = \bar{a}\,\bar{b}\,\sqrt{\left(\frac{S_{\bar{a}}}{\bar{a}}\right)^2 + \left(\frac{S_{\bar{b}}}{\bar{b}}\right)^2}$$

Therefore we report the result with this propagated $S_{\bar{c}}$, rounded by the usual rules.

Did you notice what just happened? The very same $S_{\bar{a}}$ that was negligible in addition with $S_{\bar{b}}$ is now NOT negligible in multiplication with it! Yikes!
Let’s back up and examine when this major difference occurred.
(
)
)√(
(
)
There we can see already how the term representing ̅ is already larger than the term representing .
Backing up a little more,
(
)(
)√(
√( ̅ )
̅
)
(
)
( )
Now we can see the problem a bit closer in the ratios of the deviations over what they deviate from.
These are called fractional errors. Your precision can be hurt if what you are measuring is small
compared to the standard error of what you are measuring.
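Continuing with the same invented numbers from the addition sketch above:

$$\frac{S_{\bar{a}}}{\bar{a}} = \frac{0.03}{2.230} \approx 0.0135, \qquad \frac{S_{\bar{b}}}{\bar{b}} = \frac{0.4}{32.2} \approx 0.0124$$

$$S_{\bar{c}} = (2.230)(32.2)\sqrt{(0.0135)^2 + (0.0124)^2} \approx 71.8 \times 0.0183 \approx 1.3$$

So the $\bar{a}$ term, far from being negligible, is now the slightly larger of the two. Ignoring it would have given $S_{\bar{c}} \approx 0.9$ instead, a noticeably different report.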
Now consider improving the fractional error $S_{\bar{a}}/\bar{a}$ by increasing $\bar{a}$ to 22.30 g while keeping $S_{\bar{a}}$ the same, and worsening the fractional error in $b$ by decreasing $\bar{b}$ to 3.22 while keeping $S_{\bar{b}}$ the same:

$$S_{\bar{c}} = \bar{a}\,\bar{b}\,\sqrt{\left(\frac{S_{\bar{a}}}{\bar{a}}\right)^2 + \left(\frac{S_{\bar{b}}}{\bar{b}}\right)^2}$$

with the $b$ term now thoroughly dominating under the square root.
Ah, we can already see the $b$ imprecision now dominates over the $\bar{a}$ imprecision, which was done not by changing either $S_{\bar{a}}$ or $S_{\bar{b}}$, but by changing what they are compared to.

Now let us double check this by simply assuming the imprecision in $\bar{a}$ was negligible compared to the imprecision in $b$ from the beginning. Having this assumption is treating $S_{\bar{a}} = 0$, negligible and ignorable:

$$S_{\bar{c}} = \bar{a}\,\bar{b}\,\sqrt{\left(0\right)^2 + \left(\frac{S_{\bar{b}}}{\bar{b}}\right)^2} = \bar{a}\,S_{\bar{b}}$$

The same experimental result! Now the imprecision in $\bar{a}$ is negligible to the imprecision in $b$ under both addition and multiplication.
Generally, it would not be safe to assume negligibility from the beginning in the previous example, since it was "cutting it pretty close." So keep that in mind when setting measurements that are going to be assumed negligible, such as properly zeroing measurements and carefully setting independent variables.
(The rest of this discussion needs to wait until after systematic errors are discussed. Look for *****)
Let us further explore fractional error with a visual example. Consider these three cases in using the same measuring device to find three distances of the same object: first the length, then the width, and finally the thickness. While the error in using the ruler, $s$, theoretically stays the same, the fractional errors get worse as what is being measured becomes more and more comparable to its error.
(Here’s a tip; take another look at when I first introduced the error propagation formulas, I put in a step
just before the square rooting. This is because that is how I actually remember them, the
addition/subtraction one is just the errors added in quadrature, and the multiplication/division one is
the fractional errors added in quadrature with some potential exponents thrown in. I hope that helps!
Now it’s pretty box time!)
36
Fractional error is a ratio comparing some sort of deviation to what is being deviated from. In terms of precision, this is a standard error compared to the quantity it is the error of,

$$\frac{S_{\bar{x}}}{\bar{x}}$$

In terms of accuracy, this is the deviation of what is found experimentally from what is considered the accepted, true, value, over the accepted, true, value. Sometimes it is desirable to convert this to a percent fractional error (PFE) by multiplying by 100%. This course's general test of accuracy is a PFE called a percent difference:

$$\text{percent difference} = \frac{\left|x_{experimental} - x_{accepted}\right|}{x_{accepted}} \times 100\%$$
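In code form, the percent difference is one line; here is a quick sketch (the numbers are invented for illustration):

```python
def percent_difference(experimental: float, accepted: float) -> float:
    """This course's general test of accuracy, as a percent."""
    return abs(experimental - accepted) / abs(accepted) * 100

# Hypothetical example: a measured g of 9.75 m/s^2 against the accepted 9.81
print(f"{percent_difference(9.75, 9.81):.2f} %")   # -> 0.61 %
```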
Rounding Rules
We are now ready to discuss the formal rounding rules for this course. You always use unrounded
numbers in calculations; however, when formally reporting a result, you want to present it as reliably as
you know it along with some indication of how accurate your value is. (You will not always be able to
test accuracy.)
Alright, so let’s say you perform one-dimensional
statistics on an ensemble of data. For example,
suppose you are trying to find the length of a simple
pendulum attached to the ceiling with a 2-meter stick
(something you will do in lab by the way). You do this
ten times and find these results in the table.
L (cm): 176.25, 175.62, 175.77, 176.18, 175.88, 175.68, 175.92, 175.94, 175.60, 176.19

N = 10
$\bar{L}$ = 175.90300 cm
$S_L$ = 0.24060341 cm
$S_{\bar{L}}$ = 0.07608548 cm
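If you would like to check those table values yourself, here is a quick sketch in Python (reproducing what your calculator, or Excel's average() and stdev.s() plus a division by $\sqrt{N}$, would give):

```python
import statistics

# The ten pendulum-length measurements from the table (cm)
L = [176.25, 175.62, 175.77, 176.18, 175.88,
     175.68, 175.92, 175.94, 175.60, 176.19]

N      = len(L)                    # 10
L_bar  = statistics.fmean(L)       # 175.903       (the average)
S_L    = statistics.stdev(L)       # 0.2406034...  (like stdev.s())
S_Lbar = S_L / N ** 0.5            # 0.0760854...  (standard error of the average)
print(L_bar, S_L, S_Lbar)
```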
Suppose that all we wanted to do was formally report our best estimate for this length, $\bar{L}$. In the modern day of calculators and computers, we will get back far more digits than are actually significant. So how can we determine where to properly round what we wish to formally report?

Ah, you might be thinking about the rules of significant figures. And those are fine as long as you are dealing with single quantities, where we have all tacitly agreed that the final digit is where uncertainty begins. However, this is also assuming the overall impact of random error is not being reduced throughout the entire problem. We want to use averages, not just single determinations. Averaging reduces the impact of random errors.

Therefore just following the rules of significant figures is not adequate for us. For example, there is no reason to suspect that merely adding and subtracting two or more numbers together will produce a sum or difference that has had its precision improved over any of the single values that went into determining it, so the rules of significant figures are fine there. The same reasoning applies to multiplying and dividing two or more numbers together: why would we think such a result has lessened the impact of random error? Therefore again, the rules of significant figures are fine.

But again, averaging reduces the impact of random errors. We want this to happen. We want as much experimental reliability as possible in our experimental results. So yes, when a 1D average is taken on an ensemble of single determinations, such as with all the measurements of length above, then the impact of random error is reduced on the 1D average over just any single measurement of length. If you recall how we quantify this reliability in values, with the standard error, what we just stated in words is the exact same thing as noting mathematically that $S_{\bar{L}} < S_L$.
Make sure you understand that this same reasoning applies to our linear 2D statistics too. In other words, if you have just one pair of determinations, some $(x_1, y_1)$ and $(x_2, y_2)$, then using the rules of significant figures is fine if you then determine the one slope and one intercept from just those two points. However, in the exact same way that having more than one determination of a single value and applying 1D stats to them is ideal, having more than one pair of points and applying 2D stats is ideal. Normally we do not find what would be the standard error of just one determination of slope and intercept from just one pair of points, but if we did, the standard errors reflecting the 2D average values of slope and intercept must mathematically come out less than it.
OK, let’s get into how we justify our rounding rules. In our formally reported result, we only want one
uncertain digit, the last one.
Let us explore this with the more tangible ruler analogy first.
OK, assuming this is properly zeroed (meaning that part contributes negligible imprecision), here is how we normally try to measure a length.

We can tell that our first value is 2 cm. So far so good!

Next we can see that our widget seems indeed past the fourth line in-between 2 and 3 cm, giving us 2.4 cm so far.

Now we normally consider the next digit where uncertainty begins, because that is where our uncertainty begins in our reading! We are out of nice markings and must guess as best we can just where in-between those two lines our widget's length resides. 2.43 cm? 2.44 cm? 2.45 cm?

Well, whatever the final guess is, based on the measurement device alone, the uncertainty begins in the second decimal place. Ask yourself this question: what if someone took the widget and ruler, and then formally reported this single measurement back to you as 2.4472843 cm? Would you believe that person was able to see microscopically? If someone formally reported that length to you, would you just assume that person must have super powers and accept that all those digits but the very last one are certain?
Ah, no, you should not. Whether you realize it or not, you know that person did not properly round!
And also whether you realize this or not, you are assuming the standard error of a single measurement
takes on this form 0.0_ cm with the first significant figure representing where uncertainty begins.
The standard error represents our experimental uncertainty; we let it guide us when deciding where
uncertainty begins in the values we wish to formally report.
So getting back to this example: first, we normally only bother keeping one significant figure when reporting a standard error. Therefore, we start by rounding $S_{\bar{L}} = 0.07608548\ \text{cm} \to 0.08\ \text{cm}$.

Why only one significant figure for a standard error? Well, here is the easy answer. We are generally not interested in quantifying precision outside of what it reveals about the values it represents the precision of. Meaning we care more about what $S_{\bar{L}}$ can tell us about $\bar{L}$ than about $S_{\bar{L}}$ itself.

However, yes, there is a hard answer. As sample size increases, standard errors of averages converge much more slowly to their theoretical true values than the average values they represent. In the physical experiments we will be performing in these labs, we will never come close to the number of measurements needed to legitimately reveal more than one significant figure in a standard error. (You will get to explore this in the first physics 181 synthesis.)
Alright, so $S_{\bar{L}} = 0.08$ cm tells us that our uncertainty of what it represents, $\bar{L}$, begins in the second decimal place. Therefore we formally round there, giving us

$$\bar{L} = 175.90 \pm 0.08\ \text{cm}$$
Before moving on, note something interesting about this particular real-life example. Suppose you only made a single measurement of this length of a pendulum with a 2-meter stick, which technically you can measure to the second decimal place in centimeters. You probably would have guessed that the uncertainty of a single measurement, which $S_L$ represents, would fall in the second decimal place, just like in the ruler example above. If you only made a single measurement and needed to make a "reading error" guess at $S_L$, you would probably say it was 0.05 cm or something like that.

Yet that actually would have been a gross underestimation. In this real-life example, the statistical best estimate of where uncertainty begins in a single measurement of $L$ calculated out to be $S_L = 0.24060341 \to 0.2$ cm, implying uncertainty beginning in the first decimal place, not the second.

You can actually see this from just examining the data of this particular example. Look at each measurement from left to right. The leading 1's are all the same and the 7's are all the same, but after that the numbers jump around quite a bit (the 175's versus the 176's, and especially the tenths), implying that is where uncertainty begins.
In other words, this example is to remind you that random errors come from more than just the measurement device itself. There is individual skill in measuring, just like in sports and shooting, so how good is the measurer at seeing this uncertainty? Especially in this example of measuring something hanging above you from the ceiling, how much is what is being measured contributing to the random error; is it stable or awkward? How is the measurement environment contributing to the potential random errors; are there distractions or strobe lights going off in the laboratory?

Sometimes you must try to guess at the standard error of a single measurement, but it is always preferable and safer to take more measurements and use statistics.
So when developing a set of rounding rules, again, we want only one uncertain digit in your final reported value, and the standard error tells you where this uncertainty begins, and therefore where to round.

Generally, you keep only one significant figure for your standard error. But an exception is allowed when that significant figure rounds to a 1. Consider just how close an error of 0.013 is to 0.009; so we allow one extra significant digit in rounding the quantity this represents the precision of, along with keeping two significant figures for the standard error. This is not to say that the true value for an error cannot legitimately have a leading digit of 1 and that is truly where the uncertainty begins; for example, the standard error of a vernier caliper is often considered to be a legitimate 0.01 cm when the hundredth place is truly as far as the measurer can go (without a magnifying glass).
We are usually less interested in a percent difference beyond what it reveals to us about our experimental value's accuracy. So just keeping two significant figures for percent differences is good enough for us to be able to gauge how accurate our values are.
Rounding Rules when formally presenting a result

Keep one significant figure in your result's standard error, and that tells you where to round the result.

An exception can be made if the first digit in the standard error rounds to a one; then you can keep a second significant figure in the standard error and round the result to where that second digit is.

If your standard error happens to be larger than the value it represents, then that value is totally unreliable. Just keep one significant figure for both the value and its standard error and put a question mark by your value.

If you can do a test of accuracy with a percent difference, just keep 2 significant figures.

Remember, this is just how to present your experimental finding. You always use unrounded numbers in calculations, which is easy when using Excel because you can just select the cell with the unrounded number in it.
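Those rules are mechanical enough to sketch in a few lines of code. Here is a rough helper of my own (not an official course tool; it handles the one-significant-figure rule and the leading-one exception, but not the question-mark case):

```python
from math import floor, log10

def formal_report(value: float, err: float) -> str:
    """Round err to one significant figure (two if its leading digit
    rounds to a one), then round value at the same decimal place."""
    exp = floor(log10(abs(err)))        # decimal place of err's leading digit
    lead = round(err / 10 ** exp)       # that digit after 1-sig-fig rounding
    if lead in (1, 10):                 # the exception: it rounds to a one
        exp -= 1                        # keep a second significant figure
    dec = max(-exp, 0)                  # decimal places to display
    return f"{round(value, -exp):.{dec}f} +/- {round(err, -exp):.{dec}f}"

print(formal_report(175.90300, 0.07608548))   # -> 175.90 +/- 0.08
print(formal_report(175.90300, 0.13200000))   # -> 175.90 +/- 0.13 (exception)
```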
The exception (when the leading digit of the standard error rounds to a one, we allow an extra digit in the reported value) requires some more discussion just to be clear.

For contrast, let's consider a unit-less example without this exception. Suppose your standard error comes out to 0.09, which justifies your reporting a final value to the second decimal place. Yet what if your calculated standard error just happened to be 0.10?

"Blasted! I am so close to getting that extra significant figure! Come on, it's like 10 cents versus 9 cents! Can't I just have that extra place, please? It's not like 80 cents versus 9 cents, or even 20 cents versus 9 cents!"

Remember, as a scientist you want high precision; you want high reliability; you want to be able to report your values to as many significant figures as possible.

Indeed, we can justify allowing one more significant figure when being that close. Standard errors are averages of a finite sample set too; standard errors themselves have uncertainties even though we normally don't calculate them. So we allow this exception: when a leading digit rounds to a one, it is close enough to grant you that extra significant figure.
Now this is not to say that standard errors do not theoretically converge to some accepted, true, value themselves. So maybe the proper standard error does actually have a leading digit of one, and that actually is where the uncertainty begins. Perhaps the best example of when this might happen is using a Vernier caliper. Some measurers can get three decimal places in centimeters while others can legitimately get only two. So for the former, their standard error could legitimately be 0.009 cm, and the latter could legitimately be 0.01 cm. The point is that we almost never take enough measurements to be sure enough, so we allow the exception in normal practice.
As a matter of fact, something you will ideally observe in the first synthesis (and hopefully this will be a nice bonus to those actually reading this!) is that standard errors converge much more slowly than the averages they represent the standard error of. This is also why we normally keep only one significant figure in a standard error; we really don't know that digit all that well anyway, but the point is to help you develop solid experimental habits. In other words, you would typically need far more measurements than are reasonably possible in these labs to justify normally reporting more than one significant figure on a standard error.

In other words, our uncertainty in our uncertainty, discussed a bit more below, is why we normally keep just one significant figure in a standard error and why we allow that exception when keeping two.
Examples!

Suppose when determining the acceleration of gravity, you find some $\bar{g}$ and $S_{\bar{g}}$. You would formally report $\bar{g} \pm S_{\bar{g}}$ with one significant figure kept in $S_{\bar{g}}$ and $\bar{g}$ rounded at that digit's decimal place.

Suppose when determining the height of the John Hancock skyscraper, and after repeated measurements, you find an $\bar{h}$ whose standard error has a leading digit that rounds to a one. You would formally report $\bar{h} \pm S_{\bar{h}}$ with two significant figures kept in $S_{\bar{h}}$ and $\bar{h}$ rounded at the second of them, written either directly or in scientific notation.

Suppose you carefully draw a large series of lines of length and then draw a circle around each line and measure its circumference. You do this for many different sizes, giving you a set of (length, circumference) points. After performing 2D stats on this ensemble, you find a slope and its standard error; you would formally report the slope rounded where its standard error, kept to one significant figure, tells you to.

After performing Millikan's Oil Drop experiment, you find the mass of an electron. You would formally report it in scientific notation, with one significant figure kept in the standard error and the value rounded to match, written either with separate powers of ten or with a common power of ten pulled out.

Ideally you will not spend your life as a scientist rediscovering things that have already been discovered; you will be measuring new things, where checks of accuracy will not be possible. Suppose you determine how many moles of an unknown gas you have in a container. You would formally report the value and its standard error by the same rules, with no percent difference attached.

Hopefully the situation of your standard error being much larger than the value it represents will be quite rare. Consider trying to find the density of some metal with a foot and a stone. You would formally report one significant figure for both the value and its standard error, with a question mark by the value.
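Since the particular numbers matter less than the habit, here is one fully worked set of invented values (mine, not measurements from these labs) run through those rules:

$$\bar{g} = 980.4512 \pm 4.3210\ \text{cm/s}^2 \;\to\; \bar{g} = 980 \pm 4\ \text{cm/s}^2$$

$$\bar{h} = 343.712 \pm 1.384\ \text{m} \;\to\; \bar{h} = 343.7 \pm 1.4\ \text{m} \quad (\text{the leading-one exception})$$

$$\rho = 3.7 \pm 11.4\ \text{g/cm}^3 \;\to\; \rho = 4\,(?) \pm 10\ \text{g/cm}^3 \quad (\text{totally unreliable})$$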
Before we move on, we need to address two related issues which might not seem related! Firstly, if something like $\bar{x}$ needs to have its range of reliability, $S_{\bar{x}}$, reported with it, should not then $S_x$, which is itself a sort of average over the data, have its own error, and why are we not finding it? Secondly, what about what The Wandering Wise One said before, "it will take you about 68.2 minutes, give or take 1.7 minutes to get there"? If that uncertainty legitimately started in the ones place, or at least would have rounded to 2, why wasn't the answer rounded to 68?

Firstly, you will not be using this in this course, but there is indeed a standard error of $S_x$ itself; for purely random errors it works out to approximately

$$S_{S_x} \approx \frac{S_x}{\sqrt{2\left(N-1\right)}}$$

And the reason you don't need to know it here is because we almost always do not ask you to report $S_x$. We ask for things like $S_{\bar{x}}$, so $S_x$ is needed to calculate $S_{\bar{x}}$, and then $S_x$'s use for us essentially ends. This is because of our context. We are generally interested in the physical quantities being measured or determined, like the acceleration of gravity or the resistance of a circuit element, and we tend to find best estimates for them with averages, from either 1D statistics or linear 2D statistics. So after finding a best estimate, we stop being as interested in the precision of just one individual determination.

However, the "give or take" The Wandering Wise One gave was not $S_{\bar{t}}$, it was $S_t$. In that context, the tourist was in fact interested in making an individual trip. So for time $t$, if The Wandering Wise One was trying to be thorough (and perhaps delightfully annoying to the tourist), the more formal answer would look like reporting the average with its own reliability, $\bar{t} \pm S_{\bar{t}}$, alongside the spread to expect on any single trip,

$$t = \bar{t} \pm S_t = 68.2 \pm 1.7\ \text{minutes}$$
But the really disturbing piece of missing information was not about precision, it was that The
Wandering Wise One’s directions were not accurate!
Accuracy? Now that is a word we haven’t heard in a long time. A long time.
Indeed, and this is a very important issue here. Notice that in all of our previous statistical formulas for
trying to quantify deviations, nowhere do the accepted, true, values ever come into play. It is always
our best estimates, our averages, which represent them. Well, fair enough. Ideally a real scientist is not
going to be spending his or her entire career rediscovering things that have already been discovered!
However, tests of accuracy will always be important. They are important at the undergraduate level in that it is nice to be able to compare what we discover to what is accepted. Later on, using what is accepted can help scientists develop new experimental techniques and explore new substances in various ways by using what is already well-known as a guide. Furthermore, at all levels, tests of accuracy are vital in helping us explore those errors that pull us away from what we are trying to determine but are not properly handled by our statistical treatments.
Systematic Errors
A systematic error is anything that would not be properly reflected in quantifying precision, the
influence of random errors. The influence of systematic errors may or may not affect any of the
preceding statistics, but if it does, it would not affect the statistical calculations in the way they are
designed to handle; if the statistics treats a systematic error properly, then it is actually a random
error. Yet this does not mean a systematic error will affect a statistical calculation at all or even
enough to be noticeable beyond the normal variations random errors produce. Yet, this does not
mean that all mathematics will not be useful in accounting for systematic errors. For example, by
analyzing anomalous results in statistical calculations, like a test of accuracy, this can help expose
systematic errors.
So overall, systematic errors can be as varied as nature itself. Unlike random errors, there are no set ways of dealing with them. They may or may not be detectable and treatable mathematically. They can never be fully removed from any real measurement. By analyzing data; by mindfully designing, constructing, and performing experiments; by identifying and judging relevance; the sagacious scientist can try to account for systematic errors to the ideal point where their influence is negligible compared to the influence of random errors.

In other words, you try to make it so you can ignore systematic errors according to the level of precision you hope to obtain.

If you cannot ignore them, meaning their impact is affecting your results (it is non-negligible) and you cannot account for them by adjusting your data, your experiment, or your analysis, then you, at the very least, must make sure to properly acknowledge their effects in your experimental reports. However, do not clutter your reports with trying to list systematic errors that are clearly negligible at your precision. Judgment and forthrightness are essential qualities in experimental reporting.
Let us develop a frame of reference and then introduce various systematic errors to get a feel for them.
We will start with a faithful set of (x,y) pairs.
Now suppose there has been some sort of calibration error in Ex. This might be a poor zeroing at the beginning, or some value was supposed to be subtracted from each Ex and the experimenter forgot to do this.
Notice, already, how insidious these systematic errors can be! If this last graph was all you had to look
at, would you be tipped off that there was something wrong? Now see how both look together.
Now compare these statistically derived quantities. Since this is an error that has affected Ex
systematically, in the same way by the same amount, the slope is the same in both cases. So if we were
able to compare this to what we expected to get, or more often, if we compared a value calculated with
slope to what we predicted we would find, we would not be able to spot this systematic error. Notice
also that R2 has not changed, and why should it? R2 helps us determine the nature of the dependency of
the two variables, and R2 implies precision. For this particular systematic error, the shifted data is just as
linear as the starting data and the precision is the same. Ah, but if we compared the intercept, or
something calculated from the intercept, to what we expected, then we would be better able to expose
this systematic error!
Thus one of the ways we try to identify systematic errors is to compare what we experimentally get to
what we believe we should have gotten—tests of accuracy.
Comparing Precision with Accuracy.

Compare the following two deviations. On one side, use the statistically derived best estimate of the range of reliability of the best estimate, $S_{\bar{x}}$. On the other side, use the actual deviation of your best estimate from what is considered the accepted, true, value of $x$.

Case one:

$$S_{\bar{x}} \geq \left|x_{accepted} - \bar{x}\right|$$

This means that whatever deviation you see in that test of accuracy is within the plus or minus swing of reliability random errors seem to have already forced upon you. In other words, just based on the determination of precision, whatever deviation is happening on the right-hand side is already inside what is considered reliable. This implies that systematic errors have been reduced to negligible influences based on your level of precision.

Case two:

$$S_{\bar{x}} < \left|x_{accepted} - \bar{x}\right|$$

This implies there are errors having a clear influence on your data that are not being properly reflected in the standard error determinations. In other words, hunt for non-negligible systematic errors. This is when you want to look inside the absolute value to see if $\bar{x} > x_{accepted}$ or $\bar{x} < x_{accepted}$, and try to better determine the nature of the inaccuracy.
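As a quick sketch of that decision in code (the function and the numbers are invented for illustration):

```python
def accuracy_check(x_bar: float, s_x_bar: float, x_accepted: float) -> str:
    """Compare the accuracy deviation to the precision swing."""
    deviation = abs(x_accepted - x_bar)
    if deviation <= s_x_bar:
        return "case one: deviation within precision; systematics look negligible"
    return "case two: deviation exceeds precision; hunt for systematic errors"

# Hypothetical g measurement in cm/s^2 against an accepted 980.7
print(accuracy_check(x_bar=975.2, s_x_bar=1.6, x_accepted=980.7))  # case two
```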
Remember, these are still tests based on estimations. As you might have noticed in the rounding
rules concerning standard errors, there will always be significant uncertainty in them based on the
number of actual measurements we can make in the laboratory time allotted to us. Furthermore,
you should often question the accepted value itself. Often a physical quantity is determined
theoretically using approximations along the way, like assuming an ideal gas, while other times
many accepted values are dependent on environmental conditions, like temperature and
pressure, and your experimental conditions might not actually match those. In other words, you
might actually be more accurate than the accepted value!
Remember though, this is assuming we can do such tests of accuracy at all. Sometimes we have no
accepted, true, values to compare to. And this is partially why scientists sometimes use a particular
measurement technique on something well known first as a way to weed out any potential systematic
errors. For example, if a scientist is trying to use some technique to determine some property of a new
substance, an optical crystal perhaps, then they will first use their technique on something familiar,
water perhaps, in order to make sure their technique is producing accurate results.
Now consider a systematic error that might affect all the data but not in the same way. Air resistance is a frequent offender here, as it gets worse as something moves faster and faster (imagine sticking your hand out of the window of a car at various speeds). Stray vector fields can cause such errors, such as the earth's magnetic field interacting with an electric current that is increasing or decreasing at each step. Perhaps you are performing some comparatively high precision length or voltage measurements on a piece of wire, and the temperature of the room is creeping in one direction or the other. Perhaps as more and more people enter the lab and turn on more and more electronic equipment, the temperature is steadily rising. Or perhaps there's an adjacent experiment where people are using liquid nitrogen and the temperature is getting progressively colder. In either case, such changes in thermal expansion and in resistivity might indeed be producing a systematic error in the data.

Notice how both the equation and R2 are affected by this type of systematic error. And R2 is now implying higher precision after this systematic error was added!
A systematic error might be affecting the data neither by the same amount nor in the same direction. Consider a poorly skilled experimenter not consciously trying to keep his or her line of sight level when reading volumes from a cylinder that is having more and more liquid poured into it. Depending on what level the experimenter's eyes are at, this parallax error can start off shifting data in one direction, steadily get better, and then start shifting the data in the other direction!
As you might have noticed, spotting and properly accounting for systematic errors can get quite difficult, especially when tests of accuracy are not possible. The ways to account for them can vary too. For example, if you spot a forgotten subtraction that shifted all your data, then you simply fix this. If you suspect air resistance, a stray vector field, or heat gain is affecting your results, you try to modify the experiment to account for these (as we actually do in the inclined plane experiment in 181, the current balance experiment in 182, and when using dual cups of liquid nitrogen in 182). Or you might be able to update your analysis by including mathematical terms to properly factor in these effects, though that usually involves differential equations.
In general, you should always properly label any systematic error you believe has affected your data, meaning its impact is non-negligible compared to your precision. Exposures of systematic errors have led to some historic discoveries! Rutherford's famous gold foil experiment, the development of Einstein's theory of relativity, and the discovery of cosmic rays are three instances where mysterious and inexplicable results led to profound insight into physical theory itself!
Alas, not all systematic errors get to enjoy a life of mystery and eminence. It could be just a mistake. It happens. You should always inspect your spread of data; if a datum appears unreasonably far from the rest of the data, it could be just a mistake. It should then be labeled as such and not included in subsequent analyses. (Meaning, if you are using Excel, cut-and-paste it out of the columns you are doing statistical analyses on, and label it as a suspect datum.) Sometimes it can be fixed due to there being no ambiguity in how the mistake happened. An example of this (and a frustratingly frequent example for this dyslexic author) is simply discovering that a number has been recorded backwards. That might show up like this:
Or a point might have been skipped during the experiment causing the subsequent measurements to be
recorded in the wrong places.
Although they will not be discussed here, there are more formal means of deciding whether a datum is just a trivial goof, based on how many standard errors away the datum is. For such trivial systematic errors, use your judgment when deciding whether or not a datum is an unambiguous goof that can just be changed. There are other schools of thought that say nothing should be considered a goof. Always err on the side of caution in such decisions.
The way to judge relevance and impact of systematic errors is closely linked to your level of precision.
Did you notice how I switched to Vee vs. You in that last example? I needed to increase that running
example’s precision in order to demonstrate that last systematic error. This is the same error applied to
the first set.
As we saw a few examples ago, this systematic error of skipping a point and entering the subsequent data in the wrong spots actually improves R2 from 0.9793 to 0.9942! Welcome to experimental science! The imprecision here is bad enough to actually obscure a systematic error, and even benefit from it!
*****
As a matter of fact, the existence and impact of systematic errors are intimately linked with precision; they almost act as nature's "checks-and-balances" in experimental science. Let us discuss this in a context you will see over and over again: performing a two-dimensional, linear experiment.

In the second 181 experiment, where you will find the density of water, you carefully set a value of mass that is independent of the amount of water you will add afterward. Then the volume you add depends on where you had set the mass. It is important to fully take advantage of this extra control in precision when setting such independent variables. Remember that the two-dimensional statistical analysis that is normally performed by pre-programmed machines like your calculator or Excel automatically treats the precision error in the independent variables as being zero (negligible and safely ignorable) compared to the precision error in measuring the dependent variable. So do not make liars out of them!
As we discussed earlier, this is analogous to using a ruler to measure the length of some object.
Spending time to carefully align one end to an edge of one of the nicely calibrated, clear lines, usually
the zero line, followed by measuring whatever interval in-between lines the other end happens to fall in
allows us to consider the precision error in the first measurement to be zero—negligible and safely
ignorable—compared to the second measurement’s precision error. Yet if you decided to randomly
align the object just anywhere on the ruler, resulting in the same type of interval measurement for both
ends, then you need to consider a propagated subtraction error.
As a matter of fact, this inherently higher precision in using an edge over an interval is why independent
variables are often written as whole numbers, as would be the 0 in the subtraction with the ruler.
In other words, if this subtraction were written out, it would probably be written out as L = 2.47 - 0 cm. Now this can be confusing for some, as we would tend to write 0 and not 0.00. But there is a huge difference between these two measurements. The reason you know the 2.47 cm measurement to the second decimal place is because that is where the uncertainty begins, in an interval. However, we set the 0 measurement (ideally) by carefully aligning one end to the edge of the clear, nicely calibrated line. Notice this is different from trying to align to the center of a line, which is trying to guess the middle of yet another, albeit smaller, interval. So we are using an edge of a line! We can claim that to be 0.000 cm! Perhaps even 0.0000 cm!
Well, if we are really careful, what about 0.00000 cm?
Nooooope!!! *CRASH!!!*
If you try to increase precision more and more, you will start crashing into systematic errors that were
once considered negligible and safely ignorable. In this example, if you really want to claim such high
precision as four or five decimal places with a normal ruler, then the temperature of the room will start
becoming relevant. Nearly all scientific equipment is calibrated at 20 degrees Celsius, and it is very
unlikely that the laboratory you are sharing with other students will be just that temperature. And even supposing it is, the non-random issues inherent in thermal expansion will give your increasing precision another wall to crash into.
This checks-and-balances of random errors versus systematic errors comes up everywhere. Here is a more specific example based on one of the experiments you will perform in 181. In the Kinematics of Free Fall experiment you will be determining the acceleration of gravity, $g$. There is a systematic error of air resistance that will be negligible for some and not for others, depending on how precisely measurements are made. Some experimenters will (unfortunately) be so imprecise in finding $\bar{g}$ that their range of reliability, $S_{\bar{g}}$, will swing so far as to make whatever impact air resistance had on their results negligible. And it is even possible that $S_{\bar{g}} \geq \left|g_{accepted} - \bar{g}\right|$ for them, which is why tests of accuracy can sometimes fool experimenters into thinking they did a "good job" when, in fact, their lack of precision was so overwhelming that it obscured insight into a deeper understanding of the physical situation. This is not to say that accuracy is not important, but it can be important in other ways than just seeing "good" percentages. The more precise experimenters will first find their $S_{\bar{g}} < \left|g_{accepted} - \bar{g}\right|$. Then this poor accuracy will cause them to hunt for systematic errors. Then (ideally of course) they will start looking inside the absolute values and realize that $\bar{g} < g_{accepted}$, and they will eventually realize that
their level of precision was high enough to expose that the systematic error of air resistance had a non-negligible impact on their data, and unless it is accounted for in some way, they will never truly be able to get closer to the theoretical true value for $g$ no matter how many similar measurements are made. Such improved precision grants them deeper insight into the physics that is acting as a systematic error.
Taking this experiment beyond our humble laboratory: the more you improve your precision, the more those systematic errors that were at one point negligible compared to your precision will start waving at you. So if you do manage to account for air resistance and then further increase your precision, the systematic error of performing such an experiment on an accelerating frame of reference will start to wave at you. And if you account for that and further increase your precision, the ocean's tides will start waving at you. And eventually, if you try to improve your precision enough, your quantum mechanics textbook will start waving at you. Science is not an exact science.
Comments on Language
Systematic errors versus non-random errors
This category called systematic errors is difficult to properly label; "non-random errors" would be a better term. However, in the various textbooks on error analysis, this category is usually called "systematic errors," or perhaps broken up into "systematic errors" and "mistakes." Since this document is intended to be a launching point to undergraduate error analysis, eventually the developing scientist will need to reference a proper textbook treatment of this material. So I had to decide which term would be the least confusing overall: do I continue to use the term "systematic" to describe errors that really are not that systematic but make using proper texts potentially less confusing, or do I use the better term "non-random" when describing this group but potentially increase the confusion when trying to reference proper texts? Well, you already know the way I went. I did try to italicize the word systematic as a less distracting way of saying "systematic" errors, but I didn't italicize it in the example where the error was legitimately systematic. The bottom line is you should consider the term "systematic" a loose description, while the term "random" in random errors is a firm description.
What also makes this category very slippery is when we start getting nitpicky about just what does and does not belong in each category. Like the issue of a mistake, for "we all make mistakes," right? Like writing a number backwards? Should not that be a random error then? And as a matter of fact, if I keep making more and more measurements, as I keep updating my state of knowledge about both the value I'm trying to determine and its standard error, eventually a mistake that had a large impact on the data will start to fade away and become irrelevant. So should not these kinds of mistakes fall into the random error category?
Ah, but what if a mistake is made right at the beginning of an experiment, like some knob is bumped affecting all subsequent data, or the coefficient of the factored-in air resistance term in your analysis has accidentally been entered as zero? Then this is a mistake that would not be fixed by taking more and more measurements like the previous mistake was! So these types of mistakes clearly fall in the systematic error category!
The truth is that any categorization of what we are trying to categorize is error prone itself. Remember
just what we are trying to group into categories here. Error itself! That is quite abstract when you get
right down to it, like trying to categorize love!
Indeed the point is systematic errors need to be handled on a case-by-case basis. If an error seems to
be appropriately random, then it should be left alone to allow its proper influence in the standard error.
stdev.s() versus stdev.p()
When discussing the Excel code for entering in the 1D standard error,

$$S_x = \sqrt{\frac{\sum_i \left(x_i - \bar{x}\right)^2}{N-1}}$$

we had to distinguish between stdev.s() and stdev.p(). The former is for a sample of a population and the latter is for an entire population. The only difference is that the population one divides by $N$ instead of $N-1$, which is only really a problem for us when $N$ is small.
This is a tricky topic that depends on just what these deviations are from. If the deviations are from a mean value determined by the data, we have lost a "degree of freedom," as reflected in $N$ becoming one less, $N-1$. However, if you are measuring deviations from a true or accepted value, usually produced independently of any measurements you have made, then you divide by $N$. For example, we know the mean of rolling two six-sided dice is 7. If you roll two dice once as a "measurement" of this true value and get 9, and we are interested in its deviation from the true value of 7, then we use the standard error with 7 in place of the mean and divide by $N$ and not $N-1$. Even before the calculation, we know this must come out to 2. However, if we want to use the standard error we normally use, our average of this one value would be 9, and we would need to divide by $N-1$. Even before the calculation, we know this must be undefined, confirming nicely that we actually have no knowledge of any deviation with only one measurement. Not knowing the true value of what we are trying to measure is the general case in the natural sciences, and even when we have an accepted value, as discussed above, it is still important to keep it out of our standard error calculations in the hopes of being able to catch systematic errors later on by comparing precision to accuracy.
Yet that is still not quite what stdev.p() is, for it uses both the mean and divides by N. This can only be
justified if the mean happens to also be the true value.
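That one-roll dice example is small enough to run; here is a sketch (standard-library Python only):

```python
import math
import statistics

data = [9]        # one roll of two dice, "measuring" the known true mean of 7

# Deviation from the KNOWN true value 7, dividing by N:
print(math.sqrt(sum((x - 7) ** 2 for x in data) / len(data)))   # -> 2.0

# Deviation from the sample mean, dividing by N-1 (stdev.s-style):
try:
    print(statistics.stdev(data))
except statistics.StatisticsError:
    print("undefined with one point")   # no knowledge of spread yet

# stdev.p-style (sample mean, divide by N) gives 0.0 here, which is only
# honest if the mean really were the true value:
print(statistics.pstdev(data))          # -> 0.0
```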
In other sciences, particularly the social sciences, you can feasibly obtain an entire population in some sort of statistical calculation. For example, if you want to see how many people in your Bingo club make under a certain amount of money a year, this is a perfectly obtainable sample size. So when you have polled everyone, your mean is also the true value, and you may use stdev.p(). Yet if you are trying to use your data to discuss all Bingo players in the world, what you have is just a sample of a larger population; therefore, your mean is just your best estimate of the true value. This best estimate is calculated from your data; thus this statistical double-dipping requires you to divide by $N-1$ and use stdev.s().
Anyway, as mentioned before, as is the case in most natural science measurements, we will always just
have a sample of an infinitely large population; stdev.s() is the only option for us. If your calculator gives
you both and you are not sure which is which, the bigger number is what we want (meaning it is being
divided by the smaller N-1 and not N).
Standard Error versus Standard Deviation
As I mentioned before, do you know how many times I have had to correct students for doing 1D statistics on 2D data? Enough to make me scared of making it worse! In other words, you can perform a 1D mean and standard deviation calculation on a column of independent variables or dependent variables that are supposed to be changing. But the 1D standard deviation just does not mean the same thing in the 2D case. As a matter of fact, it really is a totally different mathematical exercise, as it has lost its "standard-ness" of indicating that roughly 7/10ths of your data falls within plus or minus one standard deviation of the mean, etc. And it has also lost its "deviation-ness," as the 1D average in this case is not necessarily a best estimate of anything.
Suppose I have some independent variables: 1, 2, 3, 4, 5. Well, the mean of that is 3, but so what? They are supposed to be changing! What if I then did a sixth point at 100? The mean is massively shifted toward that extreme, but again, so what? The averages that matter here are the slope and intercept, and as long as the dependent variables are acting as we hope, adding that sixth point should only improve our meaningful averages.
But now the other textbooks come into play in how I need to decide what to label terms. Basically, what is called the "standard error" in the 2D case is now what really has the "deviations" with the properties that make us consider them "standard." So that is really what should be called the "standard deviation" in the 2D case, but alas, it never is. And to make this worse, the 1D mean and standard deviation do have some use for us in how R2 is calculated; that is why I reduced its font size earlier! It is just a mathematical convenience now; it is not representing deviations in the "standard" way even though, unfortunately, it is still called the "standard deviation" in the 2D case.
Mean notation.

Sigh, soooooooo why does a 1D average get special notation, like fancy bars over them, $\bar{x}$, and the 2D averages, slope and intercept, do not, just $m$ and $b$? And suppose you use a couple of values that have fancy bars over them to calculate other values, but somehow the bars do not transfer over? For example, in a 181 experiment, you will calculate an initial velocity using two averages and some constants:

$$v_0 = \bar{x}\,\sqrt{\frac{g}{2\bar{y}}}$$

Sooooooo if $v_0$ is essentially a value determined by averages, why is this not labeled as an average too? Suppose you had 10 measurements each of $x$ and $y$, then used them to calculate 10 individual $v_0$'s, and then averaged them; would not then this get the fancy bar over it, $\bar{v}_0$? Ah, but if I first find $\bar{x}$ and then $\bar{y}$ and use them to calculate the exact same $v_0$, this does not get a fancy bar over it?
And then why is the word "regression" used in the 2D case and not in the 1D case, when the 1D case is theoretically just a "regression" too?

Alas, there are no good answers to some of these confusions in notation. The notations in this document will not match other documents on error analysis, and those will not match others still. In a way, that statistics is so incredibly important is almost to its own detriment when it comes to people trying to learn it. Because so many different fields use it, they tend to develop their own notations and labels.
So what is my advice for you?
Think like an artist.
Consider something that is even more ubiquitous on our planet, like the color black. Every culture and
every language has their own name for this color. The Japanese word for it is kuro. The Icelandic word
for it is svart. The Lakota word for it is sapa. But does this mean you cannot be a painter in those
lands? Of course not! When in front of a canvas, it does not matter what labels are on your paints; you
know the right paint to reach for.