The "appendix" is really these grey boxes throughout this document; skip to them if you are already comfortable with this subject. Otherwise, you might want to read around them too; for example, you might want to go through the examples on rounding rules or propagating errors if your math is a bit rusty.

Contents:
Precision and Accuracy
Random Errors
Averages
1D and 2D Averages
Linear correlation coefficient, R²
1D and 2D Standard Errors
Reliability
Formulas for Propagating Errors
Propagated Standard Errors of the 1D and 2D Averages
Theoretical Convergences
Fractional Errors
Rounding Rules
Systematic Errors
Comparing Precision with Accuracy

If you know the Way broadly, you will see it in everything.
-Miyamoto Musashi, Go Rin No Sho

"Appendix" on Measurement and Error Analysis

Introduction

A political scientist walks into a bar. Today is voting day, and a news station hired the scientist to make an early determination as to just who will become the next President of the United States! After polling a patron, the first measurement yields the following result: "Ah yes, after carefully analyzing the candidates, I have decided to vote for Mickey Mouse." After that, the political scientist signals the camera crew, is put on live TV, and announces to the public, "I have measured the voting and determined that Mickey Mouse will be the next president of the United States. I would like to congratulate the Mouse Campaign for providing us all with a shocking upset victory. Thank you."

Feels wrong, doesn't it? Somewhere in your intuition you understand that such a report is totally unreliable. Even if you can't quite put your finger on just why yet, you know the following needs to happen to increase reliability. More than one person needs to be polled, and the best estimate of the actual winner will be an average. And this average's reliability increases as more and more people are polled.

A tourist walks into a bar and asks the bartender, "How long does it take to get to the Tsukiji Fish Market?" The bartender replies, "Well, if you go by car, there are a lot of things that could make the trip faster or slower, such as traffic and red lights, so I would say it will take you about an hour, give or take a half an hour." The bartender pauses for a moment, then continues, "If you're trying to better plan your day, you might want to take the subway. It might take a bit longer, but the trains tend to run on time and not linger at stations; so that should take you about 70 minutes, give or take 10 minutes."

Feels fine, doesn't it? Somewhere in your intuition you understand these two things, even if you might not yet describe them quite like this. The second determination is more precise than the first. And the first answer's lack of precision is related to the increased level of randomness with things like traffic and red lights.

Boozer Biff starts chastising the bartender! "Hey! How can you say 'give or take a half an hour'? There's just soooooo much that can go wrong, you know? Like, you know, like, what if this tourist happens to get a police escort? Or what if there's a horrible accident causing massive detours?
'Give or take a half an hour' seems way too conservative now, doesn't it!" Biff pulls down another swig and continues, "Or, hey, like, wait a second, what if a, like, big-time massive, huge helicopter picks up the car and then drops it off at the Fish Market? Or what if a circus elephant decides to sit on this tourist's car for hours?" Biff nearly drops his mug when saying his next revelations! "Oh, whoa, wait a second. What if, like, you know, like, this tourist's car is, like, that one from that movie and it can travel back in time?! Or, hey, you know we are not alone, right? I've heard of aliens grabbing whole cars from time to time!" Proudly, Biff concludes, "What you really should say is 'give or take infinity,' duuuuuuude!"

Feels silly, doesn't it? Somewhere in your intuition you understand that making a cutoff in judging what to include in a "give or take" estimation does not necessarily make the bartender's estimation unreliable. The bartender tacitly understood that there are just some random events that are too rare to include and still have a useful answer.

The Wandering Wise One approaches the tourist and confidently states, "Get on this bus right outside. Its final stop is the Tsukiji Fish Market, and it will take you about 68.2 minutes, give or take 1.7 minutes, to get there."

Feels weird, doesn't it? Somewhere in your intuition you understand just how unusual giving such a precise answer would be, especially for a bus's final stop! The Wandering Wise One must travel a lot!

At the bus's final stop, the tourist indeed notes that the trip took 67.4 minutes! However, alas, the tourist then discovers the final stop is the airport. The Tsukiji Fish Market is nowhere in sight!

Feels cruel, doesn't it! Somewhere in your intuition you know there are other things that can go wrong that are not part of the randomness in making the journey itself. The advice was very precise, true, but it was not accurate. Apparently, the Wandering Wise One enjoys messing with tourists!

Try to never let go of your intuition. For most students, unfortunately, this is easier said than done. Soon you will be presented with all sorts of mathematics—such as summations, square roots, inequalities, and absolute values—and sometimes students allow themselves to forget just what they represent. What we are attempting here is challenging, true, but it is worthwhile. We are attempting to turn intuition into science.

Measurements

Making measurements and properly interpreting their relationships are essential skills in most academic fields. From physics to psychology, from carpentry to dentistry, from economics to education, all rely on quantifying properties, combining various quantities statistically, and determining precision and accuracy. Measurements fuel experimental science, and measurements provide the means for understanding, exploring, and modifying theoretical science.

A measurement quantifies something—such as length, mass, and time—by comparing it to preset units that are widely accepted by the scientific community—such as meters, kilograms, and seconds. We live in a privileged time where we readily have a large variety of measurement devices with various accepted units carefully marked and calibrated for us—such as rulers, scales, and watches. Every culture has independently developed, and continues to develop, new methods and new units in attempting to quantify various parameters.
The examples are numerous and varied: attending a class at a certain time that runs a certain period of time, buying shoes of a certain size with currency mutually agreed to represent a certain value, buying milk and soda at certain volumes, turning on lights at a certain wattage, using an outlet's electricity set at a certain frequency and stepped down to a certain voltage by nearby transformers set at a certain inductance, judging someone's influence by a certain level of a fiefdom's favor, and assessing someone's popularity by how many Facebook friends or Twitter followers they have! Measurements and units are everywhere!

Measurements can be nicely discrete, such as the flip of a coin or the roll of a die. Generally, though, measurements fall into a continuum of values. No matter how precise your ruler is, whether the smallest divisions are meters, centimeters, millimeters, and so on, there will theoretically always be an infinite number of possible measurements between any two divisions on a ruler. There will always be some point where uncertainty begins. Oftentimes a measurement's uncertainty begins when the precision of the measuring device used reaches its limits. The guess at the measurement's final digit is either made by you in analog measurements, like judging where in-between two marked divisions you believe the measurement lies, or made by a machine in digital measurements.

Measurements themselves can be measured! They are normally judged as having two tendencies: precision and accuracy. Understanding, quantifying, and comparing these two tendencies are the main goals of error analysis.

Precision is how deeply, how closely, and how reliably we determine a quantity. Precision is how far or how close subsequent determinations tend to be from each other; it is affected by the randomness of four broad categories: the measurement device, the measurer, the measured, and the measurement's environment. Precision is quantified with the standard error.

Accuracy is determined by seeing just how closely our experimental determination of some quantity compares to what is considered that quantity's accepted, true, value. Since an accepted, true, value might be unknown, testing accuracy will not always be possible. When it is possible, analyzing this tendency can reveal systematic, non-random, effects that might have thrown off our determination and that are not properly accounted for in the standard error.

Our initial focus will be on precision. Consider analog (not digital) measurements, such as using a ruler to measure the length of some widget. What do we know for certain? Well, we know the length is after the 2 cm division, so L = 2 cm so far. Then we can see it falls after the 4th marked division, L = 2.4 cm so far. Now is where the uncertainty begins; we must guess as best we can as to where in-between the divisions the length lies. L = 2.46 cm? L = 2.47 cm? L = 2.48 cm? With what we can see, reporting more precision by carrying any more digits beyond this guess is misleading. In other words, without some way to improve our precision, like using a Vernier caliper and/or a magnifying glass, there is no way we could legitimately claim something like L = 2.472 cm. So, realizing this, if the best our eyes can estimate is L = 2.47 cm, could we tell the difference between 2.4699 and 2.4701 cm? 2.4698 and 2.4702 cm? 2.4697 and 2.4703 cm? Such randomness is beyond our ability to detect. How about 2.469 or 2.471 cm? 2.468 or 2.472 cm? 2.467 or 2.473 cm?
Again, even this level of randomness is undetectable unless we find some way to improve our precision. How about 2.46 or 2.48 cm? 2.45 or 2.49 cm? 2.44 or 2.50 cm? Ah, now we are in the range where we would hope to be able to detect such randomness. We can see how the limitations of a measurement device can affect precision; yet for a measurement like this, precision is also personal. For example, can we reliably resolve differences at the level of 2.47 ± 0.02 cm? Some can and some cannot. The key to reliability is in repeatability. The ideal way to quantify your precision will be based on how your measurements tend to deviate from each other.

There has also been an unspoken assumption about this measurement. The reality of such a measurement is that there are really two measurements, for we must properly zero the ruler first. Consider the situation where the ruler's zero line is not properly lined up with the widget's edge. This is very undesirable for a variety of reasons. For the moment, consider what the previous conversation would be like if we did not catch that this was not properly zero'ed. Suppose we read this as L = 2.52 cm; then our discussion of randomness would be the same but shifted by this kind of nonrandom error. That last range of potential randomness would look like this: "How about 2.51 or 2.53 cm? 2.50 or 2.54 cm? 2.49 or 2.55 cm?... For example, can we reliably resolve differences at the level of 2.52 ± 0.02 cm?" In just our discussion of trying to quantify our potential randomness, we would not catch this error. If we somehow knew the accepted, true, value for the length of this widget, perhaps from the box the widget came in, then we might catch this shift in comparing what we measured to what is accepted—a test of accuracy.

Let us now shine some formality on what we have thus far discussed loosely.

Errors

Broadly speaking, errors fall into two categories: random and systematic. We will thoroughly discuss random errors first, along with how to treat them with statistics. This will provide a valuable context when discussing systematic errors.

Random Errors

If a measurement is only affected by random errors, it will just as likely be too big as too small around the theoretical accepted, true, value of what is being measured. Mathematically, for true value $\mu$, each measurement looks like $x = \mu + \epsilon$, with the random error $\epsilon$ having an equal chance of being positive or negative. How far the measurement result deviates—the size of a particular instance of random error—is also random. The standard error will be an attempt to quantify the typical, average, size of these deviations and will be how we represent precision. Random errors influence all real measurements due to the measurement device, the measurer, the measured, and the measurement's environment. The impact of random errors can be reduced with the averaging process.

Returning to our discussion of trying to include some range of reliability when finding the length of some widget, what might make a claim of L = 2.47 ± 0.02 cm more or less realistic? Let us consider altering a variety of conditions:

Consider the measurement device. Suppose the device used to measure the widget's length is now some fancy laser/mirror system used by top-notch scientists. On the other hand, consider the measurement being performed with a partially used crayon generously provided by a wandering six-year-old child.

Consider the measurer. Perhaps he or she spends all day carefully setting up the measurement and performs it with conscientiousness, mindfulness, and extraordinary technique.
On the other hand, consider the measurer wandering out of a bar and then attempting to quickly measure the widget's length right before passing out on the sidewalk.

Consider the measured. Perhaps the widget is made of sturdy metal carefully manufactured to be consistently straight throughout. On the other hand, consider the widget to be made of a gelatinous substance that awkwardly wiggles and curves when handled.

Consider the measurement's environment. Perhaps the widget is measured in a peaceful, temperature- and humidity-controlled laboratory and is rigidly clamped to an air-table in order to minimize inadvertent jostling and vibrations during the measurement. On the other hand, consider the measurement occurring on a roller coaster during a series of jarring loop-the-loops.

Depending on the influences of random error, a claim of L = 2.47 ± 0.02 cm might be reasonable. On the other hand, it might be unreasonable.

Random error can never be fully removed from any real measurement. No measurement device has infinite precision. No measurer has perfect use of the five senses. No measured physical quantity can ever escape the random nature of matter. No measurement's environment can ever be truly removed from the random nature of the universe. Yet this very randomness is vital for scientific development and exploration, for its very nature allows the use of statistical techniques to understand, quantify, and, most importantly, reduce its influence. The importance of this can never be overstated.

Statistics

For most of the upcoming statistical operations, you only need to know how to get a computer program to perform the calculations for you, and the Excel commands will be presented and highlighted for your convenience. But you must always understand what the results represent.

Averages are everywhere in our culture. Your grade for this course will be an average! In the election example, we intuitively know the Mickey Mouse result is an outlying datum that can be properly identified as such when more data is gathered. And even when you get some of the more expected measurements, like for Obama or Romney, you might get data clusters which disproportionately favor one candidate as you take more measurements. The hope is that this will balance out as you tend to get data clusters disproportionately favoring the other candidate. The averaging process allows these clusters to cancel some of each other out, in the hope that such cancelations produce a better estimate of the true result of who will win the election.

We will deal with what are called one-dimensional averages, where multiple determinations of the same quantity are considered. We will also deal with two-dimensional, linear averages, where two quantities' relationship will be determined by considering how the change in one affects the other. Both these averages stem from the same theoretical justifications, and they both share the same properties.

So suppose you are trying to determine the true value of some physical quantity, like the acceleration due to gravity, the speed of sound in air, the resistance of a circuit element, or the magnetic permeability of free space. If we only consider random error, due to the nature of random error, any determination you make will be higher or lower than what you are trying to determine. Suppose you had one that is higher and one that is lower, add them together, and divide by two.
Some of the "too bigness" of one will cancel out some of the "too smallness" of the other, and your result will be a little closer to what you are trying to determine. However, we cannot assume any given pair of measurements has one too big and the other too small; a pair with both too big or both too small is just as likely as a pair with one of each. It is like flipping a coin twice: you are just as likely to get two heads or two tails as you are to get one head and one tail. However, if you keep flipping more and more—as you keep taking more and more measurements—it will start becoming more and more likely that you are getting closer to a 50-50 split. Making more and more determinations will improve an average's approximation to whatever it is trying to approximate.

The other issue of randomness in a general continuous measurement is how far each determination tends to deviate from what you are actually trying to determine. This is intimately linked to precision. This too will have its own theoretical size, which we will try to approximate later on with another average called the standard error. For the moment, we need to understand that as more or fewer random errors impact our measurements (the less or more precision), more or fewer measurements are needed to earn the same reliability in the approximation. In other words, suppose there is a certain level of trust in an average: the farther the typical deviations can swing, the greater the potential for "too bigness" and "too smallness," and therefore the greater the need for more measurements to achieve the same level of canceling and earn the same level of trust.

An average reduces the influence of random error. If systematic errors are reduced to negligible levels, an average becomes your best estimate of the theoretical true value you seek. And as long as systematic errors continue to be negligible, this estimate approximates this theoretical true value better, more reliably, as the number of measurements increases.

Formalizing our discussion, here is a one-dimensional average:

[Grey box]
If there is a set of $N$ independent determinations, $x_1, x_2, \ldots, x_N$, of some quantity $x$, then their average is
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
Excel Command: =average(values)
[End box]

Here are the two linear, two-dimensional averages:

[Grey box]
If there is a set of $N$ independent determinations of $(x, y)$ pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, that are suspected of having a linear relationship $y = mx + b$ with slope $m$ and intercept $b$, then the average values for this slope and this intercept are
$m = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left[\sum x_i\right]^2}$
$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{N\sum x_i^2 - \left[\sum x_i\right]^2}$
Excel Commands: =slope(y-values, x-values) and =intercept(y-values, x-values)
[End box]

These averages are determined in a process often called linear regression.

Alas, it just happened, didn't it? You were probably fine with the 1D average, but now, "What the hell are those?!? Those are averages?!? (Is he crazy?)" Yes (and yes!). Remember, in practice you normally calculate none of these averages "the long way"; you get a computer program, like Excel, to do them for you. However, you must know and understand that the results are averages and behave like averages.
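If you ever want to double-check what Excel hands back, here is a minimal sketch of these three averages in Python (nothing this course requires); the data values are made up purely for illustration, and the slope and intercept lines are just the grey-box formulas typed out:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # independent values (illustrative)
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # dependent values (illustrative)

    # 1D average, same as Excel's =average(values)
    x_bar = np.mean(x)

    # 2D averages from linear regression, same as =slope() and =intercept()
    N = len(x)
    m = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / (N * np.sum(x**2) - np.sum(x)**2)
    b = (np.sum(y) - m * np.sum(x)) / N    # algebraically equal to the intercept formula

    print(x_bar, m, b)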
Ah, but understanding these 2D averages is trickier than the 1D case. Consider this statement in that definition: "that are suspected of having a linear relationship." What if $x$ and $y$ are actually not related? Meaning that as you change $x$, $y$ is not affected at all by this change. Ah, but you still might be determining different values for $y$ and just not realizing these differences are due only to the random errors you would normally face if all you did was try to measure $y$ over and over again without ever considering $x$ at all! In other words, since these are averages, the 2D case must reduce to the 1D case. The true value for the slope must be zero, and the true value for the intercept must be the 1D average of $y$.

Realistically, however, if you just enter these pairs into a computer program, it will do what it's told and spit back the calculated averages for you. The problem is that just having those averages might not be enough to judge the actual relationship. For example, is the slope just really small, or are these two variables actually independent of each other? Furthermore, and what more frequently happens in science, the variables might indeed be related to each other, but not linearly. For example, what if $y$ is really dependent on the logarithm of $x$, or the inverse square of $x$? You will still get a set of $(x, y)$ data that you can perform a linear regression on! And worse, you might get a "slope" and "intercept" that appear legitimate to you! Graphing the data can certainly help here, but we need a better judge of relationships beyond our ability to make and read graphs. We need some more help in interpreting these 2D averages from linear regression!

Our help comes in the form of R².

[Grey box]
The linear correlation coefficient R² is a number between zero and one that helps us interpret the relationship between two quantities.
R² tending toward 1 implies linear dependence.
R² tending toward 0 implies independence.
R² tending toward some number in-between 0 and 1 implies dependence, but not necessarily linear.
How reliably we know where R² is heading depends on the influence of random error; therefore, this reliability improves as the number of data points increases. Another consequence of this is that R² also implies precision; even if you have a linear relationship, R² can still drift away from 1 depending on, and implying, the precision of your data.
Excel Command: =RSQ(y-values, x-values)
[End box]

$R^2 = \frac{\left(\sum (x_i - \bar{x})(y_i - \bar{y})\right)^2}{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}$

Yup, I deliberately didn't include the calculation of R² in the grey box and even made the font very small. Believe it or not, I am trying to deter the less careful students from bothering to read it. I still need this calculation somewhere for, you know, "the sake of completeness," but there's something dangerous in it that I'm fearful of displaying more openly. R² happens to use the mathematical process that also produces 1D averages. This should be considered just a mathematical convenience here, though, for in general it does not make sense to perform 1D statistics on 2D data that is supposed to be different. Still, though, so what's the big deal here? Listen! Do you know how many classmates and students I've seen mindlessly performing 1D statistics on 2D columns of data? You'd try to hide it too! (If you are curious: if we considered R and not R², a positive R implies a positive slope and a negative R implies a negative slope… but you will see that for yourself when you find the slope!)
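Continuing the Python sketch from above, with the same illustrative x and y arrays, R² is one line; np.corrcoef gives R itself, whose sign tells you the sign of the slope:

    # R^2, same as Excel's =RSQ(y-values, x-values)
    R = np.corrcoef(x, y)[0, 1]    # the linear correlation coefficient R
    R_squared = R**2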
Let us illuminate this with an example. Suppose you carefully draw a line with a set length $r_1$. And then you use that line to sweep out a circle and measure its circumference $C_1$. Then you increase $r_1$ to $r_2$, use $r_2$ to sweep out $C_2$, and measure $C_2$. And then you increase again and again to find $N$ $(r, C)$ pairs.

Now, we know the true values that govern this relationship; $C$ is indeed linearly related to $r$: $C = 2\pi r$. Therefore a linear regression should produce a slope that is a best estimate of $2\pi$—meaning that if you wanted a best estimate of $\pi$, just divide the slope by 2. (Seeing yet how incredibly powerful this can be?) The average value for the intercept should be close to zero, and R² should be close to 1, because they are indeed linearly dependent on each other.

Ah, but what if instead we found the area of the region swept out, giving us $N$ $(r, A)$ pairs? And then we performed a linear regression on it? We would get nonsensical results (it is a quadratic relationship, $A = \pi r^2$), and R² will not tend to one. (You will explore this further in 181's first synthesis.) Now if you linearize your data, perhaps using a set of $(r, A/r)$ data instead, then you will be able to use linear regression and, once again, get useful results from the produced averages, like getting $\pi$ from the slope, since $A/r = \pi r$. (I *really* hope the awesome power that such flexibility grants is starting to become apparent! Think about it: you can have two sets of data that are clearly not linearly related, and then you can perform some algebraic tricks and *still* use linear regression to extract useful information! Don't worry, I will say it for you: "that's amazing!")
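Here is a minimal sketch of that linearization trick, with simulated sweep data standing in for real measurements (the radii and noise level are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    r = np.linspace(1.0, 10.0, 20)                    # radii we set carefully
    A = np.pi * r**2 + rng.normal(0.0, 2.0, r.size)   # "measured" areas with random error

    def rsq(x, y):
        R = np.corrcoef(x, y)[0, 1]
        return R**2

    print(rsq(r, A))        # quadratic data: noticeably short of the linearized version
    print(rsq(r, A / r))    # linearized data: much closer to 1, since A/r = pi*r

    # the slope of the linearized data is a best estimate of pi
    m = np.polyfit(r, A / r, 1)[0]
    print(m)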
Standard Errors

Now we come to another set of averages, needed to answer some of our still unanswered questions. Such as: if I produce an average, just how reliable is it? It hopefully cannot be just as reliable as the individual measurements used to produce it, otherwise what is the point in producing an average? Well, even before that, just how might we quantify the precision of what went into the average in the first place? In other words, looking back at that ruler, we certainly want to believe that anyone using such a device can reasonably report measurements to the second decimal place, but we can also imagine conditions where such precision is actually unreasonable. And the word "reliable" keeps getting thrown around, and that too has not been firmly established yet! We will address all these questions in the following sections.

In quantifying experimental precision, remembering that each measurement tends to be randomly distributed around its theoretical true value reveals both the problem and the solution for us. If we look at just one measurement's deviation from what we think is the true value, random error will make such a determination equally likely to be too high or too low around this true value, but not necessarily by the same amount. The sizes of these deviations themselves will also have randomness to them; therefore, considering just one deviation is totally unreliable (let alone knowing what to deviate from if we do not know what the true value is to begin with). Ah, but because these deviations will tend to be too big or too small randomly, we can again use the awesome power of the averaging process on multiple measurements, which allows some of the "too bigness" in the deviations of some to cancel some of the "too smallness" in the deviations of others, helping us expose the more standard size of these deviations. How to get this average, though, is tricky.

Consider a spread of 1D data, $x_1, x_2, \ldots, x_N$, all trying to measure the true value of $x$ and subject to only random error. Suppose we actually know this true value; call it $\mu$. (It is traditional to use Greek letters for true values and Latin letters for the estimates of them.) So the first deviation from this true value is $(x_1 - \mu)$, the second one is $(x_2 - \mu)$, and so on. So we could try to just average those:

$S_x \overset{?}{=} \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)$

Alas, this is problematic. If each measurement is truly an attempt to measure its true value, then each $(x_i - \mu)$ is like an attempt to measure zero! And we already know an average like this will ultimately become the best estimate of what it measures—here, it is trying to find zero. We seek the average size of the deviations, and if the average size were zero, that would be saying that, on average, there is no randomness! Therefore we need to make sure this average does not converge to zero as we make more and more measurements. Making every term positive would do this. We could take the absolute value of each term, and that would work in a meaningful way to quantify precision (called the absolute deviation). However, this is seldom used over the vastly more popular way to make each term positive: squaring!

$S_x^2 \overset{?}{=} \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

Indeed, this is much closer to what we seek. Now, the average of the squared terms will produce a squared result, so we take the square root in order to get back to the same units. And let us give it a proper label, $S_x$, as representing the average deviation of $x$ from its true value, $\mu$:

$S_x \overset{?}{=} \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$

Why have we not let go of the question mark and put a nice box around it? The issue now lies in just what we are deviating from. In practice, we generally do not know the true value of what we are trying to determine; we use averages to produce best estimates for it. In the 1D case, this is $\bar{x}$:

$S_x = \sqrt{\frac{1}{N?}\sum_{i=1}^{N} (x_i - \bar{x})^2}$

Still no pretty box? And now the question mark moved to the $N$? What's the problem now?!?

Alas, this is a very peculiar issue, as it is not one most people need to consider in the averages we typically encounter. Consider this situation for a moment. This is not a direct analogy, and the rigorous proof is beyond our scope, but this will hopefully be enough to see that there is a problem we need to address now. Suppose you are trying to average 5 independent measurements, say 5 rolls of two six-sided dice, which have a well-defined mean of 7. Say you roll 4, 11, 8, 6, 10. So you would just add up the numbers and divide by five; no problems so far. But what if I then add a sixth number that is a copy of the final number, giving you 4, 11, 8, 6, 10, 10. Errrrr, is finding the average the same still? Divide by six this time? Well, what if you just copied the five original numbers and called them five new measurements—divide by ten now? Hopefully these two situations are making you question just what to divide by; all the numbers being independent of each other is important. The change from the true value $\mu$ to the best estimate $\bar{x}$ creates a problem for this average that the previous averages did not face: an issue of independence. You can see this superficially if you try to use the above with just one measurement, which would make $S_x = 0$. Well, that certainly cannot be true! Why make any more measurements if you already have infinite precision?! More specifically, the best estimate $\bar{x}$ came from our data; it depended on the data; this costs us what is called a degree of freedom. Dividing by just N creates a bias that artificially reduces the size of $S_x$. We correct this by reducing N to N−1. Now we are ready for a pretty box!

[Grey box]
If there is a set of $N$ independent determinations, $x_1, x_2, \ldots, x_N$, of $x$, a single determination of $x$'s standard error in one-dimensional statistics is
$S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$
This is often referred to as the standard deviation. This is also a best estimate of its own theoretical true value in quantifying precision, $\sigma_x$.
Excel Command: =stdev(values)
[End box]

Note that in Excel, this is the same as the =stdev.s(values) command. There is also a =stdev.p(values) command that you will not be using in this course. (I will explain why later.)
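The same distinction exists outside of Excel. In numpy, for instance, the ddof argument sets how much N is reduced in the denominator; here is a minimal sketch with made-up measurements:

    import numpy as np

    L = np.array([2.47, 2.45, 2.48, 2.46, 2.47])   # illustrative measurements (cm)

    S_L = np.std(L, ddof=1)    # divides by N-1, like =stdev / =stdev.s
    S_Lp = np.std(L)           # ddof=0 divides by N, like =stdev.p (not for this course)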
For the linear, two-dimensional situation, we generally consider the independent variable's imprecision to be negligible compared to the dependent variable's imprecision. This is analogous to carefully zero'ing one end of a 1D measurement and considering that imprecision negligible compared to the imprecision in determining whatever interval the other end falls in, like when using a ruler. Because you are setting the independent variable, the assumption is that you can carefully set those to clear edges of lines, and that imprecision will be negligible compared to the imprecision of measuring the dependent variables in whatever intervals they end up in. Thus we determine the precision in the dependent variable only.

[Grey box]
If there is a set of $N$ independent determinations of $(x, y)$ pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, that are linearly related with slope $m$ and intercept $b$ produced by linear regression, the best estimate for the standard error in the dependent variable, $S_y$, is
$S_y = \sqrt{\frac{1}{N-2}\sum_{i=1}^{N} \left(y_i - (m x_i + b)\right)^2}$
Excel Command: =steyx(y-values, x-values)
[End box]

(It is dangerous to call this one a "standard deviation," so please don't, even though it is basically the same thing. Yet more I will explain later on!)

Hopefully this N−2 does not surprise you now. Since the best estimate of where we predict each $y_i$ will be is determined with two averages now, $m$ and $b$, and both are determined from the data, not from theoretical true values, we lose two degrees of freedom this time. In form, this is essentially the same thing as the 1D standard error that came before, with the prediction $m x_i + b$ simply playing the role that $\bar{x}$ played.
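For completeness, here is the same quantity as a minimal Python sketch (the slope and intercept lines repeat the earlier regression formulas; note the N−2):

    import numpy as np

    def steyx(x, y):
        # standard error of the dependent variable, like Excel's =steyx(y, x)
        N = len(x)
        m = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / (N * np.sum(x**2) - np.sum(x)**2)
        b = (np.sum(y) - m * np.sum(x)) / N
        residuals = y - (m * x + b)               # deviations from the prediction
        return np.sqrt(np.sum(residuals**2) / (N - 2))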
Now we are ready to discuss just what we mean by "reliable." Up until now this has been a vague notion based mainly on personal judgment, like the bartender's advice from earlier. In our scientific context, we desire a specific, commonly agreed upon notion of just what "reliable" means. This range is conveniently provided by one of the mathematical properties of the standard error. If we consider the true values $\mu$ and $\sigma$, the probability of any measurement falling in the range $\mu \pm \sigma$ is 68%. So any future measurement has a 68% chance of landing anywhere in that range. When we have true values, the range determined by $\mu \pm \sigma$ is what is considered the reliable range.

If you are curious, beyond this it is 95.4% likely that measurements fall in the range $\mu \pm 2\sigma$, 99.7% likely that measurements fall in the range $\mu \pm 3\sigma$, and a steep 99.99% chance that measurements will be within $\mu \pm 4\sigma$. Or, another way to say that last statement: there is a 0.01% chance a future measurement will be outside $\mu \pm 4\sigma$.

What about the difference in wording between saying "68%" and "roughly 7/10"? For this, we need to remember three things:

Firstly, when we calculate an experimental standard error, this is indeed our "best estimate" of its theoretical true value, but it is still just an estimate. Not only that, but standard errors converge much more slowly to their theoretical true values than the means they represent (which will influence our formal rounding rules later on). So in practice, with the number of measurements we can reasonably make in the blocks of lab time allowed to us, we will always have plenty of uncertainty in our standard error estimates. Furthermore, many of our standard errors, including the ones representing averages, will come from the upcoming technique called the propagation of errors. This technique includes yet another layer of estimation in how it is derived; in other words, a propagated error is an estimation of an estimation!

Secondly, sure, we can predict how likely a subsequent measurement is to fall within a certain range. However, if we actually make that subsequent measurement, it technically updates its own standard error, which means that initial prediction is also updated!

Thirdly, realistically a standard error is a fluid quantity. Remember how this all started: we were trying to provide some range of reliability with a reported value, this being how we are quantifying precision. So consider the four broad categories that influence precision: the measurement device, the measurer, the measured, and the measurement's environment. These all can be fluid elements that might change during the course of an actual experiment. Plus or minus a cup of coffee can change some scientists' experimental precision! Weather, mood, alertness, medication, on and on; these all can alter the "true value" a standard error calculation is trying to estimate.

[Grey box]
An experimentally determined value is considered reliable within plus or minus one standard error of that value. This predicts a roughly 7/10 chance that a subsequent determination of the same quantity will fall within this range. Equivalently, this predicts a roughly 7/10 chance that the current determination falls within this range around the best estimate of the theoretical true value of what is being determined. This is how you express the experimental precision of the value: the quantification of the impact of random errors.
[End box]
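If you are curious where those percentages come from, they follow from the Gaussian distribution you will meet in the first synthesis: the chance of landing within k standard errors of the center is erf(k/√2), which a few lines of Python can check:

    import math

    for k in (1, 2, 3, 4):
        # probability of a Gaussian measurement landing within k sigma
        print(k, math.erf(k / math.sqrt(2)))   # 0.683, 0.954, 0.997, 0.9999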
Reading Errors

When you do not have multiple measurements to calculate standard errors, you make a best guess at this range of reliability in a single measurement. This guess is called a reading error.

Suppose you are trying to measure an unknown mass. You put it on a scale and read 137.4 kg. Then you take it off, make sure the scale is properly zero'ed, put it back on the scale, read the mass, and you again read 137.4 kg. Then suppose you do this again and again with the same diligence in maintaining high precision, and, well, you just keep getting the same number over and over again! Well, the average value is clearly 137.4 kg, no problems there, but when you try to calculate its standard error, you get zero! Surely this does not mean that you have infinite precision in weighing this mass; you are not justified in recording any one of those measurements as 137.4000000… kg!

Now consider a situation where only one measurement is possible. Perhaps the experiment itself is dangerous, such as making some measurement while skydiving or in outer space. Perhaps the experiment is expensive, such as needing to use very rare chemicals or having limited funding. In such cases you must make a guess as to the standard error of that measurement. This guess is called a reading error. How "educated" this guess is depends on your judgment when making an analog measurement, or on how much you know about the device being used when making a digital measurement.

Let us revisit the analog measurement of the widget. So remember what we are trying to guess at here. What is our range of reliability? If we were to make a second measurement, it would have a roughly 7/10 chance to fall in what range around what we just measured? Or, equivalently, what range around the theoretical accepted, true, value does our measurement have a roughly 7/10 chance to be within? Also, this is a guess as to where the uncertainty in the actual measurement begins. In other words, if you feel you can legitimately measure a length to 2 decimal places, this implies that the uncertainty is in the second decimal place. Therefore, your standard error should also reveal where this uncertainty starts.

When making such a guess, you must remember what goes into determining the precision of such an analog measurement: the measuring device, the measurer, the measured, and the measurement's environment.

Let's address the personal element first. So I'm looking at that ruler. I have my glasses on. And I believe I can legitimately read this as L = 2.47 cm. I certainly cannot see a third decimal place there. So I'm looking for $S_L = 0.0\_$ cm, and I just need to guess at what should go into that blank spot. Well, I believe $S_L$ should be around 0.03 or 0.04 cm. However, when guessing at a reading error, overestimating is usually safer than underestimating. So, for me and my eyes, I guess $S_L = 0.04$ cm. If I am reporting this length, I would report it as L = 2.47 ± 0.04 cm.

Now if I could get a better look at this, my precision can get better. For example, if I used a magnifying glass, well, now I surely know my measurement more reliably! This increase in reliability should then be reflected in my reading error guess. Perhaps I can now claim $S_L = 0.02$ cm, or even 0.01 cm! If I could magnify this even more, or perhaps I have a great eye for mentally entering in further divisions, could I enter the realm of $S_L = 0.009$ cm? Claim to know L to three decimal places? Again, I personally cannot obtain such precision even with a magnifying glass; others can, though. Precision in science, especially when using an analog device, is a skill and a talent just like precision in any other situation, like sports, taste-testing, and shooting.

Moving on to another aspect of precision, consider a coarser measuring device, one whose smallest divisions are whole centimeters. The uncertainty increases. The best I can do is L = 2.5 ± 0.1 cm. Perhaps the better measurers can go after that second decimal place, but their reading error guess should be relatively high, like 0.09 or 0.08 cm.

The measured should factor into what your reading error guess should be. Is what you are trying to measure stationary? For example, perhaps you are trying to record a starting and stopping time of something moving quite quickly. Is what you are trying to measure in an awkward place? For example, perhaps you are trying to measure the length of a pendulum string attached to a ceiling far above your head.

And finally, the measurement's environment should be considered when trying to guess at your precision. Even if you believe you can normally reliably measure something to two decimal places, this does not mean you can achieve the same precision if you are measuring in a helicopter. Perhaps other experimenters are unknowingly shaking your shared table and oscillating what you are trying to measure? Perhaps a strobe light is flickering in the background?

Digital reading errors differ in the personal judgment area, as we have less insight into just how close the measurement really is. Suppose you have a digital meter reading that length to one decimal place: 2.5 cm. Because the digital meter rounded this for you, you cannot see just how close it is to any divisions. We know—from before—that this was close to 2.47 cm, and seeing that allowed us a visual feel for just how likely the adjacent determinations were. Sure, this could have been 2.46 or 2.48 cm. But you know it is less likely this length is 2.45 or 2.50 cm. Because you cannot see this nature, you must assume that all the numbers that round to 2.5 are equally likely. (You should revisit this section after the first synthesis; we essentially need to assume a uniform distribution because we cannot see the Gaussian nature. In other words, digital reading errors will generally be overestimations when considering only the personal element of precision.)
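(As an aside, in case you meet it in a statistics book: for a reading that only tells you the value lies somewhere in an interval of width $w$, with every spot equally likely, the theoretical standard deviation works out to $w/\sqrt{12} \approx 0.29\,w$, noticeably smaller than the half-width of $w/2$. That is the quantitative sense in which these guesses tend to be overestimates.)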
Here are the three methods for trying to determine a digital reading error guess.

Method One: Look it up in the manual.

Method Two: Try to determine how the device rounds. If it rounds as we normally do, the reading error is one half of the smallest displayed unit. For example, if all values such that 2.45 cm ≤ L < 2.55 cm round to 2.5 cm, then consider $S_L = 0.05$ cm. And then, to keep consistent with reporting our values to where the uncertainty starts, you may report this as 2.50 ± 0.05 cm. But once again, remember how broad a dispersion this is; we are saying this is reliably in the range of 2.45 to 2.55 cm. Looking back at the analog measurement, such a range is probably bigger than what you would have assumed to be the reliable range if you could get a closer look at the (Gaussian) nature of just where the measurement lies.

Or, if it always rounds up or always rounds down, the reading error is the smallest displayed unit. For example, if the device always rounds up, so that all values such that 2.4 cm < L ≤ 2.5 cm display as 2.5 cm, then consider $S_L = 0.1$ cm. So this would be reported as 2.5 ± 0.1 cm. This type of rounding might be awkward for you; it means something like 2.41 gets kicked all the way up to 2.5 and not down to 2.4, but that is indeed how some digital devices round.

One way you can try to determine how a device rounds, sans manual of course, is to change the scale. For example, suppose a meter displays the resistance of a circuit element as 0.99828 kΩ. Then I change to a coarser scale and read 0.9983 kΩ. OK, so far I know it is not rounding down. If I move down in scale once more, I will know how this rounds. If I see 0.998 kΩ, then I know it rounds as we normally do. If I see 0.999 kΩ, then I know it is rounding up. Say I see 0.998 kΩ: there we have it! With only one measurement, I would report my original resistance as R = 0.998280 ± 0.000005 kΩ.

Method Three: Err on the side of overestimation and use the smallest unit displayed. So if you do not have access to the device's manual (or the manual fails to mention it), and you cannot determine how it rounds, just assume it rounds up or down. So that digital length would be reported as 2.5 ± 0.1 cm, and that digital resistance would be reported as R = 0.99828 ± 0.00001 kΩ.

Of course, remember the broad categories that comprise precision: the measurement device, the measurer, the measured, and the measurement's environment. We have discussed the former two with the tacit assumption that the latter two were not a factor, which is usually the case if you can redo a measurement and keep getting the same result over and over again. But suppose you were trying to make an analog measurement of something that is wiggling; you would consider that situation when trying to determine the reading error (only when one measurement is possible, as it would be unlikely to keep getting the same result over and over again in this case). The same goes for digital measurements: if some of the final digits are fluctuating, those are not certain. This happens all the time in electronics, as stray wiggling signals can interfere with your precision. For example, you might be trying to measure a sensitive voltage and start picking up interference from the 60 Hz electrical signal coming from the wall outlets.
Or, when trying to measure photo-intensity, you might pick up noise from other light sources around the room.

Remember, one more time: in every way, making more than one measurement is ideal when possible. You only make reading error guesses when you must. And as Method Three implies, it is generally safer to overestimate reading errors. Yet this can be dangerous too. We are about to discuss propagation of errors, a method of estimating the standard error of a calculated quantity from the standard errors that went into the calculation. But this method tacitly assumes the standard errors are quite small compared to what they are the standard errors of, and this is where overestimations can cause problems.

Indeed, our next topic will finally address some of our long-standing unanswered issues! Sure, it is nice that we are able to quantify the precision of our single measurements. But we want to report averages! Our best estimates! We want to report something like $\bar{x} \pm S_{\bar{x}}$, and not just a single $x \pm S_x$; thus, while $S_x$ is nice, it is not $S_{\bar{x}}$! And the same thing goes for 2D statistics: we assume the independent variable's precision is negligible compared to the dependent variable's precision, $S_y$, which we have already discussed. However, we do a linear regression to find the averages for what determines their relationship, slope and intercept. So while we could just get a slope and an intercept from any two points, this would not take advantage of the random-error-reducing effects the averaging process provides us! So if we get an $m$ and a $b$ from linear regression, what are $S_m$ and $S_b$? Furthermore, what if you want to use those best estimates to calculate something else—then what is the standard error of that something else? Or say we have a 1D average, $\bar{y}$, subtracted from a quantity, $x$, which we only have a reading error guess for; then what is $S_{x-\bar{y}}$? All of those questions will be answered in the next section! Since these are all formulas, we need a means to quantify the standard error of a value calculated from other quantities that each have their own standard errors.

Propagation of Errors

In general, you will want to combine your various measurements and averages mathematically—adding, multiplying, raising to a power, and so on—and you need a way to properly combine the standard errors of these quantities to find the standard error of the final quantity. Alas, the way to properly show this involves multivariable calculus. However, here is a general feel for how the process goes. Did you ever see the first-order approximation to the sine function, $\sin\theta \approx \theta$? This works nicely for small $\theta$ in radians. Consider $\theta = 0.01$ radians; then $\sin\theta = 0.0100000$ to seven decimal places! A great approximation! How about $\theta = 1$ radian? Then $\sin\theta = 0.84$, not nearly as good. How about $\theta = 10$ radians? Then $\sin\theta = -0.54$, a truly horrible approximation. The upcoming error propagation formulas are found with a first-order approximation that assumes your standard errors are small compared to the typical values they represent the errors of. This first-order expansion is then entered into the standard error formula itself. The following results come out of that process.

[Grey box]
If $z = x + y$ or $z = x - y$, then
$S_z = \sqrt{S_x^2 + S_y^2}$
If $z = x^n y^m$ or $z = x^n / y^m$, then
$S_z = |z|\sqrt{\left(n\frac{S_x}{x}\right)^2 + \left(m\frac{S_y}{y}\right)^2}$
[End box]

You can propagate any errors you will find in this course with these two formulas.
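Since the two formulas are just arithmetic, they are easy to turn into a quick calculation; here is a minimal sketch with invented numbers:

    import math

    x, S_x = 4.2, 0.1    # illustrative value and standard error
    y, S_y = 1.7, 0.2

    # addition/subtraction: the errors add in quadrature
    z = x - y
    S_z = math.sqrt(S_x**2 + S_y**2)

    # multiplication/division with powers, here w = x**2 / y:
    # the fractional errors, scaled by their exponents, add in quadrature
    w = x**2 / y
    S_w = abs(w) * math.sqrt((2 * S_x / x)**2 + (S_y / y)**2)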
We can find two immediate corollaries as our first examples.

If $z = x + c$ for a constant $c$ (we know it with certainty, i.e. $S_c = 0$), then
$S_z = \sqrt{S_x^2 + S_c^2} = \sqrt{S_x^2 + 0} = S_x$

Now consider $z = cx$:
$S_z = |z|\sqrt{\left(\frac{S_x}{x}\right)^2 + \left(\frac{S_c}{c}\right)^2} = |cx|\frac{S_x}{|x|} = |c|\,S_x$

Note the interesting distinction between adding (or subtracting) a constant and multiplying (or dividing) by a constant. One has no effect on the standard error, and the other scales it. Also notice the step right before $|c|$ was pulled out: the $c$ was still there, but contained in $z = cx$. In other words, when just using the formula, the last example would normally come out as $S_z = |z|\frac{S_x}{|x|}$.

We will get to propagating through the averaging formulas in a moment, but we can still use the previously discussed cases for two more quick examples.

If $z = 1/x$: the numerator 1 is a constant, its standard error is zero, and the variable $x$ has an exponent of −1. So, applying the second error propagation formula,
$S_z = |z|\sqrt{\left((-1)\frac{S_x}{x}\right)^2} = |z|\frac{S_x}{|x|}$
A nice property of propagating errors is that the squaring part automatically lets you rewrite a negative exponent as being positive. (Analogous to the formula for adding and subtracting being the same, the formula for multiplying and dividing is essentially the same too.)

For another example, consider $z = x - \bar{y}$. Then
$S_z = \sqrt{S_x^2 + S_{\bar{y}}^2}$

Here is an example of finding a propagated error of the kind you will actually use in lab: say a best estimate for $g$ comes from a pendulum's average length and average period,
$g = \bar{L}\left(\frac{2\pi}{\bar{T}}\right)^2 = \frac{4\pi^2\bar{L}}{\bar{T}^2}$
Then
$S_g = g\sqrt{\left(\frac{S_{\bar{L}}}{\bar{L}}\right)^2 + \left(2\frac{S_{\bar{T}}}{\bar{T}}\right)^2}$
Remember, you can write a term for the constants if you like, but since you know them with certainty, their standard error will go to zero. (In other words, that number 2 in that formula is a legitimate 2. We are not entertaining any uncertainty in it, neither 2.000001 nor 1.999999; that there is a bloody 2! This is the same as saying its quantified uncertainty is zero. Same with $\pi$ and any other values we are treating as constants.)

For a more general example, suppose
$z = (x + y)\sqrt{u - v}$
You could grind through everything at once, but a less abstract way to do this is to first simplify by adding in different variables. Let
$p = x + y$ and $q = u - v$
Find $S_p$ and $S_q$ first,
$S_p = \sqrt{S_x^2 + S_y^2}$ and $S_q = \sqrt{S_u^2 + S_v^2}$
Now $z = p\,q^{1/2}$. Find $S_z$:
$S_z = |z|\sqrt{\left(\frac{S_p}{p}\right)^2 + \left(\frac{1}{2}\frac{S_q}{q}\right)^2}$
You can then substitute the $p$'s and $q$'s back for your final result. However, it can be ideal to leave it like this, especially when using MS Excel to propagate errors. You just make cells for $p$ and $S_p$, then $q$ and $S_q$, and then you can more easily click on them when finding $S_z$, rather than trying to enter a longer, and more error-prone, expression all at once.

Let us, at long last, find $S_{\bar{x}}$. The formula for the 1D mean is
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{N}\left(x_1 + x_2 + \cdots + x_N\right)$
Keeping it squared, and using the addition formula as well as the multiplying-by-a-constant corollary found above,
$S_{\bar{x}}^2 = \frac{1}{N^2}\left(S_{x_1}^2 + S_{x_2}^2 + \cdots + S_{x_N}^2\right)$
We already established that our best estimate for the precision error in each measurement is just $S_x$, so
$S_{\bar{x}}^2 = \frac{1}{N^2}\left(N S_x^2\right) = \frac{S_x^2}{N}$

[Grey box]
The estimated quantification of the reliability of a calculated one-dimensional average—the standard error of a 1D mean—is
$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$
Sometimes this is called the standard deviation of the mean. There is no pre-set Excel command to find this. However, if all your data is in column A, then you can use this:
=stdev(A:A)/sqrt(count(A:A))
[End box]

The same can be done with the 2D averages, propagating $S_y$ through the formulas determining the slope and intercept from linear regression.

[Grey box]
The estimated quantifications of the reliability of the two-dimensional averages from linear regression—the standard error of the slope, $S_m$, and the standard error of the intercept, $S_b$—are
$S_m = S_y\sqrt{\frac{N}{N\sum x_i^2 - \left[\sum x_i\right]^2}}$
$S_b = S_y\sqrt{\frac{\sum x_i^2}{N\sum x_i^2 - \left[\sum x_i\right]^2}}$
[End box]

Don't worry, you will be given a very easy way to find these in Excel. Your job is understanding what they represent. However, if you prefer to find them on your own, for your copying and pasting pleasures: if your independent variables are in column A and your dependent variables are in column B, then the two commands are
$S_m$: =STEYX(B:B,A:A)*SQRT(COUNT(A:A)/(COUNT(A:A)*SUMSQ(A:A)-SUM(A:A)^2))
$S_b$: =STEYX(B:B,A:A)*SQRT(SUMSQ(A:A)/(COUNT(A:A)*SUMSQ(A:A)-SUM(A:A)^2))
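And, in the same sketching spirit as before, the Python versions are short as well; this reuses the steyx function from the earlier sketch:

    import numpy as np

    def sem(values):
        # standard error of a 1D mean: stdev / sqrt(N)
        v = np.asarray(values)
        return np.std(v, ddof=1) / np.sqrt(v.size)

    def slope_intercept_errors(x, y):
        # standard errors of the slope and intercept from linear regression
        x, y = np.asarray(x), np.asarray(y)
        N = x.size
        D = N * np.sum(x**2) - np.sum(x)**2    # the recurring denominator
        S_y = steyx(x, y)                      # from the earlier sketch
        return S_y * np.sqrt(N / D), S_y * np.sqrt(np.sum(x**2) / D)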
Now we have our general set of standard errors, along with a means to propagate them into any formulas we will encounter in this course. So let us discuss their expected behavior as the number of measurements gets larger and larger.

Theoretical Convergences

[Grey box]
If we have only random errors, then one-dimensional statistics predicts the following as the number of measurements, N, gets larger and larger, theoretically approaching infinity. If $\bar{x}$ and $S_x$ are the best estimates of their theoretical true values, $\mu$ and $\sigma$, then
$\bar{x} \Rightarrow \mu$ and $S_x \Rightarrow \sigma$ and $S_{\bar{x}} \Rightarrow 0$
[End box]

[Grey box]
If we have only random errors, then two-dimensional linear statistics predicts the following as the number of measurements, N, gets larger and larger, theoretically approaching infinity. If $m$, $b$, and $S_y$ are the best estimates of their theoretical true values of slope, intercept, and standard error of the dependent variable—call them $M$, $B$, and $\sigma_y$—then
$m \Rightarrow M$ and $b \Rightarrow B$ and $S_y \Rightarrow \sigma_y$ and $S_m \Rightarrow 0$ and $S_b \Rightarrow 0$
[End box]

Most students are OK with the averages of the values themselves approaching their theoretical true values as the number of measurements gets higher and higher. Yet some students tend to have a hard time connecting this with the equivalent statement of the average's uncertainty getting lower and lower. As a matter of fact, the boxes above are redundant! $\bar{x} \Rightarrow \mu$ automatically implies $S_{\bar{x}} \Rightarrow 0$, and vice versa. Furthermore, you must appreciate how powerful such a statement is, for it concisely displays the awesome power of the averaging process: it reduces the impact of random errors. This is also concisely affirming, in mathematical terms, the uncomfortable vibe you hopefully got when reading that a political scientist claims he knows for certain that Mickey Mouse will be the next president after only one measurement. Knowing such a thing with any certainty can only be done with an average, and the certainty of this average improves as the number of measurements increases.

Another common issue is believing the 1D $S_x$ (or its 2D analog, $S_y$) also approaches zero, due to seeing that N−1 (or N−2) in the denominator,
$S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$
and thinking, "Ah! In the limit as N goes to infinity, this must go to zero!" But $S_x$ converging to zero does not make sense on many levels. Mathematically, yes indeed, the denominator is of order N, but the numerator is a sum of N positive values; therefore, the numerator is also of order N. (Well, everything is wrapped in a square root since we are talking about standard errors here, but you hopefully get the point.) You can see this visually after learning histograms in the first synthesis. As N increases, you can see that the histograms are not becoming thinner and taller; they are converging to a definite size and shape. And since $S_x$ represents the width of such a histogram, this means that $S_x$ is converging to a set, non-zero value. And thinking about this experimentally, $S_x$ represents the precision of your measurements of $x$, a quantity affected by countless factors due to the measuring device, the measurer, the measured, and the measurement environment. Now, you can certainly improve upon $S_x$; indeed, making more measurements can provide valuable experience, allowing a now wiser experimenter to improve upon his or her technique.
And people can improve their precision in other ways too, such as using a magnifying glass, playing calming music, being well rested, firmly clamping certain things down, and so on. But claiming that $S_x$ converges to zero is a very different thing. As you make more measurements, as you keep updating $S_x$, you are starting to know your experimental precision more precisely, but that does not mean you are becoming infinitely precise! Think about that literally. So as you make more and more measurements with the same ruler, it will magically start developing smaller and smaller nicely calibrated divisions? And then your eyeballs must start developing super magnifying vision? And then, as you use the ruler even more, the laboratory and the earth will start to vibrate less? (Believe me, I wish this were true! It would be like developing super powers due to being an extreme nerd! Darth Vector would be a legitimate Sith Lord! Muhahahaha! Alas, all I've gotten after years of physics is myopia.)

Quantifying random error does not reduce the impact of random error on your data. Averaging reduces the impact of random error. That is why the averages' standard errors do indeed theoretically converge to zero as you make more and more measurements. This is fundamentally why averaging is a monumental process for science and humanity. Its importance can never be overstated.

Averages are everywhere. As mentioned before, even your lab grade will be an average! Your grade will not be determined from just one single measurement like one lab report grade or one pre-lab score. Several measurements are made in various ways to try to reduce the potential randomness of your performance. Then an average of these measurements becomes the best estimate in quantifying the true value of your knowledge of physics, reporting ability, experimental technique, and ability to work in a laboratory environment.

Now, there is also another reason why even the standard errors of averages will never actually be able to reach the theoretical limit of zero. Notice there needed to be this qualification first: "If we have only random errors" in the above boxes. As we will discuss later, systematic errors will gladly pour some cold water on any attempts to actually reach those theoretical limits. This is why the only time we will truly be able to see this behavior is if we can simulate an environment with zero systematic errors, like in the first synthesis.

Alright, more examples of propagated errors! Suppose $\bar{a} = 2.30$ g with $S_{\bar{a}} = 0.02$ g, and $b = 32.2$ g with $S_b = 0.2$ g. What is $z = \bar{a} + b$?
$z = 2.30 + 32.2 = 34.5$ g
$S_z = \sqrt{S_{\bar{a}}^2 + S_b^2} = \sqrt{(0.02)^2 + (0.2)^2} = 0.201\ldots$ g

In general, we only keep one significant figure when rounding a standard error, and this tells us where the uncertainty in the quantity begins. In this case, $S_z = 0.2$ g is telling us that the uncertainty begins in the first decimal place, so that is where we should round $z$. Therefore,
$z = 34.5 \pm 0.2$ g

This is an interesting result: if we had just ignored the imprecision in $\bar{a}$, we would have gotten the same reported result, $z = 34.5 \pm 0.2$ g. This is as if $S_{\bar{a}}$ might have been ignored from the start—considered negligible compared to the precision error in $b$. As you might have noticed, this issue of negligibility comes up quite often in error analysis. You've been doing it for years and didn't know it! Consider when using a ruler: one end is matched up to an edge, usually the zero line's edge, and then the second end is measured in whatever interval it happens to fall in. So when you read the less precise end, say as 2.47 cm, you are ignoring the subtraction that is actually happening, 2.47 − 0 cm, and you also never bothered to propagate the error in that subtraction!
In other words, say you guess the standard error in making one measurement is $S$; then you would report the widget's length with a standard error of just $S$. However, look at the situation again when the widget is aligned somewhere other than an edge. Can you see now, in terms of quantifying precision, just how undesirable this is? This is actually two interval measurements now. Therefore, if you guess the standard error in one measurement is $S$, it must be propagated through the subtraction:

$\sqrt{S^2 + S^2} = S\sqrt{2}$

Thus you would report the widget's length with a standard error of $S\sqrt{2}$.

This same reasoning applies to independent versus dependent variables. It is assumed that you are carefully setting up the independent variable measurements, conscientiously lining up to the edges of lines, and then measuring the dependent variables in whatever intervals they happen to end up in. This is why $S_x$ was considered negligible compared to $S_y$.

Ah, but we need to be careful now! The story of negligibility can get more complex! We must also consider how the error compares to what it is the error of! Take the same $\bar{x} \pm S_{\bar{x}}$ and $b \pm S_b$ from before, but this time consider their being multiplied, $q = \bar{x}\,b$. Then

$S_q = \bar{x}\,b\,\sqrt{\left(\dfrac{S_{\bar{x}}}{\bar{x}}\right)^2 + \left(\dfrac{S_b}{b}\right)^2}$

Did you notice what just happened? The very same $S_{\bar{x}}$ that was negligible in addition with $S_b$ is now NOT negligible in multiplication! Yikes!

Let's back up and examine where this major difference occurred. Inside that square root, the term representing $\bar{x}$'s imprecision is already larger than the term representing $b$'s. The problem sits in those ratios of the deviations over what they deviate from. These are called fractional errors. Your precision can be hurt if what you are measuring is small compared to the standard error of what you are measuring.

Now consider improving the fractional error $S_{\bar{x}}/\bar{x}$ by increasing $\bar{x}$ to 22.30 g while keeping $S_{\bar{x}}$ the same, and worsening the fractional error in $b$ by decreasing $b$ to 3.22 while keeping $S_b$ the same. Now the $b$ imprecision dominates over the $\bar{x}$ imprecision, which was accomplished not by changing either $S_{\bar{x}}$ or $S_b$, but by changing what each is compared to. Now let us double check this by simply assuming the imprecision in $\bar{x}$ is negligible compared to the imprecision in $b$ from the beginning. This assumption means treating $S_{\bar{x}} = 0$, negligible and ignorable, and it produces the same experimental result! Now the imprecision in $\bar{x}$ is negligible compared to the imprecision in $b$ under both addition and multiplication.

Generally, it would not be safe to assume negligibility from the beginning in the previous example, since it was "cutting it pretty close." So keep that in mind when setting up measurements that are going to be assumed negligible, such as properly zero'ing measurements and carefully setting independent variables. (The rest of this discussion needs to wait until after systematic errors are discussed. Look for *****)

Let us further explore fractional error with a visual example. Consider three cases of using the same measuring device to find three distances: first a length, then a width, and finally a thickness. While the error in using the ruler, $S$, theoretically stays the same, the fractional error gets worse as what is being measured becomes more and more comparable to its error.
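To make this negligibility flip concrete, here is a small sketch with hypothetical numbers (they are not the worked values from above; only the quadrature formulas themselves come from this discussion):

import numpy as np

# Hypothetical: a tiny value with a small absolute error but a LARGE
# fractional error, next to a large value with the opposite character.
x_bar, S_x = 0.53, 0.02
b,     S_b = 53.1, 0.40

# Addition/subtraction: absolute errors add in quadrature.
S_sum = np.sqrt(S_x**2 + S_b**2)        # ~0.40, so S_x is negligible here
# (Two equal interval errors S give sqrt(S^2 + S^2) = S*sqrt(2), as with the ruler.)

# Multiplication/division: FRACTIONAL errors add in quadrature.
f_x, f_b = S_x / x_bar, S_b / b         # ~0.038 vs ~0.0075: now x_bar dominates
S_prod = (x_bar * b) * np.sqrt(f_x**2 + f_b**2)

print(f"sum:     {x_bar + b:.1f} +/- {S_sum:.1f}")    # 53.6 +/- 0.4
print(f"product: {x_bar * b:.0f} +/- {S_prod:.0f}")   # 28 +/- 1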
(Here's a tip: take another look at where I first introduced the error propagation formulas; I put in a step just before the square rooting. That is how I actually remember them: the addition/subtraction one is just the errors added in quadrature, and the multiplication/division one is the fractional errors added in quadrature, with some potential exponents thrown in. I hope that helps! Now it's pretty box time!)

Fractional error is a ratio comparing some sort of deviation to what is being deviated from. In terms of precision, this is a standard error compared to the quantity it is the error of,

$\dfrac{S_{\bar{x}}}{\bar{x}}$

In terms of accuracy, this is the deviation of what is found experimentally from what is considered the accepted, true, value, over that accepted, true, value. Sometimes it is desirable to convert this to a percent fractional error (PFE) by multiplying by 100%. This course's general test of accuracy is a PFE called a percent difference,

$\text{percent difference} = \left|\dfrac{x_{\text{experimental}} - x_{\text{accepted}}}{x_{\text{accepted}}}\right| \times 100\%$

Rounding Rules

We are now ready to discuss the formal rounding rules for this course. You always use unrounded numbers in calculations; however, when formally reporting a result, you want to present it as reliably as you know it, along with some indication of how accurate your value is. (You will not always be able to test accuracy.) Alright, so let's say you perform one-dimensional statistics on an ensemble of data. For example, suppose you are trying to find the length of a simple pendulum attached to the ceiling with a 2-meter stick (something you will do in lab, by the way). You do this ten times and find the results in this table:

L (cm): 176.25, 175.62, 175.77, 176.18, 175.88, 175.68, 175.92, 175.94, 175.60, 176.19
N = 10;  $\bar{L}$ = 175.90300 cm;  $S_L$ = 0.24060341 cm;  $S_{\bar{L}}$ = 0.07608548 cm

Suppose all we wanted to do was formally report our best estimate for this length, $\bar{L}$. In the modern day of calculators and computers, we will get back far more digits than are actually significant. So how can we determine where to properly round what we wish to formally report? Ah, you might be thinking about the rules of significant figures. Those are fine as long as you are dealing with single quantities, where we have all tacitly agreed that the final digit is where uncertainty begins. However, that also assumes the overall impact of random error is not being reduced anywhere in the problem. We want to use averages, not just single determinations, and averaging reduces the impact of random errors; therefore, just following the rules of significant figures is not adequate for us. For example, there is no reason to suspect that merely adding and subtracting two or more numbers will produce a sum or difference whose precision has improved over any of the single values that went into determining it, so there the rules of significant figures are fine. The same reasoning applies to multiplying and dividing two or more numbers: why would we think such a result has lessened the impact of random error? So again, the rules of significant figures are fine. But averaging does reduce the impact of random errors. We want this to happen. We want as much experimental reliability as possible in our experimental results. So yes, when a 1D average is taken over an ensemble of single determinations, as with all the measurements of length above, the impact of random error on the 1D average is reduced compared to any single measurement of length. If you recall how we quantify this reliability, with the standard error, then what we just stated in words is the exact same thing as noting mathematically that $S_{\bar{L}} < S_L$. Make sure you understand that this same reasoning applies to our linear 2D statistics too.
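Here is a sketch of that 2D claim, reusing the same slope and intercept standard-error formulas as the Excel commands given earlier (the Python, the line, and the noise level are my own illustration): the more pairs of points you take, the smaller $S_m$ and $S_b$ get.

import numpy as np

rng = np.random.default_rng(3)

def fit_with_errors(x, y):
    N = len(x)
    m, b = np.polyfit(x, y, 1)
    S_y = np.sqrt(np.sum((y - (m * x + b))**2) / (N - 2))   # Excel's STEYX
    D = N * np.sum(x**2) - np.sum(x)**2
    return m, b, S_y * np.sqrt(N / D), S_y * np.sqrt(np.sum(x**2) / D)

for N in (5, 20, 80):
    x = np.linspace(1, 10, N)
    y = 2.0 * x + 3.0 + rng.normal(0, 0.5, N)               # made-up linear data
    m, b, S_m, S_b = fit_with_errors(x, y)
    print(f"N={N:3d}  m={m:.3f}+/-{S_m:.3f}  b={b:.3f}+/-{S_b:.3f}")
# S_m and S_b head toward zero as N grows, just as S_L_bar does in 1D.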
In other words, if you have just one pair of determinations, some $(x_1, y_1)$ and $(x_2, y_2)$, then using the rules of significant figures is fine if you determine the one slope and one intercept from just those two points. However, in the exact same way that having more than one determination of a single value and applying 1D stats to them is ideal, having more than one pair of points and applying 2D stats is ideal. Normally we do not find what would be the standard error of one determination of slope and intercept from a single pair of points, but if we did, the standard errors reflecting the 2D average values of slope and intercept must mathematically be smaller.

OK, let's get into how we justify our rounding rules. In our formally reported result, we only want one uncertain digit: the last one. Let us explore this with the more tangible ruler analogy first. Assuming the ruler is properly zero'ed (meaning that part contributes negligible imprecision), here is how we normally try to measure a length. We can tell that our first digit is 2 cm. So far so good! Next we can see that our widget is indeed past the fourth line in-between 2 and 3 cm, giving us 2.4 cm so far. Now we normally consider the next digit to be where uncertainty begins, because that is where uncertainty begins in our reading! We are out of nice markings and must guess as best we can just where in-between those two lines our widget's length resides. 2.43 cm? 2.44 cm? 2.45 cm? Whatever the final guess is, based on the measurement device alone, the uncertainty begins in the second decimal place.

Ask yourself this question: what if someone took the widget and ruler and then formally reported this single measurement back to you as 2.4472843 cm? Would you believe that person was able to see microscopically? Would you just assume that person must have super powers and accept that all those digits but the very last one are certain? Ah, no, you should not. Whether you realize it or not, you know that person did not properly round! And whether you realize this or not, you are assuming the standard error of a single measurement takes on the form 0.0_ cm, with the first significant figure representing where uncertainty begins. The standard error represents our experimental uncertainty; we let it guide us when deciding where uncertainty begins in the values we wish to formally report.

So getting back to this example: first, we normally only bother keeping one significant figure when reporting a standard error. Therefore, we start by rounding $S_{\bar{L}} = 0.07608548\ \text{cm} \Rightarrow 0.08\ \text{cm}$. Why only one significant figure for a standard error? Here is the easy answer: we are generally not interested in quantifying precision beyond what it reveals about the values whose precision it represents. Meaning, we care more about what $S_{\bar{L}}$ can tell us about $\bar{L}$ than about $S_{\bar{L}}$ itself. However, yes, there is a hard answer too. As sample size increases, standard errors of averages converge much more slowly to their theoretical true values than do the average values they represent. In the physical experiments we will be performing in these labs, we will never come close to the number of measurements needed to legitimately reveal more than one significant figure in a standard error. (You will get to explore this in the first physics 181 synthesis.)
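You can check the table's statistics yourself; here is a quick sketch, with Python standing in for the Excel functions:

import numpy as np

# The ten pendulum-length measurements from the table above (cm).
L = np.array([176.25, 175.62, 175.77, 176.18, 175.88,
              175.68, 175.92, 175.94, 175.60, 176.19])

L_bar   = L.mean()                 # 175.90300 cm
S_L     = L.std(ddof=1)            # 0.24060341 cm (like stdev.s)
S_L_bar = S_L / np.sqrt(len(L))    # 0.07608548 cm (standard error of the average)
print(L_bar, S_L, S_L_bar)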
Alright, so $S_{\bar{L}} = 0.08$ cm tells us that the uncertainty of what it represents, $\bar{L}$, begins in the second decimal place. Therefore we formally round there, giving us

$\bar{L} = 175.90 \pm 0.08$ cm

Before moving on, note something interesting about this particular real-life example. Suppose you made only a single measurement of this pendulum's length with a 2-meter stick, which technically you can read to the second decimal place in centimeters. You probably would have guessed that the uncertainty of a single measurement, which $S_L$ represents, would fall in the second decimal place, just like in the ruler example above. If you made only one measurement and needed to make a "reading error" guess at $S_L$, you would probably say 0.05 cm or something like that. Yet that would actually have been a gross underestimation. In this real-life example, the statistical best estimate of where uncertainty begins in a single measurement of $L$ calculated out to be $S_L = 0.2$ cm, implying uncertainty beginning in the first decimal place, not the second. You can actually see this just by examining the data of this particular example. Look at each measurement from left to right: the leading 1's and 7's never change, but after that the digits jump around quite a bit (the readings even straddle 176), implying that is where uncertainty begins. In other words, this example is to remind you that random errors come from more than just the measurement device itself. There is individual skill in measuring, just as in sports and shooting: how good is the measurer at judging this uncertainty? Especially in this example of measuring something hanging from the ceiling above you, how much is the thing being measured contributing to the random error; is it stable or awkward? How is the measurement environment contributing to the potential random errors; are there distractions or strobe lights going off in the laboratory? Sometimes you must try to guess at the standard error of a single measurement, but it is always preferable and safer to take more measurements and use statistics.

So when developing a set of rounding rules, again, we want only one uncertain digit in your final reported value; the standard error tells you where this uncertainty begins, and therefore where to round. Generally, you keep only one significant figure for your standard error. But an exception is allowed when that significant figure rounds to a 1. Consider just how close an error of 0.013 is to 0.009; so we allow one extra significant digit in rounding the quantity whose precision it represents, along with keeping two significant figures for the standard error itself. This is not to say that the true value for an error cannot legitimately have a leading digit of 1 that truly marks where the uncertainty begins; for example, the standard error of a vernier caliper is often considered a legitimate 0.01 cm when the hundredth place truly is as far as the measurer can go (without a magnifying glass).

We are usually less interested in a percent difference beyond what it reveals about our experimental value's accuracy, so keeping two significant figures for percent differences is good enough for us to gauge how accurate our values are.

Rounding Rules when formally presenting a result

Keep one significant figure in your result's standard error, and that tells you where to round the result.
An exception can be made if the first digit in the standard error rounds to a one; then you can keep a second significant figure in the standard error and round the result to where that second digit is.

If your standard error happens to be larger than the value it represents, then that value is totally unreliable. Just keep one significant figure for both the value and its standard error, and put a question mark by your value.

If you can do a test of accuracy with a percent difference, just keep 2 significant figures.

Remember, this is just how to present your experimental finding. You always use unrounded numbers in calculations, which is easy when using Excel because you can just select the cell with the unrounded number in it.

The exception (when the leading digit of the standard error rounds to one, we allow an extra digit in the reported value) requires some more discussion just to be clear. For contrast, let's consider a unit-less example without this exception. Suppose your standard error comes out to 0.09, which justifies reporting a final value to the second decimal place. Yet what if your calculated standard error just happened to be 0.10? "Blasted! I am so close to getting that extra significant figure! Come on, it's like 10 cents versus 9 cents! Can't I just have that extra place, please? It's not like 80 cents versus 9 cents, or even 20 cents versus 9 cents!" Remember, as a scientist you want high precision; you want high reliability; you want to be able to report your values to as many significant figures as possible. Indeed, we can justify allowing one more significant figure when you are that close. Standard errors are averages over a finite sample set too; standard errors themselves have uncertainties, even though we normally don't calculate them. So we allow this exception: when a leading digit rounds to a one, it's close enough to give you that extra significant figure.

Now this is not to say that standard errors do not theoretically converge to some accepted, true, value themselves. So maybe the proper standard error does actually have a leading digit of one, and that actually is where the uncertainty begins. Perhaps the best example of when this might happen is using a vernier caliper. Some measurers can get three decimal places in centimeters while others can legitimately get only two. So for the former, the standard error could legitimately be 0.009 cm, and for the latter it could legitimately be 0.01 cm. The point is that we almost never take enough measurements to be sure enough, so we allow the exception in normal practice. As a matter of fact, and something you will ideally observe in the first synthesis (hopefully a nice bonus for those actually reading this!), standard errors converge much more slowly than the averages they represent the standard error of. This is also why we normally keep only one significant figure in a standard error: we really don't know that digit all that well anyway, but the point is to help you develop solid experimental habits. In other words, you would typically need far more measurements than are reasonably possible in these labs to justify reporting more than one significant figure on a standard error. In other words, our uncertainty in our uncertainty, discussed a bit more below, is why we normally keep just one significant figure in a standard error and why we allow that exception when keeping two.
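If you like seeing the rules as an algorithm, here is a hypothetical helper that applies both the one-significant-figure rule and the leading-1 exception (the function and its test values are my own illustration, not a course tool):

import math

def round_with_error(value, err):
    # Round a result to where its standard error says uncertainty begins,
    # keeping one digit of the error, or two when its leading digit rounds to 1.
    exponent = math.floor(math.log10(abs(err)))
    lead = round(err / 10**exponent)      # leading digit at one significant figure
    sig = 2 if lead in (1, 10) else 1     # lead == 10 means it rounded up to 1.0
    digits = -exponent + (sig - 1)
    return round(value, digits), round(err, digits)

print(round_with_error(175.90300, 0.07608548))  # (175.9, 0.08): report 175.90 +/- 0.08 cm
print(round_with_error(9.7936, 0.0132))         # (9.794, 0.013): the leading-1 exception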
Examples!

Suppose when determining the acceleration of gravity, you find $g = \ldots \pm \ldots$; you would formally report this as $g = \ldots \pm \ldots$, rounded to where the standard error's single significant figure falls.

Suppose when determining the height of the John Hancock skyscraper, after repeated measurements, you find $\bar{h} = \ldots$ and $S_{\bar{h}} = \ldots$; you would formally report this as $\ldots \pm \ldots$ OR the equivalent in scientific notation.

Suppose you carefully draw a large series of lines of length $d$, draw a circle around each line, and measure its circumference, $C$. You do this for many different sizes, giving you a set of $(d, C)$ points. After performing 2D stats on this ensemble, you find a slope and its standard error, and you would formally report the slope rounded accordingly.

After performing Millikan's oil drop experiment, you find the mass of an electron to be $\ldots \pm \ldots$; you would formally report this as $\ldots \pm \ldots$ OR the equivalent in scientific notation. Ideally you will not spend your life as a scientist rediscovering things that have already been discovered, where checks of accuracy will not be possible.

Suppose you determine the number of moles of an unknown gas in a container as $\ldots \pm \ldots$; you would formally report this as $\ldots \pm \ldots$.

Hopefully the situation of your standard error being much larger than the value it represents will be quite rare. Consider trying to find the density of some metal with a foot and stone; there you would formally report a single-significant-figure value, with its single-significant-figure standard error, and a question mark by it.

Before we move on, we need to address two related issues which might not seem related! Firstly, if something like $\bar{x}$ needs to have its range of reliability, $S_{\bar{x}}$, reported with it, should not then $S_x$, itself an average over deviations, have its own error, $S_{S_x}$, and why are we not finding it? Secondly, what about what The Wandering Wise One said before: "it will take you about 68.2 minutes, give or take 1.7 minutes, to get there." If that uncertainty legitimately started in the ones place, or at least would have rounded to 2, why wasn't the answer rounded to 68?

Firstly, you will not be using this in this course, but there is indeed a $S_{S_x}$,

$S_{S_x} = \dfrac{S_x}{\sqrt{2(N-1)}}$

(for the pendulum data above, that works out to about $0.24/\sqrt{18} \approx 0.06$ cm). And the reason you don't need to know it here is because we almost always do not ask you to report $S_x$. We ask for things like $S_{\bar{x}}$, so $S_x$ is needed to calculate $S_{\bar{x}}$, and then $S_x$'s use for us essentially ends. This is because of our context. We are generally interested in the physical quantities being measured or determined, like the acceleration of gravity or the resistance of a circuit element, and we tend to find best estimates for them with averages, from either 1D statistics or linear 2D statistics. So after finding a best estimate, we stop being as interested in the precision of just one individual determination. However, the "give or take" The Wandering Wise One gave was not $S_{\bar{t}}$; it was $S_t$. In that context, the tourist was in fact interested in making an individual trip. So for the time $t$, if The Wandering Wise One was trying to be thorough (and perhaps delightfully annoying to the tourist), the more formal answer would look like $t = \bar{t} \pm S_t$, with the average itself reported as $\bar{t} \pm S_{\bar{t}}$.

But the really disturbing piece of missing information was not about precision; it was that The Wandering Wise One's directions were not accurate!

Accuracy? Now that is a word we haven't heard in a long time. A long time. Indeed, and this is a very important issue here. Notice that in all of our previous statistical formulas for quantifying deviations, nowhere do the accepted, true, values ever come into play. It is always our best estimates, our averages, which represent them. Well, fair enough. Ideally a real scientist is not going to spend his or her entire career rediscovering things that have already been discovered! However, tests of accuracy will always be important. They are important at the undergraduate level in that it is nice to be able to compare what we discover to what is accepted.
Later on, using what is accepted can help scientists develop new experimental techniques and explore new substances in various ways, with what is already well-known serving as a guide. Furthermore, at all levels, tests of accuracy are vital in helping us explore those errors that push us away from what we are trying to determine but are not properly handled by our statistical treatments.

Systematic Errors

A systematic error is anything that would not be properly reflected in quantifying precision, the influence of random errors. The influence of systematic errors may or may not affect any of the preceding statistics, but if it does, it does not affect the statistical calculations in the way they were designed to handle; if the statistics treats a "systematic" error properly, then it is actually a random error. Yet this does not mean a systematic error will affect a statistical calculation at all, or even enough to be noticeable beyond the normal variations random errors produce. Nor does it mean that mathematics is useless in accounting for systematic errors. For example, analyzing anomalous results in statistical calculations, like a test of accuracy, can help expose systematic errors.

So overall, systematic errors can be as varied as nature itself. Unlike random errors, there are no set ways of dealing with them. They may or may not be detectable and treatable mathematically. They can never be fully removed from any real measurement. By analyzing data; by mindfully designing, constructing, and performing experiments; by identifying and judging relevance; the sagacious scientist can try to account for systematic errors to the ideal point where their influence is negligible compared to the influence of random errors. In other words, you try to make it so you can ignore systematic errors at the level of precision you hope to obtain. If you cannot ignore them, meaning their impact on your results is non-negligible and you cannot account for them by adjusting your data, your experiment, or your analysis, then you must, at the very least, make sure to properly acknowledge their effects in your experimental reports. However, do not clutter your reports by trying to list systematic errors that are clearly negligible at your precision. Judgment and forthrightness are essential qualities in experimental reporting.

Let us develop a frame of reference and then introduce various systematic errors to get a feel for them. We will start with a faithful set of (x, y) pairs. Now suppose there has been some sort of calibration error in each x: perhaps a poor zero'ing at the beginning, or some value was supposed to be subtracted from each x and the experimenter forgot to do this. Notice, already, how insidious these systematic errors can be! If the shifted graph were all you had to look at, would you be tipped off that there was something wrong? Seen together, the faithful and shifted data sets look like parallel copies of each other.

Now compare the statistically derived quantities for the two sets. Since this is an error that has affected every x systematically, in the same way by the same amount, the slope is the same in both cases. So if we were able to compare this slope to what we expected to get, or, more often, if we compared a value calculated from the slope to what we predicted we would find, we would not be able to spot this systematic error. Notice also that R² has not changed, and why should it? R² helps us determine the nature of the dependency of the two variables, and R² implies precision.
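Here is a minimal sketch of that constant calibration shift, with made-up numbers (the actual data behind the graphs is not reproduced here):

import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1.0, 11.0)
y = 2.0 * x + 3.0 + rng.normal(0, 0.3, x.size)   # a faithful set of (x, y) pairs

for x_used in (x, x + 0.75):                     # faithful vs. systematically shifted
    slope, intercept = np.polyfit(x_used, y, 1)
    r2 = np.corrcoef(x_used, y)[0, 1] ** 2
    print(f"slope={slope:.3f}  intercept={intercept:.3f}  R2={r2:.4f}")
# The slope and R2 come out identical; only the intercept moves.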
For this particular systematic error, the shifted data is just as linear as the starting data, and the precision is the same. Ah, but if we compared the intercept, or something calculated from the intercept, to what we expected, then we would be better able to expose this systematic error! Thus one of the ways we try to identify systematic errors is to compare what we experimentally get to what we believe we should have gotten: tests of accuracy.

Comparing Precision with Accuracy

Compare the following two deviations. On one side, use the statistically derived best estimate of the range of reliability of the best estimate, $S_{\bar{x}}$. On the other side, use the actual deviation of your best estimate from what is considered the accepted, true, value of $x$: $|\bar{x} - x_{\text{accepted}}|$.

Case one: $S_{\bar{x}} > |\bar{x} - x_{\text{accepted}}|$

This means that whatever deviation you see in that test of accuracy is within the plus-or-minus swing of reliability that random errors seem to have already forced upon you. In other words, just based on the determination of precision, whatever deviation is happening on the right-hand side is already inside what is considered reliable. This implies that systematic errors have been reduced to negligible influences at your level of precision.

Case two: $S_{\bar{x}} < |\bar{x} - x_{\text{accepted}}|$

This implies there are errors having a clear influence on your data that are not being properly reflected in the standard error determinations. In other words, hunt for non-negligible systematic errors. This is when you want to look inside the absolute value to see whether $\bar{x} > x_{\text{accepted}}$ or $\bar{x} < x_{\text{accepted}}$, and try to better determine the nature of the inaccuracy.

Remember, these are still tests based on estimations. As you might have noticed in the rounding rules concerning standard errors, there will always be significant uncertainty in them, based on the number of actual measurements we can make in the laboratory time allotted to us. Furthermore, you should often question the accepted value itself. Often a physical quantity is determined theoretically using approximations along the way, like assuming an ideal gas, while many other accepted values depend on environmental conditions, like temperature and pressure, which your experimental conditions might not actually match. In other words, you might actually be more accurate than the accepted value!

Remember though, this all assumes we can do such tests of accuracy at all. Sometimes we have no accepted, true, values to compare to. This is partially why scientists sometimes use a particular measurement technique on something well known first, as a way to weed out any potential systematic errors. For example, if a scientist is trying to use some technique to determine some property of a new substance, an optical crystal perhaps, then they will first use the technique on something familiar, water perhaps, in order to make sure it produces accurate results.
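In code, the comparison is one line; the numbers here are hypothetical, not real lab results:

# Suppose a free-fall experiment gives these best estimates (made up):
g_accepted = 9.81            # m/s^2
g_bar, S_g_bar = 9.72, 0.12  # experimental average and its standard error

deviation = abs(g_bar - g_accepted)   # 0.09
if deviation < S_g_bar:
    print("Case one: the deviation sits inside the precision swing.")
else:
    print("Case two: hunt for non-negligible systematic errors.")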
Now consider a systematic error that might affect all the data, but not in the same way. Air resistance is a frequent offender here, as it gets worse as something moves faster and faster (imagine sticking your hand out the window of a car at various speeds). Stray vector fields can cause such errors, such as the earth's magnetic field interacting with an electric current that is increasing or decreasing at each step. Perhaps you are performing some comparatively high-precision length or voltage measurements on a piece of wire, and the temperature of the room is creeping in one direction or the other. Perhaps as more and more people enter the lab and turn on more and more electronic equipment, the temperature is steadily rising. Or perhaps there is an adjacent experiment where people are using liquid nitrogen, and the temperature is getting progressively colder. In either case, such changes in thermal expansion and in resistivity might indeed be producing a systematic error in the data. Notice how both the equation and R² are affected by this type of systematic error, and R² now implies higher precision after the systematic error was added!

A systematic error might also affect the data neither by the same amount nor in the same direction. Consider a poorly skilled experimenter not consciously keeping a level line of sight when reading volumes from a cylinder that is having more and more liquid poured into it. Depending on where the experimenter's eyes are, this parallax error can start off shifting data in one direction, steadily get better, and then start shifting the data in the other direction!

As you might have noticed, spotting and properly accounting for systematic errors can get quite difficult, especially when tests of accuracy are not possible. The ways to account for them vary too. If you spot a forgotten subtraction that shifted all your data, you simply fix it. If you suspect air resistance, a stray vector field, or heat gain is affecting your results, you try to modify the experiment to account for these (as we actually do in the inclined plane experiment in 181, the current balance experiment in 182, and when using dual cups of liquid nitrogen in 182). Or you might be able to update your analysis by including mathematical terms that properly factor in these effects, though that usually involves differential equations. In general, you should always properly label any systematic error you believe has affected your data, meaning it is non-negligible compared to your precision.

Exposures of systematic errors have led to some historic discoveries! Rutherford's famous gold foil experiment, the development of Einstein's theory of relativity, and the discovery of cosmic rays are three instances where mysterious and inexplicable results led to profound insight into physical theory itself!

Alas, not all systematic errors get to enjoy a life of mystery and eminence. It could be just a mistake. It happens. You should always inspect your spread of data, and if a datum appears unreasonably far from the rest, it could be just a mistake. It should then be labeled as such and not included in subsequent analyses. (Meaning, if you are using Excel, cut-and-paste it out of the columns you are doing statistical analyses on, and label it as a suspect datum.) Sometimes it can be fixed because there is no ambiguity in how the mistake happened. An example of this (and a frustratingly frequent example for this dyslexic author) is simply discovering that a number has been recorded backwards. Or a point might have been skipped during the experiment, causing the subsequent measurements to be recorded in the wrong places. Although they will not be discussed here, there are more formal means for deciding whether a datum is just a trivial goof, based on how many standard errors away from the rest it lies. For such trivial systematic errors, use your judgment when deciding whether or not a datum is an unambiguous goof that can just be changed.
There are other schools of thought that say nothing should ever be considered a goof. Always err on the side of caution in such decisions.

The way to judge the relevance and impact of systematic errors is closely linked to your level of precision. Did you notice how I switched to V versus U in that last example? I needed to increase that running example's precision in order to demonstrate that last systematic error. Here is the same error applied to the first set: as we saw a few examples ago, this systematic error of skipping a point and entering the subsequent data in the wrong spots actually improves R² from 0.9793 to 0.9942! Welcome to experimental science! That imprecision is bad enough to actually obscure, and even benefit from, a systematic error!

*****

As a matter of fact, the existence and impact of systematic errors are intimately linked with precision; they almost act as nature's "checks-and-balances" in experimental science. Let us discuss this in a context you will see over and over again: performing a two-dimensional, linear experiment. In the second 181 experiment, where you will find the density of water, you carefully set a value of mass that is independent of the amount of water you will add afterward. Then the volume you add depends on where you had set the mass. It is important to fully take advantage of this extra control in precision when setting such independent variables. Remember that the two-dimensional statistical analysis normally performed by pre-programmed machines like your calculator or Excel automatically treats the precision error in the independent variables as zero (negligible and safely ignorable) compared to the precision error in measuring the dependent variable. So do not make liars out of them!

As we discussed earlier, this is analogous to using a ruler to measure the length of some object. Spending time carefully aligning one end to the edge of one of the nicely calibrated, clear lines, usually the zero line, followed by measuring whatever interval in-between lines the other end happens to fall in, allows us to consider the precision error in the first measurement to be zero (negligible and safely ignorable) compared to the second measurement's precision error. Yet if you decided to align the object just anywhere on the ruler, resulting in the same type of interval measurement for both ends, then you would need to consider a propagated subtraction error. As a matter of fact, this inherently higher precision in using an edge over an interval is why independent variables are often written as whole numbers, as would be the 0 in the subtraction with the ruler.

In other words, if this subtraction were written out, it would probably be written out as L = 2.47 - 0 cm. Now this can be confusing for some, as we would tend to write 0 and not 0.00. But there is a huge difference between these two measurements. The reason you know the 2.47 cm measurement to the second decimal place is because that is where the uncertainty begins: in an interval. However, we set the 0 measurement (ideally) by carefully aligning one end to the edge of the clear, nicely calibrated line. Notice this is different from trying to align to the center of a line, which is trying to guess the middle of yet another, albeit smaller, interval. So we are using an edge of a line! We can claim that to be 0.000 cm! Perhaps even 0.0000 cm! Well, if we are really careful, what about 0.00000 cm? Nooooope!!!
*CRASH!!!* If you try to increase precision more and more, you will start crashing into systematic errors that were once considered negligible and safely ignorable. In this example, if you really want to claim such high precision as four or five decimal places with a normal ruler, then the temperature of the room starts becoming relevant. Nearly all scientific equipment is calibrated at 20 degrees Celsius, and it is very unlikely that the laboratory you are sharing with other students will be at just that temperature. And suppose it is; then the non-random issues inherent in thermal expansion will give your increasing precision another wall to crash into.

This checks-and-balances of random errors versus systematic errors comes up everywhere. Here is a more specific example based on one of the experiments you will perform in 181. In the Kinematics of Free Fall experiment you will be determining the acceleration of gravity, $g$. There is a systematic error from air resistance that will be negligible for some experimenters and not for others, depending on how precisely measurements are made. Some experimenters will (unfortunately) be so imprecise in finding $\bar{g}$ that their range of reliability, $S_{\bar{g}}$, will swing so far as to make whatever impact air resistance had on their results negligible. It is even possible that $S_{\bar{g}} > |\bar{g} - g_{\text{accepted}}|$ for them, which is why tests of accuracy can sometimes fool experimenters into thinking they did a "good job" when, in fact, their lack of precision was so overwhelming that it obscured insight into a deeper understanding of the physical situation. This is not to say that accuracy is not important, but it can be important in other ways than just seeing "good" percentages. The more precise experimenters will first find $S_{\bar{g}} < |\bar{g} - g_{\text{accepted}}|$. This poor accuracy will cause them to hunt for systematic errors. Then (ideally, of course) they will start looking inside the absolute value, realize that $\bar{g} < g_{\text{accepted}}$, and eventually realize that their level of precision was high enough to expose that the systematic error of air resistance had a non-negligible impact on their data; unless it is accounted for in some way, they will never truly get closer to the theoretical true value of $g$, no matter how many similar measurements are made. Such improved precision grants them deeper insight into the physical world, part of which is acting here as a systematic error.

Taking this experiment beyond our humble laboratory: the more you improve your precision, the more those systematic errors that were at one point negligible compared to your precision will start waving at you. So if you do manage to account for air resistance and then further increase your precision, the systematic error of performing such an experiment in an accelerating frame of reference will start to wave at you. And if you account for that and further increase your precision, the ocean's tides will start waving at you. And eventually, if you try to improve your precision enough, your quantum mechanics textbook will start waving at you. Science is not an exact science.

Comments on Language

Systematic errors versus non-random errors

This category called systematic errors is difficult to properly label; "non-random errors" would be a better term.
However, in the various textbooks on error analysis, this category is usually called "systematic errors," or perhaps broken up into "systematic errors" and "mistakes." Since this document is intended to be a launching point into undergraduate error analysis, the developing scientist will eventually need to reference a proper textbook treatment of this material. So I had to decide which term would be the least confusing overall: do I continue to use the term "systematic" to describe errors that really are not that systematic, but make using proper texts potentially less confusing? Or do I use the better term "non-random" for this group, but potentially increase the confusion when referencing proper texts? Well, you already know the way I went. I did try to italicize the word systematic as a less distracting way of saying "systematic" errors, but I didn't italicize it in the example where the error was legitimately systematic. The bottom line is that you should consider the term "systematic" a loose description, while the term "random" in random errors is a firm description.

What also makes this category very slippery is when we start getting nitpicky about just what does and does not belong in each category. Take the issue of a mistake, for "we all make mistakes," right? Like writing a number backwards? Should that not be a random error, then? As a matter of fact, if I keep making more and more measurements, as I keep updating my state of knowledge about both the value I am trying to determine and its standard error, eventually a mistake that had a large impact on the data will start to fade away and become irrelevant. So should not these kinds of mistakes fall into the random error category? Ah, but what if a mistake is made right at the beginning of an experiment, like some knob being bumped, affecting all subsequent data, or the coefficient of the factored-in air resistance term in your analysis being accidentally entered as zero? Then this is a mistake that would not be fixed by taking more and more measurements like the previous mistake was! So these types of mistakes clearly fall into the systematic error category!

The truth is that any categorization of what we are trying to categorize here is error prone itself. Remember just what we are trying to group into categories: error itself! That is quite abstract when you get right down to it, like trying to categorize love! Indeed, the point is that systematic errors need to be handled on a case-by-case basis. If an error seems to be appropriately random, then it should be left alone to allow its proper influence on the standard error.

stdev.s() versus stdev.p()

When discussing the Excel code for entering the 1D standard deviation,

$S_x = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{N-1}}$

we had to distinguish between stdev.s() and stdev.p(). The former is for a sample of a population and the latter is for an entire population. The only difference is that the population one divides by $N$ instead of $N-1$, which is only really a problem for us when $N$ is small. This is a tricky topic that depends on just what the deviations are from. If the deviations are from a mean value determined by the data, we have lost a "degree of freedom," as reflected in $N$ becoming one less, $N-1$. However, if you are measuring deviations from a true or accepted value, usually produced independently of any measurements you have made, then you divide by $N$. For example, we know the mean of rolling two six-sided dice is 7.
If you roll two dice once as a "measurement" of this true value and get 9, and we are interested in its deviation from the true value of 7, then we use the formula with 7 in place of the mean and divide by $N$, not $N-1$. Even before the calculation, we know this must come out to 2. However, if we want to use the formula we normally use, our average of this one value would be 9, and we would need to divide by $N-1$. Even before the calculation, we know this must be undefined, confirming nicely that we actually have no knowledge of any deviation with only one measurement. Not knowing the true value of what we are trying to measure is the general case in the natural sciences, and even when we have an accepted value, as discussed above, it is still important to keep it out of our standard error calculations, in the hope of catching systematic errors later on by comparing precision to accuracy.

Yet that is still not quite what stdev.p() is, for it uses both the mean and divides by $N$. This can only be justified if the mean happens to also be the true value. In other sciences, particularly the social sciences, you can feasibly obtain an entire population in some sort of statistical calculation. For example, if you want to see how many people in your Bingo club make under a certain amount of money a year, this is a perfectly obtainable sample size. So when you have polled everyone, your mean is also the true value, and you may use stdev.p(). Yet if you are trying to use your data to discuss all Bingo players in the world, what you have is just a sample of a larger population; therefore, your mean is just your best estimate of the true value. This best estimate is calculated from your data; thus this statistical double-dipping requires you to divide by $N-1$ and use stdev.s(). Anyway, as mentioned before, as is the case in most natural science measurements, we will always have just a sample of an infinitely large population; stdev.s() is the only option for us. If your calculator gives you both and you are not sure which is which, the bigger number is what we want (meaning it is being divided by the smaller $N-1$ and not $N$).
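Here is the dice example in code; NumPy's ddof argument plays the role of choosing between stdev.p() and stdev.s():

import numpy as np

# The true mean of two six-sided dice, 7, is known independently of any rolls.
rolls = np.array([9.0])                                   # one "measurement"
s_from_true = np.sqrt(np.sum((rolls - 7.0)**2) / len(rolls))
print(s_from_true)             # 2.0: deviating from a KNOWN true value, divide by N
print(np.std(rolls, ddof=1))   # nan: deviating from the sample's own mean with
                               # N = 1 is undefined, just as stdev.s() would complain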
Standard Error versus Standard Deviation

As I mentioned before, do you know how many times I have had to correct students for doing 1D statistics on 2D data? Enough to make me scared of making it worse! In other words, you can perform a 1D mean and standard deviation calculation on a column of independent variables or dependent variables that are supposed to be changing. But the 1D standard deviation just does not mean the same thing in the 2D case. As a matter of fact, it really is a totally different mathematical exercise, as it has lost its "standard-ness" of indicating that roughly 7/10ths of your data falls within plus or minus one standard deviation of the mean, etc. And it has also lost its "deviation-ness," as the 1D average in this case is not necessarily a best estimate of anything. Suppose I have some independent variables: 1, 2, 3, 4, 5. Well, the mean of that is 3, but so what? They are supposed to be changing! What if I then did a sixth point at 100? The mean is massively shifted toward that extreme, but again, so what? The averages that matter here are the slope and intercept, and as long as the dependent variables are acting as we hope, adding that sixth point should only improve our meaningful averages.

But now the other textbooks come into play in how I need to decide what to label terms. Basically, what is called the "standard error" in the 2D case is what really has the "deviations" with the properties that make us consider it "standard." So that is really what should be called the "standard deviation" in the 2D case, but alas, it never is. And to make this worse, the 1D mean and standard deviation do have some use for us in how R² is calculated; that is why I reduced their font size earlier! They are just a mathematical convenience now, not representing deviations in the "standard" way, even though, unfortunately, the result is still called the "standard deviation" in the 2D case.

Mean notation

Sigh, soooooooo why does a 1D average get special notation, a fancy bar over it, $\bar{x}$, while the 2D averages, slope and intercept, do not, just $m$ and $b$? And suppose you use a couple of values that have fancy bars over them to calculate other values, but somehow the bars do not transfer over? For example, in a 181 experiment, you will calculate an initial velocity, $v_0$, using two averages, $\bar{x}$ and $\bar{y}$, and some constants. Soooooo if $v_0$ is essentially a value determined by averages, why is it not labeled as an average too? Suppose you had 10 measurements each of $x$ and $y$, used them to calculate 10 individual $v_0$'s, and then averaged them; would not this get the fancy bar over it, $\bar{v}_0$? Ah, but if I first find $\bar{x}$ and then $\bar{y}$ and use them to calculate the exact same $v_0$, this does not get a fancy bar? And then why is the word "regression" used in the 2D case and not in the 1D case, when the 1D average is theoretically just as much a "regression"?

Alas, there are no good answers to some of these confusions in notation. The notations in this document will not match other documents on error analysis, and those will not match others still. In a way, that statistics is so incredibly important is almost to its own detriment when it comes to people trying to learn it. Because so many different fields use it, they tend to develop their own notations and labels. So what is my advice for you? Think like an artist. Consider something that is even more ubiquitous on our planet, like the color black. Every culture and every language has its own name for this color. The Japanese word for it is kuro. The Icelandic word for it is svart. The Lakota word for it is sapa. But does this mean you cannot be a painter in those lands? Of course not! When in front of a canvas, it does not matter what labels are on your paints; you know the right paint to reach for.