Introduction to Data and Error Analysis for General Physics Lab Experiments

1. INTRODUCTION

The laws of physics are determined from observations and experiments. It is essential to learn physics by performing experiments and interpreting experimental data properly. We consider two basic types of experiments that scientists often perform in order to learn about the physical world:

1) To determine the numerical value of some physical quantity through measurements.
2) To test whether a particular theory is consistent with experimental data.

Our lab experiments are designed to teach you the techniques for making measurements and for comparing data from an experiment with the predictions of the physical laws you learn in the lectures. To obtain meaningful results from an experiment, you need to analyze the data with a good understanding of experimental errors. It is very important to learn how to identify sources of experimental error and to estimate their sizes. Please keep in mind that any valid experimental data must be presented with the associated errors. For example, the measurement of the speed of light is given as:

$$c = (2.99792458 \pm 0.00000004) \times 10^{8}\ \mathrm{m/s}$$

This presentation tells us two things:

i) the measured speed of light is $2.99792458 \times 10^{8}$ m/s;
ii) the error (or uncertainty) of the measurement is 4 m/s.

For our lab course, you are required to present experimental data with errors indicated in your lab report. In the following sections we briefly introduce the basic concepts of experimental data and error analysis through definitions and simple examples.

2. BASIC CONCEPTS

2.1 True value, Experimental Value and Error

True value, $A_0$: It is an exact physical value which often appears in fundamental laws of physics. Examples are the speed of light in vacuum ($c$) in Maxwell's equations, the gravitational acceleration ($g$) in Newton's equation, and so on.
The numerical values of these physical quantities must be determined through measurements.

Experimental value, $A$: It is the numerical value obtained by performing experiments designed to measure $A_0$. In general, the measured value does not exactly equal the true value. This is because the experimental instruments and methods are not perfect, so the measured value $A$ has an uncertainty, which is called the experimental error. The smaller the experimental error, the closer the measured value $A$ is to the true value $A_0$.

Experimental error, $\Delta$: It is the difference between $A$ and $A_0$, $\Delta = A - A_0$. It indicates how close a measured value comes to its true value. However, we do not know the exact value of $A_0$, so $\Delta$ is also unknown. The task of data analysis is to find the sources of errors and to estimate their sizes. Based on the estimated $\Delta$, we can give a range of values in which the true value $A_0$ is likely to lie. The size of the error indicates the accuracy of the experimental value $A$.

2.2 The Rule for Data Recording

Before we discuss error analysis, we need to understand the rule for data recording in experiments. A meaningful experimental value differs from a pure mathematical value: it carries physical meaning. When we make measurements and record the measured data, we must determine how many valid digits should be recorded. This idea can be illustrated through a simple example.

Assume one measures the length of an object using a ruler whose minimum scale division is 1 mm. The reading is a little more than 89 mm (but less than 90 mm), so one can record the measured value as 89.5 mm. Here, 89 mm is accurate, and the additional 0.5 mm is estimated from the reading. Because it is hard for our eyes to tell whether that little bit more is exactly 0.5 mm, the recorded value has some uncertainty. We cannot be certain that our estimated 0.5 mm is not actually 0.4 or 0.6 mm, so we say that the measurement error is $\pm 0.1$ mm.
Therefore, the final measured result should be reported as:

$$\text{Length} = (89.5 \pm 0.1)\ \mathrm{mm}.$$

In this example, three digits should be recorded, which reflects the measurement (read-out) precision of the instrument. If one writes the measured value as 89.4987 mm, this implies that the ruler can be read to better than 0.001 mm, which is certainly not true. On the other hand, if one writes the value as 89 mm, this indicates a read-out error of about 1 mm, which is clearly too large.

In general, the rule of data recording in an experiment is that the number of valid digits of recorded data should reflect the minimum read-out scale of the measurement instrument.

In some cases we need to determine a physical quantity by measuring several values. The number of valid digits of the final result is then set by the individual value with the fewest valid digits. For example, suppose we determine the resistance of a resistor by measuring the circuit current, $I$, and the voltage, $V$. If the read-out values are $V = 10.5$ V and $I = 1.522$ A, then the resistance $R$ should be recorded as

$$R = \frac{V}{I} = \frac{10.5}{1.522} = 6.90\ \Omega .$$

Here the value of $R$ has three valid digits, determined by the voltage, which is read out with three valid digits.

2.3 Different Types of Errors

The discussion of experimental error does not include mistakes made while performing the experiment. Such mistakes include reading errors, recording errors, and incorrect operation of the instrument. Errors of this kind are difficult to eliminate through data analysis, so great care should be taken to prevent them from occurring.

We discuss here two fundamentally different types of errors associated with any measurement procedure: systematic and random errors.

Systematic Errors

There are basically two sources of systematic errors:

1) Instrument calibration.
For example, the zero point has not been set correctly before the measurement: suppose the instrument reads $a$, not zero, at the beginning. Then all the measured data points will be shifted by the constant $a$. As another example, if the full range of a voltage meter is 0-1.9 volts, but the meter scale shows a full scale of 0-2 volts, then the voltage measured with this meter will be systematically increased by a factor of 2/1.9. Therefore, checking the instrument calibration, including the zero-point setting, is important to avoid systematic errors.

2) Experiment method error. This kind of error is often due to imperfect experimental design: the experimental conditions are not exactly those assumed by the theoretical model. When comparing experimental data with theoretical expectations, one must take the experimental method errors into account. We sometimes call such an error a theoretical systematic error.

Understanding the systematic error in an experiment is not an easy job: we must fully understand the experimental principles and carefully check the instrument to estimate the size of the systematic errors.

[Figure 1: Random error distribution. The error $\Delta$ obeys the Gaussian function $f(\Delta) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\Delta^2/2\sigma^2}$.]

Random Errors

Random error is often due to the limited precision of the experimental instrument and to imperfectly performed experiments. A special kind of random error comes from the physical process itself. For example, a measurement of the lifetime of radioactive particles must take into account the fact that radioactive decay is a random process. Often, under certain conditions, fluctuations due to this kind of error obey the Gaussian distribution [see ref. 1], as shown in Figure 1. We often refer to these as statistical errors. In general, random errors can be reduced by repeating the measurement.

3. DATA ANALYSIS

Data analysis includes determination of the measured mean values and the standard deviations.
We discuss the standard method of presenting measurement results in this section. We first give the definitions and discuss how to combine errors, then we briefly introduce the least squares method for linear relations between variables.

3.1 Mean Value and Standard Deviation

For $N$ measurements (samples) of a physical quantity with true value $\mu$, with measured values $x_i$, the sample mean $\bar{x}$ is defined by

$$\bar{x} \equiv \frac{1}{N}\sum_{i=1}^{N} x_i \equiv \langle x \rangle \tag{1}$$

and the corresponding sample variance is given by

$$\mathrm{var}(x) \equiv \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 . \tag{2}$$

The sample standard deviation, $\sigma$, is given by

$$\sigma = \sqrt{\mathrm{var}(x)} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} . \tag{3}$$

[Figure 2: Interpreting the standard deviation $\sigma$. The shaded areas indicate the probability of a measurement lying within $t$ standard deviations ($\mu \pm t\sigma$): 68%, 95% and 99.7% for $t$ = 1, 2 and 3.]

The standard deviation represents how the measured values spread out in repeated measurements, and is therefore a good estimate of the statistical error of the experiment. As shown in Fig. 2, the exact meaning of the standard deviation $\sigma$ can be related to the probability, $p_t$, of finding a single measurement of $x$ within the range $(\mu - t\sigma,\ \mu + t\sigma)$. This probability is 68.3%, 95.5% and 99.7% for $t$ = 1, 2 and 3 respectively.

For $N$ large enough (typically $N \gtrsim 20$) it can be shown that the probability for $\bar{x}$ to lie in the interval $(\mu - t\sigma/\sqrt{N},\ \mu + t\sigma/\sqrt{N})$ is about 68%, 95% and 99.7% for $t$ = 1, 2 and 3 respectively. So, to estimate $\mu$, one measures $\bar{x}$ and obtains

$$\mu = \bar{x} \pm \frac{2\sigma}{\sqrt{N}}$$

with a confidence of 95%. Thus the accuracy in determining $\mu$ improves as $N$ increases.

3.2 Combining errors

We are often confronted with a situation where the result of an experiment is a combination of two or more measurements. We want to know the error on the final answer in terms of the errors on the individual measurements.
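A small numerical experiment can make this question concrete. The sketch below is only an illustration, not part of the lab procedure: it simulates many measurements of $a = b - c$ with independent Gaussian errors, evaluates the sample mean and sample standard deviation of Section 3.1, and compares the observed spread with two candidate estimates (the plain sum of the errors and their root-sum-square). The input values, the function name `simulate_difference` and the fixed random seed are all assumptions made for the example.

```python
import math
import random
import statistics


def simulate_difference(b_true, sigma_b, c_true, sigma_c, n_trials, seed=0):
    """Return n_trials simulated values of a = b - c.

    b and c are drawn independently from Gaussian distributions, so their
    errors are uncorrelated by construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        rng.gauss(b_true, sigma_b) - rng.gauss(c_true, sigma_c)
        for _ in range(n_trials)
    ]


# Hypothetical inputs: b = 50 +/- 3 and c = 20 +/- 4 (arbitrary units).
SIGMA_B, SIGMA_C = 3.0, 4.0
a_values = simulate_difference(50.0, SIGMA_B, 20.0, SIGMA_C, n_trials=20000)

a_mean = statistics.mean(a_values)        # sample mean, Eq. (1)
a_std = statistics.stdev(a_values)        # sample standard deviation, Eq. (3)
a_sem = a_std / math.sqrt(len(a_values))  # sigma / sqrt(N), error on the mean

naive_sum = SIGMA_B + SIGMA_C              # maximum-possible-error estimate
quadrature = math.hypot(SIGMA_B, SIGMA_C)  # root-sum-square estimate

print(f"mean of a        = {a_mean:.2f} +/- {a_sem:.2f}")
print(f"observed spread  = {a_std:.2f}")
print(f"sum of errors    = {naive_sum:.2f}")
print(f"root-sum-square  = {quadrature:.2f}")
```

With uncorrelated inputs the observed spread should land close to the root-sum-square value and well below the plain sum, which is the behaviour the rest of this section derives.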
Linear situation

As a very simple example, consider the final result $a$, which is related to the measured values $b$ and $c$ by

$$a = b - c .$$

To find the error on $a$, first differentiate:

$$\delta a = \delta b - \delta c .$$

If we were talking about maximum possible errors, we would simply add the magnitudes of $\delta b$ and $\delta c$ to get the maximum possible $\delta a$. But it is more sensible to consider the root mean square deviations:

$$\sigma_a^2 = \langle (a - \bar{a})^2 \rangle = \langle [(b - c) - (\bar{b} - \bar{c})]^2 \rangle = \langle (b - \bar{b})^2 \rangle + \langle (c - \bar{c})^2 \rangle - 2\langle (b - \bar{b})(c - \bar{c}) \rangle \equiv \sigma_b^2 + \sigma_c^2 - 2\,\mathrm{cov}(b, c)$$

The last term involves the covariance of $b$ and $c$, which expresses whether their errors are correlated or not. It can be positive, negative or, in the case where the errors are uncorrelated, zero. Thus, provided that the errors on $b$ and $c$ are uncorrelated, the rule is that we add the contributions $\sigma_b$ and $\sigma_c$ in quadrature:

$$\sigma_a^2 = \sigma_b^2 + \sigma_c^2 . \tag{4}$$

However, it should be emphasized that Eq. (4) can be applied only when the individual errors are uncorrelated. To illustrate this point, consider the following example.

Example:

$$a = b + b$$

Here the two variables on the right-hand side of the equation are completely correlated. Thus if the measurement error of $b$ is $\sigma_b$, then $\sigma_a$ is simply $2\sigma_b$. Notice that here $\sigma_a \neq \sqrt{2}\,\sigma_b$, which is what Eq. (4) would give if it were applied. (Recall: $\sigma_a^2 = \sigma_b^2 + \sigma_b^2 + 2\,\mathrm{cov}(b, b) = 4\sigma_b^2$, where $2\,\mathrm{cov}(b, b) \equiv 2\langle (b - \bar{b})(b - \bar{b}) \rangle = 2\sigma_b^2$.)

Non-linear situations

In this case the correct answer is obtained by first differentiating, then collecting together the terms for each independent variable, and finally adding these terms in quadrature, i.e. for $y(x_1, x_2, \ldots, x_n)$,

$$\sigma_y^2 = \sum_{i=1}^{n} \left( \frac{\partial y}{\partial x_i} \right)^2 \sigma_{x_i}^2 . \tag{5}$$

Thus, for example, if

$$a = b^r c^s$$

where $r$ and $s$ are known constants, then, assuming the errors on $b$ and $c$ are uncorrelated,

$$\left( \frac{\sigma_a}{a} \right)^2 = r^2 \left( \frac{\sigma_b}{b} \right)^2 + s^2 \left( \frac{\sigma_c}{c} \right)^2 \tag{6}$$

i.e. the fractional errors on $b$ and $c$ are combined to give the fractional error on $a$.

As before, when dealing with ratios or products we must be careful about correlations.
When correlations are present between $b$ and $c$ in the above example, the fractional error on $a$ is given by:

$$\left( \frac{\sigma_a}{a} \right)^2 = r^2 \left( \frac{\sigma_b}{b} \right)^2 + s^2 \left( \frac{\sigma_c}{c} \right)^2 + 2rs\,\frac{\mathrm{cov}(b, c)}{bc} . \tag{7}$$

Example: if $a = b/c = (100 \pm 10)/(1 \pm 0.2)$ (assuming the errors on $b$ and $c$ are independent), then

$$(\sigma_a/a)^2 = (10/100)^2 + (0.2/1)^2 = 0.01 + 0.04 = 0.05$$

$$\sigma_a = a\sqrt{0.05} = 100 \times 0.22 = 22$$

The final result should be presented as:

$$a = 100 \pm 22 \quad \text{or} \quad a = 100\,(1 \pm 22\%).$$

In summary, for a simple function of two independent variables, $y(x_1, x_2)$, the measurement results can be presented as:

$$y = x_1 + x_2 \ \longrightarrow\ y = (x_1 + x_2) \pm \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}$$

$$y = x_1 - x_2 \ \longrightarrow\ y = (x_1 - x_2) \pm \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}$$

$$y = x_1 x_2 \ \longrightarrow\ y = (x_1 x_2) \left[ 1 \pm \sqrt{(\sigma_{x_1}/x_1)^2 + (\sigma_{x_2}/x_2)^2} \right]$$

$$y = \frac{x_1}{x_2} \ \longrightarrow\ y = \frac{x_1}{x_2} \left[ 1 \pm \sqrt{(\sigma_{x_1}/x_1)^2 + (\sigma_{x_2}/x_2)^2} \right]$$

Combining results of different experiments

When several experiments measure the same physical quantity and give a set of answers $a_i$ with different errors $\sigma_i$, then the best estimates of $a$ and its accuracy $\sigma$ are given by

$$a = \frac{\sum_i (a_i/\sigma_i^2)}{\sum_i (1/\sigma_i^2)} \tag{8}$$

and

$$\frac{1}{\sigma^2} = \sum_i \frac{1}{\sigma_i^2} . \tag{9}$$

Thus each experiment is weighted by $1/\sigma_i^2$. In some sense, $1/\sigma_i^2$ gives a measure of the information quality of that particular experiment.

3.3 Least Squares Method

The least squares method is very often used in data analysis to determine the experimental parameters from a set of measured data points. In this section, we consider only the simplest situation, where the relation between the variables is linear. The mathematical proof is not given here; only the formulae used in this method are given below.

Consider variables $y$ and $x$ that are related to one another linearly:

$$y = a + bx \tag{10}$$

where $a$ and $b$ are two parameters to be determined. (Notice that the above formula is the equation of a line, with $a$ as the intercept and $b$ as the slope.) Assuming we have measured a set of data points $\{x_i, y_i\}$, $i = 1, \ldots, N$, we need to determine $a$ and $b$ from the measurements.
We first define the following variables for the calculation:

$$\bar{x} = \frac{1}{N}\sum_i x_i \tag{11}$$

$$\bar{y} = \frac{1}{N}\sum_i y_i \tag{12}$$

$$L_{xx} = \sum_i (x_i - \bar{x})^2 \tag{13}$$

$$L_{yy} = \sum_i (y_i - \bar{y})^2 \tag{14}$$

$$L_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \tag{15}$$

The measured values of the line parameters $a$ and $b$ are then determined by the following formulae:

$$b = \frac{L_{xy}}{L_{xx}} \tag{16}$$

$$a = \bar{y} - b\bar{x} . \tag{17}$$

The errors on $a$ and $b$ can be calculated using the following formulae:

$$\sigma_a^2 = \frac{\sum_i x_i^2}{N\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,\sigma_y^2 \tag{18}$$

$$\sigma_b^2 = \frac{N}{N\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,\sigma_y^2 \tag{19}$$

where $\sigma_y$ is the uncertainty of the $y$ measurements and can be determined by the following formula:

$$\sigma_y = \sqrt{\frac{\sum_i \left[ y_i - (a + b x_i) \right]^2}{N - 2}} \tag{20}$$

4. EXAMPLE OF DATA ANALYSIS

[Fig. 3: RC circuit experiment set-up diagram (6 V source, switch S, capacitor C, resistor R, voltmeter V). With the switch connected to position 1 the capacitor is charging; connected to position 2 it is discharging.]

Consider an experiment to measure the RC time constant of the following circuit (see Fig. 3), and to determine the resistance, $R$, for a given capacitance value, $C = 10\,(1 \pm 2\%)\ \mu\mathrm{F}$ (ref: your lab manual, Capacitance experiment).

Experiment description

A fully charged capacitor of capacitance $C$ (with an initial voltage of 6 V) is connected in series with a resistor of resistance $R$ in the circuit. It will lose charge, so that the potential difference, $V$, across the capacitor decays exponentially according to the following law for a discharging capacitor:

$$V(t) = V_0\, e^{-t/RC}$$

where $V_0$ is the initial voltage across the capacitor and $V(t)$ is the voltage at time $t$. The product $RC$ is called the `time constant'. The tasks are:

1) Measure the circuit time constant, $RC$.
2) Determine the resistance value of the resistor, $R$.

Measurement and data analysis

To determine the time constant, we take data on the voltage ($V$) as a function of time ($t$) by reading the voltage every 5 seconds. Assume that the recorded data are listed in Table 1. Please note: the recorded voltage data have three digits, and the minimum read-out scale of the voltage is mV.
For data analysis, we rewrite the capacitor discharging formula by taking the logarithm of both sides:

$$\ln V(t) = \ln V_0 - \frac{1}{RC}\,t$$

so that $\ln V$ is linear in $t$ and the formulae in Section 3.3 can be applied directly in the analysis. To see the linear relation directly, the data can be plotted on semi-log paper with voltage along the log scale and time along the linear scale. The data points should lie on a straight line. The slope of the line is $-1/RC$.

Time (seconds)     0.0    5.0    10.0    15.0    20.0    25.0
Voltage (volts)    6.00   2.22   0.823   0.305   0.118   0.045

Table 1: Recorded data on voltage vs. time

The standard way to determine the slope of a line from a set of measured data points is the least squares method. We use the formulae presented in the last section. Let

$$y \equiv \ln V(t), \quad x \equiv t, \quad a \equiv \ln V_0, \quad b \equiv -1/RC ;$$

then the original equation becomes

$$\ln V(t) = \ln V_0 - \frac{1}{RC}\,t \ \longrightarrow\ y = a + bx .$$

Following the discussion in Section 3.3 (Least Squares Method), we can calculate all the quantities needed to determine $a$, $b$ and $\sigma_a$, $\sigma_b$. The calculated results are listed in Table 2.

$\bar{x}$    $\bar{y}$    $L_{xx}$    $L_{yy}$    $L_{xy}$    $\sum x_i^2$    $\sum x_i$
12.5         -0.672       437.5       16.770      -85.7       1375.0          75.0

Table 2: Calculated quantities for the least squares method

From these calculated quantities, we determine the measured parameters and errors as:

$$b \equiv -1/RC = \frac{L_{xy}}{L_{xx}} = -0.1958\ \mathrm{s^{-1}}$$

$$a \equiv \ln V_0 = \bar{y} - b\bar{x} = 1.78$$

$$\sigma_y = \sqrt{\frac{\sum_i \left[ y_i - (a + b x_i) \right]^2}{N - 2}} = 1.90 \times 10^{-2}$$

$$\sigma_a = \sqrt{\frac{\sum_i x_i^2}{N\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,\sigma_y^2} = 1.37 \times 10^{-2}$$

$$\sigma_b = \sqrt{\frac{N}{N\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,\sigma_y^2} = 9.07 \times 10^{-4}\ \mathrm{s^{-1}}$$

Finally, we obtain the experimental results on the time constant $RC$ and resistance value $R$ in Table 3.

The time constant                 $RC = -1/b = 5.11$ seconds
The error on the time constant    $\sigma_{RC} = (\sigma_b/|b|)\,RC = 0.02$ seconds
The resistance value              $R = -\frac{1}{bC} = \frac{1}{0.1958 \times 10 \times 10^{-6}}\ \Omega \approx 510\ \mathrm{k\Omega}$
The fractional error on R         $\sigma_R/R = \sqrt{(\sigma_C/C)^2 + (\sigma_b/b)^2} = 2.05\%$

Table 3: Determined parameters and errors

The measurement errors on the fit parameters indicated in Table 3 are random errors that come mainly from the uncertainty in reading the voltage (you might not read the voltage exactly on time, and the last digit of the recorded data contains an error). By repeating the measurement, such errors will decrease.

In addition to the random error, we should also consider possible sources of systematic errors.

- There may be calibration errors in the voltage meter and in the clock.
- The measured time constant in fact includes the resistance of the circuit and the instrument. If you determine $R$ through the measured time constant, the determined resistance value will be larger than its true value. Once we know the circuit and instrument resistance, we should make a correction: $b = -1/RC$ and $R = R_0 + r$, where $R_0$ is the resistance of the resistor and $r$ is the resistance of the instrument and circuit, so that $R_0 = R - r$.
- The given capacitance value is not exact, but has an uncertainty of two percent. This error cannot be reduced by repeating the measurement. From the calculation we know that the minimum error on the $R$ value is 2%, which comes from the uncertainty of the capacitance value.

Presentation of final results

The measured time constant: $RC = 5.11 \pm 0.02$ seconds

The measured resistance of the resistor: $R_0 = 500\,(1 \pm 2.05\%)\ \mathrm{k\Omega}$.

Note: we have assumed that the equivalent resistance of the circuit and instrument, $r$, is 10 k$\Omega$, so that the final result is $R_0 = R - r$.

Reference:

(1) John R. Taylor, `An Introduction to Error Analysis'.
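The fit in Section 4 can be checked with a short script. The following is a minimal sketch, not the procedure prescribed by the lab manual: it applies Eqs. (11)-(20) to the Table 1 readings after taking the logarithm of the voltage, then propagates the slope error to $RC$ and $R$ with Eq. (6). The function name `linear_least_squares` and the variable names are choices made for this illustration, and the capacitance is assumed to be 10 microfarads with a 2% uncertainty, as stated in the example.

```python
import math


def linear_least_squares(x, y):
    """Fit y = a + b*x using Eqs. (11)-(20).

    Returns (a, b, sigma_a, sigma_b, sigma_y).
    """
    n = len(x)
    x_mean = sum(x) / n                                   # Eq. (11)
    y_mean = sum(y) / n                                   # Eq. (12)
    l_xx = sum((xi - x_mean) ** 2 for xi in x)            # Eq. (13)
    l_xy = sum((xi - x_mean) * (yi - y_mean)
               for xi, yi in zip(x, y))                   # Eq. (15)
    b = l_xy / l_xx                                       # Eq. (16)
    a = y_mean - b * x_mean                               # Eq. (17)

    # Scatter of the points about the fitted line, Eq. (20).
    residual_sq = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_y = math.sqrt(residual_sq / (n - 2))

    sum_x = sum(x)
    sum_x2 = sum(xi ** 2 for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    sigma_a = sigma_y * math.sqrt(sum_x2 / denom)         # Eq. (18)
    sigma_b = sigma_y * math.sqrt(n / denom)              # Eq. (19)
    return a, b, sigma_a, sigma_b, sigma_y


# Readings from Table 1.
times = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]        # seconds
volts = [6.00, 2.22, 0.823, 0.305, 0.118, 0.045]  # volts

# Linearize: y = ln V, x = t, so that b = -1/RC and a = ln V0.
log_volts = [math.log(v) for v in volts]
a, b, sigma_a, sigma_b, sigma_y = linear_least_squares(times, log_volts)

rc = -1.0 / b                     # time constant in seconds
sigma_rc = rc * sigma_b / abs(b)  # same fractional error as b

CAPACITANCE = 10e-6    # farads (assumed: 10 microfarads)
FRAC_ERR_C = 0.02      # 2% uncertainty on C
resistance = rc / CAPACITANCE
frac_err_r = math.hypot(FRAC_ERR_C, sigma_b / b)  # Eq. (6) with r = s = -1

print(f"b  = {b:.4f} +/- {sigma_b:.4f} 1/s")
print(f"a  = {a:.3f} +/- {sigma_a:.3f}")
print(f"RC = {rc:.2f} +/- {sigma_rc:.2f} s")
print(f"R  = {resistance / 1e3:.0f} kOhm, fractional error {100 * frac_err_r:.2f}%")
```

Because the residuals enter Eq. (20) squared, $\sigma_y^2$ (not $\sigma_y$) must multiply the factors in Eqs. (18) and (19); the script takes the square root of those factors and multiplies by $\sigma_y$, which is equivalent.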