Data reduction and error analysis 202108

Data reduction and error analysis
I. Objectives
(1) Knowing how to record experimental data; knowing what a significant figure is; knowing
the difference between precision and accuracy.
(2) Knowing how to process data; knowing how to assess errors; knowing how errors propagate;
knowing how to do linear regression.
(3) Knowing how to use EXCEL to carry out basic data reduction and error analysis.
II. Theory and Method
Experiments guide the progress of science. They are used not only to verify scientific laws
but are also one of the key forces driving innovation. As a notable historical example,
Henry Cavendish's experiment was the first to measure the force of gravity between masses in
the laboratory and the first to yield accurate values for the gravitational constant. Besides
confirming Newton's law of gravitation, this experiment gives us a handle to weigh the Earth,
the Sun and other celestial objects. While performing cathode-ray experiments, Wilhelm Röntgen
observed that an invisible radiation caused fluorescence on a small cardboard screen, and thus
discovered X-rays. Today X-rays are an important diagnostic tool in medicine as well as in
materials research. Although it was designed to test the ether theory, the negative result of
the Michelson-Morley experiment actually indicates that the speed of light is a constant. Later,
when Einstein formulated special relativity, the universal nature of the speed of light became
one of its two foundation pillars. Likewise, the black-body and photoelectric experiments helped
to usher in quantum theory and guided humanity into the fascinating quantum world.
For any experiment, the measurement data are arguably the most important output. Thus,
obtaining good-quality data is a crucial step in ensuring a successful experiment. Almost
equally important is the a posteriori analysis of the experimental data, since it is the
foundation for improving and further developing the experiment. Here we introduce the basics of
processing and analyzing experimental data. These skills help to ensure the quality of the data
and are applicable to any scientific experiment. In the rest of this experiment, the basics of
data processing and analysis are divided into (1) how and what to record, (2) the significant
figures of data, (3) error analysis, (4) error propagation, and (5) regression analysis.
(1) How and what to record
In general, you will need a good lab book to systematically record the conditions under
which the experiment is performed and the experimental data (including the variable settings
and the responses to these settings). In this general physics manual, a few data recording
sheets have been included for you to use.
The following guidelines would be helpful in keeping the lab records:
(a) Write down the time and location of the experiment; concisely note the experimental
setup, the conditions, and any cautionary items.
(b) Before performing the experiment, draw up the tables needed in the lab book to hold
the experimental data.
(c) During the experiment, carefully record the experimental settings and experimental
data.
(d) Keep all the records in the lab book faithful and original. If you make a
mistake, just cross it out and write the correct data next to it.
(e) The last record for an experiment session should be the finish time of the experiment.
Cautions:
(a) When recording the data, pay special attention to the significant figures and the units of
the quantities.
(b) For every quantity to be measured, repeat the measurement at least 5 times and weed
out illegitimate data points.
(2) Significant figure and scientific notation
The smallest division on a ruler is typically 0.1 cm. Suppose you use it to measure the length
of a table and obtain 120.36 cm. The first four digits, 120.3, are read directly off the
ruler and are called the accurate value. The last digit, "6", is estimated by interpolating
within the smallest division on the ruler and is called the estimated value. This example
illustrates that a measured quantity typically consists of an accurate value and an
estimated value. In this example, the number of significant figures is five. The
reported value extends to the second digit after the decimal point, the "6",
which also means that no measurement made with this ruler can be reported more finely than
0.01 cm. Hence the number of significant figures in a measurement also reflects the precision
of the measuring tool.
For any measurement, the number of significant figures can be determined using the following rules:
1. The leftmost non-zero digit is the most significant number (MSN).
2. If the number contains no decimal point, the rightmost non-zero digit is the least
significant number (LSN); if it has a decimal point, the rightmost digit is the LSN.
3. The number of digits from the MSN to the LSN, inclusive, is the number of significant figures.
For example, each of the following numbers has 4 significant figures: 1234; 123400; 123.4;
1001; 1010.; 101.0; 10.10; 0.0001010. The number "1010." includes a trailing decimal
point to indicate that it has four significant figures. To avoid confusion, an underline can be
added to the LSN, e.g. $10\underline{1}0$ and $101\underline{0}$, to denote 3 and 4 significant
figures, respectively. Perhaps the most "scientific" way to denote the significant figures
is to record the data in scientific notation. Scientific notation is a standard way of
writing the significant figures of experimental data. It uses a power of 10 to express the
order of magnitude of the data, preceded by a string of digits that represents the measured
value with the proper number of significant figures.
For example, $10\underline{1}0$ and $101\underline{0}$ can be written as $1.01 \times 10^{3}$
and $1.010 \times 10^{3}$, respectively. Scientific notation is especially useful for keeping
the number of significant figures from decreasing or increasing during unit conversions.
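The three counting rules above can be sketched in a few lines of Python. This is only an illustration, not part of the original manual: the function name `count_sig_figs` is hypothetical, and it assumes the number is passed as plain decimal text (no exponent) so that a trailing decimal point such as "1010." is preserved.

```python
def count_sig_figs(text: str) -> int:
    """Count significant figures of a plain decimal string using rules 1-3."""
    s = text.strip().lstrip("+-")
    digits = s.replace(".", "")
    # Rule 1: drop everything left of the leftmost non-zero digit (the MSN).
    digits = digits.lstrip("0")
    if not digits:
        return 0  # the string contained only zeros
    # Rule 2: without a decimal point, trailing zeros are not significant.
    if "." not in s:
        digits = digits.rstrip("0")
    # Rule 3: count the digits from the MSN to the LSN, inclusive.
    return len(digits)


for number in ["1234", "123400", "123.4", "1001", "1010.", "0.0001010", "1010"]:
    print(number, count_sig_figs(number))
```

Passing the value as text matters: a floating-point number cannot tell "1010" from "1010." or "101.0" from "101".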
After the experiment, a series of calculations involving the data must be carried out to
obtain the result. The most common mistake here is to copy down whatever number comes out of
the calculator without considering the proper significant digits. One must decide how many
digits of the result to keep and how the uncertainty of the measured data propagates. Since
the last digit of a measured value is an estimate, the result of any arithmetic operation
involving this digit is also an estimate. The following two cases illustrate this point:
(a) For addition and subtraction, the basic rule is: if one of the digits being added is an
estimate, the corresponding digit of the sum is an estimate; however, a carry produced by
the addition is a firm digit. The following examples clarify this point.
(1) 0.7 + 0.6 = 1.3 has two significant figures. (2) 5.7 + 8.6 = 14.3; since 5 + 8 = 13
produces a carry digit, the number of significant figures becomes three. (3) Take three
experimental values 123.79, 17.321 and 6.8 and compute
y = 123.79 - 17.321 + 6.8 = 113.269. How many significant figures does y have? In 6.8, the
digit 8 is the estimated value, so the first digit after the decimal point is the least
significant digit of the result; in 123.79, the most significant digit is in the hundreds
place. Hence the result has four significant figures. In 113.269, the digits .269 are all
estimated. Since the least significant digit is the first one after the decimal point, the
digits beyond it are rounded off and 113.269 is written as 113.3; thus y = 113.3.
(b) For multiplication and division, the rule for retaining significant digits is the same as
that for addition and subtraction. Examples: (1) 17.3*8.2 = 141.86 = 142. (2)
z = (12.6*4.83)/2.4: first carry out the multiplication in the numerator to obtain
z = 60.858/2.4. After keeping the proper significant digits, this becomes z = 60.9/2.4.
Completing the division gives z = 25.375. Because of the division by 2.4, the digit in the
ones place already contains an estimated value, so the quotient must be rounded to z = 25.
Exercise: a = 3.7528, b = 0.582, c = 2.5; compute (a+b), (a+b)*c and (a+b)/c. Pay
special attention to the significant figures. Ans: a + b = 4.335, (a+b)*c = 11, (a+b)/c
= 1.7
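The addition and subtraction rule in (a) can be checked with Python's `decimal` module, which keeps track of how many decimal places each input was written with. The sketch below is illustrative only: the helper name `add_with_sig_figs` is hypothetical, and it assumes that the values are supplied as strings and that ordinary round-half-up rounding is wanted.

```python
from decimal import Decimal, ROUND_HALF_UP


def add_with_sig_figs(terms):
    """Sum decimal strings and round to the coarsest decimal place among them."""
    values = [Decimal(t) for t in terms]
    # as_tuple().exponent is -1 for "6.8", -2 for "123.79", -3 for "17.321".
    coarsest = max(v.as_tuple().exponent for v in values)
    total = sum(values)
    # Round the sum to the last place that every term actually provides.
    return total.quantize(Decimal(1).scaleb(coarsest), rounding=ROUND_HALF_UP)


# Example (3) of the text: 123.79 - 17.321 + 6.8
print(add_with_sig_figs(["123.79", "-17.321", "6.8"]))
# First part of the exercise: a + b
print(add_with_sig_figs(["3.7528", "0.582"]))
```

Strings are used instead of floats because a float such as 6.8 carries no record of how many digits were actually measured.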
(3) Error analysis
All experiments contain measurement errors and uncertainties. However, with improved lab
design and techniques, errors and uncertainties can be reduced. Measurement errors can be
separated into illegitimate errors, systematic errors and random errors.
1. Illegitimate error
The data point is obviously wrong or lies far outside the expected range. In short, it is
erroneous and unreasonable. Illegitimate errors may be caused by carelessness (e.g. a
writing error) or by other unknown reasons. Repeated measurements can weed out
illegitimate errors. As discussed in the random error section, multiple measurements yield
a mean and a standard deviation. When a data point deviates from the mean by more than
three standard deviations, it should be discarded.
2. Systematic error
There are two main sources of this type of error:
(a) Intrinsic errors in the instruments; for example, zero-point drift, inaccurate
scale divisions, worn-out or aging instruments, and even badly designed instruments.
(b) Environmental factors; for example, variations of the lab temperature causing
rulers to expand or contract.
To reduce systematic errors, proper instrument calibration and good control of the
experimental environment are essential precautions.
3. Random error
Most experiments contain statistical uncertainties, meaning that each measurement
may give a different result; the distribution of the measured values usually follows
a Gaussian distribution; see Figure 1. This ever-present random error cannot be
eliminated. However, through repeated measurements we can quantify this type of error.
Some of the terms often associated with random errors are:
Mean
Measure the same physical quantity N times and denote the resulting data by
$x_1, x_2, x_3, \ldots, x_N$. The mean $\bar{x}$ is $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.
For normally distributed data, the larger the number of measurements, the closer the mean
is to the expected value. In doing measurements, the mean is one of the most sought-after
results.
Deviation
After N measurements have been made and the mean obtained, the difference between each
data point and the mean is called the deviation $d_i$: $d_i = x_i - \bar{x}$.
When $\bar{x}$ approaches the true value, $d_i$ reflects the deviation of a data point from
the true value. The magnitude of the random error in a dataset is quantified by the standard
deviation.
Standard deviation
For a finite number of measurements, the standard deviation σ is defined as
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} d_i^{2}}{N-1}} = \sqrt{\frac{N\sum_{i=1}^{N} x_i^{2} - \left(\sum_{i=1}^{N} x_i\right)^{2}}{N(N-1)}}.$$
In a set of normally distributed data, the standard deviation σ has the following
meaning: 68.3% of the data points lie between $\bar{x} - \sigma$ and $\bar{x} + \sigma$, 95.5%
lie between $\bar{x} - 2\sigma$ and $\bar{x} + 2\sigma$, and 99.7% lie between
$\bar{x} - 3\sigma$ and $\bar{x} + 3\sigma$. In short, only 0.3% of the data appear beyond
$\bar{x} \pm 3\sigma$; see Figure 1. Therefore, if there are plenty of data points, you may
discard those beyond 3σ as illegitimate data.
Figure 1. the normal (Gaussian) distribution
After a physical quantity x has been measured many times, the result can be
written as $x = \bar{x} \pm \sigma$, where $\bar{x}$ is the mean and σ is the standard
deviation. The smaller the standard deviation, the smaller the random error and the higher
the precision.
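The mean, the standard deviation and the three-standard-deviation rejection rule can be sketched with Python's standard library. This is an illustration only: the function names and the `readings` list are made up for the example, and `statistics.stdev` is used because it also divides by N - 1.

```python
import math
import statistics


def sample_std(data):
    """Standard deviation with the N - 1 denominator, in the sum-of-squares form."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def reject_outliers(data, k=3.0):
    """Keep only the points within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mean) <= k * sigma]


# Made-up readings: twenty ordinary values and one suspicious one.
readings = [9.7] * 10 + [9.9] * 10 + [15.0]
kept = reject_outliers(readings)
print("before:", statistics.mean(readings), sample_std(readings))
print("after: ", statistics.mean(kept), sample_std(kept))
```

Note that a single outlier can only exceed three standard deviations when the data set is large enough (more than about ten points), which is one reason the text asks for plenty of data before discarding anything.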
Note: in each series of measurements with the same setup and similar experimental
conditions, the data follow the normal distribution and have the same σ. If one wants to
combine data obtained by different groups, a more involved statistical analysis is
needed; for details, you may want to consult P. R. Bevington & D. K. Robinson: Data
Reduction and Error Analysis for the Physical Sciences, McGraw-Hill Book Co. Inc.,
N.Y., 2003. (3rd Ed.)
Precision and accuracy:
(1) Precision measures how close the data points are to one another; the smaller σ is, the
more closely the data agree; see Figure 2.
(2) Accuracy describes how close the data points are to the true value. The degree of
accuracy is often expressed as a percentage error:
$$\text{percentage error} = \frac{|\text{measured value} - \text{theoretical (or best) value}|}{\text{theoretical (or best) value}} \times 100\%$$
Figure 2. Target-shooting results from two different guns. (a) High accuracy but low precision. (b) Low
accuracy but high precision. Gun b probably has a misaligned front sight; after it is corrected, the bullet
marks will center on the bullseye.
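The percentage error defined above is a one-line computation. In the sketch below the function name and the measured value 9.70 are made up for illustration; the reference value 9.785 m/s² is the accepted surface gravity quoted in the exercises of this manual.

```python
def percentage_error(measured, reference):
    """Percentage error of a measured value relative to a reference value."""
    # abs() on the reference is an added safeguard for negative reference values.
    return abs(measured - reference) / abs(reference) * 100.0


# Hypothetical measurement compared with the accepted value of 9.785 m/s^2.
print(f"{percentage_error(9.70, 9.785):.2f} %")
```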
(4) Error propagation
After acquiring the data, arithmetic or other operations are often needed to derive
the final result. The area of a rectangle (length × width) and the density of a body
(mass divided by volume) are two examples of derived quantities. The error in a
derived quantity is inherited from the basic quantities involved. Here are some of the rules
on how errors propagate in arithmetic operations (addition, subtraction, multiplication,
and division).
For illustration, let us simply take the error to be one half of the smallest division of
the instrument. If a ruler has a smallest division of 0.1 cm, the maximum error
is then 0.05 cm. Using this ruler to measure two wooden sticks, we find that the first has a
length of 16.73±0.05 cm and the second a length of 3.27±0.05 cm. If you glue them
together end to end, what could the resulting length be? Obviously, the maximum length is
16.78 + 3.32 = 20.10 cm while the minimum length is 16.68 + 3.22 = 19.90 cm. Hence the length
of the combined stick can be written as 20.00±0.10 cm. When the lengths of the two sticks are
added, their errors add as well. Subtraction follows the same rule.
In general, if there are several measured values of the same kind of physical quantity,
V1±e1, V2±e2, V3±e3, ..., then
(V1±e1) + (V2±e2) + (V3±e3) + ...
= (V1 + V2 + V3 + ...) ± (e1 + e2 + e3 + ...)
(V1±e1) - (V2±e2) - (V3±e3) - ...
= (V1 - V2 - V3 - ...) ± (e1 + e2 + e3 + ...)
That is to say, after adding (or subtracting) N values, the resulting error is
e = e1 + e2 + e3 + ... + eN.
Therefore, if the error of one data point is substantially larger than the rest, the
resulting error can simply be taken as the error of this sore-thumb data point.
Now, if we have two physical quantities whose measured values are V1±e1
and V2±e2, respectively, their product is
$$(V_1 \pm e_1)(V_2 \pm e_2) = V_1 V_2 \pm (e_1 V_2 + e_2 V_1) \pm e_1 e_2 \approx V_1 V_2\left[1 \pm \left(\frac{e_1}{V_1} + \frac{e_2}{V_2}\right)\right].$$
(If the errors are very small, the last term e1e2 is negligible.)
The same rule also applies to division:
$$\frac{V_1 \pm e_1}{V_2 \pm e_2} = \frac{V_1}{V_2}\left[1 \pm \left(\frac{e_1}{V_1} + \frac{e_2}{V_2}\right) \pm \text{a negligible quantity}\right] \approx \frac{V_1}{V_2}\left[1 \pm \left(\frac{e_1}{V_1} + \frac{e_2}{V_2}\right)\right].$$
From the above discussion, we learn that after N multiplications/divisions the
maximum error is:
$$\frac{e}{V} = \frac{e_1}{V_1} + \frac{e_2}{V_2} + \cdots + \frac{e_N}{V_N}$$
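The maximum-error rules for sums and products can be sketched as two small Python helpers. The function names are hypothetical and each (value, error) pair is assumed to carry a non-negative error; the stick lengths are the ones used in the text, while the rectangle sides are made up for the example.

```python
def add_max_error(pairs):
    """Sum of (value, error) pairs; absolute errors add."""
    total = sum(value for value, _ in pairs)
    error = sum(err for _, err in pairs)
    return total, error


def multiply_max_error(v1, e1, v2, e2):
    """Product of two quantities; relative errors add (the e1*e2 term is dropped)."""
    product = v1 * v2
    relative = e1 / abs(v1) + e2 / abs(v2)
    return product, abs(product) * relative


# The two glued sticks from the text: 16.73 +/- 0.05 cm and 3.27 +/- 0.05 cm.
length, length_err = add_max_error([(16.73, 0.05), (3.27, 0.05)])
print(f"{length:.2f} +/- {length_err:.2f} cm")

# A made-up rectangle, 20.0 +/- 0.1 cm by 10.0 +/- 0.1 cm.
area, area_err = multiply_max_error(20.0, 0.1, 10.0, 0.1)
print(f"{area:.0f} +/- {area_err:.0f} cm^2")
```

For a subtraction, pass the subtracted value with a negative sign and keep its error positive, since the errors still add.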
The maximum deviation (error) is related to $(x_i - \bar{x})$, whereas the standard deviation
σ is related to $(x_i - \bar{x})^2$. Hence standard deviations combine through their squares
σ² in each arithmetic operation rather than adding linearly. In most cases, the so-called
error propagation means the propagation of standard deviations.
Formulae for the propagation of the standard deviation σ:
For two independently measured quantities u and v, the means and standard deviations
are $(\bar{u}, \bar{v})$ and $(\sigma_u, \sigma_v)$, respectively; $u = \bar{u} \pm \sigma_u$
and $v = \bar{v} \pm \sigma_v$. Also, a and b are constants. The following are the formulae
for five types of basic operations.
(a) Addition and subtraction:
When $x = au + bv$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\bar{u} + b\bar{v}$ and $\sigma_x^2 = a^2\sigma_u^2 + b^2\sigma_v^2$.
When $x = au - bv$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\bar{u} - b\bar{v}$ and $\sigma_x^2 = a^2\sigma_u^2 + b^2\sigma_v^2$.
(b) Multiplication and division:
When $x = auv$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\bar{u}\bar{v}$ and $\sigma_x^2/\bar{x}^2 = \sigma_u^2/\bar{u}^2 + \sigma_v^2/\bar{v}^2$.
When $x = au/v$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\bar{u}/\bar{v}$ and $\sigma_x^2/\bar{x}^2 = \sigma_u^2/\bar{u}^2 + \sigma_v^2/\bar{v}^2$.
(c) Powers:
When $x = au^{\pm b}$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\bar{u}^{\pm b}$ and $\sigma_x/\bar{x} = b\sigma_u/\bar{u}$.
(d) Exponents:
When $x = ae^{\pm bu}$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = ae^{\pm b\bar{u}}$ and $\sigma_x/\bar{x} = b\sigma_u$.
(e) Logarithms:
When $x = a\ln(\pm bu)$,
then $x = \bar{x} \pm \sigma_x$, where $\bar{x} = a\ln(\pm b\bar{u})$ and $\sigma_x = a\sigma_u/\bar{u}$.
Here are some examples. Assume u = 5.75 ± 0.20, v = 4.25 ± 0.10, a = 2 and b = 3.
Eg. 1: when $x_1 = au + bv$, then $x_1$ = 24.25 ± 0.50.
Eg. 2: when $x_2 = auv$, then $x_2$ = 48.9 ± 2.1 (with the proper significant figures).
Eg. 3: when $x_3 = au^{b}$, then $x_3$ = 380 ± 40 (with the proper significant figures).
Exercise 2: complete the arithmetic of the above examples.
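As a numerical cross-check of the propagation formulae (a) to (c), the three examples can be evaluated in Python. This is a sketch only; the function names are hypothetical, and the inputs u, v, a and b are the ones given in the examples.

```python
import math


def sigma_linear(a, sigma_u, b, sigma_v):
    """Standard deviation of x = a*u + b*v (or a*u - b*v)."""
    return math.sqrt((a * sigma_u) ** 2 + (b * sigma_v) ** 2)


def sigma_product(x, u, sigma_u, v, sigma_v):
    """Standard deviation of x = a*u*v (or a*u/v), given the mean value x."""
    return abs(x) * math.sqrt((sigma_u / u) ** 2 + (sigma_v / v) ** 2)


def sigma_power(x, b, u, sigma_u):
    """Standard deviation of x = a*u**b, given the mean value x."""
    return abs(x) * abs(b) * sigma_u / abs(u)


u, sigma_u = 5.75, 0.20
v, sigma_v = 4.25, 0.10
a, b = 2, 3

x1 = a * u + b * v
x2 = a * u * v
x3 = a * u ** b
print(x1, sigma_linear(a, sigma_u, b, sigma_v))
print(x2, sigma_product(x2, u, sigma_u, v, sigma_v))
print(x3, sigma_power(x3, b, u, sigma_u))
```

The printed values still have to be rounded by hand to the proper number of significant figures, as in the examples.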
(5) Linear regression
Linear regression is an extremely important and useful tool for processing experimental data.
After completing an experiment, we may have a large set of data on hand: (x_n, y_n),
n = 1, ..., N.
If we want to find a function that best describes these data, linear regression is usually
the first tool to reach for. The following material may be more involved than most of you are
used to. However, you only need to understand the essence of the method and how to find the
best-fit function. You can enlist the help of Excel, which has built-in functions that you
can call and use.
Figure 3. The difference between the measured and the fitted.
In doing linear regression, we need a quantity called χ² (read "chi-square", pronounced
"kai square") to tell us how good the fit is. If the fitted function is Y = Y(x), then χ² is
defined by Eq. (1).
For Y = Y(x), every x = x_n has a corresponding value Y_n = Y(x_n); see Figure 3. The measured
value y_n and the function value differ by
$$\Delta y_n = y_n - Y_n .$$
χ² is defined as
$$\chi^2 = \sum_{n=1}^{N} (\Delta y_n)^2 = \sum_{n=1}^{N} (y_n - Y_n)^2 \qquad (1)$$
χ² is the sum of the squared differences between the data points and the values of the fitted
function. The best-fit function gives the smallest χ². Hence the best fit is also called the
least-squares fit.
We will use a linear function to illustrate how to find the best fit to the
experimental data. Assume the data are (x_n, y_n), n = 1, ..., N.
The fitting function is Y(x) = ax + b, where a and b are unknowns to be determined from the
experimental data. The procedure is as follows:
For every x = x_n, the corresponding theoretical value is
$$Y_n = a x_n + b \qquad (2)$$
Plugging (2) into (1) gives
$$\chi^2 = \sum_{n=1}^{N} (y_n - a x_n - b)^2 \qquad (3)$$
Now the problem becomes finding the a and b that minimize χ²; that is, it is a minimization
problem. When χ² is at a minimum, its partial derivatives with respect to a and b must each
be zero:
$$\frac{\partial \chi^2}{\partial a} = 2\sum_{n=1}^{N} \left[(y_n - a x_n - b)(-x_n)\right] = 0 \qquad (4)$$
$$\frac{\partial \chi^2}{\partial b} = 2\sum_{n=1}^{N} \left[(y_n - a x_n - b)(-1)\right] = 0 \qquad (5)$$
After rearranging, they become
$$-\sum_{n=1}^{N} y_n x_n + a\sum_{n=1}^{N} x_n^2 + b\sum_{n=1}^{N} x_n = 0 \qquad (6)$$
$$-\sum_{n=1}^{N} y_n + a\sum_{n=1}^{N} x_n + b\sum_{n=1}^{N} (1) = 0 \qquad (7)$$
Here we define
$$S = \sum_{n=1}^{N} (1) = N,\quad S_x = \sum_{n=1}^{N} x_n,\quad S_y = \sum_{n=1}^{N} y_n,\quad S_{xx} = \sum_{n=1}^{N} x_n^2,\quad S_{xy} = \sum_{n=1}^{N} x_n y_n \qquad (8)$$
and plug them into Eqs. (6) and (7) to obtain
$$a S_{xx} + b S_x = S_{xy}, \qquad a S_x + b S = S_y .$$
Solving these simultaneous equations gives
$$a = (S_{xy} S - S_x S_y)/D, \qquad b = (S_{xx} S_y - S_{xy} S_x)/D \qquad (9)$$
where $D = S S_{xx} - S_x^2$. Plug the experimental data into (8) to compute S, S_x, S_y, S_xx
and S_xy, and put them into (9) to solve for a and b. Finally, the best-fit function is
Y(x) = ax + b.
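Equations (8) and (9) translate almost line by line into code. The following Python sketch is only an illustration: the function name `linear_fit` and the sample data are made up, and the variable names mirror the sums S, S_x, S_y, S_xx and S_xy.

```python
def linear_fit(xs, ys):
    """Least-squares straight line Y = a*x + b from the sums of Eq. (8) and Eq. (9)."""
    S = len(xs)
    Sx = sum(xs)
    Sy = sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    D = S * Sxx - Sx ** 2
    if D == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (Sxy * S - Sx * Sy) / D
    b = (Sxx * Sy - Sxy * Sx) / D
    return a, b


# Made-up data lying exactly on y = 2x + 1.
a, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The check on D guards against the one case in which Eq. (9) cannot be evaluated, namely when every data point has the same x.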
How well does the fitting function Y(x) = ax + b describe the experimental data (x_n, y_n)? We
need the correlation coefficient R² (strictly speaking, the square of the correlation
coefficient, also called the coefficient of determination) to quantify it:
$$R^2 = \frac{\left(N\sum_{n=1}^{N} x_n y_n - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n\right)^2}{\left[N\sum_{n=1}^{N} x_n^2 - \left(\sum_{n=1}^{N} x_n\right)^2\right]\left[N\sum_{n=1}^{N} y_n^2 - \left(\sum_{n=1}^{N} y_n\right)^2\right]} = \frac{\sum_{n=1}^{N} (Y_n - \bar{y})^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2} = 1 - \frac{\sum_{n=1}^{N} (Y_n - y_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2}$$
where $\bar{y} = \frac{1}{N}\sum_{n=1}^{N} y_n$ and $Y_n = a x_n + b$.
From the above definition, when the measured values y_n approach the fitted values Y_n, R²
approaches unity. In short, the better the fit, the higher the correlation of the linear
function Y(x) = ax + b with the experimental data.
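The equality of the first and last forms of R² can be verified numerically. The Python sketch below is illustrative only: the function names and the four data points are made up, and the values a = 1.94 and b = 1.09 are the least-squares slope and intercept for exactly these four points (the two forms agree only when a and b come from the least-squares fit).

```python
def r_squared_from_sums(xs, ys):
    """R^2 from the raw sums (first form of the definition)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def r_squared_from_fit(xs, ys, a, b):
    """R^2 as 1 - (residual sum of squares) / (total sum of squares)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up data scattered around a straight line.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]
print(r_squared_from_sums(xs, ys))
print(r_squared_from_fit(xs, ys, a=1.94, b=1.09))
```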
As you know, every data point contains a measurement error. How do we modify the linear
regression method to account for this fact? Assume the data are (x_n, y_n ± σ_n);
then we redefine χ² as
$$\chi^2 = \sum_{n=1}^{N} \left(\frac{\Delta y_n}{\sigma_n}\right)^2 = \sum_{n=1}^{N} \left[\frac{y_n - a x_n - b}{\sigma_n}\right]^2 \qquad (10)$$
Why do we divide Δy_n for every data point by σ_n? We want to reduce the contribution to χ²
of the data points with larger errors. In this way, we also say that these points are less
important.
Repeating the above procedure, we obtain
$$S_x = \sum_{n=1}^{N} \frac{x_n}{\sigma_n^2},\quad S_y = \sum_{n=1}^{N} \frac{y_n}{\sigma_n^2},\quad S = \sum_{n=1}^{N} \frac{1}{\sigma_n^2},\quad S_{xx} = \sum_{n=1}^{N} \frac{x_n^2}{\sigma_n^2},\quad S_{xy} = \sum_{n=1}^{N} \frac{x_n y_n}{\sigma_n^2} \qquad (11)$$
$$a = (S_{xy} S - S_x S_y)/D, \qquad b = (S_{xx} S_y - S_{xy} S_x)/D, \qquad D = S S_{xx} - S_x^2 \qquad (12)$$
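Equations (11) and (12) differ from the unweighted case only in that every term of every sum is divided by σ_n². A minimal Python sketch follows, assuming each σ_n is positive; the function name `weighted_linear_fit` and the sample numbers are made up for illustration.

```python
def weighted_linear_fit(xs, ys, sigmas):
    """Straight-line fit Y = a*x + b with each point weighted by 1/sigma_n**2."""
    weights = [1.0 / s ** 2 for s in sigmas]
    S = sum(weights)
    Sx = sum(w * x for w, x in zip(weights, xs))
    Sy = sum(w * y for w, y in zip(weights, ys))
    Sxx = sum(w * x * x for w, x in zip(weights, xs))
    Sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    D = S * Sxx - Sx ** 2
    a = (Sxy * S - Sx * Sy) / D
    b = (Sxx * Sy - Sxy * Sx) / D
    return a, b


# Made-up data: the last point is far off the line but has a large uncertainty.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 9.0]
print(weighted_linear_fit(xs, ys, [0.1, 0.1, 0.1, 0.1]))   # equal weights
print(weighted_linear_fit(xs, ys, [0.1, 0.1, 0.1, 5.0]))   # last point down-weighted
```

With equal σ_n the weights cancel and the result is the same as the unweighted fit; with a large σ on the last point the fitted line stays close to the first three points.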
With this, we conclude the introduction to the linear regression method.
If you want to fit functions other than a straight line, the least-squares method is still
applicable, although the procedure is more involved; for details, please consult any
book on numerical analysis methods (for example: P. R. Bevington & D. K. Robinson: Data
Reduction and Error Analysis for the Physical Sciences, McGraw-Hill Book Co. Inc., N.Y., 2003.
(3rd Ed.)).
Exercises:
EXCEL has built-in tools and functions that you can call to perform data processing
and error analysis. Examples are AVERAGE for finding the average, STDEV for computing the
standard deviation, the scatter plot in the charting tools for plotting the distribution of
(x_n, y_n), and the "add trendline" capability for finding the fitting function and examining
the goodness of the fit from the R² value. Moreover, you can also add error bars based on the
standard deviation to each data point in a plot. For details, you can refer to
https://www.clemson.edu/ces/phoenix/tutorials/excel/regression.html,
https://msu.edu/course/psy/403/StatDemos/Regression/Regression.htm, or search Google for
"linear regression using excel." Here are four exercises to get your hands wet.
(1) You order sixty 180 cm by 90 cm lab tables from a manufacturer. After the tables are
delivered, you check their sizes and find that their length and width are 180.45 ±
1.35 cm and 89.75 ± 1.05 cm, respectively. Find the perimeter and the area of a table.
(2) Table 1 contains fifty surface gravity values measured at Tainan. (a) Use EXCEL to
compute the average and the standard deviation. (b) Use the "bar graph" in EXCEL to plot
the data. Do you recognize any unreasonable data points? (c) Take the data that lie in the
range 7.0-13.0, sort them into bins of width 0.1 and find the number of data points in each
bin (e.g. there are 3 data points in the 9.5-9.6 bin). Are the data normally distributed?
Where do the possible illegitimate data lie? (d) Use the results from (a)-(c) to weed out the
illegitimate data and use EXCEL to re-compute the average and standard deviation. (e) If the
accepted value for the surface gravity at Tainan is 9.785 m/s², what is the percentage error?
TABLE 1. The surface gravity measured at Tainan (in units of m/s²)
 9.135   9.252   9.479   9.356   8.978   9.021   8.533   8.853   9.811   9.288
 9.511   8.663   9.221   9.479   9.173   9.149   9.389   7.135   9.263   9.663
 9.767   8.885   9.528   9.053   8.623   9.451   9.151   9.246   9.937   9.589
 9.867   9.456   9.312   9.384   9.238   8.451   8.326   9.233   9.043   9.434
 9.276  12.383   9.378   8.747  10.023   8.923   9.334   9.078   9.115   9.215
(3) TABLE 2 contains ten sets of velocity data taken at five different times. (a) Use EXCEL to
compute the average velocity and the standard deviation at each of these times. (b) Use EXCEL
to plot the average velocity versus time and add the trend line. Fit the data with a function
v(t) = a t + b and find a, b and the correlation coefficient R². (c) Plot the error bar for
each average velocity using the standard deviation.
TABLE 2. Velocity versus time (t in s; v1 to v10 in m/s)
t       v1      v2      v3      v4      v5      v6      v7      v8      v9      v10
0.072   0.392   0.318   0.387   0.445   0.269   0.256   0.423   0.371   0.303   0.273
0.145   0.971   1.182   1.123   1.027   1.141   1.056   1.092   1.004   1.128   1.035
0.217   1.847   1.652   1.804   1.712   1.789   1.721   1.826   1.688   1.686   1.806
0.290   2.548   2.332   2.513   2.399   2.501   2.417   2.495   2.435   2.533   2.424
0.362   3.298   3.082   3.277   3.12    3.255   3.134   3.226   3.165   3.172   3.18
(4) A camera records the free fall of a marble released from rest at height y(0); the
trajectory of the marble is $y(t) = y(0) - \frac{1}{2} g t^2$. In TABLE 3, you will find the
positions of the marble deduced from six image frames. The interval between adjacent frames is
Δt = 0.0724 s, though the exact recording time t1 of the first frame is unknown. You are asked
to find the gravitational acceleration g and the initial height y(0). The procedure is as
follows:
(a) Let t = t1 + t' and use EXCEL to plot the position versus the time t'. Add the trend
line and determine the fitting function $y(t') = a t'^2 + b t' + c$ and the correlation
coefficient R².
(b) Plug t = t1 + t' into the trajectory $y(t) = y(0) - \frac{1}{2} g t^2$ to obtain another
function y(t'). Compare it with the fitting function $y(t') = a t'^2 + b t' + c$ from part (a);
then find a, b, c, the gravitational acceleration g, t1 and the initial height y(0).
TABLE 3. Height versus time for a free-falling marble
Frame no.   t (s)         t' (s)    y(t), y(t') (m)
1           t1            0         -0.0130
2           t1+0.0724     0.0724    -0.0380
3           t1+0.1448     0.1448    -0.1160
4           t1+0.2172     0.2172    -0.2430
5           t1+0.2897     0.2897    -0.4190
6           t1+0.3621     0.3621    -0.6500
Note: Excel can also be used for simulated experiments; the related information can be found
at http://gplab.phys.ncku.edu.tw/course/first/1/.
Part B: References
(1) P. R. Bevington & D. K. Robinson: Data Reduction and Error Analysis for the Physical
Sciences, McGraw-Hill Book Co. Inc., N.Y., 2003. (3rd Ed.)
(2) William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical
Recipes: The Art of Scientific Computing, Cambridge University Press, 2007. (3rd Ed.)
Self-evaluation check list: Data reduction and error analysis
After completing the experiment, use this check list to evaluate your grasp of the experiment.
If you mark any item as "Completely lost" or "Vague", please re-examine the
experimental procedure, re-read the lab manual, or consult your TA or instructor to improve
your understanding of the content.
For each item, mark one of: Fully understood / Mostly understood / Vague / Completely lost
Item
1. Know how to recognize illegitimate data
2. Know how to determine significant figures
3. Know how to use EXCEL to find the average and the standard deviation
4. Know how to use EXCEL to perform linear regression
5. Know how errors propagate in computations
6. Know the difference between precision and accuracy
Editor: Department of Physics,
National Cheng Kung University
Revision date: 2021/08
By Yi Yang