CHAPTER 6
DATA AND STATISTICAL ANALYSIS
Extrinsic m-scripts and functions

stat( )
  Basic statistical calculations.
  Input: n×2 array of frequencies and values of the measurements
  Outputs: number of measurements, mean, mode, median, population & sample standard deviations, standard error of the mean

gauss( )
  Gaussian (normal) probability distribution function: plot and probabilities.
  Inputs: mean, standard deviation, lower & upper limits
  Outputs: probability of a measurement lying between the lower and upper limits, full width at half maximum

chi2test( )
  Chi-squared distribution: plot and probability of χ² > χ²max.
  Inputs: degrees of freedom and maximum value of chi-squared
  Output: probability of χ² > χ²max

linear_fit( )
  Fitting a linear, power or exponential equation to a set of measurements.
  Inputs: x, y data, range of x values for the plot and equation type
  Outputs: values of the equation coefficients & their uncertainties, correlation coefficient

weighted_fit, fit_function( ), part_der( )
  Fitting an equation to a set of data that has uncertainties associated with the y values. The m-script must be modified for each fit. The data and fitted function are plotted, the equation coefficients with uncertainties are given, and a chi-squared test is performed to give an estimate of the goodness of the fit.
Measurement is an essential part of any science and statistics is an indispensable tool
used to analyze data. A statistical treatment of the data allows the uncertainties
inherent in all measured quantities to be considered and allows one to draw justifiable
conclusions from the data.
The data to be analyzed can be assigned to a matrix in the Command Window or
through the Workspace and Array Editor Windows. For example, to enter data
directly into the Array Editor just like you would enter data into a spreadsheet, follow
the steps:
a04/mat/mg/ch6.doc
3/9/2016
9:23 PM
1

- Create the array, for example, StatData in the Command Window
      StatData = []
- Open the Workspace Window and then the Array Editor Window for the matrix StatData.
- Enter the size of the matrix and then enter the data into the cells, just as you would do in a spreadsheet.
Data can also be transferred to and from MS EXCEL.
1 DATA TRANSFER TO MATLAB FROM MS EXCEL
Select and copy the data to the clipboard within the MS EXCEL Worksheet Window.
Go to the Matlab Command Window:
Edit → Paste Special → Import Wizard → Next → Finish
The data from MS EXCEL has now been transferred to the array called clipboarddata.
Transfer the data to a new array, for example,
StatData = clipboarddata
Type whos in the Command Window to review the properties of the clipboarddata and StatData arrays
  Name             Size    Bytes  Class
  clipboarddata    6x2     96     double array
  StatData         6x2     96     double array

Grand total is 24 elements using 192 bytes
The data can be saved as a file using the save command, for example,
save test_data StatData
To recall the data, use the load command
load test_data or load test_data StatData
2 DATA TRANSFER TO MS EXCEL FROM MATLAB
Numbers can be transferred from Matlab into MS EXCEL through the Workspace
Window and Array Editor.
Consider the row vector
Xdata = [1 2 3 4 5 6 7 8 9 pi]
View the Array Editor Window for Xdata:
Edit → Select All → Edit → Copy

In the MS EXCEL worksheet window, with the cursor at the insertion point:

Edit → Paste
The data will be pasted into 10 adjacent columns. The 10th column will contain the number π = 3.1416. Only five significant figures have been pasted into MS EXCEL.
The number of significant figures can be changed in the Array Editor Window by selecting the Numeric format:

  longG → pi = 3.14159265358979
  longE → pi = 3.14159265358979E+00
Then the extra significant figures can be pasted into MS EXCEL.
It may be more useful to place the numbers into rows rather than columns in MS EXCEL. You can do this by finding the transpose of the row vector to give a column vector

Xdatat = Xdata'

then copying and pasting it into MS EXCEL, which places one number in each row of a single column.
Two dimensional arrays can be copied and pasted into MS EXCEL in exactly the
same way. For example, in Matlab
data2 = [1 2 3; 4 5 6; 7 8 pi ; exp(1) rand rand]
gives

    1         2         3
    4         5         6
    7         8         3.1416
    2.7183    0.95013   0.23114

in MS EXCEL. (The last two entries come from rand and will differ from run to run.)
3 MEASUREMENT AND BASIC STATISTICS
In this section, an introduction to the basic ideas used in analyzing experimental data
and assessing the precision of the results is discussed. A measurement is the result of
some process of observation or experiment. The aim of the measurement is to
estimate the ‘true’ value of some physical quantity. However, we can never know the
‘true’ value and so there is always some uncertainty that is associated with the
measurement (except for simple counting processes). The sources of uncertainty in a
measurement are often classified as random or systematic. Systematic uncertainties
are those that are inherent in the measuring system and can’t be detected by taking
many repeated measurements. A measurement that is affected by systematic uncertainties is said to be inaccurate. When repeated measurements are made on a
physical quantity, there are usually some fluctuations in the results. These fluctuations
give rise to the random uncertainties. The term precision refers to the size of the
random fluctuations. An accurate measurement is one in which the systematic
uncertainties are small and a precise measurement is one in which the random
uncertainties are small.
The statistical treatment of measurement is related to the fluctuations associated with
the random uncertainties. If the set of measurements of a physical quantity that
correspond to the data is only organized into a table, then one can’t see clearly the
essential features of the data just by inspecting the numbers. The data can be better
viewed when it is displayed in a bar graph or histogram. A histogram is drawn by
dividing the original measurements into intervals of pre-determined magnitude and
counting the number of observations found within each interval. If the number of
readings is very high, so that a fine sub-division of the scale of values can be made,
the histogram approaches a continuous curve called a distribution curve.
Consider an experiment carried out to measure the speed of sound in air. One hundred results were obtained as shown in the table

  Speed v (m.s-1)   328.2  328.3  328.4  328.5  328.6  328.7  328.8
  Frequency f           2      3     37     40     12      4      2
The data can be entered into a 7×2 matrix called speed_data, with column 1 for the frequencies and column 2 for the speeds.
The results can be displayed in a histogram using the bar command as shown in
figure 6.1 using the extrinsic function stat( ).
Fig. 6.1. Histogram plot of the speed data using the function stat ("Speed of Sound in Air": frequency f vs speed v (m.s-1)).
There are several important statistical quantities used to describe the frequency distribution for a set of measurements xi with frequencies fi. The most important is the mean (or average) x̄, which is simply the sum of all the measurements xi divided by the number of measurements n. For a symmetric distribution, the mean gives the most probable measurement, or the best estimate of the ‘true’ value. The measurement that corresponds to the peak in the distribution curve is known as the mode. If some measurements are abnormally low or high, they will distort the mean, and the median may then be a better estimate of the ‘true’ value. If there are n measurements, the median is the middle measurement, so that as many readings fall below it in value as above it. It is located on the distribution curve by the value that divides the area into two equal portions.
Number of measurements    n = Σ_i f_i        (6.1)

Mean                      x̄ = (1/n) Σ_i f_i x_i        (6.2)
The mean, mode and median give information about the centre of the frequency
distribution, but not about the spread of the readings or the width of the distribution.
The most widely used measures of the scatter for the measurements about the mean
are the population standard deviation sn, sample or experimental standard
deviation sn-1 and the standard error of the mean E.
Population standard deviation    s_n = √[ Σ_i (x_i − x̄)² / n ]        (6.3)

Sample standard deviation        s_n−1 = √[ Σ_i (x_i − x̄)² / (n − 1) ]        (6.4)
Standard error of the mean       E = s_n / √(n − 1) = s_n−1 / √n        (6.5)
The variance is also a useful quantity and is equal to the square of the standard
deviation. If the standard deviation has been determined from a sample of a great
many readings, it gives a measure of how far individual readings are likely to be from
the mean value. If the measurements are subjected to small random fluctuations then
the distribution is called the normal or Gaussian distribution.
The function stat can be used for various statistical calculations. The data to be
analyzed is passed to the function stat as an n2 matrix, where n is the number of
measurement intervals and the two columns are for the frequencies and values. The
function plots a histogram and calculates some of the more commonly used statistical
quantities used to describe the frequency distribution. The function stat can be easily
edited to remove or add quantities calculated. The function stat is described by
[n, xbar, mode, median, s_pop, s_sample, E] = stat(StatData)

Input:  StatData — n×2 matrix of the measurements (column 1: frequency, column 2: value)
Outputs: number of measurements n, mean xbar, mode, median, population standard deviation s_pop, sample standard deviation s_sample, and standard error of the mean E
The results for the speed of sound data using the function stat are
[n, xbar, mode, median, s_pop, s_sample, E] = stat(speed_data)

n = 100
xbar = 328.4770
mode = 328.5000
median = 328.5000
s_pop = 0.1038
s_sample = 0.1043
E = 0.0104
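The same quantities can be checked outside Matlab. The short NumPy sketch below (an illustrative Python version, not the stat m-script itself) recomputes the mean and the measures of spread for the speed-of-sound table directly from equations (6.2)-(6.5):

```python
import numpy as np

# Frequency table for the speed of sound data: values and frequencies
v = np.array([328.2, 328.3, 328.4, 328.5, 328.6, 328.7, 328.8])
f = np.array([2, 3, 37, 40, 12, 4, 2])

n = f.sum()                          # number of measurements, eq. (6.1)
xbar = (f * v).sum() / n             # frequency-weighted mean, eq. (6.2)
ss = (f * (v - xbar)**2).sum()       # sum of squared deviations
s_pop = np.sqrt(ss / n)              # population standard deviation, eq. (6.3)
s_sample = np.sqrt(ss / (n - 1))     # sample standard deviation, eq. (6.4)
E = s_sample / np.sqrt(n)            # standard error of the mean, eq. (6.5)

print(n, round(xbar, 4), round(s_pop, 4), round(s_sample, 4), round(E, 4))
```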
4 PROBABILITY DISTRIBUTIONS
Statistics only considers the random nature of measurement. Random processes can be discrete (throwing a single die) or continuous (height of a population) and are
described by a probability density function P that gives the expected frequency of
occurrence for each possible outcome. For a random process involving the single
variable x, the probability density function P(x) is
Discrete: probability of x_i occurring = P(x_i)        (6.6)

Continuous: probability of x occurring between x1 and x2:

P(x1 ≤ x ≤ x2) = ∫_{x1}^{x2} P(x) dx        (6.7)
The probability of an occurrence of any event must be one and so the probability
density function is normalized
Discrete: probability of any x_i occurring = Σ_i P(x_i) = 1        (6.8)

Continuous: probability of any x = ∫_{xmin}^{xmax} P(x) dx = 1        (6.9)
The expectation value of a function is defined to be
E[f(x)] = ∫ f(x) P(x) dx        (6.10)
A probability distribution is often characterized by its first two moments, which are determined as expectation values. The first moment about zero is called the expectation value of x and is simply the mean or average value of x

E[x] = μ = ∫ x P(x) dx        (6.11)
This expectation value μ refers to the theoretical mean of the distribution and needs to be distinguished from the mean of a sample, x̄.
The second moment about the mean (the second central moment) is called the variance σ², and again this is different from the sample variance s²

E[(x − μ)²] = σ² = ∫ (x − μ)² P(x) dx        (6.12)
The square root of the variance is the standard deviation σ, which measures the dispersion or width of the probability distribution.
A process can be characterized by several random variables x, y, z, … and is described
by a multivariate distribution P(x, y, z, …). We can define the covariance cov of the distribution as a measure of the linear correlation between any two of the variables; for example, the covariance between x and y is

cov(x, y) = ∫∫ (x − μ_x)(y − μ_y) P(x, y) dx dy        (6.13)
Similar relations exist for cov(x, z) and cov(y, z).
This is often expressed as the correlation coefficient r_xy between two variables x and y with standard deviations σ_x and σ_y

r_xy = cov(x, y) / (σ_x σ_y)        (6.14)
The correlation coefficient r_xy varies between −1 and +1. If |r_xy| = 1 then the two variables x and y are perfectly linearly correlated. If r_xy = 0 then x and y are linearly independent, but we cannot say that they are completely independent; for example, if y = x² then r_xy = 0 even though x and y are not independent.
4.1 Gaussian or Normal Distribution
The Gaussian or normal distribution plays a central role in the statistics associated with all the sciences and engineering. The Gaussian distribution often provides a good approximation to the distribution of the random uncertainties associated with most measurements. The Gaussian probability density function P(x) is continuous and depends upon the theoretical mean μ and theoretical variance σ²

P(x) = [ 1 / (σ √(2π)) ] exp[ −(x − μ)² / (2σ²) ]        (6.15)
The standard deviation σ is a measure of the width of the distribution, and from the area under the Gaussian curve

Probability[(μ − σ) ≤ x ≤ (μ + σ)] = 0.68
Probability[(μ − 2σ) ≤ x ≤ (μ + 2σ)] = 0.95
Probability[(μ − 3σ) ≤ x ≤ (μ + 3σ)] = 0.997

Hence, for a Gaussian distribution, 68% of the measurements will be within ±σ of the mean, 95% will be within ±2σ and 99.7% will be within ±3σ of the mean value.
These values should be kept in mind when interpreting a measurement. For a set of measurements of a quantity with normally distributed uncertainties, 68% of the measurements should fall in the range x̄ ± s, where x̄ and s are the sample mean and sample standard deviation respectively.
If the measurement was quoted as x̄ ± E, where E is the standard error of the mean,
then this is interpreted as there is a 68% chance that the “true value” falls in this range
or if a set of mean values were calculated, 68% of those values would fall in this
range. That is, the standard error of the mean, E enables one to assess how closely the
sample mean is likely to be to the mean of the population. Since the standard error of
the mean depends upon the number of measurements, it should always be quoted with
the number of measurements.
In many applications, a measure of the width of the distribution is the full width at half maximum, FWHM, which is related to the standard deviation by

FWHM = 2σ √(2 ln 2) ≈ 2.35 σ        (6.16)
The extrinsic function gauss can be used to display the Gaussian probability density
and calculate the FWHM and probability of x being in the range from x1 to x2 by
finding the area under the curve using the Matlab command trapz(x,y). The function
is described by
[area, prob, FWHM, xFWHM1, xFWHM2] = gauss(mu, sigma, x1, x2)

Inputs:  mean mu, standard deviation sigma, and the limits x1 and x2 for the probability calculation
Outputs: total area under the curve (%), probability (%) of x lying between x1 and x2, full width at half maximum and the corresponding x values
For example, gauss(100, 10, 90, 110) gives
mean, mu = 100
standard deviation, sigma = 10
area (%) = 100.0000
prob (%) = 68.2689
FWHM = 23.5482
xFWHM1 = 88.2259
xFWHM2 = 111.7741
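The probability and FWHM returned by gauss can be cross-checked numerically. The sketch below (an illustrative Python version, not the m-script) integrates the Gaussian density with the trapezoidal rule, mirroring the use of trapz(x,y) described above:

```python
import numpy as np

def gauss_prob(mu, sigma, x1, x2, npts=10001):
    """Probability that x lies in [x1, x2] for a Gaussian(mu, sigma),
    found from the area under the density curve (trapezoidal rule)."""
    x = np.linspace(x1, x2, npts)
    P = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return np.sum((P[1:] + P[:-1]) * np.diff(x) / 2)   # trapezoid areas

mu, sigma = 100, 10
prob = 100 * gauss_prob(mu, sigma, 90, 110)   # probability in percent
FWHM = 2 * sigma * np.sqrt(2 * np.log(2))     # eq. (6.16)
print(round(prob, 4), round(FWHM, 4))         # close to 68.2689 and 23.5482
```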
Fig 6.2. Gaussian distribution using the function gauss (μ = 100, σ = 10, x1 = 90, x2 = 110; Prob(x1 < x < x2) = 0.6827, FWHM = 23.5482).

4.2 The Chi-Squared Distribution
The chi-squared distribution χ² (the symbol χ² is a single entity, not a quantity χ squared) is very useful for testing the goodness of fit of a theoretical equation to a set of measurements. For a set of n independent random variables x_i that have Gaussian distributions with theoretical means μ_i and standard deviations σ_i, the chi-squared statistic χ² is defined as

χ² = Σ_{i=1}^{n} [ (x_i − μ_i) / σ_i ]²        (6.17)
χ² is itself a random variable, because it depends upon the random variables x_i and σ_i, and it follows the distribution

P(χ²) = [ 1 / ( 2^(ν/2) Γ(ν/2) ) ] (χ²)^(ν/2 − 1) exp(−χ²/2)        (6.18)
where ( ) is the gamma function and  is the number of degrees of freedom and is
the sole parameter related to the number of independent variables in the sum used to
describe the distribution. The mean of the distribution is  =  and the variance is  =
2. This distribution can be used to test a hypothesis that a theoretical equation fits a
set of measurements. If an improbable chi-squared value is obtained, one must
question the validity of the fitted equation. The chi-squared characterizes the
fluctuations in the measurements x_i, and so on average each term [(x_i − μ_i)/σ_i]² should be about one. Therefore, one can define the reduced chi-squared value χ²_r as

χ²_r = χ² / ν        (6.19)

and for a good fit between theory and measurement χ²_r ≈ 1.
The extrinsic function chi2test can be used to display the distribution for a given number of degrees of freedom ν and give the probability of a chi-squared value exceeding a given value.
prob = chi2test(dof, chi2Max)

Inputs:  degrees of freedom ν (dof) and the value of χ² to test (chi2Max)
Output:  probability that χ² > χ²max
For example, chi2test(6, 12) gives prob = 6.2 %. This would imply that the hypothesis should be rejected, because there is only a relatively small probability that a value of χ² as large as 12 with ν = 6 degrees of freedom would occur by chance.
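This tail probability can be checked by hand: for an even number of degrees of freedom, the integral of the distribution (6.18) above χ²max reduces to a finite sum, exp(−x/2) Σ_{k=0}^{ν/2−1} (x/2)^k / k!. A Python sketch of that closed form (an illustration, not the chi2test m-script):

```python
import math

def chi2_tail_even(chi2_max, dof):
    """P(chi^2 > chi2_max) for an even number of degrees of freedom,
    using the closed-form finite sum for the tail integral."""
    if dof % 2 != 0:
        raise ValueError("this closed form needs an even dof")
    half = chi2_max / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k)
                                 for k in range(dof // 2))

prob = 100 * chi2_tail_even(12, 6)   # percent
print(round(prob, 2))                # close to the 6.2 % quoted above
```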
Fig 6.3. Chi-squared distribution using the function chi2test (dof = 6, chi2Max = 12, prob = 6.17 %).
5 GRAPHICAL ANALYSIS OF EXPERIMENTAL DATA
Individual judgment can be used to draw an approximating curve to fit a set of
experimental data. However, a much better approach is to perform a statistical
analysis on the experimental data to find a mathematical equation that ‘best fits the
data’. There are a number of alternative methods that can be used to fit a theoretical
curve to a set of measurements. However, no one method is able to fit all functions to
the experimental data.
5.1 Curve Fitting: Method of Least Squares (no uncertainties in data)
To avoid individual judgments in approximating the curves to fit a set of data in which any uncertainties are ignored, it is necessary to agree on a definition of ‘best fit’. One way to do this is to require that, of all the curves approximating a given set of experimental data, the curve of best fit has the property that

Σ_i (y_i − f_i)²  is a minimum        (6.20)
where (yi – fi) is the deviation between the value of the measurement (xi, yi) and the
fitted value fi = f(xi). This approach of finding the curve of best fit is known as the
method of least squares or regression analysis.
A straight line fit is the simplest and most common curve fitted to a set of
measurements. The equation of a straight line is
f(x) = y = m x + b
(6.21)
where the constants m and b are the slope or gradient of the straight line and the
intercept (value of y when x = 0) respectively. If a straight line fits the data, we say that there is a linear relationship between the measurements x and y, and if the intercept b = 0 then y is said to be proportional to x: y ∝ x, or y = m x, where the slope m corresponds to the constant of proportionality.
Using the method of least squares for a set of n measurements (x_i, y_i), estimates of the slope m, intercept b and the uncertainties in the slope E_m and intercept E_b for the line of best fit are

slope        m = [ n Σ_i x_i y_i − Σ_i x_i Σ_i y_i ] / [ n Σ_i x_i² − (Σ_i x_i)² ]        (6.22)

intercept    b = (1/n) ( Σ_i y_i − m Σ_i x_i )        (6.23)
12
2
1
1



i yi  n  i yi   m  i xi yi  n i xi i yi 
 n  2
2
s
n
standard error in slope
Em  s
standard error in intercept


  xi 
 i 
Eb  s
2


2
n  xi    xi 
i
 i 


n  xi 2    xi 
i
 i 
2
(6.24)
(6.25)
2
(6.26)
correlation coefficient

r = [ Σ_i x_i y_i − (1/n) Σ_i x_i Σ_i y_i ] / √{ [ Σ_i x_i² − (1/n)(Σ_i x_i)² ] [ Σ_i y_i² − (1/n)(Σ_i y_i)² ] }        (6.27)
The correlation coefficient r is a measure of how well the line of best fit describes the data. The value of |r| varies between zero and one. If r = 0 there is no linear correlation between the measurements x and y, and if |r| = 1 the linear correlation is perfect. The standard errors of the slope and intercept give an indication of the accuracy of the regression. Simply quoting the values of the slope m and intercept b is not very useful; it is always best to give measures of the ‘goodness of the fit’: the correlation coefficient r and the uncertainties in the slope E_m and intercept E_b.
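Equations (6.22)-(6.27) are straightforward to code directly. The following NumPy sketch (an illustrative Python version, not the linear_fit m-script) applies them to the loaded-spring data of Example 6.1 below:

```python
import numpy as np

def linfit(x, y):
    """Least-squares straight line y = m x + b with standard errors and r,
    following eqs (6.22)-(6.27)."""
    n = len(x)
    Sx, Sy = x.sum(), y.sum()
    Sxx, Syy, Sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()
    D = n * Sxx - Sx**2
    m = (n * Sxy - Sx * Sy) / D                                   # eq. (6.22)
    b = (Sy - m * Sx) / n                                         # eq. (6.23)
    s2 = (Syy - Sy**2 / n - m * (Sxy - Sx * Sy / n)) / (n - 2)    # eq. (6.24)
    Em = np.sqrt(s2 * n / D)                                      # eq. (6.25)
    Eb = np.sqrt(s2 * Sxx / D)                                    # eq. (6.26)
    r = (Sxy - Sx * Sy / n) / np.sqrt(
        (Sxx - Sx**2 / n) * (Syy - Sy**2 / n))                    # eq. (6.27)
    return m, b, Em, Eb, r

# Loaded-spring data of Example 6.1: extension e (mm) and load F (N)
e = np.array([0, 20, 55, 78, 98, 130, 154, 173, 205], dtype=float)
F = np.array([0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
m, b, Em, Eb, r = linfit(e, F)
print(round(m, 5), round(b, 4), round(r, 4))   # ~0.01957  0.0147  0.9987
```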
Often the relationship between the x and y data is non-linear but of a form that can easily be reduced to one which is linear. Two very common relationships of this form are the

power relationship        y = a1 x^a2        (6.28)

exponential relationship  y = a1 e^(a2 x)    (6.29)

Power relationship

y = a1 x^a2  →  log10 y = log10 a1 + a2 log10 x
→  Y = m X + b  with  Y = log10 y,  X = log10 x,  m = a2,  b = log10 a1,  a1 = 10^b        (6.30)
The intercept b is determined from equations (6.23) and (6.30) to give a1 = 10^b. The slope m = a2 is the power. The uncertainty in the power E_a2 = E_m is given by equation (6.25). The uncertainty E_b in the intercept b, given by equation (6.26), determines the uncertainty E_a1:

E_a1 = (∂a1/∂b) E_b,   a1 = 10^b  →  ∂a1/∂b = 10^b ln 10  →  E_a1 = 10^b (ln 10) E_b        (6.31)
(6.31)
Exponential relationship

y = a1 e^(a2 x)  →  ln y = ln a1 + a2 x
→  Y = m X + b  with  Y = ln y,  X = x,  m = a2,  b = ln a1,  a1 = e^b        (6.32)

The intercept b is determined from equations (6.23) and (6.32) and gives a1 = e^b. The slope m = a2 is the exponent. The uncertainty in the exponent E_a2 = E_m is given by equation (6.25). The uncertainty in the intercept b is given by equation (6.26), and the uncertainty E_a1 is

E_a1 = (∂a1/∂b) E_b,   a1 = e^b  →  ∂a1/∂b = e^b  →  E_a1 = e^b E_b        (6.33)
The extrinsic function linear_fit can be used to fit a straight line to the experimental data and plot the data and the line of best fit for relationships of the form

1. y = a1 + a2 x
2. y = a1 x^a2
3. y = a1 e^(a2 x)

The data in the form of an n×2 matrix, the minimum and maximum values of the x coordinate for plotting the graph, and a flag to select the type of relationship are passed to the function. The function returns values for the coefficients a1 and a2, their uncertainties E_a1 and E_a2, and the correlation coefficient r. Entering or changing the labeling of the graph is done within the m-script. The positions of the labels can be changed by activating Edit Plot in a Plot Window.
The function linear_fit is described by

[a1, a2, Ea1, Ea2, r] = linear_fit(xyData, xmin, xmax, flag)

Inputs:  xyData — xi, yi data in a matrix of n rows by 2 columns (column 1: xi data, column 2: yi data); xmin, xmax — minimum and maximum x values for the plot; flag = 1, 2 or 3 to select the relationship (1: y = a1 + a2 x, 2: y = a1 x^a2, 3: y = a1 e^(a2 x))
Outputs: coefficients a1 and a2, their uncertainties Ea1 and Ea2, and the correlation coefficient r
Three examples are given on how to use the linear_fit function. The measurements used for each of the three relationships (linear, power and exponential) come from observations of an object hanging from a vertical spring.
Example 6.1
A spring had a load added to it causing it to extend. The load F was measured in
newtons and the extension e in mm. The hypothesis to be tested is that the load F is
proportional to the extension e
F=ke
where the constant of proportionality k is known as the spring constant which is
normally measured in N.m-1.
The measurements for the load F and extension e were

e (mm)   0     20    55    78    98    130   154   173   205
F (N)    0   0.50  1.00  1.50  2.00   2.50  3.00  3.50  4.00
With this data, the hypothesis can be tested and if it is accepted, the value of the
spring constant k can be determined.
The data was entered into the matrix xyData1 and in the Command Window, the
function linear_fit was executed
[a1, a2, Ea1, Ea2, r] = linear_fit(xyData1, 0, 250, 1)
The output of this function to the Command Window is
y=mx+b
n= 9
slope m = 0.01957
intercept b = 0.0147
Em = 0.0003751
Eb = 0.04537
correlation r = 0.9987
and a plot of the data and the fitted function is shown in figure 6.4.
Fig. 6.4. Plot for the loaded spring ("Loaded Spring": load F (N) vs extension e (mm)) showing the measurements, the straight line of best fit and values for the slope, intercept and correlation coefficient.
A straight line fits the data well with a correlation r > 0.998, therefore the hypothesis can be accepted that an appropriate model to describe the extension of the loaded spring is F = k e, and that the quantities m and b are meaningful. The text boxes can be moved in the Plot Window by enabling Edit Plot. The number of significant figures displayed in the values for the slope and intercept can be changed by altering the format of the numbers in the m-script for linear_fit.
The slope of the line and its uncertainty are
m = a2 = k = 0.01957 N.mm-1 = 19.57 N.m-1
Em = 0.0003751 N.mm-1 = 0.3751 N.m-1
The intercept and the uncertainty are
b = 0.0147 N
Eb = 0.04537 N
It is best that the uncertainty in a measurement is quoted to only 1 or 2 significant
figures, therefore the slope and intercept are
k = (19.6 ± 0.4) N.m-1
b = (0.01 ± 0.04) N
We can conclude that the intercept b is zero and that a straight line through the origin
fits the data so that the extension e is proportional to the load F and the value of the
spring constant is
k = (19.6 ± 0.4) N.m-1
Example 6.2
A spring had a load added to it causing it to extend. The spring was then displaced from its equilibrium position so that it vibrated up and down about that position. The period T of the oscillations was measured for different loads m. The period T was measured in seconds and the load m in kilograms. The hypothesis to be tested is that the period of oscillation T is related to the load m by the relationship

T = 2π √(m/k)    →    T = (2π/√k) m^(1/2)

where k is the spring constant, normally measured in N.m-1. The measurements were entered into the matrix xyData2
m (kg)  0.020  0.050  0.100  0.150  0.200  0.250  0.300  0.350  0.400
T (s)    0.20   0.31   0.46   0.53   0.62   0.71   0.76   0.84   0.91
You can’t have any measurements entered as zero since the log10(0) = -infinity.
In the Command Window, the following was entered to execute the function
linear_fit
[a1, a2, Ea1, Ea2, r] = linear_fit(xyData2, 0.01, 0.4, 2)
The output of this function to the Command Window is shown below and figures 6.5
and 6.6 show plots for the data and fitted function
y = a1 x ^(a2)
n= 9
a1 = 1.413
a2 = 0.5016
Ea1 = 0.009916
Ea2 = 0.007565
correlation r = 0.9992
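The power-law reduction of equation (6.30) can be sketched in Python: take logarithms, fit a straight line, and recover a1 and a2 (an illustrative NumPy version, not the linear_fit m-script):

```python
import numpy as np

# Vibrating-spring data of Example 6.2: load m (kg) and period T (s)
m_load = np.array([0.020, 0.050, 0.100, 0.150, 0.200, 0.250, 0.300, 0.350, 0.400])
T = np.array([0.20, 0.31, 0.46, 0.53, 0.62, 0.71, 0.76, 0.84, 0.91])

# Reduce y = a1 * x^a2 to a straight line, eq. (6.30): Y = log10 y, X = log10 x
X, Y = np.log10(m_load), np.log10(T)
a2, b = np.polyfit(X, Y, 1)    # slope = a2 (the power), intercept = log10 a1
a1 = 10**b
print(round(a1, 3), round(a2, 4))   # close to a1 = 1.413, a2 = 0.5016
```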
Fig. 6.5. Log-log plot (log10(T) vs log10(m)) for the vibrating spring showing that a straight line fits the data.
Fig. 6.6. Plot for the vibrating spring (period T (s) vs load m (kg)) showing the measurements and the curve of best fit.
A straight line fits the log-log data well with a correlation r > 0.999, therefore the hypothesis can be accepted that an appropriate model to describe the period of vibration of the spring is

T = (2π/√k) m^(1/2)

The slope of the line and its uncertainty are

m = a2 = (0.502 ± 0.008)

which confirms the hypothesis that T ∝ √m. The coefficient a1 and its uncertainty are

a1 = (1.41 ± 0.01) s.kg-1/2

The value of the spring constant k is determined from the coefficient a1

k = 4π² / a1²        E_k / k = 2 E_a1 / a1

k = (19.7 ± 0.3) N.m-1

which agrees with the value of k from the data in Example 6.1, k = (19.6 ± 0.4) N.m-1.
Example 6.3
A spring had a load added to it causing it to extend. The spring was then displaced from its equilibrium position so that it vibrated up and down about that position. The amplitude A of the vibration slowly decreased. The amplitude A was measured in millimeters and the time t in seconds. The hypothesis to be tested is that the amplitude of vibration A decreases exponentially with time t

A = A0 e^(−λ t)

where λ is the decay constant. The measurements were entered into the matrix xyData3.
t (s)     0    10    20    30    40    50    60    70    80
A (mm)  20.0  12.5   8.0   5.0   3.5   2.5   1.5   1.0   0.5
In the Command Window, the following was entered to execute the function
linear_fit
[a1, a2, Ea1, Ea2, r] = linear_fit(xyData3, 0, 80, 3)
The output of this function to the Command Window is shown below and the plots for
the data and fitted function are shown in figures 6.7 and 6.8
y = a1 exp(a2 * x)
n= 9
a1 = 19.9
a2 = -0.04396
Ea1 = 1.127
Ea2 = 0.001189
correlation r = -0.9974
Fig. 6.7. Plot for the vibrating spring showing the log-linear graph (loge(A) vs time t (s)).
Fig. 6.8. Plot for the vibrating spring (amplitude A (mm) vs time t (s)) showing the measurements and the curve of best fit.
A straight line fits the log-linear data well with a correlation |r| > 0.997, therefore the hypothesis can be accepted that an appropriate model to describe the decay in the amplitude is of the form

A = A0 e^(−λ t)

The initial amplitude is given by the coefficient a1 and the decay constant by the coefficient a2 = −λ

A0 = (19.9 ± 1.1) mm
λ = (0.0440 ± 0.0012) s-1
5.2 Curve Fitting: Method of Least Squares (uncertainties in data)
In many experiments, the functional relationship between two variables x and y is investigated by measuring a set of n values of x_i, y_i and their uncertainties Δx_i, Δy_i. The functional relationship between x and y can be written as

y_f = f(a1, a2, … , am; x) = f(a; x)        (6.34)
The goal is to find the unknown coefficients a = {a1, a2, … , am} that fit the function f(a; x) to the set of n measurements. For a statistical analysis, it is difficult to consider simultaneously the uncertainties in both the x and y values. In this treatment, only the uncertainties Δy_i in the y measurements will be considered. For the method of least squares, the best estimates of the coefficients a are those that minimize the χ² value, given by equation (6.35)

χ² = Σ_{i=1}^{n} [ ( y_i − f(a; x_i) ) / Δy_i ]²        (6.35)
This is simply the sum of the squared deviations of the measurements from the fitted function f(a; x), weighted by the uncertainties Δy_i in the y values.
How the method is implemented will be given in some detail so that you can modify the m-script weighted_fit to add your own functions to fit a set of measurements. This m-script calls two extrinsic functions: fit_function to evaluate the fitted function, and part_der to calculate the partial derivatives of the function with respect to the coefficients, ∂f(a; x)/∂a_k.

An iterative procedure is used, based upon the method of Marquardt, in which the minimum of χ² is found by adjusting the values of the coefficients through a damping factor u. The steps in the procedure and details of the m-script weighted_fit are described in the following section.
Step 1: Inputs and changes to the m-script weighted_fit

The measurements must be entered into a matrix called wData of dimension (n×4). The m-script weighted_fit uses this matrix to store the data in separate matrices.

  Measurement   Matrix
  x             x  = wData(:, 1)
  y             y  = wData(:, 2)
  Δy            dy = wData(:, 3)
  Δx            dx = wData(:, 4)

The analysis only uses the uncertainties Δy associated with the y measurements. The uncertainties Δx are only included to show the error bars when the measurements are plotted.
The equations included in the m-script weighted_fit for fitting to your data are:

   1   f = a1 x + a2
   2   f = a1 x
   3   f = a1 x² + a2 x + a3
   4   f = a1 x²
   5   f = a1 x³ + a2 x² + a3 x + a4
   6   f = a1 x^a2
   7   f = a1 exp(−a2 x)
   8   f = a1 (1 − exp(−a2 x))
The equation to be fitted to the data is chosen by entering a value for the variable
eqType; for example, eqType = 1 selects the linear relationship f = a1 x + a2. The
function is evaluated by the function fit_function, and it is easy to add your own
functions to the code in weighted_fit and fit_function. The range of x values (xMin
and xMax) and the titles for the plot are entered by amending the m-script weighted_fit.

Weights w are assigned from the uncertainties dy in the y measurements:
   w = 1/dy,   and if dy_k = 0 then w_k = 1 for that k.

An adjustable parameter u, known as the damping factor, is initially set to 0.001 so that
the coefficients a can be adjusted to minimize the χ² value by simply adjusting the
value of u.
Step 2: Set the starting values for the coefficients a
To use the least squares method, we have to estimate starting values for the
coefficients a. If the equation is linear in the coefficients, the overdetermined set of n
simultaneous equations can be solved directly for a. For example, for the cubic polynomial:

   eqType = 5
   % f = a1 * x^3 + a2 * x^2 + a3 * x + a4     cubic polynomial
   xx(:,1) = x.^3;
   xx(:,2) = x.^2;
   xx(:,3) = x;
   a = xx \ y;

(the fourth column of xx is left as ones for the constant term a4). If this can't be done,
a simpler method is used to set the coefficients, or a = 1.
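The MATLAB backslash above solves the overdetermined system in the least-squares sense. A minimal pure-Python equivalent for the straight-line case (eqType = 1), using the normal equations, might look like this (the data values are hypothetical):

```python
def linear_start_values(x, y):
    """Starting values for f = a1*x + a2 (eqType 1) by ordinary least
    squares -- mimicking MATLAB's  xx(:,1) = x;  a = xx \\ y  with the
    second column of xx left as ones for the intercept."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a2 = (sy - a1 * sx) / n                          # intercept
    return a1, a2

# data on an exact line y = 2x + 1 recovers its own coefficients
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(linear_start_values(x, y))   # -> (2.0, 1.0)
```

For noisy data these values are only a starting point; the Marquardt iteration of Step 3 refines them.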
Step 3: Minimize the χ² value
The indices for the n data points and the m coefficients are
   i = 1, 2, …, n      k = 1, 2, …, m      j = 1, 2, …, m
Calculate the weighted difference matrix D
   D_i = w_i (y_i − f_i)
   D = w' .* (y − f)
The value of χ² is then
   χ² = D' * D
where ' denotes the transpose of a matrix.
We need to adjust the coefficients by an iterative method until we find the true
minimum of χ². For step L in the iterative procedure
   a(L+1) = a(L) + da
and our desired goal is that χ²{a(L+1)} < χ²{a(L)}. The minimum of χ²(a) is given
by the condition
   ∂χ²/∂a_k = 0
For small variations of the coefficients, the value of χ²{a(L+1)} may be expanded in
a Taylor series around χ²{a(L)}, and if the expansion is truncated after the second
term, we can use the approximation

   −(∂χ²/∂a_k)|_{a(L)} ≈ Σ_{j=1}^{m} (∂²χ²/∂a_k ∂a_j)|_{a(L)} da_j
This can be written in matrix form as
   B = CUR * da
where
   B_k = −(1/2) (∂χ²/∂a_k)|_{a(L)}
and
   CUR_kj = (1/2) (∂²χ²/∂a_k ∂a_j)|_{a(L)}
Here da is the matrix of increments in the coefficients, and CUR is called the
curvature matrix because it expresses the curvature of χ²(a) with respect to a.
The B matrix, after performing the partial differentiation, can be written as

   B_k = Σ_{i=1}^{n} [w_i (∂f_i/∂a_k)] [w_i (y_i − f_i)]
We need to calculate the partial derivatives of the fitted function with respect to the
coefficients a. To make the program more general, the weighted partial derivatives pdf
are calculated numerically by the function part_der, using the central-difference
approximation to the derivative

   ∂f/∂a_k ≈ [f(a_k + δ) − f(a_k − δ)] / (2δ)
In matrix form this is
   B = pdf' * D
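The central-difference scheme can be sketched in Python. This is a simplified stand-in for part_der: it uses a fixed step eps rather than the m-script's relative step of 0.001·a_k, and takes the model as a plain function:

```python
def part_der(f, a, x, w, eps=1e-3):
    """Weighted partial derivatives pdf[i][k] = w_i * df(a; x_i)/da_k by
    central differences. Simplified sketch: a fixed step eps is an
    assumption (the m-script perturbs each a_k by 0.1 %)."""
    n, m = len(x), len(a)
    pdf = [[0.0] * m for _ in range(n)]
    for k in range(m):
        aP = list(a); aP[k] += eps          # a_k + delta
        aM = list(a); aM[k] -= eps          # a_k - delta
        for i in range(n):
            pdf[i][k] = w[i] * (f(aP, x[i]) - f(aM, x[i])) / (2 * eps)
    return pdf

# f = a1*x + a2: df/da1 = x and df/da2 = 1 (times the weight)
model = lambda a, x: a[0] * x + a[1]
print(part_der(model, [2.0, 1.0], [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

For this linear model the derivative columns reproduce x and 1 to within rounding, which is a quick sanity check on any numerical-derivative routine.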
The curvature matrix CUR can be approximated by

   CUR_kj ≈ Σ_{i=1}^{n} [w_i (∂f_i/∂a_k)] [w_i (∂f_i/∂a_j)]

   CUR = pdf' * pdf
The elements of the curvature matrix CUR may have very different magnitudes. To
improve the numerical stability, the elements are scaled by the diagonal elements
of the curvature matrix, and the damping factor u is added to the diagonal
elements to give the modified curvature matrix MCUR

   MCUR_kj = (1 + u δ_kj) CUR_kj / √(CUR_kk CUR_jj)

where δ_kj is the Kronecker delta (δ_kj = 1 if k = j, otherwise δ_kj = 0).
Therefore, we can approximate the incremental changes in the coefficients from
   B = MCUR * da
   da = (MCUR)⁻¹ * B = MCOV * B
where MCOV = MCUR⁻¹ is the modified covariance matrix. The new estimates of
the coefficients, and the corresponding χ² value, can then be calculated from
   anew = a + da
To test the minimization of χ² as part of the iterative process, the following is done:

If χ²new > χ²old:  moving away from a minimum — keep the current a values,
   set u = 10 u, and repeat the iteration.
If χ²new < χ²old:  approaching a minimum — set a = anew, u = u / 10,
   and repeat the iteration.
If |χ²new − χ²old| < 0.001:  terminate the iteration, set u = 0 and χ² = χ²new,
   and recalculate CUR, MCUR and MCOV.
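The loop above can be sketched for a two-coefficient model. This is a toy Python version under stated assumptions, not the weighted_fit m-script itself: it uses the classic damped update (CUR + u·diag(CUR))·da = B rather than the script's scaled-matrix formulation, a fixed central-difference step, and a hand-written 2×2 inverse:

```python
import math

def marquardt_2p(f, a, x, y, dy, n_iter=200, tol=1e-4):
    """Toy two-coefficient Marquardt loop for the Step 3 procedure."""
    w = [1.0 if d == 0 else 1.0 / d for d in dy]     # Step 1 weight rule
    u, eps = 0.001, 1e-6

    def D(a):        # weighted differences D_i = w_i (y_i - f_i)
        return [wi * (yi - f(a, xi)) for wi, xi, yi in zip(w, x, y)]

    def J(a):        # weighted partial derivatives (central differences)
        rows = []
        for wi, xi in zip(w, x):
            row = []
            for k in range(2):
                aP, aM = list(a), list(a)
                aP[k] += eps
                aM[k] -= eps
                row.append(wi * (f(aP, xi) - f(aM, xi)) / (2 * eps))
            rows.append(row)
        return rows

    chi2 = sum(d * d for d in D(a))
    for _ in range(n_iter):
        Jm, Dv = J(a), D(a)
        B = [sum(Jm[i][k] * Dv[i] for i in range(len(x))) for k in range(2)]
        C = [[sum(Jm[i][k] * Jm[i][j] for i in range(len(x)))
              for j in range(2)] for k in range(2)]
        # damped curvature matrix: diagonal scaled by (1 + u)
        M = [[C[0][0] * (1 + u), C[0][1]],
             [C[1][0], C[1][1] * (1 + u)]]
        det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
        da = [( M[1][1] * B[0] - M[0][1] * B[1]) / det,
              (-M[1][0] * B[0] + M[0][0] * B[1]) / det]
        anew = [a[0] + da[0], a[1] + da[1]]
        chi2new = sum(d * d for d in D(anew))
        if chi2new < chi2:            # approaching a minimum
            a, u = anew, u / 10
            if abs(chi2 - chi2new) < tol:
                break
            chi2 = chi2new
        else:                         # moving away: increase damping
            u *= 10
    return a, chi2

# recover y = a1*exp(-a2*x) from noiseless decay data (a1 = 20, a2 = 0.05)
xd = [0.0, 10.0, 20.0, 40.0, 80.0]
yd = [20.0 * math.exp(-0.05 * xi) for xi in xd]
a, chi2 = marquardt_2p(lambda a, xi: a[0] * math.exp(-a[1] * xi),
                       [18.0, 0.06], xd, yd, [0.1] * 5)
print(a)   # close to [20.0, 0.05]
```

With noiseless data and a reasonable starting guess the loop settles near the true coefficients within a handful of iterations.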
Step 4: Output the results
The square roots of the diagonal elements of the covariance matrix COV give the
uncertainties in the coefficients a
   sigma_k = √(COV_kk)
where
   COV_kj = MCOV_kj / √(CUR_kk CUR_jj)
Basically, we have set up the hypothesis that our measurements can be described by
some analytical function f(a; x), and we need to be able to test the hypothesis
statistically. This can be done using the value of χ², which is a measure of the total
agreement between our measurements and the hypothesis. It can be assumed that the
minimum value of χ² is distributed according to the χ² distribution with (n − m)
degrees of freedom. Often the reduced χ² value, χ²reduced = χ²/(n − m), is quoted as a
measure of the goodness-of-fit:

   χ²reduced ≈ 1    the hypothesis is acceptable
   χ²reduced << 1   the fit is much better than expected given the size of the
                    measurement uncertainties; the hypothesis is acceptable, but the
                    uncertainties Δy may have been overestimated
   χ²reduced >> 1   the hypothesis may not be acceptable

The probability of the χ² value being exceeded is also given as another measure of the
goodness-of-fit.
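That exceedance probability can be reproduced by integrating the χ² density numerically, which is the same approach the chi2test m-script takes. A Python sketch (the upper integration limit xmax is a truncation assumption; the tail beyond it is negligible for moderate dof):

```python
import math

def chi2_tail_prob(dof, chi2max, xmax=200.0, num=20000):
    """Probability (%) that chi-squared exceeds chi2max for the given
    degrees of freedom, by trapezoidal integration of the chi-squared
    probability density from chi2max up to a truncation point xmax."""
    def pdf(t):
        return (t / 2) ** (dof / 2 - 1) * math.exp(-t / 2) / (2 * math.gamma(dof / 2))
    h = (xmax - chi2max) / num
    xs = [chi2max + i * h for i in range(num + 1)]
    area = sum((pdf(xs[i]) + pdf(xs[i + 1])) * h / 2 for i in range(num))
    return 100 * area

# Example 6.5 check: dof = 7 and chi2 = 2.08357 gives a probability near 95.5 %
print(round(chi2_tail_prob(7, 2.08357), 1))   # -> 95.5
```

A probability this close to 100 % is what triggers the "fit may be too good" message in the examples below.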
Finally, the measurements and the fitted function are plotted, together with a plot of the
χ² distribution.
A number of examples are given to illustrate the use of the weighted least squares fit.
Example 6.4
The data from Example 6.1 is used to fit a linear function.
   x: e (mm)   y: F (N)   Δy (N)   Δx (mm)
     0          0          0        0
    20          0.50       0        0
    55          1.00       0        0
    78          1.50       0        0
    98          2.00       0        0
   130          2.50       0        0
   154          3.00       0        0
   173          3.50       0        0
   205          4.00       0        0
The results of the weighted least squares fit are:
1: y = a1 * x + a2
linear
No. measurements = 9
Degree of freedom = 7
chi2 = 0.0384772
Reduced chi2 = 0.00549675
Probability of exceeding chi2 value = 100
Coefficients a1, a2, ... , am
0.0196 0.0147
Uncertainties in coefficient
0.0051 0.6120
? Fit may be too good ?
Example 6.5
The data from Example 6.1 is used to fit a linear function with uncertainties assigned
to both the x and y values.
   x: e (mm)   y: F (N)   Δy (N)   Δx (mm)
     0          0          0        0
    20          0.50       0.1      2
    55          1.0        0.1      2
    78          1.5        0.1      2
    98          2.0        0.2      2
   130          2.5        0.2      2
   154          3.0        0.2      2
   173          3.5        0.3      2
   205          4.0        0.3      2
The results of the weighted least squares fit are:
1: y = a1 * x + a2
linear
No. measurements = 9
Degree of freedom = 7
chi2 = 2.08357
Reduced chi2 = 0.297652
Probability of exceeding chi2 value = 95.5
Coefficients a1, a2, ... , am
0.0194 0.0147
Uncertainties in coefficient
0.0011 0.0923
? Fit may be too good ?
The results are similar, but not identical, to those of Example 6.4.
Fig. 6.9   Plot of the data and fitted function for Example 6.5.
Fig. 6.10  Plot of the χ² distribution (dof = 7, χ²max = 2.0836, prob % = 95.5)
showing the probability of the χ² value being exceeded.
Example 6.6
The data from Example 6.2 is used to fit a power relation y = a1 x^a2.

   x: m (kg)   y: T (s)   Δy (s)   Δx (kg)
   0           0          0        0
   0.02        0.20       0.10     0
   0.05        0.31       0.10     0
   0.10        0.46       0.10     0
   0.15        0.62       0.10     0
   0.20        0.71       0.10     0
   0.30        0.76       0.10     0
   0.35        0.84       0.10     0
   0.40        0.40       0.10     0
The results of the weighted least squares fit are:
6: y = a1 * x^a2
power
No. measurements = 9
Degree of freedom = 7
chi2 = 0.104884
Reduced chi2 = 0.0149834
Probability of exceeding chi2 value = 100
Coefficients a1, a2, ... , am
1.4379
0.5129
Uncertainties in coefficient
0.2088 0.0988
? Fit may be too good ?
Example 6.7
The data from Example 6.3 is used to fit an exponential relation y = a1 e^(−a2 x).

   x: t (s)   y: A (mm)   Δy (mm)   Δx (s)
    0         20          1.0       1.0
   10         12.5        1.0       1.0
   20          8.0        1.0       1.0
   30          5.0        0.5       1.0
   40          3.5        0.5       1.0
   50          2.5        0.5       1.0
   60          1.5        0.5       1.0
   70          1.0        0.5       1.0
   80          0.5        0.5       1.0
The results of the weighted least squares fit are:
7: y = a1 * exp(- a2 * x)
exponential decay
No. measurements = 9
Degree of freedom = 7
chi2 = 1.06625
Reduced chi2 = 0.152321
Probability of exceeding chi2 value = 99.3
Coefficients a1, a2, ... , am
20.0000
0.0444
Uncertainties in coefficient
0.8857
0.0023
? Fit may be too good ?
Fig. 6.11  Plot of the data and exponential decay fit for the data in Example 6.7.
Example 6.8
The weighted_fit m-script can be used for interpolation. For example, the viscosity
η of water is a function of temperature T, and tables give the viscosity only at fixed
temperatures. By fitting a polynomial to the data, one can estimate the viscosity at a
temperature between the fixed values.

   x: T (°C)       0      10     20     30     40     50     60     80     100
   y: η (mPa.s)    1.783  1.302  1.002  0.800  0.651  0.548  0.469  0.354  0.281
   Δy (mPa.s)      0.005  0.005  0.005  0.005  0.005  0.005  0.005  0.005  0.005
   Δx (°C)         1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
The results for fitting a 3rd order polynomial are:
5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4
cubic polynomial
No. measurements = 9
Degree of freedom = 5
chi2 = 168.672
Reduced chi2 = 33.7343
Probability of exceeding chi2 value = -0.0217
Coefficients a1, a2, ... , am
-0.0000
0.0006
-0.0476
1.7562
Uncertainties in coefficient
0.0000
0.0000
0.0004
0.0045
??? Fit may not be acceptable ???
The results of fitting a 4th order polynomial are:
No. measurements = 9
Degree of freedom = 4
chi2 = 13.5519
Reduced chi2 = 3.38797
Probability of exceeding chi2 value = 0.866
Coefficients a1, a2, ... , am
   0.0000   -0.0000    0.0010   -0.0556    1.7781
Uncertainties in coefficient
   0.0000    0.0000    0.0000    0.0008    0.0048
??? Fit may not be acceptable ???
The 4th order polynomial has a much lower reduced χ² value and gives a better fit to
the data than the 3rd order polynomial.
   viscosity η (mPa.s)   Table    3rd order polynomial   4th order polynomial
   20 °C                 1.002    1.0212                 0.9996
   24 °C                 ---      0.9200                 0.9055
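As a quick cross-check on the interpolated values above, ordinary Lagrange interpolation through the three table points nearest 24 °C gives a similar estimate. This is a different method from the weighted polynomial fit itself, sketched here in Python:

```python
def lagrange_interp(pts, t):
    """Quadratic Lagrange interpolation through three (T, eta) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        L = 1.0
        for j, (xj, _) in enumerate(pts):
            if j != i:
                L *= (t - xj) / (xi - xj)   # Lagrange basis polynomial
        total += yi * L
    return total

# viscosity of water near 24 C from the three nearest table values
pts = [(10, 1.302), (20, 1.002), (30, 0.800)]
print(round(lagrange_interp(pts, 24), 3))   # -> 0.909, close to the fitted 0.9055
```

Local interpolation and the global weighted fit agree to within about 0.5 %, which is reassuring given the 0.005 mPa.s uncertainties.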
Fig. 6.12  3rd and 4th order polynomial fits to the viscosity data of Example 6.8.
CHAPTER 6
M-SCRIPTS AND FUNCTIONS

stat( )

   [n, xbar, mode, median, s_pop, s_sample, E] = stat(StatData)

Input:   StatData — measurements as a matrix with columns for frequency and value
Outputs: number of measurements n, mean xbar, mode, median, population standard
         deviation s_pop, sample standard deviation s_sample, standard error of the
         mean E
function [n, xbar, mode, median, s_pop, s_sample, E] = stat(StatData)
% Extrinsic function to calculate basic statistical quantities
% Input: Column 1 (frequency) and Column 2 (measurements)

% frequency and x data
f = StatData(:,1);
x = StatData(:,2);

% bar graph
figure(1)
set(gca,'Fontsize',14);
bar(x, f);
title('Speed of Sound in Air','Fontsize',14);
xlabel('Speed v (m.s^{-1})','Fontsize',14);
ylabel('Frequency f','Fontsize',14)

% number of measurements
n = sum(f);
% mean
xbar = sum(f .* x)/n;
% mode
fmax = max(f);
index = find(f == fmax);
mode = x(index);
% median
count = 0;
c = 1;
while count < n/2
    median = x(c);
    count = count + f(c);
    c = c + 1;
end
% population & sample standard deviations
s_pop = sqrt(sum(f .* (x-xbar).^2)/n);
s_sample = sqrt(sum(f .* (x-xbar).^2)/(n-1));
% standard error of the mean
E = s_pop/sqrt(n-1);
% output
n
xbar
mode
median
s_pop
s_sample
E
gauss( )

   [area, prob, FWHM, xFWHM1, xFWHM2] = gauss(mu, sigma, x1, x2)

Inputs:  mean mu, standard deviation sigma, limits x1 and x2 for the probability
         calculation
Outputs: total area under the curve (%), probability (%) of a measurement between
         x1 and x2, full width at half maximum and the corresponding x values
function [area, prob, FWHM, xFWHM1, xFWHM2] = gauss(mu, sigma, x1, x2)
% Plot of Gaussian probability density function
% Percentage probability from area under curve
% Full Width at Half Maximum
% ---------------------------------------------------------------------
xMax = mu + 6 * sigma;           % range for plot and calculations
xMin = mu - 6 * sigma;           % from mean mu and standard deviation sigma
num = 1000;                      % number of data points
x = linspace(xMin, xMax, num);   % x values for range
xF = linspace(x1, x2, num);      % x values for area / prob calculation from x1 to x2

% Gaussian probability distribution
yo = 1/(sigma*sqrt(2*pi));
y = yo * exp(-(x - mu).^2/(2*sigma^2));
yF = yo * exp(-(xF - mu).^2/(2*sigma^2));

area = 100 * trapz(x,y);         % area under curve (%)
prob = 100 * trapz(xF,yF);       % probability (%) from x1 to x2

% Calculation of FWHM
FWHM = 2 * sigma * sqrt(2*log(2));
xFWHM1 = mu - FWHM/2;
xFWHM2 = mu + FWHM/2;
yFWHM1 = yo * exp(-(xFWHM1 - mu).^2/(2*sigma^2));
yFWHM2 = yo * exp(-(xFWHM2 - mu).^2/(2*sigma^2));

% graphics ------------------------------------------------------------
close
figure(1)
bar(xF, yF, 2)
hold on
plot(x,y,'LineWidth',2)
plot([xFWHM1 xFWHM2],[yFWHM1 yFWHM2],'r','LineWidth',3)

% labelling graph -----------------------------------------------------
xPos = xMin + (xMax-xMin)/25;
yPos = max(y);
tm = 'Gaussian distribution';
title(tm);
xlabel('x');
ylabel('P(x)');

tL1 = '\mu = ';
tL2 = num2str(mu);
tL3 = '   \sigma = ';
tL4 = num2str(sigma);
tL5 = [tL1 tL2 tL3 tL4];
tL6 = 'x_1 = ';
tL7 = num2str(x1);
tL8 = '   x_2 = ';
tL9 = num2str(x2);
tL10 = [tL6 tL7 tL8 tL9];
tL13 = 'Prob(x_1<x<x_2) = ';
tL14 = num2str(prob/100,4);
tL15 = [tL13 tL14];
tL16 = 'FWHM = ';
tL17 = num2str(FWHM);
tL18 = [tL16 tL17];

text(xPos, 0.95*yPos, tL5);
text(xPos, 0.88*yPos, tL10);
text(xPos, 0.80*yPos, tL15);
text(xPos, 0.72*yPos, tL18);
chi2test( )

   prob = chi2test(dof, chi2Max)

Inputs:  degrees of freedom dof and the value chi2Max of χ² to test
Output:  probability (%) that χ² > chi2Max
function prob = chi2test(dof, chi2Max)
% Function to calculate how chi2 is distributed
% and give the probability of a chi2 value exceeding chi2Max
num = 500;       % number of calculations for area
flag = 10;       % flag to end calculation
c = 1;           % initialize counter
dx = 0.1;        % increment in chi2 value

% calculate P(chi2) values --------------------------------------------
while flag > 0.0001
    x(c) = (c-1) * dx;
    y(c) = (x(c)/2)^(dof/2-1)*exp(-x(c)/2)/(2*gamma(dof/2));
    if x(c) > 1, flag = y(c); end
    c = c + 1;
end

% calculate probability of chi2 > chi2Max -----------------------------
xF = linspace(chi2Max, x(c-1), num);
yF = (xF./2).^(dof/2-1).*exp(-xF./2)./(2*gamma(dof/2));
prob = 100 * trapz(xF,yF);

% graphics ------------------------------------------------------------
close
figure(99)
bar(xF, yF, 2)
hold on
plot(x,y,'LineWidth',2)

% labelling graph -----------------------------------------------------
tm = 'chi-squared distribution';
title(tm);
xlabel('\chi ^2');
ylabel('P(x)');
tL1 = 'dof = ';
tL2 = num2str(dof);
tL3 = [tL1 tL2];
tL4 = 'chi2Max = ';
tL5 = num2str(chi2Max);
tL6 = [tL4 tL5];
tL7 = 'prob % = ';
tL8 = num2str(prob, 3);
tL9 = [tL7 tL8];
xPos = max(x);
yPos = max(y);
text(0.8*xPos, 0.95*yPos, tL3);
text(0.8*xPos, 0.88*yPos, tL6);
text(0.8*xPos, 0.80*yPos, tL9);
linear_fit( )

   [a1, a2, Ea1, Ea2, r] = linear_fit(xyData, xmin, xmax, flag)

Inputs:  (xi, yi) data in a matrix of n rows by 2 columns (column 1: xi data,
         column 2: yi data); xmin and xmax, the min and max x values for the plot;
         flag = 1, 2 or 3 to select the relationship
            1: y = a1 + a2 x
            2: y = a1 x^a2
            3: y = a1 e^(a2 x)
Outputs: coefficients a1 and a2, their uncertainties Ea1 and Ea2, and the
         correlation coefficient r
function [a1, a2, Ea1, Ea2, r] = linear_fit(xyData, xmin, xmax, flag)
% Function to determine the coefficients a1 and a2
% for a linear fit to experimental data
%   Flag = 1: Linear relationship y = m x + b = a1 + a2 x, a1 = b and a2 = m
%   Flag = 2: Power relationship y = a1 x^a2
%   Flag = 3: Exponential relationship y = a1 exp(a2 x)
% Inputs to function
%   (x,y) data in a matrix (n x 2): column 1 for x data, column 2 for y data
%   Graph min and max x values xmin & xmax
%   Flag to select the type of relationship
% Outputs
%   a1 and a2, the coefficients as determined by the value of the flag
%   Ea1 and Ea2, the uncertainties in a1 and a2
%   Correlation coefficient r

close all    % close all Plot Windows

% Input graph titles --------------------------------------------------
% Main title / Graph 1 linear axes / Graph 2 log or linear axes
tm = 'Vibrating Spring';
tx1 = 'time t (s)';
ty1 = 'amplitude A (mm)';
tx2 = 'time t (s)';
ty2 = 'log_{e}( A )';

% Process input data --------------------------------------------------
switch flag
    case 1      % linear
        x = xyData(:,1);
        y = xyData(:,2);
    case 2      % power
        x = log10(xyData(:,1));
        y = log10(xyData(:,2));
    case 3      % exponential
        x = xyData(:,1);
        y = log(xyData(:,2));
end
n = length(x);   % number of data points

% Calculations for slope and intercept y = m x + b --------------------
sx = sum(x);
sy = sum(y);
sxx = sum(x .* x);
syy = sum(y .* y);
sxy = sum(x .* y);
sx2 = sx * sx;
sy2 = sy * sy;
m = (n * sxy - sx * sy)/(n * sxx - sx2);
b = (1/n) * (sy - m * sx);
s = sqrt((syy - sy2/n - m * (sxy - sx * sy/n))/(n-2));
Em = s * sqrt(n / (n * sxx - sx2));
Eb = s * sqrt(sxx / (n * sxx - sx2));
r = (sxy - sx * sy /n) / sqrt((sxx - sx2/n) * (syy - sy2/n));

% Calculations for coefficients a1 and a2, fitted functions
% and text labelling for graphs ---------------------------------------
nx = 400;
xf = linspace(xmin, xmax, nx);
t31 = 'r = ';
t32 = num2str(r, '%0.4g');
text3 = [t31 t32];

switch flag
    case 1
        a1 = b; Ea1 = Eb;
        a2 = m; Ea2 = Em;
        % labels
        t11 = 'slope m = ';
        t12 = num2str(m, '%0.4g');
        t13 = '   intercept b = ';
        t14 = num2str(b, '%0.4g');
        text1 = [t11 t12 t13 t14];
        t21 = 'Em = ';
        t22 = num2str(Em, '%0.2g');
        t23 = '   Eb = ';
        t24 = num2str(Eb, '%0.2g');
        text2 = [t21 t22 t23 t24];
        yf = m .* xf + b;
    case 2
        a1 = 10^b; Ea1 = 10^b*Eb;
        a2 = m; Ea2 = Em;
        % labels
        t11 = 'a_1 = ';
        t12 = num2str(a1, '%0.4g');
        t13 = '   power a_2 = ';
        t14 = num2str(a2, '%0.4g');
        text1 = [t11 t12 t13 t14];
        t21 = 'Ea_1 = ';
        t22 = num2str(Ea1, '%0.2g');
        t23 = '   Ea_2 = ';
        t24 = num2str(Ea2, '%0.2g');
        text2 = [t21 t22 t23 t24];
        yf = a1 .* xf.^a2;
    case 3
        a1 = exp(b); Ea1 = exp(b)*Eb;
        a2 = m; Ea2 = Em;
        % labels
        t11 = 'a_1 = ';
        t12 = num2str(a1, '%0.4g');
        t13 = '   a_2 = ';
        t14 = num2str(a2, '%0.4g');
        text1 = [t11 t12 t13 t14];
        t21 = 'Ea_1 = ';
        t22 = num2str(Ea1, '%0.2g');
        t23 = '   Ea_2 = ';
        t24 = num2str(Ea2, '%0.2g');
        text2 = [t21 t22 t23 t24];
        yf = a1 .* exp(a2 .* xf);
end

% Graphical output ----------------------------------------------------
% Text positions for a1, a2, Ea1 and Ea2 on the plot window
xpos1 = xmin + (xmax-xmin)/4;
ypos1 = max(yf);
xpos2 = xmin + (xmax-xmin)/4;
ypos2 = 0.95*max(yf);
xpos3 = xpos1;
ypos3 = 0.75*max(yf);

switch flag
    case 1
        figure(1)
        plot(xf,yf,'LineWidth',2);
        hold on
        plot(x,y,'o');
        grid on
        text(xpos1, ypos1, text1);
        text(xpos2, ypos2, text2);
        text(xpos3, ypos3, text3);
        title(tm);
        xlabel(tx1);
        ylabel(ty1);
    case 2
        figure(1)    % power plot
        plot(xf,yf,'LineWidth',2);
        hold on
        plot(xyData(:,1),xyData(:,2),'o');
        grid on
        text(xpos1, ypos1, text1);
        text(xpos2, ypos2, text2);
        text(xpos3, ypos3, text3);
        title(tm);
        xlabel(tx1);
        ylabel(ty1);
        figure(2)    % log log plot
        plot(log10(xf),log10(yf),'LineWidth',2);
        hold on
        plot(x,y,'o');
        grid on
        text(log10(xpos1), log10(ypos1), text1);
        text(log10(xpos2), log10(ypos2), text2);
        text(log10(xpos3), log10(ypos3), text3);
        title(tm);
        xlabel(tx2);
        ylabel(ty2);
    case 3
        figure(1)    % exponential plot
        plot(xf,yf,'LineWidth',2);
        hold on
        plot(xyData(:,1),xyData(:,2),'o');
        grid on
        text(xpos1, ypos1, text1);
        text(xpos2, ypos2, text2);
        text(xpos3, ypos3, text3);
        title(tm);
        xlabel(tx1);
        ylabel(ty1);
        figure(2)    % semilog plot
        plot(xf,log(yf),'LineWidth',2);
        hold on
        plot(x,y,'o');
        grid on
        text(xpos1, log(ypos1), text1);
        text(xpos2, log(ypos2), text2);
        text(xpos3, log(ypos3), text3);
        title(tm);
        xlabel(tx2);
        ylabel(ty2);
end

% Text output to Command Window ---------------------------------------
switch flag
    case 1
        disp('y = m x + b')
        fprintf('n = %0.4g \n', n)
        fprintf('slope m = %0.4g \n', m)
        fprintf('intercept b = %0.4g \n', b)
        fprintf('Em = %0.4g \n', Em)
        fprintf('Eb = %0.4g \n', Eb)
        fprintf('correlation r = %0.4g \n', r)
    case 2
        disp('y = a1 x^(a2)')
        fprintf('n = %0.4g \n', n)
        fprintf('a1 = %0.4g \n', a1)
        fprintf('a2 = %0.4g \n', a2)
        fprintf('Ea1 = %0.4g \n', Ea1)
        fprintf('Ea2 = %0.4g \n', Ea2)
        fprintf('correlation r = %0.4g \n', r)
    case 3
        disp('y = a1 exp(a2 * x)')
        fprintf('n = %0.4g \n', n)
        fprintf('a1 = %0.4g \n', a1)
        fprintf('a2 = %0.4g \n', a2)
        fprintf('Ea1 = %0.4g \n', Ea1)
        fprintf('Ea2 = %0.4g \n', Ea2)
        fprintf('correlation r = %0.4g \n', r)
end
weighted_fit

% weightedFit.m
% Fitting an equation to a set of measurements that has uncertainties
% Uses extrinsic functions: fit_function and part_der
% Input: Range of x values
% Input: Equation type
% Input: Graph labelling

close

% INPUTS --------------------------------------------------------------
% Plot range
xmin = 0;     %<<<---
xmax = 100;   %<<<---
% Graph titles
tm = 'Weighted Least Squares Fit';   %<<<---
tx = 'X';   %<<<---
ty = 'Y';   %<<<---
% Equation type
eqType = 5;   %<<<---
% 1: y = a1 * x + a2                                   linear
% 2: y = a1 * x                                        linear - proportional
% 3: y = a1 * x^2 + a2 * x + a3                        quadratic
% 4: y = a1 * x^2                                      simple quadratic
% 5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4             cubic polynomial
% 6: y = a1 * x^a2                                     power
% 7: y = a1 * exp(- a2 * x)                            exponential decay
% 8: y = a1 * (1 - exp(-a2 * x))                       "exponential increase"
% 9: y = a1 * x^4 + a2 * x^3 + a3 * x^2 + a4 * x + a5  polynomial

% SETUP ---------------------------------------------------------------
% Data must be stored in matrix wData: Col 1 X; Col 2 Y; Col 3 dY; Col 4 dX
n = length(wData(:,1));    % number of data points
m = 1;                     % number of fitting parameters a1, a2, ...
mMax = 5;                  % max number of coefficients
nx = 120;                  % number of x points for plotting
dx = (xmax - xmin)/nx;     % increment in x for plotting
xf = xmin : dx : xmax;     % x data for plotting fitted function
nIterations = 500;         % number of iterations for minimizing chi2

if eqType == 1, m = 2; end
if eqType == 2, m = 1; end
if eqType == 3, m = 3; end
if eqType == 4, m = 1; end
if eqType == 5, m = 4; end
if eqType == 6, m = 2; end
if eqType == 7, m = 2; end
if eqType == 8, m = 2; end
if eqType == 9, m = 5; end
% INITIALIZE MATRICES -------------------------------------------------
wData = sortrows(wData);   % sort matrix by increasing x values
x = wData(:,1);            % x measurements
y = wData(:,2);            % y measurements
dy = wData(:,3);           % dy uncertainties in y values
dx = wData(:,4);           % dx uncertainties in x values - used only for error bars
w = zeros(1,n);            % weights 1/dy
xx = ones(n,m);            % matrix used to get start values for matrix a
u = 0.001;                 % initial value for damping factor

% Calculation of 1/dy for y data
for cc = 1 : n
    if dy(cc) == 0, w(cc) = 1;
    else w(cc) = 1/dy(cc);
    end
end
% SET STARTING VALUES FOR COEFFICIENTS --------------------------------
switch eqType
    case 1    % 1: y = a1 * x + a2                    linear
        xx(:,1) = x;
        a = xx\y;
    case 2    % 2: y = a1 * x                         linear - proportional
        a = y(n)/x(n);
    case 3    % 3: y = a1 * x^2 + a2 * x + a3         quadratic
        xx(:,1) = x.*x;
        xx(:,2) = x;
        a = xx\y;
    case 4    % 4: y = a1 * x^2                       simple quadratic
        xx = x.*x;
        a = xx\y;
    case 5    % 5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4   cubic polynomial
        xx(:,1) = x.^3;
        xx(:,2) = x.^2;
        xx(:,3) = x;
        a = xx\y;
    case 6    % 6: y = a1 * x^a2                      power
        a(2,1) = log10(y(1)/y(n)) / log10(x(1)/x(n));
        a(1,1) = y(1) / x(1)^a(2);
    case 7    % 7: y = a1 * exp(- a2 * x)             exponential decay
        a(2,1) = log(y(n)/y(1)) / (x(1) - x(n));
        a(1,1) = y(1) / exp(-a(2)*x(1));
    case 8    % 8: y = a1 * (1 - exp(-a2 * x))        "exponential increase"
        a(1,1) = y(n);
        a(2,1) = abs(log(1 - y(1)/a(1))/x(1));
    case 9    % 9: y = a1 * x^4 + a2 * x^3 + a3 * x^2 + a4 * x + a5   polynomial
        xx(:,1) = x.^4;
        xx(:,2) = x.^3;
        xx(:,3) = x.^2;
        xx(:,4) = x;
        a = xx\y;
end
f = zeros(n,1);
D = zeros(n,1);
chi2 = 0;
chi2new = 0;
rchi2 = 0;
pdf = zeros(n,m);
B = zeros(m,1);
CUR = zeros(m,m);
MCUR = zeros(m,m);
MCOV = zeros(m,m);
COV = zeros(m,m);
anew = zeros(m,1);
da = zeros(m,1);
sigma = zeros(m,1);
% CALCULATIONS --------------------------------------------------------
for cc = 1 : nIterations
    f = fit_function(a, x, eqType);        % fitted function evaluated at data points
    D = w' .* (y - f);                     % weighted difference matrix
    chi2 = D' * D;                         % chi squared
    pdf = part_der(a, x, w, eqType, n, m); % partial derivatives df/da
    B = pdf' * D;                          % B matrix
    CUR = pdf' * pdf;                      % curvature matrix
    for c1 = 1 : m                         % modified curvature matrix
        for c2 = 1 : m
            MCUR(c1,c2) = CUR(c1,c2) / sqrt(CUR(c1,c1) * CUR(c2,c2));
        end
    end
    for c1 = 1 : m
        MCUR(c1,c1) = (1+u)*MCUR(c1,c1);
    end
    MCOV = inv(MCUR);                      % modified covariance matrix
    da = MCOV * B;                         % increments in a coefficients
    anew = a + da;                         % new coefficients
    f = fit_function(anew, x, eqType);     % fitted function at new coefficients
    D = w' .* (y - f);                     % weighted difference matrix
    chi2new = D' * D;                      % new chi squared
    if chi2new < chi2
        u = u/10;
        a = anew;
        if abs(chi2 - chi2new) < 0.0001, break; end
    end
    if chi2new > chi2, u = u*10; end
end   % cc iterations
chi2 = chi2new;
rchi2 = chi2/(n-m);
u = 0;
pdf = part_der(a, x, w, eqType, n, m);
CUR = pdf' * pdf;                          % curvature matrix
for c1 = 1 : m                             % modified curvature matrix
    for c2 = 1 : m
        MCUR(c1,c2) = CUR(c1,c2) / sqrt(CUR(c1,c1) * CUR(c2,c2));
    end
end
MCOV = inv(MCUR);                          % modified covariance matrix
for c1 = 1 : m                             % covariance matrix
    for c2 = 1 : m
        COV(c1,c2) = MCOV(c1,c2) / sqrt(CUR(c1,c1) * CUR(c2,c2));
    end
end
for c = 1 : m                              % uncertainties in the coefficients
    sigma(c) = sqrt(COV(c,c));
end

prob = chi2test(n-m, chi2);                % chi-squared probability

% Goodness-of-fit
tgof = ' *** Acceptable Fit ***';
if rchi2 > 2.5, tgof = ' ??? Fit may not be acceptable ???'; end
if rchi2 < 0.4, tgof = ' ? Fit may be too good ?'; end
% Display results
disp(' ');
wData
disp(' ');
if eqType == 1, disp('1: y = a1 * x + a2                          linear'); end
if eqType == 2, disp('2: y = a1 * x                               linear - proportional'); end
if eqType == 3, disp('3: y = a1 * x^2 + a2 * x + a3               quadratic'); end
if eqType == 4, disp('4: y = a1 * x^2                             simple quadratic'); end
if eqType == 5, disp('5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4    cubic polynomial'); end
if eqType == 6, disp('6: y = a1 * x^a2                            power'); end
if eqType == 7, disp('7: y = a1 * exp(- a2 * x)                   exponential decay'); end
if eqType == 8, disp('8: y = a1 * (1 - exp(-a2 * x))              "exponential increase"'); end
fprintf('No. measurements = %0.0g \n', n);
fprintf('Degree of freedom = %0.0g \n', (n-m));
fprintf('chi2 = %0.6g \n', chi2);
fprintf('Reduced chi2 = %0.6g \n', rchi2);
fprintf('Probability of exceeding chi2 value = %0.3g \n', prob);
disp(' ');
disp('Coefficients a1, a2, ... , am');
disp(a);
disp('Uncertainties in coefficient');
disp(sigma);
disp(' ')
disp(tgof);

yf = fit_function(a, xf, eqType);

% Graphics ------------------------------------------------------------
figure(1)
set(gcf,'PaperType','A4');
plot(x,y,'o','MarkerFaceColor','k','MarkerEdgeColor','k','MarkerSize',4);
hold on
plot(xf,yf,'b','LineWidth',2);
grid on
title(tm);
xlabel(tx);
ylabel(ty);
% error bars
yerrP = y + dy;
yerrM = y - dy;
xerrP = x + dx;
xerrM = x - dx;
for c = 1 : n
    plot([x(c),x(c)],[yerrM(c), yerrP(c)],'k','LineWidth',1);
    plot([xerrM(c),xerrP(c)],[y(c), y(c)],'k','LineWidth',1);
end
fit_function( )

function f = fit_function(a, x, eqType)
% Evaluate the fitted function
switch eqType
    case 1    % 1: y = a1 * x + a2                    linear
        f = a(1) .* x + a(2);
    case 2    % 2: y = a1 * x                         linear - proportional
        f = a(1) .* x;
    case 3    % 3: y = a1 * x^2 + a2 * x + a3         quadratic
        f = a(1) .* x.^2 + a(2) .* x + a(3);
    case 4    % 4: y = a1 * x^2                       simple quadratic
        f = a(1) .* x.^2;
    case 5    % 5: y = a1 * x^3 + a2 * x^2 + a3 * x + a4   cubic polynomial
        f = a(1) .* x.^3 + a(2) .* x.^2 + a(3) .* x + a(4);
    case 6    % 6: y = a1 * x^a2                      power
        f = a(1) .* (x.^a(2));
    case 7    % 7: y = a1 * exp(- a2 * x)             exponential decay
        f = a(1) .* exp(- a(2) .* x);
    case 8    % 8: y = a1 * (1 - exp(-a2 * x))        "exponential increase"
        f = a(1) .* (1 - exp(-a(2) .* x));
    case 9    % 9: y = a1 * x^4 + a2 * x^3 + a3 * x^2 + a4 * x + a5   polynomial
        f = a(1) .* x.^4 + a(2) .* x.^3 + a(3) .* x.^2 + a(4) .* x + a(5);
end
part_der( )

function pdf = part_der(a, x, w, eqType, n, m)
% Weighted partial derivatives of the fitted function
% with respect to the coefficients a (central differences)
fPlus = zeros(n,m);
fMinus = zeros(n,m);
pdf = zeros(n,m);
for c = 1 : m
    aP = a;
    aM = a;
    aP(c) = a(c)*1.001;
    aM(c) = a(c)*0.999;
    fPlus(:,c) = fit_function(aP, x, eqType);
    fMinus(:,c) = fit_function(aM, x, eqType);
    pdf(:,c) = (fPlus(:,c) - fMinus(:,c)) / (0.002 * a(c));
end
for c = 1 : m    % weighted partial derivatives
    pdf(:,c) = pdf(:,c) .* w';
end
6.6  BIBLIOGRAPHY

G. Cowan, Statistical Data Analysis (Clarendon Press, Oxford, 1998)
Advanced treatment of practical applications of statistics in data analysis.

A. Kharab and R. B. Guenther, An Introduction to Numerical Methods – A Matlab
Approach (Chapman & Hall/CRC, Boca Raton, 2002)
An introduction to the theory and applications of numerical methods for researchers and
professionals with a background only in calculus and basic programming. The CD-ROM
contains Matlab code for many algorithms.

L. Lyons, A Practical Guide to Data Analysis for Physical Science Students
(Cambridge University Press, Cambridge, 1991)
Basic introduction to the analysis of measurements and least squares fitting.

L. Kirkup, Experimental Methods (John Wiley & Sons, Brisbane, 1994)
Introduction to the analysis and presentation of experimental data.

W. R. Leo, Techniques for Nuclear and Particle Physics Experiments (Springer-Verlag,
Berlin, 1987)
Chapter on statistics and the treatment of experimental data, including curve fitting:
linear and non-linear fits.

W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical
Recipes in C (Cambridge University Press, Cambridge, 1992)
Chapter on modelling of data, including linear and non-linear fits.

S. S. M. Wong, Computational Methods in Physics and Engineering (Prentice Hall,
Englewood Cliffs, 1992)
Contains a very good chapter on the method of least squares and the method of
Marquardt.