Jake Blanchard
Spring 2010
Uncertainty Analysis for Engineers 1
Statistical inference=process of drawing conclusions from data subject to random variation
The conclusions of this process are "propositions," for example
◦ Estimates
◦ Confidence intervals
◦ Credible intervals
◦ Rejecting a hypothesis
◦ Clustering data points
Part of this is the estimation of model parameters
Point Estimation
◦ Calculate a single number from a set of observational data
Interval Estimation
◦ Determine an interval within which the true parameter lies, together with a confidence level
Bias=expected value of the estimator does not necessarily equal the parameter (an unbiased estimator has expected value equal to the parameter)
Consistency=estimator approaches the parameter as n approaches infinity
Efficiency=smaller variance of the estimator implies higher efficiency
Sufficiency=estimator utilizes all pertinent information in a sample
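Efficiency can be illustrated with a short Monte Carlo sketch. It assumes standard normal data and a sample size of 25 (both assumptions, not from the slides) and compares two estimators of the population mean that are both centered on the true value: the sample mean and the sample median. The one with the smaller variance is the more efficient.

```matlab
% Monte Carlo illustration of efficiency (assumed setup, not from the slides):
% standard normal data, n = 25, compare sample mean and sample median.
n = 25;            % sample size
reps = 20000;      % number of simulated samples
X = randn(reps, n);              % each row is one sample of size n
means = mean(X, 2);              % estimator 1: sample mean of each row
medians = median(X, 2);          % estimator 2: sample median of each row
% Both estimators are centered on the true mean (0); compare their spread
ratio = var(medians) / var(means);
fprintf('var(median)/var(mean) = %.2f\n', ratio);
```

A ratio above 1 means the sample mean is the more efficient of the two for normal data; for large n the ratio tends to π/2 ≈ 1.57.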
Start with data sample of size N
Example: estimate the fraction of voters who will vote for a particular candidate (the estimate is based on a random sample of voters)
Other examples: quality control, clinical trials, software engineering, orbit prediction
Assume successive samples are statistically independent
Maximum likelihood
Method of moments
Minimum mean squared error
Bayes estimators
Cramer-Rao bound
Maximum a posteriori
Minimum variance unbiased estimator
Best linear unbiased estimator, etc.
Suppose we have a random variable x with pdf f(x; θ), where θ is the parameter to be estimated
Take n samples of x
What value of θ will maximize the likelihood of obtaining these n observations?
Let L=likelihood of observing this set of values for x
Then maximize L with respect to θ
$$L(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta)$$
$$\frac{\partial L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0 \quad \text{or, equivalently,} \quad \frac{\partial \log L(x_1, x_2, \ldots, x_n; \theta)}{\partial \theta} = 0$$
Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
Assume an exponential distribution with mean λ
Find the MLE for λ
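$$f(t) = \frac{1}{\lambda} e^{-t/\lambda}$$
$$L(\lambda) = \prod_{i=1}^{7} \frac{1}{\lambda} e^{-t_i/\lambda} = \frac{1}{\lambda^7} \exp\left(-\frac{1}{\lambda} \sum_{i=1}^{7} t_i\right)$$
$$\log L = -7 \log \lambda - \frac{1}{\lambda} \sum_{i=1}^{7} t_i = -7 \log \lambda - \frac{35.4}{\lambda}$$
$$\frac{\partial \log L}{\partial \lambda} = -\frac{7}{\lambda} + \frac{35.4}{\lambda^2} = 0 \quad \Rightarrow \quad \lambda = \frac{35.4}{7} = 5.06 \text{ s}$$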
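As a check on the exponential MLE, the sketch below computes the closed-form estimate (the sample mean) and then maximizes the log-likelihood numerically with fminsearch, the same base MATLAB optimizer used later for curve fitting. Optimizing over log λ rather than λ is an assumption made here only to keep λ positive during the search.

```matlab
% Inter-arrival times from the example (seconds)
t = [1.2 3 6.3 10.1 5.2 2.4 7.2];
n = numel(t);

% Closed-form MLE of the exponential mean: lambda = sum(t)/n
lambda_hat = sum(t) / n;

% Numerical check: minimize the negative log-likelihood
%   -log L = n*log(lambda) + sum(t)/lambda
% written in terms of th = log(lambda) so that lambda stays positive
negLogL = @(th) n*th + sum(t)*exp(-th);
th_opt = fminsearch(negLogL, 0);
lambda_num = exp(th_opt);

fprintf('closed form: %.4f, fminsearch: %.4f\n', lambda_hat, lambda_num);
```

Both values agree to within the optimizer tolerance, about 5.06 s.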
Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
Assume lognormal distribution
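$$f(x) = \frac{1}{\sqrt{2\pi}\,\zeta x} \exp\left[-\frac{1}{2}\left(\frac{\ln x - \lambda}{\zeta}\right)^2\right]$$
$$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\zeta x_i} \exp\left[-\frac{1}{2}\left(\frac{\ln x_i - \lambda}{\zeta}\right)^2\right]$$
$$\ln L = -n \ln\left(\sqrt{2\pi}\right) - n \ln \zeta - \sum_{i=1}^{n} \ln x_i - \frac{1}{2\zeta^2} \sum_{i=1}^{n} \left(\ln x_i - \lambda\right)^2$$
$$\frac{\partial \ln L}{\partial \lambda} = \frac{1}{\zeta^2} \sum_{i=1}^{n} \left(\ln x_i - \lambda\right) = 0 \quad \Rightarrow \quad \lambda = \frac{1}{n} \sum_{i=1}^{n} \ln x_i = 3.26$$
$$\frac{\partial \ln L}{\partial \zeta} = -\frac{n}{\zeta} + \frac{1}{\zeta^3} \sum_{i=1}^{n} \left(\ln x_i - \lambda\right)^2 = 0 \quad \Rightarrow \quad \zeta^2 = \frac{1}{n} \sum_{i=1}^{n} \left(\ln x_i - \lambda\right)^2 = 0.0265, \quad \zeta = 0.163$$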
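The two lognormal MLE formulas reduce to the mean and the 1/n standard deviation of the log-transformed data. A minimal sketch for the sand data:

```matlab
x = [25 20 28 33 26];                          % cycles to failure
n = numel(x);
lx = log(x);                                   % natural logs of the data
lambda_hat = mean(lx);                         % (1/n) * sum(ln x_i)
zeta_hat = sqrt(sum((lx - lambda_hat).^2) / n);  % MLE divides by n, not n-1
fprintf('lambda = %.3f, zeta = %.3f\n', lambda_hat, zeta_hat);
```

Note that zeta_hat equals std(lx, 1), the standard deviation normalized by n; std(lx) would divide by n-1 and give a slightly larger value.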
Method of moments: use sample moments (mean, variance, etc.) to set distribution parameters
Times between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3, 10.1, 5.2, 2.4, and 7.2 seconds
Assume an exponential distribution
Mean=5.06 s, so λ=5.06 s (the same as the MLE)
Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)
Assume lognormal distribution
Mean=26.4
Standard Deviation=4.72
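Solve for λ and ζ from the lognormal moment relations μ = exp(λ + ζ²/2) and σ² = μ²[exp(ζ²) − 1]:
$$\zeta^2 = \ln\left(1 + \frac{\sigma^2}{\mu^2}\right), \qquad \lambda = \ln \mu - \frac{\zeta^2}{2}$$
λ=3.26
ζ=0.177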
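The moment relations can be evaluated directly. This sketch uses the sample standard deviation normalized by n-1, which is what gives the slide's value of 4.72:

```matlab
x = [25 20 28 33 26];          % cycles to failure
m = mean(x);                   % sample mean, 26.4
sd = std(x);                   % sample standard deviation (n-1), 4.72
% Invert mean = exp(lambda + zeta^2/2) and var = mean^2*(exp(zeta^2) - 1)
zeta = sqrt(log(1 + (sd/m)^2));
lambda = log(m) - zeta^2/2;
fprintf('lambda = %.3f, zeta = %.3f\n', lambda, zeta);
```

Compared with the MLE (λ=3.26, ζ≈0.163), the method of moments gives nearly the same λ but a larger ζ for this small sample.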
Minimum mean squared error: choose parameters to minimize the mean squared error between the measured data (normalized histogram) and the continuous distribution (pdf)
Essentially a curve fit
Excel
◦ Guess parameters
◦ Calculate sum of squares of errors
◦ Vary guessed parameters to minimize error (use the Solver)
Matlab
◦ Use fminsearch function
Solar insolation data
◦ Gather data
◦ Form histogram
◦ Normalize histogram by number of samples and width of bins
[Figure: solar insolation data and their histogram; values range from about 3500 to 4500]
[Figure: normalized histogram with the fitted normal pdf; Mean=3980 (fit), Mean=3915 (data)]
y=xlsread('matlabfit.xlsx','normal');  % data from sheet 'normal'
[s,t]=hist(y,8);                       % counts s at 8 bin centers t
s=s/(t(2)-t(1))/numel(y);              % divide by bin width and N so area = 1
zin=[mean(y) std(y)];                  % initial guess [mu sigma]
sumoferrs(zin,t,s)                     % error before fit
zout=fminsearch(@(z) sumoferrs(z,t,s),zin)
sumoferrs(zout,t,s)                    % error after fit
xplot=linspace(t(1),t(end),10*numel(t));
yplot=curve(xplot,zout);
plot(t,s,'+',xplot,yplot)
% Save these as curve.m and sumoferrs.m (or as local functions at script end)
function f=curve(x,z)        % normal pdf with z=[mu sigma]
f=normpdf(x,z(1),z(2));
end
function f=sumoferrs(z,x,y)  % sum of squared errors, fit vs histogram
f=sum((curve(x,z)-y).^2);
end
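How do we assess the inaccuracy of using the sample mean to estimate the population mean?
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
$$E[\bar{x}] = E\left[\frac{1}{n} \sum_{i=1}^{n} x_i\right] = \frac{1}{n}\, n\mu = \mu$$
$$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\, \mathrm{Var}\left(\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$$
The variance of the sum equals nσ² because the samples are independent.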
Expected value of the sample mean is equal to the population mean
Mean of sample is an unbiased estimator of the mean of the population
Variance of the sample mean (σ²/n) measures the sampling error; its square root, σ/√n, is the standard error
By the CLT, the sample mean is approximately Gaussian for large n
x̄ is approximately N(μ, σ/√n) (mean μ, standard deviation σ/√n)
Estimator for μ improves as n increases
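The results E[x̄]=μ and Var(x̄)=σ²/n do not require normal data, only independent samples. The sketch below checks them by simulation for an assumed exponential population with mean 2 (so σ=2); the population and sample size are illustrative choices, not values from the slides.

```matlab
% Assumed population (illustration only): exponential with mean 2, so sigma = 2
mu = 2; sigma = 2;
n = 10; reps = 50000;
X = -mu * log(rand(reps, n));     % inverse-CDF sampling of exponential variates
xbar = mean(X, 2);                % one sample mean per row
fprintf('mean of xbar = %.3f (mu = %g)\n', mean(xbar), mu);
fprintf('var of xbar  = %.4f (sigma^2/n = %.4f)\n', var(xbar), sigma^2/n);
```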
In the previous derivation, σ² is the population variance
This is generally not known
All we have is the sample variance (s²)
If the sample size is small, the distribution of (x̄ − μ)/(s/√n) will not be Gaussian
We can use Student's t-distribution:
$$f_T(t) = \frac{\Gamma\left(\frac{f+1}{2}\right)}{\sqrt{f\pi}\,\Gamma\left(\frac{f}{2}\right)} \left(1 + \frac{t^2}{f}\right)^{-(f+1)/2}$$
f=number of degrees of freedom
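$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2$$
$$E\left[s^2\right] = \frac{1}{n-1}\, E\left[\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2\right]$$
Expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$:
$$\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) = \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
Using $E[x_i^2] = \sigma^2 + \mu^2$ and $E[\bar{x}^2] = \sigma^2/n + \mu^2$:
$$E\left[s^2\right] = \frac{1}{n-1} \left(\sum_{i=1}^{n} E\left[x_i^2\right] - nE\left[\bar{x}^2\right]\right) = \frac{1}{n-1} \left[n\left(\sigma^2 + \mu^2\right) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] = \sigma^2$$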
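The derivation shows why the divisor is n−1. A short simulation makes the difference visible: with an assumed normal population of variance 4 and samples of size 5 (illustrative values only), the 1/(n−1) estimator averages to σ², while the 1/n estimator averages to (n−1)σ²/n = 3.2.

```matlab
% Assumed population: normal with sigma^2 = 4 (illustration only)
sigma2 = 4; n = 5; reps = 100000;
X = sqrt(sigma2) * randn(reps, n);   % each row is one sample of size n
s2_unbiased = var(X, 0, 2);          % divides by n-1
s2_biased   = var(X, 1, 2);          % divides by n
fprintf('E[s^2] (n-1) ~ %.3f, E[s^2] (n) ~ %.3f, sigma^2 = %g\n', ...
        mean(s2_unbiased), mean(s2_biased), sigma2);
```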
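Sample variance is an unbiased estimator of the population variance
$$\mathrm{Var}\left(s^2\right) = \frac{\mu_4}{n} - \frac{(n-3)\,\sigma^4}{n(n-1)}, \qquad \mu_4 = E\left[(x - \mu)^4\right]$$
For normal variates
$$(n-1)\,s^2 = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$
$$\frac{(n-1)\,s^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right)^2$$
This quantity follows a Chi-Square distribution with n-1 dof
The Chi-Square distribution approaches a normal distribution for large n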
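For normal data μ₄=3σ⁴, so the variance formula above reduces to Var(s²)=2σ⁴/(n−1). Equivalently, C=(n−1)s²/σ² should have the chi-square mean n−1 and variance 2(n−1). The sketch checks this by simulation; the population parameters and sample size are assumed for illustration.

```matlab
% Assumed normal population (illustration only)
mu = 10; sigma = 3; n = 8; reps = 100000;
X = mu + sigma * randn(reps, n);
C = (n - 1) * var(X, 0, 2) / sigma^2;   % should be chi-square with n-1 dof
fprintf('mean of C = %.3f (n-1 = %d)\n', mean(C), n - 1);
fprintf('var of C  = %.3f (2(n-1) = %d)\n', var(C), 2*(n - 1));
```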
Hypothesis testing is used to make decisions about a population based on a sample
Steps
◦ Define null and alternative hypotheses
◦ Identify test statistic
◦ Compute the test statistic from the sample
◦ Specify level of significance
Type I error: rejecting the null hypothesis when it is true
Type II error: failing to reject (accepting) the null hypothesis when it is false
◦ Define region of rejection (one tail or two?)
Type I error
◦ Probability equals the level of significance (α)
◦ Typically 1-5%
Type II error probability (β) is seldom used
We need the yield strength of rebar to be at least 38 psi
We order a sample of 25 rebars
Sample mean from 25 tests is 37.5 psi
Standard deviation of rebar strength σ=3 psi (known)
Use one-sided test
Hypotheses: null μ=38 psi; alt. μ<38 psi
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{37.5 - 38}{3/\sqrt{25}} = -0.833$$
$$z_{\alpha} = \Phi^{-1}(0.05) = \mathrm{norminv}(0.05, 0, 1) = -1.64$$
Since Z=−0.833 > −1.64, Z is not in the rejection region
So we cannot reject the null hypothesis, and the supplier is considered acceptable
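The one-sided z-test above can be scripted as follows. To keep the sketch runnable without the Statistics Toolbox, the critical value is computed with base MATLAB's erfcinv, using Φ⁻¹(p) = −√2·erfcinv(2p); norminv(alpha) returns the same value where the toolbox is installed.

```matlab
xbar = 37.5; mu0 = 38; sigma = 3; n = 25; alpha = 0.05;
Z = (xbar - mu0) / (sigma / sqrt(n));     % test statistic, -0.833
% Lower-tail critical value Phi^-1(alpha); same as norminv(alpha)
z_crit = -sqrt(2) * erfcinv(2 * alpha);   % -1.645
reject = Z < z_crit;                      % one-sided (lower-tail) test
fprintf('Z = %.3f, z_crit = %.3f, reject = %d\n', Z, z_crit, reject);
```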
Suppose the standard deviation is not known
Use Student's t-distribution
Sample standard deviation s=3.5 psi; x̄=37.5 psi; n=25
$$T = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{37.5 - 38}{3.5/\sqrt{25}} = -0.714, \qquad f = n - 1 = 24 \text{ dof}$$
$$t_{\alpha,\,f} = \mathrm{tinv}(0.05, 24) = -1.711$$
Since T=−0.714 > −1.711, T is not in the rejection region
So we cannot reject the null hypothesis, and the supplier is considered acceptable
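The t-test can be scripted the same way. As an assumption made for portability, the t quantile is found with base MATLAB's betainc and fzero instead of tinv: for t > 0, the lower-tail probability is P(T < −t) = ½·I_{f/(f+t²)}(f/2, 1/2), where I is the regularized incomplete beta function. Where the Statistics Toolbox is available, tinv(alpha, f) gives the same critical value.

```matlab
xbar = 37.5; mu0 = 38; s = 3.5; n = 25; alpha = 0.05;
f = n - 1;                               % degrees of freedom
T = (xbar - mu0) / (s / sqrt(n));        % test statistic, -0.714
% Lower-tail probability of Student's t for t > 0 (base MATLAB only)
lowerTail = @(t) 0.5 * betainc(f / (f + t^2), f/2, 0.5);
% Solve P(T < -t) = alpha; same as tinv(alpha, f)
t_crit = -fzero(@(t) lowerTail(t) - alpha, [1e-6 1e3]);   % about -1.711
reject = T < t_crit;
fprintf('T = %.3f, t_crit = %.3f, reject = %d\n', T, t_crit, reject);
```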
Sample size increased to 41
Sample mean=37.6 psi
Sample standard deviation=3.75 psi
Null hypothesis: variance σ²=9 psi²
Alternative hypothesis: variance σ²>9 psi²
Use Chi-Square distribution
$$C = \frac{(n-1)\,s^2}{\sigma^2} = \frac{(41-1)(3.75)^2}{9} = 62.5, \qquad \alpha = 0.025, \quad f = 40$$
$$c_{0.975,\,40} = \mathrm{chi2inv}(0.975, 40) = 59.34$$
Since C=62.5 > 59.34, C is in the (upper-tail) rejection region
So we reject the null hypothesis, and the supplier is not acceptable
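A scripted version of the variance test follows. As in the previous sketches, the critical value is computed with base MATLAB functions as an assumption for portability: the chi-square CDF with f dof is gammainc(x/2, f/2), and fzero inverts it. chi2inv(1-alpha, f) gives the same value where the Statistics Toolbox is installed.

```matlab
n = 41; s = 3.75; sigma0sq = 9; alpha = 0.025;
f = n - 1;                            % degrees of freedom
C = f * s^2 / sigma0sq;               % test statistic, 62.5
% Upper-tail critical value c_{1-alpha,f}; same as chi2inv(1-alpha, f)
c_crit = fzero(@(x) gammainc(x/2, f/2) - (1 - alpha), [1e-6 1e3]);  % about 59.34
reject = C > c_crit;                  % one-sided (upper-tail) test
fprintf('C = %.2f, c_crit = %.2f, reject = %d\n', C, c_crit, reject);
```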
In addition to the mean, standard deviation, etc., confidence intervals can help us characterize populations
For example, the sample mean gives a best estimate of the expected value of the population, and a confidence interval indicates how accurate that estimate is
A confidence interval is a range, computed from the sample, that contains the true parameter with a prescribed probability (the confidence level)
First, we’ll assume the variance is known
The central limit theorem states that the pdf of the mean of n individual observations from any distribution with finite mean and variance approaches a normal distribution as n approaches infinity
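$$K = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
$$P\left(k_{\alpha/2} < K \le k_{1-\alpha/2}\right) = 1 - \alpha$$
$$\langle\mu\rangle_{1-\alpha} = \left(\bar{x} + k_{\alpha/2}\frac{\sigma}{\sqrt{n}};\ \bar{x} + k_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
$$k_{\alpha/2} = \Phi^{-1}\left(\frac{\alpha}{2}\right), \qquad k_{1-\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) = -k_{\alpha/2}$$
where Φ is the CDF of the standard normal variate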
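The meaning of the confidence level can be checked by simulation: if many samples are drawn and the known-σ interval is built from each one, about 95% of the intervals should contain the true mean. The population (normal, μ=38, σ=3) and the number of repetitions are assumptions for illustration; k is computed with base MATLAB's erfcinv, equivalent to norminv(1-alpha/2).

```matlab
% Assumed population (illustration only): normal, mu = 38, sigma = 3
mu = 38; sigma = 3; n = 25; alpha = 0.05; reps = 20000;
k = -sqrt(2) * erfcinv(2 * (1 - alpha/2));   % k_{1-alpha/2} = 1.96
X = mu + sigma * randn(reps, n);             % reps independent samples
xbar = mean(X, 2);
lo = xbar - k * sigma / sqrt(n);             % lower limit of each interval
hi = xbar + k * sigma / sqrt(n);             % upper limit of each interval
coverage = mean(lo <= mu & mu <= hi);        % fraction that contain mu
fprintf('coverage = %.3f (nominal %.2f)\n', coverage, 1 - alpha);
```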
Measure strength of rebar
25 samples
Mean=37.5 psi
Standard deviation=3 psi
Find 95% confidence interval for mean
$$k_{\alpha/2} = k_{0.025} = \Phi^{-1}(0.025) = -1.96$$
$$k_{1-\alpha/2} = k_{0.975} = \Phi^{-1}(0.975) = 1.96$$
$$\langle\mu\rangle_{0.95} = \left(37.5 - 1.96\,\frac{3}{\sqrt{25}};\ 37.5 + 1.96\,\frac{3}{\sqrt{25}}\right) = (36.3;\ 38.7) \text{ psi}$$
So the 95% confidence interval for the mean strength is 36.3 to 38.7 psi
xbar=37.5; sig=3; n=25; alpha=0.05;  % sample mean, known sigma
ka=-norminv(1-alpha/2)       % k_{alpha/2} = -1.96
k1ma=-ka                     % k_{1-alpha/2} = 1.96
cil=xbar+ka*sig/sqrt(n)      % lower limit, 36.3 psi
ciu=xbar+k1ma*sig/sqrt(n)    % upper limit, 38.7 psi
What if the variance of the population (σ²) is not known?
That is, we only know the variance of the sample.
Let s=standard deviation of the sample
We can show that $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ does not follow a normal distribution, especially for small n
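We can show that this quantity follows a Student's t-distribution with f=n-1 degrees of freedom
$$f_T(t) = \frac{\Gamma\left(\frac{f+1}{2}\right)}{\sqrt{f\pi}\,\Gamma\left(\frac{f}{2}\right)} \left(1 + \frac{t^2}{f}\right)^{-(f+1)/2}$$
$$P\left(t_{\alpha/2,\,n-1} < \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{1-\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$\langle\mu\rangle_{1-\alpha} = \left(\bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}};\ \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$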
Measure strength of rebar
25 samples
Mean=37.5 psi
s=3.5 psi
Find 95% confidence interval for mean
The result is (36.06; 38.94) psi
xbar=37.5; s=3.5; n=25; alpha=0.05;
ka=-tinv(1-alpha/2,n-1);   % t_{alpha/2,24} = -2.064
kb=-tinv(alpha/2,n-1);     % t_{1-alpha/2,24} = 2.064
cil=xbar+ka*s/sqrt(n)      % 36.06 psi
ciu=xbar+kb*s/sqrt(n)      % 38.94 psi
Sometimes we only care about the upper or lower bounds
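Lower confidence limit
$$\langle\mu)_{1-\alpha} = \bar{x} - k_{1-\alpha}\frac{\sigma}{\sqrt{n}} \ \ (\sigma \text{ known}), \qquad \langle\mu)_{1-\alpha} = \bar{x} - t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}} \ \ (\sigma \text{ unknown})$$
Upper confidence limit
$$(\mu\rangle_{1-\alpha} = \bar{x} + k_{1-\alpha}\frac{\sigma}{\sqrt{n}} \ \ (\sigma \text{ known}), \qquad (\mu\rangle_{1-\alpha} = \bar{x} + t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}} \ \ (\sigma \text{ unknown})$$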
100 steel specimens – measure strength
Mean=2200 kgf; s=220 kgf
Specify the lower 95% confidence limit of the mean
Assume σ=s=220 kgf (large sample)
1−α=0.95; α=0.05
$$k_{0.95} = \Phi^{-1}(0.95) = 1.65$$
$$\langle\mu)_{0.95} = 2200 - 1.65\,\frac{220}{\sqrt{100}} = 2164 \text{ kgf}$$
Manufacturer has 95% confidence that the mean yield strength is at least 2164 kgf
Now only 15 steel specimens
Mean=2200 kgf; s=220 kgf
Specify the lower 95% confidence limit of the mean
$$t_{0.95,\,14} = 1.761$$
$$\langle\mu)_{0.95} = 2200 - 1.761\,\frac{220}{\sqrt{15}} = 2100 \text{ kgf}$$
Manufacturer has 95% confidence that the mean yield strength is at least 2100 kgf
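Both steel-specimen lower limits can be reproduced in one short script. The quantiles are computed with base MATLAB functions (erfcinv for the normal case, betainc with fzero for the t case), an assumption made so the sketch runs without the Statistics Toolbox; norminv(0.95) and tinv(0.95, 14) give the same values.

```matlab
xbar = 2200; s = 220; alpha = 0.05;

% Case 1: n = 100, sigma taken equal to s (normal quantile)
n1 = 100;
k = -sqrt(2) * erfcinv(2 * (1 - alpha));       % k_{0.95} = 1.645
lower1 = xbar - k * s / sqrt(n1);               % about 2164 kgf

% Case 2: n = 15, sigma unknown (t quantile with n-1 dof)
n2 = 15; f = n2 - 1;
% Solve P(T_f < -t) = alpha for t > 0; same as tinv(1-alpha, f)
tq = fzero(@(t) 0.5 * betainc(f / (f + t^2), f/2, 0.5) - alpha, [1e-6 1e3]);
lower2 = xbar - tq * s / sqrt(n2);              % about 2100 kgf

fprintf('n = 100: %.1f kgf, n = 15: %.1f kgf\n', lower1, lower2);
```

The slide rounds k to 1.65; with k = 1.645 the first limit is 2163.8 kgf, which still rounds to 2164 kgf.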
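Confidence interval for the variance of a normal population, where c denotes Chi-Square quantiles with n-1 dof:
$$P\left(c_{\alpha/2,\,n-1} < \frac{(n-1)\,s^2}{\sigma^2} \le c_{1-\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$\langle\sigma^2\rangle_{1-\alpha} = \left(\frac{(n-1)\,s^2}{c_{1-\alpha/2,\,n-1}};\ \frac{(n-1)\,s^2}{c_{\alpha/2,\,n-1}}\right)$$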
25 storms; sample variance of measured runoff is 0.36 in²
Find the upper 95% confidence limit for the variance
$$(\sigma^2\rangle_{0.95} = \frac{(n-1)\,s^2}{c_{\alpha,\,n-1}} = \frac{24 \times 0.36}{c_{0.05,\,24}} = \frac{8.64}{13.85} = 0.624 \text{ in}^2$$
So, with 95% confidence, the upper bound of the variance of the runoff is 0.624 in² and the upper bound of the standard deviation is 0.79 in
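s2=0.36; n=25; alpha=0.05;  % sample variance (in^2); avoid naming it var (built-in)
c=chi2inv(alpha,n-1)        % c_{0.05,24} = 13.85
ci=(n-1)*s2/c               % upper limit on variance, 0.624 in^2
si=sqrt(ci)                 % upper limit on standard deviation, 0.79 in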
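Suppose we are measuring distances: $d_1, d_2, \ldots, d_n$ are the measured distances
Distance estimate is
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$$
Standard error is
$$\sigma_{\bar{d}} = \frac{s}{\sqrt{n}}$$
◦ s=standard deviation of sample
◦ μ_d is the expected value of the mean d̄
$$\langle\mu_d\rangle_{1-\alpha} = \left(\bar{d} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}};\ \bar{d} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$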
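The distance interval can be computed as sketched below. The five measurements are hypothetical values chosen only to exercise the formula (the slides give no data), and the t quantile is found with base MATLAB's betainc and fzero as an assumption for portability; tinv(1-alpha/2, n-1) gives the same value.

```matlab
% Hypothetical repeated measurements of one distance (m); not from the slides
d = [100.2 100.5 99.8 100.1 100.4];
n = numel(d); f = n - 1; alpha = 0.05;
dbar = mean(d);                 % distance estimate
se = std(d) / sqrt(n);          % standard error s/sqrt(n)
% t_{1-alpha/2,f}: solve P(T_f < -t) = alpha/2 (same as tinv(1-alpha/2, f))
tcrit = fzero(@(t) 0.5 * betainc(f / (f + t^2), f/2, 0.5) - alpha/2, [1e-6 1e3]);
ci = [dbar - tcrit * se, dbar + tcrit * se];   % two-sided interval for mu_d
fprintf('dbar = %.3f m, 95%% CI = (%.3f; %.3f) m\n', dbar, ci(1), ci(2));
```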