Probability Density Functions - Problem Solving with Excel and Matlab

advertisement

Statistical Inferences

Jake Blanchard

Spring 2010

Uncertainty Analysis for Engineers 1

Introduction

Statistical inference=process of drawing conclusions from random data

Conclusions of this process are

“propositions,” for example

◦ Estimates

◦ Confidence intervals

◦ Credible intervals

◦ Rejecting a hypothesis

◦ Clustering data points

Part of this is the estimation of model parameters

Uncertainty Analysis for Engineers 2

Parameter Estimation

Point Estimation

◦ Calculate single number from a set of observational data

Interval Estimation

◦ Determine interval within which true parameter lies (along with confidence level)

Uncertainty Analysis for Engineers 3

Properties

Bias=expected value of estimator does not necessarily equal parameter

Consistency=estimator approaches parameter as n approaches infinity

Efficiency=smaller variance of parameter implies higher efficiency

Sufficient=utilizes all pertinent information in a sample

Uncertainty Analysis for Engineers 4

Point Estimation

Start with data sample of size N

Example: estimate fraction of voters who will vote for particular candidate

(estimate is based on random sample of voters)

Other examples: quality control, clinical trials, software engineering, orbit prediction

Assume successive samples are statistically independent

Uncertainty Analysis for Engineers 5

Estimators

Maximum likelihood

Method of moments

Minimum mean squared error

Bayes estimators

Cramer-Rao bound

Maximum a posteriori

Minimum variance unbiased estimator

Best linear unbiased estimator etc

Uncertainty Analysis for Engineers 6

Maximum Likelihood

Suppose we have a random variable x with pdf f(x;  )

Take n samples of x

What is value of  that will maximize the likelihood of obtaining these n observations?

Let L=likelihood of observing this set of values for x

Then maximize L with respect to 

Uncertainty Analysis for Engineers 7

Maximum Likelihood

L

 x

1

,

 

  x

2

,...

x n

;

L

 x

1

, log x

2

,...

x n

;

L

 x

1

,

 f ( x

1

;

)

 x

2

,...

x n

;

0

 

0 f ( x

2

;

)...

f ( x n

;

)

Uncertainty Analysis for Engineers 8

Example

Time between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3,

10.1, 5.2, 2.4, and 7.2 seconds

Assume exponential distribution

Find MLE for 

Uncertainty Analysis for Engineers 9

Solution

f

1

 e

 t /

L

 i

7 

1

1

 e

 t t

/

 

1

7 exp

 

 

1 i

7 

1 t i

 log( L )

 log(

 

L )

7 Log (

)

1

7

35 .

3

2

0 i

7 

1 t i

 

7 Log (

)

35 .

3

 

35 .

3

7

5 .

04

Uncertainty Analysis for Engineers 10

2-Parameter Example

Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)

Assume lognormal distribution

Uncertainty Analysis for Engineers 11

Solution

f

L

 i n 

1

1

2

  x i exp

1

2

 ln( x i

)

  



2

1

2

  ln( L )

  n ln( x i exp

1

2

 ln( x i

)

  



2

2

)

 n ln(

)

 i n 

1 ln

  i

1

2

2

1

2

 

 i n 

1

1 x i

 exp

1

2

2 i n 

1

 ln( x i

)

  

2 i n 

1

 ln( x i

)

  

2

 ln(

 

L )

 ln(

 

L )

1

2 i n 

1

 ln( x i

)

  

0

 n

1

3 i n 

1

 ln( x i

)

  

2 

0

 

1 n i n 

1 ln( x i

)

3 .

26

2 

1 n i n 

1

 ln(

0 .

164 x i

)

  

2 

0 .

027

Uncertainty Analysis for Engineers 12

Method of Moments

Use sample moments (mean, variance, etc.) to set distribution parameters

Uncertainty Analysis for Engineers 13

Example

Time between successive arrivals of vehicles at an intersection are 1.2, 3, 6.3,

10.1, 5.2, 2.4, and 7.2 seconds

Assume exponential distribution

Mean=5.05

Uncertainty Analysis for Engineers 14

2-Parameter Example

Measure cycles to failure of saturated sand (25, 20, 28, 33, 26 cycles)

Assume lognormal distribution

Mean=26.4

Standard Deviation=4.72

Solve for  and 

 =3.26

 =0.177

Uncertainty Analysis for Engineers 15

Solution

f

L

 i n 

1

1

2

  x i exp

1

2

 ln( x i

)

  



2

1

2

  ln( L )

  n ln( x i exp

1

2

 ln( x i

)

  



2

2

)

 n ln(

)

 i n 

1 ln

  i

1

2

2

1

2

 

 i n 

1

1 x i

 exp

1

2

2 i n 

1

 ln( x i

)

  

2 i n 

1

 ln( x i

)

  

2

 ln(

 

L )

 ln(

 

L )

1

2 i n 

1

 ln( x i

)

  

0

 n

1

3 i n 

1

 ln( x i

)

  

2 

0

 

1 n i n 

1 ln( x i

)

3 .

26

2 

1 n i n 

1

 ln(

0 .

164 x i

)

  

2 

0 .

027

Uncertainty Analysis for Engineers 16

Minimum Mean Square Error

Choose parameters to minimize mean squared error between measured data and continuous distribution

Essentially a curve fit

Uncertainty Analysis for Engineers 17

Approach

Excel

◦ Guess parameters

◦ Calculate sum of squares of errors

◦ Vary guessed parameters to minimize error

(use the Solver)

Matlab

◦ Use fminsearch function

Uncertainty Analysis for Engineers 18

Example

Solar insolation data

◦ Gather data

◦ Form histogram

◦ Normalize histogram by number of samples and width of bins

Uncertainty Analysis for Engineers 19

4500

4000

3500

3000

2500

2000

1500

1000

500

0

0

Scatter Plot and Histogram

5 10 15 20 25

12

10

8

6

4

2

0

3500

30 35

3700 3900 4100 4300

Uncertainty Analysis for Engineers

4500

20

Normal and Weibull Fits

0,0035

0,003

0,0025

0,002

0,0015

0,001

0,0005

0

3500

Mean=3980 (fit)

Mean=3915 (data)

3700 3900 4100 4500

0,003

0,0025

0,002

0,0015

0,001

0,0005

0

3500 3700 3900 4100 4300

Uncertainty Analysis for Engineers

4500

21

Excel Screen Shot

Uncertainty Analysis for Engineers 22

Excel Screen Shot

Uncertainty Analysis for Engineers 23

Solver Set Up

Uncertainty Analysis for Engineers 24

Matlab Script

y=xlsread('matlabfit.xlsx','normal')

[s,t]=hist(y,8); s=s/((max(t)-min(t))/8)/numel(y); numpts=numel(t); zin(1)=mean(t); zin(2)=std(t); sumoferrs(zin,t,s) zout=fminsearch(@(z) sumoferrs(z,t,s), zin) sumoferrs(zout,t,s) xplot=t(1):(t(end)-t(1))/(10*numel(t)):t(end); yplot=curve(xplot,zout); plot(t,s,'+',xplot,yplot)

Uncertainty Analysis for Engineers 25

Matlab Script

function f=curve(x,z) mu=z(1); sig=z(2); f=normpdf(x,mu,sig); function f=sumoferrs(z, x, y) f=sum((curve(x,z)-y).^2);

Uncertainty Analysis for Engineers 26

Sampling Distributions

How do we assess inaccuracy in using sample mean to estimate population mean?

x

1 n i n 

1 x i

 x

Var

 x

E

1 n i n 

1 x i

1 n

Var

1 n i n 

1 x i

 n

 

1 n

2

Var

 i n 

1 x i

1 n

2

2 n

Uncertainty Analysis for Engineers 27

Conclusions

Expected value of mean is equal to population mean

Mean of sample is unbiased estimator of mean of population

Variance of sample mean is sampling error

By CLT, sample mean is Gaussian for large n

Mean of x is N(  ,  /  n)

Estimator for  improves as n increases

Uncertainty Analysis for Engineers 28

Sample Mean with Unknown

In previous derivation,  is the population mean

This is generally not known

All we have is the sample variance (s 2 )

If sample size is small, distribution will not be Gaussian

We can use a “student’s t-distribution” f

T

( t )

  f f

1 f

/

/

2

2

1

 t

2 f



1

2

 f

1

 f=number of degrees of freedom

Uncertainty Analysis for Engineers 29

Distribution of Sample Variance

s

2  n

1

1 i n 

1

 x i

 x

2

E i n 

1 s

 n

1

1

E

 i n 

1

 x i

 x

2

 n

1

1

E

 i n 

1

  x i

    x

   

2

  x i

    x

   

2  i n 

1

 x i

  

2 

2

 x i

   x

    x

  

2

 i n 

1

 x i

    x

  n  x

    x i

    i

1

 i n 

1 i n 

1

 x i

    x

    x

  

 x i

    x

 

 x

  

2 n

2 n

 

 n

2

 i

1

2 n x x i

 i n 

1

 x i

  

2

 n

 x

  

2

E s

 n

1

1

E

 i n 

1

 x i

  

2

E

 

 n

1

1

 n

2  

2

 

2

 nE

 x

  

2

Uncertainty Analysis for Engineers 30

Conclusions

Sample variance is unbiased estimator of population variance

Var

 n

4

4

4

 n n

3

1

4

E

 x

  

4

For normal variates

( n

1 ) s

2  i n 

1

 x i

 x

2  i n 

1

 x i

  

2

 n

 x

  

2

Chi-Square

Distribution with n-1 dof

( n

1 ) s 2

2

 i n 

1

 2

 

 

/

 n

2

This approaches normal distribution for large n

Uncertainty Analysis for Engineers 31

Testing Hypotheses

Used to make decisions about population based on sample

Steps

◦ Define null and alternative hypotheses

◦ Identify test statistic

◦ Estimate test statistic, based on sample

◦ Specify level of significance

 Type I error: rejecting null hypothesis when it is true

 Type II error: accepting null hypothesis when it is false

◦ Define region of rejection (one tail or two?)

Uncertainty Analysis for Engineers 32

Level of Significance

Type I error

◦ Level of significance (  )

◦ Typically 1-5%

Type II error (  ) is seldom used

Uncertainty Analysis for Engineers 33

Example

We need yield strength of rebar to be at least 38 psi

We order sample of 25 rebars

Sample mean from 25 tests is 37.5 psi

Standard deviation of rebar strength =3 psi

Use one-sided test

Hypotheses: null =38; alt. <38

Uncertainty Analysis for Engineers 34

Solution

Z z

 x

  n

  

1 (

)

37 .

5

38

 

0 .

833

3

25

  

1 ( 0 .

05 )

 norminv ( 0 .

05 , 0 , 1 )

 

1 .

64

So we cannot reject the null hypothesis and the supplier is considered acceptable

Uncertainty Analysis for Engineers 35

Variation of This Example

Suppose standard deviation is not known

Use student’s t-distribution

Sample stand. dev. = 3.5 psi x t

37 .

5 psi s

T f

3 .

5 psi

 x

 

25 n

1

37 .

5

38

3 .

5

24 dof

25

 

0 .

714

 tinv ( 0 .

05 , 24 )

 

1 .

711

So we cannot reject the null hypothesis and the supplier is considered acceptable

Uncertainty Analysis for Engineers 36

Third Variation

Sample size increased to 41

Sample mean=37.6 psi

Sample standard deviation = 3.75 psi

Null-variance=9

Alternative-variance>9

Use Chi-Square distribution

Uncertainty Analysis for Engineers 37

Solution

C

 n

1

2

 s

2

41

1

9

2

3 .

75

2

0 .

025 f

40

62 .

5 c

0 .

975

 chi 2 inv ( 0 .

975 , 40 )

59 .

34

So we reject the null hypothesis and the supplier is not acceptable

Uncertainty Analysis for Engineers 38

Confidence Intervals

In addition to mean, standard deviation, etc., confidence intervals can help us characterize populations

For example, the mean gives us a best estimate of the expected value of the population, but confidence intervals can help indicate the accuracy of the mean

Confidence interval is defined as the range within which a parameter will lie – within a prescribed probability

Uncertainty Analysis for Engineers 39

CI of the Mean

First, we’ll assume the variance is known

The central limit theorem states that the pdf of the mean of n individual observations from any distribution with finite mean and variance approaches a normal distribution as n approaches infinity

Uncertainty Analysis for Engineers 40

CI of the Mean

K

 x

  n

N ( 0 , 1 )

P 

K

2

P

 x

K

2

CI

K

2

K

1

2

 x

  n

 n

K

1

2

1

 

 

1

 

1

1

 x

1

1

2

 

2

 x

K

1

2

K

2

 n

;

 n

1

  x

K

1

2

 n

 Is CDF of standard normal variate

Uncertainty Analysis for Engineers 41

Example

Measure strength of rebar

25 samples

Mean=37.5 psi

Standard deviation=3 psi

Find 95% confidence interval for mean

Uncertainty Analysis for Engineers 42

Solution

K

2

K

0 .

025

   

1

0 .

975

 

1 .

96

K

1

2

K

0 .

975

   

1

0 .

975

1 .

96

0 .

95

0 .

95

37 .

5

36 .

3 ;

1 .

96

38 .

7

3

;

25

 psi

37 .

5

1 .

96

3

25

So the mean of the strength falls between 36.3 and 38.7 with a 95% confidence level

Uncertainty Analysis for Engineers 43

The Script

mu=37.5

sig=3 n=25 alpha=0.05

ka=-norminv(1-alpha/2) k1ma=-ka cil=mu+ka*sig/sqrt(n) ciu=mu-ka*sig/sqrt(n)

Uncertainty Analysis for Engineers 44

Variance Not Known

What if the variance of the population (  ) is not known?

That is, we only know variance of sample.

Let s=standard deviation of sample

We can show that x s

  n does not conform to a normal distribution, especially for small n

Uncertainty Analysis for Engineers 45

Variance Not Known

We can show that this quantity follows a

Student’s t-distribution with n-1 degrees of freedom (f) f t

( t )

P 

 t

2

, n

1

 f

1 f

/

2

2



1

 t

2 f 



1

2

 f

1

 x

  s n

 t

1

2

, n

1

1

 

Uncertainty Analysis for Engineers 46

Example

Measure strength of rebar

25 samples

Mean=37.5 psi s=3.5 psi

Find 95% confidence interval for mean

Uncertainty Analysis for Engineers 47

Script

Result is 36.06, 38.94

xbar=37.5; s=3.5; n=25; alpha=0.05; ka=-tinv(1-alpha/2,n-1); kb=-tinv(alpha/2,n-1); cil=xbar+ka*s/sqrt(n) ciu=xbar+kb*s/sqrt(n)

Uncertainty Analysis for Engineers 48

One-Sided Confidence Limit

Sometimes we only care about the upper or lower bounds

Lower 

)

1

 

 x

K

1

 

 n

)

1

 

 x

 t

1

 

, n

1

Upper

1

 

 x

K

1

 

 n s n

1

 

 x

 t

1

 

, n

1 s n

Uncertainty Analysis for Engineers 49

Example

100 steel specimens – measure strength

Mean=2200 kgf; s=220 kgf

Specify 95% confidence limit of mean

Assume  =s=220 kgf

1 =0.95;  =0.05

k

0 .

95

  

1 ( 0 .

95 )

1 .

65

 

0 .

95

2200

1 .

65

220

100

2164

Manufacturer has

95% confidence that yield strength is at least 2164 kgf

Uncertainty Analysis for Engineers 50

Example

Now only 15 steel specimen

Mean=2200 kgf; s=220 kgf

Specify 95% confidence limit of mean t

0 .

95 , 14

1 .

761

 

0 .

95

2200

1 .

761

220

15

2100

Manufacturer has

95% confidence that yield strength is at least 2100 kgf

Uncertainty Analysis for Engineers 51

Confidence Interval of Variance

P

 c

2

, n

1

2

1

 

 n

1

2 s

2

 c

1

2

, n

1 

1

 

 

  n

1

 s

2

;

 n

1

 c

1

2

, n

1 c

2

, n

1 s

2

Uncertainty Analysis for Engineers 52

Example

25 storms, sample variance for measured runoff is 0.36 in 2

Find upper 95% confidence limit for variance

2

1

 

 n

1

 s

2 c

, n

1

0 .

624 in

2

So, we can say, with 95% confidence, that the upper bound of the variance of the runoff is

0.624 in 2 and the upper bound of the standard deviation is 0.79 in

Uncertainty Analysis for Engineers 53

Script

var=0.36

n=25 alpha=0.05

c=chi2inv(alpha,n-1) ci=1/c*var*(n-1) si=sqrt(ci)

Uncertainty Analysis for Engineers 54

Measurement Theory

Suppose we are measuring distances d

1

, d

2

, …, d n are measured distances

Distance estimate is d

1 n d i i n 

1

Standard error is

 d

 s n

◦ s=standard deviation of sample

◦ d is the expected value of the mean d

1

 

 d

 t

2

, n

1 s n

; d

 t

1

2

, n

1

Uncertainty Analysis for Engineers s n

55

Download