
Modeling and Estimation of Uncertain Systems

Lecture 1:

Uncertainty I: Probability and Stochastic Processes


Modeling and Estimation

What is the problem?

Two distinct activities:

Modeling is the activity of constructing a mathematical description of a system of interest; it encompasses specification of the model structure and its parameterization.

Estimation is concerned with determining the “state” of a system relative to some model.

Broadly speaking, there are four possible cases:

Well defined system, rich data source(s),

Poorly defined system, rich data source(s),

Well defined system, sparse data,

Poorly defined system, sparse data.

These four cases are listed in order of increasing difficulty.

The classification is characterized by the amount of a priori information that can be embedded in the model and the amount of data available for inference.

Modeling and Estimation of Uncertain Systems

Lecture 1

May 15, 2011


Uncertainty

“What” is the problem!

Uncertainty permeates every aspect of this problem:

What parts of the system are important?

What are the right descriptions of the constituents? What is the right way to describe their interactions?

What are the available observations? What are the dynamics of the observation processes?....

Two types of uncertainty:

Epistemic: Uncertainty due to a lack of knowledge of quantities or processes of the system or the environment. AKA subjective uncertainty, reducible uncertainty, or model form uncertainty.

Aleatory: Inherent variation associated with the physical system or the environment. AKA variability, irreducible uncertainty, or stochastic uncertainty.

Many different “uncertainty” theories, each with their own strengths and weaknesses.



Syllabus

Lecture Series Agenda

1. Uncertainty I: Probability and Stochastic Processes
2. Filtering I: Kalman Filtering
3. Filtering II: Estimation – The Big Picture
4. Uncertainty II: Information Theory
5. Model Inference I: Symbolic Dynamics and the Thermodynamic Formalism
6. Model Inference II: Probabilistic Grammatical Inference
7. Uncertainty III: Representations of Uncertainty
8. Decision under Uncertainty: Plausible Inference



Probability and Stochastic Processes

Lecture 1 Agenda

1. What is probability?
   a) Frequentist Interpretation
   b) Bayesian Interpretation
2. Calculus of Probability
   a) Probability Spaces
   b) Kolmogorov's Axioms
   c) Conditioning and Bayes' Theorem
3. Random Variables (RVs)
   a) Distribution and Density Functions
   b) Joint and Marginal Distributions
   c) Expectation and Moments
4. Stochastic Processes
   a) Stationarity
   b) Ergodicity


Frequentist Interpretation


Physical Interpretation

Probabilities of events are associated with their relative frequencies in a long run of trials:

Associated with random physical systems (e.g., dice),

Makes sense only in the context of well-defined situations.

Frequentism

If $n_A$ is the number of occurrences of event $A$ in $n$ trials, then

$$\Pr\{A\} = \lim_{n \to \infty} \frac{n_A}{n} = p$$

The odds of getting "Heads" in a fair coin toss is ½ because it has been demonstrated empirically, not because there are two equally likely events.

Propensity Theory

Interpret probability as the "propensity" for an event's occurrence,

Explain long-run frequencies via the Law of Large Numbers (LLN):

$$\Pr\left\{\left|\frac{n_A}{n} - p\right| \le \varepsilon\right\} \to 1 \quad \text{as } n \to \infty$$
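The frequentist definition and the LLN can be illustrated with a short simulation; a minimal sketch (the function name and seed values are illustrative):

```python
import random

def relative_frequency(p, n, seed=0):
    """Estimate Pr{A} as n_A / n, the relative frequency of an event A
    that occurs with true probability p in n independent trials."""
    rng = random.Random(seed)
    n_A = sum(1 for _ in range(n) if rng.random() < p)
    return n_A / n

# As n grows, n_A / n converges toward p = 0.5 (the LLN at work).
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(0.5, n))
```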



Bayesian Interpretation

Evidentiary Interpretation

Probability can be assigned to any statement whatsoever, even when no random process is involved:

Represents subjective plausibility or the degree to which the statement is supported by available evidence,

Interpreted as a “measure of a state of knowledge.”

Bayesian approach specifies a prior probability which is then updated in the light of new information.

Objectivism

Bayesian statistics can be justified by the requirements of rationality and consistency and interpreted as an extension of logic,

Not dependent upon belief.

Subjectivism

Probability is regarded as a measure of the degree of belief of the individual assessing the uncertainty of a particular situation,

Rationality and consistency constrain the probabilities.



Probability Spaces

Basis for Axiomatic Probability Theory

A probability space is a triple $(\Omega, F, \rho)$:

$\Omega$ is the set of all possible outcomes, known as the sample space,

$F$ is a set of events, where each event is a subset of $\Omega$ containing zero or more outcomes; $F$ must form a $\sigma$-algebra under complementation and countable intersection,

$\rho$ is a measure of the probability of an event and is called a probability measure.

Example: pairs of (fair) coin tosses:

$$\Omega = \{HH, HT, TH, TT\}$$

$$F = 2^{\Omega} = \{\varnothing, \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH, HT\}, \ldots, \Omega\}$$

$$\rho(E) = |E| / |\Omega|$$

Describes processes containing states that occur randomly.
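The coin-toss example can be reproduced by enumeration; a minimal sketch (names such as `power_set` and `rho` are illustrative):

```python
from itertools import combinations

# Sample space for a pair of fair coin tosses.
omega = ("HH", "HT", "TH", "TT")

def power_set(outcomes):
    """All subsets of the sample space: the largest sigma-algebra F."""
    return [set(c) for r in range(len(outcomes) + 1)
            for c in combinations(outcomes, r)]

F = power_set(omega)

def rho(event):
    """Uniform probability measure: rho(E) = |E| / |Omega|."""
    return len(event) / len(omega)

print(len(F))             # 16 events for |Omega| = 4
print(rho({"HH", "HT"}))  # 0.5
```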


Kolmogorov Axioms

The Probability Axioms

Constraints on $\rho$ are needed to ensure consistency.

Kolmogorov Axioms:

Non-negativity: The probability of an event is a non-negative real number: $\rho(E) \ge 0$.

Unit measure: The probability that some event in the entire sample space occurs is one: $\rho(\Omega) = 1$.

$\sigma$-additivity: The probability of the union of a countable collection of pairwise disjoint events $\{A_n\}_{n=1}^{\infty} \subseteq F$ is equal to the sum of the probabilities of the individual events:

$$\rho\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \rho(A_n)$$

A measure which satisfies these axioms is known as a probability measure.

Not the only set of probability axioms, merely the most common.



Conditioning

Basic Probabilistic Inference

Assume that when an event occurs we learn only that it lies in the event $\mathcal{E}$. What is the probability that $E$ has occurred, given that $\mathcal{E}$ has occurred?

If $E \cap \mathcal{E} = \varnothing$, then $\rho(E \mid \mathcal{E}) = 0$,

The relative probabilities of events within $\mathcal{E}$ remain unchanged, i.e., if $E_1, E_2 \subseteq \mathcal{E}$ with $\rho(E_2) > 0$, then

$$\frac{\rho(E_1)}{\rho(E_2)} = \frac{\rho(E_1 \mid \mathcal{E})}{\rho(E_2 \mid \mathcal{E})}$$

A little bit of algebra yields

$$\rho(E \mid \mathcal{E}) = \frac{\rho(E \cap \mathcal{E})}{\rho(\mathcal{E})}$$

We call $\rho(E \mid \mathcal{E})$ the conditional probability of $E$ and say that it is conditioned on $\mathcal{E}$.
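For finite sample spaces the defining formula can be checked directly by summing outcome probabilities; a small sketch (the event names are illustrative):

```python
# Outcome probabilities for a pair of fair coin tosses.
p = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}

def rho(event):
    return sum(p[w] for w in event)

def conditional(E, C):
    """rho(E | C) = rho(E & C) / rho(C), the defining formula above."""
    return rho(E & C) / rho(C)

at_least_one_head = {"HH", "HT", "TH"}
first_is_head = {"HH", "HT"}

# Given at least one head, the chance the first toss was a head is 2/3.
print(conditional(first_is_head, at_least_one_head))
```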



Conditioning Example

Converging to the right answer…

Consider a biased coin that has a bias (towards heads) of either $b_H = 2/3$ or $b_T = 1/3$;

Assume $b_T$ is deemed more likely, with $\rho(b_T) = 0.99$,

Assume the coin is tossed 25 times and heads comes up 19 times…

We’ve probably made a bad assumption and would like to update the probability based upon the new information:

There are $2^{26}$ possible events ($2^{25}$ sequences for each of the two biases),

The prior probability of the coin having bias $b_T$ and getting a particular sequence $E_n$ with $n$ heads is

$$\rho(b_T \cap E_n) = 0.99 \left(\frac{1}{3}\right)^{n} \left(\frac{2}{3}\right)^{25-n}$$



Conditioning Example (Cont.)

Converging to the right answer…

Thus, given 19 heads, we have

$$\rho(b_T \cap E_{19}) = 0.99 \left(\frac{1}{3}\right)^{19} \left(\frac{2}{3}\right)^{6} \qquad \rho(b_H \cap E_{19}) = 0.01 \left(\frac{2}{3}\right)^{19} \left(\frac{1}{3}\right)^{6}$$

and

$$\rho(E_{19}) = \rho(b_T \cap E_{19}) + \rho(b_H \cap E_{19})$$

Conditioned on seeing the sequence $E_{19}$, the probability that the coin has bias $b_T$ is thus

$$\rho(b_T \mid E_{19}) = \frac{0.99\,(1/3)^{19}(2/3)^{6}}{\rho(E_{19})} = \frac{99}{99 + 2^{13}} \approx 0.01$$
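The posterior is easy to verify exactly with rational arithmetic; a sketch of the computation (the function and variable names are illustrative):

```python
from fractions import Fraction

def joint(prior, head_bias, heads, tosses=25):
    """rho(bias & E_n): prior probability of the bias times the
    probability of one particular sequence with `heads` heads."""
    return prior * head_bias**heads * (1 - head_bias)**(tosses - heads)

jT = joint(Fraction(99, 100), Fraction(1, 3), 19)  # rho(b_T & E_19)
jH = joint(Fraction(1, 100), Fraction(2, 3), 19)   # rho(b_H & E_19)

posterior_bT = jT / (jT + jH)
print(posterior_bT)         # exactly 99/(99 + 2**13)
print(float(posterior_bT))  # ≈ 0.0119
```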



Bayes’ Rule

Inverse Conditioning

Bayes’ Rule: for $\rho(E), \rho(\mathcal{E}) > 0$,

$$\rho(\mathcal{E} \mid E) = \frac{\rho(E \mid \mathcal{E})\,\rho(\mathcal{E})}{\rho(E)}$$

One of the most widely used results in probability theory, as it provides a fundamental mechanism for updating beliefs.

Example: consider a test known to be 99% reliable. If this test indicates that an event $E$ has occurred, how likely is it that the event has actually occurred?

What does 99% reliable mean?

Assume that it means:

– 99% of the time $E$ occurs, the test correctly indicates that it has occurred (a 1% false negative rate), and,

– 99% of the time that $E$ does not occur, the test correctly indicates that it has not occurred (a 1% false positive rate).



Bayes’ Rule Example (Cont.)

The Reliability of Tests

Let $P$ be the event that the test indicates that $E$ has occurred. By Bayes’ rule we have

$$\rho(E \mid P) = \frac{\rho(P \mid E)\,\rho(E)}{\rho(P)}$$

Since the (positive) reliability is 99%, we have $\rho(P \mid E) = 0.99$,

Note, we cannot compute $\rho(E \mid P)$ without additional information, i.e., $\rho(E)$.

Though it looks like we also need $\rho(P)$, we can in fact construct it from the reliability (positive and negative):

$$\rho(E \cap P) = \rho(P \mid E)\,\rho(E) = 0.99\,\rho(E)$$

$$\rho(\bar{E} \cap P) = \rho(P \mid \bar{E})\,\rho(\bar{E}) = 0.01\,\bigl(1 - \rho(E)\bigr)$$



Bayes’ Rule (Cont.)

The Reliability of Tests

Hence

$$\rho(P) = 0.99\,\rho(E) + 0.01\,\bigl(1 - \rho(E)\bigr) = 0.01 + 0.98\,\rho(E)$$

Substituting into Bayes’ rule produces

$$\rho(E \mid P) = \frac{0.99\,\rho(E)}{0.01 + 0.98\,\rho(E)}$$

N.B. We cannot determine the probability of event E conditioned on a positive test result P without knowing the probability of E , i.e., ½ ( E ) .

Only if the event is very common does the reliability approximate the probability!

ρ(E):      0.001    0.01    0.3333…
ρ(E | P):  ≈ 0.09   0.5     0.98
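The table can be reproduced from the closed form above; a minimal sketch (the function name is illustrative):

```python
def posterior_given_positive(p_E, reliability=0.99):
    """rho(E | P) = 0.99 rho(E) / (0.01 + 0.98 rho(E)) for a test
    with equal positive and negative reliability."""
    p_P = reliability * p_E + (1 - reliability) * (1 - p_E)
    return reliability * p_E / p_P

# Reproduces the table: rare events yield unimpressive posteriors.
for p_E in (0.001, 0.01, 1 / 3):
    print(p_E, round(posterior_given_positive(p_E), 2))
```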


Random Variables

Alea iacta est

A random variable (RV) x is a rule for assigning a number x(E) to every event E. This function must satisfy two conditions:

The set $\{x \le x\}$ is an event for every $x$,

The probabilities of the events $\{x = +\infty\}$ and $\{x = -\infty\}$ equal zero.

The key observation here is that random variables provide a tool for structuring sample spaces.

In many cases, some decision or diagnosis must be made upon the basis of expectation and RVs play a key role in computing these expectations.

Note that a random variable does not have a value per se. A realization of a random variable does, however, have a definite value.




Probability Distributions

Spreading the Wealth Around

The elements of the sample space $\Omega$ that are contained in the event $\{x \le x\}$ change as the number $x$ takes on different values. The probability $\Pr\{x \le x\}$ of the event $\{x \le x\}$ is, therefore, a number that depends on $x$. This number is expressed in terms of the (cumulative) distribution function of the random variable $x$, denoted $F_x(x)$. Formally, we say $F_x(x) = \Pr\{x \le x\}$ for every $x$. The derivative

$$f_x(x) = \frac{dF_x(x)}{dx}$$

is known as the density function and is closely related to the measure $\rho$ introduced earlier. For our purposes, we will treat this density function as the specification of $\rho$.


Distribution Example

Normal Distribution


CDF:

$$F_x(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sqrt{2\sigma^2}}\right)\right]$$

PDF:

$$f_x(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Distributions can be discrete, continuous, or hybrid.

[Figures: discrete, continuous, and hybrid CDFs]
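The two formulas are consistent: integrating the PDF recovers the erf-based CDF. A quick numerical check with the standard library (μ = 0, σ = 1; the grid bounds are illustrative):

```python
import math

def pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / math.sqrt(2 * sigma**2)))

# Riemann sum of the PDF from -8 to 0 should match F_x(0) = 0.5.
dx = 1e-3
approx = sum(pdf(-8 + i * dx) * dx for i in range(int(8 / dx)))
print(round(cdf(0.0), 4), round(approx, 4))  # both ≈ 0.5
```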


Joint Distributions

Building Up Multivariate Distributions

A probability distribution that is a function of multiple RVs is a multivariate distribution and is defined by the joint distribution

$$F(x_1, \ldots, x_N) = \Pr\{x_1 \le x_1, \ldots, x_N \le x_N\}$$

and the associated joint density function can be determined via partial differentiation:

$$f(x_1, \ldots, x_N) = \left.\frac{\partial^N F}{\partial x_1 \cdots \partial x_N}\right|_{x}$$

The probability that an event lies within a domain $D$ is thus

$$\Pr\{(x_1, \ldots, x_N) \in D\} = \int_D f_{x_1,\ldots,x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_N$$



Independence and Marginal Distributions

Breaking Down Multivariate Distributions

We say that the RVs are independent if

$$f_{x_1,\ldots,x_N}(x_1, \ldots, x_N) = f_{x_1}(x_1) \cdots f_{x_N}(x_N)$$

The statistics of a subset of the random variables of a multivariate distribution are known as marginal statistics. The associated distributions are known as marginal distributions and are defined by

$$f_{x_i}(x_i) = \int_{D_1} \cdots \int_{D_{i-1}} \int_{D_{i+1}} \cdots \int_{D_N} f_{x_1,\ldots,x_N}(x_1, \ldots, x_N)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_N$$

The distribution of the marginal variables is said to be obtained by marginalizing over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.
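For discrete RVs, marginalizing means summing the joint over the discarded variable, and independence means the joint factors into its marginals. A small sketch (the joint table is made up for illustration):

```python
# A discrete joint distribution over pairs (x1, x2).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

def marginal(joint, axis):
    """Sum out the other variable, keeping the one at `axis` (0 or 1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

m1, m2 = marginal(joint, 0), marginal(joint, 1)
print(m1, m2)  # x1 marginal ≈ {0: 0.3, 1: 0.7}; x2 marginal ≈ {0: 0.4, 1: 0.6}

# Independence check: does the joint factor as m1[a] * m2[b]?
independent = all(abs(joint[(a, b)] - m1[a] * m2[b]) < 1e-12
                  for (a, b) in joint)
print(independent)  # False: this joint does not factor
```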



Example Multivariate Distributions

Visualizing Joint and Marginal Distributions


[Figure: a joint distribution and its marginal distributions]

Marginal distributions are projections of the joint distribution


Expected Values

What did you expect?

The expected value, or mean, of an RV $x$ is defined by the integral

$$E\{x\} = \int_{-\infty}^{\infty} x f_x(x)\, dx$$

This is commonly denoted $\eta_x$ or just $\eta$.

For RVs of discrete (lattice) type, we obtain

$$E\{x\} = \sum_i p_i x_i, \qquad p_i = \Pr\{x = x_i\}$$

The conditional mean or conditional expected value is obtained by replacing $f_x(x)$ with the conditional density $f(x \mid \mathcal{E})$:

$$E\{x \mid \mathcal{E}\} = \int_{-\infty}^{\infty} x f(x \mid \mathcal{E})\, dx$$
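Both the discrete sum and the continuous integral are straightforward to evaluate; a minimal sketch using a fair die and a standard normal density (the grid bounds are illustrative):

```python
from fractions import Fraction
import math

# Discrete: E{x} = sum_i p_i x_i for a fair six-sided die.
mean_die = sum(Fraction(1, 6) * x for x in range(1, 7))
print(mean_die)  # 7/2, i.e. 3.5

# Continuous: E{x} as a Riemann sum of x f_x(x) for a standard
# normal density, so the mean should come out ≈ 0.
def pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

dx = 1e-3
eta = sum((i * dx) * pdf(i * dx) * dx for i in range(-8000, 8001))
print(round(eta, 6))  # ≈ 0
```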




Variance and Higher Moments

Concentration and Distortion

The variance is defined by the integral

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \eta)^2 f_x(x)\, dx$$

The constant $\sigma$, also denoted $\sigma_x$, is called the standard deviation of $x$,

The variance measures the concentration of probability mass near the mean $\eta$.

This is the second (central) moment of the distribution; other moments of interest are:

Moments: $m_n = E\{x^n\} = \int_{-\infty}^{\infty} x^n f(x)\, dx$

Central Moments: $\mu_n = E\{(x - \eta)^n\} = \int_{-\infty}^{\infty} (x - \eta)^n f(x)\, dx$

Absolute Moments: $E\{|x|^n\}$ and $E\{|x - \eta|^n\}$

Generalized Moments: $E\{(x - a)^n\}$ and $E\{|x - a|^n\}$
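These definitions can be checked numerically against known values for a normal density, whose central moments satisfy μ₂ = σ² and μ₄ = 3σ⁴; a sketch (grid spacing and integration span are illustrative choices):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def central_moment(n, mu=1.0, sigma=2.0, dx=1e-3, span=10.0):
    """mu_n = E{(x - eta)^n} via a Riemann sum over mu ± span*sigma."""
    lo = mu - span * sigma
    steps = int(2 * span * sigma / dx)
    return sum((lo + i * dx - mu)**n * normal_pdf(lo + i * dx, mu, sigma) * dx
               for i in range(steps))

print(round(central_moment(1), 3))  # ≈ 0 (symmetry about the mean)
print(round(central_moment(2), 3))  # ≈ 4.0 (sigma^2)
print(round(central_moment(4), 3))  # ≈ 48.0 (3 sigma^4)
```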



Stochastic Processes

Generalized RVs

A stochastic process x( t ) is a rule for assigning a function x( t , E ) to every event E .

We shall denote stochastic processes by x( t ) and hence x( t ) can be interpreted several ways:

A family, or ensemble , of functions x( t , E ) [ t and E are variable] ,

A single time function (or sample of the process) [ E is fixed],

A random variable [ t is fixed],

A number [ t and E are fixed].

Examples:

Brownian motion: x(t) consists of the motion of all particles (the ensemble); a realization x(t, E_i) is the motion of a specific particle,

A family of pure sine waves: a single sample of the process is $x(t; E_i) = r(E_i) \cos(\omega t + \phi(E_i))$.



Statistics of Stochastic Processes

Time Dependence

First-order properties:

For a specific $t$, x(t) is an RV with distribution

$$F(x; t) = \Pr\{x(t) \le x\}$$

$F(x; t)$ is called the first-order distribution of x(t). Its derivative w.r.t. $x$ is called the first-order density of x(t):

$$f(x; t) = \frac{\partial F(x; t)}{\partial x}$$

Second-order properties:

The mean $\eta(t)$ of x(t) is the expected value of the RV x(t):

$$\eta(t) = E\{x(t)\} = \int_{-\infty}^{\infty} x f(x; t)\, dx$$

The autocorrelation $R(t_1, t_2)$ of x(t) is the expected value of the product $x(t_1)\,x(t_2)$:

$$R(t_1, t_2) = E\{x(t_1)\,x(t_2)\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2; t_1, t_2)\, dx_1\, dx_2$$

The autocovariance $C(t_1, t_2)$ of x(t) is the covariance of the RVs $x(t_1)$ and $x(t_2)$:

$$C(t_1, t_2) = R(t_1, t_2) - \eta(t_1)\,\eta(t_2)$$



Stationarity

Hard to hit a moving target

A stochastic process x(t) is called strict-sense stationary (SSS) if its statistical properties are invariant to a shift of the origin:

$$f(x_1, \ldots, x_n; t_1, \ldots, t_n) = f(x_1, \ldots, x_n; t_1 + c, \ldots, t_n + c)$$

for any $c$.

A stochastic process is called wide-sense stationary (WSS) if its mean is constant,

$$E\{x(t)\} = \eta$$

and its autocorrelation depends only on $\tau = t_1 - t_2$:

$$E\{x(t + \tau)\, x^{*}(t)\} = R(\tau)$$

An SSS process is WSS.

Stationarity basically says that the statistical properties don’t evolve in time.



Ergodicity

Mixing it up

Ergodicity is a property connected with the homogeneity of a process. A process whose time average is the same as its space or ensemble average is said to be mean-ergodic.

This definition can be extended to include other statistics as well

(e.g., covariance),

This is also a measure of how well the process “mixes.”

Example: Brownian motion

The average motion of a specific particle will tend toward the ensemble average of the motion of all of the particles.

Ergodicity is important because it tells us how long or how often a process must be sampled in order for its statistics to be estimated.
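A minimal numerical contrast between a mean-ergodic and a non-ergodic process (the parameters and seed are illustrative):

```python
import random

rng = random.Random(42)
T = 100_000

# Mean-ergodic example: i.i.d. Gaussian samples with mean 1.  The time
# average over one long realization matches the ensemble average.
one_path = [rng.gauss(1.0, 1.0) for _ in range(T)]
time_avg = sum(one_path) / T
ensemble = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # x(t0) across realizations
ens_avg = sum(ensemble) / len(ensemble)
print(round(time_avg, 2), round(ens_avg, 2))  # both ≈ 1.0

# Non-ergodic example: x(t) = a with a = ±1 frozen per realization.
# Each time average equals that realization's a, not the ensemble mean 0.
a = rng.choice([-1.0, 1.0])
print(sum(a for _ in range(T)) / T)  # ±1.0, never 0
```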



References

Some Good Books….

Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, Third Ed., McGraw-Hill, New York, NY, 1991.

E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.

Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, NY, 1971.
