Review Slides I

Probability and Measure
September 2, 2011
Nonparametric Bayesian
Fundamental Problem: Estimating a distribution from a collection of
data E. (X is a distribution-valued random variable.)
P(X|E) ∝ P(E|X) P(X)
P(X|E) is the posterior distribution given evidence E.
P(E|X) is the likelihood of E given the distribution X.
P(X) is the prior distribution of X.
For given data E, we don’t assume any particular form for X.
For example, X need not be Gaussian or belong to any other particular
family of distributions.
P(X|E) ∝ P(E|X) P(X)
As a simple example, consider a set with three elements {1, 2, 3}.
Suppose we observe a sequence of occurrences {1, 1, 2, 2, 1, 2, 1, 2, 3, 1,
2, 2, 1, 2, 1} and we believe that there is an unknown probability
distribution X that drives this process. That is, the points are generated
independently by sampling from the distribution X.
The goal is to estimate X. Clearly, one solution is X ≈ (7/15, 7/15, 1/15).
This is the empirical estimate. For a small number of data points, it can be
far from the true X.
Suppose we have a prior distribution on X (a distribution on distributions),
say a Dirichlet distribution (with parameter alpha).
How does the posterior change?
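For the three-element example, the question can be answered in closed form, because the Dirichlet prior is conjugate to independent categorical sampling: the posterior is again Dirichlet, with each parameter increased by the corresponding count. The sketch below is a minimal illustration. The choice alpha = (1, 1, 1) is an assumption made only for this example (the slides do not fix alpha), and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Observed sequence from the example (15 draws from the set {1, 2, 3}).
data = [1, 1, 2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 2, 1]
outcomes = [1, 2, 3]

counts = Counter(data)
n = len(data)

# Empirical estimate: relative frequency of each outcome.
empirical = {k: Fraction(counts[k], n) for k in outcomes}

# ASSUMPTION: a symmetric Dirichlet prior with alpha = (1, 1, 1).
# The slides only say "with parameter alpha"; this value is illustrative.
alpha = {k: Fraction(1) for k in outcomes}

# Conjugacy: the posterior is Dirichlet with parameters alpha_k + n_k.
posterior_alpha = {k: alpha[k] + counts[k] for k in outcomes}

# Mean of the posterior Dirichlet: (alpha_k + n_k) / (sum of alpha + n).
total = sum(posterior_alpha.values())
posterior_mean = {k: posterior_alpha[k] / total for k in outcomes}

print("empirical estimate:", empirical)
print("posterior parameters:", posterior_alpha)
print("posterior mean:", posterior_mean)
```

Under this assumed prior, the posterior mean pulls the estimate slightly toward the uniform distribution, so the rarely observed outcome 3 receives a little more mass than its empirical frequency.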
[Figure: diagram with labels 1, 2, 3]
Nonparametric Bayesian
The goal is to do a similar thing with a more general base set than
{1, 2, 3}. For example, with the set of real numbers R, or the unit
interval [0, 1], as the base set.
Difficulties:
• How to define a distribution over the set of distributions?
• How to define an integral on the set of distributions?
• How to define convergence on the set of distributions?
• What is the appropriate (mathematical) language?
Probability as Integral
We all know that, given a distribution P(x) over R, say a normal distribution,
the probability for the corresponding random variable X to have a value
inside an interval I is
P(X ∈ I) = ∫_I P(x) dx
There are two problems:
• How does one define the integral (existence)? (Much more important)
• How does one compute the integral?
Riemann Integral
We divide the interval into small subintervals (a mesh) and compute the Riemann
sum. The limit of the Riemann sums as the size of the mesh goes to zero
gives us the integral.
It can be proved that for a continuous function f on a closed, bounded
interval, the Riemann integral always exists.
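As a small numerical illustration of this definition, the sketch below approximates the probability that a standard normal variable falls in an interval by a midpoint Riemann sum, and compares it with the value obtained from the error function. The interval [-1, 1], the midpoint rule, and the function names are choices made for this example only.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] using n equal subintervals."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# ASSUMPTION: the interval I = [-1, 1] is an illustrative choice.
a, b = -1.0, 1.0

# Reference value via the normal CDF, written with the error function.
exact = 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))

# Refine the mesh and watch the Riemann sums approach the reference value.
errors = []
for n in (4, 16, 64, 256):
    approx = riemann_sum(normal_pdf, a, b, n)
    errors.append(abs(approx - exact))
    print(f"n = {n:4d}  sum = {approx:.8f}  error = {errors[-1]:.2e}")
```

The error shrinks as the mesh is refined, which is the behaviour the existence theorem for continuous functions guarantees in the limit.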
What are the shortcomings?
• What if we can’t define the mesh?
• Some simple functions don’t have Riemann integrals.
Recall that our goal (an ambitious one) is to do probability on complicated
spaces (not just R^n!).
In general, it is difficult to carry Riemann’s definition over to more general
spaces.
Even in R^1, there are functions (although not continuous) that one should be able
to integrate but can’t. For example, take the function on [0, 1]
F(x) = 1 if x is rational, 0 otherwise.
This function is not integrable in the Riemann sense (why?)
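One standard way to answer the "why?" is to compare upper and lower sums. Take any partition P: 0 = x_0 < x_1 < ... < x_n = 1 of [0, 1] (the partition notation is introduced here for illustration). Every subinterval contains both rational and irrational points, so on each subinterval the supremum of F is 1 and the infimum is 0:

```latex
\begin{align*}
U(F, P) &= \sum_{i=1}^{n} \Bigl( \sup_{x \in [x_{i-1}, x_i]} F(x) \Bigr) (x_i - x_{i-1})
         = \sum_{i=1}^{n} 1 \cdot (x_i - x_{i-1}) = 1, \\
L(F, P) &= \sum_{i=1}^{n} \Bigl( \inf_{x \in [x_{i-1}, x_i]} F(x) \Bigr) (x_i - x_{i-1})
         = \sum_{i=1}^{n} 0 \cdot (x_i - x_{i-1}) = 0.
\end{align*}
```

The upper sum is 1 and the lower sum is 0 no matter how fine the mesh is, so the Riemann sums have no common limit.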
Therefore, one can’t even talk about the integral of F over [0, 1] in the Riemann sense.
Modern Approach (Lebesgue Integral)
We want to integrate a (real-valued) function F(x) defined on some (abstract)
space X.
F: X → R
Here, it is the range (R) that is a familiar space. The domain can be
arbitrary. So if we are going to divide anything, it has to be in the range, not
the domain.
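The idea of dividing the range can be written as a worked formula. This is a sketch under stated assumptions: F is bounded with 0 ≤ F(x) < M, the levels 0 = y_0 < y_1 < ... < y_n = M divide the range, and μ(A) denotes some notion of "size" of a subset A of X. The symbols y_k, A_k, M, and μ are introduced here for illustration; what μ must be is exactly the question raised next.

```latex
\begin{align*}
A_k &= \{\, x \in X : y_{k-1} \le F(x) < y_k \,\}, \qquad k = 1, \dots, n, \\
\int_X F \, d\mu &\approx \sum_{k=1}^{n} y_{k-1} \, \mu(A_k).
\end{align*}
```

No mesh on X is needed, only the sizes of the sets A_k. For the rational-indicator function on [0, 1], only two values occur, and with the usual length (Lebesgue) measure the rationals have size 0, so the integral is 1 · 0 + 0 · 1 = 0.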
What data do we need to specify on X in
order to define the integral?
Measurable Spaces and Measurable Functions
This is the main component of the theory of the Lebesgue integral.
It is a very general theory in the following sense:
X can be any arbitrary set.
On X, there are two things that have been specified:
• Σ: a collection of subsets of X, called its sigma-algebra
• μ: a measure, which is a function μ : Σ → R≥0 satisfying some
properties.
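To make the two ingredients concrete, here is a minimal sketch on a toy finite set. Everything in it is an assumption for illustration: the set X = {1, 2, 3}, the point weights, the use of the full power set as Σ, and the helper names. The properties checked are the standard ones (Σ contains X and is closed under complements and unions; μ of the empty set is 0 and μ is additive on disjoint sets); on a finite set, pairwise checks are enough.

```python
from itertools import combinations

# ASSUMPTION: a toy finite space with illustrative point weights.
X = frozenset({1, 2, 3})
weights = {1: 0.5, 2: 0.25, 3: 0.25}

def power_set(s):
    """All subsets of s, each represented as a frozenset."""
    items = sorted(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

# Sigma: here simply every subset of X.
sigma = power_set(X)

def mu(A):
    """Measure of a set in sigma: the sum of its point weights."""
    if A not in sigma:
        raise ValueError("set is not in the sigma-algebra")
    return sum(weights[x] for x in A)

def is_sigma_algebra(X, sigma):
    """Finite check: contains X, closed under complement and union."""
    return (X in sigma
            and all(X - A in sigma for A in sigma)
            and all(A | B in sigma for A in sigma for B in sigma))

def is_additive(mu, sigma):
    """Finite check: mu(empty) = 0 and mu(A u B) = mu(A) + mu(B) if disjoint."""
    return (mu(frozenset()) == 0
            and all(abs(mu(A | B) - (mu(A) + mu(B))) < 1e-12
                    for A in sigma for B in sigma if not (A & B)))

def integrate(F, X, mu):
    """Integral of F: sum over values y in the range of y * mu(F^{-1}(y))."""
    total = 0.0
    for y in {F(x) for x in X}:
        level_set = frozenset(x for x in X if F(x) == y)
        total += y * mu(level_set)
    return total

F = lambda x: x * x
print(is_sigma_algebra(X, sigma), is_additive(mu, sigma))
print(integrate(F, X, mu))  # 1*0.5 + 4*0.25 + 9*0.25 = 3.75
```

The integral is computed by grouping points of X according to the value F takes, which is the "divide the range, not the domain" idea in its simplest form.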
If you are given the triple (X, Σ, μ), then you are at least able to talk about