Probability and Measure
September 2, 2011

Nonparametric Bayesian

Fundamental problem: estimating a distribution from a collection of data E. (X is a distribution-valued random variable.)

P(X|E) ∝ P(E|X) P(X)

• P(X|E) is the posterior distribution given the evidence E.
• P(E|X) is the likelihood of E given the distribution X.
• P(X) is the prior distribution of X.

For given data E, we do not assume any particular form for X. For example, X need not be Gaussian or belong to any other particular family of distributions.

As a simple example, consider a set with three elements {1, 2, 3}. Suppose we observe the sequence of occurrences {1, 1, 2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 2, 1} and we believe that there is an unknown probability distribution X that drives this process. That is, the points are generated independently by sampling from X. The goal is to estimate X.

Clearly, one solution is X ≈ (7/15, 7/15, 1/15). This is an empirical estimate. With a small number of data points, this estimate can be unreliable. Suppose we have a prior distribution on X (a distribution on distributions), say a Dirichlet distribution (with parameter alpha). How does the posterior change?

The goal is to do something similar with a more general base set than {1, 2, 3}: for example, the set of real numbers R or the unit interval [0, 1].

Difficulties:
• How to define a distribution over the set of distributions?
• How to define an integral on the set of distributions?
• How to define convergence on the set of distributions?
• What is the appropriate (mathematical) language?

Probability as Integral

We all know that, given a distribution P(x) over R, say a normal distribution, the probability for the corresponding random variable X to have a value inside an interval I is

P(X ∈ I) = ∫_I P(x) dx

There are two problems:
• How does one define the integral (existence)?
• (Much more important) How does one compute the integral?
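
For a concrete density, the second question can be explored numerically. The sketch below is only an illustration and not part of the lecture: it assumes a standard normal density and the interval I = [-1, 1], and approximates P(X ∈ I) by a midpoint sum over a fine mesh. The function names (normal_pdf, interval_probability, normal_cdf) and the mesh size n are choices made here; the closed form via math.erf is included only as a check.

```python
import math


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution (assumed example density)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def interval_probability(pdf, a, b, n=10_000):
    """Approximate the integral of pdf over [a, b] with a midpoint sum.

    The interval is cut into n equal subintervals (the mesh), and the
    density is evaluated at the midpoint of each one.
    """
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width


def normal_cdf(x, mean=0.0, std=1.0):
    """Closed-form normal CDF via the error function, used only as a check."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


# Assumed example: standard normal, interval I = [-1, 1].
approx = interval_probability(normal_pdf, -1.0, 1.0)
exact = normal_cdf(1.0) - normal_cdf(-1.0)
print(approx, exact)
```

Both printed values should be close to 0.6827. The approximation works here because the domain is an interval of R that can be cut into small pieces; on a more general space there may be no such mesh to sum over.
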
Riemann Integral

We divide the interval into small subintervals (a mesh) and compute the Riemann sum. The limit of the Riemann sums as the mesh size goes to zero gives us the integral. It can be proved that for a continuous function f on a closed, bounded interval, the Riemann integral always exists.

What are the shortcomings?
• What if we can't define the mesh?
• Some simple functions don't have Riemann integrals.

Recall that our goal (an ambitious one) is to do probability on complicated spaces (not just R^n!). In general, it is difficult to carry Riemann's definition over to more general spaces.

Even in R^1, there are functions (although not continuous) that one should be able to integrate but can't. For example, consider the function on [0, 1]

F(x) = 1 if x is rational, 0 otherwise.

This function is not integrable in the Riemann sense (why?). Therefore, in the Riemann framework one can't even talk about its integral over [0, 1].

Modern Approach (Lebesgue Integral)

We want to integrate a (real-valued) function F(x) defined on some (abstract) space X:

F: X → R

Here, it is the range (R) that is a familiar space. The domain can be arbitrary. So if we are going to divide anything, it has to be the range, not the domain.

What data do we need to specify on X in order to define the integral?

Measurable Spaces and Measurable Functions

This is the main component of the theory of the Lebesgue integral. It is a very general theory in the following sense: X can be an arbitrary set. On X, two things have been specified:
• Σ: a collection of subsets of X, called its sigma-algebra
• μ: a measure, which is a function μ: Σ → R≥0 satisfying some properties.

If you are given the triple (X, Σ, μ), then you are at least able to talk about