Many paths to computing normalizing constants

Yuri Burda (yburda@gmail.com)
CIFAR NCAP, 08/15/2014

Annealed importance sampling

Need for normalizing constants
We often have models where $p(x) \propto f(x)$ with a simple $f(x)$. For instance, if $H(x)$ is a function, then $p(x) = \frac{1}{Z} e^{-H(x)}$ is a probability distribution.
Examples: exponential families, Boltzmann machines, etc.
We want to compute $p(x)$, and hence also $Z$.

What is Z?
$p(x) = \frac{1}{Z} f(x), \qquad Z = \int_X f(x)\,dx$
Such integrals are often intractable.

Approximation: importance sampling
$Z = \int_X f(x)\,dx = \int_X \frac{f(x)}{q(x)}\,q(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} \frac{f(x_i)}{q(x_i)}, \qquad x_i \sim q$
Observation: the variance of the estimate is $\frac{1}{N}\operatorname{Var}_q(f/q)$.
If p and q are not close, N has to be impractically large to get useful estimates.

Annealed importance sampling
Take a sequence of distributions $q_0 \to q_1 \to \dots \to q_n = p$ with a simple, tractable $q_0$, and $q_i(x) = \frac{1}{Z_i} f_i(x)$.
Sample $x_1^{(0)}, \dots, x_N^{(0)}$ from $q_0$; the average of $f_1(x_i^{(0)})/q_0(x_i^{(0)})$ approximates $Z_1$.
Get samples $x_1^{(1)}, \dots, x_N^{(1)}$ from $q_1$; the average of $f_2(x_i^{(1)})/q_1(x_i^{(1)})$ approximates $Z_2$.
Continue in this way until we get an estimate of $Z_n = Z$.

A smarter scheme along these lines works even when $x^{(i+1)}$ is sampled from $T_{i+1}(x^{(i)})$, with $T_{i+1}$ a Markov chain operator with stationary distribution $q_{i+1}$. This scheme is called AIS.

Choice of q_i
Common choice: $q_i \propto p_0^{1-\beta_i}\, p^{\beta_i}$ with $0 = \beta_0 < \beta_1 < \dots < \beta_n = 1$, where $p_0$ is the tractable starting distribution.

RBM example
We will try it on a binary RBM with 500 hidden units trained on MNIST with PCD.
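The slides contain no code, so as a hedged illustration here is a minimal NumPy sketch of plain importance sampling and of AIS along the geometric path, on a toy 1-D problem where Z is known exactly. The toy target (a Gaussian bump centred at MU), the N(0, 1) base $q_0$, the single random-walk Metropolis step per intermediate distribution, its step size, and all function names are assumptions for illustration only; they are not the setup used for the RBM experiments (for an RBM one would typically use block Gibbs sampling as the transition).

```python
import numpy as np

# Toy problem (illustrative assumption, not from the slides):
# target f(x) = exp(-(x - MU)^2 / (2 SIGMA^2)), whose true Z = sqrt(2*pi) * SIGMA,
# and a tractable base q0 = N(0, 1), which is already normalized (Z0 = 1).
MU, SIGMA = 4.0, 0.5

def log_f(x):
    return -0.5 * ((x - MU) / SIGMA) ** 2

def log_q0(x):
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def log_mean_exp(a):
    # Numerically stable log((1/N) * sum_i exp(a_i))
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

def importance_sampling_log_z(num_samples, rng):
    # Z ~= (1/N) * sum_i f(x_i) / q0(x_i), with x_i ~ q0
    x = rng.standard_normal(num_samples)
    return log_mean_exp(log_f(x) - log_q0(x))

def ais_log_z(num_samples, betas, rng, step_size=0.5):
    # Geometric path: log f_beta(x) = (1 - beta) log q0(x) + beta log f(x)
    def log_f_beta(x, beta):
        return (1.0 - beta) * log_q0(x) + beta * log_f(x)

    x = rng.standard_normal(num_samples)  # exact samples from q0
    log_w = np.zeros(num_samples)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Weight update: log f_beta(x) - log f_beta_prev(x)
        log_w += (beta - beta_prev) * (log_f(x) - log_q0(x))
        # One random-walk Metropolis step that leaves f_beta invariant
        proposal = x + step_size * rng.standard_normal(num_samples)
        log_accept = log_f_beta(proposal, beta) - log_f_beta(x, beta)
        accept = np.log(rng.uniform(size=num_samples)) < log_accept
        x = np.where(accept, proposal, x)
    # Z0 = 1, so the mean weight estimates Z / Z0 = Z
    return log_mean_exp(log_w)

rng = np.random.default_rng(0)
true_log_z = np.log(np.sqrt(2 * np.pi) * SIGMA)
betas = np.linspace(0.0, 1.0, 1000)  # beta_i linearly spaced between 0 and 1
print("true log Z:", true_log_z)
print("IS  log Z :", importance_sampling_log_z(1000, rng))
print("AIS log Z :", ais_log_z(1000, betas, rng))
```

Because $q_0$ puts very little mass where the target lives, the plain importance sampling estimate from 1000 samples is typically much less reliable than the AIS one, in line with the variance observation on the importance sampling slide. Note also that the averaged weights are an unbiased estimate of Z, so their logarithm is biased downward (Jensen's inequality); this is consistent with the underestimates of log Z reported later.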
Samples from the RBM:
[Figure: samples from the RBM]

Chain of samples when running AIS
AIS run with 5000 intermediate distributions, $\beta_i$ linearly spaced between 0 and 1.
[Figure: AIS chain samples at n = 0, 1000, 2000, 3000, 4000, 5000, followed by another 1000 Gibbs samples]

Idea: replace hidden units by their average activations.
[Figure: samples along this path at n = 0 (20 units), n = 1000 (100 units), n = 2000 (200 units), n = 3000 (300 units), n = 4000 (400 units), n = 5000 (500 units)]

Numbers
With 5000 intermediate distributions, the geometric averages path underestimates $\log Z$ by about 3 nats, while the path with a varying number of hidden units underestimates it by only about 1 nat.
With 500 intermediate distributions, the geometric averages path underestimates $\log Z$ by about 5 nats, while the path with a varying number of hidden units underestimates it by about 8 nats.
It is not clear which path is better.

Conclusion
It would be nice to have a way to go from a large RBM to a smaller, “fuzzier” one without passing through meaningless distributions in between.
See also “Annealing Between Distributions by Averaging Moments” by R. Grosse.

Questions?
