Bayesian Policy Learning with Transdimensional MCMC
M. Hoffman, N. de Freitas, A. Doucet, A. Jasra

Problem: want to find policy parameters $\theta$ maximizing the expected discounted reward $J(\theta) = \mathbb{E}\left[\sum_k \gamma^k r(x_k)\right]$
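As a rough illustration of the objective, J can be estimated by averaging discounted returns over simulated trajectories. The sketch below is only an assumption-laden example: `discounted_return`, `estimate_J`, and the `sample_rewards` callback are hypothetical names, the sum is taken to start at k = 0, and the discount factor 0.9 is arbitrary.

```python
import random

def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for one trajectory (k starts at 0 here)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def estimate_J(sample_rewards, theta, gamma=0.9, n_rollouts=1000, seed=0):
    """Monte Carlo estimate of J(theta): the mean discounted return over rollouts.

    sample_rewards(theta, rng) is a hypothetical simulator that returns the list
    of rewards r(x_0), r(x_1), ... collected along one trajectory under theta.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        total += discounted_return(sample_rewards(theta, rng), gamma)
    return total / n_rollouts

def bernoulli_rewards(theta, rng, horizon=3):
    """Toy simulator (assumption): reward 1 with probability theta at each step."""
    return [1.0 if rng.random() < theta else 0.0 for _ in range(horizon)]
```

For example, `estimate_J(bernoulli_rewards, theta=0.7)` averages 1000 simulated returns for the toy simulator; a real problem would replace `bernoulli_rewards` with rollouts of the actual policy.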

[Figure labels: large, highly peaked reward; start]

Idea: use reversible jump MCMC to sample from

$\pi(\theta) \propto J(\theta)\, P(\theta)$
Focus samples on areas of high reward.
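To make the sampling idea concrete, here is a minimal sketch that draws from a density proportional to J times a prior. It deliberately swaps in a plain fixed-dimension random-walk Metropolis sampler, not reversible jump MCMC, and everything in it is assumed for illustration: the one-dimensional reward landscape `J` (one tall narrow peak and one low broad peak), the uniform prior on [-6, 6], and the step size.

```python
import math
import random

def J(theta):
    """Hypothetical reward landscape: tall narrow peak near 3, low broad peak near -2."""
    narrow = 5.0 * math.exp(-0.5 * ((theta - 3.0) / 0.1) ** 2)
    broad = 1.0 * math.exp(-0.5 * ((theta + 2.0) / 1.0) ** 2)
    return narrow + broad

def prior(theta):
    """Assumed uniform prior on [-6, 6]."""
    return 1.0 / 12.0 if -6.0 <= theta <= 6.0 else 0.0

def target(theta):
    """Unnormalized target density J(theta) * prior(theta)."""
    return J(theta) * prior(theta)

def metropolis(n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis: a fixed-dimension stand-in for reversible jump MCMC."""
    rng = random.Random(seed)
    theta = start
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        # The Gaussian proposal is symmetric, so accept with probability
        # min(1, target(proposal) / target(theta)).
        if rng.random() * target(theta) < target(proposal):
            theta = proposal
        samples.append(theta)
    return samples
```

Calling `metropolis(5000)` returns a chain whose samples concentrate where the reward is high under the prior. How often the chain finds the narrow peak depends on the assumed step size, which is one reason a simple sampler like this is only a sketch.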

[Figure labels: small, broad reward; "Focus on optimizing here"]

Poster ID: M7