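Bayesian Policy Learning with Transdimensional MCMC
M. Hoffman, N. de Freitas, A. Doucet, A. Jasra

Problem: we want to find the policy parameters $\theta$ that maximize the expected discounted reward
$J(\theta) = \mathbb{E}\left[\sum_k \gamma^k r(x_k)\right]$

Idea: use reversible jump MCMC to sample from $\pi(\theta) \propto J(\theta)\,P(\theta)$. This focuses the samples on areas of high reward.

[Figure: reward landscape over the policy parameters, marking the "start" point, a large, highly peaked reward, and a small, broad reward; annotation: "Focus on optimizing here".]

Poster ID: M7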
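The idea of sampling from $\pi(\theta) \propto J(\theta)\,P(\theta)$ can be illustrated on a toy problem. The sketch below is not the authors' reversible jump sampler. It swaps in a simpler technique, pseudo-marginal Metropolis-Hastings, in which $J(\theta)$ is replaced by an unbiased, non-negative Monte Carlo estimate. The horizon is still random: each rollout continues with probability $\gamma$, so $P(K \ge k) = \gamma^k$ and $\mathbb{E}\left[\sum_{k \le K} r(x_k)\right] = J(\theta)$. Everything concrete here is a hypothetical assumption: the 1-D dynamics $x_{k+1} = x_k + \theta + 0.1\,\epsilon_k$, the reward $r(x) = e^{-x^2/2}$, $\gamma = 0.9$, the standard normal prior, and the helper names `estimate_J` and `pseudo_marginal_mh`.

```python
import math
import random


def reward(x):
    # Hypothetical toy reward r(x): peaked at x = 0 and always in (0, 1].
    return math.exp(-0.5 * x * x)


def estimate_J(theta, rng, gamma=0.9, noise=0.1, n_traj=20):
    """Unbiased, non-negative estimate of J(theta) = E[sum_k gamma^k r(x_k)].

    Toy dynamics (assumption): x_0 = 0, x_{k+1} = x_k + theta + noise * eps_k.
    Each rollout stops at a random horizon K with P(K >= k) = gamma^k, so the
    expected undiscounted reward sum up to K equals the discounted sum.
    """
    total = 0.0
    for _ in range(n_traj):
        x = 0.0
        total += reward(x)              # step k = 0 is always counted
        while rng.random() < gamma:     # survive to the next step w.p. gamma
            x += theta + noise * rng.gauss(0.0, 1.0)
            total += reward(x)
    return total / n_traj


def log_prior(theta):
    # Standard normal prior P(theta) (assumption), up to an additive constant.
    return -0.5 * theta * theta


def pseudo_marginal_mh(n_samples, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings targeting pi(theta) proportional to
    J(theta) * P(theta), with J replaced by the unbiased estimate above.

    The estimate for the current state is stored and reused, never refreshed.
    """
    rng = random.Random(seed)
    theta = 0.0
    j_cur = estimate_J(theta, rng)      # >= 1 because r(x_0) = 1
    samples, accepted = [], 0
    for _ in range(n_samples):
        prop = theta + step * rng.gauss(0.0, 1.0)   # symmetric proposal
        j_prop = estimate_J(prop, rng)
        log_ratio = (math.log(j_prop) + log_prior(prop)) - (
            math.log(j_cur) + log_prior(theta))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, j_cur = prop, j_prop
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_samples


if __name__ == "__main__":
    samples, acc_rate = pseudo_marginal_mh(8000)
    kept = samples[1000:]               # discard burn-in
    second_moment = sum(t * t for t in kept) / len(kept)
    print(f"acceptance rate: {acc_rate:.2f}")
    print(f"E[theta^2] under samples: {second_moment:.2f} (prior: 1.00)")
```

In this toy, $J(\theta)$ is largest near $\theta = 0$, so the samples should concentrate there more tightly than the prior does (a smaller second moment than the prior's 1.0). Reusing the stored estimate for the current state is what keeps the chain's stationary distribution equal to $\pi(\theta)$. The cost is that the chain can stick after an over-estimate, which averaging over `n_traj` rollouts helps to mitigate.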