Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

Alessandro Lazaric, Marcello Restelli, and Andrea Bonarini
Politecnico di Milano, Milan, Italy

Spotlight ID T5

Actor-critic RL algorithm
Policies are probability distributions over actions
The actor represents its policy with random samples
Samples are propagated over time using importance sampling and resampling

[Figure labels: samples, successful samples, resampling]
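
The idea of representing a policy with weighted action samples, updated by importance sampling and resampling, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the non-negative `scores` (standing in for whatever signal a critic would provide), the Gaussian perturbation, and the action bounds are all assumptions made for illustration.

```python
import random


def normalize(weights):
    """Rescale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def select_action(actions, weights, rng):
    """Draw one action from the discrete distribution over the samples."""
    return rng.choices(actions, weights=weights, k=1)[0]


def reweight(weights, scores):
    """Importance-sampling step: scale each weight by a non-negative score.

    `scores` is a hypothetical stand-in for a critic-derived quantity.
    """
    return normalize([w * s for w, s in zip(weights, scores)])


def effective_sample_size(weights):
    """Common degeneracy measure for normalized weights: 1 / sum(w_i^2)."""
    return 1.0 / sum(w * w for w in weights)


def resample(actions, weights, rng, noise=0.05, low=-1.0, high=1.0):
    """Resampling step: draw new samples in proportion to their weights,
    perturb them with Gaussian noise (an assumption), clip them to the
    action bounds, and reset the weights to uniform."""
    n = len(actions)
    drawn = rng.choices(actions, weights=weights, k=n)
    moved = [min(high, max(low, a + rng.gauss(0.0, noise))) for a in drawn]
    return moved, [1.0 / n] * n
```

In a loop, one would call `select_action` to act, `reweight` after the samples have been evaluated, and `resample` when `effective_sample_size` drops below some chosen threshold, so that samples concentrate around actions that scored well.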