Fitted Q-iteration in continuous action-space Markovian Decision Processes
András Antos, Remi Munos, Csaba Szepesvári
Poster ID T6

Fitted Q-iteration:

\[ Q_{k+1} = \mathrm{Regress}(D_k(Q_k)), \]
\[ D_k(Q_k) = \Big\{ \Big( (X_t, A_t),\; R_t + \gamma \max_{b \in \mathcal{A}} Q_k(X_{t+1}, b) \Big) \Big\}_{1 \le t \le N} \]

Does this algorithm work? When?

Continuous state spaces
Continuous action spaces
Single trajectory input
Actor-critic variant
Rigorous analysis
Error bounds
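To make the fitted Q-iteration update concrete, here is a minimal Python sketch on a single trajectory. Everything beyond the update rule itself is an assumption made for illustration: the toy one-dimensional dynamics in simulate_trajectory, the quadratic features, the use of ordinary least squares as Regress, and in particular the finite action_grid that stands in for the maximum over a continuous action set. The grid is a simple substitute and should not be read as the actor-critic variant the poster refers to.

```python
import numpy as np

def features(x, a):
    """Hypothetical quadratic features of scalar state x and scalar action a."""
    x, a = np.broadcast_arrays(np.asarray(x, float), np.asarray(a, float))
    return np.stack([np.ones_like(x), x, a, x * a, x**2, a**2], axis=-1)

def q_values(w, x, a):
    """Linear-in-features Q-function: Q(x, a) = features(x, a) . w"""
    return features(x, a) @ w

def simulate_trajectory(n_steps, seed=0):
    """Single trajectory of an assumed toy system driven by uniform random actions."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1.0, 1.0, n_steps)
    noise = 0.05 * rng.standard_normal(n_steps)
    x = np.zeros(n_steps + 1)
    for t in range(n_steps):
        x[t + 1] = 0.9 * x[t] + 0.5 * a[t] + noise[t]
    r = -x[:-1] ** 2 - 0.1 * a**2  # assumed reward: stay near 0, use small actions
    return x[:-1], a, r, x[1:]

def fitted_q_iteration(X, A, R, X_next, gamma=0.8, n_iter=30, action_grid=None):
    """Iterate Q_{k+1} = Regress(D_k(Q_k)) with least squares as Regress."""
    if action_grid is None:
        # Assumption: approximate max over the continuous action set by a grid.
        action_grid = np.linspace(-1.0, 1.0, 21)
    Phi = features(X, A)                      # regression inputs (X_t, A_t)
    w = np.zeros(Phi.shape[1])                # Q_0 = 0
    for _ in range(n_iter):
        # Q_k(X_{t+1}, b) for every grid action b, shape (N, len(action_grid))
        q_next = q_values(w, X_next[:, None], action_grid[None, :])
        targets = R + gamma * q_next.max(axis=1)
        w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w

def greedy_action(w, x, action_grid):
    """Grid action with the largest fitted Q-value at state x."""
    q = q_values(w, np.full_like(action_grid, x), action_grid)
    return action_grid[np.argmax(q)]

if __name__ == "__main__":
    X, A, R, X_next = simulate_trajectory(2000)
    w = fitted_q_iteration(X, A, R, X_next)
    grid = np.linspace(-1.0, 1.0, 21)
    for x0 in (-0.8, 0.0, 0.8):
        print(x0, greedy_action(w, x0, grid))
```

In this toy setting the greedy action should push the state back toward zero. Whether such an iteration behaves well in general is exactly the question the poster raises, and the sketch offers no guarantee on that point.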