Poster M9: Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs Ambuj Tewari & Peter L. Bartlett Learner in an unknown MDP ? ? ? 0.9 $10 0.7 $1 Competitor knows the MDP 1.0 0.8 0.3 0.4 0.2 $0 ? ? ? 0.1 1.0 0.6 OLP - O(log T ) · OLP is based on BurnetasKatehakis algorithm · Is computationally more efficient · Similar in flavor to Auer-Ortner algorithm (NIPS 2006) · Doesn't have to know support sets of transition distributions · Has a simpler analysis · Has better (S vs. S5) state space size dependence compared to AO