Poster M9: Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs Ambuj Tewari & Peter L. Bartlett Learner in an unknown MDP
? ? ?
0.9 $10 0.7 $1

Competitor knows the MDP
1.0 0.8

0.3

0.4 0.2 $0

?

? ?

0.1 1.0

0.6

OLP


- O(log T )

· OLP is based on BurnetasKatehakis algorithm · Is computationally more efficient · Similar in flavor to Auer-Ortner algorithm (NIPS 2006)

· Doesn't have to know support sets of transition distributions · Has a simpler analysis · Has better (S vs. S5) state space size dependence compared to AO