Temporal Difference (TD) Updating without a Learning Rate

Marcus Hutter (ANU, NICTA, IDSIA), www.hutter1.net
Shane Legg (IDSIA, SUPSI, USI), www.vetta.org
Poster T7

· In every setting that we have tested: superior performance and fewer parameters to tune.
· We derived the learning rate $\beta_t$ for TD with eligibility traces from statistical principles.

[Plot legend: black = low = good = ours]

· Reinforcement learning TD update:

$$V_s^{t+1} = V_s^t + \beta_t(s, s_{t+1})\,\bigl(r_t + \gamma V_{s_{t+1}}^t - V_{s_t}^t\bigr)$$
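
As a rough illustration of the TD update, here is a minimal tabular sketch in Python. It is not the derived learning rate from the poster: the rate is passed in as a hypothetical callable `beta(s, s_next)`, and the helper `constant_rate` is an assumption that simply recovers ordinary TD(0) with a fixed step size. The names `td_update`, `constant_rate`, `beta`, `gamma` and `alpha` are chosen for this sketch only.

```python
from typing import Callable, Dict, Hashable

State = Hashable


def td_update(
    V: Dict[State, float],
    s_t: State,
    s_next: State,
    r_t: float,
    gamma: float,
    beta: Callable[[State, State], float],
) -> Dict[State, float]:
    """Apply one TD update to every state s and return the new value table.

    Each state s moves by beta(s, s_next) times the TD error
    r_t + gamma * V[s_next] - V[s_t].
    """
    td_error = r_t + gamma * V[s_next] - V[s_t]
    return {s: v + beta(s, s_next) * td_error for s, v in V.items()}


def constant_rate(s_t: State, alpha: float) -> Callable[[State, State], float]:
    """Hypothetical stand-in for the learning rate (an assumption, not the
    derived rate): step size alpha for the current state, zero elsewhere."""
    def beta(s: State, s_next: State) -> float:
        return alpha if s == s_t else 0.0
    return beta


# Toy example: two states, one observed transition A -> B with reward 0.5.
V = {"A": 0.0, "B": 1.0}
V_new = td_update(V, "A", "B", r_t=0.5, gamma=0.9, beta=constant_rate("A", 0.1))
print(V_new)
```

In this toy run the TD error is 0.5 + 0.9 · 1.0 − 0.0 = 1.4, so only the value of state A moves, by 0.1 · 1.4. Passing the rate as a function of (s, s_next) keeps the sketch open to any state-dependent rate without hard-coding one.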