Q-learning for non-Markovian environments

Speaker: Vivek Borkar


Abstract

Reinforcement learning algorithms are based on the premise that the underlying controlled dynamics form a Markov Decision Process (MDP). In practice, however, this may not always be the case. This talk will address the issues that arise when this assumption fails, using classical Q-learning as a test case. The talk will describe some recent attempts at clarifying this issue, including our own. (Joint work with Siddharth Chandak, Parth Dodhia, and Pratik Shah.)
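
As background, the classical tabular Q-learning update the abstract takes as its test case is

  Q(s,a) <- Q(s,a) + alpha * ( r + gamma * max_a' Q(s',a') - Q(s,a) ),

applied along an observed trajectory of states, actions, and rewards. A minimal runnable Python sketch on a toy two-state MDP might look as follows; the toy environment, parameter values, and variable names are illustrative assumptions for this announcement, not material from the talk.

  # Minimal tabular Q-learning on a toy 2-state, 2-action MDP.
  # The environment and all parameter values are illustrative assumptions.
  import numpy as np

  rng = np.random.default_rng(0)

  n_states, n_actions = 2, 2
  # Transition probabilities P[s, a, s'] and expected rewards R[s, a].
  P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                [[0.5, 0.5], [0.1, 0.9]]])
  R = np.array([[1.0, 0.0],
                [0.0, 2.0]])

  Q = np.zeros((n_states, n_actions))
  alpha, gamma, eps = 0.1, 0.95, 0.1  # step size, discount, exploration rate

  s = 0
  for _ in range(50_000):
      # Epsilon-greedy action selection.
      a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
      s_next = int(rng.choice(n_states, p=P[s, a]))
      r = R[s, a]
      # Classical Q-learning update:
      # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
      Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
      s = s_next

  print(Q)

The convergence guarantees for this update rest on the Markov property of the state transitions; the talk concerns what can be said when that property fails.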

Bio

Vivek Borkar obtained his B.Tech. from IIT Bombay, M.S. from Case Western Reserve University, and Ph.D. from the University of California, Berkeley, in 1976, 1977, and 1980, respectively. He has worked at TIFR CAM and IISc in Bengaluru, and at TIFR and IIT Bombay in Mumbai. He is currently an Emeritus Fellow at IIT Bombay. His research interests are stochastic control and optimization, encompassing theory, algorithms, and applications.