Stochastic approximation algorithms aim to find the zeros of an objective function whose analytical form is unknown but where noisy observations are available. Such algorithms find wide applications in reinforcement learning, adaptive control and data-driven optimization as well as host of engineering applications primarily because these are model-free techniques. The most general setting in this situation where the problem complexity is perhaps the highest would involve two objective functions – a higher-level objective defining the broad overall system goal that in turn would depend on a lower-level objective with both objective functions being governed by different sets of parameters — the higher-level parameters updated on a slower timescale as opposed to the lower-level parameters. This setting is relevant in actor-critic algorithms in reinforcement learning, online algorithms for data-driven optimization as well as hierarchical reinforcement learning. The problem becomes even more difficult if one includes set-valued dynamics resulting from say partial state observations, noise that can be an iterate-dependent, non-ergodic, Markov process that further depends on a control-valued sequence in addition to a martingale-difference sequence and such dynamics and noise sequences appear in recursions on both timescales.
We present the first work that fully analyzes stochastic approximation algorithms in the aforementioned generality. We present a set of sufficient conditions under which we show that the recursion on each timescale tracks the flow of a differential inclusion obtained by averaging the set-valued drift function in each recursion with respect to a set of measures which account for both the averaging with respect to the set of stationary distributions on the Markov noise terms as well as the coupling of the recursions on the two timescales. We finally present an application to the problem of computing the optimum in a constrained convex optimization problem where the objective function and constraints are averaged with respect to the stationary distribution of an underlying Markov chain.
[This is joint work with Dr. Vinayaka Yaji, former student of CSA]
Reference:
V.G.Yaji and S.Bhatnagar, Stochastic recursive inclusions in two timescales with non-additive iterate-dependent Markov noise, Mathematics of Operations Research, 45(2):1405-1444, November 2020.
Faculty: Shalabh Bhatnagar, CSA