Improving Sample Efficiency in Evolutionary RL using Off-policy Ranking

Speaker: Gugan Thoppe


Abstract

Evolution Strategy (ES) is a powerful optimization technique inspired by natural evolution. In each iteration, a key step is ranking candidate solutions by a fitness score. When ES is used in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies, which is presently done via on-policy approaches and thus increases the number of environment interactions. To improve sample efficiency, we propose a novel off-policy alternative for ranking. We demonstrate our idea in the context of a state-of-the-art ES method called Augmented Random Search (ARS). Simulations on MuJoCo tasks show that, compared to the original ARS, our off-policy variant reaches reward thresholds in similar running times but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well. This is joint work with my Ph.D. student Eshwar and Prof. Shishir Kolathaya.
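
For readers unfamiliar with the ranking step the abstract refers to, below is a minimal, generic sketch of an ARS-style iteration: perturbed copies of a linear policy are evaluated by rollouts, the perturbation directions are ranked by observed return, and only the top-ranked directions drive the parameter update. The environment interface (Gymnasium-style), function names, and hyperparameters (`n_dirs`, `top_b`, `nu`, `alpha`) are illustrative assumptions, not the speaker's implementation; the talk's contribution is to replace the on-policy rollouts used for this ranking with an off-policy estimate.

```python
import numpy as np

def rollout(env, theta, horizon=1000):
    """Estimate a linear policy's return with one on-policy rollout.
    (The proposed method replaces this on-policy evaluation, used only
    for ranking, with an off-policy estimate.)"""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(horizon):
        action = theta @ obs                      # linear policy
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total

def ars_step(env, theta, n_dirs=8, top_b=4, nu=0.03, alpha=0.02):
    """One ARS-style iteration: perturb, evaluate, rank, update."""
    deltas = [np.random.randn(*theta.shape) for _ in range(n_dirs)]
    r_plus = [rollout(env, theta + nu * d) for d in deltas]
    r_minus = [rollout(env, theta - nu * d) for d in deltas]

    # Ranking step: keep the top_b directions by best observed reward.
    scores = [max(p, m) for p, m in zip(r_plus, r_minus)]
    top = np.argsort(scores)[::-1][:top_b]

    # Update along the retained directions, scaled by reward variability.
    sigma = np.std([r_plus[i] for i in top] + [r_minus[i] for i in top]) + 1e-8
    grad = sum((r_plus[i] - r_minus[i]) * deltas[i] for i in top) / top_b
    return theta + (alpha / sigma) * grad
```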

Bio

Gugan Thoppe is an Assistant Professor in the Department of Computer Science and Automation, Indian Institute of Science (IISc). Before joining IISc, he was a postdoc for four years: the first two at the Technion, Israel, and the next two at Duke University, USA. He did his Ph.D. and M.S. with Prof. Vivek Borkar at TIFR Mumbai, India. His Ph.D. work won the TAA-Sasken best thesis award for 2017. He is also a two-time recipient of the IBM Ph.D. fellowship award (2013–14 and 2014–15). His research interests include stochastic approximation and random topology and their applications to reinforcement learning and data analysis.