Collaborative Multi-agent Bandits

Speaker: Sanjay Shakkottai


Abstract

In this talk, we consider a (networks + learning) bandit problem in which N agents collaborate to identify the best resource (i.e., the best arm of a bandit) by exchanging only arm recommendations through pairwise gossip over a communication network. We establish that, even with very limited communication, the per-agent regret is a factor of order N smaller than in the case of no collaboration. Furthermore, we show that the communication constraints have only a second-order effect on regret. Based on joint work with Ronshee Chawla, Abishek Sankararaman, and Ayalvadi J. Ganesh.
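
To make the setting concrete, below is a minimal Python sketch of a gossip-style collaborative bandit in the spirit of the abstract. This is an illustrative toy, not the speaker's exact algorithm: each agent runs standard UCB1 on a small subset of the arms and, at fixed gossip times, pulls a best-arm recommendation from a uniformly random other agent. All names and parameters (e.g., `GOSSIP_EVERY`, the initial subset size) are hypothetical choices for the sketch.

```python
# Toy sketch of gossip-based collaborative bandits (NOT the speaker's
# exact method). Each agent runs UCB1 over a small arm subset; at gossip
# times it receives one best-arm recommendation from a random peer.
import math
import random

random.seed(0)

K = 50                # total number of arms
N = 10                # number of agents
HORIZON = 20000       # rounds per agent
GOSSIP_EVERY = 500    # gossip frequency (hypothetical constant)
MEANS = [random.random() for _ in range(K)]  # Bernoulli arm means

class Agent:
    def __init__(self, arm_subset):
        # Each agent starts with only a small subset of the K arms.
        self.arms = set(arm_subset)
        self.counts = {a: 0 for a in self.arms}
        self.sums = {a: 0.0 for a in self.arms}
        self.t = 0

    def add_arm(self, arm):
        # Incorporate a recommended arm into the active set.
        if arm not in self.arms:
            self.arms.add(arm)
            self.counts[arm] = 0
            self.sums[arm] = 0.0

    def select_arm(self):
        # Standard UCB1 over the agent's current (small) arm set.
        self.t += 1
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        return max(self.arms, key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

    def best_arm(self):
        # Recommendation: the empirically best arm played so far.
        played = [a for a in self.arms if self.counts[a] > 0]
        return max(played, key=lambda a: self.sums[a] / self.counts[a])

# Each agent starts with roughly K/N arms, so no single agent explores
# the full arm set on its own.
agents = [Agent(random.sample(range(K), K // N + 2)) for _ in range(N)]

for t in range(1, HORIZON + 1):
    for agent in agents:
        arm = agent.select_arm()
        reward = 1.0 if random.random() < MEANS[arm] else 0.0
        agent.update(arm, reward)
    if t % GOSSIP_EVERY == 0:
        # Pairwise gossip on the complete graph: each agent pulls one
        # recommendation from a uniformly random other agent.
        for i, agent in enumerate(agents):
            j = random.choice([x for x in range(N) if x != i])
            agent.add_arm(agents[j].best_arm())

best = max(MEANS)
for i, agent in enumerate(agents):
    regret = best * HORIZON - sum(agent.sums.values())
    print(f"agent {i}: arms={len(agent.arms)}, regret ~ {regret:.1f}")
```

Running the sketch, the good arm spreads through the network via recommendations, so each agent ends up exploring far fewer than K arms itself; this is the intuition behind the order-N reduction in per-agent regret, with the infrequent gossip contributing only a lower-order term.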

Bio

Sanjay Shakkottai received his Ph.D. from the ECE Department at the University of Illinois at Urbana-Champaign in 2002. He is with The University of Texas at Austin, where he is currently the Temple Foundation Endowed Professor No. 3 and a Professor in the Department of Electrical and Computer Engineering. He received the NSF CAREER Award in 2004 and was elected an IEEE Fellow in 2014. His research interests lie at the intersection of algorithms for resource allocation, statistical learning, and networks, with applications to wireless communication networks and online platforms.