Adversarial Bandits with Knapsacks by Mr. Karthik Abinav Sankararaman

Location: CSA Seminar Hall (Room No. 254, First Floor)

Department of Computer Science and Automation
Department Seminar

Speaker : Mr. Karthik Abinav Sankararaman
Ph.D Student
Department of Computer Science
University of Maryland

Title : Adversarial Bandits with Knapsacks

Date : Friday, June 07, 2019

Time : 4:00 PM

Venue : CSA Seminar Hall (Room No. 254, First Floor)

Abstract

“Bandits with Knapsacks” (BwK) is a general model for multi-armed bandits under supply/budget constraints, with motivating examples ranging from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. The objective is to minimize the “competitive ratio”: the ratio of the benchmark reward to algorithm’s reward, as regret minimization is no longer feasible.

We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new algorithm for the stochastic version of the problem, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter. This approach also leads to several extensions, both for stochastic and adversarial versions such as contextual BwK and Constrained Bandit Convex Optimization.

Based on this (https://arxiv.org/pdf/1811.11881.pdf) joint work with Nicole Immorlica, Rob Schapire and Alex Slivkins.

Host Faculty : Dr. Anand Louis