Title:
Doubly robust methods for multi-armed bandits
Abstract:
In a multi-armed bandit (MAB) setting, the decision maker chooses one arm in each epoch and observes the reward of only that arm; that is, the rewards of all other arms are missing. Doubly robust (DR) estimation is a well-known technique in the statistics literature for handling missing data. The pseudo-reward samples generated by the DR technique are unbiased provided that either the model for the data or the probability that the data is missing is known. In the MAB setting, the probability that the data for an arm is missing is known, since the selection probabilities are set by the decision maker's own policy. Therefore, the DR technique generates unbiased pseudo-rewards for the unselected arms. In each round, conventional MAB methods update only the reward estimate of the chosen arm, whereas the DR estimator imputes the missing rewards and computes new estimates for all arms in all rounds. Thus, it is possible for the DR estimates of the rewards of all arms to converge uniformly, independent of the specific arm selection policy used. This potentially allows us to simultaneously reduce regret and optimize other criteria, e.g., identifying the best arm or the arms on a Pareto front. We show that this is indeed possible in many contexts, such as revenue management, Pareto front identification, and sparse linear reinforcement learning, and that it leads to improved theoretical guarantees and empirical performance.
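To make the pseudo-reward idea concrete, here is a minimal sketch of a textbook-style DR pseudo-reward for a single round. It is an illustration under stated assumptions, not necessarily the construction used in the talk: the function names, the Gaussian noise, and the fixed (non-adaptive) selection probabilities are all hypothetical choices made for this example. For each arm, the pseudo-reward is the model's imputed value plus an importance-weighted residual that is nonzero only for the arm actually pulled.

```python
import random


def dr_pseudo_rewards(chosen, reward, probs, model_estimates):
    """Return one DR pseudo-reward per arm for a single round.

    chosen:          index of the arm that was pulled
    reward:          observed reward of the pulled arm
    probs:           known selection probability of each arm (all > 0)
    model_estimates: imputed mean reward for each arm (may be wrong)
    """
    pseudo = []
    for arm, (p, m) in enumerate(zip(probs, model_estimates)):
        # Importance-weighted residual; it is zero for unselected arms.
        correction = (reward - m) / p if arm == chosen else 0.0
        pseudo.append(m + correction)
    return pseudo


def simulate(true_means, probs, model_estimates, rounds, seed=0):
    """Average the pseudo-rewards over many rounds (hypothetical setup)."""
    rng = random.Random(seed)
    arms = list(range(len(true_means)))
    totals = [0.0] * len(true_means)
    for _ in range(rounds):
        # Assumption: a fixed randomized policy with known probabilities.
        chosen = rng.choices(arms, weights=probs)[0]
        # Assumption: unit-variance Gaussian reward noise.
        reward = true_means[chosen] + rng.gauss(0.0, 1.0)
        pseudo = dr_pseudo_rewards(chosen, reward, probs, model_estimates)
        for arm, value in enumerate(pseudo):
            totals[arm] += value
    return [total / rounds for total in totals]


if __name__ == "__main__":
    # The imputation model is deliberately wrong (all zeros); the averages
    # should still approach the true means because the probabilities are known.
    print(simulate([1.0, 2.0, 3.0], [0.5, 0.25, 0.25], [0.0, 0.0, 0.0], 20000))
```

Because the selection probabilities are known, the expected pseudo-reward of every arm equals its true mean even when the imputed values are wrong, so every arm's estimate is updated in every round. A better imputation model should reduce the variance of the pseudo-rewards, which is the other half of the "doubly robust" property.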
Joint work with Wonyoung (Tim) Kim and Assaf Zeevi
Bio:
Garud Iyengar is the Tang Professor of Operations at Columbia Engineering. He received his B. Tech. in Electrical Engineering from IIT Kanpur, and an MS and PhD in Electrical Engineering from Stanford University. His research interests are broadly in control, machine learning, and optimization. His current projects focus on large-scale power systems and supply chains, causal inference, and modeling of cellular processes. He was elected an INFORMS Fellow in 2018. He was the Chair of the Department of Industrial Engineering and Operations Research from 2013 to 2019, and the Associate Director for Research at the Columbia Data Science Institute from 2017 to 2019. He has been an Amazon Scholar since 2019. He is currently the Senior Vice Dean for Research and Academic Programs at Columbia Engineering.