Ballooning Multi-Armed Bandits
Online platforms such as Q&A forums (e.g. StackExchange, Reddit) receive high volume content that streams in from different sources at high speed. These platforms need to rapidly identify and display high quality content to the users after weeding out garbage content. We formulate this problem using Multi-Armed Bandits (MAB) with arms modeling arriving content. We call the model “Ballooning MAB” since the arms (streaming inputs) grow with time. The objective is to achieve extremely low, sublinear regret in being able to display the highest quality content by observing the user reactions to the content. First, we show that when new content arrives with unknown, uniformly distributed quality, it is impossible to achieve a sublinear regret guarantee. We then make a realistic assumption, motivated by several empirical studies in the literature, that the best (highest quality) arms are expected to arrive “early.” This enables to drop “later” arms without exploring them at all and we design a threshold-based algorithm that achieves sublinear regret. In particular, we exactly quantify the sublinear regret when the best arm arrival follows a sub-exponential or sub-Pareto tail distribution and show that the algorithms achieve substantially low sublinear regret.
This is joint work with Ganesh Ghalme (Postdoc, Technion); Swapnil Dhamal (Postdoc, Chalmers); Shweta Jain (IIT-Ropar); and Sujit Gujar (IIIT-Hyderabad).
Ganesh Ghalme, Swapnil Dhamal, Shweta Jain, Sujit Gujar, Y. Narahari. Ballooning Multi-Armed Bandits, Artificial Intelligence, Volume 296, 2021, 31 pages.,
Faculty: Y. Narahari, CSA