A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits.

Upper confidence bound (UCB)-based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, this tail property is usually unknown or difficult to specify in real-world applications. Specifying a tail property heavier than the ground truth slows the learning of the contextual bandit algorithm, while specifying a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrap technique. Our proposed estimator mitigates the problem of specifying a heavier tail property by adaptively converging to the ground truth contextual bandit UCB (i.e., eliminating the impact of the specified heavier tail property), with theoretical guarantees on the convergence. The design and convergence analysis of the proposed estimator are technically nontrivial. The proposed estimator is generic and can be applied to improve a variety of UCB-based contextual bandit algorithms. To demonstrate the versatility of the proposed estimator, we apply it to improve the linear reward contextual bandit UCB (LinUCB) algorithm, resulting in our bootstrapping LinUCB (BootLinUCB) algorithm. We prove that BootLinUCB has sublinear regret. We conduct extensive experiments on both a synthetic dataset and a real-world dataset from Yahoo! to validate the benefits of our proposed estimator in reducing regret and the superior performance of BootLinUCB over the latest baseline.
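
The abstract does not spell out the estimator itself, so the following is only a minimal sketch of the general idea of a multiplier bootstrap confidence width for a linear reward model, not the paper's BootLinUCB. It assumes a ridge regression fit, standard Gaussian multiplier weights, and a simple empirical quantile. The function names (`ridge_fit`, `bootstrap_width`, `select_arm`) and the parameters `lam`, `n_boot`, and `delta` are hypothetical and chosen for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge regression: return the regularized Gram matrix A and estimate theta."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(A, X.T @ y)
    return A, theta


def bootstrap_width(X, y, x, lam=1.0, n_boot=200, delta=0.1, rng=None):
    """Data-driven confidence width for the predicted reward at context x.

    Assumption (illustrative, not from the paper): the width is the
    (1 - delta) empirical quantile of |sum_i w_i * (x^T A^{-1} x_i) * e_i|,
    where e_i are ridge residuals and w_i are i.i.d. N(0, 1) multipliers.
    """
    rng = np.random.default_rng() if rng is None else rng
    A, theta = ridge_fit(X, y, lam)
    residuals = y - X @ theta
    direction = np.linalg.solve(A, x)          # A^{-1} x
    influence = (X @ direction) * residuals    # per-sample contribution at x
    weights = rng.standard_normal((n_boot, len(y)))
    perturbations = np.abs(weights @ influence)
    return float(np.quantile(perturbations, 1.0 - delta))


def select_arm(X, y, arm_features, lam=1.0, n_boot=200, delta=0.1, rng=None):
    """Pick the arm with the largest (point estimate + bootstrap width)."""
    rng = np.random.default_rng() if rng is None else rng
    _, theta = ridge_fit(X, y, lam)
    scores = [
        float(x @ theta) + bootstrap_width(X, y, x, lam, n_boot, delta, rng)
        for x in arm_features
    ]
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                        # historical contexts
    theta_true = np.array([1.0, -0.5, 0.2])              # made-up ground truth
    y = X @ theta_true + rng.standard_t(df=3, size=100)  # heavier-tailed noise
    arms = rng.normal(size=(5, 3))                       # candidate arm features
    print("chosen arm:", select_arm(X, y, arms, rng=rng))
```

The point of the sketch is that the width is computed from historical residuals rather than from a tail parameter fixed in advance, which is the motivation the abstract gives for a bootstrap-based UCB. A practical algorithm would also need the corrections and guarantees that the paper analyzes, which this sketch omits.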

Keywords: tail property; proposed estimator; contextual bandit; bandit

Journal Title: IEEE Transactions on Neural Networks and Learning Systems
Year Published: 2022
