Corruption-tolerant bandit learning

We present algorithms for solving multi-armed and linear-contextual bandit tasks in the face of adversarial corruptions in the arm responses. Traditional algorithms for solving these problems assume that nothing but mild, e.g., i.i.d. sub-Gaussian, noise disrupts an otherwise clean estimate of the utility of the arm. This assumption, and the approaches built on it, can fail catastrophically if an observant adversary corrupts even a small fraction of the responses generated when arms are pulled. To rectify this, we propose algorithms that use recent advances in robust statistical estimation to perform arm selection in polynomial time. Our algorithms are easy to implement and vastly outperform several existing UCB- and EXP-style algorithms for stochastic and adversarial multi-armed and linear-contextual bandit problems in a wide variety of experimental settings. Our algorithms enjoy minimax-optimal regret bounds and can tolerate an adversary that is allowed to corrupt up to a universally constant fraction of the arms pulled by the algorithm.
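The paper's own estimators are not reproduced on this page, but the core idea the abstract describes, swapping the empirical mean inside a UCB index for a robust estimate so that a small fraction of corrupted responses cannot dominate the arm-utility estimates, can be sketched in a few lines. The sketch below is a minimal illustration under assumed names (trimmed_mean, robust_ucb) and an assumed corruption model; it is not the authors' algorithm and carries no regret guarantee.

```python
import numpy as np

def trimmed_mean(rewards, trim_fraction=0.1):
    # Robust location estimate: discard the smallest and largest
    # trim_fraction of observations, then average the rest.
    r = np.sort(np.asarray(rewards, dtype=float))
    k = int(trim_fraction * len(r))
    core = r[k:len(r) - k] if len(r) > 2 * k else r
    return float(core.mean())

def robust_ucb(pull_arm, n_arms, horizon, trim_fraction=0.1, c=2.0):
    # UCB-style arm selection with a trimmed mean in place of the
    # empirical mean, so a few corrupted responses have bounded
    # influence on each arm's utility estimate.
    history = [[] for _ in range(n_arms)]
    for a in range(n_arms):            # pull every arm once to initialize
        history[a].append(pull_arm(a))
    for t in range(n_arms, horizon):
        index = [
            trimmed_mean(history[a], trim_fraction)
            + np.sqrt(c * np.log(t + 1) / len(history[a]))
            for a in range(n_arms)
        ]
        a = int(np.argmax(index))
        history[a].append(pull_arm(a))
    return history

# Toy corruption model (hypothetical): an adversary occasionally replaces
# the response with an extreme value meant to mislead the learner.
rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]

def pull_arm(a, corruption_rate=0.05):
    if rng.random() < corruption_rate:
        return 10.0 if a != 2 else -10.0   # inflate bad arms, deflate the best
    return rng.normal(true_means[a], 0.1)

counts = [len(h) for h in robust_ucb(pull_arm, n_arms=3, horizon=2000)]
print(counts)  # most pulls should go to arm 2, the truly best arm
```

Here the trimmed mean merely stands in for a robust estimator; any estimator with bounded sensitivity to a constant fraction of outliers (e.g., a median-of-means) could be substituted in the same index.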

Keywords: learning corruption; corruption-tolerant; algorithms; bandit; bandit learning; tolerant bandit

Journal Title: Machine Learning
Year Published: 2018
