Selecting multiple web adverts: A contextual multi-armed bandit with state uncertainty

Abstract: We present a method to solve the problem of choosing a set of adverts to display to each of a sequence of web users. The objective is to maximise user clicks over time; to do so, we must learn about the quality of each advert online by observing user clicks. We formulate the problem as a novel variant of a contextual combinatorial multi-armed bandit problem. The context takes the form of a probability distribution over the user's latent topic preference, and rewards are a particular nonlinear function of the selected set and the context. These features ensure that optimal sets of adverts are appropriately diverse. We give a flexible solution method which combines submodular optimisation with existing bandit index policies. Uncertainty about the user's state makes user feedback ambiguous to interpret, which prohibits exact Bayesian updating, but we give an approximate method that is shown to work well.
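
To make the combination of submodular optimisation and a bandit index policy concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not state the reward function, so the sketch assumes the reward is the probability that at least one displayed advert is clicked, averaged over the topic distribution. The names `expected_click_prob`, `greedy_select` and `sample_weights`, the per-topic click probabilities `weights[a][t]`, and the use of Beta-posterior sampling as the index policy are all hypothetical choices made for illustration. The approximate Bayesian update mentioned in the abstract is not shown.

```python
import random


def expected_click_prob(selected, context, weights):
    """Assumed reward: chance that at least one selected advert is clicked.

    context[t]    -- probability that the user's latent topic is t
    weights[a][t] -- assumed click probability of advert a under topic t
    """
    total = 0.0
    for t, q in enumerate(context):
        no_click = 1.0
        for a in selected:
            no_click *= 1.0 - weights[a][t]
        total += q * (1.0 - no_click)
    return total


def greedy_select(context, weights, m):
    """Greedily build a set of up to m adverts, adding the best advert each step.

    The assumed reward is monotone and submodular in the selected set,
    which is the setting where greedy selection is a standard heuristic.
    """
    selected = []
    candidates = set(range(len(weights)))
    for _ in range(min(m, len(weights))):
        # sorted() makes tie-breaking deterministic (lowest advert id wins)
        best = max(
            sorted(candidates),
            key=lambda a: expected_click_prob(selected + [a], context, weights),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


def sample_weights(alpha, beta, rng):
    """One possible index policy (an assumption): draw each click
    probability from a Beta(alpha, beta) posterior, Thompson-sampling style."""
    return [
        [rng.betavariate(a_t, b_t) for a_t, b_t in zip(a_row, b_row)]
        for a_row, b_row in zip(alpha, beta)
    ]


if __name__ == "__main__":
    # Two topics, equally likely; adverts 0 and 1 both target topic 0.
    known = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.5]]
    print(greedy_select([0.5, 0.5], known, 2))  # -> [0, 2]
```

In the toy example, the greedy step skips advert 1 even though it has the second-highest individual click rate, because it duplicates advert 0's topic. This illustrates how a nonlinear set reward can favour diverse sets. In an online setting, one would pass the output of `sample_weights` to `greedy_select` in place of the known weights.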

Keywords: web; multi armed; state uncertainty; armed bandit

Journal Title: Journal of the Operational Research Society
Year Published: 2020
