In standard optimal stopping problems, actions are artificially restricted to the moments at which costs or benefits are observed. In standard experimentation and learning models based on two-armed Poisson bandits, it is possible to take an action between two sequential observations. The latter models, however, do not recognize that timing decisions depend not only on the arrival rate of observations but also on the stochastic dynamics of costs or benefits. We combine these two strands of literature and consider optimal stopping problems with random observations and updating. We formulate the dichotomy principle, an extension of the smooth pasting principle.
               