We consider the NP-hard Stochastic Orienteering Problem, where the goal is to navigate between start and end vertices in a graph, maximizing the sum of rewards for visited vertices while keeping the probability of exceeding a travel budget over edges with stochastic costs within a given failure bound. Previously, we solved this by finding an initial path using a deterministic orienteering solver and transforming it, via a Constrained Markov Decision Process, into a path policy that can skip vertices based on arrival time. In this work we augment our technique, creating a path tree that branches at vertex-time states with a high probability of being skipped, allowing for new sequences of vertices in the resulting policy. We demonstrate that this adaptive path method collects significantly more reward in expectation, even when the number of branches is limited to control computation time.
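A minimal sketch of the branching idea follows, not the paper's implementation: starting from an initial vertex sequence, attach an alternative continuation wherever the policy is likely to skip a vertex, up to a branch limit. The helpers `skip_prob` and `replan` are hypothetical placeholders; in the actual method, skip probabilities come from the Constrained MDP policy over vertex-time states.

```python
from dataclasses import dataclass, field

@dataclass
class PathTreeNode:
    vertex: int
    children: list = field(default_factory=list)  # nominal next vertex plus alternative branches


def build_path_tree(path, skip_prob, replan, branch_threshold=0.5, max_branches=3):
    """Build a path tree from an initial path (illustrative sketch).

    path:       initial vertex sequence from a deterministic orienteering solver
    skip_prob:  skip_prob(v, i) -> estimated probability that path[i] = v is skipped
    replan:     replan(remaining) -> alternative vertex sequence (placeholder)
    """
    root = PathTreeNode(path[0])
    node, branches = root, 0
    for i, v in enumerate(path[1:], start=1):
        child = PathTreeNode(v)
        node.children.append(child)
        # Branch at states where the vertex is likely to be skipped,
        # as long as the branch budget allows it.
        if branches < max_branches and skip_prob(v, i) > branch_threshold:
            alt_node = node
            for u in replan(path[i + 1:]):  # alternative sequence avoiding v
                nxt = PathTreeNode(u)
                alt_node.children.append(nxt)
                alt_node = nxt
            branches += 1
        node = child
    return root
```

Capping `max_branches` mirrors the paper's observation that limiting the number of branches keeps computation time in check while still improving expected reward.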